Site hosted by Angelfire.com: Build your free website today!
www.angelfire.com/dragon/letstry
arnab_et@hotmail.com

HomeTutorialsAbout meLinksEtc

Tutorials:
DFACompilerAssemblyHardwareR

R tutorial:
Page 1Page 2Page 3

Least squares regression

In science we often have to find relation between two different quantities. A physicist, for instance, relates the volume of a gas to its pressure when temperature is constant (Boyle's law). Another example is Hooke's law that relates the elongation of a spring with the weight hung from it. Such relations are found by applying statistical methods to data collected from experiments.

Example: Consider the following Hooke's law experiment:

Hooke's law apparatus

Different known weights (w) are hung from the spring and the corresponding lengths (L) are measured. Here is the data
w kgL cm
0.531.3
1.031.9
1.533.0
2.034.2
2.535.1
3.036.1
3.536.8
4.037.9
4.539.0
5.040.1
For your convenience the same data set has been stored in the file spring.dat. To statistically analyse this data set using R we have to first read the data into R:

hook <- as.matrix(read.table("spring.dat"))

Now hook is a 10 by 2 matrix, the first column stores w and the second column stores L. Let us store the columns in two separate variables:

w <- hook[,1]
L <- hook[,2]

Now make a scatterplot of L versus w:

plot(w,L)

Notice that this plot consists of 10 points, one point for each (w,L) pair. The points lie approximately on a straight line. It is this linear pattern that first led Hooke to postulate his famous law:
L = a + b w,
where a and b are some constants depending on the nature of the spring. In particular, b is called Young's modulus of the spring. Let us try to estimate Young's modulus of our spring based on the given data. For this we have to fit a straight line to the scatterplot. Note that due to experimental errors, the points do not lie exactly on a straight line. To fit a straight line to the scatterplot we shall use the lm command:

val <- lm(L ~ w)
val$coeff

Notice the expression L ~ w carefully. The ~ character is called a tilde and can be found on the keyboard just below the Esc in the top, left hand corner. The expression L ~ w is called a formula in the R language. It stands for the straight line
L = a + b w.
The command lm(L ~ w) asked R to estimate the coefficients a and b from the data set so that the line passes through all the points as nearly as possible. The command val$coeff asks R to print the values of the coefficients, i.e., the estimated values of a and b. The output of R looks something like:


(Intercept)           w 
  30.133333    1.966061 


This says that a equals 30.133333 and b equals 1.966061. Thus, our spring has Young's modulus equal to 1.966061 cm/kg. The line L = a + b w for these a and b should approximately pass through all the points in the scatterplot? Check this by plotting the line on the scatterplot:

plot(w,L)
abline(val$coeff)

What will be the length of the spring if you hang weights 2.1 kg or 2.4 kg or 3.7 kg? You can compute the lengths by directly using the formula
L = a + b w.
But R can do that for you as follows. First make a list of all the values of w for which you want to predict the lengths:

newWs <- data.frame(w=c(2.1,2.3,3.7))

Now use the predict command:

predict(val,newWs)

The above procedure is of great use in many scientific applications. It is commonly called Least Squares Regression method. Finding the relation between two or more variables is called Regression in statistics. The particular technique employed by the lm command of R is called the Least Squares method.

Exercise: When a catapult is fired, its range (D) is related to the stretch (s) of the rubber band before release. The more you stretch it, the further your volley goes.

Catapult

Elementary physics tells us that
D = a + b sqrt(s),
where sqrt denotes square root, and a,b are two constants depending on the nature and position of the catapult and the weight of the volley. I have the following data set that gives the distance traveled by the volley for different amounts of stretch. (All my volleys have approximately the same weight.)
s cmD metre
202.37
253.12
303.94
354.38
405.35
455.82
475.88
506.03
The same data set is stored in catapult.dat. Approximately how far will the volley go if I stretch the catapult by 55 cm?

Exercise: Again consider the same data set. I want to send my volley to a distance of 7 metres. How much should I stretch my catapult? Notice that here I want to predict s based on D. So bring s to the l.h.s. to get the equation:
s = A + B D2,
where A,B are constants depending on the catapult and the volley.


PrevNext
© Arnab Chakraborty (2007)