


R tutorial:

Testing hypotheses

Testing hypotheses is an important branch of statistics. It is concerned with answering "Yes/No" questions based on data. Here is an example. Suppose it is known that the average lifetime of a Philips bulb is 500 hours; that is, if a bulb is left lighted continuously, then on average it blows out after 500 hours of burning. This is called the lifetime of the bulb. Of course, not all bulbs are identical: due to manufacturing variation some bulbs last slightly more than 500 hours, while some last slightly less.

Now imagine that there is a new manufacturing technology that claims to increase the average lifetime of bulbs, and the Philips company is trying to decide whether to adopt it. (Remember that adopting a new technology involves considerable trouble and expense: new machinery has to be bought, workers have to be trained, and so on. So unless there is strong evidence that the new technology is decidedly better, the company would prefer to continue with the existing method.) The company therefore first implements the new method on a small experimental basis and manufactures just 10 bulbs. These new bulbs are left on until they blow out. The lifetimes are found to be
510, 505, 498, 511, 490, 495, 512, 500, 497, 507 hrs.
Based on this data the company has to decide if the average lifetime of bulbs produced by the new technology is indeed more than 500 or not. Let us compute the average of these 10 numbers:
(510+ 505+ 498+ 511+ 490+ 495+ 512+ 500+ 497+ 507)/10 = 502.5
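This arithmetic can be done in R itself, for example:

```r
# The average of the 10 lifetimes, computed in R
mean(c(510, 505, 498, 511, 490, 495, 512, 500, 497, 507))   # 502.5
```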
Since this average is more than 500, can we immediately conclude that the new technology is better? No, because the difference between 502.5 and 500 may be due to mere manufacturing variation. After all, even a bad student can sometimes get more marks than a good student in some examination just by chance! So while we see that 502.5 is larger than 500, the company needs a way to know whether it is significantly larger. Tests of hypotheses are the way to do this. There are various tests to suit different needs. We shall learn five tests here:
  1. One sample t test
  2. Paired sample t test
  3. Two sample t test
  4. Chi-squared goodness of fit test
  5. Chi-squared test for independence

One sample t test

Consider the Philips bulb example once again. We need to perform a test called the one sample t test here. For this, first store the data in a variable called life, say:
life <- c(510, 505, 498, 511, 490, 495, 512, 500, 497, 507)
We want to see if on an average these numbers are more than 500. The R command for this is

t.test(life,alternative="greater",mu=500)

The phrase alternative="greater",mu=500 tells R to check if the lifetimes are, on an average, greater than 500. The output of R will look something like

        One Sample t-test

data:  life 
t = 1.0456, df = 9, p-value = 0.1615
alternative hypothesis: true mean is greater than 500 
95 percent confidence interval:
 498.1171      Inf 
sample estimates:
mean of x 
    502.5 

In this course we shall not learn to understand the entire output. Rather, we shall need only one number: the p-value. If it is less than 0.05, we shall conclude that the average lifetime of bulbs produced by the new technology is indeed greater than 500. In our case, however, the p-value is 0.1615, which is greater than 0.05. So the data do not give us enough evidence to conclude that the new technology produces bulbs with longer lifetimes.
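Since only the p-value matters for us, it is handy to know that the output of t.test can be stored in a variable, from which the p-value can be extracted directly:

```r
life <- c(510, 505, 498, 511, 490, 495, 512, 500, 497, 507)
result <- t.test(life, alternative = "greater", mu = 500)
result$p.value   # just the p-value, about 0.1615
```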

Exercise: A certain type of plant takes 13 months before bearing fruits. A new kind of fertilizer claims to make the plant grow faster, so that it bears fruits before 13 months. The fertilizer is applied to seven plants and their fruit-bearing ages are found to be
11.0, 10.4, 13.5, 7.2, 8.0, 12.1, 12.6 months.
Perform a one sample t test to decide if the fertilizer is effective or not. Notice that, unlike the bulb example, here we want to test if the given numbers are less than the specified value 13 on an average. So in the t.test command you should write "less" instead of "greater" (and, of course, mu=13 instead of mu=500).
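If you want to check your commands, they would look like the following (the variable name age is our choice; the conclusion is left to you):

```r
# Store the fruit-bearing ages of the seven plants
age <- c(11.0, 10.4, 13.5, 7.2, 8.0, 12.1, 12.6)
# Test whether the ages are, on average, less than 13 months
t.test(age, alternative = "less", mu = 13)
```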

Paired sample t test

In the one sample t test we compared a given data set to a specified number (500 in the bulb example, 13 in the plant exercise). In practice, such a number may not be known.

Example: In the steel manufacturing process, one has to add carbon to the raw iron in order to increase its strength. A manufacturer wants to check if adding 5 grams of carbon indeed improves the strength significantly. However, the increase in strength depends not only on the quantity of carbon added, but also on the quality of the iron used. So the manufacturer considers 8 different types of iron and takes two pieces of each type. He measures the strength of the first piece of each type, and gets the numbers
200, 215, 210, 190, 199, 210, 213, 215
Here a larger number means more strength. Now he melts the other pieces separately, adds carbon to them and solidifies them into 8 pieces of steel. The strengths of these 8 pieces are, respectively,
220, 213, 220, 199, 200, 212, 221, 230.
Thus, he has two sets of 8 numbers. He wants to test if the second set is larger than the first set on an average. For this he first stores the data in two variables, iron and steel, say,

iron <- c(200, 215, 210, 190, 199, 210, 213, 215)
steel <- c(220, 213, 220, 199, 200, 212, 221, 230)

Then he performs a paired sample t test:

t.test(iron, steel, alternative="less", paired=TRUE)

The result of the test will be something like


        Paired t-test

data:  iron and steel 
t = -3.0117, df = 7, p-value = 0.009807
alternative hypothesis: true difference in means is less than 0 
95 percent confidence interval:
      -Inf -2.921101 
sample estimates:
mean of the differences 
                 -7.875 


Since the p-value is less than 0.05, the manufacturer concludes that adding the carbon indeed increases the strength.
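Incidentally, the paired sample t test is nothing but a one sample t test applied to the differences between the pairs, which you can check yourself:

```r
iron  <- c(200, 215, 210, 190, 199, 210, 213, 215)
steel <- c(220, 213, 220, 199, 200, 212, 221, 230)

# The paired test on the two samples...
p1 <- t.test(iron, steel, alternative = "less", paired = TRUE)$p.value
# ...gives the same p-value as a one sample test on the differences
p2 <- t.test(iron - steel, alternative = "less", mu = 0)$p.value
```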

Two sample t test

This case is similar to the paired sample test, except that the two samples need not have any pairing. In fact, the two samples may even be of two different sizes.

Example: The IQ levels of children are supposed to depend on the method of education imparted to them. In this sense Montessori method is claimed to be better than the traditional method. To test if this claim is true we take a bunch of 15 children of similar background. We send 8 of them to a Montessori school, while the remaining 7 are sent to a traditional school. After a year the IQ levels of the 15 children are measured. The Montessori kids get scores:
88, 90, 87, 95, 70, 90, 93, 87
The traditional children get
70, 99, 40, 46, 59, 71, 75
We want to test if the Montessori method is really better. For this we store the scores in two variables, mont and trad, say:

mont <- c(88, 90, 87, 95, 70, 90, 93, 87)
trad <- c(70, 99, 40, 46, 59, 71, 75)

Now we perform two-sample t-test:

t.test(mont, trad, alternative="greater", paired=FALSE)

Then R will give us the following information:


        Welch Two Sample t-test

data:  mont and trad 
t = 2.7479, df = 7.556, p-value = 0.01327
alternative hypothesis: true difference in means is greater than 0 
95 percent confidence interval:
 6.92999     Inf 
sample estimates:
mean of x mean of y 
 87.50000  65.71429 


Since the p-value is less than 0.05, we conclude that the Montessori method is indeed more effective than the traditional method.
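Notice that the output is headed "Welch Two Sample t-test": by default R does not assume that the two groups have equal variances. If you are willing to assume equal variances, you may add var.equal=TRUE to get the classical pooled-variance version of the test (the conclusion is the same here):

```r
mont <- c(88, 90, 87, 95, 70, 90, 93, 87)
trad <- c(70, 99, 40, 46, 59, 71, 75)

# Pooled-variance two sample t test (assumes the two groups have equal variances)
t.test(mont, trad, alternative = "greater", var.equal = TRUE)
```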

Chi-squared goodness of fit test

In a game of Ludo, we use a die to determine the moves. We assume that the die is fair, that is, each of its six faces has a 1/6 chance of turning up. Certain gambling games depend crucially on the fairness of the die. It is only natural that crooked gamblers carry dice that are not fair, i.e., where the faces have different chances of turning up. It is important, therefore, to ascertain that a die is fair before you agree to use it. How does one do it? You may roll it a large number of times, say 600. A fair die will show each of the numbers about 600/6 = 100 times. Let us assume that your die shows 1, 2, ..., 6 the following numbers of times, respectively,
87, 97, 105, 100, 107, 108.
You want to test if these numbers are all close enough to the expected number 100. For this we perform the chi-squared goodness of fit test. First store the above numbers in a variable called die, say:

die <- c(87, 97, 105, 100, 107, 108)

Then store the probabilities in another variable called fair, say:

fair <- c(1/6,1/6,1/6,1/6,1/6,1/6)
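Since all six probabilities are the same, R's rep function offers a shorthand for the same vector (an equivalent alternative, not a requirement):

```r
# rep(x, times) repeats a value; this gives the same vector as typing 1/6 six times
fair <- rep(1/6, 6)
```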

Now use the chisq.test command of R:

chisq.test(die,p=fair)

You will see an output like the following.


        Chi-squared test for given probabilities

data:  die 
X-squared = 3.1126, df = 5, p-value = 0.6826


As before, you should look at the p-value, which is 0.6826 in this case. Since it is greater than 0.05, we have no evidence that the die is unfair. In other words, the differences between the observed counts and 100 can be attributed to chance alone.

Pearson's chi-squared test for independence

In our everyday life we have to often check if two different things are related. For instance, we face questions like:
  1. Is there any association between smoking and lung cancer?
  2. Are left-handed persons more intelligent?
  3. Some students prefer research work while some others prefer to get a job. Is it true that the gender of the student has to do with this choice? (e.g., can we say things like "Girls prefer research," or "Boys hate jobs")
In such cases we have to use Pearson's chi-squared test for independence.

Example: It is sometimes claimed that research is only for students from well-to-do families. How can a student who has to support his poor family afford to reject a job offer? On the other hand, it is also well known that some of the best researchers and scientists have come from very poor families. So it is a debatable issue whether there is really any association between the research potential of a student and the financial status of his/her family. To settle the debate one would collect data about the research potential of students from different financial backgrounds. For instance, we can visit colleges and ask the students about their family background and also ask them if they plan to follow a research career or a job career. Here is a possible outcome of such a survey.
            Low income   Medium income   High income
Research        34            60             50
Job            220           110            123
Such a data set is called a contingency table. Based on this we want to test if the career option has any association with the financial status of the family. First store the data set in a matrix called survey, say:

survey <- matrix(c(34,220, 60, 110, 50, 123),nrow=2)

Notice that we have listed the numbers column by column. Now use the chisq.test command to perform the test:

chisq.test(survey)

The result will look like:


        Pearson's Chi-squared test

data:  survey 
X-squared = 29.7491, df = 2, p-value = 3.468e-07


The p-value is 3.468e-07, which is the computer's way of writing 3.468 × 10^(-7), i.e., 0.0000003468. Since this is much smaller than 0.05, the data set provides strong evidence that the choice between research and a job is associated with the financial status of the family.
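If you store the output of chisq.test in a variable (the name res below is our choice), you can also inspect the counts that would be expected if career choice and income were truly independent, which makes it easy to see where the data deviate:

```r
survey <- matrix(c(34, 220, 60, 110, 50, 123), nrow = 2)
res <- chisq.test(survey)
res$expected   # counts expected under independence, for comparison with survey
```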


© Arnab Chakraborty (2007)