www.angelfire.com/dragon/letstry | arnab_et@hotmail.com

R tutorial:
Testing hypotheses
Testing hypotheses is an important branch of statistics. It is concerned
with answering "Yes/No" questions based on data.
Here is an example. Suppose that it is known that the average lifetime of
a Philips bulb is 500 hours, i.e., if a Philips bulb is left on
continuously, it blows out after about 500 hours of burning. This is called
the lifetime of the bulb. Of course, not all bulbs are identical. Due to
manufacturing variations some bulbs will last slightly more than 500 hours, while
some will last slightly less. Let us imagine that there is
a new manufacturing technology that increases the average lifetime of
bulbs. The Philips company is trying to decide whether to adopt this new
technology or not. (Remember that adopting a new technology involves much
trouble and expense: new machinery has to be bought, workers have to
be trained, etc. So unless there is strong evidence that the new
technology is decidedly better, the company would prefer to continue with
the existing method.) For this, the company first implements the new
method only on a small
experimental basis and manufactures just 10 bulbs. These new bulbs are left on
until they blow out. The lifetimes are found to be
510, 505, 498, 511, 490, 495, 512, 500, 497, 507 hrs.
Based on this data the company has to decide if the average
lifetime of bulbs produced by the new technology is indeed more than 500
or not. Let us compute the average of
these 10 numbers:
(510+ 505+ 498+ 511+ 490+ 495+ 512+ 500+ 497+ 507)/10 = 502.5
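This arithmetic can be checked in R itself; a small sketch (the variable
name life matches the one used later in this tutorial):

```r
# Lifetimes of the 10 experimental bulbs (hours)
life <- c(510, 505, 498, 511, 490, 495, 512, 500, 497, 507)
mean(life)  # 502.5
```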
Since this average is more than 500, can we immediately conclude that the
new technology is better? No, because the difference between 502.5 and 500
may be due to mere manufacturing variation. After all, even a bad student
can sometimes get more marks than a good student in some examination just
by chance! So while we see that 502.5 is larger than 500, the company
needs a way to know whether it is significantly larger. Testing
hypotheses is the way to do this. There are various tests to suit
different needs. We shall learn five tests here:
- One sample t test
- Paired sample t test
- Two sample t test
- Chi-squared goodness of fit test
- Chi-squared test for independence
One sample t test
Consider the Philips bulb example once again. We need to perform a test
called the one sample t test here. For this, first store the data
in a variable called life, say,
life <- c(510, 505, 498, 511, 490, 495, 512, 500, 497, 507)
We want to see if on average these numbers are more than 500. The R command for
this is
t.test(life, alternative="greater", mu=500)
The phrase alternative="greater", mu=500 tells R to
check if the lifetime is, on average, greater than 500.
The output of R will look something like
One Sample t-test
data: life
t = 1.0456, df = 9, p-value = 0.1615
alternative hypothesis: true mean is greater than 500
95 percent confidence interval:
498.1171 Inf
sample estimates:
mean of x
502.5
In this course we shall not learn to understand the entire output. Rather,
we shall need only one number: the p-value. If it is less than 0.05, we
shall conclude that average lifetime of bulbs produced by the new technology is
indeed greater than 500. In our case, however, the p-value is 0.1615,
which is greater than 0.05. So we conclude that there is not enough
evidence that the new technology produces bulbs with longer lifetimes.
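Rather than reading the p-value off the printed output, it can also be
extracted from the object returned by t.test; a minimal sketch:

```r
life <- c(510, 505, 498, 511, 490, 495, 512, 500, 497, 507)
res <- t.test(life, alternative = "greater", mu = 500)
res$p.value         # about 0.1615
res$p.value < 0.05  # FALSE, so no strong evidence of improvement
```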
| Exercise:
A certain type of plant takes 13 months before bearing fruits. A new kind
of fertilizer claims to make the plant grow faster, so that it can bear
fruits before 13 months. The fertilizer is applied to seven plants and
their fruit-bearing ages are found to be
11.0, 10.4, 13.5, 7.2, 8.0, 12.1, 12.6 months.
Perform a one-sample t-test to decide if the fertilizer is effective or
not. Notice that unlike the bulb example here we want to test if the given
numbers are less than the specified value 13 on an average. So in
the t.test command you should write "less" instead of
"greater".
|
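If you want to check your answer afterwards, here is one possible sketch
of the exercise (the variable name age is our own choice):

```r
# Fruit-bearing ages (in months) of the treated plants
age <- c(11.0, 10.4, 13.5, 7.2, 8.0, 12.1, 12.6)
# Test whether the average age is less than the usual 13 months
t.test(age, alternative = "less", mu = 13)
```

The p-value here turns out to be below 0.05, suggesting that the
fertilizer is indeed effective.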
Paired sample t test
In the tests so far, we have compared a given data set to some
specified number (500 in the bulb example, 13 in the plant
exercise). In practice, such a number may not be known.
| Example:
In the steel manufacturing process, one has to add carbon to the raw iron
in order to increase its strength.
A manufacturer wants to check if adding 5 grams of carbon indeed
improves the strength significantly. However, the increase of strength of the steel depends
not only on the quantity of carbon added, but also on the quality of iron
used. So the manufacturer considers 8 different types of irons and takes
two pieces of each type. He measures the strength of the first piece of
each type, and gets the numbers
200,215, 210, 190, 199, 210, 213, 215
Here a larger number means more strength. Now he melts the other pieces separately,
adds carbon to them and solidifies them into 8 pieces of steel. The
strengths of these 8 pieces are, respectively,
220, 213, 220, 199, 200, 212, 221, 230.
Thus, he has two sets of 8 numbers. He wants to test if the second set is
larger than the first set on average. For this he first stores the data
in two variables, iron and steel, say,
iron <- c(200,215, 210, 190, 199, 210, 213, 215)
steel <- c(220, 213, 220, 199, 200, 212, 221, 230)
Then he performs a paired sample t test
t.test(iron, steel, alternative="less", paired=TRUE)
The result of the test will be something like
Paired t-test
data: iron and steel
t = -3.0117, df = 7, p-value = 0.009807
alternative hypothesis: true difference in means is less than 0
95 percent confidence interval:
-Inf -2.921101
sample estimates:
mean of the differences
-7.875
Since the p-value is less than 0.05, the manufacturer concludes that the
addition of the extra carbon indeed increases the strength.
|
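A paired t test is nothing but a one sample t test applied to the
differences between the pairs; this can be verified directly (a small
sketch):

```r
iron  <- c(200, 215, 210, 190, 199, 210, 213, 215)
steel <- c(220, 213, 220, 199, 200, 212, 221, 230)
# The paired test, as above
a <- t.test(iron, steel, alternative = "less", paired = TRUE)
# An equivalent one sample test on the differences
b <- t.test(iron - steel, alternative = "less", mu = 0)
a$p.value  # about 0.0098
b$p.value  # exactly the same value
```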
Two sample t test
This case is similar to the paired sample test, except that the two
samples need not have any pairing. In fact, the two samples may even be of
two different sizes.
| Example:
The IQ levels of children are supposed to depend on the method of
education imparted to them. In this sense the Montessori method is claimed to
be better than the traditional method. To test if this claim is true we
take a group of 15 children of similar background. We send 8 of them
to a Montessori school, while the
remaining 7 are sent to a traditional school. After a year the IQ levels
of the 15 children are measured. The Montessori kids get the scores:
88, 90, 87, 95, 70, 90, 93, 87
The traditional children get
70, 99, 40, 46, 59, 71, 75
We want to test if the Montessori method is really better. For this we
store the scores in two variables, mont and trad, say:
mont <- c(88, 90, 87, 95, 70, 90, 93, 87)
trad <- c(70, 99, 40, 46, 59, 71, 75)
Now we perform a two sample t test:
t.test(mont, trad, alternative="greater", paired=F)
Then R will give us the following information:
Welch Two Sample t-test
data: mont and trad
t = 2.7479, df = 7.556, p-value = 0.01327
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
6.92999 Inf
sample estimates:
mean of x mean of y
87.50000 65.71429
Since the p-value is less than 0.05, we conclude that the Montessori
method is indeed more effective than the traditional method.
|
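By default R performs Welch's version of the two sample t test, which
does not assume that the two groups have equal variances; if that
assumption is acceptable, the classical pooled-variance test can be
requested instead. A small sketch:

```r
mont <- c(88, 90, 87, 95, 70, 90, 93, 87)
trad <- c(70, 99, 40, 46, 59, 71, 75)
# Welch's test (R's default, used above)
t.test(mont, trad, alternative = "greater")
# Pooled-variance version, assuming equal variances
t.test(mont, trad, alternative = "greater", var.equal = TRUE)
```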
Chi-squared goodness of fit test
In a game of Ludo, we use a die to determine the moves. We assume that the
die is a fair die, that is, each of its six numbers has a 1/6 chance of
turning up. Certain gambling games depend crucially on the fairness of the
die. It is only natural that crooked gamblers carry dice that are not
fair, i.e., where the numbers have different chances of turning up. It is
important, therefore, to ascertain that a die is fair before you agree to
use it. How does one do it? You may roll it a large number of times, say
around 600. A fair die will show each of the six numbers roughly equally
often, about one-sixth of the time. Let us assume that your die shows
1, 2, ..., 6 the following numbers of times, respectively:
87, 97, 105, 100, 107, 108.
You want to test if these numbers are all close enough to their
expected counts (one-sixth of the total number of rolls each). For this we
perform the chi-squared goodness of fit test. First store the above
numbers in a variable called
die, say:
die <- c(87, 97, 105, 100, 107, 108)
Then store the probabilities in another variable called fair, say:
fair <- c(1/6,1/6,1/6,1/6,1/6,1/6)
Now use the chisq.test command of R:
chisq.test(die,p=fair)
You will see an output like the following.
Chi-squared test for given probabilities
data: die
X-squared = 3.1126, df = 5, p-value = 0.6826
As before, you should look at the p-value, which is 0.6826 in this
case. Since it is greater than 0.05, we have no evidence that the die is
unfair. In other words, the difference between the observed counts and
their expected values can be explained by chance alone.
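The expected counts that the test compares against can be inspected from
the object returned by chisq.test; a small sketch:

```r
die <- c(87, 97, 105, 100, 107, 108)
res <- chisq.test(die, p = rep(1/6, 6))
res$expected  # one-sixth of the total rolls for each face
res$p.value   # about 0.68
```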
Pearson's chi-squared test for independence
In our everyday life we often have to check if two different things are
related. For instance, we face questions like:
- Is there any association between smoking and lung cancer?
- Are left-handed persons more intelligent?
- Some students prefer research work while others prefer to get a
job.
Is it true that the gender of the student has anything to do with this choice?
(e.g., can we say things like "Girls prefer research" or "Boys hate jobs"?)
In such cases we have to use Pearson's chi-squared test for independence.
| Example:
It is sometimes claimed that research is only for students from
well-to-do families. How can a student who has to support his poor family
afford to reject a job offer? On the other hand, it is also well known
that some of the best researchers and scientists have come from absolutely poor
families. So it is a debatable issue whether there is really any
association between the research
potential of a student and the financial status
of his/her family. To settle the debate one would collect data about the
research potential of
students from different financial backgrounds. For instance, we can visit
colleges and ask students about their family background and also ask
them whether they plan to follow a research career or a job career. Here is a
possible outcome of such a survey.
         | Low income | Medium income | High income
Research |     34     |      60       |     50
Job      |    220     |     110       |    123
Such a data set is called a contingency table.
Based on this we want to test if the career option has any association with
the financial status of the family. First store the data set in a matrix
called survey, say:
survey <- matrix(c(34,220, 60, 110, 50, 123),nrow=2)
Notice that we have listed the numbers column by column. Now use
the chisq.test command to perform the test:
chisq.test(survey)
The result will look like:
Pearson's Chi-squared test
data: survey
X-squared = 29.7491, df = 2, p-value = 3.468e-07
The p-value is 3.468e-07, which is the computer's way of writing 3.468
times 10^(-7), i.e., 0.0000003468. Since this is much smaller than
0.05, the data set provides strong evidence that the choice between
research and a job is associated with the financial status of the family.
|
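Row and column labels can be attached to the matrix so that the table
prints readably, and the counts expected under independence can be
inspected from the test object; a small sketch:

```r
# The survey counts, entered column by column, with labels attached
survey <- matrix(c(34, 220, 60, 110, 50, 123), nrow = 2,
                 dimnames = list(c("Research", "Job"),
                                 c("Low", "Medium", "High")))
res <- chisq.test(survey)
res$expected  # counts expected if career choice and income were independent
res$p.value   # about 3.5e-07
```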