Types of Errors Assignment
NAME:
Label each of the following statements as:
Type 1 Error (False Positive) or Type 2 Error (False Negative).
1. A blood test failing to detect the disease it was designed to detect, in a
patient who really has the disease.
2. An experiment indicating that a medical treatment should cure a disease when
in fact it does not.
3. A fire alarm going off indicating a fire when in fact there is no fire.
4. Convicting an innocent person.
5. Letting a guilty person go free.
6. Computer application that classifies authorized users as imposters.
7. Computer application that classifies imposters as authorized users.
8. When spam filtering or spam blocking techniques wrongly classify a legitimate
email message as spam and, as a result, interfere with its delivery.
9. When antivirus software wrongly classifies an innocuous file as a virus.
10. A shepherd "crying wolf" when there is no wolf present.
11. Security alarm not detecting a bomb being brought onto a plane.
12. Security alarm identifying an innocent traveller as a terrorist.
False positives occur every day in airport security screening, which
ultimately relies on visual inspection. The installed
security alarms are intended to prevent weapons from being brought onto aircraft;
yet they are often set to such high sensitivity that they go off many times a day
for minor items, such as keys, belt buckles, loose change, mobile phones, and
tacks in shoes.
The ratio of false positives
(identifying an innocent traveller as a terrorist) to true positives (detecting
a would-be terrorist) is, therefore, very high; and because almost every alarm
is a false positive, the positive predictive value of these screening
tests is very low.
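
The link between rare threats and a low positive predictive value can be checked with a short calculation. The sketch below applies Bayes' rule; the sensitivity, specificity, and prevalence values are hypothetical numbers chosen only for illustration, not measurements from any real screening system.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Fraction of alarms that are true positives, by Bayes' rule."""
    true_pos = sensitivity * prevalence                    # real threats that alarm
    false_pos = (1.0 - specificity) * (1.0 - prevalence)   # innocent travellers that alarm
    return true_pos / (true_pos + false_pos)


# Hypothetical inputs (assumptions, not data): 1 traveller in a million
# carries a weapon, the alarm catches 99% of them, and it also goes off
# for 5% of innocent travellers.
ppv = positive_predictive_value(sensitivity=0.99, specificity=0.95, prevalence=1e-6)
print(f"Share of alarms that are real threats: {ppv:.6f}")
```

Under these assumed numbers, only a tiny fraction of alarms corresponds to a real threat, even though the alarm itself rarely misses one.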
The relative cost of false
results determines the likelihood that test creators allow these events to
occur. Because the cost of a false negative in this scenario is extremely high (not
detecting a bomb being brought onto a plane could result in hundreds of deaths)
while the cost of a false positive is relatively low (a reasonably simple
further inspection), the most appropriate test is one with low statistical specificity but high statistical sensitivity (one that allows a high rate of
false positives in return for minimal false negatives).
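
The trade-off between sensitivity and specificity can be seen by moving an alarm threshold. The following is a minimal sketch: the detector scores and labels are made up for illustration, and the rule "alarm when score >= threshold" is an assumption about how such a detector might work.

```python
def sensitivity_specificity(scores, labels, threshold):
    """labels: True means a real threat. The alarm fires when score >= threshold."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y and s >= threshold)       # threats caught
    fn = sum(1 for s, y in pairs if y and s < threshold)        # threats missed
    tn = sum(1 for s, y in pairs if not y and s < threshold)    # innocents passed
    fp = sum(1 for s, y in pairs if not y and s >= threshold)   # innocents flagged
    return tp / (tp + fn), tn / (tn + fp)


# Made-up detector scores for eight travellers; two are real threats.
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.9]
labels = [False, False, False, False, False, True, False, True]

for threshold in (0.8, 0.55):
    sens, spec = sensitivity_specificity(scores, labels, threshold)
    print(f"threshold={threshold}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

With this toy data, lowering the threshold from 0.8 to 0.55 catches the threat that was previously missed, at the price of flagging one innocent traveller.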
In the practice of medicine,
there is a significant difference between the applications of screening and testing.
Screening involves relatively cheap tests that are given to large populations, none of
whom manifest any clinical indication of disease (e.g., Pap smears).
Testing involves far more expensive, often invasive, procedures that are given only to
those who manifest some clinical indication of disease, and are most often
applied to confirm a suspected diagnosis.
For example, most states in the
USA require newborns to be screened for phenylketonuria
and hypothyroidism, among other congenital disorders. Although they display a high rate of
false positives, the screening tests are considered valuable because they
greatly increase the likelihood of detecting these disorders at a far earlier
stage.
The simple blood tests used to
screen possible blood donors for HIV and hepatitis
have a significant rate of false positives; however, physicians use much more
expensive and far more precise tests to determine whether a person is actually
infected with either of these viruses.
Perhaps the most widely discussed
false positives in medical screening come from mammography, the breast cancer
screening procedure. The US rate of false positive
mammograms is up to 15%, the highest in the world. One consequence of the high
false positive rate in the US is that, in any 10-year period, half of the
American women screened receive a false positive mammogram. False positive
mammograms are costly, with over $100 million spent annually in the US
on follow-up testing and treatment. They also cause women unneeded anxiety. As
a result of the high false positive rate in the US, as many as 90–95% of
women who get a positive mammogram do not have the condition. The lowest rate
in the world, 1%, is in the Netherlands. The lowest rates are
generally in Northern Europe, where mammography films are read twice and a high
threshold for additional testing is set (the high threshold decreases the power
of the test).
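
A rough calculation shows how a modest per-screen false positive rate can add up over a decade. The sketch below assumes one screen per year and treats each screen as independent, which is a simplification and is not stated in the text; the 7% rate is a hypothetical value, while 1% and 15% are the rates quoted above.

```python
def prob_at_least_one_false_positive(per_screen_rate, n_screens):
    """Chance of one or more false positives in n independent screens."""
    return 1.0 - (1.0 - per_screen_rate) ** n_screens


# Ten annual screens (an assumption); 7% is a hypothetical per-screen rate.
for rate in (0.01, 0.07, 0.15):
    p = prob_at_least_one_false_positive(rate, n_screens=10)
    print(f"per-screen rate {rate:.0%}: chance of a false positive in 10 screens = {p:.0%}")
```

Under these assumptions, a per-screen rate of about 7% is already enough for roughly half of the women screened to see at least one false positive in ten years.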
The ideal population screening
test would be cheap, easy to administer, and produce zero false-negatives, if
possible. Such tests usually produce more false-positives, which can
subsequently be sorted out by more sophisticated (and expensive) testing.
False negatives and false
positives are significant issues in medical
testing. False negatives may provide a falsely reassuring message to
patients and physicians that disease is absent, when it is actually present.
This sometimes leads to inappropriate or inadequate treatment of both the
patient and their disease.
Which Error Is Better?
By thinking in terms of false
positive and false negative results, we are better equipped to consider which
of these errors is better. Suppose you are designing a medical screening for a
disease. Is a Type I or a Type II error better? A false positive may give our
patient some anxiety, but this will lead to other testing procedures.
Ultimately our patient will discover that the initial test was incorrect.
In contrast, a false negative will give our patient the incorrect
assurance that he does not have a disease when he in fact does. As a result of
this incorrect information, the disease will not be treated. If we could choose
between these two options, a false positive would be more desirable than a false
negative.
Now suppose that you have been put on trial for murder. The null hypothesis here is that you are not guilty. Which of the two errors is more serious? Again, it depends. A Type I error occurs when you are found guilty of a murder that you did not commit. This is a very dire outcome for you. A Type II error occurs when you are guilty but are found not guilty. This is a good outcome for you, but not for society as a whole. Here we see the value in a judicial system that seeks to minimize Type I errors.
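
The trial example can be written as a small decision table. This is a sketch only; the function name `classify_decision` and its arguments are hypothetical, introduced here to show how the null hypothesis ("not guilty") and the verdict combine into the two error types.

```python
def classify_decision(null_is_true, null_rejected):
    """Name the outcome of a test, given the truth and the decision made."""
    if null_is_true and null_rejected:
        return "Type I error (false positive)"
    if not null_is_true and not null_rejected:
        return "Type II error (false negative)"
    return "correct decision"


# Null hypothesis: the defendant is not guilty.
# Rejecting the null hypothesis means returning a guilty verdict.
print(classify_decision(null_is_true=True, null_rejected=True))    # innocent, convicted
print(classify_decision(null_is_true=False, null_rejected=False))  # guilty, acquitted
print(classify_decision(null_is_true=True, null_rejected=False))   # innocent, acquitted
```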