Steven Ringer, MD, Ph.D., Director of Newborn Services, Brigham and Women's Hospital, Harvard Medical School, Boston, MA
Richard Slavin, Technical Director of Neonatal Respiratory Therapy, Brigham and Women's Hospital, Harvard Medical School, Boston, MA
Marjorie L. Icenogle, Ph.D., Assistant Professor of Management, University of South Alabama, Mobile, Alabama
Naomi Seiler, Research Assistant, Harvard Medical School
Steven M. Zimmerman, Ph.D., Professor of Quality and Systems Management, University of South Alabama, Mobile, Alabama
Abstract
In statistical process control (SPC) analysis, the use of three-sigma limits has been standard practice for more than 30 years. However, as we increasingly rely on automated methods that collect vital sign data at rapid rates, the continued acceptance of three-sigma limits as appropriate for biological data must be examined.
Clinical vital sign data may be collected as fast as heartbeat to heartbeat. This rate of data collection appears to increase the number of chance-caused outliers when three-sigma control limits are used. This study investigates the accuracy of this impression and then identifies ways to eliminate or reduce the high probability of chance outliers.
Three methods are available to reduce the number of chance-caused outliers: (a) require a run of points beyond the three-sigma limits before alarms are sounded or limits are recalculated, (b) increase the number of observations in each subgroup, or (c) widen the control limits to four, five, or six sigma.
This paper examines the selection of an outlier rule (2, 3, 4, ... sigma limits) as a function of data generating/sampling rates. The conclusion is that in environments where data are collected at a rapid rate, type I errors may be reduced by increasing subgroup size or by using six-sigma control limits.
Introduction
In statistical process control (SPC), control limits are set to determine when a change in a process has occurred. Comparing the averages and standard deviations of sample subgroups with those of the base period subgroups (the subgroups used to calculate the control limits) indicates whether a sample subgroup has the same mean and distribution as the base period. Control limits (sigma limits) are set to balance type I and type II errors. A type I error results when the user assumes a change when no actual change has occurred. A type II error results when the user assumes no change although a change has occurred.
Initial attempts to use automated computer data collection and SPC for clinical data analysis showed many chance outliers (false alarms). Experience has shown that if there is an excessive number of false alarms, caregivers may ignore the alarms. Medical experts suggested that biological data are somehow different and that outliers should be expected. At first the researchers accepted the notion that biological data are different, but over time we began to ask the following questions:
How are biological data different?
Why are the biological data sets different?
In what ways do these differences affect the analyses?
Are the differences due to causes other than the source of the data?
Before computer data collection methods were developed, SPC control charts were produced after vital sign data were collected by nurses each hour. No problems were apparent with using three sigma limits or the theory of run rules listed in Table 1 to indicate a change in the patient's vital signs. However, when vital sign data were collected using computer monitoring, the number of outliers increased.
Table 1 Run rules
The researchers examined the use of the theory of runs for an SPC grand mean control chart (similar to the EWMA chart) for automatically collected data and found that when a subgroup indicated a change in the process, the next subgroup was likely to be beyond the three-sigma control limits. Because of high autocorrelation in the data, when one rule indicated a change in the process, subsequent subgroups also triggered the succeeding rules. Autocorrelation therefore nullified the need for the theory of runs (1). This finding provided justification to concentrate on a single out-of-limits rule. The biological data still showed a number of chance (self-correcting) outliers, but fewer outliers were flagged than when we used the theory of runs. Slowly we realized that the key difference was not the "biological" nature of the data, but the data collection rate.
Timing of data collection
The researchers' industrial experience was similarly limited to collecting a subgroup of data once a day or once per hour. For these data sets, the three-sigma rule and the theory of run rules were adequate. Therefore, the researchers were not prompted to search for other rules.
Clinical data display and collection rates for heartbeats vary from once per heartbeat, to once per second, to once every two seconds, to once every five seconds, and so on. For newborns with heart rates up to 200 beats per minute or more, using the industrial standard practice of subgroups of size n = 5 and three-sigma limits, one can expect one outlier by chance alone roughly every 10 minutes, or approximately 6 per hour, which is an unacceptably high rate. Figure 1 illustrates these results with a simple mathematical equation.
Figure 1 Expected Outliers as Control Limits Widen (subgroup size n=5)
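The calculation behind a chart of this kind can be sketched as follows. This is a minimal illustration, not the authors' spreadsheet: it assumes independent, normally distributed subgroup means, one observation per heartbeat at 200 beats per minute, and a two-sided k-sigma limit. The function names are hypothetical.

```python
import math


def two_sided_tail(k):
    """P(|Z| > k) for a standard normal Z: chance one subgroup is a false alarm."""
    return math.erfc(k / math.sqrt(2))


def expected_outliers_per_hour(obs_per_minute, subgroup_size, k):
    """Expected chance outliers per hour for k-sigma limits on subgroup means.

    Assumes independent, in-control, normally distributed subgroup means
    (an assumption; real vital sign data are autocorrelated).
    """
    subgroups_per_hour = obs_per_minute * 60 / subgroup_size
    return subgroups_per_hour * two_sided_tail(k)


if __name__ == "__main__":
    # Sweep the sigma multiplier for n = 5 at an assumed 200 observations per minute.
    for k in (2, 3, 4, 5, 6):
        rate = expected_outliers_per_hour(200, 5, k)
        print(f"{k}-sigma limits: {rate:.6f} expected outliers per hour")
```

Under these assumptions the three-sigma case works out to roughly 6.5 chance outliers per hour, in line with the figure of about one every 10 minutes quoted above.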
Possible Solutions
The number of outliers may be eliminated or reduced with one or more changes: (a) skip data between subgroups; (b) increase the subgroup size; (c) use a series of subgroups beyond three sigma rather than one subgroup; or (d) use 4, 5, or 6 sigma limits rather than 3 sigma limits.
The physicians consulted during the study objected to skipping data observations. Realizing that computerized data collection provided the opportunity to make decisions on more than one human-recorded data point per half hour, physicians believed that it would be bad practice not to use all available information. This reaction persisted despite the fact that clinical monitors record data at varying rates; some clinical monitors send data at each heartbeat, while others send data every second, every two seconds, every five seconds, or more slowly (2).
When the data generation rates vary, it is common to change the subgroup size. A subgroup size of one is sometimes selected when the data generation rates are slow, such as once a day or once an hour. We have used subgroup sizes of n=20 for newborn babies using monitors with high data generation rates; however, some caregivers have reacted negatively to large subgroup sizes, fearing they would lose detailed information. Figure 2 illustrates the spreadsheet calculations for a subgroup size of n=20. With a subgroup size of n=5 and three-sigma limits, the expected number of outliers per hour is 6.5; with a subgroup size of n=20 and three-sigma limits, it is 1.6.
Figure 2 Expected Outliers as Control Limits Widen (n=20)
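A small Monte Carlo check illustrates why the larger subgroup lowers the hourly count: the chance that any one in-control subgroup mean falls outside three-sigma limits is unchanged, but n=20 produces a quarter as many subgroups per hour as n=5. The sketch below is only an illustration under stated assumptions: independent standard normal observations, a known process mean and standard deviation, and 200 observations per minute. The function name, seed, and simulated duration are hypothetical choices.

```python
import random
from statistics import fmean


def simulated_outliers_per_hour(subgroup_size, hours=50, obs_per_minute=200,
                                k=3.0, seed=1997):
    """Simulate in-control data and count subgroup means beyond k-sigma limits.

    Observations are independent standard normal draws (an assumption; real
    vital signs are autocorrelated), so the limits on the subgroup mean are
    +/- k / sqrt(subgroup_size).
    """
    rng = random.Random(seed)
    limit = k / subgroup_size ** 0.5
    subgroups = hours * obs_per_minute * 60 // subgroup_size
    outliers = 0
    for _ in range(subgroups):
        mean = fmean(rng.gauss(0.0, 1.0) for _ in range(subgroup_size))
        if abs(mean) > limit:
            outliers += 1
    return outliers / hours


if __name__ == "__main__":
    for n in (5, 20):
        rate = simulated_outliers_per_hour(n)
        print(f"n={n}: about {rate:.2f} chance outliers per hour")
```

The simulated rates should land near the 6.5 and 1.6 outliers per hour quoted above, with some sampling variation from run to run if the seed is changed.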
Like all processes, biomedical processes (patient vital signs) may have small shifts and large shifts or jumps. A small sustained shift can be identified from a run of subgroups with results beyond the three-sigma limits. Our hypothesis is that using a single point beyond six-sigma limits rather than a run beyond three-sigma limits will more correctly identify a large substantial change. Analytical and simulation analyses indicate that when 6-sigma limits are used rather than 3-sigma limits, there is a nearly 100 percent reduction in outliers that occur by chance alone.
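As a worked check on the analytical side of this claim, assuming normally distributed subgroup means (with Z a standard normal variable and Φ its cumulative distribution function), the two-sided tail probabilities at three and six sigma compare as follows:

```latex
\begin{align*}
P(|Z| > 3) &= 2\,[1 - \Phi(3)] \approx 2.70 \times 10^{-3},\\
P(|Z| > 6) &= 2\,[1 - \Phi(6)] \approx 1.97 \times 10^{-9},\\
\text{reduction} &= 1 - \frac{P(|Z| > 6)}{P(|Z| > 3)}
  \approx 1 - 7.3 \times 10^{-7} \approx 99.99993\%.
\end{align*}
```

Under this assumption, chance outliers at six sigma are not literally impossible, but they are rare enough to be negligible at any realistic data collection rate.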
To compare 6-sigma with 3-sigma decision making using real observations, we selected a data set collected in June 1997 for early term newborns who had minimal respiratory problems. The criterion for comparing 3-sigma limits with 6-sigma limits was the number of automatic resets generated by the software for oxygen saturation and heart rate. Subgroups of 20 observations were compared. The oxygen saturation reset rule required a run of 10 subgroups in a row beyond the control limits, or, if the value of oxygen saturation was over 92 percent, a run of 25 subgroups in a row beyond the control limits. The heart rate reset rule required a run of 10 subgroups in a row beyond the control limits.
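The reset rules described above can be sketched as a simple run counter. This is a hypothetical reconstruction, not the software used in the study: the function and parameter names are invented, the "value of oxygen saturation" is taken to be the current subgroup mean, and the control limits are held fixed after a reset rather than recalculated.

```python
def count_resets(subgroup_means, lcl, ucl, run_length=10,
                 high_value=None, high_run_length=25):
    """Count resets triggered by runs of subgroup means outside [lcl, ucl].

    A reset fires after `run_length` consecutive out-of-limit subgroups, or
    after `high_run_length` when the current subgroup mean exceeds
    `high_value` (assumed reading of the 92 percent oxygen saturation rule).
    Limits stay fixed after a reset; this is a simplification.
    """
    resets = 0
    run = 0
    for mean in subgroup_means:
        if lcl <= mean <= ucl:
            run = 0  # an in-limit subgroup breaks the run
            continue
        run += 1
        if high_value is not None and mean > high_value:
            required = high_run_length
        else:
            required = run_length
        if run >= required:
            resets += 1
            run = 0  # start counting a fresh run after the reset
    return resets


if __name__ == "__main__":
    # Illustrative heart rate subgroup means with made-up limits of 120 to 160.
    heart_rate_means = [150] * 5 + [170] * 10 + [150] * 3
    print(count_resets(heart_rate_means, lcl=120, ucl=160))
```

Running the same series of subgroup means through this counter twice, once with 3-sigma and once with 6-sigma values for `lcl` and `ucl`, gives the kind of reset-count comparison used as the criterion here.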
Data were charted using both 3-sigma and 6-sigma limits to compare the number of resets. Each statistical process control chart represented approximately 35 minutes of observations. Figure 3 shows one oxygen saturation and heart rate screen charted with 3-sigma limits. Similarly, Figure 4 shows the screens for 6-sigma limits.
Figure 3 Using 3 sigma limits
Figure 4 Using 6 sigma limits
Figure 5 summarizes the screen-by-screen analysis of all the original data files collected at Brigham and Women's Hospital (Harvard Medical School) in the NICU for short-term patients during the summer of 1997. The chart shows the percentages of changes in oxygen saturation when 3-sigma and 6-sigma limits are compared. On 68 percent of the screens, no differences were found between 3-sigma and 6-sigma limits. In 19 percent of the observations, only one significant difference was found. Two differences were found in only three percent of the observations, and on one screen the 6-sigma limit reset one more time than the 3-sigma limit (a negative change). In eight percent of the observations, the timing of the changes differed.
Figure 5
Figure 6 shows the number of heart rate resets based on comparisons of the 3-sigma and 6-sigma limits. Three-sigma limits resulted in more resets than 6-sigma limits in almost half of the comparisons. In 32 percent of the screens, 3-sigma limits resulted in one more change than 6-sigma limits. In 11 percent, two more changes were found, and in 5 percent, three more changes than with 6-sigma limits. Timing changes were found in 24 percent of the screens, and no change was found in 27 percent. The larger number of resets using 3-sigma limits suggests that some of these resets were due to chance and were eliminated by using 6-sigma limits. In addition, we know that oxygen saturation is a more stable process than heart rate and reacts to a different set of causal factors.
Figure 6
Conclusions
From this study, we have learned that when oxygen saturation is monitored there is almost no difference between using 3-sigma and 6-sigma control limits. The researchers have experience using 3-sigma limits for both stable newborns and critical newborns. The analysis of stable newborns' oxygen saturation indicates a 2 percent difference between 3-sigma and 6-sigma limits, which does not justify a change.
The researchers know that heart rate behavior is often self-correcting; that is, the heart rate changes for a short period of time and then returns to its prior level. This analysis of stable newborns' heart rates indicates a 30 percent difference between 3-sigma and 6-sigma limits, which justifies an additional study of the effect of changing control limits on a less stable set of patients in order to identify the most appropriate control limit.
References
1. Zimmerman, Steven M., Robert N. Zimmerman, Lonnie D. Brown, and Shannon S. Brown, "Using Moving Average Process Control Charts in Biomedical Applications," Proceedings, Ninth International Conference of the Israel Society of Quality Assurance, November 1992, pages 761-764.
2. Ringer, Steven, MD, Ph.D., Judith Azok, RN, MSN, Marjorie L. Icenogle, Ph.D., Steven Zimmerman, Ph.D., "Nurse's Role in Observing and Recording Vital Sign Data," http://www.usouthal.edu/usa/iems/table.htm.
3. Ringer, Steven, MD, Ph.D., Kimberly Zimmerman, R.Ph., Marjorie L. Icenogle, Ph.D., Steven Zimmerman, Ph.D., "Using SPC to Analyze Patient Recorded Vital Sign Data," http://www.usouthal.edu/usa/iems/table.htm.
4. Fackler, Jame, MD, Christine Tsien, Warren Beatty, Ph.D., and Steve Zimmerman, Ph.D., "Experimental Design: One Observations Out-of-specification limits system vs SPC Methods for Patient Vital Sign Management," http://www.usouthal.edu/usa/iems/table.htm.