Steven Ringer, MD, Ph.D., Director of Newborn Services, Brigham and Women's Hospital, Harvard Medical School, Boston, MA
James Fackler, MD, Assistant Professor of Anesthesiology, Johns Hopkins University, School of Medicine
Marjorie L. Icenogle, Ph.D., Assistant Professor of Management, University of South Alabama
Steven Zimmerman, Ph.D., VP of Research, Biomedical Quality Control of America, Inc.
ABSTRACT
Current clinical monitoring-alarm systems provide caregivers with measures of patient vital sign data shown as: 1) waveforms on a screen; 2) raw values; and 3) alarm systems. These monitoring systems have excessive Type I errors, which means that the care provider spends time looking for a change in vital signs, when no change has occurred. Most medical research has focused on reducing Type I errors, while researchers have virtually ignored controlling Type II errors, which means not looking for a change in vital signs when a change has in fact occurred. The hypothesis of this study is that:
The use of process control charts balances Type I and Type II errors and will thus reduce the time caregivers spend looking for a change when no change has occurred, while at the same time reducing the number of incidents of not looking for a change when change has occurred.
This study demonstrates that the vital signs oxygen saturation and heart rate behave in a similar manner for two diverse patient groups relative to the application of control charts. According to these data, the number of caregiver examinations due to oxygen saturation alarms could be reduced by 86 to 97 percent, while the number of examinations due to heart rate alarms could be reduced by 77 to 93 percent. The study also demonstrates how the control chart reset rule selected affects the percentage reduction obtained.
INTRODUCTION
Current practice in clinical monitoring is to provide caregivers with patient vital sign data shown as: 1) waveforms on the monitor display; 2) raw values; and 3) alarm systems, based on specification limits established by the available medical judgment and set to warn care providers of changes in vital signs. Some monitoring devices offer limited statistical procedures, such as trend calculations; however, most monitors have no procedures or provisions for capturing data to provide evidence that can be used as the basis for clinical decisions. All monitors are provided with default values for alarms, and many monitors allow caregivers to adjust the alarm values based on their best medical judgment. Except when alarms sound, vital sign data are most often recorded only once or twice per hour by the caregiver. Based on these limited data, caregivers decide which treatment is appropriate. Statistical evidence suggests that one or two measures of a patient's heartbeat per hour are not a statistically adequate sample for decision making, especially since more reliable procedures based on statistical process control are available.
Current clinical practice can be compared to industrial procedures used in Japan prior to Dr. Deming's quality revolution. In the authors' opinion, the only reason the rule of reacting to one point outside a judgment-based specification limit works in medicine is the extra effort put forth by highly skilled caregivers.
WHY?
Current clinical practice is to search for a cause when one measurement is beyond the specification limits. Industrial practice uses statistical process control (SPC) charts to more accurately determine if a change in the process has occurred. Clinical practice is lagging behind industrial practice for a number of possible reasons, such as:
Traditionally, physicians receive limited statistical training and no training in statistical quality control because there is limited time in the current medical curriculum for the addition of any new "untried-unproven" material. Some medical schools are trying to address this problem by integrating new technology into the traditional medical courses.
One of the greatest obstacles to applying industrial quality practices to healthcare is the legal system, which pressures caregivers to follow accepted practice to avoid litigation. In addition, the FDA protects the public by limiting the use of new untested technology. It is impossible to establish the benefits of SPC in a health care setting without testing them. When SPC is utilized, patient treatments are likely to change, and the user is consequently changing the practice of medicine. Therefore, the research for this study was limited to the analysis of data using SPC in the background of the traditional care provided.
SPC is a process control procedure that identifies the natural limits of a process and informs the user of the probability of a change in the process. Using the natural limits of a process, the user can determine the appropriate control limits above and below the average (the mean of the measurements of the vital sign) that will limit the probability that a specific observation exceeds the control limits. The control limits determine the range within which acceptable data points will lie. A data point that exceeds a control limit is an outlier. Current medical practice requires the caregiver to search for a cause or apply a treatment when a single data point exceeds a limit. SPC improves caregivers' decisions because the control limits can be recalculated and reset when a series of measurements exceeds a control limit.
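The idea of control limits placed above and below the mean can be sketched in a few lines of Python. This is a minimal illustration, not the BQC system's implementation: the function names, the choice of the sample standard deviation as the spread estimate, the multiplier k = 3, and the heart-rate readings are all assumptions made for the example.

```python
from statistics import mean, stdev

def control_limits(baseline, k=3.0):
    """Return (lower, center, upper) limits k standard deviations around the mean."""
    center = mean(baseline)
    sigma = stdev(baseline)  # sample standard deviation (an assumed spread estimate)
    return center - k * sigma, center, center + k * sigma

def outliers(values, lower, upper):
    """Return the indices of values that fall outside the control limits."""
    return [i for i, v in enumerate(values) if v < lower or v > upper]

# Hypothetical heart-rate readings in beats per minute, for illustration only.
baseline = [120, 122, 119, 121, 123, 120, 118, 122, 121, 120]
lower, center, upper = control_limits(baseline)

new_readings = [121, 119, 135, 120]
print(outliers(new_readings, lower, upper))  # indices of points beyond the limits
```

In this sketch the limits come from the patient's own recent data rather than from a judgment-based specification limit, which is the distinction the paragraph above draws.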
The real-time Biomedical Quality Computer (BQC) system counts the number of outliers to determine when the control limits should be reset (recalculated). Recalculating control limits adjusts the range within which all acceptable data points will fall and will affect caregiver decisions with respect to the most appropriate treatment to apply. Using the BQC system, the caregiver can select the reset rule, which sets the number of outliers required before the control limits are reset. Changing the rule used to recalculate the control limits changes the number of times the control limits are reset. As the number of observations required beyond the control limits increases, Type I error decreases, but the possibility of Type II errors increases.
TELL ME A STORY
The current clinical paradigm is that vital sign movement within the specification limits has no meaning. The magic of SPC is that it helps the caregiver read the story the vital signs are writing in real time. The caregivers often ask the quality control engineer the meaning of the movement within the specification limits. SPC shows the caregiver when vital signs change, but only the caregiver (nurse, physician, …) can determine why the change occurred. Figure 1 illustrates an Xbar and Sigma process control chart for selected vital signs.
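An Xbar and Sigma chart plots, for each subgroup of consecutive observations, the subgroup mean and the subgroup standard deviation. A minimal sketch of that bookkeeping follows; the function name, the decision to drop an incomplete trailing subgroup, and the SaO2 values are assumptions made for illustration.

```python
from statistics import mean, stdev

def subgroup_stats(observations, size):
    """Split observations into consecutive subgroups of `size` points and
    return a list of (xbar, sigma) pairs, one per complete subgroup."""
    stats = []
    for start in range(0, len(observations) - size + 1, size):
        group = observations[start:start + size]
        stats.append((mean(group), stdev(group)))
    return stats

# Hypothetical SaO2 readings (percent), for illustration only.
readings = [98, 97, 98, 99, 98, 97, 96, 97, 98, 97, 99]
for xbar, sigma in subgroup_stats(readings, size=5):
    print(round(xbar, 2), round(sigma, 2))
```

Each printed pair corresponds to one plotted point on the Xbar chart and one on the Sigma chart.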
DESCRIPTIONS OF STUDY #1 AND STUDY #2
This analysis includes the results of two studies that were performed at two different hospitals under the supervision of different physicians who did not communicate with each other in any way. The patient group in Study #1 included children ages 2 through 15. The patient group in Study #2 included premature newborns. In Study #1, a SpaceLab® clinical monitor collected data at the rate of one observation every five seconds, while in Study #2 a Nellcor 200® monitor collected a data point for each heartbeat. The objective of Study #1 was to compare the number of traditional monitor alarms to the number of alarms that would result using SPC. The objective of Study #2 was to compare the effect of all caregiver actions on the patient (including alarms) to the statistical story written by the data provided via SPC. The common aspect of the two studies was that both counted the number of monitor alarms for SaO2 and heart rate data, and both compared the number of traditional monitor alarms to the number of alarms that would sound based on the SPC data.
Figure 1 Xbar and Sigma Process Control Chart
TRUE ALARMS
In the preliminary experimental design, the researchers wanted to investigate whether caregivers could identify whether an alarm was false or true, and clinically significant or not.
There was no way to execute this experiment because the caregivers may have to react to an alarm three, four, or more times before the reason for the condition (alarm) is understood. The experiment that attempted to identify the meaning of alarms as they occurred was a total failure. In the studies described in this paper, alarms were treated as a homogeneous group.
RUNS BEYOND 3 SIGMA
Standard clinical practice is to react to one point beyond a control limit. In Study #1 data were collected at the rate of one observation every five seconds, while in Study #2 data were collected at the rate of one observation per heartbeat. In both studies the data were autocorrelated (each observation is correlated with the previous observation), and the SaO2 data are truncated to the nearest whole value, making the data more strongly autocorrelated.
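The amount of autocorrelation in a series can be estimated with the lag-1 autocorrelation coefficient. The sketch below uses the common sample estimator; the source does not state which estimator the researchers used, so the formula, the function name, and the example series are assumptions.

```python
from statistics import mean

def lag1_autocorrelation(values):
    """Estimate the lag-1 autocorrelation coefficient of a series."""
    m = mean(values)
    denom = sum((v - m) ** 2 for v in values)
    if denom == 0:
        return 0.0  # a constant series has no variation to correlate
    numer = sum((values[i] - m) * (values[i - 1] - m)
                for i in range(1, len(values)))
    return numer / denom

# A slowly drifting series is positively autocorrelated.
print(lag1_autocorrelation([1, 2, 3, 4, 5, 6, 7, 8]))
# A strictly alternating series is negatively autocorrelated.
print(lag1_autocorrelation([1, -1, 1, -1, 1, -1, 1, -1]))
```

A value near zero would suggest nearly independent observations; values near one indicate that each observation mostly repeats the previous one, as can happen when readings are reported as whole numbers.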
In addition, there are a large number of self-correcting changes in biomedical data. The standard Shewhart process control chart used in industry was not designed to handle self-correcting systems. The researchers adjusted the control limits for autocorrelation by calculating the amount of autocorrelation and then adjusting the control limits. The reset rule was also adjusted by requiring a specific number of adjacent data points beyond the control limits before the control limits are adjusted, rather than relying on a single value to signal an adjustment in control limits. The number of data points that comprise "a run" is under the control of the caregiver. The objective of using a run, rather than a single data point, is to identify a sustained change in a vital sign. A run that indicates a sustained change prior to reset is indicated in Figure 1.
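The run-based reset rule described above can be sketched as follows. This is an illustrative reading, not the BQC algorithm: the function name is hypothetical, sigma is held fixed, and on reset the center is moved to the mean of the points in the run, a detail the source does not specify. The values are invented for the example.

```python
def monitor_with_reset(values, center, sigma, run_length, k=3.0):
    """Return the indices at which the control limits are reset.

    A reset happens only after `run_length` adjacent points fall beyond
    center +/- k*sigma. Assumption: on reset the center moves to the mean
    of the run and sigma is left unchanged.
    """
    resets = []
    run = []  # adjacent values currently beyond the limits
    for i, v in enumerate(values):
        if abs(v - center) > k * sigma:
            run.append(v)
            if len(run) >= run_length:
                center = sum(run) / len(run)
                resets.append(i)
                run = []
        else:
            run = []  # a self-correcting excursion ends the run without a reset
    return resets

# Hypothetical series: one brief excursion, then a sustained shift.
values = [100, 100, 110, 100, 110, 110, 110, 110, 110]
print(monitor_with_reset(values, center=100, sigma=2, run_length=1))
print(monitor_with_reset(values, center=100, sigma=2, run_length=3))
```

With a run length of 1, every excursion and every return triggers a reset; with a run length of 3, the brief excursion is ignored and only the sustained shift produces a reset.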
Study #1 used a SpaceLab® monitor, which sampled vital sign data at the rate of one observation every five seconds. In Study #1, with a subgroup size of 5, a run of 10 subgroups in a row required approximately 4.17 minutes (10 subgroups x 5 observations per subgroup x 5 seconds per observation = 250 seconds; 250 seconds / 60 seconds per minute = 4.167 minutes), while a run of 22 subgroups in a row required approximately 9.17 minutes (550 seconds).
Study #2 used a Nellcor 200® monitor that generated an observation for every heartbeat for premature newborns, who may be expected to have heart rates of 160 beats per minute or more. With a subgroup size of 20 and a heart rate of 160 beats per minute, a run of 10 subgroups in a row takes approximately 1.25 minutes and a run of 22 subgroups in a row takes approximately 2.75 minutes. Again, the decision on subgroup size and number of subgroups comprising a run is under the control of the caregiver. The caregiver may select the run length by estimating the amount of time desired. When using a clinical monitor with a fixed sampling rate, the relationship between subgroup size, run length, and time is fixed. When using a heartbeat monitor, the sample rate is a function of the patient's heart rate.
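The relationship between subgroup size, run length, and elapsed time is simple arithmetic and can be checked with a short Python sketch. The function names are hypothetical; the inputs are the subgroup sizes, run lengths, and rates quoted in the text.

```python
def run_minutes_fixed_rate(subgroup_size, run_subgroups, seconds_per_observation):
    """Minutes needed to fill a run on a monitor with a fixed sampling interval."""
    observations = subgroup_size * run_subgroups
    return observations * seconds_per_observation / 60.0

def run_minutes_per_beat(subgroup_size, run_subgroups, beats_per_minute):
    """Minutes needed to fill a run on a monitor that samples once per heartbeat."""
    observations = subgroup_size * run_subgroups
    return observations / beats_per_minute

# Study #1: subgroup size 5, one observation every 5 seconds, run of 10 subgroups.
print(round(run_minutes_fixed_rate(5, 10, 5), 2))    # 4.17
# Study #2: subgroup size 20, 160 beats per minute, runs of 10 and 22 subgroups.
print(round(run_minutes_per_beat(20, 10, 160), 2))   # 1.25
print(round(run_minutes_per_beat(20, 22, 160), 2))   # 2.75
```

Because the second function depends on the heart rate, the same run length takes longer to fill for a patient with a slower heart rate.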
In order to control the decision-making time allowed in each study, the researchers selected a subgroup size of 5 for Study #1 to speed up decision making and a subgroup size of 20 for Study #2 to slow it down. Because of the difference between monitors, we could not match the decision-making time in the two studies. In addition, because the SaO2 data were very close to 100 percent in these patients, our control chart program automatically changed to a run length of 22, even when the caregiver had selected a shorter run.
BALANCE TYPE I AND TYPE II ERRORS
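All decision-making procedures are subject to errors. The two types of errors are: 1) Type I errors, which means looking for a change in vital signs when no change has occurred; and 2) Type II errors, which means not looking for a change in vital signs when a change has in fact occurred.
For a given decision procedure and amount of data, a reduction in the probability of a Type I error means an increase in the probability of a Type II error. We have experimented with the theory of runs (the number of adjacent points required before the control limits are reset) using one point beyond the 3 sigma limits (three standard deviations), runs beyond the 3 sigma limits, and runs beyond the 6 sigma limits. Type I and Type II errors change according to the decision-making rules employed.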
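The trade-off between the two error types can be illustrated numerically. The sketch below assumes normally distributed, independent observations, which is a simplification, since the vital sign data in these studies were autocorrelated; the function names and the example shift of 4 sigma are also assumptions. It computes the probability that a run of adjacent points all fall beyond the k sigma limits, both when nothing has changed (a false signal, Type I) and when the mean has shifted (where failing to signal is a Type II error).

```python
import math

def normal_cdf(x):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * math.erfc(-x / math.sqrt(2))

def prob_beyond(k, shift=0.0):
    """P(one point falls beyond +/- k sigma) when the mean has moved by
    `shift` sigmas from the chart's center line."""
    return (1.0 - normal_cdf(k - shift)) + normal_cdf(-k - shift)

def run_signal_prob(k, shift, run_length):
    """P(run_length adjacent points all fall beyond the limits),
    assuming independent observations."""
    return prob_beyond(k, shift) ** run_length

for run_length in (1, 3):
    type_1 = run_signal_prob(3, 0.0, run_length)        # signal with no change
    type_2 = 1.0 - run_signal_prob(3, 4.0, run_length)  # no signal despite a 4 sigma shift
    print(run_length, type_1, type_2)
```

Under these assumptions, lengthening the run drives the false-signal probability down sharply while raising the chance that a real shift goes unsignalled within the same window, which is the balance the reset rule is meant to control.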
ALARMS VERSUS CONTROL CHARTS
An alarm system with a fixed set of decision rules may be compared to a given SPC system with a selected set of control limits and reset rules. This study compares two alarm systems as used at Children's Hospital in Boston (1995-Study#1) and the Brigham & Women's Hospital in Boston (1997-Study#2). One difference was that at Children's Hospital the alarm system was based on specification limits controlled by the caregiver for individual patients, while at Brigham and Women's a global limit for all patients was used.
Figure 2 shows the reduction in the number of alarms when using SPC rather than relying on the rule of one observation beyond the limit. In all cases a reduction of at least 77 percent was obtained. When using the SPC rule that required 22 points beyond the control limit before the computer would recalculate and reset the control limits, resets were reduced by at least 92 percent. Using SPC reduced the number of alarms associated with SaO2 more than the number of alarms associated with heart rate for both decision rules. The results show that increasing the number of points required in a run beyond the control limits reduces the number of false alarms.
Figure 2 Reduction in Indicators
Despite the differences in the studies (completely different patient groups and the use of different decision-making rules), there is a similar result for both locations. Using control charts in the background indicates that control charts will reduce the number of times the caregiver should look at the patient. In addition, when the caregiver does have to examine the patient, the computer will provide a history of vital sign behavior on the screen, reducing the effort required to obtain additional evidence of the patient's vital signs.
SUMMARY
The clinical monitor produces waveforms and raw digital data, which require caregivers to analyze the data. Setting specification limits is a matter of the caregiver's judgment. When the manual alarm process is compared to using process control charts, the manual system produces excessive false alarms and does not accomplish the objective of quality patient management and care.
The underlying design of the Children's Hospital study was based on alarms. It was assumed that the caregiver could tell if an alarm was false, true, clinically significant or not. Assigning the cause to the alarms proved to be an impossible task for the caregivers in the study. The researchers believe that the effort to identify the nature of each alarm resulted in data inaccuracy and mistaken recordings of data.
The SPC analysis shows that the vital signs oxygen saturation and heart rate behave in a similar manner for two diverse patient groups relative to the application of control charts. The decision rules are intended to find a balance between Type I and Type II errors. In these studies, SPC reduced the number of examinations due to alarms for each vital sign by between 77 and 97 percent.