Autocorrelation in clinical data: skipping data to reduce autocorrelation

Steven M. Zimmerman, Ph.D., Professor of Quality and Systems Management, University of South Alabama
Steven Ringer, M.D. Director of New Born Services, Harvard Medical School, Brigham and Women's Hospital
Randy Tucker, Ph.D., Independent Consultant
Cathy Ncube, Ph.D., Assistant Professor Computer Science, University of West Florida
Matoteng Ncube, Ph.D., Associate Professor of Statistics, University of West Florida

Abstract

Autocorrelation occurs when the value of a measurement is a function of the parameter's prior value. Clinical data often have high levels of autocorrelation. To analyze clinical data, one must first be able to measure autocorrelation and, in some situations, reduce it by adjusting data sampling and collection schemes. Standard statistical process control (SPC) charts require low levels of autocorrelation. One proposed method of reducing the effective autocorrelation in data is to spread out data sampling. Our hypothesis is that autocorrelation may be reduced by skipping data. We show that skipping data reduces autocorrelation in simulated, oxygen saturation, and heart rate data. The paper also examines the problem of measuring autocorrelation.

Introduction

Single-period serial correlation, or autocorrelation, is defined as the relationship between a measurement and its prior value. The hypothesis of our study was that by controlling the time between observations, the autocorrelation in a set of measurements could be reduced. Three types of data were tested: 1) normally distributed random values; 2) oxygen saturation data collected in subgroups of 20 from short-term newborns; and 3) heart rate data collected in subgroups of 20 from short-term newborns. The subgroup structure of 20 data points was used because this was the standard method of collecting data from short-term newborns. Because newborns have high heart rates relative to adults, the subgroup size of 20 was required to control the elapsed time of our control chart on a standard computer screen. The data files were collected by a real-time statistical process control (SPC) program that collected, analyzed, and graphed control charts for vital sign data.

Theory tells us that SPC requires statistically independent data to work without adjustments. The program we are using has built-in procedures for adjusting control limits as a function of the amount of autocorrelation in the data. This paper was the result of our effort to better understand autocorrelation in vital sign data and to determine whether we should change the autocorrelation adjustment in the software or leave it alone. Figure 1 illustrates a control chart for oxygen saturation data using the Biomedical Medical Quality Control program. At the top of the screen is the process control chart, below that is a sigma (standard deviation) process control chart, and at the bottom of the figure are charts of two different subgroup-to-subgroup methods for calculating the autocorrelation used to adjust the control limits.

Figure 1 SPC-BQC chart for short-term newborn oxygen saturation

Examination of the control chart illustrated in Figure 1 shows that there were four resets (recalculations of the control limit statistics) during the 32-minute period displayed. A reset is indicated by a yellow background. In the black-and-white illustration (Figure 1), the darker background represents yellow. The sigma (standard deviation) process control chart indicated that the variation started high, dropped, and then returned to a high level. Both autocorrelation charts indicate the same result. We were concerned with the methods for calculating autocorrelation because the program performs all calculations and graphing in real time, and there is no standard (accepted) procedure for performing such calculations in a real-time environment. Figure 2 illustrates an Excel spreadsheet with the oxygen saturation data displayed. The data are rounded to the nearest whole number. Current clinical monitors provide whole numbers for oxygen saturation and heart rate. The data are recorded in 20 columns, as illustrated in Figure 2.

Figure 2 BQC-oxygen saturation data

Accurate calculations of autocorrelation require precise numbers. The nearer the autocorrelation is to one, the more the calculation procedures are limited by the lack of data precision. All our efforts to obtain more precise numbers have failed. We must work with what is available.
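The effect of whole-number rounding on a measured autocorrelation can be explored with a small sketch. The example below is an illustration only, not the procedure used in the study: it assumes a first-order autoregressive (AR(1)) series as a stand-in for a highly autocorrelated vital sign, and the function names, the mean of 95, the standard deviation of 1, and the seed are all hypothetical choices.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lag1_autocorrelation(series):
    """Correlation between each value and the value just before it."""
    return pearson_r(series[:-1], series[1:])


def simulate_ar1(n, phi, mean, sd, seed):
    """Hypothetical AR(1) series (an assumption, not the paper's generator)."""
    rng = random.Random(seed)
    # Innovation spread chosen so the series keeps a long-run spread of `sd`.
    innovation_sd = sd * math.sqrt(1.0 - phi ** 2)
    value = mean + rng.gauss(0.0, sd)
    series = []
    for _ in range(n):
        value = mean + phi * (value - mean) + rng.gauss(0.0, innovation_sd)
        series.append(value)
    return series


raw = simulate_ar1(n=5000, phi=0.98, mean=95.0, sd=1.0, seed=1)
rounded = [float(round(v)) for v in raw]  # what a whole-number monitor would report

r_raw = lag1_autocorrelation(raw)
r_rounded = lag1_autocorrelation(rounded)
print(f"full precision: {r_raw:.3f}")
print(f"whole numbers:  {r_rounded:.3f}")
```

Comparing the two printed values gives a feel for how much of the measured autocorrelation depends on data precision when the spread of the readings is close to the rounding unit.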

Spreadsheet Correlation Calculations

Our hypothesis was that we could reduce the effect of autocorrelation by skipping time (data) between observations. To test the hypothesis, we measured the correlation between the data in column B and column C, then column B and column D, and so on through column B and column U. If autocorrelation were reduced by skipping data, the autocorrelation should go down as the number of columns skipped increased. Excel, like all modern Windows spreadsheet programs, has built-in calculation procedures for determining correlation. We chose not to use the built-in formula because it required that we continuously reorganize the spreadsheet, and because we wanted to know and understand exactly how all calculations were being performed. The correlation formula used was:

$$r = \frac{\sum_{i} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i} (x_i - \bar{x})^2 \, \sum_{i} (y_i - \bar{y})^2}}$$

where $r$ is the correlation coefficient, $x_i$ is the value in column B, $\bar{x}$ is the average of column B, $y_i$ is the value in the nth column, and $\bar{y}$ is the average of the nth column.

One of the spreadsheets was selected as our example. There are three sheets in each spreadsheet. Sheet-1 is for normally distributed random numbers, sheet-2 is for oxygen saturation (1a13a17.bqc), and sheet-3 is for heart rate (1b13b17.bqc). The data files 1a13a17.bqc (oxygen) and 1b13b17.bqc (heart rate) were collected from short-term newborns on August 1, 1997, at 1:17 PM (1317 military time). One hundred data points were generated for the random number analysis, and the first 100 subgroups were selected from the data files. Autocorrelation in all three data types was reduced as we skipped data. A total of seven data files of each type were examined (20 points in each subgroup, 100 subgroups in each data set). The oxygen saturation and heart rate data were collected over a four-year period. On average (when 18 data points were skipped), the autocorrelation in the simulation data was reduced by 9 percent, the autocorrelation in the oxygen saturation data was reduced by 20 percent, and the autocorrelation in the heart rate data was reduced by 30 percent.
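The column-by-column comparison described here can be sketched outside a spreadsheet. The following is a minimal illustration in Python, assuming the standard Pearson correlation coefficient, which is what the symbol definitions above appear to describe. The function names and the tiny data table are made up for the example and are not taken from the study's files.

```python
import math


def pearson_r(x, y):
    """Pearson r; returns None when either column has zero variance."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # a spreadsheet would report a division-by-zero error here
    return sxy / math.sqrt(sxx * syy)


def correlation_by_skip(subgroups):
    """Correlate the first column with every later column.

    subgroups: list of rows, one row per subgroup, in time order within a row.
    Returns {points_skipped: r}; 0 skipped means adjacent readings.
    """
    first = [row[0] for row in subgroups]
    result = {}
    for k in range(1, len(subgroups[0])):
        column = [row[k] for row in subgroups]
        result[k - 1] = pearson_r(first, column)
    return result


# Made-up whole-number readings (4 subgroups of 5), for illustration only.
example = [
    [95, 95, 96, 96, 97],
    [92, 93, 93, 92, 94],
    [97, 97, 96, 95, 95],
    [90, 91, 91, 93, 92],
]
for skipped, r in correlation_by_skip(example).items():
    print(skipped, None if r is None else round(r, 3))
```

With 20 columns per subgroup, the last entry corresponds to 18 skipped points, matching the column B versus column U comparison. Returning `None` for a constant column is one way to handle the case where rounded data leave no variation to correlate.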

Table 1 Reduction in autocorrelation

The reasons for the differences in autocorrelation reduction among the three types of data are subject to speculation. Figure 3 illustrates a graph of the reduction in autocorrelation in simulated data as the number of data points skipped is increased. Because we are working with random numbers, the actual reduction was a random variable. Starting with an autocorrelation of 98 percent and skipping 18 columns, we obtained autocorrelation levels around 90 percent. Because we rounded our data to the nearest whole number, we were unable to simulate autocorrelation levels of 99 or 100 percent. From other work, we note that if the data are 100 percent correlated, no sampling scheme will change the correlation.
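A simulation in the same spirit can be sketched as follows. The source does not say how its autocorrelated random numbers were generated, so this example assumes a first-order autoregressive (AR(1)) model; its numbers should not be expected to reproduce the figures reported here. The parameter values (`phi`, `mean`, `sd`, `seed`) and function names are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_subgroups(n_subgroups, size, phi, mean, sd, seed):
    """Assumed AR(1) stream, rounded to whole numbers and cut into rows."""
    rng = random.Random(seed)
    innovation_sd = sd * math.sqrt(1.0 - phi ** 2)
    value = mean + rng.gauss(0.0, sd)
    rows = []
    for _ in range(n_subgroups):
        row = []
        for _ in range(size):
            value = mean + phi * (value - mean) + rng.gauss(0.0, innovation_sd)
            row.append(round(value))
        rows.append(row)
    return rows


phi = 0.98
rows = simulate_subgroups(n_subgroups=100, size=20, phi=phi,
                          mean=100.0, sd=10.0, seed=7)

# Correlate the first column with each later column (0 to 18 points skipped).
first = [row[0] for row in rows]
r_by_skip = {}
for k in range(1, 20):
    r_by_skip[k - 1] = pearson_r(first, [row[k] for row in rows])

for skipped in (0, 9, 18):
    theory = phi ** (skipped + 1)  # AR(1) correlation at this spacing
    print(f"skipped={skipped:2d}  sample r={r_by_skip[skipped]:.3f}  "
          f"AR(1) theory={theory:.3f}")
```

Under the AR(1) assumption the correlation between points k steps apart is `phi ** k`, so a series with `phi` equal to 1 would show no reduction at any spacing, which is consistent with the remark about 100 percent correlated data.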

Figure 3 Sampling reduction of autocorrelation

Oxygen Saturation

Figure 4 illustrates a graph of our oxygen saturation results. As the number of data points skipped increases, the autocorrelation goes down. Because 98 percent was the highest autocorrelation the spreadsheet would accept (without division-by-zero errors), we compared random numbers starting at 98 percent with the oxygen saturation results. Figure 4 is typical of the results we obtained.

Figure 4 Oxygen saturation and random numbers

Heart Rate

The reduction of heart rate autocorrelation was greater than the reduction for both oxygen saturation (with the exception of one sample) and simulated numbers.

Figure 5 Heart rate versus random numbers

Speculation

We do not know why the simulation data had a 9 percent reduction, the oxygen saturation data had a 20 percent reduction, and the heart rate data had a 30 percent reduction. We know that real data, both oxygen saturation and heart rate, include changes in addition to autocorrelation. Simulation includes only the autocorrelation changes. The 9 percent reduction in the simulation data could be taken as a minimum, when no other aspect of the data is changing. The patterns for oxygen saturation and heart rate look different. Our experience tells us that the oxygen saturation data are more autocorrelated than the heart rate data. Using the traditional calculation method (the formula used in this research), we do not get the expected results.

Figure 6 Oxygen saturation versus heart rate

Using a subgroup-by-subgroup autocorrelation calculation procedure, we consistently obtain results that reflect the data patterns shown. Figure 1 illustrated the control charts and the results of two subgroup-by-subgroup autocorrelation calculation methods for oxygen saturation measurements (0.908 for method one and 0.604 for method two) from baby data file 1a9a46. Figure 7 illustrates the control charts and the two subgroup-by-subgroup autocorrelation calculations (0.863 for method one and 0.727 for method two) for heart rate from the same baby (1b9b46). In our opinion, the subgroup-by-subgroup autocorrelation calculations using method one reflect the data pattern better than the traditional correlation formula does. Because of the data layout, the subgroup-to-subgroup calculation method could not be used for measuring the reduction in autocorrelation.

Figure 7 Heart rate autocorrelation

Conclusions

Skipping data reduces autocorrelation. The amount of reduction is a function of the data. Oxygen saturation autocorrelation is reduced faster than that of simulated data. Heart rate autocorrelation is reduced faster than that of oxygen saturation data. The traditional correlation formula illustrates these results. There remains a question of how autocorrelation should be calculated in subgroup data. The traditional formula does not reflect data behavior. The subgroup-by-subgroup calculation results agree more closely with the visual results.


Dept-Z@BioMedQC.com