The Relationship between Spread and Skill in an Ensemble Forecast
Barbara E. Prater
Central Michigan University


Abstract

Past studies have shown that ensemble spread and forecast skill are correlated; however, more recent evidence indicates that this is not always true. This study investigates whether such a correlation exists in a simple model of convection. Results show that under certain conditions the relationship is inconclusive, so that ensemble spread cannot reliably be used to predict forecast skill.

1. INTRODUCTION

Although computer models have made vast improvements in forecast ability over the past decades, numerical weather prediction is hampered by errors that will never be fully overcome. First, it is impossible to perfectly sample the conditions of the atmosphere used to initialize a model. Second, the models themselves are far from perfect in their parameterizations. Because of the nature of the nonlinear equations that govern atmospheric flow, a small error in the initial conditions can be amplified very quickly as the model is integrated forward.

Ensemble forecasting is a method for improving the overall utility of model forecasts that explicitly recognizes both initial condition and model errors. It involves generating an ensemble of reasonable perturbations around the observed initial conditions and making a forecast from each ensemble member. Among these ensemble forecasts, there is a greater probability of generating a reliable forecast than with a single deterministic forecast (Stensrud and Fritsch 1994). The mean of an ensemble of forecasts produced by random perturbations around the initial value is more skillful than any individual forecast when averaged over a long time period (Leith 1974). Past studies also indicate that the spread of the ensemble members may be an indication of the skill of the forecast (Palmer et al. 1990). If this is true, then ensemble forecasts would be of significant benefit to operational forecasting by providing forecasters with a quantitative tool for predicting forecast skill. However, studies examining ensemble precipitation forecasts have shown that this relationship between ensemble spread and forecast skill does not always hold (Hamill and Colucci 1997, 1998; Stensrud et al. 1999). Therefore, in order to investigate this apparent contradiction, a very simple model of convection is used here to study the relationship between spread and skill.

The Lorenz (1963) equations model convection in a fluid using three variables (X, Y, and Z) that behave chaotically for certain ranges of the control parameters. The variables exhibit "sensitive dependence" on their initial conditions: a slight variation in the initial conditions can change the outcome by a large degree (see Gleick 1987; Lorenz 1993). The fluid flow seems to behave randomly, but it is in fact produced by a deterministic set of equations. When X and Y are plotted in two dimensions, the model trajectories circle around two stationary points, tracing out a strange attractor in the shape of a butterfly. Though their values seem random, X and Y continue to loop around those two stationary points indefinitely. Perturbing the initial conditions by even a small amount can shift the final position by entire loops around the attractor, leaving it a significant distance from the final value of the unperturbed initial conditions (Figure 1).





Figure 1. Plot of X and Y from the convective equations for an integration time of 100, representing part of a butterfly attractor. Note the cluster of random perturbations around the initial value near (-15, -15). The ensemble forecast values are scattered over nearly an entire loop around the attractor.


The convective equations outlined by Lorenz (1963) are nearly the simplest model of Rayleigh-Bénard convection that still represents the physics of the process. Given a layer of air, the bottom is heated and the top cooled. Initially, the heat is distributed by conduction and the temperature varies linearly with height. However, the temperature gradient eventually reaches a point where conduction alone does not distribute the heat quickly enough; the fluid in the layer is set in motion, forming convective cells with ascending and descending fluid. As the temperature gradient is increased further, the layer eventually reaches a state of turbulence. In this simple model, X represents the intensity of the circulation in the fluid, Y is proportional to the temperature difference between ascending and descending air, and Z represents the amplitude of the distortion of the temperature distribution from the vertical, linear temperature profile.

Section 2 describes the model used to integrate the convective equations as well as the statistics used to evaluate the forecasts produced. Section 3 evaluates the correlation between ensemble spread and forecast skill and examines the terms of the mean-squared error (MSE) equation. Section 4 evaluates these results with respect to previous research, and Section 5 contains concluding remarks.


2. METHODS

Lorenz (1963) parameterized convection with the following ordinary differential equations:


X' = -sX + sY
Y' = -XZ + rX - Y
Z' = XY - bZ

where a prime denotes the time derivative of a model variable, s = 10.0, b = 8.0/3.0, and r = 28.0. Perturbations in s, b, and r were used to represent errors in the model physics; for comparison, the model was also run without those perturbations as a "perfect" model. The equations were integrated using the method of Lorenz (1963) to find X, Y, and Z at time t+1. The equations were first integrated from known initial values, producing a "truth" forecast. Both model and initial condition errors were then introduced. Model error was incorporated by adding random perturbations, drawn from a random number generator with a specified range of variability, to the values of s, b, and r. Initial condition error was incorporated by calculating 100 initial perturbations in X, Y, and Z with a random number generator, producing a set of initial conditions normally distributed around the "true" initial conditions with a standard deviation of 0.5. This process was repeated for each of 1000 different integrations of the model (Figure 2). Each perturbed initial value was then integrated over the given number of time steps, generating 100 possible final values of X, Y, and Z. The number of time steps over which the model was integrated was varied over values of 10, 50, 100, 150, 200, 250, and 500.
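This procedure can be sketched in Python (a hypothetical reconstruction, not the original code: the paper follows Lorenz's own integration scheme, for which a standard fourth-order Runge-Kutta step is substituted here, and the time step dt = 0.01, the initial state, and all function names are assumed for illustration):

    import numpy as np

    def lorenz_rhs(state, s, r, b):
        # Time derivatives of the convective equations (Lorenz 1963).
        x, y, z = state
        return np.array([-s * x + s * y,
                         -x * z + r * x - y,
                         x * y - b * z])

    def integrate(state, s, r, b, nsteps, dt=0.01):
        # Advance the state with fourth-order Runge-Kutta for nsteps steps.
        for _ in range(nsteps):
            k1 = lorenz_rhs(state, s, r, b)
            k2 = lorenz_rhs(state + 0.5 * dt * k1, s, r, b)
            k3 = lorenz_rhs(state + 0.5 * dt * k2, s, r, b)
            k4 = lorenz_rhs(state + dt * k3, s, r, b)
            state = state + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        return state

    rng = np.random.default_rng()
    truth0 = np.array([-15.0, -15.0, 20.0])   # assumed "true" initial state
    truth = integrate(truth0, 10.0, 28.0, 8.0 / 3.0, nsteps=100)

    # 100-member ensemble: normal initial perturbations (std. dev. 0.5) plus
    # uniform model-error perturbations of s, r, and b within +/- 0.5.
    ics = truth0 + rng.normal(0.0, 0.5, size=(100, 3))
    members = np.empty((100, 3))
    for n in range(100):
        ds, dr, db = rng.uniform(-0.5, 0.5, size=3)
        members[n] = integrate(ics[n], 10.0 + ds, 28.0 + dr,
                               8.0 / 3.0 + db, nsteps=100)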






Figures 2a, 2b, and 2c. Three of the 1000 loops of the model at an integration time of 50.


The mean and spread of the final ensemble values were calculated, where the spread is the standard deviation of the ensemble members about the ensemble mean. The difference between the ensemble mean and the truth value was also calculated, giving a value for the skill of the ensemble. In addition, the ensemble member whose initial perturbation was closest to the true initial conditions was identified, and its value after integration through the model was saved as the "control" forecast.
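Continuing the sketch above (again hypothetical, with illustrative variable names), these statistics reduce to a few array operations:

    # Ensemble mean, spread (standard deviation about the mean), and skill
    # (difference between the ensemble mean and the truth forecast).
    ens_mean = members.mean(axis=0)
    spread = members.std(axis=0)
    skill = np.abs(ens_mean - truth)

    # Control forecast: the integrated member whose initial perturbation
    # lay closest to the true initial conditions.
    closest = np.argmin(np.linalg.norm(ics - truth0, axis=1))
    control = members[closest]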

Using the standard deviation from the mean as a measure of ensemble spread, and the calculated difference between the ensemble mean and the truth forecast as a measure of skill, spread was plotted against skill with a point for each of the 1000 forecasts as a test of the correlation between ensemble spread and forecast skill. Finally, the Spearman rank correlation between spread and skill was calculated. In a perfect spread-skill relationship, this value would equal 1.00, indicating that the best skill always corresponded to the lowest spread and the worst skill to the highest spread; two fields that are uncorrelated have a rank correlation of 0.
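With spread and skill collected over the 1000 integrations, the rank correlation is available directly from scipy (a sketch; spreads and skills are assumed to be length-1000 arrays for one model variable):

    from scipy.stats import spearmanr

    # rho = 1.00 would mean the ranking by spread reproduces the ranking
    # by skill exactly; rho = 0 indicates no monotone relationship.
    rho, pvalue = spearmanr(spreads, skills)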

In order to evaluate further the spread-skill relationship, the model forecasts were applied to a mean-squared error (MSE) equation:


MSE = mean MSE of ensemble + ensemble spread term + covariance term

where

MSE = (control - truth)^2
mean MSE of ensemble = (1/N) Σ(n=1,N) [ensemble(n) - truth]^2
ensemble spread term = -(1/N) Σ(n=1,N) [ensemble(n) - control]^2
covariance term = -(2/N) Σ(n=1,N) [ensemble(n) - control][control - truth].

The magnitude of each term was then evaluated in order to see which term or terms were most important in determining the MSE.
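Under the same assumptions as the sketches above, the identity and its terms can be checked numerically for a single variable (here Y):

    # Terms of the MSE identity for one variable.
    e = members[:, 1]            # ensemble final values of Y
    t, c = truth[1], control[1]  # truth and control forecasts of Y

    mse = (c - t) ** 2
    mean_mse = ((e - t) ** 2).mean()
    spread_term = -((e - c) ** 2).mean()
    cov_term = -2.0 * (c - t) * (e - c).mean()

    # The three terms sum to the MSE, up to rounding error.
    assert np.isclose(mse, mean_mse + spread_term + cov_term)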





Figure 3. Distribution of the distance of the closest ensemble member from the truth forecast at integration times of (a) 50, (b) 150, and (c) 250. Note the majority of points in the smallest grouping at an integration time of 50, the increasing number of points at larger distances at 150, and the frequency of poor forecasts at 250.


3. RESULTS

a. Plot of spread vs. skill

The relationship between spread and skill tends to worsen with increasing integration time, showing less correlation between low spread and high skill (or, conversely, high spread and low skill). This is verified by analyzing the distance of the closest ensemble member to the truth point. For every integration, the distance of the closest ensemble member to the truth forecast was saved; these values were then plotted on a histogram. The probability of all members giving a bad forecast, or having a large distance from the truth forecast, increased with increasing integration time (Figure 3).
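This diagnostic can be sketched as follows (hypothetical; runs is assumed to hold the (members, truth) pair from each of the 1000 integrations above):

    # Distance of the closest ensemble member from the truth forecast for
    # each integration, binned to form the histogram in Figure 3.
    closest_dists = [np.linalg.norm(m - t, axis=1).min() for m, t in runs]
    counts, edges = np.histogram(closest_dists, bins=20)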

Up to an integration time of 100, the points on the plot of spread vs. skill tend to pack together with low difference (high skill) up to a certain value of spread; beyond this value, the correlation between spread and skill is inconclusive, with large values of spread often accompanying high model skill (Figure 4). Table 1 shows the number of points packed below a spread and skill of 2.0, indicating that the frequency of forecasts with low spread and high skill decreases with increasing integration time.


Table 1. Number of corresponding spread and skill values less than 2.0 per 1000 iterations

KSTEP    X       Y       Z
10       1000    1000    1000
50       637     529     530
100      379     177     141
150      191     66      18





Figure 4. Plot of spread (standard deviation) vs. skill (absolute difference between the ensemble mean and the truth forecast) for Y at an integration time of 50. Note the packing of points below a spread value of 2.0.


b. Rank correlation of spread and skill

In contrast to the appearance of the plot of spread vs. skill, the rank correlation values tend to reach a peak at an integration time of 150, with comparably good values from 50 to 200. This is also true of the perfect model forecasts (Table 2). Increasing the model error, by increasing the range of random perturbations around s, b, and r, worsens the correlation. At an integration time of 100, the range of error for all three constants was increased incrementally to ± 1.0. Table 3 shows the steady decrease in rank correlation values with each increase in the range of model error. However, increasing the error of r alone, even to values of ± 1.5, did not affect the correlation with any consistency; the increase in error of r affected the rank correlation values of X and Y by no more than ± 0.03, but affected Z by up to ± 0.10, in runs between time steps of 50 and 200.

The following test was run with an integration time of 100, with a value of 0.5 added to the initialization values of X, Y, and Z to represent a bias in our ability to accurately determine the initial state. First, the values of s, b, and r were given constant error perturbations of 0.25 for one run and -0.25 for another, producing two tests each representing a single model. The rank correlation values were worse with a constant error of -0.25 than with the ensemble containing 100 different representations of model error, and significantly worse with a constant error of 0.25. Next, to represent two models, the values of s, b, and r were assigned two different constant perturbations within each ensemble, one for the first 50 members and another for the second 50 members. Table 4 shows that the runs with two smaller constant perturbations improve on those with one constant perturbation, but the runs that include larger perturbations do not. The best run among those with model bias is the one with random perturbations around s, b, and r, representing 100 different models. It is evident from the rank correlation values that the spread-skill correlation weakens as fewer models are represented in the ensemble; using more models increases the chance that one of the final values will come close to the truth forecast.
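A sketch of the two-model configuration, as a variation on the ensemble loop above (hypothetical; only the initialization and parameter-perturbation lines change):

    # Constant bias of 0.5 added to the true initial state, and two fixed
    # model-error perturbations, one for each half of the ensemble.
    ics = truth0 + 0.5 + rng.normal(0.0, 0.5, size=(100, 3))
    for n in range(100):
        dp = -0.1 if n < 50 else -0.3   # e.g., the "-0.1 and -0.3" run
        members[n] = integrate(ics[n], 10.0 + dp, 28.0 + dp,
                               8.0 / 3.0 + dp, nsteps=100)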



Table 2. Rank correlation values of Y, with those of the perfect model (in parentheses) for comparison.

KSTEP    Rank correlation
10       0.20 (0.26)
50       0.71 (0.77)
100      0.69 (0.78)
150      0.78 (0.83)
200      0.69 (0.82)
250      0.62 (0.67)
500      0.35 (0.57)



Table 3. Rank correlation for Y with increasing range of model error at a time step of 100.

Model error    Rank correlation
± 0.5          0.69
± 0.6          0.67
± 0.7          0.65
± 0.8          0.60
± 0.9          0.56
± 1.0          0.53



Table 4. Rank correlation values for Y at a time step of 100 after varying the number of models and the error in those models, including a truth bias of 0.5. The value in parentheses is the rank correlation from a standard run, included for reference.

Model error      Rank correlation
Random           0.57 (0.69)
-0.25            0.51
-0.1 and -0.3    0.53
-0.2 and -0.4    0.37




c. MSE equation

If ensemble spread were the best indicator of forecast skill, this would be reflected in the terms of the MSE equation: the ensemble spread term would dominate the other terms. However, this is not always true, as is seen by separating the MSE equation into its components. The first two terms of the equation, the mean MSE of the ensemble and the ensemble spread term, tend to be the largest. They are also of opposite sign, effectively canceling each other out. Thus, the MSE remains fairly small and it is difficult to determine which term is dominant. Note in Figure 5 the spikes of both the MSE and covariance terms (scaled by the ensemble spread term) toward very small values, indicating that the ensemble spread is much larger than either term. The scaled mean MSE term tends to remain between 10^0 and 10^-1, indicating that the ensemble spread and the mean MSE of the ensemble are of similar magnitude in most cases.





Figure 5. Graph of the values of the MSE (A), mean MSE (B), and covariance (C) terms, scaled by the ensemble spread term, for the first 50 loops through the model. Note that lines A and C often spike down to values of low magnitude, while B remains mainly between 0.1 and 1.0 and occasionally spikes above 1.0.


4. DISCUSSION

The correlation between spread and skill has been investigated in past studies. Buizza (1997) noted "some correlation" between small ensemble spread and high forecast skill in an ensemble forecast. However, this correlation between spread and skill was not found in short-range forecasts (Stensrud et al. 1999). The current study found a correlation between small spread and good skill for integration times of 10-100, but only up to some given value of spread. Above this value, the correlation between spread and skill is inconclusive; visually, on the plots of spread vs. skill (Figure 4), the points are randomly distributed, with some points having low spread and low skill and others having high spread and high skill.

The rank correlation of spread and skill peaks at the medium range of integration time, with rather poor correlation in the short-range integration times. This may imply that short-range ensemble forecasting simply cannot be as good a forecast tool as medium-range ensemble forecasts are. It does indicate that short-range ensemble forecast skill should be evaluated with different parameters because of the inconclusive correlation between spread and skill.

Evaluating the terms of the MSE equation verifies that ensemble spread may not be the best means of evaluating forecast skill. The ensemble spread term is often the largest term in the equation, but the mean MSE term is also sometimes dominant; both terms are of similar magnitude in almost all runs. Thus, a small value of MSE could be associated with various values of ensemble spread.


5. CONCLUSION

Further investigation should be done on the value of spread below which spread correlates with good skill and above which results are inconclusive, to determine whether this threshold can be quantified and applied. However, as indicated by the terms of the MSE equation, ensemble spread does not always have the largest influence on forecast skill and therefore may not be the best means of evaluating it. This is especially true of short-range forecasts, for which the rank correlation values between spread and skill were inconclusive. Small ensemble spread could be used as supporting evidence of forecast skill; however, ensemble spread is not sufficiently correlated with forecast skill to rely on it as a means of evaluating the forecasts.


Acknowledgments. First and foremost, many thanks to David Stensrud, who served as a mentor in the truest sense of the word. Thanks also to Pamela MacKeen for filling in a couple of occasional gaps and to Melinda Kreth (Central Michigan University) for the long-distance editing. Finally, my deepest appreciation to the Research Experience for Undergraduates program and all associated people for presenting this opportunity.


REFERENCES

Buizza, R., 1997: Potential forecast skill of ensemble prediction and spread and skill distributions of the ECMWF ensemble prediction system. Mon. Wea. Rev., 125, 99-119.

Gleick, J., 1987: Chaos: Making a New Science. Viking Penguin, 352 pp.

Hamill, T. M., and S. J. Colucci, 1997: Verification of Eta-RSM short-range ensemble forecasts. Mon. Wea. Rev., 125, 1312-1327.

-----, and -----, 1998: Evaluation of Eta-RSM ensemble probabilistic precipitation forecasts. Mon. Wea. Rev., 126, 711-724.

Leith, C. E., 1974: Theoretical skill of Monte Carlo forecasts. Mon. Wea. Rev., 102, 409-418.

Lorenz, E. N., 1963: Deterministic nonperiodic flow. J. Atmos. Sci., 20, 130-141.

-----, 1993: The Essence of Chaos. Univ. of Washington Press, 227 pp.

Palmer, T. N., R. Mureau, and F. Molteni, 1990: The Monte Carlo forecast. Weather, 45, 198-207.

Stensrud, D. J., and J. M. Fritsch, 1994: Mesoscale convective systems in weakly forced large-scale environments. Part III: Numerical simulations and implications for operational forecasting. Mon. Wea. Rev., 122, 2084-2104.

-----, H. E. Brooks, J. Du, S. Tracton, and E. Rogers, 1999: Using ensembles for short-range forecasting. Mon. Wea. Rev., 127, 433-446.