ETA-BASED MOS PROBABILITY OF PRECIPITATION (PoP) AND QUANTITATIVE PRECIPITATION FORECAST (QPF) GUIDANCE FOR THE CONTINENTAL UNITED STATES


Joseph C. Maloney

1. INTRODUCTION

The National Weather Service's (NWS) Meteorological Development Laboratory (MDL) has developed a new Model Output Statistics (MOS) package based upon the National Centers for Environmental Prediction's (NCEP) Eta Model output (Black 1994, Rogers et al. 1996, Zhao et al. 1997). Included in this package are the probability of precipitation (PoP) and the quantitative precipitation forecast (QPF) guidance in both categorical and probabilistic (PQPF) form. The MOS technique (Glahn and Lowry 1972) uses multiple linear regression to statistically relate predictand data (such as observations of precipitation amount) with predictor data (such as model output and climate information). Eta MOS PoP and QPF guidance are produced for 6-, 12-, and 24-hour periods after the 0000 and 1200 UTC model initialization times. The 6-h forecasts are available for the 6-12, 12-18, ..., and 54-60 h projections; 12-h forecasts are available for the 12-24, 24-36, 36-48, and 48-60 h projections; and 24-h forecasts are available for the 12-36, 24-48, and 36-60 h projections. From the PoP and the PQPF, a "best category" QPF is produced.

2. DEVELOPMENT

a. Predictands

For PoP, the predictand is the occurrence of at least 0.01 inches (liquid equivalent) of precipitation in a 6-h, 12-h, or 24-h period. As a binary predictand, it takes a value of one if at least 0.01 inches of precipitation accumulates in the period of interest, and zero otherwise. For QPF, the predictands are the conditional occurrence of at least 0.10, 0.25, 0.50, 1.00, and 2.00 inches of precipitation in a 6-h, 12-h, or 24-h period. (For the 6-h forecasts, the 2.00-inch category is eliminated.) In other words, the predictand is the occurrence of at least, say, 0.25 inches, given that at least 0.01 inches has accumulated (i.e., precipitation has occurred). These data were available from METAR observations archived by MDL at 0000, 0600, 1200, and 1800 UTC and were used to develop predictands at the various projections.
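A minimal Python sketch of how one observed period total could be turned into the binary PoP predictand and the conditional QPF predictands defined above. The function names are hypothetical, and the sketch assumes the period total (for example, the sum of the 6-h METAR amounts) is already available in inches; missing-data handling is omitted.

```python
POP_THRESHOLD = 0.01                              # inches, liquid equivalent
QPF_THRESHOLDS = (0.10, 0.25, 0.50, 1.00, 2.00)   # conditional QPF thresholds (inches)


def pop_predictand(amount):
    """Binary PoP predictand: 1 if at least 0.01 in. fell in the period, else 0."""
    return 1 if amount >= POP_THRESHOLD else 0


def conditional_qpf_predictands(amount, period_hours):
    """Conditional QPF predictands for one observed period total (inches).

    Returns None when less than 0.01 in. fell: the conditional predictands are
    defined only for cases in which precipitation occurred. Otherwise returns a
    dict mapping each threshold to 1 (met or exceeded) or 0.
    """
    if amount < POP_THRESHOLD:
        return None  # case is excluded from the conditional sample
    # The 2.00-in. threshold is not used for 6-h periods.
    thresholds = [t for t in QPF_THRESHOLDS if not (period_hours == 6 and t == 2.00)]
    return {t: 1 if amount >= t else 0 for t in thresholds}


print(pop_predictand(0.30))                    # 1
print(conditional_qpf_predictands(0.30, 12))   # {0.1: 1, 0.25: 1, 0.5: 0, 1.0: 0, 2.0: 0}
print(conditional_qpf_predictands(0.00, 6))    # None
```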

b. Predictors

Predictors offered to the regression routine included Eta model forecasts, station geographical information, and daily harmonics (e.g., the sine and cosine of the day of the year). Model data for development were available from the daily 0000 and 1200 UTC runs of the model. The data were interpolated onto a polar stereographic grid projection oriented 105W with a grid resolution of 90.75 km at 60N. Eta forecasts used as potential predictors included 6-h and 12-h precipitation amounts, mean layer relative humidities (RH), mean sea level pressure (MSLP), precipitable water, moisture divergence, u- and v-wind components, vertical velocities, lifted index, and K index. Table 1 gives more details on these predictors, such as available vertical levels and thresholds. Model data were available every six hours from 6 through 48 hours after initialization (data valid at initialization were not used). Data were offered at all projections necessary to cover the period of each forecast; however, for projections beyond 48 hours, only data from the 48-h projection were offered. The lone exception to this rule is the 24-h PoP/QPF forecast from 36 to 60 hours in advance, when model data were offered from the 36-h projection (the beginning of the 24-h period) and the 48-h projection (the midpoint).

Station latitude, longitude, and elevation were included as potential predictors to account for local dependencies. In addition, the first harmonics (sine and cosine) of the day of the year were included to represent seasonal variations.
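The seasonal predictors can be sketched as follows. The 365.25-day period and the use of the calendar day of the year are assumptions for illustration; the text does not give the exact form MDL used.

```python
import math
from datetime import date


def seasonal_harmonics(day):
    """Sine and cosine of the day of the year (first annual harmonic).

    Assumption: one cycle per 365.25 days, starting on January 1.
    """
    day_of_year = day.timetuple().tm_yday      # 1 for January 1
    angle = 2.0 * math.pi * day_of_year / 365.25
    return math.sin(angle), math.cos(angle)


print(seasonal_harmonics(date(2001, 1, 15)))
print(seasonal_harmonics(date(2001, 7, 15)))
```

Using both the sine and the cosine lets a linear equation represent an annual cycle with any phase, which a single term could not.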

In addition to continuous predictors (data interpolated directly from the grid to station locations), many predictors were also used in point-binary and grid-binary form. To compute a point-binary, data are interpolated to the stations and then the cutoff threshold is applied, yielding a value of one if the threshold is met or exceeded, and zero otherwise. To compute a grid-binary (Jensenius 1992), however, the cutoff threshold is applied to the gridded values prior to smoothing and station interpolation, which results in a station value that can range from zero to one. Compared with the point-binary, this approach gives smoother transitions between the predictor extremes of zero and one. As an example, consider nearby stations A and B. The 1000-850 hPa mean layer RH is 81% at A and 77% at B. Suppose this predictor has a coefficient of 0.20 in the equation. If a point-binary variable is used with a threshold of 80%, the forecast would drop by 0.20 (20 percentage points) between these two stations despite a relatively small change in mean layer RH. On the other hand, using a grid-binary variable would result in a much smaller forecast difference between the stations. For this development, model-derived predictors were offered in continuous and grid-binary form, and station elevation was offered as a point-binary. Other variables, such as latitude, longitude, and the harmonics, were offered only in continuous form.
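The stations A and B example can be reproduced with a one-dimensional sketch. It assumes a hypothetical row of gridpoint RH values, a simple three-point running mean as the smoother, and linear interpolation to the stations; the actual grid-binary computation (Jensenius 1992) works on the two-dimensional grid with its own smoother. The coefficient of 0.20 and the 80% threshold come from the example above.

```python
def interpolate(values, x):
    """Linear interpolation of unit-spaced gridpoint values to position x."""
    i = min(int(x), len(values) - 2)
    return values[i] + (x - i) * (values[i + 1] - values[i])


def smooth(values):
    """Three-point running mean; end points average only the neighbors that exist."""
    return [sum(values[max(0, i - 1):i + 2]) / len(values[max(0, i - 1):i + 2])
            for i in range(len(values))]


def point_binary(grid, x, threshold):
    """Interpolate to the station first, then apply the cutoff: exactly 0 or 1."""
    return 1.0 if interpolate(grid, x) >= threshold else 0.0


def grid_binary(grid, x, threshold):
    """Apply the cutoff at each gridpoint, smooth, then interpolate: a value in [0, 1]."""
    binary = [1.0 if v >= threshold else 0.0 for v in grid]
    return interpolate(smooth(binary), x)


rh = [72.0, 75.0, 78.0, 82.0, 85.0, 88.0, 90.0]  # hypothetical mean layer RH (%) along a grid row
station_a, station_b = 2.75, 5.0 / 3.0            # interpolated RH: 81% at A, 77% at B
coefficient, cutoff = 0.20, 80.0
for label, predictor in (("point-binary", point_binary), ("grid-binary", grid_binary)):
    a, b = predictor(rh, station_a, cutoff), predictor(rh, station_b, cutoff)
    print(f"{label}: A={a:.2f} B={b:.2f} forecast difference={coefficient * (a - b):.3f}")
# point-binary: A=1.00 B=0.00 forecast difference=0.200
# grid-binary: A=0.58 B=0.22 forecast difference=0.072
```

The point-binary predictor jumps from one to zero between the two stations, so the forecasts differ by the full coefficient; the grid-binary values differ by only about 0.36, so in this sketch the forecasts differ by about 0.07.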

The most frequently selected predictors for the PoP equations include the 6-h and 12-h grid-binary precipitation amounts at the 0.01 inch and 0.05 inch thresholds, various mean layer RHs, and the horizontal wind components. QPF equations frequently utilize 6-h and 12-h grid-binary precipitation amounts at the 0.10 inch, 0.25 inch, and 0.50 inch thresholds, K index, and vertical velocities. MSLP, precipitable water, moisture divergence, and horizontal wind components were popular QPF predictors for forecasts beyond 48 hours from initialization.

c. Developmental Sample

Data were used from the 1997-1998, 1998-1999, 1999-2000, and 2000-2001 cool seasons and the 1997, 1998, 1999, 2000, and 2001 warm seasons for this development. For PoP and QPF, the cool season is defined as October 1 - March 31, and the warm season is defined as April 1 - September 30. While forecasts will be produced for 1258 stations in the contiguous United States, precipitation predictand data were used only from 395 stations which were found to have reliable precipitation records for the entire cool season developmental sample. With the longer data sample available, 571 stations were used for predictand data in the warm season development. The most recent seasons (cool season 2000-2001 and warm season 2001) were held out from the development as an independent sample for testing; however, they were included for the final equation development.
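The season definitions and the independent test seasons can be expressed as a small date-classification sketch; the function and constant names are hypothetical.

```python
from datetime import date


def season_of(day):
    """Classify a date as ('warm', 'YYYY') or ('cool', 'YYYY-YYYY').

    Warm season: April 1 - September 30. Cool season: October 1 - March 31,
    labeled by the two calendar years it spans.
    """
    if 4 <= day.month <= 9:
        return ("warm", str(day.year))
    start = day.year if day.month >= 10 else day.year - 1
    return ("cool", f"{start}-{start + 1}")


# Seasons held out as the independent test sample (all were used for the final equations).
INDEPENDENT_SEASONS = {("cool", "2000-2001"), ("warm", "2001")}


def in_independent_sample(day):
    return season_of(day) in INDEPENDENT_SEASONS


print(season_of(date(1999, 12, 25)))            # ('cool', '1999-2000')
print(in_independent_sample(date(2001, 7, 4)))  # True
```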

d. Equation Characteristics

Regionalized equations were developed for PoP and QPF because of the relative scarcity of some of the extreme QPF amounts, as well as the lack of precipitation measurements at many of the MOS forecast sites. This approach groups stations into regions based upon similar climatology and geographic setting, and allows forecasts to be made at stations not included in the developmental sample. Thus, guidance can be generated for stations with unreliable or nonexistent precipitation observations, for which stable regression equations would otherwise be impossible to develop. In addition, this approach pools the observations from all stations within a region to accumulate a larger sample of rare events, which in turn yields more stable regression equations. Figure 1a shows the 18 regions used for the cool season PoP development, while Figure 1b shows the 16 regions used for the cool season QPF development. Figure 2 shows the 12 regions used for both the warm season PoP and QPF developments. Note that while the same equation is applied to all stations in a given region, the equation produces different forecasts at each station, because predictor values are interpolated to each station.
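To illustrate how a single regional equation yields different forecasts at different stations, here is a sketch with hypothetical region and station names, predictor names, and coefficients; none are taken from the actual Eta MOS equations.

```python
# Hypothetical regional equation (intercept plus coefficients) and station assignments.
REGION_EQUATIONS = {
    "region_07": {"intercept": 0.05, "precip_gb_0.01": 0.45, "rh_gb_70": 0.20, "sin_doy": -0.03},
}
STATION_REGION = {"STATION_A": "region_07", "STATION_B": "region_07"}


def regional_forecast(station, predictors):
    """Evaluate the station's regional equation with predictor values interpolated to that station."""
    equation = REGION_EQUATIONS[STATION_REGION[station]]
    total = equation["intercept"]
    for name, coefficient in equation.items():
        if name != "intercept":
            total += coefficient * predictors[name]
    return total


# Same equation, different interpolated predictor values, different forecasts.
a = regional_forecast("STATION_A", {"precip_gb_0.01": 0.9, "rh_gb_70": 0.8, "sin_doy": 0.1})
b = regional_forecast("STATION_B", {"precip_gb_0.01": 0.2, "rh_gb_70": 0.3, "sin_doy": 0.1})
print(round(a, 3), round(b, 3))  # 0.612 0.197
```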

Despite the regionalization, some regions still did not have sufficient observations to develop equations for the highest QPF categories. For example, many of the western regions (aside from the Pacific Northwest) had few, if any, cases of 1.00 inches or more of precipitation in a 6-h period. A few regions on the East Coast and Great Lakes had an insufficient number of 2.00 inches or greater events in 12-h periods. Rather than combine regions spreading over large areas, which could cause a loss in the ability of the MOS system to distinguish local climatologies, it was decided not to produce probability equations for these regions for the rarest events. Tables 2a, 2b, 3a, and 3b show by forecast projection which categorical forecasts, if any, are not produced for each region for the cool season 0000 UTC, cool season 1200 UTC, warm season 0000 UTC, and warm season 1200 UTC guidance, respectively. Where the most extreme categories are not forecast, forecasts of the highest-available category should be interpreted as at least the lower bound of the category. For example, in a region where category 6 (2.00 inches or more) is never forecast, a category 5 forecast would be properly interpreted as 1.00 inches or greater of precipitation, instead of 1.00-1.99 inches.
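The interpretation rule above can be sketched as a lookup that treats the highest category produced for a region as open-ended. The category boundaries assume that categories 0 through 6 begin at 0.00, 0.01, 0.10, 0.25, 0.50, 1.00, and 2.00 inches, consistent with the category 5 and 6 examples in the text; the function name is hypothetical.

```python
# Lower bounds (inches) of categories 0-6 (assumed from the thresholds in the text).
CATEGORY_LOWER = [0.00, 0.01, 0.10, 0.25, 0.50, 1.00, 2.00]


def category_range(category, highest_available):
    """Return (lower, upper) in inches; upper is exclusive, or None if open-ended.

    highest_available is the highest category produced for the region and
    projection, e.g. 5 where category 6 (2.00 in. or more) is never forecast.
    """
    lower = CATEGORY_LOWER[category]
    if category >= highest_available:
        return lower, None  # read as "at least the lower bound"
    return lower, CATEGORY_LOWER[category + 1]


print(category_range(5, 6))  # (1.0, 2.0)  -> 1.00-1.99 inches
print(category_range(5, 5))  # (1.0, None) -> 1.00 inches or greater
```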

The regression procedure added predictors to the equations until one of two stopping criteria was reached: either a maximum of 10 terms was included, or no remaining predictor could reduce the variance by at least an additional 0.25%. Most PoP equations contain about six terms; a few have only four, while others have 10. Nearly all of the QPF equations contain 10 terms.
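A minimal forward-selection sketch with the two stopping rules described above, written in plain Python without external libraries. It assumes that "at least an additional 0.25%" means the fraction of variance explained must increase by at least 0.0025, and it fits ordinary least squares through the normal equations; MDL's actual regression code may differ in these and other details. The data are made up.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def reduction_of_variance(columns, y):
    """Least-squares fit of y on an intercept plus the given columns; returns 1 - SSE/SST."""
    n = len(y)
    x = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(x[0])
    xtx = [[sum(x[i][p] * x[i][q] for i in range(n)) for q in range(k)] for p in range(k)]
    xty = [sum(x[i][p] * y[i] for i in range(n)) for p in range(k)]
    beta = solve(xtx, xty)
    mean = sum(y) / n
    sst = sum((v - mean) ** 2 for v in y)
    sse = sum((y[i] - sum(b * v for b, v in zip(beta, x[i]))) ** 2 for i in range(n))
    return 1.0 - sse / sst


def forward_select(candidates, y, max_terms=10, min_gain=0.0025):
    """Add the predictor giving the largest reduction of variance until a stopping rule fires."""
    chosen, current = [], 0.0
    while len(chosen) < max_terms:
        best_name, best_rv = None, current
        for name in candidates:
            if name in chosen:
                continue
            try:
                rv = reduction_of_variance([candidates[c] for c in chosen + [name]], y)
            except ValueError:  # candidate is collinear with predictors already chosen
                continue
            if rv > best_rv:
                best_name, best_rv = name, rv
        if best_name is None or best_rv - current < min_gain:
            break  # no remaining predictor adds at least min_gain
        chosen.append(best_name)
        current = best_rv
    return chosen, current


# Made-up data: y depends on x1 and x2 only.
x1 = [float(i) for i in range(10)]
x2 = [1.0, 0.0] * 5
x3 = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0, 3.0]
y = [2.0 * a + 3.0 * b for a, b in zip(x1, x2)]
chosen, rv = forward_select({"x1": x1, "x2": x2, "x3": x3}, y)
print(chosen, round(rv, 4))  # ['x1', 'x2'] 1.0
```

Here x3 is never added: once x1 and x2 explain all of the variance, it cannot add the required 0.0025.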

PoP and QPF equations were developed separately; however, the equations for the various conditional QPF thresholds (0.10, 0.25, 0.50, 1.00, and 2.00 inches) were developed simultaneously. This means that each of the four (five for 12-h and 24-h) equations for the conditional QPF thresholds has the same predictors, but different regression coefficients.
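Simultaneous development can be sketched as a forward selection in which one predictor list is shared by all thresholds while each threshold gets its own coefficients. The text does not state how a predictor is chosen when several predictands are considered; this sketch assumes the candidate with the largest reduction of variance summed over all predictands is added, with a 10-term limit and a 0.0025 minimum gain applied to that sum. The data are made up and continuous for clarity, standing in for the binary conditional predictands.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit(columns, y):
    """Least squares of y on an intercept plus columns; returns (coefficients, reduction of variance)."""
    n = len(y)
    x = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(x[0])
    xtx = [[sum(x[i][p] * x[i][q] for i in range(n)) for q in range(k)] for p in range(k)]
    xty = [sum(x[i][p] * y[i] for i in range(n)) for p in range(k)]
    beta = solve(xtx, xty)
    mean = sum(y) / n
    sst = sum((v - mean) ** 2 for v in y)
    sse = sum((y[i] - sum(b * v for b, v in zip(beta, x[i]))) ** 2 for i in range(n))
    return beta, 1.0 - sse / sst


def select_shared(candidates, predictands, max_terms=10, min_gain=0.0025):
    """One predictor list for every predictand; separate coefficients for each."""
    chosen, current = [], 0.0  # current: reduction of variance summed over predictands
    while len(chosen) < max_terms:
        best_name, best_total = None, current
        for name in candidates:
            if name in chosen:
                continue
            columns = [candidates[c] for c in chosen + [name]]
            try:
                total = sum(fit(columns, y)[1] for y in predictands.values())
            except ValueError:  # collinear with predictors already chosen
                continue
            if total > best_total:
                best_name, best_total = name, total
        if best_name is None or best_total - current < min_gain:
            break
        chosen.append(best_name)
        current = best_total
    columns = [candidates[c] for c in chosen]
    return chosen, {key: fit(columns, y)[0] for key, y in predictands.items()}


# Made-up data standing in for two conditional QPF thresholds.
x1 = [float(i) for i in range(10)]
x2 = [1.0, 0.0] * 5
x3 = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0, 3.0]
predictands = {
    "ge_0.10": [2.0 * a + 3.0 * b for a, b in zip(x1, x2)],
    "ge_0.25": [1.0 - a + 4.0 * b for a, b in zip(x1, x2)],
}
chosen, coefficients = select_shared({"x1": x1, "x2": x2, "x3": x3}, predictands)
print(chosen)  # ['x1', 'x2']
```

Both predictands end up with the same two predictors, but their coefficients (intercept 0, 2, 3 versus 1, -1, 4) differ, which is the property the text describes.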

e. Categorical Forecasts

A categorical precipitation forecast is ultimately generated from the probabilistic MOS forecasts described above. To accomplish this, forecasts are compared to thresholds derived in the development process. These thresholds were calculated for each model cycle, forecast projection, and region. For this precipitation development, the goal was to find thresholds which would maximize the threat score while minimizing the false alarm rate and maintaining a bias close to unity. However, in testing, we found that MOS forecasts could be made more skillful without seriously inflating the false alarm rate by allowing biases for the second- and third-highest categories (0.25-0.49 inches and 0.50-0.99 inches for 6-h QPF, 0.50-0.99 inches and 1.00-1.99 inches for 12-h and 24-h QPF) to range as high as 1.15. Biases for other categories, including the highest category, were constrained between 0.95 and 1.05.
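One way to search for such a threshold is sketched below: try candidate probability thresholds in steps of 0.01, compute the threat score (hits divided by hits plus misses plus false alarms) and the bias (number of "yes" forecasts divided by number of observed events), and keep the threshold with the highest threat score whose bias stays within the allowed range. This is a simplified, hypothetical procedure; it does not model the false alarm rate explicitly, and the sample data are made up.

```python
def contingency(probs, events, threshold):
    """Hits, false alarms, and misses when 'yes' is forecast for probabilities above threshold."""
    hits = false_alarms = misses = 0
    for p, e in zip(probs, events):
        if p > threshold and e:
            hits += 1
        elif p > threshold:
            false_alarms += 1
        elif e:
            misses += 1
    return hits, false_alarms, misses


def choose_threshold(probs, events, bias_range=(0.95, 1.05), steps=100):
    """Grid-search thresholds i/steps; keep the highest threat score with bias inside bias_range."""
    best = None  # (threshold, threat score, bias)
    for i in range(1, steps):
        t = i / steps
        hits, fa, misses = contingency(probs, events, t)
        if hits + misses == 0:
            continue  # bias is undefined without observed events
        bias = (hits + fa) / (hits + misses)
        ts = hits / (hits + fa + misses)
        if bias_range[0] <= bias <= bias_range[1] and (best is None or ts > best[1]):
            best = (t, ts, bias)
    return best


probs = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1, 0.05, 0.6]
events = [1, 1, 1, 0, 1, 0, 0, 0, 0, 1]
print(choose_threshold(probs, events))  # (0.4, 0.6666666666666666, 1.0)
```

For the second- and third-highest categories, the relaxed bias limit described above would correspond to calling `choose_threshold(probs, events, bias_range=(0.95, 1.15))`.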

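The selection algorithm compares the forecast probabilities to the thresholds in order from highest to lowest and selects the highest category for which the forecast exceeds the threshold as the "best category." Our categorical forecasts are:

   0 = less than 0.01 inches
   1 = 0.01-0.09 inches
   2 = 0.10-0.24 inches
   3 = 0.25-0.49 inches
   4 = 0.50-0.99 inches
   5 = 1.00-1.99 inches (1.00 inches or greater for 6-h forecasts)
   6 = 2.00 inches or greater (12-h and 24-h forecasts only)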

3. POST-PROCESSING

After the regression equations are evaluated, the resulting probability forecasts pass through a series of post-processing steps to ensure statistical validity. Figure 3 illustrates these steps. First, the conditional PQPF values are made unconditional by multiplying each of them by the PoP. Second, the probabilities are truncated so that no forecast is less than zero or greater than unity. Next, the forecasts are checked for monotonicity within each projection by requiring that the probability of exceeding a threshold be no greater than the probability of exceeding the next-smallest threshold. If a "rarer" event is found to have a greater probability than a "less rare" event, the rarer event's probability is set equal to the less rare event's probability. Following this check, a consistency check is performed between the 6-h and 12-h forecasts, as well as between the 12-h and 24-h forecasts, to make sure the probability for the longer period is at least as great as that for each of its shorter subperiods. Finally, a "best category" forecast is determined from the probabilities and the thresholds described above.
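The post-processing chain can be sketched as follows. Two details are assumptions not stated in the text: the period consistency step is implemented by raising each longer-period probability to the largest value among its subperiods, and the best-category thresholds are hypothetical numbers. Probabilities are ordered by increasing amount, starting with the PoP, so list position k corresponds to category k + 1, assuming categories 1 through 6 begin at the 0.01, 0.10, 0.25, 0.50, 1.00, and 2.00 inch thresholds.

```python
def post_process(pop, conditional):
    """Post-process one period's forecasts.

    pop: PoP (probability of at least 0.01 in.).
    conditional: conditional probabilities for the QPF thresholds in increasing order.
    Returns unconditional exceedance probabilities, PoP first.
    """
    probs = [pop] + [pop * c for c in conditional]   # 1. make the PQPF unconditional
    probs = [min(1.0, max(0.0, p)) for p in probs]   # 2. truncate to [0, 1]
    for k in range(1, len(probs)):                   # 3. monotonicity within the period
        if probs[k] > probs[k - 1]:
            probs[k] = probs[k - 1]                  # rarer event reset to the less rare value
    return probs


def enforce_period_consistency(long_probs, subperiod_probs):
    """4. Longer-period probability at least as large as each subperiod's (assumed method)."""
    return [max([p] + [sub[k] for sub in subperiod_probs if k < len(sub)])
            for k, p in enumerate(long_probs)]


def best_category(probs, thresholds):
    """5. Scan from the highest category down; return the first whose probability exceeds its threshold."""
    for k in range(len(probs) - 1, -1, -1):
        if probs[k] > thresholds[k]:
            return k + 1   # probs[0] is the PoP, i.e. category 1 (0.01 in. or more)
    return 0               # no threshold exceeded


six_a = post_process(0.70, [0.50, 0.30, 0.10, 0.02])
six_b = post_process(0.60, [0.40, 0.45, 0.05, 0.01])   # 0.27 for 0.25 in. is reset to 0.24
twelve = post_process(0.65, [0.55, 0.30, 0.15, 0.05, 0.01])
twelve = enforce_period_consistency(twelve, [six_a, six_b])
thresholds_12h = [0.45, 0.30, 0.25, 0.15, 0.10, 0.05]  # hypothetical
print(best_category(twelve, thresholds_12h))  # 2
```

Taking the elementwise maximum of non-increasing sequences keeps the result non-increasing, so in this sketch the consistency step cannot undo the monotonicity step.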

4. OPERATIONAL PRODUCTS

Eta MOS PoP and QPF guidance products will be available starting in Spring 2002 as part of the new Eta MOS forecast guidance package. Dallavalle and Erickson (2002) describe the details of this new guidance and show sample text messages. The 6- and 12-h PoP and categorical QPF values are presented in the text message. Figure 4 shows a sample of the precipitation portion of the text message. The four lines P06, P12, Q06, and Q12 contain the 6-h PoP, the 12-h PoP, the 6-h categorical QPF, and the 12-h categorical QPF, respectively. Additionally, all of the probabilities and categorical forecasts, including the 24-h guidance, will be made available in a Binary Universal Form for the Representation of meteorological data (BUFR) message. Note that this guidance will not be available initially for sites outside the contiguous United States (e.g., Alaska, Hawaii, and Puerto Rico).

5. VERIFICATION

Test equations were developed by using the 1997-98, 1998-99, and 1999-2000 cool seasons (October 1 through March 31) for the stations in the developmental sample. These equations were then tested on the independent sample (the 2000-01 cool season) prior to the final equation development. Analogously, test equations for the warm season were developed by using the 1997, 1998, 1999, and 2000 warm seasons (April 1 through September 30) and tested on the independent sample (the 2001 warm season) prior to final development.

a. Cool Season

For PoP verification, we used the Brier Score (Brier 1950) and compared the Eta MOS scores with those of the operational NGM MOS PoP guidance (Su 1993) and AVN MOS PoP guidance (Dallavalle and Erickson 2000) available for the independent sample. The Brier Score is the mean squared difference between the forecast probability and the observed outcome (one if the event occurred, zero otherwise); as with mean square error, lower values indicate more accurate forecasts. Figures 5 and 6 show these scores for 6-h and 12-h PoP forecasts, respectively, from the 0000 UTC model cycle for the cool season. Generally, the Eta MOS PoPs are superior to the AVN MOS PoPs through the early projections, with the AVN MOS showing more skill in later projections, particularly beyond 48 hours, where no Eta model output valid after the 48-h projection was available. Clearly, the Eta MOS and AVN MOS both show significant skill over the NGM MOS system. Verification from the 1200 UTC model cycle (not shown) shows a similar pattern.
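For reference, the Brier Score for a set of PoP forecasts can be computed as below. This is a minimal sketch; the function name is hypothetical and the sample values are made up.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes (0 = perfect)."""
    if not probs or len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be non-empty and of equal length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


# Made-up PoPs for four periods; precipitation occurred only in the first.
print(round(brier_score([0.9, 0.2, 0.6, 0.0], [1, 0, 0, 0]), 4))  # 0.1025
```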

Eta MOS categorical QPF was also compared against the NGM MOS categorical QPF (Antolik 2000) and against Eta direct model output (DMO) binned into the appropriate category. AVN MOS QPF was not used because it was not available for the 2000-2001 cool season. Figure 7 shows the 6-h Heidke Skill Score (HSS) of these three systems for the 0000 UTC model cycle, as well as the improvement in Eta MOS QPF HSS over the NGM MOS for the cool season. All three systems show a decrease in skill with increasing projection; however, the Eta MOS QPF system is clearly the most skillful, and it shows a 5-15% improvement in HSS over the NGM MOS at nearly every forecast projection. Similar patterns can be seen in the verification for the 1200 UTC model cycle and for the 12-h QPF guidance (not shown).
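The multi-category Heidke Skill Score compares the fraction of correct categorical forecasts (PC) with the fraction expected by chance (E) from the row and column totals of the forecast/observed contingency table, HSS = (PC - E)/(1 - E); a score of one is perfect and zero is no better than chance. A minimal sketch with made-up counts:

```python
def heidke_skill_score(table):
    """Multi-category HSS from table[i][j] = number of cases forecast in i and observed in j."""
    k = len(table)
    n = sum(sum(row) for row in table)
    proportion_correct = sum(table[i][i] for i in range(k)) / n
    forecast_totals = [sum(row) for row in table]
    observed_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Fraction correct expected by chance given the forecast and observed frequencies.
    chance = sum(f * o for f, o in zip(forecast_totals, observed_totals)) / n ** 2
    return (proportion_correct - chance) / (1.0 - chance)


print(round(heidke_skill_score([[50, 10], [5, 35]]), 3))  # 0.694
```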

b. Warm Season

Figures 8 and 9 show Brier Scores for the 6-h and 12-h PoP forecasts, respectively, from the 0000 UTC model cycle for the warm season. The scores are somewhat higher than the cool season scores (see Figs. 5 and 6); because a higher Brier Score indicates less accurate forecasts, the warm season PoPs are somewhat less accurate than the cool season PoPs. Nevertheless, the Eta MOS PoPs are still superior through the early projections, although they deteriorate beyond 48 hours. Clearly, both the Eta and AVN MOS show significant skill over the older NGM MOS system. The skill of the 1200 UTC Eta MOS guidance (not shown) is similar.

Unlike in the cool season, AVN MOS QPF was available for the 2001 warm season, but only for the last two months. Figure 10 shows HSS for the entire warm season compared with NGM MOS and Eta DMO (AVN MOS is omitted here). As in the cool season, the Eta MOS shows skill over the NGM MOS and the DMO. Figure 11 shows HSS for the two-month portion of the warm season when AVN MOS QPF was available (plotted in place of Eta DMO). Although this is a rather limited sample, the AVN and Eta MOS QPF appear to be about equally skillful, and both are more skillful than the NGM MOS.

More complete verification information for all projections can be found on MDL's Test Results web page located at http://www.nws.noaa.gov/mdl/synop/results.htm.

6. OPERATIONAL CONSIDERATIONS

While the MOS technique can account for some biases in the model output, it is not effective in accounting for a poor model forecast. Forecasters should consider the validity of the model output when using MOS guidance. Note also that any future changes to the Eta model, particularly the precipitation algorithm, could affect the PoP and QPF guidance.

QPF probabilities are post-processed to ensure consistency; however, categorical forecasts are not, so inconsistent forecasts can result (e.g., categories 0 and 1 for the two 6-h forecasts, but category 4 for the 12-h forecast covering the same period). This usually happens when the probabilities lie very close to the thresholds: just above them in the 12-h case, but just below them in the 6-h case. Such inconsistencies can be resolved by considering other guidance products, such as PoP and thunderstorm guidance. (For example, very high PoPs and high probabilities of thunderstorms may suggest that the larger category is the better forecast.)
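A forecaster-side check for the kind of inconsistency described above can be sketched by comparing the range of 12-h amounts that a category allows with the range implied by the two 6-h categories. This helper is hypothetical and is not part of the MOS post-processing. It assumes category lower bounds of 0.00, 0.01, 0.10, 0.25, 0.50, 1.00, and 2.00 inches for categories 0 through 6, with category 5 open-ended for 6-h forecasts and category 6 open-ended for 12-h forecasts; regions that never forecast the top category would lower these limits.

```python
# Category lower bounds (inches), categories 0-6 (assumed; see the lead-in).
LOWER = [0.00, 0.01, 0.10, 0.25, 0.50, 1.00, 2.00]


def bounds(category, top):
    """Return (lower, upper) in inches; upper is exclusive and None for the open-ended top category."""
    upper = None if category >= top else LOWER[category + 1]
    return LOWER[category], upper


def categories_consistent(cat_12h, cats_6h, top_12h=6, top_6h=5):
    """Can one 12-h total fall in cat_12h while its two 6-h parts fall in cats_6h?"""
    lo12, hi12 = bounds(cat_12h, top_12h)
    parts = [bounds(c, top_6h) for c in cats_6h]
    min_total = sum(lo for lo, _ in parts)
    uppers = [hi for _, hi in parts]
    if None not in uppers and lo12 >= sum(uppers):
        return False  # 12-h category needs more than the 6-h categories allow
    if hi12 is not None and hi12 <= min_total:
        return False  # 12-h category allows less than the 6-h categories require
    return True


print(categories_consistent(4, [0, 1]))  # False (the example in the text)
print(categories_consistent(2, [0, 1]))  # True
```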

7. REFERENCES

Antolik, M. S., 2000: An overview of the National Weather Service's centralized statistical quantitative precipitation forecasts. J. Hydrology, 239, 306-337.

Black, T. L., 1994: The new NMC mesoscale Eta model: Description and forecast examples. Wea. Forecasting, 9, 265-278.

Brier, G. W., 1950: Verification of forecasts expressed in terms of probability. Mon. Wea. Rev., 78, 1-3.

Dallavalle, J. P., and M. C. Erickson, 2000: AVN-based MOS guidance - The alphanumeric messages. NWS Technical Procedures Bulletin No. 463, NOAA, U.S. Dept. of Commerce, 9 pp.

----, and ----, 2002: Eta-based MOS guidance - The 0000/1200 UTC alphanumeric messages. NWS Technical Procedures Bulletin No. 486, NOAA, U.S. Dept. of Commerce, 8 pp.

Glahn, H. R., and D. A. Lowry, 1972: The use of Model Output Statistics (MOS) in objective weather forecasting. J. Appl. Meteor., 11, 1203-1211.

Jensenius, J. S., Jr., 1992: The use of grid-binary variables as predictors for statistical weather forecasting. Preprints, 12th Conference on Probability and Statistics in the Atmospheric Sciences, Toronto, Amer. Meteor. Soc., 225-230.

Rogers, E., T. L. Black, D. G. Deaven, G. J. DiMego, Q. Zhao, M. Baldwin, N. W. Junker, and Y. Lin, 1996: Changes to the operational "Early" Eta analysis/forecast system at the National Centers for Environmental Prediction. Wea. Forecasting, 11, 391-413.

Su, J. S., 1993: NGM-based MOS guidance for the probability of precipitation (PoP). NWS Technical Procedures Bulletin No. 409, NOAA, U.S. Dept. of Commerce, 14 pp.

Zhao, Q., T. L. Black, and M. E. Baldwin, 1997: Implementation of the cloud prediction scheme in the Eta model at NCEP. Wea. Forecasting, 12, 697-712.


FIGURES