In 1993, for the first time in over a decade, the Techniques Development Laboratory (TDL) of the National Weather Service (NWS) developed statistical quantitative precipitation forecast (QPF) guidance based on an operational synoptic-scale numerical weather prediction (NWP) model. This QPF guidance uses the Model Output Statistics (MOS) technique (Glahn and Lowry 1972), applied to output fields from the Nested Grid Model (NGM) (Hoke et al. 1989) currently run twice daily at the National Centers for Environmental Prediction (NCEP). Six- and 12-h categorical forecasts of accumulated liquid water-equivalent precipitation amounts up to 1 inch (for the 6-h forecasts) and 2 inches (12-h forecasts only) are produced each forecast cycle (0000 and 1200 UTC). These forecasts are valid for periods ending 12 to 60 hours later. Although produced for 718 sites in the contiguous United States and for 60 sites in Alaska, not all of the forecasts for the contiguous U.S. are disseminated as part of the alphanumeric guidance routinely produced by TDL (discussed below in section 2). At the time of publication of this bulletin, MOS categorical precipitation forecasts for approximately 660 locations in the contiguous U.S. and for all 60 Alaska sites are available to the NWS, the U.S. Air Force (USAF), and the private meteorological community over various military and civilian communication networks.
2. PRODUCT AVAILABILITY AND FORMAT
Upon implementation in late October 1993, the quantitative precipitation forecasts comprised the final element to be added to the FOUS14 (FWC) NGM MOS message for the contiguous United States. The NGM MOS message is an alphanumeric product which currently is available to NWS offices over the Automation of Field Operations and Services (AFOS) system and over the Advanced Weather Interactive Processing System (AWIPS) for the stations given in Technical Procedures Bulletin (TPB) No. 408 (Dallavalle et al. 1992), to various military facilities (as discussed in Miller 1993), and to private vendors via the NWS Family of Services Domestic Data Service. NGM MOS QPF guidance for Alaska was developed somewhat later, and implemented in September 1995. Similar alphanumeric messages, the FOAK25-29 bulletins, contain guidance for the 60 Alaskan stations. A companion message (FOAK14) containing MOS guidance for only the three NWS Forecast Offices in Alaska is distributed to the NWS Western Region via the AFOS system. Details concerning the format of the Alaskan guidance messages may be found in TPB No. 425 (Dallavalle et al. 1995).
An example of the FOUS14 (FWC) message format appears in Fig. 1. The single line labelled "QPF" and highlighted in the figure contains the quantitative precipitation forecast information. The forecasts consist of two digits, separated by a slash, which represent categorical amounts of 6- and 12-h accumulated precipitation, respectively, aligned under the column header associated with the end of the valid period. Note that the six-hourly times (06 UTC, 18 UTC) have only a single forecast appearing before the slash, as there are no 12-h forecasts valid at those times of the day. In the case of the Alaskan forecasts, however, the 12-h valid periods are offset by 6 hours so as to be more compatible with local time. So, in the case of the Alaskan guidance, the 12-h forecasts are valid at 06 UTC, 18 UTC, etc., and two digits appear in the message at these times instead. For the vast majority of forecast sites, the categorical QPF may take on one of six or seven values in the range 0-6, depending upon the forecast period. These values represent a forecast of accumulated liquid-equivalent precipitation within one of the following exclusive intervals:
0 = no measurable precipitation during the period
1 = 0.01 - 0.09 inches
2 = 0.10 - 0.24 inches
3 = 0.25 - 0.49 inches
4 = 0.50 - 0.99 inches
5 = 1.00 inches or more (6-h forecasts)
    1.00 - 1.99 inches (12-h forecasts)
6 = 2.00 inches or more (12-h forecasts only)
Note that a forecast of category 6 is only possible for the 12-h period. Furthermore, forecasts of the rarest events (i.e., 1 inch in 6 hours, 2 inches in 12 hours) are not made for portions of the western U.S., Northern Plains, and Great Lakes, as well as much of Alaska. We chose not to produce forecasts of 1 inch in the 12-h period for a smaller subset of those areas as well. The rationale behind these exceptions, as well as a more detailed description of the locations for which these categorical forecasts are not available, is included in Section 3 of this bulletin.
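The category definitions above can be expressed as a short lookup. The following is a minimal sketch, not TDL's operational code; the function name and argument names are hypothetical, and the sketch ignores the regional exceptions in which the uppermost categories are not forecast.

```python
# Lower bounds (inches) of categories 1 through 6, taken from the list above.
LOWER_BOUNDS = [0.01, 0.10, 0.25, 0.50, 1.00, 2.00]


def amount_to_category(amount_in, period_hours):
    """Return the QPF category for an amount accumulated over 6 or 12 hours."""
    if period_hours not in (6, 12):
        raise ValueError("period_hours must be 6 or 12")
    # Category 6 exists only for 12-h periods, so 6-h periods stop at category 5.
    bounds = LOWER_BOUNDS if period_hours == 12 else LOWER_BOUNDS[:5]
    category = 0
    for number, lower in enumerate(bounds, start=1):
        if amount_in >= lower:
            category = number
    return category


print(amount_to_category(0.71, 12))  # 0.50 - 0.99 inches
print(amount_to_category(2.30, 6))   # open-ended top category for 6-h periods
```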
In the QPF line of Fig. 1, the characters "4/5", for example, represent the categorical precipitation amounts predicted during the 6- and 12-h periods ending 36 hours after the model cycle time (1200 UTC October 29). Note that these forecasts appear in the column of forecasts valid at 0000 UTC on October 31. The leading "4" indicates that the statistical system expected between 0.50 and 0.99 inches of liquid precipitation within the period beginning at 1800 UTC on October 30 and ending at 0000 UTC on the 31st. Likewise, the trailing "5" indicates that the MOS system expected a total accumulation in the range of 1.00 to 1.99 inches, inclusive, during the twelve hours from 1200 UTC October 30 to 0000 UTC October 31. Pairs of precipitation forecasts for other projections given in the message are interpreted in analogous fashion. The reader is again referred to Dallavalle et al. (1992) for a complete description of all other forecast elements contained within this message.
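As an illustration of this interpretation, the sketch below decodes a single QPF entry such as "4/5" into the corresponding amount intervals. The function names are hypothetical, and the handling of an entry with nothing after the slash (a 6-h forecast only) is an assumption about the message layout rather than a description of the actual bulletin decoder.

```python
def category_interval(category, period_hours):
    """Return (low, high) in inches; high is None for an open-ended category."""
    closed = {0: (0.0, 0.0), 1: (0.01, 0.09), 2: (0.10, 0.24),
              3: (0.25, 0.49), 4: (0.50, 0.99)}
    if category in closed:
        return closed[category]
    if category == 5:
        # Category 5 is open-ended for 6-h periods and bounded for 12-h periods.
        return (1.00, None) if period_hours == 6 else (1.00, 1.99)
    if category == 6 and period_hours == 12:
        return (2.00, None)
    raise ValueError("invalid category for this period")


def decode_qpf_entry(entry):
    """Decode an entry like '4/5' into intervals keyed by period length in hours."""
    six, _, twelve = entry.partition("/")
    six, twelve = six.strip(), twelve.strip()
    return {
        6: category_interval(int(six), 6) if six else None,
        12: category_interval(int(twelve), 12) if twelve else None,
    }


print(decode_qpf_entry("4/5"))
```

Where the rarest categories are not forecast for a region, the uppermost available category would need to be treated as open-ended; that case is not handled here.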
3.1 Statistical Method
In the development of the NGM-based statistical QPF, as with all other elements in the NGM statistical guidance package, TDL has employed the Model Output Statistics (MOS) technique (Glahn and Lowry 1972). The MOS technique uses statistical analysis to relate an historical sample of observed weather to output fields from operational numerical weather prediction models, observations, and other geoclimatic data. This is done in order to derive equations which express the relationships between the model forecasts and the occurrence of a particular weather event or events over the period of record. These equations then can be applied to a future set of model and atmospheric conditions. When applied to independent data, the set of equations yields forecasts of future sensible weather or, if the equations have been developed in an appropriate manner, the probability of occurrence of future weather events. The weather events for which the equations are intended to produce forecasts are referred to as the predictands, whereas the model fields and other meteorological data used in making the forecasts are referred to as the predictors.
Multiple linear regression with forward selection is the statistical technique generally used for MOS equation development. This regression procedure produces equations of the general form:
Y = a0 + a1 X1 + a2 X2 + ... + an Xn
where Y is the predictand, the Xn are the predictors, and the an are the predictor coefficients, with the exception of a0, which is the regression constant (or Y-intercept of the line defined by the equation). The regression technique minimizes the total squared error of the resultant linear equation when it is applied to the dependent data, while the forward selection process determines the specific predictors chosen for the equations and the order in which they appear. Usually, a large candidate pool of predictors is offered to the selection or "screening" procedure, but only a small subset of these ultimately appears in the MOS equation under development. The candidate predictors generally are those which are reasonable meteorologically, given the physical processes associated with the particular weather element to be forecast. The forward selection procedure first selects the predictor which explains the most variance of the predictand over the sample of dependent data. Once this most important predictor has been selected (i.e., the predictor which is most closely related statistically to the weather element being forecast), the forward selection procedure chooses the predictor which most reduces the remaining unexplained variance of the predictand when combined with the predictor already selected. Next, a third predictor is selected which most reduces the remaining unexplained predictand variance in combination with the first two, and so on. This process is allowed to continue until one or more predetermined conditions are met on the dependent sample. In the case of the NGM MOS QPF system, predictors were added to a given equation under development until a maximum of 15 were selected, or until the addition of new predictors accounted for less than a 0.1% reduction in predictand variance. 
Other details of the procedure are beyond the scope of this bulletin; however the interested reader may consult Murphy and Katz (1985) for a more theoretical treatment of this technique as applied to meteorological data.
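The forward selection loop described above can be sketched in a few lines. This is an illustrative implementation for a single predictand, not the TDL screening software: the function names are hypothetical, the stopping rules use the 15-predictor and 0.1% figures quoted in the text, and the variance reduction of each candidate is computed by orthogonalizing it against the predictors already chosen (a standard way to evaluate stepwise least squares).

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def forward_select(columns, y, max_predictors=15, min_gain=0.001):
    """Return indices of the chosen predictor columns, in selection order."""
    n = len(y)
    mean_y = sum(y) / n
    resid = [v - mean_y for v in y]            # unexplained part of the predictand
    total_ss = dot(resid, resid)
    if total_ss == 0.0:
        return []
    # Center every candidate so the regression constant is handled implicitly.
    cols = [[v - sum(c) / n for v in c] for c in columns]
    selected = []
    while len(selected) < max_predictors:
        best, best_gain = None, 0.0
        for j, c in enumerate(cols):
            cc = dot(c, c)
            if j in selected or cc < 1e-12:
                continue
            # Fraction of total predictand variance this candidate would remove.
            gain = dot(resid, c) ** 2 / cc / total_ss
            if gain > best_gain:
                best, best_gain = j, gain
        if best is None or best_gain < min_gain:
            break
        selected.append(best)
        b = cols[best]
        bb = dot(b, b)
        coef = dot(resid, b) / bb
        resid = [r - coef * v for r, v in zip(resid, b)]
        # Remove the chosen predictor's contribution from the remaining candidates.
        for j, c in enumerate(cols):
            if j not in selected:
                k = dot(c, b) / bb
                cols[j] = [v - k * w for v, w in zip(c, b)]
    return selected


x0 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x1 = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
x2 = [0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0]
y = [3.0 * a + 2.0 * c for a, c in zip(x0, x2)]   # x1 carries no information
print(forward_select([x0, x1, x2], y))
```

In the operational development the equations for all predictands were derived simultaneously with a common predictor set; the sketch treats one predictand only.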
In the case of the QPF system, the continuum of possible accumulated precipitation amounts was partitioned to yield a set of binary predictands during each of the 6- and 12-h periods for which forecasts were to be made. Our source for these reports was the 6-h station precipitation totals reported in the database of standard hourly aviation observations (SAOs) archived at TDL. The binary predictands represent the "occurrence" or "non-occurrence" of events which are defined to be the accumulation of precipitation greater than or equal to prescribed cutoff values. These cutoff values essentially are the lower bounds of each of the forecast categories listed in the preceding section. The predictands are treated cumulatively, meaning that a single observation of precipitation over one of the equation valid periods may qualify as a simultaneous occurrence of one or more distinct events. For instance, a single observation of 0.71 inches at a given location falling over a 12-h period is an "occurrence" of the four separate "events" defined as the accumulation of 0.01 inches, 0.10 inches, 0.25 inches, and 0.50 inches of precipitation. The remaining two possible "events" (i.e., the accumulation of 1.00 inches or 2.00 inches of precipitation) are deemed not to have occurred during the period. With the predictand defined in this manner, the regression technique yields a set of forecast equations for the probability of precipitation in a 12-h period equal to or exceeding 0.01 inches, 0.10 inches, 0.25 inches, 0.50 inches, 1.00 inches, and 2.00 inches. The set of 6-h probability equations is analogous except that there is no predictand and associated cumulative probability equation for the 2-inch amount. MOS regression equations for the entire set of predictands were developed simultaneously to promote consistency of the forecasts. That is, the same predictors will appear in the equations for all predictands. 
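The cumulative treatment of the predictands can be illustrated with the 0.71-inch example from the text. The sketch below is illustrative only; the constant and function names are hypothetical.

```python
# Cutoff amounts (inches) defining the cumulative events.
CUTOFFS_12H = (0.01, 0.10, 0.25, 0.50, 1.00, 2.00)
CUTOFFS_6H = CUTOFFS_12H[:-1]   # no 2-inch predictand for 6-h periods


def binary_predictands(amount_in, cutoffs):
    """Return 1 (occurrence) or 0 (non-occurrence) for each cumulative event."""
    return [1 if amount_in >= cutoff else 0 for cutoff in cutoffs]


print(binary_predictands(0.71, CUTOFFS_12H))
```

Regressing each of these 0/1 series on the predictors yields one probability equation per cutoff.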
This type of regression procedure which utilizes one or more "binary" predictands is sometimes called Regression Estimation of Event Probabilities, or REEP. The interested reader may consult Glahn et al. (1991) for a more detailed treatment of issues associated with multi-category regression procedures. In the operational NGM MOS QPF system, the probabilities are subsequently converted to the categorical forecasts that appear on the MOS bulletins by a selection algorithm which compares each probability to predetermined "threshold" values (see section 3.4).
The predictors appearing in the QPF equations are almost exclusively obtained from NGM fields output on isobaric surfaces after post-processing. The NGM output fields used by TDL are archived in gridded form at NCEP on the 190.5-km grid used for the obsolete Limited-Area Fine Mesh (LFM) model. Before they are offered as predictors to the screening regression routine, the NGM fields must first be interpolated from this grid to the development station locations, although in many cases certain operations are performed on the raw output before interpolation. These operations might be the calculation of common derived meteorological quantities such as vorticity or static stability, or the evaluation of entirely new "interactive" predictor quantities. Interactive predictors are combinations in which two or more common meteorological fields are used mathematically in a way which mimics how a forecaster might consider their combined effects in the process of making a forecast. In addition to the interpolated fields, station-specific climatic information is also offered during predictor selection. No surface observations were used as predictors in the development of this system, although they are often used by the forecast equations for other weather elements in the MOS package.
Another type of predictor which has been used quite extensively in the QPF development is the grid binary (Jensenius 1992). A grid binary predictor compares the value of a particular gridded field to a specified cutoff value at every gridpoint. Gridpoints where the field exceeds the cutoff are assigned a value of "1", while all others are set to "0". This field of "1"s and "0"s is then smoothed and interpolated to the station locations, resulting in an interpolated value that falls into the range 0-1. The magnitude of the interpolated value, then, gives an idea of the extent of the area surrounding the station with gridpoint values which meet or exceed a specified criterion.
Figure 2 highlights the differences between the interpolation of an ordinary continuous gridded field, and the treatment of the same field as a grid binary. The letters "STA" represent a hypothetical station location within an area containing 16 gridpoints (denoted by the crosses in the figure). Figure 2a shows a section of a hypothetical relative humidity field. Note that the station indicated in the figure lies roughly at the western edge of the analyzed region of relative humidity exceeding 70%. The values to the right of each gridpoint represent the unsmoothed values of the field (in percent); the normal statistical treatment of this field as a continuous variable would be to apply a smoother to the raw values before interpolation. Interpolation in this example leads to a station value of around 68%.
Fig. 2b illustrates the value which is obtained at STA if we treat the relative humidity field as a grid binary with a cutoff value of 70%. Only gridpoints falling inside the 70% contour are assigned a value of "1" and the entire field is then smoothed. These smoothed values are interpolated to the station location, resulting in a predictor value of 0.38. Again, this value can be thought of as giving us some idea of the areal extent of the region surrounding the station that contains gridpoint values exceeding the cutoff of 70%.
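The grid binary calculation can be sketched as follows. The bulletin does not specify the smoother or the interpolation scheme, so a 3x3 mean and bilinear interpolation are assumed here purely for illustration; the function names, the sample field, and the station position are all hypothetical and are not the values shown in Fig. 2.

```python
def grid_binary(field, cutoff):
    """Set gridpoints meeting or exceeding the cutoff to 1, all others to 0."""
    return [[1.0 if value >= cutoff else 0.0 for value in row] for row in field]


def smooth(field):
    """Assumed smoother: mean over each point and its available neighbors."""
    rows, cols = len(field), len(field[0])
    out = []
    for i in range(rows):
        out_row = []
        for j in range(cols):
            values = [field[a][b]
                      for a in range(max(0, i - 1), min(rows, i + 2))
                      for b in range(max(0, j - 1), min(cols, j + 2))]
            out_row.append(sum(values) / len(values))
        out.append(out_row)
    return out


def bilinear(field, row, col):
    """Interpolate to a station at fractional grid coordinates (row, col)."""
    i0, j0 = int(row), int(col)
    i1 = min(i0 + 1, len(field) - 1)
    j1 = min(j0 + 1, len(field[0]) - 1)
    fr, fc = row - i0, col - j0
    top = field[i0][j0] * (1 - fc) + field[i0][j1] * fc
    bottom = field[i1][j0] * (1 - fc) + field[i1][j1] * fc
    return top * (1 - fr) + bottom * fr


def grid_binary_predictor(field, cutoff, row, col):
    return bilinear(smooth(grid_binary(field, cutoff)), row, col)


# Hypothetical relative humidity field (percent) on a 4 x 4 grid.
rh = [[55, 60, 72, 80],
      [58, 66, 75, 85],
      [60, 68, 78, 88],
      [62, 69, 74, 82]]
print(grid_binary_predictor(rh, 70, 1.5, 1.2))
```

The result always lies between 0 and 1 and grows as more of the surrounding gridpoints meet the cutoff.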
In general, then, the predictors found in the QPF system fall into one of four classes: NGM "basic" forecast fields, NGM "derived" fields, NGM grid binary fields, and geoclimatic variables. Table 1 at the end of this document provides a list of specific predictor variables used in the QPF development and their classification as discussed previously. For the NGM-derived predictors, the right-hand column gives the specific vertical levels at which these variables were used. Where appropriate, the values in the square brackets indicate the grid binary cutoffs which were employed in the QPF development. Note that in the case of the grid binaries, some individual variables appear in the table with both associated levels and cutoffs. This does not imply that variables with all combinations of listed cutoffs and vertical levels were offered to the screening regression. The list merely reflects the entire range of possible cutoffs and levels over which grid binary versions of the particular variable were used.
As an illustration of the results of the regression procedure, a typical set of NGM MOS QPF equations is given in Table 2. This particular set of equations is valid for the 12-h period ending 24 hours after the 0000 UTC NGM cycle time, and is valid for stations in the mid-Atlantic coastal region of the United States (region 15 in Fig. 4a below). Table 2a lists the predictor variables and the applicable SI units and forecast projection (tau) of NGM output associated with each. Regression constants and coefficients (the a0 and an's of the general model) associated with each predictor for the six predictand amounts discussed in section 3.2 are given in Table 2b. Notice that the same set of predictors appears in the equations for all precipitation amount predictands, but the coefficients of each individual predictor differ among the various equations. Regression constants are also different for each of the six predictands. Use of the same set of predictor variables in equations for all predictand amounts helps to promote consistency in the predictand probabilities.
Interestingly enough, the results of the regression analysis performed during the development show that the statistically most important predictors in the equations are not the raw gridpoint precipitation fields as output by the model. The predictor variable which was selected most often (i.e., reduced the most variance of the dependent data) in most regions of the country over both forecast cycles and most forecast projections was the grid binary of mid-tropospheric (1000 mb - 490 mb) mean relative humidity at the 70% cutoff. This was followed by the grid binary of NGM gridpoint precipitation amount at the 0.01-inch cutoff. This implies that the model contains useful information regarding the expected precipitation amount which may not always be reflected in the model's explicit precipitation forecasts. Furthermore, the fact that the significant grid binary cutoffs are those at the lower end of the scale would imply that the model may not have much skill in forecasting the placement or magnitude of the heavier precipitation events. The most important factor would seem to be a model forecast of precipitation, regardless of amount. When examining the performance of the model and NGM MOS QPF equations on a sample of independent data, we observed behavior which would tend to support these general observations. This information is discussed further in section 4 covering the operational performance of the system.
In the example regression equation given, one will note the frequent appearance of grid binary predictors. Where both continuous and grid binary versions of important meteorological quantities were offered to the screening process, the grid binaries were overwhelmingly chosen in nearly all cases. The reasons for this are probably twofold. First, many meteorological quantities which are important to the precipitation process are inherently nonlinear. The binaries help the regression procedure to adapt to this inherent incompatibility between the nature of the predictor/predictand relationships and the formulation of the linear regression model. Secondly, model forecasts of variables such as precipitation amount, relative humidity, and vertical velocity are often characterized by sharp gradients. In other words, the spatial patterns of these variables usually consist of several localized maxima or minima (i.e., "bull's eyes") superimposed on a field which otherwise varies only moderately. Small phase errors in model forecasts often lead to large differences in the interpolated values of predictor variables at point locations. The grid binaries seem to help the MOS system to smooth out some of the phase error in NGM forecast fields which exhibit this type of variability.
3.4 Categorical Forecast Selection
After the MOS equations have been applied operationally at a particular location, the set of MOS probabilities is converted to a single categorical forecast by an algorithm which compares each forecast cumulative probability to a threshold value determined during the equation development process. One threshold is derived for each probability equation developed. The thresholds are designed to yield categorical forecasts which adhere to desired characteristics over the dependent data sample. In the case of the NGM MOS QPF, we wanted to maximize the skill of the categorical system as indicated by Critical Success Index (CSI)(1) scores. We particularly wished the categorical system to exhibit the best CSI possible on the rarest events for which the MOS system exhibited appreciable forecast skill. At the same time, however, we hoped to produce nearly unbiased categorical forecasts. Thirdly, we wished to meet the above conditions without unduly increasing the false alarm rates for each predictand. Specifically, we chose the thresholds which yielded maxima in CSI for each category under the condition that these maxima be achieved with a bias(2) close to unity. By insisting on near-unit biases we, in effect, required that the categorical forecasts closely reflect their observed relative frequencies within the sample of data used for development. If the thresholds producing the maximum CSI resulted in intolerably high false alarm rates on the dependent data, then the bias condition took precedence.
Thus, the thresholds chosen are related to the skill inherent in the MOS system. Where the MOS forecasts have limited skill, thresholds are such that the bias of forecasts should be near unity. On the other hand, where the MOS forecasts were seen to have appreciable skill, we adopted a strategy which allowed the bias to increase to the point where the skill of the categorical forecasts would be maximized. Since the skill of the NGM MOS QPF system tends to decline with rarity of the predictand event and increasing forecast projection (see Section 4 below), thresholds for all events at the extreme projections were derived by using the unit bias condition. This is also true for the rarest events at all projections. For the more common events over the contiguous U.S., thresholds were chosen which resulted in biases somewhat greater than 1.0 at the earliest projections. Biases of up to a maximum of 1.4 in the case of the second-rarest event (the 12-h forecasts of 1.00 inch or more and the 6-h forecasts of 0.50 inches or more) were permitted. In this case, slight overforecasting was allowed because we felt it especially important to maximize the performance of the categorical forecasts of the rarest event for which the system possessed appreciable skill. For the Alaska forecasts, however, we found that the system performed more reliably when thresholds were set on the basis of the unit bias condition for all equations.
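The threshold strategy can be illustrated with a simple search over candidate thresholds. The sketch below uses the conventional contingency-table definitions of CSI (hits divided by the sum of hits, misses, and false alarms) and bias (number of forecasts of the event divided by number of observations of the event). The function names, the candidate list, the bias bounds, and the sample data are all hypothetical, and the false alarm rate check mentioned above is omitted.

```python
def contingency(probs, observed, threshold):
    """Count hits, misses, and false alarms for a given probability threshold."""
    hits = misses = false_alarms = 0
    for p, occurred in zip(probs, observed):
        forecast = p > threshold
        if forecast and occurred:
            hits += 1
        elif occurred:
            misses += 1
        elif forecast:
            false_alarms += 1
    return hits, misses, false_alarms


def csi(hits, misses, false_alarms):
    total = hits + misses + false_alarms
    return hits / total if total else 0.0


def bias(hits, misses, false_alarms):
    observed = hits + misses
    return (hits + false_alarms) / observed if observed else float("inf")


def choose_threshold(probs, observed, candidates, min_bias, max_bias):
    """Pick the candidate with the highest CSI whose bias lies within bounds."""
    best, best_csi = None, -1.0
    for threshold in candidates:
        table = contingency(probs, observed, threshold)
        if min_bias <= bias(*table) <= max_bias and csi(*table) > best_csi:
            best, best_csi = threshold, csi(*table)
    return best


probs = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
observed = [1, 1, 0, 1, 0, 0]
candidates = [0.2, 0.35, 0.5, 0.7]
print(choose_threshold(probs, observed, candidates, 0.9, 1.1))  # near-unit bias
print(choose_threshold(probs, observed, candidates, 0.9, 1.4))  # bias up to 1.4
```

Relaxing the upper bias bound lets the search accept a lower threshold with a higher CSI, which mirrors the slight overforecasting permitted at the earliest projections.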
Once the appropriate thresholds have been obtained, they are applied to the operational forecasts according to an algorithm which compares the probability forecasts for each predictand amount to their corresponding thresholds. The comparison is performed in a set manner, always beginning with the rarest event and proceeding, in turn, to the most common. The first predictand amount encountered for which the forecast probability exceeds the threshold determines the lower bound of the category selected for dissemination in the NGM MOS bulletin. This procedure is illustrated in Fig. 3. The figure shows the set of six 12-h probability forecasts which were issued for Washington, D.C. (DCA) at 1200 UTC on October 29, 1993, and valid for the period ending 48 hours later, or at 1200 UTC October 31, 1993. Individual precipitation amount probabilities are indicated by the bar graphs, while the stairstepped solid line shows the corresponding threshold value for each predictand. Corresponding category numbers appear across the top part of the graph, above the thresholds. In arriving at the categorical forecast which ultimately appeared on the NGM MOS bulletin (shown in Fig. 1), the algorithm compared each forecast probability to the corresponding threshold, beginning with the pair for the 2.00-inch predictand as indicated by the arrow at the right-hand side of Fig. 3. Probability forecasts for both 0.10 inches and 0.01 inches are seen to exceed their corresponding thresholds. However, the algorithm selects the heaviest amount for which the threshold is exceeded, resulting in a forecast of category "2". Note that this forecast appears in Fig. 1, to the right of the slash separating the two entries valid at 1200 UTC October 31.
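The selection algorithm for a single forecast period can be sketched as a scan from the rarest event to the most common. The function name is hypothetical, and the probabilities and thresholds in the example are invented for illustration; they are not the values plotted in Fig. 3.

```python
def select_category(probabilities, thresholds):
    """Both lists run from the most common event (0.01 in.) to the rarest.

    Returns the category whose lower bound is the heaviest amount for which
    the forecast probability exceeds its threshold, or 0 if none does.
    """
    for category in range(len(probabilities), 0, -1):
        if probabilities[category - 1] > thresholds[category - 1]:
            return category
    return 0


# Hypothetical 12-h probabilities and thresholds for the six predictands.
probs = [0.62, 0.35, 0.15, 0.06, 0.01, 0.00]
thresholds = [0.40, 0.30, 0.25, 0.20, 0.15, 0.10]
print(select_category(probs, thresholds))
```

With these values the 0.10-inch and 0.01-inch probabilities both exceed their thresholds, and the scan stops at the heavier of the two.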
3.5 Data Stratification
The data sample for our development of the NGM MOS QPF for stations in the contiguous United States consisted of a 7-year collection of SAO precipitation reports covering the period from October 1986 to September 1993. Since NGM MOS QPF for Alaska was developed after completion of the system for the contiguous U.S., we were able to take advantage of a full 8-year dependent data sample with the addition of data covering the period from October 1993 to September 1994. Because of the differences in the nature of meteorological processes governing precipitation events during the winter and summer months, this sample was separated into two seasons: cool (October 1 - March 31) and warm (April 1 - September 30). Separate, independent developments of regression equations were performed for each season, and this seasonal division is adhered to when the equations are used operationally to generate MOS guidance forecasts. That is, the operational MOS system uses the cool season set of equations to generate QPF guidance from October 1 to March 31, and the warm season set during the rest of the year.
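The seasonal switch between equation sets amounts to a simple date test. The sketch below is illustrative; the function name is hypothetical, and it is an assumption that the calendar date of the forecast cycle is what determines the equation set.

```python
from datetime import date


def equation_season(cycle_date):
    """Return 'cool' for October 1 - March 31 and 'warm' for April 1 - September 30."""
    return "cool" if cycle_date.month >= 10 or cycle_date.month <= 3 else "warm"


print(equation_season(date(1993, 10, 29)))
print(equation_season(date(1994, 4, 1)))
```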
It was clear from inspection of our data that the heaviest precipitation events occur quite rarely, especially in the dry, mountainous western United States. The relative scarcity of observations of a 1- or 2-inch accumulation within a 12-h period and of a 1-inch accumulation over a 6-h period posed a particular development problem, since the probability equations for all precipitation amounts were to be derived simultaneously. Even though a particular station might frequently experience light precipitation, it was often true that relatively few reports of the heaviest amounts were available at the same location. Linear regression techniques tend to be unstable mathematically when applied to small samples of predictand data so, even though we enjoyed a 7-year data collection period, we were essentially faced with an insufficient sample of these rare "events" in many areas. Thus, we recognized that it would be difficult to produce reliable single-station probability equations for amounts exceeding the highest cutoffs.
It therefore was necessary to combine data from several neighboring development sites in order to derive a single set of stable equations valid at all sites within each group. This process is known as "regionalization".(3) Figure 4 shows the regions used for development of the cool season QPF system for both the contiguous U.S. and Alaska, while Fig. 5 presents those used in warm season development. For the contiguous U.S., note that 19 regions were used for both seasons. After examination of the data for availability and quality control, we determined that data from 396 of the operational sites were reliable enough to be used for development of the warm season QPF system, while data at 399 were sufficiently good to be used for the cool season. For the Alaska QPF system, on the other hand, quality control of data presented us with many challenges. We found that many sites were subject to partial-day closure or prone to intermittent reporting. This was particularly true of remote locations during the winter months. In many instances, it was difficult to determine whether liquid-equivalent amounts reported during or after snowfall events represented amounts which had actually fallen during the time period for which the observation was valid, or had resulted from melting and/or gauge readings that had occurred well after the events had ended. When quality control was complete, data from only 27 sites could be used for equation development.
In determining which stations to include within a given region, we paid particular attention to the spatial distribution of the most important predictors. These were determined by single-station regression for the categories where sufficient data were available. Stations where the model and the predictands were similarly related were grouped together wherever possible. In addition, climatological information such as the observed relative frequency of both light and heavy precipitation events and the perceived effects of terrain also played a role in our final selection of regions. In short, we wished to group stations where the climatology, the terrain, and the model physics depicted the QPF problem in a consistent manner.
Even after regionalization, we still could not dependably forecast the rarest events in certain areas of the country. For instance, at many stations in the western U.S., the 7-year observed relative frequency of 2 inches of accumulated precipitation within a 12-h period was less than 0.5%. In other words, there was less than one observation of the event out of every 200 taken. Seven complete seasons of data at a given site amount to around 1275 total observations, leaving an average of 3 or 4 "events" in the entire developmental sample. Under these circumstances, we would need to combine data from stations over too large an area to achieve stable forecasts. When data from too many stations are combined, the danger is that some of the MOS system's ability to account for site-specific climatologies and systematic model errors could be compromised.
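The sample-size arithmetic can be checked with a short calculation. The sketch assumes one 12-h observation per day for a given forecast cycle and projection, and uses the seven cool seasons from 1986-87 through 1992-93; the function names and the 0.3% frequency in the example are hypothetical.

```python
from datetime import date


def cool_season_days(start_year):
    """Days from October 1 of start_year through March 31 of the following year."""
    return (date(start_year + 1, 4, 1) - date(start_year, 10, 1)).days


# Seven cool seasons, 1986-87 through 1992-93 (two include a leap day).
total_obs = sum(cool_season_days(year) for year in range(1986, 1993))


def expected_events(relative_frequency, n_obs=total_obs):
    """Expected number of occurrences of an event in the developmental sample."""
    return relative_frequency * n_obs


print(total_obs)
print(expected_events(0.003))   # an assumed relative frequency of 0.3%
```

An event observed only 3 or 4 times in a sample of this size has a relative frequency of roughly 0.2% to 0.3%, which illustrates why single-station equations for such amounts would be unstable.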
With this in mind, we elected not to produce probability equations for the rarest events (2 inches in 12 hours and 1 inch in 6 hours) for most of the western U.S. (except for the northwest Pacific coast), the northern Plains, and the Great Lakes region. In addition, the equation for 1 inch in 12 hours was not developed for the cool season in the mountainous West. Because of these missing probability equations, categorical forecasts of categories 5 (1.00-1.99 inches) and 6 (2.00 inches or more) are not made for these areas. Where forecasts of the rarest events are not made, a forecast of the uppermost available category should be strictly interpreted as a prediction of precipitation amount greater than or equal to the lower cutoff of the category. Tables 3 and 4 provide a complete breakdown by region and forecast projection of the specific categorical forecasts not produced.
In the case of Alaska, these climatological restrictions were even more severe, especially during the cool season. Despite the frequent occurrence of heavy snowfall, the extreme Alaskan temperatures lead to a very low liquid water content for many of these events. As a result, observations of liquid-equivalent amounts exceeding 0.5 inches in 12 hours are quite rare. Therefore, it was impractical to attempt forecasts of the uppermost three categories in the northern portions of the state. In the extreme north (region 1 of Fig. 5b), forecasts of 0.25 inches or more in either 6 or 12 hours (category 3) are also unavailable for the warm season. This results from a combination of the rarity of the event and decisions made during the development process to enhance the performance of the MOS system for the more common events. It would have been possible to produce equations for the 0.25-inch amount if data from this region were combined with those from neighboring regions. During developmental tests on independent data, however, we found that the statistical relationships between NGM forecasts and precipitation events in this region were quite different from those in the surrounding area. Therefore, pooling of data with other sites led to a deterioration in MOS QPF performance. Thus, the omission of forecasts for rarer events was tolerated in order to improve the overall skill and consistency of forecasts in this region. Farther south, where warmer temperatures are more prevalent and thunderstorms occasionally are observed in the summer months, the data sample permitted the development of stable statistical equations for most predictands. The microclimate of one station, Yakutat, was sufficiently distinctive to justify the development of a set of QPF equations valid for that location alone. At Yakutat, the observing site is located in a protected bay, surrounded by 18,000-foot mountains. 
At all times of the year, the acute orographic forcing of moisture-laden air makes Yakutat by far the wettest location in the state. Thus, even without regionalization, there was no shortage of heavy precipitation events in our dependent data sample.
4. MOS QPF PERFORMANCE
4.1 Verification of Operational Forecasts
At the end of the first year following system implementation, we performed an extensive verification covering categorical forecasts made at all MOS forecast sites up to that time. Performance of the NGM MOS QPF during the 1993-94 cool and 1994 warm seasons was compared to that of the raw NGM gridpoint precipitation fields after interpolation to MOS station locations. Verification results were compiled for all of the 718 MOS sites in the contiguous U.S. for which verifying observations were available. This yielded a verification sample of approximately 60,000 forecasts for each projection. A representative sample of these results is presented in Figs. 6 and 7. These figures compare the CSI scores of the NGM MOS QPF system with those of the raw NGM gridpoint precipitation forecasts at each projection from the 0000 UTC model cycle time for observed amounts of 0.5 inches or more. Figure 6 shows the comparative scores for the 1993-94 cool season, with the 6-h forecasts in part (a) and 12-h forecasts in part (b). Figure 7 presents the analogous information for the 1994 warm season. CSI scores for other predictand amounts show similar trends and are not shown here.
Note that the performance of precipitation forecasts from both the NGM and the NGM MOS QPF is superior during the cool season. During the winter months, precipitation is more likely to be forced by synoptic-scale weather systems which are well sampled and resolved by the NGM's regional analysis
Since the Alaska NGM MOS QPF system was developed much later than the equations for the contiguous U.S., the 1993-94 cool and warm seasons were part of the developmental data sample for the operational Alaska QPF. However, data from this period were withheld from the dependent sample during development of the Alaska NGM MOS QPF system so that its likely performance could be evaluated. Results from these independent-data tests are shown in Figs. 8 and 9 for the cool and warm seasons, respectively. As can be seen, results of the verifications for Alaska show many of the same characteristics as seen in the data for the rest of the U.S., although the scores are more variable due to the sparse data coverage over Alaska. With only 27 stations contributing reliable verification data, there were only about one-twentieth as many test cases as were available for the verification of the MOS QPF over the rest of the country. For this reason, the Alaska scores are more unstable and do not show the same monotonic decline in skill seen in the results for the contiguous U.S. One notable characteristic persistent over all projections, however, is that the CSI scores for the NGM gridpoint precipitation are considerably poorer than over the rest of the U.S. While lighter amounts of precipitation occur much more frequently in Alaska than they do over the contiguous U.S., the 0.5-inch event occurs with about the same frequency. Apparently, the NGM has more trouble forecasting the significant amounts over Alaska, even though they are no less common. Despite this, the skill of the MOS QPF is comparable to that achieved over the contiguous U.S. This suggests that there are signals in various NGM fields which are useful for predicting heavier precipitation, but these signals are not reflected in the model gridpoint precipitation output.
As a result, the difference in skill between the NGM MOS and the NGM is so great that the 54-h MOS forecasts almost always are more skillful than those from the model at any projection. The only exception to this is the warm season 6-h forecasts valid at the 42-h projection, where the NGM exhibits uncharacteristically high skill. Due to the extremely small number of verifying cases, this may be attributable to instability in the scores.
We also examined the dependence of CSI and bias scores on the forecast amount. Figure 10 shows the results of this analysis for both seasons over the contiguous U.S., while Fig. 11 provides the same information for the Alaska forecasts. Scores are based on cumulative amounts, meaning that cases with forecasts or observations of 2 inches or more are included in computing the scores for the 1-inch predictand, and so on. The solid lines bisecting the figures at a constant value of "1" represent perfect forecasts in terms of both CSI and bias.
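The cumulative scoring convention can be illustrated with a short Python sketch. It assumes that forecasts and observations are stored as category numbers, that CSI is the threat score X / (X + Y + Z), and that bias is the ratio of events forecast to events observed. The function name and the sample values are illustrative only and are not taken from the verification software actually used.

```python
def cumulative_scores(forecast_cats, observed_cats, min_category):
    """Return (CSI, bias) for the event 'category >= min_category'.

    Scoring is cumulative: a forecast or observation of category 6 also
    counts as an event when scoring category 5, and so on.
    """
    hits = misses = false_alarms = 0
    for forecast, observed in zip(forecast_cats, observed_cats):
        forecast_event = forecast >= min_category
        observed_event = observed >= min_category
        if forecast_event and observed_event:
            hits += 1            # X: event forecast and observed
        elif observed_event:
            misses += 1          # Y: event observed but not forecast
        elif forecast_event:
            false_alarms += 1    # Z: event forecast but not observed

    scored = hits + misses + false_alarms
    n_observed = hits + misses
    csi = hits / scored if scored else float("nan")
    bias = (hits + false_alarms) / n_observed if n_observed else float("nan")
    return csi, bias


# Hypothetical categorical forecasts and verifying observations.
forecast = [0, 4, 5, 6, 3, 5, 0, 4]
observed = [0, 5, 4, 6, 4, 5, 1, 0]

csi_cat5, bias_cat5 = cumulative_scores(forecast, observed, 5)
csi_cat4, bias_cat4 = cumulative_scores(forecast, observed, 4)
```

In this made-up sample the bias is 1 for both amounts, yet the CSI is well below 1, which shows that a unit bias by itself says nothing about whether the individual forecasts were correct.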
Again, the MOS QPF shows similar characteristics in both the contiguous U.S. and Alaska. In general, the CSI scores indicate that the MOS QPF maintained a consistent edge in performance over the NGM gridpoint precipitation during both seasons and for all forecast amounts. Overall, the skill of both systems is quite good for the lightest amounts, somewhat less so for amounts in the range of 0.1 to 0.5 inches, and deteriorates rapidly for accumulations above 0.5 inches. Apparently, both the MOS and model have great difficulty with the forecasts of the rarest events.
The curves depicting the bias characteristics show that the NGM MOS QPF maintains a more consistent bias across all forecast amounts, and that this bias remains close to unity. The deviations observed in the MOS QPF 2-inch CSI and bias values are likely a reflection of instability in the scores due to the extremely small number of observed cases in the 1-year independent sample. Unlike the MOS QPF, however, the NGM gridpoint precipitation fields exhibit a pronounced tendency to overforecast the lighter amounts while grossly underforecasting the heaviest events. As Fig. 10 indicates, NGM overforecasting of the lighter amounts is most severe during the warm season over the contiguous U.S., whereas underforecasting of the heavier amounts is about equal in severity, regardless of location and the time of year. Similar NGM bias characteristics have been well documented in studies done at the NCEP Hydrometeorological Prediction Center (Junker et al. 1989).
As mentioned previously, the NGM MOS QPF system makes use of cumulative probability information from a number of forecast equations to arrive at the best category forecasts. Communications bandwidth considerations limit our ability to transmit more than the single, categorical forecast in the alphanumeric messages. Unfortunately, useful information contained in the set of probabilities frequently can be lost in the process of conversion to a categorical forecast. This is especially true in situations where, for example, the forecast probabilities for a given location are all nearly equal to their thresholds or where the probabilities for two or more predictand amounts are nearly equal to each other. Given the nature of the categorical selection algorithm, the rarer event may be chosen as the "best" categorical forecast, even though the whole of the probabilistic information suggests that lesser amounts may be just as likely. Thus, there will sometimes be situations in which the categorical forecasts offer an incomplete or somewhat misleading representation of the information contained in the MOS QPF probabilities.
These properties of the categorical selection procedure can manifest themselves in a number of ways. First of all, since the 6- and 12-h MOS QPF were developed independently of each other, the probability/threshold system can sometimes produce inconsistencies between the 6- and 12-h categorical forecasts. We do not post-process the MOS QPF categorical forecasts to enforce consistency; therefore, contradictory 6- and 12-h forecasts may at times appear on the NGM MOS bulletins. Figure 12 depicts an example of this problem. Both 6-h forecasts and the single 12-h forecast for Atlantic City, NJ (ACY) covering the period 24-36 hours after 0000 UTC 25 February 1994 are highlighted in the figure. Note that while both 6-h forecasts call for no measurable precipitation, the 12-h forecast covering the same time period indicates that a category "4" event is expected. In this case, the 12-h probabilities for a number of predictands were very near their corresponding thresholds, with the probability of 0.5 inches or more only slightly exceeding its threshold. As was to be expected, the selection algorithm chose category "4", even though the probabilities for all other amounts did not exceed their thresholds and the 6-h probabilities yielded a forecast of category "0".
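A minimal Python sketch of a probability/threshold selection of this kind is given below. It assumes that the rule is to choose the highest category whose probability exceeds its threshold. That assumption is consistent with the Atlantic City example, but it is not a transcription of the procedure of Section 3.4. The function name and all numerical values are hypothetical.

```python
def select_category(probabilities, thresholds):
    """Pick a categorical forecast from cumulative probabilities.

    Both arguments map a category number to a value between 0 and 1.
    Categories for which no equation exists are simply left out of the
    mappings. Assumed rule: the highest category whose probability
    exceeds its threshold wins; if none does, category 0 is returned.
    """
    for category in sorted(probabilities, reverse=True):
        if probabilities[category] > thresholds[category]:
            return category
    return 0


# Hypothetical 12-h values in which every probability sits near its
# threshold and only the 0.5-inch predictand (category 4) exceeds it.
probabilities = {1: 0.28, 2: 0.20, 3: 0.14, 4: 0.11, 5: 0.03, 6: 0.01}
thresholds = {1: 0.30, 2: 0.22, 3: 0.15, 4: 0.10, 5: 0.06, 6: 0.04}

chosen = select_category(probabilities, thresholds)
```

With these values the function returns category 4 although the probabilities for the lighter amounts fall just short of their thresholds, which is the kind of outcome described for the Atlantic City forecast.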
Inconsistencies of the type shown can usually be resolved quite easily by examining forecasts valid for the same time period but made at neighboring locations. This may be accomplished either by examining individual NGM MOS messages for other forecast locations or by plotting the categorical QPF and inferring the proper categorical forecast which preserves coherence. In addition, forecasts for other weather elements appearing in the NGM MOS message may provide information which will help the forecaster resolve QPF inconsistencies. In addition to the obvious mismatch between the 6- and 12-h categorical QPF in Fig. 12, the probability of precipitation forecasts and thunderstorm probability forecasts also remained quite low. Both of these are inconsistent with what might be expected from a category 4 rainfall event. Furthermore, snowfall amount equations indicated a 12-h forecast of category "1", which calls for a 1-2 inch accumulation, again inconsistent with the indicated 0.5 inches or more of liquid water from the 12-h QPF.
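A mismatch of the kind shown in Fig. 12 can also be flagged automatically. The Python sketch below is one possible check, not part of the NGM MOS system, which performs no such post-processing. It assumes the category limits listed in the code, in hundredths of an inch. Only the limits for categories 4 to 6 and the meaning of category 0 follow from the text; the limits assigned to categories 1 to 3 are assumptions and should be replaced by the definitions in Section 2.1.

```python
INF = float("inf")

# Assumed 12-h category limits in hundredths of an inch (inclusive).
# Categories 1-3 are assumptions; categories 0 and 4-6 follow the text.
BOUNDS_12H = {
    0: (0, 0), 1: (1, 9), 2: (10, 24), 3: (25, 49),
    4: (50, 99), 5: (100, 199), 6: (200, INF),
}
# The 6-h forecasts stop at category 5, taken here as 1 inch or more.
BOUNDS_6H = {**{c: BOUNDS_12H[c] for c in range(5)}, 5: (100, INF)}


def is_consistent(first_6h_cat, second_6h_cat, cat_12h):
    """True if the two 6-h categories could add up to the 12-h category."""
    low_a, high_a = BOUNDS_6H[first_6h_cat]
    low_b, high_b = BOUNDS_6H[second_6h_cat]
    low_12, high_12 = BOUNDS_12H[cat_12h]
    # The 12-h total must lie between the sums of the 6-h limits, so the
    # two ranges have to overlap for the forecasts to agree.
    return low_a + low_b <= high_12 and low_12 <= high_a + high_b


# The Atlantic City case: two 6-h forecasts of category 0 against a
# 12-h forecast of category 4.
acy_consistent = is_consistent(0, 0, 4)
```

A check of this type only identifies the contradiction; deciding which of the forecasts to trust still requires the comparison with neighboring stations and other weather elements described above.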
Secondly, we have found that certain properties of the probability/threshold pairs can influence the performance of the MOS QPF categorical forecasts. Specifically, in certain circumstances involving vigorous synoptic-scale systems, the categorical guidance may overforecast the magnitude of the heaviest precipitation amounts, especially at the later projections. This is a result of the reliability and skill characteristics inherent in MOS regression equations and thresholds. We have observed that the MOS probability forecasts are also generally unbiased over their entire range, which is a consequence of the least-squares error minimization of the linear regression technique. By unbiased, we mean that the observed relative frequency of a given predictand is, over a long period of record, equal to the average forecast probability. While this must be true in a general sense over the complete range of forecast probabilities, it is possible for MOS probability equations to contain biases over a given subinterval of values. When statistical forecasts remain strictly unbiased in piecewise fashion over the entire range of possible forecast values, the forecasts are also said to be reliable. For example, if the mean MOS forecast probability of measurable precipitation is 30%, precipitation should fall 30% of the time. But if we consider only those occasions for which the forecast probability is 30%, precipitation may not necessarily fall in 30% of them unless the forecasts are strictly reliable. The NGM MOS QPF system was developed to yield a high degree of reliability on the independent test data.
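The distinction between overall unbiasedness and reliability can be made concrete with a small Python sketch. The data below are invented for illustration, and the function names and the choice of ten equal-width probability bins are assumptions rather than a description of how the MOS forecasts were actually evaluated.

```python
def overall_bias_check(probabilities, outcomes):
    """Return (mean forecast probability, observed relative frequency)."""
    n = len(probabilities)
    return sum(probabilities) / n, sum(outcomes) / n


def reliability_table(probabilities, outcomes, n_bins=10):
    """Group forecasts into probability bins and summarize each bin.

    Each row is (number of forecasts, mean forecast probability,
    observed relative frequency). Empty bins are omitted.
    """
    bins = [[] for _ in range(n_bins)]
    for p, o in zip(probabilities, outcomes):
        index = min(int(p * n_bins), n_bins - 1)
        bins[index].append((p, o))
    rows = []
    for members in bins:
        if members:
            count = len(members)
            mean_p = sum(p for p, _ in members) / count
            frequency = sum(o for _, o in members) / count
            rows.append((count, mean_p, frequency))
    return rows


# Invented sample: ten forecasts of 15% and ten forecasts of 55%,
# with 1 marking an occurrence of the event and 0 a non-occurrence.
probabilities = [0.15] * 10 + [0.55] * 10
outcomes = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0] + [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

mean_forecast, observed_frequency = overall_bias_check(probabilities, outcomes)
rows = reliability_table(probabilities, outcomes)
```

For this sample the mean forecast probability and the observed frequency are both 35%, so the forecasts are unbiased overall. Within the individual bins, however, the 15% forecasts verify 40% of the time and the 55% forecasts only 30% of the time, so the forecasts are not reliable.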
For a categorical system which has been designed to maintain an approximate unit bias over all forecast categories and projections, it follows from the preceding considerations that the thresholds for the MOS QPF equations also must decline with increasing projection and precipitation amount. Thus, as is true of the average MOS probabilities, thresholds for the rarest events at the later projections tend to be quite small. For example, thresholds for 60-h forecasts of the 2-inch accumulations in 12 hours typically range from about 4 to 8 percent, depending on geographical region. While these thresholds result in MOS forecasts of category "6" approximately as many times as observed, reliability considerations suggest that these forecasts verify only 4 to 8 percent of the time. The basic problem is, of course, the unavoidable lack of skill in predicting extreme events at extreme projections. With a strong, synoptic-scale system which is well represented in the model, the forecast MOS probabilities run much higher than average, causing the (relatively low) thresholds to be exceeded. This can lead to overforecasting by the categorical guidance in these cases, especially where the synoptic-scale forcing is most intense.
We have observed that the NGM MOS QPF often overforecasts amounts by about one category where the statistical signals for heavy precipitation are the strongest. As might be expected from the above discussion, this most often occurs at the latest projections. In situations where the NGM expects a significant event, the 54- and 60-h MOS QPF may contain a number of forecasts of extreme events (6-h category "5" or 12-h category "6") embedded along the expected axis of heavy precipitation. We would advise forecasters to regard these rare-event forecasts merely as an advance warning that signals exist within the NGM which indicate that the model expects a significant precipitation event, and not necessarily that the amounts ultimately observed will reach or exceed those associated with the particular MOS rare-event categories. If MOS forecasts of extreme events are seen to persist in subsequent forecast cycles (i.e., as the anticipated event approaches), then the forecaster should have greater confidence that precipitation amounts in line with the actual MOS categorical forecast values will be observed. Furthermore, since the MOS QPF utilizes predictors other than the NGM gridpoint precipitation fields, signals of an extreme event may not be observed in the model precipitation forecasts themselves, or may show up somewhat later than initially observed in the MOS QPF.
Figure 13 presents a composite of the observed 12-h precipitation amounts (a), NGM MOS QPF (b,c), and NGM gridpoint precipitation forecasts (d,e,f) for the case of 0000 UTC 31 October 1993. This case illustrates many of the characteristics of the NGM MOS QPF we have discussed previously. It is representative of several 1993-94 cool season cases we have examined in order to better understand how the NGM MOS QPF might behave under operational conditions. The cases we examined generally featured high-amplitude synoptic-scale troughs centered in the east-central portion of the country. This synoptic pattern places the mid-Atlantic region in broad southwesterly flow aloft, with outbreaks of precipitation occurring as short-wave systems migrate through the trough position. The configuration of surface and upper-level flows associated with this pattern allows an ample flux of moisture into the region from the Gulf of Mexico.
In the particular situation of Fig. 13, general 12-h rainfall totals of 0.5 inches or more were observed from south-central Pennsylvania through central South Carolina and the Georgia coast, largely confined to the eastern slopes of the Appalachian Mountains (presumably due to orographic lift of moisture in the lowest levels of the atmosphere). Within this area, 12-h totals of 0.7-0.8 inches were observed surrounding the Washington, D.C. area, with additional maxima at Greensboro, North Carolina, and Charleston, South Carolina. Panels (b) and (c) of the figure indicate that the MOS QPF correctly identified the axis of heaviest precipitation as much as 48 hours in advance. While the location of this axis is correctly placed, the MOS QPF guidance has overforecast amounts along this axis by about one category. As the event approached, the area of category "5" forecasts along this axis became smaller and centered more over Maryland and Delaware as can be seen in the 36-h MOS forecasts. By the 36-h projection, the overforecasting has diminished and the MOS QPF correctly has confined most of the 0.5-inch precipitation amounts to the east of the Appalachian ridges. Overforecasting of categorical amounts has occurred in this case because forecast probabilities for the heaviest predictand amounts were much higher than average. As a result, the relatively low 48-h thresholds for the uppermost categories were exceeded over a wide area. Since the forecast probabilities of the 1-inch predictand remained relatively constant as the event approached, the higher thresholds associated with the 36- and 24-h NGM MOS QPF (not shown) produced an overall reduction in the areal coverage of 1-inch amounts.
On the other hand, the NGM gridpoint precipitation forecasts are somewhat misleading as to the spatial distribution of precipitation, which illustrates the MOS QPF lead time advantage reflected in the aggregate verification statistics. Forty-eight hours before the event, the NGM has grossly overforecast amounts in the northern portion of the region (showing a band in excess of 1.4 inches from near DCA into southern New Jersey). Interestingly, this runs contrary to the general model tendency to underforecast the heaviest precipitation amounts. At the same time, the model has failed to indicate that much precipitation would extend farther to the south. Note that the model seems to emphasize that the heaviest precipitation will occur in an east-west band, and that it expects significant "wrap around" precipitation to occur west of the Appalachians. It was not until 24 hours before the event (Fig. 13f) that the model ceased to call for significant precipitation west of the mountains and brought amounts more in line with those eventually observed on the eastern slopes.
This set of forecasts illustrates some of the general characteristics observed in our sample of cases, especially when the primary maximum of observed precipitation occurred east of the Appalachians. The model itself consistently placed too much emphasis on synoptically-driven maxima near the northern edge of precipitation systems (presumably near the track of the upper-level short-wave vortices), and too little on precipitation occurring farther to the south. Underforecasting of precipitation in cool season NGM forecasts over the southern U.S. was also seen by Junker et al. (1989) in their analysis of situations featuring appreciable Gulf inflow.
The MOS QPF categorical forecasts tended to more accurately depict the overall shape of the observed precipitation pattern at the later projections. In situations like that of Fig. 13 where the MOS and the NGM eventually agreed on the pattern characteristics, the MOS generally arrived at a more correct orientation of precipitation patterns about 12 hours ahead of the model. Interestingly enough, the 36-h forecasts from both the MOS and the model seemed to be somewhat better than the 24-h forecasts valid at the same time. Where we observed significant outbreaks of overrunning precipitation, the MOS forecasts also seemed to correct for the model underforecasting often observed in these situations. The MOS QPF also frequently hinted at the presence of convective activity, although MOS forecasts of amounts were usually far short of the totals observed in heavy convective outbreaks. This is likely a result of the inability of both the MOS system and the underlying NGM to depict mesoscale features of the precipitation field.
We must emphasize that the above qualitative impressions of MOS QPF performance are reflective of a limited number of cases under very specific synoptic conditions. It is likely that the MOS QPF performs quite differently under other weather regimes, in other months of the year, and in other areas of the country. Indeed, this is suggested by the general, unbiased nature of MOS forecasts. While MOS forecasts can correct for systematic errors in model forecasts, they probably do not consistently compensate for errors made only in specific situations. We strongly advise forecasters to develop a feel for the performance of the NGM MOS QPF guidance over a broad range of geographic areas and synoptic conditions. Knowledge of the error characteristics of the guidance in a variety of situations is likely to be the forecaster's best interpretive tool.
Antolik, M. S., 1995: NGM-based quantitative precipitation forecast guidance: Performance tests and practical applications. Preprints 14th Conference on Wea. Analysis and Forecasting, Dallas, Amer. Meteor. Soc., 182-187.
Antolik, M. S., 1996: Toward an objective NGM-based expected-value quantitative precipitation forecast system. Preprints 13th Conference on Probability and Statistics in the Atmospheric Sciences, San Francisco, Amer. Meteor. Soc., 57-64.
Dallavalle, J. P., J. S. Jensenius, Jr., and S. A. Gilbert, 1992: NGM-based MOS guidance--The FOUS14/FWC message. NWS Technical Procedures Bulletin No. 408, National Oceanic and Atmospheric Administration (NOAA), U.S. Department of Commerce, 16 pp.
Dallavalle, J. P., S. A. Gilbert, and F. G. Meyer, 1995: NGM-based MOS guidance for Alaska--The FOAK13/FOAK14 messages. NWS Technical Procedures Bulletin No. 425, National Oceanic and Atmospheric Administration (NOAA), U.S. Department of Commerce, 13 pp.
Glahn, H. R., and D. A. Lowry, 1972: The use of Model Output Statistics (MOS) in objective weather forecasting. J. Appl. Meteor., 11, 1203-1211.
Glahn, H. R., A. H. Murphy, L. J. Wilson, and J. S. Jensenius, Jr., 1991: Lectures and papers presented at the WMO training workshop on the interpretation of NWP products in terms of local weather phenomena and their verification, Wageningen, The Netherlands, World Meteorological Organization, Geneva, 340 pp.
Hoke, J. E., N. A. Phillips, G. J. DiMego, J. J. Tuccillo, and J. G. Sela, 1989: The regional analysis and forecast system of the National Meteorological Center. Wea. Forecasting, 4, 323-334.
Jensenius, J. S., Jr., 1992: The use of grid-binary variables as predictors for statistical weather forecasting. Preprints 12th Conference on Probability and Statistics in the Atmospheric Sciences, Toronto, Amer. Meteor. Soc., 225-230.
Junker, N. W., J. E. Hoke, and R. H. Grumm, 1989: Performance of NMC's regional models. Wea. Forecasting, 4, 368-390.
Miller, D. T., 1993: NGM-based MOS wind guidance for the contiguous United States. NWS Technical Procedures Bulletin No. 399, National Oceanic and Atmospheric Administration (NOAA), U.S. Department of Commerce, 19 pp.
Murphy, A. H., and R. W. Katz, 1985: Probability, Statistics, and Decision Making in the Atmospheric Sciences. Westview Press, 545 pp.
Table 1. Predictor variables offered as candidates to the screening regression process during development of NGM MOS QPF equations. The right-hand column of the table indicates the vertical pressure levels at which predictors are valid. In the case of the grid binary predictors, quantities appearing in square brackets indicate the possible grid binary cutoffs. A comprehensive listing of levels and cutoffs is given for brevity; i.e., not all combinations of grid binary levels and cutoffs were actually used in development.
|NGM Basic Fields||Levels|
|6-H Accumulated Precipitation||-|
|12-H Accumulated Precipitation||-|
|Earth-Relative U/V Wind Components||10m, 950 mb, 850 mb, 700 mb|
|NGM Derived Fields||Levels|
|Moisture Divergence||950 mb, 850 mb, 700 mb|
|Relative Vorticity||950 mb, 850 mb|
|Advection of Relative Vorticity||700 mb, 500 mb|
|Q-Vector Divergence||500 mb|
|Vertical Velocity * Mean Rel. Humidity¹||950 mb, 850 mb, 700 mb|
|NGM Grid Binaries [Cutoffs]||Levels|
|Vertical Velocity [1 µb/s, 2 µb/s, 3 µb/s, 5 µb/s, 9 µb/s]||950 mb, 850 mb, 700 mb, 500 mb|
|Relative Humidity [70%, 90%]||950 mb, 900 mb, 850 mb, 700 mb, 500 mb|
|Mean Relative Humidity¹ [50%, 70%, 90%]||-|
|6-H Accumulated Precip. [.01", .05", .10", .25"]||-|
|12-H Accumulated Precip. [.01", .05", .10", .25"]||-|
|K-Index [20°C, 30°C, 35°C, 40°C]||-|
|Station Latitude and Longitude||-|
|Sin/Cos (Day of Year)||-|
|Sin/Cos 2*(Day of Year)||-|
1 Mean relative humidity is calculated between the surface and 490 mb.
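Two of the predictor types in Table 1 lend themselves to a brief Python sketch. It assumes that a grid binary is formed by setting each gridpoint to 1 where the field meets or exceeds the cutoff and to 0 otherwise, and that the binary field is then interpolated bilinearly to the station. It also assumes a 365-day period for the day-of-year terms. The function names, the direction of the inequality, and the sample values are all assumptions made for illustration.

```python
import math


def station_grid_binary(corner_values, cutoff, fx, fy):
    """Interpolate a grid binary predictor to a station.

    corner_values holds the field at the four surrounding gridpoints in
    the order (southwest, southeast, northwest, northeast). fx and fy are
    the fractional distances of the station from the southwest corner.
    The result lies between 0 and 1.
    """
    sw, se, nw, ne = (1.0 if v >= cutoff else 0.0 for v in corner_values)
    south = sw + fx * (se - sw)
    north = nw + fx * (ne - nw)
    return south + fy * (north - south)


def seasonal_harmonics(day_of_year, days_per_year=365.0):
    """Return sin and cos of the day of year and of twice the day of year."""
    angle = 2.0 * math.pi * day_of_year / days_per_year
    return (math.sin(angle), math.cos(angle),
            math.sin(2.0 * angle), math.cos(2.0 * angle))


# Hypothetical relative humidity values (%) around a station located at
# the center of the grid box, evaluated against the 90% cutoff.
rh_binary = station_grid_binary((95.0, 85.0, 92.0, 60.0), cutoff=90.0,
                                fx=0.5, fy=0.5)
harmonics = seasonal_harmonics(day_of_year=91.25)
```

Because the interpolation is applied after the binary conversion, the station value varies smoothly between 0 and 1 according to how many of the surrounding gridpoints satisfy the cutoff.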
Table 2a. Predictors used to compute the probabilities of 12-h accumulated precipitation valid for the period ending 24 hours after 0000 UTC for all six predictand amounts in NGM MOS QPF equations for cool season (October-March) region 15 (Fig. 1b). All predictors are obtained directly or post-processed from NGM output; those predictors for which no units are provided are unitless.
|1||12-h Accumulated Precipitation||GB||24|
|2||12-h Accumulated Precipitation||C||m||24|
|3||1000-500 mb Mean Relative||GB||24|
|4||1000-500 mb Mean Relative||GB||18|
|7||500 mb Vertical Velocity||GB||24|
|8||950 mb Relative Humidity||GB||18|
|9||950 mb Vertical Velocity||GB||18|
|11||1000-500 mb Mean Relative||GB||18|
|14||850 mb Vertical Velocity||GB||18|
|15||500 mb Vertical Velocity||GB||18|
1 "C" denotes conventional continuous predictor, while "GB" indicates grid binary.
Table 2b. Regression constants and 15 predictor coefficients for each of the six equations which compute the probabilities of 12-h accumulated precipitation amount for cool season (October-March) region 15 (Fig. 1b), valid for the period ending 24 hours after 0000 UTC. The numbering of predictor coefficients corresponds to the 15 predictors as listed in Table 2a.
|PRECIPITATION AMOUNT PREDICTAND|
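The way a regression constant and a set of predictor coefficients combine into a probability can be sketched in Python as follows. The coefficients and predictor values are hypothetical and are not taken from Table 2b, only three predictors are used instead of fifteen, and limiting the result to the range 0 to 1 is an assumption of this sketch.

```python
def mos_probability(constant, coefficients, predictors):
    """Evaluate a linear regression equation for one predictand amount.

    The estimate is the regression constant plus the sum of each
    coefficient times its predictor value, limited here to [0, 1].
    """
    if len(coefficients) != len(predictors):
        raise ValueError("one coefficient is required per predictor")
    raw = constant + sum(c * x for c, x in zip(coefficients, predictors))
    return min(1.0, max(0.0, raw))


# Hypothetical regional equation with three predictors.
constant = 0.02
coefficients = [0.30, 0.15, -0.10]

# The same equation applied at two stations in the region. Only the
# interpolated predictor values differ from station to station.
station_a_predictors = [1.0, 0.5, 0.0]
station_b_predictors = [0.4, 0.0, 0.2]

prob_a = mos_probability(constant, coefficients, station_a_predictors)
prob_b = mos_probability(constant, coefficients, station_b_predictors)
```

The two stations share a single equation yet receive different probabilities, because the predictor values interpolated to each location differ.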
Table 2c. Probability threshold values used to transform the six 12-h precipitation probability forecasts computed using the equations of Tables 2a and 2b above into a single categorical forecast as per the procedure of Section 3.4, with category numbers as listed in Section 2.1. As before, thresholds are for cool season (October-March) region 15 (Fig. 1b), valid 12-24 hours after 0000 UTC.
Table 3a. Distribution by region and forecast projection of NGM MOS QPF categorical forecasts which are not produced for the cool season 0000 UTC cycle. Region numbers are as given for the contiguous U.S. in Fig. 2a. Numbers appearing in each column indicate categories which are not forecast at that particular valid time and region. Category numbers for the 6- and 12-h forecasts are as they appear on the NGM MOS messages and are explained in section 2. A complete set of NGM MOS QPF categorical forecasts is available when no entries are given.
|6-H FORECAST PROJECTION||12-H FORECAST PROJECTION|
Table 3b. Same as Table 3a above, but for the 0000 UTC cool season Alaska NGM MOS QPF with regions as indicated in Fig. 2b.
|6-H FORECAST PROJECTION||12-H FORECAST PROJECTION|
Table 3c. Same as Table 3a, but for the 1200 UTC cool season NGM MOS QPF for the contiguous U.S.
|6-H FORECAST PROJECTION||12-H FORECAST PROJECTION|
Table 3d. Same as Table 3b, but for the 1200 UTC cool season Alaska NGM MOS QPF.
|6-H FORECAST PROJECTION||12-H FORECAST PROJECTION|
Table 4a. Distribution by region and forecast projection of NGM MOS QPF categorical forecasts which are not produced for the warm season 0000 UTC cycle. Region numbers are as given for the contiguous U.S. in Fig. 5a. All other information is as described in Table 3a.
|6-H FORECAST PROJECTION||12-H FORECAST PROJECTION|
Table 4b. Same as Table 4a above, but for 0000 UTC warm season Alaska NGM MOS QPF with regions as indicated in Fig. 5b.
|6-H FORECAST PROJECTION||12-H FORECAST PROJECTION|
Table 4c. Same as Table 4a, but for the 1200 UTC warm season NGM MOS QPF for the contiguous U.S.
|6-H FORECAST PROJECTION||12-H FORECAST PROJECTION|
Table 4d. Same as Table 4b, but for 1200 UTC warm season Alaska NGM MOS QPF.
|6-H FORECAST PROJECTION||12-H FORECAST PROJECTION|
1. The CSI (also known as the threat score) is defined as the ratio: X / (X + Y + Z), where X is the number of correctly predicted events, and (Y + Z) is the sum of all outcomes incorrectly predicted. Y is the number of events occurring but not forecast, and Z is the number of cases where the event is forecast but does not occur.
2. Bias is defined as the ratio: F / O , where F is the number of events forecast and O is the number of events actually observed.
3. It must be kept in mind that although the same equation is applied when making operational forecasts at all stations within a given region, the forecast probabilities (and associated categorical forecasts) may still vary from station to station. The reason for this is that the interpolated predictor values frequently are different at each individual station location, resulting in a different estimation of the predictand.
4. Since the NGM is not run operationally beyond the 48-h projection, NGM MOS QPF at the 54- and 60-h projections utilizes predictors derived from 48-h model forecasts.