Assessing probabilistic forecasts of multivariate quantities, with an application to ensemble predictions of surface winds

  • Invited Paper
  • Published in TEST

Abstract

We discuss methods for evaluating probabilistic predictions of vector-valued quantities, which can take the form of a discrete forecast ensemble or a density forecast. In particular, we propose a multivariate version of the univariate verification rank histogram, or Talagrand diagram, that can be used to check the calibration of ensemble forecasts. In the case of density forecasts, Box’s density ordinate transform provides an attractive alternative. The multivariate energy score generalizes the continuous ranked probability score. It addresses both calibration and sharpness, and can be used to compare deterministic forecasts, ensemble forecasts and density forecasts using a single proper loss function. An application to the University of Washington mesoscale ensemble points to strengths and deficiencies of probabilistic short-range forecasts of surface wind vectors over the North American Pacific Northwest.
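
As an illustration of the energy score named in the abstract, the following is a minimal sketch for an equally weighted ensemble x_1, ..., x_m and a verifying observation y, using the standard ensemble form ES = (1/m) Σ_i ‖x_i − y‖ − (1/(2m²)) Σ_i Σ_j ‖x_i − x_j‖ with the Euclidean norm. The function names and the plain-tuple representation of wind vectors are assumptions made for this example; this is not the authors' code.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def euclidean_distance(a: Vector, b: Vector) -> float:
    """Euclidean norm of the difference a - b."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def energy_score(ensemble: Sequence[Vector], observation: Vector) -> float:
    """Energy score of an equally weighted ensemble (lower is better).

    Hypothetical helper for illustration only.
    """
    m = len(ensemble)
    if m == 0:
        raise ValueError("ensemble must contain at least one member")
    # First term: mean distance between the members and the observation.
    accuracy = sum(euclidean_distance(x, observation) for x in ensemble) / m
    # Second term: half the mean pairwise distance among the members.
    spread = sum(
        euclidean_distance(xi, xj) for xi in ensemble for xj in ensemble
    ) / (2 * m * m)
    return accuracy - spread


# A single-member "ensemble" is a deterministic forecast, so the score
# reduces to the Euclidean error of the forecast wind vector (u, v).
deterministic = energy_score([(3.0, 4.0)], (0.0, 0.0))

# In one dimension the same formula gives the continuous ranked
# probability score of the ensemble.
univariate = energy_score([(0.0,), (2.0,)], (1.0,))
```

With a single member the spread term vanishes, which is why one function can score deterministic and ensemble forecasts on a common scale.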

References

  • Anderson JL (1996) A method for producing and evaluating probabilistic forecasts from ensemble model integrations. J Climate 9:1518–1525

  • Azzalini A, Genton MG (2008) Robust likelihood methods based on the skew-t and related distributions. Int Stat Rev 76:106–129

  • Bernardo JM (1979) Expected information as expected utility. Ann Stat 7:686–690

  • Berrocal VJ, Raftery AE, Gneiting T (2007) Combining spatial statistical and ensemble information in probabilistic weather forecasts. Mon Weather Rev 135:1386–1402

  • Besag J, Green P, Higdon D, Mengersen K (1995) Bayesian computing and stochastic systems. Stat Sci 10:3–66

  • Bickel PJ (1969) A distribution free version of the Smirnov two sample test in the p-variate case. Ann Math Stat 40:1–23

  • Bickel PJ, Lehmann EL (1979) Descriptive statistics for nonparametric models IV. Spread. In: Jureckova J (ed) Contributions to statistics. Academia, Prague, pp 33–40

  • Box GEP (1980) Sampling and Bayes’ inference in scientific modelling and robustness. J R Stat Soc Ser A 143:383–425

  • Brockwell AE (2007) Universal residuals: a multivariate transformation. Stat Probab Lett 77:1473–1478

  • Bröcker J, Smith LA (2007) Scoring probabilistic forecasts: the importance of being proper. Weather Forecast 22:382–388

  • Candille G, Talagrand O (2005) Evaluation of probabilistic prediction systems for a scalar variable. Q J R Meteorol Soc 131:2131–2150

  • Clements MP (2005) Evaluating econometric forecasts of economic and financial variables. Palgrave Macmillan, Basingstoke, Hampshire

  • Clements MP, Smith J (2000) Evaluating the forecast densities of linear and non-linear models: applications to output growth and unemployment. J Forecast 19:255–276

  • Clements MP, Smith J (2002) Evaluating multivariate forecast densities: a comparison of two approaches. Int J Forecast 18:397–407

  • Czado C, Gneiting T, Held L (2007) Predictive model assessment for count data. Tech Rep no 518, Dept of Statistics, University of Washington

  • Dawid AP (1984) Statistical theory: the prequential approach. J R Stat Soc Ser A 147:278–292

  • Dawid AP, Sebastiani P (1999) Coherent dispersion criteria for optimal experimental design. Ann Stat 27:65–81

  • De Gooijer JG (2007) Power of the Neyman smooth test for evaluating multivariate forecast densities. J Appl Stat 34:371–381

  • Delle Monache L, Hacker JP, Zhou Y, Deng X, Stull RB (2006) Probabilistic aspects of meteorological and ozone regional ensemble forecasts. J Geophys Res 111:D24307. doi:10.1029/2005JD006917

  • Diebold FX, Mariano RS (1995) Comparing predictive accuracy. J Bus Econ Stat 13:253–263

  • Diebold FX, Gunther TA, Tay AS (1998) Evaluating density forecasts: with applications to financial risk management. Int Econ Rev 39:863–883

  • Diebold FX, Hahn J, Tay AS (1999) Multivariate density forecast evaluation and calibration in financial risk management: high-frequency returns on foreign exchange. Rev Econ Stat 81:661–673

  • Eckel FA, Mass CF (2005) Aspects of effective short-range ensemble forecasting. Weather Forecast 20:328–350

  • Friedman JH, Rafsky LC (1979) Multivariate generalizations of the Wald-Wolfowitz and Smirnov two-sample tests. Ann Stat 7:697–717

  • Genest C, Rivest LP (2001) On the multivariate probability integral transform. Stat Probab Lett 53:391–399

  • Gneiting T (2008) Editorial: probabilistic forecasting. J R Stat Soc Ser A 171:319–321

  • Gneiting T, Raftery AE (2005) Weather forecasting with ensemble methods. Science 310:248–249

  • Gneiting T, Raftery AE (2007) Strictly proper scoring rules, prediction, and estimation. J Am Stat Assoc 102:359–378

  • Gneiting T, Larson K, Westrick K, Genton MG, Aldrich E (2006) Calibrated probabilistic forecasting at the Stateline wind energy center: the regime-switching space-time (RST) method. J Am Stat Assoc 101:968–979

  • Gneiting T, Balabdaoui F, Raftery AE (2007) Probabilistic forecasts, calibration and sharpness. J R Stat Soc Ser B 69:243–268

  • Good IJ (1971) Comment on ‘Measuring information and uncertainty’ by Robert J. Buehler. In: Godambe VP, Sprott DA (eds) Foundations of statistical inference. Holt, Rinehart and Winston, Toronto, pp 337–339

  • Gombos D, Hansen JA, Du J, McQueen J (2007) Theory and applications of the minimum spanning tree rank histogram. Mon Weather Rev 135:1490–1505

  • Granger CWJ (2006) Preface: Some thoughts on the future of forecasting. Oxf Bull Econ Stat 67S:707–711

  • Grimit EP, Mass CF (2002) Initial results of a mesoscale short-range ensemble system over the Pacific Northwest. Weather Forecast 17:192–205

  • Grimit EP, Gneiting T, Berrocal VJ, Johnson NA (2006) The continuous ranked probability score for circular variables and its application to mesoscale forecast ensemble verification. Q J R Meteorol Soc 132:2925–2942

  • Hamill TM (1999) Hypothesis tests for evaluating numerical precipitation forecasts. Weather Forecast 14:155–167

  • Hamill TM (2001) Interpretation of rank histograms for verifying ensemble forecasts. Mon Weather Rev 129:550–560

  • Hamill TM, Colucci SJ (1997) Verification of Eta-RSM short-range ensemble forecasts. Mon Weather Rev 125:1312–1327

  • Hersbach H (2000) Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15:559–570

  • Huber PJ (1985) Projection pursuit. Ann Stat 13:435–475

  • Ishida I (2005) Scanning multivariate conditional densities with probability integral transforms. Center for Advanced Research in Finance, University of Tokyo, Working Paper F-045

  • Jolliffe IT (2007) Uncertainty and inference for verification measures. Weather Forecast 22:637–650

  • Jolliffe IT, Stephenson DB (2003) Forecast verification: a practitioner’s guide in atmospheric science. Wiley, Chichester

  • Judd K, Smith LA, Weisheimer A (2007) How good is an ensemble at capturing truth? Using bounding boxes for forecast evaluation. Q J R Meteorol Soc 133:1309–1325

  • Kruskal JB (1956) On the shortest spanning subtree of a graph and the traveling salesman problem. Proc Am Math Soc 7:48–50

  • Krzysztofowicz R (2004) Bayesian processor of output: a new technique for probabilistic weather forecasting. In: Abstracts of the 17th conference on probability and statistics in the atmospheric sciences. Extended abstract no 4.2

  • Malmberg A, Holst J, Holst U (2008) A real-time assimilation algorithm applied to near-surface ocean wind fields. Environmetrics 19:319–330

  • Mass CF, Albright M, Ovens D, Steed R, MacIver M, Grimit E, Eckel T, Lamb B, Vaughan J, Westrick K, Storck P, Colman B, Hill C, Maykut N, Gilroy M, Ferguson SA, Yetter J, Sierchio JM, Bowman C, Stender R, Wilson R, Brown W (2003) Regional environmental prediction over the Pacific Northwest. Bull Am Meteorol Soc 84:1353–1366

  • Matheson JE, Winkler RL (1976) Scoring rules for continuous probability distributions. Manag Sci 22:1087–1096

  • Murphy AH, Winkler RL (1992) Diagnostic verification of probability forecasts. Int J Forecast 7:435–455

  • Murphy AH, Brown BG, Chen YS (1989) Diagnostic verification of temperature forecasts. Weather Forecast 4:485–501

  • National Research Council (2006) Completing the forecast: characterizing and communicating uncertainty for better decisions using weather and climate forecasts. The National Academies Press, Washington

  • O’Hagan A (2003) HSSS model criticism. In: Green PJ, Hjort NL, Richardson S (eds) Highly structured stochastic systems. Oxford University Press, Oxford, pp 423–444

  • Oja H (1983) Descriptive statistics for multivariate distributions. Stat Probab Lett 1:327–332

  • Oja H, Randles RH (2004) Multivariate nonparametric tests. Stat Sci 19:598–605

  • Palmer TN (2002) The economic value of ensemble forecasts as a tool for risk assessment: from days to decades. Q J R Meteorol Soc 128:747–774

  • Pepe MS (2003) The statistical evaluation of medical tests for classification and prediction. Oxford statistical science series, vol 28. Oxford University Press, Oxford

  • Raftery AE, Gneiting T, Balabdaoui F, Polakowski M (2005) Using Bayesian model averaging to calibrate forecast ensembles. Mon Weather Rev 133:1155–1174

  • Rife DL, Davis CA (2005) Verification of temporal variations in mesoscale numerical wind forecasts. Mon Weather Rev 133:3368–3381

  • Rosenblatt M (1952) Remarks on a multivariate transformation. Ann Math Stat 23:470–472

  • Roulston MS, Smith LA (2003) Combining dynamical and statistical ensembles. Tellus Ser A 55:16–25

  • Savage LJ (1971) Elicitation of personal probabilities and expectation. J Am Stat Assoc 66:783–801

  • Shaked M, Shanthikumar JG (1994) Stochastic orders and their applications. Academic, Boston

  • Shephard N (1994) Partial non-Gaussian state space. Biometrika 81:115–131

  • Smith LA (2001) Disentangling uncertainty and error: on the predictability of nonlinear systems. In: Mees AI (ed) Nonlinear dynamics and statistics. Birkhäuser, Boston, pp 31–64

  • Smith LA, Hansen JA (2004) Extending the limits of ensemble forecast verification with the minimum spanning tree histogram. Mon Weather Rev 132:1522–1528

  • Stephenson DB, Doblas-Reyes FJ (2000) Statistical methods for interpreting Monte Carlo forecasts. Tellus Ser A 52:300–322

  • Stigler SM (1975) The transition from point to distribution estimation. Bull Int Stat Inst 46:332–340

  • Talagrand O, Vautard R, Strauss B (1997) Evaluation of probabilistic prediction systems. In: Proceedings of a workshop held at ECMWF on predictability, 20–22 October 1997. European Centre for Medium-Range Weather Forecasts, Reading, pp 1–25

  • Timmermann A (2000) Density forecasting in economics and finance. J Forecast 19:231–234

  • Weisheimer A, Smith LA, Judd K (2005) A new view of seasonal forecast skill: bounding boxes from the DEMETER ensemble forecasts. Tellus Ser A 57:265–279

  • Wilks DS (2002) Smoothing forecast ensembles with fitted probability distributions. Q J R Meteorol Soc 128:2821–2836

  • Wilks DS (2004) The minimum spanning tree histogram as a verification tool for multidimensional ensemble forecasts. Mon Weather Rev 132:1329–1340

  • Wilks DS (2006) Statistical methods in the atmospheric sciences, 2nd edn. Elsevier Academic, Amsterdam

  • Wilson LJ, Burrows WR, Lanzinger A (1999) A strategy for verification of weather element forecasts from an ensemble prediction system. Mon Weather Rev 127:956–970

  • Winkler RL (1977) Rewarding expertise in probability assessment. In: Jungermann H, de Zeeuw G (eds) Decision making and change in human affairs. D. Reidel, Dordrecht, pp 127–140

  • Winkler RL (1996) Scoring rules and the evaluation of probabilities. Test 5:1–60

  • Zuo Y, Serfling R (2000) General notions of statistical depth functions. Ann Stat 28:461–482

Author information

Corresponding author

Correspondence to Tilmann Gneiting.

Additional information

This invited paper is discussed in the comments available at: http://dx.doi.org/10.1007/s11749-008-0115-9, http://dx.doi.org/10.1007/s11749-008-0116-8, http://dx.doi.org/10.1007/s11749-008-0117-7, http://dx.doi.org/10.1007/s11749-008-0118-6, http://dx.doi.org/10.1007/s11749-008-0119-5, http://dx.doi.org/10.1007/s11749-008-0120-z, http://dx.doi.org/10.1007/s11749-008-0121-y.

About this article

Cite this article

Gneiting, T., Stanberry, L.I., Grimit, E.P. et al. Assessing probabilistic forecasts of multivariate quantities, with an application to ensemble predictions of surface winds. TEST 17, 211–235 (2008). https://doi.org/10.1007/s11749-008-0114-x

  • DOI: https://doi.org/10.1007/s11749-008-0114-x
