Abstract
Statistical analysis of extremes commonly assumes that data arise from a stationary process, although this hypothesis is not easily assessed and should therefore be treated as a source of uncertainty. The aim of this paper is to describe a Bayesian framework for this purpose, considering several probabilistic models (stationary, step-change and linear trend models) and four extreme value distributions (exponential, generalized Pareto, Gumbel and GEV). Prior distributions are specified by using regional prior knowledge about quantiles. Posterior distributions are used to estimate parameters, quantify the probability of models and derive a realistic frequency analysis, which takes into account estimation, distribution and stationarity uncertainties. MCMC methods are needed for this purpose, and are described in the article. Finally, an application to a POT discharge series is presented, with an analysis of both the occurrence process and the peak distribution.
References
Berger JO (1985) Statistical decision theory and Bayesian analysis. Springer, Berlin Heidelberg New York, p 617
Chib S (1995) Marginal likelihood from the Gibbs output. J Am Stat Assoc 90:1313–1321
Coles S, Pericchi L (2003) Anticipating catastrophes through extreme value modelling. J R Stat Soc Ser C-Appl Stat 52:405–416
Coles SG, Powell EA (1996) Bayesian methods in extreme value modelling: a review and new developments. Int Stat Rev 64:119–136
Coles SG, Tawn JA (1996) A Bayesian analysis of extreme rainfall data. J R Stat Soc Ser C-Appl Stat 45:463–478
Coles S, Pericchi LR, Sisson S (2003) A fully probabilistic approach to extreme rainfall modelling. J Hydrol 273:35–50
Cooley D (2005) Statistical analysis of extremes motivated by weather and climate studies: applied and theoretical advances. University of Colorado. 122 p
Cooley D, Nychka D, Naveau P (2005) A spatial Bayesian hierarchical model for a precipitation return levels map. In: Extreme value analysis, Gothenburg, Sweden
CTGREF (1980–1982) Srae, S.H. Diame. Synthèse nationale sur les crues des petits bassins versants. Fascicule 2: la méthode Socose; Information Technique no 38–2 (Juin 1980); Fascicule 3: la méthode Crupedix
Cunderlik JM, Burn DH (2003) Non-stationary pooled frequency analysis. J Hydrol 276:210–223
Diebolt J, El-Aroui MA, Garrido M, Girard S (2003) Quasi-conjugate Bayes estimates for GPD parameters and application to heavy tails modelling. Rapport de recherche INRIA. 29 p
Favre AC, El Adlouni S, Perreault L, Thiemonge N, Bobee B (2004) Multivariate hydrological frequency analysis using copulas. Water Resour Res 40, W01101, DOI 10.1029/2003WR002456
Fisher RA, Tippett LH (1928) Limiting forms of the frequency distribution of the largest or smallest member of a sample. Cambridge Philos Soc 24:180–190
Galéa G, Prudhomme C (1997) Notions de base et concepts utiles pour la compréhension de la modélisation synthétique des régimes de crue des bassins versants au sens des modèles QdF. Revue des Sciences de l’Eau 1:83–101
Gelman A, Carlin JB, Stern HS, Rubin DB (1995) Bayesian data analysis. Chapman & Hall, London, p 526
GREHYS (1996) Presentation and review of some methods for regional flood frequency analysis. J Hydrol 186:63–84
Gumbel EJ (1958) Statistics of extremes. Columbia University Press, New York, p 375
Hastings WK (1970) Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57:97–109
IPCC (2001) Climate change 2001: synthesis report. Cambridge University Press, Cambridge, p 408
Javelle P, Grésillon JM, Galéa G (1999) Discharge-duration-frequency curves modeling for floods and scale invariance. Comptes Rendus de l’Académie des Sciences, Sciences de la terre et des planètes 329:39–44
Javelle P, Ouarda T, Lang M, Bobee B, Galéa G, Grésillon JM (2002) Development of regional flood-duration-frequency curves based on the index-flood method. J Hydrol 258:249–259
Kass RE, Raftery AE (1995) Bayes factors. J Am Stat Assoc 90:773–795
Katz RW, Parlange MB, Naveau P (2002) Statistics of extremes in hydrology. Adv Water Resour 25:1287–1304
Lang M (1999) Theoretical discussion and Monte-Carlo simulations for a Negative Binomial process paradox. Stoch Environ Res Risk Assess 13:183–200
Lang M, Ouarda TBMJ, Bobée B (1999) Towards operational guidelines for over-threshold modeling. J Hydrol 225:103–117
Madsen H, Mikkelsen PS, Rosbjerg D, Harremoes P (2002) Regional estimation of rainfall intensity-duration-frequency curves using generalized least squares regression of partial duration series statistics. Water Resour Res 38:1239
Metropolis N, Ulam S (1949) The Monte Carlo method. J Am Stat Assoc 44:335–341
Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller E (1953) Equation of state calculations by fast computing machines. J Chem Phys 21:1087–1092
Parent E, Bernier J (2003) Encoding prior experts judgments to improve risk analysis of extreme hydrological events via POT modeling. J Hydrol 283:1–18
Perreault L (2000) Analyse bayésienne rétrospective d’une rupture dans les séquences de variables aléatoires hydrologiques. ENGREF, INRS-Eau. 200 p
Perreault L, Fortin V (2003) Mixture and Hidden Markov models for peak flow analysis. In: Seizièmes entretiens du centre Jacques Cartier, Lyon, France
Perreault L, Bernier J, Bobee B, Parent E (2000a) Bayesian change-point analysis in hydrometeorological time series. Part 1. The normal model revisited. J Hydrol 235:221–241
Perreault L, Bernier J, Bobee B, Parent E (2000b) Bayesian change-point analysis in hydrometeorological time series. Part 2. Comparison of change-point models and forecasting. J Hydrol 235:242–263
Perreault L, Parent E, Bernier J, Bobee B, Slivitzky M (2000c) Retrospective multivariate Bayesian change-point analysis: a simultaneous single change in the mean of several hydrological sequences. Stoch Environ Res Risk Assess 14:243–261
Pickands J (1975) Statistical inference using extreme order statistics. Ann Stat 3:119–131
Prudhomme C (1995) Modèles synthétiques des connaissances en hydrologie. Université Montpellier II, CEMAGREF Lyon. 400 p, Montpellier
Ray BK, Tsay RS (2002) Bayesian methods for change-point detection in long-range dependent processes. J Time Ser Anal 23:687–705
Reed DW (1999) Flood estimation handbook. Vol 1: Overview. Institute of Hydrology, Wallingford, p 108
Reis DS, Stedinger JR (2005) Bayesian MCMC flood frequency analysis with historical information. J Hydrol 313:97–116
Ritter C, Tanner MA (1992) Facilitating the Gibbs Sampler—the Gibbs Stopper and the Griddy-Gibbs Sampler. J Am Stat Assoc 87:861–868
Robert CP, Ryden T, Titterington DM (2000) Bayesian inference in hidden Markov models through the reversible jump Markov chain Monte Carlo method. J R Stat Soc Ser B-Stat Method 62:57–75
Rosbjerg D, Madsen H (2004) Advanced approaches in PDS/POT modelling of extreme hydrological events. In: Hydrology: science practice for the 21st century, British Hydrological Society, London, pp 217–221
Strupczewski WG, Kaczmarek Z (2001a) Non-stationary approach to at-site flood frequency modelling II. Weighted least squares estimation. J Hydrol 248:143–151
Strupczewski WG, Singh VP, Feluch W (2001b) Non-stationary approach to at-site flood frequency modelling I. Maximum likelihood estimation. J Hydrol 248:123–142
Tanner MA (1996) Tools for statistical inference. Springer, Berlin Heidelberg New York, p 208
Acknowledgements
This work was conducted as part of a national program of hydrological research (PNRH), which associates Cemagref (Lyon), LTHE (Grenoble), Hydrosciences (Montpellier), Meteo France (Toulouse), and Electricité de France (EDF Chatou and Grenoble). The authors would like to thank all members of this project. The financial support provided by Cemagref and EDF for the PhD research of B. Renard is gratefully acknowledged. We also thank two anonymous reviewers for their helpful comments.
Appendices
Annex 1: Griddy Gibbs sampling
The aim of this method is to simulate a sample from the posterior distribution \(p(\varvec{\theta} \mid \varvec{X}).\) The Gibbs sampling algorithm can be written as follows:
- Choose a starting value \({\varvec{\theta}}^{(0)} = (\theta^{(0)}_{1},\ldots,\theta^{(0)}_{k})\) and set j = 0.
- Repeat N times:
  - Set j = j + 1;
  - Sample \(\theta^{(j)}_{1} \sim p(\theta_{1} \mid \theta^{(j-1)}_{2},\ldots,\theta^{(j-1)}_{k}, \varvec{X})\),
  - Sample \(\theta^{(j)}_{2} \sim p(\theta_{2} \mid \theta^{(j)}_{1},\theta^{(j-1)}_{3},\ldots,\theta^{(j-1)}_{k}, \varvec{X})\),
  - ...
  - Sample \(\theta^{(j)}_{q} \sim p(\theta_{q} \mid \theta^{(j)}_{1},\ldots,\theta^{(j)}_{q-1},\theta^{(j-1)}_{q+1},\ldots,\theta^{(j-1)}_{k}, \varvec{X})\),
  - ...
  - Sample \(\theta^{(j)}_{k} \sim p(\theta_{k} \mid \theta^{(j)}_{1},\ldots,\theta^{(j)}_{k-1}, \varvec{X})\).
The sequence of vectors \((\varvec{\theta}^{(j)})\) converges to the target posterior distribution as j tends to infinity. In order to decrease the influence of the starting point, the first m iterations are usually discarded, and inference is based on the last N−m iterations. A sensitivity analysis may be necessary to determine acceptable values for m and N (usually, at least a few thousand iterations are used for each). More generally, this method has the same drawbacks as other iterative simulation techniques. Convergence therefore has to be monitored, by running the sampler from several starting points or by computing convergence indices. Gelman et al. (1995) and Tanner (1996) provide guidelines for improving the efficiency of the numerical simulations.
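As a minimal sketch of the iteration and burn-in described above, the following Python code runs a Gibbs sampler on a toy target that is not taken from the article: a standard bivariate normal with correlation rho, whose full conditionals are known normal distributions. The function name, the value of rho, the two starting points and the values of N and m are all illustrative assumptions. Running two chains from distant starting points and comparing their summaries is one simple way to monitor convergence.

```python
import numpy as np


def gibbs_bivariate_normal(start, rho, n_iter, burn_in, rng):
    """Gibbs sampler for a standard bivariate normal with correlation rho.

    Toy target (assumption): theta_1 | theta_2 ~ N(rho * theta_2, 1 - rho^2),
    and symmetrically for theta_2 | theta_1.
    """
    theta = np.array(start, dtype=float)
    cond_sd = np.sqrt(1.0 - rho ** 2)
    chain = np.empty((n_iter, 2))
    for j in range(n_iter):
        # Each component is drawn conditionally on the latest value of the other.
        theta[0] = rng.normal(rho * theta[1], cond_sd)
        theta[1] = rng.normal(rho * theta[0], cond_sd)
        chain[j] = theta
    # Discard the first burn_in iterations to reduce the influence of the start.
    return chain[burn_in:]


rng = np.random.default_rng(1)
starts = [(-10.0, 10.0), (10.0, -10.0)]  # deliberately far from the mode
chains = [gibbs_bivariate_normal(s, 0.8, 11000, 1000, rng) for s in starts]

# Crude convergence check: both chains should give similar summaries.
means = [c.mean(axis=0) for c in chains]
pooled = np.vstack(chains)
corr = np.corrcoef(pooled.T)[0, 1]
print("chain means:", means)
print("pooled correlation:", corr)
```

If the chain means still differ clearly after the burn-in, m or N is probably too small for the problem at hand.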
Unfortunately, the Gibbs sampling algorithm usually cannot be used in this raw form, because it requires sampling from the full conditional densities \(p(\theta_{q} \mid \theta^{(j)}_{1},\ldots,\theta^{(j)}_{q-1},\theta^{(j-1)}_{q+1},\ldots,\theta^{(j-1)}_{k}, \varvec{X}).\) Ritter and Tanner (1992) proposed using a discrete approximation of the cumulative distribution functions (CDF) of these distributions:
- Choose a grid of points \(y_{1},\ldots,y_{p}.\)
- Evaluate \(p(\theta_{q} \mid \theta_{1},\ldots,\theta_{q-1},\theta_{q+1},\ldots,\theta_{k}, \varvec{X})\) on this grid, to obtain \(w_{1},\ldots,w_{p}.\)
- Compute the cumulative sums of \(w_{1},\ldots,w_{p}\) to obtain an approximation of the CDF.
- Sample u from a uniform distribution on [0, 1].
- Transform u by the inverse of the approximate CDF.
This procedure must be carried out at each step of the Gibbs iteration. If the full conditional densities are known only up to proportionality, \(p(\theta_{q} \mid \theta^{(j)}_{1},\ldots,\theta^{(j)}_{q-1},\theta^{(j-1)}_{q+1},\ldots,\theta^{(j-1)}_{k}, \varvec{X})\) can be replaced by \(f(\theta^{(j)}_{1},\ldots,\theta^{(j)}_{q-1},y,\theta^{(j-1)}_{q+1},\ldots,\theta^{(j-1)}_{k}),\) the product of prior and likelihood evaluated at the grid point y, and the cumulative sums of \(w_{1},\ldots,w_{p}\) must then be divided by the total sum to provide the CDF approximation. The CDF can be inverted by linear interpolation between two grid points. The choice of grid is the most important issue in this technique: the grid has to be broad enough to cover the range of the distributions and fine enough to ensure sufficient accuracy, keeping in mind that this additional step is computationally expensive. Some improvements of the method are described in Ritter and Tanner (1992).
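The grid-based draw described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `griddy_draw`, the grid bounds and the toy target (an unnormalized normal density with mean 2 and standard deviation 0.5) are hypothetical and do not come from the article. The density is handled on the log scale to avoid numerical underflow, which is a design choice of this sketch rather than part of the original description.

```python
import numpy as np


def griddy_draw(log_density, grid, rng, size=None):
    """Draw from a 1-D density known up to a constant, using a grid.

    log_density: function returning the log of the unnormalized density
                 (for instance log prior + log likelihood) on an array.
    grid:        increasing array of grid points y_1, ..., y_p.
    """
    log_w = log_density(grid)
    w = np.exp(log_w - log_w.max())      # weights w_1, ..., w_p, rescaled
    cdf = np.cumsum(w) / np.sum(w)       # cumulative sums divided by the total
    u = rng.uniform(size=size)           # u ~ Uniform[0, 1]
    # Invert the approximate CDF by linear interpolation between grid points.
    return np.interp(u, cdf, grid)


# Toy unnormalized target (assumption): normal with mean 2 and sd 0.5.
def log_f(y):
    return -0.5 * ((y - 2.0) / 0.5) ** 2


rng = np.random.default_rng(0)
grid = np.linspace(-1.0, 5.0, 601)       # broad enough and fine enough here
draws = griddy_draw(log_f, grid, rng, size=20000)
print(draws.mean(), draws.std())
```

Inside a Gibbs iteration, `log_density` would be the log of f with all components except the current one held at their latest values, and a single draw (`size=None`) would be taken per component.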
Annex 2: Chib method
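The aim of this method is to compute the marginal distribution of the observations, which is the normalizing constant in Bayes' theorem. With \(f(\varvec{\theta})\) denoting the product of prior and likelihood:
\(p(\varvec{X}) = \frac{f(\varvec{\theta})}{p(\varvec{\theta} \mid \varvec{X})}\)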
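Since this relationship holds for any vector \(\varvec{\theta},\) let us consider a particular \({\varvec{\theta}}^{*} = (\theta^{*}_{1},\ldots,\theta^{*}_{k}).\) \(f(\varvec{\theta}^{*})\) is directly computable, which is not the case for the denominator, the posterior density evaluated at \(\varvec{\theta}^{*}.\) Consider the following relationship:
\(p(\varvec{\theta}^{*} \mid \varvec{X}) = p(\theta^{*}_{1} \mid \varvec{X})\, p(\theta^{*}_{2} \mid \theta^{*}_{1}, \varvec{X}) \cdots p(\theta^{*}_{k} \mid \theta^{*}_{1},\ldots,\theta^{*}_{k-1}, \varvec{X})\)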
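The first term can be evaluated from the simulated sample of the first marginal distribution, for example with a Gaussian kernel density estimate. The last term can be computed by 1-D numerical integration:
\(p(\theta^{*}_{k} \mid \theta^{*}_{1},\ldots,\theta^{*}_{k-1}, \varvec{X}) = \frac{f(\theta^{*}_{1},\ldots,\theta^{*}_{k-1},\theta^{*}_{k})}{\int f(\theta^{*}_{1},\ldots,\theta^{*}_{k-1},\theta_{k})\, d\theta_{k}}\)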
Griddy Gibbs sampling can be used to compute the intermediate terms. \(p(\theta^{*}_{q} \mid \theta^{*}_{1},\theta^{*}_{2},\ldots,\theta^{*}_{q-1}, \varvec{X})\) is indeed the first marginal of the distribution \(p(\theta_{q},\theta_{q+1},\ldots,\theta_{k} \mid \theta^{*}_{1},\theta^{*}_{2},\ldots,\theta^{*}_{q-1}, \varvec{X}),\) evaluated at \(\theta^{*}_{q}.\) The griddy Gibbs algorithm can thus be applied to the non-normalized posterior density with the first q − 1 components held fixed, that is \(f(\theta^{*}_{1},\theta^{*}_{2},\ldots,\theta^{*}_{q-1},\theta_{q},\theta_{q+1},\ldots,\theta_{k}).\)
Although this approach is theoretically valid for any value \(\varvec{\theta}^{*}\) with non-zero posterior density, Chib (1995) recommends using a high-density point to improve the accuracy of the method.
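To make the procedure concrete, here is a hedged Python sketch on a toy two-parameter model that is not from the article: observations \(x_i \sim N(\theta_1, 1)\) with priors \(\theta_1 \mid \theta_2 \sim N(\theta_2, 1)\) and \(\theta_2 \sim N(0, 1)\). It relies on the identity \(\log p(\varvec{X}) = \log f(\varvec{\theta}^{*}) - \log p(\varvec{\theta}^{*} \mid \varvec{X})\), with the posterior ordinate split into a first marginal term (Gaussian kernel on the Gibbs sample) and a last conditional term (1-D numerical integration). The model, the data, the bandwidth rule and all variable names are assumptions chosen so that the exact marginal likelihood is available for comparison. For simplicity the full conditionals are sampled exactly rather than with the griddy step.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(1.0, 1.0, size=10)   # toy data (assumption)
n, sx = x.size, x.sum()


def log_norm(z, mean, var):
    """Log density of N(mean, var) evaluated at z."""
    return -0.5 * np.log(2 * np.pi * var) - 0.5 * (z - mean) ** 2 / var


def log_f(t1, t2):
    """Log of prior times likelihood; t2 may be a scalar or an array."""
    return (log_norm(x, t1, 1.0).sum()
            + log_norm(t1, t2, 1.0) + log_norm(t2, 0.0, 1.0))


# Gibbs sampling with exact normal full conditionals for this toy model.
N, m = 20000, 1000
t1, t2 = 0.0, 0.0
draws = np.empty((N, 2))
for j in range(N):
    t1 = rng.normal((sx + t2) / (n + 1), np.sqrt(1.0 / (n + 1)))
    t2 = rng.normal(t1 / 2.0, np.sqrt(0.5))
    draws[j] = t1, t2
draws = draws[m:]                    # discard burn-in

star = draws.mean(axis=0)            # high-density point theta*

# First term: p(theta1* | X) from a Gaussian kernel on the theta1 sample.
s = draws[:, 0]
h = 1.06 * s.std() * s.size ** (-0.2)        # rule-of-thumb bandwidth
log_p1 = np.log(np.mean(np.exp(log_norm(star[0], s, h ** 2))))

# Last term: p(theta2* | theta1*, X) by 1-D trapezoidal integration of f.
grid = np.linspace(star[1] - 8.0, star[1] + 8.0, 2001)
lf = log_f(star[0], grid)
w = np.exp(lf - lf.max())
integral = np.sum(0.5 * (w[1:] + w[:-1]) * np.diff(grid))
log_p2 = (log_f(star[0], star[1]) - lf.max()) - np.log(integral)

log_ml_chib = log_f(star[0], star[1]) - log_p1 - log_p2

# Exact value for this toy model: X ~ N(0, I + 2 * ones * ones^T).
log_ml_exact = (-0.5 * n * np.log(2 * np.pi) - 0.5 * np.log(1 + 2 * n)
                - 0.5 * (np.sum(x ** 2) - 2 * sx ** 2 / (1 + 2 * n)))
print(log_ml_chib, log_ml_exact)
```

With more than two parameters, each intermediate term would need its own reduced Gibbs run with the leading components fixed at their starred values, as described above.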
Cite this article
Renard, B., Lang, M. & Bois, P. Statistical analysis of extreme events in a non-stationary context via a Bayesian framework: case study with peak-over-threshold data. Stoch Environ Res Ris Assess 21, 97–112 (2006). https://doi.org/10.1007/s00477-006-0047-4
DOI: https://doi.org/10.1007/s00477-006-0047-4