Selection of the probabilistic model of extreme floods: The case of the River Tiber in Rome

https://doi.org/10.1016/j.jhydrol.2009.03.010

Summary

The probability distribution of the annual maximum peak flows of the River Tiber at Rome (Ripetta) can be evaluated using the information available since the XV century. In this paper the probability distribution of two series of annual maximum peak flows observed at the Ripetta gauge is analysed: the systematic series, including all the data observed since the beginning of the systematic stage record in 1782, and the censored series of the exceptional floods that inundated the town of Rome, starting from the XV century. In order to find a criterion for choosing the “optimum” distribution law for the high return period quantiles, an index that takes into account both the accuracy and the uncertainty of the quantile estimates is proposed.

Introduction

The definition of flood protection plans requires the evaluation of design floods with return periods up to 200–500 years. The following steps are involved in the frequency analysis of peak flow records: (1) selection of peak flow data; (2) choice of an appropriate probability distribution model, fitted using appropriate estimation techniques; (3) testing the goodness-of-fit; and (4) estimation of flood quantiles and construction of confidence intervals.
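Purely as an illustration of steps (2) and (4), the following sketch fits a Gumbel (EV1) model by maximum likelihood and extrapolates a design quantile; the model choice and the ten-value sample are assumptions made for the example only and are not taken from the Tiber analysis.

```python
import numpy as np
from scipy import stats

# invented annual-maximum peak flows (m3/s), for illustration only
peaks = np.array([1250., 980., 1560., 1100., 2030., 1720., 1340., 900., 1880., 1450.])

# step (2): fit a candidate two-parameter model by maximum likelihood
loc, scale = stats.gumbel_r.fit(peaks)

# step (4): design quantile for a T-year return period, F(x_T) = 1 - 1/T
T = 200
x_T = stats.gumbel_r.ppf(1.0 - 1.0 / T, loc=loc, scale=scale)
print(f"estimated {T}-year peak flow: {x_T:.0f} m3/s")
```

With only a few decades of record, the extrapolation to T = 200–500 years relies almost entirely on the assumed model, which is the source of uncertainty discussed below.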

Very few stage gauges were installed in Italy before the XIX century, and since most were installed after the beginning of the XX century, record lengths seldom reach 100 years and are usually much shorter. Thus the estimation of high flood quantiles requires extrapolating the probability distributions far beyond the usual sample length, which involves large estimation uncertainties.

Some tools are available to reduce this uncertainty: (1) using physically based rainfall–runoff models in continuous simulation; (2) combining data from several gauges in a regionalization procedure; and (3) incorporating the “non-systematic” information at a gauge into the instrumental records (Kidson and Richards, 2005). Non-systematic information, related to events that occurred before the beginning of systematic recording, includes historical floods and paleofloods. The magnitude of large “historical floods” can be determined when these extraordinary events are described in extant historical accounts or recorded by man-made physical marks. “Paleoflood” data can be obtained by analysing physical evidence of the occurrence of large floods, such as erosion features, sediment deposits and botanical specimens. Because only large values are usually recorded by non-systematic information, historical flood and paleoflood data may at best constitute censored samples (Stedinger and Cohn, 1986, Hirsch and Stedinger, 1987, Francés, 1998, Cohn et al., 2001, Martins and Stedinger, 2001, England et al., 2003, Kidson and Richards, 2005, Stedinger and Griffis, 2008); nevertheless, they provide unique information about the upper tail of the probability distributions, even if they are usually less reliable than the systematic records.
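As an illustration of how censored historical information can enter the estimation, the following hedged sketch writes a joint log-likelihood for a systematic sample plus a censored historical period in the spirit of Stedinger and Cohn (1986); the Gumbel model, the perception threshold X0, the historical-period length h and all numerical values are assumptions made only for this example.

```python
import numpy as np
from scipy import stats, optimize

systematic = np.array([1250., 980., 1560., 1100., 2030., 1720., 1340.])  # gauged annual maxima (invented)
historical = np.array([2800., 2400., 3100.])   # historical floods known to have exceeded X0 (invented)
X0 = 2200.0      # perception threshold: smaller floods left no historical record
h = 360          # length of the historical period in years (illustrative)

def neg_log_lik(theta):
    loc, log_scale = theta
    scale = np.exp(log_scale)                      # keep the scale parameter positive
    dist = stats.gumbel_r(loc=loc, scale=scale)
    ll = dist.logpdf(systematic).sum()             # exact systematic observations
    ll += dist.logpdf(historical).sum()            # historical exceedances of X0
    ll += (h - len(historical)) * dist.logcdf(X0)  # censored years: only "below X0" is known
    return -ll

res = optimize.minimize(neg_log_lik, x0=[1400.0, np.log(400.0)], method="Nelder-Mead")
loc_hat, scale_hat = res.x[0], np.exp(res.x[1])
```

The censored years contribute only the probability of staying below the threshold, which is exactly the kind of upper-tail information the historical record adds to the short systematic sample.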

Several probability distribution functions have been used in flood frequency analysis (e.g. Kidson and Richards, 2005). The proliferation of distributions in use is itself symptomatic of the weak theoretical basis of the probabilistic models of hydrologic phenomena. As a consequence, theoretical considerations are rarely employed to justify the selection of the probabilistic model (Singh and Strupczewski, 2002, Kidson and Richards, 2005).

In several countries the approach to flood frequency analysis follows a national standardized procedure, enforced for planning and insurance purposes. The use of a fixed probability model for all catchments is based on the rationale that the differences between the quantile estimators resulting from reasonable alternative distributions are substantially smaller than the uncertainty of the quantile estimators themselves, and that in any case the true underlying probability distribution cannot be identified. Thus the main issues are the use of a reasonable distribution, consistent with the available data, and fitting the distribution by the best available procedure (Stedinger and Griffis, 2008). A remarkable example is the official adoption of the Log-Pearson Type III distribution as a standard procedure in the USA, as it is considered a reasonable and flexible model within the range of parameter values typical of US flood series (IACWD, 1982).

The standardized use of a single model can be attractive from the administrative point of view; nevertheless, no theoretical argument justifies the use of a single distribution, even for a single sample. This raises several questions: is a certain degree of climatic uniformity a sufficient reason to assume some kind of uniformity in the flood distributions of different basins? How relevant are observations of ordinary floods for the estimation of high return period quantiles? Is there some common limiting behaviour of extreme floods?

Recent studies have shown that flood samples are the outcome of different combinations of separate identifiable processes. Mixed populations, and hence mixed distributions, result from several possible factors, such as different types of flood-producing storms, rainfall and snowmelt floods, inundations and floodplain flow, antecedent basin soil moisture, and conditions of the vegetal cover (Singh et al., 2005). More complex multi-parameter probability distribution functions are often required to describe flood series that include subsets generated by different processes. Nevertheless there are many arguments against the use of multi-parameter models in flood frequency analysis, such as the difficulty of parameter estimation, the increase in estimation uncertainty as the sample size decreases and the number of parameters increases, and the limited number of extreme events. In some cases, simulation experiments showed higher accuracy in upper quantile estimates using a simple model than using the true multi-parameter distribution (Singh and Strupczewski, 2002). In some cases, parameter parsimony may suggest fitting simple distributions to the subsets generated by distinct processes, notwithstanding the smaller sample sizes (Kidson and Richards, 2005).
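As a minimal illustration of the mixed-population idea, the distribution of the annual maximum can be written as a weighted combination of two parent distributions; the Gumbel components, the mixing weight and all parameter values below are assumptions chosen only for the example, not estimates from any flood series.

```python
import numpy as np
from scipy import stats

def mixture_cdf(x, w, loc1, scale1, loc2, scale2):
    # weighted combination of two parent distributions of the annual maximum,
    # e.g. ordinary floods (weight w) and floods from a rarer mechanism (1 - w)
    return (w * stats.gumbel_r.cdf(x, loc1, scale1)
            + (1.0 - w) * stats.gumbel_r.cdf(x, loc2, scale2))

# with five parameters instead of two, a short sample makes the estimates far
# less certain, which is the parsimony argument recalled in the text
q = 3000.0
print(mixture_cdf(q, w=0.9, loc1=1200., scale1=350., loc2=2200., scale2=500.))
```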

The identification of the “optimum” distribution law for high return period quantiles is often based on goodness-of-fit tests, which verify whether or not a probability model fits the data at a specified significance level. However, the results depend on the assumed significance level and are not always conclusive, because several models may pass the tests.
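The following hedged sketch illustrates this point with synthetic data: several reasonable candidate models fitted to the same sample typically all pass a Kolmogorov–Smirnov test at the usual significance levels, so the test alone does not single out one model. The sample and the candidate set are assumptions for the example only.

```python
import numpy as np
from scipy import stats

# synthetic annual maxima generated from a Gumbel parent, for illustration only
peaks = stats.gumbel_r.rvs(loc=1400., scale=400., size=80, random_state=1)

for name in ("gumbel_r", "lognorm", "genextreme"):
    dist = getattr(stats, name)
    params = dist.fit(peaks)                       # maximum likelihood fit
    stat, p = stats.kstest(peaks, name, args=params)
    print(f"{name:12s}  KS p-value = {p:.2f}")     # typically all well above 0.05
```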

A promising approach, rarely applied in flood frequency analysis, is the use of model selection techniques, which lead objectively to the selection of a single probability distribution (Zucchini, 2000). The application of model selection techniques, combined with knowledge of their performance, appears to increase the efficiency of design flood estimation (Mitosek et al., 2006, Di Baldassarre et al., 2008).
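As one example of such techniques (not necessarily the specific criteria used in the works cited above), the Akaike Information Criterion ranks the fitted candidates and picks a single model; the synthetic sample and the candidate set are again assumptions for illustration.

```python
import numpy as np
from scipy import stats

peaks = stats.gumbel_r.rvs(loc=1400., scale=400., size=80, random_state=1)  # synthetic sample
candidates = {"gumbel_r": 2, "lognorm": 3, "genextreme": 3}  # model name -> number of parameters

aic = {}
for name, k in candidates.items():
    dist = getattr(stats, name)
    params = dist.fit(peaks)
    ll = dist.logpdf(peaks, *params).sum()   # maximized log-likelihood
    aic[name] = 2 * k - 2 * ll               # AIC: penalize extra parameters
best = min(aic, key=aic.get)
print(aic, "->", best)
```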

In this paper the series of annual maximum peak flows of the River Tiber in the town of Rome (Ripetta) is analysed using the information available since the XV century. In fact, the systematic stage measurements began at the end of the XVIII century, while the historical record of the peak stages of the floods that inundated the town goes back to the Renaissance. This provides a unique opportunity to study the long-term behaviour of floods, provided that a method can be devised to estimate the stage–discharge relationships at the ancient Ripetta landing, where the stages of the exceptional floods were recorded.

The paper is organized as follows. The available systematic and historical data samples are described in detail in “Available data”. Two data samples are available: a systematic sample of 185 events from 1782 to 1989 and a censored sample of 20 historical floods exceeding a perception threshold from 1422 to 1989. The flood frequency analysis is developed in “Flood frequency analysis”. Owing to the complex behaviour shown by the systematic sample, we propose fitting both a heterogeneous multi-parameter probability law, estimated using the whole sample, and a simple two-parameter probability law, estimated using only the upper quantiles. A simple criterion for the selection of the “optimum” model for the estimation of upper quantiles is proposed in “Selection of the probabilistic model”. Some conclusions are drawn in “Conclusion”.

Section snippets

Available data

Systematic stage measurements on the River Tiber in Rome began in 1781, and daily stages have been systematically measured at midday since 1782. Unfortunately, the original records from 1787 to 1788 and from 1793 to 1801 were lost, and only the maximum and minimum values for each month are now available for those years. Stage observations were suspended after 1801 and were resumed only in 1821, continuing up to the present with only minor interruptions. Thus, the annual maximum daily stages are known from 1782 to

Systematic sample

The parameters of the probability distributions were estimated by the method of maximum likelihood. The method is a non-analytical technique which offers the significant advantage of being easily applied when the analysis involves different types of data, such as systematic and historical data, or multimodal and complex probability density functions (e.g. distributions derived from the combination of events drawn from differently distributed variables). Under regularity conditions, maximum
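As a hedged, self-contained sketch of the asymptotic-normality property invoked here, the covariance of the maximum likelihood estimates can be approximated by inverting a numerical Hessian of the negative log-likelihood at its minimum; the Gumbel model, the synthetic sample and the step size are assumptions made only for the illustration.

```python
import numpy as np
from scipy import stats, optimize

# synthetic systematic sample, for illustration only
peaks = stats.gumbel_r.rvs(loc=1400., scale=400., size=120, random_state=2)

def nll(theta):
    loc, log_scale = theta
    return -stats.gumbel_r.logpdf(peaks, loc=loc, scale=np.exp(log_scale)).sum()

res = optimize.minimize(nll, x0=[1300.0, np.log(300.0)], method="Nelder-Mead")

def hessian(f, x, eps=1e-3):
    # central-difference approximation of the Hessian matrix of f at x
    n = len(x)
    H = np.zeros((n, n))
    I = np.eye(n) * eps
    for i in range(n):
        for j in range(n):
            H[i, j] = (f(x + I[i] + I[j]) - f(x + I[i] - I[j])
                       - f(x - I[i] + I[j]) + f(x - I[i] - I[j])) / (4 * eps**2)
    return H

cov = np.linalg.inv(hessian(nll, res.x))  # asymptotic covariance of (loc, log scale)
```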

Selection of the probabilistic model

Summing up, the behaviour of the upper quantiles of the annual maximum peak flow series of the River Tiber in Rome was described assuming three different probability models:

  • (a) NG-II, estimated on the systematic sample (1782–1989);

  • (b) G-Q1, estimated on the systematic sample censored under the threshold Q1 (1782–1989);

  • (c) G-Q2, estimated on the censored sample of the exceptional floods (1422–1989).

Since the stress is mainly on extreme events, the selection of a model should involve upper tail

Conclusions

This paper deals with the problem of finding a criterion for selecting the optimum distribution among competing probability models estimated using different samples.

The proposed sample quantile criterion is based on the principle of maximizing the probability density of the elements of the sample that are considered relevant to the problem, and takes into account both the accuracy and the uncertainty of the estimation. This criterion was applied to the series of the
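Only as a loose, assumption-laden reading of that principle (and not the paper's actual index, which also accounts for estimation uncertainty), one could score each candidate model by the log-density it assigns to the largest observed floods; the synthetic data, the candidate set and the choice of the ten largest values are all illustrative assumptions.

```python
import numpy as np
from scipy import stats

peaks = stats.gumbel_r.rvs(loc=1400., scale=400., size=120, random_state=3)  # synthetic sample
relevant = np.sort(peaks)[-10:]          # the elements considered relevant: the 10 largest maxima

scores = {}
for name in ("gumbel_r", "genextreme"):
    dist = getattr(stats, name)
    params = dist.fit(peaks)
    scores[name] = dist.logpdf(relevant, *params).sum()  # density assigned to the relevant elements
print(max(scores, key=scores.get), scores)
```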

References (29)

  • Cohn, T.A., et al., 2001. Confidence intervals for expected moments algorithm flood quantile estimates. Water Resources Research.

  • D’Agostino, R.B., et al., 1986. Goodness-of-fit Techniques.

  • Di Baldassarre, G., Laio, F., Montanari, A., 2008. Design flood estimation using model selection criteria. Physics and...

  • England, J.F., et al., 2003. Comparison of two moment-based estimators that utilize historical and paleoflood data for the log Pearson type III distribution. Water Resources Research.