Letter The following article is Open access

Causes of dependence between extreme floods

Published 20 July 2021 © 2021 The Author(s). Published by IOP Publishing Ltd
Citation Cristina Deidda et al 2021 Environ. Res. Lett. 16 084002 DOI 10.1088/1748-9326/ac07d5


1748-9326/16/8/084002

Abstract

Compound events, such as compound floods, have rapidly attracted interest because of the strong impacts associated with them. Spatial dependence plays a fundamental role in the dynamics of these events, and causal investigations of its origins could help to elucidate those dynamics. Here, we addressed the pairwise spatial dependence between annual maximum (instantaneous) discharges recorded at river stations located in the United Kingdom. First, we tested the hypothesis that the dependence comes from the co-occurrence of annual maxima, using Kendall's tau measure of association and its conditional version, calculated from the non-co-occurrent values. This hypothesis, commonly accepted in the literature, attributes the origin of the spatial dependence between extreme floods to co-occurrence. The analysis showed that there is also dependence between annual maxima pertaining to catchments located very far from one another, where the co-occurrence of annual maxima is small, if not zero. We formulated a general hypothesis to explain the spatial dependence between annual maxima: dependence is the compound result of co-occurrences and of climatological and hydrological similarities. The origin of dependence is more complex than what is presently stated in the literature. Thus, not only is synchronization a cause, but similarities in climate and hydrological response may also play a role. We introduced three dissimilarity indices and dependence-dissimilarity maps to illustrate this general hypothesis.

Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 license. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

1. Introduction

Compound and cascading natural events, like floods, droughts, wildfires and heatwaves, may be due to a combination or interaction of different physical processes [1, 2]. The cause of such events is not always a single process, but the combination of this process with other factors [3–5]. The study of the dependence and the joint co-occurrence of the different variables at play is of crucial importance for risk analysis and for a deeper understanding of such hazards [6–8]. Here, the attention is focused on extreme floods, represented by the annual maximum discharge, and in particular on the spatial dependence of extreme floods at different river sites, because compound extreme floods are high-impact events with severe consequences, as recently pointed out by Kemter et al [9]. The spatial dependence of annual maximum floods has been recognized as important within the context of flood frequency regionalization studies (see e.g. [10], illustration 2.2, and references therein including [11, 12]) because the presence of spatial dependence has an impact on the information content within the region, and consequently on the evaluation of the uncertainty of quantile estimates. On the other hand, in hydrology, a large body of literature is available on the categorization of catchment behavior, looking for ways of assessing hydrological similarity [13–18]. Delineating groups of homogeneous catchments with similar hydrological behavior is the basis of predictions in ungauged catchments using rainfall-runoff modeling (see, for example, the Prediction in Ungauged Basins PUB initiative, 2003–2012, of the International Association of Hydrological Sciences https://iahs.info/pub/index.php) [15, 19], and is also considered in the definition of homogeneous regions in regionalization techniques [20–23].
Homogeneous catchments can share a similar response to meteo-hydrological drivers, which is essential for risk analysis and modeling of ungauged catchments [20, 21, 24]. Different strategies and methods have been studied to determine hydrological similarity, focusing on similarity factors such as physiographic catchment characteristics (altitude, slope), flood statistics, meteorological variables and geographical information [25–27]. Although rainfall-runoff modeling is routinely used for hydrological applications, detecting homogeneous groups of catchments is still an intricate issue, as is the very definition of similarity [24, 28, 29].

Understanding the causes of spatial dependence of extreme floods is a crucial issue for predictive analyses and risk assessment procedures [29]. The spatial dependence between flood extremes can be linked to a similarity of physical characteristics of catchments [25] or to similar precipitation patterns [30]. However, similarity in physiographic catchment characteristics does not necessarily imply similarity in hydrological response [28]. On the other hand, recent studies highlight the impacts of synchronization of events: meteorological drivers can cause similar patterns in the basin response even at large distances [1, 31–37]. The complexity of the physical processes that characterize the transformation from rainfall to discharge involves meteorology, physical characteristics and climatology [38].

Here, we have investigated the causes of spatial dependence between extreme floods in different stations, even if they are far from each other [31, 39]. To understand the origin of this dependence, we have considered co-occurrence, climatological and hydrological causes, and tested two hypotheses on the origin of such dependence.

2. Is dependence due to co-occurrence?

As our study area, we selected the United Kingdom (UK). We used a dataset of annual maximum (instantaneous) discharge, which includes 882 stations (figure 1(a)). The dataset is derived from the UK National River Flow Archive, a National Hydrological Information Service that collects river flow data [27, 40–42]. We considered all the pairs of stations (388 521) and selected only those with at least 20 years of data in common, which left 224 185 pairs, most of them covering the period between 1960 and 2000. We estimated Kendall's tau, τk , for each couple (equation (2) in Methods). In figure 1, we report the value of Kendall's tau and the p-value of the independence test for all the couples considered. We found both positive and negative values of Kendall's tau. The presence of negative values of Kendall's tau in spatial extremes such as annual maximum floods seems suspicious because, from a theoretical point of view, according to the max-stability property, extreme-like variables can only be positively dependent or independent [10, 43]. To control the probability of committing a type-I error in multiple tests, the false discovery rate procedure [44] was applied (see section 3.2). Accordingly, the p-value threshold selected was 0.0017, to distinguish with enough confidence the dependent couples (zoomed in figure 1(b)) from the independent ones. From figure 1(b), it is possible to see that a p-value threshold of 0.0017 corresponds substantially to statistically dependent couples having $|\tau_k|\gt0.23$. Thus, among the 224 185 couples, we found that 216 482 were independent (96.5%) and the remaining 7703 (3.5%) were statistically dependent, with both positive and negative values. Among the dependent couples, just 50 were negatively dependent. Significant negative values of Kendall's tau were considered here as type-I errors, attributed to the sampling variability of the estimator, and were therefore not counted.
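The pair selection described above can be sketched in a few lines. This is a minimal illustration rather than the authors' code: the data layout (a dictionary mapping each station identifier to its yearly maxima) and the function name `common_year_pairs` are assumptions made for the example. It also checks that 882 stations give 388 521 unordered pairs.

```python
from itertools import combinations

MIN_COMMON_YEARS = 20   # minimum overlap stated in the text
N_STATIONS = 882        # number of stations stated in the text

# Number of unordered station pairs: n * (n - 1) / 2
N_PAIRS = N_STATIONS * (N_STATIONS - 1) // 2


def common_year_pairs(series, min_common=MIN_COMMON_YEARS):
    """Return {(station_a, station_b): sorted common years} for pairs
    sharing at least `min_common` years.

    `series` is a hypothetical layout: {station_id: {year: annual_max}}.
    """
    pairs = {}
    for a, b in combinations(sorted(series), 2):
        years = sorted(set(series[a]) & set(series[b]))
        if len(years) >= min_common:
            pairs[(a, b)] = years
    return pairs
```

Only the pairs returned by such a filter would enter the Kendall's tau estimation; all other pairs are discarded before any test is run.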

Figure 1. Estimate of Kendall's tau and its p-value for each couple of stations in the UK with at least 20 years of data in common, with the colors indicating the distance. The left inner panel (a) shows a UK map of the selected stations, and the right inner panel (b) shows a zoom of the Kendall's tau values of the significantly dependent couples, of which only the positive ones were considered.

Then, we tested the hypothesis that the dependence between annual maxima comes from co-occurrence, i.e. that co-occurrence is the common cause of the spatial dependence between annual maximum discharges. In the literature [30, 39], this is the most widely accepted hypothesis. To test it, we calculated for each couple of stations the co-occurrence frequency (equation (4)) and their geodesic distance, which varies between 0.013 and 944 km. We considered the annual maxima to have co-occurred if they happened within the same time window, with a tolerance of ±3 days. This value agrees with previous studies [31, 45, 46] on the lag time of both meteorological (precipitation) and hydrological variables.
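As an illustration of the ±3 day rule, the sketch below computes a co-occurrence frequency for one couple of stations. Equation (4) is not reproduced in this section, so the definition used here (the fraction of common years in which the two annual maxima fall within the tolerance) is an assumption, as are the function name and the input layout.

```python
from datetime import date

TOLERANCE_DAYS = 3  # tolerance stated in the text


def co_occurrence_frequency(dates_a, dates_b, tol_days=TOLERANCE_DAYS):
    """Fraction of common years whose annual maxima occur within
    `tol_days` of each other (assumed form of the Syn measure).

    `dates_a` and `dates_b` are hypothetical inputs of the form
    {year: date of the annual maximum}.
    """
    common = sorted(set(dates_a) & set(dates_b))
    if not common:
        raise ValueError("the two stations share no years")
    hits = sum(
        abs((dates_a[y] - dates_b[y]).days) <= tol_days for y in common
    )
    return hits / len(common)
```

Under this assumed definition, a value of 0 means that the annual maxima never fall in the same window, and a value of 1 means that they always do.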

Figure 2 summarizes the results and distinguishes the dependent and independent couples. Figures 2(a)–(c) show the sample densities of Kendall's tau (figure 2(a)), the co-occurrence frequency, Syn, calculated according to equation (4) (figure 2(b)), and the distance for each couple of stations (figure 2(c)). From figure 2(b), we found that independent couples have lower values of co-occurrence frequency compared to dependent ones. Looking at the distribution of distances in figure 2(c), it can be seen that most of the dependent couples have a distance less than say 210 km, but there are also dependent couples with higher distances, as can be seen from the right tail of the sample density. Figures 2(d)–(f) also show the scatter plots of each variable with the others. From the scatter plot of Kendall's tau with the co-occurrence frequency in figure 2(d), it is clear that the co-occurrence can play a key role in the degree of dependence: the highest values of Kendall's tau correspond to the highest co-occurrence frequency. Of the 7653 positively dependent couples, it emerges that 40.9% of couples have $Syn \geqslant 0.4$. It is clear that the co-occurrence frequency has a strong impact on the dependence of annual maxima. At the smallest distance, there are also couples with a $Syn \geqslant 0.6$ (9.8% of total). This group of couples will be called the high synchrony (High Syn) group, which represents the trivial case where the dependence is linked to co-occurrence due to the closeness of sites. Moreover, from the scatter plot of the couple distance with the co-occurrence frequency in figure 2(e), it is possible to see that decreasing the distance increases the co-occurrence frequency.

Figure 2. Kendall's tau, percentage of co-occurrence (Syn), and geodesic distance for dependent (blue triangles) and independent (pink circles) couples. The first row shows the sample densities and the second row the scatter plots for each pair of variables.

To further investigate the statistical dependence, we split the dependent couples into two groups, near and far, as shown in figure 3(a). We selected a threshold distance in the range [200, 220] km, say 210 km (gray band in figure 3(a), with a dashed line at 210 km), to distinguish near couples from far ones, because the median value of Kendall's tau, calculated with a regression line (black line in figure 3(a)), is approximately constant beyond this distance. Figure 3(b) shows a histogram of the co-occurrence frequency, distinguishing between near and far couples. Near couples show a co-occurrence frequency in the range [0, 0.96], while far couples have a lower degree of synchronization, in the range [0, 0.44], and a smaller dependence, with values of Kendall's tau in the range [0.28, 0.65], as is evident from figure 3(a). The existence of significantly dependent couples at large distances, or with low co-occurrence of extreme floods, led us to explore the causes of dependence further. To investigate how much co-occurrence influences the dependence, we refined the analysis by computing the conditional Kendall's tau (see section 3.2).

Figure 3. (a) Scatter plot of distance—Kendall's tau for the dependent couples. The gray band (with a range of distance from 200 to 220 km) represents the distinction between near and far couples, while the line gives the regression line over the median values. (b) Histogram of synchronization (Syn) for near (upper subpanel) and far (lower subpanel) couples.

For each of the 7653 positively dependent couples, we split the data into two parts: the co-occurrent (synchronous) part, which includes all the years in which the annual maximum discharges co-occur, and the non-co-occurrent (asynchronous) part, which is the data remaining after the synchronized years are removed. Kendall's tau was then calculated on the asynchronous part. We did this only for the couples whose asynchronous part contained at least 20 data, i.e. 5562 couples in total. The same procedure used for the analysis of all the data was applied here: the p-value threshold was selected by applying the false discovery rate procedure (see section 3.2), and was found to be 0.0424. Two different behaviors were found: dependent–asynchronous independent (indicated in the following as Dep-Ind) and dependent–asynchronous dependent (Dep-Dep). For the rest of the dataset, the conditional Kendall's tau cannot be computed because the asynchronous part contains fewer than 20 data. The maximum synchronization among the couples with at least 20 asynchronous data is 0.63. Of the 7653 couples, 752 have $Syn\geqslant 0.60$; even though the conditional Kendall's tau analysis cannot be performed for these couples (the asynchronous part has fewer than 20 data), we can state that their dependence comes from co-occurrence, given the very high percentage of synchronization. Among the 5562 couples analyzed, 4767 (85.7%) were dependent–asynchronous dependent, and 795 (14.3%) were dependent–asynchronous independent. 'Dep-Ind' denotes the couples that become independent once the co-occurrent part of the discharge data is removed. These couples support the hypothesis that dependence comes from co-occurrence. 'Dep-Dep' denotes the couples that are still dependent after the co-occurring data are removed.
The results of the conditional Kendall's tau on the asynchronous part show that the majority of couples (85.7%) remain dependent even when the co-occurring data are excluded. This means that, without accounting for the co-occurrent floods, the data series are still significantly dependent; in other words, there exists a concordance of extreme floods that cannot be completely explained by local-scale phenomena such as extreme flood synchronization. As a first conclusion, co-occurrence cannot be considered the only driver of the statistical dependence between annual maximum discharge variables. This led us to search for other (large-scale) explanations of this dependence.
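The conditional analysis can be sketched as follows: drop the synchronous years, then recompute Kendall's tau on what remains, provided at least 20 values are left. This is a minimal sketch under assumptions. The function names and the Boolean flag list marking synchronous years are hypothetical, and the simple pair-counting estimator stands in for whatever implementation was actually used.

```python
from itertools import combinations

MIN_ASYNC_LENGTH = 20  # minimum asynchronous sample size stated in the text


def kendall_tau(x, y):
    """Sample Kendall's tau from concordant (c) and discordant (d) pairs."""
    c = d = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            c += 1
        elif s < 0:
            d += 1
    if c + d == 0:
        raise ValueError("no concordant or discordant pairs")
    return (c - d) / (c + d)


def asynchronous_tau(x, y, synchronous, min_len=MIN_ASYNC_LENGTH):
    """Kendall's tau on the years NOT flagged as synchronous.

    `synchronous` is a hypothetical list of booleans, one per common year.
    Returns None when fewer than `min_len` asynchronous years remain.
    """
    xa = [xi for xi, s in zip(x, synchronous) if not s]
    ya = [yi for yi, s in zip(y, synchronous) if not s]
    if len(xa) < min_len:
        return None
    return kendall_tau(xa, ya)
```

A couple would be labeled Dep-Dep when this asynchronous tau is still significant, and Dep-Ind when it is not; a return value of `None` corresponds to the couples for which the conditional analysis cannot be performed.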

3. Causes of dependence: co-occurrence, climatology and hydrology

Here, we formulate a different hypothesis about the spatial dependence between the annual maximum discharge. We argue that dependence comes not only from the co-occurrence of maxima, but also from climatological and hydrological similarities. In other words, our hypothesis is that the co-occurrence of annual maximum discharge is not the only common cause of the dependence between the annual maximum discharge, but there is a compoundness of causes, including climatological and hydrological inner similarities in addition to the event synchronization, which play a role in the dependence between annual maximum discharge. Clearly, climatological and hydrological similarities and co-occurrences are correlated and not mutually exclusive. Thus, they can be viewed as three causes that act together, but some or one of them can be greater or lesser influencing factors in the dependence. In order to investigate the influence of these causes on the dependence between the annual maximum discharge variables, we have introduced three indices, namely the asynchrony index IA and the climatological IC and hydrological IH dissimilarity indices, and dependence-dissimilarity maps, where Kendall's tau is plotted with respect to each couple of the three indices (see section 3.2). The three indices reflect different causative drivers of extreme flood dependence, as well as different scales and perspectives of the problem. While the asynchrony index captures the joint occurrence of flood events, rainfall- or snow-melt-based, having a local temporal scale of a few days, the climatological dissimilarity index adds a large-scale perspective, a month–year temporal-scale explanation of the dependence, with the rationale that an inner dependence may come from the similar climatological regime. 
The hydrological dissimilarity index adds a catchment-scale perspective of the dependence, with the rationale that for an inner dependence in addition to the similar climatological regime, the similarity in hydrological response, in terms of the hydrological, topographical and land cover characteristics, also has a role. The quantification of each of the three indices is considered easy to calculate, reproducible and of general applicability.

The three indices vary in the [0, 1] range: the higher the value of the index, the higher the degree of dissimilarity. An index equal to 0 means full similarity in the characteristics that the index represents. The three indices were computed for the subsample of couples for which the hydrological and topographical information was available. We made the calculations for 7494 dependent and 198 461 independent couples. Among the dependent ones, 4672 are Dep-Dep, 781 Dep-Ind, and 521 High Syn.

Figures 4(a)–(c) show boxplots of the three indices (asynchrony in panel (a), climatological in panel (b) and hydrological in panel (c)), separating the dependent and independent couples. From figures 4(a)–(c), it is possible to see that, for all three indices, the boxplots of the dependent couples are lower than those of the independent couples, indicating a higher degree of similarity with respect to asynchrony, climatology and hydrology. In figures 5(a)–(f), the dependence-dissimilarity maps are shown separately for dependent (panels (a, c, e)) and independent (panels (b, d, f)) couples. In particular, Kendall's tau is plotted in the plane $I_A-I_H$ (asynchrony index–hydrological index) in the first row, in the plane $I_A-I_C$ (asynchrony index–climatological index) in the second row, and in the plane $I_C-I_H$ (climatological index–hydrological index) in the third row. From figure 5, it is possible to see more pixels distributed in the lower-left corner of the maps for the dependent case than for the independent case. Looking only at the dependent case, however, there are more pixels located on the right side, where the dissimilarity is higher.

Figure 4. Boxplots of asynchrony index (panel (a)), climatological (panel (b)) and hydrological (panel (c)) dissimilarity indices, where dependent and independent couples are distinguished.

Figure 5. Dependence-dissimilarity maps for $I_A-I_H$ in the first row, $I_A-I_C$ in the second row, and $I_C-I_H$ in the third row, for dependent couples (on the left) and independent ones (on the right). Each pixel (0.05 × 0.05) reports the median value of Kendall's tau.

In general, we can see that in all three maps, for the dependent case, the highest values of Kendall's tau are located in the lower-left corner of the map, so in the area where the dissimilarity indices have the lowest values. Moving toward the upper-right corner of the map, the Kendall's tau decreases. For the present case study (UK territory), we did not have the chance to explore the highest value of the climatological dissimilarity index (the maximum value observed was 0.8) as well as that of the hydrological dissimilarity index (the maximum value observed was 0.82). In any case, what is possible to observe from figures 5(c) and (d) is that the upper-left part of the map is empty. This means that high similarity due to co-occurrence cannot be associated with high dissimilarity from the climatological point of view, while the opposite is possible. A similar comment can be made when looking at figures 5(a) and (b), the $I_H-I_A$ maps. Looking at figures 5(e) and (f), the $I_C-I_H$ maps, we can see that all the couples are located in the lower-left corner of the map. Note that there are no dependent couples with both hydrological and climatological high dissimilarities.
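The construction of a dependence-dissimilarity map, as described in the caption of figure 5, amounts to binning the couples on a grid of 0.05 × 0.05 pixels and taking the median Kendall's tau in each pixel. The sketch below illustrates this under assumptions: the function name and the input layout (three parallel lists) are hypothetical, and empty pixels are simply absent from the result.

```python
from collections import defaultdict
from statistics import median

PIXEL = 0.05  # pixel size stated in the caption of figure 5


def dependence_dissimilarity_map(index_x, index_y, tau, pixel=PIXEL):
    """Median Kendall's tau per pixel of the (index_x, index_y) plane.

    Both indices are assumed to lie in [0, 1]. Returns a dictionary
    {(row, col): median tau}, where col follows index_x and row index_y.
    """
    n_bins = round(1 / pixel)
    cells = defaultdict(list)
    for ix, iy, t in zip(index_x, index_y, tau):
        # Clamp so that an index equal to 1 falls in the last pixel.
        col = min(int(ix * n_bins), n_bins - 1)
        row = min(int(iy * n_bins), n_bins - 1)
        cells[(row, col)].append(t)
    return {cell: median(values) for cell, values in cells.items()}
```

Pixels missing from the returned dictionary correspond to the empty regions of the maps, such as the upper-left part discussed above.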

Figures 6(a)–(c) show boxplots of the three dissimilarity indices in the first row, and panels (d) and (e) in the second row give the boxplots of distance and Kendall's tau, distinguishing between the Dep-Dep, Dep-Ind and High Syn couples. The latter group has the highest values of Kendall's tau and the lowest values of distance, as well as the lowest values of the three dissimilarity indices. In particular, its boxplot of the asynchrony index is much lower than those of the Dep-Dep and Dep-Ind couples. For High Syn couples, we can say that synchrony is an influential driver, but we cannot conclude that it is the only driver, because we cannot separate the contribution of synchrony from the contribution of climatological and hydrological similarities. The Dep-Ind couples have intermediate distances, smaller than those of Dep-Dep couples but larger than those of High Syn ones. The Dep-Ind and Dep-Dep couples have higher values of the dissimilarity indices than High Syn couples, and smaller values of Kendall's tau. Dep-Dep couples show the greatest distances. Dep-Dep and Dep-Ind couples have similar Kendall's tau values, but the percentage of synchronization is higher for the Dep-Ind couples. The conditional Kendall's tau analysis provides evidence that a large number of couples are dependent even with very low values of synchronization, or with no co-occurrent data at all. From the boxplots, we can see that the Dep-Dep and Dep-Ind couples are very similar in terms of catchment characteristics. Dep-Dep couples have larger distances and also the lowest values of synchronization. For this group, synchronization cannot be the major cause of dependence; the dependence is instead due to a compound effect of hydrological and climatological similarity. Dep-Ind couples lie at intermediate distances: they are not near enough to have high synchrony, but they have a higher percentage of synchronization than Dep-Dep couples.
The independence that emerges after removing the co-occurrences from the sample could be a statistical effect of the short length of the remaining data. From the boxplots in figure 6, we can see that the couples with the highest synchrony are also those with the highest similarity. This exemplifies how the relative contribution of each index is not easy to identify, and how the explanations may overlap. High Syn couples are the closest and the most highly correlated. Because of their closeness, they will probably have the same hydrological characteristics, the same climatology and high synchrony. In this situation, it is not possible to disentangle the three main causes, and the dependence will be the result of a combination of the three; see figure 6. For Dep-Dep couples, we can say that the dependence cannot be entirely attributed to co-occurrence, because the asynchronous part is still dependent. In conclusion, considering just one index or one cause may lead to a partial explanation of the problem, but will not provide a complete representation of the mechanisms that are at the root of extreme flood dependence.

Figure 6. Boxplots of asynchrony and climatological and hydrological dissimilarity indices (panels (a, b, c)), distance (panel (d)), and Kendall's tau (panel (e)) for the three subgroups of couples: Dep-Dep, Dep-Ind and High Syn.

3.1. Discussion and conclusion

Recent studies [33, 36] have pointed out the existence of large-distance connections between atmospheric variables. The large-scale meteorological circulation can influence the occurrence and co-occurrence of extreme events, like heavy rains and floods [33, 35, 47, 48]. In the period 1960–2010, flood synchrony increased by 50%, increasing the distance over which floods co-occur in Europe, due to compound weather and landscape conditions [37]. In response to these effects, new models have been developed that take into account meteorological drivers, atmospheric teleconnections and regional climate information for the assessment of flood risk [33]. On the other hand, there exists a large body of literature on the use of regionalization techniques for predicting hydrological variables in ungauged catchments [23, 49], where the concept of similarity between catchments is considered [14, 20–22, 29, 50]. A recognized classification system is still lacking, owing to the complexity of categorizing the variability and heterogeneity of catchment characteristics and hydrological mechanisms [29]. Including catchment similarities and causal perspectives is of fundamental importance to give a full explanation of extreme flood dependence. Although co-occurrence can have a strong influence on the dependence between near couples, it cannot be regarded as the unique cause of dependence. Co-occurrence explains a large part of the problem addressed; however, in the cases where there is significant statistical dependence with practically zero co-occurrence, the dependence has another explanation: it is the joint result of similarities in climatology and hydrology. This represents the non-trivial part of the explanation of the problem.
Here, having in mind both the synchronization of flood peaks and the hydrological similarities considered in regionalization techniques, we have investigated the causes of spatial dependence between extreme floods, firstly considering the co-occurrence of annual maxima, and then in a more general framework, introducing three dissimilarity indices: the asynchrony index and the climatological and hydrological dissimilarity indices. The asynchrony index accounts for the co-occurrence of annual maxima. This index represents the synchronized flood events that can be the result of meteorological factors (heavy rain) or hydrological factors (snow melting). From the Kendall's tau and conditional Kendall's tau of each couple of stations, it emerges that the synchronization of floods has a key role on the spatial dependence, as expected [30, 39], but in some cases, especially for long-distance stations, we found that it is not the sole cause or not at all the cause of dependence, motivating the consideration of climatological and hydrological issues. Thus, we considered the climatological dissimilarity index, which accounts for the annual precipitation, and the hydrological index, which takes in different information about the topography (maximum altitude, longest drainage path), hydrological characteristics (baseflow, percentage of runoff) and land cover (percentage of grassland and arable horticultural coverage) [16, 17, 21, 51]. The hydrological dissimilarity index is defined as the mean of a series of dissimilarity indices that take into consideration specific hydrological, topographical and land cover characteristics. The definition of this index is quite general, and flexible to include other variables, if necessary. The three dissimilarity indices grouped the important variables for similarity on the basis of three main driving mechanisms of dependence [38]. 
The application of conditional analyses, based on the conditional Kendall's tau, has been useful to disentangle the role of co-occurrences from that of climatological and hydrological causes in the spatial dependence. We found that some couples were dependent even when the years of co-occurrent floods were not considered. This highlights the presence of couples for which there exists an inner concordance of extreme floods, and leads us to look at the dependence from a more general perspective, considering indices of similarity between the two catchments as potential causes of dependence. We used dependence-dissimilarity maps and boxplots to assess the variability of Kendall's tau as a function of the dissimilarity indices; in particular, the dependence-dissimilarity maps show the behavior of Kendall's tau for each couple of indices. These maps are an easy way to visualize the interconnections between the drivers and the value of dependence. As the distance between the stations increases, synchronization loses its key role in explaining the dependence, revealing that the dependence is more closely linked to climate and hydrological similarities. The three indices are interrelated, but identifying their relative contributions is not easy, and the explanations may overlap. For the near couples, it is clear that synchronization plays the main role, but these couples also show small values of the hydrological and climatological dissimilarity indices because of their closeness. In addition, the dependence-dissimilarity maps show that the highest values of Kendall's tau correspond to the smallest values of the dissimilarity indices. The fact that dependence also exists at large distances and between catchments of different areas shows the importance of looking for the causal mechanisms.
The three indices defined are thought to be the three main causative categories, but could possibly be expressed differently, considering different indicators. The results of this study are restricted to the United Kingdom, where the variability of climatology is low. An extension of the case study to European or worldwide scale could provide useful hints and indications about the spatial dependence of extreme floods from a context with more climatological and hydrological variability, and represent a good test case for the usability of the indices here introduced. In addition, it is important to note that this analysis could be applied as it is to extreme floods selected with a peak over threshold approach rather than using an annual maxima approach.

3.2. Methods

3.2.1. Dataset

We considered the discharge time series dataset from the National River Flow Archive (available at https://nrfa.ceh.ac.uk/), which includes 882 stations in the UK. In particular, we used the dataset of peak over threshold discharge, from which we extracted the maximum annual value of instantaneous discharge. The quality of the data is discussed in [40]. The dataset has been used in many other works, including [27, 40–42]. The first step of the data analysis was to consider all the possible combinations between the stations, extracting for each pair the years in common. We selected only the couples having at least 20 years of common observation period, in order to have enough data to calculate the pairwise statistical dependence, and we only considered stations with coordinates and catchment information available. As a result, among 388 521 couples in total, we focused our attention on 224 185.

3.2.2. Kendall's tau

For each couple, we calculated Kendall's tau, which is a non-parametric rank-based measure of association [10, 52]. Letting (X1,Y1) and (X2,Y2) be independent and identically distributed (i.i.d.) vectors of continuous random variables (r.v.s), Kendall's tau, τk , is defined as

$\tau_k = P\left[(X_1-X_2)(Y_1-Y_2)\gt0\right] - P\left[(X_1-X_2)(Y_1-Y_2)\lt0\right] \qquad (1)$

where P[.] is the probability. The first term on the right-hand side is the probability of concordance while the second one is the probability of discordance. τk is in the range [−1,+1]. Positive values $(\tau_k\gt0)$ mean positive dependence, while negative values $(\tau_k\lt0)$ mean negative dependence. In the case of independence, τk  = 0. If $(x_1, y_1 ), (x_2, y_2 ),\ldots, (x_n, y_n)$ is a sample of size n from the vector (X, Y), the sample version of Kendall's tau is

$\hat{\tau}_k = \frac{c-d}{c+d} \qquad (2)$

where c is the number of concordant pairs, i.e. the pairs $(x_i,y_i)$, $(x_j,y_j)$ satisfying the relation $(x_i-x_j)(y_i-y_j)\gt0$, while d is the number of discordant pairs, i.e. the pairs satisfying the relation $(x_i-x_j)(y_i-y_j)\lt0$. Kendall's tau is used as the test statistic for spatial independence: under the hypothesis of independence, the sample Kendall's tau is approximately normally distributed with zero mean and variance equal to $2(2n+5)/(9n(n-1))$. Because the instantaneous annual maximum discharge is recorded with a resolution of 1 m³ s⁻¹, the data series contain repeated values (viz. 'ties'), which can be a problem for the computation of Kendall's tau. A randomization procedure was used to eliminate the ties, following De Michele et al [53]. Specifically, to each repeated discharge value we added a random value extracted from a uniform distribution in the range [−0.1, 0.1].
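The computation described above can be illustrated with a short sketch. It is written under stated assumptions rather than taken from the authors' code: the function names are hypothetical, the sample statistic is computed as (c − d)/(c + d), which coincides with (c − d)/(n(n − 1)/2) once ties are removed, and the p-value is taken as two-sided, which the text does not specify.

```python
import math
import random


def break_ties(values, half_width=0.1, rng=None):
    """Add uniform noise in [-half_width, half_width] to repeated values only."""
    rng = rng or random.Random(0)  # fixed seed so the sketch is reproducible
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return [
        v + rng.uniform(-half_width, half_width) if counts[v] > 1 else v
        for v in values
    ]


def kendall_tau(x, y):
    """Sample Kendall's tau from concordant (c) and discordant (d) pairs."""
    c = d = 0
    n = len(x)
    for i in range(n - 1):
        for j in range(i + 1, n):
            sign = (x[i] - x[j]) * (y[i] - y[j])
            if sign > 0:
                c += 1
            elif sign < 0:
                d += 1
    if c + d == 0:
        raise ValueError("no concordant or discordant pairs")
    return (c - d) / (c + d)


def independence_p_value(tau, n):
    """Two-sided p-value from the normal approximation under independence."""
    variance = 2 * (2 * n + 5) / (9 * n * (n - 1))
    z = tau / math.sqrt(variance)
    return math.erfc(abs(z) / math.sqrt(2))
```

Since discharges are recorded at 1 m³ s⁻¹ resolution, a perturbation of at most 0.1 separates tied values without changing their order relative to any other value in the series.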

3.2.3. False discovery rate

Multiple testing procedures can lead to an increase in the false positive rate. To control the possibility of committing a type-I error, we applied the false discovery rate procedure proposed by Benjamini and Yekutieli [44]. The procedure consists of sorting in increasing order the p-values resulting from all the tests (say, m). We let α be the significance level, here α = 0.05, and p(i) be the ith ordered p-value. We let k be the largest value of i that satisfies the constraint

$p_{(i)} \leq \frac{i}{m}\,\alpha \qquad (3)$

Then p(k) is the p-value threshold used for rejecting the independence hypothesis: the hypothesis is rejected for the tests associated with the first k ordered p-values. Applied to the pairwise independence tests based on Kendall's tau, the procedure gave a p-value threshold of 0.0017.
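The thresholding step can be sketched as below. This is an illustration under an explicit assumption: the constraint is taken to have the usual step-up form p(i) ≤ (i/m)α. The optional `harmonic_correction` flag, which divides α by the sum 1 + 1/2 + … + 1/m as in the variant for arbitrary dependence, is included only for comparison; the function name and its parameters are hypothetical.

```python
def fdr_threshold(p_values, alpha=0.05, harmonic_correction=False):
    """Return p_(k), the largest ordered p-value meeting the step-up constraint.

    Assumed constraint: p_(i) <= (i / m) * alpha, optionally with alpha
    divided by the harmonic sum c(m) = 1 + 1/2 + ... + 1/m.
    Returns None when no p-value satisfies the constraint.
    """
    m = len(p_values)
    if harmonic_correction:
        alpha = alpha / sum(1.0 / j for j in range(1, m + 1))
    ordered = sorted(p_values)
    k = 0
    for i, p in enumerate(ordered, start=1):
        if p <= alpha * i / m:
            k = i  # keep the largest index satisfying the constraint
    return ordered[k - 1] if k else None
```

Independence is then rejected for every test whose p-value does not exceed the returned threshold.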

3.2.4. Co-occurrence and conditional Kendall's tau

We investigated whether the pairwise annual maxima are co-occurring (or synchronized). For each couple of stations, and each year, we considered that the annual maxima co-occur if the values occur on the same day, with a tolerance of ±3 days. This tolerance was chosen according to previous studies [31, 45, 46] on the lag time for both meteorological and hydrological variables. Thus, indicating with $Syn(S_i,S_j)$ the co-occurrence (or synchronization) frequency of the annual maximum discharge in the two stations Si and Sj , $Syn(S_i,S_j)$ is estimated as

$Syn(S_i,S_j) = \frac{n_{syn}}{n} \qquad (4)$

where $n_{syn}$ is the number of years in which the annual maxima at the two stations co-occur (within the ±3 day tolerance) and n is the number of joint data in the two stations Si , Sj . The greater the synchrony, the greater the percentage of extreme floods occurring on the same day (± lag time). Because we are interested in understanding whether co-occurrence is the cause of the statistical dependence between the annual maxima at a couple of sites, we split the sample of observations for each couple into two parts: observations where the annual maxima co-occurred (the synchronous part) and observations where they did not (the asynchronous part). We then calculated Kendall's tau for the asynchronous part, in other words Kendall's tau conditional on non-co-occurrence, in order to see whether the dependence changes significantly when the synchronous part is removed. We restricted this calculation to the couples having at least 20 data in the asynchronous part, and used a p-value threshold of 0.0424 for the independence test, obtained by applying the false discovery rate procedure [44].
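The splitting into synchronous and asynchronous parts, and the conditional Kendall's tau, can be sketched as follows. The data layout (`{year: (date, discharge)}`), the function names and the compact `kendall_tau` helper are assumptions made for this illustration; maxima are matched by a common year key, so a pair of peaks falling on either side of a year boundary is not treated specially here.

```python
from datetime import date


def kendall_tau(x, y):
    """Sample Kendall's tau, (c - d) / (c + d), assuming no ties."""
    c = d = 0
    for i in range(len(x) - 1):
        for j in range(i + 1, len(x)):
            sign = (x[i] - x[j]) * (y[i] - y[j])
            c += sign > 0
            d += sign < 0
    return (c - d) / (c + d)


def split_by_cooccurrence(peaks_i, peaks_j, tolerance_days=3):
    """Split joint years into synchronous and asynchronous discharge pairs.

    peaks_*: hypothetical layout {year: (date_of_annual_max, discharge)}.
    """
    synchronous, asynchronous = [], []
    for year in sorted(set(peaks_i) & set(peaks_j)):
        (day_i, q_i), (day_j, q_j) = peaks_i[year], peaks_j[year]
        lag = abs((day_i - day_j).days)
        part = synchronous if lag <= tolerance_days else asynchronous
        part.append((q_i, q_j))
    return synchronous, asynchronous


def synchrony(peaks_i, peaks_j, tolerance_days=3):
    """Fraction of joint years in which the annual maxima co-occur."""
    syn, asyn = split_by_cooccurrence(peaks_i, peaks_j, tolerance_days)
    return len(syn) / (len(syn) + len(asyn))


def conditional_tau(peaks_i, peaks_j, tolerance_days=3, min_pairs=20):
    """Kendall's tau on the asynchronous part, or None if it is too short."""
    _, asyn = split_by_cooccurrence(peaks_i, peaks_j, tolerance_days)
    if len(asyn) < min_pairs:
        return None
    return kendall_tau([q for q, _ in asyn], [q for _, q in asyn])
```

A couple whose conditional tau remains significant after the synchronous years are removed is one for which co-occurrence alone cannot explain the dependence.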

3.2.5. Catchment characteristics

For a comprehensive analysis of the factors that influence the dependence between pairwise annual maxima, we accounted for the catchment descriptors given in the National River Flow Archive and reported in the Flood Estimation Handbook (FEH) [41]. We considered characteristics including landform, climate, soil and land cover descriptors [24, 27]. These are currently used for flood statistical analysis [54] and in other catchment similarity metrics [27]. After doing some preliminary correlation analyses, we selected the characteristics that seemed to influence the dependence patterns most:

  • BFIHOST (indicated in the following with B): this is a base flow index and represents a measure of catchment responsiveness derived using the 29-class hydrology of soil types (HOST) classification [55]. The HOST dataset is available on a 1 km resolution grid, and reports the percentage associated with each HOST class present.
  • SAAR (indicated in the following with P): average annual rainfall (in mm), in the period 1961–1990 [54].
  • SPRHOST (indicated in the following with SPR): standard percentage runoff (%) associated with each HOST soil class.
  • Landform descriptors: maximum altitude (MA) and longest drainage path (LDP).
  • Land cover descriptors: arable/horticulture and grassland (indicated in the following with AHE and GR, respectively).

3.2.6. Dissimilarity indices

We used three indices to assess the dissimilarity between two catchments, each identified by a station: one of synchronization type, one of climatological type, and one of hydrological type.

  • Asynchrony Index. This is indicated as IA (i, j)∈[0, 1] and defined as
    $I_A(i,j) = 1 - Syn(S_i,S_j) \qquad (5)$
    where $Syn(S_i,S_j)$ is the co-occurrence (or synchronization) frequency of the annual maximum discharge in the two stations Si and Sj given in equation (4). IA (i, j) = 0 means that the two catchments are fully synchronized, i.e. 100% of their annual maxima co-occur. The synchronization can be due to co-occurring events, such as co-occurring meteorological events or co-occurring snow melting (thawing).
  • Climatological Dissimilarity Index. Let Si and Sj be two stations, identifying catchments i and j, respectively. Let Pi and Pj be the mean annual rainfall of catchments i and j, respectively. We define the climatological dissimilarity index IC (i, j)∈[0, 1] as
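    $I_C(i,j) = 1 - \frac{\min(P_i,P_j)}{\max(P_i,P_j)} = \frac{\max(P_i,P_j)-\min(P_i,P_j)}{\max(P_i,P_j)} \qquad (6)$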
    Equation (6) is well defined only if $\max(P_i,P_j)\gt0$; in the case of interest, both Pi and Pj are greater than zero. The index can be viewed as a normalized distance because ($\max (P_{i}, P_{j})-\min (P_{i}, P_{j})$) is the distance between the attributes (namely, the mean annual precipitation) in the two catchments, normalized with respect to $\max (P_{i}, P_{j})$ so that the index is in the range [0, 1]. IC (i, j) = 0 means that the two catchments are similar from a climatological point of view, having the same mean annual precipitation. As the index increases from 0 to 1, the dissimilarity between the two catchments increases.
  • Hydrological Dissimilarity Index. The hydrological dissimilarity index IH (i, j)∈[0, 1] is defined as
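    $I_H(i,j) = \frac{1}{6}\left[I_B(i,j)+I_{MA}(i,j)+I_{SPR}(i,j)+I_{LDP}(i,j)+I_{GR}(i,j)+I_{AHE}(i,j)\right] \qquad (7)$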
    i.e. the mean of (six) dissimilarity indices related to hydrology, land cover and landform, in order to take into account different characteristics of the two catchments i and j. The indices in equation (7) are defined as $I_{X}(i,j) = 1- \frac{\min (X_{i},\, X_{j})}{\max (X_{i},\, X_{j})}$, where X is one of B (BFIHOST), MA (the maximum altitude), SPR (SPRHOST), LDP (the longest drainage path), GR (the grassland area) or AHE (the percentage of arable/horticulture extension). As for the climatological index, the hydrological dissimilarity index can be viewed as the mean of normalized distances, calculated for each of the attributes considered. Thus, IH (i, j) = 0 means that the two catchments are similar from the hydrological point of view, being similar in terms of base flow, topography, runoff percentage, wetness and land cover characteristics. Vice versa, IH (i, j) = 1 means that the two catchments are dissimilar from a hydrological point of view, being dissimilar with respect to all the considered characteristics. Note that we have chosen the 'mean' operator to synthesize several hydrological and topographical characteristics in one index and to describe in mean terms the similarity/dissimilarity of two basins. Thus, for example, two basins may have a very different slope, but on average the rest of the characteristics are so similar that in the end the two catchments perform similarly. Clearly, other definitions of the hydrological dissimilarity index are possible, for example considering other, or more, characteristics, or replacing the mean operator with the min or max operator. The structure of the hydrological dissimilarity index is general and flexible enough to permit possible changes in the future, like the inclusion of other variables (e.g. among those given in [56]), if necessary.
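The three indices can be written compactly as in the sketch below. It assumes that the asynchrony index is the complement of the synchronization frequency and that all catchment descriptors are non-negative; the function names and the dictionary layout keyed by B, MA, SPR, LDP, GR and AHE are hypothetical. A pair of descriptors that are both zero leaves the ratio undefined, which the sketch reports as an error rather than resolving.

```python
HYDRO_KEYS = ("B", "MA", "SPR", "LDP", "GR", "AHE")


def ratio_dissimilarity(a, b):
    """Normalized distance 1 - min/max between two non-negative attributes."""
    largest = max(a, b)
    if largest <= 0:
        raise ValueError("index undefined when max(a, b) is not positive")
    return 1.0 - min(a, b) / largest


def asynchrony_index(syn):
    """Complement of the co-occurrence frequency (assumed form of I_A)."""
    return 1.0 - syn


def climatological_index(p_i, p_j):
    """Dissimilarity of mean annual rainfall between two catchments."""
    return ratio_dissimilarity(p_i, p_j)


def hydrological_index(catchment_i, catchment_j):
    """Mean of the six attribute-wise dissimilarities."""
    parts = [
        ratio_dissimilarity(catchment_i[key], catchment_j[key])
        for key in HYDRO_KEYS
    ]
    return sum(parts) / len(parts)
```

For instance, two catchments with mean annual rainfall of 1000 mm and 2000 mm give a climatological index of 0.5.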

In the literature, Wagener et al [29] explained clearly that there is a need for a proper catchment classification system and to improve 'the understanding of the interactions between hydroclimate and catchment form that result in different signatures of catchment function at various temporal and spatial scales of interest.' The complexity and differences between catchments lead to difficulty in finding homogeneity, but investigation of the patterns and connections can lead to advances in hydrological science through the formulation of hypotheses or relationships that may have general applicability. Catchment response can be linked to specific configurations of climate, hydrological, landscape and topographical properties [38]. Classification based on flow, soil, topographical and hydrological combinations of catchment characteristics showed relatively strong patterns of regional dependence [56, 57]; however, climate-driven factors (precipitation) can also be an effective measure of similarity [25, 38, 57]. On the other hand, co-occurring meteorological or synchronized snow melting events can be very influential for spatial dependence [30, 58]. Although a large body of research explores which variables most influence the spatial dependence, there is still no metric or index that unifies and categorizes the most important catchment characteristics. A proper classification metric has to take into account both large-scale and small-scale phenomena: hydroclimatic perspective, hydrometeorology and catchment-scale characteristics [38].

In the literature, the hydrological similarity between two catchments is often addressed with a dissimilarity metric based on a distance, for example a weighted Euclidean distance in the multidimensional space of attributes (which includes both physiographic and climatic characteristics), see e.g. [59]. Here, the weights reflect the relative importance of each attribute. In addition, since the attributes have different units and ranges of variability, they are standardized (e.g. dividing by the empirical standard deviation), even if this does not guarantee the same range of variability for all the standardized attributes. Other types of distance (Manhattan, maximum, Minkowski, Canberra) could be used for the identification of similarity [56].
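For comparison with the indices used in this work, a weighted Euclidean distance on standardized attributes might look like the sketch below. It is a generic illustration of the approach described in the literature, not the formulation of any specific cited study; the attribute names, weights and function names are hypothetical.

```python
import math
import statistics


def empirical_std(catchments, name):
    """Sample standard deviation of one attribute across all catchments."""
    return statistics.stdev([c[name] for c in catchments])


def weighted_euclidean_distance(attrs_i, attrs_j, std_devs, weights):
    """Weighted Euclidean distance between standardized attribute vectors."""
    total = 0.0
    for name, weight in weights.items():
        # Standardize the difference by the attribute's empirical spread.
        diff = (attrs_i[name] - attrs_j[name]) / std_devs[name]
        total += weight * diff ** 2
    return math.sqrt(total)
```

Unlike the ratio-based indices, the resulting distance is not bounded in [0, 1], so its values are not directly comparable across different sets of attributes.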

In this work, bearing in mind the existing literature, and keeping the underlying hypotheses to a minimum while still aiming for a flexible tool of analysis, we have proposed three different indices, in equations (5)–(7). These have the same range of variability, and we have investigated their relative role in the spatial dependence of extreme floods.

3.2.7. Dependence-dissimilarity maps

In order to investigate the variability of pairwise statistical dependence as a function of dissimilarity indices, we introduced dependence-dissimilarity maps. The goal of these maps is to visualize the relationship between the three considered dissimilarity indices and the pairwise statistical dependence between the annual maximum values of discharge. We have three maps, one for each pair of indices, where Kendall's tau value is reported. The maps are pixel-based, created from a total of 7494 points for dependent couples and 198 461 for independent ones. The pixels have a size of 0.05 × 0.05. For each pixel, the mean value of Kendall's tau is calculated among the points that fall inside that pixel. With these maps we can study the connection between the dependence of annual maximum discharge and synchronization, climatological and hydrological characteristics of catchments.
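The pixel-based averaging behind these maps can be sketched as follows. The input layout (an iterable of `(index_x, index_y, tau)` triples, one per couple) and the function name are assumptions for this illustration; values exactly equal to 1 are assigned to the last pixel so that the grid covers the closed range [0, 1].

```python
from collections import defaultdict


def dependence_dissimilarity_map(points, pixel=0.05):
    """Average Kendall's tau over square pixels of two dissimilarity indices.

    points: iterable of (index_x, index_y, tau), both indices in [0, 1].
    Returns {(row, col): mean tau} for the pixels containing points.
    """
    n_bins = round(1.0 / pixel)
    cells = defaultdict(list)
    for index_x, index_y, tau in points:
        # The small offset guards against floating-point error at pixel edges.
        col = min(int(index_x / pixel + 1e-9), n_bins - 1)
        row = min(int(index_y / pixel + 1e-9), n_bins - 1)
        cells[(row, col)].append(tau)
    return {cell: sum(taus) / len(taus) for cell, taus in cells.items()}
```

With a pixel size of 0.05, each map is a 20 × 20 grid, and pixels that contain no couples are simply absent from the returned dictionary.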

Additional data that support the findings of this study are available upon request from the authors.

Authors' contribution

CDM conceived the idea, CD and LR performed the data handling, CD and LR conducted the statistical analyses, CD prepared figures and wrote the first draft of the manuscript, and CDM, CD and LR reviewed and finalized the manuscript.

Acknowledgments

We wish to acknowledge the support from the Italian Ministry of University and Research (Ministero della Universitá e della Ricerca Scientifica) through the PRIN2017 project RELAID.

Data availability statement

The data that support the findings of this study are openly available at the following URL/DOI: https://nrfa.ceh.ac.uk/.
