1 Introduction

Like other major natural hazards, drought has caused environmental and economic devastation in many regions worldwide. Its effects arise from the interplay between reduced precipitation and the demand people place on water supply [1]. Between 1990 and 2001, drought reportedly occurred about 782 times worldwide, costing 16.8 billion US dollars [2] and causing 66,601 fatalities [3]. Ethiopia has been hit by drought recurrently for a very long time [4], and famine has been a periodic feature of its history, with the first recorded occurrence dating back to the thirteenth century [5]. As a result, Ethiopia has frequently been described as a drought-stricken country [6]. However, studies show that not all parts of the country have a history of frequent droughts. Web and Braun [4] identified the southern, south-eastern, northern (present-day Tigray Regional State) and north-eastern parts of the country as the areas most repeatedly affected by drought and famine. Tigray is thus one of the regions repeatedly affected by drought [7]. According to [4], of the 26 drought events that occurred between 1800 and 1990, Tigray experienced about 22, followed by Amhara Regional State with 17.

In an effort to understand the characteristics of drought, different studies have been carried out using a number of indices [8,9,10,11]. According to the definitions given by [12, 13], drought indices are numerical representations of drought severity, computed from climatic or hydro-meteorological input data. Among the indices used to assess drought, the Standardized Precipitation Index (SPI), the Standardized Precipitation Evapotranspiration Index (SPEI), the Vegetation Condition Index (VCI) and the Normalized Difference Vegetation Index (NDVI) are the most common. Owing to its simplicity, flexibility, and strong adaptation to different climates [14], SPI has been identified as one of the most commonly used indices in more than 70 countries worldwide [15]. Various studies [16,17,18,19,20] have successfully implemented SPI to assess and forecast drought occurrences. However, studies [15, 21] show that the dependence of SPI on precipitation as its only input and its inability to consider variations in temperature are considered a major weakness. A performance comparison study by [22] argued that even though precipitation is the primary controlling factor of drought occurrences, the influence of temperature through the facilitation of evapotranspiration in the context of global warming cannot be ignored. SPEI, a variant of SPI and a multi-component drought index developed by [21], uses the variability of both precipitation and temperature to assess drought in an area, which makes it sensitive to global warming [22,23,24]. As a result, SPEI has been used in drought-related research worldwide [25,26,27,28].

Moreover, performance comparison studies [22, 29] between SPI and SPEI indicated that SPEI performs better than SPI on most occasions. Regardless of these findings, SPI continues to be widely used in different parts of the world. In Ethiopia, SPI is also a more commonly used index [9, 11, 30,31,32] than SPEI; only a few studies [33, 34] have used SPEI to assess drought in the country. Owing to the climate variability, which can be attributed to the undulating topography of the study area [35], variation in the drought ratings from SPI and SPEI is expected. This study therefore examined the degree of agreement between the ratings of SPI and SPEI as drought assessment tools over the Tigray region. The results provide information on how acceptable it is to use SPI in place of SPEI as a drought assessment tool in the study area, especially in the absence of temperature data.

2 Materials and methods

2.1 Study area

The study area, Tigray Regional State, is one of the national regional states of Ethiopia and is located in the northernmost part of the country. Geographically, it lies between 12°15′N and 14°57′N latitude and 36°27′E and 39°59′E longitude [9]. The region is bordered by Eritrea to the north, Sudan to the west, and the Ethiopian regions of Amhara and Afar to the south and east respectively [36]. The state is structured into six administrative zones, one special zone (Mekelle Special Zone) and 34 districts locally called “Wereda”. The areal coverage of the region is estimated to be 53,638 square kilometres with a total population of 5,484,405 (based on the Central Statistical Agency (CSA) 2007 census projected for 2017 with an annual growth rate of 2.6%). The map of the study area and the grid point sample locations are shown in Fig. 1.

Fig. 1 Map of the study area

Tigray has a diverse topography, with an altitude that varies from about 500 m above sea level (m.a.s.l.) in the northeast to around 3800 m.a.s.l. in the southwest. About 53% of the land is lowland (locally called “kola”, below 1500 m.a.s.l.), 39% is medium highland (also known as “weina-degua”, within the elevation range of 1500–2300 m.a.s.l.), and 8% is upper highland (locally referred to as “degua”, between 2300 and 3600 m.a.s.l.) [37]. The wide range of altitude governs the climatic conditions of the region [38]. This marked variation in altitude results in a distinct spatial distribution of temperature and rainfall [39].

The region has a sub-tropical climate, which is characterized by sparse and highly uneven seasonal rainfall and frequent drought. The main rainfall season, locally referred to as “kremti”, starts in mid-June and lasts until mid-September. Rainfall in the region is highly variable in time and space, which results in strong variation in crop and livestock yields. Average rainfall varies from about 200 mm in the northeast lowlands to over 1000 mm in the southwest highlands [39]. According to [9], the mean annual rainfall of the region is estimated to be 473 mm. The average annual temperature varies from less than 7.5 °C in the highlands above 3500 m.a.s.l. to more than 27 °C in the eastern lowlands [39]. The temporal distributions of precipitation and temperature are shown in Figs. 2 and 3.

Fig. 2 Mean monthly precipitation and mean monthly temperature (minimum and maximum) values of the study area for the period 1901–2016. The values are averaged from 12 selected CRU TS 4.01 grid points obtained from Koninklijk Nederlands Meteorologisch Instituut (KNMI) climate data explorer

Fig. 3 Annual precipitation and mean annual temperature values of the study area for the period 1901–2016. The values are averaged from 12 selected CRU TS 4.01 grid points obtained from Koninklijk Nederlands Meteorologisch Instituut (KNMI) climate data explorer

2.2 Data collection and processing

Even though there are meteorological stations in Tigray Regional State, they are mostly characterized by short climatic records with missing values for several months or even years. Hence, in order to avoid the errors and misrepresentations that could arise from the use of local climate data, gridded Climatic Research Unit (CRU) Time-series (TS) data version 4.01 were collected from the Koninklijk Nederlands Meteorologisch Instituut (KNMI) climate explorer (https://climexp.knmi.nl/start.cgi) on a monthly basis for the period 1901 to 2016 at 0.5° spatial resolution and at 12 selected grid points. The CRU TS 4.01 data are monthly observational gridded data fields calculated from daily or sub-daily data by National Meteorological Services and other external agents [40,41,42]. These datasets were chosen for their broad application in various studies [43], and for their wide spatial and temporal coverage.

All the monthly climate data (1901–2016) for the 12 grid points were averaged to represent the climate data of Tigray region. The monthly average values were, thus, considered representative and used to compute the SPI and SPEI.
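The spatial averaging step described above can be sketched as follows. This is only a minimal illustration, assuming that each grid point is available as an equally long list of monthly values; the function name `regional_monthly_mean` and the toy numbers are hypothetical and are not taken from the CRU TS files.

```python
def regional_monthly_mean(grid_series):
    """Average equally long monthly series from several grid points, month by month."""
    n_months = len(grid_series[0])
    if any(len(series) != n_months for series in grid_series):
        raise ValueError("all grid-point series must cover the same months")
    n_points = len(grid_series)
    return [sum(series[m] for series in grid_series) / n_points
            for m in range(n_months)]


# Toy example: three hypothetical grid points, four months of precipitation (mm)
points = [
    [10.0, 20.0, 30.0, 40.0],
    [20.0, 30.0, 40.0, 50.0],
    [30.0, 40.0, 50.0, 60.0],
]
regional = regional_monthly_mean(points)
```

An unweighted mean treats every grid point equally, which seems consistent with the description above; an area-weighted mean would be an alternative design choice.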

2.3 Data analyses techniques

2.3.1 Standardized Precipitation Index (SPI) based drought analyses

SPI is a powerful drought modelling index requiring only precipitation as an input parameter [15]. It was developed for the purpose of defining and monitoring drought [44] and can therefore be used as an indicator to establish a functional and quantitative definition of drought for each timescale. A drought event occurs when the SPI values are continuously negative and reach an intensity of −1 or below [45]. The drought ends when the SPI values rise above zero. Any drought event, therefore, can be defined by its duration and intensity [15].
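The event definition above can be illustrated with a short sketch. It assumes a plain list of index values, treats a value of exactly zero as the end of an event (the text only states that the drought ends above zero), and reports the minimum value of each run as its peak; the function name `drought_events` is hypothetical.

```python
def drought_events(index_values, threshold=-1.0):
    """Find runs of consecutive negative values that reach `threshold` or below.

    Returns a list of (start, end, duration, peak) tuples with 0-based,
    inclusive positions; `peak` is the most negative value in the run.
    """
    values = list(index_values)
    events = []
    start = None
    # A trailing 0.0 sentinel closes a run that lasts until the end of the record.
    for i, value in enumerate(values + [0.0]):
        if value < 0 and start is None:
            start = i                      # a negative run begins
        elif value >= 0 and start is not None:
            run = values[start:i]          # the run that has just ended
            if min(run) <= threshold:      # keep it only if it reached the threshold
                events.append((start, i - 1, len(run), min(run)))
            start = None
    return events


spi_series = [0.3, -0.4, -1.2, -0.8, 0.5, -0.5, -0.2, 0.1, -1.6]
events = drought_events(spi_series)
```

In this toy series the second negative run never reaches −1 and is therefore not counted as a drought event.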

The SPI for any location is calculated from the long-term precipitation record for the desired period. This record is fitted to a gamma probability distribution, which is then transformed into a normal distribution so that the mean SPI for the location and the specified time period is zero [44]. Negative and positive SPI values indicate below-median and above-median precipitation, that is, dry and wet conditions, respectively. Hence, SPI (Eq. 1) can be used to assess and monitor both wet and dry periods in an area [15].

$$SPI = \frac{{\left( {X_{i} - \bar{X}} \right)}}{\sigma }$$
(1)

where \(X_{i}\) is rainfall for year i, \(\bar{X}\) is long-term average rainfall and \(\sigma\) is the standard deviation.

Given a time series of monthly precipitation data for a location, the SPI for any month in the record can be calculated for the previous i months, where i = 1, 3, 12, 24, etc., depending on the time scale of interest [44]. According to [15], groundwater, streamflow and reservoir storage reflect longer-term precipitation variances. In contrast, soil moisture conditions respond to precipitation variances on a relatively short timescale. Hence, one may want to look at a 1-month or 3-month SPI for meteorological drought, a 1-month to 6-month SPI for agricultural drought, and a 6-month to 24-month SPI or longer for hydrological drought. Therefore, to obtain a full picture of the drought events in the study area, SPI was calculated using the “SPI” package [46] in R statistical software at 1-month, 3-month, 6-month, 12-month and 24-month time scales for the period 1901–2016.
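The multi-scale calculation can be illustrated with a simplified sketch in Python. It applies only the standardization of Eq. 1 to running k-month precipitation totals; the gamma fitting and any per-calendar-month treatment performed by the R package are deliberately omitted, so this is not a reproduction of that package. The function names and the toy precipitation series are hypothetical.

```python
from statistics import mean, stdev


def accumulate(precip, k):
    """Running k-month totals; the first k-1 months have no complete window."""
    return [sum(precip[i - k + 1:i + 1]) for i in range(k - 1, len(precip))]


def standardize(values):
    """Eq. 1 applied to a series: (X_i - mean) / standard deviation."""
    mu = mean(values)
    sigma = stdev(values)
    return [(v - mu) / sigma for v in values]


# Toy monthly precipitation record (mm), one hypothetical year
monthly_precip = [10.0, 0.0, 5.0, 80.0, 120.0, 60.0,
                  15.0, 0.0, 5.0, 90.0, 110.0, 70.0]

# Standardized values at three accumulation periods
spi_like = {k: standardize(accumulate(monthly_precip, k)) for k in (1, 3, 6)}
```

Longer accumulation windows smooth the series, which is why longer time scales respond more slowly to individual dry or wet months.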

2.3.2 Standardized Precipitation Evapotranspiration Index (SPEI) based drought analyses

Despite its broad acceptance, SPI accounts only for precipitation among the atmospheric conditions that may affect drought severity. The atmospheric conditions that may influence drought occurrence and magnitude include precipitation, temperature, wind speed, and humidity. To include another atmospheric element in the computation, the Standardized Precipitation Evapotranspiration Index (SPEI) was developed by [21]. SPEI is computed in much the same way as SPI, but incorporates temperature changes in its analysis [47]. It retains the simplicity, multi-temporal nature, and statistical interpretability of the SPI while providing a more comprehensive measure of climate variability in an area. The inclusion of Potential Evapotranspiration (PET) makes a discernible difference in index values, confirming that SPEI provides a significantly different drought index from the SPI. The SPEI is therefore recommended as an alternative to SPI to quantify anomalies in the accumulated climatic water balance, incorporating potential evapotranspiration [48].

There are a number of equations that can be used to model PET (e.g. the Thornthwaite equation, the Penman–Monteith equation, the Hargreaves equation, etc.); however, the SPEI is not linked to any particular one [47]. According to [49], the Penman method is the most physical and reliable method. However, its data requirements (i.e. air temperature, relative humidity, wind, and net radiation) make it more difficult to use than the other techniques. Hargreaves’ model is simpler, but it still requires two meteorological parameters, temperature (mean, maximum and minimum) and incident radiation [50, 51], while the Thornthwaite method requires only temperature data as an input [50]. Despite its lower data requirement, however, the Thornthwaite method may overestimate PET in places dissimilar to the place where the method was first implemented [49, 50]. In this study, the Thornthwaite method was used to calculate the PET because of data limitations. Lastly, the climatic water balance (D) for month i, being the difference between the precipitation (P) and PET and the quantity that SPEI standardizes over the chosen time scale, was calculated using Eq. 2 as:

$$D_{i} = P_{i} - PET_{i}$$
(2)

whereby the Monthly PET is calculated by the Thornthwaite equation (Eq. 3) as:

$$PET = 1.6K\left( {\frac{10T}{I}} \right)^{m}$$
(3)

where PET is monthly potential evapotranspiration, T is mean temperature and I is the heat index calculated as the total of 12 monthly index values, m is a coefficient that depends on heat index and K is a factor of correction calculated as a function of the month and latitude.
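The Thornthwaite calculation of Eq. 3 can be sketched as below. The sketch makes several assumptions that are not stated in the text: the heat index is taken as the sum of (T/5)^1.514 over the 12 months, the exponent m follows the commonly quoted cubic polynomial in I, months with a mean temperature at or below 0 °C are given zero PET, and the coefficient 1.6 is taken to yield centimetres per month, so that the result is multiplied by 10 before it is subtracted from precipitation in millimetres. The correction factor K is passed in by the caller rather than derived from month and latitude. All function names are hypothetical.

```python
def heat_index(monthly_mean_temps_c):
    """Annual heat index I: sum of 12 monthly index values (assumed (T/5)**1.514)."""
    return sum((max(t, 0.0) / 5.0) ** 1.514 for t in monthly_mean_temps_c)


def thornthwaite_exponent(i_heat):
    """Exponent m as a function of the heat index (assumed standard polynomial)."""
    return (6.75e-7 * i_heat ** 3 - 7.71e-5 * i_heat ** 2
            + 1.792e-2 * i_heat + 0.49239)


def thornthwaite_pet_cm(t_mean_c, i_heat, k=1.0):
    """Monthly PET following Eq. 3: 1.6 * K * (10 * T / I) ** m (assumed cm/month)."""
    if t_mean_c <= 0.0:
        return 0.0  # assumption: no evapotranspiration at or below freezing
    m = thornthwaite_exponent(i_heat)
    return 1.6 * k * (10.0 * t_mean_c / i_heat) ** m


def water_balance_mm(precip_mm, pet_cm):
    """Monthly difference P - PET, with PET converted from cm to mm."""
    return precip_mm - 10.0 * pet_cm


# Toy example: a hypothetical site with a constant 20 °C mean temperature
temps = [20.0] * 12
i_heat = heat_index(temps)
pet = thornthwaite_pet_cm(20.0, i_heat, k=1.0)
balance = water_balance_mm(50.0, pet)
```

A negative balance indicates a month in which evaporative demand exceeds precipitation.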

Based on this concept, SPEI was computed at 1-month, 3-month, 6-month, 12-month and 24-month time scales using the “SPEI” package [52] in R statistical software. The classification of both the SPI and SPEI based indices is given in Table 1.

Table 1 Classification criteria for drought indices.

2.3.3 Assessment of linear relationship between SPI and SPEI

The linear relationship between SPI and SPEI was tested using the Pearson correlation coefficient (PCC), also called the product-moment correlation coefficient. According to [53, 54], PCC is the most common statistical technique used to show how strongly pairs of variables are related to each other. This is supported by the literature of various disciplines, which has successfully used PCC as a tool to test the linear relationship between variables or methods [53, 55,56,57]. PCC is used to test for a linear relationship when pairs of continuous data are available on the same experimental unit and follow a bivariate normal distribution [57]. PCC ranges from + 1 (perfect positive linear relationship) to − 1 (perfect negative linear relationship). A PCC value of 0 indicates that the variables do not have any linear relationship [53]. However, [58] pointed out that it would be misleading to use PCC to characterize the degree of agreement between variables. Hence, in this study, PCC was used only to test the strength and direction of the linear relationship between the two drought indices at each time scale.
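The distinction between correlation and agreement can be made concrete with a small sketch. It computes the product-moment correlation from its textbook definition and applies it to two hypothetical series that differ by a constant offset of one unit; the function name and data are illustrative only.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)


index_a = [-1.5, -0.5, 0.0, 0.5, 1.5]
index_b = [v + 1.0 for v in index_a]   # same pattern, shifted by one unit

r = pearson_r(index_a, index_b)
```

Here the two series are perfectly correlated even though no pair of values is equal, which is why the correlation coefficient alone is not taken as evidence of agreement.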

2.3.4 Bland and Altman plot

According to [58], it is generally true that no two different techniques give exactly the same result at every point. Hence, when comparing methods, the goal is to know by how much their results differ. The Bland and Altman plot gives an insight into the level of agreement between methods, also referred to as raters. It was first developed by [58] for comparing two measurements, and displays the difference between the pairs of values on the y-axis against the means of the same pairs of values on the x-axis. The plot constructs limits of agreement and quantifies the agreement between two quantitative measurements using the mean and the standard deviation of their differences, which may provide an insight into the extent of the agreement between the methods [54, 59, 60]. How far the methods may differ and still be accepted as agreeing is a matter of personal judgement [58]. Nevertheless, a general guideline by [59] shows that if the points on the Bland and Altman plot are highly scattered above and below zero, this indicates inconsistent bias between the approaches. Following the guidelines of [58], the mean and the difference of each pair of SPI and SPEI values were computed. The bias (mean difference) and the standard deviation (SD) of the differences were then computed from the differences between each pair of values. A 95% confidence interval was used to calculate the upper (Bias + 1.96 SD) and lower (Bias − 1.96 SD) limits of agreement.
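The quantities behind the plot can be sketched as follows, assuming two equally long lists of paired index values and the sample standard deviation of the differences. The function name and the toy values are hypothetical and are not results of this study.

```python
from statistics import mean, stdev


def bland_altman(a, b):
    """Return plot coordinates, bias and 95% limits of agreement for paired values."""
    diffs = [x - y for x, y in zip(a, b)]          # y-axis: difference of each pair
    means = [(x + y) / 2.0 for x, y in zip(a, b)]  # x-axis: mean of each pair
    bias = mean(diffs)
    sd = stdev(diffs)
    lower = bias - 1.96 * sd
    upper = bias + 1.96 * sd
    return means, diffs, bias, lower, upper


spi_values = [-1.2, 0.4, 0.9, -0.3, 1.5]
spei_values = [-1.0, 0.1, 1.0, -0.5, 1.2]
means, diffs, bias, lower, upper = bland_altman(spi_values, spei_values)
```

Plotting `diffs` against `means`, with horizontal lines at `bias`, `lower` and `upper`, gives the Bland and Altman plot.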

2.3.5 Percent of match

The per cent of match was used to show the share of months (in per cent) in which the SPI and SPEI ratings agreed with each other. The number of months in which the two ratings agreed was counted, divided by the total number of months, and multiplied by 100 to obtain the value in per cent (see Eq. 4).

$$\% Match = \frac{X}{N}*100$$
(4)

where X is the number of appraisals that match and N is the number of months (rows) of valid data.
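Equation 4 can be illustrated with a short sketch. Because the contents of Table 1 are not reproduced here, the eight class boundaries below are an assumption based on a commonly used SPI classification and may differ from those applied in this study; missing months are represented by `None` and excluded from N. All names and values are hypothetical.

```python
def classify(value):
    """Map an index value to one of eight classes (assumed boundaries)."""
    if value >= 2.0:
        return "extremely wet"
    if value >= 1.5:
        return "very wet"
    if value >= 1.0:
        return "moderately wet"
    if value >= 0.0:
        return "mildly wet"
    if value > -1.0:
        return "mild drought"
    if value > -1.5:
        return "moderate drought"
    if value > -2.0:
        return "severe drought"
    return "extreme drought"


def percent_match(a, b):
    """Eq. 4: share of valid months in which both indices fall in the same class."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    matches = sum(classify(x) == classify(y) for x, y in pairs)
    return 100.0 * matches / len(pairs)


spi_values = [-2.3, -1.2, -0.4, 0.3, 1.1, 1.7, None, 0.8]
spei_values = [-1.8, -1.1, -0.6, -0.2, 1.3, 2.1, 0.5, 0.9]
match = percent_match(spi_values, spei_values)
```

In this toy example four of the seven valid months fall in the same class, so the per cent of match is about 57%.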

2.3.6 Cohen’s Kappa statistics

Evaluating the degree of agreement between two or more assessment methods is common in various disciplines [61, 62]. Kappa statistics was introduced to measure nominal scale agreement between a pair of assessment methods [63]. Tests of agreement between two or more methods should include a statistic that accounts for the possibility of agreement or disagreement by mere chance. Because Kappa statistics resolves this issue, it is reported as one of the most commonly used statistics for this purpose [64]. Kappa values normally range from − 1 to + 1, and higher values indicate stronger agreement. When Kappa = 1, perfect agreement exists; when Kappa = 0, the agreement is the same as would be expected by chance; when Kappa < 0, the agreement is weaker than expected by chance, which rarely happens (see Table 2).

Table 2 Interpretation of Kappa values.

Kappa statistics can be calculated using either Cohen’s Kappa [65] or Fleiss’ Kappa [63]. Cohen’s Kappa is used to assess the degree of agreement when there are either two assessment methods with a single trial or one assessment method with two trials, while Fleiss’ Kappa is an extension of Cohen’s Kappa for three or more raters (measurements). Moreover, Cohen’s Kappa assumes that the assessment methods are purposely chosen and fixed, whereas Fleiss’ Kappa assumes that the assessment methods are randomly selected from a larger population. Based on these assumptions, Cohen’s Kappa statistics was found suitable for this study and was used to test the degree of agreement between SPI and SPEI as drought assessment tools. Because Kappa statistics works with ordinal or nominal data, the continuous values of each assessment method (i.e. SPI and SPEI) were transformed to ordinal data (with 8 classes) as indicated in Table 1. The transformed data of each assessment method were then used to calculate the degree of agreement using Cohen’s Kappa statistics. The formula for Cohen’s Kappa statistics is given in Eq. 5.

$$k = \frac{{\rho_{O} - \rho_{e} }}{{1 - \rho_{e} }} = 1 - \frac{{1 - \rho_{O} }}{{1 - \rho_{e} }}$$
(5)

where \(\rho_{O}\) is the relative observed agreement among raters and \(\rho_{e}\) is the hypothetical probability of chance agreement.
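A minimal sketch of Cohen’s Kappa for two raters is given below. It computes the observed agreement and the chance agreement from the marginal class frequencies and returns Kappa as a proportion between −1 and 1. The class labels and the function name are hypothetical; in this study the labels would be the drought classes assigned to the SPI and SPEI values.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's Kappa for two equally long sequences of class labels."""
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both raters
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the two raters' marginal class proportions
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return (p_o - p_e) / (1.0 - p_e)


# Toy ratings: D = dry, N = normal, W = wet
rater_a = ["D", "D", "D", "N", "N", "N", "W", "W", "W", "W"]
rater_b = ["D", "D", "N", "N", "N", "W", "W", "W", "W", "D"]
kappa = cohens_kappa(rater_a, rater_b)
```

In this example the raters agree on 7 of 10 items while chance agreement is 0.34, giving a Kappa of about 0.55; the function is undefined when the chance agreement equals 1.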

3 Results

3.1 SPI and SPEI analyses

The summary statistics in Table 3 indicate that the minimum and maximum values of SPI and SPEI are close to each other at all time scales, and that the mean values do not differ much. Moreover, the standard error (SE) and standard deviation (SD) values indicate that the deviation of the index values from their means is almost the same for SPI and SPEI. For instance, at the 1-month time scale the SE values for SPI and SPEI are 0.0272 and 0.0267, and the SD values are 1.01 and 0.99, respectively. The same holds for all time scales presented in Table 3. These nearly identical SE and SD values show that the sampled values of both indices are similarly dispersed around their means.

Table 3 Summary statistics for SPI and SPEI at different time scales for 1901–2016

Table 3 shows that 1902 was the year of the strongest drought occurrence in the study area. However, SPI and SPEI differed in identifying the driest year at the 1-month and 24-month time scales. At the 1-month time scale, the strongest drought occurred in April 2015 according to SPI and in August 1902 according to SPEI; in the SPI based analyses, August 1902 was the second strongest drought after 2015. At the 24-month time scale, the strongest drought years identified by SPI and SPEI were 1903 and 1902 respectively and, as at the 1-month time scale, 1902 was the second strongest drought year based on the SPI analyses (Table 4). Moreover, Table 3 indicates that the magnitude of the maximum drought at each time scale is higher for SPI than for SPEI. The largest differences between the minimum values (extreme drought) were observed at the 1-month (SPI = −4.18, SPEI = −3.38) and 12-month (SPI = −4.31, SPEI = −2.79) time scales. In contrast to the driest years, both SPI and SPEI identified 1916 as the wettest year, except at the 1-month time scale, for which SPEI identified 1920 as the wettest year in 116 years. The temporal distributions of SPI and SPEI are shown in Fig. 4.

Table 4 The longest and strongest drought duration, severities and intensities captured by SPI and SPEI for each time scale between 1901 and 2016

Fig. 4 The temporal distribution of SPI and SPEI at 1-month, 3-month, 6-month, 12-month and 24-month time scales from 1901 to 2016

Furthermore, the longest and the strongest drought durations, severities and intensities identified by SPI and SPEI are presented in Table 4. SPI and SPEI showed both similarities and dissimilarities in identifying drought years. Both identified 1902 as the year of strongest drought severity and intensity at all investigated time scales except the 24-month time scale. At the 24-month time scale, SPEI identified 2012 as the year of strongest drought severity and intensity, while SPI identified 1904 and 1920 as the years of strongest drought severity and intensity respectively. Additionally, SPEI performed well in capturing the drought years from 2009 to 2016 at all timescales. Table 4 shows that SPI identified fewer drought years between 2000 and 2016 than SPEI. The gap widens from the shorter time scales (3-month and 6-month) to the longer time scales (12-month and 24-month). Regardless of their differences, however, both SPI and SPEI identified the major recorded drought years in the study area, including 1984, 1990 and 2013, in most cases.

3.2 Linear relationship between SPI and SPEI

Pearson’s correlation coefficient was used to examine the linear relationship between SPI and SPEI at different time scales. The test result indicated that SPI and SPEI showed a strong and significant relationship at the 1-month (r = 0.69, p < 0.01) and 3-month (r = 0.7, p < 0.01) timescales. The linear relationship at the 12-month and 24-month timescales remained significant at p < 0.01 and its strength increased to r = 0.83 and r = 0.79, respectively (see Table 5). Additionally, the scatter plots in Fig. 5 revealed a good positive linear relationship (R² > 0.57) between the values of SPI and SPEI at the 6-month, 12-month and 24-month timescales. However, at the 1-month and 3-month time scales, R² < 0.49 was observed, indicating a weaker linear relationship between SPI and SPEI at shorter time scales.

Table 5 Test for the degree of agreement between SPI and SPEI at the regional level

Fig. 5 Scatter plot showing the linear relationship between SPI and SPEI at 1-month, 3-month, 6-month, 12-month and 24-month time scales for the period 1901–2016

3.3 Per cent of match between SPI and SPEI

To test the per cent of match between SPI and SPEI, the continuous values had to be transformed into categorical data. All the continuous values of SPI and SPEI were therefore categorized into eight drought severity classes as indicated in Table 1. The per cent of match was then computed from the number of months in which the SPI and SPEI results were classified under the same category. The test results (Table 5) showed the highest per cent of match (51.58%) at the 1-month time scale. Similarly, a match above 50% was observed at the 6-month and 12-month time scales. However, at the 3-month and 24-month time scales, lower per cent of match values of 49.9% and 39.7% respectively were observed. Hence, the per cent of match showed no increasing or decreasing pattern with changing time scales.

3.4 Bland and Altman plot

The results in Fig. 6 indicate that the mean difference (bias) was 0.0036 at the 1-month time scale, with a 95% confidence interval of −0.0029 to 0.0029. Accordingly, the SPEI tends to give a reading 0.0036 lower than SPI. Similarly, SPEI-3 and SPEI-6 tend to give readings 0.0022 and 0.0023 lower than SPI, respectively. Even smaller mean differences of 0.0021 and 0.0015 were observed at the 12-month and 24-month timescales. Moreover, the small limits of agreement (i.e. lower limit = Bias − 1.96 SD, upper limit = Bias + 1.96 SD) at all investigated time scales indicated an acceptable degree of agreement between SPI and SPEI.

Fig. 6 Bland and Altman plot for SPI and SPEI at (a) 1-month, (b) 3-month, (c) 6-month, (d) 12-month and (e) 24-month time scales

3.5 Kappa statistics test

The Cohen’s Kappa statistics result indicated a statistically significant fair (0.2–0.4) degree of agreement between SPI and SPEI at all investigated time scales at p < 0.01 (see Table 5). However, the degree of agreement at the 24-month time scale was the smallest among all time scales.

4 Discussion

4.1 Characterizing SPI and SPEI based drought assessment

Both SPI and SPEI have been widely used to model drought all across the globe [25, 66,67,68,69,70]. According to [71], SPI and SPEI allow comparisons across climates using a univariate probability distribution to normalize the index. However, when SPI and SPEI are implemented, some deviation between their results is unavoidable. It was observed in this study that the results of SPI and SPEI are different. For instance, the 3-month time scale drought analyses indicate that 1902 was the driest year, with drought intensities of SPI = 3.2 and SPEI = 2.4. Similarly, the 12-month time scale drought analyses indicated the same year (i.e. 1902) as the driest year, with values of 2.3 and 1.8 for SPI and SPEI respectively. In contrast, the driest years at the 24-month time scale are different for SPI (i.e. 1904) and SPEI (i.e. 1902).

Table 4 indicates that SPI failed to capture all of the major drought years between 2000 and 2016 at all investigated time scales. By comparison, SPEI identified a higher number of the drought years that occurred between 2000 and 2016. These include 2003, 2009, 2013 and 2015, reported as either severe or extreme drought years by [11, 72]. However, SPI performed as well as SPEI and captured most of the major drought occurrences before 2000. This finding is in agreement with an SPI and SPEI comparison study conducted by [73] in Nevada. According to that study, both indices were able to detect the major drought periods of the first half of the twentieth century; during the late twentieth and early twenty-first centuries, however, SPI was unable to capture the magnitude of drought severity indicated by SPEI. SPEI and SPI were equally able to identify drought only during a cool period. This shows the inability of SPI to consider the effects of global temperature change in drought modelling.

Moreover, when the individual values of SPI and SPEI are plotted together, the SPI values appear higher than the corresponding SPEI values. These dissimilarities between SPI and SPEI were also detected by other studies. A study of drought severity change in China by [74] reported a variation between SPI and SPEI based drought assessment values. Similarly, [75] used the differences between SPI and SPEI to explain how including changes in temperature alongside precipitation can lead to results that differ from those obtained using precipitation alone. This supports the view that these differences arise from the variation in the input data the two indices use to assess drought. SPI, developed by [45], is described by [76] as “a standardizing transformation of the probability of the observed precipitation”, indicating that it uses only the observed precipitation data as an input. SPEI, developed by [21], uses precipitation and temperature data as inputs and thus has the capacity to include the effects of temperature variability in drought assessment. However, the existence of differences between SPI and SPEI values does not mean that they give completely different results. In areas with low temperature variation, SPI can perform as well as SPEI.

4.2 Linear relationship between SPI and SPEI

The Pearson’s correlation coefficient is a parametric test used to assess a possible linear relationship between two continuous variables [57, 77]. It has been used in different studies to show the linear relationship between variables [78,79,80]. The test for a linear relationship between SPI and SPEI in this study indicates that the two variables have a significant and positive correlation (r = 0.69, r = 0.70, r = 0.75 at p < 0.01) at the 1-month, 3-month and 6-month timescales respectively. The positive correlation between SPI and SPEI can also be seen from the scatter plots in Fig. 5. The strength of the linear relationship was higher at the 12-month and 24-month time scales (r = 0.83, r = 0.79 at p < 0.01). Looking at the general pattern, this study has revealed a consistently increasing trend in the strength of the linear relationship between SPI and SPEI as the time scale of analysis increased from 1 to 12 months.

Contrary to the finding of this study, a decreasing trend was reported in a comparative analysis by [81] carried out in the Chi River basin, Thailand. The results of that work indicated that the correlation between the SPI and the SPEI was close at shorter timescales (1–6 months) and then decreased dramatically at longer timescales (9–24 months). It is stated in [60] that PCC is wrongly used to evaluate the level of agreement between methods. This is mainly because two methods might have a perfect positive or negative correlation with no agreement between them. This was reinforced by [54, 82], which recommended against the use of correlation as a method for assessing the comparability between methods. Hence, in this study, the PCC results strictly indicate the direction and strength of the linear relationship between SPI and SPEI, not the degree of agreement between them. Moreover, the independent t test indicated no significant difference between the means of the SPI and SPEI ratings in the study area at p < 0.05.

4.3 Degree of agreement between SPI and SPEI

This study implemented the Bland and Altman plot, a useful graphical representation of the agreement between two tests or measurement tools [59], and Cohen’s Kappa statistics [65] to test for agreement between SPI and SPEI. The Bland and Altman plot has been used in research in different disciplines to test the agreement between methods [83,84,85]. In this study, the Bland and Altman plot revealed that SPI and SPEI agreed at all time scales. The mean difference (bias) was the primary tool used to decide on the agreement or disagreement between the methods. In an ideal situation where the two methods agree completely, the mean difference would be zero [54]. Accordingly, the study results revealed that the mean difference values between SPI and SPEI were close to zero (i.e. 0.0036, 0.0022, 0.0023, 0.0021 and 0.0015 at the 1-month, 3-month, 6-month, 12-month and 24-month time scales respectively), indicating good agreement between SPI and SPEI at all time scales. Moreover, the test results agreed with [82], which stated that good agreement can be expected if the scatter of points is small and the points lie relatively close to the line representing the mean bias. This indicated a good level of agreement, since most of the points lie within the upper and lower limits of agreement, close to the mean difference at each time scale (see Fig. 6).

Similarly, the result of Cohen’s Kappa statistics, a very well-known measure of agreement between two methods [86], showed fair agreement between SPI and SPEI. A fair level of agreement (0.2–0.4) shows that these techniques agreed for 40–60% of the total study years. Hence, the study result is in agreement with a drought comparison study in the Horn of Africa by [87], even though some disagreements between SPI and SPEI were also reported in some parts of Central Africa.

5 Conclusion

This study examined the degree of agreement between SPI and SPEI using the Bland and Altman method, the per cent of match and Cohen’s Kappa statistics. The results show that there is a fair agreement between the two methods at all timescales. The degree of agreement remained almost constant at all time scales except the 24-month time scale. This was clearly visible in the per cent of match, which shows that the share of months in which the methods agreed ranged from 49.9 to 51.8% at all investigated time scales except the 24-month time scale, where it was lowest (39.7%). However, the PCC increased slightly from 0.69 at the 1-month time scale to 0.83 at the 12-month time scale. It then dropped back to 0.79 at the 24-month time scale.

Based on the findings of this research, it can be inferred that SPEI performs better than SPI in the Tigray region, especially in recent years. However, the agreement analyses have also indicated that SPI can be used in place of SPEI at all time scales, keeping in mind that the two indices agreed only to some extent. Hence, in the absence of temperature data and/or appropriate analysis tools to compute SPEI, SPI can be used to assess drought in the study area at all investigated time scales.