1 Introduction

Drought is a regional recurrent phenomenon, characterised by a temporary severe decrease of water availability, with significant societal, economic and environmental impacts (Tsakiris et al. 2013). The degree of a region’s vulnerability depends on many environmental and social factors (Bordi and Sutera 2008). Agriculture is one of the sectors most vulnerable to drought, especially in arid and semi-arid regions, such as the Mediterranean (Kumar 1998; Tsakiris and Tigkas 2007). The impacts of drought on agriculture in a region cannot be easily measured, because there is no unique way to relate a key drought-determining factor to crop yield. Focusing on a single representative crop may simplify the analysis of agricultural drought for a region (Kumar and Panu 1997).

Wheat is a widely cultivated crop with a major role in the world’s economy. Winter wheat, a typical rainfed crop in the Mediterranean climate, is used in this study as the most representative crop for the assessment of drought impacts on crop yield. The early estimation of drought impacts is important to both farmers and responsible organisations, so that they can take the necessary proactive and supportive measures for mitigating the anticipated consequences.

Many researchers have worked in the field of early prediction of wheat yield, using methods that include meteorological parameters or indices (agrometeorological indices, drought indices, etc.). For example, Salman and Al-Karablieh (2001), and Lee et al. (2013) used empirical models that included precipitation and temperature data, and Kazmi and Razul (2012) used precipitation, temperature and sunshine duration data. Also, Mavromatis (2007) investigated the use of the Standardized Precipitation Index (SPI) and the Palmer Drought Severity Index (PDSI), while Sadat Noori et al. (2012) used SPI along with evapotranspiration and temperature data. Kogan et al. (2013) used an NDVI-based approach to predict winter wheat yield with regression models.

Dalezios et al. (2002) investigated the role of agrometeorological and agrohydrological indices on temporal development of phenological stages of wheat, and found that temperature and precipitation were important parameters. Kristensen et al. (2011) used agroclimatic indices for investigating winter wheat response and indicated the importance of temperature on the yield. Liakatas (1997) underlined the significance of weather parameters (precipitation, temperature and potential evapotranspiration) in various phenological stages of rainfed wheat, especially in arid environments. Porter and Gawith (1999) outlined the effects of climatic variability and temperature extremes on wheat yields.

From the above, it can be clearly concluded that the key determinant for wheat yield assessment is precipitation. However, other climatic variables seem to play a rather important role, as well, in the development stages of the crop. Among them, temperature seems to be the most important, either as a regulator of the crop water needs through the evapotranspiration process, or as a limiting factor that may cause major problems to the plants (e.g., low winter temperatures) (Proutsos et al. 2010).

Therefore, the use of an index that incorporates both precipitation and potential evapotranspiration (PET) should be appropriate for the assessment of drought impacts on crop yield. In this paper, an approach is proposed that uses the Reconnaissance Drought Index (RDI) as the main variable for the early assessment of the impacts on the wheat yield. The predictability efficiency is tested for various time frames (1 to 3 months before harvesting). Critical phenological stages of the crop are considered for using the appropriate reference periods of the RDI in linear regression models. This approach can be suitable for operational purposes, as it requires easily obtainable data (precipitation and temperature) and does not involve complex procedures.

For the case study used in this paper for illustrating the proposed method, precipitation and temperature data from two major agricultural areas of Greece are used. The evaluation of the drought effects is achieved by using the AquaCrop water productivity simulation model for the simulation of wheat yield in these areas.

2 Materials and Methods

2.1 The Reconnaissance Drought Index

The Reconnaissance Drought Index (RDI) is a simple, yet practical index for studies of drought impacts on agriculture, since it takes into account both precipitation and PET, which are key factors for the development stages of the plant (Tsakiris et al. 2010). Also, it can be used as a composite climatic index for identifying climatic variations (Tigkas et al. 2013a). The RDI is the ratio of the cumulative precipitation to the cumulative PET for a specific time period (Tsakiris and Vangelis 2005; Tsakiris et al. 2007). It is expressed either by its initial value (\( {\alpha}_k \)) or by its standardised form (RDIst). The initial value (\( {\alpha}_k \)) of RDI is calculated for the i-th year in a reference period of k months as follows:

$$ {\alpha}_k^{(i)}=\frac{{\displaystyle \sum_{j=1}^k{P}_{ij}}}{{\displaystyle \sum_{j=1}^kPE{T}_{ij}}},\ \mathrm{i} = 1(1)\mathrm{N}\kern0.5em \operatorname{and}\kern0.5em \mathrm{j} = 1(1)\mathrm{k} $$
(1)

in which \( {P}_{ij} \) and \( PE{T}_{ij} \) are the precipitation and PET of the j-th month of the i-th year and N is the total number of years of the available data. The average of the annual values of α (\( {\overline{\alpha}}_{12} \)) is equal to the aridity index of the area (UNEP 1992).
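As a minimal sketch of Eq. (1), the snippet below computes the initial value of RDI for each year from monthly precipitation and PET. The function name `initial_rdi` and the input values are hypothetical and serve only to illustrate the arithmetic.

```python
def initial_rdi(precip, pet):
    """Return alpha_k for each year: sum of P over the k months divided by the sum of PET."""
    if len(precip) != len(pet):
        raise ValueError("precip and pet must cover the same years")
    alphas = []
    for p_year, pet_year in zip(precip, pet):
        if len(p_year) != len(pet_year):
            raise ValueError("each year needs the same number of months for P and PET")
        alphas.append(sum(p_year) / sum(pet_year))
    return alphas


# Made-up monthly values (mm) for a 3-month reference period in two years.
precip = [[60.0, 45.0, 30.0], [20.0, 15.0, 10.0]]
pet = [[30.0, 40.0, 65.0], [30.0, 45.0, 75.0]]
alpha_3 = initial_rdi(precip, pet)
print(alpha_3)  # first year: 135/135, second year: 45/150
```

Each inner list holds the k months of the chosen reference period for one year, so the same function covers annual (k = 12) and shorter reference periods.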

The standardised form of RDI (RDIst) is calculated through a standardisation process, assuming that \( {\alpha}_k \) values fit the lognormal or the gamma distribution (Tsakiris et al. 2007, 2008; Tigkas 2008). The RDIst provides values categorised in predefined drought classes, so it can be used as a global index. Drought severity categorisation includes mild, moderate, severe and extreme drought classes, with corresponding boundary values of RDIst (−0.5 to −1.0), (−1.0 to −1.5), (−1.5 to −2.0) and (<−2.0), respectively (Tigkas et al. 2012).
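The standardisation and the drought classes can be sketched as follows for the lognormal case, in which the natural logarithms of the initial values are centred on their mean and scaled by their standard deviation. The function names, the use of the sample standard deviation, the handling of values that fall exactly on a class boundary and the label "no drought" are assumptions of this sketch; the gamma-based variant is not shown.

```python
import math
import statistics


def standardised_rdi(alphas):
    """Standardise alpha_k values assuming they follow a lognormal distribution."""
    y = [math.log(a) for a in alphas]
    mean_y = statistics.mean(y)
    sd_y = statistics.stdev(y)  # sample standard deviation (an assumption of this sketch)
    return [(v - mean_y) / sd_y for v in y]


def drought_class(rdi_st):
    """Map an RDIst value to the classes quoted in the text; boundary handling is assumed."""
    if rdi_st < -2.0:
        return "extreme"
    if rdi_st < -1.5:
        return "severe"
    if rdi_st < -1.0:
        return "moderate"
    if rdi_st < -0.5:
        return "mild"
    return "no drought"


alphas = [0.9, 0.6, 0.75, 0.3, 0.8, 0.7]  # made-up alpha_k values, one per year
rdi_st = standardised_rdi(alphas)
classes = [drought_class(v) for v in rdi_st]
for a, r, c in zip(alphas, rdi_st, classes):
    print(f"alpha={a:.2f}  RDIst={r:+.2f}  {c}")
```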

It should be noted that the method used to calculate PET does not have any significant effect on RDIst, meaning that temperature-based methods, such as Hargreaves (Hargreaves and Samani 1982), can be sufficient for producing reliable RDI results (Vangelis et al. 2013).
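For illustration, a temperature-based PET estimate can be sketched with the widely quoted Hargreaves form given in FAO-56 (Allen et al. 1998), together with the FAO-56 expression for extraterrestrial radiation. Whether this exact variant matches the one applied in this study is an assumption; the function names, the latitude and the temperatures are hypothetical.

```python
import math


def extraterrestrial_radiation(lat_deg, day_of_year):
    """Daily extraterrestrial radiation Ra in MJ m-2 day-1 (FAO-56 formulation)."""
    phi = math.radians(lat_deg)
    dr = 1.0 + 0.033 * math.cos(2.0 * math.pi * day_of_year / 365.0)
    delta = 0.409 * math.sin(2.0 * math.pi * day_of_year / 365.0 - 1.39)
    ws = math.acos(-math.tan(phi) * math.tan(delta))  # sunset hour angle
    gsc = 0.0820  # solar constant, MJ m-2 min-1
    return (24.0 * 60.0 / math.pi) * gsc * dr * (
        ws * math.sin(phi) * math.sin(delta)
        + math.cos(phi) * math.cos(delta) * math.sin(ws)
    )


def hargreaves_pet(t_mean, t_max, t_min, lat_deg, day_of_year):
    """PET in mm/day from air temperature (deg C) and latitude (decimal degrees)."""
    ra_mm = 0.408 * extraterrestrial_radiation(lat_deg, day_of_year)  # MJ m-2 -> mm of water
    return 0.0023 * ra_mm * (t_mean + 17.8) * math.sqrt(max(t_max - t_min, 0.0))


# Made-up mid-April conditions at an assumed latitude of 39.6 degrees N.
pet_april = hargreaves_pet(t_mean=15.0, t_max=21.0, t_min=9.0, lat_deg=39.6, day_of_year=105)
print(round(pet_april, 2), "mm/day")
```

A monthly PET value, as needed for Eq. (1), can be approximated by evaluating the function for the middle day of the month with monthly mean temperatures and multiplying by the number of days in that month.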

2.2 Wheat Crop Characteristics and Development Stages

Wheat is one of the most produced crops worldwide, significantly affecting the economy at local and regional levels. Winter wheat is a typical rainfed crop for the Mediterranean region. Winter wheat is usually planted in November and the growing period lasts for about 8 months (Allen et al. 1998). The major physical constraints to wheat production in the Mediterranean are terminal drought and terminal heat stress, as well as year-to-year weather variability. Drought is probably the major cause of yield loss in these environments. For this reason, the wheat grain yield follows the year-to-year variability of rain (Acevedo et al. 1999).

Total cumulative evapotranspiration (ET) of wheat crop typically ranges from 200 to 500 mm, being around 400 mm for rainfed wheat in Mediterranean conditions (Shimshi 1973; Steduto et al. 2012).

Early in the growing season under Mediterranean conditions, the daily water consumption can be less than 2 mm/day due to relatively low temperatures and high humidity. As the canopy enlarges during tillering and stem elongation, the rate of water consumption increases and typically reaches a peak around anthesis, at rates between 5 and 8 mm/day (Steduto et al. 2012).

There are three periods in which wheat yield is considered to be most responsive to moisture stress: first, the period when tillers are developing and their abortion rates are highest; second, when florets are being formed and grains are set; and third, from early to mid-grain filling when young developing grains can be aborted due to a lack of assimilate (Fischer 1973; Turner 1997). Water deficit during tillering can reduce the number of tillers, affecting the final yield (Shimshi 1973). The number of spikes/m2, the weight of grains/spike, the harvest index and the biological yield can be considered important yield variables under drought conditions (Leilah and Al-Khateeb 2005).

Low winter temperatures (mainly in January and February under Mediterranean conditions) play a significant role in the growth of wheat, affecting the tillering process. In addition, a dormancy period may develop if very low temperatures persist for many days, while frost may cause leaf death (Kazmi and Razul 2012).

2.3 Yield Simulation with AquaCrop Model

Long-term wheat yield data are usually either of limited availability or of unverified reliability. Further, the actual yield depends on various factors, such as soil characteristics, cultivation techniques, plant diseases, etc. Since the purpose of this study is to assess the effects of drought on yield, which are determined by the climatic conditions, the AquaCrop model was employed for the simulation of the crop yield. This way, other influencing factors can be set to have a neutral role in crop growth, without affecting the interpretation of the results.

AquaCrop is a water-driven simulation model that was developed by FAO. The model uses the water production function approach (Doorenbos and Kassam 1979) as a starting point, and evolves from it by calculating the crop biomass based on the amount of water transpired, and the crop yield as the proportion of biomass that goes into the harvestable parts. One of the main characteristics of AquaCrop is the separation of the final yield (Y) into biomass (B) and harvest index (HI), which allows the partitioning of the corresponding functional relations as response to environmental conditions (Steduto et al. 2012):

$$ Y=HI\times B $$
(2)

The model requires a relatively small number of parameters and input data to simulate the yield response to water for most of the major field and vegetable crops. However, it produces a significant number of output data, including the simulation of canopy cover, biomass and soil water components over the entire growing cycle, and the final harvestable yield (Steduto et al. 2009; Raes et al. 2009). AquaCrop has been used in several applications around the globe (e.g., Andarzian et al. 2011; Abedinpour et al. 2012; Mkhabela and Bullock 2012; Kumar et al. 2014), and its simulation results appear to be satisfactory, despite the simplifications introduced in the model.

In this study, version 4.0 of the model is used. In regard to the recommended values provided for the crop parameters, they are estimated by a calibration – validation process of AquaCrop using experimental data (Raes et al. 2012).

2.4 Basic Notions on the Selection of Linear Models

The assessment-prediction of wheat yield in this study is achieved through linear regression modelling. A multiple linear regression model is described by the following equation, which can be applied to predict a dependent variable y, using a set of independent variables \( {x}_j \):

$$ \begin{array}{cc}\hfill y={b}_0+{\displaystyle \sum_{j=1}^k{b}_j{x}_j}+\varepsilon \hfill & \hfill j=1(1)k\hfill \end{array} $$
(3)

where \( {b}_0 \) is the intercept, k is the number of independent variables, \( {b}_j \) are the corresponding regression coefficients and ε is the residual error.
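A minimal sketch of how the coefficients of Eq. (3) can be estimated by ordinary least squares is given below, using only the Python standard library. The helper names `fit_linear_model` and `predict`, and the synthetic data, are hypothetical; the statistical software actually used for the regressions in this study is not specified here.

```python
def fit_linear_model(X, y):
    """Least-squares estimate of [b0, b1, ..., bk] via the normal equations."""
    rows = [[1.0] + list(x) for x in X]  # prepend the intercept column
    m = len(rows[0])
    # Build the augmented system (A^T A | A^T y).
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(m)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(m)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; the system is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(m):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][m] / aug[i][i] for i in range(m)]


def predict(coeffs, x):
    """Evaluate b0 + sum_j b_j * x_j for one observation."""
    return coeffs[0] + sum(b * v for b, v in zip(coeffs[1:], x))


# Synthetic, noise-free data generated from y = 2 + 0.5*x1 - 1.0*x2.
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [2.0, 2.5, 1.0, 1.5, 2.0]
coeffs = fit_linear_model(X, y)
print([round(c, 6) for c in coeffs])
```

The singularity check is a simple guard against the collinearity issues mentioned in the text; it does not replace a proper diagnostic of correlated predictors.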

It is evident that modelling of a physical process cannot be based solely on statistical criteria, since this would make the model vulnerable to over-fitting and reduce its ability to generalise. It is important to take a more investigative approach, taking also into account the physical processes and their underlying interactions. This approach may lead to a more meaningful type of model, rather than a black-box approach.

Stepwise regression can be used to find the most parsimonious sets of predictors that are most effective in predicting the dependent variable (Hocking 1976). A semi-automatic procedure is performed for the selection of independent variables to maximize the model’s prediction efficiency. This procedure may assist in understanding the statistical behaviour of the variables and in identifying possible problems (e.g., collinearity issues). However, the produced results can be biased, as the same data sets are used both to formulate the model and evaluate its goodness of fit (Chatfield 1995).

In order to validate the goodness of fit of the model with an unbiased estimate of its performance and to avoid possible over-fitting effects, cross-validation techniques can be used (Picard and Cook 1984). The main concept of cross-validation is based on the partitioning of the available dataset into subsets, using one subset for calibrating the model, while the other is used for validation.

There are several types of cross-validation, for instance the split-sample, the random sub-sampling, the K-fold and the Leave-One-Out (Arlot and Celisse 2010). In this paper, the K-fold cross-validation is used, in which the dataset is partitioned into K parts. The K-1 parts are used for calibration, while the remaining one part is used for validation purposes. This process is repeated K times. The cross-validation criterion is the average, over each repetition, of the estimates of discrepancy between the dataset and the fitted model (Browne 2000; Zucchini 2000). The main advantage of this approach is that all the data are utilised for both calibration and validation phases of the model.
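The K-fold procedure described above can be sketched as follows for a single-predictor linear model. The way observations are assigned to folds (interleaved by index), the use of the mean squared error as the discrepancy measure, the function names and the data are all assumptions made for illustration.

```python
def fit_simple(x, y):
    """Ordinary least squares for y = b0 + b1 * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return my - b1 * mx, b1


def k_fold_cv(x, y, k):
    """Return the held-out prediction for every observation and the mean fold MSE."""
    n = len(x)
    held_out = [None] * n
    fold_mse = []
    for fold in range(k):
        test_idx = [i for i in range(n) if i % k == fold]  # interleaved folds (assumed)
        train_idx = [i for i in range(n) if i % k != fold]
        b0, b1 = fit_simple([x[i] for i in train_idx], [y[i] for i in train_idx])
        errors = []
        for i in test_idx:
            held_out[i] = b0 + b1 * x[i]  # prediction from a model that never saw i
            errors.append((y[i] - held_out[i]) ** 2)
        fold_mse.append(sum(errors) / len(errors))
    return held_out, sum(fold_mse) / k


# Made-up predictor (index-like) and response (yield-like) values.
x = [-2.1, -1.4, -0.9, -0.5, -0.1, 0.2, 0.6, 1.0, 1.5, 1.9]
y = [1.2, 1.9, 2.1, 2.8, 2.9, 3.3, 3.4, 3.9, 4.1, 4.6]
predictions, cv_mse = k_fold_cv(x, y, k=5)
print(round(cv_mse, 4))
```

Every observation is predicted exactly once by a model calibrated without it, which is what allows all the data to serve both phases.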

The criteria that are used for the evaluation of the performance of the models are (Allen 1974; Hocking 1976; Willmott et al. 2012):

  • The Mean Absolute Error (MAE)

$$ MAE=\frac{1}{n}{\displaystyle \sum_{i=1}^n\left|{y}_i-{\widehat{y}}_i\right|} $$
(4)
  • The Root Mean Square Error (RMSE)

$$ RMSE=\sqrt{\frac{{\displaystyle \sum_{i=1}^n{\left({y}_i-{\widehat{y}}_i\right)}^2}}{n}} $$
(5)
  • The Coefficient of Determination (\( {R}^2 \))

$$ {R}^2=\frac{{\left({\displaystyle \sum_{i=1}^n\left({y}_i-\overline{y}\right)\left({\widehat{y}}_i-\overline{\widehat{y}}\right)}\right)}^2}{{\displaystyle \sum_{i=1}^n{\left({y}_i-\overline{y}\right)}^2}\;{\displaystyle \sum_{i=1}^n{\left({\widehat{y}}_i-\overline{\widehat{y}}\right)}^2}} $$
(6)
  • The Index of Agreement (d)

$$ d=1-\frac{{\displaystyle \sum_{i=1}^n{\left({y}_i-{\widehat{y}}_i\right)}^2}}{{\displaystyle \sum_{i=1}^n{\left(\left|{\widehat{y}}_i-\overline{y}\right|+\left|{y}_i-\overline{y}\right|\right)}^2}} $$
(7)
  • The Predicted Residual Sum of Squares (PRESS)

$$ PRESS={\displaystyle \sum_{i=1}^n{\left({y}_i-{\widehat{y}}_i\right)}^2} $$
(8)

where y is the observed value, ŷ is the predicted value, \( \overline{y} \) and \( \overline{\widehat{y}} \) are the mean of the observed and the predicted values, respectively, and n is the number of observations.

From the above criteria, MAE and RMSE are error measures used to represent the average differences between model-predicted and observed values. The coefficient of determination (\( {R}^2 \)) describes the proportion of the total variance in the observed data that can be explained by the model. The index of agreement (d) is the ratio between the mean square error and the potential error, and measures the degree to which the observed data are approached by the predicted data (Quiring and Papakyriakou 2003). PRESS is the sum of squares of differences between the observed and predicted values.
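The five criteria of Eqs. (4) to (8) can be written compactly as below. This is a sketch with hypothetical function names and made-up numbers; \( {R}^2 \) is computed as the squared correlation between observed and predicted values, and in a cross-validation setting the predictions passed to these functions would be the held-out ones.

```python
import math


def mae(obs, pred):
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def rmse(obs, pred):
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def r_squared(obs, pred):
    """Squared correlation between observed and predicted values (Eq. 6)."""
    mo, mp = sum(obs) / len(obs), sum(pred) / len(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    so = sum((o - mo) ** 2 for o in obs)
    sp = sum((p - mp) ** 2 for p in pred)
    return cov ** 2 / (so * sp)


def index_of_agreement(obs, pred):
    """Index of agreement d (Eq. 7); both deviations are taken from the observed mean."""
    mo = sum(obs) / len(obs)
    num = sum((o - p) ** 2 for o, p in zip(obs, pred))
    den = sum((abs(p - mo) + abs(o - mo)) ** 2 for o, p in zip(obs, pred))
    return 1.0 - num / den


def press(obs, pred):
    return sum((o - p) ** 2 for o, p in zip(obs, pred))


obs = [2.0, 3.0, 4.0, 5.0]    # made-up observed values
pred = [2.5, 2.5, 4.5, 4.5]   # made-up predicted values
print(mae(obs, pred), rmse(obs, pred), r_squared(obs, pred),
      index_of_agreement(obs, pred), press(obs, pred))
```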

3 Results and Discussion

3.1 Drought Characteristics and Wheat Yield

Meteorological data of monthly precipitation and average monthly mean, maximum and minimum temperature were provided by the Hellenic Meteorological Service, for two agricultural areas of central and northern Greece (Fig. 1): Larissa (47 years) and Alexandroupolis (50 years).

Fig. 1 The location of the study areas (Larissa and Alexandroupolis)

The PET was estimated using the Hargreaves method (Hargreaves and Samani 1982) and the RDI values were calculated using DrinC software (Tigkas et al. 2013b, 2014). The main climatic characteristics of each area are presented in Table 1. The aridity index (\( {\overline{\alpha}}_{12} \)) is the average of the annual α values obtained from Eq. (1).

Table 1 Main climatic characteristics of the study areas (annual values)

The annual drought conditions for the two areas, based on RDIst classification, can be briefly characterised as follows:

  • In Larissa, 15 drought years (31.9 %) were observed, of which 2 are characterised as severe and 1 as extreme drought. Regarding drought persistence, there were 5 cases with 2 consecutive years of drought and 1 case with 3 consecutive years of drought.

  • In Alexandroupolis, there were 15 drought years (30 %), of which 1 is characterised as severe and 3 as extreme droughts. There were 5 cases with 2 consecutive years of drought and 2 cases with 3 consecutive years of drought.
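Summaries of this kind (number of drought years and runs of consecutive drought years) can be derived from an annual RDIst series with a few lines of code. The sketch below assumes that a year counts as a drought year when RDIst is below −0.5, i.e., mild drought or worse; the function name and the series are hypothetical and do not reproduce the data of the study areas.

```python
from itertools import groupby


def drought_runs(rdi_st, threshold=-0.5):
    """Return the lengths of all runs of consecutive drought years."""
    flags = [v < threshold for v in rdi_st]
    return [len(list(group)) for is_drought, group in groupby(flags) if is_drought]


# Made-up annual RDIst values for ten consecutive years.
series = [0.3, -0.8, -1.2, 0.1, -2.1, 0.6, -0.6, -0.7, -1.6, 0.9]
runs = drought_runs(series)
print("drought years:", sum(runs))
print("runs of 2 years:", runs.count(2), "| runs of 3 years:", runs.count(3))
```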

The annual drought characterisation can provide a general image of the conditions of the area. However, the seasonal variations of drought may play a much more important role in the study of agricultural production. In Fig. 2, for instance, it can be seen that for the area of Larissa, in some years the drought conditions are noticeably different if the reference applies to the entire year or the 6-month winter/spring period (in which the main development of wheat takes place).

Fig. 2 The seasonal RDIst-6 (Dec–May) and the annual RDIst for the area of Larissa

The data of each area were imported to the AquaCrop model for the simulation of wheat yield. The main outputs of the model appear in Table 2.

Table 2 Simulated wheat yield and biomass with AquaCrop model for each study area

It should be noted that, since RDI takes into account only meteorological parameters, the other parameters of the model (soil characteristics, field management, etc.) were set so that they have a neutral effect on crop production, i.e., they are not limiting factors:

  • no adjustment of biomass was considered in regard to soil fertility stress,

  • no surface mulches were considered (0 % cover),

  • for the field surface, runoff occurrence was considered and no soil bunds were applied,

  • the soil type was considered as sandy clay,

  • no shallow groundwater table was considered.

The calculated values of the RDIst-6 (Dec–May) and the simulated wheat yield for the 2 study areas are presented in Fig. 3.

Fig. 3 The RDIst-6 (Dec–May) and the simulated wheat yield for: a Larissa; and b Alexandroupolis

3.2 Model Formulation

From the aforementioned characteristics of wheat, we can identify the most important periods regarding the crop water requirements for Mediterranean conditions. These are January to March, with gradually increasing water demand, April, with a peak in water demand, and May, with gradually decreasing requirements.

The first step for model formulation is to identify the correlation of yield with each one of the independent variables (RDIst for various reference periods) through simple linear regression. In addition to the RDI, the effect of the average monthly minimum temperatures of the winter months is also tested.

In Table 3, the coefficients of determination between wheat yield and various periods of RDIst and monthly average Tmin are presented. The annual RDIst (RDIst-12) and the 9-month RDIst (RDIst-9, October to June) seem to have a satisfactory correlation with the yield. However, the RDIst-6 for the period December to May, covering the most critical period of crop development, has a significantly better correlation. The RDIst-3 for the periods December to February and March to May also correlates well with yield in both study areas. Regarding the monthly RDIst (RDIst-1), March and April have good correlation for both areas, while February also exhibits high correlation for Larissa. Tmin for January and February also correlates well with yield, especially in the case of Alexandroupolis.

Table 3 Coefficients of determination (R2) between yield and various RDIst and monthly average Tmin

The second step is to use the stepwise technique to identify the best-fitting set of parameters for a multiple linear regression model. Several model structures are tested, based on different sets of variables for several prediction times, from 1 to 3 months before harvest (Table 4). For each model, a version including only RDI variables (model version a) and a version including also Tmin variables (model version b) were tested. The model structure selected for each candidate set of variables was determined using the probability (significance) of the F statistic as the criterion: a variable is incorporated into the model if the probability of F is less than or equal to 0.05, and it is removed from the model if the probability of F is greater than 0.10. The correlation and standard error of estimate for each model appear in Table 5.

Table 4 Tested model structures with various candidate variable sets, including RDIst and average monthly minimum temperature for January and February (Tmin)
Table 5 Correlation coefficients (R), adjusted coefficients of determination (R2) and standard error of the estimate for the tested model structures based on the stepwise regression analysis

For the third step, the evaluation of each model, a 10-fold cross-validation approach was applied for all tested models. In Table 6, the results of the evaluation criteria for the cross-validation are presented, which are based only on the subset of data that were not utilised in the formulation process of the model for each iteration. This allows an unbiased interpretation of the predictive potential of each selected set of variables.

Table 6 The performance criteria based on the cross-validation process

From the results, it can be seen that there is a very satisfactory prediction of the yield 1 month before harvest (end of May) with almost all model structures, for both study areas. The estimation remains satisfactory even for 2 months before harvest (end of April). For the time frame of 3 months before harvest (end of March) the predictive capacity of the models is decreased considerably, but it is still within acceptable limits. Also, in most of the cases, there is a positive effect on the accuracy of the models when the Tmin of January and February is added in the candidate sets of variables.

Models #1a and #1b perform very well. It should be noted, though, that the selected sets of RDI variables include overlapping time periods, e.g., RDIst-6 (Dec–May) and RDIst-3 (Feb–Apr), so they share a substantial amount of information. Therefore, despite the fact that these variables are linearly independent in the specific cases (p < 0.05), these sets may not secure the generalisation of the corresponding model structures.

The rest of the models provide similar results, for the same time frame of prediction (March, April or May). The most important month for RDI, based on the significance of the variables in the models, is April, followed by March, May, January, February and December.

The use of RDI-1 for all months gives good results for both study areas. However, the use of RDI-3 for important periods of crop development (e.g., Feb–Mar) seems to be also efficient.

For illustration purposes, the observed and predicted wheat yields and the corresponding diagrams of the coefficients of determination are presented in Figs. 4 and 5, respectively, based on model #4b for the two study areas. The #4b model structure has the following mathematical expression for the two study areas, respectively:

Fig. 4 Observed and predicted (based on model 4b) wheat yield for: a Larissa; and b Alexandroupolis

Fig. 5 Comparison of predicted by model 4b wheat yields with observed ones, and coefficient of determination (R2) obtained from this model, for: a Larissa; and b Alexandroupolis

$$ Larissa:\widehat{y}=2.361+0.544{x}_1+0.664{x}_2+0.636{x}_3+0.343{x}_4+0.253{x}_5+0.212{x}_6 $$
(11)
$$ Alexandroupolis:\kern0.5em \widehat{y}=2.745+0.856{x}_1+0.499{x}_2+0.616{x}_3+0.308{x}_5+0.426{x}_6 $$
(12)

where ŷ is the predicted yield, \( {x}_1 \) is RDIst-1 (May), \( {x}_2 \) is RDIst-1 (Apr), \( {x}_3 \) is RDIst-1 (Mar), \( {x}_4 \) is RDIst-1 (Feb), \( {x}_5 \) is Tmin (Feb) and \( {x}_6 \) is Tmin (Jan).
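For operational use, the two fitted expressions of model #4b can be wrapped in a small helper, as sketched below. The coefficients are copied from the equations above; the dictionary layout, the function name `predict_yield` and the input values are hypothetical.

```python
# Coefficients of model #4b as given in the two equations above.
MODEL_4B = {
    "Larissa": {"intercept": 2.361, "x1": 0.544, "x2": 0.664, "x3": 0.636,
                "x4": 0.343, "x5": 0.253, "x6": 0.212},
    "Alexandroupolis": {"intercept": 2.745, "x1": 0.856, "x2": 0.499,
                        "x3": 0.616, "x5": 0.308, "x6": 0.426},
}


def predict_yield(area, inputs):
    """Evaluate model #4b for one year; inputs maps x1..x6 to their values."""
    coeffs = MODEL_4B[area]
    return coeffs["intercept"] + sum(
        c * inputs[name] for name, c in coeffs.items() if name != "intercept"
    )


# Made-up inputs: x1..x4 are monthly RDIst values (May, Apr, Mar, Feb),
# x5 and x6 are the average minimum temperatures of February and January.
example = {"x1": -0.4, "x2": -1.1, "x3": 0.2, "x4": 0.5, "x5": 1.0, "x6": 0.5}
for area in MODEL_4B:
    print(area, round(predict_yield(area, example), 3))
```

Inputs that a given area's equation does not use (here \( {x}_4 \) for Alexandroupolis) are simply ignored, so the same input record can be passed for both areas.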

4 Concluding Remarks

A methodology for the early estimation (prediction) of drought impacts on the expected winter wheat yield is presented. The methodology is tested in two areas with Mediterranean climate. The proposed approach can be useful for operational use, even for areas with limited data availability. The required data are only the monthly precipitation and temperature.

The methodology is based on the formulation of linear regression models for the prediction of wheat yield using RDI (of various reference periods) as the key independent variable. The performance of the models was evaluated with a cross-validation process.

From the simple linear regression results, it can be concluded that wheat yield has high correlation with RDIst-6 (Dec–May), which corresponds to the main development phase of the crop. Further, various multiple regression model structures were tested, for different candidate sets of variables and prediction time frames from 1 to 3 months before harvest. The prediction was very successful for 1 and 2 months before harvest, and the models could still give reasonable estimates of the final yield for predictions 3 months before harvest. The correlation coefficients in both study areas between the observed and predicted values reached 0.87 and 0.91 for predictions 1 month before harvest, 0.84 and 0.82 for 2 months and 0.67 and 0.77 for 3 months before harvest. Additionally, these results are supported by the fact that the evaluation through the cross-validation process confirmed the stability of the tested models.

Model structures with monthly RDI values (RDIst-1), mainly from January to May, performed very successfully for both areas studied, though the use of longer reference periods for RDIst (e.g., RDIst-3), including important months in crop development, may improve the performance of the model in most cases. Further, the incorporation of the average monthly minimum temperature of January and February seemed to enhance the predictive ability of the models.

All the above lead to the conclusion that RDI can be successfully used in the early estimation of drought impacts on the yield of the winter wheat in Mediterranean climate, which is very important for an early reconnaissance estimation of yield losses due to drought, by organisations and stakeholders. Such information is also very useful in the planning against drought in the agricultural sector and in taking supporting measures (e.g., timely arrangement of imports) ensuring food security. Finally, it is useful for the activities of insurance companies and related organisations for covering the production losses of the affected farmers. Further research on the subject may be useful for extending these outcomes to other climatic zones or different rainfed crops.