A Pareto-optimal moving average multigene genetic programming model for daily streamflow prediction
Introduction
It is well documented that the streamflow process is complex and not easily predictable (Yaseen et al., 2015). This is mainly due to the non-stationary nature of the phenomenon and the highly nonlinear relationship between streamflow and the characteristics of its catchment (Nourani et al., 2011, Danandeh Mehr et al., 2013). A common way to model the streamflow process is to use data-driven techniques, which can learn and extract the nonlinear relationships between streamflow and its driving variables. When such techniques are used, a sound knowledge of the underlying physical processes is not a prerequisite (Hundecha et al., 2001, Noori and Kalin, 2016).
Short-term streamflow prediction with a lead time of up to one day is necessary for real-time flood warning and reservoir operation systems (Danandeh Mehr et al., 2015). The application of various data-driven techniques, such as artificial neural networks (ANN), genetic programming (GP), and fuzzy logic, to short-term streamflow prediction has been extensively evaluated and published in recent years (e.g., Hundecha et al., 2001, Moradkhani et al., 2004, Kücük and Agiralioglu, 2006, Wang et al., 2006, Makkeasorn et al., 2008, Shiri and Kisi, 2010, Kisi, 2010, Rezaeianzadeh et al., 2013, Krishna, 2013, Hosseinzadeh Talaee, 2014, Danandeh Mehr et al., 2015). Regardless of the data-driven technique employed, prediction accuracy depends strongly on the variables used to train and validate it. Most short-term flow prediction studies have used daily rainfall and streamflow records to build so-called rainfall-runoff models (e.g., Mutlu et al., 2008, Nourani et al., 2011, Nourani et al., 2012, Shoaib et al., 2015). However, in poorly gauged basins where no rainfall record is available, these commonly used rainfall-runoff models are not applicable. In such cases, single-station, cross-station, or successive-station runoff-runoff prediction models have been suggested (e.g., Ochoa-Rivera et al., 2002, Kisi and Cigizoglu, 2007, Demirel et al., 2009, Besaw et al., 2010, Can et al., 2012, Danandeh Mehr et al., 2015). For example, Kisi and Cigizoglu (2007) developed three ANN-based single-station models to forecast daily streamflow on two rivers in Turkey. To this end, daily streamflow observations with one to six antecedent days (lags) were considered as input vectors for the ANN. The authors demonstrated that a three-day lag (i.e., three input vectors) is sufficient to achieve the best one-day-ahead streamflow forecasting model with respect to the selected performance criteria.
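The lagged-input setup described above, in which antecedent daily flows serve as inputs for one-day-ahead prediction, can be sketched as follows. This is a minimal illustration assuming a gap-free daily series held in a NumPy array; the function name `build_lagged_inputs` is hypothetical and not taken from any of the cited studies.

```python
import numpy as np

def build_lagged_inputs(q, n_lags):
    """Return (X, y) for one-day-ahead single-station prediction.

    Row k of X holds [Q_{t-1}, ..., Q_{t-n_lags}] and y[k] holds Q_t,
    for t = n_lags, ..., len(q) - 1. Assumes q has no missing days.
    """
    q = np.asarray(q, dtype=float)
    if n_lags < 1 or n_lags >= len(q):
        raise ValueError("n_lags must be in [1, len(q) - 1]")
    # Column j (0-based) holds the series shifted back by j + 1 days.
    X = np.column_stack([q[n_lags - j - 1 : len(q) - j - 1] for j in range(n_lags)])
    y = q[n_lags:]
    return X, y
```

Comparing input sets with one to six lags, as in the single-station studies above, would amount to calling `build_lagged_inputs(q, n)` for `n` in `range(1, 7)` and training one model per input set.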
In another single-station streamflow prediction study, Özger (2009) developed two fuzzy inference systems using daily streamflow records at Demirkapi Station on the Euphrates River, Turkey. Based on the strong serial dependence of the observed flows, the author suggested that one- and two-day lags are enough to train the models. On the basis of autocorrelation and partial autocorrelation analysis, Hosseinzadeh Talaee (2014) reported that one- to four-day lags of streamflow records at a station on Aspas Stream, Iran, are the most appropriate input vectors for training multilayer perceptron (MLP) neural networks for one-day-ahead streamflow prediction. Most recently, Altunkaynak and Nigussie (2015) combined the SEASON algorithm with an MLP neural network and demonstrated that the new hybrid model can be used to extend the lead time of single-station daily streamflow prediction models.
Despite providing acceptable prediction accuracy, none of the aforementioned studies provided an explicit formulation of the single-station streamflow process. To bridge this gap, recent works have focused on applying GP to discover the underlying process explicitly. Our review showed that only a few studies have investigated the capability of GP in short-term streamflow prediction. For instance, Guven (2009) applied linear GP (LGP) and two versions of ANN to predict the daily flow of the Schuylkill River in the USA, and demonstrated that LGP outperforms the ANNs. Londhe and Charhate (2010) developed two GP-based one-day-ahead streamflow prediction models at two stations in the Narmada Catchment of India and demonstrated that GP outperforms ANN and model trees. Although classical GP has been implemented in a few other streamflow modelling studies (e.g., Ni et al., 2010, Nourani et al., 2012, Nourani et al., 2013b), to the best of our knowledge, no study has yet assessed the potential of the multigene topology in GP, i.e., MGGP, for single-station daily streamflow prediction. Moreover, previous studies have investigated the effectiveness of GP mostly in terms of prediction accuracy; thus, additional studies are required to address problems associated with the complexity of the proposed models. In this sense, this paper proposes, for the first time, a Pareto-optimal moving average multigene genetic programming (Pareto-optimal MA-MGGP) model to develop a parsimonious (accurate and simple) model for single-station streamflow prediction. The model is applied to daily streamflow prediction at the Senoz Catchment in Turkey, and its performance is compared with those of stand-alone GP, MGGP, and conventional multivariable linear regression (MLR) prediction models as benchmarks. From a practical point of view, the proposed model is explicit and parsimonious, which makes it attractive for use in practice.
Genetic programming (GP)
GP is a state-of-the-art and one of the most popular data-driven techniques; it evolves computer programs that solve problems automatically using Darwinian natural selection (Koza, 1992). This is done by randomly generating a population of computer programs and then breeding together the best-performing programs to create a new population (offspring). Mimicking Darwinian evolution, this process is iterated until the population contains programs that solve the task well (Searson, 2015). In
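The generate, evaluate, and breed cycle described above can be illustrated with a toy tree-based GP for one-input symbolic regression. This is a minimal sketch, not the GP software used in the study: the function set, the [−10, 10] constant range, tournament size 3, the 80/20 crossover/mutation split, elitism, and the depth limit are all illustrative assumptions.

```python
import math
import random

# Illustrative function set; division is protected so it never raises ZeroDivisionError.
FUNCS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "div": lambda a, b: a / b if abs(b) > 1e-9 else 1.0,
}

def random_tree(depth, rng):
    """Grow a random program: leaves are the input 'x' or a constant in [-10, 10]."""
    if depth == 0 or rng.random() < 0.3:
        return "x" if rng.random() < 0.5 else rng.uniform(-10.0, 10.0)
    return (rng.choice(list(FUNCS)), random_tree(depth - 1, rng), random_tree(depth - 1, rng))

def evaluate(tree, x):
    if isinstance(tree, tuple):
        op, left, right = tree
        return FUNCS[op](evaluate(left, x), evaluate(right, x))
    return x if tree == "x" else tree

def depth(tree):
    return 1 + max(depth(tree[1]), depth(tree[2])) if isinstance(tree, tuple) else 0

def nodes(tree, path=()):
    """Yield (path, subtree) for every node; a path is a tuple of child indices."""
    yield path, tree
    if isinstance(tree, tuple):
        yield from nodes(tree[1], path + (1,))
        yield from nodes(tree[2], path + (2,))

def replace(tree, path, new):
    """Return a copy of tree with the subtree at path replaced by new."""
    if not path:
        return new
    items = list(tree)
    items[path[0]] = replace(tree[path[0]], path[1:], new)
    return tuple(items)

def mse(tree, xs, ys):
    total = 0.0
    for x, y in zip(xs, ys):
        d = evaluate(tree, x) - y
        total += d * d  # d * d gives inf on overflow instead of raising
    err = total / len(xs)
    return err if math.isfinite(err) else float("inf")

def tournament(pop, fits, rng, k=3):
    return pop[min(rng.sample(range(len(pop)), k), key=fits.__getitem__)]

def evolve(xs, ys, pop_size=200, generations=30, max_depth=6, seed=0):
    rng = random.Random(seed)
    pop = [random_tree(4, rng) for _ in range(pop_size)]  # random initial population
    for _ in range(generations):
        fits = [mse(t, xs, ys) for t in pop]
        offspring = [pop[min(range(pop_size), key=fits.__getitem__)]]  # elitism
        while len(offspring) < pop_size:
            parent = tournament(pop, fits, rng)
            path, _ = rng.choice(list(nodes(parent)))
            if rng.random() < 0.8:  # subtree crossover with a second parent
                _, donor = rng.choice(list(nodes(tournament(pop, fits, rng))))
            else:                   # subtree mutation
                donor = random_tree(2, rng)
            child = replace(parent, path, donor)
            offspring.append(child if depth(child) <= max_depth else parent)
        pop = offspring
    fits = [mse(t, xs, ys) for t in pop]
    best = min(range(pop_size), key=fits.__getitem__)
    return pop[best], fits[best]
```

Programs are nested tuples such as `("add", ("mul", "x", "x"), "x")`. Elitism guarantees that the best training error never increases from one generation to the next.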
Moving average (MA) filtering
MA is a smoothing filter that isolates the effect of periodic (day-of-year) components in daily streamflow records and removes unwanted line noise from streamflow measurements. In its simplest and most commonly used form, a symmetric MA filter of length N, i.e., MA(N), replaces each data point of the streamflow signal with the average of the N neighbouring samples centred on that point. If a modeller is interested in daily streamflow prediction, selecting a greater MA length may lead to less
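A symmetric MA(N) filter of the kind described above can be sketched with NumPy's convolution. This sketch assumes an odd window length N, so that the window is centred on each point, and simply drops the (N − 1)/2 values at each end of the series; how the original study handles the series ends is not stated in this excerpt.

```python
import numpy as np

def moving_average(q, n):
    """Symmetric (centred) moving average MA(n) for odd n.

    Each output value is the mean of the n samples centred on the
    corresponding input point; the (n - 1) / 2 points at each end,
    which lack a full window, are dropped.
    """
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer for a centred window")
    q = np.asarray(q, dtype=float)
    kernel = np.ones(n) / n  # equal weights 1/n
    return np.convolve(q, kernel, mode="valid")
```

For example, `moving_average(q, 3)` smooths a daily series with a three-day centred window, and the output is shorter than `q` by two values.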
The Pareto-optimal MA-MGGP model
Similar to some previous one-day-ahead single-station streamflow prediction studies (e.g., Özger, 2009, Hosseinzadeh Talaee, 2014, Altunkaynak and Nigussie, 2015, Al-Juboori and Guven, 2016), the steady-state input-output transfer function given in Eq. (2) is taken in this study as the function to be formulated by the proposed MA-MGGP model.
where $Q_t$ and $Q_{t-i}$ represent daily streamflow at the present time instant $t$ and the past time instant $t-i$, respectively. The
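Eq. (2) itself is not reproduced in this excerpt. In the standard MGGP formulation (for example, as implemented in the GPTIPS toolbox), a model's output is a weighted sum of its gene outputs plus a bias term, with the weights fitted by ordinary least squares. The sketch below illustrates that step for two hypothetical genes built from lagged flows; the gene expressions and all function names are assumptions for illustration only.

```python
import numpy as np

# Two hypothetical genes evolved on lagged inputs X[:, 0] = Q_{t-1}, X[:, 1] = Q_{t-2}.
GENES = [
    lambda X: X[:, 0],              # G1 = Q_{t-1}
    lambda X: X[:, 0] * X[:, 1],    # G2 = Q_{t-1} * Q_{t-2}
]

def gene_matrix(genes, X):
    """Evaluate every gene on the input matrix; column j holds gene j's output."""
    return np.column_stack([g(X) for g in genes])

def fit_gene_weights(G, y):
    """Least-squares weights [b0, b1, ..., bn] for y ~ b0 + b1*G1 + ... + bn*Gn."""
    A = np.column_stack([np.ones(len(G)), G])  # prepend a bias column
    weights, *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
    return weights

def mggp_predict(weights, G):
    """Combine gene outputs linearly using the fitted bias and weights."""
    return weights[0] + G @ weights[1:]
```

Because the weights are solved for directly, evolution only has to discover the nonlinear gene structures; the linear scaling of each gene comes from the regression step.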
Study area and observational data
The proposed MA-MGGP model is trained and verified for the unregulated portion of Senoz Stream, located in Rize Province, Turkey (Fig. 5). The province, which is considered the wettest part of the country, is characterized by high rainfall and rough terrain covered by forest and tea gardens, and it experiences occasional landslides and floods. According to the Köppen-Geiger climate classification, the catchment has a borderline oceanic/humid subtropical climate with warm summers and cool winters (
Efficiency criteria
Many studies have indicated that a hydrological model can be adequately evaluated using the Nash-Sutcliffe efficiency (NSE) and root mean square error (RMSE) measures (e.g., Legates and McCabe, 1999, Nourani et al., 2012, Danandeh Mehr et al., 2014a). The NSE (Nash and Sutcliffe, 1970) is a normalized statistic that determines the relative magnitude of the estimation error compared with the variance of the measured data (Eq. (3)). It indicates how well the plot of observed data versus predicted data fits the
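Both criteria can be computed as sketched below. Eq. (3) is not reproduced in this excerpt, so the sketch assumes the usual Nash and Sutcliffe (1970) form, NSE = 1 − Σ(O − P)² / Σ(O − Ō)², where O are observed flows, P predicted flows, and Ō the observed mean; the function names are illustrative.

```python
import numpy as np

def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 - sum((O - P)^2) / sum((O - mean(O))^2)."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    return float(1.0 - np.sum((obs - pred) ** 2) / np.sum((obs - obs.mean()) ** 2))

def rmse(obs, pred):
    """Root mean square error, in the same units as the flows."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    return float(np.sqrt(np.mean((obs - pred) ** 2)))
```

Perfect predictions give NSE = 1 and RMSE = 0, while predicting the observed mean at every time step gives NSE = 0, so negative NSE values indicate a model worse than that trivial benchmark.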
GP and MGGP modelling
In GP-based modelling, the first step is to create an initial population of genes (or chromosomes in the case of MGGP). To this end, the modeller needs to specify primitive sets of terminals and functions. In this study, the predefined input variables and a set of random constants in the range [−10, +10] were chosen as the members of the terminal set for both the GP and MGGP runs. To evolve a proper solution, an appropriate guess is required when selecting the functions. For highly nonlinear
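The initialization step described above can be sketched for the multigene case, where each individual holds several independently grown trees (genes). The terminal set follows the text (input variables plus random constants drawn from [−10, +10]); the input names `Q1` to `Q3`, the function set, the 0.3 early-stop probability, and the gene and depth limits are hypothetical choices, since the study's function set is not given in this excerpt.

```python
import math
import random

INPUTS = ["Q1", "Q2", "Q3"]      # hypothetical names for Q_{t-1}, Q_{t-2}, Q_{t-3}
CONST_RANGE = (-10.0, 10.0)      # random constants, as stated in the text

def pdiv(a, b):
    """Protected division: returns 1.0 when the denominator is (near) zero."""
    return a / b if abs(b) > 1e-9 else 1.0

FUNCTIONS = {                    # illustrative function set: name -> (arity, callable)
    "plus": (2, lambda a, b: a + b),
    "minus": (2, lambda a, b: a - b),
    "times": (2, lambda a, b: a * b),
    "pdiv": (2, pdiv),
    "tanh": (1, math.tanh),
}

def random_terminal(rng):
    """Pick an input variable or draw a new random constant, with equal probability."""
    if rng.random() < 0.5:
        return rng.choice(INPUTS)
    return rng.uniform(*CONST_RANGE)

def random_gene(max_depth, rng):
    """Grow a random tree, stopping early at a terminal with probability 0.3."""
    if max_depth == 0 or rng.random() < 0.3:
        return random_terminal(rng)
    name = rng.choice(list(FUNCTIONS))
    arity = FUNCTIONS[name][0]
    return (name,) + tuple(random_gene(max_depth - 1, rng) for _ in range(arity))

def random_individual(max_genes, max_depth, rng):
    """An MGGP individual: a list of 1..max_genes independent genes."""
    return [random_gene(max_depth, rng) for _ in range(rng.randint(1, max_genes))]

def evaluate(gene, env):
    """Evaluate a gene given a mapping from input names to values."""
    if isinstance(gene, str):
        return env[gene]
    if isinstance(gene, float):
        return gene
    name, *args = gene
    return FUNCTIONS[name][1](*(evaluate(a, env) for a in args))

def initial_population(size, max_genes=4, max_depth=4, seed=0):
    rng = random.Random(seed)
    return [random_individual(max_genes, max_depth, rng) for _ in range(size)]
```

Setting `max_genes=1` reduces each individual to a single tree, which corresponds to the stand-alone GP case.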
Results and discussions
The ACF, the PACF, and the corresponding 95% confidence bands from lag 0 to lag 20 were estimated for the streamflow data (Fig. 7). The ACF and PACF show significant autocorrelation at lag one and lag three, respectively. Therefore, a three-day lag can be selected as the lag extent required for an autoregressive model. As discussed earlier, the ACF and PACF capture dependence only from a linear perspective. There is no guarantee that a three-day lag is the optimal selection for a nonlinear predictive
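The correlation analysis above can be reproduced in outline with the sample ACF, the PACF obtained from the Durbin-Levinson recursion, and the common approximate 95% band of ±1.96/√N for a white-noise null hypothesis. This is a plain NumPy sketch under those assumptions; the study's own software and exact estimator are not stated in this excerpt. Like the ACF and PACF themselves, it measures only linear dependence.

```python
import numpy as np

def acf(x, max_lag):
    """Sample autocorrelation r_0..r_max_lag (lag-k autocovariance over lag-0)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    denom = np.dot(d, d)
    return np.array([np.dot(d[: len(d) - k], d[k:]) / denom for k in range(max_lag + 1)])

def pacf(x, max_lag):
    """Partial autocorrelation from the sample ACF via the Durbin-Levinson recursion."""
    r = acf(x, max_lag)
    out = np.zeros(max_lag + 1)
    out[0] = 1.0
    phi = np.array([])  # AR(k-1) coefficients phi_{k-1,1..k-1} from the previous step
    for k in range(1, max_lag + 1):
        num = r[k] - np.dot(phi, r[k - 1:0:-1])
        den = 1.0 - np.dot(phi, r[1:k])
        phi_kk = num / den
        phi = np.append(phi - phi_kk * phi[::-1], phi_kk)  # update to AR(k) coefficients
        out[k] = phi_kk
    return out

def confidence_band(n, z=1.96):
    """Half-width of the approximate 95% band for a white-noise series of length n."""
    return z / np.sqrt(n)
```

Lags whose coefficients fall outside the band can then be listed with, for example, `np.flatnonzero(np.abs(pacf(q, 20)[1:]) > confidence_band(len(q))) + 1`.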
Concluding remarks
This paper proposed a new explicit model for single-station streamflow prediction, namely Pareto-optimal MA-MGGP. We developed this model on the basis of two performance metrics, goodness of fit and a measure of model complexity, whereas most previous studies have considered a single performance metric (goodness of fit). The complexity of prediction models matters because, in general, we prefer models that are not only accurate but also simple, i.e., parsimonious, for practical use. To this
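Selecting models on two metrics at once, as described above, relies on Pareto dominance: a model is kept only if no other candidate is at least as good on both goodness of fit and complexity and strictly better on one. The sketch below extracts that non-dominated set; the candidate tuples are placeholders, and the complexity measure (for example, a node count) is an assumption, because this excerpt does not define the one used in the study.

```python
def dominates(a, b):
    """True if model a is no worse than b on both objectives and strictly better on one."""
    return a[1] <= b[1] and a[2] <= b[2] and (a[1] < b[1] or a[2] < b[2])

def pareto_front(models):
    """models: list of (name, error, complexity); lower is better for both objectives.

    Returns the non-dominated models sorted by increasing complexity.
    """
    front = [m for m in models if not any(dominates(o, m) for o in models)]
    return sorted(front, key=lambda m: m[2])
```

A parsimonious model is then chosen from the returned front, typically by trading a small loss in accuracy for a large reduction in complexity.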
Acknowledgements
This research was partly supported by funding (Project No. 112Y214) from the Scientific and Technological Research Council of Turkey (TUBITAK). The authors thank the TUBITAK for their support and Turkish State Hydraulic Works (DSI) for providing the data used in this study.
References (72)
- et al. Advances in ungauged streamflow prediction using artificial neural networks. J. Hydrol. (2010)
- et al. Input determination for neural network models in water resources applications. Part 1—background and methodology. J. Hydrol. (2005)
- et al. Comparison between kinematic wave and artificial neural network models in event-based runoff simulation for an overland plane. J. Hydrol. (2008)
- et al. A Pareto-optimal moving average-multigene genetic programming model for rainfall-runoff modelling. Environ. Modell. Softw. (2017)
- et al. Streamflow prediction using linear genetic programming in comparison with a neuro-wavelet technique. J. Hydrol. (2013)
- et al. A gene-wavelet model for long lead-time drought forecasting. J. Hydrol. (2014)
- et al. Linear genetic programming application for successive-station monthly streamflow prediction. Comput. Geosci. (2014)
- et al. Flow forecast by SWAT model and ANN in Pracana basin, Portugal. Adv. Eng. Softw. (2009)
- Wavelet regression model for short-term streamflow forecasting. J. Hydrol. (2010)
- et al. Short-term streamflow forecasting with global climate change implications – a comparative study between genetic programming and neural network models. J. Hydrol. (2008)