Elsevier

Journal of Hydrology

Volume 549, June 2017, Pages 603-615

Research papers
A Pareto-optimal moving average multigene genetic programming model for daily streamflow prediction

https://doi.org/10.1016/j.jhydrol.2017.04.045

Highlights

  • We developed a new hybrid model (Pareto-optimal MA-MGGP) for daily streamflow prediction.

  • The Pareto-optimal MA-MGGP provides more accurate results than stand-alone GP, MGGP and MLR models.

  • The Pareto-optimal MA-MGGP model is explicit and less complex than the MA-MGGP model.

  • Lagged prediction is the main drawback of stand-alone GP, MGGP and MLR models.

Abstract

Genetic programming (GP) is able to systematically explore alternative model structures of different accuracy and complexity from observed input and output data. The effectiveness of GP in hydrological system identification has been recognized in recent studies. However, selecting a parsimonious (accurate and simple) model from such alternatives remains an open question. This paper proposes a Pareto-optimal moving average multigene genetic programming (MA-MGGP) approach to develop a parsimonious model for single-station streamflow prediction. The three main components of the approach that take us from observed data to a validated model are: (1) data pre-processing, (2) system identification and (3) system simplification. The data pre-processing component uses a simple moving average filter to diminish the lagged prediction effect of stand-alone data-driven models. The multigene component tends to identify the underlying nonlinear system with expressions simpler than those of classical monolithic GP, and finally the simplification component exploits a Pareto front plot to select a parsimonious model through an interactive complexity-efficiency trade-off. The approach was tested using the daily streamflow records from a station on Senoz Stream, Turkey. Compared with the efficiency results of the stand-alone GP, MGGP, and conventional multiple linear regression prediction models used as benchmarks, the proposed Pareto-optimal MA-MGGP model puts forward a parsimonious solution, which is of noteworthy importance for practical application. In addition, the approach allows the user to bring human insight to the problem by examining evolved models and picking out the best-performing programs for further analysis.

Introduction

It is well documented that the streamflow process is complex and not easily predictable (Yaseen et al., 2015). This is mainly due to the non-stationary nature of the phenomenon and the highly nonlinear relationship between streamflow and the characteristics of its catchment (Nourani et al., 2011, Danandeh Mehr et al., 2013). One of the common ways to model the streamflow process is to use data-driven techniques, which have the ability to learn about and extract the nonlinear relationships between the streamflow and its driving variables. When using such techniques, a sound knowledge of the underlying physical processes is not a prerequisite (Hundecha et al., 2001, Noori and Kalin, 2016).

Short-term streamflow prediction with a lead time less than (or equal to) one day is necessary for real-time flood warning and reservoir operation systems (Danandeh Mehr et al., 2015). The application of various data-driven techniques, such as artificial neural networks (ANN), genetic programming (GP), and fuzzy logic, in short-term streamflow prediction has been extensively evaluated and published in recent years (e.g., Hundecha et al., 2001, Moradkhani et al., 2004, Kücük and Agiralioglu, 2006, Wang et al., 2006, Makkeasorn et al., 2008, Shiri and Kisi, 2010, Kisi, 2010, Rezaeianzadeh et al., 2013, Krishna, 2013, Hosseinzadeh Talaee, 2014, Danandeh Mehr et al., 2015). Regardless of the type of data-driven technique employed, prediction accuracy is highly dependent on the variables used to train and validate the technique. In most short-term flow prediction studies, daily rainfall and streamflow records have been used to create so-called rainfall-runoff models (e.g., Mutlu et al., 2008, Nourani et al., 2011, Nourani et al., 2012, Shoaib et al., 2015). However, in poorly gauged basins, where no rainfall record is available, the commonly used rainfall-runoff models are not applicable. In such cases, single-station, cross-station, or successive-station runoff-runoff prediction models have been suggested (e.g., Ochoa-Rivera et al., 2002, Kisi and Cigizoglu, 2007, Demirel et al., 2009, Besaw et al., 2010, Can et al., 2012, Danandeh Mehr et al., 2015). For example, Kisi and Cigizoglu (2007) developed three ANN-based single-station prediction models to forecast daily streamflow at two rivers in Turkey. To this end, daily streamflow observations with one- to six-day antecedent records (lags) were considered as input vectors for the ANN. The authors demonstrated that a three-day lag (i.e., three input vectors) is sufficient to achieve the best one-day ahead streamflow forecasting model with respect to the selected performance criteria.

In another single-station streamflow prediction study, Özger (2009) developed two fuzzy inference systems using daily streamflow records at Demirkapi Station on the Euphrates River, Turkey. Based upon the strong serial dependence of the observed flows, the author suggested that one- and two-day lags are enough to train the models. On the basis of autocorrelation and partial autocorrelation analysis, Hosseinzadeh Talaee (2014) explained that one- to four-day lags of streamflow records at a station on Aspas Stream, Iran, are the most appropriate input vectors to train multilayer perceptron (MLP) neural networks for one-day ahead streamflow prediction. Most recently, Altunkaynak and Nigussie (2015) combined the SEASON algorithm with an MLP neural network and demonstrated that the new hybrid model can be used to extend the lead time of single-station daily streamflow prediction models.

Despite providing acceptable prediction accuracy, none of the aforementioned studies provided an explicit formulation of the single-station streamflow process. To bridge this gap, recent works have focused on applying GP to discover the underlying process explicitly. Our review showed that only a few studies have investigated the capability of GP in short-term streamflow prediction. For instance, Guven (2009) applied linear GP (LGP) and two versions of ANN to predict the daily flow of the Schuylkill River in the USA. The author demonstrated that the performance of LGP is higher than that of ANNs. Londhe and Charhate (2010) developed two GP-based one-day ahead streamflow prediction models at two stations in the Narmada Catchment of India and demonstrated that GP outperforms ANN and model trees. Although classical GP has been implemented in a few other streamflow modelling studies (e.g., Ni et al., 2010, Nourani et al., 2012, Nourani et al., 2013b), to the best of our knowledge, no study has yet been conducted to assess the potential of multigene topology in GP, i.e., MGGP, for single-station daily streamflow prediction. Moreover, previous studies have investigated the effectiveness of GP mostly in terms of prediction accuracy; thus, additional studies are required to address problems associated with the complexity of the proposed models. In this sense, this paper, for the first time, proposes a Pareto-optimal moving average multigene genetic programming (Pareto-optimal MA-MGGP) model to develop a parsimonious (accurate and simple) model for single-station streamflow prediction. The model is applied to daily streamflow prediction at Senoz Catchment in Turkey, and its performance is compared with those of stand-alone GP, MGGP, and conventional multivariable linear regression (MLR) prediction models as benchmarks. From a practical point of view, the proposed model is explicit and parsimonious, which motivates its use in practice.

Genetic programming (GP)

State-of-the-art GP is one of the most popular data-driven techniques; it evolves computer programs to automatically solve problems using Darwinian natural selection (Koza, 1992). The task is done by randomly generating a population of computer programs and then breeding together the best performing programs to create a new population (offspring). Mimicking Darwinian evolution, this process is iterated until the population contains programs that solve the task well (Searson, 2015). In

Moving average (MA) filtering

MA is a smoothing filter that isolates the effect of periodic (day-of-year) components on daily streamflow records and removes unwanted line noise from the streamflow measurements. In its simplest and most commonly used form, a symmetric MA filter of length N (i.e., MA(N)), each data point of a streamflow signal is replaced by the average of the N neighbouring samples around that point. If a modeller is interested in daily streamflow prediction, selection of higher length for MA may lead to less
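The symmetric MA(N) filter described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `moving_average`, the restriction to odd N, and the shrinking window at the two ends of the record are all assumptions made here for the example.

```python
def moving_average(series, n):
    """Symmetric (centred) moving average of odd length n.

    Assumption: near the ends of the record the window shrinks to the
    samples that exist, so the output has the same length as the input.
    """
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    half = n // 2
    smoothed = []
    for i in range(len(series)):
        lo = max(0, i - half)
        hi = min(len(series), i + half + 1)
        window = series[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


# Illustrative flow values (not data from the study).
flows = [2.0, 4.0, 9.0, 4.0, 2.0]
print(moving_average(flows, 3))
```

With N = 1 the series is returned unchanged; larger N gives a smoother signal at the cost of damping peaks.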

The Pareto-optimal MA-MGGP model

Similar to some of the previous one-day ahead single-station streamflow prediction studies (e.g., Özger, 2009, Hosseinzadeh Talaee, 2014, Altunkaynak and Nigussie, 2015, Al-Juboori and Guven, 2016), the steady-state input-output transfer function given in Eq. (2) is considered as the objective function to be formulated by the proposed MA-MGGP model in this study.

$$Q_t = f(Q_{t-i}) + \varepsilon(t) \qquad (2)$$

where $Q_t$ and $Q_{t-i}$ represent daily streamflow at the present time instant $t$ and past time instant $t-i$, respectively. The
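To make the single-station setting of Eq. (2) concrete, the sketch below turns a daily flow record into input-target pairs, where each input holds the antecedent flows and the target is the flow on day t. The helper name `make_lagged_samples` and the parameter `max_lag` are hypothetical; the appropriate number of lags is a modelling choice and is not fixed by this example.

```python
def make_lagged_samples(flows, max_lag):
    """Build (inputs, targets) for a one-day ahead single-station model.

    inputs[k]  = [Q(t-1), Q(t-2), ..., Q(t-max_lag)]
    targets[k] = Q(t)
    """
    if max_lag < 1 or max_lag >= len(flows):
        raise ValueError("max_lag must be between 1 and len(flows) - 1")
    inputs, targets = [], []
    for t in range(max_lag, len(flows)):
        inputs.append([flows[t - i] for i in range(1, max_lag + 1)])
        targets.append(flows[t])
    return inputs, targets


# Illustrative values only.
x, y = make_lagged_samples([1.0, 2.0, 3.0, 4.0, 5.0], max_lag=2)
print(x)  # antecedent flows, most recent first
print(y)  # flows to be predicted
```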

Study area and observational data

The proposed MA-MGGP model is trained and verified for the unregulated portion of Senoz Stream located in Rize Province, Turkey (Fig. 5). The province, which is considered the wettest corner of the country, is characterized by high rainfall and rough terrain covered by forest and tea gardens, and it experiences occasional landslides and floods. According to the Köppen-Geiger climate classification, the catchment has a borderline oceanic/humid subtropical climate with warm summers and cool winters (

Efficiency criteria

Many studies have indicated that a hydrological model can be sufficiently evaluated by Nash-Sutcliffe efficiency (NSE) and root mean square error (RMSE) measures (e.g., Legates and McCabe, 1999, Nourani et al., 2012, Danandeh Mehr et al., 2014a). The NSE (Nash and Sutcliffe, 1970) is a normalized statistic determining the relative magnitude of estimation error in comparison with the measured data variance (Eq. (3)). It indicates how well the plot of observed data versus predicted data fits the
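For reference, the two measures named above can be computed as follows. The sketch uses the standard textbook definitions of NSE and RMSE; the notation may differ from the paper's own equations, and the function names are chosen here for illustration.

```python
import math


def nse(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 - SSE / sum of squared deviations
    of the observations from their mean. 1.0 is a perfect fit; 0.0 means
    the model is no better than predicting the observed mean."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    variance_sum = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / variance_sum


def rmse(observed, predicted):
    """Root mean square error, in the units of the observations."""
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(sse / len(observed))


# Illustrative values only.
obs = [1.0, 2.0, 3.0]
print(nse(obs, obs), rmse(obs, obs))
```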

GP and MGGP modelling

In GP-based modelling, the first step is creating an initial population of genes (or chromosomes in the case of MGGP). To this end, a modeller may specify primitive sets of terminals and functions. In this study, the predefined input variables and a set of random constants in the range [−10, +10] are chosen as members of the terminal sets for both the GP and MGGP runs. In order to evolve a proper solution, an appropriate guess is required for selection of functions. For highly nonlinear
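As a rough illustration of how an initial population can be drawn from terminal and function sets, the sketch below builds random expression trees. The terminals follow the description above (input variables and random constants in [−10, +10]); the function set `("+", "-", "*")`, the tree depth, the 0.5 and 0.3 probabilities, and all names are assumptions made for the example, not the settings used in the study.

```python
import random

FUNCTIONS = ("+", "-", "*")  # hypothetical function set, all binary


def random_terminal(n_inputs, rng):
    """Pick an input variable or a random constant in [-10, +10]."""
    if rng.random() < 0.5:
        return ("x", rng.randrange(n_inputs))
    return ("const", rng.uniform(-10.0, 10.0))


def random_tree(depth, n_inputs, rng):
    """Grow a random expression tree of at most the given depth."""
    if depth == 0 or rng.random() < 0.3:
        return random_terminal(n_inputs, rng)
    op = rng.choice(FUNCTIONS)
    return (op,
            random_tree(depth - 1, n_inputs, rng),
            random_tree(depth - 1, n_inputs, rng))


def evaluate(tree, inputs):
    """Evaluate a tree for one input vector (list of lagged flows)."""
    kind = tree[0]
    if kind == "x":
        return inputs[tree[1]]
    if kind == "const":
        return tree[1]
    left = evaluate(tree[1], inputs)
    right = evaluate(tree[2], inputs)
    if kind == "+":
        return left + right
    if kind == "-":
        return left - right
    return left * right


def initial_population(size, depth, n_inputs, rng):
    return [random_tree(depth, n_inputs, rng) for _ in range(size)]


rng = random.Random(0)
population = initial_population(size=5, depth=3, n_inputs=2, rng=rng)
for tree in population:
    print(evaluate(tree, [1.5, 2.5]))
```

Selection, crossover, and mutation would then operate on these trees; they are omitted here to keep the sketch short.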

Results and discussions

The ACF, PACF, and the corresponding 95% confidence bands from lag 0 to lag 20 were estimated for the streamflow data (Fig. 7). The ACF and PACF show significant autocorrelation at lag one and lag three, respectively. Therefore, a three-day lag can be selected as the extent of lag required for an autoregressive model. As discussed earlier, the ACF and PACF show the dependence from the perspective of linearity. There is no guarantee that a three-day lag is the optimal selection for a nonlinear predictive
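A sample autocorrelation function with an approximate 95% band can be sketched as below. The band of ±1.96/√n is the usual large-sample white-noise approximation and is an assumption here; the source does not state how its confidence bands were computed. The PACF is omitted for brevity.

```python
import math


def acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    c0 = sum((x - mean) ** 2 for x in series)
    values = []
    for k in range(max_lag + 1):
        ck = sum((series[t] - mean) * (series[t + k] - mean)
                 for t in range(n - k))
        values.append(ck / c0)
    return values


def confidence_band(n, z=1.96):
    """Half-width of the approximate 95% band for a white-noise series."""
    return z / math.sqrt(n)


# Illustrative values only.
flows = [1.0, 2.0, 3.0, 4.0, 5.0]
band = confidence_band(len(flows))
for lag, r in enumerate(acf(flows, 2)):
    print(lag, round(r, 3), abs(r) > band)
```

Lags whose autocorrelation falls outside the band are candidates for inclusion as inputs, with the caveat already noted in the text that this reflects linear dependence only.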

Concluding remarks

This paper proposed a new explicit model for single-station streamflow prediction, namely Pareto-optimal MA-MGGP. We developed this model on the basis of two performance metrics comprising goodness of fit and a measure of model complexity, while most of the previous studies have considered a single performance metric (goodness of fit). Complexity of prediction models is important since, in general, we prefer not only accurate but also simple, i.e., parsimonious models for practical use. To this
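The two-metric selection idea can be illustrated with a small Pareto front filter. In this sketch each candidate model carries a complexity score and an error score, both to be minimized; the dictionary keys, the model names, and the numbers are invented for illustration and are not results from the paper.

```python
def pareto_front(models):
    """Return the models not dominated by any other model.

    A model is dominated if another model is at least as good on both
    complexity and error and strictly better on at least one of them.
    """
    front = []
    for m in models:
        dominated = any(
            o["complexity"] <= m["complexity"]
            and o["rmse"] <= m["rmse"]
            and (o["complexity"] < m["complexity"] or o["rmse"] < m["rmse"])
            for o in models
        )
        if not dominated:
            front.append(m)
    return sorted(front, key=lambda m: m["complexity"])


# Made-up candidates: (name, expression complexity, validation RMSE).
candidates = [
    {"name": "A", "complexity": 5, "rmse": 1.00},
    {"name": "B", "complexity": 10, "rmse": 0.80},
    {"name": "C", "complexity": 12, "rmse": 0.90},
    {"name": "D", "complexity": 20, "rmse": 0.79},
]
for model in pareto_front(candidates):
    print(model["name"], model["complexity"], model["rmse"])
```

In this toy case model C is dropped because B is both simpler and more accurate. Choosing between A, B, and D is the interactive trade-off: D is only marginally more accurate than B at twice the complexity, so a modeller favouring parsimony might pick B.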

Acknowledgements

This research was partly supported by funding (Project No. 112Y214) from the Scientific and Technological Research Council of Turkey (TUBITAK). The authors thank TUBITAK for its support and the Turkish State Hydraulic Works (DSI) for providing the data used in this study.

References (72)

  • P. Masselot et al. Streamflow forecasting using functional regression. J. Hydrol. (2016)
  • A. Meshgi et al. Development of a modular streamflow model to quantify runoff contributions from different land uses in tropical urban environments using Genetic Programming. J. Hydrol. (2015)
  • H. Moradkhani et al. Improved streamflow forecasting using self-organizing radial basis function artificial neural networks. J. Hydrol. (2004)
  • J.E. Nash et al. River flow forecasting through conceptual models part I – a discussion of principles. J. Hydrol. (1970)
  • N. Noori et al. Coupling SWAT and ANN models for enhanced daily streamflow prediction. J. Hydrol. (2016)
  • V. Nourani et al. Two hybrid Artificial Intelligence approaches for modeling rainfall–runoff process. J. Hydrol. (2011)
  • V. Nourani et al. Using self-organizing maps and wavelet transforms for space–time pre-processing of satellite precipitation and runoff data in neural network based rainfall–runoff modeling. J. Hydrol. (2013)
  • M. Ravansalar et al. A wavelet-linear genetic programming model for sodium (Na+) concentration forecasting in rivers. J. Hydrol. (2016)
  • A.M. Sattar et al. Gene expression models for prediction of longitudinal dispersion coefficient in streams. J. Hydrol. (2015)
  • B. Selle et al. Testing the structure of a hydrological model using Genetic Programming. J. Hydrol. (2011)
  • J. Shiri et al. Short-term and long-term streamflow forecasting using a wavelet and neuro-fuzzy conjunction model. J. Hydrol. (2010)
  • M. Shoaib et al. Runoff forecasting using hybrid Wavelet Gene Expression Programming (WGEP) approach. J. Hydrol. (2015)
  • W. Wang et al. Forecasting daily streamflow using hybrid ANN models. J. Hydrol. (2006)
  • C.L. Wu et al. Data-driven models for monthly streamflow time series prediction. Eng. Appl. Artif. Intel. (2010)
  • C.L. Wu et al. Rainfall–runoff modeling using artificial neural network coupled with singular spectrum analysis. J. Hydrol. (2011)
  • C.L. Wu et al. Methods to improve neural network performance in daily flows prediction. J. Hydrol. (2009)
  • Z.M. Yaseen et al. Artificial intelligence based models for stream-flow forecasting: 2000–2015. J. Hydrol. (2015)
  • C.R. Zorn et al. Peak flood estimation using gene expression programming. J. Hydrol. (2015)
  • A.M. Al-Juboori et al. A stepwise model to predict monthly streamflow. J. Hydrol. (2016)
  • A. Altunkaynak et al. Performance comparison of SAS-multilayer perceptron and wavelet-multilayer perceptron models in terms of daily streamflow prediction. J. Hydrol. Eng. (2015)
  • V. Babovic et al. Genetic programming as a model induction engine. J. Hydroinf. (2000)
  • M. Brameier et al. Linear Genetic Programming (2007)
  • İ. Can et al. Daily streamflow modelling using autoregressive moving average and artificial neural networks models: case study of Çoruh basin, Turkey. Water Environ. J. (2012)
  • A. Danandeh Mehr et al. Grid-based performance evaluation of GCM-RCM combinations for rainfall reproduction. Theor. Appl. Climatol. (2016)
  • A. Danandeh Mehr et al. Successive-station monthly streamflow prediction using different ANN algorithms. Int. J. Environ. Sci. Technol. (2015)
  • N.J. de Vos et al. Constraints of artificial neural networks for rainfall–runoff modeling: trade-offs in hydrological state representation and model evaluation. Hydrol. Earth Syst. Sci. (2005)