Elsevier

Global and Planetary Change

Volume 121, October 2014, Pages 53-63
Forecasting Caspian Sea level changes using satellite altimetry data (June 1992–December 2013) based on evolutionary support vector regression algorithms and gene expression programming

https://doi.org/10.1016/j.gloplacha.2014.07.002

Highlights

  • Caspian Sea level changes are predicted using artificial intelligence approaches.

  • Using promising SVM and GEP approaches as satisfactory forecasting models

  • Using time series obtained by satellite altimetry as available high-quality data

Abstract

Sea level forecasting at various time intervals is of great importance in water supply management. Evolutionary artificial intelligence (AI) approaches have been accepted as an appropriate tool for modeling complex nonlinear phenomena in water bodies. In this study, we investigated the ability of two AI techniques to forecast Caspian Sea level anomalies using satellite altimetry observations from June 1992 to December 2013: support vector machine (SVM), which is mathematically well-founded and provides new insights into function approximation, and gene expression programming (GEP). SVM demonstrates the best performance in predicting Caspian Sea level anomalies, with the minimum root mean square error (RMSE = 0.035) and maximum coefficient of determination (R² = 0.96) during the prediction periods. A comparison between the proposed AI approaches and the cascade correlation neural network (CCNN) model also shows the superiority of the GEP and SVM models over the CCNN.

Introduction

Accurate and reliable prediction of sea level behavior has always been important in water resource management. The analysis of long-term and short-term sea level fluctuations is especially important because these fluctuations can affect the natural processes occurring in the basin and the infrastructure built along coastlines (Llovel et al., 2011, Kisi et al., 2012, Milne and Peros, 2013). Sea level variations are complex outcomes of site-specific geographical and meteorological variables, including precipitation, runoff, evaporation, temperature, water salinity, and the interaction between surface water and low-lying aquifers, which differ throughout the area. Although sea level monitoring is useful in both applied and fundamental water management strategies, anticipating future conditions, in both the short term and the long term, is necessary to make reliable hydrological and water management decisions. Because so many factors contribute, accurate measurements and analyses by conventional approaches are still difficult to achieve and may suffer from large uncertainties (Talebizadeh and Moridnejad, 2011). Computer-intensive statistical methods have improved modeling approaches for time series data in water resources (e.g., American Society of Civil Engineers (ASCE) Task Committee on Application of Artificial Neural Networks in Hydrology, 2000, Renssen et al., 2007, Ghorbani et al., 2010, Ozyavas et al., 2010, Kisi et al., 2012). Most conventional techniques for sea level prediction are based on the extrapolation of linear trends, which cannot satisfactorily fit nonlinear time series and irregular changes, such as those associated with the El Niño/Southern Oscillation.
Artificial intelligence (AI) techniques, such as artificial neural networks (ANNs), decision trees, and fuzzy networks, have been developed and are being used to model complex nonlinear phenomena in hydrology and water resource engineering (e.g., More and Deo, 2003, Huang et al., 2006, Wu and Chau, 2010, Kisi et al., 2012, Imani et al., in press). Recently, support vector machines (SVMs) and gene expression programming (GEP) have been introduced as forecasting techniques in time series analysis (e.g., Kim, 2003, Yu et al., 2006, Guven and Gunal, 2008, Rajasekaran et al., 2008, Ghorbani et al., 2010). The learning algorithms of SVMs, developed by Vapnik et al. (1997), are characterized by the capacity control of the decision function, the use of kernel functions, and the sparsity of the solution (Cristianini and Taylor, 2000). SVMs are resistant to over-fitting and thus generalize well in various time series forecasting problems. Unlike most traditional neural network models, which implement the empirical risk minimization principle, SVMs implement the structural risk minimization principle, which seeks to minimize an upper bound of the generalization error rather than the training error (Tay and Cao, 2001). The main advantages of SVMs are that they remain effective in high-dimensional spaces, even when the number of dimensions exceeds the number of samples; that the decision function uses only a subset of the training points, called support vectors; and that different kernel functions can be specified for the decision function. Traditional ANNs involve considerable subjectivity in choosing the model architecture, whereas the learning algorithm of SVMs determines the model architecture (number of hidden units) automatically.
Moreover, traditional ANN models do not emphasize generalization performance, whereas SVMs address it in a rigorous theoretical setting (Vapnik, 1992, Haykin, 2003). Although SVMs are well documented in other fields, their applications in hydrology are few. Sivapragasam et al. (2001) conducted one-lead-day rainfall and runoff forecasting using SVM, preprocessing the input data by singular spectrum analysis, which resulted in a high-dimensional input space. Tripathi et al. (2006) applied SVM to the statistical downscaling of precipitation at a monthly timescale and demonstrated the effectiveness of the approach on meteorological subdivisions in India. Lins et al. (2013) presented a year-ahead prediction procedure using SVMs, based on sea surface temperature (SST) data of previous periods and on the seasonal and intraseasonal features of SST. To the best of our knowledge, no study has used SVM in sea level time series prediction. Whereas the SVM algorithm automatically determines the model architecture, GEP relies on the data alone to establish both the structure and the parameters of the model (e.g., Koza, 1992, Ferreira, 2006). GEP may generally be defined as an evolutionary algorithm that evolves computer programs composed of multiple parse trees, referred to as expression trees (ETs). GEP identifies relationships within datasets and then builds models to describe them. The advantage of the genetic programming (GP) approach over ANNs in climate change studies is that it provides efficient and transparent modeling results (Ferreira, 2006). Genetic programs are generally robust applications of optimization algorithms that use statistical procedures to imitate nature.
In this approach, a combination of mathematical expressions is derived to describe the relationship between different variables using operators such as mutation, recombination, and evolution (Banzhaf et al., 1998). The comprehensibility of GEP models lowers the risk of over-fitting the training data and offers a way to improve the generalization of the resulting models. In addition, the unique multigenic nature of GEP allows the evolution of highly complex programs composed of several subprograms (Ferreira, 2001a, Ferreira, 2001b).
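
To make the idea of expression trees and multigenic programs concrete, the following is a minimal sketch, not the authors' implementation. It only shows how a program made of two sub-expression trees could be evaluated. The function set, the tree encoding as nested tuples, the variable names `lag1` and `lag2`, the constants, and the use of addition as the linking function are all assumptions made for illustration. The genetic operators (mutation, recombination) and the linear genotype encoding used by GEP are omitted.

```python
import operator

# Function set assumed for illustration; the study's actual set is not given here.
FUNCTIONS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(node, variables):
    """Recursively evaluate an expression tree.

    A node is a number (constant), a string (variable name), or a tuple
    (operator_symbol, left_subtree, right_subtree).
    """
    if isinstance(node, tuple):
        symbol, left, right = node
        return FUNCTIONS[symbol](evaluate(left, variables), evaluate(right, variables))
    if isinstance(node, str):
        return variables[node]
    return node


# Two hypothetical sub-expression trees ("genes") over lagged sea level anomalies.
gene_1 = ("*", 0.5, "lag1")      # 0.5 * lag1
gene_2 = ("-", "lag1", "lag2")   # lag1 - lag2


def predict(variables):
    """Multigenic program: sub-trees joined by a linking function (addition here)."""
    return evaluate(gene_1, variables) + evaluate(gene_2, variables)


forecast = predict({"lag1": 0.25, "lag2": 0.125})
```

Because the evolved model is an explicit formula of this kind, it can be read and inspected directly, which is the transparency advantage noted above.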

Only a few applications of GEP can be classified into the field of water and ocean engineering (e.g., Gaur and Deo, 2008, Ustoorikar and Deo, 2008). Drecourt (1999), Savic et al. (1999), Liong et al. (2002), and Aytek and Alp (2008) applied GP in rainfall–runoff modeling. Harris et al. (2003) used GP to predict velocity in compound channels with vegetated flood plains. Aytek and Kisi (2008) applied GP in suspended sediments and observed that GP is better than the conventional rating curve and multilinear regression techniques. Gaur and Deo (2008) applied GP in real-time wave forecasting.

Thus, the main focus of the present study is to predict Caspian Sea level anomalies using altimetric measurements from the TOPEX/Poseidon (T/P), Jason-1 (J-1), and Jason-2/OSTM (J-2) satellite missions from June 1992 to December 2013 with two approaches, SVMs and GEP, that have not previously been applied to satellite-based sea level analysis. The proposed models are then compared with an ANN-based approach.

Section snippets

Study site and data

The Caspian Sea (Fig. 1a) is the largest inland water body in the world, with a mean salinity of ~13 ppt. It is located in a depression (latitude, 36° to 47° N; longitude, 47° to 54° E) bordered by the Caucasus Mountains to the west, the Central Asian plateau and desert to the east, the Russian and Kazakh plains to the north, and the Elburz (Alborz) Mountains to the south (Kostianoy and Kosarev, 2005). The unique features of the Caspian Sea, such as the size, depth, chemical components, and peculiarities of

Support vector regression

SVM is an advanced learning technique based on statistical learning theory (Vapnik, 1992, Vapnik, 1999). Compared with neural network structures, the use of SVMs to estimate the regression function has three distinct characteristics. First, SVM estimation is performed through a set of linear functions, which are defined in a high-dimensional space. Second, SVM regression estimation is conducted by risk minimization using Vapnik's ε-insensitive loss function. Finally, SVMs use a
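
The ε-insensitive loss named above can be illustrated with a short sketch. It ignores residuals whose magnitude is at most ε and penalizes larger residuals linearly. The function name, the averaging over samples, and the toy numbers are assumptions chosen for illustration. They are not values from the study.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Mean epsilon-insensitive loss: residuals within +/- epsilon cost nothing."""
    losses = [max(0.0, abs(t - p) - epsilon) for t, p in zip(y_true, y_pred)]
    return sum(losses) / len(losses)


# Hypothetical observed and predicted anomalies (illustrative values only).
observed = [0.5, 1.0, 1.5]
predicted = [0.5, 1.25, 2.5]

# Residuals are 0, 0.25, and 1.0; only the last one exceeds epsilon = 0.5.
loss = epsilon_insensitive_loss(observed, predicted, epsilon=0.5)
```

Only the third sample contributes to the loss here, which is why points inside the ε-tube do not become support vectors in the fitted model.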

Results and discussion

The present study aims to present 5 and 15 ~ 10-day ahead forecasting of Caspian Sea level anomalies using SVR and GEP models, the results of which are compared with those of a neural network-based model, namely, the cascade correlation neural network (CCNN). Several input combinations are constructed based on the autocorrelation function (ACF) and partial autocorrelation function (PACF) analysis (Fig. 6) (Shiri and Kisi, 2011, Kisi et al., 2012). The ACF and PACF are measures of association
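
As a rough illustration of how ACF analysis can guide the construction of input combinations, the sketch below computes the sample autocorrelation at a given lag and then builds input/target pairs from a chosen set of lags. The helper names, the toy series, and the chosen lags are hypothetical. The study's actual lag selection, which also uses the PACF, is not reproduced here.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a non-negative integer `lag`."""
    n = len(series)
    mean = sum(series) / n
    denominator = sum((x - mean) ** 2 for x in series)
    numerator = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return numerator / denominator


def lagged_samples(series, lags):
    """Pair each value with the earlier values found at the given lags."""
    start = max(lags)
    inputs = [[series[i - lag] for lag in lags] for i in range(start, len(series))]
    targets = [series[i] for i in range(start, len(series))]
    return inputs, targets


# Toy series standing in for sea level anomalies (illustrative values only).
toy_series = [1.0, 2.0, 3.0, 4.0, 5.0]

acf_lag1 = autocorrelation(toy_series, 1)
inputs, targets = lagged_samples(toy_series, lags=[1, 2])
```

Each row of `inputs` holds the values one and two steps before the matching entry in `targets`, which is the form a regression model such as SVR or GEP would be trained on.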

Conclusion

The SVR and GEP techniques in forecasting short-term Caspian Sea level changes are investigated in the present study. Although the performance of both models is superior to that of the CCNN approach, the intercomparison of the obtained results shows that the SVR model outperforms the GEP model in sea level forecasting. The RBF kernel is used in SVM model development because its performance is better than that of the other kernels for the current time series data. The overall results show the
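
For readers unfamiliar with the RBF kernel mentioned above, the following minimal sketch shows the kernel and how a trained SVR combines kernel values into a prediction. The parameter name `gamma`, the support vectors, the dual coefficients, and the bias are hypothetical placeholders. They are not the fitted values from the study.

```python
import math


def rbf_kernel(x, z, gamma):
    """RBF kernel k(x, z) = exp(-gamma * ||x - z||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


def svr_predict(x, support_vectors, dual_coefs, bias, gamma):
    """SVR decision function: weighted sum of kernel values plus a bias."""
    weighted = sum(
        coef * rbf_kernel(sv, x, gamma)
        for sv, coef in zip(support_vectors, dual_coefs)
    )
    return bias + weighted


# Hypothetical model with two one-dimensional support vectors.
prediction = svr_predict(
    x=[0.0],
    support_vectors=[[0.0], [1.0]],
    dual_coefs=[1.0, -1.0],
    bias=0.5,
    gamma=1.0,
)
```

The kernel value decays with the squared distance between inputs, so each support vector influences mainly the predictions for nearby input patterns.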

Acknowledgments

This research was supported by the grants from National Cheng Kung University (Taiwan), the National Science Council of Taiwan (NSC 102-2221-E-006-234 and NSC 101-2221-E-006-180-MY3) and the Headquarters of University Advancement at the National Cheng Kung University. Altimeter data products are from AVISO (Archivage, Validation et Interprétation des données des Satellites Océanographiques). We thank anonymous reviewers for their constructive comments. The figures are prepared using the GMT

References (57)

  • I.D. Lins et al., Prediction of sea surface temperature in the tropical Atlantic by support vector machines, Comput. Stat. Data Anal. (2013)
  • W. Llovel et al., Terrestrial waters and sea level variations on interannual time scale, Glob. Planet. Chang. (2011)
  • G. Milne et al., Data–model comparison of Holocene sea-level change in the circum-Caribbean region, Glob. Planet. Chang. (2013)
  • A. More et al., Forecasting wind with neural networks, Mar. Struct. (2003)
  • A. Ozyavas et al., A possible connection of Caspian Sea level fluctuations with meteorological factors and seismicity, Earth Planet. Sci. Lett. (2010)
  • S. Rajasekaran et al., Support vector regression methodology for storm surge predictions, Ocean Eng. (2008)
  • H. Renssen et al., Simulating long-term Caspian Sea level changes: the impact of Holocene and future climate conditions, Earth Planet. Sci. Lett. (2007)
  • J. Shiri et al., Comparison of genetic programming with neuro-fuzzy systems for predicting short-term water table depth fluctuations, Comp. Geosci. (2011)
  • M. Talebizadeh et al., Uncertainty analysis for the forecast of lake level fluctuations using ensembles of ANN and ANFIS models, Expert Syst. Appl. (2011)
  • S. Tripathi et al., Downscaling of precipitation for climate change scenarios: a support vector machine approach, J. Hydrol. (2006)
  • K. Ustoorikar et al., Filling up gaps in wave data with genetic programming, Mar. Struct. (2008)
  • C.L. Wu et al., Data-driven models for monthly stream flow time series prediction, Eng. Appl. Artif. Intell. (2010)
  • American Society of Civil Engineers (ASCE) Task Committee on Application of Artificial Neural Networks in Hydrology, Artificial neural networks in hydrology. II: Hydrological applications, J. Hydrol. Eng. (2000)
  • A. Aytek et al., An application of artificial intelligence for rainfall runoff modeling, J. Syst. Sci. (2008)
  • W. Banzhaf et al., Genetic Programming (1998)
  • N. Cristianini et al., An Introduction to Support Vector Machines: And Other Kernel-Based Learning Methods (2000)
  • J.P. Drecourt, Application of neural networks and genetic programming to rainfall runoff modeling, D2K Technical Report 0699-1-1 (1999)
  • S.E. Fahlman et al.