Forecasting Caspian Sea level changes using satellite altimetry data (June 1992–December 2013) based on evolutionary support vector regression algorithms and gene expression programming

doi:10.1016/j.gloplacha.2014.07.002

Global and Planetary Change

Volume 121, October 2014, Pages 53-63

Highlights

• Caspian Sea level changes are predicted using artificial intelligence approaches.
• SVM and GEP are used as promising approaches that yield satisfactory forecasting models.
• Time series obtained by satellite altimetry are used as available high-quality data.

Abstract

Sea level forecasting at various time intervals is of great importance in water supply management. Evolutionary artificial intelligence (AI) approaches have been accepted as an appropriate tool for modeling complex nonlinear phenomena in water bodies. In this study, we investigated the ability of two AI techniques to forecast Caspian Sea level anomalies using satellite altimetry observations from June 1992 to December 2013: support vector machine (SVM), which is mathematically well-founded and provides new insights into function approximation, and gene expression programming (GEP). SVM demonstrates the best performance in predicting Caspian Sea level anomalies, with the minimum root mean square error (RMSE = 0.035) and the maximum coefficient of determination (R² = 0.96) during the prediction periods. A comparison between the proposed AI approaches and the cascade correlation neural network (CCNN) model also shows the superiority of the GEP and SVM models over the CCNN.
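The two scores quoted in the abstract can be illustrated with a minimal sketch. It assumes that R² is computed as one minus the ratio of residual to total sum of squares; the abstract does not say which definition of R² the authors used, and the function names and sample values here are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between two equally long sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(observed))


def r_squared(observed, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical anomaly values, for illustration only (not data from the study).
observed = [0.10, 0.14, 0.09, 0.05]
predicted = [0.11, 0.12, 0.10, 0.06]
print(rmse(observed, predicted), r_squared(observed, predicted))
```

A lower RMSE and an R² closer to 1 both indicate a closer match between forecast and observation, which is the sense in which the abstract ranks SVM first.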

Introduction

Accurate and reliable predictions of sea level behavior have always been important in water resource management. The analysis of long-term and short-term sea level fluctuations is especially important because these fluctuations can affect the natural processes occurring in the basin and the infrastructure built along coastlines (Llovel et al., 2011, Kisi et al., 2012, Milne and Peros, 2013). Sea level variations are complex outcomes of site-specific geographical and meteorological variables, including precipitation, runoff, evaporation, temperature, water salinity, and the interaction between surface water and low-lying aquifers, all of which differ throughout the area. Although sea level monitoring is useful both in practice and as a basis for water management policy, anticipating future conditions in the short term and in the long term is also necessary to make reliable hydrological and water management decisions. Because so many factors contribute, accurate measurement and analysis by conventional approaches remain difficult and may suffer from large uncertainties (Talebizadeh and Moridnejad, 2011). Computer-intensive statistical methods have improved modeling approaches for time series data in water resources (e.g., American Society of Civil Engineers (ASCE) Task Committee on Application of Artificial Neural Networks in Hydrology, 2000, Renssen et al., 2007, Ghorbani et al., 2010, Ozyavas et al., 2010, Kisi et al., 2012). Most conventional techniques for sea level prediction extrapolate linear trends and therefore cannot satisfactorily fit nonlinear time series and irregular changes, such as the El Niño/Southern Oscillation.
Artificial intelligence (AI) techniques, such as artificial neural networks (ANNs), decision trees, and fuzzy networks, have been developed and are being used to model complex nonlinear phenomena in hydrology and water resource engineering (e.g., More and Deo, 2003, Huang et al., 2006, Wu and Chau, 2010, Kisi et al., 2012, Imani et al., in press). Recently, support vector machines (SVMs) and gene expression programming (GEP) have been introduced as applied forecasting techniques in time series analysis (e.g., Kim, 2003, Yu et al., 2006, Guven and Gunal, 2008, Rajasekaran et al., 2008, Ghorbani et al., 2010). The learning algorithms of SVMs, developed by Vapnik et al. (1997), are characterized by the capacity control of the decision function, the use of kernel functions, and the sparsity of the solution (Cristianini and Taylor, 2000). SVMs are resistant to over-fitting and thus generalize well in various time series forecasting problems. Unlike most traditional neural network models, which implement the empirical risk minimization principle, SVMs implement the structural risk minimization principle, which seeks to minimize an upper bound of the generalization error rather than the training error (Tay and Cao, 2001). The main advantages of SVMs are that they remain effective in high-dimensional spaces, even when the number of dimensions is greater than the number of samples; that the decision function depends only on a subset of training points, called support vectors; and that different kernel functions can be specified for the decision function. Traditional ANNs involve considerable subjectivity in the choice of model architecture, whereas the learning algorithm of SVMs determines the model architecture (number of hidden units) automatically.
Moreover, traditional ANN models do not emphasize generalization performance, whereas SVMs address it in a rigorous theoretical setting (Vapnik, 1992, Haykin, 2003). Despite well-documented studies in other fields, applications of SVM in hydrology are few. Sivapragasam et al. (2001) performed one-day-lead rainfall and runoff forecasting using SVM, preprocessing the input data by singular spectrum analysis, which resulted in a high-dimensional input space. Tripathi et al. (2006) applied SVM to the statistical downscaling of precipitation at a monthly timescale and demonstrated the effectiveness of the approach in meteorological subdivisions of India. Lins et al. (2013) presented a year-ahead prediction procedure based on sea surface temperature (SST) data of previous periods using SVMs. The procedure was based on the seasonal and intraseasonal features of SST. To the best of our knowledge, no study has used SVM in sea level time series prediction. Whereas the SVM algorithm automatically determines the model architecture, GEP relies on the data alone to establish both the structure and the parameters of the model (e.g., Koza, 1992, Ferreira, 2006). GEP may generally be defined as an evolutionary algorithm that evolves computer programs composed of multiple parse trees, referred to as expression trees (ETs). GEP identifies relationships within datasets and then builds models that describe these relationships. The advantage of the genetic programming (GP) approach over ANNs in climate change studies is that it provides efficient and transparent modeling results (Ferreira, 2006). Genetic programs are generally robust applications of optimization algorithms that use statistical procedures to imitate nature.
In this approach, a combination of mathematical expressions is derived to describe the relationship between different variables using operators such as mutation, recombination, and evolution (Banzhaf et al., 1998). The comprehensibility of GEP models lowers the risk of over-fitting the training data and offers a way to improve the generalization of the resulting models. In addition, the unique multigenic nature of GEP allows the evolution of highly complex programs composed of several subprograms (Ferreira, 2001a, Ferreira, 2001b).
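To make the notion of an expression tree concrete, the sketch below decodes a linear gene into a tree and evaluates it. It assumes the breadth-first (Karva) encoding from Ferreira's GEP literature, which this excerpt does not describe; the function set, the symbol Q for square root, and the terminal names a to d are illustrative choices, not the configuration used in the study.

```python
import math
import operator
from collections import deque

# Assumed function set: symbol -> (arity, implementation).
FUNCTIONS = {
    "+": (2, operator.add),
    "-": (2, operator.sub),
    "*": (2, operator.mul),
    "/": (2, operator.truediv),
    "Q": (1, math.sqrt),
}


def decode_k_expression(symbols):
    """Fill an expression tree level by level from a linear gene."""
    root = {"symbol": symbols[0], "children": []}
    pending = deque([root])
    position = 1
    while pending:
        node = pending.popleft()
        arity = FUNCTIONS[node["symbol"]][0] if node["symbol"] in FUNCTIONS else 0
        for _ in range(arity):
            child = {"symbol": symbols[position], "children": []}
            position += 1
            node["children"].append(child)
            pending.append(child)
    # Symbols after `position` are unused (the non-coding part of the gene).
    return root


def evaluate(node, terminals):
    """Recursively evaluate a tree given values for the terminal symbols."""
    if node["symbol"] in FUNCTIONS:
        implementation = FUNCTIONS[node["symbol"]][1]
        return implementation(*(evaluate(c, terminals) for c in node["children"]))
    return terminals[node["symbol"]]


# "Q*+-abcd" decodes to sqrt((a + b) * (c - d)).
tree = decode_k_expression("Q*+-abcd")
print(evaluate(tree, {"a": 1.0, "b": 3.0, "c": 5.0, "d": 1.0}))  # prints 4.0
```

In a forecasting setting the terminals would stand for model inputs such as lagged sea level anomalies, and the evolutionary operators would act on the linear gene rather than on the tree itself.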

Only a few applications of GEP can be classified into the field of water and ocean engineering (e.g., Gaur and Deo, 2008, Ustoorikar and Deo, 2008). Drecourt (1999), Savic et al. (1999), Liong et al. (2002), and Aytek and Alp (2008) applied GP in rainfall–runoff modeling. Harris et al. (2003) used GP to predict velocity in compound channels with vegetated flood plains. Aytek and Kisi (2008) applied GP in suspended sediments and observed that GP is better than the conventional rating curve and multilinear regression techniques. Gaur and Deo (2008) applied GP in real-time wave forecasting.

Thus, the main focus of the present study is to predict Caspian Sea level anomalies using altimetric measurements from the TOPEX/Poseidon (T/P), Jason-1 (J-1), and Jason-2/OSTM (J-2) satellite missions from June 1992 to December 2013 by following new approaches, namely, SVMs and GEP, which have not been applied for satellite-based sea level analysis. Then, the proposed models are compared with the ANN-based approach.

Section snippets

Study site and data

The Caspian Sea (Fig. 1a) is the largest inland water body in the world, with a mean salinity of ~13 ppt. It is located in a depression (latitude, 36° to 47° N; longitude, 47° to 54° E) bordered by the Caucasus Mountains to the west, the Central Asian plateau and desert to the east, the Russian and Kazakh plains to the north, and the Elburz Mountains to the south (Kostianoy and Kosarev, 2005). The unique features of the Caspian Sea, such as the size, depth, chemical components, and peculiarities of

Support vector regression

SVM is an advanced neural network technology based on statistical learning (Vapnik, 1992, Vapnik, 1999). Compared with that of other neural network structures, the use of SVMs to estimate the regression function has three different characteristics. First, SVM estimation is performed through a set of linear functions, which are defined in a high-dimensional space. Second, SVM regression estimation is conducted by risk minimization using Vapnik's ε-insensitive loss function. Finally, SVMs use a
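The ε-insensitive loss named in this snippet can be written out in a few lines. This is a sketch of the standard textbook form, in which errors smaller than ε cost nothing and larger errors are penalized linearly; the value of ε and the sample numbers are hypothetical, since the excerpt does not report the settings used.

```python
def epsilon_insensitive_loss(observed, predicted, epsilon):
    """Zero inside the epsilon tube, linear in the absolute error outside it."""
    return max(0.0, abs(observed - predicted) - epsilon)


def mean_epsilon_insensitive_loss(observed_series, predicted_series, epsilon):
    """Average of the pointwise loss over two equally long sequences."""
    losses = [
        epsilon_insensitive_loss(o, p, epsilon)
        for o, p in zip(observed_series, predicted_series)
    ]
    return sum(losses) / len(losses)


# Hypothetical values: the first error (0.05) falls inside the tube, the second does not.
print(epsilon_insensitive_loss(1.0, 1.05, epsilon=0.1))
print(epsilon_insensitive_loss(1.0, 1.30, epsilon=0.1))
```

Because points inside the tube contribute no loss, only the points on or outside it end up as support vectors, which is the source of the sparsity mentioned in the Introduction.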

Results and discussion

The present study aims to represent 5 and 15 ~ 10-day ahead forecasting of Caspian Sea level anomalies using SVR and GEP models, the results of which are compared with those of a neural network-based model, namely, the cascade correlation neural network (CCNN). Several input combinations are constructed based on the autocorrelation function (ACF) and partial autocorrelation function (PACF) analysis (Fig. 6) (Shiri and Kisi, 2011, Kisi et al., 2012). The ACF and PACF are measures of association
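As a rough illustration of how lagged inputs can be derived from an autocorrelation analysis, the sketch below computes a sample ACF and assembles input/target pairs from chosen lags. The PACF is omitted for brevity, and the lag choices, horizon, and series values are hypothetical; the excerpt does not list the input combinations that were actually used.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a series at a non-negative integer lag."""
    n = len(series)
    mean = sum(series) / n
    denominator = sum((x - mean) ** 2 for x in series)
    numerator = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return numerator / denominator


def make_lagged_samples(series, lags, horizon=1):
    """Pair each target value with the earlier values at the given lags.

    With horizon=1 the target at index t is predicted from series[t - lag];
    larger horizons shift the target further ahead.
    """
    max_lag = max(lags)
    inputs, targets = [], []
    for t in range(max_lag, len(series) - horizon + 1):
        inputs.append([series[t - lag] for lag in lags])
        targets.append(series[t + horizon - 1])
    return inputs, targets


# Hypothetical anomaly series, one value per altimetry cycle.
series = [0.02, 0.05, 0.07, 0.06, 0.03, 0.01, 0.04, 0.08]
for lag in (1, 2, 3):
    print(lag, autocorrelation(series, lag))
inputs, targets = make_lagged_samples(series, lags=[1, 2], horizon=1)
```

Lags with a large autocorrelation are natural candidates for model inputs, and the resulting pairs can be fed to any regression model.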

Conclusion

The SVR and GEP techniques in forecasting short-term Caspian Sea level changes are investigated in the present study. Although the performance of both models is superior to that of the CCNN approach, the intercomparison of the obtained results shows that the SVR model outperforms the GEP model in sea level forecasting. The RBF kernel is used in SVM model development because its performance is better than that of the other kernels for the current time series data. The overall results show the
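For readers unfamiliar with the RBF kernel, the following sketch shows its common Gaussian form and how a trained support vector regressor combines kernel values into a prediction. The parameterization by gamma, the support vectors, the dual coefficients, and the bias are all hypothetical placeholders; the excerpt does not report the fitted model or its kernel parameters.

```python
import math


def rbf_kernel(x, y, gamma):
    """Gaussian RBF kernel: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def svr_predict(support_vectors, dual_coefficients, bias, x, gamma):
    """Standard SVR decision function: weighted sum of kernel values plus bias."""
    weighted = (
        coefficient * rbf_kernel(vector, x, gamma)
        for vector, coefficient in zip(support_vectors, dual_coefficients)
    )
    return sum(weighted) + bias


# Placeholder model with two support vectors over two lagged inputs.
support_vectors = [[0.05, 0.02], [0.07, 0.05]]
dual_coefficients = [0.8, -0.3]
print(svr_predict(support_vectors, dual_coefficients, 0.01, [0.06, 0.04], gamma=10.0))
```

The kernel value is 1 for identical inputs and decays toward 0 as they move apart, so each support vector influences predictions mainly in its own neighborhood.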

Acknowledgments

This research was supported by grants from National Cheng Kung University (Taiwan), the National Science Council of Taiwan (NSC 102-2221-E-006-234 and NSC 101-2221-E-006-180-MY3), and the Headquarters of University Advancement at National Cheng Kung University. Altimeter data products are from AVISO (Archivage, Validation et Interprétation des données des Satellites Océanographiques). We thank the anonymous reviewers for their constructive comments. The figures are prepared using the GMT

References (57)

S.I. Ahmad et al. Performance of stochastic approaches for forecasting river water quality. Water Res. (2001)
A. Aytek et al. A genetic programming approach to suspended sediment modeling. J. Hydrol. (2008)
A. Cazenave et al. Sea level changes in the Mediterranean and Black seas from satellite altimetry. Glob. Planet. Chang. (2002)
V. Cherkassky et al. Practical selection of SVM parameters and noise estimation for SVM regression. Neural Netw. (2004)
J.F. Cretaux et al. SOLS: a lake database to monitor in the Near Real Time water level and storage variations from remote sensing data. Adv. Space Res. (2011)
S. Gaur et al. Real-time wave forecasting using genetic programming. Ocean Eng. (2008)
A. Guisan et al. Predictive habitat distribution models in ecology. Ecol. Model. (2000)
M. Huang et al. Application of artificial neural networks to the prediction of dust storms in Northwest China. Glob. Planet. Chang. (2006)
K.J. Kim. Financial time series forecasting using support vector machines. Neurocomputing (2003)
O. Kisi et al. Forecasting daily lake levels using artificial intelligence approaches. Comput. Geosci. (2012)
I.D. Lins et al. Prediction of sea surface temperature in the tropical Atlantic by support vector machines. Comput. Stat. Data Anal. (2013)
W. Llovel et al. Terrestrial waters and sea level variations on interannual time scale. Glob. Planet. Chang. (2011)
G. Milne et al. Data–model comparison of Holocene sea-level change in the circum-Caribbean region. Glob. Planet. Chang. (2013)
A. More et al. Forecasting wind with neural networks. Mar. Struct. (2003)
A. Ozyavas et al. A possible connection of Caspian Sea level fluctuations with meteorological factors and seismicity. Earth Planet. Sci. Lett. (2010)
S. Rajasekaran et al. Support vector regression methodology for storm surge predictions. Ocean Eng. (2008)
H. Renssen et al. Simulating long-term Caspian Sea level changes: the impact of Holocene and future climate conditions. Earth Planet. Sci. Lett. (2007)
J. Shiri et al. Comparison of genetic programming with neuro-fuzzy systems for predicting short-term water table depth fluctuations. Comput. Geosci. (2011)
M. Talebizadeh et al. Uncertainty analysis for the forecast of lake level fluctuations using ensembles of ANN and ANFIS models. Expert Syst. Appl. (2011)
S. Tripathi et al. Downscaling of precipitation for climate change scenarios: a support vector machine approach. J. Hydrol. (2006)
K. Ustoorikar et al. Filling up gaps in wave data with genetic programming. Mar. Struct. (2008)
C.L. Wu et al. Data-driven models for monthly stream flow time series prediction. Eng. Appl. Artif. Intell. (2010)
American Society of Civil Engineers (ASCE) Task Committee on Application of Artificial Neural Networks in Hydrology. Artificial neural networks in hydrology. II: Hydrological applications. J. Hydrol. Eng. (2000)
A. Aytek et al. An application of artificial intelligence for rainfall runoff modeling. J. Syst. Sci. (2008)
W. Banzhaf et al. Genetic Programming (1998)
N. Cristianini et al. An Introduction to Support Vector Machines: And Other Kernel-Based Learning Methods (2000)
J.P. Drecourt. Application of neural networks and genetic programming to rainfall runoff modeling. D2K Technical Report 0699-1-1 (1999)
S.E. Fahlman et al.

