Elsevier

Science of The Total Environment

Volume 644, 10 December 2018, Pages 954-962

A novel machine learning-based approach for the risk assessment of nitrate groundwater contamination

https://doi.org/10.1016/j.scitotenv.2018.07.054

Highlights

  • Novel risk assessment framework for nitrate groundwater contamination in arid regions.

  • Machine learning (ML) predicting vulnerability, pollution, and occurrence probability.

  • ML allows quick regional evaluation of risk posed by nitrate in groundwater.

Abstract

This study aimed to develop a novel framework for assessing the risk of nitrate contamination of groundwater in an arid region by integrating chemical and statistical analyses. A standard method (DRASTIC) was applied to assess the vulnerability of groundwater to nitrate pollution in the Lenjanat plain, Iran. Nitrate concentration data were collected from 102 wells in the plain and used to produce pollution occurrence and probability maps. Three machine learning models, boosted regression trees (BRT), multivariate discriminant analysis (MDA), and support vector machine (SVM), were used to model the probability of groundwater pollution occurrence. An ensemble modeling approach was then applied to produce the groundwater pollution occurrence probability map. The models were validated using the area under the receiver operating characteristic curve (AUC); models with AUC values above 0.8 (80%) were selected to contribute to the ensemble. The AUC values of the three models ranged from 0.81 to 0.87, so all three models were included in the ensemble. The resulting groundwater pollution risk map (produced from the vulnerability, pollution, and probability maps) indicated that the central regions of the plain have high and very high risk of nitrate pollution, which was further confirmed by the existing land-use map. The findings may provide helpful information for decision making in groundwater pollution risk management, especially in semi-arid regions.
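The model-selection step described above (keep models whose validation AUC exceeds 0.8, then ensemble them) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: AUC is computed with the Mann-Whitney formulation, and the ensemble is assumed to be an unweighted mean of the selected models' per-cell probabilities, since the abstract does not state the combination rule. The function names `auc`, `select_models`, and `ensemble_map` are hypothetical.

```python
from typing import Dict, List, Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the Mann-Whitney probability that a polluted well (label 1)
    scores higher than an unpolluted well (label 0); ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("validation set needs both polluted and unpolluted wells")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def select_models(
    labels: Sequence[int],
    validation_scores: Dict[str, Sequence[float]],
    threshold: float = 0.8,
) -> List[str]:
    """Names of the models whose validation AUC exceeds the threshold."""
    return [name for name, s in validation_scores.items() if auc(labels, s) > threshold]


def ensemble_map(
    map_predictions: Dict[str, Sequence[float]], selected: Sequence[str]
) -> List[float]:
    """Assumed combination rule: unweighted mean of per-cell probabilities."""
    if not selected:
        raise ValueError("no model passed the AUC threshold")
    n_cells = len(map_predictions[selected[0]])
    return [
        sum(map_predictions[m][i] for m in selected) / len(selected)
        for i in range(n_cells)
    ]


if __name__ == "__main__":
    # Toy validation set: 1 = polluted well, 0 = unpolluted well (made-up values).
    labels = [1, 1, 0, 0]
    val = {"BRT": [0.9, 0.7, 0.3, 0.1], "MDA": [0.8, 0.4, 0.6, 0.2]}
    print(select_models(labels, val))  # ['BRT'] (AUC 1.0 vs 0.75)
```

A weighted mean (e.g. weights proportional to AUC) is an equally plausible design; the unweighted mean is used here only because it is the simplest rule consistent with the description.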

Introduction

Groundwater is one of the most valuable natural resources, especially in arid regions, owing to negligible rainfall and scarce surface water resources (Neshat et al., 2014; Choubin and Malekian, 2017). Groundwater provides about 63% of the drinking water for the population of Iran (IMOF, 2014), and it is the sole source of drinking water for some large cities and many rural communities. In 2014, groundwater accounted for about 5% of water withdrawn for public supply in cities and about 6% of water withdrawn by self-supplied systems for domestic use (IMOF, 2014).

A variety of chemicals, including nitrate, can pass through the soil and potentially contaminate groundwater (Hutchins et al., 2018). Beneath agricultural land, nitrate is the primary form of nitrogen. It is soluble in water and can easily pass through the soil to the groundwater table. Nitrate can remain in groundwater for decades and accumulate to high levels as more nitrogen is applied to the land surface every year. Knowing where groundwater is at risk, and from what type of contamination, can help water resource managers protect water supplies.

A number of approaches, including interpolation methods, statistical models, index methods, and process-based models, have been applied to assess the pollution status and vulnerability of groundwater around the world. The first group comprises geostatistical techniques that use interpolation methods, such as kriging (Stigter et al., 2006; Narany et al., 2014), to assess contamination risk in groundwater. These approaches require dense sampling networks and are typically subject to high uncertainty. The second group is based on statistical models such as linear and non-linear regression (Johnson and Belitz, 2009). These methods model pollution through the correlation between pollutant concentration and various causative parameters (McLay et al., 2001). However, correlation does not imply causation, and these models need expert knowledge to make accurate and meaningful predictions. The third group comprises index methods, which assign a weight to each factor, mostly based on expert knowledge. Examples include the susceptibility index (SI) (Van Beynen et al., 2012), the DRASTIC method (Aller et al., 1987; Neshat et al., 2014; Majolagbe et al., 2016), the GOD method (Foster, 1987), and the DRAV model (Zhou et al., 2010). The fourth and most complex group comprises process-based models such as the groundwater flow model MODFLOW (Nobre et al., 2007), the water flow and nitrate transport global model (WNGM) (Bonton et al., 2011; Qin et al., 2013), the pesticide root zone model (PRZM-3) (Fontaine et al., 1992; Akbar et al., 2011), and groundwater loading effects of agricultural management systems (GLEAMS) (Leone et al., 2009; Leonard et al., 1987). The main weaknesses of these models are (i) their need for large amounts of input data (Iqbal et al., 2012) and (ii) their limited applicability at regional scales (Garnier et al., 1998; Anane et al., 2013).
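Index methods such as DRASTIC, which this study uses for its vulnerability map, reduce to a weighted sum of parameter ratings. The sketch below assumes the generic DRASTIC weights of Aller et al. (1987) and ratings of 1 to 10 per parameter; the study's own Eq. (1) and rating tables are not reproduced in this excerpt. The five-class thresholds are the ones the study adopts from Civita and de Regibus (1995) and Martínez-Bastida et al. (2010). Because those published ranges share endpoints (e.g. 80–120 and 120–160), assigning DI = 120, 160, and 200 to the lower class is an assumption. Function names are hypothetical.

```python
from typing import Dict

# Generic DRASTIC weights (Aller et al., 1987): Depth to water, net Recharge,
# Aquifer media, Soil media, Topography, Impact of vadose zone,
# hydraulic Conductivity.
DRASTIC_WEIGHTS: Dict[str, int] = {"D": 5, "R": 4, "A": 3, "S": 2, "T": 1, "I": 5, "C": 3}


def drastic_index(ratings: Dict[str, float]) -> float:
    """DI = sum of weight * rating over the seven parameters (ratings 1-10)."""
    missing = set(DRASTIC_WEIGHTS) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    for key in DRASTIC_WEIGHTS:
        if not 1 <= ratings[key] <= 10:
            raise ValueError(f"rating {key}={ratings[key]} outside 1-10")
    return sum(weight * ratings[key] for key, weight in DRASTIC_WEIGHTS.items())


def vulnerability_class(di: float) -> str:
    """Five vulnerability classes; the shared boundaries 120, 160 and 200 are
    assigned to the lower class (an assumption)."""
    if di < 80:
        return "very low"
    if di <= 120:
        return "low"
    if di <= 160:
        return "moderate"
    if di <= 200:
        return "high"
    return "very high"


if __name__ == "__main__":
    # Made-up ratings for one grid cell.
    cell = {"D": 7, "R": 6, "A": 8, "S": 6, "T": 10, "I": 8, "C": 6}
    di = drastic_index(cell)
    print(di, vulnerability_class(di))  # 163 high
```

With these weights the index ranges from 23 (all ratings 1) to 230 (all ratings 10), so all five classes are reachable.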

Recently, machine learning (ML) and soft computing techniques such as artificial intelligence have been successfully applied to predict hazard and risk in the environmental sciences (Choubin et al., 2017a, Choubin et al., 2017b; Ghorbani Nejad et al., 2017; Choubin et al., 2018b; Singh et al., 2018). However, ML approaches have rarely been applied to groundwater pollution risk assessment, and an integrated framework for groundwater risk assessment is still lacking. This study attempts to fill these gaps. Its main objectives are: (i) to compare the performance of three machine learning models (two algorithms, MDA and BRT, applied here for the first time, and a widely used one, SVM) in mapping groundwater pollution occurrence probability; (ii) to use the ensemble occurrence probability map to assess groundwater pollution risk; and (iii) to propose an integrated framework for groundwater risk assessment.
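Objective (ii) combines the ensemble probability map with the vulnerability and pollution maps into a risk map. This excerpt does not state the combination rule, so the sketch below assumes one: each layer is min-max normalised to [0, 1], the three layers are multiplied cell by cell, and the product is split into five equal-interval classes. The flattened-list raster representation and the names `min_max`, `risk_map`, and `risk_class` are hypothetical.

```python
from typing import List, Sequence

RISK_CLASSES = ["very low", "low", "moderate", "high", "very high"]


def min_max(layer: Sequence[float]) -> List[float]:
    """Rescale a raster layer (flattened to a list of cells) to [0, 1]."""
    lo, hi = min(layer), max(layer)
    if hi == lo:
        return [0.0 for _ in layer]
    return [(v - lo) / (hi - lo) for v in layer]


def risk_map(
    vulnerability: Sequence[float],
    pollution: Sequence[float],
    probability: Sequence[float],
) -> List[float]:
    """Assumed overlay rule: product of the three normalised layers."""
    if not len(vulnerability) == len(pollution) == len(probability):
        raise ValueError("layers must cover the same cells")
    v, p, q = min_max(vulnerability), min_max(pollution), min_max(probability)
    return [a * b * c for a, b, c in zip(v, p, q)]


def risk_class(r: float) -> str:
    """Equal-interval classification of a [0, 1] risk score into five classes."""
    return RISK_CLASSES[min(int(r * 5), 4)]


if __name__ == "__main__":
    # Made-up values for three cells: DRASTIC index, nitrate (mg/L),
    # ensemble occurrence probability.
    risks = risk_map([60, 140, 220], [5, 30, 80], [0.1, 0.5, 0.9])
    print([risk_class(r) for r in risks])  # ['very low', 'very low', 'very high']
```

A product means a cell is rated high only when it scores high on all three layers; an arithmetic mean would be more permissive. Which behaviour matches the study's framework cannot be determined from this excerpt.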

Section snippets

Study area

The study area is the Lenjanat plain in Isfahan province, central Iran, which covers about 1180 km². The plain lies between longitudes 51° 04′ and 51° 41′ E and latitudes 32° 04′ and 32° 31′ N (Fig. 1). It is surrounded by calcareous mountains, and elevations within the plain range from 1631 to 2337 m above sea level. The climate is cold arid. The mean annual precipitation is about 160 mm, based on rainfall data recorded from 1971 to 2017, which mostly

Groundwater vulnerability assessment

The groundwater vulnerability map (Fig. 5) was produced using the DRASTIC model, with the DRASTIC index (DI) obtained through Eq. (1). Following Civita and de Regibus (1995) and Martínez-Bastida et al. (2010), the map was classified into five classes: very low (DI < 80), low (DI = 80–120), moderate (DI = 120–160), high (DI = 160–200), and very high (DI > 200). The east and west of the study area show low and very low vulnerability, whereas the middle areas of the Lenjanat

Conclusion

Groundwater pollution risk assessment is a helpful tool for managing groundwater resources, particularly in arid and semi-arid areas. This study developed a novel framework for assessing groundwater pollution risk based on the ensemble modeling method. Combining the pollution, probability, and vulnerability maps showed that the risk is higher in the central part of the plain. The land-use map confirms that high and very high risk of groundwater

Acknowledgments

We are grateful for Early Career Researcher funding for Sabrina Cipullo as part of the Marie-Curie Innovation Training Network REMEDIATE: Improved decision-making in contaminated land site investigation and risk assessment (European Union’s Horizon 2020 Programme for research, technological development and demonstration, grant agreement No. 643087).

References (70)

  • A. Matzeu et al.

    Methodological approach to assessment of groundwater contamination risk in an agricultural area

    Agric. Water Manag.

    (2017)
  • C.D.A. McLay et al.

    Predicting groundwater nitrate concentrations in a region of mixed agricultural land use: a comparison of three approaches

    Environ. Pollut.

    (2001)
  • R.A. Monserud et al.

    Comparing global vegetation maps with the Kappa statistic

    Ecol. Model.

    (1992)
  • A. Neshat et al.

    Risk assessment of groundwater pollution using Monte Carlo approach in an agricultural region: an example from Kerman Plain, Iran

    Comput. Environ. Urban. Syst.

    (2015)
  • R.C.M. Nobre et al.

    Groundwater vulnerability and risk mapping using GIS, modeling and a fuzzy logic tool

    J. Contam. Hydrol.

    (2007)
  • A. Ozdemir

    Using a binary logistic regression method and GIS for evaluating and mapping the groundwater spring potential in the Sultan Mountains (Aksehir, Turkey)

    J. Hydrol.

    (2011)
  • H.R. Pourghasemi et al.

    Performance assessment of individual and ensemble data-mining techniques for gully erosion modeling

    Sci. Total Environ.

    (2017)
  • R. Qin et al.

    Assessing the impact of natural and anthropogenic activities on groundwater quality in coastal alluvial aquifers of the lower Liaohe River Plain, NE China

    Appl. Geochem.

    (2013)
  • A. Rahman

    A GIS based DRASTIC model for assessing groundwater vulnerability in shallow aquifer in Aligarh, India

    Appl. Geogr.

    (2008)
  • O. Rahmati et al.

    Application of Dempster–Shafer theory, spatial analysis and remote sensing for groundwater potentiality and nitrate pollution analysis in the semi-arid region of Khuzestan, Iran

    Sci. Total Environ.

    (2016)
  • S. Shrestha et al.

    Assessment of groundwater vulnerability and risk to pollution in Kathmandu Valley, Nepal

    Sci. Total Environ.

    (2016)
  • S.K. Singh et al.

    Developing robust arsenic awareness prediction models using machine learning algorithms

    J. Environ. Manag.

    (2018)
  • P.E. Van Beynen et al.

    Comparative study of specific groundwater vulnerability of a karst aquifer in central Florida

    Appl. Geogr.

    (2012)
  • L. Aller et al.

    DRASTIC: a Standardized System to Evaluate Groundwater Pollution Potential Using Hydrogeologic Settings

    (1987)
  • V. Amiri et al.

    Groundwater quality assessment using entropy weighted water quality index (EWQI) in Lenjanat, Iran

    Environ. Earth Sci.

    (2014)
  • M. Anane et al.

    GIS-based DRASTIC, pesticide DRASTIC and the susceptibility index (SI): comparative study for evaluation of pollution potential in the Nabeul-Hammamet shallow aquifer, Tunisia

    Hydrogeol. J.

    (2013)
  • R. Arabgol et al.

    Predicting nitrate concentration and its spatial distribution in groundwater resources using support vector machines (SVMs) model

    Environ. Model. Assess.

    (2016)
  • B. Choubin et al.

    Combined gamma and M-test-based ANN and ARIMA models for groundwater fluctuation forecasting in semiarid regions

    Environ. Earth Sci.

    (2017)
  • B. Choubin et al.

    An ensemble forecast of semi-arid rainfall using large-scale climate predictors

    Meteorol. Appl.

    (2017)
  • B. Choubin et al.

    Watershed classification by remote sensing indices: a fuzzy c-means clustering approach

    J. Mt. Sci.

    (2017)
  • B. Choubin et al.

    Precipitation forecasting using classification and regression trees (CART) model: a comparative study of different approaches

    Environ. Earth Sci.

    (2018)
  • M. Civita et al.

    Sperimentazione di alcune metodologie per la valutazione della vulnerabilità degli acquiferi. Atti 2° Conv. Naz

    (1995)
  • C. Cortes et al.

    Support-vector networks

    Mach. Learn.

    (1995)
  • A. Dewan

    Floods in a Megacity: Geospatial Techniques in Assessing Hazards, Risk and Vulnerability

    (2013)
  • T.G. Dietterich

    Ensemble methods in machine learning
