A new implementation of stacked generalisation approach for modelling arsenic concentration in multiple water sources

  • Original Paper

International Journal of Environmental Science and Technology

Abstract

This study proposes a machine learning model based on stacked generalisation for predicting arsenic concentration in water sources (groundwater, surface water and drinking water) from four physicochemical water parameters: turbidity, pH, electrical conductivity and total suspended solids. In the proposed approach, random forest and decision tree models were stacked as base regressors in the first layer, and extreme gradient boosting was employed as the meta-regressor in the second layer to compute the final predictions. The approach was assessed using statistical metrics (RMSE, MAPE, NSE and R²) and diagnostic plots of observed versus predicted arsenic concentration. The stacked approach generalised better than standalone decision tree, random forest, extreme gradient boosting, generalised regression neural network, light gradient boosting, multi-layer perceptron and multivariate adaptive regression splines models, as well as other stacked variants. It outperformed all comparative models, achieving the lowest RMSE (8.041 × 10⁻⁴) and MAPE (0.4689) and the highest NSE (0.9778) and R² (0.9787). The results also indicate that the performance of the stacked model is highly sensitive to the choice of base learners: base learners with stronger predictive ability can lead to a better-performing overall stacking model. Hence, the proposed approach could be instrumental in predicting arsenic concentration in water sources.
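The two-layer architecture and the evaluation metrics summarised in the abstract can be illustrated with a small, standard-library-only sketch. This is not the authors' implementation: to keep it self-contained, a decision stump (depth-1 tree), a bagged ensemble of stumps, and a convex-weight combiner are assumed stand-ins for the paper's decision tree, random forest and extreme gradient boosting meta-regressor. The core stacking idea is kept: the meta-learner is trained on out-of-fold base predictions, and the base learners are then refitted on all training rows. The metric definitions are also assumptions, since the abstract names RMSE, MAPE, NSE and R² without defining them (here MAPE is returned as a fraction and R² is the squared Pearson correlation). The toy data are synthetic, not the study's measurements.

```python
import math
import random
from statistics import mean

# ---------------- Evaluation metrics (definitions assumed) ----------------
def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(mean((o - p) ** 2 for o, p in zip(obs, pred)))

def mape(obs, pred):
    """Mean absolute percentage error as a fraction (x100 for %); needs obs != 0."""
    return mean(abs((o - p) / o) for o, p in zip(obs, pred))

def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 - SSE / sum of squared deviations of obs."""
    mo = mean(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    return 1.0 - sse / sum((o - mo) ** 2 for o in obs)

def r2(obs, pred):
    """Squared Pearson correlation between observed and predicted values."""
    mo, mp = mean(obs), mean(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_p = sum((p - mp) ** 2 for p in pred)
    return cov ** 2 / (var_o * var_p)

# ------ Layer 1: simplified stand-ins for the decision tree and random forest ------
def fit_stump(X, y, features=None):
    """Depth-1 regression tree: the single split minimising summed squared error."""
    best = None
    for j in (features if features is not None else range(len(X[0]))):
        for t in sorted({row[j] for row in X})[:-1]:  # keeps both sides non-empty
            left = [v for row, v in zip(X, y) if row[j] <= t]
            right = [v for row, v in zip(X, y) if row[j] > t]
            ml, mr = mean(left), mean(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, ml, mr)
    if best is None:  # no usable split (e.g. constant feature): predict the mean
        m = mean(y)
        return lambda row: m
    _, j, t, ml, mr = best
    return lambda row: ml if row[j] <= t else mr

def fit_forest(X, y, n_trees=25, seed=0):
    """Bagged stumps: each fit on a bootstrap sample, restricted to one random feature."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], [rng.randrange(p)]))
    return lambda row: mean(tree(row) for tree in trees)

# ------ Layer 2: simplified stand-in for the gradient boosting meta-regressor ------
def fit_convex_meta(Z, y, steps=20):
    """Pick w in [0, 1] minimising the MSE of w*z1 + (1 - w)*z2 (grid search)."""
    def loss(w):
        return mean((w * z[0] + (1 - w) * z[1] - v) ** 2 for z, v in zip(Z, y))
    w = min((i / steps for i in range(steps + 1)), key=loss)
    return lambda z: w * z[0] + (1 - w) * z[1]

def fit_stacked(X, y, k=5, seed=0):
    """Two-layer stacked generalisation trained on k-fold out-of-fold predictions."""
    bases = [fit_stump, fit_forest]
    order = list(range(len(X)))
    random.Random(seed).shuffle(order)
    Z = [None] * len(X)
    for fold in (order[i::k] for i in range(k)):
        held = set(fold)
        train = [i for i in range(len(X)) if i not in held]
        models = [fit([X[i] for i in train], [y[i] for i in train]) for fit in bases]
        for i in fold:  # base predictions for rows these models did not see
            Z[i] = [m(X[i]) for m in models]
    meta = fit_convex_meta(Z, y)
    final = [fit(X, y) for fit in bases]  # refit the base learners on all rows
    return lambda row: meta([m(row) for m in final])

# Toy usage on synthetic data (not the study's data): y steps from 0 to 1 at x = 10.
X = [[float(i)] for i in range(20)]
y = [0.0 if i < 10 else 1.0 for i in range(20)]
model = fit_stacked(X, y)
preds = [model(row) for row in X]
print(f"RMSE={rmse(y, preds):.4f}  NSE={nse(y, preds):.4f}  R2={r2(y, preds):.4f}")
```

The convex-weight combiner is a deliberately conservative choice: it keeps each stacked prediction within the range of the base predictions and avoids the extreme weights an unconstrained linear combiner can produce when the base predictions are highly correlated. In a fuller implementation, scikit-learn's `StackingRegressor` (with `RandomForestRegressor` and `DecisionTreeRegressor` as `estimators` and an `xgboost.XGBRegressor` as `final_estimator`) follows the same pattern: it trains the final estimator on cross-validated predictions of the base estimators.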

Figs. 1–11 (images not included)

Data availability

Data generated or analysed during the study are available from the corresponding author on request.

Acknowledgements

The authors acknowledge the technical support provided by the laboratory staff of the University of Mines and Technology, Ghana, with data acquisition and analysis. This work was performed in part at the Geological Engineering Department, University of Mines and Technology, Ghana.

Author information

Corresponding author

Correspondence to I. Ahenkorah.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Editorial responsibility: S. Mirkia.

Supplementary Information

Electronic supplementary material:

Supplementary file 1 (DOCX, 29 kB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ibrahim, B., Ewusi, A., Ziggah, Y.Y. et al. A new implementation of stacked generalisation approach for modelling arsenic concentration in multiple water sources. Int. J. Environ. Sci. Technol. 21, 5035–5052 (2024). https://doi.org/10.1007/s13762-023-05343-4

  • DOI: https://doi.org/10.1007/s13762-023-05343-4
