Abstract
Data-driven models are important for predicting groundwater quality, which directly affects human health. The water quality index (WQI) was developed from the physicochemical parameters of water samples. In the study area, water quality is medium to poor and is associated with saline zones; very high pH values directly affect the water quality. Conventional WQI computation is time-consuming and is often subject to large errors in the calculation of sub-indices. In the present work, four standalone methods, namely additive regression (AR), the M5P tree model (M5P), random subspace (RSS), and support vector machine (SVM), were employed to predict WQI using a variable elimination technique. The groundwater samples were collected from the Akot basin area, located in the Akola district, Maharashtra, India. A total of nine different input combinations were developed in this study. The datasets were divided into two subsets (ratio 80:20) for model construction (training dataset) and model verification (testing dataset) using a fivefold cross-validation approach. The models were assessed using statistical and graphical appraisal metrics. The best input combinations varied among the models; in general, the optimal input variables during the training and validation stages were EC, pH, TDS, Ca, Mg, and Cl. Results show that AR outperformed the other data-driven models (R² = 0.9993, MAE = 0.5243, RMSE = 0.6356, RAE = 3.8449%, and RRSE = 3.9925%). AR is proposed as an ideal model with satisfactory results because it offers enhanced prediction precision with the minimum number of input parameters, and it can thus act as a reliable and precise method for predicting WQI in the Akot basin.
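The five appraisal metrics named above can be computed from paired observed and predicted WQI values. The following is a minimal sketch, not the authors' implementation: the function name `regression_metrics` is hypothetical, R² is assumed here to be the coefficient of determination (the study may instead report a squared correlation coefficient), and RAE and RRSE are assumed to follow their usual definitions relative to a mean-only predictor.

```python
import math


def regression_metrics(observed, predicted):
    """Return R2, MAE, RMSE, RAE (%) and RRSE (%) for paired value sequences.

    Hypothetical helper for illustration; the definitions are assumptions,
    not taken from the study.
    """
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("observed and predicted must be non-empty and equal in length")

    n = len(observed)
    mean_obs = sum(observed) / n

    # Errors of the model against the observations.
    sum_abs_err = sum(abs(o - p) for o, p in zip(observed, predicted))
    sum_sq_err = sum((o - p) ** 2 for o, p in zip(observed, predicted))

    # Errors of a baseline that always predicts the observed mean.
    sum_abs_dev = sum(abs(o - mean_obs) for o in observed)
    sum_sq_dev = sum((o - mean_obs) ** 2 for o in observed)
    if sum_sq_dev == 0:
        raise ValueError("observed values are constant; relative metrics are undefined")

    return {
        "R2": 1.0 - sum_sq_err / sum_sq_dev,      # assumed: coefficient of determination
        "MAE": sum_abs_err / n,                   # mean absolute error
        "RMSE": math.sqrt(sum_sq_err / n),        # root mean square error
        "RAE%": 100.0 * sum_abs_err / sum_abs_dev,             # relative absolute error
        "RRSE%": 100.0 * math.sqrt(sum_sq_err / sum_sq_dev),   # root relative squared error
    }


# Illustrative values only; these are not data from the study.
example = regression_metrics([10, 20, 30, 40], [12, 18, 33, 39])
```

Because RAE and RRSE are normalised by the error of a mean-only baseline, values well below 100% indicate that a model improves substantially on simply predicting the average WQI.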
Data availability
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.
Author information
Contributions
Ahmed Elbeltagi: methodology, development of ML models, validation, formal analysis, and writing (review and editing)
Chaitanya B. Pande: methodology, original draft writing, writing and editing, plotting, supervision, data collection and analysis for modeling purposes, and investigation
Saber Kouadri: writing the Results section and development of graphs
Abu Reza Md. Towfiqul Islam: writing (review and editing)
Ethics declarations
Ethics approval
Not applicable.
Consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Additional information
Responsible Editor: Xianliang Yi
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
ESM 1
(DOCX 15 kb)
About this article
Cite this article
Elbeltagi, A., Pande, C.B., Kouadri, S. et al. Applications of various data-driven models for the prediction of groundwater quality index in the Akot basin, Maharashtra, India. Environ Sci Pollut Res 29, 17591–17605 (2022). https://doi.org/10.1007/s11356-021-17064-7
DOI: https://doi.org/10.1007/s11356-021-17064-7