Introduction

In the modern era, human civilization is on the edge of a global water catastrophe and faces serious freshwater pollution. In India, around 600 million people are at risk of water stress, and nearly two lakh Indians die annually because potable water is unavailable (WRI 2019; Matta 2020). The River Ganga is one of the major freshwater resources, flowing towards the north-east region of India and sustaining millions of inhabitants (MOEF 2009).

Biodegradable solid waste such as agricultural waste, food waste, and human and animal faecal matter constitutes the organic matter in river water (Finnveden et al. 2009; Stoate et al. 2009). The nutrient (particularly phosphorus and nitrogen) loading in surface water bodies (e.g. rivers, lakes) is mainly due to the disposal of municipal sewage and agricultural wastewater (Tiemeyer et al. 2006; Kumar et al. 2020; Singh et al. 2021), resulting in eutrophication of the river (Matta et al. 2020a, 2020b; Kumar et al. 2021c).

Identifying the critical sources of pollution is an essential task for researchers who manage the available water resources and implement pollution remedies for surface water bodies such as rivers and canals (Khan et al. 2017; Matta et al. 2018a). The quality of freshwater at any particular location in a river system is governed by many factors, such as the lithology of the river system, anthropogenic activities, atmospheric changes, and ecological and climatic conditions (Matta and Kumar 2017a). Climate change is likely to influence the availability of freshwater resources significantly. A study based on 100-year temperature and rainfall datasets found a declining rainfall trend from 1970 to 2011 in the Uttarakhand region, and this decline has become steeper. Besides, the average rate of reduction in annual total rainfall was found to be insignificant, yet it may still lead to significant stress on the state's water resources (Mishra et al. 2015; Matta et al. 2018b; Kumar et al. 2021b). Another long-term appraisal of a 49-year (1971–2020) dataset of water quality (WQ) parameters indicated a slight contamination level, and a decreasing trend in pH and dissolved oxygen content in downstream areas indicated that the river WQ may decline if the same scenario is repeated in the future (Kumar et al. 2021).

At present, changes in land-use patterns, cropping systems, and agricultural practices in terms of irrigation and drainage, together with the unsustainable use of resources, are affecting the hydrological cycle of India's various regions and river systems (Gautam and Singh 2015; Matta and Kumar 2017b). Over the last three decades, various agencies have analysed the river's water quality and planned measures to maintain it. Under the Environment Protection Act, 1986, a prominent organization called the “National Ganga River Basin Authority (NGRBA)” was constituted in 2009 to oversee all activities to conserve the aesthetic value of the River Ganga and its tributaries (Kumar et al. 2020). The declining quality of river water has drawn the attention of many investigators to determining the potential sources of pollution (Tyagi et al. 2013a, b; Bhardwaj et al. 2010; Kumar et al. 2017, 2020). In this regard, various developmental activities (i.e. construction of roads and dams in the upstream region of the Himalayas) and the discharge of agricultural wastes into the river have been reported as critical sources of contamination (Matta et al. 2018c, 2020c).

Characterizing the properties of a water system accurately and comprehensively is a major challenge. In recent decades, many water quality indices (WQIs) have been developed and used towards this goal (Tyagi et al. 2013a; Trikoilidou et al. 2017). In this study, two different WQIs (the WQI by the arithmetic mean method and the Overall Index of Pollution: OIP) are compared and discussed to portray the overall river water quality for human consumption. The WQI by the arithmetic mean method is suitable because it is useful for rating water quality for any intended use as well as for pollution abatement, whereas the OIP is used largely under Indian conditions (Trikey et al. 2013).

Thus, the depletion and pollution of river water are severe, and the main focus of the investigation was to identify the probable factors controlling the physicochemical properties of the water and to assess the suitability of its quality for drinking and other life-supporting activities. In this study, two WQIs, viz. the WQI by the arithmetic mean method and the overall index of pollution (OIP), and environmetric techniques like principal component analysis (PCA) and cluster analysis (CA) are applied to categorize the WQ of the Ganga river into different quality classes and to identify the sources of pollution. The study could help environmentalists, policy planners, and managers take the further steps necessary to conserve the Ganga River's aesthetic values.

Materials and methods

Study area

The catchment area of the River Ganga System in the North Indian states of Uttarakhand and Uttar Pradesh is about 2,94,364 sq. km. The current study area covers a stretch of about 316 km in the origin state of the River Ganga, from sampling site-1 to sampling site-20 (Fig. 1). The upstream site (Gangotri) is a valley-type glacier (NRCD 2009) located in the Uttarkashi District of Uttarakhand. At the downstream site (Roorkee), the river flows through a canal system that starts from Haridwar; up to this site, the river receives various types of pollutants, ranging from untreated sewage discharge and industrial effluents to wastewater from commercial complexes and solid and liquid waste from significant development and tourist activities (Sharma et al. 2015; Kumar et al. 2021).

Fig. 1 Location map of the study area with monitoring stations of the Ganga River System

Sampling method

For the study, a total of 80 samples were collected in triplicate (2016–2017) from twenty different sites (Fig. 1 and Table 1) in four different seasons, with an overall observed variation in temperature from 0 to 46 °C. In the pre-monsoon (summer) and monsoon seasons, the upstream, midstream, and downstream temperatures were reported between 20–25 °C, 25–27 °C, and 27 to > 30 °C, respectively. Similarly, during post-monsoon (winter), the temperatures at upstream, midstream, and downstream were reported between 13–15 °C, 15–18 °C, and 18 to > 20 °C, respectively (CWC 2012). Geo-coordinates of the sampling locations were recorded using a GPSMAP (GARMIN) device, model no. 60CSx (made in Taiwan). Water samples were collected in pre-acid-washed Nalgene Wide-Mouth Natural HDPE 1,000 ml bottles. WQ parameters such as conductivity (cond.), temperature (temp.), and dissolved oxygen (DO) were measured on-site using portable multi-parameter instruments, model nos. HQ40D and DR1900 (HACH). For other WQ parameters (e.g. TS, TDS, TSS, BOD, COD, Cl, P, TKN, hardness, acidity, alkalinity, sulphate, etc.), the river water samples were tested in the laboratory following the standard methods (APHA 2012). For sulphate and phosphate measurements, a colourimetric analysis was carried out using a Cary 60 UV spectrophotometer (USEPA 2000; Matta et al. 2018d).

Table 1 Details of sampling sites along the Ganga River System with their geo-coordinates

Water quality index

Numerous water quality indices (WQIs) are available and have recently been applied to large and complex datasets of various river basins to understand the water quality of river systems. In this study, the arithmetic mean method was adopted to rate the water quality on a scale of five classes of probable pollution level.

The WQI is computed as (Eq. 1):

$${\text{WQI}} = \frac{{\mathop \sum \nolimits_{i = 1}^{n} W_{i} q_{i} }}{{\mathop \sum \nolimits_{i = 1}^{n} W_{i} }}$$
(1)

where qi = sub-index or quality rating for the ith parameter.

Wi = unit weight for the ith parameter.

The calculation of WQI involves four steps. The first is the selection of parameters; in this study, 14 of the 19 hydro-chemical variables were selected, because permissible limits for drinking water are not proposed for the remaining ones (WHO 2011; BIS 2012). The second is the computation of the sub-index or quality rating (qi), expressed as (Eq. 2) (Brown et al. 1972):

$$q_{i} = \left\{ {\frac{{\left( {V_{a} - V_{i} } \right)}}{{\left( {V_{s} - V_{i} } \right)}}} \right\} \times 100$$
(2)

qi = sub-index for the ith parameter; Va = actual value present of the ith parameter at a given sampling station.

Vi = ideal value for the ith parameter.

Vs = standard value for the ith parameter.

A quality rating of zero means the complete absence of pollutants, a quality rating of 0 < qi < 100 implies that the pollutants are within the prescribed standard, and qi > 100 implies that the pollutants are above the standard (Ahmad 2014).

The third step is calculating unit weight (Wi) (Eq. 3) for the ith parameter, which is inversely proportional to the standard value of that particular variable.

$$W_{i} = \frac{k}{{S_{i} }}$$
(3)

where Si = standard value for the ith parameter (the same quantity denoted Vs in Eq. 2).

k = proportionality constant, which is calculated as (Eq. 4):

$$k = \frac{1}{{\sum \frac{1}{{S_{i} }}}}$$
(4)

Step four is to categorize computed WQI values into five classes for WQ given as 0–25 is excellent (E); 26–50 is good (G); 51–75 is moderately polluted (M); 76–100 is severely polluted (S), and > 100 is unfit (U) for drinking purposes (Banerjee and Srivastava 2009).
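The four steps above can be illustrated with a minimal Python sketch. It assumes the weighted arithmetic form of Eq. 1 (sum of Wi·qi divided by the sum of Wi) and uses two hypothetical parameters, "A" and "B", whose standard, ideal, and measured values are invented for illustration and are not study data. Assigning fractional WQI values to the class whose upper bound they do not exceed is also an assumption of this sketch.

```python
def quality_rating(actual, ideal, standard):
    """Eq. 2: sub-index q_i for one parameter."""
    return (actual - ideal) / (standard - ideal) * 100.0


def unit_weights(standards):
    """Eqs. 3-4: W_i = k / S_i, with k = 1 / sum(1 / S_i)."""
    k = 1.0 / sum(1.0 / s for s in standards.values())
    return {name: k / s for name, s in standards.items()}


def wqi(measured, ideal, standards):
    """Eq. 1: weighted arithmetic mean of the sub-indices."""
    weights = unit_weights(standards)
    weighted_sum = sum(
        weights[name] * quality_rating(measured[name], ideal[name], standards[name])
        for name in standards
    )
    return weighted_sum / sum(weights.values())


def wqi_class(value):
    """Step four: map a WQI value to its rating class.

    Assumption: a fractional value falls in the class whose upper
    bound it does not exceed (e.g. 25.4 is rated "G").
    """
    if value <= 25:
        return "E"  # excellent
    if value <= 50:
        return "G"  # good
    if value <= 75:
        return "M"  # moderately polluted
    if value <= 100:
        return "S"  # severely polluted
    return "U"      # unfit for drinking


# Hypothetical two-parameter example (illustrative values only).
standards = {"A": 10.0, "B": 40.0}
ideal = {"A": 0.0, "B": 0.0}
measured = {"A": 4.0, "B": 20.0}

score = wqi(measured, ideal, standards)
print(round(score, 2), wqi_class(score))
```

Because the unit weights are inversely proportional to the standard values, the parameter with the stricter standard ("A" here) dominates the index.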

Overall index of pollution

The health condition of the freshwater was assessed using the overall index of pollution (OIP), calculated as the mean of the pollution indices (Pi) of the individual variables and expressed by the following formula (Eq. 5):

$${\text{OIP}} = \frac{{\mathop \sum \nolimits_{i = 1}^{n} P_{i} }}{n}$$
(5)

Pi is estimated by converting the measured concentration of each variable into a numerical value through the mathematical expressions in Table 2, and ‘n’ represents the number of parameters considered. The adopted OIP classification scheme is 0–1 for excellent (class C1), 1–2 for acceptable (class C2), 2–4 for slightly polluted (class C3), 4–8 for polluted (class C4), and 8–16 for heavily polluted (class C5) (Sargaonkar and Deshpandey 2003).
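A minimal Python sketch of Eq. 5 and the classification scheme follows. The function curves of Table 2 are not reproduced, so the sketch takes already-computed Pi values as input, and the values in the example are hypothetical. Treating each class's upper bound as inclusive is an assumption, since the published ranges share their end points.

```python
def oip(pollution_indices):
    """Eq. 5: arithmetic mean of the individual pollution indices P_i."""
    if not pollution_indices:
        raise ValueError("at least one pollution index is required")
    return sum(pollution_indices) / len(pollution_indices)


def oip_class(value):
    """Map an OIP value to classes C1-C5.

    Assumption: upper bounds are inclusive, so an OIP of exactly 2
    is rated C2 rather than C3.
    """
    if value < 0:
        raise ValueError("OIP cannot be negative")
    bounds = [(1, "C1"), (2, "C2"), (4, "C3"), (8, "C4"), (16, "C5")]
    for upper, label in bounds:
        if value <= upper:
            return label
    raise ValueError("OIP above 16 is outside the classification scheme")


# Hypothetical P_i values for six parameters (not study data).
p_values = [1.0, 2.0, 3.0, 4.0, 2.5, 2.5]
index = oip(p_values)
print(index, oip_class(index))
```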

Table 2 Mathematical expressions for function curves for considered parameters in OIP calculation

Environmetrics

The term “environmetrics” is used for multivariate statistical analyses such as PCA, which quantify the significance of the variables that describe the grouped dataset and the patterns in the internal characteristics of the sampling locations. PCA is used to explain the observed variables through a reduced set of orthogonal (uncorrelated) variables. Many researchers have used these methodologies to characterize and appraise freshwater and sediment quality (Sargaonkar and Deshpandey 2003; Mishra et al. 2015; Herojeet et al. 2016). The Kaiser–Meyer–Olkin (KMO) and Bartlett's tests were used to determine whether the dataset was suitable for PCA. Communality values > 0.5 were used to test the variable selection for PCA. To check whether the dataset was normally distributed, further preliminary assumption tests (Kolmogorov–Smirnov: KS and Shapiro–Wilk: SW) were performed (Kumar et al. 2021a).

Principal component analysis (PCA) derives hidden linear relationships within a dataset of variables and relates them to possible effects on hydrochemistry (Osei et al. 2010). PCA provides an analytical procedure whereby an original dataset of factors is reconstructed into a reduced set of new factors. PCA can retain the information related to a particular variable with the least loss of information from the entire dataset (Simeonov et al. 2003). To attain this, the original variables are converted into a new set of uncorrelated factors, the first few of which retain most of the variation present in the extensive dataset of original variables. The data matrix values were standardized, and the analysis was based on the correlation matrix between the variables (Singh et al. 2004), so that the variables are weighted equally and no parameter with different units and higher absolute values dominates the PCA. The PCs are formed successively with decreasing contributions to the variance, i.e. the largest share of the variation in the original data is explained by the first principal component (PC1), and decreasing proportions of the variance are accounted for by successive principal components (Simeonov et al. 2004; Vieira et al. 2012).
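The standardization and correlation-matrix steps described above can be sketched in plain Python. To stay free of external libraries, the sketch extracts only the first principal component by power iteration. This is a swapped-in technique for illustration, not the procedure used in the study, where a statistical package with varimax rotation would normally be used. The three data columns are hypothetical.

```python
import math


def standardize(columns):
    """Z-score each variable: mean 0 and sample standard deviation 1."""
    scaled = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / (n - 1))
        scaled.append([(x - mean) / sd for x in col])
    return scaled


def correlation_matrix(columns):
    """Correlation matrix computed from the standardized variables."""
    z = standardize(columns)
    n = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in z] for zi in z]


def first_component(corr, iterations=500):
    """Power iteration: largest eigenvalue and unit eigenvector of corr.

    The start vector is deliberately asymmetric so that it is unlikely
    to be orthogonal to the dominant eigenvector.
    """
    p = len(corr)
    vec = [1.0 + 0.1 * i for i in range(p)]
    norm = math.sqrt(sum(x * x for x in vec))
    vec = [x / norm for x in vec]
    eigenvalue = 0.0
    for _ in range(iterations):
        product = [sum(row[j] * vec[j] for j in range(p)) for row in corr]
        eigenvalue = math.sqrt(sum(x * x for x in product))
        vec = [x / eigenvalue for x in product]
    return eigenvalue, vec


# Hypothetical measurements of three variables at five locations.
data = [
    [12.0, 15.0, 18.0, 22.0, 27.0],
    [98.0, 104.0, 110.0, 121.0, 130.0],
    [9.1, 8.7, 8.2, 7.4, 6.9],
]
corr = correlation_matrix(data)
eigenvalue, loadings = first_component(corr)
# With a correlation matrix the eigenvalues sum to the number of variables,
# so this ratio is the share of total variance explained by PC1.
print(round(eigenvalue / len(data), 3))
```

Working from the correlation matrix rather than the raw covariances is what keeps variables with large absolute values (such as conductivity) from dominating PC1.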

Results and discussion

Over the course of the study, the seasonal assessment of freshwater samples from the Ganga River System in terms of physicochemical parameters is presented in Table S1 and Table S2. The descriptive statistics of the nineteen parameters for all the monitoring locations throughout the study are presented in Table 3. The seasonally observed variation in the dataset was first compared with the respective acceptable values of the BIS (Bureau of Indian Standards) for drinking purposes. Notably, the recorded turbidity (TU) was above its acceptable limit at all the monitoring locations during the study period, which indicates that the sediment load makes the river water more turbid. The LI (light intensity) value ranged from 321.67 to 5652.85 µmol m−2 s−1, with a mean of 1956.84 µmol m−2 s−1, across all studied locations. The LI helps to indicate the biological and chemical processes in the water body, and an increase in turbidity implies a massive reduction in the light available to phytoplankton (Singh et al. 2005; Lionard et al. 2005; Matta et al. 2020b). The temperature varied from 13 to 29 °C, with a mean value of 20.44 °C. Similar observations were made during a study of a tributary of the River Ganga (Matta et al. 2020a).

Table 3 Descriptive statistics for study area along with standard limits proposed by BIS (2012)

The conductivity, turbidity, and velocity varied from 96.45 to 131.57 µmhos/cm, 16.79 to 376.46 NTU, and 0.56 to 1.23 m/s, respectively, during the study period. In river water, the average concentrations of TS, TSS, and TDS were 579.25, 504.34, and 71.19 mg/L, respectively, representing the sediment load transported from the watersheds during the monsoon season. The maximum amount of total dissolved solids may be due to soil and clay particles (Daphne et al. 2011). The oxygen concentration and its consumption level were observed through COD, DO, and BOD, with average values of 8.33, 4.52, and 6.47 mg/L, respectively, throughout the study period. The observation of free CO2 levels also helped to understand the respiration process of the living planktonic community (Matta and Uniyal 2017). The reported mean concentration of free CO2 was 0.67 mg/L throughout the study period.

The minimum and maximum concentrations of alkalinity (109–223 mg/L), total hardness (112.5–256 mg/L), and acidity (26.51–60.32 mg/L) throughout the study period indicate the discharge of domestic and industrial sewage, which contributes to the accumulation of large quantities of alkaline ions in the river water. Mean nutrient concentrations occur in the order SO4 > Cl > PO₄³⁻ > TKN (Total Kjeldahl Nitrogen). The sulphate concentration ranged between 8.64 and 22.9 mg/L, whereas chloride, phosphate, and TKN concentrations ranged between 5.01 and 13.21 mg/L, 0.23 and 0.9 mg/L, and 0 and 0.1 mg/L, respectively. These nutrients are present naturally in surface water; sulphate and chloride give taste to water, but excessive amounts make it unfit to drink. Phosphate and TKN are essential for the growth of aquatic plants, which provide food and habitat for organisms such as fish and microorganisms (Matta et al. 2015a; McCarthy 2004; Tare et al. 2003).

Multivariate statistical analysis

The KMO test value (0.69, p = 0.00) was close to 1, indicating that the dataset was suitable for PCA. The Q–Q plot and the KS and SW test values (p < 0.05) demonstrated that the data were not normally distributed. The communalities were greater than 0.5 for all of the parameters. As a result, the water quality data from the 20 sampling locations may be summed up into an eight-variable dataset. The variability of the original mean dataset was characterized by a scree plot (Fig. 2), which identified seven variables contributing 85.1% of the cumulative variance in water quality at the 20 sampling locations. Based on the eigenvalue criterion (Pathak and Limaye 2011; EPA 2012), three principal components (PCs) were identified as essential after varimax normalized rotation, as their eigenvalues were higher than one and each explained > 10% of the variability.
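The retention rule described above (eigenvalue higher than one and more than 10% of the variability) can be written as a short Python filter. This is only a sketch: the eigenvalues below are hypothetical, the function name `retain_components` is introduced for illustration, and the rule is applied to plain eigenvalues without the varimax rotation used in the study.

```python
def retain_components(eigenvalues, min_eigenvalue=1.0, min_share=0.10):
    """Keep components whose eigenvalue and variance share both exceed
    the thresholds; returns (component number, eigenvalue, share)."""
    total = sum(eigenvalues)  # equals the number of variables for a correlation matrix
    retained = []
    for number, value in enumerate(eigenvalues, start=1):
        share = value / total
        if value > min_eigenvalue and share > min_share:
            retained.append((number, value, share))
    return retained


# Hypothetical eigenvalues for a nine-variable PCA (not study results).
eigenvalues = [3.4, 1.9, 1.2, 0.9, 0.6, 0.4, 0.3, 0.2, 0.1]
for number, value, share in retain_components(eigenvalues):
    print(f"PC{number}: eigenvalue {value:.2f}, variance share {share:.1%}")
```

Applying both thresholds matters when many variables are analysed, because a component can have an eigenvalue just above one and still explain less than 10% of the total variance.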

Fig. 2 Scree plot of the eigenvalue for each component

The factor weights of the assessed variables for the three PCs are given in Table 4. PC1 explained 29.4% of the total variance and included a significant part of the variables connected to temperature, conductivity, turbidity, Cl, TKN, P, solids, and dissolved oxygen. The variables temperature (0.75), conductivity (0.68), turbidity (0.83), chlorides (0.61), TKN (0.84), and P (0.72) had positive factor weights, whereas solids and DO had negative factor weights in the development of PC1 (Table 4). The loadings of PC1 indicate the consumption of oxygen by dissolved organic matter or pollutants, which could be linked to domestic wastewater, municipal point-source effluents, and agricultural non-point-source runoff (Simeonov 2003; Mishra 2010). On the other hand, PC2 explained 16.7% of the total variance and had a positive weight for free CO2 (0.79) only. Moreover, the variables BOD (0.76) and DO (−0.65) had negative weights, while the weights of TKN (0.76) and sulphate (0.824) were positive in forming PC3, with a total variability of 10.6%. The factor weights of the PCA indicated elevation difference, geogenic input, rainfall runoff from mountainous locations, agricultural waste from urban and semi-urban areas, developmental activities, and biochemical processes as the main sources of these variables (temp., TU, TS, TDS, TSS, DO, BOD, free CO2, Cl, TKN, P and sulphates), which reflect the overall water quality (Fig. 3).

Table 4 Factor matrix obtained by the method of principal components analysis

Fig. 3 Dendrogram of spatial similarities between monitoring stations formulated by CA

Interpretation of WQIs

The WQI values of the 20 selected sampling locations along the riverbank in Uttarakhand state, India, for all seasons are presented in Table 5 and represented graphically in Figure S1. The results show temporal and spatial changes in water quality and reveal a specific trend of fluctuations among the seasons. The index value helped to rate the water quality at different sites; on this basis, site-5 (WQI: 52.25), site-13 (WQI: 51.57), and site-16 (WQI: 56.28) show moderate quality, and the rest of the sites fall in the good quality class. In the seasonal index calculation, water samples collected at Bahadrabad town (sampling site 19) had the highest WQI (71.6) in the monsoon season, signifying moderate pollution of the river water (as per the WQI rating scale). Except for sampling locations 2, 3, 5, 13, 16, 18, and 20, all sampling locations exhibit good to excellent water quality in the winter and post-monsoon seasons.

Table 5 Site-wise water quality indexing

On the other hand, the OIP was also calculated for the different seasons and individual monitoring locations. The average values of six parameters (turbidity, TDS, BOD, hardness, chloride, and sulphate) were used along with their mathematical expressions to estimate the individual pollution indices (Pi) (Table 6). The mean of the Pi values gave the final numerical value, which represents the OIP. The observed OIP values placed the river water quality in class C3 at most of the monitoring sites, indicating slightly polluted conditions, as the OIP ranged from 2 to 4. Only sites 4 and 17 fell in class C2 (OIP: 1–2), representing acceptable conditions (Table 5). The seasonal assessment also shows the degraded water quality of the river in various seasons. Class C2 was observed at seven monitoring locations during summer, whereas six locations were under acceptable conditions during the post-monsoon season. During the winter and monsoon seasons, class C3 was reported at most of the sampling locations. The seasonal OIP is represented graphically in Figure S2. Researchers from different fields have applied the OIP to represent the health of various rivers flowing in India (Shukla et al. 2017; Dhawde et al. 2018).

Table 6 Water quality index (WQI) in different seasons

Policy issues and recommendations

In recent times, India has been facing a scarcity of good-quality water. The availability of clean and fresh water has become a critical concern for the health and hygiene of human beings as well as other living organisms. The River Ganga sustains a major part of India's population by providing water for agricultural, industrial, and domestic activities. The current government is very much concerned and is making considerable efforts to maintain the river's quality, having approved an integrated conservation mission (Namami Gange Programme) in June 2014 with a budget of 20 thousand crore for pollution abatement, conservation, and rejuvenation. The river water quality is affected by the big drains that bring waste into it, developmental activities, changes in nutrient levels, and many other factors, which may have a serious negative influence on human health as well as on the flora and fauna that consume river water directly or indirectly. Reducing these impacts and protecting and conserving river water judiciously is a serious matter of concern for conservators and policy makers. To assess the impacts of changes in river water quality on human lives, different hydrochemical parameters are evaluated for diverse uses, from human consumption to agriculture and from commercial to industrial. Therefore, rapid and cost-effective in situ and laboratory analysis is recommended to determine whether river water is safe or unfit for use. If the quality falls in the polluted category, proper management and restoration steps should be followed to protect humans and the environment from the dumping of debris, effluents, and sewage into the river system. More studies or tools are needed at the river basin scale to monitor the main quality parameters and sources of contamination in river water. Hence, it is suggested that issue-based studies be conducted to resolve the pollution problems. In addition, the critical locations must be assessed by advanced and modern techniques to identify and control the sources of pollution.

Conclusions

The present study on the Ganga River System covered twenty sampling stations and applied a comparison of WQIs together with PCA to categorize water quality into diverse quality classes and to estimate the potential pollution sources that influence the hydrochemistry. The study demonstrates the value of different indices and multivariate statistical techniques for probing and interpreting complex datasets, recognizing the causes of contamination, and understanding variations in water quality to better design action plans for river rejuvenation. PCA identified the chief variables or sources responsible for the variation in water quality. The results clearly show that the critical sources of river water degradation are the discharge of domestic sewage, waste from developmental activities, and agricultural waste at the downstream sites of the River Ganga, along with contamination from local villages. Rejuvenation efforts should reduce livestock activities around the river; otherwise, pollution can affect the human population and all other life forms and worsen socio-economic and environmental disasters like climate change. These findings should be considered in future planning and management of the Ganga River and its tributaries.