This chapter addresses the critical task of measuring poverty, which is indispensable for tackling economic inequality and designing effective policy. It examines two main approaches to poverty measurement: the welfarist approach, which focuses on individual well-being and consumption, and the non-welfarist approach, which prioritizes basic achievements such as education and health. The analysis shows clear progress in reducing global poverty, particularly in Sub-Saharan Africa and South Asia, despite the recent slowdown caused by economic disruptions. Conventional methods of assessing poverty through household surveys are costly and time-consuming, which has led to calls for a data revolution. The chapter proposes using machine learning (ML) algorithms and high-resolution satellite imagery to predict and analyse poverty rates, offering a cost-effective, timely, and detailed alternative. The study integrates a range of geospatial data, including nightlight intensity, NDVI, LST, rainfall, and POI density, to improve the accuracy of poverty estimates. The Random Forest model proves to be the most effective technique, showing superior performance in predicting poverty levels. The chapter also underlines the importance of considering both income-based and multidimensional poverty measures and offers valuable insights to help policymakers pursue poverty reduction more effectively. The analysis highlights the potential of combining ML techniques with geospatial data to improve poverty measurement and support more effective policy decisions, contributing to the global goal of ending poverty by 2030.
AI-Generated
This summary of the content was generated with the help of AI.
Abstract
This chapter focuses on Sustainable Development Goal (SDG) 1, which aims to end poverty by 2030. Although significant progress has been made in poverty reduction, the pace has slowed, especially after the COVID-19 pandemic. As of 2024, 8.9% of the global population lives in extreme poverty, while 23.6% lives in poverty when measured against the poverty line for lower-middle-income countries. South Asia, including India, continues to face serious challenges, especially in accurately measuring poverty. Traditional household surveys, while useful, are often costly, time-consuming, and outdated. To address this gap, this study explores the use of machine learning (ML) techniques that combine geospatial and survey data to improve poverty prediction in India. It incorporates indicators such as nightlight intensity, land temperature, rainfall, vegetation, and points of interest. Among the ML models tested, the Random Forest algorithm produced the most accurate results. Nightlight intensity and point of interest density emerged as the most important predictors. These findings highlight the potential of ML tools to generate faster and more precise poverty estimates at local levels, offering valuable support for targeted policymaking.
Notes
Disclaimer: The presentation of material and details in maps used in this chapter does not imply the expression of any opinion whatsoever on the part of the Publisher or Author concerning the legal status of any country, area or territory or of its authorities, or concerning the delimitation of its borders. The depiction and use of boundaries, geographic names and related data shown on maps and included in lists, tables, documents, and databases in this chapter are not warranted to be error free nor do they necessarily imply official endorsement or acceptance by the Publisher or Author.
4.1 Introduction
Measurement of poverty is essential for comparing economic inequalities and designing effective policies. Ravallion (1994) notes that poverty can be measured in two main ways: ordinally, to track changes over time and between places, or cardinally, to measure its extent across different programs. There are two main approaches to measuring poverty: welfarist and non-welfarist. The welfarist approach, described by Sen (1981), focuses on individual well-being, highlighting the consumption of goods and services. In contrast, the non-welfarist approach prioritizes basic achievements like education and health. Understanding these approaches is important for accurately measuring poverty and designing effective strategies to reduce it, which is a key global goal outlined in Sustainable Development Goal 1 (SDG 1) to end all forms of poverty.
The World Bank global poverty update for March 2024 highlighted that global poverty rates have declined significantly in recent decades. However, progress has slowed recently due to economic disruptions from the COVID-19 pandemic and other conflicts. According to the update, about 8.9% of the world's population still lives in extreme poverty, which is defined as living on less than $2.15 (PPP) per day (Aguilar et al., 2024). This threshold aims to track progress towards reducing extreme poverty below 3% by 2030. Sub-Saharan Africa and South Asia have the highest poverty rates, with Sub-Saharan Africa accounting for over one-third (36.7%) and South Asia for one-fifth (10.6%) of the world's extremely poor. At the international poverty line of $3.65 (PPP), relevant for lower-middle-income countries (LMICs), the global poverty headcount ratio stands at 23.6%, with Sub-Saharan Africa at 62.3% and South Asia at 42.3%. These thresholds are used to gauge progress towards reducing extreme poverty to negligible levels in all its forms by 2030.
Monitoring poverty and identifying its determinants are essential for policymakers and researchers to understand living conditions and develop effective poverty alleviation strategies. Poverty is measured using various indicators: income-based, consumption-based, nutrition-based, anthropological, and the multidimensional poverty index (MPI) (Alkire et al., 2017). Each method has its advantages and limitations. The global commitment to eradicating extreme poverty under SDG 1 also emphasizes the need for timely and accurate poverty data (Jean et al., 2016). However, traditional methods of evaluating poverty, such as census enumeration and household surveys, are costly, time-consuming, and infrequent. For example, censuses are conducted every 10 years in most countries, and household sample surveys every 3–5 years (Blumenstock, 2016). These time lags hinder accurate monitoring of progress (Devarajan, 2013; Njuguna & McSharry, 2017), leading to calls for a data revolution by the UN (IEAG, 2014). This revolution requires more frequent data collection, which traditional methods often cannot support due to high costs (Demombynes & Sandefur, 2014; Jerven, 2017). This is especially challenging in developing countries like India, where poverty is a pressing issue (Blumenstock, 2016).
In India, poverty is primarily measured using a consumption-based approach, which calculates the poverty headcount ratio by comparing consumption expenditure to a defined poverty line. Despite criticism, this method remains fundamental due to its simplicity and long-standing use. Traditionally, poverty data in India are obtained from household consumption surveys conducted by the National Statistics Office (NSO). More recently, multidimensional poverty has been calculated by India's policy think tank, NITI Aayog, using the National Family Health Survey (NFHS). The NFHS surveys are conducted by the International Institute for Population Sciences (IIPS) with support from the Ministry of Health and Family Welfare of the Government of India. However, issues such as outdated methods, infrequent estimates, lack of regional granularity, cost, and timeliness impact the reliability of these surveys (Deaton & Kozel, 2005; Devarajan, 2013).
To address these challenges globally, researchers and policymakers are increasingly using advanced methods like machine learning (ML) algorithms and high-resolution satellite imagery to predict and analyse poverty levels (Jean et al., 2016). These techniques help identify poverty hotspots, track changes over time, and evaluate the impact of policy interventions. By applying ML to satellite images, mobile phone data, and geospatial information, researchers gain valuable insights into how poverty is distributed. These methods offer a cost-effective, timely, and detailed alternative to traditional data collection methods, especially in areas where conventional surveys are hard to carry out. Using these technologies, policymakers can create targeted poverty reduction programs, allocate resources more effectively, and make progress towards ending poverty by 2030. Recent studies also show that ML algorithms can accurately predict poverty using geospatial data. In India, where reliable, high-frequency data are limited, researchers and policymakers are proposing the use of geospatial data as an alternative source for development indicators (Duan et al., 2017; Dugoua et al., 2018; Ghosh et al., 2013; Nischal et al., 2015; Suraj et al., 2018). This underscores the need for comprehensive India-specific studies that apply ML techniques to a variety of geospatial data to enhance poverty predictions.
This chapter is organized into seven sections. After the introduction, the second section reviews related work on the topic. The third section outlines the study’s objectives and research questions. The fourth section describes the study framework and data sources, providing detailed descriptions of the data and machine learning techniques used, as well as limitations of the study. The fifth section presents the results and discussions. The sixth section forecasts poverty levels. The final, seventh section concludes with key findings and policy recommendations.
4.2 Related Work
Over the past two decades, the use of geospatial data combined with machine learning (ML) techniques has significantly advanced methods for predicting poverty. Researchers have utilized satellite data to study various socio-economic aspects, including poverty and economic activity (Asher et al., 2021; Donaldson & Storeygard, 2016; Hodler et al., 2023). Nightlight data, in particular, has proven useful for assessing the impact of natural disasters, policy actions, and conflicts on the economy and poverty (Beyer et al., 2018; Chodorow-Reich et al., 2018; Bundervoet et al., 2015).
Jean et al. (2016) used satellite nightlight data to estimate poverty levels in African countries with high accuracy. In Nepal, Bilton et al. (2017) combined geospatial data with survey data using ML techniques such as random forests and decision trees, achieving 67% accuracy in predicting cluster-level poverty. Tingzon et al. (2019) further advanced this by integrating data from OpenStreetMap and regional indicators, reaching 63% accuracy in poverty predictions. A key study by Henderson et al. (2012) demonstrated that nightlights could effectively augment measures of economic growth for 188 countries, showing a significant correlation between nightlights and GDP. This ‘Henderson elasticity’ has been validated across various contexts. Nightlight data is especially valuable in countries facing challenges in collecting timely data (Chen & Nordhaus, 2011; Hu & Yao, 2018). For example, in China, nightlight-adjusted GDP growth was found to be lower than official estimates (Zhou & Zeng, 2018). Similarly, in India and Angola, national accounts data were more accurate than household survey-based income estimates (Pinkovskiy & Sala-i-Martin, 2016). With the availability of monthly nightlight data, applications to short-term economic growth have also emerged. For instance, Bhadury et al. (2018) used nightlight data to improve the accuracy of nowcasts for India's gross value-added estimates. Machine learning, particularly deep learning, has also been applied to satellite imagery for tasks such as image segmentation and object identification, which aid in predicting poverty-related parameters (Bruzzone & Demir, 2014; Huang et al., 2015; Xie et al., 2016; Abelson et al., 2014).
In India, nightlight data has been employed to estimate economic activity at the district level, where reliable estimates are often lacking. Bhandari and Roychowdhury (2011) demonstrated that nightlights could capture GDP differences at the district level using multinomial regression techniques. Non-linear models yielded better results, as linear models often underestimated urban GDP and overestimated GDP in agriculture-dominated areas. Regional studies using nightlight data have highlighted intra-state divergence, with significant differences observed across districts (Chakravarty & Dehejia, 2017). Chanda and Kabiraj (2018) found evidence of both absolute and conditional convergence in rural areas, but not in urban areas. Researchers have also used nightlight data for about 600,000 villages from 1993 to 2013 (University of Michigan, n.d.) to study rural economic activity and poverty with ML regression algorithms and neural networks, finding that higher nightlight intensity often corresponds to lower poverty rates (Ghosh et al., 2013).
However, studies in India have faced several limitations, such as variations in data quality and resolution, which can impact the accuracy of poverty predictions (Chen & Nordhaus, 2011). Factors like power outages and regional economic differences can distort the relationship between nightlight intensity and income levels (Gibson et al., 2019). Additionally, the use of nightlight data in India has revealed gaps in understanding long-term economic trends and state-specific effects. There is a need for further research to explore other geospatial data and to consider sectoral and regional variations in poverty levels with the application of more sophisticated ML algorithms. While satellite nightlight data combined with ML techniques offers valuable insights, integrating these with other data sources and refining predictive models is crucial for improving accuracy. This study aims to address these gaps by exploring advanced ML techniques and alternative data sources to enhance poverty prediction and policy interventions in India.
4.3 Objectives and Questions
The primary objective of this study is to assess the effectiveness of novel machine learning (ML) techniques in predicting poverty levels using geospatial data in India. By integrating various datasets, the study aims to enhance the accuracy of poverty estimates, providing more timely and precise information essential for effective policymaking and resource allocation.
This study explores several important questions:
How can integration of geospatial and survey data using ML techniques predict poverty more effectively? This involves examining methods and approaches for combining these data types to enhance prediction models.
How can geospatial data contribute to accurate and cost-effective poverty predictions? This question focuses on the value of geospatial data in improving the precision and efficiency of poverty estimates.
How are ML methods used for poverty prediction? This involves reviewing various ML techniques applied to predict poverty and assessing their performance. The ML methods generate a functional relationship between dependent variables like poverty and its determinants.
What are the advantages of using ML techniques in poverty measurement, and what are their potential future uses? This includes evaluating the benefits of ML methods over traditional approaches and exploring their future applications in poverty analysis.
Overall, this study seeks to advance the understanding of how innovative data and ML techniques can refine poverty measurement. The insights gained will be crucial for developing more effective poverty alleviation strategies and policies for developing countries like India.
4.4 Study Framework and Data Sources
The study involves a detailed process of data collection, pre-processing, and integration, followed by the application of several predictive machine learning (ML) models.
4.4.1 Data Sources
This study employed various geospatial data, including nightlight intensity, the Normalized Difference Vegetation Index (NDVI), Land Surface Temperature (LST), rainfall, and Point of Interest (POI) density. These variables serve as proxies for economic activity, urbanization, climate risk, and accessibility to basic services. Additionally, indicators such as Monthly Per Capita Consumption and the Multidimensional Poverty Index are used as proxies for poverty or outcome variables. Table 4.1 and Fig. 4.1 provide an overview of these data sources.
Table 4.1
Data sources and variables
| Variable | Dataset | Source | Proxy |
| --- | --- | --- | --- |
| POI density | OpenStreetMap | Open source | Accessibility to services and economic activities |
| NightLight | VNP46A2: VIIRS Lunar Gap-Filled BRDF Nighttime Lights Daily L3 Global 500 m | NASA LP DAAC at USGS EROS Center | Economic activities |
| Land Surface Temperature (LST) | MOD11A1.061 Terra Land Surface Temperature and Emissivity Daily Global 1 km | NASA LP DAAC at USGS EROS Center | Urbanization and climate risk |
| NDVI | MOD13A1.061 Terra Vegetation Indices 16-Day Global 500 m | NASA LP DAAC at USGS EROS Center | Economic activities and urbanization |
| Rainfall | CHIRPS Pentad: Climate Hazards Group InfraRed Precipitation with Station Data (Version 2.0 Final) | University of California, Santa Barbara | Climate risk |
| Multidimensional Poverty Index (MPI) | National Family Health Survey (NFHS) | IIPS | Poverty |
| Monthly Per Capita Expenditure | Periodic Labour Force Survey (PLFS) | NSO | Poverty |
Fig. 4.1
Study framework
4.4.2 Description of Data Sources
The data, their sources and importance are briefly discussed below.
Point of Interest (POI) Density: POI density data from OpenStreetMap provides information on the accessibility of basic services and economic activities. It reveals the distribution of facilities such as schools, hospitals, and markets, which are crucial for understanding the availability of essential services (Hu et al., 2016; Ye et al., 2019; Nattapong et al., 2022).
NightLight Data: Nightlight data, specifically the VIIRS Lunar Gap-Filled BRDF Nighttime Lights Daily L3 Global 500 m dataset, is used as a proxy for economic activities. The intensity of nighttime lights is indicative of economic development and has been utilized in various studies to estimate poverty levels (Head et al., 2017; Jean et al., 2016).
Land Surface Temperature (LST): The MOD11A1.061 Terra LST dataset provides insights into urbanization and climate risk by measuring the surface temperature of the land. This data is essential for understanding the impact of urban heat islands and climate-related factors on poverty (Weng, 2001; Ruthirako et al., 2015).
Normalized Difference Vegetation Index (NDVI): NDVI data from the MOD13A1.061 Terra Vegetation Indices dataset reflects vegetation health and land use changes. This measure helps in analysing economic activities and urbanization (Leroux et al., 2017; Sruthi et al., 2015).
Rainfall Data: Rainfall data from the CHIRPS Pentad dataset captures climate factors affecting poverty levels. Rainfall can impact economic activities and amplify climate risks (Richardson, 2007; Arzeki & Brückner, 2012).
Monthly Per Capita Consumption Expenditure (MPCE): The MPCE data is sourced from the Periodic Labour Force Survey conducted by the National Sample Survey Organisation, Government of India. The international poverty line for lower-middle-income countries (LMICs) stood at $3.20 (PPP) per day up to 2022. To calculate the poverty rate, this line was applied to household MPCE from the survey after converting the expenditure into US dollars at PPP.
Multidimensional Poverty Index (MPI): The MPI assesses poverty across three dimensions: Health, Education, and Living Standards. It uses ten indicators to measure deprivations and provides a comprehensive view of poverty beyond mere income metrics (Alkire & Santos, 2010; Alkire et al., 2011). The MPI approach highlights disparities in well-being and aids in designing targeted poverty interventions. The data for calculating the MPI is sourced from the National Family Health Survey (NFHS) conducted by IIPS, Mumbai, India.
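The way the MPI aggregates deprivations can be illustrated with a minimal sketch. It assumes the global MPI convention: each of the three dimensions carries a weight of 1/3, split equally among its indicators, a person counts as multidimensionally poor when the weighted deprivation score reaches 1/3, and the index is the product of the headcount ratio H and the average intensity A among the poor. The indicator names, the cutoff constant, and the sample individuals are illustrative assumptions and are not taken from the NFHS data used in the chapter.

```python
from fractions import Fraction

# Assumed indicator weights (global MPI convention): three dimensions at 1/3
# each, divided equally among the indicators within a dimension.
WEIGHTS = {
    "nutrition": Fraction(1, 6), "child_mortality": Fraction(1, 6),
    "years_of_schooling": Fraction(1, 6), "school_attendance": Fraction(1, 6),
    "cooking_fuel": Fraction(1, 18), "sanitation": Fraction(1, 18),
    "drinking_water": Fraction(1, 18), "electricity": Fraction(1, 18),
    "housing": Fraction(1, 18), "assets": Fraction(1, 18),
}
POVERTY_CUTOFF = Fraction(1, 3)  # assumed cutoff for being MPI-poor


def deprivation_score(deprived_in):
    """Weighted sum of the indicators a person is deprived in."""
    return sum((WEIGHTS[name] for name in deprived_in), Fraction(0))


def mpi(people):
    """Return (H, A, MPI) for a list of sets of deprived indicators."""
    scores = [deprivation_score(p) for p in people]
    poor = [s for s in scores if s >= POVERTY_CUTOFF]
    if not poor:
        return Fraction(0), Fraction(0), Fraction(0)
    headcount = Fraction(len(poor), len(scores))   # H: share of poor people
    intensity = sum(poor, Fraction(0)) / len(poor)  # A: mean score of the poor
    return headcount, intensity, headcount * intensity


# Hypothetical individuals, for illustration only.
sample = [
    {"nutrition", "child_mortality"},
    {"electricity"},
    set(),
    {"nutrition", "years_of_schooling", "cooking_fuel"},
]
H, A, index = mpi(sample)
print(float(H), float(A), float(index))
```

Exact fractions are used so that a score of exactly 1/3 is classified consistently, which floating-point sums of 1/18 would not guarantee.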
4.4.3 Data Extraction and Integration
Data Pre-processing: Geospatial data, including nightlights, NDVI (Normalized Difference Vegetation Index), LST (Land Surface Temperature), and rainfall, were extracted using the geemap package in Python. This package allows for overlaying district-level shapefiles of India on raster files to extract data points for each district. The raster data, collected daily for the survey period, were averaged for each district to create vector data. Normalization was performed according to Google Earth Engine's dataset instructions before averaging. For Points of Interest (POI) data from OpenStreetMap, district shapefiles were used to intersect geographical points, calculating POI density by dividing the number of points within a district's boundary by the district's total area (Figs. 4.1 and 4.2).
Fig. 4.2
Data extraction and integration
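The POI density step described above (points falling inside a district boundary, divided by the district's area) can be sketched without any GIS library. This is only an illustration under stated assumptions: district boundaries are simple polygons in planar coordinates (for example kilometres after projection), and the function names, district names, and coordinates are hypothetical. The study itself works with district shapefiles and OpenStreetMap points.

```python
def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def point_in_polygon(point, vertices):
    """Ray-casting test: count boundary crossings of a ray going right."""
    x, y = point
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def poi_density(districts, pois):
    """POIs per unit area for each district polygon."""
    return {
        name: sum(point_in_polygon(p, poly) for p in pois) / polygon_area(poly)
        for name, poly in districts.items()
    }


# Hypothetical rectangular districts and POI locations (planar units).
districts = {
    "District A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "District B": [(10, 0), (30, 0), (30, 10), (10, 10)],
}
pois = [(1, 1), (5, 5), (9, 2), (15, 5)]
print(poi_density(districts, pois))
```

With real boundaries in latitude and longitude, areas would need to be computed in a projected coordinate system before dividing.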
Data Integration or Merging: The geospatial data were merged with the PLFS (Periodic Labour Force Survey) and NFHS (National Family Health Survey) data, as well as district-level poverty headcount data. Poverty headcount data were derived using the international poverty line for PLFS data and the Multidimensional Poverty Index (MPI) for NFHS data. The combined data were used to predict poverty, with PLFS data predicting based on the international poverty line and NFHS data predicting MPI-based poverty (Figs. 4.1 and 4.2).
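As a rough illustration of how a district-level headcount can be derived from household MPCE and then joined to geospatial features, consider the sketch below. The record layout, the survey weights, the PPP conversion factor of 20 rupees per PPP dollar, and the 30-day month are all hypothetical assumptions chosen for the example; only the $3.20 per day line comes from the text.

```python
from collections import defaultdict

POVERTY_LINE_USD_PPP = 3.20  # per person per day, as stated in the text


def district_headcounts(records, ppp_factor, days_per_month=30):
    """Weighted share of people below the poverty line in each district.

    records: dicts with 'district', 'mpce' (local currency per month) and
    'weight' (persons represented). ppp_factor is local currency per PPP
    dollar; its value here is an assumption, not an official figure.
    """
    poor = defaultdict(float)
    total = defaultdict(float)
    for rec in records:
        daily_usd_ppp = rec["mpce"] / ppp_factor / days_per_month
        total[rec["district"]] += rec["weight"]
        if daily_usd_ppp < POVERTY_LINE_USD_PPP:
            poor[rec["district"]] += rec["weight"]
    return {d: poor[d] / total[d] for d in total}


def merge_by_district(features, rates):
    """Inner join of geospatial features and poverty rates on district."""
    return {
        d: {**features[d], "poverty_rate": rates[d]}
        for d in features
        if d in rates
    }


# Hypothetical survey records and geospatial features.
records = [
    {"district": "X", "mpce": 1500, "weight": 4},
    {"district": "X", "mpce": 2500, "weight": 3},
    {"district": "X", "mpce": 1800, "weight": 5},
    {"district": "X", "mpce": 4000, "weight": 8},
    {"district": "Y", "mpce": 5000, "weight": 10},
]
features = {"X": {"nightlight": 3.0}, "Z": {"nightlight": 1.0}}
rates = district_headcounts(records, ppp_factor=20.0)
print(merge_by_district(features, rates))
```

An inner join is used here so that only districts present in both sources reach the modelling stage; districts with missing survey values would need separate handling.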
4.4.4 Analytical Tools
The predictive power of geospatial variables was assessed using various methods: Generalized Least Squares (GLS), Random Forest (RF), and other tree-based algorithms such as Decision Trees, Bagging, Gradient Boosting, and AdaBoost. Neural Networks (NN) were also employed. The model with the highest accuracy was then used to determine variable importance, revealing each variable's contribution to poverty prediction and identifying the most influential geographical variables.
Generalized Least Squares (GLS): GLS is a statistical technique for modelling relationships between variables when error terms may be correlated or have varying variances. It adjusts for these issues, offering more reliable estimates than Ordinary Least Squares (OLS), especially in complex socio-economic data.
Random Forest (RF): RF is an ensemble learning method using multiple decision trees built from random subsets of data. It averages predictions (for regression) or takes a majority vote (for classification) from all trees. RF handles large datasets with many features and is robust to overfitting, making it effective for predicting poverty with diverse geospatial and socio-economic data.
Decision Trees: This technique splits data into subsets based on feature values. Each node represents a decision rule, and branches indicate the outcomes. Decision Trees are easy to interpret and visualize, useful for understanding how variables impact poverty.
Gradient Boosting: An ensemble method that builds models sequentially, correcting errors of previous models. It combines predictions of several weak learners to form a strong model. Gradient Boosting is known for its high accuracy and ability to capture complex patterns, aiding in precise poverty prediction.
Neural Networks (NN): Inspired by the human brain, NN consists of interconnected nodes organized in layers. NN models complex, non-linear relationships and are effective for handling large datasets with intricate patterns, helping to identify subtle interactions affecting poverty.
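The ensemble idea behind Random Forest (many trees fitted on random subsets of rows and features, with predictions averaged) can be shown with a deliberately simplified stand-in: bagged one-split regression trees (stumps). This is not the Random Forest implementation used in the study, and the feature values and poverty rates below are made up for illustration.

```python
import random


def fit_stump(rows, targets, feature_ids):
    """Best single split (feature, threshold) by squared error."""
    best = None
    for f in feature_ids:
        for threshold in sorted({r[f] for r in rows}):
            left = [t for r, t in zip(rows, targets) if r[f] <= threshold]
            right = [t for r, t in zip(rows, targets) if r[f] > threshold]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((t - lm) ** 2 for t in left)
            sse += sum((t - rm) ** 2 for t in right)
            if best is None or sse < best[0]:
                best = (sse, f, threshold, lm, rm)
    if best is None:  # no valid split: always predict the sample mean
        mean = sum(targets) / len(targets)
        return (feature_ids[0], float("inf"), mean, mean)
    return best[1:]


def fit_forest(rows, targets, n_trees=50, seed=0):
    """Fit stumps on bootstrap samples with a random subset of features."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    k = max(1, n_features // 2)  # features considered per tree
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        feats = rng.sample(range(n_features), k)
        forest.append(
            fit_stump([rows[i] for i in idx], [targets[i] for i in idx], feats)
        )
    return forest


def predict(forest, row):
    """Average the predictions of all stumps (regression)."""
    preds = [lm if row[f] <= thr else rm for f, thr, lm, rm in forest]
    return sum(preds) / len(preds)


# Made-up districts: [nightlight intensity, POI density] -> poverty rate.
rows = [[1, 0.1], [2, 0.2], [3, 0.1], [8, 0.9], [9, 0.8], [10, 1.0]]
targets = [0.60, 0.55, 0.50, 0.20, 0.15, 0.10]
forest = fit_forest(rows, targets)
print(predict(forest, [1.5, 0.1]), predict(forest, [9.5, 0.9]))
```

Averaging over many weak, decorrelated trees is what gives the ensemble its stability; a full Random Forest grows deeper trees and resamples candidate features at every split rather than once per tree.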
The data were split into 80% for training and 20% for testing. Performance metrics compared different algorithms (Hu et al., 2022; McBride & Nichols, 2018). Combining survey and geospatial data helped determine the predictive power of geospatial variables and their contribution to poverty prediction.
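The 80/20 split and model comparison can be sketched as follows. The text does not name the performance metrics, so root mean squared error (RMSE) and the coefficient of determination (R²) are used here purely as assumed examples; the helper names and the random seed are likewise hypothetical.

```python
import math
import random


def train_test_split(rows, test_share=0.2, seed=42):
    """Shuffle a copy of the rows and hold out test_share for testing."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_share)
    return shuffled[n_test:], shuffled[:n_test]


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(actual, predicted):
    """Share of variance explained; 1.0 means a perfect fit."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Illustrative use with placeholder district identifiers.
district_ids = list(range(100))
train, test = train_test_split(district_ids)
print(len(train), len(test))
```

Each candidate model would be fitted on the training districts only and scored on the held-out districts, so that the comparison reflects out-of-sample accuracy.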
4.4.5 Limitations
Geospatial Data: Although nightlight intensity is a useful measure of economic activity, the relationship between the two is inconsistent in some geographical areas. Variations in electricity generation, sectoral output, and other factors can influence the intensity differently across regions and times. Additionally, nightlight data are better suited to cross-sectional than to longitudinal predictions. Point of interest data, on the other hand, are continuously updated, so a given extract may not fully reflect the current status of available facilities.
Data Variability: The heterogeneity in data sources and their resolutions can introduce inconsistencies. The normalization and integration process attempts to mitigate these issues but may not completely eliminate them.
Model Dependency: The accuracy of predictions heavily depends on the chosen ML techniques and their implementation. Different algorithms might yield varying results, which can affect the robustness of poverty estimates.
Survey Data Limitations: Survey data, while comprehensive, may have inherent biases and limitations in capturing the full scope of socio-economic conditions, especially in remote or underrepresented areas. In the NSS survey data for some districts in India, the sample size is too small (less than 30) to estimate a robust value. In these cases, the missing values for those districts have been imputed using the K-Nearest Neighbours (KNN) method.
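A simplified version of K-Nearest Neighbours imputation is sketched below: a district with a missing poverty value receives the average of the k most similar districts with observed values. The text does not state which variables define similarity or which k was used, so the Euclidean distance on geospatial features, k = 3, and all sample values are assumptions.

```python
import math


def knn_impute(features, values, k=3):
    """Fill missing values with the mean of the k nearest observed districts.

    features: district -> tuple of numeric features (assumed already scaled)
    values:   district -> float, or None where the estimate is missing
    """
    observed = {d: v for d, v in values.items() if v is not None}
    if not observed:
        raise ValueError("at least one observed value is required")
    imputed = dict(values)
    for district, value in values.items():
        if value is not None:
            continue
        neighbours = sorted(
            observed,
            key=lambda other: math.dist(features[district], features[other]),
        )[:k]
        imputed[district] = sum(observed[n] for n in neighbours) / len(neighbours)
    return imputed


# Hypothetical districts: district E has too small a sample to estimate.
features = {
    "A": (0.0, 0.0), "B": (1.0, 0.0), "C": (0.0, 1.0),
    "D": (10.0, 10.0), "E": (0.5, 0.5),
}
values = {"A": 0.2, "B": 0.3, "C": 0.4, "D": 0.9, "E": None}
print(knn_impute(features, values))
```

Because distances depend on the scale of each feature, the features would normally be standardized before the neighbours are selected.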
4.5 Results and Discussions
This section provides an overview of poverty in India at the national and regional levels using two distinct approaches: the monetary approach, which employs the international poverty line, and the non-monetary approach, which utilizes the Multidimensional Poverty Index (MPI).
4.5.1 Poverty in India: Status and Trends
Poverty in India is a complex and multifaceted issue, measured using various metrics to capture its different dimensions. Over the past few decades, India has made significant progress in reducing poverty, a trend evident in both income-based and multidimensional measures. As discussed earlier, the international poverty line, set at $3.20 per day for lower-middle-income countries (LMICs), is a key benchmark for tracking poverty. This threshold helps identify individuals who struggle to meet basic needs such as food, shelter, and clothing. In 2004–05, approximately 37% of the population lived in poverty; this share declined to 22% in 2011–12 and further to 17% in 2021–22. This reduction underscores India's progress, especially compared to other LMICs, where the average poverty headcount was 46.1% in 2021–22, according to the World Bank. Additionally, the Poverty Clock data indicate that India's extreme poverty dropped below 3% (to 2.4%) in 2022, suggesting the country is on track to achieve the SDG 2030 target of eliminating extreme poverty. The poverty headcount measured using PLFS data is estimated at 41.3% in 2021–22 (Appendix Table 4.4).
Beyond income or consumption-based measures, India also estimates poverty through the Multidimensional Poverty Index. As discussed earlier, this method considers deprivations in health, education, and living standards. According to the Oxford Poverty and Human Development Initiative (OPHI), multidimensional poverty in India decreased from 55.1% in 2005–06 to 27.9% in 2015–16, and further to 16.4% in 2019–21 (see Appendix Table 4.5 for MPI headcounts for 2019–21). In 2005–06, approximately 645 million people were classified as multidimensionally poor, but this number dropped to about 370 million in 2015–16, and further to 230 million in 2019–21. This indicates that around 415 million individuals escaped multidimensional poverty over a span of roughly fifteen years.
In addition, the recent poverty estimates based on the Household Consumption Survey 2022–23 by the National Statistics Office (NSO) and the India Human Development Survey (IHDS) by the National Council of Applied Economic Research (NCAER) also reveal a substantial reduction in poverty over the last decade. According to Rangarajan and Mahendra Dev (2022), India's poverty rate decreased from 21.9% in 2011–12 to 10.8% in 2022–23, using updated poverty lines recommended by the Rangarajan committee. Similarly, SBI Research found a lower poverty rate of around 4.5–5% for 2022–23, based on NSO data but using the updated Tendulkar committee poverty line (Gera, 2024). The IHDS survey indicates that poverty in India, according to the updated Tendulkar committee poverty line, declined from 21.2% in 2011–12 to 8.5% in 2023–24 (Desai et al., 2024). However, these estimates have reignited debates on the methodology of poverty measurement in India, given the long gap between surveys and varying poverty lines.
4.5.2 Regional Level Poverty
District-level poverty headcounts calculated using the international poverty line ($3.20 PPP) and data from the Periodic Labour Force Survey (PLFS) reveal significant regional differences. A higher concentration of poverty is evident in the districts of eastern and central India. Specifically, districts in Bihar, Jharkhand, and Odisha in the east, as well as Chhattisgarh, Madhya Pradesh, and Uttar Pradesh in the central region, exhibit notably higher poverty rates (Map 4.1).
Map 4.1
Spatial distribution of poverty in India at district level using poverty head count.
Source Authors’ calculations from PLFS, 2021–22
The MPI approach also reveals that poverty remains notably high in specific districts of eastern and central India. In the eastern states of Bihar, Jharkhand, and Odisha, the MPI indicates elevated levels of multidimensional poverty, reflecting persistent deprivations beyond income (Map 4.2). Similarly, central Indian states including Madhya Pradesh, Uttar Pradesh, and Chhattisgarh exhibit significant MPI-based poverty, highlighting widespread challenges in accessing essential services and improving living conditions.
Map 4.2
Spatial distribution of poverty in India at district level using MPI approach.
Source Authors’ calculations from NFHS, 2019–21
Several factors contribute to these regional disparities in poverty, including historical and structural factors, agrarian distress, low educational attainment, limited industrialization, and governance and implementation issues.
The eastern and central regions of India have historically lagged behind in terms of industrial development and infrastructure compared to other parts of the country. This underdevelopment has perpetuated a cycle of poverty that is difficult to break (Bhattacharya, 2018).
Agriculture remains the primary source of livelihood in these regions. However, frequent droughts, poor irrigation facilities, and low agricultural productivity exacerbate poverty levels. Studies indicate that areas reliant on rain-fed agriculture are more vulnerable to poverty due to the unpredictability of weather patterns (Chandrasekhar & Mehrotra, 2016).
Lower levels of educational attainment in these regions limit economic opportunities. Education is a critical determinant of economic mobility, and districts with poor educational infrastructure and outcomes tend to have higher poverty rates (Tilak, 2015).
The lack of industrialization and economic diversification means fewer employment opportunities outside of agriculture. Industrial hubs in western and southern India have attracted more investment and job creation, leaving eastern and central regions behind (Saxena, 2018).
Inefficiencies in governance and the implementation of poverty alleviation programs can hinder progress. Effective delivery of social welfare schemes is often weaker in these regions, contributing to persistent poverty (Jha, 2019).
The above analysis of the data on poverty reveals that India has achieved significant progress in poverty reduction through both income and multidimensional measures. The decline in poverty rates, even amidst global challenges such as the COVID-19 pandemic, underscores the effectiveness of the country’s economic and social policies.
4.5.3 Poverty Prediction: Poverty Head Count
(i) Correlation Analysis of Poverty Predictors: A correlation heatmap was employed to analyse the relationships among various variables used in predicting poverty. The results of this analysis offer valuable insights into how these factors interact and influence poverty levels (Fig. 4.3).
Fig. 4.3
Correlation heat map: poverty head count.
Source Authors’ calculations
Nightlight Intensity: Nightlight intensity measures the brightness of nighttime illumination in an area. The analysis revealed a negative correlation between nightlight intensity and poverty, suggesting that regions with higher levels of nightlight generally experience lower poverty. This correlation is attributed to the fact that greater nightlight intensity often indicates higher economic activity and better infrastructure, which are associated with improved living conditions and economic opportunities (Elvidge et al., 2017).
Normalized Difference Vegetation Index (NDVI): The NDVI assesses the health and density of vegetation. A negative correlation with poverty was observed, indicating that higher NDVI values—reflecting lush, healthy vegetation—are often linked to reduced poverty. This relationship suggests that robust vegetation correlates with better agricultural productivity and environmental conditions, which can alleviate poverty (Pettorelli et al., 2014). However, this correlation is not uniform across regions. The states in eastern India like Odisha and West Bengal have high green cover due to favourable rainfall conditions but still face economic disadvantages. This discrepancy highlights the need to integrate NDVI with other socio-economic indicators for a more comprehensive understanding of poverty dynamics (Kumar & Patel, 2021).
Rainfall: Rainfall is a crucial factor for agriculture, a primary livelihood source for many rural inhabitants in India. The analysis shows a negative correlation between rainfall and poverty, indicating that higher rainfall generally correlates with lower poverty levels due to improved agricultural yields (Dube et al., 2021). However, this correlation is nuanced. In regions such as Punjab and Haryana, where extensive irrigation systems and advanced agricultural practices mitigate the impact of lower natural rainfall, poverty rates are lower despite less rainfall. Conversely, eastern regions with higher rainfall often encounter issues like flooding and underdeveloped agricultural infrastructure, which can constrain productivity and limit poverty reduction (Kumar & Verma, 2023).
Point of Interest (POI) Density: POI density, which measures the concentration of essential services like schools, hospitals, and markets, shows a negative correlation with poverty. This suggests that higher POI density provides better access to essential services and economic opportunities, contributing to lower poverty levels. Areas with more POIs typically offer enhanced services and infrastructure, which support economic growth and improve quality of life (Henderson et al., 2018).
Land Surface Temperature (LST): LST, in contrast to other variables, exhibits a positive correlation with poverty. Higher land surface temperatures can signify harsher living conditions and lower agricultural productivity due to heat stress, leading to increased poverty levels. Elevated temperatures can exacerbate climate risks, affecting both economic activities and overall living standards (Zhao et al., 2014).
Further, the analysis of correlations among key independent variables yields insightful findings about their interrelationships and implications for poverty prediction.
Nightlight Intensity and Point of Interest (POI) Density: Nightlight intensity and POI density are found to be highly correlated. Areas with higher nighttime light intensity often exhibit a greater density of points of interest, such as schools, hospitals, and markets. This correlation is logical, as urban and economically vibrant areas, which tend to be better illuminated at night, are likely to host a greater number of essential services and facilities. This relationship underscores how economic activity and infrastructure development, reflected in higher nightlight intensity, are closely associated with the availability of essential services, which can support poverty reduction (Elvidge et al., 2017; Henderson et al., 2018).
Rainfall and Normalized Difference Vegetation Index (NDVI): Rainfall and NDVI are also strongly correlated. Regions with adequate rainfall typically support healthier and denser vegetation, which is captured in higher NDVI values. Higher NDVI often reflects better environmental conditions and robust agricultural productivity, which are critical for reducing poverty. This relationship indicates that regions benefiting from sufficient rainfall are likely to experience improved vegetation health, contributing to better agricultural yields and, consequently, lower poverty levels (Kumar & Verma, 2023; Pettorelli et al., 2014).
This correlation analysis highlights the significance of various geospatial and environmental factors in understanding and predicting poverty levels. Nightlight Intensity, NDVI, Rainfall, and POI Density show a negative correlation with poverty, suggesting that improvements in these areas are associated with reduced poverty. Enhanced nightlight intensity and higher POI density indicate better infrastructure and access to essential services, while higher NDVI and adequate rainfall contribute to better environmental conditions and agricultural productivity. These factors collectively support poverty alleviation efforts by improving living conditions and economic opportunities. In contrast, LST is positively correlated with poverty. Higher land surface temperatures often indicate harsher living conditions and lower agricultural productivity due to heat stress. This correlation emphasizes the need for targeted climate mitigation and adaptation strategies to address the adverse effects of elevated temperatures and support poverty reduction in vulnerable regions (Zhao et al., 2014).
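The correlation heat map described above rests on pairwise correlation coefficients between poverty and each predictor. The following is a minimal sketch of that calculation, assuming Pearson correlation (the chapter does not state which coefficient was used). The variable names and the five district values are hypothetical and serve only to illustrate the expected signs.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return {(name_i, name_j): r} for every ordered pair of named columns."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}


# Hypothetical values for five districts (illustrative only, not the chapter's data).
districts = {
    "poverty":    [0.62, 0.48, 0.35, 0.21, 0.10],  # poverty headcount ratio
    "nightlight": [1.2, 2.5, 4.1, 6.8, 9.3],       # mean nightlight intensity
    "lst":        [34.0, 33.1, 31.5, 30.2, 29.0],  # land surface temperature
}

corr = correlation_matrix(districts)
for (a, b), r in corr.items():
    if a < b:  # print each unordered pair once
        print(f"{a:>10} vs {b:<10} r = {r:+.2f}")
```

In this toy data, poverty falls as nightlight rises and rises with land surface temperature, which matches the signs reported in the chapter. A heat map is simply a coloured rendering of such a matrix.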
(ii) Goodness of Fit: When predicting poverty using machine learning and deep learning techniques, evaluating the goodness of fit is crucial for determining how well a model's predictions align with actual observed values. In this analysis, various methods were tested, and their performance was assessed using the Root Mean Squared Error (RMSE) and R-squared (R²). The Random Forest algorithm emerged as the most effective model, demonstrating superior performance compared to other methods (Figs. 4.4, 4.5 and Table 4.2).
Fig. 4.4
Root mean squared error for different models: poverty head count.
Source Authors’ calculations
Fig. 4.5
Goodness of fit of GLS, NN, RF and RF only: poverty head count.
Source Authors’ calculations
Table 4.2
R-squared value of major models (least squares, neural network, and random forest): poverty head count
| Model | R-squared |
| --- | --- |
| Generalized Least Squares (GLS) | 0.243 |
| Neural Network (NN) | 0.630 |
| Random Forest (RF) | 0.919 |
Source Authors’ calculations
Random Forest (RF) and Other Tree-Based Models: The Random Forest model exhibited the best performance among the tested methods. It achieved the lowest RMSE of 0.074 and the highest R-Squared value of 0.91. This indicates that Random Forest predictions were closest to the actual values, and the model accounted for a significant proportion of the variance in the data. The low RMSE suggests minimal prediction error, while the high R-Squared indicates strong explanatory power. Other tree-based models, including Adaboost, Bagging, and Gradient Boosting, also performed well but had slightly higher RMSE values compared to Random Forest, though their R-Squared values were similarly high (Breiman, 2001; Chen & Guestrin, 2016).
Generalized Least Squares (GLS): The Generalized Least Squares method showed the poorest performance. It recorded the highest RMSE of 0.243 and the lowest R-Squared value of 0.15. This suggests that GLS predictions were the least accurate, with substantial discrepancies between predicted and actual values. The high RMSE and low R-Squared indicate that GLS was unable to effectively capture the variance in the data and provided a less reliable fit compared to other methods (Greene, 2018).
Neural Networks: Neural Networks outperformed GLS but were less effective compared to Random Forest. The RMSE for neural networks was 0.162, and the R-Squared value was 0.63. While these results were an improvement over GLS, they still fell short of the accuracy achieved by Random Forest. Neural networks, despite their complexity and potential for capturing non-linear relationships, did not match the performance of tree-based models in this context (Goodfellow et al., 2016).
This analysis clearly demonstrates that the Random Forest model is the most effective technique for poverty prediction among those tested. Its superior performance, evidenced by the lowest RMSE and highest R-Squared value, makes it the most reliable method for generating accurate predictions. In contrast, GLS exhibited the least accuracy, with significant discrepancies in predictions, while neural networks performed moderately but did not surpass the efficacy of tree-based models. This evaluation highlights the importance of selecting appropriate machine learning techniques for predictive tasks. The Random Forest model’s robustness and precision make it a preferred choice for poverty prediction, while other methods may offer value but with varying degrees of accuracy.
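The two goodness-of-fit metrics used in this comparison can be computed directly from actual and predicted values. The sketch below uses the standard definitions of RMSE and R-squared. The two sets of predictions are hypothetical stand-ins for a well-fitting and a poorly fitting model, not output from the chapter's models.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: square root of the mean squared residual."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical headcount ratios for five districts and two illustrative models.
actual = [0.10, 0.25, 0.40, 0.55, 0.70]
predictions = {
    "close_fit": [0.12, 0.24, 0.41, 0.53, 0.69],
    "loose_fit": [0.30, 0.35, 0.40, 0.45, 0.50],
}

for name, pred in predictions.items():
    print(f"{name}: RMSE = {rmse(actual, pred):.3f}, R2 = {r_squared(actual, pred):.3f}")
```

A lower RMSE and a higher R-squared point in the same direction here, which is the pattern the chapter reports for Random Forest relative to GLS and neural networks.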
(iii) Regional Poverty Prediction Using Random Forest: The Random Forest (RF) model, noted for its superior accuracy and minimal Root Mean Squared Error (RMSE), has been utilized to predict poverty levels across districts. The process began by establishing a benchmark with district-level poverty headcount data from the Periodic Labour Force Survey (PLFS). This data served as the foundation for the RF model, which then generated poverty predictions for each district.
To validate the RF model's predictions, a comparative analysis was performed between the actual poverty headcounts from the PLFS and the predicted values from the RF model. This comparison is visually represented in Map 4.3. Both maps illustrate a similar spatial pattern, confirming that the RF model's predictions closely align with real-world data. This correspondence highlights the model's accuracy and reliability in forecasting poverty levels at the district level.
Map 4.3
Actual and predicted spatial distribution of poverty: poverty head count.
Source Authors’ calculations
Further analysis shows that the distribution of poverty across districts predicted by the RF model is almost identical to the actual distribution, with only minor discrepancies in the 40–60% range of poverty headcounts. This close alignment underscores the effectiveness of the RF model in capturing the spatial distribution of poverty. For policymakers and social scientists, this capability is invaluable as it provides a precise tool for identifying regions in need of targeted interventions and resource allocation (Fig. 4.6).
Fig. 4.6
Districts falling in similar poverty range: poverty head count.
Source Authors’ calculations
This indicates that machine learning techniques like Random Forest offer significant advantages over traditional methods by enabling granular, district-level predictions that are otherwise challenging to achieve. The consistency between the actual and predicted poverty headcounts not only demonstrates the RF model's accuracy but also its practical utility in guiding effective poverty alleviation strategies.
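One way to quantify how closely predicted and actual headcounts agree by poverty range is to assign each district to a range and count the districts whose actual and predicted values fall in the same one. The sketch below assumes five equal-width ranges and uses hypothetical district values. The chapter's figures use their own ranges, which are not reproduced here.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical range boundaries in percent (assumption, not the chapter's bins).
EDGES = [20, 40, 60, 80]
LABELS = ["0-20", "20-40", "40-60", "60-80", "80-100"]


def poverty_range(headcount_pct):
    """Map a headcount percentage to its range label (lower bound inclusive)."""
    return LABELS[bisect_right(EDGES, headcount_pct)]


def range_agreement(actual, predicted):
    """Share of districts whose actual and predicted values fall in the same range."""
    same = sum(poverty_range(a) == poverty_range(p) for a, p in zip(actual, predicted))
    return same / len(actual)


# Hypothetical actual and predicted headcounts (percent) for five districts.
actual = [12.0, 35.5, 41.0, 58.0, 72.3]
predicted = [14.1, 33.0, 38.5, 61.2, 70.0]

print("districts per range (actual):   ", Counter(map(poverty_range, actual)))
print("districts per range (predicted):", Counter(map(poverty_range, predicted)))
print("share in same range:", range_agreement(actual, predicted))
```

Comparing the two counters reproduces the kind of range-by-range comparison shown in the chapter's bar charts, while the agreement share is a stricter, district-by-district check.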
(iv) Variable Importance: To improve the accuracy of poverty prediction, a Variable Importance Analysis was conducted to determine which spatial variables are most influential. This analysis utilized three different models to assess the impact of various spatial variables on poverty predictions: Model 1: All Variables Included—This model includes all spatial variables employed in the analysis. Model 2: Excludes Point of Interest (POI) Density—This model evaluates the impact of excluding POI density on predictive performance. Model 3: Excludes Average Nightlight—This model assesses the effect of removing average nightlight data (Fig. 4.7).
Fig. 4.7
Variable importance from RF: poverty head count.
Source Authors’ calculations
Model 1: All Variables Included: In Model 1, where all variables were considered, POI density emerged as the most influential predictor, whereas nightlight intensity was the least influential. However, the high correlation between these two variables affected their individual importance. Specifically, POI density accounted for 40% of the variability in poverty, while average nightlight explained 9%. The overlap between POI density and nightlight data suggests that much of the information captured by nightlight is also reflected in POI density, which may reduce the distinct contribution of nightlight to the model (Fig. 4.7, Panel 1) (Chen et al., 2020).
Model 2: Excluding POI Density: When POI density was excluded from Model 2, average nightlight became the most significant variable, explaining 33% of the variability in poverty (Fig. 4.7, Panel 2). This shift highlights the substantial role of nightlight data in capturing poverty variation when POI density is not considered. Nightlight data, often associated with economic activity and infrastructure, provides valuable insights into poverty levels (Doll et al., 2008).
Model 3: Excluding Average Nightlight: In Model 3, where average nightlight was excluded, POI density emerged as the predominant predictor, explaining 42% of the variability in poverty (Fig. 4.7, Panel 3). This finding underscores the substantial role of POI density in predicting poverty when nightlight data is not available. POI density reflects the accessibility of essential services and infrastructure, which are critical factors in poverty alleviation (Yao et al., 2023).
This analysis reveals the crucial roles of both nightlight and POI density in predicting poverty. The combined use of these variables enhances model stability and prediction accuracy. For instance, Model 1, incorporating both variables, achieved a lower Root Mean Squared Error (RMSE) of 0.075 compared to 0.089 in models using only one variable. This demonstrates that integrating both nightlight and POI density improves the reliability and comprehensiveness of poverty predictions (Gao et al., 2017). Although POI density is a significant predictor, its limitations, such as periodic updates and potential errors, can affect its reliability. On the other hand, nightlight data provides a more stable and consistent measure of economic activity and infrastructure. Therefore, combining both variables offers a more robust approach to predicting poverty, leveraging the strengths of each data source to achieve more accurate and reliable predictions.
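The effect of correlated predictors on variable importance can be reproduced on synthetic data. The sketch below is an illustration under stated assumptions, not the authors' pipeline: it uses scikit-learn's `RandomForestRegressor` and its impurity-based `feature_importances_`, whereas the chapter does not specify the importance measure or software. The predictors are simulated so that nightlight is a noisy copy of POI density.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 400

# Synthetic, standardized predictors (hypothetical, not the chapter's data).
poi = rng.normal(size=n)
nightlight = poi + 0.3 * rng.normal(size=n)  # strongly correlated with POI density
ndvi = rng.normal(size=n)                    # independent of the other two
poverty = -0.5 * poi - 0.2 * ndvi + 0.1 * rng.normal(size=n)

features = {"poi_density": poi, "nightlight": nightlight, "ndvi": ndvi}


def rf_importance(names):
    """Fit a random forest on the named columns and return their importances."""
    X = np.column_stack([features[name] for name in names])
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    model.fit(X, poverty)
    return dict(zip(names, model.feature_importances_))


imp_all = rf_importance(["poi_density", "nightlight", "ndvi"])  # analogue of Model 1
imp_no_poi = rf_importance(["nightlight", "ndvi"])              # analogue of Model 2
imp_no_nl = rf_importance(["poi_density", "ndvi"])              # analogue of Model 3

for label, imp in [("all", imp_all), ("no POI", imp_no_poi), ("no nightlight", imp_no_nl)]:
    print(label, {k: round(float(v), 2) for k, v in imp.items()})
```

When both correlated predictors are present, the forest splits their shared signal between them, so nightlight looks less important than it does once POI density is removed. This is the same qualitative shift reported across Models 1 to 3.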
4.5.4 Poverty Prediction: Multidimensional Poverty Index (MPI) Poverty
(i) Correlation Analysis of Poverty Predictors: Similar to Sect. 4.5.3, a correlation heat map has been used to examine the relationships among various variables employed to predict Multidimensional Poverty Index (MPI) poverty (Fig. 4.8). The analysis reveals several significant relationships between MPI-based poverty and environmental as well as socio-economic variables.
Fig. 4.8
Correlation heat map: MPI approach.
Source Authors’ calculations
Nightlight Intensity, NDVI, Rainfall, and POI Density: MPI poverty is negatively correlated with nightlight intensity, NDVI, rainfall, and POI density. Taking nightlight first, the brightness of nighttime lights in an area is inversely related to MPI poverty levels: areas with higher nightlight intensity typically have lower levels of MPI poverty. This correlation suggests that increased economic activity and infrastructure, often reflected in brighter nightlights, are associated with improved living conditions and reduced poverty.
Normalized Difference Vegetation Index (NDVI) and Rainfall: The NDVI shows a negative correlation with MPI poverty. Higher NDVI values, indicating lush and healthy vegetation, are linked to better agricultural productivity and environmental conditions, which contribute to lower poverty levels. Rainfall is also negatively correlated with MPI poverty. Adequate and consistent rainfall is crucial for agriculture, a primary livelihood source for many in rural areas, and regions with higher rainfall tend to have lower MPI poverty because sufficient rainfall improves agricultural yields, boosting income levels and reducing poverty.
Point of Interest (PoI) Density: PoI density is negatively correlated with MPI poverty. Higher PoI density indicates better access to crucial services and economic activities, thereby reducing MPI poverty levels.
Land Surface Temperature (LST): Unlike the other variables, LST is positively correlated with MPI poverty. Higher land surface temperatures often indicate harsher living conditions, reduced agricultural productivity due to heat stress, and increased vulnerability to climate risks. These factors contribute to higher levels of MPI poverty.
The correlations observed in the analysis have significant implications for poverty alleviation strategies. The negative correlations with nightlight intensity, NDVI, rainfall, and PoI density suggest that better infrastructure, adequate rainfall, healthier vegetation, and greater access to essential services are associated with lower MPI poverty. Conversely, the positive correlation with LST underscores the need for climate adaptation strategies to mitigate the adverse impacts of higher temperatures on vulnerable populations. Studies have also shown that nighttime light intensity is a strong indicator of economic activity and development. It reflects infrastructure development and economic prosperity, which are crucial for reducing poverty (Henderson et al., 2012).
The health of vegetation, as measured by NDVI, is closely linked to agricultural productivity. Higher NDVI values indicate better crop yields, which are essential for the livelihoods of rural populations (Tucker, 1979). Consistent and adequate rainfall is vital for agricultural productivity, especially in regions dependent on rain-fed agriculture. Improved rainfall patterns can significantly enhance crop yields and reduce poverty (Grove, 1996). Access to essential services such as education, health care, and markets is critical for improving living standards and reducing poverty. Higher PoI density facilitates better access to these services, contributing to poverty reduction (Baker & Grosh, 1994). Higher land surface temperatures can exacerbate living conditions and reduce agricultural productivity, making populations more vulnerable to poverty. Climate adaptation measures are essential to address these challenges (Sivakumar et al., 2005). However, as discussed in the earlier section, these correlations do not hold uniformly across the regions of the country.
(ii) Goodness of Fit: The results of the goodness of fit of different ML models are presented in Figs. 4.9, 4.10, and Table 4.3. Among all the ML techniques, the Random Forest method exhibited the best performance. It had the lowest RMSE of 0.054 and the highest R-Square value of 0.85. These metrics indicate that the predictions made by the Random Forest model were very close to the actual values, demonstrating high accuracy, and that the model explained a significant portion of the variance in the data, highlighting its robustness in capturing the underlying patterns. Other tree-based models, such as AdaBoost, Bagging, and Gradient Boosting, also demonstrated low RMSE and high R-Square values, but their RMSE values were slightly higher than that of Random Forest, indicating that they were accurate but did not predict poverty quite as well.
Fig. 4.9
Root mean squared error for different models: MPI poverty.
Source Authors’ calculations
Fig. 4.10
Goodness of fit of GLS, NN, RF and RF only: MPI poverty.
Source Authors’ calculations
Table 4.3
R-squared of major models (least squares, neural network, and random forest): MPI poverty
| Model | R-squared |
| --- | --- |
| Generalized Least Squares (GLS) | 0.04 |
| Neural Network (NN) | 0.79 |
| Random Forest (RF) | 0.85 |
Source Authors’ calculations
On the other hand, the Generalized Least Squares method showed the poorest performance among the methods tested. Its high RMSE (0.140) reflects a larger discrepancy between the predicted and actual values, indicating poor predictive accuracy, and its low R-Square (0.04) indicates that the model failed to capture much of the variance in the data. Together, these values suggest that GLS is not suitable for generating accurate poverty predictions.
Neural Networks performed better than GLS but were still not as effective as the tree-based models. Their RMSE (0.063) is lower than that of GLS, indicating improved predictive accuracy. Their R-Square (0.79), although higher than that of GLS, is still lower than that of Random Forest, suggesting that Neural Networks were less effective in capturing the variance in the data.
Random Forest is the most effective method for predicting MPI poverty among the ML techniques tested. Its superior performance, evidenced by the lowest RMSE and highest R-Square, makes it the best choice for generating accurate poverty predictions. On the contrary, GLS showed the least accuracy, while Neural Networks performed moderately but still lagged behind other machine learning models.
(iii) Regional Poverty Prediction Using Random Forest: The Random Forest (RF) model, which demonstrated the highest accuracy and lowest Root Mean Squared Error (RMSE), has been used to predict MPI poverty levels for each district. To conduct the regional poverty prediction, district-wise MPI poverty headcount data from the National Family Health Survey 5 (NFHS-5) was used as a benchmark. The RF model was then applied to these headcount figures to generate predicted poverty levels for each district. The actual poverty headcounts from the NFHS-5 data and the predicted headcounts from the RF model are compared in Map 4.4. Both maps exhibit a similar pattern, demonstrating that the RF model's predictions closely align with the real-world data.
Map 4.4
Actual and predicted spatial distribution of poverty: MPI poverty.
Source Authors’ calculations
This alignment further validates the accuracy and reliability of the RF model in predicting district-level poverty levels (Fig. 4.11). For the actual and predicted poverty headcounts, an almost identical number of districts (99%, 705 of 709 districts) fall within the same poverty ranges, the exception being the 31–46% range. This highlights the effectiveness of the RF model in capturing the spatial distribution of MPI poverty. This capability is crucial for policymakers and social scientists as it provides a robust tool for identifying regions that require more focused interventions and resources.
Fig. 4.11
Districts falling in similar MPI poverty range.
Source Authors’ calculations
This analysis reveals that the RF model is an invaluable tool for predicting MPI poverty at granular levels, which is not feasible with traditional methods. The consistency between the actual and predicted poverty headcounts not only underscores the model's accuracy but also its practical utility in guiding poverty alleviation efforts.
(iv) Variable Importance: A Variable Importance Analysis was conducted to determine which spatial variables most significantly contribute to predicting MPI poverty levels. Three models were evaluated to assess the impact of various spatial variables (Fig. 4.12):
Model 1—All Variables Included: With all variables included, POI density and average nightlight emerged as the most influential. However, due to their high correlation, their individual contributions were affected. POI density explained 42% of the variability in MPI poverty, while nightlight explained 11%. The overlap in information provided by these variables reduced the apparent importance of nightlight (Fig. 4.12, Panel 1) (Liaw & Wiener, 2002).
Fig. 4.12
Variable importance from RF: MPI poverty.
Source Authors’ calculations
Model 2—Excluding POI Density: When POI density was excluded, average nightlight became the most significant variable, accounting for 35% of the variability in MPI poverty (Fig. 4.12, Panel 2). This highlights that nightlight data alone can capture a substantial portion of poverty variation when POI density is not considered.
Model 3—Excluding Average Nightlight: In the absence of average nightlight data, POI density emerged as the primary contributor, explaining 46% of the variability in MPI poverty (Fig. 4.12, Panel 3). This underscores the significant role of POI density in poverty prediction when nightlight data is not available.
4.6 Poverty Projection for 2023–24
Poverty for 2023–24 has been projected using ML techniques and geospatial data, for both the monetary international poverty line and the non-monetary MPI, to explore how poverty can be estimated in the absence of survey data by combining geospatial data with past survey-based poverty trends in India.
4.6.1 Poverty Projection Based on Poverty Head Count
The poverty projections of poverty head count using the international poverty line for 2023–24 indicate that India as a whole is expected to see a decrease in poverty from 41.3% in 2021–22 to 34.7% in 2023–24, reflecting overall progress, yet state-level trends show a mix of increases and decreases (Fig. 4.13).
Fig. 4.13
Poverty projection for major states in India: poverty head count (2023–24). Note The 2021–22 figure represents the actual poverty headcount based on the World Bank's international poverty line for Lower-Middle-Income Countries (LMICs). The 2023–24 figure is a projected poverty headcount, estimated using machine learning techniques.
Source Authors’ calculations
States with Decrease in Poverty: Delhi's poverty rate is projected to decrease from 3.0 to 2.3%, likely due to effective urban poverty alleviation strategies and robust economic activity. Haryana and Karnataka are also expected to see reductions from 26.6% to 25.2% and 39.1% to 33.0%, respectively, possibly due to economic growth and improved social programs. Maharashtra shows a decrease from 42.0 to 38.8%, while Assam is projected to reduce poverty from 49.2 to 42.9%, indicating successful state-level interventions and growth. Madhya Pradesh and Odisha are expected to see slight decreases from 50.2% to 49.0% and 53.1% to 51.3%, respectively. Uttar Pradesh shows a significant reduction from 58.1 to 32.5%, reflecting major improvements in poverty reduction programs and economic development. Jharkhand and Bihar, among the poorest states, are projected to reduce their poverty rates from 59.9% to 57.5% and 72.1% to 62.5%, respectively, indicating progress in targeted poverty alleviation efforts. Chhattisgarh is also expected to decrease from 74.2 to 63.1%, showing substantial improvements.
States with Increase in Poverty: Kerala is projected to see an increase in poverty from 7.9 to 10.2%, which may be attributed to economic challenges or shifts in poverty measurement criteria. Punjab shows a rise from 13.1 to 17.2%, potentially due to economic disruptions or inefficacies in poverty alleviation measures. Andhra Pradesh and Tamil Nadu are expected to see significant increases from 16.0% to 25.6% and 17.8% to 27.8%, respectively, possibly due to socio-economic disruptions or policy challenges. Telangana and Himachal Pradesh are also expected to experience increases from 18.3% to 21.2% and 19.6% to 23.0%, respectively. West Bengal shows a marginal increase from 26.4 to 27.4%, and Uttarakhand sees a slight rise from 26.7 to 27.5%. Jammu & Kashmir and Gujarat are projected to see increases from 29.8% to 30.2% and 32.0% to 35.3%, respectively, while Rajasthan shows a significant increase from 34.6 to 38.5%, indicating potential challenges in poverty reduction strategies.
The mixed trends across states highlight the complex interplay of economic, social, and policy factors influencing poverty. States with decreasing poverty rates often benefit from effective poverty alleviation programs, robust economic growth, and improved social services, while those with increasing rates may face economic disruptions, policy challenges, or measurement changes.
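The state-level comparison above reduces to a change in percentage points between the 2021–22 headcount and the 2023–24 projection. The sketch below computes that change for a subset of the states quoted in the text and groups them by direction. The function and variable names are illustrative.

```python
# (2021-22 actual, 2023-24 projected) poverty headcount in percent,
# copied from the figures quoted in the text for a subset of states.
headcount = {
    "Delhi": (3.0, 2.3),
    "Uttar Pradesh": (58.1, 32.5),
    "Bihar": (72.1, 62.5),
    "Kerala": (7.9, 10.2),
    "Punjab": (13.1, 17.2),
    "Tamil Nadu": (17.8, 27.8),
}


def change_in_points(before, after):
    """Change in percentage points, rounded to one decimal place."""
    return round(after - before, 1)


changes = {state: change_in_points(*values) for state, values in headcount.items()}
decreasing = sorted(state for state, delta in changes.items() if delta < 0)
increasing = sorted(state for state, delta in changes.items() if delta > 0)

for state, delta in sorted(changes.items(), key=lambda item: item[1]):
    print(f"{state:<15} {delta:+.1f} percentage points")
```

Sorting by the size of the change makes the largest projected movements, such as Uttar Pradesh's decline and Tamil Nadu's rise, stand out immediately.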
4.6.2 Poverty Projection Based on MPI Poverty
The MPI poverty projections for 2023–24 reveal substantial differences across Indian states compared to the all-India rate, which is expected to decline from 17.1% in 2019–21 to 10.2% in 2023–24 (Fig. 4.14).
Fig. 4.14
Poverty projection for major states in India: MPI (2023–24).
Note The 2019–21 figure represents the actual poverty based on the MPI. The 2023–24 figure is a projected MPI poverty, estimated using machine learning techniques. Source Authors’ calculations
Among the states in India, Kerala is expected to further reduce its poverty rate from 0.9 to 0.3%, highlighting its continued focus on high education and healthcare standards. Punjab and Telangana are also set to see significant reductions, with Punjab's rate dropping from 4.7 to 3.1% and Telangana's from 6.8 to 3.5%, showing the success of their poverty alleviation measures and economic policies.
Andhra Pradesh and Karnataka are projected to make notable progress, with poverty rates declining from 7.3% to 3.3% and from 8.5% to 4.5%, respectively. Maharashtra's rate is expected to decrease from 9 to 6.9%, Uttarakhand's from 9.8 to 5.5%, Gujarat's from 14.4 to 8.5%, and West Bengal's from 15.3 to 8.5%. These reductions reflect the successful implementation of targeted social and economic interventions.
Despite high poverty levels, Jharkhand and Bihar are projected to achieve considerable reductions, with Jharkhand's rate falling from 30.6 to 18.1% and Bihar's from 34.7 to 23.3%, indicating effective poverty reduction strategies. Himachal Pradesh is expected to see a slight decrease from 5.3 to 5.2%.
Ladakh and Jammu & Kashmir show modest improvements, with Ladakh's rate decreasing from 6.5 to 4.8% and Jammu & Kashmir's from 6.3 to 4.7%. Odisha's poverty rate is projected to slightly decrease from 20.8 to 19.4%, while Assam's is expected to drop from 21.4 to 17.4%. Madhya Pradesh's rate is likely to fall from 24 to 20.5%, and Uttar Pradesh's from 22.9 to 13.4%, reflecting effective poverty alleviation programs.
On the other hand, Tamil Nadu is projected to see an increase in its poverty rate from 3.8 to 4.7%, possibly due to economic challenges or policy inefficiencies. Delhi's poverty rate is expected to remain unchanged at around 3.4%. However, Haryana is likely to experience an increase in poverty from 7.2 to 8.9%, suggesting potential socio-economic disruptions.
The above poverty projections using the poverty head count (PHC) and the MPI poverty for 2023–24 both indicate an overall reduction in poverty in India, but they highlight different aspects and trends. The poverty head count, which measures poverty based on income, shows a decrease in India's overall poverty rate. States like Delhi, Haryana, Karnataka, Maharashtra, Assam, Madhya Pradesh, Odisha, Uttar Pradesh, Jharkhand, Bihar, and Chhattisgarh show decreases, reflecting effective urban poverty alleviation, economic growth, and improved social programs, while states like Kerala, Punjab, Andhra Pradesh, Tamil Nadu, Telangana, Himachal Pradesh, West Bengal, Uttarakhand, Jammu & Kashmir, Gujarat, and Rajasthan show increases. In contrast, the MPI poverty, which considers multiple dimensions of poverty such as education, health, and living standards, also shows a decrease in the all-India poverty rate, with states like Kerala, Punjab, Telangana, Andhra Pradesh, Karnataka, Maharashtra, Uttarakhand, Gujarat, West Bengal, Jharkhand, and Bihar showing significant reductions, while states like Tamil Nadu, Delhi, Himachal Pradesh, Haryana, Jammu & Kashmir, Odisha, and Assam show either slight increases or minor improvements. The difference in poverty rates between PHC and MPI can be attributed to several factors, including the methods of measurement, the variables used, socio-economic factors, and variations in the implementation of social welfare schemes and other policies. This highlights the importance of considering both income and multidimensional poverty in poverty reduction policies to address poverty in different regions of the country.
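The contrast between the two measures can be made explicit by checking, state by state, whether the projected direction of change differs between PHC and MPI. The sketch below uses figures quoted in the text for five states. Note that the baselines differ (2021–22 for PHC, 2019–21 for MPI), so only the direction of change is compared. The helper names are illustrative.

```python
# (baseline, 2023-24 projection) in percent, as quoted in the text.
# Baselines differ: 2021-22 for PHC and 2019-21 for MPI.
phc = {
    "Kerala": (7.9, 10.2),
    "Punjab": (13.1, 17.2),
    "Haryana": (26.6, 25.2),
    "Tamil Nadu": (17.8, 27.8),
    "Bihar": (72.1, 62.5),
}
mpi = {
    "Kerala": (0.9, 0.3),
    "Punjab": (4.7, 3.1),
    "Haryana": (7.2, 8.9),
    "Tamil Nadu": (3.8, 4.7),
    "Bihar": (34.7, 23.3),
}


def direction(before, after):
    """Classify a projected change as an increase, a decrease, or unchanged."""
    if after > before:
        return "increase"
    if after < before:
        return "decrease"
    return "unchanged"


# States where the income-based and multidimensional measures move in opposite directions.
diverging = sorted(
    state for state in phc if direction(*phc[state]) != direction(*mpi[state])
)

for state in sorted(phc):
    print(f"{state:<11} PHC: {direction(*phc[state]):<9} MPI: {direction(*mpi[state])}")
print("diverging:", diverging)
```

In this subset, Kerala and Punjab are projected to rise on the income measure while falling on the MPI, and Haryana shows the reverse, which illustrates why the two measures need to be read together.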
4.7 Summary and Conclusion
This chapter explores the use of machine learning techniques combined with geospatial and survey data to improve poverty prediction in India. The goal is to determine whether integrating multiple data sources can provide more accurate and timely poverty estimates than traditional, expensive survey methods, and whether the integrated data can also be used to predict poverty. The analysis shows significant declines in poverty under both income-based and multidimensional measures, though regional disparities persist, especially in states like Bihar and Uttar Pradesh. Geospatial variables like nightlight intensity and NDVI (Normalized Difference Vegetation Index) are negatively correlated with poverty, while land surface temperature is positively correlated with it.
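The direction of these associations can be checked with a Pearson correlation coefficient. The sketch below is a minimal illustration, not the chapter's actual computation: the five district values are invented purely to reproduce the signs reported above, and the function and variable names are assumptions.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Invented values for five hypothetical districts (NOT data from the study).
poverty_rate = [38.0, 30.0, 24.0, 15.0, 8.0]   # percent
geospatial = {
    "nightlight": [5.0, 12.0, 20.0, 35.0, 50.0],   # arbitrary radiance units
    "ndvi": [0.22, 0.31, 0.40, 0.52, 0.61],        # unitless index
    "lst": [41.0, 38.0, 35.0, 31.0, 28.0],         # degrees Celsius
}

correlations = {name: pearson(values, poverty_rate)
                for name, values in geospatial.items()}

for name, r in correlations.items():
    print(f"{name}: r = {r:+.2f}")
```

With real district-level data the coefficients would be far from these near-perfect toy values; the point here is only the sign convention, negative for nightlight intensity and NDVI and positive for land surface temperature.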
Among the various machine learning techniques tested, the Random Forest model proved to be the most accurate in predicting poverty levels, particularly when using variables such as nightlight intensity and Point of Interest (POI) density. This indicates that integrating machine learning techniques with geospatial data offers an effective, timely, and cost-efficient method for obtaining detailed poverty estimates. Compared to traditional survey methods, which are costly and infrequent, this approach offers more immediate and detailed insights into the distribution of poverty and helps policymakers target poverty alleviation efforts more effectively. The relevance of these ML methods and of geospatial data lies in strengthening the capacity of national statistical systems, producing better and more timely data to inform policies and monitor progress, contributing to the existing literature, leveraging data innovations, and improving the integration of geospatial and other statistical data.
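For readers unfamiliar with the technique, the following is a minimal pure-Python sketch of a random forest regressor: bootstrap samples, a random subset of features at each split, and averaging over trees. It is not the chapter's implementation. The data are synthetic, the coefficients in the data-generating formula are invented (only their signs follow the correlations reported above), and all function names and hyperparameters are assumptions chosen for illustration.

```python
import random

def sse(ys):
    """Sum of squared deviations from the mean."""
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)

def build_tree(rows, ys, rng, depth=0, max_depth=5, min_leaf=5, n_try=2):
    """Grow one regression tree; each split tries a random subset of features."""
    if depth >= max_depth or len(rows) < 2 * min_leaf:
        return sum(ys) / len(ys)                      # leaf: mean target
    best = None
    for f in rng.sample(range(len(rows[0])), n_try):
        values = sorted({r[f] for r in rows})
        step = max(1, len(values) // 10)              # roughly 10 thresholds
        for i in range(step, len(values), step):
            t = (values[i - 1] + values[i]) / 2
            left = [y for r, y in zip(rows, ys) if r[f] <= t]
            right = [y for r, y in zip(rows, ys) if r[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:
        return sum(ys) / len(ys)
    _, f, t = best
    li = [i for i, r in enumerate(rows) if r[f] <= t]
    ri = [i for i, r in enumerate(rows) if r[f] > t]
    kw = dict(depth=depth + 1, max_depth=max_depth, min_leaf=min_leaf, n_try=n_try)
    return (f, t,
            build_tree([rows[i] for i in li], [ys[i] for i in li], rng, **kw),
            build_tree([rows[i] for i in ri], [ys[i] for i in ri], rng, **kw))

def predict_tree(node, row):
    while isinstance(node, tuple):                    # internal node
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def fit_forest(rows, ys, n_trees=40, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]    # bootstrap sample
        forest.append(build_tree([rows[i] for i in idx],
                                 [ys[i] for i in idx], rng))
    return forest

def predict_forest(forest, row):
    return sum(predict_tree(t, row) for t in forest) / len(forest)

# Synthetic districts: (nightlight, POI density, NDVI, LST in Celsius).
data_rng = random.Random(42)
rows, ys = [], []
for _ in range(300):
    nl, poi = data_rng.uniform(0, 60), data_rng.uniform(0, 200)
    ndvi, lst = data_rng.uniform(0.1, 0.8), data_rng.uniform(20, 45)
    # Invented relationship; only the signs mirror the reported correlations.
    poverty = 40 - 0.4 * nl - 0.05 * poi - 10 * ndvi + 0.3 * lst + data_rng.gauss(0, 2)
    rows.append((nl, poi, ndvi, lst))
    ys.append(poverty)

train_X, train_y, test_X, test_y = rows[:240], ys[:240], rows[240:], ys[240:]
forest = fit_forest(train_X, train_y)

# Out-of-sample R^2 on the held-out synthetic districts.
preds = [predict_forest(forest, r) for r in test_X]
mean_y = sum(test_y) / len(test_y)
r2 = 1 - sum((y - p) ** 2 for y, p in zip(test_y, preds)) / sum((y - mean_y) ** 2 for y in test_y)

bright = predict_forest(forest, (55.0, 150.0, 0.6, 25.0))   # well-lit, dense POIs
dark = predict_forest(forest, (5.0, 10.0, 0.2, 40.0))       # dim, sparse POIs
print(f"test R^2 = {r2:.2f}, bright = {bright:.1f}, dark = {dark:.1f}")
```

In practice a library implementation such as scikit-learn's `RandomForestRegressor` would normally be used instead of hand-written trees; the sketch only shows the mechanics that make the method robust to noisy, correlated geospatial predictors.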
This analysis highlights the effectiveness of machine learning techniques, particularly Random Forest, in predicting poverty at granular levels. This integrated approach improves the accuracy of poverty predictions and supports more effective policymaking, contributing to better-targeted interventions and resource allocation in the fight against poverty. The contribution of this study is the integration of conventional household survey data with non-conventional data such as geospatial information and satellite imagery. It demonstrates the potential of applying ML algorithms to integrated data for timely analysis of the spatial distribution of poverty. This approach can be extended in the future to other non-conventional data sources, enabling a multidimensional examination of spatial associations. Expanding the analysis with enhanced spatial resolution will provide better insights into poverty and inequality.
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.