
Open Access 2025 | OriginalPaper | Book chapter

3. Decomposition of Inequality of Opportunity

Authors: Balwant Singh Mehta, Ravi Srivastava, Siddharth Dhote

Published in: Predicting Inequality of Opportunity and Poverty in India Using Machine Learning

Publisher: Springer Nature Singapore


Abstract

This chapter examines the complex interplay between economic growth and income inequality, with particular emphasis on the concept of inequality of opportunity (IOp). It first reviews the historical debate around the Kuznets inverted-U hypothesis, which posits that income inequality first rises and then falls as economies develop. The chapter then turns to the current global context and highlights the stark income and wealth disparities that have emerged since the 1980s, driven by deregulation and liberalization policies. The analysis shows that although global income inequality between countries has declined, inequality within countries has increased, particularly in developing countries such as India. The chapter offers a detailed case study of India, where rapid economic growth has not translated into an equitable distribution of income, resulting in substantial income disparities and social inequalities. It examines the role of historical and socio-cultural factors, such as caste discrimination and gender inequality, in perpetuating income inequality. The chapter also covers methodological approaches to measuring IOp, including traditional parametric methods and advanced machine learning techniques. It highlights the use of conditional inference trees, conditional inference forests, and transformation trees to identify types and degrees of effort, providing a more nuanced understanding of the factors that contribute to income inequality.
The chapter concludes with policy recommendations for tackling income inequality and promoting equality of opportunity for all, stressing the need for targeted interventions to improve educational opportunities, employment prospects, and living conditions for disadvantaged groups.
Notes
Disclaimer: The presentation of material and details in maps used in this chapter does not imply the expression of any opinion whatsoever on the part of the Publisher or Author concerning the legal status of any country, area or territory or of its authorities, or concerning the delimitation of its borders. The depiction and use of boundaries, geographic names and related data shown on maps and included in lists, tables, documents, and databases in this chapter are not warranted to be error free nor do they necessarily imply official endorsement or acceptance by the Publisher or Author.
An earlier version of this chapter was published as a working paper by the Institute for Human Development (IHD) and in the Indian Journal of Labour Economics (Mehta et al., 2023). Excerpts have been re-used here with permission.

3.1 Introduction

The interplay between economic growth and income distribution has been a central topic in economic research for several decades. A landmark study by Kuznets (1955) introduced the concept that as countries experience early stages of rapid economic growth, income inequality initially widens. This phenomenon occurs because the benefits of growth are not evenly distributed. However, as economies mature and achieve higher levels of income, the advantages of growth become more widely shared, leading to a reduction in income inequality. This creates an inverted U-shaped relationship between economic growth and inequality, known as Kuznets's inverted-U hypothesis. This hypothesis has sparked extensive debate and research. While some empirical studies support the inverted-U relationship, others challenge it. Critics argue that economic growth alone is insufficient to address poverty and inequality. Instead, they stress the need to differentiate between the ‘growth effect’ and the ‘inequality effect’ within an economy (Bourguignon, 2004; Ravallion & Chen, 2003). In the contemporary global context, many countries are grappling with this dual challenge of reducing poverty while addressing rising income inequality amid their growth trajectories.
According to the latest World Inequality Report (Chancel et al., 2022), income and wealth disparities have increased markedly since the 1980s. This rise in inequality is linked to the deregulation and liberalization policies implemented across different countries. The report highlights a widening gap between the rich and the poor: the wealthiest 10% of the global population now captures 52% of global income, while the poorest 50% receives only 8.5%. Wealth inequality is even more pronounced, with the poorest 50% holding just 2% of total wealth, while the richest 10% control 76%. Although global income inequality between countries has declined since the 1990s, as shown by a significant reduction in the income gap between the richest and poorest countries, within-country inequalities have intensified. The income gap between the richest 10% and the poorest 50% within countries has nearly doubled, with the ratio rising from 8.5 to 15. This trend is particularly evident in developing nations like India and China, which face the challenge of reducing inequality and poverty concurrently during periods of rapid economic growth.
India, in particular, has witnessed substantial economic growth over the past two decades, averaging an annual growth rate of 7% (Anand & Thampi, 2016; Chancel & Piketty, 2019). Despite this growth, income inequality has worsened. The benefits of economic progress have not been equitably distributed, leading to India being categorized as one of the ‘most unequal countries in the world’ (Chancel et al., 2022). Research by Chancel and Piketty (2019) reveals that the richest 10% of the Indian population holds 57% of the national income, whereas the poorest 50% hold only 13%. Household surveys show significant income disparity, with a Gini coefficient of 0.543 in 2012 (Anand & Thampi, 2016), and a higher Gini coefficient of 0.587 for per capita income among agricultural households in 2013. The Institute for Competitiveness's inequality report indicates a substantial divergence in earnings, with the cumulative annual earnings of the top 1% being nearly three times larger than those of the bottom 10%. Between 2017–18 and 2019–20, the share of income captured by the richest 1% increased from 6.1% to 6.8%, while the share of the poorest 50% remained stagnant at around 22%. During this period, the income of the richest 1% grew by 15%, and the income of the richest 10% rose by 8.1%, whereas the income of the poorest 10% declined by 1% (Chancel et al., 2022). These trends suggest that the concept of ‘trickle-down economics’ is not applicable in the Indian context. Studies have also highlighted the negative consequences of uneven growth processes in India. Anand and Thampi (2016) discuss the rising wealth inequality during the neoliberal era, and Chancel and Piketty (2019) emphasize the increase in the income share of the wealthiest 1%. Deaton and Stone (2013) argue that when income is concentrated at the top without a simultaneous rise in average income levels at the bottom, income inequality inevitably increases.
In India, rising income inequality is not merely a result of skewed income distribution but is also driven by entrenched social disparities and hierarchies. The persistence of resource inequalities, such as land ownership and income, along with ongoing caste-based discrimination, exacerbates this issue (Tagade et al., 2018). Gender disparities are evident in women's limited participation in the labour market and the disproportionate burden of unpaid work (Ghose, 2021). Additionally, rural–urban wage gaps and gender-based wage disparities highlight persistent gender discrimination (Deshpande et al., 2018; IHD, 2014). Addressing these issues is crucial for creating a society that ensures equal opportunities for all individuals, bridges gaps between identity groups, and provides greater representation for historically marginalized populations (Weisskopf, 2011).
In rapidly growing countries like India, the persistent rise in income inequality has prompted increased research into its main causes. There is a consensus that equal opportunities for success are vital. The literature also highlights the connection between inequality of opportunity (IOp) and income inequality, particularly with the growing accumulation of income at the top, where inheritances are more prevalent (Piketty, 2011; Piketty & Zucman, 2015). As discussed in Chapters 1 and 2, Roemer (1998) argues that IOp results from the interplay between ‘circumstances’ and the ‘efforts’ exerted by individuals (discussed in detail in Sect. 3.2). The study of IOp has gained prominence in empirical research, focusing on the unfair aspects of societal inequality (Brunori & Neidhöfer, 2021a; Brunori et al., 2019; Checchi & Peragine, 2010; Ferreira & Gignoux, 2011; Ferreira & Peragine, 2015; Fleurbaey, 2008; Hothorn & Zeileis, 2021; Roemer & Trannoy, 2016; Salas-Rojo & Rodríguez, 2022).
Most research on IOp has concentrated on developed nations, with comparatively limited attention given to developing countries like India. Addressing inequality in India necessitates substantial redistribution of income and wealth because of the country's pronounced social and spatial inequalities. Historical and socio-cultural factors, such as social groups, religions, locations, and geographical regions, have contributed to a fragmented society in India, resulting in unequal privileges across different groups (Das & Biswas, 2022; Singh, 2012). Moreover, gender disparities further exacerbate this inequality at the individual level. A few studies have assessed IOp in India by analysing consumption and earnings while considering circumstances such as social group, gender, place of birth, rural versus urban location, parental education, and occupation (Asadullah & Yalonetzky, 2012; Choudhary et al., 2019; Das & Biswas, 2022; Lefranc & Kundu, 2020; Motiram, 2018). These studies show that a significant portion of income or consumption inequality is attributable to unequal circumstances, with parental education emerging as a crucial determinant, as also highlighted in Chapter 2. However, many of these studies rely on statistical assumptions and potentially biased or arbitrary model selection approaches. Notably, the empirical research to date has often overlooked the role of effort, a critical component of Roemer’s theory (Ramos & Van de Gaer, 2021). Incorporating effort into IOp estimates would offer a more nuanced understanding of the factors influencing inequality and help shape more effective policy interventions aimed at reducing it. This chapter aims to explore this aspect of IOp in detail through a case study of India.
The structure of the rest of the chapter is as follows: Sect. 3.2 outlines the conceptual framework and methodology, while Sect. 3.3 delves into the measurement approaches for IOp. Section 3.4 provides details on the data sources and variables utilized in the analysis. Section 3.5 presents descriptive statistics of the sample, discusses the results, and explores key variables contributing to IOp through conditional inference regression trees and transformation trees. Finally, Sect. 3.6 concludes the chapter by summarizing the main findings and offering policy recommendations.

3.2 Conceptual Framework

John Rawls (1958a, 1958b, 1971) argued that justice in an egalitarian society could be achieved through the principle of equality of opportunity, often represented by metaphors such as ‘leveling the playing field’ or ‘equality at the starting gate.’ In Rawls’ vision of a just society, individuals are granted fair and equal chances to pursue their interests, with a particular focus on ‘primary goods’. This ethical justification for equality of opportunity spurred significant philosophical debate and reshaped the understanding of equality (Arneson, 1989; Cohen, 1989; Dworkin, 1981a, 1981b; Sen, 1980). Building on Rawls’ work, Roemer (1993, 1998) and Fleurbaey (1995, 2008) developed a systematic approach to measuring inequality of opportunity (IOp). They identified two main factors influencing individuals’ outcomes: effort and circumstances.
Effort refers to variables within an individual's control, such as the hours devoted to work or study, the quality of their work, and their occupational choices. Circumstances, in contrast, include factors beyond individual control, such as family background, socio-economic status, ethnicity, gender, and age (Roemer & Trannoy, 2016). Mathematically, this relationship can be expressed as follows. In a population of individuals indexed \(i = 1, \ldots, N\), the outcome of individual i, denoted \({y}_{i}\), is the result of the interaction between circumstances (\({C}_{i}\)) and effort (\({e}_{i}\)):
$$y_{i} = g(C_{i} ,e_{i} ),\quad \forall \,i = 1, \ldots N$$
(3.1)
In Eq. (3.1), the function g represents how circumstances and effort jointly determine the outcome for each individual.
Roemer's framework for achieving equality of opportunity involves categorizing the population into groups based on circumstances (types) and effort levels (tranches). Types are defined so that individuals within the same type share identical circumstances, while individuals within the same tranche exhibit similar effort levels. Under this framework, equal opportunity requires that individuals within the same type have equal potential to convert resources into outcomes, making it essential to address between-type inequality while disregarding within-type variability due to individual effort (Roemer, 2002).
As discussed in Chapter 2, the literature on IOp frequently discusses two ethical principles: the ‘compensation’ principle and the ‘reward’ principle. The compensation principle advocates for addressing inequalities arising from circumstances, while the reward principle emphasizes individual responsibility, suggesting that higher outcomes should be awarded for greater effort (Plassot et al., 2022). The reward principle accepts variations in outcomes based on effort, whereas the compensation principle focuses on rectifying unjust inequalities. The compensation principle can be approached through either the ex-ante or ex-post method (Fleurbaey & Peragine, 2013; Plassot et al., 2022). The ex-ante approach addresses inequalities between individuals of different circumstances or types, while the ex-post approach focuses on individuals with the same effort levels or tranches. These differing approaches arise from varying perspectives on the nature of effort (Fleurbaey & Peragine, 2013; Ramos & Van de Gaer, 2016).
In the ex-post approach, equality of opportunity is realized when individuals who exert the same effort achieve the same outcome, regardless of their type. This principle is assessed by comparing outcomes of individuals with the same effort but different circumstances (Brunori & Neidhöfer, 2021a, 2021b). Given that effort is often unobservable, Roemer (2002) proposed a method based on three key assumptions. Firstly, individuals are categorized into types. Secondly, outcomes increase monotonically with effort, so that higher effort within each type corresponds to a better outcome. The third assumption, concerning the distribution of effort within types, is stated after Eq. (3.2). Monotonicity is expressed mathematically as:
$$y^{k} (e_{i} ) \ge y^{k} (e_{j} ) \Leftrightarrow e_{i}^{k} \ge e_{j}^{k} ,\quad \forall \,k = 1, \ldots K;\,\forall \,e_{i} ,e_{j} \, \in R$$
(3.2)
where \(y^{k} (e_{i} )\) denotes the outcome of an individual in type k exerting effort \(e_{i}\), \(y^{k} (e_{j} )\) denotes the outcome of an individual in type k exerting effort \(e_{j}\), and K is the total number of types. Thirdly, it is assumed that the distribution of effort is a characteristic of the type. When comparing efforts across individuals in different types, adjustments should therefore be made to account for the fact that these efforts are drawn from different distributions, for which individuals should not be held responsible.
Roemer distinguishes between ‘level of effort’ and ‘degree of effort’. The ‘degree of effort’ is a morally relevant measure, identified as the quantile of the effort distribution for the specific type to which an individual belongs. The assumption is that all circumstances are identified and are exogenous to individual control. If different types face varied incentives and constraints in exerting effort, this is considered a characteristic of the type, falling under circumstances beyond individual control (Brunori et al., 2023). For instance, a student with educated parents might find it easier to dedicate long hours to studying compared to a student from a less favourable background.
The distribution of effort within type k is denoted as \(G^{k} (e)\), with quantiles \(\pi \in [0,1]\). When effort is unobservable but the outcome increases monotonically with effort, Roemer suggests defining the ‘degree of effort’ by the quantile position in the type-specific outcome distribution (y), represented as \(y^{k} (G^{k} (e)) = y^{k} (\pi )\). This definition adjusts for differences in absolute effort levels, allowing for comparison across types. The requirement that outcomes should be identical for individuals exerting the same degree of effort is expressed as:
$$y^{k} (\pi ) = y^{l} (\pi ) \Leftrightarrow F^{k} (y) = F^{l} (y),\quad \forall \,k,l = 1, \ldots K,\,{\text{and}}\,\pi \in [0,1]$$
(3.3)
where \(F^{k} (y)\) denotes the type-specific cumulative distribution of outcomes in type k.
Checchi and Peragine (2010) and Ferreira and Gignoux (2011) propose an ex-post IOp measure that evaluates inequality within a standardized distribution. This measure captures the variability in outcomes among individuals exerting the same effort. When Eq. (3.3) is satisfied, meaning that individuals with the same effort achieve the same outcome, the measure equals zero. As disparities in outcomes among individuals exerting the same effort increase, the IOp measure rises accordingly. The standardized distribution, \({\overline{\text{Y}}}_{{{\text{EP}}}}\), is computed by replacing individual outcomes with standardized values:
$${\acute{y}}_{i}^{k} (\pi ) = y_{i}^{k} (\pi )\frac{\mu }{{\mu^{\pi } }},\quad \forall \,i = 1, \ldots N;\,k = 1, \ldots K;\,\forall \,\pi \in [0,1]$$
(3.4)
where \(y_{i}^{k} (\pi )\) represents the outcome of individual i in type k at quantile π of the type-specific effort distribution, \({\mu }^{\uppi }\) denotes the average outcome of individuals at quantile π across all types, and \(\mu\) is the population mean outcome. In the standardized distribution, the average outcome is the same in every quantile, which eliminates between-quantile inequality while preserving relative distances within each quantile. Hence, the ex-post measure of IOp is given by:
$${\text{Ex-post}}\,{\text{IOp}}_{{{\text{EP}}}} = I({\overline{\text{Y}}}_{{{\text{EP}}}} )$$
(3.5)
where I is an inequality measure satisfying standard properties, including scale invariance. Notably, ex-post measures of IOp are less commonly used in empirical studies than ex-ante measures. The ex-ante IOp measure, proposed by Van de Gaer (1993) and also known as the ‘weak equality of opportunity’ criterion, allows for some within-group inequality but requires that mean advantage levels are equal across types (Ferreira & Gignoux, 2011). This approach defines the opportunity set for each type by the type-specific outcome distribution. The value of the opportunity set for each type is determined by the mean outcome of that type. Consequently, in this framework, IOp is essentially the inequality between types. The counterfactual distribution, denoted as \({\overline{\text{Y}}}_{{{\text{EA}}}}\), is derived by substituting individual outcomes with the mean outcome of their type, expressed as:
$${\acute{y}}_{i}^{k} (\pi ) = \mu^{k} ,\quad \forall \,i = 1, \ldots N;\,\forall \,k = 1, \ldots K;\,\forall \,\pi \in [0,1]$$
(3.6)
where \({\mu }^{\text{k}}\) represents the mean outcome of type k. Thus, the ex-ante IOp is given by:
$${\text{Ex-ante}}\,{\text{IOp}}_{{{\text{EA}}}} = I({\overline{\text{Y}}}_{{{\text{EA}}}} )$$
(3.7)
These measures offer different perspectives on IOp. The ex-post measure focuses on inequality among individuals exerting the same effort, while the ex-ante measure examines inequality between types based on average advantage levels.
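The two measures in Eqs. (3.4) to (3.7) can be illustrated with a minimal Python sketch. It assumes that types are already known, uses the Gini coefficient as the inequality measure I, and approximates the degree of effort by an individual's rank-based quantile bin within their type. The function names and the choice of `n_quantiles` are illustrative assumptions, not the chapter's implementation.

```python
from collections import defaultdict


def gini(values):
    """Gini coefficient via the mean absolute difference (scale invariant)."""
    n = len(values)
    mu = sum(values) / n
    total = sum(abs(a - b) for a in values for b in values)
    return total / (2 * n * n * mu)


def ex_ante_iop(outcomes, types):
    """Replace each outcome by its type mean (Eq. 3.6) and apply the index."""
    by_type = defaultdict(list)
    for y, k in zip(outcomes, types):
        by_type[k].append(y)
    type_mean = {k: sum(v) / len(v) for k, v in by_type.items()}
    return gini([type_mean[k] for k in types])


def ex_post_iop(outcomes, types, n_quantiles=4):
    """Rescale outcomes by mu / mu_pi (Eq. 3.4) and apply the index."""
    mu = sum(outcomes) / len(outcomes)
    idx_by_type = defaultdict(list)
    for i, k in enumerate(types):
        idx_by_type[k].append(i)
    # Degree of effort: quantile bin of the outcome rank within the type.
    quantile = [0] * len(outcomes)
    for idx in idx_by_type.values():
        ranked = sorted(idx, key=lambda i: outcomes[i])
        for rank, i in enumerate(ranked):
            quantile[i] = rank * n_quantiles // len(ranked)
    by_quantile = defaultdict(list)
    for y, q in zip(outcomes, quantile):
        by_quantile[q].append(y)
    q_mean = {q: sum(v) / len(v) for q, v in by_quantile.items()}
    standardized = [y * mu / q_mean[q] for y, q in zip(outcomes, quantile)]
    return gini(standardized)


# Toy data: type "b" earns twice as much as type "a" at every effort rank.
incomes = [1, 2, 3, 4, 2, 4, 6, 8]
labels = ["a"] * 4 + ["b"] * 4
print(ex_ante_iop(incomes, labels), ex_post_iop(incomes, labels))
```

When both types share the same outcome distribution, both functions return zero, which matches the condition in Eq. (3.3).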

3.3 Measurement Approaches for Estimating IOp: Data-Driven Machine Learning Techniques

Traditional methods for measuring IOp often face significant limitations, such as the researcher's discretion in selecting variables related to circumstances or effort. This subjective selection process can lead to the exclusion of relevant variables or the inclusion of too many, resulting in biased estimates. Omitting important variables can reduce the model's explanatory power and lead to downward-biased estimates, while including excessive variables may produce upward-biased estimates (Brunori et al., 2019; Ferreira & Gignoux, 2011; Hufe et al., 2017). Machine learning (ML) algorithms provide a promising alternative for measuring IOp by adhering to a data-driven approach. These algorithms minimize arbitrary and ad-hoc selections, thereby reducing the risk of bias. They also standardize the approach to balancing upward and downward biases (Brunori & Neidhöfer, 2021a, 2021b; Brunori et al., 2019; Hothorn & Zeileis, 2021; Hothorn et al., 2006; Salas-Rojo & Rodríguez, 2022).
In this chapter, ML algorithms are employed to estimate IOp using both ex-ante and ex-post approaches. Specifically, the conditional inference regression tree and conditional inference forest algorithms are used to estimate ex-ante IOp, and the transformation tree algorithm is used for ex-post IOp. These ML techniques help identify the types and degrees of effort needed to calculate IOp through both approaches. It is important to note that the first step in both the ex-ante and ex-post approaches is the identification of types by dividing the sample into subgroups with identical circumstances (Brunori et al., 2023, p. 8).

3.3.1 Identification of Types: Conditional Inference Tree and Conditional Inference Forest

The identification of types based on individual circumstances is crucial for empirical analyses of Inequality of Opportunity (IOp) (Brunori et al., 2019; Ferreira & Gignoux, 2011). To facilitate this, data-driven machine learning algorithms such as conditional inference trees and conditional inference forests are utilized (Brunori & Neidhöfer, 2021a, 2021b). These techniques have been extensively employed in recent empirical studies (Brunori & Neidhöfer, 2021a, 2021b; Brunori et al., 2018, 2019; Lefranc & Kundu, 2020).
Conditional inference trees provide a visually intuitive representation of the structure of opportunities by recursively splitting the range of circumstances and identifying subgroups with similar characteristics. Conditional inference forests, an extension of the conditional inference tree created through bootstrapping, enhance the reliability of IOp estimates by aggregating multiple trees. A key feature of conditional inference forests is their ability to determine the relative importance of factors beyond the structure of the tree.
The algorithm for conditional inference trees involves two stages: (i) selection of the initial splitting circumstance and (ii) growth of the opportunity tree. In the first stage, a hypothesis test (typically a t-test) is performed before each split to assess whether equal opportunities exist within a given sample or subsample. If the p-value associated with the circumstance being considered (C*) is greater than a pre-determined significance level (α), the null hypothesis of equal opportunity is not rejected and no split is made. Conversely, if the p-value is below α, the selected circumstance (C*) becomes the splitting variable, and the algorithm continues to grow the tree. This process generates a hierarchical arrangement of circumstances, reflecting significant associations with the outcome. Internal nodes and branches illustrate the divisions of the predictor space, while each terminal node represents a type. The final prediction for every individual is the average outcome of the type to which they belong.
The conditional inference forest algorithm generates multiple conditional inference trees and combines their results by averaging. The repetitive extraction of subsamples ensures the independence of each tree, resulting in diverse estimates for each subsample. Each tree follows the same two-step structure as the conditional inference tree.
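The test-then-split logic described above can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the algorithm used in the chapter: it handles only binary (0/1) circumstances, swaps in a permutation test of the difference in means for the test statistic, applies a Bonferroni adjustment across candidate circumstances, and uses hypothetical names (`perm_pvalue`, `grow`, `demo_rows`).

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def perm_pvalue(y, g, n_perm=999, rng=None):
    """Two-sided permutation p-value for a difference in means (g is 0/1)."""
    rng = rng or random.Random(0)

    def diff(values):
        ones = [v for v, flag in zip(values, g) if flag == 1]
        zeros = [v for v, flag in zip(values, g) if flag == 0]
        return abs(mean(ones) - mean(zeros))

    observed = diff(y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if diff(shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def grow(rows, outcome, circumstances, alpha=0.05, rng=None):
    """Split on the most significant circumstance, or return a terminal node."""
    rng = rng or random.Random(0)
    y = [r[outcome] for r in rows]
    pvalues = {}
    for c in circumstances:
        g = [r[c] for r in rows]
        if len(set(g)) == 2:  # skip circumstances that are constant in this node
            pvalues[c] = perm_pvalue(y, g, rng=rng)
    if pvalues:
        best = min(pvalues, key=pvalues.get)
        adjusted = min(1.0, pvalues[best] * len(pvalues))  # Bonferroni adjustment
        if adjusted < alpha:  # reject equal opportunity: split the node
            return {
                "split": best,
                "left": grow([r for r in rows if r[best] == 0],
                             outcome, circumstances, alpha, rng),
                "right": grow([r for r in rows if r[best] == 1],
                              outcome, circumstances, alpha, rng),
            }
    # Terminal node: one type, predicted by its mean outcome.
    return {"type_mean": mean(y), "n": len(y)}


# Toy data: the outcome depends on circumstance A but not on circumstance B.
demo_rows = [{"A": a, "B": b, "y": base + extra}
             for a, base in ((0, 1), (1, 10))
             for b in (0, 1)
             for extra in (0, 1, 0, 1)]
tree = grow(demo_rows, "y", ["A", "B"])
print(tree["split"])
```

The terminal nodes of such a tree are the types, so their means can be passed directly to an ex-ante IOp calculation. A forest would repeat `grow` on resampled data and average the predictions.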

3.3.2 Identification of Tranches of Effort Degrees: Transformation Trees

Conditional inference trees and forests primarily estimate the mean differences between types to compute IOp, overlooking higher moments of the within-type distribution and the importance of effort ranks. To address these limitations, the Transformation Tree (TrT) model is employed. This ex-post method uses an algorithm that estimates the outcome distribution within each type using Bernstein polynomial coefficients.1 The TrT model predicts the shape of the outcome distribution by partitioning the regressors’ space and identifying heterogeneity among the distributions defining each type. The process involves estimating the unconditional distribution and searching for binary splitting variables. Splitting is permitted if the resulting conditional distributions exhibit sufficient shape dissimilarity. The distribution shape is approximated using a linear combination of Bernstein basis polynomials, defined as:
$$B_{m} ({\text{y}},{\text{a}},{\text{z}}) = \sum\limits_{i = 0}^{m} {\beta_{i} b_{i,m} } ({\text{y}},{\text{a}},{\text{z}})$$
(3.8)
For a Bernstein polynomial of order m, m + 1 parameters define the shape of the objective distribution. The TrT algorithm involves: (i) setting a significance level (α) and a polynomial order (m); (ii) estimating the unconditional distribution using Bernstein polynomial approximation; (iii) testing for parameter stability for all possible partitions based on the regressors and storing the p-values. If the parameters are stable (Bonferroni-adjusted p-value > α), the conditional distributions fall into the same terminal node. If they are unstable (adjusted p-value < α), a binary split occurs, and the process repeats until stability is achieved. The TrT generates groups or types, with Bernstein polynomials interpolating the shape of the distributions.
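To make the role of the m + 1 coefficients concrete, the following Python sketch evaluates a linear combination of Bernstein basis polynomials. It uses the standard basis \(b_{i,m}(t) = \binom{m}{i} t^{i} (1 - t)^{m - i}\) on [0, 1]. Because the chapter does not define the arguments a and z of Eq. (3.8), the sketch assumes the outcome is simply rescaled to [0, 1] using hypothetical bounds `lower` and `upper`. It does not estimate the coefficients, which the TrT algorithm obtains by fitting the distribution.

```python
from math import comb


def bernstein_basis(i, m, t):
    """Basis polynomial b_{i,m}(t) = C(m, i) * t**i * (1 - t)**(m - i)."""
    return comb(m, i) * t**i * (1 - t) ** (m - i)


def bernstein_combination(betas, y, lower, upper):
    """Evaluate sum_i beta_i * b_{i,m}(t), with y rescaled to t in [0, 1]."""
    m = len(betas) - 1  # order m needs m + 1 coefficients
    t = (y - lower) / (upper - lower)
    return sum(beta * bernstein_basis(i, m, t) for i, beta in enumerate(betas))


# Order 4, so five coefficients; equally spaced betas give a straight line in t.
betas = [0.0, 0.25, 0.5, 0.75, 1.0]
print(bernstein_combination(betas, y=4.0, lower=0.0, upper=10.0))
```

The basis functions sum to one at every point, so equal coefficients give a constant function and increasing coefficients give an increasing one. This is why a small set of coefficients is enough to describe a smooth distribution shape.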
To estimate IOp, each individual's outcome \(y_{i}^{p,t}\) is multiplied by the ratio of the population mean (\({\mu }^{p}\)) to the mean within the respective quantile \((\mu^{p,t} )\), which gives the adjusted value \({\acute{y}}_{i}^{p,t}\):
$${\acute{y}}_{i}^{p,t} = y_{i}^{p,t} \frac{{\mu^{p} }}{{\mu^{p,t} }},\quad \forall \,i,p,t$$
(3.9)
IOp is estimated with any inequality measure applied over \({\acute{y}}_{i}^{p,t}\).

3.3.3 Decomposition of IOp Measure

The Shapley decomposition method, based on the Shapley value from cooperative game theory (Shapley, 1953) and also used in Chapter 2, estimates the relative contribution of each factor or circumstance to total income IOp. Shapley values are order-independent: they evaluate the function over all possible combinations of circumstances. The functional form of the index is:
$${\text{IOp}} = f(X_{11} \ldots X_{{N_{1} 1}} ,X_{12} \ldots X_{{N_{2} 2}} ,X_{13} \ldots X_{{N_{3} 3}} )$$
(3.10)
where \({X}_{ij}\) denotes the income of ith individual (\(i = 1, \ldots N_{j}\)), within the subgroup \(j = 1,2,3\).
Additive decomposition considers the impact of inequality within subgroups, between subgroups, ranking, and relative size within each subgroup. Shapley decomposition derives the marginal impact of each circumstance by measuring the difference in the inequality index value between the observed situation and a reference scenario where income does not change with the circumstance (Das & Biswas, 2022).
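The order-independence of the Shapley value can be seen in a small Python sketch. It rests on stated assumptions rather than on the chapter's exact procedure: the value of a set of circumstances is taken to be the Gini coefficient of type means when types are defined by that set (so removing a circumstance means income no longer varies with it), and all names (`between_type_gini`, `shapley_contributions`, `demo_rows`) are hypothetical.

```python
from itertools import combinations
from math import factorial


def gini(values):
    n = len(values)
    mu = sum(values) / n
    return sum(abs(a - b) for a in values for b in values) / (2 * n * n * mu)


def between_type_gini(rows, outcome, subset):
    """Inequality of type means when types are defined by `subset`."""
    if not subset:
        return 0.0  # no circumstances: a single type, hence no IOp
    groups = {}
    for r in rows:
        groups.setdefault(tuple(r[c] for c in subset), []).append(r[outcome])
    means = {k: sum(v) / len(v) for k, v in groups.items()}
    return gini([means[tuple(r[c] for c in subset)] for r in rows])


def shapley_contributions(rows, outcome, circumstances):
    """Average marginal contribution of each circumstance over all subsets."""
    n = len(circumstances)
    contributions = {}
    for c in circumstances:
        others = [x for x in circumstances if x != c]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                gain = (between_type_gini(rows, outcome, subset + (c,))
                        - between_type_gini(rows, outcome, subset))
                total += weight * gain
        contributions[c] = total
    return contributions


# Toy data: income varies with circumstance A only, not with B.
demo_rows = [{"A": a, "B": b, "y": base + extra}
             for a, base in ((0, 1), (1, 10))
             for b in (0, 1)
             for extra in (0, 1, 0, 1)]
print(shapley_contributions(demo_rows, "y", ["A", "B"]))
```

By construction the contributions add up to the IOp obtained with all circumstances, which is what allows them to be reported as shares of total IOp. The number of subsets grows exponentially with the number of circumstances, so this brute-force form is only practical for a handful of variables.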

3.4 Data Sources and Variables

This chapter uses data from the annual Periodic Labour Force Survey (PLFS) for the year 2022–23 to calculate income Inequality of Opportunity (IOp). The PLFS, conducted by the National Statistics Office (NSO), is a cross-sectional survey representative at both national and state levels. The analysis focuses on household monthly per capita labour income (MPCI), used as a proxy for household income, which is computed by aggregating the monthly income of regular, self-employed, and casual wage workers within a household and dividing it by the household size. For casual wage workers, the PLFS provides weekly income data, which have been converted into monthly figures. Income data for regular salaried and self-employed individuals are reported monthly. In addition, monthly per capita expenditure (MPCE) is also used in the analysis for comparison with MPCI.
The PLFS data include several circumstance variables: parents’ education levels (categorized as no education, primary, secondary, higher secondary, and graduate or above); parents’ occupations (classified into non-routine cognitive/high skilled, routine cognitive/medium skilled, non-routine manual/low skilled, and routine manual/unskilled); social group (scheduled castes (SC), scheduled tribes (ST), other backward classes (OBC), and general caste (GC)); gender (male and female); place of birth or region (north, east, central, northeast, south, and west); and location (rural and urban).
A sample of 80,155 individuals with parental information, selected from a total of 419,512 covered in the PLFS 2022–23, has been used for analysis. This sample includes only individuals with available parental background information and is restricted to working-age individuals (15–64 years old). Detailed procedures for sample and variable selection are outlined in Appendix 1 in the previous chapter.
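The MPCI construction described above can be sketched as follows. The record layout (`hh`, `status`, `income`) is a hypothetical simplification of the survey files, and the weekly-to-monthly conversion factor is left as a required argument because the chapter does not state which factor was used. The value 4 in the example is purely illustrative.

```python
from collections import defaultdict


def household_mpci(workers, household_size, weeks_per_month):
    """Monthly per capita labour income for each household.

    workers: dicts with keys "hh" (household id), "status"
             ("regular", "self-employed" or "casual") and "income"
             (weekly for casual workers, monthly otherwise).
    household_size: dict mapping household id to number of members.
    weeks_per_month: assumed factor for converting weekly casual earnings.
    """
    totals = defaultdict(float)
    for w in workers:
        if w["status"] == "casual":
            monthly = w["income"] * weeks_per_month  # weekly to monthly
        else:
            monthly = w["income"]  # already reported monthly
        totals[w["hh"]] += monthly
    # Divide by all household members, not only by earners.
    return {hh: totals[hh] / size for hh, size in household_size.items()}


workers = [
    {"hh": 1, "status": "regular", "income": 9000},
    {"hh": 1, "status": "casual", "income": 500},
    {"hh": 2, "status": "self-employed", "income": 6000},
]
sizes = {1: 5, 2: 3}
print(household_mpci(workers, sizes, weeks_per_month=4))
```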

3.5 Results and Discussion

3.5.1 Sample Profile

Table 3.1 provides an overview of the sample's demographic characteristics, region, gender, social group, and employment status. The majority of the sample resides in rural areas. Geographically, a quarter of the sample is from the central region, with one-fifth from the eastern and southern regions. The northern and northeastern regions have smaller representations, with around 14.5% and 3.7%, respectively. Males constitute 70% of the sample, while females make up 30%. Approximately 44% of individuals belong to OBC, followed by GC (28%), SC (20%), and ST (9%). Nearly half of the sample is involved in self-employment, about one-third in regular salaried positions, and one-fourth in casual wage work.
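Table 3.1 Characteristics of sample individuals (in %)

| Characteristic | Category | % |
| --- | --- | --- |
| Sector | Rural | 62.5 |
| | Urban | 37.5 |
| Region | North | 17.8 |
| | East | 17.3 |
| | Central | 21.6 |
| | North East | 14.5 |
| | South | 17.2 |
| | West | 11.5 |
| Social group | ST | 15.9 |
| | SC | 18.5 |
| | OBC | 41.4 |
| | GC | 24.3 |
| Gender | Male | 66.1 |
| | Female | 33.9 |
| Status of employment | Self-employment | 46.8 |
| | Regular salaried | 31.0 |
| | Casual labour | 22.2 |
| Total | | 100.0 |

Source: Authors' calculations from PLFS, 2022–23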
Table 3.2 outlines the educational qualifications and occupations of the sampled individuals. A larger proportion of them have secondary or higher secondary education, while a smaller percentage have graduate-level education or are illiterate. Regarding occupations, nearly three-fourths of the sampled individuals are engaged in non-routine manual low-skilled jobs, with about one-fifth in routine manual unskilled jobs. Only a few are engaged in non-routine cognitive high-skilled jobs (5.4%) or routine cognitive medium-skilled jobs (2.2%).
Table 3.2
Educational qualifications and occupations of the sample (in %)
| Characteristic | Category | % |
| --- | --- | --- |
| Education levels | No education | 26.3 |
| | Below secondary | 49.0 |
| | Secondary/higher secondary | 17.4 |
| | Graduate and above | 7.3 |
| Occupation by skill levels | Non-routine cognitive (high skilled) | 5.4 |
| | Routine cognitive (medium skilled) | 2.2 |
| | Non-routine manual (low skilled) | 68.7 |
| | Routine manual (unskilled) | 23.7 |
| Total | | 100.0 |
Source Authors' calculations from PLFS, 2022–23
Table 3.3 highlights notable differences in the average MPCI of the sample households between urban and rural areas. Urban households have a substantially higher average MPCI than rural households (6524 versus 4030), suggesting higher income levels in urban areas. The average MPCI also varies across social groups: households belonging to the General Category (GC) have the highest average MPCI (6028), while the averages for Scheduled Tribes (ST), Other Backward Classes (OBC), and Scheduled Castes (SC) are lower and close to one another (4391 to 4720). Urban households also have a higher median and standard deviation than rural households.
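Table 3.3
Per capita household income (MPCI) of the sample
| Characteristic | Category | Mean | Median | SD |
| --- | --- | --- | --- | --- |
| Sector | Rural | 4030 | 3284 | 3039 |
| | Urban | 6524 | 5000 | 5891 |
| Social group | ST | 4720 | 3536 | 3993 |
| | SC | 4391 | 3600 | 3433 |
| | OBC | 4691 | 3700 | 4118 |
| | GC | 6028 | 4500 | 5790 |
| Total | | 4965 | 3800 | 4498 |
Source Authors' calculations from PLFS, 2022–23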
Table 3.4 presents the association of individuals' education, occupation, and geographical location with their household MPCI. The findings reveal substantial differences in average MPCI across these factors. Individuals with graduate-level qualifications have an average household MPCI about 2.7 times that of individuals without education (10,412 versus 3796), and also well above that of individuals with below secondary or secondary and higher secondary education. Similarly, individuals engaged in high- and medium-skilled jobs have a much higher average household MPCI than those in low-skilled and unskilled manual jobs. Furthermore, the median and standard deviation are relatively higher among graduates, those engaged in high-skilled jobs, and individuals from the southern region than among the other categories.
Table 3.4
MPCI (in Rs.) by education, occupation, and geographical regions
| Characteristic | Category | Mean | Median | SD |
| --- | --- | --- | --- | --- |
| Educational levels | No education | 3796 | 3200 | 2679 |
| | Below secondary | 4457 | 3700 | 3117 |
| | Secondary/higher secondary | 5864 | 4545 | 4805 |
| | Graduate and above | 10,412 | 8000 | 9607 |
| Occupation by skill levels | Non-routine cognitive (high skilled) | 11,410 | 9000 | 10,509 |
| | Routine cognitive (medium skilled) | 9503 | 7125 | 7771 |
| | Non-routine manual (low skilled) | 4693 | 3750 | 3699 |
| | Routine manual (unskilled) | 4083 | 3442 | 2925 |
| Region | North | 5259 | 4000 | 5136 |
| | East | 3922 | 3100 | 3415 |
| | Central | 3527 | 2800 | 3018 |
| | North East | 5890 | 4786 | 4169 |
| | South | 6463 | 5125 | 5495 |
| | West | 5363 | 4144 | 4783 |
| Total | | 4965 | 3800 | 4498 |
Source Authors' calculations from PLFS, 2022–23
In sum, the sample characteristics provide valuable insights into the factors associated with variations in household MPCI. Key factors include caste or social group, sector (rural–urban), education, occupational status, and geographical location. Categories with higher average MPCI also tend to have higher medians and standard deviations. Urban areas, the southern region, highly educated individuals, and the General and OBC social groups show higher variability, suggesting diverse economic conditions and opportunities.

3.5.2 Ex-ante Inequality of Opportunity

This section presents a comparative analysis of ex-ante Inequality of Opportunity (IOp) results for MPCI (income henceforth) using three distinct approaches: the parametric approach, the conditional inference tree approach, and the conditional forest approach. The parametric approach employs ordinary least squares (OLS) regression to estimate IOp measures for income, modeling the relationship between outcome variables and various circumstances, while accounting for potential confounding factors. This regression analysis helps identify factors significantly contributing to IOp in terms of income.
Parametric Approach: The parametric approach, based on methodologies by Ferreira and Gignoux (2014) and Wendelspiess and Soloaga (2014), utilizes an OLS regression model where MPCI serves as the dependent variable. Sector, gender, caste, parental occupations, parental education, and regions are considered explanatory or circumstance variables. Using the estimated coefficients from the regression, a counterfactual distribution is derived, enabling the decomposition of MPCI inequality within the sample population. The Gini coefficient of the predicted income values from the regression provides an absolute measure of IOp, while a relative measure of IOp is obtained by dividing the absolute Gini measure of IOp by the overall Gini measure of inequality.
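As a rough illustration of the calculation described above, the following Python sketch fits an additive OLS model on two binary circumstances, takes the Gini coefficient of the fitted values as the absolute IOp, and divides it by the Gini of observed incomes to obtain the relative IOp. The toy data, the variable names, and the small normal-equations solver are assumptions made for illustration only; they are not the chapter's PLFS data or estimation code.

```python
def gini(values):
    """Gini coefficient of a list of non-negative numbers."""
    xs = sorted(values)
    n = len(xs)
    # Rank-weighted formula: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2.0 * weighted / (n * sum(xs)) - (n + 1.0) / n


def ols_fit(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Hypothetical toy sample: (urban, parent_graduate, income)
data = [(0, 0, 2000), (0, 0, 3000), (0, 1, 4000), (0, 1, 5000),
        (1, 0, 4000), (1, 0, 6000), (1, 1, 7000), (1, 1, 9000)]
X = [[1, urban, grad] for urban, grad, _ in data]  # intercept + circumstances
y = [income for _, _, income in data]

beta = ols_fit(X, y)
# Counterfactual (smoothed) distribution: income predicted from circumstances only
predicted = [sum(b * x for b, x in zip(beta, row)) for row in X]

overall_gini = gini(y)
absolute_iop = gini(predicted)
relative_iop = absolute_iop / overall_gini
print(round(overall_gini, 4), round(absolute_iop, 4), round(relative_iop, 4))
```

Because the fitted values vary only with circumstances, their Gini captures between-type inequality under the additive specification; a fully saturated model would instead reproduce the type means exactly.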
Table 3.5 shows the results of the parametric approach, with 1536 types representing groups of the sample population with similar circumstances. Overall income inequality is estimated at a Gini coefficient of 0.392, indicating moderate inequality. The opportunity Gini coefficient from the parametric approach is 0.248; that is, inequality in predicted income across the 1536 types is lower than overall income inequality, as expected, because the predicted values remove the variation within types. The relative IOp, the ratio of the two, is estimated at 0.632, meaning that around 63% of overall income inequality can be attributed to circumstances such as sector, gender, caste, parental occupations, parental education, and regions.
Table 3.5
Parametric: ex-ante income IOp
| Measure | Types/IOp |
| --- | --- |
| Types | 1536 |
| Overall Gini | 0.392 |
| Absolute Gini | 0.248 |
| Relative IOp | 0.632 |
Source Authors’ calculations from PLFS, 2022–23
Conditional Inference Tree: As mentioned earlier, tree algorithms divide a dataset into mutually exclusive groups of observations based on sequential and hierarchical criteria. Once all the partitions are completed, the algorithm assigns the average value of the dependent variable to each observation (Salas-Rojo & Rodríguez, 2022). However, one of the main drawbacks of tree-based algorithms is their strong reliance on various factors, including the chosen alpha level, which determines the threshold for accepting or rejecting the null hypothesis. To address this issue, the Grid Search Cross-Validation method has been utilized to obtain an endogenously tuned alpha level (Table 3.9). The alpha level with the lowest root mean squared error (RMSE) is 0.07. The results at an alpha level of 0.07 are also compared with standard measures of alpha ranging from 1% to 5%, as detailed in Appendix 1.
Table 3.6 presents the results based on a conditional inference tree using an endogenously chosen alpha level. The opportunity Gini coefficient for IOp is calculated to be 0.208, indicating that differences in average income among the 16 subgroups of the sample population are significantly less than the overall income inequality. The relative IOp using the conditional inference tree approach is estimated to be 0.532. This suggests that around 53% of the overall income inequality is attributable to various circumstances such as sector, gender, caste, parents’ occupations, parents’ education, and regions. Nevertheless, the relative IOp estimates obtained from the conditional inference tree method are comparatively lower than those obtained from the parametric method. This difference can be attributed to the machine learning (ML) algorithm used in the conditional inference tree method, which automatically generates a smaller number of types compared to the parametric estimates. These types correspond to distinct circumstances contributing to inequality and consequently provide a more robust measure of relative IOp compared to the parametric approach.
Table 3.6
Conditional inference tree: ex-ante income IOp
| Measure | Types/IOp |
| --- | --- |
| Types | 16 |
| Overall Gini | 0.392 |
| Absolute Gini | 0.208 |
| Relative IOp | 0.532 |
Source Authors’ calculations from PLFS, 2022–23
Additionally, the conditional inference tree graphically illustrates the key circumstances that influence income IOp, as shown in Fig. 3.1. The results show that parents’ education is the most important circumstance determining earnings or income IOp, as indicated by the initial node in Fig. 3.1. Individuals with parents having a graduate or higher level of education tend to have lower income IOp compared to those whose parents are educated below graduate level. For individuals whose parents’ education level is graduate or above, parents’ occupation becomes the second most important variable in determining income IOp, whereas for individuals whose parents are educated below the graduate level or not educated at all, region becomes the second most important variable, followed by sector as the third most important variable in determining income IOp.
Fig. 3.1
Conditional inference tree for MPCI. Note R: Rural; U: Urban; N: North; NE: North East; S: South; W: West; E: East; C: Central; Sec/HS: Secondary/Higher Secondary; GradAbv: Graduate and Above; NoEdu: Illiterate or No Formal Schooling; BS: Below Secondary; US: Unskilled; Low: Low Skilled; Med: Medium Skilled; High: High Skilled; M: Male; F: Female.
Source Authors’ calculations from PLFS, 2022–23
In the case of individuals with parents educated up to the graduate level or above and parents in medium or high-skilled jobs, region becomes the third most important variable in determining income IOp. Those residing in the north, south, and west of India have lower income IOp compared to individuals in the east, central, and northeast India. In both regional groups, individuals in urban areas have lower income IOp compared to their rural counterparts. In the case of individuals with graduate or above-educated parents in low and unskilled jobs, sector becomes the third most important variable in determining income IOp. Individuals in urban areas have lower income IOp compared to those in rural areas. Additionally, in urban areas, individuals from the north, northeast, and south have lower income IOp compared to individuals in the east, central, and western India. In rural areas, individuals from the northeast parts of India have lower income IOp compared to those from all other regions.
Conditional Inference Forests: To address the sensitivity or high variance inherent in the conditional inference tree approach, a more robust approach based on conditional inference forests has been proposed by Hothorn et al. (2006) and Brunori et al. (2023). Conditional inference forests employ bootstrapping within the ML framework. In this approach, multiple conditional inference trees are generated, and the final prediction is obtained by averaging the predictions of all the trees. The use of subsamples ensures that each tree provides an independent estimate (Salas-Rojo & Rodríguez, 2022, p. 36). As with the conditional inference tree, the Grid Search Cross-Validation method has been used to obtain an endogenously tuned combination of alpha level and number of trees. The alpha level with the lowest RMSE is 0.06. The results obtained at the 0.06 level are also compared with standard measures of alpha ranging from 1% to 5% (see Table 3.10 for details).
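The averaging step can be sketched in Python as follows. This is a deliberately simplified stand-in, not a conditional inference forest: each "tree" here is just a table of type means computed on a bootstrap resample, whereas real forests grow each tree with permutation tests and random subsets of variables. The toy data, function names, and seed are hypothetical; the default of 200 trees mirrors the tree count reported in Appendix 1.

```python
import random
from collections import defaultdict


def gini(values):
    """Gini coefficient of a list of non-negative numbers."""
    xs = sorted(values)
    n = len(xs)
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2.0 * weighted / (n * sum(xs)) - (n + 1.0) / n


def fit_type_means(sample):
    """One stand-in 'tree': mean income per type, plus a fallback mean."""
    groups = defaultdict(list)
    for circumstances, income in sample:
        groups[circumstances].append(income)
    fallback = sum(income for _, income in sample) / len(sample)
    return {t: sum(v) / len(v) for t, v in groups.items()}, fallback


def forest_predict(data, n_trees=200, seed=0):
    """Average the predictions of stand-in trees fitted on bootstrap resamples."""
    rng = random.Random(seed)
    totals = [0.0] * len(data)
    for _ in range(n_trees):
        resample = rng.choices(data, k=len(data))  # draw with replacement
        means, fallback = fit_type_means(resample)
        for i, (circumstances, _) in enumerate(data):
            # Types missing from this resample fall back to its overall mean
            totals[i] += means.get(circumstances, fallback)
    return [t / n_trees for t in totals]


# Hypothetical toy sample: ((sector, parent_education), income)
data = [(("R", "BS"), 2500), (("R", "BS"), 3500), (("R", "BS"), 3000),
        (("R", "Grad"), 5000), (("R", "Grad"), 6500), (("R", "Grad"), 5500),
        (("U", "BS"), 4000), (("U", "BS"), 5000), (("U", "BS"), 4500),
        (("U", "Grad"), 8000), (("U", "Grad"), 11000), (("U", "Grad"), 9000)]

predictions = forest_predict(data)
absolute_iop = gini(predictions)
relative_iop = absolute_iop / gini([income for _, income in data])
print(round(absolute_iop, 3), round(relative_iop, 3))
```

Averaging over many resamples smooths the predicted values, which is the mechanism the text credits for the lower variance of forest-based estimates.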
Table 3.7 shows the results based on the conditional inference forest using an endogenously chosen alpha level. The opportunity Gini coefficient for income IOp is estimated to be 0.190. This means that the difference in average income among the 16 subgroups of the sample population is significantly less than the overall income inequality, and slightly lower than the result obtained from the conditional inference tree (0.208). The relative IOp measured using the conditional inference forest approach is estimated to be 0.486. This suggests that around 49% of the overall income inequality is attributed to various circumstances such as sector, gender, caste, parents’ occupations, parents’ education, and regions. This relative IOp estimate is marginally lower than that obtained from the conditional inference tree method. The difference can be credited to the bootstrapping within the ML algorithm used in the conditional inference forest method, which helps address the sensitivity or high variance inherent in the conditional inference tree. Consequently, this method provides a more robust measure of relative IOp compared to the conditional inference tree and parametric approaches.
Table 3.7
Conditional inference forest: ex-ante IOp results
| Measure | Types/IOp |
| --- | --- |
| Types | 16 |
| Overall Gini | 0.392 |
| Absolute Gini | 0.190 |
| Relative IOp | 0.486 |
Source Authors’ calculations from PLFS, 2022–23
Ex-Ante Shapley Value Decomposition: The ex-ante decomposition exercise presented in Fig. 3.2 provides insights into the importance of different circumstance variables in contributing to income IOp. The analysis reveals the relative significance of each factor in explaining the observed variations in income IOp. Among the circumstance variables, the education of parents emerges as the most important factor, accounting for the largest share of income IOp at 31.7%. This suggests that the educational background and qualification of parents have a substantial impact on income IOp. The geographical location or region of individuals, with a contribution of 27.3%, is the second key factor, indicating that regional disparities in economic development and access to resources can significantly impact income levels. The sector (20.8%) and parents’ occupation (19.6%) also play a vital role in shaping income IOp. This indicates that disparities in employment opportunities between rural and urban areas, as well as the occupation of parents, play an important role, reflecting the influence of employment opportunities and earnings across occupations based on skill level on income disparities. Social groups, although relatively less influential, also contribute to income IOp.
Fig. 3.2
Decomposition of factors contributing to ex-ante IOp (in %).
Source Authors' calculations from PLFS, 2022–23
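To make the logic of a Shapley value decomposition concrete, the Python sketch below computes each circumstance's contribution as its average marginal effect on the IOp measure over all orderings in which circumstances can be added. It is an illustration under stated assumptions: IOp is measured here as the Gini of simple type means, and the toy data and function names are hypothetical, so the shares it prints bear no relation to those in Fig. 3.2.

```python
from collections import defaultdict
from itertools import permutations


def gini(values):
    """Gini coefficient of a list of non-negative numbers."""
    xs = sorted(values)
    n = len(xs)
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2.0 * weighted / (n * sum(xs)) - (n + 1.0) / n


def iop_gini(rows, incomes, subset):
    """Gini of type means when types are defined by the circumstances in subset."""
    groups = defaultdict(list)
    for row, y in zip(rows, incomes):
        groups[tuple(row[c] for c in subset)].append(y)
    means = {k: sum(v) / len(v) for k, v in groups.items()}
    return gini([means[tuple(row[c] for c in subset)] for row in rows])


def shapley_contributions(rows, incomes, circumstances):
    """Average marginal contribution of each circumstance over all orderings."""
    contrib = {c: 0.0 for c in circumstances}
    orderings = list(permutations(circumstances))
    for order in orderings:
        included, previous = [], 0.0  # no circumstances: one type, Gini of means = 0
        for c in order:
            included.append(c)
            current = iop_gini(rows, incomes, included)
            contrib[c] += current - previous
            previous = current
    return {c: v / len(orderings) for c, v in contrib.items()}


# Hypothetical toy sample with three circumstances
rows = [{"sector": s, "parent_edu": e, "region": r}
        for s in ("R", "U") for e in ("BS", "Grad") for r in ("EC", "NNESW")]
incomes = [2000, 2600, 4200, 5000, 3500, 4300, 7000, 9400]
names = ["sector", "parent_edu", "region"]

contributions = shapley_contributions(rows, incomes, names)
total = iop_gini(rows, incomes, names)
shares = {c: 100.0 * v / total for c, v in contributions.items()}
print({c: round(s, 1) for c, s in shares.items()})
```

By construction the contributions add up to the IOp obtained with all circumstances included, which is why the decomposition can be reported as percentage shares.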
Fig. 3.3
Log-normal, Kernel-Gaussian, and Bernstein polynomials distribution of MPCI.
Source Authors' calculations from PLFS, 2022–23

3.5.3 Ex-post Inequality of Opportunity

As previously discussed, the estimation of IOp using conventional parametric, non-parametric, and data-driven conditional inference tree ML techniques is primarily based on the mean difference between types. These methods do not take into account the higher moments of the within-type distribution. To address this limitation, different methods are employed to approximate the distribution of the outcome variable. These methods include the log-normal, kernel-Gaussian, and Bernstein polynomials, as shown in Figs. 3.3 and 3.7.
Among these methods, the Bernstein polynomials are found to be more flexible in predicting the distribution of outcomes within each type. These polynomials enable a more accurate representation of the underlying distribution, capturing higher moments beyond the mean difference. By utilizing the Bernstein polynomials, the degree of effort or ex-post IOp can be measured following the approach proposed by Brunori and Neidhöfer (2021a, 2021b). The ex-ante approach focuses on the mean of each type, while the ex-post approach examines the distribution functions of each type. Instead of examining statistically significant differences between means, the ex-post approach identifies the most statistically significant differences between the full expected conditional distribution functions.
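For readers unfamiliar with Bernstein polynomials, the following Python sketch evaluates the basis functions and a polynomial built from them. It only illustrates the building block: the coefficients, the rescaling of income to the unit interval, and the function names are hypothetical, and fitting the coefficients to data (as a transformation model would) is not shown. Order 5 is used because that is the order the chapter reports.

```python
from math import comb


def bernstein_basis(k, n, t):
    """k-th Bernstein basis polynomial of order n, evaluated at t in [0, 1]."""
    return comb(n, k) * t ** k * (1 - t) ** (n - k)


def bernstein_poly(theta, t):
    """Polynomial sum_k theta[k] * b_{k,n}(t), with n = len(theta) - 1."""
    n = len(theta) - 1
    return sum(th * bernstein_basis(k, n, t) for k, th in enumerate(theta))


def rescale(y, low, high):
    """Map an outcome from [low, high] onto [0, 1] before evaluating the basis."""
    return (y - low) / (high - low)


ORDER = 5
# Hypothetical coefficients; non-decreasing values give a monotone polynomial,
# which is what a transformation of the outcome scale requires.
theta = [-2.0, -1.2, -0.3, 0.4, 1.1, 2.5]
assert len(theta) == ORDER + 1

grid = [i / 10 for i in range(11)]
values = [bernstein_poly(theta, t) for t in grid]
print([round(v, 3) for v in values])

# Example: evaluate the polynomial at a hypothetical income of 4000 on a 0-20000 scale
print(round(bernstein_poly(theta, rescale(4000, 0, 20000)), 3))
```

The flexibility mentioned in the text comes from the coefficients: an order-5 polynomial has six of them, so it can bend to match skewed or heavy-tailed within-type distributions that a two-parameter log-normal cannot.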
Transformation Tree: As previously mentioned, conditional inference trees and conditional inference forests select partitions based on differences in a single statistic of interest within each type, specifically the mean of the conditional outcome distribution (Brunori et al., 2023). In contrast, the transformation tree (TrT) approach bases its splits or partitions on differences across multiple features of the distribution, including variance, skewness, and kurtosis (Hothorn & Zeileis, 2021). In this chapter, the TrT approach is employed to analyse the effects of different variables on the conditional outcome (MPCI or income) distribution. It reveals the configuration of variables that strongly influence the distribution and provides insights into specific conditional outcome distributions (Hothorn, 2018).
The TrT demonstrates the distributions obtained after applying the Bernstein polynomial transformation. For this study, the Bernstein polynomial of order 5 is used to transform the outcome variables, and a transformation tree model is employed to predict the types for each data point for ex-post income IOp analysis. The model predicts the income quantile position of each individual within each type to determine the ‘degree of effort’. Based on these income quantile positions, the mean outcome value for each quantile, as well as the population mean, are determined. The individual's outcome value is adjusted using the ratio of the population mean to the quantile mean, enabling the measurement of ex-post IOp. As shown in Table 3.8, the number of types generated by the transformation trees is 16. The overall Gini inequality in the ex-post approach is 0.392, indicating a moderate level of inequality among the sample population. However, IOp measures for income in the ex-post approach yield relatively smaller values compared to the ex-ante approach. The estimated opportunity Gini coefficient is 0.133, and the relative IOp value is 0.339. This indicates that around 34% of the overall income inequality is attributed to differences in the degree of effort. These results suggest that the ex-post IOp measures, which consider the entire distribution functions obtained through the transformation trees or the contribution of efforts in explaining the IOp, are lower than the measures based on mean differences in the ex-ante approaches or the contribution of circumstances.
Table 3.8
Transformation tree: ex-post IOp results
| Measure | Types/IOp |
| --- | --- |
| Types | 16 |
| Overall Gini | 0.392 |
| Absolute Gini | 0.133 |
| Relative IOp | 0.339 |
Source Authors’ calculations from PLFS, 2022–23
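The adjustment described above (rank each individual within their type, compute the mean outcome of each quantile, and rescale by the ratio of the population mean to the quantile mean) can be sketched in Python as follows. This is a simplified illustration: quantile positions are taken from raw within-type income ranks rather than from a fitted transformation tree, and the toy data, the number of tranches, and the function names are hypothetical.

```python
from collections import defaultdict


def gini(values):
    """Gini coefficient of a list of non-negative numbers."""
    xs = sorted(values)
    n = len(xs)
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2.0 * weighted / (n * sum(xs)) - (n + 1.0) / n


def ex_post_adjust(types, incomes, n_tranches):
    """Rescale incomes so that every tranche ends up with the population mean."""
    members_by_type = defaultdict(list)
    for i, t in enumerate(types):
        members_by_type[t].append(i)

    # Tranche = position of the individual in the income ranking of their own type
    tranche = [0] * len(incomes)
    for members in members_by_type.values():
        ranked = sorted(members, key=lambda i: incomes[i])
        for position, i in enumerate(ranked):
            tranche[i] = position * n_tranches // len(ranked)

    # Mean income of each tranche, pooled across types
    sums, counts = defaultdict(float), defaultdict(int)
    for q, y in zip(tranche, incomes):
        sums[q] += y
        counts[q] += 1
    population_mean = sum(incomes) / len(incomes)

    # Adjust each outcome by the ratio of population mean to tranche mean
    return [y * population_mean / (sums[q] / counts[q])
            for q, y in zip(tranche, incomes)]


# Hypothetical toy sample: three types with four individuals each
types = ["A"] * 4 + ["B"] * 4 + ["C"] * 4
incomes = [2000, 2500, 3000, 4000, 3000, 4000, 5500, 8000, 5000, 7000, 9000, 15000]

adjusted = ex_post_adjust(types, incomes, n_tranches=4)
absolute_iop = gini(adjusted)
relative_iop = absolute_iop / gini(incomes)
print(round(absolute_iop, 3), round(relative_iop, 3))
```

After the rescaling, differences between tranches are removed, so whatever inequality remains in the adjusted values reflects differences between individuals who occupy the same relative position in their respective types.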
The transformation tree, depicted in Fig. 3.4, highlights the significant factors influencing ex-post IOp. It reveals that parents’ education emerges as the most important factor, exhibiting statistically significant variations in average income between the two groups. The first group consists of individuals whose parents are educated at the graduate level or above. The second group comprises individuals whose parents are educated below the graduate level, including up to secondary or higher secondary, below secondary, and no formal education. The first group, with graduate or above-educated parents, is further subdivided by parents’ occupation, distinguishing those with parents in medium or high-skilled jobs from those with parents in low or unskilled jobs. For individuals with parents in medium and high-skilled jobs, a further split is made into two broad regions: North, Northeast, South, West (NNESW), and East and Central (EC), which are further subdivided into rural and urban areas.
Fig. 3.4
Transformation tree for MPCI. Note R: Rural; U: Urban; N: North; NE: North East; S: South; W: West; E: East; C: Central; Sec/HS: Secondary/Higher Secondary; GradAbv: Graduate and Above; NoEdu: Illiterate or No Formal Schooling; BS: Below Secondary; US: Unskilled; Low: Low Skilled; Med: Medium Skilled; High: High Skilled; M: Male; F: Female.
Source Authors’ calculations from PLFS, 2022–23
The second group, comprising individuals whose parents have an education below the graduate level, is subdivided into two broad regions: North, Northeast, South, West (NNESW) and East and Central (EC). For those located in the NNESW regions, an additional division is made based on the sector (rural–urban). Urban individuals are further subdivided based on their parents’ occupations into those whose parents have medium and high-skilled jobs and those whose parents have low and unskilled jobs. Rural areas in the NNESW regions are split again by region into two groups: one consisting of the Northeast and South, and the other consisting of the North and West. Similarly, individuals in the EC regions are divided based on rural and urban areas. In rural areas, there is an additional split based on social group: one group comprises individuals from the general caste, while the other consists of individuals from the ST, SC, and OBC categories. In urban areas, a further division is made based on parents’ education, with one group comprising individuals whose parents have an education up to the secondary or higher secondary level and the other comprising individuals whose parents have an education below the secondary level or no formal schooling.
The final nodes of the transformation tree confirm the results of the conditional inference tree. These results indicate the lowest income distribution among individuals whose parents are educated below the graduate level, reside in rural areas of the central and eastern regions, and belong to the ST, SC, or OBC caste group. The highest income distribution is observed among individuals whose parents are educated to the graduate level or above, are involved in high and medium-skilled occupations, and reside in urban areas of the north, northeast, south, and western regions of India. Similar results can also be seen from the Expected Conditional Distribution Function (ECDF) as depicted in Fig. 3.5. The lowest average income (MPCI) is clearly seen at the leftmost node (Node 5), while the highest is at the rightmost part of the figure (Node 31), representing the two groups discussed above.
Fig. 3.5
Expected cumulative distribution functions for MPCI.
Source Authors' calculations from PLFS, 2022–23
Ex-Post Shapley Value Decompositions: The final step in the ex-post analysis, similar to the ex-ante approach, is to assess the relative importance of individual circumstance variables using the Shapley value decomposition, as shown in Fig. 3.6. The ranking of the leading factors is similar to that obtained in the ex-ante analysis. Parents’ education (46.3%) emerges as the most influential factor, indicating that different levels of parents’ education significantly contribute to income IOp. This is followed by region (27.7%) and sector (18.9%), which also demonstrate considerable importance in explaining income IOp. However, parents’ occupation and social groups have a minimal role in explaining income IOp, with contributions of 5.6% and 1.5%, respectively.
Fig. 3.6
Decomposition of factors contributing to ex-post IOp (in %).
Source Authors' calculations from PLFS, 2022–23

3.5.4 Regional Analysis

Labour Income IOp: Analysis of labour income IOp, measured by the Gini, shows significant variations across Indian states. Jharkhand has the highest income IOp at 65%, while Himachal Pradesh has the lowest at 35% (Map 3.1 and Table 3.11).
High income IOp is particularly pronounced in the eastern (Jharkhand and Odisha) and central (Chhattisgarh and Madhya Pradesh) regions. In the eastern region, Jharkhand leads with an income IOp of 65%, followed by Odisha at 58%. West Bengal stands at 48%, and Bihar is at 40%. In central India, Chhattisgarh shows a high income IOp of 59%, closely followed by Madhya Pradesh at 58%. Uttarakhand has an income IOp of 44%, and Uttar Pradesh stands at 42%. In Northern India, Delhi has a high income IOp of 52%, followed by Haryana at 51%, while Jammu and Kashmir and Punjab each have 50%. Rajasthan has an income IOp of 46%, and Himachal Pradesh has the lowest in the north at 35%. In Western India, Gujarat has the highest income IOp at 50%, followed by Maharashtra at 47%. In Southern India, Telangana exhibits the highest income IOp at 52%, followed by Karnataka at 49%, Andhra Pradesh at 41%, Tamil Nadu at 40%, and Kerala at 38%.
These patterns align with the income IOp trends discussed in Chapter 2, highlighting that both less developed states in the eastern and central regions and more developed states in the southern and western regions face challenges related to unequal resource distribution, concentrated benefits of economic growth among a few, and difficulties in implementing effective welfare programs (Map 3.1).
Map 3.1
Regional income IOp (Gini).
Source Authors' calculations from PLFS, 2022–23
The factors contributing to income IOp vary significantly across different states in India (Table 3.12). Two of the most important factors driving unequal income opportunities are an individual's parental occupation and where they live, whether in a rural or urban area.
  • Parental Occupation: In states like Assam, Odisha, Jharkhand, Telangana, Maharashtra, Karnataka, Uttar Pradesh, Gujarat, and Himachal Pradesh, parental occupation plays a major role in determining income IOp. In these states, individuals born into families where parents have lower skilled or informal jobs tend to have fewer opportunities to earn higher incomes, compared to those whose parents hold higher-status jobs. This creates a persistent cycle of unequal opportunity in earnings (well-paid occupations) that limits social mobility.
  • Location of Residence: In other states, such as Rajasthan, Andhra Pradesh, Tamil Nadu, West Bengal, Haryana, Kerala, Bihar, Punjab, Uttarakhand, Jammu and Kashmir, and Chhattisgarh, whether a person is born in a rural or urban area is a key factor explaining income IOp. Rural areas often have fewer job opportunities, lower access to education, and weaker infrastructure, leading to unequal chances of raising one’s income or entering well-paid occupations compared to people born in urban centres with better access to resources.
  • Gender: In states like Delhi and Madhya Pradesh, gender is the leading factor contributing to income IOp. In these states, men often have more access to well-paying jobs, career advancement, and education compared to women, creating a significant gender gap in income IOp.
This variation in factors across states highlights the complex nature of income inequality in India and the need for targeted policy solutions that address these specific drivers. The findings emphasize the role of social background, location, and gender in shaping an individual's economic future.

3.6 Summary and Conclusion

This study provides both ex-ante and ex-post estimates of income inequality of opportunity (IOp) at the national level. It is also the first attempt to determine and represent the types and structure of opportunities in Indian society through the use of conditional inference trees, conditional inference forests, and transformation trees. These tree-based methodologies allow for graphical representations of the opportunities provided by society, making the results easily communicable to policymakers and other stakeholders. The ex-ante estimate of income IOp is relatively higher than the ex-post estimate, highlighting differences in interpretation and understanding of IOp within society. Using the ex-ante approach, approximately 49–63% of total income inequality can be attributed to differences in circumstances. In contrast, the ex-post method suggests that around 34% of total income inequality is explained by differences within tranches, or efforts.
The tree-based analysis reveals that parents’ education, region (geographical location), and area of residence (rural or urban) are the most important variables in determining income IOp in Indian society, followed by parents’ occupation and social group. The ex-ante and ex-post Shapley decomposition exercises further confirm that parents’ education, geographic location, and sector (rural–urban areas) are the most significant circumstances contributing to income IOp, with parents’ occupation also important in the ex-ante decomposition. In particular, individuals in the central and eastern regions, those residing in rural areas, those whose parents are employed in low-skilled and unskilled occupations, those with below secondary education or no formal education, and those belonging to marginalized social groups exhibit significantly lower average incomes. The regional analysis shows that income IOp varies widely across Indian states, with significant differences even within developed and underdeveloped regions.
This again highlights the urgent need for targeted regional-state level development policies that address the needs of marginalized groups to foster a more equitable society and reduce overall income inequality in India. The study's findings underscore the importance of targeted interventions to address income IOp and suggest that policies should prioritize improving educational opportunities, job prospects, and living conditions for disadvantaged groups. By doing so, India can make significant strides towards reducing income inequality and promoting social equity.
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Appendices

Appendix 1: Grid Search Cross-Validation (CV) Process for Conditional Inference Tree and Conditional Inference Forest

The Grid Search Cross-Validation (CV) process involves splitting the dataset into training and test sets to evaluate model performance. This process tests various combinations of hyperparameters—specifically, the minimum number of observations required to perform a split (min-split) and alpha values. The goal is to identify the combination that results in the lowest root mean squared error (RMSE) on the test set, where RMSE measures the accuracy of the model's predictions.
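The tuning loop can be sketched in Python as below. This is an assumption-laden stand-in rather than the procedure actually run for the chapter: the model is a one-split "stump" that splits on a single binary circumstance only when the training set has at least `min_split` observations and a normal-approximation test of the difference in means is significant at level `alpha`, and the evaluation uses a k-fold variant of the train/test split. The toy data, grid values, and function names are all hypothetical.

```python
import math
from itertools import product


def fit_stump(train, alpha, min_split):
    """Stand-in model: split on a binary circumstance only if the test allows it.

    train is a list of (circumstance, income) pairs with circumstance in {0, 1}.
    Returns a dict mapping each circumstance value to a predicted income.
    """
    ys = [y for _, y in train]
    overall = sum(ys) / len(ys)
    g0 = [y for c, y in train if c == 0]
    g1 = [y for c, y in train if c == 1]
    if len(train) < min_split or len(g0) < 2 or len(g1) < 2:
        return {0: overall, 1: overall}
    m0, m1 = sum(g0) / len(g0), sum(g1) / len(g1)
    v0 = sum((y - m0) ** 2 for y in g0) / (len(g0) - 1)
    v1 = sum((y - m1) ** 2 for y in g1) / (len(g1) - 1)
    se = math.sqrt(v0 / len(g0) + v1 / len(g1))
    if se == 0:
        p_value = 0.0 if m0 != m1 else 1.0
    else:
        # Two-sided p-value under a normal approximation
        p_value = math.erfc(abs(m0 - m1) / se / math.sqrt(2))
    return {0: m0, 1: m1} if p_value < alpha else {0: overall, 1: overall}


def rmse(model, test):
    return math.sqrt(sum((y - model[c]) ** 2 for c, y in test) / len(test))


def grid_search_cv(data, alphas, min_splits, n_folds=4):
    """Return the (alpha, min_split) pair with the lowest mean test RMSE."""
    results = {}
    for alpha, min_split in product(alphas, min_splits):
        fold_errors = []
        for k in range(n_folds):
            test = data[k::n_folds]
            train = [row for i, row in enumerate(data) if i % n_folds != k]
            fold_errors.append(rmse(fit_stump(train, alpha, min_split), test))
        results[(alpha, min_split)] = sum(fold_errors) / n_folds
    return min(results, key=results.get), results


# Hypothetical toy data: incomes differ by about 2000 between the two groups
data = [(i % 2, 3000 + 2000 * (i % 2) + 100 * (i % 5)) for i in range(40)]
best, results = grid_search_cv(data, alphas=[0.01, 0.05, 0.07], min_splits=[10, 50])
print(best, round(results[best], 1))
```

In this toy setting a `min_split` of 50 exceeds the training size, so the stump never splits and its RMSE is much higher; the grid search therefore selects the smaller value, mirroring how the lowest-RMSE combination is chosen in the text.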
Conditional Inference Tree Model: For the Conditional Inference Tree model with MPCI as the dependent variable, the Grid Search CV was conducted to find optimal hyperparameters. After testing various combinations, it was found that an alpha value of 0.07 and a min-split value of 10,000 resulted in the lowest RMSE. To assess the robustness of the chosen alpha value, we compared the results with alpha values of 0.01 and 0.05, as detailed in Table 3.9. This comparison follows the methodology outlined by Salas-Rojo and Rodríguez (2022).
Table 3.9
Ctree results for MPCI
| Alpha | Types | Overall inequality (Gini) | Absolute Gini | IoP Gini |
| --- | --- | --- | --- | --- |
| 0.07 | 14 | 0.392 | 0.208 | 0.532 |
| 0.01 | 14 | 0.392 | 0.208 | 0.532 |
| 0.05 | 14 | 0.392 | 0.208 | 0.532 |
Source Authors' calculations from PLFS, 2022–23
Table 3.10
Cforest results for MPCI
| Alpha | Types | Overall inequality (Gini) | Absolute Gini | IoP Gini |
| --- | --- | --- | --- | --- |
| 0.06 | 16 | 0.392 | 0.190 | 0.486 |
| 0.01 | 16 | 0.392 | 0.190 | 0.486 |
| 0.05 | 16 | 0.392 | 0.190 | 0.486 |
Source Authors' calculations from PLFS, 2022–23
Conditional Inference Forest Model: Similarly, for the Conditional Inference Forest model, the Grid Search CV process identified an alpha value of 0.06 and a tree count of 200 as yielding the lowest RMSE. To verify the robustness of this alpha value, we compared the results with alpha values of 0.01 and 0.05, as shown in Table 3.10.
These tables display the results for each combination of alpha values, showing the overall inequality, absolute Gini, and inequality of opportunity (IoP) Gini for the Conditional Inference Tree and Forest models. The consistency in results across different alpha values indicates the robustness of the chosen hyperparameters.

Regional IOp Tables

See Tables 3.11 and 3.12.
Table 3.11 Gini IOp measures for MPCI
| State | State code | Overall inequality (Gini, MPCI) | Absolute IOp (Gini, MPCI) | Relative IOp (Gini, MPCI) |
| --- | --- | --- | --- | --- |
| Jharkhand | 20 | 0.41 | 0.27 | 0.65 |
| Chhattisgarh | 22 | 0.44 | 0.26 | 0.59 |
| Madhya Pradesh | 23 | 0.38 | 0.22 | 0.58 |
| Odisha | 21 | 0.42 | 0.24 | 0.58 |
| Delhi | 7 | 0.41 | 0.21 | 0.52 |
| Telangana | 36 | 0.35 | 0.18 | 0.52 |
| Haryana | 6 | 0.36 | 0.18 | 0.51 |
| Jammu & Kashmir | 1 | 0.37 | 0.18 | 0.50 |
| Gujarat | 24 | 0.34 | 0.17 | 0.50 |
| Punjab | 3 | 0.38 | 0.19 | 0.50 |
| Assam | 18 | 0.31 | 0.16 | 0.50 |
| Karnataka | 29 | 0.36 | 0.17 | 0.49 |
| West Bengal | 19 | 0.34 | 0.16 | 0.48 |
| Maharashtra | 27 | 0.44 | 0.21 | 0.47 |
| Rajasthan | 8 | 0.42 | 0.19 | 0.46 |
| Uttarakhand | 5 | 0.35 | 0.15 | 0.44 |
| Uttar Pradesh | 9 | 0.41 | 0.17 | 0.42 |
| Andhra Pradesh | 28 | 0.37 | 0.15 | 0.41 |
| Bihar | 10 | 0.31 | 0.12 | 0.40 |
| Tamil Nadu | 33 | 0.35 | 0.14 | 0.40 |
| Kerala | 32 | 0.38 | 0.15 | 0.38 |
| Himachal Pradesh | 2 | 0.47 | 0.16 | 0.35 |
Source: Authors' calculations from PLFS, 2022–23
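Table 3.12 Variable importance for each state (%)
| State | Region | Parents' occupation | Sector | Gender | Social group | Parents' education | Major variable |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Delhi | North | 34.3 | 19.4 | 46.4 | | | Gender |
| Madhya Pradesh | Central | 7.5 | 11.1 | 42.5 | 31.6 | 7.3 | Gender |
| Assam | North East | 91.3 | † | † | † | † | Parents' occupation |
| Odisha | East | 78.7 | 4.4 | 7.5 | 2.5 | 7 | Parents' occupation |
| Jharkhand | East | 67.3 | † | † | † | † | Parents' occupation |
| Telangana | South | 66.5 | † | † | † | † | Parents' occupation |
| Maharashtra | West | 64 | 1.1 | 10.2 | 3.5 | 21.3 | Parents' occupation |
| Karnataka | South | 55.2 | 32.1 | 4.5 | 8 | 0.3 | Parents' occupation |
| Uttar Pradesh | Central | 54.8 | 6.5 | 12.7 | 22.4 | 3.6 | Parents' occupation |
| Gujarat | West | 53.9 | 17.6 | 15.1 | 2.7 | 10.7 | Parents' occupation |
| Himachal Pradesh | North | 47.8 | 16.8 | 21.8 | 12.7 | 0.9 | Parents' occupation |
| Rajasthan | North | 27.4 | 41.8 | 11.9 | 11.8 | 7.1 | Sector |
| Andhra Pradesh | South | 19.4 | 35.2 | 6.9 | 30.2 | 8.3 | Sector |
| Tamil Nadu | South | 19.1 | 48.9 | 4 | 7 | 21 | Sector |
| West Bengal | East | 17.2 | 69.5 | 2.5 | 7.3 | 3.6 | Sector |
| Haryana | North | 16.7 | 61.6 | 6.6 | 3.3 | 11.8 | Sector |
| Kerala | South | 13.3 | 42.9 | 20.1 | 22.5 | 1.1 | Sector |
| Bihar | East | 9.7 | 61.4 | 15 | 4.7 | 9.2 | Sector |
| Punjab | North | 9.3 | 56.4 | 3.8 | 15.5 | 15 | Sector |
| Uttarakhand | Central | 9.2 | 61.3 | 5.2 | 8.9 | 15.3 | Sector |
| Jammu & Kashmir | North | 7.3 | 67 | 5.8 | 10.8 | 9.1 | Sector |
| Chhattisgarh | Central | 4.6 | 71 | † | † | † | Sector |
† The row reports fewer than five values and the column positions of the remaining ones cannot be recovered. In source order they are: Assam 4.2, 4.4; Jharkhand 6.8, 26.1; Telangana 11.6, 8.8, 13.2; Chhattisgarh 8.8, 15.7.
Source: Authors' calculations from PLFS, 2022–23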

Density Plots for Terminal Nodes

See Fig. 3.7.
Fig. 3.7 Plots for MPCI. Source: Authors' calculations from PLFS, 2022–23
Footnotes
1. It is widely used in computer graphics to model smooth curves (Farouki, 2012). It outperforms competitors, such as kernel estimators, in approximating distribution functions (Leblanc, 2012).
2. Unlike Chapter 2, which uses the Mean Log Deviation (MLD) to measure IOp, this chapter uses the Gini coefficient to calculate ex-ante and ex-post IOp. For this reason, the ex-ante IOp measures for MPCI in Chapter 2 and in this chapter might not match.
References
Arneson, R. (1989). Equality and equal opportunity for welfare. Philosophical Studies, 56(1), 77–93.
Bourguignon, F. (2004). The poverty-growth-inequality triangle (Indian Council for Research on International Economic Relations Working Paper, 131, 35).
Brunori, P., & Neidhöfer, G. (2021a). Inequality of opportunity in comparative perspective: Recent advances and challenges. In Handbook of income distribution (Vol. 3B, pp. 1393–1479). Elsevier.
Chancel, L., Piketty, T., Saez, E., & Zucman, G. (2022). World inequality report 2022. UNDP, World Inequality Lab.
Cohen, G. A. (1989). On the currency of egalitarian justice. Ethics, 99(4), 906–944.
Das, P., & Biswas, S. (2022). Social identity, gender and unequal opportunity of earning in urban India: 2017–2018 to 2019–2020. Indian Journal of Labour Economics, 65(1), 39–57.
Dworkin, R. (1981a). What is equality? Part 2: Equality of resources. Philosophy & Public Affairs, 10(4), 283–345.
Dworkin, R. (1981b). What is equality? Part 1: Equality of welfare. Philosophy & Public Affairs, 10(3), 185–246.
Farouki, R. T. (2012). The Bernstein polynomial basis: A centennial retrospective. Computer Aided Geometric Design, 29(6), 379–419.
Ferreira, F., & Peragine, V. (2015). Equality of opportunity: Theory and evidence (IZA Discussion Papers, No. 8994).
Fleurbaey, M. (1995). Equal opportunity or equal social outcome? Economics and Philosophy, 11(1), 25–55.
Fleurbaey, M. (2008). Fairness, responsibility, and welfare. Oxford University Press.
Fleurbaey, M., & Peragine, V. (2013). Ex post inequalities and ex ante inequalities. In Justice, political liberalism, and utilitarianism: Themes from Harsanyi and Rawls (pp. 59–77). Cambridge University Press.
Hufe, P., Peichl, A., Roemer, J., & Ungerer, M. (2017). Inequality of income acquisition: The role of childhood circumstances. Social Choice and Welfare, 49, 499–544.
Institute for Human Development. (2014). India Labour and Employment Report, 2014: Workers in the era of globalization. Academic Foundation.
Kuznets, S. (1955). Economic growth and income inequality. The American Economic Review, 45(1), 1–28.
Leblanc, A. (2012). On estimating distribution functions using Bernstein polynomials. Annals of the Institute of Statistical Mathematics, 64, 919–943.
Lefranc, A., & Kundu, S. (2020). Machine learning approaches to inequality of opportunity measurement: A comparative study. Applied Economics, 52(16), 1723–1741.
Motiram, S. (2018). Inequality of opportunity in India: Concepts, measurement and empirics. Indian Journal of Human Development, 12(2), 236–247.
Piketty, T. (2011). On the long-run evolution of inheritance: France 1820–2050. The Quarterly Journal of Economics, 126(3), 1071–1131.
Plassot, M., Ramos, X., & Van de Gaer, D. (2022). The ex-ante and ex-post measurement of inequality of opportunity: A normative framework. Review of Income and Wealth, 68(1), 4–31.
Ramos, X., & Van de Gaer, D. (2016). Approaches to inequality of opportunity: Principles, measures and evidence. Journal of Economic Surveys, 30(5), 855–883.
Ramos, X., & Van de Gaer, D. (2021). Is inequality of opportunity robust to the measurement approach? Review of Income and Wealth, 67(1), 18–36.
Rawls, J. (1958b). Justice as fairness. The Philosophical Review, 67(2), 164–194.
Roemer, J. E. (1993). A pragmatic theory of responsibility for the egalitarian planner. Philosophy & Public Affairs, 22(2), 146–166.
Roemer, J. E. (2002). Equality of opportunity: A progress report. Social Choice and Welfare, 19(2), 455–471.
Salas-Rojo, P., & Rodríguez, J. G. (2022). Inheritances and wealth inequality: A machine learning approach. The Journal of Economic Inequality, 20(1), 27–51.
Sen, A. (1980). Equality of what? In Tanner Lectures on human values (Vol. 1). Cambridge University Press.
Singh, A. (2012). Inequality of opportunity in earnings and consumption expenditure: The case of Indian men. Review of Income and Wealth, 58(1), 79–106.
Shapley, L. S. (1953). A value for n-person games.
Van De Gaer, D. (1993). Equality of opportunity and investment in human capital. Katholieke Universiteit Leuven.
Weisskopf, T. E. (2011). Why worry about inequality in the booming Indian economy? Economic and Political Weekly, 46(47), 41–51.
Wendelspiess, F., & Soloaga, I. (2014). Iop: Estimating ex-ante inequality of opportunity. The Stata Journal, 14(4), 830–846.
Metadata
Title: Decomposition of Inequality of Opportunity
Authors: Balwant Singh Mehta, Ravi Srivastava, Siddharth Dhote
Copyright year: 2025
Publisher: Springer Nature Singapore
DOI: https://doi.org/10.1007/978-981-96-2544-4_3