20-09-2021 Original Paper Open Access
Lower and upper bound estimates of inequality of opportunity for emerging economies
Journal: Social Choice and Welfare
Important notes
We thank John E. Roemer (editor) and two anonymous referees for many insightful comments on earlier drafts. This paper benefited from discussions with Paolo Brunori, Daniel Mahler, and Valentin Lang. Furthermore, we are grateful to seminar and conference audiences in Canazei, Luxembourg and Munich. We gratefully acknowledge funding from Deutsche Forschungsgemeinschaft (DFG) through NORFACE project “IMCHILD: The impact of childhood circumstances on individual outcomes over the lifecourse” (PE 1675/51).
1 Introduction
Equality of opportunity (EOp) is an ideal of distributive justice that garners widespread public support and is plausibly related to macroeconomic indicators of development (Marrero and Rodríguez 2013; Ferreira et al. 2018; Aiyar and Ebeke 2019; Cappelen et al. 2007; Alesina et al. 2018). However, limitations in the underlying data sources lead to both upward and downward biased estimates of inequality of opportunity (IOp). Both biases are potentially large in emerging countries, where data quality is arguably worse than in industrialized economies. It is not clear ex ante which of the two biases prevails, that is, whether IOp estimates tend to be biased downward or upward. In this paper, we address this uncertainty by constructing lower bound (LB) and upper bound (UB) estimates of IOp for twelve emerging economies and comparing them to estimates from the conventional approach.
EOp distinguishes ethically justifiable (fair) inequalities from unjustifiable (unfair) inequalities using the concepts of circumstances and effort.^{1} Circumstances are defined as all factors that are not under the control of the individual, for instance biological sex, parental background, and birthplace. In contrast, working hours and educational decisions are under the (partial) control of individuals and are therefore characterized as efforts. Opportunity egalitarians consider inequalities based on exogenous circumstances as unfair, while inequalities resulting from effort exertion are deemed fair sources of inequality (among others Cohen 1989; Arneson 1989).
This distinction is not only relevant from a normative perspective but also provides important insights into the patterns and drivers of economic development (Marrero and Rodríguez 2013; Peragine et al. 2014; Ferreira et al. 2018; Neidhöfer et al. 2018). For instance, a leveled playing field fosters human capital accumulation by providing incentives for skill acquisition (Mejía and St-Pierre 2008). Furthermore, circumstance-based variation in life outcomes reflects horizontal inequality and segregation, both of which are important drivers of social tensions and conflict (Rohner 2011).
What we call the “standard approach” (S) towards IOp estimation in this paper constructs a counterfactual distribution of life outcomes from a linear prediction using all circumstance information observable by the econometrician. In line with the opportunity-egalitarian doctrine, inequality in this counterfactual distribution is considered “unfair” since it only varies with immutable circumstance characteristics. Due to limitations in the underlying data sources, this conventional method can lead to both upward and downward biased empirical measurements of IOp. First, due to the partial observability of circumstances, standard IOp estimates tend to be downward biased (Balcázar 2015; Hufe et al. 2017). The downward bias may be particularly pronounced in countries that lack household surveys combining information on the outcome of interest with rich information on individual characteristics. Most emerging economies fall into this category. Second, if the ratio between the number of parameters to be estimated and the available degrees of freedom is large, the ensuing noise in the parameter estimates will artificially inflate the measured impact of observed circumstances on individual life outcomes (Brunori et al. 2019b). Emerging economies may again be particularly susceptible to such upward bias in standard IOp estimates since the sample sizes of available household surveys tend to be comparatively small. Ex ante it is unclear which of the two biases prevails for the group of emerging economies. As a consequence, policy makers who rely on standard estimates may over- or underestimate the true degree of IOp and enact policy measures without considering the uncertainty around such estimates (Kanbur and Wagstaff 2016).
In this paper, we address the uncertainty around empirical IOp estimates by drawing on longitudinal household surveys from twelve emerging economies which enable us to estimate both LB and UB measures of IOp. First, we calculate LB measures of IOp by estimating the impact of observable circumstances on incomes with a cross-validated lasso procedure. Assessing statistical models by out-of-sample cross-validation disciplines the process of model selection and therefore prevents overfitting the circumstance parameters to the estimation sample. As a consequence, the relevant circumstance parameters are estimated with less noise, which in turn cushions upward biases in IOp measures.
Second, we leverage the panel dimension of the data to calculate UB estimates based on the individual fixed effect (FE) estimator proposed in Niehues and Peichl (2014). By their most common definition, circumstance characteristics are time-constant but partly unobservable by the econometrician. Individual FEs capture the full set of unobservable circumstances and therefore yield the maximum amount of outcome variation that can be explained by circumstances. However, individual FEs also capture time-constant effort variables and therefore may overstate the extent of unequal opportunities. Hence, they yield an upper bound of the true IOp estimate.
Our results can be summarized as follows. In emerging economies the standard approach of estimating inequality of opportunity produces results that closely align with the lower bound. In theory, the restricted data infrastructures of many emerging economies could lead to either upward biased (small sample sizes) or downward biased (little circumstance information) estimates. In practice, the latter concern clearly dominates the former in our sample. With respect to individual (equivalized household) incomes, the average difference between the standard estimate and the lower bound estimate is 5.7 (5.0) percentage points (pp). In contrast, the average distance between the standard estimate and the upper bound estimate is 22.8 pp (28.5 pp).
These results from emerging economies contrast with recent evidence for European countries. For example, Brunori et al. (2018) show for a set of European countries that standard estimates may be upward biased by up to 300%. This contrast emphasizes that the particularities of data environments are crucial for an assessment of the relative importance of upward and downward biases. Moreover, the large distance between the standard estimate and the upper bound estimate in emerging economies emphasizes the concern of providing misleading reference points to policymakers who could use downward-biased estimates of IOp to downplay the moral significance of inequality (Kanbur and Wagstaff 2016). In the absence of data innovations, providing reasonable bounds on inequality of opportunity may be the only way to address such concerns. Our paper is the first to conduct such a bounding exercise for a set of emerging economies with broad geographical coverage and thereby contributes to the growing literature on EOp in these countries.^{2}
The remainder of this paper is organized as follows. In Sect. 2 we formalize the EOp concept and outline the corresponding estimation strategies for its LB and UB measures. After introducing the data sources in Sect. 3, we present results and robustness analyses for both LB and UB estimates in Sect. 4. Section 5 concludes the paper.
2 Conceptual framework
Important life outcomes such as income and consumption are determined by an extensive vector of personal characteristics that can be subsumed by a binary classification into circumstances and efforts. Those characteristics that are completely beyond the realm of individual control are called circumstances. In contrast, those characteristics that are at least partially controlled by individuals are called efforts. The more the distribution of outcomes depends on circumstances, the stronger the violation of the opportunity-egalitarian ideal and the higher the measure of inequality of opportunity.
Consider a finite population indexed by \(i\in \{1,\ldots ,N\}\).^{3} Each individual is characterized by the tuple \(\{y_{it}, \mathbf {C_i}, \mathbf {E_{it}}\}\), where \(y_{it}\) constitutes the period-specific outcome of interest, \(\mathbf {C_i}\) the vector of time-invariant circumstances, and \(\mathbf {E_{it}}\) period-specific effort. Life outcomes are a function of circumstances and efforts^{4}:
$$\begin{aligned} y_{it}=f(\mathbf {C_i},\mathbf {E_{it}}(\mathbf {C_i})). \end{aligned}$$
(1)
Note that we allow circumstances to have a direct and an indirect impact on the outcome of interest. For example, certain groups may be excluded from offices and positions based on outright discrimination (direct impact). However, such discrimination may also lead to adjustments in individual effort exertion since the imposed circumstance constraints alter the individual optimization calculus (indirect impact). Whether the correlation between circumstances and efforts contributes to the fair or the unfair part of inequality is widely debated (Jusot et al. 2013). In this paper we follow Roemer (1998) who proposes that outcome differences due to a correlation between circumstances and effort constitute a violation of EOp.^{5}
The literature on EOp further distinguishes the ex-ante from the ex-post approach (Ramos and Van De Gaer 2016). While the ex-ante approach requires that there are no differences in life outcomes across circumstance types, the ex-post approach demands that individuals exerting the same effort enjoy the same level of advantage. In this paper we focus on the ex-ante approach. That is, we use \(\mathbf {C_i}\) to construct a partition of disjoint types \(\Pi =\{T_1,\ldots ,T_P\}\) such that all members of a type are homogeneous in circumstances. The average outcome of type \(k\) is denoted by \(\mu ^k_t\). EOp is achieved if type means in period \(t\) are equalized across types, i.e. if \(\mu ^k_{t}=\mu ^l_{t}~\forall ~l,k \mid T_k,T_l\in \Pi \).
Computing inequality in a counterfactual distribution \(M_{t}=\left( \mu ^1_{1t},\ldots ,\mu ^k_{it},\ldots ,\mu ^P_{Nt}\right) \), in which each individual \(i\) of type \(k\) is assigned its corresponding type outcome \(\mu ^k_{t}\), yields a scalar measure of IOp. It decreases with Pigou-Dalton transfers between circumstance types but is invariant to such transfers within circumstance types. Inequality in the counterfactual distribution of type means can thus be considered unfair as it only depends on disparities due to immutable circumstance characteristics.
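The construction of the counterfactual distribution of type means can be illustrated with a minimal sketch. The data, the type labels, and the helper name `type_mean_distribution` are hypothetical and serve only to show why transfers within a type leave the counterfactual distribution unchanged while transfers between types do not.

```python
from collections import defaultdict

def type_mean_distribution(outcomes, types):
    """Assign every individual the mean outcome of her circumstance type."""
    totals, counts = defaultdict(float), defaultdict(int)
    for y, k in zip(outcomes, types):
        totals[k] += y
        counts[k] += 1
    return [totals[k] / counts[k] for k in types]

# Hypothetical toy data: four individuals in two circumstance types.
types = ["rural", "rural", "urban", "urban"]
outcomes = [10.0, 30.0, 40.0, 80.0]
counterfactual = type_mean_distribution(outcomes, types)

# A transfer of 5 units within the "rural" type leaves the type means unchanged.
within_transfer = [15.0, 25.0, 40.0, 80.0]
after_within = type_mean_distribution(within_transfer, types)

# A transfer of 5 units from "urban" to "rural" narrows the gap between type means.
between_transfer = [15.0, 30.0, 40.0, 75.0]
after_between = type_mean_distribution(between_transfer, types)
```

Any inequality index applied to `counterfactual` therefore reacts only to redistribution across types, which is the property the scalar IOp measure relies on.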
Standard estimation (S) The standard approach towards IOp measurement (Bourguignon et al. 2007; Ferreira and Gignoux 2011) constructs an estimate for the counterfactual distribution of type means in a two-step procedure. First, for the year of interest \(t\) we estimate:
$$\begin{aligned} \ln y_{it}=\alpha + \varvec{\beta } * \mathbf {C_i} + \epsilon _{it}. \end{aligned}$$
(2)
Note that this specification accounts for both the direct and the indirect effect of circumstances since the correlation between \(\mathbf {C_i}\) and \(\mathbf {E_{it}}\) is implicitly captured by \(\varvec{\beta }\). Second, we use the vector of estimated parameters \(\varvec{\hat{\beta }}\) to parametrically construct an estimate for the distribution of type means \({\tilde{M}}^S_{t}=\left( {\tilde{\mu }}^S_{1t},\ldots ,{\tilde{\mu }}^S_{it},\ldots ,{\tilde{\mu }}^S_{Nt}\right) \)^{6}:
$$\begin{aligned} {\tilde{\mu }}_{it}^{S} = \exp \bigg \{ \hat{\alpha } + \varvec{\hat{\beta }}*\mathbf {C_i} + \frac{\sigma ^2}{2} \bigg \}. \end{aligned}$$
(3)
Lower bound estimation (LB) Conceptually, Ferreira and Gignoux (2011) show that the outlined standard estimate of IOp is a LB of its true value if the circumstance vector \(\mathbf {C_i}\) contains only a subset of all relevant circumstances. Empirically, however, this lower bound measure may be upward biased due to sampling variance in the distribution of type means (Brunori et al. 2019b). With decreasing sample size and increasing size of the circumstance set, the available degrees of freedom to estimate \(\varvec{\beta }\) shrink. The ensuing noise in \(\varvec{\hat{\beta }}\) artificially inflates the variance in the distribution of estimated type means \({\tilde{M}}_{t}^{S}\), which in turn leads to upward biased lower bound measures of IOp.
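The two-step standard estimate in Eqs. 2 and 3 can be sketched as follows. This is an illustrative sketch on simulated data, not the authors' code: the function names, the simulated circumstances, and the choice of the degrees-of-freedom-corrected residual variance for \(\sigma ^2\) are assumptions. Relative IOp is computed with the mean log deviation.

```python
import numpy as np

def mld(x):
    """Mean log deviation of a strictly positive array."""
    x = np.asarray(x, dtype=float)
    return float(np.mean(np.log(x.mean() / x)))

def standard_relative_iop(log_y, C):
    """Share of MLD inequality attributed to observed circumstances (Eqs. 2 and 3)."""
    n = len(log_y)
    X = np.column_stack([np.ones(n), C])              # intercept plus circumstances
    coef, *_ = np.linalg.lstsq(X, log_y, rcond=None)  # step 1: OLS of ln(y) on C
    resid = log_y - X @ coef
    sigma2 = resid @ resid / (n - X.shape[1])         # assumed residual-variance estimate
    mu_tilde = np.exp(X @ coef + sigma2 / 2)          # step 2: smoothed distribution
    return mld(mu_tilde) / mld(np.exp(log_y))

# Simulated data (hypothetical): two binary circumstances plus unexplained variation.
rng = np.random.default_rng(0)
n = 500
C = rng.integers(0, 2, size=(n, 2)).astype(float)
log_y = 1.0 + 0.5 * C[:, 0] + 0.3 * C[:, 1] + rng.normal(0.0, 0.6, n)
share = standard_relative_iop(log_y, C)
```

Because the term \(\sigma ^2/2\) is a common constant, it rescales all predictions by the same factor and does not affect scale-invariant inequality measures such as the MLD.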
The literature has proposed different methods to address the upward bias in IOp estimates. Using the European Union Survey on Income and Living Conditions (EU-SILC), Brunori et al. (2019b) select models by 5-fold cross-validation. To this end, the authors pre-specify a large variety of potential models which differ in circumstance characteristics and their interactions. After estimating these models on random folds of the data, the algorithm chooses the model which minimizes the average out-of-sample mean squared error.^{7} Alternative approaches to model selection are conditional inference trees and forests (Brunori et al. 2018). The regression tree method recursively splits the data according to the circumstance variables which have the strongest association with the outcome of interest, while regression forests provide average estimates over multiple regression trees applied to random subsets of the data.
In this work we calculate lower bound estimates based on two different cross-validated lasso estimations that select the relevant circumstances to maximize the out-of-sample prediction accuracy of the model. Lasso estimations have two advantages in comparison to previous methods. First, one does not have to pre-specify the models to be evaluated by cross-validation, the preferred method in Brunori et al. (2019b). Second, they are less computationally expensive than random forests, the preferred method in Brunori et al. (2018). In Fig. 4, we use EU-SILC data to validate the lasso methodology against the findings of Brunori et al. (2018, 2019b). Both lasso estimates align very closely with the alternative estimation procedures. The implied Pearson correlation coefficients are 0.90/0.87 in comparison to the findings of Brunori et al. (2019b), and 0.91/0.89 in comparison to the findings of Brunori et al. (2018). None of the correlation coefficients is statistically different from one at the 5% significance level.
In both estimation approaches, we first estimate
$$\begin{aligned} \underset{\varvec{\beta }}{\mathrm {argmin}}\sum _{i}\underbrace{\left( \ln y_{it} - \alpha ^{lasso} - \sum _{j}\beta _j^{lasso} * C_{ij}\right) ^2}_{(1)} + \underbrace{\sum _{j}\lambda \left| \beta _j^{lasso}\right| }_{(2)}. \end{aligned}$$
(4)
Part (1) of Eq. 4 mirrors the OLS objective used to estimate Eq. 2. Part (2), however, introduces a penalization term that varies with the absolute value of the estimated coefficient \(\hat{\beta _j}^{lasso}\). The larger (smaller) the penalization term \(\lambda \), the more (less) parsimonious the model and the lower the variance (bias) in the predictions based on the parameter vector \(\hat{\varvec{\beta }}^{lasso}\). We choose the optimal parameterization of \(\lambda \) by means of 5-fold cross-validation.^{8}
The first lower bound estimate (LB1) uses the resulting vector \(\hat{\varvec{\beta }}^{lasso}\) to construct the counterfactual distribution \({\tilde{M}}_{t}^{LB1}=({\tilde{\mu }}^{LB 1}_{1t},\ldots ,{\tilde{\mu }}^{LB 1}_{it},\ldots ,{\tilde{\mu }}^{LB 1}_{Nt})\):
$$\begin{aligned} {\tilde{\mu }}_{it}^{LB 1} = \exp \bigg \{ \hat{\alpha }^{lasso} + \varvec{\hat{\beta }}^{lasso}*\mathbf {C_i} + \frac{\sigma ^2}{2} \bigg \}. \end{aligned}$$
(5)
The second lower bound estimate (LB2) implements a post-OLS lasso estimation (Hastie et al. 2013). We only retain the subset \(\mathbf {C^r}\subseteq {\mathbf {C}}\), i.e. those circumstances whose coefficients were not shrunk to zero in Eq. 4. Then, we estimate \(\hat{\varvec{\beta }}^{Postlasso}\) by running an OLS regression on the restricted set of circumstances:
$$\begin{aligned} \ln y_{it}=\alpha ^{Postlasso} + \varvec{\beta }^{Postlasso} * \mathbf {C^r_i} + \epsilon _{it}. \end{aligned}$$
(6)
We use \(\hat{\varvec{\beta }}^{Postlasso}\) to construct the counterfactual distribution \({\tilde{M}}_{t}^{LB2}=\left( {\tilde{\mu }}^{LB 2}_{1t},\ldots ,{\tilde{\mu }}^{LB 2}_{it},\ldots ,{\tilde{\mu }}^{LB 2}_{Nt}\right) \):
$$\begin{aligned} {\tilde{\mu }}_{it}^{LB2} = \exp \bigg \{ \hat{\alpha }^{Postlasso} + \varvec{\hat{\beta }}^{Postlasso}*\mathbf {C^r_i} + \frac{\sigma ^2}{2} \bigg \}. \end{aligned}$$
(7)
Note that LB1 and LB2 are just different estimates of the same parameter vector. The choice between these two estimation methods is not straightforward. On the one hand, Belloni and Chernozhukov (2013) argue that the post-lasso may have better prediction accuracy than the standard lasso approach. On the other hand, the methodological validation based on EU-SILC reveals that the standard lasso approach tends to align more closely with the results in Brunori et al. (2018, 2019b) (Fig. 4). In our empirical application, we refer to standard lasso as our baseline LB estimate. However, we show that our main conclusions are insensitive to this choice.^{9}
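The lasso and post-lasso steps in Eqs. 4 to 7 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the coordinate-descent solver, the helper names `lasso_fit` and `choose_lambda`, the small grid of candidate penalties, the simulated data, and the use of the in-sample residual variance for \(\sigma ^2\) are all hypothetical choices. In applied work one would typically rely on an established lasso library.

```python
import numpy as np

def lasso_fit(X, y, lam, n_iter=200):
    """Coordinate descent for sum((y - a - X b)^2) + lam * sum(|b|)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean               # centring absorbs the unpenalised intercept
    col_ss = (Xc ** 2).sum(axis=0)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            if col_ss[j] == 0.0:                  # constant column: nothing to estimate
                continue
            partial = yc - Xc @ beta + Xc[:, j] * beta[j]
            rho = Xc[:, j] @ partial
            beta[j] = np.sign(rho) * max(abs(rho) - lam / 2.0, 0.0) / col_ss[j]
    return y_mean - x_mean @ beta, beta

def choose_lambda(X, y, lambdas, k=5, seed=0):
    """Return the penalty with the lowest average out-of-sample squared error."""
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), k)
    cv_error = []
    for lam in lambdas:
        fold_mse = []
        for test in folds:
            train = np.setdiff1d(np.arange(len(y)), test)
            a, b = lasso_fit(X[train], y[train], lam)
            fold_mse.append(np.mean((y[test] - a - X[test] @ b) ** 2))
        cv_error.append(np.mean(fold_mse))
    return lambdas[int(np.argmin(cv_error))]

# Simulated data (hypothetical): six binary circumstances, only two of which matter.
rng = np.random.default_rng(1)
n = 300
C = rng.integers(0, 2, size=(n, 6)).astype(float)
log_y = 1.0 + 0.8 * C[:, 0] + 0.5 * C[:, 1] + rng.normal(0.0, 0.5, n)

lam = choose_lambda(C, log_y, lambdas=[0.0, 5.0, 20.0, 80.0])
a_lasso, b_lasso = lasso_fit(C, log_y, lam)
sigma2 = np.var(log_y - a_lasso - C @ b_lasso)
mu_lb1 = np.exp(a_lasso + C @ b_lasso + sigma2 / 2)     # LB1, Eq. 5

kept = np.flatnonzero(b_lasso)                          # retained circumstances C^r
Xr = np.column_stack([np.ones(n), C[:, kept]])
coef_post, *_ = np.linalg.lstsq(Xr, log_y, rcond=None)  # post-lasso OLS, Eq. 6
resid = log_y - Xr @ coef_post
mu_lb2 = np.exp(Xr @ coef_post + np.var(resid) / 2)     # LB2, Eq. 7
```

The soft-thresholding update uses `lam / 2` because the squared-error term in Eq. 4 is not scaled by one half. With a penalty of zero the solver reproduces OLS, and with a very large penalty all circumstance coefficients are shrunk to zero.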
Upper bound estimation (UB) Since S and LB are based on the subset of observable circumstances only, the resulting IOp estimates may be downward biased. Following Niehues and Peichl (2014) we therefore construct UBs of IOp using an individual fixed effects (FE) estimator. Assuming circumstances to be time-invariant, individual FEs capture the full set of \(\mathbf {C_i}\) even though not all circumstances are observable by the econometrician. A counterfactual distribution of type means constructed from individual FEs thus captures the upper ceiling of outcome variation that can be attributed to the impact of circumstances. In particular, the smoothed distribution of the UB is constructed as follows.
First, using observations from all periods \(v\ne t\), we estimate the individual FE \(c_i\) while accounting for common year-specific shocks \(u_v\):^{10}
$$\begin{aligned} \ln y_{iv}=c_i + u_v + \epsilon _{iv}. \end{aligned}$$
(8)
Second, we regress the individual outcome in period \(t\) on the estimated individual FE:
$$\begin{aligned} \ln y_{it} = {\Psi }*{\hat{c}}_i + \epsilon _{it}. \end{aligned}$$
(9)
Third, we use the vector of parameters \(\hat{\Psi }\) to construct the counterfactual distribution \({\tilde{M}}_{t}^{UB}=\left( {\tilde{\mu }}^{UB}_{1t},\ldots ,{\tilde{\mu }}^{UB}_{it}, \ldots ,{\tilde{\mu }}^{UB}_{Nt}\right) \):
$$\begin{aligned} {\tilde{\mu }}_{it}^{UB} = \exp \bigg \{ \hat{\Psi }*{\hat{c}}_i + \frac{\sigma ^2}{2} \bigg \}. \end{aligned}$$
(10)
Note that this estimator would yield the true estimate of IOp if \(c_i\) captured time-invariant circumstances only. However, the individual FE may also absorb time-invariant effort exertion (e.g. long-term motivation, ambition), leading to an UB interpretation of this IOp estimate.
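The three UB steps in Eqs. 8 to 10 can be sketched as follows. The sketch assumes a balanced panel stored as an \(N \times T\) array, in which case the two-way fixed effects estimates reduce to simple means. The function names, the normalization of year shocks to mean zero, the residual-variance estimate, and the simulated data are assumptions made for illustration.

```python
import numpy as np

def mld(x):
    """Mean log deviation of a strictly positive array."""
    x = np.asarray(x, dtype=float)
    return float(np.mean(np.log(x.mean() / x)))

def upper_bound_relative_iop(log_y, t):
    """log_y: balanced (N, T) array of log outcomes; t: column of the year of interest."""
    others = np.delete(log_y, t, axis=1)           # periods v != t
    u_hat = others.mean(axis=0) - others.mean()    # year shocks, normalised to mean zero
    c_hat = (others - u_hat).mean(axis=1)          # individual fixed effects (Eq. 8)
    y_t = log_y[:, t]
    psi = (c_hat @ y_t) / (c_hat @ c_hat)          # OLS without intercept (Eq. 9)
    resid = y_t - psi * c_hat
    sigma2 = resid @ resid / (len(y_t) - 1)        # assumed residual-variance estimate
    mu_tilde = np.exp(psi * c_hat + sigma2 / 2)    # smoothed distribution (Eq. 10)
    return mld(mu_tilde) / mld(np.exp(y_t))

# Simulated balanced panel (hypothetical): a time-invariant component, common
# year shocks, and transitory noise.
rng = np.random.default_rng(2)
N, T = 400, 5
c = rng.normal(2.0, 0.5, N)
u = np.array([0.0, 0.05, -0.05, 0.10, 0.0])
log_y = c[:, None] + u[None, :] + rng.normal(0.0, 0.3, (N, T))
ub = upper_bound_relative_iop(log_y, t=T - 1)
```

With unbalanced panels, as in most of the surveys used here, the fixed effects would instead have to be estimated by a regression with individual and year dummies.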
Inequality measurement We follow existing IOp literature and summarize the information in counterfactual distributions \({\tilde{M}}_{t}^{S}\), \({\tilde{M}}_{t}^{LB1}\), \({\tilde{M}}_{t}^{LB2}\), and \({\tilde{M}}_{t}^{UB}\) by the mean log deviation (MLD) and the Gini coefficient. The MLD is part of the generalized entropy class of inequality measures satisfying symmetry, the Pigou–Dalton transfer principle, scale invariance, population replication, as well as additive and path-independent subgroup decomposability (Shorrocks 1980; Foster and Shneyerov 2000). However, the MLD is very sensitive to low incomes, many of which are smoothed out when constructing counterfactual distributions. Therefore, Brunori et al. (2019a) argue in favor of using the Gini index in spite of its imperfect subgroup decomposability.^{11} For both inequality measures, we provide relative measures of IOp that relate the MLD (Gini) of the counterfactual distributions \({\tilde{M}}_{t}^{S}\), \({\tilde{M}}_{t}^{LB1}\), \({\tilde{M}}_{t}^{LB2}\) and \({\tilde{M}}_{t}^{UB}\) to the actual outcome distribution \(Y_t\). The latter measures can be interpreted as the share of total inequality that is explained by circumstances and thus violates the opportunity-egalitarian ideal.
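The relative measures can be illustrated with a small numerical sketch. The toy distributions and function names are hypothetical. The example contains one very low income, which raises the MLD of the actual distribution considerably and thereby lowers the MLD-based share relative to the Gini-based share.

```python
from math import log

def mld(values):
    """Mean log deviation: average of ln(mean / x_i) over strictly positive values."""
    mean = sum(values) / len(values)
    return sum(log(mean / v) for v in values) / len(values)

def gini(values):
    """Gini coefficient computed from the ranks of the sorted values."""
    xs = sorted(values)
    n = len(xs)
    weighted = sum((rank + 1) * x for rank, x in enumerate(xs))
    return 2 * weighted / (n * sum(xs)) - (n + 1) / n

# Hypothetical actual outcomes and the matching distribution of type means.
actual = [2.0, 38.0, 40.0, 80.0]
type_means = [20.0, 20.0, 60.0, 60.0]

relative_iop_mld = mld(type_means) / mld(actual)
relative_iop_gini = gini(type_means) / gini(actual)
```

In this toy case the Gini-based share is well above the MLD-based share, in line with the tail sensitivity of the MLD described above.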
3 Data
We estimate IOp in income and consumption expenditure for twelve emerging economies in different geographical areas of the world ranging from Africa (Ethiopia, Malawi, South Africa, Tanzania), Central and South America (Argentina, Chile, Mexico, Peru), Europe and Central Asia (Russia), to East and South-East Asia (China, Indonesia, Thailand). The country selection is guided by the availability of household panel data with (1) information on relevant circumstance variables, and (2) a sufficient number of observations in the longitudinal dimension.^{12} Table 2 provides an overview of the underlying data sources.
We consider three outcomes of interest. First, we calculate IOp in individual income, before or after taxes and transfers depending on data availability. Second, we account for resource sharing at the household level and calculate IOp in equivalized household income. Accounting for resource sharing at the household level is particularly relevant in emerging economies since female participation in formal labor markets tends to be low (Cubas 2016). Third, to derive a more direct measure of IOp in material well-being, we also consider equivalized household consumption expenditures. Household income and consumption expenditure are deflated by the modified OECD equivalence scale.
Throughout the paper, we restrict ourselves to within-country comparisons. Table 2 documents many differences across the underlying data sources. These include differences in the reference period, the income and consumption expenditure aggregates, the detail of available circumstance characteristics, as well as the sampled populations. For example, while the data for Mexico provide net income information until 2004, the data for Thailand provide gross income figures until 2016. The Ethiopian panel provides a rather parsimonious set of circumstances for a rural fraction of the population, whereas the Russian panel provides a rich set of circumstances for a nationally representative sample of households. We therefore refrain from cross-country comparisons and focus our discussion on intra-country comparisons between the different estimation approaches.
To ensure the consistency of these intra-country comparisons, we only retain those units of observation for which we observe (1) all circumstance variables, and (2) positive outcomes in all available outcome dimensions for at least three periods of observation. We further restrict our samples to individuals aged 25–55.^{13}
Table 1 displays relevant summary statistics for the estimation of S, LB, and UB by country.
Table 1 Circumstance Information by Country
Columns N to Parameter: standard estimate (S) and lower bound (LB). Columns N (FE) to Avg. Years: upper bound (UB).

| Country | N | Circumstances | Parameter | N (FE) | Start | End | Min. Years | Avg. Years |
|---|---|---|---|---|---|---|---|---|
| Argentina | 3019 | Gender, year of birth, place of birth | 8 | 6038 | 2013 | 2014 | 3 | 3.00 |
| Chile | 2808 | Gender, year of birth, place of birth, education of father/mother, ethnicity, labor force status of father/mother, chronic disease | 37 | 8424 | 2006 | 2008 | 4 | 4.00 |
| China | 243 | Gender, year of birth, ethnicity, urbanity of birthplace | 22 | 717 | 1988 | 2010 | 3 | 3.95 |
| Ethiopia | 660 | Gender, year of birth, education of father/mother, ethnicity, religion | 69 | 2433 | 1994 | 2004 | 3 | 4.69 |
| Indonesia | 786 | Gender, year of birth, education of father/mother, ethnicity, religion, language | 29 | 2036 | 1992 | 2006 | 3 | 3.59 |
| Malawi | 362 | Gender, year of birth, education of father/mother, religion | 16 | 995 | 2004 | 2008 | 3 | 3.67 |
| Mexico | 3050 | Gender, year of birth, language | 5 | 16,552 | 1999 | 2004 | 5 | 6.43 |
| Peru | 2193 | Gender, year of birth, place of birth, language, chronic disease | 37 | 5878 | 1998 | 2011 | 3 | 3.68 |
| Russia | 1181 | Gender, year of birth, place of birth, urbanity of birthplace, education of father/mother, labor force status of father/mother, height | 54 | 10,816 | 1994 | 2016 | 5 | 10.16 |
| South Africa | 670 | Gender, year of birth, place of birth, education of father/mother, ethnicity | 48 | 2331 | 2008 | 2015 | 4 | 4.48 |
| Tanzania | 221 | Gender, year of birth, place of birth, ethnicity, religion | 18 | 819 | 1991 | 2004 | 3 | 4.71 |
| Thailand | 465 | Gender, year of birth, education of father/mother, wealth of parents, family plot size | 15 | 6338 | 1997 | 2016 | 3 | 14.63 |

4 Results
Figure 1 displays bounds of relative IOp, i.e. the percentage of total inequality that can be explained by exogenous circumstances.^{14} Standard estimates (S) indicate IOp based on all observable circumstances available in the particular country data set. Lower bound estimates (LB) also use the full set of observable circumstances but account for potential upward biases through lasso estimation in which irrelevant circumstance parameters are shrunk to zero.^{15} Upper bound estimates (UB) account for unobservable circumstances through the FE estimation procedure outlined in Sect. 2.
Individual income Panel (a) shows the results for individual income. The standard IOp estimate (S) for individual income ranges from 9.3% (Argentina) to 30.6% (Peru, South Africa). Accounting for sampling variation and the ensuing potential for upward biases in S provides only minor reductions in IOp. According to LB, between 6% (China) and 25.9% (Peru) of outcome inequality must be considered unfair. The average difference between S and LB estimates amounts to 5.7 pp.^{16} When using the post-lasso OLS procedure, the average difference is even smaller and equals 0.5 pp. These results suggest that the standard estimation approach (S) is largely uncompromised by overfitting circumstance parameters to the available data. Instead, and in line with the theoretical reasoning of Ferreira and Gignoux (2011), the standard approach indeed recovers estimates close to the lower bound (LB) estimate in all countries under consideration. Note that this result stands in contrast to recent evidence for European countries suggesting that the standard approach overestimates lower bound IOp by up to 300% (Brunori et al. 2018, 2019b). This difference is reconciled by the quality of the underlying data sources. While the richness of the European data confers the opportunity to overfit the circumstance information to the data, the sparsity of circumstance information in the household surveys under consideration prevents upward biases in the standard estimate (S).
The lower bound estimator selects the circumstance parameters with the highest out-of-sample prediction accuracy. In Table 5, we show for each outcome of interest which of the circumstance variables and categories are chosen by the lasso estimator in a particular country. Across all countries, gender plays a prominent role, reflecting concerns about gender inequality in the context of emerging and developing economies (Jayachandran 2015). However, it is important to note that the selection of particular variables by lasso only indicates a predictive correlation and does not necessarily imply a causal relationship. For instance, even though both maternal and paternal education could causally affect the income of individuals, a high correlation between fathers’ and mothers’ education might lead the lasso to choose only one of the two circumstance characteristics.
While sparse circumstance information limits the scope for upward biases, it may lead to downward biases due to the neglect of circumstances that are unobserved by the econometrician. Therefore, we take account of unobservable circumstances by means of the fixed effect estimation outlined in Sect. 2. The UB estimates of IOp vary between 17.2% (Mexico) and 72.5% (South Africa). On average, UB exceeds S by 22.8 pp. It therefore yields a significant upward correction of IOp in comparison to S and LB, respectively. The difference between UB and S is broadly comparable to the respective gap in developed economies (Niehues and Peichl 2014). As such, our results reflect recent concerns that downward biased IOp estimates based on observable circumstance characteristics provide misleading reference points as regards the normative significance of inequality (Kanbur and Wagstaff 2016).^{17}
Household income Panel (b) of Fig. 1 displays analogous IOp estimates for equivalized household income. In contrast to the results on individual income, we thereby account for resource sharing at the household level and heterogeneity in household compositions. Estimates for S (LB) decrease for the vast majority of countries and now lie between 1.2% in Argentina (0%, China) and 35.9% in South Africa (24.7%, South Africa). This decrease follows from the assumption of resource sharing at the household level that largely nullifies gender-based differences in incomes. Hence, the average difference between S and LB remains at a very low level of 5.0 pp. Again, using the alternative post-lasso OLS estimation strategy decreases this difference to 1.3 pp. In contrast, the UB estimates are largely comparable to their individual income analogues. According to UB, IOp ranges between 8.6% (Mexico) and 73.9% (South Africa). As a consequence, the average difference between S and UB increases from 22.8 pp to a level of 28.5 pp when considering household instead of individual incomes. Our general conclusion, however, remains intact: In the context of the developing economies under consideration, the standard estimation approach recovers an estimate close to LB. However, its large distance to UB suggests severe underestimations due to the influence of unobservable circumstances.
Household expenditure In Panel (c), we show IOp estimates for equivalized household expenditure. There are different explanations for potential deviations of IOp in household expenditure and household income. First, if households smooth consumption its distribution is less unequal than the distribution of income. Additionally, assuming transitory fluctuations to be more strongly reflected in the outcome distribution \(Y_t\) than the smoothed distribution \({\tilde{M}}_t\), we would expect relative IOp in consumption expenditures to be higher than in income.^{18} In fact, this is the pattern observed by Ferreira and Gignoux (2011) when comparing IOp in income and consumption for five Latin-American countries. Second, even if households smooth consumption, expenditures for consumption items, especially durables, can be lumpy (Meyer and Sullivan 2017). This tendency is amplified by the fact that reference periods for expenditure reporting are oftentimes shorter (e.g. weekly, monthly, quarterly) in order to allow survey respondents to recall their expenditures in different categories. Again, assuming transitory fluctuations to be more strongly reflected in the outcome distribution \(Y_t\) than the smoothed distribution \({\tilde{M}}_t\), we would expect relative IOp in consumption expenditures to be lower than in income. Which of the two tendencies dominates is an empirical question and varies with the mode of data collection in the different countries. In our country sample the second channel tends to dominate. Compared to relative IOp in household income, IOp in household expenditure is on average 2.5 pp (S), 1.6 pp (LB), and 4.5 pp (UB) lower. However, there is heterogeneity across countries. According to the standard estimate, relative IOp for household expenditure is higher than IOp for income in Peru, South Africa, and Thailand. The reverse is true for China, Ethiopia, Indonesia, and Russia.
Estimates for S (LB) with respect to consumption expenditure lie between 6.3% in Tanzania (0%, China) and 40.3% in South Africa (29.5%, South Africa). According to UB, IOp ranges between 12.2% (Tanzania) and 67.6% (South Africa). As a consequence, the average difference between S and LB (UB) amounts to 5.9 pp (20.2 pp). These findings support our conclusion that the standard estimation approach recovers an estimate close to LB.
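The average gaps quoted for household expenditure can be recomputed from the relative MLD-based estimates listed in Table 3. The following is a minimal sketch with the S, LB 1 and UB columns copied by hand; it assumes that LB 1 is the standard lasso lower bound discussed in the text, and all variable and function names are illustrative.

```python
# Relative IOp (MLD, in %) for equivalized household expenditure, copied from Table 3.
# Each tuple is (S, LB 1, UB); LB 1 is assumed to be the standard lasso lower bound.
rel_iop_expenditure = {
    "China": (6.38, 0.00, 18.95),
    "Ethiopia": (15.02, 9.91, 16.90),
    "Indonesia": (17.33, 11.45, 35.81),
    "Malawi": (11.47, 5.56, 19.60),
    "Peru": (24.13, 20.43, 56.88),
    "Russia": (8.22, 3.21, 32.00),
    "South Africa": (40.26, 29.46, 67.57),
    "Tanzania": (6.27, 0.28, 12.20),
    "Thailand": (8.19, 3.80, 58.98),
}


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


# Unweighted cross-country averages of the gaps, in percentage points.
gap_s_lb = mean(s - lb for s, lb, _ in rel_iop_expenditure.values())
gap_ub_s = mean(ub - s for s, _, ub in rel_iop_expenditure.values())

print(f"average S - LB: {gap_s_lb:.1f} pp")
print(f"average UB - S: {gap_ub_s:.1f} pp")
```

The averages are unweighted across countries, which is the reading under which the hand-copied values reproduce the figures in the text.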
Sensitivity analysis We conduct four sensitivity checks in which we probe the robustness of our conclusions to alternative specification choices.
MLD vs. Gini coefficient The majority of empirical IOp estimations draw on the MLD due to its path-independent decomposability property. In the context of IOp measurement, this property allows for a perfect decomposition into circumstance-based unfair inequality and effort-based fair inequality. However, as noted by Brunori et al. (2019a), the MLD’s sensitivity to low income values leads to low relative measures of IOp.
Hence, we replicate our analysis based on the Gini coefficient and show the results in Fig. 2. Indeed, relative IOp based on the Gini is larger than suggested by the MLD. For individual incomes, the standard estimate on average increases by 30 pp and now lies between 34.1% (Argentina) and 68.1% (Peru). The corresponding UB on average increases by 26 pp and ranges from 43.5% (Mexico) to 89.8% (South Africa). The LB on average increases by 27.8 pp and lies between 28.7% (China) and 62.3% (Peru). The pattern is very similar for equivalized household income and expenditure (see Table 3).
These results indicate that the attenuating effect implied by the tail sensitivity of the MLD largely outweighs the attenuating effect implied by the imperfect decomposability of the Gini coefficient. Furthermore, although using the Gini coefficient widens the gap between S and LB, the difference between UB and S is still larger for the majority of outcomes and countries in our sample. This observation confirms that independent of the inequality measure, the potential for downward biased IOp estimates is much larger than the potential to overestimate IOp in emerging economies.
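The decomposability argument can be illustrated with a toy example. The sketch below is not the estimation procedure used in this paper: it assumes a non-parametric setting in which each individual is assigned the mean outcome of her circumstance type, and the incomes and the two types "a" and "b" are invented for illustration. It shows that total MLD splits exactly into a between-type and a within-type term, and it computes the relative between-type share for both the MLD and the Gini coefficient.

```python
import math
from collections import defaultdict


def mld(y):
    """Mean log deviation: average of ln(mean / y_i)."""
    mu = sum(y) / len(y)
    return sum(math.log(mu / v) for v in y) / len(y)


def gini(y):
    """Gini coefficient from the rank-weighted sum of sorted outcomes."""
    ys = sorted(y)
    n, total = len(ys), sum(ys)
    return sum((2 * i - n - 1) * v for i, v in enumerate(ys, start=1)) / (n * total)


def group_by_type(y, types):
    """Collect outcomes by circumstance type."""
    groups = defaultdict(list)
    for value, t in zip(y, types):
        groups[t].append(value)
    return groups


def smoothed(y, types):
    """Replace every outcome by the mean outcome of its type."""
    means = {t: sum(v) / len(v) for t, v in group_by_type(y, types).items()}
    return [means[t] for t in types]


def within_mld(y, types):
    """Population-share-weighted sum of the MLD inside each type."""
    n = len(y)
    return sum(len(v) / n * mld(v) for v in group_by_type(y, types).values())


# Hypothetical incomes and circumstance types (illustration only).
incomes = [10, 20, 30, 40, 60, 80]
types = ["a", "a", "a", "b", "b", "b"]

between = smoothed(incomes, types)
mld_total, mld_between = mld(incomes), mld(between)
mld_within = within_mld(incomes, types)
gini_total, gini_between = gini(incomes), gini(between)

rel_mld = 100 * mld_between / mld_total
rel_gini = 100 * gini_between / gini_total

print(f"MLD: total={mld_total:.4f}, between + within={mld_between + mld_within:.4f}")
print(f"relative IOp: MLD={rel_mld:.1f}%, Gini={rel_gini:.1f}%")
```

In this toy sample the Gini-based share exceeds the MLD-based share, in line with the direction reported above; the ordering is a property of the example and is not guaranteed for arbitrary data.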
Circumstance availability The differences between S and LB (UB) may vary with the size of the invoked circumstance set. To test the relevance of this concern in our sample, we re-estimate S and LB while restricting ourselves to a harmonized set of circumstances that is available in all countries under consideration. The internationally comparable circumstance set includes gender and year of birth. In Panel (a) of Fig. 3 we plot the difference between S and UB (LB) according to the harmonized circumstance specification (y-axis) against the analogous differences in our baseline estimates (x-axis). The closer the data points align with the 45-degree line, the more similar the results between the baseline and the alternative specification.
Restricting the circumstance set mechanically attenuates S but leaves UB unaltered. It is therefore unsurprising that the difference between S and UB increases for all countries under consideration. The reverse holds true for the difference between S and LB. In fact, the restriction of the circumstance set leads to a zero difference between S and LB for the majority of the country cases. These results therefore confirm our main conclusion: The more parsimonious the circumstance set, the stronger the correspondence between S and LB and the higher the downward bias. Unfortunately, we cannot run the reverse test by increasing the number of circumstances. Therefore, we cannot provide a direct assessment of the precise conditions under which S and LB come adrift.
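Why a smaller circumstance set mechanically attenuates the estimate can be seen in a stylized example. The sketch below assumes a non-parametric version of the standard approach in which types are formed by fully interacting the available circumstances, whereas the standard estimate in this paper is regression-based; the records, the keys `gender` and `parent_edu`, and the function names are hypothetical.

```python
import math
from collections import defaultdict


def mld(y):
    """Mean log deviation: average of ln(mean / y_i)."""
    mu = sum(y) / len(y)
    return sum(math.log(mu / v) for v in y) / len(y)


def between_type_mld(records, keys):
    """MLD of the distribution in which each person receives the mean income
    of the type defined by the listed circumstance keys."""
    groups = defaultdict(list)
    for r in records:
        groups[tuple(r[k] for k in keys)].append(r["income"])
    means = {t: sum(v) / len(v) for t, v in groups.items()}
    return mld([means[tuple(r[k] for k in keys)] for r in records])


# Hypothetical micro data (illustration only).
records = [
    {"gender": "f", "parent_edu": "low", "income": 10},
    {"gender": "f", "parent_edu": "low", "income": 14},
    {"gender": "f", "parent_edu": "high", "income": 30},
    {"gender": "f", "parent_edu": "high", "income": 34},
    {"gender": "m", "parent_edu": "low", "income": 16},
    {"gender": "m", "parent_edu": "low", "income": 20},
    {"gender": "m", "parent_edu": "high", "income": 40},
    {"gender": "m", "parent_edu": "high", "income": 44},
]

total = mld([r["income"] for r in records])
full_set = between_type_mld(records, ["gender", "parent_edu"])
restricted_set = between_type_mld(records, ["gender"])

print(f"relative IOp, full set:       {100 * full_set / total:.1f}%")
print(f"relative IOp, restricted set: {100 * restricted_set / total:.1f}%")
```

Because the gender-only partition is a coarsening of the full partition, its between-type MLD cannot exceed that of the full set, while total inequality, and any bound that does not rely on observed circumstances, is unchanged.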
Number of periods The difference between S and UB may differ with the number of periods used to construct the individual FEs. In the baseline we set a minimum threshold for the number of periods used to calculate the fixed effect. However, in spite of implementing this minimum threshold, the de facto number of observations used for the construction of the individual FEs is not bounded from above and therefore varies across countries (Table 1). To test the relevance of this concern, we construct UB estimates in which we restrict the sample to the three most recent observations for each individual in each country. In Panel (b) of Fig. 3 we plot the differences between S and UB according to this harmonized specification (y-axis) against the analogous differences according to our baseline estimates (x-axis). The closer the data points align with the 45-degree line, the more similar the results between the baseline and the alternative specification.
We find that all data points with respect to the difference between S and UB closely align with the 45-degree line. This pattern suggests that even short panels deliver reliable indicators for UB inequality of opportunity. Note that the panel length impinges upon the UB estimate only. Therefore, all differences between S and LB remain unaffected by this harmonization.
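The sample restriction behind this check can be sketched as follows. This is a simplified illustration under stated assumptions: the panel is a hypothetical list of (person, year, income) records, and the individual mean of log income serves only as a crude stand-in for the individual fixed effect, since it ignores the year-specific shocks that the fixed-effect estimation accounts for.

```python
import math
from collections import defaultdict


def last_k_observations(panel, k=3):
    """Keep the k most recent (year, income) observations per individual."""
    by_person = defaultdict(list)
    for person, year, income in panel:
        by_person[person].append((year, income))
    # Sorting the tuples orders them by year, so the tail holds the latest waves.
    return {p: sorted(obs)[-k:] for p, obs in by_person.items()}


def mean_log_income(observations):
    """Average log income over the retained observations (fixed-effect proxy)."""
    return sum(math.log(income) for _, income in observations) / len(observations)


# Hypothetical unbalanced panel (illustration only).
panel = [
    ("p1", 2005, 100.0), ("p1", 2009, 120.0), ("p1", 2013, 90.0), ("p1", 2017, 110.0),
    ("p2", 2009, 40.0), ("p2", 2013, 55.0), ("p2", 2017, 50.0),
    ("p3", 2001, 70.0), ("p3", 2005, 65.0), ("p3", 2009, 80.0),
    ("p3", 2013, 75.0), ("p3", 2017, 85.0),
]

restricted = last_k_observations(panel, k=3)
fe_proxy = {p: mean_log_income(obs) for p, obs in restricted.items()}

for person, obs in restricted.items():
    years = [year for year, _ in obs]
    print(person, years, round(fe_proxy[person], 3))
```

After the restriction every individual contributes the same number of observations, which removes cross-country differences in panel length as a source of variation in the upper bound.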
Year of interest Our results may be sensitive to alterations in the time period of interest. In our baseline analysis we focus on the most recent available data years covering a range from 2009 to 2017. Therefore, we replicate our analysis for the country-specific wave in closest proximity to 2009.^{19} In Panel (c) of Fig. 3 we plot the differences between S and UB (LB) according to this harmonized specification (y-axis) against the analogous differences according to our baseline estimates (x-axis). The closer the data points align with the 45-degree line, the more similar the results between the baseline and the alternative specification.
Given that a society’s opportunity structure is shaped by long-run institutional features, one would expect these differences to be small. Indeed, we find that the data points for the difference between S and UB closely group around the 45-degree line. A similar conclusion holds for the difference between S and LB, although the dispersion around the 45-degree line is somewhat larger.
5 Conclusion
Measures of IOp are of considerable policy relevance since they reflect widely held principles of distributive justice and plausibly correlate with measures of economic development. In spite of their interest, point estimates of IOp are surrounded by severe uncertainty since they can be both upward and downward biased. Due to poorer data infrastructures with smaller sample sizes and less information on circumstance characteristics, IOp estimates in emerging economies may be particularly susceptible to both biases, and it is unclear which of the two biases prevails.
We show that downward bias clearly dominates in the context of emerging economies. On the one hand, sparsely populated circumstance sets restrict the scope for overfitting circumstance information to the data. As a consequence, standard estimates of IOp strongly correspond to their lower bound analogues. This result stands in contrast to recent evidence from countries with richer data environments. On the other hand, the sparsity of observable circumstance information leads to large differences between standard estimates of IOp and their upper bound analogues. The extent of these differences is largely comparable to more developed countries and ranges between 20 pp and 30 pp.
While we provide reasonable bounds for IOp in these countries, substantial differences between lower and upper bound IOp remain. Our results therefore tie in with recent concerns that downward biased IOp estimates could misguide judgments on the normative significance of inequality. In the future, such gaps may be closed as better data sets become available. However, until such innovations materialize, bounding the range of potential estimates remains a viable way to limit the scope for downplaying the normative significance of inequality in the countries of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Additional tables
Table 2
Data overview

| Country | Country Abbr. | Panel period | Waves | Data | Source | Individual income | Household income | Household expenditure | Sample weights available |
|---|---|---|---|---|---|---|---|---|---|
| Argentina | AG | 2003–2015 | 12 | Encuesta Permanente de Hogares | Instituto Nacional de Estadística y Censos (INDEC) | Net | Net | – | Yes |
| Chile | CL | 2006–2009 | 4 | Encuesta Panel CASEN | Ministerio de Desarrollo Social, Chile | Net | Net | – | Yes |
| China | CN | 1988–2014 | 10 | China Health and Nutrition Survey | Carolina Population Center at the University of North Carolina at Chapel Hill and National Institute for Nutrition and Health (NINH) at the Chinese Center for Disease Control and Prevention (CCDC) | Net (Labor) | Net (Labor) | Yes | No |
| Ethiopia | EP | 1994–2009 | 6 | Ethiopia Rural Household Survey | International Food Policy Research Institute (IFPRI), Washington DC | Net | Net | Yes | No |
| Indonesia | ID | 1992–2013 | 5 | Indonesian Family Life Survey (IFLS) | RAND Social and Economic Well-Being | Net (Labor) | Net (Labor) | Yes | No |
| Malawi | MW | 1998–2010 | 4 | Malawi Longitudinal Study of Family and Health | Population Studies Center at the University of Pennsylvania and College of Medicine at the University of Malawi and Invest in Knowledge (IKI) in Zomba, Malawi | – | – | Yes | No |
| Mexico | MX | 1999–2009 | 7 | Encuesta Evaluation de los Hogares (ENCEL) | International Food Policy Research Institute (IFPRI), Washington DC | Net | Net | – | No |
| Peru | PR | 1998–2011 | 14 | Encuesta Nacional de Hogares, Condiciones de Vida y Pobreza | Instituto Nacional de Estadística e Informática | Gross | Gross | Yes | Yes |
| Russia | RSA | 1994–2017 | 22 | Russia Longitudinal Monitoring Survey (RLMS) | National Research University “Higher School of Economics”, OOO “Demoscope”, Carolina Population Center, University of North Carolina at Chapel Hill and the Institute of Sociology of the Federal Center of Theoretical and Applied Sociology of the Russian Academy of Sciences | Net (Labor) | Net (Labor) | Yes | Yes |
| South Africa | SAF | 2008–2017 | 5 | National Income Dynamics Study (NIDS) | Southern Africa Labour and Development Research Unit (SALDRU), University of Cape Town | Net (Labor) | Net (Labor) | Yes | Yes |
| Tanzania | TAZ | 1991–2010 | 6 | Kagera Health and Development Survey | Economic Development Initiatives | – | – | Yes | No |
| Thailand | THA | 1997–2017 | 21 | Townsend Thai Data | The Townsend Thai Project | – | Gross | Yes | No |

Table 3
Absolute and relative inequality of opportunity, baseline specification
Column groups: Parameters (\(P\), \(P^S\), \(N/P^S\), \(P^{LB}\), \(\lambda ^*\)); Inequality (MLD, Gini); MLD, absolute; MLD, relative (%); Gini, absolute; Gini, relative (%). Each of the last four groups reports S, LB 1, LB 2 and UB.

| Country | Year | N | N (FE) | \(P\) | \(P^S\) | \(N/P^S\) | \(P^{LB}\) | \(\lambda ^*\) | MLD | Gini | MLD abs. S | MLD abs. LB 1 | MLD abs. LB 2 | MLD abs. UB | MLD rel. S | MLD rel. LB 1 | MLD rel. LB 2 | MLD rel. UB | Gini abs. S | Gini abs. LB 1 | Gini abs. LB 2 | Gini abs. UB | Gini rel. S | Gini rel. LB 1 | Gini rel. LB 2 | Gini rel. UB |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Individual income | | | | | | | | | | | | | | | | | | | | | | | | | | |
| Argentina | 2015 | 3019 | 6038 | 8 | 7 | 431.29 | 7 | 0.003 | 0.285 | 0.386 | 0.027 | 0.026 | 0.027 | 0.159 | 9.33 | 8.99 | 9.33 | 55.70 | 0.132 | 0.129 | 0.132 | 0.297 | 34.13 | 33.50 | 34.13 | 76.97 |
| Chile | 2009 | 2808 | 8424 | 37 | 26 | 108.00 | 16 | 0.015 | 0.395 | 0.430 | 0.078 | 0.064 | 0.078 | 0.190 | 19.79 | 16.24 | 19.74 | 48.16 | 0.220 | 0.200 | 0.220 | 0.319 | 51.25 | 46.58 | 51.17 | 74.36 |
| China | 2014 | 243 | 717 | 22 | 14 | 17.36 | 7 | 0.070 | 0.530 | 0.471 | 0.084 | 0.032 | 0.080 | 0.114 | 15.91 | 5.95 | 15.05 | 21.54 | 0.217 | 0.135 | 0.213 | 0.245 | 46.01 | 28.73 | 45.25 | 52.03 |
| Ethiopia | 2009 | 660 | 2433 | 69 | 33 | 20.00 | 12 | 0.056 | 0.873 | 0.622 | 0.214 | 0.143 | 0.201 | 0.262 | 24.54 | 16.42 | 23.04 | 29.97 | 0.349 | 0.286 | 0.331 | 0.371 | 56.08 | 45.95 | 53.33 | 59.64 |
| Indonesia | 2013 | 786 | 2036 | 29 | 20 | 39.30 | 9 | 0.036 | 0.533 | 0.478 | 0.096 | 0.064 | 0.093 | 0.238 | 18.04 | 12.04 | 17.41 | 44.65 | 0.249 | 0.204 | 0.244 | 0.345 | 52.07 | 42.75 | 51.13 | 72.20 |
| Mexico | 2009 | 3050 | 16552 | 5 | 4 | 762.50 | 4 | 0.001 | 0.186 | 0.311 | 0.025 | 0.025 | 0.025 | 0.032 | 13.42 | 13.28 | 13.42 | 17.18 | 0.110 | 0.109 | 0.110 | 0.135 | 35.33 | 35.09 | 35.33 | 43.54 |
| Peru | 2010 | 2193 | 5878 | 37 | 33 | 66.45 | 26 | 0.017 | 0.695 | 0.489 | 0.213 | 0.180 | 0.213 | 0.279 | 30.64 | 25.85 | 30.59 | 40.10 | 0.333 | 0.305 | 0.333 | 0.361 | 68.11 | 62.31 | 68.01 | 73.66 |
| Russia | 2017 | 1181 | 10816 | 54 | 46 | 25.67 | 14 | 0.033 | 0.234 | 0.358 | 0.049 | 0.027 | 0.046 | 0.137 | 20.85 | 11.47 | 19.52 | 58.27 | 0.177 | 0.132 | 0.171 | 0.287 | 49.52 | 36.89 | 47.84 | 80.30 |
| South Africa | 2017 | 670 | 2331 | 48 | 34 | 19.71 | 20 | 0.025 | 0.418 | 0.471 | 0.128 | 0.091 | 0.126 | 0.303 | 30.62 | 21.68 | 30.24 | 72.46 | 0.286 | 0.239 | 0.284 | 0.423 | 60.62 | 50.71 | 60.22 | 89.78 |
| Household income | | | | | | | | | | | | | | | | | | | | | | | | | | |
| Argentina | 2015 | 3019 | 6038 | 8 | 7 | 431.29 | 6 | 0.007 | 0.236 | 0.365 | 0.003 | 0.002 | 0.003 | 0.147 | 1.17 | 0.79 | 1.16 | 62.42 | 0.042 | 0.035 | 0.042 | 0.296 | 11.55 | 9.51 | 11.49 | 81.01 |
| Chile | 2009 | 2808 | 8424 | 37 | 26 | 108.00 | 18 | 0.009 | 0.261 | 0.385 | 0.038 | 0.031 | 0.038 | 0.133 | 14.45 | 12.03 | 14.41 | 50.88 | 0.154 | 0.140 | 0.153 | 0.285 | 39.88 | 36.42 | 39.83 | 74.11 |
| China | 2014 | 243 | 717 | 22 | 14 | 17.36 | 1 | 0.185 | 0.517 | 0.480 | 0.047 | 0.000 | 0.000 | 0.079 | 9.14 | 0.00 | 0.00 | 15.36 | 0.146 | 0.000 | 0.000 | 0.208 | 30.36 | 0.00 | 0.00 | 43.21 |
| Ethiopia | 2009 | 660 | 2433 | 69 | 33 | 20.00 | 13 | 0.050 | 0.953 | 0.640 | 0.250 | 0.178 | 0.240 | 0.308 | 26.26 | 18.73 | 25.14 | 32.36 | 0.370 | 0.316 | 0.355 | 0.401 | 57.92 | 49.43 | 55.56 | 62.67 |
| Indonesia | 2013 | 786 | 2036 | 29 | 20 | 39.30 | 12 | 0.022 | 0.482 | 0.473 | 0.117 | 0.092 | 0.116 | 0.226 | 24.23 | 19.08 | 24.09 | 47.00 | 0.273 | 0.243 | 0.272 | 0.346 | 57.61 | 51.32 | 57.48 | 73.24 |
| Mexico | 2009 | 3050 | 16552 | 5 | 4 | 762.50 | 4 | 0.000 | 0.198 | 0.331 | 0.006 | 0.006 | 0.006 | 0.017 | 3.07 | 3.03 | 3.07 | 8.56 | 0.062 | 0.061 | 0.062 | 0.102 | 18.59 | 18.46 | 18.59 | 30.81 |
| Peru | 2010 | 2193 | 5878 | 37 | 33 | 66.45 | 25 | 0.015 | 0.561 | 0.470 | 0.134 | 0.111 | 0.134 | 0.238 | 23.91 | 19.74 | 23.86 | 42.45 | 0.265 | 0.237 | 0.264 | 0.348 | 56.32 | 50.31 | 56.21 | 74.09 |
| Russia | 2017 | 1181 | 10816 | 54 | 46 | 25.67 | 11 | 0.031 | 0.194 | 0.330 | 0.019 | 0.006 | 0.015 | 0.105 | 10.02 | 3.12 | 7.96 | 54.10 | 0.112 | 0.063 | 0.100 | 0.251 | 33.85 | 19.00 | 30.27 | 75.96 |
| South Africa | 2017 | 670 | 2331 | 48 | 34 | 19.71 | 20 | 0.033 | 0.475 | 0.495 | 0.170 | 0.117 | 0.167 | 0.351 | 35.85 | 24.65 | 35.26 | 73.94 | 0.329 | 0.271 | 0.326 | 0.447 | 66.53 | 54.74 | 65.92 | 90.42 |
| Thailand | 2017 | 465 | 6338 | 15 | 12 | 38.75 | 5 | 0.034 | 0.289 | 0.419 | 0.021 | 0.011 | 0.021 | 0.154 | 7.43 | 3.94 | 7.32 | 53.22 | 0.113 | 0.083 | 0.112 | 0.311 | 27.05 | 19.75 | 26.62 | 74.12 |
| Household expenditure | | | | | | | | | | | | | | | | | | | | | | | | | | |
| China | 2014 | 243 | 717 | 22 | 14 | 17.36 | 1 | 0.255 | 1.427 | 0.766 | 0.091 | 0.000 | 0.000 | 0.270 | 6.38 | 0.00 | 0.00 | 18.95 | 0.222 | 0.000 | 0.000 | 0.387 | 28.96 | 0.00 | 0.00 | 50.50 |
| Ethiopia | 2009 | 660 | 2433 | 69 | 33 | 20.00 | 14 | 0.029 | 0.613 | 0.557 | 0.092 | 0.061 | 0.088 | 0.104 | 15.02 | 9.91 | 14.35 | 16.90 | 0.196 | 0.150 | 0.190 | 0.247 | 35.20 | 26.90 | 34.03 | 44.39 |
| Indonesia | 2013 | 786 | 2036 | 29 | 20 | 39.30 | 12 | 0.028 | 0.398 | 0.480 | 0.069 | 0.046 | 0.068 | 0.143 | 17.33 | 11.45 | 17.21 | 35.81 | 0.211 | 0.172 | 0.210 | 0.295 | 43.93 | 35.78 | 43.75 | 61.47 |
| Malawi | 2010 | 362 | 965 | 16 | 12 | 30.17 | 7 | 0.101 | 1.213 | 0.720 | 0.139 | 0.067 | 0.133 | 0.238 | 11.47 | 5.56 | 10.99 | 19.60 | 0.285 | 0.198 | 0.276 | 0.371 | 39.65 | 27.54 | 38.41 | 51.55 |
| Peru | 2010 | 2193 | 5878 | 37 | 33 | 66.45 | 27 | 0.007 | 0.187 | 0.331 | 0.045 | 0.038 | 0.045 | 0.106 | 24.13 | 20.43 | 24.12 | 56.88 | 0.166 | 0.151 | 0.166 | 0.254 | 49.96 | 45.62 | 49.94 | 76.75 |
| Russia | 2017 | 1181 | 10816 | 54 | 46 | 25.67 | 16 | 0.033 | 0.472 | 0.518 | 0.039 | 0.015 | 0.034 | 0.151 | 8.22 | 3.21 | 7.28 | 32.00 | 0.156 | 0.097 | 0.146 | 0.303 | 30.13 | 18.78 | 28.27 | 58.48 |
| South Africa | 2017 | 670 | 2331 | 48 | 34 | 19.71 | 21 | 0.030 | 0.503 | 0.531 | 0.203 | 0.148 | 0.200 | 0.340 | 40.26 | 29.46 | 39.66 | 67.57 | 0.357 | 0.303 | 0.354 | 0.452 | 67.18 | 57.11 | 66.69 | 85.01 |
| Tanzania | 2010 | 221 | 819 | 18 | 15 | 14.73 | 3 | 0.114 | 0.586 | 0.569 | 0.037 | 0.002 | 0.023 | 0.071 | 6.27 | 0.28 | 4.00 | 12.20 | 0.152 | 0.033 | 0.123 | 0.187 | 26.74 | 5.74 | 21.68 | 32.94 |
| Thailand | 2017 | 465 | 6338 | 15 | 12 | 38.75 | 4 | 0.065 | 0.764 | 0.620 | 0.063 | 0.029 | 0.057 | 0.451 | 8.19 | 3.80 | 7.51 | 58.98 | 0.194 | 0.130 | 0.183 | 0.501 | 31.29 | 21.01 | 29.51 | 80.87 |

Table 4
Sample selection
For each country, the first row reports the sample size and the second row the corresponding value in local currency.

| Country | Outcome | Full sample | +Age 25–55 | +Circumstance availability | +Outcome availability (Recent Year) | +Outcome availability (Longitudinal) |
|---|---|---|---|---|---|---|
| Argentina | Equiv. HH Income | N = 93,473 | N = 37,190 | N = 37,181 | N = 31,577 | N = 3019 |
| | | ARP 73,704 | ARP 80,235 | ARP 80,240 | ARP 83,832 | ARP 78,103 |
| Chile | Equiv. HH Income | N = 21,087 | N = 8483 | N = 6307 | N = 4918 | N = 2808 |
| | | CLP 2,632,864 | CLP 2,736,782 | CLP 2,782,129 | CLP 2,925,172 | CLP 3,093,657 |
| China | Equiv. HH Income | N = 10,434 | N = 5579 | N = 5047 | N = 1118 | N = 243 |
| | | CNY 34,858 | CNY 40,624 | CNY 40,573 | CNY 32,430 | CNY 35,796 |
| Ethiopia | Equiv. HH Income | N = 6982 | N = 2117 | N = 764 | N = 742 | N = 660 |
| | | ETB 18,253 | ETB 18,465 | ETB 17,169 | ETB 17,645 | ETB 18,570 |
| Indonesia | Equiv. HH Income | N = 31,035 | N = 14,382 | N = 5426 | N = 3620 | N = 786 |
| | | IDR 13,231,805 | IDR 14,516,044 | IDR 18,097,790 | IDR 20,517,590 | IDR 26,036,236 |
| Malawi | Equiv. HH Expenditure | N = 3397 | N = 2243 | N = 440 | N = 440 | N = 362 |
| | | MWK 23,053 | MWK 24,301 | MWK 27,512 | MWK 27,512 | MWK 30,052 |
| Mexico | Equiv. HH Income | N = 30,789 | N = 11,075 | N = 9443 | N = 6047 | N = 3050 |
| | | MXP 21,039 | MXP 20,169 | MXP 20,279 | MXP 22,390 | MXP 21,123 |
| Peru | Equiv. HH Income | N = 71,758 | N = 26,710 | N = 26,095 | N = 15,280 | N = 2193 |
| | | PEN 6189 | PEN 7314 | PEN 7375 | PEN 9075 | PEN 9013 |
| Russia | Equiv. HH Income | N = 15,201 | N = 7597 | N = 1738 | N = 1383 | N = 1181 |
| | | RUB 287,663 | RUB 309,332 | RUB 309,794 | RUB 328,866 | RUB 323,750 |
| South Africa | Equiv. HH Income | N = 21,306 | N = 8602 | N = 3187 | N = 2402 | N = 670 |
| | | ZAR 40,550 | ZAR 49,357 | ZAR 65,522 | ZAR 74,904 | ZAR 71,921 |
| Tanzania | Equiv. HH Expenditure | N = 4289 | N = 2681 | N = 936 | N = 936 | N = 221 |
| | | TZS 486,370 | TZS 504,302 | TZS 532,841 | TZS 532,841 | TZS 464,709 |
| Thailand | Equiv. HH Income | N = 3649 | N = 1439 | N = 473 | N = 473 | N = 465 |
| | | THB 156,869 | THB 180,727 | THB 150,957 | THB 150,957 | THB 150,089 |

Table 5
All (lasso-selected) parameter categories

| Country | Parameters selected by lasso: Individual Income (^{●}), Household Income (^{◆}), and Household Expenditure (^{▲}) |
|---|---|
| Argentina | gender^{●◆}, birthyear^{●◆}, birthplace (current place of residence^{●}, different place than current residence^{●◆}, other province^{●◆}, neighboring country^{●}, other country^{●◆}) |
| Chile | gender^{●}, birthyear^{●◆}, father education (no schooling^{●◆}, primary, secondary^{●◆}, tertiary^{●◆}), mother education (no schooling^{●◆}, primary, secondary^{●◆}, tertiary^{●◆}), birthplace (national, foreign), ethnicity (not member of any indigenous population^{●◆}, Aymara^{●}, Rapa Nui, Quechua, Mapuche^{◆}, Atacameno^{●◆}, Coya, Kawaskar, Yagan, Diaguita^{●◆}), chronic disease^{●◆} (yes/no), labor force status of father (not working^{●}, employer^{◆}, self-employed^{◆}, employed^{◆}, domestic worker, armed forces^{◆}), labor force status of mother (not working^{●}, employer, self-employed, employed, domestic worker, armed forces^{◆}) |
| China | gender^{●}, birthyear^{●}, ethnicity (Han^{●}, Mongolian, Hui, Tibetan, Vaguer, Miao, Yi, Zhuang, Buyi, Korean, Man, Dong, Yao, Tujia, other^{●}), birthplace urbanity (city^{●}, suburban, county capital city, village^{●}) |
| Ethiopia | gender^{▲}, birthyear^{▲}, father education (no schooling^{●◆}, some nursery school, 1st grade, 2nd grade, 3rd grade^{▲}, 4th grade, 5th grade, 6th grade^{▲}, 7th grade, 8th grade, 9th grade, 10th grade, 11th grade, 12th grade, uncompleted non-university higher education, completed non-university higher education, university education, adult literacy program^{◆}, other literacy program, parochia education, koranic education, other), mother education (no schooling, some nursery school^{●◆}, 1st grade, 2nd grade, 3rd grade, 4th grade, 5th grade, 6th grade, 7th grade, 8th grade, 9th grade, 10th grade, 11th grade, 12th grade, uncompleted non-university higher education, completed non-university higher education, university education, adult literacy program^{●}, other literacy program, parochia education, koranic education, other), ethnicity (Amhara^{●▲}, Oromo^{●◆▲}, Tigrai^{●◆▲}, Adere, Afar, Gurage^{●◆}, Somali, other, Gedeo^{●◆▲}, Gamo, Kembata^{▲}, Wolaita^{●◆▲}, Hadiya^{◆}, Saho^{●◆▲}), religion (None^{◆}, Orthodox, Catholic^{▲}, Muslim, Other Christian, Protestant, Traditional^{●◆▲}, other) |
| Indonesia | gender^{◆▲}, birthyear^{●◆▲}, father education (no schooling, elementary schooling^{●◆▲}, junior high, senior high^{●◆}, junior college/college/university^{▲}, other^{◆}), mother education (no schooling, elementary schooling, junior high, senior high^{●◆▲}, junior college/college/university, other), ethnicity (Jawa^{●◆▲}, Sunda^{●◆}, Bali^{●◆▲}, Minang, Betawi, other^{▲}), religion (Islam^{▲}, Catholic^{◆▲}, Protestant, Hindu, Buddha, Konghucu), foreign language^{●◆▲} |
| Malawi | gender^{▲}, birthyear, father education (no schooling^{▲}, primary schooling, more than primary schooling, other), mother education (no schooling^{▲}, primary schooling^{▲}, more than primary schooling, other), religion (Catholic, Protestant^{▲}, Revival, Moslem^{▲}, Traditional, other) |
| Mexico | gender^{●◆}, birthyear^{●◆}, indigenous^{●◆} |
| Peru | gender^{●◆▲}, birthyear^{●◆▲}, birthplace (Amazonas^{●◆▲}, Áncash^{●◆}, Apurímac^{●◆}, Arequipa^{●◆▲}, Ayacucho^{●◆▲}, Cajamarca^{●◆▲}, Callao^{●◆▲}, Cusco^{▲}, Huancavelica^{●◆}, Huánuco^{●◆▲}, Ica^{●◆}, Junín^{●▲}, La Libertad^{●◆▲}, Lambayeque^{●◆▲}, Lima^{●◆▲}, Loreto^{●◆▲}, Madre de Dios^{●◆▲}, Moquegua^{◆▲}, Pasco, Piura^{●◆▲}, Puno, San Martin, Tacna^{▲}, Tumbes^{▲}, Ucayali^{●▲}, Other county^{▲}), chronic disease^{●◆▲}, language (Quechua^{●◆▲}, Aymara, other native language^{●◆▲}, Spanish^{●◆▲}, foreign language, deaf-dumb^{●◆▲}) |
| Russia | gender^{●}, birthyear, father education (without education/illiterate, elementary school/incomplete secondary school^{●▲}, professional courses, vocational training without secondary education, vocational training with secondary education, secondary education^{▲}, technical community college^{●▲}, institute/university/academy^{◆▲}, postgraduate course, academic degree^{●}), mother education (without education/illiterate, elementary school/incomplete secondary school^{●◆▲}, professional courses^{▲}, vocational training without secondary education, vocational training with secondary education, secondary education, technical community college, institute/university/academy^{●◆▲}, postgraduate course, academic degree), birthplace (Russia, Ukraine, Belorussia^{●▲}, Azerbaizhan, Kazakhstan^{◆}, Uzbekistan, other country^{▲}), father occupation (armed forces^{◆}, legislators/senior officials/managers^{●}, professionals^{▲}, technicians/associate professionals, clerks, service workers/shop market sales work, skilled agricultural and fishery worker^{▲}, craft and related trade workers, plant and machine operators/assemblers, elementary occupations), mother occupation (armed forces^{●▲}, legislators/senior officials/managers, professionals, technicians/associate professionals^{◆▲}, clerks, service workers/shop market sales work, skilled agricultural and fishery worker, craft and related trade workers, plant and machine operators/assemblers, elementary occupations^{●◆}), birthplace urbanity (city^{●◆▲}, urban-type settlement, village/Derevnia/Kishlak/Aul^{●◆▲}), height^{●◆} |
| South Africa | gender^{●◆▲}, birthyear^{●◆▲}, father education (Grade R/0, Grade 1^{●◆▲}, Grade 2, Grade 3^{●◆▲}, Grade 4, Grade 5^{▲}, Grade 6^{●}, Grade 7, Grade 8^{◆▲}, Grade 9^{●◆}, Grade 10^{◆▲}, Grade 11^{●▲}, Grade 12^{●◆▲}, other, no schooling^{●◆▲}, National Certificate Vocational 2, National Certificate Vocational 4, NTC 1, NTC 2, NTC 3), mother education (Grade R/0, Grade 1, Grade 2^{●}, Grade 3^{●◆}, Grade 4, Grade 5^{●◆▲}, Grade 6^{▲}, Grade 7^{◆}, Grade 8^{●◆▲}, Grade 9, Grade 10^{●◆▲}, Grade 11^{▲}, Grade 12^{●◆▲}, other, no schooling^{●◆▲}, National Certificate Vocational 2, National Certificate Vocational 4, NTC 3), foreign birthplace^{▲}, ethnicity (African, Coloured^{●◆}, Asian/Indian^{●◆▲}, White^{●◆▲}, other) |
| Tanzania | gender^{▲}, birthyear^{▲}, birthplace (non-foreign/foreign), ethnicity (Mhaya, Mnyambo, Mhangaza, Msubi, Kishubi, Mzinza, other), religion (Muslim, Catholic, Protestant, Other Christian, Traditional, other) |
| Thailand | gender, birthyear, father education (no education, less than P4, P4, more than P4), mother education (no education^{◆▲}, less than P4, P4, more than P4), wealth of parents (among the poorest households in the village^{◆}, around the middle in terms of wealth^{◆▲}, among the rich households in the village), land size of parents^{◆▲} |

Table 6
Year of interest, baseline and harmonized

| Country | Year baseline | Year harmonized | Difference in years |
|---|---|---|---|
| Argentina | 2015 | 2009 | 6 |
| Chile | 2009 | 2009 | 0 |
| China | 2014 | 2010 | 4 |
| Ethiopia | 2009 | 2009 | 0 |
| Indonesia | 2013 | 2006 | 7 |
| Malawi | 2010 | 2010 | 0 |
| Mexico | 2009 | 2009 | 0 |
| Peru | 2010 | 2009 | 1 |
| Russia | 2017 | 2009 | 8 |
| South Africa | 2017 | 2008 | 9 |
| Tanzania | 2010 | 2010 | 0 |
| Thailand | 2017 | 2009 | 8 |


Existing studies
See Table 7.
Table 7
Overview existing studies

| Country | Year | Study/source | Data source | Sample size | Sample restrictions | Circumstances | Outcome | Method | Index | Abs. | Rel. (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Argentina | 2014 | EqualChances.org | Encuesta Nacional sobre la Estructura Social | N.A. | Working age respondents | Education of Parents, Occupation of Parents, Origin (Race, Ethnic Origin, Area of Birth) | Equivalized Household Disposable Income | Parametric | Gini | 0.159 | 41.99 |
| Chile | 2005 | Ferreira et al. (2018) | Socio-Economic Database for Latin America and the Caribbean | N.A. | N.A. | Gender, Ethnicity, language, region of residence | Net Household Income | Parametric | MLD | 0.460 | 5.30 |
| Chile | 2009 | EqualChances.org | Encuesta de Caracterización Socioeconómica Nacional | N.A. | Working age respondents | Education of Parents, Occupation of Parents, Origin (Race, Ethnic Origin, Area of Birth) | Equivalized Household Disposable Income | Parametric | Gini | 0.230 | 49.94 |
| China | 2010 | Golley et al. (2019) | Survey of Women’s Social Status in China | 15,974 | Age 24–65 | Gender, Education of Father, Occupation of Father, Hukou status at birth, Region, Age | Individual Labor Earnings | Parametric | MLD | 0.140 | 25.00 |
| China | 2010–2014 | Song and Zhou (2019) | China Family Panel Studies | 5,892 | Households with children at school or positive education expenditures | Gender, Education of Father/Mother, Hukou Status at 3 years old | Individual Income | Parametric | Theil | 0.069 | 21.70 |
| China | 1989–2006 | Zhang and Eriksson (2010) | China Health and Nutrition Survey | 1287 | Age 20–50, parental background information through longitudinal matching available | Gender, Age, Birthplace, Education of Parents, Employment of Parents, Income of Parents | Individual Income | Parametric | Gini | N.A. | 63.00 |
| Ethiopia | 2005 | Ferreira et al. (2018) | Demographic and Health Survey | N.A. | N.A. | Region of birth, Religion, Mother tongue | Wealth Index | Parametric | Variance | 6.160 | 7.70 |
| Indonesia | 2005 | Ferreira et al. (2018) | Demographic and Health Survey | N.A. | N.A. | Religion | Wealth Index | Parametric | Variance | 2.450 | 2.10 |
| Malawi | 2010–2011 | Brunori et al. (2019a) | Third Integrated Household Survey | 30,137 | N.A. | Sex, Birthplace, Parental Education | Household Income | Tree | Gini | 0.235 | 49.64 |
| Mexico | 2006 | Juárez Wendelspiess Chávez (2015) | Mexican Social Mobility Survey | 5277 | Age 25–64 | Gender, Education of Father/Mother, Parents owned House, Socio-Economic Situation at age 14, Indigenous Group | Individual Income | Parametric | Variance | N.A. | 18.30 |
| Mexico | 2009 | EqualChances.org | Mexican Family Life Survey | N.A. | Working age respondents | Education of Parents, Occupation of Parents, Origin (Race, Ethnic Origin, Area of Birth) | Equivalized Household Disposable Income | Parametric | Gini | 0.135 | 23.80 |
| Peru | 2001 | Ferreira and Gignoux (2011) | Encuesta Nacional de Hogares | 13,621 | Age 30–49, household heads or spouses | Education of Father/Mother, Ethnicity, Region of Birth, Gender | Household per Capita Income | Non-Parametric | MLD | 0.163 | 29.30 |
| Russia | 2006–2016 | Brock et al. (2016) | Life in Transition Survey | N.A. | N.A. | Gender, Birthplace Urbanity, Ethnicity, Education of Father/Mother, Membership of Communist Party | Self-Reported Income over the last twelve months | Parametric | Gini | 0.130 | 34.50 |
| South Africa | 2008–2012 | Piraino (2015) | National Income Dynamics Study | 2587 | Age 20–44, only male respondents | Race, Education of Father | Individual Gross Income | Non-Parametric | MLD | N.A. | 23.70 |
| South Africa | 2014 | EqualChances.org | National Income Dynamics Study | N.A. | Working age respondents | Education of Parents, Occupation of Parents, Origin (Race, Ethnic Origin, Area of Birth) | Equivalized Household Disposable Income | Parametric | Gini | 0.337 | 57.66 |
| Tanzania | 2010–2011 | Brunori et al. (2019a) | World Bank, National Panel Survey | 20,569 | N.A. | Sex, Birthplace, Parental Education | Household Income | Tree | Gini | 0.177 | 44.71 |


Footnotes
4
Note that the current literature largely abstracts from time-variant circumstance characteristics. This abstraction can be rationalized by the blurry distinction between time-variant factors beyond individual control and individual efforts. For example, consider local economic shocks or local outbursts of conflict as potential embodiments of time-variant circumstances. Their effect could be confounded by individual migration decisions which are at least partially under individual control. However, as we outline below, our normative framework accounts for the effect of such factors to the extent that they are correlated with time-constant factors such as the region of birth.
5
This normative assumption is adopted by much of the empirical literature on IOp but can be easily relaxed, see Niehues and Peichl (2014) and Jusot et al. (2013). We refrain from doing so in our empirical application since restricting samples on availability of effort information would further reduce the number of observations.
6
\(\frac{\sigma ^2}{2}\) represents the residual variance that corrects for differences in the marginal impact of circumstances due to the log-transformation (Blackburn 2007).
7
Intuitively, k-fold cross-validation works as follows. The sample is divided into k folds. Under each specification, the model parameters are estimated on \(k-1\) folds and the ensuing predictions are benchmarked against the data points in the \(k^{th}\) fold. Repeating this procedure k times, one chooses the model that delivers the lowest average mean-squared prediction error across the k iterations.
8
The general idea of cross-validation is explained in footnote 7. In the case of lasso estimations, its implementation is as follows: We re-estimate Eq. 4 for different values of \(\lambda \) on each of the five folds. Ultimately, we choose the \(\lambda \) that on average minimizes the mean-squared prediction error across the five folds. The mean-squared prediction error is a standard measure of prediction accuracy (Hastie et al. 2013) and the appropriate target statistic to trade off upward and downward bias in inequality of opportunity estimates (Brunori et al. 2019b). In Table 3 we show the chosen values of \(\lambda \) for each country in our sample.
9
The post-lasso approach will yield results that are more in line with standard estimations based on OLS. This is the case since standard lasso retains parameter estimates that are shrunk by penalization. By contrast, and analogous to OLS, post-lasso re-estimates these parameters without penalization.
10
Accounting for year-specific shocks is necessary since the panel data used to estimate the fixed effect are unbalanced. In the case of a balanced panel, the individual fixed effect would be completely orthogonal to the year-specific shock, i.e. one could abstract from \(u_v\).
11
Technically, the Gini coefficient nevertheless yields conservative IOp estimates as the residual in the Gini decomposition does contain elements of between-group inequality (Brunori et al. 2019a).
12
For countries in which multiple panel data sets are available, we use the data set with the highest number of waves.
14
Point estimates for absolute IOp, relative IOp, as well as total inequality are disclosed in Table 3.
15
As highlighted above: Unless otherwise indicated the LB estimate refers to the standard lasso estimation.
16
This cross-country average conceals heterogeneity. In particular, the lower the sample size relative to the number of estimated circumstance parameters, the larger the difference between S and LB. See Table 3 where we list the ratio of sample size and estimated parameters by country.
17
Due to differences in the underlying data, we refrain from comparing our results to other IOp estimates in the relevant countries: See for example, Brock et al. (2016), Brunori et al. (2019a), Ferreira and Gignoux (2011), Ferreira et al. (2018), Golley et al. (2019), Piraino (2015), Song and Zhou (2019), Juárez Wendelspiess Chávez (2015), Zhang and Eriksson (2010). These differences pertain to reference periods, the considered outcomes of interest, the detail of available circumstance characteristics, sample selection criteria, estimation methods, as well as inequality indices. However, we provide detailed information on these studies in Table 7.