3.1 Model setup
The setup follows Huber and Melly (
2015). The wage equation for all individuals (employed or unemployed) is
$$\begin{aligned} Y^{*}=X\beta +v \, , \end{aligned}$$
(1)
where
\(Y^{*}\) denotes the latent log wage in the absence of selection,
X the vector of observable covariates that determine wages,
v the error term, and
\(\beta \) the vector of coefficients. We assume that
\(\beta _{0.5}=\beta \), i.e.,
\(\beta \) represents the median coefficients and
v represents the residual of a median regression. Assuming a linear quantile regression, the conditional
\(\tau \)-quantile of the latent wage
\(Q_{\tau }(Y^{*}|X)\) is specified by
$$\begin{aligned} Q_{\tau }(Y^{*}|X)=X\beta +Q_{\tau }(v|X)=X\beta _{\tau } \, , \end{aligned}$$
(2)
which also means that
\(Q_{\tau }(v|X)=X(\beta _{\tau }-\beta )\) is a linear function of
X. Correspondingly, the
\(\tau \)th quantile regression of
\(Y^{*}\) is
\(X\beta _{\tau }+v_{\tau }\), with
\(v_{\tau }=v-Q_{\tau }(v|X)=v-X(\beta _{\tau }-\beta )\).
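To fix ideas, the structure behind Eqs. (1) and (2) can be illustrated with a small location-scale simulation in which \(Q_{\tau }(v|X)\) is linear in X by construction; all coefficient values below are illustrative, not estimates from the data:

```python
# Sketch: a location-scale model in which the conditional tau-quantile of
# the latent wage is linear in X, as Eq. (2) requires. Illustrative only.
import random
import statistics

random.seed(0)
beta0, beta1 = 1.0, 0.5                          # median coefficients beta_{0.5}
tau = 0.9
z_tau = statistics.NormalDist().inv_cdf(tau)     # tau-quantile of N(0,1)

x = 2.0                                          # evaluate at one covariate value
# Errors v = (1 + 0.3 x) u with u ~ N(0,1), so Q_tau(v|x) = (1 + 0.3 x) z_tau
# is an affine function of x
draws = [beta0 + beta1 * x + (1 + 0.3 * x) * random.gauss(0, 1)
         for _ in range(200_000)]
emp_q = statistics.quantiles(draws, n=100)[89]   # empirical 90th percentile
theory_q = beta0 + beta1 * x + (1 + 0.3 * x) * z_tau
print(round(emp_q, 2), round(theory_q, 2))
```

The empirical conditional quantile matches the linear specification \(X\beta _{\tau }\) implied by Eq. (2).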
The selection problem arises because we only observe wages for employed individuals. Let
Y denote the observed wage and
D the selection indicator. We specify
$$\begin{aligned} D=1(Z\gamma +\varepsilon \ge 0) \, , \end{aligned}$$
where
Z is a strict superset of
X, thus also including instruments for selection, which are excluded from Eq. (
1), and
\(\varepsilon \) is assumed to be independent of
Z. The probability of selection
$$\begin{aligned} Pr(D=1|Z)=Pr(Z\gamma +\varepsilon \ge 0 | Z) \, \end{aligned}$$
(3)
is a function of
\(Z\gamma \). For the selective sample, the observation rule is
\(Y=Y^{*}\) (
\(Y^*\) observed) only if
\(D=1\). A conditional quantile in the selected sample is
$$\begin{aligned} Q_{\tau }(Y|Z)=X\beta _{\tau }+Q_{\tau }(v_{\tau }|Z,D=1) \, . \end{aligned}$$
(4)
The term
\(Q_{\tau }(v_{\tau }|Z,D=1)\) denotes the quantile-
\(\tau \)-specific selection bias, with
\(Q_{\tau }(v_{\tau }|Z,D=1)>(<)0\) representing positive (negative) selection. The conditional quantile can be rewritten as
$$\begin{aligned} Q_{\tau }(Y|Z) = Q_{\tau }(Y^{*}|Z,D=1) = X\beta _{\tau }+{\tilde{g}}(X,Z\gamma ) \end{aligned}$$
(5)
where
\(Q_{\tau }(v_{\tau }|Z,D=1)={\tilde{g}}(X,Z\gamma )\) because
\(v_{\tau }\) depends on
X and
\(D=1\) on
\(Z\gamma \).
The control function
\({\tilde{g}}(X,Z\gamma )\), which properly accounts for selection bias, should be a flexible function of
X and
\(Z\gamma \), which is challenging because of the curse of dimensionality when
X is multivariate. Nonparametric identification requires both independent variation in
\(Z\gamma \) given
X and identification at infinity. Identification at infinity means that with positive probability, based on the distribution of
\(Z\gamma \), the selection probability
\(Pr(D=1|Z)\) is close to one (Das et al.
2003). The selection model above implies that
\(Q_{\tau }(v_{\tau }|Z,D=1)\) converges to zero (no selection) if the employment probability
\(P(D=1|Z)\) converges to one, which is equivalent to
\(Z\gamma \) going to infinity.
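The mechanics of identification at infinity can be illustrated with a small simulation: when the errors are positively correlated, the median of v in the selected sample is biased upward, but the bias vanishes as \(Z\gamma \) grows. The correlation and index values below are purely illustrative:

```python
# Sketch: the selection bias Q_tau(v | D=1) disappears as the selection
# probability approaches one (identification at infinity). Illustrative only.
import random
import statistics

random.seed(1)

def median_selected(z_gamma, n=100_000, rho=0.8):
    """Median of the wage error v among selected draws,
    with corr(v, eps) = rho and selection rule D = 1(z_gamma + eps >= 0)."""
    kept = []
    for _ in range(n):
        eps = random.gauss(0, 1)
        v = rho * eps + (1 - rho ** 2) ** 0.5 * random.gauss(0, 1)
        if z_gamma + eps >= 0:
            kept.append(v)
    return statistics.median(kept)

bias_low = median_selected(z_gamma=0.0)   # Pr(D=1) = 0.5: strong positive selection
bias_high = median_selected(z_gamma=3.0)  # Pr(D=1) ~ 0.999: bias nearly gone
print(round(bias_low, 2), round(bias_high, 2))
```

For a large index, nearly all draws are selected, so the median of v in the selected sample is close to its unconditional value of zero.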
Extending upon Heckman (
1990) and Andrews and Schafgans (
1998), who consider the case where
the error term is independent of
X, both the intercept and the slope coefficients
\(\beta \) can be identified if we have observations with a selection probability close to one for each value of
X. Given the linear specification of
\(X\beta _\tau \), a smaller subspace of the support
A of
X suffices, where
\(E\left[ (X'X)\cdot I(X\in A)\right] \) can be inverted [
I(.) denotes the indicator function] and where the selection probability is close to one with positive probability. In our application, the selection probability is quite large for most observations, and the subset of observations with a selection probability close to one (to anticipate: the median (upper quartile) of the selection probabilities lies above 93% (96%) in all four subsamples considered; see Table
3) is sufficiently large to estimate
\(\beta _\tau \) consistently. In our application, we will use the coefficient estimates based on the identification-at-infinity sample to characterize the selection bias in the full sample.
3.2 Buchinsky’s approach
The selection correction approach proposed by Buchinsky (
1998;
2001) applies a standard Heckman selection approach with instruments (Heckman
1979; Vella
1998) to quantile regression. Buchinsky specifies the selection correction term in the second stage [Eq. (
6)] as a function of the inverse Mills ratio
\(\lambda (Z{\hat{\gamma }})\). However, even under joint normality of
\(\varepsilon \) and
v, the selection correction term
\(Q_{\tau }(v_{\tau }|Z,D=1)\) is generally not a linear function in
\(\lambda \). Thus, Buchinsky suggests approximating the selection correction term
\(Q_{\tau }(v_{\tau }|Z,D=1)\) by a power series (polynomial) in
\(\lambda \) (see Vella 1998 on semiparametric approaches for selection correction in mean regressions). Further, Buchinsky assumes that the joint distribution of
v and
\(\varepsilon \) is independent of
Z, conditional on the probability of selection
\(Pr(Z\gamma +\varepsilon > 0 | Z)\) (Huber and Melly
2015).
In the second step, the selection-corrected quantile regression
$$\begin{aligned} Q_{\tau }(Y|X)=X{\beta }_{\tau }+\theta _{\tau }g(\lambda ) \end{aligned}$$
(6)
is estimated for the selective sample with
\(D=1\). Equation (
6) presumes that the selection correction term
\(Q_{\tau }(v_{\tau }|Z,D=1)\) is well approximated by
\(\theta _{\tau }g(\lambda )\), where
g(.) is a power series in
\(\lambda \).
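A minimal sketch of how such second-stage correction regressors could be constructed follows; the index values stand in for the first-stage fitted values \(Z{\hat{\gamma }}\) and are hypothetical:

```python
# Sketch: Buchinsky-style correction regressors, a power series (here
# quadratic) in the inverse Mills ratio of the first-stage index.
from statistics import NormalDist

nd = NormalDist()

def inverse_mills(index):
    """lambda(index) = phi(index) / Phi(index)."""
    return nd.pdf(index) / nd.cdf(index)

indices = [-1.0, 0.0, 1.0, 2.5]          # hypothetical Z*gamma_hat values
regressors = [(inverse_mills(i), inverse_mills(i) ** 2) for i in indices]
# lambda shrinks toward zero as the selection probability rises, so the
# correction vanishes for observations that are selected almost surely
print([round(lam, 3) for lam, _ in regressors])
```

Note how the correction terms are largest for observations with a low selection probability (negative index) and negligible for observations selected almost surely.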
Without the assumption that the joint distribution of
v and
\(\varepsilon \) is independent of
X conditional on
\(Z\gamma \), the selection model specified by Eqs. (
2) and (
3) implies that the selection correction term
\(Q_{\tau }(v_{\tau }|Z,D=1)\) is some unknown function of both
X and
\(Z{\gamma }\), see discussion of Eq. (
5) in Sect.
3.1.
3.3 Huber–Melly test for conditional independence
Huber and Melly (
2015) propose a quantile regression based test for the conditional independence assumption, which says that the joint density of
v and
\(\varepsilon \) is independent of
Z conditional on
\(Z\gamma \). As noted by Huber and Melly (
2015), Buchinsky’s approach builds upon this conditional independence assumption, which implies homogeneous slope coefficients across all quantiles, see discussion of Eq. (
2) in Sect.
3.1.
We illustrate this point in the following. Conditional independence implies for the joint density of
v and
\(\varepsilon \):
$$\begin{aligned} f_{{v},\varepsilon }(\cdot |Z)=f_{{v},\varepsilon }(\cdot |Pr(D=1|Z))=f_{{v},\varepsilon }(\cdot |Z\gamma ) \, . \end{aligned}$$
(7)
When there is no sample selection, i.e.,
\(Pr(D=1 | Z)=1 \,\forall Z\), Eq. (
7) implies that
v and
\(\varepsilon \) are independent of
Z. Under conditional independence, the quantile regression coefficients
\(\beta _{\tau }\) are identified when controlling for the selection bias term
\(Q_{\tau }(v_{\tau }|Z,D=1)\) only by a flexible function of
\(Z\gamma \) as in Buchinsky (1998, 2001), see also Huber and Melly (
2015, Sect. 2.2).
Conditional independence in Eq. (
7) also holds for
\({v}_{\tau }\) and
\(\varepsilon \), implying that
\(Q_{\tau }(v_{\tau }|Pr(D=1|Z),D=1) - Q_{\tau }(v|Pr(D=1|Z),D=1)\) does not depend upon
Z conditional upon the selection probability. Thus, the term
\(X(\beta _{\tau }-\beta )\) reduces to a constant shift in the intercept, meaning that the slope coefficients in
\(\beta _{\tau }\) do not depend upon
\(\tau \).
When the conditional independence assumption does not hold, slope coefficients \(\beta _{\tau }\) may vary across quantiles, which is typically a motivation as to why researchers apply quantile regression in the first place. This limits the applicability of Buchinsky’s approach.
Huber and Melly (
2015) suggest a test based on the entire process of quantile regression coefficients to investigate whether the conditional independence assumption holds. They estimate quantile coefficients for a fine grid of quantiles across the distribution and then test the null hypothesis that the slope coefficients are identical. Violations of the null hypothesis are detected by applying Kolmogorov–Smirnov (KS) and Cramér–von Mises (CM) test statistics to the coefficient process across quantiles. In practice, Huber and Melly use a grid of quantiles and suggest implementing the test for a range from the 10th to the 90th percentile as a starting point. The first stage is estimated using the semiparametric Klein and Spady (
1993) estimator. The sample selection correction is based on a polynomial in the inverse Mills ratio of the estimated index function. Inference is based on resampling the influence function of the quantile regression estimator, building on the differentiability of the selection correction function to take account of the first-stage estimation error.
3.4 Our approach
In short, we first implement Buchinsky’s approach based on the original data and then apply the conditional independence test, which strongly rejects. We therefore suggest transforming the dependent variable to account for heteroskedasticity in the original data and then applying Buchinsky’s approach to the transformed dependent variable. Relying on identification at infinity, the transformation is based on quantile regressions for the subsample with a very high probability of participating. In our application, we succeed in finding a transformation after which the Huber–Melly test passes. Note that finding such a transformation is not guaranteed, and we perform a specification search to find a proper transformation. If the conditional independence assumption is not rejected for the transformed model, we can use the transformed model to account for selection bias. Transforming back the dependent variable allows us to estimate counterfactual distributions in the absence of selection or in the presence of a different selection mechanism.
We now describe the different steps of our approach in detail:
1.
To estimate the probability of being in the selective sample, we estimate a Probit regression
\(Pr(D=1|Z)=\Phi (Z\gamma )\), assuming that the distribution of
\(\varepsilon \) in Eq. (
3) is independent of
Z.
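Step 1 amounts to computing fitted selection probabilities of the form \(\Phi (Z{\hat{\gamma }})\). A minimal sketch, with a hypothetical coefficient vector and covariate rows standing in for the estimated Probit:

```python
# Sketch: first-stage fitted selection probabilities Pr(D=1|Z) = Phi(Z*gamma).
# The coefficients and covariate rows are hypothetical placeholders.
from statistics import NormalDist

Phi = NormalDist().cdf
gamma = [0.8, 0.5, -0.3]                     # placeholder Probit coefficients
rows = [[1.0, 1.2, 0.4], [1.0, -0.5, 2.0]]   # Z including a constant

p_hat = [Phi(sum(z * g for z, g in zip(row, gamma))) for row in rows]
print([round(p, 3) for p in p_hat])
```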
2.
Based on the Probit estimates in step 1, a subsample of the data is determined for which identification at infinity is plausible, i.e., selection is negligible. We estimate standard quantile regressions based on this identification-at-infinity subsample. Using coefficient estimates
\(\delta _{u}\),
\(\delta _{l}\) at the upper quantile
u and the lower quantile
l, respectively, we then estimate the predicted conditional quantile differences (
l and
u are tuning parameters)
$$\begin{aligned} \sigma (X,\delta )=X\delta _{u}-X\delta _{l} \end{aligned}$$
(8)
for a worker with characteristics
X. The transformation then involves dividing
Y by
\(\sigma (X,\delta )\).
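The rescaling in step 2 can be sketched as follows; the quantile coefficients are hypothetical placeholders for the estimates from the identification-at-infinity subsample:

```python
# Sketch of the transformation in Eq. (8): the predicted interquantile
# spread from the identification-at-infinity subsample rescales the outcome.
delta_u = [2.0, 0.40]   # placeholder coefficients at the upper quantile u
delta_l = [1.0, 0.10]   # placeholder coefficients at the lower quantile l

def spread(x):
    """sigma(X, delta) = X*delta_u - X*delta_l for X = (1, x)."""
    return (delta_u[0] - delta_l[0]) + (delta_u[1] - delta_l[1]) * x

wages = [(1.5, 10.0), (3.0, 14.0)]       # pairs (x, observed log wage Y)
transformed = [y / spread(x) for x, y in wages]
print([round(t, 3) for t in transformed])
```

Dividing Y by the predicted spread removes the covariate-driven heteroskedasticity that the conditional independence test reacts to.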
3.
Next, we run selection-corrected quantile regressions for the transformed outcome:
$$\begin{aligned} Q_{\tau }\left( \left. \frac{Y}{\sigma (X,\delta )} \right| X \right) =X{\check{\beta }}_{\tau }+g(\theta _{\tau },Z\gamma ) \, . \end{aligned}$$
(9)
We specify the selection correction as a piecewise constant function, with
\(g(\theta _{\tau },Z\gamma )=\sum _{j=1}^4 \theta _{\tau ,j} I(Z\gamma \in Q_j)\) involving dummies for four quintiles of the propensity score
\(I(Z\gamma \in Q_j)\) and
\(\theta _{\tau }=(\theta _{\tau ,j})_{j=1,\ldots ,4}\) (the highest quintile
\(Q_5\) represents the omitted category).
Then, as our implementation of the Huber–Melly test for conditional independence, we use a Wald test of the equality of the slope coefficients
\({\check{\beta }}_{\tau }\) along a grid of
\(\tau \).
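The Wald test of slope equality in step 3 can be sketched as follows; the coefficient estimates and their covariance matrix are hypothetical placeholders (in our application they would come from the bootstrap):

```python
# Sketch: Wald test of H0: beta_{tau_1} = ... = beta_{tau_K} for one slope
# coefficient over a grid of quantiles. All numbers are illustrative.
betas = [0.52, 0.50, 0.49, 0.51]       # slope at tau = 0.2, 0.4, 0.6, 0.8
cov = [[0.004 if i == j else 0.001 for j in range(4)] for i in range(4)]

# Contrast against the first quantile's slope: d_k = beta_k - beta_1
diffs = [b - betas[0] for b in betas[1:]]
# Covariance of the contrasts: V_kl = cov_kl - cov_k1 - cov_1l + cov_11
V = [[cov[i + 1][j + 1] - cov[i + 1][0] - cov[0][j + 1] + cov[0][0]
      for j in range(3)] for i in range(3)]

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [M[r][k] - f * M[c][k] for k in range(n + 1)]
    return [M[i][n] / M[i][i] for i in range(n)]

# Wald statistic d' V^{-1} d, compared to a chi-square critical value (3 df)
x = solve(V, diffs)
wald = sum(d * xi for d, xi in zip(diffs, x))
print(round(wald, 3))
```

With nearly identical slopes across quantiles, as in this example, the statistic stays far below conventional chi-square critical values and the test does not reject.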
4.
This step assumes that the conditional independence test in the previous step passes. We run OLS for the transformed model on the identification-at-infinity sample and compute the implied residuals for the entire sample. We then estimate the selection effects along the distribution based on quantile regressions of these residuals for the entire sample.
5.
Finally, we undo the transformation by multiplying the coefficients by \(\sigma (X,\delta )\).
For simplicity, we implement the conditional independence test as a Wald test of the equality of slope coefficients over an equi-spaced grid of quantiles. Our application differs from Huber and Melly (
2015) in the following three respects, which prevent us from using their implementation. First, by bootstrapping the entire estimation process, our inference takes account of the estimation error in all stages, including the transformation. Second, a weighted cluster bootstrap avoids nonconvergence of the Probit in the first stage and is cluster robust at the regional level, which is the level of variation in the instruments.
Third, we approximate the selection correction term by a piecewise constant selection correction function, which is non-differentiable. Furthermore, implementing the Huber–Melly test for Buchinsky’s estimator using a polynomial in the inverse Mills ratio based on the untransformed model requires a lot of computation time due to our large sample size.
If the conditional independence test for the transformed model rejects, we use this result to respecify our estimation approach. Note as a caveat that inference for our Wald tests for homogeneous slopes does not take account of the fact that we search for a transformation such that the conditional independence test passes. Hence, multiple hypothesis testing is a concern, given that we search for the proper specification of the transformation model.
A key point is that, in contrast with the standard concern in the literature about searching for significant effects by running different model specifications, here we search for a transformation of the dependent variable which leads to a non-rejection. Thus, standard approaches (e.g., Bonferroni/Holm) to adjusting critical values (
p-values) under the null hypothesis do not apply; rather, power concerns arise. Our approach involves testing different (typically incompatible) null hypotheses, and the validity of the final estimates hinges on the non-rejected null hypothesis being true. To explore whether the first-best transformation involves a singular non-rejection, we also report the results for the second-best transformation.
The latter prove very close to the first-best results, thus strengthening our findings. As an additional robustness check, we perform a random split of the sample into a training sample, used to estimate the transformation model, and a validation sample, used to perform the conditional independence test and to estimate the selection-corrected quantile regressions. Our findings show that the transformation model from the training sample implies a non-rejection of the conditional independence test when implemented for the validation sample. Also, the model fit in the validation sample is very good. These additional findings are available upon request.
As part of our specification search, we investigate which quantile regression coefficients change strongly across quantiles. To illustrate this point, note that, based on preliminary estimates, the conditional independence tests never passed for a model pooling both education groups. Therefore, we conclude that the nature of the selection bias differs between the two education groups, which motivates us to estimate separate models by education group.
3.5 Counterfactual wage distribution under alternative selection rules
We use the estimated selection-corrected quantile regressions to estimate the counterfactual wage distribution under different selection rules. We estimate the counterfactual distribution using a selection-corrected Melly (
2006) approach as in Albrecht et al. (
2009) (see also Machado and Mata
2005; Chernozhukov et al.
2013), while taking account of the transformation of the outcome. Let
Z,
X,
\(g(Z\gamma )\) apply to the observed sample and
\({\tilde{Z}}\),
\({\tilde{X}}\), and
\(g({\tilde{Z}}{\tilde{\gamma }})\) to the counterfactual sample, where
\({\tilde{\gamma }}\) represents the counterfactual selection rule. Specifically, we estimate two counterfactuals: First, the wage distribution if all individuals in the sample were employed, and, second, the wage distribution if the selection rule of a different calendar year applies. The first counterfactual involves the covariates
\({\tilde{X}}\) of the entire sample and sets
\(g(\theta _{\tau },{\tilde{Z}}{\tilde{\gamma }})\) equal to zero, i.e.,
\(\theta _{\tau }=0\), corresponding to a selection probability of one. For the second counterfactual,
\({\tilde{Z}}\) and
\({\tilde{X}}\) represent the employees and
\(g({\tilde{Z}}{\tilde{\gamma }})\) their selection rule (implied by the first stage Probit estimates) in the different calendar year.
Our implementation of the Melly (
2006) approach uses predictions of conditional quantiles for a fine grid of equi-spaced
\(\tau \in \{0.01,0.02,\ldots ,0.99\}\) for each observation in the counterfactual sample to estimate the conditional distribution of log wages. The counterfactual conditional quantile is
$$\begin{aligned} Q_{\tau }(Y|{\tilde{Z}})=\sigma ({\tilde{X}},\delta )\left[ {\tilde{X}}{\check{\beta }}_{\tau }+g(\theta _{\tau },{\tilde{Z}}{\tilde{\gamma }})\right] \, , \end{aligned}$$
where
\({\check{\beta }}_{\tau }\),
\(\delta \), and
\(g(\theta _{\tau },.)\) (including the definition of the quintile dummies) are estimates based on the observed sample.
We then stack the 99 predictions for all individual observations in the counterfactual sample represented by (\({\tilde{Z}},{\tilde{X}}\)) and calculate the unconditional empirical quantiles of the entire expanded sample, where the number of observations is 99 times the number of observations in the counterfactual sample. This counterfactual distribution, denoted by \(T_{Y}({\tilde{Z}},{\check{\beta }},\delta ,\theta ,{\tilde{\gamma }})\), represents the counterfactual distribution of Y for the sample with characteristics \({\tilde{Z}}\), the alternative selection rule \({\tilde{\gamma }}\), the selection-corrected coefficients for the transformed model \({\check{\beta }}\), the coefficients of the selection correction terms \(\theta \), and the transformation coefficients \(\delta \).
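The stacking step can be sketched as follows; the linear quantile coefficients are hypothetical placeholders, and the retransformation by \(\sigma ({\tilde{X}},\delta )\) and the selection term are omitted for brevity:

```python
# Sketch of the Melly-type counterfactual: predict 99 conditional quantiles
# per observation, stack them, and read off unconditional quantiles.
import statistics

grid = [i / 100 for i in range(1, 100)]          # tau = 0.01, ..., 0.99
beta = {tau: 0.5 + 0.4 * tau for tau in grid}    # hypothetical beta_tau slopes

x_sample = [1.0, 2.0, 3.0]                       # counterfactual covariates

# Stack the 99 conditional quantile predictions per observation ...
stacked = sorted(x * beta[tau] for x in x_sample for tau in grid)
# ... and compute unconditional quantiles of the expanded sample
deciles = statistics.quantiles(stacked, n=10)
print([round(d, 2) for d in deciles])
```

The expanded sample has 99 times as many rows as the counterfactual sample, and its empirical quantiles estimate the unconditional counterfactual distribution.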
The difference between the observed wage distribution, which is denoted by
\(TO_{Y}\) representing the quantiles of
Y in the selective observed sample with
\(D=1\), and the counterfactual distribution
\(T_{Y}({\tilde{Z}},{\check{\beta }},\delta ,\theta ,{\tilde{\gamma }})\) is given by
$$\begin{aligned} TO_{Y}-T_{Y}({\tilde{Z}},{\check{\beta }},\delta ,\theta ,{\tilde{\gamma }}) \, . \end{aligned}$$
(10)
This difference measures the total effect of selection relative to the counterfactual.
We can now decompose the total selection effect into a component due to differences in observed characteristics driving wages, i.e., the difference between
X and
\({\tilde{X}}\), and a component due to differences in selection based on unobservables. To this end, we calculate the counterfactual distribution denoted by
\(T_{Y}({\tilde{X}},\alpha )\) based on running linear quantile regressions using
X from the observed sample of employees (without transformation) and then predicting the counterfactual distribution for the sample with
\({\tilde{X}}\) using the Melly (
2006) approach as described above. Here,
\(\alpha \) denotes the quantile regression coefficients for the observed sample.
The total selection effect in Eq. (
10) can be decomposed into the effect of changes in observable characteristics
$$\begin{aligned} TO_{Y}-T_{Y}({\tilde{X}},\alpha ) \, , \end{aligned}$$
(11)
and the residual effect of selection on unobservables
$$\begin{aligned} T_{Y}({\tilde{X}},\alpha ) -T_{Y}({\tilde{Z}},{\check{\beta }},\delta ,\theta ,{\tilde{\gamma }})\, . \end{aligned}$$
(12)
We now discuss the two cases separately. The first counterfactual wage distribution, which would prevail if all observed individuals in a given year, both full-time workers and unemployed, were employed and earning market wages, is obtained by setting
\(\theta _{\tau }\) equal to zero. Then, Eq. (
10) defines the total effect of selection into work, which is decomposed into the selection effect due to observables [Eq. (
11)] and the effect of selection on unobservables [Eq. (
12)] when contrasting full-time workers with the total sample of full-time workers and unemployed.
The second counterfactual wage distribution allows us to study the effect of changes in selection over time. To estimate this counterfactual, we fix the conditional probability of selection into full-time work, i.e., the index \(Z\gamma \), and the distribution of observed characteristics at the level of the base year. Using the coefficient estimates obtained in the observation year (in our application the year 2010), we estimate the counterfactual wage distribution under the selection rule of a base year (in our application the year 1995). Let the index b denote the base year and o the observation year.
Then,
$$\begin{aligned} TO_{Y}^o-T_{Y}(Z^b,{\check{\beta }}^o,\delta ^o,\theta ^o,{\tilde{\gamma }}^b) \, \end{aligned}$$
(13)
is the total selection effect. It can be decomposed as above into the effect of the change between base year and observation year in the selection on observables and in the selection on unobservables, both among full-time workers. To account for the selection on observables, we estimate the counterfactual distribution
\(T_{Y}({\tilde{X}},\alpha )\) [as in Eq. (
11)] with observables in the 1995 employment sample
\({\tilde{X}}\) and coefficients
\(\alpha \) for wage regressions among the employed in 2010. To account both for selection on observables and unobservables, we estimate
\(T_{Y}(Z^b,{\check{\beta }}^o,\delta ^o,\theta ^o,{\tilde{\gamma }}^b)\) [as in Eq. (
12)] where
\({\check{\beta }}^o,\delta ^o,\theta ^o\) represent the coefficient estimates of our selection-corrected quantile regressions in (
\(o=\)) 2010.
\(Z^b\) are the sample characteristics for the employed in 1995,
\({\tilde{\gamma }}^b\) the coefficients of the selection model in 1995, and
\(Z^b{\tilde{\gamma }}^b\) determines the 1995 selection probability.
The following standard caveat applies: These counterfactual distributions do not account for general equilibrium effects, which might lead to changing returns to skills in response to an influx of previously unemployed individuals into employment [see the detailed discussion in Fortin et al. (
2011)]. One likely response to such an influx would be falling returns to those skill levels over-represented among the unemployed, e.g., low levels of education. Therefore, returns to education might increase due to higher relative scarcity. Then, the estimated counterfactual wage distribution would be less dispersed than the one arising when all unemployed are employed and general equilibrium effects operate.