1 Introduction
2 The DS-HECK estimator
2.1 Settings
2.2 Estimation of the selection equation
-
Step 1 (post-lasso probit). We start by estimating a penalized probit of \(y_2\) on \(\textbf{x}\) and \(\textbf{z}\) using the lasso penalty:
$$\begin{aligned} ({\hat{{\beta }}}, {\hat{{\eta }}})&= \mathop {\mathrm {arg\,min}}\limits _{{\beta }, {\eta }} \mathbb {E}_N (\Lambda _i({\beta }, {\eta })) + \uplambda _1 ||({\beta }, {\eta })||_1, \end{aligned}$$
where \( \mathbb {E}_N \) denotes the sample mean over the N observations, \(\Lambda _i(\cdot )\) is the probit negative log-likelihood for observation i, \(||\cdot ||_1 \) is the lasso (\(l_1\)) norm of the parameters, and \(\uplambda _1\) is a tuning parameter chosen using the plug-in method of Drukker and Liu (2022). This produces a subset of the variables in \(\textbf{z}\) indexed by \(support({\hat{{\eta }}})\), where for a p-vector v, \(support(v):=\{ j \in \{1,..., p\}: v_j \ne 0\}\). These variables are used in the post-lasso probit:
$$\begin{aligned} ({\tilde{{\beta }}}, {\tilde{{\eta }}})&= \mathop {\mathrm {arg\,min}}\limits _{{\beta }, {\eta }} \mathbb {E}_N (\Lambda _i({\beta }, {\eta })): support({\eta }) \subseteq support({\hat{{\eta }}}). \end{aligned}$$
This yields the sparse probit estimates \(({\tilde{{\beta }}}, {\tilde{{\eta }}})\), where \({\tilde{{\eta }}}\) contains only a few non-zero elements. Belloni et al. (2016b) propose using these estimates to construct weights \({\hat{f}}_i = {\hat{w}}_i/{\hat{\sigma }}_i\), where \({\hat{w}}_i = \phi (\textbf{x}_i'{\tilde{{\beta }}} + \textbf{z}_i'{\tilde{{\eta }}})\) and \({\hat{\sigma }}_i^2 = \Phi (\textbf{x}_i'{\tilde{{\beta }}} + \textbf{z}_i'{\tilde{{\eta }}}) (1 - \Phi (\textbf{x}_i'{\tilde{{\beta }}} + \textbf{z}_i'{\tilde{{\eta }}}))\), for \(i=1,\ldots , N\).
-
Step 2. We use the weights from Step 1 to run weighted lasso regressions: for each variable \(x_j\) in \(\textbf{x}\), \(j=1, \ldots , k\), we run the penalized regression of \({\hat{f}}_i x_{ij}\) on \({\hat{f}}_i \textbf{z}_i\),
$$\begin{aligned} {\hat{\theta }}_j = \mathop {\mathrm {arg\,min}}\limits _{\theta _j} \mathbb {E}_N ({\hat{f}}_i^2(x_{ij} - \textbf{z}_i'\theta _j)^2) + \uplambda _2||\theta _j||_1, \end{aligned}$$
where \(\uplambda _2\) is chosen by the plug-in method of Drukker and Liu (2022). For each element of \(\textbf{x}\), this produces a selection from the variables in \(\textbf{z}\) indexed by \(support({{\hat{\theta }}}_j), j=1, \ldots , k\).
-
Step 3 (double-selection probit). We run a probit of \(y_2\) on \(\textbf{x}\) and the union of the variables from \(\textbf{z}\) selected in Steps 1 and 2, weighting each observation by \({\hat{f}}_i/{\hat{\sigma }}_i\):
$$\begin{aligned} ({\check{{\beta }}}, {\check{\eta }}) = \mathop {\mathrm {arg\,min}}\limits _{\beta , \eta } \mathbb {E}_N (\Lambda _i({\beta }, \eta ){\hat{f}}_i/{\hat{\sigma }}_i), \end{aligned}$$
where \(support(\eta ) \subseteq support({{\hat{{\eta }}}}) \cup support({{\hat{\theta }}}_1) \cup \ldots \cup support({\hat{\theta }}_k)\).
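The three steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' dsheckman code: the lasso, post-lasso and weighted probits are all solved by proximal gradient descent (ISTA), the Step 2 weighted lasso is solved by coordinate descent, \(\uplambda _1\) and \(\uplambda _2\) are fixed by hand instead of being set by the Drukker and Liu (2022) plug-in rule, no intercept is included, and every function name (fit_probit, weighted_lasso, ds_selection_probit) is hypothetical. The fixed step size 1/L is valid because the second derivative of the probit negative log-likelihood with respect to the index lies in (0, 1).

```python
import math

import numpy as np

# Hypothetical helpers sketching Steps 1-3; not the dsheckman implementation.
_erfc = np.vectorize(math.erfc)


def norm_cdf(t):
    return 0.5 * _erfc(-t / math.sqrt(2.0))      # erfc keeps both tails accurate


def norm_pdf(t):
    return np.exp(-0.5 * t ** 2) / math.sqrt(2.0 * math.pi)


def probit_score(X, y, b, s):
    """Gradient of E_N[s_i * Lambda_i(b)], Lambda_i = probit negative log-likelihood."""
    t = X @ b
    pdf = norm_pdf(t)
    cdf = np.clip(norm_cdf(t), 1e-300, None)     # Phi(t)
    sf = np.clip(norm_cdf(-t), 1e-300, None)     # 1 - Phi(t), without cancellation
    g = -y * pdf / cdf + (1.0 - y) * pdf / sf    # dLambda_i / dt_i
    return X.T @ (s * g) / len(y)


def fit_probit(X, y, lam=0.0, s=None, n_iter=1000):
    """ISTA for E_N[s_i Lambda_i(b)] + lam ||b||_1 (lam = 0: plain or weighted probit)."""
    s = np.ones(len(y)) if s is None else s
    # d^2 Lambda_i / dt^2 lies in (0, 1), so L bounds the Hessian and 1/L is a safe step.
    L = s.max() * np.linalg.eigvalsh(X.T @ X / len(y)).max()
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        v = b - probit_score(X, y, b, s) / L
        b = np.sign(v) * np.maximum(np.abs(v) - lam / L, 0.0)  # soft-thresholding
    return b


def weighted_lasso(target, Z, f, lam, n_sweeps=200):
    """Coordinate descent for E_N[f_i^2 (target_i - z_i' theta)^2] + lam ||theta||_1."""
    n, p = Z.shape
    Zt, resid = Z * f[:, None], target * f       # f_i z_i and f_i x_ij (theta = 0)
    col_sq = (Zt ** 2).sum(axis=0) / n
    theta = np.zeros(p)
    for _ in range(n_sweeps):
        for j in range(p):
            rho = Zt[:, j] @ resid / n + col_sq[j] * theta[j]  # partial-residual fit
            # No 1/2 in front of the squared loss, hence the threshold lam / 2.
            new = np.sign(rho) * max(abs(rho) - lam / 2.0, 0.0) / col_sq[j]
            resid -= Zt[:, j] * (new - theta[j])
            theta[j] = new
    return theta


def ds_selection_probit(x, z, y2, lam1, lam2):
    k = x.shape[1]
    # Step 1: lasso probit, post-lasso probit, weights f_i = w_i / sigma_i.
    eta_hat = fit_probit(np.column_stack([x, z]), y2, lam=lam1)[k:]
    s1 = np.flatnonzero(eta_hat)                 # support(eta_hat)
    X1 = np.column_stack([x, z[:, s1]])
    idx = X1 @ fit_probit(X1, y2)                # x_i'beta_tilde + z_i'eta_tilde
    sigma = np.sqrt(norm_cdf(idx) * norm_cdf(-idx))
    f = norm_pdf(idx) / sigma
    # Step 2: weighted lasso of f_i x_ij on f_i z_i for each column of x.
    s2 = [np.flatnonzero(weighted_lasso(x[:, j], z, f, lam2)) for j in range(k)]
    # Step 3: probit on x and the union of supports, weighted by f_i / sigma_i.
    union = np.unique(np.concatenate([s1, *s2])).astype(int)
    coef = fit_probit(np.column_stack([x, z[:, union]]), y2, s=f / sigma)
    return s1, s2, union, coef[:k], coef[k:]


# Toy data: z columns 0 and 1 drive selection; z column 2 drives x.
rng = np.random.default_rng(0)
n, p = 500, 20
z = rng.standard_normal((n, p))
x = (0.7 * z[:, 2] + 0.7 * rng.standard_normal(n))[:, None]
y2 = (0.5 * x[:, 0] + z[:, 0] - z[:, 1] + rng.standard_normal(n) > 0).astype(float)
s1, s2, union, beta_check, eta_check = ds_selection_probit(x, z, y2, lam1=0.05, lam2=0.1)
```

On this toy design, Step 1 is expected to keep columns 0 and 1 of \(\textbf{z}\), Step 2 is expected to add column 2 (the one that drives x), and beta_check should land near the true value 0.5. Standard errors are not computed in this sketch.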
vce(robust) syntax. We obtain a regular \(\sqrt{N}\)-consistent estimator of \(\beta \) with standard inference, even though a penalized estimator that is not \(\sqrt{N}\)-consistent is used to carry out model selection for the high-dimensional nuisance parameter \(\eta \).
2.3 Connection to redundancy of moment conditions
2.4 Choice of penalty parameter
2.5 Estimation of the main equation
2.6 Variance matrix estimation
dsheckman command implements this variance estimator.
3 Monte Carlo simulations
dsheckman, available on the authors’ web pages along with the data sets and code for the simulations and the application, and we describe its syntax in Appendix C.
3.1 Setup
3.2 Results
| | | \( p = 1000 \) | | | | | \( p = 2100 \) | | | | |
---|---|---|---|---|---|---|---|---|---|---|---|
Estimator | True | MSE | Mean | SD | \({\overline{SE}}\) | Rej. Rate | MSE | Mean | SD | \({\overline{SE}}\) | Rej. Rate |
\(\gamma = 0.4\) | |||||||||||
Oracle | 1 | 0.0014 | 0.9984 | 0.0369 | 0.0374 | 0.0450 | 0.0013 | 0.9985 | 0.0362 | 0.0374 | 0.0510 |
DS | 1 | 0.0025 | 0.9977 | 0.0501 | 0.0484 | 0.0540 | 0.0022 | 0.9991 | 0.0464 | 0.0486 | 0.0380 |
Naive | 1 | 0.0049 | 0.9424 | 0.0395 | 0.0350 | 0.4200 | 0.0052 | 0.9379 | 0.0360 | 0.0349 | 0.4310 |
OLS | 1 | 0.0046 | 0.9422 | 0.0350 | 0.0350 | 0.3890 | 0.0045 | 0.9421 | 0.0339 | 0.0350 | 0.3610 |
\(\gamma = 0.5\) | |||||||||||
Oracle | 1 | 0.0014 | 0.9993 | 0.0380 | 0.0387 | 0.0430 | 0.0015 | 0.9980 | 0.0389 | 0.0387 | 0.0480 |
DS | 1 | 0.0025 | 1.0000 | 0.0504 | 0.0502 | 0.0570 | 0.0027 | 0.9979 | 0.0518 | 0.0502 | 0.0600 |
Naive | 1 | 0.0067 | 0.9289 | 0.0409 | 0.0362 | 0.5240 | 0.0075 | 0.9229 | 0.0389 | 0.0361 | 0.5840 |
OLS | 1 | 0.0063 | 0.9291 | 0.0353 | 0.0362 | 0.4930 | 0.0066 | 0.9277 | 0.0373 | 0.0363 | 0.5190 |
\(\gamma = 0.6\) | |||||||||||
Oracle | 1 | 0.0016 | 0.9995 | 0.0395 | 0.0401 | 0.0440 | 0.0015 | 0.9984 | 0.0393 | 0.0400 | 0.0520 |
DS | 1 | 0.0025 | 1.0001 | 0.0504 | 0.0519 | 0.0460 | 0.0027 | 0.9981 | 0.0522 | 0.0521 | 0.0500 |
Naive | 1 | 0.0089 | 0.9183 | 0.0475 | 0.0376 | 0.6110 | 0.0101 | 0.9081 | 0.0413 | 0.0373 | 0.6880 |
OLS | 1 | 0.0087 | 0.9148 | 0.0374 | 0.0376 | 0.6190 | 0.0087 | 0.9144 | 0.0370 | 0.0376 | 0.6150 |
\(\gamma = 0.7\) | |||||||||||
Oracle | 1 | 0.0019 | 0.9969 | 0.0435 | 0.0416 | 0.0690 | 0.0018 | 1.0001 | 0.0419 | 0.0419 | 0.0510 |
DS | 1 | 0.0032 | 0.9956 | 0.0564 | 0.0539 | 0.0570 | 0.0028 | 1.0008 | 0.0525 | 0.0543 | 0.0410 |
Naive | 1 | 0.0126 | 0.8999 | 0.0508 | 0.0389 | 0.7400 | 0.0135 | 0.8930 | 0.0455 | 0.0390 | 0.7640 |
OLS | 1 | 0.0118 | 0.8993 | 0.0401 | 0.0392 | 0.7170 | 0.0115 | 0.9005 | 0.0397 | 0.0393 | 0.7110 |
\(\gamma = 0.8\) | |||||||||||
Oracle | 1 | 0.0019 | 0.9962 | 0.0437 | 0.0434 | 0.0470 | 0.0019 | 0.9968 | 0.0438 | 0.0434 | 0.0510 |
DS | 1 | 0.0031 | 0.9965 | 0.0560 | 0.0558 | 0.0470 | 0.0034 | 0.9966 | 0.0582 | 0.0562 | 0.0530 |
Naive | 1 | 0.0165 | 0.8835 | 0.0544 | 0.0406 | 0.8020 | 0.0173 | 0.8777 | 0.0484 | 0.0404 | 0.8360 |
OLS | 1 | 0.0152 | 0.8838 | 0.0408 | 0.0409 | 0.8150 | 0.0148 | 0.8858 | 0.0414 | 0.0408 | 0.7870 |
| | | \( p = 1000 \) | | | | | \( p = 2100 \) | | | | |
---|---|---|---|---|---|---|---|---|---|---|---|
Estimator | True | MSE | Mean | SD | \({\overline{SE}}\) | Rej. Rate | MSE | Mean | SD | \({\overline{SE}}\) | Rej. Rate |
\(\gamma = 0.4\) | |||||||||||
Oracle | 1 | 0.0028 | 1.0018 | 0.0527 | 0.0524 | 0.0530 | 0.0027 | 1.0014 | 0.0518 | 0.0524 | 0.0500 |
DS | 1 | 0.0033 | 1.0015 | 0.0576 | 0.0584 | 0.0390 | 0.0034 | 1.0025 | 0.0580 | 0.0587 | 0.0530 |
Naive | 1 | 0.0145 | 1.1018 | 0.0641 | 0.0530 | 0.5210 | 0.0151 | 1.1093 | 0.0564 | 0.0532 | 0.5340 |
OLS | 1 | 0.0057 | 1.0541 | 0.0523 | 0.0512 | 0.1820 | 0.0054 | 1.0538 | 0.0506 | 0.0512 | 0.1760 |
\(\gamma = 0.5\) | |||||||||||
Oracle | 1 | 0.0029 | 0.9998 | 0.0536 | 0.0542 | 0.0470 | 0.0031 | 1.0029 | 0.0558 | 0.0542 | 0.0620 |
DS | 1 | 0.0036 | 1.0005 | 0.0600 | 0.0603 | 0.0380 | 0.0038 | 1.0030 | 0.0613 | 0.0606 | 0.0600 |
Naive | 1 | 0.0200 | 1.1249 | 0.0659 | 0.0547 | 0.6420 | 0.0224 | 1.1365 | 0.0614 | 0.0548 | 0.7020 |
OLS | 1 | 0.0070 | 1.0651 | 0.0523 | 0.0530 | 0.2460 | 0.0077 | 1.0683 | 0.0546 | 0.0530 | 0.2540 |
\(\gamma = 0.6\) | |||||||||||
Oracle | 1 | 0.0031 | 1.0014 | 0.0555 | 0.0562 | 0.0510 | 0.0030 | 1.0020 | 0.0548 | 0.0560 | 0.0450 |
DS | 1 | 0.0038 | 1.0034 | 0.0620 | 0.0623 | 0.0450 | 0.0038 | 1.0038 | 0.0615 | 0.0628 | 0.0410 |
Naive | 1 | 0.0271 | 1.1453 | 0.0775 | 0.0566 | 0.7150 | 0.0307 | 1.1626 | 0.0653 | 0.0567 | 0.8000 |
OLS | 1 | 0.0095 | 1.0804 | 0.0550 | 0.0550 | 0.3100 | 0.0092 | 1.0795 | 0.0536 | 0.0549 | 0.2950 |
\(\gamma = 0.7\) | |||||||||||
Oracle | 1 | 0.0036 | 1.0031 | 0.0596 | 0.0583 | 0.0620 | 0.0034 | 0.9997 | 0.0582 | 0.0585 | 0.0430 |
DS | 1 | 0.0042 | 1.0034 | 0.0649 | 0.0648 | 0.0590 | 0.0043 | 0.9994 | 0.0653 | 0.0657 | 0.0430 |
Naive | 1 | 0.0377 | 1.1755 | 0.0833 | 0.0587 | 0.8170 | 0.0414 | 1.1902 | 0.0725 | 0.0591 | 0.8680 |
OLS | 1 | 0.0121 | 1.0934 | 0.0583 | 0.0573 | 0.3740 | 0.0118 | 1.0925 | 0.0572 | 0.0575 | 0.3640 |
\(\gamma = 0.8\) | |||||||||||
Oracle | 1 | 0.0037 | 1.0060 | 0.0606 | 0.0607 | 0.0440 | 0.0037 | 1.0016 | 0.0611 | 0.0608 | 0.0520 |
DS | 1 | 0.0049 | 1.0064 | 0.0698 | 0.0676 | 0.0630 | 0.0049 | 1.0010 | 0.0703 | 0.0683 | 0.0530 |
Naive | 1 | 0.0505 | 1.2059 | 0.0898 | 0.0611 | 0.8690 | 0.0519 | 1.2143 | 0.0777 | 0.0613 | 0.8990 |
OLS | 1 | 0.0158 | 1.1106 | 0.0598 | 0.0598 | 0.4510 | 0.0147 | 1.1052 | 0.0600 | 0.0598 | 0.4250 |
| | | \( p = 1000 \) | | | | | \( p = 2100 \) | | | | |
---|---|---|---|---|---|---|---|---|---|---|---|
Estimator | True | MSE | Mean | SD | \({\overline{SE}}\) | Rej. Rate | MSE | Mean | SD | \({\overline{SE}}\) | Rej. Rate |
Oracle | 0.4 | 0.0092 | 0.3948 | 0.0959 | 0.0948 | 0.0490 | 0.0087 | 0.3969 | 0.0933 | 0.0952 | 0.0430 |
DS | 0.4 | 0.0096 | 0.3873 | 0.0970 | 0.0943 | 0.0580 | 0.0091 | 0.3877 | 0.0946 | 0.0942 | 0.0510 |
Naive | 0.4 | 0.0115 | 0.3658 | 0.1017 | 0.0982 | 0.0710 | 0.0108 | 0.3646 | 0.0979 | 0.0988 | 0.0590 |
Oracle | 0.5 | 0.0102 | 0.4940 | 0.1011 | 0.0973 | 0.0540 | 0.0093 | 0.4948 | 0.0961 | 0.0972 | 0.0460 |
DS | 0.5 | 0.0107 | 0.4843 | 0.1022 | 0.0969 | 0.0650 | 0.0095 | 0.4837 | 0.0963 | 0.0964 | 0.0550 |
Naive | 0.5 | 0.0135 | 0.4558 | 0.1077 | 0.1010 | 0.0820 | 0.0125 | 0.4537 | 0.1017 | 0.1013 | 0.0760 |
Oracle | 0.6 | 0.0092 | 0.5952 | 0.0958 | 0.1000 | 0.0390 | 0.0098 | 0.5945 | 0.0988 | 0.1000 | 0.0460 |
DS | 0.6 | 0.0098 | 0.5835 | 0.0977 | 0.0998 | 0.0420 | 0.0102 | 0.5810 | 0.0995 | 0.0993 | 0.0500 |
Naive | 0.6 | 0.0135 | 0.5454 | 0.1024 | 0.1041 | 0.0780 | 0.0142 | 0.5450 | 0.1058 | 0.1050 | 0.0800 |
Oracle | 0.7 | 0.0116 | 0.6903 | 0.1072 | 0.1030 | 0.0600 | 0.0104 | 0.6999 | 0.1020 | 0.1031 | 0.0410 |
DS | 0.7 | 0.0124 | 0.6765 | 0.1091 | 0.1029 | 0.0770 | 0.0111 | 0.6830 | 0.1040 | 0.1025 | 0.0510 |
Naive | 0.7 | 0.0172 | 0.6359 | 0.1145 | 0.1080 | 0.1170 | 0.0153 | 0.6453 | 0.1111 | 0.1091 | 0.0740 |
Oracle | 0.8 | 0.0117 | 0.7912 | 0.1078 | 0.1064 | 0.0510 | 0.0117 | 0.7831 | 0.1071 | 0.1066 | 0.0560 |
DS | 0.8 | 0.0131 | 0.7762 | 0.1119 | 0.1065 | 0.0720 | 0.0130 | 0.7657 | 0.1088 | 0.1062 | 0.0820 |
Naive | 0.8 | 0.0200 | 0.7260 | 0.1204 | 0.1124 | 0.1380 | 0.0202 | 0.7213 | 0.1182 | 0.1132 | 0.1240 |
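The columns of the simulation tables can be computed from per-replication output with a few lines of Python. This is a hedged sketch: the simulation setup is not reproduced here, so we assume that Mean and SD are taken across replications, \({\overline{SE}}\) is the average reported standard error, and Rej. Rate is the share of replications in which a two-sided 5% Wald test rejects that the coefficient equals its true value; the function name mc_summary is hypothetical. With SD computed without a degrees-of-freedom correction, MSE = SD² + (Mean − True)² holds exactly, which gives a quick consistency check on the tables: for DS with \(\gamma = 0.4\) and \( p = 1000 \) in the first table, 0.0501² + (0.9977 − 1)² ≈ 0.0025, matching the reported MSE.

```python
from statistics import NormalDist

import numpy as np


def mc_summary(estimates, std_errors, true_value, level=0.05):
    """One table row: MSE, Mean, SD, average SE and rejection rate over replications."""
    b = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    crit = NormalDist().inv_cdf(1.0 - level / 2.0)  # about 1.96 for level = 0.05
    return {
        "MSE": float(np.mean((b - true_value) ** 2)),
        "Mean": float(b.mean()),
        "SD": float(b.std()),                        # ddof = 0
        "SE_bar": float(se.mean()),
        # Two-sided Wald test of H0: coefficient = true_value.
        "Rej. Rate": float(np.mean(np.abs(b - true_value) / se > crit)),
    }


# Four toy replications with true value 1 and reported SE 0.05.
row = mc_summary([0.9, 1.0, 1.1, 1.2], [0.05] * 4, true_value=1.0)
```

In a real run, estimates and std_errors would hold one DS-HECK (or Oracle, Naive, OLS) estimate and its standard error per replication.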
4 Application to female earnings estimation
4.1 Labor force participation and earnings
4.2 Sample construction
Variable | Type | Definition |
---|---|---|
Controls | ||
Education | Continuous | Years of education |
High school education | Categorical | Graduated from high school (3 categories) |
Enrolled in school | Categorical | If currently enrolled in regular school (2 categories) |
College attendance | Categorical | If attended college (2 categories) |
Other degree or certificate | Categorical | If received other degree/certificate (2 categories) |
US education | Categorical | If the individual obtained her education in the USA, outside the USA, or both (3 categories) |
Father’s education | Categorical | Educational level of the individual’s father (8 categories) |
Mother’s education | Categorical | Educational level of the individual’s mother (8 categories) |
If owns a vehicle | Categorical | If the individual owns a vehicle (2 categories) |
Current state | Categorical | Geographical location of the individual (46 states) |
Rural–urban location | Categorical | Beale-Ross rural–urban continuum code for the individual’s current residence (9 categories) |
Potential exclusion restrictions | ||
Number of kids | Continuous | The number of children in the household under 18 years of age |
If kids less than or equal to 15 years old | Categorical | If there are any children age 15 years old or younger in the individual’s household (2 categories) |
Child care expenditure | Continuous | Child care expenses (in thousand dollars) |
Husband’s labor income | Continuous | Annual labor income of the individual’s husband (in thousand dollars) |
Household major expenditure | Continuous | Expenses on household furnishings and equipment, including household textiles, furniture, floor coverings, major appliances, small appliances and miscellaneous housewares (in thousand dollars) |
| | In the labor force | | Not in the labor force | | All | |
---|---|---|---|---|---|---|
Variable | Mean | SD | Mean | SD | Mean | SD |
Controls | ||||||
Age | 43.84 | 11.74 | 53.63 | 16.17 | 47.26 | 14.24 |
Experience | 9.86 | 6.64 | 12.21 | 9.30 | 10.68 | 7.75 |
Education (in years) | 14.78 | 1.98 | 13.96 | 2.16 | 14.49 | 2.08 |
Graduated from high school (1–3 range) | 1.05 | 0.27 | 1.14 | 0.46 | 1.09 | 0.35 |
If currently enrolled in regular school | 0.03 | 0.17 | 0.03 | 0.16 | 0.03 | 0.17 |
If attended college | 0.82 | 0.39 | 0.68 | 0.47 | 0.77 | 0.42 |
If received other degree/certificate | 0.21 | 0.41 | 0.20 | 0.40 | 0.20 | 0.40 |
Received education in the USA (1–3 range) | 1.04 | 0.30 | 1.04 | 0.27 | 1.04 | 0.29 |
Father’s educational level (1–8 range) | 5.08 | 1.80 | 4.56 | 1.92 | 4.90 | 1.86 |
Mother’s educational level (1–8 range) | 4.96 | 1.61 | 4.54 | 1.64 | 4.81 | 1.64 |
If owns a vehicle | 0.99 | 0.09 | 0.98 | 0.15 | 0.99 | 0.12 |
Exclusion restrictions | ||||||
Number of kids under 18 years old | 1.00 | 1.17 | 0.82 | 1.27 | 0.93 | 1.21 |
If kids less than or equal to 15 years old | 0.47 | 0.50 | 0.35 | 0.48 | 0.43 | 0.49 |
Child care expenditure | 1.47 | 3.91 | 0.25 | 1.64 | 1.04 | 3.35 |
Husband’s labor income | 61.49 | 81.16 | 62.76 | 195.65 | 61.93 | 132.85 |
Household major expenditure | 1.49 | 2.90 | 1.20 | 2.69 | 1.39 | 2.83 |
Outcomes | ||||||
Labor income | 45,575.65 | 43,701.59 | 12,439.52 | 15,746.09 | 41,909.04 | 42,822.50 |
Log (Labor income) | 10.37 | 0.98 | 8.52 | 1.58 | 10.17 | 1.21 |
If in the labor force | 1 | 0 | 0 | 0 | 0.65 | 0.48 |
Number of observations | 1,294 | | 695 | | 1,989 | |
4.3 Empirical findings
| | (1) | (2) | (3) | (4) | (5) | (6) | (7) | (8) |
---|---|---|---|---|---|---|---|---|
Earnings equation | ||||||||
Education (years) | 0.053 | 0.047 | 0.089*** | 0.056* | 0.089*** | 0.060* | 0.078*** | 0.075*** |
| | (0.034) | (0.040) | (0.023) | (0.034) | (0.023) | (0.033) | (0.026) | (0.016) |
Experience | 0.026*** | 0.006 | −0.000 | −0.018 | −0.012 | 0.001 | | |
| | (0.007) | (0.018) | (0.015) | (0.021) | (0.017) | (0.014) | | |
Experience\(^2\) | 0.001 | 0.001 | 0.001 | 0.000 | ||||
| | (0.001) | (0.001) | (0.001) | (0.000) | | | | |
Age | 0.019*** | −0.090 | 0.018*** | −0.086 | −0.001 | 0.002 | | |
| | (0.004) | (0.057) | (0.005) | (0.055) | (0.027) | (0.022) | | |
Age\(^2\) | 0.001* | 0.001* | 0.000 | 0.000 | ||||
| | (0.001) | (0.001) | (0.000) | (0.000) | | | | |
\(\lambda \) | −1.336*** | −1.463*** | −0.838*** | −1.584*** | −0.924*** | −1.581*** | −0.783*** | −0.942*** |
| | (0.388) | (0.484) | (0.146) | (0.456) | (0.171) | (0.455) | (0.167) | (0.156) |
Number of observations (n) | 1294 | 1294 | 1294 | 1294 | 1294 | 1294 | 1294 | 1294 |
Number of controls | 35 | 34 | 34 | 34 | 35 | 34 | 29 | |
Number of selected controls | 19 | 19 | 20 | 21 | 21 | 21 | 18 | |
Labor force participation equation | ||||||||
Education (years) | 0.066*** | 0.066*** | 0.074*** | 0.074*** | 0.070*** | 0.069*** | 0.068*** | 0.067*** |
| | (0.024) | (0.024) | (0.025) | (0.025) | (0.025) | (0.025) | (0.025) | (0.026) |
Experience | 0.012 | 0.012 | 0.042*** | 0.041*** | 0.042*** | 0.042*** | | |
| | (0.013) | (0.013) | (0.015) | (0.015) | (0.015) | (0.015) | | |
Experience\(^2\) | −0.001*** | −0.001*** | −0.001** | −0.001** | −0.001** | | | |
| | (0.000) | (0.000) | (0.000) | (0.000) | (0.000) | | | |
Age | 0.168*** | 0.168*** | 0.160*** | 0.159*** | 0.152*** | 0.164*** | | |
| | (0.023) | (0.023) | (0.022) | (0.023) | (0.023) | (0.020) | | |
Age\(^2\) | −0.002*** | −0.002*** | −0.002*** | −0.002*** | −0.002*** | −0.002*** | | |
| | (0.000) | (0.000) | (0.000) | (0.000) | (0.000) | (0.000) | | |
Child care expenditure | 0.073*** | 0.073*** | 0.075*** | 0.075*** | 0.072*** | 0.072*** | 0.067*** | 0.085*** |
| | (0.025) | (0.025) | (0.026) | (0.026) | (0.026) | (0.025) | (0.025) | (0.016) |
If kids under 15 | −0.248* | −0.288** | | | | | | |
| | (0.129) | (0.130) | | | | | | |
Number of kids | −0.199*** | −0.195*** | | | | | | |
| | (0.046) | (0.047) | | | | | | |
Husband’s income | −0.001 | −0.001** | | | | | | |
| | (0.000) | (0.000) | | | | | | |
Major expenses | 0.007 | 0.014 | ||||||
| | (0.020) | (0.012) | | | | | | |
Number of observations (N) | 1989 | 1989 | 1989 | 1989 | 1989 | 1989 | 1989 | 1989 |
Number of controls | 35 | 34 | 34 | 34 | 35 | 34 | 29 | |
Number of selected controls | 21 | 21 | 22 | 22 | 23 | 22 | 20 |