Abstract
We introduce a framework for estimating the effect of a binary treatment on a binary outcome in the presence of unobserved confounding. The methodology is applied to a case study that uses data from the Medical Expenditure Panel Survey and aims to estimate the effect of private health insurance on health care utilization. Unobserved confounding arises when variables that are associated with both treatment and outcome are not available (in economics this issue is known as endogeneity). In addition, treatment and outcome may exhibit a dependence that cannot be modeled using a linear measure of association, and observed confounders may have a non-linear impact on the treatment and outcome variables. The problem of unobserved confounding is addressed using a two-equation structural latent variable framework, in which one equation describes the binary outcome as a function of the binary treatment and the other determines whether the treatment is received. Non-linear dependence between treatment and outcome is handled using copula functions, whereas covariate-response relationships are modeled flexibly using a spline approach. Related model fitting and inferential procedures are developed, and asymptotic arguments are presented.
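As a rough illustration of the two-equation structure described in the abstract, the following Python sketch computes the four joint probabilities of a binary treatment and a binary outcome when probit margins are linked by a copula. Everything named here is an assumption made for illustration and is not taken from the paper: the symbols eta1 and eta2 (predictors of the treatment and outcome equations), psi (coefficient of the treatment in the outcome equation), theta (copula dependence parameter), the choice of a Clayton copula (picked only because it has a simple closed form), and the numeric values.

```python
import math


def norm_cdf(x):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def clayton(u, v, theta):
    """Clayton copula C(u, v; theta) for theta > 0 (illustrative choice)."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def joint_probs(eta1, eta2, psi, theta):
    """Joint probabilities P(treatment = a, outcome = b) for a, b in {0, 1}.

    Treatment equation: P(treatment = 1) = Phi(eta1).
    Outcome equation:   P(outcome = 1 | treatment set to a) = Phi(eta2 + psi * a).
    The copula couples the two margins (hypothetical parameterization).
    """
    p1 = norm_cdf(eta1)        # margin of the treatment
    q0 = norm_cdf(eta2)        # outcome margin without treatment
    q1 = norm_cdf(eta2 + psi)  # outcome margin with treatment
    c11 = clayton(p1, q1, theta)
    c10 = clayton(p1, q0, theta)
    return {
        (1, 1): c11,
        (1, 0): p1 - c11,
        (0, 1): q0 - c10,
        (0, 0): 1.0 - p1 - q0 + c10,
    }


def average_treatment_effect(eta2, psi):
    """Difference in outcome probability with and without treatment."""
    return norm_cdf(eta2 + psi) - norm_cdf(eta2)


# Hypothetical parameter values, chosen only to exercise the functions.
probs = joint_probs(eta1=0.3, eta2=-0.2, psi=0.5, theta=2.0)
effect = average_treatment_effect(eta2=-0.2, psi=0.5)
```

In this sketch the four cell probabilities sum to one by construction, and the copula parameter theta changes how probability mass is shared between the cells without altering the margins. In a full model of the kind the abstract describes, eta1 and eta2 would themselves contain spline terms in the observed confounders.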
Acknowledgments
We would like to thank two anonymous reviewers and the Associate Editor for many suggestions that helped to clarify the contribution of the paper and considerably improved the presentation of the article.
Cite this article
Radice, R., Marra, G. & Wojtyś, M. Copula regression spline models for binary outcomes. Stat Comput 26, 981–995 (2016). https://doi.org/10.1007/s11222-015-9581-6
DOI: https://doi.org/10.1007/s11222-015-9581-6