Copula regression spline models for binary outcomes

Statistics and Computing

Abstract

We introduce a framework for estimating the effect that a binary treatment has on a binary outcome in the presence of unobserved confounding. The methodology is applied to a case study that uses data from the Medical Expenditure Panel Survey and aims to estimate the effect of private health insurance on health care utilization. Unobserved confounding arises when variables that are associated with both treatment and outcome are not available (in economics this issue is known as endogeneity). In addition, treatment and outcome may exhibit a dependence that cannot be modeled using a linear measure of association, and observed confounders may have a non-linear impact on the treatment and outcome variables. The problem of unobserved confounding is addressed using a two-equation structural latent variable framework, where one equation describes the binary outcome as a function of the binary treatment and the other equation determines whether the treatment is received. Non-linear dependence between treatment and outcome is handled using copula functions, whereas covariate-response relationships are flexibly modeled using a spline approach. Related model fitting and inferential procedures are developed, and asymptotic arguments are presented.
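To make the two-equation structure concrete, the following Python sketch shows how a copula can link a treatment equation and an outcome equation into one likelihood contribution. It is only an illustration under stated assumptions: probit margins, a Clayton copula, and simple linear predictors standing in for the spline terms. All function names and parameter values are hypothetical and are not taken from the article or its accompanying software.

```python
import math


def norm_cdf(x):
    """Standard normal CDF (assumed probit margin), via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def clayton(u, v, theta):
    """Clayton copula C(u, v; theta) for theta > 0 (an assumed copula choice)."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def cell_probs(eta1, eta2, theta):
    """Joint probabilities of (treatment, outcome) for fixed predictors.

    In the recursive model eta2 itself depends on the treatment, so only
    the cell matching the observed treatment is used in the likelihood.
    """
    p1 = norm_cdf(eta1)  # P(treatment = 1)
    p2 = norm_cdf(eta2)  # P(outcome = 1)
    p11 = clayton(p1, p2, theta)
    return {
        (1, 1): p11,
        (1, 0): p1 - p11,
        (0, 1): p2 - p11,
        (0, 0): 1.0 - p1 - p2 + p11,
    }


def loglik_contribution(y1, y2, x, beta1, beta2, psi, theta):
    """Log-likelihood contribution of one observation (illustrative only).

    y1 is the binary treatment, y2 the binary outcome, x one observed
    confounder; psi is the coefficient of the treatment in the outcome
    equation. Linear terms stand in for the spline components.
    """
    eta1 = beta1[0] + beta1[1] * x             # treatment equation
    eta2 = beta2[0] + beta2[1] * x + psi * y1  # outcome equation
    return math.log(cell_probs(eta1, eta2, theta)[(y1, y2)])


# Hypothetical values, chosen only to exercise the functions.
probs = cell_probs(eta1=0.2, eta2=-0.1, theta=2.0)
total = sum(probs.values())  # the four cell probabilities sum to one
ll = loglik_contribution(1, 1, 0.5, (0.0, 1.0), (0.0, 0.5), 0.8, 2.0)
```

Summing such contributions over observations gives a log-likelihood in the regression coefficients and the copula parameter. A penalized version of that sum would be the natural objective once the linear terms are replaced by spline bases, although the estimation details here are not specified by the abstract.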


Acknowledgments

We would like to thank two anonymous reviewers and the Associate Editor for many suggestions that helped to clarify the contribution of the paper and considerably improved the presentation of the article.

Corresponding author

Correspondence to Rosalba Radice.

About this article

Cite this article

Radice, R., Marra, G. & Wojtyś, M. Copula regression spline models for binary outcomes. Stat Comput 26, 981–995 (2016). https://doi.org/10.1007/s11222-015-9581-6

  • DOI: https://doi.org/10.1007/s11222-015-9581-6
