Skip to main content
Top

2016 | OriginalPaper | Chapter

What Do We Choose When We Err? Model Selection and Testing for Misspecified Logistic Regression Revisited

Authors : Jan Mielniczuk, Paweł Teisseyre

Published in: Challenges in Computational Statistics and Data Mining

Publisher: Springer International Publishing

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

The problem of fitting logistic regression to binary model allowing for missppecification of the response function is reconsidered. We introduce two-stage procedure which consists first in ordering predictors with respect to deviances of the models with the predictor in question omitted and then choosing the minimizer of Generalized Information Criterion in the resulting nested family of models. This allows for large number of potential predictors to be considered in contrast to an exhaustive method. We prove that the procedure consistently chooses model \(t^{*}\) which is the closest in the averaged Kullback-Leibler sense to the true binary model t. We then consider interplay between t and \(t^{*}\) and prove that for monotone response function when there is genuine dependence of response on predictors, \(t^{*}\) is necessarily nonempty. This implies consistency of a deviance test of significance under misspecification. For a class of distributions of predictors, including normal family, Rudd’s result asserts that \(t^{*}=t\). Numerical experiments reveal that for normally distributed predictors probability of correct selection and power of deviance test depend monotonically on Rudd’s proportionality constant \(\eta \).

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Appendix
Available only for authorised users
Literature
1.
go back to reference Bache K, Lichman M (2013) UCI machine learning repository. University of California, Irvine Bache K, Lichman M (2013) UCI machine learning repository. University of California, Irvine
2.
go back to reference Bishop CM (2006) Pattern recognition and machine learning. Springer, New York Bishop CM (2006) Pattern recognition and machine learning. Springer, New York
3.
go back to reference Bogdan M, Doerge R, Ghosh J (2004) Modifying the Schwarz Bayesian Information Criterion to locate multiple interacting quantitative trait loci. Genetics 167:989–999 Bogdan M, Doerge R, Ghosh J (2004) Modifying the Schwarz Bayesian Information Criterion to locate multiple interacting quantitative trait loci. Genetics 167:989–999
4.
go back to reference Bozdogan H (1987) Model selection and Akaike’s information criterion (AIC): the general theory and its analitycal extensions. Psychometrika 52:345–370 Bozdogan H (1987) Model selection and Akaike’s information criterion (AIC): the general theory and its analitycal extensions. Psychometrika 52:345–370
5.
go back to reference Burnham K, Anderson D (2002) Model selection and multimodel inference. A practical information-theoretic approach. Springer, New York Burnham K, Anderson D (2002) Model selection and multimodel inference. A practical information-theoretic approach. Springer, New York
6.
go back to reference Carroll R, Pederson S (1993) On robustness in the logistic regression model. J R Stat Soc B 55:693–706MathSciNetMATH Carroll R, Pederson S (1993) On robustness in the logistic regression model. J R Stat Soc B 55:693–706MathSciNetMATH
7.
go back to reference Casella G, Giron J, Martinez M, Moreno E (2009) Consistency of Bayes procedures for variable selection. Ann Stat 37:1207–1228 Casella G, Giron J, Martinez M, Moreno E (2009) Consistency of Bayes procedures for variable selection. Ann Stat 37:1207–1228
8.
go back to reference Chen J, Chen Z (2008) Extended Bayesian Information Criteria for model selection with large model spaces. Biometrika 95:759–771 Chen J, Chen Z (2008) Extended Bayesian Information Criteria for model selection with large model spaces. Biometrika 95:759–771
9.
go back to reference Chen J, Chen Z (2012) Extended BIC for small-n-large-p sparse glm. Statistica Sinica 22:555–574 Chen J, Chen Z (2012) Extended BIC for small-n-large-p sparse glm. Statistica Sinica 22:555–574
10.
go back to reference Claeskens G, Hjort N (2008) Model selection and model averaging. Cambridge University Press, CambridgeCrossRefMATH Claeskens G, Hjort N (2008) Model selection and model averaging. Cambridge University Press, CambridgeCrossRefMATH
11.
go back to reference Czado C, Santner T (1992) The effect of link misspecification on binary regression inference. J Stat Plann Infer 33:213–231MathSciNetCrossRef Czado C, Santner T (1992) The effect of link misspecification on binary regression inference. J Stat Plann Infer 33:213–231MathSciNetCrossRef
13.
14.
go back to reference Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Stat Assoc 96:1348–1360MathSciNetCrossRefMATH Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Stat Assoc 96:1348–1360MathSciNetCrossRefMATH
15.
go back to reference Foster D, George E (1994) The risk inflation criterion for multiple regression. Ann Stat 22:1947–1975 Foster D, George E (1994) The risk inflation criterion for multiple regression. Ann Stat 22:1947–1975
16.
go back to reference Hjort N, Pollard D (1993) Asymptotics for minimisers of convex processes. Unpublished manuscript Hjort N, Pollard D (1993) Asymptotics for minimisers of convex processes. Unpublished manuscript
17.
go back to reference Konishi S, Kitagawa G (2008) Information criteria and statistical modeling. Springer, New York Konishi S, Kitagawa G (2008) Information criteria and statistical modeling. Springer, New York
18.
go back to reference Lehmann E (1959) Testing statistical hypotheses. Wiley, New YorkMATH Lehmann E (1959) Testing statistical hypotheses. Wiley, New YorkMATH
20.
go back to reference Qian G, Field C (2002) Law of iterated logarithm and consistent model selection criterion in logistic regression. Stat Probab Lett 56:101–112MathSciNetCrossRefMATH Qian G, Field C (2002) Law of iterated logarithm and consistent model selection criterion in logistic regression. Stat Probab Lett 56:101–112MathSciNetCrossRefMATH
21.
go back to reference Ruud P (1983) Sufficient conditions for the consistency of maximum likelihood estimation despite misspecification of distribution in multinomial discrete choice models. Econometrica 51(1):225–228 Ruud P (1983) Sufficient conditions for the consistency of maximum likelihood estimation despite misspecification of distribution in multinomial discrete choice models. Econometrica 51(1):225–228
22.
23.
go back to reference Zak-Szatkowska M, Bogdan M (2011) Modified versions of Baysian Information Criterion for sparse generalized linear models. Comput Stat Data Anal 5:2908–2924 Zak-Szatkowska M, Bogdan M (2011) Modified versions of Baysian Information Criterion for sparse generalized linear models. Comput Stat Data Anal 5:2908–2924
24.
Metadata
Title
What Do We Choose When We Err? Model Selection and Testing for Misspecified Logistic Regression Revisited
Authors
Jan Mielniczuk
Paweł Teisseyre
Copyright Year
2016
DOI
https://doi.org/10.1007/978-3-319-18781-5_15

Premium Partner