2016 | Original Paper | Book Chapter

What Do We Choose When We Err? Model Selection and Testing for Misspecified Logistic Regression Revisited

Authors: Jan Mielniczuk, Paweł Teisseyre

Published in: Challenges in Computational Statistics and Data Mining

Publisher: Springer International Publishing

Abstract

The problem of fitting logistic regression to a binary model while allowing for misspecification of the response function is reconsidered. We introduce a two-stage procedure that first orders predictors with respect to the deviances of the models with the predictor in question omitted, and then chooses the minimizer of the Generalized Information Criterion in the resulting nested family of models. In contrast to an exhaustive search, this allows a large number of potential predictors to be considered. We prove that the procedure consistently chooses the model \(t^{*}\) that is closest, in the averaged Kullback-Leibler sense, to the true binary model \(t\). We then consider the interplay between \(t\) and \(t^{*}\) and prove that, for a monotone response function, \(t^{*}\) is necessarily nonempty when the response genuinely depends on the predictors. This implies consistency of the deviance test of significance under misspecification. For a class of predictor distributions that includes the normal family, Ruud’s result asserts that \(t^{*}=t\). Numerical experiments reveal that, for normally distributed predictors, the probability of correct selection and the power of the deviance test depend monotonically on Ruud’s proportionality constant \(\eta \).
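
The two-stage procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, and it rests on several assumptions that are not in the abstract: the function names (`solve`, `deviance`, `two_stage_select`) are hypothetical, the penalty is taken to be the BIC choice \(\log n\), predictors are ranked by decreasing deviance of the model that omits them, and the simulated data use a Cauchy response function purely as an example of misspecification. The logistic fit is a plain Newton iteration written in standard-library Python.

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def deviance(X, y, cols, max_iter=30, tol=1e-8):
    """Deviance (-2 * maximized log-likelihood) of a logistic fit
    on an intercept plus the predictor columns listed in `cols`."""
    rows = [[1.0] + [x[j] for j in cols] for x in X]
    d = len(cols) + 1
    beta = [0.0] * d
    for _ in range(max_iter):
        grad = [0.0] * d
        hess = [[0.0] * d for _ in range(d)]
        for r, yi in zip(rows, y):
            eta = sum(b * v for b, v in zip(beta, r))
            eta = max(-30.0, min(30.0, eta))  # guard against overflow
            prob = 1.0 / (1.0 + math.exp(-eta))
            w = prob * (1.0 - prob)
            for a in range(d):
                grad[a] += (yi - prob) * r[a]
                for c in range(d):
                    hess[a][c] += w * r[a] * r[c]
        step = solve(hess, grad)  # Newton step: (X'WX)^{-1} X'(y - p)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    loglik = 0.0
    for r, yi in zip(rows, y):
        eta = sum(b * v for b, v in zip(beta, r))
        loglik += yi * eta - (max(eta, 0.0) + math.log1p(math.exp(-abs(eta))))
    return -2.0 * loglik

def two_stage_select(X, y, penalty):
    """Stage 1: rank predictors; stage 2: minimize GIC over nested models."""
    p = len(X[0])
    full = list(range(p))
    # Stage 1: deviance of the full model with predictor j omitted.
    # A large value means that dropping j hurts the fit, so j ranks early.
    dev_without = {j: deviance(X, y, [k for k in full if k != j]) for j in full}
    order = sorted(full, key=lambda j: dev_without[j], reverse=True)
    # Stage 2: GIC = deviance + penalty * (number of predictors),
    # evaluated on the nested family {}, {j1}, {j1, j2}, ...
    gic = [deviance(X, y, order[:m]) + penalty * m for m in range(p + 1)]
    best = min(range(p + 1), key=lambda m: gic[m])
    return order, sorted(order[:best]), gic

# Simulated example (assumption): normal predictors, Cauchy response
# function instead of the logistic one, only predictors 0 and 2 active.
random.seed(0)
n, p = 300, 5
X = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [1 if random.random() < 0.5 + math.atan(2.0 * x[0] - 2.0 * x[2]) / math.pi
     else 0 for x in X]
order, selected, gic = two_stage_select(X, y, penalty=math.log(n))
print("ordering:", order)
print("selected:", selected)
```

The sketch needs only \(2p+1\) logistic fits (\(p\) leave-one-out fits plus \(p+1\) nested fits) instead of the \(2^{p}\) fits of an exhaustive search, which is the practical point the abstract makes about handling a large number of potential predictors.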

References
1. Bache K, Lichman M (2013) UCI machine learning repository. University of California, Irvine
2. Bishop CM (2006) Pattern recognition and machine learning. Springer, New York
3. Bogdan M, Doerge R, Ghosh J (2004) Modifying the Schwarz Bayesian Information Criterion to locate multiple interacting quantitative trait loci. Genetics 167:989–999
4. Bozdogan H (1987) Model selection and Akaike’s information criterion (AIC): the general theory and its analytical extensions. Psychometrika 52:345–370
5. Burnham K, Anderson D (2002) Model selection and multimodel inference. A practical information-theoretic approach. Springer, New York
6. Carroll R, Pederson S (1993) On robustness in the logistic regression model. J R Stat Soc B 55:693–706
7. Casella G, Giron J, Martinez M, Moreno E (2009) Consistency of Bayes procedures for variable selection. Ann Stat 37:1207–1228
8. Chen J, Chen Z (2008) Extended Bayesian Information Criteria for model selection with large model spaces. Biometrika 95:759–771
9. Chen J, Chen Z (2012) Extended BIC for small-n-large-p sparse glm. Statistica Sinica 22:555–574
10. Claeskens G, Hjort N (2008) Model selection and model averaging. Cambridge University Press, Cambridge
11. Czado C, Santner T (1992) The effect of link misspecification on binary regression inference. J Stat Plann Infer 33:213–231
13. Fahrmeir L (1990) Maximum likelihood estimation in misspecified generalized linear models. Statistics 4:487–502
14. Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96:1348–1360
15. Foster D, George E (1994) The risk inflation criterion for multiple regression. Ann Stat 22:1947–1975
16. Hjort N, Pollard D (1993) Asymptotics for minimisers of convex processes. Unpublished manuscript
17. Konishi S, Kitagawa G (2008) Information criteria and statistical modeling. Springer, New York
18. Lehmann E (1959) Testing statistical hypotheses. Wiley, New York
20. Qian G, Field C (2002) Law of iterated logarithm and consistent model selection criterion in logistic regression. Stat Probab Lett 56:101–112
21. Ruud P (1983) Sufficient conditions for the consistency of maximum likelihood estimation despite misspecification of distribution in multinomial discrete choice models. Econometrica 51(1):225–228
22. Sin C, White H (1996) Information criteria for selecting possibly misspecified parametric models. J Econometrics 71:207–225
23. Zak-Szatkowska M, Bogdan M (2011) Modified versions of Bayesian Information Criterion for sparse generalized linear models. Comput Stat Data Anal 5:2908–2924
24. Zou H, Hastie T (2005) Regularization and variable selection via the elastic net. J R Stat Soc B 67(2):301–320
Metadata
Title
What Do We Choose When We Err? Model Selection and Testing for Misspecified Logistic Regression Revisited
Authors
Jan Mielniczuk
Paweł Teisseyre
Copyright Year
2016
DOI
https://doi.org/10.1007/978-3-319-18781-5_15