2013 | OriginalPaper | Book chapter

Regularization and Model Selection with Categorical Covariates

Written by: Jan Gertheiss, Veronika Stelz, Gerhard Tutz

Published in: Algorithms from and for Nature and Life

Publisher: Springer International Publishing

Abstract

The challenge in regression problems with categorical covariates is the high number of parameters involved. Common regularization methods like the Lasso, which allow for selection of predictors, are typically designed for metric predictors. If independent variables are categorical, selection strategies should be based on modified penalties. For categorical predictor variables with many categories, a useful strategy is to search for clusters of categories with similar effects. We focus on generalized linear models and present L1-penalty approaches for factor selection and clustering of categories. The proposed methods are investigated in simulation studies and applied to a real-world classification problem.
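
To make the idea of clustering categories with an L1 penalty concrete, the following is a minimal sketch and not the authors' implementation. It assumes the form of penalty used for nominal factors in the related literature (for example Gertheiss and Tutz 2010): an L1 penalty on all pairwise differences between the category effects of one factor, with the reference category's effect fixed at zero. The function names, the tuning parameter `lam`, and the tolerance `tol` are hypothetical and chosen for illustration only.

```python
from itertools import combinations


def pairwise_l1_penalty(coefs, lam=1.0):
    """L1 penalty on all pairwise differences of one factor's category effects.

    `coefs` holds the effects of the non-reference categories; the reference
    category (index 0) is assumed to have effect 0.
    """
    effects = [0.0] + list(coefs)
    return lam * sum(abs(a - b) for a, b in combinations(effects, 2))


def fused_clusters(coefs, tol=1e-8):
    """Group categories whose estimated effects coincide (up to `tol`).

    Returns a list of clusters, each a sorted list of category indices,
    where index 0 is the reference category.
    """
    effects = [0.0] + list(coefs)
    order = sorted(range(len(effects)), key=lambda k: effects[k])
    groups = [[order[0]]]
    for prev, cur in zip(order, order[1:]):
        if abs(effects[cur] - effects[prev]) <= tol:
            groups[-1].append(cur)  # same effect: same cluster
        else:
            groups.append([cur])    # effect differs: start a new cluster
    return [sorted(g) for g in groups]


# Hypothetical coefficient vector for a factor with four categories.
beta = [0.0, 1.0, 1.0]
penalty = pairwise_l1_penalty(beta, lam=0.5)
groups = fused_clusters(beta)
```

Under this assumed penalty, differences between category effects that are shrunk exactly to zero fuse the corresponding categories into one cluster, and a factor whose effects all equal the reference value of zero contributes no penalty and drops out of the model entirely.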

Footnotes
1
The original dataset (Wolberg and Mangasarian 1990) comprised 369 instances (reported January 1989). Two instances were later removed, and additional groups totaling 332 samples were collected between October 1989 and November 1991.
 
References
Bondell, H. D., & Reich, B. J. (2009). Simultaneous factor selection and collapsing levels in ANOVA. Biometrics, 65, 169–177.
Fahrmeir, L., & Tutz, G. (2001). Multivariate statistical modelling based on generalized linear models (2nd ed.). New York: Springer.
Fan, J., & Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96, 1348–1360.
Gertheiss, J. (2011). ordPens: Selection and/or Smoothing of Ordinal Predictors. R package version 0.1-7.
Gertheiss, J., & Tutz, G. (2009). Penalized regression with ordinal predictors. International Statistical Review, 77, 345–365.
Gertheiss, J., & Tutz, G. (2010). Sparse modeling of categorial explanatory variables. The Annals of Applied Statistics, 4, 2150–2180.
Leisch, F., & Dimitriadou, E. (2010). mlbench: Machine Learning Benchmark Problems. R package version 2.0-0.
McCullagh, P., & Nelder, J. A. (1989). Generalized linear models (2nd ed.). New York: Chapman & Hall.
Park, M. Y., & Hastie, T. (2007). L1 regularization-path algorithm for generalized linear models. Journal of the Royal Statistical Society B, 69, 659–677.
Stelz, V. (2010). L1-Regularisierung bei kategorialen Prädiktoren in generalisierten linearen modellen. Master's thesis, Ludwig-Maximilians-University Munich.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society B, 58, 267–288.
Ulbricht, J. (2010). lqa: Penalized Likelihood Inference for GLMs. R package version 1.0-3.
Wolberg, W. H., & Mangasarian, O. L. (1990). Multisurface method of pattern separation for medical diagnosis applied to breast cytology. Proceedings of the National Academy of Sciences, 87, 9193–9196.
Zou, H. (2006). The adaptive lasso and its oracle properties. Journal of the American Statistical Association, 101, 1418–1429.
Metadata
Title
Regularization and Model Selection with Categorical Covariates
Written by
Jan Gertheiss
Veronika Stelz
Gerhard Tutz
Copyright year
2013
DOI
https://doi.org/10.1007/978-3-319-00035-0_21