2013 | OriginalPaper | Chapter

Regularization and Model Selection with Categorical Covariates

Authors: Jan Gertheiss, Veronika Stelz, Gerhard Tutz

Published in: Algorithms from and for Nature and Life

Publisher: Springer International Publishing

Abstract

The challenge in regression problems with categorical covariates is the high number of parameters involved. Common regularization methods like the Lasso, which allow for selection of predictors, are typically designed for metric predictors. If the independent variables are categorical, selection strategies should be based on modified penalties. For categorical predictors with many categories, a useful strategy is to search for clusters of categories with similar effects. We focus on generalized linear models and present L1-penalty approaches for factor selection and clustering of categories. The proposed methods are investigated in simulation studies and applied to a real-world classification problem.
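The abstract does not spell out the penalty, so the following is only a minimal sketch of the general idea, assuming a pairwise-difference L1 penalty on the dummy coefficients of a single nominal factor, in the spirit of Gertheiss and Tutz (2010). The function names `pairwise_l1_penalty` and `clusters`, the optional `weights` mapping, and the convention that the reference category has coefficient 0 are illustrative assumptions, not the chapter's implementation.

```python
from itertools import combinations


def pairwise_l1_penalty(coefs, weights=None):
    """Weighted sum of |b_i - b_j| over all pairs of category coefficients.

    `coefs` holds the dummy coefficients of one nominal factor without the
    reference category; the reference coefficient (assumed to be 0) is
    prepended as index 0. `weights`, if given, maps index pairs (i, j) with
    i < j to non-negative weights; otherwise all weights are 1.
    """
    beta = [0.0] + list(coefs)
    total = 0.0
    for i, j in combinations(range(len(beta)), 2):
        w = 1.0 if weights is None else weights[(i, j)]
        total += w * abs(beta[i] - beta[j])
    return total


def clusters(coefs, tol=1e-8):
    """Group category indices whose coefficients coincide within `tol`.

    Index 0 is the reference category. A factor whose categories all fall
    into the cluster of index 0 has no effect and is deselected.
    """
    beta = [0.0] + list(coefs)
    groups = []
    for k, b in enumerate(beta):
        for g in groups:
            if abs(beta[g[0]] - b) <= tol:
                g.append(k)
                break
        else:
            groups.append([k])
    return groups
```

Under these assumptions, the penalty is zero only when every coefficient equals the reference value, which corresponds to dropping the factor; partially equal coefficients correspond to fused categories. For example, `clusters([1.0, 1.0, 0.0])` groups categories 1 and 2 together and merges category 3 with the reference.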

Footnotes
1
The original dataset (Wolberg and Mangasarian 1990) contained 369 instances (reported January 1989). Two instances were later removed, and additional groups totaling 332 samples were collected between October 1989 and November 1991.
 
Literature
Fahrmeir, L., & Tutz, G. (2001). Multivariate statistical modelling based on generalized linear models (2nd ed.). New York: Springer.
Fan, J., & Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96, 1348–1360.
Gertheiss, J. (2011). ordPens: Selection and/or Smoothing of Ordinal Predictors. R package version 0.1-7.
Gertheiss, J., & Tutz, G. (2009). Penalized regression with ordinal predictors. International Statistical Review, 77, 345–365.
Gertheiss, J., & Tutz, G. (2010). Sparse modeling of categorial explanatory variables. The Annals of Applied Statistics, 4, 2150–2180.
Leisch, F., & Dimitriadou, E. (2010). mlbench: Machine Learning Benchmark Problems. R package version 2.0-0.
McCullagh, P., & Nelder, J. A. (1989). Generalized linear models (2nd ed.). New York: Chapman & Hall.
Park, M. Y., & Hastie, T. (2007). L1 regularization-path algorithm for generalized linear models. Journal of the Royal Statistical Society B, 69, 659–677.
Stelz, V. (2010). L1-Regularisierung bei kategorialen Prädiktoren in generalisierten linearen modellen. Master's thesis, Ludwig-Maximilians-University Munich.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society B, 58, 267–288.
Ulbricht, J. (2010). lqa: Penalized Likelihood Inference for GLMs. R package version 1.0-3.
Wolberg, W. H., & Mangasarian, O. L. (1990). Multisurface method of pattern separation for medical diagnosis applied to breast cytology. Proceedings of the National Academy of Sciences, 87, 9193–9196.
Metadata
DOI
https://doi.org/10.1007/978-3-319-00035-0_21