Published in: Advances in Data Analysis and Classification 1/2016

1 March 2016 | Regular Article

Supervised clustering of variables

Authors: Mingkun Chen, Evelyne Vigneau


Abstract

In predictive modelling, highly correlated predictors lead to unstable models that are often difficult to interpret. The selection of features, or the use of latent components that reduce the complexity among correlated observed variables, are common strategies. Our objective with the new procedure that we advocate here is to achieve both purposes: to highlight the group structure among the variables and to identify the most relevant groups of variables for prediction. The proposed procedure is an iterative adaptation of a method developed for the clustering of variables around latent variables (CLV). Modification of the standard CLV algorithm leads to a supervised procedure, in the sense that the variable to be predicted plays an active role in the clustering. The latent variables associated with the groups of variables, selected for their “proximity” to the variable to be predicted and their “internal homogeneity”, are progressively added to a predictive model. The features of the methodology are illustrated with a simulation study and a real-world application.
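
The two selection notions named in the abstract, "internal homogeneity" and "proximity", can be illustrated with a minimal sketch. This is not the authors' supervised CLV algorithm: the groups are fixed in advance rather than found by clustering, the latent variable of a group is assumed to be its standardized first principal component, and both measures are assumed to be squared correlations. The simulated data and all names (`first_component`, `squared_corr`, `group_a`, `group_b`) are hypothetical.

```python
import numpy as np


def first_component(block):
    """Standardized first principal component of a block of variables.

    Assumption for this sketch: this component stands in for the
    latent variable of the group.
    """
    centered = block - block.mean(axis=0)
    # The first left singular vector, scaled by the first singular value,
    # gives the scores on the first principal component.
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, 0] * s[0]
    return (scores - scores.mean()) / scores.std()


def squared_corr(a, b):
    """Squared Pearson correlation between two 1-D arrays."""
    return np.corrcoef(a, b)[0, 1] ** 2


# Simulated data (hypothetical): two groups of four correlated predictors,
# each driven by its own hidden factor; only the first factor drives y.
rng = np.random.default_rng(0)
n = 200
z1 = rng.normal(size=n)
z2 = rng.normal(size=n)
group_a = z1[:, None] + 0.3 * rng.normal(size=(n, 4))
group_b = z2[:, None] + 0.3 * rng.normal(size=(n, 4))
y = 2.0 * z1 + 0.5 * rng.normal(size=n)

summary = {}
for name, block in {"A": group_a, "B": group_b}.items():
    latent = first_component(block)
    # Internal homogeneity: mean squared correlation between each
    # variable of the group and the group's latent variable.
    homogeneity = np.mean(
        [squared_corr(block[:, j], latent) for j in range(block.shape[1])]
    )
    # Proximity: squared correlation between the latent variable and y.
    proximity = squared_corr(latent, y)
    summary[name] = (homogeneity, proximity)
    print(f"group {name}: homogeneity={homogeneity:.2f}, proximity={proximity:.2f}")
```

In this simulation both groups are internally homogeneous, but only group A lies close to the response, so a procedure that weighs both criteria would favour the latent variable of group A when building the predictive model.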

Metadata
Title: Supervised clustering of variables
Authors: Mingkun Chen, Evelyne Vigneau
Publication date: 1 March 2016
Publisher: Springer Berlin Heidelberg
Published in: Advances in Data Analysis and Classification, Issue 1/2016
Print ISSN: 1862-5347
Electronic ISSN: 1862-5355
DOI: https://doi.org/10.1007/s11634-014-0191-5
