Published in: Advances in Data Analysis and Classification 1/2023

15 January 2022 | Regular Article

Clusterwise elastic-net regression based on a combined information criterion

Authors: Xavier Bry, Ndèye Niang, Thomas Verron, Stéphanie Bougeard

Abstract

Many research questions involve a regression problem in which the population under study is not homogeneous with respect to the underlying model. In this setting, we propose an original method called Combined Information criterion CLUSterwise elastic-net regression (Ciclus). This method handles several methodological and application-related challenges. It is derived from both information theory and microeconomic utility theory and maximizes a well-defined criterion combining three weighted sub-criteria, each related to a specific aim: a parsimonious partition, compact clusters for better prediction of cluster membership, and a good within-cluster regression fit. The solving algorithm converges monotonically under mild assumptions. The Ciclus principle provides an innovative solution to two key issues: (i) the automatic optimization of the number of clusters, and (ii) the proposal of a prediction model. We applied it to elastic-net regression in order to manage high-dimensional data involving redundant explanatory variables. Ciclus is illustrated through both a simulation study and a real example in the field of omic data, showing how it improves the quality of prediction and facilitates interpretation. It should therefore prove useful whenever the data involve a population mixture, for example in biology, social sciences, economics or marketing.
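
The abstract does not spell out the Ciclus criterion or its algorithm, so the following is only a minimal sketch of the generic idea that clusterwise elastic-net regression builds on: alternate between fitting one elastic-net model per cluster and reassigning each observation to the cluster whose model predicts it best. It deliberately leaves out the partition-parsimony and cluster-compactness sub-criteria and the automatic choice of the number of clusters. All names (`elastic_net_cd`, `clusterwise_elastic_net`, `lam`, `alpha`, `k`) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator arising from the L1 part of the penalty."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def elastic_net_cd(X, y, lam=0.1, alpha=0.5, n_iter=200):
    """Elastic-net fit by cyclic coordinate descent (no intercept).

    Minimises (1/2n)||y - Xb||^2 + lam*(alpha*||b||_1 + (1-alpha)/2*||b||_2^2).
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float).copy()           # residual for beta = 0
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            resid += X[:, j] * beta[j]       # remove coordinate j from the fit
            rho = X[:, j] @ resid / n
            beta[j] = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            resid -= X[:, j] * beta[j]       # put the updated coordinate back
    return beta


def clusterwise_elastic_net(X, y, k=2, n_rounds=20, seed=0, **enet_kwargs):
    """Alternate per-cluster elastic-net fits and best-fit reassignment.

    Assumption: k is fixed by the caller; this sketch does not optimise it.
    """
    rng = np.random.default_rng(seed)
    labels = rng.integers(k, size=len(y))    # random initial partition
    coefs = np.zeros((k, X.shape[1]))
    for _ in range(n_rounds):
        for g in range(k):
            members = labels == g
            if members.sum() >= 2:           # keep previous coefficients otherwise
                coefs[g] = elastic_net_cd(X[members], y[members], **enet_kwargs)
        sq_resid = (y[:, None] - X @ coefs.T) ** 2
        new_labels = sq_resid.argmin(axis=1) # cluster with the smallest residual
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, coefs
```

Calling `clusterwise_elastic_net(X, y, k=2, lam=0.01)` on data drawn from two sub-populations with different regression slopes returns one label per observation and one coefficient vector per cluster. Because assignment here uses the response `y`, this sketch cannot by itself predict cluster membership for new observations, which is one of the issues the abstract says Ciclus addresses.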

Metadata
Title
Clusterwise elastic-net regression based on a combined information criterion
Authors
Xavier Bry
Ndèye Niang
Thomas Verron
Stéphanie Bougeard
Publication date
15 January 2022
Publisher
Springer Berlin Heidelberg
Published in
Advances in Data Analysis and Classification / Issue 1/2023
Print ISSN: 1862-5347
Electronic ISSN: 1862-5355
DOI
https://doi.org/10.1007/s11634-021-00489-w