Published in: Advances in Data Analysis and Classification 4/2022

10 November 2021 | Regular Article

Sparse dimension reduction based on energy and ball statistics

Authors: Emmanuel Jordy Menvouta, Sven Serneels, Tim Verdonck


Abstract

Two new methods for sparse dimension reduction are introduced, based on martingale difference divergence and ball covariance, respectively. Both can be used directly as sufficient dimension reduction (SDR) techniques to estimate a sufficient dimension reduced subspace, which contains all the information needed to explain a dependent variable. Moreover, owing to their sparsity, they intrinsically perform sufficient variable selection (SVS), and so provide two attractive new approaches to variable selection under nonlinear dependencies that require few model assumptions. The two new methods are compared to a similar existing approach for SDR and SVS based on distance covariance, as well as to classical and robust sparse partial least squares. A simulation study shows that each of the new estimators can achieve correct variable selection in highly nonlinear contexts, yet both are sensitive to outliers and computationally intensive. The study sheds light on the subtle differences between the methods. Two examples illustrate how they can be applied in practice, with a slight preference for the option based on martingale difference divergence in a bioinformatics example.
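To make the notion of nonlinear dependence concrete, the sketch below computes a sample version of the squared martingale difference divergence of a response given a predictor. It is an illustration only, not the estimator proposed in the article: the function name `mdd_squared` and the toy data are assumptions, and the formula used is the plug-in form of the population quantity -E[(Y - EY)(Y' - EY') |X - X'|], with expectations replaced by sample means. No sparsity penalty or subspace optimisation is included.

```python
import numpy as np


def mdd_squared(x, y):
    """Plug-in sample squared martingale difference divergence of y given x.

    Hypothetical helper for illustration: x has shape (n,) or (n, p),
    y has shape (n,).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float).ravel()
    if x.ndim == 1:
        x = x[:, None]
    n = x.shape[0]
    # Pairwise Euclidean distances between the rows of x.
    diff = x[:, None, :] - x[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Centre the response and average the distance-weighted cross-products.
    yc = y - y.mean()
    return -(yc[:, None] * yc[None, :] * dist).sum() / n ** 2


# Toy data (assumed): a purely quadratic, hence nonlinear, relationship.
x = np.linspace(-1.0, 1.0, 201)
y = x ** 2

pearson = np.corrcoef(x, y)[0, 1]  # linear correlation vanishes by symmetry
mdd2 = mdd_squared(x, y)           # remains strictly positive
print(f"Pearson correlation: {pearson:.3e}")
print(f"Squared MDD:         {mdd2:.3e}")
```

On this symmetric grid the Pearson correlation is zero up to rounding, whereas the divergence stays strictly positive, which is the kind of dependence a purely linear criterion would miss.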

Metadata
Title
Sparse dimension reduction based on energy and ball statistics
Authors
Emmanuel Jordy Menvouta
Sven Serneels
Tim Verdonck
Publication date
10 November 2021
Publisher
Springer Berlin Heidelberg
Published in
Advances in Data Analysis and Classification / Issue 4/2022
Print ISSN: 1862-5347
Electronic ISSN: 1862-5355
DOI
https://doi.org/10.1007/s11634-021-00470-7
