Published in: Advances in Data Analysis and Classification 4/2022

7 October 2021 | Regular Article

The minimum weighted covariance determinant estimator for high-dimensional data

Authors: Jan Kalina, Jan Tichavský


Abstract

In many applications, it is desirable to analyze high-dimensional measurements robustly, so that the results are not harmed by a possibly large percentage of outlying measurements. The minimum weighted covariance determinant (MWCD) estimator, based on implicit weights assigned to individual observations, is a promising and flexible extension of the popular minimum covariance determinant (MCD) estimator of the expectation and scatter matrix of multivariate data. In this work, a regularized version of the MWCD, denoted the minimum regularized weighted covariance determinant (MRWCD) estimator, is proposed, together with an accompanying outlier detection procedure. The novel MRWCD estimator outperforms other available robust estimators in several simulation scenarios, especially in estimating the scatter matrix of contaminated high-dimensional data.
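To make the two ingredients named in the abstract concrete (weights assigned to observations, and regularization of the scatter matrix), the following is a minimal illustrative sketch. It is not the authors' MRWCD algorithm: the linearly decreasing rank weights, the shrinkage toward a scaled identity matrix, the fixed parameter `lam`, the fixed number of iterations, and all function names are assumptions chosen only for illustration.

```python
import numpy as np


def rank_weights(distances):
    """Assumed weight scheme: weights decrease linearly with the rank
    of the distance, so the closest observation gets weight 1."""
    n = len(distances)
    ranks = np.argsort(np.argsort(distances))  # 0 = smallest distance
    return 1.0 - ranks / n


def weighted_shrunk_scatter(X, weights, lam):
    """Weighted mean and weighted covariance, with the covariance
    shrunk toward a scaled identity (assumed regularization)."""
    w = weights / weights.sum()
    mean = w @ X
    centered = X - mean
    cov = (centered * w[:, None]).T @ centered
    p = X.shape[1]
    target = np.eye(p) * (np.trace(cov) / p)
    return mean, (1.0 - lam) * cov + lam * target


def reweighted_estimate(X, lam=0.1, n_iter=10):
    """Alternate between estimating location/scatter and re-weighting
    observations by the rank of their squared Mahalanobis distance."""
    X = np.asarray(X, dtype=float)
    weights = np.ones(X.shape[0])
    for _ in range(n_iter):
        mean, cov = weighted_shrunk_scatter(X, weights, lam)
        diff = X - mean
        # Squared Mahalanobis distances under the regularized scatter.
        d2 = np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T)
        weights = rank_weights(d2)
    # mean and cov come from the weights used in the last iteration;
    # the returned weights are the subsequent update.
    return mean, cov, weights


# Hypothetical data: 20 observations in 50 dimensions, one gross outlier.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 50))
X[0] += 100.0
mean, cov, weights = reweighted_estimate(X)
print(int(np.argmin(weights)))  # index of the most down-weighted observation
```

Because the shrinkage term adds a positive multiple of the identity, the resulting scatter estimate stays invertible even when there are more variables than observations, which is why some form of regularization is needed in the high-dimensional setting.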


Metadata
Title
The minimum weighted covariance determinant estimator for high-dimensional data
Authors
Jan Kalina
Jan Tichavský
Publication date
7 October 2021
Publisher
Springer Berlin Heidelberg
Published in
Advances in Data Analysis and Classification / Issue 4/2022
Print ISSN: 1862-5347
Electronic ISSN: 1862-5355
DOI
https://doi.org/10.1007/s11634-021-00471-6
