Skip to main content

2016 | OriginalPaper | Buchkapitel

4. Which Resampling-Based Error Estimator for Benchmark Studies? A Power Analysis with Application to PLS-LDA

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Resampling-based methods such as k-fold cross-validation or repeated splitting into training and test sets are routinely used in the context of supervised statistical learning to assess the prediction performances of prediction methods using real data sets. In this paper, we consider methodological issues related to comparison studies of prediction methods which involve several real data sets and use resampling-based error estimators as the evaluation criteria. In the literature papers often claim that, say, “Method 1 performs better than Method 2 on real data” without applying any proper statistical inference approach to support their claims and without clearly explaining what they mean by “perform better.” We recently proposed a new statistical testing framework which provides a statistically correct formulation of such paired tests—which are often performed in the machine learning community—to compare the performances of two methods on several real data sets. However, the behavior of the different available resampling-based error estimation procedures in this statistical framework is unknown. In this paper we empirically assess this behavior through an exemplary benchmark study based on 50 microarray data sets and formulate tentative recommendations regarding the choice of resampling-based error estimation procedures in light of the results.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
Zurück zum Zitat Binder, H., Schumacher, M.: Adapting prediction error estimates for biased complexity selection in high-dimensional bootstrap samples. Stat. Appl. Genet. Mol. Biol. 7, 12 (2008)MathSciNetMATH Binder, H., Schumacher, M.: Adapting prediction error estimates for biased complexity selection in high-dimensional bootstrap samples. Stat. Appl. Genet. Mol. Biol. 7, 12 (2008)MathSciNetMATH
Zurück zum Zitat Bock, J.: Bestimmung des Stichprobenumfangs. Oldenburg Verlag, München Wien (1998) Bock, J.: Bestimmung des Stichprobenumfangs. Oldenburg Verlag, München Wien (1998)
Zurück zum Zitat Boulesteix, A.-L.: PLS dimension reduction for classification with microarray data. Stat. Appl. Genet. Mol. Biol. 3, 33 (2004)MathSciNetMATH Boulesteix, A.-L.: PLS dimension reduction for classification with microarray data. Stat. Appl. Genet. Mol. Biol. 3, 33 (2004)MathSciNetMATH
Zurück zum Zitat Boulesteix, A.-L.: On representative and illustrative comparisons with real data in bioinformatics: response to the letter to the editor by Smith et al. Bioinformatics 29, 2664–2666 (2013)CrossRef Boulesteix, A.-L.: On representative and illustrative comparisons with real data in bioinformatics: response to the letter to the editor by Smith et al. Bioinformatics 29, 2664–2666 (2013)CrossRef
Zurück zum Zitat Boulesteix, A.-L., Lauer, S., Eugster, M.: A plea for neutral comparison studies in computational sciences. PLOS ONE 8, 61562 (2013)CrossRef Boulesteix, A.-L., Lauer, S., Eugster, M.: A plea for neutral comparison studies in computational sciences. PLOS ONE 8, 61562 (2013)CrossRef
Zurück zum Zitat Boulesteix, A.-L., Hable, R., Lauer, S., Eugster, M.: A statistical framework for hypothesis testing in real data comparison studies. Am. Stat. 69, 201–212 (2015)MathSciNetCrossRef Boulesteix, A.-L., Hable, R., Lauer, S., Eugster, M.: A statistical framework for hypothesis testing in real data comparison studies. Am. Stat. 69, 201–212 (2015)MathSciNetCrossRef
Zurück zum Zitat Demsar, J.: Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 7, 1–30 (2006)MathSciNetMATH Demsar, J.: Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 7, 1–30 (2006)MathSciNetMATH
Zurück zum Zitat de Souza, B.F., de Carvalho, A., Soares, C.: A comprehensive comparison of ML algorithms for gene expression data classification. In: The 2010 International Joint Conference of Neural Networks (IJCNN), Barcelona, pp. 1–8 (2010) de Souza, B.F., de Carvalho, A., Soares, C.: A comprehensive comparison of ML algorithms for gene expression data classification. In: The 2010 International Joint Conference of Neural Networks (IJCNN), Barcelona, pp. 1–8 (2010)
Zurück zum Zitat Dougherty, E.R., Sima, C., Hanczar, B., Braga-Neto, U.M.: Performance of error estimators for classification. Curr. Bioinform. 5, 53–67 (2010)CrossRef Dougherty, E.R., Sima, C., Hanczar, B., Braga-Neto, U.M.: Performance of error estimators for classification. Curr. Bioinform. 5, 53–67 (2010)CrossRef
Zurück zum Zitat Molinaro, A., Simon, R., Pfeiffer, R.M.: Prediction error estimation: a comparison of resampling methods. Bioinformatics 21, 3301–3307 (2005)CrossRef Molinaro, A., Simon, R., Pfeiffer, R.M.: Prediction error estimation: a comparison of resampling methods. Bioinformatics 21, 3301–3307 (2005)CrossRef
Zurück zum Zitat Slawski, M., Daumer, M., Boulesteix, A.-L.: CMA: a comprehensive bioconductor package for supervised classification with high dimensional data. BMC Bioinform. 9, 439 (2008)CrossRef Slawski, M., Daumer, M., Boulesteix, A.-L.: CMA: a comprehensive bioconductor package for supervised classification with high dimensional data. BMC Bioinform. 9, 439 (2008)CrossRef
Metadaten
Titel
Which Resampling-Based Error Estimator for Benchmark Studies? A Power Analysis with Application to PLS-LDA
verfasst von
Anne-Laure Boulesteix
Copyright-Jahr
2016
DOI
https://doi.org/10.1007/978-3-319-40643-5_4