2019 | OriginalPaper | Book chapter

1. Introduction to Imbalanced Data

Written by: Osamu Komori, Shinto Eguchi

Published in: Statistical Methods for Imbalanced Data in Ecological and Biological Studies

Publisher: Springer Japan

Abstract

An imbalance of sample sizes among class labels makes it difficult to obtain high classification accuracy in many scientific fields, including medical diagnosis, bioinformatics, biology, and fisheries management. This difficulty is referred to as the “class imbalance problem” and is considered to be among the 10 most important problems in data mining research. The topic has also been widely discussed in several machine learning workshops. The critical feature of the imbalance problem is that it significantly degrades the performance of standard classification methods, which implicitly assume balanced class distributions and equal misclassification costs for each class. Hence, new strategies are required to mitigate such imbalances, based on resampling techniques, modification of the classification algorithms, adjustment of weights for class distributions, and so on.
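
To make the abstract's point concrete, the following is a minimal sketch rather than the authors' method. It assumes hypothetical toy labels (95 majority, 5 minority) and illustrative helper names (`random_oversample`, `balanced_class_weights`) that do not come from the chapter. It shows why plain accuracy can be misleading under imbalance, and sketches two of the mitigation families mentioned above: resampling and class-weight adjustment, the latter using a common inverse-frequency heuristic.

```python
import random
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of true `positive` labels that were predicted as `positive`."""
    hits = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    total = sum(t == positive for t in y_true)
    return hits / total


def random_oversample(labels, seed=0):
    """Return indices that repeat smaller-class items (sampled with
    replacement) until every class is as large as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    target = max(len(idx) for idx in by_class.values())
    out = []
    for idx in by_class.values():
        out.extend(idx)
        out.extend(rng.choices(idx, k=target - len(idx)))
    return out


def balanced_class_weights(labels):
    """Weight class c by n / (k * n_c), so every class carries equal total weight."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * n_c) for c, n_c in counts.items()}


# Hypothetical toy labels: 95 majority (0) and 5 minority (1) samples.
y = [0] * 95 + [1] * 5

# A trivial "classifier" that always predicts the majority class.
always_majority = [0] * len(y)

acc = accuracy(y, always_majority)   # high, although nothing was learned
rec = recall(y, always_majority)     # no minority sample is ever detected

resampled = [y[i] for i in random_oversample(y)]
weights = balanced_class_weights(y)

print(f"accuracy={acc:.2f}, minority recall={rec:.2f}")
print("class counts after oversampling:", Counter(resampled))
print("class weights:", weights)
```

In this toy setting the always-majority predictor looks accurate while missing every minority sample, which is the degradation the abstract describes. Oversampling and reweighting both aim to give the minority class more influence during training; which one is preferable depends on the classifier and the data.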

Metadata
Title
Introduction to Imbalanced Data
Written by
Osamu Komori
Shinto Eguchi
Copyright year
2019
Publisher
Springer Japan
DOI
https://doi.org/10.1007/978-4-431-55570-4_1
