
2017 | OriginalPaper | Book chapter

Virtual Balancing of Decision Classes

Written by: Marzena Kryszkiewicz

Published in: Intelligent Information and Database Systems

Publisher: Springer International Publishing

Abstract

It has been observed in the literature and in practice that classification based on the confidences of decision rules performs poorly when a decision table consists of decision classes that differ significantly in the number of objects. A typical way to mitigate this problem is to oversample minority decision classes and/or undersample majority decision classes. In this paper, we introduce the notion of virtual balancing of decision classes, which does not require any replication of data but produces the same results as physical balancing of decision classes. We also derive a number of properties of selected evaluation measures of decision rules (coverage, confidence, lift and growth) and the relations among them with respect to virtually (and thus physically) balanced decision classes. In particular, we show how to determine threshold values for confidence, lift, growth and coverage so that the resulting sets of decision rules are identical.
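The idea of obtaining balanced results without replicating data can be illustrated with a small sketch. The abstract does not give the paper's definitions, so everything below is an assumption made for illustration: coverage is taken to be the fraction of a class's objects that match a rule's condition, physical balancing is taken to be replication of every class up to a common size, and the toy table and the helper names (`coverages`, `virtual_confidence`, `oversample`) are hypothetical. Under those assumptions, the confidence of a rule in the balanced table depends only on the per-class coverages, so it can be computed from the original table directly.

```python
from collections import Counter
from fractions import Fraction
from math import lcm  # available in Python 3.9+


def coverages(rows, condition):
    """Per-class coverage: share of each class's objects matching the condition."""
    sizes = Counter(d for _, d in rows)
    hits = Counter(d for x, d in rows if condition(x))
    return {d: Fraction(hits[d], n) for d, n in sizes.items()}


def confidence(rows, condition, decision):
    """Ordinary confidence: share of matching objects that belong to `decision`."""
    matched = [d for x, d in rows if condition(x)]
    return Fraction(matched.count(decision), len(matched))


def virtual_confidence(rows, condition, decision):
    """Confidence as if all classes had equal size, with no data replication.

    Assumed formula: cov(decision) / sum of coverages over all classes.
    Raises ZeroDivisionError if no object matches the condition.
    """
    cov = coverages(rows, condition)
    return cov[decision] / sum(cov.values())


def oversample(rows):
    """Physical balancing: replicate each class up to a common size."""
    sizes = Counter(d for _, d in rows)
    target = lcm(*sizes.values())
    return [row for row in rows for _ in range(target // sizes[row[1]])]


# Hypothetical toy decision table: (condition attribute value, decision class).
table = [("a", "maj")] * 2 + [("b", "maj")] * 6 + [("a", "min"), ("b", "min")]


def is_a(x):
    return x == "a"


print(confidence(table, is_a, "min"))              # imbalanced table
print(virtual_confidence(table, is_a, "min"))      # virtual balancing
print(confidence(oversample(table), is_a, "min"))  # physical balancing
```

In this toy table the last two values coincide, while the confidence on the imbalanced table is lower for the minority class. The sketch covers confidence only; lift, growth and the threshold relations mentioned in the abstract are not reproduced here.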

Metadata
Copyright year: 2017
DOI: https://doi.org/10.1007/978-3-319-54472-4_63
