
2013 | Original Paper | Chapter

22. Preliminary Evaluation of Classification Complexity Measures on Imbalanced Data

Authors: Yan Xing, Hao Cai, Yanguang Cai, Ole Hejlesen, Egon Toft

Published in: Proceedings of 2013 Chinese Intelligent Automation Conference

Publisher: Springer Berlin Heidelberg


Abstract

Classification complexity measures play an important role in classifier selection, but they were primarily designed for balanced data. Focusing on binary classification, this paper proposes a novel methodology for evaluating their validity on imbalanced data. The twelve complexity measures proposed by Ho are evaluated on synthetic imbalanced data sets that vary in probability distribution, class-boundary shape, and degree of skewness. The experimental results show that most of the complexity measures change in a statistically significant way as data skewness varies, which indicates that they need to be revised and improved before they can be applied reliably to imbalanced data.
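To make the evaluation idea in the abstract concrete, the following Python sketch computes two of Ho's measures, F1 (maximum Fisher's discriminant ratio) and N3 (leave-one-out error rate of a 1-nearest-neighbour classifier), on synthetic two-class data while the minority-class fraction shrinks. It is not the authors' experimental setup: the unit-variance Gaussian classes, the mean shift of 2.0, the sample size of 200, the skew levels, the ten repetitions and the helper names (`fisher_ratio_f1`, `nn_error_n3`, `make_imbalanced`) are all illustrative assumptions, and the paper's statistical testing procedure is not reproduced.

```python
import math
import random
from statistics import fmean, pvariance


def fisher_ratio_f1(X, y):
    """F1: maximum Fisher's discriminant ratio over all features.

    For feature j the ratio is (mu0 - mu1)^2 / (var0 + var1), using the
    population variance of each class. Labels are assumed to be 0/1 and
    each class must contain at least one point.
    """
    best = 0.0
    for j in range(len(X[0])):
        a = [x[j] for x, c in zip(X, y) if c == 0]
        b = [x[j] for x, c in zip(X, y) if c == 1]
        denom = pvariance(a) + pvariance(b)
        if denom > 0:  # skip constant features to avoid division by zero
            best = max(best, (fmean(a) - fmean(b)) ** 2 / denom)
    return best


def nn_error_n3(X, y):
    """N3: leave-one-out error rate of a 1-nearest-neighbour classifier."""
    errors = 0
    for i, xi in enumerate(X):
        best_d, best_label = math.inf, None
        for k, xk in enumerate(X):
            if k == i:
                continue  # leave the query point out
            d = math.dist(xi, xk)
            if d < best_d:
                best_d, best_label = d, y[k]
        errors += best_label != y[i]  # count a misclassification
    return errors / len(X)


def make_imbalanced(n, minority_fraction, shift, rng):
    """Hypothetical generator: two 2-D unit-variance Gaussian classes.

    Class 0 (majority) is centred at the origin; class 1 (minority) is
    shifted by `shift` along the first axis.
    """
    n1 = max(1, round(n * minority_fraction))
    n0 = n - n1
    X = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n0)]
    X += [(rng.gauss(shift, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n1)]
    y = [0] * n0 + [1] * n1
    return X, y


if __name__ == "__main__":
    rng = random.Random(0)
    for frac in (0.5, 0.25, 0.1, 0.05):
        f1s, n3s = [], []
        for _ in range(10):  # repeat to expose sampling variability
            X, y = make_imbalanced(200, frac, 2.0, rng)
            f1s.append(fisher_ratio_f1(X, y))
            n3s.append(nn_error_n3(X, y))
        print(f"minority={frac:.2f}  mean F1={fmean(f1s):.3f}  mean N3={fmean(n3s):.3f}")
```

Because F1 uses only per-class means and variances, its value in this setup should be largely insensitive to the class ratio apart from sampling noise in the small minority class, whereas N3 counts errors over all points and therefore depends on how many points each class contributes. Comparing the printed columns across skew levels, ideally with a proper significance test, is a small-scale version of the question the paper asks.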


Literature
1. Murphy KP (2012) Machine learning: a probabilistic perspective. The MIT Press, Cambridge
2. Ho TK, Basu M, Law MHC (2006) Measures of geometrical complexity in classification problems. In: Basu M, Ho TK (eds) Data complexity in pattern recognition. Springer, Berlin, pp 3–24
3. Moran S, He Y, Liu K (2009) Choosing the best Bayesian classifier: an empirical study. IAENG Int J Comput Sci 36(4):9–19
4. Ho TK, Basu M (2002) Complexity measures of supervised classification problems. IEEE Trans Pattern Anal Mach Intell 24(3):298–300
5. Sun Y, Wong AC, Kamel MS (2009) Classification of imbalanced data: a review. Int J Pattern Recognit Artif Intell 23(4):687–719
6. Ho TK (2008) Data complexity analysis: linkage between context and solution in classification. Lect Notes Comput Sci 5342:986–995
7. Ho TK (2002) A data complexity analysis of comparative advantages of decision forest constructors. Pattern Anal Appl 5(2):102–112
8. Weng CG, Poon J (2010) CODE: a data complexity framework for imbalanced datasets. Lect Notes Artif Intell 5569:16–27
9. Moore DS, McCabe GP, Craig BA (2009) Introduction to the practice of statistics, 6th edn. W.H. Freeman, New York
10. Orriols-Puig A, Macia N, Ho TK (2010) Documentation for the data complexity library in C++. Universitat Ramon Llull, La Salle
Metadata
Copyright Year
2013
DOI
https://doi.org/10.1007/978-3-642-38466-0_22