2018 | Original Paper | Book chapter

Multi-class Imbalanced Learning with One-Versus-One Decomposition: An Empirical Study

Written by: Yanjun Song, Jing Zhang, Han Yan, Qianmu Li

Published in: Cloud Computing and Security

Publisher: Springer International Publishing

Abstract

In supervised learning, a skewed distribution across multiple classes makes it very difficult to learn good models. A common scheme for dealing with the multi-class imbalanced problem is to decompose the original dataset into several binary-class subsets and apply imbalanced learning techniques to them. This paper presents our empirical study of state-of-the-art multi-class imbalanced learning algorithms based on One-versus-One (OVO) decomposition. We implemented six algorithms from the literature (SMOTEBagging, UnderBagging, OVO plus OVA, OVO plus SMOTE, One-Against-Higher-Order, and DynamicOVO) and evaluated their performance in terms of multi-class Area Under the ROC Curve (MAUC) on eighteen datasets with different characteristics. Experimental results show that the OVO plus SMOTE algorithm outperforms the other algorithms and is quite stable.
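
The decomposition scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, which this page does not show. It only assumes the general idea of OVO plus SMOTE: build one binary subset per class pair, then oversample the smaller class of each pair by interpolating between nearby minority points. The function names (`ovo_subsets`, `smote_like`, `balance_pair`), the neighbour count `k=3`, the toy data, and the choice to balance each pair to equal size are all assumptions made for illustration.

```python
import random
from itertools import combinations


def ovo_subsets(X, y):
    """Split a multi-class dataset into one binary subset per class pair."""
    subsets = {}
    for a, b in combinations(sorted(set(y)), 2):
        subsets[(a, b)] = [(x, label) for x, label in zip(X, y) if label in (a, b)]
    return subsets


def smote_like(minority, n_new, k=3, rng=None):
    """Create n_new synthetic points, each interpolated between a minority
    point and one of its k nearest minority neighbours (SMOTE-style)."""
    rng = rng or random.Random(0)
    if len(minority) < 2:
        # Assumption: with a single minority point there is nothing to
        # interpolate, so the sketch falls back to plain duplication.
        return [list(minority[0]) for _ in range(n_new)]
    synthetic = []
    for _ in range(n_new):
        p = rng.choice(minority)
        neighbours = sorted(
            (q for q in minority if q is not p),
            key=lambda q: sum((pi - qi) ** 2 for pi, qi in zip(p, q)),
        )[:k]
        q = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from p to q
        synthetic.append([pi + gap * (qi - pi) for pi, qi in zip(p, q)])
    return synthetic


def balance_pair(rows, rng=None):
    """Oversample the smaller class of a binary subset up to the larger one."""
    by_class = {}
    for x, label in rows:
        by_class.setdefault(label, []).append(x)
    (small, small_pts), (_, large_pts) = sorted(
        by_class.items(), key=lambda item: len(item[1])
    )
    extra = smote_like(small_pts, len(large_pts) - len(small_pts), rng=rng)
    return rows + [(x, small) for x in extra]


# Toy imbalanced dataset: four "a" points, two "b" points, one "c" point.
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3],
     [1.0, 1.0], [1.1, 0.9], [2.0, 2.0]]
y = ["a", "a", "a", "a", "b", "b", "c"]

pairs = ovo_subsets(X, y)
balanced = {pair: balance_pair(rows, random.Random(42))
            for pair, rows in pairs.items()}

for pair, rows in balanced.items():
    print(pair, len(pairs[pair]), "->", len(rows), "rows")
```

In a full pipeline, a binary classifier would then be trained on each balanced subset and the pairwise outputs combined (for example by voting) into a multi-class prediction. That aggregation step is omitted here.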

Metadata
Title
Multi-class Imbalanced Learning with One-Versus-One Decomposition: An Empirical Study
Written by
Yanjun Song
Jing Zhang
Han Yan
Qianmu Li
Copyright year
2018
DOI
https://doi.org/10.1007/978-3-030-00012-7_56