2017 | OriginalPaper | Book chapter

An Effective Sampling Strategy for Ensemble Learning with Imbalanced Data

Authors: Chen Zhang, Xiaolong Zhang

Published in: Intelligent Computing Methodologies

Publisher: Springer International Publishing

Abstract

Classification of imbalanced datasets is one of the challenges in machine learning and data mining. Traditional classifiers still have difficulty handling minority-class instances. In this paper, we propose an effective method that applies sampling within an ensemble learning framework. It uses AdaBoost-SVM based on spectral clustering to boost performance. The method also applies over-sampling and under-sampling based on the instances misclassified by the ensemble. Experimental results show that, compared with previous algorithms, the proposed method is effective in dealing with imbalanced data in binary classification.
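The abstract does not spell out the resampling rule, so the following is only a minimal sketch of the general idea of error-driven resampling, under stated assumptions. The function name `resample_from_errors`, the interpolation step, and the choice to drop misclassified majority instances are all hypothetical illustrations, not the authors' procedure. The list `y_pred` stands in for the predictions of an ensemble such as AdaBoost-SVM; neither the ensemble nor the spectral clustering step is implemented here.

```python
import random


def resample_from_errors(X, y, y_pred, minority=1, seed=0):
    """Rebalance (X, y) using the instances an ensemble misclassified.

    Illustrative assumptions (not taken from the paper):
      * each misclassified minority instance yields one synthetic point,
        interpolated toward a randomly chosen minority instance;
      * misclassified majority instances are dropped.
    """
    rng = random.Random(seed)
    minority_pts = [x for x, label in zip(X, y) if label == minority]
    new_X, new_y = [], []
    for x, label, pred in zip(X, y, y_pred):
        wrong = label != pred
        if label != minority and wrong:
            continue  # under-sampling: drop a misclassified majority instance
        new_X.append(list(x))
        new_y.append(label)
        if label == minority and wrong:
            # over-sampling: synthetic point on the segment from x to neighbor
            neighbor = rng.choice(minority_pts)
            t = rng.random()
            new_X.append([a + t * (b - a) for a, b in zip(x, neighbor)])
            new_y.append(minority)
    return new_X, new_y


# Toy data: label 1 is the minority class; y_pred is a placeholder for
# the ensemble's output, with one error in each class.
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [1.0, 1.0], [0.9, 1.1]]
y = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 0, 1]
X_new, y_new = resample_from_errors(X, y, y_pred)
```

On this toy input the class counts move from 4 majority and 2 minority to 3 and 3. In a boosting setting, a step like this could be repeated after each round so that later component classifiers train on data reshaped by earlier errors, though whether the paper does so is not stated in the abstract.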

Metadata
Title
An Effective Sampling Strategy for Ensemble Learning with Imbalanced Data
Authors
Chen Zhang
Xiaolong Zhang
Copyright year
2017
DOI
https://doi.org/10.1007/978-3-319-63315-2_33