Published in: World Wide Web 6/2017

8 March 2017

Multi-window based ensemble learning for classification of imbalanced streaming data

Authors: Hu Li, Ye Wang, Hua Wang, Bin Zhou

Abstract

Imbalanced streaming data is common in real-world data mining and machine learning applications and has attracted much attention in recent years. In practice, class imbalance and streaming usually occur together; however, little research has addressed the two jointly. In this paper, we propose a multi-window based ensemble learning method for the classification of imbalanced streaming data. Three types of windows are defined to store the current batch of instances, the latest minority instances, and the ensemble classifier. The ensemble classifier consists of a set of the latest sub-classifiers together with the instances used to train each sub-classifier. All sub-classifiers are weighted before predicting the class labels of newly arriving instances, and new sub-classifiers are trained only when the precision falls below a predefined threshold. Extensive experiments on synthetic and real-world datasets demonstrate that the new approach classifies imbalanced streaming data efficiently and effectively, and generally outperforms existing approaches.
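
The three-window scheme summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the nearest-centroid base learner, the distance-based weighting, the minority-class precision measure, and all names and default values (`MultiWindowEnsemble`, `minority_size`, `ensemble_size`, `threshold`) are assumptions made for illustration, since the abstract does not specify them.

```python
from collections import deque
from math import dist


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def minority_precision(preds, y, label):
    """Precision on the minority class (assumed measure)."""
    hits = [t == label for p, t in zip(preds, y) if p == label]
    if hits:
        return sum(hits) / len(hits)
    # Nothing predicted as minority: acceptable only if the batch had none.
    return 0.0 if label in y else 1.0


class NearestCentroid:
    """Stand-in base learner (an assumption; the abstract names none)."""

    def fit(self, X, y):
        self.centroids = {
            label: centroid([x for x, lab in zip(X, y) if lab == label])
            for label in set(y)
        }
        return self

    def predict_one(self, x):
        return min(self.centroids, key=lambda lab: dist(x, self.centroids[lab]))


class MultiWindowEnsemble:
    """Hypothetical sketch of a multi-window ensemble for imbalanced streams."""

    def __init__(self, minority_label=1, minority_size=50,
                 ensemble_size=5, threshold=0.8):
        self.minority_label = minority_label
        self.threshold = threshold
        # Window 2: the latest minority instances seen so far.
        self.minority = deque(maxlen=minority_size)
        # Window 3: the latest sub-classifiers with their training instances.
        self.ensemble = deque(maxlen=ensemble_size)

    def _train(self, X, y):
        # Add stored minority instances to the batch to ease the imbalance.
        X_train = list(X) + list(self.minority)
        y_train = list(y) + [self.minority_label] * len(self.minority)
        self.ensemble.append((NearestCentroid().fit(X_train, y_train), X_train))

    def predict(self, X):
        # Assumed weighting: sub-classifiers whose training data lies closer
        # to the current batch receive larger weights.
        batch_center = centroid(X)
        weights = [1.0 / (1e-9 + dist(batch_center, centroid(train)))
                   for _, train in self.ensemble]
        preds = []
        for x in X:
            votes = {}
            for (clf, _), w in zip(self.ensemble, weights):
                label = clf.predict_one(x)
                votes[label] = votes.get(label, 0.0) + w
            preds.append(max(votes, key=votes.get))
        return preds

    def process_batch(self, X, y):
        """Window 1 is the batch (X, y). Returns predictions, or None for
        the first batch, which is only used for training."""
        preds = None
        if self.ensemble:
            preds = self.predict(X)
            precision = minority_precision(preds, y, self.minority_label)
            # Train a new sub-classifier only when precision is too low.
            if precision < self.threshold:
                self._train(X, y)
        else:
            self._train(X, y)
        self.minority.extend(
            x for x, lab in zip(X, y) if lab == self.minority_label)
        return preds
```

Calling `process_batch` once per labeled batch is enough to drive the sketch: as long as the minority-class precision stays at or above `threshold`, the ensemble is left unchanged, which is what keeps the per-batch cost low in this reading of the method.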

Metadata
Title
Multi-window based ensemble learning for classification of imbalanced streaming data
Authors
Hu Li
Ye Wang
Hua Wang
Bin Zhou
Publication date
8 March 2017
Publisher
Springer US
Published in
World Wide Web / Issue 6/2017
Print ISSN: 1386-145X
Electronic ISSN: 1573-1413
DOI
https://doi.org/10.1007/s11280-017-0449-x
