Skip to main content
Erschienen in: Knowledge and Information Systems 3/2016

01.03.2016 | Regular Paper

An ensemble of cluster-based classifiers for semi-supervised classification of non-stationary data streams

verfasst von: Mohammad Javad Hosseini, Ameneh Gholipour, Hamid Beigy

Erschienen in: Knowledge and Information Systems | Ausgabe 3/2016

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Recent advances in storage and processing have provided the possibility of automatic gathering of information, which in turn leads to fast and continuous flows of data. The data which are produced and stored in this way are called data streams. Data streams are produced in large size, and much dynamism and have some unique properties which make them applicable to model many real data mining applications. The main challenge of streaming data is the occurrence of concept drift. In addition, regarding the costs of labeling of instances, it is often assumed that only a small fraction of instances are labeled. In this paper, we propose an ensemble algorithm to classify instances of non-stationary data streams in a semi-supervised environment. Furthermore, this method is intended to recognize recurring concept drifts of data streams. In the proposed algorithm, a pool of classifiers is maintained by the algorithm with each classifier being representative of one single concept. At first, a batch of instances is classified by the algorithm. Thereafter, some of these instances are labeled and this partially labeled batch is used to update the classifiers in the pool. This process repeats for consecutive batches of the streams. The main advantage of the algorithm is that it uses unlabeled instances as well as labeled ones in the learning task. Experimental results show the effectiveness of the proposed algorithm over the state-of-the-art methods, in different aspects.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Aggarwal CC (2006) Data streams: models and algorithms. Springer-Verlag New York Inc, New York Aggarwal CC (2006) Data streams: models and algorithms. Springer-Verlag New York Inc, New York
2.
Zurück zum Zitat Ahmadi Z, Beigy H (2012) Semi-supervised ensemble learning of data streams in the presence of concept drift. In: Proceedings of the 7th International Conference on Hybrid Artificial Intelligent Systems. Salamanca. Springer, Spain, pp 526–537 Ahmadi Z, Beigy H (2012) Semi-supervised ensemble learning of data streams in the presence of concept drift. In: Proceedings of the 7th International Conference on Hybrid Artificial Intelligent Systems. Salamanca. Springer, Spain, pp 526–537
3.
Zurück zum Zitat Bennett KP, Demiriz A, Maclin R (2002) Exploiting unlabeled data in ensemble methods. In: Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, pp 289–296 Bennett KP, Demiriz A, Maclin R (2002) Exploiting unlabeled data in ensemble methods. In: Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, pp 289–296
4.
Zurück zum Zitat Bifet A, Gavaldà R (2007) Learning from time-changing data with adaptive windowing. In: Proceedings of SIAM International Conference on Data Mining (SDM). Minneapolis, Minnesota, United States Bifet A, Gavaldà R (2007) Learning from time-changing data with adaptive windowing. In: Proceedings of SIAM International Conference on Data Mining (SDM). Minneapolis, Minnesota, United States
5.
Zurück zum Zitat Bifet A, Holmes G, Kirkby R, Pfahringer B (2010) MOA: massive online analysis. J Mach Learn Res 99(9):1601–1604 Bifet A, Holmes G, Kirkby R, Pfahringer B (2010) MOA: massive online analysis. J Mach Learn Res 99(9):1601–1604
6.
Zurück zum Zitat Castillo G (2008) Adaptive learning algorithms for Bayesian network classifiers. AI Commun 21(1):87–88MathSciNet Castillo G (2008) Adaptive learning algorithms for Bayesian network classifiers. AI Commun 21(1):87–88MathSciNet
7.
Zurück zum Zitat Chapelle O, Schalkopf B, Zien A (2006) Semi-supervised learning. MIT press, CambridgeCrossRef Chapelle O, Schalkopf B, Zien A (2006) Semi-supervised learning. MIT press, CambridgeCrossRef
8.
Zurück zum Zitat Ditzler G, Polikar R (2011) Semi-supervised learning in nonstationary environments. In: Proceeding of the International Joint Conference on Neural Networks (IJCNN). IEEE, pp 2741–2748 Ditzler G, Polikar R (2011) Semi-supervised learning in nonstationary environments. In: Proceeding of the International Joint Conference on Neural Networks (IJCNN). IEEE, pp 2741–2748
9.
Zurück zum Zitat Domingos P, Hulten G (2000) Mining high-speed data streams. In: Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM. Boston, Massachusetts, United States, pp 71–80 Domingos P, Hulten G (2000) Mining high-speed data streams. In: Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM. Boston, Massachusetts, United States, pp 71–80
10.
Zurück zum Zitat Fan W (2004) Systematic data selection to mine concept-drifting data streams. In: Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM. Seattle. WA, United States, pp 128–137 Fan W (2004) Systematic data selection to mine concept-drifting data streams. In: Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM. Seattle. WA, United States, pp 128–137
11.
Zurück zum Zitat Fan W, Huang Y, Yu PS (2004) Decision tree evolution using limited number of labeled data items from drifting data streams. In: Proceedings of the 4th IEEE International Conference on Data Mining. IEEE Computer Society, pp 379–382 Fan W, Huang Y, Yu PS (2004) Decision tree evolution using limited number of labeled data items from drifting data streams. In: Proceedings of the 4th IEEE International Conference on Data Mining. IEEE Computer Society, pp 379–382
12.
Zurück zum Zitat Gama J, Fernandes R, Rocha R (2006) Decision trees for mining data streams. Intell Data Anal 10(1):23–45 Gama J, Fernandes R, Rocha R (2006) Decision trees for mining data streams. Intell Data Anal 10(1):23–45
13.
Zurück zum Zitat Gama J, Medas P, Rocha R (2004) Forest trees for on-line data. In: Proceedings of the ACM Symposium on Applied Computing. ACM, pp 632–636 Gama J, Medas P, Rocha R (2004) Forest trees for on-line data. In: Proceedings of the ACM Symposium on Applied Computing. ACM, pp 632–636
14.
Zurück zum Zitat Gao J, Fan W, Han J (2007) On appropriate assumptions to mine data streams: analysis and practice. In: Proceedings of the 7th IEEE International Conference on Data Mining. IEEE Computer Society Gao J, Fan W, Han J (2007) On appropriate assumptions to mine data streams: analysis and practice. In: Proceedings of the 7th IEEE International Conference on Data Mining. IEEE Computer Society
15.
Zurück zum Zitat Gholipour A, Hosseini MJ, Beigy H (2013) An adaptive regression tree for non-stationary data streams. In: Proceedings of the 28th Annual ACM Symposium on Applied Computing. ACM. Coimbra, Portugal, pp 815–817 Gholipour A, Hosseini MJ, Beigy H (2013) An adaptive regression tree for non-stationary data streams. In: Proceedings of the 28th Annual ACM Symposium on Applied Computing. ACM. Coimbra, Portugal, pp 815–817
16.
Zurück zum Zitat Gomes JB, Menasalvas E, Sousa P (2010) Tracking recurrent concepts using context. In: Proceedings of the 7th International Conference on Rough Sets and Current Trends in Computing. Springer-Verlag, Warsaw, Poland, pp 168–177 Gomes JB, Menasalvas E, Sousa P (2010) Tracking recurrent concepts using context. In: Proceedings of the 7th International Conference on Rough Sets and Current Trends in Computing. Springer-Verlag, Warsaw, Poland, pp 168–177
17.
Zurück zum Zitat Hosseini MJ, Ahmadi Z, Beigy H (2011) Pool and accuracy based stream classification: a new ensemble algorithm on data stream classification using recurring concepts detection. In: Proceedings of the IEEE International Conference on Data Mining Workshops. IEEE. Vancouver, Canada, pp 588–595 Hosseini MJ, Ahmadi Z, Beigy H (2011) Pool and accuracy based stream classification: a new ensemble algorithm on data stream classification using recurring concepts detection. In: Proceedings of the IEEE International Conference on Data Mining Workshops. IEEE. Vancouver, Canada, pp 588–595
18.
Zurück zum Zitat Hosseini MJ, Ahmadi Z, Beigy H (2012) New management operations on classifiers pool to track recurring concepts. In: Proceedings of the 14th international conference on data warehousing and knowledge discovery. Springer, Vienna, Austria, pp 327–339 Hosseini MJ, Ahmadi Z, Beigy H (2012) New management operations on classifiers pool to track recurring concepts. In: Proceedings of the 14th international conference on data warehousing and knowledge discovery. Springer, Vienna, Austria, pp 327–339
19.
Zurück zum Zitat Hosseini MJ, Ahmadi Z, Beigy H (2013) Using a classifier pool in accuracy based tracking of recurring concepts in data stream classification. Evol Syst 4(1):1–18CrossRef Hosseini MJ, Ahmadi Z, Beigy H (2013) Using a classifier pool in accuracy based tracking of recurring concepts in data stream classification. Evol Syst 4(1):1–18CrossRef
20.
Zurück zum Zitat Hulten G, Spencer L, Domingos P (2001) Mining time-changing data streams. In: Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM. San Francisco, California, United States, pp 97–106 Hulten G, Spencer L, Domingos P (2001) Mining time-changing data streams. In: Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM. San Francisco, California, United States, pp 97–106
21.
Zurück zum Zitat Karimi Z, Abolhassani H, Beigy H (2012) A new method of mining data streams using harmony search. J Intell Inf Syst 39(2):491–511CrossRef Karimi Z, Abolhassani H, Beigy H (2012) A new method of mining data streams using harmony search. J Intell Inf Syst 39(2):491–511CrossRef
22.
Zurück zum Zitat Katakis I, Tsoumakas G, Vlahavas I (2009) Tracking recurring contexts using ensemble classifiers: an application to email filtering. Knowl Inf Syst 22(3):371–391CrossRef Katakis I, Tsoumakas G, Vlahavas I (2009) Tracking recurring contexts using ensemble classifiers: an application to email filtering. Knowl Inf Syst 22(3):371–391CrossRef
23.
Zurück zum Zitat Klinkenberg R (2004) Learning drifting concepts: example selection vs. example weighting. Intell Data Anal 8(3):281–300 Klinkenberg R (2004) Learning drifting concepts: example selection vs. example weighting. Intell Data Anal 8(3):281–300
24.
Zurück zum Zitat Kolter JZ, Maloof MA (2005) Using additive expert ensembles to cope with concept drift. In: Proceedings of the 22nd International Conference on Machine learning. ACM, Bonn, Germany, pp 449–456 Kolter JZ, Maloof MA (2005) Using additive expert ensembles to cope with concept drift. In: Proceedings of the 22nd International Conference on Machine learning. ACM, Bonn, Germany, pp 449–456
25.
Zurück zum Zitat Li P, Wu X, Hu X (2010) Mining recurring concept drifts with limited labeled streaming data. In: Proceeding of the 2nd Asian Conference on Machine Learning (JMLR), Tokyo, Japan, pp 241–252 Li P, Wu X, Hu X (2010) Mining recurring concept drifts with limited labeled streaming data. In: Proceeding of the 2nd Asian Conference on Machine Learning (JMLR), Tokyo, Japan, pp 241–252
26.
Zurück zum Zitat MacQueen J (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, California, United States, pp 281–297 MacQueen J (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, California, United States, pp 281–297
27.
Zurück zum Zitat Masud MM, Gao J, Khan L, Han J, Thuraisingham B (2008) A practical approach to classify evolving data streams: training with limited amount of labeled data. In: Proceedings of the 8th IEEE International Conference on Data Mining, pp 929–934 Masud MM, Gao J, Khan L, Han J, Thuraisingham B (2008) A practical approach to classify evolving data streams: training with limited amount of labeled data. In: Proceedings of the 8th IEEE International Conference on Data Mining, pp 929–934
28.
Zurück zum Zitat Masud MM, Woolam C, Gao J, Khan L, Han J, Hamlen KW, Oza NC (2012) Facing the reality of data stream classification: coping with scarcity of labeled data. Knowl Inf Syst 33(1):213–244CrossRef Masud MM, Woolam C, Gao J, Khan L, Han J, Hamlen KW, Oza NC (2012) Facing the reality of data stream classification: coping with scarcity of labeled data. Knowl Inf Syst 33(1):213–244CrossRef
29.
Zurück zum Zitat Minku LL (2011) Online ensemble learning in the presence of concept drift. University of Birmingham, Birmingham Minku LL (2011) Online ensemble learning in the presence of concept drift. University of Birmingham, Birmingham
30.
Zurück zum Zitat Moon TK (1996) The expectation-maximization algorithm. Signal Process Mag IEEE 13(6):47–60CrossRef Moon TK (1996) The expectation-maximization algorithm. Signal Process Mag IEEE 13(6):47–60CrossRef
31.
Zurück zum Zitat Nishida K (2008) Learning and detecting concept drift. Information science and technology. Hokkaido University, Hokkaido Nishida K (2008) Learning and detecting concept drift. Information science and technology. Hokkaido University, Hokkaido
32.
Zurück zum Zitat Padovitz A, Loke SW, Zaslavsky A (2004) Towards a theory of context spaces. In: Proceedings of the 2nd IEEE Annual Conference on Pervasive Computing and Communications Workshops. IEEE Computer Society, pp 38–42 Padovitz A, Loke SW, Zaslavsky A (2004) Towards a theory of context spaces. In: Proceedings of the 2nd IEEE Annual Conference on Pervasive Computing and Communications Workshops. IEEE Computer Society, pp 38–42
33.
Zurück zum Zitat Scholz M, Klinlenberg R (2005) An ensemble classifier for drifting concepts. In: Proceedings of the 2nd International Workshop on Knowledge Discovery in Data Streams, Porto, Portugal, pp 53–64 Scholz M, Klinlenberg R (2005) An ensemble classifier for drifting concepts. In: Proceedings of the 2nd International Workshop on Knowledge Discovery in Data Streams, Porto, Portugal, pp 53–64
34.
Zurück zum Zitat Street WN, Kim Y (2001) A streaming ensemble algorithm (SEA) for large-scale classification. In: Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, San Francisco, California, United States, pp 377–382 Street WN, Kim Y (2001) A streaming ensemble algorithm (SEA) for large-scale classification. In: Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, San Francisco, California, United States, pp 377–382
35.
Zurück zum Zitat Wang H, Fan W, Yu PS, Han J (2003) Mining concept-drifting data streams using ensemble classifiers. In: Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, Washington, DC, United States, pp 531–540 Wang H, Fan W, Yu PS, Han J (2003) Mining concept-drifting data streams using ensemble classifiers. In: Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, Washington, DC, United States, pp 531–540
36.
Zurück zum Zitat Widmer G, Kubat M (1996) Learning in the presence of concept drift and hidden contexts. Mach Learn 23(1):69–101 Widmer G, Kubat M (1996) Learning in the presence of concept drift and hidden contexts. Mach Learn 23(1):69–101
37.
Zurück zum Zitat Woolam C, Masud MM, Khan L (2009) Lacking labels in the stream: classifying evolving stream data with few labels. In: Proceedings of the 18th International Symposium on Foundations of Intelligent Systems. Springer-Verlag, Prague, Czech Republic, pp 552–562 Woolam C, Masud MM, Khan L (2009) Lacking labels in the stream: classifying evolving stream data with few labels. In: Proceedings of the 18th International Symposium on Foundations of Intelligent Systems. Springer-Verlag, Prague, Czech Republic, pp 552–562
38.
Zurück zum Zitat Wu X, Li P, Hu X (2012) Learning from concept drifting data streams with unlabeled data. Neurocomputing. Elsevier, Amsterdam Wu X, Li P, Hu X (2012) Learning from concept drifting data streams with unlabeled data. Neurocomputing. Elsevier, Amsterdam
39.
Zurück zum Zitat Zhou ZH (2011) When semi-supervised learning meets ensemble learning. Front Electr Electron Eng China 6(1):6–16CrossRef Zhou ZH (2011) When semi-supervised learning meets ensemble learning. Front Electr Electron Eng China 6(1):6–16CrossRef
40.
Zurück zum Zitat Zhu X, Goldberg AB (2009) Introduction to semi-supervised learning. Synth Lect Artif Intell Mach learn 3(1):1–130CrossRef Zhu X, Goldberg AB (2009) Introduction to semi-supervised learning. Synth Lect Artif Intell Mach learn 3(1):1–130CrossRef
41.
Zurück zum Zitat Zliobaite I (2009) Learning under concept drift: an overview. Vilnius University, Technical Report Zliobaite I (2009) Learning under concept drift: an overview. Vilnius University, Technical Report
Metadaten
Titel
An ensemble of cluster-based classifiers for semi-supervised classification of non-stationary data streams
verfasst von
Mohammad Javad Hosseini
Ameneh Gholipour
Hamid Beigy
Publikationsdatum
01.03.2016
Verlag
Springer London
Erschienen in
Knowledge and Information Systems / Ausgabe 3/2016
Print ISSN: 0219-1377
Elektronische ISSN: 0219-3116
DOI
https://doi.org/10.1007/s10115-015-0837-4

Weitere Artikel der Ausgabe 3/2016

Knowledge and Information Systems 3/2016 Zur Ausgabe

Premium Partner