Published in: International Journal of Machine Learning and Cybernetics 6/2015

01 December 2015 | Original Article

An online ensembles approach for handling concept drift in data streams: diversified online ensembles detection

Authors: Parneeta Sidhu, M. P. S. Bhatia

Abstract

Data streams are continuous sequences of data instances that arrive at very high speed and whose underlying concept distribution varies over time. We present a novel online ensemble approach, Diversified online ensembles detection (DOED), for handling these drifting concepts in data streams. Our approach maintains two ensembles of weighted experts, one with low diversity and one with high diversity, which are updated according to their accuracy in classifying new data instances. It detects drifts by comparing two accuracies: the accuracy of an ensemble on recent examples and its accuracy since the beginning of learning. The final prediction for an instance is the class predicted by the ensemble that classifies the recent examples more accurately. When an ensemble detects a drift, it is reinitialized while maintaining its diversity level. Experimental evaluation on various artificial and real-world datasets shows that DOED provides very high accuracy in classifying new data instances, irrespective of dataset size, type of drift, or presence of noise. We compare DOED with other learners in terms of new performance metrics such as the kappa statistic, model cost, evaluation time, and memory requirements. Our approach proved highly resource-efficient, achieving very high accuracy even in a resource-constrained environment.
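
The detection and selection rules described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the comparison test, the window length, or any threshold, so the class `AccuracyTracker`, the function `choose_prediction`, the `window` size, and the `margin` value are all hypothetical choices made only for illustration.

```python
from collections import deque


class AccuracyTracker:
    """Hypothetical helper: accuracy since the start and over a recent window."""

    def __init__(self, window=50):
        # window length is an assumption; the abstract does not specify one
        self.recent = deque(maxlen=window)  # 1 = correct prediction, 0 = wrong
        self.total = 0
        self.correct = 0

    def update(self, was_correct):
        hit = 1 if was_correct else 0
        self.recent.append(hit)
        self.total += 1
        self.correct += hit

    def overall_accuracy(self):
        return self.correct / self.total if self.total else 0.0

    def recent_accuracy(self):
        return sum(self.recent) / len(self.recent) if self.recent else 0.0

    def drift_suspected(self, margin=0.2):
        # Assumed rule: once the window is full, flag a drift when recent
        # accuracy falls more than `margin` below the accuracy since the start.
        window_full = len(self.recent) == self.recent.maxlen
        return window_full and (
            self.overall_accuracy() - self.recent_accuracy() > margin
        )

    def reset(self):
        # Called when the owning ensemble is reinitialized after a drift.
        self.recent.clear()
        self.total = 0
        self.correct = 0


def choose_prediction(pred_low, tracker_low, pred_high, tracker_high):
    """Return the prediction of the ensemble that is more accurate on recent examples."""
    if tracker_high.recent_accuracy() > tracker_low.recent_accuracy():
        return pred_high
    return pred_low
```

In use, each of the two ensembles would own one tracker, call `update` after the true label of an instance arrives, and be reinitialized (together with `reset` on its tracker) when `drift_suspected` returns true. How diversity is preserved across reinitialization is outside the scope of this sketch.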

Metadata
Title
An online ensembles approach for handling concept drift in data streams: diversified online ensembles detection
Authors
Parneeta Sidhu
M. P. S. Bhatia
Publication date
01 December 2015
Publisher
Springer Berlin Heidelberg
Published in
International Journal of Machine Learning and Cybernetics / Issue 6/2015
Print ISSN: 1868-8071
Electronic ISSN: 1868-808X
DOI
https://doi.org/10.1007/s13042-015-0366-1
