Skip to main content
Top
Published in: International Journal of Data Science and Analytics 1/2022

14-07-2021 | Regular Paper

Probabilistic exact adaptive random forest for recurrent concepts in data streams

Authors: Ocean Wu, Yun Sing Koh, Gillian Dobbie, Thomas Lacombe

Published in: International Journal of Data Science and Analytics | Issue 1/2022

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

In order to adapt random forests to the dynamic nature of data streams, the state-of-the-art technique discards trained trees and grows new trees when concept drifts are detected. This is particularly wasteful when recurrent patterns exist. In this work, we introduce a novel framework called PEARL, which uses both an exact technique and a probabilistic graphical model with Lossy Counting, to replace drifted trees with relevant trees built in the past. The exact technique utilizes pattern matching to find the set of drifted trees that co-occurred in predictions in the past. Meanwhile, a probabilistic graphical model is being built to capture the tree replacements among recurrent concept drifts. Once the graphical model becomes stable, it replaces the exact technique and finds relevant trees in a probabilistic fashion. Further, Lossy Counting is applied to the graphical model which brings an added theoretical guarantee for both error rate and space complexity. We empirically show our technique outperforms baselines in terms of accuracy and kappa on both synthetic and real-world datasets.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literature
1.
go back to reference Ahmadi, Z., Kramer, S.: Modeling recurring concepts in data streams: a graph-based framework. Knowl. Inf. Syst. 55(1), 15–44 (2018)CrossRef Ahmadi, Z., Kramer, S.: Modeling recurring concepts in data streams: a graph-based framework. Knowl. Inf. Syst. 55(1), 15–44 (2018)CrossRef
2.
go back to reference Anderson, R., Koh, Y.S., Dobbie, G., Bifet, A.: Recurring concept meta-learning for evolving data streams. Expert Syst. Appl. 138, 112832 (2019)CrossRef Anderson, R., Koh, Y.S., Dobbie, G., Bifet, A.: Recurring concept meta-learning for evolving data streams. Expert Syst. Appl. 138, 112832 (2019)CrossRef
3.
go back to reference Ángel, A.M., Bartolo, G.J., Ernestina, M.: Predicting recurring concepts on data-streams by means of a meta-model and a fuzzy similarity function. Expert Syst. Appl. 46, 87–105 (2016)CrossRef Ángel, A.M., Bartolo, G.J., Ernestina, M.: Predicting recurring concepts on data-streams by means of a meta-model and a fuzzy similarity function. Expert Syst. Appl. 46, 87–105 (2016)CrossRef
4.
go back to reference Bifet, A., Holmes, G., Kirkby, R., Pfahringer, B.: MOA: massive online analysis. J. Mach. Learn. Res. 11, 1601–1604 (2010) Bifet, A., Holmes, G., Kirkby, R., Pfahringer, B.: MOA: massive online analysis. J. Mach. Learn. Res. 11, 1601–1604 (2010)
5.
go back to reference Bifet, A., Holmes, G., Pfahringer, B.: Leveraging bagging for evolving data streams. In: Balcázar, J.L., Bonchi, F., Gionis, A., Sebag, M. (eds.) Machine Learning and Knowledge Discovery in Databases, pp. 135–150. Springer, Berlin Heidelberg (2010)CrossRef Bifet, A., Holmes, G., Pfahringer, B.: Leveraging bagging for evolving data streams. In: Balcázar, J.L., Bonchi, F., Gionis, A., Sebag, M. (eds.) Machine Learning and Knowledge Discovery in Databases, pp. 135–150. Springer, Berlin Heidelberg (2010)CrossRef
6.
go back to reference Bifet, A., Holmes, G., Pfahringer, B., Gavaldà, R.: Improving adaptive bagging methods for evolving data streams. In: Zhou, Z.H., Washio, T. (eds.) Advances in Machine Learning, pp. 23–37. Springer, Berlin Heidelberg (2009)CrossRef Bifet, A., Holmes, G., Pfahringer, B., Gavaldà, R.: Improving adaptive bagging methods for evolving data streams. In: Zhou, Z.H., Washio, T. (eds.) Advances in Machine Learning, pp. 23–37. Springer, Berlin Heidelberg (2009)CrossRef
9.
10.
go back to reference Chen, K., Koh, Y.S., Riddle, P.: Proactive drift detection: Predicting concept drifts in data streams using probabilistic networks. In: IJCNN, pp. 780–787. IEEE (2016) Chen, K., Koh, Y.S., Riddle, P.: Proactive drift detection: Predicting concept drifts in data streams using probabilistic networks. In: IJCNN, pp. 780–787. IEEE (2016)
11.
go back to reference Chiu, C.W., Minku, L.L.: Diversity-based pool of models for dealing with recurring concepts. In: 2018 IJCNN, pp. 1–8. IEEE (2018) Chiu, C.W., Minku, L.L.: Diversity-based pool of models for dealing with recurring concepts. In: 2018 IJCNN, pp. 1–8. IEEE (2018)
12.
go back to reference Gama, J., Kosina, P.: Recurrent concepts in data streams classification. Knowl. Inf. Syst. 40(3), 489–507 (2014)CrossRef Gama, J., Kosina, P.: Recurrent concepts in data streams classification. Knowl. Inf. Syst. 40(3), 489–507 (2014)CrossRef
14.
go back to reference Ghomeshi, H., Gaber, M.M., Kovalchuk, Y.: Eacd: evolutionary adaptation to concept drifts in data streams. Data Min. Knowl. Disc. 33, 663–694 (2019)CrossRef Ghomeshi, H., Gaber, M.M., Kovalchuk, Y.: Eacd: evolutionary adaptation to concept drifts in data streams. Data Min. Knowl. Disc. 33, 663–694 (2019)CrossRef
15.
go back to reference Gomes, H.M., Bifet, A., Read, J., Barddal, J.P., Enembreck, F., Pfharinger, B., Holmes, G., Abdessalem, T.: Adaptive random forests for evolving data stream classification. Mach. Learn. 106(9–10), 1469–1495 (2017)MathSciNetCrossRef Gomes, H.M., Bifet, A., Read, J., Barddal, J.P., Enembreck, F., Pfharinger, B., Holmes, G., Abdessalem, T.: Adaptive random forests for evolving data stream classification. Mach. Learn. 106(9–10), 1469–1495 (2017)MathSciNetCrossRef
16.
go back to reference Gonçalves Jr., P.M., Barros, R.S.M.D.: RCD: A recurring concept drift framework. Pattern Recogn. Lett. 34(9), 1018–1025 (2013)CrossRef Gonçalves Jr., P.M., Barros, R.S.M.D.: RCD: A recurring concept drift framework. Pattern Recogn. Lett. 34(9), 1018–1025 (2013)CrossRef
17.
go back to reference Goyal, A., Daumé, H.: Lossy conservative update (lcu) sketch: Succinct approximate count storage. In: 25th AAAI (2011) Goyal, A., Daumé, H.: Lossy conservative update (lcu) sketch: Succinct approximate count storage. In: 25th AAAI (2011)
18.
go back to reference Hu, J., Chen, J., Qin, X.: Algorithm of recurring concept drift base on main feature extraction. In: Proceedings of the 2019 5th International Conference on Computing and Artificial Intelligence, pp. 59–65 (2019) Hu, J., Chen, J., Qin, X.: Algorithm of recurring concept drift base on main feature extraction. In: Proceedings of the 2019 5th International Conference on Computing and Artificial Intelligence, pp. 59–65 (2019)
19.
go back to reference Koh, Y.S., Huang, D.T.J., Pearce, C., Dobbie, G.: Volatility drift prediction for transactional data streams. In: 2018 IEEE ICDM, pp. 1091–1096. IEEE (2018) Koh, Y.S., Huang, D.T.J., Pearce, C., Dobbie, G.: Volatility drift prediction for transactional data streams. In: 2018 IEEE ICDM, pp. 1091–1096. IEEE (2018)
20.
go back to reference Kolter, J.Z., Maloof, M.A.: Dynamic weighted majority: a new ensemble method for tracking concept drift. In: Third IEEE International Conference on Data Mining, pp. 123–130 (2003) Kolter, J.Z., Maloof, M.A.: Dynamic weighted majority: a new ensemble method for tracking concept drift. In: Third IEEE International Conference on Data Mining, pp. 123–130 (2003)
21.
go back to reference Krawczyk, B., Minku, L.L., Gama, J., Stefanowski, J., Woźniak, M.: Ensemble learning for data stream analysis: A survey. Information Fusion 37, 132–156 (2017)CrossRef Krawczyk, B., Minku, L.L., Gama, J., Stefanowski, J., Woźniak, M.: Ensemble learning for data stream analysis: A survey. Information Fusion 37, 132–156 (2017)CrossRef
22.
go back to reference Krawczyk, B., Woźniak, M.: One-class classifiers with incremental learning and forgetting for data streams with concept drift. Soft. Comput. 19, 3387–3400 (2015)CrossRef Krawczyk, B., Woźniak, M.: One-class classifiers with incremental learning and forgetting for data streams with concept drift. Soft. Comput. 19, 3387–3400 (2015)CrossRef
24.
go back to reference Manku, G.S., Motwani, R.: Approximate frequency counts over data streams. In: Proceedings of the 28th VLDB, VLDB ’02, pp. 346–357. VLDB Endowment (2002) Manku, G.S., Motwani, R.: Approximate frequency counts over data streams. In: Proceedings of the 28th VLDB, VLDB ’02, pp. 346–357. VLDB Endowment (2002)
25.
go back to reference Masud, M.M., Al-Khateeb, T.M., Khan, L., Aggarwal, C., Gao, J., Han, J., Thuraisingham, B.: Detecting recurring and novel classes in concept-drifting data streams. In: 2011 IEEE 11th ICDM, pp. 1176–1181. IEEE (2011) Masud, M.M., Al-Khateeb, T.M., Khan, L., Aggarwal, C., Gao, J., Han, J., Thuraisingham, B.: Detecting recurring and novel classes in concept-drifting data streams. In: 2011 IEEE 11th ICDM, pp. 1176–1181. IEEE (2011)
26.
go back to reference Moreira dos Reis, D., Maletzke, A., Silva, D.F., Batista, G.E.: Classifying and counting with recurrent contexts. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1983–1992 (2018) Moreira dos Reis, D., Maletzke, A., Silva, D.F., Batista, G.E.: Classifying and counting with recurrent contexts. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1983–1992 (2018)
27.
go back to reference Wu, O., Koh, Y.S., Dobbie, G., Lacombe, T.: PEARL: Probabilistic exact adaptive random forest with lossy counting for data streams. In: Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 17–30. Springer (2020) Wu, O., Koh, Y.S., Dobbie, G., Lacombe, T.: PEARL: Probabilistic exact adaptive random forest with lossy counting for data streams. In: Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 17–30. Springer (2020)
28.
go back to reference Yang, Y., Wu, X., Zhu, X.: Mining in anticipation for concept change: Proactive-reactive prediction in data streams. Data Min. Knowl. Disc. 13(3), 261–289 (2006)MathSciNetCrossRef Yang, Y., Wu, X., Zhu, X.: Mining in anticipation for concept change: Proactive-reactive prediction in data streams. Data Min. Knowl. Disc. 13(3), 261–289 (2006)MathSciNetCrossRef
Metadata
Title
Probabilistic exact adaptive random forest for recurrent concepts in data streams
Authors
Ocean Wu
Yun Sing Koh
Gillian Dobbie
Thomas Lacombe
Publication date
14-07-2021
Publisher
Springer International Publishing
Published in
International Journal of Data Science and Analytics / Issue 1/2022
Print ISSN: 2364-415X
Electronic ISSN: 2364-4168
DOI
https://doi.org/10.1007/s41060-021-00273-1

Other articles of this Issue 1/2022

International Journal of Data Science and Analytics 1/2022 Go to the issue

Premium Partner