Skip to main content
Top
Published in: Data Mining and Knowledge Discovery 2/2024

17-10-2023

Mondrian forest for data stream classification under memory constraints

Authors: Martin Khannouz, Tristan Glatard

Published in: Data Mining and Knowledge Discovery | Issue 2/2024

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Supervised learning algorithms generally assume the availability of enough memory to store data models during the training and test phases. However, this assumption is unrealistic when data comes in the form of infinite data streams, or when learning algorithms are deployed on devices with reduced amounts of memory. In this paper, we adapt the online Mondrian forest classification algorithm to work with memory constraints on data streams. In particular, we design five out-of-memory strategies to update Mondrian trees with new data points when the memory limit is reached. Moreover, we design node trimming mechanisms to make Mondrian trees more robust to concept drifts under memory constraints. We evaluate our algorithms on a variety of real and simulated datasets, and we conclude with recommendations on their use in different situations: the Extend Node strategy appears as the best out-of-memory strategy in all configurations, whereas different node trimming mechanisms should be adopted depending on whether a concept drift is expected. All our methods are implemented in the OrpailleCC open-source library and are ready to be used on embedded systems and connected objects.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Footnotes
1
available here.
 
2
available here.
 
Literature
go back to reference Akbar D, Omid S, Tristan G, Emad Shihab (2019) A quantitative comparison of overlapping and non-overlapping sliding windows for human activity recognition using inertial sensors. Sensors 19(22):5026CrossRef Akbar D, Omid S, Tristan G, Emad Shihab (2019) A quantitative comparison of overlapping and non-overlapping sliding windows for human activity recognition using inertial sensors. Sensors 19(22):5026CrossRef
go back to reference Albert B, Ricard G (2009) Adaptive learning from evolving data streams. Advances in Intelligent Data Analysis VIII. Springer, Berlin Heidelberg, pp 249–260 Albert B, Ricard G (2009) Adaptive learning from evolving data streams. Advances in Intelligent Data Analysis VIII. Springer, Berlin Heidelberg, pp 249–260
go back to reference Albert B, Ricard G (2009) Adaptive learning from evolving data streams. Advances in intelligent data analysis VIII. Springer, Berlin Heidelberg, pp 249–260 Albert B, Ricard G (2009) Adaptive learning from evolving data streams. Advances in intelligent data analysis VIII. Springer, Berlin Heidelberg, pp 249–260
go back to reference Bifet A, Holmes G, Kirkby R, Pfahringer B (2010) MOA: massive online analysis. J Mach Learn Res 11:1601–1604 Bifet A, Holmes G, Kirkby R, Pfahringer B (2010) MOA: massive online analysis. J Mach Learn Res 11:1601–1604
go back to reference Bifet Albert, Gavaldà Ricard (apr 2007) Learning from Time-Changing Data with Adaptive Windowing. In: Proceedings of the 2007 SIAM international conference on data mining. society for industrial and applied mathematics Bifet Albert, Gavaldà Ricard (apr 2007) Learning from Time-Changing Data with Adaptive Windowing. In: Proceedings of the 2007 SIAM international conference on data mining. society for industrial and applied mathematics
go back to reference Bifet A, Holmes G, Pfahringer B, Kirkby R, Gavaldà R (2009) New ensemble methods for evolving data streams. In: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD ’09. ACM Press Bifet A, Holmes G, Pfahringer B, Kirkby R, Gavaldà R (2009) New ensemble methods for evolving data streams. In: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD ’09. ACM Press
go back to reference Bifet A, Zhang J, Fan W, He C, Zhang J, Qian J, Holmes G, Pfahringer B (2017) Extremely Fast Decision Tree Mining for Evolving Data Streams. In: Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining - KDD ’17, pp 1733–1742, Bifet A, Zhang J, Fan W, He C, Zhang J, Qian J, Holmes G, Pfahringer B (2017) Extremely Fast Decision Tree Mining for Evolving Data Streams. In: Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining - KDD ’17, pp 1733–1742,
go back to reference Cano A, Krawczyk B (2022) ROSE: Robust online self-adjusting ensemble for continual learning on imbalanced drifting data streams. Mach Learn 111(7):2561–2599MathSciNetCrossRef Cano A, Krawczyk B (2022) ROSE: Robust online self-adjusting ensemble for continual learning on imbalanced drifting data streams. Mach Learn 111(7):2561–2599MathSciNetCrossRef
go back to reference Dan MT, Scott S, Andrew G, and Ilya K (2014) . Using a Wearable Sensor to Find, Recognize, and Count Repetitive Exercises, RecoFit Dan MT, Scott S, Andrew G, and Ilya K (2014) . Using a Wearable Sensor to Find, Recognize, and Count Repetitive Exercises, RecoFit
go back to reference Dutta L, Bharali S (2021) Tinyml meets iot: a comprehensive survey. Internet of Things 16:100461CrossRef Dutta L, Bharali S (2021) Tinyml meets iot: a comprehensive survey. Internet of Things 16:100461CrossRef
go back to reference Elbasi S, Büyükçakı, Bonab H, Can F (2021) On-the-fly ensemble pruning in evolving data streams Elbasi S, Büyükçakı, Bonab H, Can F (2021) On-the-fly ensemble pruning in evolving data streams
go back to reference Gama J, Sebastião R, Rodrigues P P (2009) Issues in Evaluation of Stream Learning Algorithms. In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining, pp 329-338 Gama J, Sebastião R, Rodrigues P P (2009) Issues in Evaluation of Stream Learning Algorithms. In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining, pp 329-338
go back to reference Gupta C, Suggala A S, Goyal A, Simhadri H V, Paranjape B, Kumar A, Goyal S, Udupa R, Varma M, Jain P (06–11 Aug 2017) ProtoNN: Compressed and accurate kNN for resource-scarce devices. In Doina Precup and Yee Whye Teh, editors, In: Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pp 1331–1340. PMLR Gupta C, Suggala A S, Goyal A, Simhadri H V, Paranjape B, Kumar A, Goyal S, Udupa R, Varma M, Jain P (06–11 Aug 2017) ProtoNN: Compressed and accurate kNN for resource-scarce devices. In Doina Precup and Yee Whye Teh, editors, In: Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pp 1331–1340. PMLR
go back to reference HelloAlone. C++ implementation of the mondrian forest, (2018) HelloAlone. C++ implementation of the mondrian forest, (2018)
go back to reference Honnikoll N, Baidari I (2021) Mean error rate weighted online boosting method. The Comput J 66(1):1–15CrossRef Honnikoll N, Baidari I (2021) Mean error rate weighted online boosting method. The Comput J 66(1):1–15CrossRef
go back to reference Khannouz M, Li B, Glatard T (2019) OrpailleCC: a library for data stream analysis on embedded systems. The J Open Source Softw 4:1485ADSCrossRef Khannouz M, Li B, Glatard T (2019) OrpailleCC: a library for data stream analysis on embedded systems. The J Open Source Softw 4:1485ADSCrossRef
go back to reference Kumar A, Goyal S, Varma M (2017) Resource-efficient machine learning in 2 kb ram for the internet of things. In: Proceedings of the 34th international conference on machine learning - volume 70, ICML’17, pp 1935-1944. JMLR.org Kumar A, Goyal S, Varma M (2017) Resource-efficient machine learning in 2 kb ram for the internet of things. In: Proceedings of the 34th international conference on machine learning - volume 70, ICML’17, pp 1935-1944. JMLR.org
go back to reference Lakshminarayanan B (2014) Python implementation of the mondrian forest Lakshminarayanan B (2014) Python implementation of the mondrian forest
go back to reference Lakshminarayanan B, Roy DM, Teh Y W (2014) Mondrian Forests: efficient online random forests. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 27, vol 4, pp 3140–3148. Curran Associates, Inc Lakshminarayanan B, Roy DM, Teh Y W (2014) Mondrian Forests: efficient online random forests. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 27, vol 4, pp 3140–3148. Curran Associates, Inc
go back to reference Martin K, Tristan Glatard (2020) A benchmark of data stream classification for human activity recognition on connected objects. Sensors (Basel, Switzerland) 20(22):6486CrossRef Martin K, Tristan Glatard (2020) A benchmark of data stream classification for human activity recognition on connected objects. Sensors (Basel, Switzerland) 20(22):6486CrossRef
go back to reference Montiel Jacob, Bifet Albert, Losing Viktor, Read Jesse, Abdessalem Talel (dec 2018) Learning fast and slow: a unified batch/stream Framework. In: 2018 IEEE international conference on big data (Big Data). IEEE Montiel Jacob, Bifet Albert, Losing Viktor, Read Jesse, Abdessalem Talel (dec 2018) Learning fast and slow: a unified batch/stream Framework. In: 2018 IEEE international conference on big data (Big Data). IEEE
go back to reference Morris D, Saponas TS, Guillory A, Kelner I (2014) RecoFit: Using a Wearable Sensor to Find, Recognize, and Count Repetitive Exercises. In: Proceedings of the SIGCHI conference on human factors in computing systems, CHI ’14, pp 3225-3234, New York, NY, USA. Association for Computing Machinery Morris D, Saponas TS, Guillory A, Kelner I (2014) RecoFit: Using a Wearable Sensor to Find, Recognize, and Count Repetitive Exercises. In: Proceedings of the SIGCHI conference on human factors in computing systems, CHI ’14, pp 3225-3234, New York, NY, USA. Association for Computing Machinery
go back to reference Murshed M G Sarwar, Murphy C, Hou D, Khan N, Ananthanarayanan G, Hussain F (2021) Machine learning at the network edge: A survey. ACM Comput. Surv., 54(8) Murshed M G Sarwar, Murphy C, Hou D, Khan N, Ananthanarayanan G, Hussain F (2021) Machine learning at the network edge: A survey. ACM Comput. Surv., 54(8)
go back to reference Oresti B, Juan-Manuel G, Miguel D, Hector P, Ignacio R (2014) Window size impact in human activity recognition. Sensors 14(4):6474–6499CrossRef Oresti B, Juan-Manuel G, Miguel D, Hector P, Ignacio R (2014) Window size impact in human activity recognition. Sensors 14(4):6474–6499CrossRef
go back to reference Ray PP (2021) A review on tinyml: State-of-the-art and prospects. J King Saud Univ - Comput and Inf Sci 34(4):1595–1623 Ray PP (2021) A review on tinyml: State-of-the-art and prospects. J King Saud Univ - Comput and Inf Sci 34(4):1595–1623
go back to reference Reiss Attila, Stricker Didier (2012) Introducing a New Benchmarked Dataset for Activity Monitoring. In 2012 16th international symposium on wearable computers, pp 108–109 Reiss Attila, Stricker Didier (2012) Introducing a New Benchmarked Dataset for Activity Monitoring. In 2012 16th international symposium on wearable computers, pp 108–109
go back to reference Sugawara Yu, Oyama Satoshi, Kurihara Masahito (2021) Adaptive rotation forests: Decision tree ensembles for sequential learning. In: 2021 IEEE international conference on systems, Man, and Cybernetics (SMC), pp 613–618 Sugawara Yu, Oyama Satoshi, Kurihara Masahito (2021) Adaptive rotation forests: Decision tree ensembles for sequential learning. In: 2021 IEEE international conference on systems, Man, and Cybernetics (SMC), pp 613–618
go back to reference Ustad A, Logacjov A, Trollebø SØ, Thingstad P, Vereijken B, Bach K, Maroni NS (2023) Validation of an activity type recognition model classifying daily physical behavior in older adults: the HAR70+ model. Sensors 23(5):2368ADSCrossRefPubMedPubMedCentral Ustad A, Logacjov A, Trollebø SØ, Thingstad P, Vereijken B, Bach K, Maroni NS (2023) Validation of an activity type recognition model classifying daily physical behavior in older adults: the HAR70+ model. Sensors 23(5):2368ADSCrossRefPubMedPubMedCentral
Metadata
Title
Mondrian forest for data stream classification under memory constraints
Authors
Martin Khannouz
Tristan Glatard
Publication date
17-10-2023
Publisher
Springer US
Published in
Data Mining and Knowledge Discovery / Issue 2/2024
Print ISSN: 1384-5810
Electronic ISSN: 1573-756X
DOI
https://doi.org/10.1007/s10618-023-00970-4

Other articles of this Issue 2/2024

Data Mining and Knowledge Discovery 2/2024 Go to the issue

Premium Partner