ABSTRACT
This paper presents a new ensemble method for learning from non-stationary data streams. In such settings, massive amounts of data are generated continuously at high speed, and the target function can change over time. The proposed method, named Fast Adaptive Stacking of Ensembles (FASE), uses a meta-classifier to combine the predictions of the base classifiers in the ensemble. To deal with concept-drifting data, FASE maintains a set of adaptive learners. The new algorithm processes the input data with constant time and space complexity. It takes only two parameters: the confidence level for the change detection mechanism and the number of base classifiers. These characteristics make FASE well suited to learning from non-stationary data streams. We empirically compare the new algorithm with several state-of-the-art ensemble methods for learning from non-stationary data streams, using a Naïve Bayes classifier and a Perceptron to evaluate the algorithms on real-world datasets. The experimental results show that FASE achieves higher predictive accuracy in the investigated tasks while keeping its computational cost bounded.
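The general idea of stacking adaptive learners on a stream can be illustrated with a minimal Python sketch. It is not the FASE algorithm itself: the class names (Perceptron, ErrorRateDetector, StackedEnsemble), the random weight initialisation used for diversity, and the simple error-rate threshold rule standing in for the change detection mechanism are all assumptions made for illustration. The threshold z plays the role of the confidence level only loosely.

```python
import math
import random


class Perceptron:
    """Online perceptron for binary labels in {0, 1}."""

    def __init__(self, n_features, lr=0.1, rng=None):
        # Last weight is the bias; random init (if rng is given) adds diversity.
        self.w = [rng.uniform(-1, 1) if rng else 0.0 for _ in range(n_features + 1)]
        self.lr = lr

    def predict(self, x):
        s = self.w[-1] + sum(wi * xi for wi, xi in zip(self.w, x))
        return 1 if s > 0 else 0

    def learn(self, x, y):
        err = y - self.predict(x)
        if err:
            for i, xi in enumerate(x):
                self.w[i] += self.lr * err * xi
            self.w[-1] += self.lr * err


class ErrorRateDetector:
    """Hypothetical stand-in detector: flags a change when the running
    error rate rises well above the lowest level seen so far."""

    def __init__(self, z=3.0, min_samples=30):
        self.z, self.min_samples = z, min_samples
        self.reset()

    def reset(self):
        self.n, self.errors = 0, 0
        self.best, self.best_p, self.best_s = float("inf"), 0.0, 0.0

    def update(self, mistake):
        self.n += 1
        self.errors += int(mistake)
        if self.n < self.min_samples:
            return False
        p = self.errors / self.n
        s = math.sqrt(p * (1 - p) / self.n)
        if p + s < self.best:
            self.best, self.best_p, self.best_s = p + s, p, s
        return p + s > self.best_p + self.z * self.best_s


class StackedEnsemble:
    """Meta-perceptron over the predictions of adaptive base perceptrons.
    Work per example is O(n_base * n_features), independent of stream length."""

    def __init__(self, n_features, n_base=3, seed=0):
        self.n_features = n_features
        self.rng = random.Random(seed)
        self.base = [Perceptron(n_features, rng=self.rng) for _ in range(n_base)]
        self.detectors = [ErrorRateDetector() for _ in range(n_base)]
        self.meta = Perceptron(n_base)
        self.resets = 0

    def _base_predictions(self, x):
        return [b.predict(x) for b in self.base]

    def predict(self, x):
        # Base votes are mapped to -1/+1 before being fed to the meta-classifier.
        return self.meta.predict([2 * p - 1 for p in self._base_predictions(x)])

    def learn(self, x, y):
        preds = self._base_predictions(x)
        self.meta.learn([2 * p - 1 for p in preds], y)
        for i, detector in enumerate(self.detectors):
            if detector.update(preds[i] != y):
                # Change flagged: replace this base learner with a fresh one.
                self.base[i] = Perceptron(self.n_features, rng=self.rng)
                detector.reset()
                self.resets += 1
            self.base[i].learn(x, y)


def drifting_stream(n, seed=1):
    """Synthetic stream whose labelling rule is inverted halfway through."""
    rng = random.Random(seed)
    for t in range(n):
        x = [rng.random(), rng.random()]
        y = int(x[0] > x[1])
        yield x, (1 - y if t >= n // 2 else y)


def prequential_accuracy(model, stream):
    """Test-then-train evaluation: predict each example before learning from it."""
    correct = total = 0
    for x, y in stream:
        correct += int(model.predict(x) == y)
        model.learn(x, y)
        total += 1
    return correct / total
```

Calling prequential_accuracy(StackedEnsemble(2), drifting_stream(2000)) runs the sketch on a synthetic stream with one abrupt change. The synthetic stream is only a placeholder and says nothing about the real-world results reported in the paper.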