Ensembles of Restricted Hoeffding Trees

Published: 01 February 2012

Abstract

The success of simple methods for classification shows that it is often not necessary to model complex attribute interactions to obtain good classification accuracy on practical problems. In this article, we propose to exploit this phenomenon in the data stream context by building an ensemble of Hoeffding trees that are each limited to a small subset of attributes. In this way, each tree is restricted to model interactions between attributes in its corresponding subset. Because it is not known a priori which attribute subsets are relevant for prediction, we build exhaustive ensembles that consider all possible attribute subsets of a given size. As the resulting Hoeffding trees are not all equally important, we weigh them in a suitable manner to obtain accurate classifications. This is done by combining the log-odds of their probability estimates using sigmoid perceptrons, with one perceptron per class. We propose a mechanism for setting the perceptrons’ learning rate using the ADWIN change detection method for data streams, and also use ADWIN to reset ensemble members (i.e., Hoeffding trees) when they no longer perform well. Our experiments show that the resulting ensemble classifier outperforms bagging for data streams in terms of accuracy when both are used in conjunction with adaptive naive Bayes Hoeffding trees, at the expense of runtime and memory consumption. We also show that our stacking method can improve the performance of a bagged ensemble.
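To make the combination scheme concrete, below is a minimal Python sketch of the idea described above: one restricted tree per attribute subset of size k, with per-class sigmoid perceptrons that weight the trees’ log-odds. This is not the authors’ MOA implementation; the class SubsetEnsemble, the base_factory/learn_one/predict_proba interface, and the fixed learning rate are illustrative assumptions (the paper adapts the rate with ADWIN, uses ADWIN to reset poorly performing trees, and uses adaptive naive Bayes Hoeffding trees as base learners).

    # Minimal sketch of the combining scheme described in the abstract -- not
    # the authors' MOA implementation.  Assumptions: each base learner exposes
    # learn_one() and predict_proba() over its attribute subset; the perceptron
    # learning rate is a fixed constant here, whereas the paper adapts it (and
    # resets trees) with ADWIN.
    from itertools import combinations
    import math

    class SubsetEnsemble:
        def __init__(self, n_attributes, k, n_classes, base_factory, lr=0.01):
            # One restricted tree per attribute subset of size k (exhaustive).
            self.subsets = list(combinations(range(n_attributes), k))
            self.trees = [base_factory() for _ in self.subsets]
            self.n_classes = n_classes
            self.lr = lr
            # One sigmoid perceptron (weight vector + bias) per class.
            self.w = [[0.0] * len(self.trees) for _ in range(n_classes)]
            self.b = [0.0] * n_classes

        @staticmethod
        def _sigmoid(z):
            z = max(-35.0, min(35.0, z))  # clamp to avoid math.exp overflow
            return 1.0 / (1.0 + math.exp(-z))

        def _log_odds(self, x):
            # feats[t][c]: log-odds of tree t's probability estimate for class c.
            eps = 1e-9
            feats = []
            for subset, tree in zip(self.subsets, self.trees):
                p = tree.predict_proba([x[i] for i in subset])
                feats.append([math.log((pc + eps) / (1.0 - pc + eps)) for pc in p])
            return feats

        def predict_scores(self, x):
            # Per-class sigmoid outputs combining the trees' log-odds.
            feats = self._log_odds(x)
            return [self._sigmoid(self.b[c] +
                                  sum(self.w[c][t] * feats[t][c]
                                      for t in range(len(self.trees))))
                    for c in range(self.n_classes)]

        def learn_one(self, x, y):
            # Stacking: perceptrons are trained on the trees' current predictions.
            feats = self._log_odds(x)
            for c in range(self.n_classes):
                out = self._sigmoid(self.b[c] +
                                    sum(self.w[c][t] * feats[t][c]
                                        for t in range(len(self.trees))))
                err = (1.0 if y == c else 0.0) - out
                for t in range(len(self.trees)):
                    self.w[c][t] += self.lr * err * feats[t][c]
                self.b[c] += self.lr * err
            # Then update each restricted tree on its own attribute subset.
            for subset, tree in zip(self.subsets, self.trees):
                tree.learn_one([x[i] for i in subset], y)

Note that the ensemble size grows as C(n, k) for n attributes, which is why, as the abstract notes, the approach trades runtime and memory consumption for accuracy.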


Index Terms

  1. Ensembles of Restricted Hoeffding Trees

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in

    Full Access

    • Published in

      ACM Transactions on Intelligent Systems and Technology, Volume 3, Issue 2
      February 2012
      455 pages
      ISSN: 2157-6904
      EISSN: 2157-6912
      DOI: 10.1145/2089094

      Copyright © 2012 ACM


      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 1 February 2012
      • Accepted: 1 August 2011
      • Revised: 1 June 2011
      • Received: 1 March 2011
      Published in TIST Volume 3, Issue 2
