Abstract
We study resource-limited online learning, motivated by the problem of conditional-branch outcome prediction in computer architecture. In particular, we consider (parallel) time- and space-efficient ensemble learners for online settings, empirically demonstrating benefits similar to those shown previously for offline ensembles. Our learning algorithms are inspired by the previously published “boosting by filtering” framework as well as the offline Arc-x4 boosting-style algorithm. We train ensembles of online decision trees using a novel variant of the ID4 online decision-tree algorithm as the base learner, and show empirical results for both boosting- and bagging-style online ensemble methods. Our results evaluate these methods on both our branch-prediction domain and online variants of three familiar machine-learning benchmarks. Our data justifies three key claims. First, we show empirically that our extensions to ID4 significantly improve performance for single trees and, additionally, are critical to achieving performance gains in tree ensembles. Second, our results indicate significant improvements in predictive accuracy with ensemble size for the boosting-style algorithm. The bagging algorithms we tried performed poorly relative to the boosting-style algorithm, though they still improved upon individual base learners. Third, we show that ensembles of small trees often outperform large single trees with the same total number of nodes (and similarly outperform smaller ensembles of larger trees that use the same total number of nodes). This makes online boosting particularly useful in domains, such as branch prediction, with tight space restrictions (i.e., the available real estate on a microprocessor chip).
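For concreteness, the following Python sketch illustrates the Arc-x4-style weighting the abstract refers to, applied online: each ensemble member sees every example, weighted by 1 + m^4, where m is the number of earlier members that misclassified it. This is an illustration under stated assumptions, not the authors' implementation; the toy WeightedCountLearner is a hypothetical stand-in for the paper's ID4-variant online decision trees.

```python
# Minimal sketch (not the authors' code) of an online Arc-x4-style
# boosting ensemble for binary prediction (labels 0/1).
# WeightedCountLearner is a hypothetical stand-in for the paper's
# ID4-variant online decision trees.

class WeightedCountLearner:
    """Toy weighted online base learner: predicts the weighted
    majority label it has seen so far."""
    def __init__(self):
        self.score = 0.0  # > 0 favors label 1, < 0 favors label 0

    def predict(self, x):
        return 1 if self.score >= 0 else 0

    def learn(self, x, y, weight=1.0):
        self.score += weight if y == 1 else -weight


class OnlineArcX4Ensemble:
    """Each member sees every example, weighted by 1 + m**4, where m
    is the number of earlier members that misclassified it (the
    Arc-x4 weighting of Breiman, 1996b, applied online)."""
    def __init__(self, make_base, n_members):
        self.members = [make_base() for _ in range(n_members)]

    def predict(self, x):
        # Unweighted majority vote over the members.
        votes = sum(1 if m.predict(x) == 1 else -1 for m in self.members)
        return 1 if votes >= 0 else 0

    def update(self, x, y):
        mistakes = 0  # mistakes by members earlier in the sequence
        for member in self.members:
            wrong = member.predict(x) != y  # predict before updating
            member.learn(x, y, weight=1 + mistakes ** 4)
            if wrong:
                mistakes += 1


# Test-then-train evaluation over a (toy) example stream.
ensemble = OnlineArcX4Ensemble(WeightedCountLearner, n_members=8)
stream = [(None, 1), (None, 1), (None, 0), (None, 1)]
correct = 0
for x, y in stream:
    correct += int(ensemble.predict(x) == y)
    ensemble.update(x, y)
print(f"online accuracy: {correct / len(stream):.2f}")
```

Note that all members' predictions on an example can be computed in parallel, and the mistake counts m are simply prefix sums of those predictions, which is what makes this weighting scheme amenable to the parallel-time hardware implementation the abstract emphasizes.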
References
Bauer, E., & Kohavi, R. (1999). An empirical comparison of voting classification algorithms: Bagging, boosting and variants. Machine Learning, 36:1/2, 105-142.
Breiman, L. (1996a). Bagging predictors. Machine Learning, 24:2, 123-140.
Breiman, L. (1996b). Arcing classifiers. Technical Report 460, Department of Statistics, University of California, Berkeley, CA.
Burger, D., & Austin, T. (1997). The SimpleScalar tool set, version 2.0. Technical Report 1342, Department of Computer Science, University of Wisconsin-Madison.
Calder, B., Grunwald, D., Lindsay, D., Jones, M., Martin, J., Mozer, M., & Zorn, B. (1997). Evidence-based static branch prediction using machine learning. ACM Transactions on Programming Languages and Systems, 19:1, 188-222.
Chang, P.-Y., Hao, E., & Patt, Y. (1995). Alternative implementations of hybrid branch predictors. In Proceedings of the 28th ACM/IEEE International Symposium on Microarchitecture (pp. 252-257).
Cormen, T. H., Leiserson, C. E., & Rivest, R. L. (1997). Arithmetic circuits. In Introduction to Algorithms. Cambridge, MA: MIT Press.
Dietterich, T. G. (2000). An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization. Machine Learning, 40:2, 139-158.
Domingo, C., & Watanabe, O. (2000). MadaBoost: A modification of AdaBoost. In Proceedings of the Thirteenth Annual Conference on Computational Learning Theory (pp. 180-189).
Eden, A., & Mudge, T. (1998). The YAGS branch predictor. In Proceedings of the 31st Annual IEEE/ACM International Symposium on Microarchitecture (pp. 69-77).
Emer, J., & Gloy, N. (1997). A language for describing predictors and its application to automatic synthesis. In Proceedings of the 24th International Symposium on Computer Architecture (pp. 304-314).
Fan, W., Stolfo, S., & Zhang, J. (1999). The application of AdaBoost for distributed, scalable and on-line learning. In Proceedings of the Fifth SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 362-366).
Fern, A., Givan, R., Falsafi, B., & Vijaykumar, T. N. (2000). Dynamic feature selection for hardware prediction. Technical Report TR-ECE 00-12, School of Electrical & Computer Engineering, Purdue University.
Freund, Y. (1995). Boosting a weak learning algorithm by majority. Information and Computation, 121:2, 256-285.
Freund, Y., & Schapire, R. E. (1996). Experiments with a new boosting algorithm. In Proceedings of the Thirteenth International Conference on Machine Learning (pp. 148-156).
Freund, Y., & Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55:1, 119-139.
Grove, A., & Schuurmans, D. (1998). Boosting in the limit: Maximizing the margin of learned ensembles. In Proceedings of the Fifteenth National Conference on Artificial Intelligence (pp. 692-699).
Gwennap, L. (1996). Digital 21264 sets new standard. In Microprocessor Report, Oct. (pp. 9-15).
Heil, T., Smith, Z., & Smith, J. (1999). Improving branch predictors by correlation on data values. In Proceedings of the 32nd Annual International Symposium on Microarchitecture (pp. 28-37).
Jimenez, D., Hanson, H., & Lin, C. (2001). Boolean formula-based branch prediction for future technologies. In Proceedings of the International Conference on Parallel Architectures and Compilation Technologies (pp. 97-106).
Jimenez, D., & Lin, C. (2001). Dynamic branch prediction with perceptrons. In Proceedings of the 7th International Symposium on High Performance Computer Architecture (pp. 197-206).
Matheus, C. J., & Rendell, L. A. (1989). Constructive induction on decision trees. In Proceedings of the Eleventh International Joint Conference on Artificial Intelligence (pp. 645-650).
McFarling, S. (1993). Combining branch predictors. WRL Technical Note TN-36.
Merz, C. J., & Murphy, P. M. (1996). UCI repository of machine learning databases. http://www.ics.uci.edu/~mlearn/MLRepository.html.
Nair, R. (1995). Dynamic path-based branch correlation. In Proceedings of the 28th International Symposium on Microarchitecture (pp. 15-23).
Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1, 81-106.
Quinlan, J. R. (1988). An empirical comparison of genetic and decision-tree classifiers. In Proceedings of the Fifth International Conference on Machine Learning (pp. 135-141).
Quinlan, J. R. (1996). Bagging, boosting and C4.5. In Proceedings of the Thirteenth National Conference on Artificial Intelligence (pp. 725-730).
Reilly, J. (1995). SPEC describes SPEC95 products and benchmarks. Standard Performance Evaluation Corporation August 1995 newsletter, http://www.spec.org.
Schapire, R. E. (1990). The strength of weak learnability. Machine Learning, 5:2, 197-227.
Schlimmer, J. C., & Fisher, D. (1986). A case study of incremental concept induction. In Proceedings of the Fifth National Conference on Artificial Intelligence (pp. 496-501).
Sherwood, T., & Calder, B. (2001). Automated design of finite state machine predictors for customized processors. In Proceedings of the 28th International Symposium on Computer Architecture (pp. 86-97).
Utgoff, P. E. (1989). Incremental induction of decision trees. Machine Learning, 4:2, 161-186.
Utgoff, P. E., Berkman, N. C., & Clouse, J. A. (1997). Decision tree induction based on efficient tree restructuring. Machine Learning, 29:1, 5-44.
Cite this article
Fern, A., Givan, R. Online Ensemble Learning: An Empirical Study. Machine Learning 53, 71–109 (2003). https://doi.org/10.1023/A:1025619426553