Methods Inf Med 2014; 53(06): 419-427
DOI: 10.3414/ME13-01-0122
Original Articles
Schattauer GmbH

The Evolution of Boosting Algorithms[*]

From Machine Learning to Statistical Modelling
A. Mayr 1, H. Binder 2, O. Gefeller 1, M. Schmid 1, 3

1 Institut für Medizininformatik, Biometrie und Epidemiologie, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
2 Institut für Medizinische Biometrie, Epidemiologie und Informatik, Johannes Gutenberg-Universität Mainz, Germany
3 Institut für Medizinische Biometrie, Informatik und Epidemiologie, Rheinische Friedrich-Wilhelms-Universität Bonn, Germany

Publication History

Received: 11 November 2013
Accepted: 02 May 2014
Published online: 20 January 2018

Summary

Background: The concept of boosting emerged from the field of machine learning. The basic idea is to boost the accuracy of a weak classifier by combining multiple instances of it into a more accurate prediction. This general concept was later adapted to the field of statistical modelling. Nowadays, boosting algorithms are often applied to estimate and select predictor effects in statistical regression models.

Objectives: This review article attempts to highlight the evolution of boosting algorithms from machine learning to statistical modelling.

Methods: We describe the AdaBoost algorithm for classification as well as the two most prominent statistical boosting approaches for model fitting: gradient boosting and likelihood-based boosting. We highlight the methodological background and present the most common software implementations.
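As an illustration of the classification setting, the following is a minimal, self-contained sketch of discrete AdaBoost with decision stumps as base classifiers. This is hypothetical Python/NumPy toy code under our own naming, not an implementation from the article; it only shows the two defining ingredients: re-weighting of observations and a weighted vote over weak classifiers.

```python
import numpy as np

def fit_stump(X, y, w):
    """Pick the decision stump (feature, threshold, polarity) with the
    smallest weighted misclassification error."""
    best_err, best = np.inf, None
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for pol in (1, -1):
                pred = pol * np.where(X[:, j] <= thr, 1, -1)
                err = w[pred != y].sum()
                if err < best_err:
                    best_err, best = err, (j, thr, pol)
    return best, best_err

def adaboost(X, y, m_stop=25):
    """Discrete AdaBoost: iteratively re-weight the observations and
    combine the resulting weak classifiers by a weighted vote."""
    w = np.full(len(y), 1.0 / len(y))              # start with uniform weights
    ensemble = []
    for _ in range(m_stop):
        (j, thr, pol), err = fit_stump(X, y, w)
        err = np.clip(err, 1e-12, 1 - 1e-12)
        alpha = 0.5 * np.log((1 - err) / err)      # vote weight of this stump
        pred = pol * np.where(X[:, j] <= thr, 1, -1)
        w *= np.exp(-alpha * y * pred)             # up-weight misclassified cases
        w /= w.sum()
        ensemble.append((alpha, j, thr, pol))
    return ensemble

def predict(ensemble, X):
    """Weighted majority vote over all base classifiers."""
    score = sum(a * p * np.where(X[:, j] <= t, 1, -1)
                for a, j, t, p in ensemble)
    return np.sign(score)

# Toy usage: two informative out of three predictors, labels in {-1, +1}.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
print(np.mean(predict(adaboost(X, y), X) == y))    # training accuracy
```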

Results: Although gradient boosting and likelihood-based boosting are typically treated separately in the literature, they share the same methodological roots and follow the same fundamental concepts. Compared to the initial machine learning algorithms, which must be seen as black-box prediction schemes, they result in statistical models with a straightforward interpretation.
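To see why the statistical variants produce interpretable models rather than black boxes, here is a minimal sketch of component-wise gradient boosting with the L2 loss and simple linear base-learners (again hypothetical Python/NumPy toy code; practical analyses would rely on dedicated implementations such as the R package mboost). Each iteration fits every candidate base-learner to the negative gradient and updates only the best-fitting one, so the final model is an ordinary linear predictor whose coefficients can be read directly.

```python
import numpy as np

def l2_boost(X, y, m_stop=150, nu=0.1):
    """Component-wise L2 gradient boosting with simple linear base-learners.
    Each iteration fits all candidates to the negative gradient (here: the
    residuals) and updates only the single best-fitting component."""
    X = X - X.mean(axis=0)                  # center predictors
    n, p = X.shape
    intercept = y.mean()                    # offset
    beta = np.zeros(p)                      # accumulated, interpretable coefficients
    f = np.full(n, intercept)               # current model fit
    for _ in range(m_stop):
        u = y - f                           # negative gradient of the L2 loss
        slopes = X.T @ u / (X ** 2).sum(axis=0)          # univariate LS fits
        rss = ((u[:, None] - X * slopes) ** 2).sum(axis=0)
        j = int(np.argmin(rss))             # best-fitting component only
        beta[j] += nu * slopes[j]           # small step: shrinkage ...
        f += nu * slopes[j] * X[:, j]       # ... and implicit variable selection
    return intercept, beta

# Toy usage: only predictors 0 and 3 are truly informative.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = 2.0 * X[:, 0] - 1.0 * X[:, 3] + rng.normal(size=100)
intercept, beta = l2_boost(X, y)
print(np.round(beta, 2))    # near 2 and -1 for columns 0 and 3, ~0 elsewhere
```

Because unselected predictors retain a zero coefficient, stopping the algorithm early also performs variable selection; likelihood-based boosting proceeds analogously but replaces the gradient-fitting step by one penalized maximum-likelihood step per candidate.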

Conclusions: Statistical boosting algorithms have gained substantial interest during the last decade and offer a variety of options to address important research questions in modern biomedicine.

* Supplementary material published on our website www.methods-online.com


 