
2013 | OriginalPaper | Chapter

5. Tree-Based Methods

Authors: Chris Aldrich, Lidia Auret

Published in: Unsupervised Process Monitoring and Fault Diagnosis with Machine Learning Methods

Publisher: Springer London


Abstract

In this chapter, tree-based methods are discussed as another of the three major machine learning paradigms considered in the book. This includes the basic information-theoretic approach used to construct classification and regression trees, together with a few simple examples illustrating the characteristics of decision tree models. A short introduction to ensemble theory and ensembles of decision trees follows, leading to random forest models, which are discussed in detail. Unsupervised learning with random forests in particular is reviewed, as these characteristics are potentially important in unsupervised fault diagnostic systems. The interpretation of random forest models includes a discussion of the assessment of variable importance, as well as partial dependence analysis to examine the relationship between predictor variables and the response variable. A brief review of boosted trees follows that of random forests, including concepts such as gradient boosting and the AdaBoost algorithm. The use of tree-based ensemble models is illustrated by examples on rotogravure printing and on the identification of defects in hot-rolled steel plate.
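Purely as an illustration of the workflow the abstract describes (fit an ensemble, rank variables, inspect a partial dependence), the following is a minimal scikit-learn sketch on synthetic data; it is not the chapter's own code, data, or case studies.

# Illustrative sketch only: random forest fitting, variable importance and
# partial dependence on synthetic data (not the chapter's case studies).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import partial_dependence

X, y = make_classification(n_samples=500, n_features=8,
                           n_informative=3, random_state=0)

# Out-of-bag predictions give an internal estimate of generalisation error.
rf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
rf.fit(X, y)
print("OOB score:", rf.oob_score_)

# Mean decrease in impurity: one common variable importance measure.
order = np.argsort(rf.feature_importances_)[::-1]
for i in order:
    print(f"x{i}: importance = {rf.feature_importances_[i]:.3f}")

# Partial dependence of the prediction on the top-ranked variable.
pdp = partial_dependence(rf, X, features=[int(order[0])], kind="average")
print("partial dependence curve:", pdp["average"][0][:5])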


Footnotes
1. Binary splitting is considered here; the extension to multiple splits is trivial.
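In the notation of Breiman et al. (1984), a binary split s of node t into children t_L and t_R is chosen to maximise the impurity decrease

\Delta i(s, t) = i(t) - p_L \, i(t_L) - p_R \, i(t_R),

where p_L and p_R are the proportions of the samples in t sent to each child and i(\cdot) is the impurity function (e.g. cross-entropy or the Gini index for classification).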
 
2. The C4.5 algorithm (Quinlan 1993) scales the decrease in impurity for categorical input variables, since the cross-entropy impurity function is biased in favour of variables with many levels. The corrected impurity decrease is known as the gain ratio.
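In Quinlan's formulation, for an attribute A that splits the sample S into subsets S_1, ..., S_k, the gain ratio divides the raw information gain by the entropy of the split itself:

\mathrm{GainRatio}(A) = \frac{\mathrm{Gain}(A)}{\mathrm{SplitInfo}(A)},
\qquad
\mathrm{SplitInfo}(A) = -\sum_{i=1}^{k} \frac{|S_i|}{|S|} \log_2 \frac{|S_i|}{|S|},

so an attribute with many levels incurs a large SplitInfo denominator, counteracting the bias of the raw gain.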
 
3. See "The Elements of Statistical Learning" (Hastie et al. 2009) for details.
 
4. Shi and Horvath (2006) focused on the clustering utility of random forest proximities, a subtle difference from general feature extraction applications. Here, clustering refers to the ability of a feature extraction method to generate projections in which known clusters are separated, without using cluster information during training.
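A minimal sketch, in the spirit of Shi and Horvath (2006) but not the book's implementation, of unsupervised random forest proximities: a synthetic contrast class is built by sampling each variable independently from its marginal, a forest is trained to separate real from synthetic data, and the proximity of two observations is the fraction of trees in which they share a terminal node.

# Sketch (assumed workflow, not the book's code): unsupervised random
# forest proximities followed by an MDS embedding of 1 - proximity.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.manifold import MDS

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))          # observed (unlabelled) data

# Contrast class: permute each column independently, which preserves the
# marginals but destroys the dependence structure between variables.
X_synth = np.column_stack([rng.permutation(X[:, j]) for j in range(X.shape[1])])
X_all = np.vstack([X, X_synth])
y_all = np.concatenate([np.ones(len(X)), np.zeros(len(X_synth))])

rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_all, y_all)

# Proximity = fraction of trees in which two observations land in the
# same leaf; computed here for the real observations only.
leaves = rf.apply(X)                   # shape (n_samples, n_trees)
prox = np.mean(leaves[:, None, :] == leaves[None, :, :], axis=2)

# Embed the dissimilarities 1 - proximity for visual cluster inspection.
emb = MDS(n_components=2, dissimilarity="precomputed",
          random_state=0).fit_transform(1.0 - prox)
print(emb.shape)                       # (200, 2)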
 
Literature
Amit, Y., & Geman, D. (1997). Shape quantization and recognition with randomized trees. Neural Computation, 9(7), 1545–1588.
Archer, K. J., & Kimes, R. V. (2008). Empirical characterization of random forest variable importance measures. Computational Statistics & Data Analysis, 52(4), 2249–2260.
Auret, L., & Aldrich, C. (2012). Interpretation of nonlinear relationships between process variables by use of random forests. Minerals Engineering, 35, 27–42.
Belson, W. A. (1959). Matching and prediction on the principle of biological classification. Journal of the Royal Statistical Society Series C (Applied Statistics), 8(2), 65–75.
Breiman, L., Friedman, J. H., Olshen, R., & Stone, C. J. (1984). Classification and regression trees. Belmont: Wadsworth.
Cox, T. F., & Cox, M. A. A. (2001). Multidimensional scaling. Boca Raton: Chapman & Hall.
Cutler, A., & Stevens, J. R. (2006). Random forests for microarrays. In Methods in enzymology: DNA microarrays, Part B: Databases and statistics (pp. 422–432). San Diego: Academic Press.
Dietterich, T. G. (2000a). An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization. Machine Learning, 40(2), 139–157.
Evans, B., & Fisher, D. (1994). Overcoming process delays with decision tree induction. IEEE Expert, 9(1), 60–66.
Freund, Y., & Schapire, R. E. (1996). Experiments with a new boosting algorithm. In Proceedings of the Thirteenth International Conference on Machine Learning (ICML'96) (pp. 148–156).
Freund, Y., & Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1), 119–139.
Friedman, J., Hastie, T., & Tibshirani, R. (2000). Additive logistic regression: A statistical view of boosting. The Annals of Statistics, 28(2), 337–374.
Gillo, M. W., & Shelly, M. W. (1974). Predictive modeling of multivariable and multivariate data. Journal of the American Statistical Association, 69(347), 646–653.
Hansen, L., & Salamon, P. (1990). Neural network ensembles. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(10), 993–1001.
Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning: Data mining, inference and prediction. New York: Springer.
Ho, T. K. (1995). Random decision forests. In Proceedings of the Third International Conference on Document Analysis and Recognition (ICDAR 1995) (pp. 278–282). Montreal: IEEE Computer Society.
Izenman, A. (2008). Modern multivariate statistical techniques: Regression, classification, and manifold learning. New York/London: Springer.
Kass, G. V. (1980). An exploratory technique for investigating large quantities of categorical data. Journal of the Royal Statistical Society Series C (Applied Statistics), 29(2), 119–127.
Messenger, R., & Mandell, L. (1972). A modal search technique for predictive nominal scale multivariate analysis. Journal of the American Statistical Association, 67(340), 768–772.
Morgan, J. N., & Sonquist, J. A. (1963). Problems in the analysis of survey data, and a proposal. Journal of the American Statistical Association, 58(302), 415–434.
Nicodemus, K. K., & Malley, J. D. (2009). Predictor correlation impacts machine learning algorithms: Implications for genomic studies. Bioinformatics, 25(15), 1884–1890.
Polikar, R. (2006). Ensemble based systems in decision making. IEEE Circuits and Systems Magazine, 6(3), 21–45.
Quinlan, J. (1986). Induction of decision trees. Machine Learning, 1(1), 81–106.
Quinlan, R. (1993). C4.5: Programs for machine learning. Palo Alto: Morgan Kaufmann.
Rätsch, G., Onoda, T., & Müller, K. (2001). Soft margins for AdaBoost. Machine Learning, 42(3), 287–320.
Sammon, J. W. (1969). A nonlinear mapping for data structure analysis. IEEE Transactions on Computers, C-18(5), 401–409.
Schapire, R. E. (1990). The strength of weak learnability. Machine Learning, 5(2), 197–227.
Schapire, R., Freund, Y., Bartlett, P., & Lee, W. (1998). Boosting the margin: A new explanation for the effectiveness of voting methods. The Annals of Statistics, 26(5), 1651–1686.
Shi, T., & Horvath, S. (2006). Unsupervised learning with random forest predictors. Journal of Computational and Graphical Statistics, 15(1), 118–138.
Strobl, C., Boulesteix, A., Kneib, T., Augustin, T., & Zeileis, A. (2008). Conditional variable importance for random forests. BMC Bioinformatics, 9(1), 307–317.
Strobl, C., Malley, J., & Tutz, G. (2009). An introduction to recursive partitioning: Rationale, application, and characteristics of classification and regression trees, bagging, and random forests. Psychological Methods, 14(4), 323–348.
Valiant, L. G. (1984). A theory of the learnable. Communications of the ACM, 27(11), 1134–1142.
Metadata
Title: Tree-Based Methods
Authors: Chris Aldrich, Lidia Auret
Copyright Year: 2013
Publisher: Springer London
DOI: https://doi.org/10.1007/978-1-4471-5185-2_5
