
2013 | Original Paper | Book Chapter

8. Regression Trees and Rule-Based Models

Authors: Max Kuhn, Kjell Johnson

Published in: Applied Predictive Modeling

Publisher: Springer New York


Abstract

Tree-based models consist of one or more nested if-then statements for the predictors that partition the data. Within these partitions, a model is used to predict the outcome. Regression trees and regression model trees are basic partitioning models and are covered in Sections 8.1 and 8.2, respectively. In Section 8.3, we present rule-based models, which are models governed by if-then conditions (possibly created by a tree) that have been collapsed into independent conditions. Rules can be simplified or pruned in a way that allows a sample to be covered by multiple rules. Ensemble methods combine many trees (or rule-based models) into one model and tend to have much better predictive performance than a single tree- or rule-based model. Popular ensemble techniques are bagging (Section 8.4), random forests (Section 8.5), boosting (Section 8.6), and Cubist (Section 8.7). In the Computing Section (8.8), we demonstrate how to train each of these models in R. Finally, exercises are provided at the end of the chapter to solidify the concepts.
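Since the Computing Section (8.8) trains these models in R, a minimal sketch of the four model families is shown below. This is not the chapter's code: the package choices (rpart, randomForest, gbm, Cubist) follow the book's ecosystem, but the simulated data and tuning values here are illustrative assumptions only.

    ## Minimal sketch (assumes rpart, randomForest, gbm, and Cubist are installed;
    ## `simDat` and all tuning values are illustrative, not the chapter's settings)
    library(rpart)          # single regression tree (CART)
    library(randomForest)   # bagging / random forests
    library(gbm)            # stochastic gradient boosting
    library(Cubist)         # rule-based Cubist models

    set.seed(100)
    simDat <- data.frame(matrix(rnorm(200 * 5), ncol = 5))   # predictors X1..X5
    simDat$y <- 2 * simDat$X1 - simDat$X2 + rnorm(200, sd = 0.5)

    ## Single regression tree (Section 8.1)
    cartFit <- rpart(y ~ ., data = simDat)

    ## Random forest (Section 8.5); setting mtry to the number of
    ## predictors would reduce this to bagging (Section 8.4)
    rfFit <- randomForest(y ~ ., data = simDat, ntree = 500)

    ## Boosted regression trees (Section 8.6)
    gbmFit <- gbm(y ~ ., data = simDat, distribution = "gaussian",
                  n.trees = 500, interaction.depth = 3, shrinkage = 0.01)

    ## Cubist rule-based model with committees (Section 8.7);
    ## Cubist takes the predictors and outcome separately
    cbFit <- cubist(x = simDat[, 1:5], y = simDat$y, committees = 10)

    ## Predictions follow the usual predict() conventions
    head(predict(cbFit, newdata = simDat[1:3, 1:5]))

Each fit object also supports the standard predict() method on held-out data, which is how the chapter compares the models' performance.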


Footnotes

1. Also, note that the first three splits here involve the same predictors as the regression tree shown in Fig. 8.4 (and two of the three split values are identical).

2. We are indebted to the work of Chris Keefer, who extensively studied the Cubist source code.
 
Metadata
Title
Regression Trees and Rule-Based Models
Authors
Max Kuhn
Kjell Johnson
Copyright Year
2013
Publisher
Springer New York
DOI
https://doi.org/10.1007/978-1-4614-6849-3_8