
25.06.2024

Combining LASSO-type Methods with a Smooth Transition Random Forest

Authors: Alexandre L. D. Gandini, Flavio A. Ziegelmann

Published in: Annals of Data Science


Abstract

In this work, we propose a novel hybrid method for the estimation of regression models, based on a combination of LASSO-type methods and smooth transition (STR) random forests. Tree-based regression models are known for their flexibility and their ability to learn even highly nonlinear patterns. The STR-Tree model introduces smoothness into traditional splitting nodes, leading to a non-binary labeling that can be interpreted as a group membership degree for each observation. Our approach involves two steps. First, we fit a penalized linear regression using LASSO-type methods. Then, we estimate an STR random forest on the residuals from the first step, using the original covariates. This two-step process allows us to capture any significant linear relationships in the data generating process through a parametric approach and then to address nonlinearities with a flexible model. We conducted numerical studies with both simulated and real data to demonstrate our method's effectiveness. Our findings indicate that our proposal offers superior predictive power, particularly in datasets with both linear and nonlinear characteristics, when compared to traditional benchmarks.
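As a rough illustration (not the authors' implementation), the two-step procedure can be sketched in Python: scikit-learn's LassoCV stands in for the LASSO-type step and a plain RandomForestRegressor stands in for the STR random forest, which has no scikit-learn implementation; the function names below are purely illustrative.

from sklearn.linear_model import LassoCV
from sklearn.ensemble import RandomForestRegressor

def fit_two_step(X, y):
    # Step 1: penalized linear regression on the original covariates
    # (LassoCV used here as a stand-in for the LASSO-type estimators in the paper).
    lasso = LassoCV(cv=5).fit(X, y)
    residuals = y - lasso.predict(X)
    # Step 2: flexible tree ensemble fit to the residuals, again using the
    # original covariates (a standard random forest as a stand-in for the STR random forest).
    forest = RandomForestRegressor(n_estimators=500, random_state=0).fit(X, residuals)
    return lasso, forest

def predict_two_step(lasso, forest, X_new):
    # The final prediction adds the linear component and the residual (nonlinear) component.
    return lasso.predict(X_new) + forest.predict(X_new)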


Footnotes
1
\(\text{RMSRE} = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} \left( \frac{\hat{y}_i - y_i}{y_i} \right)^2 }\) [46].
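In code, this metric reads directly (a minimal sketch; the function name is illustrative):

import numpy as np

def rmsre(y_true, y_pred):
    # Root mean squared relative error, as defined in the formula above.
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean(((y_pred - y_true) / y_true) ** 2))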
 
2
We use the Scikit-Learn library implementations (https://scikit-learn.org/), except for the BooST model (https://github.com/gabrielrvsc/BooST).
 
3
https://github.com/alexgand/adalasso_STR_RF. The code for the STR-Tree model was partially adapted from the BooST repository.
 
4
In our simulations, adaLASSO+STR RF was five to ten times faster than BooST.
 
References
2.
Morgan JN, Sonquist JA (1963) Problems in the analysis of survey data, and a proposal. J Am Stat Assoc 58:415–434
9.
Irsoy O, Yildiz OT, Alpaydin E (2012) Soft decision trees. In: Proceedings of the 21st international conference on pattern recognition (ICPR2012), pp 1819–1822
13.
Alkhoury S, Devijver E, Clausel M, Tami M, Gaussier E, Oppenheim G (2020) Smooth and consistent probabilistic regression trees. In: Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H (eds) Advances in neural information processing systems, vol 33. Curran Associates Inc, Red Hook, pp 11345–11355
14.
Yildiz OT, Irsoy O, Alpaydin E (2016) Bagging soft decision trees. In: Machine learning for health informatics
15.
Louppe G, Wehenkel L, Sutera A, Geurts P (2013) Understanding variable importances in forests of randomized trees. In: Proceedings of the 26th international conference on neural information processing systems, vol 1. NIPS'13. Curran Associates Inc., Red Hook, pp 431–439
16.
Wehenkel L (1997) Discretization of continuous attributes for supervised learning: variance evaluation and variance reduction. In: Proceedings of the International Fuzzy Systems Association World Congress IFSA, vol 97, pp 381–388
31.
33.
Lin Y, Jeon Y (2006) Random forests and adaptive nearest neighbors. J Am Stat Assoc 101(474):578–590
36.
Tien JM (2017) Internet of Things, real-time decision making, and artificial intelligence. Ann Data Sci 4:149–178
40.
Zhao P, Yu B (2006) On model selection consistency of Lasso. J Mach Learn Res 7(90):2541–2563
47.
Olson DL, Shi Y (2017) Introduction to business data mining. McGraw-Hill/Irwin, New York
53.
Pedregosa F et al (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825–2830
Metadata
Title
Combining LASSO-type Methods with a Smooth Transition Random Forest
Authors
Alexandre L. D. Gandini
Flavio A. Ziegelmann
Publication date
25.06.2024
Publisher
Springer Berlin Heidelberg
Published in
Annals of Data Science
Print ISSN: 2198-5804
Electronic ISSN: 2198-5812
DOI
https://doi.org/10.1007/s40745-024-00541-4