
25-06-2024

Combining LASSO-type Methods with a Smooth Transition Random Forest

Authors: Alexandre L. D. Gandini, Flavio A. Ziegelmann

Published in: Annals of Data Science


Abstract

In this work, we propose a novel hybrid method for the estimation of regression models, based on a combination of LASSO-type methods and smooth transition (STR) random forests. Tree-based regression models are known for their flexibility and their ability to learn even highly nonlinear patterns. The STR-Tree model introduces smoothness into the traditional splitting nodes, leading to non-binary labeling, which can be interpreted as a degree of group membership for each observation. Our approach involves two steps. First, we fit a penalized linear regression using LASSO-type methods. Then, we estimate an STR random forest on the residuals from the first step, using the original covariates. This two-step process allows us to capture any significant linear relationships in the data generating process through a parametric approach, and then to address nonlinearities with a flexible model. We conducted numerical studies with both simulated and real data to demonstrate our method's effectiveness. Our findings indicate that our proposal offers superior predictive power, particularly on datasets with both linear and nonlinear characteristics, when compared to traditional benchmarks.
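The two-step procedure described in the abstract can be sketched in a few lines. The following is a minimal illustration on simulated data, using scikit-learn's LassoCV for step 1 and a standard RandomForestRegressor as a stand-in for the STR random forest in step 2 (all names and settings here are illustrative, not the authors' implementation):

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
# a linear signal plus a nonlinear component, plus noise
y = 2.0 * X[:, 0] + np.sin(3.0 * X[:, 1]) + 0.1 * rng.normal(size=500)

# Step 1: penalized linear fit (LASSO with cross-validated penalty)
lasso = LassoCV(cv=5).fit(X, y)
residuals = y - lasso.predict(X)

# Step 2: a flexible model on the residuals, using the original covariates.
# A plain random forest stands in for the STR random forest here.
forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, residuals)

def predict(X_new):
    # final prediction = linear part + nonlinear correction
    return lasso.predict(X_new) + forest.predict(X_new)
```

On data like the above, the linear step absorbs the `2.0 * X[:, 0]` term and the forest picks up the sine component that LASSO cannot represent.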


Footnotes
1
\(\text{RMSRE} = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} \Big( \frac{\hat{y}_i - y_i}{y_i} \Big)^2 }\) [46].
 
2
We use the Scikit-Learn library implementations (https://scikit-learn.org/), except for the BooST model (https://github.com/gabrielrvsc/BooST).
 
3
https://github.com/alexgand/adalasso_STR_RF. The code for the STR-tree model was partially modified from the BooST repository.
 
4
In our simulations, adaLASSO+STR RF was five to ten times faster than BooST.
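The RMSRE defined in footnote 1 is straightforward to compute. A minimal NumPy sketch (the function name is ours):

```python
import numpy as np

def rmsre(y_true, y_pred):
    """Root mean squared relative error: sqrt(mean(((yhat - y) / y)^2))."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean(((y_pred - y_true) / y_true) ** 2))
```

Note that, unlike the RMSE, this measure is undefined when any true value is zero, since each error is scaled by \(y_i\).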
 
Literature
2. Morgan JN, Sonquist JA (1963) Problems in the analysis of survey data, and a proposal. J Am Stat Assoc 58:415–434
9. Irsoy O, Yildiz OT, Alpaydin E (2012) Soft decision trees. In: Proceedings of the 21st international conference on pattern recognition (ICPR2012), pp 1819–1822
13. Alkhoury S, Devijver E, Clausel M, Tami M, Gaussier E, Oppenheim G (2020) Smooth and consistent probabilistic regression trees. In: Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H (eds) Advances in Neural Information Processing Systems, vol 33. Curran Associates Inc, Red Hook, pp 11345–11355
14. Yildiz OT, Irsoy O, Alpaydin E (2016) Bagging soft decision trees. In: Machine learning for health informatics
15. Louppe G, Wehenkel L, Sutera A, Geurts P (2013) Understanding variable importances in forests of randomized trees. In: Proceedings of the 26th international conference on neural information processing systems, vol 1. NIPS'13. Curran Associates Inc, Red Hook, pp 431–439
16. Wehenkel L (1997) Discretization of continuous attributes for supervised learning: variance evaluation and variance reduction. In: Proceedings of the International Fuzzy Systems Association World Congress IFSA, vol 97, pp 381–388
33. Lin Y, Jeon Y (2006) Random forests and adaptive nearest neighbors. J Am Stat Assoc 101(474):578–590
36. Tien JM (2017) Internet of Things, real-time decision making, and artificial intelligence. Ann Data Sci 4:149–178
40. Zhao P, Yu B (2006) On model selection consistency of Lasso. J Mach Learn Res 7(90):2541–2563
47. Olson DL, Shi Y (2017) Introduction to business data mining. McGraw-Hill/Irwin, New York
53. Pedregosa F et al (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825–2830
Metadata
Title
Combining LASSO-type Methods with a Smooth Transition Random Forest
Authors
Alexandre L. D. Gandini
Flavio A. Ziegelmann
Publication date
25-06-2024
Publisher
Springer Berlin Heidelberg
Published in
Annals of Data Science
Print ISSN: 2198-5804
Electronic ISSN: 2198-5812
DOI
https://doi.org/10.1007/s40745-024-00541-4