2017 | Original Paper | Book Chapter

Generalising Random Forest Parameter Optimisation to Include Stability and Cost

Authors: C. H. Bryan Liu, Benjamin Paul Chamberlain, Duncan A. Little, Ângelo Cardoso

Published in: Machine Learning and Knowledge Discovery in Databases

Publisher: Springer International Publishing

Abstract

Random forests are among the most popular classification and regression methods used in industrial applications. To be effective, their parameters must be carefully tuned. This is usually done by choosing the values that minimise the prediction error on a held-out dataset. We argue that error reduction is only one of several metrics that must be considered when optimising random forest parameters for commercial applications. We propose a novel metric that captures the stability of random forest predictions, which we argue is key in scenarios that require successive predictions. We motivate the need for multi-criteria optimisation by showing that, in practical applications, simply choosing the parameters that lead to the lowest error can introduce unnecessary costs and produce predictions that are not stable across independent runs. To optimise this multi-criteria trade-off, we present a new framework that uses Bayesian optimisation to efficiently find a principled balance between three considerations: prediction error, stability and cost. The pitfalls of optimising forest parameters purely for error reduction are demonstrated on two publicly available real-world datasets. We show that our framework leads to parameter settings that are markedly different from those found by error reduction metrics alone.
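The abstract argues for scoring parameter settings on three criteria (prediction error, stability across independent runs, and cost) instead of error alone. The sketch below illustrates that idea under explicit assumptions, and it is not the authors' method. "Instability" is assumed here to be the variance of each sample's predictions across independently trained forests, averaged over samples. The three criteria are combined by a hypothetical weighted sum. An exhaustive comparison over a handful of candidate settings stands in for the Bayesian optimisation the paper uses. The names `instability`, `combined_score` and `best_setting`, and all numbers, are illustrative inventions and do not come from the paper.

```python
"""Illustrative sketch (not the paper's definitions): scoring random forest
parameter settings on error, instability and cost together."""
from typing import Dict, List, Sequence, Tuple


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def instability(runs: List[Sequence[float]]) -> float:
    """Assumed stability measure: for each sample, the variance of its
    predictions across independently trained forests (one list per run,
    same samples in the same order), averaged over samples.
    Lower values mean more repeatable predictions."""
    n_runs = len(runs)
    n_samples = len(runs[0])
    total = 0.0
    for i in range(n_samples):
        preds = [run[i] for run in runs]
        mu = sum(preds) / n_runs
        total += sum((p - mu) ** 2 for p in preds) / n_runs
    return total / n_samples


def combined_score(error: float, instab: float, cost: float,
                   weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Hypothetical scalarisation: weighted sum of the three criteria."""
    w_err, w_stab, w_cost = weights
    return w_err * error + w_stab * instab + w_cost * cost


def best_setting(candidates: Dict[str, Tuple[float, float, float]],
                 weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> str:
    """Return the candidate with the lowest combined score. Exhaustive
    comparison stands in for Bayesian optimisation in this sketch."""
    return min(candidates,
               key=lambda name: combined_score(*candidates[name], weights=weights))


if __name__ == "__main__":
    # Two independent runs predicting the same two samples (made-up values).
    runs = [[1.0, 2.0], [1.2, 1.8]]
    print("instability:", instability(runs))
    # Hypothetical (error, instability, cost) triples for two settings.
    candidates = {
        "500 trees": (0.20, 0.001, 0.50),
        "50 trees": (0.21, 0.010, 0.05),
    }
    print(best_setting(candidates))                           # 50 trees
    print(best_setting(candidates, weights=(1.0, 0.0, 0.0)))  # 500 trees
```

With these made-up numbers, an error-only weighting (`weights=(1.0, 0.0, 0.0)`) prefers the 500-tree setting, while also counting instability and cost prefers the 50-tree one. This mirrors, in miniature, the abstract's point that the lowest-error setting need not be the best overall choice.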

Footnotes
1
A full derivation is available at our GitHub repository https://github.com/liuchbryan/generalised_forest_tuning.
 
2
This could be any distribution as long as its first two moments are finite, which is usually the case in practice as predictions are normally bounded.
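The following standard identity shows why finite first and second moments matter when reasoning about prediction stability. It is a textbook result, not necessarily the derivation this footnote supports. Assume the T tree predictions for an input x are identically distributed with mean μ(x), finite variance σ²(x) and a common pairwise correlation ρ(x). The variance of the forest's averaged prediction is then:

```latex
% Assumed setting: T trees with predictions \hat f_t(x), each with mean \mu(x),
% finite variance \sigma^2(x), and common pairwise correlation \rho(x).
\operatorname{Var}\!\left[\frac{1}{T}\sum_{t=1}^{T}\hat f_t(x)\right]
  = \frac{1}{T^2}\Big(T\sigma^2 + T(T-1)\rho\,\sigma^2\Big)
  = \rho\,\sigma^2 + \frac{1-\rho}{T}\,\sigma^2
```

Adding trees, which raises training and prediction cost, shrinks only the second term. The variance therefore falls towards the floor ρσ² as T grows, with diminishing returns.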
 
Metadata
Copyright year
2017
DOI
https://doi.org/10.1007/978-3-319-71273-4_9