Skip to main content
Erschienen in: Knowledge and Information Systems 2/2015

01.08.2015 | Regular Paper

Bounds on the moments for an ensemble of random decision trees

verfasst von: Amit Dhurandhar

Erschienen in: Knowledge and Information Systems | Ausgabe 2/2015

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

An ensemble of random decision trees is a popular classification technique, especially known for its ability to scale to large domains. In this paper, we provide an efficient strategy to compute bounds on the moments of the generalization error computed over all datasets of a particular size drawn from an underlying distribution, for this classification technique. Being able to estimate these moments can help us gain insights into the performance of this model. As we will see in the experimental section, these bounds tend to be significantly tighter than the state-of-the-art Breiman’s bounds based on strength and correlation and hence more useful in practice.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Fußnoten
1
These probabilities and \( P\left[ Y(x)\!\ne \!y \right] \) are conditioned on \(x\). We omit explicitly writing the conditional since it improves readability and is obvious from the context.
 
2
For further details refer to [3] and [5].
 
3
This is after splitting the continuous attributes.
 
4
Partitioned into 3 categories high, medium and low.
 
Literatur
1.
Zurück zum Zitat Anandkumar A, Foster D, Hsu D, Kakade S, Liu Y (2012) A spectral algorithm for latent dirichlet allocation. In: NIPS. Lake Tahoe, USA, pp 926–934 Anandkumar A, Foster D, Hsu D, Kakade S, Liu Y (2012) A spectral algorithm for latent dirichlet allocation. In: NIPS. Lake Tahoe, USA, pp 926–934
2.
Zurück zum Zitat Boots B, Gordon G (2012) Two manifold problems with applications to nonlinear system identification. In: ICML. Edinburgh, Scotland, UK, p 338 Boots B, Gordon G (2012) Two manifold problems with applications to nonlinear system identification. In: ICML. Edinburgh, Scotland, UK, p 338
4.
Zurück zum Zitat Bshouty N, Long P (2010) Finding planted partitions in nearly linear time using arrested spectral clustering. In: ICML. Haifa, Israel, pp 135–142 Bshouty N, Long P (2010) Finding planted partitions in nearly linear time using arrested spectral clustering. In: ICML. Haifa, Israel, pp 135–142
5.
Zurück zum Zitat Buttrey S, Kobayashi I (2003) On strength and correlation in random forests. In : Proceedings of the 2003 joint statistical meetings, section on statistical computing Buttrey S, Kobayashi I (2003) On strength and correlation in random forests. In : Proceedings of the 2003 joint statistical meetings, section on statistical computing
7.
Zurück zum Zitat Dhurandhar A, Dobra A (2008) Probabilistic characterization of random decision trees. J Mach Learn Res 9:2321–2348 Dhurandhar A, Dobra A (2008) Probabilistic characterization of random decision trees. J Mach Learn Res 9:2321–2348
8.
Zurück zum Zitat Dhurandhar A, Dobra A (2009) Semi-analytical method for analyzing models and model selection measures based on moment analysis. ACM Trans Knowl Discov Data Min Dhurandhar A, Dobra A (2009) Semi-analytical method for analyzing models and model selection measures based on moment analysis. ACM Trans Knowl Discov Data Min
9.
Zurück zum Zitat Dhurandhar A, Dobra A (2012) Distribution free bounds for relational classification. Knowl Inf Syst Dhurandhar A, Dobra A (2012) Distribution free bounds for relational classification. Knowl Inf Syst
10.
Zurück zum Zitat Dhurandhar A, Dobra A (2012) Probabilistic characterization of nearest neighbor classifiers. Int J Mach Learn Cybern Dhurandhar A, Dobra A (2012) Probabilistic characterization of nearest neighbor classifiers. Int J Mach Learn Cybern
11.
Zurück zum Zitat Duda RO, Hart PE, Stork DG (2001) Pattern classification, 2nd edn. Wiley, New York, p 654 Duda RO, Hart PE, Stork DG (2001) Pattern classification, 2nd edn. Wiley, New York, p 654
12.
Zurück zum Zitat Fan W, Wang H, Yu PS, Ma S (2003) Is random model better? On its accuracy and efficiency. In: ICDM ’03: proceedings of the third IEEE international conference on data mining, IEEE Computer Society, Washington, DC, USA, pp 51–58 Fan W, Wang H, Yu PS, Ma S (2003) Is random model better? On its accuracy and efficiency. In: ICDM ’03: proceedings of the third IEEE international conference on data mining, IEEE Computer Society, Washington, DC, USA, pp 51–58
13.
Zurück zum Zitat Geurts P, Ernst D, Wehenkel L (2006) Extremely randomized trees. Mach Learn 63(1):3–42CrossRef Geurts P, Ernst D, Wehenkel L (2006) Extremely randomized trees. Mach Learn 63(1):3–42CrossRef
14.
Zurück zum Zitat Hastie T, Tibshirani R, Friedman J (2001) Elements of statistical learning, 2nd edn. Springer, BerlinCrossRef Hastie T, Tibshirani R, Friedman J (2001) Elements of statistical learning, 2nd edn. Springer, BerlinCrossRef
15.
Zurück zum Zitat Langford John (December 2005) Tutorial on practical prediction theory for classification. J Mach Learn Res 6:273–306 Langford John (December 2005) Tutorial on practical prediction theory for classification. J Mach Learn Res 6:273–306
16.
Zurück zum Zitat Liu F, Ting K, Fan W (2005) Maximizing tree diversity by building complete-random decision trees. In: PAKDD, pp 605–610 Liu F, Ting K, Fan W (2005) Maximizing tree diversity by building complete-random decision trees. In: PAKDD, pp 605–610
17.
Zurück zum Zitat McAllester D (1999) Pac-bayesian model averaging. In: Proceedings of the twelfth annual conference on computational learning theory. ACM Press, pp 164–170 McAllester D (1999) Pac-bayesian model averaging. In: Proceedings of the twelfth annual conference on computational learning theory. ACM Press, pp 164–170
18.
Zurück zum Zitat Mcallester D (2003) Simplified pac-bayesian margin bounds. In COLT, pp 203–215 Mcallester D (2003) Simplified pac-bayesian margin bounds. In COLT, pp 203–215
19.
Zurück zum Zitat Roy S, Bose R (1953) Simultaneous confidence interval estimation. Ann Math Stat 24(3):513–536CrossRef Roy S, Bose R (1953) Simultaneous confidence interval estimation. Ann Math Stat 24(3):513–536CrossRef
20.
Zurück zum Zitat Sison C, Glaz J (1995) Simultaneous confidence intervals and sample size determination for multinomial proportions. JASA 90(429):366–369CrossRef Sison C, Glaz J (1995) Simultaneous confidence intervals and sample size determination for multinomial proportions. JASA 90(429):366–369CrossRef
21.
Zurück zum Zitat Tong Y (1980) Probabilistic inequalities for multivariate distributions, 1st edn. Academic Press, Waltham Tong Y (1980) Probabilistic inequalities for multivariate distributions, 1st edn. Academic Press, Waltham
22.
Zurück zum Zitat Zhang K, Fan W (2008) Forecasting skewed biased stochastic ozone days: analyses, solutions and beyond. Knowl Inf Syst 14(3):299–326CrossRef Zhang K, Fan W (2008) Forecasting skewed biased stochastic ozone days: analyses, solutions and beyond. Knowl Inf Syst 14(3):299–326CrossRef
23.
Zurück zum Zitat Zhang X, Yuan Q, Zhao S, Fan W, Zheng W, Wang Z (2010) Multi-label classification without the multi-label cost. In: SDM ’10: proceedings of the siam conference on data mining, pp 778–789 Zhang X, Yuan Q, Zhao S, Fan W, Zheng W, Wang Z (2010) Multi-label classification without the multi-label cost. In: SDM ’10: proceedings of the siam conference on data mining, pp 778–789
Metadaten
Titel
Bounds on the moments for an ensemble of random decision trees
verfasst von
Amit Dhurandhar
Publikationsdatum
01.08.2015
Verlag
Springer London
Erschienen in
Knowledge and Information Systems / Ausgabe 2/2015
Print ISSN: 0219-1377
Elektronische ISSN: 0219-3116
DOI
https://doi.org/10.1007/s10115-014-0768-5

Weitere Artikel der Ausgabe 2/2015

Knowledge and Information Systems 2/2015 Zur Ausgabe