Published in: Journal of Intelligent Information Systems 3/2012

1 December 2012

What influences the accuracy of decision tree ensembles?

Authors: Graeme Richards, Wenjia Wang


Abstract

An ensemble in machine learning is a set of models (such as classifiers or predictors) that are induced individually from data, using one or more machine learning algorithms for a given task, and that then work collectively in the hope of producing better decisions. In this paper we investigate the factors that influence ensemble performance, chiefly the accuracy of the individual classifiers, the diversity between classifiers, the number of classifiers in an ensemble, and the decision fusion strategy. Among these, diversity is believed to be a key factor, but it is more complex and harder to measure quantitatively; it was therefore chosen as the focus of this study, together with its relationships to the other factors. A technique was devised to build ensembles from decision trees that are induced with randomly selected features. Three sets of experiments were performed on 12 benchmark datasets, and the results indicate that (i) a high level of diversity does make an ensemble more accurate and robust than its individual models, and (ii) small ensembles can produce results as good as, or better than, large ensembles, provided that appropriate (e.g. more diverse) models are selected for inclusion. This implies that, when scaling up to larger databases, the greater efficiency of smaller ensembles becomes increasingly significant and beneficial. As a case study, ensembles were built on the basis of these findings for a real-world application, osteoporosis classification; for each of the three datasets used, the ensembles outperformed individual decision trees consistently and reliably.
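The abstract does not specify the paper's tree learner, the number of features drawn per tree, the fusion rule, or the diversity measure, so the following is only a minimal, stdlib-only sketch of the general idea under stated assumptions: depth-1 trees (stumps) stand in for full decision trees; each member may split only on `k` randomly sampled features; decisions are fused by simple majority voting; and diversity is measured with coincident-failure diversity (CFD) in the form given by Partridge and Krzanowski, which is 0 when every failure is shared by all members and 1 when no example is misclassified by more than one member. All function names and parameters (`train_stump`, `build_random_feature_ensemble`, `ensemble_predict`, `coincident_failure_diversity`, `k`, `seed`) are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter


def majority(labels):
    """Most common label; ties go to the label seen first (Counter ordering)."""
    return Counter(labels).most_common(1)[0][0]


def train_stump(X, y, features):
    """Fit a depth-1 decision tree that may only split on `features`.

    Assumption: stumps stand in for the full decision trees of the paper.
    Picks the (feature, threshold) pair with the fewest training errors;
    each side of the split predicts its majority label.
    """
    default = majority(y)
    best_err, best_rule = len(y) + 1, None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lbl for row, lbl in zip(X, y) if row[f] <= t]
            right = [lbl for row, lbl in zip(X, y) if row[f] > t]
            l_lab = majority(left)  # never empty: t is an observed value
            r_lab = majority(right) if right else default
            err = sum(v != l_lab for v in left) + sum(v != r_lab for v in right)
            if err < best_err:
                best_err, best_rule = err, (f, t, l_lab, r_lab)
    f, t, l_lab, r_lab = best_rule

    def predict(row):
        return l_lab if row[f] <= t else r_lab

    return predict


def build_random_feature_ensemble(X, y, n_models, k, seed=0):
    """Train `n_models` stumps, each restricted to `k` randomly drawn features."""
    rng = random.Random(seed)
    n_features = len(X[0])
    return [train_stump(X, y, rng.sample(range(n_features), k))
            for _ in range(n_models)]


def ensemble_predict(models, row):
    """Fuse the members' decisions by simple majority voting (assumed rule)."""
    return majority([m(row) for m in models])


def coincident_failure_diversity(models, X, y):
    """Coincident-failure diversity (CFD) of an ensemble on labelled data.

    p[j] is the fraction of examples misclassified by exactly j of N members.
    CFD = sum_{j=1..N} (N - j) / (N - 1) * p[j] / (1 - p[0]), or 0 if no
    member ever fails (or N < 2).
    """
    n = len(models)
    fail_counts = [sum(m(row) != lbl for m in models) for row, lbl in zip(X, y)]
    p = [fail_counts.count(j) / len(y) for j in range(n + 1)]
    if n < 2 or p[0] == 1.0:
        return 0.0
    return sum((n - j) / (n - 1) * p[j] for j in range(1, n + 1)) / (1 - p[0])


if __name__ == "__main__":
    # Toy data, illustrative only: the label is 1 when feature 0 exceeds 1.
    X = [[0, 5, 2], [1, 3, 0], [2, 4, 1], [3, 1, 2], [0, 2, 2], [3, 0, 1]]
    y = [0, 0, 1, 1, 0, 1]
    members = build_random_feature_ensemble(X, y, n_models=5, k=2, seed=1)
    acc = sum(ensemble_predict(members, r) == t for r, t in zip(X, y)) / len(y)
    print("ensemble training accuracy:", acc)
    print("CFD:", coincident_failure_diversity(members, X, y))
```

Because CFD needs only the members' predictions on labelled data, the same function could be used to score candidate sub-ensembles and keep a small, more diverse subset, which is one possible way to act on the abstract's finding that small ensembles can match large ones when appropriate models are selected; the selection procedure itself is not described in the abstract.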


Metadata
Title
What influences the accuracy of decision tree ensembles?
Authors
Graeme Richards
Wenjia Wang
Publication date
01.12.2012
Publisher
Springer US
Published in
Journal of Intelligent Information Systems / Issue 3/2012
Print ISSN: 0925-9902
Electronic ISSN: 1573-7675
DOI
https://doi.org/10.1007/s10844-012-0206-7
