
2015 | Original Paper | Book Chapter

5. HEAD-DT: Experimental Analysis

Written by: Rodrigo C. Barros, André C. P. L. F. de Carvalho, Alex A. Freitas

Published in: Automatic Design of Decision-Tree Induction Algorithms

Publisher: Springer International Publishing


Abstract

In this chapter, we present several empirical analyses that assess the performance of HEAD-DT in different scenarios. We divide these analyses into two sets of experiments, according to the meta-training strategy employed for automatically designing the decision-tree algorithms. As mentioned in Chap. 4, HEAD-DT can operate in two distinct frameworks: (i) evolving a decision-tree induction algorithm tailored to one specific data set (specific framework); or (ii) evolving a decision-tree induction algorithm from multiple data sets (general framework). The specific framework provides data from a single data set to HEAD-DT for both algorithm design (evolution) and performance assessment. The experiments conducted for this scenario (see Sect. 5.1) make use of public data sets that do not share a common application domain. In the general framework, distinct data sets are used for algorithm design and performance assessment. In this scenario (see Sect. 5.2), we conduct two types of experiments, namely the homogeneous approach and the heterogeneous approach. In the homogeneous approach, we analyse whether automatically designing a decision-tree algorithm for a particular domain provides good results. More specifically, the data sets that feed HEAD-DT during evolution and those employed for performance assessment share a common application domain. In the heterogeneous approach, we investigate whether HEAD-DT is capable of generating an algorithm that performs well across a variety of data sets, regardless of their particular characteristics or application domain. We also discuss the theoretical and empirical time complexity of HEAD-DT in Sect. 5.3, and we briefly discuss the cost-effectiveness of automated algorithm design in Sect. 5.4. We present examples of algorithms that were automatically designed by HEAD-DT in Sect. 5.5. We conclude the experimental analysis by empirically verifying in Sect. 5.6 whether the genetic search is worthwhile.
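The difference between the two frameworks comes down to which data sets feed algorithm design and which are used for performance assessment. The following Python sketch illustrates that distinction only; it is not HEAD-DT code. The names `MetaSplit`, `specific_framework`, `general_framework`, and the parameter `n_design` are hypothetical, and the sketch does not model how the data inside a single data set are partitioned in the specific framework, since the abstract does not say.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class MetaSplit:
    """Hypothetical container: which data sets play which role."""
    design: List[str]      # data sets fed to the evolutionary search
    assessment: List[str]  # data sets used to assess the evolved algorithm


def specific_framework(dataset: str) -> MetaSplit:
    """Specific framework: one data set supplies data for both roles."""
    return MetaSplit(design=[dataset], assessment=[dataset])


def general_framework(datasets: Sequence[str], n_design: int) -> MetaSplit:
    """General framework: distinct data sets for design and assessment.

    The first n_design data sets are used for design and the rest for
    assessment (an assumed, purely illustrative way to split them).
    """
    if not 0 < n_design < len(datasets):
        raise ValueError("need at least one data set on each side of the split")
    return MetaSplit(
        design=list(datasets[:n_design]),
        assessment=list(datasets[n_design:]),
    )
```

In the homogeneous approach, every name passed to `general_framework` would come from one application domain; in the heterogeneous approach, the names would span unrelated domains. The split logic itself is the same in both cases.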


Footnotes
4
The term overfitting is not used because it refers to a model that overfits the data, whereas we are talking about an algorithm that “overfits” the data, in the sense that it is excellent on the data sets it was designed for but underperforms on previously unseen data sets.
 
Metadata
Title
HEAD-DT: Experimental Analysis
Written by
Rodrigo C. Barros
André C. P. L. F. de Carvalho
Alex A. Freitas
Copyright Year
2015
DOI
https://doi.org/10.1007/978-3-319-14231-9_5