Published in: International Journal of Machine Learning and Cybernetics 4/2013

01.08.2013 | Original Article

Performance of global–local hybrid ensemble versus boosting and bagging ensembles

Authors: Dustin Baumgartner, Gursel Serpen


Abstract

This study compares the classification performance of a hybrid ensemble, termed the global–local hybrid ensemble because it employs both local and global learners, against data-manipulation ensembles, namely bagging and boosting variants. A comprehensive simulation study is performed on 46 UCI machine learning repository data sets using the prediction accuracy and SAR performance metrics, along with rigorous statistical significance tests. The simulation results indicate that the global–local hybrid ensemble outperforms or ties with the bagging and boosting ensemble variants in all cases. This suggests that the global–local ensemble has a more robust performance profile, since its performance varies less across problem domains, or equivalently across data sets. This robustness is realized at the expense of increased complexity, since at least two types of learners, e.g. one global and one local, must be trained. A complementary diversity analysis in the classifier projection space, performed on select data sets for the global–local hybrid ensemble and for the base learners used in the bagging and boosting ensembles, both explains and supports the performance findings of this study.
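To make the global-versus-local distinction concrete, the following minimal Python sketch pairs a global learner (nearest-centroid, which summarizes each class by a single prototype fitted to all training data) with local learners (k-nearest neighbors, which decide from the points closest to the query) and combines them by plurality vote. The learners, toy data, and voting scheme here are illustrative assumptions for exposition, not the specific ensemble configuration evaluated in the paper.

```python
from collections import Counter
import math

def centroid_fit(X, y):
    """Global learner: fit one prototype (the per-class mean) over all data."""
    groups = {}
    for xi, yi in zip(X, y):
        groups.setdefault(yi, []).append(xi)
    return {c: tuple(sum(col) / len(pts) for col in zip(*pts))
            for c, pts in groups.items()}

def centroid_predict(centroids, x):
    """Assign x to the class whose prototype is nearest (a global decision)."""
    return min(centroids, key=lambda c: math.dist(x, centroids[c]))

def knn_predict(X, y, x, k):
    """Local learner: majority class among the k training points nearest x."""
    nearest = sorted(range(len(X)), key=lambda i: math.dist(x, X[i]))[:k]
    return Counter(y[i] for i in nearest).most_common(1)[0][0]

def hybrid_predict(X, y, centroids, x):
    """Global-local hybrid: plurality vote over one global, two local members."""
    votes = [centroid_predict(centroids, x),
             knn_predict(X, y, x, k=1),
             knn_predict(X, y, x, k=3)]
    return Counter(votes).most_common(1)[0][0]

# Toy two-class data: one cluster near the origin, one near (5, 5).
X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
y = [0, 0, 0, 1, 1, 1]
centroids = centroid_fit(X, y)
print(hybrid_predict(X, y, centroids, (0.5, 0.5)))  # -> 0
print(hybrid_predict(X, y, centroids, (5.5, 5.5)))  # -> 1
```

Bagging and boosting, by contrast, would resample or reweight the training data and train copies of a single base learner; the hybrid above instead gains diversity from mixing learner types, which is the design difference the study measures.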

Metadata
Title
Performance of global–local hybrid ensemble versus boosting and bagging ensembles
Authors
Dustin Baumgartner
Gursel Serpen
Publication date
01.08.2013
Publisher
Springer Berlin Heidelberg
Published in
International Journal of Machine Learning and Cybernetics / Issue 4/2013
Print ISSN: 1868-8071
Electronic ISSN: 1868-808X
DOI
https://doi.org/10.1007/s13042-012-0094-8
