Published in: Soft Computing 22/2019

11.01.2019 | Methodologies and Application

New feature selection and voting scheme to improve classification accuracy

Author: Cheng-Jung Tsai

Abstract

Classification is a classic data mining technique, and many ensemble learning methods have been introduced to improve its predictive accuracy. A typical ensemble learning method consists of three steps: selection, building, and integration. Of these, the first and third steps significantly affect the predictive accuracy of the classification. In this paper, we propose a new selection and integration scheme that improves the accuracy of the base subtrees while maintaining their diversity. A new voting scheme further improves the predictive accuracy of the ensemble. We also analyze the selection and integration steps of our method theoretically. Experimental results show that our method achieves better accuracy than two state-of-the-art tree-based ensemble learning approaches.
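
The abstract names the three-step pipeline but does not give the concrete selection or voting rules, so the sketch below only illustrates the generic scheme it describes: per-tree feature-subset selection, tree building, and accuracy-weighted majority voting. It uses scikit-learn; the subset size, the held-out weighting rule, and the dataset are assumptions for illustration, not the paper's actual method.

# A minimal sketch of a three-step tree ensemble (assumed details, not
# the paper's scheme): (1) selection -- draw a random feature subset per
# base tree, (2) building -- fit a decision tree on that subset,
# (3) integration -- accuracy-weighted majority voting.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
# Hold out part of the training data to estimate each tree's vote weight.
X_fit, X_val, y_fit, y_val = train_test_split(X_train, y_train, random_state=0)

n_trees, n_features = 25, X.shape[1]
trees, subsets, weights = [], [], []
for _ in range(n_trees):
    # Step 1 (selection): a random half of the features keeps base trees diverse.
    subset = rng.choice(n_features, size=n_features // 2, replace=False)
    # Step 2 (building): fit one tree on the selected features only.
    tree = DecisionTreeClassifier(max_depth=4, random_state=0)
    tree.fit(X_fit[:, subset], y_fit)
    trees.append(tree)
    subsets.append(subset)
    # Weight each tree by its held-out accuracy (an assumed weighting rule).
    weights.append(tree.score(X_val[:, subset], y_val))

# Step 3 (integration): weighted majority vote over the two classes.
votes = np.zeros((len(X_test), 2))
for tree, subset, w in zip(trees, subsets, weights):
    votes[np.arange(len(X_test)), tree.predict(X_test[:, subset])] += w
print("weighted-vote accuracy:", (votes.argmax(axis=1) == y_test).mean())

Weighting by a held-out accuracy estimate, rather than training accuracy, avoids every tree receiving a near-identical weight; any rule that rewards more accurate base trees while preserving diversity serves the same purpose.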


Metadata
Title
New feature selection and voting scheme to improve classification accuracy
Author
Cheng-Jung Tsai
Publication date
11.01.2019
Publisher
Springer Berlin Heidelberg
Published in
Soft Computing / Issue 22/2019
Print ISSN: 1432-7643
Electronic ISSN: 1433-7479
DOI
https://doi.org/10.1007/s00500-019-03757-2
