Published in: Soft Computing 1/2012

01-01-2012 | Original Paper

Parameter determination and feature selection for C4.5 algorithm using scatter search approach

Authors: Shih-Wei Lin, Shih-Chieh Chen


Abstract

The C4.5 decision tree (DT) is applied in many fields and produces knowledge in a form that people can readily understand. However, different problems typically require different parameter settings, and these are generally chosen by rule of thumb or trial and error, which may yield poor settings and unsatisfactory results. Moreover, although a dataset can contain numerous features, not all of them are beneficial for classification with the C4.5 algorithm. Therefore, a novel scatter search-based approach (SS + DT) is proposed to acquire optimal parameter settings and to select a beneficial subset of features that yields better classification results. The proposed SS + DT approach is evaluated on datasets from the UCI (University of California, Irvine) Machine Learning Repository. Experimental results demonstrate that the parameter settings for the C4.5 algorithm obtained by the SS + DT approach are better than those obtained by other approaches. When feature selection is considered, classification accuracy rates increase on most datasets. Therefore, the proposed approach can be used to identify effectively both the best parameter settings for the C4.5 algorithm and the useful features.
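
As a rough illustration of the idea summarised in the abstract, the Python sketch below runs a simplified scatter-search-style loop over candidate solutions that pair two C4.5 parameters with a feature mask. Everything specific here is an assumption made for illustration and is not taken from the paper: the encoding (confidence factor, minimum cases per leaf, Boolean feature mask), the parameter ranges, the combination rule, and the `toy_score` function, which merely stands in for a cross-validated C4.5 accuracy estimate. The sketch also omits the improvement method and the diversity tier of the reference set that a full scatter search would normally include.

```python
import random
from itertools import combinations

# Hypothetical search ranges (assumptions, not values from the paper).
CF_RANGE = (0.01, 0.50)      # C4.5 pruning confidence factor
MIN_CASES_RANGE = (1, 20)    # minimum number of cases per leaf


def _repair(mask, rng):
    """Ensure at least one feature is selected."""
    if any(mask):
        return mask
    i = rng.randrange(len(mask))
    return tuple(j == i for j in range(len(mask)))


def random_solution(n_features, rng):
    """Diversification step: draw one random (cf, min_cases, mask) candidate."""
    cf = rng.uniform(*CF_RANGE)
    m = rng.randint(*MIN_CASES_RANGE)
    mask = tuple(rng.random() < 0.5 for _ in range(n_features))
    return (cf, m, _repair(mask, rng))


def combine(a, b, rng):
    """Combination step: blend the parameters, mix the masks feature by feature."""
    w = rng.random()
    cf = w * a[0] + (1 - w) * b[0]
    m = round(w * a[1] + (1 - w) * b[1])
    mask = tuple(rng.choice((x, y)) for x, y in zip(a[2], b[2]))
    return (cf, m, _repair(mask, rng))


def scatter_search(evaluate, n_features, ref_size=6, pool_size=30,
                   iterations=20, seed=0):
    """Return the best candidate found; `evaluate` maps a candidate to a score."""
    rng = random.Random(seed)
    pool = [random_solution(n_features, rng) for _ in range(pool_size)]
    # Reference set: here simply the highest-scoring candidates.
    ref = sorted(pool, key=evaluate, reverse=True)[:ref_size]
    for _ in range(iterations):
        # Subset generation: all pairs of reference solutions.
        children = [combine(a, b, rng) for a, b in combinations(ref, 2)]
        merged = list(dict.fromkeys(ref + children))  # drop duplicates
        new_ref = sorted(merged, key=evaluate, reverse=True)[:ref_size]
        if new_ref == ref:  # reference set unchanged: stop
            break
        ref = new_ref
    return ref[0]


def toy_score(solution):
    """Stand-in objective; a real run would train and validate C4.5 here."""
    cf, m, mask = solution
    useful = {0, 2}  # hypothetical: only features 0 and 2 carry signal
    hits = sum(1 for i, on in enumerate(mask) if on and i in useful)
    noise = sum(1 for i, on in enumerate(mask) if on and i not in useful)
    return hits - 0.1 * noise - abs(cf - 0.25) - 0.01 * abs(m - 2)


best_cf, best_m, best_mask = scatter_search(toy_score, n_features=5)
print(best_cf, best_m, best_mask)
```

In practice, `evaluate` would wrap whatever C4.5 implementation is available and return an accuracy estimate for the given parameters and feature subset; the loop itself does not depend on that choice.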


Metadata
Title
Parameter determination and feature selection for C4.5 algorithm using scatter search approach
Authors
Shih-Wei Lin
Shih-Chieh Chen
Publication date
01-01-2012
Publisher
Springer-Verlag
Published in
Soft Computing / Issue 1/2012
Print ISSN: 1432-7643
Electronic ISSN: 1433-7479
DOI
https://doi.org/10.1007/s00500-011-0734-z
