Skip to main content
Top
Published in: Soft Computing 5/2015

01-05-2015 | Methodologies and Application

Instance selection by genetic-based biological algorithm

Authors: Zong-Yao Chen, Chih-Fong Tsai, William Eberle, Wei-Chao Lin, Shih-Wen Ke

Published in: Soft Computing | Issue 5/2015

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Instance selection is an important research problem of data pre-processing in the data mining field. The aim of instance selection is to reduce the data size by filtering out noisy data, which may degrade the mining performance, from a given dataset. Genetic algorithms have presented an effective instance selection approach to improve the performance of data mining algorithms. However, current approaches only pursue the simplest evolutionary process based on the most reasonable and simplest rules. In this paper, we introduce a novel instance selection algorithm, namely a genetic-based biological algorithm (GBA). GBA fits a “biological evolution” into the evolutionary process, where the most streamlined process also complies with the reasonable rules. In other words, after long-term evolution, organisms find the most efficient way to allocate resources and evolve. Consequently, we can closely simulate the natural evolution of an algorithm, such that the algorithm will be both efficient and effective. Our experiments are based on comparing GBA with five state-of-the-art algorithms over 50 different domain datasets from the UCI Machine Learning Repository. The experimental results demonstrate that GBA outperforms these baselines, providing the lowest classification error rate and the least storage requirement. Moreover, GBA is very computational efficient, which only requires slightly larger computational cost than GA.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Appendix
Available only for authorised users
Footnotes
2
The experimental environments are as follows: CPU: Intel(R) Core(TM) i7-3770 @ 3.40 GHz, RAN: 32 GB, OS: Windows 7–64bit, Code: Matlab R2012a.
 
Literature
go back to reference Aggarwal CC, Yu PC (2001) Outlier detection for high dimensional data. In: Proceedings of the ACM SIGMOD conference, pp 37–46 Aggarwal CC, Yu PC (2001) Outlier detection for high dimensional data. In: Proceedings of the ACM SIGMOD conference, pp 37–46
go back to reference Aha DW, Kibler D, Albert MK (1991) Instance-based learning algorithms. Mach Learn 6(1):37–66 Aha DW, Kibler D, Albert MK (1991) Instance-based learning algorithms. Mach Learn 6(1):37–66
go back to reference Ball P (2002) Natural strategies for the molecular engineer. Nanotechnology 13:R15–R28CrossRef Ball P (2002) Natural strategies for the molecular engineer. Nanotechnology 13:R15–R28CrossRef
go back to reference Barnett V, Lewis T (1994) Outliers in statistical data. Wiley, HobokenMATH Barnett V, Lewis T (1994) Outliers in statistical data. Wiley, HobokenMATH
go back to reference Brighton H, Mellish C (2002) Advances in instance selection for instance-based learning algorithms. Data Min Knowl Discov 6:153–172CrossRefMATHMathSciNet Brighton H, Mellish C (2002) Advances in instance selection for instance-based learning algorithms. Data Min Knowl Discov 6:153–172CrossRefMATHMathSciNet
go back to reference Cano JR, Herrera F, Lozano M (2003) Using evolutionary algorithms as instance selection for data reduction: an experimental study. IEEE Trans Evolut Comput 7(6):561–575CrossRef Cano JR, Herrera F, Lozano M (2003) Using evolutionary algorithms as instance selection for data reduction: an experimental study. IEEE Trans Evolut Comput 7(6):561–575CrossRef
go back to reference Derrac J, García S, Herrera F (2010) A survey on evolutionary instance selection and generation. Int J Appl Metaheur Comput 1(1):60–92CrossRef Derrac J, García S, Herrera F (2010) A survey on evolutionary instance selection and generation. Int J Appl Metaheur Comput 1(1):60–92CrossRef
go back to reference Ellstrand NC (2003) Dangerous liaisons: when cultivated plants mate with their wild relatives. Johns Hopkins University Press, Baltimore Ellstrand NC (2003) Dangerous liaisons: when cultivated plants mate with their wild relatives. Johns Hopkins University Press, Baltimore
go back to reference Flynn JJ, Wyss AR (1998) Recent advances in South American mammalian paleontology. Trends Eco Evol 13(11):449–454CrossRef Flynn JJ, Wyss AR (1998) Recent advances in South American mammalian paleontology. Trends Eco Evol 13(11):449–454CrossRef
go back to reference García-Pedrajas N, del Castillo JAR, Ortiz-Boyer D (2010) A cooperative coevolutionary algorithm for instance selection for instance-based learning. Mach Learn 78:381–420CrossRefMathSciNet García-Pedrajas N, del Castillo JAR, Ortiz-Boyer D (2010) A cooperative coevolutionary algorithm for instance selection for instance-based learning. Mach Learn 78:381–420CrossRefMathSciNet
go back to reference Guyonm I (2003) An Introduction to variable and feature selection. J Mach Learn Res 3:1157–1182 Guyonm I (2003) An Introduction to variable and feature selection. J Mach Learn Res 3:1157–1182
go back to reference Holland JH (1992) Adaptation in natural and artificial system: an introductory analysis with applications to biology, control, and artificial intelligence. A Bradford Book, Chester Holland JH (1992) Adaptation in natural and artificial system: an introductory analysis with applications to biology, control, and artificial intelligence. A Bradford Book, Chester
go back to reference Jain AK, Duin RPW, Mao J (2000) Statistical pattern recognition: a review. IEEE Trans Pattern Anal Mach Intell 22(1):4–37CrossRef Jain AK, Duin RPW, Mao J (2000) Statistical pattern recognition: a review. IEEE Trans Pattern Anal Mach Intell 22(1):4–37CrossRef
go back to reference Jankowski N, Grochowski M (2004) Comparison of instances selection algorithms I: algorithms survey. International conference on artificial intelligence and soft computing, pp 598–603 Jankowski N, Grochowski M (2004) Comparison of instances selection algorithms I: algorithms survey. International conference on artificial intelligence and soft computing, pp 598–603
go back to reference Jing SY (2013) A hybrid genetic algorithm for feature subset selection in rough set theory. Soft Comput 18:1373–1382 Jing SY (2013) A hybrid genetic algorithm for feature subset selection in rough set theory. Soft Comput 18:1373–1382
go back to reference Knorr EM, Ng R, Tucakov V (2000) Distance-based outliers: algorithms and applications. VLDB J 8:237–253CrossRef Knorr EM, Ng R, Tucakov V (2000) Distance-based outliers: algorithms and applications. VLDB J 8:237–253CrossRef
go back to reference Koepfli KP, Gompper ME, Eizirik E, Ho CC, Linden L, Maldonado JE, Wayne RK (2007) Phylogeny of the Procyonidae (Mammalia: Carvnivora): molecules, morphology and the Great American interchange. Mol Phylogenet Evol 43(3):1076–1095CrossRef Koepfli KP, Gompper ME, Eizirik E, Ho CC, Linden L, Maldonado JE, Wayne RK (2007) Phylogeny of the Procyonidae (Mammalia: Carvnivora): molecules, morphology and the Great American interchange. Mol Phylogenet Evol 43(3):1076–1095CrossRef
go back to reference Li X-B, Jacob VS (2008) Adaptive data reduction for large-scale transaction data. Eur J Oper Res 188(3):910–924CrossRefMATH Li X-B, Jacob VS (2008) Adaptive data reduction for large-scale transaction data. Eur J Oper Res 188(3):910–924CrossRefMATH
go back to reference Liu H, Motoda H (2001) Instance selection and construction for data mining. Kluwer, Boston Liu H, Motoda H (2001) Instance selection and construction for data mining. Kluwer, Boston
go back to reference Morgan GS (2002) Late Rancholabrean mammals from southernmost Florida and neotropical influence in Florida pleistocene faunas. Smithson Contrib Paleobiol 93:15–38 Morgan GS (2002) Late Rancholabrean mammals from southernmost Florida and neotropical influence in Florida pleistocene faunas. Smithson Contrib Paleobiol 93:15–38
go back to reference Nojima Y, Ishibuchi H, Kuwajima I (2009) Parallel distributed genetic fuzzy rule selection. Soft Comput 13:511–519CrossRef Nojima Y, Ishibuchi H, Kuwajima I (2009) Parallel distributed genetic fuzzy rule selection. Soft Comput 13:511–519CrossRef
go back to reference Odum HT (1994) Ecological and general systems: an introduction to systems ecology. University Press of Colorado, Niwot Odum HT (1994) Ecological and general systems: an introduction to systems ecology. University Press of Colorado, Niwot
go back to reference Pollan M (2001) The year in ideas. A-Z. Genetic pollution, The New York Times Pollan M (2001) The year in ideas. A-Z. Genetic pollution, The New York Times
go back to reference Pyle D (1999) Data preparation for data mining. Morgan Kaufmann, Burlington Pyle D (1999) Data preparation for data mining. Morgan Kaufmann, Burlington
go back to reference Pradhan S, Wu X (1999) Instance selection in data mining. Technical report. Department of Computer Science, University of Colorado at Boulder Pradhan S, Wu X (1999) Instance selection in data mining. Technical report. Department of Computer Science, University of Colorado at Boulder
go back to reference Quinlan JR (1986) Induction of decision trees. Mach Learn 1:81–106 Quinlan JR (1986) Induction of decision trees. Mach Learn 1:81–106
go back to reference Stern C (1962) Wilhelm Weinberg. Genetics 47:1–5 Stern C (1962) Wilhelm Weinberg. Genetics 47:1–5
go back to reference Uludağ G, Kiraz B, Etaner-Uyar AŞ, Özcan E (2013) A hybrid multi-population framework for dynamic environments combining online and offline learning. Soft Comput 17:2327–2348CrossRef Uludağ G, Kiraz B, Etaner-Uyar AŞ, Özcan E (2013) A hybrid multi-population framework for dynamic environments combining online and offline learning. Soft Comput 17:2327–2348CrossRef
go back to reference Wilson DL (1972) Asymptotic properties of nearest neighbor rules using edited data. IEEE Trans Syst Man Cybern 2(3):408–421CrossRefMATH Wilson DL (1972) Asymptotic properties of nearest neighbor rules using edited data. IEEE Trans Syst Man Cybern 2(3):408–421CrossRefMATH
go back to reference Wilson DR, Martinez TR (2000) Reduction techniques for instance-based learning algorithms. Mach Learn 38:257–286CrossRefMATH Wilson DR, Martinez TR (2000) Reduction techniques for instance-based learning algorithms. Mach Learn 38:257–286CrossRefMATH
go back to reference Xie XF, Liu J, Wang ZJ (2014) A cooperative group optimization system. Soft Comput 18:469–495CrossRef Xie XF, Liu J, Wang ZJ (2014) A cooperative group optimization system. Soft Comput 18:469–495CrossRef
Metadata
Title
Instance selection by genetic-based biological algorithm
Authors
Zong-Yao Chen
Chih-Fong Tsai
William Eberle
Wei-Chao Lin
Shih-Wen Ke
Publication date
01-05-2015
Publisher
Springer Berlin Heidelberg
Published in
Soft Computing / Issue 5/2015
Print ISSN: 1432-7643
Electronic ISSN: 1433-7479
DOI
https://doi.org/10.1007/s00500-014-1339-0

Other articles of this Issue 5/2015

Soft Computing 5/2015 Go to the issue

Premium Partner