Skip to main content
Erschienen in: Soft Computing 5/2015

01.05.2015 | Methodologies and Application

Instance selection by genetic-based biological algorithm

verfasst von: Zong-Yao Chen, Chih-Fong Tsai, William Eberle, Wei-Chao Lin, Shih-Wen Ke

Erschienen in: Soft Computing | Ausgabe 5/2015

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Instance selection is an important research problem of data pre-processing in the data mining field. The aim of instance selection is to reduce the data size by filtering out noisy data, which may degrade the mining performance, from a given dataset. Genetic algorithms have presented an effective instance selection approach to improve the performance of data mining algorithms. However, current approaches only pursue the simplest evolutionary process based on the most reasonable and simplest rules. In this paper, we introduce a novel instance selection algorithm, namely a genetic-based biological algorithm (GBA). GBA fits a “biological evolution” into the evolutionary process, where the most streamlined process also complies with the reasonable rules. In other words, after long-term evolution, organisms find the most efficient way to allocate resources and evolve. Consequently, we can closely simulate the natural evolution of an algorithm, such that the algorithm will be both efficient and effective. Our experiments are based on comparing GBA with five state-of-the-art algorithms over 50 different domain datasets from the UCI Machine Learning Repository. The experimental results demonstrate that GBA outperforms these baselines, providing the lowest classification error rate and the least storage requirement. Moreover, GBA is very computational efficient, which only requires slightly larger computational cost than GA.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Anhänge
Nur mit Berechtigung zugänglich
Fußnoten
2
The experimental environments are as follows: CPU: Intel(R) Core(TM) i7-3770 @ 3.40 GHz, RAN: 32 GB, OS: Windows 7–64bit, Code: Matlab R2012a.
 
Literatur
Zurück zum Zitat Aggarwal CC, Yu PC (2001) Outlier detection for high dimensional data. In: Proceedings of the ACM SIGMOD conference, pp 37–46 Aggarwal CC, Yu PC (2001) Outlier detection for high dimensional data. In: Proceedings of the ACM SIGMOD conference, pp 37–46
Zurück zum Zitat Aha DW, Kibler D, Albert MK (1991) Instance-based learning algorithms. Mach Learn 6(1):37–66 Aha DW, Kibler D, Albert MK (1991) Instance-based learning algorithms. Mach Learn 6(1):37–66
Zurück zum Zitat Ball P (2002) Natural strategies for the molecular engineer. Nanotechnology 13:R15–R28CrossRef Ball P (2002) Natural strategies for the molecular engineer. Nanotechnology 13:R15–R28CrossRef
Zurück zum Zitat Barnett V, Lewis T (1994) Outliers in statistical data. Wiley, HobokenMATH Barnett V, Lewis T (1994) Outliers in statistical data. Wiley, HobokenMATH
Zurück zum Zitat Brighton H, Mellish C (2002) Advances in instance selection for instance-based learning algorithms. Data Min Knowl Discov 6:153–172CrossRefMATHMathSciNet Brighton H, Mellish C (2002) Advances in instance selection for instance-based learning algorithms. Data Min Knowl Discov 6:153–172CrossRefMATHMathSciNet
Zurück zum Zitat Cano JR, Herrera F, Lozano M (2003) Using evolutionary algorithms as instance selection for data reduction: an experimental study. IEEE Trans Evolut Comput 7(6):561–575CrossRef Cano JR, Herrera F, Lozano M (2003) Using evolutionary algorithms as instance selection for data reduction: an experimental study. IEEE Trans Evolut Comput 7(6):561–575CrossRef
Zurück zum Zitat Derrac J, García S, Herrera F (2010) A survey on evolutionary instance selection and generation. Int J Appl Metaheur Comput 1(1):60–92CrossRef Derrac J, García S, Herrera F (2010) A survey on evolutionary instance selection and generation. Int J Appl Metaheur Comput 1(1):60–92CrossRef
Zurück zum Zitat Ellstrand NC (2003) Dangerous liaisons: when cultivated plants mate with their wild relatives. Johns Hopkins University Press, Baltimore Ellstrand NC (2003) Dangerous liaisons: when cultivated plants mate with their wild relatives. Johns Hopkins University Press, Baltimore
Zurück zum Zitat Flynn JJ, Wyss AR (1998) Recent advances in South American mammalian paleontology. Trends Eco Evol 13(11):449–454CrossRef Flynn JJ, Wyss AR (1998) Recent advances in South American mammalian paleontology. Trends Eco Evol 13(11):449–454CrossRef
Zurück zum Zitat García-Pedrajas N, del Castillo JAR, Ortiz-Boyer D (2010) A cooperative coevolutionary algorithm for instance selection for instance-based learning. Mach Learn 78:381–420CrossRefMathSciNet García-Pedrajas N, del Castillo JAR, Ortiz-Boyer D (2010) A cooperative coevolutionary algorithm for instance selection for instance-based learning. Mach Learn 78:381–420CrossRefMathSciNet
Zurück zum Zitat Guyonm I (2003) An Introduction to variable and feature selection. J Mach Learn Res 3:1157–1182 Guyonm I (2003) An Introduction to variable and feature selection. J Mach Learn Res 3:1157–1182
Zurück zum Zitat Holland JH (1992) Adaptation in natural and artificial system: an introductory analysis with applications to biology, control, and artificial intelligence. A Bradford Book, Chester Holland JH (1992) Adaptation in natural and artificial system: an introductory analysis with applications to biology, control, and artificial intelligence. A Bradford Book, Chester
Zurück zum Zitat Jain AK, Duin RPW, Mao J (2000) Statistical pattern recognition: a review. IEEE Trans Pattern Anal Mach Intell 22(1):4–37CrossRef Jain AK, Duin RPW, Mao J (2000) Statistical pattern recognition: a review. IEEE Trans Pattern Anal Mach Intell 22(1):4–37CrossRef
Zurück zum Zitat Jankowski N, Grochowski M (2004) Comparison of instances selection algorithms I: algorithms survey. International conference on artificial intelligence and soft computing, pp 598–603 Jankowski N, Grochowski M (2004) Comparison of instances selection algorithms I: algorithms survey. International conference on artificial intelligence and soft computing, pp 598–603
Zurück zum Zitat Jing SY (2013) A hybrid genetic algorithm for feature subset selection in rough set theory. Soft Comput 18:1373–1382 Jing SY (2013) A hybrid genetic algorithm for feature subset selection in rough set theory. Soft Comput 18:1373–1382
Zurück zum Zitat Knorr EM, Ng R, Tucakov V (2000) Distance-based outliers: algorithms and applications. VLDB J 8:237–253CrossRef Knorr EM, Ng R, Tucakov V (2000) Distance-based outliers: algorithms and applications. VLDB J 8:237–253CrossRef
Zurück zum Zitat Koepfli KP, Gompper ME, Eizirik E, Ho CC, Linden L, Maldonado JE, Wayne RK (2007) Phylogeny of the Procyonidae (Mammalia: Carvnivora): molecules, morphology and the Great American interchange. Mol Phylogenet Evol 43(3):1076–1095CrossRef Koepfli KP, Gompper ME, Eizirik E, Ho CC, Linden L, Maldonado JE, Wayne RK (2007) Phylogeny of the Procyonidae (Mammalia: Carvnivora): molecules, morphology and the Great American interchange. Mol Phylogenet Evol 43(3):1076–1095CrossRef
Zurück zum Zitat Li X-B, Jacob VS (2008) Adaptive data reduction for large-scale transaction data. Eur J Oper Res 188(3):910–924CrossRefMATH Li X-B, Jacob VS (2008) Adaptive data reduction for large-scale transaction data. Eur J Oper Res 188(3):910–924CrossRefMATH
Zurück zum Zitat Liu H, Motoda H (2001) Instance selection and construction for data mining. Kluwer, Boston Liu H, Motoda H (2001) Instance selection and construction for data mining. Kluwer, Boston
Zurück zum Zitat Morgan GS (2002) Late Rancholabrean mammals from southernmost Florida and neotropical influence in Florida pleistocene faunas. Smithson Contrib Paleobiol 93:15–38 Morgan GS (2002) Late Rancholabrean mammals from southernmost Florida and neotropical influence in Florida pleistocene faunas. Smithson Contrib Paleobiol 93:15–38
Zurück zum Zitat Nojima Y, Ishibuchi H, Kuwajima I (2009) Parallel distributed genetic fuzzy rule selection. Soft Comput 13:511–519CrossRef Nojima Y, Ishibuchi H, Kuwajima I (2009) Parallel distributed genetic fuzzy rule selection. Soft Comput 13:511–519CrossRef
Zurück zum Zitat Odum HT (1994) Ecological and general systems: an introduction to systems ecology. University Press of Colorado, Niwot Odum HT (1994) Ecological and general systems: an introduction to systems ecology. University Press of Colorado, Niwot
Zurück zum Zitat Pollan M (2001) The year in ideas. A-Z. Genetic pollution, The New York Times Pollan M (2001) The year in ideas. A-Z. Genetic pollution, The New York Times
Zurück zum Zitat Pyle D (1999) Data preparation for data mining. Morgan Kaufmann, Burlington Pyle D (1999) Data preparation for data mining. Morgan Kaufmann, Burlington
Zurück zum Zitat Pradhan S, Wu X (1999) Instance selection in data mining. Technical report. Department of Computer Science, University of Colorado at Boulder Pradhan S, Wu X (1999) Instance selection in data mining. Technical report. Department of Computer Science, University of Colorado at Boulder
Zurück zum Zitat Quinlan JR (1986) Induction of decision trees. Mach Learn 1:81–106 Quinlan JR (1986) Induction of decision trees. Mach Learn 1:81–106
Zurück zum Zitat Stern C (1962) Wilhelm Weinberg. Genetics 47:1–5 Stern C (1962) Wilhelm Weinberg. Genetics 47:1–5
Zurück zum Zitat Uludağ G, Kiraz B, Etaner-Uyar AŞ, Özcan E (2013) A hybrid multi-population framework for dynamic environments combining online and offline learning. Soft Comput 17:2327–2348CrossRef Uludağ G, Kiraz B, Etaner-Uyar AŞ, Özcan E (2013) A hybrid multi-population framework for dynamic environments combining online and offline learning. Soft Comput 17:2327–2348CrossRef
Zurück zum Zitat Wilson DL (1972) Asymptotic properties of nearest neighbor rules using edited data. IEEE Trans Syst Man Cybern 2(3):408–421CrossRefMATH Wilson DL (1972) Asymptotic properties of nearest neighbor rules using edited data. IEEE Trans Syst Man Cybern 2(3):408–421CrossRefMATH
Zurück zum Zitat Wilson DR, Martinez TR (2000) Reduction techniques for instance-based learning algorithms. Mach Learn 38:257–286CrossRefMATH Wilson DR, Martinez TR (2000) Reduction techniques for instance-based learning algorithms. Mach Learn 38:257–286CrossRefMATH
Zurück zum Zitat Xie XF, Liu J, Wang ZJ (2014) A cooperative group optimization system. Soft Comput 18:469–495CrossRef Xie XF, Liu J, Wang ZJ (2014) A cooperative group optimization system. Soft Comput 18:469–495CrossRef
Metadaten
Titel
Instance selection by genetic-based biological algorithm
verfasst von
Zong-Yao Chen
Chih-Fong Tsai
William Eberle
Wei-Chao Lin
Shih-Wen Ke
Publikationsdatum
01.05.2015
Verlag
Springer Berlin Heidelberg
Erschienen in
Soft Computing / Ausgabe 5/2015
Print ISSN: 1432-7643
Elektronische ISSN: 1433-7479
DOI
https://doi.org/10.1007/s00500-014-1339-0

Weitere Artikel der Ausgabe 5/2015

Soft Computing 5/2015 Zur Ausgabe

Premium Partner