Skip to main content
Erschienen in: Evolutionary Intelligence 2-3/2015

01.09.2015 | Special Issue

ExSTraCS 2.0: description and evaluation of a scalable learning classifier system

verfasst von: Ryan J. Urbanowicz, Jason H. Moore

Erschienen in: Evolutionary Intelligence | Ausgabe 2-3/2015

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Algorithmic scalability is a major concern for any machine learning strategy in this age of ‘big data’. A large number of potentially predictive attributes is emblematic of problems in bioinformatics, genetic epidemiology, and many other fields. Previously, ExSTraCS was introduced as an extended Michigan-style supervised learning classifier system that combined a set of powerful heuristics to successfully tackle the challenges of classification, prediction, and knowledge discovery in complex, noisy, and heterogeneous problem domains. While Michigan-style learning classifier systems are powerful and flexible learners, they are not considered to be particularly scalable. For the first time, this paper presents a complete description of the ExSTraCS algorithm and introduces an effective strategy to dramatically improve learning classifier system scalability. ExSTraCS 2.0 addresses scalability with (1) a rule specificity limit, (2) new approaches to expert knowledge guided covering and mutation mechanisms, and (3) the implementation and utilization of the TuRF algorithm for improving the quality of expert knowledge discovery in larger datasets. Performance over a complex spectrum of simulated genetic datasets demonstrated that these new mechanisms dramatically improve nearly every performance metric on datasets with 20 attributes and made it possible for ExSTraCS to reliably scale up to perform on related 200 and 2000-attribute datasets. ExSTraCS 2.0 was also able to reliably solve the 6, 11, 20, 37, 70, and 135 multiplexer problems, and did so in similar or fewer learning iterations than previously reported, with smaller finite training sets, and without using building blocks discovered from simpler multiplexer problems. Furthermore, ExSTraCS usability was made simpler through the elimination of previously critical run parameters.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Bacardit J, Burke E, Krasnogor N (2009) Improving the scalability of rule-based evolutionary learning. Memet Comput 1(1):55–67CrossRef Bacardit J, Burke E, Krasnogor N (2009) Improving the scalability of rule-based evolutionary learning. Memet Comput 1(1):55–67CrossRef
2.
Zurück zum Zitat Bacardit J, Goldberg D, Butz M, Llora X, Garrell J (2004) Speeding-up pittsburgh lcss: modeling time and accuracy. In: Parallel problem solving from nature-PPSN VIII. Springer, New York, , pp 1021–1031 Bacardit J, Goldberg D, Butz M, Llora X, Garrell J (2004) Speeding-up pittsburgh lcss: modeling time and accuracy. In: Parallel problem solving from nature-PPSN VIII. Springer, New York, , pp 1021–1031
3.
Zurück zum Zitat Bacardit J, Krasnogor N (2009) A mixed discrete-continuous attribute list representation for large scale classification domains. In: Proceedings of the 11th annual conference on genetic and evolutionary computation. ACM, New York, pp 1155–1162 Bacardit J, Krasnogor N (2009) A mixed discrete-continuous attribute list representation for large scale classification domains. In: Proceedings of the 11th annual conference on genetic and evolutionary computation. ACM, New York, pp 1155–1162
4.
Zurück zum Zitat Bacardit J, Krasnogor N (2009) Performance and efficiency of memetic pittsburgh learning classifier systems. Evol comput 17(3):307–342CrossRef Bacardit J, Krasnogor N (2009) Performance and efficiency of memetic pittsburgh learning classifier systems. Evol comput 17(3):307–342CrossRef
5.
Zurück zum Zitat Bacardit J, Llorà X (2013) Large-scale data mining using genetics-based machine learning. Wiley Interdiscip Rev Data Min Knowl Discov 3(1):37–61CrossRef Bacardit J, Llorà X (2013) Large-scale data mining using genetics-based machine learning. Wiley Interdiscip Rev Data Min Knowl Discov 3(1):37–61CrossRef
6.
Zurück zum Zitat Bacardit J, Stout M, Hirst J, Sastry K, Llorà X, Krasnogor N (2007) Automated alphabet reduction method with evolutionary algorithms for protein structure prediction. In: Proceedings of the 9th annual conference on genetic and evolutionary computation. ACM, New York, pp 346–353 Bacardit J, Stout M, Hirst J, Sastry K, Llorà X, Krasnogor N (2007) Automated alphabet reduction method with evolutionary algorithms for protein structure prediction. In: Proceedings of the 9th annual conference on genetic and evolutionary computation. ACM, New York, pp 346–353
7.
Zurück zum Zitat Bagallo G, Haussler D (1990) Boolean feature discovery in empirical learning. Machine Learn 5(1):71–99CrossRef Bagallo G, Haussler D (1990) Boolean feature discovery in empirical learning. Machine Learn 5(1):71–99CrossRef
8.
Zurück zum Zitat Barto AG, Anandan P, Anderson CW (1986) Cooperativity in networks of pattern recognizing stochastic learning automata. In: Adaptive and learning systems. Springer, New York, pp 235–246 Barto AG, Anandan P, Anderson CW (1986) Cooperativity in networks of pattern recognizing stochastic learning automata. In: Adaptive and learning systems. Springer, New York, pp 235–246
9.
Zurück zum Zitat Bernadó-Mansilla E, Garrell-Guiu J (2003) Accuracy-based learning classifier system: models, analysis and applications to classification tasks. Evol Comput 11(3):209–238CrossRef Bernadó-Mansilla E, Garrell-Guiu J (2003) Accuracy-based learning classifier system: models, analysis and applications to classification tasks. Evol Comput 11(3):209–238CrossRef
10.
Zurück zum Zitat Bull L, Studley M, Bagnall A, Whittley I (2007) Learning classifier system ensembles with rule-sharing. IEEE Trans Evol Comput 11(4):496–502CrossRef Bull L, Studley M, Bagnall A, Whittley I (2007) Learning classifier system ensembles with rule-sharing. IEEE Trans Evol Comput 11(4):496–502CrossRef
11.
Zurück zum Zitat Butz MV (2006) Xcs in binary classification problems. In: Rule-based evolutionary online learning systems. Springer, New York, pp 147–156 Butz MV (2006) Xcs in binary classification problems. In: Rule-based evolutionary online learning systems. Springer, New York, pp 147–156
12.
Zurück zum Zitat Butz MV, Goldberg DE, Tharakunnel K (2003) Analysis and improvement of fitness exploitation in xcs: bounding models, tournament selection, and bilateral accuracy. Evol Comput 11(3):239–277CrossRef Butz MV, Goldberg DE, Tharakunnel K (2003) Analysis and improvement of fitness exploitation in xcs: bounding models, tournament selection, and bilateral accuracy. Evol Comput 11(3):239–277CrossRef
13.
Zurück zum Zitat Butz MV, Kovacs T, Lanzi PL, Wilson SW (2004) Toward a theory of generalization and learning in xcs. IEEE Trans Evol Comput 8(1):28–46CrossRef Butz MV, Kovacs T, Lanzi PL, Wilson SW (2004) Toward a theory of generalization and learning in xcs. IEEE Trans Evol Comput 8(1):28–46CrossRef
14.
Zurück zum Zitat Butz MV, Pelikan M (2006) Studying xcs/boa learning in boolean functions: structure encoding and random boolean functions. In: Proceedings of the 8th annual conference on genetic and evolutionary computation. ACM, New York, pp 1449–1456 Butz MV, Pelikan M (2006) Studying xcs/boa learning in boolean functions: structure encoding and random boolean functions. In: Proceedings of the 8th annual conference on genetic and evolutionary computation. ACM, New York, pp 1449–1456
15.
Zurück zum Zitat Butz MV, Sastry K, Goldberg DE (2003) Tournament selection: stable fitness pressure in xcs. In: Genetic and evolutionary computation GECCO 2003. Springer, New York, pp 1857–1869 Butz MV, Sastry K, Goldberg DE (2003) Tournament selection: stable fitness pressure in xcs. In: Genetic and evolutionary computation GECCO 2003. Springer, New York, pp 1857–1869
16.
Zurück zum Zitat DeJong KA, Spears WM (1990) Learning concept classification rules using genetic algorithms. Tech. rep, DTIC document DeJong KA, Spears WM (1990) Learning concept classification rules using genetic algorithms. Tech. rep, DTIC document
17.
Zurück zum Zitat Franco M, Krasnogor N, Bacardit J (2010) Speeding up the evaluation of evolutionary learning systems using gpgpus. In: Proceedings of the 12th annual conference on genetic and evolutionary computation. ACM, New York, pp 1039–1046 Franco M, Krasnogor N, Bacardit J (2010) Speeding up the evaluation of evolutionary learning systems using gpgpus. In: Proceedings of the 12th annual conference on genetic and evolutionary computation. ACM, New York, pp 1039–1046
18.
Zurück zum Zitat Granizo-Mackenzie D, Moore JH (2013) Multiple threshold spatially uniform relieff for the genetic analysis of complex human diseases. In: Evolutionary computation, machine learning and data mining in bioinformatics. Springer, New York, pp 1–10 Granizo-Mackenzie D, Moore JH (2013) Multiple threshold spatially uniform relieff for the genetic analysis of complex human diseases. In: Evolutionary computation, machine learning and data mining in bioinformatics. Springer, New York, pp 1–10
19.
Zurück zum Zitat Greene C, Penrod N, Kiralis J, Moore J (2009) Spatially uniform relieff (surf) for computationally-efficient filtering of gene-gene interactions. BioData Mining 2(1):1–9CrossRef Greene C, Penrod N, Kiralis J, Moore J (2009) Spatially uniform relieff (surf) for computationally-efficient filtering of gene-gene interactions. BioData Mining 2(1):1–9CrossRef
20.
Zurück zum Zitat Greene CS, Himmelstein DS, Kiralis J, Moore JH (2010) The informative extremes: using both nearest and farthest individuals can improve relief algorithms in the domain of human genetics. In: Evolutionary computation, machine learning and data mining in bioinformatics. Springer, New York, pp 182–193 Greene CS, Himmelstein DS, Kiralis J, Moore JH (2010) The informative extremes: using both nearest and farthest individuals can improve relief algorithms in the domain of human genetics. In: Evolutionary computation, machine learning and data mining in bioinformatics. Springer, New York, pp 182–193
21.
Zurück zum Zitat Higuchi T, Niwa T, Tanaka T, Iba H, de Garis H, Furuya T (1993) Evolvable hardware-genetic-based generation of electric circuitry at gate and hardware description language (hdl) levels. Electrotechnical Laboratory, Tsukuba, Japan, pp 93–94 Higuchi T, Niwa T, Tanaka T, Iba H, de Garis H, Furuya T (1993) Evolvable hardware-genetic-based generation of electric circuitry at gate and hardware description language (hdl) levels. Electrotechnical Laboratory, Tsukuba, Japan, pp 93–94
22.
Zurück zum Zitat Iba H, De Garis H, Sato T (1994) Genetic programming using a minimum description length principle. Adv Genet Progr 1:265–284 Iba H, De Garis H, Sato T (1994) Genetic programming using a minimum description length principle. Adv Genet Progr 1:265–284
23.
Zurück zum Zitat Iqbal M, Browne WN, Zhang M (2012) Reusing building blocks of extracted knowledge to solve complex, large-scale boolean problems Iqbal M, Browne WN, Zhang M (2012) Reusing building blocks of extracted knowledge to solve complex, large-scale boolean problems
24.
Zurück zum Zitat Iqbal M, Browne WN, Zhang M (2013) Extending learning classifier system with cyclic graphs for scalability on complex, large-scale boolean problems. In: Proceeding of the 15th annual conference on genetic and evolutionary computation conference. ACM, New York, pp 1045–1052 Iqbal M, Browne WN, Zhang M (2013) Extending learning classifier system with cyclic graphs for scalability on complex, large-scale boolean problems. In: Proceeding of the 15th annual conference on genetic and evolutionary computation conference. ACM, New York, pp 1045–1052
25.
Zurück zum Zitat Kononenko I (1994) Estimating attributes: analysis and extensions of relief. In: Machine learning: ECML-94. Springer, New York, pp 171–182 Kononenko I (1994) Estimating attributes: analysis and extensions of relief. In: Machine learning: ECML-94. Springer, New York, pp 171–182
26.
Zurück zum Zitat Lanzi PL, Loiacono D (2010) Speeding up matching in learning classifier systems using cuda. In: Learning classifier systems. Springer, New York, pp 1–20 Lanzi PL, Loiacono D (2010) Speeding up matching in learning classifier systems using cuda. In: Learning classifier systems. Springer, New York, pp 1–20
27.
Zurück zum Zitat Llorà X, Sastry K (2006) Fast rule matching for lcss via vector instructions. In: Proceedings of the 8th annual conference on genetic and evolutionary computation. ACM, New York, pp 1513–1520 Llorà X, Sastry K (2006) Fast rule matching for lcss via vector instructions. In: Proceedings of the 8th annual conference on genetic and evolutionary computation. ACM, New York, pp 1513–1520
28.
Zurück zum Zitat Moore J, White B (2007) Tuning relieff for genome-wide genetic analysis. Evolutionary computation, machine learning and data mining in bioinformatics, pp 166–175 Moore J, White B (2007) Tuning relieff for genome-wide genetic analysis. Evolutionary computation, machine learning and data mining in bioinformatics, pp 166–175
29.
Zurück zum Zitat Moore JH (2015) Epistasis analysis using relieff. In: Epistasis. Springer, New York, pp 315–325 Moore JH (2015) Epistasis analysis using relieff. In: Epistasis. Springer, New York, pp 315–325
30.
Zurück zum Zitat Moore JH, Asselbergs FW, Williams SM (2010) Bioinformatics challenges for genome-wide association studies. Bioinformatics 26(4):445–455CrossRef Moore JH, Asselbergs FW, Williams SM (2010) Bioinformatics challenges for genome-wide association studies. Bioinformatics 26(4):445–455CrossRef
31.
Zurück zum Zitat Moore JH, Ritchie MD (2004) The challenges of whole-genome approaches to common diseases. JAMA 291(13):1642–1643CrossRef Moore JH, Ritchie MD (2004) The challenges of whole-genome approaches to common diseases. JAMA 291(13):1642–1643CrossRef
32.
Zurück zum Zitat Quinlan JR (1988) An empirical comparison of genetic and decision-tree classifiers. In: ML, pp 135–141 Quinlan JR (1988) An empirical comparison of genetic and decision-tree classifiers. In: ML, pp 135–141
33.
Zurück zum Zitat Rudd J, Moore JH, Urbanowicz RJ (2013) A multi-core parallelization strategy for statistical significance testing in learning classifier systems. Evol Intell 6(2):127–134CrossRef Rudd J, Moore JH, Urbanowicz RJ (2013) A multi-core parallelization strategy for statistical significance testing in learning classifier systems. Evol Intell 6(2):127–134CrossRef
34.
Zurück zum Zitat Smith RE, Cribbs HB III (1994) Is a learning classifier system a type of neural network? Evol Comput 2(1):19–36CrossRef Smith RE, Cribbs HB III (1994) Is a learning classifier system a type of neural network? Evol Comput 2(1):19–36CrossRef
35.
Zurück zum Zitat Tan J, Moore J, Urbanowicz R (2013) Rapid rule compaction strategies for global knowledge discovery in a supervised learning classifier system. Adv Artif Life, ECAL 12:110–117 Tan J, Moore J, Urbanowicz R (2013) Rapid rule compaction strategies for global knowledge discovery in a supervised learning classifier system. Adv Artif Life, ECAL 12:110–117
37.
Zurück zum Zitat Urbanowicz R, Granizo-Mackenzie A, Moore J (2012) Instance-linked attribute tracking and feedback for michigan-style supervised learning classifier systems. In: Proceedings of the fourteenth international conference on genetic and evolutionary computation conference. ACM, New York, pp 927–934 Urbanowicz R, Granizo-Mackenzie A, Moore J (2012) Instance-linked attribute tracking and feedback for michigan-style supervised learning classifier systems. In: Proceedings of the fourteenth international conference on genetic and evolutionary computation conference. ACM, New York, pp 927–934
38.
Zurück zum Zitat Urbanowicz R, Moore J (2010) The application of michigan-style learning classifier systems to address genetic heterogeneity and epistasis in association studies. In: Proceedings of the 12th annual conference on genetic and evolutionary computation. ACM, New York, pp 195–202 Urbanowicz R, Moore J (2010) The application of michigan-style learning classifier systems to address genetic heterogeneity and epistasis in association studies. In: Proceedings of the 12th annual conference on genetic and evolutionary computation. ACM, New York, pp 195–202
39.
Zurück zum Zitat Urbanowicz R, Moore J (2011) The application of pittsburgh-style lcs to address genetic heterogeneity and epistasis in association studies. Parallel problem solving from nature-PPSN XI, pp 404–413 Urbanowicz R, Moore J (2011) The application of pittsburgh-style lcs to address genetic heterogeneity and epistasis in association studies. Parallel problem solving from nature-PPSN XI, pp 404–413
40.
Zurück zum Zitat Urbanowicz RJ, Andrew AS, Karagas MR, Moore JH (2013) Role of genetic heterogeneity and epistasis in bladder cancer susceptibility and outcome: a learning classifier system approach. J Am Med Inf Assoc Urbanowicz RJ, Andrew AS, Karagas MR, Moore JH (2013) Role of genetic heterogeneity and epistasis in bladder cancer susceptibility and outcome: a learning classifier system approach. J Am Med Inf Assoc
41.
Zurück zum Zitat Urbanowicz RJ, Bertasius G, Moore JH (2014) An extended michigan-style learning classifier system for flexible supervised learning, classification, and data mining. In: Parallel problem solving from nature-PPSN XIII. Springer, New York Urbanowicz RJ, Bertasius G, Moore JH (2014) An extended michigan-style learning classifier system for flexible supervised learning, classification, and data mining. In: Parallel problem solving from nature-PPSN XIII. Springer, New York
42.
Zurück zum Zitat Urbanowicz RJ, Granizo-Mackenzie A, Moore JH (2012) An analysis pipeline with statistical and visualization-guided knowledge discovery for michigan-style learning classifier systems. IEEE Comput Intell Mag 7(4):35–45CrossRef Urbanowicz RJ, Granizo-Mackenzie A, Moore JH (2012) An analysis pipeline with statistical and visualization-guided knowledge discovery for michigan-style learning classifier systems. IEEE Comput Intell Mag 7(4):35–45CrossRef
43.
Zurück zum Zitat Urbanowicz RJ, Granizo-Mackenzie D, Moore JH (2012) Using expert knowledge to guide covering and mutation in a michigan style learning classifier system to detect epistasis and heterogeneity. In: Parallel problem solving from nature-PPSN XII. Springer, New York, pp 266–275 Urbanowicz RJ, Granizo-Mackenzie D, Moore JH (2012) Using expert knowledge to guide covering and mutation in a michigan style learning classifier system to detect epistasis and heterogeneity. In: Parallel problem solving from nature-PPSN XII. Springer, New York, pp 266–275
44.
Zurück zum Zitat Urbanowicz RJ, Kiralis J, Fisher JM, Moore JH (2012) Predicting the difficulty of pure, strict, epistatic models: metrics for simulated model selection. BioData Min 5(1):1–13CrossRef Urbanowicz RJ, Kiralis J, Fisher JM, Moore JH (2012) Predicting the difficulty of pure, strict, epistatic models: metrics for simulated model selection. BioData Min 5(1):1–13CrossRef
45.
Zurück zum Zitat Urbanowicz RJ, Kiralis J, Sinnott-Armstrong NA, Heberling T, Fisher JM, Moore JH (2012) Gametes: a fast, direct algorithm for generating pure, strict, epistatic models with random architectures. BioData Min 5(1):16CrossRef Urbanowicz RJ, Kiralis J, Sinnott-Armstrong NA, Heberling T, Fisher JM, Moore JH (2012) Gametes: a fast, direct algorithm for generating pure, strict, epistatic models with random architectures. BioData Min 5(1):16CrossRef
46.
Zurück zum Zitat Urbanowicz RJ, Moore JH (2009) Learning classifier systems: a complete introduction, review, and roadmap. J Artif Evol Appl Urbanowicz RJ, Moore JH (2009) Learning classifier systems: a complete introduction, review, and roadmap. J Artif Evol Appl
47.
Zurück zum Zitat Velez DR, White BC, Motsinger AA, Bush WS, Ritchie MD, Williams SM, Moore JH (2007) A balanced accuracy function for epistasis modeling in imbalanced datasets using multifactor dimensionality reduction. Genet Epidemiol 31(4):306–315CrossRef Velez DR, White BC, Motsinger AA, Bush WS, Ritchie MD, Williams SM, Moore JH (2007) A balanced accuracy function for epistasis modeling in imbalanced datasets using multifactor dimensionality reduction. Genet Epidemiol 31(4):306–315CrossRef
48.
Zurück zum Zitat Wilson S (1995) Classifier fitness based on accuracy. Evol Comput 3(2):149–175CrossRef Wilson S (1995) Classifier fitness based on accuracy. Evol Comput 3(2):149–175CrossRef
49.
Zurück zum Zitat Wilson SW (1987) Classifier systems and the animat problem. Mach Learn 2(3):199–228 Wilson SW (1987) Classifier systems and the animat problem. Mach Learn 2(3):199–228
Metadaten
Titel
ExSTraCS 2.0: description and evaluation of a scalable learning classifier system
verfasst von
Ryan J. Urbanowicz
Jason H. Moore
Publikationsdatum
01.09.2015
Verlag
Springer Berlin Heidelberg
Erschienen in
Evolutionary Intelligence / Ausgabe 2-3/2015
Print ISSN: 1864-5909
Elektronische ISSN: 1864-5917
DOI
https://doi.org/10.1007/s12065-015-0128-8

Weitere Artikel der Ausgabe 2-3/2015

Evolutionary Intelligence 2-3/2015 Zur Ausgabe