ABSTRACT
Datasets with a large number of attributes pose a difficult challenge for evolutionary learning techniques. The recently proposed attribute list rule representation has been shown to significantly improve the overall performance (e.g. run-time, accuracy, rule set size) of the BioHEL Iterative Evolutionary Rule Learning system. In this paper we first extend the attribute list rule representation so that it can handle not only continuous domains but also datasets with a very large number of mixed discrete-continuous attributes. Second, we benchmark the new representation on a diverse set of large-scale datasets and, third, we compare the new algorithms with several well-known machine learning methods. The experimental results show that the new representation is equal to or better than the state of the art in evolutionary rule representations, both in terms of the accuracy obtained on the benchmark datasets and in terms of the computational time needed to achieve these accuracies. The new attribute list representation puts BioHEL on an equal footing with other well-established machine learning techniques in terms of accuracy. We also analyse and discuss the weaknesses of the current representation and indicate potential avenues for correcting them.