Abstract
This paper presents a dual-objective evolutionary algorithm (DOEA) for extracting multiple decision rule lists in data mining, with the aim of satisfying two classification criteria: high accuracy and ease of user comprehension. Unlike existing approaches, the algorithm incorporates the concept of Pareto dominance to evolve a set of non-dominated decision rule lists, each with a different classification accuracy and number of rules over a specified range. The classification results of DOEA are analyzed and compared with those of existing rule-based and non-rule-based classifiers on 8 test problems obtained from the UCI Machine Learning Repository. It is shown that DOEA produces comprehensible rules with classification accuracy that is competitive with many methods in the literature. Box plots and t-tests are further used to examine its invariance to random partitioning of the datasets.
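The Pareto dominance idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes each candidate rule list is summarized by just two numbers, classification accuracy (to be maximized) and number of rules (to be minimized). The names `dominates` and `non_dominated` and the sample values are hypothetical, chosen for illustration only; this is not the authors' implementation, and it omits the evolutionary search itself.

```python
from typing import List, Tuple

# A candidate rule list summarized as (accuracy, number_of_rules).
# This two-number summary is an assumption made for illustration.
Candidate = Tuple[float, int]


def dominates(a: Candidate, b: Candidate) -> bool:
    """Return True if a Pareto-dominates b.

    a dominates b when it is no worse on both objectives
    (accuracy at least as high, rule count at least as low)
    and strictly better on at least one of them.
    """
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def non_dominated(cands: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]


# Made-up sample values, not results from the paper.
candidates = [(0.95, 6), (0.93, 3), (0.90, 3), (0.95, 8), (0.80, 1)]
front = non_dominated(candidates)
print(front)
```

In this toy example the surviving set trades accuracy against rule count: no remaining candidate is both more accurate and shorter than another, which is the kind of choice a non-dominated set offers to the user.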
Additional information
An erratum to this article is available at http://dx.doi.org/10.1007/s10589-006-9594-3.
Cite this article
Tan, K.C., Yu, Q. & Ang, J.H. A Dual-Objective Evolutionary Algorithm for Rules Extraction in Data Mining. Comput Optim Applic 34, 273–294 (2006). https://doi.org/10.1007/s10589-005-3907-9
DOI: https://doi.org/10.1007/s10589-005-3907-9