
A critical review of multi-objective optimization in data mining: a position paper

Published: 01 December 2004

Abstract

This paper addresses the problem of how to evaluate the quality of a model built from data in a multi-objective optimization scenario, where two or more quality criteria must be simultaneously optimized. A typical example is a scenario where one wants to maximize both the accuracy and the simplicity of a classification model, or of a candidate attribute subset in attribute selection. We review three very different approaches to this problem: (a) transforming the original multi-objective problem into a single-objective problem by using a weighted formula; (b) the lexicographic approach, where the objectives are ranked in order of priority; and (c) the Pareto approach, which consists of finding as many non-dominated solutions as possible and returning the set of non-dominated solutions to the user. We also present a critical review of the case for and against each of these approaches. The general conclusion is that the weighted formula approach, which is by far the most used in the data mining literature, is to a large extent an ad hoc approach to multi-objective optimization, whereas the lexicographic and Pareto approaches are more principled and therefore deserve more attention from the data mining community.

