A critical review of multi-objective optimization in data mining: a position paper

Abstract
This paper addresses the problem of evaluating the quality of a model built from data in a multi-objective optimization scenario, where two or more quality criteria must be optimized simultaneously. A typical example is the need to maximize both the accuracy and the simplicity of a classification model, or of a candidate attribute subset in attribute selection. We review three very different approaches to this problem, namely: (a) transforming the original multi-objective problem into a single-objective problem by using a weighted formula; (b) the lexicographic approach, where the objectives are ranked in order of priority; and (c) the Pareto approach, which consists of finding as many non-dominated solutions as possible and returning the set of non-dominated solutions to the user. We also present a critical review of the case for and against each of these approaches. The general conclusion is that the weighted-formula approach -- by far the most used in the data mining literature -- is to a large extent an ad hoc approach to multi-objective optimization, whereas the lexicographic and Pareto approaches are more principled and therefore deserve more attention from the data mining community.
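The three approaches contrasted in the abstract can be illustrated with a small sketch. The candidate names, scores, weights, and tolerance below are all hypothetical, chosen only to show how each strategy picks a "best" model from the same set of (accuracy, simplicity) pairs, both to be maximized; this is not the paper's own code.

```python
# Hypothetical candidate models scored as (accuracy, simplicity),
# both objectives to be maximized.
candidates = {
    "tree_a": (0.91, 0.40),
    "tree_b": (0.89, 0.85),
    "tree_c": (0.95, 0.30),
    "tree_d": (0.88, 0.80),
}

# (a) Weighted formula: collapse both objectives into one score with
# ad hoc weights, then pick the single highest-scoring model.
def weighted(scores, w_acc=0.7, w_simp=0.3):
    return max(scores, key=lambda m: w_acc * scores[m][0] + w_simp * scores[m][1])

# (b) Lexicographic: accuracy has strict priority; simplicity only breaks
# near-ties in accuracy (within a tolerance).
def lexicographic(scores, tol=0.02):
    best_acc = max(acc for acc, _ in scores.values())
    contenders = {m: s for m, s in scores.items() if best_acc - s[0] <= tol}
    return max(contenders, key=lambda m: contenders[m][1])

# (c) Pareto: x dominates y if x is at least as good in every objective
# and strictly better in at least one; return all non-dominated models.
def dominates(x, y):
    return all(a >= b for a, b in zip(x, y)) and any(a > b for a, b in zip(x, y))

def pareto_front(scores):
    return [m for m, s in scores.items()
            if not any(dominates(t, s) for n, t in scores.items() if n != m)]
```

Note that (a) and (b) each return a single model, whose identity depends on the chosen weights or tolerance, while (c) returns a set of trade-off solutions and leaves the final choice to the user.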