Comprehensible classification models: a position paper

Abstract
The vast majority of the literature evaluates the performance of classification models using only the criterion of predictive accuracy. This paper reviews the case for also considering the comprehensibility (interpretability) of classification models, and discusses the interpretability of five types of classification models: decision trees, classification rules, decision tables, nearest neighbors, and Bayesian network classifiers. We discuss both interpretability issues that are specific to each of those model types and more generic interpretability issues, namely the drawbacks of using model size as the sole criterion for evaluating the comprehensibility of a model, and the use of monotonicity constraints to improve the comprehensibility of classification models and their acceptance by users.
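The monotonicity constraints mentioned above encode domain expectations such as "the predicted risk score should never decrease as income increases." One simple, model-agnostic way to work with such a constraint is to probe a trained model on ordered inputs and report any violations. The sketch below is purely illustrative (the function name, the toy stump model, and the probing scheme are all assumptions, not anything from the paper); it treats the model as a black-box `predict` function:

```python
# Illustrative sketch (not from the paper): empirically checking a
# non-decreasing monotonicity constraint on a black-box classifier
# by probing it on inputs ordered along one feature.

def monotonicity_violations(predict, instances, feature_idx, deltas):
    """Return (original, probe) pairs where increasing feature
    `feature_idx` by a positive delta LOWERED the predicted score,
    i.e. violations of a non-decreasing constraint."""
    violations = []
    for x in instances:
        base = predict(x)
        for d in deltas:
            probe = list(x)
            probe[feature_idx] += d
            if d > 0 and predict(probe) < base:
                violations.append((tuple(x), tuple(probe)))
    return violations

# Toy model: a hand-built decision stump that is monotone in feature 0.
stump = lambda x: 1 if x[0] >= 5 else 0

print(monotonicity_violations(stump, [(3, 0), (6, 0)], 0, [1, 2]))  # []
```

A model that passes such checks on representative data is easier for domain experts to accept, since its behavior matches their prior knowledge; some learners can also enforce the constraint during training rather than auditing it afterwards.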