Comprehensible classification models: a position paper

Abstract
The vast majority of the literature evaluates the performance of classification models using only the criterion of predictive accuracy. This paper reviews the case for also considering the comprehensibility (interpretability) of classification models, and discusses the interpretability of five types of classification models: decision trees, classification rules, decision tables, nearest neighbors, and Bayesian network classifiers. We discuss both interpretability issues that are specific to each of those model types and more generic interpretability issues, namely the drawbacks of using model size as the sole criterion for evaluating the comprehensibility of a model, and the use of monotonicity constraints to improve the comprehensibility of classification models and their acceptance by users.
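The monotonicity constraints mentioned above encode domain expectations such as "the predicted risk score should never decrease as income increases." One simple, model-agnostic way to work with such a constraint is to probe a trained model on ordered inputs and report any violations. The sketch below is purely illustrative (the function name, the toy stump model, and the probing scheme are all assumptions, not anything from the paper); it treats the model as a black-box `predict` function:

```python
# Illustrative sketch (not from the paper): empirically checking a
# non-decreasing monotonicity constraint on a black-box classifier
# by probing it on inputs ordered along one feature.

def monotonicity_violations(predict, instances, feature_idx, deltas):
    """Return (original, probe) pairs where increasing feature
    `feature_idx` by a positive delta LOWERED the predicted score,
    i.e. violations of a non-decreasing constraint."""
    violations = []
    for x in instances:
        base = predict(x)
        for d in deltas:
            probe = list(x)
            probe[feature_idx] += d
            if d > 0 and predict(probe) < base:
                violations.append((tuple(x), tuple(probe)))
    return violations

# Toy model: a hand-built decision stump that is monotone in feature 0.
stump = lambda x: 1 if x[0] >= 5 else 0

print(monotonicity_violations(stump, [(3, 0), (6, 0)], 0, [1, 2]))  # []
```

A model that passes such checks on representative data is easier for domain experts to accept, since its behavior matches their prior knowledge; some learners can also enforce the constraint during training rather than auditing it afterwards.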