Comprehensible classification models: a position paper

Published: 17 March 2014

Abstract

The vast majority of the literature evaluates the performance of classification models using only the criterion of predictive accuracy. This paper reviews the case for also considering the comprehensibility (interpretability) of classification models, and discusses the interpretability of five types of classification models: decision trees, classification rules, decision tables, nearest neighbors, and Bayesian network classifiers. We discuss both interpretability issues specific to each of those model types and two more generic interpretability issues: the drawbacks of using model size as the sole criterion for evaluating a model's comprehensibility, and the use of monotonicity constraints to improve the comprehensibility and acceptance of classification models by users.
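
To make the monotonicity point concrete, below is a minimal sketch, not taken from the paper: a toy rule-based credit-scoring classifier (the score function, feature names, and thresholds are all invented for illustration) together with a check that its predicted class never drops from "approve" to "reject" as a feature expected to be monotone (income) increases, with the other input held fixed. The toy model deliberately violates the constraint, which is exactly the kind of counter-intuitive behavior that leads domain experts to distrust a model regardless of its accuracy.

```python
# Purely illustrative: a toy rule-based "credit scoring" classifier and a
# monotonicity check. The rules, feature names, and thresholds are invented
# for this sketch; they do not come from the paper.

def score(income: float, debt: float) -> int:
    """Toy classifier: returns 1 to approve a loan, 0 to reject."""
    if income >= 100_000:              # odd rule: very high earners rejected
        return 0
    if income >= 50_000 and debt < 20_000:
        return 1
    return 0

def is_monotone_in_income(debt: float, incomes: list[float]) -> bool:
    """True iff the prediction never drops from approve (1) to reject (0)
    as income increases, with debt held fixed."""
    preds = [score(inc, debt) for inc in sorted(incomes)]
    return all(a <= b for a, b in zip(preds, preds[1:]))

grid = [10_000, 30_000, 50_000, 70_000, 110_000]
print(is_monotone_in_income(debt=5_000, incomes=grid))
# False: the "income >= 100_000 -> reject" rule breaks the expected
# monotone relationship between income and approval.
```

In practice such a check would sweep a grid over the constrained feature for many fixed settings of the remaining features; the sketch keeps a single fixed setting to show the core idea.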

