Fostering interpretability of data mining models through data perturbation
Introduction
If the concepts of interpretability and comprehensibility originally appeared alongside the first real-world data mining applications (Kononenko, 1993; Lavrač, Džeroski, Pirnat, & Križman, 1993), it has only been in recent years, with the rise of complex nonlinear models such as those produced by neural networks (Montavon, Samek, & Müller, 2018), that their relevance has substantially increased (Freitas, 2014; Gerretzen, Szymańska, Bart, Davies, van Manen, van den Heuvel, Jansen, & Buydens, 2016; Lipton, 2016). If one had to select a single case exemplifying the need for increased interpretability of data mining models, it could easily be found in medicine, e.g. in cancer diagnosis and treatment. Nowadays, physicians have to process a large number of inputs coming from different analyses, such as x-rays, biopsies, or genetic tests, to build an optimal treatment combining different therapies (Urruticoechea et al., 2010). As this choice cannot be formalised using a simple set of rules, the future points towards a widespread adoption of Medical Diagnostic Decision Support systems (MDDS) to support physicians' activity (Berner, 2007; Miller, 1994; Musen, Middleton, & Greenes, 2014; Scott, Brown, Adedeji, Wyatt, Georgiou, Eisenstein, & Friedman, 2019). In this context, the need for interpretability arises from several sides. From a scientific point of view, one is clearly interested in the mechanisms detected by the system: e.g. not just which treatment should be administered, but also why it is considered the most appropriate; this knowledge further fosters trust in and reliance on the system (Bussone, Stumpf, & O'Sullivan, 2015). Additionally, interpretability may be necessary to resolve conflicting solutions, e.g. when the MDDS and the clinician have different views on the same problem, thus requiring a reasoned disambiguation; and to support training, i.e. when a trainee is exposed to different situations and compares his/her responses with those of the system, in order to learn from them (Lagro, van de Pol, Laan, Huijbregts-Verheyden, Fluit, & Rikkert, 2014; Yoon, Velasquez, Partridge, & Nof, 2008).
In order to tackle this problem, several alternatives have been proposed in the literature, of which the most promising are based on model-agnostic explanation (Ribeiro, Singh, & Guestrin, 2016a). Roughly speaking, such solutions are based on an a posteriori description of a black-box model through a set of (simpler) rules - see Section 2 for further details. Nevertheless, it has to be noted that this approach does not truly tackle the problem, but merely rephrases it, by mapping a complex model to a simpler, albeit not necessarily interpretable, one.
In this contribution we propose a novel methodology for enhancing interpretability, which builds on the model-agnostic explanation concept, but does not rely on creating alternative classification models. Given an already trained model, for instance one classifying instances into n classes, and a new instance to be analysed, we propose an algorithm yielding the smallest variation needed to change the class of that instance to the one expected/desired by the user. The user is then able to understand under which conditions the solution of the model would have been different; or, equivalently, why that specific solution was yielded. On a more abstract level, the user can leverage this information to create his/her own representation of the black-box model, without being conditioned by any a priori assumption on the structure of that representation. Beyond yielding the smallest variation needed to swap classes, the method can also be tuned to minimise the number of features involved in this change - as, in many contexts, simpler solutions (that is, those minimising the number of changes) may be preferred for being easier to understand and/or implement. We show that this methodology is well defined, model-agnostic, easy to implement and modular, as it imposes no constraints on the complexity of the classification algorithm, which is treated as a black box. Above and beyond this, we demonstrate that the proposed approach improves the interpretability of data mining models and helps tackle problems such as those previously described, e.g. the disambiguation of conflicting solutions and the improvement of medical training.
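To make the idea more concrete, the sketch below illustrates how such a search for a minimal, class-changing perturbation could be organised around an arbitrary black-box classifier. It is only a minimal illustration of the concept, not the algorithm detailed in Section 3: the function name `minimal_perturbation`, the random sampling of directions, and all step-size parameters are assumptions introduced here for the example.

```python
import numpy as np

def minimal_perturbation(predict, x, target_class, step=0.05,
                         max_radius=2.0, n_directions=200, seed=0):
    """Scan growing radii around the instance x and return the closest
    perturbed instance (in Euclidean norm) for which the black-box callable
    `predict` yields `target_class`; None if no class change is found
    within `max_radius`."""
    rng = np.random.default_rng(seed)
    n_features = x.shape[0]
    for radius in np.arange(step, max_radius + step, step):
        # Sample candidate perturbations on the sphere of the current radius.
        directions = rng.normal(size=(n_directions, n_features))
        directions /= np.linalg.norm(directions, axis=1, keepdims=True)
        candidates = x + radius * directions
        labels = np.array([predict(c) for c in candidates])
        hits = candidates[labels == target_class]
        if hits.size > 0:
            # Among the candidates reaching the target class, keep the one
            # closest to the original instance.
            return hits[np.argmin(np.linalg.norm(hits - x, axis=1))]
    return None  # no class change found within the explored region
```

Since `predict` is only queried as a black box, the same search applies unchanged to any classification model, which is precisely the model-agnostic property stressed above.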
Beyond this introduction, the remainder of the paper is organised as follows. Section 2 presents a brief overview of the interpretability concept, and of how it has historically been dealt with in data mining. The proposed methodology is presented in Section 3, and is then applied to four case studies, constructed upon data sets representing different real-world problems and data characteristics (Section 4). Afterwards, Section 4.4 presents an analysis of the optimality and computational cost of the method. Finally, some conclusions are drawn in Section 5.
Related work
In spite of past attempts, there seems to be no clear definition of what the interpretability of a model is, nor of how it can be measured (Bibal & Frénay, 2016; Doshi-Velez & Kim; Narayanan, Chen, He, Kim, Gershman, & Doshi-Velez).
There are a number of heuristics, guidelines and rules of thumb that the community has been using for years to both define and assess interpretability (Biran & Cotton, 2017; Gilpin, Bau, Yuan, Bajwa, Specter, & Kagal, 2018), albeit without a clear formalisation nor
Methodology
In order to simplify the explanation of our approach, and without loss of generality, we here consider a simple two-class classification problem. As depicted in Fig. 1 Left, several instances are described by just two features f1 and f2, such that the problem space is limited to a plane. The class of each instance is denoted by black circles and triangles, while a black dashed line depicts the inner separation of the classification model (a priori unknown to the user). Finally, a new instance is
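As a purely illustrative companion to this two-feature setting, the following sketch trains a stand-in classifier on synthetic data and, for a new instance, searches for the smallest change along each individual feature that flips the predicted class; this corresponds to the sparse variant mentioned in the Introduction. The synthetic data, the logistic-regression surrogate, and the search granularity are assumptions for the example, not the experimental setup of the paper.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Two features (f1, f2) and two classes, mirroring the setting of Fig. 1 Left.
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=1)
model = LogisticRegression().fit(X, y)  # stand-in for the black-box model

def single_feature_flips(predict, x, step=0.01, max_shift=3.0):
    """For each feature independently, return the smallest signed shift
    (if any) that changes the class predicted for the instance x."""
    original = predict(x.reshape(1, -1))[0]
    flips = {}
    for j in range(x.shape[0]):
        for magnitude in np.arange(step, max_shift, step):
            for shift in (magnitude, -magnitude):
                candidate = x.copy()
                candidate[j] += shift
                if predict(candidate.reshape(1, -1))[0] != original:
                    flips[j] = shift
                    break
            if j in flips:
                break
    return flips

new_instance = X[0]
# Maps each feature index to the smallest signed shift flipping the class
# (exact values depend on the synthetic data and the trained model).
print(single_feature_flips(model.predict, new_instance))
```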
Data set description
The proposed methodology has been tested on four public and well-known data sets, all of them available through the UCI repository. They have been selected so as to be representative of heterogeneous areas and classification problems - e.g. different types of input features and numbers of labels.
Conclusions and discussion
In this contribution we have proposed a new method for increasing the interpretability and comprehensibility of data mining classification models. Starting from an instance that has been classified as belonging to one class, the solution entails identifying the minimum variation in the features’ values required to change the output class. This methodology presents four major advantages worth discussing. First, it is model-agnostic, as it does not depend on the considered classification model,
Conflict of Interest
The authors declare no conflict of interest.
CRediT authorship contribution statement
Seddik Belkoura: Formal analysis, Writing - original draft, Writing - review & editing. Massimiliano Zanin: Conceptualization, Writing - review & editing. Antonio LaTorre: Formal analysis, Writing - original draft, Writing - review & editing.
Acknowledgement
A.L. acknowledges funding from the Spanish Ministry of Science and Innovation (TIN2017-83132-C2-2-R) and Universidad Politécnica de Madrid (PINV-18-XEOGHQ-19-4QTEBP).
References (77)
- et al. Classification in high-dimensional spectral data: Accuracy vs. interpretability vs. model size. Neurocomputing (2014).
- et al. Modeling wine preferences by data mining from physicochemical properties. Decision Support Systems (2009).
- et al. Boosting model performance and interpretation by entangling preprocessing selection and variable selection. Analytica Chimica Acta (2016).
- et al. Optimal discriminant plane for a small number of samples and design method of classifier on the plane. Pattern Recognition (1991).
- Rule extraction by successive regularization. Neural Networks (2000).
- et al. Artificial convolution neural network for medical image pattern recognition. Neural Networks (1995).
- Explanation in artificial intelligence: insights from the social sciences. Artificial Intelligence (2019).
- et al. Methods for interpreting and understanding deep neural networks. Digital Signal Processing (2018).
- et al. Generating knowledge in maintenance from experience feedback. Knowledge-Based Systems (2014).
- et al. TensorFlow: Biology's gateway to deep learning? Cell Systems (2016).