As direct interpretability is a quality that few models feature (e.g., linear models and decision trees), here we focus only on post-hoc generated explanations. This represents the primary approach to making ‘black-box’ models, such as deep neural networks, interpretable (Lipton 2016; Molnar 2020). However, a few important considerations emerge from the debate over different approaches to interpretability and must be taken into account. Post-hoc explanations are only approximations of the actual decision-making processes and require a second, simpler model to clarify how inputs are processed into outputs (Wang 2019). In turn, this makes explanations potentially unreliable and open to manipulation, which may hide biases to the advantage of, for instance, the proprietary companies that hold the usage rights of specific algorithms (Rudin 2018).
‘Hybrid interpretability’ represents a promising solution that combines the strengths of the other two approaches. Unlike post-hoc interpretability, where a linear model is used as the explainer (Wang 2019), hybrid interpretability features linear models in an ‘ante-hoc’ fashion. Specifically, this entails replacing the black-box model with a more transparent linear one and testing whether it can produce comparably accurate predictions with a subset of input data. If this is not the case, the black-box model is employed together with its explainer (Wang and Lin 2021). This implies that in those cases that require the use of black-box models, the chances of untruthful or biased explanations persist. Section 3 describes how making explanations ‘questionable’ and ‘interactive’ may help cope with this issue and maximize the chances of successful explanations.
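Returning to the ante-hoc selection step described above, the following is a minimal sketch of how it might look in code: a transparent model is tried first, and the black box is retained, together with a post-hoc explainer, only if the transparent model falls short. The accuracy-gap tolerance, the specific models, and the synthetic data are illustrative assumptions and do not reproduce the procedure of Wang and Lin (2021).

```python
# Minimal sketch of the hybrid ('ante-hoc') strategy: prefer a transparent
# model when it is accurate enough, fall back to black box + explainer otherwise.
# The 2% tolerance and the model choices are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier  # stands in for the black box
from sklearn.linear_model import LogisticRegression       # the transparent candidate
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=10, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

black_box = GradientBoostingClassifier(random_state=1).fit(X_train, y_train)
transparent = LogisticRegression(max_iter=1000).fit(X_train, y_train)

gap = black_box.score(X_test, y_test) - transparent.score(X_test, y_test)
if gap <= 0.02:                 # transparent model is 'comparably accurate'
    deployed = transparent      # directly interpretable, no explainer needed
else:
    deployed = black_box        # keep the black box and pair it with a
                                # post-hoc explainer (e.g., a surrogate as above)
print(f"Accuracy gap: {gap:.3f} -> deploying {type(deployed).__name__}")
```

The key design choice is the tolerance on the accuracy gap: the fallback branch is exactly where the risk of untruthful or biased post-hoc explanations persists.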
3.1 Explanations as trust support strategy
It is often reported that explanations may be useful to support trust towards artificial agents, particularly because of the opaqueness of their decision-making processes. Without explanations, people may struggle to build accurate mental models of artificial agents (Holliday et al.
2016) and to understand how decisions and predictions are generated (De Graaf and Malle
2017; de Graaf et al.
2018; Lomas et al.
2012). However, exactly how explanations support trust is often not discussed in detail. To better understand this point, we shall first discuss what explanations are.
What constitutes a ‘proper’ explanation is an open question. In fact, “Literature in both the philosophy of science and psychology suggests that no single definition of explanation can account for the range of information that can satisfy a request for an explanation” (Berland and Reiser
2009, p. 27). Miller reports Lewis’ definition that “to explain an event is to provide some information about its causal history. In an act of explaining, someone who is in possession of some information about the causal history of some event—
explanatory information, I shall call it—tries to convey it to someone else” (Lewis
1986, p. 99) in Miller (
2019) (italic in the original version).
Furthermore, the informative content of explanations (i.e., the ‘explanans’) can be of either a ‘scientific’ or an ‘everyday’ type. Both concern events’ ‘causal histories’, and subsets of causes are selected to generate explanations (Hesslow 1988; Hilton et al. 2010), but the former type refers to the scientific connections between various points in an event’s causal chain, while the latter aims to clarify “why particular facts (events, properties, decisions, etc.) occurred” (Miller 2019, p. 5). As this paper focuses primarily on non-expert users’ interactions with artificial agents, everyday explanations are more relevant for our purposes. Everyday explanations are forms of social communication which, through different means (e.g., textual, visual, etc.), aim to transfer knowledge (Hilton 1990) and fill information asymmetries between one or more ‘explainers’ and one or more ‘explainees’ (Malle et al.
2007). By means of explanations, people persuade each other and influence each other’s impressions and opinions (Malle
2011). Explanatory information is often ‘contrastive’, meaning that people mostly ask why events and actions occur in certain ways rather than in others (Miller
2019). While explanations that answer ‘why-questions’ are fundamental to justify artificial agents’ decisions, explanations that answer ‘how-questions’ are central for transparency, as they help users understand the processes that lead artificial agents to specific decisions (Pieters
2011).
For knowledge transfers to be successful, it is important that explanations are understood, which, in turn, implies that they are coherent both internally and with the explainee’s beliefs (Lombrozo 2007; Thagard 1989). Here, it emerges that explanations may help support users’ trust towards artificial agents because they allow a transfer of knowledge about decision-making processes that would otherwise remain opaque. We noted above that standardization is not one of the strengths of explainability (Berland and Reiser 2009). However, this also entails that explanations are open to customization. As autonomous agents increase their presence in numerous aspects of daily life, they will likely interact with very diverse types of users (Hois et al. 2019; Mohseni et al. 2018). Accordingly, each context of interaction will tend to privilege certain explanation qualities over others.
For instance, in some contexts, simplicity, accompanied by a low level of technicality, may be a desirable quality of explanations (Cawsey 1993; Lombrozo 2007; Zemla et al. 2017). This could be the case with online recommender systems such as those featured by streaming platforms or news websites. A rather unusual suggestion on what to watch, read, or listen to may trigger users’ curiosity. Such an event would likely be considered a low-stakes case, as one could simply decide to skip the recommendation. However, studies show that even in such rather low-stakes situations users benefit from explanations in terms of their perception of the system’s performance and trustworthiness (Shin 2021). Therefore, an explanation in such a case should be rather simple and quick and, for instance, refer to features of the suggested movie or song that closely match the user’s previous choices.
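As a purely hypothetical illustration, a simple and quick explanation of this kind could be generated from the overlap between the recommended item’s features and the user’s viewing history, as in the sketch below; the item attributes and the wording are invented for the example and do not describe any actual platform’s system.

```python
# Hypothetical sketch: a short, non-technical explanation for a recommendation,
# built from the overlap between an item's features and the user's watch history.
from collections import Counter

watch_history = [
    {"title": "Movie A", "tags": {"sci-fi", "thriller", "space"}},
    {"title": "Movie B", "tags": {"sci-fi", "drama"}},
]
recommended = {"title": "Movie C", "tags": {"sci-fi", "space", "mystery"}}

# Count how often each tag of the recommended item appears in the history.
history_tags = Counter(tag for item in watch_history for tag in item["tags"])
shared = sorted(recommended["tags"] & set(history_tags), key=history_tags.get, reverse=True)

if shared:
    print(f"Recommended '{recommended['title']}' because you often watch "
          f"{', '.join(shared)} titles.")
else:
    print(f"Recommended '{recommended['title']}' to help you discover something new.")
```

The point of such an explanation is not completeness but speed and accessibility: a single sentence grounded in features the user already recognizes.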
In contrast, other situations in which the consequences at stake are significant may require explanations that are complete and spare no details, even if their internal complexity increases (Kulesza et al. 2013; Zemla et al. 2017). For instance, if algorithms are employed to evaluate loan requests or job applications, explanations for rejected requests should be rather extensive and exhaustive. They may, for instance, show that the process was not internally biased by forms of discrimination that have nothing to do with applicants’ merits (Bellamy et al. 2018); a minimal sketch of such a check is given below. Such forms of discrimination can follow nuanced paths and be difficult to detect but, when exposed, they can undermine the trustworthiness of entire processes. Consequently, if specific groups or communities (e.g., in terms of ethnicity or gender; Zou and Schiebinger 2018) become the target of discriminatory AI-based decision-making processes due to underlying biases, members of these groups may develop systematic distrust towards AI-based technologies. In turn, the resulting underrepresentation of these discriminated groups in training data sets could further increase inequalities in automated decision-making processes, creating a vicious circle. In light of the context-dependence of what qualities explanations should have, we propose tailoring explanations according to the plausibility principle in order to maximize the benefits of explanations’ flexibility and personalization options.
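As anticipated above, the following sketch illustrates the kind of group-level fairness check that an extensive explanation of a loan or hiring decision might draw on: a disparate impact ratio computed over synthetic decisions. The data, the ‘four-fifths’ threshold, and the choice of metric are assumptions for illustration only and are not drawn from Bellamy et al. (2018).

```python
# Illustrative sketch: a simple group-fairness check that could accompany an
# extensive explanation of an automated loan decision. The synthetic data and
# the 'four-fifths' threshold are assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(0)
group = rng.integers(0, 2, size=5000)          # 0 = protected group, 1 = reference group
approved = rng.random(5000) < np.where(group == 1, 0.55, 0.40)  # synthetic decisions

rate_protected = approved[group == 0].mean()
rate_reference = approved[group == 1].mean()
disparate_impact = rate_protected / rate_reference

print(f"Approval rate (protected): {rate_protected:.2f}")
print(f"Approval rate (reference): {rate_reference:.2f}")
print(f"Disparate impact ratio:    {disparate_impact:.2f}")
if disparate_impact < 0.8:   # common 'four-fifths' rule of thumb
    print("Warning: decisions may reflect group-level bias; the explanation "
          "should address why outcomes differ across groups.")
```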
3.1.1 Explanation plausibility
In the field of explanation science, the relevance of explanations’ plausibility can be found in the pioneering work on abductive reasoning by Peirce (
1997). According to the author, explaining something is better described in terms of abductive reasoning as opposed to other cognitive processes such as induction and deduction. Abductive reasoning involves proceeding from effects to causes (like inductive reasoning). However, in deriving hypotheses to explain events, abductive reasoning assumes that something ‘might be’, rather than simply ‘actually is’ (Peirce
1997).
Abductive reasoning has been interpreted as a process of ‘inference to the best explanation’ (Harman
1965), which implies that explanations (ideally the best possible) are considered the product of inferential processes. Perhaps more importantly for our purposes, Wilkenfeld and Lombrozo (2015) reformulate the concept, emphasizing the processual nature of providing explanations. Understood as a process rather than a product, explaining something aims to trigger the ‘best inference’ possible. Importantly, this translates into the idea that people do not necessarily seek ‘the true story’. Rather, they seek out plausible stories that can help them grasp the likely causes of an event (Weick et al.
2005).
Thus interpreted, abductive reasoning offers a reading in which plausibility emerges as a key criterion for selecting a subset of causes that could explain an event, and in which the explanatory power of an explanation is not a default quality but is rather co-constructed by the parties. In this sense, plausibility implies that the soundness of the causes suggested to explain an event is determined by both the explainer, who offers the explanation, and the explainee, who evaluates it as sound. Furthermore, plausibility as a joint achievement represents the contextual sum of several explanation qualities that researchers identify as desirable.
A study from Wiegand et al. (
2019) provides an example of how to tailor artificial agents’ explanations according to the plausibility principle in the context of autonomous vehicles in a simulated environment. Specifically, they discuss how a self-driving car’s explanations may be designed by combining inputs, in terms of mental models of the vehicle, from both expert and non-expert users (i.e., the typical ‘passenger’ of autonomous vehicles). The result is a ‘target’ mental model made up of those shared features that are identified as fundamental. This target mental model serves as a baseline upon which the car’s explanations ought to be built. Interestingly, the authors also specify that, since participants in the study never had to take over the steering wheel, there was no timing limitation for interpreting the car’s explanations.
Two problematic considerations need to be addressed in relation to plausibility. Some authors note that, in principle, an explanation might appear plausible but nevertheless be based on incorrect premises (Dunne et al.
2005; Lakkaraju and Bastani
2020; Walton
2011). When explanations are generated based on false beliefs, they can reinforce inaccuracies (Lombrozo
2006) and thus incorrect mental models. This is the case when the plausibility of an explanation does not match its truthfulness. Furthermore, interpreting plausibility as ‘explaining for the best inference’ means looking at plausibility as a dynamic concept that is contextually negotiated between the interested parties at each explanatory interaction, rather than a fixed property. This may represent an issue, considering artificial agents’ ‘coordinate-based’ reasoning (Lomas et al.
2012). Section
3 discusses explanations’ ‘interactivity’ and ‘questionability’ as implementable strategies to cope with both issues.