

04-12-2020 | Research Paper | Issue 1/2021 | Open Access

# Watch Me Improve—Algorithm Aversion and Demonstrating the Ability to Learn

Journal: Business & Information Systems Engineering, Issue 1/2021
Authors: Benedikt Berger, Martin Adam, Alexander Rühr, Alexander Benlian

## Electronic supplementary material

The online version of this article (https://doi.org/10.1007/s12599-020-00678-5) contains supplementary material, which is available to authorized users.
Accepted after two revisions by Christof Weinhardt.

## 1 Introduction

Artificial intelligence (AI) research has extended the capabilities of information technology (IT) systems to support or automate tasks, such as medical diagnosis, credit card fraud detection, and advertising budget allocation (Anthes 2017). Accordingly, the deployment of AI-based systems, i.e., IT systems employing capabilities developed in AI research, is expected to substantially change how businesses operate and how people work (vom Brocke et al. 2018; Ransbotham et al. 2017). AI researchers employ various approaches to realize new capabilities, yet many promising achievements are based on machine learning (Jordan and Mitchell 2015). Market research companies predict the market for IT systems employing machine learning to grow at double-digit rates over the upcoming years (Columbus 2020). Regardless of the specific problem domain, machine learning allows equipping IT systems with the ability to learn, i.e., the capability to improve in performance over time (Faraj et al. 2018). Such AI-based systems can assist users in various situations concerning their business and private life (Maedche et al. 2019).
A particularly important machine learning use case is the support of decisions. Decision support systems (DSS) have evolved in several waves in the past, and the application of machine learning promises another leap forward (Watson 2017). Yet, existing decision-making and information systems (IS) research suggests that people can be reluctant to accept support from or delegate decisions to DSS, a phenomenon called algorithm aversion (Dietvorst et al. 2015; Castelo et al. 2019). This phenomenon constitutes a serious issue for businesses employing DSS: Since even simple algorithms can outperform humans in many decision tasks (Kuncel et al. 2013; Elkins et al. 2013), rejecting the advice of DSS often leads to inferior decisions. Furthermore, potential gains of combining human and algorithmic insights (Dellermann et al. 2019) cannot be realized if decision makers are unwilling to take algorithmic advice into account.
RQ1
Do people exhibit a general algorithm aversion or do they prefer human to algorithmic decision support only after observing that the decision support errs?
If people indeed shun erring algorithmic support because they disregard the possibility that algorithms can improve, demonstrating the opposite (i.e., an algorithm’s ability to learn) should alleviate algorithm aversion. However, existing research has not examined whether algorithm aversion differs between DSS with and without the ability to learn. Therefore, we specifically investigate whether demonstrating an algorithm’s ability to learn can contribute to overcoming algorithm aversion. We focus on the ability to learn for two reasons: First, demonstrating an algorithm’s ongoing improvement in performance to users is theoretically intriguing because this design feature may counter users’ algorithm aversion and consequently increase their willingness to rely on particular AI-based systems. Second, machine learning is increasingly applied in practice to exactly those tasks in which algorithms can support decisions, such as classification or forecasting (Jordan and Mitchell 2015). Accordingly, we pose a second research question:
RQ2
Does demonstrating an algorithm’s ability to learn alleviate algorithm aversion?
To answer our research questions, we conducted an incentive-compatible online experiment with 452 subjects. Within this experiment, participants had to solve a forecasting task while deciding to what degree they would rely on an erring advisor to increase their odds of receiving a bonus. We manipulated the advisor to examine how its nature (i.e., human vs. algorithmic), its familiarity to the participants (i.e., unfamiliar vs. familiar), and its ability to learn (i.e., non-learning vs. learning) affect the participants’ reliance on the advice. Our results do not indicate a general aversion to algorithmic advice but a negative effect of familiarity on the participants’ willingness to accept algorithmic advice. However, if the algorithm is able to learn, the negative effect of familiarity disappears.
Our study makes a major, threefold contribution to research on algorithm aversion and the interaction with AI-based systems: First, we shed light on the algorithm aversion phenomenon by substantiating that becoming familiar with an erring algorithm is an important boundary condition for this phenomenon. Second, we demonstrate that the experience during the familiarization with an algorithm plays a key role in relying on an algorithm’s advice. Third, we provide first insights into an AI-based system’s ability to learn as an increasingly important but hitherto underexplored design characteristic, which may counter algorithm aversion. Thereby, we answer the call for research on individuals’ interaction with AI-based systems (Buxmann et al. 2019). Our findings also hold important implications for the design and employment of continually learning systems. Specifically, developers may seek to emphasize these systems’ ability to learn in order to enhance users’ tolerance for erring advice and, thus, reliance on support from AI-based systems.

## 2 Theoretical Foundations

### 2.1 Algorithm Aversion

The literature on algorithm aversion is rooted in the controversy over the merits of clinical (i.e., based on deliberate human thought) and actuarial (i.e., based on statistical models) judgement in different domains, such as medical diagnosis and treatment (Meehl 1954; Dawes 1979; Dawes et al. 1989; Grove et al. 2000). Overall, this research concludes that actuarial data interpretation is superior to clinical analysis but that humans nevertheless show a tendency to resist purely actuarial judgement. This resistance extends to the use of algorithmic decision support when compared to human advice (Promberger and Baron 2006; Alvarado-Valencia and Barrero 2014). Evidence from IS research supports these findings: For instance, Lim and O’Connor (1996) demonstrate that people underutilize information from DSS when making decisions. Elkins et al. (2013) find that expert system users feel threatened by system recommendations contradicting their expertise and thus tend to ignore these recommendations. Furthermore, the results by Leyer and Schneider (2019) indicate that managers are less likely to delegate strategic decisions to an AI-based DSS than to another human. While most empirical evidence supports the existence of algorithm aversion, other studies observed an appreciation of algorithmic advice (Dijkstra et al. 1998; Logg et al. 2019) or even an exaggerated reliance on AI-based systems (Dijkstra 1999; Wagner et al. 2018). Similarly, Gunaratne et al. (2018) reveal that humans tend to follow algorithmic financial advice more closely than identical crowdsourced advice. Therefore, our study seeks to contribute toward untangling these contradicting findings. Table A1 in the digital online appendix (available online via http://link.springer.com) provides an overview of recent studies on algorithm aversion.
When comparing studies on algorithm aversion, it is important to note that two differing understandings of the term algorithm aversion exist (Dietvorst et al. 2015; Logg et al. 2019). Dietvorst et al. (2015) coined the term for the choice of inferior human over superior algorithmic judgement. However, their study specifically shows that people shun algorithmic decision making after having interacted with, and thus become familiar with, the particular system. The commonly proposed reason for this behavior is that users devalue algorithmic advice after observing the algorithm err, which means following the algorithmic advice still holds the risk of making suboptimal decisions (Prahl and Van Swol 2017; Dietvorst et al. 2015; Dzindolet et al. 2002). In contrast, other studies require participants to decide about their reliance on algorithmic advice before becoming familiar with the algorithm’s performance (Castelo et al. 2019; Longoni et al. 2019; Logg et al. 2019). These differences have produced two understandings of what algorithm aversion is: unwillingness to rely on an algorithm that a user has experienced to err versus general resistance to algorithmic judgement. Our study aims at improving our understanding of algorithm aversion by investigating both understandings of this phenomenon in one common setting.
Previous research has suggested manifold predictors of algorithm aversion, such as the perceived subjectivity and uniqueness of tasks (Castelo et al. 2019; Longoni et al. 2019), the decision maker’s expertise (Whitecotton 1996) as well as the algorithm’s understandability (Yeomans et al. 2019). Burton et al. (2020) sorted possible causes of algorithm aversion into five categories: decision makers’ false expectations regarding the algorithms’ capabilities and performance, lack of control residing with the decision maker, incentive structures discriminating against the use of algorithmic decision support, incompatibility of intuitive human decision making and algorithmic calculations, and conflicting concepts of rationality between humans and algorithms. This study addresses the first of these categories: It specifically deals with the reasoning that people are less lenient toward algorithms than toward other humans because people expect algorithms to be perfect and do not believe that algorithms can overcome their errors (Dietvorst et al. 2015; Dawes 1979), whereas humans gain experience over time (Highhouse 2008). If this reasoning were true, then people should exhibit lower aversion toward an erring algorithm demonstrating the ability to learn than toward an erring algorithm that does not demonstrate the ability to learn. Existing studies have suggested several measures to enhance the use of DSS: allowing for minor adjustments of the algorithm by the decision maker (Dietvorst et al. 2018); improving the system design (Fildes et al. 2006; Benbasat and Taylor 1978); and training decision makers in DSS use (Green and Hughes 1986; Mackay and Elam 1992). However, despite the increasing application of machine learning, we do not yet know how decision makers react to advice by AI-based systems that demonstrate the ability to learn and, thus, perceivably improve over time. We address this research gap in this study.

### 2.2 Ability to Learn

AI researchers employ various approaches to realize computational capabilities (Russell and Norvig 2010). The approach that enabled most of the recent breakthroughs in AI research is machine learning (Jordan and Mitchell 2015). Mitchell (1997, p. 2) defines machine learning as follows: “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.” Specifically, machine learning allows equipping systems with functionalities via data-based training instead of manual coding. We refer to such systems as AI-based systems because machine learning is part of the AI domain. Owing to algorithmic improvements, the increasing availability of training data, and decreasing costs of computation, machine learning has spurred substantial progress in the realization of several computational capabilities, such as computer vision, speech recognition, natural language processing, and decision support within AI-based systems (Jordan and Mitchell 2015).
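Mitchell’s definition can be illustrated with a minimal sketch (ours, not from the paper): the task T is estimating a fixed quantity, experience E is the number of noisy observations seen, and performance P is the absolute error of a running mean, which improves as experience grows.

```python
import random

def estimation_error(n_observations, target=10.0, noise=2.0, seed=0):
    """Task T: estimate `target`. Experience E: `n_observations` noisy draws.
    Performance P: absolute error of the running mean (lower is better)."""
    rng = random.Random(seed)
    observations = [target + rng.gauss(0, noise) for _ in range(n_observations)]
    return abs(sum(observations) / n_observations - target)

def average_error(n_observations, trials=200):
    """Average performance over many random seeds: the error shrinks as
    experience grows, i.e., the program "learns" in Mitchell's sense."""
    return sum(estimation_error(n_observations, seed=s) for s in range(trials)) / trials
```

Averaged over seeds, the estimator seeing 1,000 observations is far more accurate than one seeing only 5, which is the performance-improves-with-experience property the definition names.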
When incorporating machine learning in IT systems, we can distinguish between training prior to system deployment (until the system meets specific performance thresholds) and ongoing (i.e., continual) learning after system deployment (Parisi et al. 2019). The latter is necessary if the available data is insufficient to train the system up to a desired level or if the system must be able to adapt to varying environmental conditions or user characteristics. For instance, DSS that depend on their users’ personal information suffer from a cold-start problem at the beginning of their use (Liebman et al. 2019). Among continually learning systems, we can further differentiate between those that explicitly involve users in the learning process (i.e., interactive or cooperative learning) and those that implicitly improve over time (Amershi et al. 2014; Saunders et al. 2016). In case of explicit learning, the user is part of the training loop and can exert influence on the process. Examples of explicit learning applications are data labeling (Wang and Hua 2011) and video game design (Seidel et al. 2018). Implicit learning systems improve over time without depending on explicit user feedback by either relying on other data sources or observing users’ behavior. Search engines, for instance, optimize the ranking of their search results by drawing upon clickstream data (Joachims and Radlinski 2007).
Whereas previous research has investigated the human role in interactive learning settings (Amershi et al. 2014), little is known about users’ reactions toward implicit learning systems. Zhang et al. (2011) show that retailer learning, conceptualized as the quality of personalized product recommendations on an e-commerce website, reduces customers’ product screening and evaluation costs while enhancing decision-making quality. Besides this study, we are not aware of research that has investigated whether humans perceive the ability to learn of AI-based systems and, if so, which consequences these perceptions have. Given the increasing use of machine learning and early calls for research on this matter (Liang 1987), our study seeks to provide first evidence on AI-based system users’ perceptions of the ability to learn.

## 3 Hypotheses Development

H1
For an objective and non-personal decision task, human decision makers exhibit algorithm appreciation if they are unfamiliar with the advisor’s performance.
While the literature overall offers mixed results regarding preferences for advisor nature, there is clear evidence that experience with an erring algorithmic advisor reduces reliance on this advisor (Dietvorst et al. 2015; Prahl and Van Swol 2017). An important precondition for this effect is that the experience with the algorithmic advisor allows decision makers to determine that the advisor errs; otherwise, people tend to continue relying on incorrect advice (Dijkstra 1999). A common explanation for this phenomenon is that people expect an algorithm’s advice to be perfect (Dzindolet et al. 2002; Highhouse 2008). However, in decisions under uncertainty, neither humans nor algorithms can provide perfect advice. The disconfirmation of this expectation then leads to lower reliance on the algorithm compared to a similarly performing or even inferior human (Dietvorst et al. 2015). This reasoning is also in line with IS research on continued system use (Bhattacherjee and Lin 2015). Following the call by Castelo et al. (2019) for more research on how experience with algorithms influences their use, we thus propose:
H2a
For an objective and non-personal decision task, familiarity with an advisor’s performance moderates the effect of advisor nature on a human decision maker’s reliance on the advice if the advisor errs.
H2b
For an objective and non-personal decision task, human decision makers rely more on the advice of an unfamiliar algorithm than on the advice of a familiar algorithm if the advisor errs.
H2c
For an objective and non-personal decision task, human decision makers exhibit algorithm aversion if they are familiar with the advisor’s performance and the advisor errs.
If experiencing an algorithm err causes a deterioration of reliance on this algorithm’s advice owing to unmet performance expectations (Dzindolet et al. 2002; Prahl and Van Swol 2017), a positive experience of an algorithm’s performance may conversely encourage a decision maker to rely on the algorithm (Alvarado-Valencia and Barrero 2014). In their study of algorithm aversion, Dietvorst et al. (2015) measured a set of beliefs about differences between human and algorithmic forecasts from the participants’ perspective. While the participants thought that algorithms outperformed humans in avoiding obvious mistakes and weighing information consistently, they strongly believed that humans were much better than algorithms at learning from mistakes and improving with practice. However, in light of the recent technological advances in AI and the increasing use of machine learning (Jordan and Mitchell 2015), these beliefs are not necessarily accurate, especially in the domain of objective and non-personal decision tasks. Accordingly, we suggest that an algorithm’s ability to learn (i.e., to improve over time) can reduce the detrimental effect that familiarity with an erring algorithm has on the decision maker’s reliance on the algorithm’s advice. Naturally, this is only possible if users can recognize the algorithm’s ability to learn, which means the algorithm must demonstrate this ability. Furthermore, we expect this effect to hold only for algorithmic advisors because human advisors are expected to be able to learn. Therefore, our last hypotheses are:
H3a
For an objective and non-personal decision task, demonstrating an advisor’s ability to learn moderates the effect of advisor nature on the reliance on a familiar and erring advisor.
H3b
For an objective and non-personal decision task, human decision makers rely more on the advice of a familiar and erring algorithm with the ability to learn than on the advice of a familiar and erring algorithm without this ability.
H3c
For an objective and non-personal decision task, human decision makers do not exhibit algorithm aversion if they are familiar with the advisor’s performance and the advisor is erring but has the ability to learn.

## 4 Method

### 4.1 Experimental Design and Procedure

To test the hypotheses, we conducted an incentive-compatible online experiment in accordance with most research on algorithm aversion (Burton et al. 2020). An online experiment fitted the purpose of our study because it allowed us to measure the potential effects precisely and with high internal validity. Our experiment had a between-subject design with manipulations of advisor nature (human vs. algorithmic), familiarity (unfamiliar vs. familiar), and ability to learn (non-learning vs. learning). Since the ability to learn can affect decision makers’ behavior only if they are familiar with the advisor, we could not employ a traditional full factorial design. Instead, we subdivided the experimental groups becoming familiar with the advisor into those experiencing a non-learning and those experiencing a learning advisor. Table 1 depicts our experimental design.
Table 1
Experimental design

|            |              | Human | Algorithmic |
|------------|--------------|-------|-------------|
| Unfamiliar |              | H-U   | A-U         |
| Familiar   | Non-learning | H-F-N | A-F-N       |
| Familiar   | Learning     | H-F-L | A-F-L       |
• The quarter of the year (ranging from Q1 to Q4);
• The day of the month (ranging from 1 to 31);
• The day of the week (ranging from Monday to Friday);
• The running of a promotion campaign (either yes or no);
• The recent sales (in percent below or above average); and
• The recent website traffic (in percent below or above average).
For all of these variables, we provided the participants with a short explanation about their effects on the number of incoming calls. Third, the participants received an advisor’s estimation based on the six variables’ specific values on the date for which the participants had to make their forecast. Lastly, we told the participants that they had to make eight training estimations before their final and incentivized estimation.
After receiving this information, the participants had to answer several comprehension questions before proceeding to the third step of the experimental procedure (see Table A7 in the digital appendix). This third step comprised the eight training estimations and was inspired by Dietvorst et al.’s (2015) experimental setting. This training phase was necessary to ensure that the participants could become familiar with the advisor. Participants in the conditions without familiarization completed the training phase without the advisor to prevent potential confounds that could distort the results. After completing the training estimations, we once more informed the participants that their ninth and final estimation (i.e., serious phase) would determine the variable share of their payment. The final forecast constituted the fourth step of the experiment and included the measurement of the dependent variable. This step was followed by a post-experimental questionnaire containing control and demographic variables (step 5). In the sixth and last step, we informed the participants about the accuracy of their final estimation and provided them with payment details.

### 4.2 Experimental Treatments

To ascertain that the advisor’s estimations were not too far from or too close to the participants’ estimations and would thus create an unintended confound, we conducted a pretest with 248 participants from Amazon Mechanical Turk. The participants had to provide estimations in a scenario similar to our final experiment. The average prediction error of the participants’ estimations was 5.5%. Therefore, we designed the advisors’ estimations in our actual experiment to have a similar prediction error on average.
Lastly, we conducted a second pretest with 267 participants from Amazon Mechanical Turk, each experiencing one of the six treatments, to examine whether our experimental treatments would work as intended. We used manipulation checks for perceived learning (e.g., “The Prediction Software gained a good understanding of how to properly estimate the number of calls.”) by Alavi et al. (2002), anthropomorphism (e.g., “The source of advice is … machinelike … humanlike”) by Bartneck et al. (2009) and Benlian et al. (2020), and familiarity with the advisor (e.g., “Overall, I am familiar with the Industry Expert”) by Gefen (2000) and Kim et al. (2009). Table A11 in the digital appendix contains the manipulation checks. The results of the second pretest indicated that all treatments would work as intended: First, H-F-L and A-F-L exhibited a significantly higher level of perceived learning than H-F-N and A-F-N (F = 5.53, p < 0.05). Second, H-F-N, H-F-L, A-F-N, and A-F-L exhibited a significantly higher level of familiarity than H-U and A-U (F = 3.00, p < 0.1). Lastly, H-U, H-F-N, and H-F-L exhibited a significantly higher level of anthropomorphism than A-U, A-F-N, and A-F-L (F = 4.46, p < 0.05).
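The manipulation checks above are one-way ANOVA F tests comparing scale means across treatment groups. As an illustration of what such an F value summarizes (our sketch; the paper itself relies on standard statistical software), the statistic is the ratio of between-group to within-group mean squares:

```python
def one_way_f(*groups):
    """F statistic of a one-way ANOVA: MS_between / MS_within."""
    k = len(groups)                               # number of groups
    n = sum(len(g) for g in groups)               # total observations
    grand_mean = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Example: clearly separated group means yield a large F.
# one_way_f([1, 2, 3], [4, 5, 6]) == 13.5
```

Large F values, as for the perceived-learning check (F = 5.53), indicate that the group means differ by more than within-group noise would suggest.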

### 4.3 Measures

We measured the participants’ reliance on the advice as the weight of advice (WOA):

$$WOA = \frac{\text{adjusted estimation} - \text{initial estimation}}{\text{advisor's estimation} - \text{initial estimation}}$$
A WOA of 0 means that decision makers do not adjust but remain with their initial estimation and thus ignore the advice. In contrast, a WOA of 1 represents a full adoption of the advisor’s estimation. Any values in between reflect the degree to which decision makers take their initial estimation and the advisor’s estimation into account for their adjusted estimation. Values below 0 or above 1 may also occur if decision makers believe that the true value lies outside the interval of their initial and the advisor’s estimation. Whereas several studies winsorize such values (Logg et al. 2019), we retained them as they were. Departing from the advisor’s estimation (WOA < 0) or overweighting the advisor’s estimation (WOA > 1) may be due to the participants’ deliberate choices depending on their experience with their own and the advisor’s performance in the training phase (Prahl and Van Swol 2017). Based on the WOA values within the different experimental groups, we intended to apply bootstrapped moderation analyses to test H2a as well as H3a and ANOVAs with planned contrasts to test the remaining hypotheses.
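The WOA computation follows directly from the formula; a hypothetical helper (ours, not from the paper) makes the interpretation of 0, 1, and out-of-range values explicit:

```python
def weight_of_advice(initial, advisor, adjusted):
    """WOA = (adjusted - initial) / (advisor - initial).
    0 -> advice ignored; 1 -> advice fully adopted; values outside [0, 1]
    are retained, as in the study, rather than winsorized."""
    if advisor == initial:
        raise ValueError("WOA is undefined when the advice equals the initial estimate")
    return (adjusted - initial) / (advisor - initial)

# Example: moving halfway from one's own estimate toward the advice gives WOA = 0.5.
# weight_of_advice(initial=100, advisor=120, adjusted=110) == 0.5
```

A participant who adjusts from 100 to 130 against advice of 120 obtains WOA = 1.5, illustrating the overweighting case the text describes.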
Besides our dependent variable, we measured several control and demographic variables in the post-experimental questionnaire (see Tables A8 and A9 in the digital appendix). Among those were the participants’ trusting disposition (Gefen and Straub 2004), personal innovativeness (Agarwal and Prasad 1998), experience in working for call centers as well as calling hotlines, and knowledge about call centers (based on Flynn and Goldsmith 1999). Furthermore, we asked the participants for their age, gender, and education. Between the control and demographic variables, we placed an attention check (see Table A10 in the digital appendix). Lastly, we measured the participants’ perceived realism of the scenario.

### 4.4 Data Collection

To collect sample data, we recruited participants from Amazon Mechanical Turk, a viable and reliable crowdsourcing platform for behavioral research and experiments (Karahanna et al. 2018; Behrend et al. 2011; Steelman et al. 2014). Using Amazon Mechanical Turk is a suitable sampling strategy for our research, as it enables us to reach users who are internet savvy but not expert forecasters. Since experienced professionals have been shown to rely less on algorithmic advice than lay people, our sample is thus conservative (Logg et al. 2019). We restricted participation to users located in the U.S. with a high approval rating (i.e., at least 95%) to ascertain high data quality (Goodman and Paolacci 2017). Moreover, we incentivized attentive participation by mentioning that participants could receive up to twice the base payment as a bonus, depending on the accuracy of their final estimation.
Of the 636 participants who completed the questionnaire, we removed those who failed the attention check or entered values below 100 for the incentivized ninth estimation. We further removed participants with exceptionally fast (i.e., less than 7 s) or slow (i.e., more than 99 s) response times in any of the estimations. The final sample comprised 452 participants. Table 2 provides descriptive information on the analyzed data set.
Table 2
Descriptive sample information

| Group | N  | Mean WOA (%) | Mean age | Gender (male) (%) |
|-------|----|--------------|----------|-------------------|
| H-U   | 68 | 58.0         | 40.6     | 45.6              |
| A-U   | 83 | 74.6         | 41.5     | 54.2              |
| H-F-N | 72 | 64.2         | 39.3     | 44.4              |
| A-F-N | 70 | 39.9         | 40.0     | 54.3              |
| H-F-L | 70 | 54.9         | 39.1     | 35.7              |
| A-F-L | 89 | 63.6         | 38.1     | 52.8              |
To verify the participants’ random assignment to the experimental conditions, we conducted Fisher’s exact tests for the categorical control and demographic variables and a MANOVA for the metric variables. There are no significant differences in trusting disposition, personal innovativeness, experience in working for call centers as well as calling hotlines, and knowledge about call centers between the six experimental groups (all p > 0.1). We also did not find differences regarding demographics in terms of gender, age, or education (all p > 0.1). Lastly, the participants across all groups indicated that they perceived the experiment as realistic (mean = 5.6; std. dev. = 1.1).
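A randomization check of this kind asks whether, say, gender proportions differ across groups. As an illustration (the study used Fisher’s exact tests; this sketch computes the closely related Pearson chi-square statistic for a group-by-category contingency table):

```python
def chi_square_stat(table):
    """Pearson chi-square statistic for a contingency table (rows: groups,
    columns: categories). A value near 0 indicates proportions that match
    across groups, as expected under successful randomization."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

# Example: identical gender proportions across two groups give a statistic of 0.
# chi_square_stat([[10, 10], [20, 20]]) == 0.0
```

Fisher’s exact test is preferred over the chi-square approximation when cell counts are small, which is why the authors report the exact variant.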

## 5 Results

We tested our hypotheses by conducting a series of analyses in IBM SPSS Statistics 25.
To test H1 – the effect of advisor nature on WOA if the advisors are unfamiliar – we conducted an ANOVA comparing H-U with A-U. The test revealed no significant main effect between the two groups (F = 2.14, p > 0.1). Thus, H1 is not supported: the participants did not rely significantly more on the unfamiliar algorithmic advisor than on the unfamiliar human advisor.
For H2a, we conducted a bootstrap moderation analysis with 10,000 samples and a 95% confidence interval (CI) with data from H-U, A-U, H-F-N, and A-F-N to test whether familiarity moderates the effect of advisor nature (Hayes 2017, PROCESS model 1). The results of our moderation analysis (see Fig. 3) show that familiarity moderates the effect of advisor nature on WOA (interaction effect = −0.41, standard error = 0.18, p < 0.05). Specifically, the effect of advisor nature reverses when the advisor is familiar (effect = −0.24, standard error = 0.13) compared to when the advisor is unfamiliar (effect = 0.17, standard error = 0.13), supporting H2a. To test H2b and H2c, we conducted a two-way independent ANOVA with planned contrasts among the same groups. The interaction effect between advisor nature and familiarity is significant (F = 5.20, p < 0.05), thus confirming the results of our moderation analysis. The pairwise comparison between A-U and A-F-N (p < 0.01) is significant and that between H-F-N and A-F-N (p < 0.1) is marginally significant. These results provide support for H2b and weak support for H2c.
For H3a, we conducted a bootstrap moderation analysis with 10,000 samples and a 95% confidence interval with data from groups H-F-N, A-F-N, H-F-L, and A-F-L to test whether demonstrating the ability to learn moderates the effect of advisor nature if the advisor is familiar (Hayes 2017, PROCESS model 1). The results of our moderation analysis (see Fig. 4) show that demonstrating the ability to learn moderates the effect of advisor nature on WOA (interaction effect = 0.33, standard error = 0.16, p < 0.05). Specifically, the negative effect of interacting with an algorithmic (vs. human) familiar advisor reverses when the familiar advisor demonstrates the ability to learn (effect = 0.09, standard error = 0.11) compared to when the familiar advisor does not learn (effect = −0.24, standard error = 0.12), supporting H3a. We again conducted a two-way independent ANOVA with planned contrasts among the same groups to test H3b and H3c. The interaction effect between advisor nature and ability to learn is also significant (F = 4.24, p < 0.05), thus confirming the results of our moderation analysis. Similarly, the pairwise comparison between A-F-N and A-F-L is significant (p < 0.05), while the pairwise comparison between H-F-L and A-F-L is not (p > 0.1). These results support both H3b and H3c.
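With two binary factors, the interaction effect tested here amounts to a difference-in-differences of cell means, and its bootstrap CI resamples observations. A minimal percentile-bootstrap sketch (illustrative only; the analyses above used Hayes’s PROCESS model 1 in SPSS, and the function and data names are ours):

```python
import random

def interaction_effect(cells):
    """Difference-in-differences of cell means for two binary factors.
    `cells` maps (nature, moderator) in {0, 1} x {0, 1} to lists of WOA values."""
    mean = {key: sum(vals) / len(vals) for key, vals in cells.items()}
    return (mean[(1, 1)] - mean[(0, 1)]) - (mean[(1, 0)] - mean[(0, 0)])

def bootstrap_ci(cells, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the interaction effect,
    resampling observations with replacement within each cell."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        resampled = {key: rng.choices(vals, k=len(vals)) for key, vals in cells.items()}
        stats.append(interaction_effect(resampled))
    stats.sort()
    return stats[int(alpha / 2 * n_boot)], stats[int((1 - alpha / 2) * n_boot) - 1]
```

An interaction whose confidence interval excludes 0 indicates moderation, mirroring the reported interaction effects of −0.41 (familiarity) and 0.33 (ability to learn).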

## 6 Discussion

Algorithm aversion has spurred controversial discussions in previous research, which resulted in differing understandings of this phenomenon. In this study, we set out to contribute toward clarifying what algorithm aversion is and under which conditions algorithm aversion occurs. Previous studies have produced conflicting findings about whether people are generally averse to algorithmic judgement or avoid algorithms only if they perceive these algorithms to err. Furthermore, we sought to investigate whether demonstrating an algorithm’s ability to learn may serve as an effective countermeasure against algorithm aversion, given that this ability becomes increasingly prevalent in AI-based systems. We studied these questions by simulating a forecasting task within a business setting. The accuracy of both the decision makers’ and the simulated advisors’ estimations was objectively measurable and did not depend on the decision makers’ personal characteristics. These important boundary conditions hold for many business decisions but should be considered when comparing our results with those of earlier studies (Castelo et al. 2019).

## 7 Implications

Our findings hold important implications for understanding decision makers’ reliance on AI-based systems under uncertainty, thereby answering the call for research on individuals’ reaction to and collaboration with AI-based systems (Buxmann et al. 2019).
We contribute to previous literature on algorithm aversion by comparing the reliance on an unfamiliar and a familiar algorithmic advisor to the reliance on an unfamiliar and a familiar human advisor of identical performance. Algorithm aversion was evident only if decision makers were familiar with the advisor. Therefore, we recommend using the term algorithm aversion only for the negative effect that familiarization with an algorithmic advisor has on reliance on this advisor, as was initially suggested by Dietvorst et al. (2015). Our results, furthermore, suggest that a general aversion to algorithmic judgement does not exist in objective and non-personal decision contexts. Different findings in early and recent studies on this topic may partly stem from the growing diffusion of algorithms in people’s everyday life and a corresponding habituation to algorithms.
Practitioners may also gain useful insights from our study. Companies that provide or seek to employ DSS in decision contexts characterized by uncertainty should consider the possible negative effects of users becoming familiar with IS. To counter these effects, companies employing DSS in such contexts should manage their employees’ expectations of what IT systems can and cannot accomplish. Regarding the current debate on the effects of AI-based systems as black boxes (Maedche et al. 2019), our findings suggest that IS developers should invest in demonstrating and communicating the abilities of their IT systems to users. In the case of AI-based systems with the ability to learn, this includes transparently demonstrating the system’s performance improvements over time. Potential measures to emphasize these improvements include displaying performance comparisons with previous advice and providing periodic reports on the performance development of IT systems.

## 9 Conclusion

Overall, our study is an initial step toward better understanding how users perceive the abilities of AI-based systems. Specifically, we shed light on how familiarity and demonstrating the ability to learn affect users’ reliance on algorithmic decision support. Our findings not only show that familiarity with an erring algorithmic advisor reduces decision makers’ reliance on this advisor but also that demonstrating an algorithm’s ability to learn over time can offset this effect. We hope that our study provides an impetus for future research on collaboration with and acceptance of AI-based systems as well as actionable recommendations for designing and unblackboxing such systems.

## Acknowledgements

The authors thank the editor and three anonymous reviewers for their valuable guidance throughout the review process as well as Jennifer Rix, Katharina Pagel, and Wilhelm Retief for their support in preparation of the manuscript.

## Footnotes

1. We acknowledge that an algorithm is a processing logic for solving a task and is, thus, only a part of an IT system. Since previous research has labeled the phenomenon addressed in this study algorithm aversion, although users interact with IT systems rather than with algorithms, we use the terms algorithm and IT system interchangeably in this study. The same applies to the terms algorithmic advisor and DSS.

