
2025 | Book

Chatbots and Human-Centered AI

8th International Workshop, CONVERSATIONS 2024, Thessaloniki, Greece, December 4–5, 2024, Revised Selected Papers

Edited by: Asbjørn Følstad, Symeon Papadopoulos, Theo Araujo, Effie L.-C. Law, Ewa Luger, Sebastian Hobert, Petter Bae Brandtzaeg

Publisher: Springer Nature Switzerland

Book series: Lecture Notes in Computer Science


About this book

This book constitutes the revised selected papers of the 8th International Workshop on Chatbots and Human-Centered AI, CONVERSATIONS 2024, held in Thessaloniki, Greece, during December 4–5, 2024. The 12 full papers and 3 short papers presented in this volume were carefully reviewed and selected from 35 submissions. They are organized in the following topical sections: Understanding and Designing for Human-AI Interactions; Human-Centred AI in Education and Social Support; Conversational AI for Citizens and Customers.

Table of Contents

Frontmatter

Understanding and Designing for Human-AI Interactions

Frontmatter
Analyzing Patterns of Conversational Breakdown in Human-Chatbot Customer Service Conversations
Abstract
Many chatbots still struggle to correctly interpret and respond to user enquiries. It is therefore important to understand how and why human-chatbot conversations break down. In this study, we analyzed features of user utterances directly preceding a bot-initiated repair to determine their presence and prominence as possible predictors of conversational breakdowns. We used data from a real-life public transport customer service chatbot to examine the errors that occur in actually deployed systems. The analysis shows that some features (such as commonness, outdated words, and unexpected words) occur more often in utterances directly before a repair. Some features also correlate with each other and occur together, such as outdated words and subjectivity. Feature analysis thus reveals many opportunities for improvement, either live (during the interaction) or afterwards.
Anouck Braggaar, Florian Kunneman, Emiel van Miltenburg
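To make the feature-analysis idea concrete, here is a minimal sketch (not the authors' pipeline) that collects user utterances immediately preceding a bot-initiated repair and scores simple surface features on them. The repair phrases, the in-domain vocabulary, and the two toy features are illustrative assumptions; the paper's actual features (commonness, outdated words, unexpected words, subjectivity) would require richer language resources.

```python
# A sketch, not the authors' pipeline: flag user utterances that directly
# precede a bot-initiated repair and score simple surface features on them.

REPAIR_PHRASES = ("sorry, i didn't understand", "could you rephrase")

# Hypothetical in-domain vocabulary; "unexpected words" are proxied as
# alphabetic tokens outside it.
KNOWN_VOCAB = {"when", "does", "the", "next", "bus", "to", "leave",
               "ticket", "train", "refund", "schedule", "delay"}

def is_repair(bot_turn: str) -> bool:
    """Heuristic: does the bot turn initiate a repair?"""
    text = bot_turn.lower()
    return any(phrase in text for phrase in REPAIR_PHRASES)

def utterance_features(user_turn: str) -> dict:
    """Toy proxies for the paper's utterance features."""
    tokens = user_turn.lower().split()
    unexpected = [t for t in tokens if t.isalpha() and t not in KNOWN_VOCAB]
    return {"length": len(tokens),
            "unexpected_ratio": len(unexpected) / max(len(tokens), 1)}

def features_before_repairs(dialogue: list[tuple[str, str]]) -> list[dict]:
    """Collect features of user turns immediately preceding a repair."""
    return [utterance_features(user) for user, bot in dialogue if is_repair(bot)]

if __name__ == "__main__":
    demo = [
        ("when does the next bus to utrecht leave",
         "The next bus leaves at 10:15."),
        ("can i get my monies back for yesterdays trip",
         "Sorry, I didn't understand. Could you rephrase?"),
    ]
    print(features_before_repairs(demo))
```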
Selecting Empathic Response Headers in Customer Support Conversations with LLM-Based Emotion Recognition
Abstract
This research considers the task of automatically adding empathic headers (e.g., “Sorry to hear that.”) to agent responses in customer support conversations. We employ a task-oriented dialogue (TOD) response selection model, which allows response headers to be selected from existing corpora of conversations. Since the model is not fine-tuned with information about emotions in tweets, it is supplemented by filtering based on emotion annotations. The open-source LLM Llama 3.1 is employed to provide these annotations. We devise an experiment to evaluate this approach by automatic means and discuss the preliminary results.
W. L. Yeung
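A minimal sketch of the filtering step described above, assuming a corpus of response headers annotated with emotion labels. The `detect_emotion` stub stands in for the paper's Llama-3.1-based annotator; the headers and keyword rules are hypothetical.

```python
# A sketch of emotion-filtered header selection; `detect_emotion` stands in
# for the Llama-3.1-based annotator used in the paper, and the annotated
# header corpus below is hypothetical.

HEADER_CORPUS = [
    ("Sorry to hear that.", "sadness"),
    ("That sounds really frustrating.", "anger"),
    ("Glad it worked out!", "joy"),
    ("Thanks for reaching out.", "neutral"),
]

def detect_emotion(message: str) -> str:
    """Stub annotator: in the paper, this label comes from an LLM."""
    lowered = message.lower()
    if any(w in lowered for w in ("angry", "ridiculous", "worst")):
        return "anger"
    if any(w in lowered for w in ("sad", "disappointed", "lost")):
        return "sadness"
    return "neutral"

def select_header(message: str) -> str:
    """Keep only headers whose annotation matches the detected emotion."""
    emotion = detect_emotion(message)
    matches = [header for header, label in HEADER_CORPUS if label == emotion]
    return matches[0] if matches else ""

if __name__ == "__main__":
    print(select_header("My parcel is lost and I'm really disappointed."))
    # -> "Sorry to hear that."
```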
Exploring the Effects of Consistency-Based Hallucination Detection for LLM-Based QA Chatbots: A Simulation Study
Abstract
Large Language Models (LLMs) are increasingly used in Question Answering (QA) chatbots. Using LLMs in QA chatbots enables more eloquent user conversations; however, these models are far from perfect. A major issue with LLMs is hallucinations: generated texts that include nonsensical or inaccurate information. These hallucinations are particularly problematic in critical QA contexts such as health or enterprise. Several methods exist to mitigate hallucinations, among which consistency-based hallucination detection techniques are promising because they require neither a ground truth nor deep technical analysis. Although current research has demonstrated the effectiveness of these detection techniques, it does not address possible trade-offs in their real-world use. To close this gap, we conducted a simulation study based on real-world QA data sets to compare different consistency-based hallucination detection techniques and their impact on precision and response time. Our research confirms that more complex detection techniques achieve higher precision but also lead to higher response times. In addition, we show that precision and response time strongly depend on the data set used and that precision depends only slightly on the consistency-determining threshold.
Simon Rapp, Alexander Maedche, Marcus Jainta
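The following sketch illustrates the general consistency-based detection idea the paper evaluates, not its exact techniques: sample several answers to the same question and flag a possible hallucination when mutual agreement is low. The `generate` stub and the Jaccard token-overlap measure are illustrative assumptions.

```python
# A sketch of consistency-based hallucination detection: sample several
# answers and measure their mutual agreement. `generate` is a hypothetical
# stand-in for a temperature-sampled LLM call.
import itertools
import random

def generate(question: str) -> str:
    """Stub LLM: replace with a real, temperature-sampled model call."""
    return random.choice([
        "Marie Curie won two Nobel Prizes.",
        "Marie Curie won two Nobel Prizes, in physics and chemistry.",
        "Marie Curie won three Nobel Prizes.",
    ])

def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity between two answers."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)

def consistency_score(question: str, n_samples: int = 5) -> float:
    """Mean pairwise similarity of sampled answers; low values suggest
    hallucination. More samples raise precision but also response time."""
    samples = [generate(question) for _ in range(n_samples)]
    pairs = list(itertools.combinations(samples, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

if __name__ == "__main__":
    score = consistency_score("How many Nobel Prizes did Marie Curie win?")
    print(f"consistency = {score:.2f}",
          "-> suspect" if score < 0.6 else "-> looks consistent")
```

Note how the precision/response-time trade-off the study quantifies shows up directly here: each additional sample is another model call, so stricter consistency checking is paid for in latency.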
LadderChat: An LLM-Based Conversational Agent for Laddering Interviews
Abstract
Laddering interviews are a qualitative method to understand how customers value specific product attributes. However, traditional face-to-face approaches are resource-intensive and lack scalability. This paper introduces LadderChat, a conversational agent (CA) based on Large Language Models (LLMs) designed to conduct human-like laddering interviews. We propose design principles for an LLM-based CA and implement them in LadderChat. The system leverages LLMs for response analysis, adaptive probing, and visualization of Attributes-Consequences-Values (ACV) chains. A formative evaluation with six researchers provides initial insights into LadderChat’s effectiveness in guiding participants through the laddering process and generating real-time visualizations. The evaluation also identifies areas for refinement, including user interface improvements and data privacy considerations. This study contributes to ongoing efforts to enhance qualitative research methods through the application of advanced large language models.
Leon Hanschmann, Marvin Mokelke, Alexander Maedche
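As an illustration of the laddering loop LadderChat automates, here is a minimal interactive sketch that climbs one Attributes-Consequences-Values (ACV) chain with repeated “why” probes. The `ask` helper is a hypothetical stand-in; in LadderChat, an LLM analyzes each answer and chooses adaptive follow-up probes rather than asking a fixed question.

```python
# A minimal sketch of the laddering probe loop (attribute -> consequence
# -> value); not the LadderChat implementation itself.

LEVELS = ("attribute", "consequence", "value")

def ask(question: str) -> str:
    """Collect a participant answer; LadderChat additionally runs LLM
    analysis here to decide adaptive follow-up probes (omitted)."""
    return input(question + "\n> ")

def laddering_interview(product: str) -> list[tuple[str, str]]:
    """Climb one Attributes-Consequences-Values (ACV) chain."""
    chain = []
    answer = ask(f"Which feature of {product} matters most to you?")
    chain.append((LEVELS[0], answer))
    for level in LEVELS[1:]:
        answer = ask(f"Why is '{answer}' important to you?")
        chain.append((level, answer))
    return chain

if __name__ == "__main__":
    for level, text in laddering_interview("your running shoes"):
        print(f"{level}: {text}")
```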
Can Machine Learning Models Recognise Emotions, Particularly Neutral, Better Than Humans?
Abstract
Audio and visual data play vital roles in emotion recognition, with machine learning (ML) methods like SVMs and deep neural networks excelling in inferring human emotions. This study compared a state-of-the-art ML model’s performance in each modality to human performance. It also examined items frequently labelled as ‘neutral’, comparing ML and human results. Utilising the CREMA-D dataset, CNN-LSTM and SVM models were trained for visual-only and audio-only data. Evaluation included matching and non-matching test sets, and ML models consistently outperformed humans, especially on the latter. The CNN-LSTM achieved 80.8% accuracy on matching visual data compared to human accuracy of 75.9%, and 39.0% versus 19.4% on non-matching data. Similarly, the SVM model scored 81.0% and 34.3% accuracy on matching and non-matching audio data, surpassing human accuracy of 68.9% and 17.9%. These results underscore ML’s superiority in monomodal emotion recognition, particularly for challenging emotions. Implications for emotion recognition research and ethical concerns are discussed.
Jeffin Siby, Effie Lai-Chong Law

Human-Centred AI in Education and Social Support

Frontmatter
An AI-Powered Learning Companion for Adaptive and Personalized STEM Education
Abstract
We introduce an AI learning companion designed to enhance STEM education by providing personalized, adaptive learning experiences. The companion dynamically adjusts the difficulty of experimental activities in online STEM labs and delivers targeted feedback based on real-time student performance, ensuring a more efficient learning process. Teachers can use the companion’s authoring tool to create tailored educational scenarios that cater to students at various performance levels. This work emphasizes a student-centered approach, supporting critical thinking and problem-solving skills.
Nikolaos Antonios Grammatikos, Evangelia Anagnostopoulou, Dimitris Apostolou, Gregoris Mentzas
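As a toy illustration of the performance-driven adaptation described above, the sketch below raises or lowers a difficulty level based on recent scores. The thresholds and the 1–5 level range are illustrative assumptions, not the companion's actual policy.

```python
# A toy sketch of performance-driven difficulty adaptation; thresholds and
# the 1-5 level range are illustrative, not the companion's actual policy.

def adjust_difficulty(level: int, recent_scores: list[float]) -> int:
    """Raise difficulty after sustained success, lower it after struggle."""
    if not recent_scores:
        return level
    average = sum(recent_scores) / len(recent_scores)
    if average > 0.8:
        return min(level + 1, 5)  # cap at the hardest lab activity
    if average < 0.4:
        return max(level - 1, 1)  # floor at the easiest
    return level

if __name__ == "__main__":
    print(adjust_difficulty(3, [0.9, 0.85, 0.95]))  # -> 4
    print(adjust_difficulty(3, [0.2, 0.35, 0.3]))   # -> 2
```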
Development and Evaluation of a University Chatbot Using Deep Learning: A RAG-Based Approach
Abstract
In university systems, traditional methods of information retrieval are often found to be inefficient, leading to frustration among students and staff. This paper presents the development and evaluation of a university-specific chatbot that employs the Retrieval-Augmented Generation (RAG) approach to improve the accuracy and relevance of its responses. Unlike conventional chatbots that depend on intent classification and pre-designed system responses and conversation flows, the proposed chatbot integrates Large Language Models (LLMs) with local university data, enhancing its ability to handle complex queries with context-aware responses and dynamically generated conversation flows. The system architecture includes components such as LangChain for orchestration, a vector store for embedding external knowledge, and a user interface developed using Streamlit. Evaluation results demonstrate that the RAG-based chatbot substantially outperforms traditional LLMs, including GPT-3.5, GPT-4 mini, and GPT-4, in terms of answer accuracy and reliability. In this paper we also reflect on the lessons learned during the chatbot’s development and deployment in a real-world university setting.
Kabir Olawore, Michael McTear, Yaxin Bi
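A library-agnostic sketch of the RAG pattern the chatbot builds on: embed the query, retrieve the most similar university documents, and ground the model's answer in them. The paper's system uses LangChain, a vector store, and a Streamlit UI; the hashing embedder, document snippets, and `llm` stub below are hypothetical simplifications.

```python
# A library-agnostic RAG sketch; the actual system uses LangChain, a vector
# store, and Streamlit. Embedder, documents, and `llm` are hypothetical.
import math

DOCS = [
    "Semester registration closes on 15 September.",
    "The library is open 08:00 to 22:00 on weekdays.",
    "Exam results are published via the student portal.",
]

def embed(text: str, dim: int = 64) -> list[float]:
    """Toy bag-of-words hashing embedder; use a real model in practice."""
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[hash(token) % dim] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query: str, k: int = 2) -> list[str]:
    """Rank stored documents by embedding similarity to the query."""
    q = embed(query)
    return sorted(DOCS, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def llm(prompt: str) -> str:
    """Stub LLM: echoes the grounded prompt instead of generating."""
    return prompt

def answer(query: str) -> str:
    """Ground the model's answer in retrieved university documents."""
    context = "\n".join(retrieve(query))
    return llm(f"Answer using only this context:\n{context}\n\nQuestion: {query}")

if __name__ == "__main__":
    print(answer("When does semester registration close?"))
```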
A Voice-Enabled Intelligent Virtual Agent for People with Memory Impairments: Thematic Analysis of Focus Group Results
Abstract
Older adults often experience increasing memory impairments, which may lead to conditions like dementia, impacting their overall wellbeing. Intelligent virtual agents (IVAs) potentially offer support to address these wellbeing challenges. Most wellbeing support interventions are based on user interviews, but this approach has not yet resulted in widespread implementations. This study therefore grounds the agent’s design in a theoretical foundation, the Social Production Function (SPF) theory. The aim of this qualitative study was to evaluate a prototype version of the virtual agent, “Jane”, with experienced healthcare professionals in focus groups.
“Jane” was created as a lifelike human character displayed in a wall-mounted picture frame. The agent was designed to understand speech and respond by voice, gesture, and multimedia. Eight use cases were derived from combining SPF theory and existing literature, programmed into the IVA dialogue manager, and demonstrated to a group of fourteen experienced healthcare professionals. The group was then split into two subgroups for two 1.5-hour focus group sessions. The discussions during these sessions were transcribed, and thematic analysis was conducted as the qualitative data analysis technique. The consolidated criteria for reporting qualitative research (COREQ) checklist was applied to ensure the quality of the analysis.
The thematic analysis identified the main themes as ‘Getting to Know Her’, ‘Lifetime Support’, ‘Spider in the Social Web’, ‘Versatile Personal Assistant’, ‘Big Sister’, ‘Jane, I Don’t Understand You’, and ‘Deceiving Jane’. These themes were related to the SPF instrumental goals and the resources needed to achieve them. Overall, the healthcare professionals viewed the virtual agent as a promising resource for supporting the wellbeing of older adults.
Roel Boumans, Serge Thill, Debby Gerritsen, Tibor Bosse
The BookBot Project: Conceptual Design of a Social Robot Facilitating Reading Motivation
Abstract
The motivation to read among children in Nordic countries has seen a noticeable decline in recent years. This study explores the design of social robots for stimulating interest in reading among fourth-grade students (age 10–11). We used a combination of conceptual design methods to engage teachers and students from four classes in two Swedish schools in co-creation workshops. Ideas on functions, qualities, and robot designs were generated, and based on these, a set of ten distinct design concepts was created: the Facilitator, the Librarian, the Coach, the Buddy, the Assistant, the Narrator, the Creator, the Apprentice, the Portable, and the Gamer. The strengths and weaknesses of the different designs were evaluated, resulting in a final design named ‘The BookBot’, which aims to inspire and engage students in book reading through discussions of book content, character portrayal, and personalised book recommendations.
Emma Mainza Chilufya, Mattias Arvola, Susanne Severinsson, Anna Martín Bylund, Linnéa Stenliden, Arezou Mortazavi, Tom Ziemke
Questions People Ask ChatGPT Regarding Their Romantic Relationships and What They Think About the Provided Answers: An Exploratory Study
Abstract
In this study, we explored ChatGPT’s potential to provide advice on interpersonal relationship issues. Ten participants asked the AI chatbot questions to evaluate the perceived usefulness and applicability of its responses regarding their romantic relationships. Specific patterns were identified in the questions participants asked, allowing for a categorization of the most common day-to-day concerns young people face in their relationships today. Among these, jealousy, trust, and privacy issues were prominent, alongside questions related to intimacy, sexual issues, relationship maintenance, and emotional conflict. Empirical data were collected through semi-structured interviews regarding the perceived usefulness of the responses ChatGPT provided. The findings suggest that ChatGPT can offer valuable insights and advice on interpersonal relationship issues. However, participants noted that while ChatGPT can be a helpful resource when human interaction is unavailable, it is not sufficient to function independently or to replace human connections. Overall, this study highlights both the potential and the inadequacy of ChatGPT in providing interpersonal relationship advice.
Alexios Brailas, Lazaros Tsolakis

Conversational AI for Citizens and Customers

Frontmatter
AI-Driven Dialogue: Leveraging Generative AI in Conversational Agent Voting Advice Applications (CAVAAs)
Abstract
Voting Advice Applications (VAAs) have been crucial in modern elections, helping voters understand political issues and party positions. Recent innovations, such as Conversational Agent Voting Advice Applications (CAVAAs), enhance the user experience by integrating chatbots that address comprehension issues, helping to provide more accurate voting advice. However, current rule-based CAVAA chatbots face limitations due to the need for predefined responses and restricted question coverage. This paper introduces two new generative AI chatbot designs, an open and a semi-open generative chatbot, using Retrieval-Augmented Generation (RAG) and open-source Large Language Models (LLMs). These models generate accurate, context-specific responses to political questions, addressing the limitations of rule-based systems while incorporating filters to avoid biased or inappropriate answers. Tested in a case study using the German 2021 Wahl-O-Mat, the chatbots were evaluated through a two-stage process: coder grading of responses and an expert survey of VAA creators (N = 13). Results demonstrate that the generative chatbots effectively answered both standard CAVAA questions and broader political inquiries, with high acceptance and perceived usefulness among experts. Despite these successes, further refinement is needed to improve filtering and ensure unbiased interactions. These findings offer valuable insights into the use of generative AI in (CA)VAAs and its potential to improve voter guidance.
Thilo I. Dieing
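A toy sketch of the generate-then-filter safeguard mentioned above: a post-generation check blocks draft answers that read as voting recommendations. The blocklist and the `generate_answer` stub are illustrative assumptions, not the study's actual filters.

```python
# A toy sketch of the generate-then-filter safeguard; the blocklist and
# `generate_answer` stub are illustrative, not the study's actual filters.

BLOCKED_PATTERNS = ("vote for", "best party", "you should support")

def generate_answer(question: str) -> str:
    """Stub for the RAG-backed open-source LLM; here it misbehaves on
    purpose so the filter has something to catch."""
    return "You should support the party closest to your views on this issue."

def filtered_answer(question: str) -> str:
    """Block draft answers that read as voting recommendations."""
    draft = generate_answer(question)
    if any(pattern in draft.lower() for pattern in BLOCKED_PATTERNS):
        return "I can explain party positions, but I won't recommend a vote."
    return draft

if __name__ == "__main__":
    print(filtered_answer("Which party should I vote for?"))
```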
First Aid for Europe – A Study on the Impact of Digital Voting Assistants on Young Adults During the Elections for the European Parliament in 2024
Abstract
Traditional digital voting assistants are online survey tools that help voters make a voting choice. Nevertheless, many young adults do not vote. Therefore, a study was conducted in which two new types of digital voting assistants for young adults were compared: a voting assistant chatbot versus a voting assistant avatar. A small-scale experiment (N = 30) was used to investigate the influence of these voting assistants on the political knowledge and voting intention of young adults shortly before the election for the European Parliament in June 2024. In addition, the voting assistants were evaluated for user-friendliness, entertainment value, and the extent to which young adults trusted them. The results show that after interacting with both voting assistants, participants have more political knowledge and a higher general voting intention. The increase in political knowledge is greater for participants with less political interest. In line with expectations, participants find the chatbot more user-friendly than the avatar. Participants also find the chatbot more entertaining and trust it more than the avatar. This research shows that new forms of voting assistants may increase the political knowledge and voting intention of young adults. It thereby provides early insights into an important area of research: the potential of these tools to increase the chance that young adults will actually cast their vote, and thus to contribute to the strength and representativeness of the European Parliament.
Nina van Zanten, Roel Boumans
An Analysis of Federal and Municipal Chatbots in Germany
Abstract
Chatbots are often encountered in e-commerce and specifically in customer service. In recent years, governmental institutions have also increasingly used them as a communication channel for citizens. In this paper, we report our findings on the landscape of publicly available chatbots in federal states and large city municipalities in Germany. We collected a dataset of 25 chatbots and analyzed them with a codebook of 12 feature characteristics and 12 chatbot interaction patterns. These patterns describe how interactive elements are used in a chatbot conversation to guide the user. Our results describe the features of the assessed municipal chatbots in terms of avatars, names, and layout. We also discuss our findings concerning the use of existing chatbot interaction patterns, and introduce four new patterns based on this real-world analysis. Our analysis also covers the differences between governmental and e-commerce chatbots in terms of their implementation of features and interaction patterns.
Verena Traubinger, Sebastian Heil, Julián Grigera, Alejandra Garrido, Sonia Abhyankar, Martin Gaedke
LLM-Powered Conversational AI in Customer Service: Users’ Expectations and Anticipated Use
Abstract
Large Language Models (LLMs) hold the promise to significantly enhance conversational AI in customer service. There is, however, limited understanding of end-users’ expectations and anticipated use of LLM-powered conversational AI (CAI), which is crucial since expectation is directly linked to experience. This paper addresses this gap by exploring potential end-users’ perceptions and expectations of LLM-powered CAI in customer service, drawing on expectancy-disconfirmation theory. The findings show that users have modest expectations for LLM-powered CAI in customer service, shaped by their past experiences with existing CAI and their experiences with and knowledge of LLM features. The findings underline the importance of expectation management and of ensuring high-quality use cases where LLM-powered CAI for customer service is implemented.
Anna Grøndahl Larsen, Marita Skjuve, Knut Kvale, Asbjørn Følstad
Feeling Understood by AI: How Empathy Shapes Trust and Influences Patronage Intentions in Conversational AI
Abstract
Conversational artificial intelligence (AI) is increasingly being integrated into customer service settings, with some AI now communicating empathetically in interactions. The impact of such empathetic displays on patronage intentions towards the AI, however, remains unclear. This research therefore investigated the effect of empathetic communication by a conversational AI on consumer trust, across the competence and benevolence dimensions, and on patronage intentions after being provided with recommendations by the AI. Two versions of a GPT-4-based conversational AI were created: an empathetic and a non-empathetic one. A total of 300 UK residents were recruited to take part in a survey in which they were presented with a screen recording of an interaction with one of the AIs. The findings indicate that empathetic communication significantly increases perceived benevolence, which positively impacts patronage intentions. However, no significant effect was found on perceived competence. Notably, independent of empathetic communication, competence proved to be a more influential driver of patronage intentions than benevolence, emphasizing the importance of high functionality in AI interactions.
Nele Pralat, Carolin Ischen, Hilde Voorveld
Backmatter
Metadata
Title
Chatbots and Human-Centered AI
Edited by
Asbjørn Følstad
Symeon Papadopoulos
Theo Araujo
Effie L.-C. Law
Ewa Luger
Sebastian Hobert
Petter Bae Brandtzaeg
Copyright year
2025
Electronic ISBN
978-3-031-88045-2
Print ISBN
978-3-031-88044-5
DOI
https://doi.org/10.1007/978-3-031-88045-2