About this book

This book compiles and presents a synopsis of current global research efforts to push forward the state of the art in dialogue technologies, including advances in the classical problems of dialogue management, language generation, question answering, human–robot interaction, and chatbot design and evaluation, as well as topics related to the human nature of conversational phenomena, such as humour, social context, specific applications for e-health, understanding, and awareness.

Table of Contents


Chat-Based Agents


A Character Expression Model Affecting Spoken Dialogue Behaviors

We address character (personality) expression for a spoken dialogue system in order to accommodate it in particular dialogue tasks and social roles. While conventional studies investigated controlling the linguistic expressions, we focus on spoken dialogue behaviors to express systems’ characters. Specifically, we investigate spoken dialogue behaviors such as utterance amount, backchannel frequency, filler frequency, and switching pause length in order to express three character traits: extroversion, emotional instability, and politeness. In this study, we evaluate this model with a natural spoken dialogue corpus. The results reveal that this model expresses reasonable characters according to the dialogue tasks and the participant roles. Furthermore, it is also shown that this model is able to express different characters among participants given the same role. A subjective experiment demonstrated that subjects could perceive the characters expressed by the model.

Kenta Yamamoto, Koji Inoue, Shizuka Nakamura, Katsuya Takanashi, Tatsuya Kawahara

ToxicBot: A Conversational Agent to Fight Online Hate Speech

Acting against online hate speech is an important challenge nowadays. Previous research has focused specifically on developing NLP methods to automatically detect online hate speech, while disregarding the further action needed to mitigate hate speech in the future. This paper proposes a system that generates responses to intervene during online conversations with hate speech content. Prior to generation, the system uses a binomial, recurrent network-based classifier with a combination of word and sub-word embeddings to detect hate speech. With this architecture we achieved an F1 score of 0.786. The chatbot is based on a generative approach that uses a pre-trained transformer model and dynamically modifies the history or the persona profile to counteract the user’s hate speech. This adaptation provides sentences that the system could use to respond in the presence of aggressive and discriminatory behaviors.

Agustín Manuel de los Riscos, Luis Fernando D’Haro

Towards a Humorous Chat-Bot Companion for Senior Citizens

In the past decade, Singapore’s population has grown older, with more elderly and fewer younger people. As such, the government encourages research initiatives that provide solutions to improve mental health and memory functions. In this paper, we discuss a within-subject experiment that we carried out to measure the impact of humour on information retention in senior citizens interacting with a chatbot. The recall and recognition rates of both humorous and non-humorous responses supported the hypothesis that humour aids recognition, but could not provide strong support for recall. We hope that future studies can assist the innovation of intelligent systems for other target groups and uses as well.

Ria Mundhra, Ting Jen Lim, Hong Ngan Duong, Kheng Hui Yeo, Andreea I. Niculescu

Masheli: A Choctaw-English Bilingual Chatbot

We present the implementation of an autonomous Choctaw-English bilingual chatbot. Choctaw is an American indigenous language. The intended use of the chatbot is for Choctaw language learners to practice. The system’s backend is NPCEditor, a response selection program that is trained on linked questions and answers. The chatbot’s answers are stories and conversational utterances in both languages. We experiment with the ability of NPCEditor to appropriately respond to language mixed utterances, and describe a pilot study with Choctaw-English speakers.

Jacqueline Brixey, David Traum

Dialogue Evaluation and Analysis


Deep AM-FM: Toolkit for Automatic Dialogue Evaluation

There have been many studies on human-machine dialogue systems. To evaluate them accurately and fairly, many resort to human grading of system outputs. Unfortunately, this is time-consuming and expensive. The study of AM-FM (Adequacy Metric - Fluency Metric) suggests an automatic evaluation metric that achieves good performance in terms of correlation with human judgements. The AM-FM framework intends to measure the quality of dialogue generation along two dimensions with the help of gold references: (1) the semantic closeness of the generated response to the corresponding gold references; (2) the syntactic quality of the sentence construction. However, the original formulation of both the adequacy and fluency metrics faces some technical limitations. The latent semantic indexing (LSI) approach to AM modeling is not scalable to large amounts of data. The bag-of-words representation of sentences fails to capture contextual information. As for FM modeling, the n-gram language model implementation is not able to capture long-term dependencies. Many deep learning approaches, such as the long short-term memory network (LSTM) or transformer-based architectures, are able to address these issues well by providing better context-aware sentence representations than the LSI approach and achieving much lower perplexity on benchmark datasets than the n-gram language model. In this paper, we propose deep AM-FM, a DNN-based implementation of the framework, and demonstrate that it achieves promising improvements in both Pearson and Spearman correlation with human evaluation on the benchmark DSTC6 End-to-end Conversation Modeling task compared to its original implementation and other popular automatic metrics.

Chen Zhang, Luis Fernando D’Haro, Rafael E. Banchs, Thomas Friedrichs, Haizhou Li
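
The abstract above reports Pearson and Spearman correlation between an automatic metric and human judgements. As a minimal sketch of how such a comparison might be computed (the scores, ratings and function names below are hypothetical illustrations, not the authors' toolkit or data):

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = shared_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical metric scores and human ratings for four system responses.
metric_scores = [0.2, 0.4, 0.5, 0.9]
human_ratings = [1, 2, 4, 5]

print("Pearson:", pearson(metric_scores, human_ratings))
print("Spearman:", spearman(metric_scores, human_ratings))
```

Pearson rewards a linear relationship between metric and human score, while Spearman only asks that the two orderings agree, which is why evaluation studies often report both.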

Automatic Evaluation of Non-task Oriented Dialog Systems by Using Sentence Embeddings Projections and Their Dynamics

Human-machine interaction through open-domain conversational agents has grown considerably in recent years. These social conversational agents try to solve the hard task of maintaining a meaningful, engaging and long-term conversation with human users by selecting or generating the most contextually appropriate response to a human prompt. Unfortunately, there is no well-defined criterion or automatic metric that can be used to evaluate the best answer to provide. The traditional approach is to ask humans to evaluate each turn or the whole dialog according to a given dimension (e.g. naturalness, originality, appropriateness, syntax, engagingness, etc.). In this paper, we present our initial efforts to propose an explainable metric by using sentence embedding projections and measuring different distances between the human-chatbot, human-human, and chatbot-chatbot turns on two different sets of dialogues. Our preliminary results provide insights for visually and intuitively distinguishing between good and bad dialogues.

Mario Rodríguez-Cantelar, Luis Fernando D’Haro, Fernando Matía

Dialogue Management and Pragmatic Models


Learning to Rank Intents in Voice Assistants

Voice assistants aim to fulfill user requests by choosing the best intent from multiple options generated by their Automated Speech Recognition and Natural Language Understanding sub-systems. However, voice assistants do not always produce the expected results. This can happen because voice assistants choose from ambiguous intents; user-specific or domain-specific contextual information reduces the ambiguity of the user request. Additionally, the user information-state can be leveraged to understand how relevant/executable a specific intent is for a user request. In this work, we propose a novel Energy-based model for the intent ranking task, where we learn an affinity metric and model the trade-off between extracted meaning from speech utterances and relevance/executability aspects of the intent. Furthermore, we present a Multisource Denoising Autoencoder based pretraining that is capable of learning fused representations of data from multiple sources. We empirically show that our approach outperforms existing state-of-the-art methods by reducing the error rate by 3.8%, which in turn reduces ambiguity and eliminates undesired dead-ends, leading to a better user experience. Finally, we evaluate the robustness of our algorithm on the intent ranking task and show that our algorithm improves robustness by 33.3%.

Raviteja Anantha, Srinivas Chappidi, William Dawoodi

Culture-Aware Dialogue Management for Conversational Assistants

Cultural background has a great influence on people’s behaviour and perception. With the aim of designing a culturally sensitive conversational assistant, we have investigated whether culture-specific parameters may be trained using a supervised learning approach. We have used a dialogue management framework based on the concept of probabilistic rules and a multicultural data set to generate a culture-aware dialogue manager which allows communication in accordance with the user’s cultural idiosyncrasies. Hence, the system response to a user action varies depending on the user’s culture. Our data set contains 258 spoken dialogues from four different European cultures: German, Polish, Spanish and Turkish. For our evaluation, we have trained a culture-specific dialogue domain for each culture. Afterwards, we have compared the probability distributions of the parameters which are responsible for the selection of the next system action. The evaluation results show that culture-specific parameters have been trained and thus represent cultural patterns in the dialogue management decision process.

Juliana Miehle, Nicolas Wagner, Wolfgang Minker, Stefan Ultes

Dialog State Tracking with Incorporation of Target Values in Attention Models

This paper proposes a fully data-driven approach to dialog state tracking (DST) that can handle new slot values not seen during training. The approach is based on a long short-term memory recurrent neural network with an attention mechanism. Unlike conventional attention mechanisms, we use encoded user utterances and a hypothesis for the slot value (the target value) to calculate attention weights. In addition, while conventional attention mechanisms focus on words that correspond to trained values, the proposed attention mechanism focuses on words that correspond to a target value. Therefore, the DST model can detect unseen values by adding values to the hypothesis as target values. The proposed approach is evaluated using the second and the third Dialog State Tracking Challenge datasets. Evaluation results show that the proposed method yields an improvement of 10.3 points on unseen slot values.

Takami Yoshida, Kenji Iwata, Yuka Kobayashi, Hiroshi Fujimura

Delay Mitigation for Backchannel Prediction in Spoken Dialog System

To provide natural dialogues between spoken dialog systems and users, backchannel feedback can be used to make the interaction more sophisticated. Many related studies have combined acoustic and lexical features into a model to achieve better prediction. However, extracting lexical features leads to a delay caused by the automatic speech recognition (ASR) process. The systems should respond with no delay, since delays reduce the naturalness of the conversation and make the user feel dissatisfied. In this work, we present a prior prediction model for reducing response delay in backchannel prediction. We first train both acoustic- and lexical-based backchannel prediction models independently. In the lexical-based model, prior prediction is necessary to account for the ASR delay. The prior prediction model is trained with a weighting value that gradually increases as a sequence gets closer to a suitable response timing. The backchannel probability is calculated based on the outputs from both acoustic- and lexical-based models. Evaluation results show that the prior prediction model can predict backchannels with an improvement rate of 8% in F1 score over the current state-of-the-art algorithm under a 2.0-s delay condition.

Amalia Istiqlali Adiba, Takeshi Homma, Dario Bertero, Takashi Sumiyoshi, Kenji Nagamatsu

Towards Personalization of Spoken Dialogue System Communication Strategies

This study examines the effects of 3 conversational traits – Register, Explicitness, and Misunderstandings – on user satisfaction and the perception of specific subjective features for Virtual Home Assistant spoken dialogue systems. Eight different system profiles were created, each representing a different combination of these 3 traits. We then utilized a novel Wizard of Oz data collection tool and recruited participants who interacted with the 8 different system profiles, and then rated the systems on 7 subjective features. Surprisingly, we found that systems which made errors were preferred overall, with the statistical analysis revealing error-prone systems were rated higher than systems which made no errors for all 7 of the subjective features rated. There were also some interesting interaction effects between the 3 conversational traits, such as implicit confirmations being preferred for systems employing a “conversational” Register, while explicit confirmations were preferred for systems employing a “formal” Register, even though there was no overall main effect for Explicitness. This experimental framework offers a fine-grained approach to the evaluation of user satisfaction which looks towards the personalization of communication strategies for spoken dialogue systems.

Carla Gordon, Kallirroi Georgila, Volodymyr Yanov, David Traum

Dialogue Systems for e-health


Authoring Negotiation Content and Programming Simulated Patients

Competent negotiation with simulated patients can save expenses in medical education and improve outcomes for all parties involved. The use of simulated agents is beneficial for the study of human behaviour and cognition because it makes it possible to create and manage a wide range of specific social situations. Building plausible cognitive models underlying the agent’s intelligent behaviour from scratch is challenging and costly. Interaction designers and cognitive engineers require sufficient background knowledge to decide which domain information, resources and activities are important. Domain experts require sufficient understanding of human interaction and social cognition. All may lack advanced software development skills and access to a sufficient amount of authentic data. This paper presents a methodology to author cognitive agents and interactions with them. Authors can easily encode agents’ knowledge and equip them with different sets of preferences and decision-making strategies. This offers abundant opportunities for various social simulations: to create and control situations in which a doctor’s decision-making and negotiation skills can be applied and assessed; to employ and relate specific action patterns to various strategies and sociopragmatic variables of interactional power, social distance and degree of imposition; and to predict outcomes and explain why the choices made lead to specific outcomes. The proposed approach also enables efficient collection of a significant amount of annotated dialogue data and can be applied to model various medical and non-medical negotiation scenarios.

Volha Petukhova, Firuza Sharifullaeva, Dietrich Klakow

A Spanish Corpus for Talking to the Elderly

This work presents a Spanish corpus developed within the framework of the EMPATHIC project. It was designed for building a dialogue system capable of talking to elderly people and promoting healthy habits through a coaching model. The corpus, which comprises audio, video and text channels, was acquired using a Wizard of Oz strategy. It was annotated with different labels according to the different models needed in a dialogue system, including an emotion-based annotation that will be used to generate empathetic system reactions. The annotation at different levels, along with the procedure employed, is described and analysed.

Raquel Justo, Leila Ben Letaifa, Javier Mikel Olaso, Asier López-Zorrilla, Mikel Develasco, Alain Vázquez, M. Inés Torres

Analysis of Prosodic Features During Cognitive Load in Patients with Depression

Major Depressive Disorder (MDD) is a widespread mental health disorder commonly associated with hesitant and monotonous speech. This study analyses a speech corpus from a database acquired from 40 MDD patients and 40 matched controls (CT). During the recordings, individuals experienced different levels of cognitive stress while performing the Stroop color test, which includes three tasks of increasing difficulty. Speech features based on the fundamental frequency (F0), and the speech ratio (SR), which measures the speech-to-silence ratio, are used for characterising depressive mood and stress responsiveness. Results show that SR is significantly lower in MDD subjects compared to healthy controls for all the tasks, decreasing as the difficulty of the cognitive tasks, and thus the stress level, increases. Moreover, F0-related parameters (median and interquartile range) show higher values within the same subject in tasks with increased difficulty level for both groups. It can be concluded that speech features could be used for characterising depressive mood and assessing different levels of stress.

Carmen Martínez, Spyridon Kontaxis, Mar Posadas-de Miguel, Esther García, Sara Siddi, Jordi Aguiló, Josep Maria Haro, Concepción de la Cámara, Raquel Bailón, Alfonso Ortega
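
The two feature families named in the abstract above (a speech-to-silence ratio, and the median and interquartile range of F0) can be illustrated with a small sketch. The abstract does not give the exact definitions used in the study, so the version below assumes SR is total speech time divided by total silence time, and that unvoiced frames are marked with an F0 of 0; the segment boundaries and F0 values are made up for illustration.

```python
from statistics import median, quantiles

def speech_ratio(speech_segments, total_duration):
    """Assumed definition: total speech time divided by total silence time."""
    speech = sum(end - start for start, end in speech_segments)
    silence = total_duration - speech
    if silence <= 0:
        raise ValueError("recording contains no silence; ratio is undefined")
    return speech / silence

def f0_summary(f0_values):
    """Median and interquartile range of F0 over voiced frames (F0 > 0)."""
    voiced = [f for f in f0_values if f > 0]
    q1, _, q3 = quantiles(voiced, n=4)
    return median(voiced), q3 - q1

# Hypothetical voice-activity segments (start, end) in seconds for a 10 s recording.
segments = [(0.0, 2.0), (3.0, 4.5), (6.0, 8.0)]
sr = speech_ratio(segments, 10.0)

# Hypothetical frame-level F0 track in Hz; 0 marks an unvoiced frame.
f0_median, f0_iqr = f0_summary([0, 110, 115, 0, 120, 125, 130, 0])

print("SR:", sr, "F0 median:", f0_median, "F0 IQR:", f0_iqr)
```

Under this assumed definition, a speaker who pauses more produces a lower SR, which matches the direction of the effect the abstract reports for MDD subjects and for harder tasks.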

Co-creating Requirements and Assessing End-User Acceptability of a Voice-Based Chatbot to Support Mental Health: A Thematic Analysis of a Living Lab Workshop

Mental health and mental wellbeing have become an important factor for many citizens navigating their way through their environment and in the workplace. New technology solutions such as chatbots are potential channels for supporting and coaching users to maintain a good state of mental wellbeing. Chatbots have the added value of providing social conversations and coaching 24/7 outside of conventional mental health services. However, little is known about the acceptability and user-led requirements of this technology. This paper uses a living lab approach to elicit requirements, opinions and attitudes towards the use of chatbots for supporting mental health. The data was collected from people living with anxiety or mild depression in a workshop setting. The audio of the workshop was recorded and a thematic analysis was carried out. The results are the co-created functional requirements and a number of use case scenarios that can guide future development of chatbots in the mental health domain.

Antonio Benítez-Guijarro, Raymond Bond, Frederick Booth, Zoraida Callejas, Edel Ennis, Anna Esposito, Matthias Kraus, Gavin McConvey, Michael McTear, Maurice Mulvenna, Courtney Potts, Louisa Pragst, Robin Turkington, Nicolas Wagner, Huiru Zheng

Development of a Dialogue System that Supports Recovery for Patients with Schizophrenia

Schizophrenia is a mental illness characterized by relapsing episodes of psychosis. Schizophrenia is treatable, and treatment with medicines and psychosocial support is effective. However, schizophrenia is one of the most expensive mental illnesses in terms of total medical costs required, including costs for effective treatment and for the continuous support and monitoring that is necessary. It is therefore useful and beneficial to explore how new technology, such as dialogue systems and social robots, can be used to provide help and assistance for care personnel as well as for the patients in the treatment and recovery from the illness. In this paper, we discuss various issues related to the development of a dialogue system that is able to recognize the characteristics of schizophrenia and provide support for schizophrenia patients 24 h a day.

Chiaki Oshiyama, Shin-ichi Niwa, Kristiina Jokinen, Takuichi Nishimura

Human-Robot Interaction


Caption Generation of Robot Behaviors Based on Unsupervised Learning of Action Segments

Bridging robot action sequences and their natural language captions is an important task for increasing the explainability of human-assisting robots, a recently evolving field. In this paper, we propose a system for generating natural language captions that describe behaviors of human-assisting robots. The system describes robot actions using robot observations (histories from actuator systems and cameras), working toward end-to-end bridging between robot actions and natural language captions. Two reasons make it challenging to apply existing sequence-to-sequence models to this mapping: (1) it is hard to prepare a large-scale dataset for every kind of robot and environment, and (2) there is a gap between the number of samples obtained from robot action observations and the number of words in generated captions. We introduced unsupervised segmentation based on K-means clustering to unify typical robot observation patterns into a class. This method makes it possible for the network to learn the relationship from a small amount of data. Moreover, we utilized a chunking method based on byte-pair encoding (BPE) to fill in the gap between the number of samples of robot action observations and words in a caption. We also applied an attention mechanism to the segmentation task. Experimental results show that the proposed model based on unsupervised learning can generate better descriptions than other methods. We also show that the attention mechanism did not work well in our low-resource setting.

Koichiro Yoshino, Kohei Wakimoto, Yuta Nishimura, Satoshi Nakamura

Towards a Natural Human-Robot Interaction in an Industrial Environment

Nowadays, modern industry has adopted robots as part of its processes. In many scenarios, such machines collaborate with humans to perform specific tasks in the same environment or simply guide them in a natural, safe and efficient way. Our approach improves on previous work on a multi-modal human-robot interaction system, with different audio acquisition and speech recognition modules for more natural communication. The semantic interpreter, with the aid of a knowledge manager, parses the resulting transcription and, using contextual information, selects the order that the operator has uttered and sends it to the robot to be executed. This setup is evaluated both quantitatively and qualitatively in a real manufacturing scenario in a laboratory environment with a large set of end users. The gathered results reveal that the system behaves robustly and that the assignment was also considered manageable by the end users, whilst the system overall was received with a high level of trust and usability.

Ander González-Docasal, Cristina Aceta, Haritz Arzelus, Aitor Álvarez, Izaskun Fernández, Johan Kildal

Nudges with Conversational Agents and Social Robots: A First Experiment with Children at a Primary School

This paper presents an experimental protocol during which human interlocutors interact with a dialog system capable of nudging, i.e. of influencing through indirect suggestions which can affect behaviour and decision making. This first experiment was undertaken with a population of young children with ages ranging from 5 to 10 years. The experiment was built to acquire video and audio data highlighting the propensity of automatic agents to nudge, whether they are humanoid robots or conversational agents, and to point out potential biases human interlocutors may have when conversing with them. Dialogues carried out with three types of agents were compared: a conversational agent (Google Home adapted for the experiment), a social robot (Pepper from Softbank Robotics) and a human. 91 French-speaking children participated in this first experiment, which took place in a private primary school. Dialogues are manually orthographically transcribed and annotated in terms of mental states (emotion, understanding, interest, etc.), affect bursts and language register, which together form what we call a user state. We report on an automatic user state detection experiment based on paralinguistic cues in order to build a future automatic nudging system that adapts to the user. First results highlight that the conversational agent and the robot are more influential in nudging children than the human interlocutor.

Hugues Ali Mehenni, Sofiya Kobylyanskaya, Ioana Vasilescu, Laurence Devillers

Exploring Boundaries Among Interactive Robots and Humans

Research and development of social robots have rapidly increased in recent years, and it is expected that usefulness of such agents in society will expand for various tasks, for which embodied natural interaction can make an important difference in interfacing humans and AI applications. This paper discusses challenges concerning the robot agent’s skills and knowledge in the development of interactive robot technology, focusing on a novel conceptualization of robot agents which cross boundaries from computational machines to human-like social agents and from human-controlled tools to autonomous co-workers.

Kristiina Jokinen

Data Augmentation, Collection and Manipulation


MixOut: A Simple Yet Effective Data Augmentation Scheme for Slot-Filling

We present a data augmentation strategy for slot-filling in task-oriented dialogue systems. It is simple yet effective and does not rely on external corpora. Lexicons for all slot types are generated from available annotated data. Synthetic, yet realistic utterances are then created by replacing slot values with other values of the same type. The method can also be easily extended to synthesize mixed language utterances for cross-lingual training. Monolingual experiments on 14 datasets across 10 different domains, 4 languages and cross-lingual experiments on 3 language pairs demonstrate the effectiveness of this method.

Mihir Kale, Aditya Siddhant
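
The augmentation idea described in the MixOut abstract above (build a lexicon for each slot type from the annotated data, then create new utterances by swapping slot values for other values of the same type) can be sketched roughly as follows. The data layout, function names and example utterances are illustrative assumptions, not the authors' implementation.

```python
import random
from collections import defaultdict

# Assumed data layout: an utterance is a list of (text, slot_type) segments,
# with slot_type set to None for text that lies outside any slot.

def build_lexicons(utterances):
    """Collect every value observed for each slot type in the annotated data."""
    lexicons = defaultdict(set)
    for segments in utterances:
        for text, slot_type in segments:
            if slot_type is not None:
                lexicons[slot_type].add(text)
    return {slot: sorted(values) for slot, values in lexicons.items()}

def augment(segments, lexicons, rng):
    """Copy an utterance, replacing each slot value with a random value of the same type."""
    return [
        (rng.choice(lexicons[slot_type]), slot_type) if slot_type is not None else (text, None)
        for text, slot_type in segments
    ]

def to_text(segments):
    """Join the segments back into a plain utterance string."""
    return " ".join(text for text, _ in segments)

# Hypothetical annotated training utterances.
train = [
    [("book a flight to", None), ("paris", "city"), ("on", None), ("monday", "day")],
    [("fly me to", None), ("tokyo", "city"), ("next", None), ("friday", "day")],
]

lexicons = build_lexicons(train)
rng = random.Random(0)  # seeded so the sketch is reproducible
synthetic = [augment(utterance, lexicons, rng) for utterance in train for _ in range(2)]

for segments in synthetic:
    print(to_text(segments))
```

Because the slot labels travel with the replaced values, each synthetic utterance is already annotated and can be added to the training set directly. For the cross-lingual variant mentioned in the abstract, one could presumably draw replacement values from a lexicon in another language, though the abstract does not spell out the details.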

Towards Similar User Utterance Augmentation for Out-of-Domain Detection

Data scarcity is a common issue in the development of Dialogue Systems from scratch, where it is difficult to find dialogue data. This scenario is more likely to occur when the system’s language is not English. This paper proposes a first text augmentation approach that selects samples similar to annotated user utterances from existing corpora, even if they differ in style, domain or content, in order to improve the detection of Out-of-Domain (OOD) user inputs. Three different sampling methods based on word vectors extracted from the BERT language representation model are compared. The evaluation is carried out using a Spanish chatbot corpus for OOD utterance detection, which has been artificially reduced to simulate various scenarios with different amounts of data. The presented approach is shown to be capable of enhancing the detection of OOD user utterances, achieving greater improvements when less annotated data is available.

Andoni Azpeitia, Manex Serras, Laura García-Sardiña, Mikel D. Fernández-Bhogal, Arantza del Pozo

Response Generation to Out-of-Database Questions for Example-Based Dialogue Systems

Example-based dialogue systems are often used in practice because of their robustness and simple architecture. However, when these systems are given out-of-database questions that are not registered in the question-response database, they have to respond with a fixed backup response, which can make users disengaged in the dialogue. In this study, we address response generation for out-of-database questions to make users perceive that the system understands the question itself. We define question types observed in the speed-dating scenario which is based on open-domain dialogue. Then we define possible response frames for each question type. We propose a sequence-to-sequence model that directly generates an appropriate response frame from an input question sentence in an end-to-end manner. The proposed model also explicitly integrates a question type classification to take into account the question type of the out-of-database question. Experimental results show that integrating the question type classification improved the response generation, and could exactly match 69.2% of response frames provided by human annotators.

Sota Isonishi, Koji Inoue, Divesh Lala, Katsuya Takanashi, Tatsuya Kawahara

Packing, Stacking, and Tracking: An Empirical Study of Online User Adaptation

This paper explores the application of expert tracking to online user adaptation based on a set of basic predictors in order to classify input in multimodal interaction settings. We compare the performance of this approach to other common approaches that aggregate multiple predictors, like stacking and voting. To realistically assess the performance of algorithms that require feedback, we added noise to feedback to simulate an imperfect system. Using two datasets, we obtained inconsistent results. With one dataset, expert tracking was the best option for short interactions, but with the other dataset, it was outperformed by other algorithms. In contrast, voting worked surprisingly well. On the basis of these results, we discuss implications and future directions.

Jean-Sébastien Laperrière, Darryl Lam, Kotaro Funakoshi
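
To make the comparison in the abstract above more concrete, the sketch below contrasts plain majority voting with a simple expert-tracking scheme under noisy feedback. The abstract does not say which expert-tracking algorithm the authors used; the classic weighted majority update is used here purely as a stand-in, and all class names, parameters and toy predictors are assumptions.

```python
import random
from collections import Counter

def vote(predictions):
    """Plain majority vote over the labels proposed by the base predictors."""
    return Counter(predictions).most_common(1)[0][0]

class WeightedMajority:
    """Stand-in expert tracker: experts that disagree with the feedback lose weight."""

    def __init__(self, n_experts, beta=0.5):
        self.weights = [1.0] * n_experts
        self.beta = beta  # multiplicative penalty applied to a mistaken expert

    def predict(self, predictions):
        scores = Counter()
        for weight, label in zip(self.weights, predictions):
            scores[label] += weight
        return scores.most_common(1)[0][0]

    def update(self, predictions, feedback):
        self.weights = [
            weight * self.beta if label != feedback else weight
            for weight, label in zip(self.weights, predictions)
        ]

def noisy(label, labels, flip_prob, rng):
    """Simulate imperfect feedback: with probability flip_prob, return a wrong label."""
    if rng.random() < flip_prob:
        return rng.choice([other for other in labels if other != label])
    return label

# Toy setting: expert 0 is always right, experts 1 and 2 are always wrong.
labels = ["a", "b"]
rng = random.Random(0)
tracker = WeightedMajority(n_experts=3)
for step in range(20):
    truth = labels[step % 2]
    wrong = labels[(step + 1) % 2]
    predictions = [truth, wrong, wrong]
    feedback = noisy(truth, labels, flip_prob=0.1, rng=rng)
    tracker.update(predictions, feedback)

print("final weights:", tracker.weights)
```

In this toy setting voting is always outnumbered by the two wrong predictors, whereas the tracker can shift its weight toward the reliable one. Real predictors are rarely this one-sided, which may be part of why the abstract reports that voting held up well.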

Language Identification, Grammar and Syntax


On the Use of Phonotactic Vector Representations with FastText for Language Identification

This paper explores a better way to learn word vector representations for language identification (LID). We have focused on a phonotactic approach that uses phoneme sequences to build phonotactic units (phone-grams) that incorporate context information. In order to take into consideration the morphology of phone-grams, we have considered the use of sub-word information (lower-order n-grams) to learn phone-gram embeddings using FastText. These embeddings are used as input to an i-Vector framework to train a multiclass logistic classifier. Our approach has been compared with a LID system that uses phone-gram embeddings learned through Skipgram, which do not use sub-word information, using Cavg as the metric for our experiments. Our approach of incorporating sub-word information into phone-gram embeddings significantly improves on the results obtained with embeddings that are learned ignoring the structure of phone-grams. Furthermore, we have shown that our system provides complementary information to an acoustic system, improving it through the fusion of both systems.

David Romero, Christian Salamea

The Influence of Syntax on the Perception of In-Vehicle Prompts and Driving Performance

Advances in Natural Language Generation technically enable dialog systems to output utterances of an arbitrary length. However, in order to provide the most efficient form of interaction, the complexity of voice output needs to be adapted to individual user needs and contexts. This paper investigates the influence of syntactic complexity on user experience and primary task performance with spoken interaction representing a secondary task, such as in the automotive context. For this purpose, we validate the approach of assessing user preferences concerning voice output. On this basis, we report the results of a user study, where participants interact with a simulated dialog system producing utterances of differing syntactic complexity. We conclude that the choice of a particular syntactic structure affects primary task performance. Equally, our analyses of user preferences suggest an influence on the perception of syntactic forms dependent on individual context and user characteristics.

Daniela Stier, Ulrich Heid, Patricia Kittel, Maria Schmidt, Wolfgang Minker

Learning Grammar in Confined Worlds

In this position paper we argue that modern machine learning approaches fail to adequately address how grammar and common sense should be learned. State-of-the-art language models achieve impressive results in a range of specialized tasks but lack an underlying understanding of the world. We advocate experiments in abstract, confined world environments in which agents interact, with an emphasis on learning world models. Agents are induced to learn the grammar needed to navigate the environment, so their grammar is grounded in this abstracted world. We believe that this grounded grammar will therefore facilitate a more realistic, interpretable and human-like form of common sense.

Graham Spinks, Ruben Cartuyvels, Marie-Francine Moens

Corpora and Knowledge Management


A Content and Knowledge Management System Supporting Emotion Detection from Speech

Emotion recognition has recently attracted much attention in both industrial and academic research as it can be applied in many areas from education to national security. In healthcare, emotion detection has a key role as emotional state is an indicator of depression and mental disease. Much research in this area focuses on extracting emotion-related features from images of the human face. Nevertheless, there are many other sources that can identify a person’s emotion. In the context of MENHIR, an EU-funded R&D project that applies Affective Computing to support people in their mental health, a new emotion-recognition system based on speech is being developed. However, this system requires comprehensive data-management support in order to manage its input data and analysis results. As a result, a cloud-based, high-performance, scalable, and accessible ecosystem for supporting speech-based emotion detection is currently being developed and is discussed here.

Binh Vu, Mikel deVelasco, Paul Mc Kevitt, Raymond Bond, Robin Turkington, Frederick Booth, Maurice Mulvenna, Michael Fuchs, Matthias Hemmje

A Script Knowledge Based Dialogue System for Indoor Navigation

We present an indoor navigation system that is based on natural spoken interaction. The system navigates the user through the University of Ulm based on scripts, supporting three different routes and varying communication styles for the system descriptions. Furthermore, the system is able to cope with incomplete scripts and inconclusive situations by passing the dialogue initiative to the user. In order to create the scripts, a data collection has been carried out in which 97 native speakers described the routes with the help of videos. In the end, the system has been evaluated in a user study with 30 participants. The work is part of the research project “MUNGO—Multi-User Indoor Navigation Using Natural Spoken Dialog and Self-learning Ontologies”, co-funded by the 2018 Google Faculty Research Award.

Juliana Miehle, Isabel Feustel, Wolfgang Minker, Stefan Ultes

Data Collection Design for Dialogue Systems for Low-Resource Languages

This paper presents our plan and initial design for constructing a dialogue corpus for a low-resource language, in this case Uyghur, with the ultimate goal of developing a dialogue system for Uyghur. We plan to design and create a massively multiplayer online role-playing game (MMORPG) using the RPG Maker MV Game Engine. We also introduce our initial design of a method for collecting various types of naturally generated questions and answers from native Uyghur speakers. Our method and the design of the game can be used for other low-resource languages to collect a large amount of dialogue data, which is crucial for implementing a dialogue system for such languages.

Zulipiye Yusupujiang, Jonathan Ginzburg

Conversational Systems Research in Spain: A Scientometric Approach

The aim of this paper is to present a preliminary scientometric study of the area of conversational systems in Spain. To do so, we used the Web of Science database to retrieve papers in the area, using a comprehensive list of keywords and considering those papers with at least one author with a Spanish affiliation. Our results present an overview of the main topics, authors and institutions involved in conversational system research and show the good status of Spanish research in this discipline.

David Griol, Zoraida Callejas

