
About this book

Data-driven methods have long been used in Automatic Speech Recognition (ASR) and Text-To-Speech (TTS) synthesis and have more recently been introduced for dialogue management, spoken language understanding, and Natural Language Generation. Machine learning is now present “end-to-end” in Spoken Dialogue Systems (SDS). However, these techniques require data collection and annotation campaigns, which can be time-consuming and expensive, as well as dataset expansion by simulation. In this book, we provide an overview of the current state of the field and of recent advances, with a specific focus on adaptivity.



Chapter 1. Conversational Interfaces

Although long anticipated by science fiction authors, speech interfaces have now, in 2012, arrived in everyday use. Late in 2011, Apple introduced “Siri”, a speech-enabled personal assistant for smartphones. In addition, Android phones employed speech-activated “Voice Actions” before the arrival of Siri, and in 2012 Google is rumoured to be developing a Siri-style interface with its “Majel” or “Assistant” project. Many other speech-enabled applications have also been deployed (e.g. Evi, Vlingo), and the market for speech applications is growing rapidly. Likewise, Microsoft’s “Kinect” controller has added new speech input capabilities to video game controllers.
Oliver Lemon

Chapter 2. Developing Dialogue Managers from Limited Amounts of Data

One of the central problems in developing a spoken dialogue system (SDS) is how the system decides “what to say next” at any specific point in a conversation. This selection of an appropriate action is the core problem of dialogue management (DM), and it depends on having a representation of the conversational context at each decision point. This context information could consist of, for example, what information has already been conveyed in the dialogue, what the user has said in the preceding utterance (according to a speech recogniser), and the length of the dialogue so far. Making decisions regarding what to say next has been approached in a variety of ways.
Verena Rieser, Oliver Lemon

Chapter 3. Data-Driven Methods for Spoken Language Understanding

Spoken dialogue systems need to be able to interpret the spoken input from the user. This is done by mapping the user’s spoken utterance to a representation of the meaning of that utterance, and then passing this representation to the dialogue manager. This process begins with the application of automatic speech recognition (ASR) technology, which maps the speech to hypotheses about the sequence of words in the utterance. It is then the job of spoken language understanding (SLU) to map the word recognition hypotheses to hypothesised meanings. The representation of this meaning is called the semantics of the utterance.
James Henderson, Filip Jurčíček

Chapter 4. User Simulation in the Development of Statistical Spoken Dialogue Systems

Statistical approaches to dialogue management have steadily increased in popularity over the last decade. Recent evaluations of such dialogue managers have shown their feasibility for sizeable domains and their advantage in terms of increased robustness. Moreover, simulated users have been shown to be highly beneficial in the development and testing of dialogue managers and, in particular, for training statistical dialogue managers. Learning the optimal policy of a POMDP dialogue manager is typically done using reinforcement learning (RL), but with the RL algorithms that are commonly used today, this process still relies on the use of a simulated user. Data-driven approaches to user simulation have been developed to train dialogue managers on more realistic user behaviour. This chapter provides an overview of user simulation techniques and evaluation methodologies. In particular, recent developments in agenda-based user simulation, dynamic Bayesian network-based simulations and inverse reinforcement learning-based user simulations are discussed in detail. Finally, we will discuss ongoing work and future challenges for user simulation.
Simon Keizer, Stéphane Rossignol, Senthilkumar Chandramohan, Olivier Pietquin

Chapter 5. Optimisation for POMDP-Based Spoken Dialogue Systems

Spoken dialogue systems (SDS) allow users to interact with a wide variety of information systems using speech as the primary, and often the only, communication medium. The principal elements of an SDS are a speech understanding component which converts each spoken input into an abstract semantic representation called a user dialogue act (see Chap. 3), a dialogue manager which responds to the user’s input and generates a system act a_t in response, and a message generator which converts each system act back into speech (see Chap. 6). At each turn t, the system updates its state s_t, and based on a policy π, it determines the next system act a_t = π(s_t). The state consists of the variables needed to track the progress of the dialogue and the attribute values (often called slots) that determine the user’s requirements. In conventional systems, as discussed in Chap. 8, the policy is usually defined by a flow chart with nodes representing states and actions and arcs representing user inputs.
Milica Gašić, Filip Jurčíček, Blaise Thomson, Steve Young
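
The turn-level loop described in the Chapter 5 abstract (update the state s_t, then choose a_t = π(s_t)) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the slot names, the act strings, and the hand-written policy are hypothetical and are not taken from the book. The policy shown is a fixed rule of the conventional kind; a POMDP-based system would instead maintain a distribution over states and learn π.

```python
from dataclasses import dataclass, field

# Hypothetical slots for an illustrative restaurant-search domain.
REQUIRED_SLOTS = ("food", "area")


@dataclass
class DialogueState:
    """State s_t: slot values gathered so far plus a turn counter."""
    slots: dict = field(default_factory=dict)
    turn: int = 0


def update_state(state, user_act):
    """Compute s_t from the previous state and the user dialogue act.

    The user act is modelled here as a plain dict of slot -> value.
    """
    slots = dict(state.slots)
    slots.update(user_act)
    return DialogueState(slots=slots, turn=state.turn + 1)


def policy(state):
    """Hand-written policy pi: request the first missing slot, else inform."""
    for slot in REQUIRED_SLOTS:
        if slot not in state.slots:
            return f"request({slot})"
    filled = ", ".join(f"{s}={state.slots[s]}" for s in REQUIRED_SLOTS)
    return f"inform({filled})"


def run_dialogue(user_acts):
    """Apply a_t = pi(s_t) at each turn and collect the system acts."""
    state = DialogueState()
    system_acts = []
    for user_act in user_acts:
        state = update_state(state, user_act)
        system_acts.append(policy(state))
    return system_acts


if __name__ == "__main__":
    for act in run_dialogue([{"food": "thai"}, {"area": "north"}]):
        print(act)
```

Keeping the state update and the policy as separate functions mirrors the separation in the abstract, so the rule-based `policy` could in principle be swapped for a learned one without touching the loop.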

Chapter 6. Statistical Approaches to Adaptive Natural Language Generation

Employing statistical models of users, generation contexts and of natural languages themselves has several potentially beneficial features: the ability to train models on real data, the availability of precise mathematical methods for optimisation, and the capacity to adapt robustly to previously unseen situations.
Oliver Lemon, Srini Janarthanam, Verena Rieser

Chapter 7. Metrics and Evaluation of Spoken Dialogue Systems

The ultimate goal of an evaluation framework is to determine a dialogue system’s performance, which can be defined as “the ability of a system to provide the function it has been designed for” [32]. Also important, particularly for industrial systems, is dialogue quality or usability. To measure usability, one can use subjective measures such as User Satisfaction or likelihood of future use. These subjective metrics are difficult to measure and depend on the context and the individual user, whose goals and values may differ from those of other users. This chapter will survey evaluation frameworks and discuss their advantages and disadvantages. We will examine metrics for evaluating system performance and dialogue quality. We will also discuss evaluation techniques that can be used to automatically detect problems in the dialogue, thus filtering out good dialogues and leaving poor dialogues for further evaluation and investigation [62].
Helen Hastie

Chapter 8. Data-Driven Methods in Industrial Spoken Dialog Systems

In the early 1990s, the performance of speech and language processing technology, combined with advanced voice user interface (VUI) design procedures, made it possible to start building conversational machines that could be deployed for commercial services offered to a large population of users [11]. Such machines would provide services typically assigned to call centers and human agents or to touch-tone (DTMF) interactive voice response (IVR) systems. Examples include providing travel information for trains or flights, routing phone calls to the appropriate department or agent, performing banking or stock market transactions, and providing technical support and troubleshooting. In general, conversational machines (in the following referred to as spoken dialog systems, or SDSs) consist of the following components
Roberto Pieraccini, David Suendermann

Chapter 9. Conclusion and Future Research Directions

Processing of speech and language inputs has been studied from a statistical point of view for quite some time. Data-driven methods pioneered by speech recognition researchers such as Rabiner [8] and Jelinek [2] in the late 1970s were applied to natural language understanding only in the early 1990s [5]. Only in the last decade has dialogue management benefited from statistical modeling and data-driven methods [3]. Following this trend, this book described the recent advances in statistical data-driven methods for spoken dialogue systems, especially within the European CLASSiC project funded under the Seventh Framework Programme. The aim of this project, as reflected by this book, was to produce generic methods for statistical optimization from end to end of a spoken dialogue system, starting with speech recognition and ending with speech synthesis. Machine learning techniques, such as reinforcement learning, were expected to provide useful approaches to this problem because of their ability to solve sequential decision-making problems but also because they rely on a strong mathematical foundation and on interpretable optimization criteria. Reinforcement learning has thus been applied for optimizing dialogue management and natural language generation (NLG) but also, to some extent, to produce user simulation techniques. Other machine learning methods, such as support vector machines (SVM) or Bayesian networks, were also integrated to make a fully operational system. This book summarized the technical and practical results obtained during this project.
Olivier Pietquin