
1998 | Book

Designing Interactive Speech Systems

From First Ideas to User Testing

Authors: Niels Ole Bernsen, Laila Dybkjær, Hans Dybkjær

Publisher: Springer London


About this Book

Designing Interactive Speech Systems describes the design and implementation of spoken language dialogue within the context of spoken language dialogue system (SLDS) development. Using an applications-oriented SLDS developed through the Danish Dialogue project, the authors describe the complete process involved in designing such a system, and in doing so present several innovative practical tools, such as dialogue design guidelines, in-depth evaluation methodologies and speech functionality analysis. The approach is firmly applications-oriented, describing the results of research applicable to industry and showing how the development of advanced applications drives research rather than the other way around. All those working on the research and development of spoken language services, especially in the area of telecommunications, will benefit from reading this book.

Table of Contents

Frontmatter
1. Interactive Speech Systems
Abstract
When we use a computer system to perform a certain task, the computer system acts both as a tool and as a partner in communication. It would be stretching the sense of the term “communication” beyond reasonable limits to say that one communicates with a spade when using it. The computer is different to the spade in important ways. The user must input information in some form in order to make the system execute. Similarly, to inform the user of its state, processes and their results, the computer must output information to the user. The information which is being exchanged between user and system during task performance can be represented in different forms, or modalities, using a variety of different input/output devices. For a wide range of tasks, the system can achieve task adequacy as a tool by exchanging information with its users in ways that are completely different to those of human-human communication, such as through a keyboard and mouse as input devices, the screen as an output device and typed command notation as the key modality for representing input information. With or without the inclusion of typed command input notation, this form of interaction is called the graphical user interface (GUI) paradigm.
Niels Ole Bernsen, Laila Dybkjær, Hans Dybkjær
2. Speech Interaction Theory
Abstract
With the spreading of interactive speech system technologies, a clear need arises for theory which may adequately support the development of increasingly sophisticated but still restricted interactive speech systems. A complete and applied theory of spoken human-machine interaction would rigorously support efficient interactive speech system development from initial requirements capture through to the test and maintenance phases. It would include support for interaction model development and implementation, appropriate functionality design, usability optimization, interactive speech system evaluation and maintenance. Above all, such a theory would have to be based on the fact that the interaction models of today’s interactive speech systems are all task-oriented: they enable the system to carry out spoken interaction with users in limited application domains (Smith and Hipp, 1994). When combined with a basic level of meta-communication, or communication about the interaction itself, task-orientation is what enables current systems to successfully undertake spoken dialogue with humans despite their many limitations compared to human interlocutors. These comparative limitations may be briefly illustrated by taking a look at spoken human-human communication.
Niels Ole Bernsen, Laila Dybkjær, Hans Dybkjær
3. Developing Interactive Speech Systems
Abstract
In the following chapters we describe the development and evaluation of interaction model and dialogue component aspects of advanced interactive speech systems in accordance with the idea of a rationalized development process presented in Section 1.2. Ideally, development and evaluation would be exhaustively presented on the basis of a consolidated and transparent version of a theory of spoken interaction such as the one presented in Chapter 2. For the time being, we can offer only a less comprehensive and more fragmented view. Advanced interactive speech system development has so far taken place mainly in research projects and a complete best practice methodology which can support, improve, make more efficient and help standardize the development and evaluation of advanced interactive speech systems is still far away. The methodology should specialize software engineering best practice to the particular purposes of advanced interactive speech system engineering by specifying in detail the methods (procedures, guidelines, heuristics), concepts and tools to be used in developing and evaluating advanced interactive speech systems as well as providing guidelines on when and how to use each method, set of concepts or tool.
Niels Ole Bernsen, Laila Dybkjær, Hans Dybkjær
4. Interaction Model Analysis and Design
Abstract
Interaction model analysis and design is a core issue in the development of advanced interactive speech systems (cf. Section 5.4). It starts at a low speed in the survey phase, is the focal point in the analysis and design phase, and continues during subsequent phases alternating with evaluation (Figure 3.1). Revisions always require analysis, and some form of re-design may be needed as late as in the acceptance test phase. In the analysis and design phase the aim is to develop the interaction model to such a level of formal detail that it can serve as a basis for implementation. The design specification initiated during the survey (Section 3.2) serves as a basis for establishing an interaction model for the system to be developed. The design specification is iteratively extended in the analysis and design phase because many new questions typically arise during interaction model design. These questions must be addressed, new design decisions made and conflicts arising from design decision making resolved, often through trade-offs among conflicting constraints. Results in terms of new design goals, constraints and modifications are added to the design specification and the development of the interaction model is continued on this evolving basis.
Niels Ole Bernsen, Laila Dybkjær, Hans Dybkjær
5. Wizard of Oz Simulation
Abstract
When the first interaction model has been designed (cf. Section 4.1), interactive speech system development may either go through a phase of Wizard of Oz (WOZ) simulations as will be described in this chapter, or go straight to implementation (Chapter 6) following the implement-test-revise approach (Figure 3.1). Today’s research on interaction model design for advanced interactive speech systems often includes the WOZ experimental prototyping method. In WOZ a human (the wizard) simulates all or part of the interaction model of the system to be developed, carrying out spoken interactions with users who are made to believe that they are interacting with a real system. WOZ is a relatively costly development method because: (1) the wizard needs a significant amount of training and support; (2) involving experimental subjects, WOZ experiments require careful planning and preparation and take time to run; and (3) experimental results have to be transcribed and analysed, which takes time and requires skill to benefit further system development. On the other hand, by producing data on the interaction between a (fully or partially) simulated system and its users, WOZ provides the basis for early tests of the system and its feasibility, as well as of the coverage and adequacy of requirements prior to implementation. The use of WOZ has so far been justified through the comparatively higher cost of having to revise an already implemented interactive speech system whose interaction model turned out to be seriously flawed, or of having to discard a system which users will not use. As recognition and parsing techniques continue to improve and the body of standard software grows, implement-test-revise methods are likely to gain ground in the design of advanced interactive speech systems.
Niels Ole Bernsen, Laila Dybkjær, Hans Dybkjær
6. Implementational Issues
Abstract
Neither in principle nor in practice do the implementational issues of interactive speech systems differ much from those of any other software system. Nevertheless, this chapter illustrates some important implementational issues raised in particular by the nature of the dialogue control layer.
Niels Ole Bernsen, Laila Dybkjær, Hans Dybkjær
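The dialogue control layer the chapter refers to can be pictured as a component that decides, at each turn, what the system says next. As a minimal sketch (purely illustrative; the states, prompts and travel domain below are invented for the example and are not the book's code), a finite-state controller might look like this:

```python
# Illustrative sketch of a dialogue control layer: a finite-state
# controller that maps (current state, user input) to the next state.
# State names, prompts and the toy "understanding" logic are invented.

class DialogueController:
    """Drives the dialogue by choosing the next system state and prompt."""

    def __init__(self):
        self.state = "GREETING"
        self.prompts = {
            "GREETING": "Welcome. Where would you like to travel to?",
            "GET_DATE": "On which date do you want to travel?",
            "CONFIRM": "Shall I book this ticket?",
            "DONE": "Thank you. Goodbye.",
        }

    def prompt(self):
        return self.prompts[self.state]

    def advance(self, user_input):
        # Very crude "understanding": a real system would call a
        # recogniser and parser here before updating the state.
        if self.state == "GREETING" and user_input:
            self.state = "GET_DATE"
        elif self.state == "GET_DATE" and user_input:
            self.state = "CONFIRM"
        elif self.state == "CONFIRM":
            if user_input.lower().startswith("y"):
                self.state = "DONE"
            else:
                self.state = "GREETING"  # start over on rejection
        return self.state

dc = DialogueController()
opening = dc.prompt()        # opening system turn
dc.advance("Copenhagen")
dc.advance("next Monday")
final = dc.advance("yes")    # "DONE"
```

Real dialogue control layers are considerably richer (handling meta-communication, repair and mixed initiative), but the basic shape, a state plus a transition policy, is the same.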
7. Corpus Handling
Abstract
A corpus is a collection or body of linguistic data, organized in a manner that will facilitate investigation of, and reference to, the data. By today’s standards, corpora are in machine-readable form. Dictionary publishers maintain corpora of citations and word uses, and researchers collect huge (millions of words) corpora of texts of all kinds for many different purposes. Corpus linguistics is both a well-established discipline and an active research area (McEnery and Wilson, 1996). A growing subdiscipline focuses on spoken language (Leech et al., 1995). Spoken corpora are collections of usually transcribed spoken language such as monologues, interviews, conversations or task-oriented dialogues. This chapter focuses on the transcription, markup and coding of spoken dialogue corpora, emphasizing the representations, procedures and tools that are relevant to the design of interactive speech systems.
Niels Ole Bernsen, Laila Dybkjær, Hans Dybkjær
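To make the notions of transcription and markup concrete, here is a minimal sketch of a transcribed dialogue fragment with simple event markup. The markup scheme (`@pause` for pauses, parenthesized tokens for non-speech events) is invented for illustration and is not the scheme used in the book:

```python
# Illustrative sketch: a tiny transcribed dialogue corpus with invented
# markup conventions, and a helper that strips the markup so that only
# the spoken words remain (e.g. for word counts or lexicon building).
import re

corpus = [
    {"turn": 1, "speaker": "S", "text": "Which date do you want to travel? @pause"},
    {"turn": 2, "speaker": "U", "text": "uh (noise) the twelfth of May"},
]

def plain_words(utterance):
    """Remove markup tokens ('@event' and '(...)'), keeping spoken words."""
    no_events = re.sub(r"@\w+|\([^)]*\)", " ", utterance)
    return no_events.split()

# Collect the user's actual words, markup removed.
user_words = [w for t in corpus if t["speaker"] == "U"
              for w in plain_words(t["text"])]
```

Separating the raw transcription from derived, markup-free views of it is the basic point: coding schemes can then be layered on top of the same transcribed data without altering it.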
8. Evaluation
Abstract
System evaluation is a highly important discipline which is tightly interwoven with system development. Evaluation is constantly needed throughout development to measure progress towards the goals that the system has to meet. Interactive speech system evaluation today is as much of an art and craft as it is an exact science with established standards and procedures of good engineering practice. In particular, little is still known about interaction model evaluation, including evaluation of dialogue components and integrated interactive speech systems. There is not even consensus on terminology. Following Hirschmann and Thompson (1996) (see also Gibbon et al., 1997) we will distinguish between three types of evaluation which, although they are clearly not orthogonal, seem to cover the relevant aspects of evaluation, and subsume the scopes of other commonly used terms and distinctions. Each of these three types of evaluation may be used at any stage of system development:
  • performance evaluation, i.e. measurements of the performance of the system and its components in terms of a set of quantitative parameters;
  • diagnostic evaluation, i.e. detection and diagnosis of design and implementation errors;
  • adequacy evaluation, i.e. assessment of how well the system and its components fit their purpose and meet actual user needs and expectations.
Niels Ole Bernsen, Laila Dybkjær, Hans Dybkjær
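A standard quantitative parameter of the first kind, performance evaluation, is the word error rate (WER) of the speech recognizer. The metric itself is standard; the sketch below is illustrative and not code from the book. WER is the word-level edit distance between the reference transcription and the recognizer's hypothesis, divided by the reference length:

```python
# Illustrative sketch: word error rate (WER), a common quantitative
# performance parameter for speech recognizers, computed via ordinary
# Levenshtein distance over words.

def word_error_rate(reference, hypothesis):
    """WER = (substitutions + insertions + deletions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution/match
    return d[len(ref)][len(hyp)] / len(ref)

# One deletion ("to") and one substitution ("monday" -> "sunday")
# against a six-word reference gives WER = 2/6.
wer = word_error_rate("i want to travel on monday", "i want travel on sunday")
```

Diagnostic and adequacy evaluation, by contrast, typically rely on qualitative analysis of transcribed dialogues and on user trials rather than on a single numeric score.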
9. Next Steps in Interactive Speech Systems
Abstract
The advanced interactive speech system technologies that we have been discussing up to this point are more or less at the level of the Danish Dialogue System. How far it will be possible to advance towards fully natural interactive speech systems on the basis of these technologies is an interesting, if somewhat vague, question. What are the issues ahead that will require significant changes of approach? In this chapter, we discuss two such issues. The first we have chosen to term the “heterogeneous task”, which appears to demand a significant increase in systems’ language processing skills and in the theoretical underpinnings of these skills (Section 9.2). The second issue is multimodality. In the future, spoken human-system interaction will no doubt become much more similar to natural human-human spoken interaction than is currently the case. However, as long as the interaction is purely spoken, and hence unimodal, it remains far from the ideal of fully natural human-human communication presented in Section 1.1. Section 9.3 presents a range of multimodal systems which actually or potentially incorporate advanced interactive speech technologies, and discusses ways to develop a systematic understanding of such Advanced Multimodal Interactive Speech Systems (AMISSs).
Niels Ole Bernsen, Laila Dybkjær, Hans Dybkjær
Backmatter
Metadata
Title
Designing Interactive Speech Systems
Authors
Niels Ole Bernsen
Laila Dybkjær
Hans Dybkjær
Copyright Year
1998
Publisher
Springer London
Electronic ISBN
978-1-4471-0897-9
Print ISBN
978-3-540-76048-1
DOI
https://doi.org/10.1007/978-1-4471-0897-9