Multimodal Interactive Pattern Recognition and Applications

Authors: Alejandro Héctor Toselli, Enrique Vidal, Francisco Casacuberta

Publisher: Springer London

Included in: Springer Professional "Wirtschaft+Technik", Springer Professional "Technik", Springer Professional "Wirtschaft"

About this book

This book presents a different approach to pattern recognition (PR) systems, in which users of a system are involved during the recognition process. This can help to avoid later errors and reduce the costs associated with post-processing. The book also examines a range of advanced multimodal interactions between the machine and the users, including handwriting, speech and gestures. Features: presents an introduction to the fundamental concepts and general PR approaches for multimodal interaction modeling and search (or inference); provides numerous examples and a helpful Glossary; discusses approaches for computer-assisted transcription of handwritten and spoken documents; examines systems for computer-assisted language translation, interactive text generation and parsing, relevance-based image retrieval, and interactive document layout analysis; reviews several full working prototypes of multimodal interactive PR applications, including live demonstrations that can be publicly accessed on the Internet.

Table of contents

Frontmatter

Chapter 1. General Framework

Abstract

The design paradigm for Pattern Recognition (PR) systems has been shifting from full automation towards systems in which the decision process is conditioned by human feedback. This shift is motivated by the fact that full automation often proves elusive, or unnatural, in many applications where technology is expected to assist rather than replace human agents.

This chapter examines the challenges and research opportunities entailed by placing PR within the human-interaction framework, namely: (a) taking direct advantage of the feedback provided by the user in each interaction step to improve raw performance; (b) acknowledging the inherent multimodality of interaction to improve overall system behavior and usability; and (c) using the feedback-derived data to tune the system to the user's behavior and to the specific task considered, by means of adaptive learning techniques.

One of the most influential factors in the rapid development of PR technology over the last few decades is the now widely adopted assessment paradigm based on labeled training and testing corpora. This chapter discusses simple but realistic “user models”, or interaction protocols, together with assessment criteria that allow this successful labeled-corpus-based paradigm to be applied in the interactive scenario as well.
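
To make the idea of a corpus-based "user model" concrete, here is a minimal sketch of how an interactive session might be simulated from a labeled reference alone. Everything in it is an assumption made for illustration and is not taken from the book: the left-to-right prefix-correction protocol, the toy n-best predictor, and the names `simulate_session`, `make_nbest_predictor` and `edit_distance` are all hypothetical. The simulated user corrects the first wrong word, the system re-predicts the rest, and the number of corrections is compared with the word-level edit distance a post-editor would face.

```python
from typing import Callable, List, Sequence

# Hypothetical interface: given a validated prefix, return a predicted suffix.
SuffixPredictor = Callable[[Sequence[str]], List[str]]


def make_nbest_predictor(nbest: Sequence[Sequence[str]]) -> SuffixPredictor:
    """Toy system (assumption): pick the best-ranked candidate that is
    compatible with the validated prefix and return its remaining words."""
    def predict_suffix(prefix: Sequence[str]) -> List[str]:
        n = len(prefix)
        for candidate in nbest:
            if list(candidate[:n]) == list(prefix):
                return list(candidate[n:])
        return []  # no compatible candidate: nothing to propose
    return predict_suffix


def simulate_session(predict_suffix: SuffixPredictor,
                     reference: Sequence[str]) -> int:
    """Count the corrections a simulated user needs to reach the reference."""
    reference = list(reference)
    prefix: List[str] = []
    corrections = 0
    while True:
        hypothesis = prefix + predict_suffix(prefix)
        if hypothesis == reference:
            return corrections
        corrections += 1
        # Find the first position where hypothesis and reference disagree.
        k = len(prefix)
        while (k < min(len(hypothesis), len(reference))
               and hypothesis[k] == reference[k]):
            k += 1
        if k >= len(reference) - 1:
            # The correction supplies the last word or removes surplus words;
            # we assume the same action also ends the session.
            return corrections
        # The user fixes word k, which validates everything up to and including it.
        prefix = reference[:k + 1]


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Word-level Levenshtein distance, used as the post-editing baseline."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        cur = [i]
        for j, wb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (wa != wb)))
        prev = cur
    return prev[-1]


reference = "the cat sat on the mat".split()
nbest = [
    "the cat sat in the hat".split(),  # best-ranked, contains two errors
    "the cat sat on the mat".split(),  # correct, but ranked second
]

interactive_cost = simulate_session(make_nbest_predictor(nbest), reference)
post_edit_cost = edit_distance(nbest[0], reference)
print(interactive_cost, post_edit_cost)
```

In this toy case a single correction ("in" to "on") lets the system recover the rest of the sentence, whereas post-editing the initial hypothesis would take two word edits. Averaging such counts over a test corpus is one way a labeled corpus could stand in for real users during assessment.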

This chapter also introduces general approaches to solving the underlying interactive search problems by building on existing methods for the corresponding non-interactive counterparts, and gives an overview of modern machine learning approaches that can be useful in the interactive framework.
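
As a hedged illustration of how an interactive search problem can build on its non-interactive counterpart, one common way to write the two is shown below. The notation is an assumption made here and is not quoted from the abstract: $x$ is the input signal, $h$ a hypothesis from the set $\mathcal{H}$, and $f$ the feedback given by the user. The non-interactive search on the left picks the most probable hypothesis for the input; the interactive search on the right is the same maximization, additionally conditioned on the feedback.

```latex
\hat{h} = \operatorname*{arg\,max}_{h \in \mathcal{H}} \Pr(h \mid x)
\qquad\longrightarrow\qquad
\hat{h} = \operatorname*{arg\,max}_{h \in \mathcal{H}} \Pr(h \mid x, f)
```

When the feedback simply validates part of a previous hypothesis, the right-hand search can often reuse the non-interactive decoder, restricted to hypotheses compatible with the validated part.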