2011 | Book

Multimodal Interactive Pattern Recognition and Applications

Authors: Alejandro Héctor Toselli, Enrique Vidal, Francisco Casacuberta

Publisher: Springer London

About this book

This book presents a different approach to pattern recognition (PR) systems, in which users of a system are involved during the recognition process. This can help to avoid later errors and reduce the costs associated with post-processing. The book also examines a range of advanced multimodal interactions between the machine and the users, including handwriting, speech and gestures. Features: presents an introduction to the fundamental concepts and general PR approaches for multimodal interaction modeling and search (or inference); provides numerous examples and a helpful Glossary; discusses approaches for computer-assisted transcription of handwritten and spoken documents; examines systems for computer-assisted language translation, interactive text generation and parsing, relevance-based image retrieval, and interactive document layout analysis; reviews several full working prototypes of multimodal interactive PR applications, including live demonstrations that can be publicly accessed on the Internet.

Table of contents

Frontmatter
Chapter 1. General Framework
Abstract
The paradigm for designing Pattern Recognition (PR) systems has lately been shifting from full automation to systems in which the decision process is conditioned by human feedback. This shift is motivated by the fact that full automation often proves elusive or unnatural in many applications where technology is expected to assist rather than replace the human agents.
This chapter examines the challenges and research opportunities entailed by placing PR within the human-interaction framework, namely: (a) taking direct advantage of the feedback information provided by the user in each interaction step to improve raw performance; (b) acknowledging the inherent multimodality of interaction to improve overall system behavior and usability; and (c) using the feedback-derived data to tune the system to the user behavior and the specific task considered, by means of adaptive learning techniques.
One of the most influential factors in the rapid development of PR technology over the last few decades is the now commonly adopted assessment paradigm based on labeled training and testing corpora. This chapter discusses simple but realistic “user models” or interaction protocols, together with assessment criteria, which allow this successful labeled-corpus-based assessment paradigm to be applied in the interactive scenario as well.
This chapter also introduces general approaches for solving the underlying interactive search problems, building on existing methods for the corresponding non-interactive counterparts, and gives an overview of modern machine learning approaches that can be useful in the interactive framework.
Alejandro Héctor Toselli, Enrique Vidal, Francisco Casacuberta
Chapter 2. Computer Assisted Transcription: General Framework
Abstract
This chapter describes the common basis of the computer assisted transcription approaches presented in the three subsequent chapters (Chaps. 3, 4 and 5). It also gives a general overview of the common features of the up-to-date systems we have employed for handwritten text and speech recognition.
The specific mathematical formulation and modeling suited to interactive transcription of handwritten text images and speech signals are derived from a particular instantiation of the general interactive–predictive framework introduced in Sect. 1.3.3. On this basis, and by adopting the passive left-to-right interaction protocol described in Sect. 1.4.2, the two basic computer assisted handwriting and speech transcription approaches are developed (detailed in Chaps. 3 and 4, respectively), along with the evaluation measures used to assess their performance.
Alejandro Héctor Toselli, Enrique Vidal, Francisco Casacuberta
Chapter 3. Computer Assisted Transcription of Text Images
Abstract
Grounded in the interactive–predictive transcription framework outlined in the previous chapter, an interactive approach for efficient transcription of handwritten text images is presented, along with its more ergonomic and multimodal variants. Rather than full automation, all these approaches aim at assisting the expert efficiently in the transcription process itself. To this end, an interactive scenario is set out in which an automatic handwriting recognition system and a human transcriber (the user) cooperate to produce the final transcription of the text images.
Additionally, the basic off-line and on-line handwritten text recognition (HTR) systems embedded in the CATTI approaches are explained in some detail. The explanation focuses mainly on preprocessing, feature extraction and specific aspects of the modeling and decoding (search) process, which complement those already introduced in Sect. 2.2.
Moreover, this chapter shows how user-interaction feedback directly improves system accuracy, while multimodality increases system ergonomics and user acceptability. Multimodal interaction is approached in such a way that the main and the feedback data streams help each other to optimize overall performance and usability. All of this is supported by experimental results obtained on three cursive handwriting tasks, which suggest that these approaches can save considerable amounts of user effort with respect to both purely manual work and non-interactive post-editing.
Alejandro Héctor Toselli, Enrique Vidal, Francisco Casacuberta
Chapter 4. Computer Assisted Transcription of Speech Signals
Abstract
Automatic Speech Recognition has been widely employed in recent years. However, when a perfect transcription of the input is required, it is still necessary to rely on a human operator who supervises and corrects the mistakes that recognition systems usually make. Although the use of automatic systems can speed up the transcription process significantly, the intervention of these human supervisors can slow it down considerably. For this reason, applying the Interactive Pattern Recognition approach to this task is a good opportunity to improve the cooperation between the computer and the human when an error-free transcribed document is needed.
In this chapter, an interactive multimodal approach for efficient transcription of speech signals is presented. Rather than full automation, this approach aims at assisting the expert in the transcription process itself. To this end, an interactive scenario is proposed, based on a cooperative process between an automatic recognition system and a human transcriber to generate the final transcription of the speech signal. It is shown how user feedback directly improves system accuracy, while multimodality increases system ergonomics and user acceptability.
Alejandro Héctor Toselli, Enrique Vidal, Francisco Casacuberta
Chapter 5. Active Interaction and Learning in Handwritten Text Transcription
Abstract
Computer-assisted systems are being increasingly used in a variety of real-world tasks, though their application to handwritten text transcription in old manuscripts remains largely unexplored. The basic idea explored in this chapter is to follow a sequential, line-by-line transcription of the whole manuscript in which a continuously retrained system interacts with the user to efficiently transcribe each new line. User interaction is expensive in terms of time and cost. Our top priority is to take advantage of these interactions while reducing them as much as possible.
To this end, we study three different frameworks: (a) improving a recognition system from newly recognized transcriptions via adaptation, using semi-supervised learning techniques; (b) studying how best to adapt from limited user supervision, which is related to active learning; and (c) developing a simple error estimate, which is used to let the user adjust the error in a computer-assisted transcription task. In addition, we test these approaches in the sequential transcription of two old text documents.
Alejandro Héctor Toselli, Enrique Vidal, Francisco Casacuberta
Chapter 6. Interactive Machine Translation
Abstract
Achieving high-quality translation between any pair of languages is not possible with current Machine Translation (MT) technology; human post-editing of the outputs of the MT system is still necessary. MT is therefore a suitable area in which to apply the Interactive Pattern Recognition (IPR) framework, and this application has led to what is now known as Interactive Machine Translation (IMT). An IMT system predicts the translation of a given source sentence, and the human translator can accept it or correct some of its errors. The system then uses the text amended by the human translator to suggest new, improved translations with the same translation models, in an iterative process that continues until the human accepts the whole output.
As in other areas where IPR is being applied, IMT offers a good framework for adaptive learning. The consolidated translations obtained through the successive steps of the interaction process can easily be converted into new, fresh training data, useful for dynamically adapting the system to the changing environment. IMT also makes it possible to take advantage of available multi-modal interfaces to increase productivity. Multi-modal interfaces and adaptive learning in IMT are covered in Chaps. 7 and 8, respectively.
Alejandro Héctor Toselli, Enrique Vidal, Francisco Casacuberta
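The iterative accept-or-correct process described in the Chapter 6 abstract can be illustrated with a minimal simulation. This is only a sketch under stated assumptions, not the book's method: the function names, the toy candidate list standing in for a translation model, and the simulated user who always fixes the first wrong word are all hypothetical.

```python
from typing import Callable, List


def simulate_left_to_right(
    reference: List[str],
    complete: Callable[[List[str]], List[str]],
) -> int:
    """Count the corrections a simulated user needs to reach `reference`.

    `complete(prefix)` is an assumed system interface: given the validated
    prefix, it returns a proposed continuation (a list of words).
    """
    prefix: List[str] = []
    corrections = 0
    while True:
        hypothesis = prefix + complete(prefix)
        if hypothesis == reference:
            return corrections  # the user accepts the whole output
        # Length of the longest correct prefix of the hypothesis.
        k = 0
        while (k < len(hypothesis) and k < len(reference)
               and hypothesis[k] == reference[k]):
            k += 1
        corrections += 1
        if k == len(reference):
            return corrections  # only surplus trailing words had to be deleted
        # The user validates the correct prefix and types the next correct word.
        prefix = reference[:k + 1]


def make_toy_completer(candidates: List[List[str]]) -> Callable[[List[str]], List[str]]:
    """Hypothetical stand-in for a translation model: returns the remainder
    of the first candidate that is consistent with the validated prefix."""
    def complete(prefix: List[str]) -> List[str]:
        for cand in candidates:
            if cand[:len(prefix)] == prefix:
                return cand[len(prefix):]
        return []  # nothing to suggest
    return complete


reference = "the house is very small".split()
candidates = [
    "the home is small".split(),
    "the house is small".split(),
    "the house is very small".split(),
]
corrections = simulate_left_to_right(reference, make_toy_completer(candidates))
print(corrections, corrections / len(reference))
```

In this toy run the user corrects "home" to "house" and then types "very", after which the system's suggestion matches the reference. Dividing the number of corrections by the reference length gives a simple per-word effort figure for the simulated session.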
Chapter 7. Multi-Modality for Interactive Machine Translation
Abstract
In the Interactive Machine Translation (IMT) framework, a human translator can interact with the IMT system to achieve a high-quality translation. This is done through basic editing operations, i.e. substitution or deletion of erroneous words or insertion of missing words. This process is usually performed with the keyboard. While the keyboard is considered the principal way of entering text into a computer, other modalities can provide useful information to improve IMT performance or to increase system ergonomics.
One example of a modality that can improve performance is pointer interaction, which gives implicit and explicit information that can be of great use to an IMT system. Additionally, the speech and handwritten text modalities can increase the system’s usability and ergonomics. This is especially true for the new kinds of keyboard-less devices that are rapidly gaining popularity, such as touch-screen tablets and mobile phones.
Alejandro Héctor Toselli, Enrique Vidal, Francisco Casacuberta
Chapter 8. Incremental and Adaptive Learning for Interactive Machine Translation
Abstract
High-quality translation between any pair of languages can be achieved by human post-editing of the outputs of an MT system or, as mentioned in Chap. 6, by following the Interactive Machine Translation (IMT) approach. Within the interactive pattern recognition framework, IMT predicts the translation of the next words in the output and suggests them to the human translator, who iteratively accepts or corrects the suggested translations. The consolidated translations obtained through the successive steps of the interaction process can be considered “perfect translations”, since they have been validated by a human expert. These consolidated translations can therefore easily be converted into new, fresh training data, useful for dynamically adapting the system to the changing environment. On the one hand, the IMT paradigm thus offers an appropriate framework for incremental and adaptive learning in statistical machine translation (SMT). On the other hand, incremental and adaptive learning can substantially reduce human effort by sparing the user from making the same corrections again and again.
Alejandro Héctor Toselli, Enrique Vidal, Francisco Casacuberta
Chapter 9. Interactive Parsing
Abstract
This chapter introduces the Interactive Parsing (IP) framework for obtaining the correct syntactic parse tree of a given sentence. This formal framework allows us to construct interactive systems for tree annotation. These interactive systems can help human annotators to create error-free parse trees with little effort, compared with manual post-editing of the trees provided by an automatic parser.
In principle, the interaction protocol defined in the IP framework differs from the left-to-right interaction protocol used throughout this book. Specifically, the IP protocol is of desultory order; that is, in IP the user can edit any part of the parse tree, in any order. However, in order to calculate the next best tree efficiently in the IP framework, a left-to-right depth-first tree review order is introduced in Sect. 9.4. This order also brings computational advantages to the search for the most probable tree in interactive bottom-up parsing algorithms. The use of Confidence Measures in IP is also presented as an efficient technique for detecting erroneous parse trees. Confidence Measures can be computed efficiently in the IP framework and can help to detect erroneous constituents more quickly, as they provide discriminant information throughout the IP process.
Alejandro Héctor Toselli, Enrique Vidal, Francisco Casacuberta
Chapter 10. Interactive Text Generation
Abstract
Using a computer to produce text documents is essentially a manual task nowadays. The computer is basically seen as an electronic typewriter, and all the effort required falls on the human user, who first has to think of a grammatically and semantically correct piece of text and then type it on the computer. Although human beings are usually quite efficient at this task, in some cases the process can be very time-consuming. Writing text in a non-native language, using devices with highly constrained input interfaces, and computer use by people with impairments are only a few examples. Providing some kind of automation in these scenarios could be really useful.
Interactive Text Prediction deals with providing assistance in document typing tasks. IPR techniques are used to predict what the user is going to type, given the text typed previously. Prediction is studied both at the word level and at the character level but, in both cases, the aim is to predict multi-word text chunks, not just a single next word or word fragment. Empirical tests suggest that significant amounts of user typing (and to some extent also thinking) effort can be saved using the proposed approaches. Alternative strategies for performing the search in this type of task are also presented and discussed in detail.
Alejandro Héctor Toselli, Enrique Vidal, Francisco Casacuberta
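The idea in the Chapter 10 abstract of predicting a multi-word chunk from the text typed so far can be sketched with a very small word-level model. The greedy bigram predictor below is an illustrative stand-in chosen for brevity, not the search strategy studied in the chapter; the function names and the three-sentence corpus are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def train_bigrams(sentences: List[List[str]]) -> Dict[str, Counter]:
    """Count, for every word, which words follow it in the training text."""
    follow: Dict[str, Counter] = defaultdict(Counter)
    for words in sentences:
        for prev, nxt in zip(words, words[1:]):
            follow[prev][nxt] += 1
    return follow


def predict_chunk(follow: Dict[str, Counter], typed: List[str],
                  max_words: int = 3) -> List[str]:
    """Greedily extend the typed text with the most frequent next word,
    up to `max_words` words (a multi-word chunk, not a single word)."""
    chunk: List[str] = []
    if not typed:
        return chunk
    last = typed[-1]
    for _ in range(max_words):
        if last not in follow:
            break  # no continuation was ever observed after this word
        nxt = follow[last].most_common(1)[0][0]
        chunk.append(nxt)
        last = nxt
    return chunk


corpus = [
    "thank you very much".split(),
    "thank you for your help".split(),
    "thank you very much indeed".split(),
]
model = train_bigrams(corpus)
print(predict_chunk(model, ["thank"]))  # ['you', 'very', 'much']
```

If the user accepts the whole chunk, three words are entered with a single action; if only part of it is right, the accepted part becomes the new typed context and the predictor is called again.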
Chapter 11. Interactive Image Retrieval
Abstract
This chapter presents search methods for image retrieval which are boosted using the user’s supervision by means of the human–computer interaction methodology. Two contributions are presented which cover different aspects of this problem.
The first one deals with classical relevance feedback in content-based image retrieval, but with a formulation directly derived from the IPR paradigm adopted throughout this book. This formulation helps to highlight the beneficial role of “consistency” among the retrieved images. The second contribution considers the use of a complementary text-based “modality” to express the user's relevance feedback, which leads to improved retrieval results.
Alejandro Héctor Toselli, Enrique Vidal, Francisco Casacuberta
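As a rough illustration of classical relevance feedback as mentioned in the Chapter 11 abstract, the sketch below uses the well-known Rocchio query update. Note that this is a substituted textbook technique, not the IPR-derived formulation of the chapter; the two-dimensional feature vectors, the image names and the weights are made-up values for illustration only.

```python
from typing import Dict, List, Sequence


def rocchio_update(query: Sequence[float],
                   relevant: List[Sequence[float]],
                   non_relevant: List[Sequence[float]],
                   alpha: float = 1.0, beta: float = 0.75,
                   gamma: float = 0.15) -> List[float]:
    """Move the query toward images marked relevant and away from
    images marked non-relevant (Rocchio formula)."""
    def centroid(vectors: List[Sequence[float]]) -> List[float]:
        if not vectors:
            return [0.0] * len(query)
        return [sum(col) / len(vectors) for col in zip(*vectors)]

    rel_c = centroid(relevant)
    non_c = centroid(non_relevant)
    return [alpha * q + beta * r - gamma * n
            for q, r, n in zip(query, rel_c, non_c)]


def rank(query: Sequence[float], images: Dict[str, Sequence[float]]) -> List[str]:
    """Order image names by squared Euclidean distance to the query."""
    def dist(name: str) -> float:
        return sum((a - b) ** 2 for a, b in zip(query, images[name]))
    return sorted(images, key=dist)


# Hypothetical feature vectors for three images.
images = {"beach": [1.0, 0.0], "sunset": [0.8, 0.6], "forest": [0.0, 1.0]}
query = [0.5, 0.5]
first_ranking = rank(query, images)

# Simulated feedback: the user marks "forest" relevant, "beach" non-relevant.
new_query = rocchio_update(query, [images["forest"]], [images["beach"]])
second_ranking = rank(new_query, images)
print(first_ranking[0], second_ranking)
```

After one round of feedback the image marked relevant moves to the top of the ranking and the one marked non-relevant drops to the bottom, which is the basic loop that relevance-feedback retrieval repeats until the user is satisfied.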
Chapter 12. Prototypes and Demonstrators
Abstract
This chapter presents several full working prototypes and demonstrators of multimodal interactive pattern recognition applications. These systems serve as validating examples of the approaches that have been proposed and described throughout this book. Among other interesting things, they are designed to enable a true human–computer interaction on selected tasks.
To begin, we describe the different protocols that were tested, namely Passive Left-to-Right, Passive Desultory, and Active. Each demonstrator is described in sufficient detail to give the reader an overview of the underlying technologies. The prototypes covered in this chapter relate to transcription of text images (IHT, GIDOC), machine translation (IMT), speech transcription (IST), text generation (ITG), and image retrieval (RISE). Additionally, most of these prototypes report evaluation measures of the reduction in user effort at the end of the process. Finally, some of these demonstrators come with web-based versions, whose addresses are included so that the reader can test and practice with the different implemented applications.
Alejandro Héctor Toselli, Enrique Vidal, Francisco Casacuberta
Backmatter
Metadata
Title
Multimodal Interactive Pattern Recognition and Applications
Authors
Alejandro Héctor Toselli
Enrique Vidal
Francisco Casacuberta
Copyright year
2011
Publisher
Springer London
Electronic ISBN
978-0-85729-479-1
Print ISBN
978-0-85729-478-4
DOI
https://doi.org/10.1007/978-0-85729-479-1