2011 | OriginalPaper | Chapter

12. Prototypes and Demonstrators

Authors: Dr. Alejandro Héctor Toselli, Dr. Enrique Vidal, Prof. Francisco Casacuberta

Published in: Multimodal Interactive Pattern Recognition and Applications

Publisher: Springer London

Abstract

This chapter presents several fully working prototypes and demonstrators of multimodal interactive pattern recognition applications. These systems validate the approaches proposed and described throughout this book. Among other things, they are designed to enable true human–computer interaction on selected tasks.

We begin by describing the different protocols that were tested, namely Passive Left-to-Right, Passive Desultory, and Active. Each demonstrator is then described in enough detail to give the reader an overview of the underlying technologies. The prototypes covered in this chapter address transcription of text images (IHT, GIDOC), machine translation (IMT), speech transcription (IST), text generation (ITG), and image retrieval (RISE). Most of these prototypes also report evaluation measures of the reduction in user effort achieved by the end of the process. Finally, some of the demonstrators have web-based versions, whose addresses are included so that the reader can try out the implemented applications.

TMX is an open XML standard for the exchange of translation documents.
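
As a rough illustration of what a TMX document looks like and how it can be read, the sketch below parses a tiny hand-written TMX 1.4 sample with Python's standard library. The sample sentences, the `creationtool` value, and the helper name `read_translation_units` are invented for this example and are not taken from the chapter or its prototypes.

```python
import xml.etree.ElementTree as ET

# The xml:lang attribute lives in the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample: two translation units (tu), each holding one
# variant (tuv) per language with the text in a seg element.
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0"
          segtype="sentence" o-tmf="none" adminlang="en"
          srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Click the button.</seg></tuv>
      <tuv xml:lang="es"><seg>Pulse el botón.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Save the file.</seg></tuv>
      <tuv xml:lang="es"><seg>Guarde el archivo.</seg></tuv>
    </tu>
  </body>
</tmx>
"""


def read_translation_units(tmx_text):
    """Return one {language: segment text} dict per translation unit."""
    root = ET.fromstring(tmx_text)
    units = []
    for tu in root.iter("tu"):
        segments = {}
        for tuv in tu.findall("tuv"):
            lang = tuv.get(XML_LANG)
            seg = tuv.find("seg")
            if lang is not None and seg is not None:
                # itertext() also collects text nested in inline markup.
                segments[lang] = "".join(seg.itertext())
        units.append(segments)
    return units


units = read_translation_units(SAMPLE_TMX)
for unit in units:
    print(unit["en"], "->", unit["es"])
```

Keeping each unit as a dictionary keyed by language code makes it straightforward to pull out source and target sentence pairs for any language combination present in the file.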

The tree visiting order is left-to-right depth-first.
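
A minimal sketch of such a traversal follows. The `Node` class, the toy tree, and the choice to visit each node before its children (pre-order) are assumptions made for illustration; the note above only states that the order is left-to-right and depth-first.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Node:
    """Hypothetical tree node: a label plus an ordered list of children."""
    label: str
    children: List["Node"] = field(default_factory=list)


def depth_first_left_to_right(root: Node) -> Iterator[Node]:
    """Yield nodes depth-first, visiting children from left to right."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        # Push children in reverse so the leftmost child is popped first.
        stack.extend(reversed(node.children))


# Toy tree, invented for this example.
tree = Node("S", [
    Node("NP", [Node("Det"), Node("N")]),
    Node("VP", [Node("V"), Node("NP")]),
])

order = [node.label for node in depth_first_left_to_right(tree)]
print(order)
```

An explicit stack is used instead of recursion so that very deep trees do not hit Python's recursion limit.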

Title: Prototypes and Demonstrators
Authors: Dr. Alejandro Héctor Toselli, Dr. Enrique Vidal, Prof. Francisco Casacuberta
Publisher: Springer London
Book: Multimodal Interactive Pattern Recognition and Applications
Print ISBN: 978-0-85729-478-4
Electronic ISBN: 978-0-85729-479-1
Copyright Year: 2011
DOI: https://doi.org/10.1007/978-0-85729-479-1_12
