
2011 | OriginalPaper | Chapter

12. Prototypes and Demonstrators

Authors: Dr. Alejandro Héctor Toselli, Dr. Enrique Vidal, Prof. Francisco Casacuberta

Published in: Multimodal Interactive Pattern Recognition and Applications

Publisher: Springer London


Abstract

This chapter presents several fully working prototypes and demonstrators of multimodal interactive pattern recognition applications. These systems serve as validating examples of the approaches proposed and described throughout this book. They are designed to enable true human–computer interaction on selected tasks.
We first describe the interaction protocols that were tested, namely Passive Left-to-Right, Passive Desultory, and Active. Each demonstrator is then described in enough detail to give the reader an overview of the underlying technologies. The prototypes covered in this chapter address transcription of text images (IHT, GIDOC), machine translation (IMT), speech transcription (IST), text generation (ITG), and image retrieval (RISE). In addition, most of these prototypes report, at the end of the process, evaluation measures of the reduction in user effort. Finally, some of the demonstrators have web-based versions, whose addresses are included so that the reader can try out the implemented applications.
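As a rough illustration of what a left-to-right interaction and a user-effort measure might look like, the following Python sketch simulates a user who validates the longest correct prefix of a system hypothesis and then types the next correct word. Everything here is a hypothetical toy: the function names, the fixed-hypothesis predictor, and the relative-reduction formula are assumptions made for illustration, not the chapter's actual systems or metrics.

```python
from typing import Callable, List


def count_corrections(
    reference: List[str],
    predict_suffix: Callable[[List[str]], List[str]],
) -> int:
    """Simulate a left-to-right session and count user word corrections.

    Assumption: in each round the user accepts hypothesis words up to the
    first error and then types the single correct word at that position.
    Extra words proposed beyond the end of the reference are ignored.
    """
    prefix: List[str] = []
    corrections = 0
    while len(prefix) < len(reference):
        # The system proposes a continuation of the validated prefix.
        for word in predict_suffix(prefix):
            if len(prefix) < len(reference) and word == reference[len(prefix)]:
                prefix.append(word)  # word is correct, user accepts it
            else:
                break  # first error found
        if len(prefix) < len(reference):
            prefix.append(reference[len(prefix)])  # user types the right word
            corrections += 1
    return corrections


def relative_effort_reduction(baseline: int, interactive: int) -> float:
    """Hypothetical measure: fraction of baseline corrections that were saved."""
    return (baseline - interactive) / baseline


# Toy predictor (assumption): always continues from one fixed hypothesis.
# A real interactive system would re-predict the suffix after each correction.
hypothesis = "the cat sit on a mat".split()
reference = "the cat sat on the mat".split()
corrections = count_corrections(reference, lambda prefix: hypothesis[len(prefix):])
print(corrections)
```

With a predictor that actually conditions on the validated prefix, the number of corrections would typically be lower than the number of errors in the initial hypothesis, and that difference is what a user-effort reduction measure is meant to capture.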


Footnotes
1
TMX (Translation Memory eXchange) is an open XML standard for the exchange of translation memory data.
 
2
The tree visiting order is left-to-right depth-first.
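A left-to-right depth-first visiting order can be sketched as below. This is a minimal illustration only: the `Node` class and the example tree are hypothetical, and the sketch assumes a pre-order visit (each node before its children), which the footnote does not specify.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Node:
    """Hypothetical tree node with an ordered list of children."""
    label: str
    children: List["Node"] = field(default_factory=list)


def depth_first_left_to_right(node: Node) -> Iterator[str]:
    """Yield labels depth-first, visiting children from left to right."""
    yield node.label  # assumption: parent is visited before its children
    for child in node.children:
        yield from depth_first_left_to_right(child)


# Small example tree (illustrative labels only).
tree = Node("S", [
    Node("NP", [Node("Det"), Node("N")]),
    Node("VP", [Node("V")]),
])
order = list(depth_first_left_to_right(tree))
print(order)  # ['S', 'NP', 'Det', 'N', 'VP', 'V']
```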
 
Metadata
DOI
https://doi.org/10.1007/978-0-85729-479-1_12