2017 | OriginalPaper | Chapter

Simple and Effective Multi-word Query Spotting in Handwritten Text Images

Authors: Ernesto Noya-García, Alejandro H. Toselli, Enrique Vidal

Published in: Pattern Recognition and Image Analysis

Publisher: Springer International Publishing

Abstract

Keyword spotting techniques are becoming cost-effective solutions for information retrieval in handwritten documents. We explore the extension of the single-word, line-level probabilistic indexing approach described in [1, 2] to allow page-level Boolean combinations of several single-keyword queries. We propose heuristic rules to combine the single-word relevance probabilities into probabilistically consistent confidence scores for the multi-word Boolean combinations. As a preliminary study, this paper focuses on evaluating the search performance of word-pair queries involving just one OR or AND Boolean operation. Empirical results of this study support the proposed approach and show its effectiveness.
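
The abstract does not spell out the combination rules, so the following is only a minimal sketch of what "probabilistically consistent" scoring of a word-pair query could look like, not the authors' method. It assumes a hypothetical page-level index that maps each word to a relevance probability, and it shows two common textbook choices: treating the two words as independent events, or using the Fréchet bounds (min for AND, max for OR). The names `score_pair` and `rank_pages` and the index layout are illustrative assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical index layout (an assumption, not taken from the paper):
# for one page, word -> probability that the word is written on that page.
PageIndex = Dict[str, float]


def score_pair(index: PageIndex, w1: str, w2: str, op: str,
               independent: bool = True) -> float:
    """Combine two single-word relevance probabilities into one score.

    op is "AND" or "OR". With independent=True the words are treated as
    independent events; otherwise the Frechet bounds are used
    (min as the upper bound for AND, max as the lower bound for OR).
    Words missing from the index are given probability 0.
    """
    p1 = index.get(w1, 0.0)
    p2 = index.get(w2, 0.0)
    if op == "AND":
        return p1 * p2 if independent else min(p1, p2)
    if op == "OR":
        return p1 + p2 - p1 * p2 if independent else max(p1, p2)
    raise ValueError(f"unsupported operator: {op!r}")


def rank_pages(pages: Dict[str, PageIndex], w1: str, w2: str, op: str,
               independent: bool = True) -> List[Tuple[str, float]]:
    """Return (page_id, score) pairs sorted by decreasing score."""
    scored = [(pid, score_pair(idx, w1, w2, op, independent))
              for pid, idx in pages.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Both variants stay inside the range any valid joint probability must respect: an AND score never exceeds the smaller of the two single-word probabilities, and an OR score is never below the larger one. Which rule works better in practice is an empirical question of the kind the paper evaluates.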

Footnotes
2. Note that these statistics were obtained without any kind of tokenization; that is, each non-blank sequence of characters is assumed to be a “word”.

Literature
1. Toselli, A.H., Vidal, E., Romero, V., Frinken, V.: HMM word-graph based keyword spotting in handwritten document images. Int. J. Inf. Sci. 370, 497–518 (2015)
2. Toselli, A.H., Vidal, E., Romero, V., Frinken, V.: Word-graph based keyword spotting and indexing of handwritten document images. Technical report, Universitat Politècnica de València (2013)
3. Sánchez, J., Mühlberger, G., Gatos, B., Schofield, P., Depuydt, K., Davis, R., Vidal, E., de Does, J.: tranScriptorium: an European project on handwritten text recognition. In: DocEng, pp. 227–228 (2013)
4. Jelinek, F.: Statistical Methods for Speech Recognition. MIT Press, Cambridge (1998)
5. Causer, T., Wallace, V.: Building a volunteer community: results and findings from Transcribe Bentham. Digit. Humanit. Q. 6(2) (2012)
6. Sanchez, J.A., Romero, V., Toselli, A., Vidal, E.: ICFHR2014 Competition on Handwritten Text Recognition on Transcriptorium Datasets (HTRtS). In: 2014 14th International Conference on Frontiers in Handwriting Recognition (ICFHR), pp. 785–790, September 2014
7. Manning, C.D., Raghavan, P., Schutze, H.: Introduction to Information Retrieval. Cambridge University Press, New York (2008)
8. Zhu, M.: Recall, Precision and Average Precision. Working Paper 2004–09, Department of Statistics & Actuarial Science, University of Waterloo, 26 August 2004
9. Robertson, S.: A new interpretation of average precision. In: Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2008), pp. 689–690. ACM, New York (2008)
10. Kozielski, M., Forster, J., Ney, H.: Moment-based image normalization for handwritten text recognition. In: Proceedings of the 2012 International Conference on Frontiers in Handwriting Recognition, ICFHR 2012, pp. 256–261. IEEE Computer Society, Washington, DC (2012)
11. Young, S., Odell, J., Ollason, D., Valtchev, V., Woodland, P.: The HTK Book: Hidden Markov Models Toolkit V2.1. Cambridge Research Laboratory Ltd. (1997)
12. Young, S., Evermann, G., Gales, M., Hain, T., Kershaw, D.: The HTK Book: Hidden Markov Models Toolkit V3.4. Microsoft Corporation & Cambridge Research Laboratory Ltd., March 2009
13. Toselli, A., Vidal, E.: Handwritten text recognition results on the Bentham collection with improved classical N-gram-HMM methods. In: 3rd International Workshop on Historical Document Imaging and Processing (HIP 2015), pp. 15–22, August 2015
14. Kneser, R., Ney, H.: Improved backing-off for N-gram language modeling. In: International Conference on Acoustics, Speech and Signal Processing (ICASSP 1995), vol. 1, Los Alamitos, CA, USA, pp. 181–184. IEEE Computer Society (1995)

Metadata
Title
Simple and Effective Multi-word Query Spotting in Handwritten Text Images
Authors
Ernesto Noya-García
Alejandro H. Toselli
Enrique Vidal
Copyright Year
2017
DOI
https://doi.org/10.1007/978-3-319-58838-4_9
