2017 | OriginalPaper | Book chapter

Simple and Effective Multi-word Query Spotting in Handwritten Text Images

Authors: Ernesto Noya-García, Alejandro H. Toselli, Enrique Vidal

Published in: Pattern Recognition and Image Analysis

Publisher: Springer International Publishing

Abstract

Keyword spotting techniques are becoming cost-effective solutions for information retrieval in handwritten documents. We explore the extension of the single-word, line-level probabilistic indexing approach described in [1, 2] to allow page-level Boolean combinations of several single-keyword queries. We propose heuristic rules to combine the single-word relevance probabilities into probabilistically consistent confidence scores for the multi-word Boolean combinations. As a preliminary study, this paper focuses on evaluating the search performance of word-pair queries involving just one OR or AND Boolean operation. Empirical results of this study support the proposed approach and clearly show its effectiveness.
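
The abstract does not spell out the heuristic rules themselves, so the following is only a minimal sketch of what combining two single-word relevance probabilities into a page-level AND/OR score could look like. It assumes the two words occur independently on a page, which is an illustrative assumption and not necessarily the rule proposed in the paper. It also shows the Fréchet bounds that any probabilistically consistent score must respect. The index layout and all function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical page-level index: page id -> {word: relevance probability}.
PageIndex = Dict[str, Dict[str, float]]


def and_independent(p: float, q: float) -> float:
    """P(A and B) under the (assumed) independence of the two words."""
    return p * q


def or_independent(p: float, q: float) -> float:
    """P(A or B) under the same independence assumption."""
    return p + q - p * q


def and_bounds(p: float, q: float) -> Tuple[float, float]:
    """Frechet bounds: any consistent P(A and B) lies in this interval."""
    return max(0.0, p + q - 1.0), min(p, q)


def or_bounds(p: float, q: float) -> Tuple[float, float]:
    """Frechet bounds: any consistent P(A or B) lies in this interval."""
    return max(p, q), min(1.0, p + q)


def rank_pages(index: PageIndex, w1: str, w2: str, op: str) -> List[Tuple[str, float]]:
    """Score every page for the query 'w1 op w2' and sort by decreasing score."""
    if op not in ("AND", "OR"):
        raise ValueError("op must be 'AND' or 'OR'")
    combine = and_independent if op == "AND" else or_independent
    scored = []
    for page, probs in index.items():
        # A word missing from a page's index entry is treated as probability 0.
        p = probs.get(w1, 0.0)
        q = probs.get(w2, 0.0)
        scored.append((page, combine(p, q)))
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    toy_index = {
        "page1": {"law": 0.9, "court": 0.5},
        "page2": {"law": 0.2},
    }
    print(rank_pages(toy_index, "law", "court", "AND"))
    print(rank_pages(toy_index, "law", "court", "OR"))
```

Whatever rule is used, an AND score can never exceed the smaller of the two single-word probabilities and an OR score can never fall below the larger one, which is one way to read "probabilistically consistent" here.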

Footnotes
2
Note that these statistics were obtained without any kind of tokenization; that is, each non-blank sequence of characters is assumed to be a “word”.
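
As a rough illustration of what "no tokenization" means in practice, the sketch below counts running words and distinct words by splitting on whitespace only. The function names and the sample line are hypothetical and are not taken from the paper's data.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def whitespace_words(text: str) -> List[str]:
    """Treat every non-blank character sequence as one 'word'."""
    return text.split()


def vocabulary_stats(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (running words, distinct words) with no further tokenization."""
    counts = Counter(w for line in lines for w in whitespace_words(line))
    return sum(counts.values()), len(counts)


if __name__ == "__main__":
    # Punctuation stays attached and case is kept, so "Law," and "law."
    # count as two different words.
    print(vocabulary_stats(["the Law, the law."]))
```

Under this convention, punctuation and capitalization inflate the vocabulary size compared with statistics computed after tokenization.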
 
References
1.
Toselli, A.H., Vidal, E., Romero, V., Frinken, V.: HMM word-graph based keyword spotting in handwritten document images. Int. J. Inf. Sci. 370, 497–518 (2015)
2.
Toselli, A.H., Vidal, E., Romero, V., Frinken, V.: Word-graph based keyword spotting and indexing of handwritten document images. Technical report, Universitat Politècnica de València (2013)
3.
Sánchez, J., Mühlberger, G., Gatos, B., Schofield, P., Depuydt, K., Davis, R., Vidal, E., de Does, J.: tranScriptorium: an European project on handwritten text recognition. In: DocEng, pp. 227–228 (2013)
4.
Jelinek, F.: Statistical Methods for Speech Recognition. MIT Press, Cambridge (1998)
5.
Causer, T., Wallace, V.: Building a volunteer community: results and findings from Transcribe Bentham. Digit. Humanit. Q. 6(2) (2012)
6.
Sanchez, J.A., Romero, V., Toselli, A., Vidal, E.: ICFHR2014 Competition on Handwritten Text Recognition on Transcriptorium Datasets (HTRtS). In: 2014 14th International Conference on Frontiers in Handwriting Recognition (ICFHR), pp. 785–790, September 2014
7.
Manning, C.D., Raghavan, P., Schutze, H.: Introduction to Information Retrieval. Cambridge University Press, New York (2008)
8.
Zhu, M.: Recall, Precision and Average Precision. Working Paper 2004–09 Department of Statistics & Actuarial Science, University of Waterloo, 26 August 2004
9.
Robertson, S.: A new interpretation of average precision. In: Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2008), pp. 689–690. ACM, New York (2008)
10.
Kozielski, M., Forster, J., Ney, H.: Moment-based image normalization for handwritten text recognition. In: Proceedings of the 2012 International Conference on Frontiers in Handwriting Recognition, ICFHR 2012, pp. 256–261. IEEE Computer Society, Washington, DC (2012)
11.
Young, S., Odell, J., Ollason, D., Valtchev, V., Woodland, P.: The HTK Book: Hidden Markov Models Toolkit V2.1. Cambridge Research Laboratory Ltd. (1997)
12.
Young, S., Evermann, G., Gales, M., Hain, T., Kershaw, D.: The HTK Book: Hidden Markov Models Toolkit V3.4. Microsoft Corporation & Cambridge Research Laboratory Ltd., March 2009
13.
Toselli, A., Vidal, E.: Handwritten text recognition results on the Bentham collection with improved classical N-gram-HMM methods. In: 3rd International Workshop on Historical Document Imaging and Processing (HIP 2015), pp. 15–22, August 2015
14.
Kneser, R., Ney, H.: Improved backing-off for N-gram language modeling. In: International Conference on Acoustics, Speech and Signal Processing (ICASSP 1995), vol. 1, Los Alamitos, CA, USA, pp. 181–184. IEEE Computer Society (1995)
Metadata
Title
Simple and Effective Multi-word Query Spotting in Handwritten Text Images
Authors
Ernesto Noya-García
Alejandro H. Toselli
Enrique Vidal
Copyright year
2017
DOI
https://doi.org/10.1007/978-3-319-58838-4_9