Published in: International Journal on Document Analysis and Recognition (IJDAR) 3/2021

06 August 2021 | Special Issue Paper

Asking questions on handwritten document collections

Authors: Minesh Mathew, Lluis Gomez, Dimosthenis Karatzas, C. V. Jawahar

Abstract

This work addresses the problem of Question Answering (QA) on handwritten document collections. Unlike typical QA and Visual Question Answering (VQA) formulations, where the answer is a short text, we aim to locate a document snippet where the answer lies. The proposed approach works without recognizing the text in the documents. We argue that a recognition-free approach is suitable for handwritten documents and historical collections, where robust text recognition is often difficult. At the same time, for human users, document image snippets containing answers are a valid alternative to textual answers. The proposed approach uses an off-the-shelf deep embedding network that projects both textual words and word images into a common subspace. This embedding bridges the textual and visual domains and helps us retrieve document snippets that potentially answer a question. We evaluate the proposed approach on two new datasets: (i) HW-SQuAD, a synthetic, handwritten document image counterpart of the SQuAD1.0 dataset, and (ii) BenthamQA, a smaller set of QA pairs defined on documents from the popular Bentham manuscripts collection. We also present a thorough analysis of the proposed recognition-free approach compared to a recognition-based approach that uses text recognized from the images by an OCR engine. The datasets presented in this work are available for download at docvqa.org.
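
The core retrieval step can be made concrete with a short sketch. The snippet below is a minimal illustration, not the authors' implementation: it assumes the question's query words and each document snippet's word images have already been projected into the common subspace as unit-norm vectors (by the textual and visual branches of an embedding network, respectively), so dot products are cosine similarities. All names (`score_snippet`, `retrieve`) and the scoring rule (best match per query word, summed over words) are illustrative assumptions.

```python
import numpy as np

def score_snippet(query_embs: np.ndarray, snippet_embs: np.ndarray) -> float:
    # query_embs:   (Q, D) unit-norm embeddings of the question's query words
    #               (textual branch of the embedding network).
    # snippet_embs: (S, D) unit-norm embeddings of the snippet's word images
    #               (visual branch). Shared subspace => dot product = cosine.
    sims = query_embs @ snippet_embs.T            # (Q, S) similarity matrix
    # Keep the best-matching word image per query word, then sum over words.
    return float(sims.max(axis=1).sum())

def retrieve(query_embs, snippet_emb_list, top_k=5):
    # Rank snippets by score; return indices of the top-k candidates.
    scores = np.array([score_snippet(query_embs, s) for s in snippet_emb_list])
    return np.argsort(-scores)[:top_k].tolist()

# Toy usage with random unit vectors standing in for real embeddings.
rng = np.random.default_rng(0)
unit = lambda x: x / np.linalg.norm(x, axis=-1, keepdims=True)
query = unit(rng.normal(size=(3, 128)))                             # 3 query words
snippets = [unit(rng.normal(size=(n, 128))) for n in (40, 25, 60)]  # 3 snippets
print(retrieve(query, snippets, top_k=2))
```

The max-then-sum aggregation mirrors word-spotting-style matching between query words and word images; the paper's actual scoring and candidate-snippet generation may differ in detail.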

Metadata
Title
Asking questions on handwritten document collections
Authors
Minesh Mathew
Lluis Gomez
Dimosthenis Karatzas
C. V. Jawahar
Publication date
06 August 2021
Publisher
Springer Berlin Heidelberg
Published in
International Journal on Document Analysis and Recognition (IJDAR) / Issue 3/2021
Print ISSN: 1433-2833
Electronic ISSN: 1433-2825
DOI
https://doi.org/10.1007/s10032-021-00383-3
