Published in: International Journal on Document Analysis and Recognition (IJDAR) 3/2021

06-08-2021 | Special Issue Paper

Asking questions on handwritten document collections

Authors: Minesh Mathew, Lluis Gomez, Dimosthenis Karatzas, C. V. Jawahar


Abstract

This work addresses the problem of Question Answering (QA) on handwritten document collections. Unlike typical QA and Visual Question Answering (VQA) formulations, where the answer is a short text, we aim to locate a document snippet in which the answer lies. The proposed approach works without recognizing the text in the documents. We argue that this recognition-free approach is suitable for handwritten documents and historical collections, where robust text recognition is often difficult. At the same time, for human users, document image snippets containing answers are a valid alternative to textual answers. The proposed approach uses an off-the-shelf deep embedding network that projects both textual words and word images into a common subspace. This embedding bridges the textual and visual domains and helps us retrieve document snippets that potentially answer a question. We evaluate the proposed approach on two new datasets: (i) HW-SQuAD, a synthetic, handwritten document image counterpart of the SQuAD 1.0 dataset, and (ii) BenthamQA, a smaller set of QA pairs defined on documents from the popular Bentham manuscripts collection. We also present a thorough analysis of the proposed recognition-free approach compared to a recognition-based approach that uses text recognized from the images using OCR. The datasets presented in this work are available for download at docvqa.org.
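To make the retrieval idea concrete, the sketch below illustrates recognition-free snippet scoring in a shared embedding space. It is a minimal illustration under stated assumptions, not the authors' implementation: `embed_text` and `embed_word_image` are hypothetical stand-ins for the deep embedding network that maps query words and word-image crops into the common subspace, and the best-match aggregation over query words is just one simple scoring choice.

```python
# Illustrative sketch (not the paper's actual pipeline): rank document
# snippets against a textual question without recognizing any text, using
# a shared word-image/text embedding space. The two encoders passed in,
# embed_text(word) and embed_word_image(crop), are hypothetical stand-ins
# for a deep word-spotting embedding network.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def score_snippet(query_words, snippet_word_images, embed_text, embed_word_image):
    """Score one snippet against the question.

    Each query word is matched to its most similar word image in the
    snippet; the snippet score is the mean of these best-match similarities.
    """
    q_vecs = [embed_text(w) for w in query_words]
    s_vecs = [embed_word_image(img) for img in snippet_word_images]
    if not q_vecs or not s_vecs:
        return 0.0  # degenerate snippet or empty query
    per_word_best = [max(cosine(q, s) for s in s_vecs) for q in q_vecs]
    return sum(per_word_best) / len(per_word_best)

def retrieve(question_words, snippets, embed_text, embed_word_image, top_k=5):
    """Rank snippets and return (score, index) pairs for the top-k candidates."""
    scored = [(score_snippet(question_words, s, embed_text, embed_word_image), i)
              for i, s in enumerate(snippets)]
    return sorted(scored, reverse=True)[:top_k]
```

Because both the question words and the word images live in the same subspace, the comparison never requires transcribing the handwriting; the returned snippets themselves serve as image-form answers for a human reader.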


Metadata

Publisher: Springer Berlin Heidelberg
Print ISSN: 1433-2833
Electronic ISSN: 1433-2825
DOI: https://doi.org/10.1007/s10032-021-00383-3
