Published in: International Journal on Document Analysis and Recognition (IJDAR) 4/2019

31.07.2019 | Original Paper

HWNet v2: an efficient word image representation for handwritten documents

Authors: Praveen Krishnan, C. V. Jawahar


Abstract

We present a framework for learning an efficient holistic representation for handwritten word images. The proposed method uses a deep convolutional neural network with a traditional classification loss. The major strengths of our work lie in: (i) the efficient usage of synthetic data to pre-train a deep network, (ii) an adapted version of the ResNet-34 architecture with region-of-interest pooling (referred to as HWNet v2), which learns discriminative features for variable-sized word images, and (iii) a realistic augmentation of training data with multiple scales and distortions which mimics the natural process of handwriting. We further investigate the process of transfer learning to reduce the domain gap between the synthetic and real domains, and also analyze the invariances learned at different layers of the network using visualization techniques proposed in the literature. Our representation leads to state-of-the-art word spotting performance on standard handwritten datasets and historical manuscripts in different languages with minimal representation size. On the challenging IAM dataset, our method is the first to report an mAP of around 0.90 for word spotting with a representation size of just 32 dimensions. Furthermore, we also present results on printed document datasets in English and Indic scripts which validate the generic nature of the proposed framework for learning word image representations.
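The fixed-length representation for variable-sized word images rests on region-of-interest pooling over the final convolutional feature map: the map is divided into a fixed grid of bins and each bin is max-pooled, so images of any width yield a descriptor of the same size. A minimal sketch of this idea in NumPy (the grid size and pooling operator here are illustrative assumptions, not the paper's exact configuration):

```python
import math
import numpy as np

def roi_max_pool(fmap: np.ndarray, out_h: int = 4, out_w: int = 8) -> np.ndarray:
    """Max-pool a variable-sized feature map (C, H, W) into a fixed
    (C, out_h, out_w) grid, so word images of different widths map to
    descriptors of identical dimensionality."""
    c, h, w = fmap.shape
    out = np.empty((c, out_h, out_w), dtype=fmap.dtype)
    for i in range(out_h):
        # bin boundaries along the height; ceil guarantees non-empty bins
        y0, y1 = (i * h) // out_h, math.ceil((i + 1) * h / out_h)
        for j in range(out_w):
            x0, x1 = (j * w) // out_w, math.ceil((j + 1) * w / out_w)
            out[:, i, j] = fmap[:, y0:y1, x0:x1].max(axis=(1, 2))
    return out

# Two word images of different widths produce same-sized descriptors.
narrow = roi_max_pool(np.random.rand(3, 10, 50))
wide = roi_max_pool(np.random.rand(3, 10, 120))
assert narrow.shape == wide.shape == (3, 4, 8)
```

In the paper's setting this pooling sits between the last convolutional layer and the fully connected layers; in practice one would use an optimized implementation such as `torchvision.ops.roi_pool` rather than this loop.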


Metadata
Title
HWNet v2: an efficient word image representation for handwritten documents
Authors
Praveen Krishnan
C. V. Jawahar
Publication date
31.07.2019
Publisher
Springer Berlin Heidelberg
Published in
International Journal on Document Analysis and Recognition (IJDAR) / Issue 4/2019
Print ISSN: 1433-2833
Electronic ISSN: 1433-2825
DOI
https://doi.org/10.1007/s10032-019-00336-x
