Published in: International Journal on Document Analysis and Recognition (IJDAR) 4/2019

31.07.2019 | Original Paper

HWNet v2: an efficient word image representation for handwritten documents

Authors: Praveen Krishnan, C. V. Jawahar


Abstract

We present a framework for learning an efficient holistic representation for handwritten word images. The proposed method uses a deep convolutional neural network with a traditional classification loss. The major strengths of our work lie in: (i) the efficient usage of synthetic data to pre-train a deep network, (ii) an adapted version of the ResNet-34 architecture with region-of-interest pooling (referred to as HWNet v2), which learns discriminative features for variable-sized word images, and (iii) a realistic augmentation of training data with multiple scales and distortions which mimics the natural process of handwriting. We further investigate the process of transfer learning to reduce the domain gap between the synthetic and real domains, and also analyze the invariances learned at different layers of the network using visualization techniques proposed in the literature. Our representation leads to state-of-the-art word spotting performance on standard handwritten datasets and historical manuscripts in different languages with minimal representation size. On the challenging IAM dataset, our method is the first to report an mAP of around 0.90 for word spotting with a representation size of just 32 dimensions. Furthermore, we also present results on printed document datasets in English and Indic scripts which validate the generic nature of the proposed framework for learning word image representations.
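The fixed-length representation for variable-sized word images rests on region-of-interest pooling over the final convolutional feature map: the map is divided into a fixed grid of bins and each bin is max-pooled, so images of any width yield a descriptor of the same size. A minimal sketch of this idea in NumPy (the grid size and pooling operator here are illustrative assumptions, not the paper's exact configuration):

```python
import math
import numpy as np

def roi_max_pool(fmap: np.ndarray, out_h: int = 4, out_w: int = 8) -> np.ndarray:
    """Max-pool a variable-sized feature map (C, H, W) into a fixed
    (C, out_h, out_w) grid, so word images of different widths map to
    descriptors of identical dimensionality."""
    c, h, w = fmap.shape
    out = np.empty((c, out_h, out_w), dtype=fmap.dtype)
    for i in range(out_h):
        # bin boundaries along the height; ceil guarantees non-empty bins
        y0, y1 = (i * h) // out_h, math.ceil((i + 1) * h / out_h)
        for j in range(out_w):
            x0, x1 = (j * w) // out_w, math.ceil((j + 1) * w / out_w)
            out[:, i, j] = fmap[:, y0:y1, x0:x1].max(axis=(1, 2))
    return out

# Two word images of different widths produce same-sized descriptors.
narrow = roi_max_pool(np.random.rand(3, 10, 50))
wide = roi_max_pool(np.random.rand(3, 10, 120))
assert narrow.shape == wide.shape == (3, 4, 8)
```

In the paper's setting this pooling sits between the last convolutional layer and the fully connected layers; in practice one would use an optimized implementation such as `torchvision.ops.roi_pool` rather than this loop.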


Metadata
Title
HWNet v2: an efficient word image representation for handwritten documents
Authors
Praveen Krishnan
C. V. Jawahar
Publication date
31.07.2019
Publisher
Springer Berlin Heidelberg
Published in
International Journal on Document Analysis and Recognition (IJDAR) / Issue 4/2019
Print ISSN: 1433-2833
Electronic ISSN: 1433-2825
DOI
https://doi.org/10.1007/s10032-019-00336-x
