International Journal on Document Analysis and Recognition (IJDAR)

15-11-2019 | Original Paper

MA-CRNN: a multi-scale attention CRNN for Chinese text line recognition in natural scenes

Authors: Guofeng Tong, Yong Li, Huashuai Gao, Huairong Chen, Hao Wang, Xiang Yang

Published in: International Journal on Document Analysis and Recognition (IJDAR) | Issue 2/2020

Abstract

Recognition of Chinese text lines is an important component of optical character recognition and has been widely applied in many specific tasks. However, several challenges remain: (1) a lack of open Chinese text recognition datasets; (2) the characteristics of Chinese characters themselves, e.g., diverse types, complex structures and varying sizes; (3) the difficulties posed by text images from different scenes, e.g., blur, illumination changes and distortion. To address these challenges, we propose an end-to-end recognition method based on convolutional recurrent neural networks (CRNNs), the multi-scale attention CRNN (MA-CRNN), which adds three components to a CRNN: asymmetric convolution, a feature reuse network and an attention mechanism. The proposed model is aimed mainly at scene text recognition that includes Chinese characters. The model is trained and tested on two Chinese text recognition datasets: the open dataset MTWI and our own large-scale Chinese text line dataset collected from various scenes. The experimental results demonstrate that the proposed method achieves better performance than other methods.
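The abstract names asymmetric convolution as one of the added components but does not say how it is built. As a rough illustration only, the sketch below assumes the common factorized form, in which a 1x3 kernel followed by a 3x1 kernel stands in for a single 3x3 kernel. The helper `conv2d_valid`, the kernel values and the toy image are all hypothetical and are not taken from the paper.

```python
def conv2d_valid(image, kernel):
    """Valid-mode 2D cross-correlation on nested lists (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Hypothetical asymmetric kernels: one horizontal (1x3), one vertical (3x1).
row_kernel = [[1, 2, 1]]
col_kernel = [[1], [0], [-1]]

# The equivalent full 3x3 kernel is the outer product of the two.
full_kernel = [[c[0] * r for r in row_kernel[0]] for c in col_kernel]

# Toy 5x6 integer "image" so that results can be compared exactly.
image = [[(3 * i + 5 * j) % 7 for j in range(6)] for i in range(5)]

# Two cheap passes (3 + 3 weights) versus one full pass (9 weights).
two_step = conv2d_valid(conv2d_valid(image, row_kernel), col_kernel)
one_step = conv2d_valid(image, full_kernel)

assert two_step == one_step
```

The two results match only because the 3x3 kernel here is an outer product of the two smaller ones. A learned pair of asymmetric kernels covers this restricted family with fewer weights, and whether MA-CRNN uses this exact arrangement is not stated in the abstract.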


Title: MA-CRNN: a multi-scale attention CRNN for Chinese text line recognition in natural scenes
Authors: Guofeng Tong, Yong Li, Huashuai Gao, Huairong Chen, Hao Wang, Xiang Yang
Publication date: 15-11-2019
Publisher: Springer Berlin Heidelberg
Published in: International Journal on Document Analysis and Recognition (IJDAR) / Issue 2/2020
Print ISSN: 1433-2833
Electronic ISSN: 1433-2825
DOI: https://doi.org/10.1007/s10032-019-00348-7
