Published in: International Journal on Document Analysis and Recognition (IJDAR) 2/2020

15-11-2019 | Original Paper

MA-CRNN: a multi-scale attention CRNN for Chinese text line recognition in natural scenes

Authors: Guofeng Tong, Yong Li, Huashuai Gao, Huairong Chen, Hao Wang, Xiang Yang

Abstract

Recognition of Chinese text lines is an important component of optical character recognition and has been widely applied in many specific tasks. However, several challenges remain: (1) the lack of open Chinese text recognition datasets; (2) the characteristics of Chinese characters themselves, e.g., diverse types, complex structures and varying sizes; (3) the difficulties introduced by text images from different scenes, e.g., blur, illumination and distortion. To address these challenges, we propose an end-to-end recognition method based on convolutional recurrent neural networks (CRNNs), the multi-scale attention CRNN (MA-CRNN), which adds three components to a CRNN: asymmetric convolution, a feature reuse network and an attention mechanism. The proposed model mainly targets scene text recognition that includes Chinese characters. The model is trained and tested on two Chinese text recognition datasets: the open dataset MTWI and a large-scale Chinese text line dataset that we constructed from various scenes. The experimental results demonstrate that the proposed method achieves better performance than the other methods compared.
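
The abstract names asymmetric convolution as one of the three added components but does not describe it. The sketch below assumes the common meaning of the term, i.e., replacing a k x k convolution with a 1 x k convolution followed by a k x 1 convolution, and is not taken from the paper. All names (`correlate2d_valid`, `conv_weight_count`), the kernels and the channel count of 64 are hypothetical choices for illustration. It shows that the two-step form reproduces a 3 x 3 kernel exactly when that kernel is an outer product of a column and a row, and that the pair uses fewer weights.

```python
def correlate2d_valid(image, kernel):
    """Valid-mode 2-D cross-correlation on nested lists (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def conv_weight_count(kernel_h, kernel_w, in_channels, out_channels):
    """Number of weights in one convolution layer, ignoring biases."""
    return kernel_h * kernel_w * in_channels * out_channels


# Hypothetical kernels: a 1x3 row kernel and a 3x1 column kernel.
row_kernel = [[1, 0, -1]]
col_kernel = [[1], [2], [1]]

# The 3x3 kernel that the pair is equivalent to (outer product column x row).
full_kernel = [[c[0] * r for r in row_kernel[0]] for c in col_kernel]

# Small deterministic integer test image (5 rows, 6 columns).
image = [[(3 * i + 5 * j) % 7 for j in range(6)] for i in range(5)]

# Asymmetric path: 1x3 first, then 3x1.
two_step = correlate2d_valid(correlate2d_valid(image, row_kernel), col_kernel)
# Reference path: a single 3x3 pass.
one_step = correlate2d_valid(image, full_kernel)
print(two_step == one_step)

# Weight comparison for an assumed layer with 64 input and 64 output channels.
square_weights = conv_weight_count(3, 3, 64, 64)
asymmetric_weights = (conv_weight_count(1, 3, 64, 64)
                      + conv_weight_count(3, 1, 64, 64))
print(square_weights, asymmetric_weights)
```

Under these assumptions the asymmetric pair needs two thirds of the weights of the square kernel. A general 3 x 3 kernel is not an outer product, so the pair covers a restricted family of filters in exchange for the lower cost. Whether MA-CRNN uses this exact factorization is not stated in the abstract.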

Metadata
Title
MA-CRNN: a multi-scale attention CRNN for Chinese text line recognition in natural scenes
Authors
Guofeng Tong
Yong Li
Huashuai Gao
Huairong Chen
Hao Wang
Xiang Yang
Publication date
15-11-2019
Publisher
Springer Berlin Heidelberg
Published in
International Journal on Document Analysis and Recognition (IJDAR) / Issue 2/2020
Print ISSN: 1433-2833
Electronic ISSN: 1433-2825
DOI
https://doi.org/10.1007/s10032-019-00348-7
