04-06-2019 | Special Issue Paper

Boosting scene character recognition by learning canonical forms of glyphs

Authors: Yizhi Wang, Zhouhui Lian, Yingmin Tang, Jianguo Xiao

Published in: International Journal on Document Analysis and Recognition (IJDAR) | Issue 3/2019

Abstract

As one of the fundamental problems in document analysis, scene character recognition has attracted considerable interest in recent years. However, the problem remains extremely challenging due to many uncontrollable factors, including glyph transformation, blur, noisy backgrounds, and uneven illumination. In this paper, we propose a novel methodology for boosting scene character recognition by learning canonical forms of glyphs, based on the fact that characters appearing in scene images are all derived from their corresponding canonical forms. Our key observation is that more discriminative features can be learned by solving specially designed generative tasks than by traditional classification-based feature learning frameworks. Specifically, we design a GAN-based model that makes the learned deep feature of a given scene character capable of reconstructing the corresponding glyphs in a number of standard font styles. In this manner, we obtain deep features for scene characters that are more discriminative for recognition and less sensitive to the above-mentioned factors. Our experiments conducted on several publicly available databases demonstrate the superiority of our method compared to the state of the art.
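To make the idea in the abstract concrete, the sketch below illustrates the general shape of such a feature-learning setup in PyTorch: a shared encoder whose feature must both drive a character classifier and let a generator reconstruct the character's glyph in several canonical font styles, with a GAN discriminator judging the reconstructions. All module architectures, dimensions, class/font counts, and loss weights are illustrative assumptions and are not taken from the paper itself.

```python
# Minimal sketch (assumed architecture, not the authors' exact model): a shared
# encoder whose feature must (a) classify the scene character and (b) let a
# generator reconstruct the glyph in several canonical font styles, judged by a
# GAN discriminator.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_CLASSES = 62   # e.g. digits + upper/lower-case letters (assumption)
NUM_FONTS = 4      # number of canonical font styles to reconstruct (assumption)
FEAT_DIM = 256

class Encoder(nn.Module):
    """Maps a 32x32 grayscale scene-character crop to a feature vector."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.fc = nn.Linear(256 * 4 * 4, FEAT_DIM)

    def forward(self, x):
        return self.fc(self.conv(x).flatten(1))

class Generator(nn.Module):
    """Reconstructs canonical glyphs (one output channel per font style)."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(FEAT_DIM, 256 * 4 * 4)
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, NUM_FONTS, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, feat):
        return self.deconv(self.fc(feat).view(-1, 256, 4, 4))

class Discriminator(nn.Module):
    """Scores glyph reconstructions as real or generated."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(NUM_FONTS, 64, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Flatten(), nn.Linear(128 * 8 * 8, 1),
        )

    def forward(self, g):
        return self.net(g)

encoder, generator, disc = Encoder(), Generator(), Discriminator()
classifier = nn.Linear(FEAT_DIM, NUM_CLASSES)   # recognition head on the shared feature

# One encoder/generator training step (the discriminator's own real-vs-fake
# update is omitted for brevity).
x = torch.randn(8, 1, 32, 32)                    # scene character crops
labels = torch.randint(0, NUM_CLASSES, (8,))     # character labels
canonical = torch.randn(8, NUM_FONTS, 32, 32)    # ground-truth glyphs in standard fonts

feat = encoder(x)
fake = generator(feat)
cls_loss = F.cross_entropy(classifier(feat), labels)
rec_loss = F.l1_loss(fake, canonical)
adv_loss = F.binary_cross_entropy_with_logits(disc(fake), torch.ones(8, 1))
(cls_loss + rec_loss + 0.01 * adv_loss).backward()
```

In this sketch, the reconstruction and adversarial terms are what force the encoder's feature to retain glyph structure that a purely classification-trained feature could discard, which is the abstract's stated motivation for the generative task.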

Metadata
Title: Boosting scene character recognition by learning canonical forms of glyphs
Authors: Yizhi Wang, Zhouhui Lian, Yingmin Tang, Jianguo Xiao
Publication date: 04-06-2019
Publisher: Springer Berlin Heidelberg
Published in: International Journal on Document Analysis and Recognition (IJDAR), Issue 3/2019
Print ISSN: 1433-2833
Electronic ISSN: 1433-2825
DOI: https://doi.org/10.1007/s10032-019-00326-z
