Top

Neural Processing Letters

Published in:

02-01-2020

Generating Text Sequence Images for Recognition

Authors: Yanxiang Gong, Linjie Deng, Zheng Ma, Mei Xie

Published in: Neural Processing Letters | Issue 2/2020

Activate our intelligent search to find suitable subject content or patents.

search-config

AI-assisted search

Off

Abstract

Recently, methods based on deep learning have dominated the field of text recognition. With a large number of training data, most of them can achieve the state-of-the-art performances. However, it is hard to harvest and label sufficient text sequence images from the real scenes. To mitigate this issue, several methods to synthesize text sequence images were proposed, yet they usually need complicated preceding or follow-up steps. In this work, we present a method which is able to generate infinite training data without any auxiliary pre/post-process. We tackle the generation task as an image-to-image translation one and utilize conditional adversarial networks to produce realistic text sequence images in the light of the semantic ones. Some evaluation metrics are involved to assess our method and the results demonstrate that the caliber of the data is satisfactory. The code and dataset will be publicly available soon.

previous article Exponential Lag Synchronization and Global Dissipativity for Delayed Fuzzy Cohen–Grossberg Neural Networks with Discontinuous Activations

next article Growing Self-Organizing Maps for Nonlinear Time-Varying Function Approximation

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

https://github.com/meijieru/crnn.pytorch.

https://github.com/clovaai/deep-text-recognition-benchmark.

Baek J, Kim G, Lee J, Park S, Han D, Yun S, Oh SJ, Lee H (2019) What is wrong with scene text recognition model comparisons? Dataset and model analysis. arXiv:1904.01906

Bai F, Cheng Z, Niu Y, Pu S, Zhou S (2018) Edit probability for scene text recognition. arXiv:1805.03384

Cheng Z, Bai F, Xu Y, Zheng G, Pu S, Zhou S (2017) Focusing attention: towards accurate text recognition in natural images. In: Proceedings of the IEEE international conference on computer vision, pp 5076–5084

Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. In: Advances in neural information processing systems, pp 2672–2680

Graves A, Fernández S, Gomez F, Schmidhuber J (2006) Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: Proceedings of the 23rd international conference on machine learning, ACM, pp 369–376

Gupta A, Vedaldi A, Zisserman A (2016) Synthetic data for text localisation in natural images. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2315–2324

He K, Zhang X, Ren S, Sun J (2015) Delving deep into rectifiers: surpassing human-level performance on imagenet classification. In: Proceedings of the IEEE international conference on computer vision, pp 1026–1034

He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778

Heusel M, Ramsauer H, Unterthiner T, Nessler B, Hochreiter S (2017) Gans trained by a two time-scale update rule converge to a local nash equilibrium. In: Advances in neural information processing systems, pp 6626–6637

10.

Hinton GE, Salakhutdinov RR (2006) Reducing the dimensionality of data with neural networks. Science 313(5786):504–507MathSciNetCrossRef

11.

Ioffe S, Szegedy C (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. In: International conference on machine learning, pp 448–456

12.

Isola P, Zhu JY, Zhou T, Efros AA (2017) Image-to-image translation with conditional adversarial networks. In: 2017 IEEE conference on computer vision and pattern recognition (CVPR), IEEE, pp 5967–5976

13.

Jaderberg M, Simonyan K, Vedaldi A, Zisserman A (2014) Reading text in the wild with convolutional neural networks. arXiv:1412.1842

14.

Jaderberg M, Simonyan K, Vedaldi A, Zisserman A (2014) Synthetic data and artificial neural networks for natural scene text recognition. arXiv:1406.2227

15.

Jaderberg M, Simonyan K, Zisserman A, et al (2015) Spatial transformer networks. In: Advances in neural information processing systems, pp 2017–2025

16.

Jung J, Lee S, Cho MS, Kim JH (2011) Touch tt: scene text extractor using touchscreen interface. ETRI J 33(1):78–88CrossRef

17.

Karatzas D, Shafait F, Uchida S, Iwamura M, i Bigorda LG, Mestre SR, Mas J, Mota DF, Almazan JA, De Las Heras LP (2013) ICDAR 2013 robust reading competition. In: 2013 12th international conference on document analysis and recognition (ICDAR), IEEE, pp 1484–1493

18.

Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. arXiv:1412.6980

19.

Lang K, Mitchell T (1999) Newsgroup 20 dataset

20.

Lee CY, Osindero S (2016) Recursive recurrent nets with attention modeling for OCR in the wild. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2231–2239

21.

Lee CY, Osindero S (2016) Recursive recurrent nets with attention modeling for OCR in the wild. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2231–2239

22.

Liu W, Chen C, Wong KYK, Su Z, Han J (2016) Star-net: a spatial attention residue network for scene text recognition. In: BMVC, vol 2, p 7

23.

Mirza M, Osindero S (2014) Conditional generative adversarial nets. arXiv:1411.1784

24.

Mishra A, Alahari K, Jawahar CV (2012) Scene text recognition using higher order language priors. In: BMVC

25.

Paszke A, Gross S, Chintala S, Chanan G (2017) PyTorch

26.

Pérez P, Gangnet M, Blake A (2003) Poisson image editing. ACM Trans Graph (TOG) 22(3):313–318CrossRef

27.

Ronneberger O, Fischer P, Brox T (2015) U-net: convolutional networks for biomedical image segmentation. In: International conference on medical image computing and computer-assisted intervention. Springer, pp 234–241

28.

Salimans T, Goodfellow I, Zaremba W, Cheung V, Radford A, Chen X (2016) Improved techniques for training gans. In: Advances in neural information processing systems, pp 2234–2242

29.

Shi B, Wang X, Lyu P, Yao C, Bai X (2016) Robust scene text recognition with automatic rectification. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4168–4176

30.

Shi B, Wang X, Lyu P, Yao C, Bai X (2016) Robust scene text recognition with automatic rectification. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4168–4176

31.

Shi B, Bai X, Yao C (2017) An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. IEEE Trans Pattern Anal Mach Intell 39(11):2298–2304CrossRef

32.

Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556

33.

Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2016) Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2818–2826

34.

Yao C, Wu J, Zhou X, Zhang C, Zhou S, Cao Z, Yin Q (2015) Incidental scene text understanding: recent progresses on ICDAR 2015 robust reading competition challenge 4. arXiv:1511.09207

35.

Zhan F, Lu S, Xue C (2018) Verisimilar image synthesis for accurate detection and recognition of texts in scenes. In: European conference on computer vision. Springer, pp 257–273

36.

Zhang H, Xu T, Li H, Zhang S, Wang X, Huang X, Metaxas DN (2017) Stackgan: Text to photo-realistic image synthesis with stacked generative adversarial networks. In: Proceedings of the IEEE international conference on computer vision, pp 5907–5915

Title: Generating Text Sequence Images for Recognition
Authors: Yanxiang Gong
Linjie Deng
Zheng Ma
Mei Xie
Publication date: 02-01-2020
Publisher: Springer US
Published in: Neural Processing Letters / Issue 2/2020
Print ISSN: 1370-4621
Electronic ISSN: 1573-773X
DOI: https://doi.org/10.1007/s11063-019-10166-x

Springer Professional

Abstract

Please log in to get access to your license.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Technik"

Springer Professional "Wirtschaft"

Other articles of this Issue 2/2020

A Design Strategy for the Efficient Implementation of Random Basis Neural Networks on Resource-Constrained Devices

Class-specific discriminant regularization in real-time deep CNN models for binary classification problems

Multi-parallel Extreme Learning Machine with Excitatory and Inhibitory Neurons for Regression

Piecewise Pseudo Almost-Periodic Solutions of Impulsive Fuzzy Cellular Neural Networks with Mixed Delays

Finite-Time Performance State Estimation of Recurrent Neural Networks with Sampled-Data Signals

Lagrange Stability for Delayed-Impulses in Discrete-Time Cohen–Grossberg Neural Networks with Delays