Published in: Neural Processing Letters 2/2020

02.01.2020

Generating Text Sequence Images for Recognition

Authors: Yanxiang Gong, Linjie Deng, Zheng Ma, Mei Xie



Abstract

Recently, methods based on deep learning have dominated the field of text recognition. Given a large amount of training data, most of them can achieve state-of-the-art performance. However, it is hard to harvest and label sufficient text sequence images from real scenes. To mitigate this issue, several methods for synthesizing text sequence images have been proposed, yet they usually require complicated preceding or follow-up steps. In this work, we present a method that can generate unlimited training data without any auxiliary pre- or post-processing. We treat the generation task as an image-to-image translation problem and use conditional adversarial networks to produce realistic text sequence images from their semantic counterparts. Several evaluation metrics are employed to assess our method, and the results demonstrate that the quality of the generated data is satisfactory. The code and dataset will be publicly available soon.
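
The abstract frames generation as conditional image-to-image translation with adversarial training (pix2pix-style). The following is a minimal PyTorch sketch of one such training step, not the authors' released implementation; the toy network sizes, the L1 loss weight, and the 32x128 image shape are illustrative assumptions.

# Minimal sketch of conditional adversarial training that maps a semantic
# text-layout image to a realistic text sequence image (pix2pix-style).
# Architectures and hyperparameters here are assumptions for illustration.
import torch
import torch.nn as nn

class TinyGenerator(nn.Module):
    """Toy encoder-decoder standing in for a U-Net generator."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh(),
        )
    def forward(self, semantic):
        return self.net(semantic)

class TinyDiscriminator(nn.Module):
    """Toy PatchGAN-style discriminator on the (semantic, image) pair."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 1, 4, stride=1, padding=1),
        )
    def forward(self, semantic, image):
        return self.net(torch.cat([semantic, image], dim=1))

G, D = TinyGenerator(), TinyDiscriminator()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
bce, l1 = nn.BCEWithLogitsLoss(), nn.L1Loss()
lambda_l1 = 100.0  # assumed reconstruction weight

def train_step(semantic, real):
    """One conditional-GAN update on a batch of semantic/real image pairs."""
    fake = G(semantic)

    # Discriminator: real pairs labeled 1, generated pairs labeled 0.
    opt_d.zero_grad()
    d_real = D(semantic, real)
    d_fake = D(semantic, fake.detach())
    loss_d = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))
    loss_d.backward()
    opt_d.step()

    # Generator: fool the discriminator while staying close to the target (L1).
    opt_g.zero_grad()
    d_fake = D(semantic, fake)
    loss_g = bce(d_fake, torch.ones_like(d_fake)) + lambda_l1 * l1(fake, real)
    loss_g.backward()
    opt_g.step()
    return loss_d.item(), loss_g.item()

# Dummy batch: 32x128 semantic layouts and corresponding real text crops.
semantic = torch.randn(4, 3, 32, 128)
real = torch.randn(4, 3, 32, 128)
print(train_step(semantic, real))

The generator is conditioned only on the semantic image, while the discriminator sees the concatenated semantic/output pair, which is what makes the adversarial objective conditional rather than unconditional.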


Metadata
Title
Generating Text Sequence Images for Recognition
Authors
Yanxiang Gong
Linjie Deng
Zheng Ma
Mei Xie
Publication date
02.01.2020
Publisher
Springer US
Published in
Neural Processing Letters / Issue 2/2020
Print ISSN: 1370-4621
Electronic ISSN: 1573-773X
DOI
https://doi.org/10.1007/s11063-019-10166-x
