Published in: Neural Computing and Applications 11/2020

07-06-2019 | Original Article

DenseNet with Up-Sampling block for recognizing texts in images

Authors: Zeming Tang, Weiming Jiang, Zhao Zhang, Mingbo Zhao, Li Zhang, Meng Wang

Abstract

Convolutional Recurrent Neural Networks (CRNNs) have achieved great success in optical character recognition (OCR). However, existing deep models usually apply down-sampling in the pooling operation, which reduces the size of the feature maps by discarding some feature information and may cause characters that occupy only a small part of the image to be missed. Moreover, all hidden-layer units in the recurrent module must be connected within the recurrent layer, which may impose a heavy computational burden. In this paper, we explore improving the results by replacing the convolutional network of the CRNN with a Dense Convolutional Network (DenseNet), which connects and combines multiple features. We also use an up-sampling function to construct an Up-Sampling block that reduces the negative effects of down-sampling in the pooling stage and restores the lost information to a certain extent, so that informative features can still be extracted with a deeper structure. In addition, we directly use the output of the inner convolutional parts to describe the label distribution of each frame, which makes the process efficient. Finally, we propose a new OCR framework for Chinese recognition, termed DenseNet with Up-Sampling block joint with connectionist temporal classification (CTC). Results on a Chinese string dataset show that our model delivers enhanced performance compared with several popular deep frameworks.
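Two ideas in the abstract can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation: the abstract does not specify the up-sampling function or the decoding rule, so nearest-neighbour up-sampling and greedy (best-path) CTC decoding are assumptions chosen for simplicity, and the function names `max_pool_2x2`, `upsample_nearest_2x`, and `ctc_greedy_decode` are hypothetical. The first two functions show that pooling shrinks a feature map and that up-sampling restores its size but only approximates the discarded values. The third shows how per-frame label distributions can be turned into a label sequence without a recurrent layer.

```python
from typing import List, Sequence


def max_pool_2x2(x: Sequence[Sequence[float]]) -> List[List[float]]:
    """2x2 max pooling with stride 2 (height and width must be even)."""
    return [
        [max(x[i][j], x[i][j + 1], x[i + 1][j], x[i + 1][j + 1])
         for j in range(0, len(x[0]), 2)]
        for i in range(0, len(x), 2)
    ]


def upsample_nearest_2x(x: Sequence[Sequence[float]]) -> List[List[float]]:
    """Nearest-neighbour up-sampling (an assumed choice): repeat each
    value twice along both axes, doubling height and width."""
    out: List[List[float]] = []
    for row in x:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]],
                      blank: int = 0) -> List[int]:
    """Greedy CTC decoding: take the arg-max label of every frame,
    merge consecutive repeats, then drop the blank label."""
    best = [max(range(len(f)), key=f.__getitem__) for f in frame_scores]
    decoded: List[int] = []
    prev = None
    for label in best:
        if label != prev and label != blank:
            decoded.append(label)
        prev = label
    return decoded


# Toy 4x4 feature map: pooling halves each side, up-sampling restores the
# size, but the small values dropped by pooling are not recovered.
feature = [
    [1, 2, 0, 0],
    [3, 4, 0, 5],
    [0, 0, 7, 8],
    [0, 0, 9, 6],
]
pooled = max_pool_2x2(feature)
restored = upsample_nearest_2x(pooled)

# Toy per-frame scores over 3 labels (label 0 is the CTC blank).
scores = [
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.9, 0.05, 0.05],
    [0.1, 0.6, 0.3],
    [0.1, 0.2, 0.7],
    [0.1, 0.3, 0.6],
    [0.8, 0.1, 0.1],
]
labels = ctc_greedy_decode(scores)
```

In this toy run the restored map has the original 4x4 shape but holds only the block maxima, which is why the abstract describes the lost information as restored only "to a certain extent". The blank frame between the two runs of label 1 is what lets the decoder output a repeated character.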

Metadata
Title
DenseNet with Up-Sampling block for recognizing texts in images
Authors
Zeming Tang
Weiming Jiang
Zhao Zhang
Mingbo Zhao
Li Zhang
Meng Wang
Publication date
07-06-2019
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 11/2020
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-019-04285-8