Published in: Neural Computing and Applications

07-06-2019 | Original Article

DenseNet with Up-Sampling block for recognizing texts in images

Authors: Zeming Tang, Weiming Jiang, Zhao Zhang, Mingbo Zhao, Li Zhang, Meng Wang

Published in: Neural Computing and Applications | Issue 11/2020

Abstract

Convolutional Recurrent Neural Networks (CRNNs) have achieved great success in optical character recognition (OCR). However, existing deep models usually down-sample in the pooling operation to reduce the size of the feature maps, which discards some feature information and may cause characters that occupy only a small part of the image to be missed. Moreover, all hidden units in the recurrent module must be connected within the recurrent layer, which may result in a heavy computational burden. In this paper, we explore improving the results by replacing the convolutional network of the CRNN with a Dense Convolutional Network (DenseNet), which connects and combines multiple features. We also use an up-sampling function to construct an Up-Sampling block that reduces the negative effects of down-sampling in the pooling stage and restores the lost information to a certain extent, so that informative features can still be extracted with a deeper structure. In addition, we directly use the output of the inner convolution parts to describe the label distribution of each frame, which makes the process efficient. Finally, we propose a new OCR framework for Chinese recognition, termed DenseNet with Up-Sampling block joint with connectionist temporal classification. Results on a Chinese string dataset show that our model delivers enhanced performance compared with several popular deep frameworks.
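
To make two of the ideas in the abstract concrete, the following is a minimal, dependency-free sketch and not the authors' implementation. It assumes a one-dimensional feature row, nearest-neighbour repetition as the up-sampling function, and label index 0 as the blank symbol for connectionist temporal classification; the function names are hypothetical. It illustrates how up-sampling restores the resolution that pooling removed (though not the exact values), and how a per-frame label distribution can be collapsed into a label sequence by greedy decoding.

```python
from typing import List, Sequence

BLANK = 0  # assumption: index 0 is the CTC blank label


def max_pool_1d(row: Sequence[float], size: int = 2) -> List[float]:
    """Down-sample by keeping the maximum of each non-overlapping window."""
    return [max(row[i:i + size]) for i in range(0, len(row), size)]


def upsample_nearest_1d(row: Sequence[float], factor: int = 2) -> List[float]:
    """Up-sample by repeating every value `factor` times (nearest neighbour)."""
    return [value for value in row for _ in range(factor)]


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]],
                      blank: int = BLANK) -> List[int]:
    """Pick the most likely label per frame, merge repeats, drop blanks."""
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    decoded: List[int] = []
    previous = blank
    for label in best_path:
        # Emit a label only when it differs from the previous frame
        # and is not the blank symbol.
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded


# Pooling halves the length; up-sampling brings it back to the original size.
features = [1.0, 3.0, 2.0, 5.0, 4.0, 4.0]
pooled = max_pool_1d(features)
restored = upsample_nearest_1d(pooled)

# Six frames, each a distribution over {blank, label 1, label 2}.
frames = [
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.9, 0.05, 0.05],
    [0.1, 0.6, 0.3],
    [0.1, 0.2, 0.7],
    [0.2, 0.1, 0.7],
]
labels = ctc_greedy_decode(frames)
```

The blank frame between the two runs of label 1 is what lets the decoder keep a genuinely repeated character instead of merging it away. The restored row has the original length, but the smaller value in each pooled window is not recovered, which is consistent with the abstract's wording that lost information is restored only to a certain extent.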

Title: DenseNet with Up-Sampling block for recognizing texts in images
Authors: Zeming Tang, Weiming Jiang, Zhao Zhang, Mingbo Zhao, Li Zhang, Meng Wang
Publication date: 07-06-2019
Publisher: Springer London
Published in: Neural Computing and Applications / Issue 11/2020
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI: https://doi.org/10.1007/s00521-019-04285-8
