
02.07.2019 | Special Issue Paper

Dynamic temporal residual network for sequence modeling

Authors: Ruijie Yan, Liangrui Peng, Shanyu Xiao, Michael T. Johnson, Shengjin Wang

Published in: International Journal on Document Analysis and Recognition (IJDAR) | Issue 3/2019


Abstract

The long short-term memory (LSTM) network with gating mechanism has been widely used in sequence modeling tasks, including handwriting and speech recognition. Because an LSTM network is unfolded along the temporal dimension and its temporal depth equals the length of the input feature sequence, the gating mechanism alone may not be sufficient to fully model the dynamic temporal dependencies in sequential data. Inspired by residual learning in ResNet, this paper proposes a dynamic temporal residual network (DTRN) that incorporates residual learning into an LSTM network along the temporal dimension. DTRN comprises two networks: the primary network consists of modified LSTM units with weighted shortcut connections between adjacent temporal outputs, while the secondary network generates dynamic weights for these shortcut connections. To validate the performance of DTRN, we conduct experiments on three commonly used public handwriting recognition datasets (IFN/ENIT, IAM and Rimes) and one speech recognition dataset (TIMIT). The experimental results show that the proposed DTRN outperforms previously reported methods.
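To make the two-network idea in the abstract concrete, the following is a minimal PyTorch sketch of a temporal residual recurrence, not the authors' exact formulation: an LSTM cell whose output at each time step is combined with the previous step's output through a dynamically generated shortcut weight produced by a small secondary network. All names (DTRNSketch, weight_net) and the scalar-gate design are illustrative assumptions.

```python
# Hedged sketch of a dynamic temporal residual recurrence (assumptions, not the
# paper's exact equations): the primary LSTM output at step t receives a weighted
# shortcut from the previous step's output, with the weight produced per step by
# a small secondary network.
import torch
import torch.nn as nn


class DTRNSketch(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.cell = nn.LSTMCell(input_size, hidden_size)
        # Secondary network: maps the current input and previous output to a
        # per-step shortcut weight in (0, 1).
        self.weight_net = nn.Sequential(
            nn.Linear(input_size + hidden_size, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # x: (seq_len, batch, input_size)
        seq_len, batch, _ = x.shape
        h = x.new_zeros(batch, self.cell.hidden_size)
        c = x.new_zeros(batch, self.cell.hidden_size)
        outputs = []
        for t in range(seq_len):
            h_new, c = self.cell(x[t], (h, c))
            # Dynamic residual shortcut along the temporal dimension:
            # blend the current output with the previous step's output.
            alpha = self.weight_net(torch.cat([x[t], h], dim=-1))
            h = h_new + alpha * h
            outputs.append(h)
        return torch.stack(outputs)  # (seq_len, batch, hidden_size)


if __name__ == "__main__":
    model = DTRNSketch(input_size=40, hidden_size=128)
    feats = torch.randn(75, 8, 40)   # e.g. 75 frames, batch of 8
    print(model(feats).shape)        # torch.Size([75, 8, 128])
```

The key difference from a plain LSTM is the extra term `alpha * h`: the shortcut lets gradients flow directly between adjacent temporal outputs, with the secondary network deciding per step how much of the previous output to carry forward.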


Metadata
Title
Dynamic temporal residual network for sequence modeling
Authors
Ruijie Yan
Liangrui Peng
Shanyu Xiao
Michael T. Johnson
Shengjin Wang
Publication date
02.07.2019
Publisher
Springer Berlin Heidelberg
Published in
International Journal on Document Analysis and Recognition (IJDAR) / Issue 3/2019
Print ISSN: 1433-2833
Electronic ISSN: 1433-2825
DOI
https://doi.org/10.1007/s10032-019-00328-x
