
02.07.2019 | Special Issue Paper

Dynamic temporal residual network for sequence modeling

Authors: Ruijie Yan, Liangrui Peng, Shanyu Xiao, Michael T. Johnson, Shengjin Wang

Published in: International Journal on Document Analysis and Recognition (IJDAR) | Issue 3/2019


Abstract

The long short-term memory (LSTM) network with gating mechanism has been widely used in sequence modeling tasks, including handwriting and speech recognition. Because an LSTM network is unfolded along the temporal dimension and its temporal depth equals the length of the input feature sequence, the gating mechanism alone may not be sufficient to fully model the dynamic temporal dependencies in sequential data. Inspired by residual learning in ResNet, this paper proposes a dynamic temporal residual network (DTRN) that incorporates residual learning into an LSTM network along the temporal dimension. DTRN comprises two networks: the primary network consists of modified LSTM units with weighted shortcut connections between adjacent temporal outputs, while the secondary network generates dynamic weights for these shortcut connections. To validate the performance of DTRN, we conduct experiments on three commonly used public handwriting recognition datasets (IFN/ENIT, IAM and Rimes) and one speech recognition dataset (TIMIT). The experimental results show that the proposed DTRN outperforms previously reported methods.
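To make the two-network idea in the abstract concrete, the following is a minimal PyTorch sketch of a temporal residual recurrence, not the authors' exact formulation: an LSTM cell whose output at each time step is combined with the previous step's output through a dynamically generated shortcut weight produced by a small secondary network. All names (DTRNSketch, weight_net) and the scalar-gate design are illustrative assumptions.

```python
# Hedged sketch of a dynamic temporal residual recurrence (assumptions, not the
# paper's exact equations): the primary LSTM output at step t receives a weighted
# shortcut from the previous step's output, with the weight produced per step by
# a small secondary network.
import torch
import torch.nn as nn


class DTRNSketch(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.cell = nn.LSTMCell(input_size, hidden_size)
        # Secondary network: maps the current input and previous output to a
        # per-step shortcut weight in (0, 1).
        self.weight_net = nn.Sequential(
            nn.Linear(input_size + hidden_size, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # x: (seq_len, batch, input_size)
        seq_len, batch, _ = x.shape
        h = x.new_zeros(batch, self.cell.hidden_size)
        c = x.new_zeros(batch, self.cell.hidden_size)
        outputs = []
        for t in range(seq_len):
            h_new, c = self.cell(x[t], (h, c))
            # Dynamic residual shortcut along the temporal dimension:
            # blend the current output with the previous step's output.
            alpha = self.weight_net(torch.cat([x[t], h], dim=-1))
            h = h_new + alpha * h
            outputs.append(h)
        return torch.stack(outputs)  # (seq_len, batch, hidden_size)


if __name__ == "__main__":
    model = DTRNSketch(input_size=40, hidden_size=128)
    feats = torch.randn(75, 8, 40)   # e.g. 75 frames, batch of 8
    print(model(feats).shape)        # torch.Size([75, 8, 128])
```

The key difference from a plain LSTM is the extra term `alpha * h`: the shortcut lets gradients flow directly between adjacent temporal outputs, with the secondary network deciding per step how much of the previous output to carry forward.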


Metadata
Title
Dynamic temporal residual network for sequence modeling
Authors
Ruijie Yan
Liangrui Peng
Shanyu Xiao
Michael T. Johnson
Shengjin Wang
Publication date
02.07.2019
Publisher
Springer Berlin Heidelberg
Published in
International Journal on Document Analysis and Recognition (IJDAR) / Issue 3/2019
Print ISSN: 1433-2833
Electronic ISSN: 1433-2825
DOI
https://doi.org/10.1007/s10032-019-00328-x
