Published in: International Journal of Speech Technology 1/2022

12.05.2020

Bidirectional internal memory gate recurrent neural networks for spoken language understanding

Author: Mohamed Morchid



Abstract

Recurrent neural networks (RNNs) have enjoyed wide success across domains thanks to their ability to encode short- and long-term dependencies between the basic features of a sequence. Various RNN units have been proposed to manage these dependencies with efficient algorithms that require few basic operations, reducing the processing time needed to learn the model. Among these units, the internal memory gate (IMG) has achieved strong accuracies faster than the LSTM and the GRU on a spoken language understanding (SLU) task. This paper presents the bidirectional internal memory gate recurrent neural network (BIMG), which encodes short- and long-term dependencies in both the forward and backward directions. The BIMG is composed of IMG cells, each built around a single gate that manages short- and long-term dependencies, combining the advantages of the LSTM and GRU (short- and long-term dependencies) with that of the leaky unit (LU) (fast learning). The effectiveness and robustness of the proposed BIMG-RNN are evaluated on a theme identification task over telephone conversations. The experiments show that BIMG reaches better accuracies than BGRU and BLSTM, with a gain of 1.1, and a gain of 2.1 over the IMG model. Moreover, BIMG requires 12% and 35% less processing time than BGRU and BLSTM, respectively.
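The architecture described above — a recurrent cell with a single gate, run over the sequence in both directions — can be sketched as follows. This is a minimal illustrative sketch, not the exact IMG formulation from the paper: the gating equations below are an assumption in the spirit of single-gate units (one gate interpolating between the previous hidden state and a candidate state), and all names (`SingleGateCell`, `bidirectional_encode`) are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class SingleGateCell:
    """Toy recurrent cell with a single gate.

    The gate g_t interpolates between the previous hidden state
    (long-term memory) and a tanh candidate computed from the current
    input (short-term information). Assumed equations, for illustration
    only; the paper's IMG cell may differ.
    """

    def __init__(self, input_size, hidden_size, seed=0):
        rng = np.random.default_rng(seed)
        scale = 0.1
        # One weight matrix for the gate, one for the candidate state.
        self.Wg = rng.normal(0.0, scale, (hidden_size, input_size + hidden_size))
        self.Wh = rng.normal(0.0, scale, (hidden_size, input_size + hidden_size))
        self.bg = np.zeros(hidden_size)
        self.bh = np.zeros(hidden_size)

    def step(self, x, h):
        z = np.concatenate([x, h])
        g = sigmoid(self.Wg @ z + self.bg)       # single gate
        h_cand = np.tanh(self.Wh @ z + self.bh)  # candidate state
        return (1.0 - g) * h + g * h_cand        # gated interpolation

def bidirectional_encode(cell_fwd, cell_bwd, xs, hidden_size):
    """Run one cell forward and one backward over the sequence,
    then concatenate the per-step hidden states of both passes."""
    h_f = np.zeros(hidden_size)
    h_b = np.zeros(hidden_size)
    fwd, bwd = [], []
    for x in xs:                      # forward pass
        h_f = cell_fwd.step(x, h_f)
        fwd.append(h_f)
    for x in reversed(xs):            # backward pass
        h_b = cell_bwd.step(x, h_b)
        bwd.append(h_b)
    bwd.reverse()                     # realign with forward time order
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]
```

With one gate per cell instead of the LSTM's three or the GRU's two, each time step costs fewer matrix products, which is the source of the reduced processing time the abstract reports.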


Footnotes
1
Best configuration observed on development set (number of neurons in the hidden layer) applied to test data set.
 
2
Best configuration observed on development set (number of neurons in the hidden layer) applied to test data set.
 
Metadata
Title
Bidirectional internal memory gate recurrent neural networks for spoken language understanding
Author
Mohamed Morchid
Publication date
12.05.2020
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 1/2022
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-020-09708-9
