Published in: International Journal of Speech Technology 1/2022

12.05.2020

Bidirectional internal memory gate recurrent neural networks for spoken language understanding

Author: Mohamed Morchid



Abstract

Recurrent neural networks (RNNs) have enjoyed wide success across domains thanks to their ability to encode short- and long-term dependencies between the basic features of a sequence. Various RNN units have been proposed to manage these dependencies with efficient algorithms that require few basic operations, reducing the processing time needed to learn the model. Among these units, the internal memory gate (IMG) has achieved strong accuracies faster than the LSTM and the GRU on a spoken language understanding (SLU) task. This paper presents the bidirectional internal memory gate recurrent neural network (BIMG), which encodes short- and long-term dependencies in both the forward and backward directions. The BIMG is composed of IMG cells, each built around a single gate that manages short- and long-term dependencies, combining the advantages of the LSTM and GRU (short- and long-term dependencies) with that of the leaky unit (LU) (fast learning). The effectiveness and robustness of the proposed BIMG-RNN are evaluated on a theme identification task over telephone conversations. The experiments show that BIMG reaches better accuracies than BGRU and BLSTM, with a gain of 1.1, and a gain of 2.1 over the IMG model. Moreover, BIMG requires 12% and 35% less processing time than BGRU and BLSTM, respectively.
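The architecture described above — a recurrent cell with a single gate, run over the sequence in both directions — can be sketched as follows. This is a minimal illustrative sketch, not the exact IMG formulation from the paper: the gating equations below are an assumption in the spirit of single-gate units (one gate interpolating between the previous hidden state and a candidate state), and all names (`SingleGateCell`, `bidirectional_encode`) are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class SingleGateCell:
    """Toy recurrent cell with a single gate.

    The gate g_t interpolates between the previous hidden state
    (long-term memory) and a tanh candidate computed from the current
    input (short-term information). Assumed equations, for illustration
    only; the paper's IMG cell may differ.
    """

    def __init__(self, input_size, hidden_size, seed=0):
        rng = np.random.default_rng(seed)
        scale = 0.1
        # One weight matrix for the gate, one for the candidate state.
        self.Wg = rng.normal(0.0, scale, (hidden_size, input_size + hidden_size))
        self.Wh = rng.normal(0.0, scale, (hidden_size, input_size + hidden_size))
        self.bg = np.zeros(hidden_size)
        self.bh = np.zeros(hidden_size)

    def step(self, x, h):
        z = np.concatenate([x, h])
        g = sigmoid(self.Wg @ z + self.bg)       # single gate
        h_cand = np.tanh(self.Wh @ z + self.bh)  # candidate state
        return (1.0 - g) * h + g * h_cand        # gated interpolation

def bidirectional_encode(cell_fwd, cell_bwd, xs, hidden_size):
    """Run one cell forward and one backward over the sequence,
    then concatenate the per-step hidden states of both passes."""
    h_f = np.zeros(hidden_size)
    h_b = np.zeros(hidden_size)
    fwd, bwd = [], []
    for x in xs:                      # forward pass
        h_f = cell_fwd.step(x, h_f)
        fwd.append(h_f)
    for x in reversed(xs):            # backward pass
        h_b = cell_bwd.step(x, h_b)
        bwd.append(h_b)
    bwd.reverse()                     # realign with forward time order
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]
```

With one gate per cell instead of the LSTM's three or the GRU's two, each time step costs fewer matrix products, which is the source of the reduced processing time the abstract reports.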


Footnotes
1
Best configuration observed on development set (number of neurons in the hidden layer) applied to test data set.
 
2
Best configuration observed on development set (number of neurons in the hidden layer) applied to test data set.
 
Metadata
Title
Bidirectional internal memory gate recurrent neural networks for spoken language understanding
Author
Mohamed Morchid
Publication date
12.05.2020
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 1/2022
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-020-09708-9
