
2018 | Original Paper | Book Chapter

A Comparison of Adaptation Techniques and Recurrent Neural Network Architectures

Authors: Jan Vaněk, Josef Michálek, Jan Zelinka, Josef Psutka

Published in: Statistical Language and Speech Processing

Publisher: Springer International Publishing


Abstract

Recently, recurrent neural networks have become the state of the art in acoustic modeling for automatic speech recognition. Long short-term memory (LSTM) units are the most popular, but alternatives such as the gated recurrent unit (GRU) and its modifications have outperformed LSTM in some publications. In this paper, we compare five neural network (NN) architectures combined with various adaptation and feature normalization techniques: feature-space maximum likelihood linear regression (fMLLR), five variants of i-vector adaptation, and two variants of cepstral mean normalization. Most of these adaptation and normalization techniques were developed for feed-forward NNs and, according to the results reported here, not all of them also work with RNNs. For the experiments, we chose the well-known and freely available TIMIT phone recognition task; phone recognition is much more sensitive to the quality of the acoustic model (AM) than a large-vocabulary task with a complex language model. We have also published open-source scripts that make it easy to replicate the results and to continue the development.
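As an illustration of the simplest normalization family the abstract mentions, the following is a minimal sketch of cepstral mean (and optionally variance) normalization over a feature matrix. The function name and the `variance_norm` flag are illustrative, not from the paper; the paper's two CMN variants are not specified here, but a common distinction is computing the statistics per utterance versus pooled per speaker.

```python
import numpy as np

def cepstral_mean_norm(features, variance_norm=False):
    """Normalize a (frames x coefficients) feature matrix to zero mean
    (and optionally unit variance) along the time axis."""
    mean = features.mean(axis=0)
    normed = features - mean
    if variance_norm:
        std = features.std(axis=0)
        # Guard against near-zero variance in a coefficient
        normed = normed / np.maximum(std, 1e-8)
    return normed
```

Per-utterance normalization applies this to each utterance's features independently; per-speaker normalization would first stack all of a speaker's utterances and compute the mean (and variance) from the pooled matrix before subtracting.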


Metadata
Title
A Comparison of Adaptation Techniques and Recurrent Neural Network Architectures
Authors
Jan Vaněk
Josef Michálek
Jan Zelinka
Josef Psutka
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-030-00810-9_8