
2015 | OriginalPaper | Chapter

A Comparison of RNN LM and FLM for Russian Speech Recognition

Authors: Irina Kipyatkova, Alexey Karpov

Published in: Speech and Computer

Publisher: Springer International Publishing


Abstract

In this paper, we study a recurrent neural network (RNN) language model (LM) for N-best list rescoring in automatic continuous Russian speech recognition and compare it with a factored language model (FLM). We tried RNNs with different numbers of units in the hidden layer. To build the FLM, we used five linguistic factors: word, lemma, stem, part-of-speech, and morphological tag. All models were trained on a text corpus of 350M words. We also linearly interpolated the RNN LM and the FLM with the baseline 3-gram LM. With respect to the baseline model, we achieved a relative WER reduction of 8% using the FLM and 14% using the RNN LM.
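The abstract names three operations: linear interpolation of two language models, N-best list rescoring, and relative WER reduction. The following is a minimal Python sketch of them, not the authors' implementation. It assumes per-word natural-log probabilities are already available from each model, that interpolation is done at the word level, and that hypotheses are ranked by acoustic score plus a weighted LM score. The function names, the dictionary layout, and the default weights (lam, lm_weight) are hypothetical and chosen only for illustration.

```python
import math


def interpolate_logprob(logp_a, logp_b, lam):
    """Return log(lam * P_a + (1 - lam) * P_b) for natural-log inputs."""
    return math.log(lam * math.exp(logp_a) + (1.0 - lam) * math.exp(logp_b))


def rescore_nbest(hypotheses, lam=0.5, lm_weight=1.0):
    """Rank N-best hypotheses best-first using an interpolated LM score.

    Each hypothesis is a dict (hypothetical layout) with keys:
      'text'     - the word sequence as a string
      'acoustic' - acoustic log score of the hypothesis
      'rnn'      - per-word natural-log probabilities from the RNN LM
      'ngram'    - per-word natural-log probabilities from the n-gram LM
    """
    scored = []
    for hyp in hypotheses:
        # Interpolate word by word, then sum log probabilities over the sentence.
        lm_score = sum(
            interpolate_logprob(p_rnn, p_ngram, lam)
            for p_rnn, p_ngram in zip(hyp["rnn"], hyp["ngram"])
        )
        scored.append((hyp["acoustic"] + lm_weight * lm_score, hyp["text"]))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored


def relative_wer_reduction(baseline_wer, new_wer):
    """Relative WER reduction in percent with respect to the baseline."""
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Toy, made-up scores: hypothesis B has the better acoustic score,
    # but the RNN LM strongly prefers hypothesis A.
    nbest = [
        {"text": "A", "acoustic": -10.0,
         "rnn": [math.log(0.5)] * 2, "ngram": [math.log(0.1)] * 2},
        {"text": "B", "acoustic": -9.5,
         "rnn": [math.log(0.1)] * 2, "ngram": [math.log(0.1)] * 2},
    ]
    for score, text in rescore_nbest(nbest, lam=0.5):
        print(f"{text}: {score:.3f}")
```

In practice the interpolation weight and the LM weight would be tuned on held-out data; the values here are placeholders. The relative reduction is computed against the baseline WER, so a drop from a hypothetical 25.0% to 21.5% WER would correspond to a 14% relative reduction.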


Metadata
Title
A Comparison of RNN LM and FLM for Russian Speech Recognition
Authors
Irina Kipyatkova
Alexey Karpov
Copyright Year
2015
DOI
https://doi.org/10.1007/978-3-319-23132-7_5
