
2015 | Original Paper | Book Chapter

Mongolian Speech Recognition Based on Deep Neural Networks


Abstract

Mongolian is an influential language, and better Mongolian Large Vocabulary Continuous Speech Recognition (LVCSR) systems are needed. Recently, speech recognition research has achieved substantial improvements through the introduction of Deep Neural Networks (DNNs). In this study, a DNN-based Mongolian LVCSR system is built. Experimental results show that the DNN-based models outperform the conventional models based on Gaussian Mixture Models (GMMs) for Mongolian speech recognition by a large margin. Compared with the best GMM-based model, the DNN-based one obtains a relative improvement of over 50 % and becomes the new state-of-the-art system in this field.
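
The abstract describes replacing GMM observation models with a DNN in a hybrid DNN-HMM acoustic model. The paper's exact architecture is not given on this page, so the following is only a minimal illustrative sketch of such a hybrid acoustic model: a feed-forward network maps a spliced window of MFCC frames to posteriors over tied triphone states (senones). The feature dimension, context width, layer sizes, and senone count below are assumptions, not the authors' settings.

```python
# Minimal sketch of a DNN-HMM hybrid acoustic model (illustrative only, not the
# authors' exact configuration): spliced MFCC frames -> senone log-posteriors,
# which the HMM decoder uses as scaled likelihoods in place of per-state GMMs.
import torch
import torch.nn as nn

FEAT_DIM = 13          # assumed MFCC dimension per frame
CONTEXT = 5            # assumed +/-5 frames of splicing context
NUM_SENONES = 3000     # assumed number of tied triphone states
INPUT_DIM = FEAT_DIM * (2 * CONTEXT + 1)


class HybridDNN(nn.Module):
    """Feed-forward acoustic model: spliced features -> senone log-posteriors."""

    def __init__(self, hidden_dim: int = 1024, num_layers: int = 4):
        super().__init__()
        layers, in_dim = [], INPUT_DIM
        for _ in range(num_layers):
            layers += [nn.Linear(in_dim, hidden_dim), nn.Sigmoid()]
            in_dim = hidden_dim
        layers.append(nn.Linear(in_dim, NUM_SENONES))
        self.net = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, INPUT_DIM). The log-posteriors would be converted to
        # pseudo-likelihoods by subtracting log state priors before decoding.
        return torch.log_softmax(self.net(x), dim=-1)


if __name__ == "__main__":
    model = HybridDNN()
    frames = torch.randn(8, INPUT_DIM)   # a batch of spliced feature vectors
    print(model(frames).shape)           # torch.Size([8, 3000])
```

In such a hybrid setup the HMM topology, decision-tree state tying, and decoder are kept from the GMM baseline; only the per-frame acoustic score changes, which is what the reported relative improvement compares.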


Metadata
Title
Mongolian Speech Recognition Based on Deep Neural Networks
Authors
Hui Zhang
Feilong Bao
Guanglai Gao
Copyright Year
2015
DOI
https://doi.org/10.1007/978-3-319-25816-4_15