
2015 | Original Paper | Book Chapter

Mongolian Speech Recognition Based on Deep Neural Networks


Abstract

Mongolian is an influential language, and better Mongolian Large Vocabulary Continuous Speech Recognition (LVCSR) systems are needed. Recently, speech recognition research has achieved substantial improvements through the introduction of Deep Neural Networks (DNNs). In this study, a DNN-based Mongolian LVCSR system is built. Experimental results show that the DNN-based models outperform the conventional models based on Gaussian Mixture Models (GMMs) for Mongolian speech recognition by a large margin. Compared with the best GMM-based model, the DNN-based one obtains a relative improvement of over 50 % and becomes the new state-of-the-art system in this field.
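
The abstract describes replacing GMM observation models with a DNN in a hybrid DNN-HMM acoustic model. The paper's exact architecture is not given on this page, so the following is only a minimal illustrative sketch of such a hybrid acoustic model: a feed-forward network maps a spliced window of MFCC frames to posteriors over tied triphone states (senones). The feature dimension, context width, layer sizes, and senone count below are assumptions, not the authors' settings.

```python
# Minimal sketch of a DNN-HMM hybrid acoustic model (illustrative only, not the
# authors' exact configuration): spliced MFCC frames -> senone log-posteriors,
# which the HMM decoder uses as scaled likelihoods in place of per-state GMMs.
import torch
import torch.nn as nn

FEAT_DIM = 13          # assumed MFCC dimension per frame
CONTEXT = 5            # assumed +/-5 frames of splicing context
NUM_SENONES = 3000     # assumed number of tied triphone states
INPUT_DIM = FEAT_DIM * (2 * CONTEXT + 1)


class HybridDNN(nn.Module):
    """Feed-forward acoustic model: spliced features -> senone log-posteriors."""

    def __init__(self, hidden_dim: int = 1024, num_layers: int = 4):
        super().__init__()
        layers, in_dim = [], INPUT_DIM
        for _ in range(num_layers):
            layers += [nn.Linear(in_dim, hidden_dim), nn.Sigmoid()]
            in_dim = hidden_dim
        layers.append(nn.Linear(in_dim, NUM_SENONES))
        self.net = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, INPUT_DIM). The log-posteriors would be converted to
        # pseudo-likelihoods by subtracting log state priors before decoding.
        return torch.log_softmax(self.net(x), dim=-1)


if __name__ == "__main__":
    model = HybridDNN()
    frames = torch.randn(8, INPUT_DIM)   # a batch of spliced feature vectors
    print(model(frames).shape)           # torch.Size([8, 3000])
```

In such a hybrid setup the HMM topology, decision-tree state tying, and decoder are kept from the GMM baseline; only the per-frame acoustic score changes, which is what the reported relative improvement compares.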


Metadata
Title
Mongolian Speech Recognition Based on Deep Neural Networks
Authors
Hui Zhang
Feilong Bao
Guanglai Gao
Copyright Year
2015
DOI
https://doi.org/10.1007/978-3-319-25816-4_15