2016 | OriginalPaper | Chapter

A Deep Neural Networks (DNN) Based Models for a Computer Aided Pronunciation Learning System

Authors : Mohamed S. Elaraby, Mustafa Abdallah, Sherif Abdou, Mohsen Rashwan

Published in: Speech and Computer

Publisher: Springer International Publishing

Abstract

Gaussian Mixture Models (GMMs) have been the most commonly used models in pronunciation verification systems. The more recently introduced Deep Neural Networks (DNNs) have been shown to provide significantly better discriminative models of the acoustic space. In this paper, we describe our efforts to upgrade the models of a Computer Aided Pronunciation Learning (CAPL) system that is used to teach Arabic pronunciation according to the rules of Quran recitation. Four major enhancements were introduced. First, we used Speaker Adaptive Training (SAT) to reduce inter-speaker variability. Second, we integrated a hybrid DNN-HMM model to enhance the acoustic model and decrease the phone error rate. Third, we integrated Minimum Phone Error (MPE) training with the hybrid DNN. Finally, in the testing phase, we used a grammar-based decoding graph to limit the search space to the most frequent error types. A comparison between the performance of the conventional GMM-HMM and the hybrid DNN-HMM was performed, with results showing significant performance improvements for the hybrid DNN-HMM.
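The abstract names the components without giving implementation details. Purely as an illustration, the sketch below shows two ideas it relies on: a hybrid DNN-HMM typically turns network posteriors into scaled likelihoods by dividing by the state priors, and a grammar that allows only the canonical phone plus its frequent error variants reduces each decision to a few candidates. This is not the authors' implementation. The phone labels, priors, the `grammar` dictionary, the function names, and the assumption of known segment boundaries (no HMM alignment, no SAT or MPE) are all hypothetical simplifications.

```python
import math

# Hypothetical sketch, not the CAPL system's code.
# Assumes segment boundaries are already known and that the DNN
# outputs per-frame phone posteriors as dicts {phone: probability}.

def scaled_log_likelihood(posterior, prior):
    """Hybrid DNN-HMM trick: p(x|s)/p(x) = p(s|x)/p(s), taken in the log domain."""
    return math.log(posterior) - math.log(prior)

def segment_score(frames, phone, priors):
    """Sum per-frame scaled log-likelihoods of `phone` over one segment."""
    return sum(scaled_log_likelihood(f[phone], priors[phone]) for f in frames)

def verify(segments, grammar, priors):
    """Pick the best-scoring allowed realization for each expected phone.

    `grammar` (hypothetical) maps a canonical phone to its allowed
    realizations: the canonical phone first, then frequent error types.
    Phones missing from the grammar allow only themselves.
    Returns a list of (expected, detected, is_error) tuples.
    """
    results = []
    for expected, frames in segments:
        candidates = grammar.get(expected, [expected])
        best = max(candidates, key=lambda p: segment_score(frames, p, priors))
        results.append((expected, best, best != expected))
    return results
```

Because `max` returns the first of equally scoring candidates, listing the canonical phone first makes ties resolve in favour of a correct pronunciation. Restricting `candidates` to a handful of known error types is what keeps the search space small compared with free phone recognition.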

Metadata
Copyright Year
2016
DOI
https://doi.org/10.1007/978-3-319-43958-7_5