Skip to main content
Top

2020 | OriginalPaper | Chapter

Optimization of Gain in Symmetrized Itakura-Saito Discrimination for Pronunciation Learning

Authors : Andrey V. Savchenko, Vladimir V. Savchenko, Lyudmila V. Savchenko

Published in: Mathematical Optimization Theory and Operations Research

Publisher: Springer International Publishing

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

This paper considers an assessment and evaluation of the pronunciation quality in computer-aided language learning systems. We propose the novel distortion measure for speech processing by using the gain optimization of the symmetrized Itakura-Saito divergence. This dissimilarity is implemented in a complete algorithm for pronunciation learning and improvement. At its first stage, a user has to achieve a stable pronunciation of all sounds by matching them with sounds of an ideal speaker. At the second stage, the recognition of sounds and their short sequences is carried out to guarantee the distinguishability of learned sounds. The training set may contain not only ideal sounds but the best utterances of a user obtained at the previous step. Finally, the word recognition accuracy is estimated by using deep neural networks fine-tuned on the best words from a user. Experimental study shows that the proposed procedure makes it possible to achieve high efficiency for learning of sounds and their sequences even in the presence of noise in an observed utterance.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literature
1.
go back to reference Golonka, E.M., Bowles, A.R., Frank, V.M., Richardson, D.L., Freynik, S.: Technologies for foreign language learning: a review of technology types and their effectiveness. Comput. Assist. Lang. Learn. 27(1), 70–105 (2014)CrossRef Golonka, E.M., Bowles, A.R., Frank, V.M., Richardson, D.L., Freynik, S.: Technologies for foreign language learning: a review of technology types and their effectiveness. Comput. Assist. Lang. Learn. 27(1), 70–105 (2014)CrossRef
2.
go back to reference Sztahó, D., Kiss, G., Vicsi, K.: Computer based speech prosody teaching system. Comput. Speech Lang. 50, 126–140 (2018)CrossRef Sztahó, D., Kiss, G., Vicsi, K.: Computer based speech prosody teaching system. Comput. Speech Lang. 50, 126–140 (2018)CrossRef
3.
go back to reference Han, K.I., Park, H.J., Lee, K.M.: Speech recognition and lip shape feature extraction for English vowel pronunciation of the hearing-impaired based on SVM technique. In: Proceedings of the International Conference on Big Data and Smart Computing (BigComp), pp. 293–296. IEEE (2016) Han, K.I., Park, H.J., Lee, K.M.: Speech recognition and lip shape feature extraction for English vowel pronunciation of the hearing-impaired based on SVM technique. In: Proceedings of the International Conference on Big Data and Smart Computing (BigComp), pp. 293–296. IEEE (2016)
4.
go back to reference Hu, W., Qian, Y., Soong, F.K.: A new DNN-based high quality pronunciation evaluation for computer-aided language learning (CALL). In: Proceedings of Interspeech, pp. 1886–1890 (2013) Hu, W., Qian, Y., Soong, F.K.: A new DNN-based high quality pronunciation evaluation for computer-aided language learning (CALL). In: Proceedings of Interspeech, pp. 1886–1890 (2013)
5.
go back to reference Kneller, E., Karaulnyh, D.: System and method of converting voice signal into transcript presentation with metadata. RU Patent 2589851 C2, 10 July 2016 Kneller, E., Karaulnyh, D.: System and method of converting voice signal into transcript presentation with metadata. RU Patent 2589851 C2, 10 July 2016
7.
go back to reference Haikun, T., Shiying, W., Xinsheng, L., Yue, X.G.: Speech recognition model based on deep learning and application in pronunciation quality evaluation system. In: Proceedings of the International Conference on Data Mining and Machine Learning, pp. 1–5 (2019) Haikun, T., Shiying, W., Xinsheng, L., Yue, X.G.: Speech recognition model based on deep learning and application in pronunciation quality evaluation system. In: Proceedings of the International Conference on Data Mining and Machine Learning, pp. 1–5 (2019)
9.
go back to reference Franco, H., Bratt, H., Rossier, R., Rao Gadde, V., Shriberg, E., Abrash, V., Precoda, K.: Eduspeak®: a speech recognition and pronunciation scoring toolkit for computer-aided language learning applications. Lang. Test. 27(3), 401–418 (2010)CrossRef Franco, H., Bratt, H., Rossier, R., Rao Gadde, V., Shriberg, E., Abrash, V., Precoda, K.: Eduspeak®: a speech recognition and pronunciation scoring toolkit for computer-aided language learning applications. Lang. Test. 27(3), 401–418 (2010)CrossRef
10.
go back to reference Sudhakara, S., Ramanathi, M.K., Yarra, C., Ghosh, P.K.: An improved goodness of pronunciation (GoP) measure for pronunciation evaluation with DNN-HMM system considering hmm transition probabilities. In: Proceedings of Interspeech, pp. 954–958 (2019) Sudhakara, S., Ramanathi, M.K., Yarra, C., Ghosh, P.K.: An improved goodness of pronunciation (GoP) measure for pronunciation evaluation with DNN-HMM system considering hmm transition probabilities. In: Proceedings of Interspeech, pp. 954–958 (2019)
11.
go back to reference Arias, J.P., Yoma, N.B., Vivanco, H.: Automatic intonation assessment for computer aided language learning. Speech Commun. 52(3), 254–267 (2010)CrossRef Arias, J.P., Yoma, N.B., Vivanco, H.: Automatic intonation assessment for computer aided language learning. Speech Commun. 52(3), 254–267 (2010)CrossRef
13.
go back to reference Huang, G., Ye, J., Shen, Y., Zhou, Y.: A evaluating model of English pronunciation for Chinese students. In: Proceedings of the 9th International Conference on Communication Software and Networks (ICCSN), pp. 1062–1065. IEEE (2017) Huang, G., Ye, J., Shen, Y., Zhou, Y.: A evaluating model of English pronunciation for Chinese students. In: Proceedings of the 9th International Conference on Communication Software and Networks (ICCSN), pp. 1062–1065. IEEE (2017)
14.
go back to reference Xiao, Y., Soong, F., Hu, W.: Paired phone-posteriors approach to ESL pronunciation quality assessment. In: Proceedings of Interspeech, pp. 1631–1635 (2018) Xiao, Y., Soong, F., Hu, W.: Paired phone-posteriors approach to ESL pronunciation quality assessment. In: Proceedings of Interspeech, pp. 1631–1635 (2018)
15.
go back to reference Srinivasan, A., Yarra, C., Ghosh, P.K.: Automatic assessment of pronunciation and its dependent factors by exploring their interdependencies using DNN and LSTM. In: Proceedings of the 8th ISCA Workshop on Speech and Language Technology in Education (SLaTE), pp. 30–34 (2019) Srinivasan, A., Yarra, C., Ghosh, P.K.: Automatic assessment of pronunciation and its dependent factors by exploring their interdependencies using DNN and LSTM. In: Proceedings of the 8th ISCA Workshop on Speech and Language Technology in Education (SLaTE), pp. 30–34 (2019)
16.
go back to reference Gu, L., Harris, J.G.: SLAP: a system for the detection and correction of pronunciation for second language acquisition. In: Proceedings of the International Symposium on Circuits and Systems (ISCAS), vol. 2, p. II. IEEE (2003) Gu, L., Harris, J.G.: SLAP: a system for the detection and correction of pronunciation for second language acquisition. In: Proceedings of the International Symposium on Circuits and Systems (ISCAS), vol. 2, p. II. IEEE (2003)
17.
go back to reference Gray, R., Buzo, A., Gray, A., Matsuyama, Y.: Distortion measures for speech processing. IEEE Trans. Acoust. Speech Signal Process. 28(4), 367–376 (1980)CrossRef Gray, R., Buzo, A., Gray, A., Matsuyama, Y.: Distortion measures for speech processing. IEEE Trans. Acoust. Speech Signal Process. 28(4), 367–376 (1980)CrossRef
19.
go back to reference Mošner, L., et al.: Improving noise robustness of automatic speech recognition via parallel data and teacher-student learning. In: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6475–6479. IEEE (2019) Mošner, L., et al.: Improving noise robustness of automatic speech recognition via parallel data and teacher-student learning. In: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6475–6479. IEEE (2019)
20.
go back to reference Savchenko, A.V., Savchenko, L.V.: Towards the creation of reliable voice control system based on a fuzzy approach. Pattern Recogn. Lett. 65, 145–151 (2015)CrossRef Savchenko, A.V., Savchenko, L.V.: Towards the creation of reliable voice control system based on a fuzzy approach. Pattern Recogn. Lett. 65, 145–151 (2015)CrossRef
22.
go back to reference Su, H.Y., Gao, Y.: Adaptive gain reduction for encoding a speech signal. US Patent 9,269,365, 23 February 2016 Su, H.Y., Gao, Y.: Adaptive gain reduction for encoding a speech signal. US Patent 9,269,365, 23 February 2016
23.
go back to reference Dionelis, N., Brookes, M.: Speech enhancement using modulation-domain Kalman filtering with active speech level normalized log-spectrum global priors. In: Proceedings of the 25th European Signal Processing Conference (EUSIPCO), pp. 2309–2313. IEEE (2017) Dionelis, N., Brookes, M.: Speech enhancement using modulation-domain Kalman filtering with active speech level normalized log-spectrum global priors. In: Proceedings of the 25th European Signal Processing Conference (EUSIPCO), pp. 2309–2313. IEEE (2017)
24.
go back to reference Erkelens, J., Jensen, J., Heusdens, R.: A data-driven approach to optimizing spectral speech enhancement methods for various error criteria. Speech Commun. 49(7–8), 530–541 (2007)CrossRef Erkelens, J., Jensen, J., Heusdens, R.: A data-driven approach to optimizing spectral speech enhancement methods for various error criteria. Speech Commun. 49(7–8), 530–541 (2007)CrossRef
25.
go back to reference Bastos, I., Oliveira, L.B., Goes, J., Silva, M.: MOSFET-only wideband LNA with noise cancelling and gain optimization. In: Proceedings of the 17th International Conference Mixed Design of Integrated Circuits and Systems (MIXDES), pp. 306–311. IEEE (2010) Bastos, I., Oliveira, L.B., Goes, J., Silva, M.: MOSFET-only wideband LNA with noise cancelling and gain optimization. In: Proceedings of the 17th International Conference Mixed Design of Integrated Circuits and Systems (MIXDES), pp. 306–311. IEEE (2010)
26.
go back to reference Itakura, F., Saito, S.: Analysis synthesis telephony based on the maximum likelihood method. In: Proceedings of the 6th International Congress on Acoustics, pp. 17–20 (1968) Itakura, F., Saito, S.: Analysis synthesis telephony based on the maximum likelihood method. In: Proceedings of the 6th International Congress on Acoustics, pp. 17–20 (1968)
27.
go back to reference Marple Jr., S.L.: Digital Spectral Analysis with Applications, 2nd edn. Dover Publications, Mineola, New York (2019). 432 p. Marple Jr., S.L.: Digital Spectral Analysis with Applications, 2nd edn. Dover Publications, Mineola, New York (2019). 432 p.
29.
go back to reference Kullback, S.: Information Theory and Statistics. Dover Publications, New York (1997)MATH Kullback, S.: Information Theory and Statistics. Dover Publications, New York (1997)MATH
30.
go back to reference Savchenko, A.V., Belova, N.S.: Statistical testing of segment homogeneity in classification of piecewise-regular objects. Int. J. Appl. Math. Comput. Sci. 25(4), 915–925 (2015)MathSciNetCrossRef Savchenko, A.V., Belova, N.S.: Statistical testing of segment homogeneity in classification of piecewise-regular objects. Int. J. Appl. Math. Comput. Sci. 25(4), 915–925 (2015)MathSciNetCrossRef
31.
go back to reference Itakura, F.: Minimum prediction residual principle applied to speech recognition. IEEE Trans. Acoust. Speech Signal Process. 23(1), 67–72 (1975)CrossRef Itakura, F.: Minimum prediction residual principle applied to speech recognition. IEEE Trans. Acoust. Speech Signal Process. 23(1), 67–72 (1975)CrossRef
33.
go back to reference Sainath, T.N., Parada, C.: Convolutional neural networks for small-footprint keyword spotting. In: Proceedings of the Sixteenth Annual Conference of the International Speech Communication Association, pp. 1478–1482 (2015) Sainath, T.N., Parada, C.: Convolutional neural networks for small-footprint keyword spotting. In: Proceedings of the Sixteenth Annual Conference of the International Speech Communication Association, pp. 1478–1482 (2015)
34.
go back to reference Zhang, Y., Pezeshki, M., Brakel, P., Zhang, S., Bengio, C.L.Y., Courville, A.: Towards end-to-end speech recognition with deep convolutional neural networks. arXiv preprint arXiv:1701.02720 (2017) Zhang, Y., Pezeshki, M., Brakel, P., Zhang, S., Bengio, C.L.Y., Courville, A.: Towards end-to-end speech recognition with deep convolutional neural networks. arXiv preprint arXiv:​1701.​02720 (2017)
35.
go back to reference Nakkiran, P., Alvarez, R., Prabhavalkar, R., Parada, C.: Compressing deep neural networks using a rank-constrained topology. In: Proceedings of the Sixteenth Annual Conference of the International Speech Communication Association, pp. 1473–1477 (2015) Nakkiran, P., Alvarez, R., Prabhavalkar, R., Parada, C.: Compressing deep neural networks using a rank-constrained topology. In: Proceedings of the Sixteenth Annual Conference of the International Speech Communication Association, pp. 1473–1477 (2015)
36.
Metadata
Title
Optimization of Gain in Symmetrized Itakura-Saito Discrimination for Pronunciation Learning
Authors
Andrey V. Savchenko
Vladimir V. Savchenko
Lyudmila V. Savchenko
Copyright Year
2020
DOI
https://doi.org/10.1007/978-3-030-49988-4_30

Premium Partner