2020 | Original Paper | Book Chapter

Optimization of Gain in Symmetrized Itakura-Saito Discrimination for Pronunciation Learning

Authors: Andrey V. Savchenko, Vladimir V. Savchenko, Lyudmila V. Savchenko

Published in: Mathematical Optimization Theory and Operations Research

Publisher: Springer International Publishing

Abstract

This paper considers the assessment of pronunciation quality in computer-aided language learning systems. We propose a novel distortion measure for speech processing based on gain optimization of the symmetrized Itakura-Saito divergence. This dissimilarity measure is implemented in a complete algorithm for pronunciation learning and improvement. In the first stage, the user has to achieve a stable pronunciation of all sounds by matching them with the sounds of an ideal speaker. In the second stage, sounds and their short sequences are recognized to guarantee that the learned sounds are distinguishable. The training set may contain not only ideal sounds but also the user's best utterances obtained at the previous step. Finally, the word recognition accuracy is estimated using deep neural networks fine-tuned on the user's best words. An experimental study shows that the proposed procedure makes it possible to learn sounds and their sequences with high efficiency, even in the presence of noise in an observed utterance.
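The idea of optimizing the gain in a symmetrized Itakura-Saito divergence can be illustrated with a minimal sketch. The abstract does not state the exact form of the proposed measure, so the code below rests on assumptions: it uses the textbook Itakura-Saito divergence between two sampled power spectra `p` and `q`, symmetrizes it by summing both directions (the logarithmic terms cancel), and then minimizes over a scalar gain `g` applied to `q`. Under these assumptions the symmetrized value is `a/g + g*b - 2` with `a = mean(p/q)` and `b = mean(q/p)`, which is minimized at `g = sqrt(a/b)`. The function names are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence, Tuple


def symmetrized_is(p: Sequence[float], q: Sequence[float]) -> float:
    """Symmetrized Itakura-Saito divergence D(p||q) + D(q||p).

    p and q are power spectra sampled on the same frequency grid.
    The log terms of the two directions cancel, leaving r + 1/r - 2.
    """
    if len(p) != len(q) or len(p) == 0:
        raise ValueError("spectra must be non-empty and of equal length")
    if any(x <= 0 for x in p) or any(x <= 0 for x in q):
        raise ValueError("power spectra must be strictly positive")
    ratios = [pi / qi for pi, qi in zip(p, q)]
    return sum(r + 1.0 / r - 2.0 for r in ratios) / len(ratios)


def gain_optimized_sis(p: Sequence[float], q: Sequence[float]) -> Tuple[float, float]:
    """Return (best_gain, minimum) of symmetrized_is(p, g*q) over g > 0.

    Assumption (not from the paper): the gain is one scalar multiplying q.
    """
    if len(p) != len(q) or len(p) == 0:
        raise ValueError("spectra must be non-empty and of equal length")
    if any(x <= 0 for x in p) or any(x <= 0 for x in q):
        raise ValueError("power spectra must be strictly positive")
    n = len(p)
    a = sum(pi / qi for pi, qi in zip(p, q)) / n  # mean of p/q
    b = sum(qi / pi for pi, qi in zip(p, q)) / n  # mean of q/p
    best_gain = math.sqrt(a / b)                  # root of d/dg (a/g + g*b - 2)
    minimum = 2.0 * math.sqrt(a * b) - 2.0        # value at the optimal gain
    return best_gain, minimum


if __name__ == "__main__":
    reference = [1.0, 2.0, 4.0]   # toy spectrum of an "ideal" sound
    observed = [2.0, 4.0, 8.0]    # same shape, twice the power
    print(symmetrized_is(reference, observed))
    print(gain_optimized_sis(reference, observed))
```

In the toy example the two spectra differ only in overall level, so the plain symmetrized divergence is positive while the gain-optimized value drops to zero. This is the practical motivation for such a measure: a difference in loudness between the learner and the reference speaker should not count as a pronunciation error.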

Title: Optimization of Gain in Symmetrized Itakura-Saito Discrimination for Pronunciation Learning
Authors: Andrey V. Savchenko, Vladimir V. Savchenko, Lyudmila V. Savchenko
Publisher: Springer International Publishing
Book: Mathematical Optimization Theory and Operations Research
Print ISBN: 978-3-030-49987-7
Electronic ISBN: 978-3-030-49988-4
Copyright year: 2020
DOI: https://doi.org/10.1007/978-3-030-49988-4_30
