2018 | OriginalPaper | Chapter

Phoneme Duration Prediction for Kazakh Language

Authors : Arman Kaliyev, Sergey V. Rybin, Yuri N. Matveev

Published in: Speech and Computer

Publisher: Springer International Publishing

Abstract

Our research team set out to create a modern speech synthesis system for the Kazakh language. One of the most important components of such a system is phoneme duration prediction. In this article, we present our work on building such a predictor. We developed a predictor based on a deep neural network that uses a minimal set of input linguistic and phonetic parameters. After training, the proposed predictor estimates phoneme durations on test data with an average deviation of 20–25 ms.
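As a rough illustration of how an average deviation in milliseconds could be computed, the sketch below takes the mean absolute difference between predicted and reference phoneme durations. The abstract does not say which error measure the authors used, so the choice of mean absolute deviation is an assumption, and the function name and the sample durations are hypothetical rather than taken from the chapter.

```python
def mean_absolute_deviation_ms(predicted_ms, reference_ms):
    """Average absolute gap between predicted and reference durations, in ms.

    Assumption: "deviation" is read here as mean absolute error; the
    chapter abstract does not name the metric.
    """
    if len(predicted_ms) != len(reference_ms):
        raise ValueError("sequences must have the same length")
    if not predicted_ms:
        raise ValueError("sequences must be non-empty")
    total = sum(abs(p - r) for p, r in zip(predicted_ms, reference_ms))
    return total / len(predicted_ms)


# Hypothetical durations (ms) for five phonemes; not data from the chapter.
predicted = [80.0, 120.0, 65.0, 150.0, 95.0]
reference = [100.0, 100.0, 90.0, 125.0, 115.0]

print(mean_absolute_deviation_ms(predicted, reference))  # 22.0
```

With these made-up values the per-phoneme gaps are 20, 20, 25, 25 and 20 ms, which average to 22 ms, a figure inside the 20–25 ms range quoted above only because the sample was chosen that way.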

Title: Phoneme Duration Prediction for Kazakh Language
Authors: Arman Kaliyev, Sergey V. Rybin, Yuri N. Matveev
Publisher: Springer International Publishing
Book: Speech and Computer
Print ISBN: 978-3-319-99578-6
Electronic ISBN: 978-3-319-99579-3

Copyright Year: 2018
DOI: https://doi.org/10.1007/978-3-319-99579-3_29
