2018 | OriginalPaper | Chapter

Phoneme Duration Prediction for Kazakh Language

Authors: Arman Kaliyev, Sergey V. Rybin, Yuri N. Matveev

Published in: Speech and Computer

Publisher: Springer International Publishing

Abstract

Our research team set out to create a modern speech synthesis system for the Kazakh language. One of the most important components of such a system is phoneme duration prediction. In this article, we present our work on building such a predictor. We developed a predictor based on a deep neural network that uses a minimal set of input linguistic and phonetic parameters. On test data, the trained model predicts phoneme durations with an average deviation of 20–25 ms.
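To make the reported figure concrete, the sketch below shows one way an average deviation in milliseconds could be computed. The abstract does not name the exact metric, so mean absolute error is an assumption here, and the function name and the sample durations are hypothetical, not taken from the paper.

```python
from statistics import mean


def mean_absolute_deviation_ms(predicted_ms, reference_ms):
    """Average absolute gap between predicted and reference durations, in ms.

    Assumption: "deviation on average" is read as mean absolute error;
    the abstract does not specify the metric.
    """
    if len(predicted_ms) != len(reference_ms):
        raise ValueError("sequences must have the same length")
    if not predicted_ms:
        raise ValueError("need at least one phoneme")
    return mean(abs(p - r) for p, r in zip(predicted_ms, reference_ms))


# Hypothetical durations (ms) for five phonemes; not data from the paper.
predicted = [80, 120, 65, 150, 95]
reference = [100, 100, 90, 130, 110]
print(mean_absolute_deviation_ms(predicted, reference))
```

If the authors instead report root-mean-square error, the same comparison would square the differences before averaging and take the square root of the result.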


Metadata
Title: Phoneme Duration Prediction for Kazakh Language
Authors: Arman Kaliyev, Sergey V. Rybin, Yuri N. Matveev
Copyright Year: 2018
DOI: https://doi.org/10.1007/978-3-319-99579-3_29
