Top

Published in:

2018 | OriginalPaper | Chapter

On the Comparison of Different Phrase Boundary Detection Approaches Trained on Czech TTS Speech Corpora

Author : Markéta Jůzová

Published in: Speech and Computer

Publisher: Springer International Publishing

Activate our intelligent search to find suitable subject content or patents.

search-config

AI-assisted search

Off

Abstract

The phrasing is a very important issue in the process of speech synthesis since it ensures higher naturalness and intelligibility of synthesized sentences. There are many different approaches to phrase boundary detection, including simple classification-based, HMM-based, CRF-based approaches, however, different types of neural networks are used for this task as well. The paper compares representative methods for phrasing of Czech sentences using large-scale TTS speech corpora as training data, taking only speaker-dependent phrasing issue into consideration.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

previous chapter Designing Advanced Geometric Features for Automatic Russian Visual Speech Recognition

next chapter Word-Initial Consonant Lengthening in Stressed and Unstressed Syllables in Russian

Note that no cross-validation results for classifier’s parameters are presented in this paper since they were a part of the previous study in [7], and the parameters were set according to the best results shown in the aforementioned paper.

Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., Kuksa, P.: Natural language processing (almost) from scratch. J. Mach. Learn. Res. 999888, 2493–2537 (2011)MATH

Fernandez, R., Rendel, A., Ramabhadran, B., Hoory, R.: Prosody contour prediction with long short-term memory, bi-directional, deep recurrent neural networks. In: Proceedings of Interspeech 2014, pp. 2268–2272. ISCA, September 2014

Gregory, M.L.: Using conditional random fields to predict pitch accents in conversational speech. In: Proceedings of ACL 2004. ACL, East Stroudsburg, pp. 677–684 (2004)

Hanzlíček, Z.: Correction of prosodic phrases in large speech corpora. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds.) TSD 2016. LNCS (LNAI), vol. 9924, pp. 408–417. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-45510-5_47CrossRef

Hirschberg, J., Prieto, P.: Training intonational phrasing rules automatically for English and Spanish text-to-speech. Speech Commun. 18(3), 281–290 (1996)CrossRef

Jůzová, M.: CRF-based phrase boundary detection trained on large-scale TTS speech corpora. In: Karpov, A., Potapova, R., Mporas, I. (eds.) SPECOM 2017. LNCS (LNAI), vol. 10458, pp. 272–281. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66429-3_26CrossRef

Jůzová, M.: Prosodic phrase boundary classification based on Czech speech corpora. In: Ekštein, K., Matoušek, V. (eds.) TSD 2017. LNCS (LNAI), vol. 10415, pp. 165–173. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-64206-2_19CrossRef

Jůzová, M., Tihelka, D., Volín, J.: On the extension of the formal prosody model for TTS. In: Text, Speech and Dialogue. Lecture Notes in Computer Science, Springer, Heidelberg (2018)

Koehn, P., Abney, S., Hirschberg, J., Collins, M.: Improving intonational phrasing with syntactic information. In: Proceedings of 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 3, pp. 1289–1290 (2000)

10.

Lafferty, J.D., McCallum, A., Pereira, F.C.N.: Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: Proceedings of the 18th International Conference on Machine Learning, pp. 282–289. Morgan Kaufmann Publishers Inc., San Francisco (2001)

11.

Legát, M., Matoušek, J., Tihelka, D.: A robust multi-phase pitch-mark detection algorithm. Proc. Interspeech 2007, 1641–1644 (2007)

12.

Legát, M., Matoušek, J., Tihelka, D.: On the detection of pitch marks using a robust multi-phase algorithm. Speech Commun. 53(4), 552–566 (2011)CrossRef

13.

Louw, A., Moodley, A.: Speaker specific phrase break modeling with conditional random fields for text-to-speech. In: Proceedings of 2016 Pattern Recognition Association of South Africa and Robotics and Mechatronics International Conference (PRASA-RobMech), pp. 1–6 (2016)

14.

Matoušek, J., Romportl, J.: Automatic pitch-synchronous phonetic segmentation. In: Proceedings of Interspeech 2008, pp. 1626–1629. ISCA (2008)

15.

Matoušek, J., Legát, M.: Is unit selection aware of audible artifacts? SSW 2013. In: Proceedings of the 8th Speech Synthesis Workshop, pp. 267–271. ISCA, Barcelona (2013)

16.

Matoušek, J., Tihelka, D.: Classification-based detection of glottal closure instants from speech signals. In: Proceedings of Interspeech 2017, pp. 3053–3057. ISCA (2017)

17.

Matoušek, J., Tihelka, D., Romportl, J.: Current state of Czech text-to-speech system ARTIC. In: Sojka, P., Kopeček, I., Pala, K. (eds.) TSD 2006. LNCS (LNAI), vol. 4188, pp. 439–446. Springer, Heidelberg (2006). https://doi.org/10.1007/11846406_55CrossRef

18.

Matoušek, J., Romportl, J.: Recording and annotation of speech corpus for Czech unit selection speech synthesis. In: Matoušek, V., Mautner, P. (eds.) TSD 2007. LNCS (LNAI), vol. 4629, pp. 326–333. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-74628-7_43CrossRef

19.

Matoušek, J., Tihelka, D.: Annotation errors detection in TTS corpora. In: Proceedings of Interspeech 2013, pp. 1511–1515. ISCA (2013)

20.

Mikolov, T., Yih, W.T., Zweig, G.: Linguistic regularities in continuous space word representations. In: Proceedings of 2013 NAACL HLT, pp. 746–751 (2013)

21.

Mishra, T., Jun Kim, Y., Bangalore, S.: Intonational phrase break prediction for text-to-speech synthesis using dependency relations. In: Proceedings of ICASSP 2015, pp. 4919–4923 (2015)

22.

Palková, Z.: Rytmická výstavba prozaického textu. Studia ČSAV; čis. 13/1974, Academia (1974)

23.

Parlikar, A., Black, A.W.: Data-driven phrasing for speech synthesis in low-resource languages. Proc. ICASSP 2012, 4013–4016 (2012)

24.

Prahallad, K., Raghavendra, E.V., Black, A.W.: Learning speaker-specific phrase breaks for text-to-speech systems. In: The 7th ISCA Tutorial and Research Workshop on Speech Synthesis, pp. 162–166 (2010)

25.

Read, I., Cox, S.: Stochastic and syntactic techniques for predicting phrase breaks. Comput. Speech Lang. 21(3), 519–542 (2007)CrossRef

26.

Romportl, J., Matoušek, J.: Formal prosodic structures and their application in NLP. In: Matoušek, V., Mautner, P., Pavelka, T. (eds.) TSD 2005. LNCS (LNAI), vol. 3658, pp. 371–378. Springer, Heidelberg (2005). https://doi.org/10.1007/11551874_48CrossRef

27.

Romportl, J.: Structural data-driven prosody model for TTS synthesis. In: Proceedings of Speech Prosody 2006, pp. 549–552. TUDpress, Dresden (2006)

28.

Romportl, J.: Automatic prosodic phrase annotation in a corpus for speech synthesis. In: Proceedings of Speech Prosody 2010. University of Illionois, Chicago (2010)

29.

Romportl, J., Matoušek, J.: Several aspects of machine-driven phrasing in text-to-speech systems. Prague Bull. Math. Linguist. 95, 51–61 (2011)CrossRef

30.

Rosenberg, A., Fernandez, R., Ramabhadran, B.: Modeling phrasing and prominence using deep recurrent learning. In: Proceedings of Interspeech 2015, pp. 3066–3070. ISCA (2015)

31.

Sun, X., Applebaum, T.H.: Intonational phrase break prediction using decision tree and n-gram model. Proc. Eurospeech 2001, 3–7 (2001)

32.

Taylor, P.: Text-to-Speech Synthesis, 1st edn. Cambridge University Press, New York (2009)CrossRef

33.

Taylor, P., Black, A.W.: Assigning phrase breaks from part-of-speech sequences. Comput. Speech Lang. 12(2), 99–117 (1998)CrossRef

34.

Tihelka, D.: Symbolic prosody driven unit selection for highly natural synthetic speech. In: Proceedings of Interspeech 2005 - Eurospeech, pp. 2525–2528. ISCA (2005)

35.

Tihelka, D., Hanzlíček, Z., Jůzová, M., Vít, J., Matoušek, J., Grůber, M.: Current state of text-to-speech system ARTIC: A decade of research on the field of speech technologies. In: Text, Speech and Dialogue. Lecture Notes in Computer Science, Springer, Heidelberg (2018)

36.

Tihelka, D., Kala, J., Matoušek, J.: Enhancements of Viterbi search for fast unit selection synthesis. In: Proceedings of Interspeech 2010, pp. 174–177. ISCA (2010)

37.

Turian, J., Ratinov, L., Bengio, Y.: Word representations: a simple and general method for semisupervised learning. In: Proceedings of ACL 2010, pp. 384–394. ACL (2010)

Title: On the Comparison of Different Phrase Boundary Detection Approaches Trained on Czech TTS Speech Corpora
Author: Markéta Jůzová
Publisher: Springer International Publishing
Book: Speech and Computer
Print ISBN: 978-3-319-99578-6

Electronic ISBN: 978-3-319-99579-3

Copyright Year: 2018
DOI: https://doi.org/10.1007/978-3-319-99579-3_27

Springer Professional

Abstract

Please log in to get access to your license.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Technik"

Springer Professional "Wirtschaft"

Premium Partner