
2018 | OriginalPaper | Chapter

On the Comparison of Different Phrase Boundary Detection Approaches Trained on Czech TTS Speech Corpora

Author: Markéta Jůzová

Published in: Speech and Computer

Publisher: Springer International Publishing


Abstract

Phrasing is an important issue in speech synthesis, since it ensures higher naturalness and intelligibility of synthesized sentences. There are many different approaches to phrase boundary detection, including simple classification-based, HMM-based, and CRF-based approaches; different types of neural networks are used for this task as well. The paper compares representative methods for phrasing Czech sentences using large-scale TTS speech corpora as training data, taking only the speaker-dependent phrasing issue into consideration.


Footnotes
1
Note that no cross-validation results for the classifier’s parameters are presented in this paper, since they were part of the previous study [7]; the parameters were set according to the best results reported there.
 
Literature
1.
Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., Kuksa, P.: Natural language processing (almost) from scratch. J. Mach. Learn. Res. 12, 2493–2537 (2011)
2.
Fernandez, R., Rendel, A., Ramabhadran, B., Hoory, R.: Prosody contour prediction with long short-term memory, bi-directional, deep recurrent neural networks. In: Proceedings of Interspeech 2014, pp. 2268–2272. ISCA, September 2014
3.
Gregory, M.L.: Using conditional random fields to predict pitch accents in conversational speech. In: Proceedings of ACL 2004. ACL, East Stroudsburg, pp. 677–684 (2004)
5.
Hirschberg, J., Prieto, P.: Training intonational phrasing rules automatically for English and Spanish text-to-speech. Speech Commun. 18(3), 281–290 (1996)
8.
Jůzová, M., Tihelka, D., Volín, J.: On the extension of the formal prosody model for TTS. In: Text, Speech and Dialogue. Lecture Notes in Computer Science, Springer, Heidelberg (2018)
9.
Koehn, P., Abney, S., Hirschberg, J., Collins, M.: Improving intonational phrasing with syntactic information. In: Proceedings of 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 3, pp. 1289–1290 (2000)
10.
Lafferty, J.D., McCallum, A., Pereira, F.C.N.: Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: Proceedings of the 18th International Conference on Machine Learning, pp. 282–289. Morgan Kaufmann Publishers Inc., San Francisco (2001)
11.
Legát, M., Matoušek, J., Tihelka, D.: A robust multi-phase pitch-mark detection algorithm. Proc. Interspeech 2007, 1641–1644 (2007)
12.
Legát, M., Matoušek, J., Tihelka, D.: On the detection of pitch marks using a robust multi-phase algorithm. Speech Commun. 53(4), 552–566 (2011)
13.
Louw, A., Moodley, A.: Speaker specific phrase break modeling with conditional random fields for text-to-speech. In: Proceedings of 2016 Pattern Recognition Association of South Africa and Robotics and Mechatronics International Conference (PRASA-RobMech), pp. 1–6 (2016)
14.
Matoušek, J., Romportl, J.: Automatic pitch-synchronous phonetic segmentation. In: Proceedings of Interspeech 2008, pp. 1626–1629. ISCA (2008)
15.
Matoušek, J., Legát, M.: Is unit selection aware of audible artifacts? SSW 2013. In: Proceedings of the 8th Speech Synthesis Workshop, pp. 267–271. ISCA, Barcelona (2013)
16.
Matoušek, J., Tihelka, D.: Classification-based detection of glottal closure instants from speech signals. In: Proceedings of Interspeech 2017, pp. 3053–3057. ISCA (2017)
19.
Matoušek, J., Tihelka, D.: Annotation errors detection in TTS corpora. In: Proceedings of Interspeech 2013, pp. 1511–1515. ISCA (2013)
20.
Mikolov, T., Yih, W.T., Zweig, G.: Linguistic regularities in continuous space word representations. In: Proceedings of 2013 NAACL HLT, pp. 746–751 (2013)
21.
Mishra, T., Jun Kim, Y., Bangalore, S.: Intonational phrase break prediction for text-to-speech synthesis using dependency relations. In: Proceedings of ICASSP 2015, pp. 4919–4923 (2015)
22.
Palková, Z.: Rytmická výstavba prozaického textu. Studia ČSAV; čis. 13/1974, Academia (1974)
23.
Parlikar, A., Black, A.W.: Data-driven phrasing for speech synthesis in low-resource languages. Proc. ICASSP 2012, 4013–4016 (2012)
24.
Prahallad, K., Raghavendra, E.V., Black, A.W.: Learning speaker-specific phrase breaks for text-to-speech systems. In: The 7th ISCA Tutorial and Research Workshop on Speech Synthesis, pp. 162–166 (2010)
25.
Read, I., Cox, S.: Stochastic and syntactic techniques for predicting phrase breaks. Comput. Speech Lang. 21(3), 519–542 (2007)
27.
Romportl, J.: Structural data-driven prosody model for TTS synthesis. In: Proceedings of Speech Prosody 2006, pp. 549–552. TUDpress, Dresden (2006)
28.
Romportl, J.: Automatic prosodic phrase annotation in a corpus for speech synthesis. In: Proceedings of Speech Prosody 2010. University of Illinois, Chicago (2010)
29.
Romportl, J., Matoušek, J.: Several aspects of machine-driven phrasing in text-to-speech systems. Prague Bull. Math. Linguist. 95, 51–61 (2011)
30.
Rosenberg, A., Fernandez, R., Ramabhadran, B.: Modeling phrasing and prominence using deep recurrent learning. In: Proceedings of Interspeech 2015, pp. 3066–3070. ISCA (2015)
31.
Sun, X., Applebaum, T.H.: Intonational phrase break prediction using decision tree and n-gram model. Proc. Eurospeech 2001, 3–7 (2001)
32.
Taylor, P.: Text-to-Speech Synthesis, 1st edn. Cambridge University Press, New York (2009)
33.
Taylor, P., Black, A.W.: Assigning phrase breaks from part-of-speech sequences. Comput. Speech Lang. 12(2), 99–117 (1998)
34.
Tihelka, D.: Symbolic prosody driven unit selection for highly natural synthetic speech. In: Proceedings of Interspeech 2005 - Eurospeech, pp. 2525–2528. ISCA (2005)
35.
Tihelka, D., Hanzlíček, Z., Jůzová, M., Vít, J., Matoušek, J., Grůber, M.: Current state of text-to-speech system ARTIC: A decade of research on the field of speech technologies. In: Text, Speech and Dialogue. Lecture Notes in Computer Science, Springer, Heidelberg (2018)
36.
Tihelka, D., Kala, J., Matoušek, J.: Enhancements of Viterbi search for fast unit selection synthesis. In: Proceedings of Interspeech 2010, pp. 174–177. ISCA (2010)
37.
Turian, J., Ratinov, L., Bengio, Y.: Word representations: a simple and general method for semisupervised learning. In: Proceedings of ACL 2010, pp. 384–394. ACL (2010)
Metadata
Title
On the Comparison of Different Phrase Boundary Detection Approaches Trained on Czech TTS Speech Corpora
Author
Markéta Jůzová
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-319-99579-3_27
