
2017 | OriginalPaper | Chapter

Last Syllable Unit Penalization in Unit Selection TTS

Authors: Markéta Jůzová, Daniel Tihelka, Radek Skarnitzl

Published in: Text, Speech, and Dialogue

Publisher: Springer International Publishing


Abstract

While unit selection speech synthesis tries to avoid speech modifications, it strongly depends on placing units in the correct position. Usually, the position is tightly coupled with the distance from the beginning or end of prosodic or rhythmic units such as phrases or words. The present paper shows, however, that it is not necessary to follow position requirements when phonetic knowledge of the perception of prosodic patterns (mostly durational in our case) is considered. In particular, we focus on the effects of using word-final units in word-internal positions in synthesized speech, which listeners often perceive negatively because of disruptions in local timing.
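To make the idea concrete, the following is a minimal sketch of how a penalty for word-final units placed in word-internal positions might enter a target cost. It is an illustration only, not the authors' implementation: the names `Candidate`, `penalized_cost`, and `pick_best`, the `base_cost` field, and the penalty value of 1.0 are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical candidate unit from the speech inventory."""
    unit_id: str
    from_last_syllable: bool  # unit was recorded in a word-final syllable
    base_cost: float          # assumed target cost from all other features


def penalized_cost(candidate: Candidate,
                   target_in_last_syllable: bool,
                   penalty: float = 1.0) -> float:
    """Add a penalty when a word-final unit is proposed for a word-internal slot."""
    mismatch = candidate.from_last_syllable and not target_in_last_syllable
    return candidate.base_cost + (penalty if mismatch else 0.0)


def pick_best(candidates: list[Candidate],
              target_in_last_syllable: bool,
              penalty: float = 1.0) -> Candidate:
    """Return the candidate with the lowest penalized cost."""
    return min(
        candidates,
        key=lambda c: penalized_cost(c, target_in_last_syllable, penalty),
    )


# Two illustrative candidates: "a" is cheaper but comes from a word-final syllable.
candidates = [
    Candidate("a", from_last_syllable=True, base_cost=0.2),
    Candidate("b", from_last_syllable=False, base_cost=0.5),
]
best_internal = pick_best(candidates, target_in_last_syllable=False)
best_final = pick_best(candidates, target_in_last_syllable=True)
```

In this sketch the penalty is one-directional: it applies only when a word-final unit is considered for a word-internal slot, which is the case the abstract singles out. A real system would also combine this with concatenation costs, which are omitted here.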


Metadata
Title
Last Syllable Unit Penalization in Unit Selection TTS
Authors
Markéta Jůzová
Daniel Tihelka
Radek Skarnitzl
Copyright Year
2017
DOI
https://doi.org/10.1007/978-3-319-64206-2_36
