2016 | OriginalPaper | Chapter

A Linguistic Interpretation of the Atom Decomposition of Fundamental Frequency Contour for American English

Authors: Tijana Delić, Branislav Gerazov, Branislav Popović, Milan Sečujski

Published in: Speech and Computer

Publisher: Springer International Publishing

Abstract

One of the most recently proposed techniques for modeling the prosody of an utterance is the decomposition of its pitch, duration and/or energy contour into physiologically motivated units called atoms, based on matching pursuit. Since this model is based on the physiology of the production of sentence intonation, it is essentially language independent. However, the intonation of an utterance in a particular language is obviously under the influence of factors of a predominantly linguistic nature. In this research, restricted to the case of American English with prosody annotated using standard ToBI conventions, we have shown that, under certain mild constraints, the positive and negative atoms identified in the pitch contour coincide very well with high and low pitch accents and phrase accents of ToBI. By giving a linguistic interpretation of the atom decomposition model, this research enables its practical use in domains such as speech synthesis or cross-lingual prosody transfer.
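To make the idea of a matching-pursuit decomposition concrete, the following is a minimal sketch, not the authors' implementation. It assumes a dictionary built from every time shift of a single gamma-shaped kernel; the kernel parameters `k` and `theta`, the function names, and the toy contour are hypothetical choices made only for illustration. The sign of each selected amplitude plays the role of a positive or negative atom.

```python
import math

def gamma_atom(length, k=6, theta=1.0):
    """Unit-norm gamma-shaped kernel t^(k-1) * exp(-t/theta), t = 0..length-1.
    The shape parameters are illustrative assumptions, not values from the paper."""
    raw = [(t ** (k - 1)) * math.exp(-t / theta) for t in range(length)]
    norm = math.sqrt(sum(v * v for v in raw))
    return [v / norm for v in raw]

def matching_pursuit(signal, atom, n_atoms):
    """Greedy matching pursuit over all shifts of one unit-norm atom.
    Returns a list of (position, amplitude) pairs and the final residual."""
    residual = list(signal)
    picks = []
    n, m = len(residual), len(atom)
    for _ in range(n_atoms):
        best_pos, best_corr = 0, 0.0
        # Find the shift whose atom correlates most strongly with the residual.
        for pos in range(n - m + 1):
            corr = sum(residual[pos + i] * atom[i] for i in range(m))
            if abs(corr) > abs(best_corr):
                best_pos, best_corr = pos, corr
        if best_corr == 0.0:
            break  # nothing left to explain
        # Subtract the selected atom, scaled by its correlation, from the residual.
        for i in range(m):
            residual[best_pos + i] -= best_corr * atom[i]
        picks.append((best_pos, best_corr))
    return picks, residual

# Toy contour: one positive atom at sample 10 and one negative atom at sample 60.
atom = gamma_atom(30)
contour = [0.0] * 100
for i, v in enumerate(atom):
    contour[10 + i] += 2.0 * v
    contour[60 + i] -= 1.5 * v

picks, residual = matching_pursuit(contour, atom, n_atoms=2)
```

In this toy case the two atoms do not overlap, so the greedy search recovers them one at a time, strongest first. Real pitch contours would need a richer dictionary and a stopping criterion, which this sketch leaves out.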

Footnotes
1
Six recordings from the initial set of 910 recordings were excluded because they contained the ERROR tag where a boundary tone was expected.
 
Metadata
Copyright Year
2016
DOI
https://doi.org/10.1007/978-3-319-43958-7_6
