2015 | OriginalPaper | Chapter

The Role of Prosody in the Perception of Synthesized and Natural Speech

Authors: Maja Marković, Bojana Jakovljević, Tanja Milićev, Nataša Miliević

Published in: Speech and Computer

Publisher: Springer International Publishing

Abstract

This paper presents the results of research on the perception of synthesized and natural speech and investigates the role of pauses, a prosodic characteristic, in speech comprehension. The research involved a series of perception tasks: a quality assessment, an intelligibility task, and comprehension tests of ten shorter texts and one longer text in Serbian, produced by the AlfaNum speech synthesizer and by a professional actor, followed by a comprehension task using synthesized speech with modified pauses. The results of the intelligibility task show similar performance by both groups of subjects, whereas the comprehension tasks indicate better performance for natural than for synthesized speech. The results of the follow-up task show that the modified prosody improved the subjects' performance. The quality assessment task revealed the subjects' preference for natural speech, mainly on the basis of the prosodic characteristic of pauses.

Footnotes
1
Pauses in the form of silence are not the only significant indicators of IP boundaries; the other common cues of IP boundaries are the lengthening of final segments (pre-boundary lengthening) and the presence of a specific boundary tone.
 
2
The results will be reported in detail in Sect. 3.
 
3
SUS is the methodology proposed as the most appropriate for assessing segmental intelligibility in [6] and references therein.
 
Literature
1.
Pisoni, D.B.: Perception of synthetic speech. In: van Santen, J.P.H., Sproat, R.W., Olive, J.P., Hirschberg, J. (eds.) Progress in Speech Synthesis, pp. 541–560. Springer, New York (1997)
2.
Pisoni, D.B.: Some measures of intelligibility and comprehension. In: Allen, J., Hunnicutt, M.S., Klatt, D.H. (eds.) From Text to Speech: The MITalk System, pp. 151–171. Cambridge University Press, Cambridge, UK (1987)
3.
Pisoni, D.B.: Speeded classification of natural and synthetic speech in a lexical decision task. J. Acoust. Soc. Am. 70, S98 (1981)
4.
Pisoni, D.B., Nusbaum, H., Greene, B.G.: Perception of synthetic speech generated by rule. In: Proceedings of the IEEE, pp. 1665–1676 (1985)
5.
Pols, L.C.W., Santen, J.P.H. van, Abe, M., Kahn, D., Keller, E.: The use of large text corpora for evaluation text-to-speech systems. In: Proceedings of the First International Conference on Language Resources and Evaluation, pp. 637–640. Granada, Spain (1998)
6.
Chang, Y.Y.: Evaluation of TTS systems in intelligibility and comprehension tasks. In: ROCLING, Proceedings of the 23rd Conference on Computational Linguistics and Speech Processing, Taipei, Taiwan, pp. 64–78 (2011)
7.
Warren, R.M.: Perceptual restoration of missing speech sounds. Science 167, 392–393 (1970)
8.
Warren, R.M., Obusek, C.: Speech perception and phonemic restorations. Percept. Psychophys. 9, 358–363 (1971)
9.
Selkirk, E.: Phonology and Syntax: The Relation Between Sound and Structure. MIT Press, Cambridge (1984)
10.
Kjelgaard, M.M., Speer, S.R.: Prosodic facilitation and interference in the resolution of temporary syntactic closure ambiguity. J. Mem. Lang. 40, 153–194 (1999)
11.
Swerts, M., Geluykens, R.: Prosody as a marker of information flow in spoken discourse. Lang. Speech 37, 21–45 (1994)
12.
Hirschberg, J.: Communication and prosody: functional aspects of prosody. Speech Commun. (Special Issue on Dialogue and Prosody) 36, 31–43 (2001)
13.
Shriberg, E., Stolcke, A., Hakkani-Tür, D., Tür, G.: Prosody-Based Automatic Segmentation of Speech into Sentences and Topics. Speech Commun. 32, 127–154 (2000)
14.
Cutler, A., Dahan, D., van Donselaar, W.: Prosody in the comprehension of spoken language: a literature review. Lang. Speech 40, 141–201 (1997)
15.
Swerts, M., Geluykens, R.: Local and global prosodic cues to discourse organization in dialogues. In: Proceedings of the ESCA Workshop on Prosody, pp. 108–111. Lund, Sweden (1993)
16.
Tench, P.: The Intonation System of English. Cassell, London (1996)
17.
Jonides, J., Lewis, R.L., Nee, D.E., Lustig, C.A., Berman, M.G., Moore, K.S.: The mind and brain of short-term memory. Annu. Rev. Psychol. 59, 193–224 (2008)
Metadata
Title
The Role of Prosody in the Perception of Synthesized and Natural Speech
Authors
Maja Marković
Bojana Jakovljević
Tanja Milićev
Nataša Miliević
Copyright Year
2015
DOI
https://doi.org/10.1007/978-3-319-23132-7_55
