2017 | OriginalPaper | Chapter

Hesitations in Spontaneous Speech: Acoustic Analysis and Detection

Authors : Vasilisa Verkhodanova, Vladimir Shapranov, Irina Kipyatkova

Published in: Speech and Computer

Publisher: Springer International Publishing

Abstract

Spontaneous speech differs from other types of speech in many ways, and speech disfluencies are one of its most prominent features. These phenomena play an important role in communication, but they also cause problems for automatic speech processing. In this study we present the results of an acoustic analysis of the most frequent disfluencies, voiced hesitations (filled pauses and lengthenings), across different speaking styles in spontaneous Russian speech. We also present the results of experiments on detecting them with an SVM classifier on a joint corpus of Russian and English spontaneous speech. The acoustic analysis showed significant differences across speaking styles in Russian in the fundamental frequency and energy distribution ratios of hesitations and their contexts: compared to dialogues, in monologues speakers exhibit more prosodic cues in the hesitations and their adjacent context. Detection experiments with an SVM on a corpus mixing languages and speaking styles achieved an F1-score of 0.48 (and an F1-score of 0.55 on the Russian data alone).
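The abstract compares the fundamental frequency (F0) and energy of hesitations with those of their adjacent context, and it reports detection quality as an F1-score. The exact feature definitions are not given on this page, so the following is only a minimal sketch under stated assumptions: energy is taken as the mean squared amplitude of a segment, F0 is averaged over the voiced frames of a precomputed pitch track (with 0 marking unvoiced frames), and the helper names `context_ratios` and `f1_score` are hypothetical illustrations, not the authors' own code.

```python
import math
from typing import Sequence, Tuple


def mean_energy(samples: Sequence[float]) -> float:
    """Mean squared amplitude of a segment (a simple energy proxy; assumption)."""
    if not samples:
        raise ValueError("empty segment")
    return sum(x * x for x in samples) / len(samples)


def mean_f0(f0_track: Sequence[float]) -> float:
    """Mean F0 over voiced frames; frames with F0 <= 0 are treated as unvoiced."""
    voiced = [f for f in f0_track if f > 0]
    if not voiced:
        raise ValueError("no voiced frames")
    return sum(voiced) / len(voiced)


def context_ratios(
    hes_samples: Sequence[float],
    ctx_samples: Sequence[float],
    hes_f0: Sequence[float],
    ctx_f0: Sequence[float],
) -> Tuple[float, float]:
    """Hypothetical hesitation-to-context features.

    Returns (energy ratio in dB, F0 ratio). A negative dB value or an F0 ratio
    below 1 means the hesitation is quieter or lower-pitched than its context.
    """
    energy_ratio_db = 10 * math.log10(mean_energy(hes_samples) / mean_energy(ctx_samples))
    f0_ratio = mean_f0(hes_f0) / mean_f0(ctx_f0)
    return energy_ratio_db, f0_ratio


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = harmonic mean of precision and recall, from detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Because F1 is a harmonic mean, it stays low unless both precision and recall are reasonably high. For that reason, the reported scores of 0.48 and 0.55 cannot be split into precision and recall without the underlying counts.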

Metadata
Title
Hesitations in Spontaneous Speech: Acoustic Analysis and Detection
Authors
Vasilisa Verkhodanova
Vladimir Shapranov
Irina Kipyatkova
Copyright Year
2017
DOI
https://doi.org/10.1007/978-3-319-66429-3_39