2014 | OriginalPaper | Chapter

Automatic Phonetic Transcription in Two Steps: Forced Alignment and Burst Detection

Authors: Barbara Schuppler, Sebastian Grill, André Menrath, Juan A. Morales-Cordovilla

Published in: Statistical Language and Speech Processing

Publisher: Springer International Publishing

Abstract

Over the last decade, there has been growing interest in conversational speech in the fields of human and automatic speech recognition. Whereas both resources and tools are numerous for the varieties spoken in Germany, the first corpus of read and conversational speech for Austrian German was collected only recently. In the current paper, we present automatic methods to phonetically transcribe and segment (read and) conversational Austrian German. For this purpose, we developed an automatic two-step transcription procedure. In the first step, broad phonetic transcriptions are created by means of a forced alignment and a lexicon with multiple pronunciation variants per word. In the second step, plosives are annotated on the sub-phonemic level: a burst detector automatically determines whether a burst exists and where it is located. Our preliminary results show that the forced-alignment-based approach reaches accuracies in the range of what has been reported for inter-transcriber agreement on conversational speech. Furthermore, our burst detector outperforms previous tools, with accuracies between 98 % and 74 % for the different conditions in read speech, and between 82 % and 52 % for conversational speech.
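The abstract does not say how the burst detector works internally, so the following is only a minimal illustrative sketch of the second step under our own assumptions, not the authors' method. It assumes a plosive segment whose boundaries come from a forced alignment, computes short-time log energy inside that segment, and reports a burst at the largest frame-to-frame energy rise if that rise exceeds a threshold. The function names, the 2 ms frame size, and the 15 dB threshold are hypothetical choices made for the example.

```python
import math


def frame_energies_db(samples, frame_len):
    """Log energy (dB) of consecutive non-overlapping frames."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len
        # Small floor avoids log10(0) during a fully silent closure.
        energies.append(10.0 * math.log10(power + 1e-12))
    return energies


def detect_burst(samples, sample_rate, seg_start, seg_end,
                 frame_ms=2.0, min_rise_db=15.0):
    """Return the burst onset time in seconds, or None if no burst is found.

    seg_start and seg_end (seconds) delimit one plosive segment, e.g. as
    produced by a forced alignment. frame_ms and min_rise_db are
    illustrative assumptions, not values taken from the paper.
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000.0))
    first = int(seg_start * sample_rate)
    last = int(seg_end * sample_rate)
    energies = frame_energies_db(samples[first:last], frame_len)

    # Find the steepest energy rise between adjacent frames.
    best_rise, best_frame = 0.0, None
    for i in range(1, len(energies)):
        rise = energies[i] - energies[i - 1]
        if rise > best_rise:
            best_rise, best_frame = rise, i

    if best_frame is None or best_rise < min_rise_db:
        return None  # treated as "no burst" in this sketch
    return seg_start + best_frame * frame_len / sample_rate


# Synthetic example: 50 ms of silence (closure) followed by a 10 ms burst.
closure = [0.0] * 800
burst = [0.5 if i % 2 == 0 else -0.5 for i in range(160)]
onset = detect_burst(closure + burst, 16000, 0.0, 0.06)
```

A detector intended for real speech would need more than an energy criterion (for example, spectral cues and robustness to voicing during the closure); the sketch only shows the two decisions named in the abstract: whether a burst exists and where it is located.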

Metadata
Title: Automatic Phonetic Transcription in Two Steps: Forced Alignment and Burst Detection
Authors: Barbara Schuppler, Sebastian Grill, André Menrath, Juan A. Morales-Cordovilla
Copyright Year: 2014
DOI: https://doi.org/10.1007/978-3-319-11397-5_10
