2014 | OriginalPaper | Chapter

Automatic Phonetic Transcription in Two Steps: Forced Alignment and Burst Detection

Authors: Barbara Schuppler, Sebastian Grill, André Menrath, Juan A. Morales-Cordovilla

Published in: Statistical Language and Speech Processing

Publisher: Springer International Publishing

Abstract

Over the last decade, interest in conversational speech has grown in the fields of human and automatic speech recognition. Whereas resources and tools are numerous for the varieties spoken in Germany, the first corpus of read and conversational Austrian German was collected only recently. In this paper, we present automatic methods to phonetically transcribe and segment (read and) conversational Austrian German. For this purpose, we developed an automatic two-step transcription procedure. In the first step, broad phonetic transcriptions are created by means of forced alignment and a lexicon with multiple pronunciation variants per word. In the second step, plosives are annotated at the sub-phonemic level: a burst detector automatically determines whether a burst exists and where it is located. Our preliminary results show that the forced-alignment-based approach reaches accuracies in the range of what has been reported for inter-transcriber agreement on conversational speech. Furthermore, our burst detector outperforms previous tools, with accuracies between 74 % and 98 % across the different conditions in read speech, and between 52 % and 82 % in conversational speech.
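To make the second step more concrete, the following is a minimal sketch of what a burst detector could look like. It is not the detector described in the chapter, whose details are not given in the abstract. The rule used here (the largest frame-to-frame rise in short-time energy, accepted only above a threshold) and all names and parameters (`detect_burst`, `frame_ms`, `min_rise_db`) are assumptions chosen for illustration, and the input is synthetic noise rather than real speech.

```python
import math
import random


def frame_energies_db(samples, frame_len):
    """Log energy (dB) of consecutive non-overlapping frames."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len
        # Small constant avoids log10(0) on all-zero frames.
        energies.append(10.0 * math.log10(power + 1e-12))
    return energies


def detect_burst(samples, sample_rate, frame_ms=5.0, min_rise_db=12.0):
    """Return the burst onset time in seconds, or None if no burst is found.

    Hypothetical rule (an assumption, not the chapter's method): the burst
    is the largest frame-to-frame energy rise inside the plosive segment,
    accepted only if the rise is at least min_rise_db.
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000.0))
    energies = frame_energies_db(samples, frame_len)
    best_rise, best_frame = 0.0, None
    for i in range(1, len(energies)):
        rise = energies[i] - energies[i - 1]
        if rise > best_rise:
            best_rise, best_frame = rise, i
    if best_frame is None or best_rise < min_rise_db:
        return None
    return best_frame * frame_len / sample_rate


# Synthetic plosive segment: 40 ms of near-silent closure, then 20 ms of noise.
rng = random.Random(0)
sample_rate = 16000
closure = [rng.gauss(0.0, 0.001) for _ in range(640)]
release = [rng.gauss(0.0, 0.2) for _ in range(320)]

burst_time = detect_burst(closure + release, sample_rate)  # onset at the release
no_burst = detect_burst(closure + closure, sample_rate)    # closure only
```

In a two-step setup of the kind the abstract describes, such a function would be applied only to the time spans that the forced alignment has labelled as plosives, so that it answers both questions from the abstract: whether a burst exists (a value versus `None`) and where it is located (the returned time).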



Title: Automatic Phonetic Transcription in Two Steps: Forced Alignment and Burst Detection
Authors: Barbara Schuppler, Sebastian Grill, André Menrath, Juan A. Morales-Cordovilla
Publisher: Springer International Publishing
Book: Statistical Language and Speech Processing
Print ISBN: 978-3-319-11396-8
Electronic ISBN: 978-3-319-11397-5
Copyright Year: 2014
DOI: https://doi.org/10.1007/978-3-319-11397-5_10
