2014 | Original Paper | Book chapter

Automatic Phonetic Transcription in Two Steps: Forced Alignment and Burst Detection

Authors: Barbara Schuppler, Sebastian Grill, André Menrath, Juan A. Morales-Cordovilla

Published in: Statistical Language and Speech Processing

Publisher: Springer International Publishing

Abstract

Over the last decade, interest in conversational speech has grown in research on both human and automatic speech recognition. Whereas resources and tools are numerous for the varieties spoken in Germany, the first corpus of read and conversational Austrian German was collected only recently. In this paper, we present automatic methods for phonetically transcribing and segmenting (read and) conversational Austrian German. For this purpose, we developed a two-step automatic transcription procedure. In the first step, broad phonetic transcriptions are created by means of forced alignment and a lexicon with multiple pronunciation variants per word. In the second step, plosives are annotated at the sub-phonemic level: a burst detector determines whether a burst exists and where it is located. Our preliminary results show that the forced-alignment-based approach reaches accuracies in the range of the inter-transcriber agreement reported for conversational speech. Furthermore, our burst detector outperforms previous tools, with accuracies between 98% and 74% for the different conditions in read speech and between 82% and 52% for conversational speech.
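
The first step relies on a lexicon that lists several pronunciation variants per word, from which the forced alignment selects the best-matching one. The following is a minimal sketch of that idea only: it enumerates the candidate phone sequences such a lexicon yields for a short utterance. The lexicon entries, phone symbols and the function name `candidate_transcriptions` are hypothetical illustrations, not taken from the chapter, and the acoustic scoring that an aligner would apply to choose among the candidates is not shown.

```python
from itertools import product

# Hypothetical lexicon: each word maps to one or more pronunciation variants.
# The phone symbols are illustrative only and are not taken from the chapter.
LEXICON = {
    "haben": [["h", "a:", "b", "@", "n"], ["h", "a:", "b", "m"], ["h", "a", "m"]],
    "wir": [["v", "i:", "6"], ["v", "6"]],
}


def candidate_transcriptions(words, lexicon):
    """Return every phone sequence obtained by picking one variant per word."""
    variant_lists = [lexicon[word] for word in words]
    candidates = []
    for choice in product(*variant_lists):
        # Concatenate the chosen variant of each word into one phone sequence.
        phones = [phone for variant in choice for phone in variant]
        candidates.append(phones)
    return candidates


candidates = candidate_transcriptions(["haben", "wir"], LEXICON)
for phones in candidates:
    print(" ".join(phones))
```

The number of candidates is the product of the variant counts per word, so explicit enumeration like this is only practical for short utterances.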

Metadata
Title
Automatic Phonetic Transcription in Two Steps: Forced Alignment and Burst Detection
Authors
Barbara Schuppler
Sebastian Grill
André Menrath
Juan A. Morales-Cordovilla
Copyright year
2014
DOI
https://doi.org/10.1007/978-3-319-11397-5_10