2015 | OriginalPaper | Book chapter

Language Model Speaker Adaptation for Transcription of Slovak Parliament Proceedings

Written by: Ján Staš, Daniel Hládek, Jozef Juhár

Published in: Speech and Computer

Publisher: Springer International Publishing

Abstract

Language model and acoustic model adaptation play an important role in improving the performance and robustness of automatic speech recognition, especially when developing domain-specific, gender-dependent, or user-adapted systems. This paper focuses on language model speaker adaptation for transcribing Slovak parliament proceedings for individual speakers. Building on current research, we have developed a framework that combines multiple speech recognition outputs with acoustic and language model adaptation at different stages. Preliminary results show a significant relative decrease in model perplexity, from 45 % to 74 %, and in the speech recognition word error rate, from 29 % to 43 %, for male and female speakers respectively.
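The abstract reports its gains as relative reductions in word error rate (WER). As a minimal sketch of how such figures are commonly computed, and not the authors' evaluation code, the Python below derives WER from a word-level edit distance and then the relative reduction between a baseline and an adapted system. The function names and the toy sentences are assumptions made for illustration; none of the numbers come from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, adapted: float) -> float:
    """Fraction by which `adapted` improves on `baseline` (0.3 means 30 % relative)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - adapted) / baseline


# Toy example (hypothetical data): one substitution and one deletion
# against a four-word reference.
toy_wer = word_error_rate("a b c d", "a x c")

# Hypothetical WER values in percent for a baseline and an adapted system.
toy_gain = relative_reduction(20.0, 14.0)
```

The same `relative_reduction` helper applies to perplexity, since both metrics are "lower is better" and the abstract states both improvements in relative terms.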

Metadata
Title
Language Model Speaker Adaptation for Transcription of Slovak Parliament Proceedings
Authors
Ján Staš
Daniel Hládek
Jozef Juhár
Copyright year
2015
DOI
https://doi.org/10.1007/978-3-319-23132-7_32