2018 | OriginalPaper | Chapter

Cross-Lingual Adaptation of Broadcast Transcription System to Polish Language Using Public Data Sources

Authors: Jan Nouza, Petr Cerva, Radek Safarik

Published in: Human Language Technology. Challenges for Computer Science and Linguistics

Publisher: Springer International Publishing

Abstract

We present methods and procedures designed for the cost-efficient adaptation of an existing speech recognition system to Polish. The system, originally built for Czech, is adapted using common texts and speech recordings accessible from Polish web pages. The most critical part, an acoustic model (AM) for Polish, is built in several steps: (a) an initial bootstrapping phase that utilizes the existing Czech AM, (b) a lightly supervised iterative scheme for automatic collection and annotation of Polish speech data, and (c) acquisition of a large amount of broadcast data in an unsupervised way. The developed system has been evaluated in the task of automatic content monitoring of major Polish TV and radio stations. Its transcription accuracy, measured on a set of 4 complete TV news shows with a total duration of 105 min, is 79.2%. For clean studio speech, its accuracy exceeds 92%.
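The abstract does not spell out how automatically annotated segments are accepted in the lightly supervised step, nor the exact accuracy formula. The sketch below is only an illustration under stated assumptions: it uses the conventional word accuracy, (N - S - D - I) / N, and keeps a segment for training when the recognizer output agrees closely enough with the approximate text found on the web. The names `Segment`, `select_for_training`, and `min_accuracy`, and the 0.9 threshold, are hypothetical and do not come from the chapter.

```python
from __future__ import annotations

from dataclasses import dataclass


def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Word-level Levenshtein distance (substitutions + deletions + insertions)."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_accuracy(ref: str, hyp: str) -> float:
    """Conventional word accuracy: (N - S - D - I) / N, N = reference words."""
    ref_words = ref.lower().split()
    hyp_words = hyp.lower().split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 1.0 - edit_distance(ref_words, hyp_words) / len(ref_words)


@dataclass
class Segment:
    """Hypothetical record for one speech segment harvested from the web."""
    audio_id: str
    web_text: str    # approximate transcript published with the recording
    asr_output: str  # hypothesis produced by the current acoustic model


def select_for_training(segments: list[Segment],
                        min_accuracy: float = 0.9) -> list[Segment]:
    """Keep segments whose ASR output closely matches the web text.

    The threshold is an assumed value for illustration only.
    """
    return [s for s in segments
            if word_accuracy(s.web_text, s.asr_output) >= min_accuracy]


if __name__ == "__main__":
    candidates = [
        Segment("a", "dzisiaj jest ładna pogoda", "dzisiaj jest ładna pogoda"),
        Segment("b", "dzisiaj jest ładna pogoda", "dzisiaj ładna"),
    ]
    kept = select_for_training(candidates)
    print([s.audio_id for s in kept])
```

In an iterative scheme of this kind, the accepted segments would be added to the training set, the acoustic model retrained, and the remaining data decoded again, so that each pass can accept segments the previous model recognized too poorly.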

Metadata
Title
Cross-Lingual Adaptation of Broadcast Transcription System to Polish Language Using Public Data Sources
Authors
Jan Nouza
Petr Cerva
Radek Safarik
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-319-93782-3_3
