2019 | OriginalPaper | Chapter

Slovak Broadcast News Speech Recognition and Transcription System

Authors: Martin Lojka, Peter Viszlay, Ján Staš, Daniel Hládek, Jozef Juhár

Published in: Advances in Network-Based Information Systems

Publisher: Springer International Publishing

Abstract

We have developed a working prototype of an automatic subtitling system for the transcription, archiving, and indexing of Slovak audiovisual recordings such as lectures, talks, discussions, and broadcast news. To advance its development and our research, we had to incorporate more modern speech technologies and adopt current deep learning techniques. This paper describes the transition and the changes made to the prototype: replacement of the speech recognition core, architectural changes, and a new web-based user interface. We used the state-of-the-art Kaldi speech recognition toolkit and a distributed architecture to make the interface more responsive and to process audiovisual recordings faster. Using acoustic models based on time delay deep neural networks, we lowered the system's average word error rate from the previously reported 24% to 15%, an absolute reduction of 9 percentage points.
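
For readers unfamiliar with the metric, the following is a minimal sketch of how a word error rate is commonly computed: the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. It is not taken from the paper; the function names `word_error_rate` and `relative_reduction` and the sample sentences are hypothetical, and the authors' actual scoring setup may differ (for example in text normalization).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # aligning i reference words with an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing in hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(old_wer: float, new_wer: float) -> float:
    """Relative improvement of new_wer over old_wer."""
    return (old_wer - new_wer) / old_wer


if __name__ == "__main__":
    # Hypothetical sample: one substitution and one deletion over four words.
    print(word_error_rate("a b c d", "a x c"))
    # The figures from the abstract: 24% before, 15% after.
    print(relative_reduction(0.24, 0.15))
```

Under this definition, the drop from 24% to 15% reported above is 9 percentage points in absolute terms, which corresponds to a relative reduction of 37.5%.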

Metadata
Title
Slovak Broadcast News Speech Recognition and Transcription System
Authors
Martin Lojka
Peter Viszlay
Ján Staš
Daniel Hládek
Jozef Juhár
Copyright Year
2019
DOI
https://doi.org/10.1007/978-3-319-98530-5_32
