2017 | OriginalPaper | Chapter

Improving of LVCSR for Casual Czech Using Publicly Available Language Resources

Authors: Petr Mizera, Petr Pollak

Published in: Speech and Computer

Publisher: Springer International Publishing

Abstract

The paper presents the design of a Czech casual speech recognition system, which is part of wider research on understanding very informal speaking styles. The study was carried out on the NCCCz corpus (the Nijmegen Corpus of Casual Czech), and the contributions of optimized acoustic models, optimized language models, and pronunciation lexicon optimization were analyzed. Special attention was paid to the impact of publicly available corpora suitable for language model (LM) creation. On the casual speech recognition task, our final DNN-HMM system achieved a word error rate (WER) of 30–60%, depending on the LM used. Recognition results for other speaking styles are also presented for comparison. The system was built with the Kaldi toolkit, and the created recipes are available to the research community.
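The WER figure quoted in the abstract is the standard word-level edit distance normalized by the reference length. The following is a minimal sketch of that metric only, not the authors' scoring pipeline; the function name `word_error_rate` and the sample sentence pair are illustrative assumptions and are not taken from the paper or from the NCCCz corpus.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Illustrative helper (hypothetical name); tokenization is plain
    whitespace splitting, which is an assumption of this sketch.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up toy pair: one deleted word and one substituted word
    # against a five-word reference.
    print(word_error_rate("to je fakt dobrý nápad", "to je dobrý napad"))  # 0.4
```

Because insertions are counted against the reference length, the value can exceed 1.0 when the hypothesis is much longer than the reference.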


Metadata
Title: Improving of LVCSR for Casual Czech Using Publicly Available Language Resources
Authors: Petr Mizera, Petr Pollak
Copyright Year: 2017
DOI: https://doi.org/10.1007/978-3-319-66429-3_42
