
2020 | Original Paper | Book Chapter

End-to-End Speech Recognition in Agglutinative Languages

Authors: Orken Mamyrbayev, Keylan Alimhan, Bagashar Zhumazhanov, Tolganay Turdalykyzy, Farida Gusmanova

Published in: Intelligent Information and Database Systems

Publisher: Springer International Publishing


Abstract

This paper considers end-to-end speech recognition systems based on deep neural networks (DNNs). The study examined different types of neural networks under both the CTC model and attention-based encoder-decoder models. The results show that the CTC model works for agglutinative languages directly, without a language model; the best CTC performer was ResNet, reaching 11.52% CER and 19.57% WER when a language model was used. An experiment with a BLSTM network under the attention-based encoder-decoder model achieved 8.01% CER and 17.91% WER. The experiments demonstrate that good results can be achieved without integrating language models, with ResNet showing the best result overall.
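The abstract reports results as CER (character error rate) and WER (word error rate). Both are the Levenshtein edit distance between the hypothesis and the reference, normalized by the reference length, computed over characters or words respectively. A minimal sketch of these metrics in plain Python (an illustration of the standard definitions, not code from the paper):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (one-row DP)."""
    m, n = len(ref), len(hyp)
    dp = list(range(n + 1))  # distances against an empty prefix of ref
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i  # prev holds the diagonal cell dp[i-1][j-1]
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(
                dp[j] + 1,                           # deletion
                dp[j - 1] + 1,                       # insertion
                prev + (ref[i - 1] != hyp[j - 1]),   # substitution / match
            )
            prev = cur
    return dp[n]

def cer(ref, hyp):
    """Character error rate: character edits / reference characters."""
    return edit_distance(list(ref), list(hyp)) / len(ref)

def wer(ref, hyp):
    """Word error rate: word edits / reference word count."""
    ref_words, hyp_words = ref.split(), hyp.split()
    return edit_distance(ref_words, hyp_words) / len(ref_words)
```

For example, a hypothesis that substitutes one of three words yields a WER of 1/3, which matches how the percentages in the abstract are obtained (edits divided by reference length, times 100).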


Metadata
Title
End-to-End Speech Recognition in Agglutinative Languages
Authors
Orken Mamyrbayev
Keylan Alimhan
Bagashar Zhumazhanov
Tolganay Turdalykyzy
Farida Gusmanova
Copyright year
2020
DOI
https://doi.org/10.1007/978-3-030-42058-1_33
