2022 | OriginalPaper | Chapter

Latest Trends in Deep Learning for Automatic Speech Recognition System

Authors: Amritpreet Kaur, Rohit Sachdeva, Amitoj Singh

Published in: Artificial Intelligence and Speech Technology

Publisher: Springer International Publishing

Abstract

Deep learning is one of the most recent developments in machine learning and intelligent systems research, and it is currently one of the most active areas of study. Computer vision and pattern recognition have benefited greatly from the advances that deep learning techniques have made possible. New deep learning approaches continue to be proposed that outperform the current state of the art. The field has advanced significantly in the last few years, and it is developing so quickly that new researchers find it difficult to keep pace with its many variants. This chapter briefly surveys developments in deep learning over the last several years.

Title: Latest Trends in Deep Learning for Automatic Speech Recognition System
Authors: Amritpreet Kaur, Rohit Sachdeva, Amitoj Singh
Publisher: Springer International Publishing
Book: Artificial Intelligence and Speech Technology
Print ISBN: 978-3-030-95710-0
Electronic ISBN: 978-3-030-95711-7
Copyright Year: 2022
DOI: https://doi.org/10.1007/978-3-030-95711-7_6
