
2022 | OriginalPaper | Chapter

Latest Trends in Deep Learning for Automatic Speech Recognition System

Authors: Amritpreet Kaur, Rohit Sachdeva, Amitoj Singh

Published in: Artificial Intelligence and Speech Technology

Publisher: Springer International Publishing


Abstract

In machine learning and intelligent systems research, deep learning is one of the most recent developments and one of the most active areas of study. Computer vision and pattern recognition have benefited greatly from the advances made possible by deep learning techniques. New deep learning approaches continue to be proposed, with reported performance that surpasses current state-of-the-art methods. The area has advanced significantly in the last few years. Deep learning is developing so quickly that new researchers find it difficult to keep pace with its many variants. This article briefly covers developments in deep learning over the last several years.


Metadata

Title: Latest Trends in Deep Learning for Automatic Speech Recognition System
Authors: Amritpreet Kaur, Rohit Sachdeva, Amitoj Singh
Copyright Year: 2022
DOI: https://doi.org/10.1007/978-3-030-95711-7_6
