
2022 | Original Paper | Book Chapter

Latest Trends in Deep Learning for Automatic Speech Recognition System

Authors: Amritpreet Kaur, Rohit Sachdeva, Amitoj Singh

Published in: Artificial Intelligence and Speech Technology

Publisher: Springer International Publishing


Abstract

Deep learning is one of the most recent and most active areas of study in machine learning and intelligent systems research. Deep learning techniques have driven dramatic advances in computer vision and pattern recognition, and new deep learning approaches continue to be proposed that match or surpass the current state of the art. The field has seen significant progress in the last few years and is developing at such an accelerated rate that new investigators find it difficult to keep pace with its many variants. In this article, we briefly survey developments in deep learning over the last several years.


Metadata
Title
Latest Trends in Deep Learning for Automatic Speech Recognition System
Authors
Amritpreet Kaur
Rohit Sachdeva
Amitoj Singh
Copyright year
2022
DOI
https://doi.org/10.1007/978-3-030-95711-7_6
