
24.06.2023

Noise robust automatic speech recognition: review and analysis

Authors: Mohit Dua, Akanksha, Shelza Dua

Published in: International Journal of Speech Technology | Issue 2/2023

Abstract

Automatic Speech Recognition (ASR) is an emerging technology used in various fields such as robotics, traffic control, and healthcare. The leading cause of ASR performance degradation is the mismatch between the training and testing environments, and the main reason for this mismatch is the presence of noise during the testing phase of an ASR system. Researchers have applied various techniques in the front-end and back-end phases of ASR to detect and handle this noise. However, very few review papers have considered noise as a criterion when comparing existing research works. Hence, the objective of this survey is to analyze and review the effective methods proposed by different scientists and researchers to improve the noise robustness of ASR systems. Initially, the paper discusses the basic architecture of an ASR system, the factors affecting its performance, and the formulation of the noise problem. Secondly, the work analyzes existing state-of-the-art noise-robust ASR methods in terms of front-end feature extraction techniques and back-end classification models. Then, a detailed review of the various speech databases used by these methods is given. Finally, an analysis of all these noise-resistant ASR techniques in terms of performance metrics is presented. While presenting this analysis, the paper discusses the feature extraction techniques, back-end classification methods, speech databases, and performance metrics in detail. The paper also discusses existing challenges and describes future research directions in building noise-resistant ASR systems.
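To make the noise problem formulation referred to above concrete, the speech observed at test time is commonly modeled as clean speech passed through a channel and corrupted by additive background noise (a standard textbook formulation, not necessarily the exact notation used in this paper):

\[
y[n] = x[n] \ast h[n] + d[n], \qquad |Y(\omega)|^{2} \approx |X(\omega)|^{2}\,|H(\omega)|^{2} + |D(\omega)|^{2}
\]

where x[n] is the clean speech, h[n] the channel impulse response, and d[n] the additive noise. Because most front ends take a logarithm of the (Mel-filtered) power spectrum, the additive term becomes a nonlinear, signal-dependent distortion in feature space, which is exactly the training/testing mismatch described above.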
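Noisy test conditions are typically simulated by mixing a noise recording into clean speech at a chosen signal-to-noise ratio (SNR). A minimal sketch of this common setup, assuming NumPy arrays of raw audio samples (illustrative only, not code from the surveyed works):

import numpy as np

def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Add background noise to clean speech at a target SNR in dB."""
    noise = np.resize(noise, speech.shape)            # loop/trim noise to the speech length
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12             # guard against silent noise
    # Scale the noise so that 10*log10(p_speech / p_scaled_noise) == snr_db
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + scale * noise

Lowering snr_db makes the mixture noisier; evaluating the same model over a range of SNRs is the usual way the robustness of front-end and back-end methods is compared.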
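The performance metric most commonly reported for ASR is the word error rate, WER = (S + D + I) / N, where S, D, and I are the word substitutions, deletions, and insertions needed to turn the hypothesis into the reference, and N is the number of reference words. A minimal sketch using the standard Levenshtein edit distance over words (illustrative, not taken from the paper):

def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between the first i reference words and the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                                   # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j                                   # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,            # deletion
                          d[i][j - 1] + 1,            # insertion
                          d[i - 1][j - 1] + sub)      # substitution or match
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# Example: one substitution in a four-word reference gives WER = 0.25
print(word_error_rate("turn the lights on", "turn the light on"))

Note that WER can exceed 1.0 when the hypothesis contains many insertions, so it is an error rate rather than a bounded percentage.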

Zurück zum Zitat Tang, Z., Chen, L., Wu, B., Yu, D., & Manocha, D. (2020, May). Improving reverberant speech training using diffuse acoustic simulation. In ICASSP 2020–2020 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 6969–6973). IEEE. Tang, Z., Chen, L., Wu, B., Yu, D., & Manocha, D. (2020, May). Improving reverberant speech training using diffuse acoustic simulation. In ICASSP 2020–2020 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 6969–6973). IEEE.
Zurück zum Zitat Thimmaraja, Y. G., Nagaraja, B. G., & Jayanna, H. S. (2021). Speech enhancement and encoding by combining SS-VAD and LPC. International Journal of Speech Technology, 24(1), 165–172. Thimmaraja, Y. G., Nagaraja, B. G., & Jayanna, H. S. (2021). Speech enhancement and encoding by combining SS-VAD and LPC. International Journal of Speech Technology, 24(1), 165–172.
Zurück zum Zitat Thomas, T., Spoorthy, V., Sobhana, N. V., & Koolagudi, S. G. (2020, December). Speaker recognition in emotional environment using excitation features. In 2020 third international conference on advances in electronics, computers and communications (ICAECC) (pp. 1–6). IEEE. Thomas, T., Spoorthy, V., Sobhana, N. V., & Koolagudi, S. G. (2020, December). Speaker recognition in emotional environment using excitation features. In 2020 third international conference on advances in electronics, computers and communications (ICAECC) (pp. 1–6). IEEE.
Zurück zum Zitat Vanderreydt, G., & Demuynck, K. (n.d.). A Novel Channel estimate for noise robust speech recognition. Available at SSRN 4330824. Vanderreydt, G., & Demuynck, K. (n.d.). A Novel Channel estimate for noise robust speech recognition. Available at SSRN 4330824.
Zurück zum Zitat Varga, A., & Steeneken, H. J. (1993). Assessment for automatic speech recognition: II— NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems. Speech Communication, 12(3), 247–251. Varga, A., & Steeneken, H. J. (1993). Assessment for automatic speech recognition: II— NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems. Speech Communication, 12(3), 247–251.
Zurück zum Zitat Wang, Z. Q., & Wang, D. (2020, May). Multi-microphone complex spectral mapping for speech de-reverberation. In ICASSP 2020–2020 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 486–490). IEEE. Wang, Z. Q., & Wang, D. (2020, May). Multi-microphone complex spectral mapping for speech de-reverberation. In ICASSP 2020–2020 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 486–490). IEEE.
Zurück zum Zitat Wang, Z. Q., Wang, P., & Wang, D. (2020). Complex spectral mapping for single-and multi-channel speech enhancement and robust ASR. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 28, 1778–1787. Wang, Z. Q., Wang, P., & Wang, D. (2020). Complex spectral mapping for single-and multi-channel speech enhancement and robust ASR. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 28, 1778–1787.
Zurück zum Zitat Watanabe, S., Mandel, M., Barker, J., Vincent, E., Arora, A., Chang, X., Khudanpur, S., Manohar, V., Povey, D., Raj, D., Snyder, D., Subramanian, A.S., Trmal, J., Yair, B.B., Boeddeker, C., Ni, Z., Fujita, Y., Horiguchi, S., Kanda, N., et al. (2020). CHiME-6 challenge: Tackling multi-speaker speech recognition for unsegmented recordings. arXiv preprint arXiv:2004.09249. Watanabe, S., Mandel, M., Barker, J., Vincent, E., Arora, A., Chang, X., Khudanpur, S., Manohar, V., Povey, D., Raj, D., Snyder, D., Subramanian, A.S., Trmal, J., Yair, B.B., Boeddeker, C., Ni, Z., Fujita, Y., Horiguchi, S., Kanda, N., et al. (2020). CHiME-6 challenge: Tackling multi-speaker speech recognition for unsegmented recordings. arXiv preprint arXiv:​2004.​09249.
Zurück zum Zitat Wessel, F., Schluter, R., Macherey, K., & Ney, H. (2001). Confidence measures for large vocabulary continuous speech recognition. IEEE Transactions on Speech and Audio Processing, 9(3), 288–298. Wessel, F., Schluter, R., Macherey, K., & Ney, H. (2001). Confidence measures for large vocabulary continuous speech recognition. IEEE Transactions on Speech and Audio Processing, 9(3), 288–298.
Zurück zum Zitat Wu, B., Li, K., Ge, F., Huang, Z., Yang, M., Siniscalchi, S. M., & Lee, C. H. (2017). An end-to-end deep learning approach to simultaneous speech de-reverberation and acoustic modeling for robust speech recognition. IEEE Journal of Selected Topics in Signal Processing, 11(8), 1289–1300. Wu, B., Li, K., Ge, F., Huang, Z., Yang, M., Siniscalchi, S. M., & Lee, C. H. (2017). An end-to-end deep learning approach to simultaneous speech de-reverberation and acoustic modeling for robust speech recognition. IEEE Journal of Selected Topics in Signal Processing, 11(8), 1289–1300.
Zurück zum Zitat Xu, Y., Weng, C., Hui, L., Liu, J., Yu, M., Su, D., & Yu, D. (2019, May). Joint training of complex ratio mask based beam former and acoustic model for noise robust ASR. In ICASSP 2019–2019 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 6745–6749). IEEE. Xu, Y., Weng, C., Hui, L., Liu, J., Yu, M., Su, D., & Yu, D. (2019, May). Joint training of complex ratio mask based beam former and acoustic model for noise robust ASR. In ICASSP 2019–2019 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 6745–6749). IEEE.
Zurück zum Zitat Yadav, I. C., & Pradhan, G. (2021). Pitch and noise normalized acoustic feature for children’s ASR. Digital Signal Processing, 109, 102922. Yadav, I. C., & Pradhan, G. (2021). Pitch and noise normalized acoustic feature for children’s ASR. Digital Signal Processing, 109, 102922.
Zurück zum Zitat Yalamanchili, B., Dungala, K., Mandapati, K., Pillodi, M., & Vanga, S. R. (2021). Survey on multimodal emotion recognition (MER) Systems. In Machine learning technologies and applications: Proceedings of ICACECS 2020 (pp. 319–326). Springer. Yalamanchili, B., Dungala, K., Mandapati, K., Pillodi, M., & Vanga, S. R. (2021). Survey on multimodal emotion recognition (MER) Systems. In Machine learning technologies and applications: Proceedings of ICACECS 2020 (pp. 319–326). Springer.
Zurück zum Zitat Yang, S., Lee, M., & Kim, H. (2021, January). Deep learning-based syllable recognition framework for Korean children. In 2021 international conference on information networking (ICOIN) (pp. 723–726). IEEE. Yang, S., Lee, M., & Kim, H. (2021, January). Deep learning-based syllable recognition framework for Korean children. In 2021 international conference on information networking (ICOIN) (pp. 723–726). IEEE.
Zurück zum Zitat Yoshioka, T., & Gales, M. J. (2015). Environmentally robust ASR front-end for deep neural network acoustic models. Computer Speech & Language, 31(1), 65–86. Yoshioka, T., & Gales, M. J. (2015). Environmentally robust ASR front-end for deep neural network acoustic models. Computer Speech & Language, 31(1), 65–86.
Zurück zum Zitat Zealouk, O., Satori, H., Laaidi, N., Hamidi, M., & Satori, K. (2020). Noise effect on Amazigh digits in speech recognition system. International Journal of Speech Technology, 23(4), 885–892. Zealouk, O., Satori, H., Laaidi, N., Hamidi, M., & Satori, K. (2020). Noise effect on Amazigh digits in speech recognition system. International Journal of Speech Technology, 23(4), 885–892.
Zurück zum Zitat Zhang, S., Do, C. T., Doddipatla, R., Loweimi, E., Bell, P., & Renals, S. (2021, June). Train your classifier first: Cascade Neural Networks Training from upper layers to lower layers. In ICASSP 2021–2021 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 2750–2754). IEEE. Zhang, S., Do, C. T., Doddipatla, R., Loweimi, E., Bell, P., & Renals, S. (2021, June). Train your classifier first: Cascade Neural Networks Training from upper layers to lower layers. In ICASSP 2021–2021 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 2750–2754). IEEE.
Zurück zum Zitat Zhang, X., Zou, X., Sun, M., Zheng, T. F., Jia, C., & Wang, Y. (2019). Noise robust speaker recognition based on adaptive frame weighting in GMM for i-vector extraction. IEEE Access, 7, 27874–27882. Zhang, X., Zou, X., Sun, M., Zheng, T. F., Jia, C., & Wang, Y. (2019). Noise robust speaker recognition based on adaptive frame weighting in GMM for i-vector extraction. IEEE Access, 7, 27874–27882.
Zurück zum Zitat Zhang, Z., Geiger, J., Pohjalainen, J., Mousa, A. E. D., Jin, W., & Schuller, B. (2018). Deep learning for environmentally robust speech recognition: An overview of recent developments. ACM Transactions on Intelligent Systems and Technology (TIST), 9(5), 1–28. Zhang, Z., Geiger, J., Pohjalainen, J., Mousa, A. E. D., Jin, W., & Schuller, B. (2018). Deep learning for environmentally robust speech recognition: An overview of recent developments. ACM Transactions on Intelligent Systems and Technology (TIST), 9(5), 1–28.
Zurück zum Zitat Zheng, N., Shi, Y., Kang, Y., & Meng, Q. (2021, June). A noise-robust signal processing strategy for cochlear implants using neural networks. In ICASSP 2021–2021 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 8343–8347). IEEE. Zheng, N., Shi, Y., Kang, Y., & Meng, Q. (2021, June). A noise-robust signal processing strategy for cochlear implants using neural networks. In ICASSP 2021–2021 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 8343–8347). IEEE.
Zurück zum Zitat Zhou, P., Yang, W., Chen, W., Wang, Y., & Jia, J. (2019, May). Modality attention for end-to-end audio-visual speech recognition. In ICASSP 2019–2019 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 6565–6569). IEEE Zhou, P., Yang, W., Chen, W., Wang, Y., & Jia, J. (2019, May). Modality attention for end-to-end audio-visual speech recognition. In ICASSP 2019–2019 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 6565–6569). IEEE
Zurück zum Zitat Zhu, Q. S., Zhou, L., Zhang, J., Liu, S. J., Hu, Y. C., & Dai, L. R. (2022). Robust Data2vec: Noise-robust speech representation learning for ASR by combining regression and improved contrastive learning. arXiv preprint arXiv:2210.15324. Zhu, Q. S., Zhou, L., Zhang, J., Liu, S. J., Hu, Y. C., & Dai, L. R. (2022). Robust Data2vec: Noise-robust speech representation learning for ASR by combining regression and improved contrastive learning. arXiv preprint arXiv:​2210.​15324.
Zurück zum Zitat Zylich, B., & Whitehill, J. (2020, May). Noise-robust key-phrase detectors for automated classroom feedback. In ICASSP 2020–2020 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 9215–9219). IEEE. Zylich, B., & Whitehill, J. (2020, May). Noise-robust key-phrase detectors for automated classroom feedback. In ICASSP 2020–2020 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 9215–9219). IEEE.
Metadata
Title: Noise robust automatic speech recognition: review and analysis
Authors: Mohit Dua, Akanksha, Shelza Dua
Publication date: 24.06.2023
Publisher: Springer US
Published in: International Journal of Speech Technology, Issue 2/2023
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI: https://doi.org/10.1007/s10772-023-10033-0
