Published in: International Journal of Speech Technology 2/2023

21.02.2023

An approach for speech enhancement with dysarthric speech recognition using optimization based machine learning frameworks

Authors: Bhuvaneshwari Jolad, Rajashri Khanai

Abstract

Dysarthric speech is noisy, source-distorted speech, and reliable speech enhancement is required to achieve acceptable communication quality under non-stationary noise. Owing to the irregular speech rate of persons with dysarthria, understanding their speech is a difficult task, and generic recognition systems perform poorly on it. Hence, this paper proposes a Fractional Competitive Crow Search Algorithm-based Speech Enhancement Generative Adversarial Network (FCCSA-SEGAN) for enhancing the speech signal. Initially, in the pre-processing stage, noise is removed from the speech signal using the spectral subtraction method. The pre-processed signal is then fed to the speech enhancement stage, where signal quality is improved by a Speech Enhancement Generative Adversarial Network (SEGAN) trained with the developed FCCSA. The FCCSA is obtained by incorporating Fractional Calculus (FC) into the Competitive Crow Search Algorithm (CCSA), which in turn is a hybridization of the Crow Search Algorithm (CSA) and the Competitive Swarm Optimizer (CSO). After that, features such as the Multiple Kernel Weighted Mel Frequency Cepstral Coefficient (MKMFCC), Linear Prediction Cepstral Coefficient (LPCC), spectral flux, spectral crest, spectral centroid, and pitch chroma are extracted. Moreover, to increase the number of signal samples, noise is added to the original signal in a data augmentation phase. Finally, speech recognition is performed using a Competitive Crow Search Algorithm-based Hierarchical Attention Network (CCSA-based HAN). The performance of the proposed method is evaluated on the UA speech database, where it attains accuracy, sensitivity, and specificity of 0.930, 0.933, and 0.934. The proposed speech enhancement approach achieves a higher Perceptual Evaluation of Speech Quality (PESQ) of 3.14 and a lower Root Mean Square Error (RMSE) of 0.022.


References

Anita, J. S., & Abinaya, J. S. (2019). Impact of supervised classifier on speech emotion recognition. Multimedia Research, 2(1), 9–16.
Arul, V., Sivakumar, V. G., Marimuthu, R., & Chakraborty, B. (2019). An approach for speech enhancement using deep convolutional neural network. Multimedia Research, 2(1), 37–44.
Askarzadeh, A. (2016). A novel metaheuristic method for solving constrained engineering optimization problems: Crow search algorithm. Computers and Structures, 169, 1–12.
Bhaladhare, P. R., & Jinwala, D. C. (2014). A clustering approach for the l-diversity model in privacy-preserving data mining using fractional calculus-bacterial. Advances in Computer Engineering.
Cheng, R., & Jin, Y. (2014). A competitive swarm optimizer for large scale optimization. IEEE Transactions on Cybernetics, 45(2), 191–204.
Dash, T. K., & Solanki, S. S. (2020). Speech intelligibility based enhancement system using modified deep neural network and adaptive multi-band spectral subtraction. Wireless Personal Communications, 111(2), 1073–1087.
Enderby, P. (2013). Disorders of communication: Dysarthria. Handbook of Clinical Neurology, 110, 273–281.
Faragallah, O. S. (2018). Robust noise MKMFCC–SVM automatic speaker identification. International Journal of Speech Technology, 21(2), 185–192.
Fritsch, J., & Magimai-Doss, M. (2021). Utterance verification-based dysarthric speech intelligibility assessment using phonetic posterior features. IEEE Signal Processing Letters, 28, 224–228.
Garg, A., & Sahu, O. P. (2015). Cuckoo search based optimal mask generation for noise suppression and enhancement of speech signal. Journal of King Saud University-Computer and Information Sciences, 27(3), 269–277.
Gurugubelli, K., & Vuppala, A. K. (2020). Analytic phase features for dysarthric speech detection and intelligibility assessment. Speech Communication, 121, 1–15.
Haridas, A. V., Marimuthu, R., & Chakraborty, B. (2018). A novel approach to improve the speech intelligibility using fractional delta-amplitude modulation spectrogram. Cybernetics and Systems, 49(7–8), 421–451.
Hasegawa-Johnson, M., Gunderson, J., Perlman, A., & Huang, T. (2006). HMM-based and SVM-based recognition of the speech of talkers with spastic dysarthria. In IEEE international conference on acoustics speech and signal processing proceedings (Vol. 3).
He, Q., Bao, F., & Bao, C. (2016). Multiplicative update of auto-regressive gains for codebook-based speech enhancement. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 25(3), 457–468.
Hu, Y., & Loizou, P. C. (2007). Evaluation of objective quality measures for speech enhancement. IEEE Transactions on Audio, Speech, and Language Processing, 16(1), 229–238.
Hu, Y., & Loizou, P. C. (2017). NOIZEUS: A noisy speech corpus for evaluation of speech enhancement algorithms. Speech Communication, 49, 588–601.
Jain, U., Nathani, K., Ruban, N., Raj, A. N. J., Zhuang, Z., & Mahesh, V. G. (2018). Cubic SVM classifier based feature extraction and emotion detection from speech signals. In IEEE International Conference on Sensor Networks and Signal Processing (SNSP) (pp. 386–391).
KhaleelurRahiman, P. F., Jayanthi, V. S., & Jayanthi, A. N. (2021). Speech enhancement method using deep learning approach for hearing-impaired listeners. Health Informatics Journal, 27(1), 1460458219893850.
Loizou, P. (2007). Subjective comparison and evaluation of speech enhancement algorithms. Speech Communication, 49(7–8), 588–601.
Martin, R. (2005). Speech enhancement based on minimum mean-square error estimation and supergaussian priors. IEEE Transactions on Speech and Audio Processing, 13(5), 845–856.
Narendra, N. P., & Alku, P. (2021). Automatic assessment of intelligibility in speakers with dysarthria from coded telephone speech using glottal features. Computer Speech & Language, 65, 101117.
Pascual, S., Bonafonte, A., & Serra, J. (2017). SEGAN: Speech enhancement generative adversarial network. arXiv preprint arXiv:1703.09452.
Polur, P. D., & Miller, G. E. (2006). Investigation of an HMM/ANN hybrid structure in pattern recognition application using cepstral analysis of dysarthric (distorted) speech signals. Medical Engineering & Physics, 28(8), 741–748.
Shahamiri, S. R. (2021). Speech vision: An end-to-end deep learning-based dysarthric automatic speech recognition system. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 29, 852–861.
Sidi Yakoub, M., Selouani, S. A., Zaidi, B. F., & Bouchair, A. (2020). Improving dysarthric speech recognition using empirical mode decomposition and convolutional neural network. EURASIP Journal on Audio, Speech, and Music Processing, 1, 1–7.
Takashima, Y., Takashima, R., Takiguchi, T., & Ariki, Y. (2020). Dysarthric speech recognition based on deep metric learning. In Interspeech (pp. 4796–4800).
Trinh, V. A., & Braun, S. (2021). Unsupervised speech enhancement with speech recognition embedding and disentanglement losses. arXiv:2111.08678 [eess.AS].
Wang, Y., Han, J., Zhang, T., & Qing, D. (2021). Speech enhancement from fused features based on deep neural network and gated recurrent unit network. EURASIP Journal on Advances in Signal Processing, 1, 1–19.
Welker, S., Richter, J., & Gerkmann, T. (2022). Speech enhancement with score-based generative models in the complex STFT domain. arXiv:2203.17004 [eess.AS].
Woszczyk, D., Petridis, S., & Millard, D. (2020). Domain adversarial neural networks for dysarthric speech recognition. arXiv preprint arXiv:2010.03623.
Xiong, F., Barker, J., & Christensen, H. (2018). Deep learning of articulatory-based representations and applications for improving dysarthric speech recognition. In 13th ITG-symposium on speech communication (pp. 1–5).
Yang, Z., Yang, D., Dyer, C., He, X., Smola, A., & Hovy, E. (2016). Hierarchical attention networks for document classification. In Proceedings of the 2016 conference of the North American chapter of the association for computational linguistics: Human language technologies (pp. 1480–1489).
Yu, J., Xie, X., Liu, S., Hu, S., Lam, M. W., Wu, X., Wong, K. H., Liu, X., & Meng, H. (2018). Development of the CUHK dysarthric speech recognition system for the UA speech corpus. In Interspeech (pp. 2938–2942).
Yue, Z., Christensen, H., & Barker, J. (2020). Autoencoder bottleneck features with multi-task optimisation for improved continuous dysarthric speech recognition. In Proceedings of the annual conference of the international speech communication association, international speech communication association (ISCA).
Metadata
Title
An approach for speech enhancement with dysarthric speech recognition using optimization based machine learning frameworks
Authors
Bhuvaneshwari Jolad
Rajashri Khanai
Publication date
21.02.2023
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 2/2023
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-023-10019-y
