Published: 16-06-2024

Modeling Source and System Features Through Multi-channel Convolutional Neural Network for Improving Intelligibility Assessment of Dysarthric Speech

Authors: Md. Talib Ahmad, Gayadhar Pradhan, Jyoti Prakash Singh

Published in: Circuits, Systems, and Signal Processing | Issue 10/2024


Abstract

This paper investigates two kinds of information in short-time Fourier transform (STFT) magnitude spectra for the assessment of dysarthria. The first is the spectral envelope, shaped by the resonance structure of the vocal tract (system features). The second is the fine-level structure due to the excitation source (source features). A single-channel convolutional neural network (CNN) applied to time-frequency representations such as the STFT spectrogram (STFT-SPEC) or the Mel-spectrogram (MEL-SPEC) is not guaranteed to capture source and system information simultaneously, because it filters the input with convolution filters of one fixed size. Building on this observation, the study first examines how the convolution filter size affects a CNN-based automated dysarthria assessment system. It then introduces a multi-channel CNN approach to capture both the resonance structure and the fine-level features. In this approach, the STFT-SPEC is decomposed with a one-level discrete wavelet transform (DWT) to separate the slowly varying spectral structure from the fine-level features. The four resulting sets of coefficients (the approximation and the horizontal, vertical, and diagonal details) serve as the inputs to the multi-channel CNN, which uses convolution filters of different sizes to capture source and system features. Experiments on the UA-Speech corpus validate the proposed approach. For dysarthria assessment in speaker-independent and text-independent mode, the multi-channel CNN achieves an accuracy of 60.86% and an F1 score of 48.52%. A single-channel CNN scores 46.45% and 40.97% using STFT-SPEC, 48.86% and 38.20% using MEL-SPEC, and 52.40% and 42.84% using MEL-SPEC appended with delta and delta-delta coefficients.
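The decomposition and multi-channel filtering described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The following are all hypothetical choices made for illustration:

- the Haar wavelet;
- the 25 ms / 10 ms Hamming-windowed framing;
- the one-branch-per-sub-band layout;
- the kernel sizes;
- every name in the code (`stft_magnitude`, `haar_dwt2`, `conv2d_valid`, `multichannel_features`, `KERNEL_SIZES`, `BAND_NAMES`).

The filters are random and untrained. The sketch therefore only shows how an STFT magnitude spectrogram can be split into four half-size sub-bands, and how each sub-band can be convolved with a differently sized filter.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

BAND_NAMES = ("approx", "detail_freq", "detail_time", "detail_diag")
# Hypothetical filter sizes: a larger kernel for the smooth approximation band,
# smaller kernels for the three fine-detail bands (not the paper's settings).
KERNEL_SIZES = {"approx": (7, 7), "detail_freq": (3, 3),
                "detail_time": (3, 3), "detail_diag": (3, 3)}


def stft_magnitude(x, frame_len=400, hop=160):
    """Magnitude STFT spectrogram, shape (frame_len // 2 + 1, n_frames).

    400/160 samples = 25 ms / 10 ms framing at 16 kHz (an assumption).
    """
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] for i in range(n_frames)])
    frames = frames * np.hamming(frame_len)  # taper each frame
    return np.abs(np.fft.rfft(frames, axis=1)).T  # rows: frequency, cols: time


def haar_dwt2(spec):
    """One-level orthonormal 2-D Haar DWT; returns four half-size sub-bands."""
    s = spec[: spec.shape[0] // 2 * 2, : spec.shape[1] // 2 * 2]  # crop to even size
    a, b = s[0::2, 0::2], s[0::2, 1::2]  # each 2x2 block:  a b
    c, d = s[1::2, 0::2], s[1::2, 1::2]  #                  c d
    approx = (a + b + c + d) / 2       # scaled local average: slowly varying structure
    detail_freq = (a + b - c - d) / 2  # changes between adjacent frequency bins
    detail_time = (a - b + c - d) / 2  # changes between adjacent frames
    detail_diag = (a - b - c + d) / 2  # diagonal detail
    return approx, detail_freq, detail_time, detail_diag


def conv2d_valid(x, k):
    """'Valid' 2-D cross-correlation, the operation a CNN conv layer computes."""
    windows = sliding_window_view(x, k.shape)  # (H-kh+1, W-kw+1, kh, kw)
    return np.einsum("ijkl,kl->ij", windows, k)


def multichannel_features(bands, n_filters=4, seed=0):
    """One untrained conv + ReLU + global-average-pool branch per sub-band."""
    rng = np.random.default_rng(seed)
    feats = []
    for name, band in bands.items():
        kh, kw = KERNEL_SIZES[name]  # filter size chosen per channel
        for _ in range(n_filters):
            k = rng.standard_normal((kh, kw)) / (kh * kw)  # random, untrained filter
            fmap = np.maximum(conv2d_valid(band, k), 0.0)   # ReLU
            feats.append(fmap.mean())                        # global average pooling
    return np.array(feats)  # would feed a classifier in a trained model


if __name__ == "__main__":
    fs = 16000
    t = np.arange(fs) / fs  # 1 s of audio
    x = np.sin(2 * np.pi * 200 * t) + 0.1 * np.random.default_rng(1).standard_normal(fs)
    spec = stft_magnitude(x)
    bands = dict(zip(BAND_NAMES, haar_dwt2(spec)))
    stacked = np.stack(list(bands.values()))  # four input channels
    print(spec.shape, stacked.shape, multichannel_features(bands).shape)
    # → (201, 98) (4, 100, 49) (16,)
```

The Haar transform used here is orthonormal. The four sub-bands together therefore preserve the energy of the (cropped) spectrogram, so the split separates envelope-like and fine-detail information without discarding any. In a trained model, each branch would be a learned convolution stack, and the pooled features would feed a classifier that predicts the intelligibility level.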

Metadata
Title: Modeling Source and System Features Through Multi-channel Convolutional Neural Network for Improving Intelligibility Assessment of Dysarthric Speech
Authors: Md. Talib Ahmad, Gayadhar Pradhan, Jyoti Prakash Singh
Publication date: 16-06-2024
Publisher: Springer US
Published in: Circuits, Systems, and Signal Processing / Issue 10/2024
Print ISSN: 0278-081X
Electronic ISSN: 1531-5878
DOI: https://doi.org/10.1007/s00034-024-02739-6