Published in: International Journal of Speech Technology 4/2022

30-10-2022

Dysarthric speech detection from telephone quality speech using epoch-based pitch perturbation features

Authors: Y. Madhu Keerthana, K. Sreenivasa Rao, Pabitra Mitra

Abstract

Dysarthria is a motor speech impairment that affects verbal articulation and coordination. Detecting dysarthria is an essential first step toward early diagnosis and treatment. In this paper, we attempt dysarthric speech detection from telephone-quality speech using pitch perturbation (PP) measures computed with the recently introduced continuous wavelet transform (CWT)-based epoch extraction approach, whose main advantage is its high robustness to telephone channel degradations. Six PP measures were computed from the extracted epochs. For comparison, the PP measures were also derived using two well-known epoch extraction methods: zero-frequency filtering (ZFF) and the dynamic programming phase slope algorithm (DYPSA). The experiments were carried out on the TORGO dysarthric speech database, which contains speech from 7 healthy speakers and 8 dysarthric speakers. The G.191 software tools were used to convert clean speech to telephone speech. The results show that the PP measures computed with the CWT-based approach discriminate between dysarthric and healthy speakers in the telephone environment better than those extracted with the other two epoch extraction methods.
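To make the idea of epoch-based pitch perturbation concrete, the following is a minimal Python sketch, not the authors' implementation. The abstract does not name the six PP measures, so the code assumes three textbook jitter-style definitions (absolute jitter, local jitter, and a k-point period perturbation quotient). The function names and the epoch positions in the usage lines are hypothetical, and the epoch extraction step itself (CWT, ZFF, or DYPSA) is assumed to have been done elsewhere.

```python
from statistics import mean


def pitch_periods(epochs):
    """Pitch periods as differences between successive epoch instants.

    `epochs` is an increasing sequence of epoch locations (samples or
    seconds); the periods come out in the same unit.
    """
    return [b - a for a, b in zip(epochs, epochs[1:])]


def jitter_absolute(periods):
    """Mean absolute difference between consecutive pitch periods."""
    return mean(abs(b - a) for a, b in zip(periods, periods[1:]))


def jitter_local(periods):
    """Absolute jitter normalized by the mean pitch period (a ratio)."""
    return jitter_absolute(periods) / mean(periods)


def jitter_ppq(periods, k=3):
    """k-point period perturbation quotient (k odd).

    Each period is compared with the mean of the k periods centred on it;
    k=3 is commonly called RAP and k=5 PPQ5. These are standard textbook
    definitions, assumed here for illustration only.
    """
    half = k // 2
    deviations = [
        abs(periods[i] - mean(periods[i - half:i + half + 1]))
        for i in range(half, len(periods) - half)
    ]
    return mean(deviations) / mean(periods)


# Hypothetical epoch locations in samples at 8 kHz (telephone bandwidth).
epochs = [0, 80, 162, 242, 324]
periods = pitch_periods(epochs)
print(jitter_absolute(periods), jitter_local(periods), jitter_ppq(periods))
```

Because every measure is derived from the differences between neighbouring epochs, any error in epoch location propagates directly into the features, which is why the robustness of the epoch extractor to channel degradation matters for this task.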

Metadata
Title: Dysarthric speech detection from telephone quality speech using epoch-based pitch perturbation features
Authors: Y. Madhu Keerthana, K. Sreenivasa Rao, Pabitra Mitra
Publication date: 30-10-2022
Publisher: Springer US
Published in: International Journal of Speech Technology, Issue 4/2022
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI: https://doi.org/10.1007/s10772-022-10013-w