
Publication date: 30-10-2022

Dysarthric speech detection from telephone quality speech using epoch-based pitch perturbation features

Authors: Y. Madhu Keerthana, K. Sreenivasa Rao, Pabitra Mitra

Published in: International Journal of Speech Technology | Issue 4/2022

Abstract

Dysarthria is a motor speech impairment that affects verbal articulation and coordination. Detecting dysarthria is an essential first step toward early diagnosis and treatment. In this paper, we attempt dysarthric speech detection from telephone quality speech using pitch perturbation (PP) measures computed with the recently introduced continuous wavelet transform (CWT)-based epoch extraction approach. A key advantage of this approach is its high robustness to telephone channel degradations. Six PP measures were computed from the extracted epochs. For comparison, the PP measures were also derived using two well-known epoch extraction methods: zero-frequency filtering (ZFF) and the dynamic programming phase slope algorithm (DYPSA). The experiments were carried out on the TORGO dysarthric speech database, which contains speech from 7 healthy speakers and 8 dysarthric speakers. The G.191 software tools were used to convert clean speech to telephone speech. The results show that the PP measures computed with the CWT-based approach discriminate between dysarthric and healthy speakers in the telephone environment better than those extracted with the other two epoch extraction methods.
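
To illustrate what computing PP measures from extracted epochs can look like, the following is a minimal sketch in Python. The abstract does not name the six measures used in the paper, so the four shown here (absolute jitter, local jitter, RAP, and PPQ5) are commonly used jitter definitions chosen as an assumption, not the authors' feature set. The input is assumed to be a list of epoch times in seconds from a single continuous voiced segment, produced by any epoch extractor (CWT-based, ZFF, or DYPSA); the function names are hypothetical.

```python
from statistics import fmean


def pitch_periods(epochs):
    """Successive pitch periods (s) from epoch times (s) of one voiced segment."""
    ordered = sorted(epochs)
    return [b - a for a, b in zip(ordered, ordered[1:])]


def _mean_dev_from_local_average(periods, width):
    """Mean absolute deviation of each period from its centred moving average."""
    half = width // 2
    devs = []
    for i in range(half, len(periods) - half):
        window = periods[i - half:i + half + 1]
        devs.append(abs(periods[i] - fmean(window)))
    return fmean(devs)


def perturbation_measures(epochs):
    """Illustrative jitter-type measures; NOT necessarily the paper's six PP measures."""
    periods = pitch_periods(epochs)
    if len(periods) < 5:
        raise ValueError("need at least 6 epochs (5 pitch periods)")
    mean_period = fmean(periods)
    # Absolute jitter: mean absolute difference between consecutive periods.
    jitter_abs = fmean([abs(b - a) for a, b in zip(periods, periods[1:])])
    return {
        "jitter_abs_s": jitter_abs,
        # Local jitter: absolute jitter relative to the mean period.
        "jitter_local": jitter_abs / mean_period,
        # RAP: deviation from a 3-period average, relative to the mean period.
        "rap": _mean_dev_from_local_average(periods, 3) / mean_period,
        # PPQ5: deviation from a 5-period average, relative to the mean period.
        "ppq5": _mean_dev_from_local_average(periods, 5) / mean_period,
    }
```

For a perfectly periodic epoch train all four values are zero; irregular glottal cycles, as might be expected in pathological voices, raise them. In practice the measures would be computed per voiced segment so that gaps across unvoiced regions are not mistaken for long pitch periods.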

Title: Dysarthric speech detection from telephone quality speech using epoch-based pitch perturbation features
Authors: Y. Madhu Keerthana, K. Sreenivasa Rao, Pabitra Mitra
Publication date: 30-10-2022
Publisher: Springer US
Published in: International Journal of Speech Technology / Issue 4/2022
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI: https://doi.org/10.1007/s10772-022-10013-w
