
2024 | Original Paper | Book Chapter

Multi-class Classification of Voice Disorders Using Deep Transfer Learning

Authors: Mehtab Ur Rahman, Cem Direkoglu

Published in: Computing, Internet of Things and Data Analytics

Publisher: Springer Nature Switzerland


Abstract

Voice disorders are a widespread problem affecting people of all ages, and accurate diagnosis is crucial for effective treatment. With recent advances in artificial-intelligence-based audio and speech processing, research on the detection and classification of voice disorders has grown. However, existing work has focused mostly on binary (two-class) classification of voice disorders. Some researchers have also explored multi-class classification, but the reported results are not yet satisfactory. This paper proposes a framework for the multi-class classification of voice disorders using OpenL3 embeddings. A pre-trained OpenL3 model extracts high-level embedding features from the mel spectrogram. Different classifiers are then evaluated after feature selection based on neighbourhood component analysis (NCA): Random Forest (RF), Support Vector Machine (SVM) and K-Nearest Neighbors (KNN) are each used separately to classify the selected features. Evaluation and comparison are performed on a balanced subset of the Saarbruecken Voice Database (SVD). Without any speech-enhancement preprocessing, our best model, OpenL3-KNN, improves on the accuracy of existing work by 4.9% and on its F1 score by 8.7%.
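The back end described above, NCA-based feature selection followed by a classifier and scored by accuracy and F1, can be sketched in plain Python. This is a minimal illustration under stated assumptions and not the authors' implementation. Each row of `X` is assumed to be one recording's OpenL3 embedding that has already been extracted and mean-pooled over time. Extraction could be done, for example, with the `openl3` package's `get_audio_embedding`, whose `input_repr` parameter selects a mel-spectrogram front end. Several choices here are assumptions: the feature-weighting form of NCA, the hyperparameters `sigma`, `lam`, `lr` and `iters`, the relative-threshold rule in `select_features`, and the use of macro-averaged F1. All function names are hypothetical, and the toy data is synthetic rather than drawn from the Saarbruecken Voice Database. Only the KNN branch is shown. RF or SVM would be applied at the same point.

```python
import math
from collections import Counter


def nca_feature_weights(X, y, sigma=1.0, lam=None, lr=0.2, iters=300):
    """Feature-weighting NCA (sketch): gradient ascent on the regularised
    objective F(w) = mean_i p_i - lam * sum_r w_r**2, where p_i is the
    probability that sample i picks a same-class reference point under a
    softmax over weighted L1 distances D_ij = sum_r w_r**2 * |x_ir - x_jr|.
    """
    n, d = len(X), len(X[0])
    lam = 1.0 / n if lam is None else lam  # hypothetical default
    w = [1.0] * d
    # Per-feature absolute differences |x_ir - x_jr|, computed once.
    diff = [[[abs(a - b) for a, b in zip(X[i], X[j])] for j in range(n)] for i in range(n)]
    for _ in range(iters):
        w2 = [wr * wr for wr in w]
        data_term = [0.0] * d
        for i in range(n):
            dist = [math.inf if j == i else sum(w2[r] * diff[i][j][r] for r in range(d))
                    for j in range(n)]
            m = min(dist)
            # Shift by the minimum distance for numerical stability; exp(-inf) == 0,
            # so a sample never picks itself.
            k = [math.exp(-(dj - m) / sigma) for dj in dist]
            z = sum(k)
            p = [kj / z for kj in k]
            same = [j for j in range(n) if j != i and y[j] == y[i]]
            p_i = sum(p[j] for j in same)
            for r in range(d):
                all_avg = sum(p[j] * diff[i][j][r] for j in range(n))
                same_avg = sum(p[j] * diff[i][j][r] for j in same)
                data_term[r] += p_i * all_avg - same_avg
        # dF/dw_r = 2 * w_r * (data_term_r / (sigma * n) - lam); take an ascent step.
        w = [wr + lr * 2.0 * wr * (data_term[r] / (sigma * n) - lam) for r, wr in enumerate(w)]
    return [abs(wr) for wr in w]


def select_features(weights, rel_tol=0.2):
    """Keep features whose weight is at least rel_tol times the largest (hypothetical rule)."""
    top = max(weights)
    return [r for r, wr in enumerate(weights) if wr >= rel_tol * top]


def knn_predict(X_train, y_train, X_test, k=3):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    preds = []
    for x in X_test:
        order = sorted(range(len(X_train)), key=lambda j: math.dist(x, X_train[j]))
        votes = Counter(y_train[j] for j in order[:k])
        preds.append(votes.most_common(1)[0][0])
    return preds


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        scores.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Synthetic stand-in "embeddings": feature 0 separates three classes, feature 1 is noise.
    X = [[3.0 * c + 0.1 * i, float((7 * i + 3 * c) % 5)] for c in range(3) for i in range(4)]
    y = [c for c in range(3) for _ in range(4)]
    w = nca_feature_weights(X, y)
    keep = select_features(w)
    X_sel = [[row[r] for r in keep] for row in X]
    X_new = [[0.15, 3.0], [3.15, 1.0], [6.15, 4.0]]
    preds = knn_predict(X_sel, y, [[row[r] for r in keep] for row in X_new])
    print("weights:", w, "kept:", keep, "predictions:", preds)
```

Each iteration costs O(n²d), which is fine for a toy example but slow for 512- or 6144-dimensional OpenL3 embeddings. For those sizes, a vectorised implementation is the practical choice. One example is MATLAB's `fscnca`, which fits this kind of regularised NCA feature weighting.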


Print ISBN: 978-3-031-53716-5
Electronic ISBN: 978-3-031-53717-2
Copyright year: 2024
DOI: https://doi.org/10.1007/978-3-031-53717-2_25
