
2024 | OriginalPaper | Book chapter

Multi-class Classification of Voice Disorders Using Deep Transfer Learning

Written by: Mehtab Ur Rahman, Cem Direkoglu

Published in: Computing, Internet of Things and Data Analytics

Publisher: Springer Nature Switzerland


Abstract

Voice disorders are widespread and affect people of all ages, and accurate diagnosis is crucial for effective treatment. Recent advances in artificial-intelligence-based audio and speech processing have spurred research on the detection and classification of voice disorders. However, existing work has focused mostly on binary (two-class) classification. Some researchers have explored multi-class classification, but their results are not promising. This paper proposes a framework for the multi-class classification of voice disorders using OpenL3 embeddings. A pre-trained OpenL3 model extracts high-level embedding features from the mel spectrogram. Features are then selected with neighbourhood component analysis (NCA), and three classifiers, Random Forest (RF), Support Vector Machine (SVM) and K-Nearest Neighbors (KNN), are each evaluated separately on the selected features. Evaluation and comparison are performed on a balanced subset of the Saarbruecken Voice Database (SVD). Without any speech-enhancement preprocessing, our best model, OpenL3-KNN, improves on existing work by 4.9% in accuracy and by 8.7% in F1 score.
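The three stages named in the abstract (embedding extraction, feature selection, classification) can be illustrated in miniature. The following is a minimal, standard-library-only Python sketch under loudly stated assumptions: synthetic Gaussian vectors stand in for OpenL3 embeddings, a Fisher-style variance ratio stands in for NCA-based feature selection, and the class labels, dimensions and k = 5 are all hypothetical. It is not the authors' implementation and reproduces none of their results.

```python
import math
import random
from collections import Counter

# Hypothetical class labels; the abstract does not name the disorder classes.
CLASSES = ["class_0", "class_1", "class_2", "class_3"]
N_INFORMATIVE, N_NOISE = 4, 12  # assumed sizes for the synthetic stand-in


def make_synthetic_embeddings(rng, n_per_class):
    """Stand-in for OpenL3 embeddings: the first dims separate classes, the rest are noise."""
    X, y = [], []
    for c, label in enumerate(CLASSES):
        for _ in range(n_per_class):
            informative = [rng.gauss(4.0 if d == c else 0.0, 1.0) for d in range(N_INFORMATIVE)]
            noise = [rng.gauss(0.0, 3.0) for _ in range(N_NOISE)]
            X.append(informative + noise)
            y.append(label)
    return X, y


def fisher_scores(X, y):
    """Between-class over within-class variation per feature (a simple stand-in for NCA weights)."""
    scores = []
    for d in range(len(X[0])):
        column = [row[d] for row in X]
        overall_mean = sum(column) / len(column)
        between = within = 0.0
        for label in sorted(set(y)):
            values = [v for v, lab in zip(column, y) if lab == label]
            mean = sum(values) / len(values)
            between += len(values) * (mean - overall_mean) ** 2
            within += sum((v - mean) ** 2 for v in values)
        scores.append(between / within)
    return scores


def select_top_features(scores, m):
    """Return the indices of the m highest-scoring features, in ascending order."""
    ranked = sorted(range(len(scores)), key=lambda d: scores[d], reverse=True)
    return sorted(ranked[:m])


def project(X, selected):
    """Keep only the selected feature columns of every row."""
    return [[row[d] for d in selected] for row in X]


def knn_predict(train_X, train_y, x, k=5):
    """Majority vote among the k training vectors closest to x (Euclidean distance)."""
    nearest = sorted(zip((math.dist(x, t) for t in train_X), train_y))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    f1s = []
    for label in CLASSES:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        f1s.append(2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0)
    return sum(f1s) / len(f1s)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
train_X, train_y = make_synthetic_embeddings(rng, n_per_class=24)  # balanced classes
test_X, test_y = make_synthetic_embeddings(rng, n_per_class=8)

# Select features on the training split only, then apply the same indices to the test split.
selected = select_top_features(fisher_scores(train_X, train_y), m=N_INFORMATIVE)
train_sel, test_sel = project(train_X, selected), project(test_X, selected)

pred = [knn_predict(train_sel, train_y, x) for x in test_sel]
accuracy = sum(p == t for p, t in zip(pred, test_y)) / len(test_y)
print(f"selected features: {selected}")
print(f"accuracy: {accuracy:.3f}, macro F1: {macro_f1(test_y, pred):.3f}")
```

Scoring features on the training split alone keeps the held-out samples out of the selection step, which matters whenever accuracy and F1 are reported on that split. In a real setting the synthetic generator would be replaced by embeddings computed from recordings, and the variance ratio by an actual NCA-based selector.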


Metadata
Copyright year: 2024
DOI: https://doi.org/10.1007/978-3-031-53717-2_25
