2013 | OriginalPaper | Chapter

5. Established Methods

Authors: Ladan Baghai-Ravary, Steve W. Beet

Published in: Automatic Speech Signal Analysis for Clinical Diagnosis and Assessment of Speech Disorders

Publisher: Springer New York

Abstract

This chapter discusses both pre-processing (feature extraction) and pattern classification techniques. Traditionally, specialised parameters have been used for the analysis of speech disorders: harmonic-to-noise ratio, jitter, shimmer, and others. These have been devised using expert opinions from speech and language therapists and other professionals. They are typically calculated using widely available software packages, but still require trained personnel to collect and prepare the recordings, as well as to interpret the resulting parameters. More recently, researchers have also investigated many of the parameters or features used in speech and speaker recognition. Features such as the ubiquitous mel-frequency cepstral coefficients are often used, but so are numerous less common ones, such as formant frequencies, modulation spectra, chaos-theory parameters, and prosodic and phonological features. Each of these has had some success, but the most successful systems have generally used a combination of multiple features and/or multiple classification algorithms. Numerous methods for discriminating between disordered and normal speech, and sometimes between different forms of speech disorder, have been devised. They have typically been based on neural networks, Markov models, support vector machines, and other classifiers (both linear and non-linear), although Gaussian mixture models are probably the most widely used, robust, and successful so far.
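
As a rough illustration of the traditional perturbation parameters named in the abstract, the sketch below computes jitter and shimmer in their common "local" form: the mean absolute difference between consecutive cycles, divided by the mean value. This definition, the function name `local_perturbation`, and the sample values are assumptions made for illustration; the chapter and the software packages it refers to may use other variants.

```python
def local_perturbation(values):
    """Mean absolute cycle-to-cycle difference, relative to the mean value.

    Applied to pitch periods this gives a 'local jitter' ratio; applied to
    per-cycle peak amplitudes it gives a 'local shimmer' ratio.
    (Assumed definition, not taken from the chapter.)
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    # Absolute difference between each cycle and the next one
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return mean_diff / mean_value


# Hypothetical per-cycle measurements from a sustained vowel
periods_ms = [10.0, 10.2, 9.9, 10.1, 10.0]
peak_amplitudes = [0.80, 0.78, 0.82, 0.79, 0.81]

jitter = local_perturbation(periods_ms)
shimmer = local_perturbation(peak_amplitudes)
print(f"local jitter:  {100 * jitter:.2f} %")
print(f"local shimmer: {100 * shimmer:.2f} %")
```

Both measures depend on the pitch periods and peak amplitudes having been extracted reliably beforehand, which is one reason the recordings still need careful collection and preparation.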

Metadata

Title: Established Methods
Authors: Ladan Baghai-Ravary, Steve W. Beet
Copyright Year: 2013
Publisher: Springer New York
DOI: https://doi.org/10.1007/978-1-4614-4574-6_5