2013 | OriginalPaper | Chapter

5. Established Methods

Authors : Ladan Baghai-Ravary, Steve W. Beet

Published in: Automatic Speech Signal Analysis for Clinical Diagnosis and Assessment of Speech Disorders

Publisher: Springer New York


Abstract

This chapter discusses both pre-processing (feature extraction) and pattern classification techniques. Traditionally, speech disorders have been analysed using specialised parameters such as harmonic-to-noise ratio, jitter, and shimmer, which were devised with expert input from speech and language therapists and other professionals. These parameters are typically calculated using widely available software packages, but trained personnel are still needed to collect and prepare the recordings and to interpret the results. More recently, researchers have also investigated many of the features used in speech and speaker recognition. The ubiquitous mel-frequency cepstral coefficients are often used, but so are numerous less common features, such as formant frequencies, modulation spectra, chaos-theory parameters, and prosodic and phonological features. Each of these has had some success, but the most successful systems have generally combined multiple features and/or multiple classification algorithms. Numerous methods have been devised for discriminating between disordered and normal speech, and sometimes between different forms of speech disorder. They have typically been based on neural networks, Markov models, support vector machines, and other classifiers (both linear and non-linear), although Gaussian mixture models are probably the most widely used, robust, and successful so far.
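As a rough illustration of two of the traditional parameters named above, the sketch below computes "local" jitter and shimmer in the commonly used sense: the mean absolute difference between consecutive glottal cycles, divided by the mean value. It assumes the cycle periods and peak amplitudes have already been extracted from a sustained vowel; the function names and sample values are hypothetical and are not taken from the chapter, which may define these measures differently.

```python
from statistics import mean


def local_perturbation(values):
    """Mean absolute cycle-to-cycle difference, normalised by the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return mean(diffs) / mean(values)


def jitter_local(periods_s):
    """Cycle-to-cycle variation of the pitch period (dimensionless ratio)."""
    return local_perturbation(periods_s)


def shimmer_local(peak_amplitudes):
    """Cycle-to-cycle variation of the peak amplitude (dimensionless ratio)."""
    return local_perturbation(peak_amplitudes)


# Hypothetical measurements from four consecutive glottal cycles.
periods = [0.0100, 0.0102, 0.0099, 0.0101]  # seconds
amplitudes = [0.80, 0.78, 0.82, 0.79]       # arbitrary linear units

print(f"jitter:  {100 * jitter_local(periods):.2f} %")
print(f"shimmer: {100 * shimmer_local(amplitudes):.2f} %")
```

A perfectly periodic signal gives zero for both measures; larger values indicate more cycle-to-cycle irregularity. In practice the hard part is the step assumed here, namely reliable cycle detection in disordered voices.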



Title: Established Methods
Authors: Ladan Baghai-Ravary, Steve W. Beet
Publisher: Springer New York
Book: Automatic Speech Signal Analysis for Clinical Diagnosis and Assessment of Speech Disorders
Print ISBN: 978-1-4614-4573-9

Electronic ISBN: 978-1-4614-4574-6

Copyright Year: 2013
DOI: https://doi.org/10.1007/978-1-4614-4574-6_5
