2013 | OriginalPaper | Chapter

4. Technology and Implementation

Authors: Ladan Baghai-Ravary, Steve W. Beet

Published in: Automatic Speech Signal Analysis for Clinical Diagnosis and Assessment of Speech Disorders

Publisher: Springer New York


Abstract

The aim of this chapter is to highlight the relevant technical factors and limitations affecting the collection and interpretation of speech signals. We concentrate on the typical corruption or distortion of the speech signal encountered in the real world and, where possible, indicate how important these effects can be. Transmission and encoding of speech signals in mobile phone networks and over the internet are almost invariably lossy, and this has an acute effect on the accuracy of speech recognition systems. Published research has also shown a comparable effect on the accuracy of dysphonia/dysarthria detection. The relationship between specific aspects of the data collection process and the validity of assessments of new techniques is discussed. The current absence of a realistic database of remotely collected speech samples is highlighted, and adherence to standardised methods and datasets is shown to be crucial to the evaluation of new algorithms. Methods for combining multiple features into a single result are frequently required, and these too are discussed in this chapter.



For a complete description of the Dr Speech software and the parameters it calculates, see the company’s official web site: http://www.drspeech.com/

ESPS is no longer commercially available, but the source code can be found at KTH, Stockholm: http://www.speech.kth.se/software/#esps

HTK is available from Cambridge University Engineering Department, through http://htk.eng.cam.ac.uk/

CMUSphinx is an open-source toolkit for speech recognition, developed by Carnegie Mellon University, and available via http://cmusphinx.org/

WaveSurfer is an open-source tool for sound visualization and manipulation developed at KTH Stockholm: http://www.speech.kth.se/wavesurfer/.


Title: Technology and Implementation
Authors: Ladan Baghai-Ravary, Steve W. Beet
Publisher: Springer New York
Book: Automatic Speech Signal Analysis for Clinical Diagnosis and Assessment of Speech Disorders
Print ISBN: 978-1-4614-4573-9

Electronic ISBN: 978-1-4614-4574-6

Copyright Year: 2013
DOI: https://doi.org/10.1007/978-1-4614-4574-6_4
