
2013 | OriginalPaper | Chapter

4. Technology and Implementation

Authors: Ladan Baghai-Ravary, Steve W. Beet

Published in: Automatic Speech Signal Analysis for Clinical Diagnosis and Assessment of Speech Disorders

Publisher: Springer New York


Abstract

The aim of this chapter is to highlight relevant technical factors and limitations affecting the collection and interpretation of speech signals. We concentrate on the typical corruption or distortion of the speech signal encountered in the real world and, where possible, indicate how important these effects can be. Transmission and encoding of speech signals in mobile phone networks and over the internet are almost invariably lossy, and this has an acute effect on the accuracy of speech recognition systems. Published research has also shown a comparable effect on the accuracy of dysphonia/dysarthria detection. The relationship between specific aspects of the data collection process and the validity of assessments of new techniques is discussed. The current absence of a realistic database of remotely collected speech samples is highlighted, and adherence to standardised methods and datasets is shown to be crucial to the evaluation of new algorithms. Methods for combining multiple features into a single result are frequently required, and these too are discussed in this chapter.


Footnotes

1. For a complete description of the Dr Speech software and the parameters it calculates, see the company's official website: http://www.drspeech.com/
2. ESPS is no longer commercially available, but the source code can be found at KTH, Stockholm: http://www.speech.kth.se/software/#esps
3. HTK is available from Cambridge University Engineering Department through http://htk.eng.cam.ac.uk/
4. CMUSphinx is an open-source toolkit for speech recognition, developed by Carnegie Mellon University and available via http://cmusphinx.org/
5. WaveSurfer is an open-source tool for sound visualization and manipulation developed at KTH, Stockholm: http://www.speech.kth.se/wavesurfer/
 
DOI: https://doi.org/10.1007/978-1-4614-4574-6_4