Published in: Cognitive Computation 2/2014

1 June 2014

Novel Two-Stage Audiovisual Speech Filtering in Noisy Environments

Authors: Andrew Abel, Amir Hussain


Abstract

In recent years, the established link between the various human speech production domains has become more widely exploited in speech processing. In this work, we build on our previous work and present a novel two-stage audiovisual speech enhancement system that combines audio-only beamforming, automatic lip tracking, and pre-processing with visually derived Wiener speech filtering. Initial results demonstrate that this two-stage multimodal speech enhancement approach can produce positive results on noisy speech mixtures that conventional audio-only beamforming struggles to cope with, such as very noisy environments with a very low signal-to-noise ratio, or noise types that are difficult for audio-only beamforming to process.
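
To make the two-stage structure concrete, below is a minimal Python/NumPy sketch of the pipeline the abstract describes: a delay-and-sum beamformer (stage 1) followed by an STFT-domain Wiener filter (stage 2). The paper's actual contribution, estimating the clean-speech spectrum from automatically tracked lip features, is not reproduced here; the speech_psd and noise_psd inputs are placeholder stand-ins for that visually derived estimator, and the steering delays are assumed known.

    # Minimal sketch of a two-stage enhancement pipeline: (1) delay-and-sum
    # beamforming across microphone channels, (2) STFT-domain Wiener filtering.
    # The visually derived speech-spectrum estimator from the paper is NOT
    # implemented here; speech_psd / noise_psd are placeholder inputs.
    import numpy as np
    from scipy.signal import stft, istft

    def delay_and_sum(mics: np.ndarray, delays_s: np.ndarray, fs: int) -> np.ndarray:
        """Stage 1: align each channel by its (assumed known) delay, then average.
        Uses an integer-sample circular shift for simplicity; a real system
        would use fractional-delay filters and estimated time differences
        of arrival."""
        n_ch, _ = mics.shape
        aligned = [np.roll(mics[ch], -int(round(delays_s[ch] * fs)))
                   for ch in range(n_ch)]
        return np.mean(aligned, axis=0)

    def wiener_enhance(noisy: np.ndarray, speech_psd: np.ndarray,
                       noise_psd: np.ndarray, fs: int,
                       nperseg: int = 512) -> np.ndarray:
        """Stage 2: apply the Wiener gain G(f) = S(f) / (S(f) + N(f)) per bin."""
        _, _, X = stft(noisy, fs=fs, nperseg=nperseg)
        gain = speech_psd / (speech_psd + noise_psd + 1e-12)  # avoid divide-by-zero
        _, enhanced = istft(X * gain, fs=fs, nperseg=nperseg)
        return enhanced

    if __name__ == "__main__":
        fs = 16_000
        mics = np.random.randn(4, fs)   # 4 channels, 1 s of synthetic "noisy" audio
        delays_s = np.zeros(4)          # broadside source: zero steering delays
        beamformed = delay_and_sum(mics, delays_s, fs)
        # In the full system these spectra would come from the visually derived
        # model; here flat placeholders with nperseg/2 + 1 frequency bins.
        speech_psd = np.ones((257, 1))
        noise_psd = 0.5 * np.ones((257, 1))
        clean = wiener_enhance(beamformed, speech_psd, noise_psd, fs)

The Wiener gain itself is the standard textbook form; what distinguishes the system described in the abstract is where the speech and noise spectra come from, namely a model driven by tracked lip features rather than audio-only estimation.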


Metadata
Title
Novel Two-Stage Audiovisual Speech Filtering in Noisy Environments
Authors
Andrew Abel
Amir Hussain
Publication date
1 June 2014
Publisher
Springer US
Published in
Cognitive Computation / Issue 2/2014
Print ISSN: 1866-9956
Electronic ISSN: 1866-9964
DOI
https://doi.org/10.1007/s12559-013-9231-2
