Published in: International Journal of Speech Technology 4/2013

01-12-2013

Computational auditory models in predicting noise reduction performance for wideband telephony applications

Authors: Nazanin Pourmand, Vijay Parsa, Angela Weaver



Abstract

The performance of several noise reduction algorithms intended for wideband telephony was evaluated both subjectively and objectively. The chosen algorithms were based on statistical modeling, spectral subtraction, Wiener filtering, or subspace modeling principles. A customized wideband noise reduction database was created, containing speech samples corrupted by three types of background noise at three SNR levels, along with their enhanced versions. The overall quality of the speech samples in the database was subsequently rated by a group of listeners with normal hearing. Comprehensive statistical analyses were performed to assess the reliability of the subjective data and the performance of the noise reduction algorithms across varied noisy conditions. The subjective quality ratings were then used to investigate the performance of several auditory model-based objective quality metrics. Key results from these investigations include: (a) the subjective ratings showed a high degree of inter- and intra-subject reliability, (b) the noise reduction algorithms enhanced speech quality for only a subset of the noise conditions, and (c) the auditory model-based metrics performed similarly in predicting speech quality ratings when the speech quality scores pertaining to a particular noise condition were averaged.
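Result (c) concerns how well an objective metric tracks subjective ratings once scores are averaged within each noise condition. The following is a minimal sketch of that kind of comparison, not the authors' analysis: the condition labels, ratings, and metric values are made up for illustration, and the helper names `pearson` and `condition_means` are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def condition_means(records):
    """Average subjective and objective scores within each condition.

    records: iterable of (condition, subjective_score, objective_score).
    Returns (conditions, subjective_means, objective_means), with the
    conditions in sorted order.
    """
    groups = defaultdict(list)
    for condition, subjective, objective in records:
        groups[condition].append((subjective, objective))
    conditions = sorted(groups)
    subj_means = [sum(s for s, _ in groups[c]) / len(groups[c]) for c in conditions]
    obj_means = [sum(o for _, o in groups[c]) / len(groups[c]) for c in conditions]
    return conditions, subj_means, obj_means


# Hypothetical data: (condition, listener rating, objective metric score).
records = [
    ("babble_0dB", 1.5, 0.30),
    ("babble_0dB", 2.1, 0.42),
    ("babble_10dB", 3.0, 0.55),
    ("babble_10dB", 3.4, 0.57),
    ("car_0dB", 2.2, 0.35),
    ("car_0dB", 2.6, 0.47),
    ("car_10dB", 3.9, 0.70),
    ("car_10dB", 4.1, 0.66),
]

# Correlation over individual samples.
per_sample_r = pearson([r[1] for r in records], [r[2] for r in records])

# Correlation over per-condition averages.
conditions, subj_means, obj_means = condition_means(records)
per_condition_r = pearson(subj_means, obj_means)

print(f"per-sample r = {per_sample_r:.3f}, per-condition r = {per_condition_r:.3f}")
```

Averaging within a condition removes sample-to-sample scatter, so per-condition correlations tend to be higher and differences between metrics tend to shrink.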


Footnotes
1
LogMMSE and LogMMSE_SPU belong to the same class of noise reduction algorithms and share the same implementation, except that LogMMSE_SPU accounts for the fact that speech is not present at all times: pauses occur even during speech activity. This Speech Presence Uncertainty (SPU) is taken into account through a factor that gives the probability that speech is present at a particular frequency (Loizou 2007).
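To make the role of the SPU factor concrete, here is a minimal sketch of one textbook way to compute a per-frequency speech presence probability and use it to scale a spectral gain. It assumes a Gaussian statistical model with a fixed prior probability of speech absence `q`; the function names, the default `q = 0.5`, and the multiplicative weighting are illustrative assumptions, not a description of the implementation evaluated in the paper.

```python
from math import exp, log, log1p


def speech_presence_probability(xi, gamma, q=0.5):
    """Probability that speech is present in one frequency bin.

    xi:    a priori SNR (linear scale, >= 0), assumed already estimated.
    gamma: a posteriori SNR (linear scale, >= 0).
    q:     assumed prior probability of speech absence, 0 < q < 1.

    Uses the likelihood ratio of the speech-present versus speech-absent
    hypotheses under a Gaussian model, evaluated in the log domain.
    """
    v = xi * gamma / (1.0 + xi)
    log_ratio = log((1.0 - q) / q) + v - log1p(xi)
    return 1.0 / (1.0 + exp(-log_ratio))


def apply_spu(gain, p):
    """Scale a spectral gain by the speech presence probability p."""
    return p * gain


# Example: a bin with high SNR keeps almost all of its gain,
# while a bin with no evidence of speech has its gain halved when q = 0.5.
strong = apply_spu(0.9, speech_presence_probability(xi=10.0, gamma=10.0))
weak = apply_spu(0.9, speech_presence_probability(xi=0.0, gamma=1.0))
print(strong, weak)
```

The effect is that bins judged unlikely to contain speech are attenuated more strongly than the plain estimator would attenuate them.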
 
Literature
ANSI S3.5 (1997). Methods for calculation of the speech intelligibility index. Washington: ANSI.
Beaugeant, C., Schönle, M., & Varga, I. (2006). Challenges of 16 kHz in acoustic pre- and post-processing for terminals. IEEE Communications Magazine, 44(5), 98–104.
Chen, G., & Parsa, V. (2007). Loudness pattern-based speech quality evaluation using Bayesian modeling and Markov chain Monte Carlo methods. The Journal of the Acoustical Society of America, 121(2), EL77–83.
Choi, J.-H., & Chang, J.-H. (2012). On using acoustic environment classification for statistical model-based speech enhancement. Speech Communication, 54(3), 477–490.
Cox, R. V., Kroon, P., Chen, J. H., Thorkildsen, R., O’Dell, K. M., & Isenberg, D. S. (1995). Speech coders: from idea to product. AT&T Technical Journal, 74, 14–21.
Dau, T., Püschel, D., & Kohlrausch, A. (1996). A quantitative model of the “effective” signal processing in the auditory system. I. Model structure. The Journal of the Acoustical Society of America, 99(6), 3615–3622.
Dau, T., Kollmeier, B., & Kohlrausch, A. (1997). Modeling auditory processing of amplitude modulation. I. Detection and masking with narrow-band carriers. The Journal of the Acoustical Society of America, 102(5), 2892–2905.
Egi, N., Aoki, H., & Takahashi, A. (2008). Objective quality evaluation method for noise-reduced speech. IEICE Transactions on Communications, E91-B(5), 1279–1286.
Falk, T. H., & Chan, W. Y. (2008). A non-intrusive quality measure of dereverberated speech. In International workshop for acoustic echo and noise control.
Glasberg, B. R., & Moore, B. C. J. (1990). Derivation of auditory filter shapes from notched-noise data. Hearing Research, 47(1–2), 103–138.
Glasberg, B. R., & Moore, B. C. J. (2002). A model of loudness applicable to time-varying sounds. Journal of the Audio Engineering Society, 50(5), 331–342.
Gupta, M., Forrester, C., & Simmons, S. (2009). Review of wideband speech noise reduction techniques. Canadian Acoustics, 37(3), 84–85.
Hansen, M., & Kollmeier, B. (2000). Objective modeling of speech quality with a psychoacoustically validated auditory model. Journal of the Audio Engineering Society, 48(5), 395–409.
Helfenstein, M., & Moschytz, G. S. (2000). Circuits and systems for wireless communications (p. 404). Dordrecht: Kluwer Academic.
Heute, U. (2008). Speech-transmission quality: aspects and assessment for wideband vs. narrowband signals. In Advances in digital speech transmission (pp. 9–50). New York: Wiley.
Holube, I., & Kollmeier, B. (1996). Speech intelligibility prediction in hearing-impaired listeners based on a psychoacoustically motivated perception model. The Journal of the Acoustical Society of America, 100(3), 1703–1716.
Hu, Y., & Loizou, P. C. (2007). Subjective comparison and evaluation of speech enhancement algorithms. Speech Communication, 49(7), 588–601.
Hu, Y., & Loizou, P. C. (2008). Evaluation of objective quality measures for speech enhancement. IEEE Transactions on Audio, Speech, and Language Processing, 16(1), 229–238.
Huber, R., & Kollmeier, B. (2006). PEMO-Q—a new method for objective audio quality assessment using a model of auditory perception. IEEE Transactions on Audio, Speech, and Language Processing, 14(6), 1902–1911.
ITU-T P. 835 (2003). Subjective test methodology for evaluating speech communication systems that include noise suppression algorithm. ITU-T.
ITU-T Rec. P. 563 (2004). Single-ended method for objective speech quality assessment in narrow-band telephony applications. ITU-T.
ITU-T Rec. P. 862 (2001). Perceptual evaluation of speech quality (PESQ). ITU-T.
ITU-T Rec. P. 862.2 (2007). Wideband extension to Recommendation P. 862 for the assessment of wideband telephone networks and speech codecs. ITU-T.
Jelinek, M., & Salami, R. (2004). Noise reduction method for wideband speech coding. In Proc EUSIPCO, Vienna, Austria (pp. 1959–1962).
Jepsen, M. L., Ewert, S. D., & Dau, T. (2008). A computational model of human auditory signal processing and perception. The Journal of the Acoustical Society of America, 124(1), 422–438.
Kamath, S. D. (2001). A multi-band spectral subtraction method for speech enhancement (Master’s thesis). Dallas: University of Texas.
Kates, J. M., & Arehart, K. H. (2010). The hearing-aid speech quality index (HASQI). Journal of the Audio Engineering Society, 58(5), 363–381.
Kim, D. (2005). ANIQUE: an auditory model for single-ended speech quality estimation. IEEE Transactions on Speech and Audio Processing, 13(5), 821–831.
Kondo, K. (2012). Subjective quality measurement of speech: its evaluation, estimation and applications (p. 153). Berlin: Springer.
Kressner, A. A., Anderson, D. V., & Rozell, C. J. (2011). Robustness of the hearing aid speech quality index (HASQI). In Workshop on applications of signal processing to audio and acoustics.
Laska, B., Bolic, M., & Goubran, R. (2010). Discrete cosine transform particle filter speech enhancement. Speech Communication, 52, 762–775.
Loizou, P. C. (2007). Speech enhancement: theory and practice. Boca Raton: CRC Press.
Matsunaga, M. (2007). Familywise error in multiple comparisons: disentangling a knot through a critique of O’Keefe’s arguments against alpha adjustment. Communication Methods and Measures, 1(4), 243–265.
Moore, B. C. J., & Glasberg, B. R. (2004). A revised model of loudness perception applied to cochlear hearing loss. Hearing Research, 188, 70–88.
Moore, B. C. J., & Tan, C.-T. (2003). Perceived naturalness of spectrally distorted speech and music. The Journal of the Acoustical Society of America, 114, 408–419.
Moore, B. C. J., & Tan, C.-T. (2004). Development and validation of a method for predicting the perceived naturalness of sounds subjected to spectral distortion. Journal of the Audio Engineering Society, 52(9), 900–914.
Möller, S., Chan, W., Côté, N., Falk, T. H., Raake, A., & Wältermann, M. (2011). Speech quality estimation: models and trends. IEEE Signal Processing Magazine, 28(6), 18–28.
Quackenbush, S. R., Barnwell, T. P., & Clements, M. A. (1988). Objective measures of speech quality. New York: Prentice Hall.
Ricketts, T. A., Dittberner, A. B., & Johnson, E. E. (2008). High frequency amplification and sound quality in listeners with normal through moderate hearing loss. Journal of Speech, Language, and Hearing Research, 51, 160–172.
Rohdenburg, T., Hohmann, V., & Kollmeier, B. (2005). Objective perceptual quality measures for the evaluation of noise reduction schemes. In 9th international workshop on acoustic echo and noise control (pp. 169–172).
Salmela, J., & Mattila, V. (2004). New intrusive method for the objective quality evaluation of acoustic noise suppression in mobile communications. In Proc. 116th audio eng. soc. conv.
Scalart, P., & Filho, J. V. (1996). Speech enhancement based on a priori signal to noise estimation. In International conference on acoustics, speech, and signal processing (Vol. 2, pp. 629–632).
Sohn, J., Kim, N. S., & Sung, W. (1999). A statistical model-based voice activity detection. IEEE Signal Processing Letters, 6(1), 1–3.
Stelmachowicz, P., Pittman, A., Hoover, B., & Lewis, D. (2001). Effect of stimulus bandwidth on the perception of /s/ in normal- and hearing-impaired children and adults. The Journal of the Acoustical Society of America, 110(4), 2183–2190.
Stoll, G., & Kozamernik, F. (2000). EBU listening tests on Internet audio codecs (EBU Technical Review).
Tchorz, J., & Kollmeier, B. (1999). A model of auditory perception as front end for automatic speech recognition. The Journal of the Acoustical Society of America, 106(4), 2040–2050.
Varga, I., Iacovo, R. D. De, & Usai, P. (2006). Standardization of the AMR wideband speech codec in 3GPP and ITU-T. IEEE Communications Magazine, May, 66–73.
Voran, S. (1997). Listener ratings of speech passbands. In Speech coding for telecommunications proceeding (pp. 81–82).
Metadata
Title
Computational auditory models in predicting noise reduction performance for wideband telephony applications
Authors
Nazanin Pourmand
Vijay Parsa
Angela Weaver
Publication date
01-12-2013
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 4/2013
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-013-9189-1
