Published in: International Journal of Speech Technology 4/2013

1 December 2013

Computational auditory models in predicting noise reduction performance for wideband telephony applications

Written by: Nazanin Pourmand, Vijay Parsa, Angela Weaver

Abstract

The performance of several noise reduction algorithms intended for wideband telephony was evaluated both subjectively and objectively. The chosen algorithms were based on statistical modeling, spectral subtraction, Wiener filtering, or subspace modeling principles. A customized wideband noise reduction database was created, containing speech samples corrupted by three types of background noise at three SNR levels, along with their enhanced versions. The overall quality of the speech samples in the database was subsequently rated by a group of listeners with normal hearing. Comprehensive statistical analyses were performed to assess the reliability of the subjective data and the performance of the noise reduction algorithms across varied noisy conditions. The subjective quality ratings were then used to investigate the performance of several auditory model-based objective quality metrics. Key results from these investigations include: (a) there was a high degree of inter- and intra-subject reliability in the subjective ratings, (b) the noise reduction algorithms enhanced speech quality for only a subset of the noise conditions, and (c) the auditory model-based metrics performed similarly in predicting speech quality ratings when the speech quality scores pertaining to a particular noise condition were averaged.
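
Result (c) refers to scores averaged within each noise condition before an objective metric is compared with the listener ratings. The following is a minimal Python sketch of that averaging step, not the authors' analysis code. It assumes Pearson correlation as the figure of merit, which the abstract does not specify. The function names, the condition labels, and every number in `records` are hypothetical and are not data from the study.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def condition_means(records):
    """Average subjective and objective scores within each condition.

    records: iterable of (condition, subjective_score, objective_score).
    Returns (subjective_means, objective_means), ordered by sorted condition.
    """
    grouped = defaultdict(list)
    for condition, subjective, objective in records:
        grouped[condition].append((subjective, objective))
    subjective_means, objective_means = [], []
    for condition in sorted(grouped):
        pairs = grouped[condition]
        subjective_means.append(sum(s for s, _ in pairs) / len(pairs))
        objective_means.append(sum(o for _, o in pairs) / len(pairs))
    return subjective_means, objective_means


# Invented illustrative records: (noise type, SNR in dB), rating, metric value.
records = [
    (("noise_A", 0), 1.8, 0.42), (("noise_A", 0), 2.4, 0.40),
    (("noise_A", 10), 3.1, 0.61), (("noise_A", 10), 3.5, 0.66),
    (("noise_B", 0), 2.0, 0.47), (("noise_B", 0), 1.6, 0.45),
    (("noise_B", 10), 3.9, 0.70), (("noise_B", 10), 3.3, 0.72),
]

# Correlation over individual samples.
per_sample = pearson([r[1] for r in records], [r[2] for r in records])

# Correlation over per-condition averages.
subjective_means, objective_means = condition_means(records)
per_condition = pearson(subjective_means, objective_means)

print(f"per-sample r = {per_sample:.3f}, per-condition r = {per_condition:.3f}")
```

Averaging within a condition removes sample-to-sample scatter, so per-condition and per-sample correlations for the same metric can differ.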

Footnotes
1
LogMMSE and LogMMSE_SPU belong to the same class of noise reduction algorithms and share the same implementation, except that LogMMSE_SPU accounts for the fact that speech is not present at all times: there are pauses even during speech activity. This speech presence uncertainty (SPU) is taken into account through a factor that represents the probability that speech is present at a particular frequency (Loizou 2007).
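
As a rough illustration of such a per-frequency factor, the sketch below computes a posterior speech presence probability for each frequency bin under a Gaussian statistical model and uses it to scale an existing gain. This is a simplified sketch under stated assumptions, not the implementation evaluated in the paper. The function names, the prior speech-absence probability `q = 0.3`, and the choice to multiply the gain by the probability are assumptions made for illustration.

```python
from math import exp


def speech_presence_probability(xi, gamma, q=0.3):
    """Posterior probability that speech is present in one frequency bin.

    xi:    a priori SNR (linear scale, not dB)
    gamma: a posteriori SNR (linear scale, not dB)
    q:     assumed prior probability of speech absence (illustrative value)
    """
    # Likelihood ratio of "speech present" vs "speech absent" under a
    # complex Gaussian model of the speech and noise spectral coefficients.
    v = min(xi / (1.0 + xi) * gamma, 500.0)  # clamp to avoid overflow in exp
    likelihood_ratio = exp(v) / (1.0 + xi)
    # Weight by the prior odds of speech presence, then convert to probability.
    weighted = (1.0 - q) / q * likelihood_ratio
    return weighted / (1.0 + weighted)


def apply_spu(gains, xis, gammas, q=0.3):
    """Scale each per-bin gain by its speech presence probability (assumed rule)."""
    return [
        gain * speech_presence_probability(xi, gamma, q)
        for gain, xi, gamma in zip(gains, xis, gammas)
    ]


# Hypothetical values for three frequency bins: low, medium and high SNR.
base_gains = [0.2, 0.6, 0.95]   # e.g. gains from a log-MMSE estimator
a_priori_snr = [0.1, 1.0, 10.0]
a_posteriori_snr = [0.5, 2.0, 20.0]

print(apply_spu(base_gains, a_priori_snr, a_posteriori_snr))
```

Bins with little evidence of speech receive a probability well below 1 and are attenuated further, while high-SNR bins are left almost unchanged.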
 
References
ANSI S3.5 (1997). Methods for calculation of the speech intelligibility index. Washington: ANSI.
Beaugeant, C., Schönle, M., & Varga, I. (2006). Challenges of 16 kHz in acoustic pre- and post-processing for terminals. IEEE Communications Magazine, 44(5), 98–104.
Chen, G., & Parsa, V. (2007). Loudness pattern-based speech quality evaluation using Bayesian modeling and Markov chain Monte Carlo methods. The Journal of the Acoustical Society of America, 121(2), EL77–83.
Choi, J.-H., & Chang, J.-H. (2012). On using acoustic environment classification for statistical model-based speech enhancement. Speech Communication, 54(3), 477–490.
Cox, R. V., Kroon, P., Chen, J. H., Thorkildsen, R., O’Dell, K. M., & Isenberg, D. S. (1995). Speech coders: from idea to product. AT&T Technical Journal, 74, 14–21.
Dau, T., Püschel, D., & Kohlrausch, A. (1996). A quantitative model of the “effective” signal processing in the auditory system. I. Model structure. The Journal of the Acoustical Society of America, 99(6), 3615–3622.
Dau, T., Kollmeier, B., & Kohlrausch, A. (1997). Modeling auditory processing of amplitude modulation. I. Detection and masking with narrow-band carriers. The Journal of the Acoustical Society of America, 102(5), 2892–2905.
Egi, N., Aoki, H., & Takahashi, A. (2008). Objective quality evaluation method for noise-reduced speech. IEICE Transactions on Communications, E91-B(5), 1279–1286.
Falk, T. H., & Chan, W. Y. (2008). A non-intrusive quality measure of dereverberated speech. In International workshop for acoustic echo and noise control.
Glasberg, B. R., & Moore, B. C. J. (1990). Derivation of auditory filter shapes from notched-noise data. Hearing Research, 47(1–2), 103–138.
Glasberg, B. R., & Moore, B. C. J. (2002). A model of loudness applicable to time-varying sounds. Journal of the Audio Engineering Society, 50(5), 331–342.
Gupta, M., Forrester, C., & Simmons, S. (2009). Review of wideband speech noise reduction techniques. Canadian Acoustics, 37(3), 84–85.
Hansen, M., & Kollmeier, B. (2000). Objective modeling of speech quality with a psychoacoustically validated auditory model. Journal of the Audio Engineering Society, 48(5), 395–409.
Helfenstein, M., & Moschytz, G. S. (2000). Circuits and systems for wireless communications (p. 404). Dordrecht: Kluwer Academic.
Heute, U. (2008). Speech-transmission quality: aspects and assessment for wideband vs. narrowband signals. In Advances in digital speech transmission (pp. 9–50). New York: Wiley.
Holube, I., & Kollmeier, B. (1996). Speech intelligibility prediction in hearing-impaired listeners based on a psychoacoustically motivated perception model. The Journal of the Acoustical Society of America, 100(3), 1703–1716.
Hu, Y., & Loizou, P. C. (2007). Subjective comparison and evaluation of speech enhancement algorithms. Speech Communication, 49(7), 588–601.
Hu, Y., & Loizou, P. C. (2008). Evaluation of objective quality measures for speech enhancement. IEEE Transactions on Audio, Speech, and Language Processing, 16(1), 229–238.
Huber, R., & Kollmeier, B. (2006). PEMO-Q—a new method for objective audio quality assessment using a model of auditory perception. IEEE Transactions on Audio, Speech, and Language Processing, 14(6), 1902–1911.
ITU-T P. 835 (2003). Subjective test methodology for evaluating speech communication systems that include noise suppression algorithm. ITU-T.
ITU-T Rec. P. 563 (2004). Single-ended method for objective speech quality assessment in narrow-band telephony applications. ITU-T.
ITU-T Rec. P. 862 (2001). Perceptual evaluation of speech quality (PESQ). ITU-T.
ITU-T Rec. P. 862.2 (2007). Wideband extension to Recommendation P. 862 for the assessment of wideband telephone networks and speech codecs. ITU-T.
Jelinek, M., & Salami, R. (2004). Noise reduction method for wideband speech coding. In Proc EUSIPCO, Vienna, Austria (pp. 1959–1962).
Jepsen, M. L., Ewert, S. D., & Dau, T. (2008). A computational model of human auditory signal processing and perception. The Journal of the Acoustical Society of America, 124(1), 422–438.
Kamath, S. D. (2001). A multi-band spectral subtraction method for speech enhancement (Master’s thesis). Dallas: University of Texas.
Kates, J. M., & Arehart, K. H. (2010). The hearing-aid speech quality index (HASQI). Journal of the Audio Engineering Society, 58(5), 363–381.
Kim, D. (2005). ANIQUE: an auditory model for single-ended speech quality estimation. IEEE Transactions on Speech and Audio Processing, 13(5), 821–831.
Kondo, K. (2012). Subjective quality measurement of speech: its evaluation, estimation and applications (p. 153). Berlin: Springer.
Kressner, A. A., Anderson, D. V., & Rozell, C. J. (2011). Robustness of the hearing aid speech quality index (HASQI). In Workshop on applications of signal processing to audio and acoustics.
Laska, B., Bolic, M., & Goubran, R. (2010). Discrete cosine transform particle filter speech enhancement. Speech Communication, 52, 762–775.
Loizou, P. C. (2007). Speech enhancement: theory and practice. Boca Raton: CRC Press.
Matsunaga, M. (2007). Familywise error in multiple comparisons: disentangling a knot through a critique of O’Keefe’s arguments against alpha adjustment. Communication Methods and Measures, 1(4), 243–265.
Moore, B. C. J., & Glasberg, B. R. (2004). A revised model of loudness perception applied to cochlear hearing loss. Hearing Research, 188, 70–88.
Moore, B. C. J., & Tan, C.-T. (2003). Perceived naturalness of spectrally distorted speech and music. The Journal of the Acoustical Society of America, 114, 408–419.
Moore, B. C. J., & Tan, C.-T. (2004). Development and validation of a method for predicting the perceived naturalness of sounds subjected to spectral distortion. Journal of the Audio Engineering Society, 52(9), 900–914.
Möller, S., Chan, W., Côté, N., Falk, T. H., Raake, A., & Wältermann, M. (2011). Speech quality estimation: models and trends. IEEE Signal Processing Magazine, 28(6), 18–28.
Quackenbush, S. R., Barnwell, T. P., & Clements, M. A. (1988). Objective measures of speech quality. New York: Prentice Hall.
Ricketts, T. A., Dittberner, A. B., & Johnson, E. E. (2008). High frequency amplification and sound quality in listeners with normal through moderate hearing loss. Journal of Speech, Language, and Hearing Research, 51, 160–172.
Rohdenburg, T., Hohmann, V., & Kollmeier, B. (2005). Objective perceptual quality measures for the evaluation of noise reduction schemes. In 9th international workshop on acoustic echo and noise control (pp. 169–172).
Salmela, J., & Mattila, V. (2004). New intrusive method for the objective quality evaluation of acoustic noise suppression in mobile communications. In Proc. 116th audio eng. soc. conv.
Scalart, P., & Filho, J. V. (1996). Speech enhancement based on a priori signal to noise estimation. In International conference on acoustics, speech, and signal processing (Vol. 2, pp. 629–632).
Sohn, J., Kim, N. S., & Sung, W. (1999). A statistical model-based voice activity detection. IEEE Signal Processing Letters, 6(1), 1–3.
Stelmachowicz, P., Pittman, A., Hoover, B., & Lewis, D. (2001). Effect of stimulus bandwidth on the perception of /s/ in normal- and hearing-impaired children and adults. The Journal of the Acoustical Society of America, 110(4), 2183–2190.
Stoll, G., & Kozamernik, F. (2000). EBU listening tests on Internet audio codecs (EBU Technical Review).
Tchorz, J., & Kollmeier, B. (1999). A model of auditory perception as front end for automatic speech recognition. The Journal of the Acoustical Society of America, 106(4), 2040–2050.
Varga, I., De Iacovo, R. D., & Usai, P. (2006). Standardization of the AMR wideband speech codec in 3GPP and ITU-T. IEEE Communications Magazine, May, 66–73.
Voran, S. (1997). Listener ratings of speech passbands. In Speech coding for telecommunications proceeding (pp. 81–82).
Metadata
Title
Computational auditory models in predicting noise reduction performance for wideband telephony applications
Written by
Nazanin Pourmand
Vijay Parsa
Angela Weaver
Publication date
1 December 2013
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 4/2013
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-013-9189-1
