International Journal of Speech Technology

Published: 1 December 2013

Computational auditory models in predicting noise reduction performance for wideband telephony applications

Authors: Nazanin Pourmand, Vijay Parsa, Angela Weaver

Published in: International Journal of Speech Technology, Issue 4/2013

Abstract

The performance of several noise reduction algorithms intended for wideband telephony was evaluated both subjectively and objectively. The chosen algorithms were based on statistical modeling, spectral subtraction, Wiener filtering, or subspace modeling principles. A customized wideband noise reduction database was created, containing speech samples corrupted by three types of background noise at three SNR levels, along with their enhanced versions. The overall quality of the speech samples in the database was then rated by a group of normal-hearing listeners. Comprehensive statistical analyses were performed to assess the reliability of the subjective data and the performance of the noise reduction algorithms across the various noise conditions. The subjective quality ratings were then used to investigate the performance of several auditory model-based objective quality metrics. Key results include: (a) the subjective ratings showed a high degree of inter- and intra-subject reliability, (b) the noise reduction algorithms enhanced speech quality for only a subset of the noise conditions, and (c) the auditory model-based metrics performed similarly in predicting speech quality ratings when the scores for each noise condition were averaged.
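
Two steps described in the abstract can be sketched in a few lines of Python: building a noisy condition by mixing speech and noise at a target SNR, and correlating an objective metric with subjective ratings after averaging both per noise condition, which is the setting in which finding (c) holds. This is a minimal sketch under stated assumptions, not the authors' procedure: `mix_at_snr` uses the mean power of the whole signal rather than an active speech level (such as one measured per ITU-T P.56), and the helper names and the `(condition, subjective, objective)` record layout are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals snr_db, then add.

    Assumption: plain mean power over the whole signal, not an active speech level.
    """
    n = min(len(speech), len(noise))
    p_speech = sum(s * s for s in speech[:n]) / n
    p_noise = sum(v * v for v in noise[:n]) / n
    scale = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + scale * v for s, v in zip(speech[:n], noise[:n])]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def condition_averaged_correlation(records):
    """Correlate per-condition means of subjective and objective scores.

    records: iterable of (condition, subjective, objective) tuples, one per
    sample (hypothetical layout).
    """
    groups = defaultdict(lambda: ([], []))
    for cond, subj, obj in records:
        groups[cond][0].append(subj)
        groups[cond][1].append(obj)
    # Both lists iterate groups in the same order, so the means stay paired.
    subj_means = [mean(s) for s, _ in groups.values()]
    obj_means = [mean(o) for _, o in groups.values()]
    return pearson(subj_means, obj_means)
```

Averaging before correlating removes the per-sample scatter within each condition, which is one reason condition-level correlations are typically higher than per-sample ones.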

LogMMSE and LogMMSE_SPU belong to the same class of noise reduction algorithms and share the same implementation, except that LogMMSE_SPU accounts for the fact that speech is not present at all times: there are pauses even during periods of speech activity. This speech presence uncertainty (SPU) is incorporated through a factor representing the probability that speech is present at a particular frequency (Loizou 2007).
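
The difference between the two estimators can be sketched per frequency bin. The LogMMSE gain below is the standard Ephraim-Malah log-spectral amplitude gain, G = ξ/(1+ξ) · exp(½ E1(v)) with v = ξγ/(1+ξ), where ξ is the a priori SNR, γ the a posteriori SNR, and E1 the exponential integral. For the SPU variant the sketch assumes one common formulation, Cohen's OM-LSA form: the speech presence probability p = [1 + q/(1-q) · (1+ξ) · exp(-v)]^(-1) weights the gain as G^p · G_min^(1-p). The prior speech absence probability `q`, the gain floor `g_min`, and all function names are hypothetical, and whether the LogMMSE_SPU implementation evaluated here combines them exactly this way is an assumption.

```python
import math

EULER_GAMMA = 0.5772156649015329


def expint_e1(x: float) -> float:
    """Exponential integral E1(x) for x > 0."""
    if x <= 0.0:
        raise ValueError("expint_e1 requires x > 0")
    if x <= 1.0:
        # Power series: E1(x) = -gamma - ln(x) - sum_{k>=1} (-x)^k / (k * k!)
        total, term = 0.0, 1.0
        for k in range(1, 100):
            term *= -x / k  # term is now (-x)^k / k!
            total += term / k
            if abs(term / k) < 1e-17:
                break
        return -EULER_GAMMA - math.log(x) - total
    # Continued fraction for E1, evaluated with the modified Lentz method
    b = x + 1.0
    c = 1.0e300
    d = 1.0 / b
    h = d
    for i in range(1, 300):
        a = -float(i * i)
        b += 2.0
        d = 1.0 / (a * d + b)
        c = b + a / c
        delta = c * d
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return h * math.exp(-x)


def logmmse_gain(xi: float, gamma: float) -> float:
    """Ephraim-Malah log-spectral amplitude gain (linear SNRs, both > 0)."""
    v = xi / (1.0 + xi) * gamma
    return xi / (1.0 + xi) * math.exp(0.5 * expint_e1(v))


def speech_presence_prob(xi: float, gamma: float, q: float) -> float:
    """Posterior probability that speech is present in the bin (0 <= q < 1)."""
    v = xi / (1.0 + xi) * gamma
    ratio = q / (1.0 - q)
    return 1.0 / (1.0 + ratio * (1.0 + xi) * math.exp(-v))


def omlsa_gain(xi: float, gamma: float, q: float = 0.3, g_min: float = 0.1) -> float:
    """SPU-weighted gain G^p * g_min^(1-p); q and g_min defaults are hypothetical."""
    p = speech_presence_prob(xi, gamma, q)
    return logmmse_gain(xi, gamma) ** p * g_min ** (1.0 - p)
```

When ξ and γ are both large, p approaches 1 and the SPU gain reduces to the LogMMSE gain; in low-SNR bins p drops and the gain is pulled toward `g_min`.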

References

ANSI S3.5 (1997). Methods for calculation of the speech intelligibility index. Washington: ANSI.

Beaugeant, C., Schönle, M., & Varga, I. (2006). Challenges of 16 kHz in acoustic pre- and post-processing for terminals. IEEE Communications Magazine, 44(5), 98–104.

Chen, G., & Parsa, V. (2007). Loudness pattern-based speech quality evaluation using Bayesian modeling and Markov chain Monte Carlo methods. The Journal of the Acoustical Society of America, 121(2), EL77–EL83.

Choi, J.-H., & Chang, J.-H. (2012). On using acoustic environment classification for statistical model-based speech enhancement. Speech Communication, 54(3), 477–490.

Cox, R. V., Kroon, P., Chen, J. H., Thorkildsen, R., O’Dell, K. M., & Isenberg, D. S. (1995). Speech coders: from idea to product. AT&T Technical Journal, 74, 14–21.

Dau, T., Püschel, D., & Kohlrausch, A. (1996). A quantitative model of the “effective” signal processing in the auditory system. I. Model structure. The Journal of the Acoustical Society of America, 99(6), 3615–3622.

Dau, T., Kollmeier, B., & Kohlrausch, A. (1997). Modeling auditory processing of amplitude modulation. I. Detection and masking with narrow-band carriers. The Journal of the Acoustical Society of America, 102(5), 2892–2905.

Egi, N., Aoki, H., & Takahashi, A. (2008). Objective quality evaluation method for noise-reduced speech. IEICE Transactions on Communications, E91-B(5), 1279–1286.

Falk, T. H., & Chan, W. Y. (2008). A non-intrusive quality measure of dereverberated speech. In International workshop on acoustic echo and noise control.

Garbin, C. (2013). Bivariate correlation comparisons. Retrieved from http://psych.unl.edu/psycrs/statpage/biv_corr_comp_eg.pdf.

Glasberg, B. R., & Moore, B. C. J. (1990). Derivation of auditory filter shapes from notched-noise data. Hearing Research, 47(1–2), 103–138.

Glasberg, B. R., & Moore, B. C. J. (2002). A model of loudness applicable to time-varying sounds. Journal of the Audio Engineering Society, 50(5), 331–342.

Gupta, M., Forrester, C., & Simmons, S. (2009). Review of wideband speech noise reduction techniques. Canadian Acoustics, 37(3), 84–85.

Hansen, M., & Kollmeier, B. (2000). Objective modeling of speech quality with a psychoacoustically validated auditory model. Journal of the Audio Engineering Society, 48(5), 395–409.

Helfenstein, M., & Moschytz, G. S. (2000). Circuits and systems for wireless communications (p. 404). Dordrecht: Kluwer Academic.

Heute, U. (2008). Speech-transmission quality: aspects and assessment for wideband vs. narrowband signals. In Advances in digital speech transmission (pp. 9–50). New York: Wiley.

Holube, I., & Kollmeier, B. (1996). Speech intelligibility prediction in hearing-impaired listeners based on a psychoacoustically motivated perception model. The Journal of the Acoustical Society of America, 100(3), 1703–1716.

Hu, Y., & Loizou, P. C. (2007). Subjective comparison and evaluation of speech enhancement algorithms. Speech Communication, 49(7), 588–601.

Hu, Y., & Loizou, P. C. (2008). Evaluation of objective quality measures for speech enhancement. IEEE Transactions on Audio, Speech, and Language Processing, 16(1), 229–238.

Huber, R., & Kollmeier, B. (2006). PEMO-Q: a new method for objective audio quality assessment using a model of auditory perception. IEEE Transactions on Audio, Speech, and Language Processing, 14(6), 1902–1911.

ITU-T Rec. P.835 (2003). Subjective test methodology for evaluating speech communication systems that include noise suppression algorithm. ITU-T.

ITU-T Rec. P.563 (2004). Single-ended method for objective speech quality assessment in narrow-band telephony applications. ITU-T.

ITU-T Rec. P.862 (2001). Perceptual evaluation of speech quality (PESQ). ITU-T.

ITU-T Rec. P.862.2 (2007). Wideband extension to Recommendation P.862 for the assessment of wideband telephone networks and speech codecs. ITU-T.

Jelinek, M., & Salami, R. (2004). Noise reduction method for wideband speech coding. In Proc. EUSIPCO, Vienna, Austria (pp. 1959–1962).

Jepsen, M. L., Ewert, S. D., & Dau, T. (2008). A computational model of human auditory signal processing and perception. The Journal of the Acoustical Society of America, 124(1), 422–438.

Kabal, P. (2002). TSP speech database. Retrieved from http://www-mmsp.ece.mcgill.ca/Documents/Data/index.html.

Kamath, S. D. (2001). A multi-band spectral subtraction method for speech enhancement (Master’s thesis). Dallas: University of Texas.

Kates, J. M., & Arehart, K. H. (2010). The hearing-aid speech quality index (HASQI). Journal of the Audio Engineering Society, 58(5), 363–381.

Kim, D. (2005). ANIQUE: an auditory model for single-ended speech quality estimation. IEEE Transactions on Speech and Audio Processing, 13(5), 821–831.

Kondo, K. (2012). Subjective quality measurement of speech: its evaluation, estimation and applications (p. 153). Berlin: Springer.

Kressner, A. A., Anderson, D. V., & Rozell, C. J. (2011). Robustness of the hearing aid speech quality index (HASQI). In Workshop on applications of signal processing to audio and acoustics.

Laska, B., Bolic, M., & Goubran, R. (2010). Discrete cosine transform particle filter speech enhancement. Speech Communication, 52, 762–775.

Loizou, P. C. (2007). Speech enhancement: theory and practice. Boca Raton: CRC Press.

Matsunaga, M. (2007). Familywise error in multiple comparisons: disentangling a knot through a critique of O’Keefe’s arguments against alpha adjustment. Communication Methods and Measures, 1(4), 243–265.

Moore, B. C. J., & Glasberg, B. R. (2004). A revised model of loudness perception applied to cochlear hearing loss. Hearing Research, 188, 70–88.

Moore, B. C. J., & Tan, C.-T. (2003). Perceived naturalness of spectrally distorted speech and music. The Journal of the Acoustical Society of America, 114, 408–419.

Moore, B. C. J., & Tan, C.-T. (2004). Development and validation of a method for predicting the perceived naturalness of sounds subjected to spectral distortion. Journal of the Audio Engineering Society, 52(9), 900–914.

Möller, S., Chan, W., Côté, N., Falk, T. H., Raake, A., & Wältermann, M. (2011). Speech quality estimation: models and trends. IEEE Signal Processing Magazine, 28(6), 18–28.

Quackenbush, S. R., Barnwell, T. P., & Clements, M. A. (1988). Objective measures of speech quality. New York: Prentice Hall.

Ricketts, T. A., Dittberner, A. B., & Johnson, E. E. (2008). High frequency amplification and sound quality in listeners with normal through moderate hearing loss. Journal of Speech, Language, and Hearing Research, 51, 160–172.

Rohdenburg, T., Hohmann, V., & Kollmeier, B. (2005). Objective perceptual quality measures for the evaluation of noise reduction schemes. In 9th international workshop on acoustic echo and noise control (pp. 169–172).

Salmela, J., & Mattila, V. (2004). New intrusive method for the objective quality evaluation of acoustic noise suppression in mobile communications. In Proc. 116th Audio Eng. Soc. Conv.

Scalart, P., & Filho, J. V. (1996). Speech enhancement based on a priori signal to noise estimation. In International conference on acoustics, speech, and signal processing (Vol. 2, pp. 629–632).

Sohn, J., Kim, N. S., & Sung, W. (1999). A statistical model-based voice activity detection. IEEE Signal Processing Letters, 6(1), 1–3.

Stelmachowicz, P., Pittman, A., Hoover, B., & Lewis, D. (2001). Effect of stimulus bandwidth on the perception of /s/ in normal- and hearing-impaired children and adults. The Journal of the Acoustical Society of America, 110(4), 2183–2190.

Stoll, G., & Kozamernik, F. (2000). EBU listening tests on Internet audio codecs (EBU Technical Review).

Tchorz, J., & Kollmeier, B. (1999). A model of auditory perception as front end for automatic speech recognition. The Journal of the Acoustical Society of America, 106(4), 2040–2050.

Varga, I., De Iacovo, R. D., & Usai, P. (2006). Standardization of the AMR wideband speech codec in 3GPP and ITU-T. IEEE Communications Magazine, 44(5), 66–73.

Voran, S. (1997). Listener ratings of speech passbands. In Speech coding for telecommunications proceedings (pp. 81–82).

Title: Computational auditory models in predicting noise reduction performance for wideband telephony applications
Authors: Nazanin Pourmand, Vijay Parsa, Angela Weaver
Publication date: 1 December 2013
Publisher: Springer US
Published in: International Journal of Speech Technology, Issue 4/2013
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI: https://doi.org/10.1007/s10772-013-9189-1