2014 | OriginalPaper | Chapter

12. On the Ideal Ratio Mask as the Goal of Computational Auditory Scene Analysis

Authors: Christopher Hummersone, Toby Stokes, Tim Brookes

Published in: Blind Source Separation

Publisher: Springer Berlin Heidelberg

Abstract

The ideal binary mask (IBM) is widely considered to be the benchmark for time–frequency-based sound source separation techniques such as computational auditory scene analysis (CASA). However, it is well known that binary masking introduces objectionable distortion, especially musical noise. This can make binary masking unsuitable for sound source separation applications where the output is auditioned. It has been suggested that soft masking reduces musical noise and leads to a higher quality output. A previously defined soft mask, the ideal ratio mask (IRM), is found to have similar properties to the IBM, may correspond more closely to auditory processes, and offers additional computational advantages. Consequently, the IRM is proposed as the goal of CASA. To further support this position, a number of studies are reviewed that show soft masks to provide superior performance to the IBM in applications such as automatic speech recognition and speech intelligibility. A brief empirical study provides additional evidence demonstrating the objective and perceptual superiority of the IRM over the IBM.
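The distinction between the two masks can be illustrated with a minimal sketch. It assumes one common formulation, which may differ in detail from the chapter's own definitions: the IBM is 1 wherever the target energy in a time–frequency unit exceeds the interference energy by a local criterion (0 dB here), and the IRM is the ratio of target energy to the sum of target and interference energy. The function names, the `lc_db` and `eps` parameters, and the toy energy values are all hypothetical and chosen only for illustration.

```python
def ideal_binary_mask(target, interference, lc_db=0.0):
    """Binary mask: 1.0 where target energy exceeds interference energy
    by more than lc_db decibels, else 0.0 (assumed formulation)."""
    factor = 10.0 ** (lc_db / 10.0)
    return [
        [1.0 if t > factor * n else 0.0 for t, n in zip(t_row, n_row)]
        for t_row, n_row in zip(target, interference)
    ]


def ideal_ratio_mask(target, interference, eps=1e-12):
    """Soft mask: target energy divided by target-plus-interference energy
    (assumed formulation). eps guards against division by zero."""
    return [
        [t / (t + n + eps) for t, n in zip(t_row, n_row)]
        for t_row, n_row in zip(target, interference)
    ]


# Toy energies for a 2x2 grid of time-frequency units (made-up values).
target = [[4.0, 1.0], [0.0, 9.0]]
interference = [[1.0, 1.0], [2.0, 1.0]]

ibm = ideal_binary_mask(target, interference)
irm = ideal_ratio_mask(target, interference)
```

Under these assumptions the IBM keeps or discards each unit outright, whereas the IRM attenuates each unit in proportion to how much of its energy belongs to the target. With a 0 dB criterion, thresholding this IRM at 0.5 reproduces the IBM.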

Title: On the Ideal Ratio Mask as the Goal of Computational Auditory Scene Analysis
Authors: Christopher Hummersone, Toby Stokes, Tim Brookes
Publisher: Springer Berlin Heidelberg
Book: Blind Source Separation
Print ISBN: 978-3-642-55015-7
Electronic ISBN: 978-3-642-55016-4
Copyright Year: 2014
DOI: https://doi.org/10.1007/978-3-642-55016-4_12