2014 | OriginalPaper | Chapter

8. Itakura-Saito Nonnegative Matrix Two-Dimensional Factorizations for Blind Single Channel Audio Separation

Authors: Bin Gao, Wai Lok Woo

Published in: Blind Source Separation

Publisher: Springer Berlin Heidelberg


Abstract

A new blind single-channel source separation method is presented. The method requires no training knowledge; the separation system is built on nonuniform time-frequency (TF) analysis and feature extraction. Unlike conventional research, which concentrates on the spectrogram or its variants, we develop our separation algorithms using an alternative TF representation based on the gammatone filterbank. In particular, we show that a monaural mixed audio signal is considerably more separable in this nonuniform TF domain, and we provide an analysis of signal separability to verify this finding. In addition, we derive two new algorithms that extend the recently published Itakura-Saito nonnegative matrix factorization to a convolutive model for nonstationary source signals. These formulations are based on the Quasi-EM framework and the Multiplicative Gradient Descent (MGD) rule, respectively. Experimental tests show that the proposed method efficiently extracts the sources' spectral-temporal features, which are characterized by a large dynamic range of energy, and thus leads to a significant improvement in source separation performance.
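
As an illustration of the factorization the abstract builds on, the following is a minimal sketch of plain (non-convolutive) Itakura-Saito NMF with the commonly used multiplicative updates. It is not the chapter's two-dimensional (NMF2D) algorithm, nor its Quasi-EM or MGD derivation; the function names `is_divergence` and `is_nmf`, the rank, the iteration count, and the small constant `eps` are assumptions chosen for this example. The input `V` stands for any strictly positive power TF matrix, for instance one computed from a gammatone filterbank.

```python
import numpy as np


def is_divergence(V, V_hat):
    """Itakura-Saito divergence D_IS(V | V_hat), summed over all entries.

    Both arguments must be strictly positive arrays of the same shape.
    """
    ratio = V / V_hat
    return float(np.sum(ratio - np.log(ratio) - 1.0))


def is_nmf(V, rank, n_iter=200, eps=1e-12, seed=0):
    """Approximate V (freq x frames) by W @ H under the IS divergence.

    Illustrative sketch only: standard multiplicative updates for the
    plain NMF model, not the convolutive NMF2D model of the chapter.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    # Random strictly positive initialisation (an assumption of this sketch).
    W = rng.random((n_freq, rank)) + eps
    H = rng.random((rank, n_frames)) + eps
    history = []
    for _ in range(n_iter):
        # Update the temporal activations H with W held fixed.
        V_hat = W @ H + eps
        H *= (W.T @ (V / V_hat**2)) / (W.T @ (1.0 / V_hat) + eps)
        # Update the spectral bases W with H held fixed.
        V_hat = W @ H + eps
        W *= ((V / V_hat**2) @ H.T) / ((1.0 / V_hat) @ H.T + eps)
        history.append(is_divergence(V, W @ H + eps))
    return W, H, history


if __name__ == "__main__":
    # Synthetic positive "power TF matrix" with an exact rank-3 structure.
    rng = np.random.default_rng(1)
    V = rng.random((20, 3)) @ rng.random((3, 30)) + 1e-6
    W, H, history = is_nmf(V, rank=3, n_iter=100)
    print("IS divergence after first and last iteration:", history[0], history[-1])
```

Because the IS divergence depends only on the ratio of `V` to `V_hat`, scaling both arguments by the same constant leaves it unchanged, so low-energy and high-energy TF cells are weighted alike. This scale invariance is the property usually cited when IS-based factorization is applied to signals with a large dynamic range of energy.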


Metadata
Title
Itakura-Saito Nonnegative Matrix Two-Dimensional Factorizations for Blind Single Channel Audio Separation
Authors
Bin Gao
Wai Lok Woo
Copyright Year
2014
Publisher
Springer Berlin Heidelberg
DOI
https://doi.org/10.1007/978-3-642-55016-4_8