2018 | OriginalPaper | Chapter

4. An Introduction to Multichannel NMF for Audio Source Separation

Authors: Alexey Ozerov, Cédric Févotte, Emmanuel Vincent

Published in: Audio Source Separation

Publisher: Springer International Publishing

Abstract

This chapter introduces multichannel nonnegative matrix factorization (NMF) methods for audio source separation. All the methods and some of their extensions are presented within a more general local Gaussian modeling (LGM) framework. These methods are attractive because they allow spatial and spectral cues to be combined in a joint and principled way, and because they are natural extensions and generalizations of many single-channel NMF-based methods to the multichannel case. The chapter introduces the spectral (NMF-based) and spatial models, as well as the way to combine them within the LGM framework. Model estimation criteria and algorithms are also described, some of them in greater detail.
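
As a rough orientation, the combination of spectral and spatial models mentioned above can be sketched as follows. This assumes the usual LGM formulation from the multichannel NMF literature; the chapter's own equations are not reproduced on this page, so the exact form is an assumption. Here \(\mathbf{x}_{fn}\) denotes the \(I\)-channel mixture in time-frequency bin \((f,n)\), \(\mathbf{y}_{jfn}\) the spatial image of source \(j\), \(v_{jfn}\) and \(\mathbf{R}_{jfn}\) its variance and spatial covariance, and \(w_{jfk}\), \(h_{jkn}\) are hypothetical names for nonnegative NMF factors with \(K_j\) components:

\[
\mathbf{x}_{fn} = \sum_{j=1}^{J} \mathbf{y}_{jfn}, \qquad
\mathbf{y}_{jfn} \sim \mathcal{N}_c\left(\mathbf{0},\; v_{jfn}\,\mathbf{R}_{jfn}\right), \qquad
v_{jfn} = \sum_{k=1}^{K_j} w_{jfk}\, h_{jkn}
\]

Under this reading, the spectral model constrains the variances \(v_{jfn}\), the spatial model constrains the covariances \(\mathbf{R}_{jfn}\), and both sets of parameters are estimated jointly.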

Footnotes
1
Throughout the chapter we generally refer to all these methods as multichannel NMF, and state explicitly when we mean multichannel nonnegative tensor factorization (NTF).
 
2
The spatial image of a source is not the source signal itself, but its contribution to the \(I\)-channel mixture.
 
3
Due to the scale ambiguity between \(\mathbf{R}_{jfn}\) and \(v_{jfn}\) in (4.2), the loudness can be fully attributed to \(v_{jfn}\).
 
4
When we write \(\overset{\mathrm{c}}{=}\), we mean that the equality holds up to a constant that is independent of the model parameters \(\varvec{\theta }\) and thus has no influence on the optimization over the parameters in (4.23).
 
5
Note that if the spatial covariances \(\mathbf{R}_{jf}\) are needed, they can always be computed with (4.29).
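
The scale ambiguity mentioned in footnote 3 can be checked numerically. The sketch below is only an illustration: it assumes that the source image covariance in (4.2) has the product form \(v_{jfn}\mathbf{R}_{jfn}\), and it uses a unit-trace normalization of \(\mathbf{R}\) as one possible convention, which is not necessarily the one used in the chapter. The function names and the numerical values are hypothetical.

```python
# Illustrative check of the scale ambiguity between v and R (pure Python).
# Assumption: the source image covariance is the product v * R.

def scale(c, M):
    """Multiply every entry of matrix M (a list of rows) by the scalar c."""
    return [[c * m for m in row] for row in M]

def covariance(v, R):
    """Covariance of the source spatial image under the assumed product form."""
    return scale(v, R)

def renormalize(v, R):
    """Rescale R to unit trace and move the scale factor into v.

    Unit trace is a hypothetical convention chosen for this example.
    """
    alpha = sum(R[i][i] for i in range(len(R))).real
    return v * alpha, scale(1.0 / alpha, R)

# Made-up values for a single source in a single time-frequency bin (I = 2).
v = 0.5
R = [[2.0 + 0j, 0.4 - 0.2j],
     [0.4 + 0.2j, 1.0 + 0j]]

v_new, R_new = renormalize(v, R)
before = covariance(v, R)
after = covariance(v_new, R_new)

# Largest entry-wise difference between the two covariances.
max_diff = max(abs(a - b)
               for row_a, row_b in zip(before, after)
               for a, b in zip(row_a, row_b))
```

Both parameterizations yield the same covariance up to floating-point rounding, so the model cannot distinguish them; fixing the scale of \(\mathbf{R}\) places all of the loudness in \(v\).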
 
Metadata
Title
An Introduction to Multichannel NMF for Audio Source Separation
Authors
Alexey Ozerov
Cédric Févotte
Emmanuel Vincent
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-319-73031-8_4