Top

Published in:

2018 | OriginalPaper | Chapter

4. An Introduction to Multichannel NMF for Audio Source Separation

Authors : Alexey Ozerov, Cédric Févotte, Emmanuel Vincent

Published in: Audio Source Separation

Publisher: Springer International Publishing

Activate our intelligent search to find suitable subject content or patents.

search-config

AI-assisted search

Off

Abstract

This chapter introduces multichannel nonnegative matrix factorization (NMF) methods for audio source separation. All the methods and some of their extensions are introduced within a more general local Gaussian modeling (LGM) framework. These methods are very attractive since allow combining spatial and spectral cues in a joint and principal way, but also are natural extensions and generalizations of many single-channel NMF-based methods to the multichannel case. The chapter introduces the spectral (NMF-based) and spatial models, as well as the way to combine them within the LGM framework. Model estimation criteria and algorithms are described as well, while going deeper into details of some of them.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

inform now

previous chapter Dynamic Non-negative Models for Audio Source Separation

next chapter General Formulation of Multichannel Extensions of NMF Variants

Throughout the chapter we will generally refer to all these methods as multichannel NMF, while precising when we are speaking about multichannel NTF.

The spatial image of a source means not the source signal itself, but its contribution into the I-channel mixture.

Due to the scale ambiguity between \(\mathbf{R}_{jfn}\) and \(v_{jfn}\) in (4.2), the loudness can be fully attributed to \(v_{jfn}\).

When we write \(\overset{\mathrm{c}}{=}\), that means that the equality is up to some constant that is independent on model parameters \(\varvec{\theta }\), and thus has no influence on the optimization over parameters in (4.23).

Note that if the spatial covariances \(\mathbf{R}_{jf}\) are needed, they can be always computed with (4.29).

D.D. Lee, H.S. Seung, Learning the parts of objects with nonnegative matrix factorization. Nature 401, 788–791 (1999)CrossRefMATH

T. Virtanen, Monaural sound source separation by non-negative matrix factorization with temporal continuity and sparseness criteria. IEEE Trans. Audio Speech Lang. Process. 15(3),1066–1074 (2007)

M.N. Schmidt, R.K. Olsson, Single-channel speech separation using sparse non-negative matrix factorization, in Spoken Language Proceesing, ISCA International Conference on (INTERSPEECH) (2006)

L. Le Magoarou, A. Ozerov, N.Q. Duong, Text-informed audio source separation. Example-based approach using non-negative matrix partial co-factorization. J. Signal Process. Syst. 79(2), 117–131 (2015)CrossRef

C.Févotte, N. Bertin, J.-L. Durrieu, Nonnegative matrix factorization with the Itakura-Saito divergence: With application to music analysis. Neural Comput. 21(3), 793–830 (2009)

D. El Badawy, N.Q. Duong, A. Ozerov, On-the-fly audio source separation—a novel user-friendly framework. IEEE/ACM Trans. Audio Speech Lang. Process. 25(2), 261–272 (2017)CrossRef

E. Vincent, N. Bertin, R. Badeau, Adaptive harmonic spectral decomposition for multiple pitch estimation. IEEE Trans. Audio Speech Lang. Process. 18, 528–537 (2010)CrossRef

A. Ozerov, E. Vincent, F. Bimbot, A general flexible framework for the handling of prior information in audio source separation. IEEE Trans. Audio Speech Lang. Process. 20(4), 1118–1133 (2012)CrossRef

N. Mohammadiha, P. Smaragdis, A. Leijon, Supervised and unsupervised speech enhancement using nonnegative matrix factorization. IEEE Trans. Audio Speech Lang. Process. 21(10), 2140–2151 (2013)CrossRef

10.

D. FitzGerald, M. Cranitch, E. Coyle, Non-negative tensor factorisation for sound source separation, in Proceeding of the Irish Signals and Systems Conference, Dublin, Ireland, Sept 2005

11.

D. FitzGerald, M. Cranitch, E. Coyle, Extended nonnegative tensor factorisation models for musical sound source separation. Comput. Intell. Neurosci. 2008(872425),15 (2008)

12.

A. Ozerov, C. Févotte, Multichannel nonnegative matrix factorization in convolutive mixtures for audio source separation. IEEE Trans. Audio Speech Lang. Process. 18(3), 550–563 (2010)CrossRef

13.

H. Sawada, R. Mukai, S. Araki, S. Makino, A robust and precise method for solving the permutation problem of frequency-domain blind source separation. IEEE Trans. Speech Audio Process. 12(5), 530–538 (2004)CrossRef

14.

M.I. Mandel, D.P. Ellis, T. Jebara, An EM algorithm for localizing multiple sound sources in reverberant environments. NIPS. 19 (2006)

15.

A. Ozerov, C. Févotte, R. Blouet, J.-L. Durrieu, Multichannel nonnegative tensor factorization with structured constraints for user-guided audio source separation, in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Prague, (May 2011), pp. 257–260

16.

H. Sawada, H. Kameoka, S. Araki, N. Ueda, Multichannel extensions of non-negative matrix factorization with complex-valued data. IEEE Trans. Audio Speech Lang. Process. 21(5), 971–982 (2013)CrossRef

17.

J. Nikunen, T. Virtanen, Direction of arrival based spatial covariance model for blind sound source separation. IEEE/ACM Trans. Audio Speech Lang. Process. 22(3), 727–739 (2014)CrossRef

18.

N.Q. Duong, E. Vincent, R. Gribonval, Under-determined reverberant audio source separation using a full-rank spatial covariance model. IEEE Trans. Audio Speech Lang. Process. 18(7), 1830–1840 (2010)CrossRef

19.

C.Févotte, J.-F. Cardoso, Maximum likelihood approach for blind audio source separation using time-frequency gaussian source models, in IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, (IEEE, 2005), pp. 78–81

20.

E. Vincent, S. Arberet, R. Gribonval, Underdetermined instantaneous audio source separation via local gaussian modeling, in International Conference on Independent Component Analysis and Signal Separation. (Springer, 2009), pp. 775–782

21.

H. Kameoka, T. Yoshioka, M. Hamamura, J. Le Roux, K. Kashino, Statistical model of speech signals based on composite autoregressive system with application to blind source separation, in International Conference on Latent Variable Analysis and Signal Separation, (Springer, 2010), pp. 245–253

22.

T. Higuchi, H. Takeda, T. Nakamura, H. Kameoka, A unified approach for underdetermined blind signal separation and source activity detection by multichannel factorial hidden markov models, in INTERSPEECH, (2014), pp. 850–854

23.

J. Breebaart, S. van de Par, A. Kohlrausch, E. Schuijers, Parametric coding of stereo audio. EURASIP J. Appl. Signal Process. 2005, 1305–1322 (2005)MATH

24.

M.I. Mandel, R.J. Weiss, D.P. Ellis, Model-based expectation-maximization source separation and localization. IEEE Trans. Audio Speech Lang. Process. 18(2), 382–394 (2010)CrossRef

25.

E. Vincent, X. Rodet, Underdetermined source separation with structured source priors, in International Conference on Independent Component Analysis and Signal Separation, (Springer, 2004), pp. 327–334

26.

E. Vincent, Musical source separation using time-frequency source priors. IEEE Trans. Audio Speech Lang. Process. 14(1), 91–98 (2006)CrossRef

27.

S. Arberet, A. Ozerov, N.Q. Duong, E. Vincent, R. Gribonval, F. Bimbot, P. Vandergheynst, Nonnegative matrix factorization and spatial covariance model for under-determined reverberant audio source separation, in 10th International Conference on Information Sciences Signal Processing and their Applications (ISSPA), 2010, (IEEE, 2010), pp. 1–4

28.

T. Virtanen, A. Klapuri, Analysis of polyphonic audio using source-filter model and non-negative matrix factorization, in Advances in Models for Acoustic Processing, Neural Information Processing Systems Workshop, (Citeseer, 2006)

29.

N. Souviraà-Labastie, A. Olivero, E. Vincent, F. Bimbot, Multi-channel audio source separation using multiple deformed references. IEEE/ACM Trans. Audio Speech Lang. Process. (TASLP) 23(11), 1775–1787 (2015)CrossRef

30.

V.Y.F. Tan, C. Févotte, Automatic relevance determination in nonnegative matrix factorization with the beta-divergence. IEEE Trans. Pattern Anal. Mach. Intell. 35(7), 1592–1605 (2013)

31.

R. Bro, Parafac. tutorial and applications. Chemom. Intell. Lab. Syst. 38(2), 149–171 (1997)CrossRef

32.

L. Parra, C. Spence, Convolutive blind separation of non-stationary sources. IEEE Trans. Speech Audio Process. 8(3), 320–327 (2000)CrossRefMATH

33.

S. Gannot, E. Vincent, S. Markovich-Golan, A. Ozerov, A consolidated perspective on multimicrophone speech enhancement and source separation. IEEE/ACM Trans. Audio Speech Lang. Process. 25(4), 692–730 (2017)CrossRef

34.

N.Q. Duong, E. Vincent, R. Gribonval, Spatial location priors for gaussian model based reverberant audio source separation. EURASIP J. Adv. Signal Process. 2013(1), 149 (2013)CrossRef

35.

R. Badeau, M.D. Plumbley, Multichannel high-resolution nmf for modeling convolutive mixtures of non-stationary signals in the time-frequency domain. IEEE/ACM Trans. Audio Speech Lang. Process. (TASLP) 22(11), 1670–1680 (2014)CrossRef

36.

D. Kounades-Bastian, L. Girin, X. Alameda-Pineda, S. Gannot, R. Horaud, An inverse-gamma source variance prior with factorized parameterization for audio source separation, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), (IEEE, 2016), pp. 136–140

37.

N.Q. Duong, H. Tachibana, E. Vincent, N. Ono, R. Gribonval, S. Sagayama, Multichannel harmonic and percussive component separation by joint modeling of spatial and spectral continuity, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), (IEEE, 2011), pp. 205–208

38.

T. Higuchi, N. Takamune, T. Nakamura, H. Kameoka, Underdetermined blind separation and tracking of moving sources based on DOA-HMM, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), (IEEE, 2014), pp. 3191–3195

39.

D. Kounades-Bastian, L. Girin, X. Alameda-Pineda, S. Gannot, R. Horaud, A variational EM algorithm for the separation of time-varying convolutive audio mixtures. IEEE/ACM Trans. Audio Speech Lang. Process. 24(8), 1408–1423 (2016)CrossRef

40.

M. Togami, Online speech source separation based on maximum likelihood of local gaussian modeling, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). (IEEE, 2011), pp. 213–216

41.

L.S. Simon, E. Vincent, A general framework for online audio source separation, in International conference on Latent Variable Analysis and Signal Separation, (Springer, 2012), pp. 397–404

42.

N.Q. Duong, E. Vincent, R. Gribonval, Under-determined reverberant audio source separation using local observed covariance and auditory-motivated time-frequency representation, in International Conference on Latent Variable Analysis and Signal Separation, (Springer, 2010), pp. 73–80

43.

K. Adiloğlu, E. Vincent, Variational bayesian inference for source separation and robust feature extraction. IEEE/ACM Trans. Audio Speech Lang. Process. 24(10), 1746–1758 (2016)CrossRef

44.

A.P. Dempster, N.M. Laird, D.B. Rubin, Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat.Soc. Ser. B (Statistical Methodology) 39, 1–38 (1977)MathSciNetMATH

45.

J. Thiemann, E. Vincent, A fast EM algorithm for Gaussian model-based source separation, in Proceedings of the 21st European Signal Processing Conference (EUSIPCO), (IEEE, 2013), pp. 1–5

46.

D.R. Hunter, K. Lange, A tutorial on mm algorithms. Am. Stat. 58(1), 30–37 (2004)MathSciNetCrossRef

Title: An Introduction to Multichannel NMF for Audio Source Separation
Authors: Alexey Ozerov
Cédric Févotte
Emmanuel Vincent
Publisher: Springer International Publishing
Book: Audio Source Separation
Print ISBN: 978-3-319-73030-1

Electronic ISBN: 978-3-319-73031-8

Copyright Year: 2018
DOI: https://doi.org/10.1007/978-3-319-73031-8_4