2018 | OriginalPaper | Chapter

5. General Formulation of Multichannel Extensions of NMF Variants

Authors: Hirokazu Kameoka, Hiroshi Sawada, Takuya Higuchi

Published in: Audio Source Separation

Publisher: Springer International Publishing

Abstract

Blind source separation (BSS) is generally a mathematically ill-posed problem that involves separating individual source signals from microphone array inputs. The frequency-domain BSS approach is particularly notable in that it provides the flexibility needed to exploit various models for the time-frequency representations of source signals and/or array responses. Many frequency-domain BSS approaches can be categorized according to how the source power spectrograms and/or the mixing process are modeled. For source power spectrogram modeling, the non-negative matrix factorization (NMF) model and its variants have recently proved very powerful. For mixing process modeling, one reasonable approach is to introduce a plane wave assumption so that the spatial covariance of each source can be described explicitly using the direction of arrival (DOA). This chapter provides a general formulation of frequency-domain BSS that makes it possible to incorporate models for both the source power spectrogram and the source spatial covariance matrix. Through this formulation, we reveal the relationships among state-of-the-art BSS approaches. We further show that combining these models allows us to solve the problems of source separation, DOA estimation, dereverberation, and voice activity detection in a unified manner.
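The two modeling ingredients named in the abstract can be illustrated with a minimal sketch. It is not the chapter's formulation, whose equations are not reproduced on this page. It assumes a far-field source, a linear microphone array, a speed of sound of 343 m/s, and a rank-1 spatial covariance built from the plane-wave steering vector; all function and variable names are hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed value, not taken from the chapter


def steering_vector(freq_hz, doa_rad, mic_positions_m):
    """Far-field plane-wave steering vector for a linear array (assumed geometry)."""
    delays = mic_positions_m * np.sin(doa_rad) / SPEED_OF_SOUND
    return np.exp(-2j * np.pi * freq_hz * delays)


def rank1_spatial_covariance(freq_hz, doa_rad, mic_positions_m):
    """Spatial covariance a a^H determined entirely by the DOA."""
    a = steering_vector(freq_hz, doa_rad, mic_positions_m)
    return np.outer(a, a.conj())


def nmf_power_spectrogram(basis, activations):
    """NMF model of a source power spectrogram: non-negative basis times activations."""
    return basis @ activations


def mixture_covariance(source_powers, spatial_covs):
    """Covariance of the observed mixture at one time-frequency point,
    modeled as the power-weighted sum of the source spatial covariances."""
    return sum(v * R for v, R in zip(source_powers, spatial_covs))


if __name__ == "__main__":
    mics = np.array([0.0, 0.05])  # two microphones 5 cm apart (illustrative)
    covs = [rank1_spatial_covariance(1000.0, doa, mics) for doa in (0.0, np.pi / 6)]
    print(mixture_covariance([2.0, 0.5], covs).shape)
```

In this sketch the spectral model (NMF) and the spatial model (DOA-based covariance) are independent pieces that meet only in `mixture_covariance`, which is the sense in which such models can be combined in one formulation.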

The permutation alignment problem is the problem of grouping the separated components from different frequency bins that originate from the same source, so that a separated signal can be constructed for each source.

To maximize \(\mathscr {C}({\varvec{\theta }})\) instead, we use a minorizer \(\mathscr {D}({\varvec{\theta }},{\varvec{\alpha }})\), defined by the property \(\mathscr {C}({\varvec{\theta }}) = \max _{{\varvec{\alpha }}} \mathscr {D}({\varvec{\theta }},{\varvec{\alpha }})\).
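As a short worked step, the defining property of the minorizer is enough to show that alternating updates never decrease the objective. The iteration index \(k\) and the two-step update order below are notation introduced here for illustration, not taken from the chapter.

```latex
% Alternating updates (index k is an assumption of this sketch):
{\varvec{\alpha}}^{(k+1)} = \mathop{\mathrm{argmax}}_{{\varvec{\alpha}}}
    \mathscr{D}({\varvec{\theta}}^{(k)}, {\varvec{\alpha}}), \qquad
{\varvec{\theta}}^{(k+1)} = \mathop{\mathrm{argmax}}_{{\varvec{\theta}}}
    \mathscr{D}({\varvec{\theta}}, {\varvec{\alpha}}^{(k+1)}).

% Monotonic ascent of the objective:
\mathscr{C}({\varvec{\theta}}^{(k+1)})
  \ge \mathscr{D}({\varvec{\theta}}^{(k+1)}, {\varvec{\alpha}}^{(k+1)})
  \ge \mathscr{D}({\varvec{\theta}}^{(k)}, {\varvec{\alpha}}^{(k+1)})
  = \mathscr{C}({\varvec{\theta}}^{(k)}).
```

The first inequality holds because \(\mathscr {C}\) is the maximum of \(\mathscr {D}\) over \({\varvec{\alpha }}\), the second because \({\varvec{\theta }}^{(k+1)}\) maximizes \(\mathscr {D}\) for the fixed \({\varvec{\alpha }}^{(k+1)}\), and the final equality because \({\varvec{\alpha }}^{(k+1)}\) attains the maximum at \({\varvec{\theta }}^{(k)}\).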

Title: General Formulation of Multichannel Extensions of NMF Variants
Authors: Hirokazu Kameoka, Hiroshi Sawada, Takuya Higuchi
Publisher: Springer International Publishing
Book: Audio Source Separation
Print ISBN: 978-3-319-73030-1

Electronic ISBN: 978-3-319-73031-8

Copyright Year: 2018
DOI: https://doi.org/10.1007/978-3-319-73031-8_5