2018 | OriginalPaper | Chapter

5. General Formulation of Multichannel Extensions of NMF Variants

Authors: Hirokazu Kameoka, Hiroshi Sawada, Takuya Higuchi

Published in: Audio Source Separation

Publisher: Springer International Publishing

Abstract

Blind source separation (BSS) is generally a mathematically ill-posed problem that involves separating out individual source signals from microphone array inputs. The frequency domain BSS approach is particularly notable in that it provides the flexibility needed to exploit various models for the time-frequency representations of source signals and/or array responses. Many frequency domain BSS approaches can be categorized according to the way in which the source power spectrograms and/or the mixing process are modeled. For source power spectrogram modeling, the non-negative matrix factorization (NMF) model and its variants have recently proved very powerful. For mixing process modeling, one reasonable way involves introducing a plane wave assumption so that the spatial covariances of each source can be described explicitly using the direction of arrival (DOA). This chapter provides a general formulation of the frequency domain BSS that makes it possible to incorporate the models for the source power spectrogram and the source spatial covariance matrix. Through this formulation, we reveal the relationship between the state-of-the-art BSS approaches. We further show that combining these models allows us to solve the problems of source separation, DOA estimation, dereverberation, and voice activity detection in a unified manner.

Footnotes
1
The permutation alignment problem is the problem of grouping together the separated components from different frequency bins that originate from the same source, so that a separated signal can be constructed.
 
2
If we want to maximize \(\mathscr {C}({\varvec{\theta }})\), we use a minorizer instead, that is, an auxiliary function \(\mathscr {D}({\varvec{\theta }},{\varvec{\alpha }})\) satisfying \(\mathscr {C}({\varvec{\theta }}) = \max _{{\varvec{\alpha }}} \mathscr {D}({\varvec{\theta }},{\varvec{\alpha }})\).
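As a brief illustration of why such a minorizer is enough, the standard alternating-update argument can be sketched as follows. The iteration index \((k)\) and the two-step update order are assumptions introduced here for illustration; they are not taken from the chapter text. The notation follows the footnote.

```latex
\begin{align}
  % Step 1: tighten the bound at the current parameter estimate
  \boldsymbol{\alpha}^{(k+1)} &= \operatorname*{arg\,max}_{\boldsymbol{\alpha}}
      \mathscr{D}\bigl(\boldsymbol{\theta}^{(k)}, \boldsymbol{\alpha}\bigr), \\
  % Step 2: maximize the bound with respect to the parameters
  \boldsymbol{\theta}^{(k+1)} &= \operatorname*{arg\,max}_{\boldsymbol{\theta}}
      \mathscr{D}\bigl(\boldsymbol{\theta}, \boldsymbol{\alpha}^{(k+1)}\bigr), \\
  % Consequence: the objective never decreases
  \mathscr{C}\bigl(\boldsymbol{\theta}^{(k+1)}\bigr)
      &\ge \mathscr{D}\bigl(\boldsymbol{\theta}^{(k+1)}, \boldsymbol{\alpha}^{(k+1)}\bigr)
      \ge \mathscr{D}\bigl(\boldsymbol{\theta}^{(k)}, \boldsymbol{\alpha}^{(k+1)}\bigr)
      = \mathscr{C}\bigl(\boldsymbol{\theta}^{(k)}\bigr).
\end{align}
```

The first inequality holds because \(\mathscr {C}\) is the maximum of \(\mathscr {D}\) over \({\varvec{\alpha }}\), the second follows from Step 2, and the final equality follows from Step 1.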
 
Metadata

Title: General Formulation of Multichannel Extensions of NMF Variants

Authors: Hirokazu Kameoka, Hiroshi Sawada, Takuya Higuchi

Copyright Year: 2018

DOI: https://doi.org/10.1007/978-3-319-73031-8_5