Published: 8 December 2023

Underdetermined Blind Source Separation Based on Spatial Estimation and Compressed Sensing

Authors: Shuang Wei, Rui Zhang

Published in: Circuits, Systems, and Signal Processing | Issue 4/2024

Abstract

This paper proposes a dual-channel speech separation method based on spatial estimation via sparse Bayesian inference (SBI) and nonnegative matrix factorization (NMF). The spatial information estimated by traditional compressed sensing (CS) models is insufficient when the two microphones receive only a limited number of columns of mixed signals. Exploiting the sparsity of the peaks in the cross-correlation spectrum between the two received signals, the proposed method builds a new CS model on the cross-correlation spectrum and solves it with an SBI algorithm to improve the estimation accuracy of the spatial information. By combining the spatial information with the spectral features decomposed by NMF, the method generates NMF coefficient-matrix masks for each individual source, which are used for pre-separation. To mitigate residual interference components, a post-separation stage applies an expectation-maximization (EM) algorithm based on a Gaussian mixture model (GMM), with the estimated spatial information and binary time–frequency masks used to initialize the EM parameters. Experimental results on real-world speech data show that the proposed method achieves better separation performance than various existing methods.
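
The sparsity of cross-correlation peaks that the abstract relies on can be illustrated with a minimal Python sketch. It is not the authors' CS model or SBI solver. It only shows that, for two microphones and two sources, a plain FFT-based cross-correlation is dominated by a few peaks located at the inter-channel delays. The white-noise stand-ins for speech, the integer delays of 3 and 9 samples, and the helper names `delay` and `cross_correlation` are all assumptions made for illustration.

```python
import numpy as np

def delay(signal, d):
    """Delay a signal by d >= 0 samples, zero-padding the start (hypothetical helper)."""
    return np.concatenate((np.zeros(d), signal[:len(signal) - d]))

def cross_correlation(x1, x2, max_lag):
    """FFT-based cross-correlation of x2 against x1 for lags in [-max_lag, max_lag]."""
    n = len(x1) + len(x2)  # zero-pad so the circular correlation equals the linear one
    cross = np.fft.rfft(x2, n=n) * np.conj(np.fft.rfft(x1, n=n))
    cc = np.fft.irfft(cross, n=n)
    # Negative lags sit at the end of the inverse FFT output; move them to the front.
    cc = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))
    return np.arange(-max_lag, max_lag + 1), cc

rng = np.random.default_rng(0)
s1 = rng.standard_normal(4096)  # stand-in for source 1 (assumption: white noise)
s2 = rng.standard_normal(4096)  # stand-in for source 2

# Microphone 1 hears both sources directly; microphone 2 hears them with
# source-dependent delays, which is the spatial cue.
x1 = s1 + s2
x2 = delay(s1, 3) + delay(s2, 9)

lags, cc = cross_correlation(x1, x2, max_lag=20)

# Of the 41 candidate lags, only two carry large values: one per source.
peak_lags = sorted(lags[np.argsort(cc)[-2:]].tolist())
print(peak_lags)
```

Because only a handful of lags carry significant energy, the delay profile is a sparse vector, which is the kind of structure a sparse-recovery formulation such as the CS model described above can exploit. Real speech in reverberant rooms yields broader and noisier peaks than this idealized example.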

Metadata
Title
Underdetermined Blind Source Separation Based on Spatial Estimation and Compressed Sensing
Authors
Shuang Wei
Rui Zhang
Publication date
8 December 2023
Publisher
Springer US
Published in
Circuits, Systems, and Signal Processing / Issue 4/2024
Print ISSN: 0278-081X
Electronic ISSN: 1531-5878
DOI
https://doi.org/10.1007/s00034-023-02566-1