18.01.2024

Time-Frequency Bins Selection for Direction of Arrival Estimation Based on Speech Presence Probability Learning

Written by: Qinzheng Zhang, Haiyan Wang, Jesper Rindom Jensen, Shuai Tao, Mads Græsbøll Christensen

Published in: Circuits, Systems, and Signal Processing | Issue 5/2024

Abstract

Deep learning has driven significant progress in direction of arrival (DOA) estimation. However, the accuracy of end-to-end neural network (NN) approaches depends heavily on the networks' classification step, which requires large and representative datasets. In addition, conventional speech presence probability (SPP) estimation methods based on the ideal ratio mask (IRM) may misclassify time-frequency (T-F) bins dominated by non-speech and noise, which hinders the accurate extraction of directional information. To improve the robustness of existing DOA estimation algorithms, this paper proposes a DOA estimation method with T-F bin selection. On the output side, the proposed approach targets the a posteriori SPP instead of the IRM-based SPP, a deliberate choice intended to avoid such confusion. On the input side, we construct features that jointly capture spatial, temporal, and directional information and feed them to a frequency bin-wise recurrent neural network (RNN) model to obtain precise multi-channel SPP estimates. These SPP estimates are then used to extract local information for DOA estimation. Because the structure is cascaded, the model can handle out-of-label tasks: training on only a subset of directions suffices for omnidirectional DOA estimation, which reduces the dataset requirements. It also removes the algorithm's reliance on the step size, which sets it apart from other end-to-end methods. Simulation results show that the proposed method achieves higher accuracy and lower error than both NN-based end-to-end approaches and traditional full-band approaches under various reverberation and signal-to-noise ratio conditions.
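The idea of T-F bin selection can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the downstream DOA estimator, so a PHAT-weighted steered response for a two-microphone far-field array is used here as a stand-in. The function names `select_bins` and `doa_from_selected_bins`, the 0.5 threshold, and the 1° search grid are all assumptions made for illustration. In the paper, the SPP would come from the frequency bin-wise RNN; here it is simply an input array.

```python
import numpy as np


def select_bins(spp, threshold=0.5):
    """Boolean mask of T-F bins whose SPP exceeds the threshold.

    The threshold value is an illustrative assumption, not taken from the paper.
    """
    return spp > threshold


def doa_from_selected_bins(X, mask, freqs, mic_distance, c=343.0, grid_deg=None):
    """Estimate a far-field DOA in degrees (0..180) for two microphones.

    Stand-in estimator (assumption): PHAT-weighted steered response,
    accumulated only over the selected T-F bins.

    X            : complex STFT, shape (2, n_freq, n_frames)
    mask         : boolean bin selection, shape (n_freq, n_frames)
    freqs        : bin centre frequencies in Hz, shape (n_freq,)
    mic_distance : microphone spacing in metres
    """
    if grid_deg is None:
        grid_deg = np.arange(0.0, 180.5, 1.0)

    # Cross-spectrum between the two channels, normalised to unit magnitude (PHAT).
    cross = X[0] * np.conj(X[1])
    cross = cross / np.maximum(np.abs(cross), 1e-12)

    # Discard bins that were not selected, then sum over frames.
    per_freq = np.where(mask, cross, 0.0).sum(axis=1)

    # Candidate inter-microphone delays; microphone 1 is assumed to lag microphone 0.
    tau = mic_distance * np.cos(np.deg2rad(grid_deg)) / c
    steer = np.exp(-2j * np.pi * freqs[None, :] * tau[:, None])

    # Steered response per candidate angle; the peak gives the DOA estimate.
    power = np.real(steer @ per_freq)
    return float(grid_deg[np.argmax(power)])
```

Bins with low SPP contribute nothing to the steered response, so noise-dominated bins cannot pull the peak away from the speech direction. A soft weighting by the SPP values would be a natural variation on this hard selection.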

Metadata
Title
Time-Frequency Bins Selection for Direction of Arrival Estimation Based on Speech Presence Probability Learning
Written by
Qinzheng Zhang
Haiyan Wang
Jesper Rindom Jensen
Shuai Tao
Mads Græsbøll Christensen
Publication date
18.01.2024
Publisher
Springer US
Published in
Circuits, Systems, and Signal Processing / Issue 5/2024
Print ISSN: 0278-081X
Electronic ISSN: 1531-5878
DOI
https://doi.org/10.1007/s00034-023-02586-x