18-01-2024

Time-Frequency Bins Selection for Direction of Arrival Estimation Based on Speech Presence Probability Learning

Authors: Qinzheng Zhang, Haiyan Wang, Jesper Rindom Jensen, Shuai Tao, Mads Græsbøll Christensen

Published in: Circuits, Systems, and Signal Processing | Issue 5/2024

Abstract

Deep learning has driven significant progress in direction of arrival (DOA) estimation. However, the accuracy of end-to-end neural network (NN) approaches relies heavily on the network's classification step, which requires large and representative datasets. In addition, conventional speech presence probability (SPP) estimation methods based on the ideal ratio mask (IRM) may misclassify time-frequency (T-F) bins dominated by non-speech and noise, which hinders accurate extraction of directional information. To improve the robustness of existing DOA estimation algorithms, this paper proposes a DOA estimation method with T-F bin selection. On the output side, the proposed approach targets the a posteriori SPP rather than the IRM-based SPP, a deliberate choice intended to avoid potential confusion. On the input side, we construct features that jointly capture spatial, temporal, and directional information and couple them with a frequency bin-wise recurrent neural network (RNN) model to obtain precise multi-channel SPP estimates. These SPP estimates are then used to extract local information for DOA estimation. Moreover, the cascaded structure allows the model to handle out-of-label tasks: training on only a subset of directions suffices for omnidirectional DOA estimation, which reduces the dataset requirements. It also removes the algorithm's reliance on the step size, which sets it apart from other end-to-end methods. Simulation results show that the proposed method achieves higher accuracy and lower error than both NN-based end-to-end approaches and traditional full-band approaches under various reverberation and signal-to-noise ratio conditions.
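The idea of selecting T-F bins by SPP before estimating the DOA can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the SPP map is already available (the paper estimates it with a frequency bin-wise RNN), and it swaps in a generic PHAT-weighted steered-response grid search over a single two-microphone pair for the DOA stage. The function names `select_bins` and `doa_from_selected_bins`, the 0.5 threshold, and the 1-degree grid are hypothetical choices made only for illustration.

```python
import numpy as np


def select_bins(spp, threshold=0.5):
    """Return a boolean mask of T-F bins whose SPP exceeds the threshold.

    The threshold value is an illustrative assumption, not taken from the paper.
    """
    return spp > threshold


def doa_from_selected_bins(X1, X2, freqs, mask, mic_dist, c=343.0, n_angles=181):
    """Estimate a DOA in degrees (0..180) from the selected T-F bins only.

    X1, X2 : complex STFTs of the two microphones, shape (n_freq, n_frames)
    freqs  : bin centre frequencies in Hz, shape (n_freq,)
    mask   : boolean array, shape (n_freq, n_frames); True marks bins to use
    Assumes a far-field source whose signal reaches microphone 2 later than
    microphone 1 by mic_dist * cos(angle) / c seconds.
    """
    angles = np.linspace(0.0, 180.0, n_angles)

    # PHAT-normalised cross-spectrum, zeroed outside the selected bins.
    cross = X1 * np.conj(X2)
    cross = cross / np.maximum(np.abs(cross), 1e-12)
    cross = np.where(mask, cross, 0.0)
    cross_f = cross.sum(axis=1)  # accumulate over frames, shape (n_freq,)

    # Candidate inter-microphone delays and the matching phase compensation.
    delays = mic_dist * np.cos(np.deg2rad(angles)) / c
    steer = np.exp(-2j * np.pi * freqs[None, :] * delays[:, None])

    # Steered response: largest where the compensated phases add coherently.
    score = np.real(steer @ cross_f)
    return angles[np.argmax(score)]
```

Calling `select_bins` on an SPP map and passing the resulting mask to `doa_from_selected_bins` restricts the estimate to speech-dominated bins, so bins flagged as noise contribute nothing to the steered response. Note that this grid search depends on an angular step, unlike the step-size-free behaviour the abstract claims for the proposed method.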

Metadata
Title
Time-Frequency Bins Selection for Direction of Arrival Estimation Based on Speech Presence Probability Learning
Authors
Qinzheng Zhang
Haiyan Wang
Jesper Rindom Jensen
Shuai Tao
Mads Græsbøll Christensen
Publication date
18-01-2024
Publisher
Springer US
Published in
Circuits, Systems, and Signal Processing / Issue 5/2024
Print ISSN: 0278-081X
Electronic ISSN: 1531-5878
DOI
https://doi.org/10.1007/s00034-023-02586-x