Single-channel Multi-speakers Speech Separation Based on Isolated Speech Segments

Authors: Shanfa Ke, Zhongyuan Wang, Ruimin Hu, Xiaochen Wang

Published in: Neural Processing Letters | Issue 1/2023

Abstract

In a real multi-speaker scenario, the signal collected by the microphone contains many time periods in which only one speaker is talking; we call these isolated speech segments. In view of this fact, this paper proposes a single-channel multi-speaker speech separation method based on the similarity between a speaker feature center and the mixture features in a deep embedding space. In particular, the isolated speech segments extracted from the observed signal are converted into deep embedding vectors, from which a speaker feature center is created. The similarity between this center and the deep embedding features of the mixture is used as a mask for the corresponding speaker, and the mask is applied to separate that speaker's speech. A residual deep embedding network built from stacked 2-D convolutional blocks, rather than bi-directional long short-term memory, is proposed for faster processing and better feature extraction. In addition, an isolated speech segment extraction method based on Chimera++ is proposed, since earlier experiments showed that the Chimera++ algorithm separates segments containing only one speaker well. Evaluation results on standard datasets show that the proposed method outperforms competing algorithms by up to 0.94 dB in signal-to-distortion ratio (SDR).
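
As a rough sketch of the similarity-based masking idea described in the abstract: a speaker feature center is averaged from the embeddings of that speaker's isolated segments, then compared against the per-time-frequency-bin embeddings of the mixture to form a soft mask. All names, shapes, and the cosine-similarity choice below are illustrative assumptions, not the authors' actual implementation:

```python
import numpy as np

def speaker_mask(mix_emb, isolated_emb):
    """Hypothetical sketch of similarity-based masking.

    mix_emb:      (T, F, D) deep embeddings of the mixture, one
                            D-dim vector per time-frequency bin
    isolated_emb: (N, D)    embeddings of T-F bins taken from one
                            speaker's isolated speech segments
    Returns a (T, F) soft mask in [0, 1] for that speaker.
    """
    # Speaker feature center: mean embedding over the isolated bins.
    center = isolated_emb.mean(axis=0)
    center /= np.linalg.norm(center) + 1e-8

    # Cosine similarity between the center and each mixture embedding.
    norms = np.linalg.norm(mix_emb, axis=-1, keepdims=True) + 1e-8
    sim = (mix_emb / norms) @ center          # (T, F), values in [-1, 1]

    # Rescale to [0, 1] so it can be applied as a spectrogram mask.
    return (sim + 1.0) / 2.0

# Usage (assumed pipeline): separated magnitude spectrogram =
#   speaker_mask(mix_emb, iso_emb) * mixture_magnitude
```
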

Metadata
Title
Single-channel Multi-speakers Speech Separation Based on Isolated Speech Segments
Authors
Shanfa Ke
Zhongyuan Wang
Ruimin Hu
Xiaochen Wang
Publication date
10-06-2022
Publisher
Springer US
Published in
Neural Processing Letters / Issue 1/2023
Print ISSN: 1370-4621
Electronic ISSN: 1573-773X
DOI
https://doi.org/10.1007/s11063-022-10887-6
