Published in: International Journal of Speech Technology 2/2020

09-05-2020

A study on unsupervised monaural reverberant speech separation

Authors: R. Hemavathi, R. Kumaraswamy


Abstract

Separating individual source signals is a challenging task in musical and multitalker source separation. This work studies unsupervised monaural (co-channel) speech separation (UCSS) in reverberant environments. UCSS is the problem of separating individual speakers from multispeaker speech without any training data and with minimal information about the sources and mixing conditions. In this paper, state-of-the-art UCSS algorithms based on auditory and statistical approaches are evaluated on reverberant speech mixtures, and the results are discussed. This work also proposes using multiresolution cochleagram and Constant Q Transform (CQT) spectrogram features with two-dimensional non-negative matrix factorization. Results show that the proposed algorithm with the CQT spectrogram feature yields improvements of 1.986 and 1.262 in speech intelligibility and of 0.296 dB and 0.561 dB in signal-to-interference ratio over the state-of-the-art statistical and auditory approaches, respectively, at a T60 of 0.610 s.
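The factorization-plus-masking idea underlying the proposed approach can be illustrated with a minimal sketch. The example below uses plain one-dimensional NMF with Lee–Seung multiplicative updates on a toy magnitude "spectrogram" and Wiener-style soft masks to split it into two source estimates; the matrix shapes, rank, iteration count, and the synthetic input are illustrative assumptions, not the paper's actual two-dimensional NMF configuration or CQT front end.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Euclidean-distance NMF via Lee-Seung multiplicative updates.

    Factorizes a non-negative matrix V (freq x time) as W @ H, where
    W holds spectral basis vectors and H their temporal activations.
    """
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, rank)) + eps
    H = rng.random((rank, T)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy "spectrogram": two sources with disjoint spectral peaks active in
# different time frames (a hypothetical stand-in for a CQT spectrogram
# or cochleagram magnitude representation).
F, T = 64, 100
src_a = np.zeros((F, T)); src_a[10, :50] = 1.0   # low "note", first half
src_b = np.zeros((F, T)); src_b[40, 50:] = 1.0   # high "note", second half
V = src_a + src_b

W, H = nmf(V, rank=2)

# Rank-1 component reconstructions, turned into Wiener-style soft masks
# that distribute the mixture energy between the two source estimates.
parts = [np.outer(W[:, k], H[k]) for k in range(2)]
total = parts[0] + parts[1] + 1e-9
est = [V * (p / total) for p in parts]
```

Because the masks are ratios of the component reconstructions, the two estimates always sum back to the mixture spectrogram; separation quality then depends entirely on how cleanly the learned bases isolate the individual sources, which is exactly what richer time-frequency features such as the CQT spectrogram are meant to help with.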


Metadata
Title
A study on unsupervised monaural reverberant speech separation
Authors
R. Hemavathi
R. Kumaraswamy
Publication date
09-05-2020
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 2/2020
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-020-09706-x
