Published in: Cluster Computing 4/2019

14-02-2018

Research on speech separation technology based on deep learning

Authors: Yan Zhou, Heming Zhao, Jie Chen, Xinyu Pan

Published in: Cluster Computing | Special Issue 4/2019

Abstract

To address the instability of traditional speech separation algorithms, a deep-learning-based model for separating reverberant speech is proposed, and the problem of speech separation in reverberant environments is studied. Auditory scene analysis is used to simulate human auditory perception, and the target speech signal is extracted according to the ideal binary mask (IBM) principle. Deep neural networks (DNNs) have shown strong learning ability in speech recognition and artificial intelligence. In this paper, a DNN model is proposed that performs dereverberation and denoising by learning the spectral mapping from “contaminated” speech to clean speech. A series of spectral features is extracted, and the temporal dynamics of adjacent frames are fused into the input. The DNN transforms the encoded spectrum to restore the clean speech spectrum, from which the time-domain signal is finally reconstructed. In addition, the feature classification ability of the DNN is used to separate binaural reverberant speech: the binaural features ITD (interaural time difference) and ILD (interaural level difference) and the monaural GFCC (gammatone frequency cepstral coefficient) features are fused into one long feature vector, and a DNN pre-trained with restricted Boltzmann machines (RBMs) performs the classification. The results show that the proposed model improves the quality and intelligibility of the separated speech and significantly enhances the stability of the system.
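The ideal binary mask (IBM), which labels each time-frequency unit as target-dominant or not, can be illustrated with a short sketch. It assumes that the target and interference energies of every unit are known separately (as they are when training labels are computed from premixed signals) and uses a local criterion (LC) of 0 dB, a common default rather than a value stated in this abstract. The function names `ideal_binary_mask` and `apply_mask` are hypothetical, and this is not the authors' implementation.

```python
import math

# Hypothetical helper names; the 0 dB local criterion (LC) is an assumed default.

def ideal_binary_mask(target_energy, noise_energy, lc_db=0.0, eps=1e-12):
    """Ideal binary mask over frames x channels.

    A unit is 1 when its local SNR (target over interference, in dB)
    exceeds lc_db, otherwise 0.
    """
    mask = []
    for t_row, n_row in zip(target_energy, noise_energy):
        mask.append([1 if 10.0 * math.log10((s + eps) / (n + eps)) > lc_db else 0
                     for s, n in zip(t_row, n_row)])
    return mask

def apply_mask(mixture, mask):
    """Keep mixture units labelled 1 and zero out the others."""
    return [[x * m for x, m in zip(x_row, m_row)] for x_row, m_row in zip(mixture, mask)]

target = [[4.0, 0.1], [1.0, 2.0]]
noise = [[1.0, 1.0], [1.0, 0.5]]
print(ideal_binary_mask(target, noise))  # [[1, 0], [0, 1]]
```

A unit whose local SNR equals the criterion exactly (the 0 dB unit above) is labelled 0 because the comparison is strict; lowering `lc_db` keeps more units and trades interference suppression for less target loss.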
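Fusing the temporal dynamics of adjacent frames is commonly done by splicing each feature frame with its neighbours before it is fed to the DNN. The sketch below assumes a symmetric context of `context` frames on each side, with the first and last frames repeated at the edges; the function name `splice_frames` and this padding scheme are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical helper; context size and edge padding are assumptions.

def splice_frames(features, context=2):
    """Concatenate each frame with `context` neighbouring frames on each side.

    features: list of frames, each a list of floats.
    Out-of-range neighbours are replaced by the nearest edge frame, so every
    output frame has (2 * context + 1) * dim values.
    """
    n = len(features)
    spliced = []
    for t in range(n):
        window = []
        for offset in range(-context, context + 1):
            idx = min(max(t + offset, 0), n - 1)  # clamp to a valid frame index
            window.extend(features[idx])
        spliced.append(window)
    return spliced

print(splice_frames([[1.0], [2.0], [3.0]], context=1))
# [[1.0, 1.0, 2.0], [1.0, 2.0, 3.0], [2.0, 3.0, 3.0]]
```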
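The final time-domain reconstruction can be sketched as follows, under two assumptions that the abstract does not state: the DNN outputs log-magnitude spectra, and the phase of the contaminated mixture is reused for resynthesis. A naive DFT keeps the example dependency-free, and the STFT settings (`n_fft=8`, `hop=4`, periodic Hann window) are toy values. All function names are hypothetical, and the "DNN output" in the usage lines is only a stand-in.

```python
import cmath
import math

# Hypothetical helpers; toy STFT settings (n_fft=8, hop=4) are assumptions.

def hann(n_fft):
    """Periodic Hann window."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n_fft) for i in range(n_fft)]

def dft(x):
    """Naive O(N^2) discrete Fourier transform (keeps the sketch dependency-free)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * f * k / n) for k in range(n)) for f in range(n)]

def idft(spec):
    """Naive inverse DFT, returning the real part of the result."""
    n = len(spec)
    return [sum(spec[f] * cmath.exp(2j * math.pi * f * k / n) for f in range(n)).real / n
            for k in range(n)]

def stft(signal, n_fft=8, hop=4):
    """Windowed frames of the signal, each transformed with the DFT."""
    w = hann(n_fft)
    return [dft([signal[s + i] * w[i] for i in range(n_fft)])
            for s in range(0, len(signal) - n_fft + 1, hop)]

def istft(frames, n_fft=8, hop=4, length=None):
    """Weighted overlap-add: sum of w * frame, divided by the summed squared window."""
    w = hann(n_fft)
    total = hop * (len(frames) - 1) + n_fft
    out, norm = [0.0] * total, [0.0] * total
    for t, spec in enumerate(frames):
        for i, sample in enumerate(idft(spec)):
            out[t * hop + i] += sample * w[i]
            norm[t * hop + i] += w[i] ** 2
    y = [o / d if d > 1e-8 else 0.0 for o, d in zip(out, norm)]
    return y[:length] if length is not None else y

def reconstruct(est_log_mag, noisy_frames, n_fft=8, hop=4, length=None):
    """Combine estimated log-magnitudes with the noisy (mixture) phase, then resynthesise."""
    combined = [[math.exp(lm) * cmath.exp(1j * cmath.phase(z)) for lm, z in zip(lm_row, z_row)]
                for lm_row, z_row in zip(est_log_mag, noisy_frames)]
    return istft(combined, n_fft, hop, length)

x = [math.sin(0.3 * n) + 0.1 * n for n in range(24)]
noisy = stft(x)
# Stand-in for the DNN output: the log-magnitude of the input itself.
est = [[math.log(abs(z) + 1e-12) for z in row] for row in noisy]
y = reconstruct(est, noisy, length=24)
```

With unmodified spectra the weighted overlap-add returns the input wherever the summed squared window is non-zero; only the first sample is lost here, because the periodic Hann window is zero there and no other frame covers it. A real system would pad the signal and use an FFT library instead of the naive DFT.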
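The binaural cues and their fusion with GFCCs can be sketched for a single frequency channel and frame. ITD is estimated here as the lag of the cross-correlation peak within a ±`max_lag` search range, and ILD as the left-to-right energy ratio in dB; both are standard definitions, but the exact estimators used in the paper are not given in the abstract. The GFCC vector is assumed to be precomputed by a gammatone front end, and the function names are hypothetical.

```python
import math

# Hypothetical helpers; single-channel cues shown for brevity.

def itd_samples(left, right, max_lag):
    """Interaural time difference in samples (positive: right ear lags the left).

    Picks the lag that maximises the cross-correlation, normalised by the
    total energy of both channels, within [-max_lag, max_lag].
    Divide by the sampling rate to obtain seconds.
    """
    n = len(left)
    norm = math.sqrt(sum(x * x for x in left) * sum(x * x for x in right)) or 1.0
    best_lag, best_score = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        score = sum(left[i] * right[i + lag] for i in range(n) if 0 <= i + lag < n) / norm
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def ild_db(left, right, eps=1e-12):
    """Interaural level difference in dB (left energy over right energy)."""
    e_left = sum(x * x for x in left)
    e_right = sum(x * x for x in right)
    return 10.0 * math.log10((e_left + eps) / (e_right + eps))

def fuse_features(itd, ild, gfcc):
    """Concatenate binaural cues and monaural GFCCs into one long feature vector.

    In a full system this would be repeated for every gammatone channel and
    the per-channel vectors concatenated.
    """
    return [float(itd), float(ild)] + [float(c) for c in gfcc]

left = [0.0, 0.0, 1.0, 0.5, -0.3, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
right = [0.0, 0.0, 0.0] + left[:-3]  # right channel delayed by 3 samples
print(itd_samples(left, right, max_lag=4))  # 3
```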
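RBM pre-training of a DNN is usually done greedily, one layer at a time, with contrastive divergence. The sketch below trains a single Bernoulli-Bernoulli RBM with one-step contrastive divergence (CD-1) in plain Python; the class name, learning rate, initialisation and toy data are assumptions for illustration. Because the fused ITD/ILD/GFCC features are real-valued, a first-layer RBM in practice would typically be Gaussian-Bernoulli, which this sketch does not implement.

```python
import math
import random

# Hypothetical class; hyperparameters and toy data are assumptions.

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

class BernoulliRBM:
    """Minimal Bernoulli-Bernoulli RBM trained with CD-1."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = random.Random(seed)
        self.W = [[self.rng.gauss(0.0, 0.01) for _ in range(n_hidden)] for _ in range(n_visible)]
        self.b_v = [0.0] * n_visible  # visible biases
        self.b_h = [0.0] * n_hidden   # hidden biases

    def hidden_probs(self, v):
        return [sigmoid(self.b_h[j] + sum(v[i] * self.W[i][j] for i in range(len(v))))
                for j in range(len(self.b_h))]

    def visible_probs(self, h):
        return [sigmoid(self.b_v[i] + sum(h[j] * self.W[i][j] for j in range(len(h))))
                for i in range(len(self.b_v))]

    def sample(self, probs):
        return [1.0 if self.rng.random() < p else 0.0 for p in probs]

    def cd1_update(self, v0, lr=0.1):
        """One CD-1 step on a single visible vector v0."""
        h0 = self.hidden_probs(v0)
        v1 = self.visible_probs(self.sample(h0))  # reconstruction from sampled hiddens
        h1 = self.hidden_probs(v1)
        for i in range(len(v0)):
            for j in range(len(h0)):
                # positive-phase minus negative-phase statistics
                self.W[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
        for i in range(len(v0)):
            self.b_v[i] += lr * (v0[i] - v1[i])
        for j in range(len(h0)):
            self.b_h[j] += lr * (h0[j] - h1[j])

    def reconstruction_error(self, v):
        """Mean squared error of a deterministic (mean-field) reconstruction."""
        v_rec = self.visible_probs(self.hidden_probs(v))
        return sum((a - b) ** 2 for a, b in zip(v, v_rec)) / len(v)

rbm = BernoulliRBM(n_visible=4, n_hidden=3, seed=0)
data = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
for epoch in range(200):
    for v in data:
        rbm.cd1_update(v, lr=0.1)
```

After pre-training, the learned weights and hidden biases would initialise one hidden layer of the DNN, that layer's hidden activations would become the training data for the next RBM, and the whole network would then be fine-tuned with backpropagation for the classification task.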

Metadata
Title
Research on speech separation technology based on deep learning
Authors
Yan Zhou
Heming Zhao
Jie Chen
Xinyu Pan
Publication date
14-02-2018
Publisher
Springer US
Published in
Cluster Computing, Special Issue 4/2019
Print ISSN: 1386-7857
Electronic ISSN: 1573-7543
DOI
https://doi.org/10.1007/s10586-018-2013-6
