
2017 | Original Paper | Book Chapter

Discriminative Enhancement for Single Channel Audio Source Separation Using Deep Neural Networks

Authors: Emad M. Grais, Gerard Roma, Andrew J. R. Simpson, Mark D. Plumbley

Published in: Latent Variable Analysis and Signal Separation

Publisher: Springer International Publishing


Abstract

Most single channel audio source separation techniques produce separated sources that are distorted, and each separated source contains residual signals from the other sources. To tackle this problem, we propose to enhance the separated sources using deep neural networks (DNNs) to decrease both the distortion and the interference between them. Two DNNs are used in this work: the first separates the sources from the mixed signal, and the second enhances the separated signals. To account for the interactions between the separated sources, we propose to use a single DNN to enhance all the separated sources together. To reduce the residual of one source in the other separated sources (interference), we train the enhancement DNN discriminatively to maximize the dissimilarity between the predicted sources. The experimental results show that discriminative enhancement decreases both the distortion and the interference between the separated sources.
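The discriminative training idea described in the abstract can be sketched as a cost function that penalizes the error of each predicted source against its own reference while rewarding dissimilarity to the other references. The sketch below is illustrative only, not the paper's exact formulation: the array shapes and the dissimilarity weight `lam` are assumptions introduced here for the example.

```python
import numpy as np

def discriminative_loss(pred, ref, lam=0.05):
    """Discriminative cost for source enhancement.

    pred, ref: arrays of shape (n_sources, n_frames, n_bins)
               holding predicted and reference spectrogram frames.
    lam:       hypothetical weight trading fidelity against
               dissimilarity between sources.
    Minimizing this cost drives each prediction toward its own
    reference and away from the other sources' references.
    """
    n = pred.shape[0]
    loss = 0.0
    for i in range(n):
        # fidelity: predicted source i should match reference i
        loss += np.mean((pred[i] - ref[i]) ** 2)
        for j in range(n):
            if j != i:
                # dissimilarity: predicted source i should differ
                # from the other references (reduces interference)
                loss -= lam * np.mean((pred[i] - ref[j]) ** 2)
    return loss
```

With `lam = 0`, this reduces to the ordinary mean-squared-error training objective; increasing `lam` trades some reconstruction fidelity for lower interference between the separated sources.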


Metadata
Title
Discriminative Enhancement for Single Channel Audio Source Separation Using Deep Neural Networks
Authors
Emad M. Grais
Gerard Roma
Andrew J. R. Simpson
Mark D. Plumbley
Copyright year
2017
DOI
https://doi.org/10.1007/978-3-319-53547-0_23