
2017 | Original Paper | Book Chapter

Discriminative Enhancement for Single Channel Audio Source Separation Using Deep Neural Networks

Authors: Emad M. Grais, Gerard Roma, Andrew J. R. Simpson, Mark D. Plumbley

Published in: Latent Variable Analysis and Signal Separation

Publisher: Springer International Publishing


Abstract

Most single channel audio source separation techniques produce separated sources that are distorted, and each separated source contains residual signals from the other sources. To tackle this problem, we propose to enhance the separated sources using deep neural networks (DNNs) to decrease both the distortion and the interference between them. Two DNNs are used in this work: the first separates the sources from the mixed signal, and the second enhances the separated signals. To account for the interactions between the separated sources, we propose to use a single DNN to enhance all the separated sources together. To reduce the residual of one source in the other separated sources (interference), we train the enhancement DNN discriminatively to maximize the dissimilarity between the predicted sources. The experimental results show that discriminative enhancement decreases both the distortion and the interference between the separated sources.
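The discriminative training idea described in the abstract can be sketched as a cost function that penalizes the error of each predicted source against its own reference while rewarding dissimilarity to the other references. The sketch below is illustrative only, not the paper's exact formulation: the array shapes and the dissimilarity weight `lam` are assumptions introduced here for the example.

```python
import numpy as np

def discriminative_loss(pred, ref, lam=0.05):
    """Discriminative cost for source enhancement.

    pred, ref: arrays of shape (n_sources, n_frames, n_bins)
               holding predicted and reference spectrogram frames.
    lam:       hypothetical weight trading fidelity against
               dissimilarity between sources.
    Minimizing this cost drives each prediction toward its own
    reference and away from the other sources' references.
    """
    n = pred.shape[0]
    loss = 0.0
    for i in range(n):
        # fidelity: predicted source i should match reference i
        loss += np.mean((pred[i] - ref[i]) ** 2)
        for j in range(n):
            if j != i:
                # dissimilarity: predicted source i should differ
                # from the other references (reduces interference)
                loss -= lam * np.mean((pred[i] - ref[j]) ** 2)
    return loss
```

With `lam = 0`, this reduces to the ordinary mean-squared-error training objective; increasing `lam` trades some reconstruction fidelity for lower interference between the separated sources.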


Metadata
Title
Discriminative Enhancement for Single Channel Audio Source Separation Using Deep Neural Networks
Authors
Emad M. Grais
Gerard Roma
Andrew J. R. Simpson
Mark D. Plumbley
Copyright year
2017
DOI
https://doi.org/10.1007/978-3-319-53547-0_23