2018 | OriginalPaper | Chapter

2. Separation of Known Sources Using Non-negative Spectrogram Factorisation

Authors: Tuomas Virtanen, Tom Barker

Published in: Audio Source Separation

Publisher: Springer International Publishing


Abstract

This chapter presents non-negative spectrogram factorisation (NMF) techniques that can be used to separate sources when source-specific training material is available in advance. We first present the basic NMF formulation for sound mixtures, followed by criteria and algorithms for estimating the model parameters. We then introduce selected methods for training the NMF source models using vector quantisation, convexity constraints, archetypal analysis, or discriminative methods. We also explain how the learned dictionaries can be adapted to deal with mismatches between the training data and the usage scenario. Next, we show how semi-supervised learning can be used to deal with unknown noise sources within a mixture. Finally, we introduce a coupled NMF method that can model a large temporal context while retaining low algorithmic latency.
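
As a rough illustration of the supervised setting the abstract describes, the sketch below separates a mixture spectrogram using dictionaries that are assumed to have been trained in advance. It is not taken from the chapter: it assumes the generalised Kullback-Leibler divergence with the widely used multiplicative update for the activations, followed by Wiener-style soft masks. The function names (`estimate_activations`, `separate`), the "speech" and "noise" labels, and the random toy data are all hypothetical.

```python
import numpy as np


def estimate_activations(V, W, n_iter=200, eps=1e-12, seed=0):
    """Estimate non-negative activations H such that V is approximately W @ H.

    The dictionary W is held fixed (it is assumed to be pre-trained).
    The update is the multiplicative rule for the generalised
    Kullback-Leibler divergence; this choice is an assumption of the sketch.
    """
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    col_sums = W.sum(axis=0)[:, None] + eps  # W^T 1, one value per atom
    for _ in range(n_iter):
        ratio = V / (W @ H + eps)
        H *= (W.T @ ratio) / col_sums
    return H


def separate(V, dictionaries, n_iter=200, eps=1e-12):
    """Split a mixture magnitude spectrogram V into one estimate per source."""
    W = np.hstack(dictionaries)  # concatenate the source dictionaries
    H = estimate_activations(V, W, n_iter=n_iter, eps=eps)
    model = W @ H + eps
    estimates = []
    start = 0
    for W_s in dictionaries:
        stop = start + W_s.shape[1]
        source_model = W_s @ H[start:stop]
        # Soft mask: this source's share of the modelled mixture.
        estimates.append(V * source_model / model)
        start = stop
    return estimates


# Hypothetical toy data: random "speech" and "noise" dictionaries.
rng = np.random.default_rng(0)
W_speech = rng.random((64, 5))
W_noise = rng.random((64, 3))
V = W_speech @ rng.random((5, 20)) + W_noise @ rng.random((3, 20))

speech_est, noise_est = separate(V, [W_speech, W_noise])
```

Because the masks sum to one in every time-frequency bin (up to the small constant `eps`), the source estimates add up to the observed mixture. In practice the dictionaries would be learned from source-specific training spectrograms rather than drawn at random.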


Metadata
Title
Separation of Known Sources Using Non-negative Spectrogram Factorisation
Authors
Tuomas Virtanen
Tom Barker
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-319-73031-8_2