2018 | OriginalPaper | Chapter

8. Efficient Source Separation Using Bitwise Neural Networks

Authors : Minje Kim, Paris Smaragdis

Published in: Audio Source Separation

Publisher: Springer International Publishing

Abstract

Efficiency is one of the key issues in single-channel source separation systems because they are often used for real-time processing. More computationally demanding approaches tend to produce better results but are often not fast enough to be deployed in practical systems. For example, unlike iterative separation algorithms that use source-specific dictionaries, a Deep Neural Network (DNN) performs separation via an iteration-free feedforward process. However, even the feedforward process can be very complex, depending on the size of the network. In this chapter, we introduce Bitwise Neural Networks (BNN) as an extremely compact form of neural networks, whose feedforward pass uses only efficient bitwise operations (e.g., XNOR instead of multiplication) on binary weight matrices and quantized input signals. We show that BNNs can perform denoising with a negligible loss of quality compared to a corresponding real-valued network with the same structure, while significantly reducing network complexity.
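The central claim of the abstract is that, once weights and inputs are binary, a multiply-accumulate can be replaced by bitwise logic. The following is a minimal Python sketch of that equivalence for bipolar (\(\pm 1\)) vectors, assuming bits are packed into plain integers with 1 encoding +1 and 0 encoding -1. The function names are hypothetical, and thresholding at zero in `bitwise_neuron` is a simplification rather than the chapter's exact activation.

```python
def binarize_weights(weights):
    """Map real-valued weights to bipolar values with the sign function (0 maps to +1)."""
    return [1 if w >= 0 else -1 for w in weights]


def pack_bipolar(values):
    """Pack bipolar values into an int: bit i is 1 for +1 and 0 for -1."""
    bits = 0
    for i, v in enumerate(values):
        if v not in (-1, 1):
            raise ValueError("expected bipolar values (+1 or -1)")
        if v == 1:
            bits |= 1 << i
    return bits


def xnor_dot(a_bits, b_bits, n):
    """Dot product of two length-n bipolar vectors given as packed bits.

    XNOR marks positions where the signs agree (product +1); the other
    n - matches positions disagree (product -1), so the sum is 2 * matches - n.
    """
    mask = (1 << n) - 1                                   # keep only the n valid bits
    matches = bin(~(a_bits ^ b_bits) & mask).count("1")   # XNOR, then popcount
    return 2 * matches - n


def bitwise_neuron(x_bits, w_bits, n):
    """Bipolar output of one binary unit: the sign of the XNOR-popcount dot product."""
    return 1 if xnor_dot(x_bits, w_bits, n) >= 0 else -1


# Usage: compare against an ordinary multiply-accumulate.
x = [1, -1, -1, 1, 1, -1, 1, 1]                           # a binary input vector
w = binarize_weights([0.3, 0.7, -0.2, 0.9, -0.5, -0.1, 0.4, -0.8])
reference = sum(xi * wi for xi, wi in zip(x, w))
print(xnor_dot(pack_bipolar(x), pack_bipolar(w), len(x)), reference)  # 2 2
print(bitwise_neuron(pack_bipolar(x), pack_bipolar(w), len(x)))        # 1
```

Storing each weight as one bit instead of a 32-bit float also shrinks weight memory by a factor of 32, and on real hardware a single XNOR and popcount instruction processes a whole machine word of weights at once.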

Since we target real-valued masks defined between 0 and 1, we use this approximate version of the masks \(t_d\) rather than the more exact complex-valued one.
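The footnote contrasts a real-valued mask bounded between 0 and 1 with a complex-valued one: the ratio of complex spectra \(S/X\) can take any magnitude and phase, so it cannot be confined to that interval. Below is a minimal sketch, assuming the real-valued approximation is the magnitude ratio \(|S|/(|S|+|N|)\) of source and noise magnitude spectra. The chapter's exact definition of \(t_d\) is not reproduced here, and the function names are hypothetical.

```python
def magnitude_ratio_mask(source_mag, noise_mag, eps=1e-12):
    """Real-valued mask |S| / (|S| + |N|) for each time-frequency bin.

    ASSUMPTION: stands in for the chapter's t_d. Because both magnitudes are
    non-negative, every entry lies between 0 and 1; eps avoids division by zero.
    """
    return [s / (s + n + eps) for s, n in zip(source_mag, noise_mag)]


def apply_mask(mask, mixture_mag):
    """Estimate the source magnitude by scaling the mixture magnitude bin by bin."""
    return [m * x for m, x in zip(mask, mixture_mag)]


source = [3.0, 0.0, 1.0]    # |S| in three bins (toy values)
noise = [1.0, 2.0, 1.0]     # |N| in the same bins
mask = magnitude_ratio_mask(source, noise)
print([round(m, 3) for m in mask])   # [0.75, 0.0, 0.5]
```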

We call this procedure binarization.
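The footnote does not show what is binarized at this point. As one hedged illustration of the "quantized input signals" mentioned in the abstract, the sketch below turns real-valued features (for example, spectrogram magnitudes) into a bit vector by uniform quantization with `num_bits` bits per value. The uniform quantizer, the clipping range and the bit layout are all assumptions, and `quantize_to_bits` is a hypothetical name; Lloyd's least-squares quantizer [31] would instead place the levels to minimize the mean squared quantization error.

```python
def quantize_to_bits(values, lo, hi, num_bits):
    """Map each value in [lo, hi] to one of 2**num_bits uniform levels and
    emit that level's index as num_bits binary digits, most significant first.

    ASSUMPTION: uniform quantization; values outside [lo, hi] are clipped.
    """
    if hi <= lo or num_bits < 1:
        raise ValueError("need hi > lo and num_bits >= 1")
    levels = 2 ** num_bits
    step = (hi - lo) / levels
    bits = []
    for v in values:
        clipped = min(max(v, lo), hi)
        # The top edge (v == hi) is folded into the last level.
        index = min(int((clipped - lo) / step), levels - 1)
        bits.extend((index >> k) & 1 for k in reversed(range(num_bits)))
    return bits


print(quantize_to_bits([0.0, 0.3, 0.6, 1.0], lo=0.0, hi=1.0, num_bits=2))
# [0, 0, 0, 1, 1, 0, 1, 1]
```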

Note that this search in the binary feature space closely resembles hashing-based information retrieval.
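In hashing-based retrieval, items are represented by binary codes and compared by Hamming distance, which costs one XOR and one bit count per pair. Here is a minimal linear-scan sketch, assuming codes are packed into Python integers; the function names are hypothetical and this is not the chapter's own search procedure.

```python
def hamming(a_code, b_code):
    """Number of bit positions in which two packed binary codes differ."""
    return bin(a_code ^ b_code).count("1")


def nearest_code(query, database):
    """Index and Hamming distance of the database code closest to query.

    A plain linear scan; ties go to the lowest index, and an empty database
    returns (-1, None).
    """
    best_index, best_dist = -1, None
    for i, code in enumerate(database):
        d = hamming(query, code)
        if best_dist is None or d < best_dist:
            best_index, best_dist = i, d
    return best_index, best_dist


codes = [0b1010, 0b1111, 0b0001]
print(nearest_code(0b1011, codes))   # (0, 1)
```

A hashing scheme avoids the full scan by using the codes (or parts of them) to index buckets, so that only items sharing the query's bucket need to be compared.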

\([-1,+1]\) when we use bipolar binary values.

The results of this section are mostly from [15].

1. P. Smaragdis, J.C. Brown, Non-negative matrix factorization for polyphonic music transcription, in Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY (2003), pp. 177–180

2. D.D. Lee, H.S. Seung, Learning the parts of objects by non-negative matrix factorization. Nature 401, 788–791 (1999)

3. D.D. Lee, H.S. Seung, Algorithms for non-negative matrix factorization, in Advances in Neural Information Processing Systems (NIPS), vol. 13 (2001)

4. T. Hofmann, Probabilistic latent semantic analysis, in Proceedings of the International Conference on Uncertainty in Artificial Intelligence (UAI) (1999)

5. T. Hofmann, Probabilistic latent semantic indexing, in Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR) (1999)

6. B. Raj, P. Smaragdis, Latent variable decomposition of spectrograms for single channel speaker separation, in Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (2005), pp. 17–20

7. M. Kim, P. Smaragdis, G.J. Mysore, Efficient manifold preserving audio source separation using locality sensitive hashing, in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (2015), pp. 479–483

8. M. Kim, P. Smaragdis, Efficient neighborhood-based topic modeling for collaborative audio enhancement on massive crowdsourced recordings, in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (2016), pp. 41–45

9. Y. Xu, J. Du, L.-R. Dai, C.-H. Lee, An experimental study on speech enhancement based on deep neural networks. IEEE Signal Process. Lett. 21(1), 65–68 (2014)

10. P. Huang, M. Kim, M. Hasegawa-Johnson, P. Smaragdis, Joint optimization of masks and deep recurrent neural networks for monaural source separation. IEEE/ACM Trans. Audio Speech Lang. Process. 23(12), 2136–2147 (2015)

11. D.S. Williamson, Y. Wang, D.L. Wang, Reconstruction techniques for improving the perceptual quality of binary masked speech. J. Acoust. Soc. Am. 136, 892–902 (2014)

12. J. Le Roux, J.R. Hershey, F. Weninger, Deep NMF for speech separation, in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (2015), pp. 66–70

13. J.R. Hershey, Z. Chen, J. Le Roux, S. Watanabe, Deep clustering: discriminative embeddings for segmentation and separation, in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (2016), pp. 31–35

14. M. Kim, P. Smaragdis, Bitwise neural networks, in International Conference on Machine Learning (ICML) Workshop on Resource-Efficient Machine Learning (2015)

15. M. Kim, Audio computing in the wild: frameworks for big data and small computers. Ph.D. dissertation, University of Illinois at Urbana-Champaign, 2016

16. D. Soudry, I. Hubara, R. Meir, Expectation backpropagation: parameter-free training of multilayer neural networks with continuous or discrete weights, in Advances in Neural Information Processing Systems (NIPS) (2014), pp. 963–971

17. M. Rastegari, V. Ordonez, J. Redmon, A. Farhadi, XNOR-Net: ImageNet classification using binary convolutional neural networks (2016), arXiv preprint arXiv:1603.05279

18. I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, Y. Bengio, Binarized neural networks, in Advances in Neural Information Processing Systems (2016), pp. 4107–4115

19. F. Weninger, H. Erdogan, S. Watanabe, E. Vincent, J. Le Roux, J.R. Hershey, B. Schuller, Speech enhancement with LSTM recurrent neural networks and its application to noise-robust ASR, in Proceedings of the International Conference on Latent Variable Analysis and Signal Separation (LVA/ICA) (2015), pp. 91–99

20. Y. Wang, D.L. Wang, Towards scaling up classification-based speech separation. IEEE Trans. Audio Speech Lang. Process. 21(7), 1381–1390 (2013)

21. A. Narayanan, D.L. Wang, Ideal ratio mask estimation using deep neural networks for robust speech recognition, in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (2013), pp. 7092–7096

22. D.S. Williamson, Y. Wang, D.L. Wang, A two-stage approach for improving the perceptual quality of separated speech, in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (2014), pp. 7084–7088

23. M. Kim, P. Smaragdis, Adaptive denoising autoencoders: a fine-tuning scheme to learn from test mixtures, in Proceedings of the International Conference on Latent Variable Analysis and Signal Separation (LVA/ICA) (2015), pp. 100–107

24. H. Erdogan, J.R. Hershey, S. Watanabe, J. Le Roux, Phase-sensitive and recognition-boosted speech separation using deep recurrent neural networks, in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (2015), pp. 708–712

25. N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, R. Salakhutdinov, Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15(1), 1929–1958 (2014)

26. J. Yagnik, D. Strelow, D.A. Ross, R. Lin, The power of comparative reasoning, in Proceedings of the International Conference on Computer Vision (ICCV) (2011), pp. 2431–2438

27. T. Dean, M.A. Ruzon, M. Segal, J. Shlens, S. Vijayanarasimhan, J. Yagnik, Fast, accurate detection of 100,000 object classes on a single machine, in Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) (2013), pp. 1814–1821

28. P. Indyk, R. Motwani, Approximate nearest neighbors: towards removing the curse of dimensionality, in Proceedings of the Annual ACM Symposium on Theory of Computing (STOC) (1998), pp. 604–613

29. Y. Weiss, A. Torralba, R. Fergus, Spectral hashing, in Advances in Neural Information Processing Systems (NIPS) (2009), pp. 1753–1760

30. R.R. Salakhutdinov, G.E. Hinton, Semantic hashing, in SIGIR Workshop on Information Retrieval and Applications of Graphical Models (2007)

31. S. Lloyd, Least squares quantization in PCM. IEEE Trans. Inf. Theory 28(2), 129–137 (1982)

32. W.S. McCulloch, W.H. Pitts, A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys. 5(4), 115–133 (1943)

33. L. Pitt, L.G. Valiant, Computational limitations on learning from examples. J. Assoc. Comput. Mach. 35, 965–984 (1988)

34. M. Golea, M. Marchand, T.R. Hancock, On learning \(\mu\)-perceptron networks with binary weights, in Advances in Neural Information Processing Systems (NIPS) (1992), pp. 591–598

35. E. Fiesler, A. Choudry, H.J. Caulfield, Weight discretization paradigm for optical neural networks, in The Hague, 12–16 April. International Society for Optics and Photonics (1990), pp. 164–173

36. K. Hwang, W. Sung, Fixed-point feedforward deep neural network design using weights \(+1\), 0, and \(-1\), in 2014 IEEE Workshop on Signal Processing Systems (SiPS) (2014)

37. Z. Duan, G.J. Mysore, P. Smaragdis, Online PLCA for real-time semi-supervised source separation, in Proceedings of the International Conference on Latent Variable Analysis and Signal Separation (LVA/ICA) (2012), pp. 34–41

38. E. Vincent, C. Févotte, R. Gribonval, Performance measurement in blind audio source separation. IEEE Trans. Audio Speech Lang. Process. 14(4), 1462–1469 (2006)

Print ISBN: 978-3-319-73030-1

Electronic ISBN: 978-3-319-73031-8

Copyright Year: 2018
DOI: https://doi.org/10.1007/978-3-319-73031-8_8