
2018 | OriginalPaper | Chapter

8. Efficient Source Separation Using Bitwise Neural Networks

Authors: Minje Kim, Paris Smaragdis

Published in: Audio Source Separation

Publisher: Springer International Publishing


Abstract

Efficiency is a key issue in single-channel source separation systems because they are often used for real-time processing. More computationally demanding approaches tend to produce better results, but are often not fast enough to be deployed in practical systems. For example, in contrast to iterative separation algorithms that use source-specific dictionaries, a Deep Neural Network (DNN) performs separation through an iteration-free feedforward pass. However, even the feedforward pass can be costly, depending on the size of the network. In this chapter, we introduce Bitwise Neural Networks (BNNs) as an extremely compact form of neural network whose feedforward pass uses only efficient bitwise operations (e.g., XNOR instead of multiplication) on binary weight matrices and quantized input signals. We show that BNNs can perform denoising with a negligible loss of quality compared to a corresponding network with the same structure, while significantly reducing network complexity.
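To make the "XNOR instead of multiplication" idea concrete, here is a minimal sketch in plain Python. It is not taken from the chapter: it assumes bipolar values (+1/-1) packed into integer bitmasks and relies on the standard identity that the dot product of two length-n bipolar vectors equals 2 * (number of matching positions) - n. The function names `pack_bipolar`, `xnor_dot`, and `bitwise_neuron` are hypothetical, chosen only for illustration.

```python
def pack_bipolar(values):
    """Pack a sequence of +1/-1 values into an int bitmask (+1 -> bit 1, -1 -> bit 0)."""
    bits = 0
    for i, v in enumerate(values):
        if v > 0:
            bits |= 1 << i
    return bits


def xnor_dot(x_bits, w_bits, n):
    """Dot product of two length-n bipolar vectors stored as bitmasks.

    XNOR marks the positions where the signs agree; counting those
    positions (popcount) replaces the multiply-accumulate loop.
    """
    mask = (1 << n) - 1
    agree = ~(x_bits ^ w_bits) & mask    # XNOR, restricted to the n valid bits
    matches = bin(agree).count("1")      # popcount
    return 2 * matches - n


def bitwise_neuron(x_bits, w_bits, n, bias=0):
    """One hidden unit with a sign activation (an assumption of this sketch)."""
    return 1 if xnor_dot(x_bits, w_bits, n) + bias >= 0 else -1


# Compare against the ordinary multiply-and-sum dot product.
x = [+1, -1, +1, +1, -1, -1, +1, -1]
w = [+1, +1, -1, +1, -1, +1, +1, -1]
reference = sum(a * b for a, b in zip(x, w))
fast = xnor_dot(pack_bipolar(x), pack_bipolar(w), len(x))
assert fast == reference
```

On hardware, the XNOR and the bit count each cover a whole machine word at once, which is where the savings over floating-point multiplication would come from; the pure-Python version above only illustrates the arithmetic.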


Footnotes
1
Since we target real-valued masks defined between 0 and 1, we use this approximated version of the masks \(t_d\) rather than the more proper complex-valued one.
 
2
We call this procedure binarization.
 
3
Note that this searching task in the binary feature space is very similar to the information retrieval process using hashing.
 
4
\([-1,+1]\) when we use bipolar binaries.
 
5
The results of this section are mostly from [15].
 
Literature
1.
P. Smaragdis, J.C. Brown, Non-negative matrix factorization for polyphonic music transcription, in Proceedings of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY (2003), pp. 177–180
2.
D.D. Lee, H.S. Seung, Learning the parts of objects by non-negative matrix factorization. Nature 401, 788–791 (1999)
3.
D.D. Lee, H.S. Seung, Algorithms for non-negative matrix factorization, in Advances in Neural Information Processing Systems (NIPS), vol. 13 (2001)
4.
T. Hofmann, Probabilistic latent semantic analysis, in Proceedings of the International Conference on Uncertainty in Artificial Intelligence (UAI) (1999)
5.
T. Hofmann, Probabilistic latent semantic indexing, in Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR) (1999)
6.
B. Raj, P. Smaragdis, Latent variable decomposition of spectrograms for single channel speaker separation, in Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (2005), pp. 17–20
7.
M. Kim, P. Smaragdis, G.J. Mysore, Efficient manifold preserving audio source separation using locality sensitive hashing, in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (2015), pp. 479–483
8.
M. Kim, P. Smaragdis, Efficient neighborhood-based topic modeling for collaborative audio enhancement on massive crowdsourced recordings, in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (2016), pp. 41–45
9.
Y. Xu, J. Du, L.-R. Dai, C.-H. Lee, An experimental study on speech enhancement based on deep neural networks. IEEE Signal Process. Lett. 21(1), 65–68 (2014)
10.
P. Huang, M. Kim, M. Hasegawa-Johnson, P. Smaragdis, Joint optimization of masks and deep recurrent neural networks for monaural source separation. IEEE/ACM Trans. Audio Speech Lang. Process. 23(12), 2136–2147 (2015)
11.
D.S. Williamson, Y. Wang, D.L. Wang, Reconstruction techniques for improving the perceptual quality of binary masked speech. J. Acoust. Soc. Am. 136, 892–902 (2014)
12.
J. LeRoux, J.R. Hershey, F. Weninger, Deep NMF for speech separation, in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (2015), pp. 66–70
13.
J.R. Hershey, Z. Chen, J. LeRoux, S. Watanabe, Deep clustering: discriminative embeddings for segmentation and separation, in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (2016), pp. 31–35
14.
M. Kim, P. Smaragdis, Bitwise neural networks, in International Conference on Machine Learning (ICML) Workshop on Resource-Efficient Machine Learning (2015)
15.
M. Kim, Audio computing in the wild: frameworks for big data and small computers. Ph.D. dissertation, University of Illinois at Urbana-Champaign, 2016
16.
D. Soudry, I. Hubara, R. Meir, Expectation backpropagation: parameter-free training of multilayer neural networks with continuous or discrete weights, in Advances in Neural Information Processing Systems (NIPS) (2014), pp. 963–971
17.
M. Rastegari, V. Ordonez, J. Redmon, A. Farhadi, XNOR-Net: imagenet classification using binary convolutional neural networks (2016), arXiv preprint arXiv:1603.05279
18.
I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, Y. Bengio, Binarized neural networks, in Advances in Neural Information Processing Systems (2016), pp. 4107–4115
19.
F. Weninger, H. Erdogan, S. Watanabe, E. Vincent, J. LeRoux, J.R. Hershey, B. Schuller, Speech enhancement with LSTM recurrent neural networks and its application to noise-robust ASR, in Proceedings of the International Conference on Latent Variable Analysis and Signal Separation (LVA/ICA) (2015), pp. 91–99
20.
Y. Wang, D.L. Wang, Towards scaling up classification-based speech separation. IEEE Trans. Audio Speech Lang. Process. 21(7), 1381–1390 (2013)
21.
A. Narayanan, D.L. Wang, Ideal ratio mask estimation using deep neural networks for robust speech recognition, in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (2013), pp. 7092–7096
22.
D.S. Williamson, Y. Wang, D.L. Wang, A two-stage approach for improving the perceptual quality of separated speech, in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (2014), pp. 7084–7088
23.
M. Kim, P. Smaragdis, Adaptive denoising autoencoders: a fine-tuning scheme to learn from test mixtures, in Proceedings of the International Conference on Latent Variable Analysis and Signal Separation (LVA/ICA) (2015), pp. 100–107
24.
H. Erdogan, J.R. Hershey, S. Watanabe, J. Le Roux, Phase-sensitive and recognition-boosted speech separation using deep recurrent neural networks, in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (2015), pp. 708–712
25.
N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, R. Salakhutdinov, Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15(1), 1929–1958 (2014)
26.
J. Yagnik, D. Strelow, D.A. Ross, R. Lin, The power of comparative reasoning, in Proceedings of the International Conference on Computer Vision (ICCV) (2011), pp. 2431–2438
27.
T. Dean, M.A. Ruzon, M. Segal, J. Shlens, S. Vijayanarasimhan, J. Yagnik, Fast, accurate detection of 100,000 object classes on a single machine, in Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) (2013), pp. 1814–1821
28.
P. Indyk, R. Motwani, Approximate nearest neighbors: towards removing the curse of dimensionality, in Proceedings of the Annual ACM Symposium on Theory of Computing (STOC) (1998), pp. 604–613
29.
Y. Weiss, A. Torralba, R. Fergus, Spectral hashing, in Advances in Neural Information Processing Systems (NIPS) (2009), pp. 1753–1760
30.
R.R. Salakhutdinov, G.E. Hinton, Semantic hashing, in SIGIR Workshop on Information Retrieval and Applications of Graphical Models (2007)
32.
W.S. McCulloch, W.H. Pitts, A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys. 5(4), 115–133 (1943)
33.
34.
M. Golea, M. Marchand, T.R. Hancock, On learning \(\mu\)-perceptron networks with binary weights, in Advances in Neural Information Processing Systems (NIPS) (1992), pp. 591–598
35.
E. Fiesler, A. Choudry, H.J. Caulfield, Weight discretization paradigm for optical neural networks, in The Hague, 12–16 April. International Society for Optics and Photonics (1990), pp. 164–173
36.
K. Hwang, W. Sung, Fixed-point feedforward deep neural network design using weights \(+1\), 0, and \(-1\), in 2014 IEEE Workshop on Signal Processing Systems (SiPS) (2014)
37.
Z. Duan, G.J. Mysore, P. Smaragdis, Online PLCA for real-time semi-supervised source separation, in Proceedings of the International Conference on Latent Variable Analysis and Signal Separation (LVA/ICA) (2012), pp. 34–41
38.
E. Vincent, C. Févotte, R. Gribonval, Performance measurement in blind audio source separation. IEEE Trans. Audio Speech Lang. Process. 14(4), 1462–1469 (2006)
Metadata
Title: Efficient Source Separation Using Bitwise Neural Networks
Authors: Minje Kim, Paris Smaragdis
Copyright Year: 2018
DOI: https://doi.org/10.1007/978-3-319-73031-8_8