Voice activity detection based on deep belief networks using likelihood ratio

Journal of Central South University

Abstract

A novel technique is proposed to improve the performance of voice activity detection (VAD) by using deep belief networks (DBN) with a likelihood ratio (LR). The likelihood ratio is derived from the speech and noise spectral components, which are assumed to follow a Gaussian probability density function (PDF). The proposed algorithm computes the likelihood ratio from the input signal and uses DBN learning to classify voice activity from it. Experiments show that the proposed algorithm outperforms conventional VAD algorithms in various noise environments. Furthermore, the DBN-based algorithm reduces the probability of detection error by an amount in the range [0.7, 2.6] compared with the support vector machine based algorithm.
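As a rough illustration of the likelihood ratio mentioned in the abstract, the sketch below computes a per-frequency-bin log likelihood ratio under the common assumption that the noisy spectral coefficient is zero-mean complex Gaussian under both hypotheses (noise only, and speech plus noise). The abstract does not give the exact formula, so the form used here, the function names, and the mean-and-threshold decision are assumptions made for illustration. The DBN classifier itself is not reproduced; the per-bin values are only what such a classifier might plausibly take as input features.

```python
import math


def log_likelihood_ratio(noisy_power, noise_var, xi):
    """Per-bin log likelihood ratio under a Gaussian model (illustrative).

    noisy_power: |X_k|^2 for each frequency bin of one frame
    noise_var:   estimated noise variance per bin (assumed known here)
    xi:          a priori SNR per bin, speech variance / noise variance
                 (assumed known here; in practice it must be estimated)
    """
    llrs = []
    for power, lam_n, x in zip(noisy_power, noise_var, xi):
        gamma = power / lam_n  # a posteriori SNR
        # log of: 1 / (1 + xi) * exp(gamma * xi / (1 + xi))
        llrs.append(gamma * x / (1.0 + x) - math.log1p(x))
    return llrs


def frame_decision(llrs, threshold=0.0):
    """Hypothetical baseline: threshold the mean log LR of a frame.

    This stands in for the classifier stage only to make the sketch
    runnable; it is not the DBN described in the abstract.
    """
    mean_llr = sum(llrs) / len(llrs)
    return mean_llr > threshold, mean_llr


if __name__ == "__main__":
    # Two bins: one with strong energy relative to the noise, one without.
    features = log_likelihood_ratio([4.0, 1.0], [1.0, 1.0], [1.0, 1.0])
    is_speech, score = frame_decision(features)
    print(features, is_speech, score)
```

In a learning-based detector, the list returned by `log_likelihood_ratio` would be treated as a feature vector for the frame rather than averaged and thresholded directly.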




Corresponding author

Correspondence to Sangmin Lee.


Cite this article

Kim, SK., Park, YJ. & Lee, S. Voice activity detection based on deep belief networks using likelihood ratio. J. Cent. South Univ. 23, 145–149 (2016). https://doi.org/10.1007/s11771-016-3057-5


  • DOI: https://doi.org/10.1007/s11771-016-3057-5
