Abstract
A novel technique is proposed to improve the performance of voice activity detection (VAD) by combining deep belief networks (DBN) with a likelihood ratio (LR). The likelihood ratio is derived from the speech and noise spectral components, which are assumed to follow a Gaussian probability density function (PDF). The proposed algorithm computes the likelihood ratio from the input signal and uses DBN learning to classify voice activity from it. Experiments show that the proposed algorithm outperforms conventional VAD algorithms in various noise environments. Furthermore, the DBN based algorithm reduces the probability of detection error by [0.7, 2.6] relative to the support vector machine based algorithm.
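As an illustration of the likelihood ratio mentioned in the abstract, the following is a minimal sketch, assuming the standard complex Gaussian statistical model in which each frequency bin has a noise variance and a speech variance. The function name `log_likelihood_ratios` and its parameters are hypothetical, and the variances are taken as given rather than estimated; the abstract does not confirm that the paper uses exactly this form.

```python
import math


def log_likelihood_ratios(power_spectrum, noise_var, speech_var):
    """Per-bin log likelihood ratio of speech presence vs. absence.

    Assumes zero-mean complex Gaussian models (an assumption, not taken
    from the paper's text):
      H0 (noise only):     variance = noise_var
      H1 (speech + noise): variance = noise_var + speech_var

    power_spectrum: |X_k|^2 for each frequency bin of one frame
    noise_var:      assumed noise variance per bin
    speech_var:     assumed speech variance per bin
    """
    llrs = []
    for p, lam_n, lam_s in zip(power_spectrum, noise_var, speech_var):
        xi = lam_s / lam_n   # a priori SNR
        gamma = p / lam_n    # a posteriori SNR
        # log of (1 / (1 + xi)) * exp(gamma * xi / (1 + xi))
        llrs.append(gamma * xi / (1.0 + xi) - math.log1p(xi))
    return llrs
```

Under this sketch, the per-bin values for a frame would form the feature vector passed to a classifier; the DBN itself is not shown here.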
Cite this article
Kim, SK., Park, YJ. & Lee, S. Voice activity detection based on deep belief networks using likelihood ratio. J. Cent. South Univ. 23, 145–149 (2016). https://doi.org/10.1007/s11771-016-3057-5
DOI: https://doi.org/10.1007/s11771-016-3057-5