Voice activity detection based on deep belief networks using likelihood ratio

Kim, Sang-Kyun; Park, Young-Jin; Lee, Sangmin

doi:10.1007/s11771-016-3057-5

Published: 12 January 2016

Journal of Central South University, Volume 23, pages 145–149 (2016)

Abstract

A novel technique is proposed to improve the performance of voice activity detection (VAD) by combining deep belief networks (DBN) with a likelihood ratio (LR). The likelihood ratio is derived from the speech and noise spectral components, which are assumed to follow a Gaussian probability density function (PDF). The proposed algorithm calculates the likelihood ratio from the input signal and employs DBN learning to classify voice activity from it. Experiments show that the proposed algorithm yields improved results in various noise environments compared with conventional VAD algorithms. Furthermore, the DBN-based algorithm decreases the probability of detection error by a margin in the range [0.7, 2.6] compared with the support vector machine based algorithm.
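To make the likelihood ratio concrete, the following is a minimal sketch and not the authors' implementation. It assumes the standard statistical-model formulation in which each spectral bin is zero-mean complex Gaussian with variance λ_n under the noise-only hypothesis and λ_n + λ_s under the speech-present hypothesis; the abstract does not spell out these details. The function names, the variance inputs, and the averaged-threshold decision are illustrative assumptions. In the proposed method the per-bin ratios would be passed to a DBN classifier rather than thresholded.

```python
import math


def log_likelihood_ratios(power_spectrum, noise_var, speech_var):
    """Per-bin log likelihood ratio of speech-present vs. noise-only.

    Assumes zero-mean complex Gaussian spectral components:
      H0 (noise only):     variance = noise_var[k]
      H1 (speech + noise): variance = noise_var[k] + speech_var[k]
    power_spectrum[k] is |X_k|^2 for the current frame.
    """
    llrs = []
    for p, lam_n, lam_s in zip(power_spectrum, noise_var, speech_var):
        xi = lam_s / lam_n   # a priori SNR
        gamma = p / lam_n    # a posteriori SNR
        # log of (1 / (1 + xi)) * exp(gamma * xi / (1 + xi))
        llrs.append(gamma * xi / (1.0 + xi) - math.log1p(xi))
    return llrs


def frame_decision(llrs, threshold=0.0):
    """Classical baseline rule (an assumption, not the DBN classifier):
    declare speech when the mean log likelihood ratio exceeds a threshold."""
    return sum(llrs) / len(llrs) > threshold


if __name__ == "__main__":
    # Hypothetical two-bin frame with unit noise and speech variances.
    features = log_likelihood_ratios([0.0, 4.0], [1.0, 1.0], [1.0, 1.0])
    print(features, frame_decision(features))
```

In this sketch the list returned by `log_likelihood_ratios` plays the role of the feature vector, and `frame_decision` only stands in for whatever classifier consumes it.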

Author information

Authors and Affiliations

Department of Electronic Engineering, Inha University, Incheon, 402-751, Korea
Sang-Kyun Kim & Sangmin Lee
Korea Electrotechnology Research Institute (KERI), 111 Hanggaul ro, Sangrok Gu, An-San shi, Kyunggi Do, 426-170, Korea
Young-Jin Park
Institute for Information and Electronics Research, Inha University, Incheon, 402-751, Korea
Sangmin Lee

Corresponding author

Correspondence to Sangmin Lee.

About this article

Cite this article

Kim, SK., Park, YJ. & Lee, S. Voice activity detection based on deep belief networks using likelihood ratio. J. Cent. South Univ. 23, 145–149 (2016). https://doi.org/10.1007/s11771-016-3057-5

Received: 12 June 2015
Accepted: 08 November 2015
Published: 12 January 2016
Issue Date: January 2016
DOI: https://doi.org/10.1007/s11771-016-3057-5
