2017 | OriginalPaper | Chapter

13. Voice Activity Detection

Authors: Tom Bäckström, Christian Uhle

Published in: Speech Coding

Publisher: Springer International Publishing

Abstract

Voice Activity Detection (VAD) indicates whether an audio signal contains speech or not. Besides speech coding and transmission, many other applications in speech and audio processing benefit from this information, and their performance depends crucially on the accuracy and robustness of the applied VAD. Various approaches to detecting speech have been developed, but in the challenging scenarios in which speech needs to be detected, e.g. hands-free communication in noisy environments or dialog with background music, there is still room for improvement. In this chapter, we describe the problem and the environments of VAD, and discuss the procedure, example methods and their evaluation. The more challenging application scenarios in particular illustrate how superior human hearing can be compared to implementations of audio signal processing.

Title: Voice Activity Detection
Authors: Tom Bäckström, Christian Uhle
Publisher: Springer International Publishing
Book: Speech Coding
Print ISBN: 978-3-319-50202-1

Electronic ISBN: 978-3-319-50204-5

Copyright Year: 2017
DOI: https://doi.org/10.1007/978-3-319-50204-5_13
