International Journal of Machine Learning and Cybernetics

01-08-2018 | Original Article

A voice activity detection algorithm in spectro-temporal domain using sparse representation

Authors: Mohadese Eshaghi, Farbod Razzazi, Alireza Behrad

Published in: International Journal of Machine Learning and Cybernetics | Issue 7/2019

Abstract

This paper describes a new algorithm for voice activity detection (VAD) based on sparse representation in the spectro-temporal domain. The audio classification algorithm relies on multi-scale spectro-temporal modulation features extracted using an auditory cortex model. The key concept in sparse representation is that any speech fragment can be represented as a linear combination of a small number of exemplar speech tokens. The algorithm proceeds in three stages. First, it transforms the speech into the spectro-temporal domain, decomposing it into auditory-based features at multiple scales of temporal and spectral resolution. Next, it divides each frame into several sub-cubes in the new domain. Finally, it detects speech in the signal using the sparse representation of the sub-cubes of each frame. Simulation results illustrate the effectiveness of the proposed VAD algorithm. The achieved performance is 90.11% and 91.75% at −5 dB SNR in white noise and car noise, respectively, outperforming most state-of-the-art VAD algorithms.
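
The sub-cube and sparse-coding stages described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. Several things in it are assumptions made for illustration only: the per-frame features are taken to form a 3-D cube, the dictionary is a simple concatenation of speech and noise exemplars, the sparse solver is a plain orthogonal matching pursuit, and the decision is a majority vote over sub-cubes. The function names `split_into_subcubes`, `omp` and `is_speech`, as well as the parameters `block` and `k`, are hypothetical.

```python
import numpy as np


def split_into_subcubes(cube, block):
    """Split a 3-D feature cube into non-overlapping sub-cubes.

    Each sub-cube is flattened to one row of the returned matrix.
    Trailing samples that do not fill a whole block are dropped.
    """
    b0, b1, b2 = block
    n0, n1, n2 = cube.shape[0] // b0, cube.shape[1] // b1, cube.shape[2] // b2
    trimmed = cube[:n0 * b0, :n1 * b1, :n2 * b2]
    blocks = trimmed.reshape(n0, b0, n1, b1, n2, b2).transpose(0, 2, 4, 1, 3, 5)
    return blocks.reshape(n0 * n1 * n2, b0 * b1 * b2)


def omp(D, y, k):
    """Orthogonal matching pursuit: approximate y with at most k columns of D."""
    y = np.asarray(y, dtype=float)
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(k):
        # Pick the atom most correlated with the current residual.
        idx = int(np.argmax(np.abs(D.T @ residual)))
        if idx in support:
            break
        support.append(idx)
        # Re-fit all selected atoms jointly by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
        if np.linalg.norm(residual) < 1e-10:
            break
    x = np.zeros(D.shape[1])
    if support:
        x[support] = coef
    return x


def is_speech(cube, D_speech, D_noise, block=(2, 2, 2), k=3):
    """Label one frame as speech or non-speech (assumed decision rule).

    Each sub-cube is sparsely coded over [D_speech, D_noise]; it votes
    "speech" when the speech atoms carry more coefficient energy.
    Dictionary columns are assumed to be non-zero.
    """
    D = np.hstack([D_speech, D_noise]).astype(float)
    D = D / np.linalg.norm(D, axis=0, keepdims=True)
    n_speech = D_speech.shape[1]
    subcubes = split_into_subcubes(cube, block)
    votes = 0
    for y in subcubes:
        x = omp(D, y, k)
        speech_energy = np.sum(x[:n_speech] ** 2)
        noise_energy = np.sum(x[n_speech:] ** 2)
        votes += int(speech_energy > noise_energy)
    return bool(votes > len(subcubes) / 2)
```

Calling `is_speech(cube, D_speech, D_noise)` with a frame's feature cube and two exemplar dictionaries, whose rows match the flattened sub-cube length, returns a boolean decision for that frame. The majority vote is only one plausible way to combine sub-cube evidence; the abstract does not specify how the paper does it.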

Title: A voice activity detection algorithm in spectro-temporal domain using sparse representation
Authors: Mohadese Eshaghi, Farbod Razzazi, Alireza Behrad
Publication date: 01-08-2018
Publisher: Springer Berlin Heidelberg
Published in: International Journal of Machine Learning and Cybernetics / Issue 7/2019
Print ISSN: 1868-8071
Electronic ISSN: 1868-808X
DOI: https://doi.org/10.1007/s13042-018-0856-z
