Published in: Cognitive Computation 3/2010

1 September 2010

A Non-Linear VAD for Noisy Environments

Authors: Jordi Solé-Casals, Vladimir Zaiats

Abstract

This paper deals with non-linear transformations for improving the performance of an entropy-based voice activity detector (VAD). The idea of using a non-linear transformation has already been applied to speech linear prediction (linear predictive coding) based on source separation techniques, where a score function is added to the classical equations in order to take into account the true distribution of the signal. We explore the possibility of estimating the entropy of frames after applying the score function to them, instead of using the original frames. We observe that if the signal is clean, the estimated entropy is essentially the same; if the signal is noisy, however, the frames transformed using the score function may give an entropy that differs between voiced and unvoiced frames. Experimental evidence is given to show that this fact enables voice activity detection under high noise, where the simple entropy method fails.
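The idea of comparing frame entropies with and without a score-function transform can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the entropy estimator, the score-function estimator, the frame length, or any decision threshold. The sketch assumes a histogram entropy estimate and a Gaussian-kernel estimate of the score function ψ(x) = −p′(x)/p(x); all function names, parameters, and default values below are hypothetical.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (one frame per row)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def histogram_entropy(frame, n_bins=32):
    """Shannon entropy (in nats) of a histogram estimate of the amplitudes.
    Assumption: the histogram estimator and bin count are illustrative."""
    counts, _ = np.histogram(frame, bins=n_bins)
    p = counts[counts > 0] / counts.sum()
    return float(-np.sum(p * np.log(p)))

def score_function(frame, bandwidth=None):
    """Gaussian-kernel estimate of psi(x) = -p'(x)/p(x), evaluated at the
    frame's own samples. Assumption: the kernel and bandwidth rule are
    illustrative choices, not taken from the paper."""
    x = np.asarray(frame, dtype=float)
    n = x.size
    if bandwidth is None:
        # Silverman's rule of thumb; the small constant avoids a zero bandwidth
        bandwidth = 1.06 * x.std() * n ** (-1 / 5) + 1e-12
    d = (x[:, None] - x[None, :]) / bandwidth   # pairwise scaled differences
    k = np.exp(-0.5 * d ** 2)                   # unnormalised Gaussian kernel
    p = k.sum(axis=1)                           # proportional to p(x_i)
    dp = (-d * k).sum(axis=1) / bandwidth       # proportional to p'(x_i)
    return -dp / p                              # normalising constants cancel

def entropy_track(x, frame_len=256, hop=128, use_score=False):
    """Per-frame entropy, optionally computed on score-transformed frames."""
    frames = frame_signal(np.asarray(x, dtype=float), frame_len, hop)
    if use_score:
        frames = np.stack([score_function(f) for f in frames])
    return np.array([histogram_entropy(f) for f in frames])

# Usage on synthetic noise only (no speech data is assumed here)
rng = np.random.default_rng(0)
noise = rng.normal(size=4096)
h_raw = entropy_track(noise)                    # entropy of the original frames
h_score = entropy_track(noise, use_score=True)  # entropy after the transform
```

A detector in this spirit would compare one of these entropy tracks against a threshold frame by frame; how that threshold is chosen is left open here because the abstract does not describe it.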

Metadata
Title
A Non-Linear VAD for Noisy Environments
Authors
Jordi Solé-Casals
Vladimir Zaiats
Publication date
1 September 2010
Publisher
Springer-Verlag
Published in
Cognitive Computation / Issue 3/2010
Print ISSN: 1866-9956
Electronic ISSN: 1866-9964
DOI
https://doi.org/10.1007/s12559-010-9037-4
