Published in: Wireless Personal Communications, Issue 3/2016

1 August 2016

Voice Activity Detection Using an Improved Unvoiced Feature Normalization Process in Noisy Environments

Authors: Kyungyong Chung, Sang Yeob Oh


Abstract

Noise-elimination technology removes noise, including environmental noise, from voice signals in order to increase voice recognition rates, and noise estimation is its most important component. One effective estimation method is voice activity detection, which estimates noise from the statistical properties of noise and voice, each modeled as an independent Gaussian distribution. The method is very reliable when those statistical properties differ strongly, as with white noise, but it is limited for signals with a low signal-to-noise ratio (SNR) or with speech-shaped noise, whose statistical properties resemble those of voice signals. When the noise is estimated incorrectly, residual noise distorts the voice spectrum and voice frames are missed, so methods intended to raise the recognition rate end up degrading recognition performance. Recognition performance also degrades when the model training environment differs from the recognition environment, and various silence feature normalization methods are used to reduce this environmental mismatch. Existing silence feature normalization nevertheless loses recognition performance at low SNR, because the rising energy level in silence sections lowers the accuracy of voiced/unvoiced classification. This paper proposes a robust voice characteristic detection method for noisy environments that uses feature extraction and unvoiced feature normalization to classify voiced and unvoiced signals. The suggested method builds a recognition model by extracting the features used for voiced/unvoiced classification in a high-SNR environment. In addition, the model makes voice features less sensitive to noise, and recognition performance improves at low SNR through use of the cepstrum feature distribution of voiced and unvoiced signals. Recognition experiments were used to check the model's improvement in recognition performance relative to the existing method.
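The abstract gives no algorithmic detail, so the following is only a minimal sketch of the general idea behind energy-based silence feature normalization, not the authors' method. It assumes non-overlapping frames, a midpoint threshold on frame log-energy, and flattening of non-speech frames to the lowest observed value; the function names `frame_log_energies` and `normalize_silence` are hypothetical.

```python
import math


def frame_log_energies(signal, frame_len):
    """Return the log-energy of each non-overlapping frame of `signal`."""
    energies = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        energy = sum(x * x for x in frame)
        # Small offset avoids log(0) on all-zero frames.
        energies.append(math.log(energy + 1e-10))
    return energies


def normalize_silence(log_energies):
    """Label frames and flatten the log-energy of non-speech frames.

    Assumption for illustration: the decision threshold is the midpoint
    between the lowest and highest frame log-energy, and non-speech
    frames are all set to the lowest observed value.
    """
    lowest = min(log_energies)
    highest = max(log_energies)
    threshold = (lowest + highest) / 2.0
    is_speech = [e > threshold for e in log_energies]
    normalized = [e if speech else lowest
                  for e, speech in zip(log_energies, is_speech)]
    return is_speech, normalized


# Toy input: two quiet frames followed by two loud frames (synthetic sinusoids).
quiet = [0.01 * math.sin(0.3 * n) for n in range(160)]
loud = [0.5 * math.sin(0.3 * n) for n in range(160, 320)]
labels, normalized = normalize_silence(frame_log_energies(quiet + loud, 80))
print(labels)  # [False, False, True, True]
```

A pure energy threshold like this one illustrates the weakness the abstract points to: as the noise level in the silence sections rises, the quiet-frame energies approach the speech-frame energies and the classification becomes unreliable, which is the motivation for using cepstrum-based features instead.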


Metadata
Title
Voice Activity Detection Using an Improved Unvoiced Feature Normalization Process in Noisy Environments
Authors
Kyungyong Chung
Sang Yeob Oh
Publication date
1 August 2016
Publisher
Springer US
Published in
Wireless Personal Communications / Issue 3/2016
Print ISSN: 0929-6212
Electronic ISSN: 1572-834X
DOI
https://doi.org/10.1007/s11277-015-3169-5
