Wireless Personal Communications

Published: 01.08.2016

Voice Activity Detection Using an Improved Unvoiced Feature Normalization Process in Noisy Environments

Authors: Kyungyong Chung, Sang Yeob Oh

Published in: Wireless Personal Communications | Issue 3/2016

Abstract

Noise-elimination technology removes noise, including environmental noise, from voice signals in order to increase voice recognition rates, and noise estimation is its most important component. One effective estimation approach is voice activity detection (VAD) based on the statistical properties of noise and voice, which are modeled as independent Gaussian distributions. When the statistical properties of noise and voice differ strongly, as with white noise, this method is very reliable; its reliability is limited, however, for signals with a low signal-to-noise ratio (SNR) or with speech-shaped noise, whose statistical properties resemble those of voice signals. If the noise is estimated incorrectly, residual noise remains, and the resulting distortion of the voice spectrum and missed voice frames reduce recognition performance. Recognition performance also degrades when the model training environment differs from the recognition environment, and various silence feature normalization methods are used to reduce this environmental mismatch. Existing silence feature normalization, however, loses recognition performance at low SNR because the rising energy level in silence sections lowers the accuracy of voiced/unvoiced classification. This paper proposes a robust voice characteristic detection method for noisy environments that uses feature extraction and unvoiced feature normalization to classify voiced and unvoiced signals. The proposed method builds a recognition model by extracting features for voiced/unvoiced classification in a high-SNR environment.
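The statistical VAD idea above (noise and voice spectra modeled as independent Gaussians) can be illustrated with the widely used likelihood-ratio formulation. It is a minimal sketch, not the authors' method. For each frequency bin k it uses the a posteriori SNR γ_k, a maximum-likelihood estimate of the a priori SNR ξ_k, and the per-bin log-likelihood ratio log Λ_k = γ_k ξ_k / (1 + ξ_k) − log(1 + ξ_k), then averages over bins. The function names, the 0.5 decision threshold and the noise smoothing factor α = 0.95 are hypothetical choices.

```python
import math
from typing import List, Sequence


def frame_llr(power: Sequence[float], noise_psd: Sequence[float]) -> float:
    """Mean log-likelihood ratio (speech vs. noise) over frequency bins.

    Assumes complex Gaussian DFT coefficients for both noise and speech,
    as in statistical-model VAD. noise_psd must be strictly positive.
    """
    total = 0.0
    for p, lam in zip(power, noise_psd):
        gamma = p / lam                       # a posteriori SNR of this bin
        xi = max(gamma - 1.0, 0.0)            # ML estimate of the a priori SNR
        total += gamma * xi / (1.0 + xi) - math.log1p(xi)  # log Lambda_k
    return total / len(power)


def gaussian_vad(frames: Sequence[Sequence[float]], noise_psd: Sequence[float],
                 threshold: float = 0.5, alpha: float = 0.95) -> List[bool]:
    """Label each power-spectrum frame as speech (True) or noise (False).

    The noise PSD is re-estimated by recursive averaging on frames judged
    to be noise. threshold and alpha are illustrative, untuned values.
    """
    noise = list(noise_psd)
    decisions = []
    for power in frames:
        speech = frame_llr(power, noise) > threshold
        decisions.append(speech)
        if not speech:
            # Update the noise estimate only from non-speech frames.
            noise = [alpha * n + (1.0 - alpha) * p for n, p in zip(noise, power)]
    return decisions


if __name__ == "__main__":
    noise0 = [1.0] * 8                           # hypothetical flat noise PSD
    frames = [[1.0] * 8, [10.0] * 8, [1.2] * 8]  # noise, strong speech, weak frame
    print(gaussian_vad(frames, noise0))          # [False, True, False]
```

The third frame, whose power is only 20% above the noise estimate, is classified as noise. This illustrates why detectors of this kind become unreliable when speech energy is close to the noise floor, that is, at low SNR.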
In addition, the model is less sensitive to noise in the voice characteristics, and it improves recognition performance at low SNR by exploiting the cepstral feature distributions of voiced and unvoiced signals. Recognition experiments were used to verify that the model improves recognition performance compared with the existing method.
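The abstract relies on the cepstral feature distributions of voiced and unvoiced frames, but it does not specify the exact features. The sketch below therefore uses a standard stand-in: the real cepstrum c[q] = IDFT(log|DFT(x)|). A voiced frame with a pitch period of P samples shows a peak at quefrency q = P (and at its multiples), while an unvoiced, noise-like frame does not. The 8 kHz sampling rate, the 60 to 400 Hz pitch range, the 0.5 threshold and all function names are assumptions. A naive DFT keeps the example dependency-free; a practical front end would use an FFT and an analysis window.

```python
import cmath
import math
import random
from typing import List, Sequence


def _dft(x: Sequence[complex], inverse: bool = False) -> List[complex]:
    """Naive O(N^2) DFT; adequate for one short frame (use an FFT in practice)."""
    n = len(x)
    sign = 1 if inverse else -1
    twiddle = [cmath.exp(sign * 2j * math.pi * m / n) for m in range(n)]
    out = [sum(x[t] * twiddle[(k * t) % n] for t in range(n)) for k in range(n)]
    return [v / n for v in out] if inverse else out


def real_cepstrum(frame: Sequence[float], eps: float = 1e-10) -> List[float]:
    """c[q] = IDFT(log|DFT(frame)|); eps avoids log(0). No window, for brevity."""
    log_mag = [math.log(abs(v) + eps) for v in _dft(frame)]
    return [v.real for v in _dft(log_mag, inverse=True)]


def is_voiced(frame: Sequence[float], fs: int = 8000, f0_min: float = 60.0,
              f0_max: float = 400.0, threshold: float = 0.5) -> bool:
    """Voiced if the cepstrum peaks strongly at a plausible pitch period."""
    c = real_cepstrum(frame)
    q_lo = int(fs / f0_max)                        # shortest pitch period (samples)
    q_hi = min(int(fs / f0_min), len(frame) // 2)  # longest period, capped at N/2
    return max(c[q_lo:q_hi + 1]) > threshold


if __name__ == "__main__":
    rng = random.Random(0)
    fs, n = 8000, 256
    # Hypothetical test frames: a 125 Hz harmonic signal (period 64 samples)
    # with weak additive noise, and pure white noise.
    voiced = [sum(math.cos(2 * math.pi * h * t / 64) for h in range(1, 32))
              + 0.01 * rng.gauss(0, 1) for t in range(n)]
    unvoiced = [rng.gauss(0, 1) for _ in range(n)]
    print(is_voiced(voiced, fs), is_voiced(unvoiced, fs))  # True False
```

On the synthetic harmonic frame the cepstral peak at q = 64 lies far above the threshold, while white noise stays well below it. On real speech, the threshold and pitch range would need to be tuned.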

Title: Voice Activity Detection Using an Improved Unvoiced Feature Normalization Process in Noisy Environments
Authors: Kyungyong Chung, Sang Yeob Oh
Publication date: 01.08.2016
Publisher: Springer US
Published in: Wireless Personal Communications / Issue 3/2016
Print ISSN: 0929-6212
Electronic ISSN: 1572-834X
DOI: https://doi.org/10.1007/s11277-015-3169-5

More articles in Issue 3/2016

Information Security Evaluation Using Multi-Attribute Threat Index

A New Performance Assessment Modeling and Development of a Performance Assessment System for a Cloud Service

Sentiment Analysis Using Word Polarity of Social Media

Software Vulnerability Detection Methodology Combined with Static and Dynamic Analysis

Information Interoperability System Using Multi-agent with Security

Using a Method Based on a Modified K-Means Clustering and Mean Shift Segmentation to Reduce File Sizes and Detect Brain Tumors from Magnetic Resonance (MRI) Images
