Published in: Wireless Personal Communications 4/2018

24-07-2017

Performance Evaluation of Silence-Feature Normalization Model using Cepstrum Features of Noise Signals

Authors: SangYeob Oh, Kyungyong Chung

Abstract

Speech enhancement algorithms play an important role in speech signal processing, and many have been studied over the past several decades. A speech enhancement algorithm typically combines a noise removal method with a statistical model filter and analyzes the speech signal in the frequency domain. Spectral subtraction and Wiener filtering are representative examples. These algorithms enhance speech well, but their performance deteriorates under specific noise types or in low signal-to-noise ratio (SNR) environments. In addition, when the noise is estimated incorrectly, noise remains in the voice signal, so the spectrum of the voice signal is distorted or frames belonging to the voice signal cannot be recovered, and speech recognition performance deteriorates. This deterioration arises from the mismatch between the recognition conditions and the training model. We use a silence-feature normalization model to improve the recognition rate that this mismatch degrades in noisy environments. Conventional silence-feature normalization has the problem that the energy of the silent part increases, which blurs the boundary between voiced and unvoiced segments and degrades recognition performance. In this study, we use the cepstrum features of the noise signal within the silence-feature normalization model to set a reference value for voiced/unvoiced classification, which improves the performance of silence-feature normalization for signals with a low SNR. In the recognition experiments, the recognition rate improves compared with other methods.
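The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the frame sizes, the use of the first `noise_frames` frames as a noise-only segment, the Euclidean cepstral distance, the `mean + k * std` reference value, and the choice of flooring silent frames to the lowest silent log-energy are all assumptions made here for illustration, as are the function names and the synthetic `demo_signal`.

```python
import numpy as np

def frame_signal(x, frame_len=256, hop=128):
    """Split x into overlapping Hamming-windowed frames (one frame per row)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx] * np.hamming(frame_len)

def real_cepstrum(frames, n_ceps=13):
    """Low-quefrency real cepstrum of each frame, skipping c0 (overall level)."""
    log_spec = np.log(np.abs(np.fft.rfft(frames, axis=1)) + 1e-10)
    ceps = np.fft.irfft(log_spec, axis=1)
    return ceps[:, 1:n_ceps + 1]

def silence_feature_normalize(x, noise_frames=10, k=3.0):
    """Classify frames against a noise-cepstrum reference, then flatten
    the log-energy of frames classified as silence.

    Assumption: the first `noise_frames` frames contain noise only.
    """
    frames = frame_signal(x)
    log_e = np.log(np.sum(frames ** 2, axis=1) + 1e-10)
    ceps = real_cepstrum(frames)

    # Noise cepstrum estimated from the assumed noise-only leading frames.
    noise_ceps = ceps[:noise_frames].mean(axis=0)
    dist = np.linalg.norm(ceps - noise_ceps, axis=1)

    # Reference value: how far noise frames typically sit from the noise mean.
    ref = dist[:noise_frames].mean() + k * dist[:noise_frames].std()
    is_speech = dist > ref

    # Normalize the silence feature: give every silent frame the same
    # (lowest observed) log-energy so noise does not raise the silent part.
    norm_log_e = log_e.copy()
    if (~is_speech).any():
        norm_log_e[~is_speech] = log_e[~is_speech].min()
    return is_speech, norm_log_e

def demo_signal(fs=8000, seed=0):
    """Hypothetical test signal: noise, then a harmonic tone in noise, then noise."""
    rng = np.random.default_rng(seed)
    half = fs // 2
    t = np.arange(half) / fs
    tone = sum(np.sin(2 * np.pi * 200 * h * t) / h for h in range(1, 6))
    x = 0.01 * rng.standard_normal(3 * half)
    x[half:2 * half] += tone
    return x
```

Calling `silence_feature_normalize(demo_signal(), noise_frames=20)` returns a boolean speech mask and a log-energy track in which all frames judged silent share one value. Because the decision uses cepstral shape rather than energy alone, the sketch keeps a usable boundary even when noise raises the energy of silent frames, which is the weakness the abstract attributes to conventional silence-feature normalization.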

Metadata
Title: Performance Evaluation of Silence-Feature Normalization Model using Cepstrum Features of Noise Signals
Authors: SangYeob Oh, Kyungyong Chung
Publication date: 24-07-2017
Publisher: Springer US
Published in: Wireless Personal Communications, Issue 4/2018
Print ISSN: 0929-6212
Electronic ISSN: 1572-834X
DOI: https://doi.org/10.1007/s11277-017-4645-x
