Skip to main content
Erschienen in: Cognitive Computation 3/2016

01.06.2016

A New Biologically Inspired Fuzzy Expert System-Based Voiced/Unvoiced Decision Algorithm for Speech Enhancement

verfasst von: M. A. Ben Messaoud, A. Bouzid, N. Ellouze

Erschienen in: Cognitive Computation | Ausgabe 3/2016

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

In this paper, we propose a speech enhancement approach for a single-microphone system. The main idea is to apply a specific transformation on the speech signal depending on the voicing state of the signal. We apply a voiced/unvoiced algorithm based on the multi-scale product analysis with the use of fuzzy logic to make more cognitively inspired use of speech information. A comb filtering is applied on the voiced frames of the noisy speech signal, and a spectral subtraction is operated on the unvoiced frames of the same signal. Further, the harmonics are enhanced by performing a designed comb filtering using an adjustable bandwidth. The comb filter is tuned by an accurate fundamental frequency estimation method. The fundamental frequency estimation method is based on computing the multi-scale product analysis of the noisy speech. Experimental results show that the proposed approach is capable of reducing noise in adverse noise environments with little speech degradation and outperforms several competitive methods.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Hussain A, Chetouani M, Squartini S, Bastari A, Piazza F. Nonlinear speech enhancement: an overview. In: Stylianou Y, Faundez-Zanuy M, Esposito A, editors. LNCS 4391. Berlin: Springer; 2007. p. 217–48. Hussain A, Chetouani M, Squartini S, Bastari A, Piazza F. Nonlinear speech enhancement: an overview. In: Stylianou Y, Faundez-Zanuy M, Esposito A, editors. LNCS 4391. Berlin: Springer; 2007. p. 217–48.
2.
Zurück zum Zitat Boll SF. Suppression of acoustic noise in speech using spectral subtraction. IEEE Trans Acoust Speech Signal Process. 1979;27:113–8.CrossRef Boll SF. Suppression of acoustic noise in speech using spectral subtraction. IEEE Trans Acoust Speech Signal Process. 1979;27:113–8.CrossRef
3.
Zurück zum Zitat Hu HT, Kuo FJ, Wang HJ. Supplementary schemes to spectral subtraction for speech enhancement. Speech Commun. 2002;36:205–14.CrossRef Hu HT, Kuo FJ, Wang HJ. Supplementary schemes to spectral subtraction for speech enhancement. Speech Commun. 2002;36:205–14.CrossRef
5.
Zurück zum Zitat Cadore J, Valverde-Albacete FJ, Gallardo-Antolín A, Peláez-Moreno C. Auditory-inspired morphological processing of speech spectrograms: applications in automatic speech recognition and speech enhancement. Cognit Comput. 2013;5:426–516.CrossRef Cadore J, Valverde-Albacete FJ, Gallardo-Antolín A, Peláez-Moreno C. Auditory-inspired morphological processing of speech spectrograms: applications in automatic speech recognition and speech enhancement. Cognit Comput. 2013;5:426–516.CrossRef
6.
Zurück zum Zitat Hu Y, Loizou PC. Speech enhancement based on wavelet thresholding the multitaper spectrum. IEEE Trans Speech Audio Process. 2004;12:59–69.CrossRef Hu Y, Loizou PC. Speech enhancement based on wavelet thresholding the multitaper spectrum. IEEE Trans Speech Audio Process. 2004;12:59–69.CrossRef
7.
Zurück zum Zitat Ding GH, Huang T, Xu B. Suppression of additive noise using a power spectral density MMSE estimator. IEEE Trans Signal Process Lett. 2004;11:585–604.CrossRef Ding GH, Huang T, Xu B. Suppression of additive noise using a power spectral density MMSE estimator. IEEE Trans Signal Process Lett. 2004;11:585–604.CrossRef
8.
Zurück zum Zitat Cohen I. Speech enhancement using a noncausal a priori SNR estimator. IEEE Trans Signal Process Lett. 2004;11:725–34.CrossRef Cohen I. Speech enhancement using a noncausal a priori SNR estimator. IEEE Trans Signal Process Lett. 2004;11:725–34.CrossRef
9.
Zurück zum Zitat Lee KY, Jung S. Time-domain approach using multiple Kalman filters and EM algorithm to speech enhancement with nonstationary noise. IEEE Trans Speech Audio Process. 2000;8:282–310.CrossRef Lee KY, Jung S. Time-domain approach using multiple Kalman filters and EM algorithm to speech enhancement with nonstationary noise. IEEE Trans Speech Audio Process. 2000;8:282–310.CrossRef
10.
Zurück zum Zitat Zavarehei E, Vaseghi S. Speech enhancement in temporal DFT trajectories using Kalman filters. In: Interspeech, Lisbon; 2005. Zavarehei E, Vaseghi S. Speech enhancement in temporal DFT trajectories using Kalman filters. In: Interspeech, Lisbon; 2005.
11.
Zurück zum Zitat Huag F, Lee T, Kleijn WB. Transform-domain wiener filter for speech periodicity. In: IEEE International Conference Acoustic Speech Signal Processing (ICASSP); 2012. p. 4577–84. Huag F, Lee T, Kleijn WB. Transform-domain wiener filter for speech periodicity. In: IEEE International Conference Acoustic Speech Signal Processing (ICASSP); 2012. p. 4577–84.
12.
Zurück zum Zitat Hu Y, Loizou PC. A subspace approach for enhancing speech corrupted by colored noise. IEEE Signal Process Lett. 2002;9:204–13.CrossRef Hu Y, Loizou PC. A subspace approach for enhancing speech corrupted by colored noise. IEEE Signal Process Lett. 2002;9:204–13.CrossRef
13.
Zurück zum Zitat Hardwick J, Yoo CD, Lim JS. Speech enhancement using the dual excitation model. In: Proceedings of IEEE International Conferrence Acoustic Speech Signal Processing (ICASSP); 1993; 367–74. Hardwick J, Yoo CD, Lim JS. Speech enhancement using the dual excitation model. In: Proceedings of IEEE International Conferrence Acoustic Speech Signal Processing (ICASSP); 1993; 367–74.
14.
Zurück zum Zitat Dubost S, Cappe O. Enhancement of speech based on non-parametric estimation of a time varying harmonic representation. In: Proceedings of IEEE International Conferrence Acoustic Speech Signal Processing (ICASSP); 2000. p. 1859–64. Dubost S, Cappe O. Enhancement of speech based on non-parametric estimation of a time varying harmonic representation. In: Proceedings of IEEE International Conferrence Acoustic Speech Signal Processing (ICASSP); 2000. p. 1859–64.
15.
Zurück zum Zitat Deisher ME, Spanias AS. HMM-based speech enhancement using harmonic modeling. In: Proceedings of IEEE International Conferrence Acoustic Speech Signal Processing (ICASSP); 1997; 1175–84. Deisher ME, Spanias AS. HMM-based speech enhancement using harmonic modeling. In: Proceedings of IEEE International Conferrence Acoustic Speech Signal Processing (ICASSP); 1997; 1175–84.
16.
Zurück zum Zitat Jensen J, Hansen JHL. Speech enhancement using a constrained iterative sinusoidal model. IEEE Trans Speech Audio Process. 2001;9:731–810.CrossRef Jensen J, Hansen JHL. Speech enhancement using a constrained iterative sinusoidal model. IEEE Trans Speech Audio Process. 2001;9:731–810.CrossRef
17.
Zurück zum Zitat Squartini S, Schuller B, Hussain A. Cognitive and emotional information processing for human–machine interaction. Cognit Comput. 2012;4(4):383–93.CrossRef Squartini S, Schuller B, Hussain A. Cognitive and emotional information processing for human–machine interaction. Cognit Comput. 2012;4(4):383–93.CrossRef
18.
Zurück zum Zitat Espinosa-Duro V, Faundez-Zanuy M, Mekyska J. Beyond cognitive signals. Cognit Comput. 2011;3(2):374–8.CrossRef Espinosa-Duro V, Faundez-Zanuy M, Mekyska J. Beyond cognitive signals. Cognit Comput. 2011;3(2):374–8.CrossRef
19.
Zurück zum Zitat Esposito A. The perceptual and cognitive role of visual and auditory channels in conveying emotional information. Cognit Comput. 2009;1(3):268–311.CrossRef Esposito A. The perceptual and cognitive role of visual and auditory channels in conveying emotional information. Cognit Comput. 2009;1(3):268–311.CrossRef
20.
Zurück zum Zitat Abel A, Hussain A. Novel two-stage audiovisual speech filtering in noisy environments. Cognit Comput. 2014;6:200–18.CrossRef Abel A, Hussain A. Novel two-stage audiovisual speech filtering in noisy environments. Cognit Comput. 2014;6:200–18.CrossRef
21.
Zurück zum Zitat Abel A, Hussain A. Cognitively inspired audiovisual speech filtering: towards an intelligent, fuzzy based, multimodal, two-stage speech enhancement system. Springer Briefs in Cognitive Computation, Springer International Publishing; 2015. Abel A, Hussain A. Cognitively inspired audiovisual speech filtering: towards an intelligent, fuzzy based, multimodal, two-stage speech enhancement system. Springer Briefs in Cognitive Computation, Springer International Publishing; 2015.
22.
Zurück zum Zitat Rotili R, Principi E, Squartini S, Schuller B. A Real-time speech enhancement framework in noisy and reverberated acoustic scenarios. Cognit Comput. 2013;5:504–13.CrossRef Rotili R, Principi E, Squartini S, Schuller B. A Real-time speech enhancement framework in noisy and reverberated acoustic scenarios. Cognit Comput. 2013;5:504–13.CrossRef
23.
Zurück zum Zitat Narayanan A, Wang DL. Ideal ratio mask estimation using deep neural networks for robust speech recognition. In: Proceedings of ICASSP; 2013. pp. 1520–6149. Narayanan A, Wang DL. Ideal ratio mask estimation using deep neural networks for robust speech recognition. In: Proceedings of ICASSP; 2013. pp. 1520–6149.
24.
Zurück zum Zitat Xu Y, Du J, Dai L, Lee C. An experimental study on speech enhancement based on deep neural networks. IEEE Signal Process Lett. 2014;21:65–74.CrossRef Xu Y, Du J, Dai L, Lee C. An experimental study on speech enhancement based on deep neural networks. IEEE Signal Process Lett. 2014;21:65–74.CrossRef
25.
Zurück zum Zitat Cho E, Smith JO, Widrow B. Exploiting the harmonic structure for speech enhancement. In: Proceedings of IEEE International Conferrence Acoustic Speech Signal Processing (ICASSP); 2012. Cho E, Smith JO, Widrow B. Exploiting the harmonic structure for speech enhancement. In: Proceedings of IEEE International Conferrence Acoustic Speech Signal Processing (ICASSP); 2012.
26.
Zurück zum Zitat George E, Smith M. Speech analysis/synthesis and modification using an analysis-by-synthesis/overlap-add sinusoidal model. IEEE Trans Speech Audio Process. 1997;5:389–418.CrossRef George E, Smith M. Speech analysis/synthesis and modification using an analysis-by-synthesis/overlap-add sinusoidal model. IEEE Trans Speech Audio Process. 1997;5:389–418.CrossRef
27.
Zurück zum Zitat Nehorai A, Porat B. Adaptive comb filtering for harmonic signal enhancement. IEEE Trans Acoust Speech Signal Process. 1986;34:1124–215.CrossRef Nehorai A, Porat B. Adaptive comb filtering for harmonic signal enhancement. IEEE Trans Acoust Speech Signal Process. 1986;34:1124–215.CrossRef
28.
Zurück zum Zitat Chen JH, Gersho A. Adaptive postfiltering for quality enhancement of coded speech. IEEE Trans Speech Audio Process. 1995;3:59–113.CrossRef Chen JH, Gersho A. Adaptive postfiltering for quality enhancement of coded speech. IEEE Trans Speech Audio Process. 1995;3:59–113.CrossRef
29.
Zurück zum Zitat Grancharov V, Plasberg JH, Samuelsson J, Kleijn WB. Generalized postfilter for speech quality enhancement. IEEE Trans Audio Speech Lang Process. 2008;16:57–8.CrossRef Grancharov V, Plasberg JH, Samuelsson J, Kleijn WB. Generalized postfilter for speech quality enhancement. IEEE Trans Audio Speech Lang Process. 2008;16:57–8.CrossRef
30.
Zurück zum Zitat Jin W, Liu X, Scordilis MS. Speech enhancement using harmonic emphasis and comb filtering. IEEE Trans Audio Speech Lang Process. 2010;18:356–413.CrossRef Jin W, Liu X, Scordilis MS. Speech enhancement using harmonic emphasis and comb filtering. IEEE Trans Audio Speech Lang Process. 2010;18:356–413.CrossRef
31.
Zurück zum Zitat Ahmadi S, Spanias A. Cepstrum-based pitch detection using a new statistical V/UV classification algorithm. IEEE Trans Speech Audio Process. 1999;7:333–6.CrossRef Ahmadi S, Spanias A. Cepstrum-based pitch detection using a new statistical V/UV classification algorithm. IEEE Trans Speech Audio Process. 1999;7:333–6.CrossRef
32.
Zurück zum Zitat Fisher E, Tabrikian J, Dubnov S. Generalized likelihood ratio test for voiced–unvoiced decision in noisy speech using the harmonic model. IEEE Trans Audio Speech Lang Process. 2006;14:502–9.CrossRef Fisher E, Tabrikian J, Dubnov S. Generalized likelihood ratio test for voiced–unvoiced decision in noisy speech using the harmonic model. IEEE Trans Audio Speech Lang Process. 2006;14:502–9.CrossRef
33.
Zurück zum Zitat Nakatani T, Amano S, Irino T, Ishizuka K, Kondo T. A method for fundamental frequency estimation and voicing decision: application to infant utterances recorded in real acoustical environments. Speech Commun. 2008;50:203–12.CrossRef Nakatani T, Amano S, Irino T, Ishizuka K, Kondo T. A method for fundamental frequency estimation and voicing decision: application to infant utterances recorded in real acoustical environments. Speech Commun. 2008;50:203–12.CrossRef
34.
Zurück zum Zitat Talkin D. A robust algorithm for pitch tracking (RAPT). In: Talkin D, editor. Speech coding and synthesis. Amsterdam: Elsevier; 1995. p. 495–518. Talkin D. A robust algorithm for pitch tracking (RAPT). In: Talkin D, editor. Speech coding and synthesis. Amsterdam: Elsevier; 1995. p. 495–518.
35.
Zurück zum Zitat de Cheveigné A, Kawahara H. YIN, a fundamental frequency estimator for speech and music. J Acoust Soc Am. 2002;111:1917–2014.CrossRefPubMed de Cheveigné A, Kawahara H. YIN, a fundamental frequency estimator for speech and music. J Acoust Soc Am. 2002;111:1917–2014.CrossRefPubMed
36.
Zurück zum Zitat Beritelli F, Casale S, Russo S, Serrano S. Adaptive V/UV speech detection based on characterization of background noise. EURASIP J Audio Speech Music Process. 2009;. doi:10.1155/2009/965436. Beritelli F, Casale S, Russo S, Serrano S. Adaptive V/UV speech detection based on characterization of background noise. EURASIP J Audio Speech Music Process. 2009;. doi:10.​1155/​2009/​965436.
37.
Zurück zum Zitat Ben Messaoud MA, Bouzid A, Ellouze, N. Estimation du pitch et décision de voisement par compression spectrale de l’autocorrélation du produit multi-échelle. In: Proceedings of Journée d’Etude de la parole (JEP-TALN-RECITAL 2012); 2012; pp. 201–8. Ben Messaoud MA, Bouzid A, Ellouze, N. Estimation du pitch et décision de voisement par compression spectrale de l’autocorrélation du produit multi-échelle. In: Proceedings of Journée d’Etude de la parole (JEP-TALN-RECITAL 2012); 2012; pp. 201–8.
38.
Zurück zum Zitat Bouzid A, Ellouze N. Electroglottographic measures based on GCI and GOI detection using multiscale product. Int J Comput Commun Control. 2008;3:21–32.CrossRef Bouzid A, Ellouze N. Electroglottographic measures based on GCI and GOI detection using multiscale product. Int J Comput Commun Control. 2008;3:21–32.CrossRef
39.
Zurück zum Zitat Ben Messaoud MA, Bouzid A, Ellouze N. Using multi-scale product spectrum for single and multi-pitch estimation. IET Signal Process J. 2011;5:344–412.CrossRef Ben Messaoud MA, Bouzid A, Ellouze N. Using multi-scale product spectrum for single and multi-pitch estimation. IET Signal Process J. 2011;5:344–412.CrossRef
40.
Zurück zum Zitat Xu Y, Weaver JB, Healy DM, Lu J. Wavelet transform domain filters: a spatially selective noise filtration technique. IEEE Trans Image Process. 1994;3:747–812.CrossRefPubMed Xu Y, Weaver JB, Healy DM, Lu J. Wavelet transform domain filters: a spatially selective noise filtration technique. IEEE Trans Image Process. 1994;3:747–812.CrossRefPubMed
41.
Zurück zum Zitat Sadler BM, Swami A. Analysis of multi-scale products for step detection and estimation. IEEE Trans Inf Theory. 1999;45:1043–9.CrossRef Sadler BM, Swami A. Analysis of multi-scale products for step detection and estimation. IEEE Trans Inf Theory. 1999;45:1043–9.CrossRef
42.
Zurück zum Zitat Mallat S. A wavelet tour of signal processing. 3rd ed. San Diego: Academic Press; 2008. Mallat S. A wavelet tour of signal processing. 3rd ed. San Diego: Academic Press; 2008.
43.
Zurück zum Zitat Touzi A, Ben Messaoud MA. New approach for conception and implementation of object oriented expert system using UML. Int Arab J Inf Technol. 2009;6:99–108. Touzi A, Ben Messaoud MA. New approach for conception and implementation of object oriented expert system using UML. Int Arab J Inf Technol. 2009;6:99–108.
44.
Zurück zum Zitat Ben Messaoud MA, Bouzid A, Ellouze N. An efficient method for fundamental frequency determination of noisy speech. In: Drugman T, Dutoit T, editors. LNCS 7911. Springer: Berlin; 2013. p. 33–41. Ben Messaoud MA, Bouzid A, Ellouze N. An efficient method for fundamental frequency determination of noisy speech. In: Drugman T, Dutoit T, editors. LNCS 7911. Springer: Berlin; 2013. p. 33–41.
46.
Zurück zum Zitat ITU-T P.862. Perceptual evaluation of speech quality (PESQ), and objective method for end-to-end speech quality assessment of narrowband telephone networks and speech codecs. In: ITU-T Recommendation; 2000; p. 862. ITU-T P.862. Perceptual evaluation of speech quality (PESQ), and objective method for end-to-end speech quality assessment of narrowband telephone networks and speech codecs. In: ITU-T Recommendation; 2000; p. 862.
47.
Zurück zum Zitat Camacho A, Harris JG. A sawtooth waveform inspired pitch estimator for speech and music. J Acoust Soc Am. 2008;124:1638–715.CrossRefPubMed Camacho A, Harris JG. A sawtooth waveform inspired pitch estimator for speech and music. J Acoust Soc Am. 2008;124:1638–715.CrossRefPubMed
48.
Zurück zum Zitat Ben Messaoud MA, Bouzid A, Ellouze N. Autocorrelation of the speech multi-scale product for voicing decision and pitch estimation. Cognit Comput. 2010;2:151–9.CrossRef Ben Messaoud MA, Bouzid A, Ellouze N. Autocorrelation of the speech multi-scale product for voicing decision and pitch estimation. Cognit Comput. 2010;2:151–9.CrossRef
49.
Zurück zum Zitat Loizou PC. Speech enhancement: theory and practice. Dallas: CRC Press; 2007. Loizou PC. Speech enhancement: theory and practice. Dallas: CRC Press; 2007.
Metadaten
Titel
A New Biologically Inspired Fuzzy Expert System-Based Voiced/Unvoiced Decision Algorithm for Speech Enhancement
verfasst von
M. A. Ben Messaoud
A. Bouzid
N. Ellouze
Publikationsdatum
01.06.2016
Verlag
Springer US
Erschienen in
Cognitive Computation / Ausgabe 3/2016
Print ISSN: 1866-9956
Elektronische ISSN: 1866-9964
DOI
https://doi.org/10.1007/s12559-015-9376-2

Weitere Artikel der Ausgabe 3/2016

Cognitive Computation 3/2016 Zur Ausgabe

Premium Partner