
Cognitive Computation

Published: 1 June 2016

A New Biologically Inspired Fuzzy Expert System-Based Voiced/Unvoiced Decision Algorithm for Speech Enhancement

Authors: M. A. Ben Messaoud, A. Bouzid, N. Ellouze

Published in: Cognitive Computation | Issue 3/2016


Abstract

In this paper, we propose a speech enhancement approach for a single-microphone system. The main idea is to apply a different transformation to the speech signal depending on its voicing state. The voiced/unvoiced decision algorithm is based on multi-scale product analysis and uses fuzzy logic to make more cognitively inspired use of speech information. Comb filtering is applied to the voiced frames of the noisy speech signal, and spectral subtraction is applied to the unvoiced frames of the same signal. The harmonics are further enhanced by a purpose-designed comb filter with an adjustable bandwidth. The comb filter is tuned by an accurate fundamental frequency estimation method, which is itself based on the multi-scale product analysis of the noisy speech. Experimental results show that the proposed approach reduces noise in adverse noise environments with little speech degradation and outperforms several competing methods.
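The voicing-dependent processing described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The voiced/unvoiced flag and the pitch period are assumed to be supplied by the caller, since the fuzzy multi-scale product stage is not reproduced here. The comb filter is a plain feed-forward comb rather than the adjustable-bandwidth design of the paper, and the names `comb_filter`, `spectral_subtraction`, `enhance_frame`, `gain` and `floor` are hypothetical. A naive DFT keeps the example free of external dependencies.

```python
import cmath
import math
from typing import List, Sequence


def comb_filter(frame: Sequence[float], period: int, gain: float = 0.5) -> List[float]:
    """Feed-forward comb: y[n] = (1 - gain) * x[n] + gain * x[n - period].

    Components that repeat every `period` samples (the pitch harmonics)
    pass unchanged; components in between are attenuated.
    """
    out = []
    for n, x in enumerate(frame):
        # Before one full period has elapsed there is no delayed sample,
        # so the input sample is passed through unchanged.
        delayed = frame[n - period] if n >= period else x
        out.append((1.0 - gain) * x + gain * delayed)
    return out


def dft(x: Sequence[float]) -> List[complex]:
    """Naive O(N^2) discrete Fourier transform (illustration only)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum: Sequence[complex]) -> List[float]:
    """Naive inverse DFT, returning the real part of each sample."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def spectral_subtraction(frame: Sequence[float], noise_mag: Sequence[float],
                         floor: float = 0.01) -> List[float]:
    """Magnitude spectral subtraction that keeps the noisy phase.

    `noise_mag` is an assumed per-bin noise magnitude estimate; `floor`
    limits how far a bin can be attenuated relative to its own magnitude.
    """
    cleaned = []
    for value, noise in zip(dft(frame), noise_mag):
        mag = abs(value)
        new_mag = max(mag - noise, floor * mag)
        cleaned.append(cmath.rect(new_mag, cmath.phase(value)))
    return idft(cleaned)


def enhance_frame(frame: Sequence[float], voiced: bool, period: int,
                  noise_mag: Sequence[float]) -> List[float]:
    """Dispatch on the voicing decision: comb filter if voiced, else subtract."""
    if voiced:
        return comb_filter(frame, period)
    return spectral_subtraction(frame, noise_mag)


if __name__ == "__main__":
    frame = [math.sin(2 * math.pi * n / 8) for n in range(32)]
    noise = [0.1] * len(frame)
    print(enhance_frame(frame, voiced=True, period=8, noise_mag=noise)[:4])
    print(enhance_frame(frame, voiced=False, period=8, noise_mag=noise)[:4])
```

In a real system the frames would be windowed and overlap-added, and the noise estimate would be updated during speech pauses; both are omitted here to keep the dispatch logic visible.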



Title: A New Biologically Inspired Fuzzy Expert System-Based Voiced/Unvoiced Decision Algorithm for Speech Enhancement
Authors: M. A. Ben Messaoud, A. Bouzid, N. Ellouze
Publication date: 1 June 2016
Publisher: Springer US
Published in: Cognitive Computation / Issue 3/2016
Print ISSN: 1866-9956
Electronic ISSN: 1866-9964
DOI: https://doi.org/10.1007/s12559-015-9376-2
