Pattern Analysis and Applications

3 April 2019 | Theoretical advances

Performance evaluation of psycho-acoustically motivated front-end compensator for TIMIT phone recognition

Authors: Anirban Bhowmick, Astik Biswas, Mahesh Chandra

Published in: Pattern Analysis and Applications | Issue 2/2020

Abstract

Wavelet-based front-end processing has gained popularity for its noise-removal capability. This paper proposes a robust automatic speech recognition system built on a psycho-acoustically motivated, wavelet-based front-end compensator. Within the compensator, a voice activity detector based on voiced speech probability separates voiced from unvoiced frames and updates the noise statistics. The wavelet packet decomposition tree is designed according to the equivalent rectangular bandwidth (ERB) scale, because the distribution of ERB center frequencies resembles the frequency response of the human cochlea. Voiced and unvoiced frames are decomposed separately into 24 sub-bands to estimate the average sub-band energy (ASE) of each frame, and the ASE is then used to calculate the threshold value. Finally, Wiener filtering reduces the residual noise before the final reconstruction stage. The proposed system is evaluated on the TIMIT database under various noise conditions. Its phoneme recognition accuracy is compared with that of several baseline and robust features, as well as with existing front-end compensation techniques. The front-end compensator is additionally evaluated in terms of phoneme classification accuracy. Performance improves in all of these experiments.
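To make the ERB-scaled band layout and the ASE step more concrete, the following is a minimal sketch and not the authors' implementation. It assumes the widely used Glasberg and Moore ERB-rate formula, a 16 kHz sampling rate with bands spanning 0 to 8000 Hz, and FFT power bins as a simple stand-in for the wavelet packet sub-bands described in the abstract. All function names are hypothetical.

```python
import numpy as np


def hz_to_erb_rate(f_hz):
    """ERB-rate (number of ERBs below f_hz), Glasberg-Moore formula (assumed here)."""
    return 21.4 * np.log10(1.0 + 0.00437 * np.asarray(f_hz, dtype=float))


def erb_rate_to_hz(erb):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (np.asarray(erb, dtype=float) / 21.4) - 1.0) / 0.00437


def erb_band_edges(n_bands=24, f_low=0.0, f_high=8000.0):
    """Return n_bands + 1 band edges in Hz, equally spaced on the ERB-rate scale.

    The 24-band count comes from the abstract; the frequency range is an assumption.
    """
    erb_points = np.linspace(hz_to_erb_rate(f_low), hz_to_erb_rate(f_high), n_bands + 1)
    return erb_rate_to_hz(erb_points)


def average_subband_energy(frame, edges, fs=16000):
    """Mean power per band for one frame.

    Assumption: FFT power bins replace the wavelet packet coefficients,
    so this only illustrates the idea of an average sub-band energy.
    """
    power = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    ase = np.zeros(len(edges) - 1)
    for k in range(len(edges) - 1):
        in_band = (freqs >= edges[k]) & (freqs < edges[k + 1])
        if in_band.any():  # very narrow low bands may contain no FFT bin
            ase[k] = power[in_band].mean()
    return ase
```

Calling `erb_band_edges()` yields 25 edges whose spacing widens with frequency, which is the property that makes the layout resemble cochlear frequency resolution. How the threshold is derived from the ASE values is not specified in the abstract, so that step is left out of the sketch.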

Title: Performance evaluation of psycho-acoustically motivated front-end compensator for TIMIT phone recognition
Authors: Anirban Bhowmick, Astik Biswas, Mahesh Chandra
Publication date: 3 April 2019
Publisher: Springer London
Published in: Pattern Analysis and Applications / Issue 2/2020
Print ISSN: 1433-7541
Electronic ISSN: 1433-755X
DOI: https://doi.org/10.1007/s10044-019-00816-0
