Published in: Pattern Analysis and Applications 2/2020

3 April 2019 | Theoretical advances

Performance evaluation of psycho-acoustically motivated front-end compensator for TIMIT phone recognition

Authors: Anirban Bhowmick, Astik Biswas, Mahesh Chandra

Abstract

Wavelet-based front-end processing has gained popularity for its noise-removal capability. This paper proposes a robust automatic speech recognition system built on a psycho-acoustically motivated, wavelet-based front-end compensator. Within the compensator, a voice activity detector based on voiced speech probability separates voiced from unvoiced frames and updates the noise statistics. The wavelet packet decomposition tree is designed according to the equivalent rectangular bandwidth (ERB) scale, chosen because the distribution of ERB center frequencies resembles the frequency response of the human cochlea. Voiced and unvoiced frames are decomposed separately into 24 sub-bands to estimate the average sub-band energy (ASE) of each frame, and the ASE is then used to calculate the threshold value. Finally, Wiener filtering reduces the residual noise before the final reconstruction stage. The proposed system is evaluated on the TIMIT database under various noise conditions. Its phoneme recognition accuracy is compared with that of several baseline and robust features as well as existing front-end compensation techniques. The front-end compensator is additionally evaluated in terms of phoneme classification accuracy. Performance improves in all of these experiments.
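As a rough illustration of three quantities named in the abstract, the sketch below computes 24 band edges equally spaced on the ERB-rate scale, the average energy of one sub-band, and a textbook Wiener gain. It is not the authors' implementation: the abstract gives neither the wavelet packet tree, the threshold rule, nor the Wiener filter design. The Glasberg-Moore ERB-rate formula, the 8 kHz upper edge (the Nyquist frequency of 16 kHz speech such as TIMIT), and all function names are assumptions chosen for this example.

```python
import math


def hz_to_erb_rate(f_hz):
    """ERB-rate (number of ERBs below f_hz), Glasberg-Moore formula (assumed here)."""
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)


def erb_rate_to_hz(erb):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (erb / 21.4) - 1.0) / 0.00437


def erb_band_edges(n_bands=24, f_max_hz=8000.0):
    """Edges of n_bands bands equally spaced on the ERB-rate scale from 0 Hz to f_max_hz.

    A real wavelet packet tree can only approximate these edges, because each
    split halves a band; this function gives the idealized target only.
    """
    top = hz_to_erb_rate(f_max_hz)
    return [erb_rate_to_hz(top * i / n_bands) for i in range(n_bands + 1)]


def average_subband_energy(coeffs):
    """Mean squared value of one sub-band's coefficients for one frame."""
    if not coeffs:
        raise ValueError("sub-band has no coefficients")
    return sum(c * c for c in coeffs) / len(coeffs)


def wiener_gain(a_priori_snr):
    """Textbook Wiener gain for a linear-scale (not dB) a priori SNR."""
    return a_priori_snr / (1.0 + a_priori_snr)


edges = erb_band_edges()
widths = [hi - lo for lo, hi in zip(edges, edges[1:])]
```

The band widths grow with frequency, which mirrors the finer low-frequency resolution of the cochlea that motivates the ERB scale. How the ASE maps to a threshold is left out on purpose, since the abstract does not state the rule.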

Metadata
Title
Performance evaluation of psycho-acoustically motivated front-end compensator for TIMIT phone recognition
Authors
Anirban Bhowmick
Astik Biswas
Mahesh Chandra
Publication date
3 April 2019
Publisher
Springer London
Published in
Pattern Analysis and Applications / Issue 2/2020
Print ISSN: 1433-7541
Electronic ISSN: 1433-755X
DOI
https://doi.org/10.1007/s10044-019-00816-0
