International Journal of Speech Technology

Published: 7 August 2018

Mel scaled M-band wavelet filter bank for speech recognition

Authors: Prashant Upadhyaya, Omar Farooq, M. R. Abidi

Published in: International Journal of Speech Technology | Issue 4/2018

Abstract

A Mel scaled M-band wavelet filter bank structure is used to extract robust acoustic features for speech recognition. The proposed filter bank offers a flexible frequency partition that decomposes the speech signal into M frequency bands. To quantify the difference between the Mel scaled M-band wavelet filter bank and the dyadic wavelet filter bank, the relative bandwidth deviation (RBD) and the root mean square bandwidth deviation (RMSBD) are calculated with respect to a baseline, the Mel filter bank bandwidths. The proposed filter bank reduces RBD by 40.90% and RMSBD by 49.84% relative to the 24-dyadic wavelet filter bank. Features extracted with the proposed filter bank on the AMUAV corpus improve word recognition accuracy (WRA) over baseline (MFCC) features across the whole SNR range tested (20 dB to 0 dB). On the AMUAV corpus, the proposed features improve WRA by at most 3.93% over the baseline features and by at most 3.90% over the dyadic wavelet filter bank features. On the VidTIMIT corpus, the proposed features improve WRA by at most 1.64% over the baseline features and by at most 4.43% over the dyadic features.
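The abstract does not give the formulas for RBD and RMSBD, so the following Python sketch only illustrates the general idea of comparing a candidate filter bank's bandwidths against Mel filter bank bandwidths. Everything specific in it is an assumption rather than the authors' method: the Mel conversion uses the common 2595·log10(1 + f/700) formula, the 4000 Hz upper edge and the 24 bands are hypothetical settings, the two deviation functions are plausible definitions chosen for illustration, and the uniform filter bank is a stand-in candidate, not the dyadic or M-band wavelet filter bank from the paper.

```python
import math


def hz_to_mel(f_hz):
    """Convert frequency in Hz to Mel (common 2595*log10(1 + f/700) formula)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_bandwidths(num_bands, f_max_hz):
    """Bandwidths in Hz of contiguous bands equally spaced on the Mel scale."""
    mel_max = hz_to_mel(f_max_hz)
    edges = [mel_to_hz(mel_max * i / num_bands) for i in range(num_bands + 1)]
    return [hi - lo for lo, hi in zip(edges[:-1], edges[1:])]


def relative_bandwidth_deviation(candidate, baseline):
    """ASSUMED definition: mean of |candidate - baseline| / baseline per band."""
    if len(candidate) != len(baseline):
        raise ValueError("filter banks must have the same number of bands")
    return sum(abs(c - b) / b for c, b in zip(candidate, baseline)) / len(baseline)


def rms_bandwidth_deviation(candidate, baseline):
    """ASSUMED definition: root mean square of per-band bandwidth differences (Hz)."""
    if len(candidate) != len(baseline):
        raise ValueError("filter banks must have the same number of bands")
    squared = [(c - b) ** 2 for c, b in zip(candidate, baseline)]
    return math.sqrt(sum(squared) / len(baseline))


# Hypothetical settings: 24 bands covering 0 to 4000 Hz.
NUM_BANDS = 24
F_MAX_HZ = 4000.0

baseline = mel_bandwidths(NUM_BANDS, F_MAX_HZ)
# Stand-in candidate: equal-width bands, used only to exercise the functions.
uniform = [F_MAX_HZ / NUM_BANDS] * NUM_BANDS

print("relative deviation:", relative_bandwidth_deviation(uniform, baseline))
print("RMS deviation (Hz):", rms_bandwidth_deviation(uniform, baseline))
```

With definitions of this kind, a filter bank whose band layout follows the Mel scale more closely yields smaller values for both measures, which is the sense in which the abstract reports a reduction for the proposed filter bank.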

Title: Mel scaled M-band wavelet filter bank for speech recognition
Authors: Prashant Upadhyaya, Omar Farooq, M. R. Abidi
Publication date: 7 August 2018
Publisher: Springer US
Published in: International Journal of Speech Technology, Issue 4/2018
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI: https://doi.org/10.1007/s10772-018-9545-2
