Published in: International Journal of Speech Technology 4/2018

7 August 2018

Mel scaled M-band wavelet filter bank for speech recognition

Authors: Prashant Upadhyaya, Omar Farooq, M. R. Abidi


Abstract

A Mel-scaled M-band wavelet filter bank structure is used to extract robust acoustic features for speech recognition. The proposed filter bank offers a flexible frequency partition that decomposes the speech signal into M frequency bands. To quantify the difference between the Mel-scaled M-band wavelet filter bank and the dyadic wavelet filter bank, the relative bandwidth deviation (RBD) and the root mean square bandwidth deviation (RMSBD) are calculated with respect to the baseline (the Mel filter bank bandwidths). The proposed filter bank reduces RBD by 40.90% and RMSBD by 49.84% relative to the 24-dyadic wavelet filter bank. Features extracted with the proposed filter bank on the AMUAV corpus improve word recognition accuracy (WRA) over baseline (MFCC) features across the whole SNR range (20 dB to 0 dB). For the AMUAV corpus, the proposed features show a maximum improvement in WRA of 3.93% over baseline features and 3.90% over dyadic wavelet filter bank features. When applied to the VidTIMIT corpus, the proposed features show a maximum improvement in WRA of 1.64% over baseline features and 4.43% over dyadic features.
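
The abstract does not give the formulas for RBD and RMSBD, so the following is only a minimal sketch of how a bandwidth comparison against a Mel baseline might be set up. Everything specific in it is an assumption made for illustration: the common Mel mapping 2595·log10(1 + f/700), contiguous (non-overlapping) Mel bands, a 4 kHz upper frequency, a uniform partition standing in for the filter bank under test, and the two deviation formulas themselves (mean relative difference and root mean square difference of per-band bandwidths). None of these should be read as the authors' definitions.

```python
import math


def hz_to_mel(f_hz):
    """Common Mel mapping (assumed; the abstract does not state which one is used)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(n_bands, f_max_hz):
    """Edges (Hz) of n_bands contiguous bands equally spaced on the Mel scale."""
    mel_max = hz_to_mel(f_max_hz)
    return [mel_to_hz(mel_max * i / n_bands) for i in range(n_bands + 1)]


def bandwidths(edges):
    """Width of each band given its sorted edge frequencies."""
    return [hi - lo for lo, hi in zip(edges, edges[1:])]


def relative_bandwidth_deviation(candidate, reference):
    """Hypothetical RBD: mean of |candidate - reference| / reference per band."""
    if len(candidate) != len(reference):
        raise ValueError("band counts must match")
    return sum(abs(c - r) / r for c, r in zip(candidate, reference)) / len(reference)


def rms_bandwidth_deviation(candidate, reference):
    """Hypothetical RMSBD: root mean square of per-band bandwidth differences (Hz)."""
    if len(candidate) != len(reference):
        raise ValueError("band counts must match")
    squared = sum((c - r) ** 2 for c, r in zip(candidate, reference))
    return math.sqrt(squared / len(reference))


if __name__ == "__main__":
    n_bands, f_max = 24, 4000.0  # 24 bands as in the abstract; 4 kHz is an assumption
    mel_bw = bandwidths(mel_band_edges(n_bands, f_max))
    # Uniform partition used only as a stand-in for a filter bank under test.
    uniform_bw = [f_max / n_bands] * n_bands
    print("RBD   (uniform vs Mel):", relative_bandwidth_deviation(uniform_bw, mel_bw))
    print("RMSBD (uniform vs Mel):", rms_bandwidth_deviation(uniform_bw, mel_bw))
```

With measures of this kind, a filter bank whose band edges track the Mel partition more closely yields smaller deviations, which is the sense in which the abstract reports reductions in RBD and RMSBD over the dyadic filter bank.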


Metadata
Title
Mel scaled M-band wavelet filter bank for speech recognition
Authors
Prashant Upadhyaya
Omar Farooq
M. R. Abidi
Publication date
7 August 2018
Publisher
Springer US
Published in
International Journal of Speech Technology, Issue 4/2018
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-018-9545-2
