Published in: International Journal of Speech Technology 2/2012

1 June 2012

Filterbank optimization for robust ASR using GA and PSO

Authors: R. K. Aggarwal, M. Dave

Abstract

Automatic speech recognition (ASR) systems follow a well-established pattern recognition approach: signal-processing-based feature extraction at the front end and likelihood evaluation of feature vectors at the back end. Mel-frequency cepstral coefficients (MFCCs), the features widely used in state-of-the-art ASR systems, are derived from the logarithmic spectral energies of the speech signal using a Mel-scale filterbank. In the filterbank analysis of MFCC, there is no consensus on the spacing and number of filters to use across noise conditions and applications. In this paper, we propose a novel approach that uses particle swarm optimization (PSO) and a genetic algorithm (GA) to optimize parameters of the MFCC filterbank, such as the central and side frequencies. The experimental results show that the new front end outperforms the conventional MFCC technique. All investigations are conducted with two separate classifiers, HMM and MLP, for Hindi vowel recognition in typical field conditions as well as in noisy environments.
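
To make the optimized quantities concrete, the following is a minimal sketch of a triangular filterbank in which each filter is described by its own left, central, and right frequency. It is not the authors' implementation: the Mel formula, the function names (`conventional_params`, `filterbank`), and the example settings (20 filters, 8 kHz sampling rate, 256-point FFT) are assumptions chosen for illustration.

```python
import numpy as np


def hz_to_mel(f_hz):
    # A common Mel-scale formula; assumed here, the paper may use another variant.
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)


def mel_to_hz(mel):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)


def conventional_params(n_filters, f_low, f_high):
    """Return an (n_filters, 3) array of [left, centre, right] frequencies in Hz.

    Points are equally spaced on the Mel scale, and each filter's side
    frequencies coincide with the centres of its neighbours.
    """
    mels = np.linspace(hz_to_mel(f_low), hz_to_mel(f_high), n_filters + 2)
    points = mel_to_hz(mels)
    return np.stack([points[:-2], points[1:-1], points[2:]], axis=1)


def filterbank(params, n_fft, sample_rate):
    """Build triangular filter weights over the FFT bins from [left, centre, right] rows."""
    params = np.asarray(params, dtype=float)
    bin_freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    bank = np.zeros((params.shape[0], bin_freqs.size))
    for i, (left, centre, right) in enumerate(params):
        rising = (bin_freqs - left) / (centre - left)      # 0 at left, 1 at centre
        falling = (right - bin_freqs) / (right - centre)   # 1 at centre, 0 at right
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


# Hypothetical example settings, not taken from the paper.
params = conventional_params(n_filters=20, f_low=0.0, f_high=4000.0)
bank = filterbank(params, n_fft=256, sample_rate=8000)
print(bank.shape)
```

In this representation, the `params` array plays the role of the search vector: an optimizer such as PSO or GA could perturb its rows and score each candidate by recognition accuracy, whereas the conventional front end keeps the Mel-spaced values fixed.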

Metadata
Title
Filterbank optimization for robust ASR using GA and PSO
Authors
R. K. Aggarwal
M. Dave
Publication date
1 June 2012
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 2/2012
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-012-9133-9
