International Journal of Speech Technology

Published: 1 June 2012

Filterbank optimization for robust ASR using GA and PSO

Authors: R. K. Aggarwal, M. Dave

Published in: International Journal of Speech Technology | Issue 2/2012

Abstract

Automatic speech recognition (ASR) systems follow a well-established pattern recognition approach: signal-processing-based feature extraction at the front end and likelihood evaluation of feature vectors at the back end. Mel-frequency cepstral coefficients (MFCCs), the features widely used in state-of-the-art ASR systems, are derived from the logarithmic spectral energies of the speech signal using a Mel-scale filterbank. In the filterbank analysis of MFCC there is no consensus on the spacing and number of filters to use under various noise conditions and applications. In this paper, we propose a novel approach that uses particle swarm optimization (PSO) and genetic algorithms (GA) to optimize the parameters of the MFCC filterbank, such as the central and side frequencies. The experimental results show that the new front end outperforms the conventional MFCC technique. All investigations are conducted using two separate classifiers, HMM and MLP, for Hindi vowel recognition in typical field conditions as well as in noisy environments.
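To make the idea of optimizing the "central and side frequencies" concrete, the following is a minimal sketch, not the authors' implementation. It builds a triangular filterbank from a list of edge frequencies (each filter's centre is the next filter's left edge) and lets a basic PSO move those edges away from the standard Mel spacing. Everything specific here is an assumption for illustration: the function names, the sample rate and FFT size, the PSO coefficients, and above all the fitness, which is a toy stand-in (distance between the log filterbank energies of two synthetic two-peak spectra). The abstract does not state the fitness function, encoding, or swarm settings actually used, and the GA variant is not shown.

```python
import math
import random

SAMPLE_RATE, N_FFT, N_FILTERS = 8000, 256, 8  # assumed analysis settings

def hz_to_mel(f_hz):
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_edges(n_filters, f_low, f_high):
    """n_filters + 2 edge frequencies (Hz), equally spaced on the Mel scale."""
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_filters + 2)]

def triangular_filterbank(edges_hz, n_fft, sample_rate):
    """Filter i rises from edges[i] to its centre edges[i+1], then falls to edges[i+2]."""
    bank = []
    for left, centre, right in zip(edges_hz, edges_hz[1:], edges_hz[2:]):
        row = []
        for k in range(n_fft // 2 + 1):
            f = k * sample_rate / n_fft
            if left < f <= centre:
                row.append((f - left) / (centre - left))
            elif centre < f < right:
                row.append((right - f) / (right - centre))
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def toy_spectrum(peaks_hz, width_hz=150.0):
    """Synthetic magnitude spectrum: Gaussian bumps on a small noise floor."""
    freqs = [k * SAMPLE_RATE / N_FFT for k in range(N_FFT // 2 + 1)]
    return [1e-3 + sum(math.exp(-((f - p) / width_hz) ** 2) for p in peaks_hz)
            for f in freqs]

# Hypothetical peak positions standing in for two different vowels.
SPEC_A = toy_spectrum([700.0, 1200.0])
SPEC_B = toy_spectrum([300.0, 2300.0])

def log_energies(bank, spectrum):
    return [math.log(sum(w * s for w, s in zip(row, spectrum)) + 1e-10)
            for row in bank]

def toy_fitness(edges_hz):
    """Placeholder objective: how far apart the two spectra land in feature space."""
    bank = triangular_filterbank(sorted(edges_hz), N_FFT, SAMPLE_RATE)
    a, b = log_energies(bank, SPEC_A), log_energies(bank, SPEC_B)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pso(fitness, start, lower, upper, n_particles=10, n_iters=30,
        w=0.7, c1=1.5, c2=1.5, spread=50.0, seed=0):
    """Basic global-best PSO that maximises `fitness`; particle 0 is `start`."""
    rng = random.Random(seed)
    dim = len(start)
    clip = lambda x: min(max(x, lower), upper)
    pos = [list(start)] + [[clip(x + rng.uniform(-spread, spread)) for x in start]
                           for _ in range(n_particles - 1)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = clip(pos[i][d] + vel[i][d])
            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return sorted(gbest), gbest_fit

baseline = mel_edges(N_FILTERS, 0.0, SAMPLE_RATE / 2)
best, best_fit = pso(toy_fitness, baseline, 0.0, SAMPLE_RATE / 2)
print("baseline fitness:", toy_fitness(baseline))
print("optimised fitness:", best_fit)
print("optimised edges (Hz):", [round(f) for f in best])
```

Because the Mel-spaced layout is seeded as one of the particles, the returned filterbank can never score worse than the conventional one on whatever objective is plugged in. In a real system the toy objective would presumably be replaced by something like recognition accuracy of the HMM or MLP classifier on held-out data, which is far more expensive per evaluation.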

Title: Filterbank optimization for robust ASR using GA and PSO
Authors: R. K. Aggarwal, M. Dave
Publication date: 1 June 2012
Publisher: Springer US
Published in: International Journal of Speech Technology, Issue 2/2012
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI: https://doi.org/10.1007/s10772-012-9133-9
