International Journal of Speech Technology

Published: 9 August 2017

A heterogeneous speech feature vectors generation approach with hybrid hmm classifiers

Authors: Virender Kadyan, Archana Mantri, R. K. Aggarwal

Published in: International Journal of Speech Technology | Issue 4/2017

Abstract

Automatic speech recognition (ASR) systems play a vital role in human–machine interaction. ASR systems face performance degradation due to inconsistency between the training and testing phases, which arises from the extraction and representation of erroneous, redundant feature vectors. This paper proposes three different combinations at the speech feature vector generation phase and two hybrid classifiers at the modeling phase. In the feature extraction phase, MFCC, RASTA-PLP, and PLP are combined in different ways. In the modeling phase, the mean and variance are calculated to generate the inter- and intra-class feature vectors. An optimization algorithm then takes these feature vectors and, together with a traditional statistical technique, generates refined feature vectors. The approach uses GA + HMM and DE + HMM techniques to produce refined model parameters. The experiments are conducted on datasets of large-vocabulary isolated Punjabi lexicons. The simulation results show a performance improvement using MFCC and the DE + HMM technique when compared with RASTA-PLP and PLP using hybrid HMM classifiers.
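The abstract does not spell out how the optimizer refines the mean and variance parameters. The following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that DE stands for differential evolution and that "refinement" can be illustrated by searching for the mean and variance of a single one-dimensional Gaussian that best fit a set of feature values. The function names `gaussian_nll` and `differential_evolution`, the search bounds, and the control parameters `f` and `cr` are all hypothetical choices made for this illustration.

```python
import math
import random


def gaussian_nll(params, samples):
    """Negative log-likelihood of samples under N(mean, exp(log_var))."""
    mean, log_var = params
    var = math.exp(log_var)
    return 0.5 * sum(math.log(2 * math.pi * var) + (x - mean) ** 2 / var
                     for x in samples)


def differential_evolution(objective, bounds, pop_size=20, f=0.8, cr=0.9,
                           generations=200, seed=0):
    """Classic DE/rand/1/bin minimizer over box-constrained real vectors."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals other than the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one gene comes from the mutant
            trial = []
            for d in range(dim):
                if d == forced or rng.random() < cr:
                    value = pop[a][d] + f * (pop[b][d] - pop[c][d])
                    lo, hi = bounds[d]
                    value = min(max(value, lo), hi)  # clip to the search box
                else:
                    value = pop[i][d]
                trial.append(value)
            trial_score = objective(trial)
            if trial_score <= scores[i]:  # greedy selection keeps the better vector
                pop[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


if __name__ == "__main__":
    # Synthetic stand-in for one feature dimension (assumption: not real speech data).
    data_rng = random.Random(1)
    samples = [data_rng.gauss(2.0, 0.5) for _ in range(100)]
    best, score = differential_evolution(
        lambda p: gaussian_nll(p, samples),
        bounds=[(-5.0, 5.0), (-4.0, 2.0)],  # (mean, log-variance) search box
    )
    print("refined mean:", best[0])
    print("refined variance:", math.exp(best[1]))
```

In a hybrid setup of the kind the abstract describes, the objective would presumably be tied to the HMM's likelihood or recognition accuracy rather than a single Gaussian fit. The log-variance parametrization is used here only to keep the variance positive during the search.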

Title: A heterogeneous speech feature vectors generation approach with hybrid hmm classifiers
Authors: Virender Kadyan, Archana Mantri, R. K. Aggarwal
Publication date: 9 August 2017
Publisher: Springer US
Published in: International Journal of Speech Technology / Issue 4/2017
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI: https://doi.org/10.1007/s10772-017-9446-9
