Published in: International Journal of Speech Technology 4/2017

09 August 2017

A heterogeneous speech feature vectors generation approach with hybrid hmm classifiers

Authors: Virender Kadyan, Archana Mantri, R. K. Aggarwal


Abstract

Automatic speech recognition (ASR) systems play a vital role in human–machine interaction. An ASR system faces the challenge of performance degradation caused by inconsistency between the training and testing phases. This occurs due to the extraction and representation of erroneous, redundant feature vectors. This paper proposes three different combinations at the speech feature vector generation phase and two hybrid classifiers at the modeling phase. In the feature extraction phase, MFCC, RASTA-PLP, and PLP are combined in different ways. In the modeling phase, the mean and variance are calculated to generate the inter- and intra-class feature vectors. An optimization algorithm then takes these feature vectors and, together with a traditional statistical technique, generates refined feature vectors. The approach uses GA + HMM and DE + HMM techniques to produce refined model parameters. The experiments are conducted on datasets of large-vocabulary isolated Punjabi lexicons. The simulation results show a performance improvement using MFCC with the DE + HMM technique when compared with RASTA-PLP and PLP using the hybrid HMM classifiers.
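To make the "mean and variance, then optimizer-based refinement" idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes DE stands for differential evolution, replaces the HMM with a single diagonal Gaussian, and uses made-up two-dimensional "frames" in place of real MFCC or PLP vectors. All names (`class_stats`, `neg_log_likelihood`, `differential_evolution`) and all parameter values are hypothetical and chosen only for illustration.

```python
import math
import random


def class_stats(frames):
    """Per-dimension mean and variance of a list of equal-length feature vectors."""
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    var = [sum((f[d] - mean[d]) ** 2 for f in frames) / n for d in range(dim)]
    return mean, var


def neg_log_likelihood(candidate_mean, frames, var):
    """Negative log-likelihood of the frames under a diagonal Gaussian."""
    total = 0.0
    for f in frames:
        for x, m, v in zip(f, candidate_mean, var):
            total += 0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total


def differential_evolution(objective, dim, bounds, pop_size=20,
                           f_weight=0.5, cr=0.9, generations=200, seed=0):
    """Classic DE/rand/1/bin minimizer over a box-constrained search space."""
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    scores = [objective(p) for p in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct population members other than the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = []
            for d in range(dim):
                if rng.random() < cr or d == j_rand:
                    v = pop[a][d] + f_weight * (pop[b][d] - pop[c][d])
                    trial.append(min(max(v, lo), hi))  # clip to the bounds
                else:
                    trial.append(pop[i][d])
            s = objective(trial)
            if s <= scores[i]:  # greedy selection: keep the better vector
                pop[i], scores[i] = trial, s
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


# Toy stand-in for the feature vectors of one word class (not real speech data).
frames = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [1.1, 2.1]]
mean, var = class_stats(frames)

# Let DE search for the Gaussian mean that best explains the frames.
best, score = differential_evolution(
    lambda m: neg_log_likelihood(m, frames, var), dim=2, bounds=(-5.0, 5.0)
)
```

In this toy setting the likelihood is maximized at the sample mean, so `best` should land close to `mean`. That makes the sketch easy to check, but it also means the optimizer adds nothing here. In a hybrid scheme of the kind the abstract describes, the objective would instead score full HMM parameters, where no closed-form optimum is available.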


Metadata
Title
A heterogeneous speech feature vectors generation approach with hybrid hmm classifiers
Authors
Virender Kadyan
Archana Mantri
R. K. Aggarwal
Publication date
09 August 2017
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 4/2017
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-017-9446-9
