Published in: International Journal of Speech Technology 2/2018

16-02-2018

Continuous Punjabi speech recognition model based on Kaldi ASR toolkit

Authors: Jyoti Guglani, A. N. Mishra


Abstract

In this paper, a continuous Punjabi speech recognition model built with the Kaldi toolkit is presented. Mel-frequency cepstral coefficient (MFCC) and perceptual linear prediction (PLP) features were extracted from continuous Punjabi speech samples. The performance of the automatic speech recognition (ASR) system is reported for both the monophone model and the triphone models (tri1, tri2, and tri3) using an N-gram language model, and is measured in terms of word error rate (WER). A significant reduction in WER was observed with the triphone models over the monophone model; moreover, the tri3 model outperformed the tri2 model, which in turn outperformed the tri1 model. It was also found that MFCC features yield higher recognition accuracy than PLP features for continuous Punjabi speech.
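WER, the metric reported in the abstract, is the word-level edit distance (substitutions + deletions + insertions) between the recognizer's output and the reference transcript, divided by the number of reference words. The following minimal sketch is illustrative only — it is not from the paper and is independent of Kaldi's own scoring tools:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Levenshtein distance over words via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all remaining reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all remaining hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

# One word deleted from a six-word reference gives WER = 1/6.
print(wer("the cat sat on the mat", "the cat sat on mat"))
```

In practice, Kaldi computes this for whole test sets during scoring; the sketch above shows only the per-utterance arithmetic behind the reported numbers.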


Metadata
Title
Continuous Punjabi speech recognition model based on Kaldi ASR toolkit
Authors
Jyoti Guglani
A. N. Mishra
Publication date
16-02-2018
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 2/2018
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-018-9497-6
