Published in: International Journal of Speech Technology 2/2018

23-03-2018

Speech recognition with reference to Assamese language using novel fusion technique

Authors: Sruti Sruba Bharali, Sanjib Kr. Kalita



Abstract

This paper describes the implementation of a speech recognition system for the Assamese language. The database for this work consists of a vocabulary of ten Assamese words. The speech recognition models were trained using the Hidden Markov Model, Vector Quantization, and I-vector techniques. Two new fusion methods that combine the three techniques are proposed.


Metadata
Title
Speech recognition with reference to Assamese language using novel fusion technique
Authors
Sruti Sruba Bharali
Sanjib Kr. Kalita
Publication date
23-03-2018
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 2/2018
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-018-9501-1
