Confusion analysis in phoneme based speech recognition in Hindi

  • Original Research
Journal of Ambient Intelligence and Humanized Computing

Abstract

Phoneme recognition is an essential step in developing a speech recognition system (SRS), because phonemes are the fundamental building blocks of a spoken language and the accuracy of phoneme recognition is the foundation of an efficient SRS. This research work presents phoneme recognition with a systematic confusion analysis for the Hindi language, since such analysis is essential for improving speech recognition performance. Experiments were conducted on a continuous Hindi speech corpus in speaker-dependent mode using HTK, a toolkit based on the Hidden Markov Model (HMM). Perceptual linear predictive (PLP) coefficients were extracted as features and modeled with five-state monophone HMMs. Tests explored the recognition of Hindi vowels and consonants, and confusion matrices are presented for both, with analysis and possible solutions. For the systematic analysis, the vowels were divided into front, middle, and back vowels, while the consonants were categorized by place of articulation and manner of articulation. The findings show that some Hindi phonemes have significant effects on speech recognition. The investigations also reveal that some Hindi phonemes are confused more often than others, and that some phonemes suffer more deletions and insertions. The research further demonstrates that words made of fewer phonemes show more insertion errors. It was also found that most Hindi sentences end with a small set of specific words; these words can be used to reduce the search space in language modeling and so improve speech recognition. The findings can be used to enhance the performance of a speech recognition system by guiding the selection of suitable feature extraction and classification techniques for phonemes. The outcome of the research can also be used to develop improved pronunciation dictionaries and to design the text for a phonetically balanced speech corpus. Experimental results show an average corrected recognition score of 70% for the vowel class and the consonant categories; the maximum average corrected recognition score of 94% was obtained with palatal sounds, and the lowest, 54%, with liquid sounds. The presented work was also compared with similar existing works.
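The abstract refers to confusions, deletions, insertions, and recognition scores without defining them. As a rough illustration only, the Python sketch below aligns a reference phoneme sequence with a recognized one, tallies a confusion matrix, and computes the two figures that HTK's HResults tool reports: percent correct, (N − D − S)/N, and accuracy, (N − D − S − I)/N, where N is the number of reference labels and S, D, I are substitutions, deletions, and insertions. Whether the paper's "corrected recognition score" matches either figure is an assumption. The function names `align` and `score`, the unit edit costs, and the toy phoneme labels are all hypothetical; HTK applies its own alignment penalties, and this is not the authors' code.

```python
from collections import Counter


def align(ref, hyp):
    """Align two phoneme lists with unit-cost edit distance.

    Returns (ref_phone, hyp_phone) pairs. A None on the reference side
    marks an insertion; a None on the hypothesis side marks a deletion.
    """
    n, m = len(ref), len(hyp)
    # cost[i][j] = minimum edits needed to turn ref[:i] into hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])):
            pairs.append((ref[i - 1], hyp[j - 1]))  # match or substitution
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((ref[i - 1], None))  # deletion
            i -= 1
        else:
            pairs.append((None, hyp[j - 1]))  # insertion
            j -= 1
    pairs.reverse()
    return pairs


def score(pairs):
    """Tally confusions and compute HResults-style percent correct and accuracy."""
    confusion = Counter()  # (reference, recognized) -> count
    hits = subs = dels = ins = 0
    for r, h in pairs:
        if r is None:
            ins += 1
        elif h is None:
            dels += 1
        else:
            confusion[(r, h)] += 1
            if r == h:
                hits += 1
            else:
                subs += 1
    n = hits + subs + dels  # number of reference phonemes
    return {
        "confusion": confusion,
        "substitutions": subs,
        "deletions": dels,
        "insertions": ins,
        "percent_correct": 100.0 * hits / n,
        "accuracy": 100.0 * (hits - ins) / n,
    }


# Toy example with made-up labels: one substitution (m -> n) and one insertion.
ref = ["k", "a", "m", "a", "l"]
hyp = ["k", "a", "n", "a", "l", "a"]
result = score(align(ref, hyp))
print(result["percent_correct"], result["accuracy"])
```

For this toy pair, one substitution and one insertion against five reference phonemes give 80% correct and 60% accuracy. The gap between the two figures is due entirely to the insertion, which is why insertion-prone short words lower accuracy without affecting percent correct.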



Acknowledgements

The authors would like to acknowledge the Ministry of Electronics and Information Technology (MeitY), Government of India, for providing financial assistance for this research work through the “Visvesvaraya Ph.D. Scheme for Electronics and IT”. We would also like to convey our special thanks to the Director General, KIIT, Gurgaon, for providing the speech database and for his valuable guidance.

Author information

Corresponding author

Correspondence to Shobha Bhatt.


Appendices

Appendix A

See appendix Tables 11, 12, 13, 14, 15, 16, 17.

Table 11 Hindi vowel
Table 12 Hindi nasalized and breathy counterpart of vowel
Table 13 Hindi borrowed vowel
Table 14 Hindi consonants
Table 15 Hindi consonant (semivowels and fricatives)
Table 16 Hindi conjunct consonant
Table 17 Hindi Phoneme confusion matrix

Appendix B

See Table 17.

About this article


Cite this article

Bhatt, S., Dev, A. & Jain, A. Confusion analysis in phoneme based speech recognition in Hindi. J Ambient Intell Human Comput 11, 4213–4238 (2020). https://doi.org/10.1007/s12652-020-01703-x

  • DOI: https://doi.org/10.1007/s12652-020-01703-x
