2019 | OriginalPaper | Chapter

Speech Recognition Using Novel Diatonic Frequency Cepstral Coefficients and Hybrid Neuro Fuzzy Classifier

Abstract

Speech recognition is the ability of a machine to identify spoken words and classify them into the appropriate category. The first stage in the speech recognition process is the extraction of appropriate features from the recorded words. We propose a novel feature extraction algorithm based on diatonic frequency cepstral coefficients. Diatonic frequencies are derived from a musical scale called the diatonic scale. The scale is based on the harmonics of sound and models the nonlinear behavior of the human auditory filter. After feature extraction, the classification stage uses a hybrid classifier that combines an artificial neural network with fuzzy logic. When the difference between the prediction values at the output of the neural network is small, a classifier that relies on the network alone matches the wrong patterns. The proposed algorithm overcomes this drawback using fuzzy logic. The proposed hybrid classifier improves the recognition rate significantly over existing classifiers. The test bed used in the experiments focuses on Marathi, the native language spoken in the Indian state of Maharashtra.
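
The abstract does not give the exact construction of the diatonic filter bank or the rule that triggers the fuzzy stage, so the following Python sketch is only an illustration under stated assumptions. It assumes the diatonic scale is taken in just intonation (ratios 1, 9/8, 5/4, 4/3, 3/2, 5/3, 15/8 per octave), that filter center frequencies are obtained by repeating those ratios octave by octave from a hypothetical base frequency, and that an "ambiguous" network output is one where the two largest prediction values differ by less than a hypothetical margin. The names `diatonic_center_frequencies`, `is_ambiguous`, `base_hz`, `max_hz`, and `margin` are not from the chapter.

```python
from fractions import Fraction

# Just-intonation ratios of the diatonic scale within one octave.
# Assumption: the chapter may use a different tuning or starting note.
DIATONIC_RATIOS = [
    Fraction(1), Fraction(9, 8), Fraction(5, 4), Fraction(4, 3),
    Fraction(3, 2), Fraction(5, 3), Fraction(15, 8),
]


def diatonic_center_frequencies(base_hz, max_hz):
    """List diatonic-scale frequencies from base_hz up to max_hz (inclusive).

    base_hz and max_hz are hypothetical parameters chosen for illustration.
    """
    freqs = []
    octave = 0
    while True:
        for ratio in DIATONIC_RATIOS:
            f = base_hz * float(ratio) * (2 ** octave)
            if f > max_hz:
                return freqs
            freqs.append(f)
        octave += 1


def is_ambiguous(scores, margin=0.1):
    """Return True when the two largest network outputs differ by less than margin.

    The margin value is an assumption; the abstract only says the
    difference between prediction values is small.
    """
    top, runner_up = sorted(scores, reverse=True)[:2]
    return (top - runner_up) < margin


if __name__ == "__main__":
    # Candidate filter centers over one octave starting at 100 Hz.
    print(diatonic_center_frequencies(100.0, 200.0))
    # Two close outputs: a case that would be handed to the fuzzy stage.
    print(is_ambiguous([0.48, 0.45, 0.07]))
```

The spacing between successive frequencies grows with frequency, which is the nonlinear behavior the abstract attributes to the scale. The fuzzy stage itself is not sketched here because the abstract gives no details of its rules or membership functions.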

Metadata
Title
Speech Recognition Using Novel Diatonic Frequency Cepstral Coefficients and Hybrid Neuro Fuzzy Classifier
Authors
Himgauri Kondhalkar
Prachi Mukherji
Copyright Year
2019
DOI
https://doi.org/10.1007/978-3-030-00665-5_76