2021 | OriginalPaper | Chapter

Speech Recognition Employing MFCC and Dynamic Time Warping Algorithm

Abstract

Speech is an integral part of human life and one of the most natural means of human communication. Software and applications based on speech recognition therefore enjoy a high degree of acceptance and have a wide range of uses in defense, security, health care, and home automation. Speech is a non-stationary signal whose characteristics vary rapidly. When examined over a very short time scale, however, it can be treated as a stationary signal with very small variations. In this paper, the authors work on the detection of a single user from multiple isolated words used as speech signals. The system is built on feature extraction using Mel-frequency cepstral coefficients (MFCCs) and feature matching using dynamic time warping (DTW), both chosen for their simplicity and efficiency. Short-time spectral analysis, the main part of the MFCC algorithm, is adopted for feature extraction. DTW is used to compare two signals that differ in speed or have a time offset between them. Since two utterances of a word are never exactly the same, the DTW algorithm is well suited to comparing two spoken words.
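
As a rough illustration of the two stages named in the abstract, the following Python sketch computes MFCC-style features by short-time spectral analysis and compares two feature sequences with DTW. It is not the authors' implementation. The function names (`frame_signal`, `mel_filterbank`, `mfcc`, `dtw_distance`) and all parameter values (400-sample frames, 160-sample hop, 512-point FFT, 26 mel filters, 13 coefficients, Euclidean frame distance) are assumptions chosen as common defaults, and steps such as pre-emphasis and liftering are omitted.

```python
import numpy as np

def frame_signal(signal, frame_len, hop):
    """Cut a 1-D signal into overlapping Hamming-windowed frames."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx] * np.hamming(frame_len)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters with centers spaced evenly on the mel scale."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_points = np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):      # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / (right - center)
    return fbank

def mfcc(signal, sample_rate, n_fft=512, frame_len=400, hop=160,
         n_filters=26, n_ceps=13):
    """Return an (n_frames, n_ceps) array of cepstral coefficients."""
    frames = frame_signal(signal, frame_len, hop)
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    fbank = mel_filterbank(n_filters, n_fft, sample_rate)
    log_energy = np.log(power @ fbank.T + 1e-10)   # small offset avoids log(0)
    # Unnormalized DCT-II basis: decorrelates the log filterbank energies.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.arange(n_ceps)[:, None] * (2 * n + 1)
                   / (2 * n_filters))
    return log_energy @ basis.T

def dtw_distance(a, b):
    """Accumulated cost of the best monotonic alignment of two sequences."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])   # local frame distance
            cost[i, j] = d + min(cost[i - 1, j],      # a advances, b repeats
                                 cost[i, j - 1],      # b advances, a repeats
                                 cost[i - 1, j - 1])  # both advance
    return cost[n, m]

if __name__ == "__main__":
    sr = 16000  # assumed sampling rate in Hz
    tone = lambda freq, secs: np.sin(2 * np.pi * freq * np.arange(int(sr * secs)) / sr)
    template = mfcc(tone(300, 0.5), sr)   # stand-in for a stored word
    slower = mfcc(tone(300, 0.7), sr)     # same "word", longer duration
    other = mfcc(tone(1200, 0.5), sr)     # stand-in for a different word
    print("same word, different speed:", dtw_distance(template, slower))
    print("different word:", dtw_distance(template, other))
```

In a template-matching recognizer of this kind, an unknown utterance would typically be assigned the label of the stored template with the smallest DTW distance. The synthetic tones in the usage section merely stand in for recorded words.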

Metadata
Title
Speech Recognition Employing MFCC and Dynamic Time Warping Algorithm
Authors
Meenakshi Sood
Shruti Jain
Copyright Year
2021
DOI
https://doi.org/10.1007/978-3-030-66218-9_27