Published in: International Journal of Speech Technology 1/2016

30.11.2015

MFCC-GMM based accent recognition system for Telugu speech signals

Authors: Kasiprasad Mannepalli, Panyam Narahari Sastry, Maloji Suman

Abstract

Speech processing is an important research area that includes speaker recognition, speech synthesis, speech coding, and speech noise reduction. Many languages have distinct speaking styles, called accents or dialects. Identifying the accent before speech recognition can improve the performance of speech recognition systems, and the more accents a language has, the more crucial accent recognition becomes. Telugu is an Indian language widely spoken in the southern part of India. It has several accents, the main ones being coastal Andhra, Telangana, and Rayalaseema. In the present work, speech samples are collected from native speakers of the different Telugu accents for both training and testing. Mel frequency cepstral coefficient (MFCC) features are extracted from each training and test sample. A Gaussian mixture model (GMM) is then used to classify the speech by accent. The proposed system identifies the region a speaker belongs to, based on accent, with an overall efficiency of 91%.
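
The classification stage described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the number of mixture components, the covariance type, the MFCC dimension, or the decision rule, so the diagonal covariances, two components, 13-dimensional vectors, and the "highest average log-likelihood wins" rule below are all assumptions. MFCC extraction is not performed; synthetic Gaussian vectors stand in for MFCC frames, and every name (`DiagGMM`, `classify`, `synthetic_frames`, `demo`) is hypothetical.

```python
import math
import random


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


class DiagGMM:
    """Diagonal-covariance Gaussian mixture fitted with plain EM (assumed setup)."""

    def __init__(self, n_components=2, n_iter=20, seed=0, var_floor=1e-3):
        self.k, self.n_iter = n_components, n_iter
        self.seed, self.var_floor = seed, var_floor

    def _component_logs(self, x):
        # log(weight_j) + log N(x | mean_j, var_j) for every component j
        return [
            math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(self.weights, self.means, self.vars)
        ]

    def fit(self, frames):
        rng = random.Random(self.seed)
        n, dim = len(frames), len(frames[0])
        # Initialise means from random frames, variances from the global variance.
        self.means = [list(f) for f in rng.sample(frames, self.k)]
        g_mean = [sum(f[d] for f in frames) / n for d in range(dim)]
        g_var = [
            max(sum((f[d] - g_mean[d]) ** 2 for f in frames) / n, self.var_floor)
            for d in range(dim)
        ]
        self.vars = [list(g_var) for _ in range(self.k)]
        self.weights = [1.0 / self.k] * self.k
        for _ in range(self.n_iter):
            # E-step: responsibility of each component for each frame.
            resp = []
            for x in frames:
                logs = self._component_logs(x)
                norm = logsumexp(logs)
                resp.append([math.exp(v - norm) for v in logs])
            # M-step: re-estimate weights, means, and variances.
            for j in range(self.k):
                n_j = sum(r[j] for r in resp) + 1e-10  # guard against empty components
                self.weights[j] = n_j / n
                for d in range(dim):
                    mu = sum(r[j] * x[d] for r, x in zip(resp, frames)) / n_j
                    var = sum(r[j] * (x[d] - mu) ** 2 for r, x in zip(resp, frames)) / n_j
                    self.means[j][d] = mu
                    self.vars[j][d] = max(var, self.var_floor)
        return self

    def avg_log_likelihood(self, frames):
        """Mean per-frame log-likelihood of an utterance under this model."""
        return sum(logsumexp(self._component_logs(x)) for x in frames) / len(frames)


def classify(frames, models):
    """Return the accent whose GMM gives the utterance the highest score."""
    return max(models, key=lambda accent: models[accent].avg_log_likelihood(frames))


def synthetic_frames(center, n_frames, rng):
    """Stand-in for MFCC extraction: unit-variance Gaussian noise around a centre."""
    return [[rng.gauss(c, 1.0) for c in center] for _ in range(n_frames)]


def demo():
    rng = random.Random(42)
    dim = 13  # assumed vector size; the abstract does not state one
    centers = {  # artificial, well-separated class centres (not real accent data)
        "coastal Andhra": [0.0] * dim,
        "Telangana": [3.0] * dim,
        "Rayalaseema": [-3.0] * dim,
    }
    # One GMM per accent, trained only on that accent's frames.
    models = {a: DiagGMM().fit(synthetic_frames(c, 150, rng)) for a, c in centers.items()}
    # Classify one held-out synthetic utterance per accent.
    return {a: classify(synthetic_frames(c, 40, rng), models) for a, c in centers.items()}


print(demo())
```

With real recordings, `synthetic_frames` would be replaced by an MFCC front end that turns each utterance into a sequence of coefficient vectors; the per-accent training and the comparison of scores across models would stay the same. The synthetic classes here are deliberately far apart, so the sketch says nothing about the accuracy reported in the abstract.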

Metadata
Title
MFCC-GMM based accent recognition system for Telugu speech signals
Authors
Kasiprasad Mannepalli
Panyam Narahari Sastry
Maloji Suman
Publication date
30.11.2015
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 1/2016
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-015-9328-y
