
MFCC-GMM based accent recognition system for Telugu speech signals

Authors: Kasiprasad Mannepalli, Panyam Narahari Sastry, Maloji Suman

Published in: International Journal of Speech Technology | Issue 1/2016

Published: 30 November 2015


Abstract

Speech processing is an important research area that includes speaker recognition, speech synthesis, speech coding, and speech noise reduction. Many languages have different speaking styles, called accents or dialects. Identifying the accent before speech recognition can improve the performance of speech recognition systems, and accent recognition becomes crucial when a language has many accents. Telugu is an Indian language widely spoken in the southern part of India. It has several accents, the main ones being Coastal Andhra, Telangana, and Rayalaseema. In the present work, speech samples are collected from native speakers of the different Telugu accents for both training and testing. Mel frequency cepstral coefficient (MFCC) features are extracted from every training and test sample. A Gaussian mixture model (GMM) is then used to classify the speech by accent. The overall efficiency of the proposed system in recognizing the region a speaker belongs to, based on accent, is 91%.
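The abstract describes the MFCC-GMM pipeline only at a high level. As an illustration of how this kind of classifier typically works, the following is a minimal sketch, not the authors' implementation. It assumes that per-frame feature vectors have already been extracted, and uses synthetic 2-D vectors as stand-ins for real MFCC frames. The one-GMM-per-accent design, the maximum-likelihood decision rule, the diagonal covariances, the number of mixture components, the iteration count, and the variance floor are all assumptions chosen for the example. The function names (`train_gmm`, `classify`, `toy_frames`) are hypothetical.

```python
import math
import random


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def frame_log_likelihood(x, gmm):
    """log p(x | gmm) for one frame; gmm is a list of (weight, mean, var)."""
    return logsumexp([math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm])


def train_gmm(frames, n_components=2, n_iter=20, seed=0, var_floor=1e-3):
    """Fit a diagonal-covariance GMM with plain EM (toy implementation)."""
    rng = random.Random(seed)
    dim = len(frames[0])
    # Initialise means from random frames, with unit variances and equal weights.
    gmm = [(1.0 / n_components, list(m), [1.0] * dim)
           for m in rng.sample(frames, n_components)]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each frame.
        resp = []
        for x in frames:
            logs = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
            norm = logsumexp(logs)
            resp.append([math.exp(l - norm) for l in logs])
        # M-step: re-estimate weights, means, and floored variances.
        new_gmm = []
        for k in range(n_components):
            nk = sum(r[k] for r in resp) + 1e-10
            mean = [sum(r[k] * x[d] for r, x in zip(resp, frames)) / nk
                    for d in range(dim)]
            var = [max(var_floor,
                       sum(r[k] * (x[d] - mean[d]) ** 2
                           for r, x in zip(resp, frames)) / nk)
                   for d in range(dim)]
            new_gmm.append((nk / len(frames), mean, var))
        gmm = new_gmm
    return gmm


def classify(frames, models):
    """Return the accent whose GMM gives the highest total log-likelihood."""
    scores = {accent: sum(frame_log_likelihood(x, gmm) for x in frames)
              for accent, gmm in models.items()}
    return max(scores, key=scores.get)


def toy_frames(center, n, rng):
    """Synthetic stand-in for MFCC frames: unit-variance noise around a center."""
    return [[rng.gauss(c, 1.0) for c in center] for _ in range(n)]


# Hypothetical cluster centers, one per accent named in the abstract.
rng = random.Random(42)
centers = {
    "Coastal Andhra": (0.0, 0.0),
    "Telangana": (5.0, 5.0),
    "Rayalaseema": (-5.0, 5.0),
}
models = {accent: train_gmm(toy_frames(c, 200, rng))
          for accent, c in centers.items()}
test_utterance = toy_frames(centers["Telangana"], 50, rng)
print(classify(test_utterance, models))
```

In a real system the synthetic vectors would be replaced by MFCC frames computed from recorded speech, and each accent model would be trained on the pooled frames of that accent's training speakers. Summing frame log-likelihoods treats the frames as independent, which is a common simplification in GMM-based classifiers.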



Title: MFCC-GMM based accent recognition system for Telugu speech signals
Authors: Kasiprasad Mannepalli, Panyam Narahari Sastry, Maloji Suman
Publication date: 30 November 2015
Publisher: Springer US
Published in: International Journal of Speech Technology / Issue 1/2016
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI: https://doi.org/10.1007/s10772-015-9328-y
