
Analysis and modeling of acoustic information for automatic dialect classification

Authors: S. S. Agrawal, Aruna Jain, Shweta Sinha

Published in: International Journal of Speech Technology, Issue 3/2016 (22 July 2016)


Abstract

A primary challenge in automatic speech recognition is to understand individual differences in spoken language and to build acoustic models that represent them. A speaker's age and gender, and a speaking style shaped by their dialect, are among the sources of these differences. This work investigates dialectal differences by applying analysis of variance (ANOVA) to acoustic features of vowel sounds, such as formant frequencies, pitch, pitch slope, duration and intensity. The paper discusses methods for capturing dialect-specific knowledge from the vocal tract and prosodic information in speech, which can then be used for automatic dialect identification. A kernel-based support vector machine (SVM) is used to measure how well the acoustic features discriminate between dialects. Among spectral features, shifted delta cepstral (SDC) coefficients combined with Mel-frequency cepstral coefficients (MFCCs) give a recognition rate of 66.97 %. A combination of prosodic features performs better, with a classification score of 74 %. When the spectral and prosodic feature sets are combined, the model reaches a classification accuracy of 88.77 %. The proposed model is also compared with human perception of the dialects. The work is based on four dialects of Hindi, one of the world's major languages.
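The abstract relies on two measurable building blocks: a one-way analysis of variance over per-dialect vowel measurements, and SDC coefficients appended to MFCC frames. The Python sketch below illustrates both under stated assumptions; it is not the authors' implementation. The function names are hypothetical, the MFCC frames are assumed to be computed elsewhere, and the SDC parameters (d = 1, P = 3, k = 7, a configuration commonly used in language-identification work) and the edge handling (clamping out-of-range frame indices) are assumptions, since the abstract does not state them.

```python
from typing import List, Sequence


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> float:
    """One-way ANOVA F statistic: between-group over within-group mean square.

    Each inner sequence holds one feature (e.g. a vowel's F1 in Hz) measured
    for speakers of one dialect.
    """
    k = len(groups)                                  # number of dialects
    n_total = sum(len(g) for g in groups)            # number of observations
    means = [sum(g) / len(g) for g in groups]        # per-dialect means
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Spread of the dialect means around the grand mean, weighted by group size
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Spread of each observation around its own dialect mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def shifted_delta_cepstra(frames: Sequence[Sequence[float]],
                          d: int = 1, p: int = 3, k: int = 7) -> List[List[float]]:
    """Shifted delta cepstra for each frame (hypothetical helper, assumed d, P, k).

    For frame t, block i (0 <= i < k) is c[t + i*P + d] - c[t + i*P - d];
    the k blocks are concatenated, giving N*k values for N coefficients.
    Indices outside the utterance are clamped to the first/last frame.
    """
    last = len(frames) - 1

    def frame(i: int) -> Sequence[float]:
        return frames[min(max(i, 0), last)]          # edge clamping (assumption)

    out: List[List[float]] = []
    for t in range(len(frames)):
        vec: List[float] = []
        for i in range(k):
            centre = t + i * p
            ahead, behind = frame(centre + d), frame(centre - d)
            vec.extend(a - b for a, b in zip(ahead, behind))
        out.append(vec)
    return out


def append_sdc(mfcc: Sequence[Sequence[float]], **sdc_params: int) -> List[List[float]]:
    """Concatenate each MFCC frame with its SDC vector (feature-level fusion)."""
    sdc = shifted_delta_cepstra(mfcc, **sdc_params)
    return [list(c) + s for c, s in zip(mfcc, sdc)]


if __name__ == "__main__":
    # Toy data, not from the paper: three dialect groups of one measurement each.
    print(one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # 27.0
    toy_mfcc = [[float(t), 0.5 * t] for t in range(10)]        # 10 frames, N = 2
    print(len(append_sdc(toy_mfcc)[0]))                        # 2 + 2*7 = 16
```

A large F value for a feature such as the first formant indicates that its variation between dialects is large relative to its variation within a dialect. The fused MFCC+SDC vectors are the kind of input a classifier such as the kernel SVM mentioned in the abstract would then be trained on.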



Title: Analysis and modeling of acoustic information for automatic dialect classification
Authors: S. S. Agrawal, Aruna Jain, Shweta Sinha
Publication date: 22 July 2016
Publisher: Springer US
Published in: International Journal of Speech Technology / Issue 3/2016
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI: https://doi.org/10.1007/s10772-016-9351-7

