Published in: International Journal of Speech Technology, Issue 3/2016

22 July 2016

Analysis and modeling of acoustic information for automatic dialect classification

Authors: S. S. Agrawal, Aruna Jain, Shweta Sinha

Abstract

A primary challenge in automatic speech recognition is to understand and build acoustic models that represent individual differences in spoken language. A speaker's age, gender, and speaking style, which is itself influenced by dialect, are among the reasons for these differences. This work investigates dialectal differences through an analysis of variance of acoustic features of vowel sounds: formant frequencies, pitch, pitch slope, duration, and intensity. It discusses methods for capturing dialect-specific knowledge from vocal tract and prosodic information extracted from speech, which can then be used for automatic dialect identification. A kernel-based support vector machine is used to measure the dialect-discriminating ability of the acoustic features. Among spectral features, shifted delta cepstral coefficients combined with Mel-frequency cepstral coefficients give a recognition performance of 66.97 %. A combination of prosodic features performs better, with a classification score of 74 %. The model is further evaluated on the combined spectral and prosodic feature set, where it achieves a classification accuracy of 88.77 %. The proposed model is compared with human perception of dialects. The work is based on four dialects of Hindi, one of the world's major languages.
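The shifted delta cepstral (SDC) features named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the SDC configuration, so the function name `shifted_delta_cepstra`, the defaults `d=1`, `p=3`, `k=7` (a commonly used setting), and the clamping of frame indices at the utterance edges are all assumptions made here for illustration. The input is assumed to be a list of per-frame cepstral vectors (for example MFCCs) computed elsewhere.

```python
from typing import List, Sequence


def shifted_delta_cepstra(
    cepstra: Sequence[Sequence[float]],
    d: int = 1,  # half-width of each delta (assumed default)
    p: int = 3,  # shift between consecutive delta blocks (assumed default)
    k: int = 7,  # number of delta blocks stacked per frame (assumed default)
) -> List[List[float]]:
    """Return one SDC vector per input frame.

    For frame t, block i (i = 0..k-1) is c[t + i*p + d] - c[t + i*p - d].
    The k blocks are concatenated, so each output vector has k * N values,
    where N is the number of cepstral coefficients per frame.
    Indices outside the utterance are clamped to the first or last frame
    (an edge-handling choice made for this sketch).
    """
    n_frames = len(cepstra)
    if n_frames == 0:
        return []
    last = n_frames - 1

    def frame(index: int) -> Sequence[float]:
        # Clamp out-of-range indices to the valid frame range.
        return cepstra[min(max(index, 0), last)]

    sdc: List[List[float]] = []
    for t in range(n_frames):
        stacked: List[float] = []
        for i in range(k):
            ahead = frame(t + i * p + d)
            behind = frame(t + i * p - d)
            stacked.extend(a - b for a, b in zip(ahead, behind))
        sdc.append(stacked)
    return sdc
```

Using SDC "along with" the Mel-frequency cepstral coefficients would then amount to concatenating each frame's static cepstral vector with its SDC vector before passing the result to the classifier. With 7 static coefficients and `k=7`, for instance, that gives 7 + 49 = 56 values per frame.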

Metadata
Publisher
Springer US
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-016-9351-7
