2022 | OriginalPaper | Book chapter

Deep Neural Networks for Spoken Language Identification in Short Utterances

Authors: Shweta Sinha, S. S. Agrawal

Published in: Artificial Intelligence and Speech Technology

Publisher: Springer International Publishing

Abstract

This work presents language identification (LID) on small segments created from short-duration utterances. For low-resourced languages, the availability of data is itself a challenge, and the paper applies deep neural networks (DNNs) to this setting. It presents a feed-forward deep neural network (FF-DNN) for language identification that uses acoustic features of short utterances, and evaluates two DNN topologies on the LID task. The results are compared with a well-established i-vector system, which represents the acoustic characteristics of speech with MFCC-SDC features (Mel-frequency cepstral coefficients with shifted delta cepstra) and uses a support vector machine (SVM) classifier as its back end. Both systems are applied to identifying Hindi and Punjabi, two widely spoken Indian languages. The speech utterances are divided into segments of 5 s, 10 s, 20 s and 35 s duration. Performance is measured as the equal error rate (EER, %). For short segments, the DNN system achieves a relative improvement of 3%, and over all utterances it reduces the average error rate by 2%.
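All results in the abstract are reported as equal error rate (EER), the operating point at which the false-acceptance rate (FAR) equals the false-rejection rate (FRR). The following is a minimal sketch, not the authors' evaluation code: it assumes one detection score per trial for a given target language (higher means more likely the target), sweeps the decision threshold over the observed scores, and reports the midpoint of FAR and FRR where they are closest. The function name `equal_error_rate` and the example scores are hypothetical.

```python
from typing import Sequence, Tuple


def equal_error_rate(target_scores: Sequence[float],
                     nontarget_scores: Sequence[float]) -> Tuple[float, float]:
    """Approximate the EER (in %) by sweeping a decision threshold.

    A trial is accepted when score >= threshold.
    FRR: fraction of target-language trials that are rejected.
    FAR: fraction of non-target trials that are accepted.
    Returns (eer_percent, threshold) at the threshold where |FAR - FRR| is smallest.
    """
    if not target_scores or not nontarget_scores:
        raise ValueError("need at least one target and one non-target score")

    # Candidate thresholds: every observed score, plus +inf ("reject everything").
    candidates = sorted(set(target_scores) | set(nontarget_scores)) + [float("inf")]

    best_gap, best_eer, best_thr = float("inf"), 1.0, candidates[0]
    for thr in candidates:
        frr = sum(s < thr for s in target_scores) / len(target_scores)
        far = sum(s >= thr for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Where FAR and FRR do not meet exactly, report their midpoint.
            best_gap, best_eer, best_thr = gap, (far + frr) / 2.0, thr
    return 100.0 * best_eer, best_thr


# Hypothetical scores from a "Hindi" detector: Hindi trials vs. Punjabi trials.
hindi_trials = [0.9, 0.8, 0.7, 0.4]
punjabi_trials = [0.6, 0.5, 0.3, 0.2]
eer, thr = equal_error_rate(hindi_trials, punjabi_trials)
print(f"EER = {eer:.1f}% at threshold {thr}")  # EER = 25.0% at threshold 0.6
```

For a two-language task such as Hindi vs. Punjabi, the EER could be computed per target language (or on pooled trials); with few trials the threshold sweep only approximates the true FAR/FRR crossing, and interpolating along the DET curve gives a smoother estimate.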

Title: Deep Neural Networks for Spoken Language Identification in Short Utterances
Authors: Shweta Sinha, S. S. Agrawal
Publisher: Springer International Publishing
Book: Artificial Intelligence and Speech Technology
Print ISBN: 978-3-030-95710-0
Electronic ISBN: 978-3-030-95711-7
Copyright year: 2022
DOI: https://doi.org/10.1007/978-3-030-95711-7_24
