
2022 | Original paper | Book chapter

Deep Neural Networks for Spoken Language Identification in Short Utterances

Authors: Shweta Sinha, S. S. Agrawal

Published in: Artificial Intelligence and Speech Technology

Publisher: Springer International Publishing


Abstract

This work addresses language identification (LID) on short segments cut from short-duration utterances. For low-resourced languages, the availability of data is itself a challenge, and the paper applies deep neural networks (DNNs) to this low-resource setting. It presents a feed-forward deep neural network (FF-DNN) for language identification using acoustic features of short utterances. Two DNN topologies are evaluated on the LID task. The experimental results are compared against a well-established i-vector baseline, which uses MFCC-SDC features to represent the acoustic characteristics of speech and a support vector machine (SVM) back end as the classifier. Both systems are applied to the identification of Hindi and Punjabi, two widely spoken Indian languages. The speech utterances are divided into short segments of 5 s, 10 s, 20 s and 35 s. System performance is measured as equal error rate (EER, %). For short segments the DNN system achieves a relative improvement of 3%, and the average error rate over all utterances decreases by 2% with the DNN.
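The abstract reports results as EER (%). As a minimal sketch of how that metric is commonly computed, the Python function below sweeps a decision threshold over detection scores and returns the point where the false-acceptance and false-rejection rates are closest. The function name, the score lists, and the threshold sweep are illustrative assumptions; the chapter's own scoring procedure is not given in the abstract.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Return the equal error rate in percent.

    target_scores: scores for trials where the hypothesised language is correct.
    nontarget_scores: scores for trials where it is wrong.
    Higher scores are assumed to mean "more likely the hypothesised language".
    """
    if not target_scores or not nontarget_scores:
        raise ValueError("both score lists must be non-empty")

    # Candidate thresholds: every observed score, plus one above all of them
    # so that the "reject everything" operating point is also considered.
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    thresholds.append(thresholds[-1] + 1.0)

    best_gap = None
    eer = None
    for t in thresholds:
        # False rejection: a target trial scoring below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: a non-target trial scoring at or above it.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap = gap
            eer = (far + frr) / 2.0
    return 100.0 * eer


# Hypothetical scores, not data from the chapter.
hindi_trials = [0.9, 0.6, 0.4, 0.8]      # target trials
punjabi_trials = [0.1, 0.5, 0.7, 0.3]    # non-target trials
print(equal_error_rate(hindi_trials, punjabi_trials))
```

With finite score lists the two error rates rarely cross exactly, so this sketch reports their average at the closest threshold; evaluation toolkits often interpolate instead.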

Metadata
Title
Deep Neural Networks for Spoken Language Identification in Short Utterances
Authors
Shweta Sinha
S. S. Agrawal
Copyright year
2022
DOI
https://doi.org/10.1007/978-3-030-95711-7_24
