
2017 | OriginalPaper | Chapter

Language Identification Using Deep Convolutional Recurrent Neural Networks

Authors: Christian Bartz, Tom Herold, Haojin Yang, Christoph Meinel

Published in: Neural Information Processing

Publisher: Springer International Publishing


Abstract

Language Identification (LID) systems classify the spoken language of a given audio sample and are typically the first step in many spoken language processing tasks, such as Automatic Speech Recognition (ASR). Without automatic language detection, speech utterances cannot be parsed correctly and grammar rules cannot be applied, causing subsequent speech recognition steps to fail. We propose an LID system that solves the problem in the image domain rather than the audio domain. We use a hybrid Convolutional Recurrent Neural Network (CRNN) that operates on spectrogram images of the provided audio snippets. In extensive experiments, we show that our model is applicable to a range of noisy scenarios and can easily be extended to previously unknown languages while maintaining its classification accuracy. We release our code and a large-scale training set for LID systems to the community.
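To illustrate what treating audio "in the image domain" can look like, the following is a minimal sketch that turns a waveform into a grayscale spectrogram image with NumPy. It is not the authors' released code: the function name `spectrogram_image`, the frame length, the hop size, the Hann window, and the log scaling are all assumptions chosen for illustration, and the CRNN that would consume such an image is not shown.

```python
import numpy as np


def spectrogram_image(signal, frame_len=256, hop=128):
    """Return a (freq_bins, time_frames) uint8 log-magnitude spectrogram.

    Hypothetical helper: frame_len, hop, and the scaling are illustrative
    assumptions, not values taken from the paper.
    """
    signal = np.asarray(signal, dtype=np.float64)
    if signal.size < frame_len:
        raise ValueError("signal is shorter than one frame")

    # Slice the waveform into overlapping, Hann-windowed frames.
    n_frames = 1 + (signal.size - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )

    # Magnitude spectrum of each frame: shape (n_frames, frame_len // 2 + 1).
    magnitude = np.abs(np.fft.rfft(frames, axis=1))

    # Compress the dynamic range, then rescale to 8-bit grayscale.
    log_mag = np.log1p(magnitude)
    span = log_mag.max() - log_mag.min()
    if span > 0:
        scaled = (log_mag - log_mag.min()) / span
    else:
        scaled = np.zeros_like(log_mag)

    # Transpose so rows are frequency bins and columns are time frames.
    return (scaled * 255).astype(np.uint8).T


# One second of a 1 kHz test tone sampled at 8 kHz (synthetic stand-in for speech).
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)
image = spectrogram_image(tone)
```

In a pipeline of the kind the abstract describes, an array like `image` would be the input to the convolutional layers, with the time axis later read as a sequence by the recurrent part.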


Metadata

Title: Language Identification Using Deep Convolutional Recurrent Neural Networks
Authors: Christian Bartz, Tom Herold, Haojin Yang, Christoph Meinel
Copyright Year: 2017
DOI: https://doi.org/10.1007/978-3-319-70136-3_93
