2019 | OriginalPaper | Book chapter

The Representation of Speech in Deep Neural Networks

Authors: Odette Scharenborg, Nikki van der Gouw, Martha Larson, Elena Marchiori

Published in: MultiMedia Modeling

Publisher: Springer International Publishing

Abstract

In this paper, we investigate the connection between how people understand speech and how speech is understood by a deep neural network. A naïve, general feed-forward deep neural network was trained for the task of vowel/consonant classification. Subsequently, the representations of the speech signal in the different hidden layers of the DNN were visualized. The visualizations allow us to study the distance between the representations of different types of input frames and observe the clustering structures formed by these representations. In the different visualizations, the input frames were labeled with different linguistic categories: sounds in the same phoneme class, sounds with the same manner of articulation, and sounds with the same place of articulation. We investigate whether the DNN clusters speech representations in a way that corresponds to these linguistic categories and observe evidence that the DNN does indeed appear to learn structures that humans use to understand speech without being explicitly trained to do so.
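
The abstract describes inspecting visualizations to see whether hidden-layer representations of input frames group by linguistic category. As a rough numerical companion to that idea, the sketch below compares the mean distance between frames that share a label with the mean distance between frames that do not. This is an illustrative assumption, not the procedure used in the chapter: the function names (`euclidean`, `mean_distances`), the toy 2-D activation values, and the "vowel"/"consonant" labels are all hypothetical placeholders for real hidden-layer activations.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length activation vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_distances(activations, labels):
    """Return (mean within-label distance, mean between-label distance).

    Assumes at least one same-label pair and one different-label pair exist;
    otherwise a division by zero would occur.
    """
    within, between = [], []
    for (a, la), (b, lb) in combinations(zip(activations, labels), 2):
        (within if la == lb else between).append(euclidean(a, b))
    return sum(within) / len(within), sum(between) / len(between)


# Hypothetical 2-D hidden-layer activations for six frames (toy values,
# not data from the chapter).
activations = [
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0],  # frames labelled "vowel"
    [5.0, 5.0], [5.0, 6.0], [6.0, 5.0],  # frames labelled "consonant"
]
labels = ["vowel"] * 3 + ["consonant"] * 3

within, between = mean_distances(activations, labels)
ratio = between / within  # a ratio above 1 suggests frames group by label
print(f"within={within:.3f} between={between:.3f} ratio={ratio:.2f}")
```

The same function could be applied with a different label set (for example manner or place of articulation) to the same activations, which mirrors how the abstract relabels the same input frames with different linguistic categories.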

Metadata
Title
The Representation of Speech in Deep Neural Networks
Authors
Odette Scharenborg
Nikki van der Gouw
Martha Larson
Elena Marchiori
Copyright year
2019
DOI
https://doi.org/10.1007/978-3-030-05716-9_16