
2015 | Original Paper | Book Chapter

12. Representation Sharing and Transfer in Deep Neural Networks

Authors: Dong Yu, Li Deng

Published in: Automatic Speech Recognition

Publisher: Springer London


Abstract

We have emphasized in the previous chapters that in deep neural networks (DNNs) each hidden layer is a new representation of the raw input to the DNN, and that the representations at higher layers are more abstract than those at lower layers. In this chapter, we show that these feature representations can be shared and transferred across related tasks through techniques such as multitask and transfer learning. We use multilingual and crosslingual speech recognition, which employs a shared-hidden-layer DNN architecture, as the main example to demonstrate these techniques.
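To make the shared-hidden-layer idea concrete, below is a minimal sketch, assuming PyTorch. The hidden layers act as a language-universal feature extractor shared by all languages, while each language keeps its own softmax output layer; the class name, layer sizes, language codes, and senone counts here are illustrative assumptions, not values from the chapter.

```python
import torch
import torch.nn as nn

class SharedHiddenLayerDNN(nn.Module):
    """Multilingual DNN: hidden layers are shared across languages;
    each language has its own output (senone classifier) layer."""

    def __init__(self, input_dim, hidden_dim, num_hidden, senones_per_lang):
        super().__init__()
        layers, dim = [], input_dim
        for _ in range(num_hidden):
            layers += [nn.Linear(dim, hidden_dim), nn.Sigmoid()]
            dim = hidden_dim
        self.shared = nn.Sequential(*layers)  # language-universal feature extractor
        # One language-specific softmax layer per language (logits here;
        # softmax is folded into the cross-entropy loss below).
        self.heads = nn.ModuleDict(
            {lang: nn.Linear(hidden_dim, n) for lang, n in senones_per_lang.items()}
        )

    def forward(self, x, lang):
        h = self.shared(x)          # shared representation
        return self.heads[lang](h)  # language-specific logits

# Multitask training: each minibatch comes from one language and updates
# the shared layers plus that language's own output layer.
# Dimensions are hypothetical (e.g., 440 = 11 frames x 40 filterbank features).
model = SharedHiddenLayerDNN(input_dim=440, hidden_dim=2048, num_hidden=5,
                             senones_per_lang={"FRA": 3000, "DEU": 3000})
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

for lang in ["FRA", "DEU"]:
    feats = torch.randn(32, 440)             # stand-in acoustic features
    labels = torch.randint(0, 3000, (32,))   # stand-in senone targets
    opt.zero_grad()
    loss_fn(model(feats, lang), labels).backward()
    opt.step()
```

For crosslingual transfer to a new low-resource language, the same shared layers would be reused as-is (or lightly fine-tuned) while a fresh output layer is attached and trained on the new language's data.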

Metadata
Title
Representation Sharing and Transfer in Deep Neural Networks
Authors
Dong Yu
Li Deng
Copyright Year
2015
Publisher
Springer London
DOI
https://doi.org/10.1007/978-1-4471-5779-3_12
