
2019 | Original Paper | Book Chapter

Enhanced LSTM with Batch Normalization

Authors: Li-Na Wang, Guoqiang Zhong, Shoujun Yan, Junyu Dong, Kaizhu Huang

Published in: Neural Information Processing

Publisher: Springer International Publishing


Abstract

Recurrent neural networks (RNNs) are powerful models for sequence learning. However, training RNNs is complicated by the internal covariate shift problem: the input distribution at each iteration changes during training as the parameters are updated. Although some work has applied batch normalization (BN) to alleviate this problem in long short-term memory (LSTM) networks, BN has not previously been applied to the update of the LSTM cell. In this paper, to tackle the internal covariate shift problem in LSTM, we introduce a method that successfully integrates BN into the update of the LSTM cell. Experimental results on two benchmark data sets, MNIST and Fashion-MNIST, show that the proposed method, enhanced LSTM with BN (eLSTM-BN), converges faster than LSTM and its variants while obtaining higher classification accuracy on sequence learning tasks.
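The core idea, applying batch normalization inside the cell update rather than only to the input-to-hidden transformations, can be sketched in plain NumPy. This is an illustrative sketch under assumptions made here, not the authors' exact eLSTM-BN formulation: the function names, the grouping of gate parameters into one matrix, and the choice to normalize both the gate pre-activations and the new cell state are assumptions for illustration.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # Normalize each feature over the batch dimension, then scale and shift.
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bn_lstm_step(x, h_prev, c_prev, W, U, b, gammas, betas):
    """One LSTM step with BN on the gate pre-activations and on the
    updated cell state (a sketch of the idea, not the paper's exact
    equations). W stacks the four gate input weights; U the recurrent
    weights; gammas/betas hold (input, recurrent, cell) BN parameters."""
    gx, gh, gc = gammas
    bx, bh, bc = betas
    # Normalize input-to-hidden and hidden-to-hidden terms separately.
    z = batch_norm(x @ W, gx, bx) + batch_norm(h_prev @ U, gh, bh) + b
    i, f, o, g = np.split(z, 4, axis=1)  # input, forget, output, candidate
    c = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)
    # BN applied to the cell update itself -- the step prior work omitted.
    h = sigmoid(o) * np.tanh(batch_norm(c, gc, bc))
    return h, c
```

With hidden size H, `W` has shape (input_dim, 4H), `U` has shape (H, 4H), and the cell-state BN parameters have shape (H,); at test time the batch statistics would be replaced by running averages, as in standard BN.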


Metadata
Title
Enhanced LSTM with Batch Normalization
Authors
Li-Na Wang
Guoqiang Zhong
Shoujun Yan
Junyu Dong
Kaizhu Huang
Copyright Year
2019
DOI
https://doi.org/10.1007/978-3-030-36708-4_61