
19.10.2017 | Original Article

A comparative performance analysis of different activation functions in LSTM networks for classification

Authors: Amir Farzad, Hoda Mashayekhi, Hamid Hassanpour

Published in: Neural Computing and Applications | Issue 7/2019


Abstract

In recurrent neural networks such as the long short-term memory (LSTM) network, the sigmoid and hyperbolic tangent functions are commonly used as activation functions in the network units. Other activation functions developed for neural networks have not been thoroughly analyzed in LSTMs. While many researchers have adopted LSTM networks for classification tasks, no comprehensive study is available on the choice of activation functions for the gates in these networks. In this paper, we compare 23 different activation functions in a basic LSTM network with a single hidden layer. The performance of the different activation functions, and of different numbers of LSTM blocks in the hidden layer, is analyzed for classification of records in the IMDB, Movie Review, and MNIST data sets. The quantitative results on all data sets demonstrate that the lowest average error is achieved with the Elliott activation function and its modifications. Specifically, this family of functions yields better results than the sigmoid activation function, which is popular in LSTM networks.
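The Elliott function is commonly given as f(x) = x / (1 + |x|), a bounded, inexpensive alternative to tanh; the rescaled variant 0.5·x / (1 + |x|) + 0.5 maps into (0, 1) like the sigmoid, so it can stand in for the gate activations. The following minimal sketch (not the authors' code; the vocabulary size, layer widths, and optimizer are illustrative assumptions) shows how such a swap could be wired into a single-hidden-layer Keras LSTM of the kind studied in the paper:

```python
import tensorflow as tf

def elliott(x):
    # Symmetric Elliott function, bounded in (-1, 1): a cheaper tanh-like curve.
    return x / (1.0 + tf.abs(x))

def elliott_gate(x):
    # Rescaled Elliott variant bounded in (0, 1), used here in place of the
    # sigmoid on the input, forget, and output gates.
    return 0.5 * x / (1.0 + tf.abs(x)) + 0.5

# Hypothetical single-hidden-layer model for IMDB-style binary sentiment
# classification; sizes are illustrative, not the paper's settings.
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=20000, output_dim=128),
    tf.keras.layers.LSTM(
        units=128,
        activation=elliott,                 # replaces tanh on cell input/output
        recurrent_activation=elliott_gate,  # replaces sigmoid on the gates
    ),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="rmsprop", loss="binary_crossentropy",
              metrics=["accuracy"])
```

One plausible reading of the Elliott family's advantage is its polynomial rather than exponential saturation: the derivative of x / (1 + |x|) is 1 / (1 + |x|)², which decays more slowly in the tails than the sigmoid's derivative, potentially easing gradient flow through the gates.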


Metadata
Title
A comparative performance analysis of different activation functions in LSTM networks for classification
Authors
Amir Farzad
Hoda Mashayekhi
Hamid Hassanpour
Publication date
19.10.2017
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 7/2019
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-017-3210-6
