
19.10.2017 | Original Article

A comparative performance analysis of different activation functions in LSTM networks for classification

Authors: Amir Farzad, Hoda Mashayekhi, Hamid Hassanpour

Published in: Neural Computing and Applications | Issue 7/2019


Abstract

In recurrent neural networks such as the long short-term memory (LSTM) network, the sigmoid and hyperbolic tangent functions are commonly used as activation functions in the network units. Other activation functions developed for neural networks have not been thoroughly analyzed in LSTMs. While many researchers have adopted LSTM networks for classification tasks, no comprehensive study is available on the choice of activation functions for the gates in these networks. In this paper, we compare 23 different activation functions in a basic LSTM network with a single hidden layer. The performance of the different activation functions, and of different numbers of LSTM blocks in the hidden layer, is analyzed for classification of records in the IMDB, Movie Review, and MNIST data sets. The quantitative results on all data sets demonstrate that the lowest average error is achieved with the Elliott activation function and its modifications. Specifically, this family of functions yields better results than the sigmoid activation function, which is popular in LSTM networks.
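The Elliott function is commonly given as f(x) = x / (1 + |x|), a bounded, inexpensive alternative to tanh; the rescaled variant 0.5·x / (1 + |x|) + 0.5 maps into (0, 1) like the sigmoid, so it can stand in for the gate activations. The following minimal sketch (not the authors' code; the vocabulary size, layer widths, and optimizer are illustrative assumptions) shows how such a swap could be wired into a single-hidden-layer Keras LSTM of the kind studied in the paper:

```python
import tensorflow as tf

def elliott(x):
    # Symmetric Elliott function, bounded in (-1, 1): a cheaper tanh-like curve.
    return x / (1.0 + tf.abs(x))

def elliott_gate(x):
    # Rescaled Elliott variant bounded in (0, 1), used here in place of the
    # sigmoid on the input, forget, and output gates.
    return 0.5 * x / (1.0 + tf.abs(x)) + 0.5

# Hypothetical single-hidden-layer model for IMDB-style binary sentiment
# classification; sizes are illustrative, not the paper's settings.
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=20000, output_dim=128),
    tf.keras.layers.LSTM(
        units=128,
        activation=elliott,                 # replaces tanh on cell input/output
        recurrent_activation=elliott_gate,  # replaces sigmoid on the gates
    ),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="rmsprop", loss="binary_crossentropy",
              metrics=["accuracy"])
```

One plausible reading of the Elliott family's advantage is its polynomial rather than exponential saturation: the derivative of x / (1 + |x|) is 1 / (1 + |x|)², which decays more slowly in the tails than the sigmoid's derivative, potentially easing gradient flow through the gates.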


Metadata
Title
A comparative performance analysis of different activation functions in LSTM networks for classification
Authors
Amir Farzad
Hoda Mashayekhi
Hamid Hassanpour
Publication date
19.10.2017
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 7/2019
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-017-3210-6
