Extreme learning machines: a survey

  • Original Article
International Journal of Machine Learning and Cybernetics

Abstract

Computational intelligence techniques have been used in a wide range of applications. Among the numerous computational intelligence techniques, neural networks and support vector machines (SVMs) have played the dominant roles. However, both neural networks and SVMs are known to face challenging issues such as (1) slow learning speed, (2) the need for tedious human intervention, and/or (3) poor computational scalability. The extreme learning machine (ELM), an emergent technology that overcomes some of the challenges faced by other techniques, has recently attracted the attention of a growing number of researchers. ELM works for generalized single-hidden layer feedforward networks (SLFNs). The essence of ELM is that the hidden layer of SLFNs need not be tuned. Compared with traditional computational intelligence techniques, ELM provides better generalization performance at a much faster learning speed and with the least human intervention. This paper gives a survey of ELM and its variants, especially (1) the batch learning mode of ELM, (2) fully complex ELM, (3) online sequential ELM, (4) incremental ELM, and (5) ensembles of ELM.
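
As an illustration of the statement that the hidden layer of an SLFN need not be tuned, the following is a minimal sketch of batch-mode ELM training, not the authors' reference implementation. It assumes sigmoid hidden nodes, hidden parameters drawn uniformly from [-1, 1], and output weights obtained as the least-squares solution through the Moore-Penrose pseudoinverse; the abstract itself does not spell out these choices. The names elm_fit, elm_predict, n_hidden, and the sine-fitting toy data are hypothetical and chosen only for this example.

```python
import numpy as np

def elm_fit(X, T, n_hidden=50, seed=0):
    """Fit a single-hidden-layer network in one pass (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    # Hidden-layer parameters are drawn at random and never updated afterwards.
    # Assumption: uniform sampling in [-1, 1]; other distributions are possible.
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    # Hidden-layer output matrix H (assumption: sigmoid activation).
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    # Only the output weights are learned, as a linear least-squares problem.
    beta = np.linalg.pinv(H) @ T
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Apply the fixed random hidden layer, then the learned output weights."""
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta

# Toy regression problem (hypothetical data): approximate sin(x) on [-3, 3].
X = np.linspace(-3.0, 3.0, 200).reshape(-1, 1)
T = np.sin(X)
W, b, beta = elm_fit(X, T, n_hidden=40)
mse = float(np.mean((elm_predict(X, W, b, beta) - T) ** 2))
print("training MSE:", mse)
```

Because no iterative tuning of W or b takes place, training reduces to one matrix pseudoinverse, which is the source of the speed advantage the abstract describes. Generalization on held-out data would need to be checked separately; this sketch reports training error only.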

Acknowledgments

This research was sponsored by a grant from the Academic Research Fund (AcRF) Tier 1 under project no. RG 22/08 (M52040128).

Author information

Corresponding author

Correspondence to Guang-Bin Huang.

About this article

Cite this article

Huang, GB., Wang, D.H. & Lan, Y. Extreme learning machines: a survey. Int. J. Mach. Learn. & Cyber. 2, 107–122 (2011). https://doi.org/10.1007/s13042-011-0019-y

  • DOI: https://doi.org/10.1007/s13042-011-0019-y
