
24-06-2019 | Original Article

Approximate empirical kernel map-based iterative extreme learning machine for clustering

Authors: Chuangquan Chen, Chi-Man Vong, Pak-Kin Wong, Keng-Iam Tai

Published in: Neural Computing and Applications | Issue 12/2020

Abstract

Maximum margin clustering (MMC) is a recent approach that applies the margin-maximization principle of supervised learning to unsupervised learning, aiming to partition the data into clusters with high discrimination. Recently, the extreme learning machine (ELM) has been applied to MMC (called iterative ELM clustering, or ELMCIter), which maximizes the data discrimination by iteratively training a weighted extreme learning machine (W-ELM). In this way, ELMCIter achieves a substantial reduction in training time and provides a unified model for both binary and multi-class clustering. However, ELMCIter has two issues: (1) the random feature mappings it adopts cannot reliably produce high-quality discriminative features for clustering, and (2) it usually requires a large model because its performance depends on the number of hidden nodes, which makes training relatively slow. In this paper, the hidden layer in ELMCIter is encoded by an approximate empirical kernel map (AEKM) rather than random feature mappings in order to resolve these two issues. The AEKM is generated from a low-rank approximation of the kernel matrix, which is derived from the input data through a kernel function. The proposed method, called iterative AEKM for clustering (AEKMCIter), makes two contributions: (1) the AEKM extracts discriminative and robust features from the kernel matrix, so AEKMCIter consistently achieves better performance, and (2) AEKMCIter requires only a very small number of hidden nodes, yielding low memory consumption and fast training. Detailed experiments verify the effectiveness and efficiency of our approach. As an illustration, on the MNIST10 dataset, AEKMCIter improves clustering accuracy over ELMCIter by up to 5%, while reducing the training time to about 1/7 and the memory consumption (i.e., the number of hidden nodes) to about 1/20 of those required by ELMCIter.
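The abstract outlines two mechanisms: building an AEKM from a low-rank (Nyström-style) approximation of the kernel matrix, and iteratively retraining a classifier on its own cluster assignments. The sketch below is not the authors' implementation; it is a minimal illustration of both ideas, assuming an RBF kernel, uniformly sampled landmark points, and a plain ridge-regression output layer as a stand-in for the paper's weighted ELM. All names and parameters here (aekm_map, iterative_clustering, n_landmarks, gamma, reg, n_iters) are hypothetical.

```python
# Minimal sketch (illustrative assumptions, not the paper's code) of:
# (1) an approximate empirical kernel map from a Nystrom-style low-rank
#     approximation of the RBF kernel matrix, and
# (2) an iterative clustering loop that alternates between fitting a
#     classifier on the current labels and re-labelling the data.
import numpy as np

def rbf_kernel(A, B, gamma):
    """Pairwise RBF kernel k(a, b) = exp(-gamma * ||a - b||^2)."""
    sq = (np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-gamma * sq)

def aekm_map(X, n_landmarks=50, gamma=1.0, seed=0):
    """Approximate empirical kernel map from a low-rank kernel approximation.

    Sample landmark points, eigendecompose the small landmark kernel
    matrix W, and map each sample x to K(x, landmarks) @ U @ S^{-1/2},
    so inner products in the mapped space approximate k(x, x')."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), n_landmarks, replace=False)
    L = X[idx]
    W = rbf_kernel(L, L, gamma)            # n_landmarks x n_landmarks
    s, U = np.linalg.eigh(W)
    keep = s > 1e-10                       # drop near-zero eigenvalues
    M = U[:, keep] / np.sqrt(s[keep])      # W^{-1/2} factor
    return rbf_kernel(X, L, gamma) @ M     # n_samples x rank features

def iterative_clustering(H, k=2, n_iters=20, reg=1e-2, seed=0):
    """Alternate between fitting a regularized least-squares classifier
    (a simple stand-in for the paper's W-ELM) on the current labels and
    re-assigning each sample to its highest-scoring cluster."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, k, len(H))         # random initial partition
    for _ in range(n_iters):
        T = np.eye(k)[y]                   # one-hot targets
        # Ridge solution: beta = (H^T H + reg*I)^{-1} H^T T
        beta = np.linalg.solve(H.T @ H + reg * np.eye(H.shape[1]), H.T @ T)
        y_new = np.argmax(H @ beta, axis=1)
        if np.array_equal(y_new, y):       # partition stabilized
            break
        y = y_new
    return y

# Toy usage: two well-separated Gaussian blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])
labels = iterative_clustering(aekm_map(X, n_landmarks=30, gamma=0.5), k=2)
print(np.bincount(labels))
```

Note that in the paper's formulation the weighted ELM reweights samples (W-ELM was proposed for imbalance learning), which guards against degenerate partitions where most points collapse into one cluster; the unweighted ridge stand-in above omits that safeguard for brevity.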

Metadata
Title
Approximate empirical kernel map-based iterative extreme learning machine for clustering
Authors
Chuangquan Chen
Chi-Man Vong
Pak-Kin Wong
Keng-Iam Tai
Publication date
24-06-2019
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 12/2020
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-019-04295-6
