Neurocomputing

Volume 214, 19 November 2016, Pages 483-494

Multi-view clustering with extreme learning machine

https://doi.org/10.1016/j.neucom.2016.06.035

Abstract

Nowadays, data often have multiple representations, and a good feature representation usually leads to good clustering performance. Existing multi-view clustering methods generally integrate multiple sources of complementary information to achieve better clustering performance than relying on a single view. However, these methods usually focus on combining information rather than improving the feature representation capability of each view. As a relatively new method, the extreme learning machine (ELM) offers excellent feature representation capability, easy parameter selection, and promising performance in various clustering tasks. This paper proposes a novel multi-view clustering framework with ELM to further improve clustering performance, and implements three algorithms based on this framework. In this framework, the normalized features of each individual view are mapped onto a higher-dimensional feature space by the ELM random mapping, and unsupervised multi-view clustering is then performed in this feature space. To the best of our knowledge, this is the first work on multi-view clustering with ELM. The proposed algorithms are empirically compared with numerous baseline methods on five real-world datasets to demonstrate their effectiveness. The results indicate that the proposed algorithms yield superior clustering performance compared with several state-of-the-art multi-view clustering methods from the recent literature.

Introduction

Clustering [1] partitions a dataset into groups or clusters such that objects within a cluster are internally coherent, while objects in different clusters are not. In the existing literature, classical clustering algorithms such as k-means [2], spectral clustering [3], Gaussian mixture models [4], and fuzzy c-means [5] have been widely used in data mining [6], pattern recognition [7], information retrieval [8], etc. Among these algorithms, k-means and spectral clustering have achieved great success due to their simplicity and high efficiency. K-means groups the data into a set of clusters by minimizing the squared error between the empirical mean of each cluster and the points assigned to it. In contrast, spectral clustering exploits the spectral properties of the graph Laplacian constructed from the data points, with edge weights denoting pairwise similarities.
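To make the contrast concrete, the following minimal Python sketch (our illustration, not code from the paper; it assumes scikit-learn and a synthetic two-moons dataset) runs both algorithms side by side:

import numpy as np
from sklearn.datasets import make_moons
from sklearn.cluster import KMeans, SpectralClustering

# Synthetic two-moons data: two non-convex clusters (illustrative only).
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

# k-means minimizes the squared error between each cluster mean and its
# assigned points, so it favors compact, convex clusters.
km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Spectral clustering builds a similarity graph and clusters in the
# eigenspace of its Laplacian, so it can recover the two moons.
sc_labels = SpectralClustering(n_clusters=2, affinity="nearest_neighbors",
                               n_neighbors=10, random_state=0).fit_predict(X)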

In many practical applications, data have multiple representations or data sources [9], which usually contain complementary and compatible information that is helpful for clustering. For example, in the clustering task on Oxford Flower17 [10], flowers can be represented by different features, such as color, shape, and texture. Each representation is referred to as a particular view, and each view is beneficial for clustering. Excellent clustering performance can be achieved by combining these views, which is known as multi-view learning [11].

Many efforts have been devoted to extending multi-view learning to clustering. These works can be roughly classified into three categories. Algorithms in the first category directly incorporate multi-view features into a common representation, before or during the clustering process, by optimizing certain loss functions [12], [13], [14], [15], [16], [17], [18]. For example, the methods in [12], [13] incorporate multi-view features to construct the loss functions for clustering. Algorithms in the second category first project a multi-view dataset onto a common low-dimensional subspace and then apply clustering in this subspace; a representative method in this category is canonical correlation analysis (CCA) based multi-view clustering [19], which uses CCA to project the high-dimensional data onto a lower-dimensional subspace, as sketched below. Algorithms in the third category, called late fusion or late integration [20], [21], first learn a clustering solution from each individual view and then combine these intermediate outputs into a final consensus clustering solution. The algorithms proposed in this paper fall into the first category.
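As an illustration of the second category, here is a minimal sketch in the spirit of the CCA-based approach [19] (our Python illustration using scikit-learn; the data shapes, component count, and cluster count are placeholder assumptions, not the paper's settings):

import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.cluster import KMeans

rng = np.random.RandomState(0)
X1 = rng.randn(500, 40)  # view 1: 500 samples, 40 features (placeholder)
X2 = rng.randn(500, 30)  # view 2: same 500 samples, 30 features (placeholder)

# Project both views onto a shared low-dimensional subspace with CCA...
Z1, Z2 = CCA(n_components=5).fit_transform(X1, X2)

# ...then cluster in that subspace (here, on the concatenated projections).
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(
    np.hstack([Z1, Z2]))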

It is well acknowledged that data become more linearly separable after being mapped onto a high-dimensional feature space by a nonlinear transformation. Thus, the feature representation capability of the input data can be improved in a high-dimensional feature space by various nonlinear transformations, such as Mercer kernel methods [22]. However, the above-mentioned multi-view clustering methods generally focus on the combination of complementary information rather than on further improving the feature representation capability. In clustering tasks, constructing a suitable representation is crucial, yet obtaining one typically requires substantial experience and a great deal of work on feature extraction and parameter selection. As a relatively new method, ELM [23] offers excellent feature representation capability and easy parameter selection. ELM has been widely used in various machine learning tasks, such as regression and classification [24], [25], [26], [27], [28], [29], [30]. Recently, ELM has been extended to clustering. A straightforward approach is to conduct clustering in the embedding space obtained by ELM [31], [32]. Benefiting from ELM's universal approximation capability [33], the data structure becomes much simpler after the ELM feature transformation, so this approach is easy to implement and computationally efficient. Moreover, it only requires an ELM feature mapping, which is simpler than kernel-based feature mapping methods and deep neural networks. The US-ELM algorithm [34] proposed by Huang et al. captures the manifold structure of the data in the output weights of ELM and has been shown to perform well on datasets with manifold structure. Zhang et al. [35] proposed to iteratively train an ELM classifier for clustering, with some heuristics introduced to avoid trivial solutions during the iterative training procedure. Kasun et al. [36] introduced a clustering algorithm that projects the data along the output weights learned by an ELM auto-encoder; because the output weights capture the variance information of the data, this algorithm has been shown to reduce within-cluster variance while preserving between-cluster variance. Huang et al. [29] extended ELM to discriminative clustering and proposed three novel clustering methods based on weighted ELM, Fisher's linear discriminant analysis, and kernel k-means, respectively. However, all of the methods mentioned above operate on single-view datasets.
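The ELM feature mapping used by this line of work is particularly simple: a single hidden layer h(x) = g(Wx + b) whose weights W and biases b are drawn at random and never trained. A minimal NumPy sketch follows (the hidden-layer size, sigmoid activation, and uniform initialization are illustrative assumptions, not the paper's exact settings):

import numpy as np
from sklearn.cluster import KMeans

def elm_feature_map(X, n_hidden=1000, seed=0):
    """Random ELM feature mapping H = g(XW + b); W and b stay untrained."""
    rng = np.random.RandomState(seed)
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))  # sigmoid activation

# Clustering directly in the ELM embedding space, as in [31], [32].
X = np.random.RandomState(1).randn(200, 20)  # placeholder single-view data
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(
    elm_feature_map(X))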

This paper proposes a multi-view clustering framework with ELM to further improve clustering performance. In this framework, datasets are projected onto a high-dimensional space to obtain a better feature representation, and the multi-view clustering algorithms are then performed in this high-dimensional feature space. We implement three multi-view clustering algorithms using this framework: ELM-based co-regularized spectral clustering (Co-Reg-ELM), ELM-based robust multi-view spectral clustering (RMSC-ELM), and ELM-based multi-modal spectral clustering (MMSC-ELM). These algorithms are tested on a wide range of datasets.
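A high-level sketch of the framework is given below (our Python illustration; for brevity, the per-view combination step is replaced by a simple average of per-view similarity graphs, whereas Co-Reg-ELM, RMSC-ELM, and MMSC-ELM each optimize their own multi-view objective):

import numpy as np
from sklearn.preprocessing import normalize
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.cluster import SpectralClustering

def elm_map(X, n_hidden=1000, seed=0):
    # Random ELM hidden layer: H = sigmoid(XW + b), with W and b untrained.
    rng = np.random.RandomState(seed)
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))

def multiview_elm_cluster(views, n_clusters):
    # Steps 1-2: normalize each view, then apply a per-view ELM random mapping.
    H = [elm_map(normalize(X), seed=v) for v, X in enumerate(views)]
    # Step 3 (simplified): fuse the per-view similarity graphs and run
    # spectral clustering on the fused graph.
    S = sum(rbf_kernel(h) for h in H) / len(H)
    return SpectralClustering(n_clusters=n_clusters, affinity="precomputed",
                              random_state=0).fit_predict(S)

# Two placeholder views of the same 200 samples.
rng = np.random.RandomState(0)
views = [rng.randn(200, 30), rng.randn(200, 50)]
labels = multiview_elm_cluster(views, n_clusters=3)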

The rest of this paper is organized as follows. Section 2 provides a brief overview of multi-view clustering and ELM. Section 3 presents the proposed multi-view clustering method. Section 4 shows the extensive experimental results. Finally, Section 5 concludes this paper.

Section snippets

Related work

Before introducing the proposed method for multi-view clustering, this section reviews the clustering methods and the ELM theory that have inspired this work.

Multi-view clustering with ELM

This section describes the proposed approach, which combines ELM with multi-view clustering algorithms. First, the framework of the proposed method is given. Next, three clustering algorithms are implemented using this framework. A discussion of our work is then provided.

Experimental results

In this section, the proposed method is evaluated on five benchmark datasets, and its performance is compared with that of several state-of-the-art clustering algorithms. All experiments are carried out on a computer with a 3.6 GHz Intel Xeon E5-1620 CPU and 48 GB of memory, running Matlab R2014a (64-bit).

Conclusion

This paper proposed a framework for multi-view clustering with ELM. The framework first normalizes the original features of each individual view, and then projects the normalized features onto a higher-dimensional feature space by ELM. After that, unsupervised multi-view clustering is performed in this feature space. Three typical multi-view clustering algorithms are implemented with our framework. Benefiting from the better feature representation provided by the ELM mapping, the clustering results are superior to those of several state-of-the-art multi-view clustering methods.

Acknowledgement

This work was supported by the National Natural Science Foundation of China under Nos. U1435219, 61402507, 61303070, 61403405 and the National High Technology Research and Development Program of China under No. 2012AA012706.

References (46)

  • G.-B. Huang et al., Extreme learning machine: theory and applications, Neurocomputing (2006)
  • Y. Lan et al., Constructive hidden nodes selection of extreme learning machine for regression, Neurocomputing (2010)
  • G.-B. Huang et al., Convex incremental extreme learning machine, Neurocomputing (2007)
  • J. Goldberger et al., A hierarchical clustering algorithm based on the Hungarian method, Pattern Recognit. Lett. (2008)
  • J.A. Hartigan, Clustering Algorithms (1975)
  • S.P. Lloyd, Least squares quantization in PCM, IEEE Trans. Inf. Theory (1982)
  • U. von Luxburg, A tutorial on spectral clustering, Stat. Comput. (2007)
  • J.D. Banfield et al., Model-based Gaussian and non-Gaussian clustering, Biometrics (1993)
  • P. Berkhin, A survey of clustering data mining techniques
  • A. Baraldi et al., A survey of fuzzy clustering algorithms for pattern recognition. Part I, IEEE Trans. Syst. Man Cybern. Part B: Cybern. (1999)
  • A. Blum, T. Mitchell, Combining labeled and unlabeled data with co-training, in: Proceedings of the Eleventh Annual...
  • P. Gehler, S. Nowozin, On feature combination for multiclass object classification, in: Proceedings of the 2009 IEEE...
  • C. Xu, D. Tao, C. Xu, A survey on multi-view learning, arXiv preprint...

Qiang Wang received his B.S. degree in computer science and technology from Jilin University, Changchun, China, in 2011, and his M.S. degree in computer science and technology from National University of Defense Technology, Changsha, China, in 2013. He is now a Ph.D. candidate at National University of Defense Technology. His research interests include high performance computing, information security, and machine learning.

Yong Dou, born in 1966, is a professor and Ph.D. supervisor. He received his B.S., M.S., and Ph.D. degrees in computer science and technology from National University of Defense Technology in 1989, 1992, and 1995. His research interests include high performance computer architecture, reconfigurable computing, machine learning, and bioinformatics. He is a member of the IEEE and the ACM.

Xinwang Liu is a research assistant at the National Laboratory for Parallel and Distributed Processing, National University of Defense Technology, Changsha, China. He received his B.S. degree in computer science and technology from Chongqing Technology and Business University, Chongqing, in 2006, and his M.S. and Ph.D. degrees in computer science and technology from National University of Defense Technology in 2008 and 2013. From October 2010, he spent one year as a visiting student in Engineering & Computer Science at the Australian National University, supported by the China Scholarship Council. From November 2011 to October 2012, he was a visiting student at the School of Computer Science and Software Engineering, University of Wollongong, Australia. His research interests focus on kernel learning and feature selection.

Qi Lv is a Ph.D. candidate at National University of Defense Technology. He received his B.S. degree in computer science and technology from Tsinghua University, Beijing, in 2009, and his M.S. degree in computer science and technology from National University of Defense Technology in 2011. His research interests include high performance computer architecture, machine learning, and remote sensing image processing.

Shijie Li is a Ph.D. candidate at National University of Defense Technology. He received his B.S. and M.S. degrees in computer science and technology from National University of Defense Technology in 2012 and 2014. His research interests include high performance computer architecture, parallel computing, and machine learning.
