
Pattern Recognition

Volume 46, Issue 9, September 2013, Pages 2562-2575

Novel soft subspace clustering with multi-objective evolutionary approach for high-dimensional data

https://doi.org/10.1016/j.patcog.2013.02.005

Abstract

Many conventional soft subspace clustering techniques merge several criteria into a single objective to improve performance; however, the weighting parameters that balance these criteria become important yet difficult to set. In this paper, a novel soft subspace clustering with a multi-objective evolutionary approach (MOEASSC) is proposed to address this problem. The method treats two types of criteria as multiple objectives and optimizes them simultaneously by using a modified multi-objective evolutionary algorithm with new encoding and operators. An indicator called the projection similarity validity index (PSVIndex) is designed to select the best solution and the number of clusters. Experiments on a variety of datasets demonstrate the usefulness of MOEASSC and PSVIndex, and show that the algorithm is insensitive to its parameters and scalable to large datasets.

Highlights

• A soft subspace clustering algorithm with a multi-objective evolutionary approach is proposed.
• A multi-objective evolutionary approach with several new characteristics is designed.
• An index is designed to identify the best solution and the number of clusters.
• Experimental results are presented on various kinds of datasets.
• The algorithm is insensitive to its parameters and scalable to large datasets.

Introduction

Clustering is an important and practical technique in data mining, image processing and biomedicine. Despite many successful applications [1], [2], [3], major challenges remain, especially in clustering high-dimensional data. In a high-dimensional feature space, conventional clustering techniques often fail to obtain satisfactory results because the distances between any pair of objects become almost equal, so conventional distance-based clustering algorithms no longer discriminate well. In addition, finding clusters in the entire space is impractical because of the many irrelevant and redundant dimensions: clusters are often defined only on subsets of the dimensions, and different clusters may lie in distinct subspaces.

Subspace generation techniques can be used to cluster this type of dataset effectively. For unsupervised learning, these techniques comprise four families of methods: feature subset selection, feature weight learning, hard subspace clustering and soft subspace clustering. For feature selection, two approaches have been proposed: filter and wrapper. The filter approach evaluates, ranks and selects features based on criteria such as mutual information [4], the chi-square test [5], pairwise constraints [6] and the Laplacian score [7]. The filter method is computationally cheap, but obtaining a small yet good feature subset and determining a proper cut-off point for the rankings are difficult. The wrapper method "wraps" the feature selection around a learning algorithm. In [8], [9], [10], a multi-objective optimization algorithm, an ensemble method and a new criterion that is unbiased with respect to both the number of clusters and the number of features were used, respectively, to design learning algorithms for feature subset selection. In [11], Tsang et al. started with an empty set and added features using a greedy algorithm; this procedure is equivalent to searching for a path with the fewest columns in the extension matrix based on a similarity measure. The wrapper method is computationally expensive but usually outperforms the filter method in accuracy.

Feature weight learning is a generalization of feature subset selection that uses real-valued numbers to describe the relevance of each feature to the clusters. Yeung et al. [12] obtained feature weights through a learning process that used gradient descent to minimize a similarity-based function. Many feature weight learning methods are realized within K-means clustering by replacing the distance measure with a weighted one [13], [14], [15]; the difference lies in how the weights are adjusted. Hung et al. [13] used a bootstrapping approach to obtain statistical variations in the data for evaluating feature weights. Tsai et al. [14] updated the feature weights by simultaneously minimizing the within-cluster separations and maximizing the between-cluster separations.

Both conventional feature subset selection and feature weight learning are global methods [16], which assume that the same feature selection result is valid for all clusters. By contrast, the subspace clustering method used in this paper is a local method, in which each cluster has its own weight vector. The global method is a special case of the local method in which all the weight vectors are identical, so the local method has a more general ability to generate subspaces. The relationship among the four methods is shown in Fig. 1.

Hard subspace clustering assumes that the membership between dimensions and clusters is binary, so each cluster has an exact subspace; it also assumes that all relevant dimensions are equally important. More specifically, hard subspace clustering can be divided into bottom-up and top-down subspace search methods. Representatives of the bottom-up method include CLIQUE [17] and ENCLUS [18], which are based on a static grid, and CBF [19], DOC [20] and MAFIA [21], which are based on an adaptive grid; their performance depends heavily on the grid size and the density threshold parameters. Examples of top-down subspace clustering are PROCLUS [22], ORCLUS [23], FINDIT [24] and FLOC [25]. A disadvantage of this approach is that the number of clusters and the subspace size must be predetermined, yet they are difficult to set. Details of hard subspace clustering are given by Parsons et al. [26].

In many situations, the description of the data in each feature dimension is distorted by disturbances. Dimensions that hard subspace clustering treats as irrelevant may actually be meaningful but heavily contaminated by noise. In addition, the relevant dimensions usually contribute to a cluster to different degrees. Hard subspace clustering therefore ignores much useful information, because it detects exact subspaces and treats all relevant dimensions as equally important. By contrast, soft subspace clustering assigns different weight values to the dimensions of each cluster, building a more reasonable, non-binary relationship between dimensions and clusters.

The minimization objective function of a soft subspace clustering algorithm is formulated in Eq. (1):

$$J(V,U,W)=J_{In}(V,U,W)+\sum_{i}\gamma_{i}\,J_{Add_{i}}(V,U,W)\qquad(1)$$

where $J_{In}$ represents the within-cluster dispersion, $J_{Add_{i}}$ is an additional term that contains information for evaluating the clustering result from another perspective, $\gamma_{i}$ ($\gamma_{i}>0$) is a weighting parameter that controls the strength of $J_{Add_{i}}$, and $V$, $U$ and $W$ are the cluster-center, partition and weight matrices, respectively.

$J_{In}$ usually has the following formulation:

$$J_{In}(V,U,W)=\sum_{i=1}^{C}\sum_{j=1}^{N}u_{ij}^{\eta}\sum_{k=1}^{D}w_{ik}^{\beta}\,dis(a_{jk},v_{ik})\qquad(2)$$

$$\text{s.t.}\quad 0\le u_{ij}\le 1,\;\sum_{i=1}^{C}u_{ij}=1,\qquad 0\le w_{ik}\le 1,\;\sum_{k=1}^{D}w_{ik}=1,$$

where $C$ is the number of clusters, $N$ is the number of objects, $D$ is the number of feature dimensions, $u_{ij}$ is the membership of the $j$th object in the $i$th cluster, $w_{ik}$ is the weight of the $k$th dimension for the $i$th cluster, $dis(\cdot)$ is a certain distance measure, $a_{jk}$ is the value of the $j$th object in the $k$th dimension, $v_{ik}$ is the value of the $i$th cluster center in the $k$th dimension, and $\eta$ ($\eta\ge 1$) and $\beta$ ($\beta\ge 1$) are two parameters.
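To make the notation concrete, the following minimal sketch (in Python with NumPy; not the authors' implementation) evaluates Eq. (2) with $dis(a_{jk},v_{ik})$ taken as the per-dimension squared Euclidean distance; the array names A, U, W, V and the exponents eta and beta simply mirror the symbols defined above.

```python
import numpy as np

def within_cluster_dispersion(A, U, W, V, eta=2.0, beta=2.0):
    """Sketch of J_In in Eq. (2) with dis(a_jk, v_ik) = (a_jk - v_ik)^2.

    A : (N, D) data matrix, U : (C, N) fuzzy partition matrix,
    W : (C, D) per-cluster feature weights, V : (C, D) cluster centers.
    """
    C, N = U.shape
    J_in = 0.0
    for i in range(C):
        # per-dimension squared distances of every object to center i: shape (N, D)
        d = (A - V[i]) ** 2
        # weight each dimension by w_ik^beta and sum over dimensions: shape (N,)
        weighted = d @ (W[i] ** beta)
        # weight each object by its membership u_ij^eta and accumulate
        J_in += np.sum((U[i] ** eta) * weighted)
    return J_in
```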

The earlier soft subspace clustering algorithms did not contain JAdd [27], [28]; recently, however, an increasing number of algorithms have adopted additional terms in the design of the objective function to further improve performance. Jing et al. [29] proposed the fuzzy weighting K-means algorithm (FWKM), in which JAdd is the sum of all the fuzzy weights and γ is the average dispersion of the entire dataset; its main purpose is to prevent the weights from becoming infinite. Fuzzy subspace clustering (FSC) is a similar algorithm suggested by Gan et al. [30]. In [31] and [32], both entropy weighting K-means (EWKM) and locally adaptive clustering (LAC) used the weight entropy as an additional term, which represents the certainty of the dimensions in identifying a cluster and favours selecting more dimensions so that more complete subspaces are obtained. Another subspace clustering algorithm, FG-K-means [33], also used the concept of entropy: it considered not only the weight entropy of the dimensions but also divided the dimensions into feature groups and defined a weight entropy over the groups. Furthermore, in enhanced soft subspace clustering (ESSC) [34], the additional term is composed of the weight entropy and the between-cluster information.

Although JAdd can improve the performance of soft subspace clustering algorithms by considering the clustering result from different perspectives, setting the parameter $\gamma$ becomes a problem. When the weight matrix is updated in the iterative process, every weight value depends on $\gamma$. For example, the weights in EWKM are updated by Eq. (3):

$$w_{ik}=\exp(-D_{ik}/\gamma)\Big/\sum_{l=1}^{D}\exp(-D_{il}/\gamma)\qquad(3)$$

where $D_{ik}$ represents the dispersion of the data values in the $k$th dimension over the objects in the $i$th cluster. When $\gamma$ is small, $w_{ik}$ is sensitive to the value of $D_{ik}$, and the clustering results can become unstable. When $\gamma$ is too large, the influence of $D_{ik}$ on $w_{ik}$ is negligible, so all the weights tend to become equal; clustering then effectively takes place in the entire space, which conflicts with the hypothesis that clusters lie in different subspaces. Another shortcoming is that the design of Eq. (1) must ensure that the update formulas for V, U and W can be expressed explicitly, which limits how the objective function can be constructed.
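To illustrate this sensitivity, the sketch below evaluates the weight update of Eq. (3) for a single cluster (assuming the usual negative exponent of EWKM). Because the update is a softmax over the negated dispersions, a small $\gamma$ concentrates nearly all weight on the least-dispersed dimension, while a large $\gamma$ drives all weights toward 1/D.

```python
import numpy as np

def ewkm_weights(D_i, gamma):
    """Weight update of Eq. (3) for one cluster:
    w_ik = exp(-D_ik / gamma) / sum_l exp(-D_il / gamma).

    D_i : (D,) dispersion of each dimension within cluster i.
    """
    z = -np.asarray(D_i, dtype=float) / gamma
    z -= z.max()                     # numerical stability; result unchanged
    e = np.exp(z)
    return e / e.sum()

# The same dispersions under three settings of gamma:
D_i = np.array([0.2, 0.5, 1.0, 4.0])
print(ewkm_weights(D_i, 0.1))    # small gamma: almost all weight on one dimension
print(ewkm_weights(D_i, 1.0))    # moderate gamma: graded weights
print(ewkm_weights(D_i, 100.0))  # large gamma: weights approach 1/D (full space)
```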

Multi-objective evolutionary clustering has recently been shown to perform well because it evaluates the quality of a clustering solution against a diverse set of criteria [35]. If JIn and JAdd are treated as two separate objectives and a multi-objective evolutionary algorithm is used to realize the soft subspace clustering, the problem of setting the value of γ disappears, and the requirement that the update formulas be derivable explicitly is removed because an optimization method is adopted.

Based on this idea, a novel multi-objective-evolutionary-approach-based soft subspace clustering (MOEASSC) is presented. The approach optimizes two minimization objective functions simultaneously with a proposed multi-objective evolutionary approach. To reduce the difficulty of identifying the subspace of each cluster, the cluster memberships of the objects and the number of clusters, a two-step method is suggested, sketched below. First, the cluster number is fixed and MOEASSC is used to obtain a set of trade-off solutions with distinct centers and weights but the same cluster number; an indicator called the projection similarity validity index (PSVIndex) is designed to select the best of them. Second, to obtain the best solution across different cluster numbers, a proper number is identified by combining PSVIndex with the gap-statistic method [36]. In the experiments, MOEASSC is used to cluster three synthetic datasets, ten UCI datasets and three bearing-fault datasets. The results demonstrate that the algorithm consistently achieves better clustering accuracy than other subspace clustering algorithms such as EWKM, ESSC and FSC.
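At a high level, the two-step procedure can be sketched as follows; run_moeassc, psv_index and choose_k_by_gap are hypothetical callables standing in for the components detailed in Sections 3 and 4, and the assumption that a larger PSVIndex value is better is made only for the sake of the sketch.

```python
def select_clustering(A, run_moeassc, psv_index, choose_k_by_gap, k_min=2, k_max=10):
    """High-level sketch of the two-step selection described above.

    run_moeassc(A, n_clusters) -> iterable of candidate solutions (Pareto set),
    psv_index(A, solution)    -> scalar quality score (assumed: larger is better),
    choose_k_by_gap(A, best_per_k) -> chosen cluster number.
    All three are placeholders, not the paper's exact procedures.
    """
    best_per_k = {}
    for k in range(k_min, k_max + 1):
        # Step 1: with the cluster number fixed, obtain a Pareto set of trade-off solutions
        pareto_set = run_moeassc(A, n_clusters=k)
        # PSVIndex selects the single best solution for this k
        best_per_k[k] = max(pareto_set, key=lambda s: psv_index(A, s))
    # Step 2: choose the cluster number by combining PSVIndex with the gap-statistic method
    best_k = choose_k_by_gap(A, best_per_k)
    return best_k, best_per_k[best_k]
```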

This paper is organized as follows: Section 2 reviews related work on multi-objective evolutionary clustering. Section 3 presents a detailed description of MOEASSC. Section 4 designs a method for selecting the best solution from the Pareto set and identifying the number of clusters. Section 5 presents the experiments on various kinds of datasets and discusses robustness and scalability. Section 6 concludes the paper.

Section snippets

Related work

Because clustering is an unsupervised learning process, it is usually necessary to design validity criteria under certain assumptions to assess the quality of a clustering result. Traditional clustering algorithms always have a strong bias towards a single criterion; therefore, they are not suitable for a wide variety of datasets. The multi-objective evolutionary clustering algorithm regards these criteria as separate objectives to be optimized, which is the same as considering many

Proposed algorithm: MOEASSC

The design of an appropriate multi-objective evolutionary algorithm is key to the clustering. In MOEASSC, NSGA-II [43] is adopted as the basis of the proposed algorithm, but a series of changes is applied to enable the algorithm to cope better with subspace clustering. NSGA-II has been successfully used in many multi-objective evolutionary clustering algorithms [40], [41], [42]. Compared with other algorithms, it has a lower time complexity in selecting nondominated solutions and better

Method of selecting the best solution and the cluster number

For a fixed cluster number, a method of choosing the best solution is proposed based on the similarity among objects over the relevant dimensions. The method is described as follows: first, before clustering, each dimension is divided into a number of segments (the number of segments is denoted totalSeg). Then, all objects are projected onto each dimension; $Project_{ik}=j$ means that the $i$th object is projected onto the $j$th interval of the $k$th dimension, and $j$ is called the projection coordinate, as shown in Fig. 4.
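A minimal sketch of this projection step is given below, assuming equal-width segments over the observed range of each dimension (the paper's exact discretization may differ).

```python
import numpy as np

def projection_coordinates(A, total_seg):
    """Return Project[i, k] = j, the index of the segment of dimension k that
    object i falls into, with each dimension split into total_seg equal-width segments.

    A : (N, D) data matrix.
    """
    mins = A.min(axis=0)
    widths = (A.max(axis=0) - mins) / total_seg
    widths[widths == 0] = 1.0                   # guard against constant dimensions
    proj = np.floor((A - mins) / widths).astype(int)
    return np.clip(proj, 0, total_seg - 1)      # keep the maximum value in the last segment
```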

Experiments and analysis

In this section, the results of MOEASSC and the other algorithms on numerous datasets are presented for comparison. The other algorithms include EWKM [31], ESSC [34], FSC [30] and SOEASSC. SOEASSC is a soft subspace clustering algorithm with a single-objective evolutionary approach whose objective is obtained by weighting the two objectives of MOEASSC; its genetic operators and local search strategy are the same as those of MOEASSC. The objective functions of EWKM, ESSC, FSC and

Conclusions

In this paper, a new soft subspace clustering algorithm called MOEASSC was proposed to address the problem of clustering high-dimensional data. In contrast to conventional soft subspace clustering, MOEASSC minimizes two objective functions simultaneously: the within-cluster dispersion and a second objective combining the negative weight entropy with the between-cluster separation. These functions are optimized by using a special multi-objective evolutionary approach, which involves mixed encoding, a local

Conflict of interest

The authors declare that they have no conflict of interest.

Acknowledgments

The authors gratefully acknowledge the reviewers for their comments. This work was jointly supported by the Fundamental Research Funds for the Central University and Scientific Research Project of Guangdong Province under Grant 2011A080401008.


References (50)

  • P.J. Rousseeuw, Silhouettes: a graphical aid to the interpretation and validation of cluster analysis, Journal of Computational and Applied Mathematics (1987).
  • R. Ng et al., CLARANS: a method for clustering objects for spatial data mining, IEEE Transactions on Knowledge and Data Engineering (2002).
  • P.C.H. Ma et al., An evolutionary clustering algorithm for gene expression microarray data analysis, IEEE Transactions on Evolutionary Computation (2006).
  • X. Jin et al., Machine learning techniques and chi-square feature selection for cancer classification using SAGE gene expression profiles, Lecture Notes in Computer Science (2006).
  • X. He et al., Laplacian score for feature selection (2005).
  • J. Handl et al., Feature subset selection in unsupervised learning via multiobjective optimization, International Journal of Computational Intelligence Research (2006).
  • M. Breaban et al., A unifying criterion for unsupervised clustering and feature selection, Pattern Recognition (2010).
  • E.C.C. Tsang et al., OFFSS: optimal fuzzy-valued feature subset selection, IEEE Transactions on Fuzzy Systems (2003).
  • D.S. Yeung et al., Improving performance of similarity-based clustering by feature weight learning, IEEE Transactions on Pattern Analysis and Machine Intelligence (2002).
  • R.C. de Amorim, Learning Feature Weights for K-means Clustering Using the Minkowski Metric, Ph.D. Thesis, University of...
  • R. Agrawal, J. Gehrke, D. Gunopulos, P. Raghavan, Automatic subspace clustering of high dimensional data for data...
  • C.H. Cheng, A.W. Fu, Y. Zhang, Entropy-based subspace clustering for mining numerical data, in: Proceedings of the...
  • J.W. Chang, D.S. Jin, A new cell-based clustering method for large, high-dimensional data in data mining applications,...
  • C.M. Procopiuc, M. Jones, P.K. Agarwal, T.M. Murali, A Monte Carlo algorithm for fast projective clustering, in:...
  • S. Goil, H. Nagesh, A. Choudhary, MAFIA: Efficient and Scalable Subspace Clustering for Very Large Data Sets, Technical...

Hu Xia received the B.Eng. degree in mechanical engineering from Xi'an Jiaotong University in 2004. He is currently pursuing a Ph.D. degree at the School of Mechanical Engineering, Xi'an Jiaotong University. His current research focuses on artificial intelligence and failure diagnosis.

Jian Zhuang received the B.Eng., M.S. and Ph.D. degrees from Xi'an Jiaotong University in 1996, 1999 and 2003, respectively. He is currently an associate professor in the School of Mechanical Engineering, Xi'an Jiaotong University. His research involves artificial intelligence, image processing and failure diagnosis.

Dehong Yu joined Xi'an Jiaotong University in 1985, where he is currently a professor in the School of Mechanical Engineering. His research interests include plastic processing and failure diagnosis.
