Review
Research on particle swarm optimization based clustering: A systematic review of literature and techniques

https://doi.org/10.1016/j.swevo.2014.02.001

Abstract

Optimization based pattern discovery has emerged as an important field in knowledge discovery and data mining (KDD), and has been used to enhance the efficiency and accuracy of clustering, classification, association rules and outlier detection. Cluster analysis, which identifies groups of similar data items in large datasets, is one of its recent beneficiaries. The increasing complexity and volume of datasets have made data clustering a popular focus for the application of optimization based techniques. Different optimization techniques have been applied to find optimal solutions to clustering problems. Swarm intelligence (SI) is one such optimization technique whose algorithms have been successfully demonstrated as solutions for different data clustering domains. In this paper we investigate the growth of literature in SI and its algorithms, particularly Particle Swarm Optimization (PSO). This paper makes two major contributions. Firstly, it provides a thorough literature overview focusing on some of the most cited techniques that have been used for PSO-based data clustering. Secondly, we analyze the reported results and highlight the performance of different techniques against contemporary clustering techniques. We also provide a brief overview of our PSO-based hierarchical clustering approach (HPSO-clustering) and compare its results with traditional hierarchical agglomerative clustering (HAC), K-means, and PSO clustering.

Introduction

Regardless of the type, source, or format of the data, when it grows beyond certain limits it becomes difficult to comprehend, and extracting information from it becomes difficult. On one hand, the increase in the amount of data raises the possibility that the data holds more information; on the other hand, it decreases the speed of pattern extraction. Knowledge Discovery and Data mining (KDD) has made possible the discovery of such useful and hidden patterns in data. KDD is the process of automatically searching large volumes of data for previously unknown, potentially interesting and informative patterns [1], [2], [3], [4]. KDD techniques are mainly influenced by modern information exploration techniques; however, they also rely on traditional computational techniques from statistics, information retrieval, machine learning and pattern recognition [5], [4]. Data selection is the first phase of KDD, and specifies the scope of the data to be used for information extraction. The data might come from different sources and need integration before the actual pattern extraction process. In the second stage, the primary and secondary attributes to be used in further analysis are specified, and the data is analyzed and preprocessed to enhance the reliability of pattern extraction: irrelevant data is removed, missing values are handled, and outlier observations are discarded. The preprocessed data is then transformed into a suitable format to be processed by the data mining algorithms. Transformation includes sampling and feature selection. The transformed data is exploited by a data mining method and post-processed to reveal the hidden informative patterns. Finally, the evaluation and interpretation of the resulting patterns take place for decision making (Fig. 1).
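
The preprocessing steps described above (handling missing values and removing outliers) can be sketched in a few lines. This is a minimal illustration only, assuming numeric data in a NumPy array; the 3-standard-deviation outlier cutoff is an illustrative choice, not one prescribed by the KDD literature.

```python
import numpy as np

def preprocess(data):
    """Sketch of the preprocessing steps named above: drop rows with
    missing values, then drop rows lying beyond 3 standard deviations
    of any feature (a simple, assumed outlier criterion)."""
    data = data[~np.isnan(data).any(axis=1)]           # handle missing values
    z = (data - data.mean(axis=0)) / data.std(axis=0)  # z-score per feature
    return data[(np.abs(z) < 3).all(axis=1)]           # remove outliers
```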

Data mining, which is the core of KDD, extracts informative patterns such as clusters of relevant data, classification and association rules, sequential patterns and prediction models from different types of data such as textual data, audio-visual data, and microarray data. This data comes from different sources such as structured databases, semi-structured documents, and data warehouses, where the growing amount of data is challenging the information extraction capabilities of domain experts.

Data clustering, one of the most important techniques in data mining, aims to group unlabeled data into different groups on the basis of similarities and dissimilarities between the data elements [6], [5]. A typical clustering process involves feature selection, selection of a similarity measure, grouping of data, and assessment of the output. The process can be performed in a supervised, semi-supervised, or unsupervised manner. Different algorithms have been proposed that take into account the nature of the data, the quantity of the data, and other input parameters in order to cluster the data. Data clustering has received a lot of attention from researchers of various data mining domains. This has resulted in a number of approaches being suggested to address one or more aspects of the data clustering problem. Two of the most commonly used clustering approaches are partition-based clustering and hierarchy-based clustering.

Partition-based clustering divides the data into different groups based on similarities and dissimilarities among the data elements. Similarity measures differ from application to application, but the most common are distance-based, pattern-based, and density-based similarity measures. In distance-based similarity measures, a distance function is used to calculate the relative position of a data element inside the cluster by comparing it with the center of the cluster, i.e., the centroid. The centroid changes its position over successive iterations to improve the quality of the clusters in terms of intra-cluster and inter-cluster distances. The quality of a cluster is relative to an objective function, which can be minimizing the intra-cluster distance, maximizing the inter-cluster distance, maximizing similarities, or minimizing dissimilarities among the data elements. The K-means clustering technique is one of the foundations of the partitional approaches [6]. It uses a distance measure to divide the dataset into K clusters. The assignment of data to a particular cluster centroid continues until the centroids remain unchanged in successive iterations or a maximum number of iterations is reached. In K-means clustering a data element belongs to only one cluster at a time. K-Harmonic means [7] is another partition-based clustering technique, introduced to tackle the K-means method's sensitivity to initialization. Partitional approaches are efficient, but the quality of the solution depends on domain knowledge, the initial centroid positions, and the number of clusters, which must be specified in advance.
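
The assign-then-update loop described above can be sketched as follows. This is a minimal illustrative K-means using Euclidean distance, not the implementation evaluated in any of the surveyed papers.

```python
import numpy as np

def kmeans(data, k, max_iter=100, seed=0):
    """Minimal K-means: assign each point to its nearest centroid, then
    recompute centroids, until the centroids stop moving or max_iter is hit."""
    rng = np.random.default_rng(seed)
    # Initialize centroids with k randomly chosen data points.
    centroids = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(max_iter):
        # Euclidean distance of every point to every centroid.
        dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid as the mean of its assigned points;
        # keep the old centroid if a cluster received no points.
        new_centroids = np.array([
            data[labels == i].mean(axis=0) if np.any(labels == i) else centroids[i]
            for i in range(k)
        ])
        if np.allclose(new_centroids, centroids):  # unchanged: converged
            break
        centroids = new_centroids
    return centroids, labels
```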

Hierarchical clustering provides a sequence of nested partitions of the dataset in the form of a hierarchy. It divides the data into a nested tree structure where the levels of the tree show similarity or dissimilarity among the clusters at different levels. A hierarchical approach is either agglomerative or divisive. A divisive approach is based on the splitting of one large cluster into sub-clusters [8]. In the divisive approach, the clustering process starts with every data element in one large cluster. The cluster is then divided into smaller clusters on the basis of proximity until some criterion related to the number of clusters or the number of data elements per cluster has been reached. In the agglomerative approach, the clustering process starts with every data element in an individual cluster. Initially the two most similar objects are merged, using a dissimilarity matrix, by selecting the smallest distance between two data points. In each successive pass, clusters are merged based on the distance between them, calculated using one of the linkage methods. Linkage methods calculate the distance between two clusters to quantify their dissimilarity. There are three commonly used linkage methods: single linkage, average linkage, and complete linkage. In single linkage, the distance between two clusters is the minimum distance between any two members of the clusters. Complete linkage represents inter-cluster distance by the maximum distance between any two members of the clusters. In average linkage, the distance between two clusters is the average distance between all members of the two clusters. The process continues until the last two clusters are merged into a single cluster; the resulting hierarchy can then be cut at some predefined level on the basis of dissimilarity among the data points [6]. The visualization of hierarchical clustering can be shown using a dendrogram.
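
The three linkage methods can be stated directly in code. The following is an illustrative sketch, assuming each cluster is given as a list of NumPy vectors and distances are Euclidean.

```python
import numpy as np

def single_linkage(a, b):
    # Minimum pairwise distance between members of clusters a and b.
    return min(np.linalg.norm(x - y) for x in a for y in b)

def complete_linkage(a, b):
    # Maximum pairwise distance between members of clusters a and b.
    return max(np.linalg.norm(x - y) for x in a for y in b)

def average_linkage(a, b):
    # Mean of all pairwise distances between members of the two clusters.
    return np.mean([np.linalg.norm(x - y) for x in a for y in b])
```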

Apart from traditional hierarchical clustering methods, a number of other hierarchy-based clustering techniques have been proposed, such as the Robust Hierarchical Clustering Algorithm, ROCK [9]. ROCK is a hierarchical clustering approach for categorical data that merges clusters on the basis of the number of common neighbors. Some algorithms integrate hierarchical clustering with partitional clustering, such as BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies) [10], CURE (Clustering Using Representatives) [11], and Chameleon (Hierarchical Clustering Algorithm Using Dynamic Modelling) [12]. Other algorithms, such as DBSCAN (Density-Based Spatial Clustering of Applications with Noise) [13], use density instead of distance as a grouping function, treating high-density areas as clusters and low-density areas as outliers.

Unlike partitional clustering, hierarchical clustering does not need the number of clusters to be specified in advance. Hierarchical clustering provides a hierarchy of clusters as a solution, while partitional clustering provides a single solution. In hierarchical clustering an element assigned to a cluster cannot be reassigned to another cluster in successive passes, whereas in partitional clustering a data element can move to a different cluster in successive passes. Partitional clustering has a lower execution time than hierarchical clustering due to its lower algorithmic complexity [6].

Partitional clustering and hierarchical clustering have their own advantages and limitations in terms of determining the number of clusters, handling clusters of different shapes, and dealing with overlapping cluster boundaries. The exact number of natural groups in the data, the initial choice of centroids, sensitivity to outliers, and algorithmic complexity are some of the issues that constrain the performance of a particular clustering technique.

Apart from their advantages, both techniques have deficiencies in terms of the shape and structure of clusters, the exact number of clusters, clustering configuration, and degeneracy. To tackle these problems, optimization-based techniques have been investigated for data clustering. When optimization is involved in the process, it either uses an optimization technique as a data clustering algorithm or adds optimization to existing data clustering approaches. Optimization-based clustering techniques treat data clustering as an optimization problem and try to optimize an objective function to either a minimum or a maximum. In the context of data clustering, a minimization objective can be the intra-cluster distance and a maximization objective can be the inter-cluster distance. The results achieved so far from adding optimization to data mining processes are promising. Optimization has significantly improved accuracy and efficiency while addressing problems such as global optimization, multi-objective optimization, and entrapment in local optima [14], [15], [16], [17]. The involvement of intelligent optimization techniques has been found to be effective in enhancing the performance of complex, real-time, and costly data mining processes. A number of optimization techniques have been proposed to improve the performance of the clustering process. Swarm intelligence (SI) is one such area, where SI-based techniques have been used extensively either to perform clustering independently or to augment existing clustering techniques. The next section provides an overview of some of the most important SI-based optimization techniques.
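
The two objectives mentioned above can be made concrete. The following is a hypothetical sketch, assuming distance-based clustering over NumPy arrays: the first function is a quantity to minimize, the second a quantity to maximize.

```python
import numpy as np

def intra_cluster_distance(data, labels, centroids):
    """Average distance of each point to its assigned centroid
    (compactness; an objective to minimize)."""
    return np.mean([np.linalg.norm(x - centroids[l])
                    for x, l in zip(data, labels)])

def inter_cluster_distance(centroids):
    """Minimum distance between any pair of centroids
    (separation; an objective to maximize)."""
    k = len(centroids)
    return min(np.linalg.norm(centroids[i] - centroids[j])
               for i in range(k) for j in range(i + 1, k))
```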

Overall the paper has three main aims. Firstly, it provides details of the evolution of different PSO-based clustering techniques and their growth in the literature. We report these results in Section 3, highlighting the number of relevant papers published in the last decade and the citations to these papers. The second contribution of this paper is to review the literature of PSO-based data clustering. We have highlighted the most cited work in the areas of hybrid PSO clustering and stand-alone PSO clustering. The last objective of this paper is to provide comparative results for different techniques, along with a discussion of the pros and cons of each.

Section snippets

Swarm intelligence and particle swarm optimization (PSO)

Swarm intelligence (SI), inspired by the collective behavior of biological swarms, is an innovative intelligent optimization technique [18], [19]. SI techniques are based on the collective behavior of swarms of bees, schools of fish, and colonies of insects while searching for food, communicating with each other, and socializing in their colonies. SI models are based on self-organization, decentralization, communication, and cooperation between individuals within the team. The individual interaction is

Popularity and growth of the literature in swarm intelligence

Due to its simplicity and extendibility to different domains, the study and usage of SI techniques have increased tremendously in recent years. An enormous rise in the number of papers published on, and citations to, SI-based optimization techniques has been recorded. To highlight this trend we conducted a systematic literature review of the area, basing our search on the Scopus database. The database has a variety of options to retrieve data. It supports regular expressions and

Application areas of PSO

This section highlights different application areas of PSO. Most of the clustering implementations using PSO are in the area of clustering of numeric and text data. An average annual increase of about 50% in the number of papers published was recorded over the last four years, as shown in Fig. 3. Some other KDD areas where PSO has been implemented are PSO-based outlier detection [26], PSO-based classification and association rule mining [27], [28], [29], [30], particle swarm based feature

Particle swarm optimization based data clustering

Particle swarm optimization was first used by Van der Merwe and Engelbrecht [17] for data clustering, where each randomly created particle was mapped to one data vector, and each element of that vector represented one centroid. The particles then moved toward better centroid positions during each iteration. The method was evaluated using a cost function that scores each candidate solution based on the positions of the proposed clusters' centroids. Xiao et al. [52]
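
The encoding and fitness idea described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: each particle holds K candidate centroids, the cost function is a quantization-error-style score, and the velocity update is the canonical PSO rule with assumed, commonly used coefficient values.

```python
import numpy as np

def fitness(particle, data):
    """Quantization-error-style cost: mean, over non-empty clusters, of the
    average distance of points to their closest centroid in this particle."""
    centroids = particle  # shape (k, d): one candidate centroid per row
    dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    per_cluster = [dists[labels == i, i].mean()
                   for i in range(len(centroids)) if np.any(labels == i)]
    return np.mean(per_cluster)

def pso_step(pos, vel, pbest, gbest, w=0.72, c1=1.49, c2=1.49, rng=None):
    """One canonical PSO update: inertia plus attraction toward each
    particle's personal best and the swarm's global best."""
    rng = rng or np.random.default_rng(0)
    r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    return pos + vel, vel
```

In this encoding a swarm is simply an array of particles; after each `pso_step`, particles are re-scored with `fitness` and the personal and global bests are updated.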

Clustering comparison of selected techniques

PSO-based clustering has been studied by different researchers for a variety of application areas. In this section we review some of this work and present the reported results, while also discussing the pros and cons of each approach.

We select different performance comparison measures such as inter-cluster distance (separation of clusters), intra-cluster distance (compactness of clusters), number of function evaluations, quantization error (classification error), and accuracy

Future work

From the current literature survey in the field of PSO, we notice an increasing trend in the number of papers and citations in the area. There is evidence that the future of PSO will mostly be dominated by research into the implementation of PSO in different application domains rather than into the foundations of PSO. Nonetheless, there are still some issues in the foundations of existing PSO techniques that need to be investigated thoroughly. In this

Summary and conclusion

In this paper we have discussed the evolution of clustering techniques based on Particle Swarm Optimization. We systematically surveyed the work, and presented the results of increasing trends in the literature of swarm intelligence, Particle Swarm Optimization, and PSO-based data clustering. The literature survey (Sections 2 and 3) and the comparison of results (Section 6) are evidence

References (69)

  • M.-Y. Chen

    Bankruptcy prediction in firms with statistical and intelligent techniques and a comparison of evolutionary computation approaches

    Comput. Math. Appl.

    (2011)
  • W.J. Frawley et al.

Knowledge discovery in databases: an overview

    AI Mag.

    (1992)
  • A.H. Eschenfelder
    (1980)
  • P. Wright

Knowledge discovery in databases: tools and techniques

    Crossroads

    (1998)
  • H.A. Edelstein

    Introduction to Data Mining and Knowledge Discovery

    (1998)
  • U.M. Fayyad et al.

    Advances in Knowledge Discovery and Data Mining

    (1996)
  • A.K. Jain et al.

Data clustering: a review

    ACM Comput. Surv. (CSUR)

    (1999)
  • B. Zhang, M. Hsu, U. Dayal, K-harmonic means – a spatial clustering algorithm with boosting, in: TSDM, 2000, pp....
  • M.R. Anderberg, Cluster Analysis for Applications, Technical Report, DTIC Document,...
  • T. Zhang, R. Ramakrishnan, M. Livny, Birch: an efficient data clustering method for very large databases, in: ACM...
  • S. Guha, R. Rastogi, K. Shim, Cure: an efficient clustering algorithm for large databases, in: ACM SIGMOD Record, vol....
  • G. Karypis et al.

Chameleon: hierarchical clustering using dynamic modeling

    Computer

    (1999)
  • J. Sander et al.

Density-based clustering in spatial databases: the algorithm GDBSCAN and its applications

    Data Min. Knowl. Discov.

    (1998)
  • C.-Y. Chen, F. Ye, Particle swarm optimization algorithm and its application to clustering analysis, in: 2004 IEEE...
  • S. Alam, G. Dobbie, P. Riddle, An evolutionary particle swarm optimization algorithm for data clustering, in: IEEE...
  • S. Das, A. Chowdhury, A. Abraham, A bacterial evolutionary algorithm for automatic data clustering, in: Proceedings of...
  • D. Van der Merwe, A. Engelbrecht, Data clustering using particle swarm optimization, in: The 2003 Congress on...
  • A. Abraham, H. Guo, H. Liu, Swarm intelligence: foundations, perspectives and applications, in: Swarm Intelligent...
  • A.P. Engelbrecht
    (2005)
  • E. Bonabeau et al.

Swarm intelligence: a whole new way to think about business

    Harv. Bus. Rev.

    (2001)
  • D. Martens et al.

Editorial survey: swarm intelligence for data mining

    Mach. Learn.

    (2011)
  • A. Forestiero et al.

    A single pass algorithm for clustering evolving data streams based on swarm intelligence

    Data Min. Knowl. Discov.

    (2013)
  • J. Kennedy, R. Eberhart, Particle swarm optimization, in: Proceedings of the IEEE International Conference on Neural...
  • J.F. Kennedy et al.

    Swarm Intelligence

    (2001)