Ant Colony Optimization based clustering methodology
Graphical abstract
Introduction
Cluster analysis is the organization of a collection of data points into clusters based on similarity [1]. Clustering is usually considered as an unsupervised classification task. That is, the characteristics of the clusters and the number of clusters are not known a priori, and they are extracted during the clustering process. In this work we focus on spatial data sets in which a priori information about the data set (the number of clusters, shapes and densities of the clusters) is not available. Finding such clusters has applications in geographical information systems [2], computer graphics [3], and image segmentation [4]. In addition, clusters of spatial defect shapes provide valuable information about the potential problems in manufacturing processes of semiconductors [5], [6].
We consider spatial clustering as an optimization problem. Our aim is to obtain compact, connected and well-separated clusters. To the best of our knowledge, there is not a single objective function that works well for any kind of geometrical clustering structure. Therefore, we first introduce two solution evaluation mechanisms for measuring the quality of a clustering solution. The main idea behind both mechanisms is similar, and each mechanism is based on two objectives: adjusted compactness and relative separation. The first objective measures the compactness and connectivity of a clustering solution, and the second objective is a measure for separation. The difference between the two mechanisms is the degree of locality addressed in the calculations. The main advantage of these objectives is that the length of an edge is evaluated relatively, that is, it is scaled relative to the lengths of other edges within its neighborhood. This scaling permits us to evaluate the quality of the clustering solution independent of the shape and density of the clusters.
We implement the proposed solution evaluation mechanisms in a clustering framework based on Ant Colony Optimization (ACO). In order to find the target clusters, we use two complementary objective functions (adjusted compactness and relative separation) in a multiple-objective context. Hence, the output of ACO-C is a set of non-dominated solutions. Different from the literature, we are not interested in finding all non-dominated solutions or the entire Pareto efficient frontier. ACO-C has two pre-processing steps: neighborhood construction and data set reduction. Neighborhood construction extracts the local connectivity, proximity and density information inherent in the data set. Data set reduction helps reduce the storage requirements and processing time for the clustering task. Our experimental results indicate that ACO-C finds the arbitrary-shaped clusters with varying densities effectively, where the number of clusters is unknown.
Our contributions to the literature are as follows:
- 1.
The proposed solution evaluation mechanisms allow us to quantify the quality of a clustering solution having arbitrary-shaped clusters with different densities in an optimization context. The use of these evaluation mechanisms is not restricted to ACO. They can be used in other metaheuristics and optimization-based clustering approaches.
- 2.
The proposed ACO-based methodology introduces a general, unified framework for the spatial clustering problem without a priori information. It includes the solution evaluation mechanism, extraction of local properties, data set reduction, and the clustering task itself.
- 3.
ACO-C is a novel methodology for the clustering problem in which there is no a priori information, that is,
- •
the number of clusters is unknown,
- •
clusters may have arbitrary shapes,
- •
there may be density variations within the clusters, and
- •
different clusters may have density differences.
- •
We provide the related literature in Section 2. Section 3 introduces the solution evaluation mechanisms. The details of ACO-C are explained in Section 4. Section 5 is about the empirical performance of ACO-C. First, we set the algorithm parameters using a full factorial design. Then, we compare ACO-C with some well-known algorithms. Finally, we conclude in Section 6.
Section snippets
Related literature
The clustering algorithms can be classified into partitional, hierarchical, density-based algorithms, and metaheuristics (simulated annealing, tabu search, evolutionary algorithms, particle swarm optimization, ACO, and so on). [1], [7], [8] provide comprehensive reviews of clustering approaches.
In this section, we present the related literature on the solution evaluation mechanisms and ant-based clustering algorithms.
How to evaluate a clustering solution?
Our aim is to obtain compact, connected and well-separated clusters. For this purpose, we introduce two solution evaluation mechanisms: Clustering Evaluation Relative to Neighborhood (CERN) and Weighted Clustering Evaluation Relative to Neighborhood (WCERN).
The ACO-based clustering (ACO-C) methodology
ACO-C is a clustering methodology in a multi-objective framework. It has two pre-processing steps: neighborhood construction and data set reduction. Neighborhood construction helps extract the local information inherent in the data set. This local information is used in the solution evaluation. Data set reduction ensures the scalability of the approach.
In ACO-C an ant is a search agent. Ants construct tours by inserting edges between pairs of data points. Connected points in a tour form a
Experimental results for the ACO-C methodology
In this section, we test the performance of ACO-C empirically. First, we present the test data sets and the performance evaluation criteria. Second, using some pilot data sets, we conduct a full factorial experiment in order to set the ACO-C parameters. Third, we elaborate on the impact of data set reduction. Finally, we compare the performance of ACO-C with other clustering algorithms.
The algorithm was coded in Matlab 7.9 and run on a PC with Intel Core2 Duo 2.33 GHz processor and 2 GB RAM.
Conclusion
In this work we consider the spatial clustering problem with no a priori information on the data set. The clusters may include arbitrary shapes, and there may be density differences within and between the clusters. Moreover, the number of clusters is unknown. We present a novel ACO-based clustering methodology, namely ACO-C. In ACO-C we combine the connectivity, proximity, density and distance information with the exploration and exploitation capabilities of ACO in a multi-objective framework.
References (75)
- et al.
Genetic clustering for automatic evolution of clusters and application to image classification
Pattern Recogn.
(2002) - et al.
GAPS: a clustering method using a new point symmetry-based distance measure
Pattern Recogn.
(2007) - et al.
A symmetry-based multi-objective clustering technique for automatic evolution of clusters
Pattern Recogn.
(2010) - et al.
Some connectivity-based cluster validity indices
Appl. Soft Comput.
(2012) - et al.
A generalized automatic clustering algorithm in a multi-objective framework
Appl. Soft Comput.
(2013) - et al.
An ant colony approach for clustering
Anal. Chim. Acta
(2004) - et al.
Application of ant k-means on clustering analysis
Comput. Math. Appl.
(2005) - et al.
Hybridization strategies for continuous ant colony optimization and particle swarm optimization applied to data clustering
Appl. Soft Comput.
(2013) - et al.
An efficient hybrid approach based on PSO, ACO and k-means for cluster analysis
Appl. Soft Comput.
(2010) Finding groups in data: cluster analysis with ants
Appl. Soft Comput.
(2009)
An aggregated clustering approach using multi-ant colonies algorithms
Pattern Recogn.
Formation of an ant cemetery: swarm intelligence or statistical accident?
Future Gener. Comput. Syst.
A novel ant-based clustering algorithm using Renyi entropy
Appl. Soft Comput.
A hierarchical ant-based clustering algorithm and its use in three real-world applications
Eur. J. Oper. Res.
ACODF: a novel data clustering approach for data mining in large databases
J. Syst. Softw.
Aggregation pheromone density-based data clustering
Inf. Sci.
Chaotic ant swarm approach for data clustering
Appl. Soft Comput.
An adaptive neighbourhood construction algorithm based on density and connectivity
Pattern Recogn. Lett.
Data clustering: a review
ACM Comput. Surv.
Voronoi-based region approximation for geographical information retrieval with gazetteers
Int. J. Geogr. Inf. Sci.
Point-based computer graphics
IEEE Comput. Graph. Appl.
Yet another survey on image segmentation: region and boundary information integration
Lect. Notes Comput. Sci.
A model-based clustering approach to the recognition of the spatial defect patterns produced semiconductor fabrication
IIE Trans.
Detection and classification of defect patterns on semiconductor wafers
IIE Trans.
Survey of clustering algorithms
IEEE Trans. Neural Netw.
A survey of clustering data mining techniques
Some methods for classification and analysis of multivariate observations
Algorithm AS 136: a k-means clustering algorithm
J. R. Stat. Soc.: Ser. C (Appl. Stat.)
Finding Groups in Data: An Introduction to Cluster Analysis
Graph-theoretical methods for detecting and describing gestalt clusters
IEEE Trans. Comput.
Algorithms for clustering data
Prentice-Hall Advanced Reference Series
Cluster validity methods: Part I
ACM SIGMOD Rec.
Cluster validity methods: Part II
ACM SIGMOD Rec.
Nonparametric genetic clustering: comparison of validity indices
IEEE Trans. Syst. Man Cybern. – Part C: Appl. Rev.
A genetic algorithm for cluster analysis
Intell. Data Anal.
A weighted sum validity function for clustering with a hybrid niching genetic algorithm
IEEE Trans. Syst. Man Cybern. – Part B: Cybern.
A point symmetry-based clustering technique for automatic evolution of clusters
IEEE Trans. Knowl. Data Eng.
Cited by (86)
A Hybrid Clustered Ant Colony Optimization Approach for the Hierarchical Multi-Switch Multi-Echelon Vehicle Routing Problem with Service Times
2024, Computers and Industrial EngineeringPSOGSA: A parallel implementation model for data clustering using new hybrid swarm intelligence and improved machine learning technique
2024, Sustainable Computing: Informatics and SystemsComputational intelligence techniques for localization and clustering in wireless sensor networks
2021, Recent Trends in Computational Intelligence Enabled Research: Theoretical Foundations and ApplicationsOptimization Inspired by the Organization of Interiors Applied to Cluster Analysis
2023, Revista de Informatica Teorica e Aplicada