Applied Soft Computing

Volume 28, March 2015, Pages 301-311

Ant Colony Optimization based clustering methodology

https://doi.org/10.1016/j.asoc.2014.11.060

Highlights

  • A novel ACO based methodology (ACO-C) is proposed for spatial clustering.

  • It works in data sets with no a priori information.

  • It includes solution evaluation, neighborhood construction and data set reduction.

  • It has a multi-objective framework, and yields a set of non-dominated solutions.

  • Experimental results show that ACO-C outperforms other competing approaches.

Abstract

In this work we consider the spatial clustering problem with no a priori information. The number of clusters is unknown, and clusters may have arbitrary shapes and density differences. The proposed clustering methodology addresses several challenges of the clustering problem, including solution evaluation, neighborhood construction, and data set reduction. In this context, we first introduce two objective functions, namely adjusted compactness and relative separation. Each objective function evaluates a clustering solution with respect to the local characteristics of the neighborhoods. This allows us to measure the quality of a wide range of clustering solutions without a priori information. Next, using the two objective functions, we present a novel clustering methodology based on Ant Colony Optimization (ACO-C). ACO-C works in a multi-objective setting and yields a set of non-dominated solutions. ACO-C has two pre-processing steps: neighborhood construction and data set reduction. The former extracts the local characteristics of the data points, whereas the latter is used for scalability. We compare the proposed methodology with other clustering approaches. The experimental results indicate that ACO-C outperforms the competing approaches. The multi-objective evaluation mechanism relative to the neighborhoods enhances the extraction of arbitrary-shaped clusters with density variations.

Introduction

Cluster analysis is the organization of a collection of data points into clusters based on similarity [1]. Clustering is usually considered an unsupervised classification task. That is, the characteristics of the clusters and the number of clusters are not known a priori, and they are extracted during the clustering process. In this work we focus on spatial data sets in which a priori information about the data set (the number of clusters, and the shapes and densities of the clusters) is not available. Finding such clusters has applications in geographical information systems [2], computer graphics [3], and image segmentation [4]. In addition, clusters of spatial defect shapes provide valuable information about potential problems in semiconductor manufacturing processes [5], [6].

We consider spatial clustering as an optimization problem. Our aim is to obtain compact, connected and well-separated clusters. To the best of our knowledge, no single objective function works well for every kind of geometric clustering structure. Therefore, we first introduce two solution evaluation mechanisms for measuring the quality of a clustering solution. The main idea behind both mechanisms is similar, and each mechanism is based on two objectives: adjusted compactness and relative separation. The first objective measures the compactness and connectivity of a clustering solution, and the second measures separation. The difference between the two mechanisms is the degree of locality addressed in the calculations. The main advantage of these objectives is that the length of an edge is evaluated relatively, that is, it is scaled relative to the lengths of other edges within its neighborhood. This scaling permits us to evaluate the quality of the clustering solution independently of the shape and density of the clusters.
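The exact definitions of adjusted compactness and relative separation are given in Section 3. As a rough illustration of the scaling idea only, an edge can be judged long or short relative to the typical edge length in its own neighborhood rather than against a global threshold. The following sketch (in Python; the function name and the use of the mean as the local scale are our own illustrative assumptions, not the paper's formulas) shows such a relative evaluation:

```python
import numpy as np

def relative_edge_length(edge_length, neighborhood_edge_lengths):
    # Scale an edge by the typical edge length among its neighbors, so that
    # "long" and "short" are judged locally rather than globally.
    # Illustrative sketch only; not the paper's adjusted-compactness formula.
    local_scale = np.mean(neighborhood_edge_lengths)
    return edge_length / local_scale if local_scale > 0 else np.inf

# An edge of length 5 in a sparse region (neighbor edges around 4-6) is
# ordinary, while the same length in a dense region (neighbor edges around
# 0.5) stands out as very long.
print(relative_edge_length(5.0, [4.0, 5.5, 6.0]))   # ~0.97
print(relative_edge_length(5.0, [0.4, 0.5, 0.6]))   # 10.0
```

This kind of local scaling is what allows clusters of very different densities to be evaluated with a single pair of objectives.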

We implement the proposed solution evaluation mechanisms in a clustering framework based on Ant Colony Optimization (ACO). To find the target clusters, we use the two complementary objective functions (adjusted compactness and relative separation) in a multi-objective context. Hence, the output of ACO-C is a set of non-dominated solutions. Unlike much of the literature, we are not interested in finding all non-dominated solutions or the entire Pareto-efficient frontier. ACO-C has two pre-processing steps: neighborhood construction and data set reduction. Neighborhood construction extracts the local connectivity, proximity and density information inherent in the data set. Data set reduction helps reduce the storage requirements and processing time of the clustering task. Our experimental results indicate that ACO-C effectively finds arbitrary-shaped clusters with varying densities when the number of clusters is unknown.
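Because ACO-C returns a set of non-dominated solutions rather than a single best clustering, a Pareto filter over the two objectives is needed somewhere in the pipeline. The sketch below is a generic two-objective non-dominance check, assuming (purely for illustration) that both objectives are to be maximized; the paper's actual archiving rule may differ:

```python
def pareto_front(solutions):
    # solutions: list of (adjusted_compactness, relative_separation, clustering)
    # Keep a solution unless some other solution is at least as good in both
    # objectives and strictly better in one. Both objectives are assumed to
    # be maximized here (illustrative assumption).
    front = []
    for i, (c_i, s_i, sol_i) in enumerate(solutions):
        dominated = any(
            c_j >= c_i and s_j >= s_i and (c_j > c_i or s_j > s_i)
            for j, (c_j, s_j, _) in enumerate(solutions)
            if j != i
        )
        if not dominated:
            front.append((c_i, s_i, sol_i))
    return front
```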

Our contributions to the literature are as follows:

  1. The proposed solution evaluation mechanisms allow us to quantify the quality of a clustering solution having arbitrary-shaped clusters with different densities in an optimization context. The use of these evaluation mechanisms is not restricted to ACO. They can be used in other metaheuristics and optimization-based clustering approaches.

  2. The proposed ACO-based methodology introduces a general, unified framework for the spatial clustering problem without a priori information. It includes the solution evaluation mechanism, extraction of local properties, data set reduction, and the clustering task itself.

  3. ACO-C is a novel methodology for the clustering problem in which there is no a priori information, that is,

     • the number of clusters is unknown,
     • clusters may have arbitrary shapes,
     • there may be density variations within the clusters, and
     • different clusters may have density differences.

We provide the related literature in Section 2. Section 3 introduces the solution evaluation mechanisms. The details of ACO-C are explained in Section 4. Section 5 is about the empirical performance of ACO-C. First, we set the algorithm parameters using a full factorial design. Then, we compare ACO-C with some well-known algorithms. Finally, we conclude in Section 6.

Section snippets

Related literature

Clustering algorithms can be classified into partitional, hierarchical, and density-based algorithms, and metaheuristics (simulated annealing, tabu search, evolutionary algorithms, particle swarm optimization, ACO, and so on). Comprehensive reviews of clustering approaches are provided in [1], [7], [8].

In this section, we present the related literature on the solution evaluation mechanisms and ant-based clustering algorithms.

How to evaluate a clustering solution?

Our aim is to obtain compact, connected and well-separated clusters. For this purpose, we introduce two solution evaluation mechanisms: Clustering Evaluation Relative to Neighborhood (CERN) and Weighted Clustering Evaluation Relative to Neighborhood (WCERN).

The ACO-based clustering (ACO-C) methodology

ACO-C is a clustering methodology in a multi-objective framework. It has two pre-processing steps: neighborhood construction and data set reduction. Neighborhood construction helps extract the local information inherent in the data set. This local information is used in the solution evaluation. Data set reduction ensures the scalability of the approach.

In ACO-C an ant is a search agent. Ants construct tours by inserting edges between pairs of data points. Connected points in a tour form a cluster.
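A minimal sketch of this edge-insertion step is given below, assuming a standard ACO selection rule in which the next point is chosen with probability proportional to pheromone^alpha times (1/distance)^beta. The parameter names, the selection rule, and the random starting point are illustrative assumptions, not ACO-C's exact construction or pheromone-update scheme:

```python
import numpy as np

def construct_tour(dist, pheromone, alpha=1.0, beta=2.0, rng=None):
    # Illustrative ant tour construction: starting from a random point,
    # repeatedly insert an edge to an unvisited point chosen with probability
    # proportional to pheromone**alpha * (1/distance)**beta.
    # Generic ACO rule for illustration; not necessarily ACO-C's exact rule.
    rng = rng or np.random.default_rng()
    n = dist.shape[0]
    current = rng.integers(n)
    unvisited = set(range(n)) - {current}
    edges = []
    while unvisited:
        cand = np.array(sorted(unvisited))
        attract = (pheromone[current, cand] ** alpha) * \
                  ((1.0 / np.maximum(dist[current, cand], 1e-12)) ** beta)
        probs = attract / attract.sum()
        nxt = rng.choice(cand, p=probs)
        edges.append((current, nxt))
        unvisited.remove(nxt)
        current = nxt
    return edges
```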

Experimental results for the ACO-C methodology

In this section, we test the performance of ACO-C empirically. First, we present the test data sets and the performance evaluation criteria. Second, using some pilot data sets, we conduct a full factorial experiment in order to set the ACO-C parameters. Third, we elaborate on the impact of data set reduction. Finally, we compare the performance of ACO-C with other clustering algorithms.

The algorithm was coded in Matlab 7.9 and run on a PC with an Intel Core 2 Duo 2.33 GHz processor and 2 GB of RAM.

Conclusion

In this work we consider the spatial clustering problem with no a priori information on the data set. The clusters may include arbitrary shapes, and there may be density differences within and between the clusters. Moreover, the number of clusters is unknown. We present a novel ACO-based clustering methodology, namely ACO-C. In ACO-C we combine the connectivity, proximity, density and distance information with the exploration and exploitation capabilities of ACO in a multi-objective framework.

References (75)

  • Y. Yang et al., An aggregated clustering approach using multi-ant colonies algorithms, Pattern Recogn. (2006)
  • M. Martin et al., Formation of an ant cemetery: swarm intelligence or statistical accident?, Future Gener. Comput. Syst. (2002)
  • L. Zhang et al., A novel ant-based clustering algorithm using Renyi entropy, Appl. Soft Comput. (2013)
  • H. Azzag et al., A hierarchical ant-based clustering algorithm and its use in three real-world applications, Eur. J. Oper. Res. (2007)
  • C.F. Tsai et al., ACODF: a novel data clustering approach for data mining in large databases, J. Syst. Softw. (2004)
  • A. Ghosh et al., Aggregation pheromone density-based data clustering, Inf. Sci. (2008)
  • M. Wan et al., Chaotic ant swarm approach for data clustering, Appl. Soft Comput. (2012)
  • T. İnkaya et al., An adaptive neighbourhood construction algorithm based on density and connectivity, Pattern Recogn. Lett. (2015)
  • A.K. Jain et al., Data clustering: a review, ACM Comput. Surv. (1999)
  • H. Alani et al., Voronoi-based region approximation for geographical information retrieval with gazetteers, Int. J. Geogr. Inf. Sci. (2001)
  • H. Pfister et al., Point-based computer graphics, IEEE Comput. Graph. Appl. (2004)
  • J. Freixenet et al., Yet another survey on image segmentation: region and boundary information integration, Lect. Notes Comput. Sci. (2002)
  • T. Yuan et al., A model-based clustering approach to the recognition of the spatial defect patterns produced by semiconductor fabrication, IIE Trans. (2007)
  • C.H. Wang et al., Detection and classification of defect patterns on semiconductor wafers, IIE Trans. (2006)
  • R. Xu et al., Survey of clustering algorithms, IEEE Trans. Neural Netw. (2005)
  • P. Berkhin, A survey of clustering data mining techniques
  • J.B. MacQueen, Some methods for classification and analysis of multivariate observations
  • J.A. Hartigan et al., Algorithm AS 136: a k-means clustering algorithm, J. R. Stat. Soc.: Ser. C (Appl. Stat.) (1979)
  • L. Kaufman et al., Finding Groups in Data: An Introduction to Cluster Analysis (1990)
  • C.T. Zahn, Graph-theoretical methods for detecting and describing gestalt clusters, IEEE Trans. Comput. (1971)
  • A.K. Jain et al., Algorithms for clustering data, Prentice-Hall Advanced Reference Series (1988)
  • M. Halkidi et al., Cluster validity methods: Part I, ACM SIGMOD Rec. (2002)
  • M. Halkidi et al., Cluster validity methods: Part II, ACM SIGMOD Rec. (2002)
  • S. Bandyopadhyay et al., Nonparametric genetic clustering: comparison of validity indices, IEEE Trans. Syst. Man Cybern. – Part C: Appl. Rev. (2001)
  • E.R. Hruschka et al., A genetic algorithm for cluster analysis, Intell. Data Anal. (2003)
  • W. Sheng et al., A weighted sum validity function for clustering with a hybrid niching genetic algorithm, IEEE Trans. Syst. Man Cybern. – Part B: Cybern. (2005)
  • S. Bandyopadhyay et al., A point symmetry-based clustering technique for automatic evolution of clusters, IEEE Trans. Knowl. Data Eng. (2008)