ABSTRACT
Numerous domains, ranging from distributed data acquisition to knowledge reuse, need to solve the cluster ensemble problem: combining multiple clusterings into a single unified clustering. Unfortunately, current non-agent-based cluster combining methods do not work in a distributed environment, are not robust to corrupted clusterings, and require centralized access to all of the original clusterings. Overcoming these issues will allow cluster ensembles to be used in fundamentally distributed and failure-prone domains, such as data acquisition from satellite constellations, as well as in domains demanding confidentiality, such as combining clusterings of user profiles. This paper proposes an efficient, distributed, agent-based cluster ensemble method that addresses these issues. In this approach, each agent is assigned a small subset of the data and votes on which final cluster its data points should belong to. The final clustering is then evaluated by a global utility, computed in a distributed way, and by an agent-specific utility that is shown to be easier for the agents to maximize. Results show that agents using the agent-specific utility can achieve better performance than traditional non-agent-based methods and remain effective even when up to 50% of the agents fail.
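The core combining step described above can be sketched as a plurality vote: each agent casts a vote for the final cluster of each point in its assigned subset, and the final clustering takes the most-voted label per point. This is a minimal illustration, not the paper's implementation; the function name and the assumption that agent votes already share a common label space (in practice labels must first be aligned, e.g. by a matching step) are mine.

```python
from collections import Counter

def combine_by_voting(agent_votes, n_points):
    """Plurality-vote combiner for a cluster ensemble.

    agent_votes: one list per agent, of length n_points, whose entry
    for point p is that agent's voted final-cluster label for p, or
    None if p lies outside the agent's assigned subset.
    Returns the final clustering as a list of labels.
    """
    final = []
    for p in range(n_points):
        votes = Counter(v[p] for v in agent_votes if v[p] is not None)
        # Most common label wins; ties break arbitrarily by count order.
        final.append(votes.most_common(1)[0][0])
    return final

# Three agents, each voting only on its own subset of four points.
votes = [
    [0, 0, 1, None],
    [0, None, 1, 1],
    [None, 0, 1, 1],
]
print(combine_by_voting(votes, 4))  # → [0, 0, 1, 1]
```

Because each agent only needs the labels of its own subset, this vote can be tallied without centralizing the original clusterings, which is the property the distributed setting requires.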