ABSTRACT
The k-means method is an old but popular clustering algorithm known for its observed speed and its simplicity. Until recently, however, no meaningful theoretical bounds were known on its running time. In this paper, we demonstrate that the worst-case running time of k-means is superpolynomial by improving the best known lower bound from Ω(n) iterations to 2^Ω(√n).
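The iterative procedure whose worst-case iteration count the paper bounds is Lloyd's method: repeatedly assign each point to its nearest center, then move each center to the mean of its assigned points, stopping when the assignment is stable. A minimal sketch in plain Python (the function names and the iteration counter are illustrative, not from the paper):

```python
import random

def dist2(p, q):
    # Squared Euclidean distance between two points.
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, max_iters=1000, seed=0):
    """Lloyd's k-means. Returns (centroids, assignment, iterations),
    where `iterations` counts assignment/update rounds until the
    clustering stops changing -- the quantity the lower bound concerns."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # common seeding: k random input points
    assignment = None
    for it in range(1, max_iters + 1):
        # Assignment step: each point joins its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda j: dist2(p, centroids[j]))
            for p in points
        ]
        if new_assignment == assignment:
            return centroids, assignment, it  # converged: stable clustering
        assignment = new_assignment
        # Update step: move each centroid to the mean of its cluster
        # (empty clusters keep their old centroid in this sketch).
        for j in range(k):
            cluster = [p for p, a in zip(points, assignment) if a == j]
            if cluster:
                centroids[j] = tuple(
                    sum(c) / len(cluster) for c in zip(*cluster)
                )
    return centroids, assignment, max_iters
```

On well-separated data this converges in a handful of rounds; the paper's contribution is a family of inputs forcing 2^Ω(√n) such rounds.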