ABSTRACT
The k-means method is an old but popular clustering algorithm known for its observed speed and its simplicity. Until recently, however, no meaningful theoretical bounds were known on its running time. In this paper, we demonstrate that the worst-case running time of k-means is superpolynomial by improving the best known lower bound from Ω(n) iterations to 2^Ω(√n).
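The iterative procedure whose worst-case iteration count the paper bounds is Lloyd's method: repeatedly assign each point to its nearest center, then move each center to the mean of its assigned points, stopping when the assignment is stable. A minimal sketch in plain Python (the function names and the iteration counter are illustrative, not from the paper):

```python
import random

def dist2(p, q):
    # Squared Euclidean distance between two points.
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, max_iters=1000, seed=0):
    """Lloyd's k-means. Returns (centroids, assignment, iterations),
    where `iterations` counts assignment/update rounds until the
    clustering stops changing -- the quantity the lower bound concerns."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # common seeding: k random input points
    assignment = None
    for it in range(1, max_iters + 1):
        # Assignment step: each point joins its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda j: dist2(p, centroids[j]))
            for p in points
        ]
        if new_assignment == assignment:
            return centroids, assignment, it  # converged: stable clustering
        assignment = new_assignment
        # Update step: move each centroid to the mean of its cluster
        # (empty clusters keep their old centroid in this sketch).
        for j in range(k):
            cluster = [p for p, a in zip(points, assignment) if a == j]
            if cluster:
                centroids[j] = tuple(
                    sum(c) / len(cluster) for c in zip(*cluster)
                )
    return centroids, assignment, max_iters
```

On well-separated data this converges in a handful of rounds; the paper's contribution is a family of inputs forcing 2^Ω(√n) such rounds.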