ABSTRACT
In this paper, we show that for several clustering problems one can extract a small set of points, so that using those core-sets enable us to perform approximate clustering efficiently. The surprising property of those core-sets is that their size is independent of the dimension.Using those, we present a (1+ ε)-approximation algorithms for the k-center clustering and k-median clustering problems in Euclidean space. The running time of the new algorithms has linear or near linear dependency on the number of points and the dimension, and exponential dependency on 1/ε and k. As such, our results are a substantial improvement over what was previously known.We also present some other clustering results including (1+ ε)-approximate 1-cylinder clustering, and k-center clustering with outliers.
- P. Agarwal and C. Procopiuc. Exact and approximation algorithms for clustering. In Proc. 9th ACM-SIAM Sympos. Discrete Algorithms, pages 658--667, 1998. Google ScholarDigital Library
- N. Alon, S. Dar, M. Parnas, and D. Ron. Testing of clustering. In Proc. 41th Annu. IEEE Sympos. Found. Comput. Sci., 2000. Google ScholarDigital Library
- S. Arora, P. Raghavan, and S. Rao. Approximation schemes for Euclidean k-median and related problems. In Proc. 30th Annu. ACM Sympos. Theory Comput., pages 106--113, 1998. Google ScholarDigital Library
- M. Bern and D. Eppstein. Approximation algorithms for geometric problems. In D. Hochbaum, editor, Approximationg algorithms for NP-Hard problems. PWS Publishing Company, 1997. Google ScholarDigital Library
- R. Duda, P. Hart, and D. Stork. Pattern Classification. Wiley-Interscience, New York, 2nd edition, 2001. Google ScholarDigital Library
- L. Engebretsen, P. Indyk, and R. O'Donnell. Derandomized dimensionality reduction with applications. In Proc. 13th ACM-SIAM Sympos. Discrete Algorithms, 2002. to appear. Google ScholarDigital Library
- A. Goel, P. Indyk, and K. Varadarajan. Reductions among high dimensional proximity problems. In Proc. 12th ACM-SIAM Sympos. Discrete Algorithms, pages 769--778, 2001. Google ScholarDigital Library
- T. Gonzalez. Clustering to minimize the maximum intercluster distance. Theoret. Comput. Sci., 38:293--306, 1985.Google ScholarCross Ref
- M. Grötschel, L. Lovász, and A. Schrijver. Geometric Algorithms and Combinatorial Optimization, volume 2 of Algorithms and Combinatorics. Springer-Verlag, Berlin Heidelberg, 2nd edition, 1988. 2nd edition 1994.Google Scholar
- S. Har-Peled. Clustering motion. In Proc. 42nd Annu. IEEE Sympos. Found. Comput. Sci., pages 84--93, 2001. Google ScholarDigital Library
- S. Har-Peled and K. Varadarajan. Approximate shape fitting via linearization. In Proc. 42nd Annu. IEEE Sympos. Found. Comput. Sci., pages 66--73, 2001. Google ScholarDigital Library
- P. Indyk. Algorithmic applications of low-distortion geometric embeddings. In Proc. 42nd Annu. IEEE Sympos. Found. Comput. Sci., pages 10--31, 2001. Tutorial. Google ScholarDigital Library
- P. Indyk and M. Thorup. Approximate 1-median. manuscript, 2000.Google Scholar
- N. Mishra, D. Oblinger, and L. Pitt. Sublinear time approximate clustering. In SODA 12, 2001. Google ScholarDigital Library
- R. Ostrovsky and Y. Rabani. Polynomial time approximation schemes for geometric k-clustering. In Proc. 41st Symp. Foundations of Computer Science, pages 349--358. IEEE, Nov 2000. Google ScholarDigital Library
Index Terms
- Approximate clustering via core-sets
Recommendations
Approximate Kernel Clustering
FOCS '08: Proceedings of the 2008 49th Annual IEEE Symposium on Foundations of Computer ScienceIn the kernel clustering problem we are given a large $n\times n$positive semi-definite matrix $A=(a_{ij})$ with$\sum_{i,j=1}^na_{ij}=0$ and a small $k\times k$ positivesemi-definite matrix $B=(b_{ij})$. The goal is to find a partition$S_1,\ldots,S_k$ ...
Robust Clustering Based on Dominant Sets
ICPR '14: Proceedings of the 2014 22nd International Conference on Pattern RecognitionClustering is an important unsupervised learning approach and widely used in pattern recognition, data mining and image processing, etc. Different from existing clustering algorithms based on partitioning within data, dominant sets clustering extracts ...
Approximate clustering without the approximation
SODA '09: Proceedings of the twentieth annual ACM-SIAM symposium on Discrete algorithmsApproximation algorithms for clustering points in metric spaces is a flourishing area of research, with much research effort spent on getting a better understanding of the approximation guarantees possible for many objective functions such as k-median, ...
Comments