Research article, STOC '15 Conference Proceedings. DOI: 10.1145/2746539.2746569

Dimensionality Reduction for k-Means Clustering and Low Rank Approximation

Published: 14 June 2015

ABSTRACT

We show how to approximate a data matrix A with a much smaller sketch Ã that can be used to solve a general class of constrained k-rank approximation problems to within (1+ε) error. Importantly, this class includes k-means clustering and unconstrained low rank approximation (i.e., principal component analysis). By reducing data points to just O(k) dimensions, we generically accelerate any exact, approximate, or heuristic algorithm for these ubiquitous problems. For k-means dimensionality reduction, we provide (1+ε) relative error results for many common sketching techniques, including random row projection, column selection, and approximate SVD. For approximate principal component analysis, we give a simple alternative to known algorithms that has applications in the streaming setting. Additionally, we extend recent work on column-based matrix reconstruction, giving column subsets that not only 'cover' a good subspace for A, but can be used directly to compute this subspace.
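As a concrete illustration of the first result, the sketch below reduces each data point to k dimensions by projecting onto the top-k right singular vectors of A (an SVD-based sketch) and then runs an off-the-shelf k-means solver on the reduced points. This is a minimal example for intuition, not the paper's exact algorithm; the use of numpy and scikit-learn is an assumption of this illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_on_svd_sketch(A, k, seed=0):
    # Top-k right singular vectors of the n x d data matrix A.
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    A_sketch = A @ Vt[:k].T  # n x k sketch: each point now lives in k dimensions
    labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(A_sketch).labels_
    # The guarantee of interest is the k-means cost of this partition on the
    # ORIGINAL points, so evaluate it in the full d-dimensional space.
    cost = sum(((A[labels == c] - A[labels == c].mean(axis=0)) ** 2).sum()
               for c in range(k))
    return labels, cost
```

Per the abstract, the exact SVD step can be replaced by cheaper sketches such as an approximate SVD, a random row projection, or column selection, while still yielding a (1+ε) relative error guarantee on the cost of the resulting clustering.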

Finally, for k-means clustering, we show how to achieve a (9+ε) approximation by Johnson-Lindenstrauss projecting data to just O(log k / ε²) dimensions. This is the first result that leverages the specific structure of k-means to achieve dimension independent of input size and sublinear in k.
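To make the dimension count concrete, here is a hedged sketch of that style of reduction: Johnson-Lindenstrauss project the points to roughly log(k)/ε² dimensions, cluster the projected points, and keep the resulting partition for the original data. The specific constant in the target dimension and the use of scikit-learn's KMeans are assumptions of this example, not prescribed by the paper.

```python
import numpy as np
from sklearn.cluster import KMeans

def jl_kmeans_partition(A, k, eps=0.5, seed=0):
    d = A.shape[1]
    rng = np.random.default_rng(seed)
    # Target dimension ~ log(k)/eps^2: independent of the input size, sublinear in k.
    m = min(d, max(1, int(np.ceil(np.log(k) / eps ** 2))))
    Pi = rng.standard_normal((d, m)) / np.sqrt(m)  # dense Gaussian JL projection matrix
    labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(A @ Pi).labels_
    return labels  # cluster assignments, to be applied to the original points in A
```

The content of the (9+ε) guarantee is that the partition found in the projected space, when its cost is measured on the original points, is within that factor of the optimal k-means cost, even though the projection dimension does not grow with the number of points or their original dimension.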

