Research article, STOC '15 Conference Proceedings. DOI: 10.1145/2746539.2746569

Dimensionality Reduction for k-Means Clustering and Low Rank Approximation

Published: 14 June 2015

ABSTRACT

We show how to approximate a data matrix A with a much smaller sketch Ã that can be used to solve a general class of constrained k-rank approximation problems to within (1+ε) error. Importantly, this class includes k-means clustering and unconstrained low rank approximation (i.e., principal component analysis). By reducing data points to just O(k) dimensions, we generically accelerate any exact, approximate, or heuristic algorithm for these ubiquitous problems. For k-means dimensionality reduction, we provide (1+ε) relative error results for many common sketching techniques, including random row projection, column selection, and approximate SVD. For approximate principal component analysis, we give a simple alternative to known algorithms that has applications in the streaming setting. Additionally, we extend recent work on column-based matrix reconstruction, giving column subsets that not only 'cover' a good subspace for A, but can be used directly to compute this subspace.
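As a concrete illustration of the first result, the sketch below reduces each data point to k dimensions by projecting onto the top-k right singular vectors of A (an SVD-based sketch) and then runs an off-the-shelf k-means solver on the reduced points. This is a minimal example for intuition, not the paper's exact algorithm; the use of numpy and scikit-learn is an assumption of this illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_on_svd_sketch(A, k, seed=0):
    # Top-k right singular vectors of the n x d data matrix A.
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    A_sketch = A @ Vt[:k].T  # n x k sketch: each point now lives in k dimensions
    labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(A_sketch).labels_
    # The guarantee of interest is the k-means cost of this partition on the
    # ORIGINAL points, so evaluate it in the full d-dimensional space.
    cost = sum(((A[labels == c] - A[labels == c].mean(axis=0)) ** 2).sum()
               for c in range(k))
    return labels, cost
```

Per the abstract, the exact SVD step can be replaced by cheaper sketches such as an approximate SVD, a random row projection, or column selection, while still yielding a (1+ε) relative error guarantee on the cost of the resulting clustering.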

Finally, for k-means clustering, we show how to achieve a (9+ε) approximation by Johnson-Lindenstrauss projecting data to just O(log k / ε²) dimensions. This is the first result that leverages the specific structure of k-means to achieve dimension independent of input size and sublinear in k.
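To make the dimension count concrete, here is a hedged sketch of that style of reduction: Johnson-Lindenstrauss project the points to roughly log(k)/ε² dimensions, cluster the projected points, and keep the resulting partition for the original data. The specific constant in the target dimension and the use of scikit-learn's KMeans are assumptions of this example, not prescribed by the paper.

```python
import numpy as np
from sklearn.cluster import KMeans

def jl_kmeans_partition(A, k, eps=0.5, seed=0):
    d = A.shape[1]
    rng = np.random.default_rng(seed)
    # Target dimension ~ log(k)/eps^2: independent of the input size, sublinear in k.
    m = min(d, max(1, int(np.ceil(np.log(k) / eps ** 2))))
    Pi = rng.standard_normal((d, m)) / np.sqrt(m)  # dense Gaussian JL projection matrix
    labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(A @ Pi).labels_
    return labels  # cluster assignments, to be applied to the original points in A
```

The content of the (9+ε) guarantee is that the partition found in the projected space, when its cost is measured on the original points, is within that factor of the optimal k-means cost, even though the projection dimension does not grow with the number of points or their original dimension.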

