ABSTRACT
We show how to approximate a data matrix A with a much smaller sketch Ã that can be used to solve a general class of constrained k-rank approximation problems to within (1+ε) error. Importantly, this class includes k-means clustering and unconstrained low rank approximation (i.e., principal component analysis). By reducing data points to just O(k) dimensions, we generically accelerate any exact, approximate, or heuristic algorithm for these ubiquitous problems. For k-means dimensionality reduction, we provide (1+ε) relative error results for many common sketching techniques, including random row projection, column selection, and approximate SVD. For approximate principal component analysis, we give a simple alternative to known algorithms that has applications in the streaming setting. Additionally, we extend recent work on column-based matrix reconstruction, giving column subsets that not only 'cover' a good subspace for A, but can be used directly to compute this subspace.
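To make the approximate-SVD route concrete, the following minimal sketch reduces the rows of A to O(k) dimensions by projecting onto A's top right singular vectors, then hands the small matrix to an off-the-shelf k-means solver. The function name, the default sketch dimension of 2k, and the use of scikit-learn's KMeans are illustrative assumptions, not details fixed by the paper.

```python
# A hypothetical illustration of SVD-based dimensionality reduction for
# k-means, assuming a guarantee of the abstract's form: clustering the
# O(k)-dimensional sketch approximately preserves the k-means cost.
import numpy as np
from sklearn.cluster import KMeans

def svd_sketch_kmeans(A, k, sketch_dim=None, random_state=0):
    """Cluster the rows of A after projecting them onto the top right
    singular vectors of A. `sketch_dim` stands in for the O(k) bound
    from the abstract; 2 * k is an illustrative default, not a value
    taken from the paper."""
    if sketch_dim is None:
        sketch_dim = 2 * k
    # Truncated SVD: A = U S V^T. The sketch keeps each row's
    # coordinates in the span of the top `sketch_dim` right singular
    # vectors, shrinking the data from d to sketch_dim columns.
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    A_sketch = A @ Vt[:sketch_dim].T  # shape (n, sketch_dim)
    # Any exact, approximate, or heuristic k-means algorithm can now
    # run on the smaller matrix; here, Lloyd's algorithm via sklearn.
    km = KMeans(n_clusters=k, n_init=10, random_state=random_state)
    return km.fit_predict(A_sketch)
```

Because the sketch has only O(k) columns, whatever k-means algorithm is applied afterwards runs on a matrix whose width is independent of the original dimension d, which is the source of the generic speedup.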
Finally, for k-means clustering, we show how to achieve a (9+ε) approximation by Johnson-Lindenstrauss projecting data to just O(log(k)/ε²) dimensions. This is the first result that leverages the specific structure of k-means to achieve dimension independent of input size and sublinear in k.
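A minimal sketch of the Johnson-Lindenstrauss route, under the same caveats as above: project the rows of A to roughly O(log(k)/ε²) dimensions with a random sign matrix, then cluster the projected points. The constant c and all names below are hypothetical choices for illustration.

```python
# A hypothetical Johnson-Lindenstrauss projection for k-means,
# targeting m = O(log(k) / eps^2) dimensions as in the abstract.
import numpy as np
from sklearn.cluster import KMeans

def jl_kmeans(A, k, eps=0.5, c=4.0, random_state=0):
    """k-means after a random sign projection to
    m = ceil(c * log(k) / eps^2) dimensions. The constant c is an
    assumed value, not one taken from the paper."""
    rng = np.random.default_rng(random_state)
    n, d = A.shape
    m = max(1, int(np.ceil(c * np.log(max(k, 2)) / eps ** 2)))
    # Random +/-1 entries scaled by 1/sqrt(m), so squared Euclidean
    # distances are preserved in expectation.
    Pi = rng.choice([-1.0, 1.0], size=(d, m)) / np.sqrt(m)
    A_sketch = A @ Pi  # shape (n, m)
    km = KMeans(n_clusters=k, n_init=10, random_state=random_state)
    return km.fit_predict(A_sketch)
```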