ABSTRACT
We give near-optimal space bounds in the streaming model for linear algebra problems that include estimation of matrix products, linear regression, low-rank approximation, and approximation of matrix rank. In the streaming model, sketches of input matrices are maintained under updates of matrix entries; we prove results for turnstile updates, given in an arbitrary order. We give the first lower bounds known for the space needed by the sketches, for a given estimation error ε. We sharpen prior upper bounds, with respect to combinations of space, failure probability, and number of passes. The sketch we use for matrix A is simply S·A, where S is a sign matrix.

Our results include the following upper and lower bounds on the bits of space needed for 1-pass algorithms. Here A is an n × d matrix, B is an n × d' matrix, and c := d + d'. These results are given for fixed failure probability; for failure probability δ > 0, the upper bounds require a factor of log(1/δ) more space. We assume the inputs have integer entries specified by O(log(nc)) bits, or O(log(nd)) bits.

(Matrix Product) Output a matrix C with ||A^T B − C||_F ≤ ε ||A||_F ||B||_F. We show that Θ(c ε^{-2} log(nc)) space is needed.

(Linear Regression) For d' = 1, so that B is a vector b, find x so that ||Ax − b|| ≤ (1 + ε) min_{x' ∈ R^d} ||Ax' − b||. We show that Θ(d^2 ε^{-1} log(nd)) space is needed.

(Rank-k Approximation) Find a matrix Ã_k of rank at most k, so that ||A − Ã_k||_F ≤ (1 + ε) ||A − A_k||_F, where A_k is the best rank-k approximation to A. Our lower bound is Ω(k ε^{-1} (n + d) log(nd)) space, and we give a one-pass algorithm matching this bound when A is given row-wise or column-wise. For general updates, we give a one-pass algorithm needing O(k ε^{-2} (n + d/ε^2) log(nd)) space.

We also give upper and lower bounds for algorithms using multiple passes, and a sketching analog of the CUR decomposition.
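The matrix-product guarantee above can be illustrated numerically: if S is a random sign matrix with m = O(ε^{-2}) rows (scaled by 1/sqrt(m)), then (SA)^T (SB) estimates A^T B to within ε ||A||_F ||B||_F in Frobenius norm, and the sketches SA, SB can be maintained under streaming entry updates. The sketch below is a minimal numpy demonstration; the dimensions, the constant in m, and the fixed seed are illustrative choices, not the paper's tuned parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

n, d, d2 = 2000, 30, 20
A = rng.standard_normal((n, d))
B = rng.standard_normal((n, d2))

eps = 0.1
m = int(10 / eps**2)  # sketch rows ~ eps^{-2}; constant is illustrative

# Random sign matrix, scaled so that S^T S is the identity in expectation.
S = rng.choice([-1.0, 1.0], size=(m, n)) / np.sqrt(m)

# Only the m x d and m x d2 sketches need be stored: a turnstile update
# (i, j, delta) to A changes S @ A by delta * outer(S[:, i], e_j).
SA = S @ A
SB = S @ B

C = SA.T @ SB  # estimate of A^T B

err = np.linalg.norm(A.T @ B - C, "fro")
bound = eps * np.linalg.norm(A, "fro") * np.linalg.norm(B, "fro")
print(err <= bound)
```

With these parameters the error is typically well inside the ε ||A||_F ||B||_F bound; the guarantee in the abstract holds with constant probability, boosted to 1 − δ at a log(1/δ) factor in space.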
Index Terms
- Numerical linear algebra in the streaming model