Abstract
In this article, we show several results obtained by combining the use of stable distributions with pseudorandom generators for bounded space. In particular:---We show that, for any p ∈ (0, 2], one can maintain (using only O(log n/ϵ2) words of storage) a sketch C(q) of a point q ∈ lnp under dynamic updates of its coordinates. The sketch has the property that, given C(q) and C(s), one can estimate ‖q − s‖p up to a factor of (1 + ϵ) with large probability. This solves the main open problem of Feigenbaum et al. [1999].---We show that the aforementioned sketching approach directly translates into an approximate algorithm that, for a fixed linear mapping A, and given x ∈ ℜn and y ∈ ℜm, estimates ‖Ax − y‖p in O(n + m) time, for any p ∈ (0, 2]. This generalizes an earlier algorithm of Wasserman and Blum [1997] which worked for the case p = 2.---We obtain another sketch function C′ which probabilistically embeds ln1 into a normed space lm1. The embedding guarantees that, if we set m = log(1/δ)O(1/ϵ), then for any pair of points q, s ∈ ln1, the distance between q and s does not increase by more than (1 + ϵ) with constant probability, and it does not decrease by more than (1 − ϵ) with probability 1 − δ. This is the only known dimensionality reduction theorem for the l1 norm. In fact, stronger theorems of this type (i.e., that guarantee very low probability of expansion as well as of contraction) cannot exist [Brinkman and Charikar 2003].---We give an explicit embedding of ln2 into lnO(log n)1 with distortion (1 + 1/nΘ(1)).
- Alon, N., Matias, Y., and Szegedy, M. 1996. The space complexity of approximating the frequency moments. In Proceedings of the ACM Symposium on Theory of Computing, ACM, New York, 20--29.]] Google ScholarDigital Library
- Ar, S., Blum, M., Codenotti, B., and Gemmell, P. 1993. Checking approximate computation over the reals. In Proceedings of the Annual ACM Symposium on Theory of Computing, ACM, New York.]] Google ScholarDigital Library
- Berger, B. 1997. The fourth moment method. SIAM J. Comput. 26.]] Google ScholarDigital Library
- Brinkman, B., and Charikar, M. 2003. On the impossibility of dimension reduction in l1. In Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science. IEEE Computer Society Press, Los Alamitos, CA.]] Google ScholarDigital Library
- Broder, A. 1998. Filtering near-duplicate documents. In Proceedings of FUN.]]Google Scholar
- Broder, A., Glassman, S., Manasse, M., and Zweig, G. 1997. Syntactic clustering of the web. In Proceedings of the 6th International World Wide Web Conference, 391--404.]] Google ScholarDigital Library
- Chambers, J. M., Mallows, C. L., and Stuck, B. W. 1976. A method for simulating stable random variables. J. Amer. Statist. Assoc. 71, 340--344.]]Google ScholarCross Ref
- Cohen, E., Datar, M., Fujiwara, S., Gionis, A., Indyk, P., Motwani, R., Ullman, J., and Yang, C. 2000. Finding interesting associations without support prunning. In Proceedings of the 16th International Conference on Data Engineering (ICDE).]] Google ScholarDigital Library
- Cormode, G. 2003. Stable distributions for stream computations: It's as easy as 0,1,2. In Proceedings of the Workshop on Management and Processing of Data Streams.]]Google Scholar
- Cormode, G., Datar, M., Indyk, P., and Muthukrishnan, S. 2002a. Comparing data streams using hamming norms. In Proceedings of the International Conference on Very Large Databases (VLDB).]]Google Scholar
- Cormode, G., Indyk, P., Koudas, N., and Muthukrishnan, S. 2002b. Fast mining of massive tabular data via approximate distance computations. In Proceedings of the 18th International Conference on Data Engineering (ICDE).]] Google ScholarDigital Library
- Cormode, G., and Muthukrishnan, S. 2003. Estimating dominance norms of multiple data streams. In Proceedings of the European Symposium on Algorithms.]]Google Scholar
- Datar, M., Gionis, A., Indyk, P., and Motwani, R. 2002. Maintaining stream statistics over sliding windows. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms, ACM, New York.]] Google ScholarDigital Library
- Datar, M., Immorlica, N., Indyk, P., and Mirrokni, V. 2004. Locality-sensitive hashing scheme based on p-stable distributions. In Proceedings of the ACM Symposium on Computational Geometry, ACM, New York.]] Google ScholarDigital Library
- Engebretsen, L., Indyk, P., and O'Donnell, R. 2002. Deterministic dimensionality reduction with applications. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms, ACM, New York.]] Google ScholarDigital Library
- Feigenbaum, J., Ishai, Y., Malkin, T., Nissim, K., Strauss, M. J., and Wright, R. N. 2001a. Secure multiparty computation of approximations. Lecture Notes in Computer Science, vol. 2076, Springer-Verlag, New York, 927.]] Google ScholarDigital Library
- Feigenbaum, J., Ishai, Y., Malkin, T., Nissim, K., Strauss, M. J., and Wright, R. N. 2001b. Secure multiparty computation of approximations. http://eprint.iacr.org/2001/024/.]]Google Scholar
- Feigenbaum, J., Kannan, S., Strauss, M., and Viswanathan, M. 1999. An approximate l1-difference algorithm for massive data streams. In Proceedings of the Symposium on Foundations of Computer Science. IEEE Computer Society Press, Los Alamitos, CA.]] Google ScholarDigital Library
- Figiel, T., Lindenstrauss, J., and Milman, V. D. 1977. The dimension of almost spherical sections of convex bodies. Acta Math. 139, 53--94.]]Google ScholarCross Ref
- Freivalds, R. 1979. Fast probabilistic algorithms. In Proceedings of the Mathematical Foundations of Computer Science. Lecture Notes in Computer Science, vol. 74, Springer-Verlag, New York.]]Google Scholar
- Gilbert, A., Guha, S., Kotidis, Y., Indyk, P., Muthukrishnan, M., and Strauss, M. 2002. In Proceedings of the Fast, small-space algorithms for approximate histogram maintenance. In Proceedings of the Annual ACM Symposium on Theory of Computing, ACM, New York.]] Google ScholarDigital Library
- Henzinger, M., Raghavan, P., and Rajagopalan, S. 1998. Computing on data streams. Technical Note 1998-011, Digital Systems Research Center, Palo Alto, CA.]]Google Scholar
- Indyk, P. 2000. Dimensionality reduction techniques for proximity problems. In Proceedings of the Ninth ACM-SIAM Symposium on Discrete Algorithms, ACM, New York.]] Google ScholarDigital Library
- Indyk, P. 2001. Tutorial: Algorithmic applications of low-distortion geometric embeddings. In Proceedings of the Annual Symposium on Foundations of Computer Science. IEEE Computer Society Press, Los Alamitos, CA.]] Google ScholarDigital Library
- Indyk, P. 2004. Algorithms for dynamic geometric problems over data streams. In Proceedings of the Annual ACM Symposium on Theory of Computing, ACM, New York.]] Google ScholarDigital Library
- Indyk, P., Koudas, N., and Muthukrishnan, S. 2000. Identifying representative trends in massive time series datasets using sketches. In Proceedings of the 26th International Conference on Very Large Databases (VLDB).]] Google ScholarDigital Library
- Indyk, P., and Motwani, R. 1998. Approximate nearest neighbor: towards removing the curse of dimensionality. In Proceedings of the Symposium on Theory of Computing, ACM, New York.]] Google ScholarDigital Library
- Indyk, P., and Woodruff, D. 2005. Optimal approximations of the frequency moments of data streams. In Proceedings of the Annual ACM Symposium on Theory of Computing, ACM, New York.]] Google ScholarDigital Library
- Johnson, W., and Lindenstrauss, J. 1984. Extensions of lipshitz mapping into hilbert space. Contemp. Math. 26, 189--206.]]Google ScholarCross Ref
- Johnson, W., and Schechtman, G. 1982. Embedding lmp into ln1. Acta Math. 149, 71--85.]]Google ScholarCross Ref
- Lindenstrauss, J., and Milman, V. D. 1993. The local theory of normed spaces and its applications to convexity. In Handbook of Convex Geometry, P. M. Gruber and J. M. Wills, eds Elsevier, Amsterdam, The Netherlands, 1149--1220.]]Google Scholar
- Linial, N., London, E., and Rabinovich, Y. 1994. The geometry of graphs and some of its algorithmic applications. In Proceedings of 35th Annual IEEE Symposium on Foundations of Computer Science. IEEE Computer Society Press, Los Alamitos, CA, 577--591.]]Google Scholar
- Matoušek, J. 2004. Collection of open problems on low-distortion embeddings of finite metric spaces. Available at http://kam.mff.cuni.cz/~matousek/metrop.ps.gz|.]]Google Scholar
- Maurer, A. 2003. A bound on the deviation probability for sums of non-negative random variables. J. Ineq. Pure Applied Math. 4, 1, Art. 15.]]Google Scholar
- Muthukrishnan, S. 2003. Data streams: Algorithms and applications (invited talk at SODA'03). Available at http://athos.rutgers.edu/~muthu/stream-1-1.ps.]] Google ScholarDigital Library
- Nisan, N. 1990. Pseudorandom generators for space-bounded computation. In Proceedings of the Annual ACM Symposium on Theory of Computing, ACM, New York, 204--212.]] Google ScholarDigital Library
- Nisan, N. 1992. RL ⊂ SC. In Proceedings of the Annual ACM Symposium on Theory of Computing, 619--623.]] Google ScholarDigital Library
- Sivakumar, D. 2002. Algorithmic derandomization via complexity theory. In Proceedings of the Annual ACM Symposium on Theory of Computing, ACM, New York, 619--626.]] Google ScholarDigital Library
- Thaper, N., Guha, S., Indyk, P., and Koudas, N. 2002. Dynamic multidimensional histograms. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD), ACM, New York.]] Google ScholarDigital Library
- Wasserman, H., and Blum, M. 1997. Software reliability via run-time result checking. J. ACM.]] Google ScholarDigital Library
- Zolotarev, V. 1986. One-Dimensional Stable Distributions. Vol. 65 of Translations of Mathematical Monographs, American Mathematical Society.]]Google Scholar
Index Terms
- Stable distributions, pseudorandom generators, embeddings, and data stream computation
Recommendations
Stable distributions, pseudorandom generators, embeddings and data stream computation
FOCS '00: Proceedings of the 41st Annual Symposium on Foundations of Computer ScienceIn this paper we show several results obtained by combining the use of stable distributions with pseudorandom generators for bounded space. In particular: we show how to maintain (using only O(log n//spl epsiv//sup 2/) words of storage) a sketch C(p) of ...
Improved pseudorandom generators for depth 2 circuits
APPROX/RANDOM'10: Proceedings of the 13th international conference on Approximation, and 14 the International conference on Randomization, and combinatorial optimization: algorithms and techniquesWe prove the existence of a poly(n,m)-time computable pseudorandom generator which "1/poly(n,m)-fools" DNFs with n variables and m terms, and has seed length O(log2nm ċ log log nm). Previously, the best pseudorandom generator for depth-2 circuits had ...
A near-optimal algorithm for estimating the entropy of a stream
We describe a simple algorithm for approximating the empirical entropy of a stream of m values up to a multiplicative factor of (1+ϵ) using a single pass, O(ϵ−2 log (δ−1) log m) words of space, and O(log ϵ−1 + log log δ−1 + log log m) processing time ...
Comments