skip to main content
article

Stable distributions, pseudorandom generators, embeddings, and data stream computation

Published:01 May 2006Publication History
Skip Abstract Section

Abstract

In this article, we show several results obtained by combining the use of stable distributions with pseudorandom generators for bounded space. In particular:---We show that, for any p ∈ (0, 2], one can maintain (using only O(log n2) words of storage) a sketch C(q) of a point qlnp under dynamic updates of its coordinates. The sketch has the property that, given C(q) and C(s), one can estimate ‖qsp up to a factor of (1 + ϵ) with large probability. This solves the main open problem of Feigenbaum et al. [1999].---We show that the aforementioned sketching approach directly translates into an approximate algorithm that, for a fixed linear mapping A, and given x ∈ ℜn and y ∈ ℜm, estimates ‖Axyp in O(n + m) time, for any p ∈ (0, 2]. This generalizes an earlier algorithm of Wasserman and Blum [1997] which worked for the case p = 2.---We obtain another sketch function C′ which probabilistically embeds ln1 into a normed space lm1. The embedding guarantees that, if we set m = log(1/δ)O(1/ϵ), then for any pair of points q, sln1, the distance between q and s does not increase by more than (1 + ϵ) with constant probability, and it does not decrease by more than (1 − ϵ) with probability 1 − δ. This is the only known dimensionality reduction theorem for the l1 norm. In fact, stronger theorems of this type (i.e., that guarantee very low probability of expansion as well as of contraction) cannot exist [Brinkman and Charikar 2003].---We give an explicit embedding of ln2 into lnO(log n)1 with distortion (1 + 1/nΘ(1)).

References

  1. Alon, N., Matias, Y., and Szegedy, M. 1996. The space complexity of approximating the frequency moments. In Proceedings of the ACM Symposium on Theory of Computing, ACM, New York, 20--29.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Ar, S., Blum, M., Codenotti, B., and Gemmell, P. 1993. Checking approximate computation over the reals. In Proceedings of the Annual ACM Symposium on Theory of Computing, ACM, New York.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Berger, B. 1997. The fourth moment method. SIAM J. Comput. 26.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Brinkman, B., and Charikar, M. 2003. On the impossibility of dimension reduction in l1. In Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science. IEEE Computer Society Press, Los Alamitos, CA.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Broder, A. 1998. Filtering near-duplicate documents. In Proceedings of FUN.]]Google ScholarGoogle Scholar
  6. Broder, A., Glassman, S., Manasse, M., and Zweig, G. 1997. Syntactic clustering of the web. In Proceedings of the 6th International World Wide Web Conference, 391--404.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Chambers, J. M., Mallows, C. L., and Stuck, B. W. 1976. A method for simulating stable random variables. J. Amer. Statist. Assoc. 71, 340--344.]]Google ScholarGoogle ScholarCross RefCross Ref
  8. Cohen, E., Datar, M., Fujiwara, S., Gionis, A., Indyk, P., Motwani, R., Ullman, J., and Yang, C. 2000. Finding interesting associations without support prunning. In Proceedings of the 16th International Conference on Data Engineering (ICDE).]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Cormode, G. 2003. Stable distributions for stream computations: It's as easy as 0,1,2. In Proceedings of the Workshop on Management and Processing of Data Streams.]]Google ScholarGoogle Scholar
  10. Cormode, G., Datar, M., Indyk, P., and Muthukrishnan, S. 2002a. Comparing data streams using hamming norms. In Proceedings of the International Conference on Very Large Databases (VLDB).]]Google ScholarGoogle Scholar
  11. Cormode, G., Indyk, P., Koudas, N., and Muthukrishnan, S. 2002b. Fast mining of massive tabular data via approximate distance computations. In Proceedings of the 18th International Conference on Data Engineering (ICDE).]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Cormode, G., and Muthukrishnan, S. 2003. Estimating dominance norms of multiple data streams. In Proceedings of the European Symposium on Algorithms.]]Google ScholarGoogle Scholar
  13. Datar, M., Gionis, A., Indyk, P., and Motwani, R. 2002. Maintaining stream statistics over sliding windows. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms, ACM, New York.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Datar, M., Immorlica, N., Indyk, P., and Mirrokni, V. 2004. Locality-sensitive hashing scheme based on p-stable distributions. In Proceedings of the ACM Symposium on Computational Geometry, ACM, New York.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Engebretsen, L., Indyk, P., and O'Donnell, R. 2002. Deterministic dimensionality reduction with applications. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms, ACM, New York.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Feigenbaum, J., Ishai, Y., Malkin, T., Nissim, K., Strauss, M. J., and Wright, R. N. 2001a. Secure multiparty computation of approximations. Lecture Notes in Computer Science, vol. 2076, Springer-Verlag, New York, 927.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Feigenbaum, J., Ishai, Y., Malkin, T., Nissim, K., Strauss, M. J., and Wright, R. N. 2001b. Secure multiparty computation of approximations. http://eprint.iacr.org/2001/024/.]]Google ScholarGoogle Scholar
  18. Feigenbaum, J., Kannan, S., Strauss, M., and Viswanathan, M. 1999. An approximate l1-difference algorithm for massive data streams. In Proceedings of the Symposium on Foundations of Computer Science. IEEE Computer Society Press, Los Alamitos, CA.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Figiel, T., Lindenstrauss, J., and Milman, V. D. 1977. The dimension of almost spherical sections of convex bodies. Acta Math. 139, 53--94.]]Google ScholarGoogle ScholarCross RefCross Ref
  20. Freivalds, R. 1979. Fast probabilistic algorithms. In Proceedings of the Mathematical Foundations of Computer Science. Lecture Notes in Computer Science, vol. 74, Springer-Verlag, New York.]]Google ScholarGoogle Scholar
  21. Gilbert, A., Guha, S., Kotidis, Y., Indyk, P., Muthukrishnan, M., and Strauss, M. 2002. In Proceedings of the Fast, small-space algorithms for approximate histogram maintenance. In Proceedings of the Annual ACM Symposium on Theory of Computing, ACM, New York.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Henzinger, M., Raghavan, P., and Rajagopalan, S. 1998. Computing on data streams. Technical Note 1998-011, Digital Systems Research Center, Palo Alto, CA.]]Google ScholarGoogle Scholar
  23. Indyk, P. 2000. Dimensionality reduction techniques for proximity problems. In Proceedings of the Ninth ACM-SIAM Symposium on Discrete Algorithms, ACM, New York.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Indyk, P. 2001. Tutorial: Algorithmic applications of low-distortion geometric embeddings. In Proceedings of the Annual Symposium on Foundations of Computer Science. IEEE Computer Society Press, Los Alamitos, CA.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Indyk, P. 2004. Algorithms for dynamic geometric problems over data streams. In Proceedings of the Annual ACM Symposium on Theory of Computing, ACM, New York.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Indyk, P., Koudas, N., and Muthukrishnan, S. 2000. Identifying representative trends in massive time series datasets using sketches. In Proceedings of the 26th International Conference on Very Large Databases (VLDB).]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Indyk, P., and Motwani, R. 1998. Approximate nearest neighbor: towards removing the curse of dimensionality. In Proceedings of the Symposium on Theory of Computing, ACM, New York.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Indyk, P., and Woodruff, D. 2005. Optimal approximations of the frequency moments of data streams. In Proceedings of the Annual ACM Symposium on Theory of Computing, ACM, New York.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Johnson, W., and Lindenstrauss, J. 1984. Extensions of lipshitz mapping into hilbert space. Contemp. Math. 26, 189--206.]]Google ScholarGoogle ScholarCross RefCross Ref
  30. Johnson, W., and Schechtman, G. 1982. Embedding lmp into ln1. Acta Math. 149, 71--85.]]Google ScholarGoogle ScholarCross RefCross Ref
  31. Lindenstrauss, J., and Milman, V. D. 1993. The local theory of normed spaces and its applications to convexity. In Handbook of Convex Geometry, P. M. Gruber and J. M. Wills, eds Elsevier, Amsterdam, The Netherlands, 1149--1220.]]Google ScholarGoogle Scholar
  32. Linial, N., London, E., and Rabinovich, Y. 1994. The geometry of graphs and some of its algorithmic applications. In Proceedings of 35th Annual IEEE Symposium on Foundations of Computer Science. IEEE Computer Society Press, Los Alamitos, CA, 577--591.]]Google ScholarGoogle Scholar
  33. Matoušek, J. 2004. Collection of open problems on low-distortion embeddings of finite metric spaces. Available at http://kam.mff.cuni.cz/~matousek/metrop.ps.gz|.]]Google ScholarGoogle Scholar
  34. Maurer, A. 2003. A bound on the deviation probability for sums of non-negative random variables. J. Ineq. Pure Applied Math. 4, 1, Art. 15.]]Google ScholarGoogle Scholar
  35. Muthukrishnan, S. 2003. Data streams: Algorithms and applications (invited talk at SODA'03). Available at http://athos.rutgers.edu/~muthu/stream-1-1.ps.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Nisan, N. 1990. Pseudorandom generators for space-bounded computation. In Proceedings of the Annual ACM Symposium on Theory of Computing, ACM, New York, 204--212.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Nisan, N. 1992. RL ⊂ SC. In Proceedings of the Annual ACM Symposium on Theory of Computing, 619--623.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Sivakumar, D. 2002. Algorithmic derandomization via complexity theory. In Proceedings of the Annual ACM Symposium on Theory of Computing, ACM, New York, 619--626.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Thaper, N., Guha, S., Indyk, P., and Koudas, N. 2002. Dynamic multidimensional histograms. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD), ACM, New York.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Wasserman, H., and Blum, M. 1997. Software reliability via run-time result checking. J. ACM.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Zolotarev, V. 1986. One-Dimensional Stable Distributions. Vol. 65 of Translations of Mathematical Monographs, American Mathematical Society.]]Google ScholarGoogle Scholar

Index Terms

  1. Stable distributions, pseudorandom generators, embeddings, and data stream computation

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in

      Full Access

      • Published in

        cover image Journal of the ACM
        Journal of the ACM  Volume 53, Issue 3
        May 2006
        225 pages
        ISSN:0004-5411
        EISSN:1557-735X
        DOI:10.1145/1147954
        Issue’s Table of Contents

        Copyright © 2006 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 1 May 2006
        Published in jacm Volume 53, Issue 3

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • article

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader