skip to main content
10.1145/375551.375608acmconferencesArticle/Chapter ViewAbstractPublication PagespodsConference Proceedingsconference-collections
Article

Database-friendly random projections

Published:01 May 2001Publication History

ABSTRACT

A classic result of Johnson and Lindenstrauss asserts that any set of n points in d-dimensional Euclidean space can be embedded into k-dimensional Euclidean space where k is logarithmic in n and independent of d so that all pairwise distances are maintained within an arbitrarily small factor. All known constructions of such embeddings involve projecting the n points onto a random k-dimensional hyperplane. We give a novel construction of the embedding, suitable for database applications, which amounts to computing a simple aggregate over k random attribute partitions.

References

  1. 1.S. Arora and R. Kannan. Learning mixtures of arbitrary Gaussians. Submitted, 2000.Google ScholarGoogle Scholar
  2. 2.S. Dasgupta. Learning mixtures of Gaussians. In 40th Annual Symposium on Foundations of Computer Science (New York, NY, 1999), pages 634-644. IEEE Comput. Soc. Press, Los Alamitos, CA, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. 3.S. Dasgupta and A. Gupta. An elementary proof of the Johnson-Lindenstrauss lemma. Technical report 99-006, UC Berkeley, March 1999.Google ScholarGoogle Scholar
  4. 4.P. Frankl and H. Maehara. The Johnson-Lindenstrauss lemma and the sphericity of some graphs. J. Combin. Theory Ser. B, 44(3):355-362, 1988. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. 5.P. Indyk. Stable distributions, pseudorandom generators, embeddings and data stream computation. In 41st Annual Symposium on Foundations of Computer Science (Redondo Beach, CA, 2000), pages 189-197. IEEE Comput. Soc. Press, Los Alamitos, CA, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. 6.P. Indyk and R. Motwani. Approximate nearest neighbors: towards removing the curse of dimensionality. In 30th Annual ACM Symposium on Theory of Computing (Dallas, TX), pages 604-613. ACM, New York, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. 7.W. B. Johnson and J. Lindenstrauss. Extensions of Lipschitz mappings into a Hilbert space. In Conference in modern analysis and probability (New Haven, Conn., 1982), pages 189-206. Amer. Math. Soc., Providence, R.I., 1984.Google ScholarGoogle Scholar
  8. 8.J. Kleinberg. Two algorithms for nearest-neighbor search in high dimensions. In 29th Annual ACM Symposium on Theory of Computing (El Paso, TX, 1997), pages 599-608. ACM, New York, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. 9.N. Linial, E. London, and Y. Rabinovich. The geometry of graphs and some of its algorithmic applications. Combinatorica, 15(2):215-245, 1995.Google ScholarGoogle ScholarCross RefCross Ref
  10. 10.C. H. Papadimitriou, P. Raghavan, H. Tamaki, and S. Vempala. Latent semantic indexing: A probabilistic analysis. In 17th Annual Symposium on Principles of Database Systems (Seattle, WA, 1998), pages 159-168, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. 11.L. J. Schulman. Clustering for edge-cost minimization. In 32nd Annual ACM Symposium on Theory of Computing (Portland, OR, 2000), pages 547-555. ACM, New York, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. 12.S. Vempala. A random sampling based algorithm for learning the intersection of half-spaces. In 38th Annual Symposium on Foundations of Computer Science (Miami, FL, 1997), pages 508-513. IEEE Comput. Soc. Press, Los Alamitos, CA, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. 13.S. Vempala and R. I. Arriaga. An algorithmic theory of learning: robust concepts and random projection. In 40th Annual Symposium on Foundations of Computer Science (New York, NY, 1999), pages 616-623. IEEE Comput. Soc. Press, Los Alamitos, CA, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Database-friendly random projections

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in
      • Published in

        cover image ACM Conferences
        PODS '01: Proceedings of the twentieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
        May 2001
        301 pages
        ISBN:1581133618
        DOI:10.1145/375551

        Copyright © 2001 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 1 May 2001

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • Article

        Acceptance Rates

        PODS '01 Paper Acceptance Rate26of99submissions,26%Overall Acceptance Rate642of2,707submissions,24%

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader