ABSTRACT
A classic result of Johnson and Lindenstrauss asserts that any set of n points in d-dimensional Euclidean space can be embedded into k-dimensional Euclidean space where k is logarithmic in n and independent of d so that all pairwise distances are maintained within an arbitrarily small factor. All known constructions of such embeddings involve projecting the n points onto a random k-dimensional hyperplane. We give a novel construction of the embedding, suitable for database applications, which amounts to computing a simple aggregate over k random attribute partitions.
- 1.S. Arora and R. Kannan. Learning mixtures of arbitrary Gaussians. Submitted, 2000.Google Scholar
- 2.S. Dasgupta. Learning mixtures of Gaussians. In 40th Annual Symposium on Foundations of Computer Science (New York, NY, 1999), pages 634-644. IEEE Comput. Soc. Press, Los Alamitos, CA, 1999. Google ScholarDigital Library
- 3.S. Dasgupta and A. Gupta. An elementary proof of the Johnson-Lindenstrauss lemma. Technical report 99-006, UC Berkeley, March 1999.Google Scholar
- 4.P. Frankl and H. Maehara. The Johnson-Lindenstrauss lemma and the sphericity of some graphs. J. Combin. Theory Ser. B, 44(3):355-362, 1988. Google ScholarDigital Library
- 5.P. Indyk. Stable distributions, pseudorandom generators, embeddings and data stream computation. In 41st Annual Symposium on Foundations of Computer Science (Redondo Beach, CA, 2000), pages 189-197. IEEE Comput. Soc. Press, Los Alamitos, CA, 2000. Google ScholarDigital Library
- 6.P. Indyk and R. Motwani. Approximate nearest neighbors: towards removing the curse of dimensionality. In 30th Annual ACM Symposium on Theory of Computing (Dallas, TX), pages 604-613. ACM, New York, 1998. Google ScholarDigital Library
- 7.W. B. Johnson and J. Lindenstrauss. Extensions of Lipschitz mappings into a Hilbert space. In Conference in modern analysis and probability (New Haven, Conn., 1982), pages 189-206. Amer. Math. Soc., Providence, R.I., 1984.Google Scholar
- 8.J. Kleinberg. Two algorithms for nearest-neighbor search in high dimensions. In 29th Annual ACM Symposium on Theory of Computing (El Paso, TX, 1997), pages 599-608. ACM, New York, 1997. Google ScholarDigital Library
- 9.N. Linial, E. London, and Y. Rabinovich. The geometry of graphs and some of its algorithmic applications. Combinatorica, 15(2):215-245, 1995.Google ScholarCross Ref
- 10.C. H. Papadimitriou, P. Raghavan, H. Tamaki, and S. Vempala. Latent semantic indexing: A probabilistic analysis. In 17th Annual Symposium on Principles of Database Systems (Seattle, WA, 1998), pages 159-168, 1998. Google ScholarDigital Library
- 11.L. J. Schulman. Clustering for edge-cost minimization. In 32nd Annual ACM Symposium on Theory of Computing (Portland, OR, 2000), pages 547-555. ACM, New York, 2000. Google ScholarDigital Library
- 12.S. Vempala. A random sampling based algorithm for learning the intersection of half-spaces. In 38th Annual Symposium on Foundations of Computer Science (Miami, FL, 1997), pages 508-513. IEEE Comput. Soc. Press, Los Alamitos, CA, 1997. Google ScholarDigital Library
- 13.S. Vempala and R. I. Arriaga. An algorithmic theory of learning: robust concepts and random projection. In 40th Annual Symposium on Foundations of Computer Science (New York, NY, 1999), pages 616-623. IEEE Comput. Soc. Press, Los Alamitos, CA, 1999. Google ScholarDigital Library
Index Terms
- Database-friendly random projections
Recommendations
Database-friendly random projections: Johnson-Lindenstrauss with binary coins
Special issu on PODS 2001A classic result of Johnson and Lindenstrauss asserts that any set of n points in d-dimensional Euclidean space can be embedded into k-dimensional Euclidean space---where k is logarithmic in n and independent of d--so that all pairwise distances are ...
Random Projections of Smooth Manifolds
We propose a new approach for nonadaptive dimensionality reduction of manifold-modeled data, demonstrating that a small number of random linear projections can preserve key information about a manifold-modeled signal. We center our analysis on the ...
Enhanced Locality Preserving Projections
CSSE '08: Proceedings of the 2008 International Conference on Computer Science and Software Engineering - Volume 01In pattern recognition, feature extraction techniques are widely employed to reduce the dimensionality of data. In this paper, a new manifold learning algorithm, called Enhanced Locality Preserving Projections, to identify the underlying manifold ...
Comments