ABSTRACT
We introduce a new low-distortion embedding of l2d into lpO(log n) (p=1,2), called the Fast-Johnson-Linden-strauss-Transform. The FJLT is faster than standard random projections and just as easy to implement. It is based upon the preconditioning of a sparse projection matrix with a randomized Fourier transform. Sparse random projections are unsuitable for low-distortion embeddings. We overcome this handicap by exploiting the "Heisenberg principle" of the Fourier transform, ie, its local-global duality. The FJLT can be used to speed up search algorithms based on low-distortion embeddings in l1 and l2. We consider the case of approximate nearest neighbors in l2d. We provide a faster algorithm using classical projections, which we then further speed up by plugging in the FJLT. We also give a faster algorithm for searching over the hypercube.
- Achlioptas, D. Database-friendly random projections: Johnson-Lindenstrauss with binary coins, Journal of Comp. & Sys. Sci. 66 (2003), 671--687. Google ScholarDigital Library
- Alon, N., Spencer, J. The probabilistic method, John Wiley, 2nd edition, 2000.Google Scholar
- Alon, N. Problems and results in extremal combinatorics, I, Discrete Math. 273 (2003), 31--53.Google ScholarDigital Library
- Arya, S., Mount, D.M. Approximate nearest neighbor searching, Proc. 4th Annu. ACM-SIAM Symp. Disc. Alg. (1993), 271--280. Google ScholarDigital Library
- Arya, S., Mount, D.M., Netanyahu, N.S., Silverman, R., Wu, A. An optimal algorithm for approximate nearest neighbor searching, J. ACM 45 (1998), 891--923. Google ScholarDigital Library
- Bern, M. Approximate closest-point queries in high dimensions, Inform. Process. Lett. 45 (1993), 95--99. Google ScholarDigital Library
- Bingham, E., Mannila, H. Random projection in dimensionality reduction: applications to image and text data, Knowledge Discovery and Data Mining (2001), 245--250. Google ScholarDigital Library
- Borodin, A., Ostrovsky, R., Rabani, Y. Lower bounds for high dimensional nearest neighbor search and related problems, Proc. 31st STOC (1999), 312--321. Google ScholarDigital Library
- Carlen, E. A., Carvalho, M. C., Loss, M. Determination of the Spectral Gap for Kac's Master Equation and Related Stochastic Evolutions, Preprint arXiv:math-ph/0109003, (2000)Google Scholar
- Chakrabarti, A., Regev, O. An optimal randomised cell probe lower bound for approximate nearest neighbor searching, Proc. 44th FOCS (2004). Google ScholarDigital Library
- Chan, T. Approximate nearest neighbor queries revisited, Proc. 13th Annu. ACM Symp. Comput. Geom. (1997), 352--358. Google ScholarDigital Library
- Clarkson, K.L. An algorithm for approximate closest-point queries, Proc. 10th Annu. ACM Symp. Comput. Geom. 10 (1994), 160--164. Google ScholarDigital Library
- Clarkson, K.L. Nearest neighbor queries in metric spaces, Proc. 29th Annu. ACM Sympos. Theory Comput., 1997. Google ScholarDigital Library
- Dasgupta, S., Gupta, A. An elementary proof of the Johnson-Lindenstrauss lemma, Technical Report 99-006, UC Berkeley, March 1999.Google Scholar
- Diaconis, P., Saloff-Coste, L. Bounds for Kac's Master Equation, Communications in Mathematical Physics 209(3), (2000), 729--755Google ScholarCross Ref
- Farach-Colton, M., Indyk, P. Approximate nearest neighbor algorithms for Hausdorff metrics via embeddings, Proc. 40th FOCS (1999). Google ScholarDigital Library
- Feige, U., Peleg, D., Raghavan, P., Upfal, E. Computing with unreliable information, Proc. 20nd STOC (1990), 128--137. Google ScholarDigital Library
- Frankl, P., Maehara, H. The Johnson-Lindenstrauss lemma and the sphericity of some graphs, Journal of Combinatorial Theory Series A, 44 (1987), 355--362. Google ScholarDigital Library
- Har-Peled, S. A replacement for Voronoi diagrams of near linear size, Proc. FOCS (2001), 94--103. Google ScholarDigital Library
- Hastings, W. Monte Carlo sampling methods using Markov chains and their applications, Biometrica 57, 97--109Google ScholarCross Ref
- Indyk, P. On approximate nearest neighbors in non-Euclidean spaces, Proc. 39th FOCS (1999). Google ScholarDigital Library
- Indyk, P. High-dimensional computational geometry, Thesis (2000), Stanford University. Google ScholarDigital Library
- Indyk, P. Dimensionality reduction techniques for proximity problems, Proc. SODA (2000), 371--378. Google ScholarDigital Library
- Indyk, P. Nearest neighbors in high-dimensional spaces, Handbook of Discrete and Computational Geometry, eds., J.E. Goodman and J. O'Rourke, CRC Press (2004).Google Scholar
- Indyk, P., Matousek, J. Low-distortion embeddings of finite metric spaces, Handbook of Discrete and Computational Geometry, eds., J.E. Goodman and J. O'Rourke, CRC Press (2004).Google Scholar
- Indyk, P., Motwani, R. Approximate nearest neighbors: towards removing the curse of dimensionality, Proc. 30th STOC (1998), 604--613. Google ScholarDigital Library
- Johnson, W.B., Lindenstrauss, J. Extensions of Lipschitz mappings into a Hilbert space, Contemp. Math. 26 (1984), 189--206.Google ScholarCross Ref
- Kac, M. Probability and related topics in physical science, Wiley Interscience, N.Y.Google Scholar
- Kleinberg, J. Two algorithms for nearest neighbor search in high dimensions, Proc. 29th STOC (1997), 599--608. Google ScholarDigital Library
- Kushilevitz, E., Ostrovsky, R., Rabani, Y. Efficient search for approximate nearest neighbor in high-dimensional spaces, SIAM J. Comput. 30 (2000), 457--474. Google ScholarDigital Library
- Matousek, J. Lectures on Discrete Geometry, Springer, May 2002. Google ScholarDigital Library
- Muthukrishnan, S., Sahinalp, S. C., Simple and practical sequence nearest neighbors with block operations, Proc. 13th Annual Symposium on Combinatorial Pattern Matching (2002) Google ScholarDigital Library
- Pak, I. Using Stopping Times to Bound Mixing Times, Proc. SODA (1998) Google ScholarDigital Library
- Yianilos, P.N. Data structures and algorithms for nearest neighbor search in general metric spaces, Proc. 2nd Annual ACM-SIAM Symp. Disc. Alg. (1993), 311--321. Google ScholarDigital Library
- Yianilos, P.N. Locally lifting the curse of dimensionality for nearest neighbor search, Proc. SODA (2000), 361--370. Google ScholarDigital Library
Index Terms
- Approximate nearest neighbors and the fast Johnson-Lindenstrauss transform
Recommendations
The Fast Johnson-Lindenstrauss Transform and Approximate Nearest Neighbors
We introduce a new low-distortion embedding of $\ell_2^d$ into $\ell_p^{O(\log n)}$ ($p=1,2$) called the fast Johnson-Lindenstrauss transform (FJLT). The FJLT is faster than standard random projections and just as easy to implement. It is based upon the ...
The Johnson-Lindenstrauss transform: an empirical study
ALENEX '11: Proceedings of the Meeting on Algorithm Engineering & ExpermimentsThe Johnson-Lindenstrauss Lemma states that a set of n points may be embedded in a space of dimension O(log n/ε2) while preserving all pairwise distances within a factor of (1 + ε) with high probability. It has inspired a number of proofs that extend ...
The fast Johnson-Lindenstrauss transform is even faster
ICML'23: Proceedings of the 40th International Conference on Machine LearningThe Johnson-Lindenstaruss lemma (Johnson & Lindenstrauss, 1984) is a cornerstone result in dimensionality reduction, stating it is possible to embed a set of n points in d-dimensional Euclidean space into optimal k = O(ε-2 ln n) dimensions, while ...
Comments