ABSTRACT
Random projections have recently emerged as a powerful method for dimensionality reduction. Theoretical results indicate that the method preserves distances quite nicely; however, empirical results are sparse. We present experimental results on using random projection as a dimensionality reduction tool in a number of cases, where the high dimensionality of the data would otherwise lead to burdensome computations. Our application areas are the processing of both noisy and noiseless images, and information retrieval in text documents. We show that projecting the data onto a random lower-dimensional subspace yields results comparable to conventional dimensionality reduction methods such as principal component analysis: the similarity of data vectors is preserved well under random projection. However, using random projections is computationally significantly less expensive than using, e.g., principal component analysis. We also show experimentally that using a sparse random matrix gives additional computational savings in random projection.
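As a concrete illustration of the technique the abstract describes, the following is a minimal NumPy sketch (not the authors' code) of projecting data with a dense Gaussian random matrix and with a sparse {+√3, 0, −√3} matrix of the kind proposed by Achlioptas, together with a crude check of how well pairwise Euclidean distances are preserved. The data shape, seed, reduced dimension k, and 1/√k scaling here are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_projection(X, k):
    """Project rows of X to k dimensions with a dense N(0, 1) matrix."""
    d = X.shape[1]
    R = rng.standard_normal((d, k))
    return X @ R / np.sqrt(k)  # 1/sqrt(k) keeps expected norms unchanged

def sparse_projection(X, k):
    """Achlioptas-style sparse projection: entries are +sqrt(3), 0,
    -sqrt(3) with probabilities 1/6, 2/3, 1/6, so two thirds of the
    matrix is zero and it is cheaper to generate and apply."""
    d = X.shape[1]
    R = rng.choice(np.array([np.sqrt(3.0), 0.0, -np.sqrt(3.0)]),
                   size=(d, k), p=[1 / 6, 2 / 3, 1 / 6])
    return X @ R / np.sqrt(k)

def pairwise_distances(X):
    """Euclidean distances between all pairs of rows of X."""
    sq = np.sum(X * X, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.sqrt(np.maximum(d2, 0.0))

# Toy data standing in for the paper's high-dimensional image/text vectors.
X = rng.standard_normal((100, 5000))
iu = np.triu_indices(X.shape[0], k=1)
orig = pairwise_distances(X)[iu]

for name, project in [("Gaussian", gaussian_projection),
                      ("sparse", sparse_projection)]:
    proj = pairwise_distances(project(X, 50))[iu]
    ratios = proj / orig
    print(f"{name}: distance ratio mean {ratios.mean():.3f}, "
          f"std {ratios.std():.3f}")
```

With a modest k both variants typically yield projected-to-original distance ratios concentrated near 1, as the Johnson-Lindenstrauss lemma predicts, while the sparse matrix's zero entries are the source of the additional computational savings the abstract mentions.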