Article

Random projection in dimensionality reduction: applications to image and text data

Published: 26 August 2001

ABSTRACT

Random projections have recently emerged as a powerful method for dimensionality reduction. Theoretical results indicate that the method preserves distances quite well; however, empirical results are sparse. We present experimental results on using random projection as a dimensionality reduction tool in a number of cases where the high dimensionality of the data would otherwise lead to burdensome computations. Our application areas are the processing of both noisy and noiseless images, and information retrieval in text documents. We show that projecting the data onto a random lower-dimensional subspace yields results comparable to conventional dimensionality reduction methods such as principal component analysis: the similarity of data vectors is preserved well under random projection. However, using random projections is computationally significantly less expensive than using, e.g., principal component analysis. We also show experimentally that using a sparse random matrix gives additional computational savings in random projection.
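The idea the abstract describes can be sketched in a few lines of NumPy. This is an illustrative reconstruction, not the authors' code: the dimensions are arbitrary, and the sparse matrix uses the ±1/0 entries with probabilities 1/6, 2/3, 1/6 (scaled by √(3/k)) in the style of Achlioptas's database-friendly projections, which is one way to realize the "sparse random matrix" the abstract mentions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary illustrative sizes: original dimension d, reduced dimension k, n points.
d, k, n = 1000, 100, 50
X = rng.normal(size=(n, d))  # synthetic high-dimensional data

# Dense Gaussian projection matrix, scaled so squared distances are
# preserved in expectation after projection.
R_dense = rng.normal(size=(d, k)) / np.sqrt(k)

# Sparse projection matrix (Achlioptas-style): entries +1, 0, -1 with
# probabilities 1/6, 2/3, 1/6, scaled by sqrt(3/k). Two thirds of the
# entries are zero, which is the source of the computational savings.
R_sparse = rng.choice([1.0, 0.0, -1.0], size=(d, k), p=[1 / 6, 2 / 3, 1 / 6])
R_sparse *= np.sqrt(3.0 / k)

def pairwise_dists(A):
    """All pairwise Euclidean distances between the rows of A."""
    diff = A[:, None, :] - A[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

orig = pairwise_dists(X)
mask = orig > 0  # skip zero self-distances on the diagonal

for name, R in [("dense", R_dense), ("sparse", R_sparse)]:
    proj = pairwise_dists(X @ R)
    ratio = proj[mask] / orig[mask]
    print(f"{name}: mean distance ratio {ratio.mean():.3f}, std {ratio.std():.3f}")
```

Both projections produce distance ratios tightly concentrated around 1, which is the empirical version of the distance-preservation claim; the projection itself is a single matrix multiply, in contrast to the eigendecomposition PCA requires.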


Published in

KDD '01: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
August 2001, 493 pages
ISBN: 158113391X
DOI: 10.1145/502512
Copyright © 2001 ACM

Publisher: Association for Computing Machinery, New York, NY, United States


    Acceptance Rates

KDD '01 paper acceptance rate: 31 of 237 submissions (13%). Overall acceptance rate: 1,133 of 8,635 submissions (13%).
