Abstract
In recent years, the detrimental effects of the curse of dimensionality have been studied in great detail for problems such as clustering, nearest neighbor search, and indexing. In high-dimensional space the data becomes sparse, and traditional indexing and algorithmic techniques perform poorly. Recent research results show that in high-dimensional space, the concept of proximity may not even be qualitatively meaningful [6]. In this paper, we outline the effects of generalizing low-dimensional techniques to high-dimensional applications and the natural effects of sparsity on distance-based applications. We outline guidelines for re-designing either the distance functions or the distance-based applications in a meaningful way for high-dimensional domains. We provide novel perspectives and insights on some new lines of work for broadening application definitions in order to deal effectively with the dimensionality curse.
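The claim that proximity loses meaning in high dimensions can be illustrated with a small experiment. The sketch below is not taken from the paper: it assumes points drawn uniformly from the unit hypercube and the Euclidean distance, and the function name `relative_contrast`, the sample size, and the seed are illustrative choices. It measures how much farther the farthest point is from a query than the nearest one, relative to the nearest distance; under these assumptions the ratio should shrink as the dimension grows.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Return (d_max - d_min) / d_min for one random query point.

    Assumption (not from the paper): the query and the data points are
    drawn uniformly from the unit hypercube [0, 1]^dim, and distances
    are Euclidean.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min


# Compare the contrast across increasing dimensionalities.
contrasts = {dim: relative_contrast(dim) for dim in (2, 20, 200, 2000)}
for dim, value in contrasts.items():
    print(f"dim={dim:5d}  relative contrast={value:.3f}")
```

A small relative contrast means the nearest and farthest neighbors are almost equally far from the query, which is one way of seeing why distance-based methods designed for low-dimensional data may need to be re-designed for high-dimensional domains.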
Index Terms
- Re-designing distance functions and distance-based applications for high dimensional data