article

iDistance: An adaptive B⁺-tree based indexing method for nearest neighbor search

Authors:
H. V. Jagadish

University of Michigan, Ann Arbor, MI

University of Michigan, Ann Arbor, MI
View Profile

,
Beng Chin Ooi

National University of Singapore, Singapore

National University of Singapore, Singapore
View Profile

,
Kian-Lee Tan

National University of Singapore, Singapore

National University of Singapore, Singapore
View Profile

,
Cui Yu

Monmouth University, West Long Branch, NJ

Monmouth University, West Long Branch, NJ
View Profile

,
Rui Zhang

National University of Singapore, Singapore

National University of Singapore, Singapore
View Profile

Authors Info & Claims

ACM Transactions on Database Systems Volume 30 Issue 2pp 364–397https://doi.org/10.1145/1071610.1071612

Published:01 June 2005Publication History

ACM Transactions on Database Systems

Abstract

In this article, we present an efficient B⁺-tree based indexing method, called iDistance, for K-nearest neighbor (KNN) search in a high-dimensional metric space. iDistance partitions the data based on a space- or data-partitioning strategy, and selects a reference point for each partition. The data points in each partition are transformed into a single dimensional value based on their similarity with respect to the reference point. This allows the points to be indexed using a B⁺-tree structure and KNN search to be performed using one-dimensional range search. The choice of partition and reference points adapts the index structure to the data distribution.We conducted extensive experiments to evaluate the iDistance technique, and report results demonstrating its effectiveness. We also present a cost model for iDistance KNN search, which can be exploited in query optimization.

References

Aggarwal, C., Procopiuc, C., Wolf, J., Yu, P., and Park, J. 1999. Fast algorithm for projected clustering. In Proceedings of the ACM SIGMOD International Conference on Management of Data.]] Google ScholarDigital Library
Arya, S., Mount, D., Netanyahu, N., Silverman, R., and Wu, A. 1994. An optimal algorithm for approximate nearest neighbor searching. In Proceedings of the Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, 573--582.]] Google ScholarDigital Library
Arya, S., Mount, D., Netanyahu, N., Silverman, R., and Wu, A. 1998. An optimal algorithm for approximate nearest neighbor searching fixed dimensions. J. ACM 45, 6, 891--923.]] Google ScholarDigital Library
Berchtold, S., Böhm, C., Jagadish, H., Kriegel, H., and Sander, J. 2000. Independent quantization: An index compression technique for high-dimensional data spaces. In Proceedings of the International Conference on Data Engineering. 577--588.]] Google ScholarDigital Library
Berchtold, S., Böhm, C., and Kriegel, H.-P. 1998a. The pyramid-technique: Towards breaking the curse of dimensionality. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 142--153.]] Google ScholarDigital Library
Berchtold, S., Ertl, B., Keim, D., Kriegel, H.-P., and Seidl, T. 1998b. Fast nearest neighbor search in high-dimensional space. In Proceedings of the International Conference on Data Engineering. 209--218.]] Google ScholarDigital Library
Berchtold, S., Keim, D., and Kriegel, H. 1996. The X-tree: An index structure for high-dimensional data. In Proceedings of the International Conference on Very Large Data Bases. 28--37.]] Google ScholarDigital Library
Beyer, K., Goldstein, J., Ramakrishnan, R., and Shaft, U. 1999. When is nearest neighbors meaningful? In Proceedings of the International Conference on Database Theory.]] Google ScholarDigital Library
Böhm, C., Berchtold, S., and Keim, D. 2001. Searching in high-dimensional spaces: Index structures for improving the performance of multimedia databases. ACM Comput. Surv. 33, 3, 322--373.]] Google ScholarDigital Library
Bozkaya, T. and Ozsoyoglu, M. 1997. Distance-based indexing for high-dimensional metric spaces. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 357--368.]] Google ScholarDigital Library
Chakrabarti, K. and Mehrotra, S. 1999. The hybrid tree: An index structure for high dimensional feature spaces. In Proceedings of the International Conference on Data Engineering. 322--331.]] Google ScholarDigital Library
Chakrabarti, K. and Mehrotra, S. 2000. Local dimensionality reduction: a new approach to indexing high dimensional spaces. In Proceedings of the International Conference on Very Large Databases. 89--100.]] Google ScholarDigital Library
Ciaccia, P., Patella, M., and Zezula, P. 1997. M-trees: An efficient access method for similarity search in metric space. In Proceedings of the International Conference on Very Large Data Bases. 426--435.]] Google ScholarDigital Library
Cui, B., Ooi, B. C., Su, J. W., and Tan, K. L. 2003. Contorting high dimensional data for efficient main memory processing. In Proceedings of the ACM SIGMOD Conference. 479--490.]] Google ScholarDigital Library
Cui, B., Ooi, B. C., Su, J. W., and Tan, K. L. 2004. Indexing high-dimensional data for efficient in-memory similarity search. In IEEE Trans. Knowl. Data Eng. to appear.]] Google ScholarDigital Library
Faloutsos, C. and Lin, K.-I. 1995. Fastmap: A fast algorithm for indexing, data-mining and visualization of traditional and multimedia datasets. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 163--174.]] Google ScholarDigital Library
Filho, R. F. S., Traina, A., and Faloutsos, C. 2001. Similarity search without tears: The omni family of all-purpose access methods. In Proceedings of the International Conference on Data Engineering. 623--630.]] Google ScholarDigital Library
Goldstein, J. and Ramakrishnan, R. 2000. Contrast plots and p-sphere trees: space vs. time in nearest neighbor searches. In Proceedings of the International Conference on Very Large Databases. 429--440.]] Google ScholarDigital Library
Guha, S., Rastogi, R., and Shim, K. 1998. Cure: an efficient clustering algorithm for large databases. In Proceedings of the ACM SIGMOD International Conference on Management of Data.]] Google ScholarDigital Library
Guttman, A. 1984. R-trees: A dynamic index structure for spatial searching. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 47--57.]] Google ScholarDigital Library
Jagadish, H., Ooi, B. C., Tan, K.-L., Yu, C., and Zhang, R. 2004. iDistance: An adaptive B+-tree based indexing method for nearest neighbor search. Tech. Rep. www.comp.nus.edu.sg/~ooibc, National University of Singapore.]]Google Scholar
Jolliffe, I. T. 1986. Principle Component Analysis. Springer-Verlag.]]Google Scholar
Katamaya, N. and Satoh, S. 1997. The SR-tree: An index structure for high-dimensional nearest neighbor queries. In Proceedings of the ACM SIGMOD International Conference on Management of Data.]] Google ScholarDigital Library
Koudas, N., Ooi, B. C., Tan, K.-L., and Zhang, R. 2004. Approximate NN queries on streams with guaranteed error/performance bounds. In Proceedings of the International Conference on Very Large Data Bases. 804--815.]] Google ScholarDigital Library
Kruskal, J. B. 1956. On the shortest spanning subtree of a graph and the travelling salesman problem. In Proceedings of the American Mathematical Society 7, 48--50.]]Google ScholarCross Ref
Lin, K., Jagadish, H., and Faloutsos, C. 1995. The TV-tree: An index structure for high-dimensional data. VLDB Journal 3, 4, 517--542.]] Google ScholarDigital Library
MacQueen, J. 1967. Some methods for classification and analysis of multivariate observations. In Fifth Berkeley Symposium on Mathematical statistics and probability. University of California Press, 281--297.]]Google Scholar
Ooi, B. C., Tan, K. L., Yu, C., and Bressan, S. 2000. Indexing the edge: a simple and yet efficient approach to high-dimensional indexing. In Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. 166--174.]] Google ScholarDigital Library
Pagel, B.-U., Korn, F., and Faloutsos, C. 2000. Deflating the dimensionality curse using multiple fractal dimensions. In Proceedings of the International Conference on Data Engineering.]] Google ScholarDigital Library
Ramakrishnan, R. and Gehrke, J. 2000. Database Management Systems. McGraw-Hill.]] Google ScholarDigital Library
Sakurai, Y., Yoshikawa, M., and Uemura, S. 2000. The a-tree: An index structure for high-dimensional spaces using relative approximation. In Proceedings of the International Conference on Very Large Data Bases. 516--526.]] Google ScholarDigital Library
Tao, Y., Faloutsos, C., and Papadias, D. 2003. The power-method: A comprehensive estimation technique for multi-dimensional queries. In Proceedings of the Conference on Information and Knowledge Management.]] Google ScholarDigital Library
Traina, A., Seeger, B., and Faloutsos, C. 2000. Slim-trees: high performance metric trees minimizing overlap between nodes. In Advances in Database Technology---EDBT 2000, International Conference on Extending Database Technology, Konstanz, Germany, March 27--31, 2000, Proceedings. Lecture Notes in Computer Science, vol. 1777. Springer-Verlag, 51--65.]] Google ScholarDigital Library
Weber, R., Schek, H., and Blott, S. 1998. A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. In Proceedings of the International Conference on Very Large Data Bases. 194--205.]] Google ScholarDigital Library
White, D. and Jain, R. 1996. Similarity indexing with the SS-tree. In Proceedings of the International Conference on Data Engineering. 516--523.]] Google ScholarDigital Library
Yu, C., Ooi, B. C., Tan, K. L., and Jagadish, H. 2001. Indexing the distance: an efficient method to knn processing. In Proceedings of the International Conference on Very Large Data Bases. 421--430.]] Google ScholarDigital Library
Zhang, T., Ramakrishnan, R., and Livny, M. 1996. Birch: an efficient data clustering method for very large databases. In Proceedings of the ACM SIGMOD International Conference on Management of Data.]] Google ScholarDigital Library

Index Terms

iDistance: An adaptive B⁺-tree based indexing method for nearest neighbor search
1. Information systems
  1. Information retrieval
    1. Document representation

Recommendations

A comprehensive study of idistance partitioning strategies for kNN queries and high-dimensional data indexing
BNCOD'13: Proceedings of the 29th British National conference on Big Data

Efficient database indexing and information retrieval tasks such as k-nearest neighbor (kNN) search still remain difficult challenges in large-scale and high-dimensional data. In this work, we perform the first comprehensive analysis of different ...
Read More
Improving the Performance of High-Dimensional kNN Retrieval through Localized Dataspace Segmentation and Hybrid Indexing
ADBIS 2013: Proceedings of the 17th East European Conference on Advances in Databases and Information Systems - Volume 8133

Efficient data indexing and nearest neighbor retrieval are challenging tasks in high-dimensional spaces. This work builds upon our previous analyses of iDistance partitioning strategies to develop the backbone of a new indexing method using a heuristic-...
Read More
Approximate Direct and Reverse Nearest Neighbor Queries, and the k-nearest Neighbor Graph
SISAP '09: Proceedings of the 2009 Second International Workshop on Similarity Search and Applications

Retrieving the \emph{k-nearest neighbors} of a query object is a basic primitive in similarity searching. A related, far less explored primitive is to obtain the dataset elements which would have the query object within their own \emph{k}-nearest ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in

ACM Transactions on Database Systems Volume 30, Issue 2
June 2005
328 pages
ISSN:0362-5915
EISSN:1557-4644
DOI:10.1145/1071610
Issue’s Table of Contents

Copyright © 2005 ACM
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 1 June 2005
Published in tods Volume 30, Issue 2

Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
Indexing
KNN
nearest neighbor queries
Qualifiers
- article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 492
  Total Citations
  View Citations
- 2,957
  Total Downloads
- Downloads (Last 12 months)112
- Downloads (Last 6 weeks)9
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

iDistance: An adaptive B⁺-tree based indexing method for nearest neighbor search

ACM Transactions on Database Systems

Abstract

References

Cited By

Index Terms

Recommendations

A comprehensive study of idistance partitioning strategies for kNN queries and high-dimensional data indexing

Improving the Performance of High-Dimensional kNN Retrieval through Localized Dataspace Segmentation and Hybrid Indexing

Approximate Direct and Reverse Nearest Neighbor Queries, and the k-nearest Neighbor Graph

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

iDistance: An adaptive B+-tree based indexing method for nearest neighbor search

ACM Transactions on Database Systems

Abstract

References

Cited By

Index Terms

Recommendations

A comprehensive study of idistance partitioning strategies for kNN queries and high-dimensional data indexing

Improving the Performance of High-Dimensional kNN Retrieval through Localized Dataspace Segmentation and Hybrid Indexing

Approximate Direct and Reverse Nearest Neighbor Queries, and the k-nearest Neighbor Graph

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media

iDistance: An adaptive B⁺-tree based indexing method for nearest neighbor search