Abstract
In many database applications, one of the common queries is to find approximate matches to a given query item from a collection of data items. For example, given an image database, one may want to retrieve all images that are similar to a given query image. Distance based index structures are proposed for applications where the data domain is high dimensional, or the distance function used to compute distances between data objects is non-Euclidean. In this paper, we introduce a distance based index structure called multi-vantage point (mvp) tree for similarity queries on high-dimensional metric spaces. The mvp-tree uses more than one vantage point to partition the space into spherical cuts at each level. It also utilizes the pre-computed (at construction time) distances between the data points and the vantage points. We have done experiments to compare mvp-trees with vp-trees which have a similar partitioning strategy, but use only one vantage point at each level, and do not make use of the pre-computed distances. Empirical studies show that mvp-tree outperforms the vp-tree 20% to 80% for varying query ranges and different distance distributions.
- AFA93 R. Agrawal, C. Faloutsos, A. Swami. "Efficient Similarity Search In Sequence Databases". In FODO Conference, 1993. Google ScholarDigital Library
- BK73 W.A. Burkhard, R.M. Keller, "Some Approaches to Best-Match File Searching", Communications of the A CM, 16(4), pages 230-236, April 1973. Google ScholarDigital Library
- BKSS90 N. Beckmann, H.P. Kriegei, R. Schneider, B. Seeger, "The R*-tree: An Efficient and Robust Access Method for Points and Rectangles", Proceedings of the 1990 ACM SIGMOD Conference, Atlantic City, pages 322-331, May 1990. Google ScholarDigital Library
- Bri95 S. Brin, "Near Neighbor Search in Large Metric Spaces", Proceedings of the 21st VLDB Conference, pages 574-584, 1995. Google ScholarDigital Library
- Chi94 T. Chiueh, "Content-Based Image Indexing", Proceedings of the 20th VLDB Conference, pages 582-593, 1994. Google ScholarDigital Library
- FEF+94 C. Faloutsos, W. Equitz, M. Flickner et al., "Efficient and Effective Querying by Image Content", journal of Intelligent Information Systems (3), pages .231-262, 1994. Google ScholarDigital Library
- FRM94 C. Faloutsos, M. Ranganathan, Y. Manolopoulos. "Fast Subsequence Matching in Time-Series Databases". Proceedings of the 1994 ACM SIGMOD Conference, Minneapolis, pages 419-429, May 1994. Google ScholarDigital Library
- Gut84 A. Guttman, "R-Trees: A Dynamic Index Strcuture for Spatial Searching", Proceedings of the 1984 ACM SIGMOD Conference, Boston, pages 47-57, June 1984. Google ScholarDigital Library
- Ott92 M. Otterman, "Approximate Matching with High Dimensionality R-trees". M.Sc Scholarly paper, Dept. of Computer Science, Univ. of Maryland, Collage Park, MD, 1992. Supervised by C. Faloutsos.Google Scholar
- RKV95 N. Roussopoulos, S. Kelley, F. Vincent, "Nearest Neighbor Queries", Proceedings of the 1995 ACM SIGMOD Conference, San Jose, pages 71-79, May 1995. Google ScholarDigital Library
- Sam89 H. Samet, "The Design and Analysis of Spatial Data Structures", Addison Wesley, 1989. Google ScholarDigital Library
- SRF87 T. Sellis, N. Roussopoulos, C. Faloutsos, "The R+-tree: A Dynamic Index for Multi-dimensional Objects", Proceedings of thel3th VLDB Conference, pages 507-518, September 1987. Google ScholarDigital Library
- SW90 D. Shasha, T. Wang, "New Techniques for Best-Match Retrieval", ACM Transactions on Information Systems, 8(2), pages 140-158, 1990. Google ScholarDigital Library
- Uhl91 J. K. Uhlmann, "Satisfying General Proximity/Similarity Queries with Metric Trees", Information Processing Letters, vol 40, pages 175-179, 1991.Google ScholarCross Ref
- Yia93 P.N. Yiannilos, "Data Structures and Algorithms for Nearest Neighbor Search in General Metric Spaces", ACM-SIAM Symposium on Discrete Algorithms, pages 311-321, 1993. Google ScholarDigital Library
Index Terms
- Distance-based indexing for high-dimensional metric spaces
Recommendations
Indexing large metric spaces for similarity search queries
One of the common queries in many database applications is finding approximate matches to a given query item from a collection of data items. For example, given an image database, one may want to retrieve all images that are similar to a given query ...
Distance-based indexing for high-dimensional metric spaces
SIGMOD '97: Proceedings of the 1997 ACM SIGMOD international conference on Management of dataIn many database applications, one of the common queries is to find approximate matches to a given query item from a collection of data items. For example, given an image database, one may want to retrieve all images that are similar to a given query ...
A Normalized Levenshtein Distance Metric
Although a number of normalized edit distances presented so far may offer good performance in some applications, none of them can be regarded as a genuine metric between strings because they do not satisfy the triangle inequality. Given two strings X ...
Comments