Abstract
Given a high-dimensional dataset, a top-k query can be used to shortlist the k tuples that best match the user's preferences. Typically, these preferences regard a subset of the available dimensions (i.e., attributes) whose relative significance is expressed by user-specified weights. Along with the query result, we propose to compute for each involved dimension the maximal deviation to the corresponding weight for which the query result remains valid. The derived weight ranges, called immutable regions, are useful for performing sensitivity analysis, for finetuning the query weights, etc.
In this paper, we focus on top-k queries with linear preference functions over the queried dimensions. We codify the conditions under which changes in a dimension's weight invalidate the query result, and develop algorithms to compute the immutable regions. In general, this entails the examination of numerous non-result tuples. To reduce processing time, we introduce a pruning technique and a thresholding mechanism that allow the immutable regions to be determined correctly after examining only a small number of non-result tuples. We demonstrate empirically that the two techniques combine well to form a robust and highly resource-efficient algorithm. We verify the generality of our findings using real high-dimensional data from different domains (documents, images, etc) and with different characteristics.
- R. Baeza-Yates and B. R. Neto. Modern Information Retrieval. Addison Wesley, 1999. Google Scholar
- M. d. Berg, O. Cheong, M. v. Kreveld, and M. Overmars. Computational Geometry: Algorithms and Applications. Springer-Verlag TELOS, 3rd ed. edition, 2008. Google Scholar
- S. Börzsönyi, D. Kossmann, and K. Stocker. The skyline operator. In ICDE, pages 421-430, 2001. Google Scholar
- K. C.-C. Chang and S. won Hwang. Minimal probing: supporting expensive predicates for top-k queries. In SIGMOD, pages 346-357, 2002. Google Scholar
- Y.-C. Chang, L. D. Bergman, V. Castelli, C.-S. Li, M.-L. Lo, and J. R. Smith. The onion technique: Indexing for linear optimization queries. In SIGMOD, pages 391-402, 2000. Google Scholar
- S. Chaudhuri, L. Gravano, and A. Marian. Optimizing top-k selection queries over multimedia repositories. IEEE Trans. Knowl. Data Eng., 16(8):992-1009, 2004. Google Scholar
- R. Fagin. Combining fuzzy information from multiple systems. J. Comput. Syst. Sci., 58(1):83-99, 1999. Google Scholar
- R. Fagin, A. Lotem, and M. Naor. Optimal Aggregation Algorithms for Middleware. JCSS, 66(4):614-656, 2003. Google Scholar
- V. Hristidis and Y. Papakonstantinou. Algorithms and applications for answering ranked queries using ranked views. VLDB Journal, 13(1):49-70, 2004. Google Scholar
- M. Hua, J. Pei, W. Zhang, and X. Lin. Ranking queries on uncertain data: a probabilistic threshold approach. In SIGMOD, pages 673-686, 2008. Google Scholar
- I. F. Ilyas, W. G. Aref, and A. K. Elmagarmid. Supporting top-k join queries in relational databases. VLDB Journal, 13(3):207-221, 2004. Google Scholar
- I. F. Ilyas, G. Beskales, and M. A. Soliman. A survey of top-k query processing techniques in relational database systems. ACM Comput. Surv., 40(4), 2008. Google Scholar
- I. Kemelmacher and R. Basri. Indexing with Unknown Illumination and Pose. CVPR, 1:909-916, 2005. Google Scholar
- J. Li and A. Deshpande. Ranking continuous probabilistic datasets. PVLDB, 3(1):638-649, 2010. Google Scholar
- K. Mouratidis, S. Bakiras, and D. Papadias. Continuous monitoring of top-k queries over sliding windows. In SIGMOD, pages 635-646, 2006. Google Scholar
- S. Nutanong, R. Zhang, E. Tanin, and L. Kulik. The v*-diagram: a query-dependent approach to moving knn queries. PVLDB, 1(1):1095-1106, 2008. Google Scholar
- J. Pei, W. Jin, M. Ester, and Y. Tao. Catching the best views of skyline: A semantic approach based on decisive subspaces. In VLDB, pages 253-264, 2005. Google Scholar
- M. Persin. Efficient implementation of text retrieval techniques. Tech. rep. (thesis), RMIT, Australia, 1996.Google Scholar
- S. Prabhakar, Y. Xia, D. V. Kalashnikov, W. G. Aref, and S. E. Hambrusch. Query indexing and velocity constrained indexing: Scalable techniques for continuous queries on moving objects. IEEE Trans. Computers, 51(10):1124-1140, 2002. Google Scholar
- M. A. Soliman, I. F. Ilyas, D. Martinenghi, and M. Tagliasacchi. Ranking with uncertain scoring functions: semantics and sensitivity measures. In SIGMOD, pages 805-816, 2011. Google Scholar
- Z. Song and N. Roussopoulos. K-nearest neighbor search for moving query point. In SSTD, pages 79-96, 2001. Google Scholar
- Y. Tao, V. Hristidis, D. Papadias, and Y. Papakonstantinou. Branch-and-bound processing of ranked queries. Inf. Syst., 32(3):424-445, 2007. Google Scholar
- Y. Tao, X. Xiao, and J. Pei. Subsky: Efficient computation of skylines in subspaces. In ICDE, page 65, 2006. Google Scholar
- Y. Tao, X. Xiao, and J. Pei. Efficient skyline and top-k retrieval in subspaces. IEEE Trans. Knowl. Data Eng., 19(8):1072-1088, 2007. Google Scholar
- P. Tsaparas, T. Palpanas, Y. Kotidis, N. Koudas, and D. Srivastava. Ranked join indices. In ICDE, pages 277-288, 2003.Google Scholar
- A. Vlachou, C. Doulkeridis, Y. Kotidis, and K. Nørvåg. Reverse top-k queries. In ICDE, pages 365-376, 2010.Google Scholar
- K. Yi, H. Yu, J. Yang, G. Xia, and Y. Chen. Efficient maintenance of materialized top-k views. In ICDE, pages 189-200, 2003.Google Scholar
- J. Zhang, M. Zhu, D. Papadias, Y. Tao, and D. L. Lee. Location-based spatial queries. In SIGMOD, pages 443-454, 2003. Google Scholar
Index Terms
- Computing immutable regions for subspace top-k queries
Recommendations
Top-k dominating queries in uncertain databases
EDBT '09: Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database TechnologyDue to the existence of uncertain data in a wide spectrum of real applications, uncertain query processing has become increasingly important, which dramatically differs from handling certain data in a traditional database. In this paper, we formulate ...
Probabilistic top-k dominating queries in uncertain databases
Due to the existence of uncertain data in a wide spectrum of real applications, uncertain query processing has become increasingly important, which dramatically differs from handling certain data in a traditional database. In this paper, we formulate ...
Scalable and efficient processing of top-k multiple-type integrated queries
AbstractIn this paper, we define a new class of queries, the top-k multiple-type integrated query (simply, top-k MULTI query). It deals with multiple data types and finds the information in the order of relevance between the query and the object. Various ...
Comments