Computing immutable regions for subspace top-k queries

Authors:
Kyriakos Mouratidis, School of Information Systems, Singapore Management University
HweeHwa Pang, School of Information Systems, Singapore Management University

Proceedings of the VLDB Endowment, Volume 6, Issue 2, pp. 73–84
https://doi.org/10.14778/2535568.2448941

Published: 01 December 2012

Abstract

Given a high-dimensional dataset, a top-k query can be used to shortlist the k tuples that best match the user's preferences. Typically, these preferences regard a subset of the available dimensions (i.e., attributes) whose relative significance is expressed by user-specified weights. Along with the query result, we propose to compute for each involved dimension the maximal deviation of the corresponding weight for which the query result remains valid. The derived weight ranges, called immutable regions, are useful for performing sensitivity analysis, for fine-tuning the query weights, etc.
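The definition above can be written compactly. The following is only a sketch in our own notation, not the authors': it assumes a linear scoring function (the setting the paper focuses on), writes Q for the set of queried dimensions, w for the weight vector, e_i for the unit vector of dimension i, and top_k(w) for the query result under w.

```latex
S_{\mathbf{w}}(t) = \sum_{j \in Q} w_j \, t_j ,
\qquad
IR_i = \bigl\{ \delta \in \mathbb{R} \;:\;
  \mathrm{top}_k(\mathbf{w} + \delta\,\mathbf{e}_i) = \mathrm{top}_k(\mathbf{w}) \bigr\}
```

Under these assumptions, each comparison between two tuples is linear in the deviation δ, so IR_i is an interval around δ = 0. Whether "valid" also requires the order among the k result tuples to be preserved is a modelling choice that this sketch leaves open.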

In this paper, we focus on top-k queries with linear preference functions over the queried dimensions. We codify the conditions under which changes in a dimension's weight invalidate the query result, and develop algorithms to compute the immutable regions. In general, this entails the examination of numerous non-result tuples. To reduce processing time, we introduce a pruning technique and a thresholding mechanism that allow the immutable regions to be determined correctly after examining only a small number of non-result tuples. We demonstrate empirically that the two techniques combine well to form a robust and highly resource-efficient algorithm. We verify the generality of our findings using real high-dimensional data from different domains (documents, images, etc.) and with different characteristics.
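To make the notion of an immutable region concrete, here is a minimal brute-force sketch in Python. It is not the paper's algorithm: it examines every non-result tuple and uses neither the pruning technique nor the thresholding mechanism described above. It assumes that a "valid" result means the same k tuples in the same order, that weights are unconstrained real numbers, and that tuples are plain sequences indexed by dimension; the function names and the toy data are hypothetical.

```python
from math import inf

def score(t, weights):
    # Linear preference function over the queried dimensions only.
    return sum(w * t[d] for d, w in weights.items())

def top_k(data, weights, k):
    # Indices of the k highest-scoring tuples, best first.
    order = sorted(range(len(data)),
                   key=lambda i: score(data[i], weights), reverse=True)
    return order[:k]

def immutable_region(data, weights, k, dim):
    # Range (lo, hi) of deviations delta to weights[dim] that keep the
    # ordered top-k result unchanged (assumption: order must be preserved).
    result = top_k(data, weights, k)
    in_result = set(result)
    lo, hi = -inf, inf

    def constrain(a, b):
        # Tuple a must keep scoring at least as high as tuple b:
        # gap + delta * diff >= 0, which is linear in delta.
        nonlocal lo, hi
        gap = score(data[a], weights) - score(data[b], weights)
        diff = data[a][dim] - data[b][dim]
        if diff > 0:
            lo = max(lo, -gap / diff)
        elif diff < 0:
            hi = min(hi, gap / -diff)

    # Consecutive result tuples must keep their relative order.
    for a, b in zip(result, result[1:]):
        constrain(a, b)
    # The k-th result tuple must not be overtaken by any non-result tuple.
    last = result[-1]
    for t in range(len(data)):
        if t not in in_result:
            constrain(last, t)
    return lo, hi

# Toy data with three dimensions; the query involves dimensions 0 and 1 only.
data = [(8, 2, 5), (6, 6, 1), (2, 7, 9), (1, 3, 7)]
weights = {0: 1.0, 1: 1.0}
lo, hi = immutable_region(data, weights, k=2, dim=0)
print(lo, hi)
```

For this toy dataset the top-2 result is tuples 1 and 0, and the sketch reports that the weight of dimension 0 can decrease by about 0.167 or increase by 1.0 before the result changes. The cost is linear in the number of non-result tuples per dimension, which is the overhead the paper's pruning and thresholding are meant to avoid.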


Index Terms

Computing immutable regions for subspace top-k queries
1. Information systems
  1. Data management systems
    1. Database management system engines
  2. Information retrieval

Published in

Proceedings of the VLDB Endowment Volume 6, Issue 2
December 2012
120 pages
ISSN: 2150-8097
Publisher
VLDB Endowment