skip to main content
article

Computing immutable regions for subspace top-k queries

Authors Info & Claims
Published:01 December 2012Publication History
Skip Abstract Section

Abstract

Given a high-dimensional dataset, a top-k query can be used to shortlist the k tuples that best match the user's preferences. Typically, these preferences regard a subset of the available dimensions (i.e., attributes) whose relative significance is expressed by user-specified weights. Along with the query result, we propose to compute for each involved dimension the maximal deviation to the corresponding weight for which the query result remains valid. The derived weight ranges, called immutable regions, are useful for performing sensitivity analysis, for finetuning the query weights, etc.

In this paper, we focus on top-k queries with linear preference functions over the queried dimensions. We codify the conditions under which changes in a dimension's weight invalidate the query result, and develop algorithms to compute the immutable regions. In general, this entails the examination of numerous non-result tuples. To reduce processing time, we introduce a pruning technique and a thresholding mechanism that allow the immutable regions to be determined correctly after examining only a small number of non-result tuples. We demonstrate empirically that the two techniques combine well to form a robust and highly resource-efficient algorithm. We verify the generality of our findings using real high-dimensional data from different domains (documents, images, etc) and with different characteristics.

References

  1. R. Baeza-Yates and B. R. Neto. Modern Information Retrieval. Addison Wesley, 1999. Google ScholarGoogle Scholar
  2. M. d. Berg, O. Cheong, M. v. Kreveld, and M. Overmars. Computational Geometry: Algorithms and Applications. Springer-Verlag TELOS, 3rd ed. edition, 2008. Google ScholarGoogle Scholar
  3. S. Börzsönyi, D. Kossmann, and K. Stocker. The skyline operator. In ICDE, pages 421-430, 2001. Google ScholarGoogle Scholar
  4. K. C.-C. Chang and S. won Hwang. Minimal probing: supporting expensive predicates for top-k queries. In SIGMOD, pages 346-357, 2002. Google ScholarGoogle Scholar
  5. Y.-C. Chang, L. D. Bergman, V. Castelli, C.-S. Li, M.-L. Lo, and J. R. Smith. The onion technique: Indexing for linear optimization queries. In SIGMOD, pages 391-402, 2000. Google ScholarGoogle Scholar
  6. S. Chaudhuri, L. Gravano, and A. Marian. Optimizing top-k selection queries over multimedia repositories. IEEE Trans. Knowl. Data Eng., 16(8):992-1009, 2004. Google ScholarGoogle Scholar
  7. R. Fagin. Combining fuzzy information from multiple systems. J. Comput. Syst. Sci., 58(1):83-99, 1999. Google ScholarGoogle Scholar
  8. R. Fagin, A. Lotem, and M. Naor. Optimal Aggregation Algorithms for Middleware. JCSS, 66(4):614-656, 2003. Google ScholarGoogle Scholar
  9. V. Hristidis and Y. Papakonstantinou. Algorithms and applications for answering ranked queries using ranked views. VLDB Journal, 13(1):49-70, 2004. Google ScholarGoogle Scholar
  10. M. Hua, J. Pei, W. Zhang, and X. Lin. Ranking queries on uncertain data: a probabilistic threshold approach. In SIGMOD, pages 673-686, 2008. Google ScholarGoogle Scholar
  11. I. F. Ilyas, W. G. Aref, and A. K. Elmagarmid. Supporting top-k join queries in relational databases. VLDB Journal, 13(3):207-221, 2004. Google ScholarGoogle Scholar
  12. I. F. Ilyas, G. Beskales, and M. A. Soliman. A survey of top-k query processing techniques in relational database systems. ACM Comput. Surv., 40(4), 2008. Google ScholarGoogle Scholar
  13. I. Kemelmacher and R. Basri. Indexing with Unknown Illumination and Pose. CVPR, 1:909-916, 2005. Google ScholarGoogle Scholar
  14. J. Li and A. Deshpande. Ranking continuous probabilistic datasets. PVLDB, 3(1):638-649, 2010. Google ScholarGoogle Scholar
  15. K. Mouratidis, S. Bakiras, and D. Papadias. Continuous monitoring of top-k queries over sliding windows. In SIGMOD, pages 635-646, 2006. Google ScholarGoogle Scholar
  16. S. Nutanong, R. Zhang, E. Tanin, and L. Kulik. The v*-diagram: a query-dependent approach to moving knn queries. PVLDB, 1(1):1095-1106, 2008. Google ScholarGoogle Scholar
  17. J. Pei, W. Jin, M. Ester, and Y. Tao. Catching the best views of skyline: A semantic approach based on decisive subspaces. In VLDB, pages 253-264, 2005. Google ScholarGoogle Scholar
  18. M. Persin. Efficient implementation of text retrieval techniques. Tech. rep. (thesis), RMIT, Australia, 1996.Google ScholarGoogle Scholar
  19. S. Prabhakar, Y. Xia, D. V. Kalashnikov, W. G. Aref, and S. E. Hambrusch. Query indexing and velocity constrained indexing: Scalable techniques for continuous queries on moving objects. IEEE Trans. Computers, 51(10):1124-1140, 2002. Google ScholarGoogle Scholar
  20. M. A. Soliman, I. F. Ilyas, D. Martinenghi, and M. Tagliasacchi. Ranking with uncertain scoring functions: semantics and sensitivity measures. In SIGMOD, pages 805-816, 2011. Google ScholarGoogle Scholar
  21. Z. Song and N. Roussopoulos. K-nearest neighbor search for moving query point. In SSTD, pages 79-96, 2001. Google ScholarGoogle Scholar
  22. Y. Tao, V. Hristidis, D. Papadias, and Y. Papakonstantinou. Branch-and-bound processing of ranked queries. Inf. Syst., 32(3):424-445, 2007. Google ScholarGoogle Scholar
  23. Y. Tao, X. Xiao, and J. Pei. Subsky: Efficient computation of skylines in subspaces. In ICDE, page 65, 2006. Google ScholarGoogle Scholar
  24. Y. Tao, X. Xiao, and J. Pei. Efficient skyline and top-k retrieval in subspaces. IEEE Trans. Knowl. Data Eng., 19(8):1072-1088, 2007. Google ScholarGoogle Scholar
  25. P. Tsaparas, T. Palpanas, Y. Kotidis, N. Koudas, and D. Srivastava. Ranked join indices. In ICDE, pages 277-288, 2003.Google ScholarGoogle Scholar
  26. A. Vlachou, C. Doulkeridis, Y. Kotidis, and K. Nørvåg. Reverse top-k queries. In ICDE, pages 365-376, 2010.Google ScholarGoogle Scholar
  27. K. Yi, H. Yu, J. Yang, G. Xia, and Y. Chen. Efficient maintenance of materialized top-k views. In ICDE, pages 189-200, 2003.Google ScholarGoogle Scholar
  28. J. Zhang, M. Zhu, D. Papadias, Y. Tao, and D. L. Lee. Location-based spatial queries. In SIGMOD, pages 443-454, 2003. Google ScholarGoogle Scholar

Index Terms

  1. Computing immutable regions for subspace top-k queries

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in

      Full Access

      • Published in

        cover image Proceedings of the VLDB Endowment
        Proceedings of the VLDB Endowment  Volume 6, Issue 2
        December 2012
        120 pages

        Publisher

        VLDB Endowment

        Publication History

        • Published: 1 December 2012
        Published in pvldb Volume 6, Issue 2

        Qualifiers

        • article

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader