Abstract
In the rank join problem, we are given a set of relations and a scoring function, and the goal is to return the join results with the top k scores. It is often the case in practice that the inputs may be accessed in ranked order and the scoring function is monotonic. These conditions allow for efficient algorithms that solve the rank join problem without reading all of the input. In this article, we present a thorough analysis of such rank join algorithms. A strong point of our analysis is that it is based on a more general problem statement than previous work, making it more relevant to the execution model that is employed by database systems. One of our results indicates that the well-known HRJN algorithm has shortcomings, because it does not stop reading its input as soon as possible. We find that it is NP-hard to overcome this weakness in the general case, but cases of limited query complexity are tractable. We prove the latter with an algorithm that infers provably tight bounds on the potential benefit of reading more input in order to stop as soon as possible. As a result, the algorithm achieves a cost that is within a constant factor of optimal.
- Agrawal, P. and Widom, J. 2009. Confidence-Aware join algorithms. In Proceedings of the IEEE International Conference on Data Engineering. IEEE Computer Society, 628--639. Google ScholarDigital Library
- Bast, H., Majumdar, D., Schenkel, R., Theobald, M., and Weikum, G. 2006. IO-Top-k: Index-Access optimized top-k query processing. In Proceedings of the 32nd International Conference on Very Large Data Bases. VLDB Endowment, 475--486. Google ScholarDigital Library
- Bruno, N., Gravano, L., and Marian, A. 2002. Evaluating top-k queries over Web-accessible databases. In Proceedings of the 18th International Conference on Data Engineering. IEEE Computer Society, 369--380. Google ScholarDigital Library
- Chang, K. C.-C. and Hwang, S.-w. 2002. Minimal probing: Supporting expensive predicates for top-k queries. In Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM, New York, 346--357. Google ScholarDigital Library
- Chaudhuri, S. and Gravano, L. 1996. Optimizing queries over multimedia repositories. In Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM, New York, 91--102. Google ScholarDigital Library
- Chaudhuri, S. and Gravano, L. 1999. Evaluating top-k selection queries. In Proceedings of the 25th International Conference on Very Large Data Bases. Morgan Kaufmann, San Francisco, CA, 397--410. Google ScholarDigital Library
- Fagin, R. 1999. Combining fuzzy information from multiple systems. J. Comput. Syst. Sci. 58, 1, 83--99. Google ScholarDigital Library
- Fagin, R., Lotem, A., and Naor, M. 2003. Optimal aggregation algorithms for middleware. J. Comput. Syst. Sci. 66, 4, 614--656. Google ScholarDigital Library
- Finger, J. and Polyzotis, N. 2009. Robust and efficient algorithms for rank join evaluation. In Proceedings of the 35th SIGMOD International Conference on Management of Data. ACM, New York, 415--428. Google ScholarDigital Library
- Graefe, G. 1993. Query evaluation techniques for large databases. ACM Comput. Surv. 25, 2, 73--170. Google ScholarDigital Library
- Ilyas, I. F., Aref, W. G., and Elmagarmid, A. K. 2002. Joining ranked inputs in practice. In Proceedings of the 28th International Conference on Very Large Data Bases. VLDB Endowment, 950--961. Google ScholarDigital Library
- Ilyas, I. F., Aref, W. G., and Elmagarmid, A. K. 2004a. Supporting top-k join queries in relational databases. VLDB J. 13, 3, 207--221. Google ScholarDigital Library
- Ilyas, I. F., Aref, W. G., Elmagarmid, A. K., Elmongui, H. G., Shah, R., and Vitter, J. S. 2006. Adaptive rank-aware query optimization in relational databases. ACM Trans. Datab. Syst. 31, 4, 1257--1304. Google ScholarDigital Library
- Ilyas, I. F., Shah, R., Aref, W. G., Vitter, J. S., and Elmagarmid, A. K. 2004b. Rank-Aware query optimization. In Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM, New York, 203--214. Google ScholarDigital Library
- Kimelfeld, B. and Sagiv, Y. 2006. Incrementally computing ordered answers of acyclic conjunctive queries. In Next Generation Information Technologies and Systems. Springer, 141--152. Google ScholarDigital Library
- Li, C., Chang, K. C.-C., and Ilyas, I. F. 2006. Supporting ad-hoc ranking aggregates. In Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM, New York, 61--72. Google ScholarDigital Library
- Li, C., Chang, K. C.-C., Ilyas, I. F., and Song, S. 2005. RankSQL: Query algebra and optimization for relational top-k queries. In Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM, New York, 131--142. Google ScholarDigital Library
- Mamoulis, N., Yiu, M. L., Cheng, K. H., and Cheung, D. W. 2007. Efficient top-k aggregation of ranked inputs. ACM Trans. Datab. Syst. 32, 3, 19. Google ScholarDigital Library
- Natsev, A., Chang, Y.-C., Smith, J. R., Li, C.-S., and Vitter, J. S. 2001. Supporting incremental join queries on ranked inputs. In Proceedings of the 27th International Conference on Very Large Data Bases. Morgan Kaufmann, San Francisco, CA, 281--290. Google ScholarDigital Library
- Schnaitter, K. and Polyzotis, N. 2008. Evaluating rank joins with optimal cost. In Proceedings of the 27th Symposium on Principles of Database Systems. ACM, New York, 43--52. Google ScholarDigital Library
- Schnaitter, K., Spiegel, J., and Polyzotis, N. 2007. Depth estimation for ranking query optimization. In Proceedings of the 33rd International Conference on Very Large Data Bases. VLDB Endowment, 902--913. Google ScholarDigital Library
- Yan, W. P. and Larson, P.-A. 1995. Eager aggregation and lazy aggregation. In Proceedings of the 21st International Conference on Very Large Data Bases. Morgan Kaufmann, San Francisco, CA, 345--357. Google ScholarDigital Library
Index Terms
- Optimal algorithms for evaluating rank joins in database systems
Recommendations
Evaluating rank joins with optimal cost
PODS '08: Proceedings of the twenty-seventh ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systemsIn the rank join problem, we are given a set of relations and a scoring function, and the goal is to return the join results with the top K scores. It is often the case in practice that the inputs may be accessed in ranked order and the scoring function ...
Parallel data access for multiway rank joins
ICWE'11: Proceedings of the 11th international conference on Web engineeringRank join operators perform a relational join among two or more relations, assign numeric scores to the join results based on the given scoring function and return K join results with the highest scores. The top-K join results are obtained by accessing ...
Provisional reporting for rank joins
Rank join operators perform a relational join among two or more relations, assign numeric scores to the join results based on a given scoring function, and return K join results with the highest scores, while accessing a subset of data from the input ...
Comments