Abstract
Computing the natural join of a set of relations is an important operation in relational database systems. The ordering of joins determines to a large extent the computation time of the join. Since the number of possible orderings could be very large, query optimizers first reduce the search space by using various heuristics and then try to select an optimal ordering of joins. Avoiding Cartesian products is a common heuristic for reducing the search space, but it cannot guarantee optimal ordering in its search space, because the cheapest Cartesian-product-free (CPF for short) ordering could be significantly worse than an optimal non-CPF ordering by a factor of an arbitrarily large number. In this paper, we use programs consisting of joins, semijoins, and projections for computing the join of some relations, and we introduce a novel algorithm that derives programs from CPF orderings of joins. We show that there exists a CPF ordering from which our algorithm derives a program whose cost is within a constant factor of the cost of an optimal ordering. Thus, our result demonstrates the effectiveness of avoiding Cartesian products as a heuristic for restricting the search space of orderings of joins.
- BEERI, C. R., FAGIN, R., MAIER, D., AND YANNAKAKIS, M. 1983. On the desirability of acyclic database schemas. J. ACM 30, 3 (July), 479-513. Google ScholarDigital Library
- BERNSTEIN, P. A., AND GOODMAN, N. 1981. The power of natural semijoins. SIAM J. Comput. 10, 4 (Nov.) 751-771.Google ScholarCross Ref
- CHANDRA, A. K., AND MERLIN, P. M. 1977. Optimal implementation of conjunctive queries in relational data bases. In Proceedings of the 9th ACM Symposium on the Theory of Computing, (Boulder, Colo., May 2-4). ACM, New York, pp. 77-90. Google ScholarDigital Library
- GOODMAN, N., AND SHMUELI, O. 1984. The tree projection theorem and relational query processing. J. Comput. Syst. Sci. 28, 1 (Feb.), 60-79.Google ScholarCross Ref
- SAGIV, Y., AND SHMUELI, O. 1993. Solving queries by tree projections. ACM Trans. Database Syst. 18, 3 (Sept.), 487-511. Google ScholarDigital Library
- SELINGER, P. G., ASTRAHAN, M. M., CHAMBERLIN, D. D., LORIE, P. A., AND PRICE, f. G. 1979. Access path selection in a relational database system. In Proceedings of the 1979 ACM-SIGMOD International Conference on Management of Data (Boston, Mass., May 30-June 1). ACM, New York, pp. 23-34. Google ScholarDigital Library
- SMITH, D. E., AND GENESERETH, M. R. 1985. Ordering conjunctive queries. Artif. Intell. 26, 171-215. Google ScholarDigital Library
- SWAMI, A., AND GUPTA, A. 1988. Optimization of large join queries. In Proceedings of the 1988 ACM-SIGMOD International Conference on Management of Data (Chicago, Ill., June 1-3). ACM, New York, 1988, pp. 8-17. Google ScholarDigital Library
- SWAMI, A. 1989. Optimization of large join queries: Combining heuristics and combinatorial techniques. In Proceedings of the 1989 ACM-SIGMOD International Conference on Management of Data (Portland, Ore., May 31-June 2). ACM, New York, 367-376. Google ScholarDigital Library
- fAY, Y. C. 1993. On the optimality of strategies for multiple joins. J. ACM 40, 5 (Nov.), 1067-1086. Google ScholarDigital Library
- ULLMAN, J.D. 1989. Principles of Database and Knowledge-Base Systems, vol. 2. Computer Science Press, New York. Google ScholarDigital Library
- WONG, E., AND YOUSEFFI, K. 1976. Decomposition-A strategy for query processing. ACM Trans. Database Syst. 1, 3, (Sept.), 223-241. Google ScholarDigital Library
- YANNAKAKIS, M. 1981. Algorithms for acyclic database schemes. In Proceedings of the International Conference on Very Large Data Bases (Cannes, France, Sept. 9-11). ACM, New York, pp. 82-94.Google Scholar
Index Terms
- Avoiding Cartesian products for multiple joins
Recommendations
Avoiding Cartesian products in programs for multiple joins (extended abstract)
PODS '92: Proceedings of the eleventh ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systemsAvoiding Cartesian products is a common heuristic to reduce the search space of join expressions (orderings) over some set of relations. However, this heuristic cannot guarantee optimal join expressions in its search space because the cheapest Cartesian-...
Extension of the one-shot semijoin strategy to minimize data transmission cost in distributed query processing
The optimization of general queries in a distributed database management system is an important and challenging research issue. The problem is to find an optimal evaluation strategy for a given general query. In this paper, we propose an approach based ...
Comments