Abstract
Object-relational database management systems allow knowledgeable users to define new data types as well as new methods (operators) for the types. This flexibility produces an attendant complexity, which must be handled in new ways for an object-relational database management system to be efficient. In this article we study techniques for optimizing queries that contain time-consuming methods. The focus of traditional query optimizers has been on the choice of join methods and orders; selections have been handled by “pushdown” rules. These rules apply selections in an arbitrary order before as many joins as possible, using th e assumption that selection takes no time. However, users of object-relational systems can embed complex methods in selections. Thus selections may take significant amounts of time, and the query optimization model must be enhanced. In this article we carefully define a query cost framework that incorporates both selectivity and cost estimates for selections. We develop an algorithm called Predicate Migration, and prove that it produces optimal plans for queries with expensive methods. We then describe our implementation of Predicate Migration in the commercial object-relational database management system Illustra, and discuss practical issues that affect our earlier assumptions. We compare Predicate Migration to a variety of simplier optimization techniques, and demonstrate that Predicate Migration is the best general solution to date. The alternative techniques we present may be useful for constrained workloads.
- BITTON, D., DEWITT, D.J., AND TURBYFILL, C. 1983. Benchmarking database systems, a systematic approach. In Proceedings of the Ninth International Conference on Very Large Data Bases (Florence). Google ScholarDigital Library
- CAREY, M. g., DEWITT, D. g., KANT, C., AND NAUGHTON, g.F. 1994. A status report on the 007 OODBMS benchmarking effort. In Proceedings of the Conference on Object-Oriented Programming Systems, Languages, and Applications (Portland), 414-426. Google ScholarDigital Library
- CATTELL, R. 1994. The Object Database Standard: ODMG-93 (Release 1.1). Morgan Kaufmann, San Mateo, CA. Google ScholarDigital Library
- CHAUDHURI, S. AND SHIM, K. 1993. Query optimization in the presence of foreign functions. In Proceedings of the Nineteenth International Conference on Very Large Data Bases (Dublin), 526-541. Google ScholarDigital Library
- CHEN, H., Yu, X., YAMAGUCHI, K., KITAGAWA, H., OHBO, N., AND FUJIWARA, Y. 1992. Decomposition--an approach for optimizing queries including ADT functions. Inf. Process. Lett. 43, 6, 327-333. Google ScholarDigital Library
- CHIMENTI, D., GAMBOA, R., AND KRISHNAMURTHY, R. 1989. Towards an open architecture for LDL. In Proceedings of the Fifteenth International Conference on Very Large Data Bases (Amsterdam), 195-203. Google ScholarDigital Library
- DAYAL, U. 1987. Of nests and trees: A unified approach to processing queries that contain nested subqueries, aggregates, and quantifiers. In Proceedings of the Thirteenth International Conference on Very Large Data Bases (Brighton), 197-208. Google ScholarDigital Library
- Du, W., KRISHNAMURTHY, R., AND SHAN, M.-C. 1992. Query optimization in heterogeneous DBMS. In Proceedings of the Eighteenth International Conference on Very Large Data Bases (Vancouver), 277-291. Google ScholarDigital Library
- FALOUTSOS, C. AND KAMEL, I. 1994. Beyond uniformity and independence: Analysis of R-trees using the concept of fractal dimension. In Proceedings of the Thirteenth ACM SIGACT- SIGMOD-SIGART Symposium on Principles of Database Systems (Minneapolis), 4-13. Google ScholarDigital Library
- FREW, J. 1995. Personal correspondence.Google Scholar
- GRAY, J. (Ed.) 1991. The Benchmark Handbook: For Database and Transaction Processing Systems. Morgan-Kaufmann, San Mateo, CA. Google ScholarDigital Library
- GRAY, J. 1995. Personal correspondence.Google Scholar
- HAAS, P. J., NAUGHTON, J. F., SESHADRI, S., AND STOKES, L. 1995. Sampling-based estimation of the number of distinct values of an attribute. In Proceedings of the 21st International Conference on Very Large Data Bases (Zurich). Google ScholarDigital Library
- HELLERSTEIN, g.M. 1992. Predicate migration: Optimizing queries with expensive predicates. Tech. Rep. Sequoia 2000 92/13 (Dec.), University of California, Berkeley. Google ScholarDigital Library
- HELLERSTEIN, J.M. 1994. Practical predicate placement. In Proceedings of the ACM-SIG- MOD International Conference on Management of Data (Minneapolis), 325-335. Google ScholarDigital Library
- HELLERSTEIN, J.M. AND NAUGHTON, J.F. 1996. Query execution techniques for caching expensive methods. In Proceedings of the ACM-SIGMOD International Conference on Management of Data (Montreal), 423-424. Google ScholarDigital Library
- HELLERSTEIN, J. M. AND STONEBRAKER, M. 1993. Predicate migration: Optimizing queries with expensive predicates. In Proceedings of the ACM-SIGMOD International Conference on Management of Data (Washington, DC), 267-276. Google ScholarDigital Library
- HONG, W. AND STONEBRAKER, M. 1993. Optimization of parallel query execution plans in XPRS. Distrib. Parallel Databases Int. J. 1, 1 (Jan.), 9-32. Google ScholarDigital Library
- Hou, W.-C., OZSOYOGLU, G., AND TANEJA, B.K. 1988. Statistical estimators for relational algebra expressions. In Proceedings of the Seventh ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (Austin), 276-287. Google ScholarDigital Library
- IBARAKI, T. AND KAMEDA, T. 1984. Optimal nesting for computing N-relational joins. ACM Trans. Database Syst. 9, 3 (Oct.), 482-502. Google ScholarDigital Library
- ILLUSTRA 1994. Illustra User's Guide, Illustra Server Release 2.1. Illustra Information Technologies, Inc.Google Scholar
- IOANNIDIS, Y. AND CHRISTODOULAKIS, S. 1991. On the propagation of errors in the size ofjoin results. In Proceedings of the ACM-SIGMOD International Conference on Management of Data (Denver), 268-277. Google ScholarDigital Library
- IOANNIDIS, Y. AND POOSALA, V. 1995. Balancing histogram optimality and practicality for query result size estimation. In Proceedings of the ACM-SIGMOD International Conference on Management of Data (San Jose, CA), 233-244. Google ScholarDigital Library
- IOANNIDIS, Y. E. AND KANG, Y.C. 1990. Randomized algorithms for optimizing large join queries. In Proceedings of the ACM-SIGMOD International Conference on Management of Data (Atlantic City, NJ), 312-321. Google ScholarDigital Library
- ISO_ANSI 1993. Database language SQL ISO/IEC 9075:1992.Google Scholar
- KEMPER, A., MOERKOTTE, G., PEITHNER, K., AND STEINBRUNN, M. 1994. Optimizing disjunctive queries with expensive predicates. In Proceedings of the ACM-SIGMOD International Conference on Management of Data (Minneapolis), 336-347. Google ScholarDigital Library
- KIM, W. 1993. Object-oriented database systems: Promises, reality, and future. In Proceedings of the Nineteenth International Conference on Very Large Data Bases (Dublin), 676-687. Google ScholarDigital Library
- KRISHNAMURTHY, R. AND ZANIOLO, C. 1988. Optimization in a logic based language for knowledge and data intensive applications. In Proceedings of the International Conference on Extending Data Base Technology (Venice), Advances in Database Technology--EDBT '88, J. W. Schmidt, S. Ceri, and M. Missikoff, Eds., Lecture Notes in Computer Science, Volume 303, Springer-Verlag, 128-137. Google ScholarDigital Library
- KRISHNAMURTHY, Z., BORAL, H., AND ZANIOLO, C. 1986. Optimization of nonrecursive queries. In Proceedings of the Twelfth International Conference on Very Large Data Bases (Kyoto), 128-137. Google ScholarDigital Library
- LEVY, A.Y., MUMICK, I.S., AND SAGIV, Y. 1994. Query optimization by predicate movearound. In Proceedings of the 20th International Conference on Very Large Data Bases (Santiago), 96-107. Google ScholarDigital Library
- LIPTON, R. J., NAUGHTON, J. F., SCHNEIDER, D. A., AND SESHADRI, S. 1993. Efficient sampling strategies for relational database operations. Theor. Comput. Sci. 116, 195-226. Google ScholarDigital Library
- LOHMAN, G.M. 1995. Personal correspondence.Google Scholar
- LOHMAN, G. M. AND HAAS, L.M. 1993. Personal correspondence.Google Scholar
- LOHMAN, G. M., DANIELS, D., HAAS, L. M., KISTLER, R., AND SELINGER, P.G. 1984. Optimization of nested queries in a distributed relational database. In Proceedings of the Tenth International Conference on Very Large Data Bases (Singapore), 403-415. Google ScholarDigital Library
- LYNCH, C. AND STONEBRAKER, M. 1988. Extended user-defined indexing with application to textual databases. In Proceedings of the Fourteenth International Conference on Very Large Data Bases (Los Angeles), 306-317. Google ScholarDigital Library
- MACKERT, L. F. AND LOHMAN, G.M. 1986a. R* optimizer validation and performance evaluation for distributed queries. In Proceedings of the Twelfth International Conference on Very Large Data Bases (Kyoto), 149-159. Google ScholarDigital Library
- MACKERT, L. F. AND LOHMAN, G.M. 1986b. R* optimizer validation and performance evaluation for local queries. In Proceedings of the ACM-SIGMOD International Conference on Management of Data (Washington, DC), 84-95. Google ScholarDigital Library
- MAIER, D. AND STEIN, J. 1986. Indexing in an object-oriented DBMS. In Proceedings of the Workshop on Object-Oriented Database Systems (Asilomar), K. R. Dittrich and U. Dayal, Eds., 171-182. Google ScholarDigital Library
- MONMA, C. L. AND SIDNEY, g. 1979. Sequencing with series-parallel precedence constraints. Math. Oper. Res. 4, 215-224.Google ScholarDigital Library
- NAUGHTON, J. 1993. Presentation at Fifth International High Performance Transaction Workshop.Google Scholar
- PALERMO, F.P. 1974. A data base search problem. In Information Systems COINS IV, J. T. Tou, Ed., Plenum Press, New York.Google Scholar
- PIRAHESH, H. 1994. Object-oriented features of DB2 client/server. In Proceedings of the ACM-SIGMOD International Conference on Management of Data (Minneapolis), 483. Google ScholarDigital Library
- PIRAHESH, H., HELLERSTEIN, J. M., AND HASAN, W. 1992. Extensible/rule-based query rewrite optimization in Starburst. In Proceedings of the ACM-SIGMOD International Conference on Management of Data (San Diego), 39-48. Google ScholarDigital Library
- POOSALA, V. AND IOANNIDIS, Y. 1996. Estimation of query-result distribution and its application in parallel-join load balancing. In Proceedings of the 22nd International Conference on Very Large Data Bases (Bombay), 448-459. Google ScholarDigital Library
- POOSALA, V., IOANNIDIS, Y. E., HAAS, P. J., AND SHEKITA, E.J. 1996. Selectivity estimation of range predicates using histograms. In Proceedings of the ACM-SIGMOD International Conference on Management of Data (Montreal), 294-305. Google ScholarDigital Library
- RAAB, F. 1995. TPC Benchmark D--Standard Specification, Revision 1.0. Transaction Processing Performance Council.Google Scholar
- SELINGER, P. G., ASTRAHAN, M., CHAMBERLIN, D., LORIE, R., AND PRICE, T. 1979. Access path selection in a relational database management system. In Proceedings of the ACM-SIGMOD International Conference on Management of Data (Boston), 22-34. Google ScholarDigital Library
- SESHADRI, P., HELLERSTEIN, J. M., PIRAHESH, H., LEUNG, T. C., RAMAKRISHNAN, R., SRIVASTAVA, D., STUCKEY, P. J., AND SUDARSHAN, S. 1996a. Cost-based optimization for Magic: Algebra and implementation. In Proceedings of the ACM-SIGMOD International Conference on Management of Data (Montreal), 435-446. Google ScholarDigital Library
- SESHADRI, P., PIRAHESH, H., AND LEUNG, T.C. 1996b. Complex query decorrelation. In Proceedings of the Twelfth IEEE International Conference on Data Engineering (New Orleans). Google ScholarDigital Library
- SHAPIRO, L.D. 1986. Join processing in database systems with large main memories. ACM Trans. Database Syst. 11, 3 (Sept.), 239-264. Google ScholarDigital Library
- SMITH, D.E. 1989. Controlling backward inference. Artif. Intell. 39, 145-208. Google ScholarDigital Library
- SMITH, W.E. 1956. Various optimizers for single-stage production. Naval Res. Logist. Q. 3, 59-66.Google ScholarCross Ref
- STEINBRUNN, M., PEITHNER, K., MOERKOTTE, G., AND KEMPER, A. 1995. Bypassing joins in disjunctive queries. In Proceedings of the 21st International Conference on Very Large Data Bases (Zurich). Google ScholarDigital Library
- STONEBRAKER, M. 1991. Managing persistent objects in a multi-level store. In Proceedings of the ACM-SIGMOD International Conference on Management of Data (Denver), 2-11. Google ScholarDigital Library
- STONEBRAKER, M. AND KEMNITZ, G. 1991. The POSTGRES next-generation database management system. Commun. ACM 34, 10, 78-92. Google ScholarDigital Library
- STONEBRAKER, M., FREW, J., GARDELS, K., AND MEREDITH, J. 1993. The Sequoia 2000 storage benchmark. In Proceedings of the ACM-SIGMOD International Conference on Management of Data (Washington, DC), 2-11. Google ScholarDigital Library
- SWAMI, A. AND GUPTA, A. 1988. Optimization of large join queries. In Proceedings of the ACM-SIGMOD International Conference on Management of Data (Chicago), 8-17. Google ScholarDigital Library
- SWAMI, A. AND IYER, B.R. 1992. A polynomial time algorithm for optimizing join queries. Res. Rep. RJ 8812 (June), IBM Almaden Research Center.Google Scholar
- TURBYFILL, C., ORJI, C. AND BITTON, D. 1989. AS3AP--a comparative relational database benchmark. In Proceedings of the IEEE Compcon Spring '89.Google Scholar
- WONG, E. AND YOUSSEFI, K. 1976. Decomposition--a strategy for query processing. ACM Trans. Database Syst. 1, 3 (Sept.), 223-241. Google ScholarDigital Library
- YAJIMA, K., KITAGAWA, H., YAMAGUCHI, K., OHBO, N., AND FUJIWARA, Y. 1991. Optimization of queries including ADT functions. In Proceedings of the Second International Symposium on Database Systems for Advanced Applications (Tokyo), 366-373. Google ScholarDigital Library
Index Terms
- Optimization techniques for queries with expensive methods
Recommendations
Optimization of multi-version expensive predicates
SIGMOD '07: Proceedings of the 2007 ACM SIGMOD international conference on Management of dataModern query optimizers need to take into account the performance of expensive user-defined predicates. Existing research has shown how to incorporate such predicates in a traditional cost-based query optimizer. In this paper we deal with the ...
Containment and Optimization of Object-Preserving Conjunctive Queries
In the optimization of queries in an object-oriented database (OODB) system, a natural first step is to use the typing constraints imposed by the schema to transform a query into an equivalent one that logically accesses a minimal set of objects. We ...
Equivalence and minimization of conjunctive queries under combined semantics
ICDT '12: Proceedings of the 15th International Conference on Database TheoryThe problems of query containment, equivalence, and minimization are fundamental problems in the context of query processing and optimization. In their classic work [2] published in 1977, Chandra and Merlin solved the three problems for the language of ...
Comments