skip to main content
article
Free Access

Optimization techniques for queries with expensive methods

Published:01 June 1998Publication History
Skip Abstract Section

Abstract

Object-relational database management systems allow knowledgeable users to define new data types as well as new methods (operators) for the types. This flexibility produces an attendant complexity, which must be handled in new ways for an object-relational database management system to be efficient. In this article we study techniques for optimizing queries that contain time-consuming methods. The focus of traditional query optimizers has been on the choice of join methods and orders; selections have been handled by “pushdown” rules. These rules apply selections in an arbitrary order before as many joins as possible, using th e assumption that selection takes no time. However, users of object-relational systems can embed complex methods in selections. Thus selections may take significant amounts of time, and the query optimization model must be enhanced. In this article we carefully define a query cost framework that incorporates both selectivity and cost estimates for selections. We develop an algorithm called Predicate Migration, and prove that it produces optimal plans for queries with expensive methods. We then describe our implementation of Predicate Migration in the commercial object-relational database management system Illustra, and discuss practical issues that affect our earlier assumptions. We compare Predicate Migration to a variety of simplier optimization techniques, and demonstrate that Predicate Migration is the best general solution to date. The alternative techniques we present may be useful for constrained workloads.

References

  1. BITTON, D., DEWITT, D.J., AND TURBYFILL, C. 1983. Benchmarking database systems, a systematic approach. In Proceedings of the Ninth International Conference on Very Large Data Bases (Florence). Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. CAREY, M. g., DEWITT, D. g., KANT, C., AND NAUGHTON, g.F. 1994. A status report on the 007 OODBMS benchmarking effort. In Proceedings of the Conference on Object-Oriented Programming Systems, Languages, and Applications (Portland), 414-426. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. CATTELL, R. 1994. The Object Database Standard: ODMG-93 (Release 1.1). Morgan Kaufmann, San Mateo, CA. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. CHAUDHURI, S. AND SHIM, K. 1993. Query optimization in the presence of foreign functions. In Proceedings of the Nineteenth International Conference on Very Large Data Bases (Dublin), 526-541. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. CHEN, H., Yu, X., YAMAGUCHI, K., KITAGAWA, H., OHBO, N., AND FUJIWARA, Y. 1992. Decomposition--an approach for optimizing queries including ADT functions. Inf. Process. Lett. 43, 6, 327-333. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. CHIMENTI, D., GAMBOA, R., AND KRISHNAMURTHY, R. 1989. Towards an open architecture for LDL. In Proceedings of the Fifteenth International Conference on Very Large Data Bases (Amsterdam), 195-203. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. DAYAL, U. 1987. Of nests and trees: A unified approach to processing queries that contain nested subqueries, aggregates, and quantifiers. In Proceedings of the Thirteenth International Conference on Very Large Data Bases (Brighton), 197-208. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Du, W., KRISHNAMURTHY, R., AND SHAN, M.-C. 1992. Query optimization in heterogeneous DBMS. In Proceedings of the Eighteenth International Conference on Very Large Data Bases (Vancouver), 277-291. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. FALOUTSOS, C. AND KAMEL, I. 1994. Beyond uniformity and independence: Analysis of R-trees using the concept of fractal dimension. In Proceedings of the Thirteenth ACM SIGACT- SIGMOD-SIGART Symposium on Principles of Database Systems (Minneapolis), 4-13. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. FREW, J. 1995. Personal correspondence.Google ScholarGoogle Scholar
  11. GRAY, J. (Ed.) 1991. The Benchmark Handbook: For Database and Transaction Processing Systems. Morgan-Kaufmann, San Mateo, CA. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. GRAY, J. 1995. Personal correspondence.Google ScholarGoogle Scholar
  13. HAAS, P. J., NAUGHTON, J. F., SESHADRI, S., AND STOKES, L. 1995. Sampling-based estimation of the number of distinct values of an attribute. In Proceedings of the 21st International Conference on Very Large Data Bases (Zurich). Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. HELLERSTEIN, g.M. 1992. Predicate migration: Optimizing queries with expensive predicates. Tech. Rep. Sequoia 2000 92/13 (Dec.), University of California, Berkeley. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. HELLERSTEIN, J.M. 1994. Practical predicate placement. In Proceedings of the ACM-SIG- MOD International Conference on Management of Data (Minneapolis), 325-335. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. HELLERSTEIN, J.M. AND NAUGHTON, J.F. 1996. Query execution techniques for caching expensive methods. In Proceedings of the ACM-SIGMOD International Conference on Management of Data (Montreal), 423-424. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. HELLERSTEIN, J. M. AND STONEBRAKER, M. 1993. Predicate migration: Optimizing queries with expensive predicates. In Proceedings of the ACM-SIGMOD International Conference on Management of Data (Washington, DC), 267-276. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. HONG, W. AND STONEBRAKER, M. 1993. Optimization of parallel query execution plans in XPRS. Distrib. Parallel Databases Int. J. 1, 1 (Jan.), 9-32. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Hou, W.-C., OZSOYOGLU, G., AND TANEJA, B.K. 1988. Statistical estimators for relational algebra expressions. In Proceedings of the Seventh ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (Austin), 276-287. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. IBARAKI, T. AND KAMEDA, T. 1984. Optimal nesting for computing N-relational joins. ACM Trans. Database Syst. 9, 3 (Oct.), 482-502. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. ILLUSTRA 1994. Illustra User's Guide, Illustra Server Release 2.1. Illustra Information Technologies, Inc.Google ScholarGoogle Scholar
  22. IOANNIDIS, Y. AND CHRISTODOULAKIS, S. 1991. On the propagation of errors in the size ofjoin results. In Proceedings of the ACM-SIGMOD International Conference on Management of Data (Denver), 268-277. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. IOANNIDIS, Y. AND POOSALA, V. 1995. Balancing histogram optimality and practicality for query result size estimation. In Proceedings of the ACM-SIGMOD International Conference on Management of Data (San Jose, CA), 233-244. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. IOANNIDIS, Y. E. AND KANG, Y.C. 1990. Randomized algorithms for optimizing large join queries. In Proceedings of the ACM-SIGMOD International Conference on Management of Data (Atlantic City, NJ), 312-321. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. ISO_ANSI 1993. Database language SQL ISO/IEC 9075:1992.Google ScholarGoogle Scholar
  26. KEMPER, A., MOERKOTTE, G., PEITHNER, K., AND STEINBRUNN, M. 1994. Optimizing disjunctive queries with expensive predicates. In Proceedings of the ACM-SIGMOD International Conference on Management of Data (Minneapolis), 336-347. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. KIM, W. 1993. Object-oriented database systems: Promises, reality, and future. In Proceedings of the Nineteenth International Conference on Very Large Data Bases (Dublin), 676-687. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. KRISHNAMURTHY, R. AND ZANIOLO, C. 1988. Optimization in a logic based language for knowledge and data intensive applications. In Proceedings of the International Conference on Extending Data Base Technology (Venice), Advances in Database Technology--EDBT '88, J. W. Schmidt, S. Ceri, and M. Missikoff, Eds., Lecture Notes in Computer Science, Volume 303, Springer-Verlag, 128-137. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. KRISHNAMURTHY, Z., BORAL, H., AND ZANIOLO, C. 1986. Optimization of nonrecursive queries. In Proceedings of the Twelfth International Conference on Very Large Data Bases (Kyoto), 128-137. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. LEVY, A.Y., MUMICK, I.S., AND SAGIV, Y. 1994. Query optimization by predicate movearound. In Proceedings of the 20th International Conference on Very Large Data Bases (Santiago), 96-107. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. LIPTON, R. J., NAUGHTON, J. F., SCHNEIDER, D. A., AND SESHADRI, S. 1993. Efficient sampling strategies for relational database operations. Theor. Comput. Sci. 116, 195-226. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. LOHMAN, G.M. 1995. Personal correspondence.Google ScholarGoogle Scholar
  33. LOHMAN, G. M. AND HAAS, L.M. 1993. Personal correspondence.Google ScholarGoogle Scholar
  34. LOHMAN, G. M., DANIELS, D., HAAS, L. M., KISTLER, R., AND SELINGER, P.G. 1984. Optimization of nested queries in a distributed relational database. In Proceedings of the Tenth International Conference on Very Large Data Bases (Singapore), 403-415. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. LYNCH, C. AND STONEBRAKER, M. 1988. Extended user-defined indexing with application to textual databases. In Proceedings of the Fourteenth International Conference on Very Large Data Bases (Los Angeles), 306-317. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. MACKERT, L. F. AND LOHMAN, G.M. 1986a. R* optimizer validation and performance evaluation for distributed queries. In Proceedings of the Twelfth International Conference on Very Large Data Bases (Kyoto), 149-159. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. MACKERT, L. F. AND LOHMAN, G.M. 1986b. R* optimizer validation and performance evaluation for local queries. In Proceedings of the ACM-SIGMOD International Conference on Management of Data (Washington, DC), 84-95. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. MAIER, D. AND STEIN, J. 1986. Indexing in an object-oriented DBMS. In Proceedings of the Workshop on Object-Oriented Database Systems (Asilomar), K. R. Dittrich and U. Dayal, Eds., 171-182. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. MONMA, C. L. AND SIDNEY, g. 1979. Sequencing with series-parallel precedence constraints. Math. Oper. Res. 4, 215-224.Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. NAUGHTON, J. 1993. Presentation at Fifth International High Performance Transaction Workshop.Google ScholarGoogle Scholar
  41. PALERMO, F.P. 1974. A data base search problem. In Information Systems COINS IV, J. T. Tou, Ed., Plenum Press, New York.Google ScholarGoogle Scholar
  42. PIRAHESH, H. 1994. Object-oriented features of DB2 client/server. In Proceedings of the ACM-SIGMOD International Conference on Management of Data (Minneapolis), 483. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. PIRAHESH, H., HELLERSTEIN, J. M., AND HASAN, W. 1992. Extensible/rule-based query rewrite optimization in Starburst. In Proceedings of the ACM-SIGMOD International Conference on Management of Data (San Diego), 39-48. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. POOSALA, V. AND IOANNIDIS, Y. 1996. Estimation of query-result distribution and its application in parallel-join load balancing. In Proceedings of the 22nd International Conference on Very Large Data Bases (Bombay), 448-459. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. POOSALA, V., IOANNIDIS, Y. E., HAAS, P. J., AND SHEKITA, E.J. 1996. Selectivity estimation of range predicates using histograms. In Proceedings of the ACM-SIGMOD International Conference on Management of Data (Montreal), 294-305. Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. RAAB, F. 1995. TPC Benchmark D--Standard Specification, Revision 1.0. Transaction Processing Performance Council.Google ScholarGoogle Scholar
  47. SELINGER, P. G., ASTRAHAN, M., CHAMBERLIN, D., LORIE, R., AND PRICE, T. 1979. Access path selection in a relational database management system. In Proceedings of the ACM-SIGMOD International Conference on Management of Data (Boston), 22-34. Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. SESHADRI, P., HELLERSTEIN, J. M., PIRAHESH, H., LEUNG, T. C., RAMAKRISHNAN, R., SRIVASTAVA, D., STUCKEY, P. J., AND SUDARSHAN, S. 1996a. Cost-based optimization for Magic: Algebra and implementation. In Proceedings of the ACM-SIGMOD International Conference on Management of Data (Montreal), 435-446. Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. SESHADRI, P., PIRAHESH, H., AND LEUNG, T.C. 1996b. Complex query decorrelation. In Proceedings of the Twelfth IEEE International Conference on Data Engineering (New Orleans). Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. SHAPIRO, L.D. 1986. Join processing in database systems with large main memories. ACM Trans. Database Syst. 11, 3 (Sept.), 239-264. Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. SMITH, D.E. 1989. Controlling backward inference. Artif. Intell. 39, 145-208. Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. SMITH, W.E. 1956. Various optimizers for single-stage production. Naval Res. Logist. Q. 3, 59-66.Google ScholarGoogle ScholarCross RefCross Ref
  53. STEINBRUNN, M., PEITHNER, K., MOERKOTTE, G., AND KEMPER, A. 1995. Bypassing joins in disjunctive queries. In Proceedings of the 21st International Conference on Very Large Data Bases (Zurich). Google ScholarGoogle ScholarDigital LibraryDigital Library
  54. STONEBRAKER, M. 1991. Managing persistent objects in a multi-level store. In Proceedings of the ACM-SIGMOD International Conference on Management of Data (Denver), 2-11. Google ScholarGoogle ScholarDigital LibraryDigital Library
  55. STONEBRAKER, M. AND KEMNITZ, G. 1991. The POSTGRES next-generation database management system. Commun. ACM 34, 10, 78-92. Google ScholarGoogle ScholarDigital LibraryDigital Library
  56. STONEBRAKER, M., FREW, J., GARDELS, K., AND MEREDITH, J. 1993. The Sequoia 2000 storage benchmark. In Proceedings of the ACM-SIGMOD International Conference on Management of Data (Washington, DC), 2-11. Google ScholarGoogle ScholarDigital LibraryDigital Library
  57. SWAMI, A. AND GUPTA, A. 1988. Optimization of large join queries. In Proceedings of the ACM-SIGMOD International Conference on Management of Data (Chicago), 8-17. Google ScholarGoogle ScholarDigital LibraryDigital Library
  58. SWAMI, A. AND IYER, B.R. 1992. A polynomial time algorithm for optimizing join queries. Res. Rep. RJ 8812 (June), IBM Almaden Research Center.Google ScholarGoogle Scholar
  59. TURBYFILL, C., ORJI, C. AND BITTON, D. 1989. AS3AP--a comparative relational database benchmark. In Proceedings of the IEEE Compcon Spring '89.Google ScholarGoogle Scholar
  60. WONG, E. AND YOUSSEFI, K. 1976. Decomposition--a strategy for query processing. ACM Trans. Database Syst. 1, 3 (Sept.), 223-241. Google ScholarGoogle ScholarDigital LibraryDigital Library
  61. YAJIMA, K., KITAGAWA, H., YAMAGUCHI, K., OHBO, N., AND FUJIWARA, Y. 1991. Optimization of queries including ADT functions. In Proceedings of the Second International Symposium on Database Systems for Advanced Applications (Tokyo), 366-373. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Optimization techniques for queries with expensive methods

      Recommendations

      Reviews

      Kristian Torp

      Methods associated with abstract data types that are added to object-relational DBMSs (ORDBMSs) can be very time-consuming to execute. This complicates query optimization in an ORDBMS compared to a relational DBMS (RDBMS). For example, the predicate “pushdown” rule, used in most RDBMS query optimizers since System R, may no longer be generally applicable in ORDBMSs. The paper presents an algorithm, called predicate migration, for optimizing queries with expensive methods. The focus is on the selection and join operators. The author proves that the algorithm produces optimal query plans. The validity of the theoretical framework is demonstrated to hold in practice via an implementation that extends the Illustra ORDBMS query optimizer. The predicate migration algorithm and three simpler algorithms are compared. The algorithms can be incrementally added to Illustra's query optimizer (which is similar to System R). The experiments show that, overall, predicate migration is the best of the four algorithms. Furthermore, on queries with expensive methods, the speedup can be orders of magnitude compared to the plain Illustra query optimizer. The paper is well written. Furthermore, the combination of a theoretical framework and experiments on a commercially available ORDBMS makes the paper particularly interesting.

      Access critical reviews of Computing literature here

      Become a reviewer for Computing Reviews.

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in

      Full Access

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader