DOI: 10.1145/1007568.1007613
Article

Adapting to source properties in processing data integration queries

Published: 13 June 2004

ABSTRACT

An effective query optimizer finds a query plan that exploits the characteristics of the source data. In data integration, little is known in advance about the sources' properties, which necessitates adaptive query processing techniques that adjust query processing on the fly. Prior work in adaptive query processing has focused on compensating for delays and on adjusting for mis-estimated cardinality or selectivity values. In this paper, we present a generalized architecture for adaptive query processing and introduce a new technique, called adaptive data partitioning (ADP), which is based on the idea of dividing the source data into regions, each processed by a different, complementary plan. We show how this model can be applied in novel ways not only to correct for underestimated selectivity and cardinality values, but also to discover and exploit order in the source data, and to detect and exploit source data that can be effectively pre-aggregated. We experimentally compare a number of alternative strategies and show that our approach is effective.
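The idea of dividing source data into regions, each handled by a complementary plan, can be illustrated with a small sketch for the "discover and exploit order" case. This is not the authors' implementation: it assumes in-memory lists of (key, payload) rows, and every name in it (`sorted_prefix_len`, `merge_join`, `hash_join`, `adp_join`) is hypothetical. Each input is split at the first tuple that breaks key order; the ordered region is joined with a merge join, the remainder with an order-insensitive hash join, and a final cross-region step (called "stitch-up" in this sketch) joins the region combinations the two plans did not see, so that the union of all pieces equals the full join.

```python
from collections import defaultdict
from typing import List, Tuple

# Hypothetical row format for this sketch: (join key, payload).
Row = Tuple[int, str]
Joined = Tuple[int, str, str]  # (key, left payload, right payload)


def sorted_prefix_len(rows: List[Row]) -> int:
    """Length of the longest prefix of rows whose keys are non-decreasing."""
    for i in range(1, len(rows)):
        if rows[i][0] < rows[i - 1][0]:
            return i  # first tuple that breaks the observed order
    return len(rows)


def merge_join(left: List[Row], right: List[Row]) -> List[Joined]:
    """Merge join; only valid when both inputs are sorted on the key."""
    out: List[Joined] = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the runs of equal keys on both sides and emit their cross product.
            i_end, j_end = i, j
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            while j_end < len(right) and right[j_end][0] == rk:
                j_end += 1
            out.extend((lk, a[1], b[1]) for a in left[i:i_end] for b in right[j:j_end])
            i, j = i_end, j_end
    return out


def hash_join(left: List[Row], right: List[Row]) -> List[Joined]:
    """Order-insensitive hash join: build on left, probe with right."""
    table = defaultdict(list)
    for key, payload in left:
        table[key].append(payload)
    return [(key, lp, rp) for key, rp in right for lp in table.get(key, [])]


def adp_join(r: List[Row], s: List[Row]) -> List[Joined]:
    """Join r and s by splitting each input into an ordered and an unordered region."""
    r_cut, s_cut = sorted_prefix_len(r), sorted_prefix_len(s)
    r1, r2 = r[:r_cut], r[r_cut:]
    s1, s2 = s[:s_cut], s[s_cut:]
    ordered = merge_join(r1, s1)    # plan for region 1: exploits the sort order
    unordered = hash_join(r2, s2)   # plan for region 2: makes no order assumption
    # Stitch-up: the cross-region pairs (r1 with s2, r2 with s1), so that the
    # four pieces together equal the full join of r and s.
    stitch = hash_join(r1, s2) + hash_join(r2, s1)
    return ordered + unordered + stitch


if __name__ == "__main__":
    r = [(1, "a"), (2, "b"), (2, "c"), (5, "d"), (3, "e"), (2, "f")]
    s = [(2, "x"), (3, "y"), (1, "z"), (5, "w"), (2, "v")]
    print(len(adp_join(r, s)))  # 9
```

The cross-region step is what makes switching plans safe: each region can use whichever plan suits its data, and correctness is recovered afterward. A streaming engine would have to choose region boundaries at runtime as tuples arrive, rather than scanning the inputs up front as this simplified version does.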

      Published in

        ACM Conferences
        SIGMOD '04: Proceedings of the 2004 ACM SIGMOD international conference on Management of data
        June 2004
        988 pages
        ISBN:1581138598
        DOI:10.1145/1007568

        Copyright © 2004 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Acceptance Rates

        Overall Acceptance Rate: 785 of 4,003 submissions, 20%
