skip to main content
10.1145/1247480.1247560acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
Article

Scalable approximate query processing with the DBO engine

Published:11 June 2007Publication History

ABSTRACT

This paper describes query processing in the DBO database system. Like other database systems designed for ad-hoc, analytic processing, DBO is able to compute the exact answer to queries over a large relational database in a scalable fashion. Unlike any other system designed for analytic processing, DBO can constantly maintain a guess as to the final answer to an aggregate query throughout execution, along with statistically meaningful bounds for the guess's accuracy. As DBO gathers more and more information, the guess gets more and more accurate, until it is 100% accurate as the query is completed. This allows users to stop the execution at any time that they are happy with the query accuracy, and encourages exploratory data analysis.

References

  1. S. Acharya, P. Gibons, V. Poosala, S. Ramaswamy: Join Synopses for Approximate Query Processing. SIGMOD 1999:275--286. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. S. Chaudhuri, R. Motwani, V. R. Narasayya: On Random Sampling over Joins. SIGMOD 1999: 263--274. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. W. Cochran: Sampling Techniques. Wiley and Sons, 1977.Google ScholarGoogle Scholar
  4. J. P. Dittrich, B. Seeger, D. S. Taylor, Peter Widmayer: On producing join results early. PODS 2003: 134--142. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. J. P. Dittrich, B. Seeger, D. S. Taylor, P. Widmayer: Progressive Merge Join: A Generic and Non-blocking Sort-based Join Algorithm. VLDB 2002: 299--310. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. P. J. Haas, J. M. Hellerstein: Ripple Joins for Online Aggregation. SIGMOD 1999: 287--298. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. P.J. Haas: Large-Sample and Deterministic Confidence Intervals for Online Aggregation. SSDBM 1997: 51--63. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. P. J. Haas, J. F. Naughton, S. Seshadri, A. N. Swami: Selectivity and Cost Estimation for Joins Based on Random Sampling. J. Com. Syst. Sci. 52(3): 550--569 (1996). Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. G. H. Hardy, J. E. Littlewood, and G. Polya. Inequalities. Cambridge University Press, 1988.Google ScholarGoogle Scholar
  10. J. M. Hellerstein, R. Avnur, A. Chou, C. Hidber, C. Olston, V. Raman, T. Roth, P. J. Haas: Interactive Data Analysis: The Control Project. IEEE Computer 32(8): 51--59 (1999). Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. J. M. Hellerstein, P. J. Haas, H. J. Wang: Online Aggregation. SIGMOD 1997: 171--182. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. G. Özsoyoglu, K. Du, S. G. Swamy, W. C. Hou: Processing Real-Time, Non-Aggregate Queries with Time-Constraints in CASE-DB. ICDE 1992: 410--417. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. C. Jermaine, A. Dobra, S. Arumugam, S. Joshi, A. Pol: A Disk-Based Join with Probabilistic Guarantees. SIGMOD 2005: 456--467. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. C. Jermaine, A. Dobra, A. Pol, S. Joshi: Online Estimation for Subset-Based SQL Queries. VLDB 2005: 745--756. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. G. Luo, C. Ellmann, P. J. Haas, J. F. Naughton: A scalable hash ripple join algorithm. SIGMOD 2002: 252--262. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. F. Olken: Random Sampling from Databases. PhD Thesis, U. of California, Berkeley, 1993Google ScholarGoogle Scholar
  17. F. Olken, D. Rotem, P. Xu: Random Sampling from Hash Files. SIGMOD 1990: 375--386.Google ScholarGoogle Scholar
  18. F. Olken, D. Rotem: Random Sampling from B+-Trees. VLDB 1989: 269--277. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. L.D. Shapiro: Join Processing in Database Systems with Large Main Memories. ACM TODS 11(3): 239--264 (1986). Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. J. Shao: Mathematical Statistics. Springer-Verlag, 1999.Google ScholarGoogle Scholar

Index Terms

  1. Scalable approximate query processing with the DBO engine

            Recommendations

            Comments

            Login options

            Check if you have access through your login credentials or your institution to get full access on this article.

            Sign in
            • Published in

              cover image ACM Conferences
              SIGMOD '07: Proceedings of the 2007 ACM SIGMOD international conference on Management of data
              June 2007
              1210 pages
              ISBN:9781595936868
              DOI:10.1145/1247480
              • General Chairs:
              • Lizhu Zhou,
              • Tok Wang Ling,
              • Program Chair:
              • Beng Chin Ooi

              Copyright © 2007 ACM

              Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

              Publisher

              Association for Computing Machinery

              New York, NY, United States

              Publication History

              • Published: 11 June 2007

              Permissions

              Request permissions about this article.

              Request Permissions

              Check for updates

              Qualifiers

              • Article

              Acceptance Rates

              Overall Acceptance Rate785of4,003submissions,20%

            PDF Format

            View or Download as a PDF file.

            PDF

            eReader

            View online with eReader.

            eReader