skip to main content
article

Sharing data and work across concurrent analytical queries

Published:01 July 2013Publication History
Skip Abstract Section

Abstract

Today's data deluge enables organizations to collect massive data, and analyze it with an ever-increasing number of concurrent queries. Traditional data warehouses (DW) face a challenging problem in executing this task, due to their query-centric model: each query is optimized and executed independently. This model results in high contention for resources. Thus, modern DW depart from the query-centric model to execution models involving sharing of common data and work. Our goal is to show when and how a DW should employ sharing. We evaluate experimentally two sharing methodologies, based on their original prototype systems, that exploit work sharing opportunities among concurrent queries at run-time: Simultaneous Pipelining (SP), which shares intermediate results of common sub-plans, and Global Query Plans (GQP), which build and evaluate a single query plan with shared operators.

First, after a short review of sharing methodologies, we show that SP and GQP are orthogonal techniques. SP can be applied to shared operators of a GQP, reducing response times by 20%-48% in workloads with numerous common sub-plans. Second, we corroborate previous results on the negative impact of SP on performance for cases of low concurrency. We attribute this behavior to a bottleneck caused by the push-based communication model of SP. We show that pull-based communication for SP eliminates the overhead of sharing altogether for low concurrency, and scales better on multi-core machines than push-based SP, further reducing response times by 82%-86% for high concurrency. Third, we perform an experimental analysis of SP, GQP and their combination, and show when each one is beneficial. We identify a trade-off between low and high concurrency. In the former case, traditional query-centric operators with SP perform better, while in the latter case, GQP with shared operators enhanced by SP give the best results.

References

  1. TPC-H Benchmark: Standard Specification, Revision 2.14.3.Google ScholarGoogle Scholar
  2. S. Arumugam et al. The DataPath system: a data-centric analytic processing engine for large data warehouses. In Proc. of the 2010 ACM SIGMOD Int'l Conf. on Management of Data, pages 519-530, 2010. Google ScholarGoogle Scholar
  3. G. Candea et al. A scalable, predictable join operator for highly concurrent data warehouses. Proc. of the VLDB Endowment, 2(1):277-288, 2009. Google ScholarGoogle Scholar
  4. G. Candea et al. Predictable performance and high query concurrency for data analytics. The Int'l Journal on Very Large Data Bases, 20(2):227-248, 2011. Google ScholarGoogle Scholar
  5. H.-T. Chou et al. An evaluation of buffer management strategies for relational database systems. In Proc. of the 11th Int'l Conf. on Very Large Data Bases, pages 127-141, 1985. Google ScholarGoogle Scholar
  6. J. Cieslewicz et al. Adaptive aggregation on chip multiprocessors. In Proc. of the 33rd Int'l Conf. on Very Large Data Bases, pages 339-350, 2007. Google ScholarGoogle Scholar
  7. L. Colby et al. Red brick vista™: aggregate computation and management. In Proc. of the 14th Int'l Conf. on Data Engineering, pages 174-177, 1998. Google ScholarGoogle Scholar
  8. C. Cook. Database Architecture: The Storage Engine, 2001. http://msdn.microsoft.com/library/aa902689(v=sql.80).aspx.Google ScholarGoogle Scholar
  9. N. N. Dalvi et al. Pipelining in multi-query optimization. In Proc. of the 20th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Databases, pages 59-70, 2001. Google ScholarGoogle Scholar
  10. J. Dean et al. MapReduce: Simplified data processing on large clusters. Communications ACM, 51(1):107-113, 2008. Google ScholarGoogle Scholar
  11. G. Giannikis et al. SharedDB: killing one thousand queries with one stone. Proc. of the VLDB Endowment, 5(6):526-537, 2012. Google ScholarGoogle Scholar
  12. S. Harizopoulos et al. A case for staged database systems. In Proc. of the 2003 Conf. on Innovative Data Systems Research, 2003.Google ScholarGoogle Scholar
  13. S. Harizopoulos et al. QPipe: a simultaneously pipelined relational query engine. In Proc. of the 2005 ACM SIGMOD Int'l Conf. on Management of Data, pages 383-394, 2005. Google ScholarGoogle Scholar
  14. R. Johnson et al. To share or not to share? In Proc. of the 33rd Int'l Conf. on Very Large Data Bases, pages 351-362, 2007. Google ScholarGoogle Scholar
  15. R. Johnson et al. Shore-MT: a scalable storage manager for the multicore era. In Proc. of the 12th Int'l Conf. on Extending Database Technology: Advances in Database Technology, pages 24-35, 2009. Google ScholarGoogle Scholar
  16. T. Johnson et al. 2Q: A Low Overhead High Performance Buffer Management Replacement Algorithm. In Proc. of the 20th Int'l Conf. on Very Large Data Bases, pages 439-450, 1994. Google ScholarGoogle Scholar
  17. R. Kimball et al. The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling. John Wiley & Sons, Inc., 2nd edition, 2002. Google ScholarGoogle Scholar
  18. C. Lang et al. Increasing Buffer-Locality for Multiple Relational Table Scans through Grouping and Throttling. In Proc. of the 23rd Int'l Conf. on Data Engineering, pages 1136-1145, 2007.Google ScholarGoogle Scholar
  19. N. Megiddo et al. ARC: A Self-Tuning, Low Overhead Replacement Cache. In Proc. of the 2nd USENIX Conf. on File and Storage Technologies, pages 115-130, 2003. Google ScholarGoogle Scholar
  20. M. Mehta et al. Batch Scheduling in Parallel Database Systems. In Proc. of the 9th Int'l Conf. on Data Engineering, pages 400-410, 1993. Google ScholarGoogle Scholar
  21. P. O. Neil et al. Star Schema Benchmark. 2009.Google ScholarGoogle Scholar
  22. E. J. O'Neil et al. The LRU-K page replacement algorithm for database disk buffering. In Proc. of the 1993 ACM SIGMOD Int'l Conf. on Management of Data, pages 297-306, 1993. Google ScholarGoogle Scholar
  23. L. Qiao et al. Main-memory scan sharing for multicore cpus. Proc. of the VLDB Endowment, 1(1):610-621, 2008. Google ScholarGoogle Scholar
  24. N. Roussopoulos. View indexing in relational databases. ACM Trans. Database Syst., 7(2):258-290, 1982. Google ScholarGoogle Scholar
  25. P. Roy et al. Efficient and extensible algorithms for multi query optimization. In Proc. of the 2000 ACM SIGMOD Int'l Conf. on Management of Data, pages 249-260, 2000. Google ScholarGoogle Scholar
  26. P. Russom. High-Performance Data Warehousing. TDWI, 2012. http://tdwi.org/research/2012/10/tdwi-best-practices-report-high-performance-data-warehousing.aspx.Google ScholarGoogle Scholar
  27. T. K. Sellis. Multiple-query optimization. ACM Trans. Database Syst., 13(1):23-52, 1988. Google ScholarGoogle Scholar
  28. J. Shim et al. Dynamic Caching of Query Results for Decision Support Systems. In Proc. of the 11th Int'l Conf. on Scientific and Statistical Database Management, pages 254-263, 1999. Google ScholarGoogle Scholar
  29. P. Unterbrunner et al. Predictable performance for unpredictable workloads. Proc. of the VLDB Endowment, 2(1):706-717, 2009. Google ScholarGoogle Scholar
  30. M. Zukowski et al. Cooperative scans: dynamic bandwidth sharing in a DBMS. In Proc. of the 33rd Int'l Conf. on Very Large Data Bases, pages 723-734, 2007. Google ScholarGoogle Scholar

Index Terms

  1. Sharing data and work across concurrent analytical queries
      Index terms have been assigned to the content through auto-classification.

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in

      Full Access

      • Published in

        cover image Proceedings of the VLDB Endowment
        Proceedings of the VLDB Endowment  Volume 6, Issue 9
        July 2013
        180 pages

        Publisher

        VLDB Endowment

        Publication History

        • Published: 1 July 2013
        Published in pvldb Volume 6, Issue 9

        Qualifiers

        • article

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader