article

Sharing data and work across concurrent analytical queries

Authors:
Iraklis Psaroudakis

École Polytechnique Fédérale de Lausanne

École Polytechnique Fédérale de Lausanne
View Profile

,
Manos Athanassoulis

École Polytechnique Fédérale de Lausanne

École Polytechnique Fédérale de Lausanne
View Profile

,
Anastasia Ailamaki

École Polytechnique Fédérale de Lausanne

École Polytechnique Fédérale de Lausanne
View Profile

Proceedings of the VLDB Endowment Volume 6 Issue 9pp 637–648https://doi.org/10.14778/2536360.2536364

Published:01 July 2013Publication History

Proceedings of the VLDB Endowment

Abstract

Today's data deluge enables organizations to collect massive data, and analyze it with an ever-increasing number of concurrent queries. Traditional data warehouses (DW) face a challenging problem in executing this task, due to their query-centric model: each query is optimized and executed independently. This model results in high contention for resources. Thus, modern DW depart from the query-centric model to execution models involving sharing of common data and work. Our goal is to show when and how a DW should employ sharing. We evaluate experimentally two sharing methodologies, based on their original prototype systems, that exploit work sharing opportunities among concurrent queries at run-time: Simultaneous Pipelining (SP), which shares intermediate results of common sub-plans, and Global Query Plans (GQP), which build and evaluate a single query plan with shared operators.

First, after a short review of sharing methodologies, we show that SP and GQP are orthogonal techniques. SP can be applied to shared operators of a GQP, reducing response times by 20%-48% in workloads with numerous common sub-plans. Second, we corroborate previous results on the negative impact of SP on performance for cases of low concurrency. We attribute this behavior to a bottleneck caused by the push-based communication model of SP. We show that pull-based communication for SP eliminates the overhead of sharing altogether for low concurrency, and scales better on multi-core machines than push-based SP, further reducing response times by 82%-86% for high concurrency. Third, we perform an experimental analysis of SP, GQP and their combination, and show when each one is beneficial. We identify a trade-off between low and high concurrency. In the former case, traditional query-centric operators with SP perform better, while in the latter case, GQP with shared operators enhanced by SP give the best results.

References

TPC-H Benchmark: Standard Specification, Revision 2.14.3.Google Scholar
S. Arumugam et al. The DataPath system: a data-centric analytic processing engine for large data warehouses. In Proc. of the 2010 ACM SIGMOD Int'l Conf. on Management of Data, pages 519-530, 2010. Google Scholar
G. Candea et al. A scalable, predictable join operator for highly concurrent data warehouses. Proc. of the VLDB Endowment, 2(1):277-288, 2009. Google Scholar
G. Candea et al. Predictable performance and high query concurrency for data analytics. The Int'l Journal on Very Large Data Bases, 20(2):227-248, 2011. Google Scholar
H.-T. Chou et al. An evaluation of buffer management strategies for relational database systems. In Proc. of the 11th Int'l Conf. on Very Large Data Bases, pages 127-141, 1985. Google Scholar
J. Cieslewicz et al. Adaptive aggregation on chip multiprocessors. In Proc. of the 33rd Int'l Conf. on Very Large Data Bases, pages 339-350, 2007. Google Scholar
L. Colby et al. Red brick vista™: aggregate computation and management. In Proc. of the 14th Int'l Conf. on Data Engineering, pages 174-177, 1998. Google Scholar
C. Cook. Database Architecture: The Storage Engine, 2001. http://msdn.microsoft.com/library/aa902689(v=sql.80).aspx.Google Scholar
N. N. Dalvi et al. Pipelining in multi-query optimization. In Proc. of the 20th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Databases, pages 59-70, 2001. Google Scholar
J. Dean et al. MapReduce: Simplified data processing on large clusters. Communications ACM, 51(1):107-113, 2008. Google Scholar
G. Giannikis et al. SharedDB: killing one thousand queries with one stone. Proc. of the VLDB Endowment, 5(6):526-537, 2012. Google Scholar
S. Harizopoulos et al. A case for staged database systems. In Proc. of the 2003 Conf. on Innovative Data Systems Research, 2003.Google Scholar
S. Harizopoulos et al. QPipe: a simultaneously pipelined relational query engine. In Proc. of the 2005 ACM SIGMOD Int'l Conf. on Management of Data, pages 383-394, 2005. Google Scholar
R. Johnson et al. To share or not to share? In Proc. of the 33rd Int'l Conf. on Very Large Data Bases, pages 351-362, 2007. Google Scholar
R. Johnson et al. Shore-MT: a scalable storage manager for the multicore era. In Proc. of the 12th Int'l Conf. on Extending Database Technology: Advances in Database Technology, pages 24-35, 2009. Google Scholar
T. Johnson et al. 2Q: A Low Overhead High Performance Buffer Management Replacement Algorithm. In Proc. of the 20th Int'l Conf. on Very Large Data Bases, pages 439-450, 1994. Google Scholar
R. Kimball et al. The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling. John Wiley & Sons, Inc., 2nd edition, 2002. Google Scholar
C. Lang et al. Increasing Buffer-Locality for Multiple Relational Table Scans through Grouping and Throttling. In Proc. of the 23rd Int'l Conf. on Data Engineering, pages 1136-1145, 2007.Google Scholar
N. Megiddo et al. ARC: A Self-Tuning, Low Overhead Replacement Cache. In Proc. of the 2nd USENIX Conf. on File and Storage Technologies, pages 115-130, 2003. Google Scholar
M. Mehta et al. Batch Scheduling in Parallel Database Systems. In Proc. of the 9th Int'l Conf. on Data Engineering, pages 400-410, 1993. Google Scholar
P. O. Neil et al. Star Schema Benchmark. 2009.Google Scholar
E. J. O'Neil et al. The LRU-K page replacement algorithm for database disk buffering. In Proc. of the 1993 ACM SIGMOD Int'l Conf. on Management of Data, pages 297-306, 1993. Google Scholar
L. Qiao et al. Main-memory scan sharing for multicore cpus. Proc. of the VLDB Endowment, 1(1):610-621, 2008. Google Scholar
N. Roussopoulos. View indexing in relational databases. ACM Trans. Database Syst., 7(2):258-290, 1982. Google Scholar
P. Roy et al. Efficient and extensible algorithms for multi query optimization. In Proc. of the 2000 ACM SIGMOD Int'l Conf. on Management of Data, pages 249-260, 2000. Google Scholar
P. Russom. High-Performance Data Warehousing. TDWI, 2012. http://tdwi.org/research/2012/10/tdwi-best-practices-report-high-performance-data-warehousing.aspx.Google Scholar
T. K. Sellis. Multiple-query optimization. ACM Trans. Database Syst., 13(1):23-52, 1988. Google Scholar
J. Shim et al. Dynamic Caching of Query Results for Decision Support Systems. In Proc. of the 11th Int'l Conf. on Scientific and Statistical Database Management, pages 254-263, 1999. Google Scholar
P. Unterbrunner et al. Predictable performance for unpredictable workloads. Proc. of the VLDB Endowment, 2(1):706-717, 2009. Google Scholar
M. Zukowski et al. Cooperative scans: dynamic bandwidth sharing in a DBMS. In Proc. of the 33rd Int'l Conf. on Very Large Data Bases, pages 723-734, 2007. Google Scholar

Index Terms

Sharing data and work across concurrent analytical queries
1. Information systems
  1. Data management systems
    1. Database management system engines
      1. Database query processing
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Database theory
      1. Database query processing and optimization (theory)

Index terms have been assigned to the content through auto-classification.

Recommendations

Reactive and proactive sharing across concurrent analytical queries
SIGMOD '14: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data

Today an ever increasing amount of data is collected and analyzed by researchers, businesses, and scientists in data warehouses (DW). In addition to the data size, the number of users and applications querying data grows exponentially. The increasing ...
Read More
Lightweight annotations for controlling sharing in concurrent data structures
PLDI '09: Proceedings of the 30th ACM SIGPLAN Conference on Programming Language Design and Implementation

SharC is a recently developed system for checking data-sharing in multithreaded programs. Programmers specify sharing rules (read-only, protected by a lock, etc.) for individual objects, and the SharC compiler enforces these rules using static and ...
Read More
Lightweight annotations for controlling sharing in concurrent data structures
PLDI '09

SharC is a recently developed system for checking data-sharing in multithreaded programs. Programmers specify sharing rules (read-only, protected by a lock, etc.) for individual objects, and the SharC compiler enforces these rules using static and ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in

Proceedings of the VLDB Endowment Volume 6, Issue 9
July 2013
180 pages
ISSN:2150-8097
Issue’s Table of Contents
Sponsors
In-Cooperation
Publisher
VLDB Endowment
Publication History
- Published: 1 July 2013
Published in pvldb Volume 6, Issue 9
Qualifiers
- article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 22
  Total Citations
  View Citations
- 171
  Total Downloads
- Downloads (Last 12 months)6
- Downloads (Last 6 weeks)0
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Sharing data and work across concurrent analytical queries

Proceedings of the VLDB Endowment

Abstract

References

Cited By

Index Terms

Recommendations

Reactive and proactive sharing across concurrent analytical queries

Lightweight annotations for controlling sharing in concurrent data structures

Lightweight annotations for controlling sharing in concurrent data structures

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Qualifiers

Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Sharing data and work across concurrent analytical queries

Proceedings of the VLDB Endowment

Abstract

References

Cited By

Index Terms

Recommendations

Reactive and proactive sharing across concurrent analytical queries

Lightweight annotations for controlling sharing in concurrent data structures

Lightweight annotations for controlling sharing in concurrent data structures

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Qualifiers

Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media