research-article

A scalable, predictable join operator for highly concurrent data warehouses

Authors:
George Candea

EPFL & Aster Data

EPFL & Aster Data
View Profile

,
Neoklis Polyzotis

U. C. Santa Cruz

U. C. Santa Cruz
View Profile

,
Radek Vingralek

Aster Data

Aster Data
View Profile

Proceedings of the VLDB Endowment Volume 2 Issue 1pp 277–288https://doi.org/10.14778/1687627.1687659

Published:01 August 2009Publication History

Proceedings of the VLDB Endowment

Abstract

Conventional data warehouses employ the query-at-a-time model, which maps each query to a distinct physical plan. When several queries execute concurrently, this model introduces contention, because the physical plans---unaware of each other---compete for access to the underlying I/O and computation resources. As a result, while modern systems can efficiently optimize and evaluate a single complex data analysis query, their performance suffers significantly when multiple complex queries run at the same time.

We describe an augmentation of traditional query engines that improves join throughput in large-scale concurrent data warehouses. In contrast to the conventional query-at-a-time model, our approach employs a single physical plan that can share I/O, computation, and tuple storage across all in-flight join queries. We use an "always-on" pipeline of non-blocking operators, coupled with a controller that continuously examines the current query mix and performs run-time optimizations. Our design allows the query engine to scale gracefully to large data sets, provide predictable execution times, and reduce contention. In our empirical evaluation, we found that our prototype outperforms conventional commercial systems by an order of magnitude for tens to hundreds of concurrent queries.

References

D. Abadi, S. Madden, and M. Ferreira. Integrating compression and execution in column-oriented database systems. In ACM SIGMOD Intl. Conf. on Management of Data, 2006. Google ScholarDigital Library
D. J. Abadi, D. S. Myers, D. J. Dewitt, and S. R. Madden. Materialization strategies in a column-oriented DBMS. In Intl. Conf. on Data Engineering, 2007.Google ScholarCross Ref
J. Agrawal, Y. Diao, D. Gyllstrom, and N. Immerman. Efficient pattern matching over event streams. In Intl. Conf. on Data Engineering, 2008.Google ScholarDigital Library
R. Avnur and J. Hellerstein. Eddies: Continuously adaptive query processing. In ACM SIGMOD Intl. Conf. on Management of Data, 2000. Google ScholarDigital Library
S. Babu, R. Motwani, K. Munagala, I. Nishizawa, and J. Widom. Adaptive ordering of pipelined stream filters. In ACM SIGMOD Intl. Conf. on Management of Data, 2004. Google ScholarDigital Library
S. Chandrasekaran, O. Cooper, A. Deshpande, M. J. Franklin, J. M. Hellerstein, W. Hong, S. Krishnamurthy, S. R. Madden, F. Reiss, and M. A. Shah. TelegraphCQ: continuous dataflow processing. In ACM SIGMOD Intl. Conf. on Management of Data, 2003. Google ScholarDigital Library
J. Chen, D. J. DeWitt, F. Tian, and Y. Wang. NiagaraCQ: a scalable continuous query system for internet databases. SIGMOD Record, 29(2), 2000. Google ScholarDigital Library
J. Cieslewicz and A. Ross, Kenneth. Adaptive aggregation on chip multiprocessors. In Intl. Conf. on Very Large Data Bases, 2007. Google ScholarDigital Library
P. M. Fernandez. Red Brick warehouse: A read-mostly RDBMS for open SMP platforms. In ACM SIGMOD Intl. Conf. on Management of Data, 1994. Google ScholarDigital Library
V. Harinarayan, A. Rajaraman, and J. D. Ullman. Implementing data cubes efficiently. In ACM SIGMOD Intl. Conf. on Management of Data, 1996. Google ScholarDigital Library
S. Harizopoulos and A. Ailamaki. StagedDB: Designing database servers for modern hardware. IEEE Data Eng. Bulletin, 28(2), 2005.Google Scholar
S. Harizopoulos, V. Shkapenyuk, and A. Ailamaki. Qpipe: a simultaneously pipelined relational query engine. In In ACM SIGMOD Intl. Conf. on Management of data, 2005. Google ScholarDigital Library
Z. Liu, S. Parthasarathy, A. Ranganathan, and H. Yang. A generic flow algorithm for shared filter ordering problems. In Symp. on Principles of Database Systems, New York, NY, USA, 2008. Google ScholarDigital Library
Z. Liu, S. Parthasarathy, A. Ranganathan, and H. Yang. Near-optimal algorithms for shared filter evaluation in data stream systems. In ACM SIGMOD Intl. Conf. on Management of Data, 2008. Google ScholarDigital Library
S. Madden, M. Shah, M. Hellerstein, Joseph, and V. Raman. Continuously adaptive continuous queries over streams. In ACM SIGMOD Intl. Conf. on Management of Data, 2002. Google ScholarDigital Library
A. Majumder, R. Rastogi, and S. Vanama. Scalable regular expression matching on data streams. In Intl. Conf. on Data Engineering, 2008. Google ScholarDigital Library
E. B. O. Patrick O'Neil and X. Chen. The Star Schema Benchmark. http://www.cs.umb.edu/~poneil/StarSchemaB.PDF, 2007.Google Scholar
L. Qiao, V. Raman, F. Reiss, P. Haas, and G. Lohman. Main-memory scan sharing for multi-core CPUs. In Intl. Conf. on Very Large Data Bases, 2008. Google ScholarDigital Library
V. Raman, G. Swart, L. Qiao, F. Reiss, V. Dialani, D. Kossmann, I. Narang, and R. Sidle. Constant-time query processing. In Intl. Conf. on Data Engineering, 2008. Google ScholarDigital Library
N. Roussopoulos, C.-M. Chen, S. Kelley, A. Delis, and Y. Papakonstantinou. The ADMS project: Views R Us. IEEE Data Eng. Bulletin, 18(2), 1995.Google Scholar
T. K. Sellis. Multiple-query optimization. ACM Trans. Database Systems, 13(1):23--52, 1988. Google ScholarDigital Library
M. Stonebraker, S. Madden, D. J. Abadi, S. Harizopoulos, N. Hachem, and P. Helland. The end of an architectural era (it's time for a complete rewrite). In Intl. Conf. on Very Large Data Bases, 2007. Google ScholarDigital Library
TPC benchmark DS (decision support), draft specification, revision 32. http://www.tpc.org/tpcds/spec/tpcds32.pdf.Google Scholar
M. Zukowski, S. Héman, N. Nes, and P. Boncz. Cooperative scans: dynamic bandwidth sharing in a DBMS. In Intl. Conf. on Very Large Data Bases, 2007. Google ScholarDigital Library

Index Terms

A scalable, predictable join operator for highly concurrent data warehouses
1. Information systems
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Database theory
      1. Database query processing and optimization (theory)

Recommendations

Efficient of bitmap join indexes for optimising star join queries in relational data warehouses

Data warehouses are dedicated to analysis and decision-making applications. They are often schematised as star relational models or variants for on-line analysis. Typically, the analysis process is conducted via online analytical processing (OLAP) type ...
Read More
Extended Adaptive Join Operator with Bind-Bloom Join for Federated SPARQL Queries

The goal of query optimization in query federation over linked data is to minimize the response time and the completion time. Communication time has the highest impact on them both. Static query optimization can end up with inefficient execution plans ...
Read More
Join and multi-join processing in data integration systems

Query processing in a data integration system is complicated by a lack of quality statistics about the data, unpredictable and bursty data transfer rates, and slow or unavailable data sources. Conventional query processing algorithms, which are based on ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in
Proceedings of the VLDB Endowment Volume 2, Issue 1
August 2009
1293 pages
ISSN:2150-8097
Editors:
Serge Abiteboul,
Tova Milo,
Jignesh Patel,
Philippe Rigaux
Issue’s Table of Contents
Sponsors
In-Cooperation
Publisher
VLDB Endowment
Publication History
- Published: 1 August 2009
Published in pvldb Volume 2, Issue 1
Qualifiers
- research-article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 50
  Total Citations
  View Citations
- 415
  Total Downloads
- Downloads (Last 12 months)12
- Downloads (Last 6 weeks)0
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

A scalable, predictable join operator for highly concurrent data warehouses

Proceedings of the VLDB Endowment

Abstract

References

Cited By

Index Terms

Recommendations

Efficient of bitmap join indexes for optimising star join queries in relational data warehouses

Extended Adaptive Join Operator with Bind-Bloom Join for Federated SPARQL Queries

Join and multi-join processing in data integration systems

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Qualifiers

Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

A scalable, predictable join operator for highly concurrent data warehouses

Proceedings of the VLDB Endowment

Abstract

References

Cited By

Index Terms

Recommendations

Efficient of bitmap join indexes for optimising star join queries in relational data warehouses

Extended Adaptive Join Operator with Bind-Bloom Join for Federated SPARQL Queries

Join and multi-join processing in data integration systems

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Qualifiers

Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media