Abstract
Conventional data warehouses employ the query-at-a-time model, which maps each query to a distinct physical plan. When several queries execute concurrently, this model introduces contention, because the physical plans---unaware of each other---compete for access to the underlying I/O and computation resources. As a result, while modern systems can efficiently optimize and evaluate a single complex data analysis query, their performance suffers significantly when multiple complex queries run at the same time.
We describe an augmentation of traditional query engines that improves join throughput in large-scale concurrent data warehouses. In contrast to the conventional query-at-a-time model, our approach employs a single physical plan that can share I/O, computation, and tuple storage across all in-flight join queries. We use an "always-on" pipeline of non-blocking operators, coupled with a controller that continuously examines the current query mix and performs run-time optimizations. Our design allows the query engine to scale gracefully to large data sets, provide predictable execution times, and reduce contention. In our empirical evaluation, we found that our prototype outperforms conventional commercial systems by an order of magnitude for tens to hundreds of concurrent queries.
- D. Abadi, S. Madden, and M. Ferreira. Integrating compression and execution in column-oriented database systems. In ACM SIGMOD Intl. Conf. on Management of Data, 2006. Google ScholarDigital Library
- D. J. Abadi, D. S. Myers, D. J. Dewitt, and S. R. Madden. Materialization strategies in a column-oriented DBMS. In Intl. Conf. on Data Engineering, 2007.Google ScholarCross Ref
- J. Agrawal, Y. Diao, D. Gyllstrom, and N. Immerman. Efficient pattern matching over event streams. In Intl. Conf. on Data Engineering, 2008.Google ScholarDigital Library
- R. Avnur and J. Hellerstein. Eddies: Continuously adaptive query processing. In ACM SIGMOD Intl. Conf. on Management of Data, 2000. Google ScholarDigital Library
- S. Babu, R. Motwani, K. Munagala, I. Nishizawa, and J. Widom. Adaptive ordering of pipelined stream filters. In ACM SIGMOD Intl. Conf. on Management of Data, 2004. Google ScholarDigital Library
- S. Chandrasekaran, O. Cooper, A. Deshpande, M. J. Franklin, J. M. Hellerstein, W. Hong, S. Krishnamurthy, S. R. Madden, F. Reiss, and M. A. Shah. TelegraphCQ: continuous dataflow processing. In ACM SIGMOD Intl. Conf. on Management of Data, 2003. Google ScholarDigital Library
- J. Chen, D. J. DeWitt, F. Tian, and Y. Wang. NiagaraCQ: a scalable continuous query system for internet databases. SIGMOD Record, 29(2), 2000. Google ScholarDigital Library
- J. Cieslewicz and A. Ross, Kenneth. Adaptive aggregation on chip multiprocessors. In Intl. Conf. on Very Large Data Bases, 2007. Google ScholarDigital Library
- P. M. Fernandez. Red Brick warehouse: A read-mostly RDBMS for open SMP platforms. In ACM SIGMOD Intl. Conf. on Management of Data, 1994. Google ScholarDigital Library
- V. Harinarayan, A. Rajaraman, and J. D. Ullman. Implementing data cubes efficiently. In ACM SIGMOD Intl. Conf. on Management of Data, 1996. Google ScholarDigital Library
- S. Harizopoulos and A. Ailamaki. StagedDB: Designing database servers for modern hardware. IEEE Data Eng. Bulletin, 28(2), 2005.Google Scholar
- S. Harizopoulos, V. Shkapenyuk, and A. Ailamaki. Qpipe: a simultaneously pipelined relational query engine. In In ACM SIGMOD Intl. Conf. on Management of data, 2005. Google ScholarDigital Library
- Z. Liu, S. Parthasarathy, A. Ranganathan, and H. Yang. A generic flow algorithm for shared filter ordering problems. In Symp. on Principles of Database Systems, New York, NY, USA, 2008. Google ScholarDigital Library
- Z. Liu, S. Parthasarathy, A. Ranganathan, and H. Yang. Near-optimal algorithms for shared filter evaluation in data stream systems. In ACM SIGMOD Intl. Conf. on Management of Data, 2008. Google ScholarDigital Library
- S. Madden, M. Shah, M. Hellerstein, Joseph, and V. Raman. Continuously adaptive continuous queries over streams. In ACM SIGMOD Intl. Conf. on Management of Data, 2002. Google ScholarDigital Library
- A. Majumder, R. Rastogi, and S. Vanama. Scalable regular expression matching on data streams. In Intl. Conf. on Data Engineering, 2008. Google ScholarDigital Library
- E. B. O. Patrick O'Neil and X. Chen. The Star Schema Benchmark. http://www.cs.umb.edu/~poneil/StarSchemaB.PDF, 2007.Google Scholar
- L. Qiao, V. Raman, F. Reiss, P. Haas, and G. Lohman. Main-memory scan sharing for multi-core CPUs. In Intl. Conf. on Very Large Data Bases, 2008. Google ScholarDigital Library
- V. Raman, G. Swart, L. Qiao, F. Reiss, V. Dialani, D. Kossmann, I. Narang, and R. Sidle. Constant-time query processing. In Intl. Conf. on Data Engineering, 2008. Google ScholarDigital Library
- N. Roussopoulos, C.-M. Chen, S. Kelley, A. Delis, and Y. Papakonstantinou. The ADMS project: Views R Us. IEEE Data Eng. Bulletin, 18(2), 1995.Google Scholar
- T. K. Sellis. Multiple-query optimization. ACM Trans. Database Systems, 13(1):23--52, 1988. Google ScholarDigital Library
- M. Stonebraker, S. Madden, D. J. Abadi, S. Harizopoulos, N. Hachem, and P. Helland. The end of an architectural era (it's time for a complete rewrite). In Intl. Conf. on Very Large Data Bases, 2007. Google ScholarDigital Library
- TPC benchmark DS (decision support), draft specification, revision 32. http://www.tpc.org/tpcds/spec/tpcds32.pdf.Google Scholar
- M. Zukowski, S. Héman, N. Nes, and P. Boncz. Cooperative scans: dynamic bandwidth sharing in a DBMS. In Intl. Conf. on Very Large Data Bases, 2007. Google ScholarDigital Library
Index Terms
- A scalable, predictable join operator for highly concurrent data warehouses
Recommendations
Efficient of bitmap join indexes for optimising star join queries in relational data warehouses
Data warehouses are dedicated to analysis and decision-making applications. They are often schematised as star relational models or variants for on-line analysis. Typically, the analysis process is conducted via online analytical processing (OLAP) type ...
Extended Adaptive Join Operator with Bind-Bloom Join for Federated SPARQL Queries
The goal of query optimization in query federation over linked data is to minimize the response time and the completion time. Communication time has the highest impact on them both. Static query optimization can end up with inefficient execution plans ...
Join and multi-join processing in data integration systems
Query processing in a data integration system is complicated by a lack of quality statistics about the data, unpredictable and bursty data transfer rates, and slow or unavailable data sources. Conventional query processing algorithms, which are based on ...
Comments