ABSTRACT
Good database design is typically a difficult and costly process. As database systems get more complex and as the amount of data under management grows, the stakes increase accordingly. Past research produced a number of design tools capable of automatically selecting secondary indexes and materialized views for a known workload. However, the bulk of the research on automated database design has been done in the context of row-store DBMSes. While this work has produced effective design tools, new specialized database architectures demand a rethinking of automated design algorithms.
In this paper, we present results for an automatic design tool aimed at column-oriented DBMSes running OLAP workloads. In particular, we have chosen a commercial column-store DBMS that supports data sorting. In this setting, the key problem is selecting proper sort orders and compression schemes for the columns, as well as appropriate pre-join views. This paper describes our automatic design algorithms and the results of experiments using them on realistic data sets.
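To make the interaction between sort order and compression concrete, the following is a minimal sketch and not the paper's algorithm. It assumes run-length encoding as the compression scheme (the abstract does not name one) and uses a hypothetical two-column toy table; the helper names `rle_runs` and `total_runs` are illustrative only. It counts how many runs each candidate sort order produces, where fewer runs means a smaller encoded column.

```python
from itertools import groupby


def rle_runs(values):
    """Return the number of (value, run_length) pairs needed to
    run-length encode the sequence."""
    return sum(1 for _ in groupby(values))


def total_runs(rows, sort_order):
    """Sort rows by the given column indexes, then sum the RLE run
    counts over every column of the sorted table."""
    ordered = sorted(rows, key=lambda r: tuple(r[i] for i in sort_order))
    n_cols = len(rows[0])
    return sum(rle_runs([r[c] for r in ordered]) for c in range(n_cols))


# Hypothetical toy table with columns (region, product_id):
# 2 distinct regions, 6 distinct product ids, 12 rows in total.
rows = [(region, pid) for pid in range(6) for region in ("east", "west")]

# Compare the two possible sort orders for this table.
for order in [(0, 1), (1, 0)]:
    print(order, total_runs(rows, order))
```

On this toy table, leading with the low-cardinality column yields fewer total runs than leading with the high-cardinality one. A real design tool would also have to weigh which columns the workload's queries filter and group on, which this sketch ignores.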
Index Terms
- An automatic physical design tool for clustered column-stores