research-article

Automated partitioning design in parallel database systems

Authors:
Rimma Nehme

Microsoft Jim Gray Systems Lab, Madison, WI, USA

Microsoft Jim Gray Systems Lab, Madison, WI, USA
View Profile

,
Nicolas Bruno

Microsoft, Redmond, WA, USA

Microsoft, Redmond, WA, USA
View Profile

SIGMOD '11: Proceedings of the 2011 ACM SIGMOD International Conference on Management of dataJune 2011Pages 1137–1148https://doi.org/10.1145/1989323.1989444

Published:12 June 2011Publication History

SIGMOD '11: Proceedings of the 2011 ACM SIGMOD International Conference on Management of data

Pages 1137–1148

ABSTRACT

In recent years, Massively Parallel Processors (MPPs) have gained ground enabling vast amounts of data processing. In such environments, data is partitioned across multiple compute nodes, which results in dramatic performance improvements during parallel query execution. To evaluate certain relational operators in a query correctly, data sometimes needs to be re-partitioned (i.e., moved) across compute nodes. Since data movement operations are much more expensive than relational operations, it is crucial to design a suitable data partitioning strategy that minimizes the cost of such expensive data transfers. A good partitioning strategy strongly depends on how the parallel system would be used. In this paper we present a partitioning advisor that recommends the best partitioning design for an expected workload. Our tool recommends which tables should be replicated (i.e., copied into every compute node) and which ones should be distributed according to specific column(s) so that the cost of evaluating similar workloads is minimized. In contrast to previous work, our techniques are deeply integrated with the underlying parallel query optimizer, which results in more accurate recommendations in a shorter amount of time. Our experimental evaluation using a real MPP system, Microsoft SQL Server 2008 Parallel Data Warehouse, with both real and synthetic workloads shows the effectiveness of the proposed techniques and the importance of deep integration of the partitioning advisor with the underlying query optimizer.

References

Aster Data. http://www.asterdata.com/.Google Scholar
Greenplum. http://www.greenplum.com/.Google Scholar
Netezza. http://www.netezza.com/.Google Scholar
SQL Server 2008 R2 Parallel Data Warehouse. http://www.microsoft.com/sqlserver/2008/en/us/parallel-data-warehouse.aspx.Google Scholar
The Vertica Analytic Database Technical Overview White Paper. http://www.vertica-solutions.com/files/10986/verticaarchitecturewhit%epaper.pdf.Google Scholar
A. Abouzeid et.al. HadoopDB: an architectural hybrid of MapReduce and DBMS technologies for analytical workloads. Proc. VLDB Endow., 2(1):922--933, 2009. Google ScholarDigital Library
A. Kiran et.al. Two techniques for on-line index modification in shared nothing parallel databases. In SIGMOD, 1996. Google ScholarDigital Library
S. Agrawal, S. Chaudhuri, and V. R. Narasayya. Automated selection of materialized views and indexes in sql databases. In VLDB, pages 496--505, 2000. Google ScholarDigital Library
S. Agrawal, V. R. Narasayya, and B. Yang. Integrating vertical and horizontal partitioning into automated physical database design. In SIGMOD, pages 359--370, 2004. Google ScholarDigital Library
N. Bruno and R. V. Nehme. Configuration-parametric query optimization for physical design tuning. In SIGMOD, 2008. Google ScholarDigital Library
C. Curino et.al. Schism: a workload-driven approach to database replication and partitioning. In VLDB, 2010. Google ScholarDigital Library
S. Chaudhuri and V. R. Narasayya. An efficient cost-driven index selection tool for Microsoft SQL Server. In VLDB, 1997. Google ScholarDigital Library
S. Chaudhuri and V. R. Narasayya. AutoAdmin 'What-if' index analysis utility. In SIGMOD, pages 367--378, 1998. Google ScholarDigital Library
D. C. Zilio et.al. DB2 design advisor: Integrated automatic physical database design. In VLDB, pages 1087--1097, 2004. Google ScholarDigital Library
D. DeWitt and J. Gray. Parallel database systems: the future of high perf. database systems. Com. ACM, 35(6):85--98, 1992. Google ScholarDigital Library
G. Cong et.al. Distributed query evaluation with performance guarantees. In SIGMOD, pages 509--520, 2007. Google ScholarDigital Library
Ganguly, Sumit et.al. Efficient and accurate cost models for parallel query optimization. In PODS, pages 172--181, 1996. Google ScholarDigital Library
G. Graefe. The Cascades framework for query optimization. IEEE Data Eng. Bull., 18(3):19--29, 1995.Google Scholar
G. Graefe and W. J. McKenna. The Volcano optimizer generator: Extensibility and efficient search. In ICDE, 1993. Google ScholarDigital Library
Isard, Michael et.al. Dryad: distributed data-parallel programs from sequential building blocks. In EuroSys, pages 59--72, 2007. Google ScholarDigital Library
W. H. Kohler and K. Steiglitz. Characterization and theoretical comparison of branch-and-bound algorithms for permutation problems. J. ACM, 21(1):140--156, 1974. Google ScholarDigital Library
T.-H. Lai and A. P. Sprague. Performance of parallel branch-and-bound algorithms. In ICPP, pages 194--201, 1985.Google ScholarCross Ref
P. G. Selinger et.al. Access path selection in a relational database management system. In SIGMOD, pages 23--34, 1979. Google ScholarDigital Library
R. Chaiken et.al. Scope: easy and efficient parallel processing of massive data sets. Proc. VLDB Endow., 1(2):1265--1276, 2008. Google ScholarDigital Library
J. Rao, C. Zhang, N. Megiddo, and G. M. Lohman. Automating physical database design in a parallel database. In SIGMOD, pages 558--569, 2002. Google ScholarDigital Library
S. Ghandeharizadeh et.al. Hybrid-range partitioning strategy: A new declustering strategy for multiprocessor database machines. In VLDB, pages 481--492, 1990. Google ScholarDigital Library
S. Papadomanolakis et.al. Efficient use of the query optimizer for automated database design. In VLDB, 2007. Google ScholarDigital Library
M. Stonebraker, D. Abadi, D. J. DeWitt, S. Madden, E. Paulson, A. Pavlo, and A. Rasin. Map-Reduce and parallel DBMSs: friends or foes? Commun. ACM, 53(1):64--71, 2010. Google ScholarDigital Library
T. Stöhr et.al. Multi-dimensional database allocation for parallel data warehouses. In VLDB, 2000. Google ScholarDigital Library
Y. Hung-Chih et.al. Map-Reduce-Merge: simplified relational data processing on large clusters. In SIGMOD, 2007. Google ScholarDigital Library
D. C. Zilio. Physical Database Design Decision Algorithms and Concurrent Reorganization for Parallel Database Systems. PhD thesis, University of Toronto, 1998. Google ScholarDigital Library

Index Terms

Automated partitioning design in parallel database systems
1. Information systems
  1. Data management systems
    1. Data structures
      1. Data access methods

Recommendations

Horizontal partitioning method for test verification in parallel database systems

In parallel database systems the partitioning methods considered in current researches are static. This research paper presents a partitioning method to divide the database relations into dynamic horizontal partitions. Every partition contains some ...
Read More
Partitioning in parallel processing of production systems
Read More
A load balancing approach for parallel database machines
PDP '95: Proceedings of the 3rd Euromicro Workshop on Parallel and Distributed Processing

Parallel database systems have become a major tool for high performance information processing. These systems require efficient load balancing approaches to partition each relation and to allocate them to the parallel architecture. If the database is ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
SIGMOD '11: Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
June 2011
1364 pages
ISBN:9781450306614
DOI:10.1145/1989323
General Chair:
Timos Sellis
IMIS/RC Athena
,
Program Chair:
Renée J. Miller
University of Toronto
,
Publications Chairs:
Anastasios Kementsietsidis
IBM T.J. Watson Research Center
,
Yannis Velegrakis
University of Trento
Copyright © 2011 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 12 June 2011
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
distributed databases
Qualifiers
- research-article
Conference

Acceptance Rates
Overall Acceptance Rate785of4,003submissions,20%
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 93
  Total Citations
  View Citations
- 1,373
  Total Downloads
- Downloads (Last 12 months)68
- Downloads (Last 6 weeks)4
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Automated partitioning design in parallel database systems

SIGMOD '11: Proceedings of the 2011 ACM SIGMOD International Conference on Management of data

ABSTRACT

References

Cited By

Index Terms

Recommendations

Horizontal partitioning method for test verification in parallel database systems

Partitioning in parallel processing of production systems

A load balancing approach for parallel database machines

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Automated partitioning design in parallel database systems

SIGMOD '11: Proceedings of the 2011 ACM SIGMOD International Conference on Management of data

ABSTRACT

References

Cited By

Index Terms

Recommendations

Horizontal partitioning method for test verification in parallel database systems

Partitioning in parallel processing of production systems

A load balancing approach for parallel database machines

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media