skip to main content
10.1145/1989323.1989444acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

Automated partitioning design in parallel database systems

Published:12 June 2011Publication History

ABSTRACT

In recent years, Massively Parallel Processors (MPPs) have gained ground enabling vast amounts of data processing. In such environments, data is partitioned across multiple compute nodes, which results in dramatic performance improvements during parallel query execution. To evaluate certain relational operators in a query correctly, data sometimes needs to be re-partitioned (i.e., moved) across compute nodes. Since data movement operations are much more expensive than relational operations, it is crucial to design a suitable data partitioning strategy that minimizes the cost of such expensive data transfers. A good partitioning strategy strongly depends on how the parallel system would be used. In this paper we present a partitioning advisor that recommends the best partitioning design for an expected workload. Our tool recommends which tables should be replicated (i.e., copied into every compute node) and which ones should be distributed according to specific column(s) so that the cost of evaluating similar workloads is minimized. In contrast to previous work, our techniques are deeply integrated with the underlying parallel query optimizer, which results in more accurate recommendations in a shorter amount of time. Our experimental evaluation using a real MPP system, Microsoft SQL Server 2008 Parallel Data Warehouse, with both real and synthetic workloads shows the effectiveness of the proposed techniques and the importance of deep integration of the partitioning advisor with the underlying query optimizer.

References

  1. Aster Data. http://www.asterdata.com/.Google ScholarGoogle Scholar
  2. Greenplum. http://www.greenplum.com/.Google ScholarGoogle Scholar
  3. Netezza. http://www.netezza.com/.Google ScholarGoogle Scholar
  4. SQL Server 2008 R2 Parallel Data Warehouse. http://www.microsoft.com/sqlserver/2008/en/us/parallel-data-warehouse.aspx.Google ScholarGoogle Scholar
  5. The Vertica Analytic Database Technical Overview White Paper. http://www.vertica-solutions.com/files/10986/verticaarchitecturewhit%epaper.pdf.Google ScholarGoogle Scholar
  6. A. Abouzeid et.al. HadoopDB: an architectural hybrid of MapReduce and DBMS technologies for analytical workloads. Proc. VLDB Endow., 2(1):922--933, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. A. Kiran et.al. Two techniques for on-line index modification in shared nothing parallel databases. In SIGMOD, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. S. Agrawal, S. Chaudhuri, and V. R. Narasayya. Automated selection of materialized views and indexes in sql databases. In VLDB, pages 496--505, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. S. Agrawal, V. R. Narasayya, and B. Yang. Integrating vertical and horizontal partitioning into automated physical database design. In SIGMOD, pages 359--370, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. N. Bruno and R. V. Nehme. Configuration-parametric query optimization for physical design tuning. In SIGMOD, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. C. Curino et.al. Schism: a workload-driven approach to database replication and partitioning. In VLDB, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. S. Chaudhuri and V. R. Narasayya. An efficient cost-driven index selection tool for Microsoft SQL Server. In VLDB, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. S. Chaudhuri and V. R. Narasayya. AutoAdmin 'What-if' index analysis utility. In SIGMOD, pages 367--378, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. D. C. Zilio et.al. DB2 design advisor: Integrated automatic physical database design. In VLDB, pages 1087--1097, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. D. DeWitt and J. Gray. Parallel database systems: the future of high perf. database systems. Com. ACM, 35(6):85--98, 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. G. Cong et.al. Distributed query evaluation with performance guarantees. In SIGMOD, pages 509--520, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Ganguly, Sumit et.al. Efficient and accurate cost models for parallel query optimization. In PODS, pages 172--181, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. G. Graefe. The Cascades framework for query optimization. IEEE Data Eng. Bull., 18(3):19--29, 1995.Google ScholarGoogle Scholar
  19. G. Graefe and W. J. McKenna. The Volcano optimizer generator: Extensibility and efficient search. In ICDE, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Isard, Michael et.al. Dryad: distributed data-parallel programs from sequential building blocks. In EuroSys, pages 59--72, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. W. H. Kohler and K. Steiglitz. Characterization and theoretical comparison of branch-and-bound algorithms for permutation problems. J. ACM, 21(1):140--156, 1974. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. T.-H. Lai and A. P. Sprague. Performance of parallel branch-and-bound algorithms. In ICPP, pages 194--201, 1985.Google ScholarGoogle ScholarCross RefCross Ref
  23. P. G. Selinger et.al. Access path selection in a relational database management system. In SIGMOD, pages 23--34, 1979. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. R. Chaiken et.al. Scope: easy and efficient parallel processing of massive data sets. Proc. VLDB Endow., 1(2):1265--1276, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. J. Rao, C. Zhang, N. Megiddo, and G. M. Lohman. Automating physical database design in a parallel database. In SIGMOD, pages 558--569, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. S. Ghandeharizadeh et.al. Hybrid-range partitioning strategy: A new declustering strategy for multiprocessor database machines. In VLDB, pages 481--492, 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. S. Papadomanolakis et.al. Efficient use of the query optimizer for automated database design. In VLDB, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. M. Stonebraker, D. Abadi, D. J. DeWitt, S. Madden, E. Paulson, A. Pavlo, and A. Rasin. Map-Reduce and parallel DBMSs: friends or foes? Commun. ACM, 53(1):64--71, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. T. Stöhr et.al. Multi-dimensional database allocation for parallel data warehouses. In VLDB, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Y. Hung-Chih et.al. Map-Reduce-Merge: simplified relational data processing on large clusters. In SIGMOD, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. D. C. Zilio. Physical Database Design Decision Algorithms and Concurrent Reorganization for Parallel Database Systems. PhD thesis, University of Toronto, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Automated partitioning design in parallel database systems

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Conferences
      SIGMOD '11: Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
      June 2011
      1364 pages
      ISBN:9781450306614
      DOI:10.1145/1989323

      Copyright © 2011 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 12 June 2011

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article

      Acceptance Rates

      Overall Acceptance Rate785of4,003submissions,20%

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader