Data placement in Bubba

Published: 01 June 1988

Abstract

This paper examines the problem of data placement in Bubba, a highly-parallel system for data-intensive applications being developed at MCC. “Highly-parallel” implies that load balancing is a critical performance issue. “Data-intensive” means data is so large that operations should be executed where the data resides. As a result, data placement becomes a critical performance issue.

In general, determining the optimal placement of data across processing nodes for performance is a difficult problem. We describe our heuristic approach to solving the data placement problem in Bubba. We then present experimental results using a specific workload to provide insight into the problem. Several researchers have argued the benefits of declustering (i.e., spreading each base relation over many nodes). We show that as declustering is increased, load balancing continues to improve. However, for transactions involving complex joins, further declustering reduces throughput because of communications, startup and termination overhead.

We argue that data placement, especially declustering, in a highly-parallel system must be considered early in the design, so that mechanisms can be included for supporting variable declustering, for minimizing the most significant overheads associated with large-scale declustering, and for gathering the required statistics.
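
As a rough illustration of the declustering idea defined in the abstract (spreading one base relation over many nodes), the sketch below assigns tuples to nodes and measures the load on the busiest node. It is not Bubba's placement heuristic: the round-robin assignment, the uniform access pattern, and the names `decluster` and `busiest_node_load` are all assumptions made for this example.

```python
from collections import Counter


def decluster(tuple_keys, nodes, degree):
    """Spread one relation's tuples round-robin over the first `degree` nodes.

    Round-robin is an assumption for illustration only; the abstract does not
    say how tuples are assigned to nodes.
    """
    if not 1 <= degree <= len(nodes):
        raise ValueError("degree must be between 1 and the number of nodes")
    home_nodes = nodes[:degree]
    return {key: home_nodes[i % degree] for i, key in enumerate(tuple_keys)}


def busiest_node_load(placement, accessed_keys):
    """Count the accesses that land on the most heavily loaded node."""
    load = Counter(placement[key] for key in accessed_keys)
    return max(load.values())


if __name__ == "__main__":
    nodes = ["n0", "n1", "n2", "n3"]
    keys = list(range(12))
    for degree in (1, 2, 4):
        placement = decluster(keys, nodes, degree)
        # Every tuple is accessed once (a hypothetical uniform workload).
        print(degree, busiest_node_load(placement, keys))
```

Under this toy workload the busiest node's load falls as the degree of declustering rises, which matches the load-balancing trend the abstract reports. The sketch does not model the communications, startup and termination overhead that the abstract says eventually reduces throughput for complex joins.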


Published in

ACM SIGMOD Record, Volume 17, Issue 3
June 1988
431 pages
ISSN: 0163-5808
DOI: 10.1145/971701

SIGMOD '88: Proceedings of the 1988 ACM SIGMOD international conference on Management of data
June 1988
443 pages
ISBN: 0897912683
DOI: 10.1145/50202

Copyright © 1988 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher

Association for Computing Machinery, New York, NY, United States
