Abstract
This paper examines the problem of data placement in Bubba, a highly-parallel system for data-intensive applications being developed at MCC. “Highly-parallel” implies that load balancing is a critical performance issue. “Data-intensive” means that the data is so large that operations should be executed where the data resides. As a result, data placement becomes a critical performance issue.
In general, determining the optimal placement of data across processing nodes for performance is a difficult problem. We describe our heuristic approach to solving the data placement problem in Bubba. We then present experimental results using a specific workload to provide insight into the problem. Several researchers have argued the benefits of declustering (i.e., spreading each base relation over many nodes). We show that as declustering is increased, load balancing continues to improve. However, for transactions involving complex joins, further declustering reduces throughput because of communication, startup, and termination overhead.
We argue that data placement, especially declustering, in a highly-parallel system must be considered early in the design, so that mechanisms can be included for supporting variable declustering, for minimizing the most significant overheads associated with large-scale declustering, and for gathering the required statistics.
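The load-balancing half of this trade-off can be illustrated with a toy model. The sketch below is not Bubba's placement heuristic: the relation names, the access frequencies ("heat"), the round-robin assignment, and the imbalance metric are all assumptions made for illustration. It spreads each relation's access load evenly over `degree` nodes and reports how far the busiest node sits above the mean. The communication, startup, and termination overheads that eventually reduce throughput are deliberately not modeled.

```python
def decluster(relation_heat, num_nodes, degree):
    """Spread each relation's access load evenly over `degree` nodes.

    relation_heat: dict mapping a (hypothetical) relation name to its
    total access frequency. Returns a list with the load on each node.
    Relations are assigned round-robin to consecutive groups of nodes.
    """
    if not 1 <= degree <= num_nodes:
        raise ValueError("degree must be between 1 and num_nodes")
    load = [0.0] * num_nodes
    start = 0
    for name in sorted(relation_heat):
        share = relation_heat[name] / degree
        for i in range(degree):
            load[(start + i) % num_nodes] += share
        start = (start + degree) % num_nodes
    return load


def imbalance(load):
    """Busiest node's load divided by the mean load (1.0 is perfect)."""
    mean = sum(load) / len(load)
    return max(load) / mean


# Assumed workload: one hot relation and two cooler ones on 8 nodes.
heat = {"accounts": 80.0, "branches": 15.0, "history": 5.0}
for degree in (1, 2, 4, 8):
    load = decluster(heat, num_nodes=8, degree=degree)
    print(degree, round(imbalance(load), 2))
```

In this toy workload the imbalance ratio falls as the degree of declustering rises and reaches 1.0 when every relation is spread over all nodes. A fuller model would charge each transaction a per-node cost that grows with `degree`, which is where the throughput penalty described above would appear.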
Published as "Data placement in Bubba" in SIGMOD '88: Proceedings of the 1988 ACM SIGMOD International Conference on Management of Data.