Abstract
Computer architectures are increasingly based on multi-core CPUs and large memories. Memory bandwidth, which has riot kept pace with the increasing number of cores, has become the primary processing bottleneck, replacing disk I/O as the limiting factor. To address this challenge, we provide novel algorithms for increasing the throughput of Business Intelligence (BI) queries, as well as for ensuring fairness and avoiding starvation among a concurrent set of such queries. To maximize throughput, we propose a novel FullSharing scheme that allows all concurrent queries, when performing base-table I/O, to share the cache belonging to a given core. We then generalize this approach to a BatchSharing scheme that avoids thrashing on "agg-tables" ---hash tables that are used for aggregation processing---caused by execution of too many queries on a core. This scheme partitions queries into batches such that the working-set of agg-table entries for each batch can fit into a cache; an efficient sampling technique is used to estimate selectivities and working-set sizes for purposes of query partitioning. Finally, we use lottery-scheduling techniques to ensure fairness and impose a hard upper bound on staging time to avoid starvation. On our 8-core testbed, we were able to completely remove the memory I/O bottleneck, increasing throughput by a factor of 2 to 2.5, while also maintaining fairness and avoiding starvation.
- Intel Architecture Software Developers Manual, volume 2.Google Scholar
- D. J. Abadi, S. Madden, and M. Ferreira. Integrating compression and execution in column-oriented database systems. In SIGMOD, pages 671--682. 2006. Google ScholarDigital Library
- M. Blasgen, J. Gray, M. Mitoma, and T. Price. The convoy phenomenon. SIGOPS Oper. Syst. Rev., 13(2):20--25, 1979. Google ScholarDigital Library
- J. Chang and G. S. Sohi. Cooperative Cache Partitioning for Chip Multiprocessors. In Proceedings of Supercomputing (SC), pages 242--252, 2007. Google ScholarDigital Library
- J. Chen, D. DeWitt, F. Tian, and Y. Wang. NiagaraCQ: A Scalable Continuous Query System for Internet Databases. In SIGMOD, 2000. Google ScholarDigital Library
- S. Chen, P. B. Gibbons, M. Kozuch, V. Liaskovitis, A. Ailamaki, G. E. Blelloch, B. Falsafi, L. Fix, N. Hardavellas, T. C. Mowry, and C. Wilkerson. Scheduling Threads for Constructive Cache Sharing on CMPs. In Proceedings of SPAA, pages 105--115, 2007. Google ScholarDigital Library
- J. Cieslewicz and K. A. Ross. Adaptive Aggregation on Chip Multiprocessors. In VLDB, pages 339--350, 2007. Google ScholarDigital Library
- G. Dosa. The Tight Bound of First Fit Decreasing Bin-Packing Algorithm Is FFD(I)=(11/9)OPT(I)+6/9. In ESCAPE, 2007. Google ScholarDigital Library
- W. W. Esty. A normal limit law for a nonparametric estimator of the coverage of a random sample. Ann. Statist., 11(8):905--911, 1983.Google ScholarCross Ref
- P. J. Haas and L. Stokes. Estimating the number of classes in a finite population. J. Amer. Statist. Assoc., 93, 1998.Google Scholar
- S. Harizopoulos, V. Liang, D. J. Abadi, and S. Madden. Performance tradeoffs in read-optimized databases. In VLDB, pages 487--498, 2006. Google ScholarDigital Library
- S. Harizopoulos, V. Shkapenyuk, and A. Ailamaki. QPipe: a simultaneously pipelined relational query engine. In SIGMOD, pages 383--394, 2005. Google ScholarDigital Library
- R. Johnson, N. Hardavellas, I. Pandis, N. Mancheril. S. Harizopoulos, K. Sabirli, A. Ailamaki, and B. Falsafi. To share or not to share? In VLDB, pages 351--362, 2007. Google ScholarDigital Library
- S. Kim, D. Chandra, and Y. Solihin. Fair Cache Sharing and Partitioning in a Chip Multiprocessor Architecture. In PACT, pages 111--122, 2004. Google ScholarDigital Library
- S. Krishnamurthy, M. J. Franklin, J. M. Hellerstein, and G. Jacobson. The case for precision sharing. In VLDB, pages 972--984, 2004. Google ScholarDigital Library
- C. A. Lang, B. Bhattacharjee, T. Malkemus, S. Padmanabhan, and K. Wong. Increasing buffer-locality for multiple relational table scans through grouping and throttling. In ICDE, pages 1136--1145, 2007. Google ScholarDigital Library
- V. Raman and G. Swart. How to wring a table dry: Entropy compression of relations and querying of compressed relations. In VLDB, pages 858--869, 2006. Google ScholarDigital Library
- V. Raman, G. Swart, L. Qiao, F. Reiss, V. Dialani, D. Kossmann, I. Narang, and R. Sidle. Constant-time query processing. In ICDE, pages 60--69, 2008. Google ScholarDigital Library
- N. Roussopoulos. View indexing in relational databases. ACM Trans. Database Syst., 7(2):258--290, 1982. Google ScholarDigital Library
- C. A. Waldspurger and W. K. Weihl. Lottery scheduling: Flexible proportional-share resource management. In OSDI, pages 1--11, 1994. Google ScholarDigital Library
- M. Zukowski, S. Héman, N. Nes, and P. A. Boncz. Cooperative Scans: Dynamic Bandwidth Sharing in a DBMS. In VLDB, pages 723--734, 2007. Google ScholarDigital Library
Index Terms
- Main-memory scan sharing for multi-core CPUs
Recommendations
Multi-core, main-memory joins: sort vs. hash revisited
In this paper we experimentally study the performance of main-memory, parallel, multi-core join algorithms, focusing on sort-merge and (radix-)hash join. The relative performance of these two join approaches have been a topic of discussion for a long ...
Main-memory hash joins on multi-core CPUs: Tuning to the underlying hardware
ICDE '13: Proceedings of the 2013 IEEE International Conference on Data Engineering (ICDE 2013)The architectural changes introduced with multi-core CPUs have triggered a redesign of main-memory join algorithms. In the last few years, two diverging views have appeared. One approach advocates careful tailoring of the algorithm to the architectural ...
An application-centric evaluation of OpenCL on multi-core CPUs
Although designed as a cross-platform parallel programming model, OpenCL remains mainly used for GPU programming. Nevertheless, a large amount of applications are parallelized, implemented, and eventually optimized in OpenCL. Thus, in this paper, we ...
Comments