Abstract
Index join performance is determined by the efficiency of the lookup operation on the involved index. Although database indexes are highly optimized to leverage processor caches, main memory accesses inevitably increase lookup runtime when the index outsizes the last-level cache; hence, index join performance drops. Still, robust index join performance becomes possible with instruction stream interleaving: given a group of lookups, we can hide cache misses in one lookup with instructions from other lookups by switching among their respective instruction streams upon a cache miss.
In this paper, we propose interleaving with coroutines for any type of index join. We showcase our proposal on SAP HANA by implementing binary search and CSB+-tree traversal for an instance of index join related to dictionary compression. Coroutine implementations not only perform similarly to prior interleaving techniques, but also resemble the original code closely, while supporting both interleaved and non-interleaved execution. Thus, we claim that coroutines make interleaving practical for use in real DBMS codebases.
- C# Reference. https://msdn.microsoft.com/en-us/library/9k7k7cf0.aspx {Online; accessed 14-August-2017}.Google Scholar
- Generators. Python Wiki. https://wiki.python.org/ moin/Generators {Online; accessed 14-August-2017}.Google Scholar
- Programming languages --- C++ extensions for coroutines.Google Scholar
- Transaction Processing Performance Council. TPC-DS benchmark version 2.3.0. http://www.tpc.org/tpcds/ {Online; accessed 14-August-2017}.Google Scholar
- A. Ailamaki, D. J. DeWitt, M. D. Hill, and D. A. Wood. DBMSs on a modern processor: where does time go? In Proc. VLDB, pages 266--277, 1999. Google ScholarDigital Library
- S. Chen, A. Ailamaki, P. B. Gibbons, and T. C. Mowry. Improving hash join performance through prefetching. ACM Trans. Database Syst., 32(3), 2007. Google ScholarDigital Library
- M. Colgan. Oracle Database In-Memory. Technical report, Oracle Corporation, 2015. http://www.oracle.com/technetwork/database/in-memory/overview/twp-oracle-database-in-memory-2245633.html.Google Scholar
- M. E. Conway. Design of a separable transition-diagram compiler. Commun. ACM, 6(7):396--408, 1963. Google ScholarDigital Library
- F. Färber, N. May, W. Lehner, P. Große, I. Müller, H. Rauhe, and J. Dees. The SAP HANA Database - an architecture overview. IEEE Data Eng. Bull., 35(1):28--33, 2012.Google Scholar
- S. Idreos, F. Groffen, N. Nes, S. Manegold, S. Mullender, and M. Kersten. MonetDB: two decades of research in column-oriented database architectures. IEEE Data Eng. Bull., 35(1):40--45, 2012.Google Scholar
- Intel Corporation. Intel® 64 and IA-32 Architectures Optimization Reference Manual, June 2016.Google Scholar
- A. Kemper and T. Neumann. HyPer: a hybrid OLTP & OLAP main memory database system based on virtual memory snapshots. In Proc. ICDE, pages 195--206, 2011. Google ScholarDigital Library
- M. Kerrisk. The Linux Programming Interface: A Linux and UNIX System Programming Handbook. Google ScholarDigital Library
- C. Kim, J. Chhugani, N. Satish, E. Sedlar, A. D. Nguyen, T. Kaldewey, V. W. Lee, S. A. Brandt, and P. Dubey. FAST: fast architecture sensitive tree search on modern CPUs and GPUs. In Proc. SIGMOD, pages 339--350, 2010. Google ScholarDigital Library
- O. Kocberber, B. Falsafi, and B. Grot. Asynchronous memory access chaining. PVLDB, 9(4):252--263, 2015. Google ScholarDigital Library
- M. D. Lam, E. E. Rothberg, and M. E. Wolf. The cache performance and optimizations of blocked algorithms. In Proc. ASPLOS, 1991. Google ScholarDigital Library
- H. Lang, T. Mühlbauer, F. Funke, P. Boncz, T. Neumann, and A. Kemper. Data Blocks: hybrid OLTP and OLAP on compressed storage using both vectorization and compilation. Proc. SIGMOD, 2016. Google ScholarDigital Library
- P.-A. Larson, C. Clinciu, C. Fraser, E. N. Hanson, M. Mokhtar, M. Nowakiewicz, V. Papadimos, S. L. Price, S. Rangarajan, R. Rusanu, and M. Saubhasik. Enhancements to SQL Server column stores. In Proc. SIGMOD, pages 1159--1168, 2013. Google ScholarDigital Library
- P.-A. Larson, C. Clinciu, E. N. Hanson, A. Oks, S. L. Price, S. Rangarajan, A. Surna, and Q. Zhou. SQL Server column store indexes. Proc. SIGMOD, pages 1177--1184, 2011. Google ScholarDigital Library
- S. Manegold, P. A. Boncz, and M. L. Kersten. Optimizing database architecture for the new bottleneck: memory access. VLDB Journal, 9(3):231--246, 2000. Google ScholarDigital Library
- J. Marusarz. Understanding how general exploration works in Intel VTune Amplifier XE, 2015. https://software.intel.com/en-us/articles/understanding-how-general-exploration-works-in-intel-vtune-amplifier-xe {Online; accessed 14-August-2017}.Google Scholar
- T. J. McCabe. A complexity measure. IEEE Trans. Softw. Eng., 2(4):308--320, July 1976. Google ScholarDigital Library
- J. Melton. Advanced SQL 1999: Understanding Object-Relational, and Other Advanced Features. Elsevier Science Inc., 2002. Google ScholarDigital Library
- A. L. D. Moura and R. Ierusalimschy. Revisiting coroutines. ACM Trans. Program. Lang. Syst., 31(2):6:1--6:31, Feb. 2009. Google ScholarDigital Library
- I. Müller, C. Ratsch, and F. Färber. Adaptive string dictionary compression in in-memory column-store database systems. In Proc. EDBT, pages 283--294, 2014.Google Scholar
- M. Poess and D. Potapov. Data compression in Oracle. Proc. VLDB, pages 937--947, 2003. Google ScholarDigital Library
- V. Raman, G. Attaluri, R. Barber, N. Chainani, D. Kalmuk, V. KulandaiSamy, J. Leenstra, S. Lightstone, S. Liu, G. M. Lohman, T. Malkemus, R. Mueller, I. Pandis, B. Schiefer, D. Sharpe, R. Sidle, A. Storm, and L. Zhang. DB2 with BLU Acceleration: so much more than just a column store. PVLDB, 6(11):1080--1091, 2013. Google ScholarDigital Library
- J. Rao and K. A. Ross. Making B+-trees cache conscious in main memory. In Proc. ACM SIGMOD, pages 475--486, 2000. Google ScholarDigital Library
- RethinkDB Team. Improving a large C++ project with coroutines, 2010. https://www.rethinkdb.com/blog/improving-a-large-c-project-with-coroutines/ {Online; accessed 14-August-2016}.Google Scholar
- M. E. Russinovich, D. A. Solomon, and A. Ionescu. Windows Internals, Part 1: Covering Windows Server 2008 R2 and Windows 7. Microsoft Press, 6th edition, 2012. Google ScholarDigital Library
- J. Zhou, J. Cieslewicz, K. A. Ross, and M. Shah. Improving database performance on simultaneous multithreading processors. In Proc. VLDB, pages 49--60, 2005. Google ScholarDigital Library
- J. Zhou and K. A. Ross. Buffering accesses to memory-resident index structures. In Proc. VLDB, pages 405--416, 2003. Google ScholarDigital Library
Recommendations
Exploiting coroutines to attack the "killer nanoseconds"
Database systems use many pointer-based data structures, including hash tables and B+-trees, which require extensive "pointer-chasing." Each pointer dereference, e.g., during a hash probe or a B+-tree traversal, can result in a CPU cache miss, stalling ...
Interleaving with coroutines: a systematic and practical approach to hide memory latency in index joins
AbstractIndex joins present a case of pointer-chasing code that causes data cache misses. In principle, we can hide these cache misses by overlapping them with computation: The lookups involved in an index join are parallel tasks whose execution can be ...
Interleaving: a multithreading technique targeting multiprocessors and workstations
ASPLOS VI: Proceedings of the sixth international conference on Architectural support for programming languages and operating systemsThere is an increasing trend to use commodity microprocessors as the compute engines in large-scale multiprocessors. However, given that the majority of the microprocessors are sold in the workstation market, not in the multiprocessor market, it is only ...
Comments