ABSTRACT
As more and more query processing work can be done in main memory access is becoming a significant cost component of database operations. Recent database research has shown that most of the memory stalls are due to second-level cache data misses and first-level instruction cache misses. While a lot of research has focused on reducing the data cache misses, relatively little research has been done on improving the instruction cache performance of database systems.We first answer the question "Why does a database system incur so many instruction cache misses?" We demonstrate that current demand-pull pipelined query execution engines suffer from significant instruction cache thrashing between different operators. We propose techniques to buffer database operations during query execution to avoid instruction cache thrashing. We implement a new light-weight "buffer" operator and study various factors which may affect the cache performance. We also introduce a plan refinement algorithm that considers the query plan and decides whether it is beneficial to add additional "buffer" operators and where to put them. The benefit is mainly from better instruction locality and better hardware branch prediction. Our techniques can be easily integrated into current database systems without significant changes. Our experiments in a memory-resident PostgreSQL database system show that buffering techniques can reduce the number of instruction cache misses by up to 80% and improve query performance by up to 15%.
- LMbench - Tools for Performance Analysis. http://www.bitmover.com/lmbench/.Google Scholar
- A. Ailamaki, D. J. DeWitt, M. D. Hill, and M. Skounakis. Weaving relations for cache performance. In Proceedings of VLDB Conference, 2001. Google ScholarDigital Library
- A. Ailamaki, D. J. DeWitt, M. D. Hill, and D. A. Wood. DBMSs on a modern processor: Where does time go? In Proceedings of VLDB conference, 1999. Google ScholarDigital Library
- M. Annavaram, J. M. Patel, and E. S. Davidson. Call graph prefetching for database applications. In Proceedings of International Symposium on High Performance Computer Architecture, 2001. Google ScholarDigital Library
- L. A. Barroso, K. Gharachorloo, and E. Bugnion. Memory system characterization of commercial workloads. In Proceedings of International Symposium on Computer Architecture, 1998. Google ScholarDigital Library
- P. Boncz, S. Manegold, and M. L. Kersten. Database architecture optimized for the new bottleneck: Memory access. In Proceedings of VLDB Conference, 1999. Google ScholarDigital Library
- I.-C. K. Chen, C.-C. Lee, and T. N. Mudge. Instruction prefetching using branch prediction information. In Proceedings of International Symposium on Microarchitecture, 1997.Google Scholar
- S. Chen, P. B. Gibbons, and T. C. Mowry. Improving index performance through prefetching. In Proceedings of ACM SIGMOD Conference, 2001. Google ScholarDigital Library
- S. Chen, P. B. Gibbons, T. C. Mowry, and G. Valentin. Fractal prefetching B+-trees: Optimizing both cache and disk performance. In Proceedings of ACM SIGMOD Conference, 2002. Google ScholarDigital Library
- N. Gloy and M. D. Smith. Procedure placement using temporal-ordering information. ACM Transactions on Programming Languages and Systems, 21(5): 977--1027, 1999. Google ScholarDigital Library
- G. Graefe. Volcano, an extensible and parallel query evaluation system. IEEE Transactions on knowledge and data enginnering, 6(6):934--944, 1994.Google ScholarDigital Library
- L. M. Haas et al. Starburst mid-flight: as the dust clears. IEEE Transactions on knowledge and data engineering, 2(1):143, 1990. Google ScholarDigital Library
- A. H. Hashemi, D. R. Kaeli, and B. Calder. Efficient procedure mapping using cache line coloring. In SIGPLAN Conference on Programming Language Design and Implementation, 1997. Google ScholarDigital Library
- G. Hinton, D. Sager, M. Upton, D. Boggs, D. Carmean, A. Kyker, and P. Roussel. The microarchitecture of the Pentium 4 processor. Intel Technology Journal, 1st Quarter, 2001.Google Scholar
- Intel Corp. VTune performance analyzer. http://www.intel.com/software/products/vtune/.Google Scholar
- Intel Inc. IA32 intel architecture optimization reference manual. 2003.Google Scholar
- K. Keeton, D. A. Patterson, Y. Q. He, R. C. Raphael, and W. E. Baker. Performance characterization of a quad pentium pro smp using oltp workloads. In Proceedings of the 25th International Symposium on Computer Architecture, 1998. Google ScholarDigital Library
- K. Kim, S. K. Cha, and K. Kwon. Optimizing multidimensional index trees for main memory access. In Proceedings of ACM SIGMOD Conference, 2001. Google ScholarDigital Library
- C.-K. Luk and T. C. Mowry. Cooperative prefetching: Compiler and hardware support for effective instruction prefetching in modern processors. In International Symposium on Microarchitecture, 1998. Google ScholarDigital Library
- A. M. G. Maynard, C. M. Donnelly, and B. R. Olszewski. Contrasting characteristics and cache performance of technical and multi-user commercial workloads. In Proceedings of International Conference on Architectural Support for Programming Languages and Operating Systems, 1994. Google ScholarDigital Library
- S. Padmanabhan, T. Malkemus, R. Agarwal, and A. Jhingran. Block oriented processing of relational database operations in modern computer architectures. In Proceedings of ICDE Conference, 2001. Google ScholarDigital Library
- S. E. Perl and R. L. Sites. Studies of windows nt performance using dynamic execution traces. In Proceedings of the USENIX Symposium on Operating Systems Design and Implementation, 1996. Google ScholarDigital Library
- K. Pettis and R. C. Hansen. Profile guided code positioning. In Proceedings of ACM SIGPLAN conference, 1990 Google ScholarDigital Library
- A. Ramirez et al. Optimization of instruction fetch for decision support workloads. In Proceedings of the International Conference on Parallel Processing, 1999. Google ScholarDigital Library
- A. Ramirez et al. Code layout optmizations for transaction processing workloads. In Proceedings of International Symposium on Computer Architecture, 2001. Google ScholarDigital Library
- J. Rao and K. A. Ross. Cache conscious indexing for decision-support in main memory. In Proceedings of VLDB Conference, 1999. Google ScholarDigital Library
- J. Rao and K. A. Ross. Making B+ trees chache conscious in main memory. In Proceedings of ACM SIGMOD Conference, 2000. Google ScholarDigital Library
- P. G. Selinger, M. M. Astrahan, D. D. Chamberlin, R. A. Lorie, and T. G. Price. Access with selection in a relational database management system. In Proceedings of ACM SIGMOD Conference, 1979. Google ScholarDigital Library
- A. Shatdal, C. Kant, and J. F. Naughton. Cache conscious algorithms for relational query processing. In Proceedings of VLDB Conference, pages 510--521, 1994. Google ScholarDigital Library
- J. Smith and W. C. Hsu. Prefetching in supercomputer instruction caches. In Proceedings of Supercomputing, 1992. Google ScholarDigital Library
- M. Stonebraker, L. A. Rowe, and M. Hirohama. The implementation of Postgres. In Transactions on Knowledge and Data Engineering, 1990. Google ScholarDigital Library
- Transaction Processing Performance Council. TPC Benchmark H. Available via http://www.tpc.com/tpch/.Google Scholar
- J. Zhou and K. A. Ross. Buffering accesses to memory-resident index structures. In Proceedings of VLDB Conference, 2003. Google ScholarDigital Library
- Buffering databse operations for enhanced instruction cache performance
Recommendations
A Performance Study of Instruction Cache Prefetching Methods
Prefetching methods for instruction caches are studied via trace-driven simulation. The two primary methods are "fall-through" prefetch (sometimes referred to as "one block lookahead") and "target" prefetch. Fall-through prefetches are for sequential ...
Balanced Instruction Cache: Reducing Conflict Misses of Direct-Mapped Caches through Balanced Subarray Accesses
It is observed that the limited memory space of directmapped caches is not used in balance therefore incurs extra conflict misses. We propose a novel cache organization of a balanced cache, which balances accesses to cache sets at the granularity of ...
Comments