ABSTRACT
For several decades, online transaction processing (OLTP) has been one of the main applications driving innovation in the data management ecosystem and, in turn, in the database and computer architecture communities. Despite novel approaches from industry and various research proposals from academia, recent studies emphasize that OLTP workloads still cannot exploit the full capability of modern processors.
To better integrate OLTP and hardware in future systems, we perform a detailed analysis of instruction and data misses, the main causes of memory stalls. We demonstrate which operations and components of a typical storage manager cause the majority of different types of misses in each level of the memory hierarchy on a configuration that closely represents modern commodity hardware. We also observe the impact of data working set size on these misses.
According to our experimental results, L1 instruction misses are a major cause of the overall stall time for OLTP, even for data working set sizes as large as 100GB, as long as the data fits in memory. Capacity misses coming from the index probe operation are the dominant cause of instruction and data stalls when running typical OLTP workloads. During index probe, one of the most common operations in OLTP, the B-tree, lock, and buffer management components of a storage manager are responsible for more than half of the total misses.
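To make the term "capacity miss" concrete, the sketch below classifies the misses of a block-address trace with the classic three-category model (compulsory, capacity, conflict). It is only an illustration under stated assumptions: the abstract does not describe how the misses were classified, and the function name `classify_misses`, the LRU replacement policy, and the modulo set mapping are all hypothetical choices made here.

```python
from collections import OrderedDict


def classify_misses(trace, num_blocks, num_sets):
    """Count hits and compulsory/capacity/conflict misses for a trace.

    Assumptions (not from the paper): LRU replacement, set index is
    block % num_sets, and num_blocks is a multiple of num_sets.
    """
    ways = num_blocks // num_sets
    seen = set()                      # blocks referenced at least once
    full = OrderedDict()              # fully associative LRU cache, same capacity
    sets = [OrderedDict() for _ in range(num_sets)]  # the modeled cache
    counts = {"hit": 0, "compulsory": 0, "capacity": 0, "conflict": 0}

    for block in trace:
        # Reference cache: fully associative with LRU replacement.
        full_hit = block in full
        if full_hit:
            full.move_to_end(block)
        else:
            if len(full) >= num_blocks:
                full.popitem(last=False)   # evict least recently used
            full[block] = True

        # Modeled cache: set associative with LRU replacement per set.
        cache_set = sets[block % num_sets]
        real_hit = block in cache_set
        if real_hit:
            cache_set.move_to_end(block)
        else:
            if len(cache_set) >= ways:
                cache_set.popitem(last=False)
            cache_set[block] = True

        if real_hit:
            counts["hit"] += 1
        elif block not in seen:
            counts["compulsory"] += 1      # first reference to this block
        elif not full_hit:
            counts["capacity"] += 1        # would miss even with full associativity
        else:
            counts["conflict"] += 1        # caused only by limited associativity
        seen.add(block)

    return counts


if __name__ == "__main__":
    # Five blocks cycled through a four-block fully associative cache.
    print(classify_misses([0, 1, 2, 3, 4] * 2, num_blocks=4, num_sets=1))
```

In this model, a miss counts as a capacity miss when the block has been referenced before but would also miss in a fully associative cache of the same size. This is the situation that arises when the instruction or data footprint of an operation such as index probe exceeds the cache.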