DOI: 10.1145/2485278.2485286

OLTP in wonderland: where do cache misses come from in major OLTP components?

Published: 24 June 2013

ABSTRACT

For several decades, online transaction processing (OLTP) has been one of the main applications that drive innovation in the data management ecosystem and, in turn, in the database and computer architecture communities. Despite novel approaches from industry and various research proposals from academia, recent studies emphasize that OLTP workloads still cannot exploit the full capability of modern processors.

To better integrate OLTP and hardware in future systems, we perform a detailed analysis of instruction and data misses, the main causes of memory stalls. We demonstrate which operations and components of a typical storage manager cause the majority of different types of misses in each level of the memory hierarchy on a configuration that closely represents modern commodity hardware. We also observe the impact of data working set size on these misses.

According to our experimental results, L1 instruction misses are a major cause of the overall stall time for OLTP, even for data working set sizes as large as 100GB, as long as the data fits in memory. Capacity misses coming from the index probe operation are the dominant cause of the instruction and data stalls when running typical OLTP workloads. During index probe (one of the most common operations in OLTP), the B-tree, lock, and buffer management components of a storage manager are responsible for more than half of the total misses.
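
To make the notion of a capacity miss concrete, the following is a minimal sketch of the standard three-way miss classification (compulsory, capacity, conflict) applied to an address trace. It is not the authors' tooling: the function name `classify_misses`, its parameters, the LRU replacement policy, and the 64-byte line size are all assumptions chosen for illustration. A miss counts as compulsory if the line was never referenced before, as capacity if a fully associative LRU cache of the same total size would also miss, and as conflict otherwise.

```python
from collections import OrderedDict


def classify_misses(trace, num_sets, ways, line_size=64):
    """Count hits and compulsory/capacity/conflict misses for an address trace.

    Hypothetical helper for illustration only; assumes LRU replacement.
    """
    capacity_lines = num_sets * ways
    seen = set()                                      # every line referenced so far
    fully_assoc = OrderedDict()                       # fully associative LRU cache, same capacity
    sets = [OrderedDict() for _ in range(num_sets)]   # the modeled set-associative cache
    counts = {"hit": 0, "compulsory": 0, "capacity": 0, "conflict": 0}

    for addr in trace:
        line = addr // line_size

        # Probe and update the fully associative reference cache.
        fa_hit = line in fully_assoc
        if fa_hit:
            fully_assoc.move_to_end(line)
        else:
            fully_assoc[line] = True
            if len(fully_assoc) > capacity_lines:
                fully_assoc.popitem(last=False)       # evict least recently used

        # Probe and update the modeled set-associative cache.
        cache_set = sets[line % num_sets]
        if line in cache_set:
            cache_set.move_to_end(line)
            counts["hit"] += 1
        else:
            cache_set[line] = True
            if len(cache_set) > ways:
                cache_set.popitem(last=False)
            if line not in seen:
                counts["compulsory"] += 1             # first reference ever
            elif not fa_hit:
                counts["capacity"] += 1               # working set exceeds cache size
            else:
                counts["conflict"] += 1               # caused by limited associativity

        seen.add(line)

    return counts
```

For example, with a single two-line set and one-byte lines, `classify_misses([0, 1, 2, 0], num_sets=1, ways=2, line_size=1)` reports three compulsory misses and one capacity miss, because the fourth access returns to a line that no cache of that size could have retained under LRU.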


Published in

DaMoN '13: Proceedings of the Ninth International Workshop on Data Management on New Hardware
June 2013, 65 pages
ISBN: 9781450321969
DOI: 10.1145/2485278

Copyright © 2013 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Overall acceptance rate: 80 of 102 submissions, 78%
