research-article

OLTP in wonderland: where do cache misses come from in major OLTP components?

Authors:
Pınar Tözün, EPFL
Brian Gold, Oracle Labs
Anastasia Ailamaki, EPFL

DaMoN '13: Proceedings of the Ninth International Workshop on Data Management on New Hardware, June 2013, Article No. 8, pages 1–6. https://doi.org/10.1145/2485278.2485286

Published: 24 June 2013


ABSTRACT

For several decades, online transaction processing (OLTP) has been one of the main applications driving innovation in the data management ecosystem and, in turn, in the database and computer architecture communities. Despite novel approaches from industry and numerous research proposals from academia, recent studies emphasize that OLTP workloads still cannot exploit the full capability of modern processors.

To better integrate OLTP and hardware in future systems, we perform a detailed analysis of instruction and data misses, the main causes of memory stalls. We show which operations and components of a typical storage manager cause the majority of each type of miss at each level of the memory hierarchy, on a configuration that closely represents modern commodity hardware. We also examine how the size of the data working set affects these misses.
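Attributing misses to storage-manager components can be pictured as replaying a memory trace, in which each access is tagged with the component that issued it, through a cache model and counting misses per tag. The sketch below is only an illustration of that idea under stated assumptions: a single fully associative LRU cache with 64-byte lines, and a small hand-written trace whose component labels and addresses are hypothetical. It is not the measurement setup used in the paper, which the abstract does not describe.

```python
from collections import Counter, OrderedDict


class LRUCache:
    """Toy fully associative cache with LRU replacement (not a model of a real CPU)."""

    def __init__(self, capacity_lines: int, line_size: int = 64):
        self.capacity_lines = capacity_lines
        self.line_size = line_size
        self.lines = OrderedDict()  # cache-line tag -> None, ordered oldest to newest

    def access(self, address: int) -> bool:
        """Return True on a hit; on a miss, fill the line and return False."""
        tag = address // self.line_size
        if tag in self.lines:
            self.lines.move_to_end(tag)  # mark as most recently used
            return True
        if len(self.lines) >= self.capacity_lines:
            self.lines.popitem(last=False)  # evict the least recently used line
        self.lines[tag] = None
        return False


def misses_by_component(trace, cache: LRUCache) -> Counter:
    """Replay (component, address) pairs and count misses per component label."""
    misses = Counter()
    for component, address in trace:
        if not cache.access(address):
            misses[component] += 1
    return misses


# Hypothetical trace: labels and addresses are invented for illustration only.
trace = [
    ("btree", 0), ("lock", 4096), ("btree", 64),
    ("buffer", 8192), ("btree", 0), ("lock", 4096),
]
result = misses_by_component(trace, LRUCache(capacity_lines=8))
print(dict(result))  # {'btree': 2, 'lock': 1, 'buffer': 1}
```

The last two accesses hit because their lines are still resident, so only first-touch accesses are charged to each component here. A real analysis would additionally separate instruction from data accesses and model each cache level.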

According to our experimental results, L1 instruction misses are a significant cause of overall stall time in OLTP, even for data working sets as large as 100GB, as long as the data fits in memory. Capacity misses from the index probe operation are the dominant cause of both instruction and data stalls when running typical OLTP workloads. During index probes (one of the most common operations in OLTP), the B-tree, lock, and buffer management components of the storage manager are responsible for more than half of all misses.
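Why capacity misses can dominate is easy to see in a toy model. When a transaction repeatedly executes a code path whose footprint exceeds the cache, LRU replacement evicts each line just before it is needed again, so every access misses. When the footprint fits, only the first (cold) accesses miss. The sketch assumes a fully associative LRU cache sized like a 32KB L1 with 64-byte lines (512 lines), purely for illustration. Real L1 caches are set-associative and usually only approximate LRU, so actual miss counts will differ.

```python
from collections import OrderedDict


def lru_misses(trace, capacity_lines: int) -> int:
    """Count misses of a fully associative LRU cache over a trace of cache-line tags."""
    cache = OrderedDict()
    misses = 0
    for tag in trace:
        if tag in cache:
            cache.move_to_end(tag)  # hit: mark as most recently used
        else:
            misses += 1
            if len(cache) >= capacity_lines:
                cache.popitem(last=False)  # evict the least recently used line
            cache[tag] = None
    return misses


def loop_trace(footprint_lines: int, passes: int):
    """Walk the same footprint repeatedly, like a transaction re-running one code path."""
    return [line for _ in range(passes) for line in range(footprint_lines)]


CAPACITY = 32 * 1024 // 64  # assumed 32KB cache with 64-byte lines = 512 lines

fits = lru_misses(loop_trace(400, passes=10), CAPACITY)    # footprint fits in the cache
spills = lru_misses(loop_trace(600, passes=10), CAPACITY)  # footprint exceeds capacity
print(fits, spills)  # 400 6000
```

Increasing the footprint from 400 to 600 lines (1.5x) raises misses from 400 to 6000 over ten passes. This cliff is characteristic of capacity misses under LRU with a cyclic access pattern.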



Published in
DaMoN '13: Proceedings of the Ninth International Workshop on Data Management on New Hardware
June 2013
65 pages
ISBN: 9781450321969
DOI: 10.1145/2485278
Conference Chairs: Ryan Johnson (University of Toronto), Alfons Kemper (Technische Universität München)
Copyright © 2013 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Acceptance Rates
Overall Acceptance Rate: 80 of 102 submissions, 78%
