research-article

HPCache: Memory-Efficient OLAP Through Proportional Caching

Authors:
Hamish Nicholson

EPFL, CH

EPFL, CH
View Profile

,
Periklis Chrysogelos

EPFL, CH

EPFL, CH
View Profile

,
Anastasia Ailamaki

EPFL, Switzerland and RAW Labs SA, Switzerland

EPFL, Switzerland and RAW Labs SA, Switzerland
View Profile

DaMoN '22: Proceedings of the 18th International Workshop on Data Management on New HardwareJune 2022Article No.: 7Pages 1–9https://doi.org/10.1145/3533737.3535100

Published:13 June 2022Publication History

DaMoN '22: Proceedings of the 18th International Workshop on Data Management on New Hardware

Pages 1–9

ABSTRACT

Analytical engines rely on in-memory caching to avoid disk accesses and provide timely responses by keeping the most frequently accessed data in memory. Purely frequency- & time-based caching decisions, however, are a proxy of the expected query execution speedup only when disk accesses are significantly slower than in-memory query processing. On the other hand, fast storage offers loading times that approach or even outperform fully in-memory query execution response times, rendering purely frequency-based statistics incapable of capturing impact of a caching decision on query execution. For example, caching the input of a frequent query that spends most of its time processing joins is less beneficial than caching a page for a slightly less frequent but scan-heavy query. As a result, existing caching policies waste valuable memory space to cache input data that offer little-to-no acceleration for analytics.

This paper proposes HPCache, a buffer management policy that enables fast analytics on high-bandwidth storage by efficiently using the available in-memory space. HPCache caches data based on their speedup potential instead of relying on frequency-based statistics. We show that, with fast storage, the benefit of in-memory caching varies significantly across queries; therefore, we quantify the efficiency of caching decisions and formulate an optimization problem. We implement HPCache in Proteus and show that i) estimating speedup potential improves memory space utilization, and ii) simple runtime statistics suffice to infer speedup expectations. We show that HPCache achieves up to 12% faster query execution over state-of-the-art caching policies, or 75% less in-memory cache footprint without deteriorating query performance. Overall, HPCache enables efficient use of the in-memory space for input caching in the presence of fast storage, without any requirement for workload predictions.

References

Akshay Agrawal, Robin Verschueren, Steven Diamond, and Stephen Boyd. 2018. A rewriting system for convex optimization problems. Journal of Control and Decision 5, 1 (2018), 42–60.Google ScholarCross Ref
Manos Athanassoulis, Kenneth S. Bøgh, and Stratos Idreos. 2019. Optimal column layout for hybrid workloads. Proceedings of the VLDB Endowment 12, 13 (Sept. 2019), 2393–2407. https://doi.org/10.14778/3358701.3358707Google ScholarDigital Library
Maximilian Bandle, Jana Giceva, and Thomas Neumann. 2021. To Partition, or Not to Partition, That is the Join Question in a Real System. In SIGMOD ’21: International Conference on Management of Data, Virtual Event, China, June 20-25, 2021, Guoliang Li, Zhanhuai Li, Stratos Idreos, and Divesh Srivastava (Eds.). ACM, 168–180. https://doi.org/10.1145/3448016.3452831Google ScholarDigital Library
Spyros Blanas, Yinan Li, and Jignesh M. Patel. 2011. Design and evaluation of main memory hash join algorithms for multi-core CPUs. In Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2011, Athens, Greece, June 12-16, 2011, Timos K. Sellis, Renée J. Miller, Anastasios Kementsietsidis, and Yannis Velegrakis (Eds.). ACM, 37–48. https://doi.org/10.1145/1989323.1989328Google ScholarDigital Library
Peter A. Boncz, Martin L. Kersten, and Stefan Manegold. 2008. Breaking the memory wall in MonetDB. Commun. ACM 51, 12 (Dec. 2008), 77–85. https://doi.org/10.1145/1409360.1409380Google ScholarDigital Library
Peter A. Boncz, Stefan Manegold, and Martin L. Kersten. 1999. Database architecture optimized for the new bottleneck: Memory access. In VLDB’99, proceedings of 25th international conference on very large data bases, september 7-10, 1999, edinburgh, scotland, UK, Malcolm P. Atkinson, Maria E. Orlowska, Patrick Valduriez, Stanley B. Zdonik, and Michael L. Brodie (Eds.). Morgan Kaufmann, 54–65. http://www.vldb.org/conf/1999/P5.pdf tex.bibsource: dblp computer science bibliography, https://dblp.org tex.biburl: https://dblp.org/rec/conf/vldb/BonczMK99.bib tex.timestamp: Wed, 11 May 2022 08:53:25 +0200.Google Scholar
Peter A. Boncz, Marcin Zukowski, and Niels Nes. 2005. MonetDB/X100: Hyper-Pipelining Query Execution. In Second Biennial Conference on Innovative Data Systems Research, CIDR 2005, Asilomar, CA, USA, January 4-7, 2005, Online Proceedings. CIDR, 225–237. http://cidrdb.org/cidr2005/papers/P19.pdfGoogle Scholar
Renata Borovica-Gajic, Raja Appuswamy, and Anastasia Ailamaki. 2016. Cheap data analytics using cold storage devices. Proc. VLDB Endow. 9, 12 (2016), 1029–1040. https://doi.org/10.14778/2994509.2994521Google ScholarDigital Library
Mustafa Canim, George A. Mihaila, Bishwaranjan Bhattacharjee, Kenneth A. Ross, and Christian A. Lang. 2010. SSD Bufferpool Extensions for Database Systems. Proc. VLDB Endow. 3, 2 (2010), 1435–1446. https://doi.org/10.14778/1920841.1921017Google ScholarDigital Library
Hong-Tai Chou and David J. DeWitt. 1986. An Evaluation of Buffer Management Strategies for Relational Database Systems. Algorithmica 1, 3 (1986), 311–336. https://doi.org/10.1007/BF01840450Google ScholarDigital Library
Periklis Chrysogelos, Manos Karpathiotakis, Raja Appuswamy, and Anastasia Ailamaki. 2019. HetExchange: encapsulating heterogeneous CPU-GPU parallelism in JIT compiled engines. Proceedings of the VLDB Endowment 12, 5 (Jan. 2019), 544–556. https://doi.org/10.14778/3303753.3303760Google ScholarDigital Library
Andrew Crotty, Viktor Leis, and Andrew Pavlo. 2022. Are You Sure You Want to Use MMAP in Your Database Management System?. In {CIDR} 2022, Conference on Innovative Data Systems Research. 7.Google Scholar
Advanced Micro Devices. 2021. AMD EPYC™ 7003 SERIES Data Sheet. Advanced Micro Devices. https://www.amd.com/system/files/documents/amd-epyc-7003-series-datasheet.pdfGoogle Scholar
Steven Diamond and Stephen Boyd. 2016. CVXPY: A Python-embedded modeling language for convex optimization. Journal of Machine Learning Research 17, 83 (2016), 1–5.Google ScholarDigital Library
Jaeyoung Do, Ivan Luiz Picoli, David B. Lomet, and Philippe Bonnet. 2021. Better database cost/performance via batched I/O on programmable SSD. VLDB J. 30, 3 (2021), 403–424. https://doi.org/10.1007/s00778-020-00648-zGoogle ScholarDigital Library
Hadi Esmaeilzadeh, Emily R. Blem, Renée St. Amant, Karthikeyan Sankaralingam, and Doug Burger. 2012. Dark Silicon and the End of Multicore Scaling. IEEE Micro 32, 3 (2012), 122–134. https://doi.org/10.1109/MM.2012.17Google ScholarDigital Library
Goetz Graefe, Haris Volos, Hideaki Kimura, Harumi Kuno, Joseph Tucek, Mark Lillibridge, and Alistair Veitch. 2014. In-memory performance for big data. Proceedings of the VLDB Endowment 8, 1 (Sept. 2014), 37–48. https://doi.org/10.14778/2735461.2735465Google ScholarDigital Library
Stratos Idreos, F. Groffen, Niels Nes, Stefan Manegold, Sjoerd Mullender, and Martin Kersten. 2012. MonetDB: Two Decades of Research in Column-oriented Database Architectures. IEEE Data Eng. Bull. 35 (Jan. 2012), 40–45. http://sites.computer.org/debull/A12mar/monetdb.pdfGoogle Scholar
Ryan Johnson, Ippokratis Pandis, Nikos Hardavellas, Anastasia Ailamaki, and Babak Falsafi. 2009. Shore-MT: a scalable storage manager for the multicore era. In Proceedings of the 12th International Conference on Extending Database Technology Advances in Database Technology - EDBT ’09. ACM Press, Saint Petersburg, Russia, 24. https://doi.org/10.1145/1516360.1516365Google ScholarDigital Library
Theodore Johnson and Dennis E. Shasha. 1994. 2Q: A Low Overhead High Performance Buffer Management Replacement Algorithm. In VLDB’94, Proceedings of 20th International Conference on Very Large Data Bases, September 12-15, 1994, Santiago de Chile, Chile, Jorge B. Bocca, Matthias Jarke, and Carlo Zaniolo (Eds.). Morgan Kaufmann, 439–450. http://www.vldb.org/conf/1994/P439.PDFGoogle Scholar
Manos Karpathiotakis, Ioannis Alagiannis, and Anastasia Ailamaki. 2016. Fast queries over heterogeneous data through engine customization. Proceedings of the VLDB Endowment 9, 12 (Aug. 2016), 972–983. https://doi.org/10.14778/2994509.2994516Google ScholarDigital Library
Viktor Leis, Michael Haubenschild, Alfons Kemper, and Thomas Neumann. 2018. LeanStore: In-Memory Data Management beyond Main Memory. In 2018 IEEE 34th International Conference on Data Engineering (ICDE). IEEE, Paris, 185–196. https://doi.org/10.1109/ICDE.2018.00026Google ScholarCross Ref
John D. McCalpin. 1991-2007. STREAM: Sustainable Memory Bandwidth in High Performance Computers. Technical Report. University of Virginia, Charlottesville, Virginia. http://www.cs.virginia.edu/stream/A continually updated technical report. http://www.cs.virginia.edu/stream/.Google Scholar
John D. McCalpin. 1995. Memory Bandwidth and Machine Balance in Current High Performance Computers. IEEE Computer Society Technical Committee on Computer Architecture (TCCA) Newsletter (Dec. 1995), 19–25.Google Scholar
Thomas Neumann. 2011. Efficiently Compiling Efficient Query Plans for Modern Hardware. Proc. VLDB Endow. 4, 9 (2011), 539–550. https://doi.org/10.14778/2002938.2002940Google ScholarDigital Library
Thomas Neumann and Michael J. Freitag. 2020. Umbra: A Disk-Based System with In-Memory Performance. In 10th Conference on Innovative Data Systems Research, CIDR 2020, Amsterdam, The Netherlands, January 12-15, 2020, Online Proceedings. CIDR. http://cidrdb.org/cidr2020/papers/p29-neumann-cidr20.pdfGoogle Scholar
Chris Nyberg. 1984. Disk scheduling and cache replacement for a database machine. Master’s thesis. UC Berkeley.Google Scholar
Patrick O’Neil, Elizabeth O’Neil, Xuedong Chen, and Stephen Revilak. 2009. The Star Schema Benchmark and Augmented Fact Table Indexing. In Performance Evaluation and Benchmarking(Lecture Notes in Computer Science), Raghunath Nambiar and Meikel Poess (Eds.). Springer, Berlin, Heidelberg, 237–252. https://doi.org/10.1007/978-3-642-10424-4_17Google ScholarDigital Library
Aunn Raza, Periklis Chrysogelos, Angelos Christos Anadiotis, and Anastasia Ailamaki. 2020. Adaptive HTAP through Elastic Resource Scheduling. arXiv:2004.05437 [cs, eess] (April 2020). http://arxiv.org/abs/2004.05437 arXiv:2004.05437.Google Scholar
Allen Reiter. 1976. A Study of Buffer Management Policies for Data Management Systems.Technical Report. WISCONSIN UNIV MADISON MATHEMATICS RESEARCH CENTER.Google Scholar
Giovanni Maria Sacco and Mario Schkolnick. 1982. A Mechanism for Managing the Buffer Pool in a Relational Database System Using the Hot Set Model. In Eigth International Conference on Very Large Data Bases, September 8-10, 1982, Mexico City, Mexico, Proceedings. Morgan Kaufmann, 257–262. http://www.vldb.org/conf/1982/P257.PDFGoogle ScholarDigital Library
Allen Samuels. 2016. The Consequences of Infinite Storage Bandwidth. (2016). https://events.static.linuxfound.org/sites/events/files/slides/Keynote_Allen%20Samuels_Final.pdf Vault Linux Storage & Filesystems Conference.Google Scholar
Utku Sirin and Anastasia Ailamaki. 2020. Micro-architectural Analysis of OLAP: Limitations and Opportunities. Proc. VLDB Endow. 13, 6 (2020), 840–853. https://doi.org/10.14778/3380750.3380755Google ScholarDigital Library
Michael Stonebraker, John Woodfill, Jeff Ranstrom, Marguerite C. Murphy, Marc Meyer, and Eric Allman. 1983. Performance Enhancements to a Relational Database System. ACM Trans. Database Syst. 8, 2 (1983), 167–185. https://doi.org/10.1145/319983.319984Google ScholarDigital Library
Lukas Vogel, Viktor Leis, Alexander van Renen, Thomas Neumann, Satoshi Imamura, and Alfons Kemper. 2020. Mosaic: a budget-conscious storage engine for relational database systems. Proceedings of the VLDB Endowment 13, 12 (Aug. 2020), 2662–2675. https://doi.org/10.14778/3407790.3407852Google ScholarDigital Library
Yinghao Yu, Wei Wang, Jun Zhang, and Khaled Ben Letaief. 2017. LRC: Dependency-aware cache management for data analytics clusters. In 2017 IEEE Conference on Computer Communications, INFOCOM 2017, Atlanta, GA, USA, May 1-4, 2017. IEEE, 1–9. https://doi.org/10.1109/INFOCOM.2017.8057007Google ScholarCross Ref
Hao Zhang, Gang Chen, Beng Chin Ooi, Kian-Lee Tan, and Meihui Zhang. 2015. In-Memory Big Data Management and Processing: A Survey. IEEE Transactions on Knowledge and Data Engineering 27, 7 (July 2015), 1920–1948. https://doi.org/10.1109/TKDE.2015.2427795Google ScholarDigital Library

Recommendations

Criticality aware tiered cache hierarchy: a fundamental relook at multi-level cache hierarchies
ISCA '18: Proceedings of the 45th Annual International Symposium on Computer Architecture

On-die caches are a popular method to help hide the main memory latency. However, it is difficult to build large caches without substantially increasing their access latency, which in turn hurts performance. To overcome this difficulty, on-die caches ...
Read More
An integrated prefetching/caching scheme in multimedia servers

Advances in storage and networking technologies have made on-demand multimedia services prevalent these days. To provide numerous concurrent users with such high-quality services, it is essential for storage systems to support sufficient I/O bandwidth ...
Read More
FrozenHot Cache: Rethinking Cache Management for Modern Hardware
EuroSys '23: Proceedings of the Eighteenth European Conference on Computer Systems

Caching is crucial for accelerating data access, employed as a ubiquitous design in modern systems at many parts of computer systems. With increasing core count, and shrinking latency gap between cache and modern storage devices, hit-path scalability ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
DaMoN '22: Proceedings of the 18th International Workshop on Data Management on New Hardware
June 2022
83 pages
ISBN:9781450393782
DOI:10.1145/3533737
Editors:
Spyros Blanas
The Ohio State University
,
Norman May
SAP SE
Copyright © 2022 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 13 June 2022
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
NVMe
analytics
buffer management
caching
storage
Qualifiers
- research-article
- Research
- Refereed limited
Conference

Acceptance Rates
DaMoN '22 Paper Acceptance Rate12of18submissions,67%Overall Acceptance Rate80of102submissions,78%
More
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 0
  Total Citations
  View Citations
- 359
  Total Downloads
- Downloads (Last 12 months)121
- Downloads (Last 6 weeks)11
Other Metrics
View Author Metrics
Cited By
This publication has not been cited yet

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

HTML Format

View this article in HTML Format .

View HTML Format

HPCache: Memory-Efficient OLAP Through Proportional Caching

DaMoN '22: Proceedings of the 18th International Workshop on Data Management on New Hardware

ABSTRACT

References

Cited By

Recommendations

Criticality aware tiered cache hierarchy: a fundamental relook at multi-level cache hierarchies

An integrated prefetching/caching scheme in multimedia servers

FrozenHot Cache: Rethinking Cache Management for Modern Hardware

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

HTML Format

Caption

HPCache: Memory-Efficient OLAP Through Proportional Caching

DaMoN '22: Proceedings of the 18th International Workshop on Data Management on New Hardware

ABSTRACT

References

Cited By

Recommendations

Criticality aware tiered cache hierarchy: a fundamental relook at multi-level cache hierarchies

An integrated prefetching/caching scheme in multimedia servers

FrozenHot Cache: Rethinking Cache Management for Modern Hardware

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

HTML Format

Share this Publication link

Share on Social Media