ABSTRACT
Analytical engines rely on in-memory caching to avoid disk accesses and provide timely responses by keeping the most frequently accessed data in memory. Purely frequency- & time-based caching decisions, however, are a proxy of the expected query execution speedup only when disk accesses are significantly slower than in-memory query processing. On the other hand, fast storage offers loading times that approach or even outperform fully in-memory query execution response times, rendering purely frequency-based statistics incapable of capturing impact of a caching decision on query execution. For example, caching the input of a frequent query that spends most of its time processing joins is less beneficial than caching a page for a slightly less frequent but scan-heavy query. As a result, existing caching policies waste valuable memory space to cache input data that offer little-to-no acceleration for analytics.
This paper proposes HPCache, a buffer management policy that enables fast analytics on high-bandwidth storage by efficiently using the available in-memory space. HPCache caches data based on their speedup potential instead of relying on frequency-based statistics. We show that, with fast storage, the benefit of in-memory caching varies significantly across queries; therefore, we quantify the efficiency of caching decisions and formulate an optimization problem. We implement HPCache in Proteus and show that i) estimating speedup potential improves memory space utilization, and ii) simple runtime statistics suffice to infer speedup expectations. We show that HPCache achieves up to 12% faster query execution over state-of-the-art caching policies, or 75% less in-memory cache footprint without deteriorating query performance. Overall, HPCache enables efficient use of the in-memory space for input caching in the presence of fast storage, without any requirement for workload predictions.
- Akshay Agrawal, Robin Verschueren, Steven Diamond, and Stephen Boyd. 2018. A rewriting system for convex optimization problems. Journal of Control and Decision 5, 1 (2018), 42–60.Google ScholarCross Ref
- Manos Athanassoulis, Kenneth S. Bøgh, and Stratos Idreos. 2019. Optimal column layout for hybrid workloads. Proceedings of the VLDB Endowment 12, 13 (Sept. 2019), 2393–2407. https://doi.org/10.14778/3358701.3358707Google ScholarDigital Library
- Maximilian Bandle, Jana Giceva, and Thomas Neumann. 2021. To Partition, or Not to Partition, That is the Join Question in a Real System. In SIGMOD ’21: International Conference on Management of Data, Virtual Event, China, June 20-25, 2021, Guoliang Li, Zhanhuai Li, Stratos Idreos, and Divesh Srivastava (Eds.). ACM, 168–180. https://doi.org/10.1145/3448016.3452831Google ScholarDigital Library
- Spyros Blanas, Yinan Li, and Jignesh M. Patel. 2011. Design and evaluation of main memory hash join algorithms for multi-core CPUs. In Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2011, Athens, Greece, June 12-16, 2011, Timos K. Sellis, Renée J. Miller, Anastasios Kementsietsidis, and Yannis Velegrakis (Eds.). ACM, 37–48. https://doi.org/10.1145/1989323.1989328Google ScholarDigital Library
- Peter A. Boncz, Martin L. Kersten, and Stefan Manegold. 2008. Breaking the memory wall in MonetDB. Commun. ACM 51, 12 (Dec. 2008), 77–85. https://doi.org/10.1145/1409360.1409380Google ScholarDigital Library
- Peter A. Boncz, Stefan Manegold, and Martin L. Kersten. 1999. Database architecture optimized for the new bottleneck: Memory access. In VLDB’99, proceedings of 25th international conference on very large data bases, september 7-10, 1999, edinburgh, scotland, UK, Malcolm P. Atkinson, Maria E. Orlowska, Patrick Valduriez, Stanley B. Zdonik, and Michael L. Brodie (Eds.). Morgan Kaufmann, 54–65. http://www.vldb.org/conf/1999/P5.pdf tex.bibsource: dblp computer science bibliography, https://dblp.org tex.biburl: https://dblp.org/rec/conf/vldb/BonczMK99.bib tex.timestamp: Wed, 11 May 2022 08:53:25 +0200.Google Scholar
- Peter A. Boncz, Marcin Zukowski, and Niels Nes. 2005. MonetDB/X100: Hyper-Pipelining Query Execution. In Second Biennial Conference on Innovative Data Systems Research, CIDR 2005, Asilomar, CA, USA, January 4-7, 2005, Online Proceedings. CIDR, 225–237. http://cidrdb.org/cidr2005/papers/P19.pdfGoogle Scholar
- Renata Borovica-Gajic, Raja Appuswamy, and Anastasia Ailamaki. 2016. Cheap data analytics using cold storage devices. Proc. VLDB Endow. 9, 12 (2016), 1029–1040. https://doi.org/10.14778/2994509.2994521Google ScholarDigital Library
- Mustafa Canim, George A. Mihaila, Bishwaranjan Bhattacharjee, Kenneth A. Ross, and Christian A. Lang. 2010. SSD Bufferpool Extensions for Database Systems. Proc. VLDB Endow. 3, 2 (2010), 1435–1446. https://doi.org/10.14778/1920841.1921017Google ScholarDigital Library
- Hong-Tai Chou and David J. DeWitt. 1986. An Evaluation of Buffer Management Strategies for Relational Database Systems. Algorithmica 1, 3 (1986), 311–336. https://doi.org/10.1007/BF01840450Google ScholarDigital Library
- Periklis Chrysogelos, Manos Karpathiotakis, Raja Appuswamy, and Anastasia Ailamaki. 2019. HetExchange: encapsulating heterogeneous CPU-GPU parallelism in JIT compiled engines. Proceedings of the VLDB Endowment 12, 5 (Jan. 2019), 544–556. https://doi.org/10.14778/3303753.3303760Google ScholarDigital Library
- Andrew Crotty, Viktor Leis, and Andrew Pavlo. 2022. Are You Sure You Want to Use MMAP in Your Database Management System?. In {CIDR} 2022, Conference on Innovative Data Systems Research. 7.Google Scholar
- Advanced Micro Devices. 2021. AMD EPYC™ 7003 SERIES Data Sheet. Advanced Micro Devices. https://www.amd.com/system/files/documents/amd-epyc-7003-series-datasheet.pdfGoogle Scholar
- Steven Diamond and Stephen Boyd. 2016. CVXPY: A Python-embedded modeling language for convex optimization. Journal of Machine Learning Research 17, 83 (2016), 1–5.Google ScholarDigital Library
- Jaeyoung Do, Ivan Luiz Picoli, David B. Lomet, and Philippe Bonnet. 2021. Better database cost/performance via batched I/O on programmable SSD. VLDB J. 30, 3 (2021), 403–424. https://doi.org/10.1007/s00778-020-00648-zGoogle ScholarDigital Library
- Hadi Esmaeilzadeh, Emily R. Blem, Renée St. Amant, Karthikeyan Sankaralingam, and Doug Burger. 2012. Dark Silicon and the End of Multicore Scaling. IEEE Micro 32, 3 (2012), 122–134. https://doi.org/10.1109/MM.2012.17Google ScholarDigital Library
- Goetz Graefe, Haris Volos, Hideaki Kimura, Harumi Kuno, Joseph Tucek, Mark Lillibridge, and Alistair Veitch. 2014. In-memory performance for big data. Proceedings of the VLDB Endowment 8, 1 (Sept. 2014), 37–48. https://doi.org/10.14778/2735461.2735465Google ScholarDigital Library
- Stratos Idreos, F. Groffen, Niels Nes, Stefan Manegold, Sjoerd Mullender, and Martin Kersten. 2012. MonetDB: Two Decades of Research in Column-oriented Database Architectures. IEEE Data Eng. Bull. 35 (Jan. 2012), 40–45. http://sites.computer.org/debull/A12mar/monetdb.pdfGoogle Scholar
- Ryan Johnson, Ippokratis Pandis, Nikos Hardavellas, Anastasia Ailamaki, and Babak Falsafi. 2009. Shore-MT: a scalable storage manager for the multicore era. In Proceedings of the 12th International Conference on Extending Database Technology Advances in Database Technology - EDBT ’09. ACM Press, Saint Petersburg, Russia, 24. https://doi.org/10.1145/1516360.1516365Google ScholarDigital Library
- Theodore Johnson and Dennis E. Shasha. 1994. 2Q: A Low Overhead High Performance Buffer Management Replacement Algorithm. In VLDB’94, Proceedings of 20th International Conference on Very Large Data Bases, September 12-15, 1994, Santiago de Chile, Chile, Jorge B. Bocca, Matthias Jarke, and Carlo Zaniolo (Eds.). Morgan Kaufmann, 439–450. http://www.vldb.org/conf/1994/P439.PDFGoogle Scholar
- Manos Karpathiotakis, Ioannis Alagiannis, and Anastasia Ailamaki. 2016. Fast queries over heterogeneous data through engine customization. Proceedings of the VLDB Endowment 9, 12 (Aug. 2016), 972–983. https://doi.org/10.14778/2994509.2994516Google ScholarDigital Library
- Viktor Leis, Michael Haubenschild, Alfons Kemper, and Thomas Neumann. 2018. LeanStore: In-Memory Data Management beyond Main Memory. In 2018 IEEE 34th International Conference on Data Engineering (ICDE). IEEE, Paris, 185–196. https://doi.org/10.1109/ICDE.2018.00026Google ScholarCross Ref
- John D. McCalpin. 1991-2007. STREAM: Sustainable Memory Bandwidth in High Performance Computers. Technical Report. University of Virginia, Charlottesville, Virginia. http://www.cs.virginia.edu/stream/A continually updated technical report. http://www.cs.virginia.edu/stream/.Google Scholar
- John D. McCalpin. 1995. Memory Bandwidth and Machine Balance in Current High Performance Computers. IEEE Computer Society Technical Committee on Computer Architecture (TCCA) Newsletter (Dec. 1995), 19–25.Google Scholar
- Thomas Neumann. 2011. Efficiently Compiling Efficient Query Plans for Modern Hardware. Proc. VLDB Endow. 4, 9 (2011), 539–550. https://doi.org/10.14778/2002938.2002940Google ScholarDigital Library
- Thomas Neumann and Michael J. Freitag. 2020. Umbra: A Disk-Based System with In-Memory Performance. In 10th Conference on Innovative Data Systems Research, CIDR 2020, Amsterdam, The Netherlands, January 12-15, 2020, Online Proceedings. CIDR. http://cidrdb.org/cidr2020/papers/p29-neumann-cidr20.pdfGoogle Scholar
- Chris Nyberg. 1984. Disk scheduling and cache replacement for a database machine. Master’s thesis. UC Berkeley.Google Scholar
- Patrick O’Neil, Elizabeth O’Neil, Xuedong Chen, and Stephen Revilak. 2009. The Star Schema Benchmark and Augmented Fact Table Indexing. In Performance Evaluation and Benchmarking(Lecture Notes in Computer Science), Raghunath Nambiar and Meikel Poess (Eds.). Springer, Berlin, Heidelberg, 237–252. https://doi.org/10.1007/978-3-642-10424-4_17Google ScholarDigital Library
- Aunn Raza, Periklis Chrysogelos, Angelos Christos Anadiotis, and Anastasia Ailamaki. 2020. Adaptive HTAP through Elastic Resource Scheduling. arXiv:2004.05437 [cs, eess] (April 2020). http://arxiv.org/abs/2004.05437 arXiv:2004.05437.Google Scholar
- Allen Reiter. 1976. A Study of Buffer Management Policies for Data Management Systems.Technical Report. WISCONSIN UNIV MADISON MATHEMATICS RESEARCH CENTER.Google Scholar
- Giovanni Maria Sacco and Mario Schkolnick. 1982. A Mechanism for Managing the Buffer Pool in a Relational Database System Using the Hot Set Model. In Eigth International Conference on Very Large Data Bases, September 8-10, 1982, Mexico City, Mexico, Proceedings. Morgan Kaufmann, 257–262. http://www.vldb.org/conf/1982/P257.PDFGoogle ScholarDigital Library
- Allen Samuels. 2016. The Consequences of Infinite Storage Bandwidth. (2016). https://events.static.linuxfound.org/sites/events/files/slides/Keynote_Allen%20Samuels_Final.pdf Vault Linux Storage & Filesystems Conference.Google Scholar
- Utku Sirin and Anastasia Ailamaki. 2020. Micro-architectural Analysis of OLAP: Limitations and Opportunities. Proc. VLDB Endow. 13, 6 (2020), 840–853. https://doi.org/10.14778/3380750.3380755Google ScholarDigital Library
- Michael Stonebraker, John Woodfill, Jeff Ranstrom, Marguerite C. Murphy, Marc Meyer, and Eric Allman. 1983. Performance Enhancements to a Relational Database System. ACM Trans. Database Syst. 8, 2 (1983), 167–185. https://doi.org/10.1145/319983.319984Google ScholarDigital Library
- Lukas Vogel, Viktor Leis, Alexander van Renen, Thomas Neumann, Satoshi Imamura, and Alfons Kemper. 2020. Mosaic: a budget-conscious storage engine for relational database systems. Proceedings of the VLDB Endowment 13, 12 (Aug. 2020), 2662–2675. https://doi.org/10.14778/3407790.3407852Google ScholarDigital Library
- Yinghao Yu, Wei Wang, Jun Zhang, and Khaled Ben Letaief. 2017. LRC: Dependency-aware cache management for data analytics clusters. In 2017 IEEE Conference on Computer Communications, INFOCOM 2017, Atlanta, GA, USA, May 1-4, 2017. IEEE, 1–9. https://doi.org/10.1109/INFOCOM.2017.8057007Google ScholarCross Ref
- Hao Zhang, Gang Chen, Beng Chin Ooi, Kian-Lee Tan, and Meihui Zhang. 2015. In-Memory Big Data Management and Processing: A Survey. IEEE Transactions on Knowledge and Data Engineering 27, 7 (July 2015), 1920–1948. https://doi.org/10.1109/TKDE.2015.2427795Google ScholarDigital Library
Recommendations
Criticality aware tiered cache hierarchy: a fundamental relook at multi-level cache hierarchies
ISCA '18: Proceedings of the 45th Annual International Symposium on Computer ArchitectureOn-die caches are a popular method to help hide the main memory latency. However, it is difficult to build large caches without substantially increasing their access latency, which in turn hurts performance. To overcome this difficulty, on-die caches ...
An integrated prefetching/caching scheme in multimedia servers
Advances in storage and networking technologies have made on-demand multimedia services prevalent these days. To provide numerous concurrent users with such high-quality services, it is essential for storage systems to support sufficient I/O bandwidth ...
FrozenHot Cache: Rethinking Cache Management for Modern Hardware
EuroSys '23: Proceedings of the Eighteenth European Conference on Computer SystemsCaching is crucial for accelerating data access, employed as a ubiquitous design in modern systems at many parts of computer systems. With increasing core count, and shrinking latency gap between cache and modern storage devices, hit-path scalability ...
Comments