skip to main content
10.1145/3533737.3535100acmotherconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

HPCache: Memory-Efficient OLAP Through Proportional Caching

Published:13 June 2022Publication History

ABSTRACT

Analytical engines rely on in-memory caching to avoid disk accesses and provide timely responses by keeping the most frequently accessed data in memory. Purely frequency- & time-based caching decisions, however, are a proxy of the expected query execution speedup only when disk accesses are significantly slower than in-memory query processing. On the other hand, fast storage offers loading times that approach or even outperform fully in-memory query execution response times, rendering purely frequency-based statistics incapable of capturing impact of a caching decision on query execution. For example, caching the input of a frequent query that spends most of its time processing joins is less beneficial than caching a page for a slightly less frequent but scan-heavy query. As a result, existing caching policies waste valuable memory space to cache input data that offer little-to-no acceleration for analytics.

This paper proposes HPCache, a buffer management policy that enables fast analytics on high-bandwidth storage by efficiently using the available in-memory space. HPCache caches data based on their speedup potential instead of relying on frequency-based statistics. We show that, with fast storage, the benefit of in-memory caching varies significantly across queries; therefore, we quantify the efficiency of caching decisions and formulate an optimization problem. We implement HPCache in Proteus and show that i) estimating speedup potential improves memory space utilization, and ii) simple runtime statistics suffice to infer speedup expectations. We show that HPCache achieves up to 12% faster query execution over state-of-the-art caching policies, or 75% less in-memory cache footprint without deteriorating query performance. Overall, HPCache enables efficient use of the in-memory space for input caching in the presence of fast storage, without any requirement for workload predictions.

References

  1. Akshay Agrawal, Robin Verschueren, Steven Diamond, and Stephen Boyd. 2018. A rewriting system for convex optimization problems. Journal of Control and Decision 5, 1 (2018), 42–60.Google ScholarGoogle ScholarCross RefCross Ref
  2. Manos Athanassoulis, Kenneth S. Bøgh, and Stratos Idreos. 2019. Optimal column layout for hybrid workloads. Proceedings of the VLDB Endowment 12, 13 (Sept. 2019), 2393–2407. https://doi.org/10.14778/3358701.3358707Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Maximilian Bandle, Jana Giceva, and Thomas Neumann. 2021. To Partition, or Not to Partition, That is the Join Question in a Real System. In SIGMOD ’21: International Conference on Management of Data, Virtual Event, China, June 20-25, 2021, Guoliang Li, Zhanhuai Li, Stratos Idreos, and Divesh Srivastava (Eds.). ACM, 168–180. https://doi.org/10.1145/3448016.3452831Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Spyros Blanas, Yinan Li, and Jignesh M. Patel. 2011. Design and evaluation of main memory hash join algorithms for multi-core CPUs. In Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2011, Athens, Greece, June 12-16, 2011, Timos K. Sellis, Renée J. Miller, Anastasios Kementsietsidis, and Yannis Velegrakis (Eds.). ACM, 37–48. https://doi.org/10.1145/1989323.1989328Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Peter A. Boncz, Martin L. Kersten, and Stefan Manegold. 2008. Breaking the memory wall in MonetDB. Commun. ACM 51, 12 (Dec. 2008), 77–85. https://doi.org/10.1145/1409360.1409380Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Peter A. Boncz, Stefan Manegold, and Martin L. Kersten. 1999. Database architecture optimized for the new bottleneck: Memory access. In VLDB’99, proceedings of 25th international conference on very large data bases, september 7-10, 1999, edinburgh, scotland, UK, Malcolm P. Atkinson, Maria E. Orlowska, Patrick Valduriez, Stanley B. Zdonik, and Michael L. Brodie (Eds.). Morgan Kaufmann, 54–65. http://www.vldb.org/conf/1999/P5.pdf tex.bibsource: dblp computer science bibliography, https://dblp.org tex.biburl: https://dblp.org/rec/conf/vldb/BonczMK99.bib tex.timestamp: Wed, 11 May 2022 08:53:25 +0200.Google ScholarGoogle Scholar
  7. Peter A. Boncz, Marcin Zukowski, and Niels Nes. 2005. MonetDB/X100: Hyper-Pipelining Query Execution. In Second Biennial Conference on Innovative Data Systems Research, CIDR 2005, Asilomar, CA, USA, January 4-7, 2005, Online Proceedings. CIDR, 225–237. http://cidrdb.org/cidr2005/papers/P19.pdfGoogle ScholarGoogle Scholar
  8. Renata Borovica-Gajic, Raja Appuswamy, and Anastasia Ailamaki. 2016. Cheap data analytics using cold storage devices. Proc. VLDB Endow. 9, 12 (2016), 1029–1040. https://doi.org/10.14778/2994509.2994521Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Mustafa Canim, George A. Mihaila, Bishwaranjan Bhattacharjee, Kenneth A. Ross, and Christian A. Lang. 2010. SSD Bufferpool Extensions for Database Systems. Proc. VLDB Endow. 3, 2 (2010), 1435–1446. https://doi.org/10.14778/1920841.1921017Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Hong-Tai Chou and David J. DeWitt. 1986. An Evaluation of Buffer Management Strategies for Relational Database Systems. Algorithmica 1, 3 (1986), 311–336. https://doi.org/10.1007/BF01840450Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Periklis Chrysogelos, Manos Karpathiotakis, Raja Appuswamy, and Anastasia Ailamaki. 2019. HetExchange: encapsulating heterogeneous CPU-GPU parallelism in JIT compiled engines. Proceedings of the VLDB Endowment 12, 5 (Jan. 2019), 544–556. https://doi.org/10.14778/3303753.3303760Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Andrew Crotty, Viktor Leis, and Andrew Pavlo. 2022. Are You Sure You Want to Use MMAP in Your Database Management System?. In {CIDR} 2022, Conference on Innovative Data Systems Research. 7.Google ScholarGoogle Scholar
  13. Advanced Micro Devices. 2021. AMD EPYC™ 7003 SERIES Data Sheet. Advanced Micro Devices. https://www.amd.com/system/files/documents/amd-epyc-7003-series-datasheet.pdfGoogle ScholarGoogle Scholar
  14. Steven Diamond and Stephen Boyd. 2016. CVXPY: A Python-embedded modeling language for convex optimization. Journal of Machine Learning Research 17, 83 (2016), 1–5.Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Jaeyoung Do, Ivan Luiz Picoli, David B. Lomet, and Philippe Bonnet. 2021. Better database cost/performance via batched I/O on programmable SSD. VLDB J. 30, 3 (2021), 403–424. https://doi.org/10.1007/s00778-020-00648-zGoogle ScholarGoogle ScholarDigital LibraryDigital Library
  16. Hadi Esmaeilzadeh, Emily R. Blem, Renée St. Amant, Karthikeyan Sankaralingam, and Doug Burger. 2012. Dark Silicon and the End of Multicore Scaling. IEEE Micro 32, 3 (2012), 122–134. https://doi.org/10.1109/MM.2012.17Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Goetz Graefe, Haris Volos, Hideaki Kimura, Harumi Kuno, Joseph Tucek, Mark Lillibridge, and Alistair Veitch. 2014. In-memory performance for big data. Proceedings of the VLDB Endowment 8, 1 (Sept. 2014), 37–48. https://doi.org/10.14778/2735461.2735465Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Stratos Idreos, F. Groffen, Niels Nes, Stefan Manegold, Sjoerd Mullender, and Martin Kersten. 2012. MonetDB: Two Decades of Research in Column-oriented Database Architectures. IEEE Data Eng. Bull. 35 (Jan. 2012), 40–45. http://sites.computer.org/debull/A12mar/monetdb.pdfGoogle ScholarGoogle Scholar
  19. Ryan Johnson, Ippokratis Pandis, Nikos Hardavellas, Anastasia Ailamaki, and Babak Falsafi. 2009. Shore-MT: a scalable storage manager for the multicore era. In Proceedings of the 12th International Conference on Extending Database Technology Advances in Database Technology - EDBT ’09. ACM Press, Saint Petersburg, Russia, 24. https://doi.org/10.1145/1516360.1516365Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Theodore Johnson and Dennis E. Shasha. 1994. 2Q: A Low Overhead High Performance Buffer Management Replacement Algorithm. In VLDB’94, Proceedings of 20th International Conference on Very Large Data Bases, September 12-15, 1994, Santiago de Chile, Chile, Jorge B. Bocca, Matthias Jarke, and Carlo Zaniolo (Eds.). Morgan Kaufmann, 439–450. http://www.vldb.org/conf/1994/P439.PDFGoogle ScholarGoogle Scholar
  21. Manos Karpathiotakis, Ioannis Alagiannis, and Anastasia Ailamaki. 2016. Fast queries over heterogeneous data through engine customization. Proceedings of the VLDB Endowment 9, 12 (Aug. 2016), 972–983. https://doi.org/10.14778/2994509.2994516Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Viktor Leis, Michael Haubenschild, Alfons Kemper, and Thomas Neumann. 2018. LeanStore: In-Memory Data Management beyond Main Memory. In 2018 IEEE 34th International Conference on Data Engineering (ICDE). IEEE, Paris, 185–196. https://doi.org/10.1109/ICDE.2018.00026Google ScholarGoogle ScholarCross RefCross Ref
  23. John D. McCalpin. 1991-2007. STREAM: Sustainable Memory Bandwidth in High Performance Computers. Technical Report. University of Virginia, Charlottesville, Virginia. http://www.cs.virginia.edu/stream/A continually updated technical report. http://www.cs.virginia.edu/stream/.Google ScholarGoogle Scholar
  24. John D. McCalpin. 1995. Memory Bandwidth and Machine Balance in Current High Performance Computers. IEEE Computer Society Technical Committee on Computer Architecture (TCCA) Newsletter (Dec. 1995), 19–25.Google ScholarGoogle Scholar
  25. Thomas Neumann. 2011. Efficiently Compiling Efficient Query Plans for Modern Hardware. Proc. VLDB Endow. 4, 9 (2011), 539–550. https://doi.org/10.14778/2002938.2002940Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Thomas Neumann and Michael J. Freitag. 2020. Umbra: A Disk-Based System with In-Memory Performance. In 10th Conference on Innovative Data Systems Research, CIDR 2020, Amsterdam, The Netherlands, January 12-15, 2020, Online Proceedings. CIDR. http://cidrdb.org/cidr2020/papers/p29-neumann-cidr20.pdfGoogle ScholarGoogle Scholar
  27. Chris Nyberg. 1984. Disk scheduling and cache replacement for a database machine. Master’s thesis. UC Berkeley.Google ScholarGoogle Scholar
  28. Patrick O’Neil, Elizabeth O’Neil, Xuedong Chen, and Stephen Revilak. 2009. The Star Schema Benchmark and Augmented Fact Table Indexing. In Performance Evaluation and Benchmarking(Lecture Notes in Computer Science), Raghunath Nambiar and Meikel Poess (Eds.). Springer, Berlin, Heidelberg, 237–252. https://doi.org/10.1007/978-3-642-10424-4_17Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Aunn Raza, Periklis Chrysogelos, Angelos Christos Anadiotis, and Anastasia Ailamaki. 2020. Adaptive HTAP through Elastic Resource Scheduling. arXiv:2004.05437 [cs, eess] (April 2020). http://arxiv.org/abs/2004.05437 arXiv:2004.05437.Google ScholarGoogle Scholar
  30. Allen Reiter. 1976. A Study of Buffer Management Policies for Data Management Systems.Technical Report. WISCONSIN UNIV MADISON MATHEMATICS RESEARCH CENTER.Google ScholarGoogle Scholar
  31. Giovanni Maria Sacco and Mario Schkolnick. 1982. A Mechanism for Managing the Buffer Pool in a Relational Database System Using the Hot Set Model. In Eigth International Conference on Very Large Data Bases, September 8-10, 1982, Mexico City, Mexico, Proceedings. Morgan Kaufmann, 257–262. http://www.vldb.org/conf/1982/P257.PDFGoogle ScholarGoogle ScholarDigital LibraryDigital Library
  32. Allen Samuels. 2016. The Consequences of Infinite Storage Bandwidth. (2016). https://events.static.linuxfound.org/sites/events/files/slides/Keynote_Allen%20Samuels_Final.pdf Vault Linux Storage & Filesystems Conference.Google ScholarGoogle Scholar
  33. Utku Sirin and Anastasia Ailamaki. 2020. Micro-architectural Analysis of OLAP: Limitations and Opportunities. Proc. VLDB Endow. 13, 6 (2020), 840–853. https://doi.org/10.14778/3380750.3380755Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Michael Stonebraker, John Woodfill, Jeff Ranstrom, Marguerite C. Murphy, Marc Meyer, and Eric Allman. 1983. Performance Enhancements to a Relational Database System. ACM Trans. Database Syst. 8, 2 (1983), 167–185. https://doi.org/10.1145/319983.319984Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Lukas Vogel, Viktor Leis, Alexander van Renen, Thomas Neumann, Satoshi Imamura, and Alfons Kemper. 2020. Mosaic: a budget-conscious storage engine for relational database systems. Proceedings of the VLDB Endowment 13, 12 (Aug. 2020), 2662–2675. https://doi.org/10.14778/3407790.3407852Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Yinghao Yu, Wei Wang, Jun Zhang, and Khaled Ben Letaief. 2017. LRC: Dependency-aware cache management for data analytics clusters. In 2017 IEEE Conference on Computer Communications, INFOCOM 2017, Atlanta, GA, USA, May 1-4, 2017. IEEE, 1–9. https://doi.org/10.1109/INFOCOM.2017.8057007Google ScholarGoogle ScholarCross RefCross Ref
  37. Hao Zhang, Gang Chen, Beng Chin Ooi, Kian-Lee Tan, and Meihui Zhang. 2015. In-Memory Big Data Management and Processing: A Survey. IEEE Transactions on Knowledge and Data Engineering 27, 7 (July 2015), 1920–1948. https://doi.org/10.1109/TKDE.2015.2427795Google ScholarGoogle ScholarDigital LibraryDigital Library

Recommendations

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Sign in
  • Published in

    cover image ACM Other conferences
    DaMoN '22: Proceedings of the 18th International Workshop on Data Management on New Hardware
    June 2022
    83 pages
    ISBN:9781450393782
    DOI:10.1145/3533737

    Copyright © 2022 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    • Published: 13 June 2022

    Permissions

    Request permissions about this article.

    Request Permissions

    Check for updates

    Qualifiers

    • research-article
    • Research
    • Refereed limited

    Acceptance Rates

    DaMoN '22 Paper Acceptance Rate12of18submissions,67%Overall Acceptance Rate80of102submissions,78%
  • Article Metrics

    • Downloads (Last 12 months)121
    • Downloads (Last 6 weeks)11

    Other Metrics

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

HTML Format

View this article in HTML Format .

View HTML Format