skip to main content
research-article

Memory-efficient hash joins

Published:01 December 2014Publication History
Skip Abstract Section

Abstract

We present new hash tables for joins, and a hash join based on them, that consumes far less memory and is usually faster than recently published in-memory joins. Our hash join is not restricted to outer tables that fit wholly in memory. Key to this hash join is a new concise hash table (CHT), a linear probing hash table that has 100% fill factor, and uses a sparse bitmap with embedded population counts to almost entirely avoid collisions. This bitmap also serves as a Bloom filter for use in multi-table joins.

We study the random access characteristics of hash joins, and renew the case for non-partitioned hash joins. We introduce a variant of partitioned joins in which only the build is partitioned, but the probe is not, as this is more efficient for large outer tables than traditional partitioned joins. This also avoids partitioning costs during the probe, while at the same time allowing parallel build without latching overheads. Additionally, we present a variant of CHT, called a concise array table (CAT), that can be used when the key domain is moderately dense. CAT is collision-free and avoids storing join keys in the hash table.

We perform a detailed comparison of CHT and CAT against leading in-memory hash joins. Our experiments show that we can reduce the memory usage by one to three orders of magnitude, while also being competitive in performance.

References

  1. Full experimental results. http://researcher.ibm.com/person/us-ravijay.Google ScholarGoogle Scholar
  2. Understanding hash joins. Microsoft TechNet SQL Server.Google ScholarGoogle Scholar
  3. D. J. Abadi, D. S. Myers, D. J. DeWitt, and S. R. Madden. Materialization strategies in a column-oriented DBMS. In ICDE, 2007.Google ScholarGoogle ScholarCross RefCross Ref
  4. M.-C. Albutiu, A. Kemper, and T. Neumann. Massively parallel sort-merge joins in main memory multi-core database systems. PVLDB, 5(10), 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. C. Balkesen, G. Alonso, J. Teubner, and T. Özsu. Multi-core, main-memory joins: Sort vs. hash revisited. PVLDB, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. C. Balkesen, J. Teubner, G. Alonso, and T. Özsu. Main-memory hash joins on multi-core CPUs: Tuning to the underlying hardware. In ICDE, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. S. K. Begley, Z. He, and Y.-P. P. Chen. Mcjoin: A memory-constrained join for column-store main-memory databases. In SIGMOD, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. S. Blanas, Y. Li, and J. M. Patel. Design and evaluation of main memory hash join algorithms for multi-core cpus. In SIGMOD, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. S. Chen, A. Ailamaki, P. B. Gibbons, and T. C. Mowry. Improving hash join performance through prefetching. ACM Trans. Database Syst., 32(3), 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. D. J. DeWitt et al. Implementation techniques for main memory database systems. In SIGMOD, 1984. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. C. Kim et al. Sort vs. hash revisited: Fast join implementation on modern multi-core CPUs. PLVDB, 2(2), 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. M. Kitsuregawa, H. Tanaka, and T. Moto-Oka. Application of hash to data base machine and its architecture. New Generation Computing, 1(1), 1983.Google ScholarGoogle Scholar
  13. H. Lang et al. Massively parallel NUMA-aware hash joins. In IMDM, 2013.Google ScholarGoogle Scholar
  14. V. Leis et al. Morsel-driven parallelism: A NUMA-aware query evaluation framework for the many-core age. In SIGMOD, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Y. Li et al. NUMA-aware algorithms: the case of data shuffling. In CIDR, 2013.Google ScholarGoogle Scholar
  16. L. F. Mackert and G. M. Lohman. R* Optimizer Validation and Performance Evaluation for Distributed Queries. In VLDB, 1986. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. S. Manegold, P. Boncz, and M. Kersten. Optimizing database architecture for the new bottleneck: memory access. VLDB Journal, 9(3), 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. S. Manegold, P. A. Boncz, and M. L. Kersten. What happens during a join? Dissecting CPU and memory optimization effects. In VLDB, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. R. Pagh and F. F. Rodler. Cuckoo hashing. In ESA, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. V. Raman et al. DB2 with BLU Acceleration: So much more than just a column store. PVLDB, 6, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. K. A. Ross. Efficient hash probes on modern processors. In ICDE, 2007.Google ScholarGoogle ScholarCross RefCross Ref

Index Terms

  1. Memory-efficient hash joins
      Index terms have been assigned to the content through auto-classification.

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in

      Full Access

      • Published in

        cover image Proceedings of the VLDB Endowment
        Proceedings of the VLDB Endowment  Volume 8, Issue 4
        December 2014
        132 pages

        Publisher

        VLDB Endowment

        Publication History

        • Published: 1 December 2014
        Published in pvldb Volume 8, Issue 4

        Qualifiers

        • research-article

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader