skip to main content
article

Improving hash join performance through prefetching

Published:01 August 2007Publication History
Skip Abstract Section

Abstract

Hash join algorithms suffer from extensive CPU cache stalls. This article shows that the standard hash join algorithm for disk-oriented databases (i.e. GRACE) spends over 80% of its user time stalled on CPU cache misses, and explores the use of CPU cache prefetching to improve its cache performance. Applying prefetching to hash joins is complicated by the data dependencies, multiple code paths, and inherent randomness of hashing. We present two techniques, group prefetching and software-pipelined prefetching, that overcome these complications. These schemes achieve 1.29--4.04X speedups for the join phase and 1.37--3.49X speedups for the partition phase over GRACE and simple prefetching approaches. Moreover, compared with previous cache-aware approaches (i.e. cache partitioning), the schemes are at least 36% faster on large relations and do not require exclusive use of the CPU cache to be effective. Finally, comparing the elapsed real times when disk I/Os are in the picture, our cache prefetching schemes achieve 1.12--1.84X speedups for the join phase and 1.06--1.60X speedups for the partition phase over the GRACE hash join algorithm.

References

  1. Baer, J.-L. and Chen, T.-F. 1991. An effective on-chip preloading scheme to reduce data access penalty. In Proceedings of Supercomputing'91 (Albuquerque, NM). 176--186. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Boncz, P. A., Manegold, S., and Kersten, M. L. 1999. Database architecture optimized for the new bottleneck: memory access. In Proceedings of the 25th International Conference on Very Large Data Bases. Edinburgh, Scotland, UK, 54--65. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Chen, S. 2005. Redesigning database systems in light of CPU cache prefetching. Ph.D. thesis, Carnegie Mellon University. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Chen, S., Ailamaki, A., Gibbons, P. B., and Mowry, T. C. 2004. Improving hash join performance through prefetching. In Proceedings of the 20th International Conference on Data Engineering (Boston, MA). IEEE Computer Society Press, Los Alamitos, CA, 116--127. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Chen, S., Ailamaki, A., Gibbons, P. B., and Mowry, T. C. 2005. Inspector Joins. In Proceedings of the 31st International Conference on Very Large Data Bases (Trondheim, Norway). 817--828. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Chen, S., Gibbons, P. B., and Mowry, T. C. 2001. Improving Index Performance through Prefetching. In Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data (Santa Barbara, CA). ACM, New York, 235--246. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Chen, S., Gibbons, P. B., Mowry, T. C., and Valentin, G. 2002. Fractal prefetching B+-trees: optimizing both cache and disk performance. In Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, (Madison, WI), ACM, New York, 157--168. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Gold, B. T., Ailamaki, A., Huston, L., and Falsafi, B. 2005. Accelerating database operations using a network processor. In Proceedings of the First International Workshop on Data Management on New Hardware (DaMoN 2005) (Baltimore, MD). Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Graefe, G. 1993. Query evaluation techniques for large databases. ACM Comput. Surv. 25, 2, 73--170. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Hepkin, D. 2006. Guide to multiple page size support on AIX 5L version 5.3. http://www-03.ibm.com/servers/aix/whitepapers/multiple_page.pdf.Google ScholarGoogle Scholar
  11. IBM Corporation. 2004. IBM DB2 Universal Database Administration Guide Version 8.2.Google ScholarGoogle Scholar
  12. Intel Corporation. 2004. Intel Itanium 2 Processor Reference Manual For Software Development and Optimization. Intel Corporation. Order Number: 251110-003.Google ScholarGoogle Scholar
  13. Intel Corporation. 2006. IA-32 Intel Architecture Software Developer's Manual (Volumn 3A and 3B): System Programming Guide. Intel Corporation.Google ScholarGoogle Scholar
  14. Kahle, J. A., Day, M. N., Hofstee, H. P., Jonhns, C. R., Maeurer, T. R., and Shippy, D. 2005. Introduction to the Cell Multiprocessor. IBM J. Res. Devel. 49, 4/5 (July/Sept.), 589--604. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Kalla, R. N., Sinharoy, B., and Tendler, J. M. 2004. IBM Power5 chip: A dual-core multithreaded processor. IEEE Micro 24, 2 (Mar./Apr.), 40--47. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Kennedy, K. and McKinley, K. S. 1990. Loop distribution with arbitrary control flow. In Proceedings of Supercomputing'90 (New York, NY). 407--416. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Kitsuregawa, M., Tanaka, H., and Moto-Oka, T. 1983. Application of hash to data base machine and its architecture. New Generation Comput. 1, 1, 63--74.Google ScholarGoogle ScholarCross RefCross Ref
  18. Lam, M. S. 1987. A systolic array optimizing compiler. Ph.D. dissertation, Carnegie Mellon University, Pittsburgh, PA. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Lindsay, B. 2002. Hash joins in DB2 UDB: The inside story. Carnegie Mellon DB Seminar.Google ScholarGoogle Scholar
  20. Luk, C.-K. and Mowry, T. C. 1996. Compiler-based prefetching for recursive data structures. In Proceedings of the 7th International Conference on Architectural Support for Programming Languages and Operating Systems (Cambridge, MA). 222--233. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Luk, C.-K. and Mowry, T. C. 1999. Automatic compiler-inserted prefetching for pointer-based applications. IEEE Trans. Comput. 48, 2 (Feb.), 134--141. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Manegold, S., Boncz, P. A., and Kersten, M. L. 2000. What happens during a join? Dissecting CPU and memory optimization effects. In Proceedings of the 26th International Conference on Very Large Data Bases (Cairo, Egypt). 339--350. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. McDougall, R. 2004. Supporting multiple page sizes in the solaris operating system. http://www.sun.com/blueprints/0304/817-5917.pdf.Google ScholarGoogle Scholar
  24. McFarling, S. 1993. Combining branch predictors. Tech. Rep. WRL Technical Note TN-36, Digital Equipment Corporation (June).Google ScholarGoogle Scholar
  25. Mowry, T. C. 1994. Tolerating latency through software-controlled data prefetching. Ph.D. dissertation. Stanford University. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Mowry, T. C., Lam, M. S., and Gupta, A. 1992. Design and evaluation of a compiler algorithm for prefetching. In Proceedings of the 5th International Conference on Architectural Support for Programming Languages and Operating Systems (Boston, MA). 62--73. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Nakayama, M., Kitsuregawa, M., and Takagi, M. 1988. Hash-partitioned join method using dynamic destaging strategy. In Proceedings of the 14th International Conference on Very Large Data Bases (Los Angeles, CA). 468--478. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Perfmon Project. http://www.hpl.hp.com/research/linux/perfmon/index.php4.Google ScholarGoogle Scholar
  29. Saulsbury, A., Dahlgren, F., and Stenström, P. 2000. Recency-based TLB preloading. In Proceedings of the 27th International Symposium on Computer Architecture (Vancouver, BC, Canada). 117--127. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Shapiro, L. D. 1986. Join processing in database systems with large main memories. ACM Trans. Datab. Syst. 11, 3, 239--264. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Shatdal, A., Kant, C., and Naughton, J. F. 1994. Cache conscious algorithms for relational query processing. In Proceedings of the 20th International Conference on Very Large Data Bases (Santiago de Chile). 510--521. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Sun Microsystems. UltraSPARC IV Processor Architecture Overview. Sun Microsystems. Technical Whitepaper, Version 1.0, Feb. 2004.Google ScholarGoogle Scholar
  33. TPC Benchmarks. http://www.tpc.org/.Google ScholarGoogle Scholar
  34. Zeller, H. and Gray, J. 1990. An adaptive hash join algorithm for multiuser environments. In Proceedings of the 16th International Conference on Very Large Data Bases. (Brisbane, Queensland, Australia). 186--197. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Zhou, J., Cieslewicz, J., Ross, K. A., and Shah, M. 2005. Improving database performance on simultaneous multithreading processors. In Proceedings of the 31st International Conference on Very Large Data Bases (Trondheim, Norway). 49--60. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Improving hash join performance through prefetching

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in

        Full Access

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader