article

Improving hash join performance through prefetching

Authors:
Shimin Chen

Intel Research Pittsburgh, Pittsburgh, PA

Intel Research Pittsburgh, Pittsburgh, PA
View Profile

,
Anastassia Ailamaki

Carnegie Mellon University, Pittsburgh, PA

Carnegie Mellon University, Pittsburgh, PA
View Profile

,
Phillip B. Gibbons

Intel Research Pittsburgh, Pittsburgh, PA

Intel Research Pittsburgh, Pittsburgh, PA
View Profile

,
Todd C. Mowry

Carnegie Mellon University and Intel Research Pittsburgh, Pittsburgh, PA

Carnegie Mellon University and Intel Research Pittsburgh, Pittsburgh, PA
View Profile

Authors Info & Claims

ACM Transactions on Database Systems Volume 32 Issue 3pp 17–eshttps://doi.org/10.1145/1272743.1272747

Published:01 August 2007Publication History

ACM Transactions on Database Systems

Abstract

Hash join algorithms suffer from extensive CPU cache stalls. This article shows that the standard hash join algorithm for disk-oriented databases (i.e. GRACE) spends over 80% of its user time stalled on CPU cache misses, and explores the use of CPU cache prefetching to improve its cache performance. Applying prefetching to hash joins is complicated by the data dependencies, multiple code paths, and inherent randomness of hashing. We present two techniques, group prefetching and software-pipelined prefetching, that overcome these complications. These schemes achieve 1.29--4.04X speedups for the join phase and 1.37--3.49X speedups for the partition phase over GRACE and simple prefetching approaches. Moreover, compared with previous cache-aware approaches (i.e. cache partitioning), the schemes are at least 36% faster on large relations and do not require exclusive use of the CPU cache to be effective. Finally, comparing the elapsed real times when disk I/Os are in the picture, our cache prefetching schemes achieve 1.12--1.84X speedups for the join phase and 1.06--1.60X speedups for the partition phase over the GRACE hash join algorithm.

References

Baer, J.-L. and Chen, T.-F. 1991. An effective on-chip preloading scheme to reduce data access penalty. In Proceedings of Supercomputing'91 (Albuquerque, NM). 176--186. Google ScholarDigital Library
Boncz, P. A., Manegold, S., and Kersten, M. L. 1999. Database architecture optimized for the new bottleneck: memory access. In Proceedings of the 25th International Conference on Very Large Data Bases. Edinburgh, Scotland, UK, 54--65. Google ScholarDigital Library
Chen, S. 2005. Redesigning database systems in light of CPU cache prefetching. Ph.D. thesis, Carnegie Mellon University. Google ScholarDigital Library
Chen, S., Ailamaki, A., Gibbons, P. B., and Mowry, T. C. 2004. Improving hash join performance through prefetching. In Proceedings of the 20th International Conference on Data Engineering (Boston, MA). IEEE Computer Society Press, Los Alamitos, CA, 116--127. Google ScholarDigital Library
Chen, S., Ailamaki, A., Gibbons, P. B., and Mowry, T. C. 2005. Inspector Joins. In Proceedings of the 31st International Conference on Very Large Data Bases (Trondheim, Norway). 817--828. Google ScholarDigital Library
Chen, S., Gibbons, P. B., and Mowry, T. C. 2001. Improving Index Performance through Prefetching. In Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data (Santa Barbara, CA). ACM, New York, 235--246. Google ScholarDigital Library
Chen, S., Gibbons, P. B., Mowry, T. C., and Valentin, G. 2002. Fractal prefetching B^+-trees: optimizing both cache and disk performance. In Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, (Madison, WI), ACM, New York, 157--168. Google ScholarDigital Library
Gold, B. T., Ailamaki, A., Huston, L., and Falsafi, B. 2005. Accelerating database operations using a network processor. In Proceedings of the First International Workshop on Data Management on New Hardware (DaMoN 2005) (Baltimore, MD). Google ScholarDigital Library
Graefe, G. 1993. Query evaluation techniques for large databases. ACM Comput. Surv. 25, 2, 73--170. Google ScholarDigital Library
Hepkin, D. 2006. Guide to multiple page size support on AIX 5L version 5.3. http://www-03.ibm.com/servers/aix/whitepapers/multiple_page.pdf.Google Scholar
IBM Corporation. 2004. IBM DB2 Universal Database Administration Guide Version 8.2.Google Scholar
Intel Corporation. 2004. Intel Itanium 2 Processor Reference Manual For Software Development and Optimization. Intel Corporation. Order Number: 251110-003.Google Scholar
Intel Corporation. 2006. IA-32 Intel Architecture Software Developer's Manual (Volumn 3A and 3B): System Programming Guide. Intel Corporation.Google Scholar
Kahle, J. A., Day, M. N., Hofstee, H. P., Jonhns, C. R., Maeurer, T. R., and Shippy, D. 2005. Introduction to the Cell Multiprocessor. IBM J. Res. Devel. 49, 4/5 (July/Sept.), 589--604. Google ScholarDigital Library
Kalla, R. N., Sinharoy, B., and Tendler, J. M. 2004. IBM Power5 chip: A dual-core multithreaded processor. IEEE Micro 24, 2 (Mar./Apr.), 40--47. Google ScholarDigital Library
Kennedy, K. and McKinley, K. S. 1990. Loop distribution with arbitrary control flow. In Proceedings of Supercomputing'90 (New York, NY). 407--416. Google ScholarDigital Library
Kitsuregawa, M., Tanaka, H., and Moto-Oka, T. 1983. Application of hash to data base machine and its architecture. New Generation Comput. 1, 1, 63--74.Google ScholarCross Ref
Lam, M. S. 1987. A systolic array optimizing compiler. Ph.D. dissertation, Carnegie Mellon University, Pittsburgh, PA. Google ScholarDigital Library
Lindsay, B. 2002. Hash joins in DB2 UDB: The inside story. Carnegie Mellon DB Seminar.Google Scholar
Luk, C.-K. and Mowry, T. C. 1996. Compiler-based prefetching for recursive data structures. In Proceedings of the 7th International Conference on Architectural Support for Programming Languages and Operating Systems (Cambridge, MA). 222--233. Google ScholarDigital Library
Luk, C.-K. and Mowry, T. C. 1999. Automatic compiler-inserted prefetching for pointer-based applications. IEEE Trans. Comput. 48, 2 (Feb.), 134--141. Google ScholarDigital Library
Manegold, S., Boncz, P. A., and Kersten, M. L. 2000. What happens during a join&quest; Dissecting CPU and memory optimization effects. In Proceedings of the 26th International Conference on Very Large Data Bases (Cairo, Egypt). 339--350. Google ScholarDigital Library
McDougall, R. 2004. Supporting multiple page sizes in the solaris operating system. http://www.sun.com/blueprints/0304/817-5917.pdf.Google Scholar
McFarling, S. 1993. Combining branch predictors. Tech. Rep. WRL Technical Note TN-36, Digital Equipment Corporation (June).Google Scholar
Mowry, T. C. 1994. Tolerating latency through software-controlled data prefetching. Ph.D. dissertation. Stanford University. Google ScholarDigital Library
Mowry, T. C., Lam, M. S., and Gupta, A. 1992. Design and evaluation of a compiler algorithm for prefetching. In Proceedings of the 5th International Conference on Architectural Support for Programming Languages and Operating Systems (Boston, MA). 62--73. Google ScholarDigital Library
Nakayama, M., Kitsuregawa, M., and Takagi, M. 1988. Hash-partitioned join method using dynamic destaging strategy. In Proceedings of the 14th International Conference on Very Large Data Bases (Los Angeles, CA). 468--478. Google ScholarDigital Library
Perfmon Project. http://www.hpl.hp.com/research/linux/perfmon/index.php4.Google Scholar
Saulsbury, A., Dahlgren, F., and Stenström, P. 2000. Recency-based TLB preloading. In Proceedings of the 27th International Symposium on Computer Architecture (Vancouver, BC, Canada). 117--127. Google ScholarDigital Library
Shapiro, L. D. 1986. Join processing in database systems with large main memories. ACM Trans. Datab. Syst. 11, 3, 239--264. Google ScholarDigital Library
Shatdal, A., Kant, C., and Naughton, J. F. 1994. Cache conscious algorithms for relational query processing. In Proceedings of the 20th International Conference on Very Large Data Bases (Santiago de Chile). 510--521. Google ScholarDigital Library
Sun Microsystems. UltraSPARC IV Processor Architecture Overview. Sun Microsystems. Technical Whitepaper, Version 1.0, Feb. 2004.Google Scholar
TPC Benchmarks. http://www.tpc.org/.Google Scholar
Zeller, H. and Gray, J. 1990. An adaptive hash join algorithm for multiuser environments. In Proceedings of the 16th International Conference on Very Large Data Bases. (Brisbane, Queensland, Australia). 186--197. Google ScholarDigital Library
Zhou, J., Cieslewicz, J., Ross, K. A., and Shah, M. 2005. Improving database performance on simultaneous multithreading processors. In Proceedings of the 31st International Conference on Very Large Data Bases (Trondheim, Norway). 49--60. Google ScholarDigital Library

Index Terms

Improving hash join performance through prefetching

Recommendations

Improving Hash Join Performance through Prefetching
ICDE '04: Proceedings of the 20th International Conference on Data Engineering

Hash join algorithms suffer from extensive CPU cachestalls. This paper shows that the standard hash join algorithm for disk-oriented databases (i.e. GRACE) spends over73% of its user time stalled on CPU cache misses, and explores the use of prefetching ...
Read More
Improving memory hierarchy performance with hardware prefetching and cache replacement
Read More
Stealth prefetching
Proceedings of the 2006 ASPLOS Conference

Prefetching in shared-memory multiprocessor systems is an increasingly difficult problem. As system designs grow to incorporate larger numbers of faster processors, memory latency and interconnect traffic increase. While aggressive prefetching ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in

ACM Transactions on Database Systems Volume 32, Issue 3
August 2007
260 pages
ISSN:0362-5915
EISSN:1557-4644
DOI:10.1145/1272743
Issue’s Table of Contents

Copyright © 2007 ACM
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 1 August 2007
Published in tods Volume 32, Issue 3

Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
CPU cache performance
CPU cache prefetching
Hash join
group prefetching
software-pipelined prefetching
Qualifiers
- article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 89
  Total Citations
  View Citations
- 1,542
  Total Downloads
- Downloads (Last 12 months)72
- Downloads (Last 6 weeks)8
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Improving hash join performance through prefetching

ACM Transactions on Database Systems

Abstract

References

Cited By

Index Terms

Recommendations

Improving Hash Join Performance through Prefetching

Improving memory hierarchy performance with hardware prefetching and cache replacement

Stealth prefetching

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Improving hash join performance through prefetching

ACM Transactions on Database Systems

Abstract

References

Cited By

Index Terms

Recommendations

Improving Hash Join Performance through Prefetching

Improving memory hierarchy performance with hardware prefetching and cache replacement

Stealth prefetching

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media