Cache-conscious structure layout

Authors:
Trishul M. Chilimbi

Computer Sciences Department, University of Wisconsin, 1210 West Dayton St., Madison, WI

Computer Sciences Department, University of Wisconsin, 1210 West Dayton St., Madison, WI
View Profile

,
Mark D. Hill

Computer Sciences Department, University of Wisconsin, 1210 West Dayton St., Madison, WI

Computer Sciences Department, University of Wisconsin, 1210 West Dayton St., Madison, WI
View Profile

,
James R. Larus

Microsoft Research, One Microsoft Way, Redmond, WA

Microsoft Research, One Microsoft Way, Redmond, WA
View Profile

PLDI '99: Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementationMay 1999Pages 1–12https://doi.org/10.1145/301618.301633

Published:01 May 1999Publication History

PLDI '99: Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation

Pages 1–12

ABSTRACT

Hardware trends have produced an increasing disparity between processor speeds and memory access times. While a variety of techniques for tolerating or reducing memory latency have been proposed, these are rarely successful for pointer-manipulating programs.This paper explores a complementary approach that attacks the source (poor reference locality) of the problem rather than its manifestation (memory latency). It demonstrates that careful data organization and layout provides an essential mechanism to improve the cache locality of pointer-manipulating programs and consequently, their performance. It explores two placement techniques---clustering and coloring---that improve cache performance by increasing a pointer structure's spatial and temporal locality, and by reducing cache-conflicts.To reduce the cost of applying these techniques, this paper discusses two strategies---cache-conscious reorganization and cache-conscious allocation---and describes two semi-automatic tools---ccmorph and ccmalloc---that use these strategies to produce cache-conscious pointer structure layouts. ccmorph is a transparent tree reorganizer that utilizes topology information to cluster and color the structure. ccmalloc is a cache-conscious heap allocator that attempts to co-locate contemporaneously accessed data elements in the same physical cache block. Our evaluations, with microbenchmarks, several small benchmarks, and a couple of large real-world applications, demonstrate that the cache-conscious structure layouts produced by ccmorph and ccmalloc offer large performance benefits---in most cases, significantly outperforming state-of-the-art prefetching.

References

1.A. Agarwal, M. Horowitz, and J. Hennessy. "An analytical cache model." A CM Transactions on Computer Systems, 7(2): 184--215, 1989. Google ScholarDigital Library
2.A.V. Aho, P.J. Denning, and J. D. Ullman. "Principles of optimal page replacement." Journal of the ACM, 18(1):80- 93, 1971. Google ScholarDigital Library
3.J. Banerjee, W. Kim, and J. F. Garza. "Clustering a DAG for CAD databases." IEEE Transactions on Software Engineering, 14(11): 1684--1699, 1988. Google ScholarDigital Library
4.R. Bayer and C. McCreight. "Organization and maintenance of large ordered indexes." Acta Informatica, 1(3):173--189, 1972.Google ScholarDigital Library
5.Veronique Benzaken and Claude Delobel. "Enhancing performance in a persistent object store: Clustering strategies in O2.' In Technical Report 50-90, Altair, Aug. 1990.Google Scholar
6.R.K. Brayton, G D. Hachtel, A. S. Vincentelli, F. Somenzi, A. Aziz, S. Cheng, S. Edwards, S. Khatri, Y. Kukimoto, A. Pardo, S. Qadeer, R. Ranjan, S. Sarwary, T.R. Shilpe, G Swamy, and T. Villa. "VIS: a system for verification and synthesis." In Proceedings of the Eight International Conference on Computer Aided Verification, July 1996. Google ScholarDigital Library
7.Doug Burger, James R. Goodman, and Alain Kagi. "Memory bandwidth limitations of future microprocessors." In Proceedings of the 23rd Annual International Symposium on Computer Architecture, pages 78--89, May 1996. Google ScholarDigital Library
8.Brad Calder, Chandra Krintz, Simmi John, and Todd Austin. "Cache-conscious data placement." In Proceedings of the Eight International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS VIII), pages 139-149, Oct. 1998. Google ScholarDigital Library
9.David Callahan, Ken Kennedy, and Allan Poterfield. "Software prefetching." in Proceedings of the Fourth International Conjkrence on Architectural Support jbr Programming Languages and Operating Systems (ASPLOS IV), pages 40--52, April 1991. Google ScholarDigital Library
10.Steve Carr, Kathryn S. McKinley, and Chau-Wen Tseng. "Compiler optimizations for improving data locality." In Proceedings of the Sixth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS V1), pages 252-262, Oct. 1994. Google ScholarDigital Library
11.Trishul M. Chilimbi, and James R. Larus. "Using generational garbage collection to implement cache-conscious data placement." In Proceedings of the 1998 International Symposium on Memory Management, Oct. 1998. Google ScholarDigital Library
12.Trishul M. Chilimbi, Bob Davidson, and James R. Lares. "Cache-conscious structure definition." In Proceedings of the A CM SIGPIMN'99 Conference on Programming Language Design and Implementation, May 1999. Google ScholarDigital Library
13.Douglas Comer. "The ubiquitous B-tree." ACM Computing Surveys, 11(2):121-137, 1979. Google ScholarDigital Library
14.R. Courts. "Improving locality of reference in a garbage-collecting memory management system." Communications of the ACM, 31(9):1128--1138, 1988. Google ScholarDigital Library
15.P. Drew and R. King. "The performance and utility of the Cactis implementation algorithms." In Proceedings of the 16th VLDB Conj~rence, pages 135-147, 1990. Google ScholarDigital Library
16.Dennis Gannon, William Jalby, and K. Gallivan. "Strategies for cache and local memory management by global program transformation." Journal of Parallel and Distributed Computing, 5:587--616, 1988. Google ScholarDigital Library
17.N. Gloy, T. Blackwell, M. D. Smith, and B. Calder. "Procedure placement using temporal ordering information." In Proceedings of MICRO-30, Dee. 1997. Google ScholarDigital Library
18.I. J. Haikala. "Cache hit ratios with geometric task switch intervals." In Proceedings of the l lth Annual International Symposium on Computer Architecture, pages 364-371, June 1984. Google ScholarDigital Library
19.Mark D. Hill and Alan Jay Smith. "Evaluafng associativity in CPU caches." IEEE Transactions on Computers, C- 38(12): 1612-1630, December 1989. Google ScholarDigital Library
20.David Kroft. "Lockup-free instruction fetch/prefetch cache organization." In The 8th Annual International Symposium on Computer Architecture, pages 81--87, May 1981. Google ScholarDigital Library
21.M.S. Lam, P.R. Wilson, and T. G Moher. "Object type directed garbage collection to improve locality." In Proceedings of the International Workshop on Memory Management, pages 16-18, Sept. 1992. Google ScholarDigital Library
22.Monica S. Lam, Edward E. Rothberg, and Michael E. Wolf. "The cache performance and optimizations of blocked algorithms.'' In Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 63-74, Santa Clara, California, 1991. Google ScholarDigital Library
23.Anthony LaMarca and Richard E. Ladner. "The influence of caches on the performance of heaps." ACMJournal of ExperimentalAlgorithmics, 1, 1996. Google ScholarDigital Library
24.Anthony LaMarca and Richard E. Ladner. "The influence of caches on the performance of sorting." In Eight Annual A CM- SIAM Symposium on Discrete Algorithms, Jan. 1997. Google ScholarDigital Library
25.James Laudon, Anoop Gupta, and Mark Horowitz. "Interleaving: A multithreading technique targeting multiprocessors and workstations." In Proceedings of the Sixth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 308--318, San Jose, California, 1994. Google ScholarDigital Library
26.Chi-Keung Luk and Todd C. Mowry. "Compiler-based prefetching for recursive data structures." In Proceedings of the Seventh international Conference on Architectural Support for Programming Languages and Operating b3,stems (,,tSPLOS VIi), pages 222-233, Oct. 1996. Google ScholarDigital Library
27.Scott McFarling. "Program optimization for instruction caches." In Proceedings of the Third International Conference on Architectural Support for Programming Languages and Operating Systems, pages 183-191, 1989. Google ScholarDigital Library
28.D. A. Moon. "Garbage collection in a large LISP system." In ConJbrence Record of the i984 Symposium on LISP and Functional Programming, pages 235--246, Aug. 1984. Google ScholarDigital Library
29.Todd C. Mowry, Monica S. Lain, and Anoop Gupta. "Design and evaluation of a compiler algorithm for prefetching." In Proceedings of the Fifth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS IO, pages 62-73, October 1992. Google ScholarDigital Library
30.V. S. Pai, P. Ranganathan, and S. V. Adve. "RSIM reference manual version 1.0." In Technical Report 9705, Dept. of Electrical and Computer Engineering, Rice University, Aug. 1997.Google Scholar
31.V. $. Pai, P. Ranganathan, S.V. Adve, and T. Harton. "An evaluation of memory consistency models for shared-memory systems with ILP processors." In Proceedings of the Seventh International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS VII), pages 12-23, Oct. 1996. Google ScholarDigital Library
32.David Patterson, Thnm~~ And~renn Monl /~rdxvoll Rieh~rrl Fromm, Kimberly Keaton, Christoforos Kazyrakis, Randi Thomas, and Katherine Yellick. "A case for intelligent RAM." In IEEE Micro, pages 34--44, Apr 1997. Google ScholarDigital Library
33.Sharon E. Perl and Richard L. Sites. "Studies of Windows NT pcHormancc using execution traces." in Symposium on Operating Systems Design and Implementation, Oct. 1996. Google ScholarDigital Library
34.Karl Pettis and Robert C. Hansen. "Profile guided code positioning.'' SIGPLAN Notices, 25(6):16-27, June 1990. Proceedings of the A CM SIGPLAN'90 Conference on ProgrammingLanguage Design and Implementation. Google ScholarDigital Library
35.G. S. Rao. ;;Performance analysis of cache memories." Journal of the ACM, 25(3):378-395, 1978. Google ScholarDigital Library
36.G.S. Rao 'Performance anayyisgb iugbiu iugbiu ivbiu ing dynamic data structures on distributed memory machines." A CM Transactions on Programming Languages and Systems, 17(2), 1995. Google ScholarDigital Library
37.M. Rosenblm, E. Bugnion, S.A. Herrod, E. Witchel, and A. Gupta. "The impact of architectural trends on operating system performance." In Proceedings of the 15th ACM Symposium on Operating System Pnnciples (SOSP), pages 285- 298, Dec. 1995. Google ScholarDigital Library
38.A. Roth, A. Moshovos, and GS. Sohi. "Dependence based prefetching for linked data structures." In Proceedings of the Eight International ConJkrence on Architectural Support for ~rvglutttttttttg l.~tt~uu~co llrtu LtF~ruttltg VIII), pages 115--126, Oct. 1998. Google ScholarDigital Library
39.J. H. Saltzer. "A simple linear model of demand paging performance." Communications of the ACM, 17(4):181-186, 1974. Google ScholarDigital Library
40.M. L. Seidl, and B. G Zorn "Segregating heap objects by reference behavior and lifetime." In Proceedings of the Eight gramming Languages and Operating Systems (ASPLOS VIII), pages 12-23, Oct. 1998. Google ScholarDigital Library
41.JaswinderPal Singh, Harold S. Stone, and Dominique F. Thiebaut. "A model of workloads and its use in miss-rate prediction for fully associative caches." IEI~I~ Transactions on Computers, 41 (7):811-825, 1992. Google ScholarDigital Library
42.A.J. Smith "A Compartive study pf giug iugiugvuiy}fg mapping algorithms and their use for cache and main memory.'' IEEE Trans. on Software En_~neering, 4(2):121-130, 1978.Google ScholarDigital Library
43.Alan J. Smith. "Cache memories." ACM Computing Surveys, 14(3):473-530, 1982. Google ScholarDigital Library
44.Burton J. Smith. "Architecture and applications of the HEP Multiprocerssor oihoi og9u iugiuguigiyfc yufui}yg iugh cessing IV, pages 241-248, 1981.Google Scholar
45.J.R. Spirn, editor. Program Behavior: Models and Measurements. Operating and Programming System Series, Elsevier, New York, 1977. Google ScholarDigital Library
46.J. W. Stamos. "Static grouping of small objects to enhance performance of a paged virtual memory." ACM Transactions on Programming Languages and Systems, 2(2):155-180, 1984. Google ScholarDigital Library
47.Dan N. Truong oilkhoih oihoi iohoi iouhbo oih oihoihiooihi ing cache behavior of dynamically allocated data structures." In International Conference on Parallel Architectures and Compilation Techniques, Oct. 1998. Google ScholarDigital Library
48.M. N. Tsangaris and J. Naughton. "On the performance of nh~r,t oh,etoriBg tot, hni.,,,,._.~,,,o." ._t- Proceedings A CM SIGMOD Intl. Conf. on Management of Data, pages 144-153, June 1992. Google ScholarDigital Library
49.G J. Ward. "The RADIANCE lighting simulation and rendering system." in Proceedings of SIGGRAPH '94, July 1994. Google ScholarDigital Library
50.M. V. Wilkes. "Siave memories and dynamic storage allocation.'' In IEEE Trans. on Electronic Computers, pages 270- 9'71 April 1965.Google Scholar
51.Paul R. Wilson, Michael S. Lain, and Thomas G Moher. "Effective "static-graph" reorganization to improve locality in garbage-collected systems." SiGPLAN Notices, 26(6):177- 191, June 1991. Proceedings of the A CM $IGPIMN'91 Conference on Programming Language Design and implementation. Google ScholarDigital Library
52.Michael E. Wolf and Monics bhouihoihoihihihp oihphj'p poi mizing algorithm." SIGPLAN Notices, 26(6):30--44, June 1991. Proceedings of the ACM SIGPLAN'91 Conference on Programming Language Design and Implementation. Google ScholarDigital Library

Index Terms

Recommendations

Cache-conscious structure definition
PLDI '99: Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation

A program's cache performance can be improved by changing the organization and layout of its data---even complex, pointer-based data structures. Previous techniques improved the cache performance of these structures by arranging distinct instances to ...
Read More
Cache-conscious structure layout

Hardware trends have produced an increasing disparity between processor speeds and memory access times. While a variety of techniques for tolerating or reducing memory latency have been proposed, these are rarely successful for pointer-manipulating ...
Read More
Using generational garbage collection to implement cache-conscious data placement

The cost of accessing main memory is increasing. Machine designers have tried to mitigate the consequences of the processor and memory technology trends underlying this increasing gap with a variety of techniques to reduce or tolerate memory latency. ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
PLDI '99: Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
May 1999
304 pages
ISBN:1581130945
DOI:10.1145/301618
Chairmen:
Barbara G. Ryder
Rutgers Univ., New Brunswick, NJ
,
Benjamin G. Zorn
Univ. of Colorado, Boulder, and Microsoft Research
ACM SIGPLAN Notices Volume 34, Issue 5
May 1999
304 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/301631
Editors:
Barbara G. Ryder
Rutgers Univ., New Brunswick, NJ
,
Benjamin G. Zorn
Univ. of Colorado, Boulder
,
A. Michael Berman
Rowan Univ., Glassboro, NJ
Issue’s Table of Contents
Copyright © 1999 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 1 May 1999
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
cache-conscious allocation
cache-conscious data placement
cache-conscious reorganization
clustering
coloring
Qualifiers
- Article
Conference

Acceptance Rates
PLDI '99 Paper Acceptance Rate26of130submissions,20%Overall Acceptance Rate406of2,067submissions,20%
More
Upcoming Conference
PLDI '24

Sponsor:

sigplan

ACM SIGPLAN Conference on Programming Language Design and Implementation

June 24 - 28, 2024

Copenhagen , Denmark
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 261
  Total Citations
  View Citations
- 2,092
  Total Downloads
- Downloads (Last 12 months)184
- Downloads (Last 6 weeks)24
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Cache-conscious structure layout

PLDI '99: Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation

ABSTRACT

References

Cited By

Index Terms

Recommendations

Cache-conscious structure definition

Cache-conscious structure layout

Using generational garbage collection to implement cache-conscious data placement