ABSTRACT
Hardware trends have produced an increasing disparity between processor speeds and memory access times. While a variety of techniques for tolerating or reducing memory latency have been proposed, these are rarely successful for pointer-manipulating programs.This paper explores a complementary approach that attacks the source (poor reference locality) of the problem rather than its manifestation (memory latency). It demonstrates that careful data organization and layout provides an essential mechanism to improve the cache locality of pointer-manipulating programs and consequently, their performance. It explores two placement techniques---clustering and coloring---that improve cache performance by increasing a pointer structure's spatial and temporal locality, and by reducing cache-conflicts.To reduce the cost of applying these techniques, this paper discusses two strategies---cache-conscious reorganization and cache-conscious allocation---and describes two semi-automatic tools---ccmorph and ccmalloc---that use these strategies to produce cache-conscious pointer structure layouts. ccmorph is a transparent tree reorganizer that utilizes topology information to cluster and color the structure. ccmalloc is a cache-conscious heap allocator that attempts to co-locate contemporaneously accessed data elements in the same physical cache block. Our evaluations, with microbenchmarks, several small benchmarks, and a couple of large real-world applications, demonstrate that the cache-conscious structure layouts produced by ccmorph and ccmalloc offer large performance benefits---in most cases, significantly outperforming state-of-the-art prefetching.
- 1.A. Agarwal, M. Horowitz, and J. Hennessy. "An analytical cache model." A CM Transactions on Computer Systems, 7(2): 184--215, 1989. Google ScholarDigital Library
- 2.A.V. Aho, P.J. Denning, and J. D. Ullman. "Principles of optimal page replacement." Journal of the ACM, 18(1):80- 93, 1971. Google ScholarDigital Library
- 3.J. Banerjee, W. Kim, and J. F. Garza. "Clustering a DAG for CAD databases." IEEE Transactions on Software Engineering, 14(11): 1684--1699, 1988. Google ScholarDigital Library
- 4.R. Bayer and C. McCreight. "Organization and maintenance of large ordered indexes." Acta Informatica, 1(3):173--189, 1972.Google ScholarDigital Library
- 5.Veronique Benzaken and Claude Delobel. "Enhancing performance in a persistent object store: Clustering strategies in O2.' In Technical Report 50-90, Altair, Aug. 1990.Google Scholar
- 6.R.K. Brayton, G D. Hachtel, A. S. Vincentelli, F. Somenzi, A. Aziz, S. Cheng, S. Edwards, S. Khatri, Y. Kukimoto, A. Pardo, S. Qadeer, R. Ranjan, S. Sarwary, T.R. Shilpe, G Swamy, and T. Villa. "VIS: a system for verification and synthesis." In Proceedings of the Eight International Conference on Computer Aided Verification, July 1996. Google ScholarDigital Library
- 7.Doug Burger, James R. Goodman, and Alain Kagi. "Memory bandwidth limitations of future microprocessors." In Proceedings of the 23rd Annual International Symposium on Computer Architecture, pages 78--89, May 1996. Google ScholarDigital Library
- 8.Brad Calder, Chandra Krintz, Simmi John, and Todd Austin. "Cache-conscious data placement." In Proceedings of the Eight International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS VIII), pages 139-149, Oct. 1998. Google ScholarDigital Library
- 9.David Callahan, Ken Kennedy, and Allan Poterfield. "Software prefetching." in Proceedings of the Fourth International Conjkrence on Architectural Support jbr Programming Languages and Operating Systems (ASPLOS IV), pages 40--52, April 1991. Google ScholarDigital Library
- 10.Steve Carr, Kathryn S. McKinley, and Chau-Wen Tseng. "Compiler optimizations for improving data locality." In Proceedings of the Sixth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS V1), pages 252-262, Oct. 1994. Google ScholarDigital Library
- 11.Trishul M. Chilimbi, and James R. Larus. "Using generational garbage collection to implement cache-conscious data placement." In Proceedings of the 1998 International Symposium on Memory Management, Oct. 1998. Google ScholarDigital Library
- 12.Trishul M. Chilimbi, Bob Davidson, and James R. Lares. "Cache-conscious structure definition." In Proceedings of the A CM SIGPIMN'99 Conference on Programming Language Design and Implementation, May 1999. Google ScholarDigital Library
- 13.Douglas Comer. "The ubiquitous B-tree." ACM Computing Surveys, 11(2):121-137, 1979. Google ScholarDigital Library
- 14.R. Courts. "Improving locality of reference in a garbage-collecting memory management system." Communications of the ACM, 31(9):1128--1138, 1988. Google ScholarDigital Library
- 15.P. Drew and R. King. "The performance and utility of the Cactis implementation algorithms." In Proceedings of the 16th VLDB Conj~rence, pages 135-147, 1990. Google ScholarDigital Library
- 16.Dennis Gannon, William Jalby, and K. Gallivan. "Strategies for cache and local memory management by global program transformation." Journal of Parallel and Distributed Computing, 5:587--616, 1988. Google ScholarDigital Library
- 17.N. Gloy, T. Blackwell, M. D. Smith, and B. Calder. "Procedure placement using temporal ordering information." In Proceedings of MICRO-30, Dee. 1997. Google ScholarDigital Library
- 18.I. J. Haikala. "Cache hit ratios with geometric task switch intervals." In Proceedings of the l lth Annual International Symposium on Computer Architecture, pages 364-371, June 1984. Google ScholarDigital Library
- 19.Mark D. Hill and Alan Jay Smith. "Evaluafng associativity in CPU caches." IEEE Transactions on Computers, C- 38(12): 1612-1630, December 1989. Google ScholarDigital Library
- 20.David Kroft. "Lockup-free instruction fetch/prefetch cache organization." In The 8th Annual International Symposium on Computer Architecture, pages 81--87, May 1981. Google ScholarDigital Library
- 21.M.S. Lam, P.R. Wilson, and T. G Moher. "Object type directed garbage collection to improve locality." In Proceedings of the International Workshop on Memory Management, pages 16-18, Sept. 1992. Google ScholarDigital Library
- 22.Monica S. Lam, Edward E. Rothberg, and Michael E. Wolf. "The cache performance and optimizations of blocked algorithms.'' In Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 63-74, Santa Clara, California, 1991. Google ScholarDigital Library
- 23.Anthony LaMarca and Richard E. Ladner. "The influence of caches on the performance of heaps." ACMJournal of ExperimentalAlgorithmics, 1, 1996. Google ScholarDigital Library
- 24.Anthony LaMarca and Richard E. Ladner. "The influence of caches on the performance of sorting." In Eight Annual A CM- SIAM Symposium on Discrete Algorithms, Jan. 1997. Google ScholarDigital Library
- 25.James Laudon, Anoop Gupta, and Mark Horowitz. "Interleaving: A multithreading technique targeting multiprocessors and workstations." In Proceedings of the Sixth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 308--318, San Jose, California, 1994. Google ScholarDigital Library
- 26.Chi-Keung Luk and Todd C. Mowry. "Compiler-based prefetching for recursive data structures." In Proceedings of the Seventh international Conference on Architectural Support for Programming Languages and Operating b3,stems (,,tSPLOS VIi), pages 222-233, Oct. 1996. Google ScholarDigital Library
- 27.Scott McFarling. "Program optimization for instruction caches." In Proceedings of the Third International Conference on Architectural Support for Programming Languages and Operating Systems, pages 183-191, 1989. Google ScholarDigital Library
- 28.D. A. Moon. "Garbage collection in a large LISP system." In ConJbrence Record of the i984 Symposium on LISP and Functional Programming, pages 235--246, Aug. 1984. Google ScholarDigital Library
- 29.Todd C. Mowry, Monica S. Lain, and Anoop Gupta. "Design and evaluation of a compiler algorithm for prefetching." In Proceedings of the Fifth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS IO, pages 62-73, October 1992. Google ScholarDigital Library
- 30.V. S. Pai, P. Ranganathan, and S. V. Adve. "RSIM reference manual version 1.0." In Technical Report 9705, Dept. of Electrical and Computer Engineering, Rice University, Aug. 1997.Google Scholar
- 31.V. $. Pai, P. Ranganathan, S.V. Adve, and T. Harton. "An evaluation of memory consistency models for shared-memory systems with ILP processors." In Proceedings of the Seventh International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS VII), pages 12-23, Oct. 1996. Google ScholarDigital Library
- 32.David Patterson, Thnm~~ And~renn Monl /~rdxvoll Rieh~rrl Fromm, Kimberly Keaton, Christoforos Kazyrakis, Randi Thomas, and Katherine Yellick. "A case for intelligent RAM." In IEEE Micro, pages 34--44, Apr 1997. Google ScholarDigital Library
- 33.Sharon E. Perl and Richard L. Sites. "Studies of Windows NT pcHormancc using execution traces." in Symposium on Operating Systems Design and Implementation, Oct. 1996. Google ScholarDigital Library
- 34.Karl Pettis and Robert C. Hansen. "Profile guided code positioning.'' SIGPLAN Notices, 25(6):16-27, June 1990. Proceedings of the A CM SIGPLAN'90 Conference on ProgrammingLanguage Design and Implementation. Google ScholarDigital Library
- 35.G. S. Rao. ;;Performance analysis of cache memories." Journal of the ACM, 25(3):378-395, 1978. Google ScholarDigital Library
- 36.G.S. Rao 'Performance anayyisgb iugbiu iugbiu ivbiu ing dynamic data structures on distributed memory machines." A CM Transactions on Programming Languages and Systems, 17(2), 1995. Google ScholarDigital Library
- 37.M. Rosenblm, E. Bugnion, S.A. Herrod, E. Witchel, and A. Gupta. "The impact of architectural trends on operating system performance." In Proceedings of the 15th ACM Symposium on Operating System Pnnciples (SOSP), pages 285- 298, Dec. 1995. Google ScholarDigital Library
- 38.A. Roth, A. Moshovos, and GS. Sohi. "Dependence based prefetching for linked data structures." In Proceedings of the Eight International ConJkrence on Architectural Support for ~rvglutttttttttg l.~tt~uu~co llrtu LtF~ruttltg VIII), pages 115--126, Oct. 1998. Google ScholarDigital Library
- 39.J. H. Saltzer. "A simple linear model of demand paging performance." Communications of the ACM, 17(4):181-186, 1974. Google ScholarDigital Library
- 40.M. L. Seidl, and B. G Zorn "Segregating heap objects by reference behavior and lifetime." In Proceedings of the Eight gramming Languages and Operating Systems (ASPLOS VIII), pages 12-23, Oct. 1998. Google ScholarDigital Library
- 41.JaswinderPal Singh, Harold S. Stone, and Dominique F. Thiebaut. "A model of workloads and its use in miss-rate prediction for fully associative caches." IEI~I~ Transactions on Computers, 41 (7):811-825, 1992. Google ScholarDigital Library
- 42.A.J. Smith "A Compartive study pf giug iugiugvuiy}fg mapping algorithms and their use for cache and main memory.'' IEEE Trans. on Software En_~neering, 4(2):121-130, 1978.Google ScholarDigital Library
- 43.Alan J. Smith. "Cache memories." ACM Computing Surveys, 14(3):473-530, 1982. Google ScholarDigital Library
- 44.Burton J. Smith. "Architecture and applications of the HEP Multiprocerssor oihoi og9u iugiuguigiyfc yufui}yg iugh cessing IV, pages 241-248, 1981.Google Scholar
- 45.J.R. Spirn, editor. Program Behavior: Models and Measurements. Operating and Programming System Series, Elsevier, New York, 1977. Google ScholarDigital Library
- 46.J. W. Stamos. "Static grouping of small objects to enhance performance of a paged virtual memory." ACM Transactions on Programming Languages and Systems, 2(2):155-180, 1984. Google ScholarDigital Library
- 47.Dan N. Truong oilkhoih oihoi iohoi iouhbo oih oihoihiooihi ing cache behavior of dynamically allocated data structures." In International Conference on Parallel Architectures and Compilation Techniques, Oct. 1998. Google ScholarDigital Library
- 48.M. N. Tsangaris and J. Naughton. "On the performance of nh~r,t oh,etoriBg tot, hni.,,,,._.~,,,o." ._t- Proceedings A CM SIGMOD Intl. Conf. on Management of Data, pages 144-153, June 1992. Google ScholarDigital Library
- 49.G J. Ward. "The RADIANCE lighting simulation and rendering system." in Proceedings of SIGGRAPH '94, July 1994. Google ScholarDigital Library
- 50.M. V. Wilkes. "Siave memories and dynamic storage allocation.'' In IEEE Trans. on Electronic Computers, pages 270- 9'71 April 1965.Google Scholar
- 51.Paul R. Wilson, Michael S. Lain, and Thomas G Moher. "Effective "static-graph" reorganization to improve locality in garbage-collected systems." SiGPLAN Notices, 26(6):177- 191, June 1991. Proceedings of the A CM $IGPIMN'91 Conference on Programming Language Design and implementation. Google ScholarDigital Library
- 52.Michael E. Wolf and Monics bhouihoihoihihihp oihphj'p poi mizing algorithm." SIGPLAN Notices, 26(6):30--44, June 1991. Proceedings of the ACM SIGPLAN'91 Conference on Programming Language Design and Implementation. Google ScholarDigital Library
Index Terms
- Cache-conscious structure layout
Recommendations
Cache-conscious structure definition
PLDI '99: Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementationA program's cache performance can be improved by changing the organization and layout of its data---even complex, pointer-based data structures. Previous techniques improved the cache performance of these structures by arranging distinct instances to ...
Cache-conscious structure layout
Hardware trends have produced an increasing disparity between processor speeds and memory access times. While a variety of techniques for tolerating or reducing memory latency have been proposed, these are rarely successful for pointer-manipulating ...
Using generational garbage collection to implement cache-conscious data placement
The cost of accessing main memory is increasing. Machine designers have tried to mitigate the consequences of the processor and memory technology trends underlying this increasing gap with a variety of techniques to reduce or tolerate memory latency. ...
Comments