skip to main content
10.1145/301618.301633acmconferencesArticle/Chapter ViewAbstractPublication PagespldiConference Proceedingsconference-collections
Article
Free Access

Cache-conscious structure layout

Authors Info & Claims
Published:01 May 1999Publication History

ABSTRACT

Hardware trends have produced an increasing disparity between processor speeds and memory access times. While a variety of techniques for tolerating or reducing memory latency have been proposed, these are rarely successful for pointer-manipulating programs.This paper explores a complementary approach that attacks the source (poor reference locality) of the problem rather than its manifestation (memory latency). It demonstrates that careful data organization and layout provides an essential mechanism to improve the cache locality of pointer-manipulating programs and consequently, their performance. It explores two placement techniques---clustering and coloring---that improve cache performance by increasing a pointer structure's spatial and temporal locality, and by reducing cache-conflicts.To reduce the cost of applying these techniques, this paper discusses two strategies---cache-conscious reorganization and cache-conscious allocation---and describes two semi-automatic tools---ccmorph and ccmalloc---that use these strategies to produce cache-conscious pointer structure layouts. ccmorph is a transparent tree reorganizer that utilizes topology information to cluster and color the structure. ccmalloc is a cache-conscious heap allocator that attempts to co-locate contemporaneously accessed data elements in the same physical cache block. Our evaluations, with microbenchmarks, several small benchmarks, and a couple of large real-world applications, demonstrate that the cache-conscious structure layouts produced by ccmorph and ccmalloc offer large performance benefits---in most cases, significantly outperforming state-of-the-art prefetching.

References

  1. 1.A. Agarwal, M. Horowitz, and J. Hennessy. "An analytical cache model." A CM Transactions on Computer Systems, 7(2): 184--215, 1989. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. 2.A.V. Aho, P.J. Denning, and J. D. Ullman. "Principles of optimal page replacement." Journal of the ACM, 18(1):80- 93, 1971. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. 3.J. Banerjee, W. Kim, and J. F. Garza. "Clustering a DAG for CAD databases." IEEE Transactions on Software Engineering, 14(11): 1684--1699, 1988. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. 4.R. Bayer and C. McCreight. "Organization and maintenance of large ordered indexes." Acta Informatica, 1(3):173--189, 1972.Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. 5.Veronique Benzaken and Claude Delobel. "Enhancing performance in a persistent object store: Clustering strategies in O2.' In Technical Report 50-90, Altair, Aug. 1990.Google ScholarGoogle Scholar
  6. 6.R.K. Brayton, G D. Hachtel, A. S. Vincentelli, F. Somenzi, A. Aziz, S. Cheng, S. Edwards, S. Khatri, Y. Kukimoto, A. Pardo, S. Qadeer, R. Ranjan, S. Sarwary, T.R. Shilpe, G Swamy, and T. Villa. "VIS: a system for verification and synthesis." In Proceedings of the Eight International Conference on Computer Aided Verification, July 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. 7.Doug Burger, James R. Goodman, and Alain Kagi. "Memory bandwidth limitations of future microprocessors." In Proceedings of the 23rd Annual International Symposium on Computer Architecture, pages 78--89, May 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. 8.Brad Calder, Chandra Krintz, Simmi John, and Todd Austin. "Cache-conscious data placement." In Proceedings of the Eight International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS VIII), pages 139-149, Oct. 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. 9.David Callahan, Ken Kennedy, and Allan Poterfield. "Software prefetching." in Proceedings of the Fourth International Conjkrence on Architectural Support jbr Programming Languages and Operating Systems (ASPLOS IV), pages 40--52, April 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. 10.Steve Carr, Kathryn S. McKinley, and Chau-Wen Tseng. "Compiler optimizations for improving data locality." In Proceedings of the Sixth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS V1), pages 252-262, Oct. 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. 11.Trishul M. Chilimbi, and James R. Larus. "Using generational garbage collection to implement cache-conscious data placement." In Proceedings of the 1998 International Symposium on Memory Management, Oct. 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. 12.Trishul M. Chilimbi, Bob Davidson, and James R. Lares. "Cache-conscious structure definition." In Proceedings of the A CM SIGPIMN'99 Conference on Programming Language Design and Implementation, May 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. 13.Douglas Comer. "The ubiquitous B-tree." ACM Computing Surveys, 11(2):121-137, 1979. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. 14.R. Courts. "Improving locality of reference in a garbage-collecting memory management system." Communications of the ACM, 31(9):1128--1138, 1988. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. 15.P. Drew and R. King. "The performance and utility of the Cactis implementation algorithms." In Proceedings of the 16th VLDB Conj~rence, pages 135-147, 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. 16.Dennis Gannon, William Jalby, and K. Gallivan. "Strategies for cache and local memory management by global program transformation." Journal of Parallel and Distributed Computing, 5:587--616, 1988. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. 17.N. Gloy, T. Blackwell, M. D. Smith, and B. Calder. "Procedure placement using temporal ordering information." In Proceedings of MICRO-30, Dee. 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. 18.I. J. Haikala. "Cache hit ratios with geometric task switch intervals." In Proceedings of the l lth Annual International Symposium on Computer Architecture, pages 364-371, June 1984. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. 19.Mark D. Hill and Alan Jay Smith. "Evaluafng associativity in CPU caches." IEEE Transactions on Computers, C- 38(12): 1612-1630, December 1989. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. 20.David Kroft. "Lockup-free instruction fetch/prefetch cache organization." In The 8th Annual International Symposium on Computer Architecture, pages 81--87, May 1981. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. 21.M.S. Lam, P.R. Wilson, and T. G Moher. "Object type directed garbage collection to improve locality." In Proceedings of the International Workshop on Memory Management, pages 16-18, Sept. 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. 22.Monica S. Lam, Edward E. Rothberg, and Michael E. Wolf. "The cache performance and optimizations of blocked algorithms.'' In Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 63-74, Santa Clara, California, 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. 23.Anthony LaMarca and Richard E. Ladner. "The influence of caches on the performance of heaps." ACMJournal of ExperimentalAlgorithmics, 1, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. 24.Anthony LaMarca and Richard E. Ladner. "The influence of caches on the performance of sorting." In Eight Annual A CM- SIAM Symposium on Discrete Algorithms, Jan. 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. 25.James Laudon, Anoop Gupta, and Mark Horowitz. "Interleaving: A multithreading technique targeting multiprocessors and workstations." In Proceedings of the Sixth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 308--318, San Jose, California, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. 26.Chi-Keung Luk and Todd C. Mowry. "Compiler-based prefetching for recursive data structures." In Proceedings of the Seventh international Conference on Architectural Support for Programming Languages and Operating b3,stems (,,tSPLOS VIi), pages 222-233, Oct. 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. 27.Scott McFarling. "Program optimization for instruction caches." In Proceedings of the Third International Conference on Architectural Support for Programming Languages and Operating Systems, pages 183-191, 1989. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. 28.D. A. Moon. "Garbage collection in a large LISP system." In ConJbrence Record of the i984 Symposium on LISP and Functional Programming, pages 235--246, Aug. 1984. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. 29.Todd C. Mowry, Monica S. Lain, and Anoop Gupta. "Design and evaluation of a compiler algorithm for prefetching." In Proceedings of the Fifth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS IO, pages 62-73, October 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. 30.V. S. Pai, P. Ranganathan, and S. V. Adve. "RSIM reference manual version 1.0." In Technical Report 9705, Dept. of Electrical and Computer Engineering, Rice University, Aug. 1997.Google ScholarGoogle Scholar
  31. 31.V. $. Pai, P. Ranganathan, S.V. Adve, and T. Harton. "An evaluation of memory consistency models for shared-memory systems with ILP processors." In Proceedings of the Seventh International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS VII), pages 12-23, Oct. 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. 32.David Patterson, Thnm~~ And~renn Monl /~rdxvoll Rieh~rrl Fromm, Kimberly Keaton, Christoforos Kazyrakis, Randi Thomas, and Katherine Yellick. "A case for intelligent RAM." In IEEE Micro, pages 34--44, Apr 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. 33.Sharon E. Perl and Richard L. Sites. "Studies of Windows NT pcHormancc using execution traces." in Symposium on Operating Systems Design and Implementation, Oct. 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. 34.Karl Pettis and Robert C. Hansen. "Profile guided code positioning.'' SIGPLAN Notices, 25(6):16-27, June 1990. Proceedings of the A CM SIGPLAN'90 Conference on ProgrammingLanguage Design and Implementation. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. 35.G. S. Rao. ;;Performance analysis of cache memories." Journal of the ACM, 25(3):378-395, 1978. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. 36.G.S. Rao 'Performance anayyisgb iugbiu iugbiu ivbiu ing dynamic data structures on distributed memory machines." A CM Transactions on Programming Languages and Systems, 17(2), 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. 37.M. Rosenblm, E. Bugnion, S.A. Herrod, E. Witchel, and A. Gupta. "The impact of architectural trends on operating system performance." In Proceedings of the 15th ACM Symposium on Operating System Pnnciples (SOSP), pages 285- 298, Dec. 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. 38.A. Roth, A. Moshovos, and GS. Sohi. "Dependence based prefetching for linked data structures." In Proceedings of the Eight International ConJkrence on Architectural Support for ~rvglutttttttttg l.~tt~uu~co llrtu LtF~ruttltg VIII), pages 115--126, Oct. 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. 39.J. H. Saltzer. "A simple linear model of demand paging performance." Communications of the ACM, 17(4):181-186, 1974. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. 40.M. L. Seidl, and B. G Zorn "Segregating heap objects by reference behavior and lifetime." In Proceedings of the Eight gramming Languages and Operating Systems (ASPLOS VIII), pages 12-23, Oct. 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. 41.JaswinderPal Singh, Harold S. Stone, and Dominique F. Thiebaut. "A model of workloads and its use in miss-rate prediction for fully associative caches." IEI~I~ Transactions on Computers, 41 (7):811-825, 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. 42.A.J. Smith "A Compartive study pf giug iugiugvuiy}fg mapping algorithms and their use for cache and main memory.'' IEEE Trans. on Software En_~neering, 4(2):121-130, 1978.Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. 43.Alan J. Smith. "Cache memories." ACM Computing Surveys, 14(3):473-530, 1982. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. 44.Burton J. Smith. "Architecture and applications of the HEP Multiprocerssor oihoi og9u iugiuguigiyfc yufui}yg iugh cessing IV, pages 241-248, 1981.Google ScholarGoogle Scholar
  45. 45.J.R. Spirn, editor. Program Behavior: Models and Measurements. Operating and Programming System Series, Elsevier, New York, 1977. Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. 46.J. W. Stamos. "Static grouping of small objects to enhance performance of a paged virtual memory." ACM Transactions on Programming Languages and Systems, 2(2):155-180, 1984. Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. 47.Dan N. Truong oilkhoih oihoi iohoi iouhbo oih oihoihiooihi ing cache behavior of dynamically allocated data structures." In International Conference on Parallel Architectures and Compilation Techniques, Oct. 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. 48.M. N. Tsangaris and J. Naughton. "On the performance of nh~r,t oh,etoriBg tot, hni.,,,,._.~,,,o." ._t- Proceedings A CM SIGMOD Intl. Conf. on Management of Data, pages 144-153, June 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. 49.G J. Ward. "The RADIANCE lighting simulation and rendering system." in Proceedings of SIGGRAPH '94, July 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. 50.M. V. Wilkes. "Siave memories and dynamic storage allocation.'' In IEEE Trans. on Electronic Computers, pages 270- 9'71 April 1965.Google ScholarGoogle Scholar
  51. 51.Paul R. Wilson, Michael S. Lain, and Thomas G Moher. "Effective "static-graph" reorganization to improve locality in garbage-collected systems." SiGPLAN Notices, 26(6):177- 191, June 1991. Proceedings of the A CM $IGPIMN'91 Conference on Programming Language Design and implementation. Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. 52.Michael E. Wolf and Monics bhouihoihoihihihp oihphj'p poi mizing algorithm." SIGPLAN Notices, 26(6):30--44, June 1991. Proceedings of the ACM SIGPLAN'91 Conference on Programming Language Design and Implementation. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Cache-conscious structure layout

                    Recommendations

                    Comments

                    Login options

                    Check if you have access through your login credentials or your institution to get full access on this article.

                    Sign in
                    • Published in

                      cover image ACM Conferences
                      PLDI '99: Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
                      May 1999
                      304 pages
                      ISBN:1581130945
                      DOI:10.1145/301618

                      Copyright © 1999 ACM

                      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

                      Publisher

                      Association for Computing Machinery

                      New York, NY, United States

                      Publication History

                      • Published: 1 May 1999

                      Permissions

                      Request permissions about this article.

                      Request Permissions

                      Check for updates

                      Qualifiers

                      • Article

                      Acceptance Rates

                      PLDI '99 Paper Acceptance Rate26of130submissions,20%Overall Acceptance Rate406of2,067submissions,20%

                      Upcoming Conference

                      PLDI '24

                    PDF Format

                    View or Download as a PDF file.

                    PDF

                    eReader

                    View online with eReader.

                    eReader