skip to main content
10.1145/1993478.1993486acmconferencesArticle/Chapter ViewAbstractPublication PagesismmConference Proceedingsconference-collections
research-article

Cache index-aware memory allocation

Published:04 June 2011Publication History

ABSTRACT

Poor placement of data blocks in memory may negatively impact application performance because of an increase in the cache conflict miss rate [18]. For dynamically allocated structures this placement is typically determined by the memory allocator. Cache index-oblivious allocators may inadvertently place blocks on a restricted fraction of the available cache indexes, artificially and needlessly increasing the conflict miss rate. While some allocators are less vulnerable to this phenomena, no general-purpose malloc allocator is index-aware and methodologically addresses this concern. We demonstrate that many existing state-of-the-art allocators are index-oblivious, admitting performance pathologies for certain block sizes. We show that a simple adjustment within the allocator to control the spacing of blocks can provide better index coverage, which in turn reduces the superfluous conflict miss rate in various applications, improving performance with no observed negative consequences. The result is an index-aware allocator. Our technique is general and can easily be applied to most memory allocators and to various processor architectures.

Furthermore, we can reduce inter-thread and inter-process conflict misses for processors where threads concurrently share the level-1 cache such as the Sun UltraSPARC-T2 and Intel "Nehalem" by coloring the placement of blocks so that allocations for different threads and processes start on different cache indexes.

References

  1. The Solaris Schedctl Facility. US Patent #5937187.Google ScholarGoogle Scholar
  2. hoard.org, 2010 (accessed 2010-1-14). hoard.org.Google ScholarGoogle Scholar
  3. Amino-CBBS, 2010 (accessed 2010-1-5). amino-cbbs.sourceforge.net.Google ScholarGoogle Scholar
  4. BioPerf, 2010 (accessed 2010-5-19). http://www.bioperf.org/.Google ScholarGoogle Scholar
  5. dlmalloc, 2010 (accessed 2010-7-2). http://gee.cs.oswego.edu/dl/html/malloc.html.Google ScholarGoogle Scholar
  6. tcmalloc: version 1.6, 2010 (accessed November 9, 2010). http://code.google.com/p/google-perftools/.Google ScholarGoogle Scholar
  7. D. Bader, Y. Li, T. Li, and V. Sachdeva. Bioperf: a benchmark suite to evaluate high-performance computer architecture on bioinformatics applications. IEEE Workload Characterization Symposium, 0:163--173, 2005.Google ScholarGoogle ScholarCross RefCross Ref
  8. E. D. Berger, K. S. McKinley, R. D. Blumofe, and P. R. Wilson. Hoard: a scalable memory allocator for multithreaded applications. In Proc. ninth international conference on Architectural Support for Programming Languages and Operating Systems, pages 117--128, New York, NY, USA, 2000. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. J. Bonwick. The slab allocator: an object-caching kernel memory allocator. In USTC'94: Proceedings of the USENIX Summer 1994 Technical Conference on USENIX Summer 1994 Technical Conference, pages 6--6, Berkeley, CA, USA, 1994. USENIX Association. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. B. Calder, C. Krintz, S. John, and T. Austin. Cache-conscious data placement. SIGPLAN Not., 33(11):139--149, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. T. M. Chilimbi, M. D. Hill, and J. R. Larus. Making pointer-based data structures cache conscious. Computer, 33(12):67--74, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Y. Chou, B. Fahs, and S. Abraham. Microarchitecture optimizations for exploiting memory-level parallelism. SIGARCH Comput. Archit. News, 32(2):76, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. D. Dice. Dave Dice's blog, 2010 (accessed 2010-7-21). thttp://blogs.sun.com/dave/entry/solaris_scheduling_and_cpuids.Google ScholarGoogle Scholar
  14. D. Dice and A. Garthwaite. Mostly lock-free malloc. In Proc. 3rd International Symposium on Memory Management, pages 163--174, New York, NY, USA, 2002. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. D. Dice, Y. Lev, V. J. Marathe, M. Moir, D. Nussbaum, and M. Oleszewski. Simplifying concurrent algorithms by exploiting hardware transactional memory. In SPAA '10: Proceedings of the 22nd ACM symposium on Parallelism in algorithms and architectures, pages 325--334, New York, NY, USA, 2010. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. J. Evans. A scalable concurrent malloc(3) implementation for freebsd, 2006.Google ScholarGoogle Scholar
  17. A. González, M. Valero, N. Topham, and J. M. Parcerisa. Eliminating cache conflict misses through xor-based placement functions. In ICS '97: Proceedings of the 11th international conference on Supercomputing, pages 76--83, New York, NY, USA, 1997. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. M. D. Hill and A. J. Smith. Evaluating associativity in cpu caches. IEEE Trans. Comput., 38(12):1612--1630, 1989. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. M. S. Johnstone and P. R. Wilson. The memory fragmentation problem: Solved? In ISMM, pages 26--36, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. N. P. Jouppi. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers. SIGARCH Comput. Archit. News, 18(3a):364--373, 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. R. E. Kessler and M. D. Hill. Page placement algorithms for large real-indexed caches. ACM Trans. Comput. Syst., 10(4):338--359, 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. T. Kistler and M. Franz. Automated data-member layout of heap objects to improve memory-hierarchy performance. ACM Trans. Program. Lang. Syst., 22(3):490--505, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. K. C. Knowlton. A fast storage allocator. Commun. ACM, 8(10):623--624, 1965. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. C. Lattner and V. Adve. Automatic pool allocation: improving performance by controlling data structure layout in the heap. In PLDI '05: Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation, pages 129--142, New York, NY, USA, 2005. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. A. R. Lebeck and D. A. Wood. Cache profiling and the spec benchmarks: A case study. Computer, 27(10):15--26, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. V. B. Lvin, G. Novark, E. D. Berger, and B. G. Zorn. Archipelago: trading address space for reliability and security. SIGOPS Oper. Syst. Rev., 42(2):115--124, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. M. M. Michael. Scalable Lock-free Dynamic Memory Allocation. In Proc. ACM SIGPLAN 2004 Conference on Programming Language Design and Implementation, pages 35--46, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. R. Min and Y. Hu. Improving performance of large physically indexed caches by decoupling memory addresses from cache addresses. IEEE Trans. Comput., 50(11):1191--1201, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. E. Petrank and D. Rawitz. The hardness of cache conscious data placement. In POPL '02: Proceedings of the 29th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pages 101--112, New York, NY, USA, 2002. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. T. Romer, D. Lee, B. N. Bershad, and J. B. Chen. Dynamic page mapping policies for cache conflict resolution on standard hardware. In In 1st USENIX Symposium on Operating Systems Design and Implementation (OSDI), pages 255--266, 1994.Google ScholarGoogle Scholar
  31. D. Sanchez and K. Christos. The zcache: Decoupling ways and associativity. In MICRO 43: Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture. IEEE Computer Society, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. A. Seznec. A case for two-way skewed-associative caches. In ISCA '93: Proceedings of the 20th annual international symposium on Computer architecture, pages 169--178, New York, NY, USA, 1993. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. D. D. Sleator and R. E. Tarjan. Self-adjusting binary search trees. J. ACM, 32(3):652--686, 1985. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. V. Čakarević, P. Radojković, J. Verdú, A. Pajuelo, F. J. Cazorla, M. Nemirovsky, and M. Valero. Characterizing the resource-sharing levels in the ultrasparc t2 processor. In MICRO 42: Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture, pages 481--492, New York, NY, USA, 2009. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Z. Wang and R. B. Lee. A novel cache architecture with enhanced performance and security. In MICRO 41: Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture, pages 83--93, Washington, DC, USA, 2008. IEEE Computer Society. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. C. B. Zilles. Benchmark health considered harmful. SIGARCH Comput. Archit. News, 29(3):4--5, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. M. Zukowski, S. Héman, and P. Boncz. Architecture-conscious hashing. In DaMoN '06: Proceedings of the 2nd international workshop on Data management on new hardware, page 6, New York, NY, USA, 2006. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Cache index-aware memory allocation

Recommendations

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Sign in
  • Published in

    cover image ACM Conferences
    ISMM '11: Proceedings of the international symposium on Memory management
    June 2011
    148 pages
    ISBN:9781450302630
    DOI:10.1145/1993478
    • cover image ACM SIGPLAN Notices
      ACM SIGPLAN Notices  Volume 46, Issue 11
      ISMM '11
      November 2011
      135 pages
      ISSN:0362-1340
      EISSN:1558-1160
      DOI:10.1145/2076022
      Issue’s Table of Contents

    Copyright © 2011 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    • Published: 4 June 2011

    Permissions

    Request permissions about this article.

    Request Permissions

    Check for updates

    Qualifiers

    • research-article

    Acceptance Rates

    Overall Acceptance Rate72of156submissions,46%

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader