research-article

Cache index-aware memory allocation

Authors:
Yehuda Afek

Tel Aviv University, Tel Aviv, Israel

Tel Aviv University, Tel Aviv, Israel
View Profile

,
Dave Dice

Oracle Labs, Burlington, MA, USA

Oracle Labs, Burlington, MA, USA
View Profile

,
Adam Morrison

Tel Aviv University, Tel Aviv, Israel

Tel Aviv University, Tel Aviv, Israel
View Profile

ISMM '11: Proceedings of the international symposium on Memory managementJune 2011Pages 55–64https://doi.org/10.1145/1993478.1993486

Published:04 June 2011Publication History

ISMM '11: Proceedings of the international symposium on Memory management

Pages 55–64

ABSTRACT

Poor placement of data blocks in memory may negatively impact application performance because of an increase in the cache conflict miss rate [18]. For dynamically allocated structures this placement is typically determined by the memory allocator. Cache index-oblivious allocators may inadvertently place blocks on a restricted fraction of the available cache indexes, artificially and needlessly increasing the conflict miss rate. While some allocators are less vulnerable to this phenomena, no general-purpose malloc allocator is index-aware and methodologically addresses this concern. We demonstrate that many existing state-of-the-art allocators are index-oblivious, admitting performance pathologies for certain block sizes. We show that a simple adjustment within the allocator to control the spacing of blocks can provide better index coverage, which in turn reduces the superfluous conflict miss rate in various applications, improving performance with no observed negative consequences. The result is an index-aware allocator. Our technique is general and can easily be applied to most memory allocators and to various processor architectures.

Furthermore, we can reduce inter-thread and inter-process conflict misses for processors where threads concurrently share the level-1 cache such as the Sun UltraSPARC-T2^™ and Intel "Nehalem" by coloring the placement of blocks so that allocations for different threads and processes start on different cache indexes.

References

The Solaris Schedctl Facility. US Patent #5937187.Google Scholar
hoard.org, 2010 (accessed 2010-1-14). hoard.org.Google Scholar
Amino-CBBS, 2010 (accessed 2010-1-5). amino-cbbs.sourceforge.net.Google Scholar
BioPerf, 2010 (accessed 2010-5-19). http://www.bioperf.org/.Google Scholar
dlmalloc, 2010 (accessed 2010-7-2). http://gee.cs.oswego.edu/dl/html/malloc.html.Google Scholar
tcmalloc: version 1.6, 2010 (accessed November 9, 2010). http://code.google.com/p/google-perftools/.Google Scholar
D. Bader, Y. Li, T. Li, and V. Sachdeva. Bioperf: a benchmark suite to evaluate high-performance computer architecture on bioinformatics applications. IEEE Workload Characterization Symposium, 0:163--173, 2005.Google ScholarCross Ref
E. D. Berger, K. S. McKinley, R. D. Blumofe, and P. R. Wilson. Hoard: a scalable memory allocator for multithreaded applications. In Proc. ninth international conference on Architectural Support for Programming Languages and Operating Systems, pages 117--128, New York, NY, USA, 2000. ACM. Google ScholarDigital Library
J. Bonwick. The slab allocator: an object-caching kernel memory allocator. In USTC'94: Proceedings of the USENIX Summer 1994 Technical Conference on USENIX Summer 1994 Technical Conference, pages 6--6, Berkeley, CA, USA, 1994. USENIX Association. Google ScholarDigital Library
B. Calder, C. Krintz, S. John, and T. Austin. Cache-conscious data placement. SIGPLAN Not., 33(11):139--149, 1998. Google ScholarDigital Library
T. M. Chilimbi, M. D. Hill, and J. R. Larus. Making pointer-based data structures cache conscious. Computer, 33(12):67--74, 2000. Google ScholarDigital Library
Y. Chou, B. Fahs, and S. Abraham. Microarchitecture optimizations for exploiting memory-level parallelism. SIGARCH Comput. Archit. News, 32(2):76, 2004. Google ScholarDigital Library
D. Dice. Dave Dice's blog, 2010 (accessed 2010-7-21). thttp://blogs.sun.com/dave/entry/solaris_scheduling_and_cpuids.Google Scholar
D. Dice and A. Garthwaite. Mostly lock-free malloc. In Proc. 3rd International Symposium on Memory Management, pages 163--174, New York, NY, USA, 2002. ACM. Google ScholarDigital Library
D. Dice, Y. Lev, V. J. Marathe, M. Moir, D. Nussbaum, and M. Oleszewski. Simplifying concurrent algorithms by exploiting hardware transactional memory. In SPAA '10: Proceedings of the 22nd ACM symposium on Parallelism in algorithms and architectures, pages 325--334, New York, NY, USA, 2010. ACM. Google ScholarDigital Library
J. Evans. A scalable concurrent malloc(3) implementation for freebsd, 2006.Google Scholar
A. González, M. Valero, N. Topham, and J. M. Parcerisa. Eliminating cache conflict misses through xor-based placement functions. In ICS '97: Proceedings of the 11th international conference on Supercomputing, pages 76--83, New York, NY, USA, 1997. ACM. Google ScholarDigital Library
M. D. Hill and A. J. Smith. Evaluating associativity in cpu caches. IEEE Trans. Comput., 38(12):1612--1630, 1989. Google ScholarDigital Library
M. S. Johnstone and P. R. Wilson. The memory fragmentation problem: Solved? In ISMM, pages 26--36, 1998. Google ScholarDigital Library
N. P. Jouppi. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers. SIGARCH Comput. Archit. News, 18(3a):364--373, 1990. Google ScholarDigital Library
R. E. Kessler and M. D. Hill. Page placement algorithms for large real-indexed caches. ACM Trans. Comput. Syst., 10(4):338--359, 1992. Google ScholarDigital Library
T. Kistler and M. Franz. Automated data-member layout of heap objects to improve memory-hierarchy performance. ACM Trans. Program. Lang. Syst., 22(3):490--505, 2000. Google ScholarDigital Library
K. C. Knowlton. A fast storage allocator. Commun. ACM, 8(10):623--624, 1965. Google ScholarDigital Library
C. Lattner and V. Adve. Automatic pool allocation: improving performance by controlling data structure layout in the heap. In PLDI '05: Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation, pages 129--142, New York, NY, USA, 2005. ACM. Google ScholarDigital Library
A. R. Lebeck and D. A. Wood. Cache profiling and the spec benchmarks: A case study. Computer, 27(10):15--26, 1994. Google ScholarDigital Library
V. B. Lvin, G. Novark, E. D. Berger, and B. G. Zorn. Archipelago: trading address space for reliability and security. SIGOPS Oper. Syst. Rev., 42(2):115--124, 2008. Google ScholarDigital Library
M. M. Michael. Scalable Lock-free Dynamic Memory Allocation. In Proc. ACM SIGPLAN 2004 Conference on Programming Language Design and Implementation, pages 35--46, 2004. Google ScholarDigital Library
R. Min and Y. Hu. Improving performance of large physically indexed caches by decoupling memory addresses from cache addresses. IEEE Trans. Comput., 50(11):1191--1201, 2001. Google ScholarDigital Library
E. Petrank and D. Rawitz. The hardness of cache conscious data placement. In POPL '02: Proceedings of the 29th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pages 101--112, New York, NY, USA, 2002. ACM. Google ScholarDigital Library
T. Romer, D. Lee, B. N. Bershad, and J. B. Chen. Dynamic page mapping policies for cache conflict resolution on standard hardware. In In 1st USENIX Symposium on Operating Systems Design and Implementation (OSDI), pages 255--266, 1994.Google Scholar
D. Sanchez and K. Christos. The zcache: Decoupling ways and associativity. In MICRO 43: Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture. IEEE Computer Society, 2010. Google ScholarDigital Library
A. Seznec. A case for two-way skewed-associative caches. In ISCA '93: Proceedings of the 20th annual international symposium on Computer architecture, pages 169--178, New York, NY, USA, 1993. ACM. Google ScholarDigital Library
D. D. Sleator and R. E. Tarjan. Self-adjusting binary search trees. J. ACM, 32(3):652--686, 1985. Google ScholarDigital Library
V. Čakarević, P. Radojković, J. Verdú, A. Pajuelo, F. J. Cazorla, M. Nemirovsky, and M. Valero. Characterizing the resource-sharing levels in the ultrasparc t2 processor. In MICRO 42: Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture, pages 481--492, New York, NY, USA, 2009. ACM. Google ScholarDigital Library
Z. Wang and R. B. Lee. A novel cache architecture with enhanced performance and security. In MICRO 41: Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture, pages 83--93, Washington, DC, USA, 2008. IEEE Computer Society. Google ScholarDigital Library
C. B. Zilles. Benchmark health considered harmful. SIGARCH Comput. Archit. News, 29(3):4--5, 2001. Google ScholarDigital Library
M. Zukowski, S. Héman, and P. Boncz. Architecture-conscious hashing. In DaMoN '06: Proceedings of the 2nd international workshop on Data management on new hardware, page 6, New York, NY, USA, 2006. ACM. Google ScholarDigital Library

Index Terms

Cache index-aware memory allocation
1. Software and its engineering
  1. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        Memory management
        Allocation / deallocation strategies

Recommendations

Cache index-aware memory allocation
ISMM '11

Poor placement of data blocks in memory may negatively impact application performance because of an increase in the cache conflict miss rate [18]. For dynamically allocated structures this placement is typically determined by the memory allocator. Cache ...
Read More
Reducing traffic generated by conflict misses in caches
CF '04: Proceedings of the 1st conference on Computing frontiers

Off-chip memory accesses are a major source of power consumption in embedded processors. In order to reduce the amount of traffic between the processor and the off-chip memory as well as to hide the memory latency, nearly all embedded processors have a ...
Read More
Unison Cache: A Scalable and Effective Die-Stacked DRAM Cache
MICRO-47: Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture

Recent research advocates large die-stacked DRAM caches in many core servers to break the memory latency and bandwidth wall. To realize their full potential, die-stacked DRAM caches necessitate low lookup latencies, high hit rates and the efficient use ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
ISMM '11: Proceedings of the international symposium on Memory management
June 2011
148 pages
ISBN:9781450302630
DOI:10.1145/1993478
General Chair:
Hans-J. Boehm
HP Labs
,
Program Chair:
David Bacon
IBM T.J. Watson Research
ACM SIGPLAN Notices Volume 46, Issue 11
ISMM '11
November 2011
135 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/2076022
Issue’s Table of Contents
Copyright © 2011 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 4 June 2011
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
caches
conflict misses
dynamic storage allocators
malloc
memory allocation
placement policies
shared caches
Qualifiers
- research-article
Conference

Acceptance Rates
Overall Acceptance Rate72of156submissions,46%
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 16
  Total Citations
  View Citations
- 594
  Total Downloads
- Downloads (Last 12 months)24
- Downloads (Last 6 weeks)4
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Cache index-aware memory allocation

ISMM '11: Proceedings of the international symposium on Memory management

ABSTRACT

References

Cited By

Index Terms

Recommendations

Cache index-aware memory allocation

Reducing traffic generated by conflict misses in caches

Unison Cache: A Scalable and Effective Die-Stacked DRAM Cache