Abstract
On-chip hardware coherence can scale gracefully as the number of cores increases.
Index Terms
- Why on-chip cache coherence is here to stay