Top

The Journal of Supercomputing

Published in:

01-11-2017

Reducing the second-level cache conflict misses using a set folding technique

Authors: Ali Shatnawi, Mohammad Alsaedeen

Published in: The Journal of Supercomputing | Issue 2/2018

Activate our intelligent search to find suitable subject content or patents.

search-config

AI-assisted search

Off

Abstract

The cache memory has a direct effect on the performance of a computer system. Instructions and data are fetched from a fast cache instead of a slow memory to save hundreds of cycles. Reducing the cache miss ratio will definitely improve the execution time of an application. In this work, we propose cache memory designs that reduce the number of conflict misses significantly. The proposed designs reduce the conflict misses in the last level multi-way set associative cache. Each set is divided into a group of subsets: the first is referred to as the exclusive subset, and the rest are the shared subsets. The exclusive is configured as a traditional cache where each block is mapped to the set whose index matches the block index. In addition to their standard cache indexing role, the shared subsets are configured to host blocks with different indices. A memory block can be mapped to one subset from the exclusive type or one of multiple subsets from the shared type. Since the proposed technique is based on combining multiple sets of the shared part to form a larger set, that is shared between memory blocks with different indices, we have chosen the name “set folding.” The decision as to where to map a memory block depends on the number of misses encountered at each of the potential target sets. To evaluate the proposed design based on the overall hit rate, twenty-three benchmarks from SPEC CPU 2006 were simulated using the SuperESCalar simulator. The proposed designs require a few extra storage bits which adds a small overhead on the hardware complexity in comparison with the conventional cache. However, the proposed designs achieve lower miss rates for most of the benchmarks.

previous article A deadlock-free routing algorithm for irregular 3D network-on-chips with wireless links

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

Jl H, Da P (2007) Computer architecture: a quantitative approach, vol 4. Kaufmann Publishers, San Francisco

Moore GE et al (1998) Cramming more components onto integrated circuits. Proc IEEE 86(1):82–85CrossRef

Kagi A, Goodman JR, Burger D (1996) Memory bandwidth limitations of future microprocessors. In: Computer Architecture, 23rd Annual International Symposium on IEEE 1996, pp 78–78

Patt YN, Patel SJ, Evers M, Friendly DH, Stark J (1997) One billion transistors, one uniprocessor, one chip. Computer 30(9):51–57CrossRef

Agarwal A, Pudar SD (1993) Column-associative caches: a technique for reducing the miss rate of direct-mapped caches, vol 21 (2). ASM

Jouppi NP (1990) Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers. In: Computer Architecture, Proceedings of the 17th Annual International Symposium on IEEE 1990, pp 364–373

Smith AJ (1982) Cache memories. ACM Comput Surv (CSUR) 14(3):473–530MathSciNetCrossRef

Seznec A (1993) A case for two-way skewed-associative caches. In: ACM SIGARCH computer architecture news, vol 21(2). ACM, pp 169–178

Kharbutli M, Irwin K, Solihin Y, Lee J (2004) Using prime numbers for cache indexing to eliminate conflict misses. In: Software, IEE Proceedings of IEEE, pp 288–299

10.

Bodin F, Seznec A (1997) Skewed-associativity improves performance and enhances predictability. IEEE Trans Comput 46(5):530–544CrossRef

11.

Bodin F, Seznec A (1995) Skewed associativity enhances performance predictability. In: Proceedings of the 22nd annual international symposium on computer architecture, 22–24 June 1995. S. Margherita Ligure, Italy, pp 265–274

12.

Bugnion E, Anderson JM, Mowry TC, Rosenblum M, Lam MS (1996) Compiler-directed page coloring for multiprocessors. In: ACM SIGPLAN notices, vol 31(9). ACM, pp 244–255

13.

Kessler RE, Hill MD (1992) Page placement algorithms for large real-indexed caches. ACM Trans Comput Syst (TOCS) 10(4):338–359CrossRef

14.

Zhang C (2006) Balanced cache: reducing conflict misses of direct-mapped caches. ACM SIGARCH Compu Archit News 34(2):155–166CrossRef

15.

Ros A, Xekalakis P, Cintra M, Acacio ME, Garcia JM (2015) Adaptive selection of cache indexing bits for removing conflict misses. IEEE Trans Comput 64(6):1534–1547MathSciNetMATH

16.

Qureshi MK, Thompson D, Patt YN (2005) The v-way cache: demand-based associativity via global replacement. In: Computer Architecture, ISCA’05. Proceedings of the 32nd International Symposium on IEEE 2005, pp 544–555

17.

Rolán D, Fraguela BB, Doallo R (2009) Adaptive line placement with the set balancing cache. In: Microarchitecture, MICRO-42. 42nd Annual IEEE/ACM International Symposium on IEEE 2009, pp 529–540

18.

González A, Valero M, Topham N, Parcerisa JM (1997) Eliminating cache conflict misses through xor-based placement functions. In: Proceedings of the 11th International Conference on Supercomputing. ACM, pp 76–83

19.

Rau BR (1991) Pseudo-randomly interleaved memory. In: ACM SIGARCH computer architecture news, vol 19(3). ACM, pp 74–83

20.

Givargis T (2003) Improved indexing for cache miss reduction in embedded systems. In: Design Automation Conference, Proceedings of IEEE 2003, pp 875–880

21.

Johnson TL, Connors DA, Merten MC, Hwu W-M (1999) Run-time cache bypassing. IEEE Trans Comput 48(12):1338–1354CrossRef

22.

Collins JD, Tullsen DM (2001) Runtime identification of cache conflict misses: the adaptive miss buffer. ACM Trans Comput Syst (TOCS) 19(4):413–439CrossRef

23.

Balasubramonian R, Albonesi D, Buyuktosunoglu A, Dwarkadas S (2000) Memory hierarchy reconfiguration for energy and performance in general-purpose processor architectures. In: Proceedings of the 33rd Annual ACM/IEEE International Symposium on Microarchitecture. ACM, pp 245–257

24.

Chiou D, Jain P, Devadas S, Rudolph L (2000) Cache partitioning via columnization. In: Proceedings of Design Automation Conference. minus 0.4emCiteseer

25.

Ranganathan P, Adve S, Jouppi NP (2000) Reconfigurable caches and their application to media processing, vol 28(2). ACM

26.

Bershad BN, Lee D, Romer TH, Chen JB (1994) Avoiding conflict misses dynamically in large direct-mapped caches. In: ACM SIGPLAN notices, vol 29(11). ACM, pp. 158–170

27.

Sherwood T, Calder B, Emer J (1999) Reducing cache misses using hardware and software page placement. In: Proceedings of the 13th International Conference on Supercomputing. ACM, pp 155–164

28.

Chu Y, Ito MR (2000) The 2-way thrashing-avoidance cache (tac): an efficient instruction cache scheme for object-oriented languages. In: Computer Design, Proceedings of 2000 International Conference on IEEE, pp 93–98

29.

Chu Y, Ito M (2001) An efficient instruction cache scheme for object-oriented languages. In: Performance, Computing, and Communications, IEEE International Conference on IEEE, pp 329–336

30.

Calder B, Grunwald D, Zorn B (1994) Quantifying behavioral differences between c and c++ programs. J Program Lang 2(4):313–351

31.

Das S, Kapoor HK (2015) Dynamic associativity management using utility based way-sharing. In: Proceedings of the 30th Annual ACM Symposium on Applied Computing. ACM, pp 1919–1924

32.

Das S, Kapoor HK (2013) Dynamic associativity management using fellow sets. In: Electronic System Design (ISED), International Symposium on IEEE 2013, pp 133–137

33.

Salwan H (2013) Global conflict avoidance using block placement strategies in multi-level caches. In: Information and Communication Technologies (ICT), 2013 IEEE Conference on IEEE, pp 1221–1226

34.

Rolán D, Fraguela BB, Doallo R (2010) Reducing capacity and conflict misses using set saturation levels. In: High Performance Computing (HiPC), 2010 International Conference on IEEE, pp 1–9

35.

Jia X, Jiang J, Ni X, Zhao T, Qi S, Fu G, Zhang M (2011) Understanding how non-uniform distribution of memory accesses on cache sets affects the system performance of chip multiprocessors. In: Parallel and Distributed Processing with Applications Workshops (ISPAW), Ninth IEEE International Symposium on IEEE 2011, pp 266–272

36.

Wang B, Liu Z, Wang X, Yu W (2015) Eliminating intra-warp conflict misses in GPU. In: Proceedings of the 2015 Design, Automation and Test in Europe Conference and Exhibition. EDA Consortium, pp 689–694

37.

Hong C, Bao W, Cohen A, Krishnamoorthy S, Pouchet L-N, Rastello F, Ramanujam J, Sadayappan P (2016) Effective padding of multidimensional arrays to avoid cache conflict misses. In: ACM SIGPLAN notices, vol 51(6). ACM, pp 129–144

38.

Khairy M, Zahran M, Wassal A (2017) Sacat: streaming-aware conflict-avoiding thrashing-resistant gpgpu cache management scheme. IEEE Trans Parallel Distrib Syst 28(6):1740–1753CrossRef

39.

Sato Y, Endo T (2017) An accurate simulator of cache-line conflicts to exploit the underlying cache performance. In: European Conference on Parallel Processing. Springer, pp 119–133

40.

Austin T, Larson E, Ernst D (2002) Simplescalar: an infrastructure for computer system modeling. Computer 35(2):59–67CrossRef

41.

Ortego PM, Sack P (2004) Sesc: Superescalar simulator. In: 17th Euro Micro Conference on Real Time Systems (ECRTS05), pp 1–4

42.

Spec cpu benchmarks. http://www.spec.org/benchmarks.html

43.

Spec cpu2000. http://www.spec.org/cpu2000

44.

Spec cpu2006. http://www.spec.org/cpu2006

45.

Beckmann N, Sanchez D (2015) Talus: a simple way to remove cliffs in cache performance. In: High Performance Computer Architecture (HPCA), IEEE 21st International Symposium on IEEE 2015, pp 64–75

Title: Reducing the second-level cache conflict misses using a set folding technique
Authors: Ali Shatnawi
Mohammad Alsaedeen
Publication date: 01-11-2017
Publisher: Springer US
Published in: The Journal of Supercomputing / Issue 2/2018
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI: https://doi.org/10.1007/s11227-017-2174-8

Springer Professional

Abstract

Please log in to get access to your license.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft"

Springer Professional "Technik"

Springer Professional "Wirtschaft+Technik"

Other articles of this Issue 2/2018

Parallelization of large vector similarity computations in a hybrid CPU+GPU environment

Toward the design of a novel hybrid parallel N-body method in scope of modern cloud architectures

A new clustering approach in wireless sensor networks using fuzzy system

Large-scale simulation of a self-organizing self-management cloud computing framework

Special section: towards high performance computing in the cloud

Dlog: diagnosing router events with syslogs for anomaly detection

Premium Partner