Skip to main content
Top
Published in: The Journal of Supercomputing 2/2018

01-11-2017

Reducing the second-level cache conflict misses using a set folding technique

Authors: Ali Shatnawi, Mohammad Alsaedeen

Published in: The Journal of Supercomputing | Issue 2/2018

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

The cache memory has a direct effect on the performance of a computer system. Instructions and data are fetched from a fast cache instead of a slow memory to save hundreds of cycles. Reducing the cache miss ratio will definitely improve the execution time of an application. In this work, we propose cache memory designs that reduce the number of conflict misses significantly. The proposed designs reduce the conflict misses in the last level multi-way set associative cache. Each set is divided into a group of subsets: the first is referred to as the exclusive subset, and the rest are the shared subsets. The exclusive is configured as a traditional cache where each block is mapped to the set whose index matches the block index. In addition to their standard cache indexing role, the shared subsets are configured to host blocks with different indices. A memory block can be mapped to one subset from the exclusive type or one of multiple subsets from the shared type. Since the proposed technique is based on combining multiple sets of the shared part to form a larger set, that is shared between memory blocks with different indices, we have chosen the name “set folding.” The decision as to where to map a memory block depends on the number of misses encountered at each of the potential target sets. To evaluate the proposed design based on the overall hit rate, twenty-three benchmarks from SPEC CPU 2006 were simulated using the SuperESCalar simulator. The proposed designs require a few extra storage bits which adds a small overhead on the hardware complexity in comparison with the conventional cache. However, the proposed designs achieve lower miss rates for most of the benchmarks.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Literature
1.
go back to reference Jl H, Da P (2007) Computer architecture: a quantitative approach, vol 4. Kaufmann Publishers, San Francisco Jl H, Da P (2007) Computer architecture: a quantitative approach, vol 4. Kaufmann Publishers, San Francisco
2.
go back to reference Moore GE et al (1998) Cramming more components onto integrated circuits. Proc IEEE 86(1):82–85CrossRef Moore GE et al (1998) Cramming more components onto integrated circuits. Proc IEEE 86(1):82–85CrossRef
3.
go back to reference Kagi A, Goodman JR, Burger D (1996) Memory bandwidth limitations of future microprocessors. In: Computer Architecture, 23rd Annual International Symposium on IEEE 1996, pp 78–78 Kagi A, Goodman JR, Burger D (1996) Memory bandwidth limitations of future microprocessors. In: Computer Architecture, 23rd Annual International Symposium on IEEE 1996, pp 78–78
4.
go back to reference Patt YN, Patel SJ, Evers M, Friendly DH, Stark J (1997) One billion transistors, one uniprocessor, one chip. Computer 30(9):51–57CrossRef Patt YN, Patel SJ, Evers M, Friendly DH, Stark J (1997) One billion transistors, one uniprocessor, one chip. Computer 30(9):51–57CrossRef
5.
go back to reference Agarwal A, Pudar SD (1993) Column-associative caches: a technique for reducing the miss rate of direct-mapped caches, vol 21 (2). ASM Agarwal A, Pudar SD (1993) Column-associative caches: a technique for reducing the miss rate of direct-mapped caches, vol 21 (2). ASM
6.
go back to reference Jouppi NP (1990) Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers. In: Computer Architecture, Proceedings of the 17th Annual International Symposium on IEEE 1990, pp 364–373 Jouppi NP (1990) Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers. In: Computer Architecture, Proceedings of the 17th Annual International Symposium on IEEE 1990, pp 364–373
8.
go back to reference Seznec A (1993) A case for two-way skewed-associative caches. In: ACM SIGARCH computer architecture news, vol 21(2). ACM, pp 169–178 Seznec A (1993) A case for two-way skewed-associative caches. In: ACM SIGARCH computer architecture news, vol 21(2). ACM, pp 169–178
9.
go back to reference Kharbutli M, Irwin K, Solihin Y, Lee J (2004) Using prime numbers for cache indexing to eliminate conflict misses. In: Software, IEE Proceedings of IEEE, pp 288–299 Kharbutli M, Irwin K, Solihin Y, Lee J (2004) Using prime numbers for cache indexing to eliminate conflict misses. In: Software, IEE Proceedings of IEEE, pp 288–299
10.
go back to reference Bodin F, Seznec A (1997) Skewed-associativity improves performance and enhances predictability. IEEE Trans Comput 46(5):530–544CrossRef Bodin F, Seznec A (1997) Skewed-associativity improves performance and enhances predictability. IEEE Trans Comput 46(5):530–544CrossRef
11.
go back to reference Bodin F, Seznec A (1995) Skewed associativity enhances performance predictability. In: Proceedings of the 22nd annual international symposium on computer architecture, 22–24 June 1995. S. Margherita Ligure, Italy, pp 265–274 Bodin F, Seznec A (1995) Skewed associativity enhances performance predictability. In: Proceedings of the 22nd annual international symposium on computer architecture, 22–24 June 1995. S. Margherita Ligure, Italy, pp 265–274
12.
go back to reference Bugnion E, Anderson JM, Mowry TC, Rosenblum M, Lam MS (1996) Compiler-directed page coloring for multiprocessors. In: ACM SIGPLAN notices, vol 31(9). ACM, pp 244–255 Bugnion E, Anderson JM, Mowry TC, Rosenblum M, Lam MS (1996) Compiler-directed page coloring for multiprocessors. In: ACM SIGPLAN notices, vol 31(9). ACM, pp 244–255
13.
go back to reference Kessler RE, Hill MD (1992) Page placement algorithms for large real-indexed caches. ACM Trans Comput Syst (TOCS) 10(4):338–359CrossRef Kessler RE, Hill MD (1992) Page placement algorithms for large real-indexed caches. ACM Trans Comput Syst (TOCS) 10(4):338–359CrossRef
14.
go back to reference Zhang C (2006) Balanced cache: reducing conflict misses of direct-mapped caches. ACM SIGARCH Compu Archit News 34(2):155–166CrossRef Zhang C (2006) Balanced cache: reducing conflict misses of direct-mapped caches. ACM SIGARCH Compu Archit News 34(2):155–166CrossRef
15.
go back to reference Ros A, Xekalakis P, Cintra M, Acacio ME, Garcia JM (2015) Adaptive selection of cache indexing bits for removing conflict misses. IEEE Trans Comput 64(6):1534–1547MathSciNetMATH Ros A, Xekalakis P, Cintra M, Acacio ME, Garcia JM (2015) Adaptive selection of cache indexing bits for removing conflict misses. IEEE Trans Comput 64(6):1534–1547MathSciNetMATH
16.
go back to reference Qureshi MK, Thompson D, Patt YN (2005) The v-way cache: demand-based associativity via global replacement. In: Computer Architecture, ISCA’05. Proceedings of the 32nd International Symposium on IEEE 2005, pp 544–555 Qureshi MK, Thompson D, Patt YN (2005) The v-way cache: demand-based associativity via global replacement. In: Computer Architecture, ISCA’05. Proceedings of the 32nd International Symposium on IEEE 2005, pp 544–555
17.
go back to reference Rolán D, Fraguela BB, Doallo R (2009) Adaptive line placement with the set balancing cache. In: Microarchitecture, MICRO-42. 42nd Annual IEEE/ACM International Symposium on IEEE 2009, pp 529–540 Rolán D, Fraguela BB, Doallo R (2009) Adaptive line placement with the set balancing cache. In: Microarchitecture, MICRO-42. 42nd Annual IEEE/ACM International Symposium on IEEE 2009, pp 529–540
18.
go back to reference González A, Valero M, Topham N, Parcerisa JM (1997) Eliminating cache conflict misses through xor-based placement functions. In: Proceedings of the 11th International Conference on Supercomputing. ACM, pp 76–83 González A, Valero M, Topham N, Parcerisa JM (1997) Eliminating cache conflict misses through xor-based placement functions. In: Proceedings of the 11th International Conference on Supercomputing. ACM, pp 76–83
19.
go back to reference Rau BR (1991) Pseudo-randomly interleaved memory. In: ACM SIGARCH computer architecture news, vol 19(3). ACM, pp 74–83 Rau BR (1991) Pseudo-randomly interleaved memory. In: ACM SIGARCH computer architecture news, vol 19(3). ACM, pp 74–83
20.
go back to reference Givargis T (2003) Improved indexing for cache miss reduction in embedded systems. In: Design Automation Conference, Proceedings of IEEE 2003, pp 875–880 Givargis T (2003) Improved indexing for cache miss reduction in embedded systems. In: Design Automation Conference, Proceedings of IEEE 2003, pp 875–880
21.
go back to reference Johnson TL, Connors DA, Merten MC, Hwu W-M (1999) Run-time cache bypassing. IEEE Trans Comput 48(12):1338–1354CrossRef Johnson TL, Connors DA, Merten MC, Hwu W-M (1999) Run-time cache bypassing. IEEE Trans Comput 48(12):1338–1354CrossRef
22.
go back to reference Collins JD, Tullsen DM (2001) Runtime identification of cache conflict misses: the adaptive miss buffer. ACM Trans Comput Syst (TOCS) 19(4):413–439CrossRef Collins JD, Tullsen DM (2001) Runtime identification of cache conflict misses: the adaptive miss buffer. ACM Trans Comput Syst (TOCS) 19(4):413–439CrossRef
23.
go back to reference Balasubramonian R, Albonesi D, Buyuktosunoglu A, Dwarkadas S (2000) Memory hierarchy reconfiguration for energy and performance in general-purpose processor architectures. In: Proceedings of the 33rd Annual ACM/IEEE International Symposium on Microarchitecture. ACM, pp 245–257 Balasubramonian R, Albonesi D, Buyuktosunoglu A, Dwarkadas S (2000) Memory hierarchy reconfiguration for energy and performance in general-purpose processor architectures. In: Proceedings of the 33rd Annual ACM/IEEE International Symposium on Microarchitecture. ACM, pp 245–257
24.
go back to reference Chiou D, Jain P, Devadas S, Rudolph L (2000) Cache partitioning via columnization. In: Proceedings of Design Automation Conference. minus 0.4emCiteseer Chiou D, Jain P, Devadas S, Rudolph L (2000) Cache partitioning via columnization. In: Proceedings of Design Automation Conference. minus 0.4emCiteseer
25.
go back to reference Ranganathan P, Adve S, Jouppi NP (2000) Reconfigurable caches and their application to media processing, vol 28(2). ACM Ranganathan P, Adve S, Jouppi NP (2000) Reconfigurable caches and their application to media processing, vol 28(2). ACM
26.
go back to reference Bershad BN, Lee D, Romer TH, Chen JB (1994) Avoiding conflict misses dynamically in large direct-mapped caches. In: ACM SIGPLAN notices, vol 29(11). ACM, pp. 158–170 Bershad BN, Lee D, Romer TH, Chen JB (1994) Avoiding conflict misses dynamically in large direct-mapped caches. In: ACM SIGPLAN notices, vol 29(11). ACM, pp. 158–170
27.
go back to reference Sherwood T, Calder B, Emer J (1999) Reducing cache misses using hardware and software page placement. In: Proceedings of the 13th International Conference on Supercomputing. ACM, pp 155–164 Sherwood T, Calder B, Emer J (1999) Reducing cache misses using hardware and software page placement. In: Proceedings of the 13th International Conference on Supercomputing. ACM, pp 155–164
28.
go back to reference Chu Y, Ito MR (2000) The 2-way thrashing-avoidance cache (tac): an efficient instruction cache scheme for object-oriented languages. In: Computer Design, Proceedings of 2000 International Conference on IEEE, pp 93–98 Chu Y, Ito MR (2000) The 2-way thrashing-avoidance cache (tac): an efficient instruction cache scheme for object-oriented languages. In: Computer Design, Proceedings of 2000 International Conference on IEEE, pp 93–98
29.
go back to reference Chu Y, Ito M (2001) An efficient instruction cache scheme for object-oriented languages. In: Performance, Computing, and Communications, IEEE International Conference on IEEE, pp 329–336 Chu Y, Ito M (2001) An efficient instruction cache scheme for object-oriented languages. In: Performance, Computing, and Communications, IEEE International Conference on IEEE, pp 329–336
30.
go back to reference Calder B, Grunwald D, Zorn B (1994) Quantifying behavioral differences between c and c++ programs. J Program Lang 2(4):313–351 Calder B, Grunwald D, Zorn B (1994) Quantifying behavioral differences between c and c++ programs. J Program Lang 2(4):313–351
31.
go back to reference Das S, Kapoor HK (2015) Dynamic associativity management using utility based way-sharing. In: Proceedings of the 30th Annual ACM Symposium on Applied Computing. ACM, pp 1919–1924 Das S, Kapoor HK (2015) Dynamic associativity management using utility based way-sharing. In: Proceedings of the 30th Annual ACM Symposium on Applied Computing. ACM, pp 1919–1924
32.
go back to reference Das S, Kapoor HK (2013) Dynamic associativity management using fellow sets. In: Electronic System Design (ISED), International Symposium on IEEE 2013, pp 133–137 Das S, Kapoor HK (2013) Dynamic associativity management using fellow sets. In: Electronic System Design (ISED), International Symposium on IEEE 2013, pp 133–137
33.
go back to reference Salwan H (2013) Global conflict avoidance using block placement strategies in multi-level caches. In: Information and Communication Technologies (ICT), 2013 IEEE Conference on IEEE, pp 1221–1226 Salwan H (2013) Global conflict avoidance using block placement strategies in multi-level caches. In: Information and Communication Technologies (ICT), 2013 IEEE Conference on IEEE, pp 1221–1226
34.
go back to reference Rolán D, Fraguela BB, Doallo R (2010) Reducing capacity and conflict misses using set saturation levels. In: High Performance Computing (HiPC), 2010 International Conference on IEEE, pp 1–9 Rolán D, Fraguela BB, Doallo R (2010) Reducing capacity and conflict misses using set saturation levels. In: High Performance Computing (HiPC), 2010 International Conference on IEEE, pp 1–9
35.
go back to reference Jia X, Jiang J, Ni X, Zhao T, Qi S, Fu G, Zhang M (2011) Understanding how non-uniform distribution of memory accesses on cache sets affects the system performance of chip multiprocessors. In: Parallel and Distributed Processing with Applications Workshops (ISPAW), Ninth IEEE International Symposium on IEEE 2011, pp 266–272 Jia X, Jiang J, Ni X, Zhao T, Qi S, Fu G, Zhang M (2011) Understanding how non-uniform distribution of memory accesses on cache sets affects the system performance of chip multiprocessors. In: Parallel and Distributed Processing with Applications Workshops (ISPAW), Ninth IEEE International Symposium on IEEE 2011, pp 266–272
36.
go back to reference Wang B, Liu Z, Wang X, Yu W (2015) Eliminating intra-warp conflict misses in GPU. In: Proceedings of the 2015 Design, Automation and Test in Europe Conference and Exhibition. EDA Consortium, pp 689–694 Wang B, Liu Z, Wang X, Yu W (2015) Eliminating intra-warp conflict misses in GPU. In: Proceedings of the 2015 Design, Automation and Test in Europe Conference and Exhibition. EDA Consortium, pp 689–694
37.
go back to reference Hong C, Bao W, Cohen A, Krishnamoorthy S, Pouchet L-N, Rastello F, Ramanujam J, Sadayappan P (2016) Effective padding of multidimensional arrays to avoid cache conflict misses. In: ACM SIGPLAN notices, vol 51(6). ACM, pp 129–144 Hong C, Bao W, Cohen A, Krishnamoorthy S, Pouchet L-N, Rastello F, Ramanujam J, Sadayappan P (2016) Effective padding of multidimensional arrays to avoid cache conflict misses. In: ACM SIGPLAN notices, vol 51(6). ACM, pp 129–144
38.
go back to reference Khairy M, Zahran M, Wassal A (2017) Sacat: streaming-aware conflict-avoiding thrashing-resistant gpgpu cache management scheme. IEEE Trans Parallel Distrib Syst 28(6):1740–1753CrossRef Khairy M, Zahran M, Wassal A (2017) Sacat: streaming-aware conflict-avoiding thrashing-resistant gpgpu cache management scheme. IEEE Trans Parallel Distrib Syst 28(6):1740–1753CrossRef
39.
go back to reference Sato Y, Endo T (2017) An accurate simulator of cache-line conflicts to exploit the underlying cache performance. In: European Conference on Parallel Processing. Springer, pp 119–133 Sato Y, Endo T (2017) An accurate simulator of cache-line conflicts to exploit the underlying cache performance. In: European Conference on Parallel Processing. Springer, pp 119–133
40.
go back to reference Austin T, Larson E, Ernst D (2002) Simplescalar: an infrastructure for computer system modeling. Computer 35(2):59–67CrossRef Austin T, Larson E, Ernst D (2002) Simplescalar: an infrastructure for computer system modeling. Computer 35(2):59–67CrossRef
41.
go back to reference Ortego PM, Sack P (2004) Sesc: Superescalar simulator. In: 17th Euro Micro Conference on Real Time Systems (ECRTS05), pp 1–4 Ortego PM, Sack P (2004) Sesc: Superescalar simulator. In: 17th Euro Micro Conference on Real Time Systems (ECRTS05), pp 1–4
45.
go back to reference Beckmann N, Sanchez D (2015) Talus: a simple way to remove cliffs in cache performance. In: High Performance Computer Architecture (HPCA), IEEE 21st International Symposium on IEEE 2015, pp 64–75 Beckmann N, Sanchez D (2015) Talus: a simple way to remove cliffs in cache performance. In: High Performance Computer Architecture (HPCA), IEEE 21st International Symposium on IEEE 2015, pp 64–75
Metadata
Title
Reducing the second-level cache conflict misses using a set folding technique
Authors
Ali Shatnawi
Mohammad Alsaedeen
Publication date
01-11-2017
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 2/2018
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-017-2174-8

Other articles of this Issue 2/2018

The Journal of Supercomputing 2/2018 Go to the issue

Premium Partner