Published in: Journal of Electronic Testing 1/2020

18-02-2020

Modeling Remapping Based Fault Tolerance Techniques for Chip Multiprocessor Cache with Design Space Exploration

Authors: Avishek Choudhury, Biplab K. Sikdar



Abstract

On top of wear-out failures and external particle strikes, voltage scaling used to reduce power consumption in multiprocessors makes caches more vulnerable to cell failures. Because voltage reduction is indispensable for prolonging the battery life of handheld devices, fault tolerance techniques are essential to ensure fault-free execution at near-threshold voltage. Several fault tolerance techniques have been proposed, and remapping-based techniques have been found effective for fault tolerance in single-core systems. This work proposes an analytical model for remapping-based fault tolerance techniques to evaluate the effectiveness of such schemes in multicore systems. Two metrics, Expected Miss Ratio in Multicore (EMRMC) and Expected Latency Ratio in Multicore (ELRMC), are introduced to characterize the behavior of remapping-based techniques. EMRMC and ELRMC are defined as functions of the probability of cell failure (Pfail), block size, number of cores, and number of threads. The system is simulated in Multi2Sim 5.0, a multicore CPU-GPU simulator. The metrics are analysed for different values of the configuration parameters (probability of cell failure, number of cores, number of blocks, block size, and number of threads) to derive system configuration guidelines that deliver better performance under remapping-based fault tolerance. EMRMC is observed to be proportional to Pfail and block size, inversely proportional to the number of cores and threads, and unaffected by the number of blocks. In contrast, ELRMC is inversely proportional to Pfail and block size and proportional to the number of cores and threads. ELRMC is also observed to be independent of the number of cores and blocks. EMRMC is minimized for Pfail ≤ 10⁻⁴, block size ≤ 64 bytes, number of cores ≥ 4, and number of threads ≥ 2.
ELRMC, on the other hand, is best for Pfail ≤ 10⁻⁴, block size ≥ 64 bytes, number of cores ≥ 4, and a thread count of 2.
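One common way to build intuition for why a miss-ratio metric grows with Pfail and block size, yet does not depend on the number of blocks, is the independent-cell fault model. The sketch below is not the authors' EMRMC or ELRMC model; it assumes that every bit cell fails independently with probability Pfail and that a block is faulty (and must be remapped) if any of its 8 × block-size cells fails. The function names `block_fault_probability` and `expected_faulty_blocks` are hypothetical.

```python
import math


def block_fault_probability(p_fail: float, block_bytes: int) -> float:
    """Probability that a block contains at least one failed bit cell.

    Assumption (illustrative only): each of the 8 * block_bytes cells
    fails independently with probability p_fail.
    """
    bits = 8 * block_bytes
    # 1 - (1 - p)^n, computed via log1p/expm1 to stay accurate for tiny p_fail
    return -math.expm1(bits * math.log1p(-p_fail))


def expected_faulty_blocks(p_fail: float, block_bytes: int, num_blocks: int) -> float:
    """Expected number of faulty blocks in a cache of num_blocks blocks.

    Under the independence assumption the *fraction* of faulty blocks is
    block_fault_probability(...), which does not depend on num_blocks.
    """
    return num_blocks * block_fault_probability(p_fail, block_bytes)


if __name__ == "__main__":
    # Sweep a small design space of cell-failure probabilities and block sizes
    for p_fail in (1e-5, 1e-4, 1e-3):
        for block_bytes in (32, 64, 128):
            pb = block_fault_probability(p_fail, block_bytes)
            print(f"Pfail={p_fail:.0e}  block={block_bytes:4d} B  P(block faulty)={pb:.4f}")
```

Under these assumptions, a 64-byte block at Pfail = 10⁻⁴ is faulty with probability of about 5%, and the probability rises with both Pfail and block size. This is consistent in direction with the reported EMRMC trends, but the per-core and per-thread effects in the paper come from its multicore model and simulation, which this sketch does not capture.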

Publisher: Springer US
Print ISSN: 0923-8174
Electronic ISSN: 1573-0727
DOI: https://doi.org/10.1007/s10836-019-05852-6
