2017 | OriginalPaper | Chapter

7. Work-Unit Tolerance

Authors: Abbas Rahimi, Luca Benini, Rajesh K. Gupta

Published in: From Variability Tolerance to Approximate Computing in Parallel Integrated Architectures and Accelerators

Publisher: Springer International Publishing

Abstract

Manufacturing and environmental variations cause timing errors in microelectronic processors. These errors are typically avoided by ultraconservative multi-corner design margins or corrected by error detection and recovery mechanisms at the circuit level. In contrast, this chapter presents runtime software support for cost-effective countermeasures against hardware timing failures during system operation. We propose a variability-aware OpenMP (VOMP) programming environment that is suitable for tightly coupled shared-memory processor clusters and relies on modeling across the hardware/software interface. VOMP is implemented as an extension to the OpenMP v3.0 programming model and covers various parallel constructs, including task, sections, and for. Using the notion of work-unit vulnerability (WUV) proposed here, we capture timing errors caused by circuit-level variability as high-level software knowledge. WUV consists of descriptive metadata that characterizes the impact of variability on different work-unit types running on various cores. As such, WUV provides a useful abstraction of hardware variability for efficiently allocating a given work-unit to a suitable core for execution. VOMP enables hardware/software collaboration by combining online variability monitors in hardware with runtime scheduling in software. The hardware provides online per-core characterization of WUV metadata. This metadata is made available to the VOMP schedulers by carefully placing key data structures in a shared L1 memory. Our results show that VOMP reduces the cost of timing error recovery compared to the baseline OpenMP schedulers, yielding speedups of 3–36% for tasks and 26–49% for sections. Further, VOMP achieves energy savings of 2–46% for tasks and 15–50% for sections.
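To make the allocation idea concrete, the following is a minimal sketch of how a scheduler might consult per-core WUV metadata when dispatching a work-unit. It is not the chapter's implementation: the abstract does not define how WUV is encoded, so the sketch assumes a single number per (work-unit type, core) pair where a lower value means a lower expected recovery cost. The names `wuv_table_t` and `vomp_pick_core`, and the sizes `NUM_CORES` and `NUM_TYPES`, are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_CORES 4 /* hypothetical cluster size */
#define NUM_TYPES 2 /* hypothetical number of work-unit types */

/* Hypothetical WUV metadata: wuv[type][core].
 * Assumption: lower value = lower expected timing-error recovery cost.
 * In the setting described above, such a table would live in shared L1
 * memory and be refreshed from the hardware variability monitors. */
typedef struct {
    double wuv[NUM_TYPES][NUM_CORES];
} wuv_table_t;

/* Return the idle core with the lowest WUV for the given work-unit type,
 * or -1 if no core is idle. Ties go to the lowest core index. */
int vomp_pick_core(const wuv_table_t *t, size_t type,
                   const bool idle[NUM_CORES])
{
    assert(t != NULL && type < NUM_TYPES);

    int best = -1;
    for (int c = 0; c < NUM_CORES; ++c) {
        if (!idle[c])
            continue; /* busy cores are not candidates */
        if (best < 0 || t->wuv[type][c] < t->wuv[type][best])
            best = c;
    }
    return best;
}
```

A baseline scheduler would hand the work-unit to whichever core asks first. The sketch differs only in that it ranks the idle cores by their vulnerability for that work-unit type, which is the kind of decision the WUV abstraction is meant to enable.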

Synchronizing between multiple clock domains takes 8 cycles per read/write operation, whereas the performance of the architecture relies on 2-cycle access to L1 memory.

Our platform has no control over errors that occur while executing library code. Functionality is still preserved because each core is equipped with the replay mechanism.

Proportional to the number of errant instructions.

Up to a few tens, for large programs.

There is a 1:1 correspondence between threads and cores, thus we will use the two terms interchangeably.

In our applications, it is selected as 2 iterations.

Title: Work-Unit Tolerance
Authors: Abbas Rahimi, Luca Benini, Rajesh K. Gupta
Publisher: Springer International Publishing
Book: From Variability Tolerance to Approximate Computing in Parallel Integrated Architectures and Accelerators
Print ISBN: 978-3-319-53767-2

Electronic ISBN: 978-3-319-53768-9

Copyright Year: 2017
DOI: https://doi.org/10.1007/978-3-319-53768-9_7
