2017 | OriginalPaper | Chapter

7. Work-Unit Tolerance

Abstract

Manufacturing and environmental variations cause timing errors in microelectronic processors. These errors are typically avoided by ultraconservative multi-corner design margins or corrected by error detection and recovery mechanisms at the circuit level. In contrast, this chapter presents runtime software support for cost-effective countermeasures against hardware timing failures during system operation. We propose a variability-aware OpenMP (VOMP) programming environment, suitable for tightly coupled shared-memory processor clusters, that relies on modeling across the hardware/software interface. VOMP is implemented as an extension to the OpenMP v3.0 programming model and covers various parallel constructs, including task, sections, and for. Using the notion of work-unit vulnerability (WUV) proposed here, we capture timing errors caused by circuit-level variability as high-level software knowledge. WUV consists of descriptive metadata that characterizes the impact of variability on different work-unit types running on various cores. As such, WUV provides a useful abstraction of hardware variability for efficiently allocating a given work-unit to a suitable core for execution. VOMP enables hardware/software collaboration by combining online variability monitors in hardware with runtime scheduling in software. The hardware provides online per-core characterization of WUV metadata. This metadata is made available to the VOMP schedulers by carefully placing key data structures in a shared L1 memory. Our results show that VOMP greatly reduces the cost of timing error recovery compared to the baseline OpenMP schedulers, yielding speedups of 3–36% for tasks and 26–49% for sections. Further, VOMP achieves energy savings of 2–46% for tasks and 15–50% for sections.
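The allocation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' VOMP runtime: it assumes, purely for illustration, that WUV metadata can be reduced to one numeric score per (work-unit type, core) pair, where a lower score means fewer expected timing-error recoveries. The names `pick_core`, `schedule`, and the table layout are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical WUV table: wuv[work_unit_type][core_id] -> vulnerability score.
# Assumption: a lower score means the core is less error-prone for that type.
WuvTable = Dict[str, Dict[int, float]]


def pick_core(wuv: WuvTable, unit_type: str, idle_cores: Iterable[int]) -> Optional[int]:
    """Return the idle core with the lowest WUV score for this work-unit type."""
    scores = wuv.get(unit_type, {})
    candidates = [c for c in idle_cores if c in scores]
    if not candidates:
        # No characterized idle core; a real runtime would fall back to its
        # baseline scheduler here.
        return None
    # Ties are broken by the lower core id so the choice is deterministic.
    return min(candidates, key=lambda c: (scores[c], c))


def schedule(wuv: WuvTable, queue: List[str], cores: Iterable[int]) -> List[Tuple[str, int]]:
    """Greedily assign queued work-units (listed by type) to distinct idle cores."""
    idle = set(cores)
    assignment = []
    for unit_type in queue:
        core = pick_core(wuv, unit_type, idle)
        if core is None:
            break  # nothing suitable is idle; stop assigning in this round
        idle.discard(core)
        assignment.append((unit_type, core))
    return assignment
```

With a table in which core 1 is least vulnerable for one type and core 0 for another, `schedule` sends each work-unit to its least vulnerable idle core first and gives the remaining core to whatever is left. The greedy order is a simplification chosen for brevity, not a claim about how the VOMP schedulers prioritize work.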


Footnotes
1. Eight cycles are required for synchronization between multiple clock domains for a read/write operation, while the performance of the architecture relies on 2-cycle access to the L1 memory.

2. Our platform has no control over errors that occur while executing library code. Functionality is preserved because each core is equipped with the replay mechanism.

3. Proportional to the number of errant instructions.

4. Up to a few tens, for large programs.

5. There is a 1:1 correspondence between threads and cores, so we use the two terms interchangeably.

6. In our applications, it is set to 2 iterations.
 
Metadata
Title
Work-Unit Tolerance
Authors
Abbas Rahimi
Luca Benini
Rajesh K. Gupta
Copyright Year
2017
DOI
https://doi.org/10.1007/978-3-319-53768-9_7