Abstract
Transient faults are expected to be a major design consideration in future microprocessors. Recent proposals for transient fault detection in processor cores have revolved around the idea of redundant threading, which involves redundant execution of a program across multiple execution contexts. This paper presents a new approach to redundant threading that brings together the concepts of slice-level execution and value and control-flow locality into a novel partial redundant threading mechanism called SlicK. The purpose of redundant execution is to check the integrity of the outputs propagating out of the core (typically through stores). SlicK implements redundancy at the granularity of the backward slices of these output instructions and exploits value and control-flow locality to avoid redundantly executing slices that lead to predictable outputs. This avoids redundant execution of a significant fraction of instructions while maintaining extremely low vulnerabilities for critical processor structures. We propose the microarchitecture of a backward-slice extractor called SliceEM that can identify backward slices without interrupting the instruction flow, and show how this extractor and a set of predictors can be integrated into a redundant threading mechanism to form SlicK. Detailed simulations with SPEC CPU2000 benchmarks show that SlicK provides around 10.2% performance improvement over a well-known redundant threading mechanism, buying back over 50% of the loss suffered due to redundant execution. SlicK typically keeps the Architectural Vulnerability Factors (AVFs) of processor structures within 0%-2%. More importantly, SlicK's slice-based mechanisms open future opportunities for exploring interesting points in the performance-reliability design space based on market segment needs.
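The core idea above, re-executing only the backward slices of stores whose outputs are not predictable, can be illustrated with a minimal software sketch. Everything in it is hypothetical and simplified: the `Inst` trace record, the `backward_slice` walk (register dataflow only, ignoring memory and control dependences), and `LastValuePredictor`, a last-value predictor with a 2-bit confidence counter that stands in for whatever predictors SlicK actually uses. It is not the SliceEM hardware design and does not model leading/trailing thread communication; it only shows the selection logic.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Inst:
    """One dynamic instruction in a hypothetical, simplified trace."""
    pc: int
    dest: Optional[str]          # destination register (None for stores)
    srcs: Tuple[str, ...]        # source registers (for a store: value and address regs)
    is_store: bool = False       # stores are the "outputs" leaving the core


def backward_slice(trace: List[Inst], store_idx: int) -> Set[int]:
    """Trace indices the store at store_idx depends on via register dataflow.

    Simplification: memory dependences and control dependences are ignored.
    """
    needed = set(trace[store_idx].srcs)  # registers whose producers are still unresolved
    members = {store_idx}
    for i in range(store_idx - 1, -1, -1):
        inst = trace[i]
        if inst.dest is not None and inst.dest in needed:
            members.add(i)               # most recent producer of a needed register
            needed.discard(inst.dest)    # older writes of this register are shadowed
            needed.update(inst.srcs)     # ...but this producer's inputs are now needed
    return members


class LastValuePredictor:
    """Hypothetical store-value predictor: last value plus a 2-bit confidence counter."""

    def __init__(self, threshold: int = 2) -> None:
        self.table: Dict[int, Tuple[Optional[int], int]] = {}  # pc -> (last value, confidence)
        self.threshold = threshold

    def check_and_update(self, pc: int, value: int) -> bool:
        """True if value matched a confident prediction; then train on value."""
        last, conf = self.table.get(pc, (None, 0))
        hit = last == value and conf >= self.threshold
        conf = min(conf + 1, 3) if last == value else 0
        self.table[pc] = (value, conf)
        return hit


def instructions_to_reexecute(trace: List[Inst], store_values: Dict[int, int],
                              predictor: LastValuePredictor) -> Set[int]:
    """Union of backward slices of stores whose leading-thread value was NOT predictable.

    store_values maps the trace index of each store to the value the leading
    thread produced; confidently predicted stores are treated as already checked.
    """
    reexec: Set[int] = set()
    for idx, inst in enumerate(trace):
        if inst.is_store and not predictor.check_and_update(inst.pc, store_values[idx]):
            reexec |= backward_slice(trace, idx)
    return reexec


if __name__ == "__main__":
    trace = [
        Inst(0x10, "r1", ()),                  # r1 = constant
        Inst(0x14, "r2", ("r1",)),             # r2 = r1 + 4
        Inst(0x18, "r3", ()),                  # r3 = constant (feeds no store here)
        Inst(0x1C, None, ("r2", "r0"), True),  # store r2 -> [r0]
    ]
    pred = LastValuePredictor()
    for iteration in range(4):  # same code, same store value, run repeatedly
        print(iteration, sorted(instructions_to_reexecute(trace, {3: 5}, pred)))
```

Running the example prints the slice `[0, 1, 3]` for iterations 0 to 2, while the predictor builds confidence, and `[]` for iteration 3, once the store value is predicted confidently and its slice is skipped. Instruction 2 never appears because it feeds no store in this tiny window; a real design would also have to account for branches (control-flow locality) and memory dependences.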
SlicK: slice-based locality exploitation for efficient redundant multithreading
ASPLOS XII: Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems