Abstract
For many years, most computer architects have pursued one primary goal: performance. Architects have translated the ever-increasing abundance of ever-faster transistors provided by Moore’s law into remarkable increases in performance. Recently, however, the bounty of Moore’s law has been accompanied by several challenges that arise as devices become smaller, including a decrease in dependability due to physical faults. In this book, we focus on the dependability challenge and the fault tolerance solutions that architects are developing to overcome it.
Copyright information
© 2009 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Sorin, D. (2009). Introduction. In: Fault Tolerant Computer Architecture. Synthesis Lectures on Computer Architecture. Springer, Cham. https://doi.org/10.1007/978-3-031-01723-0_1
DOI: https://doi.org/10.1007/978-3-031-01723-0_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-00595-4
Online ISBN: 978-3-031-01723-0
eBook Packages: Synthesis Collection of Technology (R0), eBColl Synthesis Collection 2