Part of the book series: Synthesis Lectures on Computer Architecture ((SLCA))

Abstract

For many years, most computer architects have pursued one primary goal: performance. Architects have translated the ever-increasing abundance of ever-faster transistors provided by Moore’s law into remarkable increases in performance. Recently, however, the bounty provided by Moore’s law has been accompanied by several challenges that have arisen as devices have become smaller, including a decrease in dependability due to physical faults. In this book, we focus on the dependability challenge and the fault tolerance solutions that architects are developing to overcome it.

Copyright information

© 2009 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Sorin, D. (2009). Introduction. In: Fault Tolerant Computer Architecture. Synthesis Lectures on Computer Architecture. Springer, Cham. https://doi.org/10.1007/978-3-031-01723-0_1
