DOI:10.1145/2248418.2248433
research-article

Efficient soft error protection for commodity embedded microprocessors using profile information

Published: 12 June 2012

ABSTRACT

Successive generations of processors use smaller transistors in the quest for more powerful computing systems. Prior studies have shown that smaller transistors make processors more susceptible to soft errors (transient faults caused by high-energy particle strikes). Such errors can result in unexpected behavior and incorrect results. With smaller and cheaper transistors becoming pervasive in mainstream computing, the rising fault rate makes it necessary to protect applications running on commodity processors against soft errors. Existing methods of protecting against such faults generally have high area or performance overheads and thus are not directly applicable in the embedded design space. Detecting these errors is a necessary first step in protecting against them, so that a recovery can be triggered.

To detect soft errors cheaply, we propose a profiling-based, software-only application analysis and transformation solution. The goal is a low-cost solution that can be deployed on off-the-shelf embedded processors. The solution works by intelligently duplicating instructions that are likely to affect the program output, and comparing results between original and duplicated instructions. The choice of instructions is guided by control flow, memory dependence, and value profiling, which are used to understand and exploit the common-case behavior of applications. Our solution is able to achieve 92% fault coverage with a 20% instruction overhead. This represents a 41% lower performance overhead than the best prior approaches with approximately the same fault coverage.
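The duplicate-and-compare idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy three-address instruction set, the `protect` and `run` functions, and the hand-picked `protected` set (standing in for whatever a profile-driven analysis would select) are all assumptions made for illustration. Selected instructions are re-executed into shadow registers (suffix `'`), and a check is inserted before each output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    op: str      # "const", "add", "mul", "out", or "check" (toy ISA, assumed)
    dst: str     # destination register ("" for out/check)
    srcs: tuple  # source registers, or a single literal for "const"


class FaultDetected(Exception):
    pass


def protect(program, protected):
    """Duplicate the instructions whose index is in `protected` into shadow
    registers and insert a check before each "out" whose operand is shadowed.
    `protected` is a hypothetical stand-in for profile-guided selection."""
    shadowed = set()
    result = []
    for i, ins in enumerate(program):
        if ins.op == "out":
            src = ins.srcs[0]
            if src in shadowed:
                result.append(Instr("check", "", (src, src + "'")))
            result.append(ins)
            continue
        result.append(ins)
        if i in protected:
            if ins.op == "const":
                srcs = ins.srcs
            else:
                # The duplicate reads shadow operands where they exist.
                srcs = tuple(s + "'" if s in shadowed else s for s in ins.srcs)
            result.append(Instr(ins.op, ins.dst + "'", srcs))
            shadowed.add(ins.dst)
        else:
            # An unprotected redefinition makes any old shadow stale.
            shadowed.discard(ins.dst)
    return result


def run(program, flip=None):
    """Interpret the program. `flip=(index, mask)` XORs the result of the
    instruction at `index`, modelling a single transient fault."""
    regs, outputs = {}, []
    for i, ins in enumerate(program):
        if ins.op == "check":
            if regs[ins.srcs[0]] != regs[ins.srcs[1]]:
                raise FaultDetected(ins.srcs[0])
            continue
        if ins.op == "out":
            outputs.append(regs[ins.srcs[0]])
            continue
        if ins.op == "const":
            val = ins.srcs[0]
        elif ins.op == "add":
            val = regs[ins.srcs[0]] + regs[ins.srcs[1]]
        elif ins.op == "mul":
            val = regs[ins.srcs[0]] * regs[ins.srcs[1]]
        else:
            raise ValueError(ins.op)
        if flip is not None and flip[0] == i:
            val ^= flip[1]
        regs[ins.dst] = val
    return outputs


program = [
    Instr("const", "a", (3,)),
    Instr("const", "b", (4,)),
    Instr("mul", "c", ("a", "b")),
    Instr("add", "d", ("c", "a")),
    Instr("out", "", ("d",)),
]
# Protect only the two arithmetic instructions feeding the output.
hardened = protect(program, protected={2, 3})
```

In this sketch, a bit flip in the protected `mul` makes the original and shadow values diverge, so the inserted check raises `FaultDetected`. A flip in the unprotected `const a` corrupts both copies identically and reaches the output unnoticed. That is the trade-off selective duplication accepts: fewer added instructions in exchange for less than complete fault coverage.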

        • Published in

          LCTES '12: Proceedings of the 13th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory for Embedded Systems
          June 2012
          153 pages
          ISBN:9781450312127
          DOI:10.1145/2248418
          • ACM SIGPLAN Notices, Volume 47, Issue 5
            LCTES '12
            May 2012
            152 pages
            ISSN:0362-1340
            EISSN:1558-1160
            DOI:10.1145/2345141

          Copyright © 2012 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Acceptance Rates

          Overall Acceptance Rate: 116 of 438 submissions, 26%