Assessing Dependability with Software Fault Injection: A Survey

Authors:
Roberto Natella, DIETI, Federico II University of Naples, Naples, Italy (ORCID: 0000-0003-1084-4824)
Domenico Cotroneo, DIETI, Federico II University of Naples, Naples, Italy
Henrique S. Madeira, CISUC, University of Coimbra, Coimbra, Portugal

ACM Computing Surveys, Volume 48, Issue 3, Article No. 44, pp. 1–55. https://doi.org/10.1145/2841425

Published: 08 February 2016

Abstract

As software grows more complex, software-related accidents pose a significant threat to computer-based systems. Software Fault Injection is a method for anticipating worst-case scenarios caused by faulty software through the deliberate injection of software faults. This survey provides a comprehensive overview of the state of the art in Software Fault Injection, to support researchers and practitioners in selecting the approach that best fits their dependability assessment goals, and it discusses how these approaches have evolved to achieve fault representativeness, efficiency, and usability. The survey also describes relevant applications of Software Fault Injection in the context of fault-tolerant systems.
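To make the idea of deliberately injecting faults concrete, the following is a minimal sketch and not a technique taken from the survey. It assumes a simple illustrative fault model (a component call that fails with some probability) and checks whether a fallback mechanism tolerates the injected failures. All names (`FaultInjector`, `InjectedFault`, `parse_price`, `tolerant_parse`, `run_campaign`) are hypothetical and introduced only for this example.

```python
import random


class InjectedFault(Exception):
    """Emulates a failure of the wrapped component (illustrative fault model)."""


class FaultInjector:
    """Wraps a callable and makes it fail with a given probability."""

    def __init__(self, func, probability, rng):
        self.func = func
        self.probability = probability
        self.rng = rng
        self.injected = 0  # number of faults injected so far

    def __call__(self, *args, **kwargs):
        if self.rng.random() < self.probability:
            self.injected += 1
            raise InjectedFault(f"injected fault in {self.func.__name__}")
        return self.func(*args, **kwargs)


def parse_price(text):
    """Hypothetical component under test."""
    return round(float(text), 2)


def tolerant_parse(parser, text, default=0.0):
    """Hypothetical fault-tolerance mechanism: fall back to a default value."""
    try:
        return parser(text)
    except InjectedFault:
        return default


def run_campaign(trials=100, probability=0.3, seed=0):
    """Run a small injection campaign and count how injected faults were handled."""
    rng = random.Random(seed)
    faulty_parse = FaultInjector(parse_price, probability, rng)
    tolerated = 0
    escaped = 0
    for _ in range(trials):
        before = faulty_parse.injected
        try:
            value = tolerant_parse(faulty_parse, "19.99")
        except Exception:
            # The mechanism failed to contain the fault.
            escaped += 1
            continue
        if faulty_parse.injected > before and value == 0.0:
            # A fault was injected and the fallback value was returned.
            tolerated += 1
    return {
        "injected": faulty_parse.injected,
        "tolerated": tolerated,
        "escaped": escaped,
    }
```

Calling `run_campaign()` returns the number of injected faults together with how many were tolerated and how many escaped the mechanism. In a real assessment the fault model would need to be representative of actual software faults, which this sketch does not attempt.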

Published in
ACM Computing Surveys Volume 48, Issue 3
February 2016
619 pages
ISSN:0360-0300
EISSN:1557-7341
DOI:10.1145/2856149
Editor: Sartaj Sahni, Department of Computer and Information Science and Engineering, University of Florida, Gainesville
Copyright © 2016 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 8 February 2016
- Accepted: 1 October 2015
- Revised: 1 July 2015
- Received: 1 June 2013
Author Tags
Software faults
dependability assessment
software fault tolerance
Qualifiers
- survey
- Research
- Refereed