
2018 | OriginalPaper | Chapter

Damage Reduction via White-Box Failure Shaping

Authors: Thomas B. Jones, David H. Ackley

Published in: Search-Based Software Engineering

Publisher: Springer International Publishing


Abstract

Emerging hardware that trades reliability guarantees for resource savings presents a challenge to software engineered for deterministic execution. Research areas like approximate computing, however, embrace non-determinism by abandoning strict correctness in favor of maximizing the probability and degree of correctness. Existing work has used stochastic failure sampling to perform white-box searches along software execution paths, producing criticality assessments of which selected operations are likely most damaging if they fail. Here, we apply these assessments to a new domain and employ them using failure shaping, an automated method for reducing a computation’s expected output damage in a model where failures can be relocated but not eliminated. In two case studies, we demonstrate error reductions of 38% to 63% on Strassen’s matrix multiplication algorithm despite a virtually identical failure count. We discuss how our framework helps provide a smooth landscape for performing the search-based software engineering that will be required to apply this technology to larger problems.
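The idea of a criticality assessment obtained by stochastic failure sampling can be illustrated with a small sketch. This is not the authors' framework: the function names, the failure model (a failed multiplication returns 0), the damage metric (summed absolute error), and the restriction to a single 2x2 Strassen step are all assumptions made for illustration.

```python
import random


def strassen_2x2(a, b, failed=None):
    """Multiply two 2x2 matrices using Strassen's seven products.

    `failed` is the index (0..6) of one product forced to return 0.0.
    This failure model is a hypothetical choice for illustration only.
    """
    (a11, a12), (a21, a22) = a
    (b11, b12), (b21, b22) = b
    # Operand pairs of the seven Strassen products M1..M7.
    factors = [
        (a11 + a22, b11 + b22),
        (a21 + a22, b11),
        (a11, b12 - b22),
        (a22, b21 - b11),
        (a11 + a12, b22),
        (a21 - a11, b11 + b12),
        (a12 - a22, b21 + b22),
    ]
    m = [0.0 if k == failed else x * y for k, (x, y) in enumerate(factors)]
    return [
        [m[0] + m[3] - m[4] + m[6], m[2] + m[4]],
        [m[1] + m[3], m[0] - m[1] + m[2] + m[5]],
    ]


def damage(c, ref):
    """Summed absolute error between a result and the failure-free result."""
    return sum(abs(c[i][j] - ref[i][j]) for i in range(2) for j in range(2))


def estimate_criticality(trials=2000, seed=0):
    """Estimate the mean output damage caused by failing each product."""
    rng = random.Random(seed)
    totals = [0.0] * 7
    for _ in range(trials):
        a = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
        b = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
        ref = strassen_2x2(a, b)
        for k in range(7):
            totals[k] += damage(strassen_2x2(a, b, failed=k), ref)
    return [t / trials for t in totals]


if __name__ == "__main__":
    for k, score in enumerate(estimate_criticality()):
        print(f"M{k + 1}: mean damage {score:.3f}")
```

In a model where failures can be relocated but not eliminated, scores of this kind would indicate which operations failures should be steered away from. The sketch stops at the assessment step and does not attempt the relocation itself.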


Metadata

Title: Damage Reduction via White-Box Failure Shaping

Authors: Thomas B. Jones, David H. Ackley

Copyright Year: 2018

DOI: https://doi.org/10.1007/978-3-319-99241-9_11
