Published in: The Journal of Supercomputing 3/2013

1 December 2013

Component survivability at runtime for mission-critical distributed systems

Authors: Joon S. Park, Pratheep Chandramohan, Avinash T. Suresh, Joseph V. Giordano, Kevin A. Kwiat

Abstract

As information systems grow into larger and more complex implementations, the need for survivability in mission-critical systems becomes pressing. At the same time, protecting information systems is increasingly vital as new threats are identified every day, and it is increasingly challenging to build systems that detect such threats and recover from the damage they cause. This is particularly critical for distributed mission-critical systems, which cannot afford a loss of functionality even when components fail internally or are compromised by malicious code, especially components downloaded from an external source. Therefore, when using such a component, we should verify that its source is trusted and that its code has not been modified in an unauthorized manner since it was created. Furthermore, once we find failures or malicious code in the component, we should fix those problems and continue the component's original functionality at runtime, thereby supporting survivability of the mission-critical system. In this paper, we present our definition of survivability, discuss the survivability challenges of component sharing in a large distributed system, identify static and dynamic survivability models, and discuss their trade-offs. Building on this analysis, we propose novel approaches for component survivability. Finally, we demonstrate the feasibility of our ideas by implementing component recovery against internal failures and malicious code based on the dynamic model.
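The check described above (trusted source, no unauthorized modification) followed by continued operation can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a symmetric key shared between the trusted publisher and the consumer, and it uses HMAC-SHA256 from the Python standard library as a stand-in for the public-key digital signatures a real deployment would more likely use. The names `Component`, `make_tag`, `is_authentic`, and `load_component` are hypothetical, and falling back to a previously verified copy is only one simple way to keep functionality available, not necessarily how the paper's dynamic model recovers a component.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass
class Component:
    """A downloaded component: its code bytes plus the tag attached by its publisher."""
    name: str
    code: bytes
    tag: bytes  # authentication tag from the (assumed) trusted publisher


def make_tag(key: bytes, code: bytes) -> bytes:
    """Publisher side: HMAC-SHA256 over the component bytes.
    Assumption: a shared-key MAC stands in for a digital signature."""
    return hmac.new(key, code, hashlib.sha256).digest()


def is_authentic(key: bytes, comp: Component) -> bool:
    """Consumer side: the tag only matches if the code was tagged by a key holder
    and has not been altered since. compare_digest avoids timing leaks."""
    expected = make_tag(key, comp.code)
    return hmac.compare_digest(expected, comp.tag)


def load_component(key: bytes, downloaded: Component, trusted_backup: Component) -> Component:
    """Use the downloaded component if it verifies; otherwise fall back to a
    previously verified backup copy so the service keeps running."""
    if is_authentic(key, downloaded):
        return downloaded
    if is_authentic(key, trusted_backup):
        return trusted_backup
    raise RuntimeError(f"no authentic copy of {downloaded.name!r} available")


if __name__ == "__main__":
    key = b"hypothetical-shared-key"
    code = b"def plan():\n    return 42\n"
    good = Component("planner", code, make_tag(key, code))
    # Simulate an unauthorized modification: extra code appended, old tag reused.
    tampered = Component("planner", code + b"import os\n", good.tag)
    chosen = load_component(key, tampered, good)
    print("using backup" if chosen is good else "using download")
```

Run as a script, the tampered download fails verification and the verified backup is selected instead. A MAC only proves that some holder of the key produced the code; distinguishing among publishers, as the abstract's "trusted source" requirement implies, calls for asymmetric signatures bound to publisher identities.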

Metadata
Title: Component survivability at runtime for mission-critical distributed systems
Authors: Joon S. Park, Pratheep Chandramohan, Avinash T. Suresh, Joseph V. Giordano, Kevin A. Kwiat
Publication date: 1 December 2013
Publisher: Springer US
Published in: The Journal of Supercomputing, Issue 3/2013
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI: https://doi.org/10.1007/s11227-012-0818-2
