Published in: Wireless Personal Communications 4/2018

27 April 2017

Advanced Primary–Backup Platform with Container-Based Automatic Deployment for Fault-Tolerant Systems

Authors: Jaemyoun Lee, Haegeon Jeong, Won-Joo Lee, Hyo-Joong Suh, Dongeun Lee, Kyungtae Kang


Abstract

In mission-critical systems, the primary–backup scheme is a desirable approach for improving reliability and fault tolerance, because it can sustain a high mission success rate despite unexpected errors. However, it must keep the primary and the backup consistent whenever the primary encounters unexpected errors. We address this issue by introducing a platform that uses container-based lightweight virtualization and an automatic build system to isolate an application so that it can be deployed on different devices without manual intervention. We believe that such an advanced deployment procedure can maintain the consistency of primary–backup systems with low implementation complexity. Integrated with a cloud application, the platform can also manage mission-critical systems effectively, communicate with the redundant systems, and detect unexpected errors by using sophisticated fault-detection technologies. Through a realistic experiment using a model electronic vehicle, we demonstrate that the platform can improve the reliability of mission-critical systems and reduce hardware dependencies.
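
The primary–backup idea summarized in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the platform detects faults or checks consistency, so everything here is an assumption made for illustration: the heartbeat timeout, the `Replica` fields, the `image_tag` comparison standing in for "both replicas run the same container image", and the function name `choose_active` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    image_tag: str         # hypothetical identifier of the deployed container image
    last_heartbeat: float  # time of the most recent heartbeat, in seconds


def choose_active(primary: Replica, backup: Replica, now: float,
                  timeout: float = 1.0) -> Replica:
    """Return the replica that should be in control at time `now`.

    Assumption: a replica counts as failed once its last heartbeat is
    older than `timeout` seconds. The paper's actual detector may differ.
    """
    # Consistency check: refuse to fail over between different images.
    if primary.image_tag != backup.image_tag:
        raise ValueError("primary and backup run different images")
    primary_alive = (now - primary.last_heartbeat) <= timeout
    return primary if primary_alive else backup


primary = Replica("primary", "app:1.0", last_heartbeat=10.0)
backup = Replica("backup", "app:1.0", last_heartbeat=10.2)

print(choose_active(primary, backup, now=10.5).name)  # primary heartbeat is recent
print(choose_active(primary, backup, now=12.0).name)  # primary heartbeat has timed out
```

In this sketch, deploying both replicas from the same image is what makes the consistency check trivial, which is the role the abstract assigns to the container-based automatic deployment.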

Metadata
Title
Advanced Primary–Backup Platform with Container-Based Automatic Deployment for Fault-Tolerant Systems
Authors
Jaemyoun Lee
Haegeon Jeong
Won-Joo Lee
Hyo-Joong Suh
Dongeun Lee
Kyungtae Kang
Publication date
27 April 2017
Publisher
Springer US
Published in
Wireless Personal Communications, Issue 4/2018
Print ISSN: 0929-6212
Electronic ISSN: 1572-834X
DOI
https://doi.org/10.1007/s11277-017-4282-4
