Published in: Real-Time Systems 4/2019

06-09-2019

Practical task allocation for software fault-tolerance and its implementation in embedded automotive systems

Authors: Anand Bhat, Soheil Samii, Ragunathan Rajkumar

Abstract

Due to the advent of active safety features and automated driving capabilities, the complexity of embedded computing systems within automobiles continues to increase. Such advanced driver assistance systems (ADAS) are inherently safety-critical and must tolerate failures in any subsystem. However, fault-tolerance in safety-critical systems has been traditionally supported by hardware replication, which is prohibitively expensive in terms of cost, weight, and size for the automotive market. Recent work has studied the use of software-based fault-tolerance techniques that utilize task-level hot and cold standbys to tolerate fail-stop processor and task failures. The benefit of using standbys is maximal when a task and any of its standbys obey the placement constraint of not being co-located on the same processor. We propose a new heuristic based on a “tiered” placement constraint, and show that our heuristic produces a better task assignment that saves at least one processor up to 40% of the time relative to the best known heuristic to date. We then introduce a task allocation algorithm that, for the first time to our knowledge, leverages the run-time attributes of cold standbys. Our empirical study finds that our heuristic uses no more than one additional processor in most cases relative to an optimal allocation that we construct for evaluation purposes using a creative technique. We also extend our heuristic to support mixed-criticality systems which allow for overload operation. We have designed and implemented our software fault-tolerance framework in AUTOSAR, an automotive industry standard. We use this implementation to provide an experimental evaluation of our task-level fault-tolerance features. Finally, we present an analysis of the worst-case behavior of our task recovery features.
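The placement constraint described above (a task and its standbys must not share a processor) can be illustrated with a minimal sketch. This is not the tiered heuristic proposed in the article, whose details are not given in this abstract; it is a plain first-fit-decreasing bin-packing pass with the constraint added. The names `allocate`, `Processor`, and `CAPACITY`, the example tasks, and the assumption that a processor can be loaded to 100% utilization are all hypothetical, and every standby is charged its full utilization, so the run-time savings of cold standbys are not modeled.

```python
from dataclasses import dataclass, field

CAPACITY = 100  # assumed per-processor utilization bound, in percent


@dataclass
class Processor:
    load: int = 0                               # utilization already placed
    groups: set = field(default_factory=set)    # replica groups present here
    tasks: list = field(default_factory=list)   # names of the placed tasks


def allocate(tasks):
    """First-fit decreasing with a no-co-location constraint.

    tasks: list of (name, utilization_percent, group) tuples, where a
    primary and all of its standbys share the same group label.
    Returns the list of processors that were opened.
    """
    processors = []
    # Place the largest tasks first (sorted() is stable for ties).
    for name, util, group in sorted(tasks, key=lambda t: -t[1]):
        if util > CAPACITY:
            raise ValueError(f"{name} does not fit on any processor")
        for proc in processors:
            fits = proc.load + util <= CAPACITY
            # Placement constraint: no two replicas of one task together.
            if fits and group not in proc.groups:
                break
        else:
            proc = Processor()  # no existing processor qualifies
            processors.append(proc)
        proc.load += util
        proc.groups.add(group)
        proc.tasks.append(name)
    return processors


if __name__ == "__main__":
    demo = [
        ("brake", 60, "brake"), ("brake_standby", 60, "brake"),
        ("steer", 40, "steer"), ("steer_standby", 40, "steer"),
    ]
    for i, proc in enumerate(allocate(demo)):
        print(i, proc.load, proc.tasks)
```

In this sketch the constraint alone can force a new processor to be opened: two 30% replicas of the same task would fit on one processor by load, yet they are placed on two.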

Footnotes
1 SAE J3016: Taxonomy and definitions for terms related to on-road motor vehicle automated driving systems.
3 “IEEE802.1cb-frame replication and elimination for reliability.” http://www.ieee802.org/1/pages/802.1cb.html.
5 This is feasible with earliest deadline first (EDF) and rate-monotonic scheduling (RMS) with harmonic task sets.
6 Since computing an optimal allocation for an arbitrary taskset is NP-hard, we instead explicitly create a perfect solution that by definition represents an optimal allocation.
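The idea in footnote 6, building an instance whose optimum is known by construction instead of solving for it, can be sketched as follows. This is an assumed construction for illustration, not the authors' procedure, which is not described in this excerpt. It fills each of `num_processors` processors exactly to capacity and splits that capacity into tasks, so the total utilization alone rules out any allocation on fewer processors. The name `perfect_taskset`, its parameters, and the integer `CAPACITY` unit are hypothetical, and standbys and placement constraints are ignored.

```python
import random

CAPACITY = 1000  # one processor's full utilization, in integer units (assumed)


def perfect_taskset(num_processors, tasks_per_processor, seed=0):
    """Return task utilizations whose optimal allocation is known.

    Each processor's capacity is cut into tasks_per_processor positive
    pieces, so the pieces of one processor sum to exactly CAPACITY.
    The total is num_processors * CAPACITY, hence no allocation can use
    fewer than num_processors processors, and the construction itself
    shows that num_processors are enough.
    """
    rng = random.Random(seed)
    utilizations = []
    for _ in range(num_processors):
        # Distinct cut points in 1..CAPACITY-1 give strictly positive pieces.
        cuts = sorted(rng.sample(range(1, CAPACITY), tasks_per_processor - 1))
        bounds = [0] + cuts + [CAPACITY]
        utilizations.extend(b - a for a, b in zip(bounds, bounds[1:]))
    rng.shuffle(utilizations)  # hide the known grouping from the heuristic
    return utilizations


if __name__ == "__main__":
    tasks = perfect_taskset(num_processors=4, tasks_per_processor=5)
    print(len(tasks), sum(tasks) // CAPACITY)
```

A heuristic under evaluation can then be run on the shuffled list, and the number of processors it opens compared with the known optimum `num_processors`.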
 
Metadata
Title: Practical task allocation for software fault-tolerance and its implementation in embedded automotive systems
Authors: Anand Bhat, Soheil Samii, Ragunathan Rajkumar
Publication date: 06-09-2019
Publisher: Springer US
Published in: Real-Time Systems, Issue 4/2019
Print ISSN: 0922-6443
Electronic ISSN: 1573-1383
DOI: https://doi.org/10.1007/s11241-019-09339-7
