2021 | OriginalPaper | Chapter Open Access

# 10. Compilation for Real-Time Systems a Decade After Predator

Authors: Heiko Falk, Shashank Jadhav, Arno Luppold, Kateryna Muts, Dominic Oehlert, Nina Piontek, Mikko Roth

Publisher: Springer International Publishing

## 10.1 Introduction

## 10.2 Challenges and State-of-the-Art in WCET-Aware Compilation During Predator

## 10.3 Integration of Task Coordination into WCET-Aware Compilation

_{i}are used per basic block b

_{i}. b

_{i}is moved from main memory onto the scratchpad memory if x

_{i}equals 1. The overall goal of this ILP is to find an assignment of values to the variables x

_{i}such that the resulting SPM allocation leads to the minimal WCET of the whole task. Constraints are added to the ILP that model the task’s internal program structure. For each basic block b

_{i}and each successor b

_{succ}of b

_{i}in the task’s Control Flow Graph (CFG), a constraint is set up bounding the WCET c

_{i}of b

_{i}:

_{i}of a path starting in b

_{i}must be larger than the WCET c

_{succ}of any of the successors of b

_{i}, plus the contribution of b

_{i}to the WCET itself with b

_{i}located in main memory ( \(\mathit {cost}_{i,\mathit {main\_mem}}\)), minus the potential gain when moving b

_{i}from main memory onto the scratchpad memory. Additional constraints in the ILP model loops and function calls. The limited available capacity of the SPM is considered as well as the additional overhead due to long-distance jumps from the main memory to the SPM or back. In the end, the WCET of an entire task is represented in the ILP model by a variable \(c_{\mathtt {main}}^{\mathit {entry}}\) which models the WCET of the path starting at the task’s entry point, i.e., at its main function [ 7].

_{j}associated with the entry points of the tasks τ

_{j}describe a safe upper bound of the tasks’ WCETs. An early work [ 22] towards Predator’s Challenge #1 on optimization of multi-task systems under consideration of scheduling policies integrated Joseph’s schedulability analysis [ 15] into this multi-task ILP.

_{j}’s WCRT r

_{j}is the maximum possible time interval between the activation of a task and its end, including penalties due to preemptions by higher-priority tasks. The tasks’ WCRTs are computed as follows:

_{j}of task τ

_{j}and the penalties due to tasks τ

_{0}, …, τ

_{j−1}having higher priority than τ

_{j}. Each such high-priority task τ

_{h}preempts τ

_{j}a total of \(\left \lceil \frac {r_j}{T_h} \right \rceil \) times where T

_{h}denotes a task’s period. For each preemption of τ

_{j}by τ

_{h}, the higher-priority task’s WCET c

_{h}is considered.

_{h}and the WCRTs r

_{j}are ILP variables so that the multiplication of \(\left \lceil \frac {r_j}{T_h} \right \rceil \) by c

_{h}is infeasible. In order to solve this problem, an integer variable p

_{j,h}is added to the ILP for every combination of low- and high-priority tasks τ

_{j}and τ

_{h}, resp. p

_{j,h}denotes the timing penalty that is added to τ

_{j}’s WCRT due to preemptions by τ

_{h}. Using these variables, Eq. ( 10.2) can be rewritten to:

_{j,h}, the following linearization scheme is applied: If r

_{j}is lower than or equal to τ

_{h}’s period T

_{h}, τ

_{j}can be preempted at most once by τ

_{h}, thus leading to p

_{j,h}= 1 ∗ c

_{h}. If r

_{j}is greater than T

_{h}but lower than or equal to 2 ∗ T

_{h}, p

_{j,h}= 2 ∗ c

_{h}results, etc. In general, it has been proven that

_{j}is preempted at least N times by τ

_{h}, then p

_{j,h}≥ ( N + 1) ∗ c

_{h}must hold.

_{j}by τ

_{h}is \(\left \lceil \frac {D_j}{T_h} \right \rceil \) where D

_{j}denotes task τ

_{j}’s deadline. Thus, the conditional constraints from Theorem 10.1 are added to the ILP for all values of N with \(0 \leq N \leq \left \lceil \frac {D_j}{T_h} \right \rceil - 1\) and for all pairs of low- and high-priority tasks τ

_{j}and τ

_{h}, resp. Finally, the schedulability of the entire multi-task set is ensured during this ILP-based optimization by adding constraints

_{j}must be within its respective deadline.

_{0}with the highest priority into the multi-task system. τ

_{0}represents the periodically executed scheduler, and by considering an actual scheduler’s WCET c

_{0}and its period T

_{0}, it can smoothly be integrated into the optimization framework [ 23].

_{j}( Δ t − D

_{j}) returns the number of activations of task τ

_{j}that happen within Δ t and that must be finished before the deadline D

_{j}. Each task activation is multiplied by the task’s respective maximum computational demand, i.e., its WCET plus additional preemption overheads o

_{j}.

_{j}. This schedulability test has to be modeled for all possible time intervals Δ t. The maximal interval to be considered is regularly given by the task set’s hyperperiod. Checking all possible intervals Δ t up to the hyperperiod is practically infeasible. Fortunately, task preemptions can only occur if a new task is ready for execution for many real-life scheduling policies like, e.g., EDF. Thus, the schedulability test from Eq. ( 10.6) has to be modeled in the ILP only at the points of discontinuity of the task set’s density functions η. Finally, one constraint needs to be added that ensures that the system’s overall load due to periodically repeating task activations stays below 100%. It is also possible to extend this approach towards fixed-priority scheduling, and the resulting ILP model grows only linearly with the number of events that have to be analyzed, in contrast to the quadratic nature inherent to [ 22, 24].

## 10.4 Analysis and Optimization of Multi-Processor Systems on Chip

^{−}) and upper ( α

^{+}) request functions generated for two selected benchmarks compressdata and binarysearch are shown in Fig. 10.2. The vertical distance between the lower and upper functions shows the variation of the number of produced requests. For example, compressdata can terminate with solely 82 shared bus accesses in total, or with up to 131. For binarysearch, both the lower and upper request functions converge to a common value, since each possible path through the program’s code covers an identical number of bus requests. Only the points in time when these events occur differ.

^{+}of compressdata. The finest-possible granularity, i.e., Δ t = 1 clock cycle, leads to 131 samples in total and to a very smooth and precise result. When reducing the granularity such that only 50 samples are considered, the resulting request function has a clearly visible stepwise shape. However, the resulting function for 50 samples always dominates the most precise function so that no unsafe results are produced. For the highest precision with 131 samples, our algorithm requires 48 CPU seconds. In contrast, the time required to generate the request function for compressdata decreases down to 10 CPU seconds if 50 sampling points are considered.

^{+}and β represents the maximum delay d

_{max}a task exhibits due to blocked shared bus requests. In the figure, a task requests 2 bus accesses during interval lengths of 3 clock cycles. However, the bus can deliver the desired capacity only within 13 clock cycles. Thus, a blocking time of 10 clock cycles results from Fig. 10.4.

## 10.5 Multi-Objective Compiler Optimizations Under Real-Time Constraints

_{i}of each basic block b

_{i}can be characterized in dependence of the ILP’s binary decision variables x

_{i}. By combining these block-level energy values with profiling-based information about the blocks’ execution frequencies, the overall energy consumption e

_{j}of task τ

_{j}can be modeled. Multiplying these task-level energy values with the tasks’ activation functions η

_{j}(cf. Sect. 10.3) over the entire task set’s hyperperiod H yields an expression that models the energy dissipation of the complete multi-task system and that thus can be minimized under simultaneous adherence to the ILP’s schedulability constraints:

_{i}encode whether function f

_{i}is compressed or not.

_{i}that might be compressed, its original, uncompressed code size \(S_i^{\mathrm{orig}}\) and its Worst-Case Execution Time \(C_i^{\mathrm{orig}}\) are pre-computed. Assuming that f

_{i}would be compressed, the corresponding values \(S_i^{\mathrm{comp}}\) and \(C_i^{\mathrm{comp}}\) can also be pre-determined. For the WCET analysis of a potentially compressed function f

_{i}, the decompression routine is added by the compiler, and the loops therein are precisely annotated with upper iteration bounds for the decompression of the currently considered function f

_{i}in order to support the WCET analyzer aiT. Based on this data, the impact of f

_{i}’s compression on the entire program’s code size Δ S

_{i}and Worst-Case Execution Time Δ C

_{i}can be expressed in the ILP.

^{limit}. Under these constraints, the ILP finally minimizes the entire program’s code size by selecting appropriate functions f

_{i}for compression.

^{limit}was set to 0.5 so that maximum WCET increases by 50% were still accepted by the optimization.