1 Extended version
-
A brief discussion of the sustainability of the analysis is given in Sect. 6.
-
Additional experiments have been added, in Sect. 7.1, exploring how the performance of the various analysis methods is impacted by changes in the number of tasks and by the memory delay.
-
A discussion and evaluation of the impact of write buffers on the performance of write-through caches has been added in Sect. 8. Here we show that depending on the precise policies employed, write buffers may result in domino effects, severely affecting guaranteed performance.
-
Finally, while data-cache analysis is not the main focus of the paper, we review related work in this area in the Appendix.
2 Introduction
2.1 Related work
2.1.1 Accounting for overheads in schedulability analysis
2.1.2 Write-back caches in worst-case execution time (WCET) analysis
2.2 Organisation
3 Caches
3.1 Write policies
3.2 Classification of write backs
3.3 Characterizing a task’s write backs
4 Task model and basic analysis
4.1 Task model
4.2 Schedulability analysis for FPPS
4.3 Schedulability analysis for FPNS
5 Write backs under FPNS
5.1 Carry-in write backs within the job
5.1.1 ECB-Only approach
5.1.2 FDCB-Union approach
5.2 Carry-in write backs in subsequent jobs
5.2.1 FDCB-Only approach
5.2.2 ECB-Union approach
5.3 Worked example
Task | C | T | ECB | DCB | FDCB |
---|---|---|---|---|---|
\(\tau _1\)
| 100 | 1000 |
\(\{1,4,5\}\)
|
\(\{1\}\)
|
\(\{1\}\)
|
\(\tau _2\)
| 100 | 1000 |
\(\{2,3,4,5\}\)
|
\(\{2,3,4\}\)
|
\(\{2,3\}\)
|
\(\tau _3\)
| 100 | 1000 |
\(\{2,3,5\}\)
|
\(\{2,3,5\}\)
|
\(\{2,3\}\)
|
\(\tau _4\)
| 100 | 1000 |
\(\{1,2,3,4,5,6\}\)
|
\(\{1,2,3,4,5,6\}\)
|
\(\{1\}\)
|
5.3.1 ECB-Only
5.3.2 FDCB-Union
5.3.3 FDCB-Only
5.3.4 ECB-Union
6 Write backs under FPPS
6.1 Initially dirty cache line write backs
6.2 Lower priority carry-in write backs
6.2.1 DCB-Only approach
6.2.2 ECB-Union approach
6.2.3 ECB-Only approach
6.2.4 DCB-Union approach
6.3 Finished-carry-in write backs
6.4 Worked example
Task | C | T | ECB | DCB | FDCB |
---|---|---|---|---|---|
\(\tau _1\)
| 100 | 1000 |
\(\{1,4,5\}\)
|
\(\{1\}\)
|
\(\{1\}\)
|
\(\tau _2\)
| 100 | 1000 |
\(\{2,3,4,5\}\)
|
\(\{2,3,4\}\)
|
\(\{2,3\}\)
|
\(\tau _3\)
| 100 | 1000 |
\(\{2,3,5\}\)
|
\(\{2,3,5\}\)
|
\(\{2,3\}\)
|
\(\tau _4\)
| 100 | 1000 |
\(\{1,2,3,4,5,6\}\)
|
\(\{1,2,3,4,5,6\}\)
|
\(\{1\}\)
|
6.4.1 DCB-Only
6.4.2 ECB-Union
6.4.3 ECB-Only
6.4.4 DCB-Union
7 Sustainability of the analysis
8 Experimental evaluation
Name |
\(\vert UCB^{I}\vert \)
|
\(\vert ECB^{I}\vert \)
|
\(\vert UCB^{D}\vert \)
|
\(\vert ECB^{D}\vert \)
|
\(\vert DCB\vert \)
|
\(\vert FDCB\vert \)
|
---|---|---|---|---|---|---|
cnt | 12 | 82 | 21 | 68 | 28 | 28 |
compress | 21 | 71 | 53 | 103 | 60 | 60 |
countneg | 15 | 77 | 59 | 103 | 66 | 66 |
crc | 19 | 89 | 25 | 73 | 40 | 39 |
expint | 16 | 76 | 11 | 42 | 13 | 13 |
fdct | 52 | 144 | 15 | 48 | 19 | 19 |
fir | 22 | 83 | 17 | 57 | 17 | 16 |
jfdctint | 46 | 145 | 17 | 53 | 23 | 23 |
loop3 | 7 | 309 | 9 | 42 | 12 | 12 |
ludcmp | 38 | 128 | 21 | 61 | 28 | 28 |
minver | 103 | 213 | 18 | 71 | 33 | 33 |
ns | 14 | 70 | 9 | 116 | 13 | 11 |
nsichneu | 345 | 494 | 52 | 95 | 54 | 53 |
qurt | 61 | 132 | 14 | 49 | 17 | 17 |
select | 47 | 124 | 10 | 49 | 16 | 16 |
sqrt | 51 | 102 | 11 | 48 | 16 | 16 |
statemate | 92 | 167 | 25 | 68 | 21 | 20 |
a2time | 16 | 122 | 8 | 100 | 69 | 67 |
aifirf | 25 | 141 | 33 | 188 | 161 | 54 |
basefp | 11 | 88 | 15 | 512 | 507 | 467 |
canrdr | 8 | 40 | 9 | 371 | 195 | 186 |
iirflt | 35 | 288 | 28 | 259 | 147 | 138 |
pntrch | 24 | 38 | 20 | 237 | 176 | 70 |
puwmod | 3 | 50 | 5 | 512 | 307 | 275 |
rspeed | 8 | 53 | 7 | 122 | 71 | 70 |
tblook | 12 | 115 | 14 | 125 | 71 | 71 |
Name |
\(C^{\text {wb}}\)
|
\(C^{\text {wt}}\)
|
\(C^{{\text {wt}}}/C^{\text {wb}}\)
|
\(C^{\text {nc}}\)
|
\(C^{\text {nc}}/C^{\text {wb}}\)
|
---|---|---|---|---|---|
cnt | 9325 | 13485 | 1.44 | 24565 | 2.63 |
compress | 10673 | 18713 | 1.75 | 43443 | 4.07 |
countneg | 36180 | 57250 | 1.58 | 114340 | 3.16 |
crc | 68889 | 133909 | 1.94 | 272859 | 3.96 |
expint | 9268 | 15208 | 1.64 | 31098 | 3.35 |
fdct | 7883 | 16793 | 2.13 | 38423 | 4.87 |
fir | 8328 | 18998 | 2.28 | 43668 | 5.24 |
jfdctint | 9711 | 18621 | 1.91 | 39181 | 4.03 |
loop3 | 14189 | 28729 | 2.02 | 57929 | 4.08 |
ludcmp | 10058 | 15948 | 1.58 | 39668 | 3.94 |
minver | 18976 | 30616 | 1.61 | 54746 | 2.88 |
ns | 27464 | 37674 | 1.37 | 98634 | 3.59 |
nsichneu | 18988 | 24458 | 1.28 | 66808 | 3.51 |
qurt | 10473 | 16003 | 1.52 | 23573 | 2.25 |
select | 8981 | 17031 | 1.89 | 30331 | 3.37 |
sqrt | 27667 | 40537 | 1.46 | 59117 | 2.13 |
statemate | 64638 | 195778 | 3.02 | 581908 | 9.00 |
a2time | 12655 | 22975 | 1.81 | 53815 | 4.25 |
aifirf | 44898 | 86768 | 1.93 | 181698 | 4.04 |
basefp | 50491 | 92221 | 1.82 | 213771 | 4.23 |
canrdr | 32641 | 65211 | 1.99 | 156611 | 4.79 |
iirflt | 29995 | 56995 | 1.90 | 127605 | 4.25 |
pntrch | 23887 | 43137 | 1.80 | 109257 | 4.57 |
puwmod | 48782 | 97072 | 1.98 | 239752 | 4.91 |
rspeed | 10913 | 21393 | 1.96 | 51713 | 4.73 |
tblook | 12533 | 25493 | 2.03 | 58813 | 4.69 |
-
The default task set size was 10.
-
Each task was assigned data from a randomly chosen row of Table 4, corresponding to code from the benchmarks.
-
The task utilizations (\(U_i\)) were generated using UUnifast (Bini and Buttazzo 2005).
-
Task periods were set based on utilization and the stand-alone WCET for a write-back cache, i.e., \(T_i = C^{\text {wb}}_i/U_i\).
-
Task deadlines were implicit \(D_i = T_i\).
-
Task priorities were in deadline-monotonic order.
-
Tasks were placed in memory sequentially in priority order, thus determining the direct mapping to cache.
Approach | FPPS | FPNS |
---|---|---|
Write-back (upper bound) | 0.793458 | 0.445750 |
Combined | 0.693003 | 0.412270 |
(F)DCB-Union | 0.692087 | 0.411087 |
ECB-Union | 0.672489 | 0.396159 |
(F)DCB-Only | 0.561542 | 0.396159 |
ECB-Only | 0.581876 | 0.365523 |
Write-back (flush) | 0.304987 | 0.305039 |
Write-through | 0.249231 | 0.112666 |
No data cache | 0.052548 | 0.021463 |
8.1 Weighted schedulability
9 Write buffers
9.1 Local hazard policy
9.2 Coalescence policy
9.3 Retirement policies
9.4 Timing composition
9.5 Domino effects
9.5.1 Reading from the write buffer combined with lazy retirement
9.5.2 Write merge combined with lazy retirement
9.5.3 Write merge combined with eager retirement
9.6 Analysis of write merge
9.7 Analysis of no-write merge
9.8 Write buffers and write-back caches
9.9 Evaluation with write buffers
Name |
\(C^{{\text {wb}}-0}\)
|
\(C^{{\text {wb}}-1}\)
|
\(C^{{\text {wt}}-0}\)
|
\(C^{{\text {wt}}-1}\)
|
\(C^{{\text {wt}}-2}\)
|
\(C^{{\text {wt}}-4}\)
|
\(C^{\text {nc}}\)
|
---|---|---|---|---|---|---|---|
cnt | 9325 | 9325 | 13485 | 9815 | 9745 | 9695 | 24565 |
compress | 10673 | 10673 | 18713 | 16963 | 16403 | 14863 | 43443 |
countneg | 36180 | 36180 | 57250 | 56500 | 48450 | 37260 | 114340 |
crc | 68889 | 68869 | 133909 | 79469 | 79419 | 69759 | 272859 |
expint | 9268 | 9268 | 15208 | 12548 | 9508 | 9448 | 31098 |
fdct | 7883 | 7883 | 16793 | 11403 | 10203 | 9253 | 38423 |
fir | 8328 | 8318 | 18998 | 13718 | 8858 | 8548 | 43668 |
jfdctint | 9711 | 9711 | 18621 | 14141 | 12291 | 11601 | 39181 |
loop3 | 14189 | 14189 | 28729 | 26909 | 14369 | 14349 | 57929 |
ludcmp | 10058 | 10048 | 15948 | 13178 | 11628 | 10828 | 39668 |
minver | 18976 | 18976 | 30616 | 23226 | 22276 | 20026 | 54746 |
ns | 27464 | 27444 | 37674 | 27704 | 27644 | 27624 | 98634 |
nsichneu | 18988 | 18954 | 24458 | 20068 | 20028 | 19988 | 66808 |
qurt | 10473 | 10473 | 16003 | 12293 | 11483 | 10873 | 23573 |
select | 8981 | 8971 | 17031 | 12181 | 11251 | 9961 | 30331 |
sqrt | 27667 | 27667 | 40537 | 34607 | 31037 | 28147 | 59117 |
statemate | 64638 | 64628 | 195778 | 120958 | 102918 | 96858 | 581908 |
a2time | 12655 | 12468 | 22975 | 22825 | 12645 | 12635 | 53815 |
aifirf | 44898 | 41638 | 86768 | 77528 | 41508 | 41508 | 181698 |
basefp | 50491 | 49822 | 92221 | 91421 | 50801 | 50651 | 213771 |
canrdr | 32641 | 32372 | 65211 | 64811 | 33261 | 33141 | 156611 |
iirflt | 29995 | 29845 | 56995 | 54845 | 34445 | 32865 | 127605 |
pntrch | 23887 | 22519 | 43137 | 42447 | 22627 | 22627 | 109257 |
puwmod | 48782 | 48184 | 97072 | 96702 | 48642 | 48592 | 239752 |
rspeed | 10913 | 10893 | 21393 | 21213 | 12103 | 11933 | 51713 |
tblook | 12533 | 12503 | 25493 | 22383 | 19923 | 13573 | 58813 |
Name |
\(\frac{C^{{\text {wb}}-1}}{C^{{\text {wb}}-0}}\)
|
\(\frac{C^{{\text {wt}}-0}}{C^{{\text {wb}}-0}}\)
|
\(\frac{C^{{\text {wt}}-1}}{C^{{\text {wb}}-0}}\)
|
\(\frac{C^{{\text {wt}}-2}}{C^{{\text {wb}}-0}}\)
|
\(\frac{C^{{\text {wt}}-4}}{C^{{\text {wb}}-0}}\)
|
\(\frac{C^{\text {nc}}}{C^{{\text {wb}}-0}}\)
|
---|---|---|---|---|---|---|
cnt | 1.00 | 1.44 | 1.05 | 1.04 | 1.03 | 2.63 |
compress | 1.00 | 1.75 | 1.58 | 1.53 | 1.39 | 4.07 |
countneg | 1.00 | 1.58 | 1.56 | 1.33 | 1.02 | 3.16 |
crc | .99 | 1.94 | 1.15 | 1.15 | 1.01 | 3.96 |
expint | 1.00 | 1.64 | 1.35 | 1.02 | 1.01 | 3.35 |
fdct | 1.00 | 2.13 | 1.44 | 1.29 | 1.17 | 4.87 |
fir | .99 | 2.28 | 1.64 | 1.06 | 1.02 | 5.24 |
jfdctint | 1.00 | 1.91 | 1.45 | 1.26 | 1.19 | 4.03 |
loop3 | 1.00 | 2.02 | 1.89 | 1.01 | 1.01 | 4.08 |
ludcmp | .99 | 1.58 | 1.31 | 1.15 | 1.07 | 3.94 |
minver | 1.00 | 1.61 | 1.22 | 1.17 | 1.05 | 2.88 |
ns | .99 | 1.37 | 1.00 | 1.00 | 1.00 | 3.59 |
nsichneu | .99 | 1.28 | 1.05 | 1.05 | 1.05 | 3.51 |
qurt | 1.00 | 1.52 | 1.17 | 1.09 | 1.03 | 2.25 |
select | .99 | 1.89 | 1.35 | 1.25 | 1.10 | 3.37 |
sqrt | 1.00 | 1.46 | 1.25 | 1.12 | 1.01 | 2.13 |
statemate | .99 | 3.02 | 1.87 | 1.59 | 1.49 | 9.00 |
a2time | .98 | 1.81 | 1.80 | .99 | .99 | 4.25 |
aifirf | .92 | 1.93 | 1.72 | .92 | .92 | 4.04 |
basefp | .98 | 1.82 | 1.81 | 1.00 | 1.00 | 4.23 |
canrdr | .99 | 1.99 | 1.98 | 1.01 | 1.01 | 4.79 |
iirflt | .99 | 1.90 | 1.82 | 1.14 | 1.09 | 4.25 |
pntrch | .94 | 1.80 | 1.77 | .94 | .94 | 4.57 |
puwmod | .98 | 1.98 | 1.98 | .99 | .99 | 4.91 |
rspeed | .99 | 1.96 | 1.94 | 1.10 | 1.09 | 4.73 |
tblook | .99 | 2.03 | 1.78 | 1.58 | 1.08 | 4.69 |