1 Introduction
2 Related work
3 Background
- Average distance. This error function calculates the root mean square of the execution time discrepancies of the first k jobs:
$$E_{SQD}(W,t,k):=\sqrt{\frac{\sum _{1\le i\le k} (r_{ex}(j_i)-r_{ob}(j_i,t))^2}{k}} \tag{1}$$
- Mean absolute percentage error. Here the relative error of the observed runtime is calculated for each job and then averaged over all k jobs:
$$E_{MAPE}(W,t,k):=\frac{100}{k}\sum _{1\le i \le k}\frac{|r_{ex}(j_i)-r_{ob}(j_i,t)|}{r_{ex}(j_i)} \tag{2}$$
- Time adjusted distance. The function adjusts the execution time discrepancies calculated in \(E_{SQD}\) so that the jobs started closer (in time) to \(j_k\) have more weight in the final error value:
$$E_{TAdj-SQD}(W,t,k):=\sqrt{\frac{\sum _{1\le i\le k} \frac{i}{k}(r_{ex}(j_i)-r_{ob}(j_i,t))^2}{\sum _{1\le i\le k}\frac{i}{k}}} \tag{3}$$
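The three error functions above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the evaluation: the function names are ours, the workflow W and the trace fragment t are implicit, and the expected and observed runtimes are passed as plain lists in which index i-1 holds the value for job \(j_i\). The sample runtimes are invented for illustration.

```python
import math
from typing import Sequence


def e_sqd(r_ex: Sequence[float], r_ob: Sequence[float], k: int) -> float:
    """Eq. 1: root mean square runtime discrepancy of the first k jobs."""
    return math.sqrt(sum((r_ex[i] - r_ob[i]) ** 2 for i in range(k)) / k)


def e_mape(r_ex: Sequence[float], r_ob: Sequence[float], k: int) -> float:
    """Eq. 2: mean absolute percentage error of the first k jobs."""
    return 100.0 / k * sum(abs(r_ex[i] - r_ob[i]) / r_ex[i] for i in range(k))


def e_tadj_sqd(r_ex: Sequence[float], r_ob: Sequence[float], k: int) -> float:
    """Eq. 3: like Eq. 1, but job j_i (1-based) is weighted by i/k."""
    weights = [i / k for i in range(1, k + 1)]
    weighted = sum(w * (r_ex[i] - r_ob[i]) ** 2 for i, w in enumerate(weights))
    return math.sqrt(weighted / sum(weights))


# Illustrative runtimes in seconds (hypothetical, not measured values).
expected = [10.0, 20.0, 40.0]
observed = [12.0, 18.0, 40.0]
print(e_sqd(expected, observed, 3))
print(e_mape(expected, observed, 3))
print(e_tadj_sqd(expected, observed, 3))
```

In this example the only job without a discrepancy is the last one, which carries the largest weight in Eq. 3, so the time adjusted distance (about 1.414) is smaller than the plain distance (about 1.633).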
4 Workflow enactment and simultaneous prediction
4.1 Background workload prediction
4.1.1 Base definitions
4.1.2 Overview of the algorithm
5 Evaluation with a biochemical workflow
5.1 The tinker workflow on the LPDS cloud
5.2 Modelling and simulating the workflow
5.2.1 The model of the workflow’s execution
The description is structured around the PSSTART tag (see lines 1, 6 and 20, which correspond to the three main sections G–[T1/T2/T3–TC]–C of the TCG workflow shown in the top left corner of Fig. 4). This tag delimits the parallel sections of the workflow; everything that resides between two PSSTART lines should therefore be simulated as if it were executed in parallel. Before the actual execution, though, every PSSTART delimited section contains the definition of the kind of VM that should be utilized during the entire parallel section. The properties of these VMs are defined by the VMDEF entry (e.g., see lines 2 or 8) following the PSSTART lines. Note that the definition of a VM depends on the simulator used, so below we list the defining details specific to DISSECT-CF:

- The virtual machine image used as the VM’s disk. This is denoted with the property name VA. In this property, we specify that the image is to be called “tinker”. Next, we ask its boot process to last for 25 seconds. Afterwards, we specify that the VM image is to be copied to its hosting PM before the VM starts (the value 0 means that the VM should not run on a remote filesystem). Finally, we set the image’s size to 306 MB.
- The required resources to be allocated for the VM on its hosting PM. These resources are listed after the property name RC in the figure. Here we provide the number of cores (1), their performance (5.0E-4, a relative performance metric compared to one of the CPUs in the LPDS cloud) and the amount of memory (1 GB) to be associated with the VMs to be created.
- Image origin, from which the VM’s disk image is downloaded before the virtual machine is instantiated. We used the property name VAST to tell the simulator the host name of the image repository that originally stores the VM’s image.
- Data store, the source/sink of all the data the VM produces during its runtime. This is defined with the property called DATA. This field helps the simulation determine the target/source of the network activities later described in the VMSEQ entries.
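To make the four VMDEF properties easier to follow, they can be pictured as one small record. The sketch below is only an illustration in Python and is neither DISSECT-CF's API nor the description format of Fig. 4: all class and field names are our own, the two host names are hypothetical placeholders (the text does not state them), and the value 0 of the copy setting is interpreted here as a boolean meaning that the VM does not run on a remote filesystem.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VirtualAppliance:
    """VA: the VM image used as the VM's disk."""
    name: str
    boot_seconds: float
    run_on_remote_fs: bool  # assumption: the value 0 in the text maps to False
    size_mb: int


@dataclass(frozen=True)
class ResourceConstraints:
    """RC: resources to allocate for the VM on its hosting PM."""
    cores: int
    per_core_performance: float  # relative to one CPU of the LPDS cloud
    memory_gb: int


@dataclass(frozen=True)
class VMDef:
    """One VMDEF entry, valid for a whole PSSTART delimited section."""
    va: VirtualAppliance
    rc: ResourceConstraints
    vast: str  # host name of the image repository
    data: str  # host name of the data store


tinker_vm = VMDef(
    va=VirtualAppliance("tinker", boot_seconds=25, run_on_remote_fs=False, size_mb=306),
    rc=ResourceConstraints(cores=1, per_core_performance=5.0e-4, memory_gb=1),
    vast="image-repo.example",  # hypothetical placeholder
    data="data-store.example",  # hypothetical placeholder
)
print(tinker_vm)
```

The record is frozen because one definition is shared by every VM of a parallel section, so it should not change while the section is simulated.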
The VM definition is followed by the VMSEQ entries (e.g., see lines 4 or 10) that reside in each PSSTART delimited parallel section. VMSEQ entries tell the simulator that a new VM needs to be instantiated in the parallel section. Each VM requested by the VMSEQ entries uses the definition provided at the beginning of the parallel section. All VMs listed in the section are requested from the simulated LPDS cloud right before the workflow’s processing reaches the next PSSTART entry in the description. This guarantees they are requested and executed in parallel (note that, despite requesting the VMs simultaneously from the cloud, the level of concurrency observed during the parallel section depends on the actual load of the simulated LPDS cloud). The processing of the next parallel section only starts after the termination of all previously created VMs.

The VMSEQ entries also list a VM’s activities before termination. There are two kinds of activities: network and compute. Network activities start with N, followed by the number of bytes to be transferred between the DATA store and the VM (this is the store defined by the VMDEF entry at the beginning of the parallel section). Compute activities, on the other hand, start with the letter C, followed by the number of seconds for which the CPUs of the VM are expected to be fully utilised by the activity. VM level activities are executed in the simulated VM in sequence (i.e., one must complete before the next can start).

The PSSTART entry in line 6 and the virtual machine executions, defined until line 16, represent a single execution of the parallel section of the TCG workflow. Because of repetitions, we have omitted several VMSEQ entries from the parallel section, as well as several PSSTART entries representing further parallel sections of conformer analysis. The description offered to the simulator, however, did contain all the 14 additional PSSTART entries that were omitted here for readability.

5.2.2 Simulating the background load
5.3 Evaluation
5.3.1 Relation between past and future errors
5.3.2 Behaviour analysis of the algorithm
Input | Used parameters |
---|---|
P | 10, 20, 50, 100, 200, 500, 1000, 2000, 5000 |
S | 100, 200, 500, 1000, 2000\(^*\) |
I | 2, 4, 8, 16, 32 |
\((\max T_{red}-\min T_{red})/S\) | 2, 5, 10, 20, 50, 100 |
E | \(E_{SQD}\), \(E_{MAPE}\), \(E_{TAdj-SQD}\) |
- Execution time level metric: First, we used the \(E_{MAPE}\) function from Eq. 2. We wanted \(E_{MAPE}\) to show the average error (in percentage) between \(t_g\) and \(t_{target}\); thus, during this evaluation, we assigned \(r_{ex}(j_i)\leftarrow r_{ob}(j_i,t_g)\) and \(r_{ob}(j_i,t)\leftarrow r_{ob}(j_i,t_{target})\). This allowed us to see the execution time differences that the trace fragment \(t_{target}\) predicted by the algorithm causes in contrast to the golden one. We will denote this special use of the error function as \(E_{MAPE}^*\).
- Error level metric: We also compared how the golden and the approximated trace fragments relate to the real life execution expectations of TCG, i.e., the \(r_{ex}(j_i)\) values we have identified in the LPDS cloud according to Sect. 5.1. For this metric, we again use the mean absolute percentage error method, but this time to see how well the error for \(t_g\) is approximated by the error of \(t_{target}\) at every k value.
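The substitution behind \(E_{MAPE}^*\) can be illustrated with a short Python sketch. It is a hedged example rather than the evaluation code: the function name and the sample runtimes are ours, and the observed runtimes under \(t_g\) and \(t_{target}\) are assumed to be available as plain lists ordered by job.

```python
import math
from typing import Sequence


def e_mape_star(r_ob_golden: Sequence[float],
                r_ob_target: Sequence[float],
                k: int) -> float:
    """Eq. 2 with r_ex(j_i) <- r_ob(j_i, t_g) and r_ob(j_i, t) <- r_ob(j_i, t_target)."""
    pairs = zip(r_ob_golden[:k], r_ob_target[:k])
    return 100.0 / k * sum(abs(g - p) / g for g, p in pairs)


# Hypothetical observed runtimes (seconds) under the golden and the predicted fragment.
golden = [50.0, 100.0]
target = [55.0, 90.0]
print(e_mape_star(golden, target, 2))
```

Here both jobs deviate by 10% from their golden runtime, so the metric evaluates to roughly 10. Note that the golden runtimes take the place of the expected ones, so the result says nothing about how either fragment relates to the real life \(r_{ex}(j_i)\) values.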
Random selection | \(MAPE_E(W,t_g)\) | \(MAPE_F(W,t_g)\) | \(E_{MAPE}^*\) | \(F_{MAPE}^*\) |
---|---|---|---|---|
Average (\(\forall t_g\in T_G\)) | 157.874 | 166.166 | 72.243 | 85.893 |
Median (\(\forall t_g\in T_G\)) | 49.180 | 67.825 | 45.174 | 47.002 |
Error function | Coefficient of determination | |
---|---|---|
\(\downarrow\) | \(R^2(E(t_g),E(t_{target}))\) | \(R^2(F(t_g),F(t_{target}))\) |
\(E_{SQD}\) | 0.824 | 0.176 |
\(E_{MAPE}\) | 0.656 | 0.015 |
\(E_{TAdj-SQD}\) | 0.696 | 0.267 |