Proposed architecture
Our proposed architecture extends our previous generator approach with three new components: GA, User Behaviour and Standard Format. They were added, respectively, to improve the quality of the generated traces, to enable scaling workloads and to provide reusability of the generated traces. Our new extension also relies on the DistSysJavaHelpers project, which provides abstractions to represent arbitrary workloads and enables loading several well-known workload trace formats.
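The workload abstraction mentioned above can be pictured as a small record per invocation. The following sketch is illustrative only; the class and field names are our own and do not reflect the actual DistSysJavaHelpers API:

```python
from dataclasses import dataclass

# Hypothetical, minimal stand-in for a workload-trace job abstraction;
# the real DistSysJavaHelpers classes differ.
@dataclass
class Invocation:
    job_id: str        # unique ID of the invocation
    submit_time: int   # seconds from the start of the trace
    exec_time: int     # execution time in seconds
    memory_mb: int     # memory utilisation in MB

def load_trace(lines):
    """Parse 'id,submit,exec,mem' CSV lines into Invocation objects."""
    out = []
    for line in lines:
        job_id, submit, exec_t, mem = line.strip().split(",")
        out.append(Invocation(job_id, int(submit), int(exec_t), int(mem)))
    return out
```

A loader in this shape lets the rest of the pipeline stay format-agnostic: each supported trace format only needs its own parser producing the same record type.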
The GA component implements a genetic algorithm and works with the other components of the same layer to improve the process of generating traces based on averages and percentiles. The User Behaviour component calculates the number of invocations and services for each unique user, providing the statistical information that allows workloads to be scaled. Finally, the Standard Format component supports the reusability of the generated traces by converting them to formats that are supported by most simulators.

Improving the percentiles of generated traces
We added a genetic algorithm (the GA component) to our previous approach to produce several sets of values, from which it selects the optimal values to constitute a single set for a function and its invocations, as shown in Fig. 2. Our approach can generate full large-scale traces or scale a workload to a desired size (discussed in the following subsection). When the Azure dataset is selected, it is processed by the Generic Trace Producer, which analyses the dataset's contents. It reads each line from the trace file (each line represents one unique function and its invocations over a 24-hour period) and imitates the behaviour of the function according to its invocations, as shown in Fig. 2. This is done by extracting the necessary percentile information, which is passed to the Generate Execution Time and Generate Memory components via the GA component. The GA component includes the configuration of the genetic algorithm, set according to [12] in terms of the number of generations, the number of individuals and the fitness value that has to be fulfilled.
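To make the mechanism concrete, the sketch below is a minimal, self-contained genetic algorithm in the same spirit: the fitness of an individual is the \(R^2\) of its percentiles against the target percentiles, and the loop applies tournament selection, single-point crossover, per-gene mutation and elitism. It is an illustration under our own simplifications, not the actual GA component:

```python
import random

def r2(actual, predicted):
    # Coefficient of determination, used here as the fitness value
    mean = sum(actual) / len(actual)
    ss_tot = sum((a - mean) ** 2 for a in actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1.0 if ss_tot == 0 else 1.0 - ss_res / ss_tot

def percentiles(values, qs=(1, 25, 50, 75, 99)):
    # Nearest-rank percentiles of a list of values
    s = sorted(values)
    return [s[round(q / 100 * (len(s) - 1))] for q in qs]

def evolve(target, n_values, pop=100, gens=40, cx=0.9, mut=0.05, tour=10):
    lo, hi = min(target), max(target)
    rand_ind = lambda: [random.uniform(lo, hi) for _ in range(n_values)]
    fitness = lambda ind: r2(target, percentiles(ind))
    population = [rand_ind() for _ in range(pop)]
    for _ in range(gens):
        best = max(population, key=fitness)
        nxt = [best[:]]                      # elitism: carry the best over
        while len(nxt) < pop:
            p1 = max(random.sample(population, tour), key=fitness)
            p2 = max(random.sample(population, tour), key=fitness)
            child = p1[:]
            if random.random() < cx:         # single-point crossover
                cut = random.randrange(1, n_values)
                child = p1[:cut] + p2[cut:]
            for i in range(n_values):        # per-gene mutation
                if random.random() < mut:
                    child[i] = random.uniform(lo, hi)
            nxt.append(child)
        population = nxt
    return max(population, key=fitness)
```

The parameter defaults mirror the settings reported in the evaluation section (100 individuals, tournament size 10, crossover rate 0.9, mutation rate 0.05).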
The GA component produces as many sets of values as there are individuals. Each set represents generated execution time and memory utilisation values for the selected function and its invocations. In each iteration, the GA component applies the selection, mutation and crossover policies, according to the configuration, to select the best among the individuals, i.e., the one whose generated percentiles yield a good \(R^2\) value (the fitness function) against the original percentiles of the Azure dataset.

The Function Definition component (which is responsible for populating the corresponding task definition in the simulation) instantiates each invocation with the previously extracted values (e.g., the amount of memory). Each invocation involves several parameters, such as a unique ID, submission time, execution time and memory utilisation, that determine the behaviour of the function. Function Definition continues generating the tasks of the selected function until its total number of invocations is reached. After all the required function invocations have been generated for the simulator, the Generic Trace Producer proceeds to read the next line from the trace. This process continues until all requested functions and their invocations have been generated.

Scaling workloads with real users' behaviour
We introduced the User Behaviour component, which enables scaling workloads by detecting users' behaviour, i.e., by calculating the percentage of participation of each unique user in a dataset. The main logic behind this component is explained in Algorithm 1. The resulting statistics are passed to the Generic Trace Producer component, which then produces a single-user-focused, statistically correct set of invocations that fits the scenario desired by the user of our scaler. These unique-user-specific traces are then merged into the finally returned generated workload. To allow further customisation, the scaled workload can also be restricted to a particular time range or to specific services; these are passed in as the service and time parameters to both the scaler and the trace producer.

Converting generated traces to standard formats
The Standard Format component enables the reusability of generated traces by converting them to other formats, removing the need to generate traces repeatedly. When the tasks (FaaS functions) are generated, this component obtains each task with all its attributes (e.g., execution time, task ID and amount of memory) and stores them in its repository. Then, for each task, it orders the attribute values based on the desired format. Finally, it extracts the tasks one by one until the end.

The Standard Format component supports converting generated traces to several standard formats. This lets other simulators that don't natively support the serverless model take advantage of the generated realistic serverless traces. It also allows a user of the model to write their own format by extracting all attributes of the generated tasks in the desired layout.

In addition, the Standard Format component enriches serverless frameworks with FaaS workloads by converting the generated traces to AWS Lambda traces and other trace formats. However, some AWS trace attributes are not available in the Azure dataset, such as the cold start, which indicates when a provider has requested a new instance for an invocation. To collect these attributes at (simulated) run time, our serverless model [4], which was devised by extending DISSECT-CF, uses the converted AWS Lambda trace as input for a simulation that produces the unavailable attributes. For each task (invocation) in the trace, our model submits the task to the simulated infrastructure. If an instance is ready to accommodate the task, it is simulated directly; otherwise, the model requests a new instance (a cold start) for the task. When the simulation is finished, the model reproduces the AWS trace with complete attributes so that it can be simulated by other serverless frameworks.

Evaluation
Percentiles
We set the GA component according to the analysis provided in [12]. We used 100 individuals and tournament selection with a tournament size of 10 individuals. The crossover and mutation rates were set to 0.9 and 0.05, respectively. Finally, an elitism strategy copies the best individual from the current population into the next one without it undergoing the genetic operators.

Users' behaviour
We invoked the User Behaviour component to analyse the selected file statistically. The component showed that the file contains around 853 million invocations, 36,456 services and 8,590 users. Moreover, it provided detailed information on each user's number of invocations and percentage of participation, as shown in Table 1.

Rank | UserID | Number of Jobs | Percentage |
---|---|---|---|
1 | U4932 | 127471686 | 14.94% |
2 | U376 | 98867247 | 11.59% |
3 | U5660 | 51372036 | 6.02% |
4 | U1746 | 49036404 | 5.75% |
5 | U3387 | 37036087 | 4.34% |
6 | U8104 | 22457567 | 2.63% |
7 | U6940 | 16743959 | 1.96% |
8 | U6143 | 15937341 | 1.87% |
9 | U2488 | 15808936 | 1.85% |
10 | U6175 | 14780372 | 1.73% |
11 | Other | 403617380 | 47.31% |
12 | Total | 853129015 | 100.00% |
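The participation percentages in Table 1 follow directly from each user's job count divided by the total, and a scaled workload reproduces these shares at the requested size. A small sketch of this arithmetic (our own illustration, using the first two rows of the table):

```python
total_jobs = 853_129_015                                  # total invocations in the trace
job_counts = {"U4932": 127_471_686, "U376": 98_867_247}   # top two users (Table 1)

# Percentage of participation per user
participation = {u: 100 * c / total_jobs for u, c in job_counts.items()}

# Scale the per-user invocation counts down (or up) to a target workload size
def scale(participation, workload_size):
    return {u: round(p / 100 * workload_size) for u, p in participation.items()}

small = scale(participation, 10_000)   # e.g. U4932 keeps its ~14.94% share
```

Because every user keeps the same share of the total, the participation vector of a scaled workload stays close to the original, which is what the \(R^2\) comparison below measures.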
These statistics enable the Generic Trace Producer component to produce different workloads with the same users' behaviour as the Azure dataset. We validated the scaling approach by producing different workload sizes that fit both small and large infrastructures. We then invoked the User Behaviour component for a statistical analysis of each generated workload. Finally, we compared the percentage of users' participation in each workload with the original one (from the selected file representing one day) using \(R^2\), as shown in Table 2. We also measured \(R^2\) between the average of the percentiles for execution time and memory utilisation of each generated workload and the original one to show data accuracy. The results show that our approach scales workloads efficiently while preserving real users' behaviour. They also show that the generated execution time and memory utilisation percentiles resemble the original values when scaling workloads.

Workload Size | \(R^2\) (Percentage) | \(R^2\) (Execution time) | \(R^2\) (Memory) |
---|---|---|---|
\(10^3\) | 0.9999 | 0.9969 | 0.9986 |
\(10^4\) | 1 | 0.9986 | 0.9984 |
\(10^5\) | 1 | 0.9993 | 0.9989 |
\(10^6\) | 1 | 0.9993 | 0.9981 |
\(10^7\) | 1 | 0.9995 | 0.9997 |
\(10^8\) | 1 | 1 | 0.9998 |
The Generic Trace Producer also enables a user of a simulator to generate a workload with customised options, such as a specific service or time range, while maintaining real users' behaviour with the help of the User Behaviour component. We generated a trace, shown in Fig. 4, for the orchestration trigger, which involves numerous invocations during one day. We also generated a trace for the timer trigger, showing the participation percentage of users in the first and last minutes of the day, as shown in Fig. 5.
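The service- and time-based customisation can be thought of as a filter pass over the invocations before scaling. The sketch below is our own simplified illustration, not the actual implementation; the dictionary keys are hypothetical field names mirroring the service and time parameters mentioned above:

```python
def filter_invocations(invocations, service=None, start=None, end=None):
    """Keep only invocations of a given trigger/service and/or time window.

    Each invocation is a dict with 'service' and 'submit_time' keys
    (hypothetical field names, for illustration only).
    """
    kept = []
    for inv in invocations:
        if service is not None and inv["service"] != service:
            continue
        if start is not None and inv["submit_time"] < start:
            continue
        if end is not None and inv["submit_time"] >= end:
            continue
        kept.append(inv)
    return kept
```

For example, selecting the timer trigger over the first minute of the day would use `service="timer", start=0, end=60` (times in seconds from the start of the trace).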
Converting approach
DISSECT-CF results when simulating the converted trace in each supported format:

Metrics | CSV | GWF | SWF |
---|---|---|---|
Average utilization of PMs (%) | 0.8119 | 0.8014 | 0.8014 |
Total power consumption (kWh) | 73.5076 | 73.4451 | 73.4451 |
Simulated timespan (ms) | 87792001 | 87792001 | 87792001 |
Number of used VMs | 301 | 301 | 301 |
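A quick way to read the table above is to compare the metrics across formats: the GWF and SWF runs produce identical results, while the CSV run differs only marginally. For instance, the relative difference in average PM utilisation:

```python
csv_util, gwf_util = 0.8119, 0.8014   # average PM utilisation from the table above

# Relative difference of the CSV run against the GWF/SWF runs
rel_diff_pct = 100 * (csv_util - gwf_util) / csv_util
```

This comes to roughly 1.29%, i.e. the converted formats reproduce the CSV baseline closely.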
Other simulators
Simulator | Type | Year Used | Trace-file workload | Synthetic workload |
---|---|---|---|---|
CloudSim [5] | Cloud | 2022 | \(\checkmark\) | \(\checkmark\) |
GridSim [7] | Grid | 2022 | \(\checkmark\) | \(\checkmark\) |
iCanCloud [13] | Cloud | 2021 | \(\checkmark\) | \(\checkmark\) |
DISSECT-CF [6] | Cloud | 2021 | \(\checkmark\) | \(\checkmark\) |
WorkflowSim [14] | Cloud | 2021 | \(\checkmark\) | N/A |
CloudSched [15] | Cloud | 2021 | \(\checkmark\) | N/A |
CloudAnalyst [16] | Cloud | 2021 | N/A | \(\checkmark\) |
GreenCloud [17] | Cloud | 2021 | \(\checkmark\) | \(\checkmark\) |
GPUCloudSim [18] | Cloud | 2021 | N/A | \(\checkmark\) |
SimGrid [19] | Grid/Cloud | 2021 | \(\checkmark\) | \(\checkmark\) |
DFaaSCloud [8] | Serverless | 2021 | N/A | \(\checkmark\) |
BigHouse [20] | Cloud | 2021 | \(\checkmark\) | \(\checkmark\) |
simFaaS [10] | Serverless | 2021 | \(\checkmark\) | \(\checkmark\) |
FaaSFSim [9] | Serverless | 2021 | \(\checkmark\) | \(\checkmark\) |
CloudSimSDN [21] | Cloud | 2021 | \(\checkmark\) | N/A |
CEPSim [22] | Cloud | 2021 | N/A | \(\checkmark\) |
OpenDC Serverless [2] | Serverless | 2020 | \(\checkmark\) | \(\checkmark\) |
FederatedCloudSim [23] | Cloud | 2020 | \(\checkmark\) | N/A |
EMUSIM [24] | Cloud | 2019 | N/A | \(\checkmark\) |
CloudReports [25] | Cloud | 2019 | N/A | \(\checkmark\) |
DCSim [26] | Cloud | 2018 | \(\checkmark\) | N/A |
ElasticSim [27] | Cloud | 2017 | \(\checkmark\) | N/A |
Cloud2Sim [28] | Cloud | 2016 | \(\checkmark\) | \(\checkmark\) |
Simulator | 1k | 5k | 10k | 50k | 100k | 500k | 1m |
---|---|---|---|---|---|---|---|
DISSECT-CF | 33.6s | 2.79m | 5.58m | 27.9m | 55.9m | 4.65h | 9.31h |
GridSim | 34.3s | 2.81m | 5.59m | 27.9m | 55.9m | 4.65h | 9.31h |
Difference | \(1.95\%\) | \(0.4\%\) | \(0.2\%\) | \(0.04\%\) | \(0.02\%\) | \(\sim\)0% | \(\sim\)0% |
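The runtimes in the last table grow almost exactly linearly with workload size; extrapolating from the 1k row predicts the larger rows to within rounding. A small check of this, using our own arithmetic on the reported DISSECT-CF numbers:

```python
base_size, base_runtime_s = 1_000, 33.6    # DISSECT-CF, 1k workload

def predicted_runtime_s(size):
    # Assume runtime scales linearly with the number of tasks
    return base_runtime_s * size / base_size

hours_1m = predicted_runtime_s(1_000_000) / 3600   # ~9.33 h (table: 9.31 h)
minutes_10k = predicted_runtime_s(10_000) / 60     # ~5.6 m (table: 5.58 m)
```

The close match suggests per-task processing dominates the simulation cost, so runtimes for intermediate workload sizes can be estimated by simple proportion.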