1 Introduction
-
What type of virtual resources should be rented for a given application?
-
How to efficiently map the components of an application (e.g., web server VMs, a database VMs) to the rented resources?
-
What is the optimal bid price for each resource that allows to fulfill quality of service requirements?
-
a heuristic, called OptiSpot, to jointly solve the bidding and allocation problem, which are in general NP-hard;
-
what is, to our knowledge, the first application in the area of bidding of extended queueing network models that include a model of the operational environment. The latter, which is referred to as random environment model [6], captures the stochastic nature of the operational environment, in which VMs can be lost and restarted as a result of spot price fluctuations and the consequent temporary switch to an on-demand pricing model.
-
the use of advanced fluid analysis techniques to accurately approximate response time percentiles, which are commonly used to constraint performance in service-level agreements, but which are usually hard to compute in queueing networks. Compared to more complex approximations for accurate percentile assessment, such as Laplace transforms, this method is fast enough for run-time application.
2 Motivating example
-
number of CPUs, since having multiple CPUs does not always correspond to a proportional increase in the system throughput;
-
load balancing, since balancing the load among multiple VMs does not always correspond to a proportional increase in the system throughput with respect to using a single resource of the same type;
-
availability, since a spot instance has a possibility to be lost and become unavailable for some time.
3 System model and problem statement
3.1 System model
3.1.1 Application
3.1.2 Resources
3.2 Decision variables and problem statement
-
\(t=[t_y]\), \(1\le y\le Y\). Resource assignment vector: this is a vector that assigns a resource y to a resource type \(t_y \in \mathbb {N}_{[0,R]}\). From this parameter we also define:
-
\(\hat{\lambda }_y:=\lambda ({t_y})\): rate of the resource y.
-
\(\hat{c}_y=c({t_y})\): price of the resource y.
-
-
\(D=[d_{m,y}]\), \(1\le m\le M\), \(1\le y\le Y\). Allocation matrix: where \(d_{m,y} \in \mathbb {R}_{[ 0,+\infty )}\) assigns part of the rate \(\lambda \) of each resource y to each software server m.
4 OptiSpot heuristic
4.1 General idea
4.2 Finding the optimal rate for each software server
4.3 Finding the real resources to rent
4.4 Finding the allocation of the rate for each software server to the real resources
4.5 System analysis and scaling-up of the bottleneck server
4.6 Convergence of the approach
5 Bid price and random environment
5.1 Determining the parameterized resource model
-
overbidTime(b) mean time before the bid price b is overbid (i.e., an active spot instance is reclaimed by the cloud provider).
-
underbidTime(b) mean time before the bid price b is underbid (i.e., a previously reclaimed spot instance is available again).
-
spotCost(b) average cost for an active spot instance, when bidding b. The difference between this cost and c(r) is that the former only considers the use of spot instances, while the latter considers the possibility for a spot instance to become an on-demand instance when the bid price is overbid, which is the actual cost incurred by the user.
-
A(b) expected availability of the resource when bidding b. This function estimates the percentage of time the resource is able to process requests.
termNoticeTime
| Time between a spot instance is overbid and its termination (this is the advance termination notice service offered by Amazon EC2) |
odStartupTime
| Time needed to start this resource in on-demand mode |
spotStartupTime
| Time needed to start this resource in spot mode |
-
State 1 The resource is available as a spot instance. Spot price is paid.
-
Transition 1 to 2 The bid price has been overbid, the spot instance receives a termination notice, and an on-demand instance is scheduled to start.
-
State 2 The resource is available as a spot instance (although it has received the termination notice) and an on-demand instance is starting. Both spot and on-demand prices are paid.
-
Transition 2 to 3 The termination notice is expired and therefore the spot instance is no longer available.
-
State 3 The resource is not available, it is being started as an on-demand instance, but it is not ready yet to process requests. On-demand price is paid.
-
Transition 3 to 4 The on-demand instance is now ready to receive requests. This transition can happen instantaneously in the case the time needed to start the on-demand VM (odStartupTime) is not higher than the termination notice time (termNoticeTime). To avoid the possibility of having a non-positive period in the CTMC for this transition, we force a lower bound equal to eps (the smallest positive number that can be represented).
-
State 4 The resource is available as an on-demand instance. On-demand price is paid.
-
Transition 4 to 5 The bid price has been underbid, so it is possible to start a spot instance again.
-
State 5 The resource is available as an on-demand instance, although a spot instance is currently starting. Both spot price and on-demand price are paid.
-
Transition 5 to 1 The spot instance is now ready to receive requests and the on-demand instance is terminated.
-
Transition 5 to 4 The spot instance has been overbid before being fully started. So it is immediately terminated since an on-demand instance is still active.
5.2 Determining the bid price
5.3 Determining the random environment
6 Evaluation setting
6.1 Hardware and software
6.2 Application model
-
dialog step requests: process and update data on the client-side through the graphical user interface;
-
update requests: higher priority asynchronous update requests that may be triggered by a dialog step request;
-
update2 requests: lower priority asynchronous update requests that may be triggered by a dialog step request.
-
\(N_\textit{dia}\) (number of users that issue dialog step requests) is arbitrary, but \(N_\textit{upd}\) (number of users that issue update requests) and \(N_\textit{upd2}\) (number of users that issue update2 requests) are assumed dependent on it, as explained in Sect. 3.4.1 of [24]. Therefore we consider \(N_\textit{upd}=0.2652 \times N_\textit{dia}\) and \(N_\textit{upd2}=0.06657 \times N_\textit{dia}\).
-
\(\sigma _k\) is 0.0001 for all service classes, since we assume an average think time of 10 s for each class of users in the system.
Server/class | Service demand (ms) | Service rate \(\mu _{m,k}\) (req/ms) |
---|---|---|
AS dialog step | 119.82 | 0.008346 |
AS update1 | 47.92 | 0.02087 |
AS update2 | 32.98 | 0.03032 |
DB dialog step | 4.541 | 0.2202 |
DB update1 | 1.205 | 0.8299 |
DB update2 | 0.3043 | 3.286 |
6.3 Resource model
-
Linux instances have 100 % availability This happens because the termination notice is higher than the time needed to start an on-demand resource, therefore the resource will never be in the unavailable state (state 3 of the CTMC in Fig. 8).
-
Some resources have an infinite overbid time This happens when the current bid price has never been overbid in historical traces. This also results in a 100% availability since the spot instance is assumed to never terminate.
-
Some resources have a bid price that is slightly higher than the on-demand price This is intentional since we want to avoid the situation in which a resource switches too often between spot mode and on-demand mode, which would cause a decrease in the availability and consequently in the amount of processed requests per price paid.
-
The availability level is quite high for all the resources This is a result of the method we used to calculate the optimal bid price, which tries to find the best trade-off between the actual price paid and the availability. Another reason for the high value is the fact that the time during which a resource is unavailable is less than the time needed to start the new on-demand resource since the new on-demand resource is started proactively after the spot instance termination notice from Amazon EC2 is received.
Parameter | Linux VMs (s) | Windows VMs (s) |
---|---|---|
odStartupTime
| 97 | 810 |
spotStartupTime
| 557 | 1270 |
termNoticeTime
| 120 | 120 |
Resource | Rate ECU | CPUs | On-dem. price ($/h) | Bid price ($/h) | Actual price ($/h) | overbid Time (h) | underbid Time (h) | Avail. (%) |
---|---|---|---|---|---|---|---|---|
r
|
\(\lambda (r)\)
|
q(r) |
o(r) |
b(r) |
c(r) |
A(r) | ||
m3.medium (us-east/Linux) | 3 | 1 | 0.067 | 0.0704 | 0.0104 | 4416 | 0.0814 | 100 |
m3.large (us-east/Linux) | 6.5 | 2 | 0.133 | 0.1463 | 0.0235 | 155 | 0.4408 | 100 |
m3.xlarge (us-east/Linux) | 13 | 4 | 0.266 | 0.266 | 0.0391 | 252 | 0.1901 | 100 |
m3.2xlarge (us-east/Linux) | 26 | 8 | 0.532 | 0.532 | 0.1029 |
\(\inf \)
| 0 | 100 |
m4.large (us-east/Linux) | 6.5 | 2 | 0.12 | 0.132 | 0.0278 | 65 | 0.8569 | 100 |
m4.xlarge (us-east/Linux) | 13 | 4 | 0.239 | 0.2629 | 0.0438 | 145 | 4.3378 | 100 |
m4.2xlarge (us-east/Linux) | 26 | 8 | 0.479 | 0.5269 | 0.0984 |
\(\inf \)
| 0 | 100 |
m3.medium (us-east/Windows) | 3 | 1 | 0.13 | 0.13 | 0.0591 |
\(\inf \)
| 0 | 100 |
m3.large (us-east/Windows) | 6.5 | 2 | 0.259 | 0.272 | 0.1176 | 420 | 0.1726 | 99.95 |
m3.xlarge (us-east/Windows) | 13 | 4 | 0.518 | 0.518 | 0.1336 | 1262 | 0.0787 | 99.98 |
m3.2xlarge (us-east/Windows) | 26 | 8 | 1.036 | 1.036 | 0.2767 | 2208 | 0.1076 | 99.99 |
m4.large (us-east/Windows) | 6.5 | 2 | 0.246 | 0.2583 | 0.1409 | 140 | 0.0956 | 99.86 |
m4.xlarge (us-east/Windows) | 13 | 4 | 0.491 | 0.5156 | 0.28 | 676 | 3.2325 | 99.97 |
m4.2xlarge (us-east/Windows) | 26 | 8 | 0.983 | 1.0322 | 0.5559 |
\(\inf \)
| 0 | 100 |
m3.medium (eu-west/Windows) | 3 | 1 | 0.129 | 0.129 | 0.0722 | 3312 | 0.0711 | 99.99 |
m3.large (eu-west/Windows) | 6.5 | 2 | 0.258 | 0.258 | 0.1432 | 2208 | 0.176 | 99.99 |
m3.xlarge (eu-west/Windows) | 13 | 4 | 0.517 | 0.5429 | 0.2288 | 471 | 1.9457 | 99.96 |
m3.2xlarge (eu-west/Windows) | 26 | 8 | 1.033 | 1.0846 | 0.5436 |
\(\inf \)
| 0 | 100 |
m4.large (eu-west/Windows) | 6.5 | 2 | 0.244 | 0.2562 | 0.1269 | 662 | 0.5042 | 99.97 |
m4.xlarge (eu-west/Windows) | 13 | 4 | 0.488 | 0.488 | 0.2561 | 212 | 2.0831 | 99.91 |
m4.2xlarge (eu-west/Windows) | 26 | 8 | 0.976 | 0.976 | 0.5061 | 2207 | 1.6699 | 99.99 |