Prediction of migration cost and duration
The key goal of our approach is to accurately estimate cloud database migration cost and duration. This would enable different migration options and parameters to be evaluated, therefore supporting decision-making. We measured the achievement of this goal by comparing our predictions against (1) real cloud database migrations and (2) cost calculators from the cloud providers [48, 49]. The database migrations used a closed-source system from our industrial partner Science Warehouse [9] and the open-source ERP system Apache OFBiz [50].
Both databases (Science Warehouse and OFBiz) were migrated twice: from the existing cloud platform to a new cloud platform, then back again. This provided two data points per system for our evaluation. While we used public clouds as the source and target infrastructure, our approach can also be applied to in-house-to-cloud migrations. The Amazon Database Migration Service was used as middleware to perform both migrations. Additionally, the Science Warehouse migration required a VPN between the two clouds to secure the data during transfer.
The Science Warehouse system is an enterprise procurement system for making purchases in business-to-business scenarios. At the centre of this is a product catalogue which is populated by ‘supplier’ organisations, from which ‘buyer’ organisations can make purchases. The vast majority of users are in the United Kingdom and Ireland, resulting in peak loads during business hours in these countries.
Apache OFBiz (Open For Business) is an ERP system which contains applications for e-commerce/online shopping, fulfilment of orders, marketing, and warehouse management. Unlike with Science Warehouse, we did not have a real instance to use; therefore, we populated the system with 200GB of synthetic data to represent the use case of a large online retailer. This was random data which matched the purpose of each column; e.g., 16-digit numbers following the Mastercard and Visa format were inserted into a credit card number column. All database tables were populated.
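As an illustration of column-appropriate synthetic data, the following is a minimal sketch of how 16-digit card numbers in the Visa and Mastercard format could be generated. It is not the generator used in the evaluation. The issuer prefixes (4 for Visa, 51 to 55 for Mastercard) are standard, while the Luhn check digit and all function names are assumptions introduced here.

```python
import random

# Issuer prefixes: Visa numbers start with 4, Mastercard with 51-55.
PREFIXES = {"visa": ["4"], "mastercard": ["51", "52", "53", "54", "55"]}


def luhn_check_digit(partial: str) -> str:
    """Return the Luhn check digit for a number that lacks its final digit."""
    total = 0
    # Walk right to left; the digit next to the check digit is doubled first.
    for i, ch in enumerate(reversed(partial)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)


def luhn_valid(number: str) -> bool:
    """Check a complete number against the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def synthetic_card_number(scheme: str, rng: random.Random) -> str:
    """Build a random 16-digit number with a valid prefix and check digit."""
    prefix = rng.choice(PREFIXES[scheme])
    body = "".join(str(rng.randint(0, 9)) for _ in range(15 - len(prefix)))
    partial = prefix + body
    # Assumption: the source only requires the right format, not a valid checksum.
    return partial + luhn_check_digit(partial)


if __name__ == "__main__":
    rng = random.Random(42)
    print(synthetic_card_number("visa", rng))
    print(synthetic_card_number("mastercard", rng))
```

Seeding the random generator keeps the synthetic data reproducible across runs, which is useful when the same database has to be populated more than once.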
The Science Warehouse database was migrated while idle and the OFBiz database was migrated with synthetic load applied. This reflects how some users of our approach can perform the migration with the system shut down, whereas many larger systems are critical to the organisation, making a shutdown impossible. The load represented a user base within a single country, with two daily peaks at approximately 11:00 and 15:00. A daily total of 1.3 million queries was made, 90% of them between 06:00 and 20:00. Two Amazon EC2 instances were used to send these queries to the database server.
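The shape of such a load can be sketched as an hourly profile. The example below is illustrative only: the daily total, the 90% daytime share, and the two peak hours come from the description above, whereas the Gaussian peak shape, the peak width, the daytime baseline, and the even night-time spread are assumptions made for this sketch.

```python
import math

DAILY_QUERIES = 1_300_000    # daily total from the text
DAY_START, DAY_END = 6, 20   # 06:00-20:00 carries 90% of the queries
DAY_SHARE = 0.9
PEAKS = (11, 15)             # approximate peak hours from the text
PEAK_WIDTH = 1.5             # assumed spread of each peak, in hours
BASELINE = 0.2               # assumed constant daytime background level


def hourly_load() -> list:
    """Return 24 query counts, one for each hour of the day."""
    # Daytime shape: a flat baseline plus one Gaussian bump per peak (assumption).
    shape = [
        BASELINE
        + sum(math.exp(-((h - p) ** 2) / (2 * PEAK_WIDTH ** 2)) for p in PEAKS)
        for h in range(DAY_START, DAY_END)
    ]
    day_total = DAILY_QUERIES * DAY_SHARE
    day = [day_total * s / sum(shape) for s in shape]
    # Night-time: the remaining 10% is spread evenly (assumption).
    night_hours = 24 - (DAY_END - DAY_START)
    night = DAILY_QUERIES * (1 - DAY_SHARE) / night_hours
    return [
        day[h - DAY_START] if DAY_START <= h < DAY_END else night
        for h in range(24)
    ]


if __name__ == "__main__":
    for hour, queries in enumerate(hourly_load()):
        print(f"{hour:02d}:00  {queries:10.0f}")
```

A load generator could divide each hourly count across the sending instances and issue queries at the corresponding rate.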
Our four migrations are modest in terms of size and cost. Many organisations migrating enterprise system databases will be moving data between database clusters rather than single servers, making the cost more significant. A common reason for database migration is scalability, where database load is reaching capacity. Limited capacity would therefore be available to migrate the database while it is being used. Inducing such large databases and loads was not feasible in this evaluation due to the high costs. However, we expect the accuracy of the experimental results to be similar for larger systems (such as those migrated in [2, 51], which have similar characteristics).
The Amazon Simple Monthly Calculator [48] and the Microsoft Azure pricing calculator [49] are used for each migration to provide a cost baseline, as shown in Table 4. These are often the first tools an organisation will use when planning a cloud migration. However, compared to our approach they have significant limitations. Most notably, they require a user to accurately identify the cloud resources they require. As workload information is not directly considered by these tools, there is no indication of when the selected cloud resources represent over- or under-provisioning. Furthermore, the Amazon Simple Monthly Calculator does not include the Amazon Database Migration Service and can only predict costs for one month.
Table 4
Predicted migration costs
System | Direction | Size | Predicted Duration | Total Cost | Calculator Cost
Science Warehouse | AWS → Azure | 38GB | 417 Min. (7 Min.) | $32.96 | $264.68
Science Warehouse | AWS ← Azure | 18GB | 144 Min. (2 Min.) | $26.41 | $408.59
Apache OFBiz | AWS ← Azure | 200GB | 1147 Min. (12 Min.) | $12.10 | $821.71
Apache OFBiz | AWS → Azure | 163GB | 901 Min. (8 Min.) | $9.68 | $508.93
The lack of support for determining the inputs to the cost calculators can cause an organisation to make coarse-grained estimates based on the size of the existing database servers. We based the calculated costs in Table 4 on running the migration infrastructure for one week. This represents a typical estimate for systems of this size without knowing detailed workload information [52]. For the calculations, the Science Warehouse database is migrated to a ‘db.m4.2xlarge’ AWS instance and a ‘D4v2’ Azure instance. The OFBiz database is migrated to a ‘db.m4.4xlarge’ AWS instance and a ‘D5v2’ Azure instance, respectively.
The predicted migration cost and duration obtained with our modelling and simulation approach are shown in Table 4 alongside the cost calculator baseline. Each simulation was performed 20 times on a laptop PC with an Intel i5-6200 (dual core, 2.3GHz) and 8GB of RAM. For comparison, the costs for the real migrations we performed are shown in Table 5. These values were obtained using data from Amazon CloudWatch and the Microsoft Azure Activity Log, which are services that record the creation and deletion times of each cloud resource.
Table 5
Actual migration costs
System | Direction | Size | Actual Duration | Total Cost
Science Warehouse | AWS → Azure | 38GB | 402 Min. | $40.12
Science Warehouse | AWS ← Azure | 18GB | 147 Min. | $30.75
Apache OFBiz | AWS ← Azure | 200GB | 1176 Min. | $13.30
Apache OFBiz | AWS → Azure | 163GB | 888 Min. | $6.96
The Science Warehouse outbound migration took 0.3 hours less than our prediction and its “return” migration took 0.15 hours less, a relative error of 4% and 22%, respectively. In contrast, the outbound OFBiz migration took 1.6 hours longer than predicted and the return migration took 0.8 hours longer; these figures correspond to a small relative error of 8% and 5%, respectively.
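The relative error of a duration prediction can be computed as sketched below. This is an illustrative helper, not part of MigSim. It assumes that the error is normalised by the actual duration, which the text does not state explicitly, and the function name is hypothetical.

```python
def relative_error(predicted_min: float, actual_min: float) -> float:
    """Absolute prediction error as a fraction of the actual duration.

    Assumption: the error is expressed relative to the actual (measured) value.
    """
    return abs(predicted_min - actual_min) / actual_min


if __name__ == "__main__":
    # Science Warehouse, AWS -> Azure: predicted 417 min (Table 4),
    # actual 402 min (Table 5).
    error = relative_error(417, 402)
    print(f"{error:.1%}")  # prints 3.7%, i.e. roughly 4%
```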
Simulation execution times ranged from 8 minutes for the 18GB simulation (best case) to 144 minutes for the 200GB simulation (worst case). When computing the cost, MigSim rounds the migration duration up to the nearest full hour, in line with the cost model of many cloud providers. Therefore, the cost did not vary between runs, although small differences may arise for larger datasets.
The ‘Total Cost’ columns in Tables 4 and 5 include Additional Time. As discussed previously in the “Design” section, migrating a database often requires the underlying infrastructure to be running for longer than the data transfer. Common tasks include set-up, VPN configuration, tear-down, and non-working hours where it is not possible to delete the cloud infrastructure. MigSim allows such tasks to be accounted for when predicting migration costs. We manually estimated additional time for each migration and input this as a migration parameter (1 hour for Apache OFBiz and 16 hours for Science Warehouse due to security requirements).
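The effect of hour rounding and additional time on the billed duration can be sketched as follows. This is a simplified illustration rather than MigSim's actual cost model: the function names and the hourly rate are hypothetical, and adding the additional time before rounding up is an assumption.

```python
import math


def billed_hours(duration_min: float, additional_hours: float = 0.0) -> int:
    """Transfer duration plus additional time, rounded up to a full hour.

    Assumption: the additional time is added before rounding.
    """
    return math.ceil(duration_min / 60 + additional_hours)


def migration_cost(duration_min: float, additional_hours: float,
                   hourly_rate: float) -> float:
    """Cost of keeping the migration infrastructure running.

    hourly_rate is a hypothetical combined price per hour for all resources.
    """
    return billed_hours(duration_min, additional_hours) * hourly_rate


if __name__ == "__main__":
    # A 144-minute transfer with 16 hours of additional time.
    hours = billed_hours(144, 16)
    print(hours, migration_cost(144, 16, hourly_rate=1.50))
```

Because the duration is rounded up to whole hours, predictions that differ by a few minutes between simulation runs usually map to the same billed duration, which is consistent with the cost not varying between runs.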
A side-effect of database migration is that the target database will consume less storage space and perform better than the source database (despite identical data). This is due to different amounts of data being inserted and removed during routine usage of the source database, creating fragmented free space [53]. Furthermore, all schema objects (tables, indexes, etc.) are essentially rebuilt in the target database. We have included the size column to show the storage space consumed by the database before it was migrated. This value was provided as input to our simulation via each system’s workload model.
For each migration (simulated and actual), the database’s SSD/HDD was matched to its workload, as would be expected for any existing system. However, every SSD/HDD type in AWS or Azure has its own price and performance characteristics, which impact our results. On AWS, the Science Warehouse database used magnetic storage, which costs $0.0002 per GB-Hour and has a published performance of 100 IOPS. On Azure, a ‘P6 Premium Managed Disk’ was used (240 IOPS, $0.001). Each cloud provider has a different way of abstracting from the physical SSD/HDD in their datacentre to the virtualised storage devices they sell to users. As a result, it is challenging to mirror the performance characteristics on the source and target side of a migration.
The Apache OFBiz database used a provisioned IOPS SSD on AWS (1000 IOPS, $0.0017) and a P20 Premium Managed Disk on Azure (2300 IOPS, $0.0002). Both storage devices have higher performance levels than those used for the Science Warehouse system due to the increased database size. The impact of this extra performance can be seen in Tables 4 and 5, as the migration time is not proportional to size.
Our evaluation compares like-for-like hardware so the performance differences do not affect the results. For example, the AWS to Azure migration of the Science Warehouse database uses magnetic storage (Table 5). The simulation of this migration models the performance of magnetic storage (Table 4).