Introduction
- Investigate existing works that compare and contrast the various container technologies from a performance-oriented perspective.
- Explore the features that differentiate the Rkt container runtime from other runtimes.
- Implement archetypes to assess the variation in support provided by the Rkt container runtime for applications with high performance requirements.
- Analyze how the different features impact the performance of computationally challenging tasks.
- Analyze the impact of the features specific to the Rkt container runtime on data-intensive tasks.
Background and related work
- LXC: LXC is an interface to the Linux kernel containment features and allows users to run multiple isolated images on a single host. Isolation among LXC containers is provided through kernel namespaces. LXC containers use the PID, IPC, and file system namespaces to virtualize and isolate PIDs, IPCs, and mount points, respectively. The network namespace connects the virtual interface in a namespace to the physical interface and supports route-based and bridge-based configurations. Resource management is done through cgroups, whose other key responsibilities include process control, limiting CPU usage, and isolating containers and processes. LXC has an unprivileged option to create user-space containers, which is advantageous in some respects but may create security issues in others. Container portability allows the same image to run on different distributions and hardware configurations without many changes; in this regard, LXC provides only partial portability because it works only across Ubuntu distributions [13]. LXC allows multiple applications in a container.
- Docker: Docker is one of the leading container life cycle management tools. It allows users to run and manage applications side by side in isolated containers. Like LXC, Docker makes use of Linux kernel features such as cgroups and kernel namespaces [14]. The lightweight nature of Docker containers allows several containers to run simultaneously on a single server or virtual machine [15]. The major limitation of Docker is its security issues; that is, an attacker can easily gain superuser privileges. Lack of interoperability is another limitation of the Docker technology [16].
- Namespaces: Docker uses namespaces to create isolation among containers. The following namespace types are used by Docker:
  - pid: ensures that a process in one container does not affect processes in a different container.
  - uts: used for kernel version isolation.
  - mnt: provides each container with its own view of the file system and mount points.
  - ipc: provides isolation for interprocess communication.
  - net: network isolation is provided through this namespace.
- Union file system: allows different layers to be stacked and presented as a single file system; only the top layer is writable.
- Control group: enables resource management, that is, efficient sharing of hardware among containers.
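The layered, top-writable behaviour of a union file system can be sketched with Python's `collections.ChainMap`; the layer names and contents here are illustrative, not Docker's actual storage driver:

```python
from collections import ChainMap

# Hypothetical image layers, lowest last: a read-only base layer, a
# read-only application layer, and an initially empty writable top layer.
base_layer = {"/bin/sh": "shell binary", "/etc/hosts": "127.0.0.1 localhost"}
app_layer = {"/app/run.py": "print('hello')"}
writable_layer = {}

# ChainMap searches maps left to right, so the first map acts as the
# writable top layer: lookups fall through to lower layers, while
# writes land only in the top layer, mimicking copy-on-write.
unioned = ChainMap(writable_layer, app_layer, base_layer)

print(unioned["/bin/sh"])                   # read falls through to the base layer
unioned["/etc/hosts"] = "10.0.0.1 myhost"   # write goes to the top layer only
print(base_layer["/etc/hosts"])             # lower layer is untouched
print(unioned["/etc/hosts"])                # top layer shadows the lower one
```

Deleting or rebuilding the top map discards all container writes while leaving the shared image layers intact, which is why containers built on union file systems start quickly and share storage.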
- Rkt: CoreOS Rkt is a more secure, interoperable, and open-source alternative to Docker. It allows multiple isolated images to run while sharing a common kernel space. Rkt provides more security than Docker containers in various aspects; for example, while downloading an image Docker does not ensure any kind of security, whereas Rkt cross-checks the signature of the publisher of the image [9]. Rkt execution proceeds in different stages. CoreOS, host, fly, and KVM are the different modes of execution supported by Rkt in stage 0. The CoreOS and host modes use Linux namespaces, such as pid and network, for isolation. Fly mode is a lighter security mode: it has no isolation for network, CPU, or memory, and SELinux is not enabled in this mode. KVM mode is the most secure mode, in which the Rkt container behaves like a lightweight virtual machine itself; sharing of kernels is not permitted in this mode.

Several researchers have explored the area of virtualization over the past decades, and the majority of the works concern virtual machines. Morabito et al. [17] compared hypervisor-based virtualization with lightweight virtualization, taking KVM as an example of hypervisor-based virtualization and Docker and LXC as representatives of lightweight virtualization. Xavier et al. [18] made a similar comparison between a virtual machine and different containers, with Xen as the VM technology and LXC, VServer, and OpenVZ as the containers; they found that, among those containers, LXC is the most suitable for an HPC environment. Chung et al. [19] made a detailed evaluation of the suitability of Docker in HPC environments and found that Docker is more suitable for data-intensive applications. Kozhirbayev et al. [20] compared container technologies to find out which performs better in Cloud environments.
They compared Docker and Flockport against native performance and found that only I/O-intensive operations suffer the impact of a higher overhead for containers; Docker and Flockport do not suffer overheads in terms of memory or processor. They claim that containers reduce the difference between Infrastructure-as-a-Service and bare-metal systems by providing near-native performance. Varma et al. [21] analyzed network overheads when Docker containers are used in Big Data environments. Hadoop benchmarks were executed in different experimental setups, varying the number of containers against the number of virtual machines, and the networking and latency aspects were considered. The throughput of the network was observed to be inversely proportional to the number of nodes in a virtual machine. The Docker containers were found to offer fair support for big data applications. Chung et al. [22] evaluated the performance of virtual machines and Docker in HPC environments where the infrastructure is connected by InfiniBand and found that the overall performance of containers is better than that of virtual machines. C. de Alfonso et al. explored the practical feasibility of running scientific workloads with high throughput requirements on containers [16]. Clusters of machines running applications in containers were used to meet the requirements of the scientific applications: a middleware receives the user requests and spawns the appropriate number of virtual nodes, and the CLUES manager is employed to provide the required elasticity. Distributed computing paradigms, such as the Cloud, are the basis of the IT era. Docker containers offer an efficient option to run applications in the Cloud [23]. Docker containers may be executed on a single host or on multiple hosts.

When a large number of Docker containers is run on multiple hosts, management of the system becomes tedious; this can be tackled by developing infrastructure solutions that enable administrators to automate system management tasks, and several open-source software solutions have been developed for the Docker ecosystem. There is relatively little work focused on the Rkt container in HPC environments, so we decided to explore the feasibility and performance impacts of this most secure container in an HPC environment and to check whether Rkt can meet the challenges of an HPC environment.
- Stage 0:
  - Interacts with the user.
  - Fetches the image and verifies it.
  - Handles image store operations.
  - Renders the image.
- Stage 1:
  - Isolates the pod from others.
  - Establishes the relevant networking.
  - Initialises file systems.
- Stage 2:
  - Executes the user application.
Experimental setup and benchmarking tools
Parameter | # of values | Values
---|---|---
Problem sizes (N) | 5 | 10000, 11000, 12000, 13000, 15000
Block sizes (NB) | 4 | 100, 92, 104, 98
Process grids (P × Q) | 1 | P = 4, Q = 8
Threshold | 1 | 16
Panel fact (PFACTs) | 3 | Right, left, Crout
Recursive stopping criterion (NBMIN) | 1 | 4
Recursive panel fact (RFACTs) | 2 | Right, left
Panels in recursion (NDIVs) | 1 | 2
Swapping threshold | 1 | 64
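These parameters correspond to fields of an HPL.dat input file. The fragment below is a sketch of how the values above would map onto that file, following HPL's standard template; the comment text, line order, and omitted lines should be checked against the HPL distribution rather than taken as the exact file used in the experiments:

```
5            # of problems sizes (N)
10000 11000 12000 13000 15000  Ns
4            # of NBs
100 92 104 98  NBs
1            # of process grids (P x Q)
4            Ps
8            Qs
16.0         threshold
3            # of panel fact
2 0 1        PFACTs (0=left, 1=Crout, 2=Right)
1            # of recursive stopping criterium
4            NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
2            # of recursive panel fact.
2 0          RFACTs (0=left, 1=Crout, 2=Right)
64           swapping threshold
```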
Computing intensive applications
- Lower-upper (LU) factorization of a random matrix.
- The LU factorization is used to solve the linear system consisting of the random matrix and a scalar.
- \(Gflops = ops/(cpu \times 10^{9})\), where \(ops\) is the number of floating-point operations performed and \(cpu\) is the elapsed time in seconds.
- Varying the problem size (N).
- Varying the block size (NB).
- Varying RFACT and panel fact.
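Using HPL's conventional operation count for LU factorization with the subsequent triangular solves (a standard figure, assumed here rather than stated in the text above), the Gflops metric can be computed as:

```python
def hpl_gflops(n: int, seconds: float) -> float:
    """Gflops for an HPL run of problem size n that took `seconds`.

    HPL's conventional operation count for LU factorization with
    partial pivoting plus the triangular solves is
    (2/3)*n^3 + 2*n^2 floating-point operations.
    """
    ops = (2.0 / 3.0) * n**3 + 2.0 * n**2
    return ops / (seconds * 1e9)

# Example: a hypothetical N = 10000 run finishing in 60 seconds.
print(f"{hpl_gflops(10000, 60.0):.2f} Gflops")
```

The same function applies unchanged across the N, NB, RFACT, and PFACT sweeps, since only the measured time varies between configurations.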
Data intensive applications
- Data generation:
  - Generates the edge list.
  - Constructs the graph from the generated edge list.
- BFS (Breadth First Search) on the constructed graph:
  - Randomly selects 64 unique search keys whose degree is greater than or equal to one.
  - The parent array of each key is computed and checked to determine whether it forms a valid BFS search tree.
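The BFS kernel and its parent-array validation can be sketched as follows; the adjacency-list representation and the check shown are a simplification of what the Graph500 reference code actually implements (which also verifies tree levels, among other properties):

```python
from collections import deque

def bfs_parents(adj, root):
    """Return a parent array for a BFS from `root` over adjacency list `adj`.

    parent[v] == -1 means v was not reached; the root is its own parent,
    following the Graph500 convention.
    """
    parent = [-1] * len(adj)
    parent[root] = root
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if parent[v] == -1:
                parent[v] = u
                queue.append(v)
    return parent

def is_valid_bfs_tree(adj, root, parent):
    """Check that the root is its own parent and that every tree edge
    (v, parent[v]) of a reached vertex exists in the graph."""
    if parent[root] != root:
        return False
    for v, p in enumerate(parent):
        if v == root or p == -1:
            continue
        if v not in adj[p]:      # a tree edge must be a graph edge
            return False
    return True

# Small undirected example graph as an adjacency list.
adj = [[1, 2], [0, 3], [0, 3], [1, 2]]
parent = bfs_parents(adj, 0)
print(parent, is_valid_bfs_tree(adj, 0, parent))
```

The benchmark repeats this pair of steps for each of the 64 search keys and reports traversed edges per second rather than wall-clock time alone.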
Results and discussions
Exploring computing intensive applications
- Rising zone: A low problem size does not drive the memory and processor to their maximum performance.
- Flat zone: The problem invokes only the processor at its maximum performance.
- Decaying zone: The problem size is too large, and the cache memory is not large enough to keep all the necessary data even though the processor is running at top speed.
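The boundary of the decaying zone is governed by how the working set grows with N: HPL stores a dense double-precision N × N matrix, so the footprint is 8·N² bytes. A quick sketch of that footprint for the problem sizes used above (the interpretation against the memory hierarchy is ours, not a result from the experiments):

```python
# HPL stores a dense double-precision N x N matrix: 8 * N^2 bytes.
def hpl_matrix_bytes(n: int) -> int:
    return 8 * n * n

# Footprint for each problem size used in the experiments.
for n in (10000, 11000, 12000, 13000, 15000):
    gib = hpl_matrix_bytes(n) / 2**30
    print(f"N={n}: {gib:.2f} GiB")
```

Since the footprint grows quadratically while cache sizes are fixed, ever-larger N values inevitably push the working set past the cache, which is consistent with the decaying zone described above.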