Concept-only solutions
A large number of academic research papers focus on Cloud-to-Things orchestration. Since it is not possible to cover all such papers individually, we shortlisted 10 papers from this category for detailed review; other papers are briefly introduced. Of the 10 papers, half are among the most highly cited to date, and the rest were published in 2019 or later. It is important to note that these solutions are theoretical, with no implementation or only a proof-of-concept one. The rest of this section discusses these solutions, and Table 1 presents a complete summary of the reviewed solutions in light of the attributes from the taxonomy.
ENORM [66], a framework for edge node resource management, aimed to address three problems: (1) edge node provisioning; (2) workload deployment on the edge, with a focus on how to deploy and which services to deploy; and (3) resource management at the edge. ENORM follows a decentralised architecture in which edge nodes are responsible for their own resource management decisions. However, the overall architecture is static in nature, as all edge nodes are known in advance to the cloud server managers running in the cloud. The focus of ENORM is mostly on the operations of edge nodes, where it supports provisioning, monitoring, vertical scaling, and offloading of applications. However, details related to the cloud layer are not given, e.g., how the cloud server managers that are responsible for different applications are provisioned and maintained.
Fernandez et al. [67] introduced a slice orchestrator, which facilitates the automated orchestration of IoT services, based on specific operational (and/or business) requirements, over a set of shared infrastructures. Their idea builds on the 5G concept of a network slice: an end-to-end logical network capable of providing an agreed quality of service for a defined customer purpose [68]. Based on this notion, an IoT slice is a partition of the entire end-to-end IoT solution created to serve a specific customer (or group of customers). The job of the slice orchestrator is to establish network slices, set up edge and cloud tenants, and deploy IoT functions as per the specific requirements in terms of computing, storage, network, and target locations (e.g., edge, cloud, and transport network). This solution follows a hybrid architecture, where a centralised slice orchestrator creates and manages slices but also relies on other domain-specific resource orchestrators (e.g., a different cloud orchestrator is responsible for each specific cloud environment) to perform key resource management functions such as resource selection and deployment. However, no details are provided regarding resource provisioning by the domain orchestrators, run-time reconfiguration, or the requirement specification that is given as input to the system.
Alam et al. [69] introduced a 3-layered reference architecture that uses Docker as the underlying orchestration tool for the automated deployment of microservices as containers. Their system follows a centralised model, where key functions like monitoring, adaptation, and orchestration take place at the cloud layer. Their fog layer is mainly used as a gateway mediating between the cloud and edge layers for system-specific operations (e.g., updating the status of connected edge devices) or application-specific operations (e.g., data transformation). This system is mostly suitable for publish-subscribe-based IoT applications. Similarly to Fernandez et al. [67], no details of various important functions, such as device connectivity at the edge level, resource provisioning at different layers, and run-time reconfiguration, are provided. However, unlike the others, they include a data mining component responsible for detecting erroneous behaviour, such as the responsiveness of deployed components and the statuses of edge devices. Hence, their system can adapt in case of failures.
Santos et al. [70] focused on the optimal application placement problem for smart city applications, aiming to reduce network bandwidth usage and improve latency. Their proposal extends the ETSI NFV MANO architecture [71] with additional monitoring and data analysis functions. Their system follows a hybrid approach, where management and decision-making happen at the cloud layer through a cloud node (CN) and at the local layer through fog nodes (FNs). The CN is responsible for the global view of the system, including operations like coordination and control of FNs, global-level data analysis, and monitoring of the overall SLA. Each FN, on the other hand, has its own orchestrator and is responsible for autonomously managing its own local infrastructure, associated devices, and the life-cycle of microservices, as well as interfacing with the modules for resource discovery, system monitoring, data analysis, security, machine-to-machine communication, and decision-making related to the application life cycle and related policies. However, no details are provided on these policies, their structure, or how they are passed to the system. This solution provides both GUI and API access so that application owners can manage and control FNs (and the CN) independently and perform manual updates if required. Lastly, a fog protocol based on the existing Open Shortest Path First (OSPF) routing protocol [72] has been proposed to enable communication between the fog and edge layers. Details on edge device management, application description, and run-time reconfiguration are missing.
The Foggy framework [73], similarly to Santos et al. [70], aimed to minimise latency and perform optimal application placement. Foggy follows a centralised model. It consists of an orchestration server (OS), a central entity responsible for deployment and resource management decisions, and an orchestration client (OC), which runs on each computational resource and is responsible for enforcing deployment decisions. Overall, Foggy offers the following unique characteristics in contrast to the other solutions discussed in this category: 1) a direct integration of a version control system (such as GitHub) and a continuous integration process into the system architecture, facilitating automated builds; 2) a pluggable policy-driven deployment planner that dynamically identifies suitable resources based on user-provided requirements; and 3) a JSON-based container specification that lets application owners express service requirements using qualitative constructs such as Low, Medium, and High. However, it is not clear how these qualitative specifications for different aspects, such as computation and latency, are mapped within the system. Similarly to the others, Foggy also does not cover edge device registration, standardised application description, or run-time reconfiguration.
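As an illustration of the kind of mapping that Foggy leaves unspecified, the sketch below translates qualitative constructs into concrete resource requests. All aspect names and threshold values are assumptions for illustration only, not Foggy's actual semantics.

```python
# Hypothetical mapping of Foggy-style qualitative requirements
# (Low/Medium/High) to concrete resource requests. The numeric values
# are illustrative assumptions; the paper does not specify them.

QUALITATIVE_MAP = {
    "computation": {"Low": 0.5, "Medium": 1.0, "High": 2.0},   # CPU cores
    "memory":      {"Low": 256, "Medium": 512, "High": 1024},  # MiB
    "latency":     {"Low": 500, "Medium": 100, "High": 20},    # max ms to user
}

def resolve_requirements(spec: dict) -> dict:
    """Translate a qualitative container spec into numeric constraints."""
    resolved = {}
    for aspect, level in spec.items():
        try:
            resolved[aspect] = QUALITATIVE_MAP[aspect][level]
        except KeyError:
            raise ValueError(f"unknown aspect/level: {aspect}={level}")
    return resolved

# A JSON-like container spec as an application owner might provide it.
spec = {"computation": "High", "latency": "High"}
```

A deployment planner could then match `resolve_requirements(spec)` against the advertised capacities of candidate nodes.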
The solution of Castellano et al. [74] follows a distributed approach, where a dedicated instance of a service-defined orchestrator (SDO) is started every time a new application is deployed. The input to the system is an application deployment request that mainly consists of a list of components, their topology, and a set of declarative statements forming the Orchestration Behaviour Model (OBM) that drives the orchestration functions. The OBM captures aspects such as infrastructure and/or application state, the objectives to be optimised, and the events and corresponding actions to be performed. Using the OBM, every SDO instance aims to make optimal decisions with respect to the application it manages. However, this also raises a resource allocation issue across the different SDO instances at the shared infrastructure level when resources are limited. To cope with this, Castellano et al. [74] introduced Dragon, an additional component responsible for the optimal partitioning of the underlying shared resources across the different SDOs. Using Dragon, an SDO can decide to terminate an application component if it cannot allocate the required resources to that component. Their declarative-statement-based application description approach, however, is specific to this solution and does not follow a standardised approach.
HYDRA [75], similar to Castellano et al. [74], also follows a decentralised architecture, where a set of distributed nodes, without any centralised entity, is responsible for performing the orchestration functions of one application. HYDRA builds a peer-to-peer (P2P) overlay network of computational nodes, where every node serves both as an orchestrator and as a computational resource responsible for running the application microservices. HYDRA supports both location-agnostic and location-aware application deployment, with a primary focus on the overall scalability and resilience of the underlying resource infrastructure through its decentralised architecture. This has been achieved through a dynamic partitioning scheme, where orchestrator nodes operate independently on a per-application basis.
Caravela [76] follows a similar decentralised model, where all key aspects, such as the overall architecture, resource discovery, and scheduling, are also based on the concept of a P2P overlay network. However, unlike HYDRA, it follows a market-oriented approach, where volunteer resources can join the ecosystem and be rewarded for their services. Caravela dynamically builds an edge cloud from the volunteer resources, which is then used to deploy applications using Docker containers. Caravela's scope, however, is limited to the non-cloud layers and does not include resource provisioning from the cloud.
The solution of Mathias et al. [77] consists of a Fog Orchestrator (FO), a central entity responsible for maintaining a resource catalogue of fog nodes, overall service management, global-level monitoring, and orchestration, and an agent component called the Fog Orchestrator Agent (FOA), which runs on every fog node and is responsible for activities such as the management of connected edge devices, security, and monitoring. The FO composes a TOSCA-based orchestration template using information obtained from the resource catalogue and monitoring components. This template is then used for deployment and run-time management. Such usage of TOSCA for expressing orchestration strategies is common and has been used by many solutions such as [
11,
22,
78,
79]. However, in this case, the TOSCA template is dynamically generated by the system, and it is therefore not clear what the initial input to the system is. A unique aspect of this solution, in contrast to the others discussed in this category, is that the FOA can also act as the FO if the connection between them is lost. However, this behaviour is static, and the specific fog node has to declare it at the time of joining. Furthermore, the scope of the overall solution only includes the non-cloud layers.
Hetero-Edge [80] follows a concept similar to that of Mathias et al. [77], where a central entity handles the orchestration functions at the non-cloud (edge) layer only. The solution, however, is specific to computer vision applications and relies on Apache Storm (or something similar, such as Apache Flink). Hetero-Edge breaks an application down into smaller Apache Storm tasks and then efficiently maps them onto the connected edge nodes with the objective of minimising the overall end-to-end latency. The tasks are specified through a directed acyclic graph, and the mapping is performed by a custom-developed task scheduler that takes into account the estimated performance and resource demands of the tasks. The solutions proposed by Donassolo et al. [81] also follow a similar model, supporting orchestration at the non-cloud layers, but with a particular focus on optimising the provisioning cost of IoT applications.
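A minimal sketch of the general idea behind such DAG-based task mapping (not Hetero-Edge's actual scheduler, whose cost model is more elaborate): tasks are processed in topological order, and each is greedily assigned to the node with the lowest estimated completion time. Task costs and node speeds are hypothetical.

```python
# Greedy mapping of DAG tasks onto edge nodes, illustrating the idea of
# latency-aware placement. This is a simplified assumption-based sketch,
# not the scheduler from the Hetero-Edge paper.
from collections import deque

def topological_order(dag):
    """Kahn's algorithm over a {task: [successors]} adjacency dict."""
    indeg = {t: 0 for t in dag}
    for succs in dag.values():
        for s in succs:
            indeg[s] += 1
    queue = deque(t for t, d in indeg.items() if d == 0)
    order = []
    while queue:
        t = queue.popleft()
        order.append(t)
        for s in dag[t]:
            indeg[s] -= 1
            if indeg[s] == 0:
                queue.append(s)
    return order

def greedy_placement(dag, task_cost, node_speed):
    """Assign each task to the node that would finish it earliest."""
    load = {n: 0.0 for n in node_speed}   # accumulated busy time per node
    placement = {}
    for task in topological_order(dag):
        best = min(node_speed,
                   key=lambda n: load[n] + task_cost[task] / node_speed[n])
        load[best] += task_cost[task] / node_speed[best]
        placement[task] = best
    return placement
```

A real scheduler would additionally weigh inter-node transfer latency along the DAG edges, which this sketch omits.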
Some other more recent notable contributions include GeneSIS [
82], which proposed a model-driven approach to automate the deployment of different kinds of deployable artefacts including binary, ThingML-based [
83], and container; ECCO [
84], which proposed an orchestration framework for enabling the collective use of edge-cloud resources for road context assessment; KubeHICE [
85], which took on the challenge of addressing hardware heterogeneity by automatically matching the right computational device that is compatible with the instruction set architecture (ISA) supported by the containerized application; and Gand et al. [
86] and Sonmez et al. [
87], which focused on the presence and importance of uncertainty in the cloud-to-edge environment and therefore adopted a fuzzy logic-based approach for workload deployment.
In addition to the above-mentioned solutions, there are also research works that did not directly cover the core orchestration functions but emphasised the importance of other related aspects. For example, the authors in [
88,
89] introduced the notion of trusted orchestration, where the proposed approach aimed at identifying and tracking orchestration activities to improve trust across the involved actors of the system. Similarly, Kochovski et al. [
90] proposed a smart contract (SC) based architecture for SLA management and verification amongst the relevant entities and actors of a decentralised environment. More recent works on advanced DRL-based techniques for dynamic load balancing [
91] and network dynamic clustering [
92] in edge computing focused on the overall optimisation of the cloud-to-edge system. Such solutions can be integrated into distributed orchestration solutions to support self-organisation and optimisation behaviours. Lastly, with the growing popularity of Deep Learning (DL) applications, there is also increasing interest in resource management solutions specifically tailored to DL applications. For example, FlowCon [
93] monitors the execution of DL jobs at run-time to make informed resource allocation and placement decisions. Similarly, SpeCon [
94] is a container scheduler that aims to optimise resource usage and improves the performance of DL training jobs, whereas DQoES [
95] aims at dynamically adjusting cloud resources to meet the target Quality of Experience (QoE) specified by the clients. The scope of all the aforementioned DL-tailored solutions, however, only includes the cloud layer. The details of these papers do not directly fall within the scope of the proposed taxonomy and are therefore not discussed here in more detail.
Table 1
Comparative summary of the concept-only orchestration solutions
Cloud resource handling | Environment | Single cloud | \(\checkmark\) | | \(\checkmark\) | | \(\checkmark\) | \(\checkmark\) | | | | |
Multi-cloud | | \(\checkmark\) | | \(\checkmark\) | | | \(\checkmark\) | | | |
Cross-cloud | | | | | | | | | | |
Resource types | Compute | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | | |
Storage | | \(\checkmark\) | | | | \(\checkmark\) | | \(\checkmark\) | | |
Network | | \(\checkmark\) | | | | \(\checkmark\) | | \(\checkmark\) | | |
Resource selection | Statically defined | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | | \(\checkmark\) | | | | | |
Automatic selection | | | | | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | | |
Run-time optimised | | | | | \(\checkmark\) | \(\checkmark\) | | | | |
Fog/Edge resource handling | Connectivity | Manual registration | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | | | \(\checkmark\) |
Automatic registration | | | | \(\checkmark\) | | | | \(\checkmark\) | \(\checkmark\) | |
Others | Heterogeneity | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
Auto reconnectivity | | | | | | | | \(\checkmark\) | | |
Resource discovery | | | | \(\checkmark\) | \(\checkmark\) | | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | |
Orchestration functionalities | Service/Job Handling | Virt support | VM | \(\checkmark\) | \(\checkmark\) | | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | | | | |
Containerisation | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
Mapping | Static | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | | | \(\checkmark\) | \(\checkmark\) | |
Context aware | | | | | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | | | \(\checkmark\) |
Run-time reconfiguration | Definition type | Statically pre-defined | \(\checkmark\) | | | | | | | | \(\checkmark\) | |
User-defined dynamic | | | | | | \(\checkmark\) | | | | |
Operating type | Reactive | \(\checkmark\) | | | | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | | | |
Proactive | | | | \(\checkmark\) | | | | | | |
Hybrid | | | | | | | | | | |
Scaling | Horizontal | | | | | | | \(\checkmark\) | | | |
Vertical | \(\checkmark\) | | | | | | | | | |
Hybrid | | | | | | \(\checkmark\) | | | \(\checkmark\) | |
Offloading | Cloud-to-Edge | \(\checkmark\) | | \(\checkmark\) | \(\checkmark\) | | | | | | |
Edge-to-Cloud | \(\checkmark\) | | \(\checkmark\) | | | | | | | |
Edge-to-Edge | | | | \(\checkmark\) | | | | | | |
Monitoring | Support level | Cloud | | | \(\checkmark\) | | | | | | | |
Edge | \(\checkmark\) | | | | | | | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
Cloud-to-Edge | | | \(\checkmark\) | | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | | | |
Metrics support | System | | | | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
Application | \(\checkmark\) | | \(\checkmark\) | | \(\checkmark\) | \(\checkmark\) | | | | \(\checkmark\) |
Custom | | | | | | | | | | |
Security handling | Configurable app level | \(\checkmark\) | | | | \(\checkmark\) | | | | | |
Sys wide inter-comp | | | | \(\checkmark\) | | | | | | |
Edge authentication | | | | | | | | | \(\checkmark\) | |
Access control | | | | | | | \(\checkmark\) | | | |
Others | Fault diagnosis | | | \(\checkmark\) | \(\checkmark\) | | | | | | |
SLA Handling | \(\sim\) | | | | | \(\sim\) | | | | |
Design | Architecture | Centralised | | | \(\checkmark\) | | \(\checkmark\) | | | | \(\checkmark\) | \(\checkmark\) |
Decentralised | \(\checkmark\) | | | | | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | | |
Hybrid | | \(\checkmark\) | | \(\checkmark\) | | | | | | |
App description | Solution independent | | | | | | | | | \(\checkmark\) | \(\checkmark\) |
Solution specific | | | | | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | | |
Extensibility | Resources | | | | | | | | | | |
Functionalities | | | | | \(\checkmark\) | | \(\checkmark\) | \(\checkmark\) | | |
User interface | GUI (Web/Desktop) | | | | \(\checkmark\) | | | | | | |
CLI | \(\checkmark\) | \(\checkmark\) | | | \(\checkmark\) | \(\checkmark\) | | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
API (Service/Library) | | | | \(\checkmark\) | | | | \(\checkmark\) | | |
Supported App types | G | G | PS | SC | G | DS | G | G | G | CV |
Production-ready solutions
SODALITE@RT [
79] supports the deployment and management of applications across a cloud-to-edge infrastructure in a portable manner. The term “portable” is based on their use of TOSCA as the deployment model to represent application components and resources; and the use of the Infrastructure as Code (IaC) concept [
96] to implement the life-cycle operations of components, for which they utilised Ansible [
97]. SODALITE@RT follows a centralised model, where a central component called a meta-orchestrator receives TOSCA-based deployment models and Ansible implementation scripts to set up the resources and perform the deployment. The Ansible scripts are cloud-provider-specific and are pulled by the orchestrator from an IaC repository. Such an approach enables custom implementations but also burdens application developers with the production of Ansible scripts, in comparison with other TOSCA-based solutions such as [
22,
78], where the TOSCA model is the only input. The Ansible scripts take care of the handling of cloud resources, whereas the edge resources are handled as part of a Kubernetes cluster. However, details on edge cluster formation are not provided, and it is therefore not clear whether the meta-orchestrator creates the edge cluster or whether it must exist prior to the deployment process. SODALITE@RT also provides an event-condition-action-based policy language to support custom redeployment policies. Furthermore, it supports access control and mechanisms for the secure storage of application secrets. However, no mechanism for application-level security configurations is provided.
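The general shape of an event-condition-action redeployment policy can be sketched as below. The rule structure, trigger names, and payload fields are assumptions for illustration, not SODALITE@RT's actual policy language.

```python
# Minimal event-condition-action (ECA) rule engine in the spirit of
# ECA-based redeployment policies. Names and payload fields are
# illustrative assumptions.
from dataclasses import dataclass
from typing import Callable

@dataclass
class EcaRule:
    event: str                          # e.g. "metric_update"
    condition: Callable[[dict], bool]   # predicate over the event payload
    action: Callable[[dict], str]       # returns a description of the action

def evaluate(rules, event, payload):
    """Fire every rule whose event matches and whose condition holds."""
    fired = []
    for rule in rules:
        if rule.event == event and rule.condition(payload):
            fired.append(rule.action(payload))
    return fired

# Example rule: redeploy a component to the cloud when edge CPU load is high.
rules = [EcaRule(
    event="metric_update",
    condition=lambda p: p.get("cpu_load", 0) > 0.9,
    action=lambda p: f"redeploy {p['component']} to cloud",
)]
```

In a real orchestrator, the returned action descriptions would instead trigger redeployment operations against the running infrastructure.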
Capillary [98] focused on the use of a custom-built monitoring system to measure QoS parameters, and on offloading across the different resource layers based on various user-defined characteristics, including geographic positioning. The offloading decisions follow an "offload to the next immediate layer" model (e.g., edge to fog, or fog to cloud) that resembles capillary fluid movement, hence the name Capillary. It follows a centralised approach, where a central entity called the Capillary container orchestrator performs the deployment and offloading operations. The input to the system is a TOSCA deployment model that includes various details, such as resource capacity requirements for services, zone details, and constraints on QoS thresholds that are used for reconfiguration purposes. At run-time, the monitoring system raises alarms based on the developer-provided thresholds. As a result, a sub-component of the orchestrator, similar to SODALITE@RT [
79], takes an offloading decision, changes the TOSCA model and triggers re-deployment. For resource handling, the cloud resources are dynamically provisioned by the orchestrator based on the user-provided minimum requirements for the service. However, no details on the provisioning of the fog and edge infrastructure are provided.
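The "offload to the next immediate layer" behaviour can be sketched as follows; the layer names and alarm handling are simplified assumptions.

```python
# Sketch of Capillary's layer-escalation idea: on a QoS threshold
# violation, a service moves one layer up the hierarchy
# (edge -> fog -> cloud). The data shapes are illustrative assumptions.
from typing import Optional

LAYERS = ["edge", "fog", "cloud"]

def next_layer(current: str) -> Optional[str]:
    """Return the next immediate layer, or None if already at the cloud."""
    idx = LAYERS.index(current)
    return LAYERS[idx + 1] if idx + 1 < len(LAYERS) else None

def handle_alarm(placement: dict, service: str) -> dict:
    """On a QoS alarm for a service, offload it one layer up (if possible)."""
    target = next_layer(placement[service])
    if target is not None:
        placement = {**placement, service: target}
    return placement
```

In the actual system, this step would also rewrite the TOSCA model and trigger a re-deployment rather than just updating a mapping.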
MiCADO-Edge [
22] is also a centralised solution, where a central entity called the MiCADO-Master is responsible for the automated deployment and management of a microservices-based application across the cloud-to-edge continuum using a single TOSCA-based deployment model. This model contains details of the computational resources, component specifications, application topology, service placement mapping, user-defined scaling policies, and any application-specific security settings. The key focus of MiCADO-Edge is on generalising the resources across the different layers of the continuum by providing a mechanism that allows resources from the fog and edge layers (referred to as non-cloud resources) to join a centralised cluster prior to the application deployment process. Once they become part of the MiCADO cluster, developers can reference them in the TOSCA-based deployment model to define placement and reconfiguration policies. Furthermore, MiCADO-Edge empowers application developers to write custom dynamic scaling policies based on a wide range of application and system metrics. MiCADO-Edge, however, currently lacks support for context-based placement of application services, and developers are required to provide a static mapping between services and resources.
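To illustrate the single-deployment-model idea, the sketch below shows a simplified model shape and a check that every placement target references a pre-registered cluster node. The field names are simplified assumptions, not MiCADO's actual TOSCA node types.

```python
# Simplified shape of a deployment model in the spirit of MiCADO-Edge's
# TOSCA input: components, a static placement mapping onto pre-registered
# nodes, and a scaling policy. Field names are illustrative assumptions.

deployment_model = {
    "components": {
        "frontend":  {"image": "shop/frontend:1.2"},
        "inference": {"image": "shop/inference:0.9"},
    },
    "placement": {"frontend": "cloud-vm", "inference": "edge-node-3"},
    "scaling": {"frontend": {"metric": "cpu", "max_replicas": 5}},
}

def validate_placement(model: dict, registered_nodes: set) -> list:
    """Non-cloud resources must join the cluster before deployment; return
    the components whose placement target is not a registered node."""
    return [comp for comp, node in model["placement"].items()
            if node not in registered_nodes]
```

A check like this reflects the static service-to-resource mapping noted above: the model can only reference nodes that already joined the cluster.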
PrEstoCloud [
99,
100] follows a similar model of a TOSCA-based orchestration solution. However, it adds an optimisation step before deployment. This step receives a high-level TOSCA description (a type-level TOSCA model, as they refer to it), which also contains optimisation criteria independent of the underlying infrastructure resources. Based on the provided criteria, the solution automatically produces a more specific instance-level TOSCA deployment model containing the specific resources across the infrastructure that are to be used for application deployment. Hence, it provides an optimised placement mechanism. Furthermore, PrEstoCloud also focused on facilitating predictive reconfiguration for data-intensive applications based on changing data-stream conditions.
mF2C [
101] adopted an N-layered approach to utilise the available resources in the continuum from edge (Layer-N) to the cloud (Layer-0), in contrast to the two-layered (i.e., cloud and non-cloud) approach followed by MiCADO-Edge [
22] and the typical three-layered approach as followed in [
98]. Their proposed solution is decentralised: deployed mF2C agents coordinate with each other to find suitable resources, closer to the edge, for the execution of application services. The input to the system, i.e., a service execution request, is received by an mF2C agent at the lower layer. The receiving agent, in coordination with other agents at the same layer, aims to execute the service request if the required resource specification can be fulfilled. Otherwise, the request is passed on to the mF2C agents at the layer above. The service execution request is in JSON format and includes the required resource specification used by the mF2C agents to make deployment decisions. In terms of resource handling, the mF2C architecture supports the automatic discovery of other mF2C agents, the dynamic formation of clusters, and reconfiguration in case of device mobility. However, it does not address aspects like scaling, offloading, dynamic provisioning of resources, and configurable policies.
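The layer-by-layer request handling can be sketched as follows; the edge-first layer ordering and the JSON field names are illustrative assumptions.

```python
# Sketch of mF2C's layered request handling: try to satisfy a
# JSON-style service execution request at the lowest (edge-most) layer
# first, escalating towards the cloud layer otherwise. Field names are
# illustrative assumptions.
from typing import Optional, Tuple

def place_request(request: dict, layers: list) -> Tuple[Optional[int], Optional[str]]:
    """layers is ordered edge-first (Layer-N ... Layer-0); each layer is a
    list of nodes with free resources. Returns (layer_index, node_id), or
    (None, None) if no layer can satisfy the request."""
    for i, nodes in enumerate(layers):
        for node in nodes:
            if node["cpu"] >= request["cpu"] and node["mem"] >= request["mem"]:
                return i, node["id"]
    return None, None
```

In mF2C itself, this escalation happens through coordination among distributed agents rather than a single function walking a list, but the placement preference (as close to the edge as the resource specification allows) is the same.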
DECENTER [
102] was specifically developed to transform construction sites into smart and safe environments. Hence, this solution provides methods that are specifically tailored to the problems of construction processes. The unique feature of DECENTER, amongst the other solutions in this category, is its Blockchain-based resource brokerage mechanism, which facilitates the trusted brokerage and negotiation of computational resources that can be used for deployment. Furthermore, all transactions of the system are traceable and can be formally verified, thereby improving the trust and transparency of the overall system. DECENTER follows a centralised architecture, where four key components, the Application composer, the QoS-aware decision maker, the Monitoring system, and the Orchestrator, are responsible for performing the key orchestration functions. It also provides users with a GUI to select the services they want to use and to define their QoS objectives. These details, along with the monitoring data, are used by the QoS-aware decision maker to make deployment decisions that are forwarded to the orchestrator. DECENTER also supports automatic redeployment when the system encounters a violation of the QoS specifications. However, it lacks functions like dynamic auto-scaling and offloading.
Pledger [
103] also makes use of Blockchain to improve trust, secure communication, and enable ad-hoc networks in which the resources of the non-cloud layers collaborate with each other on the execution of a specific application. Although Pledger's overall architecture follows a centralised model, its implementation does not comply with a traditional adapter-based interaction model between the different parts of the system. Pledger provides tool-kits for resource providers to integrate their resources into the Pledger ecosystem, and for application owners to map their applications onto specific resources; this mapping is further assessed and reconfigured by the core Pledger service to ensure optimised use.
Rainbow [
104] particularly focused on the lack of handling of fog-specific constraints on deployed services. For this purpose, the proposed solution provides a high-level abstraction mechanism, where the application topology and the related constraints on services are described through a graph. The Rainbow orchestration system accepts this graph as input and deals with the optimised placement of the services and their subsequent execution. The orchestration system follows a decentralised model, where different components of the system may run on the different computational nodes that are part of the Rainbow ecosystem. To address the various challenges of the fog environment (such as low-powered devices, intermittent connectivity, and the interactions of sub-components), the system follows a publish-subscribe mechanism, where a component called the Orchestrator Repository maintains the states of the system and its sub-components. The Rainbow platform facilitates the dynamic registration of edge devices and their reconfiguration upon violations of the defined service level objectives (SLOs). However, its scope is limited to fog/edge resources and lacks dynamic provisioning of cloud resources.
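The graph-based input can be sketched as below; the constraint names and node attributes are illustrative assumptions rather than Rainbow's actual model.

```python
# Illustrative sketch of a Rainbow-style graph input: services as nodes
# with per-service constraints, interactions as edges. Attribute names
# are assumptions for illustration.

app_graph = {
    "services": {
        "sensor-reader": {"constraints": {"requires_gpu": False, "max_latency_ms": 50}},
        "analytics":     {"constraints": {"requires_gpu": True,  "max_latency_ms": 200}},
    },
    "edges": [("sensor-reader", "analytics")],   # data flows reader -> analytics
}

def feasible_nodes(service: str, graph: dict, fog_nodes: list) -> list:
    """Return the fog nodes satisfying a service's constraints (SLO-style check)."""
    c = graph["services"][service]["constraints"]
    return [n["id"] for n in fog_nodes
            if (not c["requires_gpu"] or n["gpu"])
            and n["latency_ms"] <= c["max_latency_ms"]]
```

A placement step would then pick one node per service from these feasible sets, also taking the graph's edges into account to co-locate communicating services.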
Table 2
Comparative summary of research projects based orchestration solutions
Cloud resource handling | Environment | Single cloud | | \(\checkmark\) | | | | | | | \(\checkmark\) |
Multi-cloud | | | \(\checkmark\) | \(\checkmark\) | | | | \(\checkmark\) | |
Cross-cloud | \(\checkmark\) | | | | \(\checkmark\) | \(\checkmark\) | | | |
Resource types | Compute | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
Storage | | | | | | | | | \(\checkmark\) |
Network | | | | | | | | | |
Resource selection | Statically defined | \(\checkmark\) | | | \(\checkmark\) | | | | \(\checkmark\) | \(\checkmark\) |
Table 1. Summary of the reviewed concept-only solutions in light of the taxonomy attributes, covering fog/edge resource handling (manual/automatic registration, heterogeneity, automatic reconnectivity, resource discovery), orchestration functionalities (virtualisation support via VMs or containerisation, static/context-aware mapping, statically pre-defined or user-defined run-time reconfiguration with reactive/proactive/hybrid operating types, horizontal/vertical/hybrid scaling, cloud-to-edge/edge-to-cloud/edge-to-edge offloading, monitoring support levels and metrics, security handling, fault diagnosis, SLA handling), design aspects (centralised/decentralised/hybrid architecture, application description, extensibility, user interface), and the supported application types.
Stack4Things [
105] is an open-source initiative developed by the Mobile and Distributed Systems Lab (MDSLab) at the University of Messina, Italy. The project aims to provide an OpenStack-based IoT framework for managing IoT devices seamlessly, i.e., regardless of their physical location, network configuration, and underlying technology. The tools from this project were further extended by Merlino et al. [
106] to build a distributed orchestration solution based on a three-layer architecture covering cloud, fog, and edge, and supporting both horizontal and vertical task offloading. With horizontal offloading, tasks can be migrated within the same layer, e.g., from one edge device to another; with vertical offloading, tasks can move across layers, e.g., from edge to fog, or from fog to cloud. Unlike other systems, this solution is based on independent managers deployed at each layer of the architecture; hence, applications can be deployed directly, in part or as a whole, to any layer through the provided managers.
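The horizontal/vertical offloading distinction can be sketched as follows. This is a minimal illustration, not Stack4Things code: the node model, capacity check, and the preference for horizontal over vertical offloading are all illustrative assumptions.

```python
# Illustrative sketch of horizontal vs. vertical task offloading across a
# three-layer cloud-fog-edge architecture. All names and the decision policy
# are hypothetical, not part of the reviewed implementation.
from dataclasses import dataclass

LAYERS = ["edge", "fog", "cloud"]  # ordered bottom-up


@dataclass
class Node:
    name: str
    layer: str       # "edge", "fog", or "cloud"
    free_cpu: float  # fraction of CPU currently free


def offload(task_cpu: float, source: Node, nodes: list) -> tuple:
    """Pick a target for a task that no longer fits on `source`.

    Horizontal offloading: a peer node in the same layer with capacity.
    Vertical offloading: the nearest layer above with capacity.
    """
    # Prefer horizontal offloading: a peer in the same layer with room.
    for n in nodes:
        if n is not source and n.layer == source.layer and n.free_cpu >= task_cpu:
            return "horizontal", n
    # Otherwise offload vertically, climbing toward the cloud.
    for layer in LAYERS[LAYERS.index(source.layer) + 1:]:
        for n in nodes:
            if n.layer == layer and n.free_cpu >= task_cpu:
                return "vertical", n
    raise RuntimeError("no capacity anywhere in the continuum")


edge_a = Node("edge-a", "edge", 0.05)
edge_b = Node("edge-b", "edge", 0.40)
fog_1 = Node("fog-1", "fog", 0.80)

kind, target = offload(0.30, edge_a, [edge_a, edge_b, fog_1])
print(kind, target.name)  # horizontal edge-b
```

A task needing 30% of a CPU finds room on a peer edge node (horizontal); a larger task that exceeds every edge node's headroom would instead be pushed up to the fog layer (vertical).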
Beyond the projects introduced above, there are several relevant EU research initiatives that have only recently started; at the time of review, we were not able to find any reported results from them. Therefore, we only briefly review them below for the purpose of completeness. The
European Cloud, Edge and IoT (CEI) Continuum [
107] is an umbrella initiative that provides strategic guidance for the next stage of technology development towards an active and dynamic European CEI ecosystem, with an emphasis on promoting the establishment of a global and open ecosystem for Cloud-Edge-IoT technologies. The initiative coordinates across clusters of Research and Innovation Actions to support industries and researchers in creating impact, promoting the link between open source and open standards, and engaging relevant industrial alliances in actions directed toward open approaches. Among such clusters, the Meta-Operating Systems for the Next Generation IoT and Edge Computing (MetaOS) cluster [
108] is relevant for this review paper; it includes projects such as AerOS, FluiDOS, ICOS, NebulOus, NEMO and NEPHELE. Likewise, the AI-enabled Computing Continuum from Cloud to Edge (CognitiveCloud) cluster [
109] is also related to our work; it includes projects such as AC3, ACES, CloudSkin, CODECO, COGNIFOG, DECICE, EDGELESS, MLSysOps and SovereignEdge.Cognit. More details of these projects are available on the CEI website.
HPE GreenLake [
110] cloud-to-edge is an infrastructure-as-a-service solution that brings the public cloud model to multiple IT environments, such as private clouds, multi-clouds and on-premises systems, delivering an agile “cloud everywhere” experience to users. HPE GreenLake allows for the integration, management and monitoring of all the above resources through a centralised interface. Users can access different types of deployable resources and services, e.g., bare metal, compute, storage, containers and data protection services, as well as HPC, AI/ML and virtual desktop infrastructures.
Intel Smart Edge Open [
111] is an edge computing software toolkit for building platforms optimised for the edge. It bundles functionality selected from across the cloud-native landscape, extended and optimised for use at the edge. The solution works with heterogeneous hardware resources from the on-premises edge to regional data centres. These are managed using a set of “experience kits”, provided by Intel and built on top of Kubernetes, that combine 5G capabilities and cloud-native components to simplify the deployment of complex network architectures, significantly reducing development time and cost. For instance, the Developer Experience Kit provides the base capabilities to run containerised edge services, including networking, security, and telemetry. An experience kit consists of building blocks that can be chosen according to the customer’s needs: the Resource Management block provides identification, configuration, allocation, and continuous monitoring of the hardware and software resources on the edge cluster, while the Telemetry and Monitoring block combines application telemetry, hardware telemetry, and events to create a heat-map across the edge cluster, enabling the orchestrator to make scheduling decisions.
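The heat-map-driven scheduling idea can be illustrated with a minimal sketch. The metric names, weights, and scoring function below are assumptions for illustration only, not Intel Smart Edge Open interfaces.

```python
# Hypothetical sketch: fuse per-node telemetry into a single "heat" score,
# then place new workloads on the coolest node. Metrics are assumed to be
# normalised to [0, 1]; weights are arbitrary illustrative choices.
def heat(node: dict, weights=(0.5, 0.3, 0.2)) -> float:
    """Weighted heat score from normalised CPU, memory, and network load."""
    w_cpu, w_mem, w_net = weights
    return w_cpu * node["cpu"] + w_mem * node["mem"] + w_net * node["net"]


def pick_node(cluster: dict) -> str:
    """Scheduling decision: choose the node with the lowest heat."""
    return min(cluster, key=lambda name: heat(cluster[name]))


cluster = {
    "node-1": {"cpu": 0.90, "mem": 0.70, "net": 0.40},
    "node-2": {"cpu": 0.30, "mem": 0.50, "net": 0.20},
    "node-3": {"cpu": 0.60, "mem": 0.20, "net": 0.80},
}
print(pick_node(cluster))  # node-2
```

In a real platform the inputs would come from hardware and application telemetry collectors rather than a static dictionary, but the decision shape is the same: a cluster-wide view of load informs where the orchestrator schedules work.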
AMCOP [
112]—Aarna Networks Multi Cluster Orchestration Platform—is an open-source platform for the orchestration, life-cycle management, and closed-loop automation of cloud-native network services and edge computing applications. AMCOP aims to solve the problem of managing the growing number of edge applications and edge sites by offering intent-based orchestration of network services and composite edge computing applications, which comprise cloud-native network functions and cloud-native applications. It also supports service assurance for edge and 5G services through real-time, policy-driven closed-loop automation. AMCOP works by interfacing northbound with the collection of systems/applications that a network service provider already uses to operate its business (OSS/BSS), and by orchestrating southbound the infrastructure and network services/applications across multiple heterogeneous Kubernetes clusters.
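A policy-driven closed loop of the kind described above (observe metrics, evaluate policy, emit remediation actions) can be sketched as follows. The policy format, metric names, and actions are hypothetical, invented for illustration; they are not AMCOP's actual interfaces.

```python
# Hypothetical sketch of one iteration of a policy-driven closed loop:
# observed metrics are compared against policy thresholds, and each
# violation yields a remediation action for the orchestrator to execute.
def closed_loop_step(metrics: dict, policy: dict) -> list:
    """Return the remediation actions triggered by the current metrics."""
    actions = []
    for metric, value in metrics.items():
        rule = policy.get(metric)
        if rule is not None and value > rule["max"]:
            actions.append(rule["action"])
    return actions


# Illustrative policy for a 5G edge service.
policy = {
    "latency_ms": {"max": 50, "action": "scale-out network function"},
    "error_rate": {"max": 0.01, "action": "restart failing pod"},
}

print(closed_loop_step({"latency_ms": 72, "error_rate": 0.002}, policy))
# ['scale-out network function']
```

In a full system this step would run continuously against live service-assurance telemetry, with the emitted actions fed back into the orchestrator, which is what makes the loop "closed".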
Ormuco [
113] is a solution that aims to lead the deployment and usage of edge computing as an effective approach to data processing. The platform was developed to respond to the needs of modern businesses that require an infrastructure-as-a-service set up via a decentralised approach in order to increase their revenue, reduce both operations and maintenance costs, and automate the deployment of systems and applications on demand. The platform’s Cerebro virtual sysadmin collects logs from heterogeneous computing nodes and applications; these are used to learn the expected behaviour of the deployed software and to notify application owners of any detected potential anomalies.
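The learn-expected-behaviour-then-notify idea can be illustrated with a deliberately simple sketch: a z-score test on a learned baseline stands in here for whatever model Cerebro actually uses, and all names and thresholds are illustrative.

```python
# Hypothetical sketch of log/metric-based anomaly notification: learn the
# expected behaviour of a metric from past observations, then flag new
# values that deviate strongly from it. A z-score test is used purely as
# a stand-in for the platform's (unspecified) learning model.
import statistics


def is_anomaly(history: list, value: float, threshold: float = 3.0) -> bool:
    """Flag `value` if it lies more than `threshold` standard deviations
    from the mean of the learned history."""
    mean = statistics.fmean(history)
    stdev = statistics.stdev(history)
    return abs(value - mean) > threshold * stdev


baseline = [101, 99, 102, 98, 100, 103, 97, 100]  # e.g. requests/sec

print(is_anomaly(baseline, 100.5))  # False: within expected behaviour
print(is_anomaly(baseline, 250.0))  # True: notify the application owner
```

A production system would learn per-application baselines from streaming logs and use a more robust model, but the core contract is the same: a deviation from learned behaviour triggers a notification rather than silent failure.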
Azion [
114] is an end-to-end encrypted edge orchestration service with cloud management and zero-touch provisioning, created for large-scale edge networks. Users can manage and control resources across the edge in real time and orchestrate services more easily, according to specific service requirements. The orchestration relies on an agent, installed on the edge nodes, that provides encrypted remote node management to the Azion Control panel within the cloud-deployed Real-Time Manager. The Edge Node module enables devices to be created and managed, and implements the integration with the orchestrator. The Edge Services module enables customers to create their own services and allows them to be managed and orchestrated by the Real-Time Manager.
ONAP [
115] is a platform that provides a unified operating framework for vendor-agnostic, policy-driven service design and implementation, as well as analytics and lifecycle management for large-scale workloads and services. Network operators can use ONAP to orchestrate both physical and virtual network functions; hence, they can capitalise on their existing network infrastructure while being part of a vibrant VNF ecosystem that includes providers around the globe. The ONAP Operations Manager (OOM) module, based on Kubernetes, is responsible for orchestrating the end-to-end lifecycle management and monitoring of ONAP components, as well as enforcing scalability and resiliency mechanisms.
ZEDEDA [
116] is a cloud-based orchestration solution for the secure control of distributed edge computing deployments, which provides users with full-stack remote management of edge computing hardware and applications deployed on both cloud and on-premises systems. ZEDEDA leverages EVE-OS [
55], a secure, open, universal operating system developed with vendor-neutral and open-source governance as part of the Linux Foundation’s LF Edge organization. EVE-OS simplifies the deployment, orchestration and security of cloud-native and legacy applications on distributed edge compute nodes. It encrypts data, maintains device and software integrity, and supports VMs, containers and clusters (Docker and Kubernetes).