In order to successfully deploy any service or component, its runtime dependencies in terms of other services must be specified. For example, the DDA (Deterministic Data Analyzer) tool depends at runtime on C-SPARQL. In addition, all TCP or UDP sockets on which the services listen must be specified. Finally, wrapper scripts configure, through environment variables, the socket addresses on which each service is allowed to listen and the endpoints of the remote services it depends on.
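As an illustration, the following minimal sketch shows how such a wrapper might wire a service from environment variables; the variable names (DDA_LISTEN_ENDPOINT, CSPARQL_ENDPOINT) and defaults are hypothetical, not the actual MODAClouds conventions.

```python
# Hypothetical sketch: a wrapper reads its own listening socket and the
# endpoint of a service it depends on from environment variables.
# Variable names and defaults are illustrative assumptions.
import os

def resolve_configuration():
    # Socket on which this service is allowed to listen.
    listen = os.environ.get("DDA_LISTEN_ENDPOINT", "0.0.0.0:8080")
    # Remote endpoint of the C-SPARQL service this tool depends on.
    csparql = os.environ.get("CSPARQL_ENDPOINT", "127.0.0.1:8175")
    host, port = listen.rsplit(":", 1)
    return (host, int(port)), csparql

if __name__ == "__main__":
    (host, port), csparql = resolve_configuration()
    print(f"listening on {host}:{port}, using C-SPARQL at {csparql}")
```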
The next subsections detail the most important supporting services of MODAClouds, which are integral to the correct functioning of the MODAClouds solution.
8.3.1 Object Store
The classic approach to software configuration is through configuration files which reside on the local disk. However, such an approach is not well suited for a Cloud environment, where VMs are started from identical templates (the VM images), in most cases unattended, and thus the configuration files must be rewritten at startup. Luckily, there are existing solutions for such a scenario, such as Puppet or Chef. However, they also require a central database where the actual configuration parameters are stored. Moreover, some of the deployed services might also want to store small state data, either for later retrieval or for weak synchronization within a cluster. In this case the simplest solution is to use either some kind of database or a distributed file system. This is the rationale behind the development of the Object Store.
The Object Store provides an alternative to the more traditional locally stored configuration files. In the Object Store, an object is a keyed container which aggregates various attributes that refer to the same subject. For example, one could have an object that holds the configuration parameters of a given service (or class of services), or one that holds the end-point (and other protocol parameters) where a given service can be contacted.
The object’s attributes are: data, indices, links, annotations, and attachments.
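To make these attributes concrete before they are detailed in the following paragraphs, here is a hedged sketch of how such an object might be represented; the field names follow the attributes listed above, but the concrete wire format and all values are illustrative assumptions.

```python
# Hypothetical representation of an Object Store object; the layout is an
# assumption for illustration, not the documented wire format.
example_object = {
    "key": "service:dda:configuration",        # the object's key
    "data": {
        "content-type": "application/json",    # how to interpret the content
        "content": {"log-level": "info", "pool-size": 16},
    },
    "indices": [                               # meta-data used for selection
        {"label": "service-type", "value": "dda"},
        {"label": "environment", "value": "production"},
    ],
    "links": [                                 # references to other objects
        {"label": "defaults", "target": "service:global:configuration"},
    ],
    "annotations": {                           # ancillary, erasable meta-data
        "creator": "operator@example.com",
    },
    "attachments": [                           # references to uploaded blobs
        {"label": "license", "fingerprint": "sha1:...",
         "content-type": "text/plain", "size": 1204},
    ],
}
```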
Objects are grouped into collections. A collection serves no purpose other than to group similar objects together, either based on purpose or type, or based on scope (such as all objects belonging to the same service). Collections can be used without being created first, and there is no option to destroy them (except removing, one by one, all the objects that belong to them). Therefore, there are no other implications (in terms of performance or functionality) of placing an object in one collection or another, except perhaps easing operational procedures (such as removing all objects belonging to a service).
The most basic usage of an object is to store some useful information and have it available for later access. The stored data can be anything, from JSON or XML to a binary file, and besides the actual content it is characterized by a content-type. Later, based on this declared content-type, one can decide how to interpret the data. Although an object can hold only a single data item, one could easily use multipart/mixed to bundle together multiple data items; however, it is advisable to avoid such a scenario and use either links or attachments.
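The following sketch stores and retrieves an object's data over HTTP; the base URL and path layout are illustrative assumptions, not the documented Object Store interface.

```python
# Hypothetical sketch of storing and retrieving an object's data over HTTP;
# the URL layout is an assumption, not the documented Object Store API.
import json
import urllib.request

BASE = "http://store.example.com/v1"  # assumed base URL

def put_data(collection, key, payload, content_type="application/json"):
    body = json.dumps(payload).encode() \
        if content_type == "application/json" else payload
    req = urllib.request.Request(
        f"{BASE}/{collection}/{key}/data", data=body, method="PUT",
        headers={"Content-Type": content_type})
    return urllib.request.urlopen(req).status

def get_data(collection, key):
    with urllib.request.urlopen(f"{BASE}/{collection}/{key}/data") as resp:
        # The declared content-type tells the caller how to interpret the bytes.
        return resp.headers.get("Content-Type"), resp.read()
```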
Access to the data is atomic, and concurrent updates are permitted without any locking or conflict resolution mechanisms: the latest update overrides previous ones, so there is no isolation and lost updates are possible. Although the data can be frequently accessed or updated without high overhead, it is advisable to cache reads by using the dedicated HTTP conditional requests. Because the data is stored temporarily in memory, it is advised to keep the amount of data small, well under 100 kilobytes; larger data should be handled as an attachment.

In addition to its data, an object can be augmented with indices, which enable selecting objects efficiently on criteria other than the object key. An object can have multiple indices, each index being characterized by a label and a value, and it is allowed to have multiple indices with the same label.
The major difference between the indices provided by this solution and those of other NoSQL or even SQL databases is that most other solutions build their indices from the actual data. In the case of the Object Store, the indices are built from the meta-data associated with the actual data (the indices attribute). By separating the indexing from the actual data, we gain greater control over how the data is stored and retrieved, and we optimize for those access patterns where the data changes frequently but the values used by the indexer stay the same.
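A short sketch of selecting objects by index rather than by key follows; the "select" query syntax and endpoint are assumptions made for illustration.

```python
# Hypothetical sketch of index-based selection; the query syntax shown
# here is an assumption, not the documented Object Store API.
import json
import urllib.request

BASE = "http://store.example.com/v1"  # assumed base URL

def select_by_index(collection, label, value):
    # Indices are meta-data, so this query never touches the objects' data.
    url = f"{BASE}/{collection}/select?index={label}&value={value}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)  # e.g., a list of matching object keys

# Example: all production instances of the DDA service, regardless of key:
# select_by_index("services", "service-type", "dda")
```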
Links are the feature which allows an object to reference another one, building in essence a graph. For example, one could have a service configuration object, holding specific parameter values, which points to a global configuration object, holding default parameter values. A link is characterized by a label and the referenced object's key, and it is allowed to have multiple links with the same label or the same referenced object (therefore a many-to-many relation can be created). Unlike indices, links are scoped under the object, are unidirectional, and are not usable in selection criteria; therefore one cannot ascertain which objects reference a given target object (without performing a full scan of the store). The only operation, besides creation and destruction, that can be applied to a link is link-walking, whereby, starting from an object, one specifies a label and gains access to the referenced object's attributes; link-walking can be applied recursively. Links can be created or destroyed as frequently as necessary, as they are not indexed.
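The configuration fall-back example above can be expressed as recursive link-walking; in this sketch the get_object() helper and the link layout are assumptions.

```python
# Hypothetical sketch of recursive link-walking: resolve a configuration
# parameter on an object, falling back along its "defaults" links.
# The get_object() helper and the object layout are assumptions.
def resolve_parameter(get_object, key, parameter, link_label="defaults"):
    obj = get_object(key)                       # fetch the object's attributes
    if parameter in obj["data"]["content"]:
        return obj["data"]["content"][parameter]
    for link in obj["links"]:
        if link["label"] == link_label:         # walk the link and recurse
            return resolve_parameter(get_object, link["target"],
                                     parameter, link_label)
    raise KeyError(parameter)                   # not found anywhere in the chain
```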
Data that logically belongs to the object, but which is either too large to be used as actual data or is static, can be placed within an attachment. Attachments are created in two steps. First, the attachment itself is created by uploading its content and obtaining its fingerprint; if the same data is uploaded twice, the fingerprint remains the same and no extra storage space is consumed. Second, a reference to the attachment (i.e. its fingerprint) is placed within the object under a given label, together with the content-type and size, which serve only informative purposes. The same attachment can be referenced from multiple objects without uploading its data again, provided that the fingerprint is known.
Similarly, accessing the attachment of an object is done in two steps: obtaining the attachment reference, then accessing the actual attachment based on its fingerprint. As with links, attachments are scoped under an object, only their data being stored globally. In terms of efficiency, creating or updating attachments does not incur high overhead (except for the initial data upload). This is because the various pieces of information pertaining to a specific object (the actual data, meta-data, links, annotations, and attachments) are not lumped together but partitioned, much like in vertically partitioned SQL databases. Also, because attachments are identified by their global qualifier, duplicating or moving an attachment from one object to another does not require re-uploading the entire attachment.
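The two-step workflow might look as follows; the endpoint names and the shape of the returned fingerprint are assumptions, not the documented interface.

```python
# Hypothetical sketch of the two-step attachment workflow; endpoint paths
# and the fingerprint format are illustrative assumptions.
import urllib.request

BASE = "http://store.example.com/v1"  # assumed base URL

def upload_attachment(content: bytes) -> str:
    # Step 1: upload the content once; the store returns its fingerprint.
    # Uploading identical content again consumes no extra storage.
    req = urllib.request.Request(f"{BASE}/attachments", data=content,
                                 method="POST")
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode()             # e.g., "sha1:..."

def reference_attachment(obj, label, fingerprint, content_type, size):
    # Step 2: place a reference inside the object; content-type and size
    # serve only informative purposes.
    obj["attachments"].append({"label": label, "fingerprint": fingerprint,
                               "content-type": content_type, "size": size})
```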
Annotations are meta-data which can be specified for objects or attachments, and are characterized by a label (unique within the same object) and any JSON term as a value. Annotations are those data which, if erased, do not impact the usage of the object. In general, annotations can be used to store ancillary data about the object, especially data used by operational tools; for example, one can specify the creator, the tool and possibly the source that produced the object, ACLs, digital signatures, etc.
The Object Store has facilities for multi-Cloud deployment via replication. The replication process has three phases: defining on the target (i.e. the server) a replication stream, which yields a token used to authenticate the replication; defining on the initiator (i.e. the client) a matching replication stream; and the actual replication, which happens behind the scenes. It must be noted that the replication is one-way: the target (i.e. the server) continuously streams updates towards the initiator (i.e. the client). If two-way replication is desired, the same process must be followed in both directions.
Regarding conflicts: because internally the Object Store exchanges “patches” which describe only the changes, any conflicting patch is currently ignored. It is therefore highly recommended to confine updates of a given object to only one of the two replicas. However, if multiple changes happen to the same object and multiple patches are sent, and, say, the first one yields a conflict but the rest do not, only the conflicting patch will be discarded, the others being applied. Replication topologies can form graphs or trees, including cycles, and the Object Store handles these properly.
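Purely for illustration, the three-phase setup might proceed as below; every endpoint path and payload field here is an assumption, since the replication protocol is not specified in detail.

```python
# Hypothetical sketch of the three-phase replication setup described above;
# all endpoint paths and payload fields are assumptions.
import json
import urllib.request

def define_stream(base_url, payload):
    req = urllib.request.Request(f"{base_url}/replication/streams",
                                 data=json.dumps(payload).encode(),
                                 method="POST",
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Phase 1: define the stream on the target (server); it yields a token
# used to authenticate the replication.
token = define_stream("http://target.example.com/v1",
                      {"role": "target"})["token"]
# Phase 2: define the matching stream on the initiator (client). Phase 3,
# the continuous streaming of updates from target to initiator, then
# happens behind the scenes.
define_stream("http://initiator.example.com/v1",
              {"role": "initiator",
               "target": "http://target.example.com/v1",
               "token": token})
```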
Service Configuration Use Cases
Let us suppose that an operator has several instances of the same service type (e.g. an application server or a database) which they would like to configure during execution. Moreover, the operator would like to change the configuration and have it dynamically applied as easily as possible.
Single shared configuration is the most basic scenario. The simplest solution is to store the configuration parameters in an object created before execution starts, preferably as a JSON term or plain text in the data attribute, or alternatively as an attachment. Then, at execution, the object's reference is specified as an initialization argument to each of the instantiated services, which retrieve the data and use it to configure themselves.
If each service continuously polls the object for updates, it can detect when the operator has changed the configuration parameters and apply the necessary changes (possibly by restarting). Although this might seem to involve fetching the data over and over again, thus incurring large network overhead, such is not necessarily the case if one uses HTTP conditional requests, which are rather efficient, as sketched below.
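A minimal polling loop using conditional requests might look as follows: the ETag from the previous fetch is sent as If-None-Match, so an unchanged configuration costs only a 304 response instead of a full transfer. The URL and callback are assumptions.

```python
# Hypothetical polling loop using HTTP conditional requests; the URL and
# the apply_configuration callback are illustrative assumptions.
import time
import urllib.error
import urllib.request

def poll_configuration(url, apply_configuration, interval=30):
    etag = None
    while True:
        req = urllib.request.Request(url)
        if etag:
            req.add_header("If-None-Match", etag)  # conditional request
        try:
            with urllib.request.urlopen(req) as resp:
                etag = resp.headers.get("ETag")
                apply_configuration(resp.read())   # e.g., rewrite files, restart
        except urllib.error.HTTPError as error:
            if error.code != 304:                  # 304: configuration unchanged
                raise
        time.sleep(interval)
```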
In the case of multiple shared configurations, the services require multiple different configuration parameters grouped in multiple “files”, possibly because their syntax is different, or perhaps for better maintainability by the operator. One solution to this problem is to create a master object and use links to point to the other needed configuration objects. As before, polling can be applied to detect configuration changes; but because multiple objects are now involved, after an update has been detected a grace period should be observed, after which another check should be done, and only if no further updates have been detected should the configurations be applied. This prevents frequently restarting the service while the operator sequentially updates the configuration objects; a sketch of this grace-period logic follows.
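In this sketch, fetch_versions() is an assumed helper returning, for example, the ETags of the master object and of every object it links to.

```python
# Hypothetical sketch of the grace-period check: apply the configurations
# only after a quiet interval with no further updates. fetch_versions()
# is an assumed helper, e.g. returning ETags of the master object and of
# all the objects it links to.
import time

def wait_for_stable_configuration(fetch_versions, grace_period=60):
    versions = fetch_versions()
    while True:
        time.sleep(grace_period)
        current = fetch_versions()
        if current == versions:       # no updates during the grace period
            return current            # safe to apply the configurations
        versions = current            # operator is still editing; wait again
```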
8.3.4 Batch Engine
The main goal of the Batch Engine (BE) is to support the computationally intensive routines that are executed as part of the Filling-the-Gap (FG) analysis. As there are no tight deadlines, these routines are executed offline, and it is therefore possible to exploit the large datasets of monitoring information collected at runtime. We therefore opt for a BE that exploits a pool of parallel resources. In particular, the BE aims to provide on-demand HTC/HPC clusters on top of existing computational Cloud resources (e.g., Eucalyptus, Amazon EC2, Flexiant, PTC).
From a technical perspective, the BE integrates the services provided by the underlying scheduling middleware, particularly the HTCondor workload management system [3]. The BE provides REST APIs that allow job execution management (including submission and monitoring).
The API offered by the BE is extensible, providing the ability to support new job-scheduling engines or middleware. As the FG analysis techniques were implemented in Matlab, we use the Parallel Toolbox and the APIs offered by the BE to submit and manage the parallel jobs, as well as to retrieve the results. The execution of the FG analysis relies on the Matlab Compiler Runtime (MCR), a free runtime platform for executing standalone applications developed in Matlab.
The main features of the BE include automatic provisioning using specially designed Puppet modules, the ability to use existing infrastructure (e.g., Amazon EC2, Flexiant), and API middleware for job control. Several of these features are worth detailing. First, the BE offers a REST API (based on JSON-RPC 2.0) for controlling the deployment. This API allows one to dynamically specify the architecture of the provisioned cluster and to reuse predefined models. It allows customizing the cluster based on the required resources (CPU, memory, GPUs, etc.), and it abstracts the cluster deployment operations, including machine deployment, software provisioning, configuration, and monitoring. The API resorts to the specially defined Puppet modules that handle the deployment of all the software components.
Second, the BE offers a REST API for job management and monitoring. This REST API abstracts the job management operations and interacts with the back-end HTCondor service, exposing common HTCondor operations as REST-compliant services. These operations include job submission, data staging, job state notifications, etc.
Lastly, the BE has a flexible core that allows the addition of various schedulers, each with a different feature set, as required by applications.
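To illustrate how a client might drive the job management API, here is a hedged sketch of submitting and monitoring a job; the endpoint paths, payload fields, and resource names are assumptions, not the documented interface.

```python
# Hypothetical sketch of submitting and monitoring a job through the BE's
# REST API; paths and payload fields are illustrative assumptions.
import json
import urllib.request

BE = "http://batch-engine.example.com/api"  # assumed base URL

def rest(method, path, payload=None):
    data = json.dumps(payload).encode() if payload is not None else None
    req = urllib.request.Request(f"{BE}{path}", data=data, method=method,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Submit a compiled Matlab (MCR) job to the HTCondor back-end...
job = rest("POST", "/jobs", {
    "executable": "fg_analysis",        # hypothetical standalone MCR binary
    "arguments": ["--dataset", "monitoring-data"],
    "requirements": {"cpus": 8, "memory": "16GB"},
})
# ...then poll its state until completion.
state = rest("GET", f"/jobs/{job['id']}")["state"]
```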
From an architectural point of view, the BE is composed of four main subsystems.
Batch Engine API: This subsystem is responsible for interacting with the client applications or users. It handles the requests and delegates them to the other subsystems.
Batch Engine Cluster Manager API: Based on SCT, this subsystem uses the Configuration Management subsystem (mainly Puppet) and the Cloud interface for deploying nodes and provisioning the job scheduler (e.g., HTCondor).
Batch Engine Execution Manager: This subsystem is responsible for the actual job execution and the corresponding event handling (interaction with external components). It dispatches job execution requests to the deployed HTCondor workload manager. The workload manager permits the management of both serial and parallel jobs, a feature that will be exploited by applications that use MPI-like technologies.
Scheduler: This subsystem represents the actual job-scheduling system, responsible for executing the submitted jobs. It also provides the wrapping mechanism needed for offering integration facilities such as the job notification API.
Finally, the FG Analyzer calls the Batch Engine periodically and executes several jobs on multiple nodes performing different analyses. For instance, the FG Analyzer can execute several demand estimation procedures in parallel using the Batch Engine in order to compare their accuracy at design time. It also executes the analyses corresponding to different datasets in parallel, thus speeding up the analysis phase.