
2018 | Book

Cloud Computing – CLOUD 2018

11th International Conference, Held as Part of the Services Conference Federation, SCF 2018, Seattle, WA, USA, June 25–30, 2018, Proceedings


About this book

This volume constitutes the proceedings of the 11th International Conference on Cloud Computing, CLOUD 2018, held as part of the Services Conference Federation, SCF 2018, in Seattle, WA, USA, in June 2018. The 26 full papers presented together with 3 short papers were carefully reviewed and selected from 108 submissions. They are organized in topical sections such as cloud computing; client-server architectures; distributed systems organizing principles; storage virtualization; virtual machines; cloud based storage; distributed architectures; network services; and computing platforms.

Table of Contents

Frontmatter

Research Track: Cloud Schedule

Frontmatter
A Vector-Scheduling Approach for Running Many-Task Applications in the Cloud
Abstract
The performance variation of cloud resources makes it difficult to run certain scientific applications in the cloud because of their unique synchronization and communication requirements. We propose a decentralized scheduling approach for many-task applications that assigns individual tasks to cloud nodes based on periodic performance measurements of the cloud resources. In this paper, we present a vector-based scheduling algorithm that assigns tasks to nodes based on measuring the compute performance and the queue length of those nodes. Our experiments with a set of tasks in CloudLab show that the application proceeds in three distinct phases: flooding the cloud nodes with tasks, a steady state in which all nodes are busy, and the end game in which the remaining tasks are executed on the fastest nodes. We present heuristics for these three phases and demonstrate with measurements in CloudLab that they result in a reduction of the overall execution time of the many-task application.
Brian Peterson, Yalda Fazlalizadeh, Gerald Baumgartner, Qingyang Wang
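The core dispatch rule described above can be illustrated with a minimal sketch (not the authors' implementation): score each node from its periodically measured compute performance and current queue length, and send the next task to the best-scoring node. The Node structure, weights, and sample values are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class Node:
    name: str
    perf: float     # measured compute performance (tasks/sec); higher is better
    queue_len: int  # tasks currently queued on the node

def score(node: Node, w_perf: float = 1.0, w_queue: float = 0.5) -> float:
    # Combine the two measured dimensions into one scheduling score:
    # prefer fast nodes, penalize long queues.
    return w_perf * node.perf - w_queue * node.queue_len

def pick_node(nodes: list[Node]) -> Node:
    # Assign the next task to the highest-scoring node.
    return max(nodes, key=score)

nodes = [Node("n1", perf=4.0, queue_len=6), Node("n2", perf=2.5, queue_len=1)]
print(pick_node(nodes).name)  # n2: slower, but nearly idle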
Mitigating Multi-tenant Interference in Continuous Mobile Offloading
Abstract
Offloading computation to resource-rich servers is effective in improving application performance on resource-constrained mobile devices. Despite a rich body of research on mobile offloading frameworks, most previous works are evaluated in a single-tenant setting, i.e., a server is assigned to a single client. In this paper we consider multiple clients that offload various continuous mobile sensing applications with end-to-end delay constraints to a cluster of machines acting as the server. Contention for shared computing resources on a server can unfortunately result in delays and application malfunctions. We present a two-phase Plan-Schedule approach to mitigate multi-tenant resource contention and thus reduce offloading delays. The planning phase predicts future workloads from all clients, estimates contention, and devises an offloading schedule to remove or reduce contention. The scheduling phase dispatches arriving offloaded workloads to the server machine that minimizes contention, according to the running workloads on each machine. We implement these methods in ATOMS (Accurate Timing prediction and Offloading for Mobile Systems), a framework that combines prediction of workload computing times, estimation of network delays, and mobile-server clock synchronization techniques. Using several mobile vision applications, we evaluate ATOMS under diverse configurations and demonstrate its effectiveness.
Zhou Fang, Mulong Luo, Tong Yu, Ole J. Mengshoel, Mani B. Srivastava, Rajesh K. Gupta
Dynamic Selecting Approach for Multi-cloud Providers
Abstract
Since multi-cloud can be used both to mitigate vendor lock-in and to take advantage of cloud computing, an application can be deployed to the multiple cloud providers that best meet the user and application needs. For this, it is necessary to select the cloud providers that will host the application. The selection process is a complex task because each application part has its own constraints, but mainly because there are many cloud providers, each with its specific characteristics. In this article, we propose a cloud provider selection process to host applications based on microservices, in which each microservice must be hosted by the provider that best meets the user and microservice requirements. We use applications based on microservices because they can be independently deployed and scaled. In addition, we use the Simple Additive Weighting method to rank the candidate cloud providers, and then select providers among them to host each microservice by mapping the selection process to the multi-choice knapsack problem. The microservices are analyzed individually, except for the cost constraint: the selection process examines the cost constraint by observing all application microservices. Our approach also differs from others described in the literature because it selects multiple cloud providers to host the application microservices, one for each microservice. In this article, we also evaluate the proposed method through experiments, and the results show the viability of our approach. Finally, we point out future directions for the cloud provider selection process.
Juliana Carvalho, Dario Vieira, Fernando Trinta
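The ranking step can be sketched with a small Simple Additive Weighting (SAW) example: normalize each criterion to [0, 1], weight it, and sum per provider. The criteria, weights, and provider values below are made-up assumptions, not data from the paper.

providers = {
    "providerA": {"cost": 0.12, "latency_ms": 40, "availability": 0.999},
    "providerB": {"cost": 0.09, "latency_ms": 75, "availability": 0.995},
}
weights = {"cost": 0.5, "latency_ms": 0.2, "availability": 0.3}
benefit = {"cost": False, "latency_ms": False, "availability": True}  # True = higher is better

def saw_scores(providers, weights, benefit):
    scores = {}
    for name, crit in providers.items():
        s = 0.0
        for c, w in weights.items():
            values = [p[c] for p in providers.values()]
            lo, hi = min(values), max(values)
            norm = 0.0 if hi == lo else (crit[c] - lo) / (hi - lo)
            if not benefit[c]:       # cost-type criterion: invert so lower values score higher
                norm = 1.0 - norm
            s += w * norm
        scores[name] = s
    return scores

# Rank candidate providers for one microservice; the knapsack step would then
# pick one provider per microservice subject to the overall cost constraint.
print(sorted(saw_scores(providers, weights, benefit).items(), key=lambda kv: -kv[1]))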

Research Track: Cloud Data Storage

Frontmatter
Teleporting Failed Writes with Cache Augmented Data Stores
Abstract
Cache Augmented Data Stores enhance the performance of workloads that exhibit a high read to write ratio by extending a persistent data store (PStore) with a cache. When the PStore is unavailable, today's systems result in failed writes. With the cache available, we propose TARDIS, a family of techniques that teleport failed writes by buffering them in the cache and persisting them once the PStore becomes available. TARDIS preserves the consistency of application reads and writes by processing them in the context of buffered writes. The TARDIS family of techniques is differentiated by how buffered writes are applied to the PStore once it recovers. Each technique requires a different amount of mapping information for the writes performed while the PStore was unavailable. The primary contribution of this study is an overview of TARDIS and its family of techniques.
Shahram Ghandeharizadeh, Haoyu Huang, Hieu Nguyen
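A toy sketch of the write-teleporting control flow (buffer a rejected write in the cache, replay the buffer once the PStore recovers); it assumes a Redis-like cache client interface and ignores TARDIS's consistency mapping and serialization details, so it is not the paper's technique in full.

class TeleportingWriter:
    """Buffer writes in the cache while the persistent store (PStore) is down."""

    def __init__(self, pstore, cache):
        self.pstore, self.cache = pstore, cache
        self.buffer_key = "buffered-writes"   # per-application buffer kept in the cache

    def write(self, key, value):
        self.cache.set(key, value)            # keep subsequent reads consistent
        try:
            self.pstore.put(key, value)
        except ConnectionError:               # PStore unavailable: teleport the write
            self.cache.rpush(self.buffer_key, (key, value))

    def recover(self):
        # Apply buffered writes in order once the PStore is reachable again.
        while (item := self.cache.lpop(self.buffer_key)) is not None:
            key, value = item
            self.pstore.put(key, value)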
2-Hop Eclipse: A Fast Algorithm for Bandwidth-Efficient Data Center Switching
Abstract
A hybrid-switched data center network interconnects its racks of servers with a combination of a fast circuit switch that a schedule can reconfigure at significant cost and a much slower packet switch that a schedule can reconfigure at negligible cost. Given a traffic demand matrix between the racks, how can we best compute a good circuit switch configuration schedule that meets most of the traffic demand, leaving as little as possible for the packet switch to handle?
In this paper we propose 2-hop Eclipse, a new hybrid switch scheduling algorithm that strikes a much better tradeoff between the performance of the hybrid switch and the computational complexity of the algorithm, both in theory and in simulations, than the current state of the art solution Eclipse/Eclipse++.
Liang Liu, Long Gong, Sen Yang, Jun (Jim) Xu, Lance Fortnow
A Prediction Approach to Define Checkpoint Intervals in Spot Instances
Abstract
Cloud computing providers have started offering their idle resources in the form of virtual machines (VMs) without availability guarantees. Known as transient servers, these VMs can be revoked at any time without user intervention. Spot instances are transient servers offered by Amazon at lower prices than regular dedicated servers. A market model was used to create a bidding scenario for cloud users of servers without service reliability guarantees, where prices changed dynamically over time based on supply and demand. To prevent data loss, fault tolerance techniques make it possible to exploit these transient resources. This paper proposes a strategy that addresses the problem of executing a distributed application, such as bag-of-tasks, using spot instances. We implemented a heuristic model that uses checkpoint and restore techniques, supported by a statistical model that predicts the time to revocation by analyzing price changes and defines the best checkpoint interval. Our experiments demonstrate that, by using a bid strategy and the observed price variation history, our model is able to predict revocation time with high levels of accuracy. We evaluate our strategy through extensive simulations that use the price change history, simulating bid strategies and comparing our model with real time-to-revocation events. Using instances with considerable price changes, our results achieve a 94% success rate with a standard deviation of 1.36. Thus, the proposed model presents promising results under realistic working conditions.
Jose Pergentino A. Neto, Donald M. Pianto, Célia Ghedini Ralha
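The paper couples a revocation-time predictor with checkpointing; as an illustrative stand-in for its heuristic (not the authors' model), the sketch below derives a checkpoint interval from a predicted mean time to revocation using Young's classical approximation.

import math

def checkpoint_interval(predicted_mttr_s: float, checkpoint_cost_s: float) -> float:
    """Young's approximation: optimal interval is roughly sqrt(2 * C * MTTR).

    predicted_mttr_s  -- predicted mean time to revocation (seconds),
                         e.g. derived from a model of the spot price history
    checkpoint_cost_s -- time needed to write one checkpoint (seconds)
    """
    return math.sqrt(2.0 * checkpoint_cost_s * predicted_mttr_s)

# Revocation expected in ~2 hours, checkpoints cost 30 s -> checkpoint every ~657 s
print(checkpoint_interval(2 * 3600, 30))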

Research Track: Cloud Container

Frontmatter
Cloud Service Brokerage and Service Arbitrage for Container-Based Cloud Services
Abstract
By exploiting the portability of application containers, platform- and software-as-a-service providers receive the flexibility to deploy and move their cloud services across infrastructure services delivered by different providers. The aim of this research is to apply the concepts of cloud service brokerage to container-based cloud services and to define a method for service arbitrage in an environment with multiple infrastructure-as-a-service (IaaS) providers. A new placement method based on constraint programming is introduced for the optimised deployment of containers across multiple IaaS providers. The benefits and limitations of the proposed method are discussed and the efficiency of the method is evaluated based on multiple deployment scenarios.
Ruediger Schulze
Fault Injection and Detection for Artificial Intelligence Applications in Container-Based Clouds
Abstract
Container techniques are increasingly used to build modern cloud computing systems to achieve higher efficiency and lower resource costs, compared with traditional virtual machine techniques. Artificial intelligence (AI) is a mainstream method to deal with big data, and is used in many areas to achieve better effectiveness. It is known that attacks happen every day in production cloud systems; however, the fault behaviors and interference phenomena of up-to-date AI applications in container-based cloud systems are still not clear. This paper aims to study the reliability of container-based clouds. We first propose a fault injection framework for container-based cloud systems. We build a Docker container environment with the TensorFlow deep learning framework installed, and develop four typical attack programs: CPU attack, memory attack, disk attack and DDoS attack. We then inject the attack programs into containers running AI applications (CNN, RNN, BRNN and DRNN) to observe fault behaviors and interference phenomena. After that, we design fault detection models based on the quantile regression method to detect potential faults in containers. Experimental results show that the proposed fault detection models can effectively detect the injected faults with more than 60% precision, more than 90% recall and nearly 100% accuracy.
Kejiang Ye, Yangyang Liu, Guoyao Xu, Cheng-Zhong Xu
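The detection step can be sketched with off-the-shelf quantile regression: fit an upper quantile of a container metric under normal load and flag observations above it. The synthetic data, the single CPU feature, and the 95% threshold are assumptions; the paper's own detection models are not reproduced here.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
workload = rng.uniform(0, 1, 500)                  # normalized request rate
cpu = 20 + 60 * workload + rng.normal(0, 3, 500)   # CPU usage (%) under normal behavior

X = sm.add_constant(workload)
model = sm.QuantReg(cpu, X).fit(q=0.95)            # fit the 95th-percentile line

def is_faulty(workload_now: float, cpu_now: float) -> bool:
    # Flag a container whose CPU usage exceeds the fitted upper quantile.
    upper = model.predict(np.array([[1.0, workload_now]]))[0]
    return cpu_now > upper

print(is_faulty(0.5, 52.0), is_faulty(0.5, 90.0))  # False True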
Container-VM-PM Architecture: A Novel Architecture for Docker Container Placement
Abstract
Docker is a mature containerization technique used to perform operating-system-level virtualization. One open issue in the cloud environment is how to properly choose a virtual machine (VM) on which to initialize a container instance, a problem similar to the conventional problem of VM placement on physical machines (PMs). Current studies mainly focus on container placement and VM placement independently, but rarely consider the systematic collaboration of the two placements. We view this as a main reason for the scattered distribution of containers in a data center, which ultimately results in worse physical resource utilization. In this paper, we propose a "Container-VM-PM" architecture and a novel container placement strategy that simultaneously takes into account the three involved entities. Furthermore, we model a fitness function for the selection of the VM and PM. Simulation experiments show that our method is superior to the existing strategy with regard to physical resource utilization.
Rong Zhang, A-min Zhong, Bo Dong, Feng Tian, Rui Li
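A minimal sketch of a joint fitness function over the three entities: reject placements that overflow the VM and otherwise prefer (VM, PM) pairs that end up tightly packed after the container is added. The resource model and weighting are assumptions rather than the paper's formula.

def fitness(container, vm, pm, alpha=0.5):
    """Score a (container, VM, PM) placement; higher means better consolidation.

    container, vm, pm are dicts with demands ('cpu', 'mem'), current usage
    ('cpu_used', 'mem_used') and capacities ('cpu_cap', 'mem_cap').
    """
    if (vm["cpu_used"] + container["cpu"] > vm["cpu_cap"]
            or vm["mem_used"] + container["mem"] > vm["mem_cap"]):
        return float("-inf")                        # container does not fit in this VM
    vm_util = (vm["cpu_used"] + container["cpu"]) / vm["cpu_cap"]
    pm_util = (pm["cpu_used"] + container["cpu"]) / pm["cpu_cap"]
    return alpha * vm_util + (1 - alpha) * pm_util  # favor dense VMs on dense PMs

def place(container, candidates):
    # candidates: iterable of (vm, pm) pairs; choose the best-scoring pair.
    return max(candidates, key=lambda pair: fitness(container, *pair))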

Research Track: Cloud Resource Management

Frontmatter
Renewable Energy Curtailment via Incentivized Inter-datacenter Workload Migration
Abstract
Continuous grid balancing is essential for ensuring the reliable operation of modern smart grids. Current smart grid systems lack practical large-scale energy storage capabilities, and therefore their supply and demand levels must always be kept equal in order to avoid system instability and failure. Grid balancing has become more relevant in recent years following the increasing desire to integrate more Renewable Energy Sources (RESs) into the generation mix of modern grids. RESs produce an intermittent energy supply that cannot always be predicted accurately [1] and necessitate that effective balancing mechanisms be put in place to compensate for their supply variability [2, 3]. In this work, we propose a new energy curtailment scheme for balancing excess RES energy using data centers as managed loads. Our scheme uses incentivized inter-datacenter workload migration to increase the computational energy consumption at a destination datacenter by the amount necessary to balance the grid. Incentivized workload migration is achieved by offering discounted energy prices (in the form of energy credits) to large-scale cloud clients in order to influence their workload placement algorithms to favor datacenters where the energy credits can be used. Implementations of our system using the CPLEX ILP solver as well as the Best Fit Decreasing (BFD) heuristic [4] for workload placement on data centers show that using energy credits is an effective mechanism to speed up and control the energy consumption rates at datacenters, especially at low system loads, and that the credits result in increased profits for the cloud clients due to the higher profit margins associated with using them.
Ahmed Abada, Marc St-Hilaire
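The Best Fit Decreasing heuristic [4] used above for workload placement can be sketched as follows; the single-dimensional capacity model is a simplifying assumption.

def best_fit_decreasing(workloads, capacities):
    """Place workload demands onto datacenters; returns {workload_index: datacenter_index}.

    Workloads are handled in decreasing order, and each goes to the datacenter
    with the least remaining capacity that can still accommodate it.
    """
    remaining = list(capacities)
    placement = {}
    for i in sorted(range(len(workloads)), key=lambda i: -workloads[i]):
        fits = [(remaining[d], d) for d in range(len(remaining)) if remaining[d] >= workloads[i]]
        if not fits:
            raise ValueError(f"workload {i} does not fit in any datacenter")
        _, d = min(fits)              # tightest (best) fit
        remaining[d] -= workloads[i]
        placement[i] = d
    return placement

print(best_fit_decreasing([4, 8, 1, 4, 2, 1], capacities=[10, 10]))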
Pricing Cloud Resource Based on Reinforcement Learning in the Competing Environment
Abstract
Multiple cloud providers compete against each other in order to attract cloud users and make profits in the cloud market. In doing so, each provider needs to charge fees to users in a proper way. In this paper, we analyze how a cloud provider sets prices effectively when competing against other cloud providers. Specifically, we model this problem as a Markov game, and then use minimax-Q and Q-learning algorithms to design the pricing policies, respectively. Based on this, we run extensive experiments to analyze the effectiveness of minimax-Q and Q-learning based pricing policies. We find that although minimax-Q is more suitable for analyzing the competing game with multiple self-interested cloud providers, the Q-learning based pricing policy performs better in terms of making profits. We also find that the minimax-Q learning based pricing policy performs better in terms of retaining cloud users. Our experimental results can provide useful insights for designing practical pricing policies in different situations.
Bing Shi, Hangxing Zhu, Han Yuan, Rongjian Shi, Jinwen Wang
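As a sketch of the Q-learning side of the pricing policy (the minimax-Q variant additionally solves a matrix game per state), the snippet below learns which discrete price to post in each market state; the states, price grid, and reward signal are placeholders, not values from the paper.

import random
from collections import defaultdict

PRICES = [0.05, 0.10, 0.15, 0.20]        # candidate prices per resource unit
Q = defaultdict(float)                   # Q[(state, price)] -> expected long-run profit
alpha, gamma, eps = 0.1, 0.95, 0.1

def choose_price(state):
    if random.random() < eps:            # explore occasionally
        return random.choice(PRICES)
    return max(PRICES, key=lambda p: Q[(state, p)])   # otherwise exploit

def update(state, price, profit, next_state):
    # Standard Q-learning: move Q towards observed profit plus discounted best future value.
    best_next = max(Q[(next_state, p)] for p in PRICES)
    Q[(state, price)] += alpha * (profit + gamma * best_next - Q[(state, price)])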
An Effective Offloading Trade-Off to Economize Energy and Performance in Edge Computing
Abstract
Mobile edge computing is a new technology that aims to reduce latency, ensure highly efficient network operation and offer an improved user experience. Since offloading introduces additional wireless transmission overhead, the key technical challenge of mobile edge computing is the trade-off between computation cost and wireless transmission cost, reducing the energy consumption of mobile edge devices and the response time of computation tasks at the same time. We present a mobile edge computing system composed of mobile edge devices and an edge cloud, connected through wireless stations. To protect user privacy, we propose a data preprocessing step that includes cleaning irrelevant properties and data segmentation. Aiming to reduce total energy consumption and response time, we put forward an energy consumption priority offloading (ECPO) algorithm and a response time priority offloading (RTPO) algorithm, based on an energy consumption model and a response time model. Combining ECPO and RTPO, we derive a more general dynamic computation offloading algorithm. Finally, simulations in four scenarios, including a normal network scenario, a congested network scenario, a low-battery device scenario and a time-limited task scenario, demonstrate that our algorithms can effectively reduce the energy consumption of mobile edge devices and the response time of computation tasks.
Yuting Cao, Haopeng Chen, Zihao Zhao
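A stripped-down sketch of the underlying trade-off: offload a task when transmission plus remote execution beats local execution under whichever objective (energy or response time) has priority. The cost model and constants are assumptions, not the paper's ECPO/RTPO formulations.

def should_offload(task_cycles, data_bits, objective="energy",
                   f_local=1e9, f_edge=8e9, power_local=2.0,
                   power_tx=1.0, uplink_bps=20e6):
    """Return True if offloading is preferable for the chosen objective."""
    t_local = task_cycles / f_local                             # run on the device
    t_offload = data_bits / uplink_bps + task_cycles / f_edge   # transmit, then run on edge
    if objective == "time":                                     # RTPO-like criterion
        return t_offload < t_local
    e_local = power_local * t_local                             # ECPO-like criterion:
    e_offload = power_tx * (data_bits / uplink_bps)             # device pays only for transmission
    return e_offload < e_local

print(should_offload(2e9, 4e6, "energy"), should_offload(2e9, 4e6, "time"))  # True True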

Research Track: Cloud Management

Frontmatter
Implementation and Comparative Evaluation of an Outsourcing Approach to Real-Time Network Services in Commodity Hosted Environments
Abstract
Commodity operating systems (OSs) often sacrifice real-time (RT) performance (e.g., consistently low latency) in favor of optimized average latency and throughput. This can cause latency variance problems when an OS hosts virtual machines that run network services. This paper proposes a software-based RT method for Linux KVM-based hosted environments. First, this method solves the priority inversion problem in interrupt handling of vanilla Linux using the RT Preempt patch. Second, it solves another priority inversion problem in the softirq mechanism of Linux by explicitly separating RT softirq handling from non-RT softirq handling. Finally, it mitigates the cache pollution problem caused by co-located non-RT services and avoids a second priority inversion in the guest OS through socket outsourcing. Compared to the RT Preempt patch only method, the proposed method has a 76% lower standard deviation, 15% higher throughput, and 33% lower CPU overhead. Compared to the dedicated processor method, the proposed method has a 63% lower standard deviation and higher total throughput by a factor of 2, and avoids under-utilization of the dedicated processor.
Oscar Garcia, Yasushi Shinjo, Calton Pu
Identity and Access Management for Cloud Services Used by the Payment Card Industry
Abstract
The Payment Card Industry Data Security Standard (PCI DSS) mandates that any entity of the cardholder data environment (CDE) involved in the credit card payment process has to be compliant to the requirements of the standard. Hence, cloud services which are used in the CDE have to adhere to the PCI DSS requirements too. Identity and access management (IAM) are essential functions for controlling the access to the resources of cloud services. The aim of this research is to investigate the aspects of IAM required by the PCI DSS and to describe current concepts of IAM for cloud services and how they relate to the requirements of the PCI DSS.
Ruediger Schulze
Network Anomaly Detection and Identification Based on Deep Learning Methods
Abstract
Network anomaly detection is the process of determining when network behavior has deviated from normal behavior. The detection of abnormal events in large dynamic networks has become increasingly important as networks grow in size and complexity. However, fast and accurate network anomaly detection is very challenging. Deep learning is a potential method for network anomaly detection due to its good feature modeling capability. This paper presents a new anomaly detection method based on deep learning models, specifically the feedforward neural network (FNN) model and convolutional neural network (CNN) model. The performance of the models is evaluated in several experiments with the popular NSL-KDD dataset. From the experimental results, we find that the FNN and CNN models not only have a strong modeling ability for network anomaly detection, but also achieve high accuracy. Compared with several traditional machine learning methods, such as J48, Naive Bayes, NB Tree, Random Forest, Random Tree and SVM, the proposed models obtain a higher accuracy and detection rate with a lower false positive rate. The deep learning models can effectively improve both the detection accuracy and the ability to identify anomaly types.
Mingyi Zhu, Kejiang Ye, Cheng-Zhong Xu
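A minimal Keras sketch of an FNN classifier on preprocessed (numeric, one-hot encoded) NSL-KDD records; the feature count, layer sizes, and random placeholder data are assumptions, and no claim is made that this matches the paper's architecture.

import numpy as np
import tensorflow as tf

NUM_FEATURES = 122            # assumed size of a one-hot encoded NSL-KDD record

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(NUM_FEATURES,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),   # normal vs. anomalous
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Placeholder data; in practice load and encode the NSL-KDD train/test splits.
X = np.random.rand(1000, NUM_FEATURES).astype("float32")
y = np.random.randint(0, 2, size=(1000, 1))
model.fit(X, y, epochs=3, batch_size=64, validation_split=0.2)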
A Feedback Prediction Model for Resource Usage and Offloading Time in Edge Computing
Abstract
Nowadays, edge computing, which provides low-delay services, has gained much attention in the research field. However, the limited resources of the platform make it necessary to accurately predict resource usage and execution time, and to further optimize resource utilization during offloading. In this paper, we propose a feedback prediction model (FPM), which includes three processes: the usage prediction process, the time prediction process and the feedback process. Firstly, we use average usage instead of instantaneous usage for usage prediction and calibrate the prediction results with real data. Secondly, we build the time prediction process on the predicted usage values, then project the time error onto the usage values and update them, after which the model re-executes the time prediction process. Thirdly, we set a judgment threshold and a feedback limit for the correction process: if the prediction values meet the requirement or the limit is reached, FPM stops the error feedback and moves on to the next training round. We compare the testing results with two other models, a BP neural network and FPM without the feedback process (NO-FP FPM). The average usage and time prediction errors are 10% and 25% for BP, and 16% and 12% for NO-FP FPM. The prediction accuracy of FPM shows a great improvement: the average usage prediction error is less than 8% and the time error is about 6%.
Menghan Zheng, Yubin Zhao, Xi Zhang, Cheng-Zhong Xu, Xiaofan Li

Application and Industry Track: Cloud Service System

cuCloud: Volunteer Computing as a Service (VCaaS) System
Abstract
Emerging cloud systems, such as volunteer clouds and mobile clouds, are gaining momentum among the current topics that dominate the research landscape of cloud computing. Volunteer cloud computing is an economical, secure, and greener alternative to the current cloud computing model that is based on data centers, where tens of thousands of dedicated servers are set up to back the cloud services. This paper presents cuCloud, a Volunteer Computing as a Service (VCaaS) system that is based on the spare resources of personal computers owned by individuals and/or organizations. The paper addresses the design and implementation issues of cuCloud, including the technical details of its integration with the well-known open-source IaaS cloud management system CloudStack. The paper also presents empirical performance evidence of cuCloud in comparison with Amazon EC2 using a big-data application based on Hadoop.
Tessema M. Mengistu, Abdulrahman M. Alahmadi, Yousef Alsenani, Abdullah Albuali, Dunren Che
CloudsStorm: An Application-Driven Framework to Enhance the Programmability and Controllability of Cloud Virtual Infrastructures
Abstract
Most current IaaS (Infrastructure-as-a-Service) clouds provide dedicated virtual infrastructure resources to cloud applications with only limited programmability and controllability, which enlarges the management gap between infrastructures and applications. Traditional DevOps (development and operations) approaches are not suitable in today's cloud environments because of the slow, manual and error-prone collaboration between developers and operations personnel. It is essential to bring operations into the cloud application development phase, which requires the infrastructure to be directly controllable by the application. Moreover, each cloud provider offers its own set of APIs to access its resources. This causes a vendor lock-in problem for the application when managing its infrastructure across federated clouds or multiple data centers. To mitigate this gap, we have designed CloudsStorm, an application-driven DevOps framework that allows the application to directly program and control its infrastructure. In particular, it provides multi-level programmability and controllability according to the application's specifications. We evaluate it by comparing its functionality to other proposed solutions. Moreover, we implement an extensible TSV-Engine, the core component of CloudsStorm for managing infrastructures; it is the first that is able to provision a networked infrastructure among public clouds. Finally, we conduct a set of experiments on actual clouds and compare with other related DevOps tools. The experimental results demonstrate that our solution is efficient and outperforms the others.
Huan Zhou, Yang Hu, Jinshu Su, Cees de Laat, Zhiming Zhao
A RESTful E-Governance Application Framework for People Identity Verification in Cloud
Abstract
An effective application framework design for e-governance is definitely a challenging task. The majority of prior research has focused on designing e-governance architectures in which people's identity verification takes a long time using manual verification systems. We develop an efficient application framework that verifies people's identity. It provides a cloud-based REST API using a deep learning based recognition approach and stores face metadata in neural networks for rapid facial recognition. After each successful identity verification, we store the facial data in the neural network if there is a match between 80–95%. This decreases the error rate in each iteration and enhances the network. Finally, our system is compared with the existing system on the basis of CPU utilization, error rate and cost metrics to show the novelty of this framework. We implement and evaluate our proposed framework, which allows any organization or institute to verify people's identity in a reliable and secure manner.
Ahmedur Rahman Shovon, Shanto Roy, Tanusree Sharma, Md Whaiduzzaman
A Novel Anomaly Detection Algorithm Based on Trident Tree
Abstract
In this paper, we propose a novel anomaly detection algorithm, named T-Forest, which is implemented by multiple trident trees (T-trees). Each T-tree is constructed recursively by isolating the data outside of 3 sigma into the left and right subtrees and isolating the others into the middle subtree, and each node in a T-tree records the size of the dataset that falls on this node, so that each T-tree can be used as a local density estimator for data points. The density value for each instance is the average of all trees' evaluations of the instance density, and it can be used as the anomaly score of the instance. Since each T-tree is constructed according to the 3 sigma principle, each tree in T-Forest can obtain good anomaly detection results without a large tree height. Compared with some state-of-the-art methods, our algorithm performs well in terms of AUC value, and needs only linear time and space complexity. The experimental results show that our approach can not only effectively detect anomalous points, but also tends to converge within a certain parameter range.
Chunkai Zhang, Ao Yin, Yepeng Deng, Panbo Tian, Xuan Wang, Lifeng Dong
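A compact single-feature sketch of one trident tree: values below mean - 3 sigma go to the left child, values above mean + 3 sigma to the right child, the rest to the middle child, and every node remembers how many points reached it so the tree acts as a density estimator. The stopping parameters are assumptions, and the multi-dimensional handling and ensemble averaging over many trees are omitted.

import statistics

class TridentNode:
    def __init__(self, values, depth=0, max_depth=8):
        self.size = len(values)                        # density information for this region
        self.children = None
        if depth >= max_depth or len(values) < 4:
            return
        mu, sigma = statistics.fmean(values), statistics.pstdev(values)
        lo, hi = mu - 3 * sigma, mu + 3 * sigma
        self.bounds = (lo, hi)
        left = [v for v in values if v < lo]
        middle = [v for v in values if lo <= v <= hi]
        right = [v for v in values if v > hi]
        if left or right:                              # split only if 3-sigma isolates something
            self.children = tuple(TridentNode(part, depth + 1, max_depth)
                                  for part in (left, middle, right))

    def density(self, x):
        if not self.children:
            return self.size                           # small value = sparse region = more anomalous
        lo, hi = self.bounds
        child = self.children[0] if x < lo else self.children[2] if x > hi else self.children[1]
        return child.density(x)

The anomaly score of an instance would then be the average of density(x) over all trees in the forest, with low averages indicating anomalies.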

Application and Industry Track: Cloud Environment Framework

Framework for Management of Multi-tenant Cloud Environments
Abstract
The benefits of using container-based microservices for the development of cloud applications have been widely reported in the literature and are supported by empirical evidence. However, it is also becoming clear that the management of large-scale container-based environments has its challenges. This is particularly true in multi-tenant environments operating across multiple cloud platforms. In this paper, we discuss the challenges of managing container-based environments and review the various initiatives directed towards addressing this problem. We then describe the architecture of the Unicorn Universe Cloud framework and the Unicorn Cloud Control Centre designed to facilitate the management and operation of containerized microservices in multi-tenant cloud environments.
Marek Beranek, Vladimir Kovar, George Feuerlicht
Fault Tolerant VM Consolidation for Energy-Efficient Cloud Environments
Abstract
Cloud computing applications are expected to guarantee the service performance determined by the Service Level Agreement (SLA) between the cloud owner and the client. While satisfying the SLA, Virtual Machines (VMs) may be consolidated based on their utilization in order to balance the load or optimize the energy efficiency of physical resources. Since these physical resources are prone to failure, any optimization approach should consider the probability of a failure on a physical resource and orchestrate the VMs over the physical plane accordingly. Otherwise, it is possible to experience unfavorable results, such as numerous redundant application migrations and consolidations of VMs, which may easily result in SLA violations. In this paper, a novel approach is proposed to reduce energy consumption and the number of application migrations without violating the SLA, while considering the fault tolerance of the cloud system in the face of physical resource failures. Simulation results show that the proposed model reduces energy consumption by approximately 37% and the number of application migrations by approximately 9%. Moreover, in case of faults, the increase in energy consumption is less than 11% when the proposed approach is used.
Cihan Secinti, Tolga Ovatman
Over-Sampling Algorithm Based on VAE in Imbalanced Classification
Abstract
The imbalanced classification problem is a problem that violates the assumption of a uniform distribution of samples: classes differ in sample size, sample distribution and misclassification cost. Traditional classifiers tend to ignore the important minority samples because of their rarity. Over-sampling algorithms use various methods to increase the number of minority samples in the training set in order to increase their recognition rate. However, existing over-sampling methods are too coarse to improve the classification of the minority samples, because they cannot make full use of the information in the original samples, yet they increase the training time by adding extra samples. In this paper, we propose to use the distribution information of the minority samples: we use a variational auto-encoder to fit their probability distribution function without any prior assumption, and reasonably expand the minority class sample set. The experimental results prove the effectiveness of the proposed algorithm.
Chunkai Zhang, Ying Zhou, Yingyang Chen, Yepeng Deng, Xuan Wang, Lifeng Dong, Haoyu Wei
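The oversampling step itself is short once a VAE has been fitted to the minority class: sample latent vectors from the prior and decode them into synthetic minority samples. The sketch assumes an already trained Keras-style decoder (a hypothetical object; the VAE training is not shown) with a given latent dimension.

import numpy as np

def oversample_minority(decoder, n_new, latent_dim=2):
    """Generate n_new synthetic minority samples from a trained VAE decoder.

    The decoder maps latent vectors z ~ N(0, I) back to feature space, so
    sampling the prior and decoding yields new points that follow the fitted
    minority distribution without any parametric prior assumption.
    """
    z = np.random.normal(size=(n_new, latent_dim))
    return decoder.predict(z, verbose=0)

# Hypothetical usage: append the synthetic samples to the training set.
# X_balanced = np.vstack([X_train, oversample_minority(decoder, 500)])
# y_balanced = np.concatenate([y_train, np.full(500, MINORITY_LABEL)])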

Application and Industry Track: Cloud Data Processing

A Two-Stage Data Processing Algorithm to Generate Random Sample Partitions for Big Data Analysis
Abstract
To enable the individual data block files of a distributed big data set to be used as random samples for big data analysis, a two-stage data processing (TSDP) algorithm is proposed in this paper to convert a big data set into a random sample partition (RSP) representation, which ensures that each individual data block in the RSP is a random sample of the big data and can therefore be used to estimate the statistical properties of the big data. The first stage of this algorithm sequentially chunks the big data set into non-overlapping subsets and distributes these subsets as data block files to the nodes of a cluster. The second stage takes a random sample from each subset without replacement to form a new subset saved as an RSP data block file; the random sampling step is repeated until all data records in all subsets are used up and a new set of RSP data block files is created to form an RSP of the big data. It is formally proved that the expectation of the sample distribution function (s.d.f.) of each RSP data block equals the s.d.f. of the big data set; therefore, each RSP data block is a random sample of the big data set. An implementation of the TSDP algorithm on Apache Spark and HDFS is presented. Performance evaluations on terabyte data sets show the efficiency of this algorithm in converting HDFS big data files into HDFS RSP big data files. We also show an example that uses only a small number of RSP data blocks to build ensemble models which perform better than the single model built from the entire data set.
Chenghao Wei, Salman Salloum, Tamer Z. Emara, Xiaoliang Zhang, Joshua Zhexue Huang, Yulin He
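A single-machine sketch of the two-stage idea (chunk sequentially, then repeatedly sample without replacement from every chunk to build RSP blocks); the distributed Spark/HDFS implementation described in the paper is not reproduced.

import random

def tsdp(records, num_blocks):
    """Convert a list of records into num_blocks random-sample-partition (RSP) blocks."""
    # Stage 1: sequentially chunk the data into non-overlapping subsets.
    n = len(records)
    size = (n + num_blocks - 1) // num_blocks
    chunks = [records[i:i + size] for i in range(0, n, size)]
    # Stage 2: sample from every chunk without replacement (shuffle, then deal
    # round-robin) so that each RSP block contains a random sample of each chunk.
    blocks = [[] for _ in range(num_blocks)]
    for chunk in chunks:
        random.shuffle(chunk)
        for j, rec in enumerate(chunk):
            blocks[j % num_blocks].append(rec)
    return blocks

rsp = tsdp(list(range(100)), num_blocks=5)
print([len(b) for b in rsp])   # 5 RSP blocks of ~20 records each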
An Improved Measurement of the Imbalanced Dataset
Abstract
Imbalanced classification is a classification problem that violates the assumption of a uniform distribution of samples. In such problems, imbalanced datasets are traditionally measured in terms of the imbalance in sample size, without considering the distribution information, which has a more important impact on classification performance; as a result, the traditional measurements correlate only weakly with classification performance. This paper proposes an improved measurement for imbalanced datasets based on the idea that a sample surrounded by more samples of the same class is easier to classify. For each sample of the different classes, the proposed method calculates the average number of the k nearest neighbors belonging to the same class in the different subsets under a weighted k-NN; the product of these average values is then taken as the measurement of the dataset, and it is a good indicator of the relationship between the distribution of samples and the classification results. The experimental results show that the proposed measurement has a higher correlation with the classification results and shows the classification difficulty of datasets more clearly.
Chunkai Zhang, Ying Zhou, Yingyang Chen, Changqing Qi, Xuan Wang, Lifeng Dong
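The measurement can be sketched with scikit-learn: for each class, compute the average same-class fraction among every sample's k nearest neighbours and combine the per-class averages. The unweighted neighbour count below is a simplifying stand-in for the weighted k-NN used in the paper.

import numpy as np
from sklearn.neighbors import NearestNeighbors

def separability_measure(X, y, k=5):
    """Product over classes of the mean same-class fraction among k nearest neighbours."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)      # +1 because each point returns itself
    _, idx = nn.kneighbors(X)
    same = (y[idx[:, 1:]] == y[:, None]).mean(axis=1)    # per-sample same-class fraction
    per_class = [same[y == c].mean() for c in np.unique(y)]
    return float(np.prod(per_class))

X = np.vstack([np.random.normal(0, 1, (200, 2)), np.random.normal(3, 1, (20, 2))])
y = np.array([0] * 200 + [1] * 20)
print(separability_measure(X, y))   # closer to 1 = samples are surrounded by their own class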
A Big Data Analytical Approach to Cloud Intrusion Detection
Abstract
Advances in cloud computing in the past decade have made it a feasible option for the high performance computing and mass storage needs of many enterprises due to the low startup and management costs. Due to this prevalent use, cloud systems have become hot targets for attackers aiming to disrupt reliable operation of large enterprise systems. The variety of attacks launched on cloud systems, including zero-day attacks that these systems are not prepared for, call for a unified approach for real-time detection and mitigation to provide increased reliability. In this work, we propose a big data analytical approach to cloud intrusion detection, which aims to detect deviations from the normal behavior of cloud systems in near real-time and introduce measures to ensure reliable operation of the system by learning from the consequences of attack conditions. Initial experiments with recurrent neural network-based learning on a large network attack dataset demonstrate that the approach is promising to detect intrusions on cloud systems.
Halim Görkem Gülmez, Emrah Tuncel, Pelin Angin

Short Track

Frontmatter
Context Sensitive Efficient Automatic Resource Scheduling for Cloud Applications
Abstract
Nowadays, applications in cloud computing often use containers as their service infrastructure to provide online services. Applications deployed in lightweight containers can be quickly started and stopped, which suits the changing resource requirements brought about by workload fluctuations. Existing works on scheduling physical resources often model an application's performance from a training dataset, but they cannot self-adapt to dynamic deployment environments. To address this issue, this paper proposes a feedback-based physical resource scheduling approach for containers to deal with the significant changes in physical resources brought about by dynamic workloads. First, we model an application's performance with a queue-based model, which represents the correlations between workloads and performance metrics. Second, we predict the application's response time by adjusting the parameters of the application performance model with a fuzzy Kalman filter. Finally, we schedule physical resources according to the predicted response time. Experimental results show that our approach can adapt to dynamic workloads and achieve high performance.
Lun Meng, Yao Sun
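The feedback loop can be illustrated with a plain (non-fuzzy) scalar Kalman filter that keeps correcting the predicted response time with each new observation; the queue-based performance model and the fuzzy parameter adjustment from the paper are not shown, and the noise parameters are assumptions.

class ResponseTimeFilter:
    """Scalar Kalman filter tracking a containerized service's response time (seconds)."""

    def __init__(self, initial_estimate=0.1, process_var=1e-4, measure_var=1e-2):
        self.x = initial_estimate     # current response-time estimate
        self.p = 1.0                  # estimate variance
        self.q = process_var          # how fast the true value may drift
        self.r = measure_var          # measurement noise

    def update(self, measured_rt):
        self.p += self.q                          # predict step
        k = self.p / (self.p + self.r)            # Kalman gain
        self.x += k * (measured_rt - self.x)      # correct with the observation
        self.p *= (1 - k)
        return self.x                             # feeds the resource scheduling decision

f = ResponseTimeFilter()
for rt in [0.12, 0.11, 0.35, 0.40, 0.38]:         # workload spike after two samples
    print(round(f.update(rt), 3))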
A Case Study on Benchmarking IoT Cloud Services
Abstract
The Internet of Things (IoT) is on the rise, forming networks of sensors, machines, cars, household appliances, and other physical items. In the industrial area, machines in assembly lines are connected to the internet for quick data exchange and coordinated operation. Cloud services are an obvious choice for data integration and processing in the (industrial) IoT. However, manufacturing machines have to exchange data in close to real time in many use cases, requiring small round-trip times (RTT) in the communication between device and cloud. In this study, two of the major IoT cloud services, Microsoft Azure IoT Hub and Amazon Web Services IoT, are benchmarked in the area of North Rhine-Westphalia (Germany) regarding RTT, varying factors like time of day, day of week, location, inter-message interval, and additional data processing in the cloud. The results show significant performance differences between the cloud services and a considerable impact of some of the aforementioned factors. In conclusion, as soon as (soft) real-time conditions come into play, it is highly advisable to carry out benchmarking in advance to identify an IoT cloud workflow which meets these conditions.
Kevin Grünberg, Wolfram Schenck
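A generic round-trip-time measurement loop (timing HTTPS requests against a placeholder endpoint) illustrates the kind of benchmarking performed; the study itself used the Azure IoT Hub and AWS IoT device SDKs and message round trips, which are not reproduced here.

import statistics
import time
import urllib.request

ENDPOINT = "https://example.com/"     # placeholder, not an actual IoT cloud endpoint

def measure_rtt(url, samples=20, interval_s=1.0):
    rtts = []
    for _ in range(samples):
        start = time.perf_counter()
        urllib.request.urlopen(url, timeout=10).read()
        rtts.append((time.perf_counter() - start) * 1000)   # milliseconds
        time.sleep(interval_s)                              # inter-message interval
    return rtts

rtts = measure_rtt(ENDPOINT)
print(f"mean={statistics.fmean(rtts):.1f} ms  "
      f"p95={sorted(rtts)[int(0.95 * len(rtts))]:.1f} ms")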
A Comprehensive Solution for Research-Oriented Cloud Computing
Abstract
Cutting-edge research today requires researchers to perform computationally intensive calculations and/or create models and simulations using large amounts of data in order to reach research-backed conclusions. As datasets, models, and calculations increase in size and scope, they present a computational and analytical challenge to the researcher. Advances in cloud computing and the emergence of big data analytic tools are ideal to aid the researcher in tackling this challenge. Although researchers have been using cloud-based software services to propel their research, many institutions have not considered harnessing the Infrastructure as a Service model. The reluctance to adopt Infrastructure as a Service in academia can be attributed to many researchers lacking the high degree of technical experience needed to design, procure, and manage custom cloud-based infrastructure. In this paper, we propose a comprehensive solution consisting of a fully independent cloud automation framework and a modular data analytics platform which will allow researchers to create and utilize domain-specific cloud solutions irrespective of their technical knowledge, reducing the overall effort and time required to complete research.
Mevlut A. Demir, Weslyn Wagner, Divyaansh Dandona, John J. Prevost
Backmatter
Metadata
Title
Cloud Computing – CLOUD 2018
edited by
Min Luo
Liang-Jie Zhang
Copyright year
2018
Electronic ISBN
978-3-319-94295-7
Print ISBN
978-3-319-94294-0
DOI
https://doi.org/10.1007/978-3-319-94295-7
