2019 | Book

Cloud Computing – CLOUD 2019

12th International Conference, Held as Part of the Services Conference Federation, SCF 2019, San Diego, CA, USA, June 25–30, 2019, Proceedings


About this book

This volume constitutes the proceedings of the 12th International Conference on Cloud Computing, CLOUD 2019, held as part of the Services Conference Federation, SCF 2019, in San Diego, CA, USA, in June 2019.

The 24 full papers were carefully reviewed and selected from 53 submissions. CLOUD has been a prime international forum for both researchers and industry practitioners to exchange the latest fundamental advances in the state of the art and practice of cloud computing, to identify emerging research topics, and to define the future of cloud computing. The accepted papers cover all topics in cloud computing that align with this theme.

Table of Contents

Frontmatter
Ultra-Low Power Localization System Using Mobile Cloud Computing
Abstract
In existing Bluetooth (BT)-based positioning systems, signal interference among positioning devices, slow processing of positioning data, and high energy consumption of the positioning devices degrade positioning accuracy and service quality. In this paper, we propose an ultra-low-power indoor localization system using mobile cloud computing. The mobile cloud server reduces signal interference among positioning devices, improves positioning accuracy, and lowers system energy consumption by controlling the working mode of each positioning device. A simultaneous localization and power adaptation scheme is developed. In a real-world experimental evaluation, our proposed system localizes a terminal within a 3 m area with \(98\%\) accuracy and an average positioning error of less than 1.55 m. Compared with other BLE systems, our system reduces average energy consumption by \(97\%\).
Junjian Huang, Yubin Zhao, XiaoFan Li, Cheng-Zhong Xu
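The abstract does not give the paper's positioning model, so as background only, here is a minimal sketch of the standard log-distance path-loss conversion from a BLE RSSI reading to a distance estimate; the reference power and path-loss exponent below are illustrative values, not the paper's.

```python
import math

def rssi_to_distance(rssi_dbm, tx_power_dbm=-59, path_loss_exp=2.0):
    """Estimate distance (m) from a BLE RSSI reading using the
    log-distance path-loss model: RSSI = TxPower - 10*n*log10(d)."""
    return 10 ** ((tx_power_dbm - rssi_dbm) / (10 * path_loss_exp))

# Example: an RSSI of -75 dBm with a 1 m reference power of -59 dBm
print(round(rssi_to_distance(-75), 2))  # ~6.31 m
```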
A Method and Tool for Automated Induction of Relations from Quantitative Performance Logs
Abstract
Operators use performance logs to manage large-scale web service infrastructures. Detecting, isolating, and diagnosing fine-grained performance anomalies require integrating system performance measures across space and time. The diversity of these logs' layouts impedes their efficient processing and hinders such analyses. Performance logs possess some unique features that challenge current log parsing techniques. In addition, most current techniques stop at extraction, leaving relational definition as a post-processing activity, which can be a substantial effort at web scale. To achieve scale, we introduce our perftables approach, which automatically interprets performance log data and transforms the text into structured relations. We interpret the signals provided by the layout using our template catalog to induce an appropriate relation. We evaluate our method on a large sample obtained from our experimental computer science infrastructure, in addition to a sample drawn from the wild. We successfully extract on average over 97% and 85% of the data, respectively.
Joshua Kimball, Calton Pu
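The perftables catalog and induction rules are the paper's contribution; as a hedged sketch of the extract-then-relate idea only, the toy catalog below pairs a layout pattern with the relation (column names) it should induce.

```python
import re

# Hypothetical one-entry "template catalog": layout signature -> relation.
CATALOG = [
    (re.compile(r"(?P<ts>\d+)\s+cpu=(?P<cpu>[\d.]+)\s+mem=(?P<mem>[\d.]+)"),
     ("ts", "cpu", "mem")),
]

def parse_line(line):
    for pattern, columns in CATALOG:
        m = pattern.match(line)
        if m:
            return {c: m.group(c) for c in columns}  # one structured row
    return None  # unmatched layout

print(parse_line("1561420800 cpu=0.83 mem=0.41"))
```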
Systematic Construction, Execution, and Reproduction of Complex Performance Benchmarks
Abstract
In this work, we present the next generation of the Elba toolkit, available as a beta release, showing how we have used it for experimental research in computer systems with RUBBoS, a well-known n-tier system benchmark, as an example. In particular, we show how we have leveraged milliScope – the Elba toolkit's monitoring and instrumentation framework – to collect log data from benchmark executions at unprecedentedly fine granularity, as well as how we have specified benchmark workflows with WED-Make – a declarative workflow language whose main characteristic is to facilitate the declaration of dependencies. We also show how to execute WED-Makefiles (i.e., workflow specifications written with WED-Make), and how we have successfully reproduced the experimental verification of the millibottleneck theory of performance bugs in multiple cloud environments and systems.
Rodrigo Alves Lima, Joshua Kimball, João E. Ferreira, Calton Pu
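WED-Make's syntax is the paper's own; purely to illustrate the dependency-driven execution order such a declarative workflow implies, here is a sketch with hypothetical step names (Python 3.9+ for graphlib).

```python
from graphlib import TopologicalSorter

# Hypothetical benchmark steps and their declared dependencies.
deps = {
    "run_benchmark": {"deploy_rubbos"},
    "deploy_rubbos": {"provision_vms"},
    "collect_logs":  {"run_benchmark"},
}
print(list(TopologicalSorter(deps).static_order()))
# ['provision_vms', 'deploy_rubbos', 'run_benchmark', 'collect_logs']
```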
Multiple Workflow Scheduling with Offloading Tasks to Edge Cloud
Abstract
Edge computing can realize data locality between the cloud and users, and it can be applied to task offloading, i.e., moving part of the workload on a mobile terminal to an edge or cloud system to minimize response time while reducing energy consumption. Mobile workflow jobs have become widespread thanks to advances in the computational power of mobile terminals. Thus, how to offload or schedule each task in a mobile workflow is one of the current challenging issues.
In this paper, we propose a task scheduling algorithm with task offloading, called priority-based continuous task selection for offloading (PCTSO), that minimizes the schedule length while reducing energy consumption at the mobile client. PCTSO tries to select dependent tasks such that many tasks are offloaded so as to utilize many vCPUs in the edge cloud; in this manner, the degree of parallelism can be maintained. Simulation results demonstrate that PCTSO outperforms other algorithms in schedule length while satisfying the energy constraint.
Hidehiro Kanemitsu, Masaki Hanada, Hidenori Nakazato
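PCTSO's exact selection rule is in the paper; as background only, the sketch below computes the classic "upward rank" (longest cost path to an exit task), a common priority metric for DAG task scheduling, using hypothetical per-task costs.

```python
def upward_rank(succ, cost):
    """succ[t]: successor task ids; cost[t]: execution time."""
    rank = {}
    def rec(t):
        if t not in rank:
            rank[t] = cost[t] + max((rec(s) for s in succ.get(t, [])),
                                    default=0)
        return rank[t]
    for t in cost:
        rec(t)
    return rank

succ = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}
cost = {"a": 2, "b": 3, "c": 1, "d": 2}
print(upward_rank(succ, cost))  # 'a' gets the highest priority (7)
```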
Min-Edge P-cycles: An Efficient Approach for Computing P-cycles in Optical Data Center Networks
Abstract
Effective network protection requires that extra resources be used in failure events. Pre-configured protection cycles (P-cycles) have been proposed to protect mesh-based networks using few extra resources. A number of heuristic methods have been developed to overcome the complexity of finding optimal P-cycles in dense optical networks. The processing time of existing approaches depends on the number of working wavelengths, and as that number grows in modern networks, the processing time of current P-cycle computing approaches will continue to increase. In this paper, we propose an approach, called Min-Edge P-cycle (MEP), that addresses this problem. The core of the proposed approach is an iterative algorithm that uses the minimum-weight edge in each iteration. Our approach provides the same redundancy requirements as the previously known unity-cycle method, but it does not depend on the number of working wavelengths. The new approach can significantly reduce the processing time of computing P-cycles in large-scale optical, server-centric data center networks, e.g., BCube, FiConn, and DCell networks.
Amir Mirzaeinia, Abdelmounaam Rezgui, Zaki Malik, Mehdi Mirzaeinia
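The full MEP algorithm and its redundancy accounting are in the paper; the sketch below only illustrates one plausible iteration step under that reading: take the minimum-weight edge (u, v), then close a cycle through it via the shortest u-v path with that edge removed.

```python
import networkx as nx

def cycle_through_min_edge(G):
    # Pick the minimum-weight edge (missing weights default to 1).
    u, v, _ = min(G.edges(data=True), key=lambda e: e[2].get("weight", 1))
    H = G.copy()
    H.remove_edge(u, v)
    path = nx.shortest_path(H, u, v, weight="weight")
    return path + [u]  # the cycle: u -> ... -> v -> u

G = nx.cycle_graph(5)  # a toy 5-node ring
print(cycle_through_min_edge(G))  # e.g., [0, 4, 3, 2, 1, 0]
```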
Toward Accurate and Efficient Emulation of Public Blockchains in the Cloud
Abstract
Blockchain is an enabler of many emerging decentralized applications in cryptocurrency, the Internet of Things, smart healthcare, and many other areas. Although various open-source blockchain frameworks are available in the form of virtual machine images or Docker images on public clouds, the infrastructure of mainstream blockchains nonetheless presents a technical barrier for many users who want to modify or test new research ideas in blockchains. To make matters worse, many advantages of blockchain systems can be demonstrated only at large scales, e.g., thousands of nodes, which are not always available to researchers. This paper presents an accurate and efficient emulation system that replays the execution of large-scale blockchain systems on tens of thousands of nodes. In contrast to existing work that simulates blockchains with artificial timestamp injection, the proposed system is designed to execute real proof-of-work workloads along with peer-to-peer network communication and hash-based immutability. In addition, the proposed system employs a preprocessing approach to avoid per-node computation overhead at runtime and thus achieves practical scales. We have evaluated the system by emulating up to 20,000 nodes on Amazon Web Services (AWS), showing both high accuracy and high efficiency with millions of transactions.
Xinying Wang, Abdullah Al-Mamun, Feng Yan, Dongfang Zhao
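For readers unfamiliar with the workload being replayed, this is the standard proof-of-work loop in miniature: search for a nonce whose block hash has a required number of leading zero hex digits. The difficulty here is tiny for demonstration; the paper's emulator runs real PoW at scale.

```python
import hashlib

def mine(block_data: bytes, difficulty: int = 4) -> int:
    """Return a nonce whose SHA-256 digest starts with `difficulty` zeros."""
    nonce = 0
    target = "0" * difficulty
    while True:
        digest = hashlib.sha256(block_data + str(nonce).encode()).hexdigest()
        if digest.startswith(target):
            return nonce
        nonce += 1

print(mine(b"prev_hash|tx_root|", difficulty=4))
```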
Teleportation of VM Disk Images Over WAN
Abstract
As edge computing and hybrid clouds gain momentum, migrating virtual machines between datacenters is becoming increasingly important. Whether such migration is performed live or not, it starts with a full copy of a virtual disk over the network. This initial copy consumes the bulk of the transfer time and network use, and improving it is the focus of our paper. While compression can help somewhat, we propose a novel technique, which we call teleportation. Teleportation assembles disk images directly at the destination from the pieces of other, unrelated disk images already present there. Since data found at the destination doesn't have to be sent over, our prototype achieved a 3.4x increase in network throughput (compared to compression).
Oleg Zaydman, Roman Zhirin
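As a hedged sketch of the core idea (the chunk size and hashing scheme below are illustrative, not the paper's): split the image into chunks and send only those whose content hash the destination does not already hold.

```python
import hashlib

CHUNK = 64 * 1024  # illustrative fixed chunk size

def chunks_to_send(image: bytes, dest_hashes: set):
    """Return only the (offset, chunk) pairs missing at the destination."""
    missing = []
    for off in range(0, len(image), CHUNK):
        piece = image[off:off + CHUNK]
        if hashlib.sha256(piece).hexdigest() not in dest_hashes:
            missing.append((off, piece))   # must travel over the WAN
        # otherwise the destination assembles this chunk locally
    return missing
```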
Live Migration of Virtual Machines in OpenStack: A Perspective from Reliability Evaluation
Abstract
Virtualization technology is widely used in cloud data centers and today's IT infrastructure. A key technology for server virtualization is live migration of virtual machines (VMs), which allows VMs to be moved from one physical host to another while minimizing service downtime. Cloud providers usually use a cloud operating system for virtual machine management, and currently the most widely used open-source cloud operating system is OpenStack. In this paper, we investigate the reliability of VM live migration in OpenStack by increasing system pressure and injecting network failures during migration. We analyze the impact of these pressures and failures on the performance of VM live migration. The experimental results can guide data center administrators in migration decisions and fault localization. Furthermore, they can help researchers find bottlenecks and optimization opportunities for live migration in OpenStack.
Jin Hao, Kejiang Ye, Cheng-Zhong Xu
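The paper's fault-injection harness is not described here; one common, generic way to degrade the network during a migration window is Linux tc/netem, sketched below. The device name and loss rate are placeholders, and the commands require root.

```python
import subprocess

def inject_packet_loss(dev="eth0", loss_pct=10):
    """Add a netem qdisc that drops loss_pct% of packets on dev."""
    subprocess.run(["tc", "qdisc", "add", "dev", dev, "root",
                    "netem", "loss", f"{loss_pct}%"], check=True)

def clear(dev="eth0"):
    """Remove the injected qdisc, restoring normal networking."""
    subprocess.run(["tc", "qdisc", "del", "dev", dev, "root"], check=True)
```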
An Approach to Failure Prediction in Cluster by Self-updating Cause-and-Effect Graph
Abstract
Cluster systems have been widely used in cloud computing, high-performance computing, and other fields, and their usage and scale have been rising sharply. Unfortunately, larger cluster systems are more prone to failures, and the difficulty and cost of repairing failures are enormous. The importance and necessity of failure prediction in cluster systems are therefore obvious. To address this challenge, we propose an approach to failure prediction in cluster systems based on a Self-Updating Cause-and-Effect Graph. Unlike previous approaches, the most novel aspect of our approach is that it automatically mines causality among log events from cluster systems, and builds and updates the Cause-and-Effect Graph used for failure prediction throughout the systems' life cycle. In addition, we use real logs from the Blue Gene/L system to verify the effectiveness of our approach and compare it with other approaches on the same logs. The results show that our approach outperforms the others, with the best precision and recall reaching 89% and 85%, respectively.
Yan Yu, Haopeng Chen
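The paper's mining and updating rules are its own contribution; as a toy illustration only, the sketch below counts "A is often followed by B" pairs in a sliding time window, the kind of candidate causality edge a cause-and-effect graph could be built from.

```python
from collections import Counter

def candidate_edges(events, window=60.0, min_count=3):
    """events: list of (timestamp, event_type), sorted by timestamp."""
    pairs = Counter()
    for i, (t_a, a) in enumerate(events):
        for t_b, b in events[i + 1:]:
            if t_b - t_a > window:
                break                      # later events are out of window
            if a != b:
                pairs[(a, b)] += 1
    return [(a, b, n) for (a, b), n in pairs.items() if n >= min_count]

log = [(0, "net_err"), (10, "io_retry"), (70, "net_err"), (75, "io_retry"),
       (200, "net_err"), (205, "io_retry")]
print(candidate_edges(log))  # [('net_err', 'io_retry', 3)]
```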
Towards Decentralized Deep Learning with Differential Privacy
Abstract
In distributed machine learning, while a great deal of attention has been paid to centralized systems with a central parameter server, decentralized systems have not been fully explored. Decentralized systems have great potential for future practical use, as they offer multiple useful attributes: they are less vulnerable to privacy and security issues, scale better, and are less prone to single points of bottleneck and failure. In this paper, we focus on decentralized learning systems and aim to achieve differential privacy with a good convergence rate and low communication cost. To this end, we propose a new algorithm, Leader-Follower Elastic Averaging Stochastic Gradient Descent (LEASGD), driven by a novel leader-follower topology and a differential privacy model.
We also provide a theoretical analysis of the convergence rate of LEASGD and of the trade-off between performance and privacy in the private setting. We evaluate LEASGD on a real distributed testbed with popular deep neural network models: MNIST-CNN, MNIST-RNN, and CIFAR-10. Extensive experimental results show that LEASGD outperforms the state-of-the-art decentralized learning algorithm DPSGD, achieving nearly \(40\%\) lower loss within the same number of iterations and a \(30\%\) reduction in communication cost. Moreover, it spends less differential privacy budget and reaches higher final accuracy than DPSGD in the private setting.
Hsin-Pai Cheng, Patrick Yu, Haojing Hu, Syed Zawad, Feng Yan, Shiyu Li, Hai Li, Yiran Chen
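The exact LEASGD update rule is the paper's; in the general spirit of elastic-averaging SGD with Gaussian noise for differential privacy, a follower step might look like the sketch below, where eta, rho, and sigma are illustrative constants.

```python
import numpy as np

def follower_step(x, grad, x_leader, eta=0.05, rho=0.1, sigma=0.01):
    """One noisy SGD step with an elastic pull toward the leader's weights."""
    noise = np.random.normal(0.0, sigma, size=x.shape)  # DP perturbation
    elastic = rho * (x - x_leader)                      # pull toward leader
    return x - eta * (grad + elastic) + noise
```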
Exploiting the Spam Correlations in Scalable Online Social Spam Detection
Abstract
The huge volume of social spam from large-scale social networks has become a common phenomenon in the contemporary world. Most prior research has focused on improving the efficiency of identifying social spam from limited-size data on the algorithm side; few efforts target the data correlations among large-scale distributed social spam or exploit benefits from the system side. In this paper, we propose a new scalable system, named SpamHunter, which utilizes spam correlations across distributed data sources to enhance the performance of large-scale social spam detection. It identifies correlated social spam from various distributed servers/sources through DHT-based hierarchical functional trees. These functional trees act as bridges among data servers/sources to aggregate, exchange, and communicate updated and newly emerging social spam with each other. Furthermore, by processing online social logs instantly, it allows streaming data to be processed in a distributed manner, which reduces online detection latency and avoids the inefficiency of outdated spam posts. Our experimental results with real-world social logs demonstrate that SpamHunter reaches a 95% F1 score in spam detection and scales efficiently to a large number of data servers with low latency.
Hailu Xu, Liting Hu, Pinchao Liu, Boyuan Guan
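The functional-tree layer is the paper's design; as background on the DHT substrate such trees are typically organized over, here is a toy consistent-hashing ring mapping a key to its responsible server. Server names and the key scheme are hypothetical.

```python
import hashlib
from bisect import bisect

def h(s: str) -> int:
    return int(hashlib.sha1(s.encode()).hexdigest(), 16)

# Four hypothetical servers placed on the hash ring.
servers = sorted((h(f"server-{i}"), f"server-{i}") for i in range(4))

def owner(key: str) -> str:
    ring = [pos for pos, _ in servers]
    idx = bisect(ring, h(key)) % len(servers)  # first server clockwise
    return servers[idx][1]

print(owner("spam-topic:pharma"))
```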
Dynamic Network Anomaly Detection System by Using Deep Learning Techniques
Abstract
The Internet and computer networks are currently suffering from serious security threats, which often keep changing and evolve into new, unknown variants. To maintain network security, we design and implement a dynamic network anomaly detection system using deep learning methods. We use Long Short-Term Memory (LSTM) to build a deep neural network model and add an attention mechanism (AM) to enhance the model's performance. The SMOTE algorithm and an improved loss function are used to handle the class-imbalance problem in the CSE-CIC-IDS2018 dataset. The experimental results show that the classification accuracy of our model reaches 96.2%, higher than that of other machine learning algorithms. In addition, the class-imbalance problem is alleviated to a certain extent, making our method highly practical.
Peng Lin, Kejiang Ye, Cheng-Zhong Xu
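The class-imbalance step can be shown concretely with stock SMOTE from imbalanced-learn; the feature arrays below are placeholders for the real flow features, and the paper additionally modifies the loss function.

```python
import numpy as np
from imblearn.over_sampling import SMOTE

X = np.random.rand(1000, 20)          # placeholder flow features
y = np.array([0] * 950 + [1] * 50)    # 95:5 class imbalance

# Synthesize minority-class samples until the classes are balanced.
X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
print(np.bincount(y_res))             # [950 950]
```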
Heterogeneity-Aware Data Placement in Hybrid Clouds
Abstract
In next-generation cloud computing clusters, the performance of data-intensive applications will be limited, among other factors, by disk data transfer rates. To mitigate the performance impact, cloud systems offering hierarchical storage architectures are becoming commonplace. The Hadoop File System (HDFS) offers a collection of storage policies that exploit different storage types such as RAM_DISK, SSD, HDD, and ARCHIVE. However, developing algorithms that leverage heterogeneous storage through efficient data placement has been challenging. This work presents an intelligent algorithm based on genetic programming that finds an optimal mapping of input datasets to storage types on a Hadoop file system.
Jack D. Marquez, Juan D. Gonzalez, Oscar H. Mondragon
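As a hedged sketch of the evolutionary idea (not the paper's algorithm): evolve tier assignments under a made-up fitness that rewards placing frequently accessed data on faster tiers. The "heat" values and fitness are illustrative stand-ins for the paper's objective.

```python
import random

TIERS = ["RAM_DISK", "SSD", "HDD", "ARCHIVE"]
SPEED = {"RAM_DISK": 4, "SSD": 3, "HDD": 2, "ARCHIVE": 1}
heat = [0.9, 0.7, 0.2, 0.1]                      # per-dataset access frequency

def fitness(mapping):                            # one tier per dataset
    return sum(h * SPEED[t] for h, t in zip(heat, mapping))

pop = [[random.choice(TIERS) for _ in heat] for _ in range(20)]
for _ in range(100):
    pop.sort(key=fitness, reverse=True)          # elitist selection
    p1, p2 = random.sample(pop[:10], 2)
    child = [random.choice(genes) for genes in zip(p1, p2)]  # crossover
    if random.random() < 0.2:                    # mutation
        child[random.randrange(len(child))] = random.choice(TIERS)
    pop[-1] = child                              # replace the worst
pop.sort(key=fitness, reverse=True)
print(pop[0])                                    # best mapping found
```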
Towards Automated Configuration of Cloud Storage Gateways: A Data Driven Approach
Abstract
Cloud storage gateways (CSGs) are essential for enterprises to take advantage of the scale and flexibility of cloud object stores. A CSG gives clients the impression of a locally configured, large block-based storage device, which must be mapped to remote cloud storage that is invariably object-based. Properly configuring a cloud storage gateway is extremely challenging because of the numerous parameters involved and the interactions among them. In this paper, we study this problem for a commercial CSG product that is typical of offerings in the market. We explore how machine learning techniques can be exploited for both the forward problem (i.e., predicting performance from the configuration parameters) and the backward problem (i.e., predicting configuration parameter values from the target performance). Based on extensive testing with real-world customer workloads, we show that it is possible to achieve excellent prediction accuracy while ensuring that the model is not overfitted to the data.
Sanjeev Sondur, Krishna Kant
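The forward problem has the shape of a regression task; a minimal sketch with a generic model follows, where the features and synthetic target stand in for the gateway's real configuration knobs and measured performance (the paper does not specify this model).

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

X = np.random.rand(500, 6)   # e.g., cache size, thread count, block size...
y = X @ np.array([3.0, 1.5, 0.2, 0.9, 2.2, 0.1]) + np.random.randn(500) * 0.1

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("R^2 on held-out configs:", round(model.score(X_te, y_te), 3))
```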
The Case for Physical Memory Pools: A Vision Paper
Abstract
The cloud is a rapidly expanding and increasingly prominent component of modern computing. However, monolithic servers limit the flexibility of cloud-based systems due to static memory limitations. Developments in OS design, distributed memory systems, and address translation have been crucial in aiding the progress of the cloud. In this paper, we discuss recent developments in virtualization, OS design, and distributed memory structures with regard to their current impact and their relevance to future work on eliminating memory limits in cloud computing. We argue that creating physical memory pools is essential for cheaper and more efficient cloud computing infrastructures, and we identify research challenges in implementing these structures.
Heather Craddock, Lakshmi Prasanna Konudula, Kun Cheng, Gökhan Kul
A Parallel Algorithm for Bayesian Text Classification Based on Noise Elimination and Dimension Reduction in Spark Computing Environment
Abstract
The Naive Bayesian algorithm is one of the ten classical algorithms in data mining and is widely used as the basic theory for text classification. With the high-speed development of the Internet and information systems, huge amounts of data are being produced all the time, and problems are certain to arise when the traditional Bayesian classification algorithm addresses massive amounts of data, especially without a parallel computing framework. This paper proposes INBCS, an improved Bayesian algorithm for text classification in the Spark computing environment, which improves the Naive Bayesian algorithm based on a polynomial model. For data preprocessing, this paper first proposes a parallel noise elimination algorithm, and then another parallel dimension reduction algorithm based on Information Gain and TextRank computation in the Spark environment. Based on these preprocessed data, an improved parallel method is proposed for calculating the conditional probability that comprehensively considers the effects of the feature items in each document, class, and training set. Finally, through experiments on different widely used corpora on the Spark computation platform, the results illustrate that INBCS obtains higher accuracy and efficiency than some current improvements and implementations of Naive Bayesian algorithms in the Spark ML library.
Zhuo Tang, Wei Xiao, Bin Lu, Youfei Zuo, Yuan Zhou, Keqin Li
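For orientation, this is the stock Spark ML multinomial Naive Bayes text pipeline that INBCS's noise elimination, IG/TextRank dimension reduction, and probability correction extend or replace; the two-row training set is a placeholder.

```python
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import Tokenizer, HashingTF, IDF
from pyspark.ml.classification import NaiveBayes

spark = SparkSession.builder.appName("nb-text").getOrCreate()
train = spark.createDataFrame(
    [("cheap meds now", 1.0), ("meeting at noon", 0.0)], ["text", "label"])

pipeline = Pipeline(stages=[
    Tokenizer(inputCol="text", outputCol="words"),
    HashingTF(inputCol="words", outputCol="tf"),
    IDF(inputCol="tf", outputCol="features"),
    NaiveBayes(modelType="multinomial", smoothing=1.0),
])
model = pipeline.fit(train)   # fit the full tokenize -> TF-IDF -> NB chain
```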
On the Optimal Number of Computational Resources in MapReduce
Abstract
Big data computing in the cloud needs faster processing and better resource provisioning. MapReduce is the framework for computing large-scale datasets in cloud environments. Optimizing the resource requirements of each job to satisfy a specific objective in MapReduce is an open problem: many factors, e.g., system-side information and the requirements of each client, must be considered to estimate the appropriate amount of resources. This paper presents a mathematical model for the optimal number of map tasks in MapReduce resource provisioning, which estimates the optimal number of mappers based on the resource specification and the size of the dataset.
Htway Htway Hlaing, Hidehiro Kanemitsu, Tatsuo Nakajima, Hidenori Nakazato
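The baseline relationship the model builds on is the standard one in Hadoop: the number of map tasks is driven by input size over split size; the paper then optimizes around this given resource specifications. The split size below is the common 128 MB default, used here only as a worked example.

```python
import math

def num_mappers(dataset_bytes, split_bytes=128 * 1024 ** 2):
    """Baseline mapper count: one map task per input split."""
    return math.ceil(dataset_bytes / split_bytes)

print(num_mappers(10 * 1024 ** 3))  # 10 GB / 128 MB splits -> 80 mappers
```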
Class Indistinguishability for Outsourcing Equality Conjunction Search
Abstract
Searchable symmetric encryption (SSE) enables a remote cloud server to answer queries directly over encrypted data on a client's behalf, thereby relieving the resource-limited client of complicated data management tasks. Two key requirements are a strong security guarantee and sub-linear search performance. The bucketization approach in the literature addresses these requirements at the expense of downloading many false positives or requiring the client to search relevant bucket ids locally, which limits the applicability of the method. In this paper, we propose a novel approach, CLASS, that meets these requirements for equality conjunction search while minimizing client work and communication cost. First, we generalize standard ciphertext indistinguishability to partitioned data, calling the result class indistinguishability; it provides a level of ciphertext indistinguishability similar to that of bucketization but allows the cloud server to search relevant data and filter out false positives. We present a construction that achieves these goals through a two-phase search algorithm: the first phase finds a candidate set through a sub-linear search, and the second phase finds the exact query result using a linear search over the candidate set. Both phases are performed by the server and can be implemented by plugging in existing search methods. Experimental results on large real-world data sets show that our approach outperforms the state of the art.
Weipeng Lin, Ke Wang, Zhilin Zhang, Ada Waichee Fu, Raymond Chi-Wing Wong, Cheng Long
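To make the two-phase control flow concrete (encryption deliberately omitted; in CLASS both phases run over ciphertexts at the server), here is a plain-text sketch with hypothetical class labels and records.

```python
# Phase 1: sub-linear lookup of a candidate class; phase 2: exact filter.
class_index = {"classA": [{"k": 1, "v": 7}, {"k": 1, "v": 9}],
               "classB": [{"k": 2, "v": 7}]}

def query(cls, k, v):
    candidates = class_index.get(cls, [])          # phase 1: index lookup
    return [r for r in candidates                  # phase 2: linear scan
            if r["k"] == k and r["v"] == v]        #   over candidates only

print(query("classA", 1, 9))   # [{'k': 1, 'v': 9}]
```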
A Hybrid Approach for Synchronizing Clocks in Distributed Systems
Abstract
Synchronizing clocks across a wide-area network has taken on a new dimension with the demand for high-accuracy synchronization, even in local or small computing systems. Before implementing any clock synchronization protocol, some important aspects must be considered: for example, is the communication latency fixed or variable? Does a reference clock exist in the system? In this paper, we study standard and experimental protocols for synchronizing clocks over a geographically distributed network, and we combine the features of the Network Time Protocol (NTP) with the timing signal of the Global Positioning System (GPS) to synchronize distributed systems' clocks more accurately. With the help of our decentralized GPS-based NTP servers, the proposed system achieves higher clock synchronization accuracy than the traditional NTP clock synchronization protocol.
Md Shohel Khan, Ratul Sikder, Muhammad Abdullah Adnan
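The textbook NTP computation the combined NTP+GPS design builds on uses the four timestamps of one request/response exchange (t0, t3 on the client clock; t1, t2 on the server clock):

```python
def ntp_offset_delay(t0, t1, t2, t3):
    """Standard NTP estimates from one round trip."""
    offset = ((t1 - t0) + (t2 - t3)) / 2   # estimated client clock error
    delay = (t3 - t0) - (t2 - t1)          # round-trip network delay
    return offset, delay

# Example: client sends at 0.000s, server stamps 0.120s/0.121s,
# client receives at 0.010s -> offset ~0.1155s, delay 0.009s.
print(ntp_offset_delay(0.000, 0.120, 0.121, 0.010))
```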
JCallGraph: Tracing Microservices in Very Large Scale Container Cloud Platforms
Abstract
Microservice architecture splits giant, complex enterprise applications into fine-grained microservices, promoting agile development, integration, delivery, and deployment. However, monitoring tens of thousands of microservices is extremely challenging, and debugging problems among massive numbers of microservices is like looking for a needle in a haystack. We present JCallGraph, a tracing and analytics tool that captures and visualizes the invocation relationships of tens of thousands of microservices running in millions of containers at JD.com. JCallGraph achieves three main goals for distributed tracing and debugging: online construction of microservice invocations within milliseconds, minimal overhead without significant performance impact on production applications, and application-agnostic operation with zero intrusion into applications. Our evaluation shows that JCallGraph accurately captures real-time invocation relationships at massive scale and helps developers efficiently understand interactions among microservices and pinpoint the root cause of problems.
Haifeng Liu, Jinjun Zhang, Huasong Shan, Min Li, Yuan Chen, Xiaofeng He, Xiaowei Li
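In miniature, the invocation relationship JCallGraph captures can be represented as edges aggregated from trace spans; the span tuples below are hypothetical.

```python
from collections import defaultdict

# Toy trace spans: (trace_id, caller_service, callee_service).
spans = [("t1", "gateway", "orders"), ("t1", "orders", "inventory"),
         ("t2", "gateway", "users")]

graph = defaultdict(set)
for _, caller, callee in spans:
    graph[caller].add(callee)            # one edge per observed invocation
print(dict(graph))  # gateway -> {orders, users}, orders -> {inventory}
```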
An Overview of Cloud Computing Testing Research
Abstract
With the rapid growth of information technology, research activity in cloud computing has increased significantly. Cloud testing can be interpreted as (i) testing of cloud applications, which involves continuous monitoring of cloud application status to verify Service Level Agreements, and (ii) testing as a cloud service, which involves using the cloud as testing middleware to execute large-scale simulations of real-time user interactions. This study examines the methodologies and tools used in cloud testing and the current research trends in cloud computing testing.
Jia Yao, Babak Maleki Shoja, Nasseh Tabrizi
A Robust Multi-terminal Support Method Based on Tele-Immersion Multimedia Technology
Abstract
In this paper, a multi-terminal support method based on tele-immersion multimedia technology is proposed. The method provides both synchronous and asynchronous requests, and it can dynamically download and decompress material through material sub-packaging and on-demand loading, which effectively prevents the main thread from being blocked by overly long request times and the browser from crashing due to an excessive memory footprint. The system solves the memory, storage, and network communication problems of cross-platform WebGL development and optimizes the memory storage mode, achieving high-speed computation and real-time processing of data. The system also integrates a system operation framework that provides unified framework support for developing sub-systems on different platforms; based on this framework, secondary development can quickly complete the process development of a sub-system.
Ronghe Wang, Bo Zhang, Haiyong Xie, Dong Jiao, Shilong Ma
CMonitor: A Monitoring and Alarming Platform for Container-Based Clouds
Abstract
Container technology has recently been adopted by industry players such as Google and Alibaba as an emerging platform for building and deploying applications. Due to their high flexibility and low cost, more and more applications use containers as the underlying resource abstraction platform. To maintain system stability and detect suspected abnormal events or operations, a monitoring and alarming mechanism for containers is necessary. In this paper, we design and implement CMonitor, a monitoring and alarming platform for container-based clouds, built upon the interfaces provided by Docker containers. Specifically, it adds several new functions: (i) Integrated monitoring services. CMonitor not only monitors the basic resource usage of each container but also provides hardware-level and application-level monitoring services. (ii) Global topology view. CMonitor generates a global topology structure of containers by parsing network traffic among them. (iii) Intelligent alarming mechanism. CMonitor contains several anomaly detection algorithms to identify abnormal behaviors in containers and then sends an alarm to the users. (iv) Rich visualization functions. CMonitor records runtime logs of both system resources and application performance and generates tables and figures with advanced data visualization techniques. With CMonitor, users can better understand the system's runtime status and watch for potential abnormal events, making container-based clouds more stable, efficient, and safe.
Shujian Ji, Kejiang Ye, Cheng-Zhong Xu
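The kind of per-container resource sampling such a platform layers its alarming and topology features on can be sketched with the Docker SDK for Python; this is generic Docker usage, not CMonitor's implementation.

```python
import docker

client = docker.from_env()
for c in client.containers.list():
    s = c.stats(stream=False)                 # one-shot stats snapshot
    mem = s["memory_stats"].get("usage", 0)
    print(c.name, f"mem={mem / 1024 ** 2:.1f} MiB")
```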
CPR: Client-Side Processing of Range Predicates
Abstract
Range predicates are important to diverse application workloads. A system may process range predicates using a server-side solution, a client-side solution, or a hybrid of the two. This study presents CPR, a client-side solution that caches the results of range predicates and looks them up. The implementation provides strong consistency and supports alternative write policies. It is embodied in a flexible framework named RangeQP that provides the hybrid solution. We quantify the strengths and limitations of CPR compared with the server-side solution.
Shahram Ghandeharizadeh, Yazeed Alabdulkarim, Hieu Nguyen
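A toy version of the client-side idea: keep results of past range predicates and answer a new range locally when a cached interval covers it. CPR's consistency machinery and write policies are beyond this sketch.

```python
cache = {}  # (lo, hi) -> list of (key, value) rows in that range

def lookup(lo, hi):
    for (clo, chi), rows in cache.items():
        if clo <= lo and hi <= chi:           # cached interval covers query
            return [r for r in rows if lo <= r[0] <= hi]
    return None                               # miss: go to the server

cache[(0, 100)] = [(5, "a"), (42, "b"), (77, "c")]
print(lookup(40, 80))                         # [(42, 'b'), (77, 'c')]
```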
Correction to: Cloud Computing – CLOUD 2019
Dilma Da Silva, Qingyang Wang, Liang-Jie Zhang
Backmatter
Metadata
Title
Cloud Computing – CLOUD 2019
Editors
Dilma Da Silva
Qingyang Wang
Liang-Jie Zhang
Copyright Year
2019
Electronic ISBN
978-3-030-23502-4
Print ISBN
978-3-030-23501-7
DOI
https://doi.org/10.1007/978-3-030-23502-4
