About this book

This book constitutes the refereed post-conference proceedings of the 4th International Symposium on Algorithmic Aspects of Cloud Computing, ALGOCLOUD 2018, held in Helsinki, Finland, in August 2018.

The 11 revised full papers were carefully reviewed and selected from 29 submissions. The aim of the symposium is to present research activities and results on topics related to algorithmic, design, and development aspects of modern cloud-based systems.

Table of Contents


Minimization of Testing Costs in Capacity-Constrained Database Migration

Database migration is a ubiquitous need faced by enterprises that generate and use vast amounts of data. The need arises from database software updates, or from changes to hardware, project standards, and other business factors [1]. Migrating a large collection of databases is a far more challenging task than migrating a single database, due to the presence of additional constraints, including shift capacities, database sizes, and timing relationships. In this paper, we present a comprehensive framework that can be used to model the database migration problems of different enterprises with customized constraints, by appropriately instantiating the parameters of the framework. We establish the computational complexity of a number of instantiations of this framework. We present fixed-parameter intractability results for various relevant parameters of the database migration problem. Finally, we discuss a randomized approximation algorithm for an interesting instantiation.
K. Subramani, Bugra Caskurlu, Alvaro Velasquez

Community Detection via Neighborhood Overlap and Spanning Tree Computations

Most of today's social networks are populated by several million active users, while the most popular among them accommodate well over one billion. Analyzing such huge complex networks has become particularly demanding in computational terms. A task of paramount importance for understanding the structure of social networks, as well as of many other real-world systems, is to identify communities, that is, sets of nodes that are more densely connected to each other than to the rest of the network. In this paper we propose two algorithms for community detection in networks, employing the neighborhood overlap metric and appropriate spanning tree computations.
Ketki Kulkarni, Aris Pagourtzis, Katerina Potika, Petros Potikas, Dora Souliou
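The neighborhood overlap metric the abstract refers to is standard: for an edge (a, b), the fraction of nodes adjacent to a or b (excluding a and b themselves) that are adjacent to both. A minimal sketch, where the toy graph and variable names are ours, not the paper's:

```python
def neighborhood_overlap(adj, a, b):
    """Neighborhood overlap of edge (a, b): common neighbors of a and b
    divided by the neighbors of either endpoint, excluding a and b."""
    na, nb = adj[a] - {b}, adj[b] - {a}
    union = (na | nb) - {a, b}
    return len(na & nb) / len(union) if union else 0.0

# Toy graph: two triangles joined by a single "weak" bridge edge (3, 4).
edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (5, 6), (4, 6)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

weights = {(u, v): neighborhood_overlap(adj, u, v) for u, v in edges}
# Intra-triangle edges share a common neighbor; the bridge (3, 4) shares none,
# which is what makes the metric useful as an edge weight for community cuts.
```

Using these weights in a maximum spanning tree concentrates intra-community edges in the tree, leaving low-overlap bridges as natural places to cut.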

Colocation, Colocation, Colocation: Optimizing Placement in the Hybrid Cloud

Today’s enterprise customer has to decide how to distribute her services among multiple clouds - between on-premise private clouds and public clouds - so as to optimize different objectives, e.g., minimizing bottleneck resource usage, maintenance downtime, bandwidth usage or privacy leakage. These use cases motivate a general formulation, the uncapacitated (A defining feature of clouds is their elasticity or ability to scale with load) multidimensional load assignment problem - VITA(F) (Vectors-In-Total Assignment): the input consists of n d-dimensional load vectors \(\bar{V} = \{\bar{V}_i | 1\le i \le n\}\), m cloud buckets \(B = \{B_j | 1\le j \le m\}\) with associated weights \(w_j\), and assignment constraints represented by a bipartite graph \(G=(\bar{V} \cup B, E \subseteq \bar{V} \times B)\) restricting load \(\bar{V}_i\) to be assigned only to buckets \(B_j\) with which it shares an edge (In a slight abuse of notation, we let \(B_j\) also denote the subset of vectors assigned to bucket \(B_j\)). F can be any operator mapping a vector to a scalar, e.g., \(\max \), \(\min \), etc. The objective is to partition the vectors among the buckets, respecting assignment constraints, so as to achieve
$$\begin{aligned} \min [ \sum _j w_j*F (\sum _{\bar{V}_i \in B_j} \bar{V}_i)] \end{aligned}$$
We characterize the complexity of VITA\((\min )\), VITA\((\max )\), VITA\((\max - \min )\) and VITA\((2^{nd}\max )\) by providing hardness results and approximation algorithms (LP-Approx), based on clever rounding of carefully crafted linear programs. Employing real-world traces from Nutanix, a leading hybrid cloud provider, we perform a comprehensive comparative evaluation against three natural heuristics - Conservative, Greedy and Local-Search. Our main finding is that, on real-world workloads too, LP-Approx outperforms the heuristics in terms of quality in all but one case.
Srinivas Aiyar, Karan Gupta, Rajmohan Rajaraman, Bochao Shen, Zhifeng Sun, Ravi Sundaram
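The VITA objective can be made concrete with a tiny exhaustive-search sketch (ours, for intuition only; it ignores the assignment graph and is exponential, unlike the paper's LP-based algorithms):

```python
from itertools import product

def vita_brute_force(vectors, weights, F):
    """Try every assignment of vectors to buckets and return the one
    minimizing sum_j w_j * F(componentwise sum of bucket j's vectors)."""
    m, d = len(weights), len(vectors[0])
    best = None
    for assign in product(range(m), repeat=len(vectors)):
        sums = [[0] * d for _ in range(m)]
        for vec, j in zip(vectors, assign):
            for k in range(d):
                sums[j][k] += vec[k]
        cost = sum(w * F(s) for w, s in zip(weights, sums))
        if best is None or cost < best[0]:
            best = (cost, assign)
    return best

# Two 2-dimensional loads, two unit-weight buckets, F = max coordinate.
# Colocating both loads costs max(4, 4) = 4; splitting them costs 3 + 3 = 6.
cost, assign = vita_brute_force([(3, 1), (1, 3)], [1, 1], max)
```

The example shows why VITA\((\max )\) rewards colocation when loads stress complementary dimensions, which is the intuition behind the paper's title.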

A Peer-to-Peer Based Cloud Storage Supporting Orthogonal Range Queries of Arbitrary Dimension

We present a peer-to-peer network that supports the efficient processing of orthogonal range queries in a d-dimensional point space. The network is the same for each dimension, namely a distance halving network like the one introduced by Naor and Wieder (ACM TALG’07). We show how to execute such range queries using \(\mathcal {O}\left( 2^{d'}d\,\log m + d\,|R|\right) \) hops (and the same number of messages) in total. Here \([m]^d\) is the ground set, while \(|R|\) is the size and \(d'\) the dimension of the queried range \(R = [a_1, b_1] \times \dots \times [a_d, b_d]\). Furthermore, if the peers form a distributed network, the query can be answered in \(\mathcal {O}\left( d\,\log m + d\,\sum _{i=1}^{d}(b_i-a_i+1)\right) \) communication rounds. Our algorithms are based on a mapping of the Hilbert curve through \([m]^d\) to the peers.
Markus Benter, Till Knollmann, Friedhelm Meyer auf der Heide, Alexander Setzer, Jannik Sundermeier
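The Hilbert-curve mapping underlying the construction can be illustrated in two dimensions with the well-known index computation (a sketch of the standard algorithm; the paper works with curves through \([m]^d\)):

```python
def xy2d(n, x, y):
    """Map a point (x, y) of an n-by-n grid (n a power of two) to its
    index along the 2-d Hilbert curve, by descending quadrant by quadrant."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/reflect the quadrant so recursion sees a canonical curve.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        s //= 2
    return d

# On the 2x2 grid the curve visits (0,0), (0,1), (1,1), (1,0) in order.
order = [xy2d(2, x, y) for x, y in [(0, 0), (0, 1), (1, 1), (1, 0)]]
```

The curve's locality (consecutive indices are grid neighbors) is what lets a d-dimensional box query be decomposed into few contiguous index intervals on the peers.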

A Fully Polynomial Time Approximation Scheme for Packing While Traveling

Understanding the interaction between different combinatorial optimization problems is a challenging task of high relevance for numerous real-world applications including modern computer and memory architectures as well as high performance computing. Recently, the Traveling Thief Problem (TTP), as a combination of the classical traveling salesperson problem and the knapsack problem, has been introduced to study these interactions in a systematic way. We investigate the underlying non-linear Packing While Traveling Problem (PWTP) of the TTP where items have to be selected along a fixed route. We give an exact dynamic programming approach for this problem and a fully polynomial time approximation scheme (FPTAS) when maximizing the benefit that can be gained over the baseline travel cost. Our experimental investigations show that our new approaches outperform current state-of-the-art approaches on a wide range of benchmark instances.
Frank Neumann, Sergey Polyakovskiy, Martin Skutella, Leen Stougie, Junhua Wu
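FPTAS constructions of this kind typically follow the classical profit-scaling recipe: scale profits down, run the exact dynamic program on the scaled instance. As background, a sketch of that recipe on plain 0/1 knapsack - not the paper's PWTP scheme, which additionally tracks position along the route and accumulated load:

```python
def knapsack_fptas(values, weights, capacity, eps):
    """Profit-scaling FPTAS for 0/1 knapsack: scale values by K = eps*vmax/n,
    run an exact min-weight-per-profit DP, return a (1-eps)-approximate value."""
    n = len(values)
    vmax = max(values)
    K = eps * vmax / n if vmax > 0 else 1
    scaled = [int(v // K) for v in values]
    dp = {0: 0}  # dp[p] = minimum weight achieving scaled profit p
    for v, w in zip(scaled, weights):
        # Iterate a snapshot (descending) so each item is used at most once.
        for p, wt in sorted(dp.items(), reverse=True):
            if wt + w <= capacity and (p + v not in dp or dp[p + v] > wt + w):
                dp[p + v] = wt + w
    return max(dp) * K  # within (1 - eps) of the true optimum

# Exact optimum here is 220 (items 2 and 3); eps = 0.1 guarantees >= 198.
approx = knapsack_fptas([60, 100, 120], [10, 20, 30], capacity=50, eps=0.1)
```

The DP table stays polynomial because scaled profits are bounded by \(n/\epsilon\), while rounding loses at most an \(\epsilon\) fraction of the optimum.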

Multi-commodity Flow with In-Network Processing

Modern networks run “middleboxes” that offer services ranging from network address translation and server load balancing to firewalls, encryption, and compression. In an industry trend known as Network Functions Virtualization (NFV), these middleboxes run as virtual machines on any commodity server, and the switches steer traffic through the relevant chain of services. Network administrators must decide how many middleboxes to run, where to place them, and how to direct traffic through them, based on the traffic load and the server and network capacity. Rather than placing specific kinds of middleboxes on each processing node, we argue that server virtualization allows each server node to host all middlebox functions, and simply vary the fraction of resources devoted to each one. This extra flexibility fundamentally changes the optimization problem the network administrators must solve to a new kind of multi-commodity flow problem, where the traffic flows consume bandwidth on the links as well as processing resources on the nodes. We show that allocating resources to maximize the processed flow can be optimized exactly via a linear programming formulation, and to arbitrary accuracy via an efficient combinatorial algorithm. Our experiments with real traffic and topologies show that a joint optimization of node and link resources leads to an efficient use of bandwidth and processing capacity. We also study a class of design problems that decide where to provide node capacity to best process and route a given set of demands, and demonstrate both approximation algorithms and hardness results for these problems.
Moses Charikar, Yonatan Naamad, Jennifer Rexford, X. Kelvin Zou
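In broad strokes, the exact linear program can be sketched as follows (our notation, simplified: \(x_k(e)\) is commodity k's flow on link e, \(p_k(v)\) the amount of commodity k processed at node v, with link capacities c(e) and node processing capacities C(v); the full formulation additionally distinguishes processed from unprocessed flow so that every unit is processed exactly once en route):
$$\begin{aligned} \max \sum _k \sum _v p_k(v) \quad \text {s.t.} \quad \sum _k x_k(e) \le c(e)\ \forall e, \qquad \sum _k p_k(v) \le C(v)\ \forall v, \end{aligned}$$
together with per-commodity flow conservation and non-negativity. The novelty relative to ordinary multi-commodity flow is the second family of constraints, which couples routing with node processing.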

On-Line Big-Data Processing for Visual Analytics with Argus-Panoptes

Analyses with data mining and knowledge discovery techniques are not always successful, as they occasionally yield no actionable results. This is especially true in the Big-Data context, where we routinely deal with complex, heterogeneous, diverse and rapidly changing data. In this context, visual analytics play a key role in helping both experts and users to readily comprehend and better manage analyses carried out on data stored in Infrastructure as a Service (IaaS) cloud services. To this end, humans should play a critical role in continually ascertaining the value of the processed information and are invariably deemed to be the instigators of actionable tasks. The latter is facilitated with the assistance of sophisticated tools that let humans interface with the data through vision and interaction. When working with Big-Data problems, both the scale and the nature of the data undoubtedly present a barrier to implementing responsive applications. In this paper, we propose a software architecture that seeks to empower Big-Data analysts with visual analytics tools atop large-scale data stored in and processed by IaaS. Our key goal is to not only yield on-line analytic processing but also provide the facilities for users to effectively interact with the underlying IaaS machinery. Although we focus on hierarchical and spatiotemporal datasets here, our proposed architecture is general and can be applied to a wide range of application domains. The core design principles of our approach are: (a) on-line processing on the cloud with Apache Spark; (b) integration of interactive programming following the notebook paradigm through Apache Zeppelin; (c) robust operation when data and/or schema change on the fly. Through experimentation with a prototype of our suggested architecture, we demonstrate not only the viability of our approach but also its value in a use case involving publicly available crime data from the United Kingdom.
Panayiotis I. Vlantis, Alex Delis

An Overview of Big Data Issues in Privacy-Preserving Record Linkage

Nearly 90% of today’s data have been produced only in the last two years! These data come from a multitude of human activities, including social networking sites, mobile phone applications, electronic medical records systems, e-commerce sites, etc. Integrating and analyzing this wealth and volume of data offers remarkable opportunities in sectors that are of high interest to businesses, governments, and academia. Given that the majority of the data are proprietary and may contain personal or business-sensitive information, Privacy-Preserving Record Linkage (PPRL) techniques are essential to perform data integration. In this paper, we review existing work in PPRL, focusing on the computational aspect of the proposed algorithms, which is crucial when dealing with Big Data. We propose an analysis tool for the computational aspects of PPRL, and characterize existing PPRL techniques along five dimensions. Based on our analysis, we identify research gaps in current literature and promising directions for future work.
Dinusha Vatsalan, Dimitrios Karapiperis, Aris Gkoulalas-Divanis
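One widely used PPRL building block is Bloom-filter encoding of q-gram sets, which lets two parties compare encoded names by a set-similarity score without exchanging the plaintext. A toy sketch (the filter size m, hash count k, and names are illustrative, not parameters from the survey):

```python
import hashlib

def bigrams(s):
    s = f"_{s.lower()}_"  # pad so first/last letters form bigrams too
    return {s[i:i + 2] for i in range(len(s) - 1)}

def bloom_encode(name, m=64, k=4):
    """Hash each bigram of `name` k times into an m-bit Bloom filter."""
    bits = 0
    for g in bigrams(name):
        for i in range(k):
            h = hashlib.sha256(f"{i}:{g}".encode()).digest()
            bits |= 1 << (int.from_bytes(h, "big") % m)
    return bits

def dice(b1, b2):
    """Dice similarity of two bit vectors: 2|A & B| / (|A| + |B|)."""
    inter = bin(b1 & b2).count("1")
    total = bin(b1).count("1") + bin(b2).count("1")
    return 2 * inter / total if total else 0.0

# Similar names score high, dissimilar names low - without revealing strings.
s_close = dice(bloom_encode("johnson"), bloom_encode("jonson"))
s_far = dice(bloom_encode("johnson"), bloom_encode("miller"))
```

The computational appeal, central to the survey's Big Data focus, is that comparisons reduce to cheap bitwise operations, at some cost in privacy guarantees.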

Web Frameworks Metrics and Benchmarks for Data Handling and Visualization

This paper presents benchmarks of a web application that displays and visualizes an air-quality monitoring dataset, implemented in two different programming languages. Specifically, the application is developed using PHP and Python frameworks in order to study the impact of the CPU, the hard-disk architecture and the operating system on each system's performance. Detailed tests have been conducted regarding the necessary computing resources as well as the use of the network (delay, CPU usage, etc.) for different operating systems and hardware specifications.
Alexandros Gazis, Eleftheria Katsiri

Algorithms for Cloud-Based Smart Mobility

Innovative algorithm technology plays an important role in smart city applications. In this work, we review some recent innovative algorithmic approaches that contributed decisively to the development of efficient and effective cloud-based systems for smart mobility in urban environments.
Kalliopi Giannakopoulou

A Frequent Itemset Hiding Toolbox

Advances in data collection and storage technologies have led to the establishment of transactional databases in companies and organizations, as they allow enormous amounts of data to be stored efficiently. Useful knowledge can be mined from these data, which can be used in several ways depending on the nature of the data. Quite often companies and organizations are willing to share data for the sake of mutual benefit. However, the sharing of such data comes with risks, as privacy problems may arise. Sensitive data, along with sensitive knowledge inferred from them, must be protected from unintentional exposure to unauthorized parties. One form of inferred knowledge is frequent patterns, mined in the form of frequent itemsets from transactional databases. The problem of protecting such patterns from being discovered is known as the frequent itemset hiding problem. In this paper we present a toolbox which provides several implementations of frequent itemset hiding algorithms. Firstly, we summarize the most important aspects of each algorithm. We then introduce the architecture of the toolbox and its novel features. Finally, we provide experimental results on real-world datasets, demonstrating the capabilities of the toolbox and the convenience it offers in comparing different algorithms.
Aris Gkoulalas-Divanis, Vasileios Kagklis, Elias C. Stavropoulos
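The hiding problem can be illustrated with a deliberately naive distortion heuristic (ours, not one of the toolbox's algorithms): delete an item of the sensitive itemset from supporting transactions until its support falls below the mining threshold.

```python
def support(db, itemset):
    """Fraction of transactions containing every item of `itemset`."""
    return sum(itemset <= t for t in db) / len(db)

def hide_itemset(db, sensitive, min_sup):
    """Remove one (arbitrarily chosen) item of `sensitive` from supporting
    transactions until support(sensitive) < min_sup. Works on a copy."""
    db = [set(t) for t in db]
    victim = next(iter(sensitive))  # illustrative victim-item choice
    for t in db:
        if support(db, sensitive) < min_sup:
            break
        if sensitive <= t:
            t.discard(victim)
    return db

db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
before = support(db, {"a", "b"})                       # 3/5, i.e. frequent
after = support(hide_itemset(db, {"a", "b"}, 0.4), {"a", "b"})
```

Even this toy version shows the core tension the toolbox's algorithms manage: each deletion also perturbs the support of non-sensitive itemsets, so better algorithms minimize such side effects.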

