main-content

## Über dieses Buch

This book constitutes the refereed proceedings of the 16th International Conference on Principles of Distributed Systems, OPODIS 2012, held in Rome, Italy, in December 2012. The 24 papers presented were carefully reviewed and selected from 89 submissions. The conference is an international forum for the exchange of state-of-the-art knowledge on distributed computing and systems. Papers were sought soliciting original research contributions to the theory, specification, design and implementation of distributed systems.

## Inhaltsverzeichnis

### FixMe: A Self-organizing Isolated Anomaly Detection Architecture for Large Scale Distributed Systems

Abstract
Monitoring a system is the ability of collecting and analyzing relevant information provided by the monitored devices so as to be continuously aware of the system state. However, the ever growing complexity and scale of systems makes both real time monitoring and fault detection a quite tedious task. Thus the usually adopted option is to focus solely on a subset of information states, so as to provide coarse-grained indicators. As a consequence, detecting isolated failures or anomalies is a quite challenging issue. In this work, we propose to address this issue by pushing the monitoring task at the edge of the network. We present a peer-to-peer based architecture, which enables nodes to adaptively and efficiently self-organize according to their “health” indicators. By exploiting both temporal and spatial correlations that exist between a device and its vicinity, our approach guarantees that only isolated anomalies (an anomaly is isolated if it impacts solely a monitored device) are reported on the fly to the network operator. We show that the end-to-end detection process, i.e., from the local detection to the management operator reporting, requires a logarithmic number of messages in the size of the network.
Emmanuelle Anceaume, Erwan Le Merrer, Romaric Ludinard, Bruno Sericola, Gilles Straub

Abstract

### Range Queries in Non-blocking k-ary Search Trees

Abstract
We present a linearizable, non-blocking k-ary search tree (k-ST) that supports fast searches and range queries. Our algorithm uses single-word compare-and-swap (CAS) operations, and tolerates any number of crash failures. Performance experiments show that, for workloads containing small range queries, our k-ST significantly outperforms other algorithms which support these operations, and rivals the performance of a leading concurrent skip-list, which provides range queries that cannot always be linearized.
Trevor Brown, Hillel Avni

### On the Polling Problem for Social Networks

Abstract
We tackle the polling problem in social networks where the privacy of exchanged information and user reputation are very critical. Indeed, users want to preserve the confidentiality of their votes and to hide, if any, their misbehaviors. Recent works [7,8] proposed polling protocols based on simple secret sharing scheme and without requiring any central authority or cryptography system. But these protocols can be deployed safely provided that the social graph structure should be transformed into a ring-based structure and the number of participating users is perfect square. Accordingly, devising polling protocols regardless these constraints remains a challenging issue.
In this paper, we propose a simple decentralized polling protocol that relies on the current state of social graphs. More explicitly, we define one family of social graphs and show their structures constitute necessary and sufficient condition to ensure vote privacy and limit the impact of dishonest users on the accuracy of the output of the poll. In a system of N users with D ≤ N/5 dishonest ones (and similarly to the works [7,8] where they considered $$D<\sqrt{N}$$), a privacy parameter k enables us to obtain the following results: (i) the probability to recover one vote of honest node is bounded by $$\sum_{m=k+1}^{2k}\bigl(\frac{D}{N}\bigr)^{m}.\bigl(\frac{1}{2}\bigr)^{2k+1-m}$$; (ii) the maximum number of votes revealed by dishonest nodes is 2D; and, (iii) the maximum impact on the output is (6k + 4)D. Despite the use of richer social graph structures, we succeed to detect the misbehaving users by manipulating verification procedures based on shortest path scheme and routing tables. An experimental evaluation demonstrates that the dishonest coalition never affects the outcome of the poll outside the theoretical bound of (6k + 4)D.

### Non-deterministic Population Protocols

Abstract
In this paper we show that, in terms of generated output languages, non-deterministic population protocols are strictly more powerful than deterministic ones. Analyzing the reason for this negative result, we propose two slightly enhanced models, in which non-deterministic population protocols can be exactly simulated by deterministic ones. First, we consider a model in which interactions are not only between couples of agents, but also between triples and in which non-uniform initial states are allowed. We generalize this transformation and we prove a general property for a model with interactions between any number of agents. Second, we simulate any non-deterministic population protocol by a deterministic one in a model where a configuration can have an empty output.
Non-deterministic and deterministic population protocols are then compared in terms of inclusion of their output languages, that is, in terms of solvability of problems. We present a transformation realizing this inclusion. It uses (again) the natural model with interactions of triples, but does not need non-uniform initial states. As before, this result is generalized for the natural model with interactions between any number of agents.
Note that the transformations in the paper apply to a whole class of non-deterministic population protocols (for a proposed model), in contrast with the transformations proposed in previous works, which apply only to a specific sub-class of protocols (satisfying a so called “elasticity” condition).
Joffroy Beauquier, Janna Burman, Laurent Rosaz, Brigitte Rozoy

### Stochastic Modeling of Dynamic Distributed Systems with Crash Recovery and Its Application to Atomic Registers

Abstract
In a dynamic distributed system, processes can join and leave the system. We consider such a system in which processes are subject to crash failures from which they may recover. Assuming a stochastic model for joining, leaving, crashing, and recovering of processes, we provide a probabilistic analysis of the long-term behavior of the system. As an example of the utility of our modeling, we provide a specification and implementation of an atomic register in such a system. The dynamic nature of the system can cause all active processes to leave or crash, leaving the system in a dormant state. We analyze the average time spent in dormant states that can give us some insight into the behavior of the register system.
Silvia Bonomi, Andreas Klappenecker, Hyunyoung Lee, Jennifer L. Welch

### When and How Process Groups Can Be Used to Reduce the Renaming Space

Abstract
Considering the M-renaming problem and process groups, this paper investigates the following question: Is there a relation between the number of groups and the size of the new name space M? This question can be rephrased as follows: Can the initial partitioning of the processes into m groups allows the size of the renaming space M to be reduced, and if yes, how much?
This paper answers the previous questions. Let n denote the number of processes. Assuming that the processes are initially partitioned into m = n − ℓ non-empty groups, such that each process knows only its identity and its group number, the paper first presents a wait-free M-renaming algorithm whose size of the new name space is M = n + 2ℓ − 1. For $$\frac{n}{2} < m \leq n-1$$ (i.e. $$1\leq \ell < \frac{n}{2}$$), we have M < 2n − 1, which shows that, when the number of groups is greater than $$\frac{n}{2}$$, groups allow to circumvent the renaming lower bound in read/write systems. Then, on the lower bound size, the paper shows that there are pairs of values (n,m) such that there is no read/write wait-free M-renaming algorithm for which M ≤ 2n − 2. This impossibility result breaks our hope to have a renaming algorithm providing a new name space whose size would decrease “regularly” as the number of groups increases from 1 to n. Finally, the paper considers the case where each group includes at least s processes. This algorithm shows that, when m is such that $$\frac{n}{s+1}< m < \frac{n}{s}$$, there is an M-renaming algorithm where M = 3n − (s + 1)m − 1 = n(2 − s) + (s + 1)ℓ − 1. Hence, the paper leaves open the following question: For any n and s = 1, does the predicate $$m > \frac{n}{2}$$ define a threshold on the number of groups which allows the 2n − 2 lower bound on the renaming space size to be bypassed?
Armando Castañeda, Michel Raynal, Julien Stainer

Abstract
We consider the task of electing a leader in a distributed manner in ad hoc multi-hop radio networks. Radio networks represent the class of wireless networks in which one frequency is used for transmissions, network’s topology can be represented by a simple undirected graph with some n nodes, and there is no collision detection. We give a randomized algorithm electing a leader in $$\mathcal{O}(n)$$ expected time and prove that this time bound is optimal. We give a deterministic algorithm electing a leader in $$\mathcal{O}(n\log^{3/2}n \sqrt{\log\log n})$$ time. By way of application, we show how to perform gossiping with combined messages in $$\mathcal{O}(n\log^{3/2} n \sqrt{\log\log n})$$ time by a deterministic algorithm, and in $$\mathcal{O}(n)$$ expected time by a randomized algorithm.
Bogdan S. Chlebus, Dariusz R. Kowalski, Andrzej Pelc

### Tree Exploration by a Swarm of Mobile Agents

Abstract
A swarm of mobile agents starting at the root of a tree has to explore it: every node of the tree has to be visited by at least one agent. In every round, each agent can remain idle or move to an adjacent node. In any round all agents have to be at distance at most d, where d is a parameter called the range of the swarm. The goal is to explore the tree as fast as possible.
If the topology of the tree is known to the agents, we establish optimal exploration time for any range d and give an optimal exploration algorithm. The formula for the optimal exploration time of a tree by a swarm of agents depends on the range of the swarm and on the characteristics of the tree. If the tree is unknown, the quality of an exploration algorithm $$\mathcal{A}$$ is measured by comparing its time to that of the optimal algorithm having full knowledge of the tree. The ratio between these times, maximized over all starting nodes and over all trees, is called the overhead of algorithm $$\mathcal{A}$$. Overhead 2 is achieved when the swarm executes a DFS, remaining together all the time. We show that this overhead cannot be improved, for any range d.
Jurek Czyzowicz, Andrzej Pelc, Mélanie Roy

### Crash Resilient and Pseudo-Stabilizing Atomic Registers

Abstract
We propose a crash safe and pseudo-stabilizing algorithm for implementing an atomic memory abstraction in a message passing system. Our algorithm is particularly appealing for multi-core architectures where both processors and memory contents (including stale messages in transit) are prone to errors and faults. Our algorithm extends the classical fault-tolerant implementation of atomic memory that was originally proposed by Attiya, Bar-Noy, and Dolev (ABD) to a stabilizing setting where memory can be initially corrupted in an arbitrary manner. The original ABD algorithm provides no guaranties when started in such a corrupted configuration. Interestingly, our scheme preserves the same properties as ABD when there are no transient faults, namely the linearizability of operations. When started in an arbitrarily corrupted initial configuration, we still guarantee eventual yet suffix-closed linearizability.
Shlomi Dolev, Swan Dubois, Maria Gradinariu Potop-Butucaru, Sébastien Tixeuil

### Directed Graph Exploration

Abstract
We study the problem of exploring all nodes of an unknown directed graph. A searcher has to construct a tour that visits all nodes, but only has information about the parts of the graph it already visited. The goal is to minimize the cost of such a tour. In this paper, we present upper and lower bounds for both the deterministic and the randomized online version of exploring all nodes of directed graphs. Our bounds are sharp or sharp up to a small constant, depending on the specific model. Essentially, exploring a directed graph has a multiplicative overhead linear in the number of nodes. If one wants to search for just a node in unweighted directed graphs, a greedy algorithm with quadratic multiplicative overhead can only be improved by a factor of at most two. We were also able to show that randomly choosing a starting point does not improve lower bounds beyond a small constant factor.
Klaus-Tycho Förster, Roger Wattenhofer

### Lattice Completion Algorithms for Distributed Computations

Abstract
A distributed computation is usually modeled as a finite partially ordered set (poset) of events. Many operations on this poset require computing meets and joins of subsets of events. The lattice of normal cuts of a poset is the smallest lattice that embeds the poset such that all meets and joins are defined. In this paper, we propose new algorithms to construct or enumerate the lattice of normal cuts. Our algorithms are designed for distributed computing applications and have lower time or space complexity than those of existing algorithms. We also show applications of this lattice to the problems in distributed computing such as finding the extremal events and detecting global predicates.
Vijay K. Garg

Abstract
This paper studies single hop broadcast in a single hop shared spectrum radio network. The problem requires a source to deliver a message to n receivers, where only a polynomial upper bound on n is known. The model assumes that in each round, each device can participate on 1 out of $$\mathcal{C} \geq 1$$ available communication channels, up to $$t < \mathcal{C}$$ of which might be disrupted, preventing communication. This disruption captures the unpredictable message loss that plagues real shared spectrum networks. The best existing solution to the problem, which comes from the systems literature, requires $$O\big({\frac{\mathcal{C} t}{\mathcal{C}-t}}\log{n}\big)$$ rounds. Our algorithm, by contrast, solves the problem in $$O\big(\frac{\mathcal{C}}{\mathcal{C}-t}\lceil\frac{t}{n}\rceil\log{n}\big)$$ rounds, when $$\mathcal{C} \geq \log{n}$$, and in $$O\big(\frac{\mathcal{C}}{\mathcal{C}-t}\log{n}\cdot\log{\rm log}{n}\big)$$ rounds, when $$\mathcal{C}$$ is smaller. It accomplishes this improvement by deploying a self-regulating relay strategy in which receivers that already know useful information coordinate themselves to efficiently assist the source’s broadcast. We conclude by proving these bounds tight for most cases.
Mohsen Ghaffari, Seth Gilbert, Calvin Newport, Henry Tan

### Attack-Resilient Multitree Data Distribution Topologies

Abstract
We consider a scenario of information broadcast where a source node distributes data in parallel over a fixed number of trees spanning over a large audience of nodes. The trees used for data dissemination are called distribution topology. Particular implementations of this scenario are peer-to-peer live streaming systems. Encoding data partially redundant, nodes are satisfied as long as they receive packets in at least a certain portion of trees. Otherwise, they are called isolated.
We study distribution topologies limiting the worst-case consequences of attacks suddenly removing nodes from the trees. In particular, we aim to minimize the maximum possible number of isolated nodes for each number of removed nodes. We show necessary conditions on distribution topologies closely approximating this goal. Then, we demonstrate that the attack-resilience of topologies adhering to these conditions is characterized by specific matrices that have to be Orthogonal Arrays of maximum strength. The computational complexity of finding such matrices for arbitrary dimensions is a long-standing research problem. Our results show that finding representatives of the studied distribution topologies is at least as hard as this problem.
Sascha Grau

### On the Complexity of Distributed Broadcasting and MDS Construction in Radio Networks

Abstract
We study two fundamental problems in the model of undirected radio networks: broadcasting and construction of a Minimal Dominating Set (MDS). The network is ad hoc, in the sense that initially nodes know only their own ID and the IDs of their neighbors. For both problems, we provide deterministic distributed algorithms working in $$O(D\sqrt{n} \log^6 n)$$ communication rounds, and complement them by a close lower bound $$\Omega(\sqrt{Dn\log(n/D)})$$, where n is the number of nodes and D is the radius of the radio network. Our work provides several novel algorithmic methods for overcoming the impact of collisions in radio networks, and shrinks the gap between the lower and the upper bounds for the considered problems from polynomial to polylogarithmic, for networks with small (polylogarithmic) radius.
Tomasz Jurdzinski, Dariusz R. Kowalski

### On the Impact of Identifiers on Local Decision

Abstract
The issue of identifiers is crucial in distributed computing. Informally, identities are used for tackling two of the fundamental difficulties that are inherent to deterministic distributed computing, namely: (1) symmetry breaking, and (2) topological information gathering. In the context of local computation, i.e., when nodes can gather information only from nodes at bounded distances, some insight regarding the role of identities has been established. For instance, it was shown that, for large classes of construction problems, the role of the identities can be rather small. However, for the identities to play no role, some other kinds of mechanisms for breaking symmetry must be employed, such as edge-labeling or sense of direction. When it comes to local distributed decision problems, the specification of the decision task does not seem to involve symmetry breaking. Therefore, it is expected that, assuming nodes can gather sufficient information about their neighborhood, one could get rid of the identities, without employing extra mechanisms for breaking symmetry. We tackle this question in the framework of the $$\mathcal{LOCAL}$$ model.
Let LD be the class of all problems that can be decided in a constant number of rounds in the $$\mathcal{LOCAL}$$ model. Similarly, let LD* be the class of all problems that can be decided at constant cost in the anonymous variant of the $$\mathcal{LOCAL}$$ model, in which nodes have no identities, but each node can get access to the (anonymous) ball of radius t around it, for any t, at a cost of t. It is clear that LD* ⊆ LD. We conjecture that LD*=LD. In this paper, we give several evidences supporting this conjecture. In particular, we show that it holds for hereditary problems, as well as when the nodes know an arbitrary upper bound on the total number of nodes. Moreover, we prove that the conjecture holds in the context of non-deterministic local decision, where nodes are given certificates (independent of the identities, if they exist), and the decision consists in verifying these certificates. In short, we prove that NLD*=NLD.
Pierre Fraigniaud, Magnús M. Halldórsson, Amos Korman

### Black Hole Search and Exploration in Unoriented Tori with Synchronous Scattered Finite Automata

Abstract
We consider the problem of locating a black hole in a synchronous, anonymous, and unoriented torus network using mobile agents. A black hole is a harmful network node that destroys any agent visiting it without leaving any trace. The objective is to locate the black hole using as few agents as possible. We present here an almost optimal deterministic algorithm for synchronous (partially) unoriented tori using five scattered agents with constant memory and three identical tokens. We also study the exploration problem of a safe (i.e., without black holes) unoriented torus. While it has been previously shown that there is no universal algorithm for one agent with constant memory and any constant number of tokens which can explore all cubic planar graphs, we give here the first algorithm which enables a finite automaton with two tokens to explore (without termination detection) any totally unoriented torus and we prove optimality on the number of tokens.
Euripides Markou, Michel Paquette

### Algorithms for Partial Gathering of Mobile Agents in Asynchronous Rings

Abstract
In this paper, we consider the partial gathering problem of mobile agents in asynchronous unidirectional rings equipped with whiteboards on nodes. The partial gathering problem requires, for a given input g, that each agent should move to a node and terminates so that at least g agents should meet at the same node. The requirement for the partial gathering is weaker than that for the ordinary (total) gathering, and thus, we have interests in clarifying the difference on the move complexity between them. We propose two algorithms to solve the partial gathering problem. One algorithm is deterministic and assumes unique ID of each agent. The other is randomized and assumes anonymous agents. The deterministic (resp., randomized) algorithm achieves the partial gathering in O(gn) (resp., expected O(gn + nlogk)) total number of moves where n is the ring size and k is the number of agents, while the total gathering requires Ω(kn) moves. We show that the move complexity of the deterministic algorithm is asymptotically optimal.
Masahiro Shibata, Shinji Kawai, Fukuhito Ooshita, Hirotsugu Kakugawa, Toshimitsu Masuzawa

### Causality, Influence, and Computation in Possibly Disconnected Synchronous Dynamic Networks

Abstract
In this work, we study the propagation of influence and computation in dynamic networks that are possibly disconnected at every instant. We focus on a synchronous message passing communication model with broadcast and bidirectional links. To allow for bounded end-to-end communication we propose a set of minimal temporal connectivity conditions that bound from the above the time it takes for information to make progress in the network. We show that even in dynamic networks that are disconnected at every instant information may spread as fast as in networks that are connected at every instant. Further, we investigate termination criteria when the nodes know some upper bound on each of the temporal connectivity conditions. We exploit our termination criteria to provide efficient protocols (optimal in some cases) that solve the fundamental counting and all-to-all token dissemination (or gossip) problems. Finally, we show that any protocol that is correct in instantaneous connectivity networks can be adapted to work in temporally connected networks.
Othon Michail, Ioannis Chatzigiannakis, Paul G. Spirakis

### Wait-Free Stabilizing Dining Using Regular Registers

Abstract
Dining philosophers is a scheduling paradigm that determines when processes in a distributed system should execute certain sections of their code so that processes do not execute ‘conflicting’ code sections concurrently, for some application-dependent notion of a ‘conflict’. Designing a stabilizing dining algorithm for shared-memory systems subject to process crashes presents an interesting challenge: classic stabilization relies on all processes continuing to execute actions forever, an assumption which is violated when crash failures are considered. We present a dining algorithm that is both wait-free (tolerates any number of crashes) and is pseudo-stabilizing. Our algorithm works in an asynchronous system in which processes communicate via shared regular registers and have access to the eventually perfect failure detector $$\diamondsuit \mathcal{P}$$. Furthermore, with a stronger failure detector, the solution becomes wait-free and self-stabilizing. To our knowledge, this is the first such algorithm. Prior results show that $$\diamondsuit \mathcal{P}$$ is necessary for wait-freedom.
Srikanth Sastry, Jennifer L. Welch, Josef Widder

### Node Sampling Using Random Centrifugal Walks

Abstract
Sampling a network with a given probability distribution has been identified as a useful operation. In this paper we propose distributed algorithms for sampling networks, so that nodes are selected by a special node, called the source, with a given probability distribution. All these algorithms are based on a new class of random walks, that we call Random Centrifugal Walks (RCW). A RCW is a random walk that starts at the source and always moves away from it.
Firstly, an algorithm to sample any connected network using RCW is proposed. The algorithm assumes that each node has a weight, so that the sampling process must select a node with a probability proportional to its weight. This algorithm requires a preprocessing phase before the sampling of nodes. In particular, a minimum diameter spanning tree (MDST) is created in the network, and then nodes’ weights are efficiently aggregated using the tree. The good news are that the preprocessing is done only once, regardless of the number of sources and the number of samples taken from the network. After that, every sample is done with a RCW whose length is bounded by the network diameter.
Secondly, RCW algorithms that do not require preprocessing are proposed for grids and networks with regular concentric connectivity, for the case when the probability of selecting a node is a function of its distance to the source.
The key features of the RCW algorithms (unlike previous Markovian approaches) are that (1) they do not need to warm-up (stabilize), (2) the sampling always finishes in a number of hops bounded by the network diameter, and (3) it selects a node with the exact probability distribution.
Andrés Sevilla, Alberto Mozo, Antonio Fernández Anta

### Physarum-Inspired Self-biased Walkers for Distributed Clustering

Abstract
We propose a distributed scheme to compute distance-based clusters. We first present a mechanism based on the flow of distributed tokens called walkers, circulating randomly between a source and a sink to compute a shortest path. Each time a walker takes an edge, it reinforces the probability that subsequent walkers take it. This mechanism is a discrete emulation of the slime mould (Physarum polycephalum) dynamics presented in [16]: each node observes the flow of walkers going through each adjacent edge and uses this flow to compute the probabilities with which it sends the walkers through each edge. Then, based on this mechanism, we show how several sources compute a shortest path DAG to a given sink. Finally, given some clusterheads acting like sinks, we show that this process converges to distance-based clusters (i.e. nodes join the clusterhead to which they are closest) with shortest-path DAGs. The algorithm is designed with a special focus on dynamic networks: the flow locally adapts to the appearance and disappearance of links and nodes, including clusterheads.
Devan Sohier, Giorgos Georgiadis, Simon Clavière, Marina Papatriantafilou, Alain Bui

Abstract
Wait-freedom is the strongest and most desirable progress guarantee, under which any thread must make progress when given enough CPU steps. Wait-freedom is required for hard real-time, and desirable in many other scenarios. However, because wait-freedom is hard to achieve, we usually settle for the weaker lock-free progress guarantee, under which one of the active threads is guaranteed to make progress. With lock-freedom (and unlike wait-freedom), starvation of all threads but one is possible.
The linked-list data structure is fundamental and ubiquitous. Lock-free versions of the linked-list are well known. However, whether it is possible to design a practical wait-free linked-list has remained an open question. In this work we present a practical wait-free linked-list based on the CAS primitive. To improve performance further, we also extend this design using the fast-path-slow-path methodology. The proposed design has been implemented and measurements demonstrate performance competitive with that of Harris’s lock-free list, while still providing the desirable wait-free guarantee, required for real-time systems.
Shahar Timnat, Anastasia Braginsky, Alex Kogan, Erez Petrank

### Byzantine Chain Replication

Abstract
We present a new class of Byzantine-tolerant State Machine Replication protocols for asynchronous environments that we term Byzantine Chain Replication. We demonstrate two implementations that present different trade-offs between performance and security, and compare these with related work. Leveraging an external reconfiguration service, these protocols are not based on Byzantine consensus, do not require majority-based quorums during normal operation, and the set of replicas is easy to reconfigure.
One of the implementations is instantiated with t + 1 replicas to tolerate t failures and is useful in situations where perimeter security makes malicious attacks unlikely. Applied to in-memory BerkeleyDB replication, it supports 20,000 transactions per second while a fully Byzantine implementation supports 12,000 transactions per second—about 70% of the throughput of a non-replicated database.
Robbert van Renesse, Chi Ho, Nicolas Schiper

### Backmatter

Weitere Informationen