
2009 | Book

Principles of Distributed Systems

13th International Conference, OPODIS 2009, Nîmes, France, December 15-18, 2009. Proceedings

Edited by: Tarek Abdelzaher, Michel Raynal, Nicola Santoro

Publisher: Springer Berlin Heidelberg

Book series: Lecture Notes in Computer Science


About this book

This book constitutes the refereed proceedings of the 13th International Conference on Principles of Distributed Systems, OPODIS 2009, held in Nîmes, France, in December 2009. The 23 full papers and 4 short papers presented were carefully reviewed and selected from 72 submissions. The papers are organized in topical sections on distributed scheduling, distributed robotics, fault and failure detection, wireless and social networks, synchronization, storage systems, distributed agreement, and distributed algorithms.

Table of contents

Frontmatter

Invited Talks

Transactional Memory Today: A Status Report
Abstract
The term “Transactional Memory” was coined back in 1993, but even today, there is a vigorous debate about its merits. This debate sometimes generates more heat than light: terms are not always well-defined and criteria for making judgments are not always clear.
In this talk, I will try to impose some order on the conversation. TM itself can encompass hardware, software, speculative lock elision, and other mechanisms. The benefits sought encompass simpler implementations of highly-concurrent data structures, better software engineering for concurrent platforms, enhanced performance, and reduced power consumption. We will look at various terms in this cross-product and evaluate how we are doing. So far.
Maurice Herlihy
Navigating the Web 2.0 with Gossple
Abstract
Social networks and collaborative tagging systems have taken off at an unexpected scale and speed (Facebook, YouTube, Flickr, Last.fm, Delicious, etc.). Web content is now generated by you, me, our friends and millions of others. This represents a revolution in usage and a great opportunity to leverage collaborative knowledge to enhance the user’s Internet experience. The Gossple project aims at precisely achieving this: automatically capturing affinities between users that are potentially unknown yet share similar interests or exhibit similar behaviors on the Web. This can fully personalize the Web 2.0 experience, increasing the ability of a user to find relevant content, get relevant recommendations, etc. This personalization calls for decentralization. (1) Centralized servers might dissuade users from generating new content, for they expose their privacy and represent a single point of attack. (2) The amount of information to store grows exponentially with the size of the system, and centralized systems cannot sustain storing a growing amount of data at a user granularity. We believe that salvation can only come from a fully decentralized, user-centric approach where every participant is entrusted to harvest the Web for information relevant to her own activity. This poses a number of scientific challenges: how to discover similar users, how to build and manage a network of similar users, how to define the relevant metrics for such personalization, how to preserve privacy when needed, how to deal with free-riders and misbehavior, and how to manage a growing amount of data efficiently.
Anne-Marie Kermarrec

Distributed Scheduling

Transactional Scheduling for Read-Dominated Workloads
Abstract
The transactional approach to contention management guarantees atomicity by aborting transactions that may violate consistency. A major challenge in this approach is to schedule transactions in a manner that reduces the total time to perform all transactions (the makespan), since transactions are often aborted and restarted. The performance of a transactional scheduler can be evaluated by the ratio between its makespan and the makespan of an optimal, clairvoyant scheduler that knows the list of resource accesses that will be performed by each transaction, as well as its release time and duration.
This paper studies transactional scheduling in the context of read-dominated workloads; these common workloads include read-only transactions, i.e., those that only observe data, and late-write transactions, i.e., those that update only towards the end of the transaction.
We present the Bimodal transactional scheduler, which is especially tailored to accommodate read-only transactions without punishing transactions that write for most of their duration, called early-write transactions. It is evaluated by comparison with an optimal clairvoyant scheduler; we prove that Bimodal achieves the best competitive ratio achievable by a non-clairvoyant scheduler for workloads consisting of early-write and read-only transactions.
We also show that late-write transactions significantly deteriorate the competitive ratio of any non-clairvoyant scheduler, assuming it takes a conservative approach to conflicts.
Hagit Attiya, Alessia Milani
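For reference, the competitive-ratio measure used in the abstract above can be spelled out as the standard definition (our rendering, not quoted from the paper):

\[ \mathrm{CR}(\mathcal{A}) \;=\; \sup_{I} \frac{\mathrm{makespan}_{\mathcal{A}}(I)}{\mathrm{makespan}_{\mathrm{OPT}}(I)}, \]

where \(I\) ranges over workloads and OPT is the clairvoyant scheduler that knows each transaction's resource accesses, release time, and duration in advance.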
Performance Evaluation of Work Stealing for Streaming Applications
Abstract
This paper studies the performance of parallel stream computations on a multiprocessor architecture using a work-stealing strategy. Incoming tasks are split into a number of jobs allocated to the processors, and whenever a processor becomes idle, it steals a fraction (typically half) of the jobs from a busy processor. We propose a new model for the performance analysis of such parallel stream computations. This model takes into account both the algorithmic behavior of work-stealing and the hardware constraints of the architecture (synchronizations and bus contentions). Then, we show that this model can be solved using a recursive formula. We further show that this recursive analytical approach is more efficient than the classic global balance technique. However, our method remains computationally impractical when tasks split into many jobs or when many processors are considered. Therefore, bounds are proposed to efficiently solve very large models in an approximate manner. Experimental results show that these bounds are tight and robust, so that they immediately find applications in optimization studies. An example is provided for the optimization of energy consumption under performance constraints. In addition, our framework is flexible, and we show how it adapts to deal with several stealing strategies.
Jonatha Anselmi, Bruno Gaujal
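A minimal toy simulation of the steal-half policy described in the abstract above (illustrative only: it ignores the synchronization and bus-contention costs the paper's model accounts for, and all names are ours):

    import random

    def simulate_steal_half(total_jobs, num_procs, max_steps=10**6):
        # Toy discrete-time model: each processor holds a queue length; in every
        # step an idle processor steals half the jobs of a random busy processor,
        # then every non-empty queue completes one job.
        queues = [0] * num_procs
        queues[0] = total_jobs                  # all jobs start on one processor
        for step in range(max_steps):
            for p in range(num_procs):
                if queues[p] == 0:              # idle: try to steal
                    victims = [q for q in range(num_procs) if queues[q] > 1]
                    if victims:
                        v = random.choice(victims)
                        half = queues[v] // 2
                        queues[v] -= half
                        queues[p] += half
            queues = [q - 1 if q > 0 else 0 for q in queues]
            if sum(queues) == 0:
                return step + 1                 # steps until completion (makespan)
        return max_steps

    print(simulate_steal_half(total_jobs=100, num_procs=4))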
Not All Fair Probabilistic Schedulers Are Equivalent
Abstract
We propose a novel, generic definition of probabilistic schedulers for population protocols. We then identify the consistent probabilistic schedulers, and prove that any consistent scheduler that assigns a non-zero probability to any transition from a configuration i to a configuration j with i ≠ j is fair with probability 1. This is a new theoretical framework that aims to simplify proving specific probabilistic schedulers fair. In this paper we propose two new schedulers, the State Scheduler and the Transition Function Scheduler. Both possess the significant capability of being protocol-aware, i.e. they can assign transition probabilities based on information concerning the underlying protocol. By using our framework we prove that the proposed schedulers, and also the Random Scheduler that was defined by Angluin et al. [2], are all fair with probability 1. Finally, we define and study equivalence between schedulers w.r.t. performance and correctness, and prove that there exist fair probabilistic schedulers that are not equivalent w.r.t. performance and others that are not equivalent w.r.t. correctness.
Ioannis Chatzigiannakis, Shlomi Dolev, Sándor P. Fekete, Othon Michail, Paul G. Spirakis
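For context, a sketch of the Random Scheduler mentioned above, which picks an ordered pair of distinct agents uniformly at random at each step; the toy protocol and all names are our own illustration:

    import random

    def run_population_protocol(states, delta, steps):
        # Random Scheduler (Angluin et al.): at each step an ordered pair of
        # distinct agents is chosen uniformly at random and the protocol's
        # transition function delta is applied to their states.
        n = len(states)
        for _ in range(steps):
            i, j = random.sample(range(n), 2)   # (initiator, responder)
            states[i], states[j] = delta(states[i], states[j])
        return states

    # Toy protocol: when two leaders ("L") meet, one becomes a follower ("F").
    def delta(a, b):
        return ("L", "F") if a == "L" and b == "L" else (a, b)

    print(run_population_protocol(["L"] * 5, delta, steps=500))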
Brief Announcement: Relay: A Cache-Coherence Protocol for Distributed Transactional Memory
Abstract
Distributed transactional memory promises to alleviate difficulties with lock-based (distributed) synchronization and object performance bottlenecks in distributed systems. The design of the cache-coherence protocol is critical to the performance of distributed transactional memory systems. We evaluate the performance of a cache-coherence protocol by measuring its worst-case competitive ratio — i.e., the ratio of its makespan to the makespan of the optimal cache-coherence protocol. We establish an upper bound on the competitive ratio and show that it is determined by the worst-case number of aborts, the maximum locating stretch, and the maximum moving stretch of the protocol — the first such result. We present the Relay protocol, a novel cache-coherence protocol which optimizes these values, and evaluate its performance. We show that Relay’s competitive ratio is improved by a factor of \(O(N_i)\) for \(N_i\) transactions requesting the same object when compared against past distributed queuing protocols.
Bo Zhang, Binoy Ravindran

Distributed Robotics

Byzantine Convergence in Robot Networks: The Price of Asynchrony
Abstract
We study the convergence problem in fully asynchronous, uni-dimensional robot networks that are prone to Byzantine (i.e. malicious) failures. In these settings, oblivious anonymous robots with arbitrary initial positions are required to eventually converge to an a priori unknown position despite a subset of them exhibiting Byzantine behavior. Our contribution is twofold. We propose a deterministic algorithm that solves the problem in the most generic settings: fully asynchronous robots that operate in the non-atomic CORDA model. Our algorithm provides convergence in networks of 5f + 1 robots, where f is the upper bound on the number of Byzantine robots. Additionally, we prove that 5f + 1 is a lower bound whenever robot scheduling is fully asynchronous. This contrasts with previous results in partially synchronous robot networks, where 3f + 1 robots are necessary and sufficient.
Zohir Bouzid, Maria Gradinariu Potop-Butucaru, Sébastien Tixeuil
Deaf, Dumb, and Chatting Asynchronous Robots
Enabling Distributed Computation and Fault-Tolerance among Stigmergic Robots
Abstract
We investigate avenues for the exchange of information (explicit communication) among deaf and dumb mobile robots scattered in the plane. We introduce the use of movement-signals (analogous to flight signals and the bees’ waggle dance) as a means to transfer messages, enabling the use of distributed algorithms among robots. We propose one-to-one deterministic movement protocols that implement explicit communication among asynchronous robots. We first show how the movements of robots can provide implicit acknowledgment in asynchronous systems. We use this result to design one-to-one communication among a pair of robots. Then, we propose two one-to-one communication protocols for any system of n ≥ 2 robots. The former works for robots equipped with observable IDs that agree on a common direction (sense of direction). The latter enables one-to-one communication assuming robots devoid of any observable IDs or sense of direction. All three protocols (for either two or any number of robots) assume that no robot remains inactive forever. However, they cannot prevent the robots from moving either away from or closer to each other, and therefore require robots with infinite visibility. In this paper, we also show how to overcome these two disadvantages.
These protocols enable the use of distributed algorithms based on message exchanges among swarms of stigmergic robots. They also allow robots to tolerate faults in their communication devices by providing an alternative means of communication.
Yoann Dieudonné, Shlomi Dolev, Franck Petit, Michael Segal
Synchronization Helps Robots to Detect Black Holes in Directed Graphs
Abstract
The paper considers a team of robots which has to explore a graph G in which some nodes can be harmful. Robots are initially located at the so-called home base node. The dangerous nodes are the so-called black hole nodes, and once a robot enters one of them, it is destroyed. The goal is to find a strategy to explore G in such a way that the minimum number of robots is wasted. The exploration ends when there is at least one surviving robot that knows all the edges leading to the black holes. As many variations of the problem have been considered so far, the solution and its measure heavily depend on the initial knowledge and the capabilities of the robots. In this paper, we assume that G is a directed graph, the robots are associated with unique identifiers, they know the number of nodes n of G (or at least an upper bound on n), and they know the number of edges Δ leading to the black holes. Each node is associated with a whiteboard where robots can read and write information in a mutually exclusive way.
A recently posed question [Czyzowicz et al., Proc. SIROCCO’09] is whether some number of robots, expressed as a function of parameter Δ only, is sufficient to detect black holes in directed graphs of arbitrarily large order n. We give a positive answer to this question for the synchronous case, i.e., when the robots share a common clock, showing that \(O(\Delta \cdot 2^\Delta)\) robots are sufficient to solve the problem. This bound is nearly tight, since it is known that at least \(2^\Delta\) robots are required for some instances. Quite surprisingly, we also show that unlike in the case of undirected graphs, for the directed version of the problem synchronization can sometimes make a difference: for Δ = 1, 2 robots are always sufficient and sometimes required to explore the graph regardless of whether synchronization is present; however, for Δ = 2, in the synchronous case 4 robots are always sufficient, whereas in the asynchronous case at least 5 robots are sometimes required.
Adrian Kosowski, Alfredo Navarra, Cristina M. Pinotti

Fault and Failure Detection

The Fault Detection Problem
Abstract
One of the most important challenges in distributed computing is ensuring that services are correct and available despite faults. Recently it has been argued that fault detection can be factored out from computation, and that a generic fault detection service can be a useful abstraction for building distributed systems. However, while fault detection has been extensively studied for crash faults, little is known about detecting more general kinds of faults.
This paper explores the power and the inherent costs of generic fault detection in a distributed system. We propose a formal framework that allows us to partition the set of all faults that can possibly occur in a distributed computation into several fault classes. Then we formulate the fault detection problem for a given fault class, and we show that this problem can be solved for only two specific fault classes, namely omission faults and commission faults. Finally, we derive tight lower bounds on the cost of solving the problem for these two classes in asynchronous message-passing systems.
Andreas Haeberlen, Petr Kuznetsov
The Minimum Information about Failures for Solving Non-local Tasks in Message-Passing Systems
Abstract
This paper defines the basic notions of local and non-local tasks, and determines the minimum information about failures that is necessary to solve any non-local task in message-passing systems. It also introduces a natural weakening of the well-known set agreement task, and shows that, in some precise sense, it is the weakest non-local task in message-passing systems.
Carole Delporte-Gallet, Hugues Fauconnier, Sam Toueg
Enhanced Fault-Tolerance through Byzantine Failure Detection
Abstract
We consider a variant of the Byzantine failure model in which Byzantine processes are eventually detected and silenced, and investigate the fault-tolerance of the classical broadcast and agreement problems. We show that if all Byzantine processes are eventually detected, then it is possible to solve the broadcast problem in the presence of any number of Byzantine processes. If only a fraction of the Byzantine processes can be detected, then we show that it is possible to solve consensus (and broadcast) if the total number of processes is N ≥ 2f + 3F + 1, where f is the number of Byzantine processes that are eventually detected and F is the number of those that are never detected. We also show that 2f + 3F + 1 is a lower bound on the number of processes for solving the consensus and broadcast problems.
Rida A. Bazzi, Maurice Herlihy

Wireless and Social Networks

Decentralized Polling with Respectable Participants
Abstract
We consider the polling problem in a social network where participants care about their reputation: they do not want their vote to be disclosed, nor their misbehavior, if any, to be publicly exposed. Assuming this reputation concern, we show that a simple secret sharing scheme, combined with verification procedures, can efficiently enable polling without the need for any central authority or heavyweight cryptography.
More specifically, we present DPol, a simple and scalable distributed polling protocol where misbehaving nodes are exposed with a non-zero probability and the probability of dishonest participants violating privacy is balanced with their impact on the accuracy of the polling result. The trade-off is captured by a generic parameter of the protocol, an integer k we call the privacy parameter, so that in a system of N nodes with \(B<\sqrt{N}\) dishonest participants, the probability of disclosing a participant’s vote is bounded by \((B/N)^{k+1}\), whereas the impact on the polling result is bounded by \((6k+2)B\).
We report on the deployment of DPol over 400 PlanetLab nodes. The polling result suffers a relative error of less than 10% in the face of the message losses, crashes and asynchrony inherent in PlanetLab. In the presence of dishonest nodes, our experiments show that the impact on the polling result is \((4k+1)B\) on average, consistently lower than the theoretical bound of \((6k+2)B\).
Rachid Guerraoui, Kévin Huguenin, Anne-Marie Kermarrec, Maxime Monod
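To make the privacy parameter k concrete, here is a sketch of the vote-encoding step as we understand it from the abstract: each vote in {−1, +1} becomes 2k+1 shares that sum to the vote. Proxy dissemination, aggregation, and DPol's verification procedures are omitted, and all Python names are ours:

    import random

    def make_shares(vote, k):
        # k+1 shares carry the vote and k carry its opposite, so the shares sum
        # to the vote while any k of them are consistent with either ballot.
        shares = [vote] * (k + 1) + [-vote] * k
        random.shuffle(shares)
        return shares

    def tally(shares_of_all_nodes):
        # Summing every share of every node recovers the sum of the votes.
        return sum(sum(shares) for shares in shares_of_all_nodes)

    votes = [1, 1, -1, 1, -1]
    print(tally([make_shares(v, k=2) for v in votes]))   # prints 1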
Efficient Power Utilization in Multi-radio Wireless Ad Hoc Networks
Abstract
Short-range wireless communication capabilities enable the creation of ad hoc networks between devices such as smart-phones or sensors, spanning, e.g., an entire high school or a small university campus. This paper is motivated by the proliferation of devices equipped with multiple such capabilities, e.g., Bluetooth (BT) and WiFi for smart-phones, or ZigBee and WiFi for sensors. Yet, each of these interfaces has significantly different and, to a large extent, complementary characteristics in terms of energy efficiency, transmission range, and bandwidth. Consequently, a viable ad hoc network composed of such devices must be able to utilize the combination of these capabilities in a clever way. For example, BT is an order of magnitude more power efficient than WiFi, but its transmission range is also an order of magnitude shorter. Hence, one would want to shut down as many WiFi transmitters as possible, while still ensuring overall network connectivity. Moreover, for latency and network capacity reasons, in addition to pure connectivity, a desired property of such a solution is to keep the number of BT hops traversed by each transmission below a given threshold k.
This paper addresses this issue by introducing the novel k-Weighted Connected Dominating Set (kWCDS) problem and providing a formal definition for it. A distributed algorithm with a proven approximation ratio is presented, followed by a heuristic protocol. While the heuristic protocol has no formally proven approximation ratio, it behaves better than the first protocol in many practical network densities. Beyond that, a tradeoff between communication overhead and the quality of the resulting kWCDS emerges. The paper includes simulation results that explore the performance of the two protocols.
Roy Friedman, Alex Kogan
Adversarial Multiple Access Channel with Individual Injection Rates
Abstract
We study deterministic distributed broadcasting on a synchronous multiple-access channel. Packets are injected into stations by a window-type adversary that is constrained by an individual injection rate for each station and a window w. We investigate what queue sizes and packet latency can be achieved with the maximum throughput of one packet per round. A protocol knows the number n of all the stations but knows neither the window nor the individual rates of the stations. We study the power of full sensing and acknowledgment-based protocols as compared to general adaptive ones. We show that individual injection rates make it possible to achieve bounded packet latency by full sensing protocols, which contrasts with the model of global injection rates, for which stability and finite waiting times are not achievable together by general protocols. We show that packet latency is \(\Omega\bigl(w\,\frac{\log n}{\log w}\bigr)\) when w ≤ n and Ω(w) when w > n. We give a full sensing protocol for channels with collision detection and an adaptive one for channels without collision detection that achieve \({\mathcal O}({\rm min}(n+w,w\log n))\) packet latency. We develop a full sensing protocol for a channel without collision detection that achieves \({\mathcal O}(n+w)\) queues and \({\mathcal O}(nw)\) packet latency.
Lakshmi Anantharamu, Bogdan S. Chlebus, Mariusz A. Rokicki

Synchronization

NB-FEB: A Universal Scalable Easy-to-Use Synchronization Primitive for Manycore Architectures
Abstract
This paper addresses the problem of universal synchronization primitives that can support scalable thread synchronization for large-scale manycore architectures. The universal synchronization primitives that have been deployed widely in conventional architectures are the compare-and-swap (CAS) and load-linked/store-conditional (LL/SC) primitives. However, such synchronization primitives are expected to reach their scalability limits in the evolution to manycore architectures with thousands of cores.
We introduce a non-blocking full/empty bit primitive, or NB-FEB for short, as a promising synchronization primitive for parallel programming on manycore architectures. We show that the NB-FEB primitive is universal, scalable, feasible and easy to use. NB-FEB, together with registers, can solve the consensus problem for an arbitrary number of processes (universality). NB-FEB is combinable, namely its memory requests to the same memory location can be combined into only one memory request, which consequently makes NB-FEB scalable (scalability). Since NB-FEB is a variant of the original full/empty bit that always returns a value instead of waiting for a conditional flag, it is as feasible as the original full/empty bit, which has been implemented in many computer systems (feasibility). We construct, on top of NB-FEB, a non-blocking software transactional memory system called NBFEB-STM, which can be used as an abstraction to handle concurrent threads easily. NBFEB-STM is space efficient: the space complexity of each object updated by N concurrent threads/transactions is \({\it \Theta}(N)\), which is optimal.
Phuong Hoai Ha, Philippas Tsigas, Otto J. Anshus
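As a rough illustration of the "always returns a value instead of waiting" behavior the abstract describes, here is a sketch of a full/empty-bit word whose operations never block. The operation names and exact semantics are our assumptions, not necessarily the primitive set defined in the paper, and the lock merely emulates hardware atomicity:

    import threading

    class NBFEBWord:
        # A word carrying a data value plus a full/empty flag. Every operation
        # returns immediately with the old (value, flag) pair instead of
        # blocking on the flag, in the spirit of NB-FEB.
        def __init__(self, value=0, full=False):
            self._lock = threading.Lock()
            self.value, self.full = value, full

        def store_and_set(self, v):        # write value, set flag to full
            with self._lock:
                old = (self.value, self.full)
                self.value, self.full = v, True
                return old

        def load_and_clear(self):          # read value, set flag to empty
            with self._lock:
                old = (self.value, self.full)
                self.full = False
                return old

    w = NBFEBWord()
    print(w.store_and_set(42))   # (0, False)
    print(w.load_and_clear())    # (42, True)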
Gradient Clock Synchronization Using Reference Broadcasts
Abstract
Reference-Broadcast Synchronization (RBS) is a technique that allows a set of receivers in a broadcast network to accurately estimate each other’s clock values. RBS provides a relative time-frame for conversion between the local clocks of different nodes, and can be used to synchronize nodes to an external time source such as GPS. However, RBS by itself does not output a logical clock at every node, and so it does not solve internal clock synchronization.
In this work we study the theoretical properties of RBS in the worst-case model, in which the performance of a clock synchronization algorithm is measured by the worst-case skew it can incur. We suggest a method by which RBS can be incorporated in standard internal clock synchronization algorithms. This is achieved by separating the task of estimating the clock values of other nodes in the network from the task of using these estimates to output a logical clock value.
The separation is modelled using a virtual estimate graph, overlaid on top of the real network graph, which represents the information various nodes can obtain about each other. RBS estimates are represented in the estimate graph as edges between nodes at distance 2 from each other in the original network graph. A clock synchronization algorithm then operates on the estimate graph as though it were the original network.
To illustrate the merits of this approach, we modify a recent optimal gradient clock synchronization algorithm to work in this setting. The modified algorithm transparently takes advantage of RBS estimates. Its quality of synchronization depends on the diameter of the estimate graph, which is typically much smaller than the diameter of the original network graph.
Fabian Kuhn, Rotem Oshman
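A sketch of the virtual estimate graph construction described above: the original edges are kept, and an estimate edge is added between every two receivers of a common broadcaster (nodes at distance 2). Estimate uncertainties and the synchronization algorithm itself are omitted, and the code is our illustration:

    from itertools import combinations

    def estimate_graph(adj):
        # adj maps each node to the set of its neighbours in the real network.
        est = {u: set(neigh) for u, neigh in adj.items()}
        for broadcaster, receivers in adj.items():
            # Any two receivers of the same broadcast can RBS-estimate each other.
            for u, v in combinations(sorted(receivers), 2):
                est[u].add(v)
                est[v].add(u)
        return est

    # Path a - b - c: RBS adds the estimate edge {a, c} through b's broadcasts.
    print(estimate_graph({"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}))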
Brief Announcement: Communication-Efficient Self-stabilizing Protocols for Spanning-Tree Construction
Abstract
Most self-stabilizing protocols require every pair of neighboring processes to communicate with each other repeatedly and forever, even after converging to legitimate configurations. Such permanent communication impairs efficiency, but is necessary by the very nature of self-stabilization. It is therefore challenging to minimize the number of process pairs that communicate after convergence. In this paper, we investigate the possibility of communication-efficient self-stabilization for spanning-tree construction, which allows only O(n) pairs of neighboring processes to communicate repeatedly after convergence.
Toshimitsu Masuzawa, Taisuke Izumi, Yoshiaki Katayama, Koichi Wada

Storage Systems

On the Impact of Serializing Contention Management on STM Performance
Abstract
Transactional memory (TM) is an emerging concurrent programming abstraction. Numerous software-based transactional memory (STM) implementations have been developed in recent years. STM implementations must guarantee transaction atomicity and isolation. In order to ensure progress, an STM implementation must resolve transaction collisions by consulting a contention manager (CM).
Recent work established that serializing contention management - a technique in which the execution of colliding transactions is serialized for eliminating repeat-collisions - can dramatically improve STM performance in high-contention workloads. In low-contention and highly-parallel workloads, however, excessive serialization of memory transactions may limit concurrency too much and hurt performance. It is therefore important to better understand how the impact of serialization on STM performance varies as a function of workload characteristics.
We investigate how serializing CM influences the performance of STM systems. Specifically, we study serialization’s influence on STM throughput (number of committed transactions per time unit) and efficiency (ratio between the extent of “useful” work done by the STM and work “wasted” by aborts) as the workload’s level of contention varies. Towards this goal, we implement CBench - a synthetic benchmark that generates workloads in which transactions have a pre-determined (parameterized) length and probability of being aborted in the absence of contention reduction mechanisms. CBench facilitates evaluating the efficiency of contention management algorithms across the full spectrum of contention levels.
The characteristics of TM workloads generated by real applications may vary over time. To achieve good performance, CM algorithms need to monitor these characteristics and change their behavior accordingly. We implement adaptive algorithms that control the activation of serializing CM according to the measured contention level, based on a novel low-overhead serialization algorithm. We then evaluate our new algorithms on CBench-generated workloads and on additional well-known STM benchmark applications. We believe our results shed light on the manner in which serializing CM should be used by STM systems.
Tomer Heber, Danny Hendler, Adi Suissa
On the Efficiency of Atomic Multi-reader, Multi-writer Distributed Memory
Abstract
This paper considers quorum-replicated, multi-writer, multi-reader (MWMR) implementations of survivable atomic registers in a distributed message-passing system with processors prone to failures. Previous implementations in such settings invariably required two rounds of communication between readers/writers and replica owners. Hence the question arises whether it is possible to have single round read and/or write operations in this setting.
We thus devise an algorithm, called SFW, that exploits a new technique called server side ordering (SSO), which, unlike previous approaches, places partial responsibility for the ordering of write operations on the replica owners (the servers). With SSO, fast write operations are introduced for the very first time in the MWMR setting. We prove that our algorithm preserves atomicity in all permissible executions. While algorithm SFW shows that in principle fast writes are possible, we also show that under certain conditions the MWMR model imposes inherent limitations on any quorum-based fast write implementation of a safe read/write register, and potentially even restricts the number of writer participants in the system. In this case our algorithm achieves near optimal efficiency.
Burkhard Englert, Chryssis Georgiou, Peter M. Musial, Nicolas Nicolaou, Alexander A. Shvartsman
Abortable Fork-Linearizable Storage
Abstract
We address the problem of emulating a shared read/write memory in a message passing system using a storage server prone to Byzantine failures. Although cryptography can be used to ensure confidentiality and integrity of the data, nothing can prevent a malicious server from returning obsolete data. Fork-linearizability [1] guarantees that if a malicious server hides an update of some client from another client, then these two clients will never see each other’s updates again. Fork-linearizability is arguably the strongest consistency property attainable in the presence of a malicious server. Recent work [2] has shown that there is no fork-linearizable shared memory emulation that supports wait-free operations. On the positive side, it has been shown that lock-based emulations exist [1,2]. Lock-based protocols are fragile because they are blocking if clients may crash. In this paper we present for the first time lock-free emulations of fork-linearizable shared memory. We have developed two protocols, Linear and Concur. With a correct server, both protocols guarantee linearizability and that every operation successfully completes in the absence of step contention, while interfering operations terminate by aborting. The Concur algorithm additionally ensures that concurrent operations invoked on different registers complete successfully.
Matthias Majuntke, Dan Dobre, Marco Serafini, Neeraj Suri

Distributed Agreement

On the Computational Power of Shared Objects
Abstract
We propose a new classification for evaluating the strength of shared objects. The classification is based on finding, for each object of type o, the strongest progress condition for which it is possible to solve consensus for any number of processes, using any number of objects of type o and atomic registers. We use the strongest progress condition to associate with each object a number called the power number of that object. Objects with higher power numbers are considered stronger. Then, we define the power hierarchy, an infinite hierarchy of objects in which the objects at level i are exactly those objects with power number i. Comparing our classification with the traditional one, which is based on fixing the progress condition (namely, wait-freedom) and finding the largest number of processes for which consensus is solvable, reveals interesting results. Our equivalence and extended universality results provide a deeper understanding of the nature of the relative computational power of shared objects.
Gadi Taubenfeld
Weak Synchrony Models and Failure Detectors for Message Passing (k-)Set Agreement
Abstract
The recent discovery of the weakest failure detector \({\mathcal{L}}\) for message passing set agreement has renewed the interest in exploring the border between solvable and unsolvable problems in message passing systems. This paper contributes to this research by introducing two novel system models \({\mathcal{M}^\text{anti}}\) and \({\mathcal{M}^\text{sink}}\) with very weak synchrony requirements, where \({\mathcal{L}}\) can be implemented. To the best of our knowledge, they are the first message passing models where set agreement is solvable but consensus is not. We also generalize \({\mathcal{L}}\) by a novel “(n−k)-loneliness” failure detector \({\mathcal{L}}(k)\), which makes it possible to solve k-set agreement but not (k−1)-set agreement. We also present an algorithm that solves k-set agreement with \({\mathcal{L}}(k)\), which is anonymous in that it does not require unique process identifiers. This reveals that \({\mathcal{L}}\) is also the weakest failure detector for anonymous set agreement. Finally, we analyze the relationship between \({\mathcal{L}}(k)\) and other failure detectors, namely the limited scope failure detector \({\mathcal{S}}_{n-k+1}\) and the quorum failure detector Σ.
Martin Biely, Peter Robinson, Ulrich Schmid
Unifying Byzantine Consensus Algorithms with Weak Interactive Consistency
Abstract
The paper considers the consensus problem in a partially synchronous system with Byzantine processes. In this context, the literature distinguishes authenticated Byzantine faults, where messages can be signed by the sending process (with the assumption that the signature cannot be forged by any other process), and Byzantine faults, where there is no mechanism for signatures (but the receiver of a message knows the identity of the sender). The paper proposes an abstraction called weak interactive consistency (WIC) that unifies consensus algorithms with and without signed messages. WIC can be implemented with and without signatures.
The power of WIC is illustrated on two seminal Byzantine consensus algorithms: the Castro-Liskov PBFT algorithm (no signatures) and the Martin-Alvisi FaB Paxos algorithm (signatures). WIC allows a very concise expression of these two algorithms.
Zarko Milosevic, Martin Hutle, André Schiper

Distributed Algorithms

Safe and Eventually Safe: Comparing Self-stabilizing and Non-stabilizing Algorithms on a Common Ground
(Extended Abstract)
Abstract
Self-stabilizing systems can be started in any arbitrary state and converge to exhibit the desired behavior. However, self-stabilizing systems can also be started in predefined initial states, in the same way as non-stabilizing systems. In this case, a self-stabilizing system can mask faults just like any other distributed system. Moreover, whenever faults overwhelm the system beyond its capability to mask them, the stabilizing system recovers to exhibit eventual safety and liveness, while the behavior of non-stabilizing systems is undefined and may well remain totally and permanently undesired. We demonstrate the importance of defining the initial state of a self-stabilizing system in a specific case of distributed reset over a system composed of several layers of self-stabilizing algorithms. A self-stabilizing stabilization detector ensures that, at first, only the very first layer(s) take action, and that higher layers are then activated, ensuring smooth restarts while preserving the stabilization property. The safety of initialized self-stabilizing systems, combined with their better ability to regain safety and liveness following severe conditions, is then demonstrated over the classical fault-masking modular redundancy architecture.
Sylvie Delaët, Shlomi Dolev, Olivier Peres
Proactive Fortification of Fault-Tolerant Services
Abstract
We present an approach for incorporating intrusion resilience into replicated services, irrespective of the service replication used and of the fault types tolerated. The approach, termed FORTRESS, involves fortifying a fault-tolerant service using proxies that block clients from accessing the servers directly, and periodically refreshing proxies and servers with diverse executables generated using code randomization. These two features make it hard for an attacker to compromise a server when no proxy has been compromised. An analytical evaluation establishes that if attackers cannot intrude into servers without first having compromised a proxy, fortifying even a passively replicated service can offer greater resilience than building that service as a deterministic state machine and actively replicating it over diverse platforms. Finally, the FORTRESS architecture is presented, where proactive code randomization is achieved by proactive replacement of server and proxy nodes. Examining the state transfer protocol executed during node replacement shows that the processing overhead per replacement is no more than the overhead for changing the leader or the primary replica in replication management.
Paul Ezhilchelvan, Dylan Clarke, Isi Mitrani, Santosh Shrivastava
Robustness of the Rotor-router Mechanism
Abstract
We consider the model of exploration of an undirected graph G by a single agent, which is called the rotor-router mechanism or the Propp machine (among other names). Let \(\pi_v\) indicate the edge adjacent to a node v which the agent took on its last exit from v. The next time the agent enters node v, first a “rotor” at node v advances pointer \(\pi_v\) to the edge \({\it next}(\pi_v)\) which is next after the edge \(\pi_v\) in a fixed cyclic order of the edges adjacent to v. Then the agent is directed onto edge \(\pi_v\) to move to the next node. It was shown before that after the initial O(mD) steps, the agent periodically follows one established Eulerian cycle, that is, in each period of 2m consecutive steps the agent traverses each edge exactly twice, once in each direction. The parameters m and D are the number of edges in G and the diameter of G. We investigate the robustness of such exploration in the presence of faults in the pointers \(\pi_v\) or dynamic changes in the graph. We show that after the exploration establishes an Eulerian cycle,
  • if at some step the values of k pointers π v are arbitrarily changed, then a new Eulerian cycle is established within O(km) steps;
  • if at some step k edges are added to the graph, then a new Eulerian cycle is established within O(km) steps;
  • if at some step an edge is deleted from the graph, then a new Eulerian cycle is established within O(γm) steps, where γ is the smallest number of edges in a cycle in graph G containing the deleted edge.
Our proofs are based on the relation between Eulerian cycles and spanning trees known as the “BEST” Theorem (after de Bruijn, van Aardenne-Ehrenfest, Smith and Tutte).
Evangelos Bampas, Leszek Gąsieniec, Ralf Klasing, Adrian Kosowski, Tomasz Radzik
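The mechanism described above is compact enough to simulate directly; a minimal sketch (our code, with list order serving as the fixed cyclic order of edges at each node):

    def rotor_router_walk(adj, start, steps):
        # Rotor-router (Propp machine) exploration: each node keeps a pointer
        # into a fixed cyclic order of its incident edges; on every visit the
        # pointer first advances, then the agent leaves along it.
        pointer = {v: len(adj[v]) - 1 for v in adj}   # "last exit"; advances before use
        v, visited = start, [start]
        for _ in range(steps):
            pointer[v] = (pointer[v] + 1) % len(adj[v])   # rotor advances first
            v = adj[v][pointer[v]]                        # then the agent moves
            visited.append(v)
        return visited

    # Triangle graph: the walk settles into an Eulerian circulation of the
    # 2m directed edges (each undirected edge traversed once in each direction).
    adj = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
    print(rotor_router_walk(adj, 0, 12))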
Brief Announcement: Analysis of an Optimal Bit Complexity Randomised Distributed Vertex Colouring Algorithm
(Extended Abstract)
Abstract
Let G = (V,E) be a simple undirected graph. A vertex colouring of G assigns a colour to each vertex in such a way that neighbours have different colours.
In this paper we discuss how efficient (time and bits) vertex colouring may be accomplished by exchange of bits between neighbouring vertices. The distributed complexity of vertex colouring is of fundamental interest for the study and analysis of distributed computing. Usually, the topology of a distributed system is modelled by a graph and paradigms of distributed systems are encoded by classical problems in graph theory; among these classical problems one may cite the problems of vertex colouring, computing a maximal independent set, finding a vertex cover or finding a maximal matching. Each solution to one of these problems is a building block for many distributed algorithms: symmetry breaking, topology control, routing, resource allocation.
Yves Métivier, John Michael Robson, Nasser Saheb-Djahromi, Akka Zemmari
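For intuition, a generic randomised colouring round in which vertices exchange tentative colours with their neighbours and keep non-conflicting ones. This illustrates the distributed colouring setting only; it is not necessarily the algorithm analysed in this announcement:

    import random

    def colouring_round(adj, colours, palette):
        # One round: every uncoloured vertex draws a tentative colour uniformly
        # from the palette and keeps it only if no neighbour drew the same
        # tentative colour or already holds it permanently.
        tentative = {v: random.randrange(palette)
                     for v in adj if colours[v] is None}
        for v, c in tentative.items():
            if all(tentative.get(u) != c and colours[u] != c for u in adj[v]):
                colours[v] = c

    adj = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
    colours = {v: None for v in adj}
    while any(c is None for c in colours.values()):
        colouring_round(adj, colours, palette=4)   # palette > max degree suffices
    print(colours)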
Brief Announcement: Distributed Swap Edges Computation for Minimum Routing Cost Spanning Trees
Abstract
Given a weighted graph \(G = (V_G, E_G)\) representing a communication network, with n nodes and m edges, where the weights are positive integers, a spanning tree of G is typically used to route messages. In [1] the routing cost of a spanning tree is defined as the sum of the distances over all pairs of vertices of this tree. Hence, the most suitable spanning tree for the routing problem is the one minimizing the routing cost: the Minimum Routing Cost Spanning Tree (MRCST).
Linda Pagli, Giuseppe Prencipe
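Spelled out, the routing cost from [1] referenced in the abstract above is

\[ c(T) \;=\; \sum_{u,v \in V_G} d_T(u,v), \]

where \(d_T(u,v)\) denotes the distance between u and v in the tree T, and the MRCST is a spanning tree of G minimizing \(c(T)\).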
Backmatter
Metadata
Title
Principles of Distributed Systems
Edited by
Tarek Abdelzaher
Michel Raynal
Nicola Santoro
Copyright year
2009
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-642-10877-8
Print ISBN
978-3-642-10876-1
DOI
https://doi.org/10.1007/978-3-642-10877-8