
2005 | Book

Distributed Computing

19th International Conference, DISC 2005, Cracow, Poland, September 26-29, 2005. Proceedings


Table of Contents

Frontmatter

Invited Talks

Digital Fountains and Their Application to Informed Content Delivery over Adaptive Overlay Networks
Invited Talk

We study how to optimize throughput of large transfers across richly connected, adaptive overlay networks, focusing on the potential of collaborative transfers between peers to supplement ongoing downloads. First, we make the case for an erasure-resilient encoding of the content, using the digital fountain paradigm. Such an approach affords reliability and a substantial degree of application-level flexibility, as it seamlessly accommodates connection migration and parallel transfers while providing resilience to packet loss. We explain the history of this paradigm, focusing on recent advances in coding that allow efficient implementations of digital fountains. We also describe our previous work showing the effectiveness of digital fountains for reliable multicast and parallel downloading.
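The digital fountain paradigm rests on a simple idea: encoded symbols are XOR combinations of source blocks, and a receiver can recover the content from any sufficiently large set of symbols, regardless of which particular symbols arrive. A minimal sketch (this is not the LT/Raptor codes the talk covers, which draw degrees from a carefully tuned distribution; the uniform degree choice, integer blocks standing in for packets, and function names are illustrative assumptions):

```python
import random

def encode_symbol(blocks, rng):
    """Produce one encoded symbol: the XOR of a random subset of blocks.
    Real fountain codes (LT/Raptor) draw the degree from a tuned
    distribution; a uniform choice keeps this sketch short."""
    degree = rng.randint(1, len(blocks))
    idxs = set(rng.sample(range(len(blocks)), degree))
    value = 0
    for i in idxs:
        value ^= blocks[i]
    return idxs, value

def peel_decode(symbols, k):
    """Peeling decoder: repeatedly find a symbol with exactly one
    unrecovered block, subtract the already-recovered blocks from it,
    and emit the remaining block."""
    recovered = {}
    progress = True
    while progress and len(recovered) < k:
        progress = False
        for idxs, value in symbols:
            live = idxs - recovered.keys()
            if len(live) == 1:
                v = value
                for i in idxs & recovered.keys():
                    v ^= recovered[i]
                recovered[live.pop()] = v
                progress = True
    return recovered
```

In practice each symbol also carries (a seed for) its index set, so connection migration and parallel transfers need no coordination: any k plus a few extra symbols, from any mix of senders, suffice with high probability.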

In the setting of collaborative transfers on overlay networks, there is an additional consideration since sets of encoded symbols acquired by peers during downloads may overlap substantially. We describe a collection of useful algorithmic tools for efficient estimation, summarization, and approximate reconciliation of sets of symbols between pairs of collaborating peers, all of which keep messaging complexity and computation to a minimum. Through simulations and experiments on a prototype implementation, we demonstrate the performance benefits of our informed content delivery mechanisms.
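One standard tool for the estimation step is a min-wise hash sketch: each peer summarizes its symbol set in a few dozen words, and comparing two sketches estimates the overlap without exchanging the sets themselves. A hedged illustration (the 64-slot signature, SHA-256 salting, and function names are our choices, not the paper's exact construction):

```python
import hashlib

def minhash_signature(items, num_hashes=64, seed=0):
    """Min-wise sketch: for each of num_hashes salted hash functions,
    keep the minimum hash value over the set."""
    sig = []
    for i in range(num_hashes):
        salt = f"{seed}:{i}:".encode()
        sig.append(min(
            int.from_bytes(hashlib.sha256(salt + str(x).encode()).digest()[:8],
                           "big")
            for x in items))
    return sig

def estimate_overlap(sig_a, sig_b):
    """The fraction of matching slots estimates the Jaccard similarity
    |A ∩ B| / |A ∪ B| of the underlying sets."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)
```

Messaging cost is constant (64 slots) regardless of how many symbols each peer holds, which is the point of summarization before reconciliation.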

Michael Mitzenmacher
Securing the Net: Challenges, Failures and Directions
Invited Talk

The Internet is infamously insecure (fraudulent and spoofed sites, phishing and spam e-mail, viruses and Trojans, Denial of Service attacks, etc.) in spite of extensive efforts, standards, tools, and research. We will discuss the problems and the pitfalls, and outline solutions and directions for future applied and analytical research.

Amir Herzberg

Regular Papers

Coterie Availability in Sites

In this paper, we explore new failure models for multi-site systems, which are systems characterized by a collection of sites spread across a wide area network, each site formed by a set of computing nodes running processes. In particular, we introduce two failure models that allow sites to fail, and we use them to derive coteries. We argue that these coteries have better availability than quorums formed by a majority of processes, which are known to have the best availability when process failures are independent and identically distributed. To motivate introducing site failures explicitly into a failure model, we present availability data from a production multi-site system, showing that sites are frequently unavailable. We then discuss the implementability of our abstract models, showing possibilities for obtaining these models in practice. Finally, we present evaluation results from running an implementation of the Paxos algorithm on PlanetLab using different quorum constructions. The results show that our constructions have substantially better availability and response time compared to majority coteries.
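For reference, the majority construction the paper compares against can be stated in a few lines, and the coterie conditions (pairwise intersection and minimality) are easy to check for small systems. A sketch with illustrative names:

```python
from itertools import combinations

def majority_quorums(processes):
    """The majority coterie: every subset of floor(n/2) + 1 processes."""
    k = len(processes) // 2 + 1
    return [frozenset(q) for q in combinations(sorted(processes), k)]

def is_coterie(quorums):
    """Coterie conditions: any two quorums intersect, and no quorum
    is a proper subset of another (minimality)."""
    intersecting = all(q & r for q in quorums for r in quorums)
    minimal = all(not (q < r) for q in quorums for r in quorums)
    return intersecting and minimal
```

Site-aware constructions keep the same two conditions but choose quorum membership so that whole-site failures leave some quorum alive.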

Flavio Junqueira, Keith Marzullo
Keeping Denial-of-Service Attackers in the Dark

We consider the problem of overcoming (Distributed) Denial of Service (DoS) attacks by realistic adversaries that can eavesdrop on messages, or parts thereof, but with some delay. We show a protocol that mitigates DoS attacks by eavesdropping adversaries, using only available, efficient packet filtering mechanisms based mainly on (addresses and) port numbers. Our protocol avoids the use of fixed ports, and instead performs ‘pseudo-random port hopping’. We model the underlying packet-filtering services and define measures for the capabilities of the adversary and for the success rate of the protocol. Using these, we analyze the proposed protocol, and show that it provides effective DoS prevention for realistic attack and deployment scenarios.
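The core of pseudo-random port hopping can be illustrated with a keyed PRF over time slots: sender and receiver share a secret, derive the same port for each slot, and the packet filter drops everything else. A minimal sketch (HMAC-SHA256 as the PRF and the port range are our illustrative choices, not the paper's exact protocol, which also handles clock drift and the delay in the adversary's eavesdropping):

```python
import hashlib
import hmac

def port_for_slot(shared_key: bytes, time_slot: int) -> int:
    """Derive the port for a time slot with a PRF keyed by the shared
    secret; without the key, upcoming ports are unpredictable."""
    tag = hmac.new(shared_key, time_slot.to_bytes(8, "big"),
                   hashlib.sha256).digest()
    # map into the non-privileged port range 1024..65535
    return 1024 + int.from_bytes(tag[:4], "big") % (65536 - 1024)
```

Both endpoints evaluate this on the current slot (e.g., slot = time // hop_interval), and the receiver's filter admits only the current port, closing it when the slot expires.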

Gal Badishi, Amir Herzberg, Idit Keidar
On Conspiracies and Hyperfairness in Distributed Computing

We study the phenomenon of conspiracies, a certain class of livelocks, in distributed computations. This elementary phenomenon occurs in systems with shared variables, with shared actions, as well as in message-passing systems. We propose a new and simple characterization via a new notion of hyperfairness, which postulates the absence of conspiracies. We argue that hyperfairness is a useful tool for understanding some impossibility results, in particular results involving crash-tolerance. As a main result, we show that a large subclass of hyperfairness can be implemented through partial synchrony and randomization.

Hagen Völzer
On the Availability of Non-strict Quorum Systems

Allowing read operations to return stale data with low probability has been proposed as a means to increase availability in quorum systems. Existing solutions that allow stale reads cannot tolerate an adversarial scheduler that can maliciously delay messages between servers and clients in the system; for such a scheduler, existing solutions cannot enforce a bound on the staleness of the data read. This paper considers the possibility of increasing system availability while at the same time tolerating a malicious scheduler and guaranteeing an upper bound on the staleness of data. We characterize the conditions under which this increase is possible and show that it depends on the ratio of the write frequency to the servers’ failure frequency. For environments with a relatively large failure frequency compared to the write frequency, we propose K-quorums, which can provide higher availability than strict quorum systems and also guarantee bounded staleness. We also propose a definition of k-atomicity and present a protocol to implement a k-atomic register using k-quorums.

Amitanand Aiyer, Lorenzo Alvisi, Rida A. Bazzi
Musical Benches

We propose the musical benches problem to model a wait-free coordination difficulty that is orthogonal to previously studied ones such as agreement or symmetry breaking (leader election or renaming). A bench is the usual binary consensus problem for 2 processes. Assume n+1 processes want to sit in n benches as follows. Each one starts with a preference, consisting of a bench and one place (left or right) in the bench where it wants to sit. Each process should produce as output the place of the bench where it decides to sit. It is required that no two processes sit in different places of the same bench. Upon observing a conflict in one of the benches, an undecided process can “abandon” its initial bench and place and try to sit in another bench at another place.

The musical benches problem is so called because processes jump from bench to bench trying to find one in which they may be alone or not in conflict with one another. If at most one process starts in each bench, the problem is trivially solvable: each process stays in its place. We show that if there is just one bench in which two processes, rather than one, start, the problem is wait-free unsolvable in read/write shared memory. This impossibility establishes a new connection between distributed computing and topology, via the Borsuk-Ulam theorem.

The musical benches problem seems like just a collection of consensus problems, where, by the pigeonhole principle, at least one of them will have to be solved by two processes. Consequently, one is tempted to try to find a bivalency impossibility proof in the FLP style. Our second result shows that there is no such proof: we present an algorithm to solve the musical benches problem using set agreement, a primitive stronger than read/write registers but weaker than consensus. Thus, an FLP-style impossibility for musical benches would imply an FLP-style impossibility for set consensus.

The musical benches problem can be generalized by considering benches other than consensus, such as set agreement or renaming, leading to a very interesting class of new problems.

Eli Gafni, Sergio Rajsbaum
Obstruction-Free Algorithms Can Be Practically Wait-Free

The obstruction-free progress condition is weaker than previous nonblocking progress conditions such as lock-freedom and wait-freedom, and admits simpler implementations that are faster in the uncontended case. Pragmatic contention management techniques appear to be effective at facilitating progress in practice, but, as far as we know, none guarantees progress.

We present a transformation that converts any obstruction-free algorithm into one that is wait-free when analyzed in the unknown-bound semisynchronous model. Because all practical systems satisfy the assumptions of the unknown-bound model, our result implies that, for all practical purposes, obstruction-free implementations can provide progress guarantees equivalent to wait-freedom. Our transformation preserves the advantages of any pragmatic contention manager, while guaranteeing progress.

Faith Ellen Fich, Victor Luchangco, Mark Moir, Nir Shavit
Efficient Reduction for Wait-Free Termination Detection in a Crash-Prone Distributed System

We investigate the problem of detecting termination of a distributed computation in systems where processes can fail by crashing. Specifically, when the communication topology is fully connected, we describe a way to transform any termination detection algorithm $\mathcal{A}$ that has been designed for a failure-free environment into a termination detection algorithm $\mathcal{B}$ that can tolerate process crashes. Our transformation assumes the existence of a perfect failure detector. We show that a perfect failure detector is in fact necessary to solve the termination detection problem in a crash-prone distributed system even if at most one process can crash.

Let μ(n, M) and δ(n, M) denote the message complexity and detection latency, respectively, of $\mathcal{A}$ when the system has n processes and the underlying computation exchanges M application messages. The message complexity of $\mathcal{B}$ is at most O(n + μ(n, 0)) messages per failure more than the message complexity of $\mathcal{A}$. Also, its detection latency is at most O(δ(n, 0)) per failure more than that of $\mathcal{A}$. Furthermore, the overhead (that is, the amount of control data piggybacked) on an application message increases by only O(log n) bits per failure.

The fault-tolerant termination detection algorithm resulting from the transformation satisfies two desirable properties. First, it can tolerate the failure of up to n–1 processes, that is, it is wait-free. Second, it does not impose any overhead on the fault-sensitive termination detection algorithm until one or more processes crash, that is, it is fault-reactive. Our transformation can be extended to arbitrary communication topologies provided process crashes do not partition the system.

Neeraj Mittal, Felix C. Freiling, S. Venkatesan, Lucia Draque Penso
Non-blocking Hashtables with Open Addressing

We present the first non-blocking hashtable based on open addressing that provides the following benefits: it combines good cache locality, accessing a single cacheline if there are no collisions, with short straight-line code; it needs no storage overhead for pointers and memory allocator schemes, having instead an overhead of two words per bucket; it does not need to periodically reorganise or replicate the table; and it does not need garbage collection, even with arbitrary-sized keys. Open problems include resizing the table and replacing, rather than erasing, entries. The result is a highly-concurrent set algorithm that approaches or outperforms the best externally-chained implementations we tested, with fixed memory costs and no need to select or fine-tune a garbage collector or locking strategy.

Chris Purcell, Tim Harris
Computing with Reads and Writes in the Absence of Step Contention
Extended Abstract

This paper studies implementations of concurrent objects that exploit the absence of step contention. These implementations use only reads and writes when a process is running solo. The other processes might be busy with other objects, swapped out, failed, or simply delayed by a contention manager. We study in this paper two classes of such implementations, according to how they handle the case of step contention. The first kind, called obstruction-free implementations, are not required to terminate in that case. The second kind, called solo-fast implementations, terminate using powerful operations (e.g., C&S).

We present a generic obstruction-free object implementation that has linear contention-free step complexity (the number of reads and writes taken by a process running solo) and uses a linear number of read/write objects. We show that these complexities are asymptotically optimal, and hence generic obstruction-free implementations are inherently slow. We also prove that obstruction-free implementations cannot be gracefully degrading, namely, be nonblocking when the contention manager operates correctly, and remain (at least) obstruction-free when the contention manager misbehaves.

Finally, we show that any object has a solo-fast implementation, based on a solo-fast implementation of consensus. The implementation has linear contention-free step complexity, and we conjecture that solo-fast implementations must have non-constant step complexity, i.e., that they are also inherently slow.

Hagit Attiya, Rachid Guerraoui, Petr Kouznetsov
Restricted Stack Implementations

We introduce a new object, BH, and prove that a system with one BH object and single-writer Registers has the same computational power as a system with countably many commutative and overwriting objects. This provides a simple characterization of the class of objects that can be implemented from commutative and overwriting objects, and creates a potential tool for proving impossibility results.

It has been conjectured that Stacks and Queues shared by three or more processes are not in this class. In this paper, we use a BH object to show that two different restricted versions of Stacks are in this class. Specifically, we give an implementation of a Stack that supports any number of poppers, but at most two pushers. We also implement a Stack (or Queue) shared by any number of processes, but in which all stored elements are the same.

Matei David, Alex Brodsky, Faith Ellen Fich
Proving Atomicity: An Assertional Approach

Atomicity (or linearizability) is a commonly used consistency criterion for distributed services and objects. Although atomic object implementations are abundant, proving that algorithms achieve atomicity has turned out to be a challenging problem. In this paper, we initiate the study of systematic ways of verifying distributed implementations of atomic objects, beginning with read/write objects (registers). Our general approach is to replace the existing operational reasoning about events and partial orders with assertional reasoning about invariants and simulation relations. To this end, we define an abstract state machine that captures the atomicity property and prove correctness of the object implementations by establishing a simulation mapping between the implementation and the specification automata. We demonstrate the generality of our specification by showing that it is implemented by three different read/write register constructions: the message-passing register emulation of Attiya, Bar-Noy and Dolev, its optimized version based on real time, and the shared memory register construction of Vitanyi and Awerbuch. In addition, we show that a simplified version of our specification is implemented by a general atomic object construction based on Lamport’s replicated state machine algorithm.

Gregory Chockler, Nancy Lynch, Sayan Mitra, Joshua Tauber
Time and Space Lower Bounds for Implementations Using k-CAS
Extended Abstract

This paper presents lower bounds on the time- and space-complexity of implementations that use the k-compare-and-swap (k-CAS) synchronization primitives. We prove that the use of k-CAS primitives can improve neither the time- nor the space-complexity of implementations of widely used concurrent objects, such as counter, stack, queue, and collect. Surprisingly, the use of k-CAS may even increase the space complexity required by such implementations.

We prove that the worst-case average number of steps performed by processes for any n-process implementation of a counter, stack or queue object is Ω(log_{k+1} n), even if the implementation can use j-CAS for j ≤ k. This bound holds even if a k-CAS operation is allowed to read the k values of the objects it accesses and return these values to the calling process. This bound is tight.

We also consider more realistic non-reading k-CAS primitives. An operation of a non-reading k-CAS primitive is only allowed to return a success/failure indication. For implementations of the collect object that use such primitives, we prove that the worst-case average number of steps performed by processes is Ω(log_2 n), regardless of the value of k. This implies a round complexity lower bound of Ω(log_2 n) for such implementations. As there is an O(log_2 n) round complexity implementation of collect that uses only reads and writes, these results establish that non-reading k-CAS is no stronger than read and write for collect implementation round complexity.

We also prove that k-CAS does not improve the space complexity of implementing many objects (including counter, stack, queue, and single-writer snapshot). An implementation has to use at least n base objects even if k-CAS is allowed, and if all operations (other than read) swap exactly k base objects, then the space complexity must be at least k·n.

Hagit Attiya, Danny Hendler
(Almost) All Objects Are Universal in Message Passing Systems
Extended Abstract

This paper shows that all shared atomic object types that can solve consensus among k > 1 processes have the same weakest failure detector in a message passing system with process crash failures. In such a system, object types such as test-and-set, fetch-and-add, and queue, known to have weak synchronization power in a shared memory system, are thus, in a precise sense, equivalent to universal types like compare-and-swap, known to have the strongest synchronization power. In the particular case of a message passing system of two processes, we show that, interestingly, even a register is in that sense universal.

Carole Delporte-Gallet, Hugues Fauconnier, Rachid Guerraoui
Ω Meets Paxos: Leader Election and Stability Without Eventual Timely Links

This paper provides a realization of distributed leader election without having any eventual timely links. Progress is guaranteed in the following weak setting: eventually one process can send messages such that every message obtains f timely responses, where f is a resilience bound. A crucial facet of this property is that the f responders need not be fixed, and may change from one message to another. In particular, this means that no specific link needs to remain timely. In the (common) case where f = 1, this implies that the FLP impossibility result on consensus is circumvented if one process can at any time communicate in a timely manner with one other process in the system.

The protocol also bears significant practical importance to well-known coordination schemes such as Paxos, because our setting more precisely captures the conditions on the elected leader for reaching timely consensus. Additionally, an extension of our protocol provides leader stability, which guarantees against arbitrary demotion of a qualified leader and avoids the performance penalties associated with leader changes in schemes such as Paxos.

Dahlia Malkhi, Florin Oprea, Lidong Zhou
Plausible Clocks with Bounded Inaccuracy

In a distributed system with N processes, time stamps of size N (such as vector clocks) are necessary to accurately track potential causality between events. Plausible clocks are a family of time-stamping schemes that use smaller time stamps at the expense of some accuracy. To date, all plausible clocks have been designed to use fixed-sized time stamps, and the inaccuracy of these schemes varies from run to run. In this paper, we define a new metric, imprecision, that formally characterizes the fidelity of a plausible clock. We present a new plausible clock system that guarantees an arbitrary constant bound on imprecision. This bound is achieved by allowing time stamps to grow and shrink over the course of the computation. We verify the correctness of our algorithm, present results of a simulation study, and evaluate its performance.
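As background for the accuracy trade-off: a full vector clock of size N tracks causality exactly, with component-wise comparison deciding happened-before versus concurrent. A minimal sketch (plausible clocks shrink the N-entry vector, accepting that some concurrent pairs will be misordered; the class and method names here are illustrative):

```python
class VectorClock:
    """One N-entry clock per process; exact causality tracking."""
    def __init__(self, n, pid):
        self.v = [0] * n
        self.pid = pid

    def tick(self):
        """Local event: advance own component and return the time stamp."""
        self.v[self.pid] += 1
        return list(self.v)

    def receive(self, stamp):
        """Merge a received stamp component-wise, then count the
        receive itself as a local event."""
        self.v = [max(a, b) for a, b in zip(self.v, stamp)]
        return self.tick()

def happened_before(u, v):
    """u causally precedes v iff u <= v component-wise and u != v."""
    return all(a <= b for a, b in zip(u, v)) and u != v

def concurrent(u, v):
    return u != v and not happened_before(u, v) and not happened_before(v, u)
```

A plausible clock must never report "concurrent" for a causally ordered pair; it may, however, order some truly concurrent pairs, and the paper's imprecision metric bounds how often.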

Brad T. Moore, Paolo A. G. Sivilotti
Causing Communication Closure: Safe Program Composition with Non-FIFO Channels

A semantic framework for analyzing safe composition of distributed programs is presented. Its applicability is illustrated by a study of program composition when communication is reliable but not necessarily FIFO. In this model, special care must be taken to ensure that messages do not accidentally overtake one another in the composed program. We show that barriers do not exist in this model. Indeed, no program that sends or receives messages can automatically be composed with arbitrary programs without jeopardizing their intended behavior. Safety of composition becomes context-sensitive and new tools are needed for ensuring it. A notion of sealing is defined: if a program P is immediately followed by a program Q that seals P, then P will be communication-closed, that is, it will execute as if it runs in isolation. The investigation of sealing in this model reveals a novel connection between Lamport causality and safe composition. A characterization of sealable programs is given, as well as efficient algorithms for testing whether Q seals P and for constructing a seal for a significant class of programs. It is shown that every sealable program that is open to interference on O(n²) channels can be sealed using O(n) messages.

Kai Engelhardt, Yoram Moses
What Can Be Implemented Anonymously?

The vast majority of papers on distributed computing assume that processes are assigned unique identifiers before computation begins. But is this assumption necessary? What if processes do not have unique identifiers or do not wish to divulge them for reasons of privacy? We consider asynchronous shared-memory systems that are anonymous. The shared memory contains only the most common type of shared objects, read/write registers. We investigate, for the first time, what can be implemented deterministically in this model when processes can fail. We give anonymous algorithms for some fundamental problems: timestamping, snapshots and consensus. Our solutions to the first two are wait-free and the third is obstruction-free. We also show that a shared object has an obstruction-free implementation if and only if it satisfies a simple property called idempotence. To prove the sufficiency of this condition, we give a universal construction that implements any idempotent object.

Rachid Guerraoui, Eric Ruppert
Waking Up Anonymous Ad Hoc Radio Networks

We consider the task of waking up an anonymous ad hoc radio network from a single source, by a deterministic algorithm. In the beginning only the source is awake and has to wake up other nodes by disseminating messages throughout the network. Nodes of the network do not know its topology and they do not have distinct labels. In such networks some nodes are impossible to reach. A node in a network is accessible if it can be woken up by some (possibly network-dependent) deterministic algorithm. A deterministic wakeup algorithm for ad hoc networks is universal if it wakes up all accessible nodes in all networks. We study the question of the existence of such a universal wakeup algorithm. For synchronous communication we design a universal wakeup algorithm, and for asynchronous communication we show that no such algorithm exists.

Andrzej Pelc
Fast Deterministic Distributed Maximal Independent Set Computation on Growth-Bounded Graphs

The distributed complexity of computing a maximal independent set in a graph is of both practical and theoretical importance. While there exists an elegant O(log n) time randomized algorithm for general graphs [20], no deterministic polylogarithmic algorithm is known. In this paper, we study the problem in graphs with bounded growth, an important family of graphs which includes the well-known unit disk graph and many variants thereof. In particular, we propose a deterministic algorithm that computes a maximal independent set in time O(log Δ · log* n) in graphs with bounded growth, where n and Δ denote the number of nodes and the maximal degree in G, respectively.
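The object being computed is easy to specify sequentially; the paper's contribution is computing it in O(log Δ · log* n) distributed rounds on growth-bounded graphs. A sequential greedy reference (our illustration, not the distributed algorithm):

```python
def greedy_mis(adj):
    """Sequential greedy maximal independent set: scan the nodes in a
    fixed order and keep a node iff none of its neighbours was kept.
    adj maps each node to its list of neighbours."""
    mis = set()
    for v in sorted(adj):
        if not any(u in mis for u in adj[v]):
            mis.add(v)
    return mis
```

Any output must satisfy the two defining properties: independence (no two adjacent nodes chosen) and maximality (every unchosen node has a chosen neighbour); a distributed algorithm must reach the same state without a global scan order.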

Fabian Kuhn, Thomas Moscibroda, Tim Nieberg, Roger Wattenhofer
Distributed Computing with Imperfect Randomness

Randomness is a critical resource in many computational scenarios, enabling solutions where deterministic ones are elusive or even provably impossible. However, the randomized solutions to these tasks assume access to a source of unbiased, independent coins. Physical sources of randomness, on the other hand, are rarely unbiased and independent although they do seem to exhibit somewhat imperfect randomness. This gap in modeling questions the relevance of current randomized solutions to computational tasks. Indeed, there has been substantial investigation of this issue in complexity theory in the context of the applications to efficient algorithms and cryptography.

In this paper, we seek to determine whether imperfect randomness, modeled appropriately, is “good enough” for distributed algorithms. Namely, can we do with imperfect randomness all that we can do with perfect randomness, and with comparable efficiency? We answer this question in the affirmative for the problem of Byzantine agreement. We construct protocols for Byzantine agreement in a variety of scenarios (synchronous or asynchronous networks, with or without private channels), in which the players have imperfect randomness. Our solutions are essentially as efficient as the best known randomized agreement protocols, despite the defects in the randomness.

Shafi Goldwasser, Madhu Sudan, Vinod Vaikuntanathan
Polymorphic Contention Management

In software transactional memory (STM) systems, a contention manager resolves conflicts among transactions accessing the same memory locations. Whereas atomicity and serializability of the transactions are guaranteed at all times, the contention manager is of crucial importance for guaranteeing that the system as a whole makes progress.

A number of different contention management policies have been proposed and evaluated in the recent literature. An empirical evaluation of these policies leads to the striking result that there seems to be no “universal” contention manager that works best under all reasonable circumstances. Instead, transaction throughput can vary dramatically depending on factors such as transaction length, data access patterns, the length of contended vs. uncontended phases, and so on.

This paper proposes polymorphic contention management, a structure that allows contention managers to vary not just across workloads, but across concurrent transactions in a single workload, and even across different phases of a single transaction. The ability to mix contention managers or to change them on the fly provides performance benefits, but also poses a number of questions concerning how a contention manager of a given class can interact in a useful way with contention managers of different, possibly unknown classes. We address these questions by classifying contention managers in a hierarchy, based on the cost associated with each contention manager, and present a general algorithm to handle conflicts between contention managers from different classes. We describe how our polymorphic contention management structure is smoothly integrated with nested transactions in the SXM library.

Rachid Guerraoui, Maurice Herlihy, Bastian Pochon
Distributed Transactional Memory for Metric-Space Networks

Transactional Memory is a concurrent programming API in which concurrent threads synchronize via transactions (instead of locks). Although this model has mostly been studied in the context of multiprocessors, it has attractive features for distributed systems as well. In this paper, we consider the problem of implementing transactional memory in a network of nodes where communication costs form a metric. The heart of our design is a new cache-coherence protocol, called the Ballistic protocol, for tracking and moving up-to-date copies of cached objects. For constant-doubling metrics, a broad class encompassing both Euclidean spaces and growth-restricted networks, this protocol has stretch logarithmic in the diameter of the network.

Maurice Herlihy, Ye Sun
Concise Version Vectors in WinFS

Conflicts naturally arise in optimistically replicated systems. The common way to detect update conflicts is via version vectors, whose storage and communication overhead is (number of replicas) × (number of objects). These costs may be prohibitive for large systems.

This paper presents predecessor vectors with exceptions (PVEs), a novel optimistic replication technique developed for Microsoft’s WinFS system. The paper contains a systematic study of the PVEs’ performance gains over traditional schemes. The results demonstrate a dramatic reduction of storage and communication overhead in normal scenarios, during which communication disruptions are infrequent. Moreover, they identify a cross-over threshold in communication failure rate beyond which PVEs lose efficiency compared with traditional schemes.
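For context, conflict detection with ordinary version vectors reduces to a component-wise dominance test; PVEs compress the per-object state that this classical scheme keeps. A sketch of the classical test (the dictionary representation and names are illustrative):

```python
def dominates(a, b):
    """Version vector a dominates b iff a >= b on every replica
    component (replicas missing from a vector count as 0)."""
    return all(a.get(r, 0) >= b.get(r, 0) for r in a.keys() | b.keys())

def in_conflict(a, b):
    """Concurrent updates: neither version dominates the other."""
    return not dominates(a, b) and not dominates(b, a)
```

The overhead the abstract quantifies comes from keeping one such vector per object per replica; PVEs instead keep one predecessor vector per replica plus a (normally small) set of exceptions.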

Dahlia Malkhi, Doug Terry
Adaptive Software Transactional Memory

Software Transactional Memory (STM) is a generic synchronization construct that enables automatic conversion of correct sequential objects into correct nonblocking concurrent objects. Recent STM systems, though significantly more practical than their predecessors, display inconsistent performance: differing design decisions cause different systems to perform best in different circumstances, often by dramatic margins. In this paper we consider four dimensions of the STM design space: (i) when concurrent objects are acquired by transactions for modification; (ii) how they are acquired; (iii) what they look like when not acquired; and (iv) the non-blocking semantics for transactions (lock-freedom vs. obstruction-freedom). In this 4-dimensional space we highlight the locations of two leading STM systems: the DSTM of Herlihy et al. and the OSTM of Fraser and Harris. Drawing motivation from the performance of a series of application benchmarks, we then present a new Adaptive STM (ASTM) system that adjusts to the offered workload, allowing it to match the performance of the best known existing system on every tested workload.

Virendra J. Marathe, William N. Scherer III, Michael L. Scott
Optimistic Generic Broadcast

We consider an asynchronous system with the Ω failure detector, and investigate the number of communication steps required by various broadcast protocols in runs in which the leader does not change. Atomic Broadcast, used for example in state machine replication, requires three communication steps. Optimistic Atomic Broadcast requires only two steps if all correct processes receive messages in the same order. Generic Broadcast requires two steps if no messages conflict. We present an algorithm that subsumes both of these approaches and guarantees two-step delivery if all conflicting messages are received in the same order, and three-step delivery otherwise. Internally, our protocol uses two new algorithms. First, a Consensus algorithm which decides in one communication step if all proposals are the same, and needs two steps otherwise. Second, a method that allows us to run infinitely many instances of a distributed algorithm, provided that only finitely many of them are different. We assume that fewer than a third of all processes are faulty (n > 3f).

Piotr Zieliński
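
The one-step fast path can be illustrated with a toy decision rule (a hypothetical sketch with assumed names and shapes, not Zieliński's actual protocol): a process that has received identical proposals from n − f peers may decide at once, and otherwise needs another communication step.

```python
# Toy illustration of a one-step ("fast path") consensus decision rule.
# A sketch only: a real protocol must also ensure that the fallback round
# can never contradict a fast-path decision.

def fast_path_decide(proposals, n, f):
    """Given the proposals received from n - f processes, decide in one
    communication step if they all agree; otherwise signal a fallback."""
    assert n > 3 * f, "resilience assumption: fewer than a third faulty"
    assert len(proposals) >= n - f, "need proposals from n - f processes"
    if len(set(proposals)) == 1:
        return ("decided", proposals[0], 1)  # one communication step
    return ("fallback", None, 2)             # a second step is needed

print(fast_path_decide(["v", "v", "v"], n=4, f=1))  # unanimous: one-step decision
print(fast_path_decide(["v", "w", "v"], n=4, f=1))  # conflicting: fall back
```

With n = 4 and f = 1 the resilience assumption n > 3f holds, so three matching proposals suffice for the fast path.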
Space and Step Complexity Efficient Adaptive Collect

We present deterministic collect algorithms that are adaptive to total contention and efficient in both space and step complexity. One of them has optimal O(k) step and O(n) space complexity, but restricts the size of process identifiers to O(n), where n is the total number of processes in the system and k is the total contention, i.e., the total number of processes active during the execution. Removing the restriction on the name space increases the space complexity to O(n²) while leaving the step complexity at O(k). To date, all deterministic adaptive collect algorithms that we are aware of either are nearly quadratic in their step complexity or have memory overhead exponential in n.

Yehuda Afek, Yaron De Levie
Observing Locally Self-stabilization in a Probabilistic Way

A self-stabilizing algorithm cannot detect by itself that stabilization has been reached. To overcome this drawback, Lin and Simon introduced the notion of an external observer: a set of processes, one located at each node, whose role is to detect stabilization. Furthermore, Beauquier, Pilard and Rozoy introduced the notion of a local observer: a single observing entity located at a unique node. This entity is not allowed to detect false stabilization, must eventually detect that stabilization is reached, and must not interfere with the observed algorithm.

We introduce here the notion of probabilistic observer which realizes the conditions above only with probability 1. We show that computing the size of an anonymous ring with a synchronous self-stabilizing algorithm cannot be observed deterministically. We prove that some synchronous self-stabilizing solution to this problem can be observed probabilistically.

Joffroy Beauquier, Laurence Pilard, Brigitte Rozoy
Asymptotically Optimal Solutions for Small World Graphs

We consider the problem of determining constructions with an asymptotically optimal oblivious diameter in small world graphs under Kleinberg's model. In particular, we give the first general lower bound holding for any monotone distance distribution, that is, one induced by a monotone generating function. Namely, we prove that the expected oblivious diameter is Ω(log² n) even on a path of n nodes. We then focus on deterministic constructions and, after showing that the problem of minimizing the oblivious diameter is generally intractable, we give asymptotically optimal solutions, that is, with a logarithmic oblivious diameter, for paths, trees and Cartesian products of graphs, including d-dimensional grids for any fixed value of d.

Michele Flammini, Luca Moscardelli, Alfredo Navarra, Stephane Perennes
Deciding Stability in Packet-Switched FIFO Networks Under the Adversarial Queuing Model in Polynomial Time

In spite of the importance of the FIFO protocol and the research efforts invested in obtaining results for it, deciding whether a given (packet-switched) network is stable under FIFO has remained an open question for several years. In this work, we address the general case of this problem and try to characterize the property of stability under FIFO in terms of network topologies. Such a characterization provides us with the family of network topologies that, under the FIFO protocol, can be made unstable by some adversarial traffic pattern. We show that the property of stability under FIFO is decidable in polynomial time.

Maria J. Blesa
Compact Routing for Graphs Excluding a Fixed Minor
Extended Abstract

This paper concerns compact routing schemes with arbitrary node names. We present a compact name-independent routing scheme for unweighted networks with n nodes excluding a fixed minor. For any fixed minor, the scheme, constructible in polynomial time, has constant stretch factor and requires routing tables with a poly-logarithmic number of bits at each node.

For shortest-path labeled routing schemes in planar graphs, we prove an Ω(n^ε) space lower bound for some constant ε > 0. This lower bound holds even for bounded-degree triangulations, and is optimal for polynomially weighted planar graphs (ε = 1/2).

Ittai Abraham, Cyril Gavoille, Dahlia Malkhi
General Compact Labeling Schemes for Dynamic Trees

An F-labeling scheme is composed of a marker algorithm for labeling the vertices of a graph with short labels, coupled with a decoder algorithm allowing one to compute F(u, v) for any two vertices u and v directly from their labels. As applications of labeling schemes concern mainly large and dynamically changing networks, it is of interest to study distributed dynamic labeling schemes.

A general method for constructing labeling schemes for dynamic trees was previously developed in [28]. This method is based on extending an existing static tree labeling scheme to the dynamic setting. This approach fits many natural functions on trees, such as distance, routing, and nearest common ancestor. The resulting dynamic schemes incur overheads (over the static scheme) on the label size and on the communication complexity. In particular, all their schemes yield a multiplicative overhead factor of Ω(log n) on the label sizes of the static schemes. Following [28], we develop a different general method for extending static labeling schemes to the dynamic tree setting. Our method fits the same class of tree functions. In contrast to the above paper, our trade-off is designed to minimize the label size at the expense of communication.

Informally, for any k we present a dynamic labeling scheme incurring multiplicative overhead factors (over the static scheme) of O(log_k n) on the label size and O(k log_k n) on the amortized message complexity. In particular, by setting $k = \sqrt{n}$, we obtain dynamic labeling schemes with asymptotically optimal label sizes and sublinear amortized message complexity for the routing and nearest common ancestor functions.

Amos Korman
The Dynamic And-Or Quorum System

We investigate issues related to the probe complexity of the And-Or quorum system and its implementation in a dynamic environment. Our contribution is twofold. We first analyze the algorithmic probe complexity of the And-Or quorum system, and present two optimal algorithms. The first is a non-adaptive algorithm with $O(\sqrt{n}\log n)$ probe complexity, which matches a known lower bound. The second is an adaptive algorithm with a probe complexity that is linear in the cardinality of a quorum set ($O(\sqrt{n})$), and requires at most O(log log n) rounds. To the best of our knowledge, all other adaptive algorithms with the same parameters (load and probe complexity) require $\Theta(\sqrt{n})$ rounds.

Our second contribution is the ‘dynamic And-Or’ quorum system, an adaptation of the above quorum system to a dynamic environment, where processors join and leave the network. It is based on a dynamic overlay network that emulates the De Bruijn network and maintains the good properties of the quorum system (e.g., load and availability). The algorithms suggested for the maintenance of these dynamic data structures are strongly coupled with the dynamic overlay network. This fact enables the use of gossip protocols, which saves in message complexity and keeps the protocols simple and local. All these qualities make the ‘dynamic And-Or’ an excellent candidate for an implementation of dynamic quorums.

Uri Nadav, Moni Naor

Brief Announcements

Byzantine Clients Rendered Harmless

The original work on quorum systems assumed that servers fail benignly, by crashing or omitting some steps. More recently, researchers have developed techniques that enable quorum systems to provide data availability in the presence of arbitrary (Byzantine) faults [6]. Earlier work provides correct semantics despite server (i.e., replica) failures and also handles some of the problems of Byzantine clients [1,2,4,6, 9].

This paper describes the first protocols to handle all problems caused by Byzantine clients. Our protocols ensure that bad clients cannot interfere with good clients. Bad clients cannot prevent good clients from completing reads and writes, nor can they cause good clients to see inconsistencies. In addition bad clients that have been removed from operation can leave behind at most a bounded number of “lurking” writes that could be done on their behalf by a colluder.

Barbara Liskov, Rodrigo Rodrigues
Reliably Executing Tasks in the Presence of Malicious Processors

The demand for processing large amounts of data has increased over the last decade. As traditional one-processor machines have limited computational power, distributed systems consisting of a multitude of cooperating processing units are used instead. An example of such a massive distributed cooperative computation is the SETI@Home project [5]. As the search for extraterrestrial intelligence involves the analysis of gigabytes of raw data that a fixed-size collection of machines would not be able to carry out effectively, the data are distributed to millions of voluntary machines around the world. A machine acts as a server and sends data (aka tasks) to these client computers, which process them and report back the result of the task computation. This gives rise to a crucial problem: how can we prevent malicious clients from damaging the outcome of the overall computation?

In this work we abstract this problem in the form of a distributed system consisting of a fail-free master processor and a collection of processors (workers) that can execute tasks; worker processors might act maliciously. Since each task returns a value, we want the master to accept only correct values with high probability. Furthermore, we assume that the service provided by the workers is not free (as opposed to the SETI@Home project). For each task that a worker executes, the master computer is charged with a work-unit. Therefore, considering a single task assigned to several workers, our goal is to have the master computer accept the correct value of the task with high probability, with the smallest possible amount of work. We explore two ways of bounding the number of faulty processors and evaluate an algorithm that the master can run. Our preliminary analytical results show that it is possible to obtain a high probability of correct acceptance with a reasonable amount of work.

Antonio Fernández, Chryssis Georgiou, Luis López, Agustín Santos
Obstruction-Free Step Complexity: Lock-Free DCAS as an Example

We propose obstruction-free step complexity, a new complexity measure for nonblocking algorithms. We believe that this measure provides a more pragmatic quantification of nonblocking algorithms than previous measures, providing better guidance for designers of practical nonblocking algorithms.

In our opinion, the main shortcoming of existing complexity measures for nonblocking algorithms is that they are targeted toward worst-case behavior in worst-case scenarios, and say little about behavior in more common cases. This is true for the sensitivity measure of Attiya and Dagan [1], and the d-local step complexity of Afek et al. [2]. These measures are directed at evaluating the behavior of algorithms under contention, i.e., when concurrent operations actively interfere with each other’s progress. However, in practice, a well-designed system manages contention so that it does not impact performance too greatly. Thus, these previous measures do not evaluate the behavior that is likely to be observed.

Faith Ellen Fich, Victor Luchangco, Mark Moir, Nir Shavit
Communication-Efficient Implementation of Failure Detector Classes $\diamondsuit\mathcal{Q}$ and $\diamondsuit\mathcal{P}$

Several algorithms implementing failure detector classes $\diamondsuit\mathcal{Q}$ and $\diamondsuit\mathcal{P}$ have been proposed in the literature. The algorithm proposed by Chandra and Toueg in [2] uses a heartbeat mechanism and all-to-all communication to detect faulty processes. The algorithms proposed by Aguilera et al. in [1] and by Larrea et al. in [4] use heartbeats too, and rely on a leader-based approach. On the other hand, the algorithm proposed by Larrea et al. in [3] uses a polling (or query/reply) mechanism on a ring arrangement of processes. The leader-based and ring-based algorithms are more efficient than the all-to-all algorithm regarding the number of messages exchanged (linear vs. quadratic). Compared to polling, the heartbeat mechanism reduces the number of messages by half. Therefore, a heartbeat and ring-based algorithm should outperform the former ones.

Mikel Larrea, Alberto Lafuente
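
The message-count comparison can be made concrete with back-of-the-envelope accounting (an illustration only; the function and the ring topology here are assumptions, not the authors' algorithm):

```python
# Messages per monitoring round on a ring of n processes: polling sends a
# query and receives a reply on each ring edge (2n messages), whereas a
# heartbeat mechanism sends a single message per edge (n messages).

def messages_per_round(n, mechanism):
    if mechanism == "polling":
        return 2 * n   # one query plus one reply per ring edge
    if mechanism == "heartbeat":
        return n       # one heartbeat per ring edge
    raise ValueError(f"unknown mechanism: {mechanism}")

# Heartbeats halve the per-round message count relative to polling.
assert messages_per_round(8, "heartbeat") * 2 == messages_per_round(8, "polling")
```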
Optimal Resilience for Erasure-Coded Byzantine Distributed Storage

We consider distributed storage systems implemented by a set of n servers and accessed by a possibly unbounded set of clients, for reading and writing data. Servers and clients communicate by exchanging messages over a fully connected asynchronous network. This model is suitable for heterogeneous and wide-area networks, and, furthermore, avoids timing assumptions, which may otherwise become a vulnerability of the system. We consider Byzantine failures and assume that up to t servers and any number of clients may deviate from the protocol in an arbitrary way.

Christian Cachin, Stefano Tessaro
Agreement Among Unacquainted Byzantine Generals

The Byzantine Agreement (BA) problem introduced by Pease, Shostak and Lamport in [1] is one of the central problems in distributed computing. It has been extensively studied under various timing, topology, authentication and failure assumptions. In previous works it was assumed that the network topology is known to the processors in advance, i.e., every processor has a priori knowledge of the true unique identifier of the processor to which it is connected by each of its communication channels (see Fig. 1a).

Michael Okun
Subscription Propagation and Content-Based Routing with Delivery Guarantees

Subscription propagation enables efficient content-based routing in publish/subscribe systems and is a challenging problem when it is required to support reliable delivery in networks with redundant routes. We have designed a generic model and a highly-asynchronous algorithm accomplishing these goals. Existing algorithms can be interpreted as different encodings and optimizations of the generic algorithm and hence their correctness can be derived from the generic algorithm.

Yuanyuan Zhao, Sumeer Bhola, Daniel Sturman
Asynchronous Verifiable Information Dispersal

We consider the distribution of data by a client among a set of n storage servers, of which up to t might be faulty, exhibiting arbitrary, i.e., Byzantine, behavior. The goal is to ensure that clients can always recover the stored data correctly, independently of the behavior of faulty servers or of other, faulty clients. An inefficient solution is based on replication, such that every server keeps a copy of the data. The classic alternative is information dispersal (IDA): using an erasure code, the data is split into blocks such that each server holds exactly one block and only a subset of the blocks is needed in order to reconstruct the data.

Christian Cachin, Stefano Tessaro
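
The dispersal and reconstruction steps can be sketched with a minimal Rabin-style encoding over a prime field (an illustration under assumed parameters, not the paper's verifiable scheme, which additionally lets servers check that their blocks are consistent): the m data symbols define a polynomial, each server stores one evaluation of it, and any m evaluations recover the data.

```python
# Minimal Rabin-style information dispersal (IDA) over the prime field GF(257).
# Illustrative sketch only: asynchronous *verifiable* dispersal additionally
# commits to the blocks so that Byzantine servers cannot substitute bad ones.

P = 257  # prime just above one byte, so each symbol can hold a byte value

def _interp_eval(points, x):
    """Evaluate at x the unique polynomial of degree < len(points) passing
    through the given (xi, yi) pairs, via Lagrange interpolation mod P."""
    total = 0
    for xi, yi in points:
        num = den = 1
        for xj, _ in points:
            if xj != xi:
                num = num * ((x - xj) % P) % P
                den = den * ((xi - xj) % P) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P  # den^-1 mod P
    return total

def disperse(data, n):
    """View `data` (m symbols) as a polynomial's values at x = 0..m-1 and
    hand server i the value at x = m + i.  Any m blocks reconstruct."""
    pts = list(enumerate(data))
    return [(len(data) + i, _interp_eval(pts, len(data) + i)) for i in range(n)]

def reconstruct(blocks, m):
    """Recover the m original symbols from any m (index, value) blocks."""
    return [_interp_eval(blocks[:m], x) for x in range(m)]

data = [68, 73, 83, 67]              # the bytes of "DISC"; m = 4
blocks = disperse(data, n=7)         # 7 servers, each holding one field element
assert reconstruct(blocks[3:], m=4) == data   # any 4 of the 7 blocks suffice
```

Each server stores a single field element per m data symbols, which is the storage saving over full replication that makes dispersal attractive.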
Towards a Theory of Self-organization

Self-organization is an evolutionary process in which the effects of the environment are minimal, i.e., where the development of new, complex structures takes place primarily in and throughout the system itself. Natural phenomena, living forms, and social systems (e.g., growing crystals, cell aggregation, ant colonies) are examples of self-organizing systems in which a global order of the system emerges from local interactions. In the newly emerging fields of distributed systems (p2p, ad-hoc networks, sensor networks, cooperative robotics), self-organization has become one of the most desired properties. The major feature of all recent scalable distributed systems is their extreme dynamism in terms of structure, content, and load. In peer-to-peer systems, self-organization is handled through protocols for node arrival and departure, based either on a fault-tolerant overlay network, such as in CAN, Chord, or Pastry, or on a localization and routing infrastructure [2]. In ad-hoc networks, self-organizing solutions have been designed to cluster ad-hoc nodes [4]. Self-organizing algorithms have also been developed to arrange mobile robots into predefined geometric patterns (e.g., [3]).

E. Anceaume, X. Defago, M. Gradinariu, M. Roy
Timing Games and Shared Memory

We model a simple problem in advertising as a strategic timing game, and consider continuous and discrete versions of this game. For the continuous game, we completely characterize the Nash equilibrium for two players. For the discrete game, we give an efficient algorithm to compute the Nash equilibrium for n players.

Zvi Lotker, Boaz Patt-Shamir, Mark R. Tuttle
A Lightweight Group Mutual k-Exclusion Algorithm Using Bi-k-Arbiters

In this paper, we propose a bi-k-arbiter quorum structure for the group mutual k-exclusion problem in distributed systems. We design a lightweight group mutual k-exclusion algorithm adopting the bi-k-arbiter to solve the group mutual k-exclusion problem. By adopting bi-k-arbiters, concurrent processes accessing the same resource can use the lightweight quorum to reduce the message complexity.

Yu-Chen Kuo, Huang-Chen Lee
Could any Graph be Turned into a Small-World?

In the last decade, effective measurements of real interaction networks have revealed specific unexpected properties. In particular, most of these networks present a very small diameter and high clustering. Furthermore, very short paths can be efficiently found between any pair of nodes without global knowledge of the network (i.e., in a decentralized manner), which is known as the small-world phenomenon [1]. Several models have been proposed to explain this phenomenon [2,3]. However, Kleinberg showed in [4] that these models lack the essential navigability property: in spite of a polylogarithmic diameter, decentralized routing requires visiting a polynomial number of nodes in these models.

Philippe Duchon, Nicolas Hanusse, Emmanuelle Lebhar, Nicolas Schabanel
Papillon: Greedy Routing in Rings

We construct the first n-node degree-d ring-based network with worst-case greedy routes of length Θ(log n / log d) hops.

Ittai Abraham, Dahlia Malkhi, Gurmeet Singh Manku
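
For contrast, a Chord-like baseline (a sketch with assumed parameters, not the Papillon construction) achieves O(log n) greedy hops only by spending log n links per node, whereas Papillon needs just d links for Θ(log n / log d)-hop routes:

```python
# Greedy clockwise routing on an n-node ring with power-of-two "fingers".
# A Chord-like baseline for comparison only, NOT the Papillon construction.

N = 1 << 10  # example ring size: 1024 nodes

def neighbors(u):
    """Out-links of node u: fingers at clockwise distances 1, 2, 4, ..., N/2,
    i.e., degree log2(N) = 10 per node."""
    return [(u + (1 << j)) % N for j in range(N.bit_length() - 1)]

def greedy_route(src, dst):
    """Hop to the neighbor clockwise-closest to dst without overshooting;
    return the number of hops taken."""
    hops, u = 0, src
    while u != dst:
        u = min((v for v in neighbors(u) if (dst - v) % N < (dst - u) % N),
                key=lambda v: (dst - v) % N)
        hops += 1
    return hops

# Each hop at least halves the remaining clockwise distance, so no greedy
# route on this ring exceeds log2(N) = 10 hops.
assert max(greedy_route(0, t) for t in range(1, N)) <= 10
```

The largest finger that does not overshoot strictly shrinks the distance's bit length, which is exactly the halving argument in the comment above.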
An Efficient Long-Lived Adaptive Collect Algorithm

We present a new long-lived, efficient, adaptive collect algorithm. Namely, our algorithm adapts to $\mathcal{K}$-contention: it has the property that if during an operation the interval contention k exceeds a predetermined constant $\mathcal{K}$, the step complexity is O(N). If it falls below $\mathcal{K}$, the processors' executions will eventually have adaptive step complexity of O(k³). Moreover, for $\mathcal{K}$ such that $\mathcal{K}^3 \leq N$, our algorithm requires only O(N²) shared memory registers.

Burkhard Englert
Backmatter
Metadata
Title
Distributed Computing
Editor
Pierre Fraigniaud
Copyright Year
2005
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-540-32075-3
Print ISBN
978-3-540-29163-3
DOI
https://doi.org/10.1007/11561927
