
2005 | Book

Peer-to-Peer Systems IV

4th International Workshop, IPTPS 2005, Ithaca, NY, USA, February 24-25, 2005. Revised Selected Papers

Edited by: Miguel Castro, Robbert van Renesse

Publisher: Springer Berlin Heidelberg

Book Series: Lecture Notes in Computer Science


Table of Contents

Frontmatter

Workshop Report

Workshop Report
Abstract
A Self-Repairing Peer-to-Peer System Resistant to Dynamic Adversarial Churn. Presented by Stefan Schmid.
Mahesh Balakrishnan, Maya Haridasan, Prakash Linga, Hongzhou Liu, Venu Ramasubramanian, Sean Rhea, Manpreet Singh, Vidhyashankar Venkatraman, Vivek Vishnumurthy, Kevin Walsh, Bernard Wong, Ming Zhong

Security and Incentives

A Self-repairing Peer-to-Peer System Resilient to Dynamic Adversarial Churn
Abstract
We present a dynamic distributed hash table where peers may join and leave at any time. Our system tolerates a powerful adversary which has complete visibility of the entire state of the system and can continuously add and remove peers. Our system provides worst-case fault-tolerance, maintaining desirable properties such as a low peer degree and a low network diameter.
Fabian Kuhn, Stefan Schmid, Roger Wattenhofer
A First Look at Peer-to-Peer Worms: Threats and Defenses
Abstract
Peer-to-peer (P2P) worms exploit common vulnerabilities in member hosts of a P2P network and spread topologically in the P2P network, a potentially more effective strategy than random scanning for locating victims. This paper describes the danger posed by P2P worms and initiates the study of possible mitigation mechanisms. In particular, the paper explores the feasibility of a self-defense infrastructure inside a P2P network, outlines the challenges, evaluates how well this defense mechanism contains P2P worms, and reveals correlations between containment and the overlay topology of a P2P network. Our experiments suggest a number of design directions to improve the resilience of P2P networks to worm attacks.
Lidong Zhou, Lintao Zhang, Frank McSherry, Nicole Immorlica, Manuel Costa, Steve Chien
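To make the contrast between topological spread and random scanning concrete, here is a toy simulation; it is an illustration under stated assumptions, not code or data from the paper, and the graph size, node degree, probe budget, and single initial victim are all made up.

```python
# Toy epidemic sketch (illustrative assumptions, not from the paper): a worm
# probing its overlay neighbors finds vulnerable member hosts far faster than
# one probing random addresses drawn from a much larger address space.
import random

ADDRESS_SPACE = 1_000_000   # random scanning probes this whole space
N = 500                     # overlay members occupy addresses 0..N-1, all vulnerable
DEGREE = 8
PROBES_PER_ROUND = 3
ROUNDS = 30

random.seed(0)
neighbors = {v: random.sample([u for u in range(N) if u != v], DEGREE)
             for v in range(N)}

def simulate(topological):
    infected = {0}                      # assumed patient zero
    for _ in range(ROUNDS):
        new = set()
        for v in infected:
            if topological:
                probes = random.sample(neighbors[v], PROBES_PER_ROUND)
            else:
                probes = [random.randrange(ADDRESS_SPACE) for _ in range(PROBES_PER_ROUND)]
            new.update(u for u in probes if u < N)   # probe hit a live, vulnerable member
        infected |= new
    return len(infected)

print("topological spread:", simulate(True), "of", N)
print("random scanning:   ", simulate(False), "of", N)
```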
A Taxonomy of Rational Attacks
Abstract
For peer-to-peer services to be effective, participating nodes must cooperate, but in most scenarios a node represents a self-interested party and cooperation can neither be expected nor enforced. A reasonable assumption is that a large fraction of p2p nodes are rational and will attempt to maximize their consumption of system resources while minimizing the use of their own. If such behavior violates system policy then it constitutes an attack. In this paper we identify and create a taxonomy for rational attacks and then identify corresponding solutions if they exist. The most effective solutions directly incentivize cooperative behavior, but when this is not feasible the common alternative is to incentivize evidence of cooperation instead.
Seth James Nielson, Scott A. Crosby, Dan S. Wallach

Search

Brushwood: Distributed Trees in Peer-to-Peer Systems
Abstract
There is an increasing demand for locality-preserving distribution of complex data structures in peer-to-peer systems. Current systems either do not preserve object locality or suffer from imbalances in data distribution, routing state, and/or query processing costs. In this position paper, we take a systematic approach that enables the deployment of searchable tree structures in p2p environments. We achieve distributed tree traversal with efficient routing distance and routing state. We show how to implement several p2p applications using distributed tree structures.
Chi Zhang, Arvind Krishnamurthy, Randolph Y. Wang
Arpeggio: Metadata Searching and Content Sharing with Chord
Abstract
Arpeggio is a peer-to-peer file-sharing network based on the Chord lookup primitive. Queries for data whose metadata matches a certain criterion are performed efficiently by using a distributed keyword-set index, augmented with index-side filtering. We introduce index gateways, a technique for minimizing index maintenance overhead. Because file data is large, Arpeggio employs subrings to track live source peers without the cost of inserting the data itself into the network. Finally, we introduce postfetching, a technique that uses information in the index to improve the availability of rare files. The result is a system that provides efficient query operations with the scalability and reliability advantages of full decentralization, and a content distribution system tuned to the requirements and capabilities of a peer-to-peer network.
Austin T. Clements, Dan R. K. Ports, David R. Karger
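As a rough illustration of the keyword-set index idea, the sketch below registers a file under every keyword subset up to an assumed size cap K, so that a multi-keyword query maps to a single lookup key; the hashing scheme, the cap, and the example data are illustrative assumptions, not Arpeggio's exact construction.

```python
# Hypothetical keyword-set index sketch: a file's metadata keywords are
# indexed under every keyword subset of size at most K, so a query on up to
# K keywords resolves to one index key and hence one DHT lookup.
from hashlib import sha1
from itertools import combinations

K = 3  # assumed cap on indexed subset size

def index_keys(keywords):
    """Return the DHT keys under which this file's metadata is registered."""
    words = sorted(set(w.lower() for w in keywords))
    keys = []
    for r in range(1, min(K, len(words)) + 1):
        for subset in combinations(words, r):
            keys.append(sha1(" ".join(subset).encode()).hexdigest())
    return keys

def query_key(query_words):
    """A query on up to K keywords maps to a single key."""
    words = sorted(set(w.lower() for w in query_words))[:K]
    return sha1(" ".join(words).encode()).hexdigest()

keys = index_keys(["free", "jazz", "live", "1975"])
print(len(keys))                               # number of index entries for this file
print(query_key(["jazz", "live"]) in keys)     # True: the query hits one of them
```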
OverCite: A Cooperative Digital Research Library
Abstract
CiteSeer is a well-known online resource for the computer science research community, allowing users to search and browse a large archive of research papers. Unfortunately, its current centralized incarnation is costly to run. Although members of the community would presumably be willing to donate hardware and bandwidth at their own sites to assist CiteSeer, the current architecture does not facilitate such distribution of resources. OverCite is a proposal for a new architecture for a distributed and cooperative research library based on a distributed hash table (DHT). The new architecture will harness resources at many sites, and thereby be able to support new features such as document alerts and scale to larger data sets.
Jeremy Stribling, Isaac G. Councill, Jinyang Li, M. Frans Kaashoek, David R. Karger, Robert Morris, Scott Shenker

Miscellaneous

NetProfiler: Profiling Wide-Area Networks Using Peer Cooperation
Abstract
Our work is motivated by two observations about the state of networks today: operators have little visibility into the end users’ network experience, while end users have little information or recourse when they encounter problems. We propose a system called NetProfiler, in which end hosts share network performance information with other hosts over a peer-to-peer network. The aggregated information from multiple hosts allows NetProfiler to profile the wide-area network, i.e., monitor end-to-end performance, and detect and diagnose problems from the perspective of end hosts. We define a set of attribute hierarchies associated with end hosts and their network connectivity. Information on the network performance and failures experienced by end hosts is then aggregated along these hierarchies, to identify patterns (e.g., shared attributes) that might be indicative of the source of the problem. In some cases, such sharing of information can also enable end hosts to resolve problems by themselves. The results from a 4-week-long Internet experiment indicate the promise of this approach.
Venkata N. Padmanabhan, Sriram Ramabhadran, Jitendra Padhye
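A minimal sketch of the aggregation idea, assuming a simple ISP > city > subnet hierarchy: failure reports are counted at every attribute prefix, and prefixes where most reporting hosts see failures are flagged. The hierarchy, threshold, and report format below are illustrative assumptions, not NetProfiler's actual design.

```python
# Illustrative aggregation along an assumed attribute hierarchy: find the
# attribute prefixes shared by most failing end hosts.
from collections import Counter

def aggregate(reports, min_hosts=3, fail_ratio=0.8):
    """reports: list of (attribute_path, failed), e.g. (("ISP-A", "Seattle", "10.1.2.0/24"), True)."""
    fail, total = Counter(), Counter()
    for path, failed in reports:
        for depth in range(1, len(path) + 1):
            prefix = path[:depth]
            total[prefix] += 1
            if failed:
                fail[prefix] += 1
    return [(p, fail[p], total[p]) for p in total
            if total[p] >= min_hosts and fail[p] / total[p] > fail_ratio]

reports = [(("ISP-A", "Seattle", "10.1.2.0/24"), True),
           (("ISP-A", "Seattle", "10.1.3.0/24"), True),
           (("ISP-A", "Seattle", "10.1.4.0/24"), True),
           (("ISP-B", "Seattle", "10.9.1.0/24"), False)]
print(aggregate(reports))  # implicates ISP-A's Seattle region, not Seattle as a whole
```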
A Statistical Theory of Chord Under Churn
Abstract
Most earlier studies of DHTs under churn have either depended on simulations as the primary investigation tool, or on establishing bounds for DHTs to function. In this paper, we present a complete analytical study of churn using a master-equation-based approach, used traditionally in non-equilibrium statistical mechanics to describe steady-state or transient phenomena. Simulations are used to verify all theoretical predictions. We demonstrate the application of our methodology to the Chord system. For any rates of churn and stabilization, and any system size, we accurately predict the fraction of failed or incorrect successor and finger pointers, and show how we can use these quantities to predict the performance and consistency of lookups under churn. We also discuss briefly how churn may actually be of different 'types' and the implications this has for the functioning of DHTs in general.
Supriya Krishnamurthy, Sameh El-Ansary, Erik Aurell, Seif Haridi
Peering Peer-to-Peer Providers
Abstract
The early peer-to-peer applications eschewed commercial arrangements and instead established a grass-roots model in which the collection of end-users provided their own distributed computational infrastructure. While this cooperative end-user approach works well in many application settings, it does not provide a sufficiently stable platform for certain peer-to-peer applications (e.g. DHTs as a building block for network services). Assuming such a stable platform isn’t freely provided by a benefactor (such as NSF), we must ask whether DHTs could be deployed in a competitive commercial environment. The key issue is whether a multiplicity of DHT services can coordinate to provide a single coherent DHT service, much the way ISPs peer to provide a completely connected Internet. In this paper, we describe various approaches for DHT peering and discuss some of the related performance and incentive issues.
Hari Balakrishnan, Scott Shenker, Michael Walfish

Multicast

The Impact of Heterogeneous Bandwidth Constraints on DHT-Based Multicast Protocols
Abstract
In this paper, we consider support for bandwidth-demanding applications such as video broadcasting using DHTs. Our investigations focus on the impact of heterogeneity in the outgoing bandwidth capabilities of nodes on Scribe, a representative and relatively mature DHT-based multicast protocol. We expose important issues that arise due to the mismatch between the ID space that underlies the DHT and the outgoing bandwidth constraints on nodes.
Ashwin R. Bharambe, Sanjay G. Rao, Venkata N. Padmanabhan, Srinivasan Seshan, Hui Zhang
Chainsaw: Eliminating Trees from Overlay Multicast
Abstract
In this paper, we present Chainsaw, a p2p overlay multicast system that completely eliminates trees. Peers are notified of new packets by their neighbors and must explicitly request a packet from a neighbor in order to receive it. This way, duplicate data can be eliminated and a peer can ensure it receives all packets. We show with simulations that Chainsaw has a short startup time, good resilience to catastrophic failure, and essentially no packet loss. We support this argument with real-world experiments on PlanetLab and compare Chainsaw to Bullet and SplitStream using MACEDON.
Vinay Pai, Kapil Kumar, Karthik Tamilmani, Vinay Sambamurthy, Alexander E. Mohr
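The pull-based mechanism can be sketched in a few lines; the message names, scheduling, and topology below are illustrative assumptions rather than Chainsaw's actual protocol.

```python
# Sketch of pull-based dissemination: peers only NOTIFY neighbors about
# packets they hold; a peer REQUESTs each missing packet from exactly one
# neighbor, so data is never sent unsolicited or in duplicate.
class Peer:
    def __init__(self, name):
        self.name = name
        self.have = set()         # packets held locally
        self.advertised = {}      # neighbor name -> packets that neighbor announced
        self.neighbors = []

    def deliver(self, seq):
        """Receive packet data (from the source or a served REQUEST)."""
        if seq not in self.have:
            self.have.add(seq)
            for n in self.neighbors:                      # NOTIFY only, no data push
                n.advertised.setdefault(self.name, set()).add(seq)

    def pull(self):
        """Request each missing packet from one advertising neighbor."""
        wanted = {}
        for name, seqs in self.advertised.items():
            for seq in seqs - self.have:
                wanted.setdefault(seq, name)              # single source -> no duplicates
        for seq in sorted(wanted):
            self.deliver(seq)                             # stands in for REQUEST + reply

# a small chain: source -> a -> b
source, a, b = Peer("s"), Peer("a"), Peer("b")
for x, y in [(source, a), (a, b)]:
    x.neighbors.append(y); y.neighbors.append(x)

for seq in range(5):
    source.deliver(seq)       # the source originates packets
    a.pull(); b.pull()
print(sorted(b.have))         # [0, 1, 2, 3, 4] with no duplicate transfers
```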
FeedTree: Sharing Web Micronews with Peer-to-Peer Event Notification
Abstract
Syndication of micronews, frequently-updated content on the Web, is currently accomplished with RSS feeds and client applications that poll those feeds. However, providers of RSS content have recently become concerned about the escalating bandwidth demand of RSS readers. Current efforts to address this problem by optimizing the polling behavior of clients sacrifice timeliness without fundamentally improving the scalability of the system. In this paper, we argue for a micronews distribution system called FeedTree, which uses a peer-to-peer overlay network to distribute RSS feed data to subscribers promptly and efficiently. Peers in the network share the bandwidth costs, which reduces the load on the provider, and updated content is delivered to clients as soon as it is available.
Daniel Sandler, Alan Mislove, Ansley Post, Peter Druschel

Overlay Algorithms

Hybrid Overlay Structure Based on Random Walks
Abstract
Application-level multicast on structured overlays often suffers from several drawbacks: 1) the regularity of the architecture makes it difficult to adapt to topology changes; 2) the uniformity of the protocol generally does not consider node heterogeneity. It would be ideal to combine the scalability of these overlays with the flexibility of an unstructured topology. In this paper, we propose a locality-aware hybrid overlay that combines the scalability and interface of a structured network with the connection flexibility of an unstructured network. Nodes self-organize into structured clusters based on network locality, while connections between clusters are created adaptively through random walks. Simulations show that this structure is efficient in terms of both delay and bandwidth. The network also supports the scalable fast rendezvous interface provided by structured overlays, resulting in fast membership operations.
Ruixiong Tian, Yongqiang Xiong, Qian Zhang, Bo Li, Ben Y. Zhao, Xing Li
Quickly Routing Searches Without Having to Move Content
Abstract
A great deal of work has been done to improve peer-to-peer routing by strategically moving or replicating content. However, there are many applications for which a peer-to-peer architecture might be appropriate, but in which content movement is not feasible. We argue that even in such applications, progress can be made in developing techniques that ensure efficient searches. We present several such techniques. First, we show that organizing the network into a square-root topology, where peer degrees are proportional to the square root of the popularity of their content, provides much better performance than power-law networks. Second, we present routing optimizations based on the amount of content stored at peers, and tracking the “best” peers, that can further improve performance. These and other techniques can make searches efficient, even when content movement or replication is not feasible.
Brian F. Cooper
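A hedged sketch of the square-root topology idea follows: each peer's target degree is set proportional to the square root of the query popularity of its content, normalized to an assumed average-degree budget. The popularity values, degree budget, and minimum degree are made-up inputs for illustration, not the paper's parameters.

```python
# Sketch (illustrative assumptions): target peer degrees proportional to the
# square root of content popularity, scaled to an average-degree budget.
import math

def target_degrees(popularity, avg_degree=8, d_min=2):
    """popularity: dict mapping peer -> query rate for that peer's content."""
    roots = {p: math.sqrt(q) for p, q in popularity.items()}
    scale = avg_degree * len(roots) / sum(roots.values())
    return {p: max(d_min, round(scale * r)) for p, r in roots.items()}

pop = {"p1": 100.0, "p2": 25.0, "p3": 1.0}
print(target_degrees(pop))  # p1 gets ~2x p2's degree (sqrt(100)/sqrt(25)), not 4x
```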
Practical Locality-Awareness for Large Scale Information Sharing
Abstract
Tulip is an overlay for routing, searching, and publish-lookup information sharing. It offers a unique combination of the advantages of both structured and unstructured overlays, one that no previous solution provides. Tulip features locality awareness (stretch 2) and fault tolerance (nodes can route around failures). It supports, under the same roof, exact keyed lookup, nearest-copy location, and global information search. Tulip has been deployed and its locality and fault tolerance properties verified over a real wide-area network.
Ittai Abraham, Ankur Badola, Danny Bickson, Dahlia Malkhi, Sharad Maloo, Saar Ron

Empirical Studies

An Empirical Study of Free-Riding Behavior in the Maze P2P File-Sharing System
Abstract
Maze is a P2P file-sharing system with an active and large user base. It is developed, deployed and operated by an academic research team. As such, it offers ample opportunities to conduct experiments to understand user behavior. Embedded in Maze is a set of incentive policies designed to encourage sharing and contribution. This paper presents an in-depth analysis of the effectiveness of the incentive policies and how users react to them. We found that in general the policies have been effective. But they also encourage the more selfish users to cheat by whitewashing their accounts as a variation of the Sybil attack. We examine multiple factors that may contribute to the free-riding behavior. Our conclusions are that upload speed, NAT, and the amount of shared files are not the problems, and that selfish behavior is demonstrated more by shorter online time. Since free-riders are also avid consumers of popular files, we suggest a two-pronged approach to reduce free-riding further: mechanisms to direct queries to sources that would otherwise be free-riders, and policies to encourage users to make their resources more available.
Mao Yang, Zheng Zhang, Xiaoming Li, Yafei Dai
Clustering in P2P Exchanges and Consequences on Performances
Abstract
We propose an analysis of a rich dataset which gives an exhaustive and dynamic view of the exchanges processed in a running eDonkey system. We focus on correlations in terms of the data exchanged by peers having provided or queried at least one piece of data in common. We introduce a method to capture these correlations (namely the data clustering), and study it in detail. We then use it to propose a very simple and efficient way to group data into clusters and show the impact of this underlying structure on search in typical P2P systems. Finally, we use these results to evaluate the relevance and limitations of a model proposed in a previous publication. We indicate some realistic values for the parameters of this model, and discuss some possible improvements.
Stevens Le Blond, Jean-Loup Guillaume, Matthieu Latapy
The Bittorrent P2P File-Sharing System: Measurements and Analysis
Abstract
Of the many P2P file-sharing prototypes in existence, BitTorrent is one of the few that has managed to attract millions of users. BitTorrent relies on other (global) components for file search, employs a moderator system to ensure the integrity of file data, and uses a bartering technique for downloading in order to prevent users from freeriding. In this paper we present a measurement study of BitTorrent in which we focus on four issues, viz. availability, integrity, flashcrowd handling, and download performance. The purpose of this paper is to aid in the understanding of a real P2P system that apparently has the right mechanisms to attract a large user community, to provide measurement data that may be useful in modeling P2P systems, and to identify design issues in such systems.
Johan Pouwelse, Paweł Garbacki, Dick Epema, Henk Sips

Miscellaneous

Dynamic Load Balancing in Distributed Hash Tables
Abstract
In Peer-to-Peer networks based on consistent hashing and ring topology, each server is responsible for an interval chosen (pseudo-) randomly on a unit circle. The topology of the network, the communication load, and the amount of data a server stores depend heavily on the length of its interval.
Additionally, the nodes are allowed to join the network or to leave it at any time. Such operations can destroy the balance of the network, even if all the intervals had equal lengths in the beginning.
This paper deals with the task of keeping such a system balanced, so that the lengths of intervals assigned to the nodes differ at most by a constant factor. We propose a simple fully distributed scheme, which works in a constant number of rounds and achieves optimal balance with high probability. Each round takes time at most \(\mathcal{O}(\mathcal{D}+\log n)\), where \(\mathcal{D}\) is the diameter of a specific network (e.g. \(\Theta(\log n)\) for Chord [15] and \(\Theta(\frac{\log n}{\log \log n})\) for the continuous-discrete approach proposed by Naor and Wieder [12,11]).
The scheme is a continuous process which does not have to be informed about the possible imbalance or the current size of the network to start working. The total number of migrations is within a constant factor from the number of migrations generated by the optimal centralized algorithm starting with the same initial network state.
Marcin Bienkowski, Miroslaw Korzeniowski, Friedhelm Meyer auf der Heide
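The setting can be made concrete with a few lines of code: with plain random placement on the unit circle, the ratio between the longest and shortest interval is far from the constant factor the proposed scheme maintains. The node count and placement model below are illustrative assumptions, not part of the paper's algorithm.

```python
# Illustrative baseline: each node owns the arc between its predecessor's ID
# and its own on the unit circle; with random placement the max/min arc-length
# ratio is large (it grows roughly like n*log n), motivating active balancing.
import random

def interval_lengths(node_ids):
    ids = sorted(node_ids)
    return [(ids[i] - ids[i - 1]) % 1.0 for i in range(len(ids))]

random.seed(1)
nodes = [random.random() for _ in range(1024)]
lengths = interval_lengths(nodes)
print(max(lengths) / min(lengths))  # typically in the thousands, not a constant
```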
High Availability in DHTs: Erasure Coding vs. Replication
Abstract
High availability in peer-to-peer DHTs requires data redundancy. This paper compares two popular redundancy schemes: replication and erasure coding. Unlike previous comparisons, we take the characteristics of the nodes that comprise the overlay into account, and conclude that in some cases the benefits from coding are limited, and may not be worth its disadvantages.
Rodrigo Rodrigues, Barbara Liskov
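The standard independence model behind such comparisons can be made concrete: with each storing node up independently with probability a, whole-object replication with n replicas survives if any one replica is up, while an (n, m) erasure code survives if at least m of the n fragments are up. The parameter values below are illustrative, not the paper's measurements.

```python
# Worked availability comparison under the independence assumption.
from math import comb

def replication_availability(a, n):
    """Object survives if at least one of n full replicas is reachable."""
    return 1 - (1 - a) ** n

def erasure_availability(a, n, m):
    """Object survives if at least m of n fragments are reachable."""
    return sum(comb(n, i) * a**i * (1 - a) ** (n - i) for i in range(m, n + 1))

a = 0.5                                       # assumed per-node availability
print(replication_availability(a, 4))         # 4 full replicas: 4x storage
print(erasure_availability(a, 16, 4))         # 16 fragments, any 4 suffice: same 4x storage
```

At equal storage overhead, the coded configuration in this toy example is noticeably more available, which is the kind of trade-off the paper re-examines once realistic node characteristics are taken into account.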
Conservation vs. Consensus in Peer-to-Peer Preservation Systems
Abstract
The problem of digital preservation is widely acknowledged, but the underlying assumptions implicit to the design of systems that address this problem have not been analyzed explicitly. We identify two basic approaches to address the problem of digital preservation using peer-to-peer systems: conservation and consensus. We highlight the design tradeoffs involved in using the two general approaches, and we provide a framework for analyzing the characteristics of peer-to-peer preservation systems in general. In addition, we propose a novel conservation-based protocol for achieving preservation and we analyze its effectiveness with respect to our framework.
Prashanth P. Bungale, Geoffrey Goodell, Mema Roussopoulos

Exploiting Network Locality

Locality Prediction for Oblivious Clients
Abstract
To improve performance, large-scale Internet systems require clients to access nearby servers. While centralized systems can leverage static topology maps for rough network distances, fully decentralized systems have turned to active probing and network coordinate algorithms to scalably predict inter-host latencies. Internet applications seeking immediate adoption, however, must interoperate with unmodified clients running existing protocols such as HTTP and DNS.
This paper explores a variety of active probing algorithms for locality prediction. Upon receiving an external client request, peers within a decentralized system are able to quickly estimate nearby servers, using a minimum of probes from multiple vantages. We find that, while network coordinates may play an important role in scalably choosing effective vantage points, they are not directly useful for predicting a client’s nearest servers.
Kevin P. Shanahan, Michael J. Freedman
Impact of Neighbor Selection on Performance and Resilience of Structured P2P Networks
Abstract
Recent work has shown that intelligent neighbor selection during construction can significantly enhance the performance of peer-to-peer overlay networks. While its impact on performance has been recognized, few have examined the impact of neighbor selection on network resilience. In this paper, we study this impact using a generalized cost model for overlay construction that takes into consideration different types of heterogeneity, such as node capacity and network proximity. Our simulation results show that the resulting performance improvement comes at the cost of static resilience against targeted attacks, and that adding random redundancy can improve resilience significantly.
Byung-Gon Chun, Ben Y. Zhao, John D. Kubiatowicz
Evaluating DHT-Based Service Placement for Stream-Based Overlays
Abstract
Stream-based overlay networks (SBONs) are one approach to implementing large-scale stream processing systems. A fundamental consideration in an SBON is that of service placement, which determines the physical location of in-network processing services or operators, in such a way that network resources are used efficiently. Service placement consists of two components: node discovery, which selects a candidate set of nodes on which services might be placed, and node selection, which chooses the particular node to host a service. By viewing the placement problem as the composition of these two processes, we can trade off quality and efficiency between them.
We evaluate the appropriateness of using DHT routing paths for service placement in an SBON, when aiming to minimize network usage. For this, we consider two DHT-based algorithms for node discovery, which use either the union or intersection of DHT routing paths in the SBON, and compare their performance to other techniques. We show that current DHT-based schemes are actually rather poor node discovery algorithms when minimizing network utilization. An efficient DHT may not traverse enough hops to obtain a sufficiently large candidate set for placement. The union of DHT routes may result in a low-quality set of discovered nodes that requires an expensive node selection algorithm. Finally, the intersection of DHT routes relies on route convergence, which prevents the placement of services with a large fan-in.
Peter Pietzuch, Jeffrey Shneidman, Jonathan Ledlie, Matt Welsh, Margo Seltzer, Mema Roussopoulos
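The two discovery strategies can be sketched abstractly: given the overlay routing paths from an operator's upstream nodes toward some key, the candidate set is either the union or the intersection of those paths. How the paths are obtained from the DHT is abstracted away; the node IDs below are placeholders for illustration.

```python
# Abstract sketch of the two node-discovery strategies compared in the paper:
# candidates for placing an operator come from the union or the intersection
# of DHT routing paths (here simply lists of node IDs).
def candidates_union(paths):
    out = set()
    for p in paths:
        out |= set(p)
    return out

def candidates_intersection(paths):
    out = set(paths[0])
    for p in paths[1:]:
        out &= set(p)
    return out

path_from_producer = ["n3", "n7", "n12", "n20"]
path_from_consumer = ["n5", "n12", "n20"]
print(candidates_union([path_from_producer, path_from_consumer]))         # larger, possibly low-quality set
print(candidates_intersection([path_from_producer, path_from_consumer]))  # depends on the routes converging
```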
Backmatter
Metadata
Title
Peer-to-Peer Systems IV
Edited by
Miguel Castro
Robbert van Renesse
Copyright Year
2005
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-540-31906-1
Print ISBN
978-3-540-29068-1
DOI
https://doi.org/10.1007/11558989