An efficient hypercube labeling schema for dynamic Peer-to-Peer networks

https://doi.org/10.1016/j.jpdc.2016.12.022Get rights and content

Highlights

  • We assign labels to each node in a P2P network to facilitate and improve communication.

  • Labels create a virtual hypercube overlay.

  • Known hypercube algorithms are adopted in this environment.

  • Theoretical and experimental findings prove the feasibility and efficiency of this approach.

Abstract

This paper addresses the general problem of reducing unnecessary message transmission thereby lowering overall bandwidth utilization in a Peer-to-Peer (P2P) network. In particular, we exploit the characteristics of a P2P network engineered to resemble a hypercube. The reason for doing this is to achieve constant computation time for inter-node distances that are needed in the process of query optimization. To realize such a hypercube-like engineered structure, we develop a new labeling scheme which assigns identifiers (labels) to each node and then uses these labels to determine inter-node distances as is done in a hypercube, thus eliminating the need to send out queries to find the distance from one node to another. The labels allow for creating a virtual overlay which resembles a hypercube. We prove that the labeling scheme does in fact allow for reducing bandwidth utilization in the network. To confirm our theoretical findings we conduct various experiments with randomly selected P2P networks of various sizes. Detailed statistics on the outcome of these experiments are provided which show clearly the practical utility of the labeling approach.

Introduction

In what follows we present our latest results as a contribution to the problem of reducing unnecessary message transmission and thereby lowering overall bandwidth utilization in a P2P network. The efficiency of query optimization can be significantly improved by the use of a labeling scheme that allows for constant time inter-node distance determination. Realization of such a scheme, which assigns identifiers (labels) to each node and then uses these labels to determine inter-node distances, is made possible by a hypercube-like engineered structure. We prove that the labeling schema does, under certain conditions, reduce bandwidth utilization in the network. As part of this proof, we analyze each of the HyperD algorithms to determine their complexity and identify explicitly any savings in network maintenance costs. To confirm our theoretical findings we conduct various experiments with randomly selected P2P networks of various sizes. Detailed statistics on the outcome of these experiments are provided which show clearly the practical utility of the labeling approach, and allow for identifying the advantages and disadvantages of the labeling scheme.

This paper details the latest findings concerning HyperD, a labeling schema for Peer-to-Peer (P2P) networks (i.e., collections of computer devices which share information and resources in a purely ad-hoc fashion without the help of any central authority). The HyperD concept was first presented at  [46], [45]. HyperD was initially designed to serve as a topology model for a Dynamic Distributed Federated Database (DDFD)  [5]. A DDFD is a fully distributed relational database system which is well suited to a large range of database applications where data is stored locally at each node but is available and accessible to all other nodes in the network. A DDFD is indeed an example of a P2P network. Special attention was paid to a particular DDFD implementation, namely, the Gaian Database (GaianDB)  [5]. GaianDB grows by a process of preferential attachment. The resulting overlay network has the properties of a scale free graph. This is a typical example of an unstructured P2P overlay network where flooding is the method of choice used to search for content and identify communication routes. The advantages of using the GaianDB in improving query processing as well as the behavior and properties of such networks have been shown in  [6], [42], [43]. We then explored further methods for improving performance in GaianDB networks. In  [31], [32] we argued that using a fixed graph structure such as the hypercube offers a number of advantages in this type of distributed environment. In order to verify these positive theoretical predictions we decided to come up with a practical implementation of a hypercube network. These were the first steps in designing HyperD, which adopts a growth model that is well suited to the type of dynamic distributed environment similar to the GaianDB system.

The HyperD protocol assigns to each participating node a label set. These labels are then used to determine which nodes will form a virtual connection. A virtual overlay is established using these virtual connections. Each label represents a position in a virtual hypercube, so that nodes in the virtual network connect to form a topology which resembles a hypercube. In order to maintain a full hypercube structure, nodes may temporarily assume control of more than one hypercube position (i.e., they are assigned multiple labels). Finally, labels are used to facilitate communication between nodes adopting known optimal hypercube communication algorithms.

Assigning labels to nodes when building a structured overlay is not new. Typically these labels are assigned using a random process; therefore, the position of each node in the network is also determined randomly. Quite often this produces an overlay that differs substantially from its underlying physical network. Consequently, theoretical results could be very different from the actual cost of communication. In other cases, labels are assigned using the recursive property of hypercubes or following a hierarchical structure. Our work differs from these previous approaches in that we preserve the integrity of the hypercube by allowing nodes to take control of multiple labels. This entails additional maintenance overhead but permits us to adopt hypercube communication algorithms optimally. Furthermore, we employ a process of preferential attachment to assign labels to newly arriving nodes. Each assignment is done using a fitness function which increases the probability that two virtual nodes are also neighbors (or at least within a small number of hops) in the underlying physical network. In addition, we explore a label assignment variant which limits the label search to a small constant number of hops. By doing so we significantly improve the virtual–physical alignment while preserving the hypercube structure and the efficiency of communication.

In a typical P2P network nodes form a virtual overlay to facilitate data discovery and improve communication. An overlay network is an abstract architecture usually implemented at the application layer which is built on top of the existing physical layer. Nodes belonging to this overlay establish virtual connections. Two nodes are virtual neighbors in the overlay if they know each other and directly communicate using the underlying physical network. In practice two nodes which are virtually connected may not necessarily be adjacent in the physical layer.

Fig. 1 shows an example of a small P2P network and the corresponding overlay. The lower layer shows physical connections among nodes and the upper layer shows all virtual connections these nodes have established to form the overlay.

A large number of papers have been written on P2P networks, and many types of overlay networks have been proposed. It is beyond the scope of this paper to review the extensive literature in detail. Several surveys are available including  [8], [27]. Here we provide a brief summary of recent results relevant to the problem at hand. P2P overlays are commonly categorized as structured and unstructured. In unstructured P2P networks nodes connect using a random process, usually using only adjacent members. In some cases nodes may connect based on their content or common features and interests. Examples of unstructured networks employ query and search techniques such as flooding  [35], [24], random walk  [11], [30], hill climbing  [13], etc. In such cases the resulting overlay graph does not have a fixed structure and quite often resembles a random graph or a scale-free graph. These types of networks normally operate using a very small set of rules and are easier to maintain but leave something to be desired in efficiency and scalability. In structured P2P networks, the resulting overlay graph approximates a known graph structure or geometry. Labels (keys, identifiers, addresses) are used to determine connections. Examples of structured P2P networks include prefix-based multi-hop networks  [33], [47], [36], [1], ring networks  [41], [2], [29], hypercube networks  [38], butterfly networks  [28], De Bruijn networks  [20], [15], cube connected cycles networks  [39], Kautz networks  [23], [16], and D-Torus networks  [34]. Maintaining the chosen structure requires some additional overhead but communication and content discovery are usually much more efficient.

P2P networks, despite their dynamic nature, are expected to offer an acceptable degree of service quality. Any implemented P2P network protocol should be able to: (1) maintain connectivity; (2) self-organize without the need of any central control or authority; (3) deal efficiently with any network changes such as new node arrivals, node failures, node deletions and node mobility; (4) guarantee an accurate and reliable exchange of information between nodes; (5) be able to support the exchange of large amounts of information between nodes in a reasonable amount of time; (6) be scalable adjusting to the growth of the network; (7) and be resilient against attacks or failure.

However, it is quite challenging to achieve all of the above goals, particularly in the absence of a central authority to coordinate these efforts. A crucial aspect of P2P networks that directly impacts the overall performance is the way nodes communicate, search for content and find and maintain routes. Reducing the communication load for performing typical network activities is beneficial in a variety of ways. Finding shorter routes and reducing the number of messages exchanged during network operations increases reliability, decreases the time required for completion, reduces resource consumption such as processing power, battery life, bandwidth utilization, and reduces the likelihood of network fragmentation and node failure.

Section snippets

Hypercube networks

The hypercube plays a central role in the work reported here, so it is necessary to describe them in some detail. Hypercubes have been used in a large variety of applications due to a wealth of desirable properties. A k-dimensional hypercube Qk(V,E) also referred to as a k-cube, binary k-cube or boolean k-cube is a graph consisting of a set V of n=2k nodes (vertices) and a set of edges E. Each node viV is assigned a unique k-binary digit label (identifier). An edge eiE joins two nodes whose

HyperD

HyperD is a P2P overlay which resembles a hypercube. Participating nodes establish bidirectional connections which correspond to edges in the graph modeling the network. HyperD is dynamic and nodes may enter or leave the network at any time. Label assignments and connections in a HyperD are rearranged to deal with these changes. HyperD approximates a k-dimensional hypercube if the number of nodes is between 2k1 and 2k. Each node in HyperD is assigned one or more binary labels. Each label is

Conclusions and future work

In this paper we showed that building a hypercube based P2P overlay is not only feasible but proves to be beneficial in terms of reducing the cost of communication. We summarize our observations and list some of the advantages and disadvantages of using HyperD as opposed to unstructured overlay networks.

Advantages: (1) HyperD is relatively simple to deploy and maintain. It takes full advantage of the simplicity, uniformity, regularity and elegance present in hypercube-like structures. (2)

Acknowledgments

This research was sponsored in part by the US Army Research Laboratory and the UK Ministry of Defense under Agreement Number W911NF-06-3-0001.

Andi Toce is an Associate Professor of Computer Science at LaGuardia Community College of the City University of New York (CUNY). He received his Ph.D. in Computer Science from the Graduate Center (CUNY) in 2013. His research interests include algorithm design for distributed systems and structured network topologies.

References (47)

  • G. Bent, D. Dantressangle, A. Mowshowitz, V. Mitsou, A dynamic distributed federated database, in: Proceedings of the...
  • G. Bent, D. Dantressangle, P. Stone, D. Vyvyan, A. Mowshowitz, Experimental evaluation of the performance and...
  • J. Bufford et al.

    Peer-to-peer networking: Synopsis and research directions

  • Ü.V. Çatalyürek, E.G. Boman, K.D. Devine, D. Bozdag, R.T. Heaphy, L.A. Riesen, Hypergraph-based dynamic load balancing...
  • Y. Chawathe et al.

    Making gnutella-like p2p systems scalable

  • P.-J. Chuang et al.

    Hypercube-based data gathering in wireless sensor networks

    J. Inf. Sci. Eng.

    (2007)
  • I. Clarke et al.

    Freenet: A distributed anonymous information storage and retrieval system

  • M.S. Cline et al.

    Integration of biological networks and gene expression data using cytoscape

    Nat. Prot.

    (2007)
  • D. Guo, J. Wu, H. Chen, X. Luo, Moore: An extendable peer-to-peer network based on incomplete kautz digraph with...
  • C.-T. Ho, S.L. Johnsson, Distributed routing algorithms for broadcasting and personalized communication in hypercubes,...
  • H. Huo et al.

    Virtual hypercube routing in wireless sensor networks for health care systems

    Chin. J. Electron.

    (2010)
  • S.L. Johnsson et al.

    Optimum broadcasting and personalized communication in hypercubes

    IEEE Trans. Comput.

    (1989)
  • M.F. Kaashoek, D.R. Karger, Koorde: A simple degree-optimal distributed hash table, in: IPTPS, 2003, pp....
  • Cited by (9)

    View all citing articles on Scopus

    Andi Toce is an Associate Professor of Computer Science at LaGuardia Community College of the City University of New York (CUNY). He received his Ph.D. in Computer Science from the Graduate Center (CUNY) in 2013. His research interests include algorithm design for distributed systems and structured network topologies.

    Abbe Mowshowitz is professor of computer science at the City College of New York and member of the doctoral faculty at the Graduate Center of the City University of New York. Since he joined CUNY in 1984, he has also held academic appointments at the University of Amsterdam, Erasmus University Rotterdam, and the Rotterdam School of Management. Earlier he held academic appointments at Rensselaer Polytechnic Institute, the University of British Columbia, the University of Toronto, and the University of Michigan. He has been a major contributor to the analysis of complex networks starting with seminal work in the late 1960s. Most recently he participated in the International Technology Alliance research project dedicated to the further development of complex network technology. Mowshowitz received his Ph.D. from the University of Michigan in 1967.

    Akira Kawaguchi is a professor and chair of the Department of Computer Science at the City College of the City University of New York. He received his Ph.D. in Computer Science from Columbia University in 1998. His research interest centers on the area of database and transaction processing systems.

    Paul Stone is an IT Architect with the IBM Emerging Technology organization. He has over 20 years experience in software development, architecture and design and has specialized in performance optimization. Recently Paul has occupied as a primary researcher for IBM in the area of distributed databases and optimization of distributed queries. He is located at IBM Hursley Park in Winchester, United Kingdom.

    Patrick Dantressangle, is a Senior Technical Staff Member in the Emerging Technology Services (ETS) group at IBM Hursley in the UK and is an IBM Master Inventor. Patrick has over 25 years experience in the distributed data management area with interests areas ranging from research, software and solution development, architecture and design of solutions and products. His focus was always to create technology that helps people and companies work better. Patrick worked in many countries (France, USA, Canada, UK,...). He has more than 21 patents filed or pending and he has co-authored over 19 scientific publications. His research interests include Distributed Databases and Semantic Systems, Big Data Analytics including Text Analytics and all other analytics at large scale.

    Graham Bent is a Senior Technical Staff Member in the Emerging Technology Services (ETS) group at IBM Hursley in the UK and is an IBM Master Inventor. Graham has a background in academia and the defense industry where his primary area of expertise is in distributed database technologies and in the use of data and text mining for business analytics. He has developed solutions for the retail, banking, insurance, telecommunications, and healthcare, travel and transport and government sectors and co-authored a number of IBM Red Books on Data Mining. He pioneered the development of a massively scalable distributed database technology (Gaian Database) and is currently involved in joint research activities with IBM Research. These include new techniques for distributed secure computing on encrypted data; neuromorphic processing using the new SyNapse technology and he has recently begun a project investigating new programming models for future quantum computing systems.

    View full text