An efficient hypercube labeling schema for dynamic Peer-to-Peer networks
Introduction
In what follows we present our latest results as a contribution to the problem of reducing unnecessary message transmission and thereby lowering overall bandwidth utilization in a P2P network. The efficiency of query optimization can be significantly improved by the use of a labeling scheme that allows for constant time inter-node distance determination. Realization of such a scheme, which assigns identifiers (labels) to each node and then uses these labels to determine inter-node distances, is made possible by a hypercube-like engineered structure. We prove that the labeling schema does, under certain conditions, reduce bandwidth utilization in the network. As part of this proof, we analyze each of the HyperD algorithms to determine their complexity and identify explicitly any savings in network maintenance costs. To confirm our theoretical findings we conduct various experiments with randomly selected P2P networks of various sizes. Detailed statistics on the outcome of these experiments are provided which show clearly the practical utility of the labeling approach, and allow for identifying the advantages and disadvantages of the labeling scheme.
This paper details the latest findings concerning HyperD, a labeling schema for Peer-to-Peer (P2P) networks (i.e., collections of computer devices which share information and resources in a purely ad-hoc fashion without the help of any central authority). The HyperD concept was first presented at [46], [45]. HyperD was initially designed to serve as a topology model for a Dynamic Distributed Federated Database (DDFD) [5]. A DDFD is a fully distributed relational database system which is well suited to a large range of database applications where data is stored locally at each node but is available and accessible to all other nodes in the network. A DDFD is indeed an example of a P2P network. Special attention was paid to a particular DDFD implementation, namely, the Gaian Database (GaianDB) [5]. GaianDB grows by a process of preferential attachment. The resulting overlay network has the properties of a scale free graph. This is a typical example of an unstructured P2P overlay network where flooding is the method of choice used to search for content and identify communication routes. The advantages of using the GaianDB in improving query processing as well as the behavior and properties of such networks have been shown in [6], [42], [43]. We then explored further methods for improving performance in GaianDB networks. In [31], [32] we argued that using a fixed graph structure such as the hypercube offers a number of advantages in this type of distributed environment. In order to verify these positive theoretical predictions we decided to come up with a practical implementation of a hypercube network. These were the first steps in designing HyperD, which adopts a growth model that is well suited to the type of dynamic distributed environment similar to the GaianDB system.
The HyperD protocol assigns to each participating node a label set. These labels are then used to determine which nodes will form a virtual connection. A virtual overlay is established using these virtual connections. Each label represents a position in a virtual hypercube, so that nodes in the virtual network connect to form a topology which resembles a hypercube. In order to maintain a full hypercube structure, nodes may temporarily assume control of more than one hypercube position (i.e., they are assigned multiple labels). Finally, labels are used to facilitate communication between nodes adopting known optimal hypercube communication algorithms.
Assigning labels to nodes when building a structured overlay is not new. Typically these labels are assigned using a random process; therefore, the position of each node in the network is also determined randomly. Quite often this produces an overlay that differs substantially from its underlying physical network. Consequently, theoretical results could be very different from the actual cost of communication. In other cases, labels are assigned using the recursive property of hypercubes or following a hierarchical structure. Our work differs from these previous approaches in that we preserve the integrity of the hypercube by allowing nodes to take control of multiple labels. This entails additional maintenance overhead but permits us to adopt hypercube communication algorithms optimally. Furthermore, we employ a process of preferential attachment to assign labels to newly arriving nodes. Each assignment is done using a fitness function which increases the probability that two virtual nodes are also neighbors (or at least within a small number of hops) in the underlying physical network. In addition, we explore a label assignment variant which limits the label search to a small constant number of hops. By doing so we significantly improve the virtual–physical alignment while preserving the hypercube structure and the efficiency of communication.
In a typical P2P network nodes form a virtual overlay to facilitate data discovery and improve communication. An overlay network is an abstract architecture usually implemented at the application layer which is built on top of the existing physical layer. Nodes belonging to this overlay establish virtual connections. Two nodes are virtual neighbors in the overlay if they know each other and directly communicate using the underlying physical network. In practice two nodes which are virtually connected may not necessarily be adjacent in the physical layer.
Fig. 1 shows an example of a small P2P network and the corresponding overlay. The lower layer shows physical connections among nodes and the upper layer shows all virtual connections these nodes have established to form the overlay.
A large number of papers have been written on P2P networks, and many types of overlay networks have been proposed. It is beyond the scope of this paper to review the extensive literature in detail. Several surveys are available including [8], [27]. Here we provide a brief summary of recent results relevant to the problem at hand. P2P overlays are commonly categorized as structured and unstructured. In unstructured P2P networks nodes connect using a random process, usually using only adjacent members. In some cases nodes may connect based on their content or common features and interests. Examples of unstructured networks employ query and search techniques such as flooding [35], [24], random walk [11], [30], hill climbing [13], etc. In such cases the resulting overlay graph does not have a fixed structure and quite often resembles a random graph or a scale-free graph. These types of networks normally operate using a very small set of rules and are easier to maintain but leave something to be desired in efficiency and scalability. In structured P2P networks, the resulting overlay graph approximates a known graph structure or geometry. Labels (keys, identifiers, addresses) are used to determine connections. Examples of structured P2P networks include prefix-based multi-hop networks [33], [47], [36], [1], ring networks [41], [2], [29], hypercube networks [38], butterfly networks [28], De Bruijn networks [20], [15], cube connected cycles networks [39], Kautz networks [23], [16], and D-Torus networks [34]. Maintaining the chosen structure requires some additional overhead but communication and content discovery are usually much more efficient.
P2P networks, despite their dynamic nature, are expected to offer an acceptable degree of service quality. Any implemented P2P network protocol should be able to: (1) maintain connectivity; (2) self-organize without the need of any central control or authority; (3) deal efficiently with any network changes such as new node arrivals, node failures, node deletions and node mobility; (4) guarantee an accurate and reliable exchange of information between nodes; (5) be able to support the exchange of large amounts of information between nodes in a reasonable amount of time; (6) be scalable adjusting to the growth of the network; (7) and be resilient against attacks or failure.
However, it is quite challenging to achieve all of the above goals, particularly in the absence of a central authority to coordinate these efforts. A crucial aspect of P2P networks that directly impacts the overall performance is the way nodes communicate, search for content and find and maintain routes. Reducing the communication load for performing typical network activities is beneficial in a variety of ways. Finding shorter routes and reducing the number of messages exchanged during network operations increases reliability, decreases the time required for completion, reduces resource consumption such as processing power, battery life, bandwidth utilization, and reduces the likelihood of network fragmentation and node failure.
Section snippets
Hypercube networks
The hypercube plays a central role in the work reported here, so it is necessary to describe them in some detail. Hypercubes have been used in a large variety of applications due to a wealth of desirable properties. A -dimensional hypercube also referred to as a k-cube, binary k-cube or boolean k-cube is a graph consisting of a set of nodes (vertices) and a set of edges . Each node is assigned a unique -binary digit label (identifier). An edge joins two nodes whose
HyperD
HyperD is a P2P overlay which resembles a hypercube. Participating nodes establish bidirectional connections which correspond to edges in the graph modeling the network. HyperD is dynamic and nodes may enter or leave the network at any time. Label assignments and connections in a HyperD are rearranged to deal with these changes. HyperD approximates a -dimensional hypercube if the number of nodes is between and . Each node in HyperD is assigned one or more binary labels. Each label is
Conclusions and future work
In this paper we showed that building a hypercube based P2P overlay is not only feasible but proves to be beneficial in terms of reducing the cost of communication. We summarize our observations and list some of the advantages and disadvantages of using HyperD as opposed to unstructured overlay networks.
Advantages: (1) HyperD is relatively simple to deploy and maintain. It takes full advantage of the simplicity, uniformity, regularity and elegance present in hypercube-like structures. (2)
Acknowledgments
This research was sponsored in part by the US Army Research Laboratory and the UK Ministry of Defense under Agreement Number W911NF-06-3-0001.
Andi Toce is an Associate Professor of Computer Science at LaGuardia Community College of the City University of New York (CUNY). He received his Ph.D. in Computer Science from the Graduate Center (CUNY) in 2013. His research interests include algorithm design for distributed systems and structured network topologies.
References (47)
- et al.
Optimal communication algorithms for hypercubes
J. Parallel Distrib. Comput.
(1991) - et al.
Bluecube: constructing a hypercube parallel computing and communication environment over bluetooth radio systems
J. Parallel Distrib. Comput.
(2006) - et al.
D2b: a de bruijn based content-addressable network
Theoret. Comput. Sci.
(2006) - et al.
Ring, torus and hypercube architectures/algorithms for parallel computing
Parallel Comput.
(1999) - et al.
The fasttrack overlay: a measurement study
Comput. Netw.
(2006) - et al.
Cycloid: a constant-degree and lookup-efficient p2p overlay network
Perform. Eval.
(2006) - et al.
P-grid: a self-organizing structured p2p system
SIGMOD Rec.
(2003) - et al.
Dks (n, k, f): A family of low communication, scalable and fault-tolerant infrastructures for p2p applications
- et al.
Peercube: A hypercube-based p2p overlay robust against collusion and churn
- et al.
Looking up data in p2p systems
Commun. ACM
(2003)
Peer-to-peer networking: Synopsis and research directions
Making gnutella-like p2p systems scalable
Hypercube-based data gathering in wireless sensor networks
J. Inf. Sci. Eng.
Freenet: A distributed anonymous information storage and retrieval system
Integration of biological networks and gene expression data using cytoscape
Nat. Prot.
Virtual hypercube routing in wireless sensor networks for health care systems
Chin. J. Electron.
Optimum broadcasting and personalized communication in hypercubes
IEEE Trans. Comput.
Cited by (9)
Variational networks of cube-connected cycles are recursive cubes of rings
2020, Information Processing LettersCitation Excerpt :Numerous interconnection networks have been proposed over the last fifty years or so and new designs continue to emerge. There are various reasons for this ongoing investigation and these include the following: the changing face of parallel and distributed systems, which encompass networks-on-chips, supercomputers, clusters and data centre networks, along with new and unforeseen applications, imposes new demands on the underlying interconnection networks (see, e.g., [14]); interconnection networks also feature in peer-to-peer and overlay networks (see, e.g., [15]), social networks (see, e.g., [16]) and wireless sensor networks (see, e.g., [2]); and the ‘structured’ graphs into which interconnection networks sit feature in combinatorial chemistry (see, e.g., [1]), coding theory (see, e.g., [3]), mathematical physics (see, e.g., [12]) and discrete mathematics in general (often purely as interesting combinatorial objects as regards the latter instantiation; see, e.g., [6,8,18]). As illustrations of recently proposed interconnection networks, we point to the recursive cubes of rings (see [10], building on [13]) and the variational networks of cube-connected cycles [17].
Topology as a Factor in Overlay Networks Designed to Support Dynamic Systems Modeling
2022, Lecture Notes in Networks and SystemsQuery Processing in Highly Distributed Environments
2022, Lecture Notes in Networks and Systems
Andi Toce is an Associate Professor of Computer Science at LaGuardia Community College of the City University of New York (CUNY). He received his Ph.D. in Computer Science from the Graduate Center (CUNY) in 2013. His research interests include algorithm design for distributed systems and structured network topologies.
Abbe Mowshowitz is professor of computer science at the City College of New York and member of the doctoral faculty at the Graduate Center of the City University of New York. Since he joined CUNY in 1984, he has also held academic appointments at the University of Amsterdam, Erasmus University Rotterdam, and the Rotterdam School of Management. Earlier he held academic appointments at Rensselaer Polytechnic Institute, the University of British Columbia, the University of Toronto, and the University of Michigan. He has been a major contributor to the analysis of complex networks starting with seminal work in the late 1960s. Most recently he participated in the International Technology Alliance research project dedicated to the further development of complex network technology. Mowshowitz received his Ph.D. from the University of Michigan in 1967.
Akira Kawaguchi is a professor and chair of the Department of Computer Science at the City College of the City University of New York. He received his Ph.D. in Computer Science from Columbia University in 1998. His research interest centers on the area of database and transaction processing systems.
Paul Stone is an IT Architect with the IBM Emerging Technology organization. He has over 20 years experience in software development, architecture and design and has specialized in performance optimization. Recently Paul has occupied as a primary researcher for IBM in the area of distributed databases and optimization of distributed queries. He is located at IBM Hursley Park in Winchester, United Kingdom.
Patrick Dantressangle, is a Senior Technical Staff Member in the Emerging Technology Services (ETS) group at IBM Hursley in the UK and is an IBM Master Inventor. Patrick has over 25 years experience in the distributed data management area with interests areas ranging from research, software and solution development, architecture and design of solutions and products. His focus was always to create technology that helps people and companies work better. Patrick worked in many countries (France, USA, Canada, UK,...). He has more than 21 patents filed or pending and he has co-authored over 19 scientific publications. His research interests include Distributed Databases and Semantic Systems, Big Data Analytics including Text Analytics and all other analytics at large scale.
Graham Bent is a Senior Technical Staff Member in the Emerging Technology Services (ETS) group at IBM Hursley in the UK and is an IBM Master Inventor. Graham has a background in academia and the defense industry where his primary area of expertise is in distributed database technologies and in the use of data and text mining for business analytics. He has developed solutions for the retail, banking, insurance, telecommunications, and healthcare, travel and transport and government sectors and co-authored a number of IBM Red Books on Data Mining. He pioneered the development of a massively scalable distributed database technology (Gaian Database) and is currently involved in joint research activities with IBM Research. These include new techniques for distributed secure computing on encrypted data; neuromorphic processing using the new SyNapse technology and he has recently begun a project investigating new programming models for future quantum computing systems.