
2009 | Book

From P2P and Grids to Services on the Web

Evolving Distributed Communities

Edited by: Ian J. Taylor, Andrew B. Harrison

Publisher: Springer London

Book series: Computer Communications and Networks


About this book

Over the past several years, Internet users have changed their usage patterns from predominately client/server-based Web server interactions to also involving the use of more decentralized applications, where they contribute more equally in the role of the application as a whole, and further to distributed communities based around the Web. Distributed systems take many forms, appear in many areas and range from truly decentralized systems, like Gnutella, Skype and Jxta, to centrally indexed brokered systems like Web services and Jini, and centrally coordinated systems like SETI@home. From P2P and Grids to Services on the Web: Evolving Distributed Communities provides a comprehensive overview of the emerging trends in peer-to-peer (P2P), distributed objects, Web Services, the Web, and Grid computing technologies, which have redefined the way we think about distributed computing and the Internet. The book has four main themes: distributed environments, protocols and architectures for applications, protocols and architectures focusing on middleware and, finally, deployment of these middleware systems, providing real-world examples of their usage.

Table of contents

Frontmatter

Common Themes

1. Introduction
Abstract
The field of distributed systems is characterized by rapid change and conflicting ideologies, approaches and vested interests. In its short history it has seen a number of different paradigms gaining interest and adoption before being vanquished by newer, more virile movements. However, when a technology fades from the limelight, it often re-emerges at a later date under a new banner. As a result, there is a continuous intermixing of core, reusable concepts with new innovations.
In the 1990s, there were two primary approaches to distributed systems. The Web represented a human-oriented, distributed information space rather than a computing program [1]. On the other hand, distributed object technologies such as CORBA [2] and DCOM [3] were primarily attempting to create distributed environments that seamlessly emulated local computer applications while providing the benefits of access to networked resources. But despite the initial vision of the Web as a space which many would contribute to, publishing became the preserve of the few with most users merely accessing data, not creating it. Meanwhile distributed object systems were growing in terms of their capabilities but becoming more heavyweight, proprietary and complex in the process.
Ian J. Taylor, Andrew B. Harrison
2. Discovery Protocols
Abstract
As outlined in the taxonomy in the previous chapter, a major theme that needs to be addressed amongst many of the distributed architectures and protocols we talk about in this book is the concept of discovery in both the Intranet and Internet scale domains. Some systems keep this simple by providing a central lookup point in the network but such a scheme, although convenient, is often not scalable to larger networks. Therefore, different mechanisms are needed in order to create a manageable discovery backbone for the automation of the discovery process. In this chapter, we discuss some of the common techniques that are used for discovery and describe the underlying protocols that these techniques rely on.
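To make one such technique concrete, here is a minimal multicast-probe sketch in Java; it is illustrative only (the group address, port and message format are invented, and real protocols such as SLP define their own), but it shows the basic pattern of asking a group rather than a central index:

    import java.net.DatagramPacket;
    import java.net.DatagramSocket;
    import java.net.InetAddress;
    import java.net.SocketTimeoutException;

    public class MulticastProbe {
        public static void main(String[] args) throws Exception {
            // Send a probe to a well-known multicast group; any service
            // listening on the group can reply directly to the sender.
            InetAddress group = InetAddress.getByName("239.255.255.250");
            byte[] probe = "DISCOVER".getBytes("UTF-8");
            try (DatagramSocket socket = new DatagramSocket()) {
                socket.send(new DatagramPacket(probe, probe.length, group, 4446));
                socket.setSoTimeout(2000); // wait up to two seconds for replies
                byte[] buf = new byte[512];
                DatagramPacket reply = new DatagramPacket(buf, buf.length);
                try {
                    socket.receive(reply); // a service answers with its location
                    System.out.println(new String(reply.getData(), 0,
                            reply.getLength(), "UTF-8"));
                } catch (SocketTimeoutException e) {
                    System.out.println("No services responded");
                }
            }
        }
    }

A service wishing to be discoverable would join the same group with a MulticastSocket and reply to probes with its own location.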
Ian J. Taylor, Andrew B. Harrison
3. Structured Document Types
Abstract
In order to span heterogeneous environments, many distributed systems have looked to means of expressing data in a way that can be interpreted unambiguously and without loss of information by remote nodes irrespective of their operating system or hardware. The Standard Generalized Markup Language (SGML) forms the basis of the most popular of such mechanisms. A markup language is a vocabulary that combines content with notation, or markup. The markup provides a context for the content. SGML has two important properties that make it suitable for use in distributed environments. First, it is based on descriptive markup: the markup is designed to annotate the data in terms of its structure, not in terms of what one should do with the content. This means no particular processing, and therefore no particular underlying software or hardware, is presumed. Such markup separates data from structure and, by implication, presentation as well, because the structural markup can be processed for presentation independently of the data. Second, SGML provides a means of associating a marked-up document with a document type. Document types are defined in a Document Type Definition (DTD). The DTD is a template against which the marked-up content can be understood, interpreted and validated using an SGML parser.
SGML has spawned several languages that are widely used, in particular Hypertext Markup Language (HTML) and Extensible Markup Language (XML). These are referenced extensively in this book because they are so commonplace. The following sections give an overview of these technologies. Readers may skip these sections now if they are familiar with these languages, or return to them later for clarification.
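As a minimal illustration of the ideas above, the following Java sketch (not code from the book; the element names are invented) parses a small XML document that carries its own DTD and validates the content against it:

    import java.io.StringReader;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.xml.sax.InputSource;
    import org.xml.sax.SAXException;
    import org.xml.sax.SAXParseException;
    import org.xml.sax.helpers.DefaultHandler;

    public class DtdValidation {
        public static void main(String[] args) throws Exception {
            // The markup annotates structure (a note containing to/body),
            // saying nothing about presentation or processing.
            String doc =
                "<?xml version=\"1.0\"?>\n"
                + "<!DOCTYPE note [\n"
                + "  <!ELEMENT note (to, body)>\n"
                + "  <!ELEMENT to (#PCDATA)>\n"
                + "  <!ELEMENT body (#PCDATA)>\n"
                + "]>\n"
                + "<note><to>Alice</to><body>Hello</body></note>";

            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // check the content against its DTD
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) throws SAXException {
                    throw e; // surface validation failures
                }
            });
            builder.parse(new InputSource(new StringReader(doc)));
            System.out.println("Document is valid against its DTD");
        }
    }

Changing the document to, say, omit the <to> element causes the parser to raise a validation error, demonstrating the DTD's role as a template.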
Ian J. Taylor, Andrew B. Harrison
4. Distributed Security Techniques
Abstract
This chapter covers the core elements of security in a distributed system. It illustrates the various ways that a third party can gain access to data and gives an overview of the design issues involved in building a distributed security system. Cryptography is introduced, then cryptographic techniques for symmetric and asymmetric encryption/decryption are given, along with a description of one-way hash functions. To demonstrate the use of these underlying techniques we provide an example of how a combination of public/private keys and hash functions can be used to digitally sign a document, e.g., email. Both asymmetric and symmetric secure channels are discussed and scenarios are provided for their use. Finally, the notion of sandboxing is introduced and illustrated through the description of the Java security-manager implementation.
The role of this chapter, therefore, is to provide a security gateway for the middleware and applications that we will discuss in the following chapters, which often use a combination of security techniques. For example, Freenet (Chapter 12) uses many of these techniques extensively for creating keys for the Freenet network, which are used not only for privacy but also to map from data content to network location. Further, both Jxta (Chapter 15) and Grid computing (Chapter 9) provide security infrastructures and address authentication issues; and BitTorrent (Chapter 13) makes use of hash functions to ensure the integrity of data as it is passed around the network.
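As a minimal sketch of the signing scheme outlined above, using the standard java.security API rather than the book's own examples: the message is hashed with a one-way function, the digest is encrypted with the private key, and any holder of the public key can verify the result.

    import java.security.KeyPair;
    import java.security.KeyPairGenerator;
    import java.security.Signature;

    public class SignDemo {
        public static void main(String[] args) throws Exception {
            // Generate an RSA key pair; in practice keys would be loaded
            // from a keystore rather than created per run.
            KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
            gen.initialize(2048);
            KeyPair pair = gen.generateKeyPair();

            byte[] message = "An email worth signing".getBytes("UTF-8");

            // SHA256withRSA hashes the message with a one-way function and
            // encrypts the digest with the private key: a digital signature.
            Signature signer = Signature.getInstance("SHA256withRSA");
            signer.initSign(pair.getPrivate());
            signer.update(message);
            byte[] signature = signer.sign();

            // Anyone holding the public key can check that the message was
            // signed by the private-key holder and has not been altered.
            Signature verifier = Signature.getInstance("SHA256withRSA");
            verifier.initVerify(pair.getPublic());
            verifier.update(message);
            System.out.println("Signature valid: " + verifier.verify(signature));
        }
    }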
Ian J. Taylor, Andrew B. Harrison

Distributed Environments

5. The Web
Abstract
In 1959, The Sound of Music made its debut on Broadway. The run won numerous awards and the show went on to become one of the most popular musicals in history. The vocal score, along with underscoring, comprises about 200 pages of manuscript. In the same year, Miles Davis arrived at Columbia studios to record his seminal album Kind of Blue. He did not arrive with 200 pages of black dots. Rather, as Bill Evans writes on the album's liner notes, he arrived with sketches which indicated to the group what was to be played, presenting to the other musicians frameworks which are exquisite in their simplicity and yet contain all that is necessary to stimulate performance with sure reference to the primary conception.
Miles Davis' approach to composition has much in common with the spirit in which the Web was designed, and is summed up by Tim Berners-Lee's citation of the Lao Tse poem The Space Within, the final line of which is given above. The Web has arguably become so ubiquitous, not just because of what it can do, but because of what it does not try to do. The less that is defined, the more potential there is for supporting new ideas. Hence the Web provides a very simple skeleton on which new concepts can be easily hung. The trick, however, is to ensure that new ideas themselves do not shut the door on further invention. Therefore much of the activity of the World Wide Web Consortium (W3C) and other interested parties is in defining what the Web architecture is and how it can be maintained and improved. This is a balancing act between enabling natural evolution and ensuring a consistent approach based on past experience.
Ian J. Taylor, Andrew B. Harrison
6. Peer-2-Peer Environments
Abstract
At the time of writing, there are 1.35 billion devices connected to the Internet worldwide (e.g., PCs, phones, PDAs, etc.), a figure which has been rising rapidly [47]. With almost a quarter of the world's population already connected to the Internet and growth rates in Africa and the Middle East reaching almost 1000% over the past 8 years, the increases in on-line connectivity seem to be following a familiar pattern seen in other areas within the computing industry.
The computer hardware industry has also been characterised by exponential production volumes. Gordon Moore, the co-founder of Intel, in his famous observation in 1965 [54] (made just four years after the first planar integrated circuit was discovered), predicted that the number of transistors on integrated circuits would double every few years. Indeed this prediction, thereafter called Moore's law, remains true today and Intel predicts that this will remain true at least until the end of this decade [55].
Ian J. Taylor, Andrew B. Harrison
7. Web Services
Abstract
Up until recently, data has been exported on the World Wide Web for human consumption in the form of Web pages. Most people therefore use the Web to read news/articles, to buy goods and services, to manage on-line accounts and so on. For this purpose, we use a Web browser and access information mostly through this medium.
From a publishing perspective, this involves converting the raw information, from a database, for example, into HTML or similar language so that it can be rendered in the correct form. Further, many Web sites collate information from other sites via Web pages, which is a bizarre occurrence involving decoding and parsing human-readable information not intended for machines at all (see Fig. 7.1).
Ian J. Taylor, Andrew B. Harrison
8. Distributed Objects and Agent Technologies
Abstract
Distributed object technology has been the mainstay of many corporations for many years, providing a well-defined mechanism for enacting complex yet predictable business processes across the corporate network. It has been used to simplify client-server systems and connections to remote databases through the use of object-oriented APIs, which stemmed from the popularity of the object-oriented programming paradigm. To provide this context, the concept of objects is discussed in the next section and then related to how these can be transferred across a network within distributed object systems. The two distributed object systems we focus on in this book are CORBA and Jini, although we also discuss in Section 8.3 how distributed objects relate to the popular agent paradigm.
An overview of the salient features of CORBA is given in this chapter in Section 8.2, but Jini is discussed separately in Chapter 14. However, the underlying Java serialization and Remote Method Invocation (RMI) mechanisms for Jini are discussed here in Section 8.5 in order to illustrate how the state of a Java object can be saved and passed across a network as an argument to a remote method on a distributed object. This mechanism is illustrative of the underlying problems that need to be addressed within any distributed object system. Further, the same serialization issues also need to be addressed in systems based on XML technologies, such as Web Services, which were discussed in the previous chapter. However, distributed object technologies differ in approach from Web Services in a number of ways, primarily in that they maintain state across the distributed entities in the system.
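A minimal sketch of the serialization step referred to above (plain Java, standing in for the Section 8.5 discussion): an object's state is flattened into a byte stream that could cross a network, then reconstructed on the far side.

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.ObjectInputStream;
    import java.io.ObjectOutputStream;
    import java.io.Serializable;

    public class SerializationDemo {
        static class Point implements Serializable {
            final int x, y;
            Point(int x, int y) { this.x = x; this.y = y; }
        }

        public static void main(String[] args) throws Exception {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(new Point(3, 4)); // state -> byte stream
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                Point p = (Point) in.readObject(); // bytes -> copy of the object
                System.out.println("x=" + p.x + " y=" + p.y);
            }
        }
    }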
Ian J. Taylor, Andrew B. Harrison
9. Grid Computing
Abstract
Over the past decade there has been a huge shift in the way we perceive and utilize computing resources. Previously, computing needs were typically achieved by using localised resources and infrastructures, and high-end scientific calculations would be performed on dedicated parallel machines. However, nowadays, we are seeing an increasing number of wide-area distributed-computing applications, which has led to the development of many different types of middleware, libraries and tools that allow geographically distributed resources to be unified into a single application. This approach to distributed computing has come under a wide number of different names, such as metacomputing, scalable computing, global computing, Internet computing and, more recently, Grid computing.
Ian J. Taylor, Andrew B. Harrison

Protocols and Architectures I — P2P Applications

10. Gnutella
Abstract
Gnutella defined and popularised modern P2P technology through its truly decentralized design and implementation. It arrived right around the time when centrally organized solutions were being targeted and provided a mechanism that offered a much more tolerant structure, where no single entity could be isolated to bring down the entire network. The Gnutella network consists of thousands of information providers, which are not indexed in a central place. Therefore, to shut down such a network is not trivial since a vast number of peers (i.e., many hundreds) would have to be eliminated.
This chapter provides an overview of the original 0.4 version of the Gnutella network. It combines a conceptual overview and a user-friendly rewrite of the Gnutella protocol specification [168]. A historical perspective is provided, along with usage scenarios, which include joining and searching the Gnutella network. This is followed by a detailed account of its protocol specification that provides the fundamental information required for a competent programmer to build a Gnutella network from scratch.
Ian J. Taylor, Andrew B. Harrison
11. Scalability
Abstract
P2P has led to a recent renewal of interest in decentralized systems and a number of scalable applications being deployed on the Internet. Although the underlying Internet itself is the largest decentralized computer system in the world, most systems of the 1990s employed a completely centralized topology, driven by the massive growth of the Web as discussed in Chapter 6. With the emergence of P2P in early 2000, there has been a shift towards radically decentralized architectures, such as Gnutella [4]. In practice, however, extreme architectural choices in either direction are seldom the way to build a usable system. Most current P2P file-sharing software, for example, uses a hybrid of the two approaches.
In this chapter, we look at the ways in which peers are organized within ad hoc, pervasive, multi-hop networks by providing an overview of two common scalable approaches often referred to broadly as structured and unstructured P2P networks. Structured networks adopt a somewhat hierarchical approach by creating a structured overlay across the network and dividing content across the distributed network by using hash functions. An unstructured approach, however, adapts dynamically by adding caching centres (super peers) across the network in an ad hoc fashion. Both approaches have been shown to scale to large numbers of participants and both have somewhat equal popularity at the time of writing. Although the specifics of the various approaches within these categories are out of scope for this chapter, we do provide here a high-level overview of the basics of each approach and discuss their similarities and differences.
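As a minimal sketch of the hash-based placement that structured overlays use (a simplified consistent-hashing ring, not any particular system's routing algorithm): node identifiers and content keys are hashed into the same circular space, and each key is stored on the first node at or after its position.

    import java.math.BigInteger;
    import java.security.MessageDigest;
    import java.util.Map;
    import java.util.TreeMap;

    public class HashRing {
        private final TreeMap<BigInteger, String> ring = new TreeMap<>();

        private static BigInteger hash(String s) throws Exception {
            byte[] digest = MessageDigest.getInstance("SHA-1")
                    .digest(s.getBytes("UTF-8"));
            return new BigInteger(1, digest); // position on the ring
        }

        public void addNode(String node) throws Exception {
            ring.put(hash(node), node);
        }

        public String nodeFor(String contentKey) throws Exception {
            // First node at or after the key's position, wrapping around.
            Map.Entry<BigInteger, String> e = ring.ceilingEntry(hash(contentKey));
            return (e != null ? e : ring.firstEntry()).getValue();
        }

        public static void main(String[] args) throws Exception {
            HashRing overlay = new HashRing();
            overlay.addNode("peer-A");
            overlay.addNode("peer-B");
            overlay.addNode("peer-C");
            System.out.println("song.mp3 -> " + overlay.nodeFor("song.mp3"));
        }
    }

Real structured networks add routing tables on top of this placement rule so that any peer can locate the responsible node in a logarithmic number of hops.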
Ian J. Taylor, Andrew B. Harrison
12. Freenet
Abstract
This chapter is dedicated to the Freenet distributed information storage system. Freenet was chosen to be included because it is an excellent example of how many of the techniques discussed so far in this book can be adapted and used in a practical and innovative system. For example, Freenet works within a P2P environment (Chapter 6) and addresses the inherently untrustworthy and unreliable participants within such a network. Freenet is self-organizing and incorporates a learning algorithm that allows the network to adapt its routing behaviour based on prior interactions. This algorithm is interestingly similar to social networking and achieves a power-law (centralized-decentralized) structure (discussed in Chapter 11) in a self-organizing manner. Such a technique offers a different perspective on how to efficiently scale P2P networks (e.g., Gnutella in Chapter 10) to hundreds of thousands of nodes.
Freenet was designed from the ground up to provide extensive protection from hostile attack, from both inside the network and out by addressing key information privacy issues. Freenet therefore implements various security strategies that maintain privacy for all participants, regardless of their particular role. The individual security techniques that are used collectively in Freenet were discussed in Chapter 4.
Ian J. Taylor, Andrew B. Harrison
13. BitTorrent
Abstract
The above quote is from Bram Cohen, BitTorrent's author, in an interview with Wired in 2005 [184]. The first version of the BitTorrent protocol was presented at the first CodeCon conference [185] in San Francisco in February 2002 and subsequently became one of the most popular Internet file-sharing protocols [184]. In essence, BitTorrent introduced two key concepts that were novel compared to its file-sharing competitors at the time. First, rather than providing a search protocol itself, it was designed to integrate seamlessly with the Web, making files (torrents) available via Web pages that could be found using standard Web search tools. Second, it enabled so-called file swarming: once a peer starts downloading a file, it immediately makes whatever portion it has already downloaded available for sharing.

The file-swarming process is enabled through the use of a tracker, an HTTP-based server that dynamically keeps downloading peers up to date on the locations and availability of pieces of the file in question on the network. The tracker can also monitor users' behaviour on the network and implement a tit-for-tat scheme, which divides bandwidth according to how much a peer contributes to the other peers in the network. File swarming made BitTorrent an extremely attractive tool for sharing files because, by enabling simultaneous downloads of pieces of the same file from multiple users, it allowed users to download at the full capacity of their broadband connection. This is significant because a broadband connection typically has far lower upload bandwidth than download bandwidth (upload can be around ten times slower). Being able to connect to, say, ten peers balances this mismatch and exploits the full potential of the Internet link, resulting in files being downloaded several times faster than with other file-sharing systems on the Internet at that time.

The BitTorrent protocol has therefore had a massive impact on file-sharing applications, and similar schemes have since been adopted by competitors. Further, its use has far outgrown the illicit file-sharing arena: nowadays the BitTorrent protocol, or similar techniques, are used in a multitude of different applications in science and business, and it has even been integrated into hardware devices. In this sense, the protocol has grown up and is now taken very seriously throughout the Internet community.
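A minimal sketch of the integrity check underpinning file swarming (the metainfo and wire formats of the real protocol are not reproduced here): the .torrent metainfo carries a SHA-1 digest for every piece, and a peer only announces a downloaded piece to others once its digest matches.

    import java.security.MessageDigest;
    import java.util.Arrays;

    public class PieceCheck {
        static boolean pieceIsValid(byte[] piece, byte[] expectedSha1)
                throws Exception {
            byte[] actual = MessageDigest.getInstance("SHA-1").digest(piece);
            return Arrays.equals(actual, expectedSha1);
        }

        public static void main(String[] args) throws Exception {
            byte[] piece = "...a 256 KB slice of the file...".getBytes("UTF-8");
            byte[] expected = MessageDigest.getInstance("SHA-1").digest(piece);
            System.out.println("valid: " + pieceIsValid(piece, expected)); // true
            piece[0] ^= 1; // simulate corruption in transit
            System.out.println("valid: " + pieceIsValid(piece, expected)); // false
        }
    }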
Ian J. Taylor, Andrew B. Harrison

Protocols and Architectures II — Middleware

14. Jini
Abstract
This chapter gives an overview of Jini, which provides a further example of the well-known distributed-object-based systems discussed in Chapter 8. Jini is similar in concept to industry-pervasive systems such as CORBA [136] and DCOM [3]. It is distinguished by being based on Java, and derives many features purely from this Java basis (e.g., the use of RMI and Java serialization). There are other Java frameworks from Sun which would appear to overlap with Jini, such as Enterprise JavaBeans (EJBs) [190]. However, whereas EJBs make it easier to build business logic servers, Jini could be used to distribute these services in a network plug-and-play manner.
In this chapter, a background is given into the development of Jini and into the network plug-and-play manner in which Jini accesses distributed objects. Specifically, this chapter will build on the Java RMI description and Java serialization mechanisms, discussed in Section 8.5, which form the transportation backbone for Jini. The discovery of Jini services is described and the notion of a Jini proxy is introduced.
Ian J. Taylor, Andrew B. Harrison
15. Jxta
Abstract
The Jxta middleware is a set of open, generalized peer-to-peer protocols that allow any connected device (cell phone to PDA, PC to server) on the network to communicate and collaborate. Jxta is an open-source project that has been developed by a number of contributors and, as such, it is still evolving. For the most recent Jxta Technology Specification, see [24].
The goal of project Jxta is to develop and standardize basic building blocks and services to enable developers to build and deploy interoperable P2P services and applications. The Jxta project intends to address this problem by providing a simple and generic P2P platform to host any kind of network services. The term Jxta is short for juxtapose, as in side by side; it is a recognition that P2P is juxtaposed to client/server or Web-based computing, which is today's traditional distributed-computing model. Jxta provides a common set of open protocols and an open-source reference implementation for developing P2P applications.
Ian J. Taylor, Andrew B. Harrison
16. Web Services Protocols
Abstract
The aim of the Web Service protocols is to provide a means of describing data and behaviour in a machine-processable way. Web Services use XML to describe metadata for discovery purposes and to define behaviour, and in general presume that XML documents will be the payload of the messages exchanged. These XML documents are not typically descriptions of resources in the sense that the Web exchanges resource representations, but rather encoded data that can trigger operations and responses from remote hosts. As such, Web Services are akin to distributed object technologies. Indeed, SOAP originally stood for Simple Object Access Protocol and early versions influenced the XML Remote Procedure Call (XML-RPC) specification [195].
This affinity with distributed object approaches, as well as the way in which Web Services use the Web, has meant they have had their detractors, as the quote above demonstrates. The debate between the Web Services community and the Web community led to the so-called SOAP versus REST war that raged from 2002 until roughly 2007 with no outright victor. Currently, in 2008, a mood of sometimes resigned acceptance has fallen on the debate with many authors suggesting that both approaches are suitable for different kinds of systems, in terms of scale, heterogeneity and underlying purpose. Those with reservations about Web Services technologies argue that they go against the grain of the Web, because they do not use its existing infrastructure, such as URIs for identifying resources, and HTTP for modelling behaviour. Instead, SOAP and WSDL define their requirements from scratch and merely use Web technology to expose and transfer these formats. Furthermore, during the period in which a plethora of Web Services specifications were being developed (see Section 16.4 for some of the most salient), many argued that the quantity and their complexity would lead to unmanageable and brittle systems. At the core, this debate was about differences in architectural approaches, in particular between object-orientation, service-orientation and resource-orientation. The abstractions underpinning these architectures and the relationships between them are described in Section 8.4.
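To illustrate the point about encoded data triggering operations, here is a minimal SOAP message built with the SAAJ API (bundled with Java SE up to version 10, a separate dependency in later versions); the operation name and namespace are invented:

    import javax.xml.namespace.QName;
    import javax.xml.soap.MessageFactory;
    import javax.xml.soap.SOAPBody;
    import javax.xml.soap.SOAPBodyElement;
    import javax.xml.soap.SOAPMessage;

    public class SoapDemo {
        public static void main(String[] args) throws Exception {
            SOAPMessage message = MessageFactory.newInstance().createMessage();
            SOAPBody body = message.getSOAPBody();
            // The body names a remote operation and supplies its arguments,
            // rather than carrying a resource representation as the Web does.
            SOAPBodyElement op = body.addBodyElement(
                    new QName("http://example.org/stock", "getQuote", "m"));
            op.addChildElement("symbol").addTextNode("ACME");
            message.saveChanges();
            message.writeTo(System.out); // serialize the envelope as XML
        }
    }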
Ian J. Taylor, Andrew B. Harrison
17. OGSA
Abstract
The Globus toolkit (GT) is the de facto open source toolkit for Grid computing. The functionality of the GT version 3.x is exposed as a collection of virtual Open Grid Services Architecture services [7]. OGSA services, or Grid services, extend Web Services, discussed in the previous chapter, to add features that are often needed within distributed applications. Specifically, OGSA adds state to Web Services in order to control the remote service during its lifetime. Whereas Web Services are stateless, OGSA-based services are stateful. OGSA services represent the GT's various components, e.g., GRAM, MDS, etc., described in Chapter 9, using this unified representation and can be aggregated and used within virtual organizations in a number of different ways.
However, the road to OGSA realisation has not been easy. In 2002 the Open Grid Services Infrastructure (OGSI) specification was announced. This specification defined the extensions to WSDL needed in order to represent and enable stateful Web Services. The designers of OGSI introduced the notion of a Grid service, which extended a basic Web Service with a number of additions to which a Grid service must adhere. Stateful resources within OGSI were modelled as Web Services that support the GridService portType (see Section 16.2.2), which is an extension of the WSDL portType.
Ian J. Taylor, Andrew B. Harrison
18. Web 2.0
Abstract
The term Web 2.0 is applied to diverse contexts and technologies and is surrounded with varying degrees of hype. It seems to sum up a “movement,” but eludes precise definition. Is it the dawning of the age of Aquarius, or a branding exercise designed to rescue the Web after the dot-com bubble burst? Is it a set of technologies, or a business model? Can it be summed up by Web pages with curvy boxes and 3D icons, or are there real insights underlying it?
Ian J. Taylor, Andrew B. Harrison
19. On the Horizon
Abstract
As memory becomes more affordable, devices become more compact and networks become more ubiquitous, scenarios that were once nothing more than visions are beginning to become reality. These scenarios leverage the pervasiveness of computing devices and networking capabilities, offering possibilities in which the boundaries between human and computer, and between mobile device and supercomputer are blurred. Approaches drawing on these trends go under different names, and have different areas of focus. In this chapter we look at some of the paradigms and concepts that are defining next-generation distributed systems, beginning with systems and technologies that are already being implemented and used, and moving towards the more esoteric visions proposed by contemporary commentators.
Ian J. Taylor, Andrew B. Harrison

Deployment

20. Distributed Object Deployment Using Jini
Abstract
In this chapter, two specific examples illustrate the use of RMI and Jini. The focus of this chapter is not to create complicated remote Java services, and thereby give a lesson in Java programming; rather, it is to provide the reader with the core framework and components needed in order to run simple RMI and Jini services. Providing complicated services at this stage would simply cloud the issue of understanding the fundamentals of running remote services.
Therefore, the source is very concise but gives all the necessary code needed in order to run and describe remote RMI and Jini services. Such code can easily be extended to provide as complicated a service as the reader desires. The complete set of examples for this chapter and others can be found at the following Web site: http://www.p2pgridbook.com/
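In the same minimal spirit, a complete RMI service needs only a remote interface, an implementation and a registry binding; the sketch below is illustrative and is not the book's downloadable source.

    import java.rmi.Remote;
    import java.rmi.RemoteException;
    import java.rmi.registry.LocateRegistry;
    import java.rmi.registry.Registry;
    import java.rmi.server.UnicastRemoteObject;

    interface Greeting extends Remote {
        String hello(String name) throws RemoteException;
    }

    public class GreetingServer implements Greeting {
        public String hello(String name) { return "Hello, " + name; }

        public static void main(String[] args) throws Exception {
            // Export the object so it can receive remote calls, then
            // advertise the stub under a well-known name.
            Greeting stub = (Greeting) UnicastRemoteObject.exportObject(
                    new GreetingServer(), 0);
            Registry registry = LocateRegistry.createRegistry(1099);
            registry.rebind("greeting", stub);
            System.out.println("Greeting service bound and ready");
        }
    }

A client would then obtain the stub with LocateRegistry.getRegistry(host).lookup("greeting") and invoke hello() as though it were a local call.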
Ian J. Taylor, Andrew B. Harrison
21. P2P Deployment Using Jxta
Abstract
This chapter describes how to deploy applications using Jxta's Java reference implementation [24]. There are two specific examples: the first illustrates how to manually start the Jxta platform and configure it using the Jxta Configurator application; the second demonstrates how to automatically set up an ad hoc networking environment by employing the NetworkManager class, and how a client/server application using Jxta unicast pipes can be created within that environment. The purpose of these examples is twofold: to familiarise the reader with the style of coding involved in creating Jxta applications, and to discuss the tools involved in using the Jxta platform and the considerations that this environment entails.
Jxta is very much an evolving project and during the past four years alone, between the first edition of this book and this one, the Jxta API has changed significantly (even the stable build directory location has changed). The chapter presented here therefore is a complete rewrite of the one found in the previous edition of this book. The examples provided here are re-engineered examples from the first edition using the updated API, and many more examples can be found in the Jxta tutorial. The examples provided here are representative of the Jxta API and of how to program using the toolkit. Even if the API changes again, it is highly unlikely that the style of the approach will differ much as a whole. Code for these examples can be found at the book Web site: http://www.p2pgridbook.com/
Ian J. Taylor, Andrew B. Harrison
22. Web Services Deployment
Abstract
In this chapter we will create, deploy and invoke a Web Service. The code and scripts that we create here, along with pointers to required libraries, can also be downloaded in zip form from: http://www.p2pgridbook.com
Ian J. Taylor, Andrew B. Harrison
23. Web Deployment Using Atom
Abstract
In this chapter we will look at how to deploy onto the Web using the Atom Syndication Format and Publishing Protocol. As described in Chapter 18, Atom provides both a feed format, enabling clients to subscribe to frequently updated content, and a publishing protocol, allowing an author to edit and create new content that clients can receive.
We are going to create an Atom version of the Event service that we made in the Web Services Deployment chapter. This is partly for comparative reasons, and also to show how Atom can be used for more than just a traditional syndication service. We will be able to reuse some of the container classes we defined for the Web Service, specifically the VEvent and VCard classes. We will also create a class that can generate HTML content from them. In doing this, we will convert these classes into their microformat equivalents of hCalendar and hCard. This will make our data types available for machine processing, as well as human-readable in a browser. The code and scripts that we create here, along with pointers to required libraries, can also be downloaded in zip form from: http://www.p2pgridbook.com
Ian J. Taylor, Andrew B. Harrison
ERRATA
Springer-Verlag London Limited
Backmatter
Metadata
Title
From P2P and Grids to Services on the Web
Edited by
Ian J. Taylor
Andrew B. Harrison
Copyright year
2009
Publisher
Springer London
Electronic ISBN
978-1-84800-123-7
Print ISBN
978-1-84800-122-0
DOI
https://doi.org/10.1007/978-1-84800-123-7