2004 | Book

Grid Resource Management

State of the Art and Future Trends

Edited by: Jarek Nabrzyski, Jennifer M. Schopf, Jan Węglarz

Publisher: Springer US

Book series: International Series in Operations Research & Management Science

About this book

Grid Resource Management: State of the Art and Future Trends presents an overview of the state of the field and describes both the real experiences and the current research available today. Grid computing is a rapidly developing and changing field, involving the shared and coordinated use of dynamic, multi-institutional resources. Grid resource management is the process of identifying requirements, matching resources to applications, allocating those resources, and scheduling and monitoring Grid resources over time in order to run Grid applications as efficiently as possible.
While Grids have become almost commonplace, the use of good Grid resource management tools is far from ubiquitous because of the many open issues of the field, including the multiple layers of schedulers, the lack of control over resources, the fact that resources are shared, and that users and administrators have conflicting performance goals.

Table of contents

Introduction to Grids and Resource Management

Chapter 1. The Grid in a Nutshell
Abstract
The emergence and widespread adoption of Grid computing has been fueled by continued growth in both our understanding of application requirements and the sophistication of the technologies used to meet these requirements. We provide an introduction to Grid applications and technologies and discuss the important role that resource management will play in future developments.
Ian Foster, Carl Kesselman
Chapter 2. Ten Actions When Grid Scheduling: The User as a Grid Scheduler
Abstract
In this chapter we present a general architecture or plan for scheduling on a Grid. A Grid scheduler (or broker) must make resource selection decisions in an environment where it has no control over the local resources, the resources are distributed, and information about the systems is often limited or dated. These interactions are also closely tied to the functionality of the Grid Information Services. This Grid scheduling approach has three phases: resource discovery, system selection, and job execution. We detail the steps involved in each phase.
Jennifer M. Schopf
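The three phases can be sketched in a few lines. The sketch assumes hypothetical resource records that already carry the kind of information a Grid Information Service might supply. It compresses the chapter's ten actions into three functions. The field names, the completion-time ranking and the `submit` callback are illustrative assumptions, not the chapter's own design.

```python
from dataclasses import dataclass

@dataclass
class Resource:
    name: str
    authorized: bool   # may this user run here at all?
    free_cpus: int
    queue_wait: float  # hypothetical estimate from an information service
    runtime: float     # hypothetical predicted runtime for this job

def discover(resources, min_cpus):
    """Phase 1, resource discovery: keep usable resources that meet minimum requirements."""
    return [r for r in resources if r.authorized and r.free_cpus >= min_cpus]

def select(candidates):
    """Phase 2, system selection: order candidates by predicted completion time."""
    return sorted(candidates, key=lambda r: r.queue_wait + r.runtime)

def execute(resources, min_cpus, submit):
    """Phase 3, job execution: submit to the best system, falling back to the next on failure."""
    for r in select(discover(resources, min_cpus)):
        if submit(r):
            return r.name
    return None

pool = [
    Resource("a", authorized=False, free_cpus=64, queue_wait=0, runtime=10),
    Resource("b", authorized=True, free_cpus=16, queue_wait=10, runtime=50),
    Resource("c", authorized=True, free_cpus=8, queue_wait=0, runtime=40),
    Resource("d", authorized=True, free_cpus=2, queue_wait=0, runtime=5),
]
print(execute(pool, min_cpus=4, submit=lambda r: True))           # c
print(execute(pool, min_cpus=4, submit=lambda r: r.name != "c"))  # b
```

The scheduler has no control over the local resources, so the fallback loop in `execute` stands in for coping with a submission that fails.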
Chapter 3. Application Requirements for Resource Brokering in a Grid Environment
Abstract
We discuss the problem of resource brokering in a Grid environment from the perspective of general application needs. Starting from an illustrative scenario, these requirements are broken down into the general areas of computation, data, network, security and accounting. Immediate needs for applications to profitably start using the Grid are discussed, along with case studies for applications from astrophysics and particle physics.
Michael Russell, Gabrielle Allen, Tom Goodale, Jarek Nabrzyski, Ed Seidel
Chapter 4. Attributes for Communication Between Grid Scheduling Instances
Abstract
Typically, Grid resources are subject to individual access and usage policies because they are provided by different owners. These policies are usually enforced by local management systems that maintain control of the resources. However, few Grid users are willing to deal with those management systems directly in order to coordinate the resource allocation for their jobs. This leads to a Grid scheduling architecture with several layers. In such an architecture, a higher-level Grid scheduling layer and the lower-level layer of local scheduling systems must efficiently cooperate in order to make the best use of Grid resources. In this chapter we describe attributes characterizing those features of local management systems that can be exploited by a Grid scheduler.
Uwe Schwiegelshohn, Ramin Yahyapour
Chapter 5. Security Issues of Grid Resource Management
Abstract
Secure management of Grid resources presents many challenges. This chapter will examine the security requirements that are essential to Grids and some of the software that is available to meet them. We will discuss how well these security tools have been utilized and review some of the existing and proposed security standards that may be the foundations of the next generation of Grid security tools.
Mary R. Thompson, Keith R. Jackson

Resource Management in Support of Collaborations

Chapter 6. Scheduling in the Grid Application Development Software Project
Abstract
Developing Grid applications is a challenging endeavor that at the moment requires both extensive labor and expertise. The Grid Application Development Software Project (GrADS) provides a system to simplify Grid application development. This system incorporates tools at all stages of the application development and execution cycle. In this chapter we focus on application scheduling, and present the three scheduling approaches developed in GrADS: development of an initial application schedule (launch-time scheduling), modification of the execution platform during execution (rescheduling), and negotiation between multiple applications in the system (metascheduling). These approaches have been developed and evaluated for platforms that consist of distributed networks of shared workstations, and applied to real-world parallel applications.
Holly Dail, Otto Sievert, Fran Berman, Henri Casanova, Asim YarKhan, Sathish Vadhiyar, Jack Dongarra, Chuang Liu, Lingyun Yang, Dave Angulo, Ian Foster
Chapter 7. Workflow Management in GriPhyN
Abstract
This chapter describes the work done within the NSF-funded GriPhyN project in the area of workflow management. The targeted workflows are large both in terms of the number of tasks in a given workflow and in terms of the total execution time of the workflow, which can sometimes be on the order of days. The workflows represent the computation that needs to be performed to analyze large scientific datasets produced in high-energy physics, gravitational physics, and astronomy. This chapter discusses issues associated with workflow management in the Grid in general and also provides a description of the Pegasus system, which can generate executable workflows.
Ewa Deelman, James Blythe, Yolanda Gil, Carl Kesselman

State of the Art Grid Resource Management

Chapter 8. Grid Service Level Agreements: Grid Resource Management with Intermediaries
Abstract
We present a reformulation of the well-known GRAM architecture based on the Service-Level Agreement (SLA) negotiation protocols defined within the Service Negotiation and Access Protocol (SNAP) framework. We illustrate how a range of local, distributed, and workflow scheduling mechanisms can be viewed as part of a cohesive yet open system, in which new scheduling strategies and management policies can evolve without disrupting the infrastructure. This architecture remains neutral to, and in fact strives to mediate, the potentially conflicting resource, community, and user policies.
Karl Czajkowski, Ian Foster, Carl Kesselman, Steven Tuecke
Chapter 9. Condor and Preemptive Resume Scheduling
Abstract
Condor is a batch job system that, unlike many other scheduling systems, allows users to access both dedicated computers and computers that are not always available, perhaps because they are used as desktop computers or are not under local control. This introduces a number of problems, some of which are solved by Condor's preemptive resume scheduling, which is the focus of this chapter. Preemptive resume scheduling allows jobs to be interrupted while running and then restarted later. Condor uses preemption in several ways in order to implement the policies supplied by users, computer owners, and system administrators.
Alain Roy, Miron Livny
Chapter 10. Grid Resource Management in Legion
Abstract
Grid resource management is not just about scheduling jobs on the fastest machines, but rather about scheduling all compute objects and all data objects on machines whose capabilities match the requirements, while preserving site autonomy, recognizing usage policies and respecting conditions for use. In this chapter, we present the Grid resource management of Legion, an object-based Grid infrastructure system. We argue that Grid resource management requires not a one-size-fits-all scheduler but an architectural framework that can accommodate different schedulers for different classes of problems.
Anand Natrajan, Marty A. Humphrey, Andrew S. Grimshaw
Chapter 11. Grid Scheduling with Maui/Silver
Abstract
This chapter provides an overview of the interactions of and services provided by the Maui/Silver Grid scheduling system. The Maui Scheduler provides high performance scheduling for local clusters including resource reservation, availability estimation, and allocation management. Silver utilizes the underlying capabilities of Maui to allow multiple independent clusters to be integrated and intelligently scheduled.
David B. Jackson
Chapter 12. Scheduling Attributes and Platform LSF
Abstract
Scheduling is highly complex in the context of Grid Computing. To draw out this complexity, it makes sense to isolate and investigate key areas of the problem. Here we report on communication attributes between higher- and lower-level scheduling instances. Using Platform LSF as the lower-level scheduling instance, we report on overall agreement and a few points of departure relative to the de facto reference on scheduling attributes detailed in Chapter 4. The key concerns involve access to tentative schedules and control exclusivity. While these ideals are understandable, we show how impractical they prove in the case of production Enterprise deployments; we also challenge the necessity of the schedule-access attribute based on experiences with Platform MultiCluster. Furthermore, experience with the Globus Toolkit™ allows us to expose a lowest-common-denominator tendency in scheduling attributes. We encourage re-assessment of communication attributes subject to these findings and broader comparisons. We also urge the integration of isolated scheduling activities under the framework provided by the Open Grid Services Architecture (OGSA).
Ian Lumb, Chris Smith
Chapter 13. PBS Pro: Grid Computing and Scheduling Attributes
Abstract
The PBS Pro software is a full-featured workload management and job scheduling system with capabilities that cover the entire Grid computing space: security, information, compute, and data. The security infrastructure includes user authentication, access control lists, X.509 certificate support, and cross-site user mapping facilities. Detailed status and usage information is maintained and available both programmatically and via a graphical interface. Compute Grids can be built to support advance reservations, harvest idle desktop compute cycles, and peer schedule work (automatically moving jobs across the room or across the globe). Data management in PBS Pro is handled via automatic stage-in and stage-out of files. The PBS Pro system has numerous site-tunable parameters and can provide access to available scheduling information, information about requesting resources, allocation properties, and information about how an allocation execution can be manipulated.
Bill Nitzberg, Jennifer M. Schopf, James Patton Jones

Prediction and Matching for Grid Resource Management

Chapter 14. Performance Information Services for Computational Grids
Abstract
Grid schedulers or resource allocators (whether they be human or automatic scheduling programs) must choose the right combination of resources from the available resource pool while the performance and availability characteristics of the individual resources within the pool change from moment to moment. Moreover, the scheduling decision for each application component must be made before the component is executed, making scheduling a predictive activity. A Grid scheduler, therefore, must be able to predict what the deliverable resource performance will be for the time period in which a particular application component will eventually use the resource.
In this chapter, we describe techniques for dynamically characterizing resources according to their predicted performance response to enable Grid scheduling and resource allocation. These techniques rely on three fundamental capabilities: extensible and non-intrusive performance monitoring, fast prediction models, and a flexible and high-performance reporting interface. We discuss these challenges in the context of the Network Weather Service (NWS) — an online performance monitoring and forecasting service developed for Grid environments. The NWS uses adaptive monitoring techniques to control intrusiveness, and non-parametric forecasting methods that are lightweight enough to generate forecasts in real-time. In addition, the service infrastructure used by the NWS is portable among all currently available Grid resources and is compatible with extant Grid middleware such as Globus, Legion, and Condor.
Rich Wolski, Lawrence J. Miller, Graziano Obertelli, Martin Swany
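The adaptive, non-parametric forecasting mentioned above can be illustrated with a minimal sketch in the spirit of the NWS. Every predictor is scored on each new measurement, and the forecast comes from whichever predictor has the lowest cumulative error so far. The three-predictor set, the class name `AdaptiveForecaster` and the bandwidth values are illustrative assumptions. None of them is taken from the NWS code or API.

```python
from statistics import mean, median

# A deliberately small, hypothetical predictor set; a real service would use more.
def last_value(history):
    return history[-1]

def running_mean(history):
    return mean(history)

def median_of_last5(history):
    return median(history[-5:])

PREDICTORS = {"last": last_value, "mean": running_mean, "median5": median_of_last5}

class AdaptiveForecaster:
    """Tracks each predictor's cumulative absolute error and forecasts with the current best one."""

    def __init__(self):
        self.history = []
        self.errors = {name: 0.0 for name in PREDICTORS}

    def update(self, measurement):
        # Score every predictor on how well it would have predicted this measurement.
        if self.history:
            for name, predict in PREDICTORS.items():
                self.errors[name] += abs(predict(self.history) - measurement)
        self.history.append(measurement)

    def forecast(self):
        if not self.history:
            raise ValueError("no measurements yet")
        best = min(self.errors, key=self.errors.get)
        return best, PREDICTORS[best](self.history)

fc = AdaptiveForecaster()
for bandwidth_mbps in [10, 10, 10, 10, 50, 10, 10, 10]:
    fc.update(bandwidth_mbps)
print(fc.forecast())  # ('median5', 10)
```

Each predictor is cheap to evaluate, so the whole selection step stays lightweight enough to run on every new measurement.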
Chapter 15. Using Predicted Variance for Conservative Scheduling on Shared Resources
Abstract
In heterogeneous and dynamic environments, efficient execution of parallel computations can require mappings of tasks to processors with performance that is both irregular and time varying. We propose a conservative scheduling policy that uses information about expected future variance in resource capabilities to produce more efficient data mapping decisions.
We first present two techniques to estimate future load and variance, one based on normal distributions and another using tendency-based prediction methodologies. We then present a family of stochastic scheduling algorithms that exploit such predictions when making data mapping decisions. We describe experiments in which we apply our techniques to an astrophysics application. The results of these experiments demonstrate that conservative scheduling can produce execution times that are significantly faster and less variable than other techniques.
Jennifer M. Schopf, Lingyun Yang
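Here is a minimal sketch of variance-aware data mapping. It assumes that each host's predicted load mean and standard deviation are already known, for example from a monitoring service such as the NWS. The capacity estimate `1 / (1 + mean + stddev)` is a hypothetical stand-in for the chapter's techniques. It penalizes hosts whose load is expected to fluctuate, so those hosts receive less data than a mean-only estimate would give them.

```python
def conservative_shares(hosts, total_units):
    """Split total_units of data across hosts in proportion to a conservative capacity estimate.

    hosts: list of (name, predicted_mean_load, predicted_load_stddev).
    The formula 1 / (1 + mean + stddev) is an illustrative assumption.
    """
    caps = {name: 1.0 / (1.0 + m, s)[0] if False else 1.0 / (1.0 + m + s) for name, m, s in hosts}
    total_cap = sum(caps.values())
    raw = {name: total_units * cap / total_cap for name, cap in caps.items()}
    alloc = {name: int(share) for name, share in raw.items()}
    # Hand out units lost to rounding down, largest fractional remainder first.
    leftover = total_units - sum(alloc.values())
    for name in sorted(raw, key=lambda n: raw[n] - alloc[n], reverse=True)[:leftover]:
        alloc[name] += 1
    return alloc

print(conservative_shares([("a", 0.0, 0.0), ("b", 0.5, 0.5)], 300))  # {'a': 200, 'b': 100}
```

Two hosts with the same mean load but different variance end up with different shares. This is the effect conservative scheduling relies on.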
Chapter 16. Improving Resource Selection and Scheduling Using Predictions
Abstract
The introduction of computational Grids has resulted in several new problems in the area of scheduling that can be addressed using predictions. The first problem is selecting where to run an application on the many resources available in a Grid. Our approach to help address this problem is to provide predictions of when an application would start to execute if submitted to specific scheduled computer systems. The second problem is gaining simultaneous access to multiple computer systems so that distributed applications can be executed. We help address this problem by investigating how to support advance reservations in local scheduling systems. Our approaches to both of these problems are based on predictions for the execution time of applications on space-shared parallel computers. As a side effect of this work, we also discuss how predictions of application run times can be used to improve scheduling performance.
Warren Smith
Chapter 17. The ClassAds Language
Abstract
The Classified Advertisements (ClassAds) language facilitates the representation and participation of heterogeneous resources and customers in the resource discovery and scheduling frameworks of highly dynamic distributed environments. Although developed in the context of the Condor system, the ClassAds language is an independent technology that has many applications, especially in systems that exhibit the uncertainty and dynamism inherent in large distributed systems. In this chapter, we present a detailed description of the structure and semantics of the ClassAds language.
Rajesh Raman, Marvin Solomon, Miron Livny, Alain Roy
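Below is a rough Python analogue of ClassAd matchmaking. It assumes that ads are plain dictionaries and that `Requirements` and `Rank` are Python callables taking `(my, target)`. Real ClassAds are written in their own expression language, with `MY.` and `TARGET.` scoping and an `UNDEFINED` value for missing attributes, and this sketch models none of that. It only shows the symmetric two-way requirements check followed by a rank-based choice. The ads themselves are invented.

```python
def matches(a, b):
    """A match is symmetric: each ad's Requirements must accept the other ad."""
    return a["Requirements"](a, b) and b["Requirements"](b, a)

def best_match(job, machines):
    candidates = [m for m in machines if matches(job, m)]
    if not candidates:
        return None
    # Among compatible machines, prefer the one the job ranks highest.
    return max(candidates, key=lambda m: job["Rank"](job, m))

job = {
    "Owner": "alice",
    "ImageSize": 256,
    "Requirements": lambda my, target: target["OpSys"] == "LINUX" and target["Memory"] >= my["ImageSize"],
    "Rank": lambda my, target: target["Memory"],
}

machines = [
    {"Name": "m1", "OpSys": "LINUX", "Memory": 512, "Requirements": lambda my, target: True},
    {"Name": "m2", "OpSys": "LINUX", "Memory": 1024, "Requirements": lambda my, target: target["Owner"] != "bob"},
    {"Name": "m3", "OpSys": "SOLARIS", "Memory": 2048, "Requirements": lambda my, target: True},
    {"Name": "m4", "OpSys": "LINUX", "Memory": 4096, "Requirements": lambda my, target: target["Owner"] == "carol"},
]

print(best_match(job, machines)["Name"])  # m2
```

Here m4 has the most memory, but its owner's policy rejects this job. Both parties constrain the match, not only the customer.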
Chapter 18. Multicriteria Aspects of Grid Resource Management
Abstract
Grid resource management systems should take into consideration the application requirements and user preferences on the one hand and virtual organizations' policies on the other hand. In order to satisfy both users and resource owners, many metrics, criteria, and constraints should be introduced to formulate multicriteria strategies for Grid resource management problems.
In this chapter we argue that Grid resource management involves multiple criteria and as such requires multicriteria decision support. We discuss available multicriteria optimization techniques and methods for user preference modeling. The influence of the Grid nature on the resource management techniques used is also emphasized, including issues such as dynamic behavior, uncertainty, and incomplete information. We present three aspects of the resource management process: (i) providing the resource management system with all the necessary information concerning accessible resources, application requirements, and user preferences, (ii) making decisions that map tasks to resources in the best possible way, and (iii) controlling the applications and adapting to changing conditions of the Grid environment.
Krzysztof Kurowski, Jarek Nabrzyski, Ariel Oleksiak, Jan Węglarz
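To make the multicriteria idea concrete, here is a small sketch that assumes every criterion is to be minimized. It first filters candidate allocations down to the Pareto-optimal ones. It then applies a weighted sum as one simple, hypothetical model of user preferences. The chapter discusses a broader range of multicriteria techniques, and the candidate names and values below are invented.

```python
def dominates(a, b):
    """a dominates b if it is no worse on every criterion and strictly better on one (all minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(options):
    """Names of options not dominated by any other option."""
    return sorted(
        name for name, v in options.items()
        if not any(dominates(w, v) for other, w in options.items() if other != name)
    )

def choose(options, weights):
    """Pick the Pareto-optimal option with the lowest weighted sum of criteria."""
    return min(pareto_front(options), key=lambda n: sum(w * c for w, c in zip(weights, options[n])))

# Hypothetical (completion time in minutes, cost in credits) per candidate allocation.
options = {"fast": (10, 100), "cheap": (50, 20), "mid": (30, 50), "bad": (60, 120)}
print(pareto_front(options))    # ['cheap', 'fast', 'mid']
print(choose(options, (1, 1)))  # cheap
print(choose(options, (5, 1)))  # fast
```

The weights are where user preferences enter. Changing them moves the choice along the Pareto front without reconsidering dominated options.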
Chapter 19. A Metaheuristic Approach to Scheduling Workflow Jobs on a Grid
Abstract
In this chapter we consider the problem of scheduling workflow jobs on a Grid. This problem consists of assigning Grid resources to the tasks of a workflow job across multiple administrative domains in a way that minimizes the execution time of a particular set of tasks. The problem is formulated as a multi-mode resource-constrained project scheduling problem with schedule-dependent setup times, which extends the classical resource-constrained project scheduling problem (RCPSP) with makespan minimization. We present a binary linear programming (0–1 LP) formulation of the problem and propose a local search metaheuristic approach to solve it.
Marek Mika, Grzegorz Waligóra, Jan Węglarz

Data-Centric Approaches for Grid Resource Management

Chapter 20. Storage Resource Managers: Essential Components for the Grid
Abstract
Storage Resource Managers (SRMs) are middleware components whose function is to provide dynamic space allocation and file management of shared storage components on the Grid. They complement Compute Resource Managers and Network Resource Managers in providing storage reservation and dynamic information on storage availability for the planning and execution of a Grid job. SRMs manage two types of resources: space and files. When managing space, SRMs negotiate space allocation with the requesting client and/or assign default space quotas. When managing files, SRMs allocate space for files, invoke file transfer services to move files into the space, pin files for a certain lifetime, release files upon the client's request, and use file replacement policies to optimize the use of the shared space. SRMs can be designed to provide effective sharing of files by monitoring the activity of shared files and making dynamic decisions on which files to replace when space is needed. In addition, SRMs perform automatic garbage collection of unused files by removing selected files whose lifetime has expired when space is needed. In this chapter we discuss the design considerations for SRMs, their functionality, and their interfaces. We demonstrate the use of SRMs with several examples of real implementations that are in use today, either routinely or in prototype form.
Arie Shoshani, Alexander Sim, Junmin Gu
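Below is a toy model of the file pinning and lifetime-based garbage collection described above. It assumes a single shared disk of fixed capacity and a replacement policy that evicts the earliest-expired pin first. The class and method names are hypothetical and do not correspond to the actual SRM interface.

```python
class StorageManager:
    """Toy space manager: files are pinned for a lifetime; expired pins are reclaimed when space is needed."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.files = {}  # name -> [size, pin_expiry_time]

    def free_space(self):
        return self.capacity - sum(size for size, _ in self.files.values())

    def pin(self, name, size, lifetime, now):
        if name in self.files:
            # Already present: just extend the pin.
            entry = self.files[name]
            entry[1] = max(entry[1], now + lifetime)
            return
        self._reclaim(size, now)
        self.files[name] = [size, now + lifetime]

    def release(self, name, now):
        # The client is done with the file: let its pin expire immediately.
        if name in self.files:
            self.files[name][1] = min(self.files[name][1], now)

    def _reclaim(self, needed, now):
        free = self.free_space()
        if free >= needed:
            return
        # Garbage-collection candidates: files whose pin has expired, earliest expiry first.
        expired = sorted((exp, name) for name, (size, exp) in self.files.items() if exp <= now)
        if free + sum(self.files[n][0] for _, n in expired) < needed:
            raise RuntimeError("not enough reclaimable space")
        for _, name in expired:
            if free >= needed:
                break
            free += self.files.pop(name)[0]

sm = StorageManager(capacity=100)
sm.pin("a.dat", 60, lifetime=10, now=0)
sm.pin("b.dat", 30, lifetime=100, now=0)
sm.pin("c.dat", 40, lifetime=10, now=20)  # a.dat's pin has expired, so it is evicted
print(sorted(sm.files))  # ['b.dat', 'c.dat']
```

Checking the total reclaimable space before evicting anything means a request that cannot succeed leaves the cache untouched.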
Chapter 21. NeST: A Grid Enabled Storage Appliance
Abstract
We describe NeST, a flexible software-only storage appliance designed to meet the storage needs of the Grid. NeST has three key features that make it well-suited for deployment in a Grid environment. First, NeST provides a generic data transfer architecture that supports multiple data transfer protocols (including GridFTP and NFS), and allows for the easy addition of new protocols. Second, NeST is dynamic, adapting itself on-the-fly so that it runs effectively on a wide range of hardware and software platforms. Third, NeST is Grid-aware, implying that features that are necessary for integration into the Grid, such as storage space guarantees, mechanisms for resource and data discovery, user authentication, and quality of service, are a part of the NeST infrastructure. We include a practical discussion about building Grid tools using the NeST software.
John Bent, Venkateshwaran Venkataramani, Nick LeRoy, Alain Roy, Joseph Stanley, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau, Miron Livny
Chapter 22. Computation Scheduling and Data Replication Algorithms for Data Grids
Abstract
Data Grids seek to harness geographically distributed resources for large-scale data-intensive problems such as those encountered in high energy physics, bioinformatics, and other disciplines. These problems typically involve numerous, loosely coupled jobs that both access and generate large data sets. Effective scheduling in such environments is challenging, because of a need to address a variety of metrics and constraints (e.g., resource utilization, response time, global and local allocation policies) while dealing with multiple, potentially independent sources of jobs and a large number of storage, compute, and network resources.
We describe a scheduling framework that addresses these problems. Within this framework, data movement operations may be either tightly bound to job scheduling decisions or performed by a decoupled, asynchronous process on the basis of observed data access patterns and load. We develop a family of job scheduling and data movement (replication) algorithms and use simulation studies to evaluate various combinations. Our results suggest that while it is necessary to consider the impact of replication on the scheduling strategy, it is not always necessary to couple data movement and computation scheduling. Instead, these two activities can be addressed separately, thus significantly simplifying the design and implementation of the overall Data Grid system.
Kavitha Ranganathan, Ian Foster

Quality of Service: QoS

Chapter 23. GARA: A Uniform Quality of Service Architecture
Abstract
Many Grid applications, such as interactive and collaborative environments, can benefit from guarantees for resource performance or quality of service (QoS). Although QoS mechanisms have been developed for different types of resources, they are often difficult to use together because they have different semantics and interfaces. Moreover, many of them do not allow QoS requests to be made in advance of when they are needed. In this chapter, we describe GARA, which is a modular and extensible QoS architecture that allows users to make advance reservations for different types of QoS. We also describe our implementation of network QoS in detail.
Alain Roy, Volker Sander
Chapter 24. QoS-Aware Service Composition for Large-Scale Peer-to-Peer Systems
Abstract
In this chapter, we present a scalable QoS-aware service composition framework, SpiderNet, for large-scale peer-to-peer (P2P) systems. The SpiderNet framework comprises: (1) service path selection, which is responsible for selecting and composing proper service components into an end-to-end service path satisfying the user's functional and quality requirements; (2) service path instantiation, which decides the specific peers where the chosen service components are actually instantiated, based on the distributed, dynamic, and composite resource information; and (3) benefit-driven clustering, which dynamically organizes a large-scale P2P system into an overlay network, based on each peer's benefit. Conducting extensive simulations of a large-scale P2P system (10⁴ peers), we show that SpiderNet can achieve a much higher service provisioning success rate than other common heuristic approaches.
Xiaohui Gu, Klara Nahrstedt

Resource Management in Peer-to-Peer Environments

Chapter 25. A Peer-to-Peer Approach to Resource Location in Grid Environments
Abstract
Resource location (or discovery) is a fundamental service for resource-sharing environments: given desired resource attributes, the service returns locations of matching resources. Designing such a service for a Grid environment of the scale and volatility of today’s peer-to-peer systems is not trivial. We explore part of the design space through simulations on an emulated Grid. To this end, we propose four axes that define the resource location design space, model and implement an emulated Grid, evaluate a set of resource discovery mechanisms, and discuss results.
Adriana Iamnitchi, Ian Foster
Chapter 26. Resource Management in the Entropia System
Abstract
Resource management for desktop Grids is particularly challenging compared with other forms of Grid resource management, because of heterogeneity in systems and networks and because resources are shared with desktop users. Desktop Grids must support thousands to millions of computers with low management overhead. We describe the resource management techniques used in the Entropia Internet and Enterprise desktop Grids to make the systems manageable, usable, and highly productive. These techniques exploit a wealth of database, Internet, and traditional high-performance computing technologies and have been demonstrated at a scale of hundreds of thousands of computers. In particular, the Enterprise system includes extensive support for central management, failure management, and robust execution.
Andrew A. Chien, Shawn Marlin, Stephen T. Elbert
Chapter 27. Resource Management for the Triana Peer-to-Peer Services
Abstract
In this chapter we discuss the Triana problem solving environment and its distributed implementation. Triana-specific distribution mechanisms are described along with the corresponding mappings. We outline the middleware-independent nature of this implementation through the use of an application-driven API, called the GAT. The GAT supports many modes of operation including, but not limited to, Web Services and JXTA. We describe how the resources are managed within this context as Triana services and give an overview of one specific GAT binding using JXTA, used to prototype the distributed implementation of Triana services. A discussion of Triana resource management is given with emphasis on Triana-service organization within both P2P and Grid computing environments.
Ian Taylor, Matthew Shields, Ian Wang

Economic Approaches and Grid Resource Management

Chapter 28. Grid Resource Commercialization: Economic Engineering and Delivery Scenarios
Abstract
In this chapter we consider the architectural steps needed to commercialize Grid resources as technical focus shifts towards business requirements. These requirements have been met for conventional utility resources through commoditization, a variety of market designs, customized contract design, and decision support. Decision support is needed to exploit the flexibility provided by Grids in the context of uncertain and dynamic user requirements and resource prices. We provide a detailed example of how the decision support for users can be formulated as a multi-stage stochastic optimization problem. We derive required architectural features for commercialization using inspiration from conventional utilities and considering the delivery context of Grid resources. We consider two basic delivery scenarios: a group of peers and a group with an external provider. In summary, we provide a conceptual framework for Grid resource commercialization including both the understanding of the underlying resource commodity characteristics and also the delivery context.
Chris Kenyon, Giorgos Cheliotis
Chapter 29. Trading Grid Services within the UK E-Science Grid
Abstract
The Open Grid Services Architecture (OGSA) presents the Grid community with an opportunity to define standard service interfaces to enable the construction of an interoperable Grid infrastructure. The provision of this infrastructure has, to date, come from the donation of time and effort from the research community primarily for their own use. The growing involvement of industry and commerce in Grid activity is accelerating the need to find business models that support their participation. It is therefore essential that an economic infrastructure be incorporated into the OGSA to support economic transactions between service providers and their clients. This chapter describes current standardization efforts taking place with the Global Grid Forum and the implementation of such an architecture within the UK e-Science Programme through the Computational Markets project.
Steven Newhouse, Jon MacLaren, Katarzyna Keahey
Chapter 30. Applying Economic Scheduling Methods to Grid Environments
Abstract
Scheduling becomes more difficult when resources are geographically distributed and owned by individuals with different access and cost policies. This chapter addresses the idea of applying economic models to Grid scheduling. We describe a scheduling infrastructure that implements a market-economy approach, and we evaluate the efficiency of this approach using simulations with real workload traces. Our evaluation shows that this economic scheduling algorithm provides average weighted response times as good as or better than those of a common scheduling algorithm with backfilling. Our economic model has the additional advantages of supporting different price models, different optimization objectives, varying access policies, and Quality of Service demands.
Carsten Ernemann, Ramin Yahyapour
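Here is a minimal sketch of the offer-selection step in a market-based scheduler. It assumes that each resource answers a request with an offer giving a price, an earliest start and an estimated runtime. It also assumes the user supplies an objective that trades cost against completion time under a budget. The `Offer` type, the objective and the numbers are illustrative assumptions, not the chapter's algorithm.

```python
from dataclasses import dataclass

@dataclass
class Offer:
    resource: str
    price: float    # credits
    start: float    # earliest start time offered by the resource
    runtime: float  # estimated runtime on that resource

def pick_offer(offers, time_weight, budget):
    """Best affordable offer under a user objective: price + time_weight * completion time."""
    affordable = [o for o in offers if o.price <= budget]
    return min(affordable, key=lambda o: o.price + time_weight * (o.start + o.runtime), default=None)

offers = [Offer("siteA", price=100, start=0, runtime=10), Offer("siteB", price=20, start=30, runtime=10)]
print(pick_offer(offers, time_weight=1, budget=200).resource)  # siteB
print(pick_offer(offers, time_weight=5, budget=200).resource)  # siteA
print(pick_offer(offers, time_weight=5, budget=50).resource)   # siteB
```

Changing `time_weight` or `budget` switches the choice between the same two offers. This is one way a single infrastructure can serve different optimization objectives.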
Metadata
Title: Grid Resource Management
Edited by: Jarek Nabrzyski, Jennifer M. Schopf, Jan Węglarz
Copyright year: 2004
Publisher: Springer US
Electronic ISBN: 978-1-4615-0509-9
Print ISBN: 978-1-4613-5112-2
DOI: https://doi.org/10.1007/978-1-4615-0509-9