
2004 | Book

Utility Computing

15th IFIP/IEEE International Workshop on Distributed Systems: Operations and Management, DSOM 2004, Davis, CA, USA, November 15-17, 2004. Proceedings

Edited by: Akhil Sahai, Felix Wu

Publisher: Springer Berlin Heidelberg

Book series: Lecture Notes in Computer Science


About this book

This volume of the Lecture Notes in Computer Science series contains all the papers accepted for presentation at the 15th IFIP/IEEE International Workshop on Distributed Systems: Operations and Management (DSOM 2004), which was held at the University of California, Davis during November 15–17, 2004. DSOM 2004 was the fifteenth workshop in a series of annual workshops and it followed in the footsteps of highly successful previous meetings, the most recent of which were held in Heidelberg, Germany (DSOM 2003), Montreal, Canada (DSOM 2002), Nancy, France (DSOM 2001), and Austin, USA (DSOM 2000). The goal of the DSOM workshops is to bring together researchers in the areas of networks, systems, and services management, from both industry and academia, to discuss recent advances and foster future growth in this field. In contrast to the larger management symposia, such as IM (Integrated Management) and NOMS (Network Operations and Management Symposium), the DSOM workshops are organized as single-track programs in order to stimulate interaction among participants. The focus of DSOM 2004 was “Management Issues in Utility Computing.” Increasingly there is a trend now towards managing large infrastructures and services within utility models where resources can be obtained on demand. Such a trend is being driven by the desire to consolidate infrastructures within enterprises and across enterprises using third-party infrastructure providers and networked infrastructures like Grid and PlanetLab. The intent in these initiatives is to create systems that provide automated provisioning, configuration, and lifecycle management of a wide variety of infrastructure resources and services, on demand.

Table of contents

Frontmatter

Management Architecture

Requirements on Quality Specification Posed by Service Orientation
Abstract
As service orientation is gaining more and more momentum, the need for common concepts regarding Quality of Service (QoS) and its specification emerges. In recent years numerous approaches to specifying QoS were developed for special subjects like multimedia applications or middleware for distributed systems. However, a survey of existing approaches regarding their contribution to service oriented QoS specification is still missing.
In this paper we present a strictly service oriented, comprehensible classification scheme for QoS specification languages. The scheme is based on the MNM Service Model and the newly introduced LAL–brick which aggregates the dimensions Life cycle, Aspect and Layer of a QoS specification. Using the terminology of the MNM Service Model and the graphical notation of the LAL–brick we are able to classify existing approaches to QoS specification. Furthermore we derive requirements for future specification concepts applicable in service oriented environments.
Markus Garschhammer, Harald Roelle
Automating the Provisioning of Application Services with the BPEL4WS Workflow Language
Abstract
We describe the architecture and implementation of a novel workflow-driven provisioning system for application services, such as multi-tiered e-Commerce systems. These services need to be dynamically provisioned to accommodate rapid changes in the workload patterns. This, in turn, requires a highly automated service provisioning process, for which we were able to leverage a general-purpose workflow language and its execution engine. We have successfully integrated a workflow-based change management system with a commercial service provisioning system that allows the execution of automatically generated change plans as well as the monitoring of their execution.
Alexander Keller, Remi Badonnel
HiFi+: A Monitoring Virtual Machine for Autonomic Distributed Management
Abstract
Autonomic distributed management enables the deployment of self-directed monitoring and control tasks that track dynamic network problems such as performance degradation and security threats. In this paper, we present a monitoring virtual machine interface (HiFi+) that enables users to define and deploy distributed autonomic management tasks using simple Java programs. HiFi+ provides a generic, expressive, and flexible language to define distributed event monitoring and correlation tasks in large-scale networks.
Ehab Al-Shaer, Bin Zhang

SLA Based Management

Defining Reusable Business-Level QoS Policies for DiffServ
Abstract
This paper proposes a PBNM (Policy Based Network Management) framework for automating the process of generating and distributing DiffServ configuration to network devices. The framework is based on IETF standards, and proposes a new business level policy model for simplifying the process of defining QoS policies. The framework is defined in three layers: a business level policy model (based on an IETF PCIM extension), a device independent policy model (based on an IETF QPIM extension) and a device dependent policy model (based on the IETF DiffServ PIB definition). The paper illustrates the use of the framework by mapping the information models to XML documents. The XML mapped information model supports the reuse of rules, conditions and network information by using XPointer references.
André Beller, Edgard Jamhour, Marcelo Pellenz
Policy Driven Business Performance Management
Abstract
Business performance management (BPM) has emerged as a critical discipline to enable enterprises to manage their business solutions in an on-demand fashion. BPM applications promote adaptiveness by emphasizing the ability to monitor and control both business processes and IT events. However, most BPM processes and architectures are linear and rigid, and once built, are very hard to change. Hence, they do not help enterprises create adaptive monitoring and control applications for business solutions. There is an urgent need for an adaptive BPM framework that can serve as a platform for developing BPM applications. This paper presents a policy-based BPM framework to help enterprises achieve on-demand monitoring and control of business solutions.
Jun-Jang Jeng, Henry Chang, Kumar Bhaskaran
Business Driven Prioritization of Service Incidents
Abstract
As a result of its increasing role in the enterprise, the Information Technology (IT) function is changing, morphing from a technology provider into a strategic partner. Key to this change is its ability to deliver business value by aligning and supporting the business objectives of the enterprise. IT Management frameworks such as ITIL (IT Infrastructure Library, [3]) provide best practices and processes that support the IT function in this transition. In this paper, we focus on one of the various cross-domain processes documented in ITIL involving the service level, incident, problem and change management processes and present a theoretical framework for the prioritization of service incidents based on their impact on the ability of IT to align with business objectives. We then describe the design of a prototype system that we have developed based on our theoretical framework and present how that solution for incident prioritization integrates with other IT management software products of the HP OpenView™ management suite.
Claudio Bartolini, Mathias Sallé

Policy Based Management

A Case-Based Reasoning Approach for Automated Management in Policy-Based Networks
Abstract
Policy-based networking technologies have been introduced as a promising solution to the problem of management of QoS-enabled networks. However, the potential of these technologies has not been fully exploited yet. This paper proposes a novel policy-based architecture for autonomous self-adaptable network management. The proposed framework utilizes case-based reasoning (CBR) paradigms for online creation and adaptation of policies. The contribution of this work is twofold. The first is a novel guided automated derivation of network level policies from high-level business objectives. The second is automated network level policy refinement, which dynamically adapts the management system to changing requirements of the underlying environment while adhering to the originally imposed business objectives. We show how automated policy creation and adaptation can enhance the network services by making network components' behavior more responsive and customizable to users' and applications' requirements.
Nancy Samaan, Ahmed Karmouch
An Analysis Method for the Improvement of Reliability and Performance in Policy-Based Management Systems
Abstract
Policy-based management shows good promise for application to semi-automated distributed systems management. It is extremely difficult, however, to create policies for controlling the behavior of managed distributed systems that are sufficiently accurate to ensure good reliability. Further, when policy-based management technology is to be applied to actual systems, performance, in addition to reliability, also becomes an important consideration. In this paper, we propose a static analysis method for improving both the reliability and the performance of policy-based management systems. With this method, all sets of policies whose actions might possibly access the same target entity simultaneously are detected. Such sets of policies could cause unexpected trouble in managed systems if their policies were to be executed concurrently. Additionally the results of the static analysis can be used in the optimization of policy processing, and we have developed an experimental system for such optimization. The results of experimental use of this system show that an optimized system is as much as 1.47 times faster than a non-optimized system.
Naoto Maeda, Toshio Tonouchi
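As a rough illustration of the kind of static check the abstract above describes, the following minimal sketch flags pairs of policies whose actions could touch the same target entity. It is not the authors' method: the policy names, the representation of a policy as a set of target names, and the function `conflicting_policy_pairs` are all assumptions made for this example, and the sketch ignores when policies actually fire.

```python
from itertools import combinations


def conflicting_policy_pairs(policies):
    """Return (policy_a, policy_b, shared_targets) for every pair of
    policies whose actions may access at least one common target.

    `policies` maps a hypothetical policy name to the set of target
    entities its actions may access.
    """
    pairs = []
    # Sort by name so the result order is deterministic.
    for (name_a, targets_a), (name_b, targets_b) in combinations(
        sorted(policies.items()), 2
    ):
        shared = set(targets_a) & set(targets_b)
        if shared:
            pairs.append((name_a, name_b, sorted(shared)))
    return pairs


# Hypothetical policy set used only for illustration.
policies = {
    "restart_web": {"web01"},
    "rotate_logs": {"web01", "db01"},
    "backup_db": {"db01"},
    "scale_cache": {"cache01"},
}

conflicts = conflicting_policy_pairs(policies)
for name_a, name_b, shared in conflicts:
    print(f"{name_a} and {name_b} may both access {shared}")
```

Pairs reported this way are only candidates for trouble. A policy engine could, for instance, serialize the flagged pairs and run all other policies concurrently, which is one way such an analysis might feed into the optimization the abstract mentions.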
Policy-Based Resource Assignment in Utility Computing Environments
Abstract
In utility computing environments, multiple users and applications are served from the same resource pool. To maintain service level objectives and maintain high levels of utilization in the resource pool, it is desirable that resources be assigned in a manner consistent with operator policies, while ensuring that shared resources (e.g., networks) within the pool do not become bottlenecks. This paper addresses how operator policies (preferences) can be included in the resource assignment problem as soft constraints. We provide the problem formulation and use two examples of soft constraints to illustrate the method. Experimental results demonstrate impact of policies on the solution.
Cipriano A. Santos, Akhil Sahai, Xiaoyun Zhu, Dirk Beyer, Vijay Machiraju, Sharad Singhal

Automated Management

Failure Recovery in Distributed Environments with Advance Reservation Management Systems
Abstract
Resource reservations in advance are a mature concept for the allocation of various resources, particularly in grid environments. Common grid toolkits such as Globus support advance reservations and assign jobs to resources at admission time. While the allocation mechanisms for advance reservations are available in current grid management systems, in case of failures the advance reservation perspective demands strategies that support more than recovery of jobs or applications that are active at the time the resource failure occurs. Applications that have already been admitted but not yet started are also affected by the failure and hence need to be dealt with in an appropriate manner. In this paper, we discuss the properties of advance reservations with respect to failure recovery and outline a number of strategies applicable in such cases in order to reduce the impact of resource failures and outages. It can be shown that it pays to also remap affected but not yet started jobs to alternative resources if available. Like reserving in advance, this can be considered remapping in advance. In particular, a remapping strategy that prefers requests that were allocated a long time ago provides high fairness for clients, as it implements functionality similar to advance reservations, while achieving the same performance as the other strategies.
Lars-Olof Burchard, Barry Linnert
Autonomous Management of Clustered Server Systems Using JINI
Abstract
A framework for the autonomous management of clustered server systems called LAMA (Large-scale system’s Autonomous Management Agent) is proposed in this paper. LAMA is based on agents, which are distributed over the nodes and built on JINI infrastructure. There are two classes of agents: a grand LAMA and ordinary LAMAs. An ordinary LAMA abstracts an individual node and performs node-wide configuration. The grand LAMA is responsible for monitoring and controlling all the ordinary ones. Using the discovery, join, lookup, and distributed security operations of JINI, a node can join the clustered system without secure administration. Also, a node’s failure can be detected automatically using the lease interface of the JINI. Resource reallocation is performed dynamically by a reallocation engine in the grand agent. The reallocation engine gathers the status of remote nodes, predicts resource demands, and executes reallocation by accessing the ordinary agents. The proposed framework is verified on our own clustered internet servers, called the CORE-Web server, for an audio-streaming service. The nodes are dynamically reallocated satisfying the performance requirements.
Chul Lee, Seung Ho Lim, Sang Soek Lim, Kyu Ho Park
Event-Driven Management Automation in the ALBM Cluster System
Abstract
One of the major concerns in using a large-scale cluster system is manageability. The ALBM (Adaptive Load Balancing and Management) cluster system is an active cluster system that is scalable, reliable and manageable. We introduce event-driven management automation using the ALBM active cluster system. This architecture is based on an event management solution that is composed of an event notification service, an event channel service and an event rule engine. Critical system state changes are generated as events and delivered to the event rule engine. According to the predefined management rules, management actions are performed when a specific condition is satisfied. This event-driven mechanism can be used to manage the system automatically without human intervention. This event management solution can also be used for other advanced management purposes, such as event correlation, root cause analysis, trend analysis or capacity planning. To demonstrate the feasibility of management automation, we present experimental results comparing adaptive load balancing with a non-adaptive load balancing mechanism. The adaptive scheduling algorithm that uses the event management automation achieves better performance than the non-adaptive ones for a realistic heavy-tailed workload.
Dugki Min, Eunmi Choi

Analysis and Reasoning

A Formal Validation Model for the Netconf Protocol
Abstract
Netconf is a protocol proposed by the IETF that defines a set of operations for network configuration. One of the main issues of Netconf is to define operations such as validate and commit, which currently lack a clear description and an information model. We propose in this paper a model for validation based on XML schema trees. By using an existing logical formalism called TQL, we express important dependencies between parameters that appear in those information models, and automatically check these dependencies on sample XML trees in reasonable time. We illustrate our claim by showing different rules and an example of validation on a Virtual Private Network.
Sylvain Hallé, Rudy Deca, Omar Cherkaoui, Roger Villemaire, Daniel Puche
Using Object-Oriented Constraint Satisfaction for Automated Configuration Generation
Abstract
In this paper, we describe an approach for automatically generating configurations for complex applications. Automated generation of system configurations is required to allow large-scale deployment of custom applications within utility computing environments. Our approach models the configuration management problem as an Object-Oriented Constraint Satisfaction Problem (OOCSP) that can be solved efficiently using a resolution-based theorem-prover. We outline the approach and discuss both the benefits of the approach as well as its limitations, and highlight certain unresolved issues that require further work. We demonstrate the viability of this approach using an e-Commerce site as an example, and provide results on the complexity and time required to solve for the configuration of such an application.
Tim Hinrichs, Nathaniel Love, Charles Petrie, Lyle Ramshaw, Akhil Sahai, Sharad Singhal
Problem Determination Using Dependency Graphs and Run-Time Behavior Models
Abstract
Key challenges in managing an I/T environment for e-business lie in the area of root cause analysis, proactive problem prediction, and automated problem remediation. Our approach, as reported in this paper, utilizes two important concepts, dependency graphs and dynamic runtime performance characteristics of the resources that comprise an I/T environment, to design algorithms for rapid root cause identification in case of problems. In the event of a reported problem, our approach uses the dependency information and the behavior models to narrow down the root cause to a small set of resources that can be individually tested, thus facilitating quick remediation and reducing administrative costs.
Manoj K. Agarwal, Karen Appleby, Manish Gupta, Gautam Kar, Anindya Neogi, Anca Sailer
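The narrowing step described in the abstract above can be pictured with a small sketch. It assumes a dependency graph given as a dictionary from each resource to the resources it depends on, and a set of resources that some behavior model has already flagged as anomalous; how the paper builds either input is not stated in the abstract. The function name `root_cause_candidates` and all resource names are hypothetical.

```python
def root_cause_candidates(depends_on, reported, anomalous):
    """Return the anomalous resources that the reported resource
    depends on, directly or transitively (including itself).

    `depends_on` maps a resource name to a list of resources it
    depends on; resources without an entry have no dependencies.
    """
    seen = set()
    stack = [reported]
    # Iterative depth-first walk over the dependency edges.
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(depends_on.get(node, []))
    # Only resources that are both reachable and flagged remain.
    return sorted(seen & set(anomalous))


# Hypothetical environment used only for illustration.
depends_on = {
    "storefront": ["app_server", "cdn"],
    "app_server": ["database", "auth"],
    "database": ["disk_array"],
    "reporting": ["database"],
}
anomalous = {"disk_array", "auth", "mail"}

candidates = root_cause_candidates(depends_on, "storefront", anomalous)
print(candidates)  # ['auth', 'disk_array']
```

Here `mail` is flagged but dropped because `storefront` does not depend on it, leaving two resources to test individually.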

Trust and Security

Role-Based Access Control for XML Enabled Management Gateways
Abstract
While security is often supported in standard management frameworks, it has been insufficiently addressed in most deployment and research initiatives. In this paper we address the provisioning of a security “continuum” for management frameworks based on XML/SNMP gateways. We provide an in-depth security extension of such a gateway using the Role Based Access Control paradigm and show how to integrate our approach within a broader XML-based management framework.
V. Cridlig, O. Festor, R. State
Spotting Intrusion Scenarios from Firewall Logs Through a Case-Based Reasoning Approach
Abstract
Although neglected by most security managers due to the low availability of tools, the content analysis of firewall logs is fundamental (a) to measure and identify accesses to external and private networks, (b) to assess the historical growth in access volume and applications used, (c) to debug problems in the configuration of filtering rules and (d) to recognize suspicious event sequences that indicate strategies used by intruders in an attempt to obtain unauthorized access to stations and services. This paper presents an approach to classify, characterize and analyze events generated by firewalls. The proposed approach explores the case-based reasoning technique, from the Artificial Intelligence field, to identify possible intrusion scenarios. The paper also describes the validation of our approach, carried out on real logs generated over one week by the university firewall.
Fábio Elias Locatelli, Luciano Paschoal Gaspary, Cristina Melchiors, Samir Lohmann, Fabiane Dillenburg
A Reputation Management and Selection Advisor Schemes for Peer-to-Peer Systems
Abstract
In this paper we propose a new and efficient reputation management scheme for partially decentralized peer-to-peer systems. The reputation scheme helps to build trust between peers based on their past experiences and the feedback from other peers. We also propose two selection advisor algorithms for helping peers select the right peer to download from. The simulation results show that the proposed schemes are able to detect malicious peers and isolate them from the system, hence reducing the amount of inauthentic uploads. Our approach also allows the load to be distributed uniformly among non-malicious peers.
Loubna Mekouar, Youssef Iraqi, Raouf Boutaba

Implementation, Instrumentation, Experience

Using Process Restarts to Improve Dynamic Provisioning
Abstract
Load variations are unexpected perturbations that can degrade performance or even cause unavailability of a system. There are efforts that attempt to dynamically provide resources to accommodate load fluctuations during the execution of applications. However, these efforts do not consider the existence of software faults, whose effects can influence the application behavior and its quality of service, and may mislead a dynamic provisioning system. When trying to tackle both problems simultaneously the fundamental issue to be addressed is how to differentiate a saturated application from a faulty one. The contributions of this paper are threefold. Firstly, we introduce the idea of taking software faults into account when specifying a dynamic provisioning scheme. Secondly, we define a simple algorithm that can be used to distinguish saturated from faulty software. By implementing this algorithm one is able to realize dynamic provisioning with restarts into a full server infrastructure data center. Finally, we implement this algorithm and experimentally demonstrate its efficacy.
Raquel V. Lopes, Walfredo Cirne, Francisco V. Brasileiro
Server Support Approach to Zero Configuration In-Home Networking
Abstract
This paper proposes a new server support approach to zero configuration in-home networking. We show three technical issues for zero configuration. The lack of a protocol or technique addressing all issues simultaneously motivated us to design a new approach based on (1) a two-stage autoconfiguration, (2) a UPnP and HTTP-based autoconfiguration, and (3) extended UPnP services. A detailed flow for establishing a global Internet connection from scratch is presented. The proposed approach can obtain software and settings from remote servers, and update and configure devices. We implemented a system based on the proposed approach, and evaluated its total autoconfiguration time and the number of technical calls to a help desk during a five-month field trial. We delivered a user-side configuration tool and an all-in-one modem to approximately 230,000 new ADSL subscribers as part of the trial system. Over 40 settings are properly configured for diverse devices in 14 minutes and 10 seconds, while the ratio of the number of calls to the number of new subscribers per month decreased from 14.9% to 8.2%.
Kiyohito Yoshihara, Takeshi Kouyama, Masayuki Nishikawa, Hiroki Horiuchi
Rule-Based CIM Query Facility for Dependency Resolution
Abstract
A distributed system is composed of various resources which have mutually complicated dependencies. This increases the importance of a dependency resolution facility, which makes it possible to check whether a given dependency exists between resources such as a router, and to determine which resources have given dependencies with other resources. This paper addresses a CIM query facility for dependency resolution. Its main features are ease of query description, bi-directional query execution, and completeness of query capability with respect to CIM. These features are realized by a rule-based language that enables predicates of interest to be defined declaratively, by unification and backtracking, and by the preparation of predicates corresponding to CIM metamodel elements. To validate this facility, it was applied to servers dynamically allocated to service providers in a data center. The basic behavior of the query facility and the dynamic server allocation is illustrated.
Shinji Nakadai, Masato Kudo, Koichi Konishi

Short Papers

Work in Progress: Availability-Aware Self-Configuration in Autonomic Systems
Abstract
The Unity project is a prototype autonomic system demonstrating and validating a number of ideas about self-managing computing systems. We are currently working to enhance the self-configuring and self-optimizing aspects of the system by incorporating the notion of component availability into the system’s policies, and into its models of itself.
David M. Chess, Vibhore Kumar, Alla Segal, Ian Whalley
ABHA: A Framework for Autonomic Job Recovery
Abstract
Key issues to address in autonomic job recovery for cluster computing are recognizing job failure; understanding the failure sufficiently to know if and how to restart the job; and rapidly integrating this information into the cluster architecture so that the failure is better mitigated in the future. The Agent Based High Availability (ABHA) system provides an API and a collection of services for building autonomic batch job recovery into cluster computing environments. An agent API allows users to define agents for failure diagnosis and recovery. It is currently being evaluated in the U.S. Department of Energy’s STAR project.
Charles Earl, Emilio Remolina, Jim Ong, John Brown, Chris Kuszmaul, Brad Stone
Can ISPs and Overlay Networks Form a Synergistic Co-existence?
Abstract
Overlay networks are becoming increasingly popular for their ability to provide effective and reliable service catered to specific applications [1,2,3]. For instance, this concept has been used in peer-to-peer services like SplitStream, content delivery networks like Akamai, resilient networks like RON and so on.
Ram Keralapura, Nina Taft, Gianluca Iannaccone, Chen-Nee Chuah
Simplifying Correlation Rule Creation for Effective Systems Monitoring
Abstract
Event correlation is a necessary component of systems management but is perceived as a difficult function to set up and maintain. We report on our work to develop a set of tools and techniques to simplify event correlation and thereby reduce overall operating costs. The tools prototyped are described and our current plans for future tool development outlined.
C. Araujo, A. Biazetti, A. Bussani, J. Dinger, M. Feridun, A. Tanner
Backmatter
Metadata
Title
Utility Computing
Edited by
Akhil Sahai
Felix Wu
Copyright year
2004
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-540-30184-4
Print ISBN
978-3-540-23631-3
DOI
https://doi.org/10.1007/b102082