2011 | Book

Guide to e-Science

Next Generation Scientific Research and Discovery

About this book

This guidebook on e-science presents real-world examples of practices and applications, demonstrating how a range of computational technologies and tools can be employed to build essential infrastructures supporting next-generation scientific research. Each chapter provides introductory material on core concepts and principles, as well as descriptions and discussions of relevant e-science methodologies, architectures, tools, systems, services and frameworks. Features: includes contributions from an international selection of preeminent e-science experts and practitioners; discusses use of mainstream grid computing and peer-to-peer grid technology for “open” research and resource sharing in scientific research; presents varied methods for data management in data-intensive research; investigates issues of e-infrastructure interoperability, security, trust and privacy for collaborative research; examines workflow technology for the automation of scientific processes; describes applications of e-science.

Table of Contents

Frontmatter

Sharing and Open Research

Frontmatter
Chapter 1. Implementing a Grid/Cloud eScience Infrastructure for Hydrological Sciences
Abstract
The objective of this chapter is to describe the building of an eScience infrastructure suitable for the environmental sciences, and especially for hydrological science applications. The infrastructure allows a wide range of hydrological problems to be investigated and is particularly suitable for computationally intensive or multiple-scenario applications. To accomplish this objective, this research identified the shortcomings of current grid infrastructures for hydrological science and developed the missing components to fill this gap. In particular, three primary areas needed work: first, integrating data and computing grids; second, visualizing geographic information from grid outputs; and third, implementing hydrological simulations on this infrastructure. This chapter focuses on the first area, grid infrastructure system integration and development. A grid infrastructure consisting of a computing grid and a data grid has been built. In addition, the computing grid has been extended to utilize Amazon EC2 cloud computing resources. Users can conduct a complete simulation job life cycle, from job submission and data management to metadata management, using the tools available in the infrastructure.
Gen-Tao Chiang, Martin T. Dove, C. Isabella Bovolo, John Ewen
Chapter 2. The German Grid Initiative D-Grid: Current State and Future Perspectives
Abstract
The D-Grid is a German national academic Grid initiative, which was established in 2004. Since then, a variety of resource providers have offered resources and services to a large and heterogeneous group of user communities. This chapter first describes in detail the D-Grid e-Infrastructure as it is operated today. Apart from a brief historical digression, D-Grid’s organizational structure and its infrastructure are introduced, complemented by descriptions of two example user communities. Based on the current state, the chapter then discusses what D-Grid’s future may look like as virtualization and Cloud computing stride ahead. To this end, a prototype system with coexisting Grid and Cloud middleware is introduced, challenges at the resource level are identified, and possible solutions are highlighted. Furthermore, the integration of state-of-the-art service level management, which enables D-Grid providers to guarantee distinct service levels, is discussed.
Stefan Freitag, Philipp Wieder
Chapter 3. Democratizing Resource-Intensive e-Science Through Peer-to-Peer Grid Computing
Abstract
The new ways of doing science rooted in the unprecedented processing, communication, and storage infrastructure that has become available to scientists are collectively called e-Science. By their nature, most e-Science activities can only be performed successfully if researchers have access to high-performance computing facilities. Grid and volunteer computing are well-established solutions that cater to this need, but they are not accessible to all labs and institutions. Peer-to-peer (P2P) grid computing has been proposed to address this very problem. In this chapter, we share our experience in developing a P2P grid middleware called OurGrid and deploying it to build the OurGrid Community. We describe the mechanisms that effectively promote collaboration and allow large P2P grids to be assembled from the contributions of thousands of small sites. This includes a thorough review of the main mechanisms required to support the execution of bag-of-tasks applications on top of P2P grids: accounting, scheduling, security, and data caching. In addition, we discuss ways to allow P2P grids to interoperate with service grids. We also report a success case in the use of the OurGrid middleware in the context of e-Science. Finally, we summarize our experience in this area, indicating the lessons we have learned, the present challenges, and future directions of research.
Francisco Brasileiro, Nazareno Andrade, Raquel Lopes, Lívia Sampaio
Chapter 4. Peer4Peer: e-Science Community for Network Overlay and Grid Computing Research
Abstract
This chapter describes a novel approach to Grid and overlay network research that leverages distributed infrastructures and multi-core machines to enable greater simulation complexity and speed. We present its motivation, background, the shortcomings of current approaches, and the core architectural concepts of the proposed research. This is an ongoing effort to extend our peer-to-peer cycle-sharing platform by providing a scalable, efficient, and reliable simulation substrate for the Grid and overlay topologies developed by the research community. Grid and overlay simulations are thus improved through (1) increased scalability of simulation tools with a novel parallel, distributed, and decentralized architecture; (2) harnessing the power of idle CPU cycles spread around the Internet as a desktop Grid (over a peer-to-peer overlay); and (3) a framework for topology definition, dissemination, evaluation, and reuse that eases Grid and overlay research. The infrastructure, simulation engine, topology modeling language (TML), management services, and portal together comprise a cloud-like platform for overlay research.
Luís Veiga, João Nuno Silva, João Coelho Garcia

Data-intensive e-Science

Chapter 5. A Multidisciplinary, Model-Driven, Distributed Science Data System Architecture
Abstract
The twenty-first century has transformed the world of science by breaking the physical boundaries of distributed organizations and interconnecting them into virtual science environments, allowing systems and systems of systems to seamlessly access and share information and resources across geographically distributed areas. This e-science transformation is enabling new scientific discoveries by allowing greater collaboration and by enabling systems to combine and correlate disparate data sets. At the Jet Propulsion Laboratory in Pasadena, California, we have been developing science data systems for highly distributed communities in the physical and life sciences that require extensive sharing of distributed services and common information models based on common architectures. The common architecture contributes a set of atomic functions, interfaces, and information models that support sharing and distributed processing. Additionally, the architecture provides a blueprint for a software product line known as the Object Oriented Data Technology (OODT) framework. OODT has enabled reuse of software for science data generation, capture, management, and delivery across highly distributed organizations in planetary science, earth science, and cancer research. Our experience to date shows that a well-defined architecture and a set of accompanying software vastly improve our ability to develop road maps for, and to construct, virtual science environments.
Daniel J. Crichton, Chris A. Mattmann, John S. Hughes, Sean C. Kelly, Andrew F. Hart
Chapter 6. Galaxy: A Gateway to Tools in e-Science
Abstract
e-Science focuses on the use of computational tools and resources to analyze large scientific datasets. Performing these analyses often requires running a variety of computational tools specific to a given scientific domain. This places a significant burden on individual researchers, for whom simply running these tools may be prohibitively difficult, let alone combining tools into a complete analysis or acquiring data and appropriate computational resources. This limits the productivity of individual researchers and represents a significant barrier to potential scientific discovery. To relieve researchers of such unnecessary complexities and promote more robust science, we have developed a tool integration framework called Galaxy. Galaxy abstracts individual tools behind a consistent and easy-to-use web interface to enable advanced data analysis that requires no informatics expertise. Furthermore, Galaxy facilitates easy addition of newly developed tools, thus supporting tool developers, as well as transparent and reproducible communication of computationally intensive analyses. Recently, we have enabled trivial deployment of a complete Galaxy solution on aggregated infrastructures, including cloud computing providers.
Enis Afgan, Jeremy Goecks, Dannon Baker, Nate Coraor, The Galaxy Team, Anton Nekrutenko, James Taylor
Chapter 7. An Integrated Ontology Management and Data Sharing Framework for Large-Scale Cyberinfrastructure
Abstract
Large-scale cross-disciplinary scientific collaborations require an overarching semantics-based and service-oriented cyberinfrastructure. However, the ad hoc and incoherent integration of computational and storage resources, data sources from sensor networks, and scientific data sharing and knowledge inference models cannot effectively support cross-domain and collaborative scientific research. We therefore propose an integrated ontology management and data sharing framework which builds upon advancements in object-oriented database design, the semantic Web, and service-oriented architecture to form the key data sharing backbone. The framework has been implemented to cater to the data sharing needs of large-scale sensor deployments from disparate scientific domains. It enables each participating scientific community to publish, search, and access data across the cyberinfrastructure in a service-oriented manner, accompanied by domain-specific knowledge.
Mudasser Iqbal, Wenqiang Wang, Cheng Fu, Hock Beng Lim

Collaborative Research

Frontmatter
Chapter 8. An e-Science Cyberinfrastructure for Solar-Enabled Water Production and Recycling
Abstract
We propose an e-Science cyberinfrastructure to support the scientific processes of a solar-enabled water production and recycling application. It forms the key resource sharing backbone that allows each participating scientific process to expose its sensor, instrument, data, and intellectual resources in a service-oriented manner, accompanied by domain-specific resource knowledge. The cyberinfrastructure integrates sensor grids, service-oriented architecture, and the semantic Web in an innovative manner. We discuss the design of the ontology used to describe the resources in the project, such as data, services, and computational and storage resources. An object-oriented database is designed for flexible and scalable data storage. Data management issues are discussed within the context of the project requirements, and various data services are created to meet the needs of the users. In this manner, the cyberinfrastructure facilitates resource sharing among the participants of the project. Complex workflows are also supported by the proposed cyberinfrastructure.
Yuxia Yao, Hock Beng Lim, Chee Keong Chan, Fook Hoong Choo
Chapter 9. e-Science Infrastructure Interoperability Guide: The Seven Steps Toward Interoperability for e-Science
Abstract
This chapter investigates challenges and presents proven solutions in the context of e-science infrastructure interoperability, with the aim of guiding worldwide interoperability efforts. It illustrates how an increasing number of e-scientists can take advantage of using different types of e-science infrastructures jointly for their e-research activities. The goal is to give readers working in computationally driven research infrastructures (e.g., the scientific user community projects within the European Strategy Forum on Research Infrastructures (ESFRI)) the opportunity to transfer these processes to their particular situations. Hence, although the examples and processes of this chapter are closely aligned with specific setups in Europe, many of the lessons learned can be applied in similar environments, notably in ESFRI projects that seek to use the computational resources within EGI and PRACE via their own research infrastructures, techniques, and tools. Furthermore, readers should come away with a sense of the concept and benefits of interoperability, especially through sustainable standards-based approaches.
For several decades, traditional scientific computing has been seen as a third pillar alongside theory and experiment, and for the past ten years the grid community has provided a solid e-science infrastructure base for these pillars. e-Science is known for enabling new kinds of collaboration in key areas of science through resource sharing on that infrastructure. A closer look, however, reveals that this base is realized today by a wide variety of e-science infrastructures, while we observe an increasing demand from e-scientists for the use of more than one infrastructure. One relatively new “e-science design pattern” in this context is the use of algorithms through scientific workflows that combine concepts of both high-throughput computing (HTC) and high-performance computing (HPC) with production applications on today’s e-science infrastructures.
This chapter illustrates ways and examples of realizing this infrastructure interoperability design pattern and therefore reviews existing reference models and architectures known to promote interoperability, such as the Open Grid Forum (OGF) Open Grid Services Architecture (OGSA), the Common Component Architecture (CCA), and the Organization for the Advancement of Structured Information Standards (OASIS) Service Component Architecture (SCA). The review of these reference models and architectures reveals numerous limitations that arise from the lack of suitable reference models in the community, or from proprietary, case-by-case interoperability efforts that use no standards at all.
As its main contribution, this chapter presents a concrete seven-step plan to guide infrastructure interoperability processes. So far, reference models in grids have only addressed component-level interoperability aspects such as concrete functionality and semantics. In contrast, we recast the whole process of production e-science infrastructure interoperability as a concrete seven-step plan while ensuring a real impact on production grids. This impact is in turn another important contribution of the chapter, which can be seen in the light of separating the “e-science hype” from “e-science production infrastructure reality.” Hence, this chapter not only presents how technical interoperability can be achieved with current production infrastructures, but also gives insights into operational, policy, and sustainability aspects, thus providing complementary guidance for worldwide grids and emerging research infrastructures (i.e., ESFRI projects or other virtual science communities), as well as their technology providers and e-scientists.
This chapter illustrates how the aforementioned steps can significantly support the process of establishing grid interoperability and gives concrete examples for each step in the context of real e-research problems and activities. The chapter also puts the processes into the context of interoperability field studies and use cases in fusion science (EUFORIA) and bioinformatics (WISDOM and the Virtual Physiological Human).
Morris Riedel
Chapter 10. Trustworthy Distributed Systems Through Integrity-Reporting
Abstract
With the growing influence of e-Science, substantial quantities of research are being facilitated, recorded, and reported by means of distributed computing. As a result, the scope for malicious intervention continues to grow, and so do the rewards available to those able to steal models and data of significant commercial value. Researchers are often reluctant to exploit the full benefits of distributed computing because they fear the compromise of their sensitive data or the uncertainty of the returned results. In this chapter, we propose two types of trustworthy distributed systems: one suitable for a computational system and the other for a distributed data system. Central to these systems is the novel idea of a configuration resolver, which, in both designs, is responsible for filtering trustworthy hosts and ensuring that jobs are dispatched only to those considered trustworthy. Furthermore, a blind analysis server enables statistical analyses to be performed on sensitive raw data, collected from multiple sites, without disclosing it to anyone.
Jun Ho Huh, Andrew Martin
Chapter 11. An Intrusion Diagnosis Perspective on Cloud Computing
Abstract
Cloud computing is an emerging paradigm with the virtual machine as its enabling technology. As with any other Internet-based technology, security underpins the widespread success of Cloud computing. However, Cloud computing introduces new security challenges, mainly due to the unique characteristics inherited from virtual machine technology. In this chapter, we focus on the challenges these characteristics impose on intrusion diagnosis for Clouds. In particular, we identify the importance of the intrusion diagnosis problem for Clouds and the novel challenges it presents. We then propose a solution to address these challenges and demonstrate its effectiveness through empirical evaluation.
Junaid Arshad, Paul Townend, Jie Xu

Collaborative Research

Frontmatter
Chapter 12. Conventional Workflow Technology for Scientific Simulation
Abstract
Workflow technology has been established in the business domain for several years. This suggests the need for a detailed investigation of how well conventional workflow technology is suited to the evolving application domain of e-Science. This chapter discusses the requirements on scientific workflows, the state of the art of scientific workflow management systems, and the ability of conventional workflow technology to fulfill the requirements of scientists and scientific applications. It becomes clear that the features of conventional workflows can be advantageous for scientists, but also that thorough enhancements are needed. We therefore propose a conceptual architecture for scientific workflow management systems based on business workflow technology, as well as extensions of existing workflow concepts, in order to improve the applicability of established workflow technology in the scientific domain, with a focus on scientific simulations.
Katharina Görlach, Mirko Sonntag, Dimka Karastoyanova, Frank Leymann, Michael Reiter
Chapter 13. Facilitating e-Science Discovery Using Scientific Workflows on the Grid
Abstract
e-Science has been greatly enhanced by the growing capability and usability of cyberinfrastructure. This chapter explains how scientific workflow systems can facilitate e-Science discovery in Grid environments by providing features including scientific process automation, resource consolidation, parallelism, provenance tracking, fault tolerance, and workflow reuse. We first give an overview of the core services needed to support e-Science discovery. To demonstrate how these services can be seamlessly assembled, an open source scientific workflow system, called Kepler, is integrated into the University of California Grid. This architecture is being applied to a computational enzyme design process, a formidable collaborative problem in computational chemistry that challenges our knowledge of protein chemistry. Our implementation and experiments demonstrate how the Kepler workflow system can make the scientific computation process automated, pipelined, efficient, extensible, stable, and easy to use.
Jianwu Wang, Prakashan Korambath, Seonah Kim, Scott Johnson, Kejian Jin, Daniel Crawl, Ilkay Altintas, Shava Smallen, Bill Labate, Kendall N. Houk
Chapter 14. Concepts and Algorithms of Mapping Grid-Based Workflow to Resources Within an SLA Context
Abstract
With the growing popularity of Grid-based workflows, ensuring Quality of Service (QoS) for workflows through Service Level Agreements (SLAs) is an emerging trend in the business grid. Among the many system components supporting SLA-aware Grid-based workflows, the SLA mapping mechanism occupies an important position, as it is responsible for assigning sub-jobs of the workflow to Grid resources in a way that meets the user’s deadline and minimizes costs. To meet these requirements, the resources in each Grid site must be reserved, and the user must provide the estimated runtime of each sub-job for a given resource configuration. With many different kinds of sub-jobs and resources, mapping a Grid-based workflow within an SLA context is an unfamiliar and difficult problem. To address it, this chapter describes the related concepts and mapping algorithms. In particular, several sub-optimization algorithms for mapping sub-jobs of the workflow to Grid resources within an SLA context are described. Simulation results show the efficiency of these mapping algorithms.
Dang Minh Quan, Odej Kao, Jörn Altmann
Chapter 15. Orchestrating e-Science with the Workflow Paradigm: Task-Based Scientific Workflow Modeling and Executing
Abstract
e-Science usually involves a great number of data sets, computing resources, and large teams managed and developed by research laboratories, universities, or governments. Science processes, if deployed in workflow form, can be managed more effectively and executed more automatically. Scientific workflows have therefore emerged and been adopted as a paradigm for organizing and orchestrating activities in e-Science processes. Unlike workflows applied in the business world, however, scientific workflows need to take into account the specific characteristics of science processes and make corresponding changes to accommodate them. A task-based scientific workflow modeling and execution approach is therefore proposed in this chapter for orchestrating e-Science with the workflow paradigm. The chapter also discusses related work in the scientific workflow field.
Xiping Liu, Wanchun Dou, Jinjun Chen

e-Science: easy Science

Chapter 16. Face Recognition Using Global and Local Salient Features
Abstract
This chapter presents a robust face recognition technique based on the extraction of Scale Invariant Feature Transform (SIFT) features from face areas. It uses both a global and a local matching strategy. The local strategy matches individual salient facial SIFT features connected to facial landmarks such as the eyes and the mouth, whereas the global strategy combines all SIFT features into a single feature. The Dempster–Shafer decision theory is applied to fuse the two matching strategies. The proposed technique has been evaluated on the Indian Institute of Technology Kanpur (IITK), Olivetti Research Laboratory (ORL, formerly the AT&T face database), and Yale face databases. The experimental results demonstrate the effectiveness and potential of the proposed technique, including in cases of partially occluded faces or missing information. In addition, some state-of-the-art face recognition techniques are presented and compared with the proposed matching technique, with all techniques using SIFT descriptors as local features.
Dakshina Ranjan Kisku, Phalguni Gupta, Jamuna Kanta Sing, Massimo Tistarelli
Chapter 17. OGSA-Based SOA for Collaborative Cancer Research: System Modeling and Generation
Abstract
The CancerGrid consortium is developing open-standards cancer informatics to address the challenges posed by modern cancer clinical trials. This chapter presents a framework for the metamodel-driven development of an Open Grid Services Architecture (OGSA)-based Service-Oriented Architecture (SOA) for collaborative cancer research. We extend the existing Z model and generation technology to support OGSA in a distributed collaborative environment. A generic SOA model is built from a combination of the semantics of a standard domain metamodel and metadata and the Web Services Resource Framework (WSRF) standards. This model is then employed to automate the generation of the trial management systems used in cancer clinical trials. The integration of the Web services standards with the standard domain metamodel enables the generated systems to support the syntactic, semantic, and computational interoperability that is essential for collaborative cancer research. Automating the model-driven system generation not only speeds up development, but also enforces conformance to these standards. The SOA model and generated system are currently being evaluated for use in early-phase clinical trials. Our approach is also applicable to other research areas.
Tianyi Zang, Radu Calinescu, Marta Kwiatkowska
Chapter 18. e-Science, the Way Leading to Modernization of Sciences and Technologies: e-Science Practice and Thought in Chinese Academy of Sciences
Abstract
This chapter introduces our understanding and practice of e-Science in the Chinese Academy of Sciences. We present the current state of the information infrastructure from five perspectives: the digital network and communication infrastructure, the high-performance computing environment, the scientific data environment, the digital library, and the virtual laboratory. In terms of e-Science applications, we focus on an e-Science application conducted in the Qinghai Lake region to show how various information and communication technologies can be employed to facilitate scientific research, providing an infrastructure for protecting wildlife and the ecological environment and for decision-making. We have realized that e-Science is the way leading to next-generation scientific research, and we have been promoting e-Science practice and application systematically. By e-Science, to easy Science.
Baoping Yan, Wenzhuang Gui, Ze Luo, Gang Qin, Jian Li, Kai Nan, Zhonghua Lu, Yuanchun Zhou
Backmatter
Metadata
Title
Guide to e-Science
Editors
Xiaoyu Yang
Lizhe Wang
Wei Jie
Copyright Year
2011
Publisher
Springer London
Electronic ISBN
978-0-85729-439-5
Print ISBN
978-0-85729-438-8
DOI
https://doi.org/10.1007/978-0-85729-439-5
