2020 | Open Access | Book

Towards Interoperable Research Infrastructures for Environmental and Earth Sciences

A Reference Model Guided Approach for Common Challenges

About this book

This open access book summarises the latest developments on data management in the EU H2020 ENVRIplus project, which brought together more than 20 environmental and Earth science research infrastructures into a single community. It provides readers with a systematic overview of the common challenges faced by research infrastructures and how a ‘reference model guided’ engineering approach can be used to achieve greater interoperability among such infrastructures in the environmental and earth sciences.

The 20 contributions in this book are structured in 5 parts on the design, development, deployment, operation and use of research infrastructures. Part one provides an overview of the state of the art of research infrastructures and relevant e-Infrastructure technologies; part two discusses the reference model guided engineering approach; part three presents the software and tools developed for common data management challenges; part four demonstrates the software via several use cases; and part five discusses sustainability and future directions.

Table of Contents

Frontmatter

Data Management in Environmental and Earth Sciences

Frontmatter

Open Access

Supporting Cross-Domain System-Level Environmental and Earth Science
Abstract
Answering the key challenges for society due to environmental issues like climate change, pollution and loss of biodiversity, and making the right decisions to tackle these in a cost-efficient and sustainable way, requires scientific understanding of the Earth System. This scientific knowledge can then be used to inform the general public and policymakers. Scientific understanding starts with having the right data available, often in the form of observations. Research Infrastructures (RIs) exist to perform these observations at the required quality and to make the data available, first of all, to researchers. In the current Big Data era, the increasing challenge is to provide the data in an interoperable, machine-readable and understandable form. The European RIs on environment formed a project cluster called ENVRI that tackles these issues. In this chapter, we introduce the societal relevance of the environmental data produced by the RIs and discuss the issues at hand in providing the relevant data according to the so-called FAIR principles.
Alex Vermeulen, Helen Glaves, Sylvie Pouliquen, Alexandra Kokkinaki

Open Access

ICT Infrastructures for Environmental and Earth Sciences
Abstract
E-Infrastructures play an increasingly important part in the provision of digital services to environmental researchers and other users. The availability of reliable networks, storage facilities, high performance and high throughput computers and associated middleware and services to ease their utilisation all contribute to enabling research and its exploitation. Their relevance, possible use and utilisation to date are described.
Keith Jeffery, Antti Pursula, Zhiming Zhao

Open Access

Common Challenges and Requirements
Abstract
Research infrastructures available for researchers in environmental and Earth science are diverse and highly distributed; dedicated research infrastructures exist for atmospheric science, marine science, solid Earth science, biodiversity research, and more. These infrastructures aggregate and curate key research datasets and provide consolidated data services for a target research community, but they also often overlap in scope and ambition, sharing data sources, sometimes even sites, using similar standards, and ultimately all contributing data that will be essential to addressing the societal challenges that face environmental research today. Thus, while their diversity poses a problem for open science and multidisciplinary research, their commonalities mean that they often face similar technical problems and consequently have common requirements when addressing the implementation of best practices in curation, cataloguing, identification and citation, and other related core topics for data science.
In this chapter, we review the requirements gathering performed in the context of the cluster of European environmental and Earth science research infrastructures participating in the ENVRI community, and survey the common challenges identified from that requirements gathering process.
Barbara Magagna, Paul Martin, Abraham Nieva de la Hidalga, Malcolm Atkinson, Zhiming Zhao

Reference Model Guided System Design and Development

Frontmatter

Open Access

The ENVRI Reference Model
Abstract
Advances in automation, communication, sensing and computation enable experimental scientific processes to generate data at increasingly great speeds and volumes. Research infrastructures are devised to take advantage of these data, providing advanced capabilities for acquisition, sharing, processing, and analysis; enabling advanced research and playing an ever-increasing role in the environmental and Earth science research domain. The ENVRI community identified several recurring requirements in the development of environmental research infrastructures such as i) duplication of efforts to solve similar problems; ii) lack of standards to harmonise and accelerate development, and bring about interoperability; iii) a large number of data models and data information systems within the domain, and iv) a steep learning curve for integrating complex research infrastructure systems. To address these challenges, the ENVRI community has developed and refined the Environmental Research Infrastructures Reference Model (ENVRI Reference Model or ENVRI RM), a modelling framework encoding this knowledge. The proposed modelling framework encompasses a language and a notation to describe the research domain, its systems and the requirements and challenges faced when implementing those systems. By adopting ENVRI RM as an integrative approach, the environmental research community can secure interoperability between infrastructures, enable reuse, share resources, experiences and a common language, reduce unnecessary duplication of effort, and speed up the understanding of research infrastructure systems. This chapter provides a short introduction to the ENVRI RM.
Abraham Nieva de la Hidalga, Alex Hardisty, Paul Martin, Barbara Magagna, Zhiming Zhao

Open Access

Reference Model Guided Engineering
Abstract
Environmental research infrastructures (RIs) support their respective research communities by integrating large-scale sensor/observation networks with data curation and management services, analytical tools and common operational policies. These RIs are developed as service pillars for intra- and interdisciplinary research; however, comprehension of the complex, interconnected aspects of the Earth’s ecosystem increasingly requires that researchers conduct their experiments across infrastructure boundaries. Consequently, almost all data-related activities within these infrastructures, from data capture to data usage, need to be designed to be broadly interoperable in order to enable real interdisciplinary innovation and to improve service offerings through the development of common services. To address these interoperability challenges as they relate to the design, implementation and operation of environmental RIs, a Reference Model guided engineering approach was proposed and has been used in the context of the ENVRI cluster of RIs. In this chapter, we will discuss how the approach combines the ENVRI Reference Model with the practices of Agile systems development to design common data management services and to tackle the dynamic requirements of research infrastructures.
Zhiming Zhao, Keith Jeffery

Open Access

Semantic and Knowledge Engineering Using ENVRI RM
Abstract
The ENVRI Reference Model provides architects and engineers with the means to describe the architecture and operational behaviour of environmental and Earth science research infrastructures (RIs) in a standardised way using standard terminology. This terminology and the relationships between specific classes of concept can be used as the basis for the machine-actionable specification of RIs or RI subsystems.
Open Information Linking for Environmental RIs (OIL-E) is a framework for capturing architectural and design knowledge about environmental and Earth science RIs intended to help harmonise vocabulary, promote collaboration and identify common standards and technologies across different research infrastructure initiatives. At its heart is an ontology derived from the ENVRI Reference Model. Using this ontology, RI descriptions can be published as linked data, allowing discovery, querying and comparison using established Semantic Web technologies. It can also be used as an upper ontology by which to connect descriptions of RI entities (whether they be datasets, equipment, processes, etc.) that use other, more specific terminologies.
The ENVRI Knowledge Base uses OIL-E to capture information about environmental and Earth science RIs in the ENVRI community for query and comparison. The Knowledge Base can be used to identify the technologies and standards used for particular activities and services and as a basis for evaluating research infrastructure subsystems and behaviours against certain criteria, such as compliance with the FAIR data principles.
Paul Martin, Xiaofeng Liao, Barbara Magagna, Markus Stocker, Zhiming Zhao

Common Data Management Services in Environmental RIs

Frontmatter

Open Access

Data Curation and Preservation
Abstract
Data is a valuable resource. In some scientific disciplines, experiments can be redone to reproduce the data. In environmental sciences, the observations and measurements of the earth and its surroundings can commonly be made only once: each time point records uniquely the state of the many earth processes. This demands that environmental data - structured to information - is preserved in such a way that it may be reused. Understanding phenomena like the ozone hole, biodiversity and climate change depends on data curated over a long period of time. However, it is not just the data that must be curated. The software used to process and analyse the data - or more accurately an executable specification of the software - must be preserved along with associated libraries and computing operational environment. Information on the equipment and sensors used must be preserved since this affects the relevance and quality of the data for future use. Equally challenging is the decision to discard data - for reasons of costs of storage (although that is reducing rapidly) or cost of curation. Curation is blended inextricably with cataloguing and provenance and the core requirement is for rich metadata to characterise the digital asset for all three purposes.
Keith Jeffery

Open Access

Data Cataloguing
Abstract
After a brief reminder on general concepts used in data cataloguing activities, this chapter provides information concerning the architecture and design recommendations for the implementation of catalogue systems for the ENVRIplus community. The main objective of this catalogue is to offer a unified discovery service allowing cross-disciplinary search and access to data collections coming from Research Infrastructures (RIs). This catalogue focuses on metadata with a coarse level of granularity. It was decided to offer metadata representing different types of dataset series. Only metadata for so-called flagship products (as defined by each community) are covered by the scope of this catalogue. The data collections remain within each RI. For RIs, the aim is to improve the visibility of their results beyond their traditional user communities.
Erwann Quimbert, Keith Jeffery, Claudia Martens, Paul Martin, Zhiming Zhao

Open Access

Identification and Citation of Digital Research Resources
Abstract
Environmental research infrastructures are often built on a large number of distributed observational or experimental sites, run by hundreds of scientists and technicians, and financially supported and administered by a large number of institutions. It is therefore very important to acknowledge the data sources and their providers. There is also a strong need for common data citation tracking systems that allow data providers to identify downstream usage of their data so as to demonstrate their importance and show the impact to stakeholders and the public. This chapter highlights identification and citation in environmental RIs, reviews available technologies and develops common services for these operations. It also presents a suggested common system design for Identification and Citation, as well as an outline for negotiations and discussions with publishers and other actors in the scholarly data management and curation world.
Margareta Hellström, Maria Johnsson, Alex Vermeulen

Open Access

Data Processing and Analytics for Data-Centric Sciences
Abstract
The development of data processing and analytics tools is heavily driven by applications, which results in a great variety of software solutions, which often address specific needs. It is difficult to imagine a single solution that is universally suitable for all (or even most) application scenarios and contexts. This chapter describes the data analytics framework that has been designed and developed in the ENVRIplus project to be (a) suitable for serving the needs of researchers in several domains including environmental sciences, (b) open and extensible both with respect to the algorithms and methods it enables and the computing platforms it relies on to execute those algorithms and methods, and (c) open-science-friendly, i.e. it is capable of incorporating every algorithm and method integrated into the data processing framework as well as any computation resulting from the exploitation of integrated algorithms into a “research object” catering for citation, reproducibility, repeatability and provenance.
Leonardo Candela, Gianpaolo Coro, Lucio Lelii, Giancarlo Panichi, Pasquale Pagano

Open Access

Virtual Infrastructure Optimisation
Abstract
The increasing volumes of data being produced, curated and made available by research infrastructures in the environmental science domain require services able to optimise the delivery, staging and processing of data on behalf of researchers. Specialised data services for managing the data lifecycle, for creating and delivering data products, and for customised data processing and analysis, all play a crucial role in how these research infrastructures serve their communities, and many of these activities are time-critical, needing to be carried out frequently within specific time windows. We describe our experiences identifying the time-critical requirements of environmental scientists making use of computational research support environments. We also present a microservice-based infrastructure optimisation suite, the Dynamic Real-time Infrastructure Planner, used for constructing virtual infrastructures for research applications on demand. This chapter is partially based on a recent paper presented in [1].
Spiros Koulouzis, Paul Martin, Zhiming Zhao

Open Access

Data Provenance
Abstract
The provenance of research data is of critical importance to the reproducibility of and trust in scientific results. As research infrastructures provide more amalgamated datasets for researchers and more integrated facilities for processing and publishing data, the capture of provenance in a standard, machine-actionable form becomes especially important. Significant progress has already been made in providing standards and tools for provenance tracking, but the integration of these technologies into research infrastructure remains limited in many scientific domains. Further development and collaboration are required to provide frameworks for provenance capture that can be adopted as widely as possible, facilitating interoperability as well as dataset reuse. In this chapter, we examine the current state of the art for provenance, and the current state of provenance capture in environmental and Earth science research infrastructures in Europe, as surveyed in the course of the ENVRIplus project. We describe a service developed for the upload, dissemination and application of provenance templates that can be used to generate standardised provenance traces from input data in accordance with current best practice and standards. The use of such a service by research infrastructure architects and researchers can expedite both the understanding and use of provenance technologies, and so drive the standard use of provenance capture technologies in future research infrastructure developments.
Barbara Magagna, Doron Goldfarb, Paul Martin, Malcolm Atkinson, Spiros Koulouzis, Zhiming Zhao

Open Access

Semantic Linking of Research Infrastructure Metadata
Abstract
The use of metadata to characterise scientific datasets, making data easier to discover and use directly by researchers and via various online data services, is one of the primary concerns of research infrastructures (RIs); also of concern is the use of metadata to describe equipment, facilities, services and other research assets. Metadata models and terminology differ greatly between communities and infrastructures, however, which makes synthesising complex interdisciplinary scientific workflows involving assets from multiple RIs very challenging.
‘Semantic linking’ addresses the need to enhance the interoperability of RI services and data by bridging metadata schemes, ontologies and vocabularies used by different research communities, whether by standardising the terminologies and schemes used by those communities, or by dynamically transforming metadata from one standard to another when retrieved by services on behalf of researchers executing their scientific workflows.
Multiple techniques for and modes of semantic linking have been investigated in the context of the ENVRI community cluster of environmental and Earth science RIs, including top-down modelling of entities and activities within a standard reference model, enrichment of existing metadata records with shared terminology, full transformation of metadata records from one standard to another, and the generation of additional links to existing online data. We review some of these activities and their application to the promotion of semantic interoperability between RIs, and discuss other possibilities and recent developments that may also be useful for enhancing interdisciplinary data science.
Paul Martin, Barbara Magagna, Xiaofeng Liao, Zhiming Zhao

Open Access

Authentication, Authorization, and Accounting
Abstract
Environmental research infrastructures and data providers are often required to authenticate researchers and manage their access rights to scientific data, sensor instruments or online computing resources. It is widely acknowledged that Authentication, Authorization and Accounting (AAA) play a crucial role in providing a secure distributed digital environment. This chapter reviews the advanced AAA technology and best practices in the existing pan-European e-Infrastructures. It also discusses the challenging issues of interoperability in federated access and presents state-of-the-art solutions.
Alessandro Paolini, Diego Scardaci, Nicolas Liampotis, Vincenzo Spinoso, Baptiste Grenier, Yin Chen

Open Access

Virtual Research Environments for Environmental and Earth Sciences: Approaches and Experiences
Abstract
Virtual Research Environments (VREs) are playing an increasingly important role in data-centric sciences. The concept is also known as Science Gateways in North America, where the functionality is generally a portal plus workflow deployment, and as Virtual Laboratories in Australia, where the end-user can compose a complete system, from the user interface to the use of e-Infrastructures, by a ‘pick and mix’ process from the offered assets. The key aspect is to provide an environment wherein the end-user - researcher, policymaker, commercial enterprise or citizen scientist - has available, through an integrating interface, all the assets needed to achieve their objectives. These aspects are explored through different approaches related to ENVRI.
Keith Jeffery, Leonardo Candela, Helen Glaves

Case Studies

Frontmatter

Open Access

Case Study: Data Subscriptions Using Elastic Cloud Services
Abstract
To perform data-centric research in environmental and earth sciences, researchers need to effectively query, select and access data products from different research infrastructures. When providing observation data continuously, an infrastructure is expected to create and deliver customised data products, e.g. for specific geo-regions, time durations or observation parameters, to enhance its ability to serve the research communities. Such services often have time-critical requirements; some tasks need to be carried out within specific time windows when the data products are needed for real-time modelling or simulation frameworks.
Spiros Koulouzis, Thierry Carval, Jani Heikkinen, Antti Pursula, Zhiming Zhao

Open Access

Case Study: ENVRI Science Demonstrators with D4Science
Abstract
Whenever a community of practice starts developing an IT solution for its use case(s), it has to face the issue of carefully selecting “the platform” to use. Such a platform should match the requirements and the overall settings resulting from the specific application context (including legacy technologies and solutions to be integrated and reused, costs of adoption and operation, ease of acquiring skills and competencies). There is no one-size-fits-all solution that is suitable for all application contexts, and this is particularly true for scientific communities and their cases because of the wide heterogeneity characterising them. However, there is a large consensus that solutions from scratch are inefficient, and services that facilitate the development and maintenance of scientific community-specific solutions do exist. This chapter describes how a set of diverse communities of practice efficiently developed their science demonstrators (on analysing and producing user-defined atmosphere data products, greenhouse gas fluxes, particle formation, mosquito diseases) by leveraging the services offered by the D4Science infrastructure. It shows that the D4Science design decisions aiming at streamlining implementations are effective. The chapter discusses the added value that the reuse of D4Science services brings to the science demonstrators, especially regarding Open Science practices and overall quality of service.
Leonardo Candela, Markus Stocker, Ingemar Häggström, Carl-Fredrik Enell, Domenico Vitale, Dario Papale, Baptiste Grenier, Yin Chen, Matthias Obst

Open Access

Case Study: LifeWatch Italy Phytoplankton VRE
Abstract
LifeWatch Italy, the Italian node of LifeWatch ERIC, has promoted and stimulated the debate on the use of semantics in biodiversity data management. Biodiversity and ecosystem data are very heterogeneous and need to be better managed to improve the scientific knowledge extracted from them, as well as to address the urgent societal challenges concerning environmental issues. LifeWatch Italy has developed the Phytoplankton Virtual Research Environment (hereafter Phytoplankton VRE), a collaborative working environment that supports researchers in basic and applied studies on phytoplankton ecology. The Phytoplankton VRE provides the IT infrastructure to enable researchers to obtain, share and analyse phytoplankton data at a level of resolution from individual cells to whole assemblages. A semantic approach has been used to address data harmonisation, integration and discovery: an interdisciplinary team has developed a thesaurus on phytoplankton functional traits and linked its concepts to other existing conceptual schemas related to the specific domain.
Elena Stanca, Nicola Fiore, Ilaria Rosati, Lucia Vaira, Francesco Cozzoli, Alberto Basset

Sustainability and Future Challenges

Frontmatter

Open Access

Towards Cooperative Sustainability
Abstract
The inescapable question in the ENVRIplus project is how to sustain all achievements after the end of a collaborative project. Considering that each research infrastructure cooperating in the ENVRIplus project is its own legal entity with dedicated governance and management, the challenge is to agree on modes of cooperation to keep tools and services of common interest up to date and operational. This chapter starts with the views of stakeholders, more specifically the views of scientific bodies, policy bodies, and of the infrastructure managers and operators who have to keep a lot of balls in the air. The sustainability plan has to consider the influences from external developments such as the European Strategy Forum on Research Infrastructures (ESFRI), and the emergence of a European Open Science Cloud (EOSC). The chapter discusses ENVRI strategic views, position, and future challenges.
Wouter Los

Open Access

Towards Operational Research Infrastructures with FAIR Data and Services
Abstract
Environmental research infrastructures aim to provide scientists with facilities, resources and services that enable them to perform advanced research effectively. When addressing societal challenges such as climate change and pollution, scientists usually need data, models and methods from different domains to tackle the complexity of the complete environmental system. Research infrastructures are thus required to ensure that all data, including services, products, and virtual research environments, are FAIR for research communities: Findable, Accessible, Interoperable and Reusable. In this last chapter, we conclude and identify future challenges in research infrastructure operation, user support, interoperability, and future evolution.
Zhiming Zhao, Keith Jeffery, Markus Stocker, Malcolm Atkinson, Andreas Petzold
Backmatter
Metadata
Title
Towards Interoperable Research Infrastructures for Environmental and Earth Sciences
Editors
Dr. Zhiming Zhao
Margareta Hellström
Copyright Year
2020
Electronic ISBN
978-3-030-52829-4
Print ISBN
978-3-030-52828-7
DOI
https://doi.org/10.1007/978-3-030-52829-4
