2021 | Book

Metadata and Semantic Research

14th International Conference, MTSR 2020, Madrid, Spain, December 2–4, 2020, Revised Selected Papers

About this book

This book constitutes the thoroughly refereed proceedings of the 14th International Conference on Metadata and Semantic Research, MTSR 2020, held in Madrid, Spain, in December 2020. Due to the COVID-19 pandemic, the conference was held online.
The 24 full and 13 short papers presented were carefully reviewed and selected from 82 submissions. The papers are organized in the following tracks: metadata, linked data, semantics and ontologies; metadata and semantics for digital libraries, information retrieval, big, linked, social and open data; metadata and semantics for agriculture, food, and environment, AgroSEM 2020; metadata and semantics for open repositories, research information systems and data infrastructures; digital humanities and digital curation, DHC 2020; metadata and semantics for cultural collections and applications; European and national projects; knowledge IT artifacts (KITA) in professional communities and aggregations, KITA 2020.

Table of Contents

Frontmatter

Metadata, Linked Data, Semantics and Ontologies - General Session

Frontmatter
Biodiversity Image Quality Metadata Augments Convolutional Neural Network Classification of Fish Species

Biodiversity image repositories are crucial sources for training machine learning approaches to support biological research. Metadata about object (e.g. image) quality is a putatively important prerequisite to selecting samples for these experiments. This paper reports on a study demonstrating the importance of image quality metadata for a species classification experiment involving a corpus of 1935 fish specimen images annotated with 22 metadata quality properties. A small subset of high-quality images produced an F1 score of 0.41, compared to 0.35 for a taxonomically matched subset of low-quality images, when used by a convolutional neural network approach to species identification. Using the full corpus of images revealed that image quality differed between correctly classified and misclassified images. We found anatomical feature visibility was the most important quality feature for classification accuracy. We suggest biodiversity image repositories consider adopting a minimal set of image quality metadata to support machine learning.

Jeremy Leipzig, Yasin Bakis, Xiaojun Wang, Mohannad Elhamod, Kelly Diamond, Wasila Dahdul, Anuj Karpatne, Murat Maga, Paula Mabee, Henry L. Bart Jr., Jane Greenberg
Class and Instance Equivalences in the Web of Linked Data: Distribution and Graph Structure

The Web of Linked Open Data (LOD) is a decentralized effort to publish datasets using a set of conventions that make them accessible, notably through RDF and SPARQL. Links across nodes in published datasets are thus critical to getting value from the LOD cloud as a collective effort, since connectivity among the datasets occurs through these links. The equivalence relationship is one of the fundamental link types that connect different schemas or datasets, and is used to assert either class or instance equivalence. In this article, we report an empirical study on the equivalences found in over 59 million triples from datasets accessible via SPARQL endpoints in open source data portals. Metrics from graph analysis have been used to examine the relationships between repositories and determine their relative importance as well as their ability to facilitate knowledge discovery.
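As a concrete illustration of the kind of link the study counts, the following minimal sketch queries a SPARQL endpoint for owl:sameAs (instance) and owl:equivalentClass (class) equivalence triples. The endpoint URL is a placeholder, and the query is a simplification of the paper's actual methodology.

```python
# A minimal sketch, not the paper's pipeline: count class and instance
# equivalence links exposed by a SPARQL endpoint.
from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "https://example.org/sparql"  # placeholder endpoint

sparql = SPARQLWrapper(ENDPOINT)
sparql.setQuery("""
    PREFIX owl: <http://www.w3.org/2002/07/owl#>
    SELECT ?p (COUNT(*) AS ?links)
    WHERE { ?s ?p ?o . FILTER (?p IN (owl:sameAs, owl:equivalentClass)) }
    GROUP BY ?p
""")
sparql.setReturnFormat(JSON)

for row in sparql.query().convert()["results"]["bindings"]:
    print(row["p"]["value"], row["links"]["value"])
```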

Salvador Sanchez-Alonso, Miguel A. Sicilia, Enayat Rajabi, Marçal Mora-Cantallops, Elena Garcia-Barriocanal
Predicting the Basic Level in a Hierarchy of Concepts

The “basic level”, according to experiments in cognitive psychology, is the level of abstraction in a hierarchy of concepts at which humans perform tasks quicker and with greater accuracy than at other levels. We argue that applications that use concept hierarchies could improve their user interfaces if they ‘knew’ which concepts are the basic level concepts. This paper examines to what extent the basic level can be learned from data. We test the utility of three types of concept features inspired by basic level theory: lexical features, structural features, and frequency features. We evaluate our approach on WordNet, creating a training set of manually labelled examples from different parts of WordNet. Our findings include that basic level concepts can be accurately identified within one domain. Concepts that are difficult for humans to label are also harder to classify automatically. Our experiments provide insight into how classification performance across different parts of the hierarchy could be improved, which is necessary for identifying basic level concepts on a larger scale.
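To make the feature types concrete, here is a minimal sketch computing simplified lexical and structural features for WordNet synsets with NLTK; the exact feature definitions used in the paper differ.

```python
# Simplified illustrations of lexical and structural concept features,
# computed over WordNet with NLTK (requires nltk.download("wordnet")).
from nltk.corpus import wordnet as wn

def concept_features(synset):
    lemmas = synset.lemma_names()
    return {
        "shortest_lemma_length": min(len(l) for l in lemmas),  # lexical
        "lemma_count": len(lemmas),                            # lexical
        "depth": synset.min_depth(),                           # structural
        "child_count": len(synset.hyponyms()),                 # structural
    }

for name in ("animal.n.01", "dog.n.01", "puppy.n.01"):
    print(name, concept_features(wn.synset(name)))
```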

Laura Hollink, Aysenur Bilgin, Jacco van Ossenbruggen
Validating SAREF in a Smart Home Environment

SAREF is an ontology created to enable interoperability between smart devices. While the IoT community has shown interest in and understanding of SAREF as a means for interoperability, the literature lacks practical examples of implementing SAREF in real applications. To validate the practical implementation of SAREF, we perform two experiments. First, we map IoT data available in a smart home into RDF using SAREF. Second, we create an IoT environment using the Knowledge Engine, a framework that enables communication between smart devices, running on Raspberry Pis that emulate IoT devices and communicate by sharing knowledge represented with SAREF. These experiments demonstrate that SAREF is applicable in different situations: the data mapping shows that SAREF can represent the information of different smart devices, and the Knowledge Engine experiment shows that SAREF can enable interoperability between smart devices.
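The first experiment (mapping smart-home data to RDF) can be pictured with a minimal rdflib sketch like the one below. The SAREF term names follow the ETSI core ontology as we read it, and the device namespace is invented, so treat this as an assumption-laden illustration rather than the authors' actual mapping.

```python
# A hedged sketch: one smart-home temperature reading expressed with SAREF.
# SAREF term usage (Measurement, hasValue, hasTimestamp, makesMeasurement)
# should be verified against the published ETSI ontology.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, XSD

SAREF = Namespace("https://saref.etsi.org/core/")
EX = Namespace("http://example.org/home/")  # hypothetical device namespace

g = Graph()
g.bind("saref", SAREF)
sensor, reading = EX["livingroom-sensor-1"], EX["measurement-42"]

g.add((sensor, RDF.type, SAREF.TemperatureSensor))
g.add((sensor, SAREF.makesMeasurement, reading))
g.add((reading, RDF.type, SAREF.Measurement))
g.add((reading, SAREF.hasValue, Literal("21.5", datatype=XSD.float)))
g.add((reading, SAREF.hasTimestamp,
       Literal("2020-12-02T10:00:00", datatype=XSD.dateTime)))

print(g.serialize(format="turtle"))
```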

Roderick van der Weerdt, Victor de Boer, Laura Daniele, Barry Nouwt
An Approach for Representing and Storing RDF Data in Multi-model Databases

The emergence of NoSQL multi-model databases, natively supporting scalable and unified storage and querying of various data models, presents new opportunities for storing and managing RDF data. In this paper, we propose an approach to store RDF data in multi-model databases. We identify various aspects of representing the RDF data structure in a multi-model data structure and discuss their advantages and disadvantages. Furthermore, we implement and evaluate the proposed approach in a prototype using ArangoDB, a popular multi-model database.
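One of the representation choices such an approach must make is how triples map onto the database's native structures. The sketch below shows a naive property-graph mapping: vertex and edge documents ready for bulk import into a store like ArangoDB. The collection layout and key scheme are illustrative assumptions, not the paper's design.

```python
# A minimal sketch: turn RDF triples into vertex/edge documents of the kind
# a multi-model database such as ArangoDB can bulk-import. Naming is invented.
import hashlib
from rdflib import Graph

def doc_key(term):
    """Stable document key derived from an RDF term's string form."""
    return hashlib.sha1(str(term).encode()).hexdigest()[:16]

g = Graph()
g.parse(data="""
    @prefix ex: <http://example.org/> .
    ex:alice ex:knows ex:bob .
""", format="turtle")

vertices, edges = {}, []
for s, p, o in g:
    for term in (s, o):
        vertices[doc_key(term)] = {"_key": doc_key(term), "value": str(term)}
    edges.append({"_from": f"nodes/{doc_key(s)}",
                  "_to": f"nodes/{doc_key(o)}",
                  "predicate": str(p)})

print(list(vertices.values()))
print(edges)
```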

Simen Dyve Samuelsen, Nikolay Nikolov, Ahmet Soylu, Dumitru Roman
Knowledge-Based Named Entity Recognition of Archaeological Concepts in Dutch

The advancement of Natural Language Processing (NLP) allows the process of deriving information from large volumes of text to be automated, making text-based resources more discoverable and useful. We turn our attention to one of the most important but traditionally hard-to-access resources in archaeology: the largely unpublished reports generated by commercial or “rescue” archaeology, commonly known as “grey literature”. The paper presents the development and evaluation of a Named Entity Recognition system for Dutch archaeological grey literature, targeted at extracting mentions of artefacts, archaeological features, materials, places, and time entities. The role of domain vocabulary in the development of a KOS-driven NLP pipeline is discussed, and the pipeline is evaluated against a Gold Standard, human-annotated corpus.
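A knowledge-based (vocabulary-driven) tagger of the kind described can be sketched with spaCy's PhraseMatcher, as below; the three Dutch vocabulary terms and entity labels are invented examples, not the paper's KOS.

```python
# A minimal gazetteer-style NER sketch driven by a domain vocabulary,
# using spaCy's PhraseMatcher over Dutch text. Terms are invented examples.
import spacy
from spacy.matcher import PhraseMatcher

nlp = spacy.blank("nl")
matcher = PhraseMatcher(nlp.vocab, attr="LOWER")

vocabulary = {"ARTEFACT": ["aardewerk", "vuursteen"], "FEATURE": ["paalkuil"]}
for label, terms in vocabulary.items():
    matcher.add(label, [nlp.make_doc(term) for term in terms])

doc = nlp("In de paalkuil werd aardewerk aangetroffen.")
for match_id, start, end in matcher(doc):
    print(nlp.vocab.strings[match_id], "->", doc[start:end].text)
```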

Andreas Vlachidis, Douglas Tudhope, Milco Wansleeben
FAIRising Pedagogical Documentation for the Research Lifecycle

How can pedagogical research complement academic research and vice versa? This case study revisits two research projects on curricular resources: the first, in 2019, analyzes partially structured syllabi data in digital humanities, and the second, in 2021, focuses on unstructured course titles and descriptions in LIS course catalogs. The findings reexamine data collection and analysis processes in which the lack of linked semantic metadata and persistent digital objects in curricular resources impedes fruitful research on how (inter)disciplinary topics are taught and future researchers are trained. Consequently, the case study locates a gap: the role of pedagogical documentation in the research lifecycle has not been considered. As suggested by the emergence of the FAIR principles, metadata expertise is a foundation for establishing the findability, accessibility, interoperability, and reuse of persistent digital objects in research outputs. FAIRising pedagogical documentation for the research lifecycle has the potential to link curricular resources with other research outputs. Information professionals have a leadership role in assisting faculty to create FAIRised pedagogical documentation, and curricular resources so prepared address the gap in integrating pedagogical documentation with the research lifecycle. Benefits include recognition of curricular resources as vital research outputs and the facilitation of longitudinal research on (inter)disciplinary pedagogical practices in the FAIR ecosystem.

Deborah A. Garwood, Alex H. Poole
LigADOS: Interlinking Datasets in Open Data Portal Platforms on the Semantic Web

The opening of data has been largely motivated by access-to-information laws, which establish the need to make data related to governmental activities, as well as results from business processes or scientific research, available to citizens and society in general for accountability and transparency. There are several ways of making data available to the public, from a simple website to sophisticated applications for accessing the data. In this context, one option is the construction of an open data portal using platforms for data repositories and catalogs. In the last few years, there has been a rapid proliferation of this type of portal, with domain- or organization-specific datasets widely disseminated on platforms like CKAN. On these platforms, datasets are organized in thematic groups and described by keywords and other attributes assigned by the publisher. Usually described by metadata with poor semantics, these datasets very often remain “data silos”, with no explicit connection or data integration mechanism, making it difficult for users to locate and interrelate relevant data sources. In contrast, the Semantic Web focuses on modeling and representing data in a way that makes it easier to establish interrelationships between data, accompanied by richer descriptors. Based on this scenario, this paper proposes LigADOS, an approach to create interconnections between datasets considering their content and related metadata. LigADOS is based on the principles of the Semantic Web and associated linked data solutions and technologies, supporting rich access strategies to RDF data published using portal platforms like CKAN and others.
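The raw material for an approach like LigADOS is the metadata a CKAN portal exposes through its Action API; a minimal retrieval sketch follows. The portal URL is a placeholder, and the RDF lifting and interlinking steps of LigADOS itself are not shown.

```python
# A minimal sketch: fetch dataset metadata from a CKAN portal's Action API,
# the input an interlinking approach like LigADOS would lift to RDF.
import requests

PORTAL = "https://demo.ckan.org"  # placeholder CKAN instance

resp = requests.get(f"{PORTAL}/api/3/action/package_search",
                    params={"q": "water quality", "rows": 5})
resp.raise_for_status()

for pkg in resp.json()["result"]["results"]:
    tags = [tag["name"] for tag in pkg.get("tags", [])]
    print(pkg["name"], "-", tags)
```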

Glaucia Botelho de Figueiredo, Kelli de Faria Cordeiro, Maria Luiza Machado Campos
Lifting Tabular Data to RDF: A Survey

Tabular data formats (e.g. CSV and spreadsheets) combine ease of use, versatility, and compatibility with information management systems. Despite their numerous advantages, these formats typically rely on column headers and out-of-band agreement to convey semantics. There is clearly a large gap with respect to the Semantic Web, which uses RDF as a graph-based data model while relying on ontologies for well-defined semantics. Several systems have been developed to close this gap by supporting the conversion of tabular data to RDF. This study surveys, analyzes, and compares these systems. We identify commonalities and differences among them, discuss the different approaches, and derive useful insights on the task.
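As a baseline for what these systems automate, the sketch below performs the naive "direct mapping" of a table to RDF: one resource per row, one predicate per column header. The namespace is an assumption; the surveyed systems add ontology alignment, datatype handling, and richer mappings on top of this.

```python
# A minimal direct-mapping sketch: lift a small CSV table to RDF triples.
import csv
import io
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/table/")  # hypothetical namespace
data = io.StringIO("id,name,country\n1,ACME,DE\n2,Initech,US\n")

g = Graph()
for row in csv.DictReader(data):
    subject = EX[f"row-{row['id']}"]
    for column, value in row.items():
        g.add((subject, EX[column], Literal(value)))

print(g.serialize(format="turtle"))
```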

Manuel Fiorelli, Armando Stellato
Automatic Human Resources Ontology Generation from the Data of an E-Recruitment Platform

Over the last decade, several e-recruitment platforms have been developed, allowing users to publish their professional information (training, work history, career summary, etc.). However, the representation of this huge quantity of knowledge is still limited. In this work, we present a method based on community detection and natural language processing techniques to generate a human resources (HR) ontology. The data used in the generation process consists of user profiles retrieved from the Algerian e-recruitment platform Emploitic.com (www.emploitic.com), including occupations, skills, and professional domains. Our main contribution is the identification of new relationships between these concepts using community detection within each area of work. The generated ontology has hierarchical relationships between skills, professions, and professional domains. To evaluate the relevance of this ontology, we used both a manual method, with experts in the human resources domain, and an automatic method, comparing against existing HR ontologies. The evaluation has shown promising results.
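The core step, detecting communities in a graph of co-occurring concepts, can be sketched with networkx as below; the toy profiles and the choice of a modularity-based algorithm are illustrative assumptions, not the paper's exact method.

```python
# A minimal sketch: build a skill co-occurrence graph from profiles and
# detect communities with a modularity-based algorithm (networkx).
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

profiles = [  # invented toy profiles
    {"python", "sql", "etl"},
    {"sql", "etl", "data warehousing"},
    {"recruiting", "payroll", "labor law"},
]

g = nx.Graph()
for skills in profiles:
    for a in skills:
        for b in skills:
            if a < b:  # each unordered pair once
                g.add_edge(a, b)

for i, community in enumerate(greedy_modularity_communities(g)):
    print(f"community {i}:", sorted(community))
```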

Sabrina Boudjedar, Sihem Bouhenniche, Hakim Mokeddem, Hamid Benachour
Examination of NoSQL Transition and Data Mining Capabilities

An estimated 2.5 quintillion bytes of data are created every day. This data explosion, along with new datatypes and objects and the wide usage of social media networks, with an estimated 3.8 billion users worldwide, makes the exploitation and manipulation of data with relational databases cumbersome and problematic. NoSQL databases introduce new capabilities aiming to improve on the functionality offered by traditional SQL DBMSs. This paper elaborates on ongoing research regarding NoSQL databases, focusing on the background of their development, their basic characteristics, their categorization, and their noticeable increase in popularity. The functional advantages and data mining capabilities that come with the usage of graph databases are also presented, along with common graph data mining tasks, with attention to ease of implementation as well as efficiency. The aim is to highlight the concepts necessary for incorporating data mining techniques and graph database functionality, eventually proposing an analytical framework offering a plethora of domain-specific analytics, for example a virus outbreak analytics framework allowing health and government officials to make appropriate decisions.

Dimitrios Rousidis, Paraskevas Koukaras, Christos Tjortjis
Entity Linking as a Population Mechanism for Skill Ontologies: Evaluating the Use of ESCO and Wikidata

Ontologies or databases describing occupations in terms of competences or skills are an important resource for a number of applications. Exploiting large knowledge graphs is thus a promising direction for updating those ontologies with entities from the knowledge graphs, which may be updated faster, especially in the case of crowd-sourced resources. Here we report a first assessment of the potential of that strategy, matching knowledge elements in ESCO to Wikidata using the NER and document similarity models available in the spaCy NLP library. Results show that the approach may be effective, but the use of pre-trained language models and the short texts included with entities (labels and descriptions) does not result in sufficient quality for a fully automated process.
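The matching step being assessed can be pictured as below: comparing a short ESCO-style label against Wikidata candidate labels with spaCy's pre-trained vector similarity. The labels are invented, and the en_core_web_md model must be downloaded separately.

```python
# A minimal sketch of label-to-label matching with pre-trained spaCy vectors.
# Requires: python -m spacy download en_core_web_md. Labels are invented.
import spacy

nlp = spacy.load("en_core_web_md")

esco_label = nlp("database administration")
wikidata_candidates = ["database administrator", "data entry clerk", "gardening"]

for text in wikidata_candidates:
    print(f"{text}: {esco_label.similarity(nlp(text)):.2f}")
```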

Lino González, Elena García-Barriocanal, Miguel-Angel Sicilia
Implementing Culturally Relevant Relationships Between Digital Cultural Heritage Objects

A vocabulary of culturally relevant relationships (CRR) between cultural heritage objects in Libraries, Archives, and Museums (LAM) was previously proposed with the aim of interlinking digital collections using Linked Open Data (LOD) technologies. The CRR vocabulary is intended for culture curators, teachers, historians, and others, enabling them to interlink such digital resources to provide richer context and to reveal new senses of those resources. This paper aims at testing and evaluating the CRR vocabulary as follows. Wikipedia articles about remarkable cultural heritage objects, such as the painting Mona Lisa and the book Dom Casmurro, are used to create RDF triples in which the heritage object is the subject, relationships from the CRR vocabulary are the predicates, and links found in each Wikipedia article to other LAM digital objects or other Web resources are the objects. The RDF graphs thus generated are presented and discussed. Necessary improvements to the proposed CRR vocabulary are outlined and changes are suggested.

Carlos H. Marcondes
An Introduction to Information Network Modeling Capabilities, Utilizing Graphs

This paper presents research on Information Network (IN) modeling using graph mining. The theoretical background, along with a review of relevant literature, is showcased, pertaining to the concepts of IN model types, network schemas, and graph measures. Ongoing research involves experimentation and evaluation on bipartite and star network schemas, generating test subjects using Social Media, Energy, or Healthcare data. Our contribution is showcased by two proof-of-concept simulations that we plan to extend.
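One of the schemas under evaluation, a bipartite information network, can be sketched with networkx along with its one-mode projection; the toy user-topic edges are invented.

```python
# A minimal sketch: a bipartite user-topic network and its weighted
# projection onto users (users linked through shared topics).
import networkx as nx
from networkx.algorithms import bipartite

g = nx.Graph()
users, topics = ["u1", "u2", "u3"], ["energy", "health"]
g.add_nodes_from(users, bipartite=0)
g.add_nodes_from(topics, bipartite=1)
g.add_edges_from([("u1", "energy"), ("u2", "energy"),
                  ("u2", "health"), ("u3", "health")])

projection = bipartite.weighted_projected_graph(g, users)
print(projection.edges(data=True))
```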

Paraskevas Koukaras, Dimitrios Rousidis, Christos Tjortjis

Track on Metadata and Semantics for Digital Libraries, Information Retrieval, Big, Linked, Social and Open Data

Frontmatter
Using METS to Express Digital Provenance for Complex Digital Objects

Today’s digital libraries consist of much more than simple 2D images of manuscript pages or paintings. Advanced imaging techniques (3D modeling, spectral photography, and volumetric x-ray, for example) can be applied to all types of cultural objects and can be combined to create complex digital representations comprising many disparate parts. In addition, emergent technologies like virtual unwrapping and artificial intelligence (AI) make it possible to create “born digital” versions of unseen features, such as text and brush strokes, that are “hidden” by damage and therefore lack verifiable analog counterparts. Transparent metadata that describes and depicts the set of algorithmic steps and file combinations used to create such complicated digital representations is therefore crucial. At EduceLab, we create various types of complex digital objects, from virtually unwrapped manuscripts that rely on machine learning tools to create born-digital versions of unseen text, to 3D models that consist of 2D photos, multi- and hyperspectral images, drawings, and 3D meshes. In exploring ways to document the digital provenance chain for these complicated digital representations, and then to support the dissemination of the metadata in a clear, concise, and organized way, we settled on the Metadata Encoding and Transmission Standard (METS). This paper outlines our design to exploit the flexibility and comprehensiveness of METS, particularly its behaviorSec, to meet emerging digital provenance metadata needs.
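To make the behaviorSec idea concrete, here is a minimal sketch of recording one processing step as a METS behavior whose mechanism points at the tool used. Element and attribute usage follows our reading of the METS schema, and the identifiers and href are placeholders, not EduceLab's actual records.

```python
# A hedged sketch: one provenance step recorded in a METS behaviorSec.
# Verify element/attribute usage against the METS schema documentation.
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS)
ET.register_namespace("xlink", XLINK)

mets = ET.Element(f"{{{METS}}}mets")
behavior_sec = ET.SubElement(mets, f"{{{METS}}}behaviorSec")
behavior = ET.SubElement(
    behavior_sec, f"{{{METS}}}behavior",
    {"ID": "step-1", "LABEL": "virtual flattening of segmented surface"})
ET.SubElement(
    behavior, f"{{{METS}}}mechanism",
    {"LOCTYPE": "URL", f"{{{XLINK}}}href": "https://example.org/tools/flatten"})

print(ET.tostring(mets, encoding="unicode"))
```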

Christy Chapman, Seth Parker, Stephen Parsons, W. Brent Seales
How Can a University Take Its First Steps in Open Data?

Every university in Greece is obliged to comply with the national legal framework on open data. The question that arises is how such a big and diverse organization can support open data from an administrative, legal, and technical point of view, in a way that enables gradual improvement of open data-related services. In this paper, we describe our experience, as the University of Crete, in tackling these requirements. In particular, (a) we detail the steps of the process that we followed, (b) we show how an Open Data Catalog can be exploited in the first steps of this process, (c) we describe the platform that we selected and how we organized the catalog and the metadata selection, (d) we describe the extensions that were required, and (e) we discuss the current status, performance indicators, and possible next steps.

Yannis Tzitzikas, Marios Pitikakis, Giorgos Giakoumis, Kalliopi Varouha, Eleni Karkanaki
Linked Vocabularies for Mobility and Transport Research

The paper describes the creation of a vocabulary for a domain-specific information service platform (SIS move) through vocabulary re-use and linking. The source vocabularies differ with respect to several factors (domain-specificity, accessibility, data model). We address why particular vocabularies should be considered for a domain-specific vocabulary and how they are brought under a common modelling paradigm using standards for knowledge organization systems and alignment of schemata. We also discuss the creation and validation of alignments. Finally, we give an outlook on the vocabulary’s further evolution and application.

Susanne Arndt, Mila Runnwerth
Linking Author Information: EconBiz Author Profiles

Author name ambiguity represents a real obstacle in digital library (DL) information retrieval. A search with an author’s name almost always casts doubt on whether all publications in the result list belong to that author or to another author sharing the same name. In several other cases, the scholar is interested in additional information about a selected author, such as a short biography, affiliations, metrics, or co-relations with other authors. The main purpose of this work is the integration and usage of diverse data, based on Linked Data approaches and authority records, to create a comprehensive author profile inside a DL. We propose and deploy an approach that provides such author profiles on the fly, i.e., by harvesting the available sources for this purpose. The proposed approach, developed as a fully functional prototype, was introduced for evaluation to a group of authors, scholars, and librarians. The results indicate acceptance of such an approach, underlining the benefits and limitations that come with it.

Arben Hajra, Tamara Pianos, Klaus Tochtermann
OntoBelliniLetters: A Formal Ontology for a Corpus of Letters of Vincenzo Bellini

This paper describes OntoBelliniLetters, a formal ontology for the corpus of Vincenzo Bellini’s correspondence kept at the Belliniano Civic Museum of Catania. The ontology is part of a wider effort, the BellinInRete project, one of whose aims is the development of a more general and complete ontology for the whole of Vincenzo Bellini’s legacy preserved in the Museum. The main concepts and relations that build up the ontology knowledge base are described and discussed, and hints for their implementation by means of the standard OWL 2 description language are presented. The ontology schema is inspired by the CIDOC Conceptual Reference Model (CIDOC CRM).

Salvatore Cristofaro, Daria Spampinato
Creative Data Ontology: ‘Russian Doll’ Metadata Versioning in Film and TV Post-Production Workflows

A ‘Russian doll’ is a decorative painted hollow wooden figure that can be contained in a larger figure of the same sort, which can, in turn, be contained in another figure, repeated as many times as needed. This paper describes the development of an OWL-based ontology designed for metadata versioning in the media post-production industry. The ontology implements the same Russian doll principle: a ‘record’ can “wrap” (or contain) another record relating to the versioning of that metadata, repeated as often as needed. Our ontology for metadata used in the media industry distinguishes itself from others by addressing the full range of post-production processes, rather than the archiving of a finished product. The ontology has been developed using metadata fields provided by high-profile UK-based post-production companies, informed by ethnographic and co-design work carried out with them. It is the basis for a prototype metadata management tool for use both in media post-production and on media productions. We present the central design principles emerging from our collaborative research and describe the process of co-developing this ontology with our partners.
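The wrapping pattern can be pictured with a small RDF sketch, below; the class and property names (MetadataRecord, wraps, title) are invented for illustration and are not the Creative Data Ontology's own terms.

```python
# A minimal sketch of the 'Russian doll' idea: a newer metadata record
# wrapping the record it versions. All term names are hypothetical.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF

CDO = Namespace("http://example.org/cdo/")  # hypothetical namespace

g = Graph()
v1, v2 = CDO["record-v1"], CDO["record-v2"]
for record in (v1, v2):
    g.add((record, RDF.type, CDO.MetadataRecord))
g.add((v1, CDO.title, Literal("Scene 12, rough cut")))
g.add((v2, CDO.title, Literal("Scene 12, colour graded")))
g.add((v2, CDO.wraps, v1))  # the new version contains its predecessor

print(g.serialize(format="turtle"))
```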

Christos A. Dexiades, Claude P. R. Heath
Machine Learning Models Applied to Weather Series Analysis

In recent years, the explosion in high-performance computing systems and high-capacity storage has led to an exponential increase in the amount of information, generating the phenomenon of big data and the development of automatic processing models such as machine learning analysis. In this paper, a machine learning time series analysis is developed experimentally for the paroxysmal meteorological event known as a “cloudburst”: a very intense, highly localized storm concentrated within a few hours. Related extreme phenomena, such as hail, overflows, and flash floods, occur in both urban and rural areas. The predictability of these phenomena is very short-term and depends on the event considered; it is therefore useful to complement deterministic modeling tools with data-driven methods to anticipate the event, an approach also known as nowcasting. Detailed knowledge of these phenomena, together with the development of simulation models for the propagation of cloudbursts, can be a useful tool for monitoring and mitigating risk in civil protection contingency plans.

Francesca Fallucchi, Riccardo Scano, Ernesto William De Luca
EPIC: A Proposed Model for Approaching Metadata Improvement

This paper outlines iterative steps involved in metadata improvement within a digital library: Evaluate, Prioritize, Identify, and Correct (EPIC). The process involves evaluating metadata values system-wide to identify errors; prioritizing errors according to local criteria; identifying records containing a particular error; and correcting individual records to eliminate the error. Based on the experiences at the University of North Texas (UNT) Libraries, we propose that these cyclical steps can serve as a model for organizations that are planning and conducting metadata quality assessment.

Hannah Tarver, Mark Edward Phillips
Semantic Web Oriented Approaches for Smaller Communities in Publishing Findable Datasets

Publishing findable datasets is a crucial step toward data interoperability and reusability. Initiatives like Google Dataset Search and semantic web standards like the Data Catalog Vocabulary (DCAT) and Schema.org provide mechanisms to expose datasets on the web and make them findable. Beyond these standards, it is also helpful to describe the datasets themselves, both their structure and their applications. Metadata application profiles are a suitable mechanism to ensure interoperability and improve use cases for datasets. Standards and initiatives, including the Profiles (PROF) and VoID vocabularies, as well as frameworks like Dublin Core application profiles (DCAP), provide guidance for developing and publishing metadata application profiles. The major challenge for domain experts, especially smaller communities intending to publish findable data on the web, is the complexity of understanding and conforming to such standards. Mostly, these features are provided by complex data repository systems, which are not always a sustainable choice for small groups and communities looking to self-publish their datasets. This paper applies these standards to self-publishing findable datasets by customizing minimal static web publishing tools, demonstrating possibilities that encourage smaller communities to adopt cost-effective and simple dataset publishing. The authors express this idea through this work-in-progress paper with the notion that such simple tools will help small communities publish findable datasets and thus gain more reach and acceptance for their data. From the perspective of the semantic web, such tools will increase the number of linkable datasets and promote the fundamental concepts of the decentralized web.
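The kind of lightweight publishing the paper advocates boils down to embedding a standards-compliant dataset description in a static page. The sketch below emits a schema.org Dataset as JSON-LD; all values are placeholders.

```python
# A minimal sketch: a schema.org Dataset description as embeddable JSON-LD,
# the kind of markup dataset search engines crawl. Values are placeholders.
import json

dataset = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Community river sensor readings",
    "description": "Hourly water-level readings collected by volunteers.",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "distribution": [{
        "@type": "DataDownload",
        "encodingFormat": "text/csv",
        "contentUrl": "https://example.org/data/readings.csv",
    }],
}

print('<script type="application/ld+json">')
print(json.dumps(dataset, indent=2))
print("</script>")
```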

Nishad Thalhath, Mitsuharu Nagamori, Tetsuo Sakaguchi, Deepa Kasaragod, Shigeo Sugimoto

Track on Metadata and Semantics for Agriculture, Food and Environment (AgroSEM’20)

Frontmatter
Ontology-Based Decision Support System for the Nitrogen Fertilization of Winter Wheat

Digital technologies are already used in several aspects of agriculture. However, decision-making in crop production is still often a manual process that relies on various heterogeneous data sources. Small-scale farmers and their local consultants are particularly burdened by increasingly complex requirements. Regional circumstances and regulations play an essential role and need to be considered. This paper presents an ontology-based decision support system for the nitrogen fertilization of winter wheat in Bavaria, Germany. Semantic Web and Linked Data technologies were employed to both reuse and model new common semantic structures for interrelated knowledge. Many relevant general and regional data sources from multiple domains were not yet available in RDF. Hence, we used several tools to transform relevant data into corresponding OWL ontologies and combined them in a central knowledge base. The GUI application of the decision support system queries it to parameterize requests to external web services and to show relevant information in an integrated view. It further uses SPARQL queries to automatically generate recommendations for farmers and their consultants.
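The pattern of querying a combined knowledge base to derive a recommendation can be sketched as below; the tiny agronomic vocabulary and the balance-style calculation are invented for illustration and are far simpler than the Bavarian regulations the system actually encodes.

```python
# A minimal sketch: a SPARQL query over a toy knowledge base that computes a
# nitrogen recommendation. Vocabulary and arithmetic are invented examples.
from rdflib import Graph

g = Graph()
g.parse(data="""
    @prefix ex: <http://example.org/agri/> .
    ex:field1 ex:crop ex:winterWheat ;
              ex:soilNitrogenKgHa 45 .
    ex:winterWheat ex:nitrogenDemandKgHa 180 .
""", format="turtle")

query = """
    PREFIX ex: <http://example.org/agri/>
    SELECT ?field ((?demand - ?soil) AS ?recommendation) WHERE {
      ?field ex:crop ?crop ; ex:soilNitrogenKgHa ?soil .
      ?crop ex:nitrogenDemandKgHa ?demand .
    }
"""
for row in g.query(query):
    print(f"{row.field}: apply {row.recommendation} kg N/ha")
```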

Ingmar Kessler, Alexander Perzylo, Markus Rickert
Semantic Description of Plant Phenological Development Stages, Starting with Grapevine

The French project Data to Knowledge in Agronomy and Biodiversity (D2KAB) will make available a semantically enabled French agricultural alert newsletter. To describe and annotate crop phenological development stages in the newsletters, we need a specific semantic resource to represent each stage. Several scales already exist to describe plant phenological development stages. BBCH, considered a reference, offers several sets of stages (one per crop, called ‘individual scales’) and a general scale. The French Wine and Vine Institute (IFV) has aligned several existing scales in order to identify the grapevine development stages most useful for agricultural practices. Unfortunately, these scales are not available in a semantic form, preventing their use in agricultural semantic applications. In this paper, we present our work on creating an ontological framework for the semantic description of plant development stages and on transforming specific scales into RDF vocabularies; we introduce the BBCH-based Plant Phenological Description Ontology and illustrate the framework with four scales related to grapevine.

Catherine Roussey, Xavier Delpuech, Florence Amardeilh, Stephan Bernard, Clement Jonquet
On the Evolution of Semantic Warehouses: The Case of Global Record of Stocks and Fisheries

Semantic Warehouses integrate data from various sources for offering a unified view of the data and enabling the answering of queries which cannot be answered by the individual sources. However, such semantic warehouses have to be refreshed periodically as the underlying datasets change. This is a challenging requirement, not only because the mappings and transformations that were used for constructing the semantic warehouse can be invalidated, but also because additional information (not existing in the initial datasets) may have been added in the semantic warehouse, and such information needs to be preserved after every reconstruction. In this paper we focus on this particular problem in a real setting: the Global Record of Stocks and Fisheries, a semantic warehouse that integrates data about stocks and fisheries from various information systems. We propose and detail a process that can tackle these requirements and we report our experiences from implementing it.

Yannis Marketakis, Yannis Tzitzikas, Aureliano Gentile, Bracken van Niekerk, Marc Taconet
A Software Application with Ontology-Based Reasoning for Agroforestry

Agroforestry consists of combining trees with agriculture, both on farms and in the agricultural landscape. In a context of sustainable development, agroforestry can improve soil conservation and reduce the use of toxic chemicals on crops, as well as improve biodiversity. Interdisciplinary by nature, the field of agroforestry mobilizes a large body of knowledge from the environmental and life sciences using systemic approaches. In this framework, field observation data are acquired in partnership with several categories of stakeholders, such as scientists, foresters, farmers, breeders, politicians, and land managers. For efficient data management, we propose the software application AOBRA (a software Application with Ontology-Based Reasoning for Agroforestry). The core of AOBRA is a domain ontology called “Agroforestry”, which serves as a basis for capitalizing and sharing knowledge in agroforestry. By exploiting the inference capabilities and the links to other areas of expertise on the Web that an ontology model offers, AOBRA aims to provide a broad view of agroforestry designs and to allow comparison between different spatial layouts of trees and crops.

Raphaël Conde Salazar, Fabien Liagre, Isabelle Mougenot, Jérôme Perez, Alexia Stokes

Track on Metadata and Semantics for Open Repositories, Research Information Systems and Data Infrastructures

Frontmatter
HIVE-4-MAT: Advancing the Ontology Infrastructure for Materials Science

This paper introduces Helping Interdisciplinary Vocabulary Engineering for Materials Science (HIVE-4-MAT), an automatic linked data ontology application. The paper provides contextual background on materials science, shared ontology infrastructures, and knowledge extraction applications. HIVE-4-MAT’s three key features are reviewed: 1) vocabulary browsing, 2) term search and selection, and 3) knowledge extraction/indexing, along with the basics of named entity recognition (NER). The discussion elaborates on the importance of ontology infrastructures and the steps taken to enhance knowledge extraction. The conclusion highlights next steps: surveying the ontology landscape, NER work as a step toward relation extraction (RE), and support for better ontologies.

Jane Greenberg, Xintong Zhao, Joseph Adair, Joan Boone, Xiaohua Tony Hu
Institutional Support for Data Management Plans: Five Case Studies

Researchers are being prompted by funders and institutions to expose the variety of results of their projects and to submit a Data Management Plan (DMP) as part of their funding requests. In this context, institutions are looking for solutions to support research data management activities in general, including DMP creation. We propose a collaborative approach in which a researcher and a data steward create a DMP together, involving other parties as required. We describe this collaborative method and its implementation by means of a set of case studies that show the importance of the data steward in the institution. Feedback from researchers shows that the DMPs are simple enough to lead people to engage in data management, but present enough challenges to constitute an entry point to the next level: the machine-actionable DMP.

Yulia Karimova, Cristina Ribeiro, Gabriel David

Track on Metadata and Semantics for Digital Humanities and Digital Curation (DHC2020)

Frontmatter
Building Linked Open Date Entities for Historical Research

Time is a focal point for historical research. Although existing Linked Open Data (LOD) resources hold time entities, they are often limited to the modern period and to year-month precision at most. Therefore, researchers are currently unable to perform co-reference resolution through entity linking to integrate datasets that contain day-level information or cover the remote past. This paper aims to build an RDF model and lookup service for historical time at the lowest granularity level of a single day, covering a span of 6000 years. The project, Linked Open Date Entities (LODE), generates stable URIs for over 2.2 million entities, which include essential information and links to other LOD resources. The value of date entities is discussed in a couple of use cases with existing datasets. LODE facilitates improved access and connectivity, unlocking the potential of data integration in interdisciplinary research.
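Minting day-granularity entities can be sketched as below with rdflib and OWL-Time; the base URI pattern and property choices are assumptions for illustration, not LODE's published scheme.

```python
# A minimal sketch: stable URIs for day-level date entities, typed with
# OWL-Time. The base URI is hypothetical; LODE's real model differs.
from datetime import date, timedelta
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS, XSD

TIME = Namespace("http://www.w3.org/2006/time#")
DATE = Namespace("http://example.org/date/")  # hypothetical base URI

g = Graph()
day = date(1600, 1, 1)
for _ in range(3):  # LODE itself spans roughly 6000 years of days
    uri = DATE[day.isoformat()]
    g.add((uri, RDF.type, TIME.Instant))
    g.add((uri, TIME.inXSDDate, Literal(day.isoformat(), datatype=XSD.date)))
    g.add((uri, RDFS.label, Literal(day.strftime("%d %B %Y"))))
    day += timedelta(days=1)

print(g.serialize(format="turtle"))
```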

Go Sugimoto
Wikidata Centric Vocabularies and URIs for Linking Data in Semantic Web Driven Digital Curation

Wikidata is evolving into the hub of Linked Open Data (LOD), with its language-neutral URIs and close adherence to Wikipedia. Well-defined URIs help data to be interoperable and linkable. This paper examines the possibilities of utilizing Wikidata as a vocabulary resource for promoting the use of linkable concepts. Digital curation projects are vibrant, with varying demands and purposes, which makes them less suitable for adopting common vocabularies or ontologies; moreover, developing and maintaining custom vocabularies is an expensive process for smaller projects in terms of resources and skill requirements. In general, Wikidata entities are well documented with Wikipedia entries, and Wikipedia entries express the conceptual and hierarchical relations in detail, with provisions to modify or create them. The authors explain the concept of using Wikidata as a vocabulary source with a proof-of-concept module implementation for Omeka-S, a widely adopted open source digital curation platform. This paper is expected to offer practical insights into reliable and reasonable vocabulary development for social informatics as well as cultural heritage projects, with a view to improving the quality and quantity of linkable data from digital curation projects.
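The lookup such a module needs can be sketched against Wikidata's public wbsearchentities API, as below, returning language-neutral concept URIs that can serve as vocabulary terms.

```python
# A minimal sketch: resolve a free-text term to candidate Wikidata entities
# through the public wbsearchentities API.
import requests

def wikidata_candidates(term, language="en", limit=5):
    """Return (concept URI, label, description) candidates for a term."""
    resp = requests.get("https://www.wikidata.org/w/api.php", params={
        "action": "wbsearchentities", "search": term,
        "language": language, "limit": limit, "format": "json",
    })
    resp.raise_for_status()
    return [(hit["concepturi"], hit.get("label", ""), hit.get("description", ""))
            for hit in resp.json()["search"]]

for uri, label, description in wikidata_candidates("woodblock print"):
    print(uri, "|", label, "|", description)
```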

Nishad Thalhath, Mitsuharu Nagamori, Tetsuo Sakaguchi, Shigeo Sugimoto
A Linked Data Model for Data Scopes

With the rise of data-driven methods in the humanities, it becomes necessary to develop reusable and consistent methodological patterns for dealing with the various data manipulation steps; this increases the transparency and replicability of the research. Data scopes present a qualitative framework for such methodological steps. In this work we present a Linked Data model to represent and share data scopes. The model consists of a central Data scope element, with linked elements for data Selection, Linking, Modeling, Normalisation, and Classification. We validate the model by representing the data scopes of 24 articles from two domains: Humanities and Social Science.

Victor de Boer, Ivette Bonestroo, Marijn Koolen, Rik Hoekstra

Track on Metadata and Semantics for Cultural Collections and Applications

Frontmatter
Representing Archeological Excavations Using the CIDOC CRM Based Conceptual Models

This paper uses CIDOC CRM and CRM-based models (CRMarchaeo, CRMsci) to represent archaeological excavation activities and the observations made by archaeologists during their work in the excavation field. These observations are usually recorded in documents such as context sheets. As an application of our approach (case study), we used the records of recent archaeological excavations at Fuwairit in Qatar, part of the Origins of Doha and Qatar Project. We explore issues related to the application of classes and properties as they appear in the latest versions of the aforementioned models, i.e. CIDOC CRM, CRMarchaeo, and CRMsci. The proposed data model could serve as the basis for an automated system for archaeological documentation and archaeological data integration.

Manolis Gergatsoulis, Georgios Papaioannou, Eleftherios Kalogeros, Robert Carter
Generating and Exploiting Semantically Enriched, Integrated, Linked and Open Museum Data

The work presented in this paper engages with and contributes to the implementation and evaluation of Semantic Web applications in the cultural Linked Open Data (LOD) domain. The main goal is the semantic integration, enrichment, and interlinking of data generated through the documentation process of artworks and cultural heritage objects. This is accomplished by using state-of-the-art technologies and current standards of the Semantic Web (RDF, OWL, SPARQL), as well as widely accepted models and vocabularies relevant to the cultural domain (Dublin Core, SKOS, Europeana Data Model). A set of specialized tools such as KARMA and OpenRefine/RDF-extension is used and evaluated to achieve the semantic integration of museum data from heterogeneous sources. Interlinking is achieved using tools such as Silk and OpenRefine/RDF-extension, discovering links (at the back end) between disparate datasets and external data sources such as DBpedia and Wikidata that enrich the source data. Finally, a front-end Web application is developed to exploit the semantically integrated and enriched museum data and to further interlink (and enrich) them, at application run time, with the data sources of DBpedia and Europeana. The paper discusses the engineering choices made in the evaluation of the proposed framework/pipeline.

Sotirios Angelis, Konstantinos Kotis

Track on European and National Projects

Frontmatter
Metadata Aggregation via Linked Data: Results of the Europeana Common Culture Project

Digital cultural heritage resources are widely available on the web through the digital libraries of heritage institutions. To address the difficulties of discoverability in cultural heritage, the common practice is metadata aggregation, where centralized efforts like Europeana facilitate discoverability by collecting the resources’ metadata. We present the results of the linked data aggregation task conducted within the Europeana Common Culture project, which attempted an innovative approach to aggregation based on linked data made available by cultural heritage institutions. This task ran for one year with the participation of twelve organizations, involving the three member roles of the Europeana network: data providers, intermediary aggregators, and the central aggregation hub, Europeana. We report on the challenges faced by data providers, the standards and specifications applied, and the resulting aggregated metadata.

Nuno Freire, Enno Meijers, Sjors de Valk, Julien A. Raemy, Antoine Isaac

Track on Knowledge IT Artifacts (KITA) in Professional Communities and Aggregations (KITA 2020)

Frontmatter
The Role of Data Storage in the Design of Wearable Expert Systems

Wearable technologies are transforming the software and knowledge engineering research fields. In particular, expert systems have the opportunity to manage knowledge bases that vary according to real-time data collected by position sensors, movement sensors, and so on. This opportunity raises a series of challenges, from the role of network technologies in allowing reliable connections between applications and sensors to the definition of functions and methods for assessing the quality and reliability of gathered data. In this paper, we focus on the last point, presenting recent reflections on the notion of a wearable environment. An architecture for the reliable acquisition of data in the IoT context is proposed, together with first experiments conducted to evaluate its effectiveness in improving the quality of data processed by applications.

Fabio Sartori, Marco Savi, Riccardo Melen
Towards an Innovative Model in Wearable Expert System for Skiing

Mobile applications and portable devices are being used extensively in the healthcare sector due to their rapid development. Wearable devices with sensors can collect, analyze, and transmit the wearer’s vital signs. In this paper, we propose a wearable expert system that supports and monitors skiers during their activity. This research is motivated by the need to provide rapid assistance to skiers, especially during off-piste skiing, which is more dangerous and where seeking help after a mishap is difficult. Our approach focuses on an expert system that integrates wearable devices (helmet, goggles, digital watch) with the skier’s smartphone. We present an architecture model and knowledge artifacts for designing a wearable expert system for skiing.

Elson Kurian, Sherwin Varghese, Stefano Fiorini
Backmatter
Metadata

Title: Metadata and Semantic Research
Editors: Dr. Emmanouel Garoufallou, María-Antonia Ovalle-Perandones
Copyright Year: 2021
Electronic ISBN: 978-3-030-71903-6
Print ISBN: 978-3-030-71902-9
DOI: https://doi.org/10.1007/978-3-030-71903-6