
About this Book

This book constitutes the refereed proceedings of the 9th Metadata and Semantics Research Conference, MTSR 2015, held in Manchester, UK, in September 2015. The 35 full papers and 3 short papers presented together with 2 poster papers were carefully reviewed and selected from 76 submissions. The papers are organized in several sessions and tracks: general track on ontology evolution, engineering, and frameworks, Semantic Web and metadata extraction, modelling, interoperability and exploratory search, data analysis, reuse and visualization; track on digital libraries, information retrieval, linked and social data; track on metadata and semantics for open repositories, research information systems and data infrastructure; track on metadata and semantics for agriculture, food and environment; track on metadata and semantics for cultural collections and applications; track on European and national projects.

Table of Contents

Frontmatter

Ontology Evolution, Engineering, and Frameworks

Frontmatter

An Orchestration Framework for Linguistic Task Ontologies

Ontologies provide a knowledge representation formalism for expressing linguistic knowledge for computational tasks. However, natural language is complex and flexible, demanding fine-grained ontologies tailored to facilitate solving specific problems. Moreover, extant linguistic ontological resources lack mechanisms for systematic modularisation to ensure semantic interoperability with task ontologies. This paper presents an orchestration framework to organise and control the inheritance of ontological elements in the development of linguistic task ontologies. The framework is illustrated in the design of new task ontologies for the Bantu noun classification system. Specific use is demonstrated with the annotation of lexical items connected to ontology element terms and with the classification of nouns in the ABox into noun classes.

Catherine Chavula, C. Maria Keet

On the Preservation of Evolving Digital Content - The Continuum Approach and Relevant Metadata Models

We consider the preservation of digital objects in continually evolving ecosystems, for which traditional lifecycle approaches are less appropriate. Motivated by the Records Continuum theory, we define an approach that combines active life with preservation and is non-custodial, which we refer to as the continuum approach. Preserving objects together with their associated environment introduces a high level of complexity. We therefore describe a model-driven approach in which models, rather than the digital objects themselves, can be analysed. In such a setting, the use of appropriate metadata is very important; we therefore outline the PERICLES Linked Resource Model, an upper ontology for modelling digital ecosystems, and compare and contrast it with the Australian Government Recordkeeping Metadata Standard, developed within the record-keeping community.

Nikolaos Lagos, Simon Waddington, Jean-Yves Vion-Dury

Challenges for Ontological Engineering in the Humanities - A Case Study of Philosophy

This paper develops the idea of an engineering ontology whose purpose is to represent philosophy as a research discipline in the humanities. I discuss three recent attempts in this respect with the aim of identifying their modelling potential. The upshot of this analysis is a new conceptual framework for ontological engineering for philosophy. I show how this framework can be implemented in the form of a simple OWL ontology.

Pawel Garbacz

Threshold Determination and Engaging Materials Scientists in Ontology Design

This paper reports on research exploring a threshold for engaging scientists in semantic ontology development. The domain application, nanocrystalline metals, was pursued using a multi-method approach involving algorithm comparison, semantic concept/term evaluation, and term sorting. Algorithms from four open source term extraction applications (RAKE, Tagger, Kea, and Maui) were applied to a test corpus of preprint abstracts from the arXiv repository. Materials scientists identified 92 terms for ontology inclusion from a combined set of 228 unique terms, and the term sorting activity resulted in 9 top nodes. The combined methods were successful in engaging domain scientists in ontology design, and give a threshold capacity measure (threshold acceptability) to aid future work. This paper presents the research background and motivation, reviews the methods and procedures, and summarizes the initial results. A discussion explores term sorting approaches and mechanisms for determining thresholds for engaging scientists in semantically-driven ontology design, and the concept of ontological empowerment.

Jane Greenberg, Yue Zhang, Adrian Ogletree, Garritt J. Tucker, Daniel Foley
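As an illustration of the term-extraction step described in the abstract above, the following is a minimal Python sketch using RAKE (one of the four tools compared in the paper) via the rake-nltk package. The sample abstract text is invented, and the package plus NLTK stopword data are assumed to be installed; this is not the authors' pipeline.

```python
# Minimal sketch: candidate term extraction from abstracts with RAKE.
# Assumes `pip install rake-nltk` and NLTK stopword data are available;
# the sample abstract text below is illustrative only.
from rake_nltk import Rake

abstracts = [
    "Nanocrystalline metals exhibit grain boundary mediated plasticity "
    "and enhanced yield strength at small grain sizes.",
]

rake = Rake()  # uses NLTK English stopwords and punctuation by default
candidate_terms = set()
for text in abstracts:
    rake.extract_keywords_from_text(text)
    # keep the top-ranked phrases as candidate ontology terms
    candidate_terms.update(rake.get_ranked_phrases()[:10])

for term in sorted(candidate_terms):
    print(term)
```

A set like this would then be handed to domain scientists for inclusion decisions and term sorting, as the study describes.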

What are Information Security Ontologies Useful for?

The engineering of ontologies in the information security domain has received some degree of attention in past years. Concretely, the use of ontologies has been proposed as a solution for a diversity of tasks related to that domain, from the modelling of cyber-attacks to easing the work of auditors or analysts. This has resulted in ontology artefacts, degrees of representation and ontological commitments of a diverse nature. In this paper, a selection of recent research in the area is categorized according to its purpose or application, highlighting the main commonalities. Then, an assessment of the current status of development in the area is provided, in an attempt to sketch a roadmap for further research. The literature surveyed shows different levels of analysis, from the more conceptual to the more low-level and protocol-oriented, as well as diverse levels of readiness for practice. Further, several of the works surveyed use existing standardized, community-curated databases as sources for ontology population, which points to a need to use these as a baseline for future research, adding ontology-based functionalities for those capabilities not directly supported by them.

Miguel-Angel Sicilia, Elena García-Barriocanal, Javier Bermejo-Higuera, Salvador Sánchez-Alonso

Semantic Web and Metadata Extraction, Modelling, Interoperability and Exploratory Search

Frontmatter

Interoperable Multimedia Annotation and Retrieval for the Tourism Sector

The Atlas Metadata System (AMS) employs Semantic Web annotation techniques in order to create an interoperable information annotation and retrieval platform for the tourism sector. AMS adopts state-of-the-art metadata vocabularies, annotation techniques and Semantic Web technologies. Interoperability is achieved by reusing several vocabularies and ontologies, including Dublin Core, PROV-O, FOAF, GeoNames, Creative Commons, SKOS, and CiTO, each of which provides orthogonal views for annotating different aspects of digital assets. Our system invests a great deal in managing geospatial and temporal metadata, as they are extremely relevant for tourism-related applications. AMS has been implemented as a graph database using Neo4j, and is demonstrated with a dataset of more than 160,000 images downloaded from Flickr. The system provides online recommendations via queries that exploit social networks, spatiotemporal references, and user rankings. AMS is offered via service-oriented endpoints using public vocabularies to ensure reusability.

Antonios Chatzitoulousis, Pavlos S. Efraimidis, Ioannis N. Athanasiadis
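To make the graph-query idea above concrete, here is a minimal Python sketch of a recommendation-style Cypher query against a Neo4j image graph. The node labels, relationship types and properties (Image, Tag, Place, rating) are hypothetical placeholders, not the actual AMS schema; the Neo4j Python driver and a running database are assumed.

```python
# Sketch of a Neo4j recommendation query over an image graph.
# Labels, relationships and properties below are invented for illustration.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

query = """
MATCH (img:Image)-[:TAGGED_WITH]->(:Tag {name: $tag}),
      (img)-[:LOCATED_NEAR]->(:Place {name: $place})
RETURN img.url AS url, img.rating AS rating
ORDER BY rating DESC
LIMIT 10
"""

with driver.session() as session:
    for record in session.run(query, tag="beach", place="Thessaloniki"):
        print(record["url"], record["rating"])

driver.close()
```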

Species Identification Through Preference-Enriched Faceted Search

There are various ways, and corresponding tools, that the marine biologist community uses for identifying a species. Species identification is essentially a decision-making process comprising steps in which the user makes a selection of characters, figures or photographs, or provides an input that restricts other choices, and so on, until reaching a single species. In many cases such decisions must follow a specific order, as in textual dichotomous identification keys. Consequently, if a wrong decision is made at the beginning of the process, it can exclude a large number of options. To make this process more flexible (i.e. independent of the order of selections) and less vulnerable to wrong decisions, in this paper we investigate how an exploratory search process, specifically a Preference-enriched Faceted Search (PFS) process, can be used to aid the identification of species. We show how the proposed process covers and advances the existing methods. Finally, we report our experience from applying this process over data taken from FishBase, the most popular source for marine resources. The proposed approach can be applied over any kind of objects described by a number of attributes.

Yannis Tzitzikas, Nicolas Bailly, Panagiotis Papadakos, Nikos Minadakis, George Nikitakis
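The core idea of order-independent refinement can be illustrated with a toy Python sketch. The species records and attribute names below are invented, and the full PFS approach additionally supports preference ranking, which this sketch omits.

```python
# Toy sketch of order-independent faceted filtering, the basic idea behind
# the exploratory identification process described above. Records are invented.
species = [
    {"name": "Sparus aurata", "fins": "spiny", "body": "oval", "habitat": "marine"},
    {"name": "Anguilla anguilla", "fins": "soft", "body": "elongated", "habitat": "both"},
    {"name": "Dicentrarchus labrax", "fins": "spiny", "body": "elongated", "habitat": "marine"},
]

def refine(candidates, **selected_facets):
    """Keep species matching every selected facet; selection order is irrelevant."""
    return [s["name"] for s in candidates
            if all(s.get(attr) == value for attr, value in selected_facets.items())]

# the same selections in any order yield the same candidate set
print(refine(species, body="elongated", fins="spiny"))
print(refine(species, fins="spiny", body="elongated"))
```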

An Expert System for Water Quality Monitoring Based on Ontology

Semantic technologies have proved to be a suitable foundation for integrating Big Data applications. Wireless Sensor Networks (WSNs) represent a common domain whose knowledge bases are naturally modeled through ontologies. In our previous work we built a domain ontology of WSNs for water quality monitoring. The SSN ontology was extended to meet the requirements for classifying water bodies into appropriate statuses based on different regulation authorities. In this paper we extend this ontology with a module for identifying possible sources of pollution. To infer new implicit knowledge from the knowledge bases, state-of-the-art WSN systems have layered different rule systems over ontologies. A production rule system was developed to demonstrate how our ontology can be used to enable water quality monitoring. The paper presents an example of system validation with simulated data, but the system is developed for use within the InWaterSense project with real data. It demonstrates how Biochemical Oxygen Demand observations are classified based on the Water Framework Directive regulation standard and how their possible sources of pollution are provided. The system's features and challenges are discussed, also suggesting potential directions for Semantic Web rule layer developments for reasoning with stream data.

Edmond Jajaga, Lule Ahmedi, Figene Ahmedi
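The classification-plus-source-identification behaviour described above can be sketched with plain Python rules. The thresholds and source rules below are placeholders for illustration only, not the actual Water Framework Directive values or the InWaterSense ontology rules.

```python
# Illustrative rule layer: classify a Biochemical Oxygen Demand (BOD)
# observation into a water-body status and suggest a possible pollution
# source. Thresholds and source rules are invented placeholders.
def classify_bod(bod_mg_per_l: float) -> str:
    if bod_mg_per_l <= 3:
        return "high"
    if bod_mg_per_l <= 5:
        return "good"
    if bod_mg_per_l <= 7:
        return "moderate"
    return "poor"

def possible_source(status: str, upstream_land_use: str) -> str:
    if status in ("moderate", "poor") and upstream_land_use == "agriculture":
        return "organic runoff from agricultural activity"
    if status in ("moderate", "poor") and upstream_land_use == "urban":
        return "untreated wastewater discharge"
    return "no pollution source inferred"

obs = {"sensor": "station-12", "bod": 6.4, "upstream_land_use": "agriculture"}
status = classify_bod(obs["bod"])
print(status, "-", possible_source(status, obs["upstream_land_use"]))
```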

Discovering the Topical Evolution of the Digital Library Evaluation Community

The successful management of textual information is a rising challenge for all research communities, firstly in order to assess its current and previous statuses and secondly to enrich the level of its metadata description. The huge amount of unstructured data being produced has consequently popularised text mining techniques for the interpretation, selection and metadata enrichment opportunities they provide. Scientific production regarding Digital Libraries (DLs) evaluation has grown in size and has broadened its scope of coverage, as it constitutes a complex and multidimensional field. The current study applies probabilistic topic modeling to a domain corpus drawn from the JCDL, ECDL/TPDL and ICADL conference proceedings in the period 2001-2013, aiming to unveil its topics and their temporal evolution, and to exploit and extract semantic metadata from large corpora in an automatic way.

Leonidas Papachristopoulos, Nikos Kleidis, Michalis Sfakakis, Giannis Tsakonas, Christos Papatheodorou
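As a pointer to the kind of probabilistic topic modelling used above, here is a minimal LDA sketch with gensim. The toy documents and parameters are illustrative only and do not reproduce the study's corpus or settings; gensim is assumed to be installed.

```python
# Minimal LDA sketch over a few toy "abstracts"; gensim assumed installed.
from gensim import corpora, models

docs = [
    "usability evaluation of digital library interfaces",
    "log analysis of user interaction with digital collections",
    "metadata quality assessment in institutional repositories",
]
texts = [d.lower().split() for d in docs]

dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(t) for t in texts]

lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary,
                      passes=10, random_state=1)
for topic_id, words in lda.print_topics(num_words=5):
    print(topic_id, words)
```

Running the same kind of model per publication year is one simple way to observe how topic prevalence shifts over time.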

Application of Metadata Standards for Interoperability Between Species Distribution Models

This paper presents a study on the use of metadata standards in the area of Biodiversity Informatics. Species Distribution Modeling tools generate models that offer information about species distribution and allow scientists, researchers, environmentalists, companies and governments to make decisions to protect and preserve biodiversity. Studies reveal that this area requires new technologies, including interoperability between the models generated by Species Distribution Modeling tools. To ensure interoperability, we present a schema that uses metadata standards to generate XML files containing all the information necessary to reuse species distribution models. This paper is part of a larger study that argues for the use of metadata standards as a fundamental way to provide structured biodiversity information.

Cleverton Borba, Pedro Luiz Pizzigatti Correa
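A minimal Python sketch of writing species-distribution-model metadata to XML, along the lines of the interoperability schema proposed above. The element names are invented for illustration and are not the authors' schema.

```python
# Sketch: serialise model metadata to XML with the standard library.
# Element names below are hypothetical, not the paper's schema.
import xml.etree.ElementTree as ET

model = ET.Element("SpeciesDistributionModel")
ET.SubElement(model, "species").text = "Panthera onca"
ET.SubElement(model, "algorithm").text = "MaxEnt"
layers = ET.SubElement(model, "environmentalLayers")
for name in ("annual_precipitation", "mean_temperature"):
    ET.SubElement(layers, "layer").text = name
ET.SubElement(model, "creator").text = "Example Lab"

print(ET.tostring(model, encoding="unicode"))
```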

Data Analysis, Reuse and Visualization

Frontmatter

Majority Voting Re-ranking Algorithm for Content-Based Image Retrieval

We propose a new algorithm, known as the Majority Voting Re-ranking Algorithm (MVRA), which re-ranks the first images returned by an image retrieval system. Since this algorithm re-rates the images before they are presented to the user, it does not require any user assistance. The algorithm has been evaluated using the Wang database and the Google image engine, and has been compared with other methods based on two clustering algorithms, namely HACM and K-means. The obtained results indicate the clear superiority of the proposed algorithm.

Mawloud Mosbah, Bachir Boucheham

Narrative Analysis for HyperGraph Ontology of Movies Using Plot Units

We give an initial report on the potential of narrative-theory-driven, graph-based representations of movies for narrative analysis and summarization. This is done using the automatic generation of Plot Units and the determination of affect states for characters. Given the power of theory-driven graph representations of movies shown in these initial experiments, we present a graph ontology for movies based on hypergraphs. An example of how graph representations for narrative analysis fit into this ontology is also presented. We thus argue that a graph data model for the content of movies could better capture its underlying semantics.

Sandeep Reddy Biddala, Navjyoti Singh

Track on Digital Libraries, Information Retrieval, Linked and Social Data Models, Frameworks and Applications

Frontmatter

RDA Element Sets and RDA Value Vocabularies: Vocabularies for Resource Description in the Semantic Web

Considering the need for metadata standards suitable for the Semantic Web, this paper describes the RDA Element Sets and the RDA Value Vocabularies that were created from attributes and relationships defined in Resource Description and Access (RDA). First, we present the vocabularies included in the RDA Element Sets: the vocabularies of classes, of properties, and of properties unconstrained by FRBR entities; then we present the RDA Value Vocabularies, which are under development. In conclusion, we highlight that these vocabularies can be used to meet the needs of different contexts, thanks to the unconstrained properties and to the independence of the vocabularies of properties from the vocabularies of values and vice versa.

Fabrício Silva Assumpção, José Eduardo Santarem Segundo, Plácida Leopoldina Ventura Amorim da Costa Santos

Metadata for Scientific Audiovisual Media: Current Practices and Perspectives of the TIB|AV-Portal

Descriptive metadata play a key role in finding relevant search results in large amounts of unstructured data. However, current scientific audiovisual media are provided with little metadata, which makes them hard to find, let alone individual sequences. In this paper, the TIB|AV-Portal is presented as a use case where methods for the automatic generation of metadata, semantic search and cross-lingual retrieval (German/English) have already been applied. These methods result in better discoverability of the scientific audiovisual media hosted in the portal. Text, speech, and image content of the videos are automatically indexed with specialised GND (Gemeinsame Normdatei) subject headings. A semantic search is established based on properties of the GND ontology. The cross-lingual retrieval uses English ‘translations’ that were derived by an ontology mapping (DBpedia, among others). Further ways of increasing the discoverability and reuse of the metadata are publishing them as Linked Open Data and interlinking them with other data sets.

Sven Strobel, Paloma Marín-Arraiza

Metadata Interoperability and Ingestion of Learning Resources into a Modern LMS

Over time, academic and research institutions worldwide have begun transforming their digital libraries into a more structured concept, the Learning Object Repository (LOR), which enables educators to share, manage and use educational resources much more effectively. The key to LOR interoperability and scalability is without doubt the various standards and protocols such as LOM, SCORM, etc. On the other hand, Learning Management Systems have boosted the expansion of the e-learning notion by providing the chance to remotely follow courses of the most well-known universities. However, there is no uniform way to integrate these two achievements of e-learning and ensure effective collaboration between them. In this paper, we propose a solution for ingesting learning object metadata into the Open eClass platform.

Aikaterini K. Kalou, Dimitrios A. Koutsomitropoulos, Georgia D. Solomou, Sotirios D. Botsios

Clustering Learning Objects for Improving Their Recommendation via Collaborative Filtering Algorithms

Collaborative Filtering can be used in the context of e-learning to recommend learning objects to students and teachers involved in the teaching and learning process. Although this technique presents great potential for e-learning, studies of its application in this field are still limited, mostly because of the lack of available datasets for testing and evaluation. The present work evaluates a pre-processing method based on clustering for future use with collaborative filtering algorithms. For that purpose we use a large dataset collected from the MERLOT repository. The initial results point out that clustering learning objects before applying collaborative filtering techniques can improve recommendation performance.

Henrique Lemos dos Santos, Cristian Cechinel, Ricardo Matsumura Araujo, Miguel-Ángel Sicilia
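The clustering-before-filtering idea summarised above can be sketched in a few lines of Python: cluster the learning objects by their features first, then let item-based collaborative filtering look for neighbours only inside the item's cluster. The toy features, ratings and parameters are invented; scikit-learn and numpy are assumed.

```python
# Sketch: cluster items, then do item-based CF within each cluster.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity

# rows = learning objects, columns = simple metadata features (toy values)
item_features = np.array([[1, 0, 3], [1, 0, 4], [0, 1, 1], [0, 1, 2]])
# rows = users, columns = learning objects (0 = not rated)
ratings = np.array([[5, 0, 1, 0],
                    [4, 5, 0, 1],
                    [0, 1, 4, 5]])

clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(item_features)

def predict(user: int, item: int) -> float:
    """Predict a rating from rated items in the same cluster as `item`."""
    neighbours = [j for j in range(ratings.shape[1])
                  if clusters[j] == clusters[item] and ratings[user, j] > 0 and j != item]
    if not neighbours:
        return 0.0
    sims = cosine_similarity(item_features[[item]], item_features[neighbours])[0]
    return float(np.dot(sims, ratings[user, neighbours]) / sims.sum())

print(predict(user=0, item=1))
```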

Conceptualization of Personalized Privacy Preserving Algorithms

In recent years, personal data has increasingly been shared between organizations and researchers. While sharing information, individuals’ sensitive data should be preserved. For this purpose, a number of algorithms for privacy-preserving data publishing have been designed. These algorithms modify or transform data to protect privacy. While anonymization algorithms such as k-anonymity, l-diversity and t-closeness focus on changing data into a protected form, the differential privacy model considers the results of queries posed on the data. Therefore, these algorithms can be compared according to their performance or the utility of the queries applied to anonymized data or to results computed with noise. In this work, we present a domain-independent semantic model of data anonymization techniques which also considers individuals’ different privacy concerns. Thus, the proposed conceptual model integrates the generic view of privacy-preserving data anonymization algorithms with a personalized privacy approach.

Buket Usenmez, Ozgu Can
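Two of the techniques named in the abstract above can be illustrated with a short Python sketch: a k-anonymity check over quasi-identifiers, and a count query answered under differential privacy with Laplace noise. The records, quasi-identifiers and epsilon value are toy examples, not the paper's model.

```python
# Sketch: k-anonymity check and a differentially private count (toy data).
from collections import Counter
import random

records = [
    {"age_range": "20-30", "zip": "441**", "disease": "flu"},
    {"age_range": "20-30", "zip": "441**", "disease": "cold"},
    {"age_range": "30-40", "zip": "442**", "disease": "flu"},
    {"age_range": "30-40", "zip": "442**", "disease": "asthma"},
]
quasi_identifiers = ("age_range", "zip")

def is_k_anonymous(rows, qi, k):
    """Every combination of quasi-identifier values appears at least k times."""
    groups = Counter(tuple(r[a] for a in qi) for r in rows)
    return min(groups.values()) >= k

def dp_count(rows, predicate, epsilon):
    """Count with Laplace(0, 1/epsilon) noise; the sensitivity of a count is 1."""
    true_count = sum(1 for r in rows if predicate(r))
    noise = random.expovariate(epsilon) - random.expovariate(epsilon)
    return true_count + noise

print(is_k_anonymous(records, quasi_identifiers, k=2))
print(dp_count(records, lambda r: r["disease"] == "flu", epsilon=0.5))
```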

Data Quality and Evaluation Studies

Frontmatter

Evaluation of Metadata in Research Data Repositories: The Case of the DC.Subject Element

Research data repositories are growing rapidly and exponentially in volume. Their main goal is to provide scientists with the essential mechanisms to store, share, and re-use datasets generated at various stages of the research process. Despite the fact that metadata play an important role for research data management in the context of these repositories, several factors - such as the big volume of data and its complex lifecycles, as well as operational constraints related to financial resources and human factors - may impede the effectiveness of several metadata elements. The aim of the research reported in this paper was to perform a descriptive analysis of the DC.Subject metadata element and to identify its data quality problems in the context of the Dryad research data repository. To address this aim, a total of 4,557 packages and 13,638 data files were analysed following a data-preprocessing method. The findings showed emerging trends about the subject coverage of the repository (e.g. the most popular subjects and the authors that contributed the most to these subjects). Also, quality problems related to the lack of a controlled vocabulary and standardisation were very common. This study has implications for the evaluation of metadata and the improvement of the quality of the research data annotation process.

Dimitris Rousidis, Emmanouel Garoufallou, Panos Balatsoukas, Miguel-Angel Sicilia
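The kind of descriptive analysis described above can be sketched very simply: count DC.Subject values and spot near-duplicate spellings caused by the lack of a controlled vocabulary. The subject strings below are invented examples, not Dryad data.

```python
# Sketch: frequency analysis of subject terms and detection of
# case/whitespace variants (toy subject strings).
from collections import Counter

subjects = ["Population Genetics", "population genetics", "SNPs",
            "phylogenetics", "Phylogenetics ", "snp"]

raw_counts = Counter(subjects)
normalised_counts = Counter(s.strip().lower() for s in subjects)

print("raw distinct terms:", len(raw_counts))
print("after normalisation:", len(normalised_counts))
print(normalised_counts.most_common(3))
```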

Software Applications Ecosystem for Authority Control

Authority control is recognized as an expensive task in the cataloging process. It remains an active research field in libraries and related research institutions, even though several approaches have already been proposed in this area. In this paper, we propose AUCTORITAS, a tool for exposing high-value authority control services on the web for a generic institutional environment. This paper describes the application ecosystem behind AUCTORITAS and how Semantic Web languages make the semantic integration of heterogeneous applications possible. We also evaluate the applicability of the proposal for academic libraries.

Leandro Tabares Martín, Félix Oscar Fernández Peña, Amed Abel Leiva Mederos, Marc Goovaerts, Dailién Calzadilla Reyes, Wilbert Alberto Ruano Álvarez

Digital Libraries Evaluation: Measuring Europeana’s Usability

Europeana is an international trusted digital initiative providing access, from a single entry point, to prized collections from a number of European cultural institutions. Advanced Internet and digital technologies present new ways to connect with users, and there is a need for continued evaluation of digital libraries. This paper reports on a task-oriented usability study exploring a number of aspects, including user satisfaction, specific to the Europeana Digital Library. Participants were students from a Library Science and Information Systems department who had some basic experience searching digital collections for information. Participants performed 13 tasks focused on the Hellenistic collection. Methodologically, the test consisted of a list of tasks that, among other things, aimed to assess user satisfaction and interest while performing them. The method applied measured Effectiveness, Efficiency, Learnability and Satisfaction. Despite the fact that it was not the first time that they had come into contact with a digital library, several participants had difficulties while performing selected tasks, especially when these involved a variety of search types. In general, all of the participants seemed to comprehend how Europeana is organized, although the results also indicate that participants felt their expectations were not met when performing more complex tasks.

Anxhela Dani, Chrysanthi Chatzopoulou, Rania Siatri, Fotis Mystakopoulos, Stavroula Antonopoulou, Evangelia Katrinaki, Emmanouel Garoufallou
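As a small worked example of the first two measures named above, the following Python sketch computes effectiveness (task completion rate) and time-based efficiency from per-task logs. The formulas follow common usability-testing practice; the exact metrics used in the Europeana study may differ, and the log values are invented.

```python
# Worked example: completion rate and time-based efficiency from toy task logs.
tasks = [
    {"task": 1, "completed": True,  "seconds": 45},
    {"task": 2, "completed": True,  "seconds": 80},
    {"task": 3, "completed": False, "seconds": 120},
]

effectiveness = sum(t["completed"] for t in tasks) / len(tasks)
# time-based efficiency: successfully completed goals per second, averaged over tasks
efficiency = sum(t["completed"] / t["seconds"] for t in tasks) / len(tasks)

print(f"effectiveness = {effectiveness:.0%}")
print(f"efficiency   = {efficiency:.4f} goals/second")
```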

Usability Testing of an Annotation Tool in a Cultural Heritage Context

This paper presents the results of a usability test of an annotation tool. The annotation tool is implemented, used and tested in a cultural heritage (CH) context, the TORCH project at the Oslo and Akershus University College of Applied Sciences. The experiments employed non-experts, with the intention of facilitating crowd-sourcing of annotations. Interesting problems and usability patterns from the literature manifested themselves in our experiments. Despite some weaknesses in the interface of the tool version used for the experiments, the annotators show a reasonable rate of success.

Karoline Hoff, Michael Preminger

An Exploration of Users’ Needs for Multilingual Information Retrieval and Access

The need to promote Multilingual Information Retrieval (MLIR) and Access (MLIA) has become evident, now more than ever, given the increase in online information produced daily in languages other than English. This study aims to explore users’ information needs when searching for information across languages. Specifically, a questionnaire was employed to shed light on Library and Information Science (LIS) undergraduate students’ use of search engines, databases and digital libraries when searching, as well as their needs for multilingual access. This study contributes to informing the design of MLIR systems by focusing on the reasons and situations under which users would search for and use information in multiple languages.

Evgenia Vassilakaki, Emmanouel Garoufallou, Frances Johnson, R. J. Hartley

Track on Metadata and Semantics for Open Repositories, Research Information Systems and Data Infrastructure

Frontmatter

Mapping Large Scale Research Metadata to Linked Data: A Performance Comparison of HBase, CSV and XML

OpenAIRE, the Open Access Infrastructure for Research in Europe, comprises a database of all EC FP7 and H2020 funded research projects, including metadata of their results (publications and datasets). These data are stored in an HBase NoSQL database, post-processed, and exposed as HTML for human consumption, and as XML through a web service interface. As an intermediate format to facilitate statistical computations, CSV is generated internally. To interlink the OpenAIRE data with related data on the Web, we aim at exporting them as Linked Open Data (LOD). The LOD export is required to integrate into the overall data processing workflow, where derived data are regenerated from the base data every day. We thus faced the challenge of identifying the best-performing conversion approach. We evaluated the performance of creating LOD by a MapReduce job on top of HBase, by mapping the intermediate CSV files, and by mapping the XML output.

Sahar Vahdati, Farah Karim, Jyun-Yao Huang, Christoph Lange
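To give a flavour of the CSV-to-Linked-Data route compared in the paper, here is a minimal Python sketch with rdflib. The column names, base URI and property choices are assumptions for illustration, not the OpenAIRE data model; rdflib (version 6 or later) is assumed.

```python
# Sketch: map CSV rows of project metadata to RDF triples with rdflib.
# The vocabulary and URIs below are hypothetical examples.
import csv, io
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, DCTERMS

csv_data = io.StringIO(
    "project_id,acronym,title\n"
    "123456,EXAMPLE,An example H2020 project\n"
)

EX = Namespace("http://example.org/openaire/")
g = Graph()
g.bind("dcterms", DCTERMS)

for row in csv.DictReader(csv_data):
    project = URIRef(EX["project/" + row["project_id"]])
    g.add((project, RDF.type, EX.Project))
    g.add((project, DCTERMS.identifier, Literal(row["project_id"])))
    g.add((project, DCTERMS.title, Literal(row["title"])))
    g.add((project, EX.acronym, Literal(row["acronym"])))

print(g.serialize(format="turtle"))
```

In the HBase route evaluated by the authors, the same per-record mapping logic would run inside a MapReduce job rather than over an intermediate CSV file.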

Contextual and Provenance Metadata in the Oxford University Research Archive (ORA)

Context and provenance are essential for understanding the meaning and significance of an artefact. In this paper we describe how scholarly outputs deposited in a long-term data repository, the Oxford University Research Archive (ORA), are described with contextual information and provenance. In addition, the digital objects in ORA that act as proxies to the scholarly outputs are also described with contextual information and provenance. The ORA data model is presented together with a description of the relationships in context.

Tanya Gray Jones, Lucie Burgess, Neil Jefferies, Anusha Ranganathan, Sally Rumsey

Labman: A Research Information System to Foster Insight Discovery Through Visualizations

Effective handling of research-related data is an ambitious goal, as many data entities need to be suitably designed in order to model the distinctive features of different knowledge areas: publications, projects, people, events and so on. A well-designed information architecture prevents errors due to data redundancy, outdated records or poor provenance, allowing both internal staff and third parties to reuse the information produced by the research centre. Moreover, making the data available through a public, Internet-accessible portal increases the visibility of the institution, fostering new collaborations with external centres. However, the lack of a common structure when describing research data might prevent non-expert users from using these data. Thus we present labman, a web-based research information system that connects all the actors in the research landscape in an interoperable manner, using metadata and semantic descriptions to enrich the stored data.

Labman presents different visualizations to allow data exploration and discovery in an interactive fashion, relying on humans’ visual capacity rather than extensive knowledge of the research field itself. Thanks to the visual representations, visitors can quickly understand the performance of experts, project outcomes, publication trajectories and so forth.

Oscar Peña, Unai Aguilera, Aitor Almeida, Diego López-de-Ipiña

Repositories for Open Science: The SciRepo Reference Model

Open Science calls for innovative approaches and solutions embracing the entire research lifecycle. From the research publishing perspective, the aim is to pursue a holistic approach where publishing includes any product (e.g. articles, datasets, experiments, notebooks, websites) resulting from a research activity and relevant to the interpretation, evaluation, and reuse of the activity or part of it. In this paper, we present the foundational concepts and relationships characterising SciRepo, an innovative class of scientific repositories that (a) promotes a publishing mechanism blurring the distinction between the research lifecycle and its scholarly communication; (b) simplifies the “publishing” of an entire research activity, allowing every research product to be seamlessly exploited and reused; and (c) is conceived to be integrated on top of existing research infrastructures.

Massimiliano Assante, Leonardo Candela, Donatella Castelli, Paolo Manghi, Pasquale Pagano

On-Demand Integration and Linking of Open Data Information

This paper introduces an extension of DALI, a framework for data integration and visualisation. When integrating new data, DALI automatically tries to recognise the schema and contents of the file, semantically lift them, and annotate them with existing ontologies. The extension presented in this paper allows users to import data from external data portals, namely portals using CKAN or Socrata, based on the results of a search query or by selecting individual datasets. Furthermore, we perform a semantic expansion of the search terms provided by the user in order to identify datasets that might still be relevant while not containing the exact search terms.

Nuno Lopes, Martin Stephenson, Vanessa Lopez, Pierpaolo Tommasi, Pól Mac Aonghusa
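The external-portal import described above relies on portal APIs such as CKAN's. As a hedged illustration of that step only, here is a Python sketch that searches a CKAN portal and lists the resources of the matching datasets; the portal URL and query are examples, and error handling is minimal. The semantic lifting, annotation and query expansion performed by DALI are not shown.

```python
# Sketch: query a CKAN portal's standard Action API (package_search)
# and list dataset resources. Portal URL and query are examples.
import requests

portal = "https://demo.ckan.org"
resp = requests.get(f"{portal}/api/3/action/package_search",
                    params={"q": "air quality", "rows": 5}, timeout=30)
resp.raise_for_status()

for dataset in resp.json()["result"]["results"]:
    print(dataset["title"])
    for resource in dataset.get("resources", []):
        print("  ", resource.get("format"), resource.get("url"))
```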

On Bridging Data Centers and Publishers: The Data-Literature Interlinking Service

Although research data publishing is today widely regarded as crucial for reproducibility and proper assessment of scientific results, several challenges still need to be solved to fully realize its potential. Developing links between the published literature and datasets is one of them. Current solutions are mostly based on bilateral, ad-hoc agreements between publishers and data centers, operating in silos whose content cannot be readily combined to deliver a network connecting research data and literature. The RDA Publishing Data Services Working Group (PDS-WG) aims to address this issue by bringing together different stakeholders to agree on common standards, combine links from disparate sources, and create a universal, open service for collecting and sharing such links: the Data-Literature Interlinking Service. This paper presents the synergic effort of the PDS-WG and the OpenAIRE infrastructure to realize and operate such a service. The Service populates and provides access to a graph of dataset-literature links collected from a variety of major data centers, publishers, and research organizations. At the time of writing, the Service has close to one million links, with further contributions expected. Based on feedback from content providers and consumers, the PDS-WG will continue to refine the Service data model and exchange format to make it a universal, cross-platform, cross-discipline solution for collecting and sharing dataset-literature links.

Adrian Burton, Hylke Koers, Paolo Manghi, Sandro La Bruzzo, Amir Aryani, Michael Diepenbroek, Uwe Schindler

Historical Records and Digitization Factors in Biodiversity Communities of Practice

A central aim of biodiversity informatics initiatives is the global aggregation of biodiversity data. This work depends significantly on the translation of local data and metadata into wider global standards. While this is often considered to be primarily a technical task, there are also organizational factors to consider. In this paper, we use a Communities of Practice approach to argue that data and metadata in individual departments and institutions have often adapted over time to meet local organizational contexts, and that digitization workflows need to account for and capture the historical dimensions of collections to support productive data migration. As part of this work, the central role of curators’ and managers’ practical and everyday knowledge of their collections is emphasized.

Michael Khoo, Gary Rosenberg

Ontologies for Research Data Description: A Design Process Applied to Vehicle Simulation

Data description is an essential part of research data management, and it is easy to argue for the importance of describing data early in the research workflow. Specific metadata schemas are often proposed to support description. Given the diversity of research domains, such schemas are often missing, and when available they may be too generic, too complex, or hard to incorporate into a description platform. In this paper we present a method used to design metadata models for research data description as ontologies. Ontologies are gaining acceptance as knowledge representation structures, and we use them here in the scope of the Dendro platform. The ontology design process is illustrated with a case study from Vehicle Simulation. Following the design process, the resulting model was validated by a domain specialist.

João Aguiar Castro, Deborah Perrotta, Ricardo Carvalho Amorim, João Rocha da Silva, Cristina Ribeiro

Track on Metadata and Semantics for Agriculture, Food and Environment

Frontmatter

Setting up a Global Linked Data Catalog of Datasets for Agriculture

The movement to share data has been on the rise in the last decade, and lately in the agricultural domain. Similarly, platforms for publishing scientific and statistical datasets have sprouted and have improved the visibility and availability of datasets. Yet there are still constraints on making datasets discoverable and re-usable. Commonly agreed semantics, authority lists to index datasets, and standard formats and protocols to expose data are now essential. This paper explains how the CIARD RING provides a global linked data catalog of datasets for agriculture. The first part of this paper describes the Linked Data layer of the CIARD RING, focusing on the data model, the semantics used and the CIARD RING LOD publication. The second part provides examples of re-use of data from the RING. The paper concludes by describing the future steps in the development of the CIARD RING.

Valeria Pesce, Ajit Maru, Phil Archer, Thembani Malapela, Johannes Keizer

Improving Access to Big Data in Agriculture and Forestry Using Semantic Technologies

To better understand and manage the interactions of agriculture and natural resources, for example under current increasing societal demands and climate change, agro-environmental research must bring together an ever growing amount of data and information from multiple science domains. These data are inherently large, multi-dimensional and heterogeneous, and require computationally intensive processing. Thus, agro-environmental researchers must deal with specific Big Data challenges in efficiently acquiring the data fit for their tasks while limiting the amount of computational, network and storage resources needed to practical levels. Automated procedures for the collection, selection, annotation and indexing of data and metadata are indispensable in order to be able to effectively exploit the global network of available scientific information. This paper describes work performed in the EU FP7 Trees4Future and SemaGrow projects that contributes to the development and evaluation of an infrastructure allowing efficient discovery and unified querying of agricultural and forestry resources using Linked Data and semantic technologies.

Rob Lokers, Yke van Randen, Rob Knapen, Stephan Gaubitzer, Sergey Zudin, Sander Janssen

A Semantic Mediator for Handling Heterogeneity of Spatio-Temporal Environment Data

This paper presents the “Environment and landscape geoknowledge” project, which aims to exploit heterogeneous data sources recorded at the Chizé environmental observatory since 1994. Based on a case study, we summarize the difficulties encountered by biology and ecology experts when maintaining and analyzing collected environmental data, essentially the spatial organization of the landscape, crop rotation and wildlife data. We show how a framework which uses a spatio-temporal ontology as a semantic mediator can solve challenges related to the analysis and maintenance of these heterogeneous data.

Ba-Huy Tran, Christine Plumejeaud-Perreau, Alain Bouju, Vincent Bretagnolle

Ontology Evolution for Experimental Data in Food

Throughout its life cycle, an ontology may change in order to reflect domain changes or new usages. This paper presents an ontology evolution activity applied to an ontology dedicated to the annotation of experimental data in food, and a plug-in, DynarOnto, which assists ontology engineers for carrying out the ontology changes. Our evolution method is an a priori method which takes as input an ontology in a consistent state, implements the changes selected to be applied and manages all the consequences of those changes by producing an ontology in a consistent state.

Rim Touhami, Patrice Buche, Juliette Dibie, Liliana Ibanescu

Graph Patterns as Representation of Rules Extracted from Decision Trees for Coffee Rust Detection

Diseases in Agricultural Production Systems represent one of the biggest drivers of losses and poor-quality products. In the case of coffee production, experts in this area believe that weather conditions, along with physical properties of the crop, are the main variables that determine the development of a disease known as Coffee Rust. On the other hand, several Artificial Intelligence techniques allow the analysis of agricultural environment variables in order to obtain their relationship with specific problems, such as diseases in crops. In this paper, the extraction of rules to detect coffee rust from decision tree induction and expert knowledge is addressed. Finally, a graph-based representation of these rules is presented, in order to obtain a model with greater expressiveness and interpretability.

Emmanuel Lasso, Thiago Toshiyuki Thamada, Carlos Alberto Alves Meira, Juan Carlos Corrales
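The decision-tree-induction step mentioned above can be illustrated with a short scikit-learn sketch: fit a tree on weather and crop variables and read off its rules, which could then be turned into graph patterns. The training data are synthetic and the variables are a simplified stand-in for the ones used in the paper.

```python
# Sketch: induce a decision tree on toy weather/crop data and print its rules.
from sklearn.tree import DecisionTreeClassifier, export_text

# features: [mean_temperature_C, relative_humidity_pct, leaf_wetness_hours]
X = [[21, 85, 10], [22, 90, 12], [25, 60, 2],
     [26, 55, 1], [20, 88, 11], [27, 50, 3]]
y = ["rust", "rust", "no_rust", "no_rust", "rust", "no_rust"]

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
rules = export_text(tree, feature_names=["temperature", "humidity", "leaf_wetness"])
print(rules)
```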

Track on Metadata and Semantics for Cultural Collections and Applications

Frontmatter

Aggregating Cultural Heritage Data for Research Use: The Humanities Networked Infrastructure (HuNI)

This paper looks at the Humanities Networked Infrastructure (HuNI), a service which aggregates data from thirty Australian data sources and makes them available for use by researchers across the humanities and creative arts. We discuss the methods used by HuNI to aggregate data, as well as the conceptual framework which has shaped the design of HuNI’s Data Model around six core entity types. Two of the key functions available to users of HuNI - building collections and creating links - are discussed, together with their design rationale.

Deb Verhoeven, Toby Burrows

Historical Context Ontology (HiCO): A Conceptual Model for Describing Context Information of Cultural Heritage Objects

Communities addressing the problem of a shareable description of cultural heritage objects agree that a data-centric and context-oriented approach should be adopted in order to exchange and reuse heterogeneous information. Here we present HiCO, an OWL 2 DL ontology aiming to outline relevant issues related to the workflow for stating, and formalizing, authoritative assertions about context information. The conceptual model outlines requirements for defining an authoritative statement and focuses on how a description of context information can be carried out when data are extracted from the full text of documents.

Marilena Daquino, Francesca Tomasi

Track on European and National Projects

Frontmatter

Standardizing NDT&E Techniques and Conservation Metadata for Cultural Artifacts

Conservation activities, before and after decay detection, are considered a prerequisite for maintaining cultural artifacts in their initial/original form. Taking into account the strict regulations whereby sampling from artworks of great historical value is restricted or in many cases prohibited, the application of Non-Destructive Testing techniques (NDTs) during conservation, or even decay detection, is highly appreciated by conservators. Non-destructive examination includes the employment of multiple analysis approaches and techniques, namely Infrared Thermography (IRT), Ultrasonics (US), Ground Penetrating Radar (GPR), VIS-NIR Fiber Optics Diffuse Reflectance Spectroscopy (FORS), portable X-Ray Fluorescence (XRF), Environmental Scanning Electron Microscopy with Energy Dispersive X-Ray Analysis (ESEM-EDX), Attenuated Total Reflectance-Fourier Transform Infrared Spectroscopy (ATR-FTIR) and micro-Raman Spectroscopy. These produce a huge amount of data in different formats, such as text, numerical sets and visual objects (e.g. images, thermograms, radargrams, spectral data, graphs, etc.). Moreover, conservation documentation presents major drawbacks, as fragmentation and incomplete description of the related information are usually the case. Assigning conservation data to the objects’ metadata collections is very rare and not yet standardized. The Doc-Culture Project aims to provide solutions for NDT application methodologies, analysis and processing, along with their output data and all related conservation documentation. Preliminary results are discussed in this paper.

Dimitris Kouis, Evgenia Vassilakaki, Eftichia Vraimaki, Eleni Cheilakou, Amani Christiana Saint, Evangelos Sakkopoulos, Emmanouil Viennas, Erion-Vasilis Pikoulis, Nikolaos Nodarakis, Nick Achilleopoulos, Spiros Zervos, Giorgos Giannakopoulos, Daphne Kyriaki-Manessi, Athanasios Tsakalidis, Maria Koui

Poster Papers

Frontmatter

ALIADA, an Open Source Tool for Automatic Publication of Linked Data from Libraries and Museums

ALIADA, the Spanish word for ‘ally’, is intended to be a tool that helps librarians and curators from cultural heritage institutions to automatically publish their high-quality data in the Linked Data Cloud. Traditionally, data from libraries and museums have been stored as ‘silos’ of information, because their metadata are codified using their own schemes and formats, not accessible to machines or to applications aimed at the general public. In addition, these information professionals create rich data from their collections, but they often lack the expertise to face the emerging technologies required to take advantage of the opportunities that the information and open knowledge era provides. To overcome these limitations, the EC-funded ALIADA project has developed an open source tool, compliant with library and museum standards, that automatically converts library and museum metadata into structured data ready to be published in the Linked Data Cloud, according to the Linked Data paradigm. Thus, heritage and cultural data are also open and available to be queried and reused by machines, innovative applications, search engines and other cultural and research institutions to generate more open knowledge.

Cristina Gareta

Resource Classification as the Basis for a Visualization Pipeline in LOD Scenarios

More than a decade after the first steps on the Semantic Web were taken, mass adoption of these technologies is still a utopian goal. Machine-readable data should be leveraged to provide smarter summarisations of any dataset, making it comprehensible for any user without the need for specific knowledge. The automatic generation of coherent visual representations based on Linked Open Data could pave the way for mass adoption of the Semantic Web’s vision.

Our effort towards this goal is to establish a visualization pipeline, from raw semantically annotated data as input to insightful visualizations for data analysts as output. The first steps of this pipeline need to extract the nature of the data itself through generic SPARQL queries, in order to draft the structure of the data for further stages.

Oscar Peña, Unai Aguilera, Diego López-de-Ipiña
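As an example of the kind of generic, dataset-agnostic SPARQL query the first pipeline stage could use to draft the structure of unknown data, here is a Python sketch that counts instances per class. The endpoint URL (DBpedia) is only an example, and the SPARQLWrapper package is assumed; the actual queries used in the pipeline are not specified in the abstract.

```python
# Sketch: profile an unknown RDF dataset by counting instances per class.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
SELECT ?class (COUNT(?s) AS ?instances)
WHERE { ?s a ?class }
GROUP BY ?class
ORDER BY DESC(?instances)
LIMIT 10
""")
sparql.setReturnFormat(JSON)

for row in sparql.query().convert()["results"]["bindings"]:
    print(row["class"]["value"], row["instances"]["value"])
```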

Backmatter
