
2019 | Book

The Semantic Web

16th International Conference, ESWC 2019, Portorož, Slovenia, June 2–6, 2019, Proceedings

Edited by: Pascal Hitzler, Miriam Fernández, Krzysztof Janowicz, Amrapali Zaveri, Alasdair J.G. Gray, Vanessa Lopez, Armin Haller, Karl Hammar

Publisher: Springer International Publishing

Book series: Lecture Notes in Computer Science


About this book

This book constitutes the refereed proceedings of the 16th International Semantic Web Conference, ESWC 2019, held in Portorož, Slovenia, in June 2019.

The 39 revised full papers presented were carefully reviewed and selected from 134 submissions. The papers are organized in three tracks (research, resources, and in-use) and deal with the following topical areas: distribution and decentralisation, velocity on the Web, research of research, ontologies and reasoning, linked data, natural language processing and information retrieval, semantic data management and data infrastructures, social and human aspects of the Semantic Web, and machine learning.

Table of Contents

Frontmatter
Correction to: Using Shape Expressions (ShEx) to Share RDF Data Models and to Guide Curation with Rigorous Validation

By mistake, this chapter was originally published as non-open access. This has been corrected.

Katherine Thornton, Harold Solbrig, Gregory S. Stupp, Jose Emilio Labra Gayo, Daniel Mietchen, Eric Prud’hommeaux, Andra Waagmeester

Research Track

Frontmatter
A Decentralized Architecture for Sharing and Querying Semantic Data

Although the Semantic Web in principle provides access to a vast Web of interlinked data, its full potential remains mostly unexploited. One of the main reasons for this is the fact that the architecture of the current Web of Data relies on a set of servers providing access to the data. These servers represent bottlenecks and single points of failure that result in instability and unavailability of data at certain points in time. In this paper, we therefore propose a decentralized architecture (Piqnic) for sharing and querying semantic data. By combining both client and server functionality at each participating node, and by introducing replication, Piqnic avoids bottlenecks and keeps datasets available and queryable even when the original source is not available. Our experimental results, using a standard benchmark of real datasets, show that Piqnic can serve as an architecture for sharing and querying semantic data, even in the presence of node failures.

Christian Aebeloe, Gabriela Montoya, Katja Hose
Reformulation-Based Query Answering for RDF Graphs with RDFS Ontologies

Query answering in RDF knowledge bases has traditionally been performed either through graph saturation, i.e., adding all implicit triples to the graph, or through query reformulation, i.e., modifying the query to look for the explicit triples entailing precisely what the original query asks for. The most expressive fragment of RDF for which reformulation-based query answering exists is the so-called database fragment [13], in which implicit triples are restricted to those entailed using an RDFS ontology. Within this fragment, query answering has so far been limited to the interrogation of data triples (non-RDFS ones); however, a powerful feature specific to RDF is the ability to query data and schema triples together. In this paper, we address the general query answering problem by reducing it, through a pre-query reformulation step, to that solved by the query reformulation technique of [13]. We also report on experiments demonstrating the low cost of our reformulation algorithm.

Maxime Buron, François Goasdoué, Ioana Manolescu, Marie-Laure Mugnier
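A minimal illustration of the reformulation idea for the simplest RDFS rule (rdfs:subClassOf): a type pattern over a class is rewritten into a union of patterns over that class and all of its subclasses, so that only explicit triples need to be matched. The sketch below in plain Python is illustrative only; the technique of [13] also covers subPropertyOf, domain, and range, and the paper's contribution additionally handles schema triples in queries.

from collections import defaultdict

def subclass_closure(subclass_of, cls):
    """All classes whose instances are entailed to be instances of cls."""
    children = defaultdict(set)
    for sub, sup in subclass_of:
        children[sup].add(sub)
    seen, stack = {cls}, [cls]
    while stack:
        for sub in children[stack.pop()]:
            if sub not in seen:
                seen.add(sub)
                stack.append(sub)
    return seen

def reformulate_type_pattern(subclass_of, cls):
    # Rewrite the pattern (?x, rdf:type, cls) into a union of patterns that is
    # evaluated directly over the explicit triples (no graph saturation needed).
    return [("?x", "rdf:type", c) for c in subclass_closure(subclass_of, cls)]

ontology = [(":Professor", ":Researcher"), (":Researcher", ":Person")]
print(reformulate_type_pattern(ontology, ":Person"))
# three patterns: over :Person, :Researcher and :Professor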
A Hybrid Graph Model for Distant Supervision Relation Extraction

Distant supervision has the advantage of generating training data automatically for relation extraction by aligning triples in Knowledge Graphs with large-scale corpora. Some recent methods attempt to incorporate extra information to enhance the performance of relation extraction. However, there still exist two major limitations. Firstly, these methods are tailored for a specific type of information, which is not enough to cover most of the cases. Secondly, the introduced extra information may contain noise. To address these issues, we propose a novel hybrid graph model, which can incorporate heterogeneous background information, such as entity types and human-constructed triples, in a unified framework. These various kinds of knowledge can be integrated efficiently even when some of them are missing. In addition, we further employ an attention mechanism to identify the most confident information, which can alleviate the side effect of noise. Experimental results demonstrate that our model significantly outperforms the state-of-the-art methods on various evaluation metrics.

Shangfu Duan, Huan Gao, Bing Liu, Guilin Qi
Retrieving Textual Evidence for Knowledge Graph Facts

Knowledge graphs have become vital resources for semantic search and provide users with precise answers to their information needs. Knowledge graphs often consist of billions of facts, typically encoded in the form of RDF triples. In most cases, these facts are extracted automatically and can thus be susceptible to errors. For many applications, it can therefore be very useful to complement knowledge graph facts with textual evidence. For instance, it can help users make informed decisions about the validity of the facts that are returned as part of an answer to a query. In this paper, we therefore propose an approach that, given a knowledge graph and a text corpus, retrieves the top-k most relevant textual passages for a given set of facts. Since our goal is to retrieve short passages, we develop a set of IR models combining exact matching through the Okapi BM25 model with semantic matching using word embeddings. To evaluate our approach, we built an extensive benchmark consisting of facts extracted from YAGO and text passages retrieved from Wikipedia. Our experimental results demonstrate the effectiveness of our approach in retrieving textual evidence for knowledge graph facts.

Gonenc Ercan, Shady Elbassuoni, Katja Hose
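The combination of exact and semantic matching described above can be pictured as an interpolation of Okapi BM25 scores with an embedding-based similarity. The snippet below is a toy sketch assuming the rank_bm25 and numpy packages; the passages, the interpolation weight alpha, and the random stand-in vectors are hypothetical and not taken from the paper, which builds its models on YAGO facts and Wikipedia passages.

import numpy as np
from rank_bm25 import BM25Okapi

passages = [
    "barack obama was born in honolulu hawaii".split(),
    "honolulu is the capital of hawaii".split(),
]
fact_query = "obama birthplace honolulu".split()   # verbalised KG fact

# Exact matching component: Okapi BM25 over tokenised passages.
bm25_scores = BM25Okapi(passages).get_scores(fact_query)

# Semantic matching component: cosine similarity of centroid word vectors.
rng = np.random.default_rng(0)
vocab = set(fact_query) | {w for p in passages for w in p}
vectors = {w: rng.standard_normal(50) for w in vocab}  # stand-in embeddings

def centroid(tokens):
    return np.mean([vectors[t] for t in tokens], axis=0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Interpolate both signals (a real system would normalise the scales first).
alpha = 0.7
query_vec = centroid(fact_query)
scores = [alpha * b + (1 - alpha) * cosine(query_vec, centroid(p))
          for b, p in zip(bm25_scores, passages)]
top_passage = passages[int(np.argmax(scores))]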
Boosting DL Concept Learners

We present a method for boosting relational classifiers of individual resources in the context of the Web of Data. We show how weak classifiers induced by simple concept learners can be enhanced to produce strong classification models from training datasets. Moreover, the comprehensibility of the model is to some extent preserved, as it can be regarded as a sort of concept in disjunctive form. We demonstrate the application of this approach to a weak learner that is easily derived from learners that search a space of hypotheses, requiring an adaptation of the underlying heuristics to take into account weighted training examples. An experimental evaluation on a variety of artificial learning problems and datasets shows that the proposed approach enhances the performance of the basic learners and is competitive, outperforming current concept learning systems.

Nicola Fanizzi, Giuseppe Rizzo, Claudia d’Amato
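The boosting mechanism the paper adapts can be sketched in its generic AdaBoost-style form: weak models are induced from weighted examples, weighted by their error, and combined into a strong (disjunctive-flavoured) vote, with the weights of misclassified examples increased after each round. The stub weak learner below is a hypothetical stand-in for a DL concept learner whose heuristics have been adapted to weighted examples.

import math

def boost(xs, ys, induce_weak, rounds=10):
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []                               # list of (alpha, weak classifier)
    for _ in range(rounds):
        h = induce_weak(xs, ys, weights)
        err = sum(w for w, x, y in zip(weights, xs, ys) if h(x) != y)
        err = min(max(err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, h))
        # Misclassified examples get more weight for the next weak learner.
        weights = [w * math.exp(alpha if h(x) != y else -alpha)
                   for w, x, y in zip(weights, xs, ys)]
        total = sum(weights)
        weights = [w / total for w in weights]
    def strong(x):                              # weighted vote of the weak models
        return 1 if sum(a * (1 if h(x) else -1) for a, h in ensemble) > 0 else 0
    return strong

def induce_weak(xs, ys, ws):
    # Hypothetical weak learner: best single threshold on a numeric feature,
    # chosen by weighted error (standing in for a simple induced concept).
    candidates = [(lambda x, t=t, s=s: int((x >= t) == s))
                  for t in xs for s in (True, False)]
    return min(candidates,
               key=lambda h: sum(w for w, x, y in zip(ws, xs, ys) if h(x) != y))

strong = boost([0.1, 0.2, 0.35, 0.4, 0.8, 0.9], [0, 0, 0, 0, 1, 1], induce_weak)
print([strong(x) for x in (0.15, 0.85)])        # expected: [0, 1]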
Link Prediction in Knowledge Graphs with Concepts of Nearest Neighbours

The open nature of Knowledge Graphs (KGs) often implies that they are incomplete. Link prediction consists in inferring new links between the entities of a KG based on existing links. Most existing approaches rely on learning latent feature vectors for the encoding of entities and relations. In general, however, latent features cannot be easily interpreted. Rule-based approaches offer interpretability, but a distinct ruleset must be learned for each relation, and computation time is difficult to control. We propose a new approach that does not need a training phase and that can provide interpretable explanations for each inference. It relies on the computation of Concepts of Nearest Neighbours (CNNs) to identify similar entities based on common graph patterns. Dempster-Shafer theory is then used to draw inferences from CNNs. We evaluate our approach on FB15k-237, a challenging benchmark for link prediction, where it achieves competitive performance compared to existing approaches.

Sébastien Ferré
Disclosing Citation Meanings for Augmented Research Retrieval and Exploration

In recent years, new digital technologies are being used to support the navigation and analysis of scientific publications, motivated by the increasing number of articles published every year. For this reason, experts make use of on-line systems to browse thousands of articles in search of relevant information. In this paper, we present a new method that automatically assigns meanings to references on the basis of the citation text, through a Natural Language Processing pipeline and a slightly-supervised clustering process. The resulting network of semantically-linked articles allows an informed exploration of the research panorama through semantic paths. The proposed approach has been validated using the ACL Anthology Dataset, containing several thousands of papers related to the Computational Linguistics field. A manual evaluation of the extracted citation meanings showed very high levels of accuracy. Finally, a freely-available web-based application has been developed and published on-line.

Roger Ferrod, Claudio Schifanella, Luigi Di Caro, Mario Cataldi
Injecting Domain Knowledge in Electronic Medical Records to Improve Hospitalization Prediction

Electronic medical records (EMRs) contain key information about the different symptomatic episodes that a patient went through. They carry great potential for improving the well-being of patients and therefore represent a very valuable input for artificial intelligence approaches. However, the explicit knowledge directly available through these records remains limited: the features extracted for use by machine learning algorithms do not contain all the implicit knowledge of a medical expert. In order to evaluate the impact of domain knowledge when processing EMRs, we augment the features extracted from EMRs with ontological resources before turning them into vectors used by machine learning algorithms. We evaluate these augmentations with several machine learning algorithms to predict hospitalization. Our approach was experimented on data from the PRIMEGE PACA database, which contains more than 350,000 consultations carried out by 16 general practitioners (GPs).

Raphaël Gazzotti, Catherine Faron-Zucker, Fabien Gandon, Virginie Lacroix-Hugues, David Darmon
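A minimal sketch of the augmentation idea: concepts detected in a consultation record are expanded with their ancestors in a domain ontology before vectorisation, so the learner also sees the implicit, more general medical knowledge. The concept identifiers and the toy hierarchy below are hypothetical and not taken from the PRIMEGE PACA data or the ontologies used in the paper.

# Toy is-a hierarchy standing in for an ontological resource.
subclass_of = {
    "type_2_diabetes": "diabetes_mellitus",
    "diabetes_mellitus": "endocrine_disorder",
    "endocrine_disorder": "disorder",
    "hypertension": "cardiovascular_disorder",
    "cardiovascular_disorder": "disorder",
}

def ancestors(concept):
    result = []
    while concept in subclass_of:
        concept = subclass_of[concept]
        result.append(concept)
    return result

def augment(extracted_concepts):
    """Expand the EMR features with every ontological ancestor."""
    augmented = set(extracted_concepts)
    for c in extracted_concepts:
        augmented.update(ancestors(c))
    return sorted(augmented)

features = augment(["type_2_diabetes", "hypertension"])
# `features` is then turned into a bag-of-features vector and passed to the
# hospitalization-prediction classifier.
print(features)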
Explore and Exploit. Dictionary Expansion with Human-in-the-Loop

Many Knowledge Extraction systems rely on semantic resources - dictionaries, ontologies, lexical resources - to extract information from unstructured text. A key to successful information extraction is to consider such resources as evolving artifacts and to keep them up-to-date. In this paper, we tackle the problem of dictionary expansion and propose a human-in-the-loop approach: we couple neural language models with tight human supervision to assist the user in building and maintaining domain-specific dictionaries. The approach works on any given input text corpus and is based on the explore and exploit paradigm: starting from a few seeds (or an existing dictionary), it effectively discovers new instances (explore) from the text corpus as well as predicts new potential instances which are not in the corpus, i.e. “unseen”, using the current dictionary entries (exploit). We evaluate our approach on five real-world dictionaries, achieving high accuracy with a rapid expansion rate.

Anna Lisa Gentile, Daniel Gruhl, Petar Ristoski, Steve Welch
Aligning Biomedical Metadata with Ontologies Using Clustering and Embeddings

The metadata about scientific experiments published in online repositories have been shown to suffer from a high degree of representational heterogeneity—there are often many ways to represent the same type of information, such as a geographical location via its latitude and longitude. To harness the potential that metadata have for discovering scientific data, it is crucial that they be represented in a uniform way that can be queried effectively. One step toward uniformly-represented metadata is to normalize the multiple, distinct field names used in metadata (e.g., lat lon, lat and long) to describe the same type of value. To that end, we present a new method based on clustering and embeddings (i.e., vector representations of words) to align metadata field names with ontology terms. We apply our method to biomedical metadata by generating embeddings for terms in biomedical ontologies from the BioPortal repository. We carried out a comparative study between our method and the NCBO Annotator, which revealed that our method yields more and substantially better alignments between metadata and ontology terms.

Rafael S. Gonçalves, Maulik R. Kamdar, Mark A. Musen
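The alignment step can be pictured as nearest-neighbour search in a shared embedding space: each metadata field name and each ontology term is mapped to a vector, and a field is aligned with the closest term by cosine similarity. The hand-crafted three-dimensional vectors below are stand-ins for the embeddings the paper derives from BioPortal ontologies.

import numpy as np

ontology_term_vectors = {                 # hypothetical learned embeddings
    "latitude and longitude": np.array([0.9, 0.1, 0.0]),
    "organism":               np.array([0.0, 0.9, 0.1]),
    "disease":                np.array([0.1, 0.0, 0.9]),
}
field_name_vectors = {                    # embeddings of raw metadata fields
    "lat_lon":  np.array([0.85, 0.15, 0.05]),
    "lat":      np.array([0.80, 0.10, 0.10]),
    "species":  np.array([0.05, 0.95, 0.00]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def align(field_vector):
    return max(ontology_term_vectors,
               key=lambda term: cosine(field_vector, ontology_term_vectors[term]))

for name, vec in field_name_vectors.items():
    print(name, "->", align(vec))
# lat_lon -> latitude and longitude, lat -> latitude and longitude, species -> organism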
Generating Semantic Aspects for Queries

Large document collections can be hard to explore if the user presents her information need in a limited set of keywords. Ambiguous intents arising out of these short queries often result in long-winded query sessions and many query reformulations. To alleviate this problem, in this work, we propose the novel concept of semantic aspects (e.g., ⟨{michael-phelps}, {athens, beijing, london}, [2004, 2016]⟩ for an ambiguous query) and present the xFactor algorithm that generates them from annotations in documents. Semantic aspects uplift document contents into a meaningful structured representation, thereby allowing the user to sift through many documents without the need to read their contents. The semantic aspects are created by the analysis of semantic annotations in the form of temporal, geographic, and named entity annotations. We evaluate our approach on a novel testbed of over 5,000 aspects on Web-scale document collections amounting to more than 450 million documents. Our results show that the xFactor algorithm finds relevant aspects for highly ambiguous queries.

Dhruv Gupta, Klaus Berberich, Jannik Strötgen, Demetrios Zeinalipour-Yazti
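The structured representation that the abstract alludes to can be captured by a small record type: a semantic aspect bundles a set of named entities, a set of geographic annotations, and a time interval. The field names below are illustrative, not the exact schema used by xFactor.

from dataclasses import dataclass

@dataclass(frozen=True)
class SemanticAspect:
    entities: frozenset    # named-entity annotations, e.g. {"michael-phelps"}
    locations: frozenset   # geographic annotations, e.g. {"athens", "beijing"}
    interval: tuple        # temporal annotation as (start_year, end_year)

aspect = SemanticAspect(
    entities=frozenset({"michael-phelps"}),
    locations=frozenset({"athens", "beijing", "london"}),
    interval=(2004, 2016),
)
print(aspect)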
A Recommender System for Complex Real-World Applications with Nonlinear Dependencies and Knowledge Graph Context

Most latent feature methods for recommender systems learn to encode user preferences and item characteristics based on past user-item interactions. While such approaches work well for standalone items (e.g., books, movies), they are not as well suited for dealing with composite systems. For example, in the context of industrial purchasing systems for engineering solutions, items can no longer be considered standalone. Thus, the latent representations need to encode the functionality and technical features of the engineering solutions that result from combining the individual components. To capture these dependencies, expressive and context-aware recommender systems are required. In this paper, we propose NECTR, a novel recommender system based on two components: a tensor factorization model and an autoencoder-like neural network. In the tensor factorization component, context information of the items is structured in a multi-relational knowledge base encoded as a tensor, and latent representations of items are extracted via tensor factorization. Simultaneously, an autoencoder-like component captures the non-linear interactions among configured items. We couple both components such that our model can be trained end-to-end. To demonstrate the real-world applicability of NECTR, we conduct extensive experiments on an industrial dataset concerned with automation solutions. Based on the results, we find that NECTR outperforms state-of-the-art methods by approximately 50% with respect to a set of standard performance metrics.

Marcel Hildebrandt, Swathi Shyam Sunder, Serghei Mogoreanu, Mitchell Joblin, Akhil Mehta, Ingo Thon, Volker Tresp
Learning URI Selection Criteria to Improve the Crawling of Linked Open Data

As the Web of Linked Open Data grows, the problem of crawling that cloud becomes increasingly important. Unlike normal Web crawlers, a Linked Data crawler performs a selection to focus on collecting linked RDF (including RDFa) data on the Web. From the perspectives of throughput and coverage, given a newly discovered and targeted URI, the key issue for a Linked Data crawler is to decide whether this URI is likely to dereference into an RDF data source and is therefore worth downloading the representation it points to. Current solutions adopt heuristic rules to filter irrelevant URIs. Unfortunately, when the heuristics are too restrictive, this hampers the coverage of crawling. In this paper, we propose and compare approaches to learn strategies for crawling Linked Data on the Web by predicting whether a newly discovered URI will lead to an RDF data source or not. We detail the features used in predicting the relevance and the methods we evaluated, including a promising adaptation of the FTRL-Proximal online learning algorithm. We compare several options through extensive experiments, including existing crawlers as baseline methods, to evaluate their efficacy.

Hai Huang, Fabien Gandon
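The selection problem can be framed as binary classification over cheap features of a newly discovered URI, crawling only URIs predicted to dereference to RDF. The sketch below uses character n-grams and a plain logistic regression from scikit-learn as an illustrative stand-in; the paper evaluates several learners, including an adaptation of FTRL-Proximal, and richer features. The training URIs and labels are hypothetical.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny, made-up training sample: 1 = dereferenced to RDF, 0 = did not.
training_uris = [
    "http://dbpedia.org/resource/Berlin",
    "http://example.org/ontology.owl",
    "http://example.org/images/logo.png",
    "http://example.org/news/article.html",
]
labels = [1, 1, 0, 0]

# Character n-grams pick up cues such as "/resource/", ".owl" or ".png".
model = make_pipeline(
    CountVectorizer(analyzer="char_wb", ngram_range=(3, 5)),
    LogisticRegression(max_iter=1000),
)
model.fit(training_uris, labels)

candidate = "http://dbpedia.org/resource/Portoroz"
rdf_probability = model.predict_proba([candidate])[0][1]
# The crawler downloads the representation only if the probability exceeds a
# threshold tuned for the desired throughput/coverage trade-off.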
Deontic Reasoning for Legal Ontologies

Many standards exist to formalize legal texts and rules. The same is true for legal ontologies. However, there is no proof theory to draw conclusions from these ontologically modeled rules. We address this gap by proposing a new modeling of deontic statements, and we then use this modeling to propose reasoning mechanisms to answer deontic questions, i.e., questions like “Is it mandatory/permitted/prohibited to...”. We also show that, using this modeling, it is possible to check the consistency of a deontic rule base. This work stands as a first important step towards a proof theory over a deontic rule base.

Cheikh Kacfah Emani, Yannis Haralambous
Incorporating Joint Embeddings into Goal-Oriented Dialogues with Multi-task Learning

Attention-based encoder-decoder neural network models have recently shown promising results in goal-oriented dialogue systems. However, these models struggle to reason over and incorporate stateful knowledge while preserving their end-to-end text generation functionality. Since such models can greatly benefit from user intent and knowledge graph integration, in this paper we propose an RNN-based end-to-end encoder-decoder architecture which is trained with joint embeddings of the knowledge graph and the corpus as input. The model provides an additional integration of user intent along with text generation, trained with a multi-task learning paradigm and an additional regularization technique to penalize generating the wrong entity as output. The model further incorporates a knowledge graph entity lookup during inference to guarantee that the generated output is stateful with respect to the local knowledge graph provided. We evaluated the model using the BLEU score; the empirical evaluation shows that our proposed architecture can improve the performance of task-oriented dialogue systems.

Firas Kassawat, Debanjan Chaudhuri, Jens Lehmann
Link Prediction Using Multi Part Embeddings

Knowledge graph embedding models are widely used to provide scalable and efficient link prediction for knowledge graphs. They use different techniques to model interactions between embeddings, where tensor factorisation based models are known to provide state-of-the-art results. In recent work, developments on factorisation based knowledge graph embedding models were mostly limited to enhancing the ComplEx and DistMult models, as they can efficiently provide predictions within linear time and space complexity. In this work, we aim to extend the ComplEx and DistMult models by proposing a new factorisation model, TriModel, which uses three-part embeddings to model a combination of symmetric and asymmetric interactions between embeddings. We perform an empirical evaluation of the TriModel model compared to other tensor factorisation models on different training configurations (loss functions and regularisation terms), and we show that the TriModel model provides state-of-the-art results in all configurations. In our experiments, we use standard benchmarking datasets (WN18, WN18RR, FB15k, FB15k-237, YAGO10) along with a new NELL-based benchmarking dataset (NELL239) that we have developed.

Sameh K. Mohamed, Vít Nováček
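For context, the two models the paper extends score a triple (s, p, o) as follows (standard formulations from the literature; the TriModel scoring function itself, which combines three embedding parts per entity and relation, is defined in the paper):

\phi_{\mathrm{DistMult}}(s, p, o) = \sum_{k=1}^{K} e_{s,k}\, w_{p,k}\, e_{o,k}
\qquad
\phi_{\mathrm{ComplEx}}(s, p, o) = \mathrm{Re}\Big(\sum_{k=1}^{K} e_{s,k}\, w_{p,k}\, \overline{e_{o,k}}\Big)

DistMult is symmetric in s and o, so it cannot model asymmetric relations; ComplEx breaks the symmetry through the complex conjugate while keeping linear time and space complexity, which is the trade-off the three-part TriModel embeddings also target.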
Modelling the Compatibility of Licenses

Web applications facilitate combining resources (linked data, web services, source code, documents, etc.) to create new ones. For a resource producer, choosing the appropriate license for a combined resource is not easy. It involves choosing a license compliant with all the licenses of the combined resources and analysing the reusability of the resulting resource through the compatibility of its license. The risk is either to choose a license that is too restrictive, making the resource difficult to reuse, or one that is not restrictive enough and will not sufficiently protect the resource. Finding the right trade-off between compliance and compatibility is a difficult process, and an automatic ordering over licenses would facilitate this task. Our research question is: given a license $l_i$, how can we automatically position $l_i$ within a set of licenses in terms of compatibility and compliance? We propose CaLi, a model that partially orders licenses. Our approach uses restrictiveness relations among licenses to define compatibility and compliance. We validate CaLi experimentally with a quadratic algorithm and show its usability through a prototype of a license-based search engine. Our work is a step towards facilitating and encouraging the publication and reuse of licensed resources in the Web of Data.

Benjamin Moreau, Patricia Serrano-Alvarado, Matthieu Perrin, Emmanuel Desmontils
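A rough illustration of the restrictiveness ordering that underlies compatibility and compliance: if licenses are reduced to sets of obligations and restrictions, one license is at most as restrictive as another when its set is included in the other's, which yields a partial order (some licenses are incomparable). This is a simplification for illustration only; CaLi's actual model is richer and distinguishes the status of individual actions.

# Licenses reduced to sets of restrictions/obligations (hypothetical view).
licenses = {
    "CC0":      set(),
    "CC-BY":    {"attribution"},
    "CC-BY-SA": {"attribution", "share-alike"},
    "CC-BY-NC": {"attribution", "non-commercial"},
}

def at_most_as_restrictive(a, b):
    """Partial order: a <= b iff every restriction of a is also in b."""
    return licenses[a] <= licenses[b]

print(at_most_as_restrictive("CC-BY", "CC-BY-SA"))    # True
print(at_most_as_restrictive("CC-BY-SA", "CC-BY-NC")) # False (incomparable)
print(at_most_as_restrictive("CC-BY-NC", "CC-BY-SA")) # False (incomparable)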
GConsent - A Consent Ontology Based on the GDPR

Consent is an important legal basis for the processing of personal data under the General Data Protection Regulation (GDPR), the current European data protection law. The GDPR provides constraints and obligations on the validity of consent and gives data subjects the right to withdraw their consent at any time. Determining and demonstrating compliance with these obligations requires information on how the consent was obtained, used, and changed over time. Existing work demonstrates the feasibility of Semantic Web technologies for modelling information and determining compliance for the GDPR. Although these approaches address consent, they currently do not model all the information associated with it. In this paper, we address this by first presenting our analysis of the information associated with consent under the GDPR. We then present GConsent, an OWL2-DL ontology for the representation of consent and its associated information such as provenance. The paper presents the methodology used in the creation and validation of the ontology as well as an example use case demonstrating its applicability. The ontology and this paper can be accessed online at https://w3id.org/GConsent .

Harshvardhan J. Pandit, Christophe Debruyne, Declan O’Sullivan, Dave Lewis
Latent Relational Model for Relation Extraction

Analogy is a fundamental component of the way we think and process thought. Solving a word analogy problem, such as mason is to stone as carpenter is to wood, requires capabilities in recognizing the implicit relations between the two word pairs. In this paper, we describe the analogy problem from a computational linguistics point of view and explore its use to address relation extraction tasks. We extend a relational model that has been shown to be effective in solving word analogies and adapt it to the relation extraction problem. Our experiments show that this approach outperforms the state-of-the-art methods on a relation extraction dataset, opening up a new research direction in discovering implicit relations in text through analogical reasoning.

Gaetano Rossiello, Alfio Gliozzo, Nicolas Fauceglia, Giovanni Semeraro
Mini-ME Swift: The First Mobile OWL Reasoner for iOS

Mobile reasoners play a pivotal role in the so-called Semantic Web of Things. While several tools exist for the Android platform, iOS has been neglected so far. This is due to architectural differences and unavailability of OWL manipulation libraries, which make porting existing engines harder. This paper presents Mini-ME Swift, the first Description Logics reasoner for iOS. It implements standard (Subsumption, Satisfiability, Classification, Consistency) and non-standard (Abduction, Contraction, Covering, Difference) inferences in an OWL 2 fragment. Peculiarities are discussed and performance results are presented, comparing Mini-ME Swift with other state-of-the-art OWL reasoners.

Michele Ruta, Floriano Scioscia, Filippo Gramegna, Ivano Bilenchi, Eugenio Di Sciascio
Validation of SHACL Constraints over KGs with OWL 2 QL Ontologies via Rewriting

Constraints have traditionally been used to ensure data quality. Recently, several constraint languages such as SHACL, as well as mechanisms for constraint validation, have been proposed for Knowledge Graphs (KGs). KGs are often enhanced with ontologies that define relevant background knowledge in a formal language such as OWL 2 QL. However, existing systems for constraint validation either ignore these ontologies, or compile ontologies and constraints into rules that must be executed by some rule engine. In the latter case, one has to rely on different systems when validating constraints over KGs and over ontology-enhanced KGs. In this work, we address this problem by defining rewriting techniques that compile an OWL 2 QL ontology and a set of SHACL constraints into another set of SHACL constraints. We show that in the general case the rewriting may not exist, but that it always exists for the positive fragment of SHACL. Our rewriting techniques make it possible to validate constraints over KGs with and without ontologies using the same SHACL validation engines.

Ognjen Savković, Evgeny Kharlamov, Steffen Lamparter
An Ontology-Based Interactive System for Understanding User Queries

The use of ontologies in applications like dialogue systems, question answering, or decision support is gradually gaining attention. In such applications, keyword-based user queries are mapped to ontology entities and then the respective application logic is activated. This task is not trivial, as user queries may often be vague and imprecise or simply do not match the entities recognised by the application. This is, for example, the case in symptom-checking dialogue systems where users can enter text like “I am not feeling well”, “I sleep terribly”, and more, which cannot be directly matched to entities found in formal medical ontologies. In this paper, we present a framework for automatically building a small dialogue for the purpose of bridging the gap between user queries and a set of pre-defined (target) ontology concepts. We show how we can use the ontology and statistical techniques to select an initial small set of candidate concepts from the target ones and how these can then be grouped into categories using their properties in the ontology. Using these groups, we can ask the user questions in order to reduce the set of candidates to a single concept that captures the initial user intention. The effectiveness of this approach is hindered by the well-known underspecification of ontologies, which we address with a concept-enrichment pre-processing step based on information extraction techniques. We have instantiated our framework and performed a preliminary evaluation, largely motivated by a real-world symptom-checking application, obtaining encouraging results.

Giorgos Stoilos, Szymon Wartak, Damir Juric, Jonathan Moore, Mohammad Khodadadi
Knowledge-Based Short Text Categorization Using Entity and Category Embedding

Short text categorization is an important task due to the rapid growth of online available short texts in various domains, such as web search snippets. Most traditional methods suffer from the sparsity and shortness of the text. Moreover, supervised learning methods require a significant amount of training data, and manually labeling such data can be very time-consuming and costly. In this study, we propose a novel probabilistic model for Knowledge-Based Short Text Categorization (KBSTC), which does not require any labeled training data to classify a short text. This is achieved by leveraging entities and categories from large knowledge bases, which are further embedded into a common vector space, for which we propose a new entity and category embedding model. Given a short text, its category (e.g. Business, Sports, etc.) can then be derived based on the entities mentioned in the text by exploiting the semantic similarity between entities and categories. To validate the effectiveness of the proposed method, we conducted experiments on two real-world datasets, i.e., AG News and Google Snippets. The experimental results show that our approach significantly outperforms the classification approaches which do not require any labeled data, while it comes close to the results of the supervised approaches.

Rima Türker, Lei Zhang, Maria Koutraki, Harald Sack
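The dataless classification step can be pictured as a similarity computation in the common vector space: the category whose embedding is, on average, closest to the embeddings of the entities detected in the short text is chosen. The tiny hand-made vectors below stand in for the entity and category embeddings the paper learns from a large knowledge base.

import numpy as np

category_vectors = {
    "Business": np.array([0.9, 0.1, 0.0]),
    "Sports":   np.array([0.0, 0.9, 0.1]),
}
entity_vectors = {
    "Wall_Street":    np.array([0.8, 0.1, 0.1]),
    "NASDAQ":         np.array([0.9, 0.0, 0.1]),
    "Michael_Phelps": np.array([0.1, 0.9, 0.0]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def categorize(mentioned_entities):
    def score(category):
        return np.mean([cosine(entity_vectors[e], category_vectors[category])
                        for e in mentioned_entities])
    return max(category_vectors, key=score)

print(categorize(["Wall_Street", "NASDAQ"]))   # Business
print(categorize(["Michael_Phelps"]))          # Sports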
A Hybrid Approach for Aspect-Based Sentiment Analysis Using a Lexicalized Domain Ontology and Attentional Neural Models

This work focuses on sentence-level aspect-based sentiment analysis for restaurant reviews. A two-stage sentiment analysis algorithm is proposed. In this method, a lexicalized domain ontology is first used to predict the sentiment, and as a back-up algorithm a neural network with a rotatory attention mechanism (LCR-Rot) is utilized. Furthermore, two extensions are added to the back-up algorithm. The first extension changes the order in which the rotatory attention mechanism operates (LCR-Rot-inv). The second extension runs over the rotatory attention mechanism for multiple iterations (LCR-Rot-hop). Using the SemEval-2015 and SemEval-2016 data, we conclude that the two-stage method outperforms the baseline methods, albeit by a small margin. Moreover, we find that the method that iterates multiple times over the rotatory attention mechanism has the best performance.

Olaf Wallaart, Flavius Frasincar
Predicting Entity Mentions in Scientific Literature

Predicting which entities are likely to be mentioned in scientific articles is a task with significant academic and commercial value. For instance, it can lead to monetary savings if the articles are behind paywalls, or be used to recommend articles that are not yet available. Despite extensive prior work on entity prediction in Web documents, the peculiarities of scientific literature make it a unique scenario for this task. In this paper, we present an approach that uses a neural network to predict whether the (unseen) body of an article contains entities defined in domain-specific knowledge bases (KBs). The network uses features from the abstracts and the KB, and it is trained using open-access articles and authors’ prior works. Our experiments on biomedical literature show that our method is able to predict subsets of entities with high accuracy. As far as we know, our method is the first of its kind and is currently used in several commercial settings.

Yalung Zheng, Jon Ezeiza, Mehdi Farzanehpour, Jacopo Urbani

Resources Track

Frontmatter
AYNEC: All You Need for Evaluating Completion Techniques in Knowledge Graphs

The popularity of knowledge graphs has led to the development of techniques to refine them and increase their quality. One of the main refinement tasks is completion (also known as link prediction for knowledge graphs), which seeks to add missing triples to the graph, usually by classifying potential ones as true or false. While there is a wide variety of graph completion techniques, there is no standard evaluation setup, so each proposal is evaluated using different datasets and metrics. In this paper we present AYNEC, a suite for the evaluation of knowledge graph completion techniques that covers the entire evaluation workflow. It includes a customisable tool for the generation of datasets with multiple variation points related to the preprocessing of graphs, the splitting into training and testing examples, and the generation of negative examples. AYNEC also provides a visual summary of the graph and optional export of the datasets in an open format for visualisation. We use AYNEC to generate a library of datasets, ready to use for evaluation purposes, based on several popular knowledge graphs. Finally, it includes a tool that computes relevant metrics and uses significance tests to compare each pair of techniques. These open source tools, along with the datasets, are freely available to the research community and will be maintained.

Daniel Ayala, Agustín Borrego, Inma Hernández, Carlos R. Rivero, David Ruiz
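One of the variation points the suite covers, the generation of negative examples, is commonly done by corrupting true triples. The sketch below shows the classic head-or-tail replacement strategy as an illustration; AYNEC itself offers several configurable strategies and also handles preprocessing and train/test splitting. The toy knowledge graph is made up.

import random

def generate_negatives(triples, per_positive=1, seed=0):
    rng = random.Random(seed)
    entities = sorted({s for s, _, _ in triples} | {o for _, _, o in triples})
    positives = set(triples)
    negatives = []
    for s, p, o in triples:
        for _ in range(per_positive):
            for _attempt in range(100):       # avoid accidentally true triples
                corrupted = ((rng.choice(entities), p, o) if rng.random() < 0.5
                             else (s, p, rng.choice(entities)))
                if corrupted not in positives:
                    negatives.append(corrupted)
                    break
    return negatives

kg = [("portoroz", "locatedIn", "slovenia"),
      ("ljubljana", "capitalOf", "slovenia")]
print(generate_negatives(kg))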
RVO - The Research Variable Ontology

Enterprises today are presented with a plethora of data, tools and analytics techniques, but lack systems that help analysts navigate these resources and identify best-fitting solutions for their analytics problems. To support enterprise-level data analytics research, this paper presents the Research Variable Ontology (RVO), an ontology designed to catalogue and explore essential data analytics design elements such as variables, analytics models and available data sources. RVO is specialised to support researchers with exploratory and predictive analytics problems, popularly practiced in the economics and social science domains. We present the RVO design process, its schema, how it links and extends existing ontologies to provide a holistic view of analytics-related knowledge, and how data analysts at the enterprise level can use it. The capabilities of RVO are illustrated through a case study on House Price Prediction.

Madhushi Bandara, Ali Behnaz, Fethi A. Rabhi
EVENTSKG: A 5-Star Dataset of Top-Ranked Events in Eight Computer Science Communities

Metadata of scientific events has become increasingly available on the Web, albeit often as raw data in various formats, disregarding its semantics and interlinking relations. This restricts the usability of the data for, e.g., subsequent analyses and reasoning. Therefore, there is a pressing need to represent this data in a semantic representation, i.e., Linked Data. We present the new release of the EVENTSKG dataset, comprising comprehensive semantic descriptions of scientific events of eight computer science communities. Currently, EVENTSKG is a 5-star dataset containing metadata of 73 top-ranked event series (almost 2,000 events) established over the last five decades. The new release is a Linked Open Dataset adhering to an updated version of the Scientific Events Ontology, a reference ontology for event metadata representation, leading to richer and cleaner data. To facilitate the maintenance of EVENTSKG and to ensure its sustainability, EVENTSKG is coupled with a Java API that enables users to add and update event metadata without going into the details of the representation of the dataset. We shed light on event characteristics by analyzing EVENTSKG data, which provides a flexible means for customization in order to better understand the characteristics of renowned CS events.

Said Fathalla, Christoph Lange, Sören Auer
CORAL: A Corpus of Ontological Requirements Annotated with Lexico-Syntactic Patterns

Ontological requirements play a key role in ontology development, as they determine the knowledge that needs to be modelled. In addition, the analysis of such requirements can be used (a) to improve ontology testing by easing the automation of requirements into tests; (b) to improve the requirements specification activity; or (c) to ease ontology reuse by facilitating the identification of patterns. However, there is a lack of openly available ontological requirements published together with their associated ontologies, which hinders such analysis. Therefore, in this work we present CORAL (Corpus of Ontological Requirements Annotated with Lexico-syntactic patterns), an openly available corpus of 834 ontological requirements annotated with 29 lexico-syntactic patterns, 12 of which are proposed in this work. CORAL is openly available in three different open formats, namely HTML, CSV and RDF, under the “Creative Commons Attribution 4.0 International” license.

Alba Fernández-Izquierdo, María Poveda-Villalón, Raúl García-Castro
MMKG: Multi-modal Knowledge Graphs

We present MMKG, a collection of three knowledge graphs that contain both numerical features and (links to) images for all entities, as well as entity alignments between pairs of KGs. Therefore, both the multi-relational link prediction and the entity matching communities can benefit from this resource. We believe this dataset has the potential to facilitate the development of novel multi-modal learning approaches for knowledge graphs. We validate the utility of MMKG in the sameAs link prediction task with an extensive set of experiments. These experiments show that the task at hand benefits from learning of multiple feature types.

Ye Liu, Hui Li, Alberto Garcia-Duran, Mathias Niepert, Daniel Onoro-Rubio, David S. Rosenblum
BeSEPPI: Semantic-Based Benchmarking of Property Path Implementations

In 2013, property paths were introduced with the release of SPARQL 1.1. Property paths allow complex queries to be described in a more concise and comprehensive way. The W3C introduced a formal specification of the semantics of property paths, to which implementations should adhere. Most commonly used RDF stores claim to support property paths. In order to give insight into how well current implementations of property paths work, we have developed BeSEPPI, a benchmark for the semantic-based evaluation of property path implementations. BeSEPPI checks whether RDF stores follow the W3C's semantics by testing the correctness and completeness of query result sets. The results of our benchmark show that only one of the five benchmarked RDF stores returns complete and correct result sets for all benchmark queries.

Adrian Skubella, Daniel Janke, Steffen Staab
QED: Out-of-the-Box Datasets for SPARQL Query Evaluation

In this paper, we present SPARQL QED, a system generating out-of-the-box datasets for SPARQL queries over linked data. QED distinguishes the queries according to the different SPARQL features and creates, for each query, a small but exhaustive dataset comprising linked data and the query answers over this data. These datasets can support the development of applications based on SPARQL query answering in various ways. For instance, they may serve as SPARQL compliance tests or can be used for learning in query-by-example systems. We ensure that the created datasets are diverse and cover various practical use cases and, of course, that the sets of answers included are the correct ones. Example tests generated based on queries and data from DBpedia have shown bugs in Jena and Virtuoso.

Veronika Thost, Julian Dolby
ToCo: An Ontology for Representing Hybrid Telecommunication Networks

The TOUCAN project proposed an ontology for telecommunication networks with hybrid technologies, the TOUCAN Ontology (ToCo), available at http://purl.org/toco/ , as well as the Device-Interface-Link (DIL) knowledge design pattern. The core classes and relationships forming the ontology are discussed in detail. The ToCo ontology can describe the physical infrastructure, quality of channel, services and users in heterogeneous telecommunication networks which span multiple technology domains. The DIL pattern was observed and summarised when modelling networks across various technology domains. Examples and use cases of ToCo are presented for demonstration.

Qianru Zhou, Alasdair J. G. Gray, Stephen McLaughlin
A Software Framework and Datasets for the Analysis of Graph Measures on RDF Graphs

As the availability and inter-connectivity of RDF datasets grow, so does the necessity to understand the structure of the data. Understanding the topology of RDF graphs can guide and inform the development of, e.g., synthetic dataset generators, sampling methods, index structures, or query optimizers. In this work, we propose two resources: (i) a software framework (resource URL: https://doi.org/10.5281/zenodo.2109469 ) able to acquire, prepare, and perform a graph-based analysis on the topology of large RDF graphs, and (ii) results of a graph-based analysis of 280 datasets (resource URL: https://doi.org/10.5281/zenodo.1214433 ) from the LOD Cloud, with values for 28 graph measures computed with the framework. We present a preliminary analysis based on the proposed resources and point out implications for synthetic dataset generators. Finally, we identify a set of measures that can be used to characterize graphs in the Semantic Web.

Matthäus Zloch, Maribel Acosta, Daniel Hienert, Stefan Dietze, Stefan Conrad

In-Use Track

Frontmatter
The Location Index: A Semantic Web Spatial Data Infrastructure

The Location Index (LocI) project is building a national, authoritative, and federated index for Australian spatial data using Semantic Web technologies. It will be used to link observation and measurement data (social, economic and environmental) to spatial objects identified in any one of multiple, interoperable datasets. Its goal is to improve the efficiency and reliability of data integration to support government decision making.

Nicholas J. Car, Paul J. Box, Ashley Sommer
Legislative Document Content Extraction Based on Semantic Web Technologies
A Use Case About Processing the History of the Law

This paper describes the system architecture for generating the History of the Law developed for the Chilean National Library of Congress (BCN). The production system uses Semantic Web technologies, Akoma-Ntoso, and tools that automate the marking up of plain text to XML, enriching and linking documents. These semantically annotated documents make it possible to develop specialized political and legislative services and to extract knowledge for a legal knowledge base for public use. We show the strategies used for the implementation of the automatic markup tools and describe the knowledge graph generated from the semantic documents. Finally, we contrast document processing times using semantic technologies with those of manual tasks, and present the lessons learnt in this process, laying a foundation for replicating a technological model that allows the generation of useful services for diverse contexts.

Francisco Cifuentes-Silva, Jose Emilio Labra Gayo
BiographySampo – Publishing and Enriching Biographies on the Semantic Web for Digital Humanities Research

This paper argues for making a paradigm shift in publishing and using biographical dictionaries on the web, based on Linked Data. The idea is to provide the user with an enhanced reading experience of biographies by enriching their contents with data linking and reasoning. In addition, versatile tooling for (1) biographical research of individual persons as well as for (2) prosopographical research on groups of people is provided. To demonstrate and evaluate the new possibilities, we present the semantic portal “BiographySampo – Finnish Biographies on the Semantic Web”. The system is based on a knowledge graph extracted automatically from a collection of 13,100 textual biographies, enriched with data linking to 16 external data sources, and by harvesting external collection data from libraries, museums, and archives. The portal was released in September 2018 for free public use at http://biografiasampo.fi .

Eero Hyvönen, Petri Leskinen, Minna Tamper, Heikki Rantala, Esko Ikkala, Jouni Tuominen, Kirsi Keravuori
Tinderbook: Fall in Love with Culture

More than 2 million new books are published every year, and choosing a good book among the huge number of available options can be a challenging endeavor. Recommender systems help in choosing books by providing personalized suggestions based on the user's reading history. However, most book recommender systems are based on collaborative filtering, involving a long onboarding process that requires rating many books before good recommendations can be provided. Tinderbook provides book recommendations, given a single book that the user likes, through a card-based playful user interface that does not require account creation. Tinderbook is strongly rooted in semantic technologies, using the DBpedia knowledge graph to enrich book descriptions and extending a hybrid state-of-the-art knowledge graph embeddings algorithm to derive an item relatedness measure for cold-start recommendations. Tinderbook is publicly available ( http://www.tinderbook.it ) and has already generated interest among the public, including passionate readers, students, librarians, and researchers. The online evaluation shows that Tinderbook achieves almost 50% precision in its recommendations.

Enrico Palumbo, Alberto Buzio, Andrea Gaiardo, Giuseppe Rizzo, Raphael Troncy, Elena Baralis

Open Access

Using Shape Expressions (ShEx) to Share RDF Data Models and to Guide Curation with Rigorous Validation

We discuss Shape Expressions (ShEx), a concise, formal, modeling and validation language for RDF structures. For instance, a Shape Expression could prescribe that subjects in a given RDF graph that fall into the shape “Paper” are expected to have a section called “Abstract”, and any ShEx implementation can confirm whether that is indeed the case for all such subjects within a given graph or subgraph.

There are currently five actively maintained ShEx implementations. We discuss how we use the JavaScript, Scala and Python implementations in RDF data validation workflows in distinct, applied contexts. We present examples of how ShEx can be used to model and validate data from two different sources, the domain-specific Fast Healthcare Interoperability Resources (FHIR) and the domain-generic Wikidata knowledge base, which is the linked database built and maintained by the Wikimedia Foundation as a sister project to Wikipedia. Example projects that are using Wikidata as a data curation platform are presented as well, along with ways in which they are using ShEx for modeling and validation.

When reusing RDF graphs created by others, it is important to know how the data is represented. Current practices of using human-readable descriptions or ontologies to communicate data structures often lack sufficient precision for data consumers to quickly and easily understand data representation details. We provide concrete examples of how we use ShEx as a constraint and validation language that allows humans and machines to communicate unambiguously about data assets. We use ShEx to exchange and understand data models of different origins, and to express a shared model of a resource’s footprint in a Linked Data source. We also use ShEx to agilely develop data models, test them against sample data, and revise or refine them. The expressivity of ShEx allows us to catch disagreement, inconsistencies, or errors efficiently, both at the time of input, and through batch inspections.

ShEx addresses the need of the Semantic Web community to ensure data quality for RDF graphs. It is currently being used in the development of FHIR/RDF. The language is sufficiently expressive to capture constraints in FHIR, and the intuitive syntax helps people to quickly grasp the range of conformant documents. The publication workflow for FHIR tests all of these examples against the ShEx schemas, catching non-conformant data before they reach the public. ShEx is also currently used in Wikidata projects such as Gene Wiki and WikiCite to develop quality-control pipelines to maintain data integrity and incorporate or harmonize differences in data across different parts of the pipelines.

Katherine Thornton, Harold Solbrig, Gregory S. Stupp, Jose Emilio Labra Gayo, Daniel Mietchen, Eric Prud’hommeaux, Andra Waagmeester
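The kind of constraint described in the abstract, a “Paper” shape that requires an abstract section, can be written as a small ShEx schema. Below it is held in a Python string so it could be handed to one of the maintained implementations (for example the Python one); the ex: vocabulary is made up purely for illustration.

PAPER_SHAPE = """
PREFIX ex:  <http://example.org/vocab#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

ex:PaperShape {
  ex:title    xsd:string ;   # exactly one title
  ex:abstract xsd:string ;   # exactly one abstract section
  ex:author   IRI+           # one or more author IRIs
}
"""
# A ShEx validator checks every focus node expected to conform to
# ex:PaperShape and reports non-conformant data (e.g. a paper without an
# abstract) both at input time and in batch inspections.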
Backmatter
Metadata
Title
The Semantic Web
Edited by
Pascal Hitzler
Miriam Fernández
Krzysztof Janowicz
Amrapali Zaveri
Alasdair J.G. Gray
Vanessa Lopez
Armin Haller
Karl Hammar
Copyright Year
2019
Electronic ISBN
978-3-030-21348-0
Print ISBN
978-3-030-21347-3
DOI
https://doi.org/10.1007/978-3-030-21348-0
