2018 | Book

Knowledge Engineering and Knowledge Management

21st International Conference, EKAW 2018, Nancy, France, November 12-16, 2018, Proceedings

Editors: Catherine Faron Zucker, Chiara Ghidini, Amedeo Napoli, Yannick Toussaint

Publisher: Springer International Publishing

Book Series: Lecture Notes in Computer Science

About this book

This book constitutes the refereed proceedings of the 21st International Conference on Knowledge Engineering and Knowledge Management, EKAW 2018, held in Nancy, France, in November 2018. The 36 full papers presented were carefully reviewed and selected from 104 submissions. The papers cover all aspects of eliciting, acquiring, modeling, and managing knowledge, the construction of knowledge-intensive systems and services for the Semantic Web, knowledge management, e-business, natural language processing, intelligent information integration, personal digital assistance systems, and a variety of other related topics. A special focus was on "Knowledge and AI", i.e., papers describing algorithms, tools, methodologies, and applications that exploit the interplay between knowledge and Artificial Intelligence techniques, with a special emphasis on knowledge discovery.

Table of Contents

Frontmatter

Research Papers

Frontmatter
An Empirical Evaluation of RDF Graph Partitioning Techniques

With the significant growth of RDF data sources in both number and volume comes the need to improve the scalability of RDF storage and querying solutions. Current implementations employ various RDF graph partitioning techniques. However, choosing the most suitable partitioning for a given RDF graph and application is not a trivial task. To the best of our knowledge, no detailed empirical evaluation exists that compares the performance of these techniques. In this work, we present an empirical evaluation of RDF graph partitioning techniques applied to real-world RDF data sets and benchmark queries. We evaluate the selected techniques in terms of their partitioning time, partitioning imbalance (in sizes), and the query runtime performance they achieve, based on real-world data sets and queries selected using the FEASIBLE benchmark generation framework.

Adnan Akhter, Axel-Cyrille Ngonga Ngomo, Muhammad Saleem
Fuzzy Semantic Labeling of Semi-structured Numerical Datasets

SPARQL endpoints provide access to rich sources of data (e.g. knowledge graphs), which can be used to classify other less structured datasets (e.g. CSV files or HTML tables on the Web). We propose an approach to suggest types for the numerical columns of a collection of input files available as CSVs. Our approach is based on the application of the fuzzy c-means clustering technique to numerical data in the input files, using existing SPARQL endpoints to generate training datasets. Our approach has three major advantages: it works directly with live knowledge graphs, it does not require knowledge-graph profiling beforehand, and it avoids tedious and costly manual training to match values with types. We evaluate our approach against manually annotated datasets. The results show that the proposed approach classifies most of the types correctly for our test sets.

Ahmad Alobaid, Oscar Corcho
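
The core clustering step named in this abstract, fuzzy c-means, can be sketched in a few lines of NumPy. The sketch below is illustrative only, not the authors' implementation: the function name, the toy column values and the choice of two clusters are assumptions, and the type-labelling step against SPARQL-derived training data is only hinted at in a comment.

    import numpy as np

    def fuzzy_c_means(values, n_clusters, m=2.0, n_iter=100, tol=1e-5, seed=0):
        """Standard fuzzy c-means on a 1-D array of numeric column values."""
        rng = np.random.default_rng(seed)
        x = np.asarray(values, dtype=float).reshape(-1, 1)      # (n_samples, 1)
        u = rng.random((len(x), n_clusters))
        u /= u.sum(axis=1, keepdims=True)                       # fuzzy memberships
        for _ in range(n_iter):
            um = u ** m
            centers = (um.T @ x) / um.sum(axis=0)[:, None]      # (n_clusters, 1)
            dist = np.abs(x - centers.T) + 1e-12                # (n_samples, n_clusters)
            new_u = 1.0 / (dist ** (2.0 / (m - 1)))
            new_u /= new_u.sum(axis=1, keepdims=True)
            delta = np.abs(new_u - u).max()
            u = new_u
            if delta < tol:
                break
        return centers.ravel(), u

    # Hypothetical usage: cluster a numeric CSV column, then compare cluster
    # centers against per-type value distributions fetched from a SPARQL endpoint.
    column_values = [1.2, 1.4, 1.1, 98.0, 101.3, 99.8]
    centers, memberships = fuzzy_c_means(column_values, n_clusters=2)
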
From Georeferenced Data to Socio-Spatial Knowledge. Ontology Design Patterns to Discover Domain-Specific Knowledge from Crowdsourced Data

So far, ontologies developed to support Geographic Information science have mostly been designed from a space-centered rather than a human-centered and social perspective. In the last decades, a wealth of georeferenced data has been collected from the crowd through sensors, mobile and web platforms, providing rich information about people's collective experiences and behaviors in cities. As a consequence, these new data sources require models able to make machine-understandable the social meanings and uses that people commonly associate with certain places. This contribution proposes a set of reusable Ontology Design Patterns (ODPs) to guide a data mining workflow and to semantically enrich the mined results. The ODPs explicitly aim at representing two facets of geographic knowledge - the built environment and people's social behavior in cities - as well as the way they interact. Modelling the interplay between the physical and the human aspects of the urban environment provides an ontology representation of socio-spatial knowledge which can be used as baseline domain knowledge for analysing and interpreting georeferenced data collected through crowdsourcing. An experiment using a TripAdvisor data sample to recognize food consumption practices in the city of Turin is presented.

Alessia Calafiore, Guido Boella, Leendert van der Torre
Conceptual Schema Transformation in Ontology-Based Data Access

Ontology-based Data Access (OBDA) is a by now well-established paradigm that relies on conceptually representing a domain of interest to provide access to relational data sources. The conceptual representation is given in terms of a domain schema (also called an ontology), which is linked to the data sources by means of declarative mapping specifications, and queries posed over the conceptual schema are automatically rewritten into queries over the sources. We consider the interesting setting where users would like to access the same data sources through a new conceptual schema, which we call the upper schema. This is particularly relevant when the upper schema is a reference model for the domain, or captures the data format used by data analysis tools. We propose a solution to this problem that is based on using transformation rules to map the upper schema to the domain schema, building upon the knowledge contained therein. We show how this enriched framework can be automatically transformed into a standard OBDA specification, which directly links the original relational data sources to the upper schema. This allows us to access data directly from the data sources while leveraging the domain schema and upper schema as a lens. We have realized the framework in a tool-chain that provides modeling of the conceptual schemas, a concrete annotation-based mechanism to specify transformation rules, and the automated generation of the final OBDA specification.

Diego Calvanese, Tahir Emre Kalayci, Marco Montali, Ario Santoso, Wil van der Aalst
SWRL Reasoning Using Decision Tables

Ontologies are widely used for representing and sharing knowledge specific to some domain. The Web Ontology Language (OWL) is a popular language for designing ontologies and has been extended with the Semantic Web Rule Language (SWRL) to enable the use of rules in OWL ontologies. However, reasoning with SWRL rules is a computationally complex task, making its use difficult in time-sensitive applications. Such applications usually rely on decision tables, a popular yet simple structure used for fast decision making. Decision tables, however, are limited to propositional rules, making it impossible to represent SWRL rules, which use universally quantified variables. In this paper, a technique is proposed to enable reasoning with decision tables for SWRL rules and OWL ontologies by exploiting the classes of the variables and entities. Experimental results show that for many settings, our technique offers faster reasoning speed compared to a state-of-the-art SWRL reasoner.

Maxime Clement, Ryutaro Ichise
A Framework for Explaining Query Answers in DL-Lite

An Ontology-based Data Access (OBDA) system consists of an ontology, namely a description of the concepts and the relations in a domain of interest, a database storing facts about the domain, and a mapping between the data and the ontology. In this paper, we consider ontologies expressed in the popular DL-Lite family of Description Logics, and we address the problem of computing explanations for answers to queries in an OBDA system, where queries are either positive, in particular conjunctive queries, or negative, i.e., negations of conjunctive queries. We provide the following contributions: (i) we propose a formal, comprehensive framework for explaining query answers in OBDA systems based on DL-Lite; (ii) we present an algorithm that, given a tuple returned as an answer to a positive query, and given a weighting function, examines all the explanations of the answer and chooses the best explanation according to this function; (iii) we do the same for the answers to negative queries. Notably, on the way to the latter result, we present what appears to be the first algorithm that computes the answers to negative queries in DL-Lite.

Federico Croce, Maurizio Lenzerini
DLFoil: Class Expression Learning Revisited

The paper presents the latest version of a concept learning system which can support typical ontology construction/evolution tasks through the induction of class expressions from groups of individual resources labeled by a domain expert. Casting the target task as a search problem, a Foil-like algorithm was devised based on the employment of refinement operators to traverse the version space of candidate definitions for the target class. The algorithm has been further enhanced with a more general definition of the scoring function and better refinement operators. An experimental evaluation of the resulting new release of DL-Foil, which implements these improvements, was carried out to assess its performance, also in comparison with other concept learning systems.

Nicola Fanizzi, Giuseppe Rizzo, Claudia d’Amato, Floriana Esposito
Requirements Behaviour Analysis for Ontology Testing

In the software engineering field, every software product is delivered with its associated tests, which verify its correct behaviour. Besides, there are several approaches which, integrated into the software development process, deal with software testing, such as unit testing or behaviour-driven development. However, in the ontology engineering field there is a lack of clearly defined testing processes that can be integrated into the ontology development process. In this paper we propose a testing framework composed of a set of activities (i.e., test design, implementation and execution), with the goal of checking whether the identified requirements are satisfied, by formalizing and analysing their expected behaviour. This testing framework can be used in different types of ontology development life cycles, or for other goals such as conformance testing between ontologies. In addition, we propose an RDF vocabulary to store, publish and reuse these test cases and their results, in order to allow traceability between the ontology, the test cases and their requirements. We validate our approach by integrating the testing framework into an ontology engineering process where an ontology network has been developed following agile principles.

Alba Fernández-Izquierdo, Raúl García-Castro
Interactive Interpretation of Serial Episodes: Experiments in Musical Analysis

We propose an interactive approach for post-processing serial episodes mined from sequential data, i.e. time-stamped sequences of events. The strength of the approach rests upon an interactive interpretation that relies on a web interface featuring various tools for observing, sorting and filtering the mined episodes. Features of the approach include interestingness measures, interactive visualization of episode occurrences in the mined event sequence, and an automatic filtering mechanism that removes episodes depending on the analyst's previous actions. We report experiments that show the advantages and limits of this approach in the domain of melodic analysis.

Béatrice Fuchs, Amélie Cordier
Network Metrics for Assessing the Quality of Entity Resolution Between Multiple Datasets

Matching entities between datasets is a crucial step for combining multiple datasets on the Semantic Web. A rich literature exists on different approaches to this entity resolution problem. However, much less work has been done on how to assess the quality of such entity links once they have been generated. Evaluation methods for link quality are typically limited to either comparison with a ground truth dataset (which is often not available), manual work (which is cumbersome and prone to error), or crowdsourcing (which is not always feasible, especially if expert knowledge is required). Furthermore, the problem of link evaluation is greatly exacerbated for links between more than two datasets, because the number of possible links grows rapidly with the number of datasets. In this paper, we propose a method to estimate the quality of entity links between multiple datasets. We exploit the fact that the links between entities from multiple datasets form a network, and we show how simple metrics on this network can reliably predict their quality. We verify our results in a large experimental study using six datasets from the domain of science, technology and innovation studies, for which we created a gold standard. This gold standard, available online, is an additional contribution of this paper. In addition, we evaluate our metric on a recently published gold standard to confirm our findings.

Al Koudous Idrissou, Frank van Harmelen, Peter van den Besselaar
Making Sense of Numerical Data - Semantic Labelling of Web Tables

With the increasing amount of structured data on the web, the need to understand and support search over this emerging data space is growing. Adding semantics to structured data can help address existing challenges in data discovery, as it facilitates understanding the values in their context. While there are approaches for lifting structured data to Semantic Web formats to enrich it and facilitate discovery, most work to date focuses on textual fields rather than numerical data. In this paper, we propose a two-level (row- and column-based) approach to add semantic meaning to numerical values in tables, called NUMER. We evaluate our approach using a benchmark (NumDB) generated for the purpose of this work. We show the influence of the different levels of analysis on the success of assigning semantic labels to numerical values in tables. Our approach outperforms the state of the art and is less affected by data structure and quality issues such as a small number of entities or deviations in the data.

Emilia Kacprzak, José M. Giménez-García, Alessandro Piscopo, Laura Koesten, Luis-Daniel Ibáñez, Jeni Tennison, Elena Simperl
Towards Enriching DBpedia from Vertical Enumerative Structures Using a Distant Learning Approach

Automatic construction of semantic resources at large scale usually relies on general-purpose corpora such as Wikipedia. This resource, by nature rich in encyclopedic knowledge, exposes part of this knowledge through strongly structured elements (infoboxes, categories, etc.). Several extractors have targeted these structures in order to enrich or populate semantic resources such as DBpedia, YAGO or BabelNet. The remaining semi-structured textual structures, such as vertical enumerative structures (those using typographic and dispositional layout), have been under-exploited. Yet they are frequent in corpora and are rich sources of specific semantic relations, such as hypernymy. This paper presents a distant learning approach for extracting hypernym relations from vertical enumerative structures of Wikipedia, with the aim of enriching DBpedia. Our relation extraction approach achieves an overall precision of 62%, and 99% of the extracted relations can enrich DBpedia, with respect to a reference corpus.

Mouna Kamel, Cassia Trojahn
The Utility of the Abstract Relational Model and Attribute Paths in SQL

It is well-known that querying information is difficult for domain experts, for they are not familiar with querying actual relational schemata due to the notions of primary and foreign keys and the various ways of representing and storing information in a relational database. To overcome these problems, the Abstract Relational Model and the query language, SQLP, have been proposed. They are the theoretical foundations and ensure that explicit primary and foreign keys are hidden from the user’s view and that queries can be expressed more compactly. In this paper we evaluate these theoretical advantages with user studies that compare SQLP to plain SQL as the baseline. The experiments show significant statistical evidence that SQLP indeed requires less time for understanding and authoring queries, with no loss in accuracy. Considering the positive results, we develop a method to reverse engineer legacy relational schemata into abstract relational ones.

Weicong Ma, C. Maria Keet, Wayne Oldford, David Toman, Grant Weddell
Support and Centrality: Learning Weights for Knowledge Graph Embedding Models

Computing knowledge graph (KG) embeddings is a technique to learn distributional representations for components of a knowledge graph while preserving structural information. The learned embeddings can be used in multiple downstream tasks such as question answering, information extraction, query expansion, semantic similarity, and information retrieval. Over the past years, multiple embedding techniques have been proposed based on different underlying assumptions. The most actively researched models are translation-based, which treat relations as translation operations in a shared (or relation-specific) space. Interestingly, almost all KG embedding models treat each triple equally, despite the fact that the contribution of each triple to the global information content differs substantially. Many triples can be inferred from others, while some triples are the foundational (basis) statements that constitute a knowledge graph, thereby supporting other triples. Hence, in order to learn a suitable embedding model, each triple should be treated differently with respect to its information content. Here, we propose a data-driven approach to measure the information content of each triple with respect to the whole knowledge graph by using rule mining and PageRank. We show how to compute triple-specific weights to improve the performance of three KG embedding models (TransE, TransR and HolE). Link prediction tasks on two standard datasets, FB15K and WN18, show the effectiveness of our weighted KG embedding models over other more complex models. In fact, for FB15K our TransE-RW embedding model outperforms models such as TransE, TransM, TransH, and TransR by at least 12.98% on Mean Rank and at least 1.45% on HIT@10. Our HolE-RW model also outperforms HolE and ComplEx by at least 14.3% on MRR and about 30.4% on HIT@1 on FB15K. Finally, TransR-RW shows an improvement over TransR of 3.90% on Mean Rank and 0.87% on HIT@10.

Gengchen Mai, Krzysztof Janowicz, Bo Yan
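
The abstract builds on the well-known TransE scoring function, ||h + r - t||, and proposes weighting each triple's contribution to the training objective. Below is a minimal sketch of a weighted margin ranking loss for TransE; the entity names, embedding dimensionality and the fixed weight value are illustrative assumptions, and in the paper the weight would be derived from rule mining and PageRank rather than set by hand.

    import numpy as np

    def transe_score(h, r, t):
        # TransE plausibility: a smaller ||h + r - t|| means a more plausible triple.
        return np.linalg.norm(h + r - t)

    def weighted_margin_loss(pos, neg, weight, margin=1.0):
        # A per-triple weight scales the standard margin ranking loss.
        # The weighting scheme here is only a placeholder, not the authors' formulation.
        h, r, t = pos
        h_n, r_n, t_n = neg
        return weight * max(0.0, margin + transe_score(h, r, t) - transe_score(h_n, r_n, t_n))

    # Hypothetical toy example with 3-dimensional embeddings.
    rng = np.random.default_rng(0)
    emb = {name: rng.normal(size=3) for name in ["Paris", "capitalOf", "France", "Berlin"]}
    pos = (emb["Paris"], emb["capitalOf"], emb["France"])
    neg = (emb["Berlin"], emb["capitalOf"], emb["France"])   # corrupted head
    print(weighted_margin_loss(pos, neg, weight=0.8))
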
OmniScience and Extensions – Lessons Learned from Designing a Multi-domain, Multi-use Case Knowledge Representation System

With growing research across scientific domains and increasing daily publication volumes, it is essential to provide our users at Elsevier with up-to-date, comprehensive and to-the-point data. One of the key aspects of that offer is to have a global Knowledge Organization System (KOS) overarching scientific branches but also going deep enough into each domain to provide rich annotation or classification capacities. Knowing that the endeavor of creating one global "ontology of everything" is a utopia, we designed a dual/multi-vocabulary model where domain-specific extensions can be used in conjunction with a high-to-mid-level KOS covering the broad spectrum of scientific research. In this paper, we present our design model along with our updating procedure and the lessons learned in different use cases: the Evise submission system, the Topic Pages project and a Semantic Annotation Proof of Concept experiment in the field of Engineering.

Véronique Malaisé, Anke Otten, Pascal Coupet
A Semantic Use Case Simulation Framework for Training Machine Learning Algorithms

To train autonomous agents, large training data sets are required to provide the necessary support in solving real-world problems. In domains such as healthcare or ambient assisted living, such training sets are often incomplete or do not cover the unique requirements and constraints of specific use cases, leading to the cold-start problem. This work describes a semantic simulation framework that generates qualitative, use-case-specific data for Machine-Learning (ML) driven agents, thus solving the cold-start problem. By combining simulated data with axiomatically formalized use case requirements, we are able to train ML algorithms without real-world data at hand. We integrate domain-specific guidelines and their semantic representation by using SHACL/RDF(S) and SPARQL CONSTRUCT queries. The main benefits of this approach are (1) portability to other domains, (2) applicability to various ML algorithms, and (3) mitigation of the cold-start problem or sparse data.

Nicole Merkle, Stefan Zander, Viliam Simko
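
The framework's use of SPARQL CONSTRUCT queries to materialise use-case-specific facts can be illustrated with rdflib. The sketch below is a toy guess at that idea, not the authors' actual queries or ontology: the namespace, the Room/OverheatingEvent vocabulary and the temperature threshold are all assumptions.

    from rdflib import Graph, Namespace, Literal, RDF

    # Hypothetical mini-ontology for an ambient-assisted-living guideline:
    # if a room's temperature reading exceeds a threshold, materialise an alert event.
    EX = Namespace("http://example.org/aal#")
    g = Graph()
    g.add((EX.livingRoom, RDF.type, EX.Room))
    g.add((EX.livingRoom, EX.hasTemperature, Literal(31.5)))

    construct = """
    PREFIX ex: <http://example.org/aal#>
    CONSTRUCT { ?room ex:triggers [ a ex:OverheatingEvent ; ex:observedIn ?room ] }
    WHERE     { ?room a ex:Room ; ex:hasTemperature ?t . FILTER(?t > 28.0) }
    """
    for triple in g.query(construct):
        print(triple)   # simulated training facts derived from the guideline
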
KnIGHT: Mapping Privacy Policies to GDPR

Although the use of apps and online services comes with accompanying privacy policies, a majority of end-users ignore them due to their length, complexity and unappealing presentation, despite the potential risks. In light of the now enforced EU-wide General Data Protection Regulation (GDPR), we present an automatic technique for mapping privacy policy excerpts to relevant GDPR articles so as to support average users in understanding their usage risks and rights as data subjects. KnIGHT (Know your rIGHTs) is a tool that finds candidate sentences in a privacy policy that are potentially related to specific articles in the GDPR. The approach employs semantic text matching in order to find the most appropriate GDPR paragraph, and to the best of our knowledge is one of the first automatic attempts of its kind applied to a company's policy. Our evaluation shows that on average between 70% and 90% of the tool's automatic mappings are at least partially correct, meaning that the tool can significantly guide human comprehension. Following this result, in the future we will utilize domain-specific vocabularies to perform a deeper semantic analysis and improve the results further.

Najmeh Mousavi Nejad, Simon Scerri, Jens Lehmann
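
The abstract describes semantic text matching between policy sentences and GDPR articles. The paper's exact matching method is not spelled out here, so the sketch below uses a simple TF-IDF cosine-similarity baseline as a stand-in; the article snippets and the policy sentence are invented examples.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    # Hypothetical snippets; real inputs would be policy sentences and GDPR article texts.
    gdpr_articles = {
        "Art. 17 Right to erasure": "The data subject shall have the right to obtain erasure of personal data concerning him or her.",
        "Art. 20 Data portability": "The data subject shall have the right to receive the personal data concerning him or her.",
    }
    policy_sentence = "You may ask us to delete the personal information we hold about you."

    vectorizer = TfidfVectorizer(stop_words="english")
    doc_matrix = vectorizer.fit_transform(list(gdpr_articles.values()) + [policy_sentence])
    article_vecs = doc_matrix[: len(gdpr_articles)]
    sentence_vec = doc_matrix[len(gdpr_articles) :]

    scores = cosine_similarity(sentence_vec, article_vecs).ravel()
    best = max(zip(scores, gdpr_articles), key=lambda pair: pair[0])
    print(best)   # (similarity, best-matching article)
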
Automating Class/Instance Representational Choices in Knowledge Bases

We present a method for making decisions as to whether an entity in a knowledge base should be a class or an instance based on external evidence in the form of corresponding textual corpora such as Wikipedia articles. The approach, based on machine classification of the text, avoids the need for feature engineering and provides valuable guidance when building or refining large knowledge bases. The approach works well over different domains and outperforms a variety of other state-of-the-art approaches.

Ankur Padia, David Martin, Peter F. Patel-Schneider
Comparative Preferences in SPARQL

Sometimes one does not want all the solutions to a query but instead only those that are most desirable according to user-specified preferences. If a user-specified preference relation is acyclic then its specification and meaning are straightforward. In many settings, however, it is valuable to support preference relations that are not acyclic and that might not even be transitive, in which case their handling involves some open questions. We discuss a definition of desired solutions for arbitrary preference relations and show its desirable properties. We modify a previous extension to SPARQL for simple preferences to correctly handle any preference relation and provide translations of this extension back into SPARQL that can compute the desired solutions for all preference relations that are acyclic or transitive. We also propose a further extension that returns solutions at multiple levels of desirability, which adds expressiveness over prior work. However, for the latter we conjecture that an effective translation to a single (non-recursive) SPARQL query is not possible.

Peter F. Patel-Schneider, Axel Polleres, David Martin
Interplay of Game Incentives, Player Profiles and Task Difficulty in Games with a Purpose

How can multiple factors be taken into account when evaluating a Game with a Purpose (GWAP)? How is player behaviour or participation influenced by different incentives? How does player engagement impact their accuracy in solving tasks? In this paper, we present a detailed investigation of multiple factors affecting the evaluation of a GWAP and we show how they impact the achieved results. We inform our study with the experimental assessment of a GWAP designed to solve a multinomial classification task.

Gloria Re Calegari, Irene Celino
Inferring Types on Large Datasets Applying Ontology Class Hierarchy Classifiers: The DBpedia Case

Adding type information to resources belonging to large knowledge graphs is a challenging task, especially for those that are generated collaboratively, such as DBpedia, which usually contain errors and noise produced during the transformation process from different data sources. It is important to assign the correct type(s) to resources in order to efficiently exploit the information provided by the dataset. In this work we explore how machine learning classification models can be applied to solve this issue, relying on the information defined by the ontology class hierarchy. We have applied our approaches to DBpedia and compared them to the state of the art, using a per-level analysis. We also define metrics to measure the quality of the results. Our results show that this approach is able to assign 56% more new types, with higher precision and recall than the current DBpedia state of the art.

Mariano Rico, Idafen Santana-Pérez, Pedro Pozo-Jiménez, Asunción Gómez-Pérez
A Framework for Tackling Myopia in Concept Learning on the Web of Data

A prominent class of supervised methods for the representations adopted in the context of the Web of Data is designed to solve concept learning problems. Such methods aim at approximating an intensional definition for a target concept from a set of individuals of a target knowledge base. In this scenario, most of the well-known solutions exploit a separate-and-conquer approach: intuitively, the learning algorithm builds an intensional definition by repeatedly specializing a partial solution with the aim of covering as many positive examples as possible. Essentially, such a strategy can be regarded as a form of hill-climbing search that can produce sub-optimal solutions. To cope with this problem, we propose a novel framework for the concept learning problem called DL-Focl. Three versions of this algorithmic solution, built upon DL-Foil, have been designed to tackle the inherent myopia of separate-and-conquer strategies. Their implementation has been empirically tested against methods available in the DL-Learner suite, showing interesting results.

Giuseppe Rizzo, Nicola Fanizzi, Claudia d’Amato, Floriana Esposito
Boosting Holistic Ontology Matching: Generating Graph Clique-Based Relaxed Reference Alignments for Holistic Evaluation

Ontology matching is the process of finding correspondences between entities from different ontologies. While the field has developed considerably over the last decades, most existing approaches are still limited to pairwise matching. However, in complex domains where several ontologies describing different but related aspects of the domain have to be linked together, matching multiple ontologies simultaneously, known as holistic matching, is required. In the absence of benchmarks dedicated to holistic matching evaluation, this paper presents a methodology for constructing pseudo-holistic reference alignments from available pairwise ones. We discuss the problem of relaxing graph cliques representing these alignments involving a different number of ontologies. We argue that fostering the development of holistic matching approaches depends on the availability of such data sets. We run our experiments on the OAEI Conference data set.

Philippe Roussille, Imen Megdiche, Olivier Teste, Cassia Trojahn
Prominence and Dominance in Networks

Topographic prominence and dominance were recently developed to quantify the relative importance of mountain peaks. Instead of simply using the height to characterize a mountain, they provide a more meaningful description based on vertical and horizontal distances in the neighborhood. In this paper, we propose structural prominence and dominance for networks, an adaptation of the topographic measures, for the detection of nodes with strong local importance. We create a network "landscape" generated by a node's height and its distance to other nodes in the network. We ground our proposed measures on the task of predicting award winners with high and sustainable impact in a co-authorship network. Our experiments show that our measures provide information about a graph that is not provided by other graph measures.

Andreas Schmidt, Gerd Stumme
Deploying Spatial-Stream Query Answering in C-ITS Scenarios

Cooperative Intelligent Transport Systems (C-ITS) play an important role in providing the means to collect and exchange spatio-temporal data via V2X communication between vehicles and the infrastructure, which will be used for the deployment of (semi-)autonomous vehicles. The Local Dynamic Map (LDM) is a key concept for integrating static and streamed data in a spatial context. The LDM has been semantically enhanced to allow for an elaborate domain model that is captured by a mobility ontology, and for queries over data streams that cater for semantic concepts and spatial relationships. We show how this approach can be extended to address a wider range of use cases in three C-ITS scenarios: traffic statistics, event detection, and advanced driving assistance systems. We define requirements for them derived from necessary domain-specific features and, based on these, report on the extension of our query language with temporal relations, delaying, numeric predictions and trajectory predictions. An experimental evaluation of queries that reflect the requirements, using a real-world traffic simulation tool, provides evidence for the feasibility and efficiency of our approach in the new scenarios.

Thomas Eiter, Ryutaro Ichise, Josiane Xavier Parreira, Patrik Schneider, Lihua Zhao
Metaproperty-Guided Deletion from the Instance-Level of a Knowledge Base

Engineering metaproperties of concepts is a well-known ontology modeling technique. Some metaproperties of concepts describe the dynamics of concept instances, i.e. how instances can and cannot be altered. We investigate how deletions in an ontology-based knowledge base interact with the metaproperties rigidity and dependence. A particularly useful effect is delete cascades. We evaluate how rigidity and dependence may guide delete cascades in an engineering application. A case study in the area of product development shows that, beyond explicitly defined deletions, our approach achieves further automated and desirable deletions of facts with high precision and good recall.

Claudia Schon, Steffen Staab, Patricia Kügler, Philipp Kestel, Benjamin Schleich, Sandro Wartzack
On Extracting Relations Using Distributional Semantics and a Tree Generalization

Extracting relations from unstructured text is essential for a wide range of applications. Minimal human effort, scalability and high precision are desirable characteristics. We introduce a distantly supervised closed relation extraction approach based on distributional semantics and a tree generalization. Our approach uses training data obtained from a reference knowledge base to derive dependency parse trees that might express a relation. It then uses a novel generalization algorithm to construct dependency tree patterns for the relation. Distributional semantics are used to eliminate false candidate patterns. We evaluate the performance in experiments on a large corpus using ninety target relations. Our evaluation results suggest that our approach achieves a higher precision than two state-of-the-art systems. Moreover, our results also underpin the scalability of our approach. Our open source implementation can be found at https://github.com/dice-group/Ocelot.

René Speck, Axel-Cyrille Ngonga Ngomo
A Query Model for Ontology-Based Event Processing over RDF Streams

Stream Reasoning (SR) envisioned, investigated and proved the possibility of making sense of streaming data in real time. Now, the community is investigating more powerful solutions, realizing the vision of expressive stream reasoning. Ontology-Based Event Processing (OBEP) is our contribution to this field. OBEP combines Description Logics and Event Recognition Languages. It allows describing events either as logical statements or as complex event patterns, and it captures their occurrences over ontology streams. In this paper, we define OBEP's query model, we present a language to define OBEP queries, and we explain the language semantics.

Riccardo Tommasini, Pieter Bonte, Emanuele Della Valle, Femke Ongenae, Filip De Turck
A Random Walk Model for Entity Relatedness

Semantic relatedness is a critical measure for a wide variety of applications nowadays. Numerous models, including path-based ones, have been proposed for this task with great success in many applications during the last few years. Among these applications, many require computing semantic relatedness between hundreds of pairs of items as part of their regular input. This scenario demands a computationally efficient model to process hundreds of queries in short time spans. Unfortunately, path-based models are computationally challenging, creating large bottlenecks when facing these circumstances. Current approaches for reducing this computation have focused on limiting the number of paths to consider between entities. In contrast, we claim that a semantic relatedness model based on random walks is a better alternative for handling the computational cost. To this end, we developed a model based on the well-studied Katz score. Our model addresses the scalability issues of path-based models by pre-computing relatedness for all pairs of vertices in the knowledge graph beforehand and providing them when needed at query time. Our current findings demonstrate that our model performs competitively with path-based models while being computationally efficient for high-demand applications.

Pablo Torres-Tramón, Conor Hayes
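
The Katz score mentioned in the abstract has a well-known closed form, K = (I - beta*A)^(-1) - I, which sums the contributions of walks of every length, damped by beta. A toy NumPy sketch of this all-pairs pre-computation follows; the graph, the choice of beta and the dense matrix inverse are illustrative only, since a real knowledge graph would be far too large for this naive formulation.

    import numpy as np

    def katz_relatedness(adj, beta=None):
        """Closed-form Katz index: K = (I - beta*A)^(-1) - I,
        valid when beta < 1 / lambda_max(A)."""
        adj = np.asarray(adj, dtype=float)
        if beta is None:
            lam_max = max(abs(np.linalg.eigvals(adj)))
            beta = 0.5 / lam_max          # safely inside the convergence radius
        n = adj.shape[0]
        return np.linalg.inv(np.eye(n) - beta * adj) - np.eye(n)

    # Toy undirected graph: a path 0 - 1 - 2 - 3.
    A = np.array([[0, 1, 0, 0],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [0, 0, 1, 0]], dtype=float)
    K = katz_relatedness(A)
    print(K[0, 3])   # relatedness of nodes 0 and 3, accumulated over longer walks
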
Slicing and Dicing a Newspaper Corpus for Historical Ecology Research

Historical newspapers are a novel source of information for historical ecologists to study the interactions between humans and animals through time and space. Newspaper archives are particularly interesting to analyse because of their breadth and depth. However, the size and occasional noisiness of such archives also bring difficulties, as manual analysis is impossible. In this paper, we present experiments and results on automatic query expansion and categorisation for the perception of animal species between 1800 and 1940. For query expansion and for the manual annotation process, we used lexicons. For the categorisation we trained a Support Vector Machine model. Our results indicate that we can distinguish newspaper articles that are about animal species from those that are not with an F1 score of 0.92, and subcategorise the different types of newspaper articles on animals with an F1 of up to 0.84.

Marieke van Erp, Jesse de Does, Katrien Depuydt, Rob Lenders, Thomas van Goethem
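
The categorisation step described in the abstract, an SVM trained to separate articles about animal species from the rest, can be sketched with a standard scikit-learn pipeline. The training snippets and labels below are invented stand-ins for digitised newspaper text, and TF-IDF features with a linear SVM are an assumption about the setup, not the authors' exact configuration.

    from sklearn.pipeline import make_pipeline
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.svm import LinearSVC

    # Hypothetical training snippets standing in for digitised newspaper articles.
    texts = [
        "Wolves were sighted near the village and attacked the sheep.",
        "The city council debated the new railway budget.",
        "Otters returned to the river after the cold winter.",
        "Grain prices rose sharply at the market this week.",
    ]
    labels = ["animal", "other", "animal", "other"]

    clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
    clf.fit(texts, labels)
    print(clf.predict(["A fox was seen crossing the frozen canal."]))
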
Using SPARQL – The Practitioners’ Viewpoint

A number of studies have analyzed SPARQL log data to draw conclusions about how SPARQL is being used. To complement this work, a survey of SPARQL users has been undertaken. Whilst confirming some of the conclusions of the previous studies, the current work is able to provide additional insight into how users create SPARQL queries, the difficulties they encounter, and the features they would like to see included in the language. Based on this insight, a number of recommendations are presented to the community. These relate to predicting and avoiding computationally expensive queries; extensions to the language; and extending the search paradigm.

Paul Warren, Paul Mulholland

In-Use Papers

Frontmatter
Combining Machine Learning and Semantics for Anomaly Detection

The emergence of the Internet of Things and stream processing forces large-scale organizations to consider anomaly detection as a key component of their business. Using machine learning to solve such complex use cases is generally a cumbersome, costly, time-consuming and error-prone process. It involves many tasks, from data cleansing to dimension reduction, algorithm selection and fine-tuning. It also requires the involvement of various experts such as statisticians, programmers and testers. With RAMSSES, we remove the burden of this pipeline and demonstrate that these tasks can be automated. Our system leverages a Lambda architecture based on Apache Spark to analyze historical data, perform cleansing and deal with the curse of dimensionality. Then, it identifies the most interesting attributes and uses a continuous semantic query generator executed over streams. The sampled data are processed by self-selected machine learning methods to detect anomalies, and an iterative process using end-user annotations significantly improves the accuracy of the system. After a description of RAMSSES's main components, the performance and relevance of the system are demonstrated via a thorough evaluation over real-world and synthetic datasets.

Badre Belabbess, Musab Bairat, Jeremy Lhez, Olivier Curé
EROSO: Semantic Technologies Towards Thermal Comfort in Workplaces

Thermal comfort in workplaces not only has a direct impact on occupants' working efficiency, but also on their morale and health. Therefore, there is a need to establish HVAC (Heating, Ventilation and Air Conditioning) control strategies that ensure comfortable thermal situations in these environments. KDD (Knowledge Discovery in Databases) processes may be used to calculate optimal HVAC control strategies that could ensure thermal comfort within a workplace. This paper presents EROSO (thERmal cOmfort SOlution), a framework that combines KDD processes and Semantic Technologies for ensuring thermal comfort in workplaces. Specifically, this paper focuses on EROSO's approach to supporting the KDD interpretation phase, where Semantic Technologies are used to obtain an explanation of the predictive model's temperature predictions with regard to the thermal comfort regulations they satisfy. Furthermore, this result interpretation supports facility managers in the task of selecting the optimal HVAC control strategies. The EROSO framework is implemented in a real workplace and compared with an already existing solution implemented in the same physical scenario. Results show that Semantic Technologies make the proposed solution more usable and extensible, as well as ensuring a thermally comfortable situation throughout the working day.

Iker Esnaola-Gonzalez, Jesús Bermúdez, Izaskun Fernández, Aitor Arnaiz
Divided We Stand Out! Forging Cohorts fOr Numeric Outlier Detection in Large Scale Knowledge Graphs (CONOD)

With the recent advances in data integration and the concept of data lakes, massive pools of heterogeneous data are being curated as Knowledge Graphs (KGs). In addition to data collection, it is of utmost importance to gain meaningful insights from this composite data. However, given the graph-like representation, the multimodal nature, and the large size of the data, most traditional analytic approaches are no longer directly applicable. A traditional approach could collect all values of a particular attribute, e.g. height, and try to perform anomaly detection for this attribute. However, it is conceptually inaccurate to compare one attribute across different kinds of entities, e.g. the height of buildings against the height of animals. Therefore, there is a strong need to develop fundamentally new approaches for outlier detection in KGs. In this paper, we present a scalable approach, dubbed CONOD, that can deal with multimodal data and performs adaptive outlier detection against the cohorts of classes the entities belong to, where a cohort is a set of classes that are similar based on a set of selected properties. We have tested the scalability of CONOD on KGs of different sizes, assessed the outliers using different inspection methods and achieved promising results.

Hajira Jabeen, Rajjat Dadwal, Gezim Sejdiu, Jens Lehmann
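
The key idea in the abstract, detecting numeric outliers per cohort rather than over a whole attribute, can be sketched with pandas. The cohorts, entities and values below are invented, and the interquartile-range rule is a common heuristic used here as a stand-in for CONOD's actual detection method.

    import pandas as pd

    # Hypothetical attribute values: heights of entities from two cohorts.
    # In CONOD, cohorts are derived from class similarity over selected properties;
    # here they are simply given. The giraffe value is a deliberately injected error.
    df = pd.DataFrame({
        "entity": ["Shed", "Eiffel Tower", "Burj Khalifa",
                   "Mouse", "Cat", "Dog", "Horse", "Giraffe"],
        "cohort": ["building", "building", "building",
                   "animal", "animal", "animal", "animal", "animal"],
        "height": [3.0, 330.0, 828.0, 0.05, 0.25, 0.6, 1.6, 550.0],
    })

    def iqr_outliers(group, k=1.5):
        # Flag values outside the usual interquartile-range fence, per cohort.
        q1, q3 = group["height"].quantile([0.25, 0.75])
        iqr = q3 - q1
        mask = (group["height"] < q1 - k * iqr) | (group["height"] > q3 + k * iqr)
        return group[mask]

    outliers = df.groupby("cohort", group_keys=False).apply(iqr_outliers)
    print(outliers)   # flags the giraffe's 550.0, which a global IQR check over all heights would miss
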
Decision Support Models to Assist in the Diagnosis of Meningitis

Meningitis diagnosis is a challenge, especially in less developed countries where medical resources are limited and the cost of treatment is not always affordable. For this reason, it would be desirable to have a solution that could perform early diagnosis of meningitis to find the suitable treatment, at least for the more severe types of this disease (bacterial, meningococcal, …). In this paper, we present a set of clinical decision support models to assist physicians in meningitis diagnosis. These models try to answer the following two research questions: Can it be reliably diagnosed whether a patient has meningitis? Can it be determined whether it is a bacterial or aseptic case? To explore the performance of our models, we have conducted validation experiments with a dataset of meningitis diagnoses of patients in Brazil. The database was provided by the Directorate of Health Information of the Secretary of Health of the Brazilian State of Bahia and contained over 16,000 records. Several indexes have been computed to show the model accuracy; the best corresponds to the ADTree classifier, with a precision of 0.859 and a ROC area over 0.86. Validation results show a good performance of the models, suggesting therefore that our proposal can effectively support physicians' decisions on meningitis management and treatment.

Viviane M. Lelis, María-Victoria Belmonte, Eduardo Guzmán

Position Papers

Frontmatter
A Framework to Conduct and Report on Empirical User Studies in Semantic Web Contexts

Semantic Web technologies are being applied to increasingly diverse areas where user involvement is crucial. While a number of user interfaces for Semantic Web systems have become available in the past years, their evaluation and reporting often still suffer from weaknesses. Empirical evaluations are essential to compare different approaches, demonstrate their benefits and reveal their drawbacks, and thus to facilitate further adoption of Semantic Web technologies. In this paper, we review empirical user studies of user interfaces, visualizations and interaction techniques recently published at relevant Semantic Web venues, assessing both the user studies themselves and their reporting. We then chart the design space of available methods for user studies in Semantic Web contexts. Finally, we propose a framework for their comprehensive reporting, taking into consideration user expertise, experimental setup, task design, experimental procedures and results analysis.

Catia Pesquita, Valentina Ivanova, Steffen Lohmann, Patrick Lambrix
Backmatter
Metadata
Title
Knowledge Engineering and Knowledge Management
Editors
Catherine Faron Zucker
Chiara Ghidini
Amedeo Napoli
Yannick Toussaint
Copyright Year
2018
Electronic ISBN
978-3-030-03667-6
Print ISBN
978-3-030-03666-9
DOI
https://doi.org/10.1007/978-3-030-03667-6
