
2020 | Book

The Semantic Web

17th International Conference, ESWC 2020, Heraklion, Crete, Greece, May 31–June 4, 2020, Proceedings

Editors: Andreas Harth, Sabrina Kirrane, Axel-Cyrille Ngonga Ngomo, Heiko Paulheim, Anisa Rula, Anna Lisa Gentile, Peter Haase, Michael Cochez

Publisher: Springer International Publishing

Book Series: Lecture Notes in Computer Science


About this book

This book constitutes the refereed proceedings of the 17th Extended Semantic Web Conference, ESWC 2020, held in Heraklion, Crete, Greece.*

The 39 revised full papers presented were carefully reviewed and selected from 166 submissions. The papers were submitted to three tracks: the research track, the resource track and the in-use track. These tracks showcase research and development activities, services and applications, and innovative research outcomes making their way into industry. The research track caters for both long-standing and emerging research topics in the form of the following subtracks: ontologies and reasoning; natural language processing and information retrieval; semantic data management and data infrastructures; social and human aspects of the Semantic Web; machine learning; distribution and decentralization; science of science; security, privacy, licensing and trust; knowledge graphs; and integration, services and APIs.

*The conference was held virtually due to the COVID-19 pandemic.

Chapter ‘Piveau: A Large-Scale Open Data Management Platform Based on Semantic Web Technologies’ is available open access under a Creative Commons Attribution 4.0 International License via link.springer.com.

Table of Contents

Frontmatter

Ontologies and Reasoning

Frontmatter
Handling Impossible Derivations During Stream Reasoning

With the rapid expansion of the Web and the advent of the Internet of Things, there is a growing need to design tools for intelligent analytics and decision making on streams of data. Logic-based frameworks like LARS allow the execution of complex reasoning on such streams, but it is paramount that the computation is completed in a timely manner before the stream expires. To reduce the runtime, we can extend the validity of inferred conclusions to the future to avoid repeated derivations, but this is not enough to avoid all sources of redundant computation. To further alleviate this problem, this paper introduces a new technique that infers the impossibility of certain derivations in the future and blocks the reasoner from performing computation that is doomed to fail anyway. An experimental analysis on microbenchmarks shows that our technique leads to a significant reduction of the reasoning runtime.

Hamid R. Bazoobandi, Henri Bal, Frank van Harmelen, Jacopo Urbani
Modular Graphical Ontology Engineering Evaluated

Ontology engineering is traditionally a complex and time-consuming process, requiring intimate knowledge of description logics and the ability to predict non-local effects of different ontological commitments. Pattern-based modular ontology engineering, coupled with a graphical modeling paradigm, can help make ontology engineering accessible to modellers with limited ontology expertise. We have developed CoModIDE, the Comprehensive Modular Ontology IDE, to develop and explore such a modeling approach. In this paper we present an evaluation of the CoModIDE tool, with a set of 21 subjects carrying out some typical modeling tasks. Our findings indicate that using CoModIDE improves task completion rate and reduces task completion time, compared to using standard Protégé. Further, our subjects report higher System Usability Scale (SUS) evaluation scores for CoModIDE than for Protégé. The subjects also report some room for improvement in the CoModIDE tool – notably, these comments all concern comparatively shallow UI bugs or issues, rather than limitations inherent in the proposed modeling method itself. We deduce that our modeling approach is viable, and propose some consequences for ontology engineering tool development.

Cogan Shimizu, Karl Hammar, Pascal Hitzler
Fast and Exact Rule Mining with AMIE 3

Given a knowledge base (KB), rule mining finds rules such as “If two people are married, then they live (most likely) in the same place”. Due to the exponential search space, rule mining approaches still have difficulty scaling to today’s large KBs. In this paper, we present AMIE 3, a system that employs a number of sophisticated pruning strategies and optimizations. This allows the system to mine rules on large KBs in a matter of minutes. Most importantly, we do not have to resort to approximations or sampling, but are able to compute the exact confidence and support of each rule. Our experiments on DBpedia, YAGO, and Wikidata show that AMIE 3 beats the state of the art by a factor of more than 15 in terms of runtime.

Jonathan Lajus, Luis Galárraga, Fabian Suchanek
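
To make the rule example in the abstract above concrete, here is a minimal sketch (not AMIE 3 itself; the toy facts and rule are invented for illustration) of how the support and confidence of such a Horn rule can be computed over a small KB:

```python
# Sketch: support and confidence of the rule
#   marriedTo(x, y) AND livesIn(y, z)  =>  livesIn(x, z)
# over an invented toy knowledge base of (subject, predicate, object) facts.

kb = {
    ("anna", "marriedTo", "bob"),
    ("anna", "livesIn", "paris"),
    ("bob", "livesIn", "paris"),
    ("carl", "marriedTo", "dora"),
    ("carl", "livesIn", "rome"),
    ("dora", "livesIn", "madrid"),
}

def facts(pred):
    return {(s, o) for s, p, o in kb if p == pred}

married = facts("marriedTo")
lives = facts("livesIn")

# Body instantiations: all (x, z) pairs for which some y satisfies the body.
body = {(x, z) for x, y in married for y2, z in lives if y == y2}
# Support: body instantiations whose head atom livesIn(x, z) is also in the KB.
support = {(x, z) for x, z in body if (x, z) in lives}

confidence = len(support) / len(body) if body else 0.0
print(f"support={len(support)}, confidence={confidence:.2f}")
# Here body = {(anna, paris), (carl, madrid)} and only (anna, paris) is in the
# KB, so support = 1 and confidence = 0.50.
```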
A Simple Method for Inducing Class Taxonomies in Knowledge Graphs

The rise of knowledge graphs as a medium for storing and organizing large amounts of data has spurred research interest in automated methods for reasoning with and extracting information from this representation of data. One area which seems to receive less attention is that of inducing a class taxonomy from such graphs. Ontologies, which provide the axiomatic foundation on which knowledge graphs are built, are often governed by a set of class subsumption axioms. These class subsumptions form a class taxonomy which hierarchically organizes the type classes present in the knowledge graph. Manually creating and curating these class taxonomies oftentimes requires expert knowledge and is time costly, especially in large-scale knowledge graphs. Thus, methods capable of inducing the class taxonomy from the knowledge graph data automatically are an appealing solution to the problem. In this paper, we propose a simple method for inducing class taxonomies from knowledge graphs that is scalable to large datasets. Our method borrows ideas from tag hierarchy induction methods, relying on class frequencies and co-occurrences, such that it requires no information outside the knowledge graph’s triple representation. We demonstrate the use of our method on three real-world datasets and compare our results with existing tag hierarchy induction methods. We show that our proposed method outperforms existing tag hierarchy induction methods, although both perform well when applied to knowledge graphs.

Marcin Pietrasik, Marek Reformat
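
The following toy sketch illustrates the tag-hierarchy-style idea the abstract above refers to (my simplification, not the paper’s exact algorithm): using only class frequencies and co-occurrences, propose that a class is subsumed by a more frequent class that almost always accompanies it.

```python
# Sketch: induce class subsumptions from type co-occurrences. If most entities
# typed with B also carry type A, and A is more frequent overall, propose
# "B rdfs:subClassOf A". Entities, types, and the threshold are invented.
from collections import Counter
from itertools import combinations

entity_types = {
    "e1": {"Agent", "Person", "Actor"},
    "e2": {"Agent", "Person"},
    "e3": {"Agent", "Person", "Actor"},
    "e4": {"Agent", "Organisation"},
}

freq = Counter(t for types in entity_types.values() for t in types)
cooc = Counter()
for types in entity_types.values():
    for a, b in combinations(sorted(types), 2):
        cooc[(a, b)] += 1
        cooc[(b, a)] += 1

THRESHOLD = 0.8  # minimum conditional probability P(parent | child)
for child in freq:
    for parent in freq:
        if parent == child or freq[parent] <= freq[child]:
            continue
        p_parent_given_child = cooc[(parent, child)] / freq[child]
        if p_parent_given_child >= THRESHOLD:
            print(f"{child} rdfs:subClassOf {parent}  (p={p_parent_given_child:.2f})")
```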
Hybrid Reasoning Over Large Knowledge Bases Using On-The-Fly Knowledge Extraction

The success of logic-based methods for comparing entities heavily depends on the axioms that have been described for them in the Knowledge Base (KB). Due to the incompleteness of even large and well engineered KBs, such methods suffer from low recall when applied in real-world use cases. To address this, we designed a reasoning framework that combines logic-based subsumption with statistical methods for on-the-fly knowledge extraction. Statistical methods extract additional (missing) axioms for the compared entities with the goal of tackling the incompleteness of KBs and thus improving recall. Although this can be beneficial, it can also introduce noise (false positives or false negatives). Hence, our framework uses heuristics to assess whether knowledge extraction is likely to be advantageous and only activates the statistical components if this is the case. We instantiate our framework by combining lightweight logic-based reasoning implemented on top of existing triple-stores with an axiom extraction method that is based on the labels of concepts. Our work was motivated by industrial use cases over which we evaluate our instantiated framework, showing that it outperforms approaches that are only based on textual information. Besides the best combination of precision and recall, our implementation is also scalable and is currently used in an industrial production environment.

Giorgos Stoilos, Damir Juric, Szymon Wartak, Claudia Schulz, Mohammad Khodadadi

Natural Language Processing and Information Retrieval

Frontmatter
Partial Domain Adaptation for Relation Extraction Based on Adversarial Learning

Relation extraction methods based on domain adaptation have begun to be applied extensively in specific domains to alleviate the pressure of an insufficient annotated corpus, as they enable learning from the training data of a related domain. However, negative transfer may occur during the adaptation process due to differences in data distribution between domains. Besides, it is difficult to achieve a fine-grained alignment of relation categories without fully mining the multi-mode data structure. Furthermore, as a common application scenario, partial domain adaptation (PDA) refers to domain adaptation where the relation class set of a specific domain is a subset of that of the related domain. In this case, outliers belonging to the related domain will reduce the performance of the model. To solve these problems, a novel model based on a multi-adversarial module for partial domain adaptation (MAPDA) is proposed in this study. We design a weight mechanism to mitigate the impact of noise samples and outlier categories, and embed several adversarial networks to realize various category alignments between domains. Experimental results demonstrate that our proposed model significantly improves the state-of-the-art performance of relation extraction implemented in domain adaptation.

Xiaofei Cao, Juan Yang, Xiangbin Meng
SASOBUS: Semi-automatic Sentiment Domain Ontology Building Using Synsets

In this paper, a semi-automatic approach for building a sentiment domain ontology is proposed. Unlike other methods, this research makes use of synsets in term extraction, concept formation, and concept subsumption. Using several state-of-the-art hybrid aspect-based sentiment analysis methods like Ont + CABASC and Ont + LCR-Rot-hop on a standard dataset, the accuracies obtained with the semi-automatically built ontology are slightly lower than with the manually built one (from approximately 87% to 84%). However, the user time needed for building the ontology is reduced by more than half (from 7 h to 3 h), thus showing the usefulness of this work. This is particularly useful for domains for which sentiment ontologies are not yet available.

Ewelina Dera, Flavius Frasincar, Kim Schouten, Lisa Zhuang
Keyword Search over RDF Using Document-Centric Information Retrieval Systems

For ordinary users, the task of accessing knowledge graphs through structured query languages like SPARQL is rather demanding. As a result, various approaches exploit the simpler and widely used keyword-based search paradigm, either by translating keyword queries to structured queries, or by adopting classical information retrieval (IR) techniques. In this paper, we study and adapt Elasticsearch, an out-of-the-box document-centric IR system, for supporting keyword search over RDF datasets. Contrary to other works that mainly retrieve entities, we opt for retrieving triples, due to their expressiveness and informativeness. We specify the set of functional requirements and study the emerging questions related to the selection and weighting of the triple data to index, and the structuring and ranking of the retrieved results. Finally, we perform an extensive evaluation of the different factors that affect the IR performance for four different query types. The reported results are promising and offer useful insights on how different Elasticsearch configurations affect the retrieval effectiveness and efficiency.

Giorgos Kadilierakis, Pavlos Fafalios, Panagiotis Papadakos, Yannis Tzitzikas
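
A minimal sketch of the triple-centric indexing idea described in the abstract above (not the paper’s configuration): each RDF triple becomes one Elasticsearch document with separate fields, and keyword queries are run over those fields. It assumes a locally running Elasticsearch instance and the 8.x Python client; the index name, field names, and boosts are made up for the example.

```python
# Index each triple as a document, then search with a multi_match keyword query.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

triples = [
    ("dbr:Heraklion", "dbo:country", "dbr:Greece"),
    ("dbr:Heraklion", "rdfs:label", "Heraklion"),
]

for i, (s, p, o) in enumerate(triples):
    doc = {"subject": s, "predicate": p, "object": o,
           "text": f"{s} {p} {o}"}           # catch-all field for keyword matching
    es.index(index="rdf-triples", id=i, document=doc)

res = es.search(index="rdf-triples",
                query={"multi_match": {"query": "Heraklion Greece",
                                       "fields": ["subject", "object^2", "text"]}})
for hit in res["hits"]["hits"]:
    print(hit["_score"], hit["_source"])
```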
Entity Linking and Lexico-Semantic Patterns for Ontology Learning

Ontology learning from a text written in natural language is a well-studied domain. However, the applicability of techniques for ontology learning from natural language texts is strongly dependent on the characteristics of the text corpus and the language used. In this paper, we present our work so far in entity linking and enhancing the ontology with extracted relations between concepts. We discuss the benefits of adequately designed lexico-semantic patterns in ontology learning. We propose a preliminary set of lexico-semantic patterns designed for the Czech language to learn new relations between concepts in the related domain ontology in a semi-supervised approach. We utilize data from the urban planning and development domain to evaluate the introduced technique. As a partial prototypical implementation of the stack, we present Annotace, a text annotation service that provides links between the ontology model and the textual documents in Czech.

Lama Saeeda, Michal Med, Martin Ledvinka, Miroslav Blaško, Petr Křemen

Semantic Data Management and Data Infrastructures

Frontmatter
Estimating Characteristic Sets for RDF Dataset Profiles Based on Sampling

RDF dataset profiles provide a formal representation of a dataset’s characteristics (features). These profiles may cover various aspects of the data represented in the dataset as well as statistical descriptors of the data distribution. In this work, we focus on the characteristic sets profile feature summarizing the characteristic sets contained in an RDF graph. As this type of feature provides detailed information on both the structure and semantics of RDF graphs, it can be very beneficial in query optimization. However, in decentralized query processing, computing them is challenging as it is difficult and/or costly to access and process all datasets. To overcome this shortcoming, we propose the concept of a profile feature estimation. We present sampling methods and projection functions to generate estimations which aim to be as similar as possible to the original characteristic sets profile feature. In our evaluation, we investigate the feasibility of the proposed methods on four RDF graphs. Our results show that samples containing 0.5% of the entities in the graph allow for good estimations and may be used by downstream tasks such as query plan optimization in decentralized querying.

Lars Heling, Maribel Acosta
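
A simplified sketch of the feature discussed in the abstract above (assumptions mine, not the paper’s method): the characteristic set of a subject is the set of predicates it uses, and counts computed on an entity sample are projected to the full graph by scaling with the inverse sampling rate.

```python
# Compute characteristic sets exactly and estimate them from a subject sample.
import random
from collections import Counter

triples = [
    ("s1", "name", "A"), ("s1", "knows", "s2"),
    ("s2", "name", "B"), ("s2", "knows", "s3"),
    ("s3", "name", "C"),
]

def characteristic_sets(triples):
    preds = {}
    for s, p, _ in triples:
        preds.setdefault(s, set()).add(p)
    return Counter(frozenset(ps) for ps in preds.values())

exact = characteristic_sets(triples)

# Sample 2 of the 3 subjects and project the counts (scale factor = 3/2).
subjects = sorted({s for s, _, _ in triples})
sample = set(random.sample(subjects, 2))
sampled = characteristic_sets([t for t in triples if t[0] in sample])
estimate = {cs: count * len(subjects) / len(sample) for cs, count in sampled.items()}

print("exact   :", dict(exact))
print("estimate:", estimate)
```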

Social and Human Aspects of the Semantic Web

Frontmatter
SchemaTree: Maximum-Likelihood Property Recommendation for Wikidata

Wikidata is a free and open knowledge base which can be read and edited by both humans and machines. It acts as a central storage for the structured data of several Wikimedia projects. To improve the process of manually inserting new facts, the Wikidata platform features an association rule-based tool to recommend additional suitable properties. In this work, we introduce a novel approach to provide such recommendations based on frequentist inference. We introduce a trie-based method that can efficiently learn and represent property set probabilities in RDF graphs. We extend the method by adding type information to improve recommendation precision and introduce backoff strategies which further increase the performance of the initial approach for entities with rare property combinations. We investigate how the captured structure can be employed for property recommendation, analogously to the Wikidata PropertySuggester. We evaluate our approach on the full Wikidata dataset and compare its performance to the state-of-the-art Wikidata PropertySuggester, outperforming it in all evaluated metrics. Notably, we could reduce the average rank of the first relevant recommendation by 71%.

Lars C. Gleim, Rafael Schimassek, Dominik Hüser, Maximilian Peters, Christoph Krämer, Michael Cochez, Stefan Decker
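
A minimal sketch (not the trie-based SchemaTree structure itself) of the maximum-likelihood recommendation idea in the abstract above: given the properties already present on an entity, candidate properties are ranked by how often they co-occur with that property set in training data. Property IDs and the toy training sets are invented.

```python
# Rank candidate properties by the ML estimate P(candidate | observed properties).
from collections import Counter

training_property_sets = [
    {"P31", "P21", "P569"},          # e.g. instance-of, sex, date of birth
    {"P31", "P21", "P569", "P106"},  # ... plus occupation
    {"P31", "P21", "P106"},
    {"P31", "P571"},
]

def recommend(observed, data, top_k=3):
    scores = Counter()
    supporting = [ps for ps in data if observed <= ps]   # sets containing all observed props
    for ps in supporting:
        for p in ps - observed:
            scores[p] += 1
    n = len(supporting)
    return [(p, round(c / n, 2)) for p, c in scores.most_common(top_k)] if n else []

print(recommend({"P31", "P21"}, training_property_sets))
# e.g. [('P569', 0.67), ('P106', 0.67)] -- both occur in 2 of the 3 supporting sets
```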

Machine Learning

Frontmatter
Hyperbolic Knowledge Graph Embeddings for Knowledge Base Completion

Learning embeddings of entities and relations existing in knowledge bases allows the discovery of hidden patterns in them. In this work, we examine the contribution of geometrical space to the task of knowledge base completion. We focus on the family of translational models, whose performance has been lagging. We extend these models to the hyperbolic space so as to better reflect the topological properties of knowledge bases. We investigate the type of regularities that our model, dubbed HyperKG, can capture and show that it is a prominent candidate for effectively representing a subset of Datalog rules. We empirically show, using a variety of link prediction datasets, that hyperbolic space allows us to significantly narrow the performance gap between translational and bilinear models and to effectively represent certain types of rules.

Prodromos Kolyvakis, Alexandros Kalousis, Dimitris Kiritsis
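
As background for the abstract above, here is a small illustrative sketch of the Poincaré-ball distance on which hyperbolic embedding models build; the 2-D vectors are made-up examples, not embeddings from the paper.

```python
# Geodesic distance between two points inside the unit Poincaré ball:
# d(u, v) = arcosh(1 + 2*||u - v||^2 / ((1 - ||u||^2) * (1 - ||v||^2)))
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    sq_diff = np.sum((u - v) ** 2)
    denom = (1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2))
    return np.arccosh(1.0 + 2.0 * sq_diff / (denom + eps))

print(poincare_distance([0.1, 0.2], [0.5, -0.3]))  # both points near the origin
print(poincare_distance([0.1, 0.2], [0.9, 0.4]))   # second point near the boundary -> much larger distance
```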
Unsupervised Bootstrapping of Active Learning for Entity Resolution

Entity resolution is one of the central challenges when integrating data from large numbers of data sources. Active learning for entity resolution aims to learn high-quality matching models while minimizing the human labeling effort by selecting only the most informative record pairs for labeling. Most active learning methods proposed so far start with an empty set of labeled record pairs and iteratively improve the prediction quality of a classification model by asking for new labels. The absence of adequate labeled data in the early active learning iterations leads to unstable models of low quality, which is known as the cold start problem. In our work we solve the cold start problem using an unsupervised matching method to bootstrap active learning. We implement a thresholding heuristic that considers pre-calculated similarity scores and assigns matching labels with some degree of noise at no manual labeling cost. The noisy labels are used for initializing the active learning process and throughout the whole active learning cycle for model learning and query selection. We evaluate our pipeline with six datasets from three different entity resolution settings using active learning with a committee-based query strategy and show it successfully deals with the cold start problem. Comparing our method against two active learning baselines without bootstrapping, we show that it can additionally lead to overall improved learned models in terms of F1 score and stability.

Anna Primpeli, Christian Bizer, Margret Keuper
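
The thresholding heuristic mentioned in the abstract above can be sketched as follows (assumptions mine: the thresholds and similarity scores are illustrative, not the paper’s values): very similar pairs get a noisy "match" label, very dissimilar pairs a noisy "non-match" label, and the rest is left for the active learner to query.

```python
# Bootstrap noisy training labels from pre-computed pair similarities.
def bootstrap_labels(pairs_with_scores, upper=0.9, lower=0.3):
    labelled, unlabelled = [], []
    for pair, score in pairs_with_scores:
        if score >= upper:
            labelled.append((pair, 1))   # noisy "match" label, no human cost
        elif score <= lower:
            labelled.append((pair, 0))   # noisy "non-match" label
        else:
            unlabelled.append(pair)      # candidate for active-learning queries
    return labelled, unlabelled

pairs = [(("a1", "b7"), 0.95), (("a2", "b3"), 0.62), (("a4", "b9"), 0.12)]
labelled, unlabelled = bootstrap_labels(pairs)
print(labelled)    # [(('a1', 'b7'), 1), (('a4', 'b9'), 0)]
print(unlabelled)  # [('a2', 'b3')]
```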

Distribution and Decentralization

Frontmatter
Processing SPARQL Aggregate Queries with Web Preemption

Executing aggregate queries on the web of data makes it possible to compute useful statistics, ranging from the number of properties per class in a dataset to the average life span of famous scientists per country. However, processing aggregate queries on public SPARQL endpoints is challenging, mainly due to quota enforcement that prevents queries from delivering complete results. Existing distributed query engines can go beyond quota limitations, but their data transfer and execution times are clearly prohibitive when processing aggregate queries. Following the web preemption model, we define a new preemptable aggregation operator that allows aggregate queries to be suspended and resumed. Web preemption allows query execution to continue beyond quota limits, and server-side aggregation drastically reduces the data transfer and execution time of aggregate queries. Experimental results demonstrate that our approach outperforms existing approaches by orders of magnitude in terms of execution time and the amount of transferred data.

Arnaud Grall, Thomas Minier, Hala Skaf-Molli, Pascal Molli
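
A simplified sketch of why preemptable aggregation works (not the actual engine in the paper above): a COUNT-per-group aggregate can be suspended after a time quantum and resumed later, because the partial per-group state is small and mergeable. The quantum size and solution mappings are invented.

```python
# Suspend/resume a COUNT ... GROUP BY aggregate by carrying partial group counts.
from collections import Counter

def aggregate_with_preemption(solutions, quantum=2, state=None):
    """Consume at most `quantum` solution mappings, then return (state, rest)."""
    state = Counter() if state is None else state
    consumed = solutions[:quantum]
    for group_key in consumed:
        state[group_key] += 1
    return state, solutions[quantum:]

# e.g. group keys of "SELECT ?class (COUNT(?s) AS ?n) ... GROUP BY ?class"
mappings = ["Person", "Person", "Place", "Person", "Place"]

state, rest = aggregate_with_preemption(mappings)           # first quantum
while rest:                                                  # query is resumed
    state, rest = aggregate_with_preemption(rest, state=state)
print(dict(state))   # {'Person': 3, 'Place': 2}
```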

Science of Science

Frontmatter
Embedding-Based Recommendations on Scholarly Knowledge Graphs

The increasing availability of scholarly metadata in the form of Knowledge Graphs (KG) offers opportunities for studying the structure of scholarly communication and the evolution of science. Such KGs build the foundation for knowledge-driven tasks, e.g., link discovery, prediction and entity classification, which enable recommendation services. Knowledge graph embedding (KGE) models have been investigated for such knowledge-driven tasks in different application domains. One application of KGE models is link prediction, which can also be viewed as a foundation for recommendation services; e.g., high-confidence “co-author” links in a scholarly knowledge graph can be seen as suggested collaborations. In this paper, KGEs are reconciled with a specific loss function (Soft Margin) and examined with respect to their performance on the co-authorship link prediction task on scholarly KGs. The results show a significant improvement in the accuracy of the experimented KGE models on the considered scholarly KGs when using this loss. TransE with Soft Margin (TransE-SM) obtains a score of 79.5% Hits@10 for the co-authorship link prediction task, while the original TransE obtains 77.2% on the same task. In terms of accuracy and Hits@10, TransE-SM also outperforms other state-of-the-art embedding models such as ComplEx, ConvE and RotatE in this setting. The predicted co-authorship links have been validated by evaluating the profiles of the scholars.

Mojtaba Nayyeri, Sahar Vahdati, Xiaotian Zhou, Hamed Shariat Yazdi, Jens Lehmann
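
As background only: the sketch below shows the standard TransE score and margin-based ranking loss that the TransE-SM variant in the abstract above starts from; the Soft Margin loss itself is the paper’s contribution and is not reproduced here. The embedding vectors are random toy values.

```python
# Standard TransE: score(h, r, t) = ||h + r - t||, trained with a margin loss.
import numpy as np

rng = np.random.default_rng(0)
dim = 8
h, r, t = rng.normal(size=dim), rng.normal(size=dim), rng.normal(size=dim)
t_neg = rng.normal(size=dim)                      # corrupted (negative) tail

def transe_score(h, r, t):
    """Lower is better: distance between translated head and tail."""
    return np.linalg.norm(h + r - t)

def margin_ranking_loss(pos, neg, margin=1.0):
    return max(0.0, margin + pos - neg)

pos, neg = transe_score(h, r, t), transe_score(h, r, t_neg)
print(f"positive score={pos:.3f}, negative score={neg:.3f}, "
      f"loss={margin_ranking_loss(pos, neg):.3f}")
```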
Investigating Software Usage in the Social Sciences: A Knowledge Graph Approach

Knowledge about the software used in scientific investigations is necessary for different reasons, including provenance of the results, measuring software impact to attribute developers, and bibliometric software citation analysis in general. Additionally, providing information about whether and how the software and the source code are available allows an assessment about the state and role of open source software in science in general. While such analyses can be done manually, large scale analyses require the application of automated methods of information extraction and linking. In this paper, we present SoftwareKG—a knowledge graph that contains information about software mentions from more than 51,000 scientific articles from the social sciences. A silver standard corpus, created by a distant and weak supervision approach, and a gold standard corpus, created by manual annotation, were used to train an LSTM based neural network to identify software mentions in scientific articles. The model achieves an F-score of 0.82 for exact matches. As a result, we identified more than 133,000 software mentions. For entity disambiguation, we used the public domain knowledge base DBpedia. Furthermore, we linked the entities of the knowledge graph to other knowledge bases such as the Microsoft Academic Knowledge Graph, the Software Ontology, and Wikidata. Finally, we illustrate how SoftwareKG can be used to assess the role of software in the social sciences.

David Schindler, Benjamin Zapilko, Frank Krüger
Fostering Scientific Meta-analyses with Knowledge Graphs: A Case-Study

A meta-analysis is a Science of Science method widely used in the medical and social sciences to review, aggregate and quantitatively synthesise a body of studies that address the same research question. With the volume of research growing exponentially every year, conducting meta-analyses can be costly and inefficient, as a significant amount of time and human effort needs to be spent in finding studies meeting the research criteria, annotating them, and properly performing the statistical analyses to summarise the findings. In this work, we show how these issues can be tackled with semantic representations and technologies, using a social science scenario as a case study. We show how the domain-specific content of research outputs can be represented and used to facilitate their search, analysis and synthesis. We present the very first representation of the domain of human cooperation, and the application we built on top of it to help experts perform meta-analyses semi-automatically. Using a few application scenarios, we show how our approach supports the various phases of meta-analyses and, more generally, contributes towards research replication and automated hypothesis generation.

Ilaria Tiddi, Daniel Balliet, Annette ten Teije

Security, Privacy, Licensing and Trust

Frontmatter
SAShA: Semantic-Aware Shilling Attacks on Recommender Systems Exploiting Knowledge Graphs

Recommender systems (RS) play a central role in modern user-centric online services. Among them, collaborative filtering (CF) approaches have shown leading accuracy performance compared to content-based filtering (CBF) methods. Their success is due to an effective exploitation of similarities/correlations encoded in user interaction patterns, which are computed by considering the common items users rated in the past. However, their strength is also their weakness. Indeed, a malicious agent can alter recommendations by adding fake user profiles to the platform, thereby altering the actual similarity values in an engineered way. The spread of well-curated information available in knowledge graphs (KGs) has opened the door to several new possibilities for compromising the security of a recommender system. In fact, KGs are a rich source of information that can dramatically increase the attacker’s (and the defender’s) knowledge of the underlying system. In this paper, we introduce SAShA, a new attack strategy that leverages semantic features extracted from a knowledge graph in order to strengthen the efficacy of the attack against standard CF models. We performed an extensive experimental evaluation in order to investigate whether SAShA is more effective than baseline attacks against CF models, taking into account the impact of various semantic features. Experimental results on two real-world datasets show the effectiveness of our strategy in enhancing the attacker’s capacity to attack CF models.

Vito Walter Anelli, Yashar Deldjoo, Tommaso Di Noia, Eugenio Di Sciascio, Felice Antonio Merra

Knowledge Graphs

Frontmatter
Entity Extraction from Wikipedia List Pages

When it comes to factual knowledge about a wide range of domains, Wikipedia is often the prime source of information on the web. DBpedia and YAGO, as large cross-domain knowledge graphs, encode a subset of that knowledge by creating an entity for each page in Wikipedia, and connecting them through edges. It is well known, however, that Wikipedia-based knowledge graphs are far from complete. Especially, as Wikipedia’s policies permit pages about subjects only if they have a certain popularity, such graphs tend to lack information about less well-known entities. Information about these entities is oftentimes available in the encyclopedia, but not represented as an individual page. In this paper, we present a two-phased approach for the extraction of entities from Wikipedia’s list pages, which have proven to serve as a valuable source of information. In the first phase, we build a large taxonomy from categories and list pages with DBpedia as a backbone. With distant supervision, we extract training data for the identification of new entities in list pages that we use in the second phase to train a classification model. With this approach we extract over 700k new entities and extend DBpedia with 7.5M new type statements and 3.8M new facts of high precision.

Nicolas Heist, Heiko Paulheim
The Knowledge Graph Track at OAEI
Gold Standards, Baselines, and the Golden Hammer Bias

The Ontology Alignment Evaluation Initiative (OAEI) is an annual evaluation of ontology matching tools. In 2018, we started the Knowledge Graph track, whose goal is to evaluate the simultaneous matching of entities and schemas of large-scale knowledge graphs. In this paper, we discuss the design of the track and two different strategies of gold standard creation. We analyze results and experiences obtained in the first editions of the track, and, by revealing a hidden task, we show that all tools submitted to the track (and probably also to other tracks) suffer from a bias which we name the golden hammer bias.

Sven Hertling, Heiko Paulheim
Detecting Synonymous Properties by Shared Data-Driven Definitions

Knowledge graphs have become an essential source of entity-centric information for modern applications. Today’s KGs have reached a size of billions of RDF triples extracted from a variety of sources, including structured sources and text. While this definitely improves completeness, the inherent variety of sources leads to severe heterogeneity, negatively affecting data quality by introducing duplicate information. We present a novel technique for detecting synonymous properties in large knowledge graphs by mining interpretable definitions of properties using association rule mining. Relying on such shared definitions, our technique is able to mine even synonym rules that have little support in the data. In particular, our extensive experiments on DBpedia and Wikidata show that our rule-based approach can outperform state-of-the-art knowledge graph embedding techniques, while offering good interpretability through shared logical rules.

Jan-Christoph Kalo, Stephan Mennicke, Philipp Ehler, Wolf-Tilo Balke
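
A toy sketch of the intuition behind the abstract above (my simplification, not the paper’s exact mining procedure): approximate the "definition" of a property by the other properties it frequently co-occurs with on the same subjects; two properties that share a definition but rarely appear together on one subject are synonym candidates.

```python
# Flag synonym candidates among properties via shared co-occurrence "definitions".
from collections import defaultdict

subject_props = {
    "s1": {"birthPlace", "birthDate", "occupation"},
    "s2": {"placeOfBirth", "birthDate", "occupation"},
    "s3": {"birthPlace", "birthDate"},
    "s4": {"placeOfBirth", "birthDate"},
}

cooc = defaultdict(lambda: defaultdict(int))
freq = defaultdict(int)
for props in subject_props.values():
    for p in props:
        freq[p] += 1
        for q in props:
            if p != q:
                cooc[p][q] += 1

def definition(p, min_conf=0.9):
    """Properties that co-occur with p in at least min_conf of p's subjects."""
    return frozenset(q for q, c in cooc[p].items() if c / freq[p] >= min_conf)

props = list(freq)
for i, p in enumerate(props):
    for q in props[i + 1:]:
        def_p, def_q = definition(p), definition(q)
        rarely_together = cooc[p][q] / min(freq[p], freq[q]) < 0.1
        if def_p and def_p == def_q and rarely_together:
            print(f"synonym candidate: {p} ~ {q}")
# Prints "birthPlace ~ placeOfBirth": same definition, never on the same subject.
```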
Entity Summarization with User Feedback

Semantic Web applications have benefited from entity summarization techniques, which compute a compact summary for an entity by selecting a set of key triples from the underlying data. A wide variety of entity summarizers have been developed. However, the quality of the summaries they generate is still not satisfactory, and we lack mechanisms for improving computed summaries. To address this challenge, in this paper we present the first study of entity summarization with user feedback. We consider a cooperative environment where a user reads the current entity summary and provides feedback to help an entity summarizer compute an improved summary. Our approach represents this iterative process as a Markov decision process where the entity summarizer is modeled as a reinforcement learning agent. To exploit user feedback, we represent the interdependence of triples in the current summary and the user feedback by a novel deep neural network which is incorporated into the policy of the agent. Our approach outperforms five baseline methods in extensive experiments with both real users and simulated users.

Qingxia Liu, Yue Chen, Gong Cheng, Evgeny Kharlamov, Junyou Li, Yuzhong Qu
Incremental Multi-source Entity Resolution for Knowledge Graph Completion

We present and evaluate new methods for incremental entity resolution as needed for the completion of knowledge graphs integrating data from multiple sources. Compared to previous approaches, we aim at reducing the dependency on the order in which new sources and entities are added. For this purpose, we consider sets of new entities for an optimized assignment of them to entity clusters. We also propose a light-weight approach to repair entity clusters in order to correct wrong clusters. The new approaches are integrated within the FAMER framework for parallel and scalable entity clustering. A detailed evaluation of the new approaches on real-world workloads shows their high effectiveness. In particular, the repair approach outperforms the other incremental approaches and achieves the same quality as batch-like entity resolution, showing that its results are independent of the order in which new entities are added.

Alieh Saeedi, Eric Peukert, Erhard Rahm
Building Linked Spatio-Temporal Data from Vectorized Historical Maps

Historical maps provide a rich source of information for researchers in the social and natural sciences. These maps contain detailed documentation of a wide variety of natural and human-made features and their changes over time, such as the changes in the transportation networks and the decline of wetlands. It can be labor-intensive for a scientist to analyze changes across space and time in such maps, even after they have been digitized and converted to a vector format. In this paper, we present an unsupervised approach that converts vector data of geographic features extracted from multiple historical maps into linked spatio-temporal data. The resulting graphs can be easily queried and visualized to understand the changes in specific regions over time. We evaluate our technique on railroad network data extracted from USGS historical topographic maps for several regions over multiple map sheets and demonstrate how the automatically constructed linked geospatial data enables effective querying of the changes over different time periods.

Basel Shbita, Craig A. Knoblock, Weiwei Duan, Yao-Yi Chiang, Johannes H. Uhl, Stefan Leyk

Integration, Services and APIs

Frontmatter
QAnswer KG: Designing a Portable Question Answering System over RDF Data

While RDF was designed to make data easily readable by machines, it does not make data easily usable by end-users. Question Answering (QA) over Knowledge Graphs (KGs) is seen as the technology which is able to bridge this gap. It aims to build systems which are capable of extracting the answer to a user’s natural language question from an RDF dataset. In recent years, many approaches were proposed which tackle the problem of QA over KGs. Despite such efforts, it is hard and cumbersome to create a Question Answering system on top of a new RDF dataset. The main open challenge remains portability, i.e., the possibility to apply a QA algorithm easily to new and previously untested RDF datasets. In this publication, we address the problem of portability by presenting an architecture for a portable QA system. We present a novel approach called QAnswer KG, which allows the construction of on-demand QA systems over new RDF datasets. Hence, our approach addresses non-expert users in the QA domain. In this paper, we provide the details of the QA system generation process. We show that it is possible to build a QA system over any RDF dataset while requiring minimal investment in terms of training. We run experiments using three different datasets. To the best of our knowledge, we are the first to design a process for non-expert users. We enable such users to efficiently create an on-demand, scalable, multilingual QA system on top of any RDF dataset.

Dennis Diefenbach, José Giménez-García, Andreas Both, Kamal Singh, Pierre Maret
Equivalent Rewritings on Path Views with Binding Patterns

A view with a binding pattern is a parameterized query on a database. Such views are used, e.g., to model Web services. To answer a query on such views, the views have to be orchestrated together in execution plans. We show how queries can be rewritten into equivalent execution plans, which are guaranteed to deliver the same results as the query on all databases. We provide a correct and complete algorithm to find these plans for path views and atomic queries. Finally, we show that our method can be used to answer queries on real-world Web services.

Julien Romero, Nicoleta Preda, Antoine Amarilli, Fabian Suchanek

Resources

Frontmatter
A Knowledge Graph for Industry 4.0

One of the most crucial tasks for today’s knowledge workers is to get and retain a thorough overview of the latest state of the art. Especially in dynamic and evolving domains, the amount of relevant sources is constantly increasing, updating and overruling previous methods and approaches. For instance, the digital transformation of manufacturing systems, called Industry 4.0, currently faces an overwhelming amount of standardization efforts and reference initiatives, resulting in a complex information environment. We propose a structured dataset in the form of a semantically annotated knowledge graph for Industry 4.0 related standards, norms and reference frameworks. The graph provides a Linked Data-conform collection of annotated, classified reference guidelines supporting newcomers and experts alike in understanding how to implement Industry 4.0 systems. We illustrate the suitability of the graph for various use cases and its already existing applications, present the maintenance process and evaluate its quality.

Sebastian R. Bader, Irlan Grangel-Gonzalez, Priyanka Nanjappa, Maria-Esther Vidal, Maria Maleshkova
MetaLink: A Travel Guide to the LOD Cloud

Graph-based traversal is an important navigation paradigm for the Semantic Web, where datasets are interlinked to provide context. While following links may result in the discovery of complementary data sources and enriched query results, it is widely recognized that traversing the LOD Cloud indiscriminately results in low quality answers. Over the years, approaches have been published that help to determine whether links are trustworthy or not, based on certain criteria. While such approaches are often useful for specific datasets and/or in specific applications, they are not yet widely used in practice or at the scale of the entire LOD Cloud. This paper introduces a new resource called MetaLink. MetaLink is a dataset that contains metadata for a very large set of owl:sameAs links that are crawled from the LOD Cloud. MetaLink encodes a previously published error metric for each of these links. MetaLink is published in combination with LOD-a-lot, a dataset that is based on a large crawl of a subset of the LOD Cloud. By combining MetaLink and LOD-a-lot, applications are able to make informed decisions about whether or not to follow specific links on the LOD Cloud. This paper describes our approach for creating the MetaLink dataset. It describes the vocabulary that it uses and provides an overview of multiple real-world use cases in which the MetaLink dataset can solve non-trivial research and application challenges that were not addressed before.

Wouter Beek, Joe Raad, Erman Acar, Frank van Harmelen
Astrea: Automatic Generation of SHACL Shapes from Ontologies

Knowledge Graphs (KGs) that publish RDF data modelled using ontologies in a wide range of domains have populated the Web. The SHACL language is a W3C recommendation designed to encode value and model data restrictions that aim at validating KG data, ensuring data quality. Developing shapes is a complex and time-consuming task that is not feasible to carry out manually. This article presents two resources that aim at automatically generating SHACL shapes for a set of ontologies: (1) Astrea-KG, a KG that publishes a set of mappings encoding the equivalent conceptual restrictions among ontology constraint patterns and SHACL constraint patterns, and (2) Astrea, a tool that automatically generates SHACL shapes from a set of ontologies by executing the mappings from Astrea-KG. These two resources are openly available at Zenodo, GitHub, and as a web application. In contrast to other proposals, these resources cover a large number of SHACL restrictions, producing both value and model data restrictions, whereas other proposals consider only a limited number of restrictions or focus only on value or model restrictions.

Andrea Cimmino, Alba Fernández-Izquierdo, Raúl García-Castro
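
An illustrative sketch of the general ontology-to-shape mapping idea in the abstract above (not the Astrea mappings themselves): derive a simple SHACL node shape from a property’s rdfs:domain and rdfs:range using rdflib. The example ontology and the ex: namespace are made up.

```python
# Map rdfs:domain to sh:targetClass and rdfs:range to a sh:class constraint.
from rdflib import Graph, Namespace, RDF, RDFS, BNode

EX = Namespace("http://example.org/")
SH = Namespace("http://www.w3.org/ns/shacl#")

onto = Graph()
onto.add((EX.hasAuthor, RDF.type, RDF.Property))
onto.add((EX.hasAuthor, RDFS.domain, EX.Document))
onto.add((EX.hasAuthor, RDFS.range, EX.Person))

shapes = Graph()
shapes.bind("sh", SH)
shapes.bind("ex", EX)
for prop, domain in onto.subject_objects(RDFS.domain):
    shape = EX[f"{prop.split('/')[-1]}Shape"]
    constraint = BNode()
    shapes.add((shape, RDF.type, SH.NodeShape))
    shapes.add((shape, SH.targetClass, domain))      # shape applies to the domain class
    shapes.add((shape, SH.property, constraint))
    shapes.add((constraint, SH.path, prop))
    for rng in onto.objects(prop, RDFS.range):
        shapes.add((constraint, SH["class"], rng))   # range becomes a sh:class constraint

print(shapes.serialize(format="turtle"))
```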
SemTab 2019: Resources to Benchmark Tabular Data to Knowledge Graph Matching Systems

Tabular data to Knowledge Graph matching is the process of assigning semantic tags from knowledge graphs (e.g., Wikidata or DBpedia) to the elements of a table. This task is a challenging problem for various reasons, including the lack of metadata (e.g., table and column names) and the noisiness, heterogeneity, incompleteness and ambiguity of the data. The results of this task provide significant insights about potentially highly valuable tabular data, as recent works have shown, enabling a new family of data analytics and data science applications. Despite a significant amount of work on various flavors of this problem, there is a lack of a common framework to conduct a systematic evaluation of state-of-the-art systems. The creation of the Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab) aims at filling this gap. In this paper, we report on the datasets, infrastructure and lessons learned from the first edition of the SemTab challenge.

Ernesto Jiménez-Ruiz, Oktie Hassanzadeh, Vasilis Efthymiou, Jiaoyan Chen, Kavitha Srinivas
VQuAnDa: Verbalization QUestion ANswering DAtaset

Question Answering (QA) systems over Knowledge Graphs (KGs) aim to provide a concise answer to a given natural language question. Despite the significant evolution of QA methods over the past years, there are still some core lines of work which are lagging behind. This is especially true for methods and datasets that support the verbalization of answers in natural language. Specifically, to the best of our knowledge, none of the existing Question Answering datasets provide any verbalization data for the question-query pairs. Hence, we aim to fill this gap by providing the first QA dataset, VQuAnDa, that includes the verbalization of each answer. We base VQuAnDa on a commonly used large-scale QA dataset – LC-QuAD – in order to support compatibility and continuity with previous work. We complement the dataset with baseline scores for measuring future training and evaluation work, by using a set of standard sequence-to-sequence models and sharing the results of the experiments. This resource empowers researchers to train and evaluate a variety of models to generate answer verbalizations.

Endri Kacupaj, Hamid Zafar, Jens Lehmann, Maria Maleshkova
ESBM: An Entity Summarization BenchMark

Entity summarization is the problem of computing an optimal compact summary for an entity by selecting a size-constrained subset of triples from RDF data. Entity summarization supports a multiplicity of applications and has led to fruitful research. However, there is a lack of evaluation efforts that cover the broad spectrum of existing systems. One reason is a lack of benchmarks for evaluation. Some benchmarks are no longer available, while others are small and have limitations. In this paper, we create an Entity Summarization BenchMark (ESBM) which overcomes the limitations of existing benchmarks and meets standard desiderata for a benchmark. Using this largest available benchmark for evaluating general-purpose entity summarizers, we perform the most extensive experiment to date where 9 existing systems are compared. Considering that all of these systems are unsupervised, we also implement and evaluate a supervised learning based system for reference.

Qingxia Liu, Gong Cheng, Kalpa Gunaratna, Yuzhong Qu
GEval: A Modular and Extensible Evaluation Framework for Graph Embedding Techniques

While RDF data are graph shaped by nature, most traditional Machine Learning (ML) algorithms expect data in vector form. To transform graph elements to vectors, several graph embedding approaches have been proposed. Comparing these approaches is interesting for (1) developers of new embedding techniques to verify in which cases their proposal outperforms the state of the art, and (2) consumers of these techniques in choosing the best approach according to the task(s) the vectors will be used for. The comparison can be delayed (and made difficult) by the choice of tasks, the design of the evaluation, and the selection of models, parameters, and needed datasets. We propose GEval, an evaluation framework to simplify the evaluation and the comparison of graph embedding techniques. The covered tasks range from ML tasks (Classification, Regression, Clustering) and semantic tasks (entity relatedness, document similarity) to semantic analogies. However, GEval is designed to be (easily) extensible. In this article, we describe the design and development of the proposed framework by detailing its overall structure, the already implemented tasks, and how to extend it. In conclusion, to demonstrate its operating approach, we consider the parameter tuning of the KGloVe algorithm as a use case.

Maria Angela Pellegrino, Abdulrahman Altabba, Martina Garofalo, Petar Ristoski, Michael Cochez
YAGO 4: A Reason-able Knowledge Base

YAGO is one of the large knowledge bases in the Linked Open Data cloud. In this resource paper, we present its latest version, YAGO 4, which reconciles the rigorous typing and constraints of schema.org with the rich instance data of Wikidata. The resulting resource contains 2 billion type-consistent triples for 64 million entities, and has a consistent ontology that allows semantic reasoning with OWL 2 description logics.

Thomas Pellissier Tanon, Gerhard Weikum, Fabian Suchanek

In-Use

Frontmatter
On Modeling the Physical World as a Collection of Things: The W3C Thing Description Ontology

This document presents the Thing Description ontology, an axiomatization of the W3C Thing Description model. It also introduces an alignment with the Semantic Sensor Network ontology and evaluates how this alignment contributes to semantic interoperability in the Web of Things.

Victor Charpenay, Sebastian Käbisch
Applying Knowledge Graphs as Integrated Semantic Information Model for the Computerized Engineering of Building Automation Systems

During the life cycle of a smart building, an extensive amount of heterogeneous information is required to plan, construct, operate and maintain the building and its technical systems. Traditionally, there is an information gap between the different phases and stakeholders, leading to information being exchanged, processed and stored in a variety of mostly human-readable documents. This paper shows how a knowledge graph can be established as an integrated information model that provides the required information for all phases in a machine-interpretable way. The knowledge graph describes and connects all relevant information, which allows combining and applying it in a holistic way. This makes the knowledge graph a key enabler for a variety of advanced, computerized engineering tasks, ranging from the planning and design phases to the commissioning and operation of a building. The computerized engineering of building automation systems (BAS) with an advanced software tool chain is presented as such a use case in more detail. The knowledge graph is based on standard semantic web technologies and builds on existing ontologies, such as the Brick and QUDT ontologies, with various novel extensions presented in this paper. Special attention is given to the rich semantic definition of the entities, such as the equipment and the typically thousands of datapoints in a BAS, which is achieved by a combination of contextual modeling and semantic tagging.

Henrik Dibowski, Francesco Massa Gray
Supporting Complex Decision Making by Semantic Technologies

Complex decisions require stakeholders to identify potential decision options and collaboratively select the optimal option. Identifying potential decision options and communicating them to stakeholders is a challenging task as it requires the translation of the decision option’s technical dimension to a stakeholder-compliant language which describes the impact of the decision (e.g., financial, political). Existing knowledge-driven decision support methods generate decision options by automatically processing available data and knowledge. Ontology-based methods emerged as a sub-field in the medical domain and provide concrete instructions for given medical problems. However, the research field lacks an evaluated practical approach to support the full cycle from data and knowledge assessment to the actual decision making. This work advances the field by: (i) a problem-driven ontology engineering method which (a) supports creating the necessary ontology model for the given problem domain and (b) harmonizes relevant data and knowledge sources for automatically identifying decision options by reasoners, and (ii) an approach which translates technical decision options into a language that is understood by relevant stakeholders. Expert evaluations and real-world deployments in three different domains demonstrate the added value of this method.

Stefan Fenz

Open Access

Piveau: A Large-Scale Open Data Management Platform Based on Semantic Web Technologies

The publication and (re)utilization of Open Data is still facing multiple barriers on technical, organizational and legal levels. This includes limitations in interfaces, search capabilities, provision of quality information and the lack of definite standards and implementation guidelines. Many Semantic Web specifications and technologies are specifically designed to address the publication of data on the web. In addition, many official publication bodies encourage and foster the development of Open Data standards based on Semantic Web principles. However, no existing solution for managing Open Data takes full advantage of these possibilities and benefits. In this paper, we present our solution “Piveau”, a fully-fledged Open Data management solution, based on Semantic Web technologies. It harnesses a variety of standards, like RDF, DCAT, DQV, and SKOS, to overcome the barriers in Open Data publication. The solution puts a strong focus on assuring data quality and scalability. We give a detailed description of the underlying, highly scalable, service-oriented architecture, how we integrated the aforementioned standards, and used a triplestore as our primary database. We have evaluated our work in a comprehensive feature comparison to established solutions and through a practical application in a production environment, the European Data Portal. Our solution is available as Open Source.

Fabian Kirstein, Kyriakos Stefanidis, Benjamin Dittwald, Simon Dutkowski, Sebastian Urbanek, Manfred Hauswirth
StreamPipes Connect: Semantics-Based Edge Adapters for the IIoT

Accessing continuous time series data from various machines and sensors is a crucial task to enable data-driven decision making in the Industrial Internet of Things (IIoT). However, connecting data from industrial machines to real-time analytics software is still technically complex and time-consuming due to the heterogeneity of protocols, formats and sensor types. To mitigate these challenges, we present StreamPipes Connect, targeted at domain experts to ingest, harmonize, and share time series data as part of our industry-proven open source IIoT analytics toolbox StreamPipes. Our main contributions are (i) a semantic adapter model including automated transformation rules for pre-processing, and (ii) a distributed architecture design to instantiate adapters at edge nodes where the data originates. The evaluation of a conducted user study shows that domain experts are capable of connecting new sources in less than a minute by using our system. The presented solution is publicly available as part of the open source software Apache StreamPipes.

Philipp Zehnder, Patrick Wiener, Tim Straub, Dominik Riemer
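
A conceptual sketch of the "transformation rules for pre-processing" mentioned in the abstract above (not the StreamPipes API; the rules, fields and event contents are invented): an adapter applies declarative rules to each raw sensor event before forwarding it, for example renaming a field and converting a unit.

```python
# Apply a chain of simple transformation rules to incoming sensor events.
def rename_rule(old, new):
    def apply(event):
        event = dict(event)
        event[new] = event.pop(old)
        return event
    return apply

def unit_rule(field, factor, offset=0.0):
    def apply(event):
        event = dict(event)
        event[field] = event[field] * factor + offset
        return event
    return apply

rules = [
    rename_rule("temp", "temperature"),
    unit_rule("temperature", 1.8, 32.0),   # Celsius -> Fahrenheit
]

raw_event = {"sensorId": "press-04", "temp": 21.5, "timestamp": 1590000000}
event = raw_event
for rule in rules:
    event = rule(event)
print(event)   # {'sensorId': 'press-04', 'temperature': 70.7, 'timestamp': 1590000000}
```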
Backmatter
Metadata
Title
The Semantic Web
Editors
Andreas Harth
Sabrina Kirrane
Axel-Cyrille Ngonga Ngomo
Heiko Paulheim
Anisa Rula
Anna Lisa Gentile
Peter Haase
Michael Cochez
Copyright Year
2020
Electronic ISBN
978-3-030-49461-2
Print ISBN
978-3-030-49460-5
DOI
https://doi.org/10.1007/978-3-030-49461-2