
2015 | Book

The Semantic Web - ISWC 2015

14th International Semantic Web Conference, Bethlehem, PA, USA, October 11-15, 2015, Proceedings, Part I

Edited by: Marcelo Arenas, Oscar Corcho, Elena Simperl, Markus Strohmaier, Mathieu d'Aquin, Kavitha Srinivas, Paul Groth, Michel Dumontier, Jeff Heflin, Krishnaprasad Thirunarayan, Steffen Staab

Publisher: Springer International Publishing

Book series: Lecture Notes in Computer Science


About this book

The two-volume set LNCS 9366 and 9367 constitutes the refereed proceedings of the 14th International Semantic Web Conference, ISWC 2015, held in Bethlehem, PA, USA, in October 2015.

The International Semantic Web Conference is the premier forum for Semantic Web research, where cutting edge scientific results and technological innovations are presented, where problems and solutions are discussed, and where the future of this vision is being developed. It brings together specialists in fields such as artificial intelligence, databases, social networks, distributed computing, Web engineering, information systems, human-computer interaction, natural language processing, and the social sciences.

The papers cover topics such as querying with SPARQL; querying linked data; linked data; ontology-based data access; ontology alignment; reasoning; instance matching, entity resolution and topic generation; RDF data dynamics; ontology extraction and generation; knowledge graphs and scientific data publication; ontology instance alignment; knowledge graphs; data processing, IoT, sensors; archiving and publishing scientific data; IoT and sensors; experiments; evaluation; and empirical studies.

Part 1 (LNCS 9366) contains a total of 38 papers which were presented in the research track. They were carefully reviewed and selected from 172 submissions.

Part 2 (LNCS 9367) contains 14 papers from the in-use and software track, 8 papers from the datasets and ontologies track, and 7 papers from the empirical studies and experiments track, selected, respectively, from 33, 35, and 23 submissions.

Table of Contents

Frontmatter

Querying with SPARQL

Frontmatter
SPARQL with Property Paths

The original SPARQL proposal was often criticized for its inability to navigate through the structure of RDF documents. For this reason property paths were introduced in SPARQL 1.1, but to date there are no theoretical studies examining how their addition to the language affects main computational tasks such as query evaluation, query containment, and query subsumption. In this paper we tackle all of these problems and show that although the addition of property paths has no impact on query evaluation, they do make the containment and subsumption problems substantially more difficult.

Egor V. Kostylev, Juan L. Reutter, Miguel Romero, Domagoj Vrgoč
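
To make the navigational feature concrete, here is a minimal sketch of a SPARQL 1.1 property-path query, evaluated with Python's rdflib; the tiny example graph and the ex: namespace are illustrative assumptions, not material from the paper.

```python
# A minimal property-path sketch (illustrative data, not from the paper).
from rdflib import Graph

g = Graph()
g.parse(data="""
    @prefix ex: <http://example.org/> .
    ex:alice ex:knows ex:bob .
    ex:bob   ex:knows ex:carol .
""", format="turtle")

# ex:knows+ navigates one or more ex:knows edges -- the kind of path
# expression whose evaluation and containment complexity the paper studies.
query = """
    PREFIX ex: <http://example.org/>
    SELECT ?reachable WHERE { ex:alice ex:knows+ ?reachable . }
"""
for row in g.query(query):
    print(row.reachable)  # prints ex:bob, then ex:carol
```
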
Recursion in SPARQL

In this paper we propose a general purpose recursion operator to be added to SPARQL, formalize its syntax and develop algorithms for evaluating it in practical scenarios. We also show how to implement recursion as a plug-in on top of existing systems and test its performance on several real world datasets.

Juan L. Reutter, Adrián Soto, Domagoj Vrgoč
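
One way to read the "recursion as a plug-in on top of existing systems" idea is as a client-side fixpoint loop. The sketch below is a hedged illustration of that general pattern; the transitive-closure query and the rdflib usage are our assumptions, not the paper's operator or syntax.

```python
# Hedged sketch: emulate recursion by iterating a CONSTRUCT query until no
# new triples are derived (a fixpoint). Illustrative only.
from rdflib import Graph

def fixpoint_construct(g: Graph, construct_query: str, max_rounds: int = 1000) -> Graph:
    for _ in range(max_rounds):
        before = len(g)
        for triple in g.query(construct_query):  # CONSTRUCT results iterate as triples
            g.add(triple)
        if len(g) == before:  # nothing new was derived: fixpoint reached
            break
    return g

# Example recursive rule: transitive closure of ex:partOf.
closure_query = """
    PREFIX ex: <http://example.org/>
    CONSTRUCT { ?x ex:partOf ?z }
    WHERE     { ?x ex:partOf ?y . ?y ex:partOf ?z . }
"""
```
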
Federated SPARQL Queries Processing with Replicated Fragments

Federated query engines provide a unified query interface to federations of SPARQL endpoints. Replicating data fragments from different Linked Data sources facilitates data re-organization to better fit the federated query processing needs of data consumers. However, existing federated query engines are not designed to support replication, and replicated data can negatively impact their performance. In this paper, we formulate the source selection problem with fragment replication (SSP-FR). For a given set of endpoints with replicated fragments and a SPARQL query, the problem is to select the endpoints that minimize the number of tuples to be transferred. We devise the Fedra source selection algorithm that approximates SSP-FR. We implement Fedra in the state-of-the-art federated query engines FedX and ANAPSID, and empirically evaluate their performance. Experimental results suggest that Fedra efficiently solves SSP-FR, reducing the number of selected SPARQL endpoints as well as the size of query intermediate results.

Gabriela Montoya, Hala Skaf-Molli, Pascal Molli, Maria-Esther Vidal
FEASIBLE: A Feature-Based SPARQL Benchmark Generation Framework

Benchmarking is indispensable when aiming to assess technologies with respect to their suitability for given tasks. While several benchmarks and benchmark generation frameworks have been developed to evaluate triple stores, they mostly provide a one-fits-all solution to the benchmarking problem. This approach to benchmarking is, however, unsuitable for evaluating the performance of a triple store for a given application with particular requirements. We address this drawback by presenting FEASIBLE, an automatic approach for the generation of benchmarks out of the query history of applications, i.e., query logs. The generation is achieved by selecting prototypical queries of a user-defined size from the input set of queries. We evaluate our approach on two query logs and show that the benchmarks it generates are accurate approximations of the input query logs. Moreover, we compare four different triple stores with benchmarks generated using our approach and show that they behave differently based on the data they contain and the types of queries posed. Our results suggest that FEASIBLE generates better sample queries than the state of the art. In addition, the better query selection and the larger set of query types used lead to triple store rankings which partly differ from the rankings generated by previous works.

Muhammad Saleem, Qaiser Mehmood, Axel-Cyrille Ngonga Ngomo
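
As a rough, hypothetical illustration of feature-based benchmark selection (the features and the k-means/medoid rule below are our own simplifications, not FEASIBLE's actual algorithm), one can map each logged query to a feature vector and pick the queries nearest to cluster centroids as prototypes:

```python
# Hedged sketch of selecting prototypical queries from a query log.
import numpy as np

def featurize(query: str) -> np.ndarray:
    # Crude syntactic features of a SPARQL query string (illustrative only).
    q = query.upper()
    return np.array([
        q.count("{"),              # rough proxy for group graph patterns
        q.count("FILTER"),
        q.count("OPTIONAL"),
        q.count("UNION"),
        float("ORDER BY" in q),
    ], dtype=float)

def select_benchmark(queries: list, k: int) -> list:
    # Assumes len(queries) >= k.
    X = np.array([featurize(q) for q in queries])
    X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-9)   # normalize features
    rng = np.random.default_rng(0)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(20):                                  # plain k-means
        labels = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1).argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centroids[j] = X[labels == j].mean(axis=0)
    # Return the query closest to each centroid as a "prototypical" query.
    picks = [int(((X - c) ** 2).sum(-1).argmin()) for c in centroids]
    return [queries[i] for i in picks]
```
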

Querying Linked Data

Frontmatter
LDQL: A Query Language for the Web of Linked Data

The Web of Linked Data is composed of tons of RDF documents interlinked with each other, forming a huge repository of distributed semantic data. Effectively querying this distributed data source is an important open problem in the Semantic Web area. In this paper, we propose LDQL, a declarative language to query Linked Data on the Web. One of the novelties of LDQL is that it expresses separately (i) patterns that describe the expected query result, and (ii) Web navigation paths that select the data sources to be used for computing the result. We present a formal syntax and semantics, prove equivalence rules, and study the expressiveness of the language. In particular, we show that LDQL is strictly more expressive than the query formalisms previously proposed for Linked Data on the Web. The high expressiveness allows LDQL to define queries whose complete execution is not computationally feasible over the Web. We formally study this issue and provide a sufficient syntactic condition to avoid this problem; queries satisfying this condition are guaranteed to have a procedure to be effectively evaluated over the Web of Linked Data.

Olaf Hartig, Jorge Pérez
Opportunistic Linked Data Querying Through Approximate Membership Metadata

Between URI dereferencing and the SPARQL protocol lies a largely unexplored axis of possible interfaces to Linked Data, each with its own combination of trade-offs. One of these interfaces is Triple Pattern Fragments, which allows clients to execute SPARQL queries against low-cost servers, at the cost of higher bandwidth. Increasing a client's efficiency means lowering the number of requests, which can, among other means, be achieved through additional metadata in responses. We noted that typical SPARQL query evaluations against Triple Pattern Fragments require a significant portion of membership subqueries, which check the presence of a specific triple rather than a variable pattern. This paper studies the impact of providing approximate membership functions, i.e., Bloom filters and Golomb-coded sets, as extra metadata. In addition to reducing HTTP requests, such functions allow clients to achieve full result recall earlier when temporarily allowing lower precision. Half of the tested queries from a WatDiv benchmark test set could be executed with up to a third fewer HTTP requests, with only marginally higher server cost. Query times, however, did not improve, likely due to slower metadata generation and transfer. This indicates that approximate membership functions can partly improve the client-side query process with minimal impact on the server and its interface.

Miel Vander Sande, Ruben Verborgh, Joachim Van Herwegen, Erik Mannens, Rik Van de Walle
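
The key property being exploited is that approximate membership functions never yield false negatives. A self-contained Bloom-filter sketch (our own toy implementation, not the paper's code) shows how a client could skip membership requests:

```python
# Toy Bloom filter: a "definitely absent" answer is always correct, so a
# client can safely skip an HTTP membership request for such triples;
# a "possibly present" answer may be a false positive and still needs a request.
import hashlib

class BloomFilter:
    def __init__(self, size_bits: int = 1024, num_hashes: int = 3):
        self.m, self.k = size_bits, num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item: str):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

# Filter received as response metadata; triples serialized as strings (assumed).
server_filter = BloomFilter()
server_filter.add("<ex:alice> <ex:knows> <ex:bob>")

if "<ex:alice> <ex:knows> <ex:carol>" not in server_filter:
    print("triple definitely absent: skip the HTTP membership request")
```
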
Networks of Linked Data Eddies: An Adaptive Web Query Processing Engine for RDF Data

Client-side query processing techniques that rely on the materialization of fragments of the original RDF dataset provide a promising solution for Web query processing. However, because of unexpected data transfers, the traditional optimize-then-execute paradigm used by existing approaches is not always applicable in this context, i.e., the performance of client-side execution plans can be negatively affected by live conditions where the rate at which data arrive from sources changes. We tackle adaptivity for client-side query processing, and present a network of Linked Data Eddies that is able to adjust query execution schedulers to data availability and runtime conditions. Experimental studies suggest that the network of Linked Data Eddies outperforms static Web query schedulers in scenarios with unpredictable transfer delays and data distributions.

Maribel Acosta, Maria-Esther Vidal
Substring Filtering for Low-Cost Linked Data Interfaces

Recently, Triple Pattern Fragments (TPFs) were introduced as a low-cost server-side interface for situations where high numbers of clients need to evaluate SPARQL queries. Scalability is achieved by moving part of the query execution to the client, at the cost of elevated query times. Since the TPF interface purposely does not support complex constructs such as SPARQL filters, queries that use them need to be executed mostly on the client, resulting in long execution times. We therefore investigated the impact of adding a literal substring matching feature to the TPF interface, with the goal of improving query performance while maintaining low server cost. In this paper, we discuss the client/server setup and compare the performance of SPARQL queries on multiple implementations, including Elasticsearch and a case-insensitive FM-index. Our evaluations indicate that these improvements allow for faster query execution without significantly increasing the load on the server. Offering the substring feature on TPF servers allows users to obtain faster responses for filter-based SPARQL queries. Furthermore, substring matching can be used to support other filters such as complete regular expressions or range queries.

Joachim Van Herwegen, Laurens De Vocht, Ruben Verborgh, Erik Mannens, Rik Van de Walle
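
For context, this is the kind of filter-based query that motivates the feature: without server-side substring support, a TPF client must download all candidate triples and evaluate the FILTER locally. The example data and prefix below are assumptions for illustration.

```python
# Illustrative FILTER CONTAINS query (assumed data), evaluated with rdflib.
from rdflib import Graph

g = Graph()
g.parse(data="""
    @prefix ex: <http://example.org/> .
    ex:a ex:label "Heidelberg" .
    ex:b ex:label "Ghent" .
""", format="turtle")

query = """
    PREFIX ex: <http://example.org/>
    SELECT ?s WHERE { ?s ex:label ?l . FILTER CONTAINS(?l, "berg") }
"""
for row in g.query(query):
    print(row.s)  # ex:a only
```
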

Linked Data

Frontmatter
LinkDaViz - Automatic Binding of Linked Data to Visualizations

As the Web of Data is growing steadily, the demand for user-friendly means of exploring, analyzing and visualizing Linked Data is also increasing. The key challenge for visualizing Linked Data consists in providing a clear overview of the data and supporting non-technical users in finding suitable visualizations, while hiding technical details of Linked Data and visualization configuration. In order to accomplish this, we propose a largely automatic workflow which guides users through the process of creating visualizations by automatically categorizing and binding data to visualization parameters. The approach is based on a heuristic analysis of the structure of the input data and a comprehensive visualization model facilitating the automatic binding between data and visualization parameters. The resulting assignments are ranked and presented to the user. With LinkDaViz we provide a web-based implementation of the approach and demonstrate its feasibility through an extended user and performance evaluation.

Klaudia Thellmann, Michael Galkin, Fabrizio Orlandi, Sören Auer
Facilitating Entity Navigation Through Top-K Link Patterns

Entity navigation over Linked Data often follows semantic links using Linked Data browsers. With the increasing volume of Linked Data, the rich and diverse links make it difficult for users to traverse the link graph and find target entities. Besides, a navigation paradigm needs to take into account not only single-entity-oriented transitions, but also entity-set-oriented transitions. To facilitate entity navigation, we propose a novel concept called the link pattern, and introduce the link pattern lattice to organize semantic links when browsing an entity or a set of entities. Furthermore, to help users quickly find target entities, top-K link patterns are selected for entity navigation. The proposed approach is implemented in a prototype system and compared with two Linked Data browsers via a user study. Experimental results show that our approach is effective.

Liang Zheng, Yuzhong Qu, Jidong Jiang, Gong Cheng
Serving DBpedia with DOLCE - More than Just Adding a Cherry on Top

Large knowledge bases, such as DBpedia, are most often created heuristically due to scalability issues. In the building process, both random as well as systematic errors may occur. In this paper, we focus on finding systematic errors, or anti-patterns, in DBpedia. We show that by aligning the DBpedia ontology to the foundational ontology DOLCE-Zero, and by combining reasoning and clustering of the reasoning results, errors affecting millions of statements can be identified at a minimal workload for the knowledge base designer.

Heiko Paulheim, Aldo Gangemi

Ontology-Based Data Access

Frontmatter
Ontology-Based Integration of Cross-Linked Datasets

In this paper we tackle the problem of answering SPARQL queries over virtually integrated databases. We assume that the entity resolution problem has already been solved and explicit information is available about which records in the different databases refer to the same real-world entity. Surprisingly, to the best of our knowledge, there has been no attempt to extend the standard Ontology-Based Data Access (OBDA) setting to take these DB links into account for SPARQL query answering and consistency checking. This is partly because the OWL built-in owl:sameAs property, the most natural representation of links between data sets, is not included in OWL 2 QL, the de facto ontology language for OBDA. We formally treat several fundamental questions in this context: how links over database identifiers can be represented in terms of owl:sameAs statements, how to recover rewritability of SPARQL into SQL (lost because of owl:sameAs statements), and how to check consistency. Moreover, we investigate how our solution can be made to scale up to large enterprise datasets. We have implemented the approach, and carried out an extensive set of experiments showing its scalability.

Diego Calvanese, Martin Giese, Dag Hovland, Martin Rezk
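
To see why owl:sameAs links are manageable at the data level, note that they induce equivalence classes over identifiers. A hedged union-find sketch (our illustration, not the authors' algorithm) canonicalizes linked records so each real-world entity gets a single representative:

```python
# Collapse owl:sameAs links between database identifiers into equivalence
# classes with union-find; identifiers and pairs below are assumed examples.
class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

same_as_links = [("db1:42", "db2:alice"), ("db2:alice", "db3:a-17")]
uf = UnionFind()
for a, b in same_as_links:
    uf.union(a, b)

# All three identifiers now share one canonical representative.
print(uf.find("db1:42") == uf.find("db3:a-17"))  # True
```
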
Mapping Analysis in Ontology-Based Data Access: Algorithms and Complexity

Ontology-based data access (OBDA) is a recent paradigm for accessing data sources through an ontology that acts as a conceptual, integrated view of the data, and declarative mappings that connect the ontology to the data sources. We study the formal analysis of mappings in OBDA. Specifically, we focus on the problem of identifying mapping inconsistency and redundancy, two of the most important anomalies for mappings in OBDA. We consider a wide range of ontology languages that comprises OWL 2 and all its profiles, and examine mapping languages of different expressiveness over relational databases. We provide algorithms and establish tight complexity bounds for the decision problems associated with mapping inconsistency and redundancy. Our results prove that, in our general framework, such forms of mapping analysis enjoy nice computational properties, in the sense that they are not harder than standard reasoning tasks over the ontology or over the relational database schema.

Domenico Lembo, Jose Mora, Riccardo Rosati, Domenico Fabio Savo, Evgenij Thorstensen

Ontology Alignment

Frontmatter
Towards Defeasible Mappings for Tractable Description Logics

We present a novel approach to denote mappings between $\mathcal{EL}$-based ontologies which are defeasible in the sense that such a mapping only applies to individuals if this does not cause an inconsistency. This provides the advantage of handling exceptions automatically, thereby avoiding logical inconsistencies that may be caused by the traditional type of mappings. We consider the case where mappings from many possibly heterogeneous ontologies are one-way links towards an overarching ontology. Questions can then be asked in terms of the concepts in the overarching ontology. We provide the formal semantics for the defeasible mappings and show that reasoning under such a setting is decidable even when the defeasible axioms apply to unknowns. Furthermore, we show that this semantics is strongly related to the idea of answer sets for logic programs.

Kunal Sengupta, Pascal Hitzler
An Algebra of Qualitative Taxonomical Relations for Ontology Alignments

Algebras of relations were shown useful in managing ontology alignments. They make it possible to aggregate alignments disjunctively or conjunctively and to propagate alignments within a network of ontologies. The previously considered algebra of relations contains taxonomical relations between classes. However, compositional inference using this algebra is sound only if we assume that classes which occur in alignments have nonempty extensions. Moreover, this algebra covers relations only between classes. Here we introduce a new algebra of relations, which, first, solves the limitation of the previous one, and second, incorporates all qualitative taxonomical relations that occur between individuals and concepts, including the relations “is a” and “is not”. We prove that this algebra is coherent with respect to the simple semantics of alignments.

Armen Inants, Jérôme Euzenat
CogMap: A Cognitive Support Approach to Property and Instance Alignment

The iterative user interaction approach for data integration proposed by Falconer and Noy can be generalized to consider interactions between integration tools (generators) that generate potential schema mappings and users or analysis tools (analyzers) that select the best mapping. Each such selection then provides high-confidence guidance for the next iteration of the integration tool. We have implemented this generalized approach in CogMap, a matching system for both property and instance alignments between heterogeneous data. The generator in CogMap uses the instance alignment from the previous iteration to create high-quality property alignments and presents these alignments and their consequences to the analyzer. Our experiments show that multiple iterations as well as the interplay between instance and property alignment serve to improve the final alignments.

Jan Nößner, David Martin, Peter Z. Yeh, Peter F. Patel-Schneider
Effective Online Knowledge Graph Fusion

Recently, Web search engines have empowered their search with knowledge graphs to satisfy increasing demands of complex information needs about entities. Each engine offers an online knowledge graph service to display highly relevant information about the query entity in the form of a structured summary called a knowledge card. The cards from different engines might be complementary. Therefore, it is necessary to fuse knowledge cards from these engines to get a comprehensive view. Such a problem can be considered a new branch of ontology alignment, which is actually on-the-fly online data fusion based on the users' needs. In this paper, we present the first effort to work on knowledge card fusion. We propose a novel probabilistic scoring algorithm for card disambiguation to select the most likely entity a card should refer to. We then design a learning-based method to align properties from cards representing the same entity. Finally, we perform value deduplication to group equivalent values of the aligned properties into value clusters. The experimental results show that our approach outperforms state-of-the-art ontology alignment algorithms in terms of precision and recall.

Haofen Wang, Zhijia Fang, Le Zhang, Jeff Z. Pan, Tong Ruan

Reasoning

Frontmatter
Adding DL-Lite TBoxes to Proper Knowledge Bases

Levesque's proper knowledge bases (proper KBs) correspond to infinite sets of ground positive and negative facts, with the notable property that for FOL formulas in a certain normal form, which includes conjunctive queries and positive queries possibly extended with a controlled form of negation, entailment reduces to formula evaluation. However, proper KBs represent extensional knowledge only. In description logic terms, they correspond to ABoxes. In this paper, we augment them with DL-Lite TBoxes, expressing intensional knowledge (i.e., the ontology of the domain). DL-Lite has the notable property that conjunctive query answering over TBoxes and standard description logic ABoxes is reducible to formula evaluation over the ABox only. Here, we investigate whether such a property extends to ABoxes consisting of proper KBs. Specifically, we consider two DL-Lite variants: DL-Lite_rdfs, roughly corresponding to RDFS, and DL-Lite_core, roughly corresponding to OWL 2 QL. We show that when a DL-Lite_rdfs TBox is coupled with a proper KB, the TBox can be compiled away, reducing query answering to evaluation on the proper KB alone. But this reduction is no longer possible when we associate proper KBs with DL-Lite_core TBoxes. Indeed, we show that in the latter case, query answering even for conjunctive queries becomes coNP-hard in data complexity.

Giuseppe De Giacomo, Hector Levesque
R2O2: An Efficient Ranking-Based Reasoner for OWL Ontologies

It has been shown, both theoretically and empirically, that performing core reasoning tasks on large and expressive ontologies in OWL 1 and OWL 2 is time-consuming and resource-intensive. Moreover, due to the different reasoning algorithms and optimisation techniques employed, each reasoner may be efficient for ontologies with different characteristics. In this paper, we present R2O2, a meta-reasoner that automatically combines, ranks and selects from a number of state-of-the-art OWL 2 DL reasoners to achieve high efficiency, making use of performance prediction models and ranking models. Our comprehensive evaluation on a large ontology corpus shows that R2O2 significantly and consistently outperforms 6 state-of-the-art OWL 2 DL reasoners on average performance, with an average speedup of up to 14x. R2O2 also shows a 1.4x speedup over Konclude, the currently dominant OWL 2 DL reasoner.

Yong-Bin Kang, Shonali Krishnaswamy, Yuan-Fang Li
Rewriting-Based Instance Retrieval for Negated Concepts in Description Logic Ontologies

Instance retrieval computes all instances of a given concept in a consistent description logic (DL) ontology. Although it is a popular task for ontology reasoning, there is to date no scalable method for instance retrieval for negated concepts. This paper studies a new approach to instance retrieval for negated concepts based on query rewriting. A class of DL ontologies called the inconsistency-based first-order rewritable (IFO-rewritable) class is identified. This class guarantees that instance retrieval for an atomic negation can be reduced to answering a disjunction of conjunctive queries (CQs) over the ABox. The IFO-rewritable class is more expressive than the first-order rewritable class, which guarantees that answering a CQ is reducible to answering a disjunction of CQs over the ABox regardless of the TBox. Two sufficient conditions are proposed to detect IFO-rewritable ontologies that are not first-order rewritable. A rewriting-based method for retrieving instances of a negated concept is proposed for IFO-rewritable ontologies. Preliminary experimental results on retrieving instances of all atomic negations show that this method is significantly more efficient than existing methods implemented in state-of-the-art DL systems.

Jianfeng Du, Jeff Z. Pan
Optimizing the Computation of Overriding

We introduce optimization techniques for reasoning in $\mathcal{DL}^{N}$, a recently introduced family of nonmonotonic description logics whose characterizing features appear well-suited to model the examples naturally arising in biomedical domains and semantic web access control policies. Such optimizations are validated experimentally on large KBs with more than 30K axioms. Speedups exceed 1 order of magnitude. For the first time, response times compatible with real-time reasoning are obtained with nonmonotonic KBs of this size.

Piero A. Bonatti, Iliana M. Petrova, Luigi Sauro

Instance Matching, Entity Resolution and Topic Generation

Frontmatter
LANCE: Piercing to the Heart of Instance Matching Tools

One of the main challenges in the Data Web is the identification of instances that refer to the same real-world entity. Choosing the right framework for this purpose remains tedious, as current instance matching benchmarks fail to provide end users and developers with the necessary insights pertaining to how current frameworks behave when dealing with real data. In this paper, we present LANCE, a domain-independent instance matching benchmark generator which focuses on benchmarking instance matching systems for Linked Data. LANCE is the first Linked Data benchmark generator to support complex semantics-aware test cases that take into account expressive OWL constructs, in addition to the standard test cases related to structure and value transformations. LANCE supports the definition of matching tasks with varying degrees of difficulty and produces a weighted gold standard, which allows a more fine-grained analysis of the performance of instance matching tools. It can accept any linked dataset and its accompanying schema as input to produce a target dataset implementing test cases of varying levels of difficulty. We provide a comparative analysis with LANCE benchmarks to assess and identify the capabilities of state-of-the-art instance matching systems, as well as an evaluation demonstrating the scalability of LANCE's test case generator.

Tzanina Saveta, Evangelia Daskalaki, Giorgos Flouris, Irini Fundulaki, Melanie Herschel, Axel-Cyrille Ngonga Ngomo
Decision-Making Bias in Instance Matching Model Selection

Instance matching has emerged as an important problem in the Semantic Web, with machine learning methods proving especially effective. To enhance performance, task-specific knowledge is typically used to introduce bias in the model selection problem. Such biases tend to be exploited by practitioners in a piecemeal fashion. This paper introduces a framework where the model selection design process is represented as a factor graph. Nodes in this bipartite graphical model represent opportunities for explicitly introducing bias. The graph is first used to unify and visualize common biases in the design of existing instance matchers. As a direct application, we then use the graph to hypothesize about potential unexploited biases. The hypotheses are evaluated by training 1032 neural networks on three instance matching tasks on Microsoft Azure’s cloud-based platform. An analysis over 25 GB of experimental data indicates that the proposed biases can improve efficiency by over 65% over a baseline configuration, with effectiveness improving by a smaller margin. The findings lead to a promising set of four recommendations that can be integrated into existing supervised instance matchers.

Mayank Kejriwal, Daniel P. Miranker
Klink-2: Integrating Multiple Web Sources to Generate Semantic Topic Networks

The amount of scholarly data available on the web is steadily increasing, enabling different types of analytics which can provide important insights into research activity. In order to make sense of and explore this large-scale body of knowledge, we need an accurate, comprehensive and up-to-date ontology of research topics. Unfortunately, human-crafted classifications do not satisfy these criteria, as they evolve too slowly and tend to be too coarse-grained. Current automated methods for generating ontologies of research areas also present a number of limitations: i) they do not consider the rich amount of indirect statistical and semantic relationships which can help to understand the relation between two topics, e.g., the fact that two research areas are associated with a similar set of venues or technologies; ii) they do not distinguish between different kinds of hierarchical relationships; and iii) they are not able to effectively handle ambiguous topics characterized by a noisy set of relationships. In this paper we present Klink-2, a novel approach which improves on our earlier work on automatic generation of semantic topic networks and addresses the aforementioned limitations by taking advantage of a variety of knowledge sources available on the web. In particular, Klink-2 analyses networks of research entities (including papers, authors, venues, and technologies) to infer three kinds of semantic relationships between topics. It also identifies ambiguous keywords (e.g., "ontology") and separates them into the appropriate distinct topics, e.g., "ontology/philosophy" vs. "ontology/semantic web". Our experimental evaluation shows that the ability of Klink-2 to integrate a high number of data sources and to generate topics with accurate contextual meaning yields significant improvements over other algorithms in terms of both precision and recall.

Francesco Osborne, Enrico Motta
TabEL: Entity Linking in Web Tables

Web tables form a valuable source of relational data. The Web contains an estimated 154 million HTML tables of relational data, with Wikipedia alone containing 1.6 million high-quality tables. Extracting the semantics of Web tables to produce machine-understandable knowledge has become an active area of research.

A key step in extracting the semantics of Web content is entity linking (EL): the task of mapping a phrase in text to its referent entity in a knowledge base (KB). In this paper we present TabEL, a new EL system for Web tables. TabEL differs from previous work by weakening the assumption that the semantics of a table can be mapped to pre-defined types and relations found in the target KB. Instead, TabEL enforces soft constraints in the form of a graphical model that assigns higher likelihood to sets of entities that tend to co-occur in Wikipedia documents and tables. In experiments, TabEL significantly reduces error when compared to current state-of-the-art table EL systems, including a 75% error reduction on Wikipedia tables and a 60% error reduction on Web tables. We also make our parsed Wikipedia table corpus and test datasets publicly available for future work.

Chandra Sekhar Bhagavatula, Thanapon Noraset, Doug Downey
Path-Based Semantic Relatedness on Linked Data and Its Use to Word and Entity Disambiguation

Semantic relatedness and disambiguation are fundamental problems for linking text documents to the Web of Data. There are many approaches dealing with both problems, but most of them rely on word or concept distribution over Wikipedia. They are therefore not applicable to concepts that do not have a rich textual description. In this paper, we show that semantic relatedness can also be accurately computed by analysing only the graph structure of the knowledge base. In addition, we propose a joint approach to entity and word-sense disambiguation that makes use of graph-based relatedness. As opposed to the majority of state-of-the-art systems that target mainly named entities, we use our approach to disambiguate both entities and common nouns. In our experiments, we first validate our relatedness measure on multiple knowledge bases and ground truth datasets and show that it performs better than related state-of-the-art graph-based measures. Afterwards, we evaluate the disambiguation algorithm and show that it also achieves superior disambiguation accuracy with respect to alternative state-of-the-art graph-based algorithms.

Ioana Hulpuş, Narumol Prangnawarat, Conor Hayes
SANAPHOR: Ontology-Based Coreference Resolution

We tackle the problem of resolving coreferences in textual content by leveraging Semantic Web techniques. Specifically, we focus on noun phrases that coreference identifiable entities appearing in the text; the challenge in this context is to improve coreference resolution by leveraging potential semantic annotations that can be added to the identified mentions. Our system, SANAPHOR, first applies state-of-the-art techniques to extract entities, noun phrases, and candidate coreferences. Then, we propose an approach to type noun phrases using an inverted index built on top of a knowledge graph (e.g., DBpedia). Finally, we use the semantic relatedness of the introduced types to improve the state-of-the-art techniques by splitting and merging coreference clusters. We evaluate SANAPHOR on CoNLL datasets, and show how our techniques consistently improve the state of the art in coreference resolution.

Roman Prokofyev, Alberto Tonon, Michael Luggen, Loic Vouilloz, Djellel Eddine Difallah, Philippe Cudré-Mauroux
Improving Entity Retrieval on Structured Data

The increasing amount of data on the Web, in particular of Linked Data, has led to a diverse landscape of datasets, which makes entity retrieval a challenging task. Explicit cross-dataset links, for instance to indicate co-references or related entities, can significantly improve entity retrieval. However, only a small fraction of entities are interlinked through explicit statements. In this paper, we propose a two-fold entity retrieval approach. In a first, offline preprocessing step, we cluster entities based on the x-means and spectral clustering algorithms. In the second step, we propose an optimized retrieval model which takes advantage of our precomputed clusters. For a given set of entities retrieved by the BM25F retrieval approach and a given user query, we further expand the result set with relevant entities by considering features of the queries, entities and the precomputed clusters. Finally, we re-rank the expanded result set with respect to relevance to the query. We perform a thorough experimental evaluation on the Billion Triple Challenge (BTC12) dataset. The proposed approach shows significant improvements compared to the baseline and state-of-the-art approaches.

Besnik Fetahu, Ujwal Gadiraju, Stefan Dietze

RDF Data Dynamics

Frontmatter
A Flexible Framework for Understanding the Dynamics of Evolving RDF Datasets

The dynamic nature of Web data gives rise to a multitude of problems related to the description and analysis of the evolution of RDF datasets, which are important to a large number of users and domains, such as the curators of biological information, where changes are constant and interrelated. In this paper, we propose a framework that enables identifying, analysing and understanding these dynamics. Our approach is flexible enough to capture the peculiarities and needs of different applications on dynamic data, while being formally robust due to the satisfaction of the completeness and unambiguity properties. In addition, our framework allows the persistent representation of the detected changes between versions, in a manner that enables easy and efficient navigation among versions, automated processing and analysis of changes, cross-snapshot queries (spanning across different versions), as well as queries involving both changes and data. Our work is evaluated using real Linked Open Data, and exhibits good scalability properties.

Yannis Roussakis, Ioannis Chrysakis, Kostas Stefanidis, Giorgos Flouris, Yannis Stavrakas
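
At its simplest, change detection between two dataset versions is a set difference over triples; the paper's framework defines much richer, application-specific changes on top of such low-level deltas. A minimal sketch with illustrative data:

```python
# Lowest-level delta between two RDF versions: added and removed triples.
from rdflib import Graph

v1, v2 = Graph(), Graph()
v1.parse(data="@prefix ex: <http://example.org/> . ex:a ex:p ex:b .", format="turtle")
v2.parse(data="@prefix ex: <http://example.org/> . ex:a ex:p ex:c .", format="turtle")

added   = set(v2) - set(v1)   # triples present only in the new version
removed = set(v1) - set(v2)   # triples present only in the old version
```
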
Interest-Based RDF Update Propagation

Many LOD datasets, such as DBpedia and LinkedGeoData, are voluminous and process large amounts of requests from diverse applications. Many data products and services rely on full or partial local LOD replications to ensure faster querying and processing. Given the evolving nature of the original and authoritative datasets, ensuring consistent and up-to-date replicas requires frequent replacements at a great cost. In this paper, we introduce an approach for interest-based RDF update propagation, which propagates only the interesting parts of updates from the source to the target dataset. Effectively, this enables remote applications to 'subscribe' to relevant datasets and consistently reflect the necessary changes locally, without the need to frequently replace the entire dataset (or a relevant subset). Our approach is based on a formal definition of graph-pattern-based interest expressions that is used to filter interesting parts of updates from the source. We implement the approach in the iRap framework and perform a comprehensive evaluation based on DBpedia Live updates to confirm the validity and value of our approach.

Kemele M. Endris, Sidra Faisal, Fabrizio Orlandi, Sören Auer, Simon Scerri

Ontology Extraction and Generation

Frontmatter
General Terminology Induction in OWL

Automated acquisition, or learning, of ontologies has attracted research attention because it can help ontology engineers build ontologies and give domain experts new insights into their data. However, existing approaches to ontology learning are considerably limited: e.g., they focus on learning descriptions for given classes, require intense supervision and human involvement, make assumptions about the data, or do not fully respect background knowledge. We investigate the problem of general terminology induction, i.e. learning sets of general class inclusions (GCIs) from data and background knowledge. We introduce measures that evaluate the logical and statistical quality of a set of GCIs. We present methods to compute these measures and an anytime algorithm that induces sets of GCIs. Our experiments show that we can acquire interesting sets of GCIs and provide insights into the structure of the search space.

Viachaslau Sazonau, Uli Sattler, Gavin Brown
Understanding How Users Edit Ontologies: Comparing Hypotheses About Four Real-World Projects

Ontologies are complex intellectual artifacts and creating them requires significant expertise and effort. While existing ontology-editing tools and methodologies propose ways of building ontologies in a normative way, empirical investigations of how experts actually construct ontologies "in the wild" are rare. Yet, understanding actual user behavior can play an important role in the design of effective tool support. Although previous empirical investigations have produced a series of interesting insights, they were exploratory in nature and aimed at gauging the problem space only. In this work, we aim to advance the state of knowledge in this domain by systematically defining and comparing a set of hypotheses about how users edit ontologies. Towards that end, we study the user editing trails of four real-world ontology-engineering projects. Using a coherent research framework, called HypTrails, we derive formal definitions of hypotheses from the literature and systematically compare them with each other. Our findings suggest that the hierarchical structure of an ontology exercises the strongest influence on user editing behavior, followed by the entity similarity and the semantic distance of classes in the ontology. Moreover, these findings are strikingly consistent across all ontology-engineering projects in our study, with only minor exceptions for one of the smaller datasets. We believe that our results are important for ontology tool builders and for project managers, who can potentially leverage this information to create user interfaces and processes that better support the observed editing patterns of users.

Simon Walk, Philipp Singer, Lisette Espín Noboa, Tania Tudorache, Mark A. Musen, Markus Strohmaier
Next Step for NoHR: OWL 2 QL

The Protégé plug-in NoHR allows the user to combine an OWL 2 EL ontology with a set of non-monotonic (logic programming) rules - suitable, e.g., to express defaults and exceptions - and query the combined knowledge base (KB). The formal approach realized in NoHR is polynomial (w.r.t. data complexity) and it has been shown that even very large health care ontologies, such as SNOMED CT, can be handled. As each of the tractable OWL profiles is motivated by different application cases, extending the tool to the other profiles is of particular interest, also because these preserve the polynomial data complexity of the combined formalism. Yet, a straightforward adaptation of the existing approach to OWL 2 QL turns out to not be viable. In this paper, we provide the non-trivial solution for the extension of NoHR to OWL 2 QL by directly translating the ontology into rules without any prior classification. We have implemented our approach and our evaluation shows encouraging results.

Nuno Costa, Matthias Knorr, João Leite
Concept Forgetting in $\mathcal{ALCOI}$ -Ontologies Using an Ackermann Approach

We present a method for forgetting concept symbols in ontologies specified in the description logic $\mathcal{ALCOI}$. The method is an adaptation and improvement of a second-order quantifier elimination method developed for modal logics and used for computing correspondence properties for modal axioms. It follows an approach exploiting a result of Ackermann adapted to description logics. An important feature inherited from the modal approach is that the inference rules are guided by an ordering compatible with the elimination order of the concept symbols. This provides more control over the inference process and reduces non-determinism, resulting in a smaller search space. The method is extended with a new case splitting inference rule, and several simplification rules. Compared to related forgetting and uniform interpolation methods for description logics, the method can handle inverse roles, nominals and ABoxes. Compared to the modal approach on which it is based, it is more efficient in time and improves the success rates. The method has been implemented in Java using the OWL API. Experimental results show that the order in which the concept symbols are eliminated significantly affects the success rate and efficiency.

Yizheng Zhao, Renate A. Schmidt

Knowledge Graphs and Scientific Data Publication

Frontmatter
Content-Based Recommendations via DBpedia and Freebase: A Case Study in the Music Domain

The Web of Data has been introduced as a novel scheme for imposing structured data on the Web. This renders data easily understandable by human beings and seamlessly processable by machines at the same time. The recent boom in Linked Data facilitates a new stream of data-intensive applications that leverage the knowledge available in semantic datasets such as DBpedia and Freebase. The latter are well-known encyclopedic collections of data that can be used to feed a content-based recommender system. In this paper we investigate how the choice of one of the two datasets may influence the performance of a recommendation engine, not only in terms of precision of the results but also in terms of their diversity and novelty. We tested four different recommendation approaches exploiting both DBpedia and Freebase in the music domain.

Phuong T. Nguyen, Paolo Tomeo, Tommaso Di Noia, Eugenio Di Sciascio
Explaining and Suggesting Relatedness in Knowledge Graphs

Knowledge graphs (KGs) are a key ingredient for searching, browsing and knowledge discovery activities. Motivated by the need to harness knowledge available in a variety of KGs, we face the following two problems. First, given a pair of entities defined in some KG, find an explanation of their relatedness. We formalize the notion of relatedness explanation and introduce different criteria to build explanations based on information theory, diversity and their combinations. Second, given a pair of entities, find other (pairs of) entities sharing a similar relatedness perspective. We describe an implementation of our ideas in a tool, called RECAP, which is based on RDF and SPARQL. We provide an evaluation of RECAP and a comparison with related systems on real-world data.

Giuseppe Pirrò
Type-Constrained Representation Learning in Knowledge Graphs

Large knowledge graphs increasingly add value to various applications that require machines to recognize and understand queries and their semantics, as in search or question answering systems. Latent variable models have increasingly gained attention for the statistical modeling of knowledge graphs, showing promising results in tasks related to knowledge graph completion and cleaning. Besides storing facts about the world, schema-based knowledge graphs are backed by rich semantic descriptions of entities and relation-types that allow machines to understand the notion of things and their semantic relationships. In this work, we study how type-constraints can generally support statistical modeling with latent variable models. More precisely, we integrate prior knowledge in the form of type-constraints into various state-of-the-art latent variable approaches. Our experimental results show that prior knowledge on relation-types significantly improves these models, by up to 77% in link-prediction tasks. The achieved improvements are especially prominent when a low model complexity is enforced, a crucial requirement when these models are applied to very large datasets. Unfortunately, type-constraints are neither always available nor always complete, e.g., they can become fuzzy when entities lack proper typing. We show that in these cases, it can be beneficial to apply a local closed-world assumption that approximates the semantics of relation-types based on observations made in the data.

Denis Krompaß, Stephan Baier, Volker Tresp
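
A hedged numpy sketch of the general idea (a TransE-style scorer with a hard range-type filter; the model choice, toy data and type assignments are our assumptions, not the paper's models): candidate entities violating the relation's type-constraint are excluded before ranking.

```python
# Type-constrained link prediction: score candidate tails for (head, r, ?),
# keeping only entities whose type satisfies the relation's range constraint.
import numpy as np

rng = np.random.default_rng(0)
n_entities, dim = 5, 4
E = rng.normal(size=(n_entities, dim))   # entity embeddings (toy, untrained)
r = rng.normal(size=dim)                 # a TransE-style relation vector

entity_types = np.array([0, 0, 1, 1, 1])  # 0 = Person, 1 = City (assumed)
range_type = 1                            # the relation's range: City

def score_tails(head: int) -> np.ndarray:
    # TransE score: -||h + r - t||; higher means more plausible.
    scores = -np.linalg.norm(E[head] + r - E, axis=1)
    scores[entity_types != range_type] = -np.inf  # type-constraint filter
    return scores

print(np.argmax(score_tails(head=0)))  # best type-consistent tail entity
```
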
Publishing Without Publishers: A Decentralized Approach to Dissemination, Retrieval, and Archiving of Data

Making available and archiving scientific results is for the most part still considered the task of classical publishing companies, despite the fact that classical forms of publishing centered around printed narrative articles no longer seem well-suited in the digital age. In particular, there exist currently no efficient, reliable, and agreed-upon methods for publishing scientific datasets, which have become increasingly important for science. Here we propose to design scientific data publishing as a Web-based bottom-up process, without top-down control of central authorities such as publishing companies. Based on a novel combination of existing concepts and technologies, we present a server network to decentrally store and archive data in the form of nanopublications, an RDF-based format to represent scientific data. We show how this approach allows researchers to publish, retrieve, verify, and recombine datasets of nanopublications in a reliable and trustworthy manner, and we argue that this architecture could be used for the Semantic Web in general. Evaluation of the current small network shows that this system is efficient and reliable.

Tobias Kuhn, Christine Chichester, Michael Krauthammer, Michel Dumontier
Backmatter
Metadata
Titel
The Semantic Web - ISWC 2015
Edited by
Marcelo Arenas
Oscar Corcho
Elena Simperl
Markus Strohmaier
Mathieu d'Aquin
Kavitha Srinivas
Paul Groth
Michel Dumontier
Jeff Heflin
Krishnaprasad Thirunarayan
Steffen Staab
Copyright year
2015
Electronic ISBN
978-3-319-25007-6
Print ISBN
978-3-319-25006-9
DOI
https://doi.org/10.1007/978-3-319-25007-6