2013 | Book

The Semantic Web: Semantics and Big Data

10th International Conference, ESWC 2013, Montpellier, France, May 26-30, 2013. Proceedings

Edited by: Philipp Cimiano, Oscar Corcho, Valentina Presutti, Laura Hollink, Sebastian Rudolph

Publisher: Springer Berlin Heidelberg

Book series: Lecture Notes in Computer Science

About this book

This book constitutes the refereed proceedings of the 10th Extended Semantic Web Conference, ESWC 2013, held in Montpellier, France, in May 2013. The 42 revised full papers presented together with three invited talks were carefully reviewed and selected from 162 submissions. They are organized in tracks on ontologies; linked open data; semantic data management; mobile Web, sensors and semantic streams; reasoning; natural language processing and information retrieval; machine learning; social Web and Web science; cognition and semantic Web; and in-use and industrial tracks. The book also includes 17 PhD papers presented at the PhD Symposium.

Table of Contents

Frontmatter

Research Track

Ontologies

A Unified Approach for Aligning Taxonomies and Debugging Taxonomies and Their Alignments

With the increased use of ontologies in semantically-enabled applications, the issues of debugging and aligning ontologies have become increasingly important. The quality of the results of such applications is directly dependent on the quality of the ontologies and the mappings between the ontologies they employ. A key step towards achieving high-quality ontologies and mappings is discovering and resolving modeling defects, e.g., wrong or missing relations and mappings. In this paper we present a unified framework for aligning taxonomies, the most widely used kind of ontologies, and for debugging taxonomies and their alignments, where ontology alignment is treated as a special kind of debugging. Our framework supports the detection and repair of missing and wrong is-a structure in taxonomies, as well as the detection and repair of missing and wrong mappings (alignments) between ontologies. Further, we have implemented a system based on this framework and demonstrate its benefits through experiments with ontologies from the Ontology Alignment Evaluation Initiative.

Valentina Ivanova, Patrick Lambrix
Opening the Black Box of Ontology Matching

Due to the high heterogeneity of ontologies, a combination of many methods is necessary in order to correctly discover the semantic correspondences between their elements. An ontology matching tool can be seen as a collection of several matching components, each implementing a specific method dealing with a specific heterogeneity type (terminological, structural or semantic). In addition, a mapping selection module is introduced to filter out the most likely mapping candidates. This paper proposes an empirical study of the interaction between these components working together inside an ontology matching system. With the help of datasets from the Ontology Alignment Evaluation Initiative, we have carried out several experimental studies. In the first place, we have been interested in the impact of the mapping selection module on the performance of terminological and structural matchers, revealing the advantage of using global methods vs. local ones. Further, we have carried out an extensive study of the degradation in performance of a structural matcher in the presence of noisy input coming from a terminological method. Finally, we have analyzed the behavior of a structural and a semantic component with respect to inputs taken from different terminological matchers.

DuyHoa Ngo, Zohra Bellahsene, Konstantin Todorov
Towards Evaluating Interactive Ontology Matching Tools

With a growing number of ontologies used on the Semantic Web, agents can fully make sense of different datasets only if correspondences between those ontologies are known. Ontology matching tools have been proposed to find such correspondences. While the current research focus is mainly on fully automatic matching tools, some approaches have been proposed that involve the user in the matching process. However, there are currently no benchmarks and test methods to compare such tools. In this paper, we introduce a number of quality measures for interactive ontology matching tools, and we discuss means to automatically run benchmark tests for such tools. To demonstrate how such evaluations can be designed, we show examples of assessing the quality of interactive matching tools which involve the user in matcher selection and matcher parametrization.

Heiko Paulheim, Sven Hertling, Dominique Ritze
A Session-Based Approach for Aligning Large Ontologies

There are a number of challenges that need to be addressed when aligning large ontologies. Previous work has pointed out scalability and efficiency of matching techniques, matching with background knowledge, support for matcher selection, combination and tuning, and user involvement as major requirements. In this paper we address these challenges. Our first contribution is an ontology alignment framework that enables solutions to each of the challenges. This is achieved by introducing different kinds of interruptable sessions. The framework allows partial computations for generating mapping suggestions, partial validations of mapping suggestions, the use of validation decisions in the (re)computation of mapping suggestions, and the recommendation of alignment strategies to use. Further, we describe an implemented system providing solutions to each of the challenges and show through experiments the advantages of the session-based approach.

Patrick Lambrix, Rajaram Kaliyaperumal
Organizing Ontology Design Patterns as Ontology Pattern Languages

Ontology design patterns have been pointed out as a promising approach for ontology engineering. The goal of this paper is twofold. Firstly, based on well-established works in Software Engineering, we revisit the notion of ontology patterns in Ontology Engineering to introduce the notion of ontology pattern language as a way to organize related ontology patterns. Secondly, we present an overview of a software process ontology pattern language.

Ricardo de Almeida Falbo, Monalessa Perini Barcellos, Julio Cesar Nardi, Giancarlo Guizzardi
An Ontology Design Pattern for Cartographic Map Scaling

The concept of scale is at the core of cartographic abstraction and mapping. It defines which geographic phenomena should be displayed, which type of geometry and map symbol to use, which measures can be taken, as well as the degree to which features need to be exaggerated or spatially displaced. In this work, we present an ontology design pattern for map scaling using the Web Ontology Language (OWL) within a particular extension of the OWL RL profile. We explain how it can be used to describe scaling applications and to reason over scale levels and geometric representations. We propose an axiomatization that allows us to impose meaningful constraints on the pattern, and, thus, to go beyond simple surface semantics. Interestingly, this includes several functional constraints currently not expressible in any of the OWL profiles. We show that for this specific scenario, the addition of such constraints does not increase the reasoning complexity, which remains tractable.

David Carral, Simon Scheider, Krzysztof Janowicz, Charles Vardeman, Adila A. Krisnadhi, Pascal Hitzler
Locking for Concurrent Transactions on Ontologies

Collaborative editing on large-scale ontologies imposes serious demands on concurrent modifications and conflict resolution. In order to enable robust handling of concurrent modifications, we propose a locking-based approach that ensures independent transactions to simultaneously work on an ontology while blocking those transactions that might influence other transactions. In the logical context of ontologies, dependence and independence of transactions do not only rely on the single data items that are modified, but also on the inferences drawn from these items. In order to address this issue, we utilize logical modularization of ontologies and lock the parts of the ontology that share inferential dependencies for an ongoing transaction. We compare and evaluate modularization and the naive approach of locking the whole ontology for each transaction and analyze the trade-off between the time needed for computing locks and the time gained by running transactions concurrently.
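
To make the locking idea concrete, here is a minimal Python sketch (an editorial illustration, not the authors’ implementation): a lock manager grants a transaction access only if its ontology module, i.e. the set of axioms sharing inferential dependencies with the transaction’s target, is disjoint from every module already locked. A real system would compute such modules with an OWL reasoner rather than receive them as plain sets.

    # Hypothetical sketch: a transaction may proceed only if its module
    # shares no axioms with the module of any ongoing transaction.
    class ModuleLockManager:
        def __init__(self):
            self.locked = {}  # transaction id -> frozenset of axioms

        def try_lock(self, tx_id, module):
            module = frozenset(module)
            for other in self.locked.values():
                if module & other:       # shared axioms: possible
                    return False         # inferential interference
            self.locked[tx_id] = module
            return True

        def release(self, tx_id):
            self.locked.pop(tx_id, None)

    # Transactions on disjoint modules run concurrently; overlaps block.
    mgr = ModuleLockManager()
    assert mgr.try_lock("t1", {"A subClassOf B", "B subClassOf C"})
    assert mgr.try_lock("t2", {"X subClassOf Y"})       # disjoint: granted
    assert not mgr.try_lock("t3", {"B subClassOf C"})   # overlaps t1: blocked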

Stefan Scheglmann, Steffen Staab, Matthias Thimm, Gerd Gröner
Predicting the Understandability of OWL Inferences

In this paper, we describe a method for predicting the understandability level of inferences with OWL. Specifically, we present a probabilistic model for measuring the understandability of a multiple-step inference based on the measurement of the understandability of individual inference steps. We also present an evaluation study which confirms that our model works relatively well for two-step inferences with OWL. This model has been applied in our research on generating accessible explanations for an entailment of OWL ontologies, to determine the most understandable inference among alternatives, from which the final explanation is generated.
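
As a rough editorial illustration of composing step-level scores, the Python sketch below multiplies per-step understandability probabilities under an independence assumption; this toy rule is a stand-in and not necessarily the probabilistic model proposed in the paper.

    from math import prod

    def chain_understandability(step_probs):
        """Probability that a reader follows every step of an inference,
        assuming the steps are understood independently."""
        return prod(step_probs)

    # A two-step inference whose steps are understood with probability
    # 0.9 and 0.7 is followed end-to-end with probability 0.63.
    print(chain_understandability([0.9, 0.7]))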

Tu Anh T. Nguyen, Richard Power, Paul Piwek, Sandra Williams

Linked Open Data

Detecting SPARQL Query Templates for Data Prefetching

Publicly available Linked Data repositories provide a multitude of information. By utilizing SPARQL, Web sites and services can consume this data and present it in a user-friendly form, e.g., in mash-ups. To gather RDF triples for this task, machine agents typically issue similarly structured queries with recurring patterns against the SPARQL endpoint. These queries usually differ only in a small number of individual triple pattern parts, such as resource labels or literals in objects. We present an approach to detect such recurring patterns in queries and introduce the notion of query templates, which represent clusters of similar queries exhibiting these recurrences. We describe a matching algorithm to extract query templates and illustrate the benefits of prefetching data by utilizing these templates. Finally, we comment on the applicability of our approach using results from real-world SPARQL query logs.
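
To make the template idea concrete, the following Python sketch masks the IRIs and literals of incoming queries and clusters queries that share the same masked form. The regular expressions are deliberately simplistic and purely illustrative; the paper’s matching algorithm is more involved.

    import re
    from collections import defaultdict

    # Reduce a SPARQL query to a "template" by masking IRIs and literals.
    def template_of(query):
        query = re.sub(r'<[^>]*>', '<IRI>', query)        # mask IRIs
        query = re.sub(r'"[^"]*"', '"LITERAL"', query)    # mask literals
        return ' '.join(query.split())                    # normalize spaces

    queries = [
        'SELECT ?s WHERE { ?s <http://xmlns.com/foaf/0.1/name> "Alice" }',
        'SELECT ?s WHERE { ?s <http://xmlns.com/foaf/0.1/name> "Bob" }',
    ]

    # Group queries by template; the masked positions tell a prefetcher
    # which bindings to retrieve ahead of time.
    clusters = defaultdict(list)
    for q in queries:
        clusters[template_of(q)].append(q)

    for tpl, qs in clusters.items():
        print(len(qs), tpl)   # 2 SELECT ?s WHERE { ?s <IRI> "LITERAL" }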

Johannes Lorey, Felix Naumann
Synonym Analysis for Predicate Expansion

Despite unified data models, such as the Resource Description Framework (RDF) on the structural level and the corresponding query language SPARQL, the integration and usage of Linked Open Data faces major heterogeneity challenges on the semantic level. Incorrect use of ontology concepts and class properties impedes the goal of machine readability and knowledge discovery. For example, users searching for movies with a certain artist cannot rely on a single given property artist, because some movies may be connected to that artist by the predicate starring. In addition, the information need of a data consumer may not always be clear and her interpretation of given schemata may differ from the intentions of the ontology engineer or data publisher.

It is thus necessary to either support users during query formulation or to incorporate implicitly related facts through predicate expansion. To this end, we introduce a data-driven synonym discovery algorithm for predicate expansion. We applied our algorithm to various data sets as shown in a thorough evaluation of different strategies and rule-based techniques for this purpose.
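
One simple data-driven signal for predicate synonymy can be sketched in a few lines of Python: predicates whose (subject, object) pairs overlap strongly, such as artist and starring above, become candidates for expansion. The Jaccard measure and the threshold are illustrative choices, not necessarily those evaluated in the paper.

    from itertools import combinations

    # Toy triples: (subject, predicate, object)
    triples = [
        ("m1", "artist", "beatles"), ("m2", "artist", "dylan"),
        ("m1", "starring", "beatles"), ("m3", "starring", "dylan"),
        ("m1", "year", "1968"),
    ]

    # Collect the (subject, object) pairs observed for each predicate.
    pairs = {}
    for s, p, o in triples:
        pairs.setdefault(p, set()).add((s, o))

    def jaccard(p1, p2):
        a, b = pairs[p1], pairs[p2]
        return len(a & b) / len(a | b)

    # Predicates with high pair overlap are expansion candidates.
    for p1, p2 in combinations(pairs, 2):
        if jaccard(p1, p2) > 0.2:
            print(p1, p2, round(jaccard(p1, p2), 2))  # artist starring 0.33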

Ziawasch Abedjan, Felix Naumann
Instance-Based Ontological Knowledge Acquisition

The Linked Open Data (LOD) cloud contains tremendous amounts of interlinked instances, from which we can retrieve abundant knowledge. However, because the ontologies involved are heterogeneous and large, it is time-consuming to learn all of them manually, and it is difficult to observe which properties are important for describing instances of a specific class. In order to construct an ontology that helps users easily access various data sets, we propose a semi-automatic ontology integration framework that can reduce the heterogeneity of ontologies and retrieve frequently used core properties for each class. The framework consists of three main components: graph-based ontology integration, machine-learning-based ontology schema extraction, and an ontology merger. By analyzing the instances of the linked data sets, this framework acquires ontological knowledge and constructs a high-quality integrated ontology, which is easily understandable and effective for knowledge acquisition from various data sets using simple SPARQL queries.

Lihua Zhao, Ryutaro Ichise
Logical Linked Data Compression

Linked data has experienced accelerated growth in recent years. With the continuing proliferation of structured data, demand for RDF compression is becoming increasingly important. In this study, we introduce a novel lossless compression technique for RDF datasets, called Rule Based Compression (RB Compression) that compresses datasets by generating a set of new logical rules from the dataset and removing triples that can be inferred from these rules. Unlike other compression techniques, our approach not only takes advantage of syntactic verbosity and data redundancy but also utilizes semantic associations present in the RDF graph. Depending on the nature of the dataset, our system is able to prune more than 50% of the original triples without affecting data integrity.
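
The principle can be illustrated with a single hypothetical rule: triples that the rule can re-derive are removed at compression time and re-added at decompression time, so the transformation is lossless. The rule-mining step, which is the heart of RB Compression, is omitted in this Python sketch.

    # Toy rule: every ?x with (?x, type, Student) also has (?x, type, Person),
    # so the Person triples of Students are redundant and can be dropped.
    triples = {
        ("alice", "type", "Student"), ("alice", "type", "Person"),
        ("bob", "type", "Student"), ("bob", "type", "Person"),
        ("carol", "type", "Person"),
    }
    rule = (("type", "Student"), ("type", "Person"))  # body => head

    def compress(triples, rule):
        (bp, bo), (hp, ho) = rule
        subjects = {s for s, p, o in triples if (p, o) == (bp, bo)}
        return {t for t in triples
                if not (t[0] in subjects and (t[1], t[2]) == (hp, ho))}

    def decompress(compressed, rule):
        (bp, bo), (hp, ho) = rule
        inferred = {(s, hp, ho) for s, p, o in compressed if (p, o) == (bp, bo)}
        return compressed | inferred

    small = compress(triples, rule)
    assert decompress(small, rule) == triples   # lossless round trip
    print(len(triples), "->", len(small))       # 5 -> 3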

Amit Krishna Joshi, Pascal Hitzler, Guozhu Dong
Access Control for HTTP Operations on Linked Data

Access control is a recognized open issue when interacting with RDF using HTTP methods. In literature, authentication and authorization mechanisms either introduce undesired complexity such as SPARQL and ad-hoc policy languages, or rely on basic access control lists, thus resulting in limited policy expressiveness. In this paper we show how the Shi3ld attribute-based authorization framework for SPARQL endpoints has been progressively converted to protect HTTP operations on RDF. We proceed by steps: we start by supporting the SPARQL 1.1 Graph Store Protocol, and we shift towards a SPARQL-less solution for the Linked Data Platform. We demonstrate that the resulting authorization framework provides the same functionalities of its SPARQL-based counterpart, including the adoption of Semantic Web languages only.

Luca Costabello, Serena Villata, Oscar Rodriguez Rocha, Fabien Gandon
Bio2RDF Release 2: Improved Coverage, Interoperability and Provenance of Life Science Linked Data

Bio2RDF currently provides the largest network of Linked Data for the Life Sciences. Here, we describe a significant update to increase the overall quality of RDFized datasets generated from open scripts powered by an API to generate registry-validated IRIs, dataset provenance and metrics, SPARQL endpoints, downloadable RDF and database files. We demonstrate federated SPARQL queries within and across the Bio2RDF network, including semantic integration using the Semanticscience Integrated Ontology (SIO). This work forms a strong foundation for increased coverage and continuous integration of data in the life sciences.

Alison Callahan, José Cruz-Toledo, Peter Ansell, Michel Dumontier
Observing Linked Data Dynamics

In this paper, we present the design and first results of the Dynamic Linked Data Observatory: a long-term experiment to monitor the two-hop neighbourhood of a core set of eighty thousand diverse Linked Data documents on a weekly basis. We present the methodology used for sampling the URIs to monitor, retrieving the documents, and further crawling part of the two-hop neighbourhood. Having now run this experiment for six months, we analyse the dynamics of the monitored documents over the data collected thus far. We look at the estimated lifespan of the core documents, how often they go on-line or off-line, and how often they change; we further investigate domain-level trends. Next we look at changes within the RDF content of the core documents across the weekly snapshots, examining the elements (i.e., triples, subjects, predicates, objects, classes) that are most frequently added or removed. Thereafter, we look at how the links between dereferenceable documents evolve over time in the two-hop neighbourhood.

Tobias Käfer, Ahmed Abdelrahman, Jürgen Umbrich, Patrick O’Byrne, Aidan Hogan
A Systematic Investigation of Explicit and Implicit Schema Information on the Linked Open Data Cloud

Schema information about resources in the Linked Open Data (LOD) cloud can be provided in a twofold way: it can be explicitly defined by attaching RDF types to the resources, or it can be provided implicitly via the definition of the resources’ properties. In this paper, we present a method and metrics to analyse the information-theoretic properties of and the correlation between the two manifestations of schema information. Furthermore, we actually perform such an analysis on large-scale linked data sets. To this end, we have extracted schema information regarding the types and properties defined in the data set segments provided for the Billion Triples Challenge 2012. We have conducted an in-depth analysis and have computed various entropy measures as well as the mutual information encoded in the two types of schema information. Our analysis provides insights into the information encoded in the different schema characteristics. Two major findings are that implicit schema information is far more discriminative and that applications involving schema information based on either types or properties alone will only capture between 63.5% and 88.1% of the schema information contained in the data. Based on these observations, we derive conclusions about the design of future schemas for LOD as well as potential application scenarios.
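
The kind of measurement involved can be sketched in a few lines of Python: treat each resource’s type set (explicit schema) and property set (implicit schema) as two random variables and compute their entropies and mutual information. The data below is a toy stand-in for the Billion Triples Challenge corpus; note that even here the property sets carry more entropy, i.e. are more discriminative, than the type sets.

    from collections import Counter
    from math import log2

    # Each resource: (set of RDF types, set of properties).
    resources = [
        (frozenset({"Person"}), frozenset({"name", "knows"})),
        (frozenset({"Person"}), frozenset({"name", "knows"})),
        (frozenset({"Person"}), frozenset({"name"})),
        (frozenset({"Document"}), frozenset({"title"})),
    ]

    def entropy(values):
        counts = Counter(values)
        n = len(values)
        return -sum(c / n * log2(c / n) for c in counts.values())

    types = [t for t, p in resources]
    props = [p for t, p in resources]

    h_t, h_p = entropy(types), entropy(props)
    mi = h_t + h_p - entropy(list(zip(types, props)))  # I(T;P)
    print(f"H(types)={h_t:.2f}  H(properties)={h_p:.2f}  I(T;P)={mi:.2f}")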

Thomas Gottron, Malte Knauf, Stefan Scheglmann, Ansgar Scherp

Semantic Data Management

Lightweight Spatial Conjunctive Query Answering Using Keywords

With the advent of publicly available geospatial data, ontology-based data access (OBDA) over spatial data has gained increasing interest. Spatio-relational DBMSs are used to implement geographic information systems (GIS) and are fit to manage large amounts of data and geographic objects such as points, lines, polygons, etc. In this paper, we extend the Description Logic DL-Lite with spatial objects and show how to answer spatial conjunctive queries (SCQs) over ontologies—that is, conjunctive queries with point-set topological relations such as next and within—expressed in this language. The goal of this extension is to enable an off-the-shelf use of spatio-relational DBMSs to answer SCQs using rewriting techniques, where data sources and geographic objects are stored in a database and spatial conjunctive queries are rewritten to SQL statements with spatial functions. Furthermore, we consider keyword-based querying over spatial OBDA data sources, and show how to map queries expressed as simple keyword lists describing objects of interest to SCQs, using a meta-model for completing the SCQs with spatial aspects. We have implemented our lightweight approach to spatial OBDA in a prototype and show initial experimental results using data sources such as Open Street Maps and Open Government Data Vienna from an associated project. We show that for real-world scenarios, practical queries are expressible under meta-model completion, and that query answering is computationally feasible.

Thomas Eiter, Thomas Krennwallner, Patrik Schneider
Representation and Querying of Valid Time of Triples in Linked Geospatial Data

We introduce the temporal component of the stRDF data model and the stSPARQL query language, which have been recently proposed for the representation and querying of linked geospatial data that changes over time. With this temporal component in place, stSPARQL becomes a very expressive query language for linked geospatial data, going beyond the recent OGC standard GeoSPARQL, which has no support for valid time of triples. We present the implementation of the stSPARQL temporal component in the system Strabon, and study its performance experimentally. Strabon is shown to outperform all the systems it has been compared with.

Konstantina Bereta, Panayiotis Smeros, Manolis Koubarakis
When to Reach for the Cloud: Using Parallel Hardware for Link Discovery

With the ever-growing amount of RDF data available across the Web, the discovery of links between datasets and deduplication of resources within knowledge bases have become tasks of crucial importance. Over the last years, several link discovery approaches have been developed to tackle the runtime and complexity problems that are intrinsic to link discovery. Yet, so far, little attention has been paid to the management of hardware resources for the execution of link discovery tasks. This paper addresses this research gap by investigating the efficient use of hardware resources for link discovery. We implement the $\mathcal{HR}^3$ approach for three different parallel processing paradigms including the use of GPUs and MapReduce platforms. We also perform a thorough performance comparison for these implementations. Our results show that certain tasks that appear to require cloud computing techniques can actually be accomplished using standard parallel hardware. Moreover, our evaluation provides break-even points that can serve as guidelines for deciding on when to use which hardware for link discovery.

Axel-Cyrille Ngonga Ngomo, Lars Kolb, Norman Heino, Michael Hartung, Sören Auer, Erhard Rahm
No Size Fits All – Running the Star Schema Benchmark with SPARQL and RDF Aggregate Views

Statistics published as Linked Data promise efficient extraction, transformation and loading (ETL) into a database for decision support. The predominant way to implement analytical query capabilities in industry is to use specialised engines that translate OLAP queries to SQL queries on a relational database using a star schema (ROLAP). A more direct approach than ROLAP is to load Statistical Linked Data into an RDF store and to answer OLAP queries using SPARQL. However, we assume that general-purpose triple stores – just as typical relational databases – are not a perfect fit for analytical workloads and need to be complemented by OLAP-to-SPARQL engines. To give an empirical argument for the need for such an engine, we first compare the performance of our generated SPARQL and of ROLAP SQL queries. Second, we measure the performance gain of RDF aggregate views that, similar to aggregate tables in ROLAP, materialise parts of the data cube.

Benedikt Kämpgen, Andreas Harth

Mobile Web, Sensors and Semantic Streams

Seven Commandments for Benchmarking Semantic Flow Processing Systems

Over the last few years, the processing of dynamic data has gained increasing attention in the Semantic Web community. This led to the development of several stream reasoning systems that enable on-the-fly processing of semantically annotated data that changes over time. Due to their streaming nature, analyzing such systems is extremely difficult. Currently, their evaluation is conducted under heterogeneous scenarios, hampering their comparison and an understanding of their benefits and limitations. In this paper, we strive for a better understanding of the key challenges that these systems must face and define a generic methodology to evaluate their performance. Specifically, we identify three Key Performance Indicators and seven commandments that specify how to design the stress tests for system evaluation.

Thomas Scharrenbach, Jacopo Urbani, Alessandro Margara, Emanuele Della Valle, Abraham Bernstein

Reasoning

Graph-Based Ontology Classification in OWL 2 QL

Ontology classification is the reasoning service that computes all subsumption relationships inferred in an ontology between concept, role, and attribute names in the ontology signature. OWL 2 QL is a tractable profile of OWL 2 for which ontology classification is polynomial in the size of the ontology TBox. However, to date, no efficient methods and implementations specifically tailored to OWL 2 QL ontologies have been developed. In this paper, we provide a new algorithm for ontology classification in OWL 2 QL, which is based on the idea of encoding the ontology TBox into a directed graph and reducing core reasoning to computation of the transitive closure of the graph. We have implemented the algorithm in the QuOnto reasoner and extensively evaluated it over very large ontologies. Our experiments show that QuOnto outperforms various popular reasoners in classification of OWL 2 QL ontologies.
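
The reduction itself is easy to illustrate: told subsumptions become directed edges, and inferred subsumptions are read off as reachability. The Python sketch below uses a plain depth-first closure; the data and code are an editorial toy, not the optimized computation inside QuOnto.

    # Told subsumptions as a directed graph: A -> B means "A subClassOf B".
    edges = {
        "Student": {"Person"},
        "Person": {"Agent"},
        "Professor": {"Person"},
    }

    def superclasses(cls):
        """All (told and inferred) superclasses of cls via reachability."""
        seen, stack = set(), list(edges.get(cls, ()))
        while stack:
            c = stack.pop()
            if c not in seen:
                seen.add(c)
                stack.extend(edges.get(c, ()))
        return seen

    for cls in edges:
        print(cls, "subClassOf", sorted(superclasses(cls)))
    # Student subClassOf ['Agent', 'Person'] -- the inferred closure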

Domenico Lembo, Valerio Santarelli, Domenico Fabio Savo
RDFS with Attribute Equations via SPARQL Rewriting

In addition to taxonomic knowledge about concepts and properties typically expressible in languages such as RDFS and OWL, implicit information in an RDF graph may be likewise determined by arithmetic equations. The main use case here is exploiting knowledge about functional dependencies among numerical attributes expressible by means of such equations. While some of this knowledge can be encoded in rule extensions to ontology languages, we provide an arguably more flexible framework that treats attribute equations as first class citizens in the ontology language. The combination of ontological reasoning and attribute equations is realized by extending query rewriting techniques already successfully applied for ontology languages such as (the DL-Lite-fragment of) RDFS or OWL, respectively. We deploy this technique for rewriting SPARQL queries and discuss the feasibility of alternative implementations, such as rule-based approaches.
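
A hedged Python illustration of the rewriting step only: a triple pattern over a derived attribute is expanded into a UNION that also derives the value from a stored attribute via the equation, here a hypothetical tempF = tempC * 9/5 + 32. The paper integrates such rewriting with full RDFS reasoning; the snippet shows just the equation part.

    # Hypothetical equation table: derived predicate -> (stored predicate,
    # expression over the placeholder ?v0).
    EQUATIONS = {":tempF": (":tempC", "?v0 * 9/5 + 32")}

    def rewrite_pattern(subject, derived_pred, var):
        stored_pred, expr = EQUATIONS[derived_pred]
        expr = expr.replace("?v0", "?c")
        # Keep explicit values, and additionally derive missing ones.
        return (f"{{ {subject} {derived_pred} {var} }} UNION "
                f"{{ {subject} {stored_pred} ?c . BIND ({expr} AS {var}) }}")

    print("SELECT ?city ?f WHERE { "
          + rewrite_pattern("?city", ":tempF", "?f") + " }")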

Stefan Bischof, Axel Polleres

Natural Language Processing and Information Retrieval

A Comparison of Knowledge Extraction Tools for the Semantic Web

In recent years, basic NLP tasks (NER, WSD, relation extraction, etc.) have been configured for Semantic Web tasks including ontology learning, linked data population, entity resolution, NL querying of linked data, etc. Some assessment of the state of the art of existing Knowledge Extraction (KE) tools when applied to the Semantic Web is therefore desirable. In this paper we describe a landscape analysis of several tools, either conceived specifically for KE on the Semantic Web, or adaptable to it, or even acting as aggregators of extracted data from other tools. Our aim is to assess the currently available capabilities against a rich palette of ontology design constructs, focusing specifically on the actual semantic reusability of KE output.

Aldo Gangemi
Constructing a Focused Taxonomy from a Document Collection

We describe a new method for constructing custom taxonomies from document collections. It involves identifying relevant concepts and entities in text; linking them to knowledge sources like Wikipedia, DBpedia, Freebase, and any supplied taxonomies from related domains; disambiguating conflicting concept mappings; and selecting semantic relations that best group them hierarchically. An RDF model supports interoperability of these steps, and also provides a flexible way of including existing NLP tools and further knowledge sources. From 2000 news articles we construct a custom taxonomy with 10,000 concepts and 12,700 relations, similar in structure to manually created counterparts. Evaluation by 15 human judges shows the precision to be 89% and 90% for concepts and relations respectively; recall was 75% with respect to a manually generated taxonomy for the same domain.

Olena Medelyan, Steve Manion, Jeen Broekstra, Anna Divoli, Anna-Lan Huang, Ian H. Witten
Semantic Multimedia Information Retrieval Based on Contextual Descriptions

Semantic analysis and annotation of textual information with appropriate semantic entities is an essential task to enable content-based search on the annotated data. For video resources, textual information is rare at first sight. But in recent years the development of technologies for automatic extraction of textual information from audio-visual content has advanced. Additionally, video portals allow videos to be annotated with tags and comments by authors as well as users. All this information taken together forms video metadata, which is manifold in various ways. By making use of the characteristics of the different metadata types, context can be determined to enable sound and reliable semantic analysis and to support accuracy in understanding the video’s content. This paper proposes a description model of video metadata for semantic analysis taking into account various contextual factors.

Nadine Steinmetz, Harald Sack
Automatic Expansion of DBpedia Exploiting Wikipedia Cross-Language Information

DBpedia is a project aiming to represent Wikipedia content in RDF triples. It plays a central role in the Semantic Web, due to the large and growing number of resources linked to it. Nowadays, only 1.7M Wikipedia pages are deeply classified in the DBpedia ontology, although the English Wikipedia contains almost 4M pages, showing a clear problem of coverage. In other languages (like French and Spanish) this coverage is even lower. The objective of this paper is to define a methodology to increase the coverage of DBpedia in different languages. The major problems that we have to solve concern the high number of classes involved in the DBpedia ontology and the lack of coverage for some classes in certain languages. In order to deal with these problems, we first extend the population of the classes for the different languages by connecting the corresponding Wikipedia pages through cross-language links. Then, we train a supervised classifier using this extended set as training data. We evaluated our system using a manually annotated test set, demonstrating that our approach can add more than 1M new entities to DBpedia with high precision (90%) and recall (50%). The resulting resource is available through a SPARQL endpoint and a downloadable package.

Alessio Palmero Aprosio, Claudio Giuliano, Alberto Lavelli
A Support Framework for Argumentative Discussions Management in the Web

On the Web, wiki-like platforms allow users to provide arguments in favor of or against issues proposed by other users. The increasing content of these platforms, as well as the high number of revisions of the content through pro and con arguments, makes it difficult for community managers to understand and manage these discussions. In this paper, we propose an automatic framework to support the management of argumentative discussions in wiki-like platforms. Our framework is composed of (i) a natural language module, which automatically detects the arguments in natural language and returns the relations among them, and (ii) an argumentation module, which provides the overall view of the argumentative discussion in the form of a directed graph highlighting the accepted arguments. Experiments on the history of Wikipedia show the feasibility of our approach.

Elena Cabrio, Serena Villata, Fabien Gandon
A Multilingual Semantic Wiki Based on Attempto Controlled English and Grammatical Framework

We describe a semantic wiki system with an underlying controlled natural language grammar implemented in Grammatical Framework (GF). The grammar restricts the wiki content to a well-defined subset of Attempto Controlled English (ACE), and facilitates a precise bidirectional automatic translation between ACE and language fragments of a number of other natural languages, making the wiki content accessible multilingually. Additionally, our approach allows for automatic translation into the Web Ontology Language (OWL), which enables automatic reasoning over the wiki content. The developed wiki environment thus allows users to build, query and view OWL knowledge bases via a user-friendly multilingual natural language interface. As a further feature, the underlying multilingual grammar is integrated into the wiki and can be collaboratively edited to extend the vocabulary of the wiki or even customize its sentence structures. This work demonstrates the combination of the existing technologies of Attempto Controlled English and Grammatical Framework, and is implemented as an extension of the existing semantic wiki engine AceWiki.

Kaarel Kaljurand, Tobias Kuhn

Machine Learning

COALA – Correlation-Aware Active Learning of Link Specifications

Link Discovery plays a central role in the creation of knowledge bases that abide by the five Linked Data principles. Over the last years, several active learning approaches have been developed and used to facilitate the supervised learning of link specifications. Yet so far, these approaches have not taken the correlation between unlabeled examples into account when requiring labels from their user. In this paper, we address exactly this drawback by presenting the concept of the correlation-aware active learning of link specifications. We then present two generic approaches that implement this concept. The first approach is based on graph clustering and can make use of intra-class correlation. The second relies on the activation-spreading paradigm and can make use of both intra- and inter-class correlations. We evaluate the accuracy of these approaches and compare them against a state-of-the-art link specification learning approach in ten different settings. Our results show that our approaches outperform the state of the art by leading to specifications with higher F-scores.

Axel-Cyrille Ngonga Ngomo, Klaus Lyko, Victor Christen
Transductive Inference for Class-Membership Propagation in Web Ontologies

The increasing availability of structured machine-processable knowledge in the context of the Semantic Web allows for inductive methods to back and complement purely deductive reasoning in tasks where the latter may fall short. This work proposes a new method for similarity-based class-membership prediction in this context. The underlying idea is the propagation of class-membership information among similar individuals. The resulting method is essentially non-parametric and it is characterized by interesting complexity properties, which make it a candidate for the application of transductive inference to large-scale contexts. We also show an empirical evaluation of the method with respect to other approaches based on inductive inference in the related literature.

Pasquale Minervini, Claudia d’Amato, Nicola Fanizzi, Floriana Esposito

Social Web and Web Science

Measuring the Topical Specificity of Online Communities

For community managers and hosts it is not only important to identify the current key topics of a community but also to assess the specificity level of the community, for (a) creating sub-communities and (b) anticipating community behaviour and topical evolution. In this paper we present an approach that empirically characterises the topical specificity of online community forums by measuring the abstraction of semantic concepts discussed within such forums. We present a range of concept abstraction measures that function over concept graphs - i.e. resource type-hierarchies and SKOS category structures - and demonstrate the efficacy of our method with an empirical evaluation using a ground-truth ranking of forums. Our results show that the proposed approach outperforms a random baseline and that resource type-hierarchies work well when predicting the topical specificity of any forum with various abstraction measures.

Matthew Rowe, Claudia Wagner, Markus Strohmaier, Harith Alani
Broadening the Scope of Nanopublications

In this paper, we present an approach for extending the existing concept of nanopublications — tiny entities of scientific results in RDF representation — to broaden their application range. The proposed extension uses English sentences to represent informal and underspecified scientific claims. These sentences follow a syntactic and semantic scheme that we call AIDA (Atomic, Independent, Declarative, Absolute), which provides a uniform and succinct representation of scientific assertions. Such AIDA nanopublications are compatible with the existing nanopublication concept and enjoy most of its advantages such as information sharing, interlinking of scientific findings, and detailed attribution, while being more flexible and applicable to a much wider range of scientific results. We show that users are able to create AIDA sentences for given scientific results quickly and at high quality, and that it is feasible to automatically extract and interlink AIDA nanopublications from existing unstructured data sources. To demonstrate our approach, a web-based interface is introduced, which also exemplifies the use of nanopublications for non-scientific content, including meta-nanopublications that describe other nanopublications.

Tobias Kuhn, Paolo Emilio Barbano, Mate Levente Nagy, Michael Krauthammer
The Wisdom of the Audience: An Empirical Study of Social Semantics in Twitter Streams

Interpreting the meaning of a document represents a fundamental challenge for current semantic analysis methods. One interesting aspect mostly neglected by existing methods is that authors of a document usually assume certain background knowledge of their intended audience. Based on this knowledge, authors usually decide what to communicate and how to communicate it. Traditionally, this kind of knowledge has been elusive to semantic analysis methods. However, with the rise of social media such as Twitter, background knowledge of intended audiences (i.e., the community of potential readers) has become explicit to some extent, i.e., it can be modeled and estimated. In this paper, we (i) systematically compare different methods for estimating background knowledge of different audiences on Twitter and (ii) investigate to what extent the background knowledge of audiences is useful for interpreting the meaning of social media messages. We find that estimating the background knowledge of social media audiences may indeed be useful for interpreting the meaning of social media messages, but that its utility depends on manifested structural characteristics of message streams.

Claudia Wagner, Philipp Singer, Lisa Posch, Markus Strohmaier

Cognition and Semantic Web

Collecting Links between Entities Ranked by Human Association Strengths

In recent years, the ongoing adoption of Semantic Web technologies has led to a large amount of Linked Data being generated. While in the early days of the Semantic Web we were fighting data scarcity, nowadays we suffer from an overflow of information. In many situations we want to restrict the facts that are shown to an end-user or passed on to another system to just the most important ones.

In this paper we propose to rank facts in accordance with human association strengths between concepts. In order to collect a ground truth, we developed a Family Feud-like web game called “Knowledge Test Game”. Given a Linked Data entity, it collects other associated Linked Data entities from its players. We explain the game’s concept and its suggestion box, which maps the players’ text input back to Linked Data entities, and include a detailed evaluation of the game showing promising results. The collected data is published and can be used to evaluate algorithms which rank facts.

Jörn Hees, Mohamed Khamis, Ralf Biedert, Slim Abdennadher, Andreas Dengel
Personalized Concept-Based Search and Exploration on the Web of Data Using Results Categorization

As the size of the Linked Open Data (LOD) cloud increases, searching and exploring LOD becomes more challenging. To overcome this issue, we propose a novel personalized search and exploration mechanism for the Web of Data (WoD) based on concept-based results categorization. In our approach, search results (LOD resources) are conceptually categorized into UMBEL concepts to form concept lenses, which assist exploratory search and browsing. When the user selects a concept lens for exploration, results are immediately personalized. In particular, all concept lenses are personally re-organized according to their similarity to the selected concept lens using a similarity measure. Within the selected concept lens, more relevant results are included using results re-ranking and query expansion, and relevant concept lenses are suggested to support results exploration. This is an innovative feature offered by our approach since it allows dynamic adaptation of results to the user’s local choices. We also support interactive personalization; when the user clicks on a result within the interacted lens, relevant categories and results are included using results re-ranking and query expansion. Our personalization approach is non-intrusive, privacy-preserving and scalable, since it does not require login and is implemented on the client side. To evaluate the efficacy of the proposed personalized search, a benchmark was created in the tourism domain. The results showed that the proposed approach performs significantly better than a non-adaptive baseline concept-based search and a traditional ranked-list presentation.

Melike Sah, Vincent Wade
Combining a Co-occurrence-Based and a Semantic Measure for Entity Linking

One key feature of the Semantic Web lies in the ability to link related Web resources. However, while relations within particular datasets are often well-defined, links between disparate datasets and corpora of Web resources are rare. The increasingly widespread use of cross-domain reference datasets, such as Freebase and DBpedia for annotating and enriching datasets as well as documents, opens up opportunities to exploit their inherent semantic relationships to align disparate Web resources. In this paper, we present a combined approach to uncover relationships between disparate entities which exploits (a) graph analysis of reference datasets together with (b) entity co-occurrence on the Web with the help of search engines. In (a), we introduce a novel approach adopted and applied from social network theory to measure the connectivity between given entities in reference datasets. The connectivity measures are used to identify connected Web resources. Finally, we present a thorough evaluation of our approach using a publicly available dataset and introduce a comparison with established measures in the field.
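
The combination step can be sketched as a weighted sum. In the Python toy below, the co-occurrence side is a Normalized Google Distance style score computed from hypothetical search engine hit counts, and the connectivity score is assumed to come from graph analysis of a reference dataset; the weighting and the NGD stand-in are editorial choices, not the paper’s exact measures.

    from math import log

    def ngd(hits_a, hits_b, hits_ab, n_docs=1e12):
        """Normalized Google Distance over (hypothetical) hit counts."""
        la, lb, lab = log(hits_a), log(hits_b), log(hits_ab)
        return (max(la, lb) - lab) / (log(n_docs) - min(la, lb))

    def combined_relatedness(hits_a, hits_b, hits_ab, connectivity, alpha=0.5):
        # Turn the distance into a similarity, then mix it with the
        # graph-based connectivity score (assumed to lie in [0, 1]).
        cooccurrence = max(0.0, 1.0 - ngd(hits_a, hits_b, hits_ab))
        return alpha * cooccurrence + (1 - alpha) * connectivity

    # Frequent web co-occurrence and a well-connected path in the
    # reference graph both push the combined score up.
    print(combined_relatedness(5e6, 3e6, 8e5, connectivity=0.7))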

Bernardo Pereira Nunes, Stefan Dietze, Marco Antonio Casanova, Ricardo Kawase, Besnik Fetahu, Wolfgang Nejdl

In-Use and Industrial Track

Publishing Bibliographic Records on the Web of Data: Opportunities for the BnF (French National Library)

Linked open data tools have been implemented through data.bnf.fr, a project which aims at making the BnF data more useful on the Web. data.bnf.fr gathers data automatically from different databases on pages about authors, works and themes. Online since July 2011, it is still under development and has already received feedback from several users.

First, the article will present the issues linked to our data and stress the importance of useful links and of persistency for archival purposes. We will discuss our solution and methodology, showing their strengths and weaknesses, to create new services for the library. An insight into the ontology and vocabularies will be given, with a “business” view of the interaction between rich RDF ontologies and lightweight HTML-embedded data such as schema.org. The broader question of libraries on the Semantic Web will be addressed so as to help specify similar projects.

Agnès Simon, Romain Wenz, Vincent Michel, Adrien Di Mascio
Hafslund Sesam – An Archive on Semantics

Sesam is an archive system developed for Hafslund, a Norwegian energy company. It achieves the often-sought but rarely-achieved goal of automatically enriching metadata by using semantic technologies to extract and integrate business data from business applications. The extracted data is also indexed with a search engine together with the archived documents, allowing true enterprise search.

Lars Marius Garshol, Axel Borge
Connecting the Smithsonian American Art Museum to the Linked Data Cloud

Museums around the world have built databases with metadata about millions of objects, their history, the people who created them, and the entities they represent. This data is stored in proprietary databases and is not readily available for use. Recently, museums embraced the Semantic Web as a means to make this data available to the world, but the experience so far shows that publishing museum data to the linked data cloud is difficult: the databases are large and complex, the information is richly structured and varies from museum to museum, and it is difficult to link the data to other datasets. This paper describes the process and lessons learned in publishing the data from the Smithsonian American Art Museum (SAAM). We highlight complexities of the database-to-RDF mapping process, discuss our experience linking the SAAM dataset to hub datasets such as DBpedia and the Getty Vocabularies, and present our experience in allowing SAAM personnel to review the information to verify that it meets the high standards of the Smithsonian. Using our tools, we helped SAAM publish high-quality linked data of their complete holdings (41,000 objects and 8,000 artists).

Pedro Szekely, Craig A. Knoblock, Fengyu Yang, Xuming Zhu, Eleanor E. Fink, Rachel Allen, Georgina Goodlander
Guiding the Evolution of a Multilingual Ontology in a Concrete Setting

Evolving complex artifacts such as multilingual ontologies is a difficult activity that demands the involvement of different roles and guidelines to drive and coordinate them. We present the methodology and the underlying tool that have been used in the context of the Organic.Lingua project for the collaborative evolution of the multilingual Organic Agriculture ontology. Findings gathered from a quantitative and a qualitative evaluation of the experience are reported, revealing the usefulness of the methodology used in synergy with the tool.

Mauro Dragoni, Chiara Di Francescomarino, Chiara Ghidini, Julia Clemente, Salvador Sánchez Alonso
Using BMEcat Catalogs as a Lever for Product Master Data on the Semantic Web

To date, the automatic exchange of product information between business partners in a value chain is typically done using Business-to-Business (B2B) catalog standards such as EDIFACT, cXML, or BMEcat. At the same time, the Web of Data, in particular the GoodRelations vocabulary, offers the necessary means to publish highly-structured product data in a machine-readable format. The advantage of the publication of rich product descriptions can be manifold, including better integration and exchange of information between Web applications, high-quality data along the various stages of the value chain, or the opportunity to support more precise and more effective searches. In this paper, we (1) stress the importance of rich product master data for e-commerce on the Semantic Web, and (2) present a tool to convert BMEcat XML data sources into an RDF-based data model anchored in the GoodRelations vocabulary. The benefits of our proposal are tested using product data collected from a set of 2500+ online retailers of varying sizes and domains.

Alex Stolz, Benedicto Rodriguez-Castro, Martin Hepp

PhD Symposium

Ontology-Supported Document Ranking for Novelty Search

Within specific domains, users generally face the challenge of populating an ontology according to their needs. Especially in the case of novelty detection and forecasting, the user wants to integrate novel information contained in natural-language text documents into his/her own ontology in order to utilise the knowledge base in a further step. In this paper, a semantic document ranking approach is proposed which serves as a prerequisite for ontology population. By using the underlying ontology for both query generation and document ranking, the query and the ranking are structured and therefore promise to provide a better ranking in terms of relevance and novelty than would be possible without using semantics.

Michael Färber
Semantic Web for the Humanities

Researchers have been interested recently in publishing and linking Humanities datasets following Linked Data principles. This has given rise to some issues that complicate the semantic modelling, comparison, combination and longitudinal analysis of these datasets. In this research proposal we discuss three of these issues: representation round-tripping, concept drift, and contextual knowledge. We advocate an integrated approach to solve them, and present some preliminary results.

Albert Meroño-Peñuela
Maintaining Mappings Valid between Dynamic KOS

Knowledge Organization Systems (KOS) and the existing mappings between them have become extremely relevant in semantic-enabled systems, especially for interoperability reasons. KOS may have a dynamic nature, since knowledge in a lot of domains evolves fast, and thus KOS evolution can potentially impact mappings, turning them unreliable. A still open research problem is how to adapt mappings in the course of KOS evolution without re-computing semantic correspondences between elements of the involved KOS. This PhD study tackles this issue by proposing an approach for adapting mappings according to KOS changes. A framework is conceptualized with a mechanism to support the maintenance of mappings over time, keeping them valid. This proposal will decrease the effort needed to keep mappings up-to-date.

Julio Cesar Dos Reis
Automatic Argumentation Extraction

This extended abstract outlines the area of automatic argumentation extraction. The state of the art is discussed, and how it has influenced the proposed direction of this work. This research aims to provide decision support by automatically extracting argumentation from natural language, enabling a decision maker to follow more closely the reasoning process, to examine premises and counter-arguments, and to reach better informed decisions.

Alan Sergeant
Guided Composition of Tasks with Logical Information Systems - Application to Data Analysis Workflows in Bioinformatics

In a number of domains, particularly in bioinformatics, there is a need for complex data analysis. To this end, elementary data analysis operations called tasks are composed into workflows. The composition of tasks is, however, difficult due to the distributed and heterogeneous resources of bioinformatics. This doctoral work will address the composition of tasks using Logical Information Systems (LIS). LIS let users build complex queries and updates over Semantic Web data through guided navigation, suggesting relevant pieces and updates at each step. The objective is to use semantics to describe bioinformatic tasks and to adapt the guided approach of Sewelis, a LIS Semantic Web tool, to the composition of tasks. We aim at providing a tool that supports guided composition of Semantic Web services in bioinformatics, and that will support biologists in designing workflows for complex data analysis.

Mouhamadou Ba
Storing and Provisioning Linked Data as a Service

Linked Data offers novel opportunities for aggregating information about a wide range of topics and for a multitude of applications. While the technical specifications of Linked Data have been a major research undertaking for the last decade, there is still a lack of real-world data and applications exploiting this data. Partly, this is due to the fact that datasets remain isolated from one another and their integration is a non-trivial task. In this work, we argue for a Data-as-a-Service approach combining both warehousing and query federation to discover and consume Linked Data. We compare our work to state-of-the-art approaches for discovering, integrating, and consuming Linked Data. Moreover, we illustrate a number of challenges when combining warehousing with federation features, and highlight key aspects of our research.

Johannes Lorey
Interlinking Cross-Lingual RDF Data Sets

Linked Open Data is an essential part of the Semantic Web. More and more data sets are published in natural languages, comprising not only English but other languages as well. It becomes necessary to link the same entities distributed across different RDF data sets. This paper is an initial outline of the research to be conducted on cross-lingual RDF data set interlinking, and it presents several ideas on how to approach this problem.

Tatiana Lesnikova
Trusting Semi-structured Web Data

The growth of the Web brings an enormous amount of useful information to everybody who can access it. These data are often crowdsourced or provided by heterogeneous or unknown sources; therefore, they might be maliciously manipulated or unreliable. Moreover, because of their sheer volume it is often impossible to check them extensively, and this gives rise to massive and ever-growing trust issues. The research presented in this paper aims at investigating the use of data sources and reasoning techniques to address trust issues about Web data. In particular, these investigations include the use of trusted Web sources, of uncertainty reasoning, of semantic similarity measures and of provenance information as possible bases for trust estimation. The intended result of this thesis is a series of analyses and tools that allow us to better understand and address the problem of trusting semi-structured Web data.

Davide Ceolin
Augmented Reality Supported by Semantic Web Technologies

Augmented Reality applications are more and more widely used nowadays. With their help, the real physical environment can be extended with computer-generated virtual elements. These virtual elements can be, for example, important context-aware information. The Semantic Web makes it possible, among other things, to handle data that come from heterogeneous sources. As a result, we have the opportunity to combine Semantic Web and Augmented Reality technologies and to exploit the benefits of their combination. The resulting system may be suitable for daily use, with a wide range of applications in the fields of tourism, entertainment, navigation, ambient assisted living, etc. The purpose of my research is to develop a prototype of a general framework that satisfies the above criteria.

Tamás Matuszka
Search Result Ontologies for Digital Libraries

This PhD investigates a novel architecture for digital libraries. This architecture should enable search processes to return instances of result core ontologies (henceforth called result ontologies) linked to documents found within a digital library. Such result ontologies would describe a search result more comprehensively, concisely and coherently. Other applications can then access these result ontologies via the web. This outcome should be achieved by introducing a modular ontology repository and an automatic ontology learning methodology for documents stored in a digital library. Current limitations in terms of automatic extraction of ontologies should be overcome with the help of seed ontologies, deep natural language processing techniques and weights applied to newly added concepts. The modular ontology repository will comprise a top-level ontology layer, a core ontology layer, and a document and result ontology layer.

Emanuel Reiterer
Semantically Assisted Workflow Patterns for the Social Web

The abundance of discussions in the Social Web has altered the way that people consume products and services. This PhD topic aims to materialise a novel approach to assist online communication in the Social Web by combining workflow patterns and behaviour modelling. Semantic Web technologies are considered beneficial in various aspects of this approach, like in the behaviour modelling, personalisation and context-aware workflows.

Ioannis Stavrakantonakis
An Architecture to Aggregate Heterogeneous and Semantic Sensed Data

We are surrounded by sensor networks for healthcare, home or environmental monitoring, weather forecasting, etc. The sensor-based applications proposed so far are domain-specific. We aim to link these heterogeneous sensor networks in order to propose promising applications. Existing applications add semantics to the sensor networks, more specifically to the context, rather than to the sensed data. We propose an architecture to merge heterogeneous sensor networks, convert measurements into semantic data and reason over them.

Amelie Gyrard
Linked Data Interfaces for Non-expert Users

Linked Data has become an essential part of the Semantic Web. A lot of Linked Data is already available in the Linked Open Data cloud, which keeps growing due to an influx of new data from research and open government activities. However, it is still quite difficult to access this wealth of semantically enriched data directly without having in-depth knowledge about SPARQL and related semantic technologies. The presented dissertation explores Linked Data interfaces for non-expert users, especially keyword search as an entry point and tabular interfaces for filtering and exploration. It also looks at the value chain surrounding Linked Data and the possibilities that open up when people without a background in computer science can easily access Linked Data.

Patrick Hoefler
Event Matching Using Semantic and Spatial Memories

We address the problem of real-time matching and correlation of events which are detected and reported by humans. As on Twitter, Facebook, blogs and phone calls, the streams of reported events are unstructured and require intensive manual processing. The plethora of events and their different types calls for a flexible model and a representation language that allows us to encode them for online processing. Current approaches in complex event processing and stream reasoning focus on temporal relationships between composite events and usually refer to pre-defined sensor locations. We propose a methodology and a computational framework for matching and correlating, based on their content, atomic and complex events which have no pre-defined schemas. Matching evaluation on real events shows significant improvement compared to the manual matching process.

Majed Ayyad
Incremental SPARQL Query Processing

The number of linked data sources available on the Web is growing at a rapid rate. Moreover, users are showing an interest in any framework that allows them to obtain answers to a formulated query by accessing heterogeneous data sources without the need to explicitly specify the sources used to answer the query. Our proposal focuses on this interest, and its goal is to build a system capable of answering user queries in an incremental way: each time a different data source is accessed, the previous answer is enriched. Brokering across the data sources is enabled by using source mapping relationships. User queries are rewritten using those mappings in order to obtain translations of the original query across data sources. Semantically equivalent translations are looked for first, but semantically approximate ones are generated if equivalence is not achieved. Well-defined metrics are considered to estimate the information loss, if any.

Ana I. Torre-Bastida
Knowledge Point-Based Approach to Interlink Open Education Resources

With more and more Open Education Resources (OER) courses being recognised and acknowledged by global learners, an emerging issue is that learners’ self-efficacy is often affected by the lack of interaction between peers and instructors in a continuous self-learning process. This paper proposes a low-level Knowledge Point-based approach that serves application layers in enhancing interaction during self-learning. This is achieved by taking advantage of Semantic Web and Linked Data techniques to annotate and interlink OER fragments, which can later be reused and interoperated more conveniently.

Xinglong Ma
A Linked Data Reasoner in the Cloud

Over the last decade, the paradigm of Linked Data has gained momentum. It is possible to leverage implicit knowledge from these data using a reasoner. Nevertheless, current methods for reasoning over linked data are well suited for small to medium datasets, and they fail at reaching the scale of the Web of Data. In this PhD thesis, we are interested in how distributed computing in the Cloud can help a linked data reasoner to scale. We present in this paper the early state of this thesis.

Jules Chevalier
Backmatter
Metadata
Title
The Semantic Web: Semantics and Big Data
Edited by
Philipp Cimiano
Oscar Corcho
Valentina Presutti
Laura Hollink
Sebastian Rudolph
Copyright year
2013
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-642-38288-8
Print ISBN
978-3-642-38287-1
DOI
https://doi.org/10.1007/978-3-642-38288-8