Semantic Data Management, Big Data, and Scalability Track

Frontmatter

Traffic Analytics for Linked Data Publishers

We present a traffic analytics platform for servers that publish Linked Data. To the best of our knowledge, this is the first system that mines the access logs of registered Linked Data servers to extract traffic insights on a daily basis and without human intervention. The framework extracts Linked Data-specific traffic metrics from log records of HTTP lookups and SPARQL queries, and provides insights not available in traditional web analytics tools. Among other things, we detect visitor sessions with a variant of hierarchical agglomerative clustering. We also identify workload peaks of SPARQL endpoints by detecting heavy and light SPARQL queries with supervised learning. The platform has been tested on 13 months of access logs of the British National Bibliography RDF dataset.

Luca Costabello, Pierre-Yves Vandenbussche, Gofran Shukair, Corine Deliot, Neil Wilson
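
The session detection step in Traffic Analytics for Linked Data Publishers can be illustrated with a minimal sketch. The abstract does not say which clustering variant is used, so this assumes plain single-linkage agglomerative clustering over one visitor's request timestamps, with a hypothetical cutoff `max_gap` (in seconds); it is not the authors' implementation. On a single time axis, single linkage with a distance cutoff reduces to cutting the sorted timestamps wherever two consecutive requests are more than `max_gap` apart.

```python
from typing import List


def sessionize(timestamps: List[float], max_gap: float) -> List[List[float]]:
    """Single-linkage agglomerative clustering of one visitor's request times.

    Single linkage starts from singleton clusters and keeps merging the two
    clusters whose closest members are nearest, stopping once that distance
    exceeds max_gap. On a line, the closest members of neighbouring clusters
    are consecutive sorted timestamps, so the result equals cutting the
    sorted sequence at every gap larger than max_gap.
    """
    if not timestamps:
        return []
    ordered = sorted(timestamps)
    sessions = [[ordered[0]]]
    for prev, curr in zip(ordered, ordered[1:]):
        if curr - prev > max_gap:
            sessions.append([curr])    # gap above cutoff: start a new session
        else:
            sessions[-1].append(curr)  # within cutoff: same cluster
    return sessions


# Hypothetical request times (seconds) for one visitor, e.g. one IP/user-agent pair.
hits = [0, 30, 65, 4000, 4010, 9000]
print(sessionize(hits, max_gap=1800))  # [[0, 30, 65], [4000, 4010], [9000]]
```

A fixed cutoff is the simplest stopping rule; in practice it would presumably be tuned per dataset.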

Explaining Graph Navigational Queries

Graph navigational languages allow users to specify pairs of nodes in a graph subject to the existence of paths satisfying a certain regular expression. Under this evaluation semantics, connectivity information in terms of the intermediate nodes/edges that contributed to the answer is lost. The goal of this paper is to introduce the GeL language, which provides query evaluation semantics able to also capture connectivity information and output graphs. We show how this is useful to produce query explanations. We present efficient algorithms to produce explanations and discuss their complexity. GeL's machinery is made available in existing SPARQL processors through a translation from GeL queries into SPARQL CONSTRUCT queries. We outline examples of explanations obtained with a tool implementing our framework and report on an experimental evaluation that investigates the overhead of producing explanations.

Valeria Fionda, Giuseppe Pirrò

A SPARQL Extension for Generating RDF from Heterogeneous Formats

RDF aims at being the universal abstract data model for structured data on the Web. While there is ongoing effort to convert data to RDF, the vast majority of data available on the Web does not conform to RDF. Indeed, exposing data in RDF, either natively or through wrappers, can be very costly. Furthermore, in the emerging Web of Things, the resource constraints of devices prevent them from processing RDF graphs. Hence one cannot expect that all the data on the Web will be available as RDF anytime soon. Several tools can generate RDF from non-RDF data, and transformation or mapping languages have been designed to offer more flexible solutions (GRDDL, XSPARQL, R2RML, RML, CSVW, etc.). In this paper, we introduce a new language, SPARQL-Generate, that generates RDF from: (i) an RDF Dataset, and (ii) a set of documents in arbitrary formats. As SPARQL-Generate is designed as an extension of SPARQL 1.1, it can provably: (i) be implemented on top of any existing SPARQL engine, and (ii) leverage the SPARQL extension mechanism to deal with an open set of formats. Furthermore, we show evidence that (iii) it can be easily learned by knowledge engineers who know SPARQL 1.1, and (iv) our first naive open source implementation performs better than the reference implementation of RML for big transformations.

Maxime Lefrançois, Antoine Zimmermann, Noorani Bakerally

Linked Data Track

Exploiting Source-Object Networks to Resolve Object Conflicts in Linked Data

Considerable effort has been exerted to increase the scale of Linked Data. However, an inevitable problem arises when dealing with data integration from multiple sources. Various sources often provide conflicting objects for a certain predicate of the same real-world entity, thereby causing the so-called object conflict problem. At present, the object conflict problem has not received sufficient attention in the Linked Data community. Thus, in this paper, we first formalize object conflict resolution as computing the joint distribution of variables on a heterogeneous information network called the Source-Object Network, which successfully captures three correlations from objects and Linked Data sources. Then, we introduce a novel approach based on network effects, called ObResolution (object resolution), to identify a true object from multiple conflicting objects. ObResolution adopts a pairwise Markov Random Field (pMRF) to model all evidence under a unified framework. Extensive experimental results on six real-world datasets show that our method achieves higher accuracy than existing approaches and that it is robust and consistent across various domains.

Wenqiang Liu, Jun Liu, Haimeng Duan, Wei Hu, Bifan Wei

Methods for Intrinsic Evaluation of Links in the Web of Data

The current Web of Data contains a large amount of interlinked data. However, there is still a limited understanding of the quality of the links connecting entities of different and distributed data sets. Our goal is to provide a collection of indicators that help assess existing interlinking. In this paper, we present a framework for the intrinsic evaluation of RDF links, based on core principles of Web data integration and foundations of Information Retrieval. We measure the extent to which links facilitate the discovery of an extended description of entities, and the discovery of other entities in other data sets. We also measure the use of different vocabularies. Using these measures, we analysed links extracted from a set of data sets from the Linked Data Crawl 2014.

Cristina Sarasua, Steffen Staab, Matthias Thimm

Entity Deduplication on ScholarlyData

ScholarlyData is the new and currently the largest reference linked dataset of the Semantic Web community about papers, people, organisations, and events related to its academic conferences. Originally started from the Semantic Web Dog Food (SWDF), it addressed multiple issues on data representation and maintenance by (i) adopting a novel data model and (ii) establishing an open source workflow to support the addition of new data from the community. Nevertheless, the major issue with the current dataset is the presence of multiple URIs for the same entities, typically persons and organisations. In this work we: (i) perform entity deduplication on the whole dataset, using supervised classification methods; (ii) devise a protocol to choose the most representative URI for an entity and deprecate duplicate ones, while ensuring backward compatibility for them; (iii) incorporate the automatic deduplication step in the general workflow to reduce the creation of duplicate URIs when adding new data. Our early experiments focused on person and organisation URIs, and the results show significant improvement over state-of-the-art solutions. We managed to consolidate, on the entire dataset, over 100 and 800 pairs of duplicate person and organisation URIs and their associated triples (over 1,800 and 5,000) respectively, hence significantly improving the overall quality and connectivity of the data graph. With deduplication integrated into the ScholarlyData data publishing workflow, we believe that this work serves as a major step towards the creation of clean, high-quality scholarly linked data on the Semantic Web.

Ziqi Zhang, Andrea Giovanni Nuzzolese, Anna Lisa Gentile

Machine Learning Track

Wombat – A Generalization Approach for Automatic Link Discovery

A significant portion of the evolution of Linked Data datasets lies in updating the links to other datasets. An important challenge when aiming to update these links automatically under the open-world assumption is the fact that usually only positive examples for the links exist. We address this challenge by presenting and evaluating Wombat, a novel approach for the discovery of links between knowledge bases that relies exclusively on positive examples. Wombat is based on generalisation via an upward refinement operator to traverse the space of link specifications. We study the theoretical characteristics of Wombat and evaluate it on 8 different benchmark datasets. Our evaluation suggests that Wombat outperforms state-of-the-art supervised approaches while relying on less information. Moreover, our evaluation suggests that Wombat’s pruning algorithm allows it to scale well even on large datasets.

Mohamed Ahmed Sherif, Axel-Cyrille Ngonga Ngomo, Jens Lehmann

Actively Learning to Rank Semantic Associations for Personalized Contextual Exploration of Knowledge Graphs

Knowledge Graphs (KGs) represent a large amount of Semantic Associations (SAs), i.e., chains of relations that may reveal interesting and unknown connections between different types of entities. Applications for the contextual exploration of KGs help users explore information extracted from a KG, including SAs, while they are reading an input text. Because of the large number of SAs that can be extracted from a text, a first challenge in these applications is to effectively determine which SAs are most interesting to the users, i.e., to define a suitable ranking function over SAs. However, since different users may have different interests, an additional challenge is to personalize this ranking function to match individual users’ preferences. In this paper we introduce a novel active learning to rank model that lets a user rate small samples of SAs, which are used to iteratively learn a personalized ranking function. Experiments conducted with two data sets show that the approach is able to improve the quality of the ranking function with a limited number of user interactions.

Federico Bianchi, Matteo Palmonari, Marco Cremaschi, Elisabetta Fersini

Synthesizing Knowledge Graphs for Link and Type Prediction Benchmarking

Despite the growing amount of research in link and type prediction in knowledge graphs, systematic benchmark datasets are still scarce. In this paper, we propose a synthesis model for the generation of benchmark datasets for those tasks. Synthesizing data is a way of having control over important characteristics of the data, and allows the study of the impact of such characteristics on the performance of different methods. The proposed model uses existing knowledge graphs to create synthetic graphs with similar characteristics, such as distributions of classes, relations, and instances. As a first step, we replicate already existing knowledge graphs in order to validate the synthesis model. To do so, we perform extensive experiments with different link and type prediction methods. We show that we can systematically create knowledge graph benchmarks which allow for quantitative measurements of the result quality and scalability of link and type prediction methods.

André Melo, Heiko Paulheim

Online Relation Alignment for Linked Datasets

The large number of linked datasets on the Web and their diversity in terms of schema representation have led to a fragmented dataset landscape. Querying and addressing information needs that span disparate datasets requires the alignment of such schemas. The majority of schema and ontology alignment approaches focus exclusively on class alignment. Yet, relation alignment has not been fully addressed, and existing approaches fall short in addressing the dynamics of datasets and their size. In this work, we address the problem of relation alignment across disparate linked datasets. Our approach focuses on two main aspects. First, online relation alignment, where we do not require full access to the datasets and instead sample a minimal subset of the data. Thus, we address the main limitation of existing work in dealing with the large scale of linked datasets, as well as cases where the datasets provide only query access. Second, we learn supervised machine learning models for which we employ various features or matchers that account for the diversity of linked datasets at the instance level. We perform an experimental evaluation on real-world linked datasets: DBpedia, YAGO, and Freebase. The results show superior performance against state-of-the-art approaches in schema matching, with an average relation alignment accuracy of 84%. In addition, we show that relation alignment can be performed efficiently at scale.

Maria Koutraki, Nicoleta Preda, Dan Vodislav

Tuning Personalized PageRank for Semantics-Aware Recommendations Based on Linked Open Data

In this article we investigate how the knowledge available in the Linked Open Data cloud (LOD) can be exploited to improve the effectiveness of a semantics-aware graph-based recommendation framework based on Personalized PageRank (PPR). In our approach we extended the classic bipartite data model, in which only user-item connections are modeled, by injecting the exogenous knowledge about the items that is available in the LOD cloud. Our approach works in two steps: first, all the available items are automatically mapped to a DBpedia node; next, the resources gathered from DBpedia that describe the items are connected to the item nodes, thus enriching the original representation and giving rise to a tripartite data model. Such a data model can be exploited to provide users with recommendations by running PPR against the resulting representation and suggesting the items with the highest PageRank scores. In the experimental evaluation we showed that our semantics-aware recommendation framework exploiting DBpedia and PPR can outperform several state-of-the-art approaches. Moreover, a proper tuning of PPR parameters, obtained by better distributing the weights among the nodes modeled in the graph, further improved the overall accuracy of the framework and confirmed the effectiveness of our strategy.

Cataldo Musto, Giovanni Semeraro, Marco de Gemmis, Pasquale Lops
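
The recommendation step described for Tuning Personalized PageRank for Semantics-Aware Recommendations (run PPR over the user-item-resource graph, then suggest the top-scoring unseen items) can be sketched with plain power iteration. This is a minimal illustration, assuming an unweighted undirected graph, uniform teleportation to the target user, and a damping factor of 0.85; the node names are hypothetical, and the authors' tuned weight distribution is not reproduced here.

```python
from typing import Dict, List, Set, Tuple


def undirected(edges: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Build an adjacency list, adding each edge in both directions."""
    graph: Dict[str, List[str]] = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)
    return graph


def personalized_pagerank(graph: Dict[str, List[str]], seeds: Set[str],
                          alpha: float = 0.85, iters: int = 50) -> Dict[str, float]:
    """Power iteration for PPR: with probability 1 - alpha the walk jumps back to a seed."""
    teleport = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in graph}
    rank = dict(teleport)
    for _ in range(iters):
        nxt = {n: (1 - alpha) * teleport[n] for n in graph}
        for node, neighbours in graph.items():
            if not neighbours:  # dangling node: send its mass back to the seeds
                for s in seeds:
                    nxt[s] += alpha * rank[node] / len(seeds)
                continue
            share = alpha * rank[node] / len(neighbours)
            for m in neighbours:
                nxt[m] += share
        rank = nxt
    return rank


# Hypothetical tripartite graph: user -- liked items -- DBpedia resources.
edges = [
    ("u1", "item:Matrix"), ("u1", "item:Inception"),
    ("item:Matrix", "dbr:Keanu_Reeves"), ("item:JohnWick", "dbr:Keanu_Reeves"),
    ("item:Inception", "dbr:Christopher_Nolan"),
    ("item:Interstellar", "dbr:Christopher_Nolan"),
    ("item:Matrix", "dbr:Science_fiction_film"),
    ("item:Inception", "dbr:Science_fiction_film"),
    ("item:Interstellar", "dbr:Science_fiction_film"),
    ("item:Notebook", "dbr:Romance_film"),
]
graph = undirected(edges)
scores = personalized_pagerank(graph, seeds={"u1"})
seen = set(graph["u1"])
recs = sorted((n for n in graph if n.startswith("item:") and n not in seen),
              key=lambda n: scores[n], reverse=True)
print(recs)  # ['item:Interstellar', 'item:JohnWick', 'item:Notebook']
```

Interstellar ranks first because it shares two DBpedia resources with the user's items, while the Notebook node is unreachable from the user and keeps a score of zero; this is the effect the injected LOD knowledge is meant to produce.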

Terminological Cluster Trees for Disjointness Axiom Discovery

Despite the benefits deriving from explicitly modeling concept disjointness to increase the quality of ontologies, the number of disjointness axioms in vocabularies for the Web of Data is still limited, thus risking leaving important constraints underspecified. Automated methods for discovering these axioms may represent a powerful modeling tool for knowledge engineers. For this purpose, we propose a machine learning solution that combines (unsupervised) distance-based clustering and the divide-and-conquer strategy. The resulting terminological cluster trees can be used to detect candidate disjointness axioms from emerging concept descriptions. A comparative empirical evaluation on different types of ontologies shows the feasibility and the effectiveness of the proposed solution, which may be regarded as complementary to current methods that require supervision or consider atomic concepts only.

Giuseppe Rizzo, Claudia d’Amato, Nicola Fanizzi, Floriana Esposito

Embedding Learning for Declarative Memories

The major components of the brain’s declarative or explicit memory are semantic memory and episodic memory. Whereas semantic memory stores general factual knowledge, episodic memory stores events together with their temporal and spatial contexts. We present mathematical models for declarative memories where we consider semantic memory to be represented by triples and episodes to be represented as quadruples, i.e., triples in time. For example, (Jack, receivedDiagnosis, Diabetes, Jan1) states that Jack was diagnosed with diabetes on January 1. Both from a cognitive and a technical perspective, an interesting research question is how declarative data can be efficiently stored and semantically decoded. We propose that a suitable data representation for episodic event data is a 4-way tensor with dimensions subject, predicate, object, and time. We demonstrate that the 4-way tensor can be decomposed, e.g., using a 4-way Tucker model, which permits semantic decoding of an event, as well as efficient storage. We also propose that semantic memory can be derived from the episodic model by a marginalization of the time dimension, which can be performed efficiently. We argue that the storage of episodic memory typically requires models with a high rank, whereas semantic memory can be modelled with a comparably lower rank. We experimentally analyse the relationship between episodic and semantic memory models and discuss potential relationships to the corresponding cognitive memories in the brain.

Volker Tresp, Yunpu Ma, Stephan Baier, Yinchong Yang
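
The claim in Embedding Learning for Declarative Memories that time can be marginalized efficiently in a Tucker model can be checked with a small sketch. Assuming a plain 4-way Tucker model x[s,p,o,t] = sum over i,j,k,l of G[i][j][k][l] A[s][i] B[p][j] C[o][k] T[t][l], summing over t only requires the column sums of the time factor T, so no time slice has to be reconstructed. The factor sizes and random values below are hypothetical, and summation is assumed as the marginalization; the abstract does not state the exact form the authors use.

```python
import itertools
import random
from typing import List

Matrix = List[List[float]]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.random() for _ in range(cols)] for _ in range(rows)]


def tucker_entry(G, A: Matrix, B: Matrix, C: Matrix, T: Matrix,
                 s: int, p: int, o: int, t: int) -> float:
    """x[s,p,o,t] = sum_{i,j,k,l} G[i][j][k][l] * A[s][i] * B[p][j] * C[o][k] * T[t][l]."""
    ranks = (len(A[0]), len(B[0]), len(C[0]), len(T[0]))
    return sum(G[i][j][k][l] * A[s][i] * B[p][j] * C[o][k] * T[t][l]
               for i, j, k, l in itertools.product(*(range(r) for r in ranks)))


def semantic_entry(G, A: Matrix, B: Matrix, C: Matrix, T: Matrix,
                   s: int, p: int, o: int) -> float:
    """Marginalize time inside the model: sum_t x[s,p,o,t] only needs the
    column sums of the time factor T, collapsed into a single 1 x r_t row."""
    t_sum = [[sum(row[l] for row in T) for l in range(len(T[0]))]]
    return tucker_entry(G, A, B, C, t_sum, s, p, o, 0)


rng = random.Random(0)
n_s, n_p, n_o, n_t = 3, 2, 3, 5   # hypothetical numbers of subjects, predicates, objects, time steps
r_s, r_p, r_o, r_t = 2, 2, 2, 3   # hypothetical Tucker ranks
A, B = rand_matrix(n_s, r_s, rng), rand_matrix(n_p, r_p, rng)
C, T = rand_matrix(n_o, r_o, rng), rand_matrix(n_t, r_t, rng)
G = [[[[rng.random() for _ in range(r_t)] for _ in range(r_o)]
      for _ in range(r_p)] for _ in range(r_s)]

explicit = sum(tucker_entry(G, A, B, C, T, 1, 0, 2, t) for t in range(n_t))
fast = semantic_entry(G, A, B, C, T, 1, 0, 2)
print(abs(explicit - fast) < 1e-9)  # True
```

The saving grows with the number of time steps: the explicit route touches every slice, while the model-side route sums the time factor once and reuses it for every (subject, predicate, object) query.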

Mobile Web, Sensors, and Semantic Streams Track

Spatial Ontology-Mediated Query Answering over Mobility Streams

The development of (semi-)autonomous vehicles and communication between vehicles and infrastructure (V2X) will help improve road safety by identifying dangerous traffic scenes. A key to this is the Local Dynamic Map (LDM), which acts as an integration platform for static, semi-static, and dynamic information about traffic in a geographical context. At present, the LDM approach is purely database-oriented with simple query capabilities, while an elaborate domain model as captured by an ontology, and queries over data streams that allow for semantic concepts and spatial relationships, are still missing. To fill this gap, we present an approach in the context of ontology-mediated query answering that features conjunctive queries over DL-Lite_A ontologies allowing spatial relations and window operators over streams having a pulse. For query evaluation, we present a rewriting approach to ordinary DL-Lite_A that transforms spatial relations involving epistemic aggregate queries, and we use a decomposition approach that generates a query execution plan. Finally, we report on experiments with two scenarios and evaluate our implementation based on the stream RDBMS PipelineDB.

Thomas Eiter, Josiane Xavier Parreira, Patrik Schneider

Optimizing the Performance of Concurrent RDF Stream Processing Queries

With the growing popularity of the Internet of Things (IoT) and sensing technologies, a large number of data streams are being generated at a very rapid pace. To explore the potential of integrating IoT and semantic technologies, a few RDF Stream Processing (RSP) query engines have been made available that are capable of processing, analyzing and reasoning over semantic data streams in real time. In this way, RSP mitigates data interoperability issues and promotes knowledge discovery and smart decision making for time-sensitive applications. However, a major hurdle to the wide adoption of RSP systems is their query performance. In particular, the ability of RSP engines to handle a large number of concurrent queries is very limited, which discourages large-scale stream processing applications (e.g., smart city applications) from adopting RSP. In this paper, we propose a shared-join based approach to improve the performance of an RSP engine for concurrent queries. We also leverage query federation mechanisms to allow distributed query processing over multiple RSP engine instances in order to gain performance for concurrent and distributed queries. We apply load balancing strategies to distribute queries and further optimize the concurrent query performance. We provide a proof-of-concept implementation by extending the CQELS RSP engine and evaluate our approach using existing benchmark datasets for RSP. We also compare the performance of our proposed approach with the state-of-the-art implementation of the CQELS RSP engine.

Chan Le Van, Feng Gao, Muhammad Intizar Ali
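
The load balancing step in Optimizing the Performance of Concurrent RDF Stream Processing Queries distributes queries over several RSP engine instances, but the abstract does not name the strategies. As one generic possibility, here is a minimal sketch of greedy least-loaded assignment (the longest-processing-time heuristic), assuming a hypothetical per-query cost estimate is known in advance; it ignores the shared-join optimization and is not the authors' strategy.

```python
import heapq
from typing import Dict, List, Tuple


def assign_queries(queries: List[Tuple[str, float]], n_instances: int) -> Dict[int, List[str]]:
    """Greedy least-loaded assignment of queries to engine instances.

    Queries are taken in decreasing order of estimated cost and each goes to
    the instance with the smallest accumulated load (the classic LPT heuristic).
    """
    heap = [(0.0, i) for i in range(n_instances)]  # (current load, instance id)
    heapq.heapify(heap)
    plan: Dict[int, List[str]] = {i: [] for i in range(n_instances)}
    for qid, cost in sorted(queries, key=lambda q: q[1], reverse=True):
        load, inst = heapq.heappop(heap)
        plan[inst].append(qid)
        heapq.heappush(heap, (load + cost, inst))
    return plan


# Hypothetical continuous queries with estimated costs (arbitrary units).
queries = [("q1", 5.0), ("q2", 4.0), ("q3", 3.0), ("q4", 3.0), ("q5", 1.0)]
print(assign_queries(queries, n_instances=2))
# {0: ['q1', 'q4'], 1: ['q2', 'q3', 'q5']}
```

Both instances end up with a load of 8 here. A real deployment would also have to estimate costs for streaming queries and might co-locate queries that share join patterns, which this sketch does not attempt.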

AGACY Monitoring: A Hybrid Model for Activity Recognition and Uncertainty Handling

Recognizing an ongoing human activity from raw sensor data is a challenging problem in pervasive systems. Earlier research in this field has mainly adopted data-driven or knowledge-based techniques for activity recognition; however, these techniques suffer from a number of drawbacks. Therefore, recent works have proposed a combination of these techniques. Nevertheless, they still do not handle sensor data uncertainty. In this paper, we propose a new hybrid model called AGACY Monitoring to cope with the uncertain nature of sensor data. Moreover, we present a new algorithm to infer activity instances by exploiting the obtained uncertainty values. The experimental evaluation of AGACY Monitoring with a large real-world dataset has proved the viability and efficiency of our solution.

Hela Sfar, Amel Bouzeghoub, Nathan Ramoly, Jérôme Boudy

Natural Language Processing and Information Retrieval Track

Mapping Natural Language to Description Logic

While much work on automated ontology enrichment has focused on mining text for concepts and relations, little attention has been paid to the task of enriching ontologies with complex axioms. In this paper, we focus on a form of text that is frequent in industry, namely system installation design principles (SIDPs), and we present a framework which can be used both to map SIDPs to OWL DL axioms and to assess the quality of these automatically derived axioms. We present experimental results on a set of 960 SIDPs provided by Airbus which demonstrate (i) that the approach is robust (97.50% of the SIDPs can be parsed) and (ii) that DL axioms assigned to full parses are very likely to be correct (in 96% of the cases).

Bikash Gyawali, Anastasia Shimorina, Claire Gardent, Samuel Cruz-Lara, Mariem Mahfoudh

Harnessing Diversity in Crowds and Machines for Better NER Performance

Over the last few years, information extraction tools have gained great popularity and brought significant performance improvements in extracting meaning from structured or unstructured data. For example, named entity recognition (NER) tools identify types such as people, organizations or places in text. However, despite their high F1 performance, NER tools are still prone to brittleness due to their highly specialized and constrained input and training data. Thus, each tool is able to extract only a subset of the named entities (NE) mentioned in a given text. In order to improve NE coverage, we propose a hybrid approach, where we first aggregate the output of various NER tools and then validate and extend it through crowdsourcing. The results from our experiments show that this approach performs significantly better than the individual state-of-the-art tools (including existing tools that already integrate individual outputs). Furthermore, we show that the crowd is (1) quite effective in identifying mistakes, inconsistencies and ambiguities in currently used ground truth, and (2) a promising means of gathering ground truth annotations for NER that capture a multitude of opinions.

Oana Inel, Lora Aroyo
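
The first stage of the hybrid approach in Harnessing Diversity in Crowds and Machines for Better NER Performance, aggregating the output of several NER tools before crowd validation, can be sketched as a union of spans with vote counts. This is a minimal illustration under stated assumptions: spans match only on exact offsets, entity types are ignored, and the vote threshold and the rule that routes low-agreement spans to the crowd are hypothetical, not the authors' procedure.

```python
from collections import Counter
from typing import Dict, Set, Tuple

Span = Tuple[int, int, str]  # (start offset, end offset, surface form)


def aggregate(outputs: Dict[str, Set[Span]], min_votes: int):
    """Union the spans of all tools; each tool votes at most once per span.

    Spans with at least `min_votes` votes are kept automatically; the rest are
    routed to crowd workers for validation (a hypothetical policy).
    """
    votes = Counter(span for spans in outputs.values() for span in spans)
    accepted = {span for span, n in votes.items() if n >= min_votes}
    to_crowd = {span for span, n in votes.items() if n < min_votes}
    return accepted, to_crowd


text = "Barack Obama visited Berlin with Angela Merkel."
outputs = {  # hypothetical outputs of three NER tools
    "toolA": {(0, 12, "Barack Obama"), (21, 27, "Berlin")},
    "toolB": {(0, 12, "Barack Obama"), (33, 46, "Angela Merkel")},
    "toolC": {(0, 12, "Barack Obama"), (21, 27, "Berlin"), (7, 12, "Obama")},
}
# Sanity check: every span's offsets point at its surface form in the text.
assert all(text[s:e] == surface for spans in outputs.values() for s, e, surface in spans)

accepted, to_crowd = aggregate(outputs, min_votes=2)
print(sorted(accepted))  # [(0, 12, 'Barack Obama'), (21, 27, 'Berlin')]
print(sorted(to_crowd))  # [(7, 12, 'Obama'), (33, 46, 'Angela Merkel')]
```

The union is what raises coverage ("Angela Merkel" is found by only one tool), while the crowd step is what decides between a correct single-tool find and a spurious one such as the partial span "Obama".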

All that Glitters Is Not Gold – Rule-Based Curation of Reference Datasets for Named Entity Recognition and Entity Linking

The evaluation of Named Entity Recognition as well as Entity Linking systems is mostly based on manually created gold standards. However, the current gold standards have three main drawbacks. First, they do not share a common set of rules pertaining to what is to be marked and linked as an entity. Moreover, most of the gold standards have not been checked by other researchers after they were published; hence, they commonly contain mistakes. Finally, many gold standards are outdated: in most cases the reference knowledge bases used to link entities are refined over time, while the gold standards are typically not updated to the newest version of the reference knowledge base. In this work, we analyze existing gold standards and derive a set of rules for annotating documents for named entity recognition and entity linking. We then present Eaglet, a tool that supports the semi-automatic checking of a gold standard based on these rules. A manual evaluation of Eaglet’s results shows that it achieves an accuracy of up to 88% when detecting errors. We apply Eaglet to 13 English gold standards and detect 38,453 errors. An evaluation of 10 tools on a subset of these datasets shows a performance difference of up to 10% micro F-measure on average.

Kunal Jha, Michael Röder, Axel-Cyrille Ngonga Ngomo

Semantic Annotation of Data Processing Pipelines in Scientific Publications

Data processing pipelines are a core object of interest for data scientists and practitioners operating in a variety of data-related application domains. To effectively capitalise on the experience gained in the creation and adoption of such pipelines, the need arises for mechanisms able to capture knowledge about datasets of interest, data processing methods designed to achieve a given goal, and the performance achieved when applying such methods to the considered datasets. However, due to its distributed and often unstructured nature, this knowledge is not easily accessible. In this paper, we use (scientific) publications as a source of knowledge about Data Processing Pipelines. We describe a method designed to classify sentences according to the nature of the contained information (i.e., scientific objective, dataset, method, software, result), and to extract relevant named entities. The extracted information is then semantically annotated and published as linked data in open knowledge repositories according to the DMS ontology for data processing metadata. To demonstrate the effectiveness and performance of our approach, we present the results of a quantitative and qualitative analysis performed on four different conference series.

Sepideh Mesbah, Kyriakos Fragkeskos, Christoph Lofi, Alessandro Bozzon, Geert-Jan Houben

Combining Word and Entity Embeddings for Entity Linking

The correct identification of the link between an entity mention in a text and a known entity in a large knowledge base is important in information retrieval or information extraction. The general approach for this task is to generate, for a given mention, a set of candidate entities from the knowledge base and, in a second step, determine which is the best one. This paper proposes a novel method for the second step which is based on the joint learning of embeddings for the words in the text and the entities in the knowledge base. By learning these embeddings in the same space we arrive at a more conceptually grounded model that can be used for candidate selection based on the surrounding context. The relative improvement of this approach is experimentally validated on a recent benchmark corpus from the TAC-EDL 2015 evaluation campaign.

Jose G. Moreno, Romaric Besançon, Romain Beaumont, Eva D’hondt, Anne-Laure Ligozat, Sophie Rosset, Xavier Tannier, Brigitte Grau
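
The candidate selection step in Combining Word and Entity Embeddings for Entity Linking relies on words and entities sharing one embedding space, so a candidate can be scored directly against the surrounding context. The sketch below assumes the embeddings are already learned and simply averages the context word vectors, then picks the candidate with the highest cosine similarity; the 2-d vectors and entity names are hypothetical, and the joint training procedure itself is not shown.

```python
import math
from typing import Dict, List, Optional, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def context_vector(tokens: List[str], word_vecs: Dict[str, Vector]) -> Optional[List[float]]:
    """Average the embeddings of the context words that have one."""
    known = [word_vecs[t] for t in tokens if t in word_vecs]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def select_candidate(tokens: List[str], candidates: List[str],
                     word_vecs: Dict[str, Vector], entity_vecs: Dict[str, Vector]) -> str:
    """Pick the candidate entity closest to the context in the shared space."""
    ctx = context_vector(tokens, word_vecs)
    if ctx is None:
        return candidates[0]  # no usable context: keep the generator's first candidate
    return max(candidates, key=lambda e: cosine(ctx, entity_vecs[e]))


# Hypothetical 2-d embeddings, assumed to live in one joint word/entity space.
word_vecs = {"seine": [0.9, 0.1], "capital": [0.8, 0.2],
             "hotel": [0.2, 0.8], "heiress": [0.1, 0.9]}
entity_vecs = {"Paris": [0.95, 0.05], "Paris_Hilton": [0.05, 0.95]}
candidates = ["Paris", "Paris_Hilton"]

print(select_candidate("the seine flows through paris".split(), candidates, word_vecs, entity_vecs))
print(select_candidate("the hotel heiress paris".split(), candidates, word_vecs, entity_vecs))
```

The first call selects "Paris" and the second "Paris_Hilton". Averaging is the simplest context model; the fallback to the first candidate is an assumption about how a mention without usable context might be handled.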

Beyond Time: Dynamic Context-Aware Entity Recommendation

Entities and their relatedness provide useful information for various tasks such as entity disambiguation, entity recommendation or search. In many cases, entity relatedness is highly affected by dynamic contexts, which can be reflected in the outcome of different applications. However, the role of context is largely unexplored in existing entity relatedness measures. In this paper, we introduce the notion of contextual entity relatedness, and show its usefulness in the new yet important problem of context-aware entity recommendation. We propose a novel method of computing contextual relatedness with integrated time and topic models. By exploiting an entity graph and enriching it with an entity embedding method, we show that our proposed relatedness can effectively recommend entities, taking contexts into account. We conduct large-scale experiments on a real-world data set, and the results show considerable improvements of our solution over the state of the art.

Nam Khanh Tran, Tuan Tran, Claudia Niederée

Vocabularies, Schemas, and Ontologies Track

Patterns for Heterogeneous TBox Mappings to Bridge Different Modelling Decisions

Correspondence patterns have been proposed as templates of commonly used alignments between heterogeneous elements in ontologies, although current design tools are equipped to handle neither these definition alignments nor pattern alignments. We aim to address this by, first, formalising the notion of design pattern; second, defining typical modelling choice patterns and their alignments; and finally, proposing algorithms for integrating automatic pattern detection into existing ontology design tools. This gave rise to six formalised pattern alignments and two efficient local search and pattern matching algorithms that propose possible pattern alignments to the modeller.

Pablo Rubén Fillottrani, C. Maria Keet

Exploring Importance Measures for Summarizing RDF/S KBs

Given the explosive growth in the size and the complexity of the Data Web, there is now, more than ever, an increasing need to develop methods and tools that facilitate the understanding and exploration of RDF/S Knowledge Bases (KBs). In this direction, summarization approaches try to produce an abridged version of the original data source, highlighting the most representative concepts. Central questions to summarization are: how to identify the most important nodes, and then how to link them in order to produce a valid sub-schema graph. In this paper, we try to answer the first question by revisiting six well-known measures from graph theory and adapting them for RDF/S KBs. Then, we proceed further to model the problem of linking those nodes as a graph Steiner-Tree problem (GSTP), employing approximations and heuristics to speed up the execution of the respective algorithms. The performed experiments show the added value of our approach since (a) our adaptations outperform current state-of-the-art measures for selecting the most important nodes and (b) the constructed summary has a better quality in terms of the additional nodes introduced to the generated summary.

Alexandros Pappas, Georgia Troullinou, Giannis Roussakis, Haridimos Kondylakis, Dimitris Plexousakis

Data-Driven Joint Debugging of the DBpedia Mappings and Ontology: Towards Addressing the Causes Instead of the Symptoms of Data Quality in DBpedia

DBpedia is a large-scale, cross-domain knowledge graph extracted from Wikipedia. For the extraction, crowd-sourced mappings from Wikipedia infoboxes to the DBpedia ontology are utilized. In this process, different problems may arise: users may create wrong and/or inconsistent mappings, use the ontology in an unforeseen way, or change the ontology without considering all possible consequences. In this paper, we present an approach to discover problems in the mappings as well as in the ontology and its usage in a joint, data-driven process. We show both quantitative and qualitative results about the problems identified, and derive proposals for altering mappings and refactoring the DBpedia ontology.

Heiko Paulheim

Rule-Based OWL Modeling with ROWLTab Protégé Plugin

It has been argued that it is much easier to convey logical statements using rules rather than OWL (or description logic (DL)) axioms. Based on recent theoretical developments on transformations between rules and DLs, we have developed ROWLTab, a Protégé plugin that allows users to enter OWL axioms by way of rules; the plugin then automatically converts these rules into OWL 2 DL axioms if possible, and prompts the user in case such a conversion is not possible without weakening the semantics of the rule. In this paper, we present ROWLTab, together with a user evaluation of its effectiveness compared to entering axioms using the standard Protégé interface. Our evaluation shows that modeling with ROWLTab is much quicker than using the standard interface, while also being less prone to errors for hard modeling tasks.

Md. Kamruzzaman Sarker, Adila Krisnadhi, David Carral, Pascal Hitzler

Chaudron: Extending DBpedia with Measurement

Wikipedia is the largest collaborative encyclopedia and is used as the source for DBpedia, a central dataset of the LOD cloud. Wikipedia contains numerous numerical measures on the entities it describes, as per the general character of the data it encompasses. The DBpedia Information Extraction Framework transforms semi-structured data from Wikipedia into structured RDF. However, this extraction framework offers limited support for handling measurements in Wikipedia. In this paper, we describe the automated process that enables the creation of the Chaudron dataset. We propose an extraction that is an alternative to the traditional creation of mappings from the Wikipedia dump, by also using the rendered HTML to avoid the template transclusion issue. This dataset extends DBpedia with more than 3.9 million triples and 949,000 measurements across every domain covered by DBpedia. We define a multi-level approach powered by a formal grammar that proves very robust for the extraction of measurements. An extensive evaluation against DBpedia and Wikidata shows that our approach largely surpasses its competitors for measurement extraction on Wikipedia infoboxes. Chaudron exhibits an F1-score of 0.89, while DBpedia and Wikidata respectively reach 0.38 and 0.10 on this extraction task.

Julien Subercaze

SM4MQ: A Semantic Model for Multidimensional Queries

On-Line Analytical Processing (OLAP) is a data analysis approach to support decision-making. On top of that, Exploratory OLAP is a novel initiative for the convergence of OLAP and the Semantic Web (SW) that enables the use of OLAP techniques on SW data. Moreover, OLAP approaches exploit different metadata artifacts (e.g., queries) to assist users with the analysis. However, the modeling and sharing of most of these artifacts are typically overlooked. Thus, in this paper we focus on the query metadata artifact in the Exploratory OLAP context and propose an RDF-based vocabulary for its representation, sharing, and reuse on the SW. As OLAP is based on the underlying multidimensional (MD) data model, we denote such queries as MD queries and define SM4MQ: A Semantic Model for Multidimensional Queries. Furthermore, we propose a method to automate the exploitation of queries by means of SPARQL. We apply the method to a use case of transforming queries from SM4MQ into a vector representation. For the use case, we developed a prototype and performed an evaluation showing that our approach can significantly ease and support user assistance such as query recommendation.

Jovan Varga, Ekaterina Dobrokhotova, Oscar Romero, Torben Bach Pedersen, Christian Thomsen

Using Insights from Psychology and Language to Improve How People Reason with Description Logics

Inspired by insights from theories of human reasoning and language, we propose additions to the Manchester OWL Syntax to improve comprehensibility. These additions cover functional and inverse functional properties, negated conjunction, the definition of exceptions, and existential and universal restrictions. By means of an empirical study, we demonstrate the effectiveness of a number of these additions, in particular: the use of “solely” to clarify the uniqueness of the object in a functional property; the replacement of “and” with “intersection” in conjunction, which was particularly beneficial in negated conjunction; the use of “except” as a substitute for “and not”; and the replacement of “some” with “including” and of “only” with “noneOrOnly”, which in certain situations helped to clarify the nature of these restrictions.

Paul Warren, Paul Mulholland, Trevor Collins, Enrico Motta

Reasoning Track

Frontmatter

Updating Wikipedia via DBpedia Mappings and SPARQL

DBpedia crystallized most of the concepts of the Semantic Web using simple mappings to convert Wikipedia articles (i.e., infoboxes and tables) into RDF data. This “semantic view” of wiki content has rapidly become the focal point of the Linked Open Data cloud, but its impact on the original Wikipedia source is limited. In particular, little attention has been paid to the benefits that the semantic infrastructure can bring to maintaining wiki content, for instance to ensure that the effects of a wiki edit are consistent across infoboxes. In this paper, we present an approach that allows ontology-based updates of wiki content. Starting from DBpedia-like mappings that convert infoboxes into an ontology in a fragment of OWL 2 RL, we discuss various issues associated with translating SPARQL updates on top of semantic data into changes to the underlying wiki content. On the one hand, we provide a formalization of DBpedia as an Ontology-Based Data Management framework and study its computational properties. On the other hand, we provide a novel approach to the inherently intractable update translation problem, leveraging pre-existing data to disambiguate updates.

Albin Ahmeti, Javier D. Fernández, Axel Polleres, Vadim Savenkov

Learning Commonalities in RDF

Finding the commonalities between descriptions of data or knowledge is a foundational reasoning problem of Machine Learning introduced in the 1970s, which amounts to computing a least general generalization ($$\mathtt {lgg}$$) of such descriptions. It has also received attention in Knowledge Representation since the 1990s, and more recently in the Semantic Web field. We revisit this problem in the popular Resource Description Framework (RDF) of the W3C, where descriptions are RDF graphs, i.e., a mix of data and knowledge. Notably, and in contrast to the literature, our solution to this problem holds for the entire RDF standard, i.e., we do not restrict RDF graphs in any way (neither their structure nor their semantics based on RDF entailment, i.e., inference), and, further, our algorithms can compute $$\mathtt {lgg}$$s of small-to-huge RDF graphs.

Sara El Hassad, François Goasdoué, Hélène Jaudoin

Lean Kernels in Description Logics

Lean kernels (LKs) are an effective optimization for deriving the causes of unsatisfiability of a propositional formula. Interestingly, no analogous notion exists for explaining consequences of description logic (DL) ontologies. We introduce LKs for DLs using a general notion of consequence-based methods, and provide an algorithm for computing them that incurs only a linear-time overhead. As an example, we instantiate our framework for the DL $${\mathcal {ALC}}$$. We show formally and empirically that LKs provide a tighter approximation of the set of relevant axioms for a consequence than syntactic locality-based modules.

Rafael Peñaloza, Carlos Mencía, Alexey Ignatiev, Joao Marques-Silva

Social Web and Web Science Track

Frontmatter

Open Access

Linked Data Notifications: A Resource-Centric Communication Protocol

In this article we describe the Linked Data Notifications (LDN) protocol, which is a W3C Candidate Recommendation. Notifications are sent over the Web for a variety of purposes, for example, by social applications. The information contained within a notification is structured arbitrarily and is typically usable only by the application that generated it in the first place. In the spirit of Linked Data, we propose that notifications should be reusable by multiple authorised applications. By separating the concepts of senders, receivers and consumers of notifications, and by leveraging the Linked Data principles of shared vocabularies and URIs, LDN provides a building block for decentralised Web applications. This gives end users more freedom to switch between the online tools they use, and generates greater value when notifications from different sources can be used in combination. We situate LDN alongside related initiatives, and discuss additional considerations such as security and abuse prevention measures. We evaluate the protocol’s effectiveness by analysing multiple independent implementations, which pass a suite of formal tests and can be demonstrated interoperating with each other. To experience the described features, please open this document in your Web browser at its canonical URI: http://csarven.ca/linked-data-notifications.

Sarven Capadisli, Amy Guy, Christoph Lange, Sören Auer, Andrei Sambra, Tim Berners-Lee

Crowdsourced Affinity: A Matter of Fact or Experience

User-entity affinity is an essential component of many user-centric information systems such as online advertising, exploratory search, and recommender systems. The affinity is often assessed by analysing the interactions between users and entities within a data space. Among different affinity assessment techniques, content-based ones hypothesize that users have higher affinity with entities similar to the ones with which they had positive interactions in the past. Knowledge graphs and folksonomies are milestones of the Semantic Web and the Social Web, respectively. Despite their shared crowdsourcing trait (not necessarily shared by all knowledge graphs, but by some major large-scale ones), the data they encode differ in nature and structure. A knowledge graph encodes factual data with a formal ontology, whereas a folksonomy encodes experience data with a loose structure. Many efforts have been made to make sense of folksonomies and to structure the community knowledge they contain. Both data spaces allow computing similarity between entities, which can then be used to calculate user-entity affinity. In this paper, we are interested in observing their comparative performance in the affinity assessment task. To this end, we carried out a first experiment within a travel destination recommendation scenario on a gold standard dataset. Our main findings are that knowledge graphs help to assess affinity more accurately, whereas folksonomies help to increase diversity and novelty. This complementarity motivated us to develop a semantic affinity framework that harvests the benefits of both data spaces. A second experiment with real users showed the utility of the proposed framework and confirmed our findings.

Chun Lu, Milan Stankovic, Filip Radulovic, Philippe Laublet

A Semantic Graph-Based Approach for Radicalisation Detection on Social Media

From its start, the so-called Islamic State of Iraq and the Levant (ISIL/ISIS) has been successfully exploiting social media networks, most notoriously Twitter, to promote its propaganda and recruit new members, resulting in thousands of social media users adopting a pro-ISIS stance every year. Automatic identification of pro-ISIS users on social media has, thus, become the centre of interest for various governmental and research organisations. In this paper we propose a semantic graph-based approach for radicalisation detection on Twitter. Unlike previous works, which mainly rely on the lexical representation of the content published by Twitter users, our approach extracts and makes use of the underlying semantics of words exhibited by these users to identify their pro/anti-ISIS stances. Our results show that classifiers trained from semantic features outperform those trained from lexical, sentiment, topic and network features by 7.8% in average F1-measure.

Hassan Saif, Thomas Dickinson, Leon Kastler, Miriam Fernandez, Harith Alani

Semantic Web and Transparency Track

Frontmatter

Modeling and Querying Greek Legislation Using Semantic Web Technologies

In this work, we study how legislation can be published as open data using semantic web technologies. We focus on Greek legislation and show how it can be modeled using ontologies expressed in OWL and RDF, and queried using SPARQL. To demonstrate the applicability and usefulness of our approach, we develop a web application, called Nomothesia, which makes Greek legislation easily accessible to the public. Nomothesia offers advanced services for retrieving and querying Greek legislation and is intended both for citizens, through intuitive presentational views and search interfaces, and for application developers who would like to consume content through two web services: a SPARQL endpoint and a RESTful API. Opening up legislation in this way is a great leap towards making governments accountable to citizens and increasing transparency.

Ilias Chalkidis, Charalampos Nikolaou, Panagiotis Soursos, Manolis Koubarakis

Self-Enforcing Access Control for Encrypted RDF

The amount of raw data exchanged via web protocols is steadily increasing. Although the Linked Data infrastructure could potentially be used to selectively share RDF data with different individuals or organisations, the primary focus remains on the unrestricted sharing of public data. In order to extend the Linked Data paradigm to cater for closed data, there is a need to augment the existing infrastructure with robust security mechanisms. At the most basic level both access control and encryption mechanisms are required. In this paper, we propose a flexible and dynamic mechanism for securely storing and efficiently querying RDF datasets. By employing an encryption strategy based on Functional Encryption (FE) in which controlled data access does not require a trusted mediator, but is instead enforced by the cryptographic approach itself, we allow for fine-grained access control over encrypted RDF data while at the same time reducing the administrative overhead associated with access control management.

Javier D. Fernández, Sabrina Kirrane, Axel Polleres, Simon Steyskal

Removing Barriers to Transparency: A Case Study on the Use of Semantic Technologies to Tackle Procurement Data Inconsistency

Public Procurement (PP) information, made available as Open Government Data (OGD), yields tangible benefits for identifying government spending on goods and services. Nevertheless, making data freely available is a necessary, but not sufficient, condition for improving transparency. The fragmentation of OGD, due to the diverse processes adopted by different administrations, and inconsistency within the data limit the opportunities to obtain valuable information. In this article, we propose a solution based on linked data to integrate existing datasets and to enhance information coherence. We present an application of these principles through a semantic layer built on Italian PP information available as OGD. As a result, we overcame the fragmentation of data sources and increased the consistency of information, enabling new opportunities for analyzing data to fight corruption and to increase competition among companies in the market.

Giuseppe Futia, Alessio Melandri, Antonio Vetrò, Federico Morando, Juan Carlos De Martin

NdFluents: An Ontology for Annotated Statements with Inference Preservation

RDF provides the means to publish, link, and consume heterogeneous information on the Web of Data, whereas OWL allows the construction of ontologies and the inference of new information that is implicit in the data. Annotating RDF data with additional information, such as provenance, trustworthiness, or temporal validity, is becoming increasingly important; however, RDF and OWL can natively represent only binary (or dyadic) relations between entities. While there are some approaches to representing metadata in RDF, they lose most of the reasoning power of OWL. In this paper we present an extension of Welty and Fikes’ 4dFluents ontology (on associating temporal validity with statements) to any number of dimensions, provide guidelines and design patterns to implement it on actual data, and compare its reasoning power with alternative representations.

José M. Giménez-García, Antoine Zimmermann, Pierre Maret

Adopting Semantic Technologies for Effective Corporate Transparency

A new transparency model with more and better corporate data is necessary to promote sustainable economic growth. In particular, there is a need to link factors regarding the non-financial performance of corporations, such as social and environmental impacts (both positive and negative), into the decision-making processes of investors and other stakeholders. To do this, we need to develop better ways to access and analyse corporate social, environmental and financial performance information, and to link together insights from these different sources. Such sources are already on the web in unstructured and structured data formats, a large part of them in XBRL (Extensible Business Reporting Language). This study promotes solutions to drive effective transparency for a sustainable economy, given the current adoption of XBRL and the new opportunities that Linked Data can offer. We (1) present a methodology to formalise XBRL as RDF using Linked Data principles and (2) demonstrate its usefulness through a use case that connects the data and makes it accessible.

Maria Mora-Rodriguez, Ghislain Auguste Atemezing, Chris Preist


Table of Contents

Frontmatter

Semantic Data Management, Big Data, and Scalability Track

Frontmatter

Traffic Analytics for Linked Data Publishers

Explaining Graph Navigational Queries

A SPARQL Extension for Generating RDF from Heterogeneous Formats

Linked Data Track

Frontmatter

Exploiting Source-Object Networks to Resolve Object Conflicts in Linked Data

Methods for Intrinsic Evaluation of Links in the Web of Data

Entity Deduplication on ScholarlyData

Machine Learning Track

Frontmatter

Wombat – A Generalization Approach for Automatic Link Discovery

Actively Learning to Rank Semantic Associations for Personalized Contextual Exploration of Knowledge Graphs

Synthesizing Knowledge Graphs for Link and Type Prediction Benchmarking

Online Relation Alignment for Linked Datasets

Tuning Personalized PageRank for Semantics-Aware Recommendations Based on Linked Open Data

Terminological Cluster Trees for Disjointness Axiom Discovery

Embedding Learning for Declarative Memories

Mobile Web, Sensors, and Semantic Streams Track

Frontmatter

Spatial Ontology-Mediated Query Answering over Mobility Streams

Optimizing the Performance of Concurrent RDF Stream Processing Queries

AGACY Monitoring: A Hybrid Model for Activity Recognition and Uncertainty Handling

Natural Language Processing and Information Retrieval Track

Frontmatter

Mapping Natural Language to Description Logic

Harnessing Diversity in Crowds and Machines for Better NER Performance

All that Glitters Is Not Gold – Rule-Based Curation of Reference Datasets for Named Entity Recognition and Entity Linking

Semantic Annotation of Data Processing Pipelines in Scientific Publications

Combining Word and Entity Embeddings for Entity Linking

Beyond Time: Dynamic Context-Aware Entity Recommendation

Vocabularies, Schemas, and Ontologies Track

Frontmatter

Patterns for Heterogeneous TBox Mappings to Bridge Different Modelling Decisions

Exploring Importance Measures for Summarizing RDF/S KBs

Data-Driven Joint Debugging of the DBpedia Mappings and Ontology

Rule-Based OWL Modeling with ROWLTab Protégé Plugin

Chaudron: Extending DBpedia with Measurement

SM4MQ: A Semantic Model for Multidimensional Queries

Using Insights from Psychology and Language to Improve How People Reason with Description Logics

Reasoning Track

Frontmatter

Updating Wikipedia via DBpedia Mappings and SPARQL

Learning Commonalities in RDF

Lean Kernels in Description Logics

Social Web and Web Science Track

Frontmatter

Linked Data Notifications: A Resource-Centric Communication Protocol

Crowdsourced Affinity: A Matter of Fact or Experience

A Semantic Graph-Based Approach for Radicalisation Detection on Social Media

Semantic Web and Transparency Track

Frontmatter

Modeling and Querying Greek Legislation Using Semantic Web Technologies

Self-Enforcing Access Control for Encrypted RDF

Removing Barriers to Transparency: A Case Study on the Use of Semantic Technologies to Tackle Procurement Data Inconsistency

NdFluents: An Ontology for Annotated Statements with Inference Preservation

Adopting Semantic Technologies for Effective Corporate Transparency

Backmatter