
About this Book

This book constitutes the refereed proceedings of the 20th International Conference on Knowledge Engineering and Knowledge Management, EKAW 2016, held in Bologna, Italy, in November 2016.
The 51 full papers presented were carefully reviewed and selected from 171 submissions. The papers cover all aspects of eliciting, acquiring, modeling, and managing knowledge, the construction of knowledge-intensive systems and services for the Semantic Web, knowledge management, e-business, natural language processing, intelligent information integration, personal digital assistance systems, and a variety of other related topics. A special focus was on "evolving knowledge", i.e., the impact of space and time on knowledge representation, concerning all aspects of the management and acquisition of knowledge representations of evolving, contextual, and local models.



Research Papers


Automatic Key Selection for Data Linking

The paper proposes an RDF key ranking approach that attempts to close the gap between automatic key discovery and data linking approaches, and thus to reduce the user effort in linking configuration. Indeed, configuring a data linking tool is a laborious process in which the user is often required to manually select the properties to compare, which presupposes in-depth expert knowledge of the data. Key discovery techniques attempt to facilitate this task, but in a number of cases do not fully succeed, due to the large number of keys produced and the lack of a confidence indicator. Since keys are extracted from each dataset independently, their effectiveness for the matching task, which involves two datasets, is undermined. The approach proposed in this work aims to unlock the potential of both key discovery techniques and data linking tools by providing the user with a limited number of merged and ranked keys that are well suited to a particular matching task. In addition, the complementarity of a small number of top-ranked keys is explored, showing that their combined use significantly improves recall. We report our experiments on data from the Ontology Alignment Evaluation Initiative, as well as on real-world benchmark data about music.
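As an illustration of the idea described above, the following Python sketch discovers keys (property sets whose values uniquely identify records) in two toy datasets, keeps only the keys valid in both, and ranks them by coverage. The data model and the coverage heuristic are simplifying assumptions, not the authors' implementation:

```python
from itertools import combinations

def is_key(records, props):
    """A set of properties is a key if no two records share all its values."""
    seen = set()
    for r in records:
        sig = tuple(r.get(p) for p in props)
        if sig in seen:
            return False
        seen.add(sig)
    return True

def discover_keys(records, properties, max_size=2):
    """Enumerate small property combinations that are keys for the dataset."""
    return [props
            for size in range(1, max_size + 1)
            for props in combinations(properties, size)
            if is_key(records, props)]

def rank_merged_keys(src, tgt, properties):
    """Keep only keys valid in BOTH datasets (merged keys) and rank them by
    how often their properties are instantiated across both datasets."""
    shared = set(discover_keys(src, properties)) & set(discover_keys(tgt, properties))
    data = src + tgt
    def coverage(props):
        return sum(all(p in r for p in props) for r in data) / len(data)
    return sorted(shared, key=coverage, reverse=True)
```

For example, with two bibliographic datasets where `year` values collide but `title` values are unique, only title-based keys survive the merge and are offered to the user.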

Manel Achichi, Mohamed Ben Ellefi, Danai Symeonidou, Konstantin Todorov

Selection and Combination of Heterogeneous Mappings to Enhance Biomedical Ontology Matching

This paper presents a novel background knowledge approach that selects and combines existing mappings from a given biomedical ontology repository to improve ontology alignment. Current background knowledge approaches usually select, either manually or automatically, a limited number of ontologies and use them as a whole as background knowledge. In our approach, by contrast, we propose to pick only the relevant concepts and the relevant existing mappings linking these concepts, and to assemble them into a specific, customized background knowledge graph. Paths within this graph help to discover new mappings. We have implemented and evaluated our approach using the content of the NCBO BioPortal repository and the Anatomy benchmark of the Ontology Alignment Evaluation Initiative. We used the mapping gain measure to assess how much our final background knowledge graph improves the results of state-of-the-art alignment systems. Furthermore, the evaluation shows that our approach produces a high-quality alignment and discovers mappings that have not been found by state-of-the-art systems.

Amina Annane, Zohra Bellahsene, Faiçal Azouaou, Clement Jonquet

Populating a Knowledge Base with Object-Location Relations Using Distributional Semantics

The paper presents an approach to extract knowledge from large text corpora, in particular knowledge that facilitates object manipulation by embodied intelligent systems that need to act in the world. As a first step, our goal is to extract the prototypical locations of given objects from text corpora. We approach this task by calculating relatedness scores for objects and locations using techniques from distributional semantics. We empirically compare different methods for representing locations and objects as vectors in some geometric space, and we evaluate them with respect to a crowd-sourced gold standard in which human subjects had to rate the prototypicality of a location given an object. By applying the proposed framework to DBpedia, we are able to build a knowledge base of 931 high-confidence object-location relations in a fully automatic fashion (the work in this paper is partially funded by the ALOOF project, CHIST-ERA program).
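A minimal sketch of the relatedness computation, assuming pre-computed distributional vectors and a hypothetical confidence threshold (the authors' actual models and cut-offs differ):

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def prototypical_locations(obj_vec, location_vecs, threshold=0.5):
    """Rank candidate locations by relatedness to the object vector and
    keep only high-confidence pairs (threshold is an assumption)."""
    scored = {loc: cosine(obj_vec, vec) for loc, vec in location_vecs.items()}
    return sorted(((loc, s) for loc, s in scored.items() if s >= threshold),
                  key=lambda x: x[1], reverse=True)
```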

Valerio Basile, Soufian Jebbara, Elena Cabrio, Philipp Cimiano

Ontology Forecasting in Scientific Literature: Semantic Concepts Prediction Based on Innovation-Adoption Priors

The ontology engineering research community has focused for many years on supporting the creation, development and evolution of ontologies. Ontology forecasting, which aims at predicting semantic changes in an ontology, represents instead a new challenge. In this paper, we contribute to this novel endeavour by focusing on the task of forecasting semantic concepts in the research domain. Indeed, ontologies representing scientific disciplines contain only research topics that are already popular enough to be selected by human experts or automatic algorithms. They are thus unfit to support tasks which require the ability to describe and explore the forefront of research, such as trend detection and horizon scanning. We address this issue by introducing the Semantic Innovation Forecast (SIF) model, which predicts new concepts of an ontology at time t+1, using only data available at time t. Our approach relies on lexical innovation and adoption information extracted from historical data. We evaluated the SIF model on a very large dataset consisting of over one million scientific papers in the Computer Science domain: the outcomes show that the proposed approach offers a competitive boost in mean average precision-at-ten compared to the baselines when forecasting over 5 years.

Amparo Elizabeth Cano-Basave, Francesco Osborne, Angelo Antonio Salatino

Leveraging the Impact of Ontology Evolution on Semantic Annotations

This paper deals with the problem of maintaining semantic annotations produced on the basis of domain ontologies. Many annotated texts have been produced and made available to end-users. If not reviewed regularly, the quality of these annotations tends to decrease over time due to the evolution of the domain ontologies. The quality of these annotations is critical for the tools that exploit them (e.g., search engines and decision support systems), which need to ensure an acceptable level of performance. Despite recent advances in ontology-based annotation systems for annotating new documents, the maintenance of existing annotations remains understudied. In this work we present an analysis of the impact of ontology evolution on existing annotations. To do so, we used two well-known annotators to generate more than 66 million annotations from a pre-selected set of 5000 biomedical journal articles and standard ontologies covering a period from 2004 to 2016. We highlight the correlation between changes in the ontologies and changes in the annotations, and we discuss the need to improve existing annotation formalisms so as to include the elements required to support (semi-)automatic annotation maintenance mechanisms.

Silvio Domingos Cardoso, Cédric Pruski, Marcos Da Silveira, Ying-Chi Lin, Anika Groß, Erhard Rahm, Chantal Reynaud-Delaître

Capturing the Ineffable: Collecting, Analysing, and Automating Web Document Quality Assessments

Automatic estimation of the quality of Web documents is a challenging task, especially because the definition of quality heavily depends on the individuals who define it, on the context where it applies, and on the nature of the tasks at hand. Our long-term goal is to allow automatic assessment of Web document quality tailored to specific user requirements and context. This process relies on the possibility to identify document characteristics that indicate their quality. In this paper, we investigate these characteristics as follows: (1) we define features of Web documents that may be indicators of quality; (2) we design a procedure for automatically extracting those features; (3) we develop a Web application to present these results to niche users, in order to check the relevance of these features as quality indicators and to collect quality assessments; (4) we analyse the users' qualitative assessments of Web documents to refine our definition of the features that determine quality, and establish their relative weight in the overall quality, i.e., in the summarizing score users attribute to a document, determining whether it meets their standards or not. Hence, our contribution is threefold: a Web application for nichesourcing quality assessments; a curated dataset of Web document assessments; and a thorough analysis of the quality assessments collected by means of two case studies involving experts (journalists and media scholars). The dataset obtained is limited in size but highly valuable because of the quality of the experts who provided it. Our analyses show that: (1) it is possible to automate the process of Web document quality estimation to a high level of accuracy; (2) document features shown in isolation are poorly informative to users; and (3) for the tasks we propose (i.e., choosing Web documents to use as a source for writing an article on the vaccination debate), the most important quality dimensions are accuracy, trustworthiness, and precision.
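The idea of attributing relative weights to quality features and summarizing them into a score might be sketched as follows; the feature names and weights below are purely illustrative assumptions, not values learned in the paper:

```python
def quality_score(features, weights):
    """Weighted aggregate of normalized feature values in [0, 1].
    In practice the weights would be fitted from expert assessments."""
    total = sum(weights.values())
    return sum(weights[f] * features.get(f, 0.0) for f in weights) / total

# Hypothetical document features and dimension weights (illustrative only).
doc = {"accuracy": 0.9, "trustworthiness": 0.8, "precision": 0.7}
weights = {"accuracy": 3.0, "trustworthiness": 2.0, "precision": 1.0}
```

A document would then "meet the user's standards" if its score clears a user-chosen cut-off.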

Davide Ceolin, Julia Noordegraaf, Lora Aroyo

Active Integrity Constraints for Multi-context Systems

We introduce a formalism to couple integrity constraints over general-purpose knowledge bases with actions that can be executed to restore consistency. This formalism generalizes active integrity constraints over databases. In the more general setting of multi-context systems, adding repair suggestions to integrity constraints allows defining simple iterative algorithms to find all possible grounded repairs – repairs for the global system that follow the suggestions given by the actions in the individual rules. We apply our methodology to ontologies, and show that it can express most relevant types of integrity constraints in this domain.

Luís Cruz-Filipe, Graça Gaspar, Isabel Nunes, Peter Schneider-Kamp

Evolutionary Discovery of Multi-relational Association Rules from Ontological Knowledge Bases

In the Semantic Web, OWL ontologies play the key role of domain conceptualizations, while the corresponding assertional knowledge is given by the heterogeneous Web resources referring to them. However, being strongly decoupled, ontologies and assertional knowledge can be out of sync. In particular, an ontology may be incomplete, noisy, and sometimes inconsistent with the actual usage of its conceptual vocabulary in the assertions. Despite such problematic situations, we aim at discovering hidden knowledge patterns from ontological knowledge bases, in the form of multi-relational association rules, by exploiting the evidence coming from the (evolving) assertional data. The final goal is to make use of such patterns for (semi-)automatically enriching and completing existing ontologies. An evolutionary search method applied to populated ontological knowledge bases is proposed for this purpose. The method is able to mine intensional and assertional knowledge by exploiting problem-aware genetic operators, echoing the refinement operators of inductive logic programming, and by taking intensional knowledge into account, which makes it possible to restrict the search space and to direct the evolutionary process. The discovered rules are represented in SWRL, so that they can be straightforwardly integrated into the ontology, thus enriching its expressive power and augmenting the assertional knowledge that can be derived from it. Discovered rules may also suggest new (schema) axioms to be added to the ontology. We performed experiments on publicly available ontologies, validating the performance of our approach and comparing it with the main state-of-the-art systems.
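The evolutionary search loop underlying such a method can be sketched generically; this toy keeps the fittest candidates and refills the population with mutated copies, standing in for the paper's problem-aware genetic operators over rule patterns:

```python
import random

def evolve(population, fitness, mutate, generations=200, elite=0.5, seed=42):
    """Generic evolutionary loop: sort by fitness, keep the top fraction,
    refill the population with mutants of survivors, return the best found."""
    rng = random.Random(seed)
    pop = list(population)
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: max(1, int(len(pop) * elite))]
        pop = survivors + [mutate(rng.choice(survivors), rng)
                           for _ in range(len(pop) - len(survivors))]
    return max(pop, key=fitness)
```

In the paper the individuals are candidate rules and the fitness reflects their support in the assertional data; here a toy integer search illustrates the loop's behaviour.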

Claudia d’Amato, Andrea G. B. Tettamanzi, Tran Duc Minh

An Incremental Learning Method to Support the Annotation of Workflows with Data-to-Data Relations

Workflow formalisations are often focused on representing a process with the primary objective of supporting execution. However, there are scenarios where what needs to be represented is the effect of the process on the data artefacts involved, for example when reasoning over the corresponding data policies. This can be achieved by annotating the workflow with the semantic relations that hold between these data artefacts. However, manually producing such annotations is difficult and time-consuming. In this paper we introduce a recommendation-based method to support users in this task. Our approach is centred on an incremental association rule mining technique that compensates for the cold-start problem caused by the lack of a training set of annotated workflows. We discuss the implementation of a tool relying on this approach and how its application to an existing repository of workflows effectively enables the generation of such annotations.
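A sketch of the incremental aspect of association rule mining: counts are updated per incoming itemset, so rule confidences can be rescored at any time without reprocessing past workflows. This is illustrative only; the paper's recommendation technique is richer:

```python
from collections import Counter

class IncrementalRuleMiner:
    """Maintain support counts for items and item pairs so that rules
    a -> b can be (re)scored as new annotated workflows arrive."""

    def __init__(self):
        self.n = 0
        self.item_count = Counter()
        self.pair_count = Counter()

    def add(self, itemset):
        """Incorporate one new annotated workflow (a set of relation labels)."""
        self.n += 1
        items = sorted(set(itemset))
        self.item_count.update(items)
        self.pair_count.update(
            (a, b) for i, a in enumerate(items) for b in items[i + 1:])

    def confidence(self, a, b):
        """Estimate P(b | a) from the counts seen so far."""
        pair = self.pair_count[tuple(sorted((a, b)))]
        return pair / self.item_count[a] if self.item_count[a] else 0.0
```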

Enrico Daga, Mathieu d’Aquin, Aldo Gangemi, Enrico Motta

A Query Model to Capture Event Pattern Matching in RDF Stream Processing Query Languages

The current state of the art in RDF Stream Processing (RSP) proposes several models and implementations to combine Semantic Web technologies with Data Stream Management System (DSMS) operators like windows. Meanwhile, only a few solutions combine Semantic Web and Complex Event Processing (CEP), which includes relevant features, such as identifying sequences of events in streams. Current RSP query languages that support CEP features have several limitations: EP-SPARQL can identify sequences, but its selection and consumption policies are not all formally defined, while C-SPARQL offers only a naive support to pattern detection through a timestamp function. In this work, we introduce an RSP query language, called RSEP-QL, which supports both DSMS and CEP operators, with a special interest in formalizing CEP selection and consumption policies. We show that RSEP-QL captures EP-SPARQL and C-SPARQL, and offers features going beyond the ones provided by current RSP query languages.

Daniele Dell’Aglio, Minh Dao-Tran, Jean-Paul Calbimonte, Danh Le Phuoc, Emanuele Della Valle

TAIPAN: Automatic Property Mapping for Tabular Data

The Web encompasses a significant amount of knowledge hidden in entity-attribute tables. Bridging the gap between these tables and the Web of Data thus has the potential to facilitate a large number of applications, including the augmentation of knowledge bases from tables, the search for related tables, and the completion of tables using knowledge bases. Computing such bridges is impeded by the poor accuracy of automatic property mapping, the lack of approaches for the discovery of subject columns, and the sheer size of table corpora. We propose Taipan, a novel approach for recovering the semantics of tables. Our approach begins by identifying subject columns using a combination of structural and semantic features. It then maps binary relations inside a table to predicates from a given knowledge base. Therewith, our solution supports both the tasks of table expansion and knowledge base augmentation. We evaluate our approach on a table dataset generated from real RDF data and a manually curated version of the T2D gold standard. Our results suggest that we outperform the state of the art by up to 85 % F-measure.

Ivan Ermilov, Axel-Cyrille Ngonga Ngomo

Semantic Authoring of Ontologies by Exploration and Elimination of Possible Worlds

We propose a novel approach to ontology authoring that is centered on semantics rather than on syntax. Instead of writing axioms formalizing a domain, the expert is invited to explore the possible worlds of her ontology, and to eliminate those that do not conform to her knowledge. Each elimination generates an axiom that is automatically derived from the explored situation. We have implemented the approach in the prototype PEW (Possible World Explorer) and conducted a user study comparing it to Protégé. The results show that more axioms are produced with PEW, without making more errors. More importantly, the produced ontologies are more complete, and hence more deductively powerful, because more negative constraints are expressed.

Sébastien Ferré

An RDF Design Pattern for the Structural Representation and Querying of Expressions

Expressions, such as mathematical formulae, logical axioms, or structured queries, account for a large part of human knowledge. It is therefore desirable to allow for their representation and querying with Semantic Web technologies. We propose an RDF design pattern that fulfills three objectives. The first objective is the structural representation of expressions in standard RDF, so that expressive structural search is made possible. We propose simple Turtle and SPARQL abbreviations for the concise notation of such RDF expressions. The second objective is the automated generation of expression labels that are close to usual notations. The third objective is the compatibility with existing practice and legacy data in the Semantic Web (e.g., SPIN, OWL/RDF). We show the benefits for RDF tools to support this design pattern with the extension of SEWELIS, a tool for guided exploration and edition, and its application to mathematical search.

Sébastien Ferré

Semantic Relatedness for All (Languages): A Comparative Analysis of Multilingual Semantic Relatedness Using Machine Translation

This paper provides a comparative analysis of the performance of four state-of-the-art distributional semantic models (DSMs) over 11 languages, contrasting the native language-specific models with the use of machine translation over English-based DSMs. The experimental results show that there is a significant improvement (on average 16.7 % for the Spearman correlation) from using state-of-the-art machine translation approaches. The results also show that the benefit of using the most informative corpus outweighs the possible errors introduced by the machine translation. For all languages, the combination of machine translation with the Word2Vec English distributional model consistently provided the best results (average Spearman correlation of 0.68).
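For reference, the Spearman correlation used as the evaluation measure above is the Pearson correlation of the rank vectors; a self-contained computation, with average ranks for ties, looks like this:

```python
def rank(values):
    """Assign average ranks (1-based), handling ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    rx, ry = rank(x), rank(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx) ** 0.5
    vy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (vx * vy)
```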

André Freitas, Siamak Barzegar, Juliano Efson Sales, Siegfried Handschuh, Brian Davis

On Emerging Entity Detection

While large Knowledge Graphs (KGs) already cover a broad range of domains to an extent sufficient for general use, they typically lack emerging entities that are just starting to attract the public interest. This disqualifies such KGs for tasks like entity-based media monitoring, since a large portion of news inherently covers entities that have not been noted by the public before. Such entities are unlinkable, which ultimately means they cannot be monitored in media streams. This is the first paper to thoroughly investigate all types of challenges that arise from out-of-KG entities in entity linking tasks. By large-scale analytics of news streams, we quantify the importance of each challenge for real-world applications. We then propose a machine learning approach that tackles the most frequent but least investigated challenge, i.e., when entities are missing in the KG and cannot be considered by entity linking systems. We construct a publicly available benchmark dataset based on English news articles and editing behavior on Wikipedia. Our experiments show that predicting whether an entity will be added to Wikipedia is challenging. However, we can reliably identify emerging entities that could be added to the KG according to Wikipedia's own notability criteria.

Michael Färber, Achim Rettinger, Boulos El Asmar

Framester: A Wide Coverage Linguistic Linked Data Hub

Semantic web applications leveraging NLP can benefit from easy access to expressive lexical resources such as FrameNet. However, the usefulness of FrameNet is affected by its limited coverage and non-standard semantics. The access to existing linguistic resources is also limited because of poor connectivity among them. We present some strategies based on Linguistic Linked Data to broaden FrameNet coverage and formal linkage of lexical and factual resources. We created a novel resource, Framester, which acts as a hub between FrameNet, WordNet, VerbNet, BabelNet, DBpedia, Yago, DOLCE-Zero, as well as other resources. Framester is not only a strongly connected knowledge graph, but also applies a rigorous formal treatment for Fillmore’s frame semantics, enabling full-fledged OWL querying and reasoning on a large frame-based knowledge graph. We also describe Word Frame Disambiguation, an application that reuses Framester data as a base in order to perform frame detection from text, with results comparable in precision to the state of the art, but with a much higher coverage.

Aldo Gangemi, Mehwish Alam, Luigi Asprino, Valentina Presutti, Diego Reforgiato Recupero

An Investigation of Definability in Ontology Alignment

The ability to rewrite defined ontological entities into syntactically different but semantically equivalent forms is an important property of definability. While rewriting has been extensively studied, the practical applicability of currently existing methods is limited, as they are bound to particular Description Logics (DLs) and often present only theoretical results. Moreover, these efforts focus on computing single definitions, whereas the ability to find the complete set of alternatives, or even just their signatures, can support ontology alignment, and semantic interoperability in general. As the number of possible rewritings is potentially exponential in the size of the ontology, we present a novel approach that provides a comprehensive and efficient way to compute in practice all definition signatures of the feasible (given pre-defined complexity bounds) defined entities described in a DL language for which a particular definability property (Beth definability) holds. This paper assesses the prevalence, extent and merits of definability over large and diverse corpora, and lays the basis for its use in ontology alignment.

David Geleta, Terry R. Payne, Valentina Tamma

Alligator: A Deductive Approach for the Integration of Industry 4.0 Standards

Industry 4.0 standards, such as AutomationML, are used to specify properties of mechatronic elements in terms of views, such as the electrical and mechanical views of a motor engine. These views have to be integrated in order to obtain a complete model of the artifact. Currently, this integration requires user knowledge to manually identify elements in the views that refer to the same element in the integrated model. Existing approaches are not able to scale up to large models, where a potentially large number of conflicts may exist across the different views of an element. To overcome this limitation, we developed Alligator, a deductive rule-based system able to identify conflicts between AutomationML documents. We define a Datalog-based representation of the AutomationML input documents and a set of rules for identifying conflicts. A deductive engine is used to resolve the conflicts and to merge the input documents into an integrated AutomationML document. Our empirical evaluation of the quality of Alligator against a benchmark of AutomationML documents suggests that Alligator accurately identifies various types of conflicts between AutomationML documents, and thus helps increase the scalability, efficiency, and coherence of models for Industry 4.0 manufacturing environments.

Irlán Grangel-González, Diego Collarana, Lavdim Halilaj, Steffen Lohmann, Christoph Lange, María-Esther Vidal, Sören Auer

Combining Textual and Graph-Based Features for Named Entity Disambiguation Using Undirected Probabilistic Graphical Models

Named Entity Disambiguation (NED) is the task of disambiguating named entities in a natural language text by linking them to their corresponding entities in a knowledge base such as DBpedia, which are already recognized. It is an important step in transforming unstructured text into structured knowledge. Previous work on this task has proven a strong impact of graph-based methods such as PageRank on entity disambiguation. Other approaches rely on distributional similarity between an article and the textual description of a candidate entity. However, the combined impact of these different feature groups has not been explored to a sufficient extent. In this paper, we present a novel approach that exploits an undirected probabilistic model to combine different types of features for named entity disambiguation. Capitalizing on Markov Chain Monte Carlo sampling, our model is capable of exploiting complementary strengths between both graph-based and textual features. We analyze the impact of these features and their combination on named entity disambiguation. In an evaluation on the GERBIL benchmark, our model compares favourably to the current state-of-the-art in 8 out of 14 data sets.

Sherzod Hakimov, Hendrik ter Horst, Soufian Jebbara, Matthias Hartung, Philipp Cimiano

VoCol: An Integrated Environment to Support Version-Controlled Vocabulary Development

Vocabularies are increasingly being developed on platforms for hosting version-controlled repositories, such as GitHub. However, these platforms lack important features that have proven useful in vocabulary development. We present VoCol, an integrated environment that supports the development of vocabularies using Version Control Systems. VoCol is based on a fundamental model of vocabulary development, consisting of the three core activities modeling, population, and testing. We implemented VoCol using a loose coupling of validation, querying, analytics, visualization, and documentation generation components on top of a standard Git repository. All components, including the version-controlled repository, can be configured and replaced with little effort to cater for various use cases. We demonstrate the applicability of VoCol with a real-world example and report on a user study that confirms its usability and usefulness.

Lavdim Halilaj, Niklas Petersen, Irlán Grangel-González, Christoph Lange, Sören Auer, Gökhan Coskun, Steffen Lohmann

Event-Based Recognition of Lived Experiences in User Reviews

User reviews on the web are an important source of opinions on products and services. For a popular product or service, the number of reviews can be large, so it may be difficult for a potential customer to read all of them and make a decision. We hypothesize, and test, that lived experiences reported in reviews can support a user's confidence in a review. We identify and extract such lived experiences with a novel technique based on machine reading. Our experimental results demonstrate the effectiveness of the technique.

Ehab Hassan, Davide Buscaldi, Aldo Gangemi

An Evolutionary Algorithm to Learn SPARQL Queries for Source-Target-Pairs: Finding Patterns for Human Associations in DBpedia

Efficient usage of the knowledge provided by the Linked Data community is often hindered by the need for domain experts to formulate the right SPARQL queries to answer questions. For new questions they have to decide which datasets are suitable and in which terminology and modelling style to phrase the SPARQL query. In this work we present an evolutionary algorithm to help with this challenging task. Given a training list of source-target node-pair examples, our algorithm can learn patterns (SPARQL queries) from a SPARQL endpoint. The learned patterns can be visualised to form the basis for further investigation, or they can be used to predict target nodes for new source nodes. Amongst others, we apply our algorithm to a dataset of several hundred human associations (such as "circle - square") to find patterns for them in DBpedia. We show the scalability of the algorithm by running it against a SPARQL endpoint loaded with more than 7.9 billion triples. Further, we use the resulting SPARQL queries to mimic human associations with a Mean Average Precision (MAP) of 39.9 % and a Recall@10 of 63.9 %.
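The two reported measures can be computed as follows; these are the standard definitions of MAP and Recall@10, not code from the paper:

```python
def average_precision(predicted, relevant, k=10):
    """AP@k for one query: mean of precision at each rank where a hit occurs."""
    hits, score = 0, 0.0
    for i, p in enumerate(predicted[:k], start=1):
        if p in relevant:
            hits += 1
            score += hits / i
    return score / min(len(relevant), k) if relevant else 0.0

def mean_average_precision(runs, k=10):
    """runs: list of (predicted ranking, set of relevant targets) pairs."""
    return sum(average_precision(p, r, k) for p, r in runs) / len(runs)

def recall_at_k(predicted, relevant, k=10):
    """Fraction of relevant targets appearing in the top-k predictions."""
    return len(set(predicted[:k]) & set(relevant)) / len(relevant)
```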

Jörn Hees, Rouven Bauer, Joachim Folz, Damian Borth, Andreas Dengel

Things and Strings: Improving Place Name Disambiguation from Short Texts by Combining Entity Co-Occurrence with Topic Modeling

Place name disambiguation is the task of correctly identifying a place from a set of places sharing a common name. It contributes to tasks such as knowledge extraction, query answering, geographic information retrieval, and automatic tagging. Disambiguation quality relies on the ability to correctly identify and interpret contextual clues, complicating the task for short texts. Here we propose a novel approach to the disambiguation of place names from short texts that integrates two models: entity co-occurrence and topic modeling. The first model uses Linked Data to identify related entities to improve disambiguation quality. The second model uses topic modeling to differentiate places based on the terms used to describe them. We evaluate our approach using a corpus of short texts, determine the suitable weight between models, and demonstrate that a combined model outperforms benchmark systems such as DBpedia Spotlight and Open Calais in terms of F1-score and Mean Reciprocal Rank.
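The combination of the two models can be sketched as a linear interpolation, where the weight alpha would be tuned on held-out data as the paper describes; the candidate places and scores below are invented for illustration:

```python
def combined_score(cooc_score, topic_score, alpha):
    """Linear interpolation of the entity co-occurrence model and the
    topic model; alpha in [0, 1] is the model weight to be tuned."""
    return alpha * cooc_score + (1 - alpha) * topic_score

def disambiguate(candidates, alpha=0.6):
    """candidates: {place: (co-occurrence score, topic-model score)}.
    Returns the candidate place with the highest combined score."""
    return max(candidates,
               key=lambda c: combined_score(*candidates[c], alpha))
```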

Yiting Ju, Benjamin Adams, Krzysztof Janowicz, Yingjie Hu, Bo Yan, Grant McKenzie

Relating Some Stuff to Other Stuff

Traceability in food and medicine supply chains has to handle stuffs—entities such as milk and starch indicated with mass nouns—and their portions and parts that get separated and put together to make the final product. Implementations have underspecified 'links', if any at all, and theoretical accounts from philosophy and in domain ontologies are incomplete as regards the relations involved. To solve this issue, we define seven relations for portions and stuff-parts, which are temporal where needed. The resulting theory distinguishes between the extensional and intensional level, and between amount of stuff and quantity. With application trade-offs, this has been implemented as an extension to the Stuff Ontology core ontology, which now also imports a special-purpose module of the Ontology of units of Measure for quantities. Although the implementation is atemporal, some automated reasoning for traceability is still possible thanks to the use of property chains to approximate the relevant temporal aspects.
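A property chain can be approximated outside OWL as plain relation composition; the relation and entity names below are illustrative assumptions, not the Stuff Ontology's actual vocabulary:

```python
def compose(rel1, rel2):
    """Relation composition: (x, z) whenever (x, y) in rel1 and (y, z) in rel2.
    Mirrors an OWL property chain such as portionOf o madeOf -> containsStuff
    (illustrative names)."""
    return {(x, z) for (x, y) in rel1 for (y2, z) in rel2 if y == y2}

# Toy traceability facts (hypothetical identifiers):
portion_of = {("sample42", "batch7")}   # a sampled portion of a batch
made_of = {("batch7", "milk")}          # the batch is made of milk
contains_stuff = compose(portion_of, made_of)
```

A reasoner applying the chain would likewise infer that the sampled portion contains milk, which is the kind of inference that supports traceability.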

C. Maria Keet

A Model for Verbalising Relations with Roles in Multiple Languages

Natural language renderings of ontologies facilitate communication with domain experts. While for ontologies with terms in English this is fairly straightforward, it is problematic for grammatically richer languages due to conjugation of verbs, an article that may be dependent on the preposition, or a preposition that modifies the noun. There is no systematic way to deal with such ‘complex’ names of OWL object properties, or their verbalisation with existing language models for annotating ontologies. The modifications occur only when the object performs some role in a relation, so we propose a conceptual model that can handle this. This requires reconciling the standard view with relational expressions to a positionalist view, which is included in the model and in the formalisation of the mapping between the two. This eases verbalisation and it allows for a more precise representation of the knowledge, yet is still compatible with existing technologies. We have implemented it as a Protégé plugin and validated its adequacy with several languages that need it, such as German and isiZulu.

C. Maria Keet, Takunda Chirema

Dependencies Between Modularity Metrics Towards Improved Modules

Recent years have seen many advances in ontology modularisation, yet it remains difficult to determine whether a module is actually a good module, as it is unclear which metrics should be considered. The few existing works on evaluation metrics focus only on metrics that suit a particular modularisation technique, and do not always provide a quantitative way to calculate them. Overall, the metrics are not comprehensive enough to apply to a variety of modules, and it is unclear which metrics fare well with particular types of ontology modules. To address this, we create a comprehensive list of module evaluation metrics with quantitative measures. These measures were implemented in the new Tool for Ontology Module Metrics (TOMM), which was then used in a testbed to test these metrics with existing modules. The results obtained, in turn, uncovered which metrics fare well with which module types, i.e., which metrics need to be measured to determine whether a module of some type is a ‘good’ module.

Zubeida Casmod Khan, C. Maria Keet

Travel Attractions Recommendation with Knowledge Graphs

Selecting relevant travel attractions for a given user is a real and important problem from both the traveller’s and the travel supplier’s perspective. Knowledge graphs have been used to recommend music artists, movies and books. In this paper, we identify how knowledge graphs might be efficiently leveraged to recommend travel attractions. We address two main drawbacks of existing systems that exploit semantic information: semantic poverty and a city-agnostic user profiling strategy. Accordingly, we constructed a rich, world-scale travel knowledge graph from existing large knowledge graphs, namely Geonames, DBpedia and Wikidata. The underlying ontology contains more than 1,200 classes to describe attractions. We applied a city-dependent user profiling strategy that makes use of the fine-grained semantics encoded in the constructed graph. Our evaluation on the YFCC100M dataset shows that our approach achieves a 5.3 % improvement in F1-score and a 4.3 % improvement in nDCG compared with the state-of-the-art approach.

Chun Lu, Philippe Laublet, Milan Stankovic
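
The city-dependent profiling idea could be illustrated as follows; the attraction classes, cities and counting scheme here are invented for the sketch, not taken from the paper:

```python
# Hypothetical sketch of city-dependent user profiling: interests in
# attraction classes are counted per city, so recommendations in one
# city use that city's distribution rather than a global profile.
from collections import Counter, defaultdict

visits = [("Paris", "Museum"), ("Paris", "Museum"), ("Paris", "Cafe"),
          ("Rome", "Church"), ("Rome", "Museum")]

profiles = defaultdict(Counter)          # city -> class frequency
for city, attraction_class in visits:
    profiles[city][attraction_class] += 1

top_in_paris = profiles["Paris"].most_common(1)[0][0]
```

A city-agnostic profile would merge all counts and could mask that, in Paris, this user visits museums above all.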

Making Entailment Set Changes Explicit Improves the Understanding of Consequences of Ontology Authoring Actions

The consequences of adding or removing axioms are difficult to apprehend for ontology authors using the Web Ontology Language (OWL). Consequences of modelling actions range from unintended inferences to outright defects such as incoherency or even inconsistency. One of the central ontology authoring activities is verifying that a particular modelling step has had the intended consequences, often with the help of reasoners. For users of Protégé, this involves, for example, exploring the inferred class hierarchy. We explore the hypothesis that making changes to key entailment sets explicit improves verification compared to the standard static hierarchy/frame-based approach. We implement our approach as a Protégé plugin and conduct an exploratory study to isolate the authoring actions for which users benefit from our approach. In a second controlled study we address our hypothesis and find that, for a set of key authoring problems, making entailment set changes explicit improves the understanding of consequences both in terms of correctness and speed, and is rated as the preferred way to track changes compared to a static hierarchy/frame-based view.

Nicolas Matentzoglu, Markel Vigo, Caroline Jay, Robert Stevens

Data 2 Documents: Modular and Distributive Content Management in RDF

Content Management Systems have not gained much from the Linked Data uptake, and sharing content between different websites and systems is hard. Conversely, using Linked Data in web documents is not as trivial as managing regular web content with a CMS. To address these issues, we present a method for creating human-readable web documents out of machine-readable web data, focussing on modularity and re-use. A vocabulary is introduced to structure the knowledge involved in these tasks in a modular and distributable fashion. The vocabulary has a strong relation to the semantic elements in HTML5 and allows for a declarative form of content management expressed in RDF. We explain and demonstrate the vocabulary using concrete examples with RDF data from various sources, and present a user study in two sessions involving (semantic) web experts and computer science students.

Niels Ockeloen, Victor de Boer, Tobias Kuhn, Guus Schreiber

TechMiner: Extracting Technologies from Academic Publications

In recent years we have seen the emergence of a variety of scholarly datasets. Typically these capture ‘standard’ scholarly entities and their connections, such as authors, affiliations, venues, publications, and citations. However, as the repositories grow and the technology improves, researchers are adding new entities to these repositories to develop a richer model of the scholarly domain. In this paper, we introduce TechMiner, a new approach, which combines NLP, machine learning and semantic technologies, for mining technologies from research publications and generating an OWL ontology describing their relationships with other research entities. The resulting knowledge base can support a number of tasks, such as richer semantic search, which can exploit the technology dimension to support better retrieval of publications; richer expert search; monitoring the emergence and impact of new technologies, both within and across scientific fields; and studying the scholarly dynamics associated with the emergence of new technologies. TechMiner was evaluated on a manually annotated gold standard and the results indicate that it significantly outperforms alternative NLP approaches and that its semantic features improve performance significantly with respect to both recall and precision.

Francesco Osborne, Hélène de Ribaupierre, Enrico Motta

Ontology Learning in the Deep

Recent developments in the area of deep learning have proved extremely beneficial for several natural language processing tasks, such as sentiment analysis, question answering, and machine translation. In this paper we exploit such advances by casting the ontology learning problem as a transductive reasoning task that learns to convert knowledge from natural language to a logic-based specification. More precisely, using a sample of definitory sentences generated from a synthetic grammar, we trained Recurrent Neural Network (RNN) based architectures to extract OWL formulae from text. In addition to the low feature-engineering costs, our system shows good generalisation capabilities over the lexicon and the syntactic structure. The encouraging results obtained in the paper provide first evidence of the potential of deep learning techniques for long-term ontology learning challenges such as improving domain independence, reducing engineering costs, and dealing with variable language forms.

Giulio Petrucci, Chiara Ghidini, Marco Rospocher

Interest Representation, Enrichment, Dynamics, and Propagation: A Study of the Synergetic Effect of Different User Modeling Dimensions for Personalized Recommendations on Twitter

Microblogging services such as Twitter have been widely adopted due to the highly social nature of interactions they have facilitated. With the rich information generated by users on these services, user modeling aims to acquire knowledge about a user’s interests, which is a fundamental step towards personalization as well as recommendations. To this end, researchers have explored different dimensions such as (1) Interest Representation, (2) Content Enrichment, (3) Temporal Dynamics of user interests, and (4) Interest Propagation using semantic information from a knowledge base such as DBpedia. However, those dimensions of user modeling have largely been studied separately, and there is a lack of research on the synergetic effect of those dimensions for user modeling. In this paper, we address this research gap by investigating 16 different user modeling strategies produced by various combinations of those dimensions. Different user modeling strategies are evaluated in the context of a personalized link recommender system on Twitter. Results show that Interest Representation and Content Enrichment play crucial roles in user modeling, followed by Temporal Dynamics. The user modeling strategy considering Interest Representation, Content Enrichment and Temporal Dynamics provides the best performance among the 16 strategies. On the other hand, Interest Propagation has little effect on user modeling in the case of leveraging a rich Interest Representation or considering Content Enrichment.

Guangyuan Piao, John G. Breslin

Integrating New Refinement Operators in Terminological Decision Trees Learning

The problem of predicting membership w.r.t. a target concept for individuals of Semantic Web knowledge bases can be cast as a concept learning problem, whose goal is to induce intensional definitions describing the available examples. However, the models obtained through methods borrowed from Inductive Logic Programming, e.g., Terminological Decision Trees, may be affected by two crucial aspects: the refinement operators for specialising the concept description to be learned, and the heuristics employed for selecting the most promising solution (i.e., the concept description that best describes the examples). In this paper, we investigate the effectiveness of Terminological Decision Trees and their evidential version when a refinement operator available in DL-Learner and modified heuristics are employed. The evaluation shows an improvement in terms of predictiveness.

Giuseppe Rizzo, Nicola Fanizzi, Jens Lehmann, Lorenz Bühmann

SEON: A Software Engineering Ontology Network

Software Engineering (SE) is a wide domain, where ontologies are useful instruments for dealing with Knowledge Management (KM) related problems. When SE ontologies are built and used in isolation, some problems remain, in particular those related to knowledge integration. The goal of this paper is to provide an integrated solution for better dealing with KM-related problems in SE by means of a Software Engineering Ontology Network (SEON). SEON is designed with mechanisms for easing the development and integration of SE domain ontologies. The current version of SEON includes core ontologies for software and software processes, as well as domain ontologies for the main technical software engineering subdomains, namely requirements, design, coding and testing. We discuss the development of SEON and some of its envisioned applications related to KM.

Fabiano Borges Ruy, Ricardo de Almeida Falbo, Monalessa Perini Barcellos, Simone Dornelas Costa, Giancarlo Guizzardi

Discovering Ontological Correspondences Through Dialogue

Whilst significant attention has been given to centralised approaches for aligning full ontologies, limited attention has been given to the problem of aligning partially exposed ontologies in a decentralised setting. Traditional ontology alignment techniques rely on the full disclosure of the ontological models to find the “best” set of correspondences that map entities from one ontology to another. However, within open and opportunistic environments, such approaches may not always be pragmatic or even acceptable (due to privacy concerns). We present a novel dialogue-based negotiation mechanism that supports the strategic agreement over correspondences between agents with limited or no prior knowledge of their opponent’s ontology. This mechanism allows both agents to reach a mutual agreement over an alignment through the selective disclosure of their ontological model, and facilitates rational choices on the grounds of their ontological knowledge and their specific strategies. We formally introduce the dialogue mechanism, and discuss its behaviour, properties and outcomes.

Gabrielle Santos, Terry R. Payne, Valentina Tamma, Floriana Grasso

IoT-O, a Core-Domain IoT Ontology to Represent Connected Devices Networks

Smart objects are now present in our everyday lives, and the Internet of Things is expanding both in number of devices and in volume of produced data. These devices are deployed in dynamic ecosystems, with spatial mobility constraints, intermittent network availability depending on many parameters (e.g. battery level or duty cycle), etc. To capture knowledge describing such evolving systems, open, shared and dynamic knowledge representations are required. These representations should also have the ability to adapt over time to the changing state of the world. That is why we propose IoT-O, a core-domain modular IoT ontology that provides a vocabulary to describe connected devices and their relation with their environment. First, existing IoT ontologies are described and compared against requirements that an IoT ontology should satisfy. Then, after a detailed description of its modules, IoT-O is instantiated in a home automation use case to illustrate how it supports the description of evolving systems.

Nicolas Seydoux, Khalil Drira, Nathalie Hernandez, Thierry Monteil

AutoMap4OBDA: Automated Generation of R2RML Mappings for OBDA

Ontology-Based Data Access (OBDA) has become a popular paradigm for the integration of heterogeneous data. The key components of an OBDA system are the mappings between the data source and the target ontology. The great effort required to create mappings manually is still a significant barrier to adopting OBDA. Current relational-to-ontology mapping generators are far from providing 100 % of the mappings required in real-world problems. To overcome this issue we present AutoMap4OBDA, a system which automatically generates R2RML mappings based on the intensive use of relational source contents and features of the target ontology. Ontology learning techniques are applied to infer class hierarchies, string similarity metrics are selected based on the target ontology’s labels, and graph structures are applied to generate the mappings. We have used the RODI benchmarking suite to evaluate AutoMap4OBDA, which outperforms the most advanced state-of-the-art mapping generators.

Álvaro Sicilia, German Nemirovski
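
The label-driven similarity step could be illustrated as below; this is a minimal stand-in (using `difflib`'s ratio and invented labels), not AutoMap4OBDA's actual metric selection:

```python
# Illustrative sketch: matching a relational column name to ontology
# labels with a string similarity metric. difflib's ratio stands in
# for whichever metric the system would select for these labels.
from difflib import SequenceMatcher

def best_match(column, labels, threshold=0.6):
    """Return the ontology label most similar to a column name."""
    norm = lambda s: s.replace("_", " ").lower()
    scored = [(SequenceMatcher(None, norm(column), norm(l)).ratio(), l)
              for l in labels]
    score, label = max(scored)
    return label if score >= threshold else None

ontology_labels = ["Person", "Postal Address", "Organisation"]
match = best_match("postal_address", ontology_labels)
```

The matched label would then feed into the R2RML mapping for the corresponding column, with the threshold guarding against spurious matches.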

Word Tagging with Foundational Ontology Classes: Extending the WordNet-DOLCE Mapping to Verbs

Semantic annotation is fundamental to dealing with large-scale lexical information, mapping the information to an enumerable set of categories over which rules and algorithms can be applied, and foundational ontology classes can serve as a formal set of categories for such tasks. A previous alignment between WordNet noun synsets and DOLCE provided a starting point for ontology-based annotation, but in NLP tasks verbs are also of substantial importance. This work presents an extension to the WordNet-DOLCE noun mapping, aligning verbs according to their links to nouns denoting perdurants, transferring to the verb the DOLCE class assigned to the noun that best represents that verb’s occurrence. To evaluate the usefulness of this resource, we implemented a foundational ontology-based semantic annotation framework that assigns a high-level foundational category to each word or phrase in a text, and compared it to a similar annotation tool, obtaining an increase of 9.05 % in accuracy.

Vivian S. Silva, André Freitas, Siegfried Handschuh

Locating Things in Space and Time: Verification of the SUMO Upper-Level Ontology

Upper-level ontologies provide an account of the most basic, domain-independent entities, such as time, space, objects and processes. They are intended to be broadly reused, among others, during ontology engineering tasks such as ontology building and integration. Ontology verification is the process by which a theory is checked to rule out its unintended models, and possibly to characterize missing intended ones. In this paper, we translate into first-order logic, modularize, and verify the subtheory of the location of entities in space and time of the Suggested Upper Merged Ontology (SUMO). As a result, we propose the addition of some axioms that rule out unintended models in SUMO, the correction of others, and make available a modularized version of SUMO’s characterization of the location of entities in time and space, represented in standard first-order logic.

Lydia Silva Muñoz, Michael Grüninger

Detecting Meaningful Compounds in Complex Class Labels

Real-world ontologies, such as those for the medical domain, often represent highly specific, fine-grained concepts using complex labels that consist of a sequence of sublabels. In this paper, we investigate the problem of automatically detecting meaningful compounds in such complex class labels to support methods that require an automatic understanding of their meaning, for example ontology matching, ontology learning and semantic search. We formulate compound identification as a supervised learning task and investigate a variety of heterogeneous features, both statistical (i.e., knowledge-lean) and knowledge-based, for the task at hand. Our classifiers are trained and evaluated using a manually annotated dataset consisting of about 300 complex labels taken from real-world ontologies, which we designed to provide a benchmarking gold standard for this task. Experimental results show that by using a combination of distributional and knowledge-based features we are able to reach an accuracy of more than 90 % for compounds of length one and almost 80 % for compounds of length two. Finally, we evaluate our method in an extrinsic experimental setting: a use case highlighting the benefits of using automatically identified compounds for the high-end semantic task of ontology matching.

Heiner Stuckenschmidt, Simone Paolo Ponzetto, Christian Meilicke

Categorization Power of Ontologies with Respect to Focus Classes

When reusing existing ontologies, preference might be given to those providing extensive subcategorization for the classes deemed important in the new ontology (focus classes). The reused set of categories may consist not only of named classes but also of compound concept expressions that a human ontologist could view as meaningful categories. We define the general notion of focused ontologistic categorization power; for the sake of tractable experiments we then choose a restricted concept expression language and map it to syntactic axiom patterns. The occurrence of the patterns has been verified in two ontology collections, and for a sample of pattern instances their ontologistic status has been assessed by different groups of users.

Vojtěch Svátek, Ondřej Zamazal, Miroslav Vacura

Selecting Optimal Background Knowledge Sources for the Ontology Matching Task

It is common practice to rely on background knowledge (BK) in order to assist and improve the ontology matching process. The choice of an appropriate source of background knowledge for a given matching task, however, remains a vastly unexplored question. In this paper, we propose an automatic BK selection approach that does not depend on an initial direct matching, can handle multilingualism and is domain independent. The approach is based on the construction of an index for a set of BK candidates. The pair of ontologies to be aligned is modelled as a query with respect to the indexed BK sources, and the best candidate is selected within an information retrieval paradigm. We evaluate our system in a series of experiments in both general-purpose and domain-specific matching scenarios. The results show that our approach is capable of selecting the BK that provides the best alignment quality with respect to a given reference alignment for each of the considered matching tasks.

Abdel Nasser Tigrine, Zohra Bellahsene, Konstantin Todorov
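
The query-against-an-index idea could be sketched as follows; the term index, BK sources and plain overlap score are invented for illustration, not the paper's actual retrieval model:

```python
# A minimal sketch, under assumed data: each BK candidate is indexed
# by vocabulary terms, the ontology pair to match is treated as a
# query, and the candidate with the best overlap score is selected.
from collections import Counter

def score(query_terms, doc_terms):
    """Simple term-overlap score between query and indexed BK terms."""
    q, d = Counter(query_terms), Counter(doc_terms)
    return sum(min(q[t], d[t]) for t in q)

bk_index = {
    "UMLS":    ["disease", "anatomy", "drug", "symptom"],
    "DBpedia": ["city", "person", "film", "organisation"],
}
query = ["disease", "symptom", "treatment"]  # terms from the two ontologies
best_bk = max(bk_index, key=lambda name: score(query, bk_index[name]))
```

A real system would use a weighted retrieval score rather than raw overlap, but the selection step is the same: rank indexed BK sources against the query and take the top one.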

Considering Semantics on the Discovery of Relations in Knowledge Graphs

Knowledge graphs encode semantic knowledge that can be exploited to enhance different data-driven tasks, e.g., query answering, data mining, ranking or recommendation. However, knowledge graphs may be incomplete, and relevant relations may not be included in the graph, affecting the accuracy of these data-driven tasks. We tackle the problem of relation discovery in a knowledge graph, and devise KOI, a semantics-based approach able to discover relations in portions of knowledge graphs that comprise similar entities. KOI exploits both datatype and object properties to compute the similarity among entities, i.e., two entities are similar if their datatype and object properties have similar values. KOI implements graph partitioning techniques that exploit similarity values to discover relations from knowledge graph partitions. We conduct an experimental study on a knowledge graph of TED talks with state-of-the-art similarity measures and graph partitioning techniques. Our observed results suggest that KOI is able to discover missing edges between related TED talks that cannot be discovered by state-of-the-art approaches. These results reveal that combining the semantics encoded in the similarity measures and in the knowledge graph structure has a positive impact on the relation discovery problem.

Ignacio Traverso-Ribón, Guillermo Palma, Alejandro Flores, Maria-Esther Vidal
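
The entity-similarity ingredient could be sketched as below; this uses a plain Jaccard measure over property-value pairs and invented TED-talk data, not KOI's actual similarity measure:

```python
# Hedged sketch: two entities are compared on the values of their
# datatype and object properties via Jaccard similarity over
# (property, value) pairs. Data and property names are invented.

def entity_similarity(props_a, props_b):
    """Jaccard similarity over (property, value) pairs of two entities."""
    a = {(p, v) for p, vs in props_a.items() for v in vs}
    b = {(p, v) for p, vs in props_b.items() for v in vs}
    return len(a & b) / len(a | b) if a | b else 0.0

talk1 = {"topic": {"AI", "ethics"}, "speaker": {"alice"}}
talk2 = {"topic": {"AI", "robotics"}, "speaker": {"bob"}}
sim = entity_similarity(talk1, talk2)
```

Pairwise scores like this one would then drive the graph partitioning, grouping similar talks into the partitions from which missing relations are proposed.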

ACRyLIQ: Leveraging DBpedia for Adaptive Crowdsourcing in Linked Data Quality Assessment

Crowdsourcing has emerged as a powerful paradigm for quality assessment and improvement of Linked Data. A major challenge of employing crowdsourcing for quality assessment in Linked Data is the cold-start problem: how to estimate the reliability of crowd workers and assign the most reliable workers to tasks? We address this challenge by proposing a novel approach for generating test questions from DBpedia based on the topics associated with quality assessment tasks. These test questions are used to estimate the reliability of new workers. Subsequently, the tasks are dynamically assigned to reliable workers to help improve the accuracy of collected responses. Our proposed approach, ACRyLIQ, is evaluated using workers hired from Amazon Mechanical Turk, on two real-world Linked Data datasets. We validate the proposed approach in terms of accuracy and compare it against the baseline approach of estimating reliability using gold-standard tasks. The results demonstrate that our proposed approach achieves high accuracy without using gold-standard tasks.

Umair ul Hassan, Amrapali Zaveri, Edgard Marx, Edward Curry, Jens Lehmann

The Semantic Web in an SMS

Many ICT applications and services, including those from the Semantic Web, rely on the Web for the exchange of data, which involves expensive server and network infrastructures. Most rural areas of developing countries are not reached by the Web and its possibilities, while at the same time the ability to share knowledge has been identified as a key enabler for development. To make widespread knowledge sharing possible in these rural areas, the notion of the Web has to be downscaled based on the specific low-resource infrastructure in place. In this paper, we introduce SPARQL over SMS, a solution for Web-like exchange of RDF data over cellular networks in which HTTP is substituted by SMS. We motivate and validate this through two use cases in West Africa. We present the design and implementation of the solution, along with a data compression method that combines generic compression strategies and strategies that use Semantic Web specific features to reduce the size of RDF before it is transferred over the low-bandwidth cellular network.

Onno Valkering, Victor de Boer, Gossa Lô, Romy Blankendaal, Stefan Schlobach
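
One generic part of such a compression pipeline could be sketched as follows; the prefix table and the combination with `zlib` are illustrative assumptions, not the paper's actual scheme:

```python
# Illustrative sketch: shrinking N-Triples-like RDF text before SMS
# transfer by replacing shared namespace IRIs with short codes, then
# applying generic compression on top. Prefix table is made up.
import zlib

PREFIXES = {"http://xmlns.com/foaf/0.1/": "f:",
            "http://example.org/": "e:"}

def shorten(triples):
    """Substitute known namespace IRIs with short prefix codes."""
    for iri, code in PREFIXES.items():
        triples = triples.replace(iri, code)
    return triples

data = ("<http://example.org/alice> "
        "<http://xmlns.com/foaf/0.1/knows> "
        "<http://example.org/bob> .")
short = shorten(data)
packed = zlib.compress(short.encode())  # generic compression on top
```

Shared namespaces dominate RDF serialisations, so a semantics-aware step like this can cut payloads substantially before any byte-level compressor runs, which matters when each SMS carries only 160 characters.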

Extraction and Visualization of TBox Information from SPARQL Endpoints

The growing amount of data being published as Linked Data has huge potential, but the usage of this data is still cumbersome, especially for non-technical users. Visualizations can help to get a better idea of the type and structure of the data available at a given SPARQL endpoint, and can provide a useful starting point for querying and analysis. We present an approach for the extraction and visualization of TBox information from Linked Data. SPARQL queries are used to infer concept information from the ABox of a given endpoint, which is then gradually added to an interactive VOWL graph visualization. We implemented the approach in a web application, which was tested on several SPARQL endpoints and evaluated in a qualitative user study with promising results.

Marc Weise, Steffen Lohmann, Florian Haag
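
The core idea of reading TBox-level information off ABox data can be sketched as below; the triples and the two inference rules (classes from `rdf:type`, property domains from subject types) are a simplified illustration, not the tool's actual SPARQL queries:

```python
# A minimal sketch with invented data: collect classes from rdf:type
# statements and read a property's observed domain off the types of
# its subjects.
from collections import defaultdict

RDF_TYPE = "rdf:type"

def extract_tbox(triples):
    types = defaultdict(set)             # instance -> asserted classes
    for s, p, o in triples:
        if p == RDF_TYPE:
            types[s].add(o)
    domains = defaultdict(set)           # property -> observed domains
    for s, p, o in triples:
        if p != RDF_TYPE:
            domains[p] |= types[s]
    classes = {c for cs in types.values() for c in cs}
    return classes, dict(domains)

abox = [(":alice", RDF_TYPE, ":Person"),
        (":alice", ":worksFor", ":acme"),
        (":acme", RDF_TYPE, ":Company")]
classes, domains = extract_tbox(abox)
```

Against a live endpoint the same questions would be posed as aggregate SPARQL queries, with the answers feeding the incremental VOWL graph.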

In-Use Papers


Learning Domain Labels Using Conceptual Fingerprints: An In-Use Case Study in the Neurology Domain

Modelling a science domain for the purposes of thematically categorizing the research work and enabling better browsing and search can be a daunting task, especially if a specialized taxonomy or ontology does not exist for this domain. Elsevier, the largest academic publisher, faces this challenge often, for the needs of supporting the journal submission system, but also for supplying ScienceDirect and Scopus, two flagship platforms of the company, with sufficient metadata, such as conceptual labels that characterize the research works, which can improve the user experience in browsing and searching the literature. In this paper we describe an Elsevier in-use case study of learning appropriate domain labels from a collection of 6,357 full-text articles in the neurology domain, exploring different document representations and clustering mechanisms. Besides the baseline approaches for document representation (e.g., bag-of-words) and their variations (e.g., n-grams), we employ a novel in-house methodology which produces conceptual fingerprints of the research articles, starting from a general domain taxonomy, such as the Medical Subject Headings (MeSH). A thorough empirical evaluation is presented, using a variety of clustering mechanisms and several validity indices to evaluate the resulting clusters. Our results summarize the best practices in modelling this specific domain, and we report on the advantages and disadvantages of the different clustering mechanisms and document representations that were examined, with the aim to learn appropriate conceptual labels for this domain.

Zubair Afzal, George Tsatsaronis, Marius Doornenbal, Pascal Coupet, Michelle Gregory

Semantic Business Process Regulatory Compliance Checking Using LegalRuleML

Legal documents are the source of norms, guidelines, and rules that often feed into different applications. In this perspective, to support the development and deployment of such applications, it is important to have a sufficiently expressive conceptual framework in which the various heterogeneous aspects of norms can be modelled and reasoned with. In this paper, we investigate how to exploit Semantic Web technologies and languages, such as LegalRuleML, to model a legal document. We show how the semantic annotations can be used to empower a business process (regulatory) compliance system, and discuss the challenges of adapting a semantic approach to the legal domain.

Guido Governatori, Mustafa Hashmi, Ho-Pun Lam, Serena Villata, Monica Palmirani

An Open Repository Model for Acquiring Knowledge About Scientific Experiments

The availability of high-quality metadata is key to facilitating discovery in the large variety of scientific datasets that are increasingly becoming publicly available. However, despite the recent focus on metadata, the diversity of metadata representation formats and the poor support for semantic markup typically result in metadata that are of poor quality. There is a pressing need for a metadata representation format that provides strong interoperation capabilities together with robust semantic underpinnings. In this paper, we describe such a format, together with open-source Web-based tools that support the acquisition, search, and management of metadata. We outline an initial evaluation using metadata from a variety of biomedical repositories.

Martin J. O’Connor, Marcos Martínez-Romero, Attila L. Egyedi, Debra Willrett, John Graybeal, Mark A. Musen

OpenResearch: Collaborative Management of Scholarly Communication Metadata

Scholars often need to search for matching, high-profile scientific events to publish their research results. Information about the topical focus and quality of events is not made sufficiently explicit in the existing communication channels where events are announced. Therefore, scholars have to spend a lot of time reading and assessing calls for papers, but might still not find the right event. Additionally, events might be overlooked because of the large number of events announced every day. We introduce OpenResearch, a crowdsourcing platform that supports researchers in collecting, organizing, sharing and disseminating information about scientific events in a structured way. It enables quality-related queries over a multidisciplinary collection of events according to a broad range of criteria such as acceptance rate, sustainability of event series, and reputation of people and organizations. Events are represented in different views using map extensions, calendar and timeline visualizations. We have systematically evaluated the timeliness, usability and performance of OpenResearch.

Sahar Vahdati, Natanael Arndt, Sören Auer, Christoph Lange

Position Paper


Data-Driven RDF Property Semantic-Equivalence Detection Using NLP Techniques

DBpedia extracts most of its data from Wikipedia’s infoboxes. Manually created “mappings” link infobox attributes to DBpedia ontology properties (dbo properties), producing the most-used DBpedia triples. However, infobox attributes without a mapping produce triples with properties in a different namespace (dbp properties). In this position paper we point out that (a) the number of triples containing dbp properties is significant compared to triples containing dbo properties for the DBpedia instances analyzed, (b) the SPARQL queries made by users barely use dbp and dbo properties simultaneously, and (c) as an exploitation example we show a method to automatically enhance SPARQL queries by using syntactic and semantic similarities between dbo and dbp properties.

Mariano Rico, Nandana Mihindukulasooriya, Asunción Gómez-Pérez

