2014 | Book

Knowledge Engineering and Knowledge Management

19th International Conference, EKAW 2014, Linköping, Sweden, November 24-28, 2014. Proceedings

Edited by: Krzysztof Janowicz, Stefan Schlobach, Patrick Lambrix, Eero Hyvönen

Publisher: Springer International Publishing

Book series: Lecture Notes in Computer Science


About this book

This book constitutes the refereed proceedings of the 19th International Conference on Knowledge Engineering and Knowledge Management, EKAW 2014, held in Linköping, Sweden, in November 2014. The 24 full papers and 21 short papers presented were carefully reviewed and selected from 138 submissions. The papers cover all aspects of eliciting, acquiring, modeling, and managing knowledge, the construction of knowledge-intensive systems and services for the Semantic Web, knowledge management, e-business, natural language processing, intelligent information integration, personal digital assistance systems, and a variety of other related topics.

Table of Contents

Frontmatter
Automatic Ontology Population from Product Catalogs

In this paper we present an approach for ontology population based on heterogeneous documents describing commercial products with varied descriptions and diverse styles. The originality lies in the generation and progressive refinement of semantic annotations that identify the types of the products and their features, even though the initial information is of very poor quality. Documents are annotated using an ontology. The annotation process is based on an initial set of known instances, built from terminological elements added to the ontology. Our approach first uses semi-automated annotation techniques on a small dataset and then applies machine learning techniques to fully annotate the entire dataset. This work was motivated by specific application needs. Experiments were conducted on real-world datasets in the toys domain.

Céline Alec, Chantal Reynaud-Delaître, Brigitte Safar, Zied Sellami, Uriel Berdugo
Measuring Similarity in Ontologies: A New Family of Measures

Several attempts have already been made to develop similarity measures for ontologies. We noticed that some existing similarity measures are ad hoc and unprincipled. In addition, there is still a need for similarity measures that are applicable to expressive Description Logics and that are terminological. To address these requirements, we have developed a new family of similarity measures. Two separate empirical studies have been carried out to evaluate the new measures. First, we compare the new measures, along with some existing measures, against a gold standard. Second, we examine the practicality of using the new measures over an independently motivated corpus of ontologies.
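A minimal sketch of one terminological building block in this spirit (a Jaccard ratio over entailed subsumers); the subsumer sets below are invented for illustration, and the paper's actual family of measures is more refined:

# Sketch of a terminological similarity: compare two classes by the overlap
# of their (entailed) subsumers. Subsumer sets are invented; a DL reasoner
# would normally compute them from the ontology.

subsumers = {
    "Cat": {"Cat", "Feline", "Mammal", "Animal"},
    "Dog": {"Dog", "Canine", "Mammal", "Animal"},
    "Car": {"Car", "Vehicle", "Artifact"},
}

def sim(a, b):
    sa, sb = subsumers[a], subsumers[b]
    return len(sa & sb) / len(sa | sb)

print(f"sim(Cat, Dog) = {sim('Cat', 'Dog'):.2f}")   # shares Mammal, Animal
print(f"sim(Cat, Car) = {sim('Cat', 'Car'):.2f}")   # no overlap -> 0.0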

Tahani Alsubait, Bijan Parsia, Uli Sattler
Relation Extraction from the Web Using Distant Supervision

Extracting information from Web pages requires the ability to work at Web scale in terms of the number of documents, the number of domains and domain complexity. Recent approaches have used existing knowledge bases to learn to extract information with promising results. In this paper we propose the use of distant supervision for relation extraction from the Web. Distant supervision is a method which uses background information from the Linking Open Data cloud to automatically label sentences with relations to create training data for relation classifiers. Although the method is promising, existing approaches are still not suitable for Web extraction as they suffer from three main issues: data sparsity, noise and lexical ambiguity. Our approach reduces the impact of data sparsity by making entity recognition tools more robust across domains, as well as extracting relations across sentence boundaries. We reduce the noise caused by lexical ambiguity by employing statistical methods to strategically select training data. Our experiments show that using a more robust entity recognition approach and expanding the scope of relation extraction results in about 8 times the number of extractions, and that strategically selecting training data can result in an error reduction of about 30%.
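A minimal sketch of the core distant-supervision labeling step described above, with a toy knowledge base and naive string matching standing in for the robust entity recognition the paper develops:

# Distant-supervision labeling: sentences mentioning an entity pair known to
# stand in a KB relation become (noisy) positive training examples for that
# relation. Toy data; not the authors' pipeline.

KB_TRIPLES = {
    ("Dublin", "capital_of", "Ireland"),
    ("Paris", "capital_of", "France"),
}

def label_sentences(sentences, kb=KB_TRIPLES):
    """Yield (sentence, relation, subject, object) training examples."""
    for sent in sentences:
        for subj, rel, obj in kb:
            # Naive entity matching; real systems use robust NER/linking,
            # which is exactly what the paper hardens across domains.
            if subj in sent and obj in sent:
                yield (sent, rel, subj, obj)

sentences = [
    "Dublin is the capital of Ireland.",
    "Paris hosted the 1900 Olympics.",           # no pair -> no label
    "Paris, the capital of France, is large.",
]
for example in label_sentences(sentences):
    print(example)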

Isabelle Augenstein, Diana Maynard, Fabio Ciravegna
Inductive Lexical Learning of Class Expressions

Despite an increase in the number of knowledge bases published according to Semantic Web W3C standards, many of those consist primarily of instance data and lack sophisticated schemata, although the availability of such schemata would allow more powerful querying, consistency checking and debugging, as well as improved inference. One of the reasons why schemata are still rare is the effort required to create them. Consequently, numerous ontology learning approaches have been developed to simplify the creation of schemata. Those approaches usually learn structures either from text or from existing RDF data. In this paper, we present the first approach combining both sources of evidence; in particular, we combine an existing logical learning approach with statistical relevance measures applied to textual resources. We perform an experiment involving a manual evaluation on 100 classes of the DBpedia 3.9 dataset and show that the inclusion of relevance measures leads to a significant improvement in accuracy over the baseline algorithm.

Lorenz Bühmann, Daniel Fleischhacker, Jens Lehmann, Andre Melo, Johanna Völker
Question Generation from a Knowledge Base

When designing a natural language question-asking interface for a formal knowledge base, managing and scoping the user expectations regarding what questions the system can answer is a key challenge. Allowing users to type arbitrary English questions will likely result in user frustration, because the system may be unable to answer many questions even if it correctly understands the natural language phrasing. We present a technique for responding to natural language questions by suggesting a series of questions that the system can actually answer. We also show that the suggested questions are useful in a variety of ways in an intelligent textbook to improve student learning.

Vinay K. Chaudhri, Peter E. Clark, Adam Overholtzer, Aaron Spaulding
Inconsistency Monitoring in a Large Scientific Knowledge Base

Large scientific knowledge bases (KBs) are bound to contain inconsistencies and under-specified knowledge. Inconsistencies are inherent because the approach to modeling certain phenomena evolves over time, and at any given time, contradictory approaches to modeling a piece of domain knowledge may simultaneously exist in the KB. Underspecification is inherent because a large, complex KB is rarely fully specified, especially when authored by domain experts who are not formally trained in knowledge representation. We describe our approach for inconsistency monitoring in a large biology KB. We use a combination of anti-patterns that are indicative of poor modeling and inconsistencies due to underspecification. We draw the following lessons from this experience: (1) knowledge authoring must include an intermediate step between authoring and run-time inference to identify errors and inconsistencies; (2) underspecification can ease knowledge encoding but requires appropriate user control; and (3) since real-life KBs are rarely consistent, a scheme to derive useful conclusions in spite of inconsistencies is essential.

Vinay K. Chaudhri, Rahul Katragadda, Jeff Shrager, Michael Wessel
Pay-As-You-Go Multi-user Feedback Model for Ontology Matching

Using our multi-user model, a community of users provides feedback in a pay-as-you-go fashion to the ontology matching process by validating the mappings found by automatic methods, with the following advantages over having a single user: the effort required from each user is reduced, user errors are corrected, and consensus is reached. We propose strategies that dynamically determine the order in which the candidate mappings are presented to the users for validation. These strategies are based on mapping quality measures that we define. Further, we use a propagation method to leverage the validation of one mapping for other mappings. We use an extension of the AgreementMaker ontology matching system and the Ontology Alignment Evaluation Initiative (OAEI) Benchmarks track to evaluate our approach. Our results show how F-measure and robustness vary as a function of the number of user validations. We consider different user error and revalidation rates (the latter measures the number of times that the same mapping is validated). Our results highlight complex trade-offs and point to the benefits of dynamically adjusting the revalidation rate.
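A toy sketch of the pay-as-you-go idea: candidate mappings are ordered by a simple quality signal (here, how ambiguous the matcher similarity is) and each is validated by a majority vote of error-prone users. The ordering strategies and propagation in the paper are considerably more sophisticated:

import random

# Pay-as-you-go multi-user validation sketch: ambiguous mappings are
# presented first, and a majority vote over several users corrects
# individual errors. All data and rates below are invented.

candidates = [  # (source concept, target concept, matcher similarity)
    ("Author", "Writer", 0.95),
    ("Press", "Publisher", 0.55),   # ambiguous -> validate early
    ("Chapter", "Section", 0.48),
]

def ambiguity(mapping):
    _, _, sim = mapping
    return -abs(sim - 0.5)          # closest to 0.5 = most ambiguous

def validate(is_correct, n_users=3, error_rate=0.2):
    votes = [is_correct if random.random() > error_rate else not is_correct
             for _ in range(n_users)]
    return sum(votes) > n_users / 2  # majority vote

random.seed(0)
truth = {("Author", "Writer"): True,
         ("Press", "Publisher"): True,
         ("Chapter", "Section"): False}
for m in sorted(candidates, key=ambiguity, reverse=True):
    verdict = validate(truth[(m[0], m[1])])
    print(m[0], "->", m[1], "accepted" if verdict else "rejected")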

Isabel F. Cruz, Francesco Loprete, Matteo Palmonari, Cosmin Stroe, Aynaz Taheri
Information Flow within Relational Multi-context Systems

Multi-context systems (MCSs) are an important framework for heterogeneous combinations of systems within the Semantic Web. In this paper, we propose generic constructions to achieve specific forms of interaction in a principled way, and systematize some useful techniques to work with ontologies within an MCS. All these mechanisms are presented in the form of general-purpose design patterns. Their study also suggests new ways in which this framework can be further extended.

Luís Cruz-Filipe, Graça Gaspar, Isabel Nunes
Using Linked Data to Diversify Search Results: A Case Study in Cultural Heritage

In this study we consider whether, and to what extent, additional semantics in the form of Linked Data can help diversify search results. We undertake this study in the domain of cultural heritage. The data consists of collection data from the Rijksmuseum Amsterdam together with a number of relevant external vocabularies, all published as Linked Data. We apply an existing graph search algorithm to this data, using entries from the museum query log as a test set. The results show that in this domain an increase in diversity can be achieved by adding external vocabularies. We also analyse why some vocabularies have a significant effect, while others influence the results only marginally.

Chris Dijkshoorn, Lora Aroyo, Guus Schreiber, Jan Wielemaker, Lizzy Jongma
Personalised Access to Linked Data

Recent efforts in the Semantic Web community have been primarily focused on developing technical infrastructure and technologies for efficient Linked Data acquisition, publishing and interlinking. Nevertheless, due to the huge and diverse amount of information, actually accessing a piece of information in the LOD cloud still demands a significant amount of effort. In this paper, we present a novel configurable method for personalised access to Linked Data. The method recommends resources of interest based on users with similar tastes. To measure the similarity between users, we introduce a novel resource semantic similarity metric which takes into account the commonalities and informativeness of the resources. We validate and evaluate the method on a real-world dataset from the Web services domain. The results show that our method outperforms the baseline methods in terms of accuracy, serendipity and diversity.

Milan Dojchinovski, Tomas Vitvar
Roadmapping and Navigating in the Ontology Visualization Landscape

Proper visualization is essential for ontology development, sharing and usage; however, various use cases pose specific requirements on visualization features. We analyzed several visualization tools from the perspective of use case categories as well as low-level functional features and OWL expressiveness. A rule-based recommender was subsequently developed to help the user choose a suitable visualizer. Both the analysis results and the recommender were evaluated via a questionnaire.

Marek Dudáš, Ondřej Zamazal, Vojtěch Svátek
aLDEAS: A Language to Define Epiphytic Assistance Systems

We propose a graphical language that enables the specification of assistance systems for a given application by means of a set of rules. This language is complemented by several assistance action patterns. We implemented these proposals in an assistance editor aimed at assistance designers, and a generic assistance engine able to execute the specified assistance for the target application's end users, without any need to modify the application. We performed several experiments, both with assistance designers and with target-application end users.

Blandine Ginon, Stéphanie Jean-Daubias, Pierre-Antoine Champin, Marie Lefevre
Ontology Design Pattern Property Specialisation Strategies

Ontology Design Patterns (ODPs) show potential in enabling simpler, faster, and more correct Ontology Engineering by laymen and experts. For ODP adoption to take off, improved tool support for ODP use in Ontology Engineering is required. This paper studies and evaluates the effects of strategies for object property specialisation in ODPs, and suggests tool improvements based on those strategies. Results indicate the existence of three previously unstudied strategies for ODP specialisation, the uses of which affect reasoning performance and integration complexity of resulting ontologies.

Karl Hammar
The uComp Protégé Plugin: Crowdsourcing Enabled Ontology Engineering

Crowdsourcing techniques have been shown to provide effective means for solving a variety of ontology engineering problems. Yet, they are mainly being used as external means to ontology engineering, without being closely integrated into the work of ontology engineers. In this paper we investigate how to closely integrate crowdsourcing into ontology engineering practices. Firstly, we show that a set of basic crowdsourcing tasks is used recurrently to solve a range of ontology engineering problems. Secondly, we present the uComp Protégé plugin, which facilitates the integration of such typical crowdsourcing tasks into ontology engineering work from within the Protégé ontology editing environment. An evaluation of the plugin in a typical ontology engineering scenario, where ontologies are built from automatically learned semantic structures, shows that its use reduces ontology engineers' working times by a factor of 11, lowers the overall task costs by 40% to 83% depending on the crowdsourcing settings used, and leads to data quality comparable with that of tasks performed by ontology engineers.

Florian Hanika, Gerhard Wohlgenannt, Marta Sabou
Futures Studies Methods for Knowledge Management in Academic Research

The management of academic knowledge is a relatively young area of attention. Higher education entities accumulate a great deal of knowledge, and the management of this asset is more crucial than ever for strategic alignment. Hence, this paper aims to show that knowledge management in academic research should work hand in hand with futures studies to develop and foster a strategic orientation. For this purpose, the knowledge management model by Probst et al. (1998), with its eight building blocks, serves as a framework. The focus of this paper lies on the processes of knowledge goals and knowledge identification, and it is suggested that the futures studies methods of monitoring, scenario technique and forecasting are suitable to complement knowledge management methods within academic research, due to their ability to identify and concentrate information and knowledge relevant to the future.

Sabine Kadlubek, Stella Schulte-Cörne, Florian Welter, Anja Richert, Sabina Jeschke
Adaptive Concept Vector Space Representation Using Markov Chain Model

This paper proposes an adaptive document representation (concept vector space model) using a Markov chain model. The vector space representation is one of the most common models for representing documents in the classification process. In ontology-based classification, a document is represented as a vector whose components are ontology concepts and their relevance, where relevance is represented by the frequency of the concepts' occurrence. These concepts make various contributions to the classification process. The contributions depend on the position of the concepts in the ontology hierarchy: classes, subclasses and instances may have different values representing the concepts' importance. The weights defining concepts' importance are generally selected by empirical analysis and are usually kept fixed, which makes the approach less effective and time-consuming. We therefore propose a new model to automatically estimate the weights of concepts within the ontology. This model first maps the ontology to a Markov chain and then calculates the transition probability matrix for this Markov chain. The transition probability matrix is then used to compute the steady-state probabilities based on left eigenvectors. Finally, the importance is calculated for each ontology concept, and an enhanced concept vector space representation is created from the concepts' importance and relevance. The concept vector space representation can be adapted for new ontology concepts.
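The steady-state computation at the heart of this model can be sketched in a few lines of Python with NumPy; the transition matrix below is invented for illustration:

import numpy as np

# Treat the ontology hierarchy as a Markov chain and take the stationary
# distribution (left eigenvector of the transition matrix for eigenvalue 1)
# as concept-importance weights. Tiny invented chain over 3 concepts.

P = np.array([          # transition probabilities; rows sum to 1
    [0.1, 0.6, 0.3],
    [0.4, 0.2, 0.4],
    [0.5, 0.3, 0.2],
])

eigvals, eigvecs = np.linalg.eig(P.T)        # left eigenvectors of P
i = np.argmin(np.abs(eigvals - 1.0))         # eigenvalue closest to 1
pi = np.real(eigvecs[:, i])
pi = pi / pi.sum()                           # normalise to a distribution

print("concept importance weights:", pi)     # satisfies pi @ P == pi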

Zenun Kastrati, Ali Shariq Imran
A Core Ontology of Macroscopic Stuff

Domain ontologies contain representations of types of stuff (matter, mass, or substance), such as milk, alcohol, and mud, which are represented in a myriad of ways that are neither compatible with each other nor follow a structured approach within the domain ontology. Foundational ontologies and Ontology distinguish between pure stuff and mixtures only, if they include stuff at all. We aim to fill this gap between foundational and domain ontologies by applying the notion of a 'bridging' core ontology: an ontology of categories of stuff formalised in OWL. This core ontology both refines the DOLCE and BFO foundational ontologies and resolves the main types of interoperability issues with stuffs in domain ontologies, thereby also contributing to better ontology quality. Modelling guidelines are provided to facilitate the Stuff Ontology's use.

C. Maria Keet
Feasibility of Automated Foundational Ontology Interchangeability

While a foundational ontology can solve interoperability issues among the domain ontologies aligned to it, multiple foundational ontologies have been developed. Thus, there are still interoperability issues among domain ontologies aligned to different foundational ontologies. Questions arise about the feasibility of linking one's ontology to multiple foundational ontologies to increase its potential for uptake. To answer this, we have developed the tool SUGOI, Software Used to Gain Ontology Interchangeability, which allows a user to automatically interchange a domain ontology among the DOLCE, BFO and GFO foundational ontologies. The success of swapping based on equivalence varies by source ontology, ranging from 2% to 82% and averaging 36% for the ontologies included in the evaluation. This is due to differences in coverage, notably DOLCE's qualities and BFO's and GFO's roles, and in the number of mappings. SUGOI therefore also uses subsumption mappings so that every domain ontology can be interchanged; this preserves the structure of the ontology and increases its potential for usability.

Zubeida Casmod Khan, C. Maria Keet
Automating Cross-Disciplinary Defect Detection in Multi-disciplinary Engineering Environments

Multi-disciplinary engineering (ME) projects are conducted in complex heterogeneous environments, where participants originating from different disciplines, e.g., mechanical, electrical, and software engineering, collaborate to satisfy project and product quality as well as time constraints. Detecting defects across discipline boundaries early and efficiently in the engineering process is a challenging task due to heterogeneous data sources. In this paper we explore how Semantic Web technologies can address this challenge and present the Ontology-based Cross-Disciplinary Defect Detection (OCDD) approach, which supports automated cross-disciplinary defect detection in ME environments while allowing engineers to keep their well-known tools, data models, and customary engineering workflows. We evaluate the approach in a case study at an industry partner, a large-scale industrial automation software provider, and report on our experiences and lessons learned. A major result was that the OCDD approach was found useful in the evaluation context and more efficient than manual defect detection when cross-disciplinary defects had to be handled.

Olga Kovalenko, Estefanía Serral, Marta Sabou, Fajar J. Ekaputra, Dietmar Winkler, Stefan Biffl
Querying the Global Cube: Integration of Multidimensional Datasets from the Web

National statistical indicators such as the Gross Domestic Product per Capita are published on the Web by various organisations such as Eurostat, the World Bank and the International Monetary Fund. Uniform access to such statistics will allow for elaborate analysis and visualisations. Though many datasets are also available as Linked Data, heterogeneities remain since publishers use several identifiers for common dimensions and differing levels of detail, units, and formulas. For queries over the Global Cube, i.e., the integration of available datasets modelled in the RDF Data Cube Vocabulary, we extend the well-known Drill-Across operation over data cubes to consider implicit overlaps between datasets in Linked Data. To evaluate more complex mappings we define the Convert-Cube operation over values from a single dataset. We generalise the two operations for arbitrary combinations of multiple datasets with the Merge-Cubes operation and show the feasibility of the analytical operations for integrating government statistics.
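Conceptually, Drill-Across joins observations from two cubes on their shared dimensions. A toy Python sketch of that operation, with invented figures:

# Drill-Across sketch: observations sharing the same values on common
# dimensions (here country and year) are joined into one multi-measure
# table. Figures are invented for illustration.

gdp = {("SE", 2013): 60_000, ("DE", 2013): 46_000}      # indicator A
unemp = {("SE", 2013): 8.0, ("DE", 2013): 5.2}          # indicator B

def drill_across(cube_a, cube_b):
    """Join two cubes on their shared dimension tuples."""
    return {dims: (cube_a[dims], cube_b[dims])
            for dims in cube_a.keys() & cube_b.keys()}

for (country, year), (gdp_pc, rate) in sorted(drill_across(gdp, unemp).items()):
    print(country, year, "GDP/capita:", gdp_pc, "unemployment %:", rate)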

Benedikt Kämpgen, Steffen Stadtmüller, Andreas Harth
VOWL 2: User-Oriented Visualization of Ontologies

Ontologies become increasingly important as a means to structure and organize information. This requires methods and tools that enable not only ontology experts but also other user groups to work with ontologies and related data. We have developed VOWL, a comprehensive and well-specified visual language for the user-oriented representation of ontologies, and conducted a comparative study on an initial version of VOWL. Based upon results from that study, as well as an extensive review of other ontology visualizations, we have reworked many parts of VOWL. In this paper, we present the new version VOWL 2 and describe how the initial definitions were used to systematically redefine the visual notation. Besides the novelties of the visual language, which is based on a well-defined set of graphical primitives and an abstract color scheme, we briefly describe two implementations of VOWL 2. To gather some insight into the user experience with the new version of VOWL, we have conducted a qualitative user study. We report on the study and its results, which confirmed that not only the general ideas of VOWL but also most of our enhancements for VOWL 2 can be well understood by casual ontology users.

Steffen Lohmann, Stefan Negru, Florian Haag, Thomas Ertl
What Is Linked Historical Data?

Datasets that represent historical sources are relative newcomers in the Linked Open Data (LOD) cloud. Following the standard LOD practices for publishing historical sources raises several questions: how can we distinguish between RDF graphs of primary and secondary sources? Should we treat archived and online RDF graphs differently in historical research? How do we deal with change and immutability of a triplified History? To answer these fundamental questions, we model historical primary and secondary sources using the OntoClean metaproperties and the theories of perdurance and endurance. We then use this model to give a definition of Linked Historical Data. We advocate a set of publishing practices for Linked Historical Data that preserve the ontological properties of historical sources.

Albert Meroño-Peñuela, Rinke Hoekstra
A Quality Assurance Workflow for Ontologies Based on Semantic Regularities

Syntactic regularities, or syntactic patterns, are sets of axioms in an OWL ontology with a regular structure. Detecting these patterns and reporting them in human-readable form helps in understanding the authoring style of an ontology and is therefore useful in itself. However, pattern detection is sensitive to syntactic variations in the assertions; axioms that are semantically equivalent but syntactically different can reduce the effectiveness of the technique. Semantic regularity analysis focuses on the knowledge encoded in the ontology rather than on how it is spelled out, which is the focus of syntactic regularity analysis. Cluster analysis of the information provided by an OWL DL reasoner mitigates this sensitivity, providing measurable benefits over purely syntactic patterns, an example being patterns that are instantiated only in the entailments of an ontology. In this paper, we demonstrate, using SNOMED-CT, how the detection of semantic regularities in entailed axioms can be used in ontology quality assurance, in combination with lexical techniques. We also show how the detection of irregularities, i.e., deviations from a pattern, is useful for the same purpose. We evaluate and discuss the results of performing a semantic pattern inspection and compare them against existing work on syntactic regularity detection. Systematic extraction of lexical, syntactic and semantic patterns is used, and a quality assurance workflow that combines these patterns is presented.

Eleni Mikroyannidi, Manuel Quesada-Martínez, Dmitry Tsarkov, Jesualdo Tomás Fernández Breis, Robert Stevens, Ignazio Palmisano
Adaptive Knowledge Propagation in Web Ontologies

The increasing availability of structured machine-processable knowledge in the Web of Data calls for machine learning methods to support standard reasoning-based services (such as query answering and logic inference). Statistical regularities can be efficiently exploited to overcome the limitations of the inherently incomplete knowledge bases distributed across the Web. This paper focuses on the problem of predicting missing class-memberships and property values of individual resources in Web ontologies. We propose a transductive inference method for inferring missing properties about individuals: given a class-membership/property value learning problem, we address the task of identifying relations which are likely to link similar individuals, and efficiently propagating knowledge across such (possibly diverse) relations. Our experimental evaluation demonstrates the effectiveness of the proposed method.
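A minimal sketch of transductive label propagation of the general kind the abstract describes (the paper additionally learns how strongly each relation should propagate); the adjacency and labels below are toy values:

import numpy as np

# Transductive label propagation: class-membership scores of labelled
# individuals spread across a relation graph to unlabelled ones.

A = np.array([          # symmetric adjacency over 4 individuals
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
], dtype=float)
y = np.array([1.0, -1.0, 0.0, 0.0])    # +1/-1 known, 0 = unknown
known = np.array([True, True, False, False])

D_inv = np.diag(1.0 / A.sum(axis=1))
W = D_inv @ A                          # row-normalised propagation matrix

f = y.copy()
for _ in range(50):                    # iterate to (near) convergence
    f = 0.8 * (W @ f) + 0.2 * y        # propagate, keep pull toward labels
    f[known] = y[known]                # clamp the known individuals

print("predicted memberships:", np.sign(f))   # [ 1. -1.  1. -1.]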

Pasquale Minervini, Claudia d’Amato, Nicola Fanizzi, Floriana Esposito
Using Event Spaces, Setting and Theme to Assist the Interpretation and Development of Museum Stories

Stories are used to provide a context for museum objects, for example linking those objects to what they depict or the historical context in which they were created. Many explicit and implicit relationships exist between the people, places and things mentioned in a story and the museum objects with which they are associated. We describe an interface for authoring stories about museum objects in which textual stories can be associated with semantic annotations and media elements. A recommender component provides additional context as to how the story annotations are related directly or via other concepts not mentioned in the story.

The approach involves generating a concept space for different types of story annotation such as artists and museum objects. The concept space is predominantly made up of a set of events, forming an event space. The concept spaces of all story annotations can be combined into a single view. Narrative notions of setting and theme are used to reason over the concept space, identifying key concepts and time-location pairs, and their relationship to the rest of the story. Story setting and theme can then be used by the reader or author to assist in interpretation or further evolution of the story.

Paul Mulholland, Annika Wolff, Eoin Kilfeather, Evin McCarthy
Functional-Logic Programming for Web Knowledge Representation, Sharing and Querying

We propose a unified approach to semantically rich knowledge representation, querying and exchange for the Web, based on functional-logic programming. JavaScript- and JSON-based so-called information scripts serve as a unified knowledge representation and query format, with logical reasoning being a constraint solving or narrowing task. This way, our framework provides a highly versatile, easy to use and radically different alternative compared to conventional forms of knowledge representation and exchange for the Web.

Matthias Nickles
Inferring Semantic Relations by User Feedback

In the last ten years, ontology-based recommender systems have been shown to be effective tools for predicting user preferences and suggesting items. There are, however, some issues associated with the ontologies adopted by these approaches: 1) their crafting is not a cheap process, being time-consuming and calling for specialist expertise; 2) they may not accurately represent the viewpoint of the targeted user community; 3) they tend to provide rather static models, which fail to keep track of evolving user perspectives. To address these issues, we propose Klink UM, an approach for extracting emergent semantics from user feedback, with the aim of tailoring the ontology to the users and improving recommendation accuracy. Klink UM uses statistical and machine learning techniques for finding hierarchical and similarity relationships between keywords associated with rated items, and can be used for: 1) building a conceptual taxonomy from scratch; 2) enriching and correcting an existing ontology; 3) providing a numerical estimate of the intensity of semantic relationships according to the users. The evaluation shows that Klink UM performs well with respect to handcrafted ontologies and can significantly increase the accuracy of suggestions in content-based recommender systems.
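One classic statistical signal for inferring hierarchical relations between keywords is asymmetric co-occurrence; the sketch below shows that generic subsumption heuristic (not necessarily Klink UM's exact formula) on invented data:

# If most items tagged "neural networks" are also tagged "machine learning"
# but not vice versa, the latter is a plausible broader term.

items = {
    "paper1": {"machine learning", "neural networks"},
    "paper2": {"machine learning", "neural networks"},
    "paper3": {"machine learning", "svm"},
    "paper4": {"machine learning"},
}

def cooccurrence(a, b):
    """P(a | b): fraction of items tagged b that are also tagged a."""
    with_b = [tags for tags in items.values() if b in tags]
    return sum(a in tags for tags in with_b) / len(with_b)

a, b = "machine learning", "neural networks"
if cooccurrence(a, b) > 0.8 and cooccurrence(b, a) < 0.8:
    print(f"'{a}' is likely broader than '{b}'")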

Francesco Osborne, Enrico Motta
A Hybrid Semantic Approach to Building Dynamic Maps of Research Communities

In earlier papers we characterised the notion of diachronic topic-based communities, i.e., communities of people who work on semantically related topics at the same time. These communities are important to enable topic-centred analyses of the dynamics of the research world. In this paper we present an innovative algorithm, called Research Communities Map Builder (RCMB), which is able to automatically link diachronic topic-based communities over subsequent time intervals to identify significant events. These include topic shifts within a research community; the appearance and fading of a community; communities splitting, merging, spawning other communities; and others. The output of our algorithm is a map of research communities, annotated with the detected events, which provides a concise visual representation of the dynamics of a research area. In contrast with existing approaches, RCMB enables a much more fine-grained understanding of the evolution of research communities, with respect to both the granularity of the events and the granularity of the topics. This improved understanding can, for example, inform the research strategies of funders and researchers alike. We illustrate our approach with two case studies, highlighting the main communities and events that characterized the World Wide Web and Semantic Web areas in the 2000–2010 decade.

Francesco Osborne, Giuseppe Scavo, Enrico Motta
Logical Detection of Invalid SameAs Statements in RDF Data

In recent years, thanks to the standardization of Semantic Web technologies, we have experienced an unprecedented production of data, published online as Linked Data. In this context, when a typed link is instantiated between two different resources referring to the same real-world entity, the use of owl:sameAs is generally predominant. However, recent research discussions have shown issues in the use of owl:sameAs. Problems arise both when sameAs is erroneously discovered automatically by a data linking tool, and when users declare it while meaning something less 'strict' than the semantics defined by OWL. In this work, we discuss this issue further and present a method for logically detecting invalid sameAs statements under specific circumstances. We report our experimental results, performed on OAEI datasets, to show that the approach is promising.

Laura Papaleo, Nathalie Pernelle, Fatiha Saïs, Cyril Dumont
Integrating Know-How into the Linked Data Cloud

This paper presents the first framework for integrating procedural knowledge, or “know-how”, into the Linked Data Cloud. Know-how available on the Web, such as step-by-step instructions, is largely unstructured and isolated from other sources of online knowledge. To overcome these limitations, we propose extending to procedural knowledge the benefits that Linked Data has already brought to representing, retrieving and reusing declarative knowledge. We describe a framework for representing generic know-how as Linked Data and for automatically acquiring this representation from existing resources on the Web. This system also allows the automatic generation of links between different know-how resources, and between those resources and other online knowledge bases, such as DBpedia. We discuss the results of applying this framework to a real-world scenario and we show how it outperforms existing manual community-driven integration efforts.

Paolo Pareti, Benoit Testu, Ryutaro Ichise, Ewan Klein, Adam Barker
A Dialectical Approach to Selectively Reusing Ontological Correspondences

Effective communication between autonomous knowledge systems is dependent on the correct interpretation of exchanged messages, based on the entities (or vocabulary) within the messages and their ontological definitions. However, as such systems cannot be assumed to share the same ontologies, a mechanism for autonomously determining a mutually acceptable alignment between the ontologies is required. Furthermore, the ontologies themselves may be confidential or commercially sensitive, and thus neither system may be willing to expose its full ontology to other parties (this may be pertinent as the transaction may relate to only part, and not all, of the ontology). In this paper, we present a novel inquiry dialogue that allows such autonomous systems, or agents, to assert, counter, accept and reject correspondences. It assumes that agents have acquired a variety of correspondences from past encounters, or from publicly available alignment systems, and that such knowledge is asymmetric and incomplete (i.e. not all agents may be aware of some correspondences, and their associated utility can vary greatly). By strategically selecting the order in which correspondences are disclosed, the two agents can jointly construct a bespoke alignment whilst minimising the disclosure of private knowledge. We show how partial alignments, garnered from different alignment systems, can be reused and aggregated through our dialectical approach, and illustrate how solutions to the Stable Marriage problem can be used to eliminate ambiguities (i.e. when an entity in one ontology is mapped to several other entities in another ontology). We empirically evaluate the performance of the resulting alignment compared to the use of randomly selected alignment systems, and show how, by adopting a sceptical mentalistic attitude, an agent can further reduce the necessary disclosure of ontological knowledge.

Terry R. Payne, Valentina Tamma
Uncovering the Semantics of Wikipedia Pagelinks

Wikipedia pagelinks, i.e. links between Wikipages, carry an intended semantics: they indicate the existence of a factual relation between the DBpedia entity referenced by the source Wikipage and the DBpedia entity referenced by the target Wikipage of the link. These relations are represented in DBpedia as occurrences of the generic "wikiPageWikilink" property. We designed and implemented a novel method to uncover the intended semantics of pagelinks and to represent them as semantic relations. In this paper, we test our method on a subset of Wikipedia, showing its potential impact for DBpedia enrichment.

Valentina Presutti, Sergio Consoli, Andrea Giovanni Nuzzolese, Diego Reforgiato Recupero, Aldo Gangemi, Ines Bannour, Haïfa Zargayouna
Closed-World Concept Induction for Learning in OWL Knowledge Bases

We present a general-purpose method for inducing OWL class descriptions over data and knowledge captured with RDF and OWL in a closed-world way. We combine our approach with a top-down refinement-based search with Description Logic (DL) expressions which incorporates OWL background knowledge. Our methods are designed for speed and scalability to support analysis tasks like data mining over large knowledge-rich data sets. We compare our methods to a state-of-the-art DL learning tool with respect to a large benchmark problem to demonstrate the speed and effectiveness of our approach.

David Ratcliffe, Kerry Taylor
YASGUI: Feeling the Pulse of Linked Data

Existing studies of Linked Data focus on the availability of data rather than its use in practice. The query logs that are available cover only a small number of datasets. This paper proposes to track Linked Data usage at the client side. We use YASGUI, a feature-rich web-based query editor, as a measuring device for interactions with the Linked Data Cloud. It enables us to determine what part of the Linked Data Cloud is actually used, what part is open or closed, the efficiency and complexity of queries, and how these results relate to commonly used dataset statistics.

Laurens Rietveld, Rinke Hoekstra
Tackling the Class-Imbalance Learning Problem in Semantic Web Knowledge Bases

In the Semantic Web context, procedures for deciding the class-membership of an individual to a target concept in a knowledge base are generally based on automated reasoning. However, cases of incompleteness/inconsistency are frequent, owing to the distributed, heterogeneous nature and the Web-scale dimension of the knowledge bases. It has been shown that resorting to models induced from the data may offer comparably effective and efficient solutions for these cases, although skewness in the instance distribution may affect the quality of such models. This is known as the class-imbalance problem. We propose a machine learning approach, based on the induction of Terminological Random Forests, an extension of the notion of Random Forest that copes with this problem for knowledge bases expressed in the standard Web ontology languages. Experimentally, we show the feasibility of our approach and its effectiveness w.r.t. related methods, especially on imbalanced datasets.

Giuseppe Rizzo, Claudia d’Amato, Nicola Fanizzi, Floriana Esposito
On the Collaborative Development of Application Ontologies: A Practical Case Study with a SME

With semantic technologies coming of age, ontology-based applications are becoming more prevalent. These applications exploit the content encoded in ontologies to perform different tasks and operations. The development of ontologies to be used by a specific application presents some peculiarities compared to the modelling process of other types of ontologies. These peculiarities are related to the choice of the ontology metamodel, which should be optimised for the application, and the possibility of an indirect evaluation of the ontology by running the application. In this paper we report the experience of collaboratively building an ontology for an application that supports the development of Individual Educational Plans (IEPs) for pupils with special needs. This application is a commercial product of a Small-Medium Enterprise (SME). The ontology is the result of a year-long modelling effort that involved more than a dozen users with different expertise and competences, such as educationalists, psychologists, teachers, knowledge engineers, and application engineers. Besides describing the modelling process and tool, we report the lessons learned in collaboratively modelling an application ontology in a very concrete case. We believe our experience is worth reporting, as our findings and lessons learned may be beneficial for similar modelling initiatives regarding the development of application ontologies.

Marco Rospocher, Elena Cardillo, Ivan Donadello, Luciano Serafini
Relationship-Based Top-K Concept Retrieval for Ontology Search

With the recent growth of Linked Data on the Web there is an increased need for knowledge engineers to find ontologies to describe their data. Only limited work exists that addresses the problem of searching and ranking ontologies based on a given query term. In this paper we introduce DWRank, a two-staged bi-directional graph walk ranking algorithm for concepts in ontologies. We apply this algorithm on the task of searching and ranking concepts in ontologies and compare it with state-of-the-art ontology ranking models and traditional information retrieval algorithms such as PageRank and tf-idf. Our evaluation shows that DWRank significantly outperforms the best ranking models on a benchmark ontology collection for the majority of the sample queries defined in the benchmark.

Anila Sahar Butt, Armin Haller, Lexing Xie
A Knowledge Driven Approach towards the Validation of Externally Acquired Traceability Datasets in Supply Chain Business Processes

The sharing of near real-time traceability knowledge in supply chains plays a central role in coordinating business operations and is a key driver for their success. However, before traceability datasets received from external partners can be integrated with datasets generated internally within an organisation, they need to be validated against information recorded for the physical goods received, as well as against bespoke rules defined to ensure uniformity, consistency and completeness within the supply chain. In this paper, we present a knowledge-driven framework for the runtime validation of critical constraints on incoming traceability datasets encapsulated as EPCIS event-based linked pedigrees. Our constraints are defined using SPARQL queries and SPIN rules. We present a novel validation architecture based on the integration of the Apache Storm framework for real-time, distributed computation with popular Semantic Web/Linked Data libraries, and exemplify our methodology on an abstraction of the pharmaceutical supply chain.
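A minimal sketch of rule-based validation with a SPARQL ASK constraint using rdflib; the vocabulary below is a made-up stand-in, not the real EPCIS ontology or the paper's SPIN rules:

from rdflib import Graph, Literal, Namespace, RDF

# Validate an incoming traceability dataset against a constraint expressed
# as a SPARQL ASK query. All URIs and data are invented for illustration.

EX = Namespace("http://example.org/trace#")
g = Graph()
g.add((EX.event1, RDF.type, EX.ShippingEvent))
g.add((EX.event1, EX.epc, Literal("urn:epc:id:sgtin:0614141.107346.2017")))
# EX.event1 deliberately lacks an EX.recordTime -> constraint violation.

ASK_MISSING_TIME = """
ASK {
    ?e a <http://example.org/trace#ShippingEvent> .
    FILTER NOT EXISTS { ?e <http://example.org/trace#recordTime> ?t }
}
"""

violated = bool(g.query(ASK_MISSING_TIME).askAnswer)
print("constraint violated:", violated)   # True: event has no record time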

Monika Solanki, Christopher Brewster
Testing OWL Axioms against RDF Facts: A Possibilistic Approach

Automatic knowledge base enrichment methods rely critically on candidate axiom scoring. The most popular scoring heuristics proposed in the literature are based on statistical inference. We argue that such a probability-based framework is not always completely satisfactory, and propose a novel, alternative scoring heuristic expressed in terms of possibility theory, whereby a candidate axiom receives a bipolar score consisting of a degree of possibility and a degree of necessity. We evaluate our proposal by applying it to the problem of testing SubClassOf axioms against the DBpedia RDF dataset.
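A simplified sketch of bipolar possibilistic scoring (the formulas here are illustrative; the paper's exact definitions of possibility and necessity differ in detail):

# For a candidate axiom SubClassOf(A, B): counterexamples (instances of A
# provably not in B) lower the possibility; confirmations (instances of A
# also in B) raise the necessity, but only while no counterexample exists.

def axiom_score(n_support, n_confirm, n_counter):
    """Return a bipolar (possibility, necessity) score for a candidate axiom."""
    if n_support == 0:
        return 1.0, 0.0        # untested: fully possible, not at all necessary
    possibility = 1.0 - n_counter / n_support       # counterexamples refute
    necessity = n_confirm / n_support if n_counter == 0 else 0.0
    return possibility, necessity

instances_of_A = {"x1", "x2", "x3", "x4"}   # e.g. instances of dbo:City
instances_of_B = {"x1", "x2", "x3"}         # instances of dbo:PopulatedPlace
provably_not_B = {"x4"}                     # e.g. member of a disjoint class

print(axiom_score(len(instances_of_A),
                  len(instances_of_A & instances_of_B),
                  len(instances_of_A & provably_not_B)))   # (0.75, 0.0)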

Andrea G. B. Tettamanzi, Catherine Faron-Zucker, Fabien Gandon
Quantifying the Bias in Data Links

The main idea behind Linked Data is to connect data from different sources together, in order to develop a hub of shared and publicly accessible knowledge. While the benefit of sharing knowledge is universally recognised, what is less visible is how much results can be affected when the knowledge in one dataset and in the connected ones is not equally distributed. This lack of balance in information, or bias, generally assumed a priori, can actually be quantified to improve the quality of the results of applications and analytics relying on such linked data. In this paper, we propose a process to measure how much bias one dataset contains when compared to another, by identifying the most affected RDF properties and values within the set of entities that those datasets have in common (defined as the linkset). This process was run on a wide range of linksets from Linked Data, and in the experiment section we present the results as well as measures of its performance.
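The core comparison can be sketched as measuring the divergence between the value distributions of a property on each side of a linkset. A toy Python version using total variation distance (the paper's measures and process are more elaborate):

from collections import Counter

# For entities two datasets have in common (the linkset), compare how the
# values of an RDF property are distributed on each side; a large divergence
# flags the property as biased. Data and measure are simplified.

def distribution(values):
    counts = Counter(values)
    total = sum(counts.values())
    return {v: c / total for v, c in counts.items()}

def total_variation(p, q):
    """Half the L1 distance between two discrete distributions (0..1)."""
    keys = p.keys() | q.keys()
    return 0.5 * sum(abs(p.get(k, 0) - q.get(k, 0)) for k in keys)

# Invented property values for the *same* linked entities in two datasets.
dataset_a = ["rock", "rock", "jazz", "pop"]
dataset_b = ["rock", "jazz", "jazz", "jazz"]

bias = total_variation(distribution(dataset_a), distribution(dataset_b))
print(f"bias for property 'genre': {bias:.2f}")   # 0.0 = same, 1.0 = disjoint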

Ilaria Tiddi, Mathieu d’Aquin, Enrico Motta
Using Neural Networks to Aggregate Linked Data Rules

Two typical problems are encountered after obtaining a set of rules from a data mining process: (i) their number can be extremely large and (ii) not all of them are interesting to be considered. Both manual and automatic strategies trying to overcome those problems have to deal with technical issues such as time costs and computational complexity. This work is an attempt to address the quantity and quality issues through using a Neural Network model for predicting the quality of Linked Data rules. Our motivation comes from our previous work, in which we obtained large sets of atomic rules through an inductive logic inspired process traversing Linked Data. Assuming a limited amount of resources, and therefore the impossibility of trying every possible combination to obtain a better rule representing a subset of items, the major issue becomes detecting the combinations that will produce the best rule in the shortest time. Therefore, we propose to use a Neural Network to learn directly from the rules how to recognise a promising aggregation. Our experiments show that including a Neural Network-based prediction model in a rule aggregation process significantly reduces the amount of resources (time and space) required to produce high-quality rules.

Ilaria Tiddi, Mathieu d’Aquin, Enrico Motta
Temporal Semantics: Time-Varying Hashtag Sense Clustering

Hashtags are creative labels used in micro-blogs to characterize the topic of a message/discussion. However, since hashtags are created in a spontaneous and highly dynamic way by users using multiple languages, the same topic can be associated with different hashtags and, conversely, the same hashtag may imply different topics in different time spans. Contrary to common words, sense clustering for hashtags is complicated by the fact that no sense catalogues are available, such as Wikipedia or WordNet, and, furthermore, hashtag labels are often obscure. In this paper we propose a sense clustering algorithm based on temporal mining. First, hashtag time series are converted into strings of symbols using Symbolic Aggregate ApproXimation (SAX); then, hashtags are clustered based on string similarity and temporal co-occurrence. Evaluation is performed on two reference datasets of semantically tagged hashtags. We also perform a complexity evaluation of our algorithm, since efficiency is a crucial performance factor when processing large-scale data streams, such as Twitter.
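A minimal SAX sketch showing the conversion step the abstract describes: z-normalisation, piecewise aggregate approximation, then symbol assignment via breakpoints of the standard normal distribution (the series are invented):

import numpy as np

# Symbolic Aggregate approXimation: turn a time series into a short string
# so that similarly-behaving hashtags map to similar strings.

BREAKPOINTS = [-0.43, 0.43]        # 3-symbol alphabet: a | b | c

def sax(series, n_segments=4, alphabet="abc"):
    x = np.asarray(series, dtype=float)
    x = (x - x.mean()) / x.std()                   # z-normalise
    # series length must divide evenly into the number of segments
    paa = x.reshape(n_segments, -1).mean(axis=1)   # piecewise aggregate
    return "".join(alphabet[np.searchsorted(BREAKPOINTS, v)] for v in paa)

# Two hashtags with similar temporal behaviour map to the same string.
print(sax([1, 2, 1, 2, 9, 8, 9, 8]))   # 'aacc'
print(sax([0, 1, 0, 1, 7, 8, 7, 7]))   # 'aacc'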

Giovanni Stilo, Paola Velardi
Using Ontologies: Understanding the User Experience

Drawing on 118 responses to a survey of ontology use, this paper describes the experiences of those who create and use ontologies. Responses to questions about language and tool use illustrate the dominant position of OWL and provide information about the OWL profiles and particular Description Logic features used. The paper suggests that further research is required into the difficulties experienced with OWL constructs, and with modelling in OWL. The survey also reports on the use of ontology visualization software, finding that the importance of visualization to ontology users varies considerably. This is also an area which requires further investigation. The use of ontology patterns is examined, drawing on further input from a follow-up study devoted exclusively to this topic. Evidence suggests that pattern creation and use are frequently informal processes and there is a need for improved tools. A classification of ontology users into four groups is suggested. It is proposed that the categorisation of users and user behaviour should be taken into account when designing ontology tools and methodologies. This should enable rigorous, user-specific use cases.

Paul Warren, Paul Mulholland, Trevor Collins, Enrico Motta
A Conceptual Model for Detecting Interactions among Medical Recommendations in Clinical Guidelines: A Case Study on Multimorbidity

Representation of clinical knowledge is still an open research topic. In particular, classical languages designed for representing clinical guidelines, which were meant for producing diagnostic and treatment plans, present limitations for re-using, combining, and reasoning over existing knowledge. In this paper, we address these limitations by proposing an extension of the TMR conceptual model for representing clinical guidelines, which allows knowledge from several guidelines to be re-used and combined so that it can be applied to patients with multimorbidities. We provide means to (semi-)automatically detect interactions among recommendations that require attention from experts, such as recommending the same drug more than once. We evaluate the model by applying it to a realistic case study involving three diseases (osteoarthritis, hypertension and diabetes) and compare the results with two other existing methods.

Veruska Zamborlini, Rinke Hoekstra, Marcos da Silveira, Cédric Pruski, Annette ten Teije, Frank van Harmelen
Learning with Partial Data for Semantic Table Interpretation

This work studies methods of annotating Web tables for semantic indexing and search: labeling table columns with semantic type information and linking content cells with named entities. Built on a state-of-the-art method, the focus is placed on developing and evaluating methods able to achieve these goals with partial content sampled from the table, as opposed to using the entire table content as typical state-of-the-art methods do. The method starts by annotating table columns using a sample automatically selected based on the data in the table, then uses the type information to guide content cell disambiguation. Different methods of sample selection are introduced, and experiments show that they contribute to higher accuracy in cell disambiguation and comparable accuracy in column type annotation, with reduced computational overhead.

Ziqi Zhang
Backmatter
Metadata
Title
Knowledge Engineering and Knowledge Management
Edited by
Krzysztof Janowicz
Stefan Schlobach
Patrick Lambrix
Eero Hyvönen
Copyright year
2014
Publisher
Springer International Publishing
Electronic ISBN
978-3-319-13704-9
Print ISBN
978-3-319-13703-2
DOI
https://doi.org/10.1007/978-3-319-13704-9