Table of Contents


Prior Conceptual Knowledge in Machine Learning and Knowledge Discovery


On Ontologies as Prior Conceptual Knowledge in Inductive Logic Programming

In this paper we consider the problem of having ontologies as prior conceptual knowledge in Inductive Logic Programming (ILP). In particular, we take a critical look at three ILP proposals based on knowledge representation frameworks that integrate Description Logics and Horn Clausal Logic. From the comparative analysis of the three, we draw general conclusions that can be considered as guidelines for an upcoming Onto-Relational Learning aimed at extending Relational Learning to account for ontologies.
Francesca A. Lisi, Floriana Esposito

A Knowledge-Intensive Approach for Semi-automatic Causal Subgroup Discovery

This paper presents a methodological view on knowledge-intensive causal subgroup discovery implemented in a semi-automatic approach. We show how to identify causal relations between subgroups by generating an extended causal subgroup network utilizing background knowledge. Using the links within the network we can identify causal relations, but also relations that are potentially confounded and/or effect-modified by external (confounding) factors. In a semi-automatic approach, the network and the discovered relations are presented to the user as an intuitive visualization. The applicability and benefit of the presented technique are illustrated by examples from a case study in the medical domain.
Martin Atzmueller, Frank Puppe

A Study of the SEMINTEC Approach to Frequent Pattern Mining

This paper contains an experimental investigation of an approach, named SEMINTEC, to frequent pattern mining in combined knowledge bases represented in description logic with rules (so-called \(\mathcal{DL}\)-safe ones). Frequent patterns in this approach are conjunctive queries to a combined knowledge base. In this paper, we first prove that the approach introduced in our previous work for the DLP fragment of the description logic family of languages is also valid for more expressive languages. Next, we present the experimental results under different settings of the approach, and on knowledge bases of different sizes and complexities.
Joanna Józefowska, Agnieszka Ławrynowicz, Tomasz Łukaszewski

Partitional Conceptual Clustering of Web Resources Annotated with Ontology Languages

The paper deals with the problem of cluster discovery in the context of Semantic Web knowledge bases. A partitional clustering algorithm is presented. It is applied to group resources contained in knowledge bases and expressed in the standard ontology languages. The method exploits a language-independent semi-distance measure for individuals that is based on the semantics of the resources w.r.t. a context represented by a set of concept descriptions (discriminating features). The clustering algorithm adapts the Bisecting k-Means method to work with medoids. In addition, we propose simple mechanisms to assign each cluster an intensional definition that may suggest new concepts for the knowledge base (vivification). A final experiment demonstrates the validity of the approach through absolute quality indices for clustering results.
Floriana Esposito, Nicola Fanizzi, Claudia d’Amato

The Ex Project: Web Information Extraction Using Extraction Ontologies

Extraction ontologies represent a novel paradigm in web information extraction (as one of the ‘deductive’ species of web mining), allowing one to proceed swiftly from initial domain modelling to running a functional prototype, without the necessity of collecting and labelling large amounts of training examples. The bottlenecks of this approach, however, are the tedium of developing an extraction ontology that adequately covers the semantic scope of the web data to be processed and the difficulty of combining the ontology-based approach with inductive or wrapper-based approaches. We report on an ongoing project aiming at developing a web information extraction tool based on richly-structured extraction ontologies, with the additional possibility of (1) semi-automatically constructing these from third-party domain ontologies, (2) absorbing the results of inductive learning for subtasks where pre-labelled data abound, and (3) actively exploiting formatting regularities in the wrapper style.
Martin Labský, Vojtěch Svátek, Marek Nekvasil, Dušan Rak

Dealing with Background Knowledge in the SEWEBAR Project

SEWEBAR is a research project whose goal is to study the possibilities of disseminating analytical reports through the Semantic Web. We are interested in analytical reports presenting the results of data mining. Each analytical report answers one analytical question. Many interesting analytical questions can be answered by GUHA procedures implemented in the LISp-Miner system. The SEWEBAR project deals with these analytical questions. However, the process of formulating and answering such analytical questions requires various kinds of background knowledge. The paper presents the first steps in storing and applying several forms of background knowledge in the SEWEBAR project. Examples dealing with medical knowledge are presented.
Jan Rauch, Milan Šimůnek

Web Mining 2.0


Item Weighting Techniques for Collaborative Filtering

Collaborative Filtering (CF) recommender systems generate rating predictions for a target user by exploiting the ratings of similar users. Therefore, the computation of user-to-user similarity is an important element in CF; it is used in the neighborhood formation and rating prediction steps. In this paper we investigate the role of item weighting techniques. An item weight provides a measure of the importance of an item for predicting the rating of another item, and it is computed as a correlation coefficient between the two items’ rating vectors. We analyze a wide range of item weighting schemas. Moreover, we introduce an item filtering approach, based on item weighting, that works by discarding the items with the smallest weights from the user-to-user similarity computation. We assume that the items with the smallest weights are the least useful for generating the prediction. We have evaluated the proposed methods using two datasets (MovieLens and Yahoo!) and identified the conditions for their best application in CF.
Linas Baltrunas, Francesco Ricci
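
The item weighting and item filtering ideas in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`pearson`, `item_weights`, `top_items`, `weighted_user_similarity`), the use of the absolute Pearson correlation as the weight, and the weighted Pearson form of the user-to-user similarity are all assumptions made for illustration, and the abstract itself covers a wider range of weighting schemas.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists; 0.0 when undefined."""
    n = len(xs)
    if n < 2:
        return 0.0
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def item_weights(ratings, target_item):
    """Weight every other item by its correlation with the target item.

    `ratings` maps user -> {item: rating}. Taking the absolute value of
    the correlation is an assumption of this sketch.
    """
    items = {i for user_ratings in ratings.values() for i in user_ratings}
    weights = {}
    for j in items - {target_item}:
        # Only users who rated both items contribute to the correlation.
        common = [u for u, r in ratings.items() if target_item in r and j in r]
        xs = [ratings[u][target_item] for u in common]
        ys = [ratings[u][j] for u in common]
        weights[j] = abs(pearson(xs, ys))
    return weights


def top_items(weights, k):
    """Item filtering: keep the k items with the largest weights."""
    ranked = sorted(weights, key=lambda i: (-weights[i], i))
    return set(ranked[:k])


def weighted_user_similarity(ratings, u, v, weights, kept_items=None):
    """Weighted Pearson similarity of users u and v over co-rated items."""
    common = [i for i in ratings[u] if i in ratings[v] and i in weights]
    if kept_items is not None:
        common = [i for i in common if i in kept_items]
    total = sum(weights[i] for i in common)
    if total == 0:
        return 0.0
    mu = sum(weights[i] * ratings[u][i] for i in common) / total
    mv = sum(weights[i] * ratings[v][i] for i in common) / total
    cov = sum(weights[i] * (ratings[u][i] - mu) * (ratings[v][i] - mv) for i in common)
    var_u = sum(weights[i] * (ratings[u][i] - mu) ** 2 for i in common)
    var_v = sum(weights[i] * (ratings[v][i] - mv) ** 2 for i in common)
    if var_u == 0 or var_v == 0:
        return 0.0
    return cov / math.sqrt(var_u * var_v)
```

In this sketch, weights are computed once per target item; passing `kept_items=top_items(weights, k)` restricts the similarity to the k highest-weighted items, which mirrors the filtering step described in the abstract.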

Using Term-Matching Algorithms for the Annotation of Geo-services

This paper presents an approach to automating semantic annotation within service-oriented architectures that provide interfaces to databases of spatial information objects. The automation of the annotation process facilitates the transition from the current state-of-the-art architectures towards semantically-enabled architectures. We see the annotation process as the task of matching an arbitrary word or term with the most appropriate concept in the domain ontology. The term matching techniques that we present are based on text mining. To determine the similarity between two terms, we first associate a set of documents (which we obtain from a Web search engine) with each term. We then transform the documents into feature vectors and thus move the similarity assessment into the feature space. After that, we compute the similarity by training a classifier to distinguish between ontology concepts. Apart from text mining approaches, we also present an alternative technique, namely Google Distance, which proves less suitable for our task. The paper also presents the results of an extensive evaluation of the presented term matching methods, which shows that these methods work best on synonymous nouns from a specific vocabulary. Furthermore, the fast and simple centroid-based classifier is shown to perform very well for this task. The main contribution of this paper is thus a term matching algorithm based on text mining and information retrieval. In addition, the presented evaluation should give a notion of how the algorithm performs in various scenarios.
Miha Grčar, Eva Klien, Blaž Novak
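
The centroid-based term matching described in the abstract above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' system: the documents are plain strings supplied by the caller (the paper obtains them from a Web search engine), features are raw term frequencies, and the names `vectorize`, `centroid`, `cosine` and `match_term` are hypothetical.

```python
import math
from collections import Counter


def vectorize(doc):
    """Bag-of-words term frequencies (an assumed, very simple feature space)."""
    return Counter(doc.lower().split())


def centroid(docs):
    """Average of the term-frequency vectors of a set of documents."""
    total = Counter()
    for doc in docs:
        total.update(vectorize(doc))
    n = len(docs)
    return {term: count / n for term, count in total.items()}


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(value * b.get(term, 0.0) for term, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_term(term_docs, concept_docs):
    """Return the concept whose centroid is closest to the term's documents.

    `term_docs` is a list of documents associated with the term;
    `concept_docs` maps each ontology concept to its list of documents.
    """
    query = centroid(term_docs)
    scores = {
        concept: cosine(query, centroid(docs))
        for concept, docs in concept_docs.items()
    }
    best = max(scores, key=scores.get)
    return best, scores
```

Called with a handful of documents per concept, `match_term` returns the best-scoring concept together with all similarity scores, so a caller can also inspect how close the runner-up concepts were.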

