
2006 | Book

Semantics, Web and Mining

Joint International Workshops, EWMF 2005 and KDO 2005, Porto, Portugal, October 3-7, 2005, Revised Selected Papers

Edited by: Markus Ackermann, Bettina Berendt, Marko Grobelnik, Andreas Hotho, Dunja Mladenič, Giovanni Semeraro, Myra Spiliopoulou, Gerd Stumme, Vojtěch Svátek, Maarten van Someren

Publisher: Springer Berlin Heidelberg

Book series: Lecture Notes in Computer Science

Included in: Springer Professional "Wirtschaft+Technik", Springer Professional "Technik", Springer Professional "Wirtschaft"


About this book

Finding knowledge – or meaning – in data is the goal of every knowledge discovery effort. Subsequent goals and questions regarding this knowledge differ among knowledge discovery (KD) projects and approaches. One central question is whether and to what extent the meaning extracted from the data is expressed in a formal way that allows not only humans but also machines to understand and re-use it, i.e., whether the semantics are formal semantics. Conversely, the input to KD processes differs between KD projects and approaches. One central question is whether the background knowledge, business understanding, etc. that the analyst employs to improve the results of KD is a set of natural-language statements, a theory in a formal language, or somewhere in between. Also, the data that are being mined can be more or less structured and/or accompanied by formal semantics. These questions must be asked in every KD effort. Nowhere may they be more pertinent, however, than in KD from Web data (“Web mining”). This is due especially to the vast amounts and heterogeneity of data and background knowledge available for Web mining (content, link structure, and usage), and to the re-use of background knowledge and KD results over the Web as a global knowledge repository and activity space. In addition, the (Semantic) Web can serve as a publishing space for the results of knowledge discovery from other resources, especially if the whole process is underpinned by common ontologies.

Table of contents

Frontmatter

EWMF Papers

A Website Mining Model Centered on User Queries

Abstract

We present a model for mining user queries found within the access logs of a website and for relating this information to the website’s overall usage, structure and content. The aim of this model is to discover, in a simple way, valuable information to improve the quality of the website, allowing the website to become more intuitive and adequate for the needs of its users. This model presents a methodology of analysis and classification of the different types of queries registered in the usage logs of a website, such as queries submitted by users to the site’s internal search engine and queries on global search engines that lead to documents in the website. These queries provide useful information about topics that interest users visiting the website, and the navigation patterns associated with these queries indicate whether or not the documents in the site satisfied the user’s needs at that moment.

Ricardo Baeza-Yates, Barbara Poblete

WordNet-Based Word Sense Disambiguation for Learning User Profiles

Abstract

Nowadays, the amount of available information, especially on the Web and in Digital Libraries, is increasing over time. In this context, the role of user modeling and personalized information access is increasing. This paper focuses on the problem of choosing a representation of documents that can be suitable to induce concept-based user profiles as well as to support a content-based retrieval process. We propose a framework for content-based retrieval, which integrates a word sense disambiguation algorithm based on a semantic similarity measure between concepts (synsets) in the WordNet IS-A hierarchy, with a relevance feedback method to induce semantic user profiles. The document representation adopted in the framework, which we call Bag-Of-Synsets (BOS), extends and slightly improves the classic Bag-Of-Words (BOW) approach, as shown by an extensive experimental session.

M. Degemmis, P. Lops, G. Semeraro

Visibility Analysis on the Web Using Co-visibilities and Semantic Networks

Abstract

Monitoring public attention for a topic is of interest for many target groups like social scientists or public relations. Several examples demonstrate how public attention caused by real-world events is accompanied by an accordant visibility of topics on the web. It is shown that the hitcount values of a search engine we use as initial visibility values have to be adjusted by taking the semantic relations between topics into account. We model these relations using semantic networks and present an algorithm based on Spreading Activation that adjusts the initial visibilities. The concept of co-visibility between topics is integrated to obtain an algorithm that mostly complies with an intuitive view on visibilities. The reliability of search engine hitcounts is discussed.

Peter Kiefer, Klaus Stein, Christoph Schlieder

Link-Local Features for Hypertext Classification

Abstract

Previous work in hypertext classification has resulted in two principal approaches for incorporating information about the graph properties of the Web into the training of a classifier. The first approach uses the complete text of the neighboring pages, whereas the second approach uses only their class labels. In this paper, we argue that both approaches are unsatisfactory: the first one brings in too much irrelevant information, while the second approach is too coarse by abstracting the entire page into a single class label. We argue that one needs to focus on relevant parts of predecessor pages, namely on the region in the neighborhood of the origin of an incoming link. To this end, we will investigate different ways for extracting such features, and compare several different techniques for using them in a text classifier.

Hervé Utard, Johannes Fürnkranz

Information Retrieval in Trust-Enhanced Document Networks

Abstract

To fight the problem of information overload in huge information sources like large document repositories, e.g. CiteSeer, or internet websites, you need a selection criterion: some kind of ranking is required. Ranking methods like PageRank analyze the structure of the document reference network. However, these rankings do not distinguish different reference semantics. We enhance these rankings by incorporating information from a second layer, the author trust network, to improve ranking quality and to enable personalized selections.

Klaus Stein, Claudia Hess

Semi-automatic Creation and Maintenance of Web Resources with webTopic

Abstract

In this paper we propose a methodology for automatically retrieving document collections from the web on specific topics and for organizing them and keeping them up-to-date over time, according to user specific persistent information needs. The documents collected are organized according to user specifications and are classified partly by the user and partly automatically. A presentation layer enables the exploration of large sets of documents and, simultaneously, monitors and records user interaction with these document collections. The quality of the system is permanently monitored; the system periodically measures and stores the values of its quality parameters. Using this quality log it is possible to maintain the quality of the resources by triggering procedures aimed at correcting or preventing quality degradation.

Nuno F. Escudeiro, Alípio M. Jorge

KDO Papers on KDD for Ontology

Discovering a Term Taxonomy from Term Similarities Using Principal Component Analysis

Abstract

We show that eigenvector decomposition can be used to extract a term taxonomy from a given collection of text documents. So far, methods based on eigenvector decomposition, such as latent semantic indexing (LSI) or principal component analysis (PCA), were only known to be useful for extracting symmetric relations between terms. We give a precise mathematical criterion for distinguishing between four kinds of relations of a given pair of terms of a given collection: unrelated (car – fruit), symmetrically related (car – automobile), asymmetrically related with the first term being more specific than the second (banana – fruit), and asymmetrically related in the other direction (fruit – banana). We give theoretical evidence for the soundness of our criterion, by showing that in a simplified mathematical model the criterion does the apparently right thing. We applied our scheme to the reconstruction of a selected part of the open directory project (ODP) hierarchy, with promising results.

Holger Bast, Georges Dupret, Debapriyo Majumdar, Benjamin Piwowarski

Semi-automatic Construction of Topic Ontologies

Abstract

In this paper, we review two techniques for topic discovery in collections of text documents (Latent Semantic Indexing and K-Means clustering) and present how we integrated them into a system for semi-automatic topic ontology construction. The OntoGen system offers support to the user during the construction process by suggesting topics and analyzing them in real time. It suggests names for the topics in two alternative ways, both based on extracting keywords from a set of documents inside the topic. The first set of descriptive keywords is extracted using document centroid vectors, while the second set of distinctive keywords is extracted from the SVM classification model dividing documents in the topic from the neighboring documents.

Blaž Fortuna, Dunja Mladenič, Marko Grobelnik

Evaluation of Ontology Enhancement Tools

Abstract

Mining algorithms can enhance the task of ontology establishment but methods are needed to assess the quality of their findings. Ontology establishment is a long-term interactive process, so it is important to evaluate the contribution of a mining tool at an early phase of this process so that only appropriate tools are used in later phases. We propose a method for the evaluation of such tools on their impact on ontology enhancement. We model impact as quality perceived by the expert and as statistical quality computed by an objective function. We further provide a mechanism that juxtaposes the two forms of quality. We have applied our method on an ontology enhancement tool and gained some interesting insights on the interplay between perceived impact and statistical quality.

Myra Spiliopoulou, Markus Schaal, Roland M. Müller, Marko Brunzel

KDO Papers on Ontology for KDD

Introducing Semantics in Web Personalization: The Role of Ontologies

Abstract

Web personalization is the process of customizing a web site to the needs of each specific user or set of users. Personalization of a web site may be performed by the provision of recommendations to the users, highlighting/adding links, creation of index pages, etc. Web personalization systems are mainly based on the exploitation of the navigational patterns of the web site’s visitors. When a personalization system relies solely on usage-based results, however, valuable information conceptually related to what is finally recommended may be missed. The exploitation of the web pages’ semantics can considerably improve the results of web usage mining and personalization, since it provides a more abstract yet uniform and both machine and human understandable way of processing and analyzing the usage data. The underlying idea is to integrate usage data with content semantics, expressed in ontology terms, in order to produce semantically enhanced navigational patterns that can subsequently be used for producing valuable recommendations. In this paper we propose a semantic web personalization system, focusing on word sense disambiguation techniques which can be applied in order to semantically annotate the web site’s content.

Magdalini Eirinaki, Dimitrios Mavroeidis, George Tsatsaronis, Michalis Vazirgiannis

Ontology-Enhanced Association Mining

Abstract

The roles of ontologies in KDD are potentially manifold. We track them through different phases of the KDD process, from data understanding through task setting to mining result interpretation and sharing over the semantic web. The underlying KDD paradigm is association mining tailored to our 4ft-Miner tool. Experience from two different application domains—medicine and sociology—is presented throughout the paper. Envisaged software support for prior knowledge exploitation via customisation of an existing user-oriented KDD tool is also discussed.

Vojtěch Svátek, Jan Rauch, Martin Ralbovský

Ontology-Based Rummaging Mechanisms for the Interpretation of Web Usage Patterns

Abstract

Web Usage Mining (WUM) is the application of data mining techniques over web server logs in order to extract navigation usage patterns. Identifying the relevant and interesting patterns and understanding what knowledge they represent in the domain is the goal of the Pattern Analysis phase, one of the phases of the WUM process. Pattern analysis is a critical phase in WUM for two main reasons: a) mining algorithms yield a huge number of patterns; b) there is a significant semantic gap between URLs and events performed by users. In this paper, we discuss an ontology-based approach to support the analysis of sequential navigation patterns, discussing the main features of the O3R (Ontology-based Rules Retrieval and Rummaging) prototype. O3R functionality is targeted at supporting the comprehension of patterns through interactive pattern rummaging, as well as at the identification of potentially interesting ones. All functionality is based on the availability of the domain ontology, which dynamically provides meaning to URLs. The paper provides an overall view of O3R, details the rummaging functionality, and discusses preliminary results on the use of O3R.

Mariângela Vanzin, Karin Becker

Backmatter

Title: Semantics, Web and Mining
Edited by: Markus Ackermann
Bettina Berendt
Marko Grobelnik
Andreas Hotho
Dunja Mladenič
Giovanni Semeraro
Myra Spiliopoulou
Gerd Stumme
Vojtěch Svátek
Maarten van Someren
Publisher: Springer Berlin Heidelberg
Electronic ISBN: 978-3-540-47698-6
Print ISBN: 978-3-540-47697-9
DOI: https://doi.org/10.1007/11908678