Advances in Web Mining and Web Usage Analysis

7th International Workshop on Knowledge Discovery on the Web, WebKDD 2005, Chicago, IL, USA, August 21, 2005. Revised Papers

Edited by: Olfa Nasraoui, Osmar Zaïane, Myra Spiliopoulou, Bamshad Mobasher, Brij Masand, Philip S. Yu

Publisher: Springer Berlin Heidelberg

Book series: Lecture Notes in Computer Science

About this book

This book contains the postworkshop proceedings of the 7th International Workshop on Knowledge Discovery from the Web, WEBKDD 2005. The WEBKDD workshop series has taken place as part of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD) since 1999. The discipline of data mining delivers methodologies and tools for the analysis of large data volumes and the extraction of comprehensible and non-trivial insights from them. Web mining, a much younger discipline, concentrates on the analysis of data pertinent to the Web. Web mining methods are applied to usage data and Web site content; they strive to improve our understanding of how the Web is used, to enhance usability and to promote mutual satisfaction between e-business venues and their potential customers. In recent years, interest in the Web as a medium for communication, interaction and business has led to new challenges and to intensive, dedicated research. Many of the infancy problems in Web mining have now been solved, but the tremendous potential for new and improved uses, as well as misuses, of the Web is leading to new challenges.

Table of contents

Frontmatter

Mining Significant Usage Patterns from Clickstream Data

Abstract

Discovery of usage patterns from Web data is one of the primary purposes for Web Usage Mining. In this paper, a technique to generate Significant Usage Patterns (SUP) is proposed and used to acquire significant “user preferred navigational trails”. The technique uses pipelined processing phases including sub-abstraction of sessionized Web clickstreams, clustering of the abstracted Web sessions, concept-based abstraction of the clustered sessions, and SUP generation. Using this technique, valuable customer behavior information can be extracted by Web site practitioners. Experiments conducted using Web log data provided by J.C.Penney demonstrate that SUPs of different types of customers are distinguishable and interpretable. This technique is particularly suited for analysis of dynamic websites.

Lin Lu, Margaret Dunham, Yu Meng
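
The pipeline described in this abstract starts from sessionized Web clickstreams. As a rough illustration of what sessionization can look like, the sketch below groups clicks into sessions by an inactivity timeout. It is not the authors' implementation: the `sessionize` function, the `(user, timestamp, url)` tuple layout, and the 30-minute timeout are all assumptions made for this example.

```python
from collections import defaultdict


def sessionize(clicks, timeout=1800):
    """Group (user_id, timestamp_seconds, url) clicks into per-user sessions.

    A new session starts whenever the gap between two consecutive clicks
    of the same user exceeds `timeout` seconds (assumed default: 30 min).
    """
    by_user = defaultdict(list)
    for user, ts, url in clicks:
        by_user[user].append((ts, url))

    sessions = []
    for user, events in by_user.items():
        events.sort()  # order each user's clicks by timestamp
        current = [events[0][1]]
        last_ts = events[0][0]
        for ts, url in events[1:]:
            if ts - last_ts > timeout:
                # Gap too long: close the running session, start a new one.
                sessions.append((user, current))
                current = []
            current.append(url)
            last_ts = ts
        sessions.append((user, current))
    return sessions


# Hypothetical clickstream: u1 returns after a long pause.
clicks = [
    ("u1", 0, "/home"),
    ("u1", 60, "/shoes"),
    ("u1", 4000, "/home"),
    ("u2", 10, "/cart"),
]
sessions = sessionize(clicks)
```

Here the third click of `u1` arrives more than 1800 seconds after the second, so it opens a second session for that user. The later phases named in the abstract (abstraction, clustering, SUP generation) would operate on such session lists.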

Using and Learning Semantics in Frequent Subgraph Mining

Abstract

The search for frequent subgraphs is becoming increasingly important in many application areas including Web mining and bioinformatics. Any use of graph structures in mining, however, should also take into account that it is essential to integrate background knowledge into the analysis, and that patterns must be studied at different levels of abstraction. To capture these needs, we propose to use taxonomies in mining and to extend frequency / support measures by the notion of context-induced interestingness. The AP-IP mining problem is to find all frequent abstract patterns and the individual patterns that constitute them and are therefore interesting in this context (even though they may be infrequent). The paper presents the fAP-IP algorithm that uses a taxonomy to search for the abstract and individual patterns, and that supports graph clustering to discover further structure in the individual patterns. Semantics are used as well as learned in this process. fAP-IP is implemented as an extension of Gaston (Nijssen & Kok, 2004), and it is complemented by the AP-IP visualization tool that allows the user to navigate through detail-and-context views of taxonomy context, pattern context, and transaction context. A case study of a real-life Web site shows the advantages of the proposed solutions.

ACM categories and subject descriptors and keywords: H.2.8 [Database Management]: Database Applications—data mining; H.5.4 [Information Interfaces and Presentation]: Hypertext/Hypermedia —navigation, user issues; graph mining; Web mining; background knowledge and semantics in mining.

Bettina Berendt

Overcoming Incomplete User Models in Recommendation Systems Via an Ontology

Abstract

To make accurate recommendations, recommendation systems currently require more data about a customer than is usually available. We conjecture that the weaknesses are due to a lack of inductive bias in the learning methods used to build the prediction models. We propose a new method that extends the utility model and assumes that the structure of user preferences follows an ontology of product attributes. Using the data of the MovieLens system, we show experimentally that real user preferences indeed closely follow an ontology based on movie attributes. Furthermore, a recommender based just on a single individual’s preferences and this ontology performs better than collaborative filtering, with the greatest differences when little data about the user is available. This points the way to how proper inductive bias can be used for significantly more powerful recommender systems in the future.

Vincent Schickel-Zuber, Boi Faltings

Data Sparsity Issues in the Collaborative Filtering Framework

Abstract

With the amount of available information on the Web growing rapidly each day, the need has emerged to filter information automatically in order to ensure greater user efficiency. Within the fields of user profiling and Web personalization, several popular content filtering techniques have been developed. In this chapter we present one such technique: collaborative filtering. Apart from giving an overview of collaborative filtering approaches, we present the experimental results of comparing the k-Nearest Neighbor (kNN) algorithm with Support Vector Machine (SVM) in the collaborative filtering framework using datasets with different properties. While the k-Nearest Neighbor algorithm is usually used for collaborative filtering tasks, Support Vector Machine is considered a state-of-the-art classification algorithm. Since collaborative filtering can also be interpreted as a classification/regression task, virtually any supervised learning algorithm (such as SVM) can also be applied. Experiments were performed on two standard, publicly available datasets and on a real-life corporate dataset that does not fit the profile of ideal data for collaborative filtering. We conclude that the quality of collaborative filtering recommendations is highly dependent on the sparsity of available data. Furthermore, we show that kNN is dominant on datasets with relatively low sparsity while SVM-based approaches may perform better on highly sparse data.

Miha Grčar, Dunja Mladenič, Blaž Fortuna, Marko Grobelnik
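
To make the kNN side of this comparison concrete, the following is a minimal sketch of user-based kNN collaborative filtering. It is an illustration only, not the chapter's experimental setup: the cosine similarity over co-rated items, the similarity-weighted mean, the `predict` function, and the toy ratings are all assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity over the items both users rated (0.0 if none)."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)


def predict(ratings, user, item, k=2):
    """Predict `user`'s rating of `item` as the similarity-weighted mean
    over the k most similar users who have rated that item."""
    neighbors = [
        (cosine(ratings[user], r), r[item])
        for other, r in ratings.items()
        if other != user and item in r
    ]
    neighbors.sort(key=lambda pair: pair[0], reverse=True)
    top = [(s, v) for s, v in neighbors[:k] if s > 0]
    if not top:
        return None  # no overlapping neighbor: the data is too sparse
    return sum(s * v for s, v in top) / sum(s for s, _ in top)


# Hypothetical ratings on a 1-5 scale; "dan" shares no items with anyone.
ratings = {
    "ann": {"a": 5, "b": 3},
    "bob": {"a": 5, "b": 3, "c": 4},
    "cat": {"a": 1, "c": 2},
    "dan": {"d": 5},
}
ann_c = predict(ratings, "ann", "c")
dan_c = predict(ratings, "dan", "c")
```

For `dan`, who has no co-rated items with any other user, the function returns `None`. This is the kind of sparsity effect the abstract refers to: neighborhood methods need overlapping ratings before they can say anything.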

USER: User-Sensitive Expert Recommendations for Knowledge-Dense Environments

Abstract

Traditional recommender systems tend to focus on e-commerce applications, recommending products to users from a large catalog of available items. The goal has been to increase sales by tapping into the user’s interests, utilizing information from various data sources to make relevant recommendations. Education, government, and policy websites face parallel challenges, except that the product is information and their users may not be aware of what is relevant and what isn’t. Given a large, knowledge-dense website and a non-expert user searching for information, making relevant recommendations becomes a significant challenge. This paper addresses the problem of providing recommendations to non-experts, helping them understand what they need to know, as opposed to what is popular among other users. The approach is user-sensitive in that it adopts a ‘model of learning’ whereby the user’s context is dynamically interpreted as they browse, and that information is then leveraged to improve the recommendations.

Colin DeLong, Prasanna Desikan, Jaideep Srivastava

Analysis and Detection of Segment-Focused Attacks Against Collaborative Recommendation

Abstract

Significant vulnerabilities have recently been identified in collaborative filtering recommender systems. These vulnerabilities mostly emanate from the open nature of such systems and their reliance on user-specified judgments for building profiles. Attackers can easily introduce biased data in an attempt to force the system to “adapt” in a manner advantageous to them. Our research in secure personalization is examining a range of attack models, from the simple to the complex, and a variety of recommendation techniques. In this chapter, we explore an attack model that focuses on a subset of users with similar tastes and show that such an attack can be highly successful against both user-based and item-based collaborative filtering. We also introduce a detection model that can significantly decrease the impact of this attack.

Bamshad Mobasher, Robin Burke, Chad Williams, Runa Bhaumik

Adaptive Web Usage Profiling

Abstract

Web usage models and profiles capture significant interests and trends from past accesses. They are used to improve user experience, for example through recommendation or pre-fetching of pages. While browsing behavior changes dynamically over time, many web usage modeling techniques are static because of prohibitive model compilation times and the lack of a fast incremental update mechanism. However, profiles have to be maintained so that they dynamically adapt to new interests and trends, since otherwise their use can lead to poor, irrelevant, and mis-targeted recommendations in personalization systems. We present a new profile maintenance scheme, which extends the Relational Fuzzy Subtractive Clustering (RFSC) technique and enables efficient incremental update of usage profiles. An impact factor is defined whose value can be used to decide the need for recompilation. The results from extensive experiments on a large real dataset of web logs show that the proposed maintenance technique, with considerably reduced computational costs, is almost as good as complete remodeling.

Bhushan Shankar Suryavanshi, Nematollaah Shiri, Sudhir P. Mudur

On Clustering Techniques for Change Diagnosis in Data Streams

Abstract

In recent years, data streams have become ubiquitous in a variety of applications because of advances in hardware technology. Since data streams may be generated by applications which are time-changing in nature, it is often desirable to explore the underlying changing trends in the data. In this paper, we will explore and survey some of our recent methods for change detection. In particular, we will study methods for change detection which use clustering in order to provide a concise understanding of the underlying trends. We discuss our recent techniques which use micro-clustering in order to diagnose the changes in the underlying data. We also discuss the extension of this method to text and categorical data sets as well as community detection in graph data streams.

Charu C. Aggarwal, Philip S. Yu
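
Micro-clustering, as mentioned in this abstract, is commonly built on additive summary statistics that can be updated one point at a time. The sketch below shows one such summary (count, linear sum, squared sum) and uses it to compare two time windows. It is a simplified illustration under stated assumptions, not the authors' method: the `MicroCluster` class, the `centroid_shift` helper, and the two-window comparison are hypothetical.

```python
class MicroCluster:
    """Additive summary of a group of points: count, linear sum, squared sum."""

    def __init__(self, dim):
        self.n = 0
        self.ls = [0.0] * dim  # per-dimension sum of values
        self.ss = [0.0] * dim  # per-dimension sum of squared values

    def add(self, point):
        """Absorb one stream point in O(dim) time without storing it."""
        self.n += 1
        for i, x in enumerate(point):
            self.ls[i] += x
            self.ss[i] += x * x

    def merge(self, other):
        """Combine two summaries; all statistics simply add up."""
        self.n += other.n
        self.ls = [a + b for a, b in zip(self.ls, other.ls)]
        self.ss = [a + b for a, b in zip(self.ss, other.ss)]

    def centroid(self):
        return [s / self.n for s in self.ls]


def centroid_shift(early, late):
    """Per-dimension movement of the centroid between two windows."""
    return [b - a for a, b in zip(early.centroid(), late.centroid())]


# Hypothetical stream split into an early and a late window.
early, late = MicroCluster(2), MicroCluster(2)
for p in [(1.0, 1.0), (3.0, 1.0)]:
    early.add(p)
for p in [(5.0, 5.0), (7.0, 5.0)]:
    late.add(p)
shift = centroid_shift(early, late)
```

Because the statistics are additive, summaries of different time windows can be merged or compared without revisiting the raw stream, which is what makes this kind of structure attractive for change diagnosis.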

Personalized Search Results with User Interest Hierarchies Learnt from Bookmarks

Abstract

Personalized web search incorporates an individual user’s interests when deciding which relevant results to return. Most web search engines, however, are designed to serve all users without considering the interests of individual users. We propose a method to (re)rank the results from a search engine using a learned user profile, called a user interest hierarchy (UIH), from web pages that are of interest to the user. The user’s interest in web pages will be determined implicitly, without directly asking the user. Experimental results indicate that our personalized ranking methods, when used with a popular search engine, can yield more potentially interesting web pages for individual users.

Hyoung-rae Kim, Philip K. Chan
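
The idea of reranking search results against a learned user profile can be sketched in a few lines. The example below is deliberately simplified and is not the authors' method: it swaps the user interest hierarchy for a flat dictionary of weighted interest terms, and the `rerank` function, the scoring rule, and the sample data are all assumptions.

```python
def rerank(results, profile):
    """Reorder (title, snippet) search results by overlap with a user profile.

    `profile` maps interest terms to weights; this flat dictionary is a
    simplified stand-in for a user interest hierarchy.
    """
    def score(result):
        title, snippet = result
        words = (title + " " + snippet).lower().split()
        return sum(profile.get(w, 0.0) for w in words)

    # sorted() is stable, so ties keep the search engine's original order.
    return sorted(results, key=score, reverse=True)


# Hypothetical results for the query "jaguar", in search-engine order.
results = [
    ("Jaguar cars", "luxury car dealer"),
    ("Jaguar habitat", "big cat wildlife facts"),
    ("Jaguar OS", "operating system release"),
]
# Hypothetical profile, e.g. derived from a user's bookmarked pages.
profile = {"wildlife": 2.0, "cat": 1.0, "habitat": 1.5}
reranked = rerank(results, profile)
```

With this profile the wildlife page moves to the top, while the two results that match no interest term keep their original relative order.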

Backmatter

Electronic ISBN: 978-3-540-46348-1
Print ISBN: 978-3-540-46346-7
DOI: https://doi.org/10.1007/11891321