
2001 | Book

Relational Data Mining

Edited by: Sašo Džeroski, Nada Lavrač

Publisher: Springer Berlin Heidelberg


About this book

As the first book devoted to relational data mining, this coherently written multi-author monograph provides a thorough introduction and systematic overview of the area. The first part introduces the reader to the basics and principles of classical knowledge discovery in databases and inductive logic programming; subsequent chapters by leading experts assess the techniques in relational data mining in a principled and comprehensive way; finally, three chapters deal with advanced applications in various fields and refer the reader to resources for relational data mining.
This book will serve as a valuable reference for R&D professionals active in relational data mining. Students as well as IT professionals and ambitious practitioners interested in learning about relational data mining will appreciate the book as a useful text and gentle introduction to this exciting new field.

Table of Contents

Frontmatter

Introduction

Frontmatter
1. Data Mining in a Nutshell
Abstract
Data mining, the central activity in the process of knowledge discovery in databases, is concerned with finding patterns in data. This chapter introduces and illustrates the most common types of patterns considered by data mining approaches and gives rough outlines of the data mining algorithms that are most frequently used to look for such patterns. It also briefly introduces relational data mining, starting with patterns that involve multiple relations and laying down the basic principles common to relational data mining algorithms. An overview of the contents of this book is given, as well as pointers to literature and Internet resources on data mining.
Sašo Džeroski
2. Knowledge Discovery in Databases: An Overview
Abstract
Data Mining and Knowledge Discovery in Databases (KDD) promise to play an important role in the way people interact with databases, especially decision support databases where analysis and exploration operations are essential. Inductive logic programming can potentially play some key roles in KDD. We define the basic notions in data mining and KDD, state the goals, present the motivation, and give a high-level definition of the KDD process and how it relates to data mining. We then focus on data mining methods. Basic coverage of a sampling of methods is provided to illustrate the methods and how they are used. We cover two case studies of successful applications in science data analysis, one of which is the classification and cataloging of a major astronomy sky survey covering 2 billion objects in the northern sky. The system outperforms humans as well as classical computational analysis tools in astronomy on the task of recognizing faint stars and galaxies. We conclude with a listing of research challenges and outline the areas where ILP could play some important roles in KDD.
Usama Fayyad
3. An Introduction to Inductive Logic Programming
Abstract
Inductive logic programming (ILP) is concerned with the development of techniques and tools for relational data mining. Besides the ability to deal with data stored in multiple tables, ILP systems are usually able to take into account generally valid background (domain) knowledge in the form of a logic program. They also use the powerful language of logic programs for describing discovered patterns. This chapter introduces the basics of logic programming and relates logic programming terminology to database terminology. It then defines the task of relational rule induction, the basic data mining task addressed by ILP systems, and presents some basic techniques for solving this task. It concludes with an overview of other relational data mining tasks addressed by ILP systems.
Sašo Džeroski, Nada Lavrač
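The relational rule induction task described above can be illustrated with the classic daughter/parent example from the ILP literature. The following minimal Python sketch (all data and predicate names are illustrative) stores background knowledge as relations, i.e., sets of tuples, and tests a candidate clause daughter(X, Y) :- female(X), parent(Y, X) against positive and negative examples:

```python
# Background knowledge as relations (sets of tuples), mirroring database tables.
parent = {("ann", "mary"), ("ann", "tom"), ("tom", "eve"), ("tom", "ian")}
female = {"ann", "mary", "eve"}

def covers(x, y):
    """Body of the candidate clause: female(X) AND parent(Y, X)."""
    return x in female and (y, x) in parent

positives = [("mary", "ann"), ("eve", "tom")]   # daughter(X, Y) holds
negatives = [("tom", "ann"), ("eve", "ann")]    # daughter(X, Y) does not hold

# A clause is accepted if it covers all positive and no negative examples.
complete = all(covers(x, y) for x, y in positives)
consistent = not any(covers(x, y) for x, y in negatives)
print(complete, consistent)  # True True
```

An ILP system searches a space of such clauses, evaluating each candidate's coverage in essentially this way.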
4. Inductive Logic Programming for Knowledge Discovery in Databases
Abstract
Relational data mining algorithms and systems are capable of directly dealing with multiple tables or relations as they are found in today’s relational databases. This reduces the need for manual preprocessing and allows problems to be treated that cannot be handled easily with standard single-table methods. This paper provides a tutorial-style introduction to the topic, beginning with a detailed explanation of why and where one might be interested in relational analysis. We then present the basics of Inductive Logic Programming (ILP), the scientific field where relational methods are primarily studied. After illustrating the workings of MiDOS, a relational method for subgroup discovery, in more detail, we show how to use relational methods in one of the current data mining systems.
Stefan Wrobel

Techniques

Frontmatter
5. Three Companions for Data Mining in First Order Logic
Abstract
Three companion systems, Claudien, ICL and Tilde, are presented. They use a common representation for examples and hypotheses: each example is represented by a relational database. This contrasts with the classical inductive logic programming systems such as Progol and Foil. It is argued that this representation is closer to attribute value learning and hence more natural. Furthermore, the three systems can be considered first order upgrades of typical data mining systems, which induce association rules, classification rules or decision trees respectively.
Luc De Raedt, Hendrik Blockeel, Luc Dehaspe, Wim Van Laer
6. Inducing Classification and Regression Trees in First Order Logic
Abstract
In this chapter, we present a system that enhances the representational capabilities of decision and regression tree learning by extending it to first-order logic, i.e., relational representations as commonly used in Inductive Logic Programming. We describe an algorithm named Structural Classification and Regression Trees (S-Cart), which is capable of inducing first-order trees for both classification and regression problems, i.e., for the prediction of either discrete classes or numerical values. We arrive at this algorithm by a strategy called upgrading — we start from a propositional induction algorithm and turn it into a relational learner by devising suitable extensions of the representation language and the associated algorithms. In particular, we have upgraded Cart, the classical method for learning classification and regression trees, to handle relational examples and background knowledge. The system constructs a tree containing a literal (an atomic formula or its negation) or a conjunction of literals in each node, and assigns either a discrete class or a numerical value to each leaf. In addition, we have extended the Cart methodology by adding linear regression models to the leaves of the trees; this does not have a counterpart in Cart, but was inspired by its approach to pruning. The regression variant of S-Cart is one of the few systems applicable to Relational Regression problems. Experiments in several real-world domains demonstrate that the approach is useful and competitive with existing methods, indicating that the advantage of relatively small and comprehensible models does not come at the expense of predictive accuracy.
Stefan Kramer, Gerhard Widmer
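The structure of such a first-order tree can be sketched in a few lines of Python (this is an illustrative toy, not the actual S-Cart code): internal nodes test a literal or conjunction against the example's relations, and leaves hold a class label or, for regression, a numeric value.

```python
# Hypothetical relational example: a molecule with atoms and bonds.
def has_bond(example, kind):
    return any(b == kind for b in example["bonds"])

# node = (test, subtree_if_true, subtree_if_false); leaves are plain values.
tree = (lambda e: has_bond(e, "aromatic"),
        (lambda e: len(e["atoms"]) > 6, "active", "inactive"),
        "inactive")

def predict(node, example):
    if not isinstance(node, tuple):
        return node                  # leaf: a class label (or a number)
    test, yes, no = node
    return predict(yes if test(example) else no, example)

mol = {"atoms": list("CCCCCCC"), "bonds": ["aromatic", "single"]}
print(predict(tree, mol))  # active
```

The regression variant would place numeric values (or, as the chapter describes, linear regression models) in the leaves instead of class labels.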
7. Relational Rule Induction with CProgol4.4: A Tutorial Introduction
Abstract
This chapter describes the theory and use of CProgol4.4, a state-of-the-art Inductive Logic Programming (ILP) system. After explaining how to download the source code, the reader is guided through the development of Progol input files containing type definitions, mode declarations, background knowledge, examples and integrity constraints. The theory behind the system is then described using a simple example as illustration. The main algorithms in Progol are given and methods of pruning the search space of possible hypotheses are discussed. Next the application of built-in procedures for estimating predictive accuracy and statistical significance of Progol hypotheses is demonstrated. Lastly, the reader is shown how to use the more advanced features of CProgol4.4, including positive-only learning and the use of metalogical predicates for pruning the search space.
Stephen Muggleton, John Firth
8. Discovery of Relational Association Rules
Abstract
Within KDD, the discovery of frequent patterns has been studied in a variety of settings. In its simplest form, known from association rule mining, the task is to discover all frequent item sets, i.e., all combinations of items that are found in a sufficient number of examples. We present algorithms for relational association rule discovery that are well-suited for exploratory data mining. They offer the flexibility required to experiment with examples more complex than feature vectors and patterns more complex than item sets.
Luc Dehaspe, Hannu Toivonen
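The propositional special case mentioned above, frequent item set discovery, can be sketched compactly. The following brute-force Python sketch (transactions and threshold are made up; a real Apriori implementation would generate level k+1 candidates only from frequent level-k sets) enumerates item sets level by level and keeps those meeting a minimum support:

```python
from itertools import combinations

transactions = [{"beer", "chips"}, {"beer", "chips", "salsa"},
                {"chips", "salsa"}, {"beer", "salsa"}]
min_support = 2   # an item set must occur in at least two transactions

def support(itemset):
    return sum(itemset <= t for t in transactions)  # count containing transactions

items = sorted(set().union(*transactions))
frequent = {}
for k in range(1, len(items) + 1):
    level = {frozenset(c) for c in combinations(items, k)
             if support(set(c)) >= min_support}
    if not level:      # monotonicity: no larger frequent set exists either
        break
    frequent[k] = level

print(len(frequent[2]))  # 3 frequent pairs
```

The relational algorithms in this chapter generalize exactly this levelwise scheme from item sets to first-order patterns.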
9. Distance Based Approaches to Relational Learning and Clustering
Abstract
Within data analysis, distance-based methods have always been very popular. Such methods assume that it is possible to compute for each pair of objects in a domain their mutual distance (or similarity). In a distance-based setting, many of the tasks usually considered in data mining can be carried out in a surprisingly simple yet powerful way. In this chapter, we give a tutorial introduction to the use of distance-based methods for relational representations, concentrating in particular on predictive learning and clustering. We describe in detail one relational distance measure that has proven very successful in applications, and introduce three systems that actually carry out relational distance-based learning and clustering: Ribl2, Rdbc and Forc. We also present a detailed case study of how these three systems were applied to a domain from molecular biology.
Mathias Kirsten, Stefan Wrobel, Tamás Horváth
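The flavor of a relational distance can be conveyed with a simplified sketch (the actual Ribl2 measure is considerably more elaborate; the data and weighting below are illustrative): attribute values are compared directly, and related sub-objects are compared recursively up to a fixed depth.

```python
def attr_dist(a, b):
    """Distance between two attribute values, normalized into [0, 1)."""
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return abs(a - b) / (1 + abs(a - b))
    return 0.0 if a == b else 1.0

def rel_dist(x, y, depth=2):
    """Average attribute distance, mixed with a recursive distance over parts."""
    attrs = [attr_dist(x["attrs"][k], y["attrs"][k]) for k in x["attrs"]]
    d = sum(attrs) / len(attrs)
    if depth > 0 and x["parts"] and y["parts"]:
        # greedily match each part of x to its nearest part of y
        part = sum(min(rel_dist(p, q, depth - 1) for q in y["parts"])
                   for p in x["parts"]) / len(x["parts"])
        d = (d + part) / 2
    return d

a = {"attrs": {"charge": 0.2, "elem": "C"}, "parts": []}
b = {"attrs": {"charge": 0.3, "elem": "C"}, "parts": []}
print(round(rel_dist(a, b), 3))
```

Given such a pairwise distance, standard nearest-neighbor prediction and agglomerative or k-means-style clustering apply unchanged, which is exactly how Ribl2, Rdbc and Forc divide the work.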

From Propositional to Relational Data Mining

Frontmatter
10. How to Upgrade Propositional Learners to First Order Logic: A Case Study
Abstract
We describe a methodology for upgrading existing attribute-value learners towards first-order logic. This method has several advantages: one can profit from existing research on propositional learners (inheriting their efficiency and effectiveness), relational learners (inheriting their expressiveness) and PAC-learning (inheriting its theoretical basis). Moreover, there is a clear relationship between the new relational system and its propositional counterpart. This makes the ILP system easy to use and understand by users familiar with the propositional counterpart. We demonstrate the methodology on the ICL system, which is an upgrade of the propositional learner CN2.
Wim Van Laer, Luc De Raedt
11. Propositionalization Approaches to Relational Data Mining
Abstract
This chapter surveys methods that transform a relational representation of a learning problem into a propositional (feature-based, attribute-value) representation. This kind of representation change is known as propositionalization. Taking such an approach, feature construction can be decoupled from model construction. It has been shown that in many relational data mining applications this can be done without loss of predictive performance. After reviewing both general-purpose and domain-dependent propositionalization approaches from the literature, an extension to the Linus propositionalization method that overcomes the system’s earlier inability to deal with non-determinate local variables is described.
Stefan Kramer, Nada Lavrač, Peter Flach
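A minimal Python sketch of the idea (the relations and feature definitions are made up for illustration): each row of a target table is extended with features computed from a related table, after which any ordinary attribute-value learner can be applied to the resulting single table.

```python
# Two relational tables: customers and their purchases.
customers = [("c1", "london"), ("c2", "paris")]
purchases = [("c1", "book", 12.0), ("c1", "dvd", 25.0), ("c2", "book", 8.0)]

def features(cid):
    """Propositionalize one customer via aggregate and existential features."""
    rows = [p for p in purchases if p[0] == cid]
    return {
        "n_purchases": len(rows),                        # aggregate feature
        "total_spent": sum(r[2] for r in rows),          # aggregate feature
        "bought_dvd": any(r[1] == "dvd" for r in rows),  # existential feature
    }

# The single propositional table handed to an attribute-value learner.
table = [dict(id=cid, city=city, **features(cid)) for cid, city in customers]
print(table[0]["n_purchases"], table[0]["bought_dvd"])  # 2 True
```

The hard part, which the surveyed methods address, is choosing which such features to construct automatically rather than by hand.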
12. Relational Learning and Boosting
Abstract
Boosting, a methodology for constructing and combining multiple classifiers, has been found to lead to substantial improvements in predictive accuracy. Although boosting was formulated in a propositional learning context, the same ideas can be applied to first-order learning (also known as inductive logic programming). Boosting is used here with a system that learns relational definitions of functions. Results show that the magnitude of the improvement, the additional computational cost, and the occasional negative impact of boosting all resemble the corresponding observations for propositional learning.
Ross Quinlan
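The reweighting-and-voting scheme that the chapter carries over to a relational learner can be sketched on a propositional toy problem (the data and the threshold-stump weak learner below are purely illustrative): each round fits a weak classifier to the current example weights, up-weights the examples it misclassifies, and the final prediction is a weighted vote.

```python
import math

xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]   # not separable by any single threshold

def stump(theta, sign):
    """Weak learner: predict `sign` below the threshold, `-sign` above."""
    return lambda x: sign if x < theta else -sign

def weighted_error(h, w):
    return sum(wi for wi, x, y in zip(w, xs, ys) if h(x) != y)

def best_stump(w):
    candidates = [stump(t + 0.5, s) for t in xs for s in (1, -1)]
    return min(candidates, key=lambda h: weighted_error(h, w))

w = [1 / len(xs)] * len(xs)              # uniform initial example weights
ensemble = []
for _ in range(3):                       # three boosting rounds
    h = best_stump(w)
    e = weighted_error(h, w)
    alpha = 0.5 * math.log((1 - e) / e)  # this classifier's vote weight
    ensemble.append((alpha, h))
    w = [wi * math.exp(-alpha * y * h(x)) for wi, x, y in zip(w, xs, ys)]
    s = sum(w)
    w = [wi / s for wi in w]             # renormalize the example weights

def predict(x):
    return 1 if sum(a * h(x) for a, h in ensemble) > 0 else -1

print([predict(x) for x in xs] == ys)  # True
```

No individual stump gets this training set right, but three boosted stumps do; in the relational setting, the stump is simply replaced by a first-order learner.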
13. Learning Probabilistic Relational Models
Abstract
Probabilistic relational models (PRMs) are a language for describing statistical models over typed relational domains. A PRM models the uncertainty over the attributes of objects in the domain and uncertainty over the relations between the objects. The model specifies, for each attribute of an object, its (probabilistic) dependence on other attributes of that object and on attributes of related objects. The dependence model is defined at the level of classes of objects. The class dependence model is instantiated for any object in the class, as appropriate to the particular context of the object (i.e., the relations between this object and others). PRMs can also represent uncertainty over the relational structure itself, e.g., by specifying a (class-level) probability that two objects will be related to each other. PRMs provide a foundation for dealing with the noise and uncertainty encountered in most real-world domains. In this chapter, we show that the compact and natural representation of PRMs allows them to be learned directly from an existing relational database using well-founded statistical techniques. We give an introduction to PRMs and an overview of methods for learning them. We show that PRMs provide a new framework for relational data mining, and offer new challenges for the endeavor of learning relational models for real-world domains.
Lise Getoor, Nir Friedman, Daphne Koller, Avi Pfeffer
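The key idea, class-level parameters instantiated per object in its relational context, can be made concrete with a toy sketch (the schema and probabilities below are entirely hypothetical): one conditional probability model for a grade given course difficulty is defined once and reused for every registration.

```python
# Class-level model: P(grade | difficulty of the registered course).
p_grade = {
    "easy": {"A": 0.7, "B": 0.3},
    "hard": {"A": 0.3, "B": 0.7},
}
courses = {"c1": "easy", "c2": "hard"}
registrations = [("alice", "c1"), ("alice", "c2"), ("bob", "c2")]

# Instantiate the class-level model for each object in its relational context:
# the same parameters apply to every registration, but which column of the
# model is used depends on the related course object.
joint = {}
for student, course in registrations:
    joint[(student, course)] = p_grade[courses[course]]
print(joint[("alice", "c2")]["A"])  # 0.3
```

Learning a PRM means estimating such class-level parameters (and the dependence structure itself) directly from the relational database.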

Applications and Web Resources

Frontmatter
14. Relational Data Mining Applications: An Overview
Abstract
This chapter gives an overview of applications of relational learning and inductive logic programming to data mining problems in a variety of areas. These include bioinformatics, where successful applications come from drug design, predicting mutagenicity and carcinogenicity, and predicting protein structure and function, including genome scale prediction of protein functional class. Other application areas include medicine, environmental sciences and monitoring, mechanical and traffic engineering. Applications of relational learning are also emerging in business data analysis, text and Web mining, and miscellaneous other fields, such as the analysis of musical performances.
Sašo Džeroski
15. Four Suggestions and a Rule Concerning the Application of ILP
Abstract
Since the late 1980s there has been a sustained research effort directed at investigating the application of Inductive Logic Programming (ILP) to problems in biology and chemistry. This essay is a personal view of some interesting issues that have arisen during my involvement in this enterprise. Many of the concerns of the broader field of Knowledge Discovery in Databases manifest themselves during the application of ILP to analyse bio-chemical data. Addressing them in this microcosm has given me some directions on the wider application of ILP, and I present these here in the form of four suggestions and one rule. Readers are invited to consider them in the context of a hypothetical Recommended Codes and Practices for the application of ILP.
Ashwin Srinivasan
16. Internet Resources on ILP for KDD
Abstract
The aim of this chapter is to review ILP resources available on the Internet. These are especially important due to the growing interest for applying ILP methods to knowledge discovery in databases (KDD) and data mining problems. They also play an important role in connecting the ILP research community with potential end users of ILP methods. Most of the chapter is dedicated to the ILP Internet resources gathered and maintained in the ILPnet and ILPnet2 projects. Some other ILP and KDD related Internet resources are briefly reviewed as well.
Ljupčo Todorovski, Irene Weber, Nada Lavrač, Olga Stěpánkova, Sašo Džeroski, Dimitar Kazakov, Darko Zupanič, Peter Flach
Backmatter
Metadata
Title
Relational Data Mining
Edited by
Sašo Džeroski
Nada Lavrač
Copyright year
2001
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-662-04599-2
Print ISBN
978-3-642-07604-6
DOI
https://doi.org/10.1007/978-3-662-04599-2