Skip to main content

Über dieses Buch

Machine learning is currently one of the most rapidly growing areas of research in computer science. In compiling this volume we have brought together contributions from some of the most prestigious researchers in this field. This book covers the three main learning systems; symbolic learning, neural networks and genetic algorithms as well as providing a tutorial on learning casual influences. Each of the nine chapters is self-contained.

Both theoreticians and application scientists/engineers in the broad area of artificial intelligence will find this volume valuable. It also provides a useful sourcebook for Postgraduate since it shows the direction of current research.



1. A Bayesian Approach to Causal Discovery

We examine the Bayesian approach to the discovery of causal DAG models and compare it to the constraint-based approach. Both approaches rely on the Causal Markov condition, but the two differ significantly in theory and practice. An important difference between the approaches is that the constraint-based approach uses categorical information about conditional-independence constraints in the domain, whereas the Bayesian approach weighs the degree to which such constraints hold. As a result, the Bayesian approach has three distinct advantages over its constraint-based counterpart. One, conclusions derived from the Bayesian approach are not susceptible to incorrect categorical decisions about independence facts that can occur with data sets of finite size. Two, using the Bayesian approach, finer distinctions among model structures—both quantitative and qualitative—can be made. Three, information from several models can be combined to make better inferences and to better account for modeling uncertainty. In addition to describing the general Bayesian approach to causal discovery, we review approximation methods for missing data and hidden variables, and illustrate differences between the Bayesian and constraint-based methods using artificial and real examples.
David Heckerman, Christopher Meek, Gregory Cooper

2. A Tutorial on Learning Causal Influence

In the 1990’s related research in artificial intelligence, cognitive science, and philosophy resulted in a method for learning causal relationships from passive data when we have data on at least four variables. We illustrate the method using a few simple examples. Then we present recent research showing that we can even learn something about causal relationships when we have data on only two variables.
Richard E. Neapolitan, Xia Jiang

3. Learning Based Programming

A significant amount of the software written today, and more so in the future, interacts with naturally occurring data — text, speech, images and video, streams of financial data, biological sequences — and needs to reason with respect to concepts that are complex and often cannot be written explicitly in terms of the raw data observed. Developing these systems requires software that is centered around a semantic level interaction model, made possible via trainable components that support abstractions over real world observations.
Today’s programming paradigms and the corresponding programming languages, though, are not conducive for that goal. Conventional programming languages rely on a programmer to explicitly define all the concepts and relations involved. On the other hand, in order to write programs that deal with naturally-occurring data, that is highly variable and ambiguous at the measurement level, one needs to develop a new programming model, in which some of the variables, concepts and relations may not be known at programming time, may be defined only in a data driven way, or may not be unambiguously defined without relying on other concepts acquired this way.
In Learning Based Programming (LBP), we propose a programming model that supports interaction with domain elements at a semantic level. LBP addresses key issues that will facilitate the development of systems that interact with real-world data at that level by (1) allowing the programmer to name abstractions over domain elements and information sources — defined implicitly in observed data, (2) allowing a programmer to interact with named abstractions (3) supporting seamless incorporation of trainable components into the program, (4) providing a level of inference over trainable components to support combining sources and decisions in ways that respect domain—s or application—s constraints, and (5) a compilation process that turns a data-dependent high level program into an explicit program, once data is observed.
This chapter describes preliminary work towards the design of such a language, presents some of the theoretical foundations for it and outlines a first generation implementation of a Learning based Programming language, along with some examples.
Dan Roth

4. N-1 Experiments Suffice to Determine the Causal Relations Among N Variables

By combining experimental interventions with search procedures for graphical causal models we show that under familiar assumptions, with perfect data, N - 1 experiments suffice to determine the causal relations among N>2 variables when each experiment randomizes at most one variable. We show the same bound holds for adaptive learners, but does not hold for N > 4 when each experiment can simultaneously randomize more than one variable. This bound provides a type of ideal for the measure of success of heuristic approaches in active learning methods of causal discovery, which currently use less informative measures.
Frederick Eberhardt, Clark Glymour, Richard Scheines

5. Support Vector Inductive Logic Programming

In this paper we explore a topic which is at the intersection of two areas of Machine Learning: namely Support Vector Machines (SVMs) and Inductive Logic Programming (ILP). We propose a general method for constructing kernels for Support Vector Inductive Logic Programming (SVILP). The kernel not only captures the semantic and syntactic relational information contained in the data but also provides the flexibility of using arbitrary forms of structured and non-structured data coded in a relational way. While specialised kernels have been developed for strings, trees and graphs our approach uses declarative background knowledge to provide the learning bias. The use of explicitly encoded background knowledge distinguishes SVILP from existing relational kernels which in ILP-terms work purely at the atomic generalisation level. The SVILP approach is a form of generalisation relative to background knowledge, though the final combining function for the ILP-learned clauses is an SVM rather than a logical conjunction. We evaluate SVILP empirically against related approaches, including an industry-standard toxin predictor called TOPKAT. Evaluation is conducted on a new broad-ranging toxicity dataset (DSSTox). The experimental results demonstrate that our approach significantly outperforms all other approaches in the study.
S. H. Muggleton, H. Lodhi, A. Amini, M. J. E. Sternberg

6. Neural Probabilistic Language Models

A central goal of statistical language modeling is to learn the joint probability function of sequences of words in a language. This is intrinsically difficult because of the curse of dimensionality: a word sequence on which the model will be tested is likely to be different from all the word sequences seen during training. Traditional but very successful approaches based on n-grams obtain generalization by concatenating very short overlapping sequences seen in the training set. We propose to fight the curse of dimensionality by learning a distributed representation for words which allows each training sentence to inform the model about an exponential number of semantically neighboring sentences. Generalization is obtained because a sequence of words that has never been seen before gets high probability if it is made of words that are similar (in the sense of having a nearby representation) to words forming an already seen sentence. Training such large models (with millions of parameters) within a reasonable time is itself a significant challenge. We report on several methods to speed-up both training and probability computation, as well as comparative experiments to evaluate the improvements brought by these techniques. We finally describe the incorporation of this new language model into a state-of-the-art speech recognizer of conversational speech.
Yoshua Bengio, Holger Schwenk, Jean-Sébastien Senécal, Fréderic Morin, Jean-Luc Gauvain

7. Computational Grammatical Inference

Grammatical Inference (GI) concentrates on finding compact representations, i.e. grammars, of possibly infinite sets of sentences. These grammars describe what sentences do or do not belong to a particular language. The process of learning the form of a grammar based on example sentences from the language touches several fields. Here, we give an overview of the field of GI as well as fields that are closely related. We discuss linguistic, empirical, and formal grammatical inference and discuss the work that falls in the areas where these fields overlap.
Pieter W. Adriaans, Menno M. van Zaanen

8. On Kernel Target Alignment

Kernel based methods are increasingly being used for data modeling because of their conceptual simplicity and outstanding performance on many tasks. However, in practice the kernel function is often chosen using trial-and-error heuristics. In this paper we address the problem of measuring the degree of agreement between a kernel and a learning task. We propose a quantity to capture this notion, which we call alignment. We study its theoretical properties, and derive a series of simple algorithms for adapting a kernel to the targets. This produces a series of novel methods for both transductive and inductive inference, kernel combination and kernel selection for both classification and regression problems that are computationally feasible for large problems. The algorithms are tested on publicly available datasets and are shown to exhibit good performance.
Nello Cristianini, Jaz Kandola, Andre Elisseeff, John Shawe-Taylor

9. The Structure of Version Space

We investigate the generalisation performance of consistent classifiers, i.e. classifiers that are contained in the so-called version space, both from a theoretical and experimental angle. In contrast to classical VC analysis—where no single classifier within version space is singled out on grounds of a generalisation error bound—the data dependent structural risk minimisation framework suggests that there exists one particular classifier that is to be preferred because it minimises the generalisation error bound. This is usually taken to provide a theoretical justification for learning algorithms such as the well known support vector machine. A reinterpretation of a recent PAC-Bayesian result, however, reveals that given a suitably chosen hypothesis space there exists a large fraction of classifiers with small generalisation error albeit we cannot identify them for a specific learning task. In the particular case of linear classifiers we show that classifiers found by the classical p erceptron algorithm have guarantees bounded by the size of version space. These results are complemented with an empirical study for kernel classifiers on the task of handwritten digit recognition which demonstrates that even classifiers with a small margin may exhibit excellent generalisation. In order to perform this analysis we introduce the kernel Gibbs sampler—an algorithm which can be used to sample consistent kernel classifiers.
Ralf Herbrich, Thore Graepel, Robert C. Williamson


Weitere Informationen

Premium Partner


    Die im Laufe eines Jahres in der „adhäsion“ veröffentlichten Marktübersichten helfen Anwendern verschiedenster Branchen, sich einen gezielten Überblick über Lieferantenangebote zu verschaffen.