
## About this book

This handbook presents some of the most recent topics in neural information processing, covering both theoretical concepts and practical applications. The contributions include:

- Deep architectures
- Recurrent, recursive, and graph neural networks
- Cellular neural networks
- Bayesian networks
- Approximation capabilities of neural networks
- Semi-supervised learning
- Statistical relational learning
- Kernel methods for structured data
- Multiple classifier systems
- Self organisation and modal learning
- Applications to content-based image retrieval, text mining in large document collections, and bioinformatics

This book is intended particularly for graduate students, researchers, and practitioners who wish to deepen their knowledge of more advanced connectionist models and related learning paradigms.

## Table of contents

### Deep Learning of Representations

Abstract
Unsupervised learning of representations has been found useful in many applications and benefits from several advantages, e.g., where there are many unlabeled examples and few labeled ones (semi-supervised learning), or where the unlabeled or labeled examples are from a distribution different but related to the one of interest (self-taught learning, multi-task learning, and domain adaptation). Some of these algorithms have successfully been used to learn a hierarchy of features, i.e., to build a deep architecture, either as initialization for a supervised predictor, or as a generative model. Deep learning algorithms can yield representations that are more abstract and better disentangle the hidden factors of variation underlying the unknown generating distribution, i.e., representations that capture invariances and reveal non-local structure in that distribution. This chapter reviews the main motivations and ideas behind deep learning algorithms and their representation-learning components, as well as recent results in this area, and proposes a vision of challenges and hopes on the road ahead, focusing on the questions of invariance and disentangling.
Yoshua Bengio, Aaron Courville

### Recurrent Neural Networks

Abstract
This chapter presents an introduction to recurrent neural networks for readers familiar with artificial neural networks in general, and multi-layer perceptrons trained with gradient descent algorithms (back-propagation) in particular. A recurrent neural network (RNN) is an artificial neural network with internal loops. These internal loops induce recursive dynamics in the networks and thus introduce delayed activation dependencies across the processing elements (PEs) in the network.
Sajid A. Marhon, Christopher J. F. Cameron, Stefan C. Kremer
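
As an illustration of the internal loop described in this abstract, the following is a minimal sketch of a single-unit recurrent network. It is not taken from the chapter: the function name `rnn_forward`, the `tanh` activation, and the weight values are assumptions chosen only to show how a recurrent weight makes the current state depend on earlier inputs.

```python
import math


def rnn_forward(inputs, w_in, w_rec, bias=0.0, h0=0.0):
    """Run a one-unit recurrent network over a sequence of scalar inputs.

    State update (an assumed, textbook-style form):
        h_t = tanh(w_in * x_t + w_rec * h_{t-1} + bias)
    """
    h = h0
    states = []
    for x in inputs:
        # The term w_rec * h is the internal loop: the previous
        # activation feeds back into the current one.
        h = math.tanh(w_in * x + w_rec * h + bias)
        states.append(h)
    return states


# A single impulse followed by zeros.
impulse = [1.0, 0.0, 0.0]

# With a recurrent weight, the impulse keeps influencing later states.
with_loop = rnn_forward(impulse, w_in=1.0, w_rec=0.9)

# With the loop removed (w_rec = 0), the state forgets the impulse at once.
no_loop = rnn_forward(impulse, w_in=1.0, w_rec=0.0)
```

Comparing `with_loop` and `no_loop` shows the delayed activation dependency: only the network with a nonzero recurrent weight still carries a trace of the first input at later time steps.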

### Supervised Neural Network Models for Processing Graphs

Abstract
An intelligent agent interacts with the environment in which it lives, making its decisions on the basis of sensory data that describe the specific context in which the agent is currently operating. These measurements compose the environment representation that is processed by the agent’s decision algorithms and, hence, they should provide sufficient information to yield the correct actions to support the agent’s life. In general, the developed environment description is redundant, to provide robustness with respect to noise and possibly missing data. On the other hand, a proper organization of the input data can ease the development of successful processing and decision schemes.
Monica Bianchini, Marco Maggini

### Topics on Cellular Neural Networks

Abstract
In this chapter we present the CNN paradigm introduced by Chua and Yang and several analog parallel architectures inspired by it as well as aspects regarding their spatio-temporal dynamics and applications.
Liviu Goraş, Ion Vornicu, Paul Ungureanu

### Approximating Multivariable Functions by Feedforward Neural Nets

Abstract
Theoretical results on approximation of multivariable functions by feedforward neural networks are surveyed. Some proofs of universal approximation capabilities of networks with perceptrons and radial units are sketched. Major tools for estimation of rates of decrease of approximation errors with increasing model complexity are proven. Properties of best approximation are discussed. Recent results on dependence of model complexity on input dimension are presented and some cases when multivariable functions can be tractably approximated are described.
Paul C. Kainen, Věra Kůrková, Marcello Sanguineti

### Bochner Integrals and Neural Networks

Abstract
A Bochner integral formula $f = \mathcal{B}\text{-}\int_Y w(y)\,\Phi(y)\,d\mu(y)$ is derived that presents a function $f$ in terms of weights $w$ and a parametrized family of functions $\Phi(y)$, $y \in Y$. Comparison is made to pointwise formulations, norm inequalities relating pointwise and Bochner integrals are established, G-variation and tensor products are studied, and examples are presented.
Paul C. Kainen, Andrew Vogt

### Semi-supervised Learning

Abstract
In traditional supervised learning, one uses “labeled” data to build a model. However, labeling the training data for real-world applications is difficult, expensive, or time consuming, as it requires the effort of human annotators, sometimes with specific domain experience and training. There are implicit costs associated with obtaining these labels from domain experts, such as limited time and financial resources. This is especially true for applications that involve learning with a large number of class labels, sometimes with similarities among them. Semi-supervised learning (SSL) addresses this inherent bottleneck by allowing the model to integrate part or all of the available unlabeled data in its supervised learning. The goal is to maximize the learning performance of the model through such newly-labeled examples while minimizing the work required of human annotators. Exploiting unlabeled data to help improve the learning performance has become a hot topic during the last decade and it is divided into four main directions: SSL with graphs, SSL with generative models, semi-supervised support vector machines and SSL by disagreement (SSL with committees). This survey article provides an overview of research advances in this branch of machine learning.
Mohamed Farouk Abdel Hady, Friedhelm Schwenker

### Statistical Relational Learning

Abstract
Relational learning refers to learning from data that have a complex structure. This structure may be either internal (a data instance may itself have a complex structure) or external (relationships between this instance and other data elements). Statistical relational learning refers to the use of statistical learning methods in a relational learning context, and the challenges involved in that. In this chapter we give an overview of statistical relational learning. We start with some motivating problems, and continue with a general description of the task of (statistical) relational learning and some of its more concrete forms (learning from graphs, learning from logical interpretations, learning from relational databases). Next, we discuss a number of approaches to relational learning, starting with symbolic (non-probabilistic) approaches, and moving on to numerical and probabilistic methods. Methods discussed include inductive logic programming, relational neural networks, and probabilistic logical or relational models.
Hendrik Blockeel

### Kernel Methods for Structured Data

Abstract
Kernel methods are a class of non-parametric learning techniques relying on kernels. A kernel generalizes dot products to arbitrary domains and can thus be seen as a similarity measure between data points with complex structures. The use of kernels allows the representation of the data to be decoupled from the specific learning algorithm, provided the algorithm can be defined in terms of distance or similarity between instances. Under this unifying formalism a wide range of methods have been developed, dealing with binary and multiclass classification, regression, ranking, clustering and novelty detection, to name a few. Recent developments include statistical tests of dependency and alignments between related domains, such as documents written in different languages. Key to the success of any kernel method is the definition of an appropriate kernel for the data at hand. A well-designed kernel should capture the aspects characterizing similar instances while being computationally efficient. Building on the seminal work by D. Haussler on convolution kernels, a vast literature on kernels for structured data has arisen. Kernels have been designed for sequences, trees and graphs, as well as arbitrary relational data represented in first or higher order logic. From the representational viewpoint, this has made it possible to address one of the main limitations of statistical learning approaches, namely the difficulty of dealing with complex domain knowledge. Interesting connections between the complementary fields of statistical and symbolic learning have arisen as one of the consequences. Another interesting connection made possible by kernels is between generative and discriminative learning. Here data are represented with generative models and appropriate kernels are built on top of them to be used in a discriminative setting.
Andrea Passerini
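
To make the idea of a kernel as a dot product over structured data concrete, here is a small sketch of a k-spectrum kernel for strings. This particular kernel is chosen for illustration and is not claimed to be the one emphasised in the chapter; the function names and the default `k=2` are assumptions. Each string is mapped, implicitly, to a vector of substring counts, and the kernel value is the dot product of those count vectors.

```python
from collections import Counter


def spectrum_features(s, k):
    """Count every contiguous substring of length k in s."""
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))


def spectrum_kernel(s, t, k=2):
    """Dot product of the k-mer count vectors of two strings.

    Substrings absent from one string contribute zero, so only the
    shared k-mers matter.
    """
    fs = spectrum_features(s, k)
    ft = spectrum_features(t, k)
    return sum(count * ft[sub] for sub, count in fs.items())


# Strings sharing many 2-mers are similar, disjoint ones are not.
similar = spectrum_kernel("abab", "baba")
unrelated = spectrum_kernel("abab", "cccc")
```

Because the function returns an ordinary inner product, any learning algorithm that only needs pairwise similarities could, in principle, be applied to strings through it without ever building the count vectors explicitly for the whole vocabulary.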

### Multiple Classifier Systems: Theory, Applications and Tools

Abstract
In many Pattern Recognition applications, the achievement of acceptable recognition rates is conditioned by the large pattern variability, whose distribution cannot be simply modeled.
This affects the results at each stage of the recognition system so that, once this has been designed, its performance cannot be improved over a certain bound, despite the efforts in refining either the classification or the description method.
Francesco Gargiulo, Claudio Mazzariello, Carlo Sansone

### Self Organisation and Modal Learning: Algorithms and Applications

Abstract
Modal learning in neural computing [33] refers to the strategic combination of modes of adaptation and learning within a single artificial neural network structure. Modes, in this context, are learning methods that are transferable from one learning architecture to another, such as weight update equations. In modal learning two or more modes may proceed in parallel in different parts of the neural computing structure (layers and neurons), or they occupy the same part of the structure, and there is a mechanism for allowing the neural network to switch between modes.
From a theoretical perspective any individual mode has inherent limitations because it is trying to optimise a particular objective function. Since we cannot in general know a priori the most effective learning method or combination of methods for solving a given problem, we should equip the system (the neural network) with the means to discover the optimal combination of learning modes during the learning process. There is potential to furnish a neural system with numerous modes. Most of the work conducted so far concentrates on the effectiveness of two to four modes. The modal learning approach applies equally to supervised and unsupervised (including self organisational) methods. In this chapter, we focus on modal self organisation.
Examples of modal learning methods include the Snap-Drift Neural Network (SDNN) [5, 25, 28, 33, 32] which toggles its learning between two modes, an adaptive function neural network, in which adaptation applies simultaneously to both the weights and to the shape of the individual neuron activation functions, and the combination of four learning modes, in the form of Snap-drift ADaptive FUnction Neural Network [17, 18, 33]. In this chapter, after reviewing modal learning in general, we present some example methods of modal self organisation. Self organisation is taken in the broadest context to include unsupervised methods. We review the simple unsupervised modal method called snap-drift [5, 25, 28, 32], which combines Learning Vector Quantization [21, 22, 23, 37] with a ‘Min’ or Fuzzy AND method. Snap-drift is then applied to the Self-Organising Map [34]. The methods are utilised in numerous real-world problems such as grouping learners’ responses to multiple choice questions, natural language phrase recognition and pattern classification on well known datasets. Algorithms, dataset descriptions, pseudocode and Matlab code are presented.
Dominic Palmer-Brown, Chrisina Jayne

### Bayesian Networks, Introduction and Practical Applications

Abstract
In this chapter, we will discuss Bayesian networks, a currently widely accepted modeling class for reasoning with uncertainty. We will take a practical point of view, putting emphasis on modeling and practical applications rather than on mathematical formalities and the advanced algorithms that are used for computation. In general, Bayesian network modeling can be data driven. In this chapter, however, we restrict ourselves to modeling based on domain knowledge only. We will start with a short theoretical introduction to Bayesian network models and inference. We will describe some of the typical usages of Bayesian network models, e.g. for reasoning and diagnostics; furthermore, we will describe some typical network behaviors such as the explaining away phenomenon, and we will briefly discuss the common approach to network model design by causal modeling. We will illustrate these matters by a detailed modeling and application of a toy model for medical diagnosis. Next, we will discuss two real-world applications. In particular we will discuss the modeling process in some detail. With these examples we also aim to illustrate that the modeling power of Bayesian networks goes further than suggested by the common textbook toy applications. The first application that we will discuss is for victim identification by kinship analysis based on DNA profiles. The distinguishing feature in this application is that Bayesian networks are generated and computed on-the-fly, based on case information. The second one is an application for petrophysical decision support to determine the mineral content of a well based on borehole measurements. This model illustrates the possibility to model with continuous variables and nonlinear relations.
Wim Wiegerinck, Willem Burgers, Bert Kappen
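
The explaining away phenomenon mentioned in this abstract can be sketched with a three-node network in which two independent causes, A and B, share one effect E. The network, its probability values, and the function names below are hypothetical and are not the chapter's medical toy model; inference is done by brute-force enumeration, which is only practical for very small networks.

```python
from itertools import product

# Hypothetical parameters for the network A -> E <- B.
P_A = 0.1  # prior probability that cause A is present
P_B = 0.1  # prior probability that cause B is present
# P(E = 1 | A, B), indexed by (a, b)
P_E = {(0, 0): 0.01, (0, 1): 0.8, (1, 0): 0.8, (1, 1): 0.95}


def joint(a, b, e):
    """Joint probability P(A=a, B=b, E=e) from the factorisation."""
    pa = P_A if a else 1 - P_A
    pb = P_B if b else 1 - P_B
    pe = P_E[(a, b)] if e else 1 - P_E[(a, b)]
    return pa * pb * pe


def posterior_a(evidence):
    """P(A=1 | evidence) by summing the joint over consistent worlds.

    evidence is a dict over the keys "b" and "e", e.g. {"e": 1}.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, e in product((0, 1), repeat=3):
        world = {"a": a, "b": b, "e": e}
        if all(world[key] == value for key, value in evidence.items()):
            p = joint(a, b, e)
            denominator += p
            if a:
                numerator += p
    return numerator / denominator


# Observing the effect raises belief in A ...
p_a_given_e = posterior_a({"e": 1})
# ... but also learning that B is present lowers it again.
p_a_given_e_and_b = posterior_a({"e": 1, "b": 1})
```

With these assumed numbers, belief in A rises well above its prior once E is observed and falls back toward the prior once B is also known to be present, since B already accounts for the effect.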

### Relevance Feedback in Content-Based Image Retrieval: A Survey

Abstract
In content-based image retrieval, relevance feedback is an interactive process, which builds a bridge to connect users with a search engine. It leads to much improved retrieval performance by updating a query and similarity measures according to a user’s preference; and recently techniques have matured to some extent. Most previous relevance feedback approaches exploit short-term learning (intra-query learning) that deals with the current feedback session but ignores historical data from other users, which potentially results in a great loss of useful information. In the last few years, long-term learning (inter-query learning), which records and collects feedback knowledge from different users over a variety of query sessions, has played an increasingly important role in multimedia information searching. It can further improve the retrieval performance in terms of effectiveness and efficiency. In the published literature, no comprehensive survey of both short-term learning and long-term learning RF techniques has been conducted. To this end, the goal of this chapter is to address this omission and offer suggestions for future work.
Jing Li, Nigel M. Allinson

### Learning Structural Representations of Text Documents in Large Document Collections

Abstract
The main aim of this chapter is to study the effects of structural representation of text documents when applying a connectionist approach to modelling the domain. While text documents are often processed unstructured, we will show in this chapter that the performance and problem solving capability of machine learning methods can be enhanced through the use of suitable structural representations of text documents. It will be shown that the extraction of structure from text documents does not require a knowledge of the underlying semantic relationships among words used in the text. This chapter describes an extension of the bag of words approach. Incorporating the “relatedness” of word tokens as they are used in the context of a document results in a structural representation of text documents which is richer in information than the bag of words approach alone. An application to very large datasets for a classification and a regression problem will show that our approach scales very well. The classification problem will be tackled by the latest in a series of techniques which applied the idea of the self organizing map to graph domains. It is shown that with the incorporation of the relatedness information as expressed using the Concept Link Graph, the resulting clusters are tighter when compared with those obtained using a self organizing map alone with a bag of words representation. The regression problem is to rank a text corpus. In this case, the idea is to include content information in the ranking of documents and compare the resulting rankings with those obtained using PageRank. In this case, the results are inconclusive, possibly due to the truncation of the Concept Link Graph representations. It is conjectured that the ranking of documents will be sped up if we include the Concept Link Graph representation of all documents together with their hyperlinked structure.
The methods described in this chapter are capable of solving real world and data mining problems.
Ah Chung Tsoi, Markus Hagenbuchner, Milly Kc, ShuJia Zhang

### Neural Networks in Bioinformatics

Abstract
Bioinformatics or computational biology is a multidisciplinary research area that combines molecular biology, computer science, and mathematics. Its aims are to organize, utilize and explore the vast amount of information obtained from biological experiments for understanding the relationships and useful patterns in data. Bioinformatics problems, such as protein structure prediction and sequence alignments, are commonly categorized as non-deterministic polynomial problems, and require sophisticated algorithms and powerful computational resources. Artificial Intelligence (AI) techniques have a proven track record in the development of many research areas in the applied sciences. Among the AI techniques, artificial neural networks (ANNs) and their variations have proven to be one of the more powerful tools in terms of their generalization and pattern recognition capabilities. In this chapter, we review a number of bioinformatics problems solved by different artificial neural network architectures.
Masood Zamani, Stefan C. Kremer

### Backmatter
