
2003 | Book

Hierarchical Neural Networks for Image Interpretation

Author: Sven Behnke

Publisher: Springer Berlin Heidelberg

Book series: Lecture Notes in Computer Science


About this Book

Human performance in visual perception by far exceeds the performance of contemporary computer vision systems. While humans are able to perceive their environment almost instantly and reliably under a wide range of conditions, computer vision systems work well only under controlled conditions in limited domains.

This book sets out to reproduce the robustness and speed of human perception by proposing a hierarchical neural network architecture for iterative image interpretation. The proposed architecture can be trained using unsupervised and supervised learning techniques.

Applications of the proposed architecture are illustrated using small networks. Furthermore, several larger networks were trained to perform various nontrivial computer vision tasks.

Table of Contents

Frontmatter

Introduction

Introduction
Abstract
Visual perception is important for both humans and computers. Humans are visual animals. To appreciate its importance, just imagine how losing your sight would affect you. We extract most information about the world around us by seeing.
This is possible because photons sensed by the eyes carry information about the world. On their way from light sources to the photoreceptors they interact with objects and get altered by this process. For instance, the wavelength of a photon may reveal information about the color of a surface it was reflected from. Sudden changes in the intensity of light along a line may indicate the edge of an object. By analyzing intensity gradients, the curvature of a surface may be recovered. Texture or the type of reflection can be used to further characterize surfaces. The change of visual stimuli over time is an important source of information as well. Motion may indicate the change of an object’s pose or reflect ego-motion. Synchronous motion is a strong hint for segmentation, the grouping of visual stimuli to objects because parts of the same object tend to move together.
Sven Behnke

Part I. Theory

Frontmatter
Neurobiological Background
Abstract
Learning from nature is a principle that has inspired many technical developments. There is even a field of science concerned with this issue: bionics. Many problems that arise in technical applications have already been solved by biological systems because evolution has had millions of years to search for a solution. Understanding nature’s approach allows us to apply the same principles for the solution of technical problems.
Sven Behnke
Related Work
Abstract
In the previous chapter, we saw that object recognition in the human visual system is based on a hierarchy of retinotopic feature maps with local recurrent connectivity. This chapter reviews several applications of the concepts of hierarchy and recurrence to the representation, processing, and interpretation of images with computers.
Sven Behnke
Neural Abstraction Pyramid Architecture
Abstract
The last two chapters reviewed what is known about object recognition in the human brain and how the concepts of hierarchy and recurrence have been applied to image processing. Now it is time to put both together.
In this chapter, an architecture for image interpretation is defined that will be used for the remainder of this thesis. I will refer to this architecture as the Neural Abstraction Pyramid. The Neural Abstraction Pyramid is a neurobiologically inspired hierarchical neural network with local recurrent connectivity. Images are represented at multiple levels of abstraction. Local connections form horizontal and vertical feedback loops between simple processing elements. This allows ambiguities to be resolved through the flexible use of partial interpretation results as context.
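The recurrent update described above can be sketched in a few lines. The sketch below is an illustrative simplification, not the book's exact formulation: it assumes a single feature array per level, a 3×3 lateral neighborhood averaged with uniform weights, and hypothetical scalar gains `w_b`, `w_l`, `w_t` for the bottom-up, lateral, and top-down inputs.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def update_level(bottom_up, lateral_prev, top_down, w_b=1.0, w_l=0.5, w_t=0.5):
    """One iteration of a simplified pyramid-level update.

    bottom_up   : input driven by the level below, shape (H, W)
    lateral_prev: this level's activity from the previous iteration, (H, W)
    top_down    : feedback projected down from the level above, (H, W)
    """
    # Lateral influence: 3x3 local average of the previous activity
    h, w = lateral_prev.shape
    padded = np.pad(lateral_prev, 1, mode="edge")
    lateral = sum(padded[i:i + h, j:j + w]
                  for i in range(3) for j in range(3)) / 9.0
    # Combine the three input streams through a saturating nonlinearity
    return sigmoid(w_b * bottom_up + w_l * lateral + w_t * top_down)

# Iterative interpretation: repeat the update so context can propagate
act = np.zeros((8, 8))
bottom = np.random.rand(8, 8)
top = np.zeros((8, 8))
for _ in range(10):
    act = update_level(bottom, act, top)
```

Because each element sees only a small neighborhood, information spreads gradually across iterations; repeated updates are what let distant context influence a local decision.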
Sven Behnke
Unsupervised Learning
Abstract
The example networks presented so far were designed manually to highlight different features of the Neural Abstraction Pyramid architecture. While the manually designed networks are relatively easy to interpret, their utility is limited by the low network complexity. Only relatively few features can be designed manually. If multiple layers of abstraction are needed, the design complexity explodes with height, as the number of different feature arrays and the number of potential weights per feature increase exponentially.
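The growth argument above can be made concrete with a back-of-envelope calculation. The numbers below are hypothetical, chosen only for illustration: the count of feature arrays is assumed to double per level, and each feature is assumed to receive potential weights from every array one level below plus its own level.

```python
# Illustrative only: assume the number of feature arrays doubles with each
# level, and each feature connects to all arrays one level below plus all
# arrays of its own level (hypothetical numbers, not taken from the book).
features = [4 * 2**level for level in range(5)]           # 4, 8, 16, 32, 64
weights_per_feature = [features[max(level - 1, 0)] + features[level]
                       for level in range(5)]             # 8, 12, 24, 48, 96
# Manual design effort scales with (features to design) x (weights each)
design_effort = [f * w for f, w in zip(features, weights_per_feature)]
print(design_effort)  # [32, 96, 384, 1536, 6144] -- roughly 4x per level
```

Even under these modest assumptions, the per-level design effort multiplies by about four with every added level, which is why learned rather than hand-crafted features become necessary for deep hierarchies.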
Sven Behnke
Supervised Learning
Abstract
In the last chapter, supervised learning has already been used to classify the outputs of a Neural Abstraction Pyramid that was trained with unsupervised learning. In this chapter, it is discussed how supervised learning techniques can be applied in the Neural Abstraction Pyramid itself.
After an introduction, supervised learning in feed-forward neural networks is covered. Attention is paid to the issues of weight sharing and the handling of network borders, which are relevant for the Neural Abstraction Pyramid architecture. Section 6.3 discusses supervised learning for recurrent networks. The difficulty of gradient computation in recurrent networks makes it necessary to employ algorithms that use only the sign of the gradient to update the weights.
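The best-known family of sign-only update rules is RPROP, so a sketch in that style illustrates the idea; whether the book uses exactly this variant is not stated here, and the hyperparameter values below are the conventional defaults, not taken from the text.

```python
import numpy as np

def rprop_step(w, grad, prev_grad, step, eta_plus=1.2, eta_minus=0.5,
               step_min=1e-6, step_max=50.0):
    """One RPROP-style update: each weight has its own step size that grows
    while the gradient sign is stable and shrinks when it flips. Only the
    SIGN of the gradient is used -- its magnitude never enters the update,
    which sidesteps poorly scaled gradients in recurrent networks."""
    same_sign = grad * prev_grad
    step = np.where(same_sign > 0, np.minimum(step * eta_plus, step_max), step)
    step = np.where(same_sign < 0, np.maximum(step * eta_minus, step_min), step)
    return w - np.sign(grad) * step, step

# Toy usage: minimize f(w) = w^2 elementwise (gradient 2w)
w = np.array([3.0, -2.0])
step = np.full_like(w, 0.1)
prev_grad = np.zeros_like(w)
for _ in range(100):
    grad = 2.0 * w
    w, step = rprop_step(w, grad, prev_grad, step)
    prev_grad = grad
```

The step sizes adapt automatically: they expand to cross flat regions quickly and contract near a minimum when the gradient sign starts oscillating.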
Sven Behnke

Part II. Applications

Frontmatter
Recognition of Meter Values
Abstract
The remainder of the thesis applies the proposed Neural Abstraction Pyramid to several computer vision tasks in order to investigate the performance of this approach.
This chapter deals with the recognition of postage meter values. A feed-forward Neural Abstraction Pyramid is trained in a supervised fashion to solve a pattern recognition task. The network classifies an entire digit block and thus does not need prior digit segmentation. If the block recognition is not confident enough, a second stage tries to recognize single digits, taking into account the block classifier output for a neighboring digit as context. The system is evaluated on a large database.
Sven Behnke
Binarization of Matrix Codes
Abstract
In this chapter, the binarization of matrix codes is investigated as an application of supervised learning of image processing tasks using a recurrent version of the Neural Abstraction Pyramid.
The desired network output is computed using an adaptive thresholding method applied to high-contrast images. The network is then trained to produce this output iteratively, even when the contrast is lowered and typical noise is added to the input.
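A common form of adaptive thresholding compares each pixel against the mean of its local neighborhood; the sketch below uses that local-mean scheme with an integral image for efficiency. This is a generic method offered for illustration, and the window size and offset are assumed values, not necessarily the book's exact target-generation procedure.

```python
import numpy as np

def adaptive_threshold(img, window=15, offset=0.02):
    """Binarize via a local-mean threshold: a pixel is foreground (1)
    when it is darker than its neighborhood mean by more than `offset`.
    `img` is a float array in [0, 1]; `window` is an odd side length."""
    h, w = img.shape
    pad = window // 2
    padded = np.pad(img, pad, mode="edge")
    # Integral image: window sums in O(1) per pixel
    integ = np.pad(padded, ((1, 0), (1, 0))).cumsum(0).cumsum(1)
    sums = (integ[window:, window:] - integ[:-window, window:]
            - integ[window:, :-window] + integ[:-window, :-window])
    local_mean = sums / (window * window)
    return (img < local_mean - offset).astype(np.uint8)

# Toy usage: a dark square on a bright background
img = np.full((32, 32), 0.8)
img[10:20, 10:20] = 0.2
binary = adaptive_threshold(img)
```

Because the threshold follows local image statistics, the same rule produces clean targets across images whose global contrast or illumination varies.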
Sven Behnke
Learning Iterative Image Reconstruction
Abstract
Successful image reconstruction requires the recognition of a scene and the generation of a clean image of that scene. In this chapter, I show how to use Neural Abstraction Pyramid networks for both analysis and synthesis of images. The networks have a hierarchical architecture which represents images in multiple scales with different degrees of abstraction. The mapping between these representations is mediated by a local recurrent connection structure.
Degraded images are shown to the networks which are trained to reconstruct the originals iteratively. Through iterative reconstruction, partial results provide context information that eliminates ambiguities.
The performance of this approach is demonstrated in this chapter by applying it to four tasks: super-resolution, filling-in of occluded parts, noise removal / contrast enhancement, and reconstruction from sequences of degraded images.
Sven Behnke
Face Localization
Abstract
One of the major tasks in human-computer interface applications, such as face recognition and video-telephony, is the exact localization of a face in an image.
In this chapter, I use the Neural Abstraction Pyramid architecture to solve this problem, even in the presence of complex backgrounds, difficult lighting, and noise. The network is trained using a database of gray-scale still images to reproduce manually determined eye coordinates. It is able to generate reliable and accurate eye coordinates for unknown images by iteratively refining an initial solution.
The performance of the proposed approach is evaluated against a large test set. It is also shown that a moving face can be tracked. The fast network update allows for real-time operation.
Sven Behnke
Summary and Conclusions
Abstract
In order to overcome limitations of current computer vision systems, this thesis proposed an architecture for image interpretation, called Neural Abstraction Pyramid. This hierarchical architecture consists of simple processing elements that interact with their neighbors. The recurrent interactions are described by weight templates. Weighted links form horizontal and vertical feedback loops that mediate contextual influences. Images are transformed into a sequence of representations that become increasingly abstract as their spatial resolution decreases, while feature diversity as well as invariance increase. This process works iteratively. If the interpretation of an image patch cannot be decided locally, the decision is deferred until contextual evidence arrives that can be used as bias. Local ambiguities are resolved in this way.
Sven Behnke
Backmatter
Metadata
Title
Hierarchical Neural Networks for Image Interpretation
Author
Sven Behnke
Copyright year
2003
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-540-45169-3
Print ISBN
978-3-540-40722-5
DOI
https://doi.org/10.1007/b11963