
2003 | Book

Front-End Vision and Multi-Scale Image Analysis

Multi-Scale Computer Vision Theory and Applications, written in Mathematica

Author: Prof. Dr. Bart M. ter Haar Romeny

Publisher: Springer Netherlands

Book series: Computational Imaging and Vision


About this book

Many approaches have been proposed to solve the problem of finding the optic flow field of an image sequence. Four major classes of optic flow computation techniques can be discriminated (see Beauchemin and Barron [Beauchemin1995] for a good overview): gradient-based (or differential) methods; phase-based (or frequency-domain) methods; correlation-based (or area) methods; and feature-point (or sparse-data) tracking methods. In this chapter we compute the optic flow as a dense optic flow field with a multi-scale differential method. The method, originally proposed by Florack and Nielsen [Florack1998a], is known as the Multiscale Optic Flow Constraint Equation (MOFCE). This is a scale-space version of the well-known computer vision implementation of the optic flow constraint equation, as originally proposed by Horn and Schunck [Horn1981]. This scale-space variation, as usual, consists of the introduction of the aperture of the observation into the process. The application to stereo has been described by Maas et al. [Maas1995a, Maas1996a]. Of course, difficulties arise when structure emerges or disappears, such as with occlusion, cloud formation, etc. Then knowledge is needed about the processes and objects involved. In this chapter we focus on the scale-space approach to the local measurement of optic flow, as we may expect the visual front-end to do.

17.2 Motion detection with pairs of receptive fields. As a biologically motivated start, we begin by discussing some neurophysiological findings in the visual system with respect to motion detection.
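The gradient-based route can be sketched numerically. The constraint equation Lx·u + Ly·v + Lt = 0 gives one equation for two unknowns per pixel (the aperture problem), so this minimal NumPy sketch solves it in least squares over a local window, in the spirit of Lucas and Kanade rather than the MOFCE itself; helper names and parameter values are illustrative:

```python
import numpy as np

def lk_flow(f1, f2, win=9):
    """Least-squares solution of the optic flow constraint
    Ix*u + Iy*v + It = 0 over a window at the image centre
    (an illustrative helper, not the book's MOFCE)."""
    Iy, Ix = np.gradient(f1)            # spatial derivatives
    It = f2 - f1                        # temporal derivative
    c, h = f1.shape[0] // 2, win // 2
    s = slice(c - h, c + h + 1)
    A = np.stack([Ix[s, s].ravel(), Iy[s, s].ravel()], axis=1)
    (u, v), *_ = np.linalg.lstsq(A, -It[s, s].ravel(), rcond=None)
    return u, v

# synthetic pair: a Gaussian blob shifted by half a pixel in x
y, x = np.mgrid[0:64, 0:64]
blob = lambda dx: np.exp(-((x - 32 - dx) ** 2 + (y - 32) ** 2) / 18.0)
u, v = lk_flow(blob(0.0), blob(0.5))
```

For this smooth sub-pixel translation the recovered (u, v) lies close to (0.5, 0); real sequences need the multi-scale treatment of the chapter.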

Table of contents

Frontmatter
1. Apertures and the notion of scale
Observations are always done by integrating some physical property with a measurement device. Integration can be done over a spatial area, over an amount of time, over wavelengths etc. depending on the task of the physical measurement. For example, we can integrate the emitted or reflected light intensity of an object with a CCD (charge-coupled device) detector element in a digital camera, or a grain in the photographic emulsion in a film, or a photoreceptor in our eye. These ‘devices’ have a sensitive area, where the light is collected. This is the aperture for this measurement. Today’s digital cameras have several million ‘pixels’ (picture elements), very small squares where the incoming light is integrated and transformed into an electrical signal. The size of such pixels/apertures determines the maximal sharpness of the resulting picture.
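As a toy illustration of the aperture (a NumPy sketch, not the book's Mathematica code): a one-dimensional "detector" that averages the incoming signal over a box aperture shows how fine detail survives a small aperture but is integrated away by a large one:

```python
import numpy as np

def measure(signal, aperture):
    """A toy 1D detector: integrate (average) the incoming signal
    over a square aperture of the given width in samples."""
    k = np.ones(aperture) / aperture
    return np.convolve(signal, k, mode="same")

x = np.linspace(0.0, 1.0, 1000)
fine_detail = np.sin(2 * np.pi * 100 * x)    # 100 cycles per unit length

sharp = measure(fine_detail, 3)      # small aperture: detail survives
blurred = measure(fine_detail, 51)   # large aperture: detail averaged out
```

The 51-sample aperture spans about five periods of the oscillation, so the integrated response nearly vanishes, exactly as a large pixel would wash out fine image structure.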
2. Foundations of scale-space
To compute any type of representation from the image data, information must be extracted using certain operators interacting with the data. Basic questions then are: Which operators to apply? Where to apply them? What should they look like? How large should they be?
3. The Gaussian kernel
The Gaussian (better Gaußian) kernel is named after Carl Friedrich Gauß (1777–1855), a brilliant German mathematician. This chapter discusses many of the attractive and special properties of the Gaussian kernel.
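One of these properties, the semigroup (cascade) property G(σ1) ∗ G(σ2) = G(√(σ1² + σ2²)), is easy to verify numerically; a NumPy sketch (the book itself works in Mathematica):

```python
import numpy as np

def gauss(x, sigma):
    """Normalised 1D Gaussian kernel sampled on x."""
    g = np.exp(-x**2 / (2.0 * sigma**2))
    return g / g.sum()

x = np.arange(-30, 31, dtype=float)
g1, g2 = gauss(x, 2.0), gauss(x, 3.0)

# blurring with sigma=2 and then sigma=3 equals one bigger blur
# with sigma_total = sqrt(2^2 + 3^2)
cascade = np.convolve(g1, g2, mode="same")
direct = gauss(x, np.sqrt(2.0**2 + 3.0**2))
```

This is why repeated Gaussian blurring never leaves the family of Gaussians: the scales add in quadrature.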
4. Gaussian derivatives
We will encounter the Gaussian derivative function at many places throughout this book. The Gaussian derivative function has many interesting properties. We will discuss them in one dimension first. We study its shape and algebraic structure, its Fourier transform, and its close relation to other functions like the Hermite functions, the Gabor functions and the generalized functions. In two and more dimensions additional properties are involved like orientation (directional derivatives) and anisotropy.
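A defining one-dimensional property, that differentiating a smoothed signal equals convolving the signal with the Gaussian derivative kernel, can be checked directly; a NumPy sketch with illustrative parameter values:

```python
import numpy as np

sigma = 3.0
x = np.arange(-20, 21, dtype=float)
g = np.exp(-x**2 / (2 * sigma**2))
g /= g.sum()                       # zero-order Gaussian kernel
gd = -x / sigma**2 * g             # first-order Gaussian derivative kernel

signal = np.tanh(np.linspace(-4, 4, 200))    # a smooth edge

# derivative of the smoothed signal ...
smoothed_then_diff = np.gradient(np.convolve(signal, g, mode="same"))
# ... equals smoothing with the derivative kernel
diff_by_kernel = np.convolve(signal, gd, mode="same")
```

Away from the boundaries the two results agree to high accuracy; differentiation and Gaussian smoothing commute, which is what makes "differentiation at a scale" well defined.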
5. Multi-scale derivatives: implementations
In order to get a good feeling for the interactive use of Mathematica, we discuss in this section three implementations of convolution with a Gaussian derivative kernel (in 2D) in detail:
1. implementation in the spatial domain with a 2D kernel;
2. through two sequential 1D kernel convolutions (exploiting the separability property);
3. implementation in the Fourier domain.
Just blurring is done through convolution with the zero order Gaussian derivative, i.e. the Gaussian kernel itself.
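Outside the Mathematica setting, the equivalence of the three routes can be sketched in NumPy. Circular boundary handling is used for all three so they agree to machine precision; the kernel radius and image size are arbitrary choices for this sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
img = rng.random((32, 32))
sigma, r = 2.0, 9                        # kernel radius ~ 4.5 sigma
x = np.arange(-r, r + 1)
g = np.exp(-x**2 / (2 * sigma**2))
g /= g.sum()

# 1. spatial domain with an explicit 2D kernel (circular boundary)
k2d = np.outer(g, g)
out_2d = sum(k2d[i + r, j + r] * np.roll(img, (i, j), axis=(0, 1))
             for i in x for j in x)

# 2. two sequential 1D convolutions (separability of the Gaussian)
def conv1d_circ(a, k, axis):
    return sum(k[i + r] * np.roll(a, i, axis=axis) for i in x)
out_sep = conv1d_circ(conv1d_circ(img, g, 0), g, 1)

# 3. Fourier domain: pad the kernel to image size, centre it at (0,0),
#    and multiply the spectra
kpad = np.zeros_like(img)
kpad[:2 * r + 1, :2 * r + 1] = k2d
kpad = np.roll(kpad, (-r, -r), axis=(0, 1))
out_fft = np.real(np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(kpad)))
```

The separable route costs O(k) per pixel instead of O(k²) for a kernel of width k, and the Fourier route wins for large kernels.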
6. Differential structure of images
In this chapter we will study the differential structure of discrete images in detail. This is the structure described by the local multi-scale derivatives of the image. We start with the development of a toolkit for the definitions of heightlines, local coordinate systems and independence of our choice of coordinates.
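As a first numerical taste (a sketch with an assumed Fourier-domain helper, not the book's Mathematica code): the gradient magnitude built from first-order Gaussian derivatives is independent of the choice of coordinate axes, checked here with an exact 90-degree rotation:

```python
import numpy as np

def gderiv(img, sigma, dx, dy):
    """Gaussian derivative of order (dx, dy) of a 2D image, computed
    in the Fourier domain (periodic boundaries; helper assumed here)."""
    ny, nx = img.shape
    wx, wy = np.meshgrid(np.fft.fftfreq(nx) * 2 * np.pi,
                         np.fft.fftfreq(ny) * 2 * np.pi)
    ghat = np.exp(-(wx**2 + wy**2) * sigma**2 / 2)
    op = (1j * wx) ** dx * (1j * wy) ** dy * ghat
    return np.real(np.fft.ifft2(np.fft.fft2(img) * op))

img = np.random.default_rng(2).random((64, 64))
Lx = gderiv(img, 2.0, 1, 0)
Ly = gderiv(img, 2.0, 0, 1)
grad_mag = np.sqrt(Lx**2 + Ly**2)   # a rotation-invariant combination
```

Individually Lx and Ly change under a rotation of the axes, but the combination Lx² + Ly² does not, which is the kind of coordinate independence this chapter develops systematically.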
7. Natural limits on observations
8. Differentiation and regularization
Regularization is the technique to make data behave well when an operator is applied to them. Such data could e.g. be functions that are impossible or difficult to differentiate, or discrete data for which a derivative seems not to be defined at all. In scale-space theory, we realize that we do physics. This implies that when we consider a system, a small variation of the input data should lead to a small change in the output data.
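The point is easy to demonstrate: a naive finite difference of noisy samples is drowned in noise, while convolution with a Gaussian derivative kernel, the regularized derivative operator, recovers the underlying derivative. A NumPy sketch with illustrative parameter values:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400
t = np.linspace(0, 2 * np.pi, n)
noisy = np.sin(t) + 0.05 * rng.standard_normal(n)
true_deriv = np.cos(t) * (t[1] - t[0])      # derivative per sample

# ill-posed: finite differences amplify the noise
naive = np.diff(noisy)

# well-posed: convolve with a first-order Gaussian derivative kernel
sigma = 5.0
x = np.arange(-25, 26, dtype=float)
g = np.exp(-x**2 / (2 * sigma**2))
gd = -x / sigma**2 * g / g.sum()
regular = np.convolve(noisy, gd, mode="same")
```

The naive estimate is dominated by the differenced noise, while the regularized estimate stays close to the true derivative: small input perturbations give small output changes, as physics demands.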
9. The front-end visual system — the retina
The visual system is our most important sense. It is estimated that about one quarter of all nerve cells in our central nervous system (CNS) are related to vision in some way. The task of the system is not to form an image of the outside world in the brain, but to help us survive in this world. Therefore it is necessary to perform a substantial analysis of the 2D image as a projection of the 3D world. For this reason much more is measured than just the spatial intensity distribution. We will see that the front-end visual system measures simultaneously at multiple resolutions, that it directly measures (in the scale-space model) derivatives of the image in all directions at least up to fourth order, and that it measures temporal changes of intensity, the motion and disparity parameters, and the color differential structure. As a consequence, the layout of the receptors on the retina is strikingly different from the 2D pixel arrays in our conventional digital cameras.
10. A scale-space model for the retinal sampling
Why do we have all these different sizes? Smaller receptive fields are useful for a sharp, high-resolution measurement, while the larger receptive fields measure a blurred picture of the world. We call the size of the receptive field its scale. We seem to sample the incoming image with our retina at many scales simultaneously.
11. The front-end visual system — LGN and cortex
From the retina, the optic nerve runs into the central brain area and makes a first monosynaptic connection in the Lateral Geniculate Nucleus, a specialized area of the thalamus (see figures 11.1 and 11.2).
12. The front-end visual system — cortical columns
Hubel and Wiesel were the first to find the regularity of the orientation sensitivity tuning. They recorded a regular change of the orientation sensitivity of receptive fields when the electrode followed a track tangential to the cortex surface (see figure 12.1).
13. Deep structure I. watershed segmentation
The previous chapters have presented the notion of scale — any observation is, implicitly or explicitly, defined in terms of the area of support for the observation. This allows different observations at the same location that focus on different structures. A classical illustration of this concept is the observation of a tree. At fine scale the structures of the bark and the leaves are apparent. In order to see the shapes of the leaves and the twigs a higher scale is required; an even higher scale is appropriate for studying the branches, and finally the stem and the crown are best described at a very coarse scale.
Erik Dam, Bart M. ter Haar Romeny
14. Deep structure II. catastrophe theory
The previous chapter illustrates a number of approaches that explore the deep structure. However, there are a number of caveats. The edge focusing technique implicitly assumes that the edges for the signal can be located at the adjacent lower scale level in a small neighborhood around the location at the current scale. As mentioned, no formal scheme for defining the size and shape of the neighborhood is presented. Furthermore, this method ignores the problems encountered when edge points merge or split with increasing scale.
Erik Dam, Bart M. ter Haar Romeny
15. Deep structure III. topological numbers
In the previous chapters we detected and followed the singularity strings through scale-space in an ad hoc manner. In the scale selection section, the detection of maxima was done by simply looking for pixels with values larger than those of their neighbors. In the edge focusing section, minima and maxima were detected (and distinguished) for a 1D signal by looking at sign changes of the derivative signal. Furthermore, these extrema were tracked down through scale-space by simply looking for extrema in a close neighborhood in successive scale levels below. Finally, in the multi-scale segmentation section, the dissimilarity minima were represented indirectly by the catchment basins, and the linking across scale was done robustly by matching regions instead of points.
Bart M. ter Haar Romeny, Erik Dam
16. Deblurring Gaussian blur
To discuss an application where really high-order Gaussian derivatives are applied, we study the deblurring of Gaussian blur by inverting the action of the diffusion equation, as originally described by Florack et al. [Florack1994b, TerHaarRomeny1994a].
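The idea can be sketched in one dimension. Gaussian blurring solves the diffusion equation ∂L/∂t = ∂²L/∂x² with t = σ²/2, so deblurring amounts to a truncated Taylor series backwards in t, in which the 2k-th spatial derivative appears at order k. In this NumPy sketch the derivatives are taken spectrally on a periodic, noise-free signal (the book instead uses high-order Gaussian derivative operators); all values are illustrative:

```python
import numpy as np
from math import factorial

n = 256
x = np.arange(n)
f = np.sin(2 * np.pi * 3 * x / n) + 0.5 * np.sin(2 * np.pi * 7 * x / n)

# Gaussian blur of scale sigma, applied in the Fourier domain
sigma = 6.0
w = np.fft.fftfreq(n) * 2 * np.pi
L_hat = np.fft.fft(f) * np.exp(-(w * sigma) ** 2 / 2)
blurred = np.real(np.fft.ifft(L_hat))

# invert the diffusion equation L_t = L_xx by a truncated Taylor
# series in the scale parameter t = sigma^2/2:
#   f ~ sum_k (-t)^k / k! * d^{2k}L/dx^{2k}
# in the Fourier domain d^{2k}/dx^{2k} is multiplication by (-w^2)^k
t = sigma ** 2 / 2
series = sum((-t) ** k / factorial(k) * (-w ** 2) ** k for k in range(8))
deblurred = np.real(np.fft.ifft(series * L_hat))
```

Eight Taylor terms, i.e. derivatives up to 14th order, already restore this band-limited signal almost perfectly; with noisy data the series must be truncated much earlier, which is where the chapter's analysis comes in.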
17. Multi-scale optic flow
In this chapter we focus on the quantitative extraction of small differences in an image sequence caused by motion, and in an image pair caused by differences in depth. We would like to extract the local motion parameters as a small local shift over time or space. We call the resulting vector field the optic flow of the image sequence, a spatio-temporal feature, and the disparity map for the stereo pair. As the application of the method described in this chapter is virtually the same for stereo disparity extraction, we will focus in the treatment on spatio-temporal optic flow.
Bart ter Haar Romeny, Luc Florack, Avan Suinesiaputra
18. Color differential structure
Color is an important extra dimension. Information extracted from color is useful for almost any computer vision task, like segmentation, surface characterization, etc. The field of color science is huge [Wyszecki2000], and many theories exist. It is far beyond the scope of this book to cover even a fraction of the many different approaches. We will focus on a single recent theory, based on the color-sensitive receptive fields in the front-end visual system. We are especially interested in the extraction of multi-scale differential structure in the spatial and the color domain of color images. This scale-space approach was recently introduced by Geusebroek et al. [Geusebroek1999a, Geusebroek2000a], based on the pioneering work of Koenderink's Gaussian derivative color model [Koenderink1998a]. This chapter presents the theory and a practical implementation of the extraction of color differential structure.
Jan-Mark Geusebroek, Bart M. ter Haar Romeny, Jan J. Koenderink, Rein van den Boomgaard, Peter Van Osta
19. Steerable kernels
Clearly, orientation of structure is a multi-scale concept. On a small scale, the local orientation of structural elements, such as edges and ridges, may be different from the orientations of the same elements at a larger scale. Figure 19.1 illustrates this.
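Steerability means that the derivative in an arbitrary direction θ is the fixed linear combination cos θ·Lx + sin θ·Ly of just two basis filters. A NumPy check on a plane wave (the Fourier-domain helper and all values are assumptions of this sketch):

```python
import numpy as np

def gauss_derivs(img, sigma):
    """First-order Gaussian derivatives (Lx, Ly), computed in the
    Fourier domain with periodic boundaries (helper assumed here)."""
    ny, nx = img.shape
    wx, wy = np.meshgrid(np.fft.fftfreq(nx) * 2 * np.pi,
                         np.fft.fftfreq(ny) * 2 * np.pi)
    ghat = np.exp(-(wx**2 + wy**2) * sigma**2 / 2)
    F = np.fft.fft2(img)
    Lx = np.real(np.fft.ifft2(F * 1j * wx * ghat))
    Ly = np.real(np.fft.ifft2(F * 1j * wy * ghat))
    return Lx, Ly

# a plane wave whose wavevector (4, 2) points along theta0
y, x = np.mgrid[0:64, 0:64]
theta0 = np.arctan2(2.0, 4.0)
img = np.sin(2 * np.pi * (4 * x + 2 * y) / 64)

Lx, Ly = gauss_derivs(img, 2.0)
# steer the derivative into any direction from the two basis responses
along = np.cos(theta0) * Lx + np.sin(theta0) * Ly     # along the wavevector
across = -np.sin(theta0) * Lx + np.cos(theta0) * Ly   # perpendicular to it
```

Steered along the wavevector the response is maximal; steered perpendicular to it the response vanishes, although no filter was ever constructed at the angle θ0 itself.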
20. Scale-time
In the time domain we encounter sampled data just as in the spatial domain. E.g. a movie is a series of frames, samples taken at regular intervals. In the spatial domain we needed an integration over a spatial area to catch the information. Likewise, we need an aperture in time, integrating for some duration, to perform the measurement. This is the integration time. Systems with a short integration time are said to have a fast response; systems with a long integration time, a slow response. Because of the necessity of this integration time, which needs to have a finite duration (temporal width), a scale-space construct is again a physical necessity.
21. Geometry-driven diffusion
So far we calculated edges and other differential invariants at a range of scales. The task determined whether to select a fine or a coarse scale. The advantage of selecting a larger scale was the improved reduction of noise, and the appearance of more prominent structure, but the price to pay for this is reduced localization accuracy. Linear, isotropic diffusion cannot preserve the position of the differential invariant features over scale.
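The archetypal geometry-driven scheme is Perona-Malik diffusion, in which the conductivity is a decreasing function of the local gradient magnitude, so edges block the flow while homogeneous regions are smoothed. A compact explicit NumPy sketch (the parameter values are illustrative, and the scheme shown is the standard Perona-Malik one rather than a particular scheme from this chapter):

```python
import numpy as np

def perona_malik(img, n_iter=50, kappa=0.1, dt=0.2):
    """Explicit Perona-Malik diffusion: the conductivity
    g = exp(-(gradient / kappa)^2) is small at edges, so edges are
    preserved while homogeneous regions are smoothed."""
    L = img.astype(float).copy()
    for _ in range(n_iter):
        dN = np.roll(L, -1, axis=0) - L     # differences to the four
        dS = np.roll(L, 1, axis=0) - L      # nearest neighbours
        dE = np.roll(L, -1, axis=1) - L     # (periodic boundaries)
        dW = np.roll(L, 1, axis=1) - L
        g = lambda d: np.exp(-(d / kappa) ** 2)
        L += dt * (g(dN) * dN + g(dS) * dS + g(dE) * dE + g(dW) * dW)
    return L

rng = np.random.default_rng(4)
img = np.zeros((64, 64))
img[:, 32:] = 1.0                            # a step edge
img += 0.02 * rng.standard_normal(img.shape)

out = perona_malik(img)
```

After fifty iterations the noise in the flat regions is largely gone while the step edge keeps both its contrast and its position, which is exactly what linear, isotropic diffusion cannot do.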
22. Epilog
Computer vision is a huge field, and this book could only touch upon a small section of it. First of all, the emphasis has been on the introduction of the notion of observing the physical phenomena, which makes the incorporation of scale unavoidable. Secondly, scale-space theory nicely starts from an axiomatic basis, and incorporates the full mathematical toolbox. It has become a mature branch in modern computer vision research.
Backmatter
Metadata
Title
Front-End Vision and Multi-Scale Image Analysis
Author
Prof. Dr. Bart M. ter Haar Romeny
Copyright year
2003
Publisher
Springer Netherlands
Electronic ISBN
978-1-4020-8840-7
Print ISBN
978-1-4020-1503-8
DOI
https://doi.org/10.1007/978-1-4020-8840-7