
About this book

Computer vision solutions used to be very specific and difficult to adapt to different or even unforeseen situations. Current developments call for simple-to-use yet robust applications that can be employed in various situations. This trend requires the reassessment of some theoretical issues in computer vision. A better general understanding of vision processes, new insights and better theories are needed. The papers selected from the conference staged in Dagstuhl in 1996 to gather scientists from the West and the former Eastern Bloc countries address these goals and cover such fields as 2D images (scale space, morphology, segmentation, neural networks, Hough transform, texture, pyramids), recovery of 3-D structure (shape from shading, optical flow, 3-D object recognition) and how vision is integrated into a larger task-driven framework (hand-eye calibration, navigation, perception-action cycle).



A semidiscrete nonlinear scale-space theory and its relation to the Perona-Malik paradox

Although much effort has been spent in the recent decade to establish a theoretical foundation of certain partial differential equations (PDEs) as scale-spaces, it is almost never taken into account that, in practice, images are sampled on a fixed pixel grid. For nonlinear PDE-based filters, usually straightforward finite difference discretizations are applied in the hope that they reflect the nice properties of the continuous equations. Since scale-spaces cannot perform better than their numerical discretizations, however, it would be desirable to have a genuinely discrete nonlinear framework which reflects the discrete nature of digital images. In this paper we discuss a semidiscrete scale-space framework for nonlinear diffusion filtering. It keeps the scale-space idea of having a continuous time parameter, while taking into account the spatial discretization on a fixed pixel grid. It leads to nonlinear systems of coupled ordinary differential equations. Conditions are established under which one can prove existence of a stable unique solution which preserves the average grey level. An interpretation as a smoothing scale-space transformation is introduced which is based on an extremum principle and the existence of a large class of Lyapunov functionals comprising for instance p-norms, even central moments and the entropy. They guarantee that the process is not only simplifying and information-reducing, but also converges to a constant image as the scale parameter t tends to infinity.
Joachim Weickert, Brahim Benhamouda
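
The semidiscrete idea in the abstract above (a fixed pixel grid, continuous time, coupled ordinary differential equations) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: a 1D signal, nearest-neighbour coupling, a Perona-Malik-type diffusivity bounded by 1, and explicit Euler time stepping to approximate the continuous time parameter.

```python
def diffusivity(d, lam=1.0):
    """Perona-Malik-type diffusivity in (0, 1]; an illustrative choice."""
    return 1.0 / (1.0 + (d / lam) ** 2)


def rhs(u, lam=1.0):
    """Right-hand side of du_i/dt = sum_j g(|u_j - u_i|) * (u_j - u_i)
    for nearest neighbours j on a 1D pixel grid."""
    du = [0.0] * len(u)
    for i in range(len(u) - 1):
        diff = u[i + 1] - u[i]
        flux = diffusivity(abs(diff), lam) * diff
        # The same flux leaves one pixel and enters its neighbour,
        # which is why the average grey level is preserved.
        du[i] += flux
        du[i + 1] -= flux
    return du


def evolve(u, t_end, tau=0.25, lam=1.0):
    """Approximate the ODE system up to time t_end with explicit Euler steps.
    With tau <= 0.5 and g <= 1 each step is a convex combination of
    neighbouring values, so no new extrema are created."""
    u = list(u)
    for _ in range(int(round(t_end / tau))):
        u = [ui + tau * di for ui, di in zip(u, rhs(u, lam))]
    return u


if __name__ == "__main__":
    signal = [0.0, 0.0, 10.0, 10.0, 3.0, 7.0]
    for t in (0.0, 1.0, 10.0, 2000.0):
        print(t, [round(v, 3) for v in evolve(signal, t)])
```

In this sketch the mean of the signal stays constant, the values stay within the initial minimum and maximum, and the signal flattens towards its mean as t grows, which mirrors the properties named in the abstract for the semidiscrete setting.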

Topological approach to mathematical morphology

For low-bit-rate coding of video sequences, specifically in the context of the MPEG-4 proposal, morphological approaches proved to be highly attractive (see e.g. [16]). During a series of coding experiments performed at Siemens Research Laboratory in München [9] the authors felt that there were some deficiencies in the theory that needed investigation. The aim of this paper is to sketch a theory that makes it possible to understand the relationships between three classes of discrete concepts, namely discrete topology, discrete morphology and discrete metrics. Mathematical morphology is based on topologies for systems of subsets of a set [13]. The topology of the underlying set enters only indirectly. Therefore such concepts as connectedness of sets can cause difficulties if treated purely morphologically. These conceptual difficulties became especially apparent when structures and algorithms were used in practice which simultaneously involve both subset topologies and connectedness of subsets, as is the case e.g. in watershed segmentation [16].
Ulrich Eckhardt, Eckart Hundt

Segmentation by watersheds: definition and parallel implementation

In the field of grey scale mathematical morphology the watershed transform, originally proposed by Digabel and Lantuéjoul, is frequently used for image segmentation [1, 9, 11]. It can be classified as a region-based segmentation approach. The intuitive idea underlying this method is that of flooding a landscape or topographic relief with water. Basins will fill up with water starting at local minima, and at points where water coming from different basins would meet, dams are built. When the water level has reached the highest peak in the landscape, the process is stopped. The set of dams thus obtained partitions the landscape into regions or ‘catchment basins’ separated by dams. These dams are called watershed lines or simply watersheds. A sketch is given in Fig. 1.
Jos B. T. M. Roerdink, Arnold Meijster
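
The flooding picture in the abstract above can be sketched on a 1D height profile. This is a deliberately simplified illustration, not the definition or the parallel implementation developed in the paper. It assumes that neighbouring heights always differ (no plateaus); the function name `watershed_1d` and the label convention (0 for a dam, 1, 2, ... for basins) are hypothetical.

```python
DAM = 0  # label used for watershed points


def watershed_1d(heights):
    """Flood a 1D profile from its local minima.

    Pixels are visited in order of increasing height, as if the water
    level were rising. A pixel with no flooded neighbour starts a new
    basin, a pixel next to one basin joins it, and a pixel where two
    different basins would meet becomes a dam.
    Assumes adjacent heights are never equal.
    """
    n = len(heights)
    labels = [None] * n
    next_label = 1
    for i in sorted(range(n), key=lambda k: heights[k]):
        neighbour_basins = {
            labels[j]
            for j in (i - 1, i + 1)
            if 0 <= j < n and labels[j] not in (None, DAM)
        }
        if not neighbour_basins:
            labels[i] = next_label  # local minimum: a new basin starts here
            next_label += 1
        elif len(neighbour_basins) == 1:
            labels[i] = neighbour_basins.pop()  # water from one basin only
        else:
            labels[i] = DAM  # two basins meet: build a dam
    return labels


if __name__ == "__main__":
    profile = [3, 1, 2, 5, 4, 0, 6, 2, 7]
    print(watershed_1d(profile))  # [2, 2, 2, 0, 1, 1, 0, 3, 3]
```

The two zeros in the result mark the dams at the peaks of height 5 and 6, which separate the three catchment basins of the profile.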

A graph network for image segmentation

Image segmentation, one of the oldest problems in image processing and computer vision, has, despite various attempts to solve it [1], not yet been solved satisfactorily. Bearing in mind the huge capability of the human visual system, highly parallel and pipelined computation seems to be necessary for success in this field. According to Uhr [2], parallel-serial layered architectures are best suited for image analysis. In this sense, a new Layered Graph Network (LGN) was developed [3], which is presented and applied to the processing of simulated and real world images.
Herbert Jahn

Associative memory for images by recurrent neural subnetworks

Pattern recognition and pattern autoassociation are related but not identical tasks attributed to intelligent systems. In pattern recognition, a system is supposed to identify the class Ci, i = 1, ... , K, to which an object belongs, given the object’s features x, which were previously measured and delivered to the system. In autoassociation, an associative memory reconstructs the original pattern x when a distorted or incomplete version x’ is presented to the system [1, 8, 2, 3]. Usually such a system is trained to store many original patterns which can be considered as representatives of pattern classes.
Władysław Skarbek

Optimal models for visual recognition

Over the years, building models of objects from sensory data has been tackled in various ways. Following [1], model based recognition methods are divided into graph theoretic and non graph theoretic. Graph theoretic methods use graphs as a representation for objects and scenes. An object is divided into parts. Nodes of a graph that describes an object characterize the parts of the object and arcs of the graph represent spatial relations among parts of the object. Recognition of an object in the scene is performed as a search for a subgraph isomorphism between the scene graph and each of the model graphs. In non graph theoretic methods, local features are used to describe the object. Grimson and Lozano-Pérez [3] used a constrained tree search to efficiently match values of point features and surface normals in models to those found in the scenes.
Matevž Kovačič, Bojan Kverh, Franc Solina

Order of points on a line segment

The Hough transform is a method for the detection of many lines on a plane [1,2,3,4]. This method achieves line detection by converting the line fitting problem on an imaging plane to a peak search problem in an accumulator space using the voting procedure. Although the Hough transform provides a method for line detection, this transform cannot detect line segments. For the detection of line segments, it is necessary to detect both endpoints of each line segment. The detection of pairs of endpoints of line segments is mainly performed using the point following procedure by local window operation along each line; that is, assuming the connectivity of digitized points, the algorithm follows a series of sample points which should lie on a line. The method is, however, equivalent to a whole area search in the worst case, because it is necessary to investigate the connectivity of all sample points in the region of interest, point by point.
Atsushi Imiya
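
The voting procedure mentioned in the abstract above can be sketched as follows. This is a generic textbook-style Hough accumulator, not the method of the paper. The normal parameterization x cos(theta) + y sin(theta) = rho, the one-degree angle step, the rounding of rho to integers and the function names are all assumptions made for illustration.

```python
import math
from collections import Counter


def hough_accumulator(points, theta_steps=180):
    """Let every point vote for all (theta index, rho) bins of lines through it."""
    acc = Counter()
    for x, y in points:
        for k in range(theta_steps):
            theta = math.pi * k / theta_steps
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            acc[(k, rho)] += 1
    return acc


def strongest_line(points, theta_steps=180):
    """Peak search in accumulator space: return (theta in degrees, rho, votes)."""
    acc = hough_accumulator(points, theta_steps)
    (k, rho), votes = acc.most_common(1)[0]
    return 180.0 * k / theta_steps, rho, votes


if __name__ == "__main__":
    # 100 samples on the vertical segment x = 5, 0 <= y <= 99, plus two outliers.
    segment = [(5, y) for y in range(100)]
    outliers = [(20, 3), (40, 77)]
    print(strongest_line(segment + outliers))  # (0.0, 5, 100)
```

The peak identifies only the infinite line (theta, rho). Where the segment starts and ends along that line is not contained in the accumulator, which is the limitation the abstract points out.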

Subjective contours detection

Basic visual activities are related to edge detection (motion, recognition, ...). We show that there are different kinds of edges. The best known and most studied are edges due to luminance variations. After a brief introduction to classical edge-detectors, we introduce another class of edges, the “subjective contours”. We assume end-points as textons as proposed by B. Julesz [7] and D. Marr [11]. The first step of our algorithm consists in geometric feature (i.e. segments) extraction using the hierarchical Hough transform. The end-points of the detected features are then drawn on a new image. This second “feature map” (of higher level) is then processed in a second step using again the hierarchical Hough transform. The end-points act as clues indicating the presence of “subjective contours”. We show that there is a relation between the number of clues present in the “feature map” and the pop-out phenomenon of the “subjective contour”.
Souheil Ben-Yacoub

Texture feature based interaction maps: potential and limits

Motivated by the discovery of the high level texture features responsible for perceptual grouping of textures [11] and the development of the Markov-Gibbs texture model with pairwise pixel interactions [9], we have recently proposed the method of feature based interaction maps (FBIM) and applied this new tool to the problem of pattern orientation [4] and rotation-invariant texture classification [7]. Experimental results have demonstrated that the FBIM approach can be used to recover the basic structural properties and orientation of a wide range of patterns, including weak structures.
Dmitry Chetverikov

Non-Markov Gibbs image model with almost local pairwise pixel interactions

Markov/Gibbs models represent digital images as samples of Markov random fields (MRF) on finite 2D lattices with Gibbs probability distributions (GPD). Most of the known models take account of only pairwise pixel interactions. These models, studied in general form by Dobrushin [11], Averintsev [1], and Besag [3], were first applied to the images by Cross and Jain [9], Hassner and Sklansky [23], Lebedev et al. [25], Derin et al. [10], Geman and Geman [15]. Later, they were studied in numerous works (see, for instance, surveys [24, 13, 7, 28]). The models have features useful for describing and analysing image textures.
Georgy L. Gimel’farb

Equivalent contraction kernels to build dual irregular pyramids

A raw digital image consists of a 2D spatial arrangement of pixels each of which results from measuring the light at a specific location of the image plane. Currently most of the artificial sensors (e.g. CCD cameras) have the rigid structure of an orthogonal grid, whereas most natural vision systems are based on non-regular arrangements of sensors [1]. Although arrays are certainly easier to manage technically, topological relations seem to play an even more important role for vision tasks in natural systems than precise geometrical positions.
Walter G. Kropatsch

Towards a generalized primal sketch

Seeing is probably the leading goal of Computer Vision. Yet what to see may be more difficult to analyze, as it is mainly application-dependent. Hence, while it is commonly accepted to represent a scene by means of a set of pixels (picture elements), independently of the acquisition process or the pixel topology, the processing applied to such elements differs greatly between approaches. Similarly, biological vision is in agreement with the “sampling” principle of the scene; yet the subsequent processing is not clearly defined beyond “global” functions leading to cortical specializations.
Christophe Duperthuy, Jean-Michel Jolion

Categorization through temporal analysis of patterns

Fifteen years ago, in Vision [1], David Marr proposed a “computational investigation into the human representation and processing of visual information”. This book is considered by many in the field of computer vision as the main work of these last fifteen years. Indeed, Marr was the first to propose a complete methodology for computer vision which became known as the Marr paradigm. Considering vision as an information-processing system and a system as a mapping from one representation to another, Marr defined vision more precisely as a process that produces, from images of the external world, a description that is useful to the viewer and not cluttered with irrelevant information. Marr’s hypothesis was “if we are able to create, using vision, an accurate representation of the three-dimensional world and its properties, then using this information we can perform any visual task” [2]. Visually perceiving the external world and using this information were clearly separated.
Jean-Michel Jolion

Detection of regions of interest via the Pyramid Discrete Symmetry Transform

Pyramid computation has been introduced to design efficient vision algorithms [1], [2] based on both top-down and bottom-up strategies. It has also been suggested by biological arguments that show a correspondence between pyramid architectures and the mammalian visual pathway, starting from the retina and ending in the deepest layers of the visual cortex.
Vito Di Gesú, Cesare Valenti

Dense depth maps by active color illumination and image pyramids

Few problems in computer vision have been investigated more vigorously than stereo vision. The key problem in stereo is how to find the corresponding points in the left and in the right image, referred to as the correspondence problem. Once the corresponding points are determined, the depth can be computed by triangulation. Although more than 300 papers dealing with stereo vision have been published, this technique still suffers from a lack of accuracy and/or the long computation time needed to match stereo images. Therefore, there is still a need for more precise and faster algorithms.
Andreas Koschan, Volker Rodehorst
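
The triangulation step mentioned in the abstract above has a simple closed form in the most common special case. The sketch below assumes a rectified stereo pair with parallel optical axes, a focal length given in pixels and a known baseline. These assumptions, and the name `depth_from_disparity`, are illustrative and are not taken from the paper, which may use a different geometry.

```python
def depth_from_disparity(x_left, x_right, focal_px, baseline):
    """Depth Z = f * B / d for a rectified, parallel-axis stereo pair.

    x_left, x_right: column of the same scene point in each image (pixels)
    focal_px:        focal length in pixels
    baseline:        distance between the camera centres (the depth comes
                     out in the same unit)
    """
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("a point in front of both cameras needs positive disparity")
    return focal_px * baseline / disparity


if __name__ == "__main__":
    # A point seen at column 420 (left) and 400 (right), f = 800 px, B = 0.1 m.
    print(depth_from_disparity(420, 400, focal_px=800, baseline=0.1))
```

Because depth is inversely proportional to disparity, a matching error of one pixel costs little accuracy for near points and a great deal for far ones, which is one reason the correspondence step dominates the quality of the result.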

Local and global integration of discrete vector fields

Several methods in the field of shape reconstruction [6, 8, 11] (most shading based methods) lead to gradient data that still have to be transformed into (scaled) height or depth maps, or into surface data for many applications. Thus the reconstruction accuracy also depends upon the performance of such a transformation module. Surprisingly, not much work has been done so far in this area. This paper starts with a review of the state of the art and discusses two approaches in detail. Several experimental evaluations of both methods for transforming gradient data into height data are reported. The studied (synthetic and real) object classes are curved and polyhedral objects. General qualitative evaluations of the compared transformation procedures are possible in relation to these object classes and in relation to different types of noise simulated for synthetic objects.
Karsten Schlüns, Reinhard Klette
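
To make the task in the abstract above concrete, here is a minimal sketch of a local (path-based) integration of gradient data into a height map. It is a simple two-scan scheme chosen for illustration and is not claimed to be either of the approaches the paper compares. It assumes unit grid spacing, gradient arrays `p` (slope along x) and `q` (slope along y) indexed as `[y][x]`, and a free height offset `z0`.

```python
def integrate_gradient(p, q, z0=0.0):
    """Recover heights z[y][x] from p = dz/dx and q = dz/dy.

    The first column is integrated along y using q, then every row is
    integrated along x using p. Each step uses the trapezoidal rule.
    """
    rows, cols = len(p), len(p[0])
    z = [[0.0] * cols for _ in range(rows)]
    z[0][0] = z0
    for y in range(1, rows):
        z[y][0] = z[y - 1][0] + 0.5 * (q[y - 1][0] + q[y][0])
    for y in range(rows):
        for x in range(1, cols):
            z[y][x] = z[y][x - 1] + 0.5 * (p[y][x - 1] + p[y][x])
    return z


if __name__ == "__main__":
    # Synthetic surface z = x^2 + x*y + 2*y with exact gradients.
    rows, cols = 4, 5
    p = [[2.0 * x + y for x in range(cols)] for y in range(rows)]
    q = [[x + 2.0 for x in range(cols)] for y in range(rows)]
    for row in integrate_gradient(p, q):
        print(row)
```

A scheme of this kind follows one fixed path to each pixel, so an error in a single gradient value is carried along the rest of that path. Global methods instead fit the whole height map to all gradient values at once, which is the kind of trade-off the title of the paper refers to.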

A new approach to shape from shading

Most of the works on Shape from Shading [e.g. 1, 2] consider the problem as the restoration of the spatial shape of a smooth continuous surface when a continuous function representing the brightness at each point of a plane projection of the surface is given. In order to solve the problem, properties of nonlinear partial differential equations are investigated. The ultimate solution is then performed by some numerical methods. As seen e.g. from the recent review [3], this approach has not led to a practically usable solution. The author sees the reason of the lack of success in the discrepancy between continuous models and the theory of differential equations on the one hand and the digital nature of the images as well as the use of numerical methods on the other hand. Thinking first about continuous functions and differential equations, and then coming back to a finite numerical representation and a numerical solution, is irrational.
Vladimir A. Kovalevsky

Recent uniqueness results in shape from shading

The main purpose of this paper is to discuss briefly two topics. The first one is to show that Sneddon’s claim ([9, Section 7 pp. 61]) about representability of any solution to a given first-order partial differential equation in terms of either a complete or a general or a singular integral is erroneous. The literature on complete integrals is a bewildering collection of incomplete and false statements (see e.g. Dou [6] or [9]). Recent results by Chojnacki [3] and Kozera [8] shed new light on this topic and fill a gap in the literature. The second goal of this paper is to critically inspect uniqueness results (see Brooks [1, 2]) concerning the images of a Lambertian hemisphere and a Lambertian plane, which resort to Sneddon’s erroneous assertion and as such are invalid. Finally, we adopt a different approach so that the results claimed in [1, 2], subject to minor reformulations, become valid. For a more detailed analysis an interested reader is also referred to [3] or [8].
Ryszard Kozera

Computation of time-varying motion and structure parameters from real image sequences

We address the problem of robust estimation of motion and structure parameters, which describe an observer’s translation, rotation, and environmental layout (i.e. the relative depth of visible 3-d points) from noisy time-varying optical flow. Allowable observer motions include a moving vehicle and a broad class of robot arm motions. We assume the observer is a camera rigidly attached to the moving vehicle or robot arm, which moves along a smooth trajectory in a stationary environment. As the camera moves it acquires images at some reasonable sampling rate (say 30 images per second). Given a sequence of such images we analyze them to recover the camera’s motion and depth information for various surfaces in the environment. As the camera moves, with respect to some 3-d environmental point, the relative 3-d velocity that occurs is mapped (under perspective projection) onto the camera’s image plane as 2-d image motion. Optical flow or image velocity is an infinitesimal approximation to this image motion. Since the camera moves relative to a scene we can compute image velocity fields at each time. Given the observer’s translation, \(\vec U\), and rotation, \(\vec \omega \), and the coordinates of a 3-d point, \(\vec P\), a non-linear equation that relates these parameters to the 2-d image velocity, \(\vec \upsilon \), at image point \(\vec Y\), where \(\vec Y\) is the perspective projection of \(\vec P\), is as follows [10]:
$$\vec \upsilon \left( {\vec Y,t} \right) = {\vec \upsilon _T}\left( {\vec Y,t} \right) + {\vec \upsilon _R}\left( {\vec Y,t} \right)$$
where \({\vec \upsilon _T} \) and \({\vec \upsilon _R}\) are the translational and rotational components of image velocity:
$${\vec \upsilon _T}\left( {\vec Y,t} \right) = {A_1}\left( {\vec Y} \right)\vec u\left( {\vec Y,t} \right){\left\| {\vec Y} \right\|_2} \quad \text{and} \quad {\vec \upsilon _R}\left( {\vec Y,t} \right) = {A_2}\left( {\vec Y} \right)\vec \omega \left( t \right)$$
$${A_1} = \left( {\begin{array}{*{20}{c}}{ - 1}&0&{{y_1}} \\ 0&{ - 1}&{{y_2}} \end{array}} \right) \quad \text{and} \quad {A_2} = \left( {\begin{array}{*{20}{c}} {{y_1}{y_2}}&{ - \left( {1 + y_1^2} \right)}&{{y_2}} \\ {\left( {1 + y_2^2} \right)}&{ - {y_1}{y_2}}&{{y_1}} \end{array}} \right)$$
John L. Barron, Roy Eagleson

A theory of occlusion in the context of optical flow

Traditionally, image motion and its approximation known as optical flow have been treated as continuous functions of the image domain [9]. However, in realistic imagery, this hypothesis is exceedingly rarely satisfied. Many phenomena may cause discontinuities in the optical flow function of imagery [16]. Among them, occlusion and translucency are frequent causes of discontinuities in realistic imagery. In addition, their information content is useful to later stages of processing [8] such as motion segmentation [1] and 3-d surface reconstruction [17].
Steven S. Beauchemin, John L. Barron

Algebraic method for solution of some best matching problems

Without Abstract
Michail Schlesinger

Determining the attitude of planar objects with general curved contours from a single perspective view

In 3D-scene analysis, monocular 3D-pose estimation represents a relatively restricted part. The goal is to determine the location and orientation of modelled objects in a three-dimensional scene from a single perspective view. The need for a model description restricts these methods mainly to technical objects.
Michael Schubert, Klaus Voss

CAD based 3d object recognition on range images

In industrial manufacturing the production process is still separated from the design level. But the growing need for a higher standard of quality, a greater variety of products and more flexible production forces the separated fields to be brought together. Only broad communication between all levels can guarantee that the causes of malfunctions are eliminated early and quickly. Thus, it is desirable to use general CAD descriptions at all levels of manufacturing. One step in this direction is the new field called CAD Based Vision (CBV), which introduces the usual CAD object representations into the computer vision community (e.g. [7, 13, 16]).
Björn Krebs, Friedrich M. Wahl

Dual quaternions for absolute orientation and hand-eye calibration

Many computer vision problems involving three dimensional motion necessitate an efficient representation for 3D displacement that enhances the understanding of the problem and facilitates linear solutions of low complexity. The most common representation is a rotation about an axis through the origin followed by a translation, represented by an orthogonal matrix and a vector, respectively. An alternative to orthogonal matrices is the unit quaternion representation, already used in several vision algorithms [4], which still treat the translation as a separate unknown. From Chasles’ theorem [1] it is known that a rigid transformation can be modeled as a rotation about an axis not through the origin and a translation along this axis. This well known screw transformation can be algebraically modeled using dual vectors, matrices or quaternions [5].
Konstantinos Daniilidis

Segmentation of behavioral spaces for navigation tasks

From the very beginning, the forefathers of the Artificial Intelligence field (McCarthy, Minsky, Newell and Simon) have emphasized the importance of the internal representation of an agent, whether artificial or biological. The issue that has been debated for the last 30 years is what the exact form of this representation is. In fact some philosophers, such as Dreyfus [5], even doubt whether this internal representation can ever be formalized. In this paper we shall assume such a formalism exists and do our part to address the long-debated question of what it is or what it should be.
Ruzena Bajcsy, Henrik I. Christensen, Jana Košecká

Geometric algebra as a framework for the perception-action cycle

In this paper we will present a mathematical framework for embedding the realization of technical systems which are designed on principles of the perception-action cycle (PAC). The use of PAC as a design principle of systems which should have capabilities of both perception and action is motivated by ethology and has its theoretical roots in the theory of non-linear dynamic systems. PAC is the frame of autonomous behavior. It relates perception and action in a purposive manner. The global competence of such systems results from cooperation and competition of a set of behaviors, each as an observable manifestation of a certain kind of competence. If both acquired skill and experience are the sources to yield competence, there is also hope of gaining attractive system properties such as robustness and adaptivity. The essence behind this extension of the active vision paradigm is a certain kind of equivalence between visual perception and action. That means both perceptual categories and those of actions are mutually supported and have to be mutually verified. Perception and action constitute the afferent and efferent interfaces of the agent to its environment. Using them in a mature stage the active agent stabilizes its relation to the environment by equalizing categories of perception with those of action. The first ones are defined by the experience that similar patterns cause similar actions (or reactions) and the second ones correspond to the skill that similar actions cause similar patterns. Following that line it should be possible to design both technical visual systems with support of active components of movement and seeing robots. This necessitates the fusion of computer vision (as active vision), robotics, signal processing, and neural computation. It becomes obvious that representations will take on central importance. They have to relate the agent with the environment in Euclidean space-time. Evaluating the actual situation with respect to the representation problem, we have to state both serious shortcomings within the disciplines and gaps between them.
Gerald Sommer, Eduardo Bayro-Corrochano, Thomas Bülow

