
About this Book

Computer vision researchers have been frustrated in their attempts to automatically derive depth information from conventional two-dimensional intensity images. Research on "shape from texture", "shape from shading", and "shape from focus" is still at a laboratory stage and has not seen much use in commercial machine vision systems. A range image, or depth map, contains explicit information about the distance from the sensor to the object surfaces within the field of view. Information about surface geometry, which is important for, say, three-dimensional object recognition, is more easily extracted from "2 1/2-D" range images than from "2-D" intensity images. As a result, both active sensors such as laser range finders and passive techniques such as multi-camera stereo vision are being increasingly used by vision researchers to solve a variety of problems.

This book contains chapters written by distinguished computer vision researchers covering the following areas: overview of 3D vision, range sensing, geometric processing, object recognition, navigation, inspection, and multisensor fusion. A workshop report, written by the editors, also appears in the book. It summarizes the state of the art and proposes future research directions in range image sensing, processing, interpretation, and applications. The book also contains an extensive, up-to-date bibliography on the above topics.

This book provides a unique perspective on the problem of three-dimensional sensing and processing; it is the only comprehensive collection of papers devoted to range images. Both academic researchers interested in research issues in 3D vision and industrial engineers in search of solutions to particular problems will find this a useful reference book.



1. Report: 1988 NSF Range Image Understanding Workshop

Most computer vision research has concentrated on using digitized grayscale intensity images as sensor data. It has proven to be extraordinarily difficult to program computers to understand and describe these images in a general-purpose way. One important problem is that digitized intensity images are rectangular arrays of numbers which indicate the brightness at individual points on a regularly spaced rectangular grid and contain no explicit information that is directly usable in depth perception. Yet human beings are able to infer depth relationships among intensity image regions quickly and easily, whereas automatic inference of such depth relationships has proven to be remarkably complex. In fact, many famous visual illusions, such as Kanizsa's triangle, vividly demonstrate that humans impose 3-D surface structure on images in order to interpret them. Computer vision researchers have recognized the importance of surfaces in the understanding of images; the popularity of "shape from X" approaches in the last decade is the result of this recognition.
Ramesh Jain, Anil K. Jain

2. A Rule-Based Approach to Binocular Stereopsis

Before the famous random-dot stereogram experiments by Julesz [Jul60], it was generally believed that a necessary precursor to binocular fusion was a recognition of monocular cues in each of the two images. The experiments by Julesz caused a paradigm shift of sorts in the psychophysics of the human visual system; suddenly, the preponderance of the research effort shifted toward explaining practically all aspects of human stereopsis in terms of low-level processes, as opposed to high-level cognitive phenomena. One of the high points of this post-Julesz period was the development of the Marr-Poggio paradigm [MP79]. Marr and Poggio presented a computational theory, later implemented by Grimson, that provided a successful explanation of the Julesz experiments in terms of the matching of zero-crossings at different scales, the zero-crossings at each scale corresponding to the filtering of the images with a Laplacian-of-a-Gaussian (LoG) filter [Gri81a].
S. Tanaka, A. C. Kak

3. Geometric Signal Processing

A wide variety of sensing techniques allow the direct measurement of the three-dimensional (3-D) coordinates of closely spaced points in a scene. An optical profilometer or a single light-stripe range sensor measures a one-dimensional depth profile z_i = f(x_i) of a surface along a line specified by a y value. A single-view range imaging sensor [Bes88b] generates a two-dimensional set of samples z_ij = f(x_i, y_j) that represent surface points in a scene. Such sensors might also yield surface-dependent properties p_ij at each measured point, such as reflectance. Magnetic resonance imaging (MRI) systems [HL83] and computerized tomography (CT) systems [BGP83] measure various properties of 3-D points in a volume, p_ijk = f(x_i, y_j, z_k). A video camera is a good sensor for recovering planar curves (x_i, y_i) = f(s_i) that define the shape of a two-dimensional object in a plane specified by a fixed z value. In each case, the sensor produces a digital signal directly representing scene geometry. Typically, the signal is noisier than one would prefer and is therefore processed using digital signal/image processing techniques to clean up the digital sensor data for further computations, such as extracting descriptions of geometric primitives.
Paul J. Besl

4. Segmentation versus object representation — are they separable?

When vision is used for moving through the environment, for manipulating, or for recognizing objects, it has to simplify the visual input to the level that is required for the specific task. To simplify means to partition images into entities that correspond to individual regions, objects, and parts in the real world, and to describe those entities in only enough detail to perform the required task. For visual discrimination, shape is probably the most important property. After all, line drawings of scenes and objects are usually sufficient for description and subsequent recognition. In the computer vision literature this partitioning of images and description of individual parts is called segmentation and shape representation. Segmentation and shape representation appear to be distinct problems and are treated as such in most computer vision systems. In this paper we try to dispel this notion and show that there is no clear division between segmentation and shape representation. Solving either of the two problems separately is very difficult. On the other hand, if either one of the two problems is solved first, the other becomes much easier. For example, if the image is correctly divided into parts, the subsequent shape description of those parts gets easier. The opposite is also true: when the shapes of parts are known, the partitioning of the image gets simpler.
Ruzena Bajcsy, Franc Solina, Alok Gupta

5. Object Recognition

Humans recognize many objects effortlessly. A chair is easily recognized by us; so is a screwdriver. Biederman roughly estimates that a six-year-old can recognize 30,000 distinguishable objects at a time when his vocabulary is roughly 10,000 words [Bie87]. In order to construct flexible robots for industry, navigation, and the home, computer vision must provide the capability of recognizing many of these objects. Objects need to be recognized for inspection, for grasping and manipulation (including assembly), or in order to attack them or plan paths around them. However, current vision systems do not come close to the recognition ability of a six-year-old.
George Stockman

6. Applications of Range Image Sensing and Processing

In any machine vision problem, one is concerned with analyzing an image so as to produce a description of the image relevant to the task at hand. Most industrial problems have to do with the inspection, manipulation, or measurement of three-dimensional objects in a three-dimensional workspace. Thus, machine vision systems used in these problems are often required to provide geometric descriptions of objects and workspace. The desired descriptions can be formed by implicit or explicit methods. Classical machine vision research seeks to construct a three-dimensional representation of objects in the field of view implicitly from one or more grayscale luminance images. The central idea behind this approach is that shape characteristics can be inferred from luminance variations observed in the image through the use of a variety of physical, optical, and conceptual models. The system forms a description of the scene based on the information from the models, features from the images, and control flow supplied by the particular reasoning approach used.
N. R. Corby, J. L. Mundy

7. 3-D Vision Techniques for Autonomous Vehicles

A mobile robot is a vehicle that navigates autonomously through an unknown or partially known environment. Research in the field of mobile robots has received considerable attention in the past decade due to its wide range of potential applications, from surveillance to planetary exploration, and the research opportunities it provides, including virtually the whole spectrum of robotics research from vehicle control to symbolic planning (see for example [Har88b] for an analysis of the research issues in mobile robots). In this paper we present our investigation of some of the issues in one of the components of mobile robots: perception. The role of perception in mobile robots is to transform data from sensors into representations that can be used by the decision-making components of the system. The simplest example is the detection of potentially dangerous regions in the environment (i.e., obstacles), which can be used by a path planner whose role is to generate safe trajectories for the vehicle. An example of a more complex situation is a mission that requires the recognition of specific landmarks, in which case the perception components must produce complex descriptions of the sensed environment and relate them to stored models of the landmarks.
Martial Hebert, Takeo Kanade, InSo Kweon

8. Multisensor Fusion for Automatic Scene Interpretation

The area of computer analysis of images for automated detection and classification of objects in a scene has been intensively researched in the recent past. Two kinds of approaches may be noted in current and past research in machine perception: (1) to model the functions of biological vision systems, e.g., edge detection by the human visual system, and (2) to develop a scheme which a machine can use for accomplishing a particular task, e.g., automated detection of faulty placement of components on a printed circuit board. The latter approach produces a scheme that is application specific. In developing a scheme for a particular machine perception task, one has a wide choice of sensing modalities and techniques for interpreting the sensed signals. One is not limited by the characteristics of a biological vision system that the first approach is forced to emulate, nor even by the restriction that the system emulate only the observed behavior of the biological system.
J. K. Aggarwal, N. Nandhakumar

