Skip to main content

Über dieses Buch

Recent research results in the area of parallel algorithms for problem solving, search, natural language parsing, and computer vision, are brought together in this book. The research reported demonstrates that substantial parallelism can be exploited in various machine intelligence and vision problems. The chapter authors are prominent researchers actively involved in the study of parallel algorithms for machine intelligence and vision. Extensive experimental studies are presented that will help the reader in assessing the usefulness of an approach to a specific problem. Intended for students and researchers actively involved in parallel algorithms design and in machine intelligence and vision, this book will serve as a valuable reference work as well as an introduction to several research directions in these areas.



Scalable Parallel Formulations of Depth-First Search

This paper presents a parallel formulation of depth-first search. To study its effectiveness we have implemented it to solve the 15-puzzle problem on a variety of commercially available multiprocessors. We are able to achieve fairly linear speedup on these multiprocessors for as many as 128 processors (the maximum configurations available to us). At the heart of this parallel formulation is a work-distribution scheme that divides the work dynamically among different processors. The effectiveness of the parallel formulation is strongly influenced by the work-distribution scheme and the target architecture. We introduce the concept of isoeffciency function to characterize the scalability of different architectures and work-distribution schemes. The isoefficiency analysis of previously known work-distribution schemes motivated the design of substantially improved schemes for ring and shared-memory architectures. The analysis shows that our parallel formulation of DFS can provide near linear speedup on very large parallel architectures.
Vipin Kumar, V. Nageshwara Rao

Parallel Heuristic Search: Two Approaches

We explore two approaches to parallel heuristic search: one based on tree decomposition, in which different processors search different parts of the tree, and the other based on parallel window search, in which each processor searches the whole tree but with different cost bounds. In the first, we present a generic distributed tree search algorithm that effectively searches irregular trees using an arbitrary number of processors without shared memory or centralized control. For brute-force search the algorithm achieves almost linear speedup. For alpha-beta search, the straightforward approach of allocating P processors in a breadth-first manner achieves an overall speedup with random node ordering of P .75 . Furthermore we present a novel processor allocation strategy, called Bound-and-Branch, for parallel alpha-beta search that achieves linear speedup in the case of perfect node ordering. In practice, we achieve a speedup of 12 with 32 processors on a 32-node Hypercube multiprocessor for the game of Othello.
In the second approach, we show how node ordering can be combined with parallel window search to quickly find a near-optimal solution to single-agent problems. First, we show how node ordering by maximum g among nodes with equal f = g + h values can improve the performance of iterative-deepening-A* (IDA*). We then consider a window search where different processes perform IDA* simultaneously on the same problem but with different cost thresholds. Next, we combine the two ideas to produce a parallel window search algorithm in which node ordering information is shared among the different processes. Finally, we show how to combine distributed tree search with parallel window search in single-agent or two-player game searches.
Curt Powley, Chris Ferguson, Richard E. Korf

Distributed Game Tree Search

We present a distributed algorithm for searching game trees. A general strategy for distributed computing is used that can be applied also to other search algorithms. Two new concepts are introduced in order to reduce search overhead and communication overhead: the “Young Brothers Wait Concept” and the “Helpful Master Concept”. We describe some properties of our distributed algorithm including optimal speedup on best ordered game trees.
An implementation of this algorithm in a distributed chess program is studied and experimental data showing surprisingly good performance are presented. Since the performance of our algorithm increases with a better move ordering, this algorithm promises to outperform other known algorithms, especially when combined with state-of-the-art chess programs.
R. Feldmann, B. Monien, P. Mysliwietz, O. Vornberger

Multiprocessing of Combinatorial Search Problems

This chapter presents three paradigms of representations for combinatorial search problems. Depending on the functions of the nonterminal nodes in the graphical representation, a search problem can be represented as an AND-tree, an OR-tree, and an AND/OR graph. This classification facilitates the design of unique computer architectures for supporting efficient evaluation of combinatorial search problems. For each representation, we develop theoretical bounds, efficient algorithms, and functional requirements of multiprocessing architectures, and illustrate these results by examples.
Benjamin W. Wah, Guo-Jie Li, Chee-Fen Yu

Parallel Problem Solving

Problem-reduction based problem solving is a technique that is used when there are multiple methods that may be used to solve a subproblem, and more important, each method may involve solving multiple subproblems consistently. We present an overview of our research in executing such problem solving computations in parallel. The traditional AND OR tree representation is shown to be inadequate for representing parallel evaluations, and REDUCE OR trees are presented as a more suitable representation that can capture more parallelism. As Logic Programming and Horn-Clause theorem proving have much in common with problem solving, the representation is useful for parallel execution in these domains also. A parallel execution scheme for logic programs is described that is based on REDUCE-OR trees. Parallel implementation of this scheme, as well as other parallel computations, is facilitated by a run time support system that allows us to run the parallel interpreter on many different shared memory and message passing machines. In problem solving, as in search, there may be multiple solutions. It is useful but difficult to minimize the time to first solution in a parallel system. We discuss several strategies that help achieve this objective. Performance of these strategies on multiprocessors and on a simulation system is also discussed.
L. V. Kaié

Prism: a Testbed for Parallel Control

A Parallel Inference System for Problem Solving (PRISM) has been designed and implemented. The system provides a general experimental tool for the construction of large artificial intelligence problem solvers. In this paper we present some of the basic facilities for controlling parallelism and inference provided by the system and present experimental data on the use of these facilities. The experiments concern the functionality of the processing elements, the use of heuristics to reduce the search space at run-time, and the use of message passing to distribute tasks in distributed memory parallel machines. The experiments were performed using PRISM on two parallel machines - a one-hundred node shared memory BBN Butterfly, and a 16 node message passing machine.
Mark E. Giuliano, Madhur Kohli, Jack Minker, Irene Durand

Or-Parallelism in Natural Language Parsing

We report on a series of simulation experiments for a large-scale natural language processing system. The results indicate that an or-parallel, all solutions search provides substantial speed-up (20-30 fold) for this application. Longer sentences show relatively greater speed-up, with parse times that increase linearly with sentence length, given a sufficient number of processors. These results have been obtained using a simple, application-specific model of independent, non-communicating or-parallel processes in a shared memory environment. Simulations run with a range of overhead costs show significant benefits from parallel processing with per-process overheads ranging as high as the median process size; our estimates of overhead times are substantially smaller than median process size.
William C. Hopkins, Lynette Hirschman, Robert C. Smith

Parallelism in Computer Vision: a Review

Computer vision tasks require an enormous amount of computation, especially when the data is in image form, demanding high-performance computers for practical, real-time applications. Parallelism appears to be the only economical way to achieve the level of performance required for vision tasks. Researchers in human and machine vision share the belief that massive parallel processing characterizes low-level vision. In this paper we review the various parallel algorithms used in Computer Vision. The problem of visual recognition is divided into three conceptual levels-low-level, intermediate-level and high-level. There are few conceptual difficulties in parallelizing low-level vision. Intermediate-level vision is relatively difficult. Most algorithms in these two levels have been parallelized. However, not much work has been done in high-level vision. We present a survey of algorithms within each of the three levels.
Vipin Chaudhary, J. K. Aggarwal

Real-Time, Parallel Motion Tracking of Three Dimensional Objects From Spatiotemporal Sequences

A major issue in computer vision is the interpretation of three-dimensional (3D) motion of moving objects from a continuous stream of two-dimensional (2D) images. In this paper we consider the problem where the 3D motion of an object corresponding to a known 3D model is to be tracked using only the motion of 2D features in the stream of images. Two general solution paradigms for this problem are characterized: (1) motion-searching, which hypothesizes and tests 3D motion parameters, and (2) motion-calculating, which uses back-projection to directly estimate 3D motion from image-feature motion. Two new algorithms for computing 3D motion based on these two paradigms are presented. One of the major novel aspects of both algorithms is their use of the assumption that the input image stream is spatiotemporally dense. This constraint is psychologically plausible since it is also used by the short-range motion processes in the human visual system.
The processing of a temporally-unbounded, spatiotemporal image combined with the resource constraint of finite image buffer memory, requires real-time throughput rates for our algorithms. Consequently, another major focus of this paper is the development of real-time, parallel implementations to achieve the required throughput. Implementations of both algorithms are described using an Aspex Pipe for low-level, image-feature computations and a Sequent Symmetry for high-level, model-based computations. The Pipe, a pipelined image processor, is tightly-coupled with the Sequent, and semaphores are used for synchronization between the two. Design issues and parallel implementation issues of both algorithms are discussed in detail.
Gilbert Verghese, Karey Lynch Gale, Charles R. Dyer

Parallel Simulation of a Connectionist Stereo Algorithm on a Shared-Memory Multiprocessor

This paper describes the parallel simulation of a connectionist network stereo matching algorithm. In the algorithm a relaxation computation uses a variety of consistency measures between candidate matches to determine the correct match for each edge. These measures include disparity consistency, contour consistency, and multiresolution consistency. In the network each node represents a candidate match and the connections between nodes implement the consistency relations. The simulation involves constructing the list of candidate matches, constructing the list of connections between matches, and performing the iterations of the relaxation computation. Within each of these stages of the simulation, computation may be partitioned among a number of processors with very little need for synchronization or mutual exclusion except at the end of the stage. As a result, the implementation of the simulation on a shared-memory multiprocessor produced nearly linear speed-ups for up to nine processors (the maximum available at the time of implementation).
Charles V. Stewart, Charles R. Dyer

Parallel Algorithms for Image Template Matching

We review some efficient algorithms for image template matching on systolic arrays, meshes, pyramids, fine grain MIMD and SIMD hypercubes, and medium grain MIMD hypercubes. The fine grained algorithms require only 0(1) memory per processor and the medium grained algorithm assumes a fixed number of processors. The medium grain hypercube algorithm is experimentally evaluated using an NCUBE hypercube. Comparative run times for an algorithm that is optimized for the CRAY2 supercomputer are also presented.
Sanjay Ranka, Sartaj Sahni

Image Computations on Fine Grain Electro-Optical Arrays

In this paper, we consider fine grain electro-optical arrays for image computations. The unit time interconnection medium made possible with the use of free-space optics results in efficient solutions to communication-intensive image computations. We present optimal algorithms for several problems on digitized images using a proposed electro-optical mesh. These include O(log N) time solutions for identifying and labeling figures, computing convexity properties, determining distances, etc. We also derive the minimum hardware requirement of any electro-optical architecture for solving problems such as image convolution.
Mehrnoosh Mary Eshaghian, V. K. Prasanna Kumar


Weitere Informationen