Skip to main content
main-content

Über dieses Buch

At the initiative of the IBM Almaden Research Center and the National Science Foundation, a workshop on "Opportunities and Constraints of Parallel Computing" was held in San Jose, California, on December 5-6, 1988. The Steering Committee of the workshop consisted of Prof. R. Karp (University of California at Berkeley), Prof. L. Snyder (University of Washington at Seattle), and Dr. J. L. C. Sanz (IBM Almaden Research Center). This workshop was intended to provide a vehicle for interaction for people in the technical community actively engaged in research on parallel computing. One major focus of the workshop was massive parallelism, covering theory and models of computing, algorithm design and analysis, routing architectures and interconnection networks, languages, and application requirements. More conventional issues involving the design and use of parallel computers with a few dozen processors were not addressed at the meeting. A driving force behind the realization of this workshop was the need for interaction between theoreticians and practitioners of parallel computation. Therefore, a group of selected participants from the theory community was invited to attend, together with well-known colleagues actively involved in parallelism from national laboratories, government agencies, and industry.

Inhaltsverzeichnis

Frontmatter

A Critique of the PRAM Model of Computation

In theoretical computer science, parallel computation has been traditionally studied by investigating time, processor, and space complexities of various problems in a model of parallel random access machine called the PRAM model. The PRAM model assumes that there are some number of processors and a global memory into which the data is stored. In one step, a processor can read or write one word from the global memory into its registers or it can perform some simple operations such as adding two numbers or comparing them. Unfortunately, this model is too simplistic and does not capture some important aspects of parallel computation that are observed in practice. In particular, real computers are connected to each other by an interconnection network.; This imposes the following constraints that are not taken into account by the PRAM model: communication latency and local versus global communication, block transfers, memory and network conflicts, and the bandwidth of the interconnection network. We give below a brief description of these constraints and indicate some recent work that has been done in order to obtain more realistic models of parallel computation.

Alok Aggarwal

Future Requirements for Effective Use of Parallel Computers

The field of parallel computing is developing so rapidly now that in some sense making any statement about the present, much less the future, is a risky undertaking. Nonetheless, some clear directions can be delineated.

David H. Bailey

A Perspective on Shared-memory and Message-memory Architectures

Parallel processing is becoming widely accepted as a key technology in the development of extremely high performance computing systems. Although a wide variety of parallel processor architectures have been proposed over the last five to ten years, to date there is very little quantitative analysis comparing the various alternatives. The proposed architectures essentially fall into two broad categories: message-passing architectures and shared-memory architectures. In this paper we will briefly review the key characteristics of message-passing and shared-memory architectures. We then propose an approach to obtaining a quantitative comparison of these architectures and suggest areas for research.

Alan Baratz, Kevin McAuliffe

Some Observations on Models of Parallel Computation

Although recent developments have clearly established parallelism as a viable approach to building faster computers, a consensus has not yet emerged on how parallelism should best be exploited. A variety of radically different architectures, languages, and frameworks for algorithm design have been proposed, praised, and criticized by different schools of thought. Many aspects of parallel computation have been analyzed in some depth, but mainly in isolation. Much less is understood on their global interaction. A characterization of the overall design space for parallel computing systems has not yet been formulated. As a first step, it is important to develop an understanding of how different models of parallel computation relate to each other.

Gianfranco Bilardi

The Physics of the Parallel Machines

I argue that architectures for massively parallel computers must be designed to go beyond supporting a particular class of algorithms to supporting the underlying physical processes being modelled. I shall discuss specifically physical processes modelled by partial differential equations (PDEs) and argue that an efficient architecture must go beyond nearest neighbor mesh interconnections and support global and hierarchical communications.

Tony F. Chan

Parallel Program Design

My goal is to propose a set of questions that I think are important. J. Misra and I are working on these questions.

K. Mani Chandy

The APRAM: Incorporating Asynchrony into the PRAM Model

The PRAM model is a machine that comprises p processors and m memory cells; each processor can access each memory cell in constant time. The PRAM has proved a popular model for parallel algorithm design. For the task of designing efficient, highly parallel algorithms is quite difficult, in general. The PRAM model provides an abstraction that strips away problems of synchronization, reliability and communication delays, thereby permitting algorithm designers to focus first and foremost on the structure of the computational problem at hand, rather than the architecture of a currently available machine. As a consequence, a considerable body of PRAM algorithms has been discovered in the past several years, and a number of powerful techniques for designing such algorithms have been identified (see, for instance, the survey articles [KR88, EG88]).

Richard Cole

Architectures for Data Parallel Algorithms

One very useful division of parallel algorithms is the division between those algorithms that exploit data parallelism and those that exploit control parallelism. As defined by Hillis, data parallel algorithms have a single thread of control that acts on many data items in parallel, while control parallel algorithms have many threads of control. Although the distinction is not always sharp, it does capture an important aspect of parallel computing.

Robert Cypher

Algorithm Design for Different Computer Architectures

Within the last ten years many who work on the development of numerical algorithms have come to realize the need to get directly involved in the software development process. Issues such as robustness, ease of use, and portability are standard fare in any discussion of numerical algorithm design and implementation. The portability issue, in particular, can be very challenging. As new and exotic architectures evolve they will embrace the notions of concurrent processing, shared memory, pipelining, etc. in order to increase performance. The portability issue becomes formidable indeed as different architectural designs become reality. In fact, it is very tempting to assume that an unavoidable byproduct of portability must be an unacceptable degradation in the level of efficiency on a given variety of machine architecture. We contend that this assumption is erroneous and that its widespread adoption could seriously hamper the ability to effectively utilize machines of the future.

Jack J. Dongarra, Danny C. Sorensen

Neural Networks and Computer Science

My “position paper” can be summed up in the following sentence: There are strong scientific, technological and political reasons why massively parallel, (neural network) computation should be integrated into mainstream computer science. What follows is elaboration.

Jerome A. Feldman

Theoretical Problems in the Design of Tools to Aid in the Construction of Parallel Programs

The subject of parallel computation has been around for almost 20 years and, until very recently, it has been an exotic subdiscipline of computer architecture and algorithm analysis. However, recently we have seen a fundamental shift in the role of parallelism studies in both academic computer science and in the computer industry. In particular, parallel computation issues are now a significant part of research in most branches of computer science and the industry sees parallelism as the only answer to the next generation of high performance machines. Because we can now build parallel systems with relative ease and because application programmers are now intensively involved in the use of these machines, the nature of basic research has shifted focus. In the past, when there were no real machines available, all we could do was make theoretical studies of parallel algorithm complexity and debate the merits of switching network design. Now the discipline more closely resembles other sciences in that there is a very active experimental branch of computer science that is busy testing new design theories and algorithms. The experimental work is driving new investigations of theoretical questions that have arisen from our need to fully understand this first generation of parallel hardware.

Dennis Gannon

Towards More Flexible Architectures

Communication and synchronization costs constitute two major sources of overhead in parallel processing systems. Multiprocessor architectures provide hardware mechanisms to reduce the costs associated with communication and synchronization. Often there is a single mechanism provided for handling a certain task. Although this mechanism may provide acceptable performance for the average case, it may not be the ideal choice for each and every case. A more flexible architecture would provide several mechanisms, allowing the programmer and/or compiler to choose the best suited policy. In this way, the architecture can efficiently support several programming paradigms. In addition, there is synergy in using several mechanisms within the same program. I argue that such flexibility can be realized without excessive increase in hardware complexity or loss in performance. The following sections are written from the viewpoint of augmenting shared memory architectures.

Kourosh Gharachorloo

Towards Better Shared Memory Programming Models

In an often-cited paper, A. Karp [1] commented on the “sorry state” of parallel programming, lamenting the difficulties in programming and debugging parallel programs on existing programming models. Indeed, a variety of programming models have been proposed with not entirely satisfactory results. In this note, we focus on the world of massively parallel shared memory machines and examine programming models for these machines. A shared memory parallel computer is one in which the processors communicate by reading and writing data words to a shared memory. We consider shared memory parallel computers with the following characteristics. Each processor has some fast, relatively small, private memory (e.g. registers, cache). There is a large shared memory composed of a series of memory banks that are accessible by all processors via an interconnection network. Each processor has one such memory bank which is local to it, i.e. it can access the bank without going through the interconnection network. The access time to a memory bank for a processor depends in part on the lengths of the links traversed and the number of hops in the path from the processor to the bank. Special coprocessors take care of interprocessor communication: both programmers and processors are relieved of the burden of routing the parallel requests to shared memory, including handling intermediate hops of requests traveling to their destinations. In order to overcome the high latency of accessing global memory, the parallel computer supports the pipelining of global memory accesses through the network. This implies that the interconnection network has sufficient bandwidth to handle the multiple requests per processor.

Phillip B. Gibbons

New Cost Measures in Graph Embeddings

The problem of efficiently simulating an algorithm designed for an n-processor architecture G on an m-processor architecture H is a fundamental problem in parallel processing. Significant contributions to this problem and the related problem of studying the computational equivalence between interconnection networks have been made by viewing both architectures as graphs and applying concepts and solution techniques used in graph em-beddings [AR82, BCHLR88, BCLR86, HMR83, KA86, R79]. While the conventional cost measures in graph embeddings (i.e., dilation, load and expansion) can easily be used to determine the maximum amount of time required to simulate one step of G on H, they do not capture other important situations arising in a parallel environmenmt.

Susanne E. Hambrusch

Parallel Computing: Perspectives and Prospects

The trend toward parallel processing is driven by both scientific and economic factors. At the high end of the performance range, parallelism is the only answer to the fundamental limits on computational speed that are being rapidly approached by today’s supercomputers. In the lower and middle performance ranges, parallel architectures assembled from commodity parts, such as microprocessors and memory chips, provide the most cost effective systems. These trends are mutually supportive, as the lessons learned and software developed are transferred across systems.

Michael T. Heath

Challenges in Multiprocessor Architectures

If one was to read a set of multiprocessor architecture papers, you might well conclude that the task of building a multiprocessor amounted to little more than selecting your favorite 10-mips microprocessor and deciding how to hook up 1,000 of them. Your machine could deliver 10,000 mips — at least it could if you selected the right benchmark! Indeed, if you could supply the data to these processors and keep them busy doing useful work, you might actually be able to get a reasonable amount of performance from this machine.

John Hennessy

Making Parallel Computers Usable:

A Software Environment for Parallel Computer Vision

One of the basic issues in parallel processing has to do with how knowledge gained from past experience can be used effectively. From the point of view of a user of parallel systems, specific areas in which prior experience should be able to play a major role include: the ability to make effective use of libraries of previously developed algorithmsthe efficient mapping new algorithms to parallel architectures, where efficient applies both to the resulting mapping and the process by which the mapping is donethe development of new algorithmsthe rapid assessment of proposed and new architectures for their suitability for a given application.

Leah H. Jamieson

A Position Paper on Parallel Computation

The research of the past several years has produced a substantial body of knowledge concerning the design of parallel algorithms and their implementation on various architectures. In many ways the enterprise has been highly succesful: a number of fundamental principles of parallel algorithm design and parallel communication have emerged, and the foundations of a rich mathematical theory have been laid. Despite these successes, the theoretical work has had relatively little influence on developments in parallel computer architecture, or on the application of parallel computation techniques to fields such as computer vision, artificial intelligence, combinatorial optimization and scientific computing. We must continue to work on the development of models of parallel computation that are sufficiently abstract to support a theoretical investigation, and yet reflect the objectives, constraints and design considerations that arise in practice.

Richard M. Karp

Efficient Parallel Algorithms: Theory and Practice

There are two main communities of parallel algorithm designers: the theoreticians and the practitioners. Theoreticians have been developing algorithms with narrow theory and little practical importance; in contrast, practitioners have been developing algorithms with little theory and narrow practical importance. This note expands on the previous statement, and discusses a theory of parallel computation that we believe has wide practical importance.

Clyde P. Kruskal

What is the Right Model for Designing Parallel Algorithms?

In some ways, the title of this position paper is a moot question. The reason is that most all parallel algorithms are described in terms of the PRAM abstraction, which is a practice that is not likely to change in the near future. Although there are many types of PRAMs, and there is much debate over what kind of concurrent reading and writing (if any) is most reasonable, the basic idea of the PRAM abstraction is the same: the underlying network architecture of the parallel machine is ignored, and packet routing is given as a unit cost primitive.

Tom Leighton

Beyond Parallel Random-Access Machines

The fundamental paradigm in theoretical computer science is to propose a formal model of computation, and then to analyze the complexity of different problems in the model. However, if the model fails to reflect constraints in the real world, these analyses have limited practical importance. The parallel random-access machine (PRAM) is by far the most popular model of parallel computation. Hundreds of algorithms have been designed for the PRAM. Yet, the PRAM ignores contraints on communication imposed by modern electronic packaging technology.

Bruce Maggs

Are Pointer-Based Parallel Algorithms Realistic?

Most if not all the recent theoretical work in parallel algorithm design has focus on what I shall refer to as pointer-based algorithm design. By this, I mean those algorithms that spend most of their time chasing pointers. Included in pointer-based sequential problems/algorithms are most graph algorithms such as depth-first search, breadth-first search, and maximum flow. These problems are know to be parallelizable. However, the algorithms do not use pointers, but rather rely on dense matrix operations and thus use an unrealistic number of processors. The theory community realizes that these algorithms at the present time are not efficient. On the other hand, many graph problems can be performed in parallel with only a modest increase in total work done, ie, (time) X (number of processors). These problems graph connectivity, finding maximal independent sets in a graph, and list ranking. Algorithms for these problem are often referred to as processor efficient algorithms, modestly known as “optimal” algorithms. There is a substantial effort underway in the theory community to find additional optimal algorithms. All the algorithms found so far use pointerchasing to decrease the number of processors used.

Gary L. Miller

Opportunities and Constraints of Parallel Computing

As the speed of uniprocessors approaches the theoretical limit, it is clear that parallel processing is the only way to keep increasing the speed of computation. As a matter of fact, since very expensive technology is required to keep increasing the speed of uniprocessors, parallel processing using less expensive technology offers a cost-effective alternative for high-speed computation. A number of commercial parallel machines have appeared on the market, but parallel computing has not taken off, and still remains the purvey of a few researchers in industrial and academic labs. Why? There are a number of reasons. A consideration of them points to potential opportunities and constraints in parallel computing.

K. Moidin Mohiuddin

Adaptive Routing in Multicomputer Networks

Multicomputer Networks Message-passing concurrent computers, more commonly known as multicomputers, such as the Caltech Cosmic Cube [1] and its commercial descendents, consist of many computing nodes that interact with each other by sending and receiving messages over communication channels between the nodes [2]. The existing communication networks of the second-generation machines such as the Ametek 2010 employ an oblivious wormhole routing technique [6,7] which guarantees deadlock freedom. The network performance of these highly evolved oblivious techniques have reached a limit of being as fast as physically possible while capable of delivering, under random traffic, a stable maximum sustained throughput of ≈ 45 to 50% of the limit set by the network bisection bandwidth. Any further improvements on these networks will require an adaptive utilization of available network bandwidth to diffuse local congestions.

John Y. Ngai, Charles L. Seitz

New Languages with Implicit Parallelism and Heap Storage are Needed for Parallel Programming

As we increase the number of nodes in a parallel machine, the granularity of the tasks must decrease.1Larger machines also have longer latencies. To avoid idling a processor during a remote communication, it must switch to another task, which, in turn, may also engage in a remote communication, and so on. Thus, we need a large pool of tasks per processor so that it is likely that there are always some that are ready to execute.

Rishiyur S. Nikhil

Very High Level Parallel Programming

It is becoming increasingly clear that high level programming languages are a prerequisite for the more widespread use of parallel computers. Can we design high level programming languages that are convenient to program with and also lead to efficient execution? One high level model that is frequently used for algorithm design is the PRAM. While the PRAM has been found to be very convenient to program, it is substantially different from realistic parallel computers. Thus it is possible that efficient algorithms for PRAMs may not necessarily translate to efficient algorithms for realistic machines. One possibility, then, is to base high level languages on a model that is intermediate between the ideal PRAM model and the more realistic, distributed parallel computation models.

Abhiram Ranade

A Case for Randomized Parallel Algorithms

Randomization was formally introduced by Rabin[6] and independently by Solovay & Strassen[8] as a tool for improving the efficiency of certain algorithms. In a nutshell, a randomized algorithm uses coin-flips to make decisions at different steps of the algorithm. Therefore a randomized algorithm is actually a family of algorithms where each member of this family corresponds to a fixed sequence of outcomes of the coin-flip. Two of the most commonly used forms of randomization in literature are the Las Vegas algorithms and Monte Carlo algorithms. The former kind ensures that the output of the algorithm is always correct — however only a fraction (usually greater than 1/2) of the family of algorithms halt within a certain time bound (as well as with respect to some other resources like space). In contrast, the Monte Carlo procedures always halt in a pre-determined time period; however the final output is correct with a certain probability (typically > 1/2). This lends itself very naturally to decision algorithms (Rabin’s primality testing being a good example). For the purpose of this discussion we shall limit ourselves to the Las Vegas algorithms which have been more popular with the algorithm designers. For a general algorithm which produces more than just ‘yes-no’ output, the precise meaning of an incorrect output becomes subjective; for example we may need to know how close are we to the correct output in order to decide if the output is acceptable. Although, this is one of the reasons for bias towards Las Vegas algorithms, the use of either kind of algorithms depends on the particular application.

John H. Reif, Sandeep Sen

Designing and Using Parallel Architectures

The world of parallel architectures badly needs methodologies for: 1.assessing the relative powers of parallel architectures — by simulating architecture A on architecture B efficiently on general computationson important specific classes of algorithms2.efficiently and effectively “programming” a given architecture — by seeking algorithmic paradigms, not ad hoc algorithms, via: simulating “ideal” architecturesexploiting algorithmic structure3.accommodating the technology used to implement a given architecture — by achieving graceful degradation of computational efficiency and: preservation of architectural structure(to preserve programmability and program efficiency);selective salvage of fault-free parts(to enhance computational efficiency).

Arnold L. Rosenberg

The Tower of Babel in Parallel Computing

Parallel computing is a relatively new area of research and development. In spite of being young, parallel computing is already a fairly messy field. At times, I feel that the field is in such a state of confusion that we can hardly find a large group of people talking the same technical language. Parallel computation has attracted the work of a diverse community of scientists and engineers, but this cannot be the only reason for the present mess.

Jorge L. C. Sanz

Why Are All Us Parallel Numerical Analysts Doing Parallel Software Tools?

Anyone attending (parallel) numerical computation conferences recently is likely to have noticed that a decent number of numerical analysts are actually working on software tools for parallel computation, such as languages, interfaces, modeling systems, visualization tools, or other aspects of programming environments. There are several possible explanations for this phenomenon. Among them: 1.There is no interesting research to be done on parallel numerical algorithms themselves.2.All the interesting questions in parallel numerical computation are computer science issues.3.The numerical analysts are doing this work because the “real” computer scientists haven’t, or won’t.

Robert B. Schnabel

Portraying the Dynamics of Concurrent Programs

Just as designers of concurrent programs and algorithms for multiple-process programs with shared variables may find the analysis or portrayal of the behavior of these programs clarified by using a simplified model of the shared-memory multiprocessor (eg, the PRAM), designers of concurrent programs and algorithms for multiple-process programs that operate by message passing can benefit from using a simplified model of the message-passing multicomputer. My students and I have developed informally what we believe is a reasonably pleasing and physically credible model of the execution of reactive message-passing programs. This sweep model is described in detail in [1], and its essentials are summarized in [2].

Charles L. Seitz

Remarks on the Directions of Research in Parallel Computing

These remarks are intended to provoke discussion about what might accelerate the evolution of parallel computation. Such a general question deserves investigation well beyond the scope of this informal note, which is aimed at only two or three specific topics. It also needs the input from experts in many areas of computer science and parallel computation. Accordingly, forgiveness and understanding are requested for any and all errors and omissions in this necessarily limited presentation.

Alan Siegel

Parallelism, Compiler Optimization, and Deterministic Scheduling Theory

As massively parallel machines come into existence, people will be writing algorithms and code tailored to these machines. Nonetheless, there remains a considerable need for optimizing parallel compilers. One inescapable fact is that there is a tremendous amount of code already written for sequential machines. Since producing software can be very costly, there is an understandable reluctance to rewrite already working code to make it run efficiently on a parallel computer. This problem is exacerbated by the phenomenon of unknown side effects. It can be a risky proposition to rewrite code which frequently is poorly documented, if it is documented at all, when you don’t know precisely what the code does.

Barbara Simons

Time for the 4GLs?

Our expectations to use more and more realistic mathematical models of complex phenomena grow as computers become bigger and faster. Highly parallel computers (comprising 1000 or more processors) offer attractive potential for gains in performance, but the task of exploiting the parallelism still lies with the applications programmer. The combination of more complex (alternately, more realistic) algorithms and complex architectures make general solutions to the larger problem difficult to reach.

Stephen Skedzielewski

Scalable Shared Memory MIMD Computers

Why are shared memory MIMD computers with many processors difficult to implement? The answer depends in part on the definition of the term “shared memory”, but there is probably broad agreement that any sort of shared memory system with hundreds of processors is a challenge. The main problem is the latency associated with memory access and its consequences for processor performance. There are two possible solutions to this problem, namely latency avoidance and latency tolerance. Latency avoidance is accomplished by arranging a processor’s memory accesses so that most of them are to locations that are both spatially and temporally nearby. Latency tolerance is brought about through the use of additional parallelism. Both ideas have been used in shared memory systems, with varying success.

Burton J. Smith

Parallel Computation Models — Some Useful Questions

The Webster New Collegiate Dictionary defines the word model as, among other things, (1) a mathematical description of an entity; and (2) a pattern of something to be made. Scientists use the word in the first sense; their models are descriptive. Engineers use the word in the second sense; their models are prescriptive. Computer scientists, which are a little bit of both, use it in a hybrid sense: A computing model is an idealized description of real computers. This is the prevaling sense in areas with well established paradigms, such as sequential computing (with the Von Neumann computer paradigm). Computing models are also a prescription of how computers could, or should be built. This aspect is more important in parallel processing, where various paradigms are competing. Has complexity theory a role to play in the selection of such prescriptive model?

Marc Snir

Parallel Algorithms and Programming Languages

It is possible to identify two principles to guide research and development in parallel algorithms and programming languages. Parallel algorithms research should focus on nonshared memory models of parallel computation.Parallel programming languages should “contain” a nonshared memory programming model.

Lawrence Snyder

Programming High Performance Parallel Machines

Many researchers are working on the problem of providing effective programming paradigms for parallel machines. Much of this research is motivated by the difficulties involved in automatically parallelizing programs that were initially written with a sequential machine model in mind. Another major motivation for the research is to provide models that are more flexible so that both the programmer and the compiler can best utilize the capabilities of a target machine. Such models may exhibit explicit and/or implicit parallelism, and offer a shared or distributed model of memory (or may offer no memory model at all). No matter what features the models exhibit, the key is that all such models take into account the potential parallel nature of a computation, so as not to constrain the execution of a program written within the model to necessarily be sequential. Unfortunately, it is extremely difficult to convince users to change their programming model without convincing them that the benefits of learning a new paradigm, such as potential performance gains and greater productivity, greatly outweigh the difficulties involved in changing the way they look at solving their problems.

Alan Sussman

Optimally Universal Parallel Computers

A key property of the von Neumann architecture for sequential computers is efficient universality. It can simulate arbitrary programs written in appropriate high-level languages in time proportional to that which they would take if special-purpose sequential machines were built for each of them. This makes possible standardized languages and transportable software, without which the level of pervasiveness that computers have now reached would be difficult to imagine.

L. G. Valiant

Basic Research in Parallel Computing Must Address Both Short- and Long-Term Objectives

Although we as computer scientists are usually loath to admit it, in large measure the value of our “science” is judged by how well it supports more established branches of science and engineering. And though the egos of some computer scientists may be bruised by the fact that other sciences view computing as a service discipline, in actuality that view offers computer science an unmatched source of research problems, and opportunities to contribute to the solution of deep questions about the natural and man-made world.

André M. van Tilborg

PRAM Algorithms: Teach and Preach

General purpose parallel computation raises two main challenges: 1.How to build parallel computers?2.How to use them effectively?

Uzi Vishkin

Too Much “Research” on Parallel Numerical Algorithms

In the early 1970’s, there were a few isolated pockets of research activity focused on parallel algorithms. The advent of the Illiac IV stirred some additional interest but it remained for two events of the 1980’s to cause the explosion of interest that we see today. One of these events was the availability of relatively inexpensive parallel hardware such as the Intel iPSC. The other, spurred on by numerous federally supported research studies, was the advent of agency funds for parallel computing research.

Robert G. Voigt
Weitere Informationen