
1988 | Book

Parallel Programming and Compilers

Author: Constantine D. Polychronopoulos

Publisher: Springer US

Book Series: The International Series in Engineering and Computer Science


About this book

The second half of the 1970s was marked by impressive advances in array/vector architectures and in vectorization techniques and compilers. This progress continued, with a particular focus on vector machines, until the middle of the 1980s. The majority of supercomputers during this period were register-to-register (Cray 1) or memory-to-memory (CDC Cyber 205) vector (pipelined) machines. However, the increasing demand for higher computational rates led naturally to parallel computers and software. Through the replication of autonomous processors in a coordinated system, one can skip over performance barriers due to technology limitations. In principle, parallelism offers unlimited performance potential. Nevertheless, it is very difficult to realize this performance potential in practice. So far, we have seen only the tip of the iceberg called "parallel machines and parallel programming". Parallel programming in particular is a rapidly evolving art and, at present, highly empirical.

In this book we discuss several aspects of parallel programming and parallelizing compilers. Instead of trying to develop parallel programming methodologies and paradigms, we often focus on more advanced topics, assuming that the reader has an adequate background in parallel processing. The book is organized in three main parts. In the first part (Chapters 1 and 2) we set the stage and focus on program transformations and parallelizing compilers. The second part (Chapters 3 and 4) discusses scheduling for parallel machines from the practical point of view (macro- and microtasking and supporting environments). Finally, the last part (i.e. …

Table of Contents

Frontmatter
Chapter 1. Parallel Architectures and Compilers
Abstract
The ever-increasing demand for computational power in many areas of science and engineering is an undeniable fact. In the first few generations of computer systems, increased computational speed was obtained primarily through advances in technology that allowed a higher degree of integration and faster circuits. As technology approaches certain physical limitations, parallelism seems to be the most promising alternative for satisfying the ever-increasing demand for computational speed. Through the replication of computational elements that are interconnected in some regular structure, programs can execute on multiple processors and access multiple memory banks; in other words, computations (and memory transfers) can be performed in parallel. Parallelism is an old concept and was applied to a limited degree in some of the early computer systems, first in the form of I/O channels that allowed overlapped computation and I/O, and later in the form of multiple functional units that could operate in parallel. Technology constraints and software limitations, however, made it infeasible to design and build parallel machines until late in the 1970s.
Constantine D. Polychronopoulos
Chapter 2. Program Restructuring for Parallel Execution
Abstract
The effectiveness of automatic program restructuring depends on how accurately we can compute data dependences. In general, data dependences tell us how data are computed in a program and how they are used. This information is then used to parallelize the program automatically. The resulting parallel program must, during its parallel execution, compute and use data in the order implied by the various data and control dependences; this is necessary to guarantee correctness of the results. The material presented in this chapter requires some knowledge of data dependences and associated concepts. Before giving a short introduction to data dependences, however, we need to establish some basic notation.
Constantine D. Polychronopoulos
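To make the dependence terminology concrete, here is a minimal C sketch (my own illustration, not an example from the book) showing a loop-carried flow dependence, a loop-carried anti dependence, and a loop with no loop-carried dependence at all; only the last loop can be executed fully in parallel without further transformation.

/* Illustrative sketch (not taken from the book): dependence types a
 * restructuring compiler must preserve, shown on small C loops.       */
#include <stdio.h>
#define N 8

int main(void) {
    double a[N + 1], b[N], c[N];
    for (int i = 0; i <= N; i++) a[i] = i;
    for (int i = 0; i < N; i++)  b[i] = 2.0 * i;

    /* Flow (true) dependence: iteration i reads a[i-1], which iteration
     * i-1 writes, so the iterations cannot simply run in parallel.      */
    for (int i = 1; i < N; i++)
        a[i] = a[i - 1] + b[i];

    /* Anti dependence: iteration i reads a[i+1], which iteration i+1
     * then overwrites; renaming (writing into a fresh array) removes it. */
    for (int i = 0; i < N; i++)
        a[i] = a[i + 1] + b[i];

    /* No loop-carried dependence: each iteration touches distinct
     * elements, so all N iterations may execute in parallel.            */
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];

    printf("c[N-1] = %f\n", c[N - 1]);
    return 0;
}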
Chapter 3. A Comprehensive Environment for Automatic Packaging and Scheduling of Parallelism
Abstract
The state of the art in vectorizing and parallelizing compilers has overcome the language barrier and offers a viable (and not necessarily exclusive) alternative to parallel programming. The evolution of parallel programming languages has not kept up with the evolution of parallel architectures [KDLS86], [GGKM83], [Chen83], [MiUc84]. Even though this may change in the future, we are currently confined to enhanced versions of existing languages, which provide neither the power nor the flexibility needed to program modern parallel supercomputers [ANSI86], [GuPL88], [MeRo85]. Under these circumstances, parallel programming remains a highly complex and empirical art, even for the skillful programmer. Parallelizing compilers offer a solution to both of these problems: a serial program can be automatically vectorized and/or parallelized, and in many cases automatic concurrentization can proceed to the maximum possible degree.
Constantine D. Polychronopoulos
Chapter 4. Static and Dynamic Loop Scheduling
Abstract
Loops are the largest potential source of program parallelism, and the problem of using several processors for the fast execution of complex parallel loops has attracted considerable attention in the last few years [PoKu87], [PoKP86], [TaPe86], [KrWe85], [PaKL80], [Cytr84]. A key problem in designing and using large parallel processor systems is determining how to schedule independent processors to execute a parallel program as fast as possible. In this chapter we present a practical method for obtaining very efficient schedules of parallel loops on shared-memory parallel machines. The problem of mapping or assigning different tasks of the same program to different processors (otherwise known as "scheduling") has become more pressing than ever with the recent introduction of many parallel machines to the market.
Constantine D. Polychronopoulos
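As a hedged illustration of the static/dynamic distinction (a sketch under my own assumptions, not the book's algorithm), the following C program runs the same loop twice: once with a static block partition fixed at spawn time, and once with a simple fixed-chunk self-scheduling scheme in which each processor repeatedly claims the next CHUNK iterations from a shared atomic counter. The names static_worker, dynamic_worker, CHUNK, and the trivial loop body are invented for the example; refinements such as guided self-scheduling vary the chunk size at run time instead of fixing it.

/* Sketch: static block scheduling vs. fixed-chunk self-scheduling,
 * using POSIX threads and a shared atomic iteration counter.        */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define N 1000          /* loop iterations      */
#define P 4             /* number of processors */
#define CHUNK 16        /* dynamic chunk size   */

static double x[N];
static atomic_int next_iter = 0;          /* shared self-scheduling counter */

static void body(int i) { x[i] = 0.5 * i; }   /* stand-in loop body */

/* Static scheduling: the iteration space is split into P contiguous blocks
 * before execution; no run-time coordination, but load may be unbalanced.  */
static void *static_worker(void *arg) {
    int id = (int)(long)arg;
    int lo = id * (N / P);
    int hi = (id == P - 1) ? N : lo + N / P;
    for (int i = lo; i < hi; i++) body(i);
    return NULL;
}

/* Dynamic (self-)scheduling: each processor repeatedly grabs the next CHUNK
 * iterations from the shared counter; better balance, more run-time overhead. */
static void *dynamic_worker(void *arg) {
    (void)arg;
    for (;;) {
        int lo = atomic_fetch_add(&next_iter, CHUNK);
        if (lo >= N) break;
        int hi = (lo + CHUNK < N) ? lo + CHUNK : N;
        for (int i = lo; i < hi; i++) body(i);
    }
    return NULL;
}

int main(void) {
    pthread_t t[P];

    for (long p = 0; p < P; p++) pthread_create(&t[p], NULL, static_worker, (void *)p);
    for (int p = 0; p < P; p++)  pthread_join(t[p], NULL);

    atomic_store(&next_iter, 0);
    for (long p = 0; p < P; p++) pthread_create(&t[p], NULL, dynamic_worker, (void *)p);
    for (int p = 0; p < P; p++)  pthread_join(t[p], NULL);

    printf("x[N-1] = %f\n", x[N - 1]);
    return 0;
}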
Chapter 5. Run-Time Overhead
Abstract
The overhead involved in applying many processors simultaneously to the same task can be very significant [PoKu87], [Poly86], [Rein85], [Cytr85]. So far, most existing parallel processor systems have not addressed the overhead issue adequately, nor have they taken it into account in either the compiler or the hardware. In Chapter 3 we discussed at length the impact of run-time overhead on usable parallelism: the higher the overhead, the less one can exploit the parallelism of a given program. There are two issues pertaining to run-time overhead. The first concerns designing parallel machines and their software around the goal of minimizing the run-time overhead incurred during parallel execution. The second concerns methodologies for making the best use of existing parallel computers: in a given machine-program combination, the limitations (overhead, etc.) are known and cannot be altered. The former issue was analyzed in Chapter 3 from the software point of view; the latter is the focus of this chapter.
Constantine D. Polychronopoulos
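As a back-of-the-envelope illustration of why overhead limits usable parallelism (my own simplification, not a formula taken from the chapter), suppose a computation with serial time $T_1$ is split evenly across $p$ processors and that starting and synchronizing each processor costs a fixed overhead $\sigma$. Then

$$T_p \approx \frac{T_1}{p} + \sigma p, \qquad S_p = \frac{T_1}{T_p} \approx \frac{T_1}{T_1/p + \sigma p},$$

which is maximized at $p^* \approx \sqrt{T_1/\sigma}$, so the achievable speedup is bounded by roughly $\tfrac{1}{2}\sqrt{T_1/\sigma}$ no matter how many processors are available.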
Chapter 6. Static Program Partitioning
Abstract
In this chapter we attempt to define more precisely the problems of program partitioning and interprocessor communication, to model them, and to identify and quantify the variables involved. We mentioned earlier that program partitioning, interprocessor communication, parallelism, and data dependences are all closely related. In shared-memory parallel machines, interprocessor communication is not explicit but happens through the sharing of specific memory locations: processors read from and write to shared memory under the control of synchronization, which enforces a predetermined order. In message-passing systems (e.g., hypercube architectures), however, communication between processors takes place through explicit sends and receives of packets of data. Since there is no shared memory in such systems, a given processor requests data from another specific processor in the distributed system; the latter processor must then assemble a packet with the requested data and forward it to the requesting processor, and the message may be routed through other intermediate processors before it reaches its destination. Communication overhead is therefore far more critical on distributed systems than on shared-memory systems. As discussed in Chapter 3, a consequence of this is that parallel tasks in distributed systems are of coarse granularity.
Constantine D. Polychronopoulos
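To illustrate the request/reply pattern described above, here is a small C sketch that uses MPI as a modern stand-in for the explicit send/receive primitives the chapter refers to (MPI postdates the book, and these names are not the book's notation): rank 0 asks rank 1 for a block of data, and rank 1 assembles the packet and sends it back. Run with at least two ranks, e.g. mpirun -np 2.

/* Hedged sketch of explicit message-passing communication. */
#include <mpi.h>
#include <stdio.h>

#define BLOCK 64

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        int want = BLOCK;                 /* request: "send me BLOCK values" */
        double buf[BLOCK];
        MPI_Send(&want, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        MPI_Recv(buf, BLOCK, MPI_DOUBLE, 1, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 0 received %d values, buf[0] = %f\n", want, buf[0]);
    } else if (rank == 1) {
        int want;
        double packet[BLOCK];
        MPI_Recv(&want, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        for (int i = 0; i < want; i++) packet[i] = 10.0 * i;  /* assemble packet */
        MPI_Send(packet, want, MPI_DOUBLE, 0, 1, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}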
Chapter 7. Static Task Scheduling
Abstract
Depending on the granularity of the different parts of a program, we have low- and high-level spreading for fine- and coarse-grain program modules, respectively. Most instances of the spreading problem are NP-complete [GaJo79]. In this chapter we discuss optimal solutions for some instances of high-level spreading, and efficient heuristics for the intractable cases.
Constantine D. Polychronopoulos
Chapter 8. Speedup Bounds for Parallel Programs
Abstract
In this chapter we consider a more idealized model, according to which a program consists of a series of atomic operations. Based on this model, we discuss how one can compute speedup values for inter- and intra-task parallelism. We then consider nested doacross loops and give a generalized speedup formula. Let us state the assumptions and describe the model before proceeding to the derivation of the speedup formulas.
Constantine D. Polychronopoulos
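For orientation, here is a hedged example of the kind of speedup expression involved, under the simplest possible assumptions (uniform iteration time $b$, a constant delay $d$ between the starts of consecutive iterations, and as many processors as iterations); it is not necessarily the generalized formula derived in the chapter. For a doacross loop with $N$ iterations,

$$T_\infty = b + (N-1)\,d, \qquad S_\infty = \frac{N\,b}{b + (N-1)\,d} \;\longrightarrow\; \frac{b}{d} \quad (N \to \infty),$$

so the attainable speedup of a single doacross loop is limited by the ratio of the iteration time to the inter-iteration delay.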
Backmatter
Metadata
Title
Parallel Programming and Compilers
Author
Constantine D. Polychronopoulos
Copyright Year
1988
Publisher
Springer US
Electronic ISBN
978-1-4613-1077-8
Print ISBN
978-1-4612-8416-1
DOI
https://doi.org/10.1007/978-1-4613-1077-8