
About this Book

It is universally accepted today that parallel processing is here to stay but that software for parallel machines is still difficult to develop. However, there is little recognition of the fact that changes in processor architecture can significantly ease the development of software. In the seventies, the availability of processors that could address a large name space directly eliminated the problem of name management at one level and paved the way for the routine development of large programs. Similarly, today, processor architectures that can facilitate cheap synchronization and provide a global address space can simplify compiler development for parallel machines. If the cost of synchronization remains high, the programming of parallel machines will remain significantly less abstract than programming sequential machines. In this monograph Bob Iannucci presents the design and analysis of an architecture that can be a better building block for parallel machines than any von Neumann processor.

There is another very interesting motivation behind this work. It is rooted in the long and venerable history of dataflow graphs as a formalism for expressing parallel computation. The field has bloomed since 1974, when Dennis and Misunas proposed a truly novel architecture using dataflow graphs as the parallel machine language. The novelty and elegance of dataflow architectures has, however, also kept us from asking the real question: "What can dataflow architectures buy us that von Neumann architectures can't?" In the following I explain in a roundabout way how Bob and I arrived at this question.

Table of Contents

Frontmatter

Chapter One. The Problem Domain

Abstract
This text considers the space of architectures which fit the description of scalable, general purpose parallel computers. The term PARALLEL COMPUTER denotes a collection of computing resources, specifically, some number of identical, asynchronously operating processors, some number of identical memory units, and some means for intercommunication, assembled for the purpose of cooperating on the solution of problems. Such problems are decomposed into communicating parts which are mapped onto the processors. GENERAL PURPOSE means simply that such computers can exploit parallelism, when present, in any program, without appealing to some specific attribute unique to some specific problem domain. SCALABILITY implies that, given sufficient program parallelism, adding hardware resources will result in higher performance without requiring program alteration. The scaling range is assumed to be from a single processor up to a thousand processors. Parallelism significantly beyond this limit demands yet another change in viewpoint for both machines and languages.
Robert A. Iannucci

Chapter Two. The Importance of Processor Architecture

Abstract
Current-day multiprocessors represent the general belief that processor architecture is of little importance in designing parallel machines. In this chapter, the fallacy of this assumption is demonstrated on the basis of the two fundamental issues of latency and synchronization.
Robert A. Iannucci

Chapter Three. A Dataflow / von Neumann Hybrid

Abstract
In the previous chapter, it was concluded that satisfactory solutions to the problems raised for von Neumann architectures can only be had by altering the architecture of the processor itself. It was further observed that dataflow architectures do address these problems satisfactorily. Based on observations of the near-miss behavior of certain von Neumann parallel processors (e.g., the Denelcor HEP [40, 52]), it is reasonable to speculate that dataflow and von Neumann machines actually represent two points on a continuum of architectures. The goal of the present study is to develop a new machine model which differs minimally from the von Neumann model, yet embodies the same latency and synchronization characteristics which make dataflow architectures amenable to parallel processing.
Robert A. Iannucci

Chapter Four. Compiling for the Hybrid Architecture

Abstract
This chapter considers the task of transforming dataflow program graphs (DFPGs) into partitioned graphs, and thence into PML. Section 4.1 extends the work of Section 1.1 by completing the description of DFPGs. Section 4.2 discusses the issues involved in generating partitioned code from DFPGs. Section 4.3 presents the design of a suitable code generator.
Robert A. Iannucci

Chapter Five. Analysis

Abstract
This chapter presents experimental results from the first set of emulation studies of the hybrid architecture along with a comparison to similar results from studies of the TTDA. Section 5.1 considers the behavior of a collection of benchmark programs as compiled for the hybrid architecture and as executed by the idealized machine. The results reflect the characteristics of the programs subject to the hybrid partitioning constraints. A comparison is made to the TTDA which shows how the hybrid’s less powerful instructions can be used in place of TTDA instructions with little change in dynamic instruction counts. Also in this section, the costs and benefits of dynamic loop unfolding are studied. Section 5.2 examines the behavior of the realistic model using these same benchmark programs. The costs of aborted instructions (due to synchronization tests which fail) and multi-ported access to the local memory are considered. Data cache effectiveness is studied.
Robert A. Iannucci

Chapter Six. Conclusion

Abstract
This study is concluded by reviewing the work done to date in unifying the dataflow and von Neumann views of computer architecture. The first section summarizes the present work. The second section presents directions for future work. The last section analyzes related efforts by other researchers in light of the present work.
Robert A. Iannucci

Backmatter

Further Information