Skip to main content
main-content

Über dieses Buch

The papers in this book were presented at the Third Caltech Conference on Very Large Scale Integration, held March 21-23, 1983 in Pasadena, California. The conference was organized by the Computer Science Depart­ ment, California Institute of Technology, and was partly supported by the Caltech Silicon Structures Project. This conference focused on the role of systematic methodologies, theoretical models, and algorithms in all phases of the design, verification, and testing of very large scale integrated circuits. The need for such disciplines has arisen as a result of the rapid progress of integrated circuit technology over the past 10 years. This progress has been driven largely by the fabrica­ tion technology, providing the capability to manufacture very complex elec­ tronic systems reliably and at low cost. At this point the capability to manufac­ ture very large scale integrated circuits has exceeded our capability to develop new product designs quickly, reliably, and at a reasonable cost. As a result new designs are undertaken only if the production volume will be large enough to amortize high design costs, products first appear on the market well past their announced delivery date, and reference manuals must be amended to document design flaws. Recent research in universities and in private industry has created an emerg­ ing science of very large scale integration.

Inhaltsverzeichnis

Frontmatter

Invited Papers

Frontmatter

Practical Experience With VLSI Methodology

Abstract
Some VLSI typical problems and their solutions with the help of architectural concepts are presented. These principles are demonstrated on two experimental chips fabricated in our research process line. Regular structures for the control part of a VLSI processor are described in more detail.
Egon Hörbst, Gerd Sandweg, Stefan Wallstab

CAPRI: A Design Methodology and a Silicon Compiler for VLSI Circuits Specified by Algorithms

Abstract
This paper presents a design methodology and a silicon compiler for VLSI circuits specified by algorithms. The first part of this paper describes some principles to obtain optimized layout. The second part describes the design methodologies based on the use of predefined architectural templets selected for their efficiency. The last part describes the tools which are currently in development at the Computer Architecture group of IMAG. The efficiency of this methodology has been proved by its application in the design of industrial IC.
F. Anceau

Design of a High Performance VLSI Processor

Abstract
Current VLSI fabrication technology makes it possible to design a 32-bit CPU on a single chip. However, to achieve high performance from that processor, the architecture and implementation must be carefully designed and tuned. The MIPS processor incorporates some new architectural ideas into a single-chip, nMOS implementation. Processor performance is obtained by the careful integration of the software (e.g., compilers), the architecture, and the hardware implementation. This integrated view also simplifies the design, making it practical to implement the processor at a university.
John L. Hennessy, Norman P. Jouppi, Steven Przybylski, Christopher Rowen, Thomas Gross

Fundamental Issues in the Electrical Design of VLSI Circuits

Abstract
The Mead-Conway approach to VLSI provides a path for non-specialists to enter the world of VLSI design. The approach condenses most of the more esoteric aspects of circuits and devices into simpler rules of thumb, which, when applied in the context of the prescribed methodology, should result in functional designs. In addition, the simplifications have caused some experienced designers to reshape their own techniques and methodology towards a more clearly focused kernel.
Daniel W. Dobberpuhl

Circuit Timing

Frontmatter

Crystal: A Timing Analyzer for nMOS VLSI Circuits

Abstract
Crystal is a timing analyzer for nMOS circuits designed in the Mead-Conway style. Based on the circuit extracted from a mask set, Crystal determines the length of each clock phase and pinpoints the longest paths. The analysis is independent of specific data values and uses critical path techniques along with a simple RC model of delays. Several additional techniques are used to improve the speed and accuracy of the program, including separate up/down timing, static level assignment, flow control for pass transistors, and precharging.
John K. Ousterhout

TV: An nMOS Timing Analyzer

Abstract
TV is a timing analyzer for nMOS designs. Based on the circuit obtained from existing circuit extractors, TV determines the minimum clock duty and cycle times and verifies that the circuit obeys the MIPS clocking methodology. The delay analysis is an event driven simulation that only uses the values stable, rise, fall, as well as information about clock qualification. TV stresses fast running time, small user input requirements, and the ability to offer die user valuable advice. It calculates as much as possible statically, including the direction of signal flow, use, and clock qualification of all transistors.
Norman P. Jouppi

Optimizing Synchronous Circuitry by Retiming (Preliminary Version)

Abstract
This paper explores circuit optimization within a graph-theoretic framework. The vertices of the graph are combinational logic elements with assigned numerical propagation delays. The edges of the graph are interconnections between combinational logic elements. Each edge is given a weight equal to the number of clocked registers through which the interconnection passes. A distinguished vertex, called the host, represents the interface between the circuit and the external world.
This paper shows how the technique of retiming can be used to transform a given synchronous circuit into a more efficient circuit under a variety of difTerent cost criteria. We give an easily programmed O(|V|3lg|V|) algorithm for determining an equivalent circuit with the smallest possible clock period. We show how to improve the asymptotic time complexity by reducing this problem to an efficiently solvable mixed-integer linear programming problem. We also show that the problem of determining an equivalent circuit with minimum state (total number of registers) is the linear-programming dual of a minimum-cost flow problem, and hence can also be solved efficiently. The techniques are general in that many other constraints can be handled within the graph-theoretic framework.
Charles E. Leiserson, Flavio M. Rose, James B. Saxe

Routing and Interconnection

Frontmatter

A New Channel Routing Algorithm

Abstract
This paper presents a new algorithm for solving the two-layer channel routing problem with doglegging. Based on a set of intuitive and reasonable heuristics, the algorithm tries to obtain a channel routing configuration with a minimum number of tracks. For every benchmark problem tested, the algorithm gives a routing configuration with the smallest number of tracks reported in the literature.
Wan S. Chan

River Routing: Methodology and Analysis

Abstract
The problem of river routing across a channel is only a special case of more general routing configurations. Both its methodological and combinatorial characteristics can be extended in useful ways which will be explored in this paper. The two characteristics that we generalize here are planarity and grouping. Planarity means that the connections are realizable in one layer; ie the interconnection pattern of the nets is planar. Grouping means that the connections are made in order, that is to say that the routing of net i+l is adjacent, conceptually and preferably physically, to the routing of net I.
This paper investigates both placement and routing problems. First we provide a graph theoretical model that accomodates the interconnect specifications in a succinct manner, allowing us to find a placement that enables a planar routing pattern in linear time. Second we study problems of detailed routing, namely whether wires fit in the area allotted by a specific placement. Detailed planar routing of two-point nets for an entire chip (with rectangular modules) is shown to be NP-complete, whereas a polynomial time algorithm is given for detailed routing for a simple polygon (ie one without holes). Routability testing is shown to be easier than the actual generation of the wires. Finally we show how to view general channel routing as a superimposed collection of river routing problems.
Ron Y. Pinter

Area and Delay Penalties in Restructurable Wafer-Scale Arrays

Abstract
The penalties for restructuring wafer-scale arrays for yield enhancement are assessed. Each element of the fabricated array is assumed to be defective with independent probability p. A fixed fraction R of the elements are to be connected into a prespecified defect-free configuration by means of switched interconnections. The area penalty is determined by the required number of tracks per wiring channel t. Propagation delay is determined by the required maximum connection length d. It is shown that: Connection of RN fixed I/O ports to distinct nondefective elements from an N-element linear array requires d, t =θ(logN); Connection of RN pairs of elements from two N-element linear arrays requires only constant d and t; Connection of a chain of RN 2 elements from an N×N array also requires only constant d and t; Connection of a \(\sqrt R N \times \sqrt R N\) lattice from an N×N array requires \(d = \Omega (\sqrt {\log N} )\). Constant t suffices to connect a lattice if d=θ(logN). Algorithms are presented that connect any fraction R < l-p of the elements with probability approaching one as N increases. It appears that these results hold even under actual defect distributions.
Jonathan W. Greene, Abbas EI Gamal

Formal System Models

Frontmatter

Verification of VLSI Designs

Abstract
We present a method for formal proof of correctness of VLSI designs. Our approach is an adaptation of software verification techniques that works on circuit networks rather than program flowgraphs. The method may be viewed as a dual of Floyd’s method for program verification in which the roles of state and control are interchanged. We illustrate the approach with the semi-automatic verification of a simple NMOS design, and discuss its application to large scale VSLI designs. A proof of soundness of the method is presented in a forthcoming paper [ShS 83].
Robert E. Shostak

A Hierarchical Simulator Based on Formal Semantics

Abstract
Simulation consists of exercising the representation of a design on a general purpose computer. It differs from programming only because the ultimate implementation will be in a different medium, say a VLSI chip. In order for simulation to be in any sense effective, the simulated system must perform the same function as the ultimate implementation. A VLSI chip is a highly concurrent object; the simulation of such a chip amounts to programming a highly concurrent system. It follows that any demonstrably correct simulation technique will be one of the two types:
(1)
The entire design is represented as an implementation with objects which are abstract models of the medium at the bottom level (e.g. transistor model). The simulation operates on a representation which is a direct image of the fully instantiated implementation in the medium.
 
(2)
The design is represented as a hierarchy of implementations. Each level of implementation is constructed of objects which are abstract models of the implementation at the level below it. The simulation operates on a hierarchical representation where each level is refined by the level below it.
 
Marina C. Chen, Carver A. Mead

Trace Theory and the Definition of Hierarchical Components

Abstract
The relevance of trace theory to the design of VLSI circuits is discussed. We present an introduction to trace theory and to regular trace structures in particular. We show, in a number of examples, how trace theory can be used to reason about and to prove properties of hierarchical components.
Martin Rem, Jan L. A. van de Snepscheut, Jan Tijmen Udding

Deriving Circuits from Programs

Abstract
Introduces a theory of traces and a notation for programs consisting of a hierarchy of components. The trace structures of components and programs are regular and, hence, one might construct a finite state machine accepting a program’s traces. The chip area required to implement a finite state machine is, in the worst case, proportional to the number of states of the trace structure, [1], The number of states of the composite of some trace structures is, in the worst case, the product of the numbers of states of the composing trace structures. Hence, the chip area grows exponentially with the number of components in the program, which is not very attractive. Another disadvantage of the traditional finite state machines is that accepting traces is not very interesting: the purpose of a machine is to receive inputs and to generate outputs. In this paper the latter problem is solved by distinguishing, in the trace structures, between input and output symbols. The former problem is then solved by constructing a Mealy- or Moore-like finite state machine per component and by introducing a communications protocol between these machines. Hence, the circuit implementing a program consists of two parts: one part for the finite state machines and one part for their interconnections. In the worst case, the chip areas required for these parts exhibit linear and quadratic growth respectively with the number of components in the program.
Jan L. A. van de Snepscheut

System Building Blocks

Frontmatter

Self-Timed IC Design with PPL’s

Abstract
PPL’s [8] are a cellular IC design tool derived from PLA [9, 2] and SLA [6, 7] ideas. They allow IC designs to be created by placing logical symbols on a grid. Modules thus created may be interconnected by placing additional cells on the grid. A design may be checked with a simulator [5] which reads the file created in the process of laying out the design. When the designer is satisfied with a design he may finish the process of creating a die by placing cells from the PPL pad cell set on a larger grid around his design, and connecting power, ground and the input and output signals from the pads to the power, ground and 10 points on his design with metal and/or poly wires. The wires also lay out on a grid that assures proper spacing between them. The cell symbols are then automatically replaced by the poly, metal and diffusion layout that realizes them and the resulting composite is fractured and dumped to tape in the format for the intended pattern generator.
Alan B. Hayes

A Self-Timed Static RAM

Abstract
This paper presents the design of a self-timed static RAM. Although the memory array uses a conventional six-transistor static cell, extra circuitry is associated with each column and each row of the memory to make the memory self-timed. This circuitry detects several conditions: address line precharge complete, word line driven, bit line precharge complete, read complete, and write complete. To increase the read speed, each column of the memory uses an unclocked sense amplifier. The memory design includes a controller that implements a four-phase request/acknowledge interface. Although the memory is intended for use as part of a single-chip processor and not as a stand-alone chip, we have fabricated a 64 by 64 bit test chip and measured its performance.
Edward H. Frank, Robert F. Sproull

Design of the PSC: A Programmable Systolic Chip

Abstract
The programmable systolic chip (PSC) is a high performance, special-purpose, single-chip microprocessor intended to be used in groups of tens or hundreds for the efficient implementation of a broad variety of systolic arrays. For implementing these systolic arrays, the PSC is expected to be at least an order of magnitude more efticient than conventional microprocessors, llie development of the PSC design, from initial concept to a silicon layout, took slightly less than a year, This project represents an integration of many disciplines including applications, algorithms, architecture, microprocessor design, and chip layout. This paper describes the goals of the project, tlie design process, major design features and current status.
Allan L. Fisher, H. T. Kung, Louis M. Monier, Hank Walker, Yasunori Dohi

Special-Purpose Chip Architectures

Frontmatter

The VLSI Design of a Reed-Solomon Encoder Using Berlekamp’s Bit-Serial Multiplier Algorithm

Abstract
E.R. Berlekamp has developed for the Jet Propulsion Laboratory a bit-serial multiplication algorithm for the encoding of Reed-Solomon (RS) codes, using a dual basis over a Galois field. The conventional RS-encoder for long codes often requires look-up tables to perform the multiplication of two field elements. Berlekamp’s algorithm requires only shifting and exclusive-OR operations. It is shown in this paper that the new dual-basis (255, 223) RS-encoder can be realized readily on a single VLSI chip with NMOS technology.
T. K. Truong, L. J. Deutsch, I. S. Reed, I. S. Hsu, K. Wang, C. S. Yeh

A VLSI Chess Legal Move Generator

Abstract
Constructing a chess legal move generator (LMG) illustrates the design and evaluar tion of a variety of algorithms and the problems encountered implementing them in VLSI. Software-based algorithms for LMG and their possible hardware implementations are examined. Several new approaches exploiting parallelism and systolic structure are developed. The implementations of several algorithms are compared: the space, time, complexity, and feasibility tradeoffs provide interesting insights into the development of custom-designed VLSI circuits for non-numeric applications.
Jonathan Schaeffer, P. A. D. Powell, Jim Jonkman

New VLSI Architectures with Reduced Hardware

Abstract
This paper explores the possibility of designing special-purpose chips with limited processing area. We propose a new architecture v/hich allows many problems to be solved quite efficiently on chips with very small processing areas. We consider in detail the sorting problem and show how it can be solved quickly and elegantly on our model. We show that sorting n numbers can be done 2 on a chip with processing area A = o(n) in time \(T = 0(\frac{{n\log ^2 n}} {A} + \sqrt n \,\log ^2 n)\) in a network with mesh-connected interconnections, which is optimal within a logn factor. The control is shown to be simple and easily implementable in VLSI. Several other examples, such as the Fast Fourier Transform and integer multiplication, are considered and are shown to possess simple and fast algorithms in our model.
Joseph Ja’Ja’, Robert Michael Owens

Silicon Compilation

Frontmatter

Dumbo, A Schematic-to-Layout Compiler

Abstract
This paper describes a technique for compiling logic circuit descriptions into layout and a program, Dumbo, to demonstrate the technique. Cells are described to the compiler as structural primitives, which describe the general form of the layout, and electrical primitives, which describe tlie circuit connectivity in terms of Boolean and pass gates. Dumbo maps these primitives into stick figures; it tlien solves a placement and wiring problem to produce a stick diagram tliat can be compacted into a layout. Implementation of Dumbo has focused on nMOS technology. Experimental results show that the resultmg system gives promising results for a large class of cells with much less effort on the designer’s part than required for traditional design techniques.
Wayne Wolf, John Newkirk, Robert Mathews, Robert Dutton

Macrocell Design for Concurrent Signal Processing

Abstract
The term “macrocell”, used in the context of large-scale integrated circuit design, has had a variety of meanings in the past. There is now general agreement that macrocells have the following properties:
1.
A macrocell is large, larger than for example a cell from a typical standard cell library. The macrocells to be described here range in complexity from several hundred to ten thousand transistors.
 
2.
Macrocells have sufficient internal functionality, and a sufliciently small need for external communication, that they may be assembled into a circuit using general placement and routing techniques with acceptable area efficiency.
 
3.
When a macrocell, stored in a library, is called into use by the designer, there will in general be parameters and options. This optimizes the macrocell to its particular application.
 
Stephen P. Pope, R. W. Brodersen

A Case Study of the F.I.R.S.T. Silicon Compiler

Abstract
This paper presents a case study of a particular silicon compiler and the development of the software environment necessary to support it. The FIRST silicon compiler (Fast Implementation of Real-time Signal Transforms) has been developed as a cooperative project between the departments of Electrical Engineering and Computer Science at the University of Edinburgh, in order to allow the rapid investigation and implementation of VLSI digital signal processing systems
Neil Bergmann
Weitere Informationen