
About this book

As we continue to build faster and faster computers, their performance is becoming increasingly dependent on the memory hierarchy. Both the clock speed of the machine and its throughput per clock depend heavily on the memory hierarchy. The time to complete a cache access is often the factor that determines the cycle time. The effectiveness of the hierarchy in keeping the average cost of a reference down has a major impact on how close the sustained performance is to the peak performance. Small changes in the performance of the memory hierarchy cause large changes in overall system performance. The strong growth of RISC machines, whose performance is more tightly coupled to the memory hierarchy, has created increasing demand for high-performance memory systems. This trend is likely to accelerate: the improvements in main memory performance will be small compared to the improvements in processor performance. This difference will lead to an increasing gap between processor cycle time and main memory access time. This gap must be closed by improving the memory hierarchy. Computer architects have attacked this gap by designing machines with cache sizes an order of magnitude larger than those appearing five years ago. Microprocessor-based RISC systems now have caches that rival the size of those in mainframes and supercomputers.

Table of Contents

Frontmatter

Chapter 1. Introduction

Abstract
High-performance processors require a large bandwidth to the memory system. Caches are small, high-speed memories placed between the processor and main memory that increase the effective memory bandwidth. They store frequently used instructions and data in high-speed RAMs, providing fast access to a subset of memory. Cache memories are effective because they exploit the locality property of programs [23]. The locality property is a program's preference for a small subset of its address space over a given period of time.
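The effect described above can be made concrete with a small sketch. The following is a hypothetical direct-mapped cache simulator, not taken from the book: the line count, line size, and trace are illustrative choices. A trace that sweeps a small array twice exhibits both spatial locality (neighboring addresses share a cache line) and temporal locality (the second pass reuses lines loaded by the first), so most references hit.

```python
# Minimal direct-mapped cache sketch (illustrative parameters, not the
# configurations studied in the book).

class DirectMappedCache:
    def __init__(self, num_lines=64, line_size=16):
        self.num_lines = num_lines
        self.line_size = line_size
        self.tags = [None] * num_lines   # one tag per cache line
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        block = addr // self.line_size       # which memory block
        index = block % self.num_lines       # which cache line it maps to
        tag = block // self.num_lines
        if self.tags[index] == tag:
            self.hits += 1
        else:
            self.misses += 1
            self.tags[index] = tag           # fill the line on a miss

cache = DirectMappedCache()
# Sweep a 1 KB array twice, word by word: spatial + temporal locality.
trace = [addr for _ in range(2) for addr in range(0, 1024, 4)]
for addr in trace:
    cache.access(addr)
print(f"hit rate = {cache.hits / (cache.hits + cache.misses):.2f}")
# prints: hit rate = 0.88
```

The first pass misses once per 16-byte line and hits on the other three words of each line; the second pass hits everywhere, since the whole array fits in the cache.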
Anant Agarwal

Chapter 2. Obtaining Accurate Trace Data

Abstract
Cache simulation studies depend heavily on realistic address traces to drive the simulation; the need for generating reliable traces cannot be overstated. This chapter describes a new method to generate address traces that overcomes many of the limitations of the current tracing methods. By changing the microcode of a computer so that it records the address of every memory location it touches, we can capture complete traces of all programs that run on the computer, including the operating system.
Anant Agarwal

Chapter 3. Cache Analysis Techniques — An Analytical Cache Model

Abstract
Once accurate traces have been obtained, their efficient analysis is an important issue. Brute-force simulations to derive cache performance figures are time consuming and may not yield much insight into cache behavior. Three methods are presented in the following chapters for efficient and insightful analysis of caches.
Anant Agarwal

Chapter 4. Transient Cache Analysis — Trace Sampling and Trace Stitching

Abstract
Despite its expensive nature, trace-driven simulation is necessary if more than first-cut estimates as provided by analytical models are desired. In the following two chapters, efficient techniques of trace-driven simulation for cache performance evaluation are presented. This chapter first analyzes cache transient behavior, and then presents a trace sampling methodology.
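The trade-off behind trace sampling can be sketched as follows. This is a hypothetical illustration, not the book's methodology: the synthetic trace, window sizes, and cache parameters are all assumptions. Simulating only sampled windows of the trace is much cheaper than a full simulation, but each window starts with an empty cache, so the cold-start transient inflates the estimated miss rate — exactly the kind of bias a sampling methodology must account for.

```python
# Illustrative trace-sampling sketch (synthetic trace and toy cache;
# real studies use far larger traces and caches).
import random

def miss_rate(trace, cache_lines=128, line_size=16):
    """Miss rate of a direct-mapped cache run over a trace, starting cold."""
    tags = [None] * cache_lines
    misses = 0
    for addr in trace:
        block = addr // line_size
        index = block % cache_lines
        tag = block // cache_lines
        if tags[index] != tag:
            misses += 1
            tags[index] = tag
    return misses / len(trace)

random.seed(1)
# Synthetic trace: 40 program phases, each looping over a 2 KB working set.
trace = []
for _ in range(40):
    base = random.randrange(1 << 20) * 16
    region = list(range(base, base + 2048, 4))
    trace.extend(region * 10)

full = miss_rate(trace)
# Sample contiguous windows of 2,000 references, one every 20,000.
windows = [trace[i:i + 2000] for i in range(0, len(trace), 20000)]
sampled = sum(miss_rate(w) for w in windows) / len(windows)
print(f"full-trace miss rate: {full:.4f}")
print(f"sampled estimate:     {sampled:.4f}  (cold-start bias inflates it)")
```

Here the sampled estimate is noticeably higher than the full-trace miss rate because every window pays the cold-start fill cost that the warm full-trace simulation amortizes away.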
Anant Agarwal

Chapter 5. Cache Performance Analysis for System References

Abstract
The previous chapters laid the groundwork for accurate and efficient cache performance analysis by describing techniques for data collection and cache analysis. The next two chapters analyze cache performance using both cold-start and warm-start trace-driven simulation of the ATUM trace samples.
Anant Agarwal

Chapter 6. Impact of Multiprogramming on Cache Performance

Abstract
This chapter discusses both the performance of virtual caches for multiprogramming workloads, and the validity of the earlier schemes to model multiprogramming effects. After outlining our cache analysis methods for multiprogramming workloads, we compare cache performance data at different levels of multiprogramming to motivate a study of this nature. We evaluate various techniques that have been proposed for improving multitasking in caches, such as cache flushing and PIDs. After initially considering only user references, we then include system references and discuss the impact of system references on cache performance in a multiprogramming environment. We contrast our experimental findings with various synthetic models of multiprogrammed caches used by earlier researchers and examine the validity of assumptions made by earlier studies. We end this chapter by examining various techniques to improve cache performance for multiprogramming.
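The two multitasking schemes mentioned above can be contrasted with a small sketch. This is a hypothetical toy model, not the book's simulator: the cache size, the two processes, and their working sets are invented for illustration. Flushing a virtual-address cache on every context switch discards all of a process's cached data, while tagging each line with a process identifier (PID) lets entries from both processes coexist.

```python
# Toy comparison of flush-on-switch vs. PID-tagged virtual caches
# (illustrative parameters only).

class VirtualCache:
    def __init__(self, lines=64, line_size=16, use_pids=False):
        self.lines, self.line_size = lines, line_size
        self.use_pids = use_pids
        self.store = [None] * lines      # per-line (tag, pid) keys
        self.misses = 0
        self.pid = None

    def switch(self, pid):
        self.pid = pid
        if not self.use_pids:
            self.store = [None] * self.lines   # flush on context switch

    def access(self, vaddr):
        block = vaddr // self.line_size
        index = block % self.lines
        key = (block // self.lines, self.pid if self.use_pids else None)
        if self.store[index] != key:
            self.misses += 1
            self.store[index] = key

def run(use_pids):
    cache = VirtualCache(use_pids=use_pids)
    for _ in range(10):                        # ten scheduling rounds
        for pid, base in ((0, 0), (1, 512)):   # two alternating processes
            cache.switch(pid)
            for vaddr in range(base, base + 512, 4):
                cache.access(vaddr)
    return cache.misses

flush_misses = run(use_pids=False)
pid_misses = run(use_pids=True)
print(f"flush on switch: {flush_misses} misses; PID tags: {pid_misses} misses")
```

In this sketch the two working sets happen to map to disjoint cache indices, so PID tagging eliminates all misses after the first round; with overlapping working sets the processes would still evict each other and PIDs would help less, which is why measured traces matter.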
Anant Agarwal

Chapter 7. Multiprocessor Cache Analysis

Abstract
In recent times multiprocessing has become a popular means of achieving performance levels that can far exceed those of single processors. The design of high-performance multiprocessors necessitates a careful analysis of the memory system performance of parallel programs. The common theme of this section is an increased understanding of the dynamics of large writeback caches in multiple processors with shared memory. The multiprocessor extension of ATUM for gathering multiprocessor traces, and its implementation on a VAX 8350 multiprocessor, is first described. Because the resulting parallel traces are dissimilar to the traces used in our single-processor studies, and since we would like to isolate the effects of multiprocessing on cache performance from the effects of multiprogramming, we first repeat some single-processor experiments with the new traces and compare the effect of cache interference between multiple processes in both physically addressed and virtually addressed caches. Such a study is possible with the extended ATUM scheme because a complete virtual-to-physical address map is contained in the traces. The performance degradation due to cache interference between multiple processors is then analyzed, and the improvement in cache performance when process migration is disallowed is evaluated. We also study semaphore usage and its effect on cache performance.
Anant Agarwal

Chapter 8. Conclusions and Suggestions for Future Work

Abstract
We began this research with several goals. The main aim was to accurately characterize cache performance, with particular attention to large caches in realistic environments. This required more accurate and efficient cache analysis techniques than were previously available, as well as reliable trace data from which to derive accurate cache performance statistics.
Anant Agarwal

Backmatter

Additional Information