Abstract
The fundamental premise behind the DASH project is that it is feasible to build large-scale shared-memory multiprocessors with hardware cache coherence. While paper studies and software simulators are useful for understanding many high-level design trade-offs, prototypes are essential to ensure that no critical details are overlooked. A prototype provides convincing evidence of the feasibility of the design, allows one to accurately estimate both the hardware cost and the complexity cost of various features, and provides a platform for studying real workloads. A 16-processor prototype of the DASH multiprocessor has been operational for the last six months. In this paper, we examine the hardware overhead of directory-based cache coherence in the prototype. We also discuss the performance of the system and the speedups obtained by parallel applications running on the prototype. Using a sophisticated hardware performance monitor, we characterize the effectiveness of coherent caches and the relationship between an application's reference behavior and its speedup.
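Directory storage is one component of the hardware overhead of directory-based cache coherence. The sketch below estimates that storage cost for a full-map (bit-vector) directory. It assumes that each memory block carries one presence bit per node plus a small number of state bits. The function name, its parameters, and the node counts and line size in the usage note are hypothetical illustrations. They are not figures reported in this abstract.

```python
def full_map_directory_overhead(num_nodes: int, line_bytes: int, state_bits: int = 1) -> float:
    """Return directory bits per block divided by data bits per block.

    Hypothetical model (an assumption, not the DASH design as documented here):
    a full-map directory keeps one presence bit per node plus `state_bits`
    of state for every memory block of `line_bytes` bytes.
    """
    if num_nodes <= 0 or line_bytes <= 0 or state_bits < 0:
        raise ValueError("num_nodes and line_bytes must be positive; state_bits must be non-negative")
    directory_bits = num_nodes + state_bits  # presence vector + state
    data_bits = line_bytes * 8               # payload bits in one memory block
    return directory_bits / data_bits
```

For example, `full_map_directory_overhead(16, 16)` gives 17/128, or roughly 13% extra memory. With 64 nodes and the same line size it gives 65/128, or about 51%. This shows that a full-map directory's storage grows linearly with the number of nodes it tracks.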