The DASH prototype: implementation and performance

Published: 01 April 1992

Abstract

The fundamental premise behind the DASH project is that it is feasible to build large-scale shared-memory multiprocessors with hardware cache coherence. While paper studies and software simulators are useful for understanding many high-level design trade-offs, prototypes are essential to ensure that no critical details are overlooked. A prototype provides convincing evidence of the feasibility of the design, allows one to accurately estimate both the hardware and the complexity cost of various features, and provides a platform for studying real workloads. A 16-processor prototype of the DASH multiprocessor has been operational for the last six months. In this paper, the hardware overhead of directory-based cache coherence in the prototype is examined. We also discuss the performance of the system, and the speedups obtained by parallel applications running on the prototype. Using a sophisticated hardware performance monitor, we characterize the effectiveness of coherent caches and the relationship between an application's reference behavior and its speedup.

    • Published in

      ACM SIGARCH Computer Architecture News, Volume 20, Issue 2
      Special Issue: Proceedings of the 19th annual international symposium on Computer architecture (ISCA '92)
      May 1992
      429 pages
      ISSN:0163-5964
      DOI:10.1145/146628
      • ACM Conferences
        ISCA '92: Proceedings of the 19th annual international symposium on Computer architecture
        May 1992
        439 pages
        ISBN:0897915097
        DOI:10.1145/139669

      Copyright © 1992 Authors

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • article
