
Cooperative shared memory: software and hardware for scalable multiprocessors

Published: 01 November 1993

Abstract

We believe the paucity of massively parallel, shared-memory machines follows from the lack of a shared-memory programming performance model that can inform programmers of the cost of operations (so they can avoid expensive ones) and can tell hardware designers which cases are common (so they can build simple hardware to optimize them). Cooperative shared memory, our approach to shared-memory design, addresses this problem.

Our initial implementation of cooperative shared memory uses a simple programming model, called Check-In/Check-Out (CICO), in conjunction with even simpler hardware, called Dir1SW. In CICO, programs bracket uses of shared data with a check_out directive marking the expected first use and a check_in directive terminating the expected use of the data. A cooperative prefetch directive helps hide communication latency. Dir1SW is a minimal directory protocol that adds little complexity to message-passing hardware but efficiently supports programs written within the CICO model.
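To make the division of labor concrete, here is a toy model of how CICO directives might drive a minimal directory. It is a sketch under stated assumptions, not the paper's protocol: the entry layout (one owner pointer plus a sharer count), the rule that only the idle/shared cases stay on the hardware fast path, and the `_trap` handler that stands in for Dir1SW's software path are all simplifications chosen for illustration. The class and method names (`Dir1SWModel`, `check_out_shared`, `check_out_exclusive`, `check_in`) are hypothetical. The point it shows is that a program which checks data back in leaves blocks idle, so the next request can be served by simple hardware instead of a software trap.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Dict, Optional


class State(Enum):
    IDLE = auto()       # no copies checked out
    SHARED = auto()     # read-only copies outstanding; only their number is kept
    EXCLUSIVE = auto()  # one writable copy; the single pointer names its holder


@dataclass
class DirEntry:
    state: State = State.IDLE
    owner: Optional[int] = None  # the one hardware pointer (assumed layout)
    sharers: int = 0             # count of outstanding read-only copies


class Dir1SWModel:
    """Hypothetical toy directory: hardware fast paths, software trap otherwise."""

    def __init__(self) -> None:
        self.entries: Dict[int, DirEntry] = {}
        self.traps = 0  # how often the slow software path was taken

    def _entry(self, block: int) -> DirEntry:
        return self.entries.setdefault(block, DirEntry())

    def check_out_shared(self, block: int, cpu: int) -> str:
        e = self._entry(block)
        if e.state in (State.IDLE, State.SHARED):
            # Fast path: hardware only needs to bump the sharer count.
            e.state = State.SHARED
            e.sharers += 1
            return "hw"
        return self._trap(e, exclusive=False, cpu=cpu)

    def check_out_exclusive(self, block: int, cpu: int) -> str:
        e = self._entry(block)
        if e.state is State.IDLE:
            # Fast path: record the single owner in the pointer.
            e.state, e.owner = State.EXCLUSIVE, cpu
            return "hw"
        return self._trap(e, exclusive=True, cpu=cpu)

    def check_in(self, block: int, cpu: int) -> str:
        e = self._entry(block)
        if e.state is State.EXCLUSIVE and e.owner == cpu:
            e.state, e.owner = State.IDLE, None
        elif e.state is State.SHARED and e.sharers > 0:
            e.sharers -= 1
            if e.sharers == 0:
                e.state = State.IDLE
        return "hw"  # in this sketch a check_in never traps

    def _trap(self, e: DirEntry, exclusive: bool, cpu: int) -> str:
        # Assumed software handler: reclaim every outstanding copy
        # (the hardware tracks at most one holder, so software does the
        # invalidation work), then grant the pending request.
        self.traps += 1
        if exclusive:
            e.state, e.owner, e.sharers = State.EXCLUSIVE, cpu, 0
        else:
            e.state, e.owner, e.sharers = State.SHARED, None, 1
        return "trap"


if __name__ == "__main__":
    d = Dir1SWModel()
    d.check_out_shared(0x40, cpu=0)     # two readers follow CICO...
    d.check_out_shared(0x40, cpu=1)
    d.check_in(0x40, cpu=0)             # ...and check the block back in
    d.check_in(0x40, cpu=1)
    d.check_out_exclusive(0x40, cpu=2)  # block is idle: served by hardware
    d.check_out_shared(0x40, cpu=3)     # writer never checked in: trap
    print(d.traps)                      # prints 1
```

Only the last request, issued while the writer still holds the block, falls to the software path; every access that followed the check-out/check-in discipline was handled by the fast path.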
