Abstract
We believe the paucity of massively parallel, shared-memory machines follows from the lack of a shared-memory programming performance model that can inform programmers of the cost of operations (so they can avoid expensive ones) and can tell hardware designers which cases are common (so they can build simple hardware to optimize them). Cooperative shared memory, our approach to shared-memory design, addresses this problem.
Our initial implementation of cooperative shared memory uses a simple programming model, called Check-In/Check-Out (CICO), in conjunction with even simpler hardware, called Dir1SW. In CICO, programs bracket uses of shared data with a check_out directive marking the expected first use and a check_in directive terminating the expected use of the data. A cooperative prefetch directive helps hide communication latency. Dir1SW is a minimal directory protocol that adds little complexity to message-passing hardware, but efficiently supports programs written within the CICO model.
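The bracketing idea can be made concrete with a small sketch. The Python below is a toy model under stated assumptions, not the Dir1SW protocol itself: the class `ToyDirectory`, the `check_out`/`check_in` method signatures, the `exclusive` flag, and the fast/trap counters are all hypothetical names introduced for illustration, and the prefetch directive is not modeled. It only shows why a program that checks data in before another processor checks it out stays on the cheap path, while an overlapping use forces a more expensive recall.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """Toy directory entry: who currently holds the block, and how."""
    exclusive: bool = False
    holders: set = field(default_factory=set)


class ToyDirectory:
    """Illustrative stand-in for a directory; NOT the Dir1SW specification."""

    def __init__(self):
        self.blocks = {}
        self.fast = 0   # requests that matched the expected CICO pattern
        self.traps = 0  # requests that conflicted and took the slow path

    def check_out(self, proc, addr, exclusive):
        """Announce the expected first use of a block (shared or exclusive)."""
        blk = self.blocks.setdefault(addr, Block())
        others = blk.holders - {proc}
        conflict = bool(others) and (exclusive or blk.exclusive)
        if conflict:
            # Assumed slow path: recall the block from its other holders.
            self.traps += 1
            blk.holders.clear()
        else:
            self.fast += 1
        blk.holders.add(proc)
        blk.exclusive = exclusive

    def check_in(self, proc, addr):
        """Announce the expected end of use, returning the block."""
        blk = self.blocks.get(addr)
        if blk is not None:
            blk.holders.discard(proc)
            if not blk.holders:
                blk.exclusive = False


d = ToyDirectory()
# Well behaved: P0 brackets its update, then P1 brackets its own.
d.check_out("P0", 0x40, exclusive=True)
d.check_in("P0", 0x40)
d.check_out("P1", 0x40, exclusive=True)
d.check_in("P1", 0x40)
# Poorly behaved: P1 checks out while P0 still holds the block.
d.check_out("P0", 0x80, exclusive=True)
d.check_out("P1", 0x80, exclusive=True)
print(d.fast, d.traps)  # prints "3 1"
```

In this toy, the counters play the role of the performance model: a programmer can see which accesses are cheap and which are not, and restructure the program so that check-ins precede conflicting check-outs.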