Abstract
Efficient synchronization is important for achieving good performance in parallel programs, especially on large-scale multiprocessors. Most synchronization algorithms have been designed to run on a dedicated machine, with one application process per processor, and can suffer serious performance degradation in the presence of multiprogramming. Problems arise when running processes block or, worse, busy-wait for action on the part of a process that the scheduler has chosen not to run. We show that these problems are particularly severe for scalable synchronization algorithms based on distributed data structures. We then describe and evaluate a set of algorithms that perform well in the presence of multiprogramming while maintaining good performance on dedicated machines. We consider both large and small machines, with a particular focus on scalability, and examine mutual-exclusion locks, reader-writer locks, and barriers. Our algorithms vary in the degree of support required from the kernel scheduler. We find that while it is possible to avoid pathological performance problems using previously proposed kernel mechanisms, a modest additional widening of the kernel/user interface can make scheduler-conscious synchronization algorithms significantly simpler and faster, with performance on dedicated machines comparable to that of scheduler-oblivious algorithms.
- ABOULENEIN, N. M., GOODMAN, J. R., GJESSING, S., AND WOEST, P.J. 1994. Hardware support for synchronization in the Scalable Coherent Interface (SCI). In Proceedings of the 8th International Parallel Processing Symposium. IEEE, New York, 141-150.]] Google Scholar
- ANDERSON, T.E. 1990. The performance of spin lock alternatives for shared-memory multiprocessors. IEEE Trans. Parallel Distrib. Syst. 1, 1 (Jan.), 6-16.]] Google ScholarDigital Library
- ANDERSON, T. E., BERSHAD, B. N., LAZOWSKA, E. D., AND LEVY, H. M. 1992. Scheduler activations: Effective kernel support for the user-level management of parallelism. ACM Trans. Comput. Syst. 10, 1 (Feb.), 53-79.]] Google ScholarDigital Library
- ARENSTORF, N. S. AND JORDAN, H. 1989. Comparing barrier algorithms. Parallel Comput. 12, 157-170.]]Google ScholarCross Ref
- AXELROD, T. S. 1986. Effects of synchronization barriers on multiprocessor performance. Parallel Comput. 3, 129-140.]] Google ScholarDigital Library
- BLACK, D. L. 1990. Scheduling support for concurrency and parallelism in the Mach operating system. IEEE Comput. 23, 5 (May), 35-43.]] Google ScholarDigital Library
- CRAIG, T. S. 1993. Queuing spin lock algorithms to support timing predictability. In Proceedings of the 14th IEEE Real-Time Systems Symposium. IEEE, New York, 148-157.]]Google Scholar
- CROVELLA, M., DAS, P., DUBNICKI, C., LEBLANC, T., AND MARKATOS, E. 1991. Multiprogramming on multiprocessors. In Proceedings of the 3rd IEEE Symposium on Parallel and Distributed Processing. IEEE, New York, 590-597.]]Google Scholar
- EDLER, J., LIPKIS, J., AND SCHONBERG, E. 1988. Process management for highly parallel UNIX systems. In Proceedings of the USENIX Workshop on Unix and Supercomputers. USENIX Assoc., Berkeley, Calif.]]Google Scholar
- GRAUNKE, G. AND THAKKAR, S. 1990. Synchronization algorithms for shared-memory multiprocessors. IEEE Comput. 23, 6 (June), 60-69.]] Google ScholarDigital Library
- HENSGEN, D., FINKEL, R., AND MANBER, U. 1988. Two algorithms for barrier synchronization. Int. J. Parallel Program. 17, 1, 1-17.]] Google ScholarDigital Library
- HERLIHY, M. 1993. A methodology for implementing highly concurrent data objects. ACM Trans. Program. Lang. Syst. 15, 5 (Nov.), 745-770.]] Google ScholarDigital Library
- KARLIN, A. R., LI, K., MANASSE, M. S., AND OWICKI, S. 1991. Empirical studies of competitive spinning for a shared-memory multiprocessor. In Proceedings of the 13th ACM Symposium on Operating Systems Principles. ACM, New York, 41-55.]] Google Scholar
- KONTOTHANASSIS, L. AND WISNIEWSKI, R. 1993. Using scheduler information to achieve optimal barrier synchronization performance. In Proceedings of the 4th ACM Symposium on Principles and Practice of Parallel Programming. ACM, New York, 64-72.]] Google Scholar
- KONTOTHANASSIS, L. I., WISNIEWSKI, R. W., AND SCOTT, M. L. 1994. Scheduler-conscious synchronization. Tech. Rep. 550, Computer Science Dept., Univ. of Rochester, Rochester, N.Y.]] Google Scholar
- KRANZ, D., JOHNSON, K., AGARWAL, A., KUBIATOWICZ, J., AND LIM, B.-H. 1993. Integrating message-passing and shared-memory: Early experience. In Proceedings of the 4th ACM Symposium on Principles and Practice of Parallel Programming. ACM, New York, 54-63.]] Google Scholar
- KRIEGER, O., STUMM, M., AND UNRAU, R. 1993. A fair fast scalable reader-writer lock. In Proceedings of the 1993 International Conference on Parallel Processing. CRC Press, Boca Raton, Fla., II: 201-204.]] Google Scholar
- LEE, C. A. 1990. Barrier synchronization over multistage interconnection networks. In Proceedings of the 2nd IEEE Symposium on Parallel and Distributed Processing. IEEE, New York, 130-133.]]Google Scholar
- LENOSKI, D., LAUDON, J., GHARACHORLOO, K., WEBER, W.-D., GUPTA, A., HENNESSY, J., HORWITZ, M., AND LAM, M.S. 1992. The Stanford Dash multiprocessor. IEEE Comput. 25, 3 (Mar.), 63-79.]] Google ScholarDigital Library
- LEUTENEGGER, S. T. AND VERNON, M.K. 1990. Performance of multiprogrammed multiprocessor scheduling algorithms. In Proceedings of the 1990 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems. ACM, New York, 226-236.]] Google Scholar
- LIM, B.-H. AND AGARWAL, A. 1993. Waiting algorithms for synchronization in large-scale multiprocessors. ACM Trans. Comput. Syst. 11, 3 (Aug.), 253-294.]] Google ScholarDigital Library
- LIM, B.-H. AND AGARWAL, A. 1994. Reactive synchronization algorithms for multiprocessors. In Proceedings of the 6th International Conference on Architectural Support for Programming Languages and Operating Systems. ACM, New York, 25-35.]] Google Scholar
- LINDSAY, B. G., HAAS, L. M., MOHAN, C., WILMS, P. F., AND YOST, R.A. 1984. Computation and communication in R*: A distributed database manager. ACM Trans. Comput. Syst. 2, 1 (Feb.), 24-28.]] Google ScholarDigital Library
- LUBACHEVSKY, B. 1989. Synchronization barrier and related tools for shared memory parallel programming. In Proceedings of the 1989 International Conference on Parallel Processing. Penn State University Press, University Park, Pa., II:175-179.]]Google Scholar
- MAGNUSSEN, P., LANDIN, A., AND HAGERSTEN, E. 1994. Queue locks on cache coherent multiprocessors. In Proceedings of the 8th International Parallel Processing Symposium. IEEE, New York, 165-171.]] Google Scholar
- MARKATOS, E., CROVELLA, M., DAS, P., DUBNICKI, C., AND LEBLANC, T. 1991. The effects of multiprogramming on barrier synchronization. In Proceedings of the 3rd IEEE Symposium on Parallel and Distributed Processing. IEEE, New York, 662-669.]]Google Scholar
- MARKATOS, E.P. 1991. Multiprocessor synchronization primitives with priorities. In Proceedings of the 8th IEEE Workshop on Real-Time Operating Systems and Software. IEEE, New York, 1-7.]]Google Scholar
- MARSH, B. D., SCOTT, M. L., LEBLANC, T. J., AND MARKATOS, E.P. 1991. First-class user-level threads. In Proceedings of the 13th ACM Symposium on Operating Systems Principles. ACM, New York, 110-121.]] Google Scholar
- MELLOR-CRUMMEY, J. M. AND SCOTT, M.L. 1991a. Algorithms for scalable synchronization on shared-memory multiprocessors. ACM Trans. Comput. Syst. 9, 1 (Feb.), 21-65.]] Google ScholarDigital Library
- MELLOR-CRUMMEY, J. M. AND SCOTT, M.L. 1991b. Scalable reader-writer synchronization on shared-memory multiprocessors. In Proceedings of the 3rd ACM Symposium on Principles and Practice of Parallel Programming. ACM, New York, 106-113.]] Google Scholar
- NOAKES, M., WALLACH, D., AND DALLY, W. 1993. The J-machine multicomputer: An architecture evaluation. In Proceedings of the 20th International Symposium on Computer Architecture. ACM, New York, 224-235.]] Google Scholar
- OUSTERHOUT, J.K. 1982. Scheduling techniques for concurrent systems. In Proceedings of the 3rd International Conference on Distributed Computing Systems. IEEE, New York, 22-30.]]Google Scholar
- SCOTT, M. L. AND MELLOR-CRUMMEY, J. M. 1994. Fast, contention-free combining tree barriers. Int. J. Parallel Program, 22, 4 (Aug.), 449-481.]] Google ScholarDigital Library
- SCOTT, M. L., LEBLANC, T. J., AND MARSH, B.D. 1990. Multi-model parallel programming in Psyche. In Proceedings of the 2nd ACM Symposium on Principles and Practice of Parallel Programming. ACM, New York, 70-78.]] Google Scholar
- SINGH, J. P., WEBER, W.-D., AND GUPTA, A. 1992. SPLASH: Stanford parallel applications for shared-memory. ACM SIGARCH Comput. Arch. News 20, 1 (Mar.), 5-44.]] Google ScholarDigital Library
- STAMOS, J. W. AND GIFFORD, D.K. 1990. Remote evaluation. ACM Trans. Program. Lang. Syst. 12, 4 (Oct.), 537-565.]] Google ScholarDigital Library
- TAKADA, H. AND SAKAMURA, K. 1994. Predictable spin lock algorithms with preemption. In Proceedings of the 11th IEEE Workshop on Real-Time Operating Systems and Software. IEEE, New York, 2-6.]]Google Scholar
- TUCKER, A. AND GUPTA, A. 1989. Process control and scheduling issues for multiprogrammed shared-memory multiprocessors. In Proceedings of the 12th ACM Symposium on Operating Systems Principles. ACM, New York, 159-166.]] Google Scholar
- VON EICKEN, T., CULLER, D. E., GOLDSTEIN, S. C., AND SCHAUSER, K. E. 1992. Active messages: A mechanism for integrated communication and computation. In Proceedings of the 19th International Symposium on Computer Architecture. ACM, New York, 256-266.]] Google Scholar
- WISNIEWSKI, R. W., KONTOTHANASSIS, L., AND SCOTT, M. L. 1994. Scalable spin locks for multiprogrammed systems. In Proceedings of the 8th International Parallel Processing Symposium. IEEE, New York, 583-589.]] Google Scholar
- YANG, H.-H. AND ANDERSON, J. H. 1993. Fast, scalable synchronization with minimal hardware support (extended abstract). In Proceedings of the 12th ACM Symposium on Principles of Distributed Computing. ACM, New York, 171-182.]] Google Scholar
- YEW, P.-C., TZENG, H.-F., AND LAWRIE, D. H. 1987. Distributing hot-spot addressing in large-scale multiprocessors. IEEE Trans. Comput. C-36, 4 (Apr.), 388-395.]] Google Scholar
- ZAHORJAN, g. AND MCCANN, C. 1990. Processor scheduling in shared memory multiprocessors. In Proceedings of the 1990 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems. ACM, New York, 214-225.]] Google Scholar
- ZAHORJAN, J., LAZOWSKA, E. D., AND EAGER, D.L. 1991. The effect of scheduling discipline on spin overhead in shared memory parallel systems. IEEE Trans. Parallel Distrib. Syst. 2, 2 (Apr.), 180-198.]] Google ScholarDigital Library
Index Terms
- Scheduler-conscious synchronization
Recommendations
Fast and Portable Locking for Multicore Architectures
The scalability of multithreaded applications on current multicore systems is hampered by the performance of lock algorithms, due to the costs of access contention and cache misses. The main contribution presented in this article is a new locking ...
Nonblocking Algorithms and Preemption-Safe Locking on Multiprogrammed Shared Memory Multiprocessors
Most multiprocessors are multiprogrammed to achieve acceptable response time and to increase their utilization. Unfortunately, inopportune preemption may significantly degrade the performance of synchronized parallel applications. To address this ...
Fissile Locks
Networked SystemsAbstractClassic test-and-test (TS) mutual exclusion locks are simple, and enjoy high performance and low latency of ownership transfer under light or no contention. They do not, however, scale gracefully under high contention and do not provide any ...
Comments