article

Free Access

Scheduler-conscious synchronization

Authors:
Leonidas I. Kontothanassis

DEC Cambridge Research Lab, Cambridge, MA

DEC Cambridge Research Lab, Cambridge, MA
View Profile

,
Robert W. Wisniewski

Silicon Graphics, Inc., Mountain View, CA

Silicon Graphics, Inc., Mountain View, CA
View Profile

,
Michael L. Scott

Univ. of Rochester, Rochester, NY

Univ. of Rochester, Rochester, NY
View Profile

Authors Info & Claims

ACM Transactions on Computer Systems Volume 15 Issue 1pp 3–40https://doi.org/10.1145/244764.244765

Published:01 February 1997Publication History

ACM Transactions on Computer Systems

Abstract

Efficient synchronization is important for achieving good performance in parallel programs, especially on large-scale multiprocessors. Most synchronization algorithms have been designed to run on a dedicated machine, with one application process per processor, and can suffer serious performance degradation in the presence of multiprogramming. Problems arise when running processes block or, worse, busy-wait for action on the part of a process that the scheduler has chosen not to run. We show that these problems are particularly severe for scalable synchronization algorithms based on distributed data structures. We then describe and evaluate a set of algorithms that perform well in the presence of multiprogramming while maintaining good performance on dedicated machines. We consider both large and small machines, with a particular focus on scalability, and examine mutual-exclusion locks, reader-writer locks, and barriers. Our algorithms vary in the degree of support required from the kernel scheduler. We find that while it is possible to avoid pathological performance problems using previously proposed kernel mechanisms, a modest additional widening of the kernel/user interface can make scheduler-conscious synchronization algorithms significantly simpler and faster, with performance on dedicated machines comparable to that of scheduler-oblivious algorithms.

References

ABOULENEIN, N. M., GOODMAN, J. R., GJESSING, S., AND WOEST, P.J. 1994. Hardware support for synchronization in the Scalable Coherent Interface (SCI). In Proceedings of the 8th International Parallel Processing Symposium. IEEE, New York, 141-150.]] Google Scholar
ANDERSON, T.E. 1990. The performance of spin lock alternatives for shared-memory multiprocessors. IEEE Trans. Parallel Distrib. Syst. 1, 1 (Jan.), 6-16.]] Google ScholarDigital Library
ANDERSON, T. E., BERSHAD, B. N., LAZOWSKA, E. D., AND LEVY, H. M. 1992. Scheduler activations: Effective kernel support for the user-level management of parallelism. ACM Trans. Comput. Syst. 10, 1 (Feb.), 53-79.]] Google ScholarDigital Library
ARENSTORF, N. S. AND JORDAN, H. 1989. Comparing barrier algorithms. Parallel Comput. 12, 157-170.]]Google ScholarCross Ref
AXELROD, T. S. 1986. Effects of synchronization barriers on multiprocessor performance. Parallel Comput. 3, 129-140.]] Google ScholarDigital Library
BLACK, D. L. 1990. Scheduling support for concurrency and parallelism in the Mach operating system. IEEE Comput. 23, 5 (May), 35-43.]] Google ScholarDigital Library
CRAIG, T. S. 1993. Queuing spin lock algorithms to support timing predictability. In Proceedings of the 14th IEEE Real-Time Systems Symposium. IEEE, New York, 148-157.]]Google Scholar
CROVELLA, M., DAS, P., DUBNICKI, C., LEBLANC, T., AND MARKATOS, E. 1991. Multiprogramming on multiprocessors. In Proceedings of the 3rd IEEE Symposium on Parallel and Distributed Processing. IEEE, New York, 590-597.]]Google Scholar
EDLER, J., LIPKIS, J., AND SCHONBERG, E. 1988. Process management for highly parallel UNIX systems. In Proceedings of the USENIX Workshop on Unix and Supercomputers. USENIX Assoc., Berkeley, Calif.]]Google Scholar
GRAUNKE, G. AND THAKKAR, S. 1990. Synchronization algorithms for shared-memory multiprocessors. IEEE Comput. 23, 6 (June), 60-69.]] Google ScholarDigital Library
HENSGEN, D., FINKEL, R., AND MANBER, U. 1988. Two algorithms for barrier synchronization. Int. J. Parallel Program. 17, 1, 1-17.]] Google ScholarDigital Library
HERLIHY, M. 1993. A methodology for implementing highly concurrent data objects. ACM Trans. Program. Lang. Syst. 15, 5 (Nov.), 745-770.]] Google ScholarDigital Library
KARLIN, A. R., LI, K., MANASSE, M. S., AND OWICKI, S. 1991. Empirical studies of competitive spinning for a shared-memory multiprocessor. In Proceedings of the 13th ACM Symposium on Operating Systems Principles. ACM, New York, 41-55.]] Google Scholar
KONTOTHANASSIS, L. AND WISNIEWSKI, R. 1993. Using scheduler information to achieve optimal barrier synchronization performance. In Proceedings of the 4th ACM Symposium on Principles and Practice of Parallel Programming. ACM, New York, 64-72.]] Google Scholar
KONTOTHANASSIS, L. I., WISNIEWSKI, R. W., AND SCOTT, M. L. 1994. Scheduler-conscious synchronization. Tech. Rep. 550, Computer Science Dept., Univ. of Rochester, Rochester, N.Y.]] Google Scholar
KRANZ, D., JOHNSON, K., AGARWAL, A., KUBIATOWICZ, J., AND LIM, B.-H. 1993. Integrating message-passing and shared-memory: Early experience. In Proceedings of the 4th ACM Symposium on Principles and Practice of Parallel Programming. ACM, New York, 54-63.]] Google Scholar
KRIEGER, O., STUMM, M., AND UNRAU, R. 1993. A fair fast scalable reader-writer lock. In Proceedings of the 1993 International Conference on Parallel Processing. CRC Press, Boca Raton, Fla., II: 201-204.]] Google Scholar
LEE, C. A. 1990. Barrier synchronization over multistage interconnection networks. In Proceedings of the 2nd IEEE Symposium on Parallel and Distributed Processing. IEEE, New York, 130-133.]]Google Scholar
LENOSKI, D., LAUDON, J., GHARACHORLOO, K., WEBER, W.-D., GUPTA, A., HENNESSY, J., HORWITZ, M., AND LAM, M.S. 1992. The Stanford Dash multiprocessor. IEEE Comput. 25, 3 (Mar.), 63-79.]] Google ScholarDigital Library
LEUTENEGGER, S. T. AND VERNON, M.K. 1990. Performance of multiprogrammed multiprocessor scheduling algorithms. In Proceedings of the 1990 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems. ACM, New York, 226-236.]] Google Scholar
LIM, B.-H. AND AGARWAL, A. 1993. Waiting algorithms for synchronization in large-scale multiprocessors. ACM Trans. Comput. Syst. 11, 3 (Aug.), 253-294.]] Google ScholarDigital Library
LIM, B.-H. AND AGARWAL, A. 1994. Reactive synchronization algorithms for multiprocessors. In Proceedings of the 6th International Conference on Architectural Support for Programming Languages and Operating Systems. ACM, New York, 25-35.]] Google Scholar
LINDSAY, B. G., HAAS, L. M., MOHAN, C., WILMS, P. F., AND YOST, R.A. 1984. Computation and communication in R*: A distributed database manager. ACM Trans. Comput. Syst. 2, 1 (Feb.), 24-28.]] Google ScholarDigital Library
LUBACHEVSKY, B. 1989. Synchronization barrier and related tools for shared memory parallel programming. In Proceedings of the 1989 International Conference on Parallel Processing. Penn State University Press, University Park, Pa., II:175-179.]]Google Scholar
MAGNUSSEN, P., LANDIN, A., AND HAGERSTEN, E. 1994. Queue locks on cache coherent multiprocessors. In Proceedings of the 8th International Parallel Processing Symposium. IEEE, New York, 165-171.]] Google Scholar
MARKATOS, E., CROVELLA, M., DAS, P., DUBNICKI, C., AND LEBLANC, T. 1991. The effects of multiprogramming on barrier synchronization. In Proceedings of the 3rd IEEE Symposium on Parallel and Distributed Processing. IEEE, New York, 662-669.]]Google Scholar
MARKATOS, E.P. 1991. Multiprocessor synchronization primitives with priorities. In Proceedings of the 8th IEEE Workshop on Real-Time Operating Systems and Software. IEEE, New York, 1-7.]]Google Scholar
MARSH, B. D., SCOTT, M. L., LEBLANC, T. J., AND MARKATOS, E.P. 1991. First-class user-level threads. In Proceedings of the 13th ACM Symposium on Operating Systems Principles. ACM, New York, 110-121.]] Google Scholar
MELLOR-CRUMMEY, J. M. AND SCOTT, M.L. 1991a. Algorithms for scalable synchronization on shared-memory multiprocessors. ACM Trans. Comput. Syst. 9, 1 (Feb.), 21-65.]] Google ScholarDigital Library
MELLOR-CRUMMEY, J. M. AND SCOTT, M.L. 1991b. Scalable reader-writer synchronization on shared-memory multiprocessors. In Proceedings of the 3rd ACM Symposium on Principles and Practice of Parallel Programming. ACM, New York, 106-113.]] Google Scholar
NOAKES, M., WALLACH, D., AND DALLY, W. 1993. The J-machine multicomputer: An architecture evaluation. In Proceedings of the 20th International Symposium on Computer Architecture. ACM, New York, 224-235.]] Google Scholar
OUSTERHOUT, J.K. 1982. Scheduling techniques for concurrent systems. In Proceedings of the 3rd International Conference on Distributed Computing Systems. IEEE, New York, 22-30.]]Google Scholar
SCOTT, M. L. AND MELLOR-CRUMMEY, J. M. 1994. Fast, contention-free combining tree barriers. Int. J. Parallel Program, 22, 4 (Aug.), 449-481.]] Google ScholarDigital Library
SCOTT, M. L., LEBLANC, T. J., AND MARSH, B.D. 1990. Multi-model parallel programming in Psyche. In Proceedings of the 2nd ACM Symposium on Principles and Practice of Parallel Programming. ACM, New York, 70-78.]] Google Scholar
SINGH, J. P., WEBER, W.-D., AND GUPTA, A. 1992. SPLASH: Stanford parallel applications for shared-memory. ACM SIGARCH Comput. Arch. News 20, 1 (Mar.), 5-44.]] Google ScholarDigital Library
STAMOS, J. W. AND GIFFORD, D.K. 1990. Remote evaluation. ACM Trans. Program. Lang. Syst. 12, 4 (Oct.), 537-565.]] Google ScholarDigital Library
TAKADA, H. AND SAKAMURA, K. 1994. Predictable spin lock algorithms with preemption. In Proceedings of the 11th IEEE Workshop on Real-Time Operating Systems and Software. IEEE, New York, 2-6.]]Google Scholar
TUCKER, A. AND GUPTA, A. 1989. Process control and scheduling issues for multiprogrammed shared-memory multiprocessors. In Proceedings of the 12th ACM Symposium on Operating Systems Principles. ACM, New York, 159-166.]] Google Scholar
VON EICKEN, T., CULLER, D. E., GOLDSTEIN, S. C., AND SCHAUSER, K. E. 1992. Active messages: A mechanism for integrated communication and computation. In Proceedings of the 19th International Symposium on Computer Architecture. ACM, New York, 256-266.]] Google Scholar
WISNIEWSKI, R. W., KONTOTHANASSIS, L., AND SCOTT, M. L. 1994. Scalable spin locks for multiprogrammed systems. In Proceedings of the 8th International Parallel Processing Symposium. IEEE, New York, 583-589.]] Google Scholar
YANG, H.-H. AND ANDERSON, J. H. 1993. Fast, scalable synchronization with minimal hardware support (extended abstract). In Proceedings of the 12th ACM Symposium on Principles of Distributed Computing. ACM, New York, 171-182.]] Google Scholar
YEW, P.-C., TZENG, H.-F., AND LAWRIE, D. H. 1987. Distributing hot-spot addressing in large-scale multiprocessors. IEEE Trans. Comput. C-36, 4 (Apr.), 388-395.]] Google Scholar
ZAHORJAN, g. AND MCCANN, C. 1990. Processor scheduling in shared memory multiprocessors. In Proceedings of the 1990 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems. ACM, New York, 214-225.]] Google Scholar
ZAHORJAN, J., LAZOWSKA, E. D., AND EAGER, D.L. 1991. The effect of scheduling discipline on spin overhead in shared memory parallel systems. IEEE Trans. Parallel Distrib. Syst. 2, 2 (Apr.), 180-198.]] Google ScholarDigital Library

Index Terms

Scheduler-conscious synchronization
1. Computing methodologies
  1. Parallel computing methodologies
    1. Parallel programming languages
2. Software and its engineering
  1. Software notations and tools
    1. General programming languages
      1. Language types
        Parallel programming languages
  2. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        Process management
        Multiprocessing / multiprogramming / multitasking
        Process synchronization

Recommendations

Fast and Portable Locking for Multicore Architectures

The scalability of multithreaded applications on current multicore systems is hampered by the performance of lock algorithms, due to the costs of access contention and cache misses. The main contribution presented in this article is a new locking ...
Read More
Nonblocking Algorithms and Preemption-Safe Locking on Multiprogrammed Shared Memory Multiprocessors

Most multiprocessors are multiprogrammed to achieve acceptable response time and to increase their utilization. Unfortunately, inopportune preemption may significantly degrade the performance of synchronized parallel applications. To address this ...
Read More
Fissile Locks
Networked Systems
Abstract
Classic test-and-test (TS) mutual exclusion locks are simple, and enjoy high performance and low latency of ownership transfer under light or no contention. They do not, however, scale gracefully under high contention and do not provide any ...
Read More

Reviews

Reviewer: Nancy R. Mead

The performance degradation of many synchronization algorithms in the presence of multiprogramming is addressed in this technical paper. According to the authors, many synchronization algorithms have been designed to run on dedicated machines. As a result, many standard approaches, such as busy-waiting, are not effective in the multiprogramming environment. The authors build on previous work on the interaction between scheduling and synchronization. Specifically, they show that interactions between scheduling and synchronization are a more serious issue for scalable synchronization algorithms. They propose minor extensions to the user-kernel interface that will allow applications to interact with schedulers more effectively. They distinguish between preemption-safe algorithms and scheduler-conscious algorithms. Six new synchronization algorithms are described and evaluated: a preemption-safe ticket lock, a preemption-safe queue lock, a scheduler-conscious queue lock, a scheduler-conscious queued reader-writer lock, a scheduler-conscious small-scale barrier, and a scheduler-conscious scalable barrier. The algorithms are tested on two different architectures—a small-scale bus-based machine and a large-scale distributed-memory machine. The machines used are a 12-processor 100 MHz Silicon Graphics Challenge and a 64-processor partition of a 20 MHz Kendall Square KSR 1. Both simulated and real applications are used. This is an extremely thorough study. The empirical results provide credibility and enabled the researchers to improve the algorithms. This paper should interest anyone who is studying synchronization primitives at the operating system level and is concerned with multiprogramming. It should also be of help to users who want to understand what goes on below the surface when they apply synchronization primitives. Finally, this work reinforces the fact that changes in the underlying operating system and hardware configuration are not necessarily transparent to the user.

Access critical reviews of Computing literature here

Become a reviewer for Computing Reviews.

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in
ACM Transactions on Computer Systems Volume 15, Issue 1
Feb. 1997
108 pages
ISSN:0734-2071
EISSN:1557-7333
DOI:10.1145/244764
Editor:
Ken Birman
Cornell Univ., Ithaca, NY
Issue’s Table of Contents
Copyright © 1997 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 1 February 1997
Published in tocs Volume 15, Issue 1

Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
barriers
busy-waiting
kernel-user interaction
locks
mutual exclusion
preemption
scalability
scheduling
synchronization
Qualifiers
- article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 72
  Total Citations
  View Citations
- 1,046
  Total Downloads
- Downloads (Last 12 months)42
- Downloads (Last 6 weeks)8
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Scheduler-conscious synchronization

ACM Transactions on Computer Systems

Abstract

References

Cited By

Index Terms

Recommendations

Fast and Portable Locking for Multicore Architectures

Nonblocking Algorithms and Preemption-Safe Locking on Multiprogrammed Shared Memory Multiprocessors

Fissile Locks

Reviews

Access critical reviews of Computing literature here