skip to main content
article
Free Access

Scheduler-conscious synchronization

Published:01 February 1997Publication History
Skip Abstract Section

Abstract

Efficient synchronization is important for achieving good performance in parallel programs, especially on large-scale multiprocessors. Most synchronization algorithms have been designed to run on a dedicated machine, with one application process per processor, and can suffer serious performance degradation in the presence of multiprogramming. Problems arise when running processes block or, worse, busy-wait for action on the part of a process that the scheduler has chosen not to run. We show that these problems are particularly severe for scalable synchronization algorithms based on distributed data structures. We then describe and evaluate a set of algorithms that perform well in the presence of multiprogramming while maintaining good performance on dedicated machines. We consider both large and small machines, with a particular focus on scalability, and examine mutual-exclusion locks, reader-writer locks, and barriers. Our algorithms vary in the degree of support required from the kernel scheduler. We find that while it is possible to avoid pathological performance problems using previously proposed kernel mechanisms, a modest additional widening of the kernel/user interface can make scheduler-conscious synchronization algorithms significantly simpler and faster, with performance on dedicated machines comparable to that of scheduler-oblivious algorithms.

References

  1. ABOULENEIN, N. M., GOODMAN, J. R., GJESSING, S., AND WOEST, P.J. 1994. Hardware support for synchronization in the Scalable Coherent Interface (SCI). In Proceedings of the 8th International Parallel Processing Symposium. IEEE, New York, 141-150.]] Google ScholarGoogle Scholar
  2. ANDERSON, T.E. 1990. The performance of spin lock alternatives for shared-memory multiprocessors. IEEE Trans. Parallel Distrib. Syst. 1, 1 (Jan.), 6-16.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. ANDERSON, T. E., BERSHAD, B. N., LAZOWSKA, E. D., AND LEVY, H. M. 1992. Scheduler activations: Effective kernel support for the user-level management of parallelism. ACM Trans. Comput. Syst. 10, 1 (Feb.), 53-79.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. ARENSTORF, N. S. AND JORDAN, H. 1989. Comparing barrier algorithms. Parallel Comput. 12, 157-170.]]Google ScholarGoogle ScholarCross RefCross Ref
  5. AXELROD, T. S. 1986. Effects of synchronization barriers on multiprocessor performance. Parallel Comput. 3, 129-140.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. BLACK, D. L. 1990. Scheduling support for concurrency and parallelism in the Mach operating system. IEEE Comput. 23, 5 (May), 35-43.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. CRAIG, T. S. 1993. Queuing spin lock algorithms to support timing predictability. In Proceedings of the 14th IEEE Real-Time Systems Symposium. IEEE, New York, 148-157.]]Google ScholarGoogle Scholar
  8. CROVELLA, M., DAS, P., DUBNICKI, C., LEBLANC, T., AND MARKATOS, E. 1991. Multiprogramming on multiprocessors. In Proceedings of the 3rd IEEE Symposium on Parallel and Distributed Processing. IEEE, New York, 590-597.]]Google ScholarGoogle Scholar
  9. EDLER, J., LIPKIS, J., AND SCHONBERG, E. 1988. Process management for highly parallel UNIX systems. In Proceedings of the USENIX Workshop on Unix and Supercomputers. USENIX Assoc., Berkeley, Calif.]]Google ScholarGoogle Scholar
  10. GRAUNKE, G. AND THAKKAR, S. 1990. Synchronization algorithms for shared-memory multiprocessors. IEEE Comput. 23, 6 (June), 60-69.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. HENSGEN, D., FINKEL, R., AND MANBER, U. 1988. Two algorithms for barrier synchronization. Int. J. Parallel Program. 17, 1, 1-17.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. HERLIHY, M. 1993. A methodology for implementing highly concurrent data objects. ACM Trans. Program. Lang. Syst. 15, 5 (Nov.), 745-770.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. KARLIN, A. R., LI, K., MANASSE, M. S., AND OWICKI, S. 1991. Empirical studies of competitive spinning for a shared-memory multiprocessor. In Proceedings of the 13th ACM Symposium on Operating Systems Principles. ACM, New York, 41-55.]] Google ScholarGoogle Scholar
  14. KONTOTHANASSIS, L. AND WISNIEWSKI, R. 1993. Using scheduler information to achieve optimal barrier synchronization performance. In Proceedings of the 4th ACM Symposium on Principles and Practice of Parallel Programming. ACM, New York, 64-72.]] Google ScholarGoogle Scholar
  15. KONTOTHANASSIS, L. I., WISNIEWSKI, R. W., AND SCOTT, M. L. 1994. Scheduler-conscious synchronization. Tech. Rep. 550, Computer Science Dept., Univ. of Rochester, Rochester, N.Y.]] Google ScholarGoogle Scholar
  16. KRANZ, D., JOHNSON, K., AGARWAL, A., KUBIATOWICZ, J., AND LIM, B.-H. 1993. Integrating message-passing and shared-memory: Early experience. In Proceedings of the 4th ACM Symposium on Principles and Practice of Parallel Programming. ACM, New York, 54-63.]] Google ScholarGoogle Scholar
  17. KRIEGER, O., STUMM, M., AND UNRAU, R. 1993. A fair fast scalable reader-writer lock. In Proceedings of the 1993 International Conference on Parallel Processing. CRC Press, Boca Raton, Fla., II: 201-204.]] Google ScholarGoogle Scholar
  18. LEE, C. A. 1990. Barrier synchronization over multistage interconnection networks. In Proceedings of the 2nd IEEE Symposium on Parallel and Distributed Processing. IEEE, New York, 130-133.]]Google ScholarGoogle Scholar
  19. LENOSKI, D., LAUDON, J., GHARACHORLOO, K., WEBER, W.-D., GUPTA, A., HENNESSY, J., HORWITZ, M., AND LAM, M.S. 1992. The Stanford Dash multiprocessor. IEEE Comput. 25, 3 (Mar.), 63-79.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. LEUTENEGGER, S. T. AND VERNON, M.K. 1990. Performance of multiprogrammed multiprocessor scheduling algorithms. In Proceedings of the 1990 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems. ACM, New York, 226-236.]] Google ScholarGoogle Scholar
  21. LIM, B.-H. AND AGARWAL, A. 1993. Waiting algorithms for synchronization in large-scale multiprocessors. ACM Trans. Comput. Syst. 11, 3 (Aug.), 253-294.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. LIM, B.-H. AND AGARWAL, A. 1994. Reactive synchronization algorithms for multiprocessors. In Proceedings of the 6th International Conference on Architectural Support for Programming Languages and Operating Systems. ACM, New York, 25-35.]] Google ScholarGoogle Scholar
  23. LINDSAY, B. G., HAAS, L. M., MOHAN, C., WILMS, P. F., AND YOST, R.A. 1984. Computation and communication in R*: A distributed database manager. ACM Trans. Comput. Syst. 2, 1 (Feb.), 24-28.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. LUBACHEVSKY, B. 1989. Synchronization barrier and related tools for shared memory parallel programming. In Proceedings of the 1989 International Conference on Parallel Processing. Penn State University Press, University Park, Pa., II:175-179.]]Google ScholarGoogle Scholar
  25. MAGNUSSEN, P., LANDIN, A., AND HAGERSTEN, E. 1994. Queue locks on cache coherent multiprocessors. In Proceedings of the 8th International Parallel Processing Symposium. IEEE, New York, 165-171.]] Google ScholarGoogle Scholar
  26. MARKATOS, E., CROVELLA, M., DAS, P., DUBNICKI, C., AND LEBLANC, T. 1991. The effects of multiprogramming on barrier synchronization. In Proceedings of the 3rd IEEE Symposium on Parallel and Distributed Processing. IEEE, New York, 662-669.]]Google ScholarGoogle Scholar
  27. MARKATOS, E.P. 1991. Multiprocessor synchronization primitives with priorities. In Proceedings of the 8th IEEE Workshop on Real-Time Operating Systems and Software. IEEE, New York, 1-7.]]Google ScholarGoogle Scholar
  28. MARSH, B. D., SCOTT, M. L., LEBLANC, T. J., AND MARKATOS, E.P. 1991. First-class user-level threads. In Proceedings of the 13th ACM Symposium on Operating Systems Principles. ACM, New York, 110-121.]] Google ScholarGoogle Scholar
  29. MELLOR-CRUMMEY, J. M. AND SCOTT, M.L. 1991a. Algorithms for scalable synchronization on shared-memory multiprocessors. ACM Trans. Comput. Syst. 9, 1 (Feb.), 21-65.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. MELLOR-CRUMMEY, J. M. AND SCOTT, M.L. 1991b. Scalable reader-writer synchronization on shared-memory multiprocessors. In Proceedings of the 3rd ACM Symposium on Principles and Practice of Parallel Programming. ACM, New York, 106-113.]] Google ScholarGoogle Scholar
  31. NOAKES, M., WALLACH, D., AND DALLY, W. 1993. The J-machine multicomputer: An architecture evaluation. In Proceedings of the 20th International Symposium on Computer Architecture. ACM, New York, 224-235.]] Google ScholarGoogle Scholar
  32. OUSTERHOUT, J.K. 1982. Scheduling techniques for concurrent systems. In Proceedings of the 3rd International Conference on Distributed Computing Systems. IEEE, New York, 22-30.]]Google ScholarGoogle Scholar
  33. SCOTT, M. L. AND MELLOR-CRUMMEY, J. M. 1994. Fast, contention-free combining tree barriers. Int. J. Parallel Program, 22, 4 (Aug.), 449-481.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. SCOTT, M. L., LEBLANC, T. J., AND MARSH, B.D. 1990. Multi-model parallel programming in Psyche. In Proceedings of the 2nd ACM Symposium on Principles and Practice of Parallel Programming. ACM, New York, 70-78.]] Google ScholarGoogle Scholar
  35. SINGH, J. P., WEBER, W.-D., AND GUPTA, A. 1992. SPLASH: Stanford parallel applications for shared-memory. ACM SIGARCH Comput. Arch. News 20, 1 (Mar.), 5-44.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. STAMOS, J. W. AND GIFFORD, D.K. 1990. Remote evaluation. ACM Trans. Program. Lang. Syst. 12, 4 (Oct.), 537-565.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. TAKADA, H. AND SAKAMURA, K. 1994. Predictable spin lock algorithms with preemption. In Proceedings of the 11th IEEE Workshop on Real-Time Operating Systems and Software. IEEE, New York, 2-6.]]Google ScholarGoogle Scholar
  38. TUCKER, A. AND GUPTA, A. 1989. Process control and scheduling issues for multiprogrammed shared-memory multiprocessors. In Proceedings of the 12th ACM Symposium on Operating Systems Principles. ACM, New York, 159-166.]] Google ScholarGoogle Scholar
  39. VON EICKEN, T., CULLER, D. E., GOLDSTEIN, S. C., AND SCHAUSER, K. E. 1992. Active messages: A mechanism for integrated communication and computation. In Proceedings of the 19th International Symposium on Computer Architecture. ACM, New York, 256-266.]] Google ScholarGoogle Scholar
  40. WISNIEWSKI, R. W., KONTOTHANASSIS, L., AND SCOTT, M. L. 1994. Scalable spin locks for multiprogrammed systems. In Proceedings of the 8th International Parallel Processing Symposium. IEEE, New York, 583-589.]] Google ScholarGoogle Scholar
  41. YANG, H.-H. AND ANDERSON, J. H. 1993. Fast, scalable synchronization with minimal hardware support (extended abstract). In Proceedings of the 12th ACM Symposium on Principles of Distributed Computing. ACM, New York, 171-182.]] Google ScholarGoogle Scholar
  42. YEW, P.-C., TZENG, H.-F., AND LAWRIE, D. H. 1987. Distributing hot-spot addressing in large-scale multiprocessors. IEEE Trans. Comput. C-36, 4 (Apr.), 388-395.]] Google ScholarGoogle Scholar
  43. ZAHORJAN, g. AND MCCANN, C. 1990. Processor scheduling in shared memory multiprocessors. In Proceedings of the 1990 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems. ACM, New York, 214-225.]] Google ScholarGoogle Scholar
  44. ZAHORJAN, J., LAZOWSKA, E. D., AND EAGER, D.L. 1991. The effect of scheduling discipline on spin overhead in shared memory parallel systems. IEEE Trans. Parallel Distrib. Syst. 2, 2 (Apr.), 180-198.]] Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Scheduler-conscious synchronization

      Recommendations

      Reviews

      Nancy R. Mead

      The performance degradation of many synchronization algorithms in the presence of multiprogramming is addressed in this technical paper. According to the authors, many synchronization algorithms have been designed to run on dedicated machines. As a result, many standard approaches, such as busy-waiting, are not effective in the multiprogramming environment. The authors build on previous work on the interaction between scheduling and synchronization. Specifically, they show that interactions between scheduling and synchronization are a more serious issue for scalable synchronization algorithms. They propose minor extensions to the user-kernel interface that will allow applications to interact with schedulers more effectively. They distinguish between preemption-safe algorithms and scheduler-conscious algorithms. Six new synchronization algorithms are described and evaluated: a preemption-safe ticket lock, a preemption-safe queue lock, a scheduler-conscious queue lock, a scheduler-conscious queued reader-writer lock, a scheduler-conscious small-scale barrier, and a scheduler-conscious scalable barrier. The algorithms are tested on two different architectures—a small-scale bus-based machine and a large-scale distributed-memory machine. The machines used are a 12-processor 100 MHz Silicon Graphics Challenge and a 64-processor partition of a 20 MHz Kendall Square KSR 1. Both simulated and real applications are used. This is an extremely thorough study. The empirical results provide credibility and enabled the researchers to improve the algorithms. This paper should interest anyone who is studying synchronization primitives at the operating system level and is concerned with multiprogramming. It should also be of help to users who want to understand what goes on below the surface when they apply synchronization primitives. Finally, this work reinforces the fact that changes in the underlying operating system and hardware configuration are not necessarily transparent to the user.

      Access critical reviews of Computing literature here

      Become a reviewer for Computing Reviews.

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in

      Full Access

      • Published in

        cover image ACM Transactions on Computer Systems
        ACM Transactions on Computer Systems  Volume 15, Issue 1
        Feb. 1997
        108 pages
        ISSN:0734-2071
        EISSN:1557-7333
        DOI:10.1145/244764
        Issue’s Table of Contents

        Copyright © 1997 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 1 February 1997
        Published in tocs Volume 15, Issue 1

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • article

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader