skip to main content
article

Implicit coscheduling: coordinated scheduling with implicit information in distributed systems

Published:01 August 2001Publication History
Skip Abstract Section

Abstract

In modern distributed systems, coordinated time-sharing is required for communicating processes to leverage the performance of switch-based networks and low-overhead protocols. Coordinated time-sharing has traditionally been achieved with gang scheduling or explicit coscheduling, implementations of which often suffer from many deficiencies: multiple points of failure, high context-switch overheads, and poor interaction with client-server, interactive, and I/O -intensive workloads. Implicit coscheduling dynamically coordinates communicating processes across distributed machines without these structural deficiencies. In implicit coscheduling, no communication is required across operating systems schedulers; instead, cooperating processes achieve coordination by reacting to implicit information carried by communication existing within the parallel application. The implementation of this approach is simple and allows participating nodes to act autonomously. We introduce two key mechanisms in implicit coscheduling. The first is conditional two-phase waiting, a generalization of traditional two-phase waiting in which spin-time may be increased depending upon events occuring while the process waits. The second is an extension to stride scheduling that provides preemption and is fair to processes that block. To demonstrate that implicit coscheduling performs well, we show results from an extensive set of simulation and implementation experiments. To exercise the conditional two-phase waiting algorithm, we examine three workloads: bulk-synchronous and continuous-communication synthetic applications and application kernels written in the Split-C language. To exercise the local scheduler, we examine competing jobs with different communication characteristics. We demonstrate that our implementation scales well with the number of jobs and workstations and is robust to process placement. Our experiments show that implicit coscheduling is effective and fair for a wide range of workloads; most perform within 30% of an idealized model of gang scheduling.

References

  1. ALEXANDROV, A., IONESCU, M., SCHAUSER,K.E.,AND SCHEIMAN, C. 1995. LogGP: Incorporating Long Messages into the LogP model-One step closer towards a realistic model for parallel computation. In 7th Annual Symposium on Parallel Algorithms and Architectures (SPAA'95) (July 1995).]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. ANDERSON, T. E., CULLER, D. E., PATTERSON,D.A.,AND THE NOW TEAM. 1995a. A Case for NOW (Networks of Workstations). IEEE Micro 15, 1 (February), 54-64.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. ANDERSON, T. E., DAHLIN,M.D.,NEEFE, J. M., PATTERSON, D. A., ROSELLI,D.S.,AND WANG, R. Y. 1995b. Serverless Network File Systems. In Proceedings of the 15th ACM Symposium on Operating Systems Principles (December 1995), pp. 109-126.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. ARPACI, R. H., DUSSEAU,A.C.,VAHDAT, A. M., LIU,L.T.,ANDERSON,T.E.,AND PATTERSON, D. A. 1995. The Interaction of Parallel and Sequential Workloads on a Network of Workstations. In Proceedings of ACM SIGMETRICS'95/PERFORMANCE'95 Joint International Conference on Measurement and Modeling of Computer Systems (May 1995), pp. 267-278.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. ARPACI-DUSSEAU,A.AND CULLER, D. 1997. Extending Proportional-Share Scheduling to a Network of Workstations. In International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA) (Las Vegas, Nevada, June 1997).]]Google ScholarGoogle Scholar
  6. ARPACI-DUSSEAU, A. C. 1998. Implicit Coscheduling: Coordinated Scheduling with Implicit Information in Distributed Systems. Ph. D. thesis, University of California at Berkeley.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. ARPACI-DUSSEAU,A.C.,ARPACI-DUSSEAU, R. H., CULLER, D. E., HELLERSTEIN,J.M.,AND PATTERSON,D.P. 1997. High-Performance Sorting on Networks of Workstations. In Proceedings of the 1997 ACM SIGMOD Conference (1997), pp. 243-254.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. BIAGIONI, E., COOPER, E., AND SANSOM, R. 1993. Designing a Practical ATMLAN. IEEE Network 7,2 (March), 32-39.]]Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. BODEN,N.J.,COHEN, D., FELDERMAN, R. E., KULAWIK, A. E., SEITZ, C. L., SEIZOVIC,J.N.,AND SU,W.-K. 1995. Myrinet--A Gigabit-per-Second Local-Area Network. IEEE Micro 15, 1 (February), 29- 38.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. BREWER,E.A.AND KUSZMAUL, B. C. 1993. How to Get Good Performance from the CM5 Data Network. In Proceedings of the 1994 International Parallel Processing Symposium (Cancun, Mexico, April 1993), pp. 858-867.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. BUCHANAN,M.AND CHIEN, A. 1997. Coordinated Thread Scheduling for Workstation Clusters Under Windows NT. In Proceedings of USENIX Windows NT Workshop (Aug. 1997).]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. BURGER,D.C.,HYDER,R.S.,MILLER,B.P.,AND WOOD, D. A. 1994. Paging Tradeoffs in Distributed- Shared-Memory Multiprocessors. In Supercomputing'94 (Nov. 1994).]]Google ScholarGoogle Scholar
  13. CROVELLA, M., DAS, P., DUBNICKI, C., LEBLANC,T.,AND MARKATOS, E. 1991. Multiprogramming on Multiprocessors. Technical Report 385 (February), University of Rochester, Computer Science Department.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. CULLER, D., ARPACI-DUSSEAU, A., ARPACI-DUSSEAU, R., CHUN, B., LUMETTA, S., MAINWARING, A., MARTIN, R., YOSHIKAWA,C.,AND WONG, F. 1997. Parallel Computing on the Berkeley NOW. In Ninth Joint Symposium on Parallel Processing (Kobe, Japan, May 1997).]]Google ScholarGoogle Scholar
  15. CULLER, D. E., DUSSEAU, A., GOLDSTEIN,S.C.,KRISHNAMURTHY, A., LUMETTA,S.,VON EICKEN,T.,AND YELICK, K. 1993a. Parallel Programming in Split-C. In Proceedings of Supercomputing '93 (1993), pp. 262-273.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. CULLER, D. E., KARP, R. M., PATTERSON, D. A., SAHAY, A., SCHAUSER, K. E., SANTOS, E., SUBRAMONIAN, R., AND VON EICKEN, T. 1993b. LogP: Towards a Realistic Model of Parallel Computation. In Fourth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (May 1993), pp. 262-273.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. DENNING,D.E.AND DENNING, P. J. 1979. Data Security. Computing Surveys 11, 2 (Sept.), 227-249.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. DEWITT,D.AND GRAY, J. 1992. Parallel Database Systems: The Future of High-Performance Database Systems. Communications of the ACM 35, 6 (June), 85-98.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. DOWDY, L. 1988. On the Partitioning of Multiprocessor Systems. Technical Report 88-06 (July), Department of Computer Science, Vanderbilt University.]]Google ScholarGoogle Scholar
  20. DUSSEAU,A.C.,ARPACI,R.H.,AND CULLER, D. E. 1996a. Effective Distributed Scheduling of Parallel Workloads. In Proceedings of 1996 ACM Sigmetrics International Conference on Measurement and Modeling of Computer Systems (1996).]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. DUSSEAU,A.C.,CULLER, D. E., SCHAUSER,K.E.,AND MARTIN, R. P. 1996b. Fast Parallel Sorting Under LogP: Experience with the CM-5. IEEE Transactions on Parallel and Distributed Systems 7,8 (August), 791-805.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. EYKHOLT, J. R., KLEIMAN, S. R., BARTON, S., VOLL, J., FAULKNER, R., SHIVALINGIAH, A., SMITH, M., STEIN, D., WEEKS, M., AND WILLIAMS, D. 1992. Beyond Multiprocessing: Multithreading the SunOS Kernel. In Proceedings of the Summer 1992 USENIX Technical Conference and Exhibition (San Antontio, TX, USA, June 1992), pp. 11-18.]]Google ScholarGoogle Scholar
  23. FEITELSON, D. G. 1995. A Survey of Scheduling in Multiprogrammed Parallel Systems. Research Report RC 19790 (87657) (February), IBM T. J. Watson Research Center, Yorktown Heights, NY. Second Revision, August 1997.]]Google ScholarGoogle Scholar
  24. FEITELSON,D.G.AND JETTE, M. 1997. Improved Utilization and Responsiveness with Gang Scheduling. In Proceedings of the IPPS '97 Workshop on Job Scheduling Strategies for Parallel Processing (1997).]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. FEITELSON,D.G.AND RUDOLPH, L. 1990. Distributed Hierarchical Control for Parallel Processing. IEEE Computer 23, 5 (May), 65-77.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. FEITELSON,D.G.AND RUDOLPH, L. 1992. Gang Scheduling Performance Benefits for Fine- Grained Synchronization. Journal of Parallel and Distributed Computing 16, 4 (December), 306-18.]]Google ScholarGoogle ScholarCross RefCross Ref
  27. FEITELSON,D.G.AND RUDOLPH, L. 1995. Coscheduling Based on Run-Time Identification of Activity Working Sets. International Journal of Parallel Programming 23, 2 (April), 136-160.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. GHORMLEY,D.P.,PETROU, D., RODRIGUES, S. H., VAHDAT,A.M.,AND ANDERSON, T. E. 1989. GLU- nix: A Global Layer Unix for a Network of Workstations. In Software Practice and Experience (1989).]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. GOODHEART,B.AND COX, J. 1994. The Magic Garden Explained: The Internals of UNIX System V Release 4. Prentice Hall.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. HELLERSTEIN, J. L. 1993. Achieving Service Rate Objectives with Decay Usage Scheduling. IEEE Transactions on Software Engineering 19, 8 (August), 813-825.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. HILL,M.D.,LARUS, J. R., REINHARDT,S.K.,AND WOOD, D. A. 1993. Cooperative Shared Memory: oftware and Hardware for Scalable Multiprocessors. ACM Transactions on Computer Systems 11, 4 (November), 300-18.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. HORI, A., TEZUKA, H., AND ISHIKAWA, Y. 1997. Global State Detection using Network Preemption. In Proceedings of the IPPS '97 Workshop on Job Scheduling Strategies for Parallel Processing (1997).]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. JACOBSON, V. 1988. Congestion avoidance and control. In SIGCOMM '88 Symposium: Communications Architectures and Protocols (Aug. 1988), pp. 314-29.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. KARLIN, A. R., MANASSE, M., MCGEOCH, L., AND OWICKI, S. 1994. Competitive Randomized Algorithms For Nonuniform Problems. Algorithmica 11, 6 (June), 542-71.]]Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. KAY,J.AND LAUDER, P. 1988. A Fair Share Scheduler. Communications of the ACM31, 1 (January), 44-55.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. LEE, W., FRANK, M., LEE, V., MACKENZIE, K., AND RUDOLPH, L. 1997. Implications of I/O for Gang Scheduled Workloads. In Proceedings of the IPPS '97 Workshop on Job Scheduling Strategies for Parallel Processing (1997).]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. LEUTENEGGER,S.T.AND VERNON, M. K. 1990. The Performance of Multiprogrammed Multiprocessor Scheduling Policies. In Proceedings of the 1990 ACM SIGMETRICS Conference (May 1990), pp. 226-36.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. MAINWARING, A. M. 1995. Active Message Application Programming Interface and Communication Subsystem Organization. Master's thesis, University of California, Berkeley.]]Google ScholarGoogle Scholar
  39. MARTIN,R.P.,VAHDAT, A. M., CULLER,D.E.,AND ANDERSON, T. E. 1997. Effects of Communication Latency, Overhead, and Bandwidth in a Cluster Architecture. In Proceedings of the 24th Annual International Symposium on Computer Architecture (Denver, CO, June 1997).]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. NELSON,R.AND TOWSLEY, D. 1993. A Performance Evaluation of Several Priority Policies for Parallel Processing Systems. Journal of the ACM 40, 3 (July), 714-740.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. NIEH, J., HANKO,J.G.,NORTHCUTT,J.D.,AND WALL, G. A. 1993. SVR4 Unix Scheduler Unacceptable for Multimedia Applications. In Proceedings of the Workshop on Network and Operating System Support for Digital Audio and Video (Nov. 1993).]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. OUSTERHOUT, J. K. 1982. Scheduling Techniques for Concurrent Systems. In Third International Conference on Distributed Computing Systems (May 1982), pp. 22-30.]]Google ScholarGoogle Scholar
  43. PAKIN, S., LAURIA, M., AND CHIEN, A. 1995. High Performance Messaging on Workstations: Illinois Fast Messages (FM) for Myrinet. In Proceedings of the 1995 Supercomputing Conference (San Diego, California, 1995).]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. ROSTI, E., SERAZZI, G., SMIRNI, E., AND SQUILLANTE, M. S. 1998. The Impact of I/O on Program Behavior and Parallel Scheduling. In SIGMETRICS '98/PERFORMANCE'98 (June 1998).]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. SOBALVARRO,P.G.AND WEIHL, W. E. 1995. Demand-based Coscheduling of Parallel Jobs on Multi-programmed Multiprocessors. In Proceedings of the IPPS '95 Workshop on Job Scheduling Strategies for Parallel Processing (April 1995), pp. 63-75.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. SOBALVARRO,P.G.,PAKIN, S., WEIHL,W.E.,AND CHIEN, A. A. 1998. Dynamic Coscheduling on Workstation Clusters. In Proceedings of the IPPS '98 Workshop on Job Scheduling Strategies for Parallel Processing (1998).]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. SUNDERAM, V. 1990. PVM: A Framework for Parallel Distributed Computing. Concurrency: Practice and Experience 2, 4 (December), 315-339.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. THE MPI FORUM. 1993. MPI: A Message Passing Interface. In Proceedings of Supercomputing '93 (Nov. 1993), pp. 878-883.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. VON EICKEN, T., BASU, A., BUCH,V.,AND VOGELS, W. 1995. U-Net: A User-Level Network Interface for Parallel and Distributed Computing. In Proceedings of the Fifteenth SOSP (Copper Mountain, CO, December 1995), pp. 40-53.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. VON EICKEN, T., CULLER, D. E., GOLDSTEIN,S.C.,AND SCHAUSER, K. E. 1992. Active Messages: a Mechanism for Integrated Communication and Computation. In Proceedings of the 19th International Symposium on Computer Architecture (Gold Coast, Australia, May 1992).]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. WALDSPURGER, C. A. 1995. Lottery and Stride Scheduling: Flexible Proportional-Share Resource Management. Ph. D. thesis, Massachusetts Institute of Technology. Also appears as Technical Report MIT/LCS/TR-667.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. WALDSPURGER,C.A.AND WEIHL, W. E. 1996. An Object-Oriented Framework for Modular Resource Management. In 5th Workshop on Object-Orientation in Operating Systems (IWOOOS '96) (October 1996).]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  53. WANG, F., PAPAEFTHYMIOU, M., AND SQUILLANTE, M. 1997. Performance Evaluation of Gang Scheduling for Parallel and Distributed Multiprogramming. In Proceedings of the IPPS '97 Workshop on Job Scheduling Strategies for Parallel Processing (1997).]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  54. WARREN,M.S.,BECKER,D.J.,GODA,M.P.,SALMON,J.K.,AND STERLING, T. 1997. Parallel Supercomputing with Commodity Components. In International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA) (Las Vegas, Nevada, June 1997), pp. 1372- 1381.]]Google ScholarGoogle Scholar
  55. WONG,F.C.,ARPACI-DUSSEAU,A.C.,AND CULLER, D. E. 1999. Building MPI for Multi-Programming Systems using Implicit Information. In Proceedings of The 6th European PVM/MPI User's Group Meeting (Barcelona, Spain, Sept. 1999).]] Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Implicit coscheduling: coordinated scheduling with implicit information in distributed systems

Recommendations

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Sign in

Full Access

  • Published in

    cover image ACM Transactions on Computer Systems
    ACM Transactions on Computer Systems  Volume 19, Issue 3
    Aug. 2001
    130 pages
    ISSN:0734-2071
    EISSN:1557-7333
    DOI:10.1145/380749
    Issue’s Table of Contents

    Copyright © 2001 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    • Published: 1 August 2001
    Published in tocs Volume 19, Issue 3

    Permissions

    Request permissions about this article.

    Request Permissions

    Check for updates

    Qualifiers

    • article

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader