Abstract
In modern distributed systems, coordinated time-sharing is required for communicating processes to leverage the performance of switch-based networks and low-overhead protocols. Coordinated time-sharing has traditionally been achieved with gang scheduling or explicit coscheduling, implementations of which often suffer from many deficiencies: multiple points of failure, high context-switch overheads, and poor interaction with client-server, interactive, and I/O -intensive workloads. Implicit coscheduling dynamically coordinates communicating processes across distributed machines without these structural deficiencies. In implicit coscheduling, no communication is required across operating systems schedulers; instead, cooperating processes achieve coordination by reacting to implicit information carried by communication existing within the parallel application. The implementation of this approach is simple and allows participating nodes to act autonomously. We introduce two key mechanisms in implicit coscheduling. The first is conditional two-phase waiting, a generalization of traditional two-phase waiting in which spin-time may be increased depending upon events occuring while the process waits. The second is an extension to stride scheduling that provides preemption and is fair to processes that block. To demonstrate that implicit coscheduling performs well, we show results from an extensive set of simulation and implementation experiments. To exercise the conditional two-phase waiting algorithm, we examine three workloads: bulk-synchronous and continuous-communication synthetic applications and application kernels written in the Split-C language. To exercise the local scheduler, we examine competing jobs with different communication characteristics. We demonstrate that our implementation scales well with the number of jobs and workstations and is robust to process placement. Our experiments show that implicit coscheduling is effective and fair for a wide range of workloads; most perform within 30% of an idealized model of gang scheduling.
- ALEXANDROV, A., IONESCU, M., SCHAUSER,K.E.,AND SCHEIMAN, C. 1995. LogGP: Incorporating Long Messages into the LogP model-One step closer towards a realistic model for parallel computation. In 7th Annual Symposium on Parallel Algorithms and Architectures (SPAA'95) (July 1995).]] Google ScholarDigital Library
- ANDERSON, T. E., CULLER, D. E., PATTERSON,D.A.,AND THE NOW TEAM. 1995a. A Case for NOW (Networks of Workstations). IEEE Micro 15, 1 (February), 54-64.]] Google ScholarDigital Library
- ANDERSON, T. E., DAHLIN,M.D.,NEEFE, J. M., PATTERSON, D. A., ROSELLI,D.S.,AND WANG, R. Y. 1995b. Serverless Network File Systems. In Proceedings of the 15th ACM Symposium on Operating Systems Principles (December 1995), pp. 109-126.]] Google ScholarDigital Library
- ARPACI, R. H., DUSSEAU,A.C.,VAHDAT, A. M., LIU,L.T.,ANDERSON,T.E.,AND PATTERSON, D. A. 1995. The Interaction of Parallel and Sequential Workloads on a Network of Workstations. In Proceedings of ACM SIGMETRICS'95/PERFORMANCE'95 Joint International Conference on Measurement and Modeling of Computer Systems (May 1995), pp. 267-278.]] Google ScholarDigital Library
- ARPACI-DUSSEAU,A.AND CULLER, D. 1997. Extending Proportional-Share Scheduling to a Network of Workstations. In International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA) (Las Vegas, Nevada, June 1997).]]Google Scholar
- ARPACI-DUSSEAU, A. C. 1998. Implicit Coscheduling: Coordinated Scheduling with Implicit Information in Distributed Systems. Ph. D. thesis, University of California at Berkeley.]] Google ScholarDigital Library
- ARPACI-DUSSEAU,A.C.,ARPACI-DUSSEAU, R. H., CULLER, D. E., HELLERSTEIN,J.M.,AND PATTERSON,D.P. 1997. High-Performance Sorting on Networks of Workstations. In Proceedings of the 1997 ACM SIGMOD Conference (1997), pp. 243-254.]] Google ScholarDigital Library
- BIAGIONI, E., COOPER, E., AND SANSOM, R. 1993. Designing a Practical ATMLAN. IEEE Network 7,2 (March), 32-39.]]Google ScholarDigital Library
- BODEN,N.J.,COHEN, D., FELDERMAN, R. E., KULAWIK, A. E., SEITZ, C. L., SEIZOVIC,J.N.,AND SU,W.-K. 1995. Myrinet--A Gigabit-per-Second Local-Area Network. IEEE Micro 15, 1 (February), 29- 38.]] Google ScholarDigital Library
- BREWER,E.A.AND KUSZMAUL, B. C. 1993. How to Get Good Performance from the CM5 Data Network. In Proceedings of the 1994 International Parallel Processing Symposium (Cancun, Mexico, April 1993), pp. 858-867.]] Google ScholarDigital Library
- BUCHANAN,M.AND CHIEN, A. 1997. Coordinated Thread Scheduling for Workstation Clusters Under Windows NT. In Proceedings of USENIX Windows NT Workshop (Aug. 1997).]] Google ScholarDigital Library
- BURGER,D.C.,HYDER,R.S.,MILLER,B.P.,AND WOOD, D. A. 1994. Paging Tradeoffs in Distributed- Shared-Memory Multiprocessors. In Supercomputing'94 (Nov. 1994).]]Google Scholar
- CROVELLA, M., DAS, P., DUBNICKI, C., LEBLANC,T.,AND MARKATOS, E. 1991. Multiprogramming on Multiprocessors. Technical Report 385 (February), University of Rochester, Computer Science Department.]] Google ScholarDigital Library
- CULLER, D., ARPACI-DUSSEAU, A., ARPACI-DUSSEAU, R., CHUN, B., LUMETTA, S., MAINWARING, A., MARTIN, R., YOSHIKAWA,C.,AND WONG, F. 1997. Parallel Computing on the Berkeley NOW. In Ninth Joint Symposium on Parallel Processing (Kobe, Japan, May 1997).]]Google Scholar
- CULLER, D. E., DUSSEAU, A., GOLDSTEIN,S.C.,KRISHNAMURTHY, A., LUMETTA,S.,VON EICKEN,T.,AND YELICK, K. 1993a. Parallel Programming in Split-C. In Proceedings of Supercomputing '93 (1993), pp. 262-273.]] Google ScholarDigital Library
- CULLER, D. E., KARP, R. M., PATTERSON, D. A., SAHAY, A., SCHAUSER, K. E., SANTOS, E., SUBRAMONIAN, R., AND VON EICKEN, T. 1993b. LogP: Towards a Realistic Model of Parallel Computation. In Fourth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (May 1993), pp. 262-273.]] Google ScholarDigital Library
- DENNING,D.E.AND DENNING, P. J. 1979. Data Security. Computing Surveys 11, 2 (Sept.), 227-249.]] Google ScholarDigital Library
- DEWITT,D.AND GRAY, J. 1992. Parallel Database Systems: The Future of High-Performance Database Systems. Communications of the ACM 35, 6 (June), 85-98.]] Google ScholarDigital Library
- DOWDY, L. 1988. On the Partitioning of Multiprocessor Systems. Technical Report 88-06 (July), Department of Computer Science, Vanderbilt University.]]Google Scholar
- DUSSEAU,A.C.,ARPACI,R.H.,AND CULLER, D. E. 1996a. Effective Distributed Scheduling of Parallel Workloads. In Proceedings of 1996 ACM Sigmetrics International Conference on Measurement and Modeling of Computer Systems (1996).]] Google ScholarDigital Library
- DUSSEAU,A.C.,CULLER, D. E., SCHAUSER,K.E.,AND MARTIN, R. P. 1996b. Fast Parallel Sorting Under LogP: Experience with the CM-5. IEEE Transactions on Parallel and Distributed Systems 7,8 (August), 791-805.]] Google ScholarDigital Library
- EYKHOLT, J. R., KLEIMAN, S. R., BARTON, S., VOLL, J., FAULKNER, R., SHIVALINGIAH, A., SMITH, M., STEIN, D., WEEKS, M., AND WILLIAMS, D. 1992. Beyond Multiprocessing: Multithreading the SunOS Kernel. In Proceedings of the Summer 1992 USENIX Technical Conference and Exhibition (San Antontio, TX, USA, June 1992), pp. 11-18.]]Google Scholar
- FEITELSON, D. G. 1995. A Survey of Scheduling in Multiprogrammed Parallel Systems. Research Report RC 19790 (87657) (February), IBM T. J. Watson Research Center, Yorktown Heights, NY. Second Revision, August 1997.]]Google Scholar
- FEITELSON,D.G.AND JETTE, M. 1997. Improved Utilization and Responsiveness with Gang Scheduling. In Proceedings of the IPPS '97 Workshop on Job Scheduling Strategies for Parallel Processing (1997).]] Google ScholarDigital Library
- FEITELSON,D.G.AND RUDOLPH, L. 1990. Distributed Hierarchical Control for Parallel Processing. IEEE Computer 23, 5 (May), 65-77.]] Google ScholarDigital Library
- FEITELSON,D.G.AND RUDOLPH, L. 1992. Gang Scheduling Performance Benefits for Fine- Grained Synchronization. Journal of Parallel and Distributed Computing 16, 4 (December), 306-18.]]Google ScholarCross Ref
- FEITELSON,D.G.AND RUDOLPH, L. 1995. Coscheduling Based on Run-Time Identification of Activity Working Sets. International Journal of Parallel Programming 23, 2 (April), 136-160.]] Google ScholarDigital Library
- GHORMLEY,D.P.,PETROU, D., RODRIGUES, S. H., VAHDAT,A.M.,AND ANDERSON, T. E. 1989. GLU- nix: A Global Layer Unix for a Network of Workstations. In Software Practice and Experience (1989).]] Google ScholarDigital Library
- GOODHEART,B.AND COX, J. 1994. The Magic Garden Explained: The Internals of UNIX System V Release 4. Prentice Hall.]] Google ScholarDigital Library
- HELLERSTEIN, J. L. 1993. Achieving Service Rate Objectives with Decay Usage Scheduling. IEEE Transactions on Software Engineering 19, 8 (August), 813-825.]] Google ScholarDigital Library
- HILL,M.D.,LARUS, J. R., REINHARDT,S.K.,AND WOOD, D. A. 1993. Cooperative Shared Memory: oftware and Hardware for Scalable Multiprocessors. ACM Transactions on Computer Systems 11, 4 (November), 300-18.]] Google ScholarDigital Library
- HORI, A., TEZUKA, H., AND ISHIKAWA, Y. 1997. Global State Detection using Network Preemption. In Proceedings of the IPPS '97 Workshop on Job Scheduling Strategies for Parallel Processing (1997).]] Google ScholarDigital Library
- JACOBSON, V. 1988. Congestion avoidance and control. In SIGCOMM '88 Symposium: Communications Architectures and Protocols (Aug. 1988), pp. 314-29.]] Google ScholarDigital Library
- KARLIN, A. R., MANASSE, M., MCGEOCH, L., AND OWICKI, S. 1994. Competitive Randomized Algorithms For Nonuniform Problems. Algorithmica 11, 6 (June), 542-71.]]Google ScholarDigital Library
- KAY,J.AND LAUDER, P. 1988. A Fair Share Scheduler. Communications of the ACM31, 1 (January), 44-55.]] Google ScholarDigital Library
- LEE, W., FRANK, M., LEE, V., MACKENZIE, K., AND RUDOLPH, L. 1997. Implications of I/O for Gang Scheduled Workloads. In Proceedings of the IPPS '97 Workshop on Job Scheduling Strategies for Parallel Processing (1997).]] Google ScholarDigital Library
- LEUTENEGGER,S.T.AND VERNON, M. K. 1990. The Performance of Multiprogrammed Multiprocessor Scheduling Policies. In Proceedings of the 1990 ACM SIGMETRICS Conference (May 1990), pp. 226-36.]] Google ScholarDigital Library
- MAINWARING, A. M. 1995. Active Message Application Programming Interface and Communication Subsystem Organization. Master's thesis, University of California, Berkeley.]]Google Scholar
- MARTIN,R.P.,VAHDAT, A. M., CULLER,D.E.,AND ANDERSON, T. E. 1997. Effects of Communication Latency, Overhead, and Bandwidth in a Cluster Architecture. In Proceedings of the 24th Annual International Symposium on Computer Architecture (Denver, CO, June 1997).]] Google ScholarDigital Library
- NELSON,R.AND TOWSLEY, D. 1993. A Performance Evaluation of Several Priority Policies for Parallel Processing Systems. Journal of the ACM 40, 3 (July), 714-740.]] Google ScholarDigital Library
- NIEH, J., HANKO,J.G.,NORTHCUTT,J.D.,AND WALL, G. A. 1993. SVR4 Unix Scheduler Unacceptable for Multimedia Applications. In Proceedings of the Workshop on Network and Operating System Support for Digital Audio and Video (Nov. 1993).]] Google ScholarDigital Library
- OUSTERHOUT, J. K. 1982. Scheduling Techniques for Concurrent Systems. In Third International Conference on Distributed Computing Systems (May 1982), pp. 22-30.]]Google Scholar
- PAKIN, S., LAURIA, M., AND CHIEN, A. 1995. High Performance Messaging on Workstations: Illinois Fast Messages (FM) for Myrinet. In Proceedings of the 1995 Supercomputing Conference (San Diego, California, 1995).]] Google ScholarDigital Library
- ROSTI, E., SERAZZI, G., SMIRNI, E., AND SQUILLANTE, M. S. 1998. The Impact of I/O on Program Behavior and Parallel Scheduling. In SIGMETRICS '98/PERFORMANCE'98 (June 1998).]] Google ScholarDigital Library
- SOBALVARRO,P.G.AND WEIHL, W. E. 1995. Demand-based Coscheduling of Parallel Jobs on Multi-programmed Multiprocessors. In Proceedings of the IPPS '95 Workshop on Job Scheduling Strategies for Parallel Processing (April 1995), pp. 63-75.]] Google ScholarDigital Library
- SOBALVARRO,P.G.,PAKIN, S., WEIHL,W.E.,AND CHIEN, A. A. 1998. Dynamic Coscheduling on Workstation Clusters. In Proceedings of the IPPS '98 Workshop on Job Scheduling Strategies for Parallel Processing (1998).]] Google ScholarDigital Library
- SUNDERAM, V. 1990. PVM: A Framework for Parallel Distributed Computing. Concurrency: Practice and Experience 2, 4 (December), 315-339.]] Google ScholarDigital Library
- THE MPI FORUM. 1993. MPI: A Message Passing Interface. In Proceedings of Supercomputing '93 (Nov. 1993), pp. 878-883.]] Google ScholarDigital Library
- VON EICKEN, T., BASU, A., BUCH,V.,AND VOGELS, W. 1995. U-Net: A User-Level Network Interface for Parallel and Distributed Computing. In Proceedings of the Fifteenth SOSP (Copper Mountain, CO, December 1995), pp. 40-53.]] Google ScholarDigital Library
- VON EICKEN, T., CULLER, D. E., GOLDSTEIN,S.C.,AND SCHAUSER, K. E. 1992. Active Messages: a Mechanism for Integrated Communication and Computation. In Proceedings of the 19th International Symposium on Computer Architecture (Gold Coast, Australia, May 1992).]] Google ScholarDigital Library
- WALDSPURGER, C. A. 1995. Lottery and Stride Scheduling: Flexible Proportional-Share Resource Management. Ph. D. thesis, Massachusetts Institute of Technology. Also appears as Technical Report MIT/LCS/TR-667.]] Google ScholarDigital Library
- WALDSPURGER,C.A.AND WEIHL, W. E. 1996. An Object-Oriented Framework for Modular Resource Management. In 5th Workshop on Object-Orientation in Operating Systems (IWOOOS '96) (October 1996).]] Google ScholarDigital Library
- WANG, F., PAPAEFTHYMIOU, M., AND SQUILLANTE, M. 1997. Performance Evaluation of Gang Scheduling for Parallel and Distributed Multiprogramming. In Proceedings of the IPPS '97 Workshop on Job Scheduling Strategies for Parallel Processing (1997).]] Google ScholarDigital Library
- WARREN,M.S.,BECKER,D.J.,GODA,M.P.,SALMON,J.K.,AND STERLING, T. 1997. Parallel Supercomputing with Commodity Components. In International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA) (Las Vegas, Nevada, June 1997), pp. 1372- 1381.]]Google Scholar
- WONG,F.C.,ARPACI-DUSSEAU,A.C.,AND CULLER, D. E. 1999. Building MPI for Multi-Programming Systems using Implicit Information. In Proceedings of The 6th European PVM/MPI User's Group Meeting (Barcelona, Spain, Sept. 1999).]] Google ScholarDigital Library
Index Terms
- Implicit coscheduling: coordinated scheduling with implicit information in distributed systems
Recommendations
A runtime resolution scheme for priority boost conflict in implicit coscheduling
High-performance parallel and scientific applications are composed of multiple processes running on distinct CPUs that communicate frequently. Due to the synchronization needs of such applications, performance is greatly hampered if their processes are ...
Coscheduling in Clusters: Is It a Viable Alternative?
SC '04: Proceedings of the 2004 ACM/IEEE conference on SupercomputingIn this paper, we conduct an in-depth evaluation of a broad spectrum of scheduling alternatives for clusters. These include the widely used batch scheduling, local scheduling, gang scheduling, all prior communication-driven coscheduling algorithms (...
Adaptive Parallel Job Scheduling with Flexible Coscheduling
Many scientific and high-performance computing applications consist of multiple processes running on different processors that communicate frequently. Because of their synchronization needs, these applications can suffer severe performance penalties if ...
Comments