Abstract
We present a distributed algorithm for time-sharing parallel workloads that is competitive with coscheduling. Implicit scheduling allows each local scheduler in the system to make independent decisions that dynamically coordinate the scheduling of cooperating processes across processors. Of particular importance is the blocking algorithm which decides the action of a process waiting for a communication or synchronization event to complete. Through simulation of bulk-synchronous parallel applications, we find that a simple two-phase fixed-spin blocking algorithm performs well; a two-phase adaptive algorithm that gathers run-time data on barrier wait-times performs slightly better. Our results hold for a range of machine parameters and parallel program characteristics. These findings are in direct contrast to the literature that states explicit coscheduling is necessary for fine-grained programs. We show that the choice of the local scheduler is crucial, with a priority-based scheduler performing two to three times better than a round-robin scheduler. Overall, we find that the performance of implicit scheduling is near that of coscheduling (+/- 35%), without the requirement of explicit, global coordination.
- 1 R. H. Arpaci, D. E. Culler, A. Krishnamurthy, S. Steinberg, and K. Yelick. Empirical Evaluation of the CRAY-T3D: A Compiler Perspective. In Proceedsngs of the 2~.2nd Annual Internatsonal Symposzum on Computer Archztecture, June 1995. Google ScholarDigital Library
- 2 R. H. Arpaci, A. C. Dusseau, A. M. Vahdat, L. T. Liu, T. E. Anderson, and D. A. Patterson. The Interaction of Parallel and Sequential Workloads on a Network of Workstations. In Proceedzngs of A CM SIGMETRICS'95/PERFORMANCE'95 Joznt International Conference on Measurement and Modetzng of Computer Systems, pages 267-278, May 1995. Google ScholarDigital Library
- 3 K. Arvind. Probabalistic Clock 21ynchronization in Distributed Systems. IEEE Transactzons on Parallel and Distributed Systems, 5(5):474-87, May 1994. Google ScholarDigital Library
- 4 M. J. Atallah, C. L. Black, D. C. Marinescu, H. J. Siegel, and T. L. Casavant. Models and Algorithms for Co-scheduling Compute-Intensive Tasks on a Network of Workstations Journal of Parallel and Distrbuted Computing, 16-319-327, 1992.Google Scholar
- 5 R. Chandra, S. Devine, B. Verghese, A. Gupta, and M Rosenblum. Scheduling and Page Migration for Multiprocessor Compute Servers. In Internatzonal Conference on Archztectural Support for Programming Languages and Operating Systems, Oct. 1994 Google ScholarDigital Library
- 6 J. Chapin, M. Rosenblum, S. Devine, T. Lahiri, D Teodoslu, and A Gupta. Hive: Fault Containment for Shared-Memory Multiprocessors. In Proceedings of the Fzfteenth ACM Symposzum on Operating Systems Pmnc,ples, pages 12-25, Dec. 1995 Google ScholarDigital Library
- 7 S-H. Chlang, R K. Mansharamani, and M. K. Vernon. Use of Application Characteristics and Limited Preemption for Run-To- Completion Parallel Processor Scheduhng Policies. In Proceed- ,ngs of the 1994 ACM SIGMETRICS Conference on Measurement and Modehng of Computer Systems, pages 34-44, Feb. 1994. Google ScholarDigital Library
- 8 M. Crovella, P. Das, C. Dubnicki, T. LeBlanc, and E. Markatos. Multiprogramming on Multiprocessors. In Proceed,ngs of the Third IEEE Symposium on Parallel and D~stmbuted Processzng, pages 590-597, Dec. 1991.Google Scholar
- 9 D. Culler, L. T. Lm, R. Martin, and C. Yoshikawa LogP Performance Assessment of Fast Network Interfaces. IEEE Mzcro, Feb. 1996. Google ScholarDigital Library
- 10 R. Cypher, A Ho, S. Konstantinidou, and P. Messina. Architectural Requirements of Parallel Scientific Applications with Explicit Communication. In 20th Annual Internatzonal Symposium on Computer Architecture, May 1993 Google ScholarDigital Library
- 11 J. J. Dongarra, J. R. Bunch, C. B. Moler, and G W. Stewart LINPACK User's Guide. SIAM, Philadelphia, Penn, 1979.Google Scholar
- 12 A. Dusseau, D. Culler, K. Schauser, and R. Martin Fast Parallel Sorting under LogP: Experience with the CM-5 IEEE Transactions on Parallel and D~stmbuted Systems, 1996. To Appear. Google ScholarDigital Library
- 13 S. Evans, K. Clarke, D. Singleton, and B. Smaalders. Optimizing Unix Resource Scheduling for User Interaction. In 1993 Summer Usen,x, pages 205-218. USENIX, June 1993.Google Scholar
- 14 D. G. Feitelson and L. Rudolph. Distributed Hierarchical Control for Parallel Processing. IEEE Computer, 23(5):65-77, May 1990. Google ScholarDigital Library
- 15 D. G. Feltelson and L Rudolph. Gang Scheduling Performance Benefits for Fine-Grained Synchronization Journal of Parallel and D,stmbuted Computing, 16(4):306-318, Dec. 1992Google Scholar
- 16 D. G. Feitelson and L Rudolph. Coscheduhng Based on Run- Time Identificatmn of Actlwty Working Sets International Journal of Parallel Programming, 23(2):136-160, Apr. 1995. Google ScholarDigital Library
- 17 D. Ghosal, G Serazzi, and S. K. Trlpathl The Processor Working Set and Its Use in Scheduling Mult~processor Systems. IEEE Transactions on Software Eng~neemng, 17(5):443-453, May t991. Google ScholarDigital Library
- 18 B Goodheart and J Cox The Magic Garden Explained: The Internals of UNIX System V Release 4. Prentice Hall, 1994. Google ScholarDigital Library
- 19 A. Gupta, A. Tucker, and S. Urushibaru. The Impact of Operating System Scheduling Policies and Synchronization Methods on the Performance of Parallel Applications. In Proceed,ngs of the 199i A CM SIGMETRICS Conference on Measurement and Modehng of Computer Systems, pages 120-131, May 1991. Google ScholarDigital Library
- 20 High Performance Fortran Forum. High Performance Fortran language specification version 1 0 Draft, Jan. 1993Google Scholar
- 21 M, D. Hill, J. R. L~ru~, S. K. l~~inh~rdt, ~nd D. A. Wood. Cooperative Shared Memory: Software and Hardware for Scalable Multiprocessors ACM Transactions on Computer Systems, 11(4).300-18, Nov. 1993. Google ScholarDigital Library
- 22 A. Karlin, K. Li, M. Manasse, and S Owicki Empirical studies of competitive spinning for a shared-memory multiprocessor. In Thirteenth A CM Symposzum on Operating Systems Pmnc,ples, Oct. 1991. Google ScholarDigital Library
- 23 K. Keeton, T. Anderson, and D. Patterson. LogP Quantified: The Case for Low-Overhead Local Area Networks. In Proceedings of Hot Interconnects II{, Aug. 1995Google Scholar
- 24 L. I. Kontothanassm and R. W. Wismewski. Using Scheduler Information to Achmve Optimal Barrier Synchronization Performance. In Proceedzngs of the ACM SIGPLAN Symposium on Pmnczples and Practice of Parallel Programming, May 1993 Google ScholarDigital Library
- 25 S. J. Leffler, M. K McKusick, M. J Karels, and J. S. Quarterman. The Design and lmptementatzon of the ~4.3BSD UNIX Operating System Addison-Wesley, 1990. Google ScholarDigital Library
- 26 S. T. Leutenegger and X.-H. Sun. Distributed Computing Feasibility in a Non-Dedicated Homogenous Distributed System. In Proceedzngs of Supercomput~ng 93, pages 143-152, 1993. Google ScholarDigital Library
- 27 S. T. Leutenegger and M. K. Vernon. The Performance of Multiprogrammed Multlprocessor Scheduhng Policies In Proceedings of the A CM SIGMETRICS Conference, pages 226-236, May 1990. Google ScholarDigital Library
- 28 R. P. Martin HPAM An Active Message Layer for a Network of Workstations. In Proceedzngs of Hot Interconnects {I, July 1994.Google Scholar
- 29 C. McCann, R. Vaswam, and J. Zahorjan. A Dynamic Processor Allocation Policy for Multi-programmed Shared-Memory Multiprocessors. In A CM Transactions on Computer Systems, volume 11, pages 146-178, May 1993. Google ScholarDigital Library
- 30 C. McCann and J. Zahorjan Processor Allocation Policies for Message-Passing Parallel Computers. In Proceedings of the 1994 A CM SIGMETRICS Conference on Measurement and Modeling of Computer Systems, pages 19-32, Feb. 1994. Google ScholarDigital Library
- 31 C. McCann and J. Zahorjan Scheduling Memory Constrained Jobs on Distributed Memory Parallel Computers. In Proceedings of ACM SIGMETRICS'95/PERFORMANCE'95 Joint International Conference on Measurement and Modehng of Computer Systems, pages 208-219, May 1995. Google ScholarDigital Library
- 32 L McVoy and C. Staelin. lmbench: Portable Tools for Performance Analysis. In Proceedings of the i996 W, nter USENIX, Jan. 1996. Google ScholarDigital Library
- 33 V. K. Naik, S. K Setm, and M S Squillante. Performance Analysis of Job Scheduling Policies m Parallel Supercomputing Enwronments. In Proceedings of Supercomputmg '93, pages 824-833, Nov. 1993. Google ScholarDigital Library
- 34 A. Olson and K. G. Shin. Fault-tolerant Clock Synchronization in Large Multlcomputer Systems. IEEE Transactions on Parallel and D~stmbuted Systems, 5(9) 912-923, Sept. 1994. Google ScholarDigital Library
- 35 J. K. Ousterhout. Scheduling Techniques for Concurrent Systems. In Th,rd International Conference on D~stmbuted Comput,ng Systems, pages 22-30, May 1982Google Scholar
- 36 V G Peris, M. S. Squillante, and V. K. Naik. Analysis of the Impact of Memory in Dmtributed Parallel Processing Sytems In 1994 A CM SIGMETRICS Conference on Measurement and Modehng of Computer Systems, pages 5-18, Feb 1994. Google ScholarDigital Library
- 37 K. C Sevclk Characterizations of Parailelmm m Applications and their Use in Scheduhng. in Proceedings of the 1989 A CM SIGMETRICS Conference on Measurement and Modehng of Computer Systems, pages 171-180, May 1989. Google ScholarDigital Library
- 38 J P Singh, W.-D Weber, and A. Gupta. SPLASH: Stanford Parallel Applications for Shared Memory. Computer Architecture News, 20(1):5-44, Mar. 1992. Google ScholarDigital Library
- 39 P G Sobalvarro and W. E. Weihl Demand-based Coscheduling of Parallel Jobs on Multiprogrammed Multlprocessors. In Proceedzngs of the IPPS '95 Workshop on Job Scheduhng Strategzes for Parallel Processing, pages 63-75, Apr 1995 Google ScholarDigital Library
- 40 T yon Elcken, A Basu, V. Buch, and W. Vogels. U-Net' A User- Level Network Interface for Parallel and Distributed Computing. In Proceedings of the F,fteenth A CM Symposium on Operating System Pmnc~ples, pages 40-51, Dec 1995 Google ScholarDigital Library
- 41 T von E~cken, D. E. Culler, S C. Goldstem, and K E Schauser. Active Messages' a Mechamsm for Integrated Communication ~nd Comlou~ion. In Pr~vwd,r~g~ ef th~ 19th Ir,~vrnatlenal Symposium on Computer ArcMtecture, May 1992 Google ScholarDigital Library
- 42 R W. Wisniewski, L I. Kontothanassis, and M L Scott. High Performance Synchromzat~on Algomthms for Multiprogrammed Multiprocessors In Proceed,rigs of the Fifth A CM SIGPLAN Sympos,um on Principles and Practice of Parallel Programm~ng, pages 199-206, July 1995. Google ScholarDigital Library
- 43 J. Zahor9an and E. D. Lazowska. Spinning Versus Blocking m Parallel ~y~tems with Uncertain~y. In Proceedings of the IFIP International Seminar on Performance of D,stmbuted and Parallel Systems, pages 455-472, Dec. 1988.Google Scholar
Index Terms
- Effective distributed scheduling of parallel workloads
Recommendations
Effective distributed scheduling of parallel workloads
SIGMETRICS '96: Proceedings of the 1996 ACM SIGMETRICS international conference on Measurement and modeling of computer systemsWe present a distributed algorithm for time-sharing parallel workloads that is competitive with coscheduling. Implicit scheduling allows each local scheduler in the system to make independent decisions that dynamically coordinate the scheduling of ...
Simulation Based Job Scheduling Optimization for Batch Workloads
ICPE '19: Proceedings of the 2019 ACM/SPEC International Conference on Performance EngineeringWe present a simulation based approach for scheduling jobs that are part of a batch workflow. Our objective is to minimize the makespan, defined as completion time of the last job to leave the system in a batch workflow with dependencies. The existing ...
Comments