Effective distributed scheduling of parallel workloads

Authors:
Andrea C. Dusseau

Computer Science Division, University of California, Berkeley

Computer Science Division, University of California, Berkeley
View Profile

,
Remzi H. Arpaci

Computer Science Division, University of California, Berkeley

Computer Science Division, University of California, Berkeley
View Profile

,
David E. Culler

Computer Science Division, University of California, Berkeley

Computer Science Division, University of California, Berkeley
View Profile

Authors Info & Claims

ACM SIGMETRICS Performance Evaluation Review Volume 24 Issue 1May 1996pp 25–36https://doi.org/10.1145/233008.233020

Published:15 May 1996Publication History

ACM SIGMETRICS Performance Evaluation Review

Abstract

We present a distributed algorithm for time-sharing parallel workloads that is competitive with coscheduling. Implicit scheduling allows each local scheduler in the system to make independent decisions that dynamically coordinate the scheduling of cooperating processes across processors. Of particular importance is the blocking algorithm which decides the action of a process waiting for a communication or synchronization event to complete. Through simulation of bulk-synchronous parallel applications, we find that a simple two-phase fixed-spin blocking algorithm performs well; a two-phase adaptive algorithm that gathers run-time data on barrier wait-times performs slightly better. Our results hold for a range of machine parameters and parallel program characteristics. These findings are in direct contrast to the literature that states explicit coscheduling is necessary for fine-grained programs. We show that the choice of the local scheduler is crucial, with a priority-based scheduler performing two to three times better than a round-robin scheduler. Overall, we find that the performance of implicit scheduling is near that of coscheduling (+/- 35%), without the requirement of explicit, global coordination.

References

1 R. H. Arpaci, D. E. Culler, A. Krishnamurthy, S. Steinberg, and K. Yelick. Empirical Evaluation of the CRAY-T3D: A Compiler Perspective. In Proceedsngs of the 2~.2nd Annual Internatsonal Symposzum on Computer Archztecture, June 1995. Google ScholarDigital Library
2 R. H. Arpaci, A. C. Dusseau, A. M. Vahdat, L. T. Liu, T. E. Anderson, and D. A. Patterson. The Interaction of Parallel and Sequential Workloads on a Network of Workstations. In Proceedzngs of A CM SIGMETRICS'95/PERFORMANCE'95 Joznt International Conference on Measurement and Modetzng of Computer Systems, pages 267-278, May 1995. Google ScholarDigital Library
3 K. Arvind. Probabalistic Clock 21ynchronization in Distributed Systems. IEEE Transactzons on Parallel and Distributed Systems, 5(5):474-87, May 1994. Google ScholarDigital Library
4 M. J. Atallah, C. L. Black, D. C. Marinescu, H. J. Siegel, and T. L. Casavant. Models and Algorithms for Co-scheduling Compute-Intensive Tasks on a Network of Workstations Journal of Parallel and Distrbuted Computing, 16-319-327, 1992.Google Scholar
5 R. Chandra, S. Devine, B. Verghese, A. Gupta, and M Rosenblum. Scheduling and Page Migration for Multiprocessor Compute Servers. In Internatzonal Conference on Archztectural Support for Programming Languages and Operating Systems, Oct. 1994 Google ScholarDigital Library
6 J. Chapin, M. Rosenblum, S. Devine, T. Lahiri, D Teodoslu, and A Gupta. Hive: Fault Containment for Shared-Memory Multiprocessors. In Proceedings of the Fzfteenth ACM Symposzum on Operating Systems Pmnc,ples, pages 12-25, Dec. 1995 Google ScholarDigital Library
7 S-H. Chlang, R K. Mansharamani, and M. K. Vernon. Use of Application Characteristics and Limited Preemption for Run-To- Completion Parallel Processor Scheduhng Policies. In Proceed- ,ngs of the 1994 ACM SIGMETRICS Conference on Measurement and Modehng of Computer Systems, pages 34-44, Feb. 1994. Google ScholarDigital Library
8 M. Crovella, P. Das, C. Dubnicki, T. LeBlanc, and E. Markatos. Multiprogramming on Multiprocessors. In Proceed,ngs of the Third IEEE Symposium on Parallel and D~stmbuted Processzng, pages 590-597, Dec. 1991.Google Scholar
9 D. Culler, L. T. Lm, R. Martin, and C. Yoshikawa LogP Performance Assessment of Fast Network Interfaces. IEEE Mzcro, Feb. 1996. Google ScholarDigital Library
10 R. Cypher, A Ho, S. Konstantinidou, and P. Messina. Architectural Requirements of Parallel Scientific Applications with Explicit Communication. In 20th Annual Internatzonal Symposium on Computer Architecture, May 1993 Google ScholarDigital Library
11 J. J. Dongarra, J. R. Bunch, C. B. Moler, and G W. Stewart LINPACK User's Guide. SIAM, Philadelphia, Penn, 1979.Google Scholar
12 A. Dusseau, D. Culler, K. Schauser, and R. Martin Fast Parallel Sorting under LogP: Experience with the CM-5 IEEE Transactions on Parallel and D~stmbuted Systems, 1996. To Appear. Google ScholarDigital Library
13 S. Evans, K. Clarke, D. Singleton, and B. Smaalders. Optimizing Unix Resource Scheduling for User Interaction. In 1993 Summer Usen,x, pages 205-218. USENIX, June 1993.Google Scholar
14 D. G. Feitelson and L. Rudolph. Distributed Hierarchical Control for Parallel Processing. IEEE Computer, 23(5):65-77, May 1990. Google ScholarDigital Library
15 D. G. Feltelson and L Rudolph. Gang Scheduling Performance Benefits for Fine-Grained Synchronization Journal of Parallel and D,stmbuted Computing, 16(4):306-318, Dec. 1992Google Scholar
16 D. G. Feitelson and L Rudolph. Coscheduhng Based on Run- Time Identificatmn of Actlwty Working Sets International Journal of Parallel Programming, 23(2):136-160, Apr. 1995. Google ScholarDigital Library
17 D. Ghosal, G Serazzi, and S. K. Trlpathl The Processor Working Set and Its Use in Scheduling Mult~processor Systems. IEEE Transactions on Software Eng~neemng, 17(5):443-453, May t991. Google ScholarDigital Library
18 B Goodheart and J Cox The Magic Garden Explained: The Internals of UNIX System V Release 4. Prentice Hall, 1994. Google ScholarDigital Library
19 A. Gupta, A. Tucker, and S. Urushibaru. The Impact of Operating System Scheduling Policies and Synchronization Methods on the Performance of Parallel Applications. In Proceed,ngs of the 199i A CM SIGMETRICS Conference on Measurement and Modehng of Computer Systems, pages 120-131, May 1991. Google ScholarDigital Library
20 High Performance Fortran Forum. High Performance Fortran language specification version 1 0 Draft, Jan. 1993Google Scholar
21 M, D. Hill, J. R. L~ru~, S. K. l~~inh~rdt, ~nd D. A. Wood. Cooperative Shared Memory: Software and Hardware for Scalable Multiprocessors ACM Transactions on Computer Systems, 11(4).300-18, Nov. 1993. Google ScholarDigital Library
22 A. Karlin, K. Li, M. Manasse, and S Owicki Empirical studies of competitive spinning for a shared-memory multiprocessor. In Thirteenth A CM Symposzum on Operating Systems Pmnc,ples, Oct. 1991. Google ScholarDigital Library
23 K. Keeton, T. Anderson, and D. Patterson. LogP Quantified: The Case for Low-Overhead Local Area Networks. In Proceedings of Hot Interconnects II{, Aug. 1995Google Scholar
24 L. I. Kontothanassm and R. W. Wismewski. Using Scheduler Information to Achmve Optimal Barrier Synchronization Performance. In Proceedzngs of the ACM SIGPLAN Symposium on Pmnczples and Practice of Parallel Programming, May 1993 Google ScholarDigital Library
25 S. J. Leffler, M. K McKusick, M. J Karels, and J. S. Quarterman. The Design and lmptementatzon of the ~4.3BSD UNIX Operating System Addison-Wesley, 1990. Google ScholarDigital Library
26 S. T. Leutenegger and X.-H. Sun. Distributed Computing Feasibility in a Non-Dedicated Homogenous Distributed System. In Proceedzngs of Supercomput~ng 93, pages 143-152, 1993. Google ScholarDigital Library
27 S. T. Leutenegger and M. K. Vernon. The Performance of Multiprogrammed Multlprocessor Scheduhng Policies In Proceedings of the A CM SIGMETRICS Conference, pages 226-236, May 1990. Google ScholarDigital Library
28 R. P. Martin HPAM An Active Message Layer for a Network of Workstations. In Proceedzngs of Hot Interconnects {I, July 1994.Google Scholar
29 C. McCann, R. Vaswam, and J. Zahorjan. A Dynamic Processor Allocation Policy for Multi-programmed Shared-Memory Multiprocessors. In A CM Transactions on Computer Systems, volume 11, pages 146-178, May 1993. Google ScholarDigital Library
30 C. McCann and J. Zahorjan Processor Allocation Policies for Message-Passing Parallel Computers. In Proceedings of the 1994 A CM SIGMETRICS Conference on Measurement and Modeling of Computer Systems, pages 19-32, Feb. 1994. Google ScholarDigital Library
31 C. McCann and J. Zahorjan Scheduling Memory Constrained Jobs on Distributed Memory Parallel Computers. In Proceedings of ACM SIGMETRICS'95/PERFORMANCE'95 Joint International Conference on Measurement and Modehng of Computer Systems, pages 208-219, May 1995. Google ScholarDigital Library
32 L McVoy and C. Staelin. lmbench: Portable Tools for Performance Analysis. In Proceedings of the i996 W, nter USENIX, Jan. 1996. Google ScholarDigital Library
33 V. K. Naik, S. K Setm, and M S Squillante. Performance Analysis of Job Scheduling Policies m Parallel Supercomputing Enwronments. In Proceedings of Supercomputmg '93, pages 824-833, Nov. 1993. Google ScholarDigital Library
34 A. Olson and K. G. Shin. Fault-tolerant Clock Synchronization in Large Multlcomputer Systems. IEEE Transactions on Parallel and D~stmbuted Systems, 5(9) 912-923, Sept. 1994. Google ScholarDigital Library
35 J. K. Ousterhout. Scheduling Techniques for Concurrent Systems. In Th,rd International Conference on D~stmbuted Comput,ng Systems, pages 22-30, May 1982Google Scholar
36 V G Peris, M. S. Squillante, and V. K. Naik. Analysis of the Impact of Memory in Dmtributed Parallel Processing Sytems In 1994 A CM SIGMETRICS Conference on Measurement and Modehng of Computer Systems, pages 5-18, Feb 1994. Google ScholarDigital Library
37 K. C Sevclk Characterizations of Parailelmm m Applications and their Use in Scheduhng. in Proceedings of the 1989 A CM SIGMETRICS Conference on Measurement and Modehng of Computer Systems, pages 171-180, May 1989. Google ScholarDigital Library
38 J P Singh, W.-D Weber, and A. Gupta. SPLASH: Stanford Parallel Applications for Shared Memory. Computer Architecture News, 20(1):5-44, Mar. 1992. Google ScholarDigital Library
39 P G Sobalvarro and W. E. Weihl Demand-based Coscheduling of Parallel Jobs on Multiprogrammed Multlprocessors. In Proceedzngs of the IPPS '95 Workshop on Job Scheduhng Strategzes for Parallel Processing, pages 63-75, Apr 1995 Google ScholarDigital Library
40 T yon Elcken, A Basu, V. Buch, and W. Vogels. U-Net' A User- Level Network Interface for Parallel and Distributed Computing. In Proceedings of the F,fteenth A CM Symposium on Operating System Pmnc~ples, pages 40-51, Dec 1995 Google ScholarDigital Library
41 T von E~cken, D. E. Culler, S C. Goldstem, and K E Schauser. Active Messages' a Mechamsm for Integrated Communication ~nd Comlou~ion. In Pr~vwd,r~g~ ef th~ 19th Ir,~vrnatlenal Symposium on Computer ArcMtecture, May 1992 Google ScholarDigital Library
42 R W. Wisniewski, L I. Kontothanassis, and M L Scott. High Performance Synchromzat~on Algomthms for Multiprogrammed Multiprocessors In Proceed,rigs of the Fifth A CM SIGPLAN Sympos,um on Principles and Practice of Parallel Programm~ng, pages 199-206, July 1995. Google ScholarDigital Library
43 J. Zahor9an and E. D. Lazowska. Spinning Versus Blocking m Parallel ~y~tems with Uncertain~y. In Proceedings of the IFIP International Seminar on Performance of D,stmbuted and Parallel Systems, pages 455-472, Dec. 1988.Google Scholar

Index Terms

Effective distributed scheduling of parallel workloads

Recommendations

Effective distributed scheduling of parallel workloads
SIGMETRICS '96: Proceedings of the 1996 ACM SIGMETRICS international conference on Measurement and modeling of computer systems

We present a distributed algorithm for time-sharing parallel workloads that is competitive with coscheduling. Implicit scheduling allows each local scheduler in the system to make independent decisions that dynamically coordinate the scheduling of ...
Read More
Simulation Based Job Scheduling Optimization for Batch Workloads
ICPE '19: Proceedings of the 2019 ACM/SPEC International Conference on Performance Engineering

We present a simulation based approach for scheduling jobs that are part of a batch workflow. Our objective is to minimize the makespan, defined as completion time of the last job to leave the system in a batch workflow with dependencies. The existing ...
Read More
Parallel job scheduling algorithms
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in
ACM SIGMETRICS Performance Evaluation Review Volume 24, Issue 1
May 1996
273 pages
ISSN:0163-5999
DOI:10.1145/233008
Editors:
Blaine E. Gaither
Hewlett-Packard, Fort Collins, CO
,
Daniel A. Reed
Univ. of Illinois, Urbana
Issue’s Table of Contents
SIGMETRICS '96: Proceedings of the 1996 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
May 1996
279 pages
ISBN:0897917936
DOI:10.1145/233013
Chairman:
Daniel A. Reed
Univ. of Illinois, Urbana
,
Editor:
Blaine D. Gaither
Hewlett-Packard, Fort Collins, CO
Copyright © 1996 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 15 May 1996
Check for updates
Qualifiers
- article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 110
  Total Citations
  View Citations
- 1,683
  Total Downloads
- Downloads (Last 12 months)114
- Downloads (Last 6 weeks)16
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Effective distributed scheduling of parallel workloads

ACM SIGMETRICS Performance Evaluation Review

Abstract

References

Cited By

Index Terms

Recommendations

Effective distributed scheduling of parallel workloads

Simulation Based Job Scheduling Optimization for Batch Workloads

Parallel job scheduling algorithms