skip to main content
article
Free Access

Effective distributed scheduling of parallel workloads

Authors Info & Claims
Published:15 May 1996Publication History
Skip Abstract Section

Abstract

We present a distributed algorithm for time-sharing parallel workloads that is competitive with coscheduling. Implicit scheduling allows each local scheduler in the system to make independent decisions that dynamically coordinate the scheduling of cooperating processes across processors. Of particular importance is the blocking algorithm which decides the action of a process waiting for a communication or synchronization event to complete. Through simulation of bulk-synchronous parallel applications, we find that a simple two-phase fixed-spin blocking algorithm performs well; a two-phase adaptive algorithm that gathers run-time data on barrier wait-times performs slightly better. Our results hold for a range of machine parameters and parallel program characteristics. These findings are in direct contrast to the literature that states explicit coscheduling is necessary for fine-grained programs. We show that the choice of the local scheduler is crucial, with a priority-based scheduler performing two to three times better than a round-robin scheduler. Overall, we find that the performance of implicit scheduling is near that of coscheduling (+/- 35%), without the requirement of explicit, global coordination.

References

  1. 1 R. H. Arpaci, D. E. Culler, A. Krishnamurthy, S. Steinberg, and K. Yelick. Empirical Evaluation of the CRAY-T3D: A Compiler Perspective. In Proceedsngs of the 2~.2nd Annual Internatsonal Symposzum on Computer Archztecture, June 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. 2 R. H. Arpaci, A. C. Dusseau, A. M. Vahdat, L. T. Liu, T. E. Anderson, and D. A. Patterson. The Interaction of Parallel and Sequential Workloads on a Network of Workstations. In Proceedzngs of A CM SIGMETRICS'95/PERFORMANCE'95 Joznt International Conference on Measurement and Modetzng of Computer Systems, pages 267-278, May 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. 3 K. Arvind. Probabalistic Clock 21ynchronization in Distributed Systems. IEEE Transactzons on Parallel and Distributed Systems, 5(5):474-87, May 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. 4 M. J. Atallah, C. L. Black, D. C. Marinescu, H. J. Siegel, and T. L. Casavant. Models and Algorithms for Co-scheduling Compute-Intensive Tasks on a Network of Workstations Journal of Parallel and Distrbuted Computing, 16-319-327, 1992.Google ScholarGoogle Scholar
  5. 5 R. Chandra, S. Devine, B. Verghese, A. Gupta, and M Rosenblum. Scheduling and Page Migration for Multiprocessor Compute Servers. In Internatzonal Conference on Archztectural Support for Programming Languages and Operating Systems, Oct. 1994 Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. 6 J. Chapin, M. Rosenblum, S. Devine, T. Lahiri, D Teodoslu, and A Gupta. Hive: Fault Containment for Shared-Memory Multiprocessors. In Proceedings of the Fzfteenth ACM Symposzum on Operating Systems Pmnc,ples, pages 12-25, Dec. 1995 Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. 7 S-H. Chlang, R K. Mansharamani, and M. K. Vernon. Use of Application Characteristics and Limited Preemption for Run-To- Completion Parallel Processor Scheduhng Policies. In Proceed- ,ngs of the 1994 ACM SIGMETRICS Conference on Measurement and Modehng of Computer Systems, pages 34-44, Feb. 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. 8 M. Crovella, P. Das, C. Dubnicki, T. LeBlanc, and E. Markatos. Multiprogramming on Multiprocessors. In Proceed,ngs of the Third IEEE Symposium on Parallel and D~stmbuted Processzng, pages 590-597, Dec. 1991.Google ScholarGoogle Scholar
  9. 9 D. Culler, L. T. Lm, R. Martin, and C. Yoshikawa LogP Performance Assessment of Fast Network Interfaces. IEEE Mzcro, Feb. 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. 10 R. Cypher, A Ho, S. Konstantinidou, and P. Messina. Architectural Requirements of Parallel Scientific Applications with Explicit Communication. In 20th Annual Internatzonal Symposium on Computer Architecture, May 1993 Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. 11 J. J. Dongarra, J. R. Bunch, C. B. Moler, and G W. Stewart LINPACK User's Guide. SIAM, Philadelphia, Penn, 1979.Google ScholarGoogle Scholar
  12. 12 A. Dusseau, D. Culler, K. Schauser, and R. Martin Fast Parallel Sorting under LogP: Experience with the CM-5 IEEE Transactions on Parallel and D~stmbuted Systems, 1996. To Appear. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. 13 S. Evans, K. Clarke, D. Singleton, and B. Smaalders. Optimizing Unix Resource Scheduling for User Interaction. In 1993 Summer Usen,x, pages 205-218. USENIX, June 1993.Google ScholarGoogle Scholar
  14. 14 D. G. Feitelson and L. Rudolph. Distributed Hierarchical Control for Parallel Processing. IEEE Computer, 23(5):65-77, May 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. 15 D. G. Feltelson and L Rudolph. Gang Scheduling Performance Benefits for Fine-Grained Synchronization Journal of Parallel and D,stmbuted Computing, 16(4):306-318, Dec. 1992Google ScholarGoogle Scholar
  16. 16 D. G. Feitelson and L Rudolph. Coscheduhng Based on Run- Time Identificatmn of Actlwty Working Sets International Journal of Parallel Programming, 23(2):136-160, Apr. 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. 17 D. Ghosal, G Serazzi, and S. K. Trlpathl The Processor Working Set and Its Use in Scheduling Mult~processor Systems. IEEE Transactions on Software Eng~neemng, 17(5):443-453, May t991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. 18 B Goodheart and J Cox The Magic Garden Explained: The Internals of UNIX System V Release 4. Prentice Hall, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. 19 A. Gupta, A. Tucker, and S. Urushibaru. The Impact of Operating System Scheduling Policies and Synchronization Methods on the Performance of Parallel Applications. In Proceed,ngs of the 199i A CM SIGMETRICS Conference on Measurement and Modehng of Computer Systems, pages 120-131, May 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. 20 High Performance Fortran Forum. High Performance Fortran language specification version 1 0 Draft, Jan. 1993Google ScholarGoogle Scholar
  21. 21 M, D. Hill, J. R. L~ru~, S. K. l~~inh~rdt, ~nd D. A. Wood. Cooperative Shared Memory: Software and Hardware for Scalable Multiprocessors ACM Transactions on Computer Systems, 11(4).300-18, Nov. 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. 22 A. Karlin, K. Li, M. Manasse, and S Owicki Empirical studies of competitive spinning for a shared-memory multiprocessor. In Thirteenth A CM Symposzum on Operating Systems Pmnc,ples, Oct. 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. 23 K. Keeton, T. Anderson, and D. Patterson. LogP Quantified: The Case for Low-Overhead Local Area Networks. In Proceedings of Hot Interconnects II{, Aug. 1995Google ScholarGoogle Scholar
  24. 24 L. I. Kontothanassm and R. W. Wismewski. Using Scheduler Information to Achmve Optimal Barrier Synchronization Performance. In Proceedzngs of the ACM SIGPLAN Symposium on Pmnczples and Practice of Parallel Programming, May 1993 Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. 25 S. J. Leffler, M. K McKusick, M. J Karels, and J. S. Quarterman. The Design and lmptementatzon of the ~4.3BSD UNIX Operating System Addison-Wesley, 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. 26 S. T. Leutenegger and X.-H. Sun. Distributed Computing Feasibility in a Non-Dedicated Homogenous Distributed System. In Proceedzngs of Supercomput~ng 93, pages 143-152, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. 27 S. T. Leutenegger and M. K. Vernon. The Performance of Multiprogrammed Multlprocessor Scheduhng Policies In Proceedings of the A CM SIGMETRICS Conference, pages 226-236, May 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. 28 R. P. Martin HPAM An Active Message Layer for a Network of Workstations. In Proceedzngs of Hot Interconnects {I, July 1994.Google ScholarGoogle Scholar
  29. 29 C. McCann, R. Vaswam, and J. Zahorjan. A Dynamic Processor Allocation Policy for Multi-programmed Shared-Memory Multiprocessors. In A CM Transactions on Computer Systems, volume 11, pages 146-178, May 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. 30 C. McCann and J. Zahorjan Processor Allocation Policies for Message-Passing Parallel Computers. In Proceedings of the 1994 A CM SIGMETRICS Conference on Measurement and Modeling of Computer Systems, pages 19-32, Feb. 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. 31 C. McCann and J. Zahorjan Scheduling Memory Constrained Jobs on Distributed Memory Parallel Computers. In Proceedings of ACM SIGMETRICS'95/PERFORMANCE'95 Joint International Conference on Measurement and Modehng of Computer Systems, pages 208-219, May 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. 32 L McVoy and C. Staelin. lmbench: Portable Tools for Performance Analysis. In Proceedings of the i996 W, nter USENIX, Jan. 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. 33 V. K. Naik, S. K Setm, and M S Squillante. Performance Analysis of Job Scheduling Policies m Parallel Supercomputing Enwronments. In Proceedings of Supercomputmg '93, pages 824-833, Nov. 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. 34 A. Olson and K. G. Shin. Fault-tolerant Clock Synchronization in Large Multlcomputer Systems. IEEE Transactions on Parallel and D~stmbuted Systems, 5(9) 912-923, Sept. 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. 35 J. K. Ousterhout. Scheduling Techniques for Concurrent Systems. In Th,rd International Conference on D~stmbuted Comput,ng Systems, pages 22-30, May 1982Google ScholarGoogle Scholar
  36. 36 V G Peris, M. S. Squillante, and V. K. Naik. Analysis of the Impact of Memory in Dmtributed Parallel Processing Sytems In 1994 A CM SIGMETRICS Conference on Measurement and Modehng of Computer Systems, pages 5-18, Feb 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. 37 K. C Sevclk Characterizations of Parailelmm m Applications and their Use in Scheduhng. in Proceedings of the 1989 A CM SIGMETRICS Conference on Measurement and Modehng of Computer Systems, pages 171-180, May 1989. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. 38 J P Singh, W.-D Weber, and A. Gupta. SPLASH: Stanford Parallel Applications for Shared Memory. Computer Architecture News, 20(1):5-44, Mar. 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. 39 P G Sobalvarro and W. E. Weihl Demand-based Coscheduling of Parallel Jobs on Multiprogrammed Multlprocessors. In Proceedzngs of the IPPS '95 Workshop on Job Scheduhng Strategzes for Parallel Processing, pages 63-75, Apr 1995 Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. 40 T yon Elcken, A Basu, V. Buch, and W. Vogels. U-Net' A User- Level Network Interface for Parallel and Distributed Computing. In Proceedings of the F,fteenth A CM Symposium on Operating System Pmnc~ples, pages 40-51, Dec 1995 Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. 41 T von E~cken, D. E. Culler, S C. Goldstem, and K E Schauser. Active Messages' a Mechamsm for Integrated Communication ~nd Comlou~ion. In Pr~vwd,r~g~ ef th~ 19th Ir,~vrnatlenal Symposium on Computer ArcMtecture, May 1992 Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. 42 R W. Wisniewski, L I. Kontothanassis, and M L Scott. High Performance Synchromzat~on Algomthms for Multiprogrammed Multiprocessors In Proceed,rigs of the Fifth A CM SIGPLAN Sympos,um on Principles and Practice of Parallel Programm~ng, pages 199-206, July 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. 43 J. Zahor9an and E. D. Lazowska. Spinning Versus Blocking m Parallel ~y~tems with Uncertain~y. In Proceedings of the IFIP International Seminar on Performance of D,stmbuted and Parallel Systems, pages 455-472, Dec. 1988.Google ScholarGoogle Scholar

Index Terms

  1. Effective distributed scheduling of parallel workloads

            Recommendations

            Comments

            Login options

            Check if you have access through your login credentials or your institution to get full access on this article.

            Sign in

            Full Access

            • Published in

              cover image ACM SIGMETRICS Performance Evaluation Review
              ACM SIGMETRICS Performance Evaluation Review  Volume 24, Issue 1
              May 1996
              273 pages
              ISSN:0163-5999
              DOI:10.1145/233008
              Issue’s Table of Contents
              • cover image ACM Conferences
                SIGMETRICS '96: Proceedings of the 1996 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
                May 1996
                279 pages
                ISBN:0897917936
                DOI:10.1145/233013

              Copyright © 1996 ACM

              Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

              Publisher

              Association for Computing Machinery

              New York, NY, United States

              Publication History

              • Published: 15 May 1996

              Check for updates

              Qualifiers

              • article

            PDF Format

            View or Download as a PDF file.

            PDF

            eReader

            View online with eReader.

            eReader