ABSTRACT
Scheduling is the assignment of tasks or activities to processors for execution, and it is an important concern in parallel programming. Most prior work on scheduling has focused either on static scheduling of applications in which the dependence graph is known at compile-time or on dynamic scheduling of independent loop iterations such as in OpenMP.
In irregular algorithms, dependences between activities are complex functions of runtime values so these algorithms are not amenable to compile-time analysis nor do they consist of independent activities. Moreover, the amount of work can vary dramatically with the scheduling policy. To handle these complexities, implementations of irregular algorithms employ carefully handcrafted, algorithm-specific schedulers but these schedulers are themselves parallel programs, complicating the parallel programming problem further.
In this paper, we present a flexible and efficient approach for specifying and synthesizing scheduling policies for irregular algorithms. We develop a simple compositional specification language and show how it can concisely encode scheduling policies in the literature. Then, we show how to synthesize efficient parallel schedulers from these specifications. We evaluate our approach for five irregular algorithms on three multicore architectures and show that (1) the performance of some algorithms can improve by orders of magnitude with the right scheduling policy, and (2) for the same policy, the overheads of our synthesized schedulers are comparable to those of fixed-function schedulers.
- N. Amenta, S. Choi, and G. Rote. Incremental constructions con brio. In Proc. Symp. on Computational Geometry (SCG), pages 211--219, 2003. Google ScholarDigital Library
- L. O. Andersen. Program Analysis and Specialization for the C Programming Language. PhD thesis, DIKU, University of Copenhagen, May 1994. (DIKU report 94/19).Google Scholar
- D. Bader and V. Sachdeva. A cache-aware parallel implementation of the push-relabel network flow algorithm and experimental evaluation of the gap relabeling heuristic. In Proc. 18th ISCA International Conference on Parallel and Distributed Computing Systems (PDCS), September 2005.Google Scholar
- R. D. Blumofe, C. F. Joerg, B. C. Kuszmaul, C. E. Leiserson, K. H. Randall, and Y. Zhou. Cilk: an efficient multithreaded runtime system. SIGPLAN Not., 30(8):207--216, 1995. Google ScholarDigital Library
- A. Bowyer. Computing Dirichlet tessellations. The Computer Journal, 24(2):162--166, 1981.Google ScholarCross Ref
- B. V. Cherkassy and A. V. Goldberg. On implementing push-relabel method for the maximum flow problem. In Proc. Intl. Conf. on Integer Programming and Combinatorial Optimization (IPCO), pages 157--171, London, UK, 1995. Google ScholarDigital Library
- L. P. Chew. Guaranteed-quality mesh generation for curved surfaces. In Proc. Symp. on Computational Geometry (SCG), 1993. Google ScholarDigital Library
- K. L. Clarkson and P. W. Shor. Applications of random sampling in computational geometry, ii. Discrete Comput. Geom., 4(5):287--421, 1989.Google ScholarDigital Library
- T. Cormen, C. Leiserson, R. Rivest, and C. Stein, editors. Introduction to Algorithms. MIT Press, 2001. Google ScholarDigital Library
- L. Dagum and R. Menon. OpenMP: An industry-standard API for shared-memory programming. IEEE Comput. Sci. Eng., 5(1):46--55, 1998. Google ScholarDigital Library
- A. Duran, J. Corbalán, and E. Ayguadé. Evaluation of OpenMP task scheduling strategies. In Proc. Conf. on OpenMP in a new era of parallelism (IWOMP), pages 100--110, 2008. Google ScholarDigital Library
- A. V. Goldberg and R. E. Tarjan. A new approach to the maximum-flow problem. J. ACM, 35(4):921--940, 1988. Google ScholarDigital Library
- R. H. Halstead, Jr. MULTILISP: a language for concurrent symbolic computation. ACM Trans. Program. Lang. Syst., 7:501--538, October 1985. Google ScholarDigital Library
- B. Hardekopf and C. Lin. The ant and the grasshopper: fast and accurate pointer analysis for millions of lines of code. In Proc. Conf. on Programming Language Design and Implementation (PLDI), 2007. Google ScholarDigital Library
- M. Herlihy and J. E. B. Moss. Transactional memory: architectural support for lock-free data structures. In Proc. of International Symposium on Computer Architecture (ISCA), 1993. Google ScholarDigital Library
- M. P. Herlihy and J. M. Wing. Linearizability: a correctness condition for concurrent objects. ACM Trans. Program. Lang. Syst., 12(3):463--492, 1990. Google ScholarDigital Library
- Intel Corporation. Intel threading building blocks 3.0. http://threadingbuildingblocks.org.Google Scholar
- M. Kulkarni, P. Carribault, K. Pingali, G. Ramanarayanan, B. Walter, K. Bala, and L. P. Chew. Scheduling strategies for optimistic parallel execution of irregular programs. In Proc. Symp. on Parallelism in algorithms and architectures (SPAA), pages 217--228, New York, NY, USA, 2008. Google ScholarDigital Library
- M. Kulkarni, K. Pingali, B. Walter, G. Ramanarayanan, K. Bala, and L. P. Chew. Optimistic parallelism requires abstractions. SIGPLAN Not. (Proceedings of PLDI), 42(6):211--222, 2007. Google ScholarDigital Library
- M. Kulkarni, D. Prountzos, D. Nguyen, and K. Pingali. Defining and implementing commutativity conditions for parallel execution. regular rech report TR-ECE-09-11, School of Electrial and Computer Engineering, Purdue University, August 2009.Google Scholar
- D. Lea. A Java fork/join framework. In Proc. Conf. on Java Grande, pages 36--43, 2000. Google ScholarDigital Library
- D. Leijen, W. Schulte, and S. Burckhardt. The design of a task parallel library. In Proc. Conf. on object-oriented programming, systems, languages, and applications (OOPSLA), pages 227--242, 2009. Google ScholarDigital Library
- K. Madduri, D. Bader, J. Berry, and J. Crobak. Parallel shortest path algorithms for solving large-scale instances. Technical Report GT-CSE-06-19, Georgia Institute of Technology, September 2006.Google Scholar
- M. Méndez-Lojo, A. Mathew, and K. Pingali. Parallel inclusion-based points-to analysis. In Proc. Intl. Conference on Systems, Programming, Languages and Applications: Software for Humanity (SPLASH), 2010. Google ScholarDigital Library
- U. Meyer and P. Sanders. Delta-stepping: A parallel single source shortest path algorithm. In Proc. European Symp. on Algorithms (ESA), pages 393--404, 1998. Google ScholarDigital Library
- M. M. Michael, M. T. Vechev, and V. A. Saraswat. Idempotent work stealing. In Proc. Symp. on Principles and practice of parallel programming (PPoPP), pages 45--54, 2009. Google ScholarDigital Library
- G. L. Miller. A time efficient Delaunay refinement algorithm. In Proc. Symposium on Discrete Algorithms (SODA), pages 400--409, New Orleans, 2004. Google ScholarDigital Library
- F. Nielson, H. R. Nielson, and C. Hankin. Principles of Program Analysis. Springer-Verlag New York, Inc., Secaucus, NJ, USA, 1999. Google ScholarDigital Library
- D. J. Pearce, P. H. J. Kelly, and C. L. Hankin. Online cycle detection and difference propagation for pointer analysis. In Intl. IEEE Workshop on Source Code Analysis and Manipulation (SCAM), pages 3--12, 2003.Google ScholarCross Ref
- K. Pingali, M. Kulkarni, D. Nguyen, M. Burtscher, M. Mendez-Lojo, D. Prountzos, X. Sui, and Z. Zhong. Amorphous data-parallelism in irregular algorithms. regular tech report TR-09-05, The University of Texas at Austin, 2009.Google Scholar
- J. Ruppert. A Delaunay refinement algorithm for quality 2-dimensional mesh generation. J. Algorithms, 18(3):548--585, 1995. Google ScholarDigital Library
- J. R. Shewchuk. Triangle: Engineering a 2d quality mesh generator and Delaunay triangulator. In Applied Computational Geometry: Towards Geometric Engineering, volume 1148 of Lecture Notes in Computer Science, pages 203--222. Springer-Verlag, 1996. Google ScholarDigital Library
- A. Solar-Lezama, C. G. Jones, and R. Bodik. Sketching concurrent data structures. In Proc. Conf. on Programming Language Design and Implementation (PLDI), pages 136--148, New York, NY, USA, 2008. Google ScholarDigital Library
- M. T. Vechev, E. Yahav, D. F. Bacon, and N. Rinetzky. CGCExplorer: a semi-automated search procedure for provably correct concurrent collectors. In Proc. Conf. on Programming Language Design and Implementation (PLDI), pages 456--467, New York, NY, USA, 2007. Google ScholarDigital Library
- D. F. Watson. Computing the n-dimensional tessellation with application to Voronoi polytopes. The Computer Journal, 24(2):167--172, 1981.Google ScholarCross Ref
- J. Wu, R. Das, J. Saltz, H. Berryman, and S. Hiranandani. Distributed memory compiler design for sparse problems. IEEE Transactions on Computers, 44, 1995. Google ScholarDigital Library
Index Terms
- Synthesizing concurrent schedulers for irregular algorithms
Recommendations
Synthesizing concurrent schedulers for irregular algorithms
ASPLOS '11Scheduling is the assignment of tasks or activities to processors for execution, and it is an important concern in parallel programming. Most prior work on scheduling has focused either on static scheduling of applications in which the dependence graph ...
Synthesizing concurrent schedulers for irregular algorithms
ASPLOS '11Scheduling is the assignment of tasks or activities to processors for execution, and it is an important concern in parallel programming. Most prior work on scheduling has focused either on static scheduling of applications in which the dependence graph ...
Scheduling of deteriorating jobs with release dates to minimize the maximum lateness
In this paper, we consider the problem of scheduling n deteriorating jobs with release dates on a single (batching) machine. Each job's processing time is a simple linear function of its starting time. The objective is to minimize the maximum lateness. ...
Comments