Skip to main content
Top
Published in: International Journal of Parallel Programming 2/2018

06-01-2017

Scheduling Parallel Computations by Work Stealing: A Survey

Authors: Jixiang Yang, Qingbi He

Published in: International Journal of Parallel Programming | Issue 2/2018

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Work stealing has been proven to be an efficient technique for scheduling parallel computations, and has been gaining popularity as the multiprocessor/multicore-processor load balancing technology of choice in both industry and academia. A review on the work stealing scheduling is provided from the perspective of scheduling algorithms, optimization of algorithm implementation and processor architecture oriented optimization. The future research trends and recommendations driven by theory, emerging applications and motifs, architecture and heterogeneous platforms are also provided.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Literature
1.
go back to reference Blumofe, R.D., Joerg, C.F., Kuszmaul, B.C., Leiserson, C.E., Randall, K.H., Zhou, Y.L.: Cilk: an efficient multithreaded runtime system. J. Parallel Distrib. Comput. 37(1), 55–69 (1996)CrossRef Blumofe, R.D., Joerg, C.F., Kuszmaul, B.C., Leiserson, C.E., Randall, K.H., Zhou, Y.L.: Cilk: an efficient multithreaded runtime system. J. Parallel Distrib. Comput. 37(1), 55–69 (1996)CrossRef
2.
go back to reference Frigo, M., Leiserson, C.E., Randall, K.H.: The implementation of the Cilk-5 multithreaded language. In: Proceedings of the ACM SIGPLAN 1998 Conference on Programming Language Design and Implementation (PLDI’98), pp. 212–223. ACM, New York, NY, USA, June 16–19 (1998) Frigo, M., Leiserson, C.E., Randall, K.H.: The implementation of the Cilk-5 multithreaded language. In: Proceedings of the ACM SIGPLAN 1998 Conference on Programming Language Design and Implementation (PLDI’98), pp. 212–223. ACM, New York, NY, USA, June 16–19 (1998)
4.
go back to reference Burton, F.W., Sleep, M.R.: Executing functional programs on a virtual tree of processors. In: Proceedings of the 1981 Conference on Functional Programming Languages and Computer Architecture (FPCA’81), pp. 187–194. ACM, New York, NY, USA, October 18–22 (1981) Burton, F.W., Sleep, M.R.: Executing functional programs on a virtual tree of processors. In: Proceedings of the 1981 Conference on Functional Programming Languages and Computer Architecture (FPCA’81), pp. 187–194. ACM, New York, NY, USA, October 18–22 (1981)
5.
go back to reference Halstead, R.H.: Implementation of multilisp: Lisp on a multiprocessor. In: Proceedings of the 1984 ACM Symposium on LISP and Functional Programming (LFP’84), pp. 9–17. ACM, New York, NY, USA, August 6–8 (1984) Halstead, R.H.: Implementation of multilisp: Lisp on a multiprocessor. In: Proceedings of the 1984 ACM Symposium on LISP and Functional Programming (LFP’84), pp. 9–17. ACM, New York, NY, USA, August 6–8 (1984)
6.
go back to reference Mohr, E., Kranz, D.A., Halstead, J.R.H.: Lazy task creation: a technique for increasing the granularity of parallel programs. In: Proceedings of the 1990 ACM Conference on LISP and Functional Programming (LFP’90), pp. 185–197. ACM, New York, NY, USA, June 27–29 (1990) Mohr, E., Kranz, D.A., Halstead, J.R.H.: Lazy task creation: a technique for increasing the granularity of parallel programs. In: Proceedings of the 1990 ACM Conference on LISP and Functional Programming (LFP’90), pp. 185–197. ACM, New York, NY, USA, June 27–29 (1990)
7.
go back to reference Vrba, Ž., Espeland, H., Halvorsen, P., Griwodz, C.: Limits of work-stealing scheduling. In: Proceedings of the 14th Workshop on Job Scheduling Strategies for Parallel Processing (JSSPP’09), pp. 280–299. Springer, Berlin, Germany, May 29 (2009) Vrba, Ž., Espeland, H., Halvorsen, P., Griwodz, C.: Limits of work-stealing scheduling. In: Proceedings of the 14th Workshop on Job Scheduling Strategies for Parallel Processing (JSSPP’09), pp. 280–299. Springer, Berlin, Germany, May 29 (2009)
8.
go back to reference Squillante, M.S., Nelson, R.D.: Analysis of task migration in shared-memory multiprocessor scheduling. In: Proceedings of the 1991 ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems (SIGMETRICS’91), New York, NY, USA, May 21–24 (1991) Squillante, M.S., Nelson, R.D.: Analysis of task migration in shared-memory multiprocessor scheduling. In: Proceedings of the 1991 ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems (SIGMETRICS’91), New York, NY, USA, May 21–24 (1991)
9.
go back to reference Mitzenmacher, M.: Analyses of load stealing models based on differential equations. In: Proceedings of the 10th Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA’98), pp. 212–221. ACM, New York, NY, USA, June 28–July 2 (1998) Mitzenmacher, M.: Analyses of load stealing models based on differential equations. In: Proceedings of the 10th Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA’98), pp. 212–221. ACM, New York, NY, USA, June 28–July 2 (1998)
10.
go back to reference Rudolph, L., Slivkin-Allalouf, M., Upfal, E.: A simple load balancing scheme for task allocation in parallel machines. In: Proceedings of the 3rd Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA’91), pp. 237–245. New York, NY, USA, July 21–24 (1991) Rudolph, L., Slivkin-Allalouf, M., Upfal, E.: A simple load balancing scheme for task allocation in parallel machines. In: Proceedings of the 3rd Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA’91), pp. 237–245. New York, NY, USA, July 21–24 (1991)
11.
go back to reference Eager, D.L., Lazowska, E.D., Zahorjan, J.: A comparison of receiver-initiated and sender-initiated adaptive load sharing. Perform. Eval. 6(1), 53–68 (1986)CrossRef Eager, D.L., Lazowska, E.D., Zahorjan, J.: A comparison of receiver-initiated and sender-initiated adaptive load sharing. Perform. Eval. 6(1), 53–68 (1986)CrossRef
12.
go back to reference Mirchandaney, R., Towsley, D., Stankovic, J.A.: Analysis of the effects of delays on load sharing. IEEE Trans. Comput. 38(11), 1513–1525 (1989)CrossRef Mirchandaney, R., Towsley, D., Stankovic, J.A.: Analysis of the effects of delays on load sharing. IEEE Trans. Comput. 38(11), 1513–1525 (1989)CrossRef
13.
go back to reference Squillante, M.S., Lazowska, E.D.: Using processor-cache affinity information in shared-memory multiprocessor scheduling. IEEE Trans. Parallel Distrib. Syst. 4(2), 131–143 (1993)CrossRef Squillante, M.S., Lazowska, E.D.: Using processor-cache affinity information in shared-memory multiprocessor scheduling. IEEE Trans. Parallel Distrib. Syst. 4(2), 131–143 (1993)CrossRef
14.
go back to reference Squillante, M.S., Xia, C.H., Yao, D.D., Zhang, L.: Threshold-based priority policies for parallel-server systems with affinity scheduling. In: Proceedings of the 2001 American Control Conference (ACC’01), pp. 2992–2999. IEEE, New York, NY, USA, June 25–27 (2001) Squillante, M.S., Xia, C.H., Yao, D.D., Zhang, L.: Threshold-based priority policies for parallel-server systems with affinity scheduling. In: Proceedings of the 2001 American Control Conference (ACC’01), pp. 2992–2999. IEEE, New York, NY, USA, June 25–27 (2001)
15.
16.
go back to reference Narang, A., Shyamasundar, R.K.: Performance driven distributed scheduling of parallel hybrid computations. Theor. Comput. Sci. 412(32), 4212–4225 (2011)MathSciNetCrossRefMATH Narang, A., Shyamasundar, R.K.: Performance driven distributed scheduling of parallel hybrid computations. Theor. Comput. Sci. 412(32), 4212–4225 (2011)MathSciNetCrossRefMATH
17.
go back to reference Suksompong, W., Leiserson, C.E., Schardl, T.B.: On the efficiency of localized work stealing. Inf. Process. Lett. 116(2), 100–106 (2016)MathSciNetCrossRefMATH Suksompong, W., Leiserson, C.E., Schardl, T.B.: On the efficiency of localized work stealing. Inf. Process. Lett. 116(2), 100–106 (2016)MathSciNetCrossRefMATH
18.
go back to reference Mirchandaney, R., Towsley, D., Stankovic, J.A.: Adaptive load sharing in heterogeneous distributed systems. J. Parallel Distrib. Comput. 9(4), 331–346 (1990)CrossRef Mirchandaney, R., Towsley, D., Stankovic, J.A.: Adaptive load sharing in heterogeneous distributed systems. J. Parallel Distrib. Comput. 9(4), 331–346 (1990)CrossRef
19.
go back to reference Bender, M.A., Rabin, M.O.: Online scheduling of parallel programs on heterogeneous systems with applications to Cilk. Theory Comput. Syst. 35(3), 289 (2002)MathSciNetCrossRefMATH Bender, M.A., Rabin, M.O.: Online scheduling of parallel programs on heterogeneous systems with applications to Cilk. Theory Comput. Syst. 35(3), 289 (2002)MathSciNetCrossRefMATH
20.
go back to reference Gast, N., Bruno, G.: A mean field model of work stealing in large-scale systems. In: Proceedings of the ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS’10), pp. 13–24. ACM, New York, NY, USA, June 14–18 (2010) Gast, N., Bruno, G.: A mean field model of work stealing in large-scale systems. In: Proceedings of the ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS’10), pp. 13–24. ACM, New York, NY, USA, June 14–18 (2010)
21.
go back to reference Blelloch, G.E., Gibbons, P.B., Matias, Y.: Provably efficient scheduling for languages with fine-grained parallelism. J. ACM 46(2), 281–321 (1999)MathSciNetCrossRefMATH Blelloch, G.E., Gibbons, P.B., Matias, Y.: Provably efficient scheduling for languages with fine-grained parallelism. J. ACM 46(2), 281–321 (1999)MathSciNetCrossRefMATH
22.
go back to reference Fatourou, P., Spirakis, P.: Efficient scheduling of strict multithreaded computations. Theory Comput. Syst. 33(3), 173–232 (2000)MathSciNetCrossRefMATH Fatourou, P., Spirakis, P.: Efficient scheduling of strict multithreaded computations. Theory Comput. Syst. 33(3), 173–232 (2000)MathSciNetCrossRefMATH
23.
go back to reference Berenbrink, P., Friedetzky, T., Goldberg, L.A.: The natural work-stealing algorithm is stable. SIAM J. Comput. 32(5), 1260–1279 (2003)MathSciNetCrossRefMATH Berenbrink, P., Friedetzky, T., Goldberg, L.A.: The natural work-stealing algorithm is stable. SIAM J. Comput. 32(5), 1260–1279 (2003)MathSciNetCrossRefMATH
24.
go back to reference Arora, N.S., Blumofe, R.D., Plaxton, C.G.: Thread scheduling for multiprogrammed multiprocessors. Theory Comput. Syst. 34(2), 115–144 (2001)MathSciNetCrossRefMATH Arora, N.S., Blumofe, R.D., Plaxton, C.G.: Thread scheduling for multiprogrammed multiprocessors. Theory Comput. Syst. 34(2), 115–144 (2001)MathSciNetCrossRefMATH
26.
go back to reference Tchiboukdjian, M., Gast, N., Trystram, D., Roch, J., Bernard, J.: A tighter analysis of work stealing. In: Proceedings of the 21st International Symposium on Algorithms and Computation (ISAAC’10), pp. 291–302. Springer, Berlin, Germany, December 15–17 (2010) Tchiboukdjian, M., Gast, N., Trystram, D., Roch, J., Bernard, J.: A tighter analysis of work stealing. In: Proceedings of the 21st International Symposium on Algorithms and Computation (ISAAC’10), pp. 291–302. Springer, Berlin, Germany, December 15–17 (2010)
27.
go back to reference Agrawal, K., Leiserson, C.E., Sukha, J.: Helper locks for fork-join parallel programming. In: Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP’10), pp. 245–256. ACM, New York, NY, USA, January 9–14 (2010) Agrawal, K., Leiserson, C.E., Sukha, J.: Helper locks for fork-join parallel programming. In: Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP’10), pp. 245–256. ACM, New York, NY, USA, January 9–14 (2010)
28.
go back to reference Cole, R., Ramachandran, V.: Analysis of randomized work stealing with false sharing. In: Proceedings of the IEEE 27th International Symposium on Parallel and Distributed Processing (IPDPS’13), pp. 985–998. IEEE, Los Alamitos, CA, USA, May 20–24 (2013) Cole, R., Ramachandran, V.: Analysis of randomized work stealing with false sharing. In: Proceedings of the IEEE 27th International Symposium on Parallel and Distributed Processing (IPDPS’13), pp. 985–998. IEEE, Los Alamitos, CA, USA, May 20–24 (2013)
29.
go back to reference Agrawal, K., Fineman, J.T., Sheridan, B., Sukha, J., Utterback, R.: Provably good scheduling for parallel programs that use data structures through implicit batching. In: Proceedings of the 26th ACM Symposium on Parallelism in Algorithms and Architecture (SPAA’14), pp. 84–95. ACM, New York, NY, USA, June 23–25 (2014) Agrawal, K., Fineman, J.T., Sheridan, B., Sukha, J., Utterback, R.: Provably good scheduling for parallel programs that use data structures through implicit batching. In: Proceedings of the 26th ACM Symposium on Parallelism in Algorithms and Architecture (SPAA’14), pp. 84–95. ACM, New York, NY, USA, June 23–25 (2014)
30.
go back to reference Sanchez, D., Yoo, R.M., Kozyrakis, C.: Flexible architectural support for fine-grain scheduling. In: Proceedings of the 15th Edition of ASPLOS on Architectural Support for Programming Languages and Operating Systems (ASPLOS’10), pp. 311–322. ACM, New York, NY, USA, March 13–17 (2010) Sanchez, D., Yoo, R.M., Kozyrakis, C.: Flexible architectural support for fine-grain scheduling. In: Proceedings of the 15th Edition of ASPLOS on Architectural Support for Programming Languages and Operating Systems (ASPLOS’10), pp. 311–322. ACM, New York, NY, USA, March 13–17 (2010)
31.
go back to reference Kulkarni, M., Carribault, P., Pingali, K., Ramanarayanan, G., Walter, B., Bala, K., Chew, L.P.: Scheduling strategies for optimistic parallel execution of irregular programs. In: Proceedings of the 20th Annual Symposium on Parallelism in Algorithms and Architectures (SPAA’08), pp. 217–228. ACM, New York, NY, USA, June 14–16 (2008) Kulkarni, M., Carribault, P., Pingali, K., Ramanarayanan, G., Walter, B., Bala, K., Chew, L.P.: Scheduling strategies for optimistic parallel execution of irregular programs. In: Proceedings of the 20th Annual Symposium on Parallelism in Algorithms and Architectures (SPAA’08), pp. 217–228. ACM, New York, NY, USA, June 14–16 (2008)
32.
go back to reference Hill, M.D., Marty, M.R.: Amdahl’s law in the multicore era. Computer 41(7), 33–38 (2008)CrossRef Hill, M.D., Marty, M.R.: Amdahl’s law in the multicore era. Computer 41(7), 33–38 (2008)CrossRef
33.
go back to reference Chen, S., Gibbons, P.B., Kozuch, M., Liaskovitis, V., Ailamaki, A., Blelloch, G.E., Falsafi, B., Fix, L., Hardavellas, N., Mowry, T.C., Wilkerson, C.: Scheduling threads for constructive cache sharing on CMPs. In: Proceedings of the 19th Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA’07), pp. 105–115. ACM, New York, NY, USA, June 9–11 (2007) Chen, S., Gibbons, P.B., Kozuch, M., Liaskovitis, V., Ailamaki, A., Blelloch, G.E., Falsafi, B., Fix, L., Hardavellas, N., Mowry, T.C., Wilkerson, C.: Scheduling threads for constructive cache sharing on CMPs. In: Proceedings of the 19th Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA’07), pp. 105–115. ACM, New York, NY, USA, June 9–11 (2007)
34.
go back to reference Faxén, K.F.: Efficient work stealing for fine grained parallelism. In: Proceedings of the 39th International Conference on Parallel Processing (ICPP’10), pp. 313–322. IEEE, Los Alamitos, CA, USA, September 13–16 (2010) Faxén, K.F.: Efficient work stealing for fine grained parallelism. In: Proceedings of the 39th International Conference on Parallel Processing (ICPP’10), pp. 313–322. IEEE, Los Alamitos, CA, USA, September 13–16 (2010)
35.
go back to reference Mohr, E., Kranz, D.A., Halstead Jr., R.H.: Lazy task creation: a technique for increasing the granularity of parallel programs. IEEE Trans. Parallel Distrib. Syst. 2(3), 264–280 (1991)CrossRef Mohr, E., Kranz, D.A., Halstead Jr., R.H.: Lazy task creation: a technique for increasing the granularity of parallel programs. IEEE Trans. Parallel Distrib. Syst. 2(3), 264–280 (1991)CrossRef
36.
go back to reference Loidl, H.W., Hammond, K.: On the granularity of divide-and-conquer parallelism. In: Proceedings of the 1995 Glasgow Workshop on Functional Programming (FP’95), pp. 8–10. Springer, Berlin, Germany, July 10–12 (1995) Loidl, H.W., Hammond, K.: On the granularity of divide-and-conquer parallelism. In: Proceedings of the 1995 Glasgow Workshop on Functional Programming (FP’95), pp. 8–10. Springer, Berlin, Germany, July 10–12 (1995)
37.
go back to reference Duran, A., Corbalán, J., Ayguadé, E.: Evaluation of OpenMP task scheduling strategies. In: Proceedings of the 4th International Workshop on OpenMP in New Era of Parallelism (IWOMP’08), pp. 100–110. Springer, Berlin, Germany, May 12–14 (2008) Duran, A., Corbalán, J., Ayguadé, E.: Evaluation of OpenMP task scheduling strategies. In: Proceedings of the 4th International Workshop on OpenMP in New Era of Parallelism (IWOMP’08), pp. 100–110. Springer, Berlin, Germany, May 12–14 (2008)
38.
go back to reference Cong, G., Kodali, S., Krishnamoorthy, S., Lea, D., Saraswat, V., Wen, T.: Solving large, irregular graph problems using adaptive work-stealing. In: Proceedings of the 37th International Conference on Parallel Processing (ICPP’08), pp. 536–545. IEEE, Washington, DC, USA, September 9–12 (2008) Cong, G., Kodali, S., Krishnamoorthy, S., Lea, D., Saraswat, V., Wen, T.: Solving large, irregular graph problems using adaptive work-stealing. In: Proceedings of the 37th International Conference on Parallel Processing (ICPP’08), pp. 536–545. IEEE, Washington, DC, USA, September 9–12 (2008)
39.
go back to reference Duran, A., Corbalán, J., Ayguadé, E.: An adaptive cut-off for task parallelism. In: Proceedings of the 2008 ACM/IEEE Conference on Supercomputing (SC’08), pp. 339–349. IEEE, Piscataway, NJ, USA, November 15–21 (2008) Duran, A., Corbalán, J., Ayguadé, E.: An adaptive cut-off for task parallelism. In: Proceedings of the 2008 ACM/IEEE Conference on Supercomputing (SC’08), pp. 339–349. IEEE, Piscataway, NJ, USA, November 15–21 (2008)
40.
go back to reference Acar, U.A., Charguéraud, A., Rainey, M.: Oracle scheduling: controlling granularity in implicitly parallel languages. In: Proceedings of the 2011 ACM International Conference on Object Oriented Programming Systems Languages and Ap-plications (OOPSLA’11), pp. 499–518. ACM, New York, NY, USA, October 22–27 (2011) Acar, U.A., Charguéraud, A., Rainey, M.: Oracle scheduling: controlling granularity in implicitly parallel languages. In: Proceedings of the 2011 ACM International Conference on Object Oriented Programming Systems Languages and Ap-plications (OOPSLA’11), pp. 499–518. ACM, New York, NY, USA, October 22–27 (2011)
41.
go back to reference Wang, L., Cui, H., Duan, Y., Lu, F., Feng, X., Yew, P.: An adaptive task creation strategy for work-stealing scheduling. In: Proceedings of the 8th Annual IEEE/ACM International Symposium on Code Generation and Optimization (CGO’10). ACM, New York, NY, USA, April 24–28 (2010) Wang, L., Cui, H., Duan, Y., Lu, F., Feng, X., Yew, P.: An adaptive task creation strategy for work-stealing scheduling. In: Proceedings of the 8th Annual IEEE/ACM International Symposium on Code Generation and Optimization (CGO’10). ACM, New York, NY, USA, April 24–28 (2010)
42.
go back to reference Tzannes, A., Caragea, G.C., Barua, R., Vishkin, U.: Lazy binary-splitting: a run-time adaptive work-stealing scheduler. In: Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP’10), pp. 179–190. ACM, New York, NY, USA, January 9–14 (2010) Tzannes, A., Caragea, G.C., Barua, R., Vishkin, U.: Lazy binary-splitting: a run-time adaptive work-stealing scheduler. In: Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP’10), pp. 179–190. ACM, New York, NY, USA, January 9–14 (2010)
43.
go back to reference Cao, Q., Hu, C.J., Li, S.G., He, H.H.: Adaptive task granularity strategy for OpenMP3.0 task model on cell architecture. In: Proceedings of International Conferences on High Performance Networking, Computing, Communication Systems, and Mathematical Foundations (ICHCC 2011), pp. 393–400. Springer, Berlin, Germany, May 5–6 (2011) Cao, Q., Hu, C.J., Li, S.G., He, H.H.: Adaptive task granularity strategy for OpenMP3.0 task model on cell architecture. In: Proceedings of International Conferences on High Performance Networking, Computing, Communication Systems, and Mathematical Foundations (ICHCC 2011), pp. 393–400. Springer, Berlin, Germany, May 5–6 (2011)
44.
go back to reference Hoffmann, R., Rauber, T.: Fine-grained task scheduling using adaptive data structures. In: Proceedings of the 14th International European Conference on Parallel Processing (Euro-Par’08), pp. 253–262. Springer, Berlin, Germany, August 26–29 (2008) Hoffmann, R., Rauber, T.: Fine-grained task scheduling using adaptive data structures. In: Proceedings of the 14th International European Conference on Parallel Processing (Euro-Par’08), pp. 253–262. Springer, Berlin, Germany, August 26–29 (2008)
45.
go back to reference Lee, I.A., Boyd-Wickizer, S., Huang, Z., Leiserson, C.E.: Using memory mapping to support cactus stacks in work-stealing runtime systems. In: Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques (PACT’10), pp. 411–420. ACM, New York, NY, USA, September 11–15 (2010) Lee, I.A., Boyd-Wickizer, S., Huang, Z., Leiserson, C.E.: Using memory mapping to support cactus stacks in work-stealing runtime systems. In: Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques (PACT’10), pp. 411–420. ACM, New York, NY, USA, September 11–15 (2010)
46.
go back to reference Zhao, J.S., Shirako, J., Nandivada, V.K., Sarkar, V.: Reducing task creation and termination overhead in explicitly parallel programs. In: Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques (PACT’10), pp. 169–180. ACM, New York, NY, USA, September 11–15 (2010) Zhao, J.S., Shirako, J., Nandivada, V.K., Sarkar, V.: Reducing task creation and termination overhead in explicitly parallel programs. In: Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques (PACT’10), pp. 169–180. ACM, New York, NY, USA, September 11–15 (2010)
47.
go back to reference Robison, A., Voss, M., Kukanov, A.: Optimization via reflection on work stealing in TBB. In: Proceedings of the 22nd IEEE International Parallel and Distributed Processing Symposium (IPDPS’08), pp. 1–8. IEEE, New York, NY, USA, April 14–18 (2008) Robison, A., Voss, M., Kukanov, A.: Optimization via reflection on work stealing in TBB. In: Proceedings of the 22nd IEEE International Parallel and Distributed Processing Symposium (IPDPS’08), pp. 1–8. IEEE, New York, NY, USA, April 14–18 (2008)
48.
go back to reference Chen, Q., Huang, Z., Guo, M., Zhou, J.: CAB: cache aware bi-tier task-stealing in multi-socket multi-core architecture. In: Proceedings of the 2011 International Conference on Parallel Processing (ICPP’11), pp. 722–732. IEEE, New York, NY, USA, September 13–16 (2011) Chen, Q., Huang, Z., Guo, M., Zhou, J.: CAB: cache aware bi-tier task-stealing in multi-socket multi-core architecture. In: Proceedings of the 2011 International Conference on Parallel Processing (ICPP’11), pp. 722–732. IEEE, New York, NY, USA, September 13–16 (2011)
49.
go back to reference Chen, Q., Guo, M.Y., Huang, Z.Y.: Adaptive cache aware bitier work-stealing in multisocket multicore architectures. IEEE Trans. Parallel Distrib. Syst. 24(12), 2334–2343 (2013)CrossRef Chen, Q., Guo, M.Y., Huang, Z.Y.: Adaptive cache aware bitier work-stealing in multisocket multicore architectures. IEEE Trans. Parallel Distrib. Syst. 24(12), 2334–2343 (2013)CrossRef
50.
go back to reference Olivier, S.L., Porterfield, A.K., Wheeler, K.B., Prins, J.F.: Scheduling task parallelism on multi-socket multicore systems. In: Proceedings of the 1st International Workshop on Runtime and Operating Systems for Supercomputers (ROSS’11), pp. 49–56. ACM, New York, NY, USA, May 31 (2011) Olivier, S.L., Porterfield, A.K., Wheeler, K.B., Prins, J.F.: Scheduling task parallelism on multi-socket multicore systems. In: Proceedings of the 1st International Workshop on Runtime and Operating Systems for Supercomputers (ROSS’11), pp. 49–56. ACM, New York, NY, USA, May 31 (2011)
51.
go back to reference Olivier, S.L., Porterfield, A.K., Wheeler, K.B., Spiegel, M., Prins, J.F.: OpenMP task scheduling strategies for multicore NUMA systems. Int. J. High Perform. Comput. Appl. 26(2), 110–124 (2012)CrossRef Olivier, S.L., Porterfield, A.K., Wheeler, K.B., Spiegel, M., Prins, J.F.: OpenMP task scheduling strategies for multicore NUMA systems. Int. J. High Perform. Comput. Appl. 26(2), 110–124 (2012)CrossRef
52.
go back to reference Chen, Q., Guo, M.: Locality-aware work stealing based on online profiling and auto-tuning for multisocket multicore architectures. ACM Trans. Arch. Code Optim. 12(222), 1–24 (2015) Chen, Q., Guo, M.: Locality-aware work stealing based on online profiling and auto-tuning for multisocket multicore architectures. ACM Trans. Arch. Code Optim. 12(222), 1–24 (2015)
53.
go back to reference Tzannes, A., Caragea, G.C., Barua, R., Vishkin, U.: Lazy binary-splitting: a run-time adaptive work-stealing scheduler. In: Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP’10), pp. 179–189. ACM, New York, NY, USA, January 9–14 (2010) Tzannes, A., Caragea, G.C., Barua, R., Vishkin, U.: Lazy binary-splitting: a run-time adaptive work-stealing scheduler. In: Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP’10), pp. 179–189. ACM, New York, NY, USA, January 9–14 (2010)
54.
go back to reference Chase, D., Lev, Y.: Dynamic circular work-stealing deque. In: Proceedings of the 17th Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA’05), pp. 21–28. ACM, New York, NY, USA, July 18–20 (2005) Chase, D., Lev, Y.: Dynamic circular work-stealing deque. In: Proceedings of the 17th Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA’05), pp. 21–28. ACM, New York, NY, USA, July 18–20 (2005)
55.
go back to reference Lê, N.M., Pop, A., Cohen, A., Nardelli, F.Z.: Correct and efficient work-stealing for weak memory models. In: Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP’13), pp. 69–79. ACM, New York, NY, USA, February 23–27 (2013) Lê, N.M., Pop, A., Cohen, A., Nardelli, F.Z.: Correct and efficient work-stealing for weak memory models. In: Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP’13), pp. 69–79. ACM, New York, NY, USA, February 23–27 (2013)
56.
go back to reference Traoré, D., Roch, J., Maillard, N., Gautier, T., Bernard, J.: Deque-free work-optimal parallel STL algorithms. In: Proceedings of the 14th International European Conference on Parallel Processing (Euro-Par’08), pp. 887–897. Springer, Berlin, Germany, August 26–29 (2008) Traoré, D., Roch, J., Maillard, N., Gautier, T., Bernard, J.: Deque-free work-optimal parallel STL algorithms. In: Proceedings of the 14th International European Conference on Parallel Processing (Euro-Par’08), pp. 887–897. Springer, Berlin, Germany, August 26–29 (2008)
57.
go back to reference Dinan, J., Larkins, D.B., Sadayappan, P., Krishnamoorthy, S., Nieplocha, J.: Scalable work stealing. In: Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis (SC’09), pp. 1–11. ACM, New York, NY, USA, November 14–20 (2009) Dinan, J., Larkins, D.B., Sadayappan, P., Krishnamoorthy, S., Nieplocha, J.: Scalable work stealing. In: Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis (SC’09), pp. 1–11. ACM, New York, NY, USA, November 14–20 (2009)
58.
go back to reference Michael, M.M., Vechev, M.T., Saraswat, V.A.: Idempotent work stealing. In: Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP’09), pp. 45–53. ACM, New York, NY, USA, February 14–18 (2009) Michael, M.M., Vechev, M.T., Saraswat, V.A.: Idempotent work stealing. In: Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP’09), pp. 45–53. ACM, New York, NY, USA, February 14–18 (2009)
59.
go back to reference Kumar, S., Hughes, C.J., Nguyen, A.: Carbon: architectural support for fine-grained parallelism on Chip multiprocessors. In: Proceedings of the 34th Annual International Symposium on Computer Architecture, Conference (ISCA’07), pp. 162–173. ACM, New York, NY, USA, June 9–13 (2007) Kumar, S., Hughes, C.J., Nguyen, A.: Carbon: architectural support for fine-grained parallelism on Chip multiprocessors. In: Proceedings of the 34th Annual International Symposium on Computer Architecture, Conference (ISCA’07), pp. 162–173. ACM, New York, NY, USA, June 9–13 (2007)
60.
go back to reference Quintin, J., Wagner, F.: Hierarchical work-stealing. In: Proceedings of the 16th International European Conference on Parallel Processing (Euro-Par’10). Springer, Berlin, Germany, August 31–September 3 (2010) Quintin, J., Wagner, F.: Hierarchical work-stealing. In: Proceedings of the 16th International European Conference on Parallel Processing (Euro-Par’10). Springer, Berlin, Germany, August 31–September 3 (2010)
61.
go back to reference Wang, Y., Zhang, Y., Su, Y., Wang, X., Chen, X., Ji, W., Shi, F.: An adaptive and hierarchical task scheduling scheme for multi-core clusters. Parallel Comput. 40(10), 611–627 (2014)CrossRef Wang, Y., Zhang, Y., Su, Y., Wang, X., Chen, X., Ji, W., Shi, F.: An adaptive and hierarchical task scheduling scheme for multi-core clusters. Parallel Comput. 40(10), 611–627 (2014)CrossRef
62.
go back to reference Tsai, Y.C., Huang, Y.C.: A generalized framework for parallelizing traffic sign inventory of video log images using multicore processors. Comput. Aided Civ. Infrastruct. Eng. 27(7SI), 476–493 (2012)CrossRef Tsai, Y.C., Huang, Y.C.: A generalized framework for parallelizing traffic sign inventory of video log images using multicore processors. Comput. Aided Civ. Infrastruct. Eng. 27(7SI), 476–493 (2012)CrossRef
63.
go back to reference van Dijk, T., van de Pol, J.C.: Lace: non-blocking split deque for work-stealing. In: Proceedings of the 20th International European Conference on Parallel Processing (Euro-Par’14), pp. 206–217. Springer, Berlin, Germany, August 25–26 (2014) van Dijk, T., van de Pol, J.C.: Lace: non-blocking split deque for work-stealing. In: Proceedings of the 20th International European Conference on Parallel Processing (Euro-Par’14), pp. 206–217. Springer, Berlin, Germany, August 25–26 (2014)
64.
go back to reference Hendler, D., Lev, Y., Shavit, N.: Dynamic memory ABP work-stealing. In: Proceedings of the 18th International Conference on Distributed Computing (DISC 2004), pp. 188–200. Springer, Berlin, Germany, October 4–8 (2004) Hendler, D., Lev, Y., Shavit, N.: Dynamic memory ABP work-stealing. In: Proceedings of the 18th International Conference on Distributed Computing (DISC 2004), pp. 188–200. Springer, Berlin, Germany, October 4–8 (2004)
65.
go back to reference Guo, Y., Zhao, J.Z., Cave, V., Sarkar, V.: SLAW: A scalable locality-aware adaptive work-stealing scheduler for multi-core systems. In: Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP’10), pp. 341–342. ACM, New York, NY, USA, January 09–14 (2010) Guo, Y., Zhao, J.Z., Cave, V., Sarkar, V.: SLAW: A scalable locality-aware adaptive work-stealing scheduler for multi-core systems. In: Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP’10), pp. 341–342. ACM, New York, NY, USA, January 09–14 (2010)
66.
go back to reference Guo, Y., Barik, R., Raman, R., Sarkar, V.: Work-first and help-first scheduling policies for async-finish task parallelism. In: Proceedings of the 23rd IEEE International Parallel and Distributed Processing Symposium (IPDPS’09), pp. 1–12. IEEE, New York, NY, USA, May 23–29 (2009) Guo, Y., Barik, R., Raman, R., Sarkar, V.: Work-first and help-first scheduling policies for async-finish task parallelism. In: Proceedings of the 23rd IEEE International Parallel and Distributed Processing Symposium (IPDPS’09), pp. 1–12. IEEE, New York, NY, USA, May 23–29 (2009)
67.
go back to reference Paudel, J., Tardieu, O., Amaral, J.N.: On the merits of distributed work-stealing on selective locality-aware tasks. In: Proceedings of the 2013 42nd International Conference on Parallel Processing (ICPP’13), pp. 100–109. IEEE, New York, NY, USA, October 1–4 (2013) Paudel, J., Tardieu, O., Amaral, J.N.: On the merits of distributed work-stealing on selective locality-aware tasks. In: Proceedings of the 2013 42nd International Conference on Parallel Processing (ICPP’13), pp. 100–109. IEEE, New York, NY, USA, October 1–4 (2013)
68.
go back to reference Cao, Y., Sun, H.Y., Qian, D.P., Wu, W.G.: Stable adaptive work-stealing for concurrent multicore runtime systems. In: Proceedings of the 2011 IEEE International Conference on High Performance Computing and Communications (HPCC’11). IEEE, New York, NY, USA, September 2–4 (2011) Cao, Y., Sun, H.Y., Qian, D.P., Wu, W.G.: Stable adaptive work-stealing for concurrent multicore runtime systems. In: Proceedings of the 2011 IEEE International Conference on High Performance Computing and Communications (HPCC’11). IEEE, New York, NY, USA, September 2–4 (2011)
69.
go back to reference Adnan, Sato, M.: Efficient work-stealing strategies for fine-grain task parallelism. In: Proceedings of the 2011 IEEE International Parallel and Distributed Processing Symposium (IPDPS’11). IEEE, Los Alamitos, CA, USA, May 16–22 (2011) Adnan, Sato, M.: Efficient work-stealing strategies for fine-grain task parallelism. In: Proceedings of the 2011 IEEE International Parallel and Distributed Processing Symposium (IPDPS’11). IEEE, Los Alamitos, CA, USA, May 16–22 (2011)
70.
go back to reference Acar, U.A., Chargueraud, A., Rainey, M.: Scheduling parallel programs by work stealing with private deques. In: Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP’13), pp. 219–228. ACM, New York, NY, USA, February 23–27 (2013) Acar, U.A., Chargueraud, A., Rainey, M.: Scheduling parallel programs by work stealing with private deques. In: Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP’13), pp. 219–228. ACM, New York, NY, USA, February 23–27 (2013)
71.
go back to reference Herlihy, M., Liu, Z.: Well-structured futures and cache locality. In: Proceedings of the 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP’14), pp. 155–166. ACM, New York, NY, USA, February 15–19 (2014) Herlihy, M., Liu, Z.: Well-structured futures and cache locality. In: Proceedings of the 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP’14), pp. 155–166. ACM, New York, NY, USA, February 15–19 (2014)
72.
go back to reference Fohry, C., Bungart, M., Posner, J.: Fault tolerance schemes for global load balancing in X10. Scalable Comput. Pract. Exp. 16(2SI), 169–185 (2015) Fohry, C., Bungart, M., Posner, J.: Fault tolerance schemes for global load balancing in X10. Scalable Comput. Pract. Exp. 16(2SI), 169–185 (2015)
73.
go back to reference Tardieu, O., Wang, H., Lin, H.: A work-stealing scheduler for X10’s task parallelism with suspension. In: Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP’12), pp. 267–276. ACM, New York, NY, USA, February 25–29 (2012) Tardieu, O., Wang, H., Lin, H.: A work-stealing scheduler for X10’s task parallelism with suspension. In: Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP’12), pp. 267–276. ACM, New York, NY, USA, February 25–29 (2012)
74.
go back to reference Tardieu, O., Herta, B., Cunningham, D., Grove, D., Kambadur, P., Saraswat, V., Shinnar, A., Takeuchi, M., Vaziri, M., Zhang, W.: X10 and APGAS at Petascale. ACM Trans. Parallel Comput. 2(4), 1–32 (2015)CrossRef Tardieu, O., Herta, B., Cunningham, D., Grove, D., Kambadur, P., Saraswat, V., Shinnar, A., Takeuchi, M., Vaziri, M., Zhang, W.: X10 and APGAS at Petascale. ACM Trans. Parallel Comput. 2(4), 1–32 (2015)CrossRef
75.
go back to reference Agarwal, S., Barik, R., Bonachea, D., Sarkar, V., Shymasundar, R.K., Yelick, K.: Deadlock-free scheduling of X10 computations with bounded resources. In: Proceedings of the 19th Annual Symposium on Parallelism in Algorithms and Architectures (SPAA’07), pp. 229–240. ACM, New York, NY, USA, June 9–11 (2007) Agarwal, S., Barik, R., Bonachea, D., Sarkar, V., Shymasundar, R.K., Yelick, K.: Deadlock-free scheduling of X10 computations with bounded resources. In: Proceedings of the 19th Annual Symposium on Parallelism in Algorithms and Architectures (SPAA’07), pp. 229–240. ACM, New York, NY, USA, June 9–11 (2007)
76.
go back to reference Pezzi, G.P., Cera, M.C., Mathias, E., Maillard, N., Navaux, P.O.A.: Online scheduling of MPI-2 programs with hierarchical work stealing. In: Proceedings of the 19th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD’07), pp. 247–254. IEEE, Los Alamitos, CA, USA, October 24–27 (2007) Pezzi, G.P., Cera, M.C., Mathias, E., Maillard, N., Navaux, P.O.A.: Online scheduling of MPI-2 programs with hierarchical work stealing. In: Proceedings of the 19th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD’07), pp. 247–254. IEEE, Los Alamitos, CA, USA, October 24–27 (2007)
77.
go back to reference Saraswat, V.A., Kambadur, P., Kodali, S., Grove, D., Krishnamoorthy, S.: Lifeline-based global load balancing. In: Proceedings of the 16th ACM Symposium on Principles and Practice of Parallel Programming (PPoPP’11). ACM, New York, NY, USA, February 12–16 (2011) Saraswat, V.A., Kambadur, P., Kodali, S., Grove, D., Krishnamoorthy, S.: Lifeline-based global load balancing. In: Proceedings of the 16th ACM Symposium on Principles and Practice of Parallel Programming (PPoPP’11). ACM, New York, NY, USA, February 12–16 (2011)
78.
go back to reference Ravichandran, K., Sangho, L., Pande, S.: Work stealing for multi-core HPC clusters. In: Proceedings of the 17th International European Conference on Parallel Processing (Euro-Par’11), pp. 205–217. Springer, Berlin, Germany, August 29–September 2 (2011) Ravichandran, K., Sangho, L., Pande, S.: Work stealing for multi-core HPC clusters. In: Proceedings of the 17th International European Conference on Parallel Processing (Euro-Par’11), pp. 205–217. Springer, Berlin, Germany, August 29–September 2 (2011)
79.
go back to reference Cao, Y.J., Qian, D.P., Wu, W.G., Dong, X.S.: Adaptive scheduling algorithm based on dynamic core-resource partitions for manycore processor systems. Ruanjian Xuebao J. Softw. 23(02), 240–252 (2012)CrossRef Cao, Y.J., Qian, D.P., Wu, W.G., Dong, X.S.: Adaptive scheduling algorithm based on dynamic core-resource partitions for manycore processor systems. Ruanjian Xuebao J. Softw. 23(02), 240–252 (2012)CrossRef
80.
go back to reference Li, Z., Certner, O., Duato, J., Temam, O.: Scalable hardware support for conditional parallelization. In: Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques (PACT’10), pp. 157–168. ACM, New York, NY, USA, September 11–15 (2010) Li, Z., Certner, O., Duato, J., Temam, O.: Scalable hardware support for conditional parallelization. In: Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques (PACT’10), pp. 157–168. ACM, New York, NY, USA, September 11–15 (2010)
81.
go back to reference Long, G.P., Zhang, J.C., Fan, D.R.: Architectural support and evaluation of Cilk language on many-core architectures. Chin. J. Comput. 31(11), 1975–1985 (2008)CrossRef Long, G.P., Zhang, J.C., Fan, D.R.: Architectural support and evaluation of Cilk language on many-core architectures. Chin. J. Comput. 31(11), 1975–1985 (2008)CrossRef
82.
go back to reference Asanovic, K., Bodik, R., Demmel, J., Keaveny, T., Keutzer, K., Kubiatowicz, J.D., Lee, E.A., Morgan, N., Necula, G., Patterson, D.A., Sen, K., Wawrzynek, J., Wessel, D., Yelick, K.A.: The parallel computing laboratory at U.C. Berkeley: a research agenda based on the Berkeley view, Technical Report No. UCB/EECS-2008-23, Electrical Engineering and Computer Sciences, University of California at Berkeley, Berkeley (2008) Asanovic, K., Bodik, R., Demmel, J., Keaveny, T., Keutzer, K., Kubiatowicz, J.D., Lee, E.A., Morgan, N., Necula, G., Patterson, D.A., Sen, K., Wawrzynek, J., Wessel, D., Yelick, K.A.: The parallel computing laboratory at U.C. Berkeley: a research agenda based on the Berkeley view, Technical Report No. UCB/EECS-2008-23, Electrical Engineering and Computer Sciences, University of California at Berkeley, Berkeley (2008)
Metadata
Title
Scheduling Parallel Computations by Work Stealing: A Survey
Authors
Jixiang Yang
Qingbi He
Publication date
06-01-2017
Publisher
Springer US
Published in
International Journal of Parallel Programming / Issue 2/2018
Print ISSN: 0885-7458
Electronic ISSN: 1573-7640
DOI
https://doi.org/10.1007/s10766-016-0484-8

Other articles of this Issue 2/2018

International Journal of Parallel Programming 2/2018 Go to the issue

Premium Partner