skip to main content
10.1145/2567634.2567639acmconferencesArticle/Chapter ViewAbstractPublication PagesppoppConference Proceedingsconference-collections
research-article

GLB: lifeline-based global load balancing library in x10

Authors Info & Claims
Published:16 February 2014Publication History

ABSTRACT

We present GLB, a programming model and an associated implementation that can handle a wide range of irregular parallel programming problems running over large-scale distributed systems. GLB is applicable both to problems that are easily load-balanced via static scheduling and to problems that are hard to statically load balance. GLB hides the intricate synchronizations (e.g., inter-node communication, initialization and startup, load balancing, termination and result collection) from the users. GLB internally uses a version of the lifeline graph based work-stealing algorithm proposed by Saraswat et al [25]. Users of GLB are simply required to write several pieces of sequential code that comply with the GLB interface. GLB then schedules and orchestrates the parallel execution of the code correctly and efficiently at scale.

We have applied GLB to two representative benchmarks: Betweenness Centrality (BC) and Unbalanced Tree Search (UTS). Among them, BC can be statically load-balanced whereas UTS cannot. In either case, GLB scales well -- achieving nearly linear speedup on different computer architectures (Power, Blue Gene/Q, and K) -- up to 16K cores.

References

  1. D. A. Bader, J. Feo, J. Gilbert, J. Kepner, D. Koester, E. Loh, K. Madduri, B. Mann, and T. Meuse. HPCS Scalable Synthetic Compact Applications#2: Graph Analysis. http://www.graphanalysis.org/benchmark/HPCS-SSCA2_Graph-Theory_v2.2.pdf%, 2007.Google ScholarGoogle Scholar
  2. J. E. Baldeschwieler, R. D. Blumofe, and E. A. Brewer. Atlas: an infrastructure for global computing. In EW 7: Proceedings of the 7th workshop on ACM SIGOPS European workshop, pages 165--172, New York, NY, USA, 1996. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. R. Batoukov and T. Sorevik. A Generic Parallel Branch and Bound Environment on a Network of Workstations. In HiPer '99: Proceedings of High Performance Computing on Hewlett-Packard Systems, pages 474--483, 1999.Google ScholarGoogle Scholar
  4. R. D. Blumofe and C. E. Leiserson. Scheduling Multithreaded Computations by Work Stealing. In Proceedings of the 35th Annual Symposium on Foundations of Computer Science (FOCS), pages 356--368, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. R. D. Blumofe and P. A. Lisiecki. Adaptive and reliable parallel computing on networks of workstations. In ATEC '97: Proceedings of the annual conference on USENIX Annual Technical Conference, pages 10--10, Berkeley, CA, USA, 1997. USENIX Association. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. G. Cong, S. Kodali, S. Krishnamoorthy, D. Lea, V. Saraswat, and T. Wen. Solving Large, Irregular Graph Problems Using Adaptive Work-Stealing. In ICPP '08: Proceedings of the 2008 37th International Conference on Parallel Processing, pages 536--545, Washington, DC, USA, 2008. IEEE Computer Society. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. J. Dean and S. Ghemawat. Mapreduce: Simplified data processing on large clusters. OSDI'04, pages 10--10, Berkeley, CA, USA, 2004. USENIX Association. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. J. Dinan, D. B. Larkins, P. Sadayappan, S. Krishnamoorthy, and J. Nieplocha. Scalable work stealing. In SC '09, pages 1--11, New York, NY, USA, 2009. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. M. Frigo, C. E. Leiserson, and K. H. Randall. The implementation of the Cilk-5 multithreaded language. In PLDI'98, pages 212--223, Montreal, Quebec, Canada, June 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. A. Grama and V. Kumar. State of the Art in Parallel Search Techniques for Discrete Optimization Problems. IEEE Trans. on Knowl. and Data Eng., 11(1):28--35, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Y. Guo, R. Barik, R. Raman, and V. Sarkar. Work-First and Help-First Scheduling Policies for Async-Finish Task Parallelism. In Proceedings of the 23rd IEEE International Parallel and Distributed Processing Symposium, May 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. L. V. Kalé and S. Krishnan. CHARM++: A portable concurrent object oriented system based on C++. In OOPSLA'93, volume 28, pages 91--108, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. V. Kumar, A. Y. Grama, and N. R. Vempaty. Scalable load balancing techniques for parallel computers. J. Parallel Distrib. Comput., 22(1):60--79, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Y. Kwon, M. Balazinska, B. Howe, and J. Rolia. Skewtune: Mitigating skew in mapreduce applications. SIGMOD '12, pages 25--36, New York, NY, USA, 2012. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. D. Lea. A Java fork/join framework. In Proceedings of the ACM 2000 conference on Java Grande, JAVA '00, pages 36--43, New York, NY, USA, 2000. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. J. Mellor-Crummey, L. Adhianto, G. Jin, M. Krentel, K. Murthy, W. Scherer, and C. Yang. Class II Submission to the HPC Challenge Award Competition Coarray Fortran 2.0, Nov. 2011.Google ScholarGoogle Scholar
  17. K. Morton, A. Friesen, M. Balazinska, and D. Grossman. Estimating the progress of mapreduce pipelines. In ICDE'10, pages 681--684. IEEE, 2010.Google ScholarGoogle ScholarCross RefCross Ref
  18. S. Olivier and J. Prins. Scalable dynamic load balancing using UPC. In ICPP '08: Proceedings of the 2008 37th International Conference on Parallel Processing, pages 123--131, Washington, DC, USA, 2008. IEEE Computer Society. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. OpenMP Architecture Review Board. OpenMP Application Program Interface, v3.0. May 2008.Google ScholarGoogle Scholar
  20. J. Prins, J. Huan, B. Pugh, C.-W. Tseng, and P. Sadayappan. UPC Implementation of an Unbalanced Tree Search Benchmark. Technical Report 03-034, University of North Carolina at Chapel Hill, October 2003.Google ScholarGoogle Scholar
  21. J. Reinders. Intel Threading Building Blocks. O'Reilly, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. V. Saraswat, G. Almasi, G. Bikshandi, C. Cascaval, D. Cunningham, D. Grove, S. Kodali, I. Peshansky, and O. Tardieu. The Asynchronous Partitioned Global Address Space Model. In AMP'10: Proceedings of The First Workshop on Advances in Message Passing, June 2010.Google ScholarGoogle Scholar
  23. V. Saraswat, B. Bloom, I. Peshansky, O. Tardieu, and D. Grove. The X10 reference manual, v2.4. Sept. 2013.Google ScholarGoogle Scholar
  24. V. Saraswat and R. Jagadeesan. Concurrent clustered programming. In Concur'05, pages 353--367, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. V. A. Saraswat, P. Kambadur, S. Kodali, D. Grove, and S. Krishnamoorthy. Lifeline-based global load balancing. PPoPP '11, pages 201--212, New York, NY, USA, 2011. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. A. B. Sinha and L. V. Kalé. A load balancing strategy for prioritized execution of tasks. In IIPS'93: Proceedings of International Parallel Processing Symposium, pages 230--237, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. O. Tardieu, B. Herta, D. Cunningham, D. Grove, P. Kambadur, V. A. Saraswat, A. Shinnar, M. Takeuchi, and M. Vaziri. X10 and APGAS at Petascale. In Proceedings of the 19th ACM Symposium on Principles and Practice of Parallel Programming, PPoPP '14. ACM, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. O. Tardieu, H. Wang, and H. Lin. A work-stealing scheduler for X10's task parallelism with suspension. In Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming, PPoPP '12, pages 267--276, New York, NY, USA, 2012. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. R. V. van Nieuwpoort, T. Kielmann, and H. E. Bal. Efficient load balancing for wide-area divide-and-conquer applications. In PPoPP'01, pages 34--43, New York, NY, USA, 2001. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. C. Yang, K. Murthy, and J. Mellor-Crummey. Managing asynchronous operations in coarray fortran 2.0. In IPDPS'13, pages 1321--1332, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. GLB: lifeline-based global load balancing library in x10

          Recommendations

          Comments

          Login options

          Check if you have access through your login credentials or your institution to get full access on this article.

          Sign in
          • Published in

            cover image ACM Conferences
            PPAA '14: Proceedings of the first workshop on Parallel programming for analytics applications
            February 2014
            72 pages
            ISBN:9781450326544
            DOI:10.1145/2567634

            Copyright © 2014 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 16 February 2014

            Permissions

            Request permissions about this article.

            Request Permissions

            Check for updates

            Qualifiers

            • research-article

            Acceptance Rates

            PPAA '14 Paper Acceptance Rate6of7submissions,86%Overall Acceptance Rate6of7submissions,86%

          PDF Format

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader