
Towards millions of communicating threads

Published: 25 September 2016

ABSTRACT

In this paper we explore the advantages that accrue from avoiding the use of wildcards in MPI. We show that, with this change, one can efficiently support millions of concurrently communicating lightweight threads using send-receive communication.
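The enabling observation is that once receives never use MPI_ANY_SOURCE or MPI_ANY_TAG, every receive names exactly one fully specified (source, tag) channel, so the MPI library can match incoming messages against small independent queues rather than scanning a single shared list. The following minimal C sketch illustrates that communication pattern only; it is not the authors' implementation, and the tag-encodes-thread-id scheme and the tiny thread count are illustrative assumptions.

/* Minimal sketch, not the paper's implementation: each logical
 * thread owns a fully specified (source, tag) channel, with the
 * tag encoding the logical-thread id (an illustrative assumption).
 * Avoiding MPI_ANY_SOURCE / MPI_ANY_TAG means each receive can be
 * matched against its own queue rather than one shared list. */
#include <mpi.h>
#include <stdio.h>

#define NUM_LOGICAL_THREADS 4   /* stand-in for "millions" */

int main(int argc, char **argv)
{
    int provided, rank, size;
    /* THREAD_MULTIPLE: in real use the sends/receives below would be
     * issued concurrently from lightweight user-level threads. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size >= 2 && rank == 0) {
        for (int t = 0; t < NUM_LOGICAL_THREADS; t++) {
            int payload = t;
            /* Destination rank and tag are fully specified. */
            MPI_Send(&payload, 1, MPI_INT, 1, t, MPI_COMM_WORLD);
        }
    } else if (size >= 2 && rank == 1) {
        for (int t = 0; t < NUM_LOGICAL_THREADS; t++) {
            int payload;
            /* No wildcards: source 0 and tag t name one channel. */
            MPI_Recv(&payload, 1, MPI_INT, 0, t, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("logical thread %d received %d\n", t, payload);
        }
    }

    MPI_Finalize();
    return 0;
}

Because each (source, tag) pair here is unique, these receives could be posted concurrently from lightweight threads without any matching ambiguity, which is what makes per-channel matching structures possible.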


Published in

EuroMPI '16: Proceedings of the 23rd European MPI Users' Group Meeting
September 2016, 225 pages
ISBN: 978-1-4503-4234-6
DOI: 10.1145/2966884
Copyright © 2016 ACM

Publisher: Association for Computing Machinery, New York, NY, United States



Acceptance Rates

Overall acceptance rate: 66 of 139 submissions, 47%
