ABSTRACT
This paper explores the advantages that accrue from avoiding the use of wildcards in MPI. We show that, with this restriction, send-receive communication can efficiently support millions of concurrently communicating lightweight threads.
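The intuition behind the claim can be sketched as follows (this is an illustrative model, not the paper's actual implementation): when no receive uses a wildcard such as MPI_ANY_SOURCE or MPI_ANY_TAG, every receive names an exact (source, tag) key, so posted receives and unexpected messages can be kept in per-key FIFO queues and matched in constant time, instead of the linear scan that wildcard matching forces. A minimal Python sketch of such a matching structure (all class and method names are hypothetical):

```python
from collections import defaultdict, deque

class MatchEngine:
    """Illustrative message-matching structure: with no wildcard receives,
    each receive names an exact (source, tag) key, so unexpected messages
    and posted receives live in per-key FIFO queues (O(1) matching)."""

    def __init__(self):
        self.unexpected = defaultdict(deque)  # (source, tag) -> queued payloads
        self.posted = defaultdict(deque)      # (source, tag) -> waiting callbacks

    def arrive(self, source, tag, payload):
        """An incoming message: deliver to a waiting receive, else queue it."""
        key = (source, tag)
        if self.posted[key]:
            self.posted[key].popleft()(payload)
        else:
            self.unexpected[key].append(payload)

    def recv(self, source, tag, callback):
        """A posted receive: consume a queued message, else wait for one."""
        key = (source, tag)
        if self.unexpected[key]:
            callback(self.unexpected[key].popleft())
        else:
            self.posted[key].append(callback)


if __name__ == "__main__":
    eng = MatchEngine()
    out = []
    eng.arrive(0, 7, "early")        # message arrives before its receive
    eng.recv(0, 7, out.append)       # matched immediately from the queue
    eng.recv(1, 3, out.append)       # receive posted before its message
    eng.arrive(1, 3, "late")         # matched against the waiting receive
    print(out)                       # ['early', 'late']
```

Because each key's queue is independent, per-thread communication state needs no global scan or lock over all pending operations, which is what makes scaling to very large thread counts plausible.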