ABSTRACT
This paper explores the advantages that accrue from avoiding the use of wildcards in MPI. We show that, with this restriction, send-receive communication can efficiently support millions of concurrently communicating lightweight threads.
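The intuition behind the claim can be sketched as follows (this is an illustrative model, not the paper's actual implementation): when no receive uses a wildcard such as MPI_ANY_SOURCE or MPI_ANY_TAG, every receive names an exact (source, tag) key, so posted receives and unexpected messages can be kept in per-key FIFO queues and matched in constant time, instead of the linear scan that wildcard matching forces. A minimal Python sketch of such a matching structure (all class and method names are hypothetical):

```python
from collections import defaultdict, deque

class MatchEngine:
    """Illustrative message-matching structure: with no wildcard receives,
    each receive names an exact (source, tag) key, so unexpected messages
    and posted receives live in per-key FIFO queues (O(1) matching)."""

    def __init__(self):
        self.unexpected = defaultdict(deque)  # (source, tag) -> queued payloads
        self.posted = defaultdict(deque)      # (source, tag) -> waiting callbacks

    def arrive(self, source, tag, payload):
        """An incoming message: deliver to a waiting receive, else queue it."""
        key = (source, tag)
        if self.posted[key]:
            self.posted[key].popleft()(payload)
        else:
            self.unexpected[key].append(payload)

    def recv(self, source, tag, callback):
        """A posted receive: consume a queued message, else wait for one."""
        key = (source, tag)
        if self.unexpected[key]:
            callback(self.unexpected[key].popleft())
        else:
            self.posted[key].append(callback)


if __name__ == "__main__":
    eng = MatchEngine()
    out = []
    eng.arrive(0, 7, "early")        # message arrives before its receive
    eng.recv(0, 7, out.append)       # matched immediately from the queue
    eng.recv(1, 3, out.append)       # receive posted before its message
    eng.arrive(1, 3, "late")         # matched against the waiting receive
    print(out)                       # ['early', 'late']
```

Because each key's queue is independent, per-thread communication state needs no global scan or lock over all pending operations, which is what makes scaling to very large thread counts plausible.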