ABSTRACT
Although the InfiniBand Architecture is relatively new to the high performance computing area, it offers many features that can improve the performance of communication subsystems. One of these features is Remote Direct Memory Access (RDMA). In this paper, we propose a new design of MPI over InfiniBand that brings the benefit of RDMA not only to large messages, but also to small and control messages. We also achieve better scalability by exploiting application communication patterns and combining send/receive operations with RDMA operations. Our RDMA-based MPI implementation currently delivers a latency of 6.8 microseconds for small messages and a peak bandwidth of 871 million bytes (831 megabytes) per second. Performance evaluation at the MPI level shows that, for small messages, our RDMA-based design reduces latency by 24%, increases bandwidth by over 104%, and reduces host overhead by up to 22%. For large messages, we improve performance by reducing the time spent transferring control messages. We also show that the new design benefits MPI collective communication and the NAS Parallel Benchmarks.
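The small-message path summarized above can be pictured with a short sketch. The C fragment below is written against the modern OpenFabrics verbs API (libibverbs) rather than the VAPI interface available when the paper was written, and all names (remote_buf, post_small_rdma_write, wait_for_small_message) and the exact buffer layout are illustrative assumptions, not the paper's actual data structures. It shows the general idea of delivering a small or control message as a one-sided RDMA write into a pre-registered, pre-advertised receive buffer, with the receiver detecting arrival by polling a flag byte in host memory instead of waiting on a receive completion.

#include <infiniband/verbs.h>
#include <stdint.h>
#include <string.h>

/* Remote buffer address and key advertised by the receiver at connection
   setup time (hypothetical struct, for this sketch only). */
struct remote_buf {
    uint64_t addr;   /* virtual address of the receive buffer on the remote node */
    uint32_t rkey;   /* remote key granting RDMA access to that buffer */
};

/* Sender: deliver payload_len bytes plus a trailing flag byte into the
   receiver's pre-registered buffer with a single one-sided RDMA write. */
int post_small_rdma_write(struct ibv_qp *qp, struct ibv_mr *mr,
                          char *local_buf, size_t payload_len,
                          const struct remote_buf *remote)
{
    struct ibv_sge sge;
    struct ibv_send_wr wr;
    struct ibv_send_wr *bad_wr = NULL;

    local_buf[payload_len] = 1;              /* arrival flag placed after the payload */

    memset(&sge, 0, sizeof(sge));
    sge.addr   = (uintptr_t)local_buf;
    sge.length = (uint32_t)(payload_len + 1);
    sge.lkey   = mr->lkey;

    memset(&wr, 0, sizeof(wr));
    wr.wr_id      = 1;
    wr.sg_list    = &sge;
    wr.num_sge    = 1;
    wr.opcode     = IBV_WR_RDMA_WRITE;       /* one-sided: no receive descriptor is consumed */
    wr.send_flags = IBV_SEND_SIGNALED;
    wr.wr.rdma.remote_addr = remote->addr;
    wr.wr.rdma.rkey        = remote->rkey;

    return ibv_post_send(qp, &wr, &bad_wr);
}

/* Receiver: detect the message by polling the flag byte in memory rather
   than polling a completion queue entry for every incoming small message.
   This relies on the adapter writing the payload in increasing address
   order, so the flag byte lands last. */
void wait_for_small_message(volatile char *recv_buf, size_t payload_len)
{
    while (recv_buf[payload_len] == 0)
        ;                                    /* spin until the RDMA write arrives */
}

The point of the sketch is the protocol shape rather than the details: because the write is one-sided, no receive descriptor has to be posted or matched on the receiver for each small or control message, which is where the latency and host-overhead reductions reported above come from.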