MagPIe: MPI's Collective Communication Operations for Clustered Wide Area Systems

ABSTRACT
Writing parallel applications for computational grids is a challenging task. To achieve good performance, algorithms designed for local area networks must be adapted to the large differences in link speeds. An important class of algorithms is collective operations, such as broadcast and reduce. We have developed MagPIe, a library of collective communication operations optimized for wide area systems. MagPIe's algorithms send the minimal amount of data over the slow wide area links, and incur only a single wide area latency. Using our system, existing MPI applications can be run unmodified on geographically distributed systems. On moderate cluster sizes, using a wide area latency of 10 milliseconds and a bandwidth of 1 MByte/s, MagPIe executes operations up to 10 times faster than MPICH, a widely used MPI implementation; application kernels improve by up to a factor of 4. Due to the structure of our algorithms, MagPIe's advantage increases for higher wide area latencies.
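The key idea behind wide-area-optimal collectives can be illustrated with a broadcast: cross each slow wide-area link exactly once by sending one message to a coordinator process per remote cluster, then fan out over the fast local network. The sketch below is illustrative only, not MagPIe's actual implementation; the process and cluster names, and the convention that the first listed process acts as coordinator, are assumptions for the example.

```python
# Illustrative sketch of a cluster-aware (two-level) broadcast schedule.
# Not MagPIe's actual code: the data structures and coordinator choice
# (first process in each cluster) are assumptions made for this example.

def wide_area_broadcast(root, clusters):
    """Return (wan_msgs, lan_msgs) for a two-level broadcast.

    `clusters` maps a cluster name to its list of processes; `root` is
    the originating process. Each remote cluster receives exactly one
    wide-area message (to its coordinator), so every slow link is
    crossed once and only a single wide area latency is incurred; all
    remaining traffic stays on the fast local links.
    """
    root_cluster = next(c for c, procs in clusters.items() if root in procs)
    wan_msgs, lan_msgs = [], []
    for cluster, procs in clusters.items():
        if cluster == root_cluster:
            coordinator = root              # root serves its own cluster
        else:
            coordinator = procs[0]          # one WAN hop per remote cluster
            wan_msgs.append((root, coordinator))
        for p in procs:
            if p != coordinator:
                lan_msgs.append((coordinator, p))  # cheap local fan-out
    return wan_msgs, lan_msgs


# Example: 3 clusters, root a0. Only 2 WAN messages are sent (one per
# remote cluster), regardless of how many processes each cluster holds.
wan, lan = wide_area_broadcast(
    "a0", {"A": ["a0", "a1", "a2"], "B": ["b0", "b1"], "C": ["c0"]})
print(wan)  # [('a0', 'b0'), ('a0', 'c0')]
print(lan)  # [('a0', 'a1'), ('a0', 'a2'), ('b0', 'b1')]
```

A flat tree over the wide area is latency-optimal here because the wide area latency dominates; a LAN-style binomial tree would instead cross slow links O(log p) times.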
Index Terms
- MagPIe: MPI's collective communication operations for clustered wide area systems