MagPIe: MPI's Collective Communication Operations for Clustered Wide Area Systems

ABSTRACT
Writing parallel applications for computational grids is a challenging task. To achieve good performance, algorithms designed for local area networks must be adapted to the large differences in link speeds. An important class of algorithms is collective operations, such as broadcast and reduce. We have developed MagPIe, a library of collective communication operations optimized for wide area systems. MagPIe's algorithms send the minimal amount of data over the slow wide area links, and incur only a single wide area latency. Using our system, existing MPI applications can be run unmodified on geographically distributed systems. On moderate cluster sizes, using a wide area latency of 10 milliseconds and a bandwidth of 1 MByte/s, MagPIe executes operations up to 10 times faster than MPICH, a widely used MPI implementation; application kernels improve by up to a factor of 4. Due to the structure of our algorithms, MagPIe's advantage increases for higher wide area latencies.
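The key idea behind wide-area-optimal collectives can be illustrated with a broadcast: cross each slow wide-area link exactly once by sending one message to a coordinator process per remote cluster, then fan out over the fast local network. The sketch below is illustrative only, not MagPIe's actual implementation; the process and cluster names, and the convention that the first listed process acts as coordinator, are assumptions for the example.

```python
# Illustrative sketch of a cluster-aware (two-level) broadcast schedule.
# Not MagPIe's actual code: the data structures and coordinator choice
# (first process in each cluster) are assumptions made for this example.

def wide_area_broadcast(root, clusters):
    """Return (wan_msgs, lan_msgs) for a two-level broadcast.

    `clusters` maps a cluster name to its list of processes; `root` is
    the originating process. Each remote cluster receives exactly one
    wide-area message (to its coordinator), so every slow link is
    crossed once and only a single wide area latency is incurred; all
    remaining traffic stays on the fast local links.
    """
    root_cluster = next(c for c, procs in clusters.items() if root in procs)
    wan_msgs, lan_msgs = [], []
    for cluster, procs in clusters.items():
        if cluster == root_cluster:
            coordinator = root              # root serves its own cluster
        else:
            coordinator = procs[0]          # one WAN hop per remote cluster
            wan_msgs.append((root, coordinator))
        for p in procs:
            if p != coordinator:
                lan_msgs.append((coordinator, p))  # cheap local fan-out
    return wan_msgs, lan_msgs


# Example: 3 clusters, root a0. Only 2 WAN messages are sent (one per
# remote cluster), regardless of how many processes each cluster holds.
wan, lan = wide_area_broadcast(
    "a0", {"A": ["a0", "a1", "a2"], "B": ["b0", "b1"], "C": ["c0"]})
print(wan)  # [('a0', 'b0'), ('a0', 'c0')]
print(lan)  # [('a0', 'a1'), ('a0', 'a2'), ('b0', 'b1')]
```

A flat tree over the wide area is latency-optimal here because the wide area latency dominates; a LAN-style binomial tree would instead cross slow links O(log p) times.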
Index Terms
- MagPIe: MPI's collective communication operations for clustered wide area systems