DOI: 10.1145/2063384.2063486

Avoiding hot-spots on two-level direct networks

Published: 12 November 2011

ABSTRACT

A low-diameter, fast interconnection network will be a prerequisite for building exascale machines. A two-level direct network has been proposed by several groups as a scalable design for future machines; IBM's PERCS topology and the dragonfly network discussed in the DARPA exascale hardware study are examples of this design. The presence of multiple levels in this design leads to hot-spots on a few links when processes are grouped together at the lowest level to minimize total communication volume. This is especially true for communication graphs with a small number of neighbors per task. Routing and mapping choices can affect the communication performance of parallel applications running on a machine with a two-level direct topology. This paper explores intelligent, topology-aware mappings of different communication patterns to the physical topology to identify cases that minimize link utilization. We also analyze the trade-offs between direct and indirect routing under different mappings. Since there are no installations of two-level direct networks yet, we use simulations to study the communication and overall performance of applications. This study raises interesting issues regarding the choice of job scheduling, routing, and mapping for future machines.
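To make the hot-spot effect concrete, the following is a minimal illustrative sketch, not the simulator used in the paper. It assumes a toy two-level direct network with G fully connected groups and a single global link between every pair of groups, maps a 2D 4-point stencil onto it with a blocked mapping, and counts messages per global link under direct routing versus indirect (Valiant-style) routing through a random intermediate group. All names, sizes, and parameters below are hypothetical choices for illustration.

# Illustrative sketch (assumptions: G groups of N tasks, one global link per
# group pair, 2D 4-point stencil, blocked task-to-group mapping).
import random
from collections import Counter

G, N = 8, 32                       # groups and tasks per group (illustrative)
P = G * N                          # 256 tasks laid out on a 16 x 16 grid
DIM = 16

def group_of(task):
    # Blocked mapping: consecutive ranks are placed in the same group.
    return task // N

def stencil_neighbors(t):
    # 4-point (up/down/left/right) neighbors on a non-periodic 2D grid.
    r, c = divmod(t, DIM)
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < DIM and 0 <= nc < DIM:
            yield nr * DIM + nc

def link(a, b):
    # Undirected global link between two groups.
    return (min(a, b), max(a, b))

def global_link_loads(indirect, seed=0):
    rng = random.Random(seed)
    loads = Counter()
    for src in range(P):
        for dst in stencil_neighbors(src):
            gs, gd = group_of(src), group_of(dst)
            if gs == gd:
                continue                      # local traffic, no global link used
            if not indirect:
                loads[link(gs, gd)] += 1      # direct route: one global hop
            else:
                gi = rng.randrange(G)         # random intermediate group
                if gi in (gs, gd):
                    loads[link(gs, gd)] += 1
                else:
                    loads[link(gs, gi)] += 1
                    loads[link(gi, gd)] += 1
    return loads

for name in ("direct", "indirect"):
    loads = global_link_loads(indirect=(name == "indirect"))
    print(f"{name:8s}: max link load = {max(loads.values()):3d}, "
          f"global links used = {len(loads)} of {G * (G - 1) // 2}")

Under the blocked mapping, direct routing concentrates all of the stencil's inter-group traffic on the few links between adjacent groups while most global links sit idle, whereas indirect routing spreads roughly the same volume over many more links at the cost of an extra hop. This is the kind of routing/mapping trade-off the paper analyzes.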


Published in

SC '11: Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
November 2011, 866 pages
ISBN: 9781450307710
DOI: 10.1145/2063384
Copyright © 2011 ACM


Publisher: Association for Computing Machinery, New York, NY, United States

Acceptance Rates

SC '11 paper acceptance rate: 74 of 352 submissions (21%). Overall acceptance rate: 1,516 of 6,373 submissions (24%).
