We present an output-queued switch architecture with cross-point buffering that has improved performance for both point-to-point communication and hardware accelerated collective communication. In the past, output queuing architectures have been less popular as they require more internal speedup and buffering. However, with current technology it is possible to build output-queued switches with a relatively large number of ports. We demonstrate that our output-queued architecture performs well for point-to-point messages, specially in a fat-tree topology. We also show that output-queued architectures facilitate efficient implementations of multicasts and reductions. We present performance of multicasts and reductions on individual switches and a network of switches interconnected in a fat-tree topology. We also present simulation results based on synthetic workloads that emulate a molecular dynamics application.
Weitere Kapitel dieses Buchs durch Wischen aufrufen
- Improved Point-to-Point and Collective Communication Performance with Output-Queued High-Radix Routers
Laxmikant V. Kalé
- Springer Berlin Heidelberg