Publications

Results 1–25 of 126

Evaluating Trade-offs in Potential Exascale Interconnect Technologies

Hemmert, Karl S.; Bair, Ray; Bhatele, Abhinav; Groves, Taylor; Jain, Nikhil; Lewis, Cannada L.; Mubarak, Misbah; Pakin, Scott D.; Ross, Robert; Wilke, Jeremiah J.

This report details work to study trade-offs in topology and network bandwidth for potential interconnects in the exascale (2021-2022) timeframe. The work was done using multiple interconnect models across two parallel discrete event simulators. Results from each independent simulator are shown and discussed, and the areas of agreement and disagreement are explored.

An Evaluation of Ethernet Performance for Scientific Workloads

Proceedings of INDIS 2020: Innovating the Network for Data-Intensive Science, Held in conjunction with SC 2020: The International Conference for High Performance Computing, Networking, Storage and Analysis

Kenny, Joseph P.; Wilke, Jeremiah J.; Ulmer, Craig D.; Baker, Gavin M.; Knight, Samuel K.; Friesen, Jerrold A.

Priority-based Flow Control (PFC), RDMA over Converged Ethernet (RoCE) and Enhanced Transmission Selection (ETS) are three enhancements to Ethernet networks that allow increased performance and may make Ethernet attractive for systems supporting a diverse scientific workload. We constructed a 96-node testbed cluster with a 100 Gb/s Ethernet network configured as a tapered fat tree. We ran tests representing important network operating conditions and provide an analysis of the resulting performance. Compared with TCP, RoCE running over a PFC-enabled network significantly increased performance for both bandwidth-sensitive and latency-sensitive applications. Additionally, a case study of interfering applications showed that ETS can prevent starvation of network traffic for latency-sensitive applications running on congested networks. We did not encounter any notable performance limitations with our Ethernet testbed, but we found that practical disadvantages still tip the balance towards traditional HPC networks unless a system design is driven by additional external requirements.

Opportunities and limitations of Quality-of-Service in Message Passing applications on adaptively routed Dragonfly and Fat Tree networks

Proceedings - IEEE International Conference on Cluster Computing, ICCC

Wilke, Jeremiah J.; Kenny, Joseph P.

Avoiding communication bottlenecks remains a critical challenge in high-performance computing (HPC) as systems grow to exascale. Numerous design possibilities exist for avoiding network congestion including topology, adaptive routing, congestion control, and quality-of-service (QoS). While network design often focuses on topological features like diameter, bisection bandwidth, and routing, efficient QoS implementations will be critical for next-generation interconnects. HPC workloads are dominated by tightly coupled mathematics, making delays in a single message manifest as delays across an entire parallel job. QoS can spread traffic onto different virtual lanes (VLs), lowering the impact of network hotspots by providing priorities or bandwidth guarantees that prevent starvation of critical traffic. Two leading topology candidates, Dragonfly and Fat Tree, are often discussed in terms of routing properties and cost, but the topology can have a major impact on QoS. While Dragonfly has attractive routing flexibility and cost relative to Fat Tree, the extra routing complexity requires several VLs to avoid deadlock. Here we discuss the special challenges of Dragonfly, proposing configurations that use different routing algorithms for different service levels (SLs) to limit VL requirements. We provide simulated results showing how each QoS strategy performs on different application classes and different workload mixes. Despite Dragonfly's desirable characteristics for adaptive routing, Fat Tree is shown to be an attractive option when QoS is considered.

Verifying Simulator Readiness for Evaluating Potential Exascale Interconnect Technologies [PowerPoint]

Hemmert, Karl S.; Wilke, Jeremiah J.; Kenny, Joseph P.; Lewis, Cannada L.; Bhatele, Abhinav; Georgakoudis, Giorgis; Pakin, Scott; Mubarak, Misbah; Groves, Taylor

Goals of the milestone are to: verify key hardware contention models in a controlled environment; validate simulator readiness for future milestones; and provide a baseline for defining a cross-validation workflow across teams for "bracketing" results.

ECP Milestone Memo for 2.3.1.04.14

Wilke, Jeremiah J.

The DARMA many-task framework provides asynchronous communication and load-balancing functionality. This functionality is embedded in standard, modern C++ through template wrapper classes similar to futures. DARMA previously functioned as a single, large repository. This simplified building and installation but hindered agile development, because individual components could not easily be updated or reused in other projects. DARMA components can now be developed independently and reused in other ECP projects. Through Spack and modern CMake, a complete DARMA package can be easily configured and installed, with automatic dependency management for each of the configuration options.
