Performance Improvements for Open MPI Demonstrated at Scale on Trinity

Researchers from Sandia National Laboratories and Los Alamos National Laboratory recently collaborated to improve the performance of Open MPI on the Trinity supercomputer. The multi-lab team significantly improved the performance of MPI remote memory access (RMA) operations, especially when those operations are issued by multi-threaded applications. A key part of this work was the RMA-MT benchmark developed at Sandia, which the team used to identify and diagnose performance bottlenecks and scalability limitations in Open MPI. The experiments, run at the full scale of each Trinity partition, are the largest known runs of multi-threaded MPI RMA to date. This collaboration was supported at each lab by the Computational Systems and Software Environment element of NNSA’s Advanced Simulation and Computing Program. Trinity is a product of the Alliance for Computing at Extreme Scale (ACES) partnership between Sandia and Los Alamos.

Figure: This graph shows the performance improvement for Open MPI’s Remote Memory Access operations using the HPCCG benchmark. The new implementation (OSC/RDMA) significantly outperforms the previous approach (OSC/PT2PT).

Contact

Ryan Grant, regrant@sandia.gov

August 1, 2017