NiMC: Network-Induced Memory Contention

Researchers at Sandia National Laboratories have introduced the concept of Network-Induced Memory Contention (NiMC) and demonstrated its impact on a variety of current HPC system architectures. NiMC occurs when a remote direct memory access (RDMA) contends with the target node for the target’s memory resources. Given the importance of computation/communication overlap in future systems, this research explores a hidden cost of RDMA network traffic: memory contention with the applications running on the target node. The work shows that NiMC is a concern for both onloaded and offloaded networking hardware. Onloaded hardware sees the largest impact, with up to a 3X performance degradation for the molecular dynamics simulator LAMMPS at 8,192 cores when RDMA traffic creates contention at endpoints. The work offers three solutions to NiMC, each applicable to a different intensity of RDMA traffic, that can reduce the overhead associated with memory contention to only 6.4%, versus the 300%+ originally observed.

(Summary: Researchers at Sandia have been the first to investigate the impact of RDMA networks on application memory bandwidth on modern systems.)

Publications: SAND2016-0028 C: Taylor Groves, Ryan E. Grant, Dorian Arnold, 2016, "NiMC: Characterizing and Eliminating Network-Induced Memory Contention". 30th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2016), IEEE, New York, NY, USA, 10 pages (Accepted)

Figure: Impact of Network-Induced Memory Contention on LAMMPS at up to 8,192 cores

 

Contact
Ronald B. Brightwell, rbbrigh@sandia.gov

April 1, 2016