Publications

Results 1–25 of 129
Skip to search filters

A Methodology for Characterizing the Correspondence between Real and Proxy Applications

Proceedings - IEEE International Conference on Cluster Computing, ICCC

Aaziz, Omar R.; Cook, Jeanine C.; Cook, Jonathan E.; Juedeman, Tanner; Richards, David; Vaughan, Courtenay T.

Proxy applications are a simplified means for stake-holders to evaluate how both hardware and software stacks might perform on the class of real applications that they are meant to model. However, characterizing the relationship between them and their behavior is not an easy task. We present a data-driven methodology for characterizing the relationship between real and proxy applications based on collecting runtime data from both and then using data analytics to find their correspondence and divergence. We use new capabilities for application-level monitoring within LDMS (Lightweight Distributed Monitoring System) to capture hardware performance counter and MPI-related data. To demonstrate the utility of this methodology, we present experimental evidence from two system platforms, using four proxy applications from the current ECP Proxy Application Suite and their corresponding parent applications (in the ECP application portfolio). Results show that each proxy analyzed is representative of its parent with respect to computation and memory behavior. We also analyze communication patterns separately using mpiP data and show that communication for these four proxy/parent pairs is also similar.

More Details

Application performance on the tri-lab linux capacity cluster -TLCC

International Journal of Distributed Systems and Technologies

Rajan, Mahesh; Doerfler, Douglas W.; Vaughan, Courtenay T.; Epperson, Marcus E.; Ogden, Jeff

In a recent acquisition by DOE/NNSA several large capacity computing clusters called TLCC have been installed at the DOE labs: SNL, LANL and LLNL. TLCC architecture with ccNUMA, multi-socket, multi-core nodes, and InfiniBand interconnect, is representative of the trend in HPC architectures. This paper examines application performance on TLCC contrasting them with Red Storm/Cray XT4. TLCC and Red Storm share similar AMD processors and memory DIMMs. Red Storm however has single socket nodes and custom interconnect. Micro-benchmarks and performance analysis tools help understand the causes for the observed performance differences. Control of processor and memory affinity on TLCC with the numactl utility is shown to result in significant performance gains and is essential to attenuate the detrimental impact of OS interference and cache-coherency overhead. While previous studies have investigated impact of affinity control mostly in the context of small SMP systems, the focus of this paper is on highly parallel MPI applications.

More Details
Results 1–25 of 129
Results 1–25 of 129