Center for Computing Research (CCR)

In this paper, HPC architectural characteristics and their impact on application performance and scaling are investigated. Performance data gathered over several generations of very large HPC systems, such as ASC Red Storm, ASC Purple, and a large InfiniBand cluster (Red Sky), are analyzed. As the number of cache-coherent cores and NUMA domains per compute node keeps increasing, we analyze their impact with a few simple benchmarks and several applications. By examining production applications, we identify bottlenecks and present remedies. We conclude with preliminary performance data from early hardware of ASC Cielo, a future petaFLOPS-class capability system. © 2010 American Institute of Physics.

TYPE Conference YEAR 2010
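
The abstract above concerns application scaling across several system generations. As a hedged illustration, and not the authors' own methodology, the two efficiency metrics commonly reported in such studies can be computed as below. The function names and the sample timings are hypothetical.

```python
"""Strong- and weak-scaling efficiency, as commonly used in HPC scaling studies.

A minimal sketch; function names and inputs are illustrative and are not
taken from the paper above.
"""


def strong_scaling_efficiency(base_cores: int, base_time: float,
                              cores: int, time: float) -> float:
    """Fixed total problem size: ideal runtime shrinks in proportion to cores.

    E = (base_time * base_cores) / (time * cores); 1.0 means perfect speedup.
    """
    if min(base_cores, cores) <= 0 or min(base_time, time) <= 0:
        raise ValueError("core counts and times must be positive")
    return (base_time * base_cores) / (time * cores)


def weak_scaling_efficiency(base_time: float, time: float) -> float:
    """Fixed work per core: ideal runtime stays constant as cores are added.

    E = base_time / time; 1.0 means the larger run took no longer.
    """
    if base_time <= 0 or time <= 0:
        raise ValueError("times must be positive")
    return base_time / time


if __name__ == "__main__":
    # Hypothetical timings (seconds), for illustration only; not measured data.
    print(strong_scaling_efficiency(1024, 100.0, 4096, 30.0))
    print(weak_scaling_efficiency(50.0, 62.5))
```

Weak scaling is the natural lens when, as on capability systems, the goal is to run ever larger problems on more cores; strong scaling measures how well a fixed problem benefits from added cores.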

Investigating the impact of the Cielo Cray XE6 architecture on scientific application codes

Vaughan, Courtenay T.; Rajan, Mahesh R.; Barrett, Richard F.; Doerfler, Douglas W.; Pedretti, Kevin P.

Cielo, a Cray XE6, is the newest capability machine of the Department of Energy NNSA Advanced Simulation and Computing (ASC) campaign. Rated at 1.37 PFLOPS, it consists of 8,944 dual-socket, eight-core AMD Magny-Cours compute nodes linked by Cray's Gemini interconnect. Its primary mission objective is to enable a suite of ASC applications implemented using MPI to scale to tens of thousands of cores. Cielo is an evolutionary improvement over a successful architecture previously available to many of our codes, which provides a basis for understanding the capabilities of the new architecture. Using three codes strategically important to the ASC campaign, supplemented with micro-benchmarks that expose the fundamental capabilities of the XE6, we report on the performance characteristics and capabilities of Cielo.

TYPE Conference YEAR 2010
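
The node and core counts in the abstract above can be sanity-checked against the 1.37 PFLOPS rating. This is a back-of-the-envelope sketch: the 2.4 GHz clock and 4 double-precision flops per cycle per core are assumptions about the Magny-Cours parts, not figures stated in the abstract.

```python
"""Back-of-the-envelope peak-performance check for Cielo.

Node count and cores per node come from the abstract. CLOCK_GHZ and
FLOPS_PER_CYCLE are ASSUMPTIONS, not figures stated in the abstract.
"""

NODES = 8_944               # from the abstract
SOCKETS_PER_NODE = 2        # "dual-socket"
CORES_PER_SOCKET = 8        # eight cores per socket
CLOCK_GHZ = 2.4             # assumption
FLOPS_PER_CYCLE = 4         # assumption: 128-bit SSE add + multiply per cycle


def total_cores() -> int:
    """Total compute cores across all nodes."""
    return NODES * SOCKETS_PER_NODE * CORES_PER_SOCKET


def peak_pflops() -> float:
    """Theoretical peak: GHz * flops/cycle gives GFLOPS per core; 1e6 GFLOPS = 1 PFLOPS."""
    return total_cores() * CLOCK_GHZ * FLOPS_PER_CYCLE / 1e6


if __name__ == "__main__":
    print(f"{total_cores():,} cores, {peak_pflops():.2f} PFLOPS theoretical peak")
```

Under these assumptions the machine has 143,104 cores and a theoretical peak of about 1.37 PFLOPS, consistent with the quoted rating.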

HPC application performance and scaling: understanding trends and future challenges with application benchmarks on past, present and future Tri-Lab computing systems

Rajan, Mahesh R.; Doerfler, Douglas W.

Abstract not provided.

TYPE Conference YEAR 2010

Cielo: next-generation capability computing platform for NNSA/ASC

Doerfler, Douglas W.

Abstract not provided.

TYPE Conference YEAR 2010

Analyzing multicore characteristics for a suite of applications on an XT5 system

Vaughan, Courtenay T.; Doerfler, Douglas W.

Abstract not provided.

TYPE Conference YEAR 2010

The Alliance for Computing at the Extreme Scale

Ang, James A.; Doerfler, Douglas W.; Dosanjh, Sudip S.; Hemmert, Karl S.

Los Alamos and Sandia National Laboratories have formed a new high-performance computing center, the Alliance for Computing at the Extreme Scale (ACES). The two labs will jointly architect, develop, procure, and operate capability systems for DOE's Advanced Simulation and Computing Program. This presentation will discuss a petascale production capability system, Cielo, that will be deployed in late 2010, and a new partnership with Cray on advanced interconnect technologies.

TYPE Conference YEAR 2010

The Alliance for Computing at the Extreme Scale

Ang, James A.; Doerfler, Douglas W.; Dosanjh, Sudip S.; Hemmert, Karl S.

Abstract not provided.

TYPE Conference YEAR 2010

Application performance on the Tri-Lab Linux Capacity Cluster - TLCC

International Journal of Distributed Systems and Technologies

Rajan, Mahesh; Doerfler, Douglas W.; Vaughan, Courtenay T.; Epperson, Marcus E.; Ogden, Jeff

In a recent acquisition by DOE/NNSA, several large capacity computing clusters called TLCC were installed at the DOE labs SNL, LANL, and LLNL. The TLCC architecture, with ccNUMA multi-socket, multi-core nodes and an InfiniBand interconnect, is representative of current trends in HPC architecture. This paper examines application performance on TLCC, contrasting it with Red Storm/Cray XT4. TLCC and Red Storm share similar AMD processors and memory DIMMs; Red Storm, however, has single-socket nodes and a custom interconnect. Micro-benchmarks and performance analysis tools help explain the causes of the observed performance differences. Controlling processor and memory affinity on TLCC with the numactl utility is shown to yield significant performance gains and is essential for attenuating the detrimental impact of OS interference and cache-coherency overhead. While previous studies have investigated the impact of affinity control mostly in the context of small SMP systems, this paper focuses on highly parallel MPI applications.

TYPE Conference YEAR 2010
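
The abstract above credits numactl-based processor and memory affinity control with significant gains. One way such binding might be applied per MPI rank is sketched below. This is a hypothetical illustration: the helper name, the contiguous rank-to-domain mapping, and the default node shape (16 cores in 4 NUMA domains) are assumptions, not the paper's setup. `--cpunodebind` and `--membind` are real numactl options.

```python
"""Per-rank NUMA binding with numactl (hypothetical helper).

ASSUMPTIONS: the helper name, the block rank-to-domain mapping, and the
default node shape (16 cores, 4 NUMA domains) are illustrative only.
"""

import shlex


def numactl_argv(local_rank: int, app_argv: list[str],
                 cores_per_node: int = 16, numa_domains: int = 4) -> list[str]:
    """Return an argv that runs app_argv pinned to one NUMA domain.

    Ranks are assigned to domains in contiguous blocks, so with the defaults
    local ranks 0-3 go to domain 0, 4-7 to domain 1, and so on.
    """
    if cores_per_node % numa_domains != 0:
        raise ValueError("cores_per_node must divide evenly into numa_domains")
    if not 0 <= local_rank < cores_per_node:
        raise ValueError("local_rank out of range for this node")
    cores_per_domain = cores_per_node // numa_domains
    domain = local_rank // cores_per_domain
    return [
        "numactl",
        f"--cpunodebind={domain}",  # run only on this domain's cores
        f"--membind={domain}",      # allocate memory only from this domain
        *app_argv,
    ]


if __name__ == "__main__":
    # Hypothetical application and input names, for illustration only.
    for rank in (0, 5, 15):
        print(shlex.join(numactl_argv(rank, ["./my_app", "input.deck"])))
```

In a real job the local rank would come from the MPI launcher's environment (Open MPI, for example, exports `OMPI_COMM_WORLD_LOCAL_RANK`). `--membind` is strict and can cause allocation failures once a domain's memory is exhausted; `--preferred=<node>` is a softer alternative that falls back to other domains.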
