Analyzing multicore characteristics for a suite of applications on an XT5 system
International Journal of Distributed Systems and Technologies
In a recent acquisition by DOE/NNSA, several large-capacity computing clusters, collectively called TLCC, have been installed at the DOE laboratories SNL, LANL, and LLNL. The TLCC architecture, with ccNUMA, multi-socket, multi-core nodes and an InfiniBand interconnect, is representative of the trend in HPC architectures. This paper examines application performance on TLCC, contrasting it with Red Storm/Cray XT4. TLCC and Red Storm use similar AMD processors and memory DIMMs; Red Storm, however, has single-socket nodes and a custom interconnect. Micro-benchmarks and performance analysis tools help explain the causes of the observed performance differences. Controlling processor and memory affinity on TLCC with the numactl utility is shown to yield significant performance gains and is essential to attenuate the detrimental impact of OS interference and cache-coherency overhead. While previous studies have investigated the impact of affinity control mostly in the context of small SMP systems, the focus of this paper is on highly parallel MPI applications.
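As a rough illustration of the kind of affinity control the abstract refers to, the sketch below pins a process to one socket's cores from Python. It is not taken from the paper: the per-socket core count, the dual-socket assumption, and the MPI_LOCALRANKID environment variable (launcher-specific) are all assumptions, and memory binding, which numactl also provides, is not shown.

```python
# Hypothetical sketch: pin each local MPI rank to one socket's cores,
# approximating what launching under `numactl --cpunodebind/--membind` does.
# CORES_PER_SOCKET, the dual-socket node, and the local-rank environment
# variable are assumptions, not values from the paper.
import os

CORES_PER_SOCKET = 4                                        # assumed
local_rank = int(os.environ.get("MPI_LOCALRANKID", "0"))    # launcher-specific

socket_id = local_rank % 2                                  # assume 2 sockets
first = socket_id * CORES_PER_SOCKET
cpus = set(range(first, first + CORES_PER_SOCKET))

# Restrict this process to the chosen socket's cores (Linux only).
os.sched_setaffinity(0, cpus)
print(f"rank {local_rank} bound to CPUs {sorted(cpus)}")
```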
This report describes efforts by the Performance Modeling and Analysis Team to investigate performance characteristics of Sandia's engineering and scientific applications on the ASC capability and advanced architecture supercomputers, and on Sandia's capacity Linux clusters. Efforts to model various aspects of these computers are also discussed. The goals of these efforts are to quantify and compare Sandia's supercomputer and cluster performance characteristics; to reveal strengths and weaknesses in such systems; and to predict performance characteristics of, and provide guidelines for, future acquisitions and follow-on systems. Described herein are the results obtained from running benchmarks and applications to extract performance characteristics and comparisons, as well as the modeling efforts undertaken, during the period 2004-2006. The format of the report, with hypertext links to numerous additional documents, purposefully minimizes the document size needed to disseminate the extensive results from our research.
Proceedings - IEEE International Conference on Cluster Computing, ICCC
The RandomAccess benchmark as defined by the High Performance Computing Challenge (HPCC) tests the speed at which a machine can update the elements of a table spread across global system memory, as measured in billions (giga) of updates per second (GUPS). The parallel implementation provided by HPCC typically performs poorly on distributed-memory machines, due to updates requiring numerous small point-to-point messages between processors. We present an alternative algorithm which treats the collection of P processors as a hypercube, aggregating data so that larger messages are sent, and routing individual datums through dimensions of the hypercube to their destination processor. The algorithm's computation (the GUP count) scales linearly with P while its communication overhead scales as log2(P), thus enabling better performance on large numbers of processors. The new algorithm achieves a GUPS rate of 19.98 on 8192 processors of Sandia's Red Storm machine, compared to 1.02 for the HPCC-provided algorithm on 10350 processors. We also illustrate how GUPS performance varies with the benchmark's specification of its "look-ahead" parameter. As expected, parallel performance degrades for small look-ahead values, and improves dramatically for large values. © 2006 IEEE.
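To illustrate the routing idea described above, the following single-process Python sketch simulates dimension-by-dimension forwarding of updates through a hypercube of P ranks. It is only a schematic of the aggregation pattern, not the HPCC or Red Storm implementation; the actual table update and look-ahead handling are omitted.

```python
# Simulate hypercube routing: in each of the log2(P) dimensions, every rank
# bundles the updates whose destination differs from its own rank in that bit
# and ships the bundle to its partner rank, so message counts grow as log2(P).
import random

P = 8                                    # simulated "processors"; power of two
random.seed(0)
# Each rank starts with a few (destination, update) pairs.
held = {r: [(random.randrange(P), random.random()) for _ in range(4)]
        for r in range(P)}

num_dims = P.bit_length() - 1            # log2(P) hypercube dimensions
for d in range(num_dims):
    bit = 1 << d
    moving = {r: [] for r in range(P)}   # one aggregated bundle per rank
    for r in range(P):
        staying = []
        for dest, val in held[r]:
            if (dest ^ r) & bit:         # destination differs in this bit:
                moving[r].append((dest, val))   # route toward partner r ^ bit
            else:
                staying.append((dest, val))
        held[r] = staying
    for r in range(P):                   # deliver each bundle to the partner
        held[r ^ bit].extend(moving[r])

# After log2(P) exchanges every update sits on its destination rank.
assert all(dest == r for r, items in held.items() for dest, _ in items)
```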
A new capability for modeling thin-shell structures within the coupled Euler-Lagrange code, Zapotec, is under development. The new algorithm creates an artificial material interface for the Eulerian portion of the problem by expanding a Lagrangian shell element such that it has an effective thickness that spans one or more Eulerian cells. The algorithm implementation is discussed along with several examples involving blast loading on plates.
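A heavily simplified sketch of the thickness-expansion idea follows, under the assumption that the effective thickness is simply clamped to span a chosen number of Eulerian cells; the rule and the numbers are illustrative only, not Zapotec's actual algorithm.

```python
# Illustrative rule: if a shell is thinner than the Eulerian cells it crosses,
# expand its effective thickness so the Eulerian solver sees a material
# interface. The one-cell minimum is an assumption for this sketch.
def effective_thickness(physical_thickness: float,
                        eulerian_cell_size: float,
                        min_cells_spanned: int = 1) -> float:
    """Expand a shell's thickness to cover at least `min_cells_spanned`
    Eulerian cells; leave it unchanged if it is already thick enough."""
    return max(physical_thickness, min_cells_spanned * eulerian_cell_size)


# A 2 mm plate on a 10 mm Eulerian mesh would be expanded to 10 mm.
print(effective_thickness(physical_thickness=0.002, eulerian_cell_size=0.01))
```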
Proceedings of the International Conference on Supercomputing
The design of general-purpose dynamic load-balancing tools for parallel applications is more challenging than the design of static partitioning tools. Both algorithmic and software engineering issues arise. We have addressed many of these issues in the design of the Zoltan dynamic load-balancing library. Zoltan has an object-oriented interface that makes it easy to use and provides separation between the application and the load-balancing algorithms. It contains a suite of dynamic load-balancing algorithms, including both geometric and graph-based algorithms. Its design makes it valuable both as a partitioning tool for a variety of applications and as a research test-bed for new algorithmic development. In this paper, we describe Zoltan's design and demonstrate its use in an unstructured-mesh finite element application.
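The Python sketch below is not Zoltan's actual API; it only illustrates, under assumed names (set_fn, "obj_list", "geom"), the callback-style separation the abstract describes, in which the load balancer pulls object IDs and coordinates from application-supplied query functions rather than depending on the application's data structures.

```python
# Hypothetical callback-driven partitioner (names are assumptions, not Zoltan's
# API): the application registers query functions, and the balancer uses them
# to fetch objects and geometry, keeping the two sides decoupled.
from typing import Callable, Dict, List, Tuple


class LoadBalancer:
    """Toy partitioner that sees mesh data only through registered callbacks."""

    def __init__(self) -> None:
        self.queries: Dict[str, Callable] = {}

    def set_fn(self, name: str, fn: Callable) -> None:
        # Application-supplied query function, e.g. "obj_list" or "geom".
        self.queries[name] = fn

    def partition(self, num_parts: int) -> Dict[int, int]:
        # Toy geometric method: sort objects by x-coordinate, slice evenly.
        ids: List[int] = self.queries["obj_list"]()
        coords: Dict[int, Tuple[float, float]] = self.queries["geom"](ids)
        order = sorted(ids, key=lambda i: coords[i][0])
        chunk = max(1, len(order) // num_parts)
        return {obj: min(k // chunk, num_parts - 1)
                for k, obj in enumerate(order)}


# Application side: a few mesh nodes and their coordinates.
mesh = {0: (0.1, 0.0), 1: (0.9, 0.2), 2: (0.5, 0.7), 3: (0.3, 0.4)}
lb = LoadBalancer()
lb.set_fn("obj_list", lambda: list(mesh))
lb.set_fn("geom", lambda ids: {i: mesh[i] for i in ids})
print(lb.partition(num_parts=2))   # e.g. {0: 0, 3: 0, 2: 1, 1: 1}
```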
Current supercomputers use large parallel arrays of tightly coupled processors to achieve levels of performance far surpassing conventional vector supercomputers. Shock-wave physics codes have been developed for these new supercomputers at Sandia National Laboratories and elsewhere. These parallel codes run fast enough on many simulations that they can be used to study the effects of varying design parameters on the performance of models of conventional munitions and other complex systems. Such studies may be directed by optimization software to improve the performance of the modeled system. Using a shaped-charge jet design as an archetypal test case and the CTH parallel shock-wave physics code controlled by the Dakota optimization software, we explored the use of automatic optimization tools to improve designs for conventional munitions. We used a scheme in which a lower-resolution computational mesh was used to identify candidate optimal solutions, which were then verified using a higher-resolution mesh. We identified three optimal solutions for the model and a region of the design domain where the jet tip speed is nearly optimal, indicating the possibility of a robust design. Based on this study, we identified some of the difficulties in using high-fidelity models with optimization software to develop improved designs; these include developing robust algorithms for the objective function and constraints and mitigating the effects of numerical noise in them. We conclude that optimization software running high-fidelity models of physical systems with parallel shock-wave physics codes to find improved designs can be a valuable tool for designers. While the current state of algorithm and software development does not permit routine, "black box" optimization of designs, the effort involved in using the existing tools may well be worth the improvement achieved in designs.
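The two-level scheme described above can be sketched generically as screen-then-verify. The Python below uses a placeholder objective (jet_tip_speed here is a cheap stand-in with resolution-dependent noise, not the CTH shaped-charge model) purely to show the structure of screening many designs at low resolution and re-checking only the best few at high resolution.

```python
# Schematic two-fidelity search: coarse-mesh screening of many candidate
# designs, then fine-mesh verification of a short list. The objective and all
# numbers are placeholders, not results from the paper.
import random

def jet_tip_speed(design: float, resolution: int) -> float:
    """Stand-in objective; higher resolution costs more but adds less noise."""
    noise = random.gauss(0.0, 0.5 / resolution)
    return -(design - 2.0) ** 2 + 10.0 + noise   # peak near design = 2.0

random.seed(1)
candidates = [random.uniform(0.0, 4.0) for _ in range(50)]

# Stage 1: coarse-mesh screening of all candidates.
shortlist = sorted(candidates,
                   key=lambda d: jet_tip_speed(d, resolution=1),
                   reverse=True)[:3]

# Stage 2: fine-mesh verification of the shortlisted designs only.
verified = [(d, jet_tip_speed(d, resolution=10)) for d in shortlist]
best = max(verified, key=lambda pair: pair[1])
print(f"best design {best[0]:.2f} with verified tip speed {best[1]:.2f}")
```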