Publications

Results 101–125 of 129

Supercomputer and cluster performance modeling and analysis efforts: 2004-2006

Ang, James A.; Vaughan, Courtenay T.; Barnette, Daniel W.; Benner, Robert E.; Doerfler, Douglas W.; Ganti, Anand; Phelps, Sue C.; Rajan, Mahesh; Stevenson, Joel O.; Scott, Ryan T.

This report describes efforts by the Performance Modeling and Analysis Team to investigate the performance characteristics of Sandia's engineering and scientific applications on the ASC capability and advanced architecture supercomputers and on Sandia's capacity Linux clusters. Efforts to model various aspects of these computers are also discussed. The goals of these efforts are to quantify and compare Sandia's supercomputer and cluster performance characteristics; to reveal strengths and weaknesses in such systems; and to predict performance characteristics of, and provide guidelines for, future acquisitions and follow-on systems. Described herein are results from benchmark and application runs used to extract performance characteristics and comparisons, along with modeling results, for the period 2004-2006. The format of the report, with hypertext links to numerous additional documents, purposefully minimizes the document size needed to disseminate the extensive results of our research.

A simple synchronous distributed-memory algorithm for the HPCC RandomAccess benchmark

Proceedings - IEEE International Conference on Cluster Computing, ICCC

Plimpton, Steven J.; Brightwell, Ronald B.; Vaughan, Courtenay T.; Underwood, Keith D.

The RandomAccess benchmark as defined by the High Performance Computing Challenge (HPCC) tests the speed at which a machine can update the elements of a table spread across global system memory, as measured in billions (giga) of updates per second (GUPS). The parallel implementation provided by HPCC typically performs poorly on distributed-memory machines, because the updates require numerous small point-to-point messages between processors. We present an alternative algorithm which treats the collection of P processors as a hypercube, aggregating data so that larger messages are sent, and routing individual data items through dimensions of the hypercube to their destination processors. The algorithm's computation (the GUP count) scales linearly with P, while its communication overhead scales as log2(P), thus enabling better performance on large numbers of processors. The new algorithm achieves a GUPS rate of 19.98 on 8192 processors of Sandia's Red Storm machine, compared to 1.02 for the HPCC-provided algorithm on 10350 processors. We also illustrate how GUPS performance varies with the benchmark's specification of its "look-ahead" parameter. As expected, parallel performance degrades for small look-ahead values and improves dramatically for large values.
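The hypercube routing idea in the abstract can be illustrated with a minimal sketch, assuming P is a power of two: in each of log2(P) stages, every processor exchanges with its partner along one hypercube dimension all items whose destination address differs in that dimension's bit, so after log2(P) stages every item has arrived. The following is a single-process Python simulation written for this listing, not the paper's MPI implementation on Red Storm; the function and variable names are hypothetical.

    import random

    def hypercube_route(num_procs, updates_per_proc):
        """Simulate hypercube routing: one exchange stage per dimension,
        fixing one bit of every item's destination address per stage."""
        dims = num_procs.bit_length() - 1
        assert 1 << dims == num_procs, "P must be a power of two"

        # Each processor starts with updates destined for random processors.
        buffers = [
            [random.randrange(num_procs) for _ in range(updates_per_proc)]
            for _ in range(num_procs)
        ]

        for k in range(dims):              # one stage per hypercube dimension
            bit = 1 << k
            new_buffers = [[] for _ in range(num_procs)]
            for p in range(num_procs):
                partner = p ^ bit          # neighbor across dimension k
                for dest in buffers[p]:
                    # Forward items whose destination differs in bit k; keep
                    # the rest. In the real algorithm, all items bound for
                    # one partner travel as a single aggregated message.
                    target = partner if (dest & bit) != (p & bit) else p
                    new_buffers[target].append(dest)
            buffers = new_buffers

        # After log2(P) stages, every item is on its destination processor.
        assert all(dest == p for p in range(num_procs) for dest in buffers[p])
        return buffers

    if __name__ == "__main__":
        hypercube_route(num_procs=8, updates_per_proc=100)
        print("all updates routed in log2(P) = 3 stages")

Because each stage fixes one address bit without disturbing bits fixed in earlier stages, the number of communication rounds is log2(P) regardless of how many updates are in flight, which is the source of the log2(P) overhead scaling claimed in the abstract.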
