Publications

37 Results

Performance Efficiency and Effectiveness of Supercomputers

Leland, Robert; Rajan, Mahesh R.; Heroux, Michael A.

Our first purpose here is to offer to a general technical and policy audience a perspective on whether the supercomputing community should focus on improving the efficiency of supercomputing systems and their use rather than on building larger and ostensibly more capable systems that are used at low efficiency. After first summarizing our content and defining some necessary terms, we give a concise answer to this question. We then set this in context by characterizing performance of current supercomputing systems on a variety of benchmark problems and actual problems drawn from workloads in the national security, industrial, and scientific context. Along the way we answer some related questions, identify some important technological trends, and offer a perspective on the significance of these trends. Our second purpose is to give a reasonably broad and transparent overview of the related issue space and thereby to better equip the reader to evaluate commentary and controversy concerning supercomputing performance. For example, questions repeatedly arise concerning the Linpack benchmark and its predictive power, so we consider this in moderate depth as an example. We also characterize benchmark and application performance for scientific and engineering use of supercomputers and offer some guidance on how to think about these. Examples here are drawn from traditional scientific computing. Other problem domains, for example, data analytics, have different performance characteristics that are better captured by different benchmark problems or applications, but the story in those domains is similar in character and leads to similar conclusions with regard to the motivating question. For more on this topic, see Large-Scale Data Analytics and Its Relationship to Simulation. 
1 Director, Computing Research Center, Sandia National Laboratories; 2 Distinguished Member of the Technical Staff, Sandia National Laboratories; 3 Distinguished Member of the Technical Staff, Sandia National Laboratories; 4 Distinguished Member of the Technical Staff, Sandia National Laboratories

ASC Trilab L2 Codesign Milestone 2015

Trott, Christian R.; Hammond, Simon D.; Dinge, Dennis D.; Lin, Paul L.; Vaughan, Courtenay T.; Cook, Jeanine C.; Rajan, Mahesh R.; Edwards, Harold C.; Hoekstra, Robert J.

For the FY15 ASC L2 Trilab Codesign milestone, Sandia National Laboratories performed two main studies. The first study investigated three topics (performance, cross-platform portability, and programmer productivity) when using OpenMP directives and the RAJA and Kokkos programming models available from LLNL and SNL, respectively. The focus of this first study was the LULESH mini-application developed and maintained by LLNL. In the coming sections of the report the reader will find performance comparisons (and a demonstration of portability) for a variety of mini-application implementations produced during this study with varying levels of optimization. Of note is that the implementations used include optimizations across a number of programming models, to help ensure that claims that Kokkos can provide native-class application performance are valid. The second study performed during FY15 is a performance assessment of the MiniAero mini-application developed by Sandia. This mini-application was developed by the SIERRA Thermal-Fluid team at Sandia for the purposes of learning the Kokkos programming model and so is available in only a single implementation. For this report we studied its performance and scaling on a number of machines with the intent of providing insight into potential performance issues that may be experienced when similar algorithms are deployed on the forthcoming Trinity ASC ATS platform.

Investigating the impact of the Cielo Cray XE6 architecture on scientific application codes

IEEE International Symposium on Parallel and Distributed Processing Workshops and PhD Forum

Vaughan, Courtenay T.; Rajan, Mahesh R.; Barrett, Richard F.; Doerfler, Douglas W.; Pedretti, Kevin P.

Cielo, a Cray XE6, is the Department of Energy NNSA Advanced Simulation and Computing (ASC) campaign's newest capability machine. Rated at 1.37 PFLOPS, it consists of 8,944 dual-socket oct-core AMD Magny-Cours compute nodes, linked using Cray's Gemini interconnect. Its primary mission objective is to enable a suite of the ASC applications implemented using MPI to scale to tens of thousands of cores. Cielo is an evolutionary improvement to a successful architecture previously available to many of our codes, thus enabling a basis for understanding the capabilities of this new architecture. Using three codes strategically important to the ASC campaign, and supplemented with some micro-benchmarks that expose the fundamental capabilities of the XE6, we report on the performance characteristics and capabilities of Cielo. © 2011 IEEE.
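As a rough consistency check on the figures quoted in this abstract, the sketch below multiplies out the node count and core layout and compares the result with the 1.37 PFLOPS rating. The node count, socket count, and cores per socket come from the abstract; the 2.4 GHz clock and 4 double-precision flops per cycle per core are assumptions that the abstract does not state, and the function names are illustrative only.

```python
# Back-of-the-envelope peak estimate for Cielo.
# From the abstract: 8,944 nodes, dual-socket, eight cores per socket.
NODES = 8_944
SOCKETS_PER_NODE = 2
CORES_PER_SOCKET = 8

# Assumptions (NOT stated in the abstract):
CLOCK_GHZ = 2.4        # assumed core clock rate
FLOPS_PER_CYCLE = 4    # assumed double-precision flops per cycle per core


def total_cores(nodes=NODES, sockets=SOCKETS_PER_NODE, cores=CORES_PER_SOCKET):
    """Total core count implied by the node layout."""
    return nodes * sockets * cores


def peak_pflops(clock_ghz=CLOCK_GHZ, flops_per_cycle=FLOPS_PER_CYCLE):
    """Theoretical peak in PFLOPS: cores * GHz * flops/cycle gives GFLOPS."""
    gflops = total_cores() * clock_ghz * flops_per_cycle
    return gflops / 1e6  # 1 PFLOPS = 1e6 GFLOPS


if __name__ == "__main__":
    print(total_cores())             # 143104
    print(round(peak_pflops(), 2))   # 1.37
```

Under these assumptions the estimate agrees with the quoted rating, and the core count is consistent with the stated goal of scaling MPI applications to tens of thousands of cores.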

HPC top 10 InfiniBand Machine: a 3D Torus IB interconnect on Red Sky

Naegle, John H.; Monk, Stephen T.; Schutt, James A.; Doerfler, Douglas W.; Rajan, Mahesh R.

This presentation discusses the following topics: (1) Red Sky background; (2) 3D torus interconnect concepts; (3) difficulties of a torus in IB; (4) new routing code for an IB 3D torus; (5) Red Sky 3D torus implementation; and (6) managing a large IB machine. Computing at Sandia falls into two categories: (1) capability computing, which is designed for scaling of single large runs and is usually proprietary for maximum performance, with Red Storm as Sandia's current capability machine; and (2) capacity computing, which is computing for the masses, with 100s of jobs and 100s of users, extreme reliability required, and flexibility for a changing workload. Thunderbird will be decommissioned this quarter, Red Sky is our future capacity computing platform, and the Red Mesa machine is for the National Renewable Energy Lab. The main themes of Red Sky are: (1) Cheaper: 5X the capacity of Tbird at 2/3 the cost, and substantially cheaper per flop than our last large capacity machine purchase; (2) Leaner: lower operational costs, three security environments via modular fabric, expandable, upgradeable, extensible, and designed for a 6-year life cycle; and (3) Greener: 15% less power (1/6th the power per flop), 40% less water (5M gallons saved annually), 10X better cooling efficiency, and a 4X denser footprint.
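The per-flop figures quoted for Red Sky follow from the capacity, cost, and power ratios. The sketch below works through that arithmetic, assuming that "capacity" means aggregate flops and that all three ratios are measured against Thunderbird; neither assumption is stated in the summary, and the function name is illustrative.

```python
# Ratios quoted in the presentation summary, Red Sky relative to Thunderbird.
CAPACITY_RATIO = 5.0     # "5X capacity of Tbird"
COST_RATIO = 2.0 / 3.0   # "at 2/3 the cost"
POWER_RATIO = 0.85       # "15% less power"


def per_flop_ratio(resource_ratio, capacity_ratio=CAPACITY_RATIO):
    """Resource used per flop relative to the baseline machine.

    Assumes capacity scales directly with delivered flops (an assumption).
    """
    return resource_ratio / capacity_ratio


if __name__ == "__main__":
    power = per_flop_ratio(POWER_RATIO)
    cost = per_flop_ratio(COST_RATIO)
    print(f"power per flop: {power:.2f} of baseline")  # 0.17, close to 1/6
    print(f"cost per flop:  {cost:.2f} of baseline")   # 0.13
```

On these assumptions, 15% less total power at 5X the capacity works out to roughly one sixth of the power per flop, which matches the quoted figure.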

Improving performance via mini-applications

Doerfler, Douglas W.; Crozier, Paul C.; Edwards, Harold C.; Williams, Alan B.; Rajan, Mahesh R.; Keiter, Eric R.; Thornquist, Heidi K.

Application performance is determined by a combination of many choices: hardware platform, runtime environment, languages and compilers used, algorithm choice and implementation, and more. In this complicated environment, we find that the use of mini-applications - small self-contained proxies for real applications - is an excellent approach for rapidly exploring the parameter space of all these choices. Furthermore, use of mini-applications enriches the interaction between application, library and computer system developers by providing explicit functioning software and concrete performance results that lead to detailed, focused discussions of design trade-offs, algorithm choices and runtime performance issues. In this paper we discuss a collection of mini-applications and demonstrate how we use them to analyze and improve application performance on new and future computer platforms.

Supercomputer and cluster performance modeling and analysis efforts: 2004-2006

Ang, James A.; Vaughan, Courtenay T.; Barnette, Daniel W.; Doerfler, Douglas W.; Ganti, Anand G.; Phelps, Sue C.; Rajan, Mahesh R.; Stevenson, Joel O.; Scott, Ryan D.

This report describes efforts by the Performance Modeling and Analysis Team to investigate performance characteristics of Sandia's engineering and scientific applications on the ASC capability and advanced architecture supercomputers, and Sandia's capacity Linux clusters. Efforts to model various aspects of these computers are also discussed. The goals of these efforts are to quantify and compare Sandia's supercomputer and cluster performance characteristics; to reveal strengths and weaknesses in such systems; and to predict performance characteristics of, and provide guidelines for, future acquisitions and follow-on systems. Described herein are the results obtained from running benchmarks and applications to extract performance characteristics and comparisons, as well as modeling efforts, obtained during the time period 2004-2006. The format of the report, with hypertext links to numerous additional documents, purposefully minimizes the document size needed to disseminate the extensive results from our research.
