Application of Performance Analysis Tools on SNL ASC Codes
This milestone 1) exercised a broad set of performance profiling and analysis tools, including tools whose development has been promoted by the ASC program; 2) exercised the tools on two different SNL ASC codes, one Sierra code (Sierra/Aria, a C++ codebase) and one RAMSES code (ITS, a Fortran codebase); and 3) exercised the tools on multiple platforms, including the CTS-1 (e.g., Serrano) and ATS-1 Trinity (e.g., Mutrino) platforms. The milestone generated a large body of strong- and weak-scaling, trend, and profile data for multiple versions and problem cases of each of the two codes. A wealth of experience was gained with the various tools, including identification of problems, an improved understanding of feature sets, enhanced usage documentation, and insights for future tool development. Results are provided from a large number and variety of performance-analysis runs with the target codes, together with instructions for using the tools with those codes.
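As a concrete illustration of the kind of scaling data the milestone produced, the following Python sketch shows how strong- and weak-scaling efficiencies can be computed from wall-clock timings collected at different MPI rank counts. The rank counts and timings below are placeholders for illustration only, not measurements from Sierra/Aria or ITS.

```python
# Hypothetical sketch: computing strong- and weak-scaling efficiency from
# wall-clock timings of the kind gathered in the milestone runs. All numbers
# are illustrative placeholders, not measured data.

def strong_scaling_efficiency(base_ranks, base_time, ranks, runtime):
    """Ideal strong scaling halves the runtime when the rank count doubles."""
    ideal_speedup = ranks / base_ranks
    actual_speedup = base_time / runtime
    return actual_speedup / ideal_speedup

def weak_scaling_efficiency(base_time, runtime):
    """Ideal weak scaling keeps runtime constant as work grows with rank count."""
    return base_time / runtime

if __name__ == "__main__":
    # Placeholder timings (seconds) at increasing MPI ranks, fixed problem size.
    strong_runs = [(32, 1200.0), (64, 650.0), (128, 380.0), (256, 240.0)]
    base_r, base_t = strong_runs[0]
    for r, t in strong_runs:
        eff = strong_scaling_efficiency(base_r, base_t, r, t)
        print(f"strong  {r:4d} ranks: {t:7.1f} s  efficiency {eff:5.2f}")

    # Placeholder timings where the problem size grows with the rank count.
    weak_runs = [(32, 300.0), (64, 310.0), (128, 335.0), (256, 372.0)]
    base_t = weak_runs[0][1]
    for r, t in weak_runs:
        eff = weak_scaling_efficiency(base_t, t)
        print(f"weak    {r:4d} ranks: {t:7.1f} s  efficiency {eff:5.2f}")
```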
Sierra is an engineering mechanics simulation code suite supporting the Nation's Nuclear Weapons mission as well as other customers. It has explicit ties to Sandia National Labs' workflow, including geometry and meshing, design and optimization, and visualization. Distinguishing strengths include "application aware" development, scalability, SQA and V&V, multiple scales, and multi-physics coupling. This document is intended to serve new and existing users of Sierra as a user manual and troubleshooting guide.
Parallel Computing
A detailed understanding of HPC applications' resource needs, and of their complex interactions with each other and with HPC platform resources, is critical to achieving scalability and performance. Such understanding has been difficult to achieve because typical application profiling tools do not capture the behaviors of codes under the potentially wide spectrum of actual production conditions, and because typical monitoring tools do not capture system resource usage information with high enough fidelity to gain sufficient insight into application performance and demands. In this paper we present both system and application profiling results based on data obtained through synchronized, system-wide monitoring on a production HPC cluster at Sandia National Laboratories (SNL). We demonstrate analytic and visualization techniques that we are using to characterize application and system resource usage under production conditions for better understanding of application resource needs. Our goals are to improve application performance (through understanding application-to-resource mapping and system throughput) and to ensure that future system capabilities match their intended workloads.
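To illustrate the flavor of this analysis, the sketch below aggregates synchronized per-node monitoring samples into per-node summary statistics that can be compared to spot imbalance or resource hot spots. The CSV layout and column names (node, cpu_pct, mem_gb, net_mbps) are hypothetical and do not reflect the actual monitoring schema used on the SNL cluster.

```python
# Minimal sketch, assuming per-node monitoring samples have been exported to a
# CSV file with hypothetical columns: timestamp, node, cpu_pct, mem_gb, net_mbps.
import csv
from collections import defaultdict
from statistics import mean

def summarize(samples_csv):
    per_node = defaultdict(lambda: {"cpu": [], "mem": [], "net": []})
    with open(samples_csv, newline="") as f:
        for row in csv.DictReader(f):
            series = per_node[row["node"]]
            series["cpu"].append(float(row["cpu_pct"]))
            series["mem"].append(float(row["mem_gb"]))
            series["net"].append(float(row["net_mbps"]))
    # Reduce each node's time series to summary statistics that can be
    # compared across nodes of a job to spot imbalance or contention.
    return {
        node: {
            "cpu_mean": mean(s["cpu"]),
            "cpu_max": max(s["cpu"]),
            "mem_max_gb": max(s["mem"]),
            "net_mean_mbps": mean(s["net"]),
        }
        for node, s in per_node.items()
    }

if __name__ == "__main__":
    for node, stats in sorted(summarize("node_samples.csv").items()):
        print(node, stats)
```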
Sandia has invested heavily in scientific/engineering application development and in the research, development, and deployment of large-scale HPC platforms to support the computational needs of these applications. As application developers continually expand the capabilities of their software and spend more time on performance tuning of applications for these platforms, HPC platform resources are at a premium: they are a heavily shared resource serving the varied needs of many users. To ensure that the HPC platform resources are being used efficiently and perform as designed, it is necessary to obtain reliable data on resource utilization that will allow us to investigate the occurrence, severity, and causes of performance-affecting contention between applications. The work presented in this paper was an initial step toward determining whether resource contention can be understood and minimized through monitoring, modeling, planning, and infrastructure. This paper describes the set of metric definitions, identified in this research, that can be used as meaningful and potentially actionable indicators of performance-affecting contention between applications. These metrics were verified using the observed slowdown of IOR, IMB, and CTH in operating scenarios that forced contention. This paper also describes system/application monitoring activities that are critical to distilling vast amounts of data into quantities that hold the key to understanding an application's performance under production conditions and that will ultimately aid Sandia's efforts to succeed in extreme-scale computing.
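The slowdown comparison used to verify such metrics can be illustrated with a small sketch: benchmark runtimes measured in isolation are compared against runtimes under forced contention, and runs whose slowdown exceeds a chosen threshold are flagged. The runtime values and threshold below are placeholders, not measured IOR, IMB, or CTH results.

```python
# Illustrative sketch of a slowdown comparison. The runtimes are placeholders.
THRESHOLD = 1.10  # flag runs more than 10% slower than their isolated baseline

def slowdown(baseline_s, contended_s):
    """Slowdown factor: values > 1.0 mean the contended run took longer."""
    return contended_s / baseline_s

if __name__ == "__main__":
    runs = {
        # benchmark: (isolated runtime s, runtime s with competing workload)
        "IOR": (410.0, 980.0),
        "IMB": (95.0, 140.0),
        "CTH": (1800.0, 2150.0),
    }
    for name, (iso, cont) in runs.items():
        s = slowdown(iso, cont)
        flag = "contention-affected" if s > THRESHOLD else "ok"
        print(f"{name}: slowdown {s:.2f}x  ({flag})")
```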
Proceedings - 2016 IEEE 30th International Parallel and Distributed Processing Symposium, IPDPS 2016
Application performance data accounting for resource contention and other external influences is highly coveted and extremely difficult to obtain. "Why did my application's performance change from the last time it ran?" is a question shared by application developers, program analysts, and system administrators. The answer to this question impacts nearly all programmatic and R&D efforts related to high-performance computing (HPC). Lightweight, right-fidelity monitoring infrastructures that can gather relevant application and resource performance data across the entire HPC platform can help address this question. This short technical paper formally describes an ongoing research effort to define the metrics and methods that distill the vast quantities of available data to a minimum set of actionable and interpretable quantities usable by application developers, system administrators, production analysts, and HPC platform designers in their respective production and R&D focus areas.
The exploration of large parameter spaces in search of problem solutions and uncertainty quantification produces very large ensembles of data. Processing ensemble data will continue to require more resources as simulation complexity and HPC platform throughput increase. More tools are needed to help provide rapid insight into these data sets, to decrease manual processing time by the analyst, and to increase the knowledge the data can provide. One such tool is Tecplot Chorus, whose strengths are visualizing ensemble metadata and linked images. This report contains the analysis and conclusions from evaluating Tecplot Chorus with an example problem that is relevant to Sandia National Laboratories. It documents a preliminary evaluation of Tecplot Chorus for analyzing ensemble data from CTH simulations. The project that funded this report and evaluation is also evaluating and guiding development of SNL's Slycat. Slycat and Tecplot Chorus each have their strengths, weaknesses, and overlapping capabilities. It is quite likely that, as the scale of ensemble data increases, both of these tools (and possibly others) will be needed for different processing goals. This report focuses on Tecplot Chorus and its application to an example ensemble of data supplied by David J. Peterson and John P. Korbin; this example is of a flyer-plate impact and weld study, henceforth referred to as the CTH Impact Example. This evaluation also defines a workflow for analysts that can help reduce the time and resources required to process ensemble data.
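A typical starting point for ensemble tools of this kind is a table of per-case metadata with linked images: one row per simulation case, carrying its input parameters, scalar results, and the path to a rendered image. As a hedged illustration, the Python sketch below assembles such a table as a CSV file; the directory layout, file names, and column names are assumptions for illustration and do not reflect the actual CTH Impact Example data or the tools' ingest requirements.

```python
# Hypothetical sketch of assembling an ensemble metadata table: one row per
# simulation case, with input parameters and a path to a linked image. The
# directory layout ("case_*"), file names, and columns are illustrative only.
import csv
from pathlib import Path

def build_ensemble_table(ensemble_dir, out_csv):
    rows = []
    for case_dir in sorted(Path(ensemble_dir).glob("case_*")):
        params = {}
        # Assume each case directory holds a simple key=value parameter file.
        for line in (case_dir / "params.txt").read_text().splitlines():
            key, _, value = line.partition("=")
            if key.strip():
                params[key.strip()] = value.strip()
        rows.append({
            "case": case_dir.name,
            **params,
            # Path to a pre-rendered image the ensemble tool can link to.
            "image": str(case_dir / "final_state.png"),
        })
    fieldnames = sorted({k for r in rows for k in r})
    with open(out_csv, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

if __name__ == "__main__":
    build_ensemble_table("ensemble_runs", "ensemble_metadata.csv")
```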