Publications Search

Prototyping the Next Generation of Aria

Clausen, Jonathan; Brunini, Victor; Forster, Christopher J.; Noble, David R.; Trott, Christian R.; Hammond, Simon; Hoemmen, Mark F.; Lin, Paul T.

Abstract not provided.

TYPE Conference Poster YEAR 2017

OSTI

Embedded ensemble propagation for improving performance, portability, and scalability of uncertainty quantification on emerging computational architectures

SIAM Journal on Scientific Computing

Phipps, Eric T.; Edwards, Harold C.; Hoemmen, Mark F.; Hu, Jonathan J.; Rajamanickam, Sivasankaran

Quantifying simulation uncertainties is a critical component of rigorous predictive simulation. A key component of this is forward propagation of uncertainties in simulation input data to output quantities of interest. Typical approaches involve repeated sampling of the simulation over the uncertain input data, and can require numerous samples when accurately propagating uncertainties from large numbers of sources. Often simulation processes from sample to sample are similar and much of the data generated from each sample evaluation could be reused. We explore a new method for implementing sampling methods that simultaneously propagates groups of samples together in an embedded fashion, which we call embedded ensemble propagation. We show how this approach takes advantage of properties of modern computer architectures to improve performance by enabling reuse between samples, reducing memory bandwidth requirements, improving memory access patterns, improving opportunities for fine-grained parallelization, and reducing communication costs. We describe a software technique for implementing embedded ensemble propagation based on the use of C++ templates and describe its integration with various scientific computing libraries within Trilinos. We demonstrate improved performance, portability, and scalability for the approach applied to the simulation of partial differential equations on a variety of CPU, GPU, and accelerator architectures, including up to 131,072 cores on a Cray XK7 (Titan).

TYPE Journal Article YEAR 2017

DOI OSTI
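
The embedded ensemble idea in the abstract above can be illustrated with a minimal sketch. It is not the Trilinos implementation: the paper uses C++ templates, whereas this sketch uses Python operator overloading as a stand-in, and the names `Ensemble` and `residual` are hypothetical. The point it shows is that a kernel written once against a generic scalar can propagate a whole group of samples in a single pass.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Ensemble:
    """Hypothetical scalar type that carries one value per sample."""
    values: Tuple[float, ...]

    def _lift(self, other):
        # Broadcast a plain number to the ensemble size, or check sizes match.
        if isinstance(other, Ensemble):
            if len(other.values) != len(self.values):
                raise ValueError("ensemble sizes differ")
            return other.values
        return (float(other),) * len(self.values)

    def __add__(self, other):
        return Ensemble(tuple(a + b for a, b in zip(self.values, self._lift(other))))

    __radd__ = __add__

    def __mul__(self, other):
        return Ensemble(tuple(a * b for a, b in zip(self.values, self._lift(other))))

    __rmul__ = __mul__


def residual(k, u, f):
    """Generic kernel k*u - f, written once for floats or Ensembles."""
    return k * u + (-1.0) * f


if __name__ == "__main__":
    # One call evaluates the kernel for three samples of the uncertain input k.
    print(residual(Ensemble((1.0, 2.0, 4.0)), 3.0, 1.0))
```

Calling `residual` with plain floats evaluates a single sample; passing an `Ensemble` for the uncertain coefficient evaluates all samples together without changing the kernel.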

Enabling Low Mach Fluid Simulations Using Trilinos

Hu, Jonathan J.; Devine, Karen; Hoemmen, Mark F.; Lin, Paul T.; Rajamanickam, Sivasankaran; Roberts, Nathan V.; Siefert, Christopher; Trott, Christian R.; Prokopenko, Andrey

Abstract not provided.

TYPE Conference Poster YEAR 2017

OSTI

Towards a performance portable compressible CFD code

23rd AIAA Computational Fluid Dynamics Conference, 2017

Howard, Micah; Bradley, Andrew M.; Bova, Steven W.; Overfelt, James R.; Wagnild, Ross M.; Dinzl, Derek J.; Hoemmen, Mark F.; Klinvex, Alicia M.

High performance computing (HPC) is undergoing a dramatic change in computing architectures. Next-generation HPC systems are being based primarily on many-core processing units and general purpose graphics processing units (GPUs). A computing node on a next-generation system can be, and in practice is, heterogeneous in nature, involving multiple memory spaces and multiple execution spaces. This presents a challenge for the development of application codes that wish to compute at the extreme scales afforded by these next-generation HPC technologies and systems; the best parallel programming model for one system is not necessarily the best parallel programming model for another. This inevitably raises the following question: how does an application code achieve high performance on disparate computing architectures without having entirely different, or at least significantly different, code paths, one for each architecture? This question has given rise to the term ‘performance portability’, a notion concerned with porting application code performance from architecture to architecture using a single code base. In this paper, we present the work being done at Sandia National Labs to develop a performance portable compressible CFD code that is targeting the ‘leadership’ class supercomputers the National Nuclear Security Administration (NNSA) is acquiring over the course of the next decade.

TYPE Conference Poster YEAR 2017

OSTI Scopus

Prototyping the Next-Generation of Aria

Brunini, Victor; Clausen, Jonathan; Noble, David R.; Forster, Christopher J.; Trott, Christian R.; Hammond, Simon; Hoemmen, Mark F.; Lin, Paul T.

Abstract not provided.

TYPE Presentation YEAR 2016

OSTI

Trilinos NGP Planning

Rajamanickam, Sivasankaran; Devine, Karen; Hu, Jonathan J.; Hoemmen, Mark F.

Abstract not provided.

TYPE Presentation YEAR 2016

OSTI

KokkosKernels Introduction: Design API and Performance

Deveci, Mehmet; Rajamanickam, Sivasankaran; Kim, Kyungjoo; Bradley, Andrew M.; Trott, Christian R.; Hoemmen, Mark F.; Boman, Erik G.

Abstract not provided.

TYPE Presentation YEAR 2016

OSTI

Kokkos Technical Review Slides and Discussion Notes

Edwards, Harold C.; Sunderland, Daniel; Hoemmen, Mark F.; Ellingwood, Nathan D.; Trott, Christian R.; Mackey, Greg E.

Abstract not provided.

TYPE Presentation YEAR 2016

OSTI

Optimization of block sparse matrix-vector multiplication on shared-memory architectures

Eberhardt, Ryan E.; Hoemmen, Mark F.

Abstract not provided.

TYPE Conference Poster YEAR 2016

DOI OSTI

Ifpack2 User's Guide 1.0

Prokopenko, Andrey V.; Siefert, Christopher; Hu, Jonathan J.; Hoemmen, Mark F.; Klinvex, Alicia M.

This is the definitive user manual for the Ifpack2 package in the Trilinos project. Ifpack2 provides implementations of iterative algorithms (e.g., Jacobi, SOR, additive Schwarz) and processor-based incomplete factorizations. Ifpack2 is part of the Trilinos Tpetra solver stack, is templated on index, scalar, and node types, and leverages node-level parallelism indirectly through its use of Tpetra kernels. Ifpack2 can be used to solve matrix systems with greater than 2 billion rows (using 64-bit indices). Any options not documented in this manual should be considered strictly experimental.

TYPE SAND Report YEAR 2016

DOI OSTI

What Error to Expect When You Are Expecting a Bit Flip

Foulk, James W.; Hoemmen, Mark F.; Mueller, Frank

Abstract not provided.

TYPE Conference Poster YEAR 2016

OSTI

Performance Portability for Linear Algebra with Kokkos

Trott, Christian R.; Edwards, Harold C.; Ellingwood, Nathan D.; Hammond, Simon; Deveci, Mehmet; Boman, Erik G.; Bradley, Andrew M.; Hoemmen, Mark F.; Rajamanickam, Sivasankaran

Abstract not provided.

TYPE Conference Poster YEAR 2016

OSTI

Embedded Ensemble Propagation for Improving Performance Portability and Scalability of Uncertainty Quantification on Emerging Computational Architectures

Phipps, Eric T.; Edwards, Harold C.; Hoemmen, Mark F.; Hu, Jonathan J.; Rajamanickam, Sivasankaran

Abstract not provided.

TYPE Conference Poster YEAR 2016

OSTI

Preconditioning Communication-Avoiding Krylov Methods

Rajamanickam, Sivasankaran; Yamazaki, Ichitaro; Boman, Erik G.; Hoemmen, Mark F.; Heroux, Michael A.; Tomov, Stan; Dongarra, Jack

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

Optimal adiabatic scaling and the processor-in-memory-and-storage architecture (OAS+PIMS)

Proceedings of the 2015 IEEE/ACM International Symposium on Nanoscale Architectures, NANOARCH 2015

Debenedictis, Erik; Cook, Jeanine; Hoemmen, Mark F.; Metodi, Tzvetan S.

We discuss a new approach to computing that retains the possibility of exponential growth while making substantial use of the existing technology. The exponential improvement path of Moore's Law has been the driver behind the computing approach of Turing, von Neumann, and FORTRAN-like languages. Performance growth is slowing at the system level, even though further exponential growth should be possible. We propose two technology shifts as a remedy, the first being the formulation of a scaling rule for scaling into the third dimension. This involves use of circuit-level energy efficiency increases using adiabatic circuits to avoid overheating. However, this scaling rule is incompatible with the von Neumann architecture. The second technology shift is a computer architecture and programming change to an extremely aggressive form of Processor-In-Memory (PIM) architecture, which we call Processor-In-Memory-and-Storage (PIMS). Theoretical analysis shows that the PIMS architecture is compatible with the 3D scaling rule, suggesting both immediate benefit and a long-term improvement path.

TYPE Conference Poster YEAR 2015

OSTI Scopus

Beyond Moore's Law and Implications for Computing in Space

Debenedictis, Erik; Cook, Jeanine; Metodi, Tzvetan S.; Hoemmen, Mark F.; Marinella, Matthew; Schiek, Richard; Zima, Hans

Abstract not provided.

TYPE Presentation YEAR 2015

OSTI

Versioned Distributed Arrays for Resilience in Scientific Applications: Global View Resilience

Teranishi, Keita; Heroux, Michael A.; Hoemmen, Mark F.; Chien, Andrew; Balaji, Pavan; Beckman, Pete; Dun, Nan; Fang, Aiman; Fujita, Hajime; Iskra, Kamil; Rubenstein, Zachary; Zheng, Zimming; Schreiber, Robert; Hammond, Jeff; Dinan, James; Laguna, Ignacio; Richards, David; Dubey, Anshu; Van Straalen, Brian; Siegel, Andrew

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

The cost of reliability: Iterative linear solvers and reactive fault tolerance

Foulk, James W.; Hoemmen, Mark F.; Mueller, Frank

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

Preconditioning Communication-Avoiding Krylov Methods

Rajamanickam, Sivasankaran; Yamazaki, Ichitaro; Boman, Erik G.; Hoemmen, Mark F.; Heroux, Michael A.; Tomov, Stanimire; Dongarra, Jack

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

A Numerical Soft Fault Model for Iterative Linear Solvers

Foulk, James W.; Mueller, Frank; Hoemmen, Mark F.

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

Exploiting data representation for fault tolerance

Journal of Computational Science

Foulk, James W.; Hoemmen, Mark F.; Mueller, F.

Incorrect computer hardware behavior may corrupt intermediate computations in numerical algorithms, possibly resulting in incorrect answers. Prior work models misbehaving hardware by randomly flipping bits in memory. We start by accepting this premise, and present an analytic model for the error introduced by a bit flip in an IEEE 754 floating-point number. We then relate this finding to the linear algebra concepts of normalization and matrix equilibration. In particular, we present a case study illustrating that normalizing both vector inputs of a dot product minimizes the probability of a single bit flip causing a large error in the dot product's result. Moreover, the absolute error is either less than one or very large, which allows detection of large errors. Then, we apply this to the GMRES iterative solver. We count all possible errors that can be introduced through faults in arithmetic in the computationally intensive orthogonalization phase of GMRES, and show that when the matrix is equilibrated, the absolute error is bounded above by one.

TYPE Journal Article YEAR 2015

DOI OSTI
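
The bit-flip fault model that the abstract above takes as its premise can be explored with a short sketch. This is not the authors' analytic model or code; the helper names `flip_bit` and `flip_errors` are hypothetical, and the sketch only measures the absolute error that a single flipped bit introduces into one IEEE 754 double.

```python
import math
import struct


def flip_bit(x: float, bit: int) -> float:
    """Return x with one bit of its IEEE 754 binary64 encoding flipped.

    Bit 0 is the least significant mantissa bit, bits 52-62 hold the
    exponent, and bit 63 is the sign.
    """
    if not 0 <= bit < 64:
        raise ValueError("bit must be in range(64)")
    # Reinterpret the double as a 64-bit unsigned integer, flip, convert back.
    (encoded,) = struct.unpack("<Q", struct.pack("<d", x))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", encoded ^ (1 << bit)))
    return flipped


def flip_errors(x: float) -> list:
    """Absolute error |flip_bit(x, b) - x| for each of the 64 bit positions."""
    return [abs(flip_bit(x, b) - x) for b in range(64)]


if __name__ == "__main__":
    # Compare a value inside the unit interval with a larger one.
    for x in (0.5, 1000.0):
        finite = [e for e in flip_errors(x) if math.isfinite(e)]
        print(x, "largest finite error:", max(finite))
```

For example, flipping bit 63 of 0.5 gives -0.5, while flipping bit 62 (the top exponent bit) gives 2**1023, so the error from a single flip depends strongly on which bit is hit and on the magnitude of the stored value.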