Center for Computing Research (CCR)

1541 L2 Milestone: Thread Scalable Expression Assembly in Aria

Clausen, Jonathan C.; Brunini, Victor B.; Forster, Christopher J.; Noble, David R.; Trott, Christian R.; Hammond, Simon D.; Hoemmen, Mark F.; Lin, Paul L.

Abstract not provided.

TYPE Presentation YEAR 2017

OSTI

A free function linear algebra interface based on the BLAS

Caday, Peter C.; Hoemmen, Mark F.; Hollman, David S.; Liber, Nevin L.; Lo, Li-Ta L.; Lopez, Graham L.; Luszczek, Piotr L.; Knepper, Sarah K.; Trott, Christian R.

Abstract not provided.

TYPE Conference Poster YEAR 2019

OSTI

A numerical soft fault model for iterative linear solvers

HPDC 2015 - Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing

Elliott, James J.; Hoemmen, Mark F.; Mueller, Frank

We present a fault model designed to bring out the "worst" in iterative solvers based on mathematical properties. Our model introduces substantially higher overhead, but smaller variance, than a fault model based on random bit flips. We also relate the statistics from our experiments back to the solvers' configuration, and briefly address the computational effort that each model requires. Our approach requires significantly fewer resources, while punishing our solvers with undetectable errors that require notable overhead for recovery. This work also illustrates the robustness of our resilient algorithms: not only do we make forward progress in the presence of pathological faults, but we also always obtain the correct answer.

TYPE Conference Poster YEAR 2015

Scopus OSTI

A Tutorial on Anasazi and Belos

Thornquist, Heidi K.; Hoemmen, Mark F.; Heroux, Michael A.; Lehoucq, Richard B.; Parks, Michael L.; Day, David M.

Abstract not provided.

TYPE Presentation YEAR 2011

OSTI

Beyond Moore's Law and Implications for Computing in Space

DeBenedictis, Erik; Cook, Jeanine C.; Metodi, Tzvetan S.; Hoemmen, Mark F.; Marinella, Matthew J.; Schiek, Richard S.; Zima, Hans Z.

Abstract not provided.

TYPE Presentation YEAR 2015

OSTI

Communication-avoiding & pipelined Krylov solvers in Trilinos

Yamazaki, Ichitaro Y.; Hoemmen, Mark F.; Boman, Erik G.; Dongarra, Jack D.

Abstract not provided.

TYPE Conference Poster YEAR 2019

OSTI

Cooperative application/OS DRAM fault recovery

Hoemmen, Mark F.; Ferreira, Kurt; Heroux, Michael A.; Brightwell, Ronald B.

Exascale systems will present considerable fault-tolerance challenges to applications and system software. These systems are expected to suffer several hard and soft errors per day. Unfortunately, many fault-tolerance methods in use, such as rollback recovery, are unsuitable for many expected errors, for example DRAM failures. As a result, applications will need to address these resilience challenges to more effectively utilize future systems. In this paper, we describe work on a cross-layer application/OS framework to handle uncorrected memory errors. We illustrate the use of this framework through its integration with a new fault-tolerant iterative solver within the Trilinos library, and present initial convergence results.

TYPE SAND Report YEAR 2012

OSTI DOI

Cooperative Application/OS DRAM Fault Recovery

Hoemmen, Mark F.; Ferreira, Kurt; Heroux, Michael A.; Brightwell, Ronald B.

Abstract not provided.

TYPE Conference YEAR 2011

OSTI

Domain Decomposition Preconditioners for Communication-Avoiding Krylov Methods on a Hybrid CPU/GPU Cluster

International Conference for High Performance Computing, Networking, Storage and Analysis, SC

Yamazaki, Ichitaro; Rajamanickam, Sivasankaran R.; Boman, Erik G.; Hoemmen, Mark F.; Heroux, Michael A.; Tomov, Stanimire

Krylov subspace projection methods are widely used iterative methods for solving large-scale linear systems of equations. Researchers have demonstrated that communication-avoiding (CA) techniques can improve the performance of Krylov methods on modern computers, where communication is becoming increasingly expensive compared to arithmetic operations. In this paper, we extend these studies with two major contributions. First, we present our implementation of a CA variant of the Generalized Minimum Residual (GMRES) method, called CA-GMRES, for solving nonsymmetric linear systems of equations on a hybrid CPU/GPU cluster. Our performance results on up to 120 GPUs show that CA-GMRES gives a speedup of up to 2.5x in total solution time over standard GMRES on a hybrid cluster with twelve Intel Xeon CPUs and three Nvidia Fermi GPUs on each node. Second, we outline a domain decomposition framework to introduce a family of preconditioners that are suitable for CA Krylov methods. Our preconditioners do not incur any additional communication and allow easy reuse of existing algorithms and software for the subdomain solves. Experimental results on the hybrid CPU/GPU cluster demonstrate that CA-GMRES with preconditioning achieves a speedup of up to 7.4x over CA-GMRES without preconditioning, and a speedup of up to 1.7x over GMRES with preconditioning, in total solution time. These results confirm the potential of our framework to develop a practical and effective preconditioned CA Krylov method.

TYPE Conference YEAR 2014

Scopus OSTI
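
The CA-GMRES entry above rests on two building blocks commonly associated with s-step (communication-avoiding) Krylov methods: generating several basis vectors with back-to-back matrix-vector products, then orthogonalizing them as one block. The sketch below illustrates only those two kernels, not the authors' GPU implementation. Assumptions, stated plainly: it is plain Python on a small dense matrix; it uses the monomial basis [v, Av, ..., A^s v] (practical codes often prefer better-conditioned Newton or Chebyshev bases); and it orthogonalizes with Cholesky QR, which the abstract does not confirm as the authors' choice. All function names are hypothetical.

```python
import math

def matvec(A, x):
    """Dense y = A x; stands in for the sparse matrix-vector product (SpMV)."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(x, y):
    """Inner product; in a distributed run each call would be a global reduction."""
    return sum(a * b for a, b in zip(x, y))

def matrix_powers_basis(A, v, s):
    """Monomial s-step basis [v, Av, ..., A^s v], built with s SpMVs and no inner products."""
    basis = [list(v)]
    for _ in range(s):
        basis.append(matvec(A, basis[-1]))
    return basis

def cholesky_qr(V):
    """Orthonormalize the vectors in V as one block: V = Q R via G = V^T V = R^T R.

    Forming G is the only step that needs inner products, so a parallel code
    would perform one reduction for the whole block instead of one per vector.
    Assumes the vectors in V are linearly independent (G positive definite).
    """
    k = len(V)
    G = [[dot(V[i], V[j]) for j in range(k)] for i in range(k)]
    R = [[0.0] * k for _ in range(k)]
    for j in range(k):
        # Diagonal entry of the upper-triangular Cholesky factor.
        R[j][j] = math.sqrt(G[j][j] - sum(R[p][j] ** 2 for p in range(j)))
        # Remaining entries of row j of R.
        for i in range(j + 1, k):
            R[j][i] = (G[j][i] - sum(R[p][j] * R[p][i] for p in range(j))) / R[j][j]
    # Q = V R^{-1}, solved vector by vector (forward substitution).
    Q = []
    for j in range(k):
        q = list(V[j])
        for p in range(j):
            q = [qi - R[p][j] * qp for qi, qp in zip(q, Q[p])]
        Q.append([qi / R[j][j] for qi in q])
    return Q, R

A = [[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 4.0]]
V = matrix_powers_basis(A, [1.0, 0.0, 0.0], s=2)
Q, R = cholesky_qr(V)
```

The design trade-off is the usual one for CA methods: the per-vector inner-product reductions of standard GMRES orthogonalization collapse into the single reduction that forms G, but Cholesky QR loses orthogonality as the basis becomes ill-conditioned, which is one reason the monomial basis is rarely used for large s.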

Domain Decomposition Preconditioners for Communication-Avoiding Krylov Methods on Distributed GPUs

Boman, Erik G.; Heroux, Michael A.; Hoemmen, Mark F.; Rajamanickam, Sivasankaran R.

Abstract not provided.

TYPE Conference YEAR 2014

OSTI

Dynamical System for Resilient Computing

Rothganger, Fredrick R.; Hoemmen, Mark F.; Phipps, Eric T.; Warrender, Christina E.

Abstract not provided.

TYPE Other Report YEAR 2016

OSTI DOI

Embedded Ensemble Propagation for Improving Performance Portability and Scalability of Uncertainty Quantification on Emerging Computational Architectures

Phipps, Eric T.; D'Elia, Marta D.; Edwards, Harold C.; Hoemmen, Mark F.; Hu, Jonathan J.; Rajamanickam, Sivasankaran R.

Abstract not provided.

TYPE Conference Poster YEAR 2016

OSTI

Employing Multiple Levels of Parallelism for CFD at Large Scales on Next Generation High-Performance Computing Platforms

Howard, Micah A.; Fisher, Travis C.; Hoemmen, Mark F.; Dinzl, Derek J.; Overfelt, James R.; Bradley, Andrew M.; Kim, Kyungjoo K.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

Enabling extreme-scale simulations with next-generation Trilinos for Sierra low Mach fluid application code

Lin, Paul L.; Rajamanickam, Sivasankaran R.; Siefert, Christopher S.; Bettencourt, Matthew T.; Cyr, Eric C.; Domino, Stefan P.; Fisher, Travis C.; Hoemmen, Mark F.; Hu, Jonathan J.; Phipps, Eric T.; Prokopenko, Andrey V.

Abstract not provided.

TYPE Conference YEAR 2013

OSTI

Enabling Low Mach Fluid Simulations Using Trilinos

Hu, Jonathan J.; Devine, Karen D.; Hoemmen, Mark F.; Lin, Paul L.; Rajamanickam, Sivasankaran R.; Roberts, Nathan V.; Siefert, Christopher S.; Trott, Christian R.; Prokopenko, Andrey P.

Abstract not provided.

TYPE Conference Poster YEAR 2017

OSTI

Exploiting data representation for fault tolerance

Journal of Computational Science

Elliott, James J.; Hoemmen, Mark F.; Mueller, Frank M.

Incorrect computer hardware behavior may corrupt intermediate computations in numerical algorithms, possibly resulting in incorrect answers. Prior work models misbehaving hardware by randomly flipping bits in memory. We start by accepting this premise, and present an analytic model for the error introduced by a bit flip in an IEEE 754 floating-point number. We then relate this finding to the linear algebra concepts of normalization and matrix equilibration. In particular, we present a case study illustrating that normalizing both vector inputs of a dot product minimizes the probability of a single bit flip causing a large error in the dot product's result. Moreover, the absolute error is either less than one or very large, which allows detection of large errors. Then, we apply this to the GMRES iterative solver. We count all possible errors that can be introduced through faults in arithmetic in the computationally intensive orthogonalization phase of GMRES, and show that when the matrix is equilibrated, the absolute error is bounded above by one.

TYPE Journal Article YEAR 2015

OSTI DOI
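
The bit-flip premise in "Exploiting data representation for fault tolerance" can be explored directly. The sketch below is not the paper's analytic model; it simply enumerates all 64 single-bit flips of one IEEE 754 binary64 value and reports the absolute error of each, assuming Python's float is binary64 (true on mainstream platforms). The helper names flip_bit and flip_errors are hypothetical.

```python
import struct

def flip_bit(x: float, bit: int) -> float:
    """Return x with one bit of its IEEE 754 binary64 encoding flipped.

    Bit 0 is the least significant mantissa bit, bits 52-62 hold the
    exponent, and bit 63 is the sign.
    """
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", bits ^ (1 << bit)))
    return flipped

def flip_errors(x: float) -> list[float]:
    """Absolute error |flip_bit(x, b) - x| for every bit position b = 0..63."""
    return [abs(flip_bit(x, b) - x) for b in range(64)]

errs = flip_errors(0.5)
large = [b for b, e in enumerate(errs) if e > 1.0]
print(large)                              # [62]
print(max(e for e in errs if e <= 1.0))   # 1.0
```

For x = 0.5, every flip except the top exponent bit yields an absolute error of at most 1 (the sign flip gives exactly 1), while flipping bit 62 produces 2^1023. For this one value, that is consistent with the "small or very large" dichotomy the abstract describes for normalized inputs.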

Fault-tolerant iterative methods via selective reliability

Ferreira, Kurt; Heroux, Michael A.; Hoemmen, Mark F.

Abstract not provided.

TYPE Conference YEAR 2012

OSTI

Fault-tolerant iterative methods via selective reliability

Hoemmen, Mark F.; Heroux, Michael A.; Ferreira, Kurt

Abstract not provided.

TYPE Conference YEAR 2011

OSTI

Ifpack2 User's Guide 1.0

Prokopenko, Andrey V.; Siefert, Christopher S.; Hu, Jonathan J.; Hoemmen, Mark F.; Klinvex, Alicia M.

This is the definitive user manual for the Ifpack2 package in the Trilinos project. Ifpack2 provides implementations of iterative algorithms (e.g., Jacobi, SOR, additive Schwarz) and processor-based incomplete factorizations. Ifpack2 is part of the Trilinos Tpetra solver stack, is templated on index, scalar, and node types, and leverages node-level parallelism indirectly through its use of Tpetra kernels. Ifpack2 can be used to solve matrix systems with more than 2 billion rows (using 64-bit indices). Any options not documented in this manual should be considered strictly experimental.

TYPE SAND Report YEAR 2016

OSTI DOI
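
To make the first algorithm named in the Ifpack2 User's Guide entry concrete, here is a textbook Jacobi iteration in plain Python. It is a minimal sketch on a dense list-of-lists matrix, assuming every diagonal entry is nonzero; it does not use or mimic Ifpack2's C++ interface, and the function name jacobi is hypothetical.

```python
def jacobi(A, b, x0, sweeps):
    """Run `sweeps` Jacobi iterations on A x = b, starting from x0.

    Each sweep computes x_new[i] = (b[i] - sum_{j != i} A[i][j] * x[j]) / A[i][i]
    using only the previous iterate, so every row can be updated independently
    (the property that makes Jacobi attractive for node-level parallelism).
    Assumes every diagonal entry A[i][i] is nonzero.
    """
    n = len(b)
    x = list(x0)
    for _ in range(sweeps):
        # The comprehension reads only the old x; x is rebound afterwards.
        x = [
            (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)
        ]
    return x

# Small diagonally dominant example; the exact solution is (1/11, 7/11).
x = jacobi([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0], [0.0, 0.0], sweeps=50)
```

Because each update uses only the previous iterate, Jacobi is easy to parallelize but typically converges more slowly than SOR, which uses freshly updated entries within a sweep.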