Cooperative application/OS DRAM fault recovery

Hoemmen, Mark F.; Ferreira, Kurt; Heroux, Michael A.; Brightwell, Ronald B.

Exascale systems will present considerable fault-tolerance challenges to applications and system software. These systems are expected to suffer several hard and soft errors per day. Unfortunately, many fault-tolerance methods in use today, such as rollback recovery, are unsuitable for many of the expected errors, DRAM failures among them. As a result, applications will need to address these resilience challenges themselves in order to use future systems more effectively. In this paper, we describe work on a cross-layer application/OS framework for handling uncorrected memory errors. We illustrate the framework through its integration with a new fault-tolerant iterative solver in the Trilinos library, and we present initial convergence results.

TYPE SAND Report YEAR 2012

Proposal for a future-scalable linear algebra interface for Krylov methods

Hoemmen, Mark F.

Abstract not provided.

TYPE Conference YEAR 2012

Effective and Efficient Handling of Ill-Conditioned Correlation Matrices in Kriging and Gradient Enhanced Kriging Emulators Through Pivoted Cholesky Factorization

Dalbey, Keith D.; Day, David M.; Hoemmen, Mark F.

Abstract not provided.

TYPE Conference YEAR 2012

Fault-tolerant iterative methods via selective reliability

Ferreira, Kurt; Heroux, Michael A.; Hoemmen, Mark F.

Abstract not provided.

TYPE Conference YEAR 2012

Communication-Avoiding GMRES Implementation Issues

Hoemmen, Mark F.

Abstract not provided.

TYPE Conference YEAR 2012

Fault-tolerant iterative methods via selective reliability

Hoemmen, Mark F.; Heroux, Michael A.; Ferreira, Kurt

Abstract not provided.

TYPE Conference YEAR 2011

Copy of Next-generation iterative solvers for next-generation computing: Anasazi and Belos

Hoemmen, Mark F.

Abstract not provided.

TYPE Conference YEAR 2011

Next-generation iterative solvers for next-generation computing: Anasazi and Belos

Hoemmen, Mark F.

Abstract not provided.

TYPE Conference YEAR 2011

A Tutorial on Anasazi and Belos

Thornquist, Heidi K.; Hoemmen, Mark F.; Heroux, Michael A.; Lehoucq, Richard B.; Parks, Michael L.; Day, David M.

Abstract not provided.

TYPE Presentation YEAR 2011

An overview of Trilinos

Hoemmen, Mark F.

Abstract not provided.

TYPE Conference YEAR 2011

Architecture-aware algorithms for extreme-scale computing

Hoemmen, Mark F.

Abstract not provided.

TYPE Conference YEAR 2011

Cooperative Application/OS DRAM Fault Recovery

Hoemmen, Mark F.; Ferreira, Kurt; Heroux, Michael A.; Brightwell, Ronald B.

Abstract not provided.

TYPE Conference YEAR 2011

A communication-avoiding hybrid-parallel rank-revealing orthogonalization

Hoemmen, Mark F.

Abstract not provided.

TYPE Conference YEAR 2011

A communication-avoiding hybrid-parallel rank-revealing orthogonalization method

Hoemmen, Mark F.

Abstract not provided.

TYPE Conference YEAR 2011

A communication-avoiding, hybrid-parallel, rank-revealing orthogonalization method

Hoemmen, Mark F.

Orthogonalization consumes much of the run time of many iterative methods for solving sparse linear systems and eigenvalue problems. Commonly used algorithms, such as variants of Gram-Schmidt or Householder QR, have performance dominated by communication. Here, 'communication' includes both data movement between the CPU and memory, and messages between processors in parallel. Our Tall Skinny QR (TSQR) family of algorithms requires asymptotically fewer messages between processors and data movement between CPU and memory than typical orthogonalization methods, yet achieves the same accuracy as Householder QR factorization. Furthermore, in block orthogonalizations, TSQR is faster and more accurate than existing approaches for orthogonalizing the vectors within each block ('normalization'). TSQR's rank-revealing capability also makes it useful for detecting deflation in block iterative methods, for which existing approaches sacrifice performance, accuracy, or both. We have implemented a version of TSQR that exploits both distributed-memory and shared-memory parallelism, and supports real and complex arithmetic. Our implementation is optimized for the case of orthogonalizing a small number (5-20) of very long vectors. The shared-memory parallel component uses Intel's Threading Building Blocks, though its modular design supports other shared-memory programming models as well, including computation on the GPU. Our implementation achieves speedups of 2 times or more over competing orthogonalizations. It is available now in the development branch of the Trilinos software package, and will be included in the 10.8 release.
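
As a rough illustration of the idea behind TSQR described in the abstract above, the following pure-Python sketch factors a tall, skinny matrix by splitting it into row blocks, factoring each block independently, and then factoring the stacked R factors once. It is not the Trilinos implementation or its API: the names `householder_qr`, `matmul`, and `tsqr` are hypothetical, the reduction is a single flat step rather than a parallel tree, everything runs serially, the local factorization forms a full square Q for simplicity, and the rank-revealing step is omitted.

```python
import math


def matmul(A, B):
    """Dense product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def householder_qr(A):
    """Householder QR of an m-by-n matrix with m >= n.

    Returns the thin factors (Q, R): Q is m-by-n with orthonormal
    columns and R is n-by-n upper triangular, so that A = Q R.
    A full m-by-m Q is accumulated internally, which keeps the sketch
    short but is wasteful for large m.
    """
    m, n = len(A), len(A[0])
    R = [list(map(float, row)) for row in A]
    Q = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for k in range(n):
        x = [R[i][k] for i in range(k, m)]
        norm = math.sqrt(sum(t * t for t in x))
        if norm == 0.0:
            continue  # column is already zero below the diagonal
        v = x[:]
        v[0] += math.copysign(norm, x[0])  # sign choice avoids cancellation
        vv = sum(t * t for t in v)
        # R <- H R, where H = I - 2 v v^T / (v^T v)
        for j in range(n):
            s = 2.0 * sum(v[i] * R[k + i][j] for i in range(len(v))) / vv
            for i in range(len(v)):
                R[k + i][j] -= s * v[i]
        # Q <- Q H, so that Q accumulates H_1 H_2 ... H_n
        for r in range(m):
            s = 2.0 * sum(Q[r][k + i] * v[i] for i in range(len(v))) / vv
            for i in range(len(v)):
                Q[r][k + i] -= s * v[i]
    return [row[:n] for row in Q], [R[i][:n] for i in range(n)]


def tsqr(A, num_blocks):
    """One-level TSQR sketch: A = Q R using independent row blocks.

    Assumes every row block has at least as many rows as A has columns.
    """
    m, n = len(A), len(A[0])
    rows = m // num_blocks
    if rows < n:
        raise ValueError("each row block needs at least n rows")
    # Split A into row blocks; the last block absorbs any remainder.
    bounds = [i * rows for i in range(num_blocks)] + [m]
    blocks = [A[bounds[i]:bounds[i + 1]] for i in range(num_blocks)]
    # Step 1: factor each block on its own (no data shared between blocks).
    local = [householder_qr(block) for block in blocks]
    # Step 2: stack the small R factors and factor them once.
    stacked = [row for _, R_local in local for row in R_local]
    Q2, R = householder_qr(stacked)
    # Step 3: combine each local Q with its n-by-n slice of Q2.
    Q = []
    for b, (Q_local, _) in enumerate(local):
        Q.extend(matmul(Q_local, Q2[b * n:(b + 1) * n]))
    return Q, R


if __name__ == "__main__":
    A = [[math.sin(1.0 + 3 * i + j) for j in range(3)] for i in range(12)]
    Q, R = tsqr(A, num_blocks=4)
    QR = matmul(Q, R)
    err = max(abs(QR[i][j] - A[i][j]) for i in range(12) for j in range(3))
    print("max reconstruction error:", err)
```

In this sketch only the small n-by-n R factors cross block boundaries in step 2, which is the property that lets a distributed version exchange far less data than a column-by-column orthogonalization of the full matrix.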

TYPE Conference YEAR 2010
