Publications Search

Towards Architecture Aware Performance Portable Finite Element Code

Demeshko, Irina; Edwards, Harold C.; Heroux, Michael A.; Pawlowski, Roger; Phipps, Eric T.; Salinger, Andrew G.; Trott, Christian R.

Abstract not provided.

TYPE Conference Poster YEAR 2014

OSTI

Toward local failure local recovery resilience model using MPI-ULFM

ACM International Conference Proceeding Series

Teranishi, Keita; Heroux, Michael A.

The current system reaction to the loss of a single MPI process is to kill all the remaining processes and restart the application from the most recent checkpoint. This approach will become unfeasible for future extreme scale systems. We address this issue using an emerging resilient computing model called Local Failure Local Recovery (LFLR) that provides application developers with the ability to recover locally and continue application execution when a process is lost. We discuss the design of our software framework to enable the LFLR model using MPI-ULFM and demonstrate the resilient version of MiniFE that achieves a scalable recovery from process failures.

TYPE Conference Poster YEAR 2014

DOI OSTI Scopus

Toward Local Failure Local Recovery (LFLR) Resilience Model Using MPI-ULFM

Heroux, Michael A.

Abstract not provided.

TYPE Presentation YEAR 2014

OSTI

Local Recovery of PDE Solvers from Hard Failures

Teranishi, Keita; Heroux, Michael A.; Gamell Balmana, Marc; Parashar, Manish

Abstract not provided.

TYPE Presentation YEAR 2014

OSTI

Toward Local Failure Local Recovery (LFLR) Resilience Model Using MPI-ULFM

Teranishi, Keita; Heroux, Michael A.

Abstract not provided.

TYPE Presentation YEAR 2014

OSTI

Local Recovery of PDE Solvers from Hard Failures

Teranishi, Keita; Gamell Balmana, Marc; Heroux, Michael A.; Parashar, Manish R.

Abstract not provided.

TYPE Conference Poster YEAR 2014

OSTI

A performance-portable implementation of the Albany ice sheet model: Kokkos approach

Demeshko, Irina; Edwards, Harold C.; Heroux, Michael A.; Phipps, Eric T.; Salinger, Andrew G.

Abstract not provided.

TYPE Presentation YEAR 2014

OSTI

Report for the ASC CSSE L2 Milestone (4873) - Demonstration of Local Failure Local Recovery Resilient Programming Model

Heroux, Michael A.; Teranishi, Keita

Recovery from process loss during the execution of a distributed-memory parallel application is presently achieved by restarting the program, typically from a checkpoint file. Future computer system trends indicate that the size of data to checkpoint, the lack of improvement in parallel file system performance, and the increase in process failure rates will lead to situations where checkpoint/restart becomes infeasible. In this report we describe and prototype the use of a new application-level resilient computing model that manages persistent storage of local state for each process such that, if a process fails, recovery can be performed locally without requiring access to a global checkpoint file. This Local Failure Local Recovery (LFLR) model provides application developers with the ability to recover locally and continue application execution when a process is lost. This report discusses what features are required from the hardware, OS and runtime layers, and what approaches application developers might use in the design of future codes, including a demonstration of an LFLR-enabled MiniFE code from the Mantevo mini-application suite.
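
The LFLR idea summarized above can be illustrated with a toy, single-process Python simulation. It is a sketch under stated assumptions, not the report's implementation: the ring "buddy" placement, persisting after every iteration, and all names (`Rank`, `persist`, `recover`, `run`) are hypothetical, and no MPI or MPI-ULFM calls are made. A lost rank is replaced and restored from the copy its buddy holds in memory, while surviving ranks keep their state and no global checkpoint file is read.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Rank:
    """One simulated process: its live local state plus copies it holds for other ranks."""
    rank_id: int
    local: List[float]
    buddy_store: Dict[int, List[float]] = field(default_factory=dict)


def buddy_of(r: int, nranks: int) -> int:
    """Hypothetical ring placement: rank r persists its state on rank r+1."""
    return (r + 1) % nranks


def persist(ranks: List[Rank]) -> None:
    """Each rank copies its local state into its buddy's memory (no global file)."""
    n = len(ranks)
    for rk in ranks:
        ranks[buddy_of(rk.rank_id, n)].buddy_store[rk.rank_id] = list(rk.local)


def step(ranks: List[Rank]) -> None:
    """Stand-in for one iteration of purely local computation."""
    for rk in ranks:
        rk.local = [x + 1.0 for x in rk.local]


def recover(ranks: List[Rank], failed: int) -> None:
    """A spare process adopts the failed rank's id and restores only that rank's state."""
    saved = ranks[buddy_of(failed, len(ranks))].buddy_store[failed]
    ranks[failed] = Rank(failed, list(saved))  # survivors keep their in-memory state


def run(nranks: int, iters: int, fail_at: Optional[int] = None,
        fail_rank: int = 0) -> List[List[float]]:
    assert nranks >= 2, "a rank cannot be its own buddy"
    ranks = [Rank(r, [float(r), float(r)]) for r in range(nranks)]
    persist(ranks)
    for it in range(iters):
        if it == fail_at:
            ranks[fail_rank] = Rank(fail_rank, [])  # process lost: its memory is wiped
            recover(ranks, fail_rank)
        step(ranks)
        persist(ranks)  # persisting every iteration keeps restored state current
    return [rk.local for rk in ranks]


print(run(4, 5) == run(4, 5, fail_at=2, fail_rank=1))  # True
```

Persisting every iteration keeps the restored rank consistent with the survivors without any rollback; a code that persists less often would also need a local rollback or replay strategy, which this sketch omits.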

TYPE SAND Report YEAR 2014

DOI OSTI

Domain Decomposition Preconditioners for Communication-Avoiding Krylov Methods on Distributed GPUs

Boman, Erik G.; Heroux, Michael A.; Hoemmen, Mark F.; Rajamanickam, Sivasankaran

Abstract not provided.

TYPE Conference YEAR 2014

OSTI

Domain Decomposition Preconditioners for Communication-Avoiding Krylov Methods on a Hybrid CPU/GPU Cluster

International Conference for High Performance Computing, Networking, Storage and Analysis, SC

Yamazaki, Ichitaro; Rajamanickam, Sivasankaran; Boman, Erik G.; Hoemmen, Mark F.; Heroux, Michael A.; Tomov, Stanimire

Krylov subspace projection methods are widely used iterative methods for solving large-scale linear systems of equations. Researchers have demonstrated that communication-avoiding (CA) techniques can improve Krylov methods' performance on modern computers, where communication is becoming increasingly expensive compared to arithmetic operations. In this paper, we extend these studies with two major contributions. First, we present our implementation of a CA variant of the Generalized Minimum Residual (GMRES) method, called CA-GMRES, for solving nonsymmetric linear systems of equations on a hybrid CPU/GPU cluster. Our performance results on up to 120 GPUs show that CA-GMRES gives a speedup of up to 2.5x in total solution time over standard GMRES on a hybrid cluster with twelve Intel Xeon CPUs and three Nvidia Fermi GPUs on each node. We then outline a domain decomposition framework to introduce a family of preconditioners that are suitable for CA Krylov methods. Our preconditioners do not incur any additional communication and allow the easy reuse of existing algorithms and software for the subdomain solves. Experimental results on the hybrid CPU/GPU cluster demonstrate that CA-GMRES with preconditioning achieves a speedup of up to 7.4x over CA-GMRES without preconditioning, and a speedup of up to 1.7x over GMRES with preconditioning in total solution time. These results confirm the potential of our framework to develop a practical and effective preconditioned CA Krylov method.
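
Two building blocks behind s-step methods such as CA-GMRES can be sketched in plain Python: generating s Krylov basis vectors at once (which a distributed code would do with a matrix-powers kernel, one ghost exchange instead of s), and orthonormalizing them with Cholesky QR (CholQR), which needs a single Gram matrix, i.e. one global reduction instead of one per vector. This is an illustrative sketch, not the authors' code; the dense list-of-lists storage, the monomial basis, and all function names are assumptions.

```python
import math
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(A: Matrix, x: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def s_step_basis(A: Matrix, v: Vector, s: int) -> List[Vector]:
    """Monomial basis v, Av, ..., A^s v (s+1 columns)."""
    basis = [v]
    for _ in range(s):
        basis.append(matvec(A, basis[-1]))
    return basis


def cholesky_upper(G: Matrix) -> Matrix:
    """Upper-triangular R with G = R^T R (G must be symmetric positive definite)."""
    n = len(G)
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        R[j][j] = math.sqrt(G[j][j] - sum(R[k][j] ** 2 for k in range(j)))
        for i in range(j + 1, n):
            R[j][i] = (G[j][i] - sum(R[k][j] * R[k][i] for k in range(j))) / R[j][j]
    return R


def chol_qr(cols: List[Vector]) -> List[Vector]:
    """Orthonormal columns Q with V = Q R, computed from one Gram matrix."""
    m = len(cols)
    # The only global reduction: all pairwise dot products in one pass.
    G = [[sum(a * b for a, b in zip(cols[i], cols[j])) for j in range(m)] for i in range(m)]
    R = cholesky_upper(G)
    Q: List[Vector] = []
    for j in range(m):  # solve Q R = V column by column
        q = list(cols[j])
        for k in range(j):
            q = [qi - R[k][j] * qk for qi, qk in zip(q, Q[k])]
        Q.append([qi / R[j][j] for qi in q])
    return Q


if __name__ == "__main__":
    n = 8
    A = [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0) for j in range(n)] for i in range(n)]
    Q = chol_qr(s_step_basis(A, [1.0] * n, 3))
    err = max(abs(sum(a * b for a, b in zip(Q[i], Q[j])) - (i == j))
              for i in range(4) for j in range(4))
    print(f"max |Q^T Q - I| = {err:.1e}")
```

CholQR squares the condition number of the basis, so practical s-step solvers commonly use better-conditioned (for example Newton or Chebyshev) bases and more stable or repeated orthogonalization; the monomial basis and plain CholQR are kept here only for brevity.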

TYPE Conference YEAR 2014

Scopus OSTI

System Software: A Necessary but Ill-prepared Hero

Heroux, Michael A.

Abstract not provided.

TYPE Conference YEAR 2013

OSTI

Toward the Next Generation of Parallel and Resilient Algorithms

Heroux, Michael A.

Abstract not provided.

TYPE Conference YEAR 2013

OSTI

Scalable Manycore Computing for Sparse Computation

Heroux, Michael A.

Abstract not provided.

TYPE Conference YEAR 2013

OSTI

HPCG Benchmark Technical Specification

Heroux, Michael A.

The High Performance Conjugate Gradient (HPCG) benchmark [cite SNL, UTK reports] is a tool for ranking computer systems based on a simple additive Schwarz, symmetric Gauss-Seidel preconditioned conjugate gradient solver. HPCG is similar to the High Performance Linpack (HPL), or Top 500, benchmark [1] in its purpose, but HPCG is intended to better represent how today’s applications perform. In this paper we describe the technical details of HPCG: how it is designed and implemented, what code transformations are permitted and how to interpret and report results.
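
The solver at the core of HPCG, conjugate gradient preconditioned by symmetric Gauss-Seidel, can be sketched in plain Python. This is not the HPCG reference code: it assumes a single subdomain (so the additive Schwarz layer reduces to one local forward-plus-backward sweep) and uses a 1D tridiagonal matrix in place of HPCG's 3D problem; all function names are hypothetical.

```python
import math
from typing import List, Tuple

Row = List[Tuple[int, float]]  # (column index, value) pairs of one sparse row


def laplacian_1d(n: int) -> List[Row]:
    """Tridiagonal [-1, 2, -1] matrix, a stand-in for HPCG's 3D stencil problem."""
    rows: List[Row] = []
    for i in range(n):
        row: Row = [(i, 2.0)]
        if i > 0:
            row.insert(0, (i - 1, -1.0))
        if i < n - 1:
            row.append((i + 1, -1.0))
        rows.append(row)
    return rows


def spmv(rows: List[Row], x: List[float]) -> List[float]:
    return [sum(a * x[j] for j, a in row) for row in rows]


def dot(u: List[float], v: List[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def symgs(rows: List[Row], r: List[float]) -> List[float]:
    """Apply M^{-1} r: one forward then one backward Gauss-Seidel sweep from z = 0."""
    z = [0.0] * len(rows)

    def relax(i: int) -> None:
        diag, s = 0.0, r[i]
        for j, a in rows[i]:
            if j == i:
                diag = a
            else:
                s -= a * z[j]
        z[i] = s / diag

    for i in range(len(rows)):            # forward sweep
        relax(i)
    for i in reversed(range(len(rows))):  # backward sweep
        relax(i)
    return z


def pcg(rows: List[Row], b: List[float], tol: float = 1e-10,
        max_iter: int = 500) -> Tuple[List[float], int]:
    """Preconditioned CG from x = 0; returns (x, iterations used)."""
    x = [0.0] * len(b)
    r = list(b)
    z = symgs(rows, r)
    p = list(z)
    rz = dot(r, z)
    norm_b = math.sqrt(dot(b, b))
    for k in range(1, max_iter + 1):
        ap = spmv(rows, p)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        if math.sqrt(dot(r, r)) <= tol * norm_b:  # relative residual test
            return x, k
        z = symgs(rows, r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


rows = laplacian_1d(50)
x, iters = pcg(rows, spmv(rows, [1.0] * 50))
print(iters, max(abs(xi - 1.0) for xi in x))
```

A forward sweep followed by a backward sweep makes the preconditioner symmetric positive definite for a symmetric positive definite matrix, which is what plain CG requires; a forward-only sweep would not have that property.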

TYPE SAND Report YEAR 2013

DOI OSTI

Building the Next Generation of Parallel and Resilient Applications and Libraries

Heroux, Michael A.

Abstract not provided.

TYPE Conference YEAR 2013

OSTI

The Mantevo Project Mini-applications: Vehicles for Co-Design

Barrett, Richard F.; Heroux, Michael A.

Abstract not provided.

TYPE Presentation YEAR 2013

OSTI

Co-Design Through Mini-Apps: Advising The Future of Hardware & Software

Hemstad, Jacob; Heroux, Michael A.; Hoekstra, Robert J.

Abstract not provided.

TYPE Presentation YEAR 2013

OSTI

Co-Design in Action: HPCCG and the Intel Phi

Hemstad, Jacob; Heroux, Michael A.; Hoekstra, Robert J.

Abstract not provided.

TYPE Presentation YEAR 2013

OSTI

Toward a New Metric for Ranking High Performance Computing Systems

Heroux, Michael A.

The High Performance Linpack (HPL), or Top 500, benchmark is the most widely recognized and discussed metric for ranking high performance computing systems. However, HPL is increasingly unreliable as a true measure of system performance for a growing collection of important science and engineering applications. In this paper we describe a new high performance conjugate gradient (HPCG) benchmark. HPCG is composed of computations and data access patterns more commonly found in applications. Using HPCG, we strive for a better correlation with real scientific application performance and expect to drive computer system design and implementation in directions that more directly improve application performance.
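
The application-like data access this abstract contrasts with HPL is dominated by sparse matrix-vector products. A minimal sketch, assuming compressed sparse row (CSR) storage and a 27-point stencil on an nx-by-ny-by-nz grid with 26 on the diagonal and -1 for each neighbor (the matrix HPCG is generally described as using; treat the exact values as an assumption). The function names are hypothetical.

```python
from typing import List, Tuple


def build_27pt_csr(nx: int, ny: int, nz: int) -> Tuple[List[int], List[int], List[float]]:
    """CSR arrays (row_ptr, col_idx, vals) for a 27-point stencil on a 3D grid."""
    row_ptr, col_idx, vals = [0], [], []
    for iz in range(nz):
        for iy in range(ny):
            for ix in range(nx):
                row = ix + nx * (iy + ny * iz)
                for dz in (-1, 0, 1):
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            jx, jy, jz = ix + dx, iy + dy, iz + dz
                            if 0 <= jx < nx and 0 <= jy < ny and 0 <= jz < nz:
                                col = jx + nx * (jy + ny * jz)
                                col_idx.append(col)
                                vals.append(26.0 if col == row else -1.0)
                row_ptr.append(len(col_idx))
    return row_ptr, col_idx, vals


def spmv_csr(row_ptr: List[int], col_idx: List[int], vals: List[float],
             x: List[float]) -> List[float]:
    """y = A x; each nonzero loads a value, an index, and an indirectly addressed x entry."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        s = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            s += vals[k] * x[col_idx[k]]  # irregular, index-driven load of x
        y[i] = s
    return y


row_ptr, col_idx, vals = build_27pt_csr(3, 3, 3)
y = spmv_csr(row_ptr, col_idx, vals, [1.0] * 27)
print(len(vals), y[13], y[0])  # 343 0.0 19.0
```

Each stored nonzero contributes two floating-point operations but requires loading a matrix value, a column index, and an indirectly addressed entry of `x`, which is why kernels like this are typically limited by memory bandwidth rather than arithmetic throughput.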

TYPE SAND Report YEAR 2013

DOI OSTI

Trilinos Developers SQE Guide: ASC Software Quality Engineering Practices Version 3.0

Willenbring, James M.; Heroux, Michael A.

The Trilinos Project is an effort to develop algorithms and enabling technologies within an object-oriented software framework for the solution of large-scale, complex multi-physics engineering and scientific problems. A new software capability is introduced into Trilinos as a package. A Trilinos package is an integral unit and, although there are exceptions such as utility packages, each package is typically developed by a small team of experts in a particular algorithms area such as algebraic preconditioners, nonlinear solvers, etc. The Trilinos Developers SQE Guide is a resource for Trilinos package developers who are working under Advanced Simulation and Computing (ASC) and are therefore subject to the ASC Software Quality Engineering Practices as described in the Sandia National Laboratories Advanced Simulation and Computing (ASC) Software Quality Plan: ASC Software Quality Engineering Practices Version 3.0 document. The Trilinos Developer Policies webpage contains detailed information that is essential for all Trilinos developers. The Trilinos Software Lifecycle Model defines the default lifecycle model for Trilinos packages and provides a context for many of the practices listed in this document.

TYPE SAND Report YEAR 2013

DOI OSTI

Toward Effective Parallel Programming: What We Need and Don't Need

Heroux, Michael A.

Abstract not provided.

TYPE Conference YEAR 2013

OSTI

Toward Resilient Algorithms and Applications

Heroux, Michael A.

Abstract not provided.

TYPE Conference YEAR 2013

DOI OSTI

Next-generation programming models: What we need and do not need

Hoemmen, Mark F.; Heroux, Michael A.

Abstract not provided.

TYPE Conference YEAR 2013

OSTI

Experiences with Xeon Phi

Hammond, Simon; Rajamanickam, Sivasankaran; Ang, James A.; Barrett, Richard F.; Doerfler, Douglas W.; Heroux, Michael A.; Laros, James H.

Abstract not provided.

TYPE Conference YEAR 2013

OSTI

Navigating an evolutionary fast path to exascale

Proceedings - 2012 SC Companion: High Performance Computing, Networking Storage and Analysis, SCC 2012

Barrett, Richard F.; Hammond, Simon; Vaughan, Courtenay T.; Doerfler, Douglas W.; Heroux, Michael A.

The computing community is in the midst of a disruptive architectural change. The advent of manycore and heterogeneous computing nodes forces us to reconsider every aspect of the system software and application stack. To address this challenge there is a broad spectrum of approaches, which we roughly classify as either revolutionary or evolutionary. With the former, the entire code base is re-written, perhaps using a new programming language or execution model. The latter, which is the focus of this work, seeks a piecewise path of effective incremental change. The end effect of our approach will be revolutionary in that the control structure of the application will be markedly different in order to utilize single-instruction multiple-data/thread (SIMD/SIMT), manycore and heterogeneous nodes, but the physics code fragments will be remarkably similar. Our approach is guided by a set of mission driven applications and their proxies, focused on balancing performance potential with the realities of existing application code bases. Although the specifics of this process have not yet converged, we find that there are several important steps that developers of scientific and engineering application programs can take to prepare for making effective use of these challenging platforms. Aiding an evolutionary approach is the recognition that the performance potential of the architectures is, in a meaningful sense, an extension of existing capabilities: vectorization, threading, and a re-visiting of node interconnect capabilities. Therefore, as architectures, programming models, and programming mechanisms continue to evolve, the preparations described herein will provide significant performance benefits on existing and emerging architectures.

TYPE Conference YEAR 2012

OSTI Scopus
