Publications

Results 76–100 of 199

Toward local failure local recovery resilience model using MPI-ULFM

ACM International Conference Proceeding Series

Teranishi, Keita; Heroux, Michael A.

The current system reaction to the loss of a single MPI process is to kill all the remaining processes and restart the application from the most recent checkpoint. This approach will become infeasible for future extreme-scale systems. We address this issue using an emerging resilient computing model called Local Failure Local Recovery (LFLR) that provides application developers with the ability to recover locally and continue application execution when a process is lost. We discuss the design of our software framework to enable the LFLR model using MPI-ULFM and demonstrate a resilient version of MiniFE that achieves scalable recovery from process failures.
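
To make the recovery path concrete, the minimal C++ sketch below shows the ULFM-style control flow such a framework builds on: an MPI error handler set to return errors, a shrink of the communicator when a rank fails, and local restoration of state. It assumes a ULFM-enabled MPI (Open MPI-style MPIX_* extensions), and restore_local_state() is a hypothetical helper; this illustrates the generic ULFM API, not the paper's actual framework.

    // Minimal sketch of ULFM-style local recovery. Assumes an MPI library built
    // with the ULFM extensions (MPIX_Comm_shrink, MPIX_ERR_PROC_FAILED, ...).
    #include <mpi.h>
    #include <mpi-ext.h>   // ULFM prototypes in Open MPI-style builds

    void restore_local_state();   // hypothetical: reload this rank's locally persisted state

    MPI_Comm recover(MPI_Comm comm)
    {
        MPI_Comm newcomm;
        MPIX_Comm_revoke(comm);             // interrupt pending communication on the old comm
        MPIX_Comm_shrink(comm, &newcomm);   // build a communicator without the failed ranks
        MPI_Comm_free(&comm);
        restore_local_state();              // recover locally, no global checkpoint read
        return newcomm;
    }

    void solver_step(MPI_Comm& comm)
    {
        MPI_Comm_set_errhandler(comm, MPI_ERRORS_RETURN);
        int rc = MPI_Barrier(comm);         // stand-in for the solver's communication
        if (rc != MPI_SUCCESS) {
            int eclass;
            MPI_Error_class(rc, &eclass);
            if (eclass == MPIX_ERR_PROC_FAILED || eclass == MPIX_ERR_REVOKED)
                comm = recover(comm);       // continue execution instead of aborting
        }
    }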

Report for the ASC CSSE L2 Milestone (4873) - Demonstration of Local Failure Local Recovery Resilient Programming Model

Heroux, Michael A.; Teranishi, Keita

Recovery from process loss during the execution of a distributed memory parallel application is presently achieved by restarting the program, typically from a checkpoint file. Future computer system trends indicate that the size of data to checkpoint, the lack of improvement in parallel file system performance and the increase in process failure rates will lead to situations where checkpoint restart becomes infeasible. In this report we describe and prototype the use of a new application-level resilient computing model, Local Failure Local Recovery (LFLR), that manages persistent storage of local state for each process such that, if a process fails, recovery can be performed locally without requiring access to a global checkpoint file. LFLR provides application developers with the ability to recover locally and continue application execution when a process is lost. This report discusses what features are required from the hardware, OS and runtime layers, and what approaches application developers might use in the design of future codes, including a demonstration of the LFLR-enabled MiniFE code from the Mantevo mini-application suite.
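
As one generic illustration of persistent local storage (not the design described in the report), the sketch below mirrors each rank's state onto a partner rank with a single MPI_Sendrecv, so a replacement process could recover its data from a buddy instead of reading a global checkpoint file.

    // Illustrative buddy-store sketch: each rank keeps a copy of a partner rank's
    // state so recovery does not need a global checkpoint file. A generic
    // illustration only, not the report's design; assumes every rank holds the
    // same amount of state.
    #include <mpi.h>
    #include <vector>

    void store_local_state(MPI_Comm comm, const std::vector<double>& state)
    {
        int rank, size;
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &size);
        const int buddy  = (rank + 1) % size;          // partner that holds my copy
        const int source = (rank - 1 + size) % size;   // rank whose copy I hold
        const int count  = static_cast<int>(state.size());

        std::vector<double> copy(state.size());
        MPI_Sendrecv(state.data(), count, MPI_DOUBLE, buddy,  0,
                     copy.data(),  count, MPI_DOUBLE, source, 0,
                     comm, MPI_STATUS_IGNORE);
        // 'copy' would be kept in local persistent storage so that if rank
        // 'source' fails, its replacement can pull this copy directly.
    }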

Domain Decomposition Preconditioners for Communication-Avoiding Krylov Methods on a Hybrid CPU/GPU Cluster

International Conference for High Performance Computing, Networking, Storage and Analysis, SC

Yamazaki, Ichitaro; Rajamanickam, Sivasankaran; Boman, Erik G.; Hoemmen, Mark F.; Heroux, Michael A.; Tomov, Stanimire

Krylov subspace projection methods are widely used iterative methods for solving large-scale linear systems of equations. Researchers have demonstrated that communication-avoiding (CA) techniques can improve Krylov methods' performance on modern computers, where communication is becoming increasingly expensive compared to arithmetic operations. In this paper, we extend these studies with two major contributions. First, we present our implementation of a CA variant of the Generalized Minimum Residual (GMRES) method, called CA-GMRES, for solving nonsymmetric linear systems of equations on a hybrid CPU/GPU cluster. Our performance results on up to 120 GPUs show that CA-GMRES gives a speedup of up to 2.5x in total solution time over standard GMRES on a hybrid cluster with twelve Intel Xeon CPUs and three Nvidia Fermi GPUs on each node. We then outline a domain decomposition framework to introduce a family of preconditioners that are suitable for CA Krylov methods. Our preconditioners do not incur any additional communication and allow the easy reuse of existing algorithms and software for the subdomain solves. Experimental results on the hybrid CPU/GPU cluster demonstrate that CA-GMRES with preconditioning achieves a speedup of up to 7.4x over CA-GMRES without preconditioning, and a speedup of up to 1.7x over GMRES with preconditioning, in total solution time. These results confirm the potential of our framework to develop a practical and effective preconditioned CA Krylov method.
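
For readers unfamiliar with the s-step structure, the toy single-process C++ sketch below illustrates the two phases over which a CA Krylov cycle amortizes communication: s back-to-back sparse matrix-vector products with a purely local preconditioner (here a simple Jacobi stand-in for the subdomain solves), followed by one block orthogonalization step. It is a structural illustration only, not the paper's hybrid CPU/GPU implementation.

    // Toy sketch of the s-step (communication-avoiding) basis construction with a
    // local preconditioner. Serial and self-contained; the 1-D Laplacian and the
    // Jacobi preconditioner are placeholders for the distributed matrix and the
    // subdomain solves.
    #include <cstdio>
    #include <vector>

    using Vec = std::vector<double>;
    constexpr int n = 8;   // toy problem size

    Vec spmv(const Vec& x) {                 // y = A x for a 1-D Laplacian
        Vec y(n, 0.0);
        for (int i = 0; i < n; ++i) {
            y[i] = 2.0 * x[i];
            if (i > 0)     y[i] -= x[i - 1];
            if (i < n - 1) y[i] -= x[i + 1];
        }
        return y;
    }

    Vec precond(const Vec& x) {              // local (Jacobi) solve: no communication
        Vec y(x);
        for (double& v : y) v /= 2.0;
        return y;
    }

    int main() {
        const int s = 4;                     // basis vectors generated per communication step
        Vec r0(n, 1.0);
        std::vector<Vec> V{precond(r0)};
        for (int j = 0; j < s; ++j)          // "matrix powers" phase: s SpMVs back to back
            V.push_back(precond(spmv(V.back())));

        // Block orthogonalization phase: in a parallel CA solver this Gram matrix
        // is formed with a single reduction instead of one reduction per vector.
        for (std::size_t i = 0; i < V.size(); ++i)
            for (std::size_t j = 0; j <= i; ++j) {
                double g = 0.0;
                for (int k = 0; k < n; ++k) g += V[i][k] * V[j][k];
                std::printf("G[%zu][%zu] = %.3f\n", i, j, g);
            }
    }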

HPCG Benchmark Technical Specification

Heroux, Michael A.

The High Performance Conjugate Gradient (HPCG) benchmark [cite SNL, UTK reports] is a tool for ranking computer systems based on a simple additive Schwarz, symmetric Gauss-Seidel preconditioned conjugate gradient solver. HPCG is similar to the High Performance Linpack (HPL), or Top 500, benchmark [1] in its purpose, but HPCG is intended to better represent how today’s applications perform. In this paper we describe the technical details of HPCG: how it is designed and implemented, what code transformations are permitted and how to interpret and report results.
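
The kernel HPCG times can be summarized in a few lines: a conjugate gradient iteration whose preconditioner is a symmetric Gauss-Seidel sweep. The single-process C++ sketch below shows that structure on a 1-D Laplacian toy problem; the actual benchmark uses a 3-D 27-point problem with an additive Schwarz decomposition across MPI ranks and strict run and reporting rules, none of which are reproduced here.

    // Toy preconditioned CG in the spirit of HPCG: CG with one symmetric
    // Gauss-Seidel sweep as the preconditioner, on a 1-D Laplacian.
    #include <cstdio>
    #include <vector>

    using Vec = std::vector<double>;
    constexpr int n = 16;

    Vec spmv(const Vec& x) {                 // y = A x, A = tridiag(-1, 2, -1)
        Vec y(n, 0.0);
        for (int i = 0; i < n; ++i) {
            y[i] = 2.0 * x[i];
            if (i > 0)     y[i] -= x[i - 1];
            if (i < n - 1) y[i] -= x[i + 1];
        }
        return y;
    }

    double dot(const Vec& a, const Vec& b) {
        double s = 0.0;
        for (int i = 0; i < n; ++i) s += a[i] * b[i];
        return s;
    }

    Vec symgs(const Vec& r) {                // one forward + one backward Gauss-Seidel sweep
        Vec z(n, 0.0);
        for (int i = 0; i < n; ++i) {
            double s = r[i];
            if (i > 0)     s += z[i - 1];
            if (i < n - 1) s += z[i + 1];
            z[i] = s / 2.0;
        }
        for (int i = n - 1; i >= 0; --i) {
            double s = r[i];
            if (i > 0)     s += z[i - 1];
            if (i < n - 1) s += z[i + 1];
            z[i] = s / 2.0;
        }
        return z;
    }

    int main() {
        Vec x(n, 0.0), b(n, 1.0);
        Vec r = b, z = symgs(r), p = z;
        double rz = dot(r, z);
        for (int it = 0; it < 25 && dot(r, r) > 1e-12; ++it) {
            Vec Ap = spmv(p);
            double alpha = rz / dot(p, Ap);
            for (int i = 0; i < n; ++i) { x[i] += alpha * p[i]; r[i] -= alpha * Ap[i]; }
            z = symgs(r);                    // preconditioner application
            double rz_new = dot(r, z);
            double beta = rz_new / rz;
            rz = rz_new;
            for (int i = 0; i < n; ++i) p[i] = z[i] + beta * p[i];
            std::printf("iter %2d  ||r||^2 = %.3e\n", it, dot(r, r));
        }
    }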

Toward a New Metric for Ranking High Performance Computing Systems

Heroux, Michael A.

The High Performance Linpack (HPL), or Top 500, benchmark is the most widely recognized and discussed metric for ranking high performance computing systems. However, HPL is increasingly unreliable as a true measure of system performance for a growing collection of important science and engineering applications. In this paper we describe a new high performance conjugate gradient (HPCG) benchmark. HPCG is composed of computations and data access patterns more commonly found in applications. Using HPCG we strive for a better correlation to real scientific application performance and expect to drive computer system design and implementation in directions that will better impact performance improvement.

Trilinos Developers SQE Guide: ASC Software Quality Engineering Practices Version 3.0

Willenbring, James M.; Heroux, Michael A.

The Trilinos Project is an effort to develop algorithms and enabling technologies within an object-oriented software framework for the solution of large-scale, complex multi-physics engineering and scientific problems. A new software capability is introduced into Trilinos as a package. A Trilinos package is an integral unit and, although there are exceptions such as utility packages, each package is typically developed by a small team of experts in a particular algorithm area, such as algebraic preconditioners or nonlinear solvers. The Trilinos Developers SQE Guide is a resource for Trilinos package developers who are working under Advanced Simulation and Computing (ASC) and are therefore subject to the ASC Software Quality Engineering Practices as described in the Sandia National Laboratories Advanced Simulation and Computing (ASC) Software Quality Plan: ASC Software Quality Engineering Practices Version 3.0 document. The Trilinos Developer Policies webpage contains detailed information that is essential for all Trilinos developers. The Trilinos Software Lifecycle Model defines the default lifecycle model for Trilinos packages and provides a context for many of the practices listed in this document.

Navigating an evolutionary fast path to exascale

Proceedings - 2012 SC Companion: High Performance Computing, Networking Storage and Analysis, SCC 2012

Barrett, Richard F.; Hammond, Simon; Vaughan, Courtenay T.; Doerfler, Douglas W.; Heroux, Michael A.

The computing community is in the midst of a disruptive architectural change. The advent of manycore and heterogeneous computing nodes forces us to reconsider every aspect of the system software and application stack. To address this challenge, there is a broad spectrum of approaches, which we roughly classify as either revolutionary or evolutionary. With the former, the entire code base is re-written, perhaps using a new programming language or execution model. The latter, which is the focus of this work, seeks a piecewise path of effective incremental change. The end effect of our approach will be revolutionary in that the control structure of the application will be markedly different in order to utilize single-instruction multiple-data/thread (SIMD/SIMT), manycore and heterogeneous nodes, but the physics code fragments will be remarkably similar. Our approach is guided by a set of mission driven applications and their proxies, focused on balancing performance potential with the realities of existing application code bases. Although the specifics of this process have not yet converged, we find that there are several important steps that developers of scientific and engineering application programs can take to prepare for making effective use of these challenging platforms. Aiding an evolutionary approach is the recognition that the performance potential of the architectures is, in a meaningful sense, an extension of existing capabilities: vectorization, threading, and a re-visiting of node interconnect capabilities. Therefore, as architectures, programming models, and programming mechanisms continue to evolve, the preparations described herein will provide significant performance benefits on existing and emerging architectures.
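
As a purely illustrative example of the kind of incremental change this approach favors (not taken from the paper), the C++ fragment below adds OpenMP threading and SIMD annotations to a physics-style update loop: the control structure changes to expose node-level parallelism, while the loop body, the "physics fragment", stays the same.

    // Evolutionary refactoring example: expose threading and SIMD without
    // rewriting the physics. Compiles with or without -fopenmp; the pragma is
    // simply ignored if OpenMP is disabled.
    #include <vector>

    void update(std::vector<double>& u, const std::vector<double>& f, double dt)
    {
        const std::size_t n = u.size();
        #pragma omp parallel for simd
        for (std::size_t i = 0; i < n; ++i)
            u[i] += dt * f[i];          // physics fragment unchanged
    }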
