Publications

Results 101–125 of 199

Search results

Jump to search filters

Assessing the predictive capabilities of mini-applications

Proceedings - 2012 SC Companion: High Performance Computing, Networking Storage and Analysis, SCC 2012

Barrett, Richard F.; Crozier, Paul; Doerfler, Douglas W.; Hammond, Simon; Heroux, Michael A.; Lin, Paul T.; Trucano, Timothy G.; Vaughan, Courtenay T.; Williams, Alan B.

The push to exascale computing is informed by the assumption that the architecture, regardless of the specific design, will be fundamentally different from petascale computers. The Mantevo project has been established to produce a set of proxies, or 'miniapps,' which enable rapid exploration of key performance issues that impact a broad set of scientific applications programs of interest to ASC and the broader HPC community. Understanding the conditions under which a miniapp can be confidently used as predictive of an applications' behavior must be clearly elucidated. Toward this end, we have developed a methodology for assessing the predictive capabilities of application proxies. Adhering to the spirit of experimental validation, our approach provides a framework for examining data from the application with that provided by their proxies. In this poster we present this methodology, and apply it to three miniapps developed by the Mantevo project. © 2012 IEEE.

More Details

ShyLU: A hybrid-hybrid solver for multicore platforms

Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium, IPDPS 2012

Rajamanickam, Sivasankaran; Boman, Erik G.; Heroux, Michael A.

With the ubiquity of multicore processors, it is crucial that solvers adapt to the hierarchical structure of modern architectures. We present ShyLU, a "hybrid-hybrid" solver for general sparse linear systems that is hybrid in two ways: First, it combines direct and iterative methods. The iterative part is based on approximate Schur complements where we compute the approximate Schur complement using a value-based dropping strategy or structure-based probing strategy. Second, the solver uses two levels of parallelism via hybrid programming (MPI+threads). ShyLU is useful both in shared-memory environments and on large parallel computers with distributed memory. In the latter case, it should be used as a sub domain solver. We argue that with the increasing complexity of compute nodes, it is important to exploit multiple levels of parallelism even within a single compute node. We show the robustness of ShyLU against other algebraic preconditioners. ShyLU scales well up to 384 cores for a given problem size. We also study the MPI-only performance of ShyLU against a hybrid implementation and conclude that on present multicore nodes MPI-only implementation is better. However, for future multicore machines (96 or more cores) hybrid/ hierarchical algorithms and implementations are important for sustained performance. © 2012 IEEE.

More Details

Cooperative application/OS DRAM fault recovery

Hoemmen, Mark F.; Ferreira, Kurt; Heroux, Michael A.; Brightwell, Ronald B.

Exascale systems will present considerable fault-tolerance challenges to applications and system software. These systems are expected to suffer several hard and soft errors per day. Unfortunately, many fault-tolerance methods in use, such as rollback recovery, are unsuitable for many expected errors, for example DRAM failures. As a result, applications will need to address these resilience challenges to more effectively utilize future systems. In this paper, we describe work on a cross-layer application/OS framework to handle uncorrected memory errors. We illustrate the use of this framework through its integration with a new fault-tolerant iterative solver within the Trilinos library, and present initial convergence results.

More Details

MiniGhost: A Miniapp for Exploring Boundary Exchange Strategies Using Stencil Computations in Scientific Parallel Computing

Barrett, Richard F.; Vaughan, Courtenay T.; Heroux, Michael A.

A broad range of scientific computation involves the use of difference stencils. In a parallel computing environment, this computation is typically implemented by decomposing the spacial domain, inducing a 'halo exchange' of process-owned boundary data. This approach adheres to the Bulk Synchronous Parallel (BSP) model. Because commonly available architectures provide strong inter-node bandwidth relative to latency costs, many codes 'bulk up' these messages by aggregating data into a message as a means of reducing the number of messages. A renewed focus on non-traditional architectures and architecture features provides new opportunities for exploring alternatives to this programming approach. In this report we describe miniGhost, a 'miniapp' designed for exploration of the capabilities of current as well as emerging and future architectures within the context of these sorts of applications. MiniGhost joins the suite of miniapps developed as part of the Mantevo project.

More Details

TriBITS Lifecycle Model: A Lean/Agile Software Lifecycle Model for Research-based Computational Science and Engineering and Applied Mathematical Software (V.1.0)

Willenbring, James M.; Heroux, Michael A.

Software lifecycles are becoming an increasingly important issue for computational science and engineering (CSE) software. The process by which a piece of CSE software begins life as a set of research requirements and then matures into a trusted high-quality capability is both commonplace and extremely challenging. Although an implicit lifecycle is obviously being used in any effort, the challenges of this process - respecting the competing needs of research vs. production - cannot be overstated. Here we describe a proposal for a well-defined software lifecycle process based on modern Lean/Agile software engineering principles. What we propose is appropriate for many CSE software projects that are initially heavily focused on research but also are expected to eventually produce usable high-quality capabilities. The model is related to TriBITS, a build, integration and testing system, which serves as a strong foundation for this lifecycle model, and aspects of this lifecycle model are ingrained in the TriBITS system. Here, we advocate three to four phases or maturity levels that address the appropriate handling of many issues associated with the transition from research to production software. The goals of this lifecycle model are to better communicate maturity levels with customers and to help to identify and promote Software Engineering (SE) practices that will help to improve productivity and produce better software. An important collection of software in this domain is Trilinos, which is used as the motivation and the initial target for this lifecycle model. However, many other related and similar CSE (and non-CSE) software projects can also make good use of this lifecycle model, especially those that use the TriBITS system. Indeed this lifecycle process, if followed, will enable large-scale sustainable integration of many complex CSE software efforts across several institutions.

More Details

A hybrid-hybrid solver for manycore platforms

SC'11 - Proceedings of the 2011 High Performance Computing Networking, Storage and Analysis Companion, Co-located with SC'11

Rajamanickam, Sivasankaran; Boman, Erik G.; Heroux, Michael A.

With the increasing levels of parallelism in a compute node, it is important to exploit multiple levels of parallelism even within a single compute node. We present ShyLU (pro- nounced\Shy-loo"for Scalable Hybrid LU), a\hybrid-hybrid" solver for general sparse linear systems that is hybrid in two ways: First, it combines direct and iterative methods. The iterative method is based on approximate Schur com- plements. Second, the solver uses two levels of parallelism via hybrid programming (MPI+threads). Our solver is use- ful both in shared-memory environments and on large par- allel computers with distributed memory (as a subdomain solver). We compare the robustness of ShyLU against other algebraic preconditioners. ShyLU scales well up to 192 cores for a given problem size. We compare at MPI performance of ShyLU against a hybrid implementation. We conclude that on present multicore nodes at MPI is better. However, for future manycore machines (48 or more cores) hybrid/ hi- erarchical algorithms and implementations are important for sustained performance. Copyright is held by the author/owner(s).

More Details
Results 101–125 of 199
Results 101–125 of 199