Publications Search

Previous work has demonstrated that propagating groups of samples, called ensembles, together through forward simulations can dramatically reduce the aggregate cost of sampling-based uncertainty propagation methods [E. Phipps, M. D'Elia, H. C. Edwards, M. Hoemmen, J. Hu, and S. Rajamanickam, SIAM J. Sci. Comput., 39 (2017), pp. C162--C193]. However, critical to the success of this approach when applied to challenging problems of scientific interest is the grouping of samples into ensembles to minimize the total computational work. For example, the total number of linear solver iterations for ensemble systems may be strongly influenced by which samples form the ensemble when applying iterative linear solvers to parameterized and stochastic linear systems. In this paper we explore sample grouping strategies for local adaptive stochastic collocation methods applied to PDEs with uncertain input data, in particular canonical anisotropic diffusion problems where the diffusion coefficient is modeled by truncated Karhunen--Loève expansions. Finally, we demonstrate that a measure of the total anisotropy of the diffusion coefficient is a good surrogate for the number of linear solver iterations for each sample and therefore provides a simple and effective metric for grouping samples.

More Details

TYPE Journal Article YEAR 2018

DOI OSTI

The Kokkos Programming Model

Trott, Christian R.; Bova, Steven W.; Ellingwood, Nathan D.; Ibanez-Granados, Daniel A.; Labreche, Duane A.; Sunderland, Daniel; Edwards, Harold C.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2017

OSTI

Tacho: Memory-Scalable Task Parallel Sparse Cholesky Factorization

Kim, Kyungjoo; Edwards, Harold C.; Rajamanickam, Sivasankaran

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2017

DOI OSTI

Kokkos Tutorial

Edwards, Harold C.

Abstract not provided.

More Details

TYPE Presentation YEAR 2017

OSTI

December 2017 ECP ST Project Review ECP Project WBS 2.3.1.04 : SNL ATDM PMR

Wilke, Jeremiah; Trott, Christian R.; Edwards, Harold C.; Glass, Micheal W.; Clay, Robert L.

Abstract not provided.

More Details

TYPE Presentation YEAR 2017

OSTI

Alexa: Simulating Shock Hydrodynamics on the GPU using Kokkos

Ibanez-Granados, Daniel A.; Edwards, Harold C.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2017

OSTI

Kokkos: Performance Portability and Productivity for C++ Applications

Edwards, Harold C.

Abstract not provided.

More Details

TYPE Presentation YEAR 2017

OSTI

Kokkos Tutorial

Edwards, Harold C.; Trott, Christian R.; Foertter, Fernanda

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2017

OSTI

Trends in Data Locality Abstractions for HPC Systems

IEEE Transactions on Parallel and Distributed Systems

Unat, Didem; Dubey, Anshu; Hoefler, Torsten; Shalf, John B.; Abraham, Mark; Bianco, Mauro; Chamberlain, Bradford L.; Cledat, Romain; Edwards, Harold C.; Finkel, Hal; Fuerlinger, Karl; Hannig, Frank; Jeannot, Emmanuel; Kamil, Amir; Keasler, Jeff; Kelly, Paul H.J.; Leung, Vitus J.; Ltaief, Hatem; Maruyama, Naoya; Newburn, Chris J.; Pericas, Miquel

The cost of data movement has always been an important concern in high performance computing (HPC) systems. It has now become the dominant factor in terms of both energy consumption and performance. Support for expression of data locality has been explored in the past, but those efforts have had only modest success in being adopted in HPC applications for various reasons. them However, with the increasing complexity of the memory hierarchy and higher parallelism in emerging HPC systems, locality management has acquired a new urgency. Developers can no longer limit themselves to low-level solutions and ignore the potential for productivity and performance portability obtained by using locality abstractions. Fortunately, the trend emerging in recent literature on the topic alleviates many of the concerns that got in the way of their adoption by application developers. Data locality abstractions are available in the forms of libraries, data structures, languages and runtime systems; a common theme is increasing productivity without sacrificing performance. This paper examines these trends and identifies commonalities that can combine various locality concepts to develop a comprehensive approach to expressing and managing data locality on future large-scale high-performance computing systems.

More Details

TYPE Journal Article YEAR 2017

DOI OSTI Scopus

Kokkos' Task DAG Capabilities

Edwards, Harold C.; Ibanez-Granados, Daniel A.

This report documents the ASC/ATDM Kokkos deliverable "Production Portable Dy- namic Task DAG Capability." This capability enables applications to create and execute a dynamic task DAG ; a collection of heterogeneous computational tasks with a directed acyclic graph (DAG) of "execute after" dependencies where tasks and their dependencies are dynamically created and destroyed as tasks execute. The Kokkos task scheduler executes the dynamic task DAG on the target execution resource; e.g. a multicore CPU, a manycore CPU such as Intel's Knights Landing (KNL), or an NVIDIA GPU. Several major technical challenges had to be addressed during development of Kokkos' Task DAG capability: (1) portability to a GPU with it's simplified hardware and micro- runtime, (2) thread-scalable memory allocation and deallocation from a bounded pool of memory, (3) thread-scalable scheduler for dynamic task DAG, (4) usability by applications.

More Details

TYPE SAND Report YEAR 2017

DOI OSTI

ASC ATDM Level 2 Milestone #6015: Asynchronous Many-Task Software Stack Demonstration

Bennett, Janine C.; Bettencourt, Matthew T.; Clay, Robert L.; Edwards, Harold C.; Glass, Micheal W.; Hollman, David S.; Kolla, Hemanth; Lifflander, Jonathan J.; Littlewood, David J.; Markosyan, Aram; Moore, Stan G.; Olivier, Stephen L.; Phipps, Eric T.; Rizzi, Francesco; Slattengren, Nicole L.; Sunderland, Daniel; Wilke, Jeremiah

This report is an outcome of the ASC ATDM Level 2 Milestone 6015: Asynchronous Many-Task Software Stack Demonstration. It comprises a summary and in depth analysis of DARMA and a DARMA-compliant Asynchronous Many-Task (AMT) runtime software stack. Herein performance and productivity of the over- all approach are assessed on benchmarks and proxy applications representative of the Sandia ATDM applications. As part of the effort to assess the perceived strengths and weaknesses of AMT models compared to more traditional methods, experiments were performed on ATS-1 (Advanced Technology Systems) test bed machines and Trinity. In addition to productivity and performance assessments, this report includes findings on the generality of DARMAs backend API as well as findings on interoperability with node- level and network-level system libraries. Together, this information provides a clear understanding of the strengths and limitations of the DARMA approach in the context of Sandias ATDM codes, to guide our future research and development in this area.

More Details

TYPE SAND Report YEAR 2017

DOI OSTI

On the Importance of Faster Atomics

Hammond, Simon; Trott, Christian R.; Edwards, Harold C.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2017

OSTI

Kokkos Evolution: Task-DAG and Back-ends

Edwards, Harold C.

Abstract not provided.

More Details

TYPE Presentation YEAR 2017

OSTI

Kokkos Task-DAG: Memory Management and Locality Challenges Conquered

Edwards, Harold C.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2017

OSTI

Kokkos: The C++ Performance Portability Programming Model

Trott, Christian R.; Edwards, Harold C.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2017

OSTI

Kokkos Hierarchical Task-Data Parallelism for C++ HPC Applications

Edwards, Harold C.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2017

OSTI

Kokkos Tutorial

Edwards, Harold C.; Trott, Christian R.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2017

OSTI

Embedded ensemble propagation for improving performance, portability, and scalability of uncertainty quantification on emerging computational architectures

SIAM Journal on Scientific Computing

Phipps, Eric T.; Edwards, Harold C.; Hoemmen, Mark F.; Hu, Jonathan J.; Rajamanickam, Sivasankaran

In this study, quantifying simulation uncertainties is a critical component of rigorous predictive simulation. A key component of this is forward propagation of uncertainties in simulation input data to output quantities of interest. Typical approaches involve repeated sampling of the simulation over the uncertain input data, and can require numerous samples when accurately propagating uncertainties from large numbers of sources. Often simulation processes from sample to sample are similar and much of the data generated from each sample evaluation could be reused. We explore a new method for implementing sampling methods that simultaneously propagates groups of samples together in an embedded fashion, which we call embedded ensemble propagation. We show how this approach takes advantage of properties of modern computer architectures to improve performance by enabling reuse between samples, reducing memory bandwidth requirements, improving memory access patterns, improving opportunities for fine-grained parallelization, and reducing communication costs. We describe a software technique for implementing embedded ensemble propagation based on the use of C++ templates and describe its integration with various scientific computing libraries within Trilinos. We demonstrate improved performance, portability and scalability for the approach applied to the simulation of partial differential equations on a variety of CPU, GPU, and accelerator architectures, including up to 131,072 cores on a Cray XK7 (Titan).

More Details

TYPE Journal Article YEAR 2017

DOI OSTI