Publications

Wilke, Jeremiah; Trott, Christian R.; Edwards, Harold C.; Glass, Micheal W.; Clay, Robert L.

Abstract not provided.

More Details

TYPE Presentation YEAR 2017

OSTI

December 2017 ECP ST Project Review ECP Project WBS 2.3.1.04 : SNL ATDM PMR

Abstract not provided.

More Details

TYPE Presentation YEAR 2017

OSTI

Alexa: Simulating Shock Hydrodynamics on the GPU using Kokkos

Ibanez-Granados, Daniel A.; Edwards, Harold C.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2017

OSTI

Kokkos Tutorial

Edwards, Harold C.; Trott, Christian R.; Foertter, Fernanda

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2017

OSTI

Kokkos: Performance Portability and Productivity for C++ Applications

Edwards, Harold C.; Ibanez-Granados, Daniel A.

Abstract not provided.

More Details

TYPE Presentation YEAR 2017

OSTI

Trends in Data Locality Abstractions for HPC Systems

IEEE Transactions on Parallel and Distributed Systems

Unat, Didem; Dubey, Anshu; Hoefler, Torsten; Shalf, John B.; Abraham, Mark; Bianco, Mauro; Chamberlain, Bradford L.; Cledat, Romain; Edwards, Harold C.; Finkel, Hal; Fuerlinger, Karl; Hannig, Frank; Jeannot, Emmanuel; Kamil, Amir; Keasler, Jeff; Kelly, Paul H.J.; Leung, Vitus J.; Ltaief, Hatem; Maruyama, Naoya; Newburn, Chris J.; Pericas, Miquel

The cost of data movement has always been an important concern in high performance computing (HPC) systems. It has now become the dominant factor in terms of both energy consumption and performance. Support for expression of data locality has been explored in the past, but those efforts have had only modest success in being adopted in HPC applications for various reasons. them However, with the increasing complexity of the memory hierarchy and higher parallelism in emerging HPC systems, locality management has acquired a new urgency. Developers can no longer limit themselves to low-level solutions and ignore the potential for productivity and performance portability obtained by using locality abstractions. Fortunately, the trend emerging in recent literature on the topic alleviates many of the concerns that got in the way of their adoption by application developers. Data locality abstractions are available in the forms of libraries, data structures, languages and runtime systems; a common theme is increasing productivity without sacrificing performance. This paper examines these trends and identifies commonalities that can combine various locality concepts to develop a comprehensive approach to expressing and managing data locality on future large-scale high-performance computing systems.

More Details

TYPE Journal Article YEAR 2017

DOI OSTI Scopus

Kokkos' Task DAG Capabilities

This report documents the ASC/ATDM Kokkos deliverable "Production Portable Dy- namic Task DAG Capability." This capability enables applications to create and execute a dynamic task DAG ; a collection of heterogeneous computational tasks with a directed acyclic graph (DAG) of "execute after" dependencies where tasks and their dependencies are dynamically created and destroyed as tasks execute. The Kokkos task scheduler executes the dynamic task DAG on the target execution resource; e.g. a multicore CPU, a manycore CPU such as Intel's Knights Landing (KNL), or an NVIDIA GPU. Several major technical challenges had to be addressed during development of Kokkos' Task DAG capability: (1) portability to a GPU with it's simplified hardware and micro- runtime, (2) thread-scalable memory allocation and deallocation from a bounded pool of memory, (3) thread-scalable scheduler for dynamic task DAG, (4) usability by applications.

More Details

TYPE SAND Report YEAR 2017

DOI OSTI

ASC ATDM Level 2 Milestone #6015: Asynchronous Many-Task Software Stack Demonstration

Bennett, Janine C.; Bettencourt, Matthew T.; Clay, Robert L.; Edwards, Harold C.; Glass, Micheal W.; Hollman, David S.; Kolla, Hemanth; Lifflander, Jonathan J.; Littlewood, David J.; Markosyan, Aram; Moore, Stan G.; Olivier, Stephen L.; Phipps, Eric T.; Rizzi, Francesco; Slattengren, Nicole L.; Sunderland, Daniel; Wilke, Jeremiah

This report is an outcome of the ASC ATDM Level 2 Milestone 6015: Asynchronous Many-Task Software Stack Demonstration. It comprises a summary and in depth analysis of DARMA and a DARMA-compliant Asynchronous Many-Task (AMT) runtime software stack. Herein performance and productivity of the over- all approach are assessed on benchmarks and proxy applications representative of the Sandia ATDM applications. As part of the effort to assess the perceived strengths and weaknesses of AMT models compared to more traditional methods, experiments were performed on ATS-1 (Advanced Technology Systems) test bed machines and Trinity. In addition to productivity and performance assessments, this report includes findings on the generality of DARMAs backend API as well as findings on interoperability with node- level and network-level system libraries. Together, this information provides a clear understanding of the strengths and limitations of the DARMA approach in the context of Sandias ATDM codes, to guide our future research and development in this area.

More Details

TYPE SAND Report YEAR 2017

DOI OSTI

Kokkos Evolution: Task-DAG and Back-ends

Hammond, Simon; Trott, Christian R.; Edwards, Harold C.

Abstract not provided.

More Details

TYPE Presentation YEAR 2017

OSTI

On the Importance of Faster Atomics

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2017

OSTI

Kokkos Task-DAG: Memory Management and Locality Challenges Conquered

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2017

OSTI

Kokkos Hierarchical Task-Data Parallelism for C++ HPC Applications

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2017

OSTI

Kokkos Tutorial

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2017

OSTI

Kokkos: The C++ Performance Portability Programming Model

Phipps, Eric T.; Edwards, Harold C.; Hoemmen, Mark F.; Hu, Jonathan J.; Rajamanickam, Sivasankaran

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2017

OSTI

Embedded ensemble propagation for improving performance, portability, and scalability of uncertainty quantification on emerging computational architectures

SIAM Journal on Scientific Computing

In this study, quantifying simulation uncertainties is a critical component of rigorous predictive simulation. A key component of this is forward propagation of uncertainties in simulation input data to output quantities of interest. Typical approaches involve repeated sampling of the simulation over the uncertain input data, and can require numerous samples when accurately propagating uncertainties from large numbers of sources. Often simulation processes from sample to sample are similar and much of the data generated from each sample evaluation could be reused. We explore a new method for implementing sampling methods that simultaneously propagates groups of samples together in an embedded fashion, which we call embedded ensemble propagation. We show how this approach takes advantage of properties of modern computer architectures to improve performance by enabling reuse between samples, reducing memory bandwidth requirements, improving memory access patterns, improving opportunities for fine-grained parallelization, and reducing communication costs. We describe a software technique for implementing embedded ensemble propagation based on the use of C++ templates and describe its integration with various scientific computing libraries within Trilinos. We demonstrate improved performance, portability and scalability for the approach applied to the simulation of partial differential equations on a variety of CPU, GPU, and accelerator architectures, including up to 131,072 cores on a Cray XK7 (Titan).

More Details

TYPE Journal Article YEAR 2017

DOI OSTI

Kokkos: Performance Portability Status

Abstract not provided.

More Details

TYPE Presentation YEAR 2017

OSTI

Computing Derivatives for UQ on Emerging Manycore Architectures

Sunderland, Daniel; Edwards, Harold C.; Trott, Christian R.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2017

OSTI

Extending Kokkos with Task Parallelism

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2017

OSTI

Preparing Sandia's Application Portfolio for the Future Using Kokkos

Trott, Christian R.; Edwards, Harold C.; Hammond, Simon; Sunderland, Daniel

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2017

OSTI

Automatic Differentiation on Emerging Manycore Architectures with Sacado and Kokkos

Abstract not provided.

More Details

TYPE Presentation YEAR 2016

OSTI

Kokkos: Performance Portability for C++ Codes

Edwards, Harold C.; Trott, Christian R.; Sunderland, Daniel

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2016

OSTI

Kokkos: Performance Portability and Productivity for C++ Applications

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2016

OSTI

Tacho: Two-level Task Parallel Cholesky Factorization

Kim, Kyungjoo; Rajamanickam, Sivasankaran; Edwards, Harold C.; Dohrmann, Clark R.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2016

OSTI

Kokkos: Performance Portability and Productivity for C++ Applications

Abstract not provided.

More Details

TYPE Presentation YEAR 2016

OSTI

Kokkos Spaces: Expressing Locality in Heterogeneous Node Architectures

Kim, Kyungjoo; Rajamanickam, Sivasankaran; Edwards, Harold C.; Olivier, Stephen L.; Stelle, George W.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2016

OSTI

Kokkos Task API: A Use Case in Tacho

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2016

OSTI

Kokkos Technical Review Slides and Discussion Notes

Edwards, Harold C.; Sunderland, Daniel; Hoemmen, Mark F.; Ellingwood, Nathan D.; Trott, Christian R.; Mackey, Greg E.

Abstract not provided.

More Details

TYPE Presentation YEAR 2016

OSTI

Operator Overloading-based Automatic Differentiation of C++ Codes on Emerging Manycore Architectures

Wolf, Michael; Edwards, Harold C.; Olivier, Stephen L.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2016

OSTI

Kokkos/Qthreads Task Parallel Approach to Linear Algebra Based Graph Analytics

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2016

DOI OSTI

Hierarchical Task-Data Parallelism using Kokkos and Qthreads

Edwards, Harold C.; Olivier, Stephen L.; Berry, Jonathan; Mackey, Greg E.; Rajamanickam, Sivasankaran; Wolf, Michael; Kim, Kyungjoo; Stelle, George W.

This report describes a new capability for hierarchical task-data parallelism using Sandia's Kokkos and Qthreads, and evaluation of this capability with sparse matrix Cholesky factorization and social network triangle enumeration mini-applications. Hierarchical task-data parallelism consists of a collection of tasks with executes-after dependences where each task contains data parallel operations performed on a team of hardware threads. The collection of tasks and dependences form a directed acyclic graph of tasks - a task DAG. Major challenges of this research and development effort include: portability and performance across multicore CPU; manycore Intel Xeon Phi, and NVIDIA GPU architectures; scalability with respect to hardware concurrency and size of the task DAG; and usability of the application programmer interface (API).

More Details

TYPE SAND Report YEAR 2016

DOI OSTI

Kokkos Tutorial

Edwards, Harold C.; Trott, Christian R.; Amelang, Jeff

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2016

OSTI

Kokkos/Qthreads Task Parallel Approach to Linear Algebra Based Graph Analytics

Wolf, Michael; Edwards, Harold C.; Olivier, Stephen L.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2016

DOI OSTI

Kokkos? Multidimensional Array and future directions for std::array_ref

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2016

OSTI

Kokkos Hierarchical Task-‐Data Parallelism for C++ HPC Applica9ons

Edwards, Harold C.; Hu, Jonathan J.; Phipps, Eric T.; Rajamanickam, Sivasankaran

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2016

OSTI

Ensemble Grouping strategies for embedded Stochastic Collocation methods applied to anisotropic diffusion problems

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2016

OSTI

Kokkos- Performance Portability Today

Trott, Christian R.; Hammond, Simon; Edwards, Harold C.; Ellingwood, Nathan D.

Abstract not provided.

More Details

TYPE Presentation YEAR 2016

OSTI

KokkosP: Runtime Hooks for Portable Performance Analysis

Hammond, Simon; Trott, Christian R.; Edwards, Harold C.; Ellingwood, Nathan D.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2016

OSTI

Embedded Ensemble Propagation for Improving Performance Portability and Scalability of Uncertainty Quantification on Emerging Computational Architectures

Phipps, Eric T.; Edwards, Harold C.; Hoemmen, Mark F.; Hu, Jonathan J.; Rajamanickam, Sivasankaran

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2016

OSTI

Performance Portability for Linear Algebra with Kokkos

Trott, Christian R.; Edwards, Harold C.; Ellingwood, Nathan D.; Hammond, Simon; Deveci, Mehmet; Boman, Erik G.; Bradley, Andrew M.; Hoemmen, Mark F.; Rajamanickam, Sivasankaran

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2016

OSTI

Kokkos Tutorial

Edwards, Harold C.; Trott, Christian R.; Amelang, Jeff

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2016

OSTI

Kokkos: Manycore Programmability and Performance Portability

Trott, Christian R.; Edwards, Harold C.; Ellingwood, Nathan D.; Hammond, Simon

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2016

OSTI

Kokkos -- Portability Performance Productivity [PowerPoint]

Abstract not provided.

More Details

TYPE Presentation YEAR 2016

OSTI

Task Parallel Incomplete Cholesky Factorization using 2D Partitioned-Block Layout

Kim, Kyungjoo; Rajamanickam, Sivasankaran; Stelle, George W.; Edwards, Harold C.; Olivier, Stephen L.

We introduce a task-parallel algorithm for sparse incomplete Cholesky factorization that utilizes a 2D sparse partitioned-block layout of a matrix. Our factorization algorithm follows the idea of algorithms-by-blocks by using the block layout. The algorithm-byblocks approach induces a task graph for the factorization. These tasks are inter-related to each other through their data dependences in the factorization algorithm. To process the tasks on various manycore architectures in a portable manner, we also present a portable tasking API that incorporates different tasking backends and device-specific features using an open-source framework for manycore platforms i.e., Kokkos. A performance evaluation is presented on both Intel Sandybridge and Xeon Phi platforms for matrices from the University of Florida sparse matrix collection to illustrate merits of the proposed task-based factorization. Experimental results demonstrate that our task-parallel implementation delivers about 26.6x speedup (geometric mean) over single-threaded incomplete Choleskyby- blocks and 19.2x speedup over serial Cholesky performance which does not carry tasking overhead using 56 threads on the Intel Xeon Phi processor for sparse matrices arising from various application problems.

More Details

TYPE Other Report YEAR 2015

DOI OSTI

Kokkos: Performance Portability and Productivity for Next Generation HPC

Abstract not provided.

More Details

TYPE Presentation YEAR 2015

OSTI

Kokkos Tutorial

Edwards, Harold C.; Trott, Christian R.; Amelang, Jeff

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2015

OSTI

Kokkos: Enabling Performance Portablility

Kim, Kyungjoo; Rajamanickam, Sivasankaran; Edwards, Harold C.; Olivier, Stephen L.; Stelle, George W.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2015

OSTI

Task-parallel Sparse Incomplete Cholesky Factorization using Kokkos Portable APIs

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2015

OSTI

ASC Trilab L2 Codesign Milestone 2015

Trott, Christian R.; Hammond, Simon; Dinge, Dennis; Lin, Paul T.; Vaughan, Courtenay T.; Cook, Jeanine; Rajan, Mahesh; Edwards, Harold C.; Hoekstra, Robert J.

For the FY15 ASC L2 Trilab Codesign milestone Sandia National Laboratories performed two main studies. The first study investigated three topics (performance, cross-platform portability and programmer productivity) when using OpenMP directives and the RAJA and Kokkos programming models available from LLNL and SNL respectively. The focus of this first study was the LULESH mini-application developed and maintained by LLNL. In the coming sections of the report the reader will find performance comparisons (and a demonstration of portability) for a variety of mini-application implementations produced during this study with varying levels of optimization. Of note is that the implementations utilized including optimizations across a number of programming models to help ensure claims that Kokkos can provide native-class application performance are valid. The second study performed during FY15 is a performance assessment of the MiniAero mini-application developed by Sandia. This mini-application was developed by the SIERRA Thermal-Fluid team at Sandia for the purposes of learning the Kokkos programming model and so is available in only a single implementation. For this report we studied its performance and scaling on a number of machines with the intent of providing insight into potential performance issues that may be experienced when similar algorithms are deployed on the forthcoming Trinity ASC ATS platform.

More Details

TYPE SAND Report YEAR 2015

DOI OSTI

Kokkos: Enabling Productivity and Performance Portability Across Next Generation Platforms

Abstract not provided.

More Details

TYPE Presentation YEAR 2015

OSTI

KOKKOS programming model and library

Abstract not provided.

More Details

TYPE Other Report YEAR 2015

DOI OSTI

Kokkos: An Introduction

Trott, Christian R.; Edwards, Harold C.; Hammond, Simon

Abstract not provided.

More Details

TYPE Presentation YEAR 2015

OSTI

Proxy App Usecases at Sandia

Abstract not provided.

More Details

TYPE Presentation YEAR 2015

OSTI

Kokkos: Enabling Manycore Performance Portability for C++ Applications and Libraries

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2015

OSTI

ASC L2 Trilab Codesign Milestone (Codesign at Sandia: LULESH and MiniAero)

Cook, Jeanine; Edwards, Harold C.; Dinge, Dennis; Glass, Micheal W.; Hammond, Simon; Hoekstra, Robert J.; Lin, Paul T.; Rajan, Mahesh; Trott, Christian R.; Vaughan, Courtenay T.

Abstract not provided.

More Details

TYPE Presentation YEAR 2015

OSTI

Kokkos: Enabling Performance Portability Across Next Generation Platforms

Abstract not provided.

More Details

TYPE Presentation YEAR 2015

OSTI

Kokkos Performance Portable Thread-Parallel Execution and Data Structures On Next Generation Platforms

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2015

OSTI

Use of C++11 Features in Kokkos

Abstract not provided.

More Details

TYPE Presentation YEAR 2015

OSTI

Assembly Algorithms for PDEs with Uncertain Input Data on Emerging Multicore Architectures

Demeshko, Irina; Edwards, Harold C.; Heroux, Michael A.; Salinger, Andrew G.; Pawlowski, Roger; Phipps, Eric T.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2015

OSTI

Towards Exascale Implementation of the Finite Element Based Application Development Environment

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2015

OSTI

Embedded Sampling?Based Uncertainty Quantification Approaches for Emerging Computer Architectures

D'Elia, Marta; Phipps, Eric T.; Edwards, Harold C.; Hu, Jonathan J.; Rajamanickam, Sivasankaran

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2015

OSTI

Kokkos Manycore Device Performance Portability for C++ HPC Applications

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2015

OSTI

ASC-ATDM Performance Portability Requirements for 2015-2019

Demeshko, Irina; Bradley, Andrew M.; Cyr, Eric C.; Edwards, Harold C.; Heroux, Michael A.; Phipps, Eric T.; Salinger, Andrew G.

This report outlines the research, development, and support requirements for the Advanced Simulation and Computing (ASC ) Advanced Technology, Development, and Mitigation (ATDM) Performance Portability (a.k.a., Kokkos) project for 2015 - 2019 . The research and development (R&D) goal for Kokkos (v2) has been to create and demonstrate a thread - parallel programming model a nd standard C++ library - based implementation that enables performance portability across diverse manycore architectures such as multicore CPU, Intel Xeon Phi, and NVIDIA Kepler GPU. This R&D goal has been achieved for algorithms that use data parallel pat terns including parallel - for, parallel - reduce, and parallel - scan. Current R&D is focusing on hierarchical parallel patterns such as a directed acyclic graph (DAG) of asynchronous tasks where each task contain s nested data parallel algorithms. This five y ear plan includes R&D required to f ully and performance portably exploit thread parallelism across current and anticipated next generation platforms (NGP). The Kokkos library is being evaluated by many projects exploring algorithm s and code design for NGP. Some production libraries and applications such as Trilinos and LAMMPS have already committed to Kokkos as their foundation for manycore parallelism an d performance portability. These five year requirements includes support required for current and antic ipated ASC projects to be effective and productive in their use of Kokkos on NGP. The greatest risk to the success of Kokkos and ASC projects relying upon Kokkos is a lack of staffing resources to support Kokkos to the degree needed by these ASC projects. This support includes up - to - date tutorials, documentation, multi - platform (hardware and software stack) testing, minor feature enhancements, thread - scalable algorithm consulting, and managing collaborative R&D.

More Details

TYPE SAND Report YEAR 2015

DOI OSTI

A Kokkos Implementation of Albany: A Performance Portable Multiphysics Simulation Code

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2015

OSTI

Kokkos: Enabling Performance Portability of C++ Applications and Libraries across Manycore Architectures

Phipps, Eric T.; Edwards, Harold C.; Hu, Jonathan J.

Abstract not provided.

More Details

TYPE Presentation YEAR 2015

OSTI

Programming Abstractions for Data Locality

Tate, Adrian; Kamil, Amir; Dubey, Anshu; Groblinger, Armin; Chamberlain, Brad; Goglin, Brice; Edwards, Harold C.; Newburn, Chris J.; Padua, David; Unat, Didem; Jeannot, Emmanuel; Hannig, Frank; Tobias, Gysi; Ltaief, Hatem; Sexton, James; Labarta, Jesus; Shalf, John; Fuerlinger, Karl; Brien, Leonidas'; Linardakis, Leonidas; Besta, MacIej; Sawley, Marie-Christine; Abraham, Mark; Bianco, Mauro; Pericas, Miquel; Maruyama, Naoya; Kelly, Paul H.J.; Messmer, Peter; Ross, Robert B.; Ciedat, Romain; Matsuoka, Satoshi; Schulthess, Thomas; Hoefler, Torsten; Leung, Vitus J.

The goal of the workshop and this report is to identify common themes and standardize concepts for locality-preserving abstractions for exascale programming models.

More Details

TYPE Other Report YEAR 2014

DOI OSTI

Improving Performance of Uncertainty Quantification Methods on Advanced Computing Architectures

Abstract not provided.

More Details

TYPE Presentation YEAR 2014

OSTI

Kokkos: Enabling Performance Portability of C++ Applications and Libraries across Manycore Architectures

Demeshko, Irina; Edwards, Harold C.; Heroux, Michael A.; Phipps, Eric T.; Salinger, Andrew G.; Pawlowski, Roger

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2014

OSTI

Kokkos implementation of Albany: a performance-portable finite element application

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2014

OSTI

Towards Architecture Aware Performance Portable Finite Element Code

Demeshko, Irina; Edwards, Harold C.; Heroux, Michael A.; Pawlowski, Roger; Phipps, Eric T.; Salinger, Andrew G.; Trott, Christian R.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2014

OSTI

Threaded construction and fill of Tpetra sparse linear system using Kokkos

Hoemmen, Mark F.; Edwards, Harold C.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2014

OSTI

Kokkos update: Memory Spaces Execution Spaces Execution Policies Defaults and C++11