Publications Search

This report describes a new capability for hierarchical task-data parallelism using Sandia's Kokkos and Qthreads, and an evaluation of this capability with sparse matrix Cholesky factorization and social network triangle enumeration mini-applications. Hierarchical task-data parallelism consists of a collection of tasks with executes-after dependences, where each task contains data-parallel operations performed on a team of hardware threads. The collection of tasks and dependences forms a directed acyclic graph of tasks, or task DAG. Major challenges of this research and development effort include portability and performance across multicore CPU, manycore Intel Xeon Phi, and NVIDIA GPU architectures; scalability with respect to hardware concurrency and size of the task DAG; and usability of the application programmer interface (API).

TYPE SAND Report YEAR 2016

Kokkos/Qthreads Task Parallel Approach to Linear Algebra Based Graph Analytics

Wolf, Michael; Edwards, Harold C.; Olivier, Stephen L.

Abstract not provided.

TYPE Conference Poster YEAR 2016

Operator Overloading-based Automatic Differentiation of C++ Codes on Emerging Manycore Architectures

Phipps, Eric T.; Edwards, Harold C.

Abstract not provided.

TYPE Conference Poster YEAR 2016

Kokkos Tutorial

Edwards, Harold C.; Trott, Christian R.; Amelang, Jeff

Abstract not provided.

TYPE Conference Poster YEAR 2016

Kokkos/Qthreads Task Parallel Approach to Linear Algebra Based Graph Analytics

Wolf, Michael; Edwards, Harold C.; Olivier, Stephen L.

Abstract not provided.

TYPE Conference Poster YEAR 2016

Kokkos' Multidimensional Array and future directions for std::array_ref

Edwards, Harold C.

Abstract not provided.

TYPE Conference Poster YEAR 2016

Kokkos - Performance Portability Today

Trott, Christian R.; Hammond, Simon; Edwards, Harold C.; Ellingwood, Nathan D.

Abstract not provided.

TYPE Presentation YEAR 2016

KokkosP: Runtime Hooks for Portable Performance Analysis

Hammond, Simon; Trott, Christian R.; Edwards, Harold C.; Ellingwood, Nathan D.

Abstract not provided.

TYPE Conference Poster YEAR 2016

Ensemble Grouping strategies for embedded Stochastic Collocation methods applied to anisotropic diffusion problems

Edwards, Harold C.; Hu, Jonathan J.; Phipps, Eric T.; Rajamanickam, Sivasankaran

Abstract not provided.

TYPE Conference Poster YEAR 2016

Kokkos Hierarchical Task-Data Parallelism for C++ HPC Applications

Edwards, Harold C.

Abstract not provided.

TYPE Conference Poster YEAR 2016

Embedded Ensemble Propagation for Improving Performance Portability and Scalability of Uncertainty Quantification on Emerging Computational Architectures

Phipps, Eric T.; Edwards, Harold C.; Hoemmen, Mark F.; Hu, Jonathan J.; Rajamanickam, Sivasankaran

Abstract not provided.

TYPE Conference Poster YEAR 2016

Performance Portability for Linear Algebra with Kokkos

Trott, Christian R.; Edwards, Harold C.; Ellingwood, Nathan D.; Hammond, Simon; Deveci, Mehmet; Boman, Erik G.; Bradley, Andrew M.; Hoemmen, Mark F.; Rajamanickam, Sivasankaran

Abstract not provided.

TYPE Conference Poster YEAR 2016

Kokkos: Manycore Programmability and Performance Portability

Edwards, Harold C.; Trott, Christian R.

Abstract not provided.

TYPE Conference Poster YEAR 2016

Kokkos Tutorial

Edwards, Harold C.; Trott, Christian R.; Amelang, Jeff

Abstract not provided.

TYPE Conference Poster YEAR 2016

Kokkos -- Portability Performance Productivity [PowerPoint]

Trott, Christian R.; Edwards, Harold C.; Ellingwood, Nathan D.; Hammond, Simon

Abstract not provided.

TYPE Presentation YEAR 2016

Task Parallel Incomplete Cholesky Factorization using 2D Partitioned-Block Layout

Kim, Kyungjoo; Rajamanickam, Sivasankaran; Stelle, George W.; Edwards, Harold C.; Olivier, Stephen L.

We introduce a task-parallel algorithm for sparse incomplete Cholesky factorization that utilizes a 2D sparse partitioned-block layout of a matrix. Our factorization algorithm follows the idea of algorithms-by-blocks by using the block layout. The algorithm-by-blocks approach induces a task graph for the factorization. These tasks are related to each other through their data dependences in the factorization algorithm. To process the tasks on various manycore architectures in a portable manner, we also present a portable tasking API that incorporates different tasking backends and device-specific features using Kokkos, an open-source framework for manycore platforms. A performance evaluation is presented on both Intel Sandy Bridge and Xeon Phi platforms for matrices from the University of Florida sparse matrix collection to illustrate the merits of the proposed task-based factorization. Experimental results demonstrate that, using 56 threads on the Intel Xeon Phi processor, our task-parallel implementation delivers about 26.6x speedup (geometric mean) over single-threaded incomplete Cholesky-by-blocks and 19.2x speedup over serial Cholesky, which does not carry tasking overhead, for sparse matrices arising from various application problems.

TYPE Other Report YEAR 2015

Kokkos: Performance Portability and Productivity for Next Generation HPC

Edwards, Harold C.

Abstract not provided.

TYPE Presentation YEAR 2015
