Publications Search

Multi-threaded Sparse Matrix Sparse Matrix Multiplication for Many-Core and GPU Architectures

Deveci, Mehmet; Trott, Christian R.; Rajamanickam, Sivasankaran

Sparse Matrix-Matrix multiplication is a key kernel that has applications in several domains such as scientific computing and graph analysis. Several algorithms have been studied in the past for this foundational kernel. In this paper, we develop parallel algorithms for sparse matrix- matrix multiplication with a focus on performance portability across different high performance computing architectures. The performance of these algorithms depend on the data structures used in them. We compare different types of accumulators in these algorithms and demonstrate the performance difference between these data structures. Furthermore, we develop a meta-algorithm, kkSpGEMM, to choose the right algorithm and data structure based on the characteristics of the problem. We show performance comparisons on three architectures and demonstrate the need for the community to develop two phase sparse matrix-matrix multiplication implementations for efficient reuse of the data structures involved.

More Details

TYPE Other Report YEAR 2017

DOI OSTI

The Kokkos Programming Model

Trott, Christian R.; Bova, Steven W.; Ellingwood, Nathan D.; Ibanez, Daniel A.; Labreche, Duane A.; Sunderland, Daniel; Edwards, Harold C.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2017

OSTI

Kokkoskernels: Portable Math and Graph Kernels

Rajamanickam, Sivasankaran; Kim, Kyungjoo; Deveci, Mehmet; Trott, Christian R.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2017

OSTI

December 2017 ECP ST Project Review ECP Project WBS 2.3.1.04 : SNL ATDM PMR

Wilke, Jeremiah; Trott, Christian R.; Edwards, Harold C.; Glass, Micheal W.; Clay, Robert L.

Abstract not provided.

More Details

TYPE Presentation YEAR 2017

OSTI

A Brief Description of the Kokkos implementation of the SNAP potential in ExaMiniMD

Thompson, Aidan P.; Trott, Christian R.

Within the EXAALT project, the SNAP [1] approach is being used to develop high accuracy potentials for use in large-scale long-time molecular dynamics simulations of materials behavior. In particular, we have developed a new SNAP potential that is suitable for describing the interplay between helium atoms and vacancies in high-temperature tungsten[2]. This model is now being used to study plasma-surface interactions in nuclear fusion reactors for energy production. The high-accuracy of SNAP potentials comes at the price of increased computational cost per atom and increased computational complexity. The increased cost is mitigated by improvements in strong scaling that can be achieved using advanced algorithms [3].

More Details

TYPE Other Report YEAR 2017

DOI OSTI

Applications of Compact Batched Kernels

Rajamanickam, Sivasankaran; Bradley, Andrew M.; Deveci, Mehmet; Kim, Kyungjoo; Trott, Christian R.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2017

OSTI

Kokkoskernels

Rajamanickam, Sivasankaran; Bradley, Andrew M.; Deveci, Mehmet; Kim, Kyungjoo; Trott, Christian R.

Abstract not provided.

More Details

TYPE Presentation YEAR 2017

OSTI

KokkosKernels: Performance-Portable Sparse Dense and Graph Kernels

Rajamanickam, Sivasankaran; Bradley, Andrew M.; Deveci, Mehmet; Hoemmen, Mark F.; Hammond, Simon; Kim, Kyungjoo; Trott, Christian R.

Abstract not provided.

More Details

TYPE Presentation YEAR 2017

OSTI

Kokkos Tutorial

Edwards, Harold C.; Trott, Christian R.; Foertter, Fernanda

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2017

OSTI

1541 L2 Milestone: Thread Scalable Expression Assembly in Aria

Clausen, Jonathan; Brunini, Victor; Forster, Christopher J.; Noble, David R.; Trott, Christian R.; Hammond, Simon; Hoemmen, Mark F.; Lin, Paul T.

Abstract not provided.

More Details

TYPE Presentation YEAR 2017

OSTI

Performance Portable Sparse Matrix Matrix Multiplication with Applications in Scientific Computing and Graph Analytics

Deveci, Mehmet; Trott, Christian R.; Hammond, Simon; Wolf, Michael; Berry, Jonathan; Rajamanickam, Sivasankaran

Abstract not provided.

More Details

TYPE Presentation YEAR 2017

OSTI

Solving the performance portability issue with Kokkos

Trott, Christian R.; Plimpton, Steven J.; Thompson, Aidan P.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2017

OSTI

On the Importance of Faster Atomics

Hammond, Simon; Trott, Christian R.; Edwards, Harold C.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2017

OSTI

Revisiting Online Autotuning for Sparse-Matrix Vector Multiplication Kernels on Next-Generation Architectures

Garcia De Gonzalo, Simon; Hammond, Simon; Trott, Christian R.; Huw, Wen-Mei

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2017

OSTI

Prototyping the Next Generation of Aria

Clausen, Jonathan; Brunini, Victor; Forster, Christopher J.; Noble, David R.; Trott, Christian R.; Hammond, Simon; Hoemmen, Mark F.; Lin, Paul T.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2017

OSTI

A Classical MD Primer

Trott, Christian R.

Abstract not provided.

More Details

TYPE Presentation YEAR 2017

OSTI

Performance-portable sparse matrix-matrix multiplication for many-core architectures

Proceedings - 2017 IEEE 31st International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2017

Deveci, Mehmet; Trott, Christian R.; Rajamanickam, Sivasankaran

We consider the problem of writing performance portablesparse matrix-sparse matrix multiplication (SPGEMM) kernelfor many-core architectures. We approach the SPGEMMkernel from the perspectives of algorithm design and implementation, and its practical usage. First, we design ahierarchical, memory-efficient SPGEMM algorithm. We thendesign and implement thread scalable data structures thatenable us to develop a portable SPGEMM implementation. We show that the method achieves performance portabilityon massively threaded architectures, namely Intel's KnightsLanding processors (KNLs) and NVIDIA's Graphic ProcessingUnits (GPUs), by comparing its performance to specializedimplementations. Second, we study an important aspectof SPGEMM's usage in practice by reusing the structure ofinput matrices, and show speedups up to 3× compared to thebest specialized implementation on KNLs. We demonstratethat the portable method outperforms 4 native methods on2 different GPU architectures (up to 17× speedup), and it ishighly thread scalable on KNLs, in which it obtains 101× speedup on 256 threads.

More Details

TYPE Conference Poster YEAR 2017

DOI OSTI Scopus