Publications Search

Architectures with multiple classes of memory media are becoming a common part of mainstream supercomputer deployments. So-called multi-level memories offer differing characteristics for each memory component, including variation in bandwidth, latency and capacity. This paper investigates the performance of sparse matrix multiplication kernels on two leading high-performance computing architectures: Intel's Knights Landing (KNL) processor and NVIDIA's Pascal GPU. We describe a data placement method and a chunking-based algorithm for our kernels that exploit the existence of the multiple memory spaces in each hardware platform. We evaluate the performance of these methods against standard algorithms that use the auto-caching mechanisms. Our results show that standard algorithms that exploit cache reuse performed as well as multi-memory-aware algorithms for architectures such as KNLs, where the memory subsystems have similar latencies. However, for architectures such as GPUs, where memory subsystems differ significantly in both bandwidth and latency, multi-memory-aware methods are crucial for good performance. In addition, our new approaches permit the user to run problems that require larger capacities than the fastest memory of each compute node without depending on the software-managed cache mechanisms.

TYPE Other Report YEAR 2018

DOI OSTI
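
The chunking idea in the abstract above can be illustrated with a minimal C++ sketch. It is not the authors' implementation: the `Csr` struct, the `chunked_spgemm` function, and the `chunk_rows` parameter are hypothetical names introduced here, and the staging vectors only stand in for an explicit copy into a fast memory space such as on-package or device memory. The sketch computes C = A * B by streaming B through the staging buffer a block of rows at a time, so only one block of B needs to fit in the fast space.

```cpp
#include <algorithm>
#include <map>
#include <vector>

// Compressed sparse row (CSR) matrix. Hypothetical helper type for this sketch.
struct Csr {
    int rows = 0;
    int cols = 0;
    std::vector<int> row_ptr;   // size rows + 1
    std::vector<int> col_idx;   // size nnz
    std::vector<double> vals;   // size nnz
};

// One ordered map per output row stands in for the sparse result C.
using SparseRows = std::vector<std::map<int, double>>;

// C = A * B, processing B in chunks of `chunk_rows` rows.
// `chunk_rows` is an assumed knob that models the fast-memory capacity.
SparseRows chunked_spgemm(const Csr& A, const Csr& B, int chunk_rows) {
    chunk_rows = std::max(1, chunk_rows);
    SparseRows C(A.rows);
    for (int k0 = 0; k0 < B.rows; k0 += chunk_rows) {
        const int k1 = std::min(B.rows, k0 + chunk_rows);
        // Stage rows [k0, k1) of B. On real hardware this copy would
        // target the fast memory space instead of ordinary vectors.
        const int base = B.row_ptr[k0];
        const int end = B.row_ptr[k1];
        std::vector<int> fast_cols(B.col_idx.begin() + base,
                                   B.col_idx.begin() + end);
        std::vector<double> fast_vals(B.vals.begin() + base,
                                      B.vals.begin() + end);
        // Accumulate the partial products that touch the staged chunk.
        for (int i = 0; i < A.rows; ++i) {
            for (int p = A.row_ptr[i]; p < A.row_ptr[i + 1]; ++p) {
                const int k = A.col_idx[p];
                if (k < k0 || k >= k1) continue;  // row k of B is not staged
                for (int q = B.row_ptr[k]; q < B.row_ptr[k + 1]; ++q) {
                    C[i][fast_cols[q - base]] += A.vals[p] * fast_vals[q - base];
                }
            }
        }
    }
    return C;
}
```

Because each chunk contributes an independent partial sum, the result does not depend on `chunk_rows`; only the size of the staged working set changes. A practical kernel would also avoid rescanning all of A for every chunk, which this sketch does for brevity.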

ECP Hardware and Integration - Hardware Evaluation All Hands

Hammond, Simon

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

Multi-threaded Sparse Matrix Matrix Multiplication with Applications in Scientific Computing and Graph Analytics

Deveci, Mehmet; Wolf, Michael; Berry, Jonathan; Rajamanickam, Sivasankaran; Boman, Erik G.; Trott, Christian R.; Hammond, Simon; Olivier, Stephen L.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

Vector-friendly Batched BLAS and LAPACK Kernels: Design and Applications

Rajamanickam, Sivasankaran; Kim, Kyungjoo; Bradley, Andrew M.; Deveci, Mehmet; Trott, Christian R.; Hammond, Simon

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

Continuous Performance Tracking for Kokkos Applications Using LDMS

Brandt, James M.; Hammond, Simon; Tucker, Thomas; Gentile, Ann C.; Cook, Jeanine

Abstract not provided.

TYPE Presentation YEAR 2018

OSTI

Interconnect Working Group

Hemmert, Karl S.; Bair, Ray; Bhatele, Abhinav; Groves, Taylor; Hammond, Simon; Jain, Nikhil; Levenhagen, Michael; Mubarak, Misbah; Pakin, Scott; Ross, Rob; Wilke, Jeremiah

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

Analyzing Exascale Memory Architectures Using the SST Toolkit

Hughes, Clayton; Awad, Amro; Hammond, Simon; Rodrigues, Arun; Hemmert, Karl S.; Hoekstra, Robert J.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

SST Simulation Framework (and Complex Memory)

Hammond, Simon; Hughes, Clayton; Awad, Amro; Voskuilen, Gwendolyn R.; Rodrigues, Arun; Hemmert, Karl S.; Levenhagen, Michael; Hoekstra, Robert J.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

Enhanced Profiling for Kokkos Applications

Hammond, Simon; Trott, Christian R.; Ibanez-Granados, Daniel A.; Edwards, Harold C.; Sunderland, Daniel; Ellingwood, Nathan D.; Brandt, James M.; Gentile, Ann C.; Cook, Jeanine; Hoekstra, Robert J.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

Threaded Assembly in Aria Expressions

Clausen, Jonathan; Brunini, Victor; Forster, Chris; Noble, David R.; Hoemmen, Mark F.; Hammond, Simon; Trott, Christian R.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

Sandia ATDM Performance Execution Tools & Analysis

Hammond, Simon; Vaughan, Courtenay T.; Dinge, Dennis; Lin, Paul T.; Benner, Robert E.; Hughes, Clayton; Trott, Christian R.; Cook, Jeanine; Hoekstra, Robert J.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

Profiling and Debugging Support for the Kokkos Programming Model

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Hammond, Simon; Trott, Christian R.; Ibanez-Granados, Daniel A.; Sunderland, Daniel

Supercomputing hardware is undergoing a period of significant change. In order to cope with the rapid pace of hardware and, in many cases, programming model innovation, we have developed the Kokkos Programming Model, a C++-based abstraction that permits performance portability across diverse architectures. Our experience has shown that the abstractions developed can significantly frustrate debugging and profiling activities because they break expected code proximity and layout assumptions. In this paper we present the Kokkos Profiling interface, a lightweight suite of hooks to which debugging and profiling tools can attach to gain deep insights into the execution and data structure behaviors of parallel programs written to the Kokkos interface.

TYPE Conference Poster YEAR 2018

DOI OSTI Scopus
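
As an illustration of the hook-based design described in the profiling abstract above, the following C++ sketch shows what a small tool attached to those hooks might look like. The `kokkosp_*` entry points follow the callback naming used by Kokkos Tools, but the exact signatures should be treated as assumptions to check against the installed Kokkos version. The `g_launches` and `g_next_id` globals are hypothetical names for this sketch. The tool counts how often each named `parallel_for` kernel is launched.

```cpp
#include <cstdint>
#include <cstdio>
#include <map>
#include <string>

// Per-kernel launch counts, keyed by the label the application passed to Kokkos.
static std::map<std::string, std::uint64_t> g_launches;
// Monotonic identifier handed back to the runtime for each kernel launch.
static std::uint64_t g_next_id = 0;

// Called once when the runtime loads the tool (assumed signature).
extern "C" void kokkosp_init_library(const int /*loadSeq*/,
                                     const std::uint64_t /*interfaceVer*/,
                                     const std::uint32_t /*devInfoCount*/,
                                     void* /*deviceInfo*/) {
    g_launches.clear();
    g_next_id = 0;
}

// Called before each parallel_for; the tool returns an id through kID.
extern "C" void kokkosp_begin_parallel_for(const char* name,
                                           const std::uint32_t /*devID*/,
                                           std::uint64_t* kID) {
    *kID = g_next_id++;
    ++g_launches[name ? name : "(unnamed)"];
}

// Called after the matching parallel_for completes; nothing to do here.
extern "C" void kokkosp_end_parallel_for(const std::uint64_t /*kID*/) {}

// Called at shutdown; print one line per kernel label.
extern "C" void kokkosp_finalize_library() {
    for (const auto& entry : g_launches) {
        std::printf("%s: %llu launches\n", entry.first.c_str(),
                    static_cast<unsigned long long>(entry.second));
    }
}
```

A tool like this is typically compiled as a shared library and selected at run time through an environment variable, so the application itself needs no recompilation. The variable name has changed across Kokkos releases and should be confirmed in the documentation for the version in use.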