Page 4 – Center for Computing Research (CCR)

Architectures with multiple classes of memory media are becoming a common part of mainstream supercomputer deployments. So-called multi-level memories offer differing characteristics for each memory component, including variation in bandwidth, latency and capacity. This paper investigates the performance of sparse matrix multiplication kernels on two leading high-performance computing architectures: Intel's Knights Landing (KNL) processor and NVIDIA's Pascal GPU. We describe a data placement method and a chunking-based algorithm for our kernels that exploit the existence of the multiple memory spaces in each hardware platform. We evaluate the performance of these methods against standard algorithms that use the auto-caching mechanisms. Our results show that standard algorithms that exploit cache reuse perform as well as multi-memory-aware algorithms on architectures such as KNLs, where the memory subsystems have similar latencies. However, on architectures such as GPUs, where memory subsystems differ significantly in both bandwidth and latency, multi-memory-aware methods are crucial for good performance. In addition, our new approaches permit the user to run problems that require larger capacities than the fastest memory of each compute node without depending on the software-managed cache mechanisms.
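The chunking idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' algorithm: the dict-of-rows sparse format, the names `chunk_rows` and `spgemm_chunked`, and the `capacity` parameter (a nonzero budget standing in for the size of the fast memory) are all assumptions made for illustration. The sketch splits the rows of B into chunks that fit the budget, stages one chunk at a time, and accumulates partial products into C.

```python
def chunk_rows(b, capacity):
    """Split the rows of sparse matrix b ({row: {col: val}}) into chunks.

    Each chunk holds at most `capacity` nonzeros, except that a single row
    larger than `capacity` still gets a chunk of its own.
    """
    chunks, current, used = [], {}, 0
    for k, row in b.items():
        if current and used + len(row) > capacity:
            chunks.append(current)
            current, used = {}, 0
        # The dict copy stands in for staging the row in fast memory.
        current[k] = dict(row)
        used += len(row)
    if current:
        chunks.append(current)
    return chunks


def spgemm_chunked(a, b, capacity):
    """Compute C = A * B row by row, one staged chunk of B at a time."""
    c = {}
    for fast_b in chunk_rows(b, capacity):
        for i, a_row in a.items():
            for k, a_ik in a_row.items():
                b_row = fast_b.get(k)
                if b_row is None:
                    continue  # row k of B belongs to another chunk
                c_row = c.setdefault(i, {})
                for j, b_kj in b_row.items():
                    c_row[j] = c_row.get(j, 0.0) + a_ik * b_kj
    return c


# Hypothetical 2x3 and 3x2 inputs used only to exercise the sketch.
A = {0: {0: 1.0, 2: 2.0}, 1: {1: 3.0}}
B = {0: {0: 4.0}, 1: {1: 5.0}, 2: {0: 6.0, 1: 7.0}}
C = spgemm_chunked(A, B, capacity=2)
```

Because each chunk contributes an independent partial product, the result does not depend on the chunk size; only the amount of data that must be resident in fast memory at once changes.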

TYPE Other Report YEAR 2018

OSTI DOI

Multi-threaded Sparse Matrix Matrix Multiplication with Applications in Scientific Computing and Graph Analytics

Deveci, Mehmet D.; Wolf, Michael W.; Berry, Jonathan W.; Rajamanickam, Sivasankaran R.; Boman, Erik G.; Trott, Christian R.; Hammond, Simon D.; Olivier, Stephen L.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

Vector-friendly Batched BLAS and LAPACK Kernels : Design and Applications

Rajamanickam, Sivasankaran R.; Kim, Kyungjoo K.; Bradley, Andrew M.; Deveci, Mehmet D.; Trott, Christian R.; Hammond, Simon D.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

ECP Hardware and Integration - Hardware Evaluation All Hands

Hammond, Simon D.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

Enhanced Profiling for Kokkos Applications

Hammond, Simon D.; Trott, Christian R.; Ibanez-Granados, Daniel A.; Edwards, Harold C.; Sunderland, Daniel S.; Ellingwood, Nathan D.; Brandt, James M.; Gentile, Ann C.; Cook, Jeanine C.; Hoekstra, Robert J.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

Continuous Performance Tracking for Kokkos Applications Using LDMS

Brandt, James M.; Hammond, Simon D.; Tucker, Thomas T.; Gentile, Ann C.; Cook, Jeanine C.

Abstract not provided.

TYPE Presentation YEAR 2018

OSTI

Threaded Assembly in Aria Expressions

Clausen, Jonathan C.; Brunini, Victor B.; Forster, Chris F.; Noble, David R.; Hoemmen, Mark F.; Hammond, Simon D.; Trott, Christian R.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

Interconnect Working Group

Hemmert, Karl S.; Bair, Ray B.; Bhatele, Abhinav B.; Groves, Taylor G.; Hammond, Simon D.; Jain, Nikhil J.; Levenhagen, Michael J.; Mubarak, Misbah M.; Pakin, Scott P.; Ross, Rob R.; Wilke, Jeremiah J.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

SST Simulation Framework (and Complex Memory)

Hammond, Simon D.; Hughes, Clayton H.; Awad, Amro A.; Voskuilen, Gwendolyn R.; Rodrigues, Arun; Hemmert, Karl S.; Levenhagen, Michael J.; Hoekstra, Robert J.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

Analyzing Exascale Memory Architectures Using the SST Toolkit

Hughes, Clayton H.; Awad, Amro A.; Hammond, Simon D.; Rodrigues, Arun; Hemmert, Karl S.; Hoekstra, Robert J.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

Sandia ATDM Performance Execution Tools & Analysis

Hammond, Simon D.; Vaughan, Courtenay T.; Dinge, Dennis D.; Lin, Paul L.; Benner, R.E.; Hughes, Clayton H.; Trott, Christian R.; Cook, Jeanine C.; Hoekstra, Robert J.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

Towards a Scalable Integrated Simulation Framework for Extreme Heterogeneity in High Performance Computing

Hammond, Simon D.; Rodrigues, Arun; Hemmert, Karl S.; Voskuilen, Gwendolyn R.; Hughes, Clayton H.; Levenhagen, Michael J.; Hoekstra, Robert J.; Ang, James A.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

NALU Engineering Application Overview

Hammond, Simon D.; Hoekstra, Robert J.; Rodrigues, Arun; Ang, James A.

Abstract not provided.

TYPE Presentation YEAR 2018

OSTI

Designing vector-friendly compact BLAS and LAPACK kernels

Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2017

Kim, Kyungjoo K.; Costa, Timothy B.; Deveci, Mehmet D.; Bradley, Andrew M.; Hammond, Simon D.; Guney, Murat E.; Knepper, Sarah; Story, Shane; Rajamanickam, Sivasankaran R.

Many applications, such as PDE-based simulations and machine learning, apply BLAS/LAPACK routines to large groups of small matrices. While existing batched BLAS APIs provide meaningful speedup for this problem type, a non-canonical data layout enabling cross-matrix vectorization may provide further significant speedup. In this paper, we propose a new compact data layout that interleaves matrices in blocks according to the SIMD vector length. We combine this compact data layout with a new interface to BLAS/LAPACK routines that can be used within a hierarchical parallel application. Our layout provides up to 14x, 45x, and 27x speedup against OpenMP loops around optimized DGEMM, DTRSM and DGETRF kernels, respectively, on the Intel Knights Landing architecture. We discuss the compact batched BLAS/LAPACK implementations in two libraries, KokkosKernels and Intel® Math Kernel Library. We demonstrate the APIs in a line solver for coupled PDEs. Finally, we present a detailed performance analysis of our kernels.
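The interleaved layout described in the abstract above can be sketched as follows. This is an illustration under stated assumptions, not the KokkosKernels or Intel MKL interface: the function names `pack_compact` and `compact_gemm`, the row-major ordering inside each pack, and the zero padding of the last pack are all hypothetical choices. Element (i, j) of `vlen` consecutive matrices is stored contiguously, so the innermost loop runs with unit stride across matrices, which is the access pattern a SIMD unit would handle in one instruction.

```python
def pack_compact(mats, vlen):
    """Interleave equally sized matrices in packs of `vlen` (zero padded)."""
    rows, cols = len(mats[0]), len(mats[0][0])
    npacks = (len(mats) + vlen - 1) // vlen
    out = [0.0] * (npacks * rows * cols * vlen)
    for b, mat in enumerate(mats):
        pack, lane = divmod(b, vlen)
        for i in range(rows):
            for j in range(cols):
                out[((pack * rows + i) * cols + j) * vlen + lane] = mat[i][j]
    return out


def compact_gemm(a, b, m, k, n, npacks, vlen):
    """C = A * B for every matrix in the batch, on compact buffers."""
    c = [0.0] * (npacks * m * n * vlen)
    for pack in range(npacks):
        for i in range(m):
            for j in range(n):
                c_off = ((pack * m + i) * n + j) * vlen
                for p in range(k):
                    a_off = ((pack * m + i) * k + p) * vlen
                    b_off = ((pack * k + p) * n + j) * vlen
                    # Lane loop: one entry per matrix, contiguous in memory.
                    for lane in range(vlen):
                        c[c_off + lane] += a[a_off + lane] * b[b_off + lane]
    return c


def matmul(x, y):
    """Plain per-matrix reference product used to check the sketch."""
    return [[sum(x[i][p] * y[p][j] for p in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


# Hypothetical batch of three 2x2 matrices with a vector length of 2.
batch_a = [[[1.0, 2.0], [3.0, 4.0]],
           [[5.0, 6.0], [7.0, 8.0]],
           [[9.0, 10.0], [11.0, 12.0]]]
batch_b = [[[1.0, 0.0], [0.0, 1.0]],
           [[2.0, 1.0], [1.0, 2.0]],
           [[0.0, 1.0], [1.0, 0.0]]]
packed_a = pack_compact(batch_a, 2)
packed_b = pack_compact(batch_b, 2)
packed_c = compact_gemm(packed_a, packed_b, 2, 2, 2, npacks=2, vlen=2)
```

In a compiled implementation the lane loop would typically be replaced by a single vector operation, so each small matrix product is vectorized across the batch rather than within one matrix.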

TYPE Conference Poster YEAR 2017

Scopus OSTI DOI

Towards an Open Source Eco-System for Future HPC Designs

Hammond, Simon D.

Abstract not provided.

TYPE Conference Poster YEAR 2017

OSTI
