Publications

Results 1–25 of 32

Kokkos 3: Programming Model Extensions for the Exascale Era

IEEE Transactions on Parallel and Distributed Systems

Trott, Christian R.; Lebrun-Grandie, Damien; Arndt, Daniel; Ciesko, Jan; Dang, Vinh Q.; Ellingwood, Nathan D.; Gayatri, Rahulkumar; Harvey, Evan C.; Hollman, Daisy S.; Ibanez, Dan; Liber, Nevin; Madsen, Jonathan; Miles, Jeff; Poliakoff, David Z.; Powell, Amy J.; Rajamanickam, Sivasankaran R.; Simberg, Mikael; Sunderland, Dan; Turcksin, Bruno; Wilke, Jeremiah

As the push towards exascale hardware has increased the diversity of system architectures, performance portability has become a critical aspect for scientific software. We describe the Kokkos Performance Portable Programming Model that allows developers to write single source applications for diverse high-performance computing architectures. Kokkos provides key abstractions for both the compute and memory hierarchy of modern hardware. We describe the novel abstractions that have been added to Kokkos version 3 such as hierarchical parallelism, containers, task graphs, and arbitrary-sized atomic operations to prepare for exascale era architectures. We demonstrate the performance of these new features with reproducible benchmarks on CPUs and GPUs.

TYPE Journal Article YEAR 2022

Scopus OSTI DOI

Performance Portability of an SpMV Kernel Across Scientific Computing and Data Science Applications

Olivier, Stephen L.; Ellingwood, Nathan D.; Berry, Jonathan W.; Dunlavy, Daniel D.

Abstract not provided.

TYPE Conference Presentation YEAR 2021

OSTI DOI

Kokkos Kernels 3.4

Rajamanickam, Sivasankaran R.; Berger-Vergiat, Luc B.; Dang, Vinh Q.; Ellingwood, Nathan D.; Harvey, Evan C.; Kelley, Brian M.; Trott, Christian R.

Abstract not provided.

TYPE Presentation YEAR 2021

OSTI

Kokkos Kernels: FY20 update

Berger-Vergiat, Luc B.; Rajamanickam, Sivasankaran R.; Dang, Vinh Q.; Ellingwood, Nathan D.; Kelley, Brian M.; Harvey, Evan C.; Wilke, Jeremiah J.; Acer, Seher A.

Abstract not provided.

TYPE Conference Presentation YEAR 2021

OSTI DOI

Kokkos Kernels: FY20 update

Rajamanickam, Sivasankaran R.; Berger-Vergiat, Luc B.; Acer, Seher A.; Dang, Vinh Q.; Ellingwood, Nathan D.; Harvey, Evan C.; Kelley, Brian M.; Wilke, Jeremiah J.

Abstract not provided.

TYPE Conference Presentation YEAR 2021

OSTI DOI

Performance Portability of an SpMV Kernel Across Scientific Computing and Data Science Applications

2021 IEEE High Performance Extreme Computing Conference, HPEC 2021

Olivier, Stephen L.; Ellingwood, Nathan D.; Berry, Jonathan W.; Dunlavy, Daniel D.

Both the data science and scientific computing communities are embracing GPU acceleration for their most demanding workloads. For scientific computing applications, the massive volume of code and diversity of hardware platforms at supercomputing centers has motivated a strong effort toward performance portability. This property of a program, denoting its ability to perform well on multiple architectures and varied datasets, is heavily dependent on the choice of parallel programming model and which features of the programming model are used. In this paper, we evaluate performance portability in the context of a data science workload in contrast to a scientific computing workload, evaluating the same sparse matrix kernel on both. Among our implementations of the kernel in different performance-portable programming models, we find that many struggle to consistently achieve performance improvements using the GPU compared to simple one-line OpenMP parallelization on high-end multicore CPUs. We show one that does, and its performance approaches and sometimes even matches that of vendor-provided GPU math libraries.

TYPE Conference Paper YEAR 2021

Scopus OSTI DOI

Performance Portable Supernode-based Sparse Triangular Solver for Manycore Architectures

ACM International Conference Proceeding Series

Yamazaki, Ichitaro Y.; Rajamanickam, Sivasankaran R.; Ellingwood, Nathan D.

The sparse triangular solver is an important kernel in many computational applications. However, a fast, parallel, sparse triangular solver on a manycore architecture such as a GPU has been an open issue in the field for several years. In this paper, we develop a sparse triangular solver that takes advantage of the supernodal structures of the triangular matrices that come from the direct factorization of a sparse matrix. We implemented our solver using Kokkos and Kokkos Kernels such that our solver is portable to different manycore architectures. This has the additional benefit of allowing our triangular solver to use the team-level kernels and take advantage of the hierarchical parallelism available on the GPU. We compare the effects of different scheduling schemes on the performance and also investigate an algorithmic variant called the partitioned inverse. Our performance results on an NVIDIA V100 or P100 GPU demonstrate that our implementation can be 12.4× or 19.5× faster than the vendor-optimized implementation in NVIDIA's cuSPARSE library.

TYPE Conference Poster YEAR 2020

Scopus OSTI

Performance Portable Supernode-based Sparse Triangular Solver for Manycore Architecture

Yamazaki, Ichitaro Y.; Rajamanickam, Sivasankaran R.; Ellingwood, Nathan D.

Abstract not provided.

TYPE Conference Poster YEAR 2020

OSTI

Supernode-based Sparse Triangular Solver using Kokkos

Yamazaki, Ichitaro Y.; Rajamanickam, Sivasankaran R.; Ellingwood, Nathan D.

Abstract not provided.

TYPE Conference Poster YEAR 2020

OSTI

Practices and Challenges of Software Development for a Performance Portable Ecosystem

Ellingwood, Nathan D.; Rajamanickam, Sivasankaran R.

Abstract not provided.

TYPE Conference Poster YEAR 2020

OSTI

Kokkos Kernels

Rajamanickam, Sivasankaran R.; Acer, Seher A.; Berger-Vergiat, Luc B.; Dang, Vinh Q.; Ellingwood, Nathan D.; Kelley, Brian M.; Kim, Kyungjoo K.; Trott, Christian R.; Wilke, Jeremiah J.

Abstract not provided.

TYPE Conference Poster YEAR 2020

OSTI

Batched Linear Algebra in Kokkos Kernels

Rajamanickam, Sivasankaran R.; Berger-Vergiat, Luc B.; Dang, Vinh Q.; Ellingwood, Nathan D.; Kim, Kyungjoo K.; McLendon, William C.; Trott, Christian R.; Wilke, Jeremiah J.

Abstract not provided.

TYPE Conference Poster YEAR 2019

OSTI

Kokkos User Group: Code Deprecations for version 3.0

Ellingwood, Nathan D.; Trott, Christian R.

Abstract not provided.

TYPE Presentation YEAR 2019

OSTI

Kokkos Kernels

Rajamanickam, Sivasankaran R.; Berger-Vergiat, Luc B.; Dang, Vinh Q.; Ellingwood, Nathan D.; Kim, Kyungjoo K.; Trott, Christian R.; Wilke, Jason W.; McLendon, William C.

Abstract not provided.

TYPE Conference Poster YEAR 2019

OSTI

Kokkos Kernels

Rajamanickam, Sivasankaran R.; Berger-Vergiat, Luc B.; Dang, Vinh Q.; Ellingwood, Nathan D.; Kim, Kyungjoo K.; McLendon, William C.; Trott, Christian R.; Wilke, Jeremiah J.

Abstract not provided.

TYPE Presentation YEAR 2019

OSTI

STPR 04 Milestone 6 Report

Trott, Christian R.; Ibanez-Granados, Daniel A.; Ellingwood, Nathan D.; Bova, S.W.; Labreche, Duane A.

This report documents the completion of milestone STPRO4-6 Kokkos Support for ASC applications and libraries. The team provided consultation and support for numerous ASC code projects including Sandia's SPARC, EMPIRE, Aria, GEMMA, Alexa, Trilinos, LAMMPS and nimbleSM. Over the year, more than 350 Kokkos GitHub issues were resolved, with over 220 requiring fixes and enhancements to the code base. Resolving these requests, many of them issued by ASC code teams, provided applications with the necessary capabilities in Kokkos to be successful.

TYPE Other Report YEAR 2018

OSTI DOI

KokkosKernels Overview

Rajamanickam, Sivasankaran R.; Deveci, Mehmet D.; Kim, Kyungjoo K.; Ellingwood, Nathan D.; Trott, Christian R.; Hu, Jonathan J.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

Lessons Learned: Experiences from Introducing the Kokkos Programming Model into Legacy Codes

Ellingwood, Nathan D.; Trott, Christian R.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

Intrepid2: a Performance-Portable Package for Compatible High-Order Finite Element Discretizations

Kim, Kyungjoo K.; Perego, Mauro P.; Ellingwood, Nathan D.; Peterson, Kara J.; Roberts, Nathan V.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

Enhanced Profiling for Kokkos Applications

Hammond, Simon D.; Trott, Christian R.; Ibanez-Granados, Daniel A.; Edwards, Harold C.; Sunderland, Daniel S.; Ellingwood, Nathan D.; Brandt, James M.; Gentile, Ann C.; Cook, Jeanine C.; Hoekstra, Robert J.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

Kokkos Programming Model

Trott, Christian R.; Edwards, Harold C.; Ibanez-Granados, Daniel A.; Sunderland, Daniel S.; Bova, S.W.; Labreche, Duane A.; Ellingwood, Nathan D.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

Using the Basker Linear Solvers in Xyce

Thornquist, Heidi K.; Mei, Ting M.; Rajamanickam, Sivasankaran R.; Ellingwood, Nathan D.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

The Kokkos Programming Model

Trott, Christian R.; Bova, S.W.; Ellingwood, Nathan D.; Ibanez-Granados, Daniel A.; Labreche, Duane A.; Sunderland, Daniel S.; Edwards, Harold C.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

Basker: Parallel sparse LU factorization utilizing hierarchical parallelism and data layouts

Parallel Computing

Booth, Joshua D.; Ellingwood, Nathan D.; Thornquist, Heidi K.; Rajamanickam, Sivasankaran

Transient simulation in circuit simulation tools, such as SPICE and Xyce, depends on scalable and robust sparse LU factorizations for efficient numerical simulation of circuits and power grids. As the need for simulations of very large circuits grows, the prevalence of multicore architectures enables us to use shared memory parallel algorithms for such simulations. A parallel factorization is a critical component of such shared memory parallel simulations. We develop a parallel sparse factorization algorithm that can solve problems from circuit simulations efficiently and maps well to architectural features. This new factorization algorithm exposes hierarchical parallelism to accommodate the irregular structure that arises in our target problems. It also uses a hierarchical two-dimensional data layout which reduces synchronization costs and maps to the memory hierarchy found in multicore processors. We present an OpenMP-based implementation of the parallel algorithm in a new multithreaded solver called Basker in the Trilinos framework. We present performance evaluations of Basker on the Intel SandyBridge and Xeon Phi platforms using circuit and power grid matrices taken from the University of Florida sparse matrix collection and from Xyce circuit simulation. Basker achieves a geometric mean speedup of 5.91× on CPU (16 cores) and 7.4× on Xeon Phi (32 cores) relative to the state-of-the-art solver KLU. Basker outperforms the Intel MKL Pardiso solver (PMKL) by as much as 30× on CPU (16 cores) and 7.5× on Xeon Phi (32 cores) for low fill-in circuit matrices. Furthermore, Basker provides a 5.4× speedup on a challenging matrix sequence taken from an actual Xyce simulation.

TYPE Journal Article YEAR 2017

Scopus OSTI DOI

Intrepid2: Performance Portable Finite Element Discretization Library

Kim, Kyungjoo K.; Perego, Mauro P.; Ellingwood, Nathan D.

Abstract not provided.

TYPE Conference Poster YEAR 2017

OSTI
