Publications

Results 1–50 of 61

TChem v3.0: A Software Toolkit for the Analysis of Complex Kinetic Models

Safta, Cosmin S.; Najm, H.N.; Diaz-Ibarra, Oscar H.; Kim, Kyungjoo K.

The TChem open-source software is a toolkit for computing thermodynamic properties, source terms, and the Jacobian matrix of the source terms for chemical kinetic models that involve gas and surface reactions.

TYPE SAND Report YEAR 2021

OSTI DOI

CSPlib - A Toolkit for the Analysis of ODE/DAE Dynamical Systems and Chemical Kinetic Models

Diaz-Ibarra, Oscar H.; Kim, Kyungjoo K.; Safta, Cosmin S.; Najm, H.N.

CSPlib is an open source software library for analyzing general ordinary differential equation (ODE) systems and detailed chemical kinetic ODE/DAE systems. It relies on the computational singular perturbation (CSP) method for the analysis of these systems.

TYPE SAND Report YEAR 2021

OSTI DOI

Exascale Catalytic Chemistry (ECC)

Najm, H.N.; Sargsyan, Khachik S.; Kim, Kyungjoo K.; Goldsmith, C.F.; West, Richard H.; Bylaska, Eric J.; Bross, David H.; Ruscic, Branko R.; Safta, Cosmin S.; Zador, Judit Z.; Blais, Christopher B.; Blöndal, Katrín B.; Diaz-Ibarra, Oscar H.; Hermes, Eric H.; Mazeau, Emily J.

Abstract not provided.

TYPE Conference Poster YEAR 2021

OSTI DOI

AUTOMATIC GENERATION AND ANALYSIS OF MICROKINETIC MODELS

Blais, Christopher B.; Diaz-Ibarra, Oscar H.; Mazeau, Emily J.; Gierada, Maciej G.; Hermes, Eric H.; Safta, Cosmin S.; Kim, Kyungjoo K.; Najm, H.N.; Bylaska, Eric J.; Zador, Judit Z.; Goldsmith, C.F.; West, Richard H.

Abstract not provided.

TYPE Conference Presentation YEAR 2021

OSTI DOI

TChem - An Open Source Computational Chemistry Software Library for Heterogeneous Computing Platforms

Safta, Cosmin S.; Diaz-Ibarra, Oscar H.; Kim, Kyungjoo K.; Najm, H.N.

Abstract not provided.

TYPE Conference Presentation YEAR 2021

OSTI DOI

TChem - An Open Source Computational Chemistry Software Library for Heterogeneous Computing Platforms

Safta, Cosmin S.; Diaz-Ibarra, Oscar H.; Najm, H.N.; Kim, Kyungjoo K.

Abstract not provided.

TYPE Conference Paper YEAR 2021

OSTI

Removal of the UVM Requirement from Tpetra: MultiVector and BlockMultiVector

Devine, Karen D.; Danielson, Geoffrey C.; Fuller, Timothy J.; Hu, Jonathan J.; Kelley, Brian M.; Kim, Kyungjoo K.; Siefert, Christopher S.; Smith, Timothy A.

Abstract not provided.

TYPE Presentation YEAR 2021

OSTI

Using computational singular perturbation as a diagnostic tool in ODE and DAE systems: a case study in heterogeneous catalysis

Combustion Theory and Modelling

Diaz-Ibarra, Oscar H.; Kim, Kyungjoo K.; Safta, Cosmin S.; Zador, Judit Z.; Najm, H.N.

We have extended the computational singular perturbation (CSP) method to differential algebraic equation (DAE) systems and demonstrated its application in a heterogeneous-catalysis problem. The extended method obtains the CSP basis vectors for DAEs from a reduced Jacobian matrix that takes the algebraic constraints into account. We use a canonical problem in heterogeneous catalysis, the transient continuous stirred tank reactor (T-CSTR), for illustration. The T-CSTR problem is modelled fundamentally as an ordinary differential equation (ODE) system, but it can be transformed to a DAE system if one approximates typically fast surface processes using algebraic constraints for the surface species. We demonstrate the application of CSP analysis for both ODE and DAE constructions of a T-CSTR problem, illustrating the dynamical response of the system in each case. We also highlight the utility of the analysis in commenting on the quality of any particular DAE approximation built using the quasi-steady state approximation (QSSA), relative to the ODE reference case.

TYPE Journal Article YEAR 2021

Scopus OSTI DOI
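
The "reduced Jacobian matrix that takes the algebraic constraints into account" in the abstract above can be illustrated with one standard construction for a semi-explicit DAE. This is only a sketch under assumptions: the symbols y (differential variables), z (algebraic variables, e.g. surface species under QSSA), f, and h are introduced here for illustration, the matrix h_z is assumed nonsingular, and the paper's exact formulation may differ. Linearizing the constraint gives \delta z = -h_z^{-1} h_y \delta y, which eliminates the algebraic variables from the linearized dynamics:

```latex
\begin{aligned}
\frac{dy}{dt} &= f(y, z), \qquad 0 = h(y, z), \\
\frac{d(\delta y)}{dt} &= \left( f_y - f_z\, h_z^{-1}\, h_y \right) \delta y
  \equiv J_{\mathrm{red}}\, \delta y .
\end{aligned}
```

Under these assumptions, the eigenvectors of J_red would play the role that the eigenvectors of the full Jacobian play in the ODE case.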

CSPlib - A Software Toolkit for the Analysis of Dynamical Systems and Chemical Kinetic Models

Diaz-Ibarra, Oscar H.; Kim, Kyungjoo K.; Safta, Cosmin S.; Najm, H.N.

CSPlib is an open source software library for analyzing general ordinary differential equation (ODE) systems and detailed chemical kinetic ODE systems. It relies on the computational singular perturbation (CSP) method for the analysis of these systems. The software provides support for: general ODE models (gODE model class), for computing source terms and Jacobians of a generic ODE system; the TChem model (ChemElemODETChem model class), for computing the source term, Jacobian, other necessary chemical reaction data, and the rates of progress for a homogeneous batch reactor using an elementary-step detailed chemical kinetic reaction mechanism (this class relies on the TChem [2] library); a set of functions to compute essential elements of CSP analysis (Kernel class), including the eigensolution of the Jacobian matrix, CSP basis vectors and co-vectors, time scales (reciprocals of the magnitudes of the Jacobian eigenvalues), mode amplitudes, CSP pointers, and the number of exhausted modes (this class relies on the Tines library); a set of functions to compute the eigensolution of the Jacobian matrix using the Tines library GPU eigensolver; and a set of functions to compute CSP indices (Index class), including participation indices and both slow and fast importance indices.

TYPE SAND Report YEAR 2021

OSTI DOI
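
Several of the quantities listed in the CSPlib abstract above (time scales, basis vectors and co-vectors, mode amplitudes) can be illustrated with plain NumPy. The sketch below does not use the CSPlib, TChem, or Tines APIs. The 2x2 Jacobian, the state `y`, and all variable names are hypothetical, and the basis vectors are taken to be the right eigenvectors of the Jacobian, which is the usual leading-order choice.

```python
import numpy as np

# Hypothetical stiff linear test system dy/dt = J y (illustration only).
J = np.array([[-1000.0, 0.0],
              [1.0, -1.0]])

eigvals, right_vecs = np.linalg.eig(J)

# Time scales: reciprocals of the magnitudes of the Jacobian eigenvalues.
time_scales = 1.0 / np.abs(eigvals)

# Order the modes from fastest (smallest time scale) to slowest.
order = np.argsort(time_scales)
time_scales = time_scales[order]

# Basis vectors a_i are the columns of `a` (right eigenvectors here);
# co-vectors b^i are the rows of `b`, chosen so that b @ a = I.
a = right_vecs[:, order]
b = np.linalg.inv(a)

# Mode amplitudes f^i = b^i . g, where g is the source term at state y.
y = np.array([1.0, 1.0])
g = J @ y
amplitudes = b @ g

print("time scales:", time_scales)
print("mode amplitudes:", amplitudes)
```

The source term is recovered as the sum of the modes, `a @ amplitudes`, which is a quick consistency check on the basis.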

Kokkos Kernels

Rajamanickam, Sivasankaran R.; Acer, Seher A.; Berger-Vergiat, Luc B.; Dang, Vinh Q.; Ellingwood, Nathan D.; Kelley, Brian M.; Kim, Kyungjoo K.; Trott, Christian R.; Wilke, Jeremiah J.

Abstract not provided.

TYPE Conference Poster YEAR 2020

OSTI

Load Balancing Eager K-truss on GPU and CPU via Fine-Grained Parallelism

Blanco, Mark P.; Low, Tze M.; Kim, Kyungjoo K.

Abstract not provided.

TYPE Conference Poster YEAR 2019

OSTI

Exploration of fine-grained parallelism for load balancing eager K-truss on GPU and CPU

2019 IEEE High Performance Extreme Computing Conference, HPEC 2019

Blanco, Mark; Low, Tze M.; Kim, Kyungjoo K.

In this work we present a performance exploration on Eager K-truss, a linear-algebraic formulation of the K-truss graph algorithm. We address performance issues related to load imbalance of parallel tasks in symmetric, triangular graphs by presenting a fine-grained parallel approach to executing the support computation. This approach also increases available parallelism, making it amenable to GPU execution. We demonstrate our fine-grained parallel approach using implementations in Kokkos and evaluate them on an Intel Skylake CPU and an Nvidia Tesla V100 GPU. Overall, we observe a 1.26-1.48x improvement on the CPU and a 9.97-16.92x improvement on the GPU due to our fine-grained parallel formulation.

TYPE Conference Poster YEAR 2019

Scopus OSTI DOI
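
The "support computation" named in the abstract above counts, for each edge, the triangles that contain it. The sketch below uses the textbook dense linear-algebraic form of that count and the standard K-truss definition (every edge lies in at least k - 2 triangles). It is not the Eager K-truss algorithm or its Kokkos implementation, and the function names and the example graph are hypothetical.

```python
import numpy as np

def edge_support(adj):
    """Return a matrix whose (i, j) entry is the number of triangles on edge (i, j)."""
    adj = np.asarray(adj, dtype=np.int64)
    # (adj @ adj)[i, j] counts common neighbours of i and j;
    # the elementwise mask keeps only pairs that are actual edges.
    return (adj @ adj) * adj

def k_truss(adj, k):
    """Repeatedly drop edges with support below k - 2 until none remain."""
    adj = np.asarray(adj, dtype=np.int64).copy()
    while True:
        support = edge_support(adj)
        weak = (adj == 1) & (support < k - 2)
        if not weak.any():
            return adj
        adj[weak] = 0

# Hypothetical example: a 4-clique on vertices 0..3 plus a pendant edge (3, 4).
adjacency = np.zeros((5, 5), dtype=np.int64)
for i, j in [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)]:
    adjacency[i, j] = adjacency[j, i] = 1

truss = k_truss(adjacency, 4)
print(truss)
```

In this dense form each edge's support is an independent entry of the product, which hints at why a finer-grained decomposition of the work exposes more parallelism than one task per row.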

Performance Modeling of Vectorized SNAP Inter-Atomic Potentials on CPU Architectures

Blanco, Mark P.; Kim, Kyungjoo K.

Abstract not provided.

TYPE Other Report YEAR 2019

OSTI DOI

Batched Linear Algebra in Kokkos Kernels

Rajamanickam, Sivasankaran R.; Berger-Vergiat, Luc B.; Dang, Vinh Q.; Ellingwood, Nathan D.; Kim, Kyungjoo K.; McLendon, William C.; Trott, Christian R.; Wilke, Jeremiah J.

Abstract not provided.

TYPE Conference Poster YEAR 2019

OSTI

FASTMath: Kokkos Kernels and Linear Solvers

Rajamanickam, Sivasankaran R.; Bogle, Ian A.; Hu, Jonathan J.; Devine, Karen D.; Slota, George M.; Perego, Mauro P.; Kim, Kyungjoo K.

Abstract not provided.

TYPE Conference Poster YEAR 2019

OSTI

Exploration of Fine-Grained Parallelism for Load Balancing Eager K-truss on GPU and CPU

Blanco, Mark P.; Kim, Kyungjoo K.; Low, Tze M.

Abstract not provided.

TYPE Conference Poster YEAR 2019

OSTI DOI

Kokkos Kernels

Rajamanickam, Sivasankaran R.; Berger-Vergiat, Luc B.; Dang, Vinh Q.; Ellingwood, Nathan D.; Kim, Kyungjoo K.; Trott, Christian R.; Wilke, Jason W.; McLendon, William C.

Abstract not provided.

TYPE Conference Poster YEAR 2019

OSTI

Solving Many Small Matrix Problems using Kokkos and KokkosKernels

Kim, Kyungjoo K.

Abstract not provided.

TYPE Presentation YEAR 2019

OSTI

Performance Portable SIMD Approach Implementing Block Line Solver For Coupled PDEs

Kim, Kyungjoo K.

Abstract not provided.

TYPE Conference Poster YEAR 2019

OSTI

Kokkos Kernels

Rajamanickam, Sivasankaran R.; Berger-Vergiat, Luc B.; Dang, Vinh Q.; Ellingwood, Nathan D.; Kim, Kyungjoo K.; McLendon, William C.; Trott, Christian R.; Wilke, Jeremiah J.

Abstract not provided.

TYPE Presentation YEAR 2019

OSTI

Compact BLAS in Kokkos Kernels update

Rajamanickam, Sivasankaran R.; Kim, Kyungjoo K.; Dang, Vinh Q.; Howard, Micah A.; Bradley, Andrew M.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

Tacho: Memory-scalable task parallel sparse Cholesky factorization

Proceedings - 2018 IEEE 32nd International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2018

Kim, Kyungjoo K.; Edwards, H.C.; Rajamanickam, Sivasankaran R.

We present a memory-scalable, parallel, sparse multifrontal solver for solving symmetric positive-definite systems arising in scientific and engineering applications. Factorizing sparse matrices requires memory for both the computed factors and the temporary workspaces for computing each frontal matrix - a data structure commonly used within multifrontal methods. To factorize multiple frontal matrices in parallel, the conventional approach is to allocate a uniform workspace for each hardware thread. In the manycore era, this results in increasing memory usage proportional to the number of hardware threads. We remedy this problem by using dynamic task parallelism with a scalable memory pool. Tasks are spawned while traversing an assembly tree and executed after their dependences are satisfied. We also use an idea to respawn the tasks when certain conditions are not met. Temporary workspace for frontal matrices in each task is allocated from a memory pool designed by us. If the requested memory space is not available in the memory pool, the task is respawned to yield the hardware thread to execute other tasks. The respawned task is executed after high priority tasks are executed. This approach yields robust parallel performance within a bounded memory space. Experimental results demonstrate the merits of our implementation on Intel multicore and manycore architectures.

TYPE Conference Poster YEAR 2018

Scopus OSTI DOI

Employing Multiple Levels of Parallelism for CFD at Large Scales on Next Generation High-Performance Computing Platforms

Howard, Micah A.; Fisher, Travis C.; Hoemmen, Mark F.; Dinzl, Derek J.; Overfelt, James R.; Bradley, Andrew M.; Kim, Kyungjoo K.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

KokkosKernels Overview

Rajamanickam, Sivasankaran R.; Deveci, Mehmet D.; Kim, Kyungjoo K.; Ellingwood, Nathan D.; Trott, Christian R.; Hu, Jonathan J.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

Employing Multiple Levels of Parallelism for CFD at Large Scales on Next Generation High-Performance Computing Platforms

Howard, Micah A.; Fisher, Travis C.; Hoemmen, Mark F.; Dinzl, Derek J.; Overfelt, James R.; Bradley, Andrew M.; Kim, Kyungjoo K.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

Vector-friendly Batched BLAS and LAPACK Kernels : Design and Applications

Rajamanickam, Sivasankaran R.; Kim, Kyungjoo K.; Bradley, Andrew M.; Deveci, Mehmet D.; Trott, Christian R.; Hammond, Simon D.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

Intrepid2: a Performance-Portable Package for Compatible High-Order Finite Element Discretizations

Kim, Kyungjoo K.; Perego, Mauro P.; Ellingwood, Nathan D.; Peterson, Kara J.; Roberts, Nathan V.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

KokkosKernels Overview

Rajamanickam, Sivasankaran R.; Deveci, Mehmet D.; Kim, Kyungjoo K.; Trott, Christian R.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

KokkosKernels: Portable Math and Graph Kernels

Rajamanickam, Sivasankaran R.; Kim, Kyungjoo K.; Deveci, Mehmet D.; Trott, Christian R.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

Tacho: Memory-Scalable Task Parallel Sparse Cholesky Factorization

Kim, Kyungjoo K.; Edwards, Harold C.; Rajamanickam, Sivasankaran R.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI DOI

Designing vector-friendly compact BLAS and LAPACK kernels

Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2017

Kim, Kyungjoo K.; Costa, Timothy B.; Deveci, Mehmet D.; Bradley, Andrew M.; Hammond, Simon D.; Guney, Murat E.; Knepper, Sarah; Story, Shane; Rajamanickam, Sivasankaran R.

Many applications, such as PDE based simulations and machine learning, apply BLAS/LAPACK routines to large groups of small matrices. While existing batched BLAS APIs provide meaningful speedup for this problem type, a non-canonical data layout enabling cross-matrix vectorization may provide further significant speedup. In this paper, we propose a new compact data layout that interleaves matrices in blocks according to the SIMD vector length. We combine this compact data layout with a new interface to BLAS/LAPACK routines that can be used within a hierarchical parallel application. Our layout provides up to 14x, 45x, and 27x speedup against OpenMP loops around optimized DGEMM, DTRSM and DGETRF kernels, respectively, on the Intel Knights Landing architecture. We discuss the compact batched BLAS/LAPACK implementations in two libraries, KokkosKernels and Intel® Math Kernel Library. We demonstrate the APIs in a line solver for coupled PDEs. Finally, we present detailed performance analysis of our kernels.

TYPE Conference Poster YEAR 2017

Scopus OSTI DOI
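
The compact data layout described in the abstract above, which interleaves matrices in blocks according to the SIMD vector length, can be mimicked in NumPy to show the indexing. This is a sketch of the idea only and not the KokkosKernels or Intel Math Kernel Library API. The function names, the zero padding of the last block, and the choice of `vector_length = 4` are assumptions made for illustration.

```python
import numpy as np

def pack_compact(batch, vector_length):
    """Interleave (n, rows, cols) matrices into (n_blocks, rows, cols, V) blocks."""
    n, rows, cols = batch.shape
    n_blocks = -(-n // vector_length)  # ceiling division
    padded = np.zeros((n_blocks * vector_length, rows, cols), dtype=batch.dtype)
    padded[:n] = batch
    # Entry (i, j) of V consecutive matrices becomes contiguous in memory,
    # so one vector load fetches the same entry from V matrices.
    blocks = padded.reshape(n_blocks, vector_length, rows, cols)
    return np.ascontiguousarray(blocks.transpose(0, 2, 3, 1))

def unpack_compact(packed, n):
    """Invert pack_compact and drop the padding matrices."""
    n_blocks, rows, cols, v = packed.shape
    return packed.transpose(0, 3, 1, 2).reshape(n_blocks * v, rows, cols)[:n]

def compact_gemm(a, b):
    """Multiply matching matrices; the trailing axis is the vector lane."""
    return np.einsum("bikv,bkjv->bijv", a, b)

# Hypothetical batch of ten 3x3 matrices and a vector length of 4.
rng = np.random.default_rng(0)
batch = rng.standard_normal((10, 3, 3))
packed = pack_compact(batch, vector_length=4)
product = unpack_compact(compact_gemm(packed, packed), n=10)
print(packed.shape, np.allclose(product, batch @ batch))
```

Each scalar operation of the small matrix product is applied across the trailing lane axis, which is the cross-matrix vectorization the abstract refers to.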

Applications of Compact Batched Kernels

Rajamanickam, Sivasankaran R.; Bradley, Andrew M.; Deveci, Mehmet D.; Kim, Kyungjoo K.; Trott, Christian R.

Abstract not provided.

TYPE Conference Poster YEAR 2017

OSTI

KokkosKernels

Rajamanickam, Sivasankaran R.; Bradley, Andrew M.; Deveci, Mehmet D.; Kim, Kyungjoo K.; Trott, Christian R.

Abstract not provided.

TYPE Presentation YEAR 2017

OSTI

KokkosKernels: Performance-Portable Sparse Dense and Graph Kernels

Rajamanickam, Sivasankaran R.; Bradley, Andrew M.; Deveci, Mehmet D.; Hoemmen, Mark F.; Hammond, Simon D.; Kim, Kyungjoo K.; Trott, Christian R.

Abstract not provided.

TYPE Presentation YEAR 2017

OSTI

Designing Vector-Friendly Compact BLAS and LAPACK Kernels

Rajamanickam, Sivasankaran R.; Story, Shane S.; Knepper, Sarah K.; Guney, Murat G.; Hammond, Simon D.; Bradley, Andrew M.; Deveci, Mehmet D.; Costa, Tim C.; Kim, Kyungjoo K.

Abstract not provided.

TYPE Conference Poster YEAR 2017

OSTI

SIMD Scalar Types for Outer-loop Vectorization

Phipps, Eric T.; Kim, Kyungjoo K.; Rajamanickam, Sivasankaran R.; Tupek, Michael R.

Abstract not provided.

TYPE Presentation YEAR 2017

OSTI

Designing Vector-Friendly Compact BLAS and LAPACK Kernels

Kim, Kyungjoo K.; Costa, Timothy B.; Deveci, Mehmet D.; Bradley, Andrew M.; Hammond, Simon D.; Guney, Murat G.; Knepper, Sarah K.; Story, Shane S.; Rajamanickam, Sivasankaran R.

Abstract not provided.

TYPE Conference Poster YEAR 2017

OSTI DOI

Performance Portable Line Smoother for Multiphysics Problems using Compact Batched BLAS

Kim, Kyungjoo K.; Deveci, Mehmet D.; Bradley, Andrew M.; Hammond, Simon D.; Rajamanickam, Sivasankaran R.

Abstract not provided.

TYPE Conference Poster YEAR 2017

OSTI

ShyLU: A Collection of Node-Scalable Sparse Linear Solvers

Rajamanickam, Sivasankaran R.; Bradley, Andrew M.; Kim, Kyungjoo K.; Boman, Erik G.; Deveci, Mehmet D.

Abstract not provided.

TYPE Conference Poster YEAR 2017

OSTI

KokkosKernels: Compact Layouts for Batched BLAS and Sparse Matrix-Matrix Multiply

Rajamanickam, Sivasankaran R.; Bradley, Andrew M.; Kim, Kyungjoo K.; Deveci, Mehmet D.; Trott, Christian R.; Hammond, Simon D.

Abstract not provided.

TYPE Conference Poster YEAR 2017

OSTI

Intrepid2: Performance Portable Finite Element Discretization Library

Kim, Kyungjoo K.; Perego, Mauro P.; Ellingwood, Nathan D.

Abstract not provided.

TYPE Conference Poster YEAR 2017

OSTI

ECP 1.3.3.03a Develop General CS Components for ATDM Applications

Pawlowski, Roger P.; Bartlett, Roscoe B.; Bettencourt, Matthew T.; Carleton, James B.; Conde, Sidafa C.; Cyr, Eric C.; Kim, Kyungjoo K.; Mota, Alejandro M.; Perego, Mauro P.; Shadid, John N.; Sjaardema, Gregory D.; Toth, Alexander R.; Bradley, Andrew M.; Spotz, William S.; Ober, Curtis C.; Kalashnikova, Irina

Abstract not provided.

TYPE Presentation YEAR 2017

OSTI

Kokkos Task API: A Use Case in Tacho

Kim, Kyungjoo K.; Rajamanickam, Sivasankaran R.; Edwards, Harold C.; Olivier, Stephen L.; Stelle, George

Abstract not provided.

TYPE Conference Poster YEAR 2016

OSTI

Intrepid2: Towards Performance Portability

Kim, Kyungjoo K.; Perego, Mauro P.; Ellingwood, Nathan D.

Abstract not provided.

TYPE Conference Poster YEAR 2016

OSTI

KokkosKernels Introduction: Design API and Performance

Deveci, Mehmet D.; Rajamanickam, Sivasankaran R.; Kim, Kyungjoo K.; Bradley, Andrew M.; Trott, Christian R.; Hoemmen, Mark F.; Boman, Erik G.

Abstract not provided.

TYPE Presentation YEAR 2016

OSTI

Tacho: Two-level Task Parallel Cholesky Factorization

Kim, Kyungjoo K.; Rajamanickam, Sivasankaran R.; Edwards, Harold C.; Dohrmann, Clark R.

Abstract not provided.

TYPE Conference Poster YEAR 2016

OSTI

Hierarchical Task-Data Parallelism using Kokkos and Qthreads

Edwards, Harold C.; Olivier, Stephen L.; Berry, Jonathan W.; Mackey, Greg; Rajamanickam, Sivasankaran R.; Wolf, Michael W.; Kim, Kyungjoo K.; Stelle, George

This report describes a new capability for hierarchical task-data parallelism using Sandia's Kokkos and Qthreads, and evaluation of this capability with sparse matrix Cholesky factorization and social network triangle enumeration mini-applications. Hierarchical task-data parallelism consists of a collection of tasks with executes-after dependences where each task contains data parallel operations performed on a team of hardware threads. The collection of tasks and dependences form a directed acyclic graph of tasks - a task DAG. Major challenges of this research and development effort include: portability and performance across multicore CPU, manycore Intel Xeon Phi, and NVIDIA GPU architectures; scalability with respect to hardware concurrency and size of the task DAG; and usability of the application programmer interface (API).

TYPE SAND Report YEAR 2016

OSTI DOI

A Massively Parallel Scalable Implicit SPH Solver

Trask, Nathaniel T.; Maxey, Martin M.; Kim, Kyungjoo K.; Perego, Mauro P.; Parks, Michael L.; Yang, Kai Y.; Xu, Jinchao X.; Pan, Wenxiao P.; Tartakovsky, Alex T.

Abstract not provided.

TYPE Conference Poster YEAR 2016

OSTI

A comparison of high-level programming choices for incomplete sparse factorization across different architectures

Proceedings - 2016 IEEE 30th International Parallel and Distributed Processing Symposium, IPDPS 2016

Booth, Joshua D.; Kim, Kyungjoo K.; Rajamanickam, Sivasankaran R.

All many-core systems require fine-grained shared memory parallelism; however, the most efficient way to extract such parallelism is far from trivial. Fine-grained parallel algorithms face various performance trade-offs related to tasking, accesses to global data-structures, and use of shared cache. While programming models provide high level abstractions, such as data and task parallelism, algorithmic choices still remain open on how to best implement irregular algorithms, such as sparse factorizations, while taking into account the trade-offs mentioned above. In this paper, we compare these performance trade-offs for task and data parallelism on different hardware architectures such as Intel Sandy Bridge, Intel Xeon Phi, and IBM Power8. We do this by comparing the scaling of a new task-parallel incomplete sparse Cholesky factorization called Tacho and a new data-parallel incomplete sparse LU factorization called Basker. Both solvers utilize the Kokkos programming model and were developed within the ShyLU package of Trilinos. Using these two codes we demonstrate how high-level programming changes affect performance and overhead costs on multiple multi/many-core systems. We find that Kokkos is able to provide comparable performance with both parallel-for and task/futures on traditional x86 multicores. However, the choice of which high-level abstraction to use on many-core systems depends on both the architectures and input matrices.

TYPE Conference Poster YEAR 2016

Scopus OSTI DOI

Task and Data Parallelism Based Direct Solvers and Preconditioners in Manycore Architecture: Efforts in Trilinos/ShyLU

Booth, Joshua D.; Rajamanickam, Sivasankaran R.; Bradley, Andrew M.; Boman, Erik G.; Kim, Kyungjoo K.

Abstract not provided.

TYPE Conference Poster YEAR 2016

OSTI
