Publications Search

To support the codesign needs of ECP applications for machine learning on current and future hardware, the ExaLearn team at Sandia studied the machine learning use cases in three ECP applications. This report summarizes the needs of those three applications. The Sandia ExaLearn team will develop a proxy application representative of ECP application needs, specifically those of the ExaSky and EXAALT ECP projects. The proxy application will allow us to demonstrate performance-portable kernels within machine learning codes. Furthermore, the training scalability of the machine learning networks in these applications is currently limited by large batch sizes: training throughput increases with batch size, but network accuracy and generalization worsen. The proxy application will combine model and data parallelism to improve training efficiency while maintaining network accuracy. It will also target optimizing 3D convolutional layers, which are specific to scientific machine learning and have been less thoroughly explored by industry.
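To make the 3D convolutional layer concrete, the following is a minimal sketch of a direct single-channel 3D convolution (computed as cross-correlation, as most ML frameworks do), assuming "valid" padding and unit stride. The function name `conv3d_valid` is hypothetical, and this is not the proxy application's kernel. It only illustrates that each output voxel reads a KD x KH x KW neighborhood, one more loop dimension than a 2D convolution.

```python
from typing import List

Volume = List[List[List[float]]]  # indexed as [depth][height][width]


def conv3d_valid(x: Volume, w: Volume) -> Volume:
    """Direct 3D cross-correlation with 'valid' padding and stride 1.

    x has shape (D, H, W) and w has shape (KD, KH, KW); the output has
    shape (D - KD + 1, H - KH + 1, W - KW + 1).
    """
    D, H, W = len(x), len(x[0]), len(x[0][0])
    KD, KH, KW = len(w), len(w[0]), len(w[0][0])
    OD, OH, OW = D - KD + 1, H - KH + 1, W - KW + 1
    if min(OD, OH, OW) <= 0:
        raise ValueError("kernel is larger than the input volume")
    out = [[[0.0] * OW for _ in range(OH)] for _ in range(OD)]
    for d in range(OD):
        for h in range(OH):
            for c in range(OW):
                acc = 0.0
                # Each output voxel accumulates KD*KH*KW products.
                for i in range(KD):
                    for j in range(KH):
                        for k in range(KW):
                            acc += x[d + i][h + j][c + k] * w[i][j][k]
                out[d][h][c] = acc
    return out


if __name__ == "__main__":
    ones_4 = [[[1.0] * 4 for _ in range(4)] for _ in range(4)]
    ones_2 = [[[1.0] * 2 for _ in range(2)] for _ in range(2)]
    print(conv3d_valid(ones_4, ones_2)[0][0])  # [8.0, 8.0, 8.0]
```

Because the per-voxel work grows with the cube of the kernel width, 3D layers put more pressure on memory bandwidth and arithmetic than the 2D layers that industry libraries are mostly tuned for.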

TYPE Other Report YEAR 2019

DOI OSTI

Trilinos Scalable Solvers and Kokkos Kernels

Hu, Jonathan J.; Rajamanickam, Sivasankaran

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

ExaGraph: Parallel Coloring and Partitioning for Exascale Applications

Boman, Erik G.; Rajamanickam, Sivasankaran; Acer, Seher

Abstract not provided.

TYPE Presentation YEAR 2018

OSTI

Kokkos Kernels

Rajamanickam, Sivasankaran; Berger-Vergiat, Luc; Dang, Vinh Q.; Ellingwood, Nathan D.; Kim, Kyungjoo; Mclendon, William; Trott, Christian R.; Wilke, Jeremiah

Abstract not provided.

TYPE Presentation YEAR 2018

OSTI

ECP Panel on applications

Rajamanickam, Sivasankaran

Abstract not provided.

TYPE Presentation YEAR 2018

OSTI

Asynchronous Iterative Solvers for Extreme Scale

Boman, Erik G.; Chow, Edmond; Dongarra, Jack; Glusa, Christian; Rajamanickam, Sivasankaran; Szyld, Daniel; Yamazaki, Ichitaro

Abstract not provided.

TYPE Presentation YEAR 2018

OSTI

An Algebraic Sparsified Nested Dissection Algorithm using Low-Rank Approximations

Cambier, Leopold; Chen, Chao; Darve, Eric; Boman, Erik G.; Rajamanickam, Sivasankaran; Tuminaro, Raymond S.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

Using Graph Toolkits - Do interfaces matter

Rajamanickam, Sivasankaran

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

Scalable Generation of Graphs for Benchmarking HPC Community-Detection Algorithms

Slota, George; Berry, Jonathan; Phillips, Cynthia A.; Rajamanickam, Sivasankaran

Abstract not provided.

TYPE Conference Poster YEAR 2018

DOI OSTI

A Performance Portable SIMD Scalar Type for Effective Vectorization Across Heterogeneous Architectures

Sahasrabudhe, Damodar; Phipps, Eric T.; Rajamanickam, Sivasankaran; Berzins, Martin

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

Fast Triangle Counting Using Cilk

Rajamanickam, Sivasankaran; Yasar, Abdurrahman; Wolf, Michael; Berry, Jonathan; Catalyurek, Umit

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

ExaLearn Application Interview

Rajamanickam, Sivasankaran; Wolf, Michael; Phipps, Eric T.; Ebeida, Mohamed; Debusschere, Bert

Abstract not provided.

TYPE Presentation YEAR 2018

OSTI

Compact BLAS in Kokkos Kernels update

Rajamanickam, Sivasankaran; Kim, Kyungjoo; Dang, Vinh Q.; Howard, Micah; Bradley, Andrew M.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

Trilinos Framework and Solvers

Rajamanickam, Sivasankaran; Hu, Jonathan J.; Devine, Karen; Wolf, Michael

Abstract not provided.

TYPE Presentation YEAR 2018

OSTI

Multithreaded sparse matrix-matrix multiplication for many-core and GPU architectures

Parallel Computing

Deveci, Mehmet; Rajamanickam, Sivasankaran; Trott, Christian R.

Sparse matrix-matrix multiplication is a key kernel with applications in several domains, such as scientific computing and graph analysis. Several algorithms for this foundational kernel have been studied in the past. In this paper, we develop parallel algorithms for sparse matrix-matrix multiplication with a focus on performance portability across different high performance computing architectures. The performance of these algorithms depends on the data structures they use. We compare different types of accumulators in these algorithms and demonstrate the performance differences between these data structures. Furthermore, we develop a meta-algorithm, KKSPGEMM, that chooses the right algorithm and data structure based on the characteristics of the problem. We show performance comparisons on three architectures and demonstrate the need for the community to develop two-phase sparse matrix-matrix multiplication implementations for efficient reuse of the data structures involved.
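The two ideas in this abstract, a two-phase (symbolic, then numeric) SpGEMM and a choice of accumulator, can be illustrated with a minimal sequential sketch of Gustavson's row-by-row algorithm on CSR matrices. This is not KKSPGEMM or the Kokkos Kernels API; the functions `symbolic` and `numeric` and the `dense` flag are hypothetical names chosen for illustration. The `dense=True` path uses a dense accumulator with one slot per column of B, and `dense=False` uses a hash-map accumulator (a Python `dict`).

```python
from typing import Dict, List, Tuple

CSR = Tuple[List[int], List[int], List[float]]  # (row_ptr, col_idx, values)


def symbolic(a: CSR, b: CSR) -> List[int]:
    """Phase 1: compute the row pointers of C = A*B from structure only."""
    a_ptr, a_col, _ = a
    b_ptr, b_col, _ = b
    c_ptr = [0] * len(a_ptr)
    for i in range(len(a_ptr) - 1):
        cols = set()
        for p in range(a_ptr[i], a_ptr[i + 1]):
            k = a_col[p]
            cols.update(b_col[b_ptr[k]:b_ptr[k + 1]])
        c_ptr[i + 1] = c_ptr[i] + len(cols)
    return c_ptr


def numeric(a: CSR, b: CSR, c_ptr: List[int], n_cols_b: int,
            dense: bool = True) -> CSR:
    """Phase 2: fill column indices and values of C with the chosen accumulator."""
    a_ptr, a_col, a_val = a
    b_ptr, b_col, b_val = b
    c_col = [0] * c_ptr[-1]
    c_val = [0.0] * c_ptr[-1]
    dense_acc = [0.0] * n_cols_b  # dense accumulator, reused across rows
    seen = [False] * n_cols_b     # marks columns already touched in this row
    for i in range(len(a_ptr) - 1):
        touched: List[int] = []
        hashed: Dict[int, float] = {}  # hash-map accumulator
        for p in range(a_ptr[i], a_ptr[i + 1]):
            k, av = a_col[p], a_val[p]
            for q in range(b_ptr[k], b_ptr[k + 1]):
                j, prod = b_col[q], av * b_val[q]
                if dense:
                    if not seen[j]:
                        seen[j] = True
                        touched.append(j)
                    dense_acc[j] += prod
                else:
                    hashed[j] = hashed.get(j, 0.0) + prod
        if dense:
            entries = [(j, dense_acc[j]) for j in sorted(touched)]
            for j in touched:  # reset only the slots this row used
                dense_acc[j] = 0.0
                seen[j] = False
        else:
            entries = sorted(hashed.items())
        for offset, (j, v) in enumerate(entries):
            c_col[c_ptr[i] + offset] = j
            c_val[c_ptr[i] + offset] = v
    return c_ptr, c_col, c_val


if __name__ == "__main__":
    A = ([0, 2, 3], [0, 1, 1], [1.0, 2.0, 3.0])  # [[1, 2], [0, 3]]
    B = ([0, 1, 3], [0, 0, 1], [4.0, 5.0, 6.0])  # [[4, 0], [5, 6]]
    ptr = symbolic(A, B)
    print(numeric(A, B, ptr, 2))  # ([0, 2, 4], [0, 1, 0, 1], [14.0, 12.0, 15.0, 18.0])
```

The dense accumulator gives constant-time access but needs memory proportional to the number of columns of B per thread, while the hash map uses memory proportional to the row's nonzeros at the cost of hashing; this is the kind of trade-off the abstract describes. Because `symbolic` depends only on sparsity structure, its output can be reused when only the values of A and B change, which is one motivation for a two-phase interface.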

TYPE Journal Article YEAR 2018

DOI OSTI Scopus

Fast Linear Algebra-Based Triangle Analytics with Kokkos Kernels

Yasar, Abdurrahman; Rajamanickam, Sivasankaran; Wolf, Michael; Berry, Jonathan; Catalyurek, Umit

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

Tacho: Memory-scalable task parallel sparse Cholesky factorization

Proceedings - 2018 IEEE 32nd International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2018

Kim, Kyungjoo; Edwards, H.C.; Rajamanickam, Sivasankaran

We present a memory-scalable, parallel, sparse multifrontal solver for symmetric positive-definite systems arising in scientific and engineering applications. Factorizing sparse matrices requires memory both for the computed factors and for the temporary workspaces used to compute each frontal matrix, a data structure commonly used within multifrontal methods. To factorize multiple frontal matrices in parallel, the conventional approach is to allocate a uniform workspace for each hardware thread. In the manycore era, this makes memory usage grow in proportion to the number of hardware threads. We remedy this problem by using dynamic task parallelism with a scalable memory pool. Tasks are spawned while traversing an assembly tree and executed after their dependences are satisfied. We also respawn tasks when certain conditions are not met. The temporary workspace for the frontal matrices in each task is allocated from a memory pool of our own design. If the requested memory is not available in the pool, the task is respawned, yielding the hardware thread to other tasks. The respawned task is executed after higher-priority tasks. This approach delivers robust parallel performance within a bounded memory space. Experimental results demonstrate the merits of our implementation on Intel multicore and manycore architectures.
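The respawn-on-allocation-failure scheme described above can be sketched as a sequential Python simulation. This is not Tacho's implementation or the Kokkos tasking API: `MemoryPool` and `factorize_tree` are hypothetical names, "factorizing" a front is a placeholder, and concurrency is modeled as rounds in which up to `n_threads` ready tasks run. A task whose workspace request fails is pushed back with a lower priority, so newly spawned parents run before it.

```python
import heapq
from typing import Dict, List, Optional, Tuple


class MemoryPool:
    """Fixed-capacity workspace pool; allocate() fails instead of blocking."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.in_use = 0
        self.peak = 0

    def allocate(self, size: int) -> bool:
        if self.in_use + size > self.capacity:
            return False
        self.in_use += size
        self.peak = max(self.peak, self.in_use)
        return True

    def release(self, size: int) -> None:
        self.in_use -= size


def factorize_tree(parent: Dict[int, Optional[int]], workspace: Dict[int, int],
                   n_threads: int, capacity: int) -> Tuple[List[int], int]:
    """Simulate task-parallel traversal of an assembly tree, round by round.

    Returns the order in which fronts completed and the peak pool usage.
    Priorities are integers; smaller numbers run first.
    """
    pending = {v: 0 for v in parent}  # unfinished children per node
    for v, p in parent.items():
        if p is not None:
            pending[p] += 1
    pool = MemoryPool(capacity)
    ready = [(0, v) for v, n in pending.items() if n == 0]  # leaves are ready
    heapq.heapify(ready)
    order: List[int] = []
    while ready:
        running: List[int] = []
        respawned: List[Tuple[int, int]] = []
        while ready and len(running) < n_threads:
            prio, v = heapq.heappop(ready)
            if pool.allocate(workspace[v]):
                running.append(v)
            else:
                respawned.append((prio + 1, v))  # yield the thread, retry later
        if not running:
            raise MemoryError("a single front exceeds the pool capacity")
        for item in respawned:
            heapq.heappush(ready, item)
        for v in running:  # "factorize" the front, then free its workspace
            pool.release(workspace[v])
            order.append(v)
            p = parent[v]
            if p is not None:
                pending[p] -= 1
                if pending[p] == 0:  # dependences satisfied: spawn the parent
                    heapq.heappush(ready, (0, p))
    return order, pool.peak


if __name__ == "__main__":
    tree = {0: 3, 1: 3, 2: 3, 3: None}  # three leaf fronts under one root
    sizes = {0: 4, 1: 4, 2: 4, 3: 6}
    print(factorize_tree(tree, sizes, n_threads=3, capacity=8))  # ([0, 1, 2, 3], 8)
```

In this toy example the third leaf cannot get workspace in the first round, so it is respawned and runs in the next one; peak usage stays at the pool capacity of 8 units, whereas a uniform per-thread workspace sized for the largest front would reserve 3 x 6 = 18 units.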

TYPE Conference Poster YEAR 2018

DOI OSTI Scopus

Generating massive random graphs that mimic real data

Slota, George; Berry, Jonathan; Phillips, Cynthia A.; Rajamanickam, Sivasankaran

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI
