Predicting electronic structures at any length scale with machine learning

npj Computational Materials

Fiedler, Lenz; Modine, N.A.; Schmerler, Steve; Vogel, Dayton J.; Popoola, Gabriel A.; Thompson, Aidan P.; Rajamanickam, Sivasankaran R.; Cangi, Attila

The properties of electrons in matter are of fundamental importance. They give rise to virtually all material properties and determine the physics at play in objects ranging from semiconductor devices to the interior of giant gas planets. Modeling and simulation of such diverse applications rely primarily on density functional theory (DFT), which has become the principal method for predicting the electronic structure of matter. While DFT calculations have proven to be very useful, their computational scaling limits them to small systems. We have developed a machine learning framework for predicting the electronic structure on any length scale. It shows up to three orders of magnitude speedup on systems where DFT is tractable and, more importantly, enables predictions on scales where DFT calculations are infeasible. Our work demonstrates how machine learning circumvents a long-standing computational bottleneck and advances materials science to frontiers intractable with any current solutions.

TYPE Journal Article YEAR 2023

DOI OSTI Scopus

New Linear Solvers Features and Improvements in Trilinos

Loe, Jennifer A.; Boman, Erik G.; Espinoza, Heliezer J.; Glusa, Christian A.; Harper, Graham B.; Higgins, Andrew J.; Rajamanickam, Sivasankaran R.; Siefert, Christopher S.; Switzer, Heather M.; Szyld, Daniel; Tuminaro, Raymond S.; Yamazaki, Ichitaro Y.

Abstract not provided.

TYPE Conference Presentation YEAR 2023

DOI OSTI

Performance Portable Batched Sparse Linear Solvers

IEEE Transactions on Parallel and Distributed Systems

Liegeois, Kim A.; Rajamanickam, Sivasankaran R.; Berger-Vergiat, Luc B.

Solving a large number of small linear systems is increasingly becoming a bottleneck in computational science applications. While dense linear solvers for such systems have been studied before, batched sparse linear solvers are just starting to emerge. In this paper, we discuss algorithms for solving batched sparse linear systems and their implementation in the Kokkos Kernels library. The new algorithms are performance portable and map well to the hierarchical parallelism available in modern accelerator architectures. The sparse matrix-vector product (SPMV) kernel is the main performance bottleneck of the Krylov solvers we implement in this work. The implementation of the batched SPMV and its performance are therefore discussed thoroughly in this paper. The implemented kernels are tested on different Central Processing Unit (CPU) and Graphics Processing Unit (GPU) architectures. We also develop batched Conjugate Gradient (CG) and batched Generalized Minimum Residual (GMRES) solvers using the batched SPMV. Our proposed solver was able to solve 20,000 sparse linear systems on V100 GPUs with mean speedups of 76x and 924x compared to, respectively, a parallel sparse solver applied to a block diagonal system containing all the small linear systems, and solving the small systems one at a time. We see a mean speedup of 0.51x compared to the dense batched solver of cuSOLVER on V100, while using much less memory. A thorough performance evaluation on three different architectures and an analysis of the performance are presented.
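As a rough illustration of the batched SPMV and batched CG idea described in this abstract, the following NumPy sketch solves several small sparse systems at once. It is not the Kokkos Kernels implementation or its API: the function names `batched_spmv` and `batched_cg`, the assumption that all systems in the batch share one CSR sparsity pattern, and the stopping rule are choices made here for illustration only.

```python
import numpy as np

def batched_spmv(row_ptr, col_idx, values, x):
    """Compute y[b] = A[b] @ x[b] for CSR matrices sharing one sparsity pattern.

    values has shape (batch, nnz); x has shape (batch, n_cols).
    """
    n_rows = len(row_ptr) - 1
    # Row index of every stored nonzero, derived from the CSR row pointers.
    rows = np.repeat(np.arange(n_rows), np.diff(row_ptr))
    products = values * x[:, col_idx]              # shape (batch, nnz)
    y_t = np.zeros((n_rows, values.shape[0]))
    # Unbuffered accumulation: each nonzero adds into its row, for all systems.
    np.add.at(y_t, rows, products.T)
    return y_t.T

def batched_cg(row_ptr, col_idx, values, b, tol=1e-10, max_iter=200):
    """Unpreconditioned CG applied to every system in the batch (SPD assumed)."""
    x = np.zeros_like(b)
    r = b.copy()                                   # residual for x0 = 0
    p = r.copy()
    rs = np.einsum("bi,bi->b", r, r)
    b_norm2 = np.einsum("bi,bi->b", b, b)
    for _ in range(max_iter):
        # Systems that already converged are frozen (alpha = beta = 0).
        active = rs > tol**2 * b_norm2
        if not active.any():
            break
        Ap = batched_spmv(row_ptr, col_idx, values, p)
        pAp = np.einsum("bi,bi->b", p, Ap)
        alpha = np.where(active, rs / np.where(active, pAp, 1.0), 0.0)
        x += alpha[:, None] * p
        r -= alpha[:, None] * Ap
        rs_new = np.einsum("bi,bi->b", r, r)
        beta = np.where(active, rs_new / np.where(active, rs, 1.0), 0.0)
        p = r + beta[:, None] * p
        rs = rs_new
    return x

# Example: three 4x4 tridiagonal SPD systems that differ only in their diagonal.
row_ptr = np.array([0, 2, 5, 8, 10])
col_idx = np.array([0, 1, 0, 1, 2, 1, 2, 3, 2, 3])
diag = np.array([2.0, 3.0, 4.0])
is_diag = col_idx == np.repeat(np.arange(4), np.diff(row_ptr))
values = np.where(is_diag, diag[:, None], -1.0)    # shape (3, 10)
b = np.ones((3, 4))
x = batched_cg(row_ptr, col_idx, values, b)
```

Storing one index structure and a `(batch, nnz)` value array is what lets a single pass over the pattern serve every system in the batch. That property is assumed in this sketch and is not taken from the abstract.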

TYPE Journal Article YEAR 2023

DOI OSTI Scopus

An Experimental Study of Two-Level Schwarz Domain Decomposition Preconditioners on GPUs

Yamazaki, Ichitaro Y.; Rajamanickam, Sivasankaran R.; Heinlein, Alexander

Abstract not provided.

TYPE Conference Presentation YEAR 2023

DOI OSTI

Jet: Multilevel Partitioning on GPUs

Gilbert, Michael; Madduri, Kamesh; Boman, Erik G.; Rajamanickam, Sivasankaran R.

Abstract not provided.

TYPE Conference Paper YEAR 2023

OSTI

High-Performance GMRES Multi-Precision Benchmark

Yamazaki, Ichitaro Y.; Loe, Jennifer A.; Glusa, Christian A.; Rajamanickam, Sivasankaran R.; Luszczek, Piotr; Dongarra, Jack

Abstract not provided.

TYPE Conference Presentation YEAR 2023

DOI OSTI

An Experimental Study of Two-level Schwarz Domain-Decomposition Preconditioners on GPUs

Proceedings - 2023 IEEE International Parallel and Distributed Processing Symposium, IPDPS 2023

Yamazaki, Ichitaro Y.; Heinlein, Alexander; Rajamanickam, Sivasankaran R.

The generalized Dryja-Smith-Widlund (GDSW) preconditioner is a two-level overlapping Schwarz domain decomposition (DD) preconditioner that couples a classical one-level overlapping Schwarz preconditioner with an energy-minimizing coarse space. When used to accelerate the convergence rate of Krylov subspace iterative methods, the GDSW preconditioner provides robustness and scalability for the solution of sparse linear systems arising from the discretization of a wide range of partial differential equations. In this paper, we present FROSch (Fast and Robust Schwarz), a domain decomposition solver package which implements GDSW-type preconditioners for both CPU and GPU clusters. To improve the solver performance on GPUs, we use a novel decomposition to run multiple MPI processes on each GPU, reducing both the solver's computational and storage costs and potentially improving the convergence rate. This allowed us to obtain competitive or faster performance using GPUs compared to using CPUs alone. We demonstrate the performance of FROSch on the Summit supercomputer with NVIDIA V100 GPUs, where we used NVIDIA Multi-Process Service (MPS) to implement our decomposition strategy. The solver has a wide variety of algorithmic and implementation choices, which poses both opportunities and challenges for its GPU implementation. We conduct a thorough experimental study with different solver options, including the exact or inexact solution of the local overlapping subdomain problems on a GPU. We also discuss the effect of using the iterative variant of the incomplete LU factorization and sparse-triangular solve as the approximate local solver, and of using lower precision for computing the whole FROSch preconditioner. Overall, the solve time was reduced by a factor of about 2× using GPUs, while the GPU acceleration of the numerical setup time depends on the solver options and the local matrix sizes.
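To make the one-level component mentioned in this abstract concrete, the sketch below applies a classical one-level overlapping additive Schwarz preconditioner using dense NumPy operations. It deliberately omits the GDSW coarse space and is not FROSch code: the function name `additive_schwarz_apply`, the exact local solves, and the 1D Laplacian test matrix are assumptions chosen for illustration.

```python
import numpy as np

def additive_schwarz_apply(A, subdomains, r):
    """Return z = sum_i R_i^T (R_i A R_i^T)^{-1} R_i r.

    subdomains is a list of index arrays (they may overlap); each array
    plays the role of a restriction operator R_i.
    """
    z = np.zeros_like(r, dtype=float)
    for idx in subdomains:
        A_loc = A[np.ix_(idx, idx)]                 # local overlapping subdomain matrix
        z[idx] += np.linalg.solve(A_loc, r[idx])    # exact local solve, prolonged back
    return z

# Example: 1D Laplacian on 8 unknowns, two subdomains overlapping in two unknowns.
n = 8
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
subdomains = [np.arange(0, 5), np.arange(3, 8)]
r = np.ones(n)
z = additive_schwarz_apply(A, subdomains, r)
```

In a Krylov method, `z` would be used as the preconditioned residual. Replacing the exact local solve with an approximate one corresponds to the "inexact" option the abstract discusses, though the specific approximate solvers are not shown here.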

TYPE Conference Paper YEAR 2023

DOI OSTI Scopus

High-Performance GMRES Multi-Precision Benchmark

Yamazaki, Ichitaro Y.; Loe, Jennifer A.; Glusa, Christian A.; Rajamanickam, Sivasankaran R.; Luszczek, Piotr; Dongarra, Jack

Abstract not provided.

TYPE Conference Presentation YEAR 2022

DOI OSTI

Accelerating Selected DOE Machine Learning Workloads on SambaNova Systems

Rajamanickam, Sivasankaran R.; Eydenberg, Michael S.; Ho, Yang H.; Liu, Chen; Zhang, Leon; Zhou, Kuan; Sun, Johnl G.; Chen, Edison; Deng, Andrew; Wang, Mingran

Abstract not provided.

TYPE Conference Presentation YEAR 2022

DOI OSTI

Predicting the Electronic Structure of Matter on Ultra-Large Scales

Fiedler, Lenz; Modine, N.A.; Schmerler, Steve; Vogel, Dayton J.; Popoola, Gabriel A.; Thompson, Aidan P.; Rajamanickam, Sivasankaran R.; Cangi, Attila

The long-standing problem of predicting the electronic structure of matter on ultra-large scales (beyond 100,000 atoms) is solved with machine learning.

TYPE Other Report YEAR 2022

DOI OSTI

Half-Precision Scalar Support in Kokkos and Kokkos Kernels: An Engineering Study and Experience Report

Harvey, Evan C.; Milewicz, Reed M.; Trott, Christian R.; Berger-Vergiat, Luc B.; Rajamanickam, Sivasankaran R.

Abstract not provided.

TYPE Conference Presentation YEAR 2022

DOI OSTI

Unified Language Frontend for Physic-Informed AI/ML

Kelley, Brian M.; Rajamanickam, Sivasankaran R.

Artificial intelligence and machine learning (AI/ML) are becoming important tools for scientific modeling and simulation, as in several other fields such as image analysis and natural language processing. ML techniques can leverage the computing power available in modern systems and reduce the human effort needed to configure experiments, interpret and visualize results, draw conclusions from huge quantities of raw data, and build surrogates for physics-based models. Domain scientists in fields like fluid dynamics, microelectronics, and chemistry can automate many of their most difficult and repetitive tasks or improve design times by using the faster ML surrogates. However, modern ML and traditional scientific high-performance computing (HPC) tend to use completely different software ecosystems. While ML frameworks like PyTorch and TensorFlow provide Python APIs, most HPC applications and libraries are written in C++. Direct interoperability between the two languages is possible but is tedious and error-prone. In this work, we show that a compiler-based approach can bridge the gap between ML frameworks and scientific software with less developer effort and better efficiency. We use the MLIR (multi-level intermediate representation) ecosystem to compile a pre-trained convolutional neural network (CNN) in PyTorch to freestanding C++ source code in the Kokkos programming model. Kokkos is a programming model widely used in HPC to write portable, shared-memory parallel code that can natively target a variety of CPU and GPU architectures. Our compiler-generated source code can be directly integrated into any Kokkos-based application with no dependencies on Python or cross-language interfaces.

TYPE Other Report YEAR 2022

DOI OSTI

Accelerating Multiscale Materials Modeling with Machine Learning

Modine, N.A.; Stephens, John A.; Swiler, Laura P.; Thompson, Aidan P.; Vogel, Dayton J.; Cangi, Attila; Fiedler, Lenz; Rajamanickam, Sivasankaran R.

The focus of this project is to accelerate and transform the workflow of multiscale materials modeling by developing an integrated toolchain seamlessly combining DFT, SNAP, LAMMPS (shown in Figure 1-1), and a machine-learning (ML) model that will more efficiently extract information from a smaller set of first-principles calculations. Our ML model enables us to accelerate first-principles data generation by interpolating existing high-fidelity data, and to extend the simulation scale by extrapolating high-fidelity data (10² atoms) to the mesoscale (10⁴ atoms). It encodes the underlying physics of atomic interactions on the microscopic scale by adapting a variety of ML techniques such as deep neural networks (DNNs) and graph neural networks (GNNs). We developed a new surrogate model for density functional theory using deep neural networks. The developed ML surrogate is demonstrated in a workflow to generate accurate band energies, total energies, and densities of the 298 K and 933 K aluminum systems. Furthermore, the models can be used to predict the quantities of interest for systems with more atoms than those in the training data set. We have demonstrated that the ML model can be used to compute the quantities of interest for systems with 100,000 Al atoms. In a comparison on a 2000-atom Al system, the new surrogate model is as accurate as DFT but three orders of magnitude faster. We also explored optimal experimental design techniques to choose the training data and novel graph neural networks to train on smaller data sets. These are promising methods that need to be explored in the future.

TYPE SAND Report YEAR 2022

DOI OSTI

Computational Challenges in the development of a surrogate model for Density Functional Theory calculations

Rajamanickam, Sivasankaran R.

Abstract not provided.

TYPE Conference Presentation YEAR 2022

DOI OSTI

Polynomial Preconditioning GMRES with Mixed Precisions

Loe, Jennifer A.; Glusa, Christian A.; Yamazaki, Ichitaro Y.; Boman, Erik G.; Rajamanickam, Sivasankaran R.

Abstract not provided.

TYPE Conference Presentation YEAR 2022

DOI OSTI

Polynomial Preconditioning GMRES with Mixed Precisions

Loe, Jennifer A.; Glusa, Christian A.; Yamazaki, Ichitaro Y.; Boman, Erik G.; Rajamanickam, Sivasankaran R.; Morgan, Ronald

Abstract not provided.

TYPE Conference Presentation YEAR 2022

OSTI

Mixed Precision Strategies for GMRES in Trilinos

Loe, Jennifer A.; Glusa, Christian A.; Yamazaki, Ichitaro Y.; Boman, Erik G.; Rajamanickam, Sivasankaran R.

Abstract not provided.

TYPE Presentation YEAR 2022

OSTI

Kokkos Kernels (Sake project)

Berger-Vergiat, Luc B.; Rajamanickam, Sivasankaran R.; Dang, Vinh Q.; Kelley, Brian M.; Ellingwood, Nathan D.; Loe, Jennifer A.; Harvey, Evan C.; Pearson, Carl W.; Foucar, James G.; Liegeois, Kim A.

Abstract not provided.

TYPE Conference Poster YEAR 2022

DOI OSTI

Kokkos Kernels Math Library

Berger-Vergiat, Luc B.; Rajamanickam, Sivasankaran R.; Loe, Jennifer A.; Kelley, Brian M.; Harvey, Evan C.; Foucar, James G.; Ellingwood, Nathan D.; Dang, Vinh Q.; Liegeois, Kim A.; Pearson, Carl W.

Abstract not provided.

TYPE Conference Presentation YEAR 2022

DOI OSTI

Performance portable batched sparse linear solvers in Kokkos Kernels

Liegeois, Kim A.; Rajamanickam, Sivasankaran R.; Berger-Vergiat, Luc B.

Abstract not provided.

TYPE Conference Presentation YEAR 2022

DOI OSTI

A machine learning surrogate for density functional theory based on the local density of state

Modine, N.A.; Fiedler, Lenz; Vogel, Dayton J.; Thompson, Aidan P.; Ellis, Austin; Stephens, John A.; Popoola, Gabe; Cangi, Attila; Rajamanickam, Sivasankaran R.

Abstract not provided.

TYPE Conference Presentation YEAR 2022

DOI OSTI

Trilinos for Exascale

Boman, Erik G.; Rajamanickam, Sivasankaran R.; Teranishi, Keita T.; Yamazaki, Ichitaro Y.

Abstract not provided.

TYPE Presentation YEAR 2022

OSTI

Evaluating Spatial Accelerator Architectures with Tiled Matrix-Matrix Multiplication

IEEE Transactions on Parallel and Distributed Systems

Moon, Gordon E.; Kwon, Hyoukjun; Jeong, Geonhwa; Chatarasi, Prasanth; Rajamanickam, Sivasankaran R.; Krishna, Tushar

There is a growing interest in custom spatial accelerators for machine learning applications. These accelerators employ a spatial array of processing elements (PEs) interacting via custom buffer hierarchies and networks-on-chip. The efficiency of these accelerators comes from employing optimized dataflow (i.e., spatial/temporal partitioning of data across the PEs and fine-grained scheduling) strategies to optimize data reuse. The focus of this work is to evaluate these accelerator architectures using a tiled general matrix-matrix multiplication (GEMM) kernel. To do so, we develop a framework that finds optimized mappings (dataflow and tile sizes) for a tiled GEMM for a given spatial accelerator and workload combination, leveraging an analytical cost model for runtime and energy. Our evaluations over five spatial accelerators demonstrate that the tiled GEMM mappings systematically generated by our framework achieve high performance on various GEMM workloads and accelerators.
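The tiled GEMM kernel that this abstract evaluates can be sketched in a few lines of NumPy. This is only an illustration of tiling with configurable tile sizes: the names `tiled_gemm` and `toy_traffic` are hypothetical, and the traffic count is a deliberately simple stand-in, not the analytical runtime and energy cost model used in the paper.

```python
import numpy as np

def tiled_gemm(A, B, tm, tn, tk):
    """Compute C = A @ B one (tm x tn) output tile at a time, stepping over K in chunks of tk."""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2, "inner dimensions must match"
    C = np.zeros((M, N), dtype=np.result_type(A, B))
    for i in range(0, M, tm):
        for j in range(0, N, tn):
            for k in range(0, K, tk):
                # Slices past the array edge are clipped, so ragged edge tiles work.
                C[i:i + tm, j:j + tn] += A[i:i + tm, k:k + tk] @ B[k:k + tk, j:j + tn]
    return C

def toy_traffic(M, N, K, tm, tn):
    """Elements of A and B fetched if every tile is reloaded with no reuse across tiles.

    A is re-read once per column block of C, and B once per row block of C.
    """
    col_blocks = -(-N // tn)   # ceil(N / tn)
    row_blocks = -(-M // tm)   # ceil(M / tm)
    return M * K * col_blocks + K * N * row_blocks

# Example: compare two tile shapes for the same workload under the toy model.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 7))
B = rng.standard_normal((7, 3))
C = tiled_gemm(A, B, tm=2, tn=2, tk=3)
small_tiles = toy_traffic(8, 8, 8, tm=4, tn=4)
large_tiles = toy_traffic(8, 8, 8, tm=8, tn=8)
```

Under this toy count, larger output tiles fetch fewer operand elements, which hints at why a mapping search over tile sizes matters. A real spatial accelerator adds buffer capacity limits and dataflow choices that this sketch ignores.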

TYPE Other Report YEAR 2022

DOI OSTI Scopus

Evaluating Spatial Accelerator Architectures with Tiled Matrix-Matrix Multiplication

IEEE Transactions on Parallel and Distributed Systems

Moon, Gordon E.; Kwon, Hyoukjun; Jeong, Geonhwa; Chatarasi, Prasanth; Rajamanickam, Sivasankaran R.; Krishna, Tushar

There is a growing interest in custom spatial accelerators for machine learning applications. These accelerators employ a spatial array of processing elements (PEs) interacting via custom buffer hierarchies and networks-on-chip. The efficiency of these accelerators comes from employing optimized dataflow (i.e., spatial/temporal partitioning of data across the PEs and fine-grained scheduling) strategies to optimize data reuse. The focus of this work is to evaluate these accelerator architectures using a tiled general matrix-matrix multiplication (GEMM) kernel. To do so, we develop a framework that finds optimized mappings (dataflow and tile sizes) for a tiled GEMM for a given spatial accelerator and workload combination, leveraging an analytical cost model for runtime and energy. Our evaluations over five spatial accelerators demonstrate that the tiled GEMM mappings systematically generated by our framework achieve high performance on various GEMM workloads and accelerators.

TYPE Journal Article YEAR 2022

DOI OSTI Scopus

Parallel, Portable Algorithms for Distance-2 Maximal Independent Set and Graph Coarsening

Kelley, Brian M.; Rajamanickam, Sivasankaran R.

Abstract not provided.

TYPE Conference Paper YEAR 2022

DOI OSTI
