Publications

Results 26–50 of 315

Experimental Evaluation of Multiprecision Strategies for GMRES on GPUs

2021 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2021 - In conjunction with IEEE IPDPS 2021

Loe, Jennifer A.; Glusa, Christian A.; Yamazaki, Ichitaro Y.; Boman, Erik G.; Rajamanickam, Sivasankaran R.

Support for lower-precision computation is becoming more common in accelerator hardware due to lower power usage, reduced data movement, and increased computational performance. However, computational science and engineering (CSE) problems in several domains require double-precision accuracy. This conflict between hardware trends and application needs calls for multiprecision strategies at the level of linear algebra algorithms if we want to exploit the hardware to its full potential while meeting accuracy requirements. In this paper, we focus on preconditioned sparse iterative linear solvers, a key kernel in several CSE applications, and present a study of multiprecision strategies for accelerating this kernel on GPUs. We seek the best methods for incorporating multiple precisions into the GMRES linear solver; these include iterative refinement and parallelizable preconditioners. We present strategies to determine when multiprecision GMRES will be effective and to choose parameters for a multiprecision iterative refinement solver to achieve better performance. Our implementation is based on the Trilinos library and employs Kokkos Kernels for performance portability of the linear algebra kernels. Performance results demonstrate the promise of multiprecision approaches and show that further improvements are possible by optimizing low-level kernels.
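
The core pattern studied here, iterative refinement wrapped around a low-precision inner solve, is compact enough to sketch. The Python sketch below uses SciPy's gmres as a stand-in for the inner solver; the paper's actual implementation is built on Trilinos and Kokkos Kernels, and the names gmres_ir, n_refine, and inner_tol are illustrative assumptions:

```python
# A minimal sketch of double-precision iterative refinement around a
# single-precision GMRES inner solve, assuming SciPy >= 1.12 (for the
# `rtol` keyword). Illustrative only; not the paper's implementation.
import numpy as np
from scipy.sparse.linalg import gmres

def gmres_ir(A64, b64, n_refine=10, inner_tol=1e-4):
    A32 = A64.astype(np.float32)          # low-precision copy for the inner solver
    x = np.zeros_like(b64)                # solution accumulated in double precision
    for _ in range(n_refine):
        r = b64 - A64 @ x                 # residual computed in double precision
        d, _ = gmres(A32, r.astype(np.float32), rtol=inner_tol)  # cheap inner solve
        x += d.astype(np.float64)         # correction applied in double precision
        if np.linalg.norm(b64 - A64 @ x) <= 1e-10 * np.linalg.norm(b64):
            break
    return x
```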

Accelerating Finite-Temperature Kohn-Sham Density Functional Theory with Deep Neural Networks

Ellis, J.A.; Fiedler, Lenz; Popoola, Gabriel A.; Modine, N.A.; Stephens, John A.; Thompson, Aidan P.; Rajamanickam, Sivasankaran R.

We present a numerical modeling workflow based on machine learning (ML) which reproduces the total energies produced by Kohn-Sham density functional theory (DFT) at finite electronic temperature to within chemical accuracy at negligible computational cost. Based on deep neural networks, our workflow yields the local density of states (LDOS) for a given atomic configuration. From the LDOS, spatially-resolved, energy-resolved, and integrated quantities can be calculated, including the DFT total free energy, which serves as the Born-Oppenheimer potential energy surface for the atoms. We demonstrate the efficacy of this approach for both solid and liquid metals and compare results between independent and unified machine-learning models for solid and liquid aluminum. Our machine-learning density functional theory framework opens up the path towards multiscale materials modeling for matter under ambient and extreme conditions at a computational scale and cost that is unattainable with current algorithms.
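
As a rough illustration of the post-processing step the abstract describes, the sketch below integrates a network-predicted LDOS over space to obtain the density of states, then over energy with Fermi-Dirac occupations to obtain an integrated band-energy-like quantity. The function name, array shapes, uniform energy grid, and prefactor convention are illustrative assumptions, not the paper's exact scheme:

```python
# Minimal sketch: LDOS -> DOS -> integrated (band-energy-like) quantity.
import numpy as np

def band_energy_from_ldos(ldos, energies, cell_volume, mu, temp_ev):
    """ldos: (n_grid_points, n_energies); energies, mu, temp_ev in eV."""
    n_grid = ldos.shape[0]
    dos = ldos.sum(axis=0) * (cell_volume / n_grid)      # spatial integral -> DOS(E)
    x = np.clip((energies - mu) / temp_ev, -60.0, 60.0)  # avoid exp overflow
    occ = 1.0 / (1.0 + np.exp(x))                        # Fermi-Dirac occupation
    de = energies[1] - energies[0]                       # uniform grid spacing
    return float(np.sum(energies * occ * dos) * de)      # energy integral
```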

Extending sparse tensor accelerators to support multiple compression formats

Proceedings - 2021 IEEE 35th International Parallel and Distributed Processing Symposium, IPDPS 2021

Qin, Eric; Jeong, Geonhwa; Won, William; Kao, Sheng C.; Kwon, Hyoukjun; Das, Dipankar; Moon, Gordon E.; Rajamanickam, Sivasankaran R.; Krishna, Tushar

Sparsity, which occurs in both scientific applications and Deep Learning (DL) models, has been a key target of optimization in recent ASIC accelerators due to the potential memory and compute savings. These applications store data in a variety of compression formats. We demonstrate that both the compactness of different compression formats and the compute efficiency of the algorithms they enable vary with tensor dimensions and the amount of sparsity. Since DL and scientific workloads span all sparsity regions, there are numerous format combinations for optimizing memory and compute efficiency. Unfortunately, many proposed accelerators operate on only one or two fixed format combinations. This work proposes hardware extensions to accelerators for supporting numerous format combinations seamlessly and demonstrates a ~4× speedup over performing format conversions in software.
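
The claim that compactness varies with tensor dimensions and sparsity can be seen from index-storage counts alone. The sketch below compares COO and CSR footprints for a 2-D tensor under the simplifying assumption of 4-byte indices and values (real accelerators use a wider range of formats and bit widths):

```python
# Index-storage cost of COO vs. CSR; byte widths are a simplifying assumption.
def coo_bytes(nnz):
    return nnz * (4 + 4 + 4)                   # (row, col, value) per nonzero

def csr_bytes(n_rows, nnz):
    return (n_rows + 1) * 4 + nnz * (4 + 4)    # row pointers + (col, value)

n_rows, n_cols = 1024, 1024
for density in (0.0005, 0.01, 0.1, 0.5):
    nnz = int(n_rows * n_cols * density)
    print(f"density={density}: COO={coo_bytes(nnz)} B, CSR={csr_bytes(n_rows, nnz)} B")
```

For this shape, CSR overtakes COO only once the nonzero count exceeds the row count; change the dimensions or sparsity and the ranking shifts, which is why fixing one or two formats in hardware leaves efficiency on the table.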

ExaWind: Exascale Predictive Wind Plant Flow Physics Modeling

Sprague, Michael; Ananthan, Shreyas; Binyahib, Roba; Brazell, Michael; De Frahan, Marc H.; King, Ryan A.; Mullowney, Paul; Rood, Jon; Sharma, Ashesh; Thomas, Stephen A.; Vijayakumar, Ganesh; Crozier, Paul C.; Berger-Vergiat, Luc B.; Cheung, Lawrence C.; Dement, David C.; deVelder, Nathaniel d.; Glaze, D.J.; Hu, Jonathan J.; Knaus, Robert C.; Lee, Dong H.; Matula, Neil M.; Okusanya, Tolulope O.; Overfelt, James R.; Rajamanickam, Sivasankaran R.; Sakievich, Philip S.; Smith, Timothy A.; Vo, Johnathan V.; Williams, Alan B.; Yamazaki, Ichitaro Y.; Turner, William J.; Prokopenko, Andrey; Wilson, Robert V.; Moser, Robert; Melvin, Jeremy; Sitaraman, Jay

Abstract not provided.

A Taxonomy for Classification and Comparison of Dataflows for GNN Accelerators

Garg, Raveesh; Qin, Eric; Martinez, Francisco M.; Guirado, Robert; Jain, Akshay; Abadal, Sergi; Abellan, Jose L.; Acacio, Manuel E.; Alarcon, Eduard; Rajamanickam, Sivasankaran R.; Krishna, Tushar

Recently, Graph Neural Networks (GNNs) have received a lot of interest because of their success in learning representations from graph-structured data. However, GNNs exhibit different compute and memory characteristics compared to traditional Deep Neural Networks (DNNs). Graph convolutions require feature aggregations from neighboring nodes (known as the aggregation phase), which leads to highly irregular data accesses. GNNs also have a very regular compute phase that can be broken down into matrix multiplications (known as the combination phase). All recently proposed GNN accelerators utilize different dataflows and microarchitecture optimizations for these two phases, and different communication strategies between the two phases have also been used. However, as more custom GNN accelerators are proposed, it becomes harder to classify them qualitatively and contrast them quantitatively. In this work, we present a taxonomy to describe several diverse dataflows for running GNN inference on accelerators, providing a structured way to describe and compare the design space of GNN accelerators.
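
The two phases named above are compact enough to sketch, which also makes the dataflow question concrete: an accelerator must decide how to order, tile, and pipeline a sparse aggregation against a dense combination. A minimal NumPy/SciPy sketch of one GCN layer (the function name and sizes are illustrative):

```python
# One GCN layer showing the two phases a GNN accelerator must schedule.
import numpy as np
import scipy.sparse as sp

def gcn_layer(adj, features, weights):
    aggregated = adj @ features        # aggregation phase: sparse, irregular gathers
    combined = aggregated @ weights    # combination phase: dense GEMM
    return np.maximum(combined, 0.0)   # ReLU

n, f_in, f_out = 6, 4, 3
adj = sp.random(n, n, density=0.3, format="csr") + sp.eye(n, format="csr")
out = gcn_layer(adj, np.random.rand(n, f_in), np.random.rand(f_in, f_out))
```

Even in this toy form there is a dataflow choice: evaluating (A·X)·W versus A·(X·W) changes the operand shapes and costs of both phases, which is the kind of design-space variation a taxonomy like this makes explicit.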

Concentric Spherical GNN for 3D Representation Learning

Fox, James S.; Zhao, Bo; Rajamanickam, Sivasankaran R.; Ramprasad, Rampi; Song, Le

Learning 3D representations that generalize well to arbitrarily oriented inputs is a challenge of practical importance in applications ranging from computer vision to physics and chemistry. We propose a novel multi-resolution convolutional architecture for learning over concentric spherical feature maps, of which the single-sphere representation is a special case. Our hierarchical architecture is based on alternately learning to incorporate both intra-sphere and inter-sphere information. We show the applicability of our method for two different types of 3D inputs: mesh objects, which can be regularly sampled, and point clouds, which are irregularly distributed. We also propose an efficient mapping of point clouds to concentric spherical images, thereby bridging spherical convolutions on grids with general point clouds. We demonstrate the effectiveness of our approach in improving state-of-the-art performance on 3D classification tasks with rotated data.
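
The point-cloud-to-spherical-image mapping mentioned above can be sketched directly: bin each point by radial shell and by a (theta, phi) cell on that shell. The grid sizes and binning rule below are illustrative assumptions, not the paper's exact construction:

```python
# Minimal sketch: bin a point cloud into concentric spherical count images.
import numpy as np

def points_to_spherical_images(points, n_shells=4, n_theta=16, n_phi=32):
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.sqrt(x**2 + y**2 + z**2)
    theta = np.arccos(np.clip(z / np.maximum(r, 1e-12), -1.0, 1.0))  # polar angle
    phi = np.arctan2(y, x) + np.pi                                   # azimuth in [0, 2*pi]
    shell = np.minimum((r / r.max() * n_shells).astype(int), n_shells - 1)
    ti = np.minimum((theta / np.pi * n_theta).astype(int), n_theta - 1)
    pj = np.minimum((phi / (2 * np.pi) * n_phi).astype(int), n_phi - 1)
    grid = np.zeros((n_shells, n_theta, n_phi))
    np.add.at(grid, (shell, ti, pj), 1.0)   # count points per (shell, theta, phi) cell
    return grid

grid = points_to_spherical_images(np.random.randn(500, 3))
```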

Using MLIR Framework for Codesign of ML Architectures, Algorithms, and Simulation Tools

Lewis, Cannada L.; Hughes, Clayton H.; Hammond, Simon D.; Rajamanickam, Sivasankaran R.

MLIR (Multi-Level Intermediate Representation) is an extensible compiler framework that supports high-level data structures and operation constructs. These higher-level code representations are particularly applicable to the artificial intelligence and machine learning (AI/ML) domain, allowing developers to more easily support upcoming heterogeneous AI/ML accelerators and to develop flexible domain-specific compilers/frameworks with higher-level intermediate representations (IRs) and advanced compiler optimizations. Using MLIR within the LLVM compiler framework is expected to yield significant improvements in the quality of generated machine code, which in turn will result in improved performance and hardware efficiency.
