Publications Search

Garg, Raveesh; Qin, Eric; Martinez, Francisco M.; Guirado, Robert; Jain, Akshay; Abadal, Sergi; Abellan, Jose L.; Acacio, Manuel E.; Alarcon, Eduard; Rajamanickam, Sivasankaran; Krishna, Tushar

Recently, Graph Neural Networks (GNNs) have received a lot of interest because of their success in learning representations from graph-structured data. However, GNNs exhibit different compute and memory characteristics compared to traditional Deep Neural Networks (DNNs). Graph convolutions require feature aggregations from neighboring nodes (known as the aggregation phase), which leads to highly irregular data accesses. GNNs also have a very regular compute phase that can be broken down into matrix multiplications (known as the combination phase). All recently proposed GNN accelerators utilize different dataflows and microarchitecture optimizations for these two phases. Different communication strategies between the two phases have also been used. However, as more custom GNN accelerators are proposed, it becomes harder to qualitatively classify them and quantitatively contrast them. In this work, we present a taxonomy to describe several diverse dataflows for running GNN inference on accelerators. This provides a structured way to describe and compare the design space of GNN accelerators.

TYPE Other Report YEAR 2021

DOI OSTI

Extending TRiSK with Higher-Order Hodge Stars and WENO/FCT Advection

Eldred, Christopher; Taylor, Mark A.; Norman, Matthew

Abstract not provided.

TYPE Conference Presentation YEAR 2021

DOI OSTI

Performant implementation of the atomic cluster expansion

Lysogorskiy, Yury; Rinaldi, Matteo; Menon, Sarath; Van Der Oord, Van; Hammerschmidt, Thomas; Mrovec, Matous; Thompson, Aidan P.; Csanyi, Gabor; Ortner, Christoph; Drautz, Ralf

The atomic cluster expansion is a general polynomial expansion of the atomic energy in multi-atom basis functions. Here we implement the atomic cluster expansion in the performant C++ code PACE that is suitable for use in large-scale atomistic simulations. We briefly review the atomic cluster expansion and give detailed expressions for energies and forces as well as efficient algorithms for their evaluation. We demonstrate that the atomic cluster expansion as implemented in PACE shifts a previously established Pareto front for machine learning interatomic potentials towards faster and more accurate calculations. Moreover, general purpose parameterizations are presented for copper and silicon and evaluated in detail. We show that the new Cu and Si potentials significantly improve on the best available potentials for highly accurate large-scale atomistic simulations.

TYPE Other Report YEAR 2021

DOI OSTI

Portability of Nalu-Wind using Trilinos and Kokkos Libraries

Berger-Vergiat, Luc; Hu, Jonathan J.; Glusa, Christian; Siefert, Christopher

Abstract not provided.

TYPE Conference Presentation YEAR 2021

DOI OSTI

Enabling Guaranteed Correctness and Leading Edge Performance under Radiation with a Heterogeneous System

Feinberg, Benjamin M.; Rodrigues, Arun; Marinella, Matthew; Agarwal, Sapan

Abstract not provided.

TYPE Conference Presentation YEAR 2021

DOI OSTI

Removal of the UVM Requirement from Tpetra: MultiVector and BlockMultiVector

Devine, Karen; Danielson, Geoffrey C.; Fuller, Timothy J.; Hu, Jonathan J.; Kelley, Brian M.; Kim, Kyungjoo; Siefert, Christopher; Smith, Timothy A.

Abstract not provided.

TYPE Presentation YEAR 2021

OSTI

Codesign for the Masses

Lewis, Cannada; Hammond, Simon; Wilke, Jeremiah

In this position paper we address challenges and opportunities relating to the design and codesign of application-specific circuits. Given our background as computational scientists, our perspective is that of a highly motivated application developer rather than a career computer architect.

TYPE Other Report YEAR 2021

DOI OSTI

Using MLIR Framework for Codesign of ML Architectures Algorithms and Simulation Tools

Lewis, Cannada; Hughes, Clayton; Hammond, Simon; Rajamanickam, Sivasankaran

MLIR (Multi-Level Intermediate Representation) is an extensible compiler framework that supports high-level data structures and operation constructs. These higher-level code representations are particularly applicable to the artificial intelligence and machine learning (AI/ML) domain, allowing developers to more easily support upcoming heterogeneous AI/ML accelerators and develop flexible domain-specific compilers/frameworks with higher-level intermediate representations (IRs) and advanced compiler optimizations. Using MLIR within the LLVM compiler framework is expected to yield significant improvement in the quality of generated machine code, which in turn will result in improved performance and hardware efficiency.

TYPE Other Report YEAR 2021

DOI OSTI

Deep Conservation: A Latent-Dynamics Model for Exact Satisfaction of Physical Conservation Laws [Slides]

Lee, Kookjin L.

Abstract not provided.

TYPE Conference Presentation YEAR 2021

DOI OSTI

An Analog Preconditioner for Solving Linear Systems

Proceedings - International Symposium on High-Performance Computer Architecture

Feinberg, Benjamin M.; Wong, Ryan; Xiao, T.P.; Rohan, Jacob N.; Boman, Erik G.; Marinella, Matthew; Agarwal, Sapan; Ipek, Engin

Over the past decade, as Moore's Law has slowed, the need for new forms of computation that can provide sustainable performance improvements has risen. A new method, called in situ computing, has shown great potential to accelerate matrix-vector multiplication (MVM), an important kernel for a diverse range of applications from neural networks to scientific computing. Existing in situ accelerators for scientific computing, however, have a significant limitation: they provide no acceleration for preconditioning, a key bottleneck in linear solvers and in scientific computing workflows. This paper enables in situ acceleration for state-of-the-art linear solvers by demonstrating how to use a new in situ matrix inversion accelerator for analog preconditioning. As existing techniques that enable high precision and scalability for in situ MVM are inapplicable to in situ matrix inversion, new techniques to compensate for circuit non-idealities are proposed. Additionally, a new approach to bit slicing that enables splitting operands across multiple devices without external digital logic is proposed. For scalability, this paper demonstrates how in situ matrix inversion kernels can work in tandem with existing domain decomposition techniques to accelerate the solutions of arbitrarily large linear systems. The analog kernel can be directly integrated into existing preconditioning workflows, leveraging several well-optimized numerical linear algebra tools to improve the behavior of the circuit. The result is an analog preconditioner that is more effective (up to 50% fewer iterations) than the widely used incomplete LU factorization preconditioner, ILU(0), while also reducing the energy and execution time of each approximate solve operation by 1025x and 105x, respectively.

TYPE Conference Presentation YEAR 2021

DOI OSTI Scopus

Data-driven learning of nonlocal physics from high-fidelity synthetic data

Computer Methods in Applied Mechanics and Engineering

You, Huaiqian; Yu, Yue; Trask, Nathaniel A.; Gulian, Mamikon; D'Elia, Marta

A key challenge to nonlocal models is the analytical complexity of deriving them from first principles, and frequently their use is justified a posteriori. In this work we extract nonlocal models from data, circumventing these challenges and providing data-driven justification for the resulting model form. Extracting data-driven surrogates is a major challenge for machine learning (ML) approaches, due to nonlinearities and lack of convexity — it is particularly challenging to extract surrogates which are provably well-posed and numerically stable. Our scheme not only yields a convex optimization problem, but also allows extraction of nonlocal models whose kernels may be partially negative while maintaining well-posedness even in small-data regimes. To achieve this, based on established nonlocal theory, we embed in our algorithm sufficient conditions on the non-positive part of the kernel that guarantee well-posedness of the learnt operator. These conditions are imposed as inequality constraints to meet the requisite conditions of the nonlocal theory. We demonstrate this workflow for a range of applications, including reproduction of manufactured nonlocal kernels; numerical homogenization of Darcy flow associated with a heterogeneous periodic microstructure; nonlocal approximation to high-order local transport phenomena; and approximation of globally supported fractional diffusion operators by truncated kernels.

TYPE Journal Article YEAR 2021

DOI OSTI Scopus

DeACT: Architecture-Aware Virtual Memory Support for Fabric Attached Memory Systems

Proceedings - International Symposium on High-Performance Computer Architecture

Kommareddy, Vamsee R.; Hughes, Clayton; Hammond, Simon; Awad, Amro

The exponential growth of data has driven technology providers to develop new protocols, such as cache coherent interconnects and memory semantic fabrics, to help users and facilities leverage advances in memory technologies to satisfy these growing memory and storage demands. Using these new protocols, fabric-attached memories (FAM) can be directly attached to a system interconnect and be easily integrated with a variety of processing elements (PEs). Moreover, systems that support FAM can be smoothly upgraded and allow multiple PEs to share the FAM memory pools using well-defined protocols. The sharing of FAM between PEs allows efficient data sharing, improves memory utilization, reduces cost by allowing flexible integration of different PEs and memory modules from several vendors, and makes it easier to upgrade the system. One promising use-case for FAMs is in High-Performance Compute (HPC) systems, where the underutilization of memory is a major challenge. However, adopting FAMs in HPC systems brings new challenges. In addition to cost, flexibility, and efficiency, one particular problem that requires rethinking is virtual memory support for security and performance. To address these challenges, this paper presents decoupled access control and address translation (DeACT), a novel virtual memory implementation that supports HPC systems equipped with FAM. Compared to the state-of-the-art two-level translation approach, DeACT achieves speedup of up to 4.59x (1.8x on average) without compromising security.

Note: Part of this work was done when Vamsee was working under the supervision of Amro Awad at UCF. Amro Awad is now with the ECE Department at NC State.

TYPE Conference Paper YEAR 2021

DOI OSTI Scopus

Sandia Labs Event-Sensing & Computation Interests

Vineyard, Craig M.

Abstract not provided.

TYPE Presentation YEAR 2021

OSTI

Operational Quantum Tomography

Di Matteo, Olivia; Gamble, John; Granade, Chris; Rudinger, Kenneth M.; Wiebe, Nathan

Abstract not provided.

TYPE Conference Presentation YEAR 2021

DOI OSTI

Coupling 1D Telegrapher Equations to 3D Maxwell's Equations with Applications to Pulsed Power

Mcgregor, Duncan A.O.; Phillips, Edward; Sirajuddin, David; Pointon, Timothy

Abstract not provided.

TYPE Conference Presentation YEAR 2021

DOI OSTI

Development of Machine Learned SNAP Potentials for Studying Radiation Damage in Materials

Cusentino, Mary A.; Wood, M.A.; Thompson, Aidan P.

Abstract not provided.

TYPE Conference Presentation YEAR 2021

DOI OSTI

Design and Implementation of ETD Methods for Nonhydrostatic Atmosphere Models

Krause, Cassidy F.; Steyer, Andrew

Abstract not provided.

TYPE Conference Presentation YEAR 2021

DOI OSTI

Implementing Calving Laws in Ice-Sheet Models using Level Set Methods

Sockwell, Kenneth C.; Perego, Mauro

Abstract not provided.

TYPE Conference Presentation YEAR 2021

DOI OSTI

Thermodynamically consistent physics-informed neural networks for hyperbolic systems

Manickam, Indu; Patel, Ravi; Trask, Nathaniel A.; Wood, M.A.; Lee, Myoungkyu; Tomas, Ignacio; Cyr, Eric C.

Abstract not provided.

TYPE Conference Presentation YEAR 2021

DOI OSTI

Hybrid multi-level Monte Carlo polynomial chaos method for global sensitivity analysis

Merritt, Michael; Geraci, Gianluca; Eldred, Michael S.; Portone, Teresa

Abstract not provided.

TYPE Conference Presentation YEAR 2021

DOI OSTI

Evolving Spiking Circuit Motifs Using Weight Agnostic Neural Networks

Anwar, Abrar

Abstract not provided.

TYPE Conference Poster YEAR 2021

DOI OSTI

Model Parallelism with Spatial Decomposition of Volumetric Data for Deep Learning

Saavedra, Gary; Cyr, Eric C.; Schroder, Jacob; Hewett, Russell

Abstract not provided.

TYPE Conference Presentation YEAR 2021

DOI OSTI

Robustness and Validation of Model and Digital Twins Deployment

Volkova, Svitana; Stracuzzi, David J.; Shafer, Jenifer; Ray, Jaideep; Pullum, Laura

For digital twins (DTs) to become a central fixture in mission critical systems, a better understanding is required of potential modes of failure, quantification of uncertainty, and the ability to explain a model’s behavior. These aspects are particularly important as the performance of a digital twin will evolve during model development and deployment for real-world operations.

TYPE Other Report YEAR 2021

DOI OSTI