Enabling Guaranteed Correctness and Leading Edge Performance under Radiation with a Heterogeneous System
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
This report focuses on the two primary goals set forth in Sandia’s TAFI effort, referred to here under the name Kebab. The first goal is to overlay a trajectory onto a large database of historical trajectories, all with very different sampling rates than the original track. We demonstrate a fast method to accomplish this, even for databases that hold over a million tracks. The second goal is to then demonstrate that these matched historical trajectories can be used to make predictions about unknown qualities associated with the original trajectory. As part of this work, we also examine the problem of defining the qualities of a trajectory in a reproducible way.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Recently, Graph Neural Networks (GNNs) have received a lot of interest because of their success in learning representations from graph structured data. However, GNNs exhibit different compute and memory characteristics compared to traditional Deep Neural Networks (DNNs). Graph convolutions require feature aggregations from neighboring nodes (known as the aggregation phase), which leads to highly irregular data accesses. GNNs also have a very regular compute phase that can be broken down to matrix multiplications (known as the combination phase). All recently proposed GNN accelerators utilize different dataflows and microarchitecture optimizations for these two phases. Different communication strategies between the two phases have been also used. However, as more custom GNN accelerators are proposed, the harder it is to qualitatively classify them and quantitatively contrast them. In this work, we present a taxonomy to describe several diverse dataflows for running GNN inference on accelerators. This provides a structured way to describe and compare the design-space of GNN accelerators.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
This is the second in a sequence of three Hardware Evaluation milestones that provide insight into the following questions: What are the sources of excess data movement across all levels of the memory hierarchy, going out to the network fabric? What can be done at various levels of the hardware/software hierarchy to reduce excess data movement? How does reduced data movement track application performance? The results of this study can be used to suggest where the DOE supercomputing facilities, working with their hardware vendors, can optimize aspects of the system to reduce excess data movement. Quantitative analysis will also benefit systems software and applications to optimize caching and data layout strategies. Another potential avenue is to answer cost-benefit questions, such as those involving memory capacity versus latency and bandwidth. This milestone focuses on techniques to reduce data movement, quantitatively evaluates the efficacy of the techniques in accomplishing that goal, and measures how performance tracks data movement reduction. We study a small collection of benchmarks and proxy mini-apps that run on pre-exascale GPUs and on the Accelsim GPU simulator. Our approach has two thrusts: to measure advanced data movement reduction directives and techniques on the newest available GPUs, and to evaluate our benchmark set on simulated GPUs configured with architectural refinements to reduce data movement.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
The atomic cluster expansion is a general polynomial expansion of the atomic energy in multi-atom basis functions. Here we implement the atomic cluster expansion in the performant C++ code PACE that is suitable for use in large scale atomistic simulations. We briefly review the atomic cluster expansion and give detailed expressions for energies and forces as well as efficient algorithms for their evaluation. We demonstrate that the atomic cluster expansion as implemented in PACE shifts a previously established Pareto front for machine learning interatomic potentials towards faster and more accurate calculations. Moreover, general purpose parameterizations are presented for copper and silicon and evaluated in detail. We show that the new Cu and Si potentials significantly improve on the best available potentials for highly accurate large-scale atomistic simulations.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
We study both conforming and non-conforming versions of the practical DPG method for the convection-reaction problem. We determine that the most common approach for DPG stability analysis (construction of a local Fortin operator) is infeasible for the convection-reaction problem. We then develop a line of argument based on the direct construction of a global Fortin operator; we find that employing a polynomial enrichment for the test space does not suffice for this purpose, motivating the introduction of a (two-element) subgrid mesh. The argument combines mathematical analysis with numerical experiments
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
ROL-PEBBL is a C++, MPI-based parallel code for mixed-integer PDE-constrained optimization (MIPDECO). In these problems we wish to optimize (control, design, etc.) physical systems, which must obey the laws of physics, when some of the decision variables must take integer values. ROL-PEBBL combines a code to efficiently search over integer choices (PEBBL = Parallel Enumeration Branch-and-Bound Library) and a code for efficient nonlinear optimization, including PDE-constrained optimization (ROL = Rapid Optimization Library). In this report, we summarize the design of ROL-PEBBL and initial applications/results. For an artificial source-inversion problem, finding sources of pollution on a grid from sparse samples, ROL-PEBBLs solution for the nest grid gave the best optimization guarantee for any general solver that gives both a solution and a quality guarantee.
Abstract not provided.
Abstract not provided.
Abstract not provided.