Astra, deployed in 2018, was the first petascale supercomputer to utilize processors based on the ARM instruction set. The system was also the first under Sandia's Vanguard program, which seeks to provide an evaluation vehicle for novel technologies that, with refinement, could be utilized in demanding, large-scale HPC environments. In addition to ARM, several other first-of-a-kind developments were used in the machine, including new approaches to cooling the datacenter and the machine itself. This article documents our experiences building a power measurement and control infrastructure for Astra. While power monitoring is often beyond the control of users today, the accurate measurement, cataloging, and evaluation of power, as our experiences show, is critical to the successful deployment of a large-scale platform. While such infrastructure exists in part for other architectures, Astra required new development to support the novel Marvell ThunderX2 processor used in its compute nodes. In addition to documenting the measurement of power during system bring-up and subsequent routine use, we present results on controlling the power usage of the processor, an area of growing interest as data centers and supercomputing sites look to improve energy efficiency and find additional sources of full-system optimization.

The ground truth program used simulations as test beds for social science research methods. The simulations had known ground truth and were capable of producing large amounts of data. This allowed research teams to run experiments and ask questions of these simulations much as social scientists study real-world systems, and enabled robust evaluation of their causal inference, prediction, and prescription capabilities. We tested three hypotheses about research effectiveness using data from the ground truth program, specifically looking at the influence of complexity, causal understanding, and data collection on performance. We found some evidence that system complexity and causal understanding influenced research performance, but no evidence that data availability contributed. The ground truth program may be the first robust coupling of simulation test beds with an experimental framework capable of teasing out the factors that determine the success of social science research.

Partitioned methods allow one to build a simulation capability for coupled problems by reusing existing single-component codes. In so doing, partitioned methods can shorten code development and validation times for multiphysics and multiscale applications. In this work, we consider a scenario in which one or more of the “codes” being coupled are projection-based reduced order models (ROMs), introduced to lower the computational cost associated with a particular component. We simulate this scenario by considering a model interface problem that is discretized independently on two non-overlapping subdomains. We then formulate a partitioned scheme for this problem that allows the coupling of a ROM “code” for one of the subdomains with a finite element model (FEM) or ROM “code” for the other subdomain. The ROM “codes” are constructed by performing proper orthogonal decomposition (POD) on a snapshot ensemble to obtain a low-dimensional reduced order basis, followed by a Galerkin projection onto this basis. The ROM and/or FEM “codes” on each subdomain are then coupled using a Lagrange multiplier representing the interface flux. To partition the resulting monolithic problem, we first eliminate the flux through a dual Schur complement. Application of an explicit time integration scheme to the transformed monolithic problem decouples the subdomain equations, allowing their independent solution for the next time step. We show numerical results that demonstrate the proposed method’s efficacy in achieving both ROM-FEM and ROM-ROM coupling.
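The POD-Galerkin construction described above can be sketched as follows. This is a minimal illustration in which a hypothetical diagonal linear operator stands in for a subdomain discretization; the dimensions, operator, and snapshot times are our assumptions, not the paper's setup.

```python
import numpy as np
from scipy.linalg import expm

# Hypothetical full-order linear model du/dt = A u on one subdomain
# (an illustrative stand-in for a subdomain "code").
n = 200
A = -np.diag(np.linspace(1.0, 5.0, n))          # stable diagonal operator
rng = np.random.default_rng(0)
u0 = rng.standard_normal(n)

# Collect snapshots of the full-order solution at several times.
times = np.linspace(0.0, 1.0, 20)
snapshots = np.column_stack([np.exp(np.diag(A) * t) * u0 for t in times])

# POD: the leading left singular vectors of the snapshot matrix
# form the low-dimensional reduced order basis.
U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
r = 5
Phi = U[:, :r]                                   # (n x r) basis

# Galerkin projection onto the basis yields the ROM "code".
A_r = Phi.T @ A @ Phi
u0_r = Phi.T @ u0

# Solve the ROM at t = 1 and lift back to the full space for comparison.
u_rom = Phi @ (expm(A_r * 1.0) @ u0_r)
u_fom = np.exp(np.diag(A) * 1.0) * u0
rel_err = np.linalg.norm(u_rom - u_fom) / np.linalg.norm(u_fom)
print(rel_err)
```

In the partitioned setting of the paper, a reduced operator like `A_r` would additionally carry the Lagrange-multiplier coupling terms on the interface.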

We study both conforming and non-conforming versions of the practical DPG method for the convection-reaction problem. We determine that the most common approach for DPG stability analysis - construction of a local Fortin operator - is infeasible for the convection-reaction problem. We then develop a line of argument based on a direct proof of discrete stability; we find that employing a polynomial enrichment for the test space does not suffice for this purpose, motivating the introduction of a (two-element) subgrid mesh. The argument combines mathematical analysis with numerical experiments.

Measures of simulation model complexity generally focus on outputs; we propose measuring the complexity of a model’s causal structure to gain insight into its fundamental character. This article introduces tools for measuring causal complexity. First, we introduce a method for developing a model’s causal structure diagram, which characterizes the causal interactions present in the code. Causal structure diagrams facilitate comparison of simulation models, including those from different paradigms. Next, we develop metrics for evaluating a model’s causal complexity using its causal structure diagram. We discuss cyclomatic complexity as a measure of the intricacy of causal structure and introduce two new metrics that incorporate the concept of feedback, a fundamental component of causal structure. The first new metric introduced here is feedback density, a measure of the cycle-based interconnectedness of causal structure. The second metric combines cyclomatic complexity and feedback density into a comprehensive causal complexity measure. Finally, we demonstrate these complexity metrics on simulation models from multiple paradigms and discuss potential uses and interpretations. These tools enable direct comparison of models across paradigms and provide a mechanism for measuring and discussing complexity based on a model’s fundamental assumptions and design.
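As a concrete illustration, cyclomatic complexity can be computed directly from a causal structure diagram's edge and node counts. The graph below and the feedback-density formula (fraction of edges lying on a directed cycle) are our simplified stand-ins, not the article's exact definitions.

```python
# Illustrative causal structure diagram: nodes are model variables,
# directed edges are causal influences extracted from the code.
edges = [
    ("births", "population"), ("population", "births"),   # reinforcing loop
    ("population", "deaths"), ("deaths", "population"),   # balancing loop
    ("resources", "births"),
]
nodes = {n for e in edges for n in e}
adj = {n: [] for n in nodes}
for a, b in edges:
    adj[a].append(b)

def reaches(src, dst):
    """Depth-first search: can src reach dst along directed edges?"""
    stack, seen = [src], set()
    while stack:
        n = stack.pop()
        if n == dst:
            return True
        if n not in seen:
            seen.add(n)
            stack.extend(adj[n])
    return False

# Cyclomatic complexity: E - N + 2P, with P the number of connected
# components (this diagram is one component, so P = 1).
cyclomatic = len(edges) - len(nodes) + 2

# Proxy for feedback density: the fraction of edges lying on at least
# one directed cycle, i.e. edges (a, b) where b can reach a.
feedback_density = sum(reaches(b, a) for a, b in edges) / len(edges)
print(cyclomatic, feedback_density)
```

Here the two population loops put four of the five edges on cycles, so the proxy feedback density is 0.8 while the cyclomatic complexity is 3.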

National security applications require artificial neural networks (ANNs) that consume less power, are fast and dynamic online learners, are fault tolerant, and can learn from unlabeled and imbalanced data. We explore whether two fundamentally different, traditional learning algorithms from artificial intelligence and the biological brain can be merged. We tackle this problem from two directions. First, we start from a theoretical point of view and show that the spike time dependent plasticity (STDP) learning curve observed in biological networks can be derived using the mathematical framework of backpropagation through time. Second, we show that transmission delays, as observed in biological networks, improve the ability of spiking networks to perform classification when trained using a backpropagation of error (BP) method. These results provide evidence that STDP could be compatible with a BP learning rule. Combining these learning algorithms will likely lead to networks more capable of meeting our national security missions.

In this article, we present a general methodology to combine the Discontinuous Petrov–Galerkin (DPG) method in space and time in the context of methods of lines for transient advection–reaction problems. We first introduce a semidiscretization in space with a DPG method, redefining the ideas of optimal testing and practicality of the method in this context. Then, we apply the recently developed DPG-based time-marching scheme, which is of exponential type, to the resulting system of Ordinary Differential Equations (ODEs). We also discuss how to efficiently compute the action of the exponential of the matrix coming from the space semidiscretization without assembling the full matrix. Finally, we verify the proposed method for 1D+time advection–reaction problems, showing optimal convergence rates for smooth solutions and more stable results for linear conservation laws compared to classical exponential integrators.
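Computing the action of the matrix exponential from a sparse representation, without forming the dense exponential, is available in standard libraries such as SciPy. The simple upwind semidiscretization below is our illustrative stand-in for the DPG semidiscretization, not the method of the article.

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import expm_multiply

# Upwind semidiscretization of u_t + u_x + u = 0 on a periodic grid,
# giving the ODE system u_t = A u with sparse A.
n = 256
h = 1.0 / n
main = -np.ones(n) / h - 1.0          # -1/h from upwinding, -1 from reaction
lower = np.ones(n - 1) / h
A = diags([main, lower], [0, -1], format="lil")
A[0, n - 1] = 1.0 / h                 # periodic wrap-around
A = A.tocsr()

x = np.linspace(0.0, 1.0, n, endpoint=False)
u0 = np.sin(2 * np.pi * x)

# Exponential step u(t) = exp(t A) u0, computed without ever assembling
# the dense matrix exponential.
t = 0.25
u = expm_multiply(A * t, u0)

# Exact solution of u_t + u_x + u = 0: advected profile damped by exp(-t).
exact = np.exp(-t) * np.sin(2 * np.pi * (x - t))
print(np.max(np.abs(u - exact)))
```

The residual error here is purely the first-order spatial discretization error; the exponential step itself is exact in time for the linear system.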

Neural operators [1–5] have recently become popular tools for designing solution maps between function spaces in the form of neural networks. Unlike classical scientific machine learning approaches that learn parameters of a known partial differential equation (PDE) for a single instance of the input parameters at a fixed resolution, neural operators approximate the solution map of a family of PDEs [6,7]. Despite their success, uses of neural operators have so far been restricted to relatively shallow neural networks and confined to learning hidden governing laws. In this work, we propose a novel nonlocal neural operator, which we refer to as the nonlocal kernel network (NKN), that is resolution independent, characterized by deep neural networks, and capable of handling a variety of tasks such as learning governing equations and classifying images. Our NKN stems from the interpretation of the neural network as a discrete nonlocal diffusion-reaction equation that, in the limit of infinite layers, is equivalent to a parabolic nonlocal equation, whose stability is analyzed via nonlocal vector calculus. The resemblance to integral forms of neural operators allows NKNs to capture long-range dependencies in the feature space, while the continuous treatment of node-to-node interactions makes NKNs resolution independent. The resemblance to neural ODEs, reinterpreted in a nonlocal sense, and the stable network dynamics between layers allow the optimal parameters of an NKN to generalize from shallow to deep networks. This enables the use of shallow-to-deep initialization techniques [8]. Our tests show that NKNs outperform baseline methods in both learning governing equations and image classification tasks and generalize well to different resolutions and depths.
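The resolution independence of kernel-integral layers can be sketched with a single untrained layer: because the layer is defined in the continuum, the same kernel can be evaluated on any grid. The fixed kernel and Riemann-sum discretization below are our assumptions for illustration, not the trained NKN architecture.

```python
import numpy as np

# A single nonlocal (kernel integral) layer: v(x) = ∫ k(x, y) u(y) dy,
# discretized by a Riemann sum on a uniform grid.
def kernel_layer(u, grid, k):
    dy = grid[1] - grid[0]
    K = k(grid[:, None], grid[None, :])
    return K @ u * dy

k = lambda x, y: np.exp(-4.0 * np.abs(x - y))   # hypothetical learned kernel
f = lambda x: np.sin(2 * np.pi * x)

coarse = np.linspace(0.0, 1.0, 64, endpoint=False)
fine = np.linspace(0.0, 1.0, 256, endpoint=False)

# Apply the *same* continuum layer at two resolutions.
v_coarse = kernel_layer(f(coarse), coarse, k)
v_fine = kernel_layer(f(fine), fine, k)

# Subsample the fine output back onto the coarse grid: the two
# evaluations agree up to quadrature error, i.e. the layer is
# (approximately) resolution independent.
diff = np.max(np.abs(v_fine[::4] - v_coarse))
print(diff)
```

A pointwise multiplication layer tied to a fixed grid would have no such consistency across resolutions, which is the contrast the integral formulation exploits.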

The recent discovery of bright, room-temperature single photon emitters (SPEs) in GaN provides an appealing alternative to diamond-based single photon emitters, given the widespread use and technological maturity of III-nitrides for optoelectronics (e.g. blue LEDs, lasers) and high-speed, high-power electronics. This discovery opens the door to on-chip, on-demand single photon sources integrated with detectors and electronics. Currently, little is known about the underlying defect structure, nor is there a sense of how such an emitter might be controllably created. A detailed understanding of the origin of the SPEs in GaN and a path to deterministically introduce them is required. In this project, we develop new experimental capabilities to investigate single photon emission from GaN nanowires and from both GaN and AlN wafers. We ion implant our wafers using our focused ion beam nanoimplantation capabilities at Sandia, going beyond typical broad-beam implantation to create single photon emitting defects with nanometer precision. We have created light emitting sources using Li^{+} and He^{+}, but single photon emission has yet to be demonstrated. In parallel, we calculate the energy levels of defects and transition metal substitutions in GaN to better understand the sources of single photon emission in GaN and AlN. The combined experimental and theoretical capabilities developed throughout this project will enable further investigation into the origins of single photon emission from defects in GaN, AlN, and other wide bandgap semiconductors.

Fractional equations have become the model of choice in several applications where heterogeneities at the microstructure result in anomalous diffusive behavior at the macroscale. In this work we introduce a new fractional operator characterized by a doubly-variable fractional order and possibly truncated interactions. Under certain conditions on the model parameters and on the regularity of the fractional order we show that the corresponding Poisson problem is well-posed. We also introduce a finite element discretization and describe an efficient implementation of the finite-element matrix assembly in the case of piecewise constant fractional order. Through several numerical tests, we illustrate the improved descriptive power of this new operator across media interfaces. Furthermore, we present one-dimensional and two-dimensional h-convergence results that show that the variable-order model has the same convergence behavior as the constant-order model.

The prevalence of COVID-19 is shaped by behavioral responses to recommendations and warnings. Available information on the disease determines the population’s perception of danger and thus its behavior; this information changes dynamically, and different sources may report conflicting information. We study the feedback between disease, information, and stay-at-home behavior using a hybrid agent-based-system dynamics model that incorporates evolving trust in sources of information. We use this model to investigate how divergent reporting and conflicting information can alter the trajectory of a public health crisis. The model shows that divergent reporting not only alters disease prevalence over time, but also increases polarization of the population’s behaviors and trust in different sources of information.

The long-standing problem of predicting the electronic structure of matter on ultra-large scales (beyond 100,000 atoms) is solved with machine learning.

Physics-constrained machine learning is emerging as an important topic in the field of machine learning for physics. One of the most significant advantages of incorporating physics constraints into machine learning methods is that the resulting model requires significantly less data to train. By incorporating physical rules into the machine learning formulation itself, the predictions are expected to be physically plausible. The Gaussian process (GP) is perhaps one of the most common machine learning methods for small datasets. In this paper, we investigate the possibility of constraining a GP formulation with monotonicity on three different material datasets, one experimental and two computational. The monotonic GP is compared against the regular GP, and a significant reduction in the posterior variance is observed. The monotonic GP is strictly monotonic in the interpolation regime, but in the extrapolation regime the monotonic effect fades as one moves beyond the training dataset. Imposing monotonicity on the GP comes at a small accuracy cost compared to the regular GP. The monotonic GP is perhaps most useful in applications where data are scarce and noisy and monotonicity is supported by strong physical evidence.
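A minimal sketch of the idea follows, with toy data and an RBF kernel of our choosing. Rejection sampling of posterior draws is used here purely as a stand-in for the constrained monotonic formulations used in practice (e.g. virtual derivative observations); it is not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical noisy observations of a monotone material response.
X = np.array([0.0, 0.3, 0.5, 0.8, 1.0])
y = np.array([0.0, 0.35, 0.52, 0.77, 1.0]) + 0.01 * rng.standard_normal(5)

def rbf(a, b, ell=0.2):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)

Xs = np.linspace(0.0, 1.0, 10)                  # prediction grid

# Standard GP posterior (zero mean, RBF kernel, small observation noise).
K = rbf(X, X) + 1e-4 * np.eye(len(X))
Ks = rbf(Xs, X)
mean = Ks @ np.linalg.solve(K, y)
cov = rbf(Xs, Xs) - Ks @ np.linalg.solve(K, Ks.T)
cov = 0.5 * (cov + cov.T)                       # symmetrize for sampling

# Crude "monotonic GP": draw posterior samples and keep only the
# nondecreasing ones. Truncating to the monotone set cannot increase,
# and typically shrinks, the per-point posterior variance.
samples = rng.multivariate_normal(mean, cov + 1e-8 * np.eye(len(Xs)),
                                  size=2000)
keep = np.all(np.diff(samples, axis=1) >= 0.0, axis=1)
monotone = samples[keep]

print(keep.mean(), samples.var(axis=0).mean(), monotone.var(axis=0).mean())
```

Note the monotone set only constrains the interpolation grid; beyond the data, samples are again unconstrained, mirroring the fading extrapolation behavior described above.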

Many distributed applications implement complex data flows and need a flexible mechanism for routing data between producers and consumers. Recent advances in programmable network interface cards, or SmartNICs, represent an opportunity to offload data-flow tasks into the network fabric, thereby freeing the hosts to perform other work. System architects in this space face multiple questions about the best way to leverage SmartNICs as processing elements in data flows. In this paper, we advocate the use of Apache Arrow as a foundation for implementing data-flow tasks on SmartNICs. We report on our experiences adapting a partitioning algorithm for particle data to Apache Arrow and measure the on-card processing performance for the BlueField-2 SmartNIC. Our experiments confirm that the BlueField-2’s (de)compression hardware can have a significant impact on in-transit workflows where data must be unpacked, processed, and repacked.

The purpose of our report is to discuss the notion of entropy and its relationship with statistics. Our goal is to provide a way to think about entropy, its central role within information theory, and its relationship with statistics. We review various relationships between information theory and statistics; nearly all are well known but unfortunately are often not recognized. Entropy quantifies the "average amount of surprise" in a random variable and lies at the heart of information theory, which studies the transmission, processing, extraction, and utilization of information. For us, data is information. What is the distinction between information theory and statistics? Information theorists work with probability distributions, whereas statisticians work with samples. In so many words, information theory applied to samples is the practice of statistics. Acknowledgements. We thank Danny Dunlavy, Carlos Llosa, Oscar Lopez, Arvind Prasadan, Gary Saavedra, and Jeremy Wendt for helpful discussions along the way. Our report was supported by the Laboratory Directed Research and Development program at Sandia National Laboratories, a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC., a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-NA0003525.
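The two viewpoints can be sketched in a few lines: the information theorist computes entropy from a distribution, while the statistician plugs empirical frequencies from a sample into the same formula. The distributions and sample size below are ours, for illustration.

```python
import numpy as np

# Shannon entropy of a discrete distribution: H(p) = -sum_i p_i log2 p_i,
# the "average amount of surprise" per observation, in bits.
def entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# A fair coin carries 1 bit of surprise per flip; a biased coin less.
print(entropy([0.5, 0.5]))      # 1.0
print(entropy([0.9, 0.1]))      # ≈ 0.469

# The statistician's view: estimate the same quantity from a sample
# by plugging in empirical frequencies.
rng = np.random.default_rng(0)
sample = rng.choice([0, 1], size=100_000, p=[0.9, 0.1])
freqs = np.bincount(sample) / sample.size
print(entropy(freqs))           # close to 0.469 for a large sample
```

The plug-in estimate converges to the true entropy as the sample grows, which is one concrete sense in which "information theory applied to samples is the practice of statistics."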

Recent efforts at Sandia such as DataSEA are creating search engines that enable analysts to query the institution’s massive archive of simulation and experiment data. The benefit of this work is that analysts will be able to retrieve all historical information about a system component that the institution has amassed over the years and make better-informed decisions in current work. As DataSEA gains momentum, it faces multiple technical challenges relating to capacity storage. From a raw capacity perspective, data producers will rapidly overwhelm the system with massive amounts of data. From an accessibility perspective, analysts will expect to be able to retrieve any portion of the bulk data, from any system on the enterprise network. Sandia’s Institutional Computing is mitigating storage problems at the enterprise level by procuring new capacity storage systems that can be accessed from anywhere on the enterprise network. These systems use the Simple Storage Service (S3) API for data transfers. While S3 uses objects instead of files, users can access it from their desktops or Sandia’s high-performance computing (HPC) platforms. S3 is particularly well suited for bulk storage in DataSEA, as datasets can be decomposed into objects that can be referenced and retrieved individually, as needed by an analyst. In this report we describe our experiences working with S3 storage and provide information about how developers can leverage Sandia’s current systems. We present performance results from two sets of experiments. First, we measure S3 throughput when exchanging data between four different HPC platforms and two different enterprise S3 storage systems on the Sandia Restricted Network (SRN). Second, we measure the performance of S3 when communicating with a custom-built Ceph storage system that was constructed from HPC components.
Overall, while S3 storage is significantly slower than traditional HPC storage, it provides significant accessibility benefits that will be valuable for archiving and exploiting historical data. There are multiple opportunities that arise from this work, including enhancing DataSEA to leverage S3 for bulk storage and adding native S3 support to Sandia’s IOSS library.

The time integration scheme is probably one of the most fundamental choices in the development of an ocean model. In this paper, we investigate several time integration schemes applied to the shallow water equations. This set of equations is accurate enough for modeling a shallow ocean and is also relevant to study because it is the system solved for the barotropic (i.e. vertically averaged) component of a three-dimensional ocean model. We analyze different time stepping algorithms for the linearized shallow water equations. High-order explicit schemes are accurate, but the time step is constrained by the Courant-Friedrichs-Lewy stability condition. Implicit schemes can be unconditionally stable but in practice lack accuracy when used with large time steps. In this paper we propose a detailed comparison of such classical schemes with exponential integrators. The accuracy and the computational costs are analyzed in different configurations.
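The trade-off can be sketched on the 1D linearized shallow water equations: an explicit Runge-Kutta scheme needs many CFL-sized steps, while an exponential integrator is exact for the linear system in a single step of any size. The grid, depth, and initial condition below are illustrative choices of ours.

```python
import numpy as np
from scipy.linalg import expm

# 1D linearized shallow water equations h_t = -H u_x, u_t = -g h_x,
# centered differences on a periodic grid, written as w_t = A w.
n, H, g = 64, 100.0, 9.81
dx = 1.0 / n
D = (np.roll(np.eye(n), 1, axis=1) - np.roll(np.eye(n), -1, axis=1)) / (2 * dx)
Z = np.zeros((n, n))
A = np.block([[Z, -H * D], [-g * D, Z]])

x = np.linspace(0.0, 1.0, n, endpoint=False)
w0 = np.concatenate([np.exp(-100.0 * (x - 0.5) ** 2), np.zeros(n)])

def rk4_step(w, dt):
    k1 = A @ w
    k2 = A @ (w + 0.5 * dt * k1)
    k3 = A @ (w + 0.5 * dt * k2)
    k4 = A @ (w + dt * k3)
    return w + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

# Explicit RK4: many small, CFL-limited steps to reach time T ...
T, steps = 0.01, 40
w_rk = w0.copy()
for _ in range(steps):
    w_rk = rk4_step(w_rk, T / steps)

# ... versus one exponential step, exact in time for the linear system.
w_exp = expm(A * T) @ w0
print(np.max(np.abs(w_rk - w_exp)))
```

For the fast barotropic gravity waves (speed sqrt(g*H), roughly 31 m/s here), the stability constraint on the explicit step is what exponential integrators aim to bypass.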

We construct a family of embedded pairs for optimal explicit strong stability preserving (SSP) Runge–Kutta methods of order 2≤p≤4 to be used to obtain numerical solutions of spatially discretized hyperbolic PDEs. In this construction, the goals include the non-defective property, a large stability region, and small error values as defined in Dekker and Verwer (1984) and Kennedy et al. (2000). The new family of embedded pairs allows SSP methods to adapt by varying the step size. Through several numerical experiments, we assess the overall effectiveness in terms of work versus precision while also taking accuracy and stability into consideration.
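The adaptivity that an embedded pair enables can be sketched with the classical SSPRK(3,3) Shu-Osher scheme and a second-order estimate assembled from its shared stages. The pairing and step-size controller below are generic illustrations of the mechanism, not the optimized embedded pairs constructed in the paper.

```python
import math

# One step of SSPRK(3,3) with an embedded SSPRK(2,2) estimate built
# from the shared stages; the difference serves as a local error estimate.
def ssprk33_embedded(f, u, dt):
    u1 = u + dt * f(u)
    u2 = 0.75 * u + 0.25 * (u1 + dt * f(u1))
    u_high = u / 3.0 + 2.0 / 3.0 * (u2 + dt * f(u2))   # third order
    u_low = 0.5 * u + 0.5 * (u1 + dt * f(u1))          # embedded second order
    return u_high, abs(u_high - u_low)

# Step-size control on u' = -u, u(0) = 1, integrated to t = 1.
f = lambda u: -u
u, t, dt, tol = 1.0, 0.0, 0.2, 1e-6
while t < 1.0:
    dt = min(dt, 1.0 - t)
    u_new, err = ssprk33_embedded(f, u, dt)
    if err <= tol:                                     # accept the step
        u, t = u_new, t + dt
    # Standard controller: exponent is 1/(q+1) for embedded order q = 2.
    dt *= 0.9 * (tol / max(err, 1e-16)) ** (1.0 / 3.0)
print(u, math.exp(-1.0))
```

The paper's contribution lies in choosing the embedded coefficients so that the pair retains the SSP property, a large stability region, and small error constants, which this generic pairing makes no attempt to optimize.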

Additively manufactured Ti-5Al-5V-5Mo-3Cr (Ti-5553) is being considered as an AM repair material for engineering applications because of its superior strength compared to other titanium alloys. Here, we describe the failure mechanisms observed through computed tomography, electron backscatter diffraction (EBSD), and scanning electron microscopy (SEM) of spall damage resulting from tensile failure in as-built and annealed Ti-5553. We also investigate the phase stability of native powder, as-built, and annealed Ti-5553 through diamond anvil cell (DAC) and ramp compression experiments. We then explore the effect of tensile loading on a sample containing an interface between a Ti-6Al-4V (Ti-64) baseplate and an additively manufactured Ti-5553 layer. Post-mortem materials characterization showed that spallation occurred in regions of initial porosity and that the interface provides a nucleation site for spall damage below the spall strength of Ti-5553. Preliminary peridynamics modeling of the dynamic experiments is described. Finally, we discuss further development of the Stochastic Parallel PARticle Kinetic Simulator (SPPARKS) Monte Carlo (MC) capabilities to include the integration of alpha (α)-phase and microstructural simulations for this multiphase titanium alloy.

X-ray computed tomography is generally a primary step in the characterization of defective electronic components, but it is too slow to screen large lots of components. Super-resolution imaging approaches, in which higher-resolution data are inferred from lower-resolution images, have the potential to substantially reduce collection times for data volumes accessible via x-ray computed tomography. Here we seek to extend existing two-dimensional super-resolution approaches directly to three-dimensional computed tomography data. Multiple scan resolutions spanning half an order of magnitude were collected for four classes of commercial electronic components to serve as training data for a deep-learning super-resolution network. A modular Python framework for three-dimensional super-resolution of computed tomography data has been developed and trained over multiple classes of electronic components. Initial training and testing demonstrate the promise of these approaches, which have the potential for more than an order of magnitude reduction in collection time for electronic component screening.

Bidadi, Shreyas B.; Brazell, Michael B.; Brunhart-Lupo, Nicholas B.; Henry de Frahan, Marc T.; Lee, Dong H.; Hu, Jonathan J.; Melvin, Jeremy M.; Mullowney, Paul M.; Vijayakumar, Ganesh V.; Moser, Robert D.; Rood, Jon R.; Sakievich, Philip S.; Sharma, Ashesh S.; Williams, Alan B.; Sprague, Michael A.

The goal of the ExaWind project is to enable predictive simulations of wind farms composed of many megawatt-scale turbines situated in complex terrain. Predictive simulations will require computational fluid dynamics (CFD) simulations for which the mesh resolves the geometry of the turbines, capturing the thin boundary layers and the rotation and large deflections of the blades. Whereas such simulations for a single turbine are arguably petascale class, multi-turbine wind farm simulations will require exascale-class resources.

The ASC program seeks to use machine learning to improve efficiencies in its stockpile stewardship mission. Moreover, there is a growing market for technologies dedicated to accelerating AI workloads. Many of these emerging architectures promise to provide savings in energy efficiency, area, and latency when compared to traditional CPUs for these types of applications — neuromorphic analog and digital technologies provide both low-power and configurable acceleration of challenging artificial intelligence (AI) algorithms. If designed into a heterogeneous system with other accelerators and conventional compute nodes, these technologies have the potential to augment the capabilities of traditional High Performance Computing (HPC) platforms [5]. This expanded computation space requires not only a new approach to physics simulation, but the ability to evaluate and analyze next-generation architectures specialized for AI/ML workloads in both traditional HPC and embedded ND applications. Developing this capability will enable ASC to understand how this hardware performs in both HPC and ND environments, improve our ability to port our applications, guide the development of computing hardware, and inform vendor interactions, leading them toward solutions that address ASC’s unique requirements.

Kononov, Alina K.; Lee, Cheng-Wei L.; Pereira dos Santos, Tatiane P.; Robinson, Brian R.; Yao, Yifan Y.; Yao, Yi Y.; Andrade, Xavier A.; Baczewski, Andrew D.; Constantinescu, Emil C.; Correa, Alfredo C.; Kanai, Yosuke K.; Modine, N.A.; Schleife, Andre S.

Due to a beneficial balance of computational cost and accuracy, real-time time-dependent density-functional theory has emerged as a promising first-principles framework to describe electron real-time dynamics. Here we discuss recent implementations of this approach, in particular in the context of complex, extended systems. Results include an analysis of the computational cost associated with numerical propagation and with the use of absorbing boundary conditions. We extensively explore the shortcomings in describing electron-electron scattering in real time and compare to many-body perturbation theory. Modern improvements to the description of exchange and correlation are reviewed. In this work, we specifically focus on the Qb@ll code, which we have mainly used for these types of simulations in recent years, and we conclude by pointing to further progress needed going forward.

Uncertainty quantification (UQ) plays a major role in verification and validation for computational engineering models and simulations, and establishes trust in the predictive capability of computational models. In the materials science and engineering context, where the process-structure-property-performance linkage is well known to provide the roadmap from manufacturing to engineering performance, numerous integrated computational materials engineering (ICME) models have been developed across a wide spectrum of length and time scales to relieve the burden of resource-intensive experiments. Within the structure-property linkage, crystal plasticity finite element method (CPFEM) models have been widely used since they are one of the few ICME toolboxes that allow numerical predictions, providing the bridge from microstructure to materials properties and performance. Several constitutive models have been proposed over the last few decades to capture the mechanics and plasticity behavior of materials. While some UQ studies have been performed, the robustness and uncertainty of these constitutive models have not been rigorously established. In this work, we apply a stochastic collocation (SC) method, which is mathematically rigorous and widely used in the field of UQ, to quantify the uncertainty of the three most commonly used constitutive models in CPFEM, namely phenomenological models (with and without twinning) and dislocation-density-based constitutive models, for three different crystal structures: face-centered cubic (fcc) copper (Cu), body-centered cubic (bcc) tungsten (W), and hexagonal close-packed (hcp) magnesium (Mg).
Our numerical results not only quantify the uncertainty of these constitutive models in the stress-strain response, but also analyze the global sensitivity of the underlying constitutive parameters with respect to the initial yield behavior, which may be helpful for robust constitutive model calibration in the future.
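The stochastic collocation idea can be sketched for a single uncertain parameter: the model is evaluated only at quadrature nodes, from which moments of the output are assembled. The quadratic "yield model" and parameter values below are hypothetical placeholders for a CPFEM evaluation.

```python
import numpy as np

# Placeholder for a CPFEM evaluation: a smooth, nonlinear response of
# the initial yield stress to an uncertain constitutive parameter tau0.
def yield_stress(tau0):
    return 100.0 + 5.0 * tau0 + 0.5 * tau0 ** 2

mu, sigma = 20.0, 2.0                       # tau0 ~ N(mu, sigma^2)

# Gauss-Hermite nodes/weights for the weight e^{-z^2}, transformed to
# the normal distribution of tau0.
z, w = np.polynomial.hermite.hermgauss(7)
nodes = mu + np.sqrt(2.0) * sigma * z
weights = w / np.sqrt(np.pi)

# Collocation: evaluate the model only at the nodes, then form moments.
vals = yield_stress(nodes)
mean = np.sum(weights * vals)
var = np.sum(weights * (vals - mean) ** 2)
print(mean, np.sqrt(var))
```

Because the placeholder response is polynomial, the seven-node rule recovers its mean (402) and variance (2508) exactly; for a true CPFEM model the node count controls the accuracy of the moment estimates, and a sparse grid extends the same construction to several uncertain parameters.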