Publications


Combining Physics and Machine Learning for the Next Generation of Molecular Simulation

Rackers, Joshua R.

Simulating molecules and atomic systems at quantum accuracy is a grand challenge for science in the 21st century. Quantum-accurate simulations would enable the design of new medicines and the discovery of new materials. The defining problem in this challenge is that quantum calculations on large molecules, like proteins or DNA, are fundamentally impossible with current algorithms. In this work, we explore a range of different methods that aim to make large, quantum-accurate simulations possible. We show that using advanced classical models, we can accurately simulate ion channels, an important biomolecular system. We show how advanced classical models can be implemented in an exascale-ready software package. Lastly, we show how machine learning can learn the laws of quantum mechanics from data and enable quantum electronic structure calculations on thousands of atoms, a feat that is impossible for current algorithms. Altogether, this work shows that by combining advances in physics models, computing, and machine learning, we are moving closer to the reality of accurately simulating our molecular world.


AI-enhanced Codesign for Next-Generation Neuromorphic Circuits and Systems

Cardwell, Suma G.; Smith, John D.; Crowder, Douglas C.

This report details work that was completed to address the Fiscal Year 2022 Advanced Science and Technology (AS&T) Laboratory Directed Research and Development (LDRD) call for “AI-enhanced Co-Design of Next Generation Microelectronics.” This project required concurrent contributions from the fields of 1) materials science, 2) devices and circuits, 3) physics of computing, and 4) algorithms and system architectures. During this project, we developed AI-enhanced circuit design methods that relied on reinforcement learning and evolutionary algorithms. The AI-enhanced design methods were tested on neuromorphic circuit design problems that have real-world applications related to Sandia’s mission needs. The developed methods enable the design of circuits, including circuits that are built from emerging devices, and they were also extended to enable novel device discovery. We expect that these AI-enhanced design methods will accelerate progress towards developing next-generation, high-performance neuromorphic computing systems.


Using ultrasonic attenuation in cortical bone to infer distributions on pore size

Applied Mathematical Modelling

White, Rebekah D.; Alexanderian, A.; Yousefian, O.; Karbalaeisadegh, Y.; Bekele-Maxwell, K.; Kasali, A.; Banks, H.T.; Talmant, M.; Grimal, Q.; Muller, M.

In this work we infer the underlying distribution on pore radius in human cortical bone samples using ultrasonic attenuation data. We first discuss how to formulate polydisperse attenuation models using a probabilistic approach and the Waterman–Truell model for scattering attenuation. We then compare the Independent Scattering Approximation and the higher-order Waterman–Truell models’ forward predictions for total attenuation in polydisperse samples. Following this, we formulate an inverse problem under the Prohorov Metric Framework coupled with variational regularization to stabilize this inverse problem. We then use experimental attenuation data taken from human cadaver samples and solve inverse problems resulting in nonparametric estimates of the probability density function on pore radius. We compare these estimates to the “true” microstructure of the bone samples determined via microCT imaging. We find that our methodology allows us to reliably estimate the underlying microstructure of the bone from attenuation data.
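
As a rough sketch of the polydisperse formulation (my construction, not the authors' code): the attenuation of a sample with a distribution of pore radii can be computed as a PDF-weighted average of a monodisperse attenuation kernel. The power-law kernel alpha_mono below is a hypothetical stand-in for the Waterman–Truell prediction, and all magnitudes are illustrative.

```python
import numpy as np

def polydisperse_attenuation(freq, radii, pdf, alpha_mono):
    """PDF-weighted average of a monodisperse attenuation kernel."""
    dr = radii[1] - radii[0]
    alpha = np.array([alpha_mono(freq, a) for a in radii])  # (n_radii, n_freq)
    return np.sum(pdf[:, None] * alpha, axis=0) * dr        # quadrature over radius

# Hypothetical long-wavelength kernel, alpha ~ f^4 a^3, standing in for the
# Waterman–Truell prediction used in the paper.
alpha_mono = lambda f, a: 1e-3 * (f / 1e6) ** 4 * (a / 1e-6) ** 3

radii = np.linspace(5e-6, 60e-6, 200)               # pore radii [m]
pdf = np.exp(-0.5 * ((radii - 25e-6) / 8e-6) ** 2)  # Gaussian guess for the density
pdf /= np.sum(pdf) * (radii[1] - radii[0])          # normalize to unit mass
freq = np.linspace(1e6, 8e6, 50)                    # frequencies [Hz]
print(polydisperse_attenuation(freq, radii, pdf, alpha_mono)[:3])
```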


Progress in Modeling the 2019 Extended Magnetically Insulated Transmission Line (MITL) and Courtyard Environment Trial at HERMES-III

Cartwright, Keith C.; Pointon, Tim P.; Powell, Troy C.; Grabowski, Theodore C.; Shields, Sidney S.; Sirajuddin, David S.; Jensen, Daniel S.; Renk, Timothy J.; Cyr, Eric C.; Stafford, David S.; Swan, Matthew S.; Mitra, Sudeep M.; McDoniel, William M.; Moore, Christopher H.

This report documents the progress made in simulating the HERMES-III Magnetically Insulated Transmission Line (MITL) and courtyard with EMPIRE and ITS. This study focuses on the shots taken during June and July of 2019 with the new MITL extension. A few of these shots (11132, 11133, 11134, 11135, 11136, and 11146) included dose mapping of the courtyard; this report focuses on them because they provided full data return from the MITL electrical diagnostics and the radiation dose sensors in the courtyard. The comparison starts with improving the processing of the incoming voltage from the experiment into the EMPIRE simulation. The currents are then compared at several locations along the MITL. The simulation results of the electrons impacting the anode are shown. The electron impact energies and angles are then handed off to ITS, which calculates the dose on the faceplate and at locations in the courtyard; these doses are compared to experimental measurements. ITS also calculates the photons and electrons that are injected into the courtyard, and these quantities are then used by EMPIRE to calculate the photon and electron transport in the courtyard. The details of the algorithms used to perform the courtyard simulations are presented, as well as qualitative comparisons of the electric field, magnetic field, and conductivity in the courtyard. Because of the computational burden of these calculations, the pressure in the courtyard was reduced to lower the computational load. The computational performance is presented, along with suggestions on how to improve both the computational and the algorithmic performance. Some of the algorithmic changes would reduce the accuracy of the models, and detailed comparisons of these changes are left for a future study. In addition to the list of code improvements, there is also a list of suggested experimental improvements to improve the quality of the data return.


Resilience Enhancements through Deep Learning Yields

Eydenberg, Michael S.; Batsch-Smith, Lisa B.; Bice, Charles T.; Blakely, Logan; Bynum, Michael L.; Boukouvala, Fani B.; Castillo, Anya C.; Haddad, Joshua H.; Hart, William E.; Jalving, Jordan H.; Kilwein, Zachary A.; Laird, Carl D.; Skolfield, Joshua K.

This report documents the Resilience Enhancements through Deep Learning Yields (REDLY) project, a three-year effort to improve electrical grid resilience by developing scalable methods for system operators to protect the grid against threats leading to interrupted service or physical damage. The computational complexity and uncertain nature of current real-world contingency analysis present significant barriers to automated, real-time monitoring. While there has been a significant push to explore the use of accurate, high-performance machine learning (ML) model surrogates to address this gap, their reliability is unclear when deployed in high-consequence applications such as power grid systems. Contemporary optimization techniques used to validate surrogate performance can exploit ML model prediction errors, which necessitates the verification of worst-case performance for the models.


Improving Predictive Capability in REHEDS Simulations with Fast, Accurate, and Consistent Non-Equilibrium Material Properties

Hansen, Stephanie B.; Baczewski, Andrew D.; Gomez, T.A.; Hentschel, T.W.; Jennings, Christopher A.; Kononov, Alina K.; Nagayama, Taisuke N.; Adler, Kelsey A.; Cangi, A.C.; Cochrane, Kyle C.; Schleife, A.

Predictive design of REHEDS experiments with radiation-hydrodynamic simulations requires knowledge of material properties (e.g. equations of state (EOS), transport coefficients, and radiation physics). Interpreting experimental results requires accurate models of diagnostic observables (e.g. detailed emission, absorption, and scattering spectra). In conditions of Local Thermodynamic Equilibrium (LTE), these material properties and observables can be pre-computed with relatively high accuracy and subsequently tabulated on simple temperature-density grids for fast look-up by simulations. When radiation and electron temperatures fall out of equilibrium, however, non-LTE effects can profoundly change material properties and diagnostic signatures. Accurately and efficiently incorporating these non-LTE effects has been a longstanding challenge for simulations. At present, most simulations include non-LTE effects by invoking highly simplified inline models. These inline non-LTE models are both much slower than table look-up and significantly less accurate than the detailed models used to populate LTE tables and diagnose experimental data through post-processing or inversion. Because inline non-LTE models are slow, designers avoid them whenever possible, which leads to known inaccuracies from using tabular LTE. Because inline models are simple, they are inconsistent with tabular data from detailed models, leading to ill-known inaccuracies, and they cannot generate detailed synthetic diagnostics suitable for direct comparisons with experimental data. This project addresses the challenge of generating and utilizing efficient, accurate, and consistent non-equilibrium material data along three complementary but relatively independent research lines. First, we have developed a relatively fast and accurate non-LTE average-atom model based on density functional theory (DFT) that provides a complete set of EOS, transport, and radiative data, and have rigorously tested it against more sophisticated first-principles multi-atom DFT models, including time-dependent DFT. Next, we have developed a tabular scheme and interpolation methods that compactly capture non-LTE effects for use in simulations and have implemented these tables in the GORGON magneto-hydrodynamic (MHD) code. Finally, we have developed post-processing tools that use detailed tabulated non-LTE data to directly predict experimental observables from simulation output.


Adaptive Space-Time Methods for Large Scale Optimal Design

DiPietro, Kelsey L.; Ridzal, Denis R.; Morales, Diana M.

When modeling complex physical systems with advanced dynamics, such as shocks and singularities, many classic methods for solving partial differential equations can return inaccurate or unusable results. One way to resolve these complex dynamics is through r-adaptive refinement methods, in which a fixed number of mesh points are shifted to areas of high interest. The mesh refinement map can be found through the solution of the Monge–Ampère equation, a highly nonlinear partial differential equation. Due to its nonlinearity, the numerical solution of the Monge–Ampère equation is nontrivial and has previously required computationally expensive methods. In this report, we detail our novel optimization-based, multigrid-enabled solver for a low-order finite element approximation of the Monge–Ampère equation. This fast and scalable solver makes r-adaptive meshing more readily available for problems related to large-scale optimal design. Beyond mesh adaptivity, our report discusses additional applications where our fast solver for the Monge–Ampère equation could be easily applied.
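
For reference, one common form of the Monge–Ampère problem used for r-adaptive meshing (a generic statement under assumed notation, not necessarily the exact formulation solved in the report) writes the mesh map as the gradient of a convex potential phi that equidistributes a monitor function rho over the computational domain Omega_c:

```latex
\rho\!\left(\nabla\phi(\xi)\right)\,\det\!\left(D^{2}\phi(\xi)\right) = \theta,
\qquad x = \nabla\phi(\xi),
\qquad \theta = \frac{1}{|\Omega_c|}\int_{\Omega}\rho\,dx .
```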


Fluid-Kinetic Coupling: Advanced Discretizations for Simulations on Emerging Heterogeneous Architectures (LDRD FY20-0643)

Roberts, Nathan V.; Bond, Stephen D.; Miller, Sean A.; Cyr, Eric C.

Plasma physics simulations are vital for a host of Sandia mission concerns, for fundamental science, and for clean energy in the form of fusion power. Sandia's most mature plasma physics simulation capabilities come in the form of particle-in-cell (PIC) models and magnetohydrodynamics (MHD) models. MHD models for a plasma work well in denser plasma regimes when there is enough material that the plasma approximates a fluid. PIC models, on the other hand, work well in lower-density regimes, in which there is not too much to simulate; error in PIC scales as the inverse square root of the number of particles, making high-accuracy simulations expensive. Real-world applications, however, almost always involve a transition region between the high-density regimes where MHD is appropriate, and the low-density regimes for PIC. In such a transition region, a direct discretization of Vlasov is appropriate. Such discretizations come with their own computational costs, however; the phase-space mesh for Vlasov can involve up to six dimensions (seven if time is included), and applying appropriate homogeneous boundary conditions in velocity space requires meshing a substantial padding region to ensure that the distribution remains sufficiently close to zero at the velocity boundaries. Moreover, for collisional plasmas, the right-hand side of the Vlasov equation is a collision operator, which is non-local in velocity space, and which may dominate the cost of the Vlasov solver. The present LDRD project endeavors to develop modern, foundational tools for the development of continuum-kinetic Vlasov solvers, using the discontinuous Petrov-Galerkin (DPG) methodology for discretization of Vlasov, and machine-learning (ML) models to enable efficient evaluation of collision operators. DPG affords several key advantages. First, it has a built-in, robust error indicator, allowing us to adapt the mesh in a very natural way, enabling a coarse velocity-space mesh near the homogeneous boundaries, and a fine mesh where the solution has fine features. Second, it is an inherently high-order, high-intensity method, requiring extra local computations to determine so-called optimal test functions, which makes it particularly suited to modern hardware in which floating-point throughput is increasing at a faster rate than memory bandwidth. Finally, DPG is a residual-minimizing method, which enables high-accuracy computation: in typical cases, the method delivers something very close to the $L^2$ projection of the exact solution. Meanwhile, the ML-based collision model we adopt affords a cost structure that scales as the square root of the cost of a standard direct evaluation. Moreover, we design our model to conserve mass, momentum, and energy by construction, and our approach to training is highly flexible, in that it can incorporate not only synthetic data from direct-simulation Monte Carlo (DSMC) codes, but also experimental data. We have developed two DPG formulations for Vlasov-Poisson: a time-marching, backward-Euler discretization and a space-time discretization. We have conducted a number of numerical experiments to verify the approach in a 1D1V setting. In this report, we detail these formulations and experiments.
We also summarize some new theoretical results developed as part of this project (published as papers previously): some new analysis of DPG for the convection-reaction problem (of which the Vlasov equation is an instance), a new exponential integrator for DPG, and some numerical exploration of various DPG-based time-marching approaches to the heat equation. As part of this work, we have contributed extensively to the Camellia open-source library; we also describe the new capabilities and their usage. We have also developed a well-documented methodology for single-species collision operators, which we applied to argon and demonstrated with numerical experiments. We summarize those results here, as well as describing at a high level a design extending the methodology to multi-species operators. We have released a new open-source library, MLC, under a BSD license; we include a summary of its capabilities as well.
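
For context, the continuum-kinetic equation being discretized is the Vlasov equation; in its electrostatic (Vlasov-Poisson) form with a collision operator on the right-hand side it reads (a standard statement, with notation assumed here):

```latex
\partial_t f + v\cdot\nabla_x f + \frac{q}{m}\,E\cdot\nabla_v f = C[f],
\qquad f = f(x,v,t).
```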


Modeling Analog Tile-Based Accelerators Using SST

Feinberg, Benjamin F.; Agarwal, Sapan A.; Plagge, Mark P.; Rothganger, Fredrick R.; Cardwell, Suma G.; Hughes, Clayton H.

Analog computing has been widely proposed to improve the energy efficiency of multiple important workloads including neural network operations and other linear algebra kernels. To properly evaluate analog computing and explore more complex workloads such as systems consisting of multiple analog data paths, system level simulations are required. Moreover, prior work on system architectures for analog computing often relies on custom simulators, creating significant additional design effort and complicating comparisons between different systems. To remedy these issues, this report describes the design and implementation of a flexible tile-based analog accelerator element for the Structural Simulation Toolkit (SST). The element focuses heavily on the tile controller, an often neglected aspect of prior work, which is sufficiently versatile to simulate a wide range of different tile operations including neural network layers, signal processing kernels, and generic linear algebra operations without major constraints. The tile model also interoperates with existing SST memory and network models to reduce the overall development load and enable future simulation of heterogeneous systems with both conventional digital logic and analog compute tiles. Finally, both the tile and array models are designed to easily support future extensions as new analog operations and applications that can benefit from analog computing are developed.


Global Sensitivity Analysis Using the Ultra-Low Resolution Energy Exascale Earth System Model

Journal of Advances in Modeling Earth Systems

Kalashnikova, Irina; Peterson, Kara J.; Powell, Amy J.; Jakeman, John D.; Roesler, Erika L.

For decades, Arctic temperatures have increased twice as fast as average global temperatures. As a first step towards quantifying parametric uncertainty in Arctic climate, we performed a variance-based global sensitivity analysis (GSA) using a fully-coupled, ultra-low resolution (ULR) configuration of version 1 of the U.S. Department of Energy’s Energy Exascale Earth System Model (E3SMv1). Specifically, we quantified the sensitivity of six quantities of interest (QOIs), which characterize changes in Arctic climate over a 75 year period, to uncertainties in nine model parameters spanning the sea ice, atmosphere and ocean components of E3SMv1. Sensitivity indices for each QOI were computed with a Gaussian process emulator using 139 random realizations of the random parameters and fixed pre-industrial forcing. Uncertainties in the atmospheric parameters in the CLUBB (Cloud Layers Unified by Binormals) scheme were found to have the most impact on sea ice status and the larger Arctic climate. Our results demonstrate the importance of conducting sensitivity analyses with fully coupled climate models. The ULR configuration makes such studies computationally feasible today due to its low computational cost. When advances in computational power and modeling algorithms enable the tractable use of higher-resolution models, our results will provide a baseline that can quantify the impact of model resolution on the accuracy of sensitivity indices. Moreover, the confidence intervals provided by our study, which we used to quantify the impact of the number of model evaluations on the accuracy of sensitivity estimates, have the potential to inform the computational resources needed for future sensitivity studies.
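
A minimal sketch of the emulator-based workflow (a hypothetical test function and illustrative sizes; not the E3SM setup beyond the nine parameters and 139 realizations quoted above): fit a Gaussian process to a small ensemble of model runs, then estimate first-order Sobol indices on the inexpensive emulator with a Saltelli-style pick-and-freeze estimator.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)
d, n_train, n_mc = 9, 139, 4096
# Hypothetical stand-in for an expensive climate QOI.
f_model = lambda X: X[:, 0] + 2 * X[:, 1] * X[:, 2] + 0.1 * X.sum(axis=1)

X_train = rng.uniform(size=(n_train, d))
gp = GaussianProcessRegressor().fit(X_train, f_model(X_train))

# Saltelli pick-and-freeze estimator evaluated on the cheap emulator.
A = rng.uniform(size=(n_mc, d))
B = rng.uniform(size=(n_mc, d))
fA, fB = gp.predict(A), gp.predict(B)
var = np.var(np.concatenate([fA, fB]))
for i in range(d):
    ABi = A.copy(); ABi[:, i] = B[:, i]       # freeze all but parameter i
    S_i = np.mean(fB * (gp.predict(ABi) - fA)) / var
    print(f"S_{i} ≈ {S_i:.3f}")
```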


Neural-network based collision operators for the Boltzmann equation

Journal of Computational Physics

Roberts, Nathan V.; Bond, Stephen D.; Cyr, Eric C.; Miller, Sean T.

Kinetic gas dynamics in rarefied and moderate-density regimes have complex behavior associated with collisional processes. These processes are generally defined by convolution integrals over a high-dimensional space (as in the Boltzmann operator), or require evaluating complex auxiliary variables (as in Rosenbluth potentials in Fokker-Planck operators) that are challenging to implement and computationally expensive to evaluate. In this work, we develop a data-driven neural network model that augments a simple and inexpensive BGK collision operator with a machine-learned correction term, which improves the fidelity of the simple operator with a small overhead to overall runtime. The composite collision operator has a tunable fidelity and, in this work, is trained using and tested against a direct-simulation Monte-Carlo (DSMC) collision operator.
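
A minimal 1V numpy sketch of the composite-operator idea (illustrative names and an untrained random "network", not the paper's implementation): an inexpensive BGK relaxation term plus a correction whose components along the discrete collision invariants are projected out, so the correction conserves mass, momentum, and energy by construction.

```python
import numpy as np

v = np.linspace(-6, 6, 64); dv = v[1] - v[0]

def moments(f):
    # Discrete mass, momentum, and energy moments of f(v).
    return np.array([np.sum(f) * dv,
                     np.sum(v * f) * dv,
                     np.sum(0.5 * v**2 * f) * dv])

def maxwellian(n, u, T):
    return n / np.sqrt(2 * np.pi * T) * np.exp(-(v - u) ** 2 / (2 * T))

def bgk(f, nu=1.0):
    # Relax toward the Maxwellian with matched moments (1D: T = 2E/n - u^2).
    n, mom, E = moments(f)
    u = mom / n; T = 2 * E / n - u**2
    return nu * (maxwellian(n, u, T) - f)

rng = np.random.default_rng(1)
W1, W2 = rng.normal(size=(32, 64)) * 0.05, rng.normal(size=(64, 32)) * 0.05

def learned_correction(f):
    g = W2 @ np.tanh(W1 @ f)                  # tiny MLP placeholder
    # Project out the collision invariants (1, v, v^2/2) so the correction
    # conserves mass, momentum, and energy exactly.
    Q = np.linalg.qr(np.stack([np.ones_like(v), v, 0.5 * v**2], axis=1))[0]
    return g - Q @ (Q.T @ g)

f = maxwellian(1.0, 0.5, 1.2) + 0.05 * np.exp(-(v - 2) ** 2)
C = bgk(f) + learned_correction(f)
print("moment drift:", moments(C))            # near zero by construction
```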


Numerical simulation of a relativistic magnetron using a fluid electron model

Physics of Plasmas

Roberds, Nicholas R.; Cartwright, Keith C.; Sandoval, Andrew J.; Beckwith, Kristian B.; Cyr, Eric C.; Glines, Forrest W.

An approach to numerically modeling relativistic magnetrons, in which the electrons are represented with a relativistic fluid, is described. A principal effect in the operation of a magnetron is space-charge-limited (SCL) emission of electrons from the cathode. We have developed an approximate SCL emission boundary condition for the fluid electron model. This boundary condition prescribes the flux of electrons as a function of the normal component of the electric field on the boundary. We show the results of a benchmarking activity that applies the fluid SCL boundary condition to the one-dimensional Child–Langmuir diode problem and a canonical two-dimensional diode problem. Simulation results for a two-dimensional A6 magnetron are then presented. Computed bunching of the electron cloud occurs and coincides with significant microwave power generation. Numerical convergence of the solution is considered. Sharp gradients in the solution quantities at the diocotron resonance, spanning an interval of three to four grid cells in the most well-resolved case, are present and likely affect convergence.
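
For reference, the classical one-dimensional Child–Langmuir law used in the benchmark gives the space-charge-limited current density across a gap of spacing d at voltage V:

```latex
J_{\mathrm{CL}} \;=\; \frac{4\,\varepsilon_0}{9\,d^{2}}\,\sqrt{\frac{2e}{m_e}}\;V^{3/2}.
```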


Combining DPG in space with DPG time-marching scheme for the transient advection–reaction equation

Computer Methods in Applied Mechanics and Engineering

Roberts, Nathan V.; Muñoz-Matute, Judit M.; Demkowicz, Leszek D.

In this article, we present a general methodology to combine the Discontinuous Petrov–Galerkin (DPG) method in space and time in the context of methods of lines for transient advection–reaction problems. We first introduce a semidiscretization in space with a DPG method, redefining the ideas of optimal testing and practicality of the method in this context. Then, we apply the recently developed DPG-based time-marching scheme, which is of exponential type, to the resulting system of Ordinary Differential Equations (ODEs). Further, we discuss how to efficiently compute the action of the exponential of the matrix coming from the space semidiscretization without assembling the full matrix. Finally, we verify the proposed method for 1D+time advection–reaction problems, showing optimal convergence rates for smooth solutions and more stable results for linear conservation laws compared to classical exponential integrators.


Uncertainty and Sensitivity Analysis Methods and Applications in the GDSA Framework (FY2022)

Swiler, Laura P.; Basurto, Eduardo B.; Brooks, Dusty M.; Eckert, Aubrey C.; Leone, Rosemary C.; Mariner, Paul M.; Portone, Teresa P.; Smith, Mariah L.

The Spent Fuel and Waste Science and Technology (SFWST) Campaign of the U.S. Department of Energy (DOE) Office of Nuclear Energy (NE), Office of Fuel Cycle Technology (FCT) is conducting research and development (R&D) on geologic disposal of spent nuclear fuel (SNF) and high-level nuclear waste (HLW). Two high priorities for SFWST disposal R&D are design concept development and disposal system modeling. These priorities are directly addressed in the SFWST Geologic Disposal Safety Assessment (GDSA) control account, which is charged with developing a geologic repository system modeling and analysis capability, and the associated software, GDSA Framework, for evaluating disposal system performance for nuclear waste in geologic media. GDSA Framework is supported by the SFWST Campaign and its predecessor, the Used Fuel Disposition (UFD) Campaign.


Islet: interpolation semi-Lagrangian element-based transport

Geoscientific Model Development (Online)

Bradley, Andrew M.; Bosler, Peter A.; Guba, Oksana G.

Advection of trace species, or tracers, also called tracer transport, in models of the atmosphere and other physical domains is an important and potentially computationally expensive part of a model's dynamical core. Semi-Lagrangian (SL) advection methods are efficient because they permit a time step much larger than the advective stability limit for explicit Eulerian methods without requiring the solution of a globally coupled system of equations as implicit Eulerian methods do. Thus, to reduce the computational expense of tracer transport, dynamical cores often use SL methods to advect tracers. The class of interpolation semi-Lagrangian (ISL) methods contains potentially extremely efficient SL methods. We describe a finite-element ISL transport method that we call the interpolation semi-Lagrangian element-based transport (Islet) method, suitable for use with atmosphere models discretized using the spectral element method. The Islet method uses three grids that share an element grid: a dynamics grid supporting, for example, the Gauss–Legendre–Lobatto basis of degree three; a physics parameterizations grid with a configurable number of finite-volume subcells per element; and a tracer grid supporting the use of Islet bases, with the particular basis again configurable. This method provides extremely accurate tracer transport and excellent diagnostic values in a number of verification problems.


Accurate Compression of Tabulated Chemistry Models with Partition of Unity Networks

Combustion Science and Technology

Armstrong, Elizabeth A.; Hansen, Michael A.; Knaus, Robert C.; Trask, Nathaniel A.; Hewson, John C.; Sutherland, James C.

Tabulated chemistry models are widely used to simulate large-scale turbulent fires in applications including energy generation and fire safety. Tabulation via piecewise Cartesian interpolation suffers from the curse-of-dimensionality, leading to a prohibitive exponential growth in parameters and memory usage as more dimensions are considered. Artificial neural networks (ANNs) have attracted attention for constructing surrogates for chemistry models due to their ability to perform high-dimensional approximation. However, due to well-known pathologies regarding the realization of suboptimal local minima during training, in practice they do not converge and provide unreliable accuracy. Partition of unity networks (POUnets) are a recently introduced family of ANNs which preserve notions of convergence while performing high-dimensional approximation, discovering a mesh-free partition of space which may be used to perform optimal polynomial approximation. In this work, we assess their performance with respect to accuracy and model complexity in reconstructing unstructured flamelet data representative of nonadiabatic pool fire models. Our results show that POUnets can provide the desirable accuracy of classical spline-based interpolants with the low memory footprint of traditional ANNs while converging faster to significantly lower errors than ANNs. For example, we observe POUnets obtaining target accuracies in two dimensions with 40 to 50 times less memory and roughly double the compression in three dimensions. We also address the practical matter of efficiently training accurate POUnets by studying convergence over key hyperparameters, the impact of partition/basis formulation, and the sensitivity to initialization.
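
A minimal sketch of a POUnet forward pass in one dimension (sizes and parameterization are illustrative, not the paper's architecture): softmax logits define non-negative partition functions phi_i(x) that sum to one, and the output blends one polynomial per partition.

```python
import numpy as np

rng = np.random.default_rng(0)
n_part, deg = 8, 2
Wp, bp = rng.normal(size=(n_part, 1)), rng.normal(size=n_part)
coef = rng.normal(size=(n_part, deg + 1))         # per-partition polynomials

def pounet(x):
    logits = Wp @ x[None, :] + bp[:, None]        # (n_part, n_x)
    phi = np.exp(logits - logits.max(0))
    phi /= phi.sum(axis=0)                        # partition of unity
    V = np.vander(x, deg + 1, increasing=True).T  # (deg+1, n_x) monomials
    return np.sum(phi * (coef @ V), axis=0)       # blend local polynomials

x = np.linspace(-1, 1, 5)
print(pounet(x))
```

In training, both the partition parameters (Wp, bp here) and the polynomial coefficients are optimized; the convex dependence on the coefficients is what preserves the convergence properties described above.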


PyApprox: Enabling efficient model analysis

Jakeman, John D.

PyApprox is a Python-based one-stop-shop for probabilistic analysis of scientific numerical models. Easy-to-use and extendable tools are provided for constructing surrogates, sensitivity analysis, Bayesian inference, experimental design, and forward uncertainty quantification. The algorithms implemented represent the most popular methods for model analysis developed over the past two decades, including recent advances in multi-fidelity approaches that use multiple model discretizations and/or simplified physics to significantly reduce the computational cost of various types of analyses. Simple interfaces are provided for the most commonly used algorithms to limit a user’s need to tune the various hyper-parameters of each algorithm. However, more advanced workflows that require customization of hyper-parameters are also supported. An extensive set of benchmarks from the literature is also provided to facilitate the easy comparison of different algorithms for a wide range of model analyses. This paper introduces PyApprox and its various features, and presents results demonstrating the utility of PyApprox on a benchmark problem modeling the advection of a tracer in ground water.


Toward efficient polynomial preconditioning for GMRES

Numerical Linear Algebra with Applications

Loe, Jennifer A.; Morgan, Ronald B.

We present a polynomial preconditioner for solving large systems of linear equations. The polynomial is derived from the minimum residual polynomial (the GMRES polynomial) and is more straightforward to compute and implement than many previous polynomial preconditioners. Our current implementation of this polynomial using its roots is naturally more stable than previous methods of computing the same polynomial. We implement further stability control using added roots, and this allows for high degree polynomials. We discuss the effectiveness and challenges of root-adding and give an additional check for stability. In this article, we study the polynomial preconditioner applied to GMRES; however, it could be used with any Krylov solver. This polynomial preconditioning algorithm can dramatically improve convergence for some problems, especially for difficult problems, and can reduce dot products by an even greater margin.
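
A minimal numpy sketch of the roots-based application (my construction under stated assumptions, not the authors' code): given roots theta_k of the residual polynomial pi(z) = prod_k (1 - z/theta_k), the preconditioner p with z*p(z) = 1 - pi(z) can be applied by a short recurrence. Here plain Ritz values from a small Arnoldi run stand in for the harmonic Ritz values the paper uses, and no added-root stability control is included.

```python
import numpy as np

def apply_poly_prec(A, thetas, v):
    # Recurrence: p += pi_v/theta; pi_v <- (I - A/theta) pi_v.
    # By induction, z*p(z) = 1 - prod(1 - z/theta_k), so y = p(A) v.
    p = np.zeros_like(v, dtype=complex)
    pi_v = v.astype(complex)
    for th in thetas:
        p += pi_v / th
        pi_v -= (A @ pi_v) / th
    return p.real            # real when roots come in conjugate pairs

def arnoldi_ritz(A, b, d):
    # Eigenvalues of the small Hessenberg matrix as polynomial roots.
    n = len(b); Q = np.zeros((n, d + 1)); H = np.zeros((d + 1, d))
    Q[:, 0] = b / np.linalg.norm(b)
    for j in range(d):
        w = A @ Q[:, j]
        for i in range(j + 1):
            H[i, j] = Q[:, i] @ w; w -= H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(w); Q[:, j + 1] = w / H[j + 1, j]
    return np.linalg.eigvals(H[:d, :d])

rng = np.random.default_rng(0)
n = 200
A = np.diag(np.linspace(1, 100, n)) + 0.01 * rng.normal(size=(n, n))
b = rng.normal(size=n)
thetas = arnoldi_ritz(A, b, 10)
y = apply_poly_prec(A, thetas, b)   # y ≈ p(A) b, so A @ y ≈ b for a good p
print(np.linalg.norm(A @ y - b) / np.linalg.norm(b))
```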


Permutation-adapted complete and independent basis for atomic cluster expansion descriptors

Goff, James M.; Sievers, Charles S.; Wood, Mitchell A.; Thompson, Aidan P.

In many recent applications, particularly in the field of atom-centered descriptors for interatomic potentials, tensor products of spherical harmonics have been used to characterize complex atomic environments. When coupled with a radial basis, the atomic cluster expansion (ACE) basis is obtained. However, symmetrization with respect to both rotation and permutation results in an overcomplete set of ACE descriptors with linear dependencies occurring within blocks of functions corresponding to particular generalized Wigner symbols. All practical applications of ACE employ semi-numerical constructions to generate a complete, fully independent basis. While computationally tractable, the resultant basis cannot be expressed analytically, is susceptible to numerical instability, and thus has limited reproducibility. Here we present a procedure for generating explicit analytic expressions for a complete and independent set of ACE descriptors. The procedure uses a coupling scheme that is maximally symmetric w.r.t. permutation of the atoms, exposing the permutational symmetries of the generalized Wigner symbols, and yields a permutation-adapted rotationally and permutationally invariant basis (PA-RPI ACE). Theoretical support for the approach is presented, as well as numerical evidence of completeness and independence. A summary of explicit enumeration of PA-RPI functions up to rank 6 and polynomial degree 32 is provided. The PA-RPI blocks corresponding to particular generalized Wigner symbols may be either larger or smaller than the corresponding blocks in the simpler rotationally invariant basis. Finally, we demonstrate that basis functions of high polynomial degree persist under strong regularization, indicating the importance of not restricting the maximum degree of basis functions in ACE models a priori.


Graph-Based Similarity Metrics for Comparing Simulation Model Causal Structures

Naugle, Asmeret B.; Swiler, Laura P.; Lakkaraju, Kiran L.; Verzi, Stephen J.; Warrender, Christina E.; Romero, Vicente J.

The causal structure of a simulation is a major determinant of both its character and behavior, yet most methods we use to compare simulations focus only on simulation outputs. We introduce a method that combines graphical representation with information theoretic metrics to quantitatively compare the causal structures of models. The method applies to agent-based simulations as well as system dynamics models and facilitates comparison within and between types. Comparing models based on their causal structures can illuminate differences in assumptions made by the models, allowing modelers to (1) better situate their models in the context of existing work, including highlighting novelty, (2) explicitly compare conceptual theory and assumptions to simulated theory and assumptions, and (3) investigate potential causal drivers of divergent behavior between models. We demonstrate the method by comparing two epidemiology models at different levels of aggregation.


Selective amorphization of SiGe in Si/SiGe nanostructures via high energy Si+ implant

Journal of Applied Physics

Turner, Emily M.; Campbell, Quinn C.; Avci, Ibrahim A.; Weber, William J.; Lu, Ping L.; Wang, George T.; Jones, Kevin S.

The selective amorphization of SiGe in Si/SiGe nanostructures via a 1 MeV Si⁺ implant was investigated, resulting in single-crystal Si nanowires (NWs) and quantum dots (QDs) encapsulated in amorphous SiGe fins and pillars, respectively. The Si NWs and QDs are formed during high-temperature dry oxidation of single-crystal Si/SiGe heterostructure fins and pillars, during which Ge diffuses along the nanostructure sidewalls and encapsulates the Si layers. The fins and pillars were then subjected to a 3 × 10¹⁵ ions/cm², 1 MeV Si⁺ implant, resulting in the amorphization of SiGe, while leaving the encapsulated Si crystalline for larger, 65-nm wide NWs and QDs. Interestingly, the 26-nm diameter Si QDs amorphize, while the 28-nm wide NWs remain crystalline during the same high energy ion implant. This result suggests that the Si/SiGe pillars have a lower threshold for Si-induced amorphization compared to their Si/SiGe fin counterparts. However, Monte Carlo simulations of ion implantation into the Si/SiGe nanostructures reveal similar predicted levels of displacements per cm³. Molecular dynamics simulations suggest that the total stress magnitude in Si QDs encapsulated in crystalline SiGe is higher than the total stress magnitude in Si NWs, which may lead to greater crystalline instability in the QDs during ion implant. The potential lower amorphization threshold of QDs compared to NWs is of special importance to applications that require robust QD devices in a variety of radiation environments.


Electrostatic Relativistic Fluid Models of Electron Emission in a Warm Diode

IEEE International Conference on Plasma Science (ICOPS)

Hamlin, Nathaniel D.; Smith, Thomas M.; Roberds, Nicholas R.; Glines, Forrest W.; Beckwith, Kristian B.

A semi-analytic fluid model has been developed for characterizing relativistic electron emission across a warm diode gap. Here we demonstrate the use of this model in (i) verifying multi-fluid codes in modeling compressible relativistic electron flows (the EMPIRE-Fluid code is used as an example; see also Ref. 1), (ii) elucidating key physics mechanisms characterizing the influence of compressibility and relativistic injection speed of the electron flow, and (iii) characterizing the regimes over which a fluid model recovers physically reasonable solutions.


Adaptive experimental design for multi-fidelity surrogate modeling of multi-disciplinary systems

International Journal for Numerical Methods in Engineering

Jakeman, John D.; Friedman, Sam; Eldred, Michael S.; Tamellini, Lorenzo; Gorodetsky, Alex A.; Allaire, Doug

We present an adaptive algorithm for constructing surrogate models of multi-disciplinary systems composed of a set of coupled components. With this goal we introduce “coupling” variables with a priori unknown distributions that allow surrogates of each component to be built independently. Once built, the surrogates of the components are combined to form an integrated-surrogate that can be used to predict system-level quantities of interest at a fraction of the cost of the original model. The error in the integrated-surrogate is greedily minimized using an experimental design procedure that allocates the amount of training data, used to construct each component-surrogate, based on the contribution of those surrogates to the error of the integrated-surrogate. The multi-fidelity procedure presented is a generalization of multi-index stochastic collocation that can leverage ensembles of models of varying cost and accuracy, for one or more components, to reduce the computational cost of constructing the integrated-surrogate. Extensive numerical results demonstrate that, for a fixed computational budget, our algorithm is able to produce surrogates that are orders of magnitude more accurate than methods that treat the integrated system as a black-box.


Scalable algorithms for physics-informed neural and graph networks

Data-Centric Engineering

Shukla, Khemraj; Xu, Mengjia; Trask, Nathaniel A.; Karniadakis, George E.

Physics-informed machine learning (PIML) has emerged as a promising new approach for simulating complex physical and biological systems that are governed by complex multiscale processes for which some data are also available. In some instances, the objective is to discover part of the hidden physics from the available data, and PIML has been shown to be particularly effective for such problems for which conventional methods may fail. Unlike commercial machine learning where training of deep neural networks requires big data, in PIML big data are not available. Instead, we can train such networks from additional information obtained by employing the physical laws and evaluating them at random points in the space-time domain. Such PIML integrates multimodality and multifidelity data with mathematical models, and implements them using neural networks or graph networks. Here, we review some of the prevailing trends in embedding physics into machine learning, using physics-informed neural networks (PINNs) based primarily on feed-forward neural networks and automatic differentiation. For more complex systems or systems of systems and unstructured data, graph neural networks (GNNs) present some distinct advantages, and here we review how physics-informed learning can be accomplished with GNNs based on graph exterior calculus to construct differential operators; we refer to these architectures as physics-informed graph networks (PIGNs). We present representative examples for both forward and inverse problems and discuss what advances are needed to scale up PINNs, PIGNs and more broadly GNNs for large-scale engineering problems.
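
A toy PINN in the sense described above (PyTorch, illustrative hyperparameters; a sketch, not the review's code): an MLP u_theta is trained so that the PDE residual, evaluated by automatic differentiation at random collocation points, and the boundary conditions are both small. The target problem u'' = -π² sin(πx) on [0, 1] with homogeneous Dirichlet data has exact solution u = sin(πx).

```python
import torch

torch.manual_seed(0)
net = torch.nn.Sequential(
    torch.nn.Linear(1, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(2000):
    x = torch.rand(128, 1, requires_grad=True)        # collocation points
    u = net(x)
    du = torch.autograd.grad(u, x, torch.ones_like(u), create_graph=True)[0]
    d2u = torch.autograd.grad(du, x, torch.ones_like(du), create_graph=True)[0]
    pde = (d2u + torch.pi**2 * torch.sin(torch.pi * x)).pow(2).mean()
    xb = torch.tensor([[0.0], [1.0]])
    bc = net(xb).pow(2).mean()                        # u(0) = u(1) = 0
    loss = pde + 10.0 * bc
    opt.zero_grad(); loss.backward(); opt.step()

x_test = torch.linspace(0, 1, 5).reshape(-1, 1)
print(net(x_test).detach().ravel())                   # compare to sin(pi x)
```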


Monolithic Multigrid for a Reduced-Quadrature Discretization of Poroelasticity

SIAM Journal on Scientific Computing

Adler, James A.; He, Yunhui H.; Hu, Xiaozhe H.; MacLachlan, Scott M.; Ohm, Peter B.

Advanced finite-element discretizations and preconditioners for models of poroelasticity have attracted significant attention in recent years. The equations of poroelasticity offer significant challenges in both areas, due to the potentially strong coupling between unknowns in the system, saddle-point structure, and the need to account for wide ranges of parameter values, including limiting behavior such as incompressible elasticity. This paper was motivated by an attempt to develop monolithic multigrid preconditioners for the discretization developed in [C. Rodrigo et al., Comput. Methods Appl. Mech. Engrg., 341 (2018), pp. 467–484]; we show here why this is a difficult task and, as a result, we modify the discretization in [Rodrigo et al.] through the use of a reduced-quadrature approximation, yielding a more “solver-friendly” discretization. Local Fourier analysis is used to optimize parameters in the resulting monolithic multigrid method, allowing a fair comparison between the performance and costs of methods based on Vanka and Braess–Sarazin relaxation. Further, numerical results are presented to validate the local Fourier analysis predictions and demonstrate efficiency of the algorithms. Finally, a comparison to existing block-factorization preconditioners is also given.


An optimization-based approach to parameter learning for fractional type nonlocal models

Computers and Mathematics with Applications

Burkovska, Olena; Glusa, Christian A.; D'Elia, Marta D.

Nonlocal operators of fractional type are a popular modeling choice for applications that do not adhere to classical diffusive behavior; however, one major challenge in nonlocal simulations is the selection of model parameters. In this work we propose an optimization-based approach to parameter identification for fractional models with an optional truncation radius. We formulate the inference problem as an optimal control problem where the objective is to minimize the discrepancy between observed data and an approximate solution of the model, and the control variables are the fractional order and the truncation length. For the numerical solution of the minimization problem we propose a gradient-based approach, where we enhance the numerical performance by an approximation of the bilinear form of the state equation and its derivative with respect to the fractional order. Several numerical tests in one and two dimensions illustrate the theoretical results and show the robustness and applicability of our method.


Electronic structure of intrinsic defects in c-gallium nitride: Density functional theory study without the jellium approximation

Physical Review B

Edwards, Arthur H.; Schultz, Peter A.; Dobzynski, Richard M.

We report the first nonjellium, systematic, density functional theory (DFT) study of intrinsic and extrinsic defects and defect levels in zinc-blende (cubic) gallium nitride. We use the local moment counter charge (LMCC) method, the standard Perdew–Burke–Ernzerhof (PBE) exchange-correlation potential, and two pseudopotentials, where the Ga 3d orbitals are either in the core (d0) or explicitly in the valence set (d10). We studied 64, 216, 512, and 1000 atom supercells, and demonstrated convergence to the infinite limit, crucial for delineating deep from shallow states near band edges, and for demonstrating the elimination of finite cell-size errors. Contrary to common claims, we find that exact exchange is not required to obtain defect levels across the experimental band gap. As was true in silicon, silicon carbide, and gallium arsenide, the extremal LMCC defect levels of the aggregate of defects yield an effective LMCC defect band gap that is within 10% of the experimental gap (3.3 eV) for both pseudopotentials. We demonstrate that the gallium vacancy is more complicated than previously reported. There is dramatic metastability: a nearest-neighbor nitrogen atom shifts into the gallium site, forming an antisite plus nitrogen vacancy pair, which is more stable than the simple vacancy for positive charge states. Our assessment of the d0 and d10 pseudopotentials yields minimal differences in defect structures and defect levels. The better agreement of the d0 lattice constant with experiment suggests that the more computationally economical d0 pseudopotentials are sufficient to achieve the fidelity possible within the physical accuracy of DFT, and thereby enable calculations in larger supercells necessary to demonstrate convergence with respect to finite size supercell errors.


Physics-assisted generative adversarial network for X-ray tomography

Optics Express

Guo, Zhen G.; Song, Jung K.; Barbastathis, George B.; Vaughan, Courtenay T.; Larson, Kurt L.; Alpert, Bradley K.; Levine, Zachary L.; Glinsky, Michael E.

X-ray tomography is capable of imaging the interior of objects in three dimensions non-invasively, with applications in biomedical imaging, materials science, electronic inspection, and other fields. The reconstruction process can be an ill-conditioned inverse problem, requiring regularization to obtain satisfactory results. Recently, deep learning has been adopted for tomographic reconstruction. Unlike iterative algorithms which require a distribution that is known a priori, deep reconstruction networks can learn a prior distribution through sampling the training distributions. In this work, we develop a Physics-assisted Generative Adversarial Network (PGAN), a two-step algorithm for tomographic reconstruction. In contrast to previous efforts, our PGAN utilizes maximum-likelihood estimates derived from the measurements to regularize the reconstruction with both known physics and the learned prior. Compared with methods with less physics assisting in training, PGAN can reduce the photon requirement with limited projection angles to achieve a given error rate. The advantages of using a physics-assisted learned prior in X-ray tomography may further enable low-photon nanoscale imaging.


The Portals 4.3 Network Programming Interface

Schonbein, William W.; Barrett, Brian W.; Brightwell, Ronald B.; Grant, Ryan G.; Hemmert, Karl S.; Pedretti, Kevin P.; Underwood, Keith U.; Riesen, Rolf R.; Hoefler, Torsten H.; Barbe, Mathieu B.; Filho, Luiz H.; Ratchov, Alexandre R.; Maccabe, Arthur B.

This report presents a specification for the Portals 4 network programming interface. Portals 4 is intended to allow scalable, high-performance network communication between nodes of a parallel computing system. Portals 4 is well suited to massively parallel processing and embedded systems. Portals 4 represents an adaptation of the data movement layer developed for massively parallel processing platforms, such as the 4500-node Intel TeraFLOPS machine. Sandia's Cplant cluster project motivated the development of Version 3.0, which was later extended to Version 3.3 as part of the Cray Red Storm machine and XT line. Version 4 is targeted to the next generation of machines employing advanced network interface architectures that support enhanced offload capabilities.


Asymptotic preserving methods for fluid electron-fluid models in the large magnetic field limit with mathematically guaranteed properties (Final Report)

Tomas, Ignacio T.; Shadid, John N.; Maier, Matthias M.; Salgado, Abner S.

The current manuscript is a final report on the activities carried out under the Project LDRD-CIS #226834. In scientific terms, the work reported in this manuscript is a continuation of the efforts started with Project LDRD-express #223796, whose final report of activities is SAND2021-11481, see [83]. In this section we briefly explain what pre-existing developments motivated the current body of work and provide an overview of the activities developed with the funds provided. The overarching goal of the current project LDRD-CIS #226834 and the previous project LDRD-express #223796 is the development of numerical methods with mathematically guaranteed properties in order to solve the Euler-Maxwell system of plasma physics and generalizations thereof. Even though Project #223796 laid out general foundations of space and time discretization of the Euler-Maxwell system, overall, it was focused on the development of numerical schemes for purely electrostatic fluid-plasma models. In particular, the project developed a family of schemes with mathematically guaranteed robustness in order to solve the Euler-Poisson model. This model is an asymptotic limit in which only the electrostatic response of the plasma is considered. Its primary feature is the presence of a non-local force, the electrostatic force, which introduces effects with infinite-speed propagation into the problem. Even though instantaneous propagation of perturbations may be considered nonphysical, there are plenty of physical regimes of technical interest where such an approximation is perfectly valid.
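
For reference, a normalized electrostatic Euler-Poisson system of the kind described (one common normalization; the notation and the sign conventions, which depend on the species charge, are assumptions here):

```latex
\partial_t \rho + \nabla\cdot(\rho u) = 0, \qquad
\partial_t(\rho u) + \nabla\cdot(\rho u\otimes u) + \nabla p = \rho\,\nabla\phi,
\qquad -\Delta\phi = \rho_{\mathrm{ion}} - \rho .
```

The non-locality noted above enters through the Poisson equation: a local change in the density rho instantaneously changes the potential phi everywhere.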


QSCOUT Progress Report, June 2022 [Quantum Scientific Computing Open User Testbed]

Clark, Susan M.; Norris, Haley R.; Landahl, Andrew J.; Yale, Christopher G.; Lobser, Daniel L.; Van Der Wall, Jay W.; Revelle, Melissa R.

Quantum information processing has reached an inflection point, transitioning from proof-of-principle scientific experiments to small, noisy quantum processors. To accelerate this process and eventually move to fault-tolerant quantum computing, it is necessary to provide the scientific community with access to whitebox testbed systems. The Quantum Scientific Computing Open User Testbed (QSCOUT) provides scientists unique access to an innovative system to help advance quantum computing science.


Theory of the metastable injection-bleached E3c center in GaAs

Physical Review B

Schultz, Peter A.; Hjalmarson, Harold P.

The E3 transition in irradiated GaAs observed in deep level transient spectroscopy (DLTS) was recently discovered in Laplace-DLTS to encompass three distinct components. The component designated E3c was found to be metastable, reversibly bleached under minority carrier (hole) injection, with an introduction rate dependent upon Si doping density. It is shown through first-principles modeling that the E3c must be the intimate Si-vacancy pair, best described as a Si sitting in a divacancy, SiVV. The bleached metastable state is enabled by a double site-shifting mechanism: upon recharging, the defect undergoes a second site shift rather than returning to its original E3c-active configuration by reversing the first site shift. Identification of this defect offers insights into the short-time annealing kinetics in irradiated GaAs.


A Taxonomy of Small Markovian Errors

PRX Quantum

Blume-Kohout, Robin J.; da Silva, Marcus P.; Nielsen, Erik N.; Proctor, Timothy J.; Rudinger, Kenneth M.; Sarovar, Mohan S.; Young, Kevin C.

Errors in quantum logic gates are usually modeled by quantum process matrices (CPTP maps). But process matrices can be opaque and unwieldy. We show how to transform the process matrix of a gate into an error generator that represents the same information more usefully. We construct a basis of simple and physically intuitive elementary error generators, classify them, and show how to represent the error generator of any gate as a mixture of elementary error generators with various rates. Finally, we show how to build a large variety of reduced models for gate errors by combining elementary error generators and/or entire subsectors of generator space. We conclude with a few examples of reduced models, including one with just 9N² parameters that describes almost all commonly predicted errors on an N-qubit processor.
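
A small illustration of the basic transformation (a single-qubit toy in the Pauli transfer representation; writing the noisy gate as G = e^L G0 follows the paper, while the example itself is mine):

```python
import numpy as np
from scipy.linalg import logm, expm

X = np.array([[0, 1], [1, 0]], dtype=complex)
paulis = [np.eye(2, dtype=complex), X,
          np.array([[0, -1j], [1j, 0]]), np.diag([1.0 + 0j, -1.0])]

def pauli_transfer(U):
    """Pauli transfer matrix of the unitary channel rho -> U rho U^dag."""
    return np.array([[np.trace(P @ U @ Q @ U.conj().T).real / 2
                      for Q in paulis] for P in paulis])

G0 = pauli_transfer(expm(-1j * (np.pi / 2) * X / 2))          # ideal X(pi/2)
G = pauli_transfer(expm(-1j * (np.pi / 2 + 0.01) * X / 2))    # over-rotated
L = logm(G @ np.linalg.inv(G0))          # error generator: G = expm(L) @ G0
print(np.round(L.real, 4))               # small generator for the 0.01 rad error
```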


A Stochastic Reduced-Order Model for Statistical Microstructure Descriptors Evolution

Journal of Computing and Information Science in Engineering

Tran, Anh; Sun, Jing S.; Liu, Dehao L.; Wang, Yan W.; Wildey, Timothy M.

Integrated computational materials engineering (ICME) models have been a crucial building block for modern materials development, relieving heavy reliance on experiments and significantly accelerating the materials design process. However, ICME models are also computationally expensive, particularly with respect to time integration for dynamics, which hinders the ability to study statistical ensembles and thermodynamic properties of large systems for long time scales. To alleviate the computational bottleneck, we propose to model the evolution of statistical microstructure descriptors as a continuous-time stochastic process using a non-linear Langevin equation, where the probability density function (PDF) of the statistical microstructure descriptors, which are also the quantities of interest (QoIs), is modeled by the Fokker–Planck equation. In this work, we discuss how to calibrate the drift and diffusion terms of the Fokker–Planck equation from the theoretical and computational perspectives. The calibrated Fokker–Planck equation can be used as a stochastic reduced-order model to simulate the evolution of the PDF of the statistical microstructure descriptors. Considering statistical microstructure descriptors in the microstructure evolution as QoIs, we demonstrate our proposed methodology in three ICME models: kinetic Monte Carlo, phase field, and molecular dynamics simulations.
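
A scalar sketch of the calibrate-then-simulate loop (synthetic data and Kramers–Moyal-style binned estimators as an illustrative stand-in for the paper's calibration procedures): estimate the drift a(x) and diffusion b(x) from binned increments of training trajectories, then run the reduced-order Langevin model with Euler–Maruyama.

```python
import numpy as np

rng = np.random.default_rng(0)
dt, n_steps, n_traj = 1e-3, 20000, 20
a_true = lambda x: -(x - 1.0)              # ground truth used to make data
b_true = 0.3

X = np.zeros((n_traj, n_steps))            # synthetic "training" trajectories
for k in range(1, n_steps):
    X[:, k] = (X[:, k - 1] + a_true(X[:, k - 1]) * dt
               + b_true * np.sqrt(dt) * rng.normal(size=n_traj))

# Kramers-Moyal estimates: a(x) ~ E[dx|x]/dt, b(x) ~ std[dx|x]/sqrt(dt)
x0, dx = X[:, :-1].ravel(), np.diff(X, axis=1).ravel()
bins = np.linspace(X.min(), X.max(), 30)
idx = np.digitize(x0, bins)
centers, a_hat, b_hat = [], [], []
for i in range(1, len(bins)):
    sel = dx[idx == i]
    if sel.size > 50:                      # skip sparsely populated bins
        centers.append(0.5 * (bins[i - 1] + bins[i]))
        a_hat.append(sel.mean() / dt)
        b_hat.append(sel.std() / np.sqrt(dt))

# Reduced-order model: Euler-Maruyama with the calibrated terms
x = 0.0
for _ in range(5000):
    x += np.interp(x, centers, a_hat) * dt \
         + np.interp(x, centers, b_hat) * np.sqrt(dt) * rng.normal()
print(x, "(should wander near the fixed point at 1.0)")
```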


Strain-tuning of transport gaps and semiconductor-to-conductor phase transition in twinned graphene

Acta Materialia

Mendez Granado, Juan P.

We show, through the use of the Landauer–Büttiker (LB) formalism and a tight-binding (TB) model, that the transport gap of twinned graphene can be tuned through the application of a uniaxial strain in the direction normal to the twin band. Remarkably, we find that the transport gap Egap bears a square-root dependence on the control parameter ϵx − ϵc, where ϵx is the applied uniaxial strain and ϵc ~ 19% is a critical strain. We interpret this dependence as evidence of criticality underlying a continuous phase transition, with ϵx − ϵc playing the role of control parameter and the transport gap Egap playing the role of order parameter. For ϵx < ϵc, the transport gap is non-zero and the material is a semiconductor, whereas for ϵx > ϵc the transport gap closes to zero and the material becomes a conductor, which evinces a semiconductor-to-conductor phase transition. The computed critical exponent of 1/2 places the transition in the mean-field universality class, which enables far-reaching analogies with other systems in the same class.
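
In display form, the reported scaling of the order parameter is (the amplitude A and the piecewise statement are my paraphrase of the abstract):

```latex
E_{\mathrm{gap}}(\epsilon_x) =
\begin{cases}
A\,(\epsilon_c-\epsilon_x)^{1/2}, & \epsilon_x<\epsilon_c,\\
0, & \epsilon_x\ge\epsilon_c,
\end{cases}
\qquad \epsilon_c \approx 0.19 .
```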


Entangling-gate error from coherently displaced motional modes of trapped ions

Physical Review A

Ruzic, Brandon R.; Barrick, Todd A.; Hunker, Jeffrey D.; Law, Ryan L.; McFarland, Brian M.; McGuinness, Hayden J.; Parazzoli, L.P.; Sterk, Jonathan D.; Van Der Wall, Jay W.; Stick, Daniel L.

Entangling gates in trapped-ion quantum computers are most often applied to stationary ions with initial motional distributions that are thermal and close to the ground state, while those demonstrations that involve transport generally use sympathetic cooling to reinitialize the motional state prior to applying a gate. Future systems with more ions, however, will face greater nonthermal excitation due to increased amounts of ion transport and exacerbated by longer operational times and variations over the trap array. In addition, pregate sympathetic cooling may be limited due to time costs and laser access constraints. In this paper, we analyze the impact of such coherent motional excitation on entangling-gate error by performing simulations of Mølmer–Sørensen (MS) gates on a pair of trapped-ion qubits with both thermal and coherent excitation present in a shared motional mode at the start of the gate. Here, we quantify how a small amount of coherent displacement erodes gate performance in the presence of experimental noise, and we demonstrate that adjusting the relative phase between the initial coherent displacement and the displacement induced by the gate or using Walsh modulation can suppress this error. We then use experimental data from transported ions to analyze the impact of coherent displacement on MS-gate error under realistic conditions.


Surrogate modeling for efficiently, accurately and conservatively estimating measures of risk

Reliability Engineering and System Safety

Jakeman, John D.; Kouri, Drew P.; Huerta, Jose G.

We present a surrogate modeling framework for conservatively estimating measures of risk from limited realizations of an expensive physical experiment or computational simulation. Risk measures combine objective probabilities with the subjective values of a decision maker to quantify anticipated outcomes. Given a set of samples, we construct a surrogate model that produces estimates of risk measures that are always greater than their empirical approximations obtained from the training data. These surrogate models limit over-confidence in reliability and safety assessments and produce estimates of risk measures that converge much faster to the true value than purely sample-based estimates. We first detail the construction of conservative surrogate models that can be tailored to a stakeholder's risk preferences and then present an approach, based on stochastic orders, for constructing surrogate models that are conservative with respect to families of risk measures. Our surrogate models include biases that permit them to conservatively estimate the target risk measures. We provide theoretical results that show that these biases decay at the same rate as the L² error in the surrogate model. Numerical demonstrations confirm that risk-adapted surrogate models do indeed overestimate the target risk measures while converging at the expected rate.

More Details

CrossSim Inference Manual v2.0

Xiao, Tianyao X.; Bennett, Christopher H.; Feinberg, Benjamin F.; Marinella, Matthew J.; Agarwal, Sapan A.

Neural networks are largely based on matrix computations. During forward inference, the most heavily used compute kernel is the matrix-vector multiplication (MVM): $W\vec{x}$. Inference is a first frontier for the deployment of next-generation hardware for neural network applications, as it is more readily deployed in edge devices, such as mobile devices or embedded processors with size, weight, and power constraints. Inference is also easier to implement in analog systems than training, which has more stringent device requirements. The main processing kernel used during inference is the MVM.
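
A minimal numerical picture of this kernel, an MVM executed with noisy analog weights, can be sketched as follows; the noise magnitude and matrix sizes are arbitrary and do not reflect CrossSim's device models:

```python
import numpy as np

rng = np.random.default_rng(1)

# Weight matrix W and input vector x for the inference kernel y = W @ x.
W = rng.standard_normal((128, 256)) * 0.1
x = rng.standard_normal(256)

# Crude analog-array model: each programmed conductance carries an
# independent Gaussian programming error (sigma chosen arbitrarily).
sigma = 0.01
W_analog = W + sigma * rng.standard_normal(W.shape)

y_ideal  = W @ x
y_analog = W_analog @ x

rel_err = np.linalg.norm(y_analog - y_ideal) / np.linalg.norm(y_ideal)
print(f"relative MVM error from programming noise: {rel_err:.3%}")
```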

More Details

A primal–dual algorithm for risk minimization

Mathematical Programming

Kouri, Drew P.; Surowiec, Thomas M.

In this paper, we develop an algorithm to efficiently solve risk-averse optimization problems posed in reflexive Banach space. Such problems often arise in many practical applications as, e.g., optimization problems constrained by partial differential equations with uncertain inputs. Unfortunately, for many popular risk models including the coherent risk measures, the resulting risk-averse objective function is nonsmooth. This lack of differentiability complicates the numerical approximation of the objective function as well as the numerical solution of the optimization problem. To address these challenges, we propose a primal–dual algorithm for solving large-scale nonsmooth risk-averse optimization problems. This algorithm is motivated by the classical method of multipliers and by epigraphical regularization of risk measures. As a result, the algorithm solves a sequence of smooth optimization problems using derivative-based methods. We prove convergence of the algorithm even when the subproblems are solved inexactly and conclude with numerical examples demonstrating the efficiency of our method.
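
One ingredient the abstract highlights, regularization turning a nonsmooth risk measure into a smooth subproblem, can be illustrated on CVaR: introducing the epigraph variable t and smoothing the plus-function with a softplus yields a differentiable objective that standard derivative-based solvers handle. The sketch below is a generic illustration of that smoothing step, not the paper's primal–dual algorithm; the toy uncertain objective and parameter values are invented:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
xi = rng.standard_normal(2000)      # samples of the uncertain input
beta, epsreg = 0.9, 1e-2            # CVaR level and smoothing parameter

def loss(x, xi):
    return (x - 1.0) ** 2 + x * xi  # toy uncertain objective f(x, xi)

def softplus(u):
    # Smooth approximation of max(u, 0); stand-in for epigraphical smoothing.
    return epsreg * np.logaddexp(0.0, u / epsreg)

def smoothed_cvar(z):
    x, t = z
    # Rockafellar-Uryasev form: CVaR_beta = min_t t + E[(f - t)_+] / (1 - beta),
    # with the nonsmooth plus-function replaced by its softplus smoothing.
    return t + np.mean(softplus(loss(x, xi) - t)) / (1.0 - beta)

res = minimize(smoothed_cvar, x0=[0.0, 0.0], method="BFGS")
x_opt, t_opt = res.x
print(f"risk-averse minimizer x = {x_opt:.3f}, epigraph variable t = {t_opt:.3f}")
```

The smoothed subproblem is solvable with a quasi-Newton method; the paper drives a sequence of such smooth problems toward the original nonsmooth risk-averse solution.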

More Details

Mixed precision s-step Lanczos and conjugate gradient algorithms

Numerical Linear Algebra with Applications

Carson, Erin; Gergelits, Tomáš; Yamazaki, Ichitaro Y.

Compared to the classical Lanczos algorithm, the s-step Lanczos variant has the potential to improve performance by asymptotically decreasing the synchronization cost per iteration. However, this comes at a price; despite being mathematically equivalent, the s-step variant may behave quite differently in finite precision, potentially exhibiting greater loss of accuracy and slower convergence relative to the classical algorithm. It has previously been shown that the errors in the s-step version follow the same structure as the errors in the classical algorithm, but are amplified by a factor depending on the square of the condition number of the (s + 1)-dimensional Krylov bases computed in each outer loop. As the condition number of these s-step bases grows (in some cases very quickly) with s, this limits the s values that can be chosen and thus can limit the attainable performance. In this work, we show that if a select few computations in s-step Lanczos are performed in double the working precision, the error terms then depend only linearly on the conditioning of the s-step bases. This has the potential for drastically improving the numerical behavior of the algorithm with little impact on per-iteration performance. Our numerical experiments demonstrate the improved numerical behavior possible with the mixed precision approach, and also show that this improved behavior extends to mixed precision s-step CG. We present preliminary performance results on NVIDIA V100 GPUs that show that the overhead of extra precision is minimal if one uses precisions implemented in hardware.
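
The conditioning issue driving this work is easy to reproduce: the monomial s-step basis [v, Av, ..., A^s v] becomes ill-conditioned very quickly as s grows. A small generic demonstration (not the paper's experiments; the test matrix is invented):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
# Symmetric test matrix with a modest spread of eigenvalues.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag(np.linspace(1.0, 100.0, n)) @ Q.T
v = rng.standard_normal(n)

for s in (2, 4, 8, 12):
    # Monomial Krylov basis [v, Av, ..., A^s v], as computed per outer loop of
    # s-step methods (better-conditioned polynomial bases also exist).
    V = np.empty((n, s + 1))
    V[:, 0] = v / np.linalg.norm(v)
    for j in range(1, s + 1):
        w = A @ V[:, j - 1]
        V[:, j] = w / np.linalg.norm(w)   # normalize columns; conditioning still grows
    print(f"s = {s:2d}: cond(basis) = {np.linalg.cond(V):.2e}")
```

The rapid growth of cond(basis) with s is exactly the amplification factor the mixed-precision approach reduces from quadratic to linear dependence.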

More Details

Low-order preconditioning of the Stokes equations

Numerical Linear Algebra with Applications

Voronin, Alexey; He, Yunhui; MacLachlan, Scott; Olson, Luke N.; Tuminaro, Raymond S.

A well-known strategy for building effective preconditioners for higher-order discretizations of some PDEs, such as Poisson's equation, is to leverage effective preconditioners for their low-order analogs. In this work, we show that high-quality preconditioners can also be derived for the Taylor–Hood discretization of the Stokes equations in much the same manner. In particular, we investigate the use of geometric multigrid based on the Q1isoQ2/Q1 discretization of the Stokes operator as a preconditioner for the Q2/Q1 discretization of the Stokes system. We utilize local Fourier analysis to optimize the damping parameters for Vanka and Braess–Sarazin relaxation schemes and to achieve robust convergence. These results are then verified and compared against the measured multigrid performance. While geometric multigrid can be applied directly to the Q2/Q1 system, our ultimate motivation is to apply algebraic multigrid within solvers for Q2/Q1 systems via the Q1isoQ2/Q1 discretization, which will be considered in a companion paper.
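
The general strategy the abstract opens with, preconditioning a high-order discretization with its low-order analog, can be shown on the 1-D Poisson problem it cites (not on Stokes): below, a fourth-order finite-difference operator is preconditioned with an exact solve of the standard second-order one. Stencils, sizes, and the boundary treatment are chosen purely for illustration:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cg, splu, LinearOperator

# 1-D Poisson sketch of low-order preconditioning.
n = 400
h = 1.0 / (n + 1)
# Fourth-order 5-point stencil for -u'' (Dirichlet BCs; the stencil is simply
# truncated at the boundary for this illustration).
A_hi = sp.diags([1/12, -4/3, 5/2, -4/3, 1/12], [-2, -1, 0, 1, 2],
                shape=(n, n), format="csc") / h**2
# Low-order analog: the standard second-order 3-point stencil.
A_lo = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc") / h**2

b = np.ones(n)
iters = {"plain": 0, "low-order precond": 0}

def make_callback(key):
    def cb(_xk):
        iters[key] += 1
    return cb

u0, _ = cg(A_hi, b, callback=make_callback("plain"))
M = LinearOperator((n, n), matvec=splu(A_lo).solve)  # exact low-order solve
u1, _ = cg(A_hi, b, M=M, callback=make_callback("low-order precond"))

for key, count in iters.items():
    print(f"CG iterations, {key}: {count}")
```

Because the low-order operator is spectrally equivalent to the high-order one, preconditioned CG converges in a nearly mesh-independent number of iterations; the paper carries the same idea over to the saddle-point structure of Stokes.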

More Details

A Hybrid Method for Tensor Decompositions that Leverages Stochastic and Deterministic Optimization

Myers, Jeremy M.; Dunlavy, Daniel D.

In this paper, we propose a hybrid method that uses stochastic and deterministic search to compute the maximum likelihood estimator of a low-rank count tensor with Poisson loss via state-of-the-art local methods. Our approach is inspired by Simulated Annealing for global optimization and allows for fine-grain parameter tuning as well as adaptive updates to algorithm parameters. We present numerical results that indicate our hybrid approach can compute better approximations to the maximum likelihood estimator with less computation than the state-of-the-art methods by themselves.
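
The hybrid pattern described here, a Simulated-Annealing-style outer loop wrapped around a deterministic local solver, can be sketched on a generic multimodal function rather than on Poisson tensor decomposition; the objective, cooling schedule, and step scales are placeholders:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)

def f(x):
    return (x * x - 2.0) ** 2 + 2.0 * np.sin(5.0 * x)  # multimodal toy loss

def polish(x0):
    # Deterministic local solver: quasi-Newton refinement of a proposal.
    return minimize(lambda z: f(z[0]), [x0], method="BFGS").x[0]

# Annealed Metropolis outer loop around the local solver.
x_cur = polish(rng.uniform(-3.0, 3.0))
x_best, temp = x_cur, 1.0
for _ in range(50):
    candidate = polish(x_cur + temp * rng.standard_normal())  # jump, then refine
    df = f(candidate) - f(x_cur)
    if df < 0 or rng.random() < np.exp(-df / temp):  # Metropolis acceptance
        x_cur = candidate
        if f(x_cur) < f(x_best):
            x_best = x_cur
    temp *= 0.9  # cooling schedule (placeholder rate)

print(f"best minimum found: x = {x_best:.4f}, f(x) = {f(x_best):.4f}")
```

Early on, the high temperature lets the search hop between basins; as it cools, the deterministic refinements dominate, mirroring the stochastic-then-local division of labor in the proposed method.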

More Details

In Their Shoes: Persona-Based Approaches to Software Quality Practice Incentivization

Computing in Science and Engineering

Mundt, Miranda R.; Milewicz, Reed M.; Raybourn, Elaine M.

Many teams struggle to adapt and right-size software engineering best practices for quality assurance to fit their context. Introducing software quality is not usually framed in a way that motivates teams to take action, so it becomes a "check the box for compliance" activity instead of a cultural practice that values software quality and the effort to achieve it. When and how can we provide effective incentives for software teams to adopt and integrate meaningful and enduring software quality practices? Here, we explored this question through a persona-based ideation exercise at the 2021 Collegeville Workshop on Scientific Software in which we created three unique personas that represent different scientific software developer perspectives.

More Details

The Ground Truth Program: Simulations as Test Beds for Social Science Research Methods

Computational and Mathematical Organization Theory

Naugle, Asmeret B.; Russell, Adam R.; Lakkaraju, Kiran L.; Swiler, Laura P.; Verzi, Stephen J.; Romero, Vicente J.

Social systems are uniquely complex and difficult to study, but understanding them is vital to solving the world’s problems. The Ground Truth program developed a new way of testing the research methods that attempt to understand and leverage the Human Domain and its associated complexities. The program developed simulations of social systems as virtual world test beds. Not only were these simulations able to produce data on future states of the system under various circumstances and scenarios, but their causal ground truth was also explicitly known. Research teams studied these virtual worlds, facilitating deep validation of causal inference, prediction, and prescription methods. The Ground Truth program model provides a way to test and validate research methods to an extent previously impossible, and to study the intricacies and interactions of different components of research.

More Details

An Accurate, Error-Tolerant, and Energy-Efficient Neural Network Inference Engine Based on SONOS Analog Memory

IEEE Transactions on Circuits and Systems I: Regular Papers

Xiao, T.P.; Feinberg, Benjamin F.; Bennett, Christopher H.; Agrawal, Vineet; Saxena, Prashant; Prabhakar, Venkatraman; Ramkumar, Krishnaswamy; Medu, Harsha; Raghavan, Vijay; Chettuvetty, Ramesh; Agarwal, Sapan A.; Marinella, Matthew J.

We demonstrate SONOS (silicon-oxide-nitride-oxide-silicon) analog memory arrays that are optimized for neural network inference. The devices are fabricated in a 40 nm process and operated in the subthreshold regime for in-memory matrix multiplication. Subthreshold operation enables low conductances to be implemented with low error, which matches the typical weight distribution of neural networks, heavily skewed toward near-zero values. This leads to high accuracy in the presence of programming errors and process variations. We simulate the end-to-end neural network inference accuracy, accounting for the measured programming error, read noise, and retention loss in a fabricated SONOS array. Evaluated on the ImageNet dataset using ResNet50, the accuracy using a SONOS system is within 2.16% of floating-point accuracy without any retraining. The unique error properties and high on/off ratio of the SONOS device allow scaling to large arrays without bit slicing, and enable an inference architecture that achieves 20 TOPS/W on ResNet50, a >10× gain in energy efficiency over state-of-the-art digital and analog inference accelerators.
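
The error-tolerance argument, near-zero weights dominating neural-network weight distributions and mapping onto low conductances that are programmed accurately, can be caricatured numerically. Below, weights are mapped to differential conductance pairs and perturbed with programming noise proportional to conductance; the distributions and noise levels are invented, not measured SONOS characteristics:

```python
import numpy as np

rng = np.random.default_rng(5)

# Neural-network-like weights: heavily concentrated near zero.
w = rng.laplace(scale=0.05, size=(64, 64))
g_max = 1.0

# Differential-pair mapping: w = g_plus - g_minus, each in [0, g_max].
g_plus = np.clip(w, 0.0, g_max)
g_minus = np.clip(-w, 0.0, g_max)

def program(g, rel_sigma):
    # Programming error proportional to the target conductance, so near-zero
    # conductances (hence near-zero weights) are written accurately.
    return g + rel_sigma * g * rng.standard_normal(g.shape)

w_prog = program(g_plus, 0.05) - program(g_minus, 0.05)

x = rng.standard_normal(64)
err = np.linalg.norm(w_prog @ x - w @ x) / np.linalg.norm(w @ x)
print(f"relative MVM error with conductance-proportional noise: {err:.3%}")
```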

More Details

Sensitivity Analysis for Solutions to Heterogeneous Nonlocal Systems: Theoretical and Numerical Studies

Journal of Peridynamics and Nonlocal Modeling

Buczkowski, Nicole E.; Foss, Mikil D.; Parks, Michael L.; Radu, Petronela R.

The paper presents a collection of results on continuous dependence for solutions to nonlocal problems under perturbations of data and system parameters. The integral operators appearing in the systems capture interactions via heterogeneous kernels that exhibit different types of weak singularities, space dependence, and even regions of zero interaction. Here, the stability results showcase explicit bounds involving the measure of the domain, the interaction collar size, the nonlocal Poincaré constant, and other parameters. In the nonlinear setting, the bounds quantify in different Lp norms the sensitivity of solutions under different nonlinearity profiles. The results are validated by numerical simulations showcasing discontinuous solutions, varying horizons of interactions, and symmetric and heterogeneous kernels.
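
A discrete analog of the continuous-dependence results can be checked numerically: perturb a heterogeneous nonlocal kernel and watch the solution move proportionally. The 1-D setup below (kernel, horizon, collar, and forcing) is entirely illustrative:

```python
import numpy as np

# Discrete 1-D nonlocal diffusion with a heterogeneous kernel: perturb the
# kernel and measure how much the solution moves (continuous dependence).
n, horizon = 200, 0.1
x = np.linspace(0.0, 1.0, n)
dx = x[1] - x[0]

def nonlocal_matrix(kernel):
    # (L u)_i = sum_j K(x_i, x_j) * (u_j - u_i) * dx, truncated at the horizon.
    X, Y = np.meshgrid(x, x, indexing="ij")
    K = np.where(np.abs(X - Y) <= horizon, kernel(X, Y), 0.0)
    L = K * dx
    np.fill_diagonal(L, L.diagonal() - K.sum(axis=1) * dx)
    return L

def solve(kernel):
    L = nonlocal_matrix(kernel)
    f = np.sin(np.pi * x)
    # Homogeneous collar condition: pin u within one horizon of each end.
    collar = (x <= horizon) | (x >= 1.0 - horizon)
    inner = ~collar
    u = np.zeros(n)
    u[inner] = np.linalg.solve(-L[np.ix_(inner, inner)], f[inner])
    return u

base = lambda X, Y: 1.0 + 0.5 * X   # heterogeneous (space-dependent) kernel
u_base = solve(base)
for delta in (1e-3, 1e-2, 1e-1):
    pert = lambda X, Y, d=delta: base(X, Y) + d
    diff = np.linalg.norm(solve(pert) - u_base) * np.sqrt(dx)
    print(f"kernel perturbation {delta:.0e}: ||u_pert - u|| ~ {diff:.3e}")
```

The roughly linear growth of the solution difference with the kernel perturbation is the discrete counterpart of the explicit stability bounds the paper proves.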

More Details

Self-Induced Curvature in an Internally Loaded Peridynamic Fiber

Silling, Stewart A.

A straight fiber with nonlocal forces that are independent of bond strain is considered. These internal loads can either stabilize or destabilize the straight configuration. Transverse waves with long wavelength have unstable dispersion properties for certain combinations of nonlocal kernels and internal loads. When these unstable waves occur, deformation of the straight fiber into a circular arc can lower its potential energy in equilibrium. The equilibrium value of the radius of curvature is computed explicitly.
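
The instability criterion can be visualized with a toy dispersion relation: a nonlocal bond stiffness contributes a term like the integral of lambda(xi) * (1 - cos(k xi)), while a compressive internal load contributes a term like -p * k^2; when the load term wins at long wavelength, omega^2 < 0 and the straight configuration is unstable. The kernel and load values below are invented for illustration and are not the paper's model:

```python
import numpy as np

# Toy transverse dispersion relation for a straight fiber with unit density:
#   omega^2(k) = integral lambda(xi) * (1 - cos(k * xi)) dxi  -  p * k^2
def trap(y, x):
    # Simple trapezoidal rule (keeps the sketch NumPy-version-agnostic).
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

xi = np.linspace(-1.0, 1.0, 2001)
lam = np.exp(-(xi / 0.3) ** 2)            # illustrative nonlocal kernel
stiffness = trap(lam * xi**2, xi) / 2.0   # small-k coefficient of the integral term

k = np.linspace(1e-3, 10.0, 500)
for p in (0.5 * stiffness, 2.0 * stiffness):   # sub- and super-critical loads
    omega2 = np.array([trap(lam * (1.0 - np.cos(kk * xi)), xi)
                       for kk in k]) - p * k**2
    tag = "unstable at long wavelength" if omega2[0] < 0 else "stable"
    print(f"p/stiffness = {p / stiffness:.1f}: "
          f"omega^2(k->0) = {omega2[0]:+.2e} ({tag})")
```

In the unstable regime, buckling into a curved equilibrium lowers the energy, which is the mechanism behind the explicitly computed radius of curvature.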

More Details

Quantitative Performance Assessment of Proxy Apps and Parents (Report for ECP Proxy App Project Milestone ADCD-504-28)

Cook, Jeanine C.; Aaziz, Omar R.; Chen, Si C.; Godoy, William F.; Powell, Amy J.; Watson, Gregory W.; Vaughan, Courtenay T.; Wildani, Avani W.

The ECP Proxy Application Project has an annual milestone to assess the state of ECP proxy applications and their role in the overall ECP ecosystem. Our FY22 March/April milestone (ADCD-504-28) proposed to assess the fidelity of proxy applications compared to their respective parents in terms of kernel and I/O behavior, and predictability. Similarity techniques would be applied for quantitative comparison of proxy/parent kernel behavior; MACSio evaluation would continue, with support for OpenPMD backends explored; and the execution-time predictability of proxy apps with respect to their parents would be examined through a carefully designed scaling study and code comparisons. Note that in this FY we also have quantitative assessment milestones that are due in September and are, therefore, not included in the description above or in this report; another report on those deliverables will be generated and submitted upon their completion. To satisfy this milestone, the following specific tasks were completed:

- Study the ability of MACSio to represent I/O workloads of adaptive mesh codes.
- Re-define the performance counter groups for contemporary Intel and IBM platforms to better match specific hardware components and to better align across platforms, making cross-platform comparison more accurate.
- Perform a cosine similarity study based on the new performance counter groups on the Intel and IBM P9 platforms (the core computation is sketched below).
- Perform detailed analysis of performance counter data to accurately average and align the data, maintaining phases across all executions, and develop methods to reduce the set of collected performance counters used in cosine similarity analysis.
- Apply a quantitative similarity comparison between proxy and parent CPU kernels.
- Perform scaling studies to understand how accurately parent performance can be predicted from its respective proxy application.

This report presents highlights of these efforts.
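
The proxy/parent kernel comparison named in these tasks reduces, at its core, to a cosine similarity between performance-counter vectors. A minimal sketch of that comparison (the counter names and values here are fabricated placeholders, not ECP data):

```python
import numpy as np

# Hypothetical performance-counter group totals for a proxy app and its parent.
counters = ["flops", "l2_misses", "branch_misses", "dram_reads", "dram_writes"]
proxy  = np.array([8.1e11, 2.3e9, 4.0e8, 9.5e9, 3.1e9])
parent = np.array([7.6e11, 2.7e9, 3.6e8, 1.1e10, 3.4e9])

def cosine_similarity(a, b):
    # 1.0 means identical counter "direction"; in practice counters are
    # normalized (e.g., per instruction) before this comparison.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(f"proxy/parent cosine similarity: {cosine_similarity(proxy, parent):.4f}")
```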

More Details

Kokkos 3: Programming Model Extensions for the Exascale Era

IEEE Transactions on Parallel and Distributed Systems

Trott, Christian R.; Lebrun-Grandie, Damien; Arndt, Daniel; Ciesko, Jan; Dang, Vinh Q.; Ellingwood, Nathan D.; Gayatri, Rahulkumar; Harvey, Evan C.; Hollman, Daisy S.; Ibanez, Dan; Liber, Nevin; Madsen, Jonathan; Miles, Jeff; Poliakoff, David Z.; Powell, Amy J.; Rajamanickam, Sivasankaran R.; Simberg, Mikael; Sunderland, Dan; Turcksin, Bruno; Wilke, Jeremiah

As the push towards exascale hardware has increased the diversity of system architectures, performance portability has become a critical aspect for scientific software. We describe the Kokkos Performance Portable Programming Model that allows developers to write single source applications for diverse high-performance computing architectures. Kokkos provides key abstractions for both the compute and memory hierarchy of modern hardware. We describe the novel abstractions that have been added to Kokkos version 3 such as hierarchical parallelism, containers, task graphs, and arbitrary-sized atomic operations to prepare for exascale era architectures. We demonstrate the performance of these new features with reproducible benchmarks on CPUs and GPUs.

More Details