Publications


Enabling power measurement and control on Astra: The first petascale Arm supercomputer

Concurrency and Computation: Practice and Experience

Grant, Ryan E.; Hammond, Simon D.; Laros, James H.; Levenhagen, Michael J.; Olivier, Stephen L.; Pedretti, Kevin P.; Ward, Harry L.; Younge, Andrew J.

Astra, deployed in 2018, was the first petascale supercomputer to utilize processors based on the ARM instruction set. The system was also the first under Sandia's Vanguard program, which seeks to provide an evaluation vehicle for novel technologies that, with refinement, could be utilized in demanding, large-scale HPC environments. In addition to ARM, several other important first-of-a-kind developments were used in the machine, including new approaches to cooling the datacenter and machine. This article documents our experiences building a power measurement and control infrastructure for Astra. While this is often beyond the control of users today, the accurate measurement, cataloging, and evaluation of power, as our experiences show, is critical to the successful deployment of a large-scale platform. While such systems exist in part for other architectures, Astra required new development to support the novel Marvell ThunderX2 processor used in its compute nodes. In addition to documenting the measurement of power during system bring-up and for subsequent ongoing routine use, we present results associated with controlling the power usage of the processor, an area of progressively greater interest as data centers and supercomputing sites look to improve compute/energy efficiency and find additional sources for full-system optimization.


What can simulation test beds teach us about social science? Results of the ground truth program

Computational and Mathematical Organization Theory

Naugle, Asmeret B.; Krofcheck, Daniel J.; Warrender, Christina E.; Lakkaraju, Kiran L.; Swiler, Laura P.; Verzi, Stephen J.; Emery, Ben; Murdock, Jaimie; Bernard, Michael L.; Romero, Vicente J.

The ground truth program used simulations as test beds for social science research methods. The simulations had known ground truth and were capable of producing large amounts of data. This allowed research teams to run experiments and ask questions of these simulations similar to social scientists studying real-world systems, and enabled robust evaluation of their causal inference, prediction, and prescription capabilities. We tested three hypotheses about research effectiveness using data from the ground truth program, specifically looking at the influence of complexity, causal understanding, and data collection on performance. We found some evidence that system complexity and causal understanding influenced research performance, but no evidence that data availability contributed. The ground truth program may be the first robust coupling of simulation test beds with an experimental framework capable of teasing out factors that determine the success of social science research.


A Novel Partitioned Approach for Reduced Order Model—Finite Element Model (ROM-FEM) and ROM-ROM Coupling

Earth and Space 2022

de Castro, Amy G.; Kuberry, Paul A.; Kalashnikova, Irina; Bochev, Pavel B.

Partitioned methods allow one to build a simulation capability for coupled problems by reusing existing single-component codes. In so doing, partitioned methods can shorten code development and validation times for multiphysics and multiscale applications. In this work, we consider a scenario in which one or more of the “codes” being coupled are projection-based reduced order models (ROMs), introduced to lower the computational cost associated with a particular component. We simulate this scenario by considering a model interface problem that is discretized independently on two non-overlapping subdomains. We then formulate a partitioned scheme for this problem that allows the coupling of a ROM “code” for one of the subdomains with a finite element model (FEM) or ROM “code” for the other subdomain. The ROM “codes” are constructed by performing proper orthogonal decomposition (POD) on a snapshot ensemble to obtain a low-dimensional reduced order basis, followed by a Galerkin projection onto this basis. The ROM and/or FEM “codes” on each subdomain are then coupled using a Lagrange multiplier representing the interface flux. To partition the resulting monolithic problem, we first eliminate the flux through a dual Schur complement. Application of an explicit time integration scheme to the transformed monolithic problem decouples the subdomain equations, allowing their independent solution for the next time step. We show numerical results that demonstrate the proposed method’s efficacy in achieving both ROM-FEM and ROM-ROM coupling.
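The POD-Galerkin construction described in the abstract can be illustrated with a minimal, self-contained NumPy sketch. The toy 1D diffusion problem and parameterized load below are our own choices for illustration; the paper's interface formulation and Lagrange multiplier coupling are omitted.

```python
import numpy as np

# Full-order "code": a 1D diffusion stiffness matrix (stand-in for a FEM model).
n = 50
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b0 = np.ones(n)
b1 = np.linspace(0.0, 1.0, n)

# Snapshot ensemble: full-order solves for a few load parameters mu.
snapshots = np.column_stack(
    [np.linalg.solve(A, b0 + mu * b1) for mu in (0.0, 0.5, 1.0, 1.5)]
)

# POD: leading left singular vectors of the snapshot matrix form the reduced basis.
Phi = np.linalg.svd(snapshots, full_matrices=False)[0][:, :2]

# Galerkin projection: reduced operator and load for a new parameter mu = 0.75.
mu = 0.75
A_r = Phi.T @ A @ Phi
b_r = Phi.T @ (b0 + mu * b1)
u_rom = Phi @ np.linalg.solve(A_r, b_r)   # lift the reduced solution back to full space
u_fem = np.linalg.solve(A, b0 + mu * b1)
rel_err = np.linalg.norm(u_rom - u_fem) / np.linalg.norm(u_fem)
print(rel_err)
```

Because the solution manifold of this toy problem is exactly two-dimensional, a rank-2 POD basis reproduces the full-order solution to round-off; for realistic problems the basis size trades accuracy against cost.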


The DPG Method for the Convection-Reaction Problem, Revisited

Computational Methods in Applied Mathematics

Demkowicz, Leszek F.; Roberts, Nathan V.; Muñoz-Matute, Judit

We study both conforming and non-conforming versions of the practical DPG method for the convection-reaction problem. We determine that the most common approach for DPG stability analysis (construction of a local Fortin operator) is infeasible for the convection-reaction problem. We then develop a line of argument based on a direct proof of discrete stability; we find that employing a polynomial enrichment for the test space does not suffice for this purpose, motivating the introduction of a (two-element) subgrid mesh. The argument combines mathematical analysis with numerical experiments.


Feedback density and causal complexity of simulation model structure

Journal of Simulation

Naugle, Asmeret B.; Verzi, Stephen J.; Lakkaraju, Kiran L.; Swiler, Laura P.; Warrender, Christina E.; Bernard, Michael L.; Romero, Vicente J.

Measures of simulation model complexity generally focus on outputs; we propose measuring the complexity of a model’s causal structure to gain insight into its fundamental character. This article introduces tools for measuring causal complexity. First, we introduce a method for developing a model’s causal structure diagram, which characterises the causal interactions present in the code. Causal structure diagrams facilitate comparison of simulation models, including those from different paradigms. Next, we develop metrics for evaluating a model’s causal complexity using its causal structure diagram. We discuss cyclomatic complexity as a measure of the intricacy of causal structure and introduce two new metrics that incorporate the concept of feedback, a fundamental component of causal structure. The first new metric introduced here is feedback density, a measure of the cycle-based interconnectedness of causal structure. The second metric combines cyclomatic complexity and feedback density into a comprehensive causal complexity measure. Finally, we demonstrate these complexity metrics on simulation models from multiple paradigms and discuss potential uses and interpretations. These tools enable direct comparison of models across paradigms and provide a mechanism for measuring and discussing complexity based on a model’s fundamental assumptions and design.
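The two ingredients named above, cyclomatic complexity and a cycle-based feedback measure, can be sketched on a toy causal graph. The definitions below are illustrative reconstructions (cyclomatic complexity as E - N + 2P, feedback density as the fraction of edges lying on some cycle), not necessarily the article's exact formulas.

```python
def reachable(graph, start, target):
    """Iterative DFS: can `target` be reached from `start`?"""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == target:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph.get(node, ()))
    return False

def cyclomatic_complexity(nodes, edges, components=1):
    """Classic McCabe measure: E - N + 2P."""
    return len(edges) - len(nodes) + 2 * components

def feedback_density(graph, edges):
    """Fraction of edges that participate in at least one feedback loop:
    edge (u, v) lies on a cycle iff u is reachable from v."""
    on_cycle = [e for e in edges if reachable(graph, e[1], e[0])]
    return len(on_cycle) / len(edges)

# Toy causal structure: A -> B -> C -> A is a feedback loop; C -> D is not.
edges = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]
graph = {}
for u, v in edges:
    graph.setdefault(u, []).append(v)
nodes = {"A", "B", "C", "D"}
print(cyclomatic_complexity(nodes, edges))   # 4 - 4 + 2 = 2
print(feedback_density(graph, edges))        # 3 of 4 edges lie on the loop -> 0.75
```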


Combining Spike Time Dependent Plasticity (STDP) and Backpropagation (BP) for Robust and Data Efficient Spiking Neural Networks (SNN)

Wang, Felix W.; Teeter, Corinne M.

National security applications require artificial neural networks (ANNs) that consume less power, are fast and dynamic online learners, are fault tolerant, and can learn from unlabeled and imbalanced data. We explore whether two fundamentally different, traditional learning algorithms from artificial intelligence and the biological brain can be merged. We tackle this problem from two directions. First, we start from a theoretical point of view and show that the spike time dependent plasticity (STDP) learning curve observed in biological networks can be derived using the mathematical framework of backpropagation through time. Second, we show that transmission delays, as observed in biological networks, improve the ability of spiking networks to perform classification when trained using a backpropagation of error (BP) method. These results provide evidence that STDP could be compatible with a BP learning rule. Combining these learning algorithms will likely lead to networks more capable of meeting our national security missions.


Combining DPG in space with DPG time-marching scheme for the transient advection–reaction equation

Computer Methods in Applied Mechanics and Engineering

Muñoz-Matute, Judit; Demkowicz, Leszek; Roberts, Nathan V.

In this article, we present a general methodology to combine the Discontinuous Petrov–Galerkin (DPG) method in space and time in the context of methods of lines for transient advection–reaction problems. We first introduce a semidiscretization in space with a DPG method, redefining the ideas of optimal testing and practicality of the method in this context. Then, we apply the recently developed DPG-based time-marching scheme, which is of exponential type, to the resulting system of Ordinary Differential Equations (ODEs). We also discuss how to efficiently compute the action of the exponential of the matrix coming from the space semidiscretization without assembling the full matrix. Finally, we verify the proposed method for 1D+time advection–reaction problems, showing optimal convergence rates for smooth solutions and more stable results for linear conservation laws compared to classical exponential integrators.
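The computational kernel mentioned above, applying the exponential of the semidiscretization matrix to a vector without assembling the matrix, can be sketched with a truncated Taylor series. This is an illustrative matrix-free approach under our own assumptions, not the authors' scheme, which would use a more robust algorithm in practice.

```python
import numpy as np

def expmv(apply_A, v, h=1.0, tol=1e-12, max_terms=100):
    """Approximate exp(h*A) @ v using only matrix-vector products with A.

    apply_A: callable returning A @ x, so A never has to be assembled.
    """
    result = v.copy()
    term = v.copy()
    for k in range(1, max_terms):
        term = (h / k) * apply_A(term)     # term = (h*A)^k v / k!
        result = result + term
        if np.linalg.norm(term) < tol * np.linalg.norm(result):
            break
    return result

# Matrix-free operator for a small skew-symmetric A (pure rotation dynamics).
A = np.array([[0.0, -1.0], [1.0, 0.0]])
v = np.array([1.0, 0.0])
u = expmv(lambda x: A @ x, v, h=np.pi / 2)
# exp(A * pi/2) rotates by 90 degrees, so (1, 0) maps to (0, 1).
print(u)
```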


Nonlocal kernel network (NKN): A stable and resolution-independent deep neural network

Journal of Computational Physics

You, Huaiqian; Yu, Yue; D'Elia, Marta D.; Gao, Tian; Silling, Stewart A.

Neural operators [1–5] have recently become popular tools for designing solution maps between function spaces in the form of neural networks. Unlike classical scientific machine learning approaches, which learn parameters of a known partial differential equation (PDE) for a single instance of the input parameters at a fixed resolution, neural operators approximate the solution map of a family of PDEs [6,7]. Despite their success, the use of neural operators has so far been restricted to relatively shallow neural networks and confined to learning hidden governing laws. In this work, we propose a novel nonlocal neural operator, which we refer to as the nonlocal kernel network (NKN), that is resolution independent, characterized by deep neural networks, and capable of handling a variety of tasks such as learning governing equations and classifying images. Our NKN stems from the interpretation of the neural network as a discrete nonlocal diffusion reaction equation that, in the limit of infinite layers, is equivalent to a parabolic nonlocal equation, whose stability is analyzed via nonlocal vector calculus. The resemblance with integral forms of neural operators allows NKNs to capture long-range dependencies in the feature space, while the continuous treatment of node-to-node interactions makes NKNs resolution independent. The resemblance with neural ODEs, reinterpreted in a nonlocal sense, and the stable network dynamics between layers allow for generalization of NKN's optimal parameters from shallow to deep networks. This fact enables the use of shallow-to-deep initialization techniques [8]. Our tests show that NKNs outperform baseline methods in both learning governing equations and image classification tasks and generalize well to different resolutions and depths.


Development of Single Photon Sources in GaN

Mounce, Andrew M.; Wang, George W.; Schultz, Peter A.; Titze, Michael T.; Campbell, DeAnna M.; Lu, Ping L.; Henshaw, Jacob D.

The recent discovery of bright, room-temperature single photon emitters (SPEs) in GaN offers an appealing alternative to diamond's best single photon emitters, given the widespread use and technological maturity of III-nitrides for optoelectronics (e.g., blue LEDs, lasers) and high-speed, high-power electronics. This discovery opens the door to on-chip and on-demand single photon sources integrated with detectors and electronics. Currently, little is known about the underlying defect structure, nor is there a sense of how such an emitter might be controllably created. A detailed understanding of the origin of the SPEs in GaN and a path to deterministically introduce them is required. In this project, we develop new experimental capabilities to investigate single photon emission from GaN nanowires and from both GaN and AlN wafers. We ion implant our wafers using our focused ion beam nanoimplantation capabilities at Sandia to go beyond typical broad-beam implantation and create single photon emitting defects with nanometer precision. We have created light emitting sources using Li+ and He+, but single photon emission has yet to be demonstrated. In parallel, we calculate the energy levels of defects and transition metal substitutions in GaN to gain a better understanding of the sources of single photon emission in GaN and AlN. The combined experimental and theoretical capabilities developed throughout this project will enable further investigation into the origins of single photon emission from defects in GaN, AlN, and other wide bandgap semiconductors.


A fractional model for anomalous diffusion with increased variability: Analysis, algorithms and applications to interface problems

Numerical Methods for Partial Differential Equations

D'Elia, Marta D.; Glusa, Christian A.

Fractional equations have become the model of choice in several applications where heterogeneities at the microstructure result in anomalous diffusive behavior at the macroscale. In this work we introduce a new fractional operator characterized by a doubly-variable fractional order and possibly truncated interactions. Under certain conditions on the model parameters and on the regularity of the fractional order we show that the corresponding Poisson problem is well-posed. We also introduce a finite element discretization and describe an efficient implementation of the finite-element matrix assembly in the case of piecewise constant fractional order. Through several numerical tests, we illustrate the improved descriptive power of this new operator across media interfaces. Furthermore, we present one-dimensional and two-dimensional h-convergence results that show that the variable-order model has the same convergence behavior as the constant-order model.


Conflicting Information and Compliance With COVID-19 Behavioral Recommendations

Naugle, Asmeret B.; Rothganger, Fredrick R.; Verzi, Stephen J.; Doyle, Casey L.

The prevalence of COVID-19 is shaped by behavioral responses to recommendations and warnings. Available information on the disease determines the population’s perception of danger and thus its behavior; this information changes dynamically, and different sources may report conflicting information. We study the feedback between disease, information, and stay-at-home behavior using a hybrid agent-based-system dynamics model that incorporates evolving trust in sources of information. We use this model to investigate how divergent reporting and conflicting information can alter the trajectory of a public health crisis. The model shows that divergent reporting not only alters disease prevalence over time, but also increases polarization of the population’s behaviors and trust in different sources of information.


Monotonic Gaussian Process for Physics-Constrained Machine Learning With Materials Science Applications

Journal of Computing and Information Science in Engineering

Tran, Anh; Maupin, Kathryn A.; Rodgers, Theron R.

Physics-constrained machine learning is emerging as an important topic in the field of machine learning for physics. One of the most significant advantages of incorporating physics constraints into machine learning methods is that the resulting model requires significantly less data to train. By incorporating physical rules into the machine learning formulation itself, the predictions are expected to be physically plausible. The Gaussian process (GP) is perhaps one of the most common methods in machine learning for small datasets. In this paper, we investigate the possibility of constraining a GP formulation with monotonicity on three different materials datasets: one experimental and two computational. The monotonic GP is compared against the regular GP, where a significant reduction in the posterior variance is observed. The monotonic GP is strictly monotonic in the interpolation regime, but in the extrapolation regime, the monotonic effect starts fading away as one goes beyond the training dataset. Imposing monotonicity on the GP comes at a small accuracy cost, compared to the regular GP. The monotonic GP is perhaps most useful in applications where data are scarce and noisy, and monotonicity is supported by strong physical evidence.
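For background, a regular GP posterior (the baseline the monotonic GP is compared against) can be sketched in a few lines of NumPy. The toy monotone dataset is our own, and the monotonicity constraint itself, typically imposed via virtual derivative observations, is omitted; the sketch does reproduce the abstract's extrapolation observation that posterior variance grows beyond the training data.

```python
import numpy as np

def rbf_kernel(A, B, length=1.0, var=1.0):
    """Squared-exponential covariance between two sets of 1D inputs."""
    d = A[:, None] - B[None, :]
    return var * np.exp(-0.5 * (d / length) ** 2)

def gp_posterior(X, y, Xs, noise=1e-4):
    """Mean and pointwise variance of a GP regression posterior at Xs."""
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    Ks = rbf_kernel(Xs, X)
    Kss = rbf_kernel(Xs, Xs)
    mean = Ks @ np.linalg.solve(K, y)
    cov = Kss - Ks @ np.linalg.solve(K, Ks.T)
    return mean, np.diag(cov)

# Monotone training data (a hypothetical stress-strain-like curve).
X = np.linspace(0.0, 2.0, 8)
y = np.tanh(X)
Xs = np.linspace(0.0, 3.0, 31)
mean, var = gp_posterior(X, y, Xs)
# Variance is small inside the data range and grows in extrapolation.
print(var[0], var[-1])
```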


Processing Particle Data Flows with SmartNICs

Liu, Jianshen L.; Maltzahn, Carlos M.; Curry, Matthew L.; Ulmer, Craig D.

Many distributed applications implement complex data flows and need a flexible mechanism for routing data between producers and consumers. Recent advances in programmable network interface cards, or SmartNICs, represent an opportunity to offload data-flow tasks into the network fabric, thereby freeing the hosts to perform other work. System architects in this space face multiple questions about the best way to leverage SmartNICs as processing elements in data flows. In this paper, we advocate the use of Apache Arrow as a foundation for implementing data-flow tasks on SmartNICs. We report on our experiences adapting a partitioning algorithm for particle data to Apache Arrow and measure the on-card processing performance for the BlueField-2 SmartNIC. Our experiments confirm that the BlueField-2’s (de)compression hardware can have a significant impact on in-transit workflows where data must be unpacked, processed, and repacked.
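The data-flow task studied above, partitioning particle data between producers and consumers, can be illustrated with a plain-Python spatial binning sketch. The record layout is hypothetical, and the paper's Apache Arrow and BlueField-2 specifics are omitted.

```python
def partition_particles(particles, n_bins, domain=(0.0, 1.0)):
    """Bucket particle records by x-coordinate into n_bins spatial partitions,
    the kind of routing step that could be offloaded to a SmartNIC."""
    lo, hi = domain
    width = (hi - lo) / n_bins
    bins = [[] for _ in range(n_bins)]
    for p in particles:
        # Clamp to the last bin so x == hi does not index out of range.
        idx = min(int((p["x"] - lo) / width), n_bins - 1)
        bins[idx].append(p)
    return bins

# Hypothetical particle records with a scalar position in [0, 1).
particles = [{"id": i, "x": (i * 0.37) % 1.0} for i in range(10)]
bins = partition_particles(particles, n_bins=4)
print([len(b) for b in bins])
```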


Entropy and its Relationship with Statistics

Lehoucq, Richard B.; Mayer, Carolyn D.; Tucker, James D.

The purpose of our report is to discuss the notion of entropy and its relationship with statistics. Our goal is to provide a manner in which you can think about entropy, its central role within information theory, and its relationship with statistics. We review various relationships between information theory and statistics; nearly all are well known but unfortunately are often not recognized. Entropy quantifies the "average amount of surprise" in a random variable and lies at the heart of information theory, which studies the transmission, processing, extraction, and utilization of information. For us, data is information. What is the distinction between information theory and statistics? Information theorists work with probability distributions, whereas statisticians work with samples. In so many words, information theory using samples is the practice of statistics. Acknowledgements. We thank Danny Dunlavy, Carlos Llosa, Oscar Lopez, Arvind Prasadan, Gary Saavedra, and Jeremy Wendt for helpful discussions along the way. Our report was supported by the Laboratory Directed Research and Development program at Sandia National Laboratories, a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC., a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-NA0003525.
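The report's central quantity, entropy as "average amount of surprise," and one standard bridge to statistics can be sketched numerically. The definitions are the textbook ones; the particular distributions are illustrative.

```python
import math

def entropy(p):
    """Shannon entropy H(p) = -sum p_i log2 p_i, in bits: average surprise."""
    return -sum(q * math.log2(q) for q in p if q > 0)

def cross_entropy(p, q):
    """Average surprise when events follow p but we model them with q."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

# A fair coin carries 1 bit of surprise per toss; a biased coin carries less.
print(entropy([0.5, 0.5]))
print(entropy([0.9, 0.1]))

# KL divergence = cross entropy - entropy >= 0: the penalty for modeling
# data from p with the wrong distribution q, which links entropy to
# likelihood-based statistics.
p, q = [0.5, 0.5], [0.9, 0.1]
kl = cross_entropy(p, q) - entropy(p)
print(kl)
```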


Viability of S3 Object Storage for the ASC Program at Sandia

Kordenbrock, Todd H.; Templet, Gary J.; Ulmer, Craig D.; Widener, Patrick M.

Recent efforts at Sandia such as DataSEA are creating search engines that enable analysts to query the institution’s massive archive of simulation and experiment data. The benefit of this work is that analysts will be able to retrieve all historical information about a system component that the institution has amassed over the years and make better-informed decisions in current work. As DataSEA gains momentum, it faces multiple technical challenges relating to capacity storage. From a raw capacity perspective, data producers will rapidly overwhelm the system with massive amounts of data. From an accessibility perspective, analysts will expect to be able to retrieve any portion of the bulk data, from any system on the enterprise network. Sandia’s Institutional Computing is mitigating storage problems at the enterprise level by procuring new capacity storage systems that can be accessed from anywhere on the enterprise network. These systems use the Simple Storage Service (S3) API for data transfers. While S3 uses objects instead of files, users can access it from their desktops or Sandia’s high-performance computing (HPC) platforms. S3 is particularly well suited for bulk storage in DataSEA, as datasets can be decomposed into objects that can be referenced and retrieved individually, as needed by an analyst. In this report we describe our experiences working with S3 storage and provide information about how developers can leverage Sandia’s current systems. We present performance results from two sets of experiments. First, we measure S3 throughput when exchanging data between four different HPC platforms and two different enterprise S3 storage systems on the Sandia Restricted Network (SRN). Second, we measure the performance of S3 when communicating with a custom-built Ceph storage system that was constructed from HPC components.
Overall, while S3 storage is significantly slower than traditional HPC storage, it provides significant accessibility benefits that will be valuable for archiving and exploiting historical data. There are multiple opportunities that arise from this work, including enhancing DataSEA to leverage S3 for bulk storage and adding native S3 support to Sandia’s IOSS library.


Comparison of exponential integrators and traditional time integration schemes for the shallow water equations

Applied Numerical Mathematics

Brachet, Matthieu; Debreu, Laurent; Eldred, Christopher

The time integration scheme is probably one of the most fundamental choices in the development of an ocean model. In this paper, we investigate several time integration schemes applied to the shallow water equations. This set of equations is accurate enough for the modeling of a shallow ocean and is also relevant to study because it is the one solved for the barotropic (i.e., vertically averaged) component of a three-dimensional ocean model. We analyze different time stepping algorithms for the linearized shallow water equations. High-order explicit schemes are accurate, but the time step is constrained by the Courant-Friedrichs-Lewy stability condition. Implicit schemes can be unconditionally stable but, in practice, lack accuracy when used with large time steps. In this paper we propose a detailed comparison of such classical schemes with exponential integrators. The accuracy and the computational costs are analyzed in different configurations.
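The comparison above can be illustrated on a toy linear oscillatory system, a stand-in of our own choosing rather than the paper's ocean-model operators: an explicit scheme accumulates stability-driven error, while a single exponential-integrator step is exact for a linear problem up to round-off.

```python
import numpy as np

# Toy linear system u_t = A u with skew-symmetric A: fast oscillation,
# loosely mimicking the wave dynamics of the linearized shallow water equations.
A = np.array([[0.0, -5.0], [5.0, 0.0]])
u0 = np.array([1.0, 0.0])
T = 1.0

def forward_euler(A, u0, n_steps):
    """Classical explicit scheme: stable only for small enough steps."""
    dt = T / n_steps
    u = u0.copy()
    for _ in range(n_steps):
        u = u + dt * (A @ u)
    return u

def exponential_step(A, u0):
    """One exponential-integrator step u(T) = exp(T A) u0, via
    eigendecomposition of the small dense A."""
    w, V = np.linalg.eig(A)
    return (V @ np.diag(np.exp(w * T)) @ np.linalg.inv(V) @ u0).real

# Exact solution of this rotation system at time T.
exact = np.array([np.cos(5.0), np.sin(5.0)])
err_euler = np.linalg.norm(forward_euler(A, u0, 1000) - exact)
err_expo = np.linalg.norm(exponential_step(A, u0) - exact)
print(err_euler, err_expo)
```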


Embedded pairs for optimal explicit strong stability preserving Runge–Kutta methods

Journal of Computational and Applied Mathematics

Fekete, Imre; Conde, Sidafa; Shadid, John N.

We construct a family of embedded pairs for optimal explicit strong stability preserving Runge–Kutta methods of order 2≤p≤4 to be used to obtain numerical solutions of spatially discretized hyperbolic PDEs. In this construction, the goals include the non-defective property, a large stability region, and small error values as defined in Dekker and Verwer (1984) and Kennedy et al. (2000). The new family of embedded pairs offers the ability for strong stability preserving (SSP) methods to adapt by varying the step size. Through several numerical experiments, we assess the overall effectiveness in terms of work versus precision while also taking into consideration accuracy and stability.
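The idea of an embedded pair, a second solution of lower order computed from the same stages to estimate the local error and adapt the step size, can be sketched with the classical two-stage, second-order SSP method. This is an illustrative pair of our own, not the optimized embeddings constructed in the paper.

```python
import math

def ssprk22_embedded(f, u, t, dt):
    """One step of the 2-stage, 2nd-order SSP RK method with an embedded
    1st-order (forward Euler) solution reused for error estimation."""
    u1 = u + dt * f(t, u)                         # Euler stage = embedded method
    u2 = 0.5 * u + 0.5 * (u1 + dt * f(t + dt, u1))
    return u2, abs(u2 - u1)                       # high-order solution, error estimate

def integrate(f, u0, t0, t_end, tol=1e-6):
    """Adaptive integration: accept steps whose error estimate meets tol."""
    t, u, dt = t0, u0, 1e-2
    while t < t_end:
        dt = min(dt, t_end - t)
        u_new, err = ssprk22_embedded(f, u, t, dt)
        if err <= tol:                            # accept the step
            t, u = t + dt, u_new
        # Standard controller for an order-2/1 embedded pair.
        dt *= min(2.0, max(0.2, 0.9 * (tol / max(err, 1e-16)) ** 0.5))
    return u

# u' = -u, u(0) = 1, so u(1) = exp(-1).
u = integrate(lambda t, u: -u, 1.0, 0.0, 1.0)
print(abs(u - math.exp(-1)))
```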


Understanding Phase and Interfacial Effects of Spall Fracture in Additively Manufactured Ti-5Al-5V-5Mo-3Cr

Branch, Brittany A.; Ruggles, Timothy R.; Miers, John C.; Massey, Caroline E.; Moore, David G.; Brown, Nathan B.; Duwal, Sakun D.; Silling, Stewart A.; Mitchell, John A.; Specht, Paul E.

Additively manufactured Ti-5Al-5V-5Mo-3Cr (Ti-5553) is being considered as an AM repair material for engineering applications because of its superior strength properties compared to other titanium alloys. Here, we describe the failure mechanisms observed through computed tomography, electron backscatter diffraction (EBSD), and scanning electron microscopy (SEM) of spall damage as a result of tensile failure in as-built and annealed Ti-5553. We also investigate the phase stability in native powder, as-built, and annealed Ti-5553 through diamond anvil cell (DAC) and ramp compression experiments. We then explore the effect of tensile loading on a sample containing an interface between a Ti-6Al-4V (Ti-64) baseplate and an additively manufactured Ti-5553 layer. Post-mortem materials characterization showed that spallation occurred in regions of initial porosity and that the interface provides a nucleation site for spall damage below the spall strength of Ti-5553. Preliminary peridynamics modeling of the dynamic experiments is described. Finally, we discuss further development of Stochastic Parallel PARticle Kinetic Simulator (SPPARKS) Monte Carlo (MC) capabilities to include the integration of alpha (α)-phase and microstructural simulations for this multiphase titanium alloy.


Super-Resolution Approaches in Three-Dimensions for Classification and Screening of Commercial-Off-The-Shelf Components

Polonsky, Andrew P.; Martinez, Carianne M.; Appleby, Catherine A.; Bernard, Sylvain R.; Griego, J.J.M.; Noell, Philip N.; Pathare, Priya R.

X-ray computed tomography is generally a primary step in the characterization of defective electronic components, but is too slow to screen large lots of components. Super-resolution imaging approaches, in which higher-resolution data is inferred from lower-resolution images, have the potential to substantially reduce collection times for data volumes accessible via x-ray computed tomography. Here we seek to advance existing two-dimensional super-resolution approaches directly to three-dimensional computed tomography data. Multiple scan resolutions spanning half an order of magnitude were collected for four classes of commercial electronic components to serve as training data for a deep-learning super-resolution network. A modular Python framework for three-dimensional super-resolution of computed tomography data has been developed and trained over multiple classes of electronic components. Initial training and testing demonstrate the promise of these approaches, which have the potential for more than an order of magnitude reduction in collection time for electronic component screening.


Demonstrate multi-turbine simulation with hybrid-structured / unstructured-moving-grid software stack running primarily on GPUs and propose improvements for successful KPP-2

Bidadi, Shreyas B.; Brazell, Michael B.; Brunhart-Lupo, Nicholas B.; Henry de Frahan, Marc T.; Lee, Dong H.; Hu, Jonathan J.; Melvin, Jeremy M.; Mullowney, Paul M.; Vijayakumar, Ganesh V.; Moser, Robert D.; Rood, Jon R.; Sakievich, Philip S.; Sharma, Ashesh S.; Williams, Alan B.; Sprague, Michael A.

The goal of the ExaWind project is to enable predictive simulations of wind farms composed of many megawatt-scale turbines situated in complex terrain. Predictive simulations will require computational fluid dynamics (CFD) simulations in which the mesh resolves the geometry of the turbines, captures the thin boundary layers, and captures the rotation and large deflections of the blades. Whereas such simulations for a single turbine are arguably petascale class, multi-turbine wind farm simulations will require exascale-class resources.


ATHENA: Analytical Tool for Heterogeneous Neuromorphic Architectures

Cardwell, Suma G.; Plagge, Mark P.; Hughes, Clayton H.; Rothganger, Fredrick R.; Agarwal, Sapan A.; Feinberg, Benjamin F.; Awad, Amro A.; McFarland, John M.; Parker, Luke G.

The ASC program seeks to use machine learning to improve efficiencies in its stockpile stewardship mission. Moreover, there is a growing market for technologies dedicated to accelerating AI workloads. Many of these emerging architectures promise to provide savings in energy efficiency, area, and latency when compared to traditional CPUs for these types of applications — neuromorphic analog and digital technologies provide both low-power and configurable acceleration of challenging artificial intelligence (AI) algorithms. If designed into a heterogeneous system with other accelerators and conventional compute nodes, these technologies have the potential to augment the capabilities of traditional High Performance Computing (HPC) platforms [5]. This expanded computation space requires not only a new approach to physics simulation, but the ability to evaluate and analyze next-generation architectures specialized for AI/ML workloads in both traditional HPC and embedded ND applications. Developing this capability will enable ASC to understand how this hardware performs in both HPC and ND environments, improve our ability to port our applications, guide the development of computing hardware, and inform vendor interactions, leading them toward solutions that address ASC’s unique requirements.


Electron dynamics in extended systems within real-time time-dependent density-functional theory

MRS communications

Kononov, Alina K.; Lee, Cheng-Wei L.; Pereira dos Santos, Tatiane P.; Robinson, Brian R.; Yao, Yifan Y.; Yao, Yi Y.; Andrade, Xavier A.; Baczewski, Andrew D.; Constantinescu, Emil C.; Correa, Alfredo C.; Kanai, Yosuke K.; Modine, N.A.; Schleife, Andre S.

Due to a beneficial balance of computational cost and accuracy, real-time time-dependent density-functional theory has emerged as a promising first-principles framework to describe real-time electron dynamics. Here we discuss recent implementations of this approach, in particular in the context of complex, extended systems. Results include an analysis of the computational cost associated with numerical propagation and with the use of absorbing boundary conditions. We extensively explore the shortcomings in describing electron-electron scattering in real time and compare to many-body perturbation theory. Modern improvements of the description of exchange and correlation are reviewed. In this work, we specifically focus on the Qb@ll code, which we have mainly used for these types of simulations over the last several years, and we conclude by pointing to further progress needed going forward.


Microstructure-Sensitive Uncertainty Quantification for Crystal Plasticity Finite Element Constitutive Models Using Stochastic Collocation Methods

Frontiers in Materials

Tran, Anh; Wildey, Tim; Lim, Hojun L.

Uncertainty quantification (UQ) plays a major role in verification and validation for computational engineering models and simulations, and establishes trust in the predictive capability of computational models. In the materials science and engineering context, where the process-structure-property-performance linkage is well known to be the only roadmap from manufacturing to engineering performance, numerous integrated computational materials engineering (ICME) models have been developed across a wide spectrum of length- and time-scales to relieve the burden of resource-intensive experiments. Within the structure-property linkage, crystal plasticity finite element method (CPFEM) models have been widely used, since they are one of the few ICME toolboxes that allow numerical predictions, providing the bridge from microstructure to material properties and performance. Several constitutive models have been proposed in the last few decades to capture the mechanics and plasticity behavior of materials. While some UQ studies have been performed, the robustness and uncertainty of these constitutive models have not been rigorously established. In this work, we apply a stochastic collocation (SC) method, which is mathematically rigorous and widely used in the field of UQ, to quantify the uncertainty of the three most commonly used constitutive models in CPFEM, namely phenomenological models (with and without twinning) and dislocation-density-based constitutive models, for three different crystal structures: face-centered cubic (fcc) copper (Cu), body-centered cubic (bcc) tungsten (W), and hexagonal close-packed (hcp) magnesium (Mg). Our numerical results not only quantify the uncertainty of these constitutive models in the stress-strain curve, but also analyze the global sensitivity of the underlying constitutive parameters with respect to the initial yield behavior, which may be helpful for robust constitutive model calibration in the future.
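A minimal sketch of the stochastic collocation idea, using a hypothetical one-parameter stand-in for a CPFEM response rather than the authors' actual constitutive models:

```python
import numpy as np

def model(tau0):
    # Hypothetical stand-in for a CPFEM response, e.g. initial yield stress
    # as a function of one uncertain constitutive parameter tau0 (MPa).
    return 2.0 * tau0 + 0.1 * tau0 ** 2

# Uncertain parameter: tau0 ~ Uniform(40, 60).
a, b = 40.0, 60.0
nodes, weights = np.polynomial.legendre.leggauss(5)  # 5 collocation nodes on [-1, 1]
tau = 0.5 * (b - a) * nodes + 0.5 * (b + a)          # map nodes to [a, b]
w = 0.5 * weights                                    # weights for the uniform density

# Non-intrusive UQ: the "code" is evaluated only at the collocation nodes.
y = model(tau)
mean = np.sum(w * y)
var = np.sum(w * (y - mean) ** 2)
```

For a real CPFEM study each `model(tau)` evaluation would be a full finite element solve; the appeal of SC is that a handful of well-chosen nodes yields spectrally accurate statistics for smooth responses.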

More Details

Unified Language Frontend for Physics-Informed AI/ML

Kelley, Brian M.; Rajamanickam, Sivasankaran R.

Artificial intelligence and machine learning (AI/ML) are becoming important tools for scientific modeling and simulation, as in several other fields such as image analysis and natural language processing. ML techniques can leverage the computing power available in modern systems and reduce the human effort needed to configure experiments, interpret and visualize results, draw conclusions from huge quantities of raw data, and build surrogates for physics-based models. Domain scientists in fields like fluid dynamics, microelectronics, and chemistry can automate many of their most difficult and repetitive tasks, or improve design times by using faster ML surrogates. However, modern ML and traditional scientific high-performance computing (HPC) tend to use completely different software ecosystems. While ML frameworks like PyTorch and TensorFlow provide Python APIs, most HPC applications and libraries are written in C++. Direct interoperability between the two languages is possible but tedious and error-prone. In this work, we show that a compiler-based approach can bridge the gap between ML frameworks and scientific software with less developer effort and better efficiency. We use the MLIR (multi-level intermediate representation) ecosystem to compile a pre-trained convolutional neural network (CNN) in PyTorch to freestanding C++ source code in the Kokkos programming model. Kokkos is a programming model widely used in HPC to write portable, shared-memory parallel code that can natively target a variety of CPU and GPU architectures. Our compiler-generated source code can be directly integrated into any Kokkos-based application with no dependencies on Python or cross-language interfaces.
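A framework-free sketch of the kind of kernel such a compiler pipeline must emit: a direct convolution loop with a fused ReLU, which in the generated Kokkos C++ would become a `parallel_for` loop nest (shapes and values here are hypothetical, not taken from the paper):

```python
import numpy as np

def conv2d_relu(x, k):
    # Direct "valid" 2D convolution (cross-correlation, as in ML frameworks)
    # with a fused ReLU -- the loop nest a compiler would lower to parallel
    # kernels in the generated C++.
    H, W = x.shape
    kh, kw = k.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return np.maximum(out, 0.0)  # fused activation

# Tiny example: 3x3 input, 2x2 kernel of ones.
out = conv2d_relu(np.arange(9.0).reshape(3, 3), np.ones((2, 2)))
```

Compiling away the framework leaves exactly this kind of self-contained arithmetic, which is why the generated code needs no Python runtime.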

More Details

Lossless Quantum Hard-Drive Memory Using Parity-Time Symmetry

Chatterjee, Eric N.; Soh, Daniel B.; Young, Steve M.

We theoretically studied the feasibility of building a long-term read-write quantum memory using the principle of parity-time (PT) symmetry, which has already been demonstrated for classical systems. The design consisted of a two-resonator system. Although both resonators would feature intrinsic loss, the goal was to apply a driving signal to one of the resonators such that it would become an amplifying subsystem, with a gain rate equal and opposite to the loss rate of the lossy resonator. Consequently, the loss and gain probabilities in the overall system would cancel out, yielding a closed quantum system. Upon performing detailed calculations on the impact of a driving signal on a lossy resonator, our results demonstrated that an amplifying resonator is physically infeasible, thus forestalling the possibility of PT-symmetric quantum storage. Our finding serves to significantly narrow down future research into designing a viable quantum hard drive.
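For context, the standard two-mode PT-symmetric model (textbook form, not reproduced from the paper) makes the balanced gain/loss idea concrete:

```latex
H = \begin{pmatrix} \omega_0 - i\gamma & \kappa \\ \kappa & \omega_0 + i\gamma \end{pmatrix},
\qquad
\omega_\pm = \omega_0 \pm \sqrt{\kappa^2 - \gamma^2}.
```

For coupling $\kappa > \gamma$ the eigenfrequencies are real (the PT-unbroken phase) and loss and gain balance; the paper's conclusion is that the amplifying resonator required to realize the $+i\gamma$ term cannot be physically realized at the quantum level.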

More Details

Mathematical Foundations for Nonlocal Interface Problems: Multiscale Simulations of Heterogeneous Materials (Final LDRD Report)

D'Elia, Marta D.; Bochev, Pavel B.; Foster, John E.; Glusa, Christian A.; Gulian, Mamikon G.; Gunzburger, Max G.; Trageser, Jeremy T.; Kuhlman, Kristopher L.; Martinez, Mario A.; Najm, H.N.; Silling, Stewart A.; Tupek, Michael T.; Xu, Xiao X.

Nonlocal models provide a much-needed predictive capability for important Sandia mission applications, ranging from fracture mechanics for nuclear components to subsurface flow for nuclear waste disposal, where traditional partial differential equation (PDE) models fail to capture effects due to long-range forces at the microscale and mesoscale. However, utilization of this capability is seriously compromised by the lack of a rigorous nonlocal interface theory, required for both application and efficient solution of nonlocal models. To unlock the full potential of nonlocal modeling, we developed a mathematically rigorous and physically consistent interface theory and demonstrated its scope in mission-relevant exemplar problems.

More Details

Large-Scale Atomistic Simulations [Slides]

Moore, Stan G.

This report investigates the free expansion of aluminum. The take-home message: the physically realistic SNAP machine-learning potential captures liquid-vapor coexistence behavior for the free expansion of aluminum at a level not generally accessible to hydrocodes.

More Details

GDSA Framework Development and Process Model Integration FY2022

Mariner, Paul M.; Debusschere, Bert D.; Fukuyama, David E.; Harvey, Jacob H.; LaForce, Tara; Leone, Rosemary C.; Perry, Frank V.; Swiler, Laura P.; TACONI, ANNA M.

The Spent Fuel and Waste Science and Technology (SFWST) Campaign of the U.S. Department of Energy (DOE) Office of Nuclear Energy (NE), Office of Spent Fuel & Waste Disposition (SFWD) is conducting research and development (R&D) on geologic disposal of spent nuclear fuel (SNF) and high-level nuclear waste (HLW). A high priority for SFWST disposal R&D is disposal system modeling (Sassani et al. 2021). The SFWST Geologic Disposal Safety Assessment (GDSA) work package is charged with developing a disposal system modeling and analysis capability for evaluating generic disposal system performance for nuclear waste in geologic media. This report describes fiscal year (FY) 2022 advances of the Geologic Disposal Safety Assessment (GDSA) performance assessment (PA) development groups of the SFWST Campaign. The common mission of these groups is to develop a geologic disposal system modeling capability for nuclear waste that can be used to assess probabilistically the performance of generic disposal options and generic sites. The modeling capability under development is called GDSA Framework (pa.sandia.gov). GDSA Framework is a coordinated set of codes and databases designed for probabilistically simulating the release and transport of disposed radionuclides from a repository to the biosphere for post-closure performance assessment. Primary components of GDSA Framework include PFLOTRAN to simulate the major features, events, and processes (FEPs) over time, Dakota to propagate uncertainty and analyze sensitivities, meshing codes to define the domain, and various other software for rendering properties, processing data, and visualizing results.

More Details

Revealing conductivity of p-type delta layer systems for novel computing applications

Mamaluy, Denis M.; Mendez Granado, Juan P.

This project uses a quantum simulation technique to reveal the true conducting properties of novel atomic precision advanced manufacturing materials. With Moore's law approaching the limit of scaling for the CMOS technology, it is crucial to provide the best computing power and resources to National Security missions. Atomic precision advanced manufacturing-based computing systems can become the key to the design, use, and security of modern weapon systems, critical infrastructure, and communications. We will utilize the state-of-the-art computational methodology to create a predictive simulator for p-type atomic precision advanced manufacturing systems, which may also find applications in counterfeit detection and anti-tamper.

More Details

First-principles simulation of light-ion microscopy of graphene

2D Materials

Kononov, Alina K.; Olmstead, Alexandra L.; Baczewski, Andrew D.; Schleife, Andre S.

The extreme sensitivity of 2D materials to defects and nanostructure requires precise imaging techniques to verify the presence of desirable features and the absence of undesirable ones in the atomic geometry. Helium-ion beams have emerged as a promising materials imaging tool, achieving up to 20 times higher resolution and 10 times larger depth-of-field than conventional or environmental scanning electron microscopes. Here, we offer first-principles theoretical insights to advance ion-beam imaging of atomically thin materials by performing real-time time-dependent density functional theory simulations of single impacts of 10–200 keV light ions in free-standing graphene. We predict that detecting electrons emitted from the back of the material (the side from which the ion exits) would result in up to three times higher signal and up to five times higher contrast images, making 2D materials especially compelling targets for ion-beam microscopy. This predicted superiority of exit-side emission likely arises from anisotropic kinetic emission. The charge induced in the graphene equilibrates on a sub-fs time scale, leading to only slight disturbances in the carbon lattice that are unlikely to damage the atomic structure for any of the beam parameters investigated here.

More Details

Sensitivity analysis of generic deep geologic repository with focus on spatial heterogeneity induced by stochastic fracture network generation

Advances in Water Resources

Brooks, Dusty M.; Swiler, Laura P.; Stein, Emily S.; Mariner, Paul M.; Basurto, Eduardo B.; Portone, Teresa P.; Eckert, Aubrey C.; Leone, Rosemary C.

The Geologic Disposal Safety Assessment Framework, developed by the United States Department of Energy, is a state-of-the-art simulation software toolkit for probabilistic post-closure performance assessment of systems for deep geologic disposal of nuclear waste. This paper presents a generic reference case and shows how it is being used to develop and demonstrate performance assessment methods within the Geologic Disposal Safety Assessment Framework that mitigate some of the challenges posed by high uncertainty and limited computational resources. Variance-based global sensitivity analysis is applied to assess the effects of spatial heterogeneity using graph-based summary measures for scalar and time-varying quantities of interest. Behavior of the system with respect to spatial heterogeneity is further investigated using ratios of water fluxes. This analysis shows that spatial heterogeneity is a dominant uncertainty in predictions of repository performance, and that it can be identified in global sensitivity analysis using proxy variables derived from graph descriptions of discrete fracture networks. New quantities of interest defined using water fluxes proved useful for better understanding overall system behavior.
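A hedged sketch of variance-based (Sobol') first-order sensitivity indices via the Saltelli pick-freeze estimator, applied to a toy analytic model rather than the GDSA repository model:

```python
import numpy as np

def sobol_first_order(model, dim, n, rng):
    # Saltelli pick-freeze estimator: S_i = E[y_B (y_ABi - y_A)] / Var(Y),
    # where AB_i is sample matrix A with column i replaced from B.
    A = rng.random((n, dim))
    B = rng.random((n, dim))
    yA, yB = model(A), model(B)
    var = np.var(np.concatenate([yA, yB]))
    S = np.empty(dim)
    for i in range(dim):
        ABi = A.copy()
        ABi[:, i] = B[:, i]
        S[i] = np.mean(yB * (model(ABi) - yA)) / var
    return S

# Toy model Y = 2*X1 + X2 with X ~ U(0,1) iid: analytically S1 = 0.8, S2 = 0.2.
rng = np.random.default_rng(0)
S = sobol_first_order(lambda X: 2 * X[:, 0] + X[:, 1], dim=2, n=200_000, rng=rng)
```

In a repository assessment each model evaluation is a full flow-and-transport simulation, so the sampling cost of such estimators is exactly the computational-resource challenge the paper discusses.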

More Details

Composing preconditioners for multiphysics PDE systems with applications to Generalized MHD

Tuminaro, Raymond S.; Crockatt, Michael M.; Robinson, Allen C.

New patch smoothers or relaxation techniques are developed for solving linear matrix equations coming from systems of discretized partial differential equations (PDEs). One key linear solver challenge for many PDE systems arises when the resulting discretization matrix has a near null space that has a large dimension, which can occur in generalized magnetohydrodynamic (GMHD) systems. Patch-based relaxation is highly effective for problems when the null space can be spanned by a basis of locally supported vectors. The patch-based relaxation methods that we develop can be used either within an algebraic multigrid (AMG) hierarchy or as stand-alone preconditioners. These patch-based relaxation techniques are a form of well-known overlapping Schwarz methods where the computational domain is covered with a series of overlapping sub-domains (or patches). Patch relaxation then corresponds to solving a set of independent linear systems associated with each patch. In the context of GMHD, we also reformulate the underlying discrete representation used to generate a suitable set of matrix equations. In general, deriving a discretization that accurately approximates the curl operator and the Hall term while also producing linear systems with physically meaningful near null space properties can be challenging. Unfortunately, many natural discretization choices lead to a near null space that includes non-physical oscillatory modes and where it is not possible to span the near null space with a minimal set of locally supported basis vectors. Further discretization research is needed to understand the resulting trade-offs between accuracy, stability, and ease in solving the associated linear systems.
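A minimal sketch of patch-based multiplicative Schwarz relaxation on a 1D Poisson model problem (illustrative only; the paper's setting is GMHD systems, where patches are chosen so that locally supported basis vectors span the near null space):

```python
import numpy as np

def poisson_matrix(n):
    # 1D Poisson model problem: tridiagonal (-1, 2, -1).
    return (np.diag(2.0 * np.ones(n))
            - np.diag(np.ones(n - 1), 1)
            - np.diag(np.ones(n - 1), -1))

def schwarz_sweep(A, b, x, patch_size, step):
    # Multiplicative overlapping Schwarz: solve each patch system exactly,
    # refreshing the residual before every local solve.
    n = len(b)
    for start in range(0, n, step):
        idx = np.arange(start, min(start + patch_size, n))
        r = (b - A @ x)[idx]
        x[idx] = x[idx] + np.linalg.solve(A[np.ix_(idx, idx)], r)
    return x

n = 24
A, b = poisson_matrix(n), np.ones(n)
x = np.zeros(n)
for _ in range(5000):                     # used here as a stand-alone solver
    x = schwarz_sweep(A, b, x, patch_size=8, step=4)
    if np.linalg.norm(b - A @ x) < 1e-10:
        break
```

In practice a few such sweeps serve as the smoother inside an AMG hierarchy rather than being iterated to convergence as done above.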

More Details

Thermodynamically consistent versions of approximations used in modelling moist air

Quarterly Journal of the Royal Meteorological Society

Eldred, Christopher; Guba, Oksana G.; Taylor, Mark A.

Some existing approaches to modelling the thermodynamics of moist air make approximations that break thermodynamic consistency, such that the resulting thermodynamics does not obey the first and second laws or has other inconsistencies. Recently, an approach to avoid such inconsistency has been suggested: the use of thermodynamic potentials in terms of their natural variables, from which all thermodynamic quantities and relationships (equations of state) are derived. In this article, we develop this approach for unapproximated moist-air thermodynamics and two widely used approximations: the constant-κ approximation and the dry heat capacities approximation. The (consistent) constant-κ approximation is particularly attractive because it leads to, with the appropriate choice of thermodynamic variable, adiabatic dynamics that depend only on total mass and are independent of the breakdown between water forms. Additionally, a wide variety of material from different sources in the literature on thermodynamics in atmospheric modelling is brought together. It is hoped that this article provides a comprehensive reference for the use of thermodynamic potentials in atmospheric modelling, especially for the three systems considered here.
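As a generic illustration of the potential-based approach (standard thermodynamics, not the paper's specific formulation): once a Gibbs-type potential is given in its natural variables, every other quantity follows by differentiation, so the derived relationships cannot contradict the first and second laws,

```latex
g = g(T, p, q_v, q_l, q_i)
\;\Rightarrow\;
\eta = -\frac{\partial g}{\partial T},\qquad
\alpha = \frac{\partial g}{\partial p},\qquad
\mu_k = \frac{\partial g}{\partial q_k},
```

with specific entropy $\eta$, specific volume $\alpha$, chemical potentials $\mu_k$ for the water-species mass fractions $q_v, q_l, q_i$, and internal energy recovered as $u = g + T\eta - p\alpha$.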

More Details

Metrics for Intercomparison of Remapping Algorithms (MIRA) protocol applied to Earth system models

Geoscientific Model Development

Mahadevan, Vijay S.; Guerra, Jorge E.; Jiao, Xiangmin; Kuberry, Paul A.; Li, Yipeng; Ullrich, Paul; Marsico, David; Jacob, Robert; Bochev, Pavel B.; Jones, Philip

Strongly coupled nonlinear phenomena such as those described by Earth system models (ESMs) are composed of multiple component models with independent mesh topologies and scalable numerical solvers. A common operation in ESMs is to remap or interpolate component solution fields defined on their computational mesh to another mesh with a different combinatorial structure and decomposition, e.g., from the atmosphere to the ocean, during the temporal integration of the coupled system. Several remapping schemes are currently in use or available for ESMs. However, a unified approach to compare the properties of these different schemes has not been attempted previously. We present a rigorous methodology for the evaluation and intercomparison of remapping methods through an independently implemented suite of metrics that measure the ability of a method to adhere to constraints such as grid independence, monotonicity, global conservation, and local extrema or feature preservation. A comprehensive set of numerical evaluations is conducted based on a progression of scalar fields from idealized and smooth to more general climate data with strong discontinuities and strict bounds. We examine four remapping algorithms with distinct design approaches, namely ESMF Regrid, TempestRemap, generalized moving least squares (GMLS) with post-processing filters, and WLS-ENOR. By repeated iterative application of the high-order remapping methods to the test fields, we verify the accuracy of each scheme in terms of their observed convergence order for smooth data and determine the bounded error propagation using challenging, realistic field data on both uniform and regionally refined mesh cases. In addition to retaining high-order accuracy under idealized conditions, the methods also demonstrate robust remapping performance when dealing with non-smooth data. The traditional L2-minimization approaches used in ESMF and TempestRemap fail to maintain monotonicity, in contrast to the stable recovery achieved through the nonlinear filters used in both the meshless GMLS and hybrid mesh-based WLS-ENOR schemes. Local feature preservation analysis indicates that high-order methods perform better than low-order dissipative schemes for all test cases. The behavior of these remappers remains consistent when applied on regionally refined meshes, indicating mesh-invariant implementations. The MIRA intercomparison protocol proposed in this paper and the detailed comparison of the four algorithms demonstrate that the new schemes, namely GMLS and WLS-ENOR, are competitive compared to standard conservative minimization methods requiring computation of mesh intersections. The work presented in this paper provides a foundation that can be extended to include complex field definitions, realistic mesh topologies, and spectral element discretizations, thereby allowing for a more complete analysis of production-ready remapping packages.
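Two of the constraint checks in such a protocol can be sketched directly on a remap operator viewed as a matrix R (toy 2×2 weights here; real schemes build R from mesh intersections or least-squares reconstructions):

```python
import numpy as np

def is_conservative(R, src_areas, tgt_areas, tol=1e-12):
    # Global conservation: the integral of the remapped field equals the
    # integral of the source field, i.e. tgt_areas^T R == src_areas^T.
    return bool(np.allclose(tgt_areas @ R, src_areas, atol=tol))

def violates_monotonicity(R, f_src):
    # Monotone remap: every target value stays within the source bounds.
    f_tgt = R @ f_src
    return bool(f_tgt.min() < f_src.min() or f_tgt.max() > f_src.max())

# A positive averaging operator is monotone; an extrapolating one is not.
R_good = np.array([[0.5, 0.5], [0.25, 0.75]])
R_bad = np.array([[1.2, -0.2], [0.0, 1.0]])
```

High-order accuracy generally requires negative weights somewhere in R, which is exactly why unfiltered L2-minimizing schemes can violate the monotonicity check.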

More Details

Dynamics Informed Optimization for Resilient Energy Systems

Arguello, Bryan A.; Stewart, Nathan; Hoffman, Matthew J.; Nicholson, Bethany L.; Garrett, Richard A.; Moog, Emily R.

Optimal mitigation planning for highly disruptive contingencies to a transmission-level power system requires optimization with dynamic power system constraints, due to the key role of dynamics in system stability under major perturbations. We formulate a generalized disjunctive program to determine optimal grid component hardening choices for protecting against major failures, with differential algebraic constraints representing system dynamics (specifically, differential equations representing generator and load behavior and algebraic equations representing instantaneous power balance over the transmission system). We optionally allow stochastic optimal pre-positioning across all considered failure scenarios, and optimal emergency control within each scenario. This novel formulation allows, for the first time, analysis of the resilience interdependencies of mitigation planning, preventive control, and emergency control. Using all three strategies in concert is particularly effective at maintaining robust power system operation under severe contingencies, as we demonstrate on the Western System Coordinating Council (WSCC) 9-bus test system using synthetic multi-device outage scenarios. Towards integrating our modeling framework with real threats and more realistic power systems, we explore applying hybrid dynamics to power systems; we apply this approach to basic RL circuits, with the ultimate goal of using the methodology to model protective tripping schemes in the grid. Finally, we survey mitigation techniques for high-altitude electromagnetic pulse (HEMP) threats and describe a GIS application developed to create threat scenarios in a grid with geographic detail.

More Details

An introduction to developing GitLab/Jacamar runner analyst centric workflows at Sandia

Robinson, Allen C.; Swan, Matthew S.; Harvey, Evan C.; Klein, Brandon T.; Lawson, Gary L.; Milewicz, Reed M.; Pedretti, Kevin P.; Schmitz, Mark E.; Warnock, Scott A.

This document provides basic background information and initial enabling guidance for computational analysts who wish to develop and utilize GitOps practices, via GitLab/Jacamar-runner-based workflows, within the Common Engineering Environment (CEE) and High Performance Computing (HPC) environments at Sandia National Laboratories.

More Details

Deployment of Multifidelity Uncertainty Quantification for Thermal Battery Assessment Part I: Algorithms and Single Cell Results

Eldred, Michael S.; Adams, Brian M.; Geraci, Gianluca G.; Portone, Teresa P.; Ridgway, Elliott M.; Stephens, John A.; Wildey, Timothy M.

This report documents the results of an FY22 ASC V&V level 2 milestone demonstrating new algorithms for multifidelity uncertainty quantification. Part I of the report describes the algorithms, studies their performance on a simple model problem, and then deploys the methods to a thermal battery example from the open literature. Part II (restricted distribution) applies the multifidelity UQ methods to specific thermal batteries of interest to the NNSA/ASC program.

More Details

Neuromorphic Information Processing by Optical Media

Leonard, Francois L.; Fuller, Elliot J.; Teeter, Corinne M.; Vineyard, Craig M.

Classification of features in a scene typically requires conversion of the incoming photonic field into the electronic domain. Recently, an alternative approach has emerged whereby passive structured materials can perform classification tasks by directly using free-space propagation and diffraction of light. In this manuscript, we present a theoretical and computational study of such systems and establish the basic features that govern their performance. We show that system architecture, material structure, and input light field are intertwined and need to be co-designed to maximize classification accuracy. Our simulations show that a single-layer metasurface can achieve classification accuracy better than conventional linear classifiers, with an order of magnitude fewer diffractive features than previously reported. For a wavelength λ, single-layer metasurfaces of size 100λ × 100λ with aperture density λ⁻² achieve ~96% testing accuracy on the MNIST dataset, for an optimized distance ~100λ to the output plane. This is enabled by an intrinsic nonlinearity in photodetection, despite the use of linear optical metamaterials. Furthermore, we find that once the system is optimized, the number of diffractive features is the main determinant of classification performance. The slow asymptotic scaling with the number of apertures suggests a reason why such systems may benefit from multiple-layer designs. Finally, we show a trade-off between the number of apertures and fabrication noise.
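The intrinsic photodetection nonlinearity mentioned above can be sketched in a few lines: propagation through a linear optical system followed by intensity detection is a nonlinear map, even though the optics itself is linear (matrix and sizes are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical linear transfer matrix of a diffractive system (complex field).
T = rng.normal(size=(10, 4)) + 1j * rng.normal(size=(10, 4))

def detect(x):
    # Photodetectors measure intensity |field|^2, not the field amplitude.
    return np.abs(T @ x) ** 2

x = rng.normal(size=4)
# Doubling the input field quadruples the detected signal rather than
# doubling it, so the end-to-end classifier is nonlinear.
```

This quadratic readout is what lets a purely linear metamaterial outperform conventional linear classifiers.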

More Details

Sensitivity Analyses for Monte Carlo Sampling-Based Particle Simulations

Bond, Stephen D.; Franke, Brian C.; Lehoucq, Richard B.; McKinley, Scott M.

Computational design-based optimization is a well-used tool in science and engineering. Our report documents the successful use of a particle sensitivity analysis for design-based optimization within Monte Carlo sampling-based particle simulation, a capability that was previously unavailable. Such a capability enables the particle simulation communities to go beyond forward simulation, and promises to reduce the burden on overworked analysts by getting more done with less computation.

More Details

Quantum-Accurate Multiscale Modeling of Shock Hugoniots, Ramp Compression Paths, Structural and Magnetic Phase Transitions, and Transport Properties in Highly Compressed Metals

Wood, Mitchell A.; Nikolov, Svetoslav V.; Rohskopf, Andrew D.; Desjarlais, Michael P.; Cangi, Attila C.; Tranchida, Julien T.

Fully characterizing high energy density (HED) phenomena using pulsed power facilities (Z machine) and coherent light sources is possible only with complementary numerical modeling for design, diagnostic development, and data interpretation. The exercise of creating numerical tests that match experimental conditions builds critical insight that is crucial for the development of a strong fundamental understanding of the physics behind HED phenomena and for the design of next generation pulsed power facilities. The persistence of electron correlation in HED materials, arising from Coulomb interactions and the Pauli exclusion principle, is one of the greatest challenges for accurate numerical modeling and has hitherto impeded our ability to model HED phenomena across multiple length and time scales at sufficient accuracy. An exemplar is a ferromagnetic material like iron: while familiar and widely used, we lack a simulation capability to characterize the interplay of structure and magnetic effects that govern material strength, kinetics of phase transitions, and other transport properties. Herein we construct and demonstrate a molecular-spin dynamics (MSD) simulation capability for iron from ambient to Earth-core conditions; all software advances are open source and presently available for broad usage. These methods are multi-scale in nature; direct comparisons between high fidelity density functional theory (DFT) and linear-scaling MSD simulations are made throughout this work, with advancements made to MSD allowing electronic structure changes to be reflected in the classical dynamics. Main takeaways of the project include insight into the role of magnetic spins in mechanical properties and thermal conductivity, development of accurate interatomic potentials paired with spin Hamiltonians, and characterization of the high pressure melt boundary, which is of critical importance to planetary modeling efforts.

More Details

Multi-fidelity information fusion and resource allocation

Jakeman, John D.; Eldred, Michael S.; Geraci, Gianluca G.; Seidl, Daniel T.; Smith, Thomas M.; Gorodetsky, Alex A.; Pham, Trung P.; Narayan, Akil N.; Zeng, Xiaoshu Z.; Ghanem, Roger G.

This project created and demonstrated a framework for the efficient and accurate prediction of complex systems with only a limited amount of highly trusted data. These next generation computational multi-fidelity tools fuse multiple information sources of varying cost and accuracy to reduce the computational and experimental resources needed for designing and assessing complex multi-physics/scale/component systems. These tools have already been used to substantially improve the computational efficiency of simulation aided modeling activities from assessing thermal battery performance to predicting material deformation. This report summarizes the work carried out during a two year LDRD project. Specifically we present our technical accomplishments; project outputs such as publications, presentations and professional leadership activities; and the project’s legacy.

More Details

Model-Form Epistemic Uncertainty Quantification for Modeling with Differential Equations: Application to Epidemiology

Acquesta, Erin A.; Portone, Teresa P.; Dandekar, Raj D.; Rackauckas, Chris R.; Bandy, Rileigh J.; Huerta, Jose G.; Dytzel, India L.

Modeling real-world phenomena to any degree of accuracy is a challenge that the scientific research community has navigated since its foundation. Lack of information and limited computational and observational resources necessitate modeling assumptions which, when invalid, lead to model-form error (MFE). The work reported herein explored a novel method to represent model-form uncertainty (MFU) that combines Bayesian statistics with the emerging field of universal differential equations (UDEs). The fundamental principle behind UDEs is simple: use known equational forms that govern a dynamical system when you have them; then incorporate data-driven approaches – in this case neural networks (NNs) – embedded within the governing equations to learn the interacting terms that were underrepresented. Utilizing epidemiology as our motivating exemplar, this report will highlight the challenges of modeling novel infectious diseases while introducing ways to incorporate NN approximations to MFE. Prior to embarking on a Bayesian calibration, we first explored methods to augment the standard (non-Bayesian) UDE training procedure to account for uncertainty and increase robustness of training. In addition, it is often the case that uncertainty in observations is significant; this may be due to randomness or lack of precision in the measurement process. This uncertainty typically manifests as “noisy” observations which deviate from a true underlying signal. To account for such variability, the NN approximation to MFE is endowed with a probabilistic representation and is updated using available observational data in a Bayesian framework. By representing the MFU explicitly and deploying an embedded, data-driven model, this approach enables an agile, expressive, and interpretable method for representing MFU. 
In this report we will provide evidence that Bayesian UDEs show promise as a novel framework for any science-based, data-driven MFU representation; while emphasizing that significant advances must be made in the calibration of Bayesian NNs to ensure a robust calibration procedure.
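A structural sketch of a UDE in the epidemiological setting: known SIR mechanics plus a neural-network correction term standing in for model-form error (untrained random weights here; the report learns such terms from data and calibrates them in a Bayesian framework):

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = 0.01 * rng.normal(size=(8, 2)), np.zeros(8)
W2, b2 = 0.01 * rng.normal(size=(1, 8)), np.zeros(1)

def nn(x):
    # Small neural network standing in for the unknown model-form error term
    # (weights are random placeholders, not trained values).
    return (W2 @ np.tanh(W1 @ x + b1) + b2)[0]

def rhs(state, beta=0.3, gamma=0.1):
    S, I, R = state
    correction = nn(np.array([S, I]))      # data-driven term inside the ODE
    dS = -beta * S * I + correction
    dI = beta * S * I - gamma * I - correction
    dR = gamma * I
    return np.array([dS, dI, dR])

# Forward-Euler integration of the UDE; total population is conserved exactly
# because the correction cancels between the S and I equations.
state = np.array([0.99, 0.01, 0.0])
for _ in range(1000):
    state = state + 0.01 * rhs(state)
```

Embedding the network inside the governing equations, rather than replacing them, is what keeps the known mechanistic structure while letting data fill in the misspecified interaction terms.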

More Details

Accelerating Multiscale Materials Modeling with Machine Learning

Modine, N.A.; Stephens, John A.; Swiler, Laura P.; Thompson, Aidan P.; Vogel, Dayton J.; Cangi, Attila C.; Feilder, Lenz F.; Rajamanickam, Sivasankaran R.

The focus of this project is to accelerate and transform the workflow of multiscale materials modeling by developing an integrated toolchain seamlessly combining DFT, SNAP, LAMMPS (shown in Figure 1-1), and a machine-learning (ML) model that more efficiently extracts information from a smaller set of first-principles calculations. Our ML model enables us to accelerate first-principles data generation by interpolating existing high fidelity data, and to extend the simulation scale by extrapolating high fidelity data (10² atoms) to the mesoscale (10⁴ atoms). It encodes the underlying physics of atomic interactions on the microscopic scale by adapting a variety of ML techniques, such as deep neural networks (DNNs) and graph neural networks (GNNs). We developed a new surrogate model for density functional theory using deep neural networks. The developed ML surrogate is demonstrated in a workflow to generate accurate band energies, total energies, and density of the 298 K and 933 K aluminum systems. Furthermore, the models can be used to predict the quantities of interest for systems with more atoms than the training data set. We have demonstrated that the ML model can be used to compute the quantities of interest for systems with 100,000 Al atoms. When compared with the 2,000-atom Al system, the new surrogate model is as accurate as DFT but three orders of magnitude faster. We also explored optimal experimental design techniques to choose the training data, and novel graph neural networks to train on smaller data sets. These are promising methods that should be explored in the future.

More Details

Differential geometric approaches to momentum-based formulations for fluids [Slides]

Eldred, Christopher

This SAND report documents CIS Late Start LDRD Project 22-0311, "Differential geometric approaches to momentum-based formulations for fluids". The project primarily developed geometric mechanics formulations for momentum-based descriptions of nonrelativistic fluids, utilizing a differential geometry/exterior calculus treatment of momentum and a space+time splitting. Specifically, the full suite of geometric mechanics formulations (variational/Lagrangian, Lie-Poisson Hamiltonian, and Curl-Form Hamiltonian) was developed in terms of exterior calculus using vector-bundle valued differential forms. This was done for a fairly general version of semi-direct product theory sufficient to cover a wide range of both neutral and charged fluid models, including compressible Euler, magnetohydrodynamics, and Euler-Maxwell. As a secondary goal, this project also explored the connection between geometric mechanics formulations and the more traditional Godunov form (a hyperbolic system of conservation laws). Unfortunately, this stage did not produce anything particularly interesting, due to unforeseen technical difficulties. There are two publications related to this work currently in preparation, and this work will be presented at SIAM CSE 23, at which the PI is organizing a mini-symposium on geometric mechanics formulations and structure-preserving discretizations for fluids. The logical next step is to utilize the exterior calculus based understanding of momentum, coupled with geometric mechanics formulations, to develop (novel) structure-preserving discretizations of momentum. This is the main subject of a successful FY23 CIS LDRD, "Structure-preserving discretizations for momentum-based formulations of fluids".

More Details

Towards Z-Next: The Integration of Theory, Experiments, and Computational Simulation in a Bayesian Data Assimilation Framework

Maupin, Kathryn A.; Tran, Anh; Lewis, William L.; Knapp, Patrick K.; Joseph, V.R.; Wu, C.F.J.; Glinsky, Michael G.; Valaitis, Sonata V.

Making reliable predictions in the presence of uncertainty is critical to high-consequence modeling and simulation activities, such as those encountered at Sandia National Laboratories. Surrogate or reduced-order models are often used to mitigate the expense of performing quality uncertainty analyses with high-fidelity, physics-based codes. However, phenomenological surrogate models do not always adhere to important physics and system properties. This project develops surrogate models that integrate physical theory with experimental data through a maximally-informative framework that accounts for the many uncertainties present in computational modeling problems. Correlations between relevant outputs are preserved through the use of multi-output or co-predictive surrogate models; known physical properties (specifically monotonicity) are also preserved; and unknown physics and phenomena are detected using a causal analysis. By endowing surrogate models with key properties of the physical system being studied, their predictive power is arguably enhanced, allowing for reliable simulations and analyses at a reduced computational cost.

More Details

Combining Physics and Machine Learning for the Next Generation of Molecular Simulation

Rackers, Joshua R.

Simulating molecules and atomic systems at quantum accuracy is a grand challenge for science in the 21st century. Quantum-accurate simulations would enable the design of new medicines and the discovery of new materials. The defining problem in this challenge is that quantum calculations on large molecules, like proteins or DNA, are fundamentally impossible with current algorithms. In this work, we explore a range of different methods that aim to make large, quantum-accurate simulations possible. We show that using advanced classical models, we can accurately simulate ion channels, an important biomolecular system. We show how advanced classical models can be implemented in an exascale-ready software package. Lastly, we show how machine learning can learn the laws of quantum mechanics from data and enable quantum electronic structure calculations on thousands of atoms, a feat that is impossible for current algorithms. Altogether, this work shows that by combining advances in physics models, computing, and machine learning, we are moving closer to the reality of accurately simulating our molecular world.

More Details

AI-enhanced Codesign for Next-Generation Neuromorphic Circuits and Systems

Cardwell, Suma G.; Smith, John D.; Crowder, Douglas C.

This report details work that was completed to address the Fiscal Year 2022 Advanced Science and Technology (AS&T) Laboratory Directed Research and Development (LDRD) call for “AI-enhanced Co-Design of Next Generation Microelectronics.” This project required concurrent contributions from the fields of 1) materials science, 2) devices and circuits, 3) physics of computing, and 4) algorithms and system architectures. During this project, we developed AI-enhanced circuit design methods that relied on reinforcement learning and evolutionary algorithms. The AI-enhanced design methods were tested on neuromorphic circuit design problems that have real-world applications related to Sandia’s mission needs. The developed methods enable the design of circuits, including circuits that are built from emerging devices, and they were also extended to enable novel device discovery. We expect that these AI-enhanced design methods will accelerate progress towards developing next-generation, high-performance neuromorphic computing systems.

More Details

Using ultrasonic attenuation in cortical bone to infer distributions on pore size

Applied Mathematical Modelling

White, Rebekah D.; Alexanderian, A.; Yousefian, O.; Karbalaeisadegh, Y.; Bekele-Maxwell, K.; Kasali, A.; Banks, H.T.; Talmant, M.; Grimal, Q.; Muller, M.

In this work we infer the underlying distribution on pore radius in human cortical bone samples using ultrasonic attenuation data. We first discuss how to formulate polydisperse attenuation models using a probabilistic approach and the Waterman–Truell model for scattering attenuation. We then compare the forward predictions for total attenuation in polydisperse samples from the Independent Scattering Approximation and the higher-order Waterman–Truell models. Following this, we formulate an inverse problem under the Prohorov Metric Framework, coupled with variational regularization to stabilize the inverse problem. We then use experimental attenuation data taken from human cadaver samples and solve inverse problems resulting in nonparametric estimates of the probability density function on pore radius. We compare these estimates to the “true” microstructure of the bone samples determined via microCT imaging. We find that our methodology allows us to reliably estimate the underlying microstructure of the bone from attenuation data.

More Details

Progress in Modeling the 2019 Extended Magnetically Insulated Transmission Line (MITL) and Courtyard Environment Trial at HERMES-III

Cartwright, Keith C.; Pointon, Tim P.; Powell, Troy C.; Grabowski, Theodore C.; Shields, Sidney S.; Sirajuddin, David S.; Jensen, Daniel S.; Renk, Timothy J.; Cyr, Eric C.; Stafford, David S.; Swan, Matthew S.; Mitra, Sudeep M.; McDoniel, William M.; Moore, Christopher H.

This report documents the progress made in simulating the HERMES-III Magnetically Insulated Transmission Line (MITL) and courtyard with EMPIRE and ITS. This study focuses on shots taken during June and July of 2019 with the new MITL extension. Several of these shots included dose mapping of the courtyard: 11132, 11133, 11134, 11135, 11136, and 11146. This report focuses on these shots because they provided full data return from the MITL electrical diagnostics and the radiation dose sensors in the courtyard. The comparison starts with improving the processing of the incoming voltage into the EMPIRE simulation from the experiment. The currents are then compared at several locations along the MITL. The simulation results of the electrons impacting the anode are shown. The electron impact energies and angles are then handed off to ITS, which calculates the dose on the faceplate and at locations in the courtyard; these results are compared to experimental measurements. ITS also calculates the photons and electrons that are injected into the courtyard, and these quantities are then used by EMPIRE to calculate the photon and electron transport in the courtyard. The details of the algorithms used to perform the courtyard simulations are presented, as well as qualitative comparisons of the electric field, magnetic field, and conductivity in the courtyard. Because of the computational burden of these calculations, the pressure in the courtyard was reduced to lessen the computational load. The computational performance is presented, along with suggestions on how to improve both the computational and the algorithmic performance. Some of the algorithmic changes would reduce the accuracy of the models, and a detailed comparison of these changes is left for a future study. In addition to the list of code improvements, a list of suggested experimental improvements is given to increase the quality of the data return.

More Details

Resilience Enhancements through Deep Learning Yields

Eydenberg, Michael S.; Batsch-Smith, Lisa B.; Bice, Charles T.; Blakely, Logan; Bynum, Michael L.; Boukouvala, Fani B.; Castillo, Anya C.; Haddad, Joshua H.; Hart, William E.; Jalving, Jordan H.; Kilwein, Zachary A.; Laird, Carl D.; Skolfield, Joshua K.

This report documents the Resilience Enhancements through Deep Learning Yields (REDLY) project, a three-year effort to improve electrical grid resilience by developing scalable methods for system operators to protect the grid against threats leading to interrupted service or physical damage. The computational complexity and uncertain nature of current real-world contingency analysis present significant barriers to automated, real-time monitoring. While there has been a significant push to explore the use of accurate, high-performance machine learning (ML) model surrogates to address this gap, their reliability is unclear when deployed in high-consequence applications such as power grid systems. Contemporary optimization techniques used to validate surrogate performance can exploit ML model prediction errors, which necessitates the verification of worst-case performance for the models.

More Details

Improving Predictive Capability in REHEDS Simulations with Fast, Accurate, and Consistent Non-Equilibrium Material Properties

Hansen, Stephanie B.; Baczewski, Andrew D.; Gomez, T.A.; Hentschel, T.W.; Jennings, Christopher A.; Kononov, Alina K.; Nagayama, Taisuke N.; Adler, Kelsey A.; Cangi, A.C.; Cochrane, Kyle C.; Schleife, A. &.

Predictive design of REHEDS experiments with radiation-hydrodynamic simulations requires knowledge of material properties (e.g. equations of state (EOS), transport coefficients, and radiation physics). Interpreting experimental results requires accurate models of diagnostic observables (e.g. detailed emission, absorption, and scattering spectra). In conditions of Local Thermodynamic Equilibrium (LTE), these material properties and observables can be pre-computed with relatively high accuracy and subsequently tabulated on simple temperature-density grids for fast look-up by simulations. When radiation and electron temperatures fall out of equilibrium, however, non-LTE effects can profoundly change material properties and diagnostic signatures. Accurately and efficiently incorporating these non-LTE effects has been a longstanding challenge for simulations. At present, most simulations include non-LTE effects by invoking highly simplified inline models. These inline non-LTE models are both much slower than table look-up and significantly less accurate than the detailed models used to populate LTE tables and diagnose experimental data through post-processing or inversion. Because inline non-LTE models are slow, designers avoid them whenever possible, which leads to known inaccuracies from using tabular LTE. Because inline models are simple, they are inconsistent with tabular data from detailed models, leading to ill-known inaccuracies, and they cannot generate detailed synthetic diagnostics suitable for direct comparisons with experimental data. This project addresses the challenge of generating and utilizing efficient, accurate, and consistent non-equilibrium material data along three complementary but relatively independent research lines. 
First, we have developed a relatively fast and accurate non-LTE average-atom model based on density functional theory (DFT) that provides a complete set of EOS, transport, and radiative data, and have rigorously tested it against more sophisticated first-principles multi-atom DFT models, including time-dependent DFT. Next, we have developed a tabular scheme and interpolation methods that compactly capture non-LTE effects for use in simulations and have implemented these tables in the GORGON magneto-hydrodynamic (MHD) code. Finally, we have developed post-processing tools that use detailed tabulated non-LTE data to directly predict experimental observables from simulation output.

More Details

Adaptive Space-Time Methods for Large Scale Optimal Design

DiPietro, Kelsey L.; Ridzal, Denis R.; Morales, Diana M.

When modeling complex physical systems with advanced dynamics, such as shocks and singularities, many classic methods for solving partial differential equations can return inaccurate or unusable results. One way to resolve these complex dynamics is through r-adaptive refinement methods, in which a fixed number of mesh points are shifted to areas of high interest. The mesh refinement map can be found through the solution of the Monge-Ampère equation, a highly nonlinear partial differential equation. Due to its nonlinearity, the numerical solution of the Monge-Ampère equation is nontrivial and has previously required computationally expensive methods. In this report, we detail our novel optimization-based, multigrid-enabled solver for a low-order finite element approximation of the Monge-Ampère equation. This fast and scalable solver makes r-adaptive meshing more readily available for problems related to large-scale optimal design. Beyond mesh adaptivity, our report discusses additional applications where our fast solver for the Monge-Ampère equation could be easily applied.
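As a much-simplified illustration of r-adaptive meshing, the 1D analogue of the Monge-Ampère approach reduces to equidistributing a monitor function: mesh points are placed so each cell carries the same monitor mass. The sketch below is illustrative only; it does not use the report's optimization-based multigrid solver, and the monitor function and point counts are hypothetical.

```python
import numpy as np

# Monitor function: large where the solution has a sharp feature,
# so mesh points should cluster there (hypothetical example problem).
def monitor(x):
    return 1.0 + 50.0 * np.exp(-200.0 * (x - 0.5) ** 2)

n = 41                                  # number of mesh points
xi = np.linspace(0.0, 1.0, n)           # uniform computational coordinates
x_fine = np.linspace(0.0, 1.0, 2001)    # fine grid for quadrature
M = monitor(x_fine)

# Equidistribution: choose x(xi) so the integral of M is equal between
# consecutive mesh points (invert the normalized cumulative integral).
cumM = np.concatenate(([0.0], np.cumsum(0.5 * (M[1:] + M[:-1]) * np.diff(x_fine))))
cumM /= cumM[-1]
x_adapted = np.interp(xi, cumM, x_fine)

h = np.diff(x_adapted)                  # cell sizes of the adapted mesh
print(h.min(), h.max())                 # much smaller near the feature at x = 0.5
```

The same equidistribution principle, generalized to multiple dimensions, is what leads to the Monge-Ampère equation.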

More Details

Fluid-Kinetic Coupling: Advanced Discretizations for Simulations on Emerging Heterogeneous Architectures (LDRD FY20-0643)

Roberts, Nathan V.; Bond, Stephen D.; Miller, Sean A.; Cyr, Eric C.

Plasma physics simulations are vital for a host of Sandia mission concerns, for fundamental science, and for clean energy in the form of fusion power. Sandia's most mature plasma physics simulation capabilities come in the form of particle-in-cell (PIC) models and magnetohydrodynamics (MHD) models. MHD models for a plasma work well in denser plasma regimes, when there is enough material that the plasma approximates a fluid. PIC models, on the other hand, work well in lower-density regimes, in which the number of particles to simulate remains manageable; error in PIC scales as the inverse square root of the number of particles, making high-accuracy simulations expensive. Real-world applications, however, almost always involve a transition region between the high-density regimes where MHD is appropriate and the low-density regimes for PIC. In such a transition region, a direct discretization of Vlasov is appropriate. Such discretizations come with their own computational costs, however; the phase-space mesh for Vlasov can involve up to six dimensions (seven if time is included), and applying appropriate homogeneous boundary conditions in velocity space requires meshing a substantial padding region to ensure that the distribution remains sufficiently close to zero at the velocity boundaries. Moreover, for collisional plasmas, the right-hand side of the Vlasov equation is a collision operator, which is non-local in velocity space and which may dominate the cost of the Vlasov solver. The present LDRD project endeavors to develop modern, foundational tools for the development of continuum-kinetic Vlasov solvers, using the discontinuous Petrov-Galerkin (DPG) methodology for discretization of Vlasov, and machine-learning (ML) models to enable efficient evaluation of collision operators. DPG affords several key advantages.
First, it has a built-in, robust error indicator, allowing us to adapt the mesh in a very natural way, enabling a coarse velocity-space mesh near the homogeneous boundaries, and a fine mesh where the solution has fine features. Second, it is an inherently high-order, high-intensity method, requiring extra local computations to determine so-called optimal test functions, which makes it particularly suited to modern hardware in which floating-point throughput is increasing at a faster rate than memory bandwidth. Finally, DPG is a residual-minimizing method, which enables high-accuracy computation: in typical cases, the method delivers something very close to the $L^2$ projection of the exact solution. Meanwhile, the ML-based collision model we adopt affords a cost structure that scales as the square root of a standard direct evaluation. Moreover, we design our model to conserve mass, momentum, and energy by construction, and our approach to training is highly flexible, in that it can incorporate not only synthetic data from direct-simulation Monte Carlo (DSMC) codes, but also experimental data. We have developed two DPG formulations for Vlasov-Poisson: a time-marching, backward-Euler discretization and a space-time discretization. We have conducted a number of numerical experiments to verify the approach in a 1D1V setting. In this report, we detail these formulations and experiments. We also summarize some new theoretical results developed as part of this project (published as papers previously): some new analysis of DPG for the convection-reaction problem (of which the Vlasov equation is an instance), a new exponential integrator for DPG, and some numerical exploration of various DPG-based time-marching approaches to the heat equation. As part of this work, we have contributed extensively to the Camellia open-source library; we also describe the new capabilities and their usage. 
We have also developed a well-documented methodology for single-species collision operators, which we applied to argon and demonstrated with numerical experiments. We summarize those results here, as well as describing at a high level a design extending the methodology to multi-species operators. We have released a new open-source library, MLC, under a BSD license; we include a summary of its capabilities as well.
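The PIC cost driver mentioned above, accuracy improving only with the square root of the particle count, is just the standard Monte Carlo sampling rate. A minimal sketch (not a PIC code) illustrating that each tenfold increase in samples cuts the error by only about sqrt(10):

```python
import numpy as np

# Monte Carlo sampling error decays like 1/sqrt(N): the reason PIC needs
# many particles for high accuracy (illustrative sketch, not a PIC code).
rng = np.random.default_rng(2)
errors = []
for n in (10**3, 10**4, 10**5, 10**6):
    # Estimate the (zero) mean of a standard normal with n samples,
    # averaged over 20 independent trials.
    trials = [abs(rng.standard_normal(n).mean()) for _ in range(20)]
    errors.append(float(np.mean(trials)))
ratios = [errors[i] / errors[i + 1] for i in range(3)]
print(ratios)  # each 10x in samples cuts error by roughly sqrt(10) ~ 3.2x
```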

More Details

Modeling Analog Tile-Based Accelerators Using SST

Feinberg, Benjamin F.; Agarwal, Sapan A.; Plagge, Mark P.; Rothganger, Fredrick R.; Cardwell, Suma G.; Hughes, Clayton H.

Analog computing has been widely proposed to improve the energy efficiency of multiple important workloads, including neural network operations and other linear algebra kernels. To properly evaluate analog computing and explore more complex workloads, such as systems consisting of multiple analog data paths, system-level simulations are required. Moreover, prior work on system architectures for analog computing often relies on custom simulators, creating significant additional design effort and complicating comparisons between different systems. To remedy these issues, this report describes the design and implementation of a flexible tile-based analog accelerator element for the Structural Simulation Toolkit (SST). The element focuses heavily on the tile controller—an often neglected aspect of prior work—that is sufficiently versatile to simulate a wide range of different tile operations, including neural network layers, signal processing kernels, and generic linear algebra operations, without major constraints. The tile model also interoperates with existing SST memory and network models to reduce the overall development load and enable future simulation of heterogeneous systems with both conventional digital logic and analog compute tiles. Finally, both the tile and array models are designed to easily support future extensions as new analog operations and applications that can benefit from analog computing are developed.

More Details

Global Sensitivity Analysis Using the Ultra‐Low Resolution Energy Exascale Earth System Model

Journal of Advances in Modeling Earth Systems

Kalashnikova, Irina; Peterson, Kara J.; Powell, Amy J.; Jakeman, John D.; Roesler, Erika L.

For decades, Arctic temperatures have increased twice as fast as average global temperatures. As a first step towards quantifying parametric uncertainty in Arctic climate, we performed a variance-based global sensitivity analysis (GSA) using a fully-coupled, ultra-low resolution (ULR) configuration of version 1 of the U.S. Department of Energy’s Energy Exascale Earth System Model (E3SMv1). Specifically, we quantified the sensitivity of six quantities of interest (QOIs), which characterize changes in Arctic climate over a 75 year period, to uncertainties in nine model parameters spanning the sea ice, atmosphere and ocean components of E3SMv1. Sensitivity indices for each QOI were computed with a Gaussian process emulator using 139 random realizations of the random parameters and fixed pre-industrial forcing. Uncertainties in the atmospheric parameters in the CLUBB (Cloud Layers Unified by Binormals) scheme were found to have the most impact on sea ice status and the larger Arctic climate. Our results demonstrate the importance of conducting sensitivity analyses with fully coupled climate models. The ULR configuration makes such studies computationally feasible today due to its low computational cost. When advances in computational power and modeling algorithms enable the tractable use of higher-resolution models, our results will provide a baseline that can quantify the impact of model resolution on the accuracy of sensitivity indices. Moreover, the confidence intervals provided by our study, which we used to quantify the impact of the number of model evaluations on the accuracy of sensitivity estimates, have the potential to inform the computational resources needed for future sensitivity studies.
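A minimal sketch of variance-based sensitivity analysis on a cheap toy model (not E3SMv1, and using direct Monte Carlo pick-freeze estimation rather than the Gaussian process emulator the study relies on; the model and parameter ranges below are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model: one QOI driven by three uniform parameters; x0 dominates.
def model(x):
    return 4.0 * x[:, 0] + 0.5 * x[:, 1] + 0.1 * x[:, 2] ** 2

n, d = 20000, 3
A = rng.uniform(0.0, 1.0, (n, d))
B = rng.uniform(0.0, 1.0, (n, d))
fA, fB = model(A), model(B)
var = np.var(np.concatenate([fA, fB]))

# First-order Sobol index per parameter via the pick-freeze estimator:
# keep column i from A, take the remaining columns from B.
S = []
for i in range(d):
    ABi = B.copy()
    ABi[:, i] = A[:, i]
    fABi = model(ABi)
    S.append(float(np.mean(fA * (fABi - fB)) / var))

print(S)  # x0 should carry most of the variance
```

For an additive model like this one, the first-order indices sum to roughly one; interactions would show up as a gap between that sum and one.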

More Details

Neural-network based collision operators for the Boltzmann equation

Journal of Computational Physics

Roberts, Nathan V.; Bond, Stephen D.; Cyr, Eric C.; Miller, Sean T.

Kinetic gas dynamics in rarefied and moderate-density regimes have complex behavior associated with collisional processes. These processes are generally defined by convolution integrals over a high-dimensional space (as in the Boltzmann operator), or require evaluating complex auxiliary variables (as in Rosenbluth potentials in Fokker-Planck operators) that are challenging to implement and computationally expensive to evaluate. In this work, we develop a data-driven neural network model that augments a simple and inexpensive BGK collision operator with a machine-learned correction term, which improves the fidelity of the simple operator with a small overhead to overall runtime. The composite collision operator has a tunable fidelity and, in this work, is trained using and tested against a direct-simulation Monte-Carlo (DSMC) collision operator.
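The BGK operator that the learned correction augments relaxes the distribution toward a local Maxwellian while conserving mass, momentum, and energy. A minimal 0D-in-space sketch on a 1D velocity grid, with the machine-learned correction term omitted and all parameters hypothetical:

```python
import numpy as np

# 0D-in-space BGK relaxation on a 1D velocity grid (illustrative only;
# the paper augments BGK with a machine-learned correction, omitted here).
v = np.linspace(-20.0, 20.0, 1001)
dv = v[1] - v[0]

# Initial non-equilibrium distribution: two drifting beams.
f = np.exp(-(v - 2.0) ** 2) + 0.5 * np.exp(-(v + 3.0) ** 2)

def moments(f):
    n = np.sum(f) * dv                      # number density
    u = np.sum(f * v) * dv / n              # bulk velocity
    T = np.sum(f * (v - u) ** 2) * dv / n   # temperature (kB = m = 1)
    return n, u, T

def maxwellian(n, u, T):
    return n / np.sqrt(2.0 * np.pi * T) * np.exp(-((v - u) ** 2) / (2.0 * T))

# df/dt = (M[f] - f) / tau, explicit Euler for 20 relaxation times.
tau, dt = 1.0, 0.01
n0, u0, T0 = moments(f)
for _ in range(2000):
    f += (dt / tau) * (maxwellian(*moments(f)) - f)

n1, u1, T1 = moments(f)
print(abs(n1 - n0), abs(u1 - u0), abs(T1 - T0))  # conserved to round-off
```

Because the target Maxwellian is rebuilt from the current moments at every step, the collisional invariants are preserved by construction; the composite operator in the paper enforces the same property for the learned correction.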

More Details

Numerical simulation of a relativistic magnetron using a fluid electron model

Physics of Plasmas

Roberds, Nicholas R.; Cartwright, Keith C.; Sandoval, Andrew J.; Beckwith, Kristian B.; Cyr, Eric C.; Glines, Forrest W.

An approach to numerically modeling relativistic magnetrons, in which the electrons are represented with a relativistic fluid, is described. A principal effect in the operation of a magnetron is space-charge-limited (SCL) emission of electrons from the cathode. We have developed an approximate SCL emission boundary condition for the fluid electron model. This boundary condition prescribes the flux of electrons as a function of the normal component of the electric field on the boundary. We show the results of a benchmarking activity that applies the fluid SCL boundary condition to the one-dimensional Child–Langmuir diode problem and a canonical two-dimensional diode problem. Simulation results for a two-dimensional A6 magnetron are then presented. Computed bunching of the electron cloud occurs and coincides with significant microwave power generation. Numerical convergence of the solution is considered. Sharp gradients in the solution quantities at the diocotron resonance, spanning an interval of three to four grid cells in the most well-resolved case, are present and likely affect convergence.
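For reference, the one-dimensional Child–Langmuir law behind the benchmark diode problem gives the space-charge-limited current density of a planar, non-relativistic vacuum diode. The voltage and gap below are hypothetical, not values from the A6 magnetron study:

```python
import math

# Physical constants (SI)
EPS0 = 8.8541878128e-12   # vacuum permittivity, F/m
QE   = 1.602176634e-19    # elementary charge, C
ME   = 9.1093837015e-31   # electron mass, kg

def child_langmuir_j(voltage, gap):
    """Space-charge-limited current density (A/m^2) for a planar,
    non-relativistic vacuum diode: J = (4*eps0/9) sqrt(2e/m) V^(3/2) / d^2."""
    return (4.0 * EPS0 / 9.0) * math.sqrt(2.0 * QE / ME) * voltage**1.5 / gap**2

# Hypothetical diode: 500 kV across a 1 cm anode-cathode gap.
J = child_langmuir_j(5.0e5, 0.01)
print(f"{J:.3e} A/m^2")  # on the order of 8e6 A/m^2
```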

More Details

Combining DPG in space with DPG time-marching scheme for the transient advection–reaction equation

Computer Methods in Applied Mechanics and Engineering

Roberts, Nathan V.; Muñoz-Matute, Judit M.; Demkowicz, Leszek D.

In this article, we present a general methodology to combine the Discontinuous Petrov–Galerkin (DPG) method in space and time in the context of methods of lines for transient advection–reaction problems. We first introduce a semidiscretization in space with a DPG method, redefining the ideas of optimal testing and practicality of the method in this context. Then, we apply the recently developed DPG-based time-marching scheme, which is of exponential type, to the resulting system of Ordinary Differential Equations (ODEs). Further, we discuss how to efficiently compute the action of the exponential of the matrix coming from the space semidiscretization without assembling the full matrix. Finally, we verify the proposed method for 1D+time advection–reaction problems, showing optimal convergence rates for smooth solutions and more stable results for linear conservation laws compared to classical exponential integrators.

More Details

Uncertainty and Sensitivity Analysis Methods and Applications in the GDSA Framework (FY2022)

Swiler, Laura P.; Basurto, Eduardo B.; Brooks, Dusty M.; Eckert, Aubrey C.; Leone, Rosemary C.; Mariner, Paul M.; Portone, Teresa P.; Smith, Mariah L.

The Spent Fuel and Waste Science and Technology (SFWST) Campaign of the U.S. Department of Energy (DOE) Office of Nuclear Energy (NE), Office of Fuel Cycle Technology (FCT) is conducting research and development (R&D) on geologic disposal of spent nuclear fuel (SNF) and high-level nuclear waste (HLW). Two high priorities for SFWST disposal R&D are design concept development and disposal system modeling. These priorities are directly addressed in the SFWST Geologic Disposal Safety Assessment (GDSA) control account, which is charged with developing a geologic repository system modeling and analysis capability, and the associated software, GDSA Framework, for evaluating disposal system performance for nuclear waste in geologic media. GDSA Framework is supported by the SFWST Campaign and its predecessor, the Used Fuel Disposition (UFD) campaign.

More Details

Islet: interpolation semi-Lagrangian element-based transport

Geoscientific Model Development (Online)

Bradley, Andrew M.; Bosler, Peter A.; Guba, Oksana G.

Advection of trace species, or tracers, also called tracer transport, in models of the atmosphere and other physical domains is an important and potentially computationally expensive part of a model's dynamical core. Semi-Lagrangian (SL) advection methods are efficient because they permit a time step much larger than the advective stability limit for explicit Eulerian methods, without requiring the solution of a globally coupled system of equations as implicit Eulerian methods do. Thus, to reduce the computational expense of tracer transport, dynamical cores often use SL methods to advect tracers. The class of interpolation semi-Lagrangian (ISL) methods contains potentially extremely efficient SL methods. We describe a finite-element ISL transport method that we call the interpolation semi-Lagrangian element-based transport (Islet) method, for use with, for example, atmosphere models discretized using the spectral element method. The Islet method uses three grids that share an element grid: a dynamics grid supporting, for example, the Gauss–Legendre–Lobatto basis of degree three; a physics parameterizations grid with a configurable number of finite-volume subcells per element; and a tracer grid supporting the use of Islet bases, with the particular basis again configurable. This method provides extremely accurate tracer transport and excellent diagnostic values in a number of verification problems.
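The basic ISL idea, tracing grid points back along characteristics and interpolating the old field at the departure points, can be sketched in 1D. This toy uses linear interpolation on a periodic grid (the Islet method's finite-element bases and three-grid structure are not reproduced) and remains stable at a CFL number well above the explicit Eulerian limit:

```python
import numpy as np

# Periodic 1D advection u_t + a u_x = 0 via interpolation semi-Lagrangian
# stepping: trace each grid point back along its characteristic and
# interpolate the old field at the departure point. Stable for CFL > 1.
n = 200
x = np.linspace(0.0, 1.0, n, endpoint=False)
dx = 1.0 / n
a, dt = 1.0, 3.5 * dx                  # CFL = 3.5, beyond the Eulerian limit
u = np.exp(-200.0 * (x - 0.3) ** 2)    # initial Gaussian bump
mass0 = u.sum()

steps = 100
for _ in range(steps):
    xd = (x - a * dt) % 1.0            # departure points, periodic wrap
    s = xd / dx
    j = np.floor(s).astype(int)
    w = s - j                          # linear interpolation weights
    u = (1.0 - w) * u[j % n] + w * u[(j + 1) % n]

# The bump should have translated by a*steps*dt (mod 1), slightly smeared
# by the linear interpolation.
shift = (a * steps * dt) % 1.0
peak = x[np.argmax(u)]
print(peak, (0.3 + shift) % 1.0)
```

With constant-coefficient advection the interpolation weights are uniform, so this toy conserves total tracer mass to round-off; higher-order bases, as in Islet, reduce the smearing.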

More Details

Accurate Compression of Tabulated Chemistry Models with Partition of Unity Networks

Combustion Science and Technology

Armstrong, Elizabeth A.; Hansen, Michael A.; Knaus, Robert C.; Trask, Nathaniel A.; Hewson, John C.; Sutherland, James C.

Tabulated chemistry models are widely used to simulate large-scale turbulent fires in applications including energy generation and fire safety. Tabulation via piecewise Cartesian interpolation suffers from the curse-of-dimensionality, leading to a prohibitive exponential growth in parameters and memory usage as more dimensions are considered. Artificial neural networks (ANNs) have attracted attention for constructing surrogates for chemistry models due to their ability to perform high-dimensional approximation. However, due to well-known pathologies regarding the realization of suboptimal local minima during training, in practice they do not converge and provide unreliable accuracy. Partition of unity networks (POUnets) are a recently introduced family of ANNs which preserve notions of convergence while performing high-dimensional approximation, discovering a mesh-free partition of space which may be used to perform optimal polynomial approximation. In this work, we assess their performance with respect to accuracy and model complexity in reconstructing unstructured flamelet data representative of nonadiabatic pool fire models. Our results show that POUnets can provide the desirable accuracy of classical spline-based interpolants with the low memory footprint of traditional ANNs while converging faster to significantly lower errors than ANNs. For example, we observe POUnets obtaining target accuracies in two dimensions with 40 to 50 times less memory and roughly double the compression in three dimensions. We also address the practical matter of efficiently training accurate POUnets by studying convergence over key hyperparameters, the impact of partition/basis formulation, and the sensitivity to initialization.
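A classical partition-of-unity approximation with fixed partitions conveys the core idea: normalized bump functions partition the domain, and a cheap local polynomial fit in each partition is blended into a global approximant. POUnets instead learn the partitions during training, which this sketch (with hypothetical partition widths and target function) does not attempt:

```python
import numpy as np

# Classical partition-of-unity approximation with *fixed* Gaussian
# partitions and local linear fits (POUnets learn the partitions instead;
# the target function and widths here are hypothetical).
x = np.linspace(0.0, 1.0, 400)
f = np.sin(2.0 * np.pi * x) + 0.3 * x          # smooth 1D "table" to compress

centers = np.linspace(0.0, 1.0, 12)
g = np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2.0 * 0.04 ** 2))
phi = g / g.sum(axis=1, keepdims=True)          # partitions sum to one

# Weighted linear least squares in each partition, blended by phi.
A = np.column_stack([np.ones_like(x), x])
approx = np.zeros_like(x)
for k in range(len(centers)):
    sw = np.sqrt(phi[:, k])
    coef, *_ = np.linalg.lstsq(sw[:, None] * A, sw * f, rcond=None)
    approx += phi[:, k] * (A @ coef)

err = np.max(np.abs(approx - f))
print(err)  # small residual from the local linear bias
```

The memory footprint is just the partition parameters plus two polynomial coefficients per partition, which is the compression property the paper exploits in higher dimensions.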

More Details

PyApprox: Enabling efficient model analysis

Jakeman, John D.

PyApprox is a Python-based one-stop shop for probabilistic analysis of scientific numerical models. Easy-to-use and extendable tools are provided for constructing surrogates, sensitivity analysis, Bayesian inference, experimental design, and forward uncertainty quantification. The algorithms implemented represent the most popular methods for model analysis developed over the past two decades, including recent advances in multi-fidelity approaches that use multiple model discretizations and/or simplified physics to significantly reduce the computational cost of various types of analyses. Simple interfaces are provided for the most commonly used algorithms to limit a user's need to tune the various hyper-parameters of each algorithm. However, more advanced workflows that require customization of hyper-parameters are also supported. An extensive set of benchmarks from the literature is also provided to facilitate the easy comparison of different algorithms for a wide range of model analyses. This paper introduces PyApprox and its various features, and presents results demonstrating the utility of PyApprox on a benchmark problem modeling the advection of a tracer in ground water.

More Details

Toward efficient polynomial preconditioning for GMRES

Numerical Linear Algebra with Applications

Loe, Jennifer A.; Morgan, Ronald B.

We present a polynomial preconditioner for solving large systems of linear equations. The polynomial is derived from the minimum residual polynomial (the GMRES polynomial) and is more straightforward to compute and implement than many previous polynomial preconditioners. Our current implementation of this polynomial using its roots is naturally more stable than previous methods of computing the same polynomial. We implement further stability control using added roots, and this allows for high degree polynomials. We discuss the effectiveness and challenges of root-adding and give an additional check for stability. In this article, we study the polynomial preconditioner applied to GMRES; however, it could be used with any Krylov solver. This polynomial preconditioning algorithm can dramatically improve convergence for some problems, especially for difficult problems, and can reduce dot products by an even greater margin.
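Applying such a preconditioner via the roots of the residual polynomial can be sketched as follows. The diagonal test matrix is illustrative, and its eigenvalues are used as the roots so that the polynomial exactly inverts A; in practice the roots would come from an initial GMRES cycle (e.g. as harmonic Ritz values), with the root ordering and added-roots stabilization described above:

```python
import numpy as np

def apply_poly_prec(matvec, roots, v):
    """Apply p(A)v, where p is defined by A p(A) = I - pi(A) and
    pi(z) = prod_i (1 - z / theta_i) is the residual polynomial with
    roots theta_i, using a telescoping product form."""
    y = np.zeros_like(v)
    prod = v.copy()
    for theta in roots:
        y += prod / theta
        prod -= matvec(prod) / theta
    return y

# Diagonal SPD test matrix; with its eigenvalues as the roots, pi(A) = 0,
# so p(A) = A^{-1} exactly and the check below holds to rounding error.
d = np.array([1.0, 2.0, 5.0, 10.0])
A = np.diag(d)
b = np.ones(4)
x = apply_poly_prec(lambda w: A @ w, d, b)
print(np.allclose(x, b / d))
```

The factored form only needs one matrix-vector product per root, which is what makes high-degree polynomials practical.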

More Details

Permutation-adapted complete and independent basis for atomic cluster expansion descriptors

Goff, James M.; Sievers, Charles S.; Wood, Mitchell A.; Thompson, Aidan P.

In many recent applications, particularly in the field of atom-centered descriptors for interatomic potentials, tensor products of spherical harmonics have been used to characterize complex atomic environments. When coupled with a radial basis, the atomic cluster expansion (ACE) basis is obtained. However, symmetrization with respect to both rotation and permutation results in an overcomplete set of ACE descriptors with linear dependencies occurring within blocks of functions corresponding to particular generalized Wigner symbols. All practical applications of ACE employ semi-numerical constructions to generate a complete, fully independent basis. While computationally tractable, the resultant basis cannot be expressed analytically, is susceptible to numerical instability, and thus has limited reproducibility. Here we present a procedure for generating explicit analytic expressions for a complete and independent set of ACE descriptors. The procedure uses a coupling scheme that is maximally symmetric with respect to permutation of the atoms, exposing the permutational symmetries of the generalized Wigner symbols, and yields a permutation-adapted rotationally and permutationally invariant basis (PA-RPI ACE). Theoretical support for the approach is presented, as well as numerical evidence of completeness and independence. A summary of explicit enumeration of PA-RPI functions up to rank 6 and polynomial degree 32 is provided. The PA-RPI blocks corresponding to particular generalized Wigner symbols may be either larger or smaller than the corresponding blocks in the simpler rotationally invariant basis. Finally, we demonstrate that basis functions of high polynomial degree persist under strong regularization, indicating the importance of not restricting the maximum degree of basis functions in ACE models a priori.

More Details

Graph-Based Similarity Metrics for Comparing Simulation Model Causal Structures

Naugle, Asmeret B.; Swiler, Laura P.; Lakkaraju, Kiran L.; Verzi, Stephen J.; Warrender, Christina E.; Romero, Vicente J.

The causal structure of a simulation is a major determinant of both its character and behavior, yet most methods we use to compare simulations focus only on simulation outputs. We introduce a method that combines graphical representation with information theoretic metrics to quantitatively compare the causal structures of models. The method applies to agent-based simulations as well as system dynamics models and facilitates comparison within and between types. Comparing models based on their causal structures can illuminate differences in assumptions made by the models, allowing modelers to (1) better situate their models in the context of existing work, including highlighting novelty, (2) explicitly compare conceptual theory and assumptions to simulated theory and assumptions, and (3) investigate potential causal drivers of divergent behavior between models. We demonstrate the method by comparing two epidemiology models at different levels of aggregation.
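As a much-simplified stand-in for the method, each model's causal structure can be represented as a set of directed edges and scored for structural overlap; the SIR/SEIR edge sets below are illustrative, and Jaccard similarity replaces the paper's information-theoretic metrics:

```python
# Each model's causal structure as a set of directed (cause, effect) links.
sir_edges = {("S", "I"), ("I", "R"), ("I", "I")}
seir_edges = {("S", "E"), ("E", "I"), ("I", "R"), ("I", "I")}

def jaccard(edges_a, edges_b):
    """Fraction of causal links shared by the two model structures."""
    return len(edges_a & edges_b) / len(edges_a | edges_b)

print(jaccard(sir_edges, seir_edges))  # 2 shared of 5 distinct links -> 0.4
```

Even this crude score surfaces an assumption difference: the SEIR model inserts an exposed compartment between S and I, so the S-to-I link present in SIR is absent from SEIR.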

More Details

Selective amorphization of SiGe in Si/SiGe nanostructures via high energy Si+ implant

Journal of Applied Physics

Turner, Emily M.; Campbell, Quinn C.; Avci, Ibrahim A.; Weber, William J.; Lu, Ping L.; Wang, George T.; Jones, Kevin S.

The selective amorphization of SiGe in Si/SiGe nanostructures via a 1 MeV Si+ implant was investigated, resulting in single-crystal Si nanowires (NWs) and quantum dots (QDs) encapsulated in amorphous SiGe fins and pillars, respectively. The Si NWs and QDs are formed during high-temperature dry oxidation of single-crystal Si/SiGe heterostructure fins and pillars, during which Ge diffuses along the nanostructure sidewalls and encapsulates the Si layers. The fins and pillars were then subjected to a 3 × 10¹⁵ ions/cm² 1 MeV Si+ implant, resulting in the amorphization of SiGe, while leaving the encapsulated Si crystalline for larger, 65-nm wide NWs and QDs. Interestingly, the 26-nm diameter Si QDs amorphize, while the 28-nm wide NWs remain crystalline during the same high energy ion implant. This result suggests that the Si/SiGe pillars have a lower threshold for Si-induced amorphization compared to their Si/SiGe fin counterparts. However, Monte Carlo simulations of ion implantation into the Si/SiGe nanostructures reveal similar predicted levels of displacements per cm³. Molecular dynamics simulations suggest that the total stress magnitude in Si QDs encapsulated in crystalline SiGe is higher than the total stress magnitude in Si NWs, which may lead to greater crystalline instability in the QDs during ion implant. The potential lower amorphization threshold of QDs compared to NWs is of special importance to applications that require robust QD devices in a variety of radiation environments.

More Details

Electrostatic Relativistic Fluid Models of Electron Emission in a Warm Diode

IEEE International Conference on Plasma Science (ICOPS)

Hamlin, Nathaniel D.; Smith, Thomas M.; Roberds, Nicholas R.; Glines, Forrest W.; Beckwith, Kristian B.

A semi-analytic fluid model has been developed for characterizing relativistic electron emission across a warm diode gap. Here we demonstrate the use of this model in (i) verifying multi-fluid codes in modeling compressible relativistic electron flows (the EMPIRE-Fluid code is used as an example; see also Ref. 1), (ii) elucidating key physics mechanisms characterizing the influence of compressibility and relativistic injection speed of the electron flow, and (iii) characterizing the regimes over which a fluid model recovers physically reasonable solutions.

More Details

Adaptive experimental design for multi-fidelity surrogate modeling of multi-disciplinary systems

International Journal for Numerical Methods in Engineering

Jakeman, John D.; Friedman, Sam; Eldred, Michael S.; Tamellini, Lorenzo; Gorodetsky, Alex A.; Allaire, Doug

We present an adaptive algorithm for constructing surrogate models of multi-disciplinary systems composed of a set of coupled components. With this goal we introduce “coupling” variables with a priori unknown distributions that allow surrogates of each component to be built independently. Once built, the surrogates of the components are combined to form an integrated-surrogate that can be used to predict system-level quantities of interest at a fraction of the cost of the original model. The error in the integrated-surrogate is greedily minimized using an experimental design procedure that allocates the amount of training data, used to construct each component-surrogate, based on the contribution of those surrogates to the error of the integrated-surrogate. The multi-fidelity procedure presented is a generalization of multi-index stochastic collocation that can leverage ensembles of models of varying cost and accuracy, for one or more components, to reduce the computational cost of constructing the integrated-surrogate. Extensive numerical results demonstrate that, for a fixed computational budget, our algorithm is able to produce surrogates that are orders of magnitude more accurate than methods that treat the integrated system as a black-box.

More Details

Scalable algorithms for physics-informed neural and graph networks

Data-Centric Engineering

Shukla, Khemraj; Xu, Mengjia; Trask, Nathaniel A.; Karniadakis, George E.

Physics-informed machine learning (PIML) has emerged as a promising new approach for simulating complex physical and biological systems that are governed by complex multiscale processes for which some data are also available. In some instances, the objective is to discover part of the hidden physics from the available data, and PIML has been shown to be particularly effective for such problems for which conventional methods may fail. Unlike commercial machine learning where training of deep neural networks requires big data, in PIML big data are not available. Instead, we can train such networks from additional information obtained by employing the physical laws and evaluating them at random points in the space-time domain. Such PIML integrates multimodality and multifidelity data with mathematical models, and implements them using neural networks or graph networks. Here, we review some of the prevailing trends in embedding physics into machine learning, using physics-informed neural networks (PINNs) based primarily on feed-forward neural networks and automatic differentiation. For more complex systems or systems of systems and unstructured data, graph neural networks (GNNs) present some distinct advantages, and here we review how physics-informed learning can be accomplished with GNNs based on graph exterior calculus to construct differential operators; we refer to these architectures as physics-informed graph networks (PIGNs). We present representative examples for both forward and inverse problems and discuss what advances are needed to scale up PINNs, PIGNs and more broadly GNNs for large-scale engineering problems.

More Details

Monolithic Multigrid for a Reduced-Quadrature Discretization of Poroelasticity

SIAM Journal on Scientific Computing

Adler, James A.; He, Yunhui H.; Hu, Xiaozhe H.; MacLachlan, Scott M.; Ohm, Peter B.

Advanced finite-element discretizations and preconditioners for models of poroelasticity have attracted significant attention in recent years. The equations of poroelasticity offer significant challenges in both areas, due to the potentially strong coupling between unknowns in the system, saddle-point structure, and the need to account for wide ranges of parameter values, including limiting behavior such as incompressible elasticity. This paper was motivated by an attempt to develop monolithic multigrid preconditioners for the discretization developed in [C. Rodrigo et al., Comput. Methods Appl. Mech. Engrg., 341 (2018), pp. 467-484]; we show here why this is a difficult task and, as a result, we modify the discretization in [Rodrigo et al.] through the use of a reduced-quadrature approximation, yielding a more “solver-friendly” discretization. Local Fourier analysis is used to optimize parameters in the resulting monolithic multigrid method, allowing a fair comparison between the performance and costs of methods based on Vanka and Braess--Sarazin relaxation. Further, numerical results are presented to validate the local Fourier analysis predictions and demonstrate efficiency of the algorithms. Finally, a comparison to existing block-factorization preconditioners is also given.

More Details

An optimization-based approach to parameter learning for fractional type nonlocal models

Computers and Mathematics with Applications

Burkovska, Olena; Glusa, Christian A.; D'Elia, Marta D.

Nonlocal operators of fractional type are a popular modeling choice for applications that do not adhere to classical diffusive behavior; however, one major challenge in nonlocal simulations is the selection of model parameters. In this work we propose an optimization-based approach to parameter identification for fractional models with an optional truncation radius. We formulate the inference problem as an optimal control problem where the objective is to minimize the discrepancy between observed data and an approximate solution of the model, and the control variables are the fractional order and the truncation length. For the numerical solution of the minimization problem we propose a gradient-based approach, where we enhance the numerical performance by an approximation of the bilinear form of the state equation and its derivative with respect to the fractional order. Several numerical tests in one and two dimensions illustrate the theoretical results and show the robustness and applicability of our method.

More Details

Electronic structure of intrinsic defects in c-gallium nitride: Density functional theory study without the jellium approximation

Physical Review B

Edwards, Arthur H.; Schultz, Peter A.; Dobzynski, Richard M.

We report the first nonjellium, systematic, density functional theory (DFT) study of intrinsic and extrinsic defects and defect levels in zinc-blende (cubic) gallium nitride. We use the local moment counter charge (LMCC) method, the standard Perdew-Burke-Ernzerhof (PBE) exchange-correlation potential, and two pseudopotentials, where the Ga 3d orbitals are either in the core (d0) or explicitly in the valence set (d10). We studied 64, 216, 512, and 1000 atom supercells, and demonstrated convergence to the infinite limit, crucial for delineating deep from shallow states near band edges, and for demonstrating the elimination of finite cell-size errors. Contrary to common claims, we find that exact exchange is not required to obtain defect levels across the experimental band gap. As was true in silicon, silicon carbide, and gallium arsenide, the extremal LMCC defect levels of the aggregate of defects yield an effective LMCC defect band gap that is within 10% of the experimental gap (3.3 eV) for both pseudopotentials. We demonstrate that the gallium vacancy is more complicated than previously reported. There is dramatic metastability: a nearest-neighbor nitrogen atom shifts into the gallium site, forming an antisite, nitrogen vacancy pair, which is more stable than the simple vacancy for positive charge states. Our assessment of the d0 and d10 pseudopotentials yields minimal differences in defect structures and defect levels. The better agreement of the d0 lattice constant with experiment suggests that the more computationally economical d0 pseudopotentials are sufficient to achieve the fidelity possible within the physical accuracy of DFT, and thereby enable calculations in larger supercells necessary to demonstrate convergence with respect to finite size supercell errors.

More Details

Physics-assisted generative adversarial network for X-ray tomography

Optics Express

Guo, Zhen G.; Song, Jung K.; Barbastathis, George B.; Vaughan, Courtenay T.; Larson, Kurt L.; Alpert, Bradley K.; Levine, Zachary L.; Glinsky, Michael E.

X-ray tomography is capable of imaging the interior of objects in three dimensions non-invasively, with applications in biomedical imaging, materials science, electronic inspection, and other fields. The reconstruction process can be an ill-conditioned inverse problem, requiring regularization to obtain satisfactory results. Recently, deep learning has been adopted for tomographic reconstruction. Unlike iterative algorithms which require a distribution that is known a priori, deep reconstruction networks can learn a prior distribution through sampling the training distributions. In this work, we develop a Physics-assisted Generative Adversarial Network (PGAN), a two-step algorithm for tomographic reconstruction. In contrast to previous efforts, our PGAN utilizes maximum-likelihood estimates derived from the measurements to regularize the reconstruction with both known physics and the learned prior. Compared with methods with less physics assisting in training, PGAN can reduce the photon requirement with limited projection angles to achieve a given error rate. The advantages of using a physics-assisted learned prior in X-ray tomography may further enable low-photon nanoscale imaging.

More Details

The Portals 4.3 Network Programming Interface

Schonbein, William W.; Barrett, Brian W.; Brightwell, Ronald B.; Grant, Ryan G.; Hemmert, Karl S.; Pedretti, Kevin P.; Underwood, Keith U.; Riesen, Rolf R.; Hoefler, Torsten H.; Barbe, Mathieu B.; Filho, Luiz H.; Ratchov, Alexandre R.; Maccabe, Arthur B.

This report presents a specification for the Portals 4 network programming interface. Portals 4 is intended to allow scalable, high-performance network communication between nodes of a parallel computing system. Portals 4 is well suited to massively parallel processing and embedded systems. Portals 4 represents an adaptation of the data movement layer developed for massively parallel processing platforms, such as the 4500-node Intel TeraFLOPS machine. Sandia's Cplant cluster project motivated the development of Version 3.0, which was later extended to Version 3.3 as part of the Cray Red Storm machine and XT line. Version 4 is targeted to the next generation of machines employing advanced network interface architectures that support enhanced offload capabilities.

More Details

Asymptotic preserving methods for fluid electron-fluid models in the large magnetic field limit with mathematically guaranteed properties (Final Report)

Tomas, Ignacio T.; Shadid, John N.; Maier, Matthias M.; Salgado, Abner S.

The current manuscript is a final report on the activities carried out under Project LDRD-CIS #226834. In scientific terms, the work reported in this manuscript is a continuation of the efforts started with Project LDRD-express #223796, whose final report of activities is SAND2021-11481; see [83]. In this section we briefly explain what pre-existing developments motivated the current body of work and provide an overview of the activities developed with the funds provided. The overarching goal of the current Project LDRD-CIS #226834 and the previous Project LDRD-express #223796 is the development of numerical methods with mathematically guaranteed properties in order to solve the Euler-Maxwell system of plasma physics and generalizations thereof. Even though Project #223796 laid out general foundations of space and time discretization of the Euler-Maxwell system, overall, it was focused on the development of numerical schemes for purely electrostatic fluid-plasma models. In particular, the project developed a family of schemes with mathematically guaranteed robustness in order to solve the Euler-Poisson model. This model is an asymptotic limit in which only the electrostatic response of the plasma is considered. Its primary feature is the presence of a non-local force, the electrostatic force, which introduces effects with infinite-speed propagation into the problem. Even though instantaneous propagation of perturbations may be considered nonphysical, there are plenty of physical regimes of technical interest where such an approximation is perfectly valid.

More Details

QSCOUT Progress Report, June 2022 [Quantum Scientific Computing Open User Testbed]

Clark, Susan M.; Norris, Haley R.; Landahl, Andrew J.; Yale, Christopher G.; Lobser, Daniel L.; Van Der Wall, Jay W.; Revelle, Melissa R.

Quantum information processing has reached an inflection point, transitioning from proof-of-principle scientific experiments to small, noisy quantum processors. To accelerate this process and eventually move to fault-tolerant quantum computing, it is necessary to provide the scientific community with access to whitebox testbed systems. The Quantum Scientific Computing Open User Testbed (QSCOUT) provides scientists unique access to an innovative system to help advance quantum computing science.

More Details

Theory of the metastable injection-bleached E3c center in GaAs

Physical Review B

Schultz, Peter A.; Hjalmarson, Harold P.

The E3 transition in irradiated GaAs observed in deep level transient spectroscopy (DLTS) was recently discovered in Laplace-DLTS to encompass three distinct components. The component designated E3c was found to be metastable, reversibly bleached under minority carrier (hole) injection, with an introduction rate dependent upon Si doping density. It is shown through first-principles modeling that the E3c must be the intimate Si-vacancy pair, best described as a Si atom sitting in a divacancy, Sivv. The bleached metastable state is enabled by a double site-shifting mechanism: upon recharging, the defect undergoes a second site shift rather than returning to its original E3c-active configuration by reversing the first site shift. Identification of this defect offers insights into the short-time annealing kinetics in irradiated GaAs.

More Details

A Taxonomy of Small Markovian Errors

PRX Quantum

Blume-Kohout, Robin J.; da Silva, Marcus P.; Nielsen, Erik N.; Proctor, Timothy J.; Rudinger, Kenneth M.; Sarovar, Mohan S.; Young, Kevin C.

Errors in quantum logic gates are usually modeled by quantum process matrices (CPTP maps). But process matrices can be opaque and unwieldy. We show how to transform the process matrix of a gate into an error generator that represents the same information more usefully. We construct a basis of simple and physically intuitive elementary error generators, classify them, and show how to represent the error generator of any gate as a mixture of elementary error generators with various rates. Finally, we show how to build a large variety of reduced models for gate errors by combining elementary error generators and/or entire subsectors of generator space. We conclude with a few examples of reduced models, including one with just 9N² parameters that describes almost all commonly predicted errors on an N-qubit processor.
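The basic transformation can be sketched for a single-qubit gate in the Pauli transfer matrix representation; the over-rotated X gate below is an illustrative noise model chosen for this sketch, not an example from the article:

```python
import numpy as np
from scipy.linalg import logm

def ptm_rx(theta):
    """Pauli transfer matrix of an X rotation, basis ordered (I, X, Y, Z)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1.0, 0.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0, 0.0],
                     [0.0, 0.0, c, -s],
                     [0.0, 0.0, s, c]])

eps = 0.01                          # small over-rotation angle (assumed)
G_ideal = ptm_rx(np.pi / 2)
G_noisy = ptm_rx(np.pi / 2 + eps)

# Error generator L = log(G_noisy G_ideal^{-1}); for a pure over-rotation
# it is a single Hamiltonian-type elementary generator with rate eps.
L = np.real(logm(G_noisy @ np.linalg.inv(G_ideal)))
print(L[3, 2])  # ~ eps
```

Reading the rate off a single generator entry is what makes the representation more interpretable than the raw process matrix.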

More Details

A Stochastic Reduced-Order Model for Statistical Microstructure Descriptors Evolution

Journal of Computing and Information Science in Engineering

Tran, Anh; Sun, Jing S.; Liu, Dehao L.; Wang, Yan W.; Wildey, Timothy M.

Integrated computational materials engineering (ICME) models have been a crucial building block for modern materials development, relieving heavy reliance on experiments and significantly accelerating the materials design process. However, ICME models are also computationally expensive, particularly with respect to time integration for dynamics, which hinders the ability to study statistical ensembles and thermodynamic properties of large systems for long time scales. To alleviate the computational bottleneck, we propose to model the evolution of statistical microstructure descriptors as a continuous-time stochastic process using a non-linear Langevin equation, where the probability density function (PDF) of the statistical microstructure descriptors, which are also the quantities of interests (QoIs), is modeled by the Fokker–Planck equation. In this work, we discuss how to calibrate the drift and diffusion terms of the Fokker–Planck equation from the theoretical and computational perspectives. The calibrated Fokker–Planck equation can be used as a stochastic reduced-order model to simulate the evolution of the PDF of the statistical microstructure descriptors. Considering statistical microstructure descriptors in the microstructure evolution as QoIs, we demonstrate our proposed methodology in three ICME models: kinetic Monte Carlo, phase field, and molecular dynamics simulations.
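A minimal version of such a stochastic reduced-order model is a Langevin equation integrated by Euler-Maruyama over an ensemble of paths; the linear drift and constant diffusion below are illustrative placeholders for terms that would be calibrated from ICME data through the Fokker-Planck equation:

```python
import numpy as np

# Euler-Maruyama integration of dx = -k (x - mu) dt + sigma dW;
# k, mu, sigma stand in for calibrated drift/diffusion parameters.
rng = np.random.default_rng(0)
k, mu, sigma = 1.0, 0.3, 0.5
dt, n_steps, n_paths = 0.01, 1000, 10_000

x = np.zeros(n_paths)
for _ in range(n_steps):
    x += -k * (x - mu) * dt + sigma * np.sqrt(dt) * rng.standard_normal(n_paths)

# The ensemble approximates the stationary PDF of the descriptor:
# mean -> mu, variance -> sigma^2 / (2 k) = 0.125.
print(x.mean(), x.var())
```

Evolving an ensemble of cheap scalar paths in place of the full ICME dynamics is exactly the cost reduction the reduced-order model targets.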

More Details

Strain-tuning of transport gaps and semiconductor-to-conductor phase transition in twinned graphene

Acta Materialia

Mendez Granado, Juan P.

We show, through the use of the Landauer-Büttiker (LB) formalism and a tight-binding (TB) model, that the transport gap of twinned graphene can be tuned through the application of a uniaxial strain in the direction normal to the twin boundary. Remarkably, we find that the transport gap Egap bears a square-root dependence on the control parameter ϵx – ϵc, where ϵx is the applied uniaxial strain and ϵc ~ 19% is a critical strain. We interpret this dependence as evidence of criticality underlying a continuous phase transition, with ϵx – ϵc playing the role of control parameter and the transport gap Egap playing the role of order parameter. For ϵx < ϵc, the transport gap is non-zero and the material is a semiconductor, whereas for ϵx > ϵc the transport gap closes to zero and the material becomes a conductor, which evinces a semiconductor-to-conductor phase transition. The computed critical exponent of 1/2 places the transition in the mean-field universality class, which enables far-reaching analogies with other systems in the same class.
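The square-root scaling can be recovered from gap-versus-strain data by a log-log fit; the synthetic data below assume the stated exponent, with an illustrative amplitude and strain grid:

```python
import numpy as np

# Synthetic gap data obeying E_gap = A * sqrt(eps_c - eps) below the
# critical strain (amplitude A and the strain grid are illustrative).
eps_c, A = 0.19, 2.0
eps = np.linspace(0.10, 0.18, 20)
e_gap = A * np.sqrt(eps_c - eps)

# The critical exponent is the slope of log(E_gap) vs log(eps_c - eps).
slope, _ = np.polyfit(np.log(eps_c - eps), np.log(e_gap), 1)
print(slope)  # 0.5, the mean-field order-parameter exponent
```

The same fit applied to computed or measured gaps is a standard way to check which universality class a transition falls in.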

More Details

Entangling-gate error from coherently displaced motional modes of trapped ions

Physical Review A

Ruzic, Brandon R.; Barrick, Todd A.; Hunker, Jeffrey D.; Law, Ryan L.; McFarland, Brian M.; McGuinness, Hayden J.; Parazzoli, L.P.; Sterk, Jonathan D.; Van Der Wall, Jay W.; Stick, Daniel L.

Entangling gates in trapped-ion quantum computers are most often applied to stationary ions with initial motional distributions that are thermal and close to the ground state, while those demonstrations that involve transport generally use sympathetic cooling to reinitialize the motional state prior to applying a gate. Future systems with more ions, however, will face greater nonthermal excitation due to increased amounts of ion transport and exacerbated by longer operational times and variations over the trap array. In addition, pregate sympathetic cooling may be limited due to time costs and laser access constraints. In this paper, we analyze the impact of such coherent motional excitation on entangling-gate error by performing simulations of Mølmer-Sørensen (MS) gates on a pair of trapped-ion qubits with both thermal and coherent excitation present in a shared motional mode at the start of the gate. Here, we quantify how a small amount of coherent displacement erodes gate performance in the presence of experimental noise, and we demonstrate that adjusting the relative phase between the initial coherent displacement and the displacement induced by the gate or using Walsh modulation can suppress this error. We then use experimental data from transported ions to analyze the impact of coherent displacement on MS-gate error under realistic conditions.

More Details

Surrogate modeling for efficiently, accurately and conservatively estimating measures of risk

Reliability Engineering and System Safety

Jakeman, John D.; Kouri, Drew P.; Huerta, Jose G.

We present a surrogate modeling framework for conservatively estimating measures of risk from limited realizations of an expensive physical experiment or computational simulation. Risk measures combine objective probabilities with the subjective values of a decision maker to quantify anticipated outcomes. Given a set of samples, we construct a surrogate model that produces estimates of risk measures that are always greater than their empirical approximations obtained from the training data. These surrogate models limit over-confidence in reliability and safety assessments and produce estimates of risk measures that converge much faster to the true value than purely sample-based estimates. We first detail the construction of conservative surrogate models that can be tailored to a stakeholder's risk preferences and then present an approach, based on stochastic orders, for constructing surrogate models that are conservative with respect to families of risk measures. Our surrogate models include biases that permit them to conservatively estimate the target risk measures. We provide theoretical results that show that these biases decay at the same rate as the L2 error in the surrogate model. Numerical demonstrations confirm that risk-adapted surrogate models do indeed overestimate the target risk measures while converging at the expected rate.
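The conservative-bias idea can be sketched by shifting a least-squares surrogate up by its largest training residual, so that it never under-predicts the training data; the quadratic surrogate and noisy exponential data below are illustrative, and the paper's risk-adapted, stochastic-order construction is considerably more refined:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(-1.0, 1.0, 40))
y = np.exp(x) + 0.05 * rng.standard_normal(40)   # expensive-model samples

coef = np.polyfit(x, y, deg=2)                   # plain least-squares fit
shift = np.max(y - np.polyval(coef, x))          # largest under-prediction
surrogate = np.polyval(coef, x) + shift          # conservatively biased up

# The shifted surrogate dominates the data, so sample-based estimates of
# monotone risk measures (e.g. CVaR) computed from it are conservative.
print(bool(np.all(surrogate >= y - 1e-12)))
```

The constant shift here plays the role of the bias term in the paper, which instead decays at the same rate as the surrogate's L2 error.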

More Details

CrossSim Inference Manual v2.0

Xiao, Tianyao X.; Bennett, Christopher H.; Feinberg, Benjamin F.; Marinella, Matthew J.; Agarwal, Sapan A.

Neural networks are largely based on matrix computations. During forward inference, the most heavily used compute kernel is the matrix-vector multiplication (MVM): Wx. Inference is a first frontier for the deployment of next-generation hardware for neural network applications, as it is more readily deployed in edge devices, such as mobile devices or embedded processors with size, weight, and power constraints. Inference is also easier to implement in analog systems than training, which has more stringent device requirements. The main processing kernel used during inference is the MVM.
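A dense layer's forward pass makes the role of the MVM concrete; the weights and input below are arbitrary illustrative values:

```python
import numpy as np

W1 = np.array([[1.0, 2.0],
               [3.0, 4.0]])
W2 = np.array([[1.0, -1.0]])
x = np.array([1.0, 1.0])

h = np.maximum(W1 @ x, 0.0)   # MVM followed by ReLU
y = W2 @ h                    # final MVM produces the output
print(h, y)  # [3. 7.] [-4.]
```

In an analog accelerator each `W @ x` maps onto a single crossbar read, which is why the MVM dominates the hardware design for inference.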

More Details

A primal–dual algorithm for risk minimization

Mathematical Programming

Kouri, Drew P.; Surowiec, Thomas M.

In this paper, we develop an algorithm to efficiently solve risk-averse optimization problems posed in reflexive Banach space. Such problems often arise in many practical applications as, e.g., optimization problems constrained by partial differential equations with uncertain inputs. Unfortunately, for many popular risk models including the coherent risk measures, the resulting risk-averse objective function is nonsmooth. This lack of differentiability complicates the numerical approximation of the objective function as well as the numerical solution of the optimization problem. To address these challenges, we propose a primal–dual algorithm for solving large-scale nonsmooth risk-averse optimization problems. This algorithm is motivated by the classical method of multipliers and by epigraphical regularization of risk measures. As a result, the algorithm solves a sequence of smooth optimization problems using derivative-based methods. We prove convergence of the algorithm even when the subproblems are solved inexactly and conclude with numerical examples demonstrating the efficiency of our method.

More Details

Mixed precision s-step Lanczos and conjugate gradient algorithms

Numerical Linear Algebra with Applications

Carson, Erin; Gergelits, Tomáš; Yamazaki, Ichitaro Y.

Compared to the classical Lanczos algorithm, the s-step Lanczos variant has the potential to improve performance by asymptotically decreasing the synchronization cost per iteration. However, this comes at a price; despite being mathematically equivalent, the s-step variant may behave quite differently in finite precision, potentially exhibiting greater loss of accuracy and slower convergence relative to the classical algorithm. It has previously been shown that the errors in the s-step version follow the same structure as the errors in the classical algorithm, but are amplified by a factor depending on the square of the condition number of the s-step Krylov bases computed in each outer loop. As the condition number of these s-step bases grows (in some cases very quickly) with s, this limits the s values that can be chosen and thus can limit the attainable performance. In this work, we show that if a select few computations in s-step Lanczos are performed in double the working precision, the error terms then depend only linearly on the conditioning of the s-step bases. This has the potential for drastically improving the numerical behavior of the algorithm with little impact on per-iteration performance. Our numerical experiments demonstrate the improved numerical behavior possible with the mixed precision approach, and also show that this improved behavior extends to mixed precision s-step CG. We present preliminary performance results on NVIDIA V100 GPUs that show that the overhead of extra precision is minimal if one uses precisions implemented in hardware.
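The conditioning issue that motivates the mixed precision fix is easy to observe directly; the diagonal test matrix and monomial s-step basis below are illustrative (practical implementations use better-conditioned polynomial bases, e.g. Newton or Chebyshev, but the growth trend with s is the same):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
A = np.diag(np.linspace(1.0, 100.0, n))   # illustrative SPD spectrum
v = rng.standard_normal(n)

def krylov_cond(s):
    """Condition number of the normalized monomial basis [v, Av, ..., A^s v]."""
    V = np.empty((n, s + 1))
    w = v.copy()
    for j in range(s + 1):
        V[:, j] = w / np.linalg.norm(w)
        w = A @ w
    return np.linalg.cond(V)

for s in (2, 4, 8):
    print(s, krylov_cond(s))   # grows rapidly with s
```

Since the classical error bound scales with the square of this condition number, the mixed precision result, which reduces the dependence to linear, directly enlarges the usable range of s.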

More Details

Low-order preconditioning of the Stokes equations

Numerical Linear Algebra with Applications

Voronin, Alexey; He, Yunhui; MacLachlan, Scott; Olson, Luke N.; Tuminaro, Raymond S.

A well-known strategy for building effective preconditioners for higher-order discretizations of some PDEs, such as Poisson's equation, is to leverage effective preconditioners for their low-order analogs. In this work, we show that high-quality preconditioners can also be derived for the Taylor–Hood discretization of the Stokes equations in much the same manner. In particular, we investigate the use of geometric multigrid based on a low-order discretization of the Stokes operator as a preconditioner for the higher-order Taylor–Hood discretization of the Stokes system. We utilize local Fourier analysis to optimize the damping parameters for Vanka and Braess–Sarazin relaxation schemes and to achieve robust convergence. These results are then verified and compared against the measured multigrid performance. While geometric multigrid can be applied directly to the higher-order system, our ultimate motivation is to apply algebraic multigrid within solvers for the Taylor–Hood system via the low-order discretization, which will be considered in a companion paper.
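The low-order-preconditions-high-order idea can be sketched in one dimension: a fourth-order finite-difference Laplacian A = L + L²/12 (SPD by construction, since its eigenvalues are λ + λ²/12 > 0) is preconditioned by the standard second-order operator L inside conjugate gradients. This is a finite-difference toy, not the Stokes/Taylor–Hood setting of the paper; all names and sizes are illustrative.

```python
import numpy as np

def lap2(n):
    # standard second-order 1D Laplacian (Dirichlet), stencil (-1, 2, -1)
    return (np.diag(2.0 * np.ones(n))
            - np.diag(np.ones(n - 1), 1) - np.diag(np.ones(n - 1), -1))

def pcg(A, b, Minv=None, tol=1e-8, maxit=2000):
    # preconditioned conjugate gradients; returns solution and iteration count
    x = np.zeros_like(b)
    r = b.copy()
    z = Minv(r) if Minv else r.copy()
    p = z.copy()
    rz = r @ z
    for k in range(1, maxit + 1):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, k
        z = Minv(r) if Minv else r.copy()
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, maxit

n = 256
L = lap2(n)
A = L + (L @ L) / 12.0       # fourth-order long-stencil Laplacian, SPD
b = np.random.default_rng(0).standard_normal(n)
x_plain, it_plain = pcg(A, b)                                      # no preconditioner
x_prec, it_prec = pcg(A, b, Minv=lambda r: np.linalg.solve(L, r))  # low-order preconditioner
print(it_plain, it_prec)
```

Because L⁻¹A has eigenvalues 1 + λ/12 ∈ (1, 4/3), the preconditioned iteration count is bounded independently of n, while unpreconditioned CG degrades as the mesh is refined; the same spectral-equivalence reasoning underlies low-order preconditioning of higher-order discretizations generally.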

More Details

A Hybrid Method for Tensor Decompositions that Leverages Stochastic and Deterministic Optimization

Myers, Jeremy M.; Dunlavy, Daniel D.

In this paper, we propose a hybrid method that uses stochastic and deterministic search to compute the maximum likelihood estimator of a low-rank count tensor with Poisson loss via state-of-the-art local methods. Our approach is inspired by Simulated Annealing for global optimization and allows for fine-grain parameter tuning as well as adaptive updates to algorithm parameters. We present numerical results that indicate our hybrid approach can compute better approximations to the maximum likelihood estimator with less computation than the state-of-the-art methods by themselves.
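The general shape of such a hybrid, stochastic jumps accepted by a Metropolis criterion with deterministic local refinement of each candidate, can be sketched on a toy 1D objective. This is a generic simulated-annealing-flavored hybrid for illustration, not the authors' tensor-decomposition algorithm; the objective, cooling schedule, and parameters are all hypothetical.

```python
import numpy as np

def f(x):
    # toy multimodal objective with global minimum near x = -1.30
    return x**4 - 3.0 * x**2 + x

def fp(x):
    return 4.0 * x**3 - 6.0 * x + 1.0

def local_refine(x, lr=0.01, iters=200):
    # deterministic local search: plain gradient descent from x
    for _ in range(iters):
        x -= lr * fp(x)
    return x

def hybrid_minimize(n_outer=50, T0=2.0, seed=0):
    rng = np.random.default_rng(seed)
    x = local_refine(rng.uniform(-2.0, 2.0))
    best_x, best_f = x, f(x)
    T = T0
    for _ in range(n_outer):
        # stochastic jump, then deterministic polish of the candidate
        cand = local_refine(x + rng.normal(scale=1.0))
        # Metropolis acceptance: always take improvements, sometimes take worse moves
        if f(cand) < f(x) or rng.random() < np.exp(-(f(cand) - f(x)) / T):
            x = cand
        if f(x) < best_f:
            best_x, best_f = x, f(x)
        T *= 0.9                # geometric cooling schedule
    return best_x, best_f

print(hybrid_minimize())
```

The stochastic jumps let the search escape the shallow local minimum near x ≈ 1.13, while the deterministic refinement drives each candidate to high accuracy, the same division of labor the abstract describes between global stochastic search and state-of-the-art local methods.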

More Details

In Their Shoes: Persona-Based Approaches to Software Quality Practice Incentivization

Computing in Science and Engineering

Mundt, Miranda R.; Milewicz, Reed M.; Raybourn, Elaine M.

Many teams struggle to adapt and right-size software engineering best practices for quality assurance to fit their context. Introducing software quality is not usually framed in a way that motivates teams to take action, thus resulting in it becoming a “check the box for compliance” activity instead of a cultural practice that values software quality and the effort to achieve it. When and how can we provide effective incentives for software teams to adopt and integrate meaningful and enduring software quality practices? Here, we explored this question through a persona-based ideation exercise at the 2021 Collegeville Workshop on Scientific Software in which we created three unique personas that represent different scientific software developer perspectives.

More Details

The Ground Truth Program: Simulations as Test Beds for Social Science Research Methods.

Computational and Mathematical Organization Theory

Naugle, Asmeret B.; Russell, Adam R.; Lakkaraju, Kiran L.; Swiler, Laura P.; Verzi, Stephen J.; Romero, Vicente J.

Social systems are uniquely complex and difficult to study, but understanding them is vital to solving the world’s problems. The Ground Truth program developed a new way of testing the research methods that attempt to understand and leverage the Human Domain and its associated complexities. The program developed simulations of social systems as virtual world test beds. Not only were these simulations able to produce data on future states of the system under various circumstances and scenarios, but their causal ground truth was also explicitly known. Research teams studied these virtual worlds, facilitating deep validation of causal inference, prediction, and prescription methods. The Ground Truth program model provides a way to test and validate research methods to an extent previously impossible, and to study the intricacies and interactions of different components of research.

More Details

An Accurate, Error-Tolerant, and Energy-Efficient Neural Network Inference Engine Based on SONOS Analog Memory

IEEE Transactions on Circuits and Systems I: Regular Papers

Xiao, T.P.; Feinberg, Benjamin F.; Bennett, Christopher H.; Agrawal, Vineet; Saxena, Prashant; Prabhakar, Venkatraman; Ramkumar, Krishnaswamy; Medu, Harsha; Raghavan, Vijay; Chettuvetty, Ramesh; Agarwal, Sapan A.; Marinella, Matthew J.

We demonstrate SONOS (silicon-oxide-nitride-oxide-silicon) analog memory arrays that are optimized for neural network inference. The devices are fabricated in a 40nm process and operated in the subthreshold regime for in-memory matrix multiplication. Subthreshold operation enables low conductances to be implemented with low error, matching the typical weight distribution of neural networks, which is heavily skewed toward near-zero values. This leads to high accuracy in the presence of programming errors and process variations. We simulate the end-to-end neural network inference accuracy, accounting for the measured programming error, read noise, and retention loss in a fabricated SONOS array. Evaluated on the ImageNet dataset using ResNet50, the accuracy using a SONOS system is within 2.16% of floating-point accuracy without any retraining. The unique error properties and high on/off ratio of the SONOS device allow scaling to large arrays without bit slicing, and enable an inference architecture that achieves 20 TOPS/W on ResNet50, a >10× gain in energy efficiency over state-of-the-art digital and analog inference accelerators.
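The interaction between a near-zero-skewed weight distribution and conductance-proportional programming error can be sketched with a simple Monte Carlo model. The noise model below (per-cell error scaling with conductance magnitude plus a small floor) is a hypothetical stand-in for illustration only, not the measured SONOS error characteristics from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 512
W = rng.laplace(scale=0.05, size=(n, n))   # NN-like weights, sharply peaked near zero
x = rng.standard_normal(n)

def analog_matvec(W, x, err_frac=0.02, err_floor=1e-4):
    # hypothetical programming-error model: per-cell Gaussian noise whose
    # standard deviation scales with conductance magnitude, plus a floor
    W_prog = W + rng.normal(scale=err_frac * np.abs(W) + err_floor)
    return W_prog @ x

y_ideal = W @ x
y_analog = analog_matvec(W, x)
rel_err = np.linalg.norm(y_analog - y_ideal) / np.linalg.norm(y_ideal)
print(rel_err)
```

Because most weights are near zero, most cells sit at low conductance where (under this model) the absolute error is small, so the matrix-vector product stays accurate, which is the qualitative effect the abstract attributes to subthreshold operation.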

More Details

Sensitivity Analysis for Solutions to Heterogeneous Nonlocal Systems. Theoretical and Numerical Studies

Journal of Peridynamics and Nonlocal Modeling

Buczkowski, Nicole E.; Foss, Mikil D.; Parks, Michael L.; Radu, Petronela R.

The paper presents a collection of results on continuous dependence for solutions to nonlocal problems under perturbations of data and system parameters. The integral operators appearing in the systems capture interactions via heterogeneous kernels that exhibit different types of weak singularities, space dependence, and even regions of zero interaction. Here, the stability results showcase explicit bounds involving the measure of the domain and of the interaction collar size, the nonlocal Poincaré constant, and other parameters. In the nonlinear setting, the bounds quantify in different Lp norms the sensitivity of solutions under different nonlinearity profiles. The results are validated by numerical simulations showcasing discontinuous solutions, varying horizons of interaction, and symmetric and heterogeneous kernels.

More Details

Self-Induced Curvature in an Internally Loaded Peridynamic Fiber

Silling, Stewart A.

A straight fiber with nonlocal forces that are independent of bond strain is considered. These internal loads can either stabilize or destabilize the straight configuration. Transverse waves with long wavelength have unstable dispersion properties for certain combinations of nonlocal kernels and internal loads. When these unstable waves occur, deformation of the straight fiber into a circular arc can lower its potential energy in equilibrium. The equilibrium value of the radius of curvature is computed explicitly.

More Details

Quantitative Performance Assessment of Proxy Apps and Parents (Report for ECP Proxy App Project Milestone ADCD-504-28)

Cook, Jeanine C.; Aaziz, Omar R.; Chen, Si C.; Godoy, William F.; Powell, Amy J.; Watson, Gregory W.; Vaughan, Courtenay T.; Wildani, Avani W.

The ECP Proxy Application Project has an annual milestone to assess the state of ECP proxy applications and their role in the overall ECP ecosystem. Our FY22 March/April milestone (ADCD-504-28) proposed to: assess the fidelity of proxy applications compared to their respective parents in terms of kernel and I/O behavior, and predictability. Similarity techniques will be applied for quantitative comparison of proxy/parent kernel behavior. MACSio evaluation will continue, and support for OpenPMD backends will be explored. The execution time predictability of proxy apps with respect to their parents will be explored through a carefully designed scaling study and code comparisons. Note that in this FY we also have quantitative assessment milestones that are due in September and are therefore not included in the description above or in this report; another report on those deliverables will be generated and submitted upon their completion. To satisfy this milestone, the following specific tasks were completed:

- Study the ability of MACSio to represent I/O workloads of adaptive mesh codes.
- Re-define the performance counter groups for contemporary Intel and IBM platforms to better match specific hardware components and to better align across platforms, making cross-platform comparison more accurate.
- Perform a cosine similarity study based on the new performance counter groups on the Intel and IBM P9 platforms.
- Perform detailed analysis of performance counter data to accurately average and align the data, maintaining phases across all executions, and develop methods to reduce the set of collected performance counters used in cosine similarity analysis.
- Apply a quantitative similarity comparison between proxy and parent CPU kernels.
- Perform scaling studies to understand how accurately parent performance can be predicted from the respective proxy application.

This report presents highlights of these efforts.
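The kernel-similarity comparison described above reduces to cosine similarity between performance-counter vectors collected for proxy and parent applications. A minimal sketch, where the counter groupings and values are made up for illustration, not measured data:

```python
import numpy as np

def cosine_similarity(a, b):
    # cosine of the angle between two counter vectors: 1.0 means identical
    # direction (same relative counter mix), values near 0 mean dissimilar
    return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

# hypothetical normalized counter-group vectors (e.g., cache, branch,
# FLOP, and memory-bandwidth groups) for a proxy, its parent, and an
# unrelated application
proxy  = np.array([0.82, 0.10, 0.05, 0.03])
parent = np.array([0.78, 0.12, 0.06, 0.04])
other  = np.array([0.10, 0.60, 0.25, 0.05])

print(cosine_similarity(proxy, parent))   # close to 1
print(cosine_similarity(proxy, other))    # much lower
```

Grouping raw counters into hardware-component categories before computing similarity, as the milestone describes, keeps the comparison meaningful across platforms whose raw counter sets differ.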

More Details

Kokkos 3: Programming Model Extensions for the Exascale Era

IEEE Transactions on Parallel and Distributed Systems

Trott, Christian R.; Lebrun-Grandie, Damien; Arndt, Daniel; Ciesko, Jan; Dang, Vinh Q.; Ellingwood, Nathan D.; Gayatri, Rahulkumar; Harvey, Evan C.; Hollman, Daisy S.; Ibanez, Dan; Liber, Nevin; Madsen, Jonathan; Miles, Jeff; Poliakoff, David Z.; Powell, Amy J.; Rajamanickam, Sivasankaran R.; Simberg, Mikael; Sunderland, Dan; Turcksin, Bruno; Wilke, Jeremiah

As the push towards exascale hardware has increased the diversity of system architectures, performance portability has become a critical aspect for scientific software. We describe the Kokkos Performance Portable Programming Model that allows developers to write single source applications for diverse high-performance computing architectures. Kokkos provides key abstractions for both the compute and memory hierarchy of modern hardware. We describe the novel abstractions that have been added to Kokkos version 3 such as hierarchical parallelism, containers, task graphs, and arbitrary-sized atomic operations to prepare for exascale era architectures. We demonstrate the performance of these new features with reproducible benchmarks on CPUs and GPUs.

More Details