Publications


Combining Physics and Machine Learning for the Next Generation of Molecular Simulation

Rackers, Joshua R.

Simulating molecules and atomic systems at quantum accuracy is a grand challenge for science in the 21st century. Quantum-accurate simulations would enable the design of new medicines and the discovery of new materials. The defining problem in this challenge is that quantum calculations on large molecules, like proteins or DNA, are fundamentally impossible with current algorithms. In this work, we explore a range of different methods that aim to make large, quantum-accurate simulations possible. We show that using advanced classical models, we can accurately simulate ion channels, an important biomolecular system. We show how advanced classical models can be implemented in an exascale-ready software package. Lastly, we show how machine learning can learn the laws of quantum mechanics from data and enable quantum electronic structure calculations on thousands of atoms, a feat that is impossible for current algorithms. Altogether, this work shows that by combining advances in physics models, computing, and machine learning, we are moving closer to the reality of accurately simulating our molecular world.


AI-enhanced Codesign for Next-Generation Neuromorphic Circuits and Systems

Cardwell, Suma G.; Smith, John D.; Crowder, Douglas C.

This report details work that was completed to address the Fiscal Year 2022 Advanced Science and Technology (AS&T) Laboratory Directed Research and Development (LDRD) call for “AI-enhanced Co-Design of Next Generation Microelectronics.” This project required concurrent contributions from the fields of 1) materials science, 2) devices and circuits, 3) physics of computing, and 4) algorithms and system architectures. During this project, we developed AI-enhanced circuit design methods that relied on reinforcement learning and evolutionary algorithms. The AI-enhanced design methods were tested on neuromorphic circuit design problems that have real-world applications related to Sandia’s mission needs. The developed methods enable the design of circuits, including circuits that are built from emerging devices, and they were also extended to enable novel device discovery. We expect that these AI-enhanced design methods will accelerate progress towards developing next-generation, high-performance neuromorphic computing systems.


Using ultrasonic attenuation in cortical bone to infer distributions on pore size

Applied Mathematical Modelling

White, Rebekah D.; Alexanderian, A.; Yousefian, O.; Karbalaeisadegh, Y.; Bekele-Maxwell, K.; Kasali, A.; Banks, H.T.; Talmant, M.; Grimal, Q.; Muller, M.

In this work we infer the underlying distribution on pore radius in human cortical bone samples using ultrasonic attenuation data. We first discuss how to formulate polydisperse attenuation models using a probabilistic approach and the Waterman–Truell model for scattering attenuation. We then compare the Independent Scattering Approximation and the higher-order Waterman–Truell models’ forward predictions for total attenuation in polydisperse samples. Following this, we formulate an inverse problem under the Prohorov Metric Framework coupled with variational regularization to stabilize this inverse problem. We then use experimental attenuation data taken from human cadaver samples and solve inverse problems resulting in nonparametric estimates of the probability density function on pore radius. We compare these estimates to the “true” microstructure of the bone samples determined via microCT imaging. We find that our methodology allows us to reliably estimate the underlying microstructure of the bone from attenuation data.
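
As a rough sketch of the polydisperse formulation (my construction, not the authors' code): the attenuation of a sample with a distribution of pore radii can be computed as a PDF-weighted average of a monodisperse attenuation kernel. The power-law kernel alpha_mono below is a hypothetical stand-in for the Waterman–Truell prediction, and all magnitudes are illustrative.

```python
import numpy as np

def polydisperse_attenuation(freq, radii, pdf, alpha_mono):
    """PDF-weighted average of a monodisperse attenuation kernel."""
    dr = radii[1] - radii[0]
    alpha = np.array([alpha_mono(freq, a) for a in radii])  # (n_radii, n_freq)
    return np.sum(pdf[:, None] * alpha, axis=0) * dr        # quadrature over radius

# Hypothetical long-wavelength kernel, alpha ~ f^4 a^3, standing in for the
# Waterman–Truell prediction used in the paper.
alpha_mono = lambda f, a: 1e-3 * (f / 1e6) ** 4 * (a / 1e-6) ** 3

radii = np.linspace(5e-6, 60e-6, 200)               # pore radii [m]
pdf = np.exp(-0.5 * ((radii - 25e-6) / 8e-6) ** 2)  # Gaussian guess for the density
pdf /= np.sum(pdf) * (radii[1] - radii[0])          # normalize to unit mass
freq = np.linspace(1e6, 8e6, 50)                    # frequencies [Hz]
print(polydisperse_attenuation(freq, radii, pdf, alpha_mono)[:3])
```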


Progress in Modeling the 2019 Extended Magnetically Insulated Transmission Line (MITL) and Courtyard Environment Trial at HERMES-III

Cartwright, Keith C.; Pointon, Tim P.; Powell, Troy C.; Grabowski, Theodore C.; Shields, Sidney S.; Sirajuddin, David S.; Jensen, Daniel S.; Renk, Timothy J.; Cyr, Eric C.; Stafford, David S.; Swan, Matthew S.; Mitra, Sudeep M.; McDoniel, William M.; Moore, Christopher H.

This report documents the progress made in simulating the HERMES-III Magnetically Insulated Transmission Line (MITL) and courtyard with EMPIRE and ITS. This study focuses on the shots taken during June and July of 2019 with the new MITL extension. A few of these shots (11132, 11133, 11134, 11135, 11136, and 11146) included dose mapping of the courtyard; this report focuses on them because they provided full data return from the MITL electrical diagnostics and the radiation dose sensors in the courtyard. The comparison starts with improving the processing of the incoming voltage from the experiment into the EMPIRE simulation. The currents are then compared at several locations along the MITL. The simulation results of the electrons impacting the anode are shown. The electron impact energies and angles are then handed off to ITS, which calculates the dose on the faceplate and at locations in the courtyard; these doses are compared to experimental measurements. ITS also calculates the photons and electrons that are injected into the courtyard, and these quantities are then used by EMPIRE to calculate the photon and electron transport in the courtyard. The details of the algorithms used to perform the courtyard simulations are presented, as well as qualitative comparisons of the electric field, magnetic field, and conductivity in the courtyard. Because of the computational burden of these calculations, the pressure in the courtyard was reduced to lower the computational load. The computational performance is presented, along with suggestions on how to improve both the computational and the algorithmic performance. Some of the algorithmic changes would reduce the accuracy of the models, and detailed comparisons of these changes are left for a future study. In addition to the list of code improvements, there is also a list of suggested experimental improvements to improve the quality of the data return.


Resilience Enhancements through Deep Learning Yields

Eydenberg, Michael S.; Batsch-Smith, Lisa B.; Bice, Charles T.; Blakely, Logan; Bynum, Michael L.; Boukouvala, Fani B.; Castillo, Anya C.; Haddad, Joshua H.; Hart, William E.; Jalving, Jordan H.; Kilwein, Zachary A.; Laird, Carl D.; Skolfield, Joshua K.

This report documents the Resilience Enhancements through Deep Learning Yields (REDLY) project, a three-year effort to improve electrical grid resilience by developing scalable methods for system operators to protect the grid against threats leading to interrupted service or physical damage. The computational complexity and uncertain nature of current real-world contingency analysis present significant barriers to automated, real-time monitoring. While there has been a significant push to explore the use of accurate, high-performance machine learning (ML) model surrogates to address this gap, their reliability is unclear when deployed in high-consequence applications such as power grid systems. Contemporary optimization techniques used to validate surrogate performance can exploit ML model prediction errors, which necessitates the verification of worst-case performance for the models.


Improving Predictive Capability in REHEDS Simulations with Fast, Accurate, and Consistent Non-Equilibrium Material Properties

Hansen, Stephanie B.; Baczewski, Andrew D.; Gomez, T.A.; Hentschel, T.W.; Jennings, Christopher A.; Kononov, Alina K.; Nagayama, Taisuke N.; Adler, Kelsey A.; Cangi, A.C.; Cochrane, Kyle C.; Schleife, A.

Predictive design of REHEDS experiments with radiation-hydrodynamic simulations requires knowledge of material properties (e.g. equations of state (EOS), transport coefficients, and radiation physics). Interpreting experimental results requires accurate models of diagnostic observables (e.g. detailed emission, absorption, and scattering spectra). In conditions of Local Thermodynamic Equilibrium (LTE), these material properties and observables can be pre-computed with relatively high accuracy and subsequently tabulated on simple temperature-density grids for fast look-up by simulations. When radiation and electron temperatures fall out of equilibrium, however, non-LTE effects can profoundly change material properties and diagnostic signatures. Accurately and efficiently incorporating these non-LTE effects has been a longstanding challenge for simulations. At present, most simulations include non-LTE effects by invoking highly simplified inline models. These inline non-LTE models are both much slower than table look-up and significantly less accurate than the detailed models used to populate LTE tables and diagnose experimental data through post-processing or inversion. Because inline non-LTE models are slow, designers avoid them whenever possible, which leads to known inaccuracies from using tabular LTE. Because inline models are simple, they are inconsistent with tabular data from detailed models, leading to ill-known inaccuracies, and they cannot generate detailed synthetic diagnostics suitable for direct comparisons with experimental data. This project addresses the challenge of generating and utilizing efficient, accurate, and consistent non-equilibrium material data along three complementary but relatively independent research lines. First, we have developed a relatively fast and accurate non-LTE average-atom model based on density functional theory (DFT) that provides a complete set of EOS, transport, and radiative data, and have rigorously tested it against more sophisticated first-principles multi-atom DFT models, including time-dependent DFT. Next, we have developed a tabular scheme and interpolation methods that compactly capture non-LTE effects for use in simulations and have implemented these tables in the GORGON magneto-hydrodynamic (MHD) code. Finally, we have developed post-processing tools that use detailed tabulated non-LTE data to directly predict experimental observables from simulation output.


Adaptive Space-Time Methods for Large Scale Optimal Design

DiPietro, Kelsey L.; Ridzal, Denis R.; Morales, Diana M.

When modeling complex physical systems with advanced dynamics, such as shocks and singularities, many classic methods for solving partial differential equations can return inaccurate or unusable results. One way to resolve these complex dynamics is through r-adaptive refinement methods, in which a fixed number of mesh points are shifted to areas of high interest. The mesh refinement map can be found through the solution of the Monge–Ampère equation, a highly nonlinear partial differential equation. Due to its nonlinearity, the numerical solution of the Monge–Ampère equation is nontrivial and has previously required computationally expensive methods. In this report, we detail our novel optimization-based, multigrid-enabled solver for a low-order finite element approximation of the Monge–Ampère equation. This fast and scalable solver makes r-adaptive meshing more readily available for problems related to large-scale optimal design. Beyond mesh adaptivity, our report discusses additional applications where our fast solver for the Monge–Ampère equation could be easily applied.
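
For reference, one common form of the Monge–Ampère problem used for r-adaptive meshing (a generic statement under assumed notation, not necessarily the exact formulation solved in the report) writes the mesh map as the gradient of a convex potential phi that equidistributes a monitor function rho over the computational domain Omega_c:

```latex
\rho\!\left(\nabla\phi(\xi)\right)\,\det\!\left(D^{2}\phi(\xi)\right) = \theta,
\qquad x = \nabla\phi(\xi),
\qquad \theta = \frac{1}{|\Omega_c|}\int_{\Omega}\rho\,dx .
```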


Fluid-Kinetic Coupling: Advanced Discretizations for Simulations on Emerging Heterogeneous Architectures (LDRD FY20-0643)

Roberts, Nathan V.; Bond, Stephen D.; Miller, Sean A.; Cyr, Eric C.

Plasma physics simulations are vital for a host of Sandia mission concerns, for fundamental science, and for clean energy in the form of fusion power. Sandia's most mature plasma physics simulation capabilities come in the form of particle-in-cell (PIC) models and magnetohydrodynamics (MHD) models. MHD models for a plasma work well in denser plasma regimes when there is enough material that the plasma approximates a fluid. PIC models, on the other hand, work well in lower-density regimes, in which there is not too much to simulate; error in PIC scales as the inverse square root of the number of particles, making high-accuracy simulations expensive. Real-world applications, however, almost always involve a transition region between the high-density regimes where MHD is appropriate, and the low-density regimes for PIC. In such a transition region, a direct discretization of Vlasov is appropriate. Such discretizations come with their own computational costs, however; the phase-space mesh for Vlasov can involve up to six dimensions (seven if time is included), and applying appropriate homogeneous boundary conditions in velocity space requires meshing a substantial padding region to ensure that the distribution remains sufficiently close to zero at the velocity boundaries. Moreover, for collisional plasmas, the right-hand side of the Vlasov equation is a collision operator, which is non-local in velocity space, and which may dominate the cost of the Vlasov solver. The present LDRD project endeavors to develop modern, foundational tools for the development of continuum-kinetic Vlasov solvers, using the discontinuous Petrov-Galerkin (DPG) methodology for discretization of Vlasov, and machine-learning (ML) models to enable efficient evaluation of collision operators. DPG affords several key advantages. First, it has a built-in, robust error indicator, allowing us to adapt the mesh in a very natural way, enabling a coarse velocity-space mesh near the homogeneous boundaries, and a fine mesh where the solution has fine features. Second, it is an inherently high-order, high-intensity method, requiring extra local computations to determine so-called optimal test functions, which makes it particularly suited to modern hardware in which floating-point throughput is increasing at a faster rate than memory bandwidth. Finally, DPG is a residual-minimizing method, which enables high-accuracy computation: in typical cases, the method delivers something very close to the $L^2$ projection of the exact solution. Meanwhile, the ML-based collision model we adopt affords a cost structure that scales as the square root of the cost of a standard direct evaluation. Moreover, we design our model to conserve mass, momentum, and energy by construction, and our approach to training is highly flexible, in that it can incorporate not only synthetic data from direct-simulation Monte Carlo (DSMC) codes, but also experimental data. We have developed two DPG formulations for Vlasov-Poisson: a time-marching, backward-Euler discretization and a space-time discretization. We have conducted a number of numerical experiments to verify the approach in a 1D1V setting. In this report, we detail these formulations and experiments.
We also summarize some new theoretical results developed as part of this project (published as papers previously): some new analysis of DPG for the convection-reaction problem (of which the Vlasov equation is an instance), a new exponential integrator for DPG, and some numerical exploration of various DPG-based time-marching approaches to the heat equation. As part of this work, we have contributed extensively to the Camellia open-source library; we also describe the new capabilities and their usage. We have also developed a well-documented methodology for single-species collision operators, which we applied to argon and demonstrated with numerical experiments. We summarize those results here, as well as describing at a high level a design extending the methodology to multi-species operators. We have released a new open-source library, MLC, under a BSD license; we include a summary of its capabilities as well.
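
For context, the continuum-kinetic equation being discretized is the Vlasov equation; in its electrostatic (Vlasov-Poisson) form with a collision operator on the right-hand side it reads (a standard statement, with notation assumed here):

```latex
\partial_t f + v\cdot\nabla_x f + \frac{q}{m}\,E\cdot\nabla_v f = C[f],
\qquad f = f(x,v,t).
```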


Modeling Analog Tile-Based Accelerators Using SST

Feinberg, Benjamin F.; Agarwal, Sapan A.; Plagge, Mark P.; Rothganger, Fredrick R.; Cardwell, Suma G.; Hughes, Clayton H.

Analog computing has been widely proposed to improve the energy efficiency of multiple important workloads including neural network operations and other linear algebra kernels. To properly evaluate analog computing and explore more complex workloads such as systems consisting of multiple analog data paths, system level simulations are required. Moreover, prior work on system architectures for analog computing often relies on custom simulators, creating significant additional design effort and complicating comparisons between different systems. To remedy these issues, this report describes the design and implementation of a flexible tile-based analog accelerator element for the Structural Simulation Toolkit (SST). The element focuses heavily on the tile controller, an often neglected aspect of prior work, which is sufficiently versatile to simulate a wide range of different tile operations including neural network layers, signal processing kernels, and generic linear algebra operations without major constraints. The tile model also interoperates with existing SST memory and network models to reduce the overall development load and enable future simulation of heterogeneous systems with both conventional digital logic and analog compute tiles. Finally, both the tile and array models are designed to easily support future extensions as new analog operations and applications that can benefit from analog computing are developed.


Global Sensitivity Analysis Using the Ultra-Low Resolution Energy Exascale Earth System Model

Journal of Advances in Modeling Earth Systems

Kalashnikova, Irina; Peterson, Kara J.; Powell, Amy J.; Jakeman, John D.; Roesler, Erika L.

For decades, Arctic temperatures have increased twice as fast as average global temperatures. As a first step towards quantifying parametric uncertainty in Arctic climate, we performed a variance-based global sensitivity analysis (GSA) using a fully-coupled, ultra-low resolution (ULR) configuration of version 1 of the U.S. Department of Energy’s Energy Exascale Earth System Model (E3SMv1). Specifically, we quantified the sensitivity of six quantities of interest (QOIs), which characterize changes in Arctic climate over a 75 year period, to uncertainties in nine model parameters spanning the sea ice, atmosphere and ocean components of E3SMv1. Sensitivity indices for each QOI were computed with a Gaussian process emulator using 139 random realizations of the random parameters and fixed pre-industrial forcing. Uncertainties in the atmospheric parameters in the CLUBB (Cloud Layers Unified by Binormals) scheme were found to have the most impact on sea ice status and the larger Arctic climate. Our results demonstrate the importance of conducting sensitivity analyses with fully coupled climate models. The ULR configuration makes such studies computationally feasible today due to its low computational cost. When advances in computational power and modeling algorithms enable the tractable use of higher-resolution models, our results will provide a baseline that can quantify the impact of model resolution on the accuracy of sensitivity indices. Moreover, the confidence intervals provided by our study, which we used to quantify the impact of the number of model evaluations on the accuracy of sensitivity estimates, have the potential to inform the computational resources needed for future sensitivity studies.
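
A minimal sketch of the emulator-based workflow (a hypothetical test function and illustrative sizes; not the E3SM setup beyond the nine parameters and 139 realizations quoted above): fit a Gaussian process to a small ensemble of model runs, then estimate first-order Sobol indices on the inexpensive emulator with a Saltelli-style pick-and-freeze estimator.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)
d, n_train, n_mc = 9, 139, 4096
# Hypothetical stand-in for an expensive climate QOI.
f_model = lambda X: X[:, 0] + 2 * X[:, 1] * X[:, 2] + 0.1 * X.sum(axis=1)

X_train = rng.uniform(size=(n_train, d))
gp = GaussianProcessRegressor().fit(X_train, f_model(X_train))

# Saltelli pick-and-freeze estimator evaluated on the cheap emulator.
A = rng.uniform(size=(n_mc, d))
B = rng.uniform(size=(n_mc, d))
fA, fB = gp.predict(A), gp.predict(B)
var = np.var(np.concatenate([fA, fB]))
for i in range(d):
    ABi = A.copy(); ABi[:, i] = B[:, i]       # freeze all but parameter i
    S_i = np.mean(fB * (gp.predict(ABi) - fA)) / var
    print(f"S_{i} ≈ {S_i:.3f}")
```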


Neural-network based collision operators for the Boltzmann equation

Journal of Computational Physics

Roberts, Nathan V.; Bond, Stephen D.; Cyr, Eric C.; Miller, Sean T.

Kinetic gas dynamics in rarefied and moderate-density regimes have complex behavior associated with collisional processes. These processes are generally defined by convolution integrals over a high-dimensional space (as in the Boltzmann operator), or require evaluating complex auxiliary variables (as in Rosenbluth potentials in Fokker-Planck operators) that are challenging to implement and computationally expensive to evaluate. In this work, we develop a data-driven neural network model that augments a simple and inexpensive BGK collision operator with a machine-learned correction term, which improves the fidelity of the simple operator with a small overhead to overall runtime. The composite collision operator has a tunable fidelity and, in this work, is trained using and tested against a direct-simulation Monte-Carlo (DSMC) collision operator.
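
A minimal 1V numpy sketch of the composite-operator idea (illustrative names and an untrained random "network", not the paper's implementation): an inexpensive BGK relaxation term plus a correction whose components along the discrete collision invariants are projected out, so the correction conserves mass, momentum, and energy by construction.

```python
import numpy as np

v = np.linspace(-6, 6, 64); dv = v[1] - v[0]

def moments(f):
    # Discrete mass, momentum, and energy moments of f(v).
    return np.array([np.sum(f) * dv,
                     np.sum(v * f) * dv,
                     np.sum(0.5 * v**2 * f) * dv])

def maxwellian(n, u, T):
    return n / np.sqrt(2 * np.pi * T) * np.exp(-(v - u) ** 2 / (2 * T))

def bgk(f, nu=1.0):
    # Relax toward the Maxwellian with matched moments (1D: T = 2E/n - u^2).
    n, mom, E = moments(f)
    u = mom / n; T = 2 * E / n - u**2
    return nu * (maxwellian(n, u, T) - f)

rng = np.random.default_rng(1)
W1, W2 = rng.normal(size=(32, 64)) * 0.05, rng.normal(size=(64, 32)) * 0.05

def learned_correction(f):
    g = W2 @ np.tanh(W1 @ f)                  # tiny MLP placeholder
    # Project out the collision invariants (1, v, v^2/2) so the correction
    # conserves mass, momentum, and energy exactly.
    Q = np.linalg.qr(np.stack([np.ones_like(v), v, 0.5 * v**2], axis=1))[0]
    return g - Q @ (Q.T @ g)

f = maxwellian(1.0, 0.5, 1.2) + 0.05 * np.exp(-(v - 2) ** 2)
C = bgk(f) + learned_correction(f)
print("moment drift:", moments(C))            # near zero by construction
```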


Numerical simulation of a relativistic magnetron using a fluid electron model

Physics of Plasmas

Roberds, Nicholas R.; Cartwright, Keith C.; Sandoval, Andrew J.; Beckwith, Kristian B.; Cyr, Eric C.; Glines, Forrest W.

An approach to numerically modeling relativistic magnetrons, in which the electrons are represented with a relativistic fluid, is described. A principal effect in the operation of a magnetron is space-charge-limited (SCL) emission of electrons from the cathode. We have developed an approximate SCL emission boundary condition for the fluid electron model. This boundary condition prescribes the flux of electrons as a function of the normal component of the electric field on the boundary. We show the results of a benchmarking activity that applies the fluid SCL boundary condition to the one-dimensional Child–Langmuir diode problem and a canonical two-dimensional diode problem. Simulation results for a two-dimensional A6 magnetron are then presented. Computed bunching of the electron cloud occurs and coincides with significant microwave power generation. Numerical convergence of the solution is considered. Sharp gradients in the solution quantities at the diocotron resonance, spanning an interval of three to four grid cells in the most well-resolved case, are present and likely affect convergence.
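
For reference, the classical one-dimensional Child–Langmuir law used in the benchmark gives the space-charge-limited current density across a gap of spacing d at voltage V:

```latex
J_{\mathrm{CL}} \;=\; \frac{4\,\varepsilon_0}{9\,d^{2}}\,\sqrt{\frac{2e}{m_e}}\;V^{3/2}.
```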


Combining DPG in space with DPG time-marching scheme for the transient advection–reaction equation

Computer Methods in Applied Mechanics and Engineering

Roberts, Nathan V.; Muñoz-Matute, Judit M.; Demkowicz, Leszek D.

In this article, we present a general methodology to combine the Discontinuous Petrov–Galerkin (DPG) method in space and time in the context of methods of lines for transient advection–reaction problems. We first introduce a semidiscretization in space with a DPG method, redefining the ideas of optimal testing and practicality of the method in this context. Then, we apply the recently developed DPG-based time-marching scheme, which is of exponential type, to the resulting system of Ordinary Differential Equations (ODEs). Further, we discuss how to efficiently compute the action of the exponential of the matrix coming from the space semidiscretization without assembling the full matrix. Finally, we verify the proposed method for 1D+time advection–reaction problems, showing optimal convergence rates for smooth solutions and more stable results for linear conservation laws compared to classical exponential integrators.


Uncertainty and Sensitivity Analysis Methods and Applications in the GDSA Framework (FY2022)

Swiler, Laura P.; Basurto, Eduardo B.; Brooks, Dusty M.; Eckert, Aubrey C.; Leone, Rosemary C.; Mariner, Paul M.; Portone, Teresa P.; Smith, Mariah L.

The Spent Fuel and Waste Science and Technology (SFWST) Campaign of the U.S. Department of Energy (DOE) Office of Nuclear Energy (NE), Office of Fuel Cycle Technology (FCT) is conducting research and development (R&D) on geologic disposal of spent nuclear fuel (SNF) and high-level nuclear waste (HLW). Two high priorities for SFWST disposal R&D are design concept development and disposal system modeling. These priorities are directly addressed in the SFWST Geologic Disposal Safety Assessment (GDSA) control account, which is charged with developing a geologic repository system modeling and analysis capability, and the associated software, GDSA Framework, for evaluating disposal system performance for nuclear waste in geologic media. GDSA Framework is supported by the SFWST Campaign and its predecessor, the Used Fuel Disposition (UFD) Campaign.


Islet: interpolation semi-Lagrangian element-based transport

Geoscientific Model Development (Online)

Bradley, Andrew M.; Bosler, Peter A.; Guba, Oksana G.

Advection of trace species, or tracers, also called tracer transport, in models of the atmosphere and other physical domains is an important and potentially computationally expensive part of a model's dynamical core. Semi-Lagrangian (SL) advection methods are efficient because they permit a time step much larger than the advective stability limit for explicit Eulerian methods without requiring the solution of a globally coupled system of equations as implicit Eulerian methods do. Thus, to reduce the computational expense of tracer transport, dynamical cores often use SL methods to advect tracers. The class of interpolation semi-Lagrangian (ISL) methods contains potentially extremely efficient SL methods. We describe a finite-element ISL transport method that we call the interpolation semi-Lagrangian element-based transport (Islet) method, suitable for use with atmosphere models discretized using the spectral element method. The Islet method uses three grids that share an element grid: a dynamics grid supporting, for example, the Gauss–Legendre–Lobatto basis of degree three; a physics parameterizations grid with a configurable number of finite-volume subcells per element; and a tracer grid supporting the use of Islet bases, with the particular basis again configurable. This method provides extremely accurate tracer transport and excellent diagnostic values in a number of verification problems.


Accurate Compression of Tabulated Chemistry Models with Partition of Unity Networks

Combustion Science and Technology

Armstrong, Elizabeth A.; Hansen, Michael A.; Knaus, Robert C.; Trask, Nathaniel A.; Hewson, John C.; Sutherland, James C.

Tabulated chemistry models are widely used to simulate large-scale turbulent fires in applications including energy generation and fire safety. Tabulation via piecewise Cartesian interpolation suffers from the curse-of-dimensionality, leading to a prohibitive exponential growth in parameters and memory usage as more dimensions are considered. Artificial neural networks (ANNs) have attracted attention for constructing surrogates for chemistry models due to their ability to perform high-dimensional approximation. However, due to well-known pathologies regarding the realization of suboptimal local minima during training, in practice they do not converge and provide unreliable accuracy. Partition of unity networks (POUnets) are a recently introduced family of ANNs which preserve notions of convergence while performing high-dimensional approximation, discovering a mesh-free partition of space which may be used to perform optimal polynomial approximation. In this work, we assess their performance with respect to accuracy and model complexity in reconstructing unstructured flamelet data representative of nonadiabatic pool fire models. Our results show that POUnets can provide the desirable accuracy of classical spline-based interpolants with the low memory footprint of traditional ANNs while converging faster to significantly lower errors than ANNs. For example, we observe POUnets obtaining target accuracies in two dimensions with 40 to 50 times less memory and roughly double the compression in three dimensions. We also address the practical matter of efficiently training accurate POUnets by studying convergence over key hyperparameters, the impact of partition/basis formulation, and the sensitivity to initialization.
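
A minimal sketch of a POUnet forward pass in one dimension (sizes and parameterization are illustrative, not the paper's architecture): softmax logits define non-negative partition functions phi_i(x) that sum to one, and the output blends one polynomial per partition.

```python
import numpy as np

rng = np.random.default_rng(0)
n_part, deg = 8, 2
Wp, bp = rng.normal(size=(n_part, 1)), rng.normal(size=n_part)
coef = rng.normal(size=(n_part, deg + 1))         # per-partition polynomials

def pounet(x):
    logits = Wp @ x[None, :] + bp[:, None]        # (n_part, n_x)
    phi = np.exp(logits - logits.max(0))
    phi /= phi.sum(axis=0)                        # partition of unity
    V = np.vander(x, deg + 1, increasing=True).T  # (deg+1, n_x) monomials
    return np.sum(phi * (coef @ V), axis=0)       # blend local polynomials

x = np.linspace(-1, 1, 5)
print(pounet(x))
```

In training, both the partition parameters (Wp, bp here) and the polynomial coefficients are optimized; the convex dependence on the coefficients is what preserves the convergence properties described above.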


PyApprox: Enabling efficient model analysis

Jakeman, John D.

PyApprox is a Python-based one-stop-shop for probabilistic analysis of scientific numerical models. Easy-to-use and extendable tools are provided for constructing surrogates, sensitivity analysis, Bayesian inference, experimental design, and forward uncertainty quantification. The algorithms implemented represent the most popular methods for model analysis developed over the past two decades, including recent advances in multi-fidelity approaches that use multiple model discretizations and/or simplified physics to significantly reduce the computational cost of various types of analyses. Simple interfaces are provided for the most commonly used algorithms to limit a user’s need to tune the various hyper-parameters of each algorithm. However, more advanced workflows that require customization of hyper-parameters are also supported. An extensive set of benchmarks from the literature is also provided to facilitate the easy comparison of different algorithms for a wide range of model analyses. This paper introduces PyApprox and its various features, and presents results demonstrating the utility of PyApprox on a benchmark problem modeling the advection of a tracer in ground water.


Toward efficient polynomial preconditioning for GMRES

Numerical Linear Algebra with Applications

Loe, Jennifer A.; Morgan, Ronald B.

We present a polynomial preconditioner for solving large systems of linear equations. The polynomial is derived from the minimum residual polynomial (the GMRES polynomial) and is more straightforward to compute and implement than many previous polynomial preconditioners. Our current implementation of this polynomial using its roots is naturally more stable than previous methods of computing the same polynomial. We implement further stability control using added roots, and this allows for high degree polynomials. We discuss the effectiveness and challenges of root-adding and give an additional check for stability. In this article, we study the polynomial preconditioner applied to GMRES; however, it could be used with any Krylov solver. This polynomial preconditioning algorithm can dramatically improve convergence for some problems, especially for difficult problems, and can reduce dot products by an even greater margin.
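
A minimal numpy sketch of the roots-based application (my construction under stated assumptions, not the authors' code): given roots theta_k of the residual polynomial pi(z) = prod_k (1 - z/theta_k), the preconditioner p with z*p(z) = 1 - pi(z) can be applied by a short recurrence. Here plain Ritz values from a small Arnoldi run stand in for the harmonic Ritz values the paper uses, and no added-root stability control is included.

```python
import numpy as np

def apply_poly_prec(A, thetas, v):
    # Recurrence: p += pi_v/theta; pi_v <- (I - A/theta) pi_v.
    # By induction, z*p(z) = 1 - prod(1 - z/theta_k), so y = p(A) v.
    p = np.zeros_like(v, dtype=complex)
    pi_v = v.astype(complex)
    for th in thetas:
        p += pi_v / th
        pi_v -= (A @ pi_v) / th
    return p.real            # real when roots come in conjugate pairs

def arnoldi_ritz(A, b, d):
    # Eigenvalues of the small Hessenberg matrix as polynomial roots.
    n = len(b); Q = np.zeros((n, d + 1)); H = np.zeros((d + 1, d))
    Q[:, 0] = b / np.linalg.norm(b)
    for j in range(d):
        w = A @ Q[:, j]
        for i in range(j + 1):
            H[i, j] = Q[:, i] @ w; w -= H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(w); Q[:, j + 1] = w / H[j + 1, j]
    return np.linalg.eigvals(H[:d, :d])

rng = np.random.default_rng(0)
n = 200
A = np.diag(np.linspace(1, 100, n)) + 0.01 * rng.normal(size=(n, n))
b = rng.normal(size=n)
thetas = arnoldi_ritz(A, b, 10)
y = apply_poly_prec(A, thetas, b)   # y ≈ p(A) b, so A @ y ≈ b for a good p
print(np.linalg.norm(A @ y - b) / np.linalg.norm(b))
```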


Permutation-adapted complete and independent basis for atomic cluster expansion descriptors

Goff, James M.; Sievers, Charles S.; Wood, Mitchell A.; Thompson, Aidan P.

In many recent applications, particularly in the field of atom-centered descriptors for interatomic potentials, tensor products of spherical harmonics have been used to characterize complex atomic environments. When coupled with a radial basis, the atomic cluster expansion (ACE) basis is obtained. However, symmetrization with respect to both rotation and permutation results in an overcomplete set of ACE descriptors with linear dependencies occurring within blocks of functions corresponding to particular generalized Wigner symbols. All practical applications of ACE employ semi-numerical constructions to generate a complete, fully independent basis. While computationally tractable, the resultant basis cannot be expressed analytically, is susceptible to numerical instability, and thus has limited reproducibility. Here we present a procedure for generating explicit analytic expressions for a complete and independent set of ACE descriptors. The procedure uses a coupling scheme that is maximally symmetric w.r.t. permutation of the atoms, exposing the permutational symmetries of the generalized Wigner symbols, and yields a permutation-adapted rotationally and permutationally invariant basis (PA-RPI ACE). Theoretical support for the approach is presented, as well as numerical evidence of completeness and independence. A summary of explicit enumeration of PA-RPI functions up to rank 6 and polynomial degree 32 is provided. The PA-RPI blocks corresponding to particular generalized Wigner symbols may be either larger or smaller than the corresponding blocks in the simpler rotationally invariant basis. Finally, we demonstrate that basis functions of high polynomial degree persist under strong regularization, indicating the importance of not restricting the maximum degree of basis functions in ACE models a priori.


Graph-Based Similarity Metrics for Comparing Simulation Model Causal Structures

Naugle, Asmeret B.; Swiler, Laura P.; Lakkaraju, Kiran L.; Verzi, Stephen J.; Warrender, Christina E.; Romero, Vicente J.

The causal structure of a simulation is a major determinant of both its character and behavior, yet most methods we use to compare simulations focus only on simulation outputs. We introduce a method that combines graphical representation with information theoretic metrics to quantitatively compare the causal structures of models. The method applies to agent-based simulations as well as system dynamics models and facilitates comparison within and between types. Comparing models based on their causal structures can illuminate differences in assumptions made by the models, allowing modelers to (1) better situate their models in the context of existing work, including highlighting novelty, (2) explicitly compare conceptual theory and assumptions to simulated theory and assumptions, and (3) investigate potential causal drivers of divergent behavior between models. We demonstrate the method by comparing two epidemiology models at different levels of aggregation.


Selective amorphization of SiGe in Si/SiGe nanostructures via high energy Si+ implant

Journal of Applied Physics

Turner, Emily M.; Campbell, Quinn C.; Avci, Ibrahim A.; Weber, William J.; Lu, Ping L.; Wang, George T.; Jones, Kevin S.

The selective amorphization of SiGe in Si/SiGe nanostructures via a 1 MeV Si⁺ implant was investigated, resulting in single-crystal Si nanowires (NWs) and quantum dots (QDs) encapsulated in amorphous SiGe fins and pillars, respectively. The Si NWs and QDs are formed during high-temperature dry oxidation of single-crystal Si/SiGe heterostructure fins and pillars, during which Ge diffuses along the nanostructure sidewalls and encapsulates the Si layers. The fins and pillars were then subjected to a 3 × 10¹⁵ ions/cm², 1 MeV Si⁺ implant, resulting in the amorphization of SiGe, while leaving the encapsulated Si crystalline for larger, 65-nm wide NWs and QDs. Interestingly, the 26-nm diameter Si QDs amorphize, while the 28-nm wide NWs remain crystalline during the same high energy ion implant. This result suggests that the Si/SiGe pillars have a lower threshold for Si-induced amorphization compared to their Si/SiGe fin counterparts. However, Monte Carlo simulations of ion implantation into the Si/SiGe nanostructures reveal similar predicted levels of displacements per cm³. Molecular dynamics simulations suggest that the total stress magnitude in Si QDs encapsulated in crystalline SiGe is higher than the total stress magnitude in Si NWs, which may lead to greater crystalline instability in the QDs during ion implant. The potential lower amorphization threshold of QDs compared to NWs is of special importance to applications that require robust QD devices in a variety of radiation environments.


Electrostatic Relativistic Fluid Models of Electron Emission in a Warm Diode

IEEE International Conference on Plasma Science (ICOPS)

Hamlin, Nathaniel D.; Smith, Thomas M.; Roberds, Nicholas R.; Glines, Forrest W.; Beckwith, Kristian B.

A semi-analytic fluid model has been developed for characterizing relativistic electron emission across a warm diode gap. Here we demonstrate the use of this model in (i) verifying multi-fluid codes in modeling compressible relativistic electron flows (the EMPIRE-Fluid code is used as an example; see also Ref. 1), (ii) elucidating key physics mechanisms characterizing the influence of compressibility and relativistic injection speed of the electron flow, and (iii) characterizing the regimes over which a fluid model recovers physically reasonable solutions.


Adaptive experimental design for multi-fidelity surrogate modeling of multi-disciplinary systems

International Journal for Numerical Methods in Engineering

Jakeman, John D.; Friedman, Sam; Eldred, Michael S.; Tamellini, Lorenzo; Gorodetsky, Alex A.; Allaire, Doug

We present an adaptive algorithm for constructing surrogate models of multi-disciplinary systems composed of a set of coupled components. With this goal we introduce “coupling” variables with a priori unknown distributions that allow surrogates of each component to be built independently. Once built, the surrogates of the components are combined to form an integrated-surrogate that can be used to predict system-level quantities of interest at a fraction of the cost of the original model. The error in the integrated-surrogate is greedily minimized using an experimental design procedure that allocates the amount of training data, used to construct each component-surrogate, based on the contribution of those surrogates to the error of the integrated-surrogate. The multi-fidelity procedure presented is a generalization of multi-index stochastic collocation that can leverage ensembles of models of varying cost and accuracy, for one or more components, to reduce the computational cost of constructing the integrated-surrogate. Extensive numerical results demonstrate that, for a fixed computational budget, our algorithm is able to produce surrogates that are orders of magnitude more accurate than methods that treat the integrated system as a black-box.


Scalable algorithms for physics-informed neural and graph networks

Data-Centric Engineering

Shukla, Khemraj; Xu, Mengjia; Trask, Nathaniel A.; Karniadakis, George E.

Physics-informed machine learning (PIML) has emerged as a promising new approach for simulating complex physical and biological systems that are governed by complex multiscale processes for which some data are also available. In some instances, the objective is to discover part of the hidden physics from the available data, and PIML has been shown to be particularly effective for such problems for which conventional methods may fail. Unlike commercial machine learning where training of deep neural networks requires big data, in PIML big data are not available. Instead, we can train such networks from additional information obtained by employing the physical laws and evaluating them at random points in the space-time domain. Such PIML integrates multimodality and multifidelity data with mathematical models, and implements them using neural networks or graph networks. Here, we review some of the prevailing trends in embedding physics into machine learning, using physics-informed neural networks (PINNs) based primarily on feed-forward neural networks and automatic differentiation. For more complex systems or systems of systems and unstructured data, graph neural networks (GNNs) present some distinct advantages, and here we review how physics-informed learning can be accomplished with GNNs based on graph exterior calculus to construct differential operators; we refer to these architectures as physics-informed graph networks (PIGNs). We present representative examples for both forward and inverse problems and discuss what advances are needed to scale up PINNs, PIGNs and more broadly GNNs for large-scale engineering problems.
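
A toy PINN in the sense described above (PyTorch, illustrative hyperparameters; a sketch, not the review's code): an MLP u_theta is trained so that the PDE residual, evaluated by automatic differentiation at random collocation points, and the boundary conditions are both small. The target problem u'' = -π² sin(πx) on [0, 1] with homogeneous Dirichlet data has exact solution u = sin(πx).

```python
import torch

torch.manual_seed(0)
net = torch.nn.Sequential(
    torch.nn.Linear(1, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(2000):
    x = torch.rand(128, 1, requires_grad=True)        # collocation points
    u = net(x)
    du = torch.autograd.grad(u, x, torch.ones_like(u), create_graph=True)[0]
    d2u = torch.autograd.grad(du, x, torch.ones_like(du), create_graph=True)[0]
    pde = (d2u + torch.pi**2 * torch.sin(torch.pi * x)).pow(2).mean()
    xb = torch.tensor([[0.0], [1.0]])
    bc = net(xb).pow(2).mean()                        # u(0) = u(1) = 0
    loss = pde + 10.0 * bc
    opt.zero_grad(); loss.backward(); opt.step()

x_test = torch.linspace(0, 1, 5).reshape(-1, 1)
print(net(x_test).detach().ravel())                   # compare to sin(pi x)
```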


Monolithic Multigrid for a Reduced-Quadrature Discretization of Poroelasticity

SIAM Journal on Scientific Computing

Adler, James A.; He, Yunhui H.; Hu, Xiaozhe H.; MacLachlan, Scott M.; Ohm, Peter B.

Advanced finite-element discretizations and preconditioners for models of poroelasticity have attracted significant attention in recent years. The equations of poroelasticity offer significant challenges in both areas, due to the potentially strong coupling between unknowns in the system, saddle-point structure, and the need to account for wide ranges of parameter values, including limiting behavior such as incompressible elasticity. This paper was motivated by an attempt to develop monolithic multigrid preconditioners for the discretization developed in [C. Rodrigo et al., Comput. Methods Appl. Mech. Engrg., 341 (2018), pp. 467–484]; we show here why this is a difficult task and, as a result, we modify the discretization in [Rodrigo et al.] through the use of a reduced-quadrature approximation, yielding a more “solver-friendly” discretization. Local Fourier analysis is used to optimize parameters in the resulting monolithic multigrid method, allowing a fair comparison between the performance and costs of methods based on Vanka and Braess–Sarazin relaxation. Further, numerical results are presented to validate the local Fourier analysis predictions and demonstrate efficiency of the algorithms. Finally, a comparison to existing block-factorization preconditioners is also given.


An optimization-based approach to parameter learning for fractional type nonlocal models

Computers and Mathematics with Applications

Burkovska, Olena; Glusa, Christian A.; D'Elia, Marta D.

Nonlocal operators of fractional type are a popular modeling choice for applications that do not adhere to classical diffusive behavior; however, one major challenge in nonlocal simulations is the selection of model parameters. In this work we propose an optimization-based approach to parameter identification for fractional models with an optional truncation radius. We formulate the inference problem as an optimal control problem where the objective is to minimize the discrepancy between observed data and an approximate solution of the model, and the control variables are the fractional order and the truncation length. For the numerical solution of the minimization problem we propose a gradient-based approach, where we enhance the numerical performance by an approximation of the bilinear form of the state equation and its derivative with respect to the fractional order. Several numerical tests in one and two dimensions illustrate the theoretical results and show the robustness and applicability of our method.


Electronic structure of intrinsic defects in c-gallium nitride: Density functional theory study without the jellium approximation

Physical Review B

Edwards, Arthur H.; Schultz, Peter A.; Dobzynski, Richard M.

We report the first nonjellium, systematic, density functional theory (DFT) study of intrinsic and extrinsic defects and defect levels in zinc-blende (cubic) gallium nitride. We use the local moment counter charge (LMCC) method, the standard Perdew–Burke–Ernzerhof (PBE) exchange-correlation potential, and two pseudopotentials, where the Ga 3d orbitals are either in the core (d0) or explicitly in the valence set (d10). We studied 64, 216, 512, and 1000 atom supercells, and demonstrated convergence to the infinite limit, crucial for delineating deep from shallow states near band edges, and for demonstrating the elimination of finite cell-size errors. Contrary to common claims, we find that exact exchange is not required to obtain defect levels across the experimental band gap. As was true in silicon, silicon carbide, and gallium arsenide, the extremal LMCC defect levels of the aggregate of defects yield an effective LMCC defect band gap that is within 10% of the experimental gap (3.3 eV) for both pseudopotentials. We demonstrate that the gallium vacancy is more complicated than previously reported. There is dramatic metastability: a nearest-neighbor nitrogen atom shifts into the gallium site, forming an antisite plus nitrogen vacancy pair, which is more stable than the simple vacancy for positive charge states. Our assessment of the d0 and d10 pseudopotentials yields minimal differences in defect structures and defect levels. The better agreement of the d0 lattice constant with experiment suggests that the more computationally economical d0 pseudopotentials are sufficient to achieve the fidelity possible within the physical accuracy of DFT, and thereby enable calculations in larger supercells necessary to demonstrate convergence with respect to finite size supercell errors.


Physics-assisted generative adversarial network for X-ray tomography

Optics Express

Guo, Zhen G.; Song, Jung K.; Barbastathis, George B.; Vaughan, Courtenay T.; Larson, Kurt L.; Alpert, Bradley K.; Levine, Zachary L.; Glinsky, Michael E.

X-ray tomography is capable of imaging the interior of objects in three dimensions non-invasively, with applications in biomedical imaging, materials science, electronic inspection, and other fields. The reconstruction process can be an ill-conditioned inverse problem, requiring regularization to obtain satisfactory results. Recently, deep learning has been adopted for tomographic reconstruction. Unlike iterative algorithms which require a distribution that is known a priori, deep reconstruction networks can learn a prior distribution through sampling the training distributions. In this work, we develop a Physics-assisted Generative Adversarial Network (PGAN), a two-step algorithm for tomographic reconstruction. In contrast to previous efforts, our PGAN utilizes maximum-likelihood estimates derived from the measurements to regularize the reconstruction with both known physics and the learned prior. Compared with methods with less physics assisting in training, PGAN can reduce the photon requirement with limited projection angles to achieve a given error rate. The advantages of using a physics-assisted learned prior in X-ray tomography may further enable low-photon nanoscale imaging.


The Portals 4.3 Network Programming Interface

Schonbein, William W.; Barrett, Brian W.; Brightwell, Ronald B.; Grant, Ryan G.; Hemmert, Karl S.; Pedretti, Kevin P.; Underwood, Keith U.; Riesen, Rolf R.; Hoefler, Torsten H.; Barbe, Mathieu B.; Filho, Luiz H.; Ratchov, Alexandre R.; Maccabe, Arthur B.

This report presents a specification for the Portals 4 network programming interface. Portals 4 is intended to allow scalable, high-performance network communication between nodes of a parallel computing system. Portals 4 is well suited to massively parallel processing and embedded systems. Portals 4 represents an adaptation of the data movement layer developed for massively parallel processing platforms, such as the 4500-node Intel TeraFLOPS machine. Sandia's Cplant cluster project motivated the development of Version 3.0, which was later extended to Version 3.3 as part of the Cray Red Storm machine and XT line. Version 4 is targeted to the next generation of machines employing advanced network interface architectures that support enhanced offload capabilities.


Asymptotic preserving methods for fluid electron-fluid models in the large magnetic field limit with mathematically guaranteed properties (Final Report)

Tomas, Ignacio T.; Shadid, John N.; Maier, Matthias M.; Salgado, Abner S.

The current manuscript is a final report on the activities carried out under the Project LDRD-CIS #226834. In scientific terms, the work reported in this manuscript is a continuation of the efforts started with Project LDRD-express #223796, whose final report of activities is SAND2021-11481, see [83]. In this section we briefly explain what pre-existing developments motivated the current body of work and provide an overview of the activities developed with the funds provided. The overarching goal of the current project LDRD-CIS #226834 and the previous project LDRD-express #223796 is the development of numerical methods with mathematically guaranteed properties in order to solve the Euler-Maxwell system of plasma physics and generalizations thereof. Even though Project #223796 laid out general foundations of space and time discretization of the Euler-Maxwell system, overall, it was focused on the development of numerical schemes for purely electrostatic fluid-plasma models. In particular, the project developed a family of schemes with mathematically guaranteed robustness in order to solve the Euler-Poisson model. This model is an asymptotic limit in which only the electrostatic response of the plasma is considered. Its primary feature is the presence of a non-local force, the electrostatic force, which introduces effects with infinite-speed propagation into the problem. Even though instantaneous propagation of perturbations may be considered nonphysical, there are plenty of physical regimes of technical interest where such an approximation is perfectly valid.
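
For reference, a normalized electrostatic Euler-Poisson system of the kind described (one common normalization; the notation and the sign conventions, which depend on the species charge, are assumptions here):

```latex
\partial_t \rho + \nabla\cdot(\rho u) = 0, \qquad
\partial_t(\rho u) + \nabla\cdot(\rho u\otimes u) + \nabla p = \rho\,\nabla\phi,
\qquad -\Delta\phi = \rho_{\mathrm{ion}} - \rho .
```

The non-locality noted above enters through the Poisson equation: a local change in the density rho instantaneously changes the potential phi everywhere.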


QSCOUT Progress Report, June 2022 [Quantum Scientific Computing Open User Testbed]

Clark, Susan M.; Norris, Haley R.; Landahl, Andrew J.; Yale, Christopher G.; Lobser, Daniel L.; Van Der Wall, Jay W.; Revelle, Melissa R.

Quantum information processing has reached an inflection point, transitioning from proof-of-principle scientific experiments to small, noisy quantum processors. To accelerate this process and eventually move to fault-tolerant quantum computing, it is necessary to provide the scientific community with access to whitebox testbed systems. The Quantum Scientific Computing Open User Testbed (QSCOUT) provides scientists unique access to an innovative system to help advance quantum computing science.


Theory of the metastable injection-bleached E3c center in GaAs

Physical Review B

Schultz, Peter A.; Hjalmarson, Harold P.

The E3 transition in irradiated GaAs observed in deep level transient spectroscopy (DLTS) was recently discovered in Laplace-DLTS to encompass three distinct components. The component designated E3c was found to be metastable, reversibly bleached under minority carrier (hole) injection, with an introduction rate dependent upon Si doping density. It is shown through first-principles modeling that the E3c must be the intimate Si-vacancy pair, best described as a Si sitting in a divacancy, SiVV. The bleached metastable state is enabled by a double site-shifting mechanism: upon recharging, the defect undergoes a second site shift rather than returning to its original E3c-active configuration by reversing the first site shift. Identification of this defect offers insights into the short-time annealing kinetics in irradiated GaAs.


A Taxonomy of Small Markovian Errors

PRX Quantum

Blume-Kohout, Robin J.; da Silva, Marcus P.; Nielsen, Erik N.; Proctor, Timothy J.; Rudinger, Kenneth M.; Sarovar, Mohan S.; Young, Kevin C.

Errors in quantum logic gates are usually modeled by quantum process matrices (CPTP maps). But process matrices can be opaque and unwieldy. We show how to transform the process matrix of a gate into an error generator that represents the same information more usefully. We construct a basis of simple and physically intuitive elementary error generators, classify them, and show how to represent the error generator of any gate as a mixture of elementary error generators with various rates. Finally, we show how to build a large variety of reduced models for gate errors by combining elementary error generators and/or entire subsectors of generator space. We conclude with a few examples of reduced models, including one with just 9N² parameters that describes almost all commonly predicted errors on an N-qubit processor.
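
A small illustration of the basic transformation (a single-qubit toy in the Pauli transfer representation; writing the noisy gate as G = e^L G0 follows the paper, while the example itself is mine):

```python
import numpy as np
from scipy.linalg import logm, expm

X = np.array([[0, 1], [1, 0]], dtype=complex)
paulis = [np.eye(2, dtype=complex), X,
          np.array([[0, -1j], [1j, 0]]), np.diag([1.0 + 0j, -1.0])]

def pauli_transfer(U):
    """Pauli transfer matrix of the unitary channel rho -> U rho U^dag."""
    return np.array([[np.trace(P @ U @ Q @ U.conj().T).real / 2
                      for Q in paulis] for P in paulis])

G0 = pauli_transfer(expm(-1j * (np.pi / 2) * X / 2))          # ideal X(pi/2)
G = pauli_transfer(expm(-1j * (np.pi / 2 + 0.01) * X / 2))    # over-rotated
L = logm(G @ np.linalg.inv(G0))          # error generator: G = expm(L) @ G0
print(np.round(L.real, 4))               # small generator for the 0.01 rad error
```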


A Stochastic Reduced-Order Model for Statistical Microstructure Descriptors Evolution

Journal of Computing and Information Science in Engineering

Tran, Anh; Sun, Jing S.; Liu, Dehao L.; Wang, Yan W.; Wildey, Timothy M.

Integrated computational materials engineering (ICME) models have been a crucial building block for modern materials development, relieving heavy reliance on experiments and significantly accelerating the materials design process. However, ICME models are also computationally expensive, particularly with respect to time integration for dynamics, which hinders the ability to study statistical ensembles and thermodynamic properties of large systems for long time scales. To alleviate the computational bottleneck, we propose to model the evolution of statistical microstructure descriptors as a continuous-time stochastic process using a non-linear Langevin equation, where the probability density function (PDF) of the statistical microstructure descriptors, which are also the quantities of interest (QoIs), is modeled by the Fokker–Planck equation. In this work, we discuss how to calibrate the drift and diffusion terms of the Fokker–Planck equation from the theoretical and computational perspectives. The calibrated Fokker–Planck equation can be used as a stochastic reduced-order model to simulate the evolution of the PDF of the statistical microstructure descriptors. Considering statistical microstructure descriptors in the microstructure evolution as QoIs, we demonstrate our proposed methodology in three ICME models: kinetic Monte Carlo, phase field, and molecular dynamics simulations.
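
A scalar sketch of the calibrate-then-simulate loop (synthetic data and Kramers–Moyal-style binned estimators as an illustrative stand-in for the paper's calibration procedures): estimate the drift a(x) and diffusion b(x) from binned increments of training trajectories, then run the reduced-order Langevin model with Euler–Maruyama.

```python
import numpy as np

rng = np.random.default_rng(0)
dt, n_steps, n_traj = 1e-3, 20000, 20
a_true = lambda x: -(x - 1.0)              # ground truth used to make data
b_true = 0.3

X = np.zeros((n_traj, n_steps))            # synthetic "training" trajectories
for k in range(1, n_steps):
    X[:, k] = (X[:, k - 1] + a_true(X[:, k - 1]) * dt
               + b_true * np.sqrt(dt) * rng.normal(size=n_traj))

# Kramers-Moyal estimates: a(x) ~ E[dx|x]/dt, b(x) ~ std[dx|x]/sqrt(dt)
x0, dx = X[:, :-1].ravel(), np.diff(X, axis=1).ravel()
bins = np.linspace(X.min(), X.max(), 30)
idx = np.digitize(x0, bins)
centers, a_hat, b_hat = [], [], []
for i in range(1, len(bins)):
    sel = dx[idx == i]
    if sel.size > 50:                      # skip sparsely populated bins
        centers.append(0.5 * (bins[i - 1] + bins[i]))
        a_hat.append(sel.mean() / dt)
        b_hat.append(sel.std() / np.sqrt(dt))

# Reduced-order model: Euler-Maruyama with the calibrated terms
x = 0.0
for _ in range(5000):
    x += np.interp(x, centers, a_hat) * dt \
         + np.interp(x, centers, b_hat) * np.sqrt(dt) * rng.normal()
print(x, "(should wander near the fixed point at 1.0)")
```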


Strain-tuning of transport gaps and semiconductor-to-conductor phase transition in twinned graphene

Acta Materialia

Mendez Granado, Juan P.

We show, through the use of the Landauer–Büttiker (LB) formalism and a tight-binding (TB) model, that the transport gap of twinned graphene can be tuned through the application of a uniaxial strain in the direction normal to the twin band. Remarkably, we find that the transport gap Egap bears a square-root dependence on the control parameter ϵx − ϵc, where ϵx is the applied uniaxial strain and ϵc ~ 19% is a critical strain. We interpret this dependence as evidence of criticality underlying a continuous phase transition, with ϵx − ϵc playing the role of control parameter and the transport gap Egap playing the role of order parameter. For ϵx < ϵc, the transport gap is non-zero and the material is a semiconductor, whereas for ϵx > ϵc the transport gap closes to zero and the material becomes a conductor, which evinces a semiconductor-to-conductor phase transition. The computed critical exponent of 1/2 places the transition in the mean-field universality class, which enables far-reaching analogies with other systems in the same class.
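
In display form, the reported scaling of the order parameter is (the amplitude A and the piecewise statement are my paraphrase of the abstract):

```latex
E_{\mathrm{gap}}(\epsilon_x) =
\begin{cases}
A\,(\epsilon_c-\epsilon_x)^{1/2}, & \epsilon_x<\epsilon_c,\\
0, & \epsilon_x\ge\epsilon_c,
\end{cases}
\qquad \epsilon_c \approx 0.19 .
```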


Entangling-gate error from coherently displaced motional modes of trapped ions

Physical Review A

Ruzic, Brandon R.; Barrick, Todd A.; Hunker, Jeffrey D.; Law, Ryan L.; McFarland, Brian M.; McGuinness, Hayden J.; Parazzoli, L.P.; Sterk, Jonathan D.; Van Der Wall, Jay W.; Stick, Daniel L.

Entangling gates in trapped-ion quantum computers are most often applied to stationary ions with initial motional distributions that are thermal and close to the ground state, while those demonstrations that involve transport generally use sympathetic cooling to reinitialize the motional state prior to applying a gate. Future systems with more ions, however, will face greater nonthermal excitation due to increased amounts of ion transport and exacerbated by longer operational times and variations over the trap array. In addition, pregate sympathetic cooling may be limited due to time costs and laser access constraints. In this paper, we analyze the impact of such coherent motional excitation on entangling-gate error by performing simulations of Mølmer–Sørensen (MS) gates on a pair of trapped-ion qubits with both thermal and coherent excitation present in a shared motional mode at the start of the gate. Here, we quantify how a small amount of coherent displacement erodes gate performance in the presence of experimental noise, and we demonstrate that adjusting the relative phase between the initial coherent displacement and the displacement induced by the gate or using Walsh modulation can suppress this error. We then use experimental data from transported ions to analyze the impact of coherent displacement on MS-gate error under realistic conditions.


Surrogate modeling for efficiently, accurately and conservatively estimating measures of risk

Reliability Engineering and System Safety

Jakeman, John D.; Kouri, Drew P.; Huerta, Jose G.

We present a surrogate modeling framework for conservatively estimating measures of risk from limited realizations of an expensive physical experiment or computational simulation. Risk measures combine objective probabilities with the subjective values of a decision maker to quantify anticipated outcomes. Given a set of samples, we construct a surrogate model that produces estimates of risk measures that are always greater than their empirical approximations obtained from the training data. These surrogate models limit over-confidence in reliability and safety assessments and produce estimates of risk measures that converge much faster to the true value than purely sample-based estimates. We first detail the construction of conservative surrogate models that can be tailored to a stakeholder's risk preferences and then present an approach, based on stochastic orders, for constructing surrogate models that are conservative with respect to families of risk measures. Our surrogate models include biases that permit them to conservatively estimate the target risk measures. We provide theoretical results that show that these biases decay at the same rate as the L² error in the surrogate model. Numerical demonstrations confirm that risk-adapted surrogate models do indeed overestimate the target risk measures while converging at the expected rate.

More Details

CrossSim Inference Manual v2.0

Xiao, Tianyao X.; Bennett, Christopher H.; Feinberg, Benjamin F.; Marinella, Matthew J.; Agarwal, Sapan A.

Neural networks are largely based on matrix computations. During forward inference, the most heavily used compute kernel is the matrix-vector multiplication (MVM): $W\vec{x}$. Inference is a first frontier for the deployment of next-generation hardware for neural network applications, as it is more readily deployed in edge devices, such as mobile devices or embedded processors with size, weight, and power constraints. Inference is also easier to implement in analog systems than training, which has more stringent device requirements. The main processing kernel used during inference is the MVM.
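
A minimal numerical picture of this kernel, an MVM executed with noisy analog weights, can be sketched as follows; the noise magnitude and matrix sizes are arbitrary and do not reflect CrossSim's device models:

```python
import numpy as np

rng = np.random.default_rng(1)

# Weight matrix W and input vector x for the inference kernel y = W @ x.
W = rng.standard_normal((128, 256)) * 0.1
x = rng.standard_normal(256)

# Crude analog-array model: each programmed conductance carries an
# independent Gaussian programming error (sigma chosen arbitrarily).
sigma = 0.01
W_analog = W + sigma * rng.standard_normal(W.shape)

y_ideal  = W @ x
y_analog = W_analog @ x

rel_err = np.linalg.norm(y_analog - y_ideal) / np.linalg.norm(y_ideal)
print(f"relative MVM error from programming noise: {rel_err:.3%}")
```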

More Details

A primal–dual algorithm for risk minimization

Mathematical Programming

Kouri, Drew P.; Surowiec, Thomas M.

In this paper, we develop an algorithm to efficiently solve risk-averse optimization problems posed in reflexive Banach space. Such problems often arise in many practical applications as, e.g., optimization problems constrained by partial differential equations with uncertain inputs. Unfortunately, for many popular risk models including the coherent risk measures, the resulting risk-averse objective function is nonsmooth. This lack of differentiability complicates the numerical approximation of the objective function as well as the numerical solution of the optimization problem. To address these challenges, we propose a primal–dual algorithm for solving large-scale nonsmooth risk-averse optimization problems. This algorithm is motivated by the classical method of multipliers and by epigraphical regularization of risk measures. As a result, the algorithm solves a sequence of smooth optimization problems using derivative-based methods. We prove convergence of the algorithm even when the subproblems are solved inexactly and conclude with numerical examples demonstrating the efficiency of our method.
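
One ingredient the abstract highlights, regularization turning a nonsmooth risk measure into a smooth subproblem, can be illustrated on CVaR: introducing the epigraph variable t and smoothing the plus-function with a softplus yields a differentiable objective that standard derivative-based solvers handle. The sketch below is a generic illustration of that smoothing step, not the paper's primal–dual algorithm; the toy uncertain objective and parameter values are invented:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
xi = rng.standard_normal(2000)      # samples of the uncertain input
beta, epsreg = 0.9, 1e-2            # CVaR level and smoothing parameter

def loss(x, xi):
    return (x - 1.0) ** 2 + x * xi  # toy uncertain objective f(x, xi)

def softplus(u):
    # Smooth approximation of max(u, 0); stand-in for epigraphical smoothing.
    return epsreg * np.logaddexp(0.0, u / epsreg)

def smoothed_cvar(z):
    x, t = z
    # Rockafellar-Uryasev form: CVaR_beta = min_t t + E[(f - t)_+] / (1 - beta),
    # with the nonsmooth plus-function replaced by its softplus smoothing.
    return t + np.mean(softplus(loss(x, xi) - t)) / (1.0 - beta)

res = minimize(smoothed_cvar, x0=[0.0, 0.0], method="BFGS")
x_opt, t_opt = res.x
print(f"risk-averse minimizer x = {x_opt:.3f}, epigraph variable t = {t_opt:.3f}")
```

The smoothed subproblem is solvable with a quasi-Newton method; the paper drives a sequence of such smooth problems toward the original nonsmooth risk-averse solution.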

More Details

Mixed precision s-step Lanczos and conjugate gradient algorithms

Numerical Linear Algebra with Applications

Carson, Erin; Gergelits, Tomáš; Yamazaki, Ichitaro Y.

Compared to the classical Lanczos algorithm, the s-step Lanczos variant has the potential to improve performance by asymptotically decreasing the synchronization cost per iteration. However, this comes at a price; despite being mathematically equivalent, the s-step variant may behave quite differently in finite precision, potentially exhibiting greater loss of accuracy and slower convergence relative to the classical algorithm. It has previously been shown that the errors in the s-step version follow the same structure as the errors in the classical algorithm, but are amplified by a factor depending on the square of the condition number of the (s + 1)-dimensional Krylov bases computed in each outer loop. As the condition number of these s-step bases grows (in some cases very quickly) with s, this limits the s values that can be chosen and thus can limit the attainable performance. In this work, we show that if a select few computations in s-step Lanczos are performed in double the working precision, the error terms then depend only linearly on the conditioning of the s-step bases. This has the potential for drastically improving the numerical behavior of the algorithm with little impact on per-iteration performance. Our numerical experiments demonstrate the improved numerical behavior possible with the mixed precision approach, and also show that this improved behavior extends to mixed precision s-step CG. We present preliminary performance results on NVIDIA V100 GPUs that show that the overhead of extra precision is minimal if one uses precisions implemented in hardware.
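
The conditioning issue driving this work is easy to reproduce: the monomial s-step basis [v, Av, ..., A^s v] becomes ill-conditioned very quickly as s grows. A small generic demonstration (not the paper's experiments; the test matrix is invented):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
# Symmetric test matrix with a modest spread of eigenvalues.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag(np.linspace(1.0, 100.0, n)) @ Q.T
v = rng.standard_normal(n)

for s in (2, 4, 8, 12):
    # Monomial Krylov basis [v, Av, ..., A^s v], as computed per outer loop of
    # s-step methods (better-conditioned polynomial bases also exist).
    V = np.empty((n, s + 1))
    V[:, 0] = v / np.linalg.norm(v)
    for j in range(1, s + 1):
        w = A @ V[:, j - 1]
        V[:, j] = w / np.linalg.norm(w)   # normalize columns; conditioning still grows
    print(f"s = {s:2d}: cond(basis) = {np.linalg.cond(V):.2e}")
```

The rapid growth of cond(basis) with s is exactly the amplification factor the mixed-precision approach reduces from quadratic to linear dependence.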

More Details

Low-order preconditioning of the Stokes equations

Numerical Linear Algebra with Applications

Voronin, Alexey; He, Yunhui; MacLachlan, Scott; Olson, Luke N.; Tuminaro, Raymond S.

A well-known strategy for building effective preconditioners for higher-order discretizations of some PDEs, such as Poisson's equation, is to leverage effective preconditioners for their low-order analogs. In this work, we show that high-quality preconditioners can also be derived for the Taylor–Hood discretization of the Stokes equations in much the same manner. In particular, we investigate the use of geometric multigrid based on the Q1isoQ2/Q1 discretization of the Stokes operator as a preconditioner for the Q2/Q1 discretization of the Stokes system. We utilize local Fourier analysis to optimize the damping parameters for Vanka and Braess–Sarazin relaxation schemes and to achieve robust convergence. These results are then verified and compared against the measured multigrid performance. While geometric multigrid can be applied directly to the Q2/Q1 system, our ultimate motivation is to apply algebraic multigrid within solvers for Q2/Q1 systems via the Q1isoQ2/Q1 discretization, which will be considered in a companion paper.
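
The general strategy the abstract opens with, preconditioning a high-order discretization with its low-order analog, can be shown on the 1-D Poisson problem it cites (not on Stokes): below, a fourth-order finite-difference operator is preconditioned with an exact solve of the standard second-order one. Stencils, sizes, and the boundary treatment are chosen purely for illustration:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cg, splu, LinearOperator

# 1-D Poisson sketch of low-order preconditioning.
n = 400
h = 1.0 / (n + 1)
# Fourth-order 5-point stencil for -u'' (Dirichlet BCs; the stencil is simply
# truncated at the boundary for this illustration).
A_hi = sp.diags([1/12, -4/3, 5/2, -4/3, 1/12], [-2, -1, 0, 1, 2],
                shape=(n, n), format="csc") / h**2
# Low-order analog: the standard second-order 3-point stencil.
A_lo = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc") / h**2

b = np.ones(n)
iters = {"plain": 0, "low-order precond": 0}

def make_callback(key):
    def cb(_xk):
        iters[key] += 1
    return cb

u0, _ = cg(A_hi, b, callback=make_callback("plain"))
M = LinearOperator((n, n), matvec=splu(A_lo).solve)  # exact low-order solve
u1, _ = cg(A_hi, b, M=M, callback=make_callback("low-order precond"))

for key, count in iters.items():
    print(f"CG iterations, {key}: {count}")
```

Because the low-order operator is spectrally equivalent to the high-order one, preconditioned CG converges in a nearly mesh-independent number of iterations; the paper carries the same idea over to the saddle-point structure of Stokes.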

More Details

A Hybrid Method for Tensor Decompositions that Leverages Stochastic and Deterministic Optimization

Myers, Jeremy M.; Dunlavy, Daniel D.

In this paper, we propose a hybrid method that uses stochastic and deterministic search to compute the maximum likelihood estimator of a low-rank count tensor with Poisson loss via state-of-the-art local methods. Our approach is inspired by Simulated Annealing for global optimization and allows for fine-grain parameter tuning as well as adaptive updates to algorithm parameters. We present numerical results that indicate our hybrid approach can compute better approximations to the maximum likelihood estimator with less computation than the state-of-the-art methods by themselves.
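
The hybrid pattern described here, a Simulated-Annealing-style outer loop wrapped around a deterministic local solver, can be sketched on a generic multimodal function rather than on Poisson tensor decomposition; the objective, cooling schedule, and step scales are placeholders:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)

def f(x):
    return (x * x - 2.0) ** 2 + 2.0 * np.sin(5.0 * x)  # multimodal toy loss

def polish(x0):
    # Deterministic local solver: quasi-Newton refinement of a proposal.
    return minimize(lambda z: f(z[0]), [x0], method="BFGS").x[0]

# Annealed Metropolis outer loop around the local solver.
x_cur = polish(rng.uniform(-3.0, 3.0))
x_best, temp = x_cur, 1.0
for _ in range(50):
    candidate = polish(x_cur + temp * rng.standard_normal())  # jump, then refine
    df = f(candidate) - f(x_cur)
    if df < 0 or rng.random() < np.exp(-df / temp):  # Metropolis acceptance
        x_cur = candidate
        if f(x_cur) < f(x_best):
            x_best = x_cur
    temp *= 0.9  # cooling schedule (placeholder rate)

print(f"best minimum found: x = {x_best:.4f}, f(x) = {f(x_best):.4f}")
```

Early on, the high temperature lets the search hop between basins; as it cools, the deterministic refinements dominate, mirroring the stochastic-then-local division of labor in the proposed method.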

More Details

In Their Shoes: Persona-Based Approaches to Software Quality Practice Incentivization

Computing in Science and Engineering

Mundt, Miranda R.; Milewicz, Reed M.; Raybourn, Elaine M.

Many teams struggle to adapt and right-size software engineering best practices for quality assurance to fit their context. Introducing software quality is not usually framed in a way that motivates teams to take action, so it becomes a "check the box for compliance" activity instead of a cultural practice that values software quality and the effort to achieve it. When and how can we provide effective incentives for software teams to adopt and integrate meaningful and enduring software quality practices? Here, we explored this question through a persona-based ideation exercise at the 2021 Collegeville Workshop on Scientific Software in which we created three unique personas that represent different scientific software developer perspectives.

More Details

The Ground Truth Program: Simulations as Test Beds for Social Science Research Methods

Computational and Mathematical Organization Theory

Naugle, Asmeret B.; Russell, Adam R.; Lakkaraju, Kiran L.; Swiler, Laura P.; Verzi, Stephen J.; Romero, Vicente J.

Social systems are uniquely complex and difficult to study, but understanding them is vital to solving the world’s problems. The Ground Truth program developed a new way of testing the research methods that attempt to understand and leverage the Human Domain and its associated complexities. The program developed simulations of social systems as virtual world test beds. Not only were these simulations able to produce data on future states of the system under various circumstances and scenarios, but their causal ground truth was also explicitly known. Research teams studied these virtual worlds, facilitating deep validation of causal inference, prediction, and prescription methods. The Ground Truth program model provides a way to test and validate research methods to an extent previously impossible, and to study the intricacies and interactions of different components of research.

More Details

An Accurate, Error-Tolerant, and Energy-Efficient Neural Network Inference Engine Based on SONOS Analog Memory

IEEE Transactions on Circuits and Systems I: Regular Papers

Xiao, T.P.; Feinberg, Benjamin F.; Bennett, Christopher H.; Agrawal, Vineet; Saxena, Prashant; Prabhakar, Venkatraman; Ramkumar, Krishnaswamy; Medu, Harsha; Raghavan, Vijay; Chettuvetty, Ramesh; Agarwal, Sapan A.; Marinella, Matthew J.

We demonstrate SONOS (silicon-oxide-nitride-oxide-silicon) analog memory arrays that are optimized for neural network inference. The devices are fabricated in a 40 nm process and operated in the subthreshold regime for in-memory matrix multiplication. Subthreshold operation enables low conductances to be implemented with low error, which matches the typical weight distribution of neural networks, heavily skewed toward near-zero values. This leads to high accuracy in the presence of programming errors and process variations. We simulate the end-to-end neural network inference accuracy, accounting for the measured programming error, read noise, and retention loss in a fabricated SONOS array. Evaluated on the ImageNet dataset using ResNet50, the accuracy using a SONOS system is within 2.16% of floating-point accuracy without any retraining. The unique error properties and high on/off ratio of the SONOS device allow scaling to large arrays without bit slicing, and enable an inference architecture that achieves 20 TOPS/W on ResNet50, a >10× gain in energy efficiency over state-of-the-art digital and analog inference accelerators.
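
The error-tolerance argument, near-zero weights dominating neural-network weight distributions and mapping onto low conductances that are programmed accurately, can be caricatured numerically. Below, weights are mapped to differential conductance pairs and perturbed with programming noise proportional to conductance; the distributions and noise levels are invented, not measured SONOS characteristics:

```python
import numpy as np

rng = np.random.default_rng(5)

# Neural-network-like weights: heavily concentrated near zero.
w = rng.laplace(scale=0.05, size=(64, 64))
g_max = 1.0

# Differential-pair mapping: w = g_plus - g_minus, each in [0, g_max].
g_plus = np.clip(w, 0.0, g_max)
g_minus = np.clip(-w, 0.0, g_max)

def program(g, rel_sigma):
    # Programming error proportional to the target conductance, so near-zero
    # conductances (hence near-zero weights) are written accurately.
    return g + rel_sigma * g * rng.standard_normal(g.shape)

w_prog = program(g_plus, 0.05) - program(g_minus, 0.05)

x = rng.standard_normal(64)
err = np.linalg.norm(w_prog @ x - w @ x) / np.linalg.norm(w @ x)
print(f"relative MVM error with conductance-proportional noise: {err:.3%}")
```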

More Details

Sensitivity Analysis for Solutions to Heterogeneous Nonlocal Systems: Theoretical and Numerical Studies

Journal of Peridynamics and Nonlocal Modeling

Buczkowski, Nicole E.; Foss, Mikil D.; Parks, Michael L.; Radu, Petronela R.

The paper presents a collection of results on continuous dependence for solutions to nonlocal problems under perturbations of data and system parameters. The integral operators appearing in the systems capture interactions via heterogeneous kernels that exhibit different types of weak singularities, space dependence, and even regions of zero interaction. Here, the stability results showcase explicit bounds involving the measure of the domain, the interaction collar size, the nonlocal Poincaré constant, and other parameters. In the nonlinear setting, the bounds quantify in different Lp norms the sensitivity of solutions under different nonlinearity profiles. The results are validated by numerical simulations showcasing discontinuous solutions, varying horizons of interactions, and symmetric and heterogeneous kernels.
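
A discrete analog of the continuous-dependence results can be checked numerically: perturb a heterogeneous nonlocal kernel and watch the solution move proportionally. The 1-D setup below (kernel, horizon, collar, and forcing) is entirely illustrative:

```python
import numpy as np

# Discrete 1-D nonlocal diffusion with a heterogeneous kernel: perturb the
# kernel and measure how much the solution moves (continuous dependence).
n, horizon = 200, 0.1
x = np.linspace(0.0, 1.0, n)
dx = x[1] - x[0]

def nonlocal_matrix(kernel):
    # (L u)_i = sum_j K(x_i, x_j) * (u_j - u_i) * dx, truncated at the horizon.
    X, Y = np.meshgrid(x, x, indexing="ij")
    K = np.where(np.abs(X - Y) <= horizon, kernel(X, Y), 0.0)
    L = K * dx
    np.fill_diagonal(L, L.diagonal() - K.sum(axis=1) * dx)
    return L

def solve(kernel):
    L = nonlocal_matrix(kernel)
    f = np.sin(np.pi * x)
    # Homogeneous collar condition: pin u within one horizon of each end.
    collar = (x <= horizon) | (x >= 1.0 - horizon)
    inner = ~collar
    u = np.zeros(n)
    u[inner] = np.linalg.solve(-L[np.ix_(inner, inner)], f[inner])
    return u

base = lambda X, Y: 1.0 + 0.5 * X   # heterogeneous (space-dependent) kernel
u_base = solve(base)
for delta in (1e-3, 1e-2, 1e-1):
    pert = lambda X, Y, d=delta: base(X, Y) + d
    diff = np.linalg.norm(solve(pert) - u_base) * np.sqrt(dx)
    print(f"kernel perturbation {delta:.0e}: ||u_pert - u|| ~ {diff:.3e}")
```

The roughly linear growth of the solution difference with the kernel perturbation is the discrete counterpart of the explicit stability bounds the paper proves.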

More Details

Self-Induced Curvature in an Internally Loaded Peridynamic Fiber

Silling, Stewart A.

A straight fiber with nonlocal forces that are independent of bond strain is considered. These internal loads can either stabilize or destabilize the straight configuration. Transverse waves with long wavelength have unstable dispersion properties for certain combinations of nonlocal kernels and internal loads. When these unstable waves occur, deformation of the straight fiber into a circular arc can lower its potential energy in equilibrium. The equilibrium value of the radius of curvature is computed explicitly.
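
The instability criterion can be visualized with a toy dispersion relation: a nonlocal bond stiffness contributes a term like the integral of lambda(xi) * (1 - cos(k xi)), while a compressive internal load contributes a term like -p * k^2; when the load term wins at long wavelength, omega^2 < 0 and the straight configuration is unstable. The kernel and load values below are invented for illustration and are not the paper's model:

```python
import numpy as np

# Toy transverse dispersion relation for a straight fiber with unit density:
#   omega^2(k) = integral lambda(xi) * (1 - cos(k * xi)) dxi  -  p * k^2
def trap(y, x):
    # Simple trapezoidal rule (keeps the sketch NumPy-version-agnostic).
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

xi = np.linspace(-1.0, 1.0, 2001)
lam = np.exp(-(xi / 0.3) ** 2)            # illustrative nonlocal kernel
stiffness = trap(lam * xi**2, xi) / 2.0   # small-k coefficient of the integral term

k = np.linspace(1e-3, 10.0, 500)
for p in (0.5 * stiffness, 2.0 * stiffness):   # sub- and super-critical loads
    omega2 = np.array([trap(lam * (1.0 - np.cos(kk * xi)), xi)
                       for kk in k]) - p * k**2
    tag = "unstable at long wavelength" if omega2[0] < 0 else "stable"
    print(f"p/stiffness = {p / stiffness:.1f}: "
          f"omega^2(k->0) = {omega2[0]:+.2e} ({tag})")
```

In the unstable regime, buckling into a curved equilibrium lowers the energy, which is the mechanism behind the explicitly computed radius of curvature.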

More Details

Quantitative Performance Assessment of Proxy Apps and Parents (Report for ECP Proxy App Project Milestone ADCD-504-28)

Cook, Jeanine C.; Aaziz, Omar R.; Chen, Si C.; Godoy, William F.; Powell, Amy J.; Watson, Gregory W.; Vaughan, Courtenay T.; Wildani, Avani W.

The ECP Proxy Application Project has an annual milestone to assess the state of ECP proxy applications and their role in the overall ECP ecosystem. Our FY22 March/April milestone (ADCD-504-28) proposed to assess the fidelity of proxy applications compared to their respective parents in terms of kernel and I/O behavior, and predictability. Similarity techniques would be applied for quantitative comparison of proxy/parent kernel behavior; MACSio evaluation would continue, with support for OpenPMD backends explored; and the execution-time predictability of proxy apps with respect to their parents would be examined through a carefully designed scaling study and code comparisons. Note that in this FY we also have quantitative assessment milestones that are due in September and are, therefore, not included in the description above or in this report; another report on those deliverables will be generated and submitted upon their completion. To satisfy this milestone, the following specific tasks were completed:

- Study the ability of MACSio to represent I/O workloads of adaptive mesh codes.
- Re-define the performance counter groups for contemporary Intel and IBM platforms to better match specific hardware components and to better align across platforms, making cross-platform comparison more accurate.
- Perform a cosine similarity study based on the new performance counter groups on the Intel and IBM P9 platforms (the core computation is sketched below).
- Perform detailed analysis of performance counter data to accurately average and align the data, maintaining phases across all executions, and develop methods to reduce the set of collected performance counters used in cosine similarity analysis.
- Apply a quantitative similarity comparison between proxy and parent CPU kernels.
- Perform scaling studies to understand how accurately parent performance can be predicted from its respective proxy application.

This report presents highlights of these efforts.
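
The proxy/parent kernel comparison named in these tasks reduces, at its core, to a cosine similarity between performance-counter vectors. A minimal sketch of that comparison (the counter names and values here are fabricated placeholders, not ECP data):

```python
import numpy as np

# Hypothetical performance-counter group totals for a proxy app and its parent.
counters = ["flops", "l2_misses", "branch_misses", "dram_reads", "dram_writes"]
proxy  = np.array([8.1e11, 2.3e9, 4.0e8, 9.5e9, 3.1e9])
parent = np.array([7.6e11, 2.7e9, 3.6e8, 1.1e10, 3.4e9])

def cosine_similarity(a, b):
    # 1.0 means identical counter "direction"; in practice counters are
    # normalized (e.g., per instruction) before this comparison.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(f"proxy/parent cosine similarity: {cosine_similarity(proxy, parent):.4f}")
```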

More Details

Kokkos 3: Programming Model Extensions for the Exascale Era

IEEE Transactions on Parallel and Distributed Systems

Trott, Christian R.; Lebrun-Grandie, Damien; Arndt, Daniel; Ciesko, Jan; Dang, Vinh Q.; Ellingwood, Nathan D.; Gayatri, Rahulkumar; Harvey, Evan C.; Hollman, Daisy S.; Ibanez, Dan; Liber, Nevin; Madsen, Jonathan; Miles, Jeff; Poliakoff, David Z.; Powell, Amy J.; Rajamanickam, Sivasankaran R.; Simberg, Mikael; Sunderland, Dan; Turcksin, Bruno; Wilke, Jeremiah

As the push towards exascale hardware has increased the diversity of system architectures, performance portability has become a critical aspect for scientific software. We describe the Kokkos Performance Portable Programming Model that allows developers to write single source applications for diverse high-performance computing architectures. Kokkos provides key abstractions for both the compute and memory hierarchy of modern hardware. We describe the novel abstractions that have been added to Kokkos version 3 such as hierarchical parallelism, containers, task graphs, and arbitrary-sized atomic operations to prepare for exascale era architectures. We demonstrate the performance of these new features with reproducible benchmarks on CPUs and GPUs.

More Details