Adler, James A.; He, Yunhui H.; Hu, Xiaozhe H.; MacLachlan, Scott M.; Ohm, Peter B.

Advanced finite-element discretizations and preconditioners for models of poroelasticity have attracted significant attention in recent years. The equations of poroelasticity offer significant challenges in both areas, due to the potentially strong coupling between unknowns in the system, saddle-point structure, and the need to account for wide ranges of parameter values, including limiting behavior such as incompressible elasticity. This paper was motivated by an attempt to develop monolithic multigrid preconditioners for the discretization developed in [C. Rodrigo et al., Comput. Methods Appl. Mech. Engrg., 341 (2018), pp. 467--484]; we show here why this is a difficult task and, as a result, we modify the discretization in [Rodrigo et al.] through the use of a reduced-quadrature approximation, yielding a more “solver-friendly” discretization. Local Fourier analysis is used to optimize parameters in the resulting monolithic multigrid method, allowing a fair comparison between the performance and costs of methods based on Vanka and Braess--Sarazin relaxation. Further, numerical results are presented to validate the local Fourier analysis predictions and demonstrate efficiency of the algorithms. Finally, a comparison to existing block-factorization preconditioners is also given.

Nonlocal operators of fractional type are a popular modeling choice for applications that do not adhere to classical diffusive behavior; however, one major challenge in nonlocal simulations is the selection of model parameters. In this work we propose an optimization-based approach to parameter identification for fractional models with an optional truncation radius. We formulate the inference problem as an optimal control problem where the objective is to minimize the discrepancy between observed data and an approximate solution of the model, and the control variables are the fractional order and the truncation length. For the numerical solution of the minimization problem we propose a gradient-based approach, where we enhance the numerical performance by an approximation of the bilinear form of the state equation and its derivative with respect to the fractional order. Several numerical tests in one and two dimensions illustrate the theoretical results and show the robustness and applicability of our method.

We report the first nonjellium, systematic, density functional theory (DFT) study of intrinsic and extrinsic defects and defect levels in zinc-blende (cubic) gallium nitride. We use the local moment counter charge (LMCC) method, the standard Perdew-Burke-Ernzerhof (PBE) exchange-correlation potential, and two pseudopotentials, where the Ga 3d orbitals are either in the core (d0) or explicitly in the valence set (d10). We studied 64, 216, 512, and 1000 atom supercells, and demonstrated convergence to the infinite limit, crucial for delineating deep from shallow states near band edges, and for demonstrating the elimination of finite cell-size errors. Contrary to common claims, we find that exact exchange is not required to obtain defect levels across the experimental band gap. As was true in silicon, silicon carbide, and gallium arsenide, the extremal LMCC defect levels of the aggregate of defects yield an effective LMCC defect band gap that is within 10% of the experimental gap (3.3 eV) for both pseudopotentials. We demonstrate that the gallium vacancy is more complicated than previously reported. There is dramatic metastability: a nearest-neighbor nitrogen atom shifts into the gallium site, forming an antisite, nitrogen vacancy pair, which is more stable than the simple vacancy for positive charge states. Our assessment of the d0 and d10 pseudopotentials yields minimal differences in defect structures and defect levels. The better agreement of the d0 lattice constant with experiment suggests that the more computationally economical d0 pseudopotentials are sufficient to achieve the fidelity possible within the physical accuracy of DFT, and thereby enable calculations in larger supercells necessary to demonstrate convergence with respect to finite-size supercell errors.

X-ray tomography is capable of imaging the interior of objects in three dimensions non-invasively, with applications in biomedical imaging, materials science, electronic inspection, and other fields. The reconstruction process can be an ill-conditioned inverse problem, requiring regularization to obtain satisfactory results. Recently, deep learning has been adopted for tomographic reconstruction. Unlike iterative algorithms, which require a distribution that is known a priori, deep reconstruction networks can learn a prior distribution through sampling the training distributions. In this work, we develop a Physics-assisted Generative Adversarial Network (PGAN), a two-step algorithm for tomographic reconstruction. In contrast to previous efforts, our PGAN utilizes maximum-likelihood estimates derived from the measurements to regularize the reconstruction with both known physics and the learned prior. Compared with methods with less physics assisting in training, PGAN can reduce the photon requirement with limited projection angles to achieve a given error rate. The advantages of using a physics-assisted learned prior in X-ray tomography may further enable low-photon nanoscale imaging.

This report presents a specification for the Portals 4 network programming interface. Portals 4 is intended to allow scalable, high-performance network communication between nodes of a parallel computing system. Portals 4 is well suited to massively parallel processing and embedded systems. Portals 4 represents an adaptation of the data movement layer developed for massively parallel processing platforms, such as the 4500-node Intel TeraFLOPS machine. Sandia's Cplant cluster project motivated the development of Version 3.0, which was later extended to Version 3.3 as part of the Cray Red Storm machine and XT line. Version 4 is targeted to the next generation of machines employing advanced network interface architectures that support enhanced offload capabilities.

The current manuscript is a final report on the activities carried out under the Project LDRD-CIS #226834. In scientific terms, the work reported in this manuscript is a continuation of the efforts started with Project LDRD-express #223796 with final report of activities SAND2021-11481, see [83]. In this section we briefly explain what pre-existing developments motivated the current body of work and provide an overview of the activities developed with the funds provided. The overarching goal of the current project LDRD-CIS #226834 and the previous project LDRD-express #223796 is the development of numerical methods with mathematically guaranteed properties in order to solve the Euler-Maxwell system of plasma physics and generalizations thereof. Even though Project #223796 laid out general foundations of space and time discretization of the Euler-Maxwell system, overall, it was focused on the development of numerical schemes for purely electrostatic fluid-plasma models. In particular, the project developed a family of schemes with mathematically guaranteed robustness in order to solve the Euler-Poisson model. This model is an asymptotic limit where only the electrostatic response of the plasma is considered. Its primary feature is the presence of a non-local force, the electrostatic force, which introduces effects with infinite speed of propagation into the problem. Even though instantaneous propagation of perturbations may be considered nonphysical, there are plenty of physical regimes of technical interest where such an approximation is perfectly valid.

Quantum information processing has reached an inflection point, transitioning from proof-of-principle scientific experiments to small, noisy quantum processors. To accelerate this process and eventually move to fault-tolerant quantum computing, it is necessary to provide the scientific community with access to whitebox testbed systems. The Quantum Scientific Computing Open User Testbed (QSCOUT) provides scientists unique access to an innovative system to help advance quantum computing science.

The E3 transition in irradiated GaAs observed in deep level transient spectroscopy (DLTS) was recently discovered in Laplace-DLTS to encompass three distinct components. The component designated E3c was found to be metastable, reversibly bleached under minority carrier (hole) injection, with an introduction rate dependent upon Si doping density. It is shown through first-principles modeling that the E3c must be the intimate Si-vacancy pair, best described as a Si sitting in a divacancy Sivv. The bleached metastable state is enabled by a doubly site-shifting mechanism: Upon recharging, the defect undergoes a second site shift rather than returning to its original E3c-active configuration via reversing the first site shift. Identification of this defect offers insights into the short-time annealing kinetics in irradiated GaAs.

Errors in quantum logic gates are usually modeled by quantum process matrices (CPTP maps). But process matrices can be opaque and unwieldy. We show how to transform the process matrix of a gate into an error generator that represents the same information more usefully. We construct a basis of simple and physically intuitive elementary error generators, classify them, and show how to represent the error generator of any gate as a mixture of elementary error generators with various rates. Finally, we show how to build a large variety of reduced models for gate errors by combining elementary error generators and/or entire subsectors of generator space. We conclude with a few examples of reduced models, including one with just 9N^{2} parameters that describes almost all commonly predicted errors on an N-qubit processor.

Integrated computational materials engineering (ICME) models have been a crucial building block for modern materials development, relieving heavy reliance on experiments and significantly accelerating the materials design process. However, ICME models are also computationally expensive, particularly with respect to time integration for dynamics, which hinders the ability to study statistical ensembles and thermodynamic properties of large systems for long time scales. To alleviate the computational bottleneck, we propose to model the evolution of statistical microstructure descriptors as a continuous-time stochastic process using a non-linear Langevin equation, where the probability density function (PDF) of the statistical microstructure descriptors, which are also the quantities of interest (QoIs), is modeled by the Fokker–Planck equation. In this work, we discuss how to calibrate the drift and diffusion terms of the Fokker–Planck equation from the theoretical and computational perspectives. The calibrated Fokker–Planck equation can be used as a stochastic reduced-order model to simulate the evolution of the PDF of the statistical microstructure descriptors. Considering statistical microstructure descriptors in the microstructure evolution as QoIs, we demonstrate our proposed methodology in three ICME models: kinetic Monte Carlo, phase field, and molecular dynamics simulations.
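As a rough illustration of the link between a Langevin equation and its Fokker–Planck equation, the sketch below integrates a scalar Itô equation dX = a(X) dt + b(X) dW with the Euler-Maruyama scheme; the density of such paths obeys ∂p/∂t = −∂(a p)/∂x + ½ ∂²(b² p)/∂x². The scheme, the function names, and the linear drift and constant diffusion used in the usage lines are assumptions made for illustration, not the calibrated terms or the integrator of the paper.

```python
import math
import random


def euler_maruyama(drift, diffusion, x0, dt, n_steps, rng):
    """Integrate dX = drift(X) dt + diffusion(X) dW and return the sampled path."""
    x = x0
    path = [x]
    for _ in range(n_steps):
        # Brownian increment over one step: Gaussian with variance dt.
        dw = rng.gauss(0.0, math.sqrt(dt))
        x = x + drift(x) * dt + diffusion(x) * dw
        path.append(x)
    return path


# Hypothetical descriptor relaxing toward 0 with constant noise amplitude.
rng = random.Random(0)
path = euler_maruyama(lambda x: -x, lambda x: 0.3, x0=1.0, dt=0.01, n_steps=500, rng=rng)
```

Repeating the call with many independent seeds and building a histogram of the end points gives an empirical estimate of the PDF that the Fokker–Planck equation describes.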

We show, through the use of the Landauer-Büttiker (LB) formalism and a tight-binding (TB) model, that the transport gap of twinned graphene can be tuned through the application of a uniaxial strain in the direction normal to the twin band. Remarkably, we find that the transport gap E_{gap} bears a square-root dependence on the control parameter ϵ_{x} – ϵ_{c}, where ϵ_{x} is the applied uniaxial strain and ϵ_{c} ~ 19% is a critical strain. We interpret this dependence as evidence of criticality underlying a continuous phase transition, with ϵ_{x} – ϵ_{c} playing the role of control parameter and the transport gap E_{gap} playing the role of order parameter. For ϵ_{x} < ϵ_{c}, the transport gap is non-zero and the material is a semiconductor, whereas for ϵ_{x} > ϵ_{c} the transport gap closes to zero and the material becomes a conductor, which evinces a semiconductor-to-conductor phase transition. The computed critical exponent of 1/2 places the transition in the mean-field universality class, which enables far-reaching analogies with other systems in the same class.

Entangling gates in trapped-ion quantum computers are most often applied to stationary ions with initial motional distributions that are thermal and close to the ground state, while those demonstrations that involve transport generally use sympathetic cooling to reinitialize the motional state prior to applying a gate. Future systems with more ions, however, will face greater nonthermal excitation due to increased amounts of ion transport, exacerbated by longer operational times and variations over the trap array. In addition, pregate sympathetic cooling may be limited due to time costs and laser access constraints. In this paper, we analyze the impact of such coherent motional excitation on entangling-gate error by performing simulations of Mølmer-Sørensen (MS) gates on a pair of trapped-ion qubits with both thermal and coherent excitation present in a shared motional mode at the start of the gate. Here, we quantify how a small amount of coherent displacement erodes gate performance in the presence of experimental noise, and we demonstrate that adjusting the relative phase between the initial coherent displacement and the displacement induced by the gate or using Walsh modulation can suppress this error. We then use experimental data from transported ions to analyze the impact of coherent displacement on MS-gate error under realistic conditions.

We present a surrogate modeling framework for conservatively estimating measures of risk from limited realizations of an expensive physical experiment or computational simulation. Risk measures combine objective probabilities with the subjective values of a decision maker to quantify anticipated outcomes. Given a set of samples, we construct a surrogate model that produces estimates of risk measures that are always greater than their empirical approximations obtained from the training data. These surrogate models limit over-confidence in reliability and safety assessments and produce estimates of risk measures that converge much faster to the true value than purely sample-based estimates. We first detail the construction of conservative surrogate models that can be tailored to a stakeholder's risk preferences and then present an approach, based on stochastic orders, for constructing surrogate models that are conservative with respect to families of risk measures. Our surrogate models include biases that permit them to conservatively estimate the target risk measures. We provide theoretical results that show that these biases decay at the same rate as the L2 error in the surrogate model. Numerical demonstrations confirm that risk-adapted surrogate models do indeed overestimate the target risk measures while converging at the expected rate.

Neural networks are largely based on matrix computations. During forward inference, the most heavily used compute kernel is the matrix-vector multiplication (MVM): $W \vec{x}$. Inference is a first frontier for the deployment of next-generation hardware for neural network applications, as it is more readily deployed in edge devices, such as mobile devices or embedded processors with size, weight, and power constraints. Inference is also easier to implement in analog systems than training, which has more stringent device requirements. The main processing kernel used during inference is the MVM.
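For readers less familiar with the kernel, the MVM $W \vec{x}$ can be written out as one dot product per row of the weight matrix. The sketch below is a plain digital reference version; the function name `mvm` and the small weight matrix are made up for illustration and do not model any analog hardware behavior.

```python
from typing import List, Sequence


def mvm(weights: Sequence[Sequence[float]], x: Sequence[float]) -> List[float]:
    """Return W x for a dense matrix stored as a sequence of rows."""
    if any(len(row) != len(x) for row in weights):
        raise ValueError("each row of W must have the same length as x")
    # Each output element is the dot product of one row of W with x.
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # hypothetical 3x2 layer weights
x = [1.0, -1.0]                           # hypothetical input activations
y = mvm(W, x)                             # one output per row of W
```

In an in-memory analog array the same sums are typically formed physically, with stored conductances standing in for the weights; the loop above only fixes what result such hardware is meant to approximate.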

In this paper, we develop an algorithm to efficiently solve risk-averse optimization problems posed in reflexive Banach spaces. Such problems arise in many practical applications, e.g., as optimization problems constrained by partial differential equations with uncertain inputs. Unfortunately, for many popular risk models including the coherent risk measures, the resulting risk-averse objective function is nonsmooth. This lack of differentiability complicates the numerical approximation of the objective function as well as the numerical solution of the optimization problem. To address these challenges, we propose a primal–dual algorithm for solving large-scale nonsmooth risk-averse optimization problems. This algorithm is motivated by the classical method of multipliers and by epigraphical regularization of risk measures. As a result, the algorithm solves a sequence of smooth optimization problems using derivative-based methods. We prove convergence of the algorithm even when the subproblems are solved inexactly and conclude with numerical examples demonstrating the efficiency of our method.

Compared to the classical Lanczos algorithm, the s-step Lanczos variant has the potential to improve performance by asymptotically decreasing the synchronization cost per iteration. However, this comes at a price; despite being mathematically equivalent, the s-step variant may behave quite differently in finite precision, potentially exhibiting greater loss of accuracy and slower convergence relative to the classical algorithm. It has previously been shown that the errors in the s-step version follow the same structure as the errors in the classical algorithm, but are amplified by a factor depending on the square of the condition number of the (Formula presented.) -dimensional Krylov bases computed in each outer loop. As the condition number of these s-step bases grows (in some cases very quickly) with s, this limits the s values that can be chosen and thus can limit the attainable performance. In this work, we show that if a select few computations in s-step Lanczos are performed in double the working precision, the error terms then depend only linearly on the conditioning of the s-step bases. This has the potential for drastically improving the numerical behavior of the algorithm with little impact on per-iteration performance. Our numerical experiments demonstrate the improved numerical behavior possible with the mixed precision approach, and also show that this improved behavior extends to mixed precision s-step CG. We present preliminary performance results on NVIDIA V100 GPUs that show that the overhead of extra precision is minimal if one uses precisions implemented in hardware.
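One way to see why the conditioning of s-step bases can degrade with s is to look at a monomial basis v, Av, A²v, ...: successive vectors turn toward the dominant eigenvector and become nearly parallel. The sketch below measures the cosine of the angle between consecutive basis vectors for a small diagonal matrix. The monomial basis, the matrix, the starting vector, and the function name are assumptions chosen for illustration; the abstract does not say which basis is used, and the cosine is only a proxy for the condition number.

```python
import math


def successive_cosines(eigenvalues, s):
    """Cosine of the angle between A^k v and A^(k+1) v for k = 0..s-1.

    A is taken to be diagonal with the given integer eigenvalues and v is the
    all-ones vector, so the i-th entry of A^k v is eigenvalues[i] ** k.
    """
    cosines = []
    for k in range(s):
        dot = sum(lam ** (2 * k + 1) for lam in eigenvalues)
        norm_k_sq = sum(lam ** (2 * k) for lam in eigenvalues)
        norm_k1_sq = sum(lam ** (2 * k + 2) for lam in eigenvalues)
        cosines.append(dot / math.sqrt(norm_k_sq * norm_k1_sq))
    return cosines


# Hypothetical 10x10 diagonal matrix with eigenvalues 1..10.
cosines = successive_cosines(range(1, 11), 8)
```

For this example the cosines start near 0.89 and approach 1 as k grows, meaning the later basis vectors are almost linearly dependent.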

Voronin, Alexey; He, Yunhui; MacLachlan, Scott; Olson, Luke N.; Tuminaro, Raymond S.

A well-known strategy for building effective preconditioners for higher-order discretizations of some PDEs, such as Poisson's equation, is to leverage effective preconditioners for their low-order analogs. In this work, we show that high-quality preconditioners can also be derived for the Taylor–Hood discretization of the Stokes equations in much the same manner. In particular, we investigate the use of geometric multigrid based on the (Formula presented.) discretization of the Stokes operator as a preconditioner for the (Formula presented.) discretization of the Stokes system. We utilize local Fourier analysis to optimize the damping parameters for Vanka and Braess–Sarazin relaxation schemes and to achieve robust convergence. These results are then verified and compared against the measured multigrid performance. While geometric multigrid can be applied directly to the (Formula presented.) system, our ultimate motivation is to apply algebraic multigrid within solvers for (Formula presented.) systems via the (Formula presented.) discretization, which will be considered in a companion paper.

In this paper, we propose a hybrid method that uses stochastic and deterministic search to compute the maximum likelihood estimator of a low-rank count tensor with Poisson loss via state-of-the-art local methods. Our approach is inspired by Simulated Annealing for global optimization and allows for fine-grain parameter tuning as well as adaptive updates to algorithm parameters. We present numerical results that indicate our hybrid approach can compute better approximations to the maximum likelihood estimator with less computation than the state-of-the-art methods by themselves.

Many teams struggle to adapt and right-size software engineering best practices for quality assurance to fit their context. Introducing software quality is not usually framed in a way that motivates teams to take action, thus resulting in it becoming a “check the box for compliance” activity instead of a cultural practice that values software quality and the effort to achieve it. When and how can we provide effective incentives for software teams to adopt and integrate meaningful and enduring software quality practices? Here, we explored this question through a persona-based ideation exercise at the 2021 Collegeville Workshop on Scientific Software in which we created three unique personas that represent different scientific software developer perspectives.

Social systems are uniquely complex and difficult to study, but understanding them is vital to solving the world’s problems. The Ground Truth program developed a new way of testing the research methods that attempt to understand and leverage the Human Domain and its associated complexities. The program developed simulations of social systems as virtual world test beds. Not only were these simulations able to produce data on future states of the system under various circumstances and scenarios, but their causal ground truth was also explicitly known. Research teams studied these virtual worlds, facilitating deep validation of causal inference, prediction, and prescription methods. The Ground Truth program model provides a way to test and validate research methods to an extent previously impossible, and to study the intricacies and interactions of different components of research.

We demonstrate SONOS (silicon-oxide-nitride-oxide-silicon) analog memory arrays that are optimized for neural network inference. The devices are fabricated in a 40 nm process and operated in the subthreshold regime for in-memory matrix multiplication. Subthreshold operation enables low conductances to be implemented with low error, which matches the typical weight distribution of neural networks, which is heavily skewed toward near-zero values. This leads to high accuracy in the presence of programming errors and process variations. We simulate the end-to-end neural network inference accuracy, accounting for the measured programming error, read noise, and retention loss in a fabricated SONOS array. Evaluated on the ImageNet dataset using ResNet50, the accuracy using a SONOS system is within 2.16% of floating-point accuracy without any retraining. The unique error properties and high On/Off ratio of the SONOS device allow scaling to large arrays without bit slicing, and enable an inference architecture that achieves 20 TOPS/W on ResNet50, a >10× gain in energy efficiency over state-of-the-art digital and analog inference accelerators.

Buczkowski, Nicole E.; Foss, Mikil D.; Parks, Michael L.; Radu, Petronela R.

The paper presents a collection of results on continuous dependence for solutions to nonlocal problems under perturbations of data and system parameters. The integral operators appearing in the systems capture interactions via heterogeneous kernels that exhibit different types of weak singularities, space dependence, even regions of zero-interaction. Here, the stability results showcase explicit bounds involving the measure of the domain and of the interaction collar size, nonlocal Poincaré constant, and other parameters. In the nonlinear setting, the bounds quantify in different L^{p} norms the sensitivity of solutions under different nonlinearity profiles. The results are validated by numerical simulations showcasing discontinuous solutions, varying horizons of interactions, and symmetric and heterogeneous kernels.

A straight fiber with nonlocal forces that are independent of bond strain is considered. These internal loads can either stabilize or destabilize the straight configuration. Transverse waves with long wavelength have unstable dispersion properties for certain combinations of nonlocal kernels and internal loads. When these unstable waves occur, deformation of the straight fiber into a circular arc can lower its potential energy in equilibrium. The equilibrium value of the radius of curvature is computed explicitly.

The ECP Proxy Application Project has an annual milestone to assess the state of ECP proxy applications and their role in the overall ECP ecosystem. Our FY22 March/April milestone (ADCD-504-28) proposed to: Assess the fidelity of proxy applications compared to their respective parents in terms of kernel and I/O behavior, and predictability. Similarity techniques will be applied for quantitative comparison of proxy/parent kernel behavior. MACSio evaluation will continue and support for OpenPMD backends will be explored. The execution time predictability of proxy apps with respect to their parents will be explored through a carefully designed scaling study and code comparisons. Note that in this FY, we also have quantitative assessment milestones that are due in September and are, therefore, not included in the description above or in this report. Another report on these deliverables will be generated and submitted upon completion of these milestones. To satisfy this milestone, the following specific tasks were completed: Study the ability of MACSio to represent I/O workloads of adaptive mesh codes. Re-define the performance counter groups for contemporary Intel and IBM platforms to better match specific hardware components and to better align across platforms (make cross-platform comparison more accurate). Perform cosine similarity study based on the new performance counter groups on the Intel and IBM P9 platforms. Perform detailed analysis of performance counter data to accurately average and align the data to maintain phases across all executions and develop methods to reduce the set of collected performance counters used in cosine similarity analysis. Apply a quantitative similarity comparison between proxy and parent CPU kernels. Perform scaling studies to understand the accuracy of predictability of the parent performance using its respective proxy application. This report presents highlights of these efforts.
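The cosine similarity mentioned above compares two vectors by the angle between them, so it is insensitive to overall scale and responds only to the relative mix of values. A minimal sketch follows, assuming each application is summarized by one vector of normalized performance-counter group values; the counter values below are invented for illustration and are not data from the report.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical normalized counter-group profiles for a proxy and its parent.
proxy = [0.42, 0.31, 0.05, 0.22]
parent = [0.40, 0.35, 0.04, 0.21]
similarity = cosine_similarity(proxy, parent)
```

A value close to 1 indicates that the two profiles stress the hardware components in similar proportions, while a value near 0 indicates largely unrelated profiles.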

As the push towards exascale hardware has increased the diversity of system architectures, performance portability has become a critical aspect for scientific software. We describe the Kokkos Performance Portable Programming Model that allows developers to write single source applications for diverse high-performance computing architectures. Kokkos provides key abstractions for both the compute and memory hierarchy of modern hardware. We describe the novel abstractions that have been added to Kokkos version 3 such as hierarchical parallelism, containers, task graphs, and arbitrary-sized atomic operations to prepare for exascale era architectures. We demonstrate the performance of these new features with reproducible benchmarks on CPUs and GPUs.