Publications


Evaluating Spatial Accelerator Architectures with Tiled Matrix-Matrix Multiplication

IEEE Transactions on Parallel and Distributed Systems

Moon, Gordon E.; Kwon, Hyoukjun; Jeong, Geonhwa; Chatarasi, Prasanth; Rajamanickam, Sivasankaran R.; Krishna, Tushar

There is a growing interest in custom spatial accelerators for machine learning applications. These accelerators employ a spatial array of processing elements (PEs) interacting via custom buffer hierarchies and networks-on-chip. The efficiency of these accelerators comes from employing optimized dataflow (i.e., spatial/temporal partitioning of data across the PEs and fine-grained scheduling) strategies to optimize data reuse. The focus of this work is to evaluate these accelerator architectures using a tiled general matrix-matrix multiplication (GEMM) kernel. To do so, we develop a framework that finds optimized mappings (dataflow and tile sizes) for a tiled GEMM for a given spatial accelerator and workload combination, leveraging an analytical cost model for runtime and energy. Our evaluations over five spatial accelerators demonstrate that the tiled GEMM mappings systematically generated by our framework achieve high performance on various GEMM workloads and accelerators.
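
As a rough illustration of what a tiled GEMM looks like in software, the sketch below computes C = A·B one tile at a time with NumPy. The function name `tiled_gemm` and the tile sizes are illustrative assumptions, not the mappings produced by the framework described in the abstract; a real accelerator mapping would also decide which loops run spatially across PEs.

```python
import numpy as np

def tiled_gemm(A, B, tile_m=4, tile_n=4, tile_k=4):
    """Compute C = A @ B tile by tile (tile sizes here are illustrative)."""
    M, K = A.shape
    K2, N = B.shape
    if K != K2:
        raise ValueError("inner dimensions must match")
    C = np.zeros((M, N), dtype=np.result_type(A, B))
    for i0 in range(0, M, tile_m):          # tile rows of C
        for j0 in range(0, N, tile_n):      # tile columns of C
            for k0 in range(0, K, tile_k):  # tiles along the reduction dimension
                a = A[i0:i0 + tile_m, k0:k0 + tile_k]
                b = B[k0:k0 + tile_k, j0:j0 + tile_n]
                # The same output tile is reused while partial products accumulate.
                C[i0:i0 + tile_m, j0:j0 + tile_n] += a @ b
    return C
```

Slicing past the array edge simply yields a smaller tile, so matrix dimensions need not be multiples of the tile sizes.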


Large-Scale Atomistic Simulations: Investigating Free Expansion

Moore, Stan G.

The experiment investigates free expansion of a supercritical fluid into a two-phase liquid-vapor coexistence region. A huge molecular dynamics simulation (6 billion Lennard-Jones atoms) was run on 5760 GPUs (33% of LLNL Sierra) using LAMMPS/Kokkos software. The work also improved the visualization workflow and started preliminary simulations of aluminum using the SNAP machine learning potential.


Elucidating size effects on the yield strength of single-crystal Cu via the Richtmyer–Meshkov instability

Journal of Applied Physics

Stewart, James A.; Wood, Mitchell A.; Olles, Joseph O.

Capturing the dynamic response of a material under high strain-rate deformation often demands challenging and time-consuming experimental effort. While shock hydrodynamic simulation methods can aid in this area, a priori characterizations of the material strength under shock loading and spall failure are needed in order to parameterize constitutive models needed for these computational tools. Moreover, parameterizations of strain-rate-dependent strength models are needed to capture the full suite of Richtmyer–Meshkov instability (RMI) behavior of shock-compressed metals, placing an unrealistic demand on experiments alone to supply these training data. Herein, we sweep a large range of geometric, crystallographic, and shock conditions within molecular dynamics (MD) simulations and demonstrate the breadth of RMI in Cu that can be captured from the atomic scale. In this work, yield strength measurements from jetted and arrested material from a sinusoidal surface perturbation were quantified as $Y_{\mathrm{RMI}}$ = 0.787 ± 0.374 GPa, higher than strain-rate-independent models used in experimentally matched hydrodynamic simulations. Defect-free, single-crystal Cu samples used in MD will overestimate $Y_{\mathrm{RMI}}$, but the drastic scale difference between experiment and MD is highlighted by high-confidence neighborhood clustering predictions of RMI characterizations, yielding incorrect classifications.


First-passage time statistics on surfaces of general shape: Surface PDE solvers using Generalized Moving Least Squares (GMLS)

Journal of Computational Physics

Gross, B.J.; Kuberry, Paul A.; Atzberger, P.J.

We develop numerical methods for computing statistics of stochastic processes on surfaces of general shape with drift-diffusion dynamics $dX_t = a(X_t)\,dt + b(X_t)\,dW_t$. We formulate descriptions of Brownian motion and general drift-diffusion processes on surfaces. We consider statistics of the form $u(x) = \mathbb{E}^{x}\left[\int_0^{\tau} g(X_t)\,dt\right] + \mathbb{E}^{x}\left[f(X_\tau)\right]$ for a domain $\Omega$ and the exit stopping time $\tau = \inf_t \{t > 0 \mid X_t \notin \Omega\}$, where $f, g$ are general smooth functions. For computing these statistics, we develop high-order Generalized Moving Least Squares (GMLS) solvers for associated surface PDE boundary-value problems based on Backward-Kolmogorov equations. We focus particularly on the mean First Passage Times (FPTs) given by the case $f = 0$, $g = 1$ where $u(x) = \mathbb{E}^{x}[\tau]$. We perform studies for a variety of shapes showing our methods converge with high-order accuracy both in capturing the geometry and the surface PDE solutions. We then perform studies showing how statistics are influenced by the surface geometry, drift dynamics, and spatially dependent diffusivities.
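
To make the link between these statistics and a boundary-value problem concrete, the block below writes out the flat, one-dimensional analogue of the Backward-Kolmogorov problem. This is a sketch under simplifying assumptions (scalar $a$ and $b$, an interval $\Omega = (0, \ell)$ with $\ell$ introduced here for illustration); on a surface the derivatives would be replaced by their surface counterparts, which the abstract does not spell out.

```latex
% Backward-Kolmogorov boundary-value problem (flat 1D analogue)
a(x)\,u'(x) + \tfrac{1}{2}\,b(x)^{2}\,u''(x) = -g(x) \quad \text{in } \Omega,
\qquad u = f \quad \text{on } \partial\Omega .

% Mean first-passage time: f = 0, g = 1, with a = 0, b = 1 on \Omega = (0, \ell)
\tfrac{1}{2}\,u''(x) = -1, \quad u(0) = u(\ell) = 0
\quad \Longrightarrow \quad u(x) = \mathbb{E}^{x}[\tau] = x\,(\ell - x).
```

The closed form can be checked directly: $u''(x) = -2$, so $\tfrac{1}{2}u'' = -1$, and $u$ vanishes at both endpoints.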


Three-Photon Optical Pumping for Trapped Ion Quantum Computing

Hogle, Craig W.; Ivory, Megan K.; Lobser, Daniel L.; Ruzic, Brandon R.; DeRose, Christopher T.

In this report we describe the testing of a novel scheme for state preparation of trapped ions in a quantum computing setup. Ideally, this technique would offer precision and speed of state preparation similar to current methods while allowing individual addressability of single ions in a chain, using technology already available in a trapped ion experiment. As quantum computing experiments become more complicated, mid-experiment measurements will become necessary to implement algorithms such as quantum error correction. Any mid-experiment measurement then requires the measured qubit to be re-prepared to a known quantum state. Currently this requires moving the protected qubits a sizeable distance away from the qubit being re-prepared, which can be costly in terms of experiment length and can introduce errors. Theoretical calculations predict that a three-photon process would allow for state preparation without qubit movement, with efficiencies similar to current state preparation methods.


Navier-Stokes Equations Do Not Describe the Smallest Scales of Turbulence in Gases

Physical Review Letters

McMullen, Ryan M.; Krygier, Michael K.; Torczynski, J.R.; Gallis, Michail A.

In turbulent flows, kinetic energy is transferred from the largest scales to progressively smaller scales, until it is ultimately converted into heat. The Navier-Stokes equations are almost universally used to study this process. Here, by comparing with molecular-gas-dynamics simulations, we show that the Navier-Stokes equations do not describe turbulent gas flows in the dissipation range because they neglect thermal fluctuations. We investigate decaying turbulence produced by the Taylor-Green vortex and find that in the dissipation range the molecular-gas-dynamics spectra grow quadratically with wave number due to thermal fluctuations, in agreement with previous predictions, while the Navier-Stokes spectra decay exponentially. Furthermore, the transition to quadratic growth occurs at a length scale much larger than the gas molecular mean free path, namely in a regime that the Navier-Stokes equations are widely believed to describe. In fact, our results suggest that the Navier-Stokes equations are not guaranteed to describe the smallest scales of gas turbulence for any positive Knudsen number.


A theoretical investigation of the hydrolysis of uranium hexafluoride: the initiation mechanism and vibrational spectroscopy

Physical Chemistry Chemical Physics. PCCP

Lutz, Jesse J.; Byrd, Jason B.; Lotrich, Victor L.; Jensen, Daniel S.; Zador, Judit Z.; Hubbard, Joshua A.

Depleted uranium hexafluoride (UF6), a stockpiled byproduct of the nuclear fuel cycle, reacts readily with atmospheric humidity, but the mechanism is poorly understood. Here we compare several potential initiation steps at a consistent level of theory, generating underlying structures and vibrational modes using hybrid density functional theory (DFT) and computing relative energies of stationary points with double-hybrid (DH) DFT. A benchmark comparison is performed to assess the quality of DH-DFT data using reference energy differences obtained using a complete-basis-limit coupled-cluster (CC) composite method. The associated large-basis CC computations were enabled by a new general-purpose pseudopotential capability implemented as part of this work. Dispersion-corrected parameter-free DH-DFT methods, namely PBE0-DH-D3(BJ) and PBE-QIDH-D3(BJ), provided mean unsigned errors within chemical accuracy (1 kcal mol$^{-1}$) for a set of barrier heights corresponding to the most energetically favorable initiation steps. The hydrolysis mechanism is found to proceed via intermolecular hydrogen transfer within van der Waals complexes involving UF6, UF5OH, and UOF4, in agreement with previous studies, followed by the formation of a previously unappreciated dihydroxide intermediate, UF4(OH)2. The dihydroxide is predicted to form under both kinetic and thermodynamic control, and, unlike the alternate pathway leading to the UO2F2 monomer, its reaction energy is exothermic, in agreement with observation. Finally, harmonic and anharmonic vibrational simulations are performed to reinterpret literature infrared spectroscopy in light of this newly identified species.


A hybrid meshfree discretization to improve the numerical performance of peridynamic models

Computer Methods in Applied Mechanics and Engineering

Shojaei, Arman; Hermann, Alexander; Cyron, Christian J.; Seleson, Pablo; Silling, Stewart A.

Efficient and accurate calculation of spatial integrals is of major interest in the numerical implementation of peridynamics (PD). The standard way to perform this calculation is a particle-based approach that discretizes the strong form of the PD governing equation. This approach has rapidly been adopted by the PD community since it offers some advantages. It is computationally cheaper than other available schemes, can conveniently handle material separation, and effectively deals with nonlinear PD models. Nevertheless, PD models are still computationally very expensive compared with those based on the classical continuum mechanics theory, particularly for large-scale problems in three dimensions. This results from the nonlocal nature of the PD theory which leads to interactions of each node of a discretized body with multiple surrounding nodes. Here, we propose a new approach to significantly boost the numerical efficiency of PD models. We propose a discretization scheme that employs a simple collocation procedure and is truly meshfree; i.e., it does not depend on any background integration cells. In contrast to the standard scheme, the proposed scheme requires a much smaller set of neighboring nodes (keeping the same physical length scale) to achieve a specific accuracy and is thus computationally more efficient. Our new scheme is applicable to the case of linear PD models and within neighborhoods where the solution can be approximated by smooth basis functions. Therefore, to fully exploit the advantages of both the standard and the proposed schemes, a hybrid discretization is presented that combines both approaches within an adaptive framework. The high performance of the developed framework is illustrated by several numerical examples, including brittle fracture and corrosion problems in two and three dimensions.


Krylov subspace recycling for evolving structures

Computer Methods in Applied Mechanics and Engineering

Bolten, M.; de Sturler, E.; Hahn, C.; Parks, Michael L.

Krylov subspace recycling is a powerful tool when solving a long series of large, sparse linear systems that change only slowly over time. In PDE constrained shape optimization, these series appear naturally, as typically hundreds or thousands of optimization steps are needed with only small changes in the geometry. In this setting, however, applying Krylov subspace recycling can be a difficult task. As the geometry evolves, in general, so does the finite element mesh defined on or representing this geometry, including the numbers of nodes and elements and element connectivity. This is especially the case if re-meshing techniques are used. As a result, the number of algebraic degrees of freedom in the system changes, and in general the linear system matrices resulting from the finite element discretization change size from one optimization step to the next. Changes in the mesh connectivity also lead to structural changes in the matrices. In the case of re-meshing, even if the geometry changes only a little, the corresponding mesh might differ substantially from the previous one. Obviously, this prevents any straightforward mapping of the approximate invariant subspace of the linear system matrix (the focus of recycling in this paper) from one optimization step to the next; similar problems arise for other selected subspaces. In this paper, we present an algorithm to map an approximate invariant subspace of the linear system matrix for the previous optimization step to an approximate invariant subspace of the linear system matrix for the current optimization step, for general meshes. This is achieved by exploiting the map from coefficient vectors to finite element functions on the mesh, combined with interpolation or approximation of functions on the finite element mesh. We demonstrate the effectiveness of our approach numerically with several proof of concept studies for a specific meshing technique.


Pyomo.GDP: an ecosystem for logic based modeling and optimization development

Optimization and Engineering

Chen, Qi; Johnson, Emma S.; Bernal, David E.; Valentin, Romeo; Kale, Sunjeev; Bates, Johnny; Siirola, John D.; Grossmann, Ignacio E.

We present three core principles for engineering-oriented integrated modeling and optimization tool sets—intuitive modeling contexts, systematic computer-aided reformulations, and flexible solution strategies—and describe how new developments in Pyomo.GDP for Generalized Disjunctive Programming (GDP) advance this vision. We describe a new logical expression system implementation for Pyomo.GDP allowing for a more intuitive description of logical propositions. The logical expression system supports automated reformulation of these logical constraints to linear constraints. We also describe two new logic-based global optimization solver implementations built on Pyomo.GDP that exploit logical structure to avoid “zero-flow” numerical difficulties that arise in nonlinear network design problems when nodes or streams disappear. These new solvers also demonstrate the capability to link to external libraries for expanded functionality within an integrated implementation. We present these new solvers in the context of a flexible array of solution paths available to GDP models. Finally, we present results on a new library of GDP models demonstrating the value of multiple solution approaches.


Hole in one: Pathways to deterministic single-acceptor incorporation in Si(100)-2 × 1

AVS Quantum Science

Campbell, Quinn C.; Baczewski, Andrew D.; Butera, R.E.; Misra, Shashank M.

Stochastic incorporation kinetics can be a limiting factor in the scalability of semiconductor fabrication technologies using atomic-precision techniques. While these technologies have recently been extended from donors to acceptors, the extent to which kinetics will impact single-acceptor incorporation has yet to be assessed. To identify the precursor molecule and dosing conditions that are promising for deterministic incorporation, we develop and apply an atomistic model for the single-acceptor incorporation rates of several recently demonstrated molecules: diborane (B2H6), boron trichloride (BCl3), and aluminum trichloride in both monomer (AlCl3) and dimer forms (Al2Cl6). While all three precursors can realize single-acceptor incorporation, we predict that diborane is unlikely to realize deterministic incorporation, boron trichloride can realize deterministic incorporation with modest heating (50 °C), and aluminum trichloride can realize deterministic incorporation at room temperature. We conclude that both boron and aluminum trichloride are promising precursors for atomic-precision single-acceptor applications, with the potential to enable the reliable production of large arrays of single-atom quantum devices.


Atomic step disorder on polycrystalline surfaces leads to spatially inhomogeneous work functions

Journal of Vacuum Science and Technology A

Bussmann, Ezra B.; Smith, Sean W.; Scrymgeour, David S.; Brumbach, Michael T.; Lu, Ping L.; Dickens, Sara D.; Michael, Joseph R.; Ohta, Taisuke O.; Hjalmarson, Harold P.; Schultz, Peter A.; Clem, Paul G.; Hopkins, Matthew M.; Moore, Christopher M.

Structural disorder causes materials’ surface electronic properties, e.g., work function ([Formula: see text]), to vary spatially, yet it is challenging to prove exact causal relationships to underlying ensemble disorder, e.g., roughness or granularity. For polycrystalline Pt, nanoscale resolution photoemission threshold mapping reveals a spatially varying [Formula: see text] eV over a distribution of (111) vicinal grain surfaces prepared by sputter deposition and annealing. With regard to field emission and related phenomena, e.g., vacuum arc initiation, a salient feature of the [Formula: see text] distribution is that it is skewed with a long tail to values down to 5.4 eV, i.e., far below the mean, which is exponentially impactful to field emission via the Fowler–Nordheim relation. We show that the [Formula: see text] spatial variation and distribution can be explained by ensemble variations of granular tilts and surface slopes via a Smoluchowski smoothing model wherein local [Formula: see text] variations result from spatially varying densities of electric dipole moments, intrinsic to atomic steps, that locally modify [Formula: see text]. Atomic step-terrace structure is confirmed with scanning tunneling microscopy (STM) at several locations on our surfaces, and prior works showed STM evidence for atomic step dipoles at various metal surfaces. From our model, we find an atomic step edge dipole [Formula: see text] D/edge atom, which is comparable to values reported in studies that utilized other methods and materials. Our results elucidate a connection between macroscopic [Formula: see text] and the nanostructure that may contribute to the spread of reported [Formula: see text] for Pt and other surfaces and may be useful toward more complete descriptions of polycrystalline metals in the models of field emission and other related vacuum electronics phenomena, e.g., arc initiation.


Finding Electronic Structure Machine Learning Surrogates without Training

Fiedler, Lenz F.; Hoffmann, Nils H.; Mohammed, Parvez M.; Popoola, Gabriel A.; Yovell, Tamar Y.; Oles, Vladyslav O.; Ellis, Austin E.; Rajamanickam, Sivasankaran R.; Cangi, Attila

A myriad of phenomena in materials science and chemistry rely on quantum-level simulations of the electronic structure in matter. While moving to larger length and time scales has been a pressing issue for decades, such large-scale electronic structure calculations are still challenging despite modern software approaches and advances in high-performance computing. The silver lining in this regard is the use of machine learning to accelerate electronic structure calculations – this line of research has recently gained growing attention. The grand challenge therein is finding a suitable machine-learning model during a process called hyperparameter optimization. This, however, causes a massive computational overhead in addition to that of data generation. We accelerate the construction of machine-learning surrogate models by roughly two orders of magnitude by circumventing excessive training during the hyperparameter optimization phase. We demonstrate our workflow for Kohn-Sham density functional theory, the most popular computational method in materials science and chemistry.


Unified Memory: GPGPU-Sim/UVM Smart Integration

Liu, Yechen L.; Rogers, Timothy R.; Hughes, Clayton H.

CPU/GPU heterogeneous compute platforms are a ubiquitous element in computing, and a programming model designed for this heterogeneous computing model is important for both performance and programmability. A programming model that exposes the shared, unified address space between the heterogeneous units is a necessary step in this direction, as it removes the burden of explicit data movement from the programmer while maintaining performance. GPU vendors, such as AMD and NVIDIA, have released software-managed runtimes that can provide programmers the illusion of unified CPU and GPU memory by automatically migrating data in and out of the GPU memory. However, this runtime support is not included in GPGPU-Sim, a commonly used framework that models the features of a modern graphics processor that are relevant to non-graphics applications. UVM Smart was developed to extend GPGPU-Sim 3.x to incorporate the modeling of on-demand paging and data migration through the runtime. This report discusses the integration of UVM Smart and GPGPU-Sim 4.0 and the modifications to improve simulation performance and accuracy.


Randomized Cholesky Preconditioning for Graph Partitioning Applications

Espinoza, Heliezer J.; Loe, Jennifer A.; Boman, Erik G.

Graph partitioning has emerged as an area of interest due to its use in various applications in computational research. One way to partition a graph is to solve for the eigenvectors of the corresponding graph Laplacian matrix. This project focuses on the eigensolver LOBPCG and the evaluation of a new preconditioner: Randomized Cholesky Factorization (rchol). This preconditioner was tested for its speed and accuracy against other well-known preconditioners for the method. After experiments were run on several known test matrices, rchol appears to be a better preconditioner for structured matrices. This research was sponsored by the National Nuclear Security Administration Minority Serving Institutions Internship Program (NNSA-MSIIP) and completed at host facility Sandia National Laboratories. As such, after discussion of the research project itself, this report contains a brief reflection on experience gained as a result of participating in the NNSA-MSIIP.


A silicon singlet–triplet qubit driven by spin-valley coupling

Nature Communications

Jock, Ryan M.; Jacobson, Noah T.; Rudolph, Martin R.; Ward, Daniel R.; Carroll, Malcolm S.; Luhman, Dwight R.

Spin–orbit effects, inherent to electrons confined in quantum dots at a silicon heterointerface, provide a means to control electron spin qubits without the added complexity of on-chip, nanofabricated micromagnets or nearby coplanar striplines. Here, we demonstrate a singlet–triplet qubit operating mode that can drive qubit evolution at frequencies in excess of 200 MHz. This approach offers a means to electrically turn on and off fast control, while providing high logic gate orthogonality and long qubit dephasing times. We utilize this operational mode for dynamical decoupling experiments to probe the charge noise power spectrum in a silicon metal-oxide-semiconductor double quantum dot. In addition, we assess qubit frequency drift over longer timescales to capture low-frequency noise. We present the charge noise power spectral density up to 3 MHz, which exhibits a $1/f^{\alpha}$ dependence consistent with $\alpha \sim 0.7$, over 9 orders of magnitude in noise frequency.


A Block-Based Triangle Counting Algorithm on Heterogeneous Environments

IEEE Transactions on Parallel and Distributed Systems

Yasar, Abdurrahman; Rajamanickam, Sivasankaran R.; Berry, Jonathan W.; Catalyurek, Umit V.

Triangle counting is a fundamental building block in graph algorithms. In this article, we propose a block-based triangle counting algorithm to reduce data movement during both sequential and parallel execution. Our block-based formulation makes the algorithm naturally suitable for heterogeneous architectures. The problem of partitioning the adjacency matrix of a graph is well-studied. Our task decomposition goes one step further: it partitions the set of triangles in the graph. By streaming these small tasks to compute resources, we can solve problems that do not fit on a device. We demonstrate the effectiveness of our approach by providing an implementation on a compute node with multiple sockets, cores and GPUs. The current state-of-the-art in triangle enumeration processes the Friendster graph in 2.1 seconds, not including data copy time between CPU and GPU. Using that metric, our approach is 20 percent faster. When copy times are included, our algorithm takes 3.2 seconds. This is 5.6 times faster than the fastest published CPU-only time.
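
The idea of splitting the set of triangles into independent tasks can be sketched with a dense adjacency matrix, as below. This is an illustrative toy, not the authors' algorithm: the function name `count_triangles_blocked`, the dense NumPy representation, and the fixed block size are all assumptions. Each task handles one triple of vertex blocks, and every triangle falls into exactly one task.

```python
import numpy as np

def count_triangles_blocked(adj, block=2):
    """Count triangles by summing independent block-triple tasks."""
    # Keep only edges i -> j with i < j so each triangle is counted once.
    U = np.triu(np.asarray(adj, dtype=np.int64), k=1)
    n = U.shape[0]
    starts = list(range(0, n, block))
    total = 0
    for i in starts:
        for j in starts:
            if j < i:
                continue  # block lies below the diagonal, so it is all zeros
            for k in starts:
                if k < j:
                    continue  # same reason for the (j, k) block
                # Task (i, j, k): wedges i -> j -> k that are closed by edge i -> k.
                wedges = U[i:i + block, j:j + block] @ U[j:j + block, k:k + block]
                total += int((wedges * U[i:i + block, k:k + block]).sum())
    return total
```

Because the tasks are independent, they could in principle be streamed to different compute resources, which is the property the abstract highlights.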


Neuromorphic scaling advantages for energy-efficient random walk computations

Nature Electronics

Smith, John D.; Hill, Aaron J.; Reeder, Leah E.; Franke, Brian C.; Lehoucq, Richard B.; Parekh, Ojas D.; Severa, William M.; Aimone, James B.

Neuromorphic computing, which aims to replicate the computational structure and architecture of the brain in synthetic hardware, has typically focused on artificial intelligence applications. What is less explored is whether such brain-inspired hardware can provide value beyond cognitive tasks. Here we show that the high degree of parallelism and configurability of spiking neuromorphic architectures makes them well suited to implement random walks via discrete-time Markov chains. These random walks are useful in Monte Carlo methods, which represent a fundamental computational tool for solving a wide range of numerical computing tasks. Using IBM’s TrueNorth and Intel’s Loihi neuromorphic computing platforms, we show that our neuromorphic computing algorithm for generating random walk approximations of diffusion offers advantages in energy-efficient computation compared with conventional approaches. We also show that our neuromorphic computing algorithm can be extended to more sophisticated jump-diffusion processes that are useful in a range of applications, including financial economics, particle physics and machine learning.


Assessing the predictive impact of factor fixing with an adaptive uncertainty-based approach

Environmental Modelling and Software

Wang, Qian; Guillaume, Joseph H.A.; Jakeman, John D.; Yang, Tao; Iwanaga, Takuya; Croke, Barry; Jakeman, Anthony J.

Despite widespread use of factor fixing in environmental modeling, its effect on model predictions has received little attention and is instead commonly presumed to be negligible. We propose a proof-of-concept adaptive method for systematically investigating the impact of factor fixing. The method uses Global Sensitivity Analysis methods to identify groups of sensitive parameters, then quantifies which groups can be safely fixed at nominal values without exceeding a maximum acceptable error, demonstrated using the 21-dimensional Sobol’ G-function. Three error measures are considered for quantities of interest, namely Relative Mean Absolute Error, Pearson Product-Moment Correlation and Relative Variance. Results demonstrate that factor fixing may cause large errors in the model results unexpectedly, when preliminary analysis suggests otherwise, and that the default value selected affects the number of factors to fix. To improve the applicability and methodological development of factor fixing, a new research agenda encompassing five opportunities is discussed for further attention.


A data-driven peridynamic continuum model for upscaling molecular dynamics

Computer Methods in Applied Mechanics and Engineering

You, Huaiqian; Yu, Yue; Silling, Stewart A.; D'Elia, Marta D.

Nonlocal models, including peridynamics, often use integral operators that embed lengthscales in their definition. However, the integrands in these operators are difficult to define from the data that are typically available for a given physical system, such as laboratory mechanical property tests. In contrast, molecular dynamics (MD) does not require these integrands, but it suffers from computational limitations in the length and time scales it can address. To combine the strengths of both methods and to obtain a coarse-grained, homogenized continuum model that efficiently and accurately captures materials’ behavior, we propose a learning framework to extract, from MD data, an optimal Linear Peridynamic Solid (LPS) model as a surrogate for MD displacements. To maximize the accuracy of the learnt model we allow the peridynamic influence function to be partially negative, while preserving the well-posedness of the resulting model. To achieve this, we provide sufficient well-posedness conditions for discretized LPS models with sign-changing influence functions and develop a constrained optimization algorithm that minimizes the equation residual while enforcing such solvability conditions. This framework guarantees that the resulting model is mathematically well-posed, physically consistent, and that it generalizes well to settings that are different from the ones used during training. We illustrate the efficacy of the proposed approach with several numerical tests for single layer graphene. Our two-dimensional tests show the robustness of the proposed algorithm on validation data sets that include thermal noise, different domain shapes and external loadings, and discretizations substantially different from the ones used for training.


LAMMPS - a flexible simulation tool for particle-based materials modeling at the atomic, meso, and continuum scales

Computer Physics Communications

Thompson, Aidan P.; Aktulga, H.M.; Berger, Richard; Bolintineanu, Dan S.; Brown, W.M.; Crozier, Paul C.; in 't Veld, Pieter J.; Kohlmeyer, Axel; Moore, Stan G.; Nguyen, Trung D.; Shan, Ray; Stevens, Mark J.; Tranchida, Julien; Trott, Christian R.; Plimpton, Steven J.

Since the classical molecular dynamics simulator LAMMPS was released as an open source code in 2004, it has become a widely-used tool for particle-based modeling of materials at length scales ranging from atomic to mesoscale to continuum. Reasons for its popularity are that it provides a wide variety of particle interaction models for different materials, that it runs on any platform from a single CPU core to the largest supercomputers with accelerators, and that it gives users control over simulation details, either via the input script or by adding code for new interatomic potentials, constraints, diagnostics, or other features needed for their models. As a result, hundreds of people have contributed new capabilities to LAMMPS and it has grown from fifty thousand lines of code in 2004 to a million lines today. In this paper several of the fundamental algorithms used in LAMMPS are described along with the design strategies which have made it flexible for both users and developers. We also highlight some capabilities recently added to the code which were enabled by this flexibility, including dynamic load balancing, on-the-fly visualization, magnetic spin dynamics models, and quantum-accuracy machine learning interatomic potentials.

Program Summary:
Program Title: Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS)
CPC Library link to program files: https://doi.org/10.17632/cxbxs9btsv.1
Developer's repository link: https://github.com/lammps/lammps
Licensing provisions: GPLv2
Programming language: C++, Python, C, Fortran
Supplementary material: https://www.lammps.org
Nature of problem: Many science applications in physics, chemistry, materials science, and related fields require parallel, scalable, and efficient generation of long, stable classical particle dynamics trajectories. Within this common problem definition, there lies a great diversity of use cases, distinguished by different particle interaction models, external constraints, as well as timescales and lengthscales ranging from atomic to mesoscale to macroscopic.
Solution method: The LAMMPS code uses parallel spatial decomposition, distributed neighbor lists, and parallel FFTs for long-range Coulombic interactions [1]. The time integration algorithm is based on the Størmer-Verlet symplectic integrator [2], which provides better stability than higher-order non-symplectic methods. In addition, LAMMPS supports a wide range of interatomic potentials, constraints, diagnostics, software interfaces, and pre- and post-processing features.
Additional comments including restrictions and unusual features: This paper serves as the definitive reference for the LAMMPS code.
References:
[1] S. Plimpton, Fast parallel algorithms for short-range molecular dynamics. J. Comp. Phys. 117 (1995) 1–19.
[2] L. Verlet, Computer experiments on classical fluids: I. Thermodynamical properties of Lennard–Jones molecules, Phys. Rev. 159 (1967) 98–103.
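
For readers unfamiliar with the Størmer-Verlet scheme mentioned in the solution method, a minimal sketch of its velocity form for a single particle in one dimension might look like the following. This is not LAMMPS code; the function name `velocity_verlet` and its arguments are hypothetical, and the harmonic force in the usage lines is just a convenient test case.

```python
def velocity_verlet(x, v, force, mass, dt, steps):
    """Advance position x and velocity v with the velocity form of Størmer-Verlet."""
    f = force(x)
    for _ in range(steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # second half-step velocity update
    return x, v

# Harmonic oscillator with unit mass and unit spring constant (illustrative test case).
x_end, v_end = velocity_verlet(1.0, 0.0, lambda x: -x, 1.0, 0.01, 1000)
energy = 0.5 * v_end**2 + 0.5 * x_end**2
```

With this step size the total energy stays close to its initial value of 0.5 rather than drifting, which is the kind of long-time stability the abstract attributes to symplectic integration.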

Randomized Cholesky Preconditioning for Graph Partitioning Applications

Espinoza, Heliezer J.; Loe, Jennifer A.; Boman, Erik G.

A graph is a mathematical representation of a network: it consists of a set of vertices connected by edges. Graphs have numerous applications in various fields, as they can model all sorts of connections, processes, or relations. For example, graphs can model intricate transit systems or the human nervous system. However, graphs that are large or complicated become difficult to analyze. This is why there is increased interest in graph partitioning, which divides a graph into multiple smaller parts. For example, partitions of a graph representing a social network might help identify clusters of friends or colleagues. Graph partitioning is also a widely used approach to load balancing in parallel computing. Decomposing a graph into smaller parts allows for easier analysis. There are different ways to solve graph partitioning problems. For this work, we focus on a spectral partitioning method which forms a partition based upon the eigenvectors of the graph Laplacian (details presented in Acer et al.). This method uses the LOBPCG algorithm to compute these eigenvectors. LOBPCG can be accelerated by an operator called a preconditioner. For this internship, we evaluate a randomized Cholesky (rchol) preconditioner for its effectiveness on graph partitioning problems with LOBPCG. We compare it with two standard preconditioners: Jacobi and Incomplete Cholesky (ichol). This research was conducted from August to December 2021 in conjunction with Sandia National Laboratories.
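
The idea of partitioning by Laplacian eigenvectors can be sketched on a tiny graph. The example below is only an illustration under stated assumptions: it swaps in a plain shifted power iteration for LOBPCG, uses no preconditioner, and works on a made-up graph of two 4-vertex cliques joined by one edge. The helper names `laplacian` and `fiedler_vector` are hypothetical.

```python
def laplacian(n, edges):
    """Dense graph Laplacian L = D - A for an undirected, unweighted graph."""
    L = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        L[u][u] += 1.0
        L[v][v] += 1.0
        L[u][v] -= 1.0
        L[v][u] -= 1.0
    return L


def fiedler_vector(L, iterations=500):
    """Approximate the eigenvector of the second-smallest Laplacian eigenvalue.

    Power iteration on (shift*I - L), with the constant vector projected out
    at every step. This is a stand-in for LOBPCG, not an implementation of it.
    """
    n = len(L)
    shift = 2.0 * max(L[i][i] for i in range(n))  # upper bound on the largest eigenvalue
    x = [float(i) for i in range(n)]              # deterministic starting guess
    for _ in range(iterations):
        mean = sum(x) / n
        x = [xi - mean for xi in x]               # remove the constant eigenvector
        y = [shift * x[i] - sum(L[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = sum(yi * yi for yi in y) ** 0.5
        x = [yi / norm for yi in y]
    return x


# Hypothetical test graph: cliques {0,1,2,3} and {4,5,6,7} joined by edge (3, 4).
clique_a = [(i, j) for i in range(4) for j in range(i + 1, 4)]
clique_b = [(i, j) for i in range(4, 8) for j in range(i + 1, 8)]
edges = clique_a + clique_b + [(3, 4)]

vector = fiedler_vector(laplacian(8, edges))
partition = [0 if value < 0 else 1 for value in vector]  # split by sign
```

Splitting by the sign of this eigenvector separates the two cliques and cuts only the single bridging edge. For large sparse graphs an iterative eigensolver such as LOBPCG takes the place of the dense loop here, and that is where the choice of preconditioner matters.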

Zero-Truncated Poisson Tensor Decomposition for Sparse Count Data

Lopez, Oscar L.; Lehoucq, Richard B.; Dunlavy, Daniel D.

We propose a novel statistical inference paradigm for zero-inflated multiway count data that dispenses with the need to distinguish between true and false zero counts. Our approach ignores all zero entries and applies zero-truncated Poisson regression on the positive counts. Inference is accomplished via tensor completion that imposes low-rank structure on the Poisson parameter space. Our main result shows that an $N$-way rank-$R$ parametric tensor $\mathcal{M} \in (0, \infty)^{I \times \cdots \times I}$ generating Poisson observations can be accurately estimated from approximately $IR^2 \log_2^2(I)$ non-zero counts for a nonnegative canonical polyadic decomposition. Several numerical experiments are presented demonstrating that our zero-truncated paradigm is comparable to the ideal scenario where the locations of false zero counts are known a priori.
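
For readers unfamiliar with the zero-truncated Poisson distribution, a scalar sketch may help. The standard probability mass function is $P(X = k \mid X > 0) = \lambda^k e^{-\lambda} / \big(k!\,(1 - e^{-\lambda})\big)$ for $k \ge 1$, and for a single rate $\lambda$ the maximum-likelihood estimate solves $\bar{x} = \lambda / (1 - e^{-\lambda})$. The code below covers only this one-parameter case, not the low-rank tensor estimator of the paper; the function names and the sample counts are assumptions for illustration.

```python
import math


def ztp_log_pmf(k, lam):
    """log P(X = k | X > 0) for X ~ Poisson(lam), defined for k >= 1."""
    if k < 1:
        raise ValueError("zero-truncated Poisson is supported on k >= 1")
    log_poisson = k * math.log(lam) - lam - math.lgamma(k + 1)
    log_prob_positive = math.log(-math.expm1(-lam))  # log(1 - exp(-lam))
    return log_poisson - log_prob_positive


def ztp_mean(lam):
    """Mean of the zero-truncated Poisson: lam / (1 - exp(-lam))."""
    return lam / -math.expm1(-lam)


def ztp_mle(counts, tol=1e-12):
    """Estimate lam from positive counts by bisection on the mean equation."""
    sample_mean = sum(counts) / len(counts)
    if sample_mean <= 1.0:
        raise ValueError("sample mean must exceed 1 for a positive estimate")
    lo, hi = 1e-12, sample_mean  # ztp_mean(lam) > lam, so the root lies below the mean
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if ztp_mean(mid) < sample_mean:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical positive counts; zeros are simply never observed.
lam_hat = ztp_mle([1, 2, 3, 2])
```

Note that the estimate is smaller than the sample mean: conditioning on a positive count inflates the observed average relative to the underlying rate.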

Machine-Learning of Nonlocal Kernels for Anomalous Subsurface Transport from Breakthrough Curves

D'Elia, Marta D.; Glusa, Christian A.; Xu, Xiao X.; Foster, John E.

Anomalous behavior is ubiquitous in subsurface solute transport due to the presence of high degrees of heterogeneity at different scales in the media. Although fractional models have been extensively used to describe the anomalous transport in various subsurface applications, their application is hindered by computational challenges. Simpler nonlocal models characterized by integrable kernels and finite interaction length represent a computationally feasible alternative to fractional models; yet, the informed choice of their kernel functions still remains an open problem. We propose a general data-driven framework for the discovery of optimal kernels on the basis of very small and sparse data sets in the context of anomalous subsurface transport. Using spatially sparse breakthrough curves recovered from fine-scale particle-density simulations, we learn the best coarse-scale nonlocal model using a nonlocal operator regression technique. Predictions of the breakthrough curves obtained using the optimal nonlocal model show good agreement with fine-scale simulation results even at locations and time intervals different from the ones used to train the kernel, confirming the excellent generalization properties of the proposed algorithm. A comparison with trained classical models and with black-box deep neural networks confirms the superiority of the predictive capability of the proposed model.

Sequential optical response suppression for chemical mixture characterization

Quantum

Magann, Alicia B.; McCaul, Gerard M.; Rabitz, Herschel R.; Bondar, Denys B.

The characterization of mixtures of non-interacting, spectroscopically similar quantum components has important applications in chemistry, biology, and materials science. We introduce an approach based on quantum tracking control that allows for determining the relative concentrations of constituents in a quantum mixture, using a single pulse which enhances the distinguishability of components of the mixture and has a length that scales linearly with the number of mixture constituents. To illustrate the method, we consider two very distinct model systems: mixtures of diatomic molecules in the gas phase, as well as solid-state materials composed of a mixture of components. A set of numerical analyses is presented, showing strong performance in both settings.

Precision tomography of a three-qubit donor quantum processor in silicon

Nature

Mądzik, Mateusz T.; Asaad, Serwan; Youssry, Akram; Joecker, Benjamin; Rudinger, Kenneth M.; Nielsen, Erik N.; Young, Kevin C.; Proctor, Timothy J.; Baczewski, Andrew D.; Laucht, Arne; Schmitt, Vivien; Hudson, Fay E.; Itoh, Kohei M.; Jakob, Alexander M.; Johnson, Brett C.; Jamieson, David N.; Dzurak, Andrew S.; Ferrie, Christopher; Blume-Kohout, Robin J.; Morello, Andrea

Nuclear spins were among the first physical platforms to be considered for quantum information processing [1,2], because of their exceptional quantum coherence [3] and atomic-scale footprint. However, their full potential for quantum computing has not yet been realized, owing to the lack of methods with which to link nuclear qubits within a scalable device combined with multi-qubit operations with sufficient fidelity to sustain fault-tolerant quantum computation. Here we demonstrate universal quantum logic operations using a pair of ion-implanted ³¹P donor nuclei in a silicon nanoelectronic device. A nuclear two-qubit controlled-Z gate is obtained by imparting a geometric phase to a shared electron spin [4], and used to prepare entangled Bell states with fidelities up to 94.2(2.7)%. The quantum operations are precisely characterized using gate set tomography (GST) [5], yielding one-qubit average gate fidelities up to 99.95(2)%, two-qubit average gate fidelity of 99.37(11)% and two-qubit preparation/measurement fidelities of 98.95(4)%. These three metrics indicate that nuclear spins in silicon are approaching the performance demanded in fault-tolerant quantum processors [6]. We then demonstrate entanglement between the two nuclei and the shared electron by producing a Greenberger–Horne–Zeilinger three-qubit state with 92.5(1.0)% fidelity. Because electron spin qubits in semiconductors can be further coupled to other electrons [7–9] or physically shuttled across different locations [10,11], these results establish a viable route for scalable quantum information processing using donor nuclear and electron spins.

Thermodynamically consistent physics-informed neural networks for hyperbolic systems

Journal of Computational Physics

Patel, Ravi G.; Manickam, Indu; Trask, Nathaniel A.; Wood, Mitchell A.; Lee, Myoungkyu N.; Tomas, Ignacio T.; Cyr, Eric C.

Physics-informed neural network architectures have emerged as a powerful tool for developing flexible PDE solvers that easily assimilate data. When applied to problems in shock physics, however, these approaches face challenges related to the collocation-based PDE discretization underpinning them. By instead adopting a least squares space-time control volume scheme, we obtain a scheme which more naturally handles regularity requirements, imposition of boundary conditions, entropy compatibility, and conservation, substantially reducing requisite hyperparameters in the process. Additionally, connections to classical finite volume methods allow application of inductive biases toward entropy solutions and total variation diminishing properties. For inverse problems in shock hydrodynamics, we propose inductive biases for discovering thermodynamically consistent equations of state that guarantee hyperbolicity. This framework therefore provides a means of discovering continuum shock models from molecular simulations of rarefied gases and metals. The output of the learning process provides a data-driven equation of state which may be incorporated into traditional shock hydrodynamics codes.

Calibration of elastoplastic constitutive model parameters from full-field data with automatic differentiation-based sensitivities

International Journal for Numerical Methods in Engineering

Seidl, D.T.; Granzow, Brian N.

We present a framework for calibration of parameters in elastoplastic constitutive models that is based on the use of automatic differentiation (AD). The model calibration problem is posed as a partial differential equation-constrained optimization problem where a finite element (FE) model of the coupled equilibrium equation and constitutive model evolution equations serves as the constraint. The objective function quantifies the mismatch between the displacement predicted by the FE model and full-field digital image correlation data, and the optimization problem is solved using gradient-based optimization algorithms. Forward and adjoint sensitivities are used to compute the gradient at considerably less cost than its calculation from finite difference approximations. Through the use of AD, we need only to write the constraints in terms of AD objects, where all of the derivatives required for the forward and inverse problems are obtained by appropriately seeding and evaluating these quantities. We present three numerical examples that verify the correctness of the gradient, demonstrate the AD approach's parallel computation capabilities via application to a large-scale FE model, and highlight the formulation's ease of extensibility to other classes of constitutive models.
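
The phrase "seeding and evaluating" AD objects can be made concrete with a minimal forward-mode sketch. The `Dual` class and the one-parameter bar model below are hypothetical stand-ins, not the framework's actual types or its elastoplastic model: a value carries one derivative alongside it, and seeding the parameter with derivative 1 makes every later arithmetic operation propagate the sensitivity.

```python
class Dual:
    """A value paired with its derivative with respect to one seeded input."""

    def __init__(self, value, deriv=0.0):
        self.value = float(value)
        self.deriv = float(deriv)

    @staticmethod
    def lift(x):
        """Wrap plain numbers as constants (zero derivative)."""
        return x if isinstance(x, Dual) else Dual(x)

    def __add__(self, other):
        o = Dual.lift(other)
        return Dual(self.value + o.value, self.deriv + o.deriv)

    __radd__ = __add__

    def __sub__(self, other):
        o = Dual.lift(other)
        return Dual(self.value - o.value, self.deriv - o.deriv)

    def __rsub__(self, other):
        return Dual.lift(other) - self

    def __mul__(self, other):
        o = Dual.lift(other)
        return Dual(self.value * o.value,
                    self.deriv * o.value + self.value * o.deriv)  # product rule

    __rmul__ = __mul__

    def __truediv__(self, other):
        o = Dual.lift(other)
        return Dual(self.value / o.value,
                    (self.deriv * o.value - self.value * o.deriv) / (o.value * o.value))

    def __rtruediv__(self, other):
        return Dual.lift(other) / self


def mismatch(modulus, force=1000.0, length=2.0, area=0.01, measured=1.0e-3):
    """Squared misfit between a predicted and a measured bar elongation.

    Hypothetical model: elongation u = F * L / (E * A) of an elastic bar.
    """
    predicted = force * length / (modulus * area)
    diff = predicted - measured
    return diff * diff


# Seed the parameter with derivative 1 to obtain d(mismatch)/d(modulus).
result = mismatch(Dual(2.5e8, 1.0))
objective, gradient = result.value, result.deriv
```

The same `mismatch` function also accepts plain floats, which is the practical appeal the abstract points to: the model is written once, and derivatives come from the number type rather than from hand-coded formulas or finite differences.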

Measuring the capabilities of quantum computers

Nature Physics

Proctor, Timothy J.; Rudinger, Kenneth M.; Young, Kevin; Nielsen, Erik N.; Blume-Kohout, Robin J.

Quantum computers can now run interesting programs, but each processor’s capability—the set of programs that it can run successfully—is limited by hardware errors. These errors can be complicated, making it difficult to accurately predict a processor’s capability. Benchmarks can be used to measure capability directly, but current benchmarks have limited flexibility and scale poorly to many-qubit processors. We show how to construct scalable, efficiently verifiable benchmarks based on any program by using a technique that we call circuit mirroring. With it, we construct two flexible, scalable volumetric benchmarks based on randomized and periodically ordered programs. We use these benchmarks to map out the capabilities of twelve publicly available processors, and to measure the impact of program structure on each one. We find that standard error metrics are poor predictors of whether a program will run successfully on today’s hardware, and that current processors vary widely in their sensitivity to program structure.

Logical and Physical Reversibility of Conservative Skyrmion Logic

IEEE Magnetics Letters

Hu, Xuan; Walker, Benjamin W.; Garcia-Sanchez, Felipe; Edwards, Alexander J.; Zhou, Peng; Incorvia, Jean A.; Paler, Alexandru; Frank, Michael P.; Friedman, Joseph S.

Magnetic skyrmions are nanoscale whirls of magnetism that can be propagated with electrical currents. The repulsion between skyrmions inspires their use for reversible computing based on the elastic billiard ball collisions proposed for conservative logic in 1982. In this letter, we evaluate the logical and physical reversibility of this skyrmion logic paradigm, as well as the limitations that must be addressed before dissipation-free computation can be realized.

Leveraging Production Visualization Tools In Situ

Mathematics and Visualization

Moreland, Kenneth D.; Bauer, Andrew C.; Geveci, Berk; O’Leary, Patrick; Whitlock, Brad

The visualization community has invested decades of research and development into producing large-scale production visualization tools. Although in situ is a paradigm shift for large-scale visualization, many of the same algorithms and operations apply regardless of whether the visualization is run post hoc or in situ. Thus, there is a great benefit to taking the large-scale code originally designed for post hoc use and leveraging it for use in situ. This chapter describes two in situ libraries, Libsim and Catalyst, that are based on mature visualization tools, VisIt and ParaView, respectively. Because they are based on fully-featured visualization packages, they each provide a wealth of features. For each of these systems we outline how the simulation and visualization software are coupled, what the runtime behavior and communication between these components are, and how the underlying implementation works. We also provide use cases demonstrating the systems in action. Both of these in situ libraries, as well as the underlying products they are based on, are made freely available as open-source products. The overviews in this chapter provide a toehold to the practical application of in situ visualization.

Reverse-mode differentiation in arbitrary tensor network format: with application to supervised learning

Journal of Machine Learning Research

Gorodetsky, Alex A.; Safta, Cosmin S.; Jakeman, John D.

This paper describes an efficient reverse-mode differentiation algorithm for contraction operations for arbitrary and unconventional tensor network topologies. The approach leverages the tensor contraction tree of Evenbly and Pfeifer (2014), which provides an instruction set for the contraction sequence of a network. We show that this tree can be efficiently leveraged for differentiation of a full tensor network contraction using a recursive scheme that exploits (1) the bilinear property of contraction and (2) the property that trees have a single path from root to leaves. While differentiation of tensor-tensor contraction is already possible in most automatic differentiation packages, we show that exploiting these two additional properties in the specific context of contraction sequences can improve efficiency. Following a description of the algorithm and computational complexity analysis, we investigate its utility for gradient-based supervised learning for low-rank function recovery and for fitting real-world unstructured datasets. We demonstrate improved performance over alternating least-squares optimization approaches and the capability to handle heterogeneous and arbitrary tensor network formats. When compared to alternating minimization algorithms, we find that the gradient-based approach requires a smaller oversampling ratio (number of samples compared to number of model parameters) for recovery. This increased efficiency extends to fitting unstructured data of varying dimensionality and when employing a variety of tensor network formats. Here, we show improved learning using the hierarchical Tucker method over the tensor-train in high-dimensional settings on a number of benchmark problems.

Characterizing Midcircuit Measurements on a Superconducting Qubit Using Gate Set Tomography

Physical Review Applied

Rudinger, Kenneth M.; Ribeill, Guilhem J.; Govia, Luke C.G.; Ware, Matthew; Nielsen, Erik N.; Young, Kevin; Ohki, Thomas A.; Blume-Kohout, Robin J.; Proctor, Timothy J.

Measurements that occur within the internal layers of a quantum circuit—midcircuit measurements—are a useful quantum-computing primitive, most notably for quantum error correction. Midcircuit measurements have both classical and quantum outputs, so they can be subject to error modes that do not exist for measurements that terminate quantum circuits. Here we show how to characterize midcircuit measurements, modeled by quantum instruments, using a technique that we call quantum instrument linear gate set tomography (QILGST). We then apply this technique to characterize a dispersive measurement on a superconducting transmon qubit within a multiqubit system. By varying the delay time between the measurement pulse and subsequent gates, we explore the impact of residual cavity photon population on measurement error. QILGST can resolve different error modes and quantify the total error from a measurement; in our experiment, for delay times above 1000 ns we measure a total error rate (i.e., half diamond distance) of ϵ⋄ = 8.1 ± 1.4%, a readout fidelity of 97.0 ± 0.3%, and output quantum-state fidelities of 96.7 ± 0.6% and 93.7 ± 0.7% when measuring 0 and 1, respectively.

FROSch Preconditioners for Land Ice Simulations of Greenland and Antarctica

SIAM Journal on Scientific Computing

Heinlein, Alexander; Perego, Mauro P.; Rajamanickam, Sivasankaran R.

Numerical simulations of Greenland and Antarctic ice sheets involve the solution of large-scale highly nonlinear systems of equations on complex shallow geometries. This work is concerned with the construction of Schwarz preconditioners for the solution of the associated tangent problems, which are challenging for solvers mainly because of the strong anisotropy of the meshes and wildly changing boundary conditions that can lead to poorly constrained problems on large portions of the domain. Here, two-level generalized Dryja-Smith-Widlund (GDSW)-type Schwarz preconditioners are applied to different land ice problems, i.e., a velocity problem, a temperature problem, as well as the coupling of the former two problems. We employ the message passing interface (MPI)-parallel implementation of multilevel Schwarz preconditioners provided by the package FROSch (fast and robust Schwarz) from the Trilinos library. The strength of the proposed preconditioner is that it yields out-of-the-box scalable and robust preconditioners for the single physics problems. To the best of our knowledge, this is the first time two-level Schwarz preconditioners have been applied to the ice sheet problem and a scalable preconditioner has been used for the coupled problem. The preconditioner for the coupled problem differs from previous monolithic GDSW preconditioners in the sense that decoupled extension operators are used to compute the values in the interior of the subdomains. Several approaches for improving the performance, such as reuse strategies and shared memory OpenMP parallelization, are explored as well. In our numerical study we target both uniform meshes of varying resolution for the Antarctic ice sheet as well as nonuniform meshes for the Greenland ice sheet. We present several weak and strong scaling studies confirming the robustness of the approach and the parallel scalability of the FROSch implementation. Among the highlights of the numerical results are a weak scaling study for up to 32 K processor cores (8 K MPI ranks and 4 OpenMP threads) and 566 M degrees of freedom for the velocity problem as well as a strong scaling study for up to 4 K processor cores (and MPI ranks) and 68 M degrees of freedom for the coupled problem.

Characterizing Memory Failures Using Benford’s Law

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Ferreira, Kurt B.; Levy, Scott

Fault tolerance is a key challenge as high performance computing systems continue to increase component counts, individual component reliability decreases, and hardware and software complexity increases. To better understand the potential impacts of failures on next-generation systems, significant effort has been devoted to collecting, characterizing and analyzing failures on current systems. These studies require large volumes of data and complex analysis in an attempt to identify statistical properties of the failure data. In this paper, we examine the lifetime of failures on the Cielo supercomputer that was located at Los Alamos National Laboratory, looking specifically at the time between faults on this system. Through this analysis, we show that the time between uncorrectable faults for this system obeys Benford’s law. This law applies to a number of naturally occurring collections of numbers and states that the leading digit is more likely to be small; for example, a leading digit of 1 is more likely than 9. We also show that a number of common distributions used to model failures also follow this law. This work provides critical analysis on the distribution of times between failures for extreme-scale systems. Specifically, the analysis in this work could be used as a simple form of failure prediction or used for modeling realistic failures.
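
Benford's law gives the probability of leading digit $d$ as $\log_{10}(1 + 1/d)$, so a leading 1 is expected about 30% of the time and a leading 9 under 5%. The sketch below compares that prediction with synthetic data. The log-normal samples, their parameters, and the helper names are assumptions made for illustration; they are not the Cielo measurements or the paper's analysis code.

```python
import math
import random


def benford_probability(digit):
    """Benford's law: probability that the leading digit equals `digit`."""
    return math.log10(1.0 + 1.0 / digit)


def leading_digit(x):
    """First significant digit of a positive float, read from scientific notation."""
    if x <= 0:
        raise ValueError("leading digit is defined here for positive numbers only")
    return int(f"{x:.15e}"[0])


# Synthetic "times between failures": a wide log-normal spread (assumed, not measured).
rng = random.Random(0)
samples = [rng.lognormvariate(5.0, 2.0) for _ in range(20_000)]

counts = [0] * 10
for t in samples:
    counts[leading_digit(t)] += 1

observed = {d: counts[d] / len(samples) for d in range(1, 10)}
expected = {d: benford_probability(d) for d in range(1, 10)}
max_gap = max(abs(observed[d] - expected[d]) for d in range(1, 10))
```

A small `max_gap` indicates the sample is close to Benford's distribution. Data confined to a narrow range, such as values all between 100 and 200, would fail the same check.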

Orthogonal Polynomials Defined by Self-Similar Measures with Overlaps

Experimental Mathematics

Ngai, Sze M.; Tang, Wei; Tran, Anh; Yuan, Shuai

We study orthogonal polynomials with respect to self-similar measures, focusing on the class of infinite Bernoulli convolutions, which are defined by iterated function systems with overlaps, especially those defined by the Pisot, Garsia, and Salem numbers. By using an algorithm of Mantica, we obtain graphs of the coefficients of the 3-term recursion relation defining the orthogonal polynomials. We use these graphs to predict whether the singular infinite Bernoulli convolutions belong to the Nevai class. Based on our numerical results, we conjecture that all infinite Bernoulli convolutions with contraction ratios greater than or equal to 1/2 belong to Nevai’s class, regardless of the probability weights assigned to the self-similar measures.

Multifidelity data fusion in convolutional encoder/decoder assembly networks for computational fluid dynamics

AIAA Science and Technology Forum and Exposition, AIAA SciTech Forum 2022

Partin, Lauren; Rushdi, Ahmad R.; Schiavazzi, Daniele E.

We analyze the regression accuracy of convolutional neural networks assembled from encoders, decoders and skip connections and trained with multifidelity data. These networks benefit from a significant reduction in the number of trainable parameters with respect to an equivalent fully connected network. These architectures are also versatile with respect to the input and output dimensionality. For example, encoder-decoder, decoder-encoder or decoder-encoder-decoder architectures are well suited to learn mappings between inputs and outputs of any dimensionality. We demonstrate the accuracy produced by such architectures when trained on a few high-fidelity and many low-fidelity data generated from models ranging from one-dimensional functions to Poisson equation solvers in two dimensions. We finally discuss a number of implementation choices that improve the reliability of the uncertainty estimates generated by a dropblock regularizer, and compare uncertainty estimates among low-, high-, and multi-fidelity approaches.

Gas-kinetic simulations of compressible turbulence over a mean-free-path-scale porous wall

AIAA Science and Technology Forum and Exposition, AIAA SciTech Forum 2022

McMullen, Ryan M.; Krygier, Michael K.; Torczynski, J.R.; Gallis, Michail A.

We report flow statistics and visualizations from gas-kinetic simulations using the Direct Simulation Monte Carlo (DSMC) method of compressible turbulent Couette flow over a porous substrate composed of an array of circular cylinders for which the Knudsen number is $O(10^{-1})$. Comparisons are made with both smooth-wall DSMC simulations and direct numerical simulations of the Navier-Stokes equations for the same conditions. Roughness, permeability, and noncontinuum effects are assessed.

Evaluating the Sustainability of Computational Science and Engineering Software: Empirical Observations

Proceedings of the International Conference on Software Engineering and Knowledge Engineering, SEKE

Willenbring, James M.; Walia, Gursimran S.

Software sustainability is critical for Computational Science and Engineering (CSE) software. It is also challenging due to factors ranging from funding models to the typical lifecycle of a research code to the inherent challenges of running fast on the newest architectures. Furthermore, measuring sustainability is challenging because sustainability consists of many complex attributes. To identify useful metrics for measuring CSE software sustainability, we gathered data from multiple freely available sources, including GitHub, SLOCCount, and Metrix++. This paper discusses the challenges practitioners face when measuring the sustainability of CSE software. We present an analysis of data with associated observations and future directions to better understand CSE software sustainability and how this work can be used to support decisions and improve sustainability by observing trends in metrics over time.

Mesostructure Evolution During Powder Compression: Micro-CT Experiments and Particle-Based Simulations

Conference Proceedings of the Society for Experimental Mechanics Series

Cooper, Marcia A.; Clemmer, Joel T.; Silling, Stewart A.; Bufford, Daniel C.; Bolintineanu, Dan S.

Powders under compression form mesostructures of particle agglomerations in response to both inter- and intra-particle forces. The ability to computationally predict the resulting mesostructures with reasonable accuracy requires models that capture the distributions associated with particle size and shape, contact forces, and mechanical response during deformation and fracture. The following report presents experimental data obtained for the purpose of validating emerging mesostructures simulated by discrete element method and peridynamic approaches. A custom compression apparatus, suitable for integration with our micro-computed tomography (micro-CT) system, was used to collect 3-D scans of a bulk powder at discrete steps of increasing compression. Details of the apparatus and the microcrystalline cellulose particles, with a nearly spherical shape and mean particle size, are presented. Comparative simulations were performed with an initial arrangement of particles and particle shapes directly extracted from the validation experiment. The experimental volumetric reconstruction was segmented to extract the relative positions and shapes of individual particles in the ensemble, including internal voids in the case of the microcrystalline cellulose particles. These computationally determined particles were then compressed within the computational domain and the evolving mesostructures compared directly to those in the validation experiment. The ability of the computational models to simulate the experimental mesostructures and particle behavior at increasing compression is discussed.

A FETI approach to domain decomposition for meshfree discretizations of nonlocal problems

Computer Methods in Applied Mechanics and Engineering

Xu, Xiao; Glusa, Christian A.; D'Elia, Marta D.; Foster, John T.

We propose a domain decomposition method for the efficient simulation of nonlocal problems. Our approach is based on a multi-domain formulation of a nonlocal diffusion problem where the subdomains share “nonlocal” interfaces of the size of the nonlocal horizon. This system of nonlocal equations is first rewritten in terms of minimization of a nonlocal energy, then discretized with a meshfree approximation and finally solved via a Lagrange multiplier approach in a way that resembles the finite element tearing and interconnect method. Specifically, we propose a distributed projected gradient algorithm for the solution of the Lagrange multiplier system, whose unknowns determine the nonlocal interface conditions between subdomains. Several two-dimensional numerical tests on problems as large as 191 million unknowns illustrate the strong and the weak scalability of our algorithm, which outperforms the standard approach to the distributed numerical solution of the problem. This work is the first rigorous numerical study in a two-dimensional multi-domain setting for nonlocal operators with finite horizon and, as such, it is a fundamental step towards increasing the use of nonlocal models in large scale simulations.

Crack nucleation at forging flaws studied by non-local peridynamics simulations

Mathematics and Mechanics of Solids

Karim, Mohammad K.; Narasimhachary, Santosh N.; Radailli, Francesco R.; Amann, Christian A.; Dayal, Kaushik D.; Silling, Stewart A.; Germann, Tim G.

We present a computational framework for studying and understanding the crack nucleation process from forging flaws. Forging flaws may be present in large steel rotor components commonly used for rotating power generation equipment including gas turbines, electrical generators, and steam turbines. The service life of these components is often limited by crack nucleation and subsequent growth from such forging flaws, which frequently exhibit themselves as non-metallic oxide inclusions. The fatigue crack growth process can be described by established engineering fracture mechanics methods. However, the initial crack nucleation process from a forging flaw is challenging for traditional engineering methods to quantify as it depends on the details of the flaw, including flaw morphology. We adopt the peridynamics method to describe and study this crack nucleation process. For a specific industrial gas turbine rotor steel, we present how we integrate and fit commonly known base material property data such as elastic properties, yield strength, and S-N curves, as well as fatigue crack growth data into a peridynamic model. The obtained model is then utilized in a series of high-performance two-dimensional peridynamic simulations to study the crack nucleation process from forging flaws for ambient and elevated temperatures in a rectangular simulation cell specimen. The simulations reveal an initial local nucleation at multiple small oxide inclusions followed by micro-crack propagation, arrest, coalescence, and eventual emergence of a dominant micro-crack that governs the crack nucleation process. The dependence on temperature and density of oxide inclusions of both the details of the microscopic processes and cycles to crack nucleation is also observed. Finally, the results are compared with fatigue experiments performed with specimens containing forging flaws of the same rotor steel.

NMSBA Sustainable Engineering (Final Report)

Nicholson, Bethany L.; Siirola, John D.

This report summarizes the guidance provided to Sustainable Engineering to help them learn about equation-oriented optimization and the Sandia-developed software packages Pyomo and IDAES-PSE. This was a short 10-week project (October 2021 – December 2021) and the goal was to help the company learn about the IDAES framework and how it could be used for their future projects. The company submitted an SBIR proposal related to developing a green ammonia process model with IDAES, and if that proposal is successful this NMSBA project could lead to future collaboration opportunities.

Evaluating MPI resource usage summary statistics

Parallel Computing

Ferreira, Kurt B.; Levy, Scott

The Message Passing Interface (MPI) remains the dominant programming model for scientific applications running on today's high-performance computing (HPC) systems. This dominance stems from MPI's powerful semantics for inter-process communication that has enabled scientists to write applications for simulating important physical phenomena. MPI does not, however, specify how messages and synchronization should be carried out. Those details are typically dependent on low-level architecture details and the message characteristics of the application. Therefore, analyzing an application's MPI resource usage is critical to tuning MPI's performance on a particular platform. The result of this analysis is typically a discussion of the mean message sizes, queue search lengths and message arrival times for a workload or set of workloads. While the arithmetic mean might be the most intuitive summary statistic for MPI resource usage, it is not always the most accurate in terms of representing the underlying data. In this paper, we analyze MPI resource usage for a number of key MPI workloads using an existing MPI trace collector and discrete-event simulator. Our analysis demonstrates that the average, while easy and efficient to calculate and a useful metric for characterizing latency and bandwidth measurements, may not be a good representation of application message sizes, match list search depths, or MPI inter-operation times. Additionally, we show that the median and mode are superior choices in many cases. We also observe that the arithmetic mean is not the best representation of central tendency for data that are drawn from distributions that are multi-modal or have heavy tails. The results and analysis of our work provide valuable guidance on how we, as a community, should discuss and analyze MPI resource usage data for scientific applications.
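
A small numerical example shows why the mean can misrepresent multi-modal data. The message sizes below are invented for illustration (many small control messages plus a few large payloads); they are not taken from the traces analyzed in the paper.

```python
import statistics

# Hypothetical message sizes in bytes: 90 small messages and 10 large payloads.
message_sizes = [64] * 90 + [1_048_576] * 10

mean_size = statistics.mean(message_sizes)      # pulled far upward by the large payloads
median_size = statistics.median(message_sizes)  # the size of a typical message
mode_size = statistics.mode(message_sizes)      # the most common message size

# Distance from the mean to the nearest size that actually occurs.
closest_gap = min(abs(size - mean_size) for size in message_sizes)
```

Here the mean is roughly 105 kB, a size that no message in the sample comes close to, while the median and mode both report the 64-byte messages that make up 90% of the traffic.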

More Details

Developing Uncertainty Quantification Strategies in Electromagnetic Problems Involving Highly Resonant Cavities

Journal of Verification, Validation and Uncertainty Quantification

Campione, Salvatore; Stephens, John A.; Martin, Nevin; Eckert, Aubrey C.; Warne, Larry K.; Huerta, Jose G.; Pfeiffer, Robert A.; Jones, Adam J.

High-quality factor resonant cavities are challenging structures to model in electromagnetics owing to their large sensitivity to minute parameter changes. Therefore, uncertainty quantification (UQ) strategies are pivotal to understanding key parameters affecting the cavity response. We discuss here some of these strategies focusing on shielding effectiveness (SE) properties of a canonical slotted cylindrical cavity that will be used to develop credibility evidence in support of predictions made using computational simulations for this application.

More Details

Discrete modeling of a transformer with ALEGRA

Rodriguez, Angel E.; Niederhaus, John H.; Greenwood, Wesley J.; Clutz, Christopher J.R.

We report progress on a task to model transformers in ALEGRA using the “Transient Magnetics” option. We specifically evaluate limits of the approach resolving individual coil wires. There are practical limits to the number of turns in a coil that can be numerically modeled, but calculated inductance can be scaled to the correct number of turns in a simple way. Our testing essentially confirmed this “turns scaling” hypothesis. We developed a conceptual transformer design, representative of practical designs of interest, and that focused our analysis. That design includes three coils wrapped around a rectangular ferromagnetic core. The secondary and tertiary coils have multiple layers. The tertiary has three layers of 13 turns each; the secondary has five layers of 44 turns; the primary has one layer of 20 turns. We validated the turns scaling of inductance for simple (one-layer) coils in air (no core) by comparison to available independent calculations for simple rectangular coils. These comparisons quantified the errors versus reduced number of turns modeled. For more than 3 turns, the errors are <5%. The magnetic field solver failed to converge (within 5000 iterations) for >10 turns. Including the core introduced some complications. It was necessary to capture the core surfaces in thin grid sheaths to minimize errors in computed magnetic energy. We do not yet have quantitative benchmarks with which to compare, but calculated results are qualitatively reasonable.

More Details

Document Retrieval and Ranking using Similarity Graph Mean Hitting Times

Dunlavy, Daniel D.; Chew, Peter A.

We present a novel approach to information retrieval and document analysis based on graph analytic methods. Traditional information retrieval methods use a set of terms to define a query that is applied against a document corpus to identify the documents most related to those terms. In contrast, we define a query as a set of documents of interest and apply the query by computing mean hitting times between this set and all other documents on a document similarity graph abstraction of the semantic relationships between all pairs of documents. We present the steps of our approach along with a simple example application illustrating how this approach can be used to find documents related to two or more documents or topics of interest.
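A minimal sketch of ranking by mean hitting time follows, under stated assumptions: the similarity graph is given as a dense, non-negative weight matrix, a random walk moves to a neighbor with probability proportional to similarity, and the hitting time is measured from each document to the query set. The function name `mean_hitting_times`, the example matrix, and the iterative solver are illustrative choices, not the authors' implementation.

```python
def mean_hitting_times(weights, targets, iterations=10_000, tol=1e-12):
    """Expected number of random-walk steps from each node to the target set.

    Solves h[i] = 0 for i in targets and h[i] = 1 + sum_j P[i][j] * h[j]
    otherwise, using Gauss-Seidel sweeps. Assumes every node has at least one
    positive weight and can reach the target set.
    """
    n = len(weights)
    targets = set(targets)
    # Row-normalize similarities into transition probabilities.
    P = [[w / sum(row) for w in row] for row in weights]
    h = [0.0] * n
    for _ in range(iterations):
        max_change = 0.0
        for i in range(n):
            if i in targets:
                continue
            updated = 1.0 + sum(P[i][j] * h[j] for j in range(n))
            max_change = max(max_change, abs(updated - h[i]))
            h[i] = updated
        if max_change < tol:
            break
    return h


# Hypothetical 4-document similarity matrix (symmetric, zero diagonal).
similarity = [
    [0.0, 0.9, 0.1, 0.0],
    [0.9, 0.0, 0.5, 0.1],
    [0.1, 0.5, 0.0, 0.8],
    [0.0, 0.1, 0.8, 0.0],
]
query = {0}  # the query is a set of documents, here just document 0
h = mean_hitting_times(similarity, query)
# Smaller hitting time means "closer" to the query set.
ranking = sorted((i for i in range(len(similarity)) if i not in query),
                 key=lambda i: h[i])
print(ranking, [round(h[i], 3) for i in ranking])
```

A query made of several documents only changes the `query` set; the same linear system then measures the expected steps to reach any one of them.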

More Details

Quantifying the unknown impact of segmentation uncertainty on image-based simulations

Nature Communications

Krygier, Michael K.; LaBonte, Tyler; Martinez, Carianne M.; Norris, Chance A.; Sharma, Krish; Collins, Lincoln; Mukherjee, Partha P.; Roberts, Scott A.

Image-based simulation, the use of 3D images to calculate physical quantities, relies on image segmentation for geometry creation. However, this process introduces image segmentation uncertainty because different segmentation tools (both manual and machine-learning-based) will each produce a unique and valid segmentation. First, we demonstrate that these variations propagate into the physics simulations, compromising the resulting physics quantities. Second, we propose a general framework for rapidly quantifying segmentation uncertainty. Through the creation and sampling of segmentation uncertainty probability maps, we systematically and objectively create uncertainty distributions of the physics quantities. We show that physics quantity uncertainty distributions can follow a Normal distribution, but, in more complicated physics simulations, the resulting uncertainty distribution can be surprisingly nontrivial. We establish that bounding segmentation uncertainty can fail in these nontrivial situations. While our work does not eliminate segmentation uncertainty, it improves simulation credibility by making visible the previously unrecognized segmentation uncertainty plaguing image-based simulation.
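The idea of sampling a segmentation probability map to obtain a distribution of a physics quantity can be sketched as follows. This is a toy illustration under loud assumptions: the probability map is a made-up 3×4 grid, each pixel is sampled independently, and the "physics quantity" is simply the segmented volume fraction. The paper's actual sampling scheme and simulations are not reproduced here.

```python
import random
import statistics


def sample_segmentation(prob_map, rng):
    """Draw one binary segmentation; each pixel is 1 with its map probability.

    Independent per-pixel sampling is an assumption made for this sketch.
    """
    return [[1 if rng.random() < p else 0 for p in row] for row in prob_map]


def volume_fraction(segmentation):
    """Stand-in 'physics quantity': fraction of pixels labeled as material."""
    cells = [v for row in segmentation for v in row]
    return sum(cells) / len(cells)


# Hypothetical probability map: confident interior, uncertain boundary.
prob_map = [
    [0.95, 0.90, 0.50, 0.05],
    [0.90, 0.60, 0.20, 0.05],
    [0.50, 0.20, 0.05, 0.01],
]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
samples = [volume_fraction(sample_segmentation(prob_map, rng))
           for _ in range(2000)]

print("mean volume fraction:", statistics.mean(samples))
print("standard deviation:  ", statistics.stdev(samples))
```

Replacing `volume_fraction` with a real simulation would give the uncertainty distribution of that simulation's output over the sampled geometries.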

More Details

Performant implementation of the atomic cluster expansion (PACE) and application to copper and silicon

npj Computational Materials

Lysogorskiy, Yury; Oord, Cas v.; Bochkarev, Anton; Menon, Sarath; Rinaldi, Matteo; Hammerschmidt, Thomas; Mrovec, Matous; Thompson, Aidan P.; Csányi, Gábor; Ortner, Christoph; Drautz, Ralf

The atomic cluster expansion is a general polynomial expansion of the atomic energy in multi-atom basis functions. Here we implement the atomic cluster expansion in the performant C++ code PACE that is suitable for use in large-scale atomistic simulations. We briefly review the atomic cluster expansion and give detailed expressions for energies and forces as well as efficient algorithms for their evaluation. We demonstrate that the atomic cluster expansion as implemented in PACE shifts a previously established Pareto front for machine learning interatomic potentials toward faster and more accurate calculations. Moreover, general purpose parameterizations are presented for copper and silicon and evaluated in detail. We show that the Cu and Si potentials significantly improve on the best available potentials for highly accurate large-scale atomistic simulations.

More Details

Revealing quantum effects in highly conductive δ-layer systems

Communications Physics

Mamaluy, Denis M.; Mendez Granado, Juan P.; Gao, Xujiao G.; Misra, Shashank M.

Thin, high-density layers of dopants in semiconductors, known as δ-layer systems, have recently attracted attention as a platform for exploring future quantum and classical computing when patterned in plane with atomic precision. However, many aspects of the conductive properties of these systems are still unknown. Here we present an open-system quantum transport treatment to investigate the local density of electron states and the conductive properties of δ-layer systems. A successful application of this treatment to phosphorus δ-layers in silicon both explains the origin of recently observed shallow sub-bands and reproduces the sheet resistance values measured by different experimental groups. Further analysis reveals two main quantum-mechanical effects: 1) the existence of spatially distinct layers of free electrons with different average energies; 2) significant dependence of sheet resistance on the δ-layer thickness for a fixed sheet charge density.

More Details

Timely Reporting of Heavy Hitters Using External Memory

ACM Transactions on Database Systems

Singh, Shikha; Pandey, Prashant; Bender, Michael A.; Berry, Jonathan W.; Farach-Colton, Martín; Johnson, Rob; Kroeger, Thomas M.; Phillips, Cynthia A.

Given an input stream S of size N, a φ-heavy hitter is an item that occurs at least φN times in S. The problem of finding heavy-hitters is extensively studied in the database literature. We study a real-time heavy-hitters variant in which an element must be reported shortly after we see its T = φN-th occurrence (and hence it becomes a heavy hitter). We call this the Timely Event Detection (TED) Problem. The TED problem models the needs of many real-world monitoring systems, which demand accurate (i.e., no false negatives) and timely reporting of all events from large, high-speed streams with a low reporting threshold (high sensitivity). Like the classic heavy-hitters problem, solving the TED problem without false-positives requires large space (ω (N) words). Thus in-RAM heavy-hitters algorithms typically sacrifice accuracy (i.e., allow false positives), sensitivity, or timeliness (i.e., use multiple passes). We show how to adapt heavy-hitters algorithms to external memory to solve the TED problem on large high-speed streams while guaranteeing accuracy, sensitivity, and timeliness. Our data structures are limited only by I/O-bandwidth (not latency) and support a tunable tradeoff between reporting delay and I/O overhead. With a small bounded reporting delay, our algorithms incur only a logarithmic I/O overhead. We implement and validate our data structures empirically using the Firehose streaming benchmark. Multi-threaded versions of our structures can scale to process 11M observations per second before becoming CPU bound. In comparison, a naive adaptation of the standard heavy-hitters algorithm to external memory would be limited by the storage device's random I/O throughput, i.e., ≈100K observations per second.
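The TED problem statement can be made concrete with a small in-RAM baseline. This sketch is not the external-memory data structure from the paper: it keeps one exact counter per distinct item, which is precisely the large-space cost the abstract describes, and it reports each item at the moment of its T-th occurrence. The function name `timely_heavy_hitters` is a hypothetical choice for illustration.

```python
from collections import Counter


def timely_heavy_hitters(stream, threshold):
    """Yield (position, item) the moment an item reaches `threshold` occurrences.

    Exact and timely, with no false positives or negatives, but memory grows
    with the number of distinct items in the stream.
    """
    counts = Counter()
    for position, item in enumerate(stream):
        counts[item] += 1
        if counts[item] == threshold:  # report once, at the T-th occurrence
            yield position, item


# Toy stream of characters; report items at their 3rd occurrence.
for position, item in timely_heavy_hitters("abacabad", threshold=3):
    print(f"item {item!r} reported at stream position {position}")
```

Here the reporting delay is zero because every counter lives in memory; the paper's contribution concerns keeping accuracy and bounded delay when the counters no longer fit in RAM.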

More Details

Data-driven magneto-elastic predictions with scalable classical spin-lattice dynamics

npj Computational Materials

Nikolov, Svetoslav V.; Wood, Mitchell A.; Cangi, Attila; Maillet, Jean B.; Marinica, Mihai C.; Thompson, Aidan P.; Desjarlais, Michael P.; Tranchida, Julien G.

A data-driven framework is presented for building magneto-elastic machine-learning interatomic potentials (ML-IAPs) for large-scale spin-lattice dynamics simulations. The magneto-elastic ML-IAPs are constructed by coupling a collective atomic spin model with an ML-IAP. Together they represent a potential energy surface from which the mechanical forces on the atoms and the precession dynamics of the atomic spins are computed. Both the atomic spin model and the ML-IAP are parametrized on data from first-principles calculations. We demonstrate the efficacy of our data-driven framework across magneto-structural phase transitions by generating a magneto-elastic ML-IAP for α-iron. The combined potential energy surface yields excellent agreement with first-principles magneto-elastic calculations and quantitative predictions of diverse materials properties including bulk modulus, magnetization, and specific heat across the ferromagnetic–paramagnetic phase transition.

More Details

Polarizable Water Potential Derived from a Model Electron Density

Journal of Chemical Theory and Computation

Rackers, Joshua R.; Silva, Roseane R.; Wang, Zhi; Ponder, Jay W.

A new empirical potential for efficient, large scale molecular dynamics simulation of water is presented. The HIPPO (Hydrogen-like Intermolecular Polarizable POtential) force field is based upon the model electron density of a hydrogen-like atom. This framework is used to derive and parametrize individual terms describing charge penetration damped permanent electrostatics, damped polarization, charge transfer, anisotropic Pauli repulsion, and damped dispersion interactions. Initial parameter values were fit to Symmetry Adapted Perturbation Theory (SAPT) energy components for ten water dimer configurations, as well as the radial and angular dependence of the canonical dimer. The SAPT-based parameters were then systematically refined to extend the treatment to water bulk phases. The final HIPPO water model provides a balanced representation of a wide variety of properties of gas phase clusters, liquid water, and ice polymorphs, across a range of temperatures and pressures. This water potential yields a rationalization of water structure, dynamics, and thermodynamics explicitly correlated with an ab initio energy decomposition, while providing a level of accuracy comparable or superior to previous polarizable atomic multipole force fields. The HIPPO water model serves as a cornerstone around which similarly detailed physics-based models can be developed for additional molecular species.

More Details

Characterizing Human Performance: Detecting Targets at High False Alarm Rates [Slides]

Speed, Ann S.; Wheeler, Jason W.; Russell, John L.; Oppel, Fred O.; Sanchez, Danielle; Silva, Austin R.; Chavez, Anna C.

Analysts develop a “no threat” bias with high false alarms. If shown alarms only for actual attacks, they may never actually see an alarm. We see this in the laboratory, but it is not often studied in applied environments (TSA is an exception). In this work, near-operational paradigms are useful, but difficult to construct well. Pilot testing is critical before engaging time-limited professionals. Experimental control is difficult to balance with operational realism. Grounding near-operational experiments in basic research paradigms has both advantages and disadvantages. Despite shortcomings in our second experiment, we now have a platform for experimental investigations into the human element of physical security systems.

More Details

Exploring Explicit Uncertainty for Binary Analysis (EUBA)

Leger, Michelle A.; Darling, Michael C.; Jones, Stephen T.; Matzen, Laura E.; Stracuzzi, David J.; Wilson, Andrew T.; Bueno, Denis B.; Christentsen, Matthew C.; Ginaldi, Melissa J.; Hannasch, David A.; Heidbrink, Scott H.; Howell, Breannan C.; Leger, Chris; Reedy, Geoffrey E.; Rogers, Alisa N.; Williams, Jack A.

Reverse engineering (RE) analysts struggle to address critical questions about the safety of binary code accurately and promptly, and their supporting program analysis tools are simply wrong sometimes. The analysis tools have to approximate in order to provide any information at all, but this means that they introduce uncertainty into their results. And those uncertainties chain from analysis to analysis. We hypothesize that exposing sources, impacts, and control of uncertainty to human binary analysts will allow the analysts to approach their hardest problems with high-powered analytic techniques that they know when to trust. Combining expertise in binary analysis algorithms, human cognition, uncertainty quantification, verification and validation, and visualization, we pursue research that should benefit binary software analysis efforts across the board. We find a strong analogy between RE and exploratory data analysis (EDA); we begin to characterize sources and types of uncertainty found in practice in RE (both in the process and in supporting analyses); we explore a domain-specific focus on uncertainty in pointer analysis, showing that more precise models do help analysts answer small information flow questions faster and more accurately; and we test a general population with domain-general sudoku problems, showing that adding "knobs" to an analysis does not significantly slow down performance. This document describes our explorations in uncertainty in binary analysis.

More Details

Investigating Volumetric Inclusions of Semiconductor Materials to Improve Flashover Resistance in Dielectrics

Steiner, Adam M.; Siefert, Christopher S.; Shipley, Gabriel A.; Redline, Erica M.; Dickens, Sara D.; Jaramillo, Rex J.; Chavez, Tom C.; Hutsel, Brian T.; Frye-Mason, Gregory C.; Peterson, Kyle J.; Bell, Kate S.; Balogun, Shuaib A.; Losego, Mark D.; Sammeth, Torin M.; Kern, Ian J.; Harjes, Cameron D.; Gilmore, Mark A.; Lehr, Jane M.

Abstract not provided.

Dakota, A Multilevel Parallel Object-Oriented Framework for Design Optimization, Parameter Estimation, Uncertainty Quantification, and Sensitivity Analysis (V.6.16 User's Manual)

Adams, Brian H.; Bohnhoff, William B.; Dalbey, Keith D.; Ebeida, Mohamed S.; Eddy, John E.; Eldred, Michael E.; Hooper, Russell H.; Hough, Patricia H.; Hu, Kenneth H.; Jakeman, John J.; Khalil, Mohammad K.; Maupin, Kathryn M.; Monschke, Jason A.; Ridgway, Elliott R.; Rushdi, Ahmad A.; Seidl, Daniel S.; Stephens, John A.; Swiler, Laura P.; Tran, Anh; Winokur, Justin W.

The Dakota toolkit provides a flexible and extensible interface between simulation codes and iterative analysis methods. Dakota contains algorithms for optimization with gradient and nongradient-based methods; uncertainty quantification with sampling, reliability, and stochastic expansion methods; parameter estimation with nonlinear least squares methods; and sensitivity/variance analysis with design of experiments and parameter study methods. These capabilities may be used on their own or as components within advanced strategies such as surrogate-based optimization, mixed integer nonlinear programming, or optimization under uncertainty. By employing object-oriented design to implement abstractions of the key components required for iterative systems analyses, the Dakota toolkit provides a flexible and extensible problem-solving environment for design and performance analysis of computational models on high performance computers. This report serves as a user's manual for the Dakota software and provides capability overviews and procedures for software execution, as well as a variety of example studies.

More Details

CSRI Summer Proceedings 2021

Smith, John D.; Galvan, Edgar

The Computer Science Research Institute (CSRI) brings university faculty and students to Sandia National Laboratories for focused collaborative research on Department of Energy (DOE) computer and computational science problems. The institute provides an opportunity for university researchers to learn about problems in computer and computational science at DOE laboratories, and to help transfer results of their research to programs at the labs. Some specific CSRI research interest areas are: scalable solvers, optimization, algebraic preconditioners, graph-based, discrete, and combinatorial algorithms, uncertainty estimation, validation and verification methods, mesh generation, dynamic load-balancing, virus and other malicious-code defense, visualization, scalable cluster computers, beyond Moore’s Law computing, exascale computing tools and application design, reduced order and multiscale modeling, parallel input/output, and theoretical computer science. The CSRI Summer Program is organized by CSRI and includes a weekly seminar series and the publication of a summer proceedings.

More Details