Publications

Results 101–150 of 9,998

Evaluating Spatial Accelerator Architectures with Tiled Matrix-Matrix Multiplication

IEEE Transactions on Parallel and Distributed Systems

Moon, Gordon E.; Kwon, Hyoukjun; Jeong, Geonhwa; Chatarasi, Prasanth; Rajamanickam, Sivasankaran R.; Krishna, Tushar

There is growing interest in custom spatial accelerators for machine learning applications. These accelerators employ a spatial array of processing elements (PEs) interacting via custom buffer hierarchies and networks-on-chip. The efficiency of these accelerators comes from employing optimized dataflow strategies (i.e., spatial/temporal partitioning of data across the PEs and fine-grained scheduling) to maximize data reuse. The focus of this work is to evaluate these accelerator architectures using a tiled general matrix-matrix multiplication (GEMM) kernel. To do so, we develop a framework that finds optimized mappings (dataflow and tile sizes) of a tiled GEMM for a given spatial accelerator and workload combination, leveraging an analytical cost model for runtime and energy. Our evaluations over five spatial accelerators demonstrate that the tiled GEMM mappings systematically generated by our framework achieve high performance on various GEMM workloads and accelerators.
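
The mapping space explored by such a framework can be pictured with a plain tiled GEMM loop nest. Below is a minimal Python sketch; the tile size T and the loop order are illustrative placeholders, not the optimized mappings the paper's cost model selects.

```python
import numpy as np

def tiled_gemm(A, B, T=64):
    """Compute C = A @ B by accumulating T x T tile products."""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2
    C = np.zeros((M, N), dtype=A.dtype)
    for i in range(0, M, T):          # the loop order and tile size are one
        for j in range(0, N, T):      # axis of the mapping space that such
            for k in range(0, K, T):  # a framework would search over
                C[i:i+T, j:j+T] += A[i:i+T, k:k+T] @ B[k:k+T, j:j+T]
    return C

A = np.random.rand(256, 192)
B = np.random.rand(192, 128)
assert np.allclose(tiled_gemm(A, B), A @ B)
```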

Large-Scale Atomistic Simulations: Investigating Free Expansion

Moore, Stan G.

This work investigates the free expansion of a supercritical fluid into a two-phase liquid-vapor coexistence region. A large molecular dynamics simulation (6 billion Lennard-Jones atoms) was run on 5,760 GPUs (33% of LLNL Sierra) using LAMMPS/Kokkos software. The work also improved the visualization workflow and began preliminary simulations of aluminum using the SNAP machine-learning potential.

Elucidating size effects on the yield strength of single-crystal Cu via the Richtmyer–Meshkov instability

Journal of Applied Physics

Stewart, James A.; Wood, Mitchell A.; Olles, Joseph O.

Capturing the dynamic response of a material under high strain-rate deformation often demands challenging and time-consuming experimental effort. While shock hydrodynamic simulation methods can aid in this area, a priori characterizations of the material strength under shock loading and spall failure are needed in order to parameterize the constitutive models these computational tools require. Moreover, parameterizations of strain-rate-dependent strength models are needed to capture the full suite of Richtmyer–Meshkov instability (RMI) behavior of shock-compressed metals, placing an unrealistic demand on experiments as the sole source of these training data. Herein, we sweep a large range of geometric, crystallographic, and shock conditions within molecular dynamics (MD) simulations and demonstrate the breadth of RMI behavior in Cu that can be captured from the atomic scale. In this work, yield strength measurements from jetted and arrested material from a sinusoidal surface perturbation were quantified as Y_RMI = 0.787 ± 0.374 GPa, higher than the strain-rate-independent models used in experimentally matched hydrodynamic simulations. Defect-free, single-crystal Cu samples used in MD will overestimate Y_RMI, and the drastic scale difference between experiment and MD is highlighted by high-confidence neighborhood-clustering predictions of RMI characterizations that nonetheless yield incorrect classifications.

First-passage time statistics on surfaces of general shape: Surface PDE solvers using Generalized Moving Least Squares (GMLS)

Journal of Computational Physics

Gross, B.J.; Kuberry, Paul A.; Atzberger, P.J.

We develop numerical methods for computing statistics of stochastic processes on surfaces of general shape with drift-diffusion dynamics $dX_t = a(X_t)\,dt + b(X_t)\,dW_t$. We formulate descriptions of Brownian motion and general drift-diffusion processes on surfaces. We consider statistics of the form $u(x) = \mathbb{E}^{x}\left[\int_0^{\tau} g(X_t)\,dt\right] + \mathbb{E}^{x}\left[f(X_{\tau})\right]$ for a domain $\Omega$ and the exit stopping time $\tau = \inf\{t > 0 \mid X_t \notin \Omega\}$, where $f, g$ are general smooth functions. For computing these statistics, we develop high-order Generalized Moving Least Squares (GMLS) solvers for the associated surface PDE boundary-value problems based on Backward-Kolmogorov equations. We focus particularly on the mean First Passage Times (FPTs) given by the case $f = 0$, $g = 1$, where $u(x) = \mathbb{E}^{x}[\tau]$. We perform studies for a variety of shapes showing our methods converge with high-order accuracy both in capturing the geometry and the surface PDE solutions. We then perform studies showing how statistics are influenced by the surface geometry, drift dynamics, and spatially dependent diffusivities.
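
For reference, the Backward-Kolmogorov boundary-value problem behind these statistics takes the standard form below (a sketch; $\mathcal{L}$ denotes the generator of the surface drift-diffusion process, and the mean-FPT case follows by setting $f = 0$, $g = 1$):

```latex
\mathcal{L}u = -g \quad \text{in } \Omega, \qquad u = f \quad \text{on } \partial\Omega,
\qquad \mathcal{L} = a(x)\cdot\nabla_{\Gamma} + \tfrac{1}{2}\, b(x)\,b(x)^{T} : \nabla_{\Gamma}\nabla_{\Gamma}.
```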

Three-Photon Optical Pumping for Trapped Ion Quantum Computing

Hogle, Craig W.; Ivory, Megan K.; Lobser, Daniel L.; Ruzic, Brandon R.; DeRose, Christopher T.

In this report we describe the testing of a novel scheme for state preparation of trapped ions in a quantum computing setup. Optimally, this technique would allow for similar precision and speed of state preparation while allowing individual addressability of single ions in a chain using technology already available in a trapped-ion experiment. As quantum computing experiments become more complicated, mid-experiment measurements will become necessary to achieve algorithms such as quantum error correction. Any mid-experiment measurement then requires the measured qubit to be re-prepared to a known quantum state. Currently this requires the protected qubits to be moved a sizeable distance away from the qubit being re-prepared, which can be costly in terms of experiment length and can introduce errors. Theoretical calculations predict that a three-photon process would allow for state preparation without qubit movement, with efficiencies similar to current state preparation methods.

Navier-Stokes Equations Do Not Describe the Smallest Scales of Turbulence in Gases

Physical Review Letters

McMullen, Ryan M.; Krygier, Michael K.; Torczynski, J.R.; Gallis, Michail A.

In turbulent flows, kinetic energy is transferred from the largest scales to progressively smaller scales, until it is ultimately converted into heat. The Navier-Stokes equations are almost universally used to study this process. Here, by comparing with molecular-gas-dynamics simulations, we show that the Navier-Stokes equations do not describe turbulent gas flows in the dissipation range because they neglect thermal fluctuations. We investigate decaying turbulence produced by the Taylor-Green vortex and find that in the dissipation range the molecular-gas-dynamics spectra grow quadratically with wave number due to thermal fluctuations, in agreement with previous predictions, while the Navier-Stokes spectra decay exponentially. Furthermore, the transition to quadratic growth occurs at a length scale much larger than the gas molecular mean free path, namely in a regime that the Navier-Stokes equations are widely believed to describe. In fact, our results suggest that the Navier-Stokes equations are not guaranteed to describe the smallest scales of gas turbulence for any positive Knudsen number.

A theoretical investigation of the hydrolysis of uranium hexafluoride: the initiation mechanism and vibrational spectroscopy

Physical Chemistry Chemical Physics. PCCP

Lutz, Jesse J.; Byrd, Jason B.; Lotrich, Victor L.; Jensen, Daniel S.; Zador, Judit Z.; Hubbard, Joshua A.

Depleted uranium hexafluoride (UF6), a stockpiled byproduct of the nuclear fuel cycle, reacts readily with atmospheric humidity, but the mechanism is poorly understood. Here we compare several potential initiation steps at a consistent level of theory, generating underlying structures and vibrational modes using hybrid density functional theory (DFT) and computing relative energies of stationary points with double-hybrid (DH) DFT. A benchmark comparison is performed to assess the quality of DH-DFT data using reference energy differences obtained using a complete-basis-limit coupled-cluster (CC) composite method. The associated large-basis CC computations were enabled by a new general-purpose pseudopotential capability implemented as part of this work. Dispersion-corrected parameter-free DH-DFT methods, namely PBE0-DH-D3(BJ) and PBE-QIDH-D3(BJ), provided mean unsigned errors within chemical accuracy (1 kcal mol⁻¹) for a set of barrier heights corresponding to the most energetically favorable initiation steps. The hydrolysis mechanism is found to proceed via intermolecular hydrogen transfer within van der Waals complexes involving UF6, UF5OH, and UOF4, in agreement with previous studies, followed by the formation of a previously unappreciated dihydroxide intermediate, UF4(OH)2. The dihydroxide is predicted to form under both kinetic and thermodynamic control, and, unlike the alternate pathway leading to the UO2F2 monomer, its formation is exothermic, in agreement with observation. Finally, harmonic and anharmonic vibrational simulations are performed to reinterpret literature infrared spectroscopy in light of this newly identified species.

A hybrid meshfree discretization to improve the numerical performance of peridynamic models

Computer Methods in Applied Mechanics and Engineering

Shojaei, Arman; Hermann, Alexander; Cyron, Christian J.; Seleson, Pablo; Silling, Stewart A.

Efficient and accurate calculation of spatial integrals is of major interest in the numerical implementation of peridynamics (PD). The standard way to perform this calculation is a particle-based approach that discretizes the strong form of the PD governing equation. This approach has rapidly been adopted by the PD community since it offers some advantages. It is computationally cheaper than other available schemes, can conveniently handle material separation, and effectively deals with nonlinear PD models. Nevertheless, PD models are still computationally very expensive compared with those based on the classical continuum mechanics theory, particularly for large-scale problems in three dimensions. This results from the nonlocal nature of the PD theory which leads to interactions of each node of a discretized body with multiple surrounding nodes. Here, we propose a new approach to significantly boost the numerical efficiency of PD models. We propose a discretization scheme that employs a simple collocation procedure and is truly meshfree; i.e., it does not depend on any background integration cells. In contrast to the standard scheme, the proposed scheme requires a much smaller set of neighboring nodes (keeping the same physical length scale) to achieve a specific accuracy and is thus computationally more efficient. Our new scheme is applicable to the case of linear PD models and within neighborhoods where the solution can be approximated by smooth basis functions. Therefore, to fully exploit the advantages of both the standard and the proposed schemes, a hybrid discretization is presented that combines both approaches within an adaptive framework. The high performance of the developed framework is illustrated by several numerical examples, including brittle fracture and corrosion problems in two and three dimensions.

Krylov subspace recycling for evolving structures

Computer Methods in Applied Mechanics and Engineering

Bolten, M.; de Sturler, E.; Hahn, C.; Parks, Michael L.

Krylov subspace recycling is a powerful tool when solving a long series of large, sparse linear systems that change only slowly over time. In PDE constrained shape optimization, these series appear naturally, as typically hundreds or thousands of optimization steps are needed with only small changes in the geometry. In this setting, however, applying Krylov subspace recycling can be a difficult task. As the geometry evolves, in general, so does the finite element mesh defined on or representing this geometry, including the numbers of nodes and elements and element connectivity. This is especially the case if re-meshing techniques are used. As a result, the number of algebraic degrees of freedom in the system changes, and in general the linear system matrices resulting from the finite element discretization change size from one optimization step to the next. Changes in the mesh connectivity also lead to structural changes in the matrices. In the case of re-meshing, even if the geometry changes only a little, the corresponding mesh might differ substantially from the previous one. Obviously, this prevents any straightforward mapping of the approximate invariant subspace of the linear system matrix (the focus of recycling in this paper) from one optimization step to the next; similar problems arise for other selected subspaces. In this paper, we present an algorithm to map an approximate invariant subspace of the linear system matrix for the previous optimization step to an approximate invariant subspace of the linear system matrix for the current optimization step, for general meshes. This is achieved by exploiting the map from coefficient vectors to finite element functions on the mesh, combined with interpolation or approximation of functions on the finite element mesh. We demonstrate the effectiveness of our approach numerically with several proof of concept studies for a specific meshing technique.
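
The mapping step can be illustrated in one dimension with piecewise-linear finite element interpolation. Below is a minimal sketch; the 1D setting, node counts, and re-orthonormalization via QR are illustrative simplifications of the general-mesh algorithm in the paper.

```python
import numpy as np

def map_recycle_space(U_old, nodes_old, nodes_new):
    """U_old: columns are recycled vectors (nodal values on nodes_old)."""
    U_new = np.column_stack([
        np.interp(nodes_new, nodes_old, u) for u in U_old.T
    ])  # interpret vectors as FE functions, evaluate at the new mesh's nodes
    Q, _ = np.linalg.qr(U_new)  # restore an orthonormal basis
    return Q

nodes_old = np.linspace(0, 1, 50)
nodes_new = np.sort(np.random.default_rng(5).random(64))  # re-meshed nodes
U_old = np.linalg.qr(np.random.default_rng(6).random((50, 5)))[0]
U_new = map_recycle_space(U_old, nodes_old, nodes_new)     # (64, 5) basis
```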

Pyomo.GDP: an ecosystem for logic based modeling and optimization development

Optimization and Engineering

Chen, Qi; Johnson, Emma S.; Bernal, David E.; Valentin, Romeo; Kale, Sunjeev; Bates, Johnny; Siirola, John D.; Grossmann, Ignacio E.

We present three core principles for engineering-oriented integrated modeling and optimization tool sets—intuitive modeling contexts, systematic computer-aided reformulations, and flexible solution strategies—and describe how new developments in Pyomo.GDP for Generalized Disjunctive Programming (GDP) advance this vision. We describe a new logical expression system implementation for Pyomo.GDP allowing for a more intuitive description of logical propositions. The logical expression system supports automated reformulation of these logical constraints to linear constraints. We also describe two new logic-based global optimization solver implementations built on Pyomo.GDP that exploit logical structure to avoid “zero-flow” numerical difficulties that arise in nonlinear network design problems when nodes or streams disappear. These new solvers also demonstrate the capability to link to external libraries for expanded functionality within an integrated implementation. We present these new solvers in the context of a flexible array of solution paths available to GDP models. Finally, we present results on a new library of GDP models demonstrating the value of multiple solution approaches.
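
For readers new to Pyomo.GDP, a minimal sketch of a disjunction plus a logical proposition is shown below, assuming a recent Pyomo release; the model itself (one variable, two disjuncts) is a toy, not one of the paper's examples.

```python
import pyomo.environ as pyo
from pyomo.gdp import Disjunct, Disjunction

m = pyo.ConcreteModel()
m.x = pyo.Var(bounds=(0, 10))
m.unit_on = pyo.BooleanVar()

# two exclusive operating modes expressed as a disjunction
m.d1 = Disjunct(); m.d1.c = pyo.Constraint(expr=m.x >= 2)
m.d2 = Disjunct(); m.d2.c = pyo.Constraint(expr=m.x == 0)
m.mode = Disjunction(expr=[m.d1, m.d2])

# logical proposition: choosing mode 1 implies the unit is on
m.logic = pyo.LogicalConstraint(expr=m.d1.indicator_var.implies(m.unit_on))
m.obj = pyo.Objective(expr=m.x)

# automated reformulations: logic to linear constraints, then big-M for the GDP
pyo.TransformationFactory('core.logical_to_linear').apply_to(m)
pyo.TransformationFactory('gdp.bigm').apply_to(m)
```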

Hole in one: Pathways to deterministic single-acceptor incorporation in Si(100)-2 × 1

AVS Quantum Science

Campbell, Quinn C.; Baczewski, Andrew D.; Butera, R.E.; Misra, Shashank M.

Stochastic incorporation kinetics can be a limiting factor in the scalability of semiconductor fabrication technologies using atomic-precision techniques. While these technologies have recently been extended from donors to acceptors, the extent to which kinetics will impact single-acceptor incorporation has yet to be assessed. To identify the precursor molecule and dosing conditions that are promising for deterministic incorporation, we develop and apply an atomistic model for the single-acceptor incorporation rates of several recently demonstrated molecules: diborane (B2H6), boron trichloride (BCl3), and aluminum trichloride in both monomer (AlCl3) and dimer forms (Al2Cl6). While all three precursors can realize single-acceptor incorporation, we predict that diborane is unlikely to realize deterministic incorporation, boron trichloride can realize deterministic incorporation with modest heating (50 °C), and aluminum trichloride can realize deterministic incorporation at room temperature. We conclude that both boron and aluminum trichloride are promising precursors for atomic-precision single-acceptor applications, with the potential to enable the reliable production of large arrays of single-atom quantum devices.

Atomic step disorder on polycrystalline surfaces leads to spatially inhomogeneous work functions

Journal of Vacuum Science and Technology A

Bussmann, Ezra B.; Smith, Sean W.; Scrymgeour, David S.; Brumbach, Michael T.; Lu, Ping L.; Dickens, Sara D.; Michael, Joseph R.; Ohta, Taisuke O.; Hjalmarson, Harold P.; Schultz, Peter A.; Clem, Paul G.; Hopkins, Matthew M.; Moore, Christopher M.

Structural disorder causes materials’ surface electronic properties, e.g., the work function (Φ), to vary spatially, yet it is challenging to prove exact causal relationships to underlying ensemble disorder, e.g., roughness or granularity. For polycrystalline Pt, nanoscale-resolution photoemission threshold mapping reveals a spatially varying Φ over a distribution of (111) vicinal grain surfaces prepared by sputter deposition and annealing. With regard to field emission and related phenomena, e.g., vacuum arc initiation, a salient feature of the Φ distribution is that it is skewed with a long tail to values down to 5.4 eV, i.e., far below the mean, which is exponentially impactful to field emission via the Fowler–Nordheim relation. We show that the Φ spatial variation and distribution can be explained by ensemble variations of granular tilts and surface slopes via a Smoluchowski smoothing model wherein local Φ variations result from spatially varying densities of electric dipole moments, intrinsic to atomic steps, that locally modify Φ. Atomic step-terrace structure is confirmed with scanning tunneling microscopy (STM) at several locations on our surfaces, and prior works showed STM evidence for atomic step dipoles at various metal surfaces. From our model, we find an atomic step edge dipole moment (in D per edge atom) comparable to values reported in studies that utilized other methods and materials. Our results elucidate a connection between macroscopic Φ and the nanostructure that may contribute to the spread of reported Φ for Pt and other surfaces and may be useful toward more complete descriptions of polycrystalline metals in models of field emission and other related vacuum electronics phenomena, e.g., arc initiation.

Finding Electronic Structure Machine Learning Surrogates without Training

Fiedler, Lenz F.; Hoffmann, Nils H.; Mohammed, Parvez M.; Popoola, Gabriel A.; Yovell, Tamar Y.; Oles, Vladyslav O.; Ellis, Austin E.; Rajamanickam, Sivasankaran R.; Cangi, Attila

A myriad of phenomena in materials science and chemistry rely on quantum-level simulations of the electronic structure in matter. While moving to larger length and time scales has been a pressing issue for decades, such large-scale electronic structure calculations are still challenging despite modern software approaches and advances in high-performance computing. The silver lining in this regard is the use of machine learning to accelerate electronic structure calculations; this line of research has recently gained considerable attention. The grand challenge therein is finding a suitable machine-learning model during a process called hyperparameter optimization. This, however, causes a massive computational overhead in addition to that of data generation. We accelerate the construction of machine-learning surrogate models by roughly two orders of magnitude by circumventing excessive training during the hyperparameter optimization phase. We demonstrate our workflow for Kohn-Sham density functional theory, the most popular computational method in materials science and chemistry.

Unified Memory: GPGPU-Sim/UVM Smart Integration

Liu, Yechen L.; Rogers, Timothy R.; Hughes, Clayton H.

CPU/GPU heterogeneous compute platforms are a ubiquitous element in computing, and a programming model specified for this heterogeneous computing model is important for both performance and programmability. A programming model that exposes the shared, unified address space between the heterogeneous units is a necessary step in this direction, as it removes the burden of explicit data movement from the programmer while maintaining performance. GPU vendors, such as AMD and NVIDIA, have released software-managed runtimes that can provide programmers the illusion of unified CPU and GPU memory by automatically migrating data in and out of the GPU memory. However, this runtime support is not included in GPGPU-Sim, a commonly used framework that models the features of a modern graphics processor that are relevant to non-graphics applications. UVM Smart was developed to extend GPGPU-Sim 3.x, incorporating modeling of on-demand paging and data migration through the runtime. This report discusses the integration of UVM Smart and GPGPU-Sim 4.0 and the modifications made to improve simulation performance and accuracy.

Randomized Cholesky Preconditioning for Graph Partitioning Applications

Espinoza, Heliezer J.; Loe, Jennifer A.; Boman, Erik G.

Graph partitioning has emerged as an area of interest due to its use in various applications in computational research. One way to partition a graph is to solve for the eigenvectors of the corresponding graph Laplacian matrix. This project focuses on the eigensolver LOBPCG and the evaluation of a new preconditioner: Randomized Cholesky Factorization (rchol). This preconditioner was tested for its speed and accuracy against other well-known preconditioners for the method. After experiments were run on several known test matrices, rchol appears to be a better preconditioner for structured matrices. This research was sponsored by the National Nuclear Security Administration Minority Serving Institutions Internship Program (NNSA-MSIIP) and completed at host facility Sandia National Laboratories. As such, after discussion of the research project itself, this report contains a brief reflection on experience gained as a result of participating in the NNSA-MSIIP.
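
A minimal sketch of this eigensolver setup appears below, using scipy's LOBPCG with a Jacobi preconditioner as a stand-in (rchol itself is not in scipy); the path-graph Laplacian is a toy example.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import lobpcg, LinearOperator

# toy graph: a path on n vertices, and its Laplacian L = D - A
n = 100
A = sp.diags([1.0, 1.0], [-1, 1], shape=(n, n), format='csr')
L = sp.diags(np.asarray(A.sum(axis=1)).ravel()) - A

# Jacobi preconditioner standing in for rchol
M = LinearOperator((n, n), matvec=lambda v: v / (L.diagonal() + 1e-12))

X = np.random.rand(n, 3)  # block of starting vectors
vals, vecs = lobpcg(L, X, M=M, largest=False, tol=1e-8, maxiter=500)

fiedler = vecs[:, 1]                  # second eigenvector drives bisection
part = fiedler >= np.median(fiedler)  # two balanced parts
```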

A silicon singlet–triplet qubit driven by spin-valley coupling

Nature Communications

Jock, Ryan M.; Jacobson, Noah T.; Rudolph, Martin R.; Ward, Daniel R.; Carroll, Malcolm S.; Luhman, Dwight R.

Spin–orbit effects, inherent to electrons confined in quantum dots at a silicon heterointerface, provide a means to control electron spin qubits without the added complexity of on-chip, nanofabricated micromagnets or nearby coplanar striplines. Here, we demonstrate a singlet–triplet qubit operating mode that can drive qubit evolution at frequencies in excess of 200 MHz. This approach offers a means to electrically turn on and off fast control, while providing high logic gate orthogonality and long qubit dephasing times. We utilize this operational mode for dynamical decoupling experiments to probe the charge noise power spectrum in a silicon metal-oxide-semiconductor double quantum dot. In addition, we assess qubit frequency drift over longer timescales to capture low-frequency noise. We present the charge noise power spectral density up to 3 MHz, which exhibits a 1/f^α dependence consistent with α ≈ 0.7, over 9 orders of magnitude in noise frequency.

A Block-Based Triangle Counting Algorithm on Heterogeneous Environments

IEEE Transactions on Parallel and Distributed Systems

Yasar, Abdurrahman; Rajamanickam, Sivasankaran R.; Berry, Jonathan W.; Catalyurek, Umit V.

Triangle counting is a fundamental building block in graph algorithms. In this article, we propose a block-based triangle counting algorithm to reduce data movement during both sequential and parallel execution. Our block-based formulation makes the algorithm naturally suitable for heterogeneous architectures. The problem of partitioning the adjacency matrix of a graph is well-studied. Our task decomposition goes one step further: it partitions the set of triangles in the graph. By streaming these small tasks to compute resources, we can solve problems that do not fit on a device. We demonstrate the effectiveness of our approach by providing an implementation on a compute node with multiple sockets, cores and GPUs. The current state-of-the-art in triangle enumeration processes the Friendster graph in 2.1 seconds, not including data copy time between CPU and GPU. Using that metric, our approach is 20 percent faster. When copy times are included, our algorithm takes 3.2 seconds. This is 5.6 times faster than the fastest published CPU-only time.
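
The core computation can be stated compactly as a masked sparse matrix product. Below is a minimal sketch; scipy stands in for the paper's heterogeneous implementation, and the blocking/streaming layer is omitted.

```python
import numpy as np
import scipy.sparse as sp

def count_triangles(A):
    """A: symmetric 0/1 adjacency matrix (CSR), no self-loops."""
    L = sp.tril(A, k=-1, format='csr')      # strictly lower triangle
    return int((L @ L).multiply(L).sum())   # wedges, masked to closed ones

A = sp.csr_matrix(np.ones((4, 4)) - np.eye(4))  # 4-clique
assert count_triangles(A) == 4                  # C(4,3) = 4 triangles
```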

Neuromorphic scaling advantages for energy-efficient random walk computations

Nature Electronics

Smith, John D.; Hill, Aaron J.; Reeder, Leah E.; Franke, Brian C.; Lehoucq, Richard B.; Parekh, Ojas D.; Severa, William M.; Aimone, James B.

Neuromorphic computing, which aims to replicate the computational structure and architecture of the brain in synthetic hardware, has typically focused on artificial intelligence applications. What is less explored is whether such brain-inspired hardware can provide value beyond cognitive tasks. Here we show that the high degree of parallelism and configurability of spiking neuromorphic architectures makes them well suited to implement random walks via discrete-time Markov chains. These random walks are useful in Monte Carlo methods, which represent a fundamental computational tool for solving a wide range of numerical computing tasks. Using IBM’s TrueNorth and Intel’s Loihi neuromorphic computing platforms, we show that our neuromorphic computing algorithm for generating random walk approximations of diffusion offers advantages in energy-efficient computation compared with conventional approaches. We also show that our neuromorphic computing algorithm can be extended to more sophisticated jump-diffusion processes that are useful in a range of applications, including financial economics, particle physics and machine learning.
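
The underlying primitive is easy to state in conventional code. A minimal sketch of a discrete-time Markov chain random walk approximating 1D diffusion follows; the walker counts and transition probabilities are illustrative, not those used on TrueNorth or Loihi.

```python
import numpy as np

rng = np.random.default_rng(0)
n_walkers, n_steps = 100_000, 100

# symmetric walk on the integers: stay with p=1/2, hop left/right with p=1/4
steps = rng.choice([-1, 0, 1], size=(n_walkers, n_steps), p=[0.25, 0.5, 0.25])
final = steps.sum(axis=1)

# the walker density approximates the solution of the diffusion equation
print("sample variance:", final.var())  # ~ n_steps * Var(step) = 50
```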

Assessing the predictive impact of factor fixing with an adaptive uncertainty-based approach

Environmental Modelling and Software

Wang, Qian; Guillaume, Joseph H.A.; Jakeman, John D.; Yang, Tao; Iwanaga, Takuya; Croke, Barry; Jakeman, Anthony J.

Despite widespread use of factor fixing in environmental modeling, its effect on model predictions has received little attention and is instead commonly presumed to be negligible. We propose a proof-of-concept adaptive method for systematically investigating the impact of factor fixing. The method uses Global Sensitivity Analysis methods to identify groups of sensitive parameters, then quantifies which groups can be safely fixed at nominal values without exceeding a maximum acceptable error, demonstrated using the 21-dimensional Sobol’ G-function. Three error measures are considered for quantities of interest, namely Relative Mean Absolute Error, Pearson Product-Moment Correlation and Relative Variance. Results demonstrate that factor fixing may unexpectedly cause large errors in model results even when preliminary analysis suggests otherwise, and that the default value selected affects the number of factors that can be fixed. To improve the applicability and methodological development of factor fixing, a new research agenda encompassing five opportunities is discussed for further attention.
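
The experiment is easy to reproduce in miniature. A minimal sketch on the Sobol' G-function follows, with illustrative coefficients a_i rather than the paper's settings (large a_i makes a factor insensitive):

```python
import numpy as np

def g_function(x, a):
    return np.prod((np.abs(4 * x - 2) + a) / (1 + a), axis=1)

rng = np.random.default_rng(1)
d = 21
a = np.array([0, 0.1, 0.2] + [9] * 9 + [99] * 9)  # 3 sensitive, 18 not
X = rng.random((10_000, d))

y_full = g_function(X, a)
X_fix = X.copy()
X_fix[:, 3:] = 0.5  # fix the putatively insensitive factors at a nominal value
y_fix = g_function(X_fix, a)

rmae = np.mean(np.abs(y_full - y_fix)) / np.mean(np.abs(y_full))
print(f"relative mean absolute error from factor fixing: {rmae:.3f}")
```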

A data-driven peridynamic continuum model for upscaling molecular dynamics

Computer Methods in Applied Mechanics and Engineering

You, Huaiqian; Yu, Yue; Silling, Stewart A.; D'Elia, Marta D.

Nonlocal models, including peridynamics, often use integral operators that embed lengthscales in their definition. However, the integrands in these operators are difficult to define from the data that are typically available for a given physical system, such as laboratory mechanical property tests. In contrast, molecular dynamics (MD) does not require these integrands, but it suffers from computational limitations in the length and time scales it can address. To combine the strengths of both methods and to obtain a coarse-grained, homogenized continuum model that efficiently and accurately captures materials’ behavior, we propose a learning framework to extract, from MD data, an optimal Linear Peridynamic Solid (LPS) model as a surrogate for MD displacements. To maximize the accuracy of the learnt model we allow the peridynamic influence function to be partially negative, while preserving the well-posedness of the resulting model. To achieve this, we provide sufficient well-posedness conditions for discretized LPS models with sign-changing influence functions and develop a constrained optimization algorithm that minimizes the equation residual while enforcing such solvability conditions. This framework guarantees that the resulting model is mathematically well-posed, physically consistent, and that it generalizes well to settings that are different from the ones used during training. We illustrate the efficacy of the proposed approach with several numerical tests for single layer graphene. Our two-dimensional tests show the robustness of the proposed algorithm on validation data sets that include thermal noise, different domain shapes and external loadings, and discretizations substantially different from the ones used for training.

LAMMPS - a flexible simulation tool for particle-based materials modeling at the atomic, meso, and continuum scales

Computer Physics Communications

Thompson, Aidan P.; Aktulga, H.M.; Berger, Richard; Bolintineanu, Dan S.; Brown, W.M.; Crozier, Paul C.; in 't Veld, Pieter J.; Kohlmeyer, Axel; Moore, Stan G.; Nguyen, Trung D.; Shan, Ray; Stevens, Mark J.; Tranchida, Julien; Trott, Christian R.; Plimpton, Steven J.

Since the classical molecular dynamics simulator LAMMPS was released as an open source code in 2004, it has become a widely-used tool for particle-based modeling of materials at length scales ranging from atomic to mesoscale to continuum. Reasons for its popularity are that it provides a wide variety of particle interaction models for different materials, that it runs on any platform from a single CPU core to the largest supercomputers with accelerators, and that it gives users control over simulation details, either via the input script or by adding code for new interatomic potentials, constraints, diagnostics, or other features needed for their models. As a result, hundreds of people have contributed new capabilities to LAMMPS and it has grown from fifty thousand lines of code in 2004 to a million lines today. In this paper several of the fundamental algorithms used in LAMMPS are described along with the design strategies which have made it flexible for both users and developers. We also highlight some capabilities recently added to the code which were enabled by this flexibility, including dynamic load balancing, on-the-fly visualization, magnetic spin dynamics models, and quantum-accuracy machine learning interatomic potentials.

Program Summary:
Program Title: Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS)
CPC Library link to program files: https://doi.org/10.17632/cxbxs9btsv.1
Developer's repository link: https://github.com/lammps/lammps
Licensing provisions: GPLv2
Programming language: C++, Python, C, Fortran
Supplementary material: https://www.lammps.org
Nature of problem: Many science applications in physics, chemistry, materials science, and related fields require parallel, scalable, and efficient generation of long, stable classical particle dynamics trajectories. Within this common problem definition, there lies a great diversity of use cases, distinguished by different particle interaction models, external constraints, as well as timescales and lengthscales ranging from atomic to mesoscale to macroscopic.
Solution method: The LAMMPS code uses parallel spatial decomposition, distributed neighbor lists, and parallel FFTs for long-range Coulombic interactions [1]. The time integration algorithm is based on the Størmer-Verlet symplectic integrator [2], which provides better stability than higher-order non-symplectic methods. In addition, LAMMPS supports a wide range of interatomic potentials, constraints, diagnostics, software interfaces, and pre- and post-processing features.
Additional comments including restrictions and unusual features: This paper serves as the definitive reference for the LAMMPS code.
References:
[1] S. Plimpton, Fast parallel algorithms for short-range molecular dynamics, J. Comp. Phys. 117 (1995) 1–19.
[2] L. Verlet, Computer experiments on classical fluids: I. Thermodynamical properties of Lennard–Jones molecules, Phys. Rev. 159 (1967) 98–103.
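
As an aside for readers unfamiliar with the integrator cited in the Program Summary, a minimal sketch of its velocity-Verlet form on a 1D harmonic oscillator is shown below; this is a textbook illustration, not LAMMPS code.

```python
def velocity_verlet(x, v, force, m, dt, n_steps):
    f = force(x)
    for _ in range(n_steps):
        v += 0.5 * dt * f / m   # half-step velocity update (kick)
        x += dt * v             # full-step position update (drift)
        f = force(x)
        v += 0.5 * dt * f / m   # second half-kick with the new force
    return x, v

# harmonic oscillator: energy is conserved to O(dt^2) over long runs
x, v = velocity_verlet(1.0, 0.0, lambda x: -x, m=1.0, dt=0.01, n_steps=10_000)
print(0.5 * v**2 + 0.5 * x**2)  # stays near the initial energy of 0.5
```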

Randomized Cholesky Preconditioning for Graph Partitioning Applications

Espinoza, Heliezer J.; Loe, Jennifer A.; Boman, Erik G.

A graph is a mathematical representation of a network; it consists of a set of vertices, which are connected by edges. Graphs have numerous applications in various fields, as they can model all sorts of connections, processes, or relations. For example, graphs can model intricate transit systems or the human nervous system. However, graphs that are large or complicated become difficult to analyze. This is why there is increased interest in the area of graph partitioning, which divides the graph into multiple smaller partitions. For example, partitions of a graph representing a social network might help identify clusters of friends or colleagues. Graph partitioning is also a widely used approach to load balancing in parallel computing. The partitioning of a graph is extremely useful to decompose the graph into smaller parts and allow for easier analysis. There are different ways to solve graph partitioning problems. For this work, we focus on a spectral partitioning method which forms a partition based upon the eigenvectors of the graph Laplacian (details presented in Acer et al.). This method uses the LOBPCG algorithm to compute these eigenvectors. LOBPCG can be accelerated by an operator called a preconditioner. For this internship, we evaluate a randomized Cholesky (rchol) preconditioner for its effectiveness on graph partitioning problems with LOBPCG. We compare it with two standard preconditioners: Jacobi and Incomplete Cholesky (ichol). This research was conducted from August to December 2021 in conjunction with Sandia National Laboratories.

Zero-Truncated Poisson Tensor Decomposition for Sparse Count Data

Lopez, Oscar L.; Lehoucq, Richard B.; Dunlavy, Daniel D.

We propose a novel statistical inference paradigm for zero-inflated multiway count data that dispenses with the need to distinguish between true and false zero counts. Our approach ignores all zero entries and applies zero-truncated Poisson regression on the positive counts. Inference is accomplished via tensor completion that imposes low-rank structure on the Poisson parameter space. Our main result shows that an $N$-way rank-$R$ parametric tensor $\mathcal{M} \in (0,\infty)^{I \times \cdots \times I}$ generating Poisson observations can be accurately estimated from approximately $I R^2 \log_2^2(I)$ non-zero counts for a nonnegative canonical polyadic decomposition. Several numerical experiments are presented demonstrating that our zero-truncated paradigm is comparable to the ideal scenario where the locations of false zero counts are known a priori.
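
The scalar version of the inference step makes the idea concrete. A minimal sketch of the zero-truncated Poisson MLE for a single rate follows; the paper applies this principle entry-wise under the low-rank tensor constraint.

```python
import numpy as np

rng = np.random.default_rng(2)
lam_true = 1.3
counts = rng.poisson(lam_true, 50_000)
positive = counts[counts > 0]   # all zeros discarded, true or false alike

# the truncated mean satisfies  E[X | X > 0] = lam / (1 - exp(-lam)),
# so the MLE is the fixed point of  lam = m * (1 - exp(-lam))
m = positive.mean()
lam = m
for _ in range(100):
    lam = m * (1 - np.exp(-lam))

print(f"estimate {lam:.3f} vs true {lam_true}")
```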

Machine-Learning of Nonlocal Kernels for Anomalous Subsurface Transport from Breakthrough Curves

D'Elia, Marta D.; Glusa, Christian A.; Xu, Xiao X.; Foster, John E.

Anomalous behavior is ubiquitous in subsurface solute transport due to the presence of high degrees of heterogeneity at different scales in the media. Although fractional models have been extensively used to describe the anomalous transport in various subsurface applications, their application is hindered by computational challenges. Simpler nonlocal models characterized by integrable kernels and finite interaction length represent a computationally feasible alternative to fractional models; yet, the informed choice of their kernel functions still remains an open problem. We propose a general data-driven framework for the discovery of optimal kernels on the basis of very small and sparse data sets in the context of anomalous subsurface transport. Using spatially sparse breakthrough curves recovered from fine-scale particle-density simulations, we learn the best coarse-scale nonlocal model using a nonlocal operator regression technique. Predictions of the breakthrough curves obtained using the optimal nonlocal model show good agreement with fine-scale simulation results even at locations and time intervals different from the ones used to train the kernel, confirming the excellent generalization properties of the proposed algorithm. A comparison with trained classical models and with black-box deep neural networks confirms the superiority of the predictive capability of the proposed model.

Sequential optical response suppression for chemical mixture characterization

Quantum

Magann, Alicia B.; McCaul, Gerard M.; Rabitz, Herschel R.; Bondar, Denys B.

The characterization of mixtures of non-interacting, spectroscopically similar quantum components has important applications in chemistry, biology, and materials science. We introduce an approach based on quantum tracking control that allows for determining the relative concentrations of constituents in a quantum mixture, using a single pulse which enhances the distinguishability of components of the mixture and has a length that scales linearly with the number of mixture constituents. To illustrate the method, we consider two very distinct model systems: mixtures of diatomic molecules in the gas phase, as well as solid-state materials composed of a mixture of components. A set of numerical analyses are presented, showing strong performance in both settings.

Precision tomography of a three-qubit donor quantum processor in silicon

Nature

Mądzik, Mateusz T.; Asaad, Serwan; Youssry, Akram; Joecker, Benjamin; Rudinger, Kenneth M.; Nielsen, Erik N.; Young, Kevin C.; Proctor, Timothy J.; Baczewski, Andrew D.; Laucht, Arne; Schmitt, Vivien; Hudson, Fay E.; Itoh, Kohei M.; Jakob, Alexander M.; Johnson, Brett C.; Jamieson, David N.; Dzurak, Andrew S.; Ferrie, Christopher; Blume-Kohout, Robin J.; Morello, Andrea

Nuclear spins were among the first physical platforms to be considered for quantum information processing [1,2], because of their exceptional quantum coherence [3] and atomic-scale footprint. However, their full potential for quantum computing has not yet been realized, owing to the lack of methods with which to link nuclear qubits within a scalable device combined with multi-qubit operations with sufficient fidelity to sustain fault-tolerant quantum computation. Here we demonstrate universal quantum logic operations using a pair of ion-implanted 31P donor nuclei in a silicon nanoelectronic device. A nuclear two-qubit controlled-Z gate is obtained by imparting a geometric phase to a shared electron spin [4], and used to prepare entangled Bell states with fidelities up to 94.2(2.7)%. The quantum operations are precisely characterized using gate set tomography (GST) [5], yielding one-qubit average gate fidelities up to 99.95(2)%, two-qubit average gate fidelity of 99.37(11)% and two-qubit preparation/measurement fidelities of 98.95(4)%. These three metrics indicate that nuclear spins in silicon are approaching the performance demanded in fault-tolerant quantum processors [6]. We then demonstrate entanglement between the two nuclei and the shared electron by producing a Greenberger–Horne–Zeilinger three-qubit state with 92.5(1.0)% fidelity. Because electron spin qubits in semiconductors can be further coupled to other electrons [7–9] or physically shuttled across different locations [10,11], these results establish a viable route for scalable quantum information processing using donor nuclear and electron spins.

Thermodynamically consistent physics-informed neural networks for hyperbolic systems

Journal of Computational Physics

Patel, Ravi G.; Manickam, Indu; Trask, Nathaniel A.; Wood, Mitchell A.; Lee, Myoungkyu N.; Tomas, Ignacio T.; Cyr, Eric C.

Physics-informed neural network architectures have emerged as a powerful tool for developing flexible PDE solvers that easily assimilate data. When applied to problems in shock physics, however, these approaches face challenges related to the collocation-based PDE discretization underpinning them. By instead adopting a least squares space-time control volume scheme, we obtain a scheme which more naturally handles regularity requirements, imposition of boundary conditions, entropy compatibility, and conservation, substantially reducing requisite hyperparameters in the process. Additionally, connections to classical finite volume methods allow application of inductive biases toward entropy solutions and total variation diminishing properties. For inverse problems in shock hydrodynamics, we propose inductive biases for discovering thermodynamically consistent equations of state that guarantee hyperbolicity. This framework therefore provides a means of discovering continuum shock models from molecular simulations of rarefied gases and metals. The output of the learning process provides a data-driven equation of state which may be incorporated into traditional shock hydrodynamics codes.

Calibration of elastoplastic constitutive model parameters from full-field data with automatic differentiation-based sensitivities

International Journal for Numerical Methods in Engineering

Seidl, D.T.; Granzow, Brian N.

We present a framework for calibration of parameters in elastoplastic constitutive models that is based on the use of automatic differentiation (AD). The model calibration problem is posed as a partial differential equation-constrained optimization problem where a finite element (FE) model of the coupled equilibrium equation and constitutive model evolution equations serves as the constraint. The objective function quantifies the mismatch between the displacement predicted by the FE model and full-field digital image correlation data, and the optimization problem is solved using gradient-based optimization algorithms. Forward and adjoint sensitivities are used to compute the gradient at considerably less cost than its calculation from finite difference approximations. Through the use of AD, we need only to write the constraints in terms of AD objects, where all of the derivatives required for the forward and inverse problems are obtained by appropriately seeding and evaluating these quantities. We present three numerical examples that verify the correctness of the gradient, demonstrate the AD approach's parallel computation capabilities via application to a large-scale FE model, and highlight the formulation's ease of extensibility to other classes of constitutive models.

Measuring the capabilities of quantum computers

Nature Physics

Proctor, Timothy J.; Rudinger, Kenneth M.; Young, Kevin; Nielsen, Erik N.; Blume-Kohout, Robin J.

Quantum computers can now run interesting programs, but each processor’s capability—the set of programs that it can run successfully—is limited by hardware errors. These errors can be complicated, making it difficult to accurately predict a processor’s capability. Benchmarks can be used to measure capability directly, but current benchmarks have limited flexibility and scale poorly to many-qubit processors. We show how to construct scalable, efficiently verifiable benchmarks based on any program by using a technique that we call circuit mirroring. With it, we construct two flexible, scalable volumetric benchmarks based on randomized and periodically ordered programs. We use these benchmarks to map out the capabilities of twelve publicly available processors, and to measure the impact of program structure on each one. We find that standard error metrics are poor predictors of whether a program will run successfully on today’s hardware, and that current processors vary widely in their sensitivity to program structure.

Logical and Physical Reversibility of Conservative Skyrmion Logic

IEEE Magnetics Letters

Hu, Xuan; Walker, Benjamin W.; Garcia-Sanchez, Felipe; Edwards, Alexander J.; Zhou, Peng; Incorvia, Jean A.; Paler, Alexandru; Frank, Michael P.; Friedman, Joseph S.

Magnetic skyrmions are nanoscale whirls of magnetism that can be propagated with electrical currents. The repulsion between skyrmions inspires their use for reversible computing based on the elastic billiard ball collisions proposed for conservative logic in 1982. In this letter, we evaluate the logical and physical reversibility of this skyrmion logic paradigm, as well as the limitations that must be addressed before dissipation-free computation can be realized.

Leveraging Production Visualization Tools In Situ

Mathematics and Visualization

Moreland, Kenneth D.; Bauer, Andrew C.; Geveci, Berk; O’Leary, Patrick; Whitlock, Brad

The visualization community has invested decades of research and development into producing large-scale production visualization tools. Although in situ is a paradigm shift for large-scale visualization, much of the same algorithms and operations apply regardless of whether the visualization is run post hoc or in situ. Thus, there is a great benefit to taking the large-scale code originally designed for post hoc use and leveraging it for use in situ. This chapter describes two in situ libraries, Libsim and Catalyst, that are based on mature visualization tools, VisIt and ParaView, respectively. Because they are based on fully-featured visualization packages, they each provide a wealth of features. For each of these systems we outline how the simulation and visualization software are coupled, what the runtime behavior and communication between these components are, and how the underlying implementation works. We also provide use cases demonstrating the systems in action. Both of these in situ libraries, as well as the underlying products they are based on, are made freely available as open-source products. The overviews in this chapter provide a toehold to the practical application of in situ visualization.

Reverse-mode differentiation in arbitrary tensor network format: with application to supervised learning

Journal of Machine Learning Research

Gorodetsky, Alex A.; Safta, Cosmin S.; Jakeman, John D.

This paper describes an efficient reverse-mode differentiation algorithm for contraction operations for arbitrary and unconventional tensor network topologies. The approach leverages the tensor contraction tree of Evenbly and Pfeifer (2014), which provides an instruction set for the contraction sequence of a network. We show that this tree can be efficiently leveraged for differentiation of a full tensor network contraction using a recursive scheme that exploits (1) the bilinear property of contraction and (2) the property that trees have a single path from root to leaves. While differentiation of tensor-tensor contraction is already possible in most automatic differentiation packages, we show that exploiting these two additional properties in the specific context of contraction sequences can improve efficiency. Following a description of the algorithm and computational complexity analysis, we investigate its utility for gradient-based supervised learning for low-rank function recovery and for fitting real-world unstructured datasets. We demonstrate improved performance over alternating least-squares optimization approaches and the capability to handle heterogeneous and arbitrary tensor network formats. When compared to alternating minimization algorithms, we find that the gradient-based approach requires a smaller oversampling ratio (number of samples compared to the number of model parameters) for recovery. This increased efficiency extends to fitting unstructured data of varying dimensionality and when employing a variety of tensor network formats. Here, we show improved learning using the hierarchical Tucker method over the tensor-train in high-dimensional settings on a number of benchmark problems.
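
The bilinearity property exploited here has a compact concrete form: the gradient of a full contraction with respect to one tensor is the contraction of all the others, with that tensor's indices left open. A minimal numpy sketch on a three-tensor ring (the network and shapes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
A, B, C = rng.random((4, 5)), rng.random((5, 6)), rng.random((6, 4))

y = np.einsum('ij,jk,ki->', A, B, C)   # scalar contraction of the ring
grad_A = np.einsum('jk,ki->ij', B, C)  # reverse-mode gradient dy/dA

# finite-difference check of one entry
eps = 1e-6
A2 = A.copy(); A2[0, 0] += eps
fd = (np.einsum('ij,jk,ki->', A2, B, C) - y) / eps
assert abs(fd - grad_A[0, 0]) < 1e-4
```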

Characterizing Midcircuit Measurements on a Superconducting Qubit Using Gate Set Tomography

Physical Review Applied

Rudinger, Kenneth M.; Ribeill, Guilhem J.; Govia, Luke C.G.; Ware, Matthew; Nielsen, Erik N.; Young, Kevin; Ohki, Thomas A.; Blume-Kohout, Robin J.; Proctor, Timothy J.

Measurements that occur within the internal layers of a quantum circuit—midcircuit measurements—are a useful quantum-computing primitive, most notably for quantum error correction. Midcircuit measurements have both classical and quantum outputs, so they can be subject to error modes that do not exist for measurements that terminate quantum circuits. Here we show how to characterize midcircuit measurements, modeled by quantum instruments, using a technique that we call quantum instrument linear gate set tomography (QILGST). We then apply this technique to characterize a dispersive measurement on a superconducting transmon qubit within a multiqubit system. By varying the delay time between the measurement pulse and subsequent gates, we explore the impact of residual cavity photon population on measurement error. QILGST can resolve different error modes and quantify the total error from a measurement; in our experiment, for delay times above 1000 ns we measure a total error rate (i.e., half diamond distance) of ϵ⋄ = 8.1 ± 1.4%, a readout fidelity of 97.0 ± 0.3%, and output quantum-state fidelities of 96.7 ± 0.6% and 93.7 ± 0.7% when measuring 0 and 1, respectively.

FROSch preconditioners for land ice simulations of Greenland and Antarctica

SIAM Journal on Scientific Computing

Heinlein, Alexander; Perego, Mauro P.; Rajamanickam, Sivasankaran R.

Numerical simulations of Greenland and Antarctic ice sheets involve the solution of large-scale highly nonlinear systems of equations on complex shallow geometries. This work is concerned with the construction of Schwarz preconditioners for the solution of the associated tangent problems, which are challenging for solvers mainly because of the strong anisotropy of the meshes and wildly changing boundary conditions that can lead to poorly constrained problems on large portions of the domain. Here, two-level generalized Dryja-Smith-Widlund (GDSW)-type Schwarz preconditioners are applied to different land ice problems, i.e., a velocity problem, a temperature problem, as well as the coupling of the former two problems. We employ the message passing interface (MPI)-parallel implementation of multilevel Schwarz preconditioners provided by the package FROSch (fast and robust Schwarz) from the Trilinos library. The strength of the proposed preconditioner is that it yields out-of-the-box scalable and robust preconditioners for the single physics problems. To the best of our knowledge, this is the first time two-level Schwarz preconditioners have been applied to the ice sheet problem and a scalable preconditioner has been used for the coupled problem. The preconditioner for the coupled problem differs from previous monolithic GDSW preconditioners in the sense that decoupled extension operators are used to compute the values in the interior of the subdomains. Several approaches for improving the performance, such as reuse strategies and shared memory OpenMP parallelization, are explored as well. In our numerical study we target both uniform meshes of varying resolution for the Antarctic ice sheet as well as nonuniform meshes for the Greenland ice sheet. We present several weak and strong scaling studies confirming the robustness of the approach and the parallel scalability of the FROSch implementation. Among the highlights of the numerical results are a weak scaling study for up to 32 K processor cores (8 K MPI ranks and 4 OpenMP threads) and 566 M degrees of freedom for the velocity problem as well as a strong scaling study for up to 4 K processor cores (and MPI ranks) and 68 M degrees of freedom for the coupled problem.

Characterizing Memory Failures Using Benford’s Law

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Ferreira, Kurt B.; Levy, Scott

Fault tolerance is a key challenge as high performance computing systems continue to increase component counts, individual component reliability decreases, and hardware and software complexity increases. To better understand the potential impacts of failures on next-generation systems, significant effort has been devoted to collecting, characterizing and analyzing failures on current systems. These studies require large volumes of data and complex analysis in an attempt to identify statistical properties of the failure data. In this paper, we examine the lifetime of failures on the Cielo supercomputer that was located at Los Alamos National Laboratory, looking specifically at the time between faults on this system. Through this analysis, we show that the time between uncorrectable faults for this system obeys Benford’s law. This law applies to a number of naturally occurring collections of numbers and states that the leading digit is more likely to be small; for example, a leading digit of 1 is more likely than 9. We also show that a number of common distributions used to model failures also follow this law. This work provides critical analysis of the distribution of times between failures for extreme-scale systems. Specifically, the analysis in this work could be used as a simple form of failure prediction or for modeling realistic failures.
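
Benford's law is simple to test numerically. A minimal sketch follows, comparing empirical leading-digit frequencies against P(d) = log10(1 + 1/d); a wide lognormal sample stands in for the Cielo failure logs.

```python
import numpy as np

rng = np.random.default_rng(4)
# synthetic inter-failure times spanning many decades (stand-in data)
times = rng.lognormal(mean=8.0, sigma=3.0, size=100_000)

# first significant digit of each time-between-failures
lead = (times / 10.0 ** np.floor(np.log10(times))).astype(int)

empirical = np.bincount(lead, minlength=10)[1:10] / len(lead)
benford = np.log10(1 + 1 / np.arange(1, 10))
print(np.round(empirical, 3))
print(np.round(benford, 3))
```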

Orthogonal Polynomials Defined by Self-Similar Measures with Overlaps

Experimental Mathematics

Ngai, Sze M.; Tang, Wei; Tran, Anh; Yuan, Shuai

We study orthogonal polynomials with respect to self-similar measures, focusing on the class of infinite Bernoulli convolutions, which are defined by iterated function systems with overlaps, especially those defined by the Pisot, Garsia, and Salem numbers. By using an algorithm of Mantica, we obtain graphs of the coefficients of the 3-term recursion relation defining the orthogonal polynomials. We use these graphs to predict whether the singular infinite Bernoulli convolutions belong to the Nevai class. Based on our numerical results, we conjecture that all infinite Bernoulli convolutions with contraction ratios greater than or equal to 1/2 belong to Nevai’s class, regardless of the probability weights assigned to the self-similar measures.
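
For context, the 3-term recursion relation whose coefficients are computed via Mantica's algorithm is the standard one for polynomials orthonormal with respect to a measure; in the usual Jacobi-matrix notation it reads:

```latex
x\,p_n(x) = b_{n+1}\,p_{n+1}(x) + a_n\,p_n(x) + b_n\,p_{n-1}(x),
\qquad p_{-1} \equiv 0, \quad p_0 \equiv 1 .
```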

Multifidelity data fusion in convolutional encoder/decoder assembly networks for computational fluid dynamics

AIAA Science and Technology Forum and Exposition, AIAA SciTech Forum 2022

Partin, Lauren; Rushdi, Ahmad R.; Schiavazzi, Daniele E.

We analyze the regression accuracy of convolutional neural networks assembled from encoders, decoders and skip connections and trained with multifidelity data. These networks benefit from a significant reduction in the number of trainable parameters with respect to an equivalent fully connected network. These architectures are also versatile with respect to the input and output dimensionality. For example, encoder-decoder, decoder-encoder or decoder-encoder-decoder architectures are well suited to learn mappings between inputs and outputs of any dimensionality. We demonstrate the accuracy produced by such architectures when trained on a few high-fidelity and many low-fidelity data generated from models ranging from one-dimensional functions to Poisson equation solvers in two dimensions. We finally discuss a number of implementation choices that improve the reliability of the uncertainty estimates generated by a dropblock regularizer, and compare uncertainty estimates among low-, high-, and multi-fidelity approaches.

Gas-kinetic simulations of compressible turbulence over a mean-free-path-scale porous wall

AIAA Science and Technology Forum and Exposition, AIAA SciTech Forum 2022

McMullen, Ryan M.; Krygier, Michael K.; Torczynski, J.R.; Gallis, Michail A.

We report flow statistics and visualizations from gas-kinetic simulations using the Direct Simulation Monte Carlo (DSMC) method of compressible turbulent Couette flow over a porous substrate composed of an array of circular cylinders for which the Knudsen number is O(10⁻¹). Comparisons are made with both smooth-wall DSMC simulations and direct numerical simulations of the Navier-Stokes equations for the same conditions. Roughness, permeability, and noncontinuum effects are assessed.

Evaluating the Sustainability of Computational Science and Engineering Software: Empirical Observations

Proceedings of the International Conference on Software Engineering and Knowledge Engineering, SEKE

Willenbring, James M.; Walia, Gursimran S.

Software sustainability is critical for Computational Science and Engineering (CSE) software. It is also challenging due to factors ranging from funding models to the typical lifecycle of a research code to the inherent challenges of running fast on the newest architectures. Furthermore, measuring sustainability is challenging because sustainability consists of many complex attributes. To identify useful metrics for measuring CSE software sustainability, we gathered data from multiple freely available sources, including GitHub, SLOCCount, and Metrix++. This paper discusses the challenges practitioners face when measuring the sustainability of CSE software. We present an analysis of data with associated observations and future directions to better understand CSE software sustainability and how this work can be used to support decisions and improve sustainability by observing trends in metrics over time.

Mesostructure Evolution During Powder Compression: Micro-CT Experiments and Particle-Based Simulations

Conference Proceedings of the Society for Experimental Mechanics Series

Cooper, Marcia A.; Clemmer, Joel T.; Silling, Stewart A.; Bufford, Daniel C.; Bolintineanu, Dan S.

Powders under compression form mesostructures of particle agglomerations in response to both inter- and intra-particle forces. The ability to computationally predict the resulting mesostructures with reasonable accuracy requires models that capture the distributions associated with particle size and shape, contact forces, and mechanical response during deformation and fracture. The following report presents experimental data obtained for the purpose of validating emerging mesostructures simulated by discrete element method and peridynamic approaches. A custom compression apparatus, suitable for integration with our micro-computed tomography (micro-CT) system, was used to collect 3-D scans of a bulk powder at discrete steps of increasing compression. Details of the apparatus and the microcrystalline cellulose particles, with a nearly spherical shape and mean particle size, are presented. Comparative simulations were performed with an initial arrangement of particles and particle shapes directly extracted from the validation experiment. The experimental volumetric reconstruction was segmented to extract the relative positions and shapes of individual particles in the ensemble, including internal voids in the case of the microcrystalline cellulose particles. These computationally determined particles were then compressed within the computational domain and the evolving mesostructures compared directly to those in the validation experiment. The ability of the computational models to simulate the experimental mesostructures and particle behavior at increasing compression is discussed.

A FETI approach to domain decomposition for meshfree discretizations of nonlocal problems

Computer Methods in Applied Mechanics and Engineering

Xu, Xiao; Glusa, Christian A.; D'Elia, Marta D.; Foster, John T.

We propose a domain decomposition method for the efficient simulation of nonlocal problems. Our approach is based on a multi-domain formulation of a nonlocal diffusion problem where the subdomains share “nonlocal” interfaces of the size of the nonlocal horizon. This system of nonlocal equations is first rewritten in terms of minimization of a nonlocal energy, then discretized with a meshfree approximation and finally solved via a Lagrange multiplier approach in a way that resembles the finite element tearing and interconnect method. Specifically, we propose a distributed projected gradient algorithm for the solution of the Lagrange multiplier system, whose unknowns determine the nonlocal interface conditions between subdomains. Several two-dimensional numerical tests on problems as large as 191 million unknowns illustrate the strong and the weak scalability of our algorithm, which outperforms the standard approach to the distributed numerical solution of the problem. This work is the first rigorous numerical study in a two-dimensional multi-domain setting for nonlocal operators with finite horizon and, as such, it is a fundamental step towards increasing the use of nonlocal models in large scale simulations.

Crack nucleation at forging flaws studied by non-local peridynamics simulations

Mathematics and Mechanics of Solids

Karim, Mohammad K.; Narasimhachary, Santosh N.; Radailli, Francesco R.; Amann, Christian A.; Dayal, Kaushik D.; Silling, Stewart A.; Germann, Tim G.

In this study, we present a computational framework that allows us to study and understand the crack nucleation process from forging flaws. Forging flaws may be present in large steel rotor components commonly used for rotating power generation equipment including gas turbines, electrical generators, and steam turbines. The service life of these components is often limited by crack nucleation and subsequent growth from such forging flaws, which frequently exhibit themselves as non-metallic oxide inclusions. The fatigue crack growth process can be described by established engineering fracture mechanics methods. However, the initial crack nucleation process from a forging flaw is challenging for traditional engineering methods to quantify, as it depends on the details of the flaw, including flaw morphology. We adopt the peridynamics method to describe and study this crack nucleation process. For a specific industrial gas turbine rotor steel, we present how we integrate and fit commonly known base material property data such as elastic properties, yield strength, and S-N curves, as well as fatigue crack growth data, into a peridynamic model. The obtained model is then utilized in a series of high-performance two-dimensional peridynamic simulations to study the crack nucleation process from forging flaws at ambient and elevated temperatures in a rectangular simulation cell specimen. The simulations reveal initial local nucleation at multiple small oxide inclusions followed by micro-crack propagation, arrest, coalescence, and the eventual emergence of a dominant micro-crack that governs the crack nucleation process. The dependence of both the microscopic processes and the cycles to crack nucleation on temperature and oxide-inclusion density is also observed. Finally, the results are compared with fatigue experiments performed with specimens containing forging flaws of the same rotor steel.
