Publications

Results 101–125 of 9,998
Skip to search filters

Evaluating Spatial Accelerator Architectures with Tiled Matrix-Matrix Multiplication

IEEE Transactions on Parallel and Distributed Systems

Moon, Gordon E.; Kwon, Hyoukjun; Jeong, Geonhwa; Chatarasi, Prasanth; Rajamanickam, Sivasankaran R.; Krishna, Tushar

There is a growing interest in custom spatial accelerators for machine learning applications. These accelerators employ a spatial array of processing elements (PEs) interacting via custom buffer hierarchies and networks-on-chip. The efficiency of these accelerators comes from employing optimized dataflow (i.e., spatial/temporal partitioning of data across the PEs and fine-grained scheduling) strategies to optimize data reuse. The focus of this work is to evaluate these accelerator architectures using a tiled general matrix-matrix multiplication (GEMM) kernel. To do so, we develop a framework that finds optimized mappings (dataflow and tile sizes) for a tiled GEMM for a given spatial accelerator and workload combination, leveraging an analytical cost model for runtime and energy. Our evaluations over five spatial accelerators demonstrate that the tiled GEMM mappings systematically generated by our framework achieve high performance on various GEMM workloads and accelerators.

More Details

Large-Scale Atomistic Simulations: Investigating Free Expansion

Moore, Stan G.

The experiment investigates free expansion of a supercritical fluid into a two-phase liquid-vapor coexistence region. A huge molecular dynamics simulation (6 billion Lennard-Jones atoms) was run on 5760 GPUs (33% of LLNL Sierra) using LAMMPS/Kokkos software. This improved visualization workflow and started preliminary simulations of aluminum using SNAP machine learning potential.

More Details

Elucidating size effects on the yield strength of single-crystal Cu via the Richtmyer–Meshkov instability

Journal of Applied Physics

Stewart, James A.; Wood, Mitchell A.; Olles, Joseph O.

Capturing the dynamic response of a material under high strain-rate deformation often demands challenging and time consuming experimental effort. While shock hydrodynamic simulation methods can aid in this area, a priori characterizations of the material strength under shock loading and spall failure are needed in order to parameterize constitutive models needed for these computational tools. Moreover, parameterizations of strain-rate-dependent strength models are needed to capture the full suite of Richtmyer–Meshkov instability (RMI) behavior of shock compressed metals, creating an unrealistic demand for these training data solely on experiments. Herein, we sweep a large range of geometric, crystallographic, and shock conditions within molecular dynamics (MD) simulations and demonstrate the breadth of RMI in Cu that can be captured from the atomic scale. In this work, yield strength measurements from jetted and arrested material from a sinusoidal surface perturbation were quantified as YRMI = 0.787 ± 0.374 GPa, higher than strain-rate-independent models used in experimentally matched hydrodynamic simulations. Defect-free, single-crystal Cu samples used in MD will overestimate YRMI, but the drastic scale difference between experiment and MD is highlighted by high confidence neighborhood clustering predictions of RMI characterizations, yielding incorrect classifications.

More Details

First-passage time statistics on surfaces of general shape: Surface PDE solvers using Generalized Moving Least Squares (GMLS)

Journal of Computational Physics

Gross, B.J.; Kuberry, Paul A.; Atzberger, P.J.

We develop numerical methods for computing statistics of stochastic processes on surfaces of general shape with drift-diffusion dynamics dXt=a(Xt)dt+b(Xt)dWt. We formulate descriptions of Brownian motion and general drift-diffusion processes on surfaces. We consider statistics of the form u(x)=Ex[∫0τg(Xt)dt]+Ex[f(Xτ)] for a domain Ω and the exit stopping time τ=inft⁡{t>0|Xt∉Ω}, where f,g are general smooth functions. For computing these statistics, we develop high-order Generalized Moving Least Squares (GMLS) solvers for associated surface PDE boundary-value problems based on Backward-Kolmogorov equations. We focus particularly on the mean First Passage Times (FPTs) given by the case f=0,g=1 where u(x)=Ex[τ]. We perform studies for a variety of shapes showing our methods converge with high-order accuracy both in capturing the geometry and the surface PDE solutions. We then perform studies showing how statistics are influenced by the surface geometry, drift dynamics, and spatially dependent diffusivities.

More Details

Three-Photon Optical Pumping for Trapped Ion Quantum Computing

Hogle, Craig W.; Ivory, Megan K.; Lobser, Daniel L.; Ruzic, Brandon R.; DeRose, Christopher T.

In this report we describe the testing of a novel scheme for state preparation of trapped ions in a quantum computing setup. This technique optimally would allow for similar precision and speed of state preparation while allowing for individual addressability of single ions in a chain using technology already available in a trapped ion experiment. As quantum computing experiments become more complicated, mid-experiment measurements will become necessary to achieve algorithms such as quantum error correction. Any mid-experiment measurement then requires the measured qubit to be re-prepared to a known quantum state. Currently this involves the protected qubits to be moved a sizeable distance away from the qubit being re-prepared which can be costly in terms of experiment length as well as introducing errors. Theoretical calculations predict that a three-photon process would allow for state preparation without qubit movement with similar efficiencies to current state preparation methods.

More Details

Navier-Stokes Equations Do Not Describe the Smallest Scales of Turbulence in Gases

Physical Review Letters

McMullen, Ryan M.; Krygier, Michael K.; Torczynski, J.R.; Gallis, Michail A.

In turbulent flows, kinetic energy is transferred from the largest scales to progressively smaller scales, until it is ultimately converted into heat. The Navier-Stokes equations are almost universally used to study this process. Here, by comparing with molecular-gas-dynamics simulations, we show that the Navier-Stokes equations do not describe turbulent gas flows in the dissipation range because they neglect thermal fluctuations. We investigate decaying turbulence produced by the Taylor-Green vortex and find that in the dissipation range the molecular-gas-dynamics spectra grow quadratically with wave number due to thermal fluctuations, in agreement with previous predictions, while the Navier-Stokes spectra decay exponentially. Furthermore, the transition to quadratic growth occurs at a length scale much larger than the gas molecular mean free path, namely in a regime that the Navier-Stokes equations are widely believed to describe. In fact, our results suggest that the Navier-Stokes equations are not guaranteed to describe the smallest scales of gas turbulence for any positive Knudsen number.

More Details

A theoretical investigation of the hydrolysis of uranium hexafluoride: the initiation mechanism and vibrational spectroscopy

Physical Chemistry Chemical Physics. PCCP

Lutz, Jesse J.; byrd, jason b.; Lotrich, Victor L.; Jensen, Daniel S.; Zador, Judit Z.; Hubbard, Joshua A.

Depleted uranium hexafluoride (UF6), a stockpiled byproduct of the nuclear fuel cycle, reacts readily with atmospheric humidity, but the mechanism is poorly understood. Here we compare several potential initiation steps at a consistent level of theory, generating underlying structures and vibrational modes using hybrid density functional theory (DFT) and computing relative energies of stationary points with double-hybrid (DH) DFT. A benchmark comparison is performed to assess the quality of DH-DFT data using reference energy differences obtained using a complete-basis-limit coupled-cluster (CC) composite method. The associated large-basis CC computations were enabled by a new general-purpose pseudopotential capability implemented as part of this work. Dispersion-corrected parameter-free DH-DFT methods, namely PBE0-DH-D3(BJ) and PBE-QIDH-D3(BJ), provided mean unsigned errors within chemical accuracy (1 kcal mol-1) for a set of barrier heights corresponding to the most energetically favorable initiation steps. The hydrolysis mechanism is found to proceed via intermolecular hydrogen transfer within van der Waals complexes involving UF6, UF5OH, and UOF4, in agreement with previous studies, followed by the formation of a previously unappreciated dihydroxide intermediate, UF4(OH)2. The dihydroxide is predicted to form under both kinetic and thermodynamic control, and, unlike the alternate pathway leading to the UO2F2 monomer, its reaction energy is exothermic, in agreement with observation. Finally, harmonic and anharmonic vibrational simulations are performed to reinterpret literature infrared spectroscopy in light of this newly identified species.

More Details

A hybrid meshfree discretization to improve the numerical performance of peridynamic models

Computer Methods in Applied Mechanics and Engineering

Shojaei, Arman; Hermann, Alexander; Cyron, Christian J.; Seleson, Pablo; Silling, Stewart A.

Efficient and accurate calculation of spatial integrals is of major interest in the numerical implementation of peridynamics (PD). The standard way to perform this calculation is a particle-based approach that discretizes the strong form of the PD governing equation. This approach has rapidly been adopted by the PD community since it offers some advantages. It is computationally cheaper than other available schemes, can conveniently handle material separation, and effectively deals with nonlinear PD models. Nevertheless, PD models are still computationally very expensive compared with those based on the classical continuum mechanics theory, particularly for large-scale problems in three dimensions. This results from the nonlocal nature of the PD theory which leads to interactions of each node of a discretized body with multiple surrounding nodes. Here, we propose a new approach to significantly boost the numerical efficiency of PD models. We propose a discretization scheme that employs a simple collocation procedure and is truly meshfree; i.e., it does not depend on any background integration cells. In contrast to the standard scheme, the proposed scheme requires a much smaller set of neighboring nodes (keeping the same physical length scale) to achieve a specific accuracy and is thus computationally more efficient. Our new scheme is applicable to the case of linear PD models and within neighborhoods where the solution can be approximated by smooth basis functions. Therefore, to fully exploit the advantages of both the standard and the proposed schemes, a hybrid discretization is presented that combines both approaches within an adaptive framework. The high performance of the developed framework is illustrated by several numerical examples, including brittle fracture and corrosion problems in two and three dimensions.

More Details

Krylov subspace recycling for evolving structures

Computer Methods in Applied Mechanics and Engineering

Bolten, M.; de Sturler, E.; Hahn, C.; Parks, Michael L.

Krylov subspace recycling is a powerful tool when solving a long series of large, sparse linear systems that change only slowly over time. In PDE constrained shape optimization, these series appear naturally, as typically hundreds or thousands of optimization steps are needed with only small changes in the geometry. In this setting, however, applying Krylov subspace recycling can be a difficult task. As the geometry evolves, in general, so does the finite element mesh defined on or representing this geometry, including the numbers of nodes and elements and element connectivity. This is especially the case if re-meshing techniques are used. As a result, the number of algebraic degrees of freedom in the system changes, and in general the linear system matrices resulting from the finite element discretization change size from one optimization step to the next. Changes in the mesh connectivity also lead to structural changes in the matrices. In the case of re-meshing, even if the geometry changes only a little, the corresponding mesh might differ substantially from the previous one. Obviously, this prevents any straightforward mapping of the approximate invariant subspace of the linear system matrix (the focus of recycling in this paper) from one optimization step to the next; similar problems arise for other selected subspaces. In this paper, we present an algorithm to map an approximate invariant subspace of the linear system matrix for the previous optimization step to an approximate invariant subspace of the linear system matrix for the current optimization step, for general meshes. This is achieved by exploiting the map from coefficient vectors to finite element functions on the mesh, combined with interpolation or approximation of functions on the finite element mesh. We demonstrate the effectiveness of our approach numerically with several proof of concept studies for a specific meshing technique.

More Details

Pyomo.GDP: an ecosystem for logic based modeling and optimization development

Optimization and Engineering

Chen, Qi; Johnson, Emma S.; Bernal, David E.; Valentin, Romeo; Kale, Sunjeev; Bates, Johnny; Siirola, John D.; Grossmann, Ignacio E.

We present three core principles for engineering-oriented integrated modeling and optimization tool sets—intuitive modeling contexts, systematic computer-aided reformulations, and flexible solution strategies—and describe how new developments in Pyomo.GDP for Generalized Disjunctive Programming (GDP) advance this vision. We describe a new logical expression system implementation for Pyomo.GDP allowing for a more intuitive description of logical propositions. The logical expression system supports automated reformulation of these logical constraints to linear constraints. We also describe two new logic-based global optimization solver implementations built on Pyomo.GDP that exploit logical structure to avoid “zero-flow” numerical difficulties that arise in nonlinear network design problems when nodes or streams disappear. These new solvers also demonstrate the capability to link to external libraries for expanded functionality within an integrated implementation. We present these new solvers in the context of a flexible array of solution paths available to GDP models. Finally, we present results on a new library of GDP models demonstrating the value of multiple solution approaches.

More Details

Hole in one: Pathways to deterministic single-acceptor incorporation in Si(100)-2 × 1

AVS Quantum Science

Campbell, Quinn C.; Baczewski, Andrew D.; Butera, R.E.; Misra, Shashank M.

Stochastic incorporation kinetics can be a limiting factor in the scalability of semiconductor fabrication technologies using atomic-precision techniques. While these technologies have recently been extended from donors to acceptors, the extent to which kinetics will impact single-acceptor incorporation has yet to be assessed. To identify the precursor molecule and dosing conditions that are promising for deterministic incorporation, we develop and apply an atomistic model for the single-acceptor incorporation rates of several recently demonstrated molecules: diborane (B2H6), boron trichloride (BCl3), and aluminum trichloride in both monomer (AlCl3) and dimer forms (Al2Cl6). While all three precursors can realize single-acceptor incorporation, we predict that diborane is unlikely to realize deterministic incorporation, boron trichloride can realize deterministic incorporation with modest heating (50 °C), and aluminum trichloride can realize deterministic incorporation at room temperature. We conclude that both boron and aluminum trichloride are promising precursors for atomic-precision single-acceptor applications, with the potential to enable the reliable production of large arrays of single-atom quantum devices.

More Details

Atomic step disorder on polycrystalline surfaces leads to spatially inhomogeneous work functions

Journal of Vacuum Science and Technology A

Bussmann, Ezra B.; smith, sean w.; Scrymgeour, David S.; Brumbach, Michael T.; Lu, Ping L.; Dickens, Sara D.; Michael, Joseph R.; Ohta, Taisuke O.; Hjalmarson, Harold P.; Schultz, Peter A.; Clem, Paul G.; Hopkins, Matthew M.; Moore, Christopher M.

Structural disorder causes materials’ surface electronic properties, e.g., work function ([Formula: see text]), to vary spatially, yet it is challenging to prove exact causal relationships to underlying ensemble disorder, e.g., roughness or granularity. For polycrystalline Pt, nanoscale resolution photoemission threshold mapping reveals a spatially varying [Formula: see text] eV over a distribution of (111) vicinal grain surfaces prepared by sputter deposition and annealing. With regard to field emission and related phenomena, e.g., vacuum arc initiation, a salient feature of the [Formula: see text] distribution is that it is skewed with a long tail to values down to 5.4 eV, i.e., far below the mean, which is exponentially impactful to field emission via the Fowler–Nordheim relation. We show that the [Formula: see text] spatial variation and distribution can be explained by ensemble variations of granular tilts and surface slopes via a Smoluchowski smoothing model wherein local [Formula: see text] variations result from spatially varying densities of electric dipole moments, intrinsic to atomic steps, that locally modify [Formula: see text]. Atomic step-terrace structure is confirmed with scanning tunneling microscopy (STM) at several locations on our surfaces, and prior works showed STM evidence for atomic step dipoles at various metal surfaces. From our model, we find an atomic step edge dipole [Formula: see text] D/edge atom, which is comparable to values reported in studies that utilized other methods and materials. Our results elucidate a connection between macroscopic [Formula: see text] and the nanostructure that may contribute to the spread of reported [Formula: see text] for Pt and other surfaces and may be useful toward more complete descriptions of polycrystalline metals in the models of field emission and other related vacuum electronics phenomena, e.g., arc initiation.

More Details

Finding Electronic Structure Machine Learning Surrogates without Training

Fiedler, Lenz F.; Hoffmann, Nils H.; Mohammed, Parvez M.; Popoola, Gabriel A.; Yovell, Tamar Y.; Oles, Vladyslav O.; Ellis, Austin E.; Rajamanickam, Sivasankaran R.; Cangi, Attila -.

A myriad of phenomena in materials science and chemistry rely on quantum-level simulations of the electronic structure in matter. While moving to larger length and time scales has been a pressing issue for decades, such large-scale electronic structure calculations are still challenging despite modern software approaches and advances in high-performance computing. The silver lining in this regard is the use of machine learning to accelerate electronic structure calculations – this line of research has recently gained growing attention. The grand challenge therein is finding a suitable machine-learning model during a process called hyperparameter optimization. This, however, causes a massive computational overhead in addition to that of data generation. We accelerate the construction of machine-learning surrogate models by roughly two orders of magnitude by circumventing excessive training during the hyperparameter optimization phase. We demonstrate our workflow for Kohn-Sham density functional theory, the most popular computational method in materials science and chemistry.

More Details

Unified Memory: GPGPU-Sim/UVM Smart Integration

Liu, Yechen L.; Rogers, Timothy R.; Hughes, Clayton H.

CPU/GPU heterogeneous compute platforms are an ubiquitous element in computing and a programming model specified for this heterogeneous computing model is important for both performance and programmability. A programming model that exposes the shared, unified, address space between the heterogeneous units is a necessary step in this direction as it removes the burden of explicit data movement from the programmer while maintaining performance. GPU vendors, such as AMD and NVIDIA, have released software-managed runtimes that can provide programmers the illusion of unified CPU and GPU memory by automatically migrating data in and out of the GPU memory. However, this runtime support is not included in GPGPU-Sim, a commonly used framework that models the features of a modern graphics processor that are relevant to non-graphics applications. UVM Smart was developed, which extended GPGPU-Sim 3.x to in- corporate the modeling of on-demand pageing and data migration through the runtime. This report discusses the integration of UVM Smart and GPGPU-Sim 4.0 and the modifications to improve simulation performance and accuracy.

More Details

Randomized Cholesky Preconditioning for Graph Partitioning Applications

Espinoza, Heliezer J.; Loe, Jennifer A.; Boman, Erik G.

Graph partitioning has emerged as an area of interest due to its use in various applications in computational research. One way to partition a graph is to solve for the eigenvectors of the corresponding graph Laplacian matrix. This project focuses on the eigensolver LOBPCG and the evaluation of a new preconditioner: Randomized Cholesky Factorization (rchol). This proconditioner was tested for its speed and accuracy against other well-known preconditioners for the method. After experiments were run on several known test matrices, rchol appears to be a better preconditioner for structured matrices. This research was sponsored by National Nuclear Security Administration Minority Serving Institutions Internship Program (NNSA-MSIIP) and completed at host facility Sandia National Laboratories. As such, after discussion of the research project itself, this report contains a brief reflection on experience gained as a result of participating in the NNSA-MSIIP.

More Details

A silicon singlet–triplet qubit driven by spin-valley coupling

Nature Communications

Jock, Ryan M.; Jacobson, Noah T.; Rudolph, Martin R.; Ward, Daniel R.; Carroll, Malcolm S.; Luhman, Dwight R.

Spin–orbit effects, inherent to electrons confined in quantum dots at a silicon heterointerface, provide a means to control electron spin qubits without the added complexity of on-chip, nanofabricated micromagnets or nearby coplanar striplines. Here, we demonstrate a singlet–triplet qubit operating mode that can drive qubit evolution at frequencies in excess of 200 MHz. This approach offers a means to electrically turn on and off fast control, while providing high logic gate orthogonality and long qubit dephasing times. We utilize this operational mode for dynamical decoupling experiments to probe the charge noise power spectrum in a silicon metal-oxide-semiconductor double quantum dot. In addition, we assess qubit frequency drift over longer timescales to capture low-frequency noise. We present the charge noise power spectral density up to 3 MHz, which exhibits a 1/fα dependence consistent with α ~ 0.7, over 9 orders of magnitude in noise frequency.

More Details

A Block-Based Triangle Counting Algorithm on Heterogeneous Environments

IEEE Transactions on Parallel and Distributed Systems

Yasar, Abdurrahman; Rajamanickam, Sivasankaran R.; Berry, Jonathan W.; Catalyurek, Umit V.

Triangle counting is a fundamental building block in graph algorithms. In this article, we propose a block-based triangle counting algorithm to reduce data movement during both sequential and parallel execution. Our block-based formulation makes the algorithm naturally suitable for heterogeneous architectures. The problem of partitioning the adjacency matrix of a graph is well-studied. Our task decomposition goes one step further: it partitions the set of triangles in the graph. By streaming these small tasks to compute resources, we can solve problems that do not fit on a device. We demonstrate the effectiveness of our approach by providing an implementation on a compute node with multiple sockets, cores and GPUs. The current state-of-the-art in triangle enumeration processes the Friendster graph in 2.1 seconds, not including data copy time between CPU and GPU. Using that metric, our approach is 20 percent faster. When copy times are included, our algorithm takes 3.2 seconds. This is 5.6 times faster than the fastest published CPU-only time.

More Details

Neuromorphic scaling advantages for energy-efficient random walk computations

Nature Electronics

Smith, John D.; Hill, Aaron J.; Reeder, Leah E.; Franke, Brian C.; Lehoucq, Richard B.; Parekh, Ojas D.; Severa, William M.; Aimone, James B.

Neuromorphic computing, which aims to replicate the computational structure and architecture of the brain in synthetic hardware, has typically focused on artificial intelligence applications. What is less explored is whether such brain-inspired hardware can provide value beyond cognitive tasks. Here we show that the high degree of parallelism and configurability of spiking neuromorphic architectures makes them well suited to implement random walks via discrete-time Markov chains. These random walks are useful in Monte Carlo methods, which represent a fundamental computational tool for solving a wide range of numerical computing tasks. Using IBM’s TrueNorth and Intel’s Loihi neuromorphic computing platforms, we show that our neuromorphic computing algorithm for generating random walk approximations of diffusion offers advantages in energy-efficient computation compared with conventional approaches. We also show that our neuromorphic computing algorithm can be extended to more sophisticated jump-diffusion processes that are useful in a range of applications, including financial economics, particle physics and machine learning.

More Details

Assessing the predictive impact of factor fixing with an adaptive uncertainty-based approach

Environmental Modelling and Software

Wang, Qian; Guillaume, Joseph H.A.; Jakeman, John D.; Yang, Tao; Iwanaga, Takuya; Croke, Barry; Jakeman, Anthony J.

Despite widespread use of factor fixing in environmental modeling, its effect on model predictions has received little attention and is instead commonly presumed to be negligible. We propose a proof-of-concept adaptive method for systematically investigating the impact of factor fixing. The method uses Global Sensitivity Analysis methods to identify groups of sensitive parameters, then quantifies which groups can be safely fixed at nominal values without exceeding a maximum acceptable error, demonstrated using the 21-dimensional Sobol’ G-function. Three error measures are considered for quantities of interest, namely Relative Mean Absolute Error, Pearson Product-Moment Correlation and Relative Variance. Results demonstrate that factor fixing may cause large errors in the model results unexpectedly, when preliminary analysis suggests otherwise, and that the default value selected affects the number of factors to fix. To improve the applicability and methodological development of factor fixing, a new research agenda encompassing five opportunities is discussed for further attention.

More Details

A data-driven peridynamic continuum model for upscaling molecular dynamics

Computer Methods in Applied Mechanics and Engineering

You, Huaiqian; Yu, Yue; Silling, Stewart A.; D'Elia, Marta D.

Nonlocal models, including peridynamics, often use integral operators that embed lengthscales in their definition. However, the integrands in these operators are difficult to define from the data that are typically available for a given physical system, such as laboratory mechanical property tests. In contrast, molecular dynamics (MD) does not require these integrands, but it suffers from computational limitations in the length and time scales it can address. To combine the strengths of both methods and to obtain a coarse-grained, homogenized continuum model that efficiently and accurately captures materials’ behavior, we propose a learning framework to extract, from MD data, an optimal Linear Peridynamic Solid (LPS) model as a surrogate for MD displacements. To maximize the accuracy of the learnt model we allow the peridynamic influence function to be partially negative, while preserving the well-posedness of the resulting model. To achieve this, we provide sufficient well-posedness conditions for discretized LPS models with sign-changing influence functions and develop a constrained optimization algorithm that minimizes the equation residual while enforcing such solvability conditions. This framework guarantees that the resulting model is mathematically well-posed, physically consistent, and that it generalizes well to settings that are different from the ones used during training. We illustrate the efficacy of the proposed approach with several numerical tests for single layer graphene. Our two-dimensional tests show the robustness of the proposed algorithm on validation data sets that include thermal noise, different domain shapes and external loadings, and discretizations substantially different from the ones used for training.

More Details

LAMMPS - a flexible simulation tool for particle-based materials modeling at the atomic, meso, and continuum scales

Computer Physics Communications

Thompson, Aidan P.; Aktulga, H.M.; Berger, Richard; Bolintineanu, Dan S.; Brown, W.M.; Crozier, Paul C.; in 't Veld, Pieter J.; Kohlmeyer, Axel; Moore, Stan G.; Nguyen, Trung D.; Shan, Ray; Stevens, Mark J.; Tranchida, Julien; Trott, Christian R.; Plimpton, Steven J.

Since the classical molecular dynamics simulator LAMMPS was released as an open source code in 2004, it has become a widely-used tool for particle-based modeling of materials at length scales ranging from atomic to mesoscale to continuum. Reasons for its popularity are that it provides a wide variety of particle interaction models for different materials, that it runs on any platform from a single CPU core to the largest supercomputers with accelerators, and that it gives users control over simulation details, either via the input script or by adding code for new interatomic potentials, constraints, diagnostics, or other features needed for their models. As a result, hundreds of people have contributed new capabilities to LAMMPS and it has grown from fifty thousand lines of code in 2004 to a million lines today. In this paper several of the fundamental algorithms used in LAMMPS are described along with the design strategies which have made it flexible for both users and developers. We also highlight some capabilities recently added to the code which were enabled by this flexibility, including dynamic load balancing, on-the-fly visualization, magnetic spin dynamics models, and quantum-accuracy machine learning interatomic potentials. Program Summary: Program Title: Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) CPC Library link to program files: https://doi.org/10.17632/cxbxs9btsv.1 Developer's repository link: https://github.com/lammps/lammps Licensing provisions: GPLv2 Programming language: C++, Python, C, Fortran Supplementary material: https://www.lammps.org Nature of problem: Many science applications in physics, chemistry, materials science, and related fields require parallel, scalable, and efficient generation of long, stable classical particle dynamics trajectories. Within this common problem definition, there lies a great diversity of use cases, distinguished by different particle interaction models, external constraints, as well as timescales and lengthscales ranging from atomic to mesoscale to macroscopic. Solution method: The LAMMPS code uses parallel spatial decomposition, distributed neighbor lists, and parallel FFTs for long-range Coulombic interactions [1]. The time integration algorithm is based on the Størmer-Verlet symplectic integrator [2], which provides better stability than higher-order non-symplectic methods. In addition, LAMMPS supports a wide range of interatomic potentials, constraints, diagnostics, software interfaces, and pre- and post-processing features. Additional comments including restrictions and unusual features: This paper serves as the definitive reference for the LAMMPS code. References: [1] S. Plimpton, Fast parallel algorithms for short-range molecular dynamics. J. Comp. Phys. 117 (1995) 1–19. [2] L. Verlet, Computer experiments on classical fluids: I. Thermodynamical properties of Lennard–Jones molecules, Phys. Rev. 159 (1967) 98–103.

More Details

Randomized Cholesky Preconditioning for Graph Partitioning Applications

Espinoza, Heliezer J.; Loe, Jennifer A.; Boman, Erik G.

A graph is a mathematical representation of a network; we say it consists of a set of vertices, which are connected by edges. Graphs have numerous applications in various fields, as they can model all sorts of connections, processes, or relations. For example, graphs can model intricate transit systems or the human nervous system. However, graphs that are large or complicated become difficult to analyze. This is why there is an increased interest in the area of graph partitioning, reducing the size of the graph into multiple partitions. For example, partitions of a graph representing a social network might help identify clusters of friends or colleagues. Graph partitioning is also a widely used approach to load balancing in parallel computing. The partitioning of a graph is extremely useful to decompose the graph into smaller parts and allow for easier analysis. There are different ways to solve graph partitioning problems. For this work, we focus on a spectral partitioning method which forms a partition based upon the eigenvectors of the graph Laplacian (details presented in Acer, et. al.). This method uses the LOBPCG algorithm to compute these eigenvectors. LOBPCG can be accelerated by an operator called a preconditioner. For this internship, we evaluate a randomized Cholesky (rchol) preconditioner for its effectiveness on graph partitioning problems with LOBPCG. We compare it with two standard preconditioners: Jacobi and Incomplete Cholesky (ichol). This research was conducted from August to December 2021 in conjunction with Sandia National Laboratories.

More Details

Zero-Truncated Poisson Tensor Decomposition for Sparse Count Data

Lopez, Oscar L.; Lehoucq, Richard B.; Dunlavy, Daniel D.

We propose a novel statistical inference paradigm for zero-inflated multiway count data that dispenses with the need to distinguish between true and false zero counts. Our approach ignores all zero entries and applies zero-truncated Poisson regression on the positive counts. Inference is accomplished via tensor completion that imposes low-rank structure on the Poisson parameter space. Our main result shows that an $\textit{N}$-way rank-R parametric tensor 𝓜 ϵ (0, ∞)$I$Χ∙∙∙Χ$I$ generating Poisson observations can be accurately estimated from approximately $IR^2 \text{log}^2_2(I)$ non-zero counts for a nonnegative canonical polyadic decomposition. Several numerical experiments are presented demonstrating that our zero-truncated paradigm is comparable to the ideal scenario where the locations of false zero counts are known $\textit{a priori}$.

More Details

Machine-Learning of Nonlocal Kernels for Anomalous Subsurface Transport from Breakthrough Curves

D'Elia, Marta D.; Glusa, Christian A.; Xu, Xiao X.; Foster, John E.

Anomalous behavior is ubiquitous in subsurface solute transport due to the presence of high degrees of heterogeneity at different scales in the media. Although fractional models have been extensively used to describe the anomalous transport in various subsurface applications, their application is hindered by computational challenges. Simpler nonlocal models characterized by integrable kernels and finite interaction length represent a computationally feasible alternative to fractional models; yet, the informed choice of their kernel functions still remains an open problem. We propose a general data-driven framework for the discovery of optimal kernels on the basis of very small and sparse data sets in the context of anomalous subsurface transport. Using spatially sparse breakthrough curves recovered from fine-scale particle-density simulations, we learn the best coarse-scale nonlocal model using a nonlocal operator regression technique. Predictions of the breakthrough curves obtained using the optimal nonlocal model show good agreement with fine-scale simulation results even at locations and time intervals different from the ones used to train the kernel, confirming the excellent generalization properties of the proposed algorithm. A comparison with trained classical models and with black-box deep neural networks confirms the superiority of the predictive capability of the proposed model.

More Details

Sequential optical response suppression for chemical mixture characterization

Quantum

Magann, Alicia B.; McCaul, Gerard M.; Rabitz, Herschel R.; Bondar, Denys B.

The characterization of mixtures of non-interacting, spectroscopically similar quantum components has important applications in chemistry, biology, and materials science. We introduce an approach based on quantum tracking control that allows for determining the relative concentrations of constituents in a quantum mixture, using a single pulse which enhances the distinguishability of components of the mixture and has a length that scales linearly with the number of mixture constituents. To illustrate the method, we consider two very distinct model systems: mixtures of diatomic molecules in the gas phase, as well as solid-state materials composed of a mixture of components. A set of numerical analyses are presented, showing strong performance in both settings.

More Details
Results 101–125 of 9,998
Results 101–125 of 9,998