In traditional molecular dynamics (MD) simulations, atoms and coarse-grained particles are modeled as point masses interacting via isotropic potentials. For studies where particle shape plays a vital role, more complex models are required. In this paper we describe a spectrum of approaches for modeling aspherical particles, all of which are now available (some recently) as options within the LAMMPS MD package. Broadly these include two classes of models. In the first, individual particles are aspherical, either via a pairwise anisotropic potential which implicitly assigns a simple geometric shape to each particle, or in a more general way where particles store internal state which can explicitly define a complex geometric shape. In the second class of models, individual particles are simple points or spheres, but rigid body constraints are used to create composite aspherical particles in a variety of complex shapes. We discuss parallel algorithms and associated data structures for both kinds of models, which enable dynamics simulations of aspherical particle systems across a wide range of length and time scales. We also highlight parallel performance and scalability and give a few illustrative examples of aspherical models in different contexts.
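To make the distinction concrete, the sketch below shows the kind of per-particle internal state an explicitly shaped particle (here an ellipsoid) carries beyond the point-mass data; the struct and field names are illustrative only and are not taken from the LAMMPS source.

```cpp
// Illustrative sketch, not LAMMPS code: state carried by one explicitly
// shaped (ellipsoidal) particle.  A composite rigid body, by contrast,
// would keep simple spheres as its particles and store the orientation
// and member list at the body level.
#include <array>

struct AsphericalParticle {
    std::array<double, 3> x;      // center-of-mass position
    std::array<double, 3> v;      // translational velocity
    std::array<double, 4> quat;   // orientation as a unit quaternion
    std::array<double, 3> omega;  // angular velocity
    std::array<double, 3> shape;  // ellipsoid semi-axes (a, b, c)
    double mass;                  // total mass; inertia follows from shape
};
```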
A mechanical model is introduced for predicting the initiation and evolution of complex fracture patterns without the need for a damage variable or law. The model, a continuum variant of Newton’s second law, uses integral rather than partial differential operators, where the region of integration is over a finite domain. The force interaction is derived from a novel nonconvex strain energy density function, resulting in a nonmonotonic material model. The resulting equation of motion is proved to be mathematically well-posed. The model has the capacity to simulate nucleation and growth of multiple, mutually interacting dynamic fractures. In the limit of a vanishing region of integration, the model reproduces the classic Griffith model of brittle fracture. The simplicity of the formulation avoids the need for supplemental kinetic relations that dictate crack growth or for an explicit damage evolution law.
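For orientation, the nonlocal equation of motion described above can be written schematically (our notation, not necessarily the authors' exact form) as
\[
\rho(\mathbf{x})\,\ddot{\mathbf{u}}(\mathbf{x},t) \;=\; \int_{H_\delta(\mathbf{x})} \mathbf{f}\bigl(\mathbf{u}(\mathbf{y},t)-\mathbf{u}(\mathbf{x},t),\,\mathbf{y}-\mathbf{x}\bigr)\, d\mathbf{y} \;+\; \mathbf{b}(\mathbf{x},t),
\]
where $H_\delta(\mathbf{x})$ is the finite neighborhood of $\mathbf{x}$ over which the integral operator acts, $\mathbf{f}$ is the pairwise force density derived from the nonconvex strain energy density, and $\mathbf{b}$ is the body force. Letting the neighborhood size $\delta$ shrink to zero corresponds to the Griffith limit mentioned above.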
As demands for memory-intensive applications continue to grow, the memory capacity of each computing node is expected to grow at a similar pace. In high-performance computing (HPC) systems, the memory capacity per compute node is sized for the most demanding application likely to run on such a system, and hence the average capacity per node in future HPC systems is expected to grow significantly. However, since HPC systems run many applications with different capacity demands, a large percentage of the overall memory capacity will likely be underutilized; memory modules can be thought of as private memory for their corresponding computing nodes. Thus, as HPC systems move toward the exascale era, better utilization of memory is strongly desired. Moreover, upgrading the memory system requires significant effort. Fortunately, disaggregated memory systems promise better utilization by defining regions of global memory, typically referred to as memory blades, which can be accessed by all computing nodes in the system. Disaggregated memory systems are expected to be built using dense, power-efficient memory technologies; thus, emerging nonvolatile memories (NVMs) are positioning themselves as the main building blocks for such systems. However, NVMs are slower than DRAM. Therefore, it is expected that each computing node would have a small local memory based on either HBM or DRAM, whereas a large shared NVM memory would be accessible by all nodes. Managing such a system with global and local memory requires a novel hardware/software co-design to initiate page migration between global and local memory to maximize performance while enabling access to huge shared memory. In this paper we provide support to migrate pages, investigate such memory management aspects, and examine the major system-level aspects that can affect design decisions in disaggregated NVM systems.
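As a cartoon of the hardware/software co-design question, the sketch below shows a counter-based promotion policy that migrates frequently touched pages from the shared NVM blade into local DRAM/HBM; the threshold, data structure, and interface are hypothetical and are not the mechanism proposed in the paper.

```cpp
// Toy page-promotion policy for a node with a small local DRAM/HBM and a
// large shared NVM blade.  Purely illustrative.
#include <cstdint>
#include <unordered_map>

struct PagePromotionPolicy {
    std::unordered_map<uint64_t, uint32_t> access_count;  // per remote page frame
    uint32_t hot_threshold = 64;  // accesses before promotion (tunable)

    // Called on each access to a page resident in shared NVM; returns true
    // if the caller should migrate the page into local memory.
    bool on_remote_access(uint64_t page_frame) {
        if (++access_count[page_frame] < hot_threshold)
            return false;
        access_count.erase(page_frame);  // reset bookkeeping after promotion
        return true;
    }
};
```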
The Spent Fuel and Waste Science and Technology (SFWST) Campaign of the U.S. Department of Energy (DOE) Office of Nuclear Energy (NE), Office of Fuel Cycle Technology (FCT) is conducting research and development (R&D) on geologic disposal of spent nuclear fuel (SNF) and high-level nuclear waste (HLW). Two high priorities for SFWST disposal R&D are design concept development and disposal system modeling. These priorities are directly addressed in the SFWST Geologic Disposal Safety Assessment (GDSA) control account, which is charged with developing a geologic repository system modeling and analysis capability, and the associated software, GDSA Framework, for evaluating disposal system performance for nuclear waste in geologic media. GDSA Framework is supported by the SFWST Campaign and its predecessor, the Used Fuel Disposition (UFD) Campaign.
Harden and optimize the ROCm based AMD GPU backend, develop a prototype backend for the Intel ECP Path Forward architecture, and improve the existing prototype Remote Memory Space capabilities.
This report documents the completion of milestone STPM12-19, Documented Kokkos application use cases. The goal of this milestone was to develop use case examples for common patterns users implement with Kokkos. This work was performed in the fourth quarter of FY19 and resulted in use case descriptions, with code examples, available in the Kokkos Wiki.
This report documents the completion of milestone STPRO4-26, Engaging the C++ Committee. The Kokkos team attended the three C++ Committee meetings in San Diego, Hawaii, and Cologne with multiple members, updated multiple in-flight proposals (e.g., mdspan, atomic_ref), contributed to numerous proposals central to future capabilities in C++ (e.g., executors, affinity), and organized a new effort to introduce a Basic Linear Algebra library into the C++ standard. We also implemented a production-quality version of mdspan as the basis for replacing the vast majority of the implementation of Kokkos::View, thus starting the transition of one of the core features in Kokkos to its future replacement.
This report documents the completion of milestone STPRO4-25: Harden and optimize the ROCm based AMD GPU backend, develop a prototype backend for the Intel ECP Path Forward architecture, and improve the existing prototype Remote Memory Space capabilities. The ROCm code was hardened to the point of passing all Kokkos unit tests; then AMD deprecated the programming model, forcing us to start over in FY20 with HIP. The Intel ECP Path Forward architecture prototype was developed with some initial capabilities on simulators, but plans changed, so that work will not continue; instead, SYCL will be developed as a backend for Aurora. The Remote Memory Space capabilities were improved, and development is ongoing as part of a collaboration with NVIDIA.
An overview of the Uintah code and benchmark case is presented. Uintah provides a parallel, adaptive, multi-physics framework and solves time-dependent PDEs in parallel.
SNAP potentials are inter-atomic potentials for molecular dynamics that enable simulations at accuracy levels comparable to density functional theory (DFT) at a fraction of the cost. As such, SNAP scales to systems on the order of $10^4$–$10^6$ atoms. In this work, we explore CPU optimization of the potential computation using SIMD. We note that efficient use of SIMD is non-obvious, as the application features an irregular iteration space for the various potential terms, necessitating use of SIMD across atoms in a cross-matrix, batched fashion. We present a preliminary analytic model to determine the correct batch size for several CPU architectures across several vendors, and show end-to-end speedups between 1.66x and 3.22x compared to the original implementation.
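The batching idea can be pictured as follows: rather than vectorizing the irregular per-atom inner loops, a batch of atoms is laid out contiguously so that each SIMD lane processes one atom, and the batch size is chosen per architecture (a multiple of the vector width). The sketch below is illustrative only; names and data layout are not taken from the SNAP implementation.

```cpp
// Sketch of SIMD across atoms: the term loop is irregular but identical for
// every atom in the batch, while the atom loop is unit-stride and vectorizes.
#include <cstddef>
#include <vector>

constexpr int kBatch = 8;  // e.g. 8 doubles per AVX-512 vector; architecture dependent

void accumulate_terms(const std::vector<double>& coeff,  // one coefficient per term
                      const std::vector<double>& basis,  // num_terms x kBatch, atom index fastest
                      double energy[])                    // kBatch per-atom accumulators
{
    const std::size_t num_terms = coeff.size();
    for (std::size_t t = 0; t < num_terms; ++t) {   // irregular, data-dependent length
        #pragma omp simd
        for (int a = 0; a < kBatch; ++a) {          // regular, vectorizable across the batch
            energy[a] += coeff[t] * basis[t * kBatch + a];
        }
    }
}
```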
Density functional theory (DFT) is undergoing a shift from a descriptive to a predictive tool in the field of solid state physics, heralded by a spike in “high-throughput” studies. However, methods to rigorously evaluate the validity and accuracy of these studies are lacking, raising serious questions when simulation and experiment disagree. In response, we have developed the V-DM/16 test set, designed to evaluate the experimental accuracy of DFT’s various implementations for periodic transition metal solids. Our test set evaluates 26 transition metal elements and 80 transition metal alloys across three physical observables: lattice constants, elastic coefficients, and formation energy of alloys. Whether or not a functional can accurately evaluate the formation energy offers key insight into whether the relevant physics is being captured in a simulation, an especially important question in transition metals, where active d-electrons can thwart the accuracy of an otherwise well-performing functional. Our test set captures a wide variety of cases where the unique physics present in transition metal binaries can undermine the effectiveness of “traditional” functionals. By application of the V-DM/16 test set, we aim to better characterize the performance of existing functionals on transition metals and to offer a new tool to rigorously evaluate the performance of new functionals in the future.
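For reference, the formation-energy observable is the standard per-atom quantity (our notation):
\[
\Delta E_f(\mathrm{A}_x\mathrm{B}_y) \;=\; \frac{E(\mathrm{A}_x\mathrm{B}_y) - x\,E(\mathrm{A}) - y\,E(\mathrm{B})}{x+y},
\]
where $E(\mathrm{A}_x\mathrm{B}_y)$ is the total energy of the relaxed alloy cell and $E(\mathrm{A})$, $E(\mathrm{B})$ are per-atom energies of the elemental reference phases; a functional that misjudges the relative stability of the alloy and its constituents shows up directly in this quantity.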
Magnetically driven experiments supporting pulsed power utilize a wide range of configurations, including wire arrays, gas puffs, flyer plates, and cylindrical liners. This experimental flexibility is critical to supporting radiation effects, dynamic materials, magneto-inertial fusion (MIF), and basic high energy density laboratory physics (HEDP) efforts. Ultimately, the rate at which these efforts progress is limited by our understanding of the complex plasma physics of these systems. Our effort has been to begin developing an advanced algorithmic structure and an R&D code implementation for a plasma physics simulation capability based on the five-moment multi-fluid / full-Maxwell plasma model. This model allows for the inclusion of multiple fluid species (e.g., electrons, multiple-charge-state ions, and neutrals), generalized collisional interactions between species, models for ionization/recombination, magnetized Braginskii collisional transport, and dissipative effects, and it can be readily extended to incorporate radiation transport physics. In the context of pulsed-power simulations, this advanced model will help enable SNL to computationally simulate the dense continuum regions of the physical load (e.g., liner implosions, flyer plates) as well as partial power-flow losses in the final gap region of the inner MITL. In this report we briefly summarize results of applying a preliminary version of this model to verification-type problems and to some initial magnetic-implosion-relevant prototype problems. The MIF-relevant prototype problems include results from fully implicit / implicit-explicit (IMEX) resistive MHD as well as full multi-fluid EM plasma formulations.
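Schematically, the five-moment multi-fluid / full-Maxwell system evolves, for each species $s$, density, momentum, and energy moments coupled to Maxwell's equations (written here in a generic form with our notation; collisional and ionization source terms are lumped into $\mathbf{R}_s$ and $Q_s$):
\[
\begin{aligned}
&\partial_t n_s + \nabla\cdot(n_s \mathbf{u}_s) = 0,\\
&m_s n_s\bigl(\partial_t \mathbf{u}_s + \mathbf{u}_s\cdot\nabla \mathbf{u}_s\bigr) = -\nabla p_s + q_s n_s\bigl(\mathbf{E} + \mathbf{u}_s\times\mathbf{B}\bigr) + \mathbf{R}_s,\\
&\partial_t \varepsilon_s + \nabla\cdot\bigl[(\varepsilon_s + p_s)\,\mathbf{u}_s\bigr] = q_s n_s\,\mathbf{u}_s\cdot\mathbf{E} + Q_s,\\
&\partial_t \mathbf{B} = -\nabla\times\mathbf{E}, \qquad
\partial_t \mathbf{E} = c^2\,\nabla\times\mathbf{B} - \tfrac{1}{\epsilon_0}\,\mathbf{J}, \qquad
\mathbf{J} = \sum_s q_s n_s \mathbf{u}_s,
\end{aligned}
\]
with $\varepsilon_s = p_s/(\gamma_s-1) + \tfrac12 m_s n_s |\mathbf{u}_s|^2$.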
The effort to develop larger-scale computing systems introduces a set of related challenges: large machines are more difficult to synchronize; the sheer quantity of hardware introduces more opportunities for errors; and new approaches to hardware, such as low-energy or neuromorphic devices, are not directly programmable by traditional methods.
Concern over Arctic methane (CH4) emissions has increased following recent discoveries of poorly understood sources and predictions that methane emissions from known sources will grow as Arctic temperatures increase. New efforts are required to detect increases and explain their origins without being confounded by the multiplicity of sources; methods for distinguishing different sources are therefore critical. We conducted measurements of atmospheric methane and source tracers and performed baseline global atmospheric modeling to begin assessing the climate impact of changes in atmospheric methane. The goal of this project was to address uncertainties in Arctic methane sources and their potential impact on climate by (1) deploying newly developed trace-gas analyzers for measurements of methane, methane isotopologues, ethane, and other tracers of methane sources at Barrow, AK, (2) characterizing methane sources using high-resolution atmospheric chemical transport models and tracer measurements, and (3) modeling Arctic climate using the state-of-the-art high-resolution Spectral Element Community Atmosphere Model (CAM-SE).
As new memory technologies appear on the market, there is a growing push to incorporate them into future architectures. Compared to traditional DDR DRAM, these technologies provide appealing advantages such as increased bandwidth or non-volatility. However, the technologies have significant downsides as well, including higher cost, manufacturing complexity, and, for non-volatile memories, higher latency and wear-out limitations. As such, no technology has emerged as a clear technological and economic winner. As a result, systems are turning to the concept of multi-level memory: mixing multiple memory technologies in a single system to balance cost, performance, and reliability.
This document outlines the gradient-based digital image correlation (DIC) formulation used in DICe, the Digital Image Correlation Engine (Sandia’s open source DIC code). The gradient-based algorithm implemented in DICe directly reflects the formulation presented here. Every effort is made to point out any simplifications or assumptions involved in the implementation. The focus of this document is on determination of the motion parameters. Computing strain is not discussed herein.
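As a point of reference, a generic gradient-based DIC formulation (not necessarily verbatim the DICe one) seeks the motion parameters $\mathbf{p}$ of a subset by minimizing
\[
E(\mathbf{p}) \;=\; \sum_{\mathbf{x}\in\text{subset}} \bigl[\, g\bigl(\boldsymbol{\phi}(\mathbf{x};\mathbf{p})\bigr) - f(\mathbf{x}) \,\bigr]^2,
\]
where $f$ and $g$ are the reference and deformed images and $\boldsymbol{\phi}(\mathbf{x};\mathbf{p})$ is the parameterized mapping of subset coordinates. Linearizing $g$ about the current estimate using the image gradient $\nabla g$ yields normal equations that are solved iteratively for the parameter update, which is where the image gradients of the name enter.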
The ethical, legal, and social issues (ELSI) surrounding Artificial Intelligence (AI) can have as great an impact on the technologies’ success as technical issues such as safety, reliability, and security. Addressing these risks can counter potential program failures, legal and ethical battles, constraints on scientific research, and product vulnerabilities. This paper presents a surety engineering framework and process that can be applied to AI to identify and address technical, ethical, legal, and societal risks. Extending sound engineering practices to incorporate a method to “engineer” ELSI can offer the scientific rigor required to significantly reduce the risk of AI vulnerabilities. Modeling the specification, design, evaluation, and quality/risk indicators for AI provides a foundation for a risk-informed decision process that can benefit researchers and stakeholders alike as they use it to critically examine both substantial and intangible risks.
The review was conducted on May 9–10, 2016 at the University of Utah. Overall, the review team was impressed with the work presented and found that the CCMSC had met or exceeded the Year 2 milestones. Specific details, comments, and recommendations are included in this document.
The Spent Fuel and Waste Science and Technology (SFWST) Campaign of the U.S. Department of Energy (DOE) Office of Nuclear Energy (NE), Office of Spent Fuel & Waste Disposition (SFWD) is conducting research and development (R&D) on geologic disposal of spent nuclear fuel (SNF) and high-level nuclear waste (HLW). Two high priorities for SFWST disposal R&D are design concept development and disposal system modeling (DOE 2011, Table 6). These priorities are directly addressed in the SFWST Geologic Disposal Safety Assessment (GDSA) work package, which is charged with developing a disposal system modeling and analysis capability for evaluating disposal system performance for nuclear waste in geologic media.
Sandia has a legacy of leadership in the advancement of high performance computing (HPC) at extreme scales. First-of-a-kind scalable distributed-memory parallel platforms such as the Intel Paragon, ASCI Red (the world’s first teraflops computer), and Red Storm (co-developed with Cray) helped form the basis for one of the most successful supercomputer product lines ever: the Cray XT series. Sandia also has pioneered system software elements, including lightweight operating systems, the Portals network programming interface, advanced interconnection network designs, and scalable I/O, that are critical to achieving scalability on large computing systems.
To achieve exascale computing, fundamental hardware architectures must change. The most significant consequence of this assertion is the impact on the scientific and engineering applications that run on current high performance computing (HPC) systems, many of which codify years of scientific domain knowledge and refinements for contemporary computer systems. In order to adapt to exascale architectures, developers must be able to reason about new hardware and determine what programming models and algorithms will provide the best blend of performance and energy efficiency into the future. While many details of the exascale architectures are undefined, an abstract machine model is designed to allow application developers to focus on the aspects of the machine that are important or relevant to performance and code structure. These models are intended as communication aids between application developers and hardware architects during the co-design process. We use the term proxy architecture to describe a parameterized version of an abstract machine model, with the parameters added to elucidate potential speeds and capacities of key hardware components. These more detailed architectural models are formulated to enable discussion between the developers of analytic models and simulators and computer hardware architects. They allow for application performance analysis and hardware optimization opportunities. In this report our goal is to provide the application development community with a set of models that can help software developers prepare for exascale. In addition, through the use of proxy architectures, we can enable a more concrete exploration of how well new and evolving application codes map onto future architectures. This second version of the document addresses system scale considerations and provides a system-level abstract machine model with proxy architecture information.
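To illustrate what "parameterized" means here, a proxy architecture can be thought of as an abstract machine model plus a table of speeds and capacities; the sketch below is a hypothetical parameterization and does not reproduce the fields or values in this report.

```cpp
// Hypothetical proxy-architecture parameterization: an abstract machine
// model annotated with capacities and speeds of key components.
#include <string>
#include <vector>

struct MemoryLevel {
    std::string name;        // e.g. "on-package HBM", "DDR", "NVM"
    double capacity_gib;     // capacity per node
    double bandwidth_gbs;    // sustained bandwidth
    double latency_ns;       // average access latency
};

struct ProxyArchitecture {
    int nodes;                         // system scale
    int cores_per_node;
    double core_ghz;
    double flops_per_core_per_cycle;   // e.g. vector FMA throughput
    std::vector<MemoryLevel> memory;   // deepening memory hierarchy
    double nic_bandwidth_gbs;          // injection bandwidth per node
    double nic_latency_us;
};
```

An analytic model or simulator can then sweep such parameters to ask, for example, how an application responds to halving DDR bandwidth or doubling on-package memory capacity.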
Although its demise has been frequently predicted, the Message Passing Interface (MPI) remains the dominant programming model for scientific applications running on high-performance computing (HPC) systems. MPI specifies powerful semantics for interprocess communication that have enabled scientists to write applications for simulating important physical phenomena. However, these semantics have also presented several significant challenges. For example, the existence of wildcard values has made the efficient enforcement of MPI message matching semantics challenging. Significant research has been dedicated to accelerating MPI message matching. One common approach has been to offload matching to dedicated hardware. One of the challenges that hardware designers have faced is knowing how to size hardware structures to accommodate outstanding match requests. Applications that exceed the capacity of specialized hardware typically must fall back to storing match requests in bulk memory, e.g., DRAM on the host processor. In this paper, we examine the implications of hardware matching and develop guidance on sizing hardware matching structures to strike a balance between minimizing expensive dedicated hardware resources and overall matching performance. By examining the message matching behavior of several important HPC workloads, we show that when specialized hardware matching is not dramatically faster than matching in memory, the offload hardware's match queue capacity can be reduced without significantly increasing match time. On the other hand, effectively exploiting the benefits of very fast specialized matching hardware requires sufficient storage resources to ensure that every search completes in the specialized hardware. The data and analysis in this paper provide important guidance for designers of MPI message matching hardware.
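The core data structure being sized is the match queue. The toy model below (illustrative only, not an MPI implementation) shows why: wildcards force an in-order, field-by-field search of posted receives, so search cost grows with queue depth, and a hardware unit only helps if the outstanding requests fit in its dedicated storage.

```cpp
// Toy model of MPI posted-receive matching with wildcard support.
#include <list>
#include <optional>

constexpr int ANY_SOURCE = -1;  // stand-ins for MPI_ANY_SOURCE / MPI_ANY_TAG
constexpr int ANY_TAG    = -1;

struct PostedRecv { int source; int tag; int comm; void* buffer; };

struct MatchQueue {
    std::list<PostedRecv> posted;  // must be searched in posting order (MPI semantics)

    std::optional<PostedRecv> match(int src, int tag, int comm) {
        for (auto it = posted.begin(); it != posted.end(); ++it) {
            const bool src_ok = (it->source == ANY_SOURCE) || (it->source == src);
            const bool tag_ok = (it->tag == ANY_TAG) || (it->tag == tag);
            if (src_ok && tag_ok && it->comm == comm) {
                PostedRecv hit = *it;
                posted.erase(it);   // first match wins and is consumed
                return hit;
            }
        }
        return std::nullopt;        // unmatched: message goes to the unexpected queue
    }
};
```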
We will develop Malliavin estimators for Monte Carlo radiation transport by formulating the governing jump stochastic differential equation and deriving the applicable estimators that produce sensitivities for our equations. Efficient and effective sensitivities can be used for design optimization and uncertainty quantification, with broad utility in radiation environments. The technology demonstration will lower development risk for other particle-based simulation methods.
This manuscript comprises the final report for the 1-year, FY19 LDRD project "Rigorous Data Fusion for Computationally Expensive Simulations," wherein an alternative approach to Bayesian calibration was developed based on a new sampling technique called VoroSpokes. VoroSpokes is a novel quadrature and sampling framework, developed within this project, defined with respect to Voronoi tessellations of bounded domains in $\mathbb{R}^d$. In this work, we first establish local quadrature and sampling results on convex polytopes using randomly directed rays, or spokes, to approximate the quantities of interest for a specified target function. A theoretical justification for both procedures is provided along with empirical results demonstrating the unbiased convergence of the resulting estimates/samples. The local quadrature and sampling procedures are then extended to global procedures defined on more general domains by applying the local results to the cells of a Voronoi tessellation covering the domain in question. We then demonstrate how the proposed global sampling procedure can be used to define a natural framework for adaptively constructing Voronoi Piecewise Surrogate (VPS) approximations based on local error estimates. Finally, we show that the adaptive VPS procedure can be used to form a surrogate model approximation to a specified, potentially unnormalized, density function, and that the global sampling procedure can be used to efficiently draw independent samples from the surrogate density in parallel. The performance of the resulting VoroSpokes sampling framework is assessed on a collection of Bayesian inference problems and is shown to provide highly accurate posterior predictions that align with the results obtained using traditional methods such as Gibbs sampling and random-walk Markov chain Monte Carlo (MCMC). Importantly, the proposed framework provides a foundation for performing Bayesian inference tasks that is entirely independent of the theory of Markov chains.
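The spoke construction can be related to the standard polar-coordinate identity on a convex cell (our notation; the report develops the actual VoroSpokes rules in detail):
\[
\int_{P} f(\mathbf{x})\, d\mathbf{x} \;=\; \int_{S^{d-1}} \int_{0}^{R(\boldsymbol{\omega})} f\bigl(\mathbf{x}_0 + t\,\boldsymbol{\omega}\bigr)\, t^{\,d-1}\, dt\, d\boldsymbol{\omega},
\]
where $\mathbf{x}_0$ is an interior point of the convex cell $P$ (e.g., a Voronoi seed), $\boldsymbol{\omega}$ ranges over unit directions, and $R(\boldsymbol{\omega})$ is the distance from $\mathbf{x}_0$ to the cell boundary along $\boldsymbol{\omega}$. Sampling random directions (spokes) and integrating along each yields unbiased cell-wise estimates, which the global procedure assembles over the full Voronoi tessellation.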
This report documents the completion of milestone STPM12-17 Kokkos Training Bootcamp. The goal of this milestone was to hold a combined tutorial and hackathon bootcamp event for the Kokkos community and prospective users. The Kokkos Bootcamp event was held at Argonne National Laboratory from August 27 to August 29, 2019. Because attendance was lower than expected (we believe largely due to bad timing), the team focused with a select set of ECP partners on early work in preparation for Aurora. In particular, we evaluated issues posed by exposing SYCL and OpenMP target offload to applications via the Kokkos programming model.
Programmable accelerators have become commonplace in modern computing systems. Advances in programming models and the availability of massive amounts of data have created a space for massively parallel accelerators capable of maintaining context for thousands of concurrent threads resident on-chip. These threads are grouped and interleaved on a cycle-by-cycle basis among several massively parallel computing cores. One path for the design of future supercomputers relies on an ability to model the performance of these massively parallel cores at scale. The SST framework has been proven to scale up to run simulations containing tens of thousands of nodes. A previous report described the initial integration of the open-source, execution-driven GPU simulator GPGPU-Sim into the SST framework. This report discusses the results of the integration and how to use the new GPU component in SST. It also provides examples of what it can be used to analyze, along with a correlation study showing how closely the simulated execution matches that of an NVIDIA V100 GPU when running kernels and mini-apps.
This report outlines the fiscal year (FY) 2019 status of an ongoing multi-year effort to develop a general, microstructurally aware, continuum-level model for representing the dynamic response of materials with complex microstructures. This work has focused on accurately representing the response of both conventionally wrought-processed and additively manufactured (AM) 304L stainless steel (SS) as a test case. Additive manufacturing, or 3D printing, is an emerging technology capable of enabling shortened design and certification cycles for stockpile components through rapid prototyping. However, how the complex and unique microstructures of AM materials affect their mechanical response at high strain rates is not well understood. To achieve our project goal, an upscaling technique was developed to bridge the gap between the microstructural and continuum scales and represent AM microstructures on a finite element (FE) mesh. This process involves simulating the additive process using the Sandia-developed kinetic Monte Carlo (KMC) code SPPARKS. The resulting SPPARKS microstructures are characterized using clustering algorithms from machine learning and used to populate the quadrature points of an FE mesh. Additionally, a spall kinetic model (SKM) was developed to more accurately represent the dynamic failure of AM materials. Validation experiments were performed using both pulsed-power machines and projectile launchers. These experiments have provided equation of state (EOS) and flow strength measurements of both wrought and AM 304L SS to above Mbar pressures. In some experiments, multi-point interferometry was used to quantify the variation in observed material response of the AM 304L SS. Analysis of these experiments is ongoing, but preliminary comparisons of our upscaling technique and SKM to experimental data were performed as a validation exercise. Moving forward, this project will advance and further validate our computational framework using advanced theory and additional high-fidelity experiments.
This report summarizes the work performed under a three-year LDRD project aimed at developing mathematical and software foundations for compatible meshfree and particle discretizations. We review major technical accomplishments and project metrics such as publications, conference and colloquium presentations, and the organization of special sessions and minisymposia. The report concludes with a brief summary of ongoing projects and collaborations that utilize the products of this work.