Structure-preserving architectures for SciML
Abstract not provided.
The credibility of an engineering model is of critical importance in large-scale projects. How concerned should an engineer be when reusing someone else's model, particularly when the author is unknown or the tools used to create it are unfamiliar? In this report, the authors advance engineers' capabilities for assessing models through examination of the underlying semantic structure of a model: the ontology. This ontology defines the objects in a model, the types of those objects, and the relationships between them. In this study, two advances in ontology simplification and visualization are discussed and demonstrated on two systems engineering models. These advances are critical steps toward enabling engineering models to interoperate, as well as toward assessing models for credibility. For example, results of this research show an 80% reduction in file size and representation size, dramatically improving the throughput of graph algorithms applied to the analysis of these models. Finally, four open problems in ontology research toward establishing credible models are outlined: ontology discovery, ontology matching, ontology alignment, and model assessment.
Physical Review Research
Optimally-shaped electromagnetic fields have the capacity to coherently control the dynamics of quantum systems and thus offer a promising means for controlling molecular transformations relevant to chemical, biological, and materials applications. Currently, advances in this area are hindered by the prohibitive cost of the quantum dynamics simulations needed to explore the principles and possibilities of molecular control. However, the emergence of nascent quantum-computing devices suggests that efficient simulations of quantum dynamics may be on the horizon. In this article, we study how quantum computers could be employed to design optimally-shaped fields to control molecular systems. We introduce a hybrid algorithm that utilizes a quantum computer for simulating the field-induced quantum dynamics of a molecular system in polynomial time, in combination with a classical optimization approach for updating the field. Qubit encoding methods relevant for molecular control problems are described, and procedures for simulating the quantum dynamics and obtaining the simulation results are discussed. Numerical illustrations are then presented that explicitly treat paradigmatic vibrational and rotational control problems, and also consider how optimally-shaped fields could be used to elucidate the mechanisms of energy transfer in light-harvesting complexes. Resource estimates, as well as a numerical assessment of the impact of hardware noise and the prospects of near-term hardware implementations, are provided for the latter task.
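The hybrid quantum-classical loop described above can be caricatured in a few lines. The following sketch is purely illustrative and is not the authors' algorithm: an exact two-level Rabi formula stands in for the quantum-computer simulation of field-induced dynamics, and a finite-difference gradient ascent stands in for the classical field update; all names and parameter values are invented.

```python
import numpy as np

def simulate_transfer(amplitude, duration=1.0):
    """Stand-in for the quantum simulation: population transferred in a
    resonant two-level system under a constant field (Rabi formula)."""
    return np.sin(0.5 * amplitude * duration) ** 2

def optimize_field(n_iters=200, step=0.1):
    amp = 0.5  # initial guess for the field amplitude
    for _ in range(n_iters):
        # finite-difference gradient from two 'quantum' objective evaluations
        g = (simulate_transfer(amp + 1e-4) - simulate_transfer(amp - 1e-4)) / 2e-4
        amp += step * g  # classical gradient-ascent field update
    return amp, simulate_transfer(amp)

amp, fidelity = optimize_field()
# The loop drives the amplitude toward a pi-pulse (amp -> pi, fidelity -> 1).
```

The same skeleton applies when the objective evaluation is replaced by an actual quantum-dynamics simulation on hardware, where each gradient estimate costs additional circuit executions.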
We present a numerical modeling workflow based on machine learning (ML) which reproduces the total energies produced by Kohn-Sham density functional theory (DFT) at finite electronic temperature to within chemical accuracy at negligible computational cost. Based on deep neural networks, our workflow yields the local density of states (LDOS) for a given atomic configuration. From the LDOS, spatially-resolved, energy-resolved, and integrated quantities can be calculated, including the DFT total free energy, which serves as the Born-Oppenheimer potential energy surface for the atoms. We demonstrate the efficacy of this approach for both solid and liquid metals and compare results between independent and unified machine-learning models for solid and liquid aluminum. Our machine-learning density functional theory framework opens up the path towards multiscale materials modeling for matter under ambient and extreme conditions at a computational scale and cost that is unattainable with current algorithms.
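Once a network predicts the local density of states, integrated quantities follow by quadrature over the energy grid. The sketch below illustrates only that post-processing step, not the actual workflow: the Gaussian "DOS" stands in for an ML prediction, and the band-energy formula with Fermi-Dirac occupations is a standard textbook expression, with all values invented.

```python
import numpy as np

def band_energy(energies, ldos, mu, kT):
    """Integrate eps * D(eps) * f(eps) over the energy grid."""
    f = 1.0 / (1.0 + np.exp((energies - mu) / kT))  # Fermi-Dirac occupation
    de = energies[1] - energies[0]
    return float(np.sum(energies * ldos * f) * de)

eps = np.linspace(-10.0, 10.0, 2001)
dos = np.exp(-eps**2 / 8.0)  # stand-in for an ML-predicted (local) DOS
E_b = band_energy(eps, dos, mu=0.0, kT=0.25)
# Occupied states lie mostly below mu, so the band energy is negative here.
```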
Abstract not provided.
Theoretical and Applied Fracture Mechanics
The peridynamic theory of solid mechanics is applied to the continuum modeling of the impact of small, high-velocity silica spheres on multilayer graphene targets. The model treats the laminate as a brittle elastic membrane. The material model includes separate failure criteria for the initial rupture of the membrane and for propagating cracks. Material variability is incorporated by assigning random variations in elastic properties within Voronoi cells. The computational model is shown to reproduce the primary aspects of the response observed in experiments, including the growth of a family of radial cracks from the point of impact.
Physical Review Letters
We adapt the robust phase estimation algorithm to the evaluation of energy differences between two eigenstates using a quantum computer. This approach does not require controlled unitaries between auxiliary and system registers or even a single auxiliary qubit. As a proof of concept, we calculate the energies of the ground state and low-lying electronic excitations of a hydrogen molecule in a minimal basis on a cloud quantum computer. The denominative robustness of our approach is then quantified in terms of a high tolerance to coherent errors in the state preparation and measurement. Conceptually, we note that all quantum phase estimation algorithms ultimately evaluate eigenvalue differences.
Physical Review B
The first-principles computation of the surfaces of metals is typically accomplished through slab calculations of finite thickness. The extraction of a convergent surface formation energy from slab calculations depends on defining an appropriate bulk reference energy. I describe a method for an independently computed, slab-consistent bulk reference that leads to convergent surface formation energies from slab calculations and also provides realistic uncertainties for the magnitude of the unavoidable nonlinear divergence in the surface formation energy with slab thickness. The accuracy is demonstrated on relaxed, unreconstructed low-index aluminum surfaces with slabs of up to 35 layers.
Physical Review B
The stability of low-index platinum surfaces and their electronic properties are investigated with density functional theory, toward the goal of understanding the surface structure and electron emission, and identifying precursors to electrical breakdown, on nonideal platinum surfaces. Propensity for electron emission can be related to a local work function, which, in turn, is intimately dependent on the local surface structure. The (1×N) missing row reconstruction of the Pt(110) surface is systematically examined. The (1×3) missing row reconstruction is found to be the lowest in energy, with the (1×2) and (1×4) slightly less stable. In the limit of large (1×N) with wider (111) nanoterraces, the energy accurately approaches the asymptotic limit of the infinite Pt(111) surface. This suggests a local energetic stability of narrow (111) nanoterraces on free Pt surfaces that could be a common structural feature in the complex surface morphologies, leading to work functions consistent with those on thermally grown Pt substrates.
High-performance computing (HPC) researchers have long envisioned scenarios where application workflows could be improved through the use of programmable processing elements embedded in the network fabric. Recently, vendors have introduced programmable Smart Network Interface Cards (SmartNICs) that enable computations to be offloaded to the edge of the network. There is great interest in both the HPC and high-performance data analytics (HPDA) communities in understanding the roles these devices may play in the data paths of upcoming systems. This paper focuses on characterizing both the networking and computing aspects of NVIDIA’s new BlueField-2 SmartNIC when used in a 100Gb/s Ethernet environment. For the networking evaluation we conducted multiple transfer experiments between processors located at the host, the SmartNIC, and a remote host. These tests illuminate how much effort is required to saturate the network and help estimate the processing headroom available on the SmartNIC during transfers. For the computing evaluation we used the stress-ng benchmark to compare the BlueField-2 to other servers and place realistic bounds on the types of offload operations that are appropriate for the hardware. Our findings from this work indicate that while the BlueField-2 provides a flexible means of processing data at the network’s edge, great care must be taken to not overwhelm the hardware. While the host can easily saturate the network link, the SmartNIC’s embedded processors may not have enough computing resources to sustain more than half the expected bandwidth when using kernel-space packet processing. From a computational perspective, encryption operations, memory operations under contention, and on-card IPC operations on the SmartNIC perform significantly better than the general-purpose servers used for comparisons in our experiments. Therefore, applications that mainly focus on these operations may be good candidates for offloading to the SmartNIC.
2021 International Conference on Applied Artificial Intelligence, ICAPAI 2021
Multivariate time series are used in many science and engineering domains, including health-care, astronomy, and high-performance computing. A recent trend is to use machine learning (ML) to process this complex data and these ML-based frameworks are starting to play a critical role for a variety of applications. However, barriers such as user distrust or difficulty of debugging need to be overcome to enable widespread adoption of such frameworks in production systems. To address this challenge, we propose a novel explainability technique, CoMTE, that provides counterfactual explanations for supervised machine learning frameworks on multivariate time series data. Using various machine learning frameworks and data sets, we compare CoMTE with several state-of-the-art explainability methods and show that we outperform existing methods in comprehensibility and robustness. We also show how CoMTE can be used to debug machine learning frameworks and gain a better understanding of the underlying multivariate time series data.
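The core counterfactual idea above can be illustrated in miniature. The sketch below is not the CoMTE implementation: it shows only the substitution principle, swapping one channel at a time from a "distractor" instance of another class and keeping the smallest substitution that flips the prediction. The toy classifier and data are invented.

```python
import numpy as np

def channel_counterfactual(predict, x, distractor):
    """predict maps a (channels, time) array to a class label."""
    base = predict(x)
    for c in range(x.shape[0]):
        trial = x.copy()
        trial[c] = distractor[c]  # substitute one whole channel
        if predict(trial) != base:
            return c, trial       # a minimal one-channel explanation
    return None, x

# Toy classifier: class 1 iff the mean of channel 0 exceeds zero.
predict = lambda ts: int(ts[0].mean() > 0)
x = np.vstack([np.full(50, -1.0), np.zeros(50)])  # classified as 0
d = np.vstack([np.full(50, 1.0), np.zeros(50)])   # distractor, class 1
c, cf = channel_counterfactual(predict, x, d)
# Swapping channel 0 flips the prediction, so channel 0 "explains" the label.
```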
Physical Review B
Using the local moment counter charge (LMCC) method to accurately represent the asymptotic electrostatic boundary conditions within density functional theory supercell calculations, we present a comprehensive analysis of the atomic structure and energy levels of point defects in cubic silicon carbide (3C-SiC). Finding that the classical long-range dielectric screening outside the supercell induced by a charged defect is a significant contributor to the total energy, we describe and validate a modified Jost screening model to evaluate this polarization energy. This leads to bulk-converged defect levels in finite-size supercells. With the LMCC boundary conditions and a standard Perdew-Burke-Ernzerhof (PBE) exchange-correlation functional, the computed defect level spectrum exhibits no band gap problem: the range of defect levels spans ∼2.4 eV, an effective defect band gap that agrees with the experimental band gap. Comparing with previous literature, our LMCC-PBE defect results are in consistent agreement with the hybrid-exchange functional results of Oda et al. [J. Chem. Phys. 139, 124707 (2013)] rather than their PBE results. The difference with their PBE results is attributed to their use of a conventional jellium approximation rather than the more rigorous LMCC approach for handling charged supercell boundary conditions. The difference between standard DFT and hybrid functional results for defect levels lies not in a band gap problem but rather in solving a boundary condition problem. The LMCC-PBE entirely mitigates the effect of the band gap problem on defect levels. The more computationally economical PBE enables a systematic exploration of 3C-SiC defects, where, most notably, we find that the silicon vacancy undergoes Jahn-Teller-induced distortions from the previously assumed Td symmetry, and that the divacancy, like the silicon vacancy, exhibits a site-shift bistability in p-type conditions.
SIAM Journal on Scientific Computing
We present a numerical framework for recovering unknown nonautonomous dynamical systems with time-dependent inputs. To circumvent the difficulty presented by the nonautonomous nature of the system, our method transforms the solution state into piecewise integration of the system over a discrete set of time instances. The time-dependent inputs are then locally parameterized by using a proper model, for example, polynomial regression, in the pieces determined by the time instances. This transforms the original system into a piecewise parametric system that is locally time invariant. We then design a deep neural network structure to learn the local models. Once the network model is constructed, it can be iteratively used over time to conduct global system prediction. We provide theoretical analysis of our algorithm and present a number of numerical examples to demonstrate the effectiveness of the method.
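The local parameterization step above can be made concrete. The sketch below is our illustration of the idea, not the authors' code: on each time step the input u(t) is reduced to local polynomial coefficients, and a one-step map consumes (state, coefficients) to advance the state. A Heun (RK2) integrator stands in for the trained deep network; iterating it performs the global prediction.

```python
import numpy as np

def local_poly_coeffs(u, t0, dt, degree=1, samples=5):
    """Fit u(t) on [t0, t0+dt] by a polynomial in local time s = (t - t0)/dt."""
    s = np.linspace(0.0, 1.0, samples)
    return np.polyfit(s, u(t0 + s * dt), degree)

def one_step(x, coeffs, dt):
    """Stand-in for the trained network: Heun step on x' = -x + u_local(s)."""
    u0 = np.polyval(coeffs, 0.0)
    u1 = np.polyval(coeffs, 1.0)
    k1 = -x + u0
    k2 = -(x + dt * k1) + u1
    return x + 0.5 * dt * (k1 + k2)

def rollout(x0, u, t_grid):
    """Global prediction by iterating the locally-parameterized one-step map."""
    x = [x0]
    for t0, t1 in zip(t_grid[:-1], t_grid[1:]):
        c = local_poly_coeffs(u, t0, t1 - t0)
        x.append(one_step(x[-1], c, t1 - t0))
    return np.array(x)

ts = np.linspace(0.0, 2.0, 41)
xs = rollout(1.0, np.sin, ts)  # tracks x' = -x + sin(t), x(0) = 1
```

In the paper's framework the one-step map is learned from trajectory data; only the local-in-time treatment of the input is sketched faithfully here.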
Proceedings - 2021 IEEE 35th International Parallel and Distributed Processing Symposium, IPDPS 2021
Remote Direct Memory Access (RDMA) capabilities have been provided by high-end networks for many years, but the network environments surrounding RDMA are evolving. RDMA performance has historically relied on using strict ordering guarantees to determine when data transfers complete, but modern adaptively-routed networks no longer provide those guarantees. RDMA also exposes low-level details about memory buffers: either all clients are required to coordinate access using a single shared buffer, or exclusive resources must be allocatable per-client for an unbounded amount of time. This makes RDMA unattractive for use in many-to-one communication models such as those found in public internet client-server situations. Remote Virtual Memory Access (RVMA) is a novel approach to data transfer which adapts and builds upon RDMA to provide better usability, resource management, and fault tolerance. RVMA provides a lightweight completion notification mechanism which addresses RDMA performance penalties imposed by adaptively-routed networks, enabling high-performance data transfer regardless of message ordering. RVMA also provides receiver-side resource management, abstracting away previously-exposed details from the sender-side and removing the RDMA requirement for exclusive/coordinated resources. RVMA requires only small hardware modifications from current designs, provides performance comparable or superior to traditional RDMA networks, and offers many new features. In this paper, we describe RVMA's receiver-managed resource approach and how it enables a variety of new data-transfer approaches on high-end networks. In particular, we demonstrate how an RVMA NIC could implement the first hardware-based fault tolerant RDMA-like solution. We present the design and validation of an RVMA simulation model in a popular simulation suite and use it to evaluate the advantages of RVMA at large scale. In addition to support for adaptive routing and easy programmability, RVMA can outperform RDMA on a 3D sweep application by 4.4X.
We describe the accomplishments jointly achieved by Kitware and Sandia over the fiscal years 2016 through 2020 to benefit the Advanced Scientific Computing (ASC) Advanced Technology Development and Mitigation (ATDM) project. As a result of our collaboration, we have improved the Trilinos and ATDM application developer experience by decreasing the time to build, making it easier to identify and resolve build and test defects, and addressing other issues. We have also reduced the turnaround time for continuous integration (CI) results. For example, the combined improvements likely cut the wall clock time to run automated builds of Trilinos posting to CDash by approximately 6x or more in many cases. We primarily achieved these benefits by contributing changes to the Kitware CMake/CTest/CDash suite of open source software development support tools. As a result, ASC developers can now spend more time improving code and less time chasing bugs. And, without this work, one can argue that the stabilization of Trilinos for the ATDM platforms would not have been feasible, which would have had a large negative impact on an important internal FY20 L1 milestone.
Proceedings - 2021 IEEE 35th International Parallel and Distributed Processing Symposium, IPDPS 2021
Sparsity, which occurs in both scientific applications and Deep Learning (DL) models, has been a key target of optimization within recent ASIC accelerators due to the potential memory and compute savings. These applications use data stored in a variety of compression formats. We demonstrate that both the compactness of different compression formats and the compute efficiency of the algorithms enabled by them vary across tensor dimensions and amount of sparsity. Since DL and scientific workloads span across all sparsity regions, there can be numerous format combinations for optimizing memory and compute efficiency. Unfortunately, many proposed accelerators operate on one or two fixed format combinations. This work proposes hardware extensions to accelerators for supporting numerous format combinations seamlessly and demonstrates ~4× speedup over performing format conversions in software.
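The claim that format compactness varies with sparsity can be checked with a back-of-envelope count. The sketch below uses the standard storage layouts of the COO and CSR formats (counting index/value entries and ignoring word sizes); the matrix shape and sparsity levels are our invented example, not figures from the paper.

```python
def coo_entries(nnz):
    """COO stores a (row, col, value) triple per nonzero."""
    return 3 * nnz

def csr_entries(n_rows, nnz):
    """CSR stores (col, value) per nonzero plus n_rows + 1 row pointers."""
    return 2 * nnz + n_rows + 1

n_rows = 1024
nnz_dense = (n_rows * n_rows) // 2  # 50%-dense tensor: CSR is more compact
nnz_hypersparse = 100               # a few scattered nonzeros: COO wins
# The crossover sits near nnz ~ n_rows, so no single format is best
# across the sparsity regimes spanned by DL and scientific workloads.
```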
IEEE Transactions on Nuclear Science
Integration-technology feature shrink increases computing-system susceptibility to single-event effects (SEE). While modeling SEE faults will be critical, an integrated processor's scope makes physically correct modeling computationally intractable. Without useful models, presilicon evaluation of fault-tolerance approaches becomes impossible. To incorporate accurate transistor-level effects at a system scope, we present a multiscale simulation framework. Charge collection at the 1) device level determines 2) circuit-level transient duration and state-upset likelihood. Circuit effects, in turn, impact 3) register-transfer-level architecture-state corruption visible at 4) the system level. Thus, the physically accurate effects of SEEs in large-scale systems, executed on a high-performance computing (HPC) simulator, could be used to drive cross-layer radiation hardening by design. We demonstrate the capabilities of this model with two case studies. First, we determine a D flip-flop's sensitivity at the transistor level on 14-nm FinFet technology, validating the model against published cross sections. Second, we track and estimate faults in a microprocessor without interlocked pipelined stages (MIPS) processor for Adams 90% worst case environment in an isotropic space environment.
Additive Manufacturing
Grain-scale microstructure evolution during additive manufacturing is a complex physical process. As with traditional solidification methods of material processing (e.g. casting and welding), microstructural properties are highly dependent on the solidification conditions involved. Additive manufacturing processes however, incorporate additional complexity such as remelting, and solid-state evolution caused by subsequent heat source passes and by holding the entire build at moderately high temperatures during a build. We present a three-dimensional model that simulates both solidification and solid-state evolution phenomena using stochastic Monte Carlo and Potts Monte Carlo methods. The model also incorporates a finite-difference based thermal conduction solver to create a fully integrated microstructural prediction tool. The three modeling methods and their coupling are described and demonstrated for a model study of laser powder-bed fusion of 300-series stainless steel. The investigation demonstrates a novel correlation between the mean number of remelting cycles experienced during a build, and the resulting columnar grain sizes.
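The Potts Monte Carlo component above can be sketched in two dimensions. This is a minimal grain-growth kernel in the spirit of the model, not the coupled solidification/thermal tool described in the paper: sites adopt a neighboring grain ID when doing so does not raise the boundary energy, and the microstructure coarsens. Grid size, grain count, and sweep count are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

def boundary_energy(grid, i, j, spin):
    """Number of unlike nearest neighbours if site (i, j) held `spin`."""
    n, m = grid.shape
    nbrs = [grid[(i - 1) % n, j], grid[(i + 1) % n, j],
            grid[i, (j - 1) % m], grid[i, (j + 1) % m]]
    return sum(s != spin for s in nbrs)

def potts_sweep(grid, kT=0.0):
    n, m = grid.shape
    for _ in range(n * m):
        i, j = rng.integers(n), rng.integers(m)
        # propose adopting a random neighbour's grain ID
        di, dj = ((-1, 0), (1, 0), (0, -1), (0, 1))[rng.integers(4)]
        cand = grid[(i + di) % n, (j + dj) % m]
        dE = boundary_energy(grid, i, j, cand) - boundary_energy(grid, i, j, grid[i, j])
        if dE <= 0 or (kT > 0 and rng.random() < np.exp(-dE / kT)):
            grid[i, j] = cand
    return grid

grid = rng.integers(0, 50, size=(32, 32))  # 50 random initial grain IDs
for _ in range(20):
    potts_sweep(grid)
# Coarsening: small grains are absorbed and the number of distinct IDs drops.
```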
This is the documentation for the Xyce-PyMi embedded Python model interpreter in Xyce.
Advances in Water Resources
Traditional probabilistic methods for the simulation of advection-diffusion equations (ADEs) often overlook the entropic contribution of the discretization, e.g., the number of particles, within associated numerical methods. Many times, the gain in accuracy of a highly discretized numerical model is outweighed by its associated computational costs or the noise within the data. We address the question of how many particles are needed in a simulation to best approximate and estimate parameters in one-dimensional advective-diffusive transport. To do so, we use the well-known Akaike Information Criterion (AIC) and a recently-developed correction called the Computational Information Criterion (COMIC) to guide the model selection process. Random-walk and mass-transfer particle tracking methods are employed to solve the model equations at various levels of discretization. Numerical results demonstrate that the COMIC provides an optimal number of particles that can describe a more efficient model in terms of parameter estimation and model prediction compared to the model selected by the AIC even when the data is sparse or noisy, the sampling volume is not uniform throughout the physical domain, or the error distribution of the data is non-IID Gaussian.
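The random-walk particle tracking method referenced above is simple to sketch for the 1-D advection-diffusion equation c_t + v c_x = D c_xx: each particle takes a drift step v dt plus a diffusive step drawn from N(0, 2 D dt). The particle count is exactly the discretization level the COMIC is designed to select; all parameter values below are illustrative, not from the study.

```python
import numpy as np

def random_walk_ade(n_particles, v=1.0, D=0.1, dt=0.01, n_steps=100, seed=0):
    """Evolve an instantaneous point source at x = 0 with N particles."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n_particles)
    for _ in range(n_steps):
        x += v * dt + np.sqrt(2.0 * D * dt) * rng.standard_normal(n_particles)
    return x

x = random_walk_ade(20_000)
# After t = n_steps * dt = 1.0 the cloud approximates the analytic plume:
# mean -> v*t = 1.0 and variance -> 2*D*t = 0.2, with sampling noise
# shrinking as the particle count grows.
```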
The Dakota toolkit provides a flexible and extensible interface between simulation codes and iterative analysis methods. Dakota contains algorithms for optimization with gradient and nongradient-based methods; uncertainty quantification with sampling, reliability, and stochastic expansion methods; parameter estimation with nonlinear least squares methods; and sensitivity/variance analysis with design of experiments and parameter study methods. These capabilities may be used on their own or as components within advanced strategies such as surrogate-based optimization, mixed integer nonlinear programming, or optimization under uncertainty. By employing object-oriented design to implement abstractions of the key components required for iterative systems analyses, the Dakota toolkit provides a flexible and extensible problem-solving environment for design and performance analysis of computational models on high performance computers. This report serves as a user's manual for the Dakota software and provides capability overviews and procedures for software execution, as well as a variety of example studies.
Computer Methods in Applied Mechanics and Engineering
Meshfree discretizations of state-based peridynamic models are attractive due to their ability to naturally describe fracture of general materials. However, two factors conspire to prevent meshfree discretizations of state-based peridynamics from converging to corresponding local solutions as resolution is increased: quadrature error prevents an accurate prediction of bulk mechanics, and the lack of an explicit boundary representation presents challenges when applying traction loads. In this paper, we develop a reformulation of the linear peridynamic solid (LPS) model to address these shortcomings, using improved meshfree quadrature, a reformulation of the nonlocal dilatation, and a consistent handling of the nonlocal traction condition to construct a model with rigorous accuracy guarantees. In particular, these improvements are designed to enforce discrete consistency in the presence of evolving fractures, whose a priori unknown location render consistent treatment difficult. In the absence of fracture, when a corresponding classical continuum mechanics model exists, our improvements provide asymptotically compatible convergence to corresponding local solutions, eliminating surface effects and issues with traction loading which have historically plagued peridynamic discretizations. When fracture occurs, our formulation automatically provides a sharp representation of the fracture surface by breaking bonds, avoiding the loss of mass. We provide rigorous error analysis and demonstrate convergence for a number of benchmarks, including manufactured solutions, free-surface, nonhomogeneous traction loading, and composite material problems. Finally, we validate simulations of brittle fracture against a recent experiment of dynamic crack branching in soda-lime glass, providing evidence that the scheme yields accurate predictions for practical engineering problems.
On April 6-8, 2021, Sandia National Laboratories hosted a virtual workshop to explore the potential for developing AI-Enhanced Co-Design for Next-Generation Microelectronics (AICoM). The workshop brought together two themes. The first theme was articulated in the 2018 Department of Energy Office of Science (DOE SC) “Basic Research Needs for Microelectronics” (BRN) report, which called for a “fundamental rethinking” of the traditional design approach to microelectronics, in which subject matter experts (SMEs) in each microelectronics discipline (materials, devices, circuits, algorithms, etc.) work near-independently. Instead, the BRN called for a non-hierarchical, egalitarian vision of co-design, wherein “each scientific discipline informs and engages the others” in “parallel but intimately networked efforts to create radically new capabilities.” The second theme was the recognition of the continuing breakthroughs in artificial intelligence (AI) that are currently enhancing and accelerating the solution of traditional design problems in materials science, circuit design, and electronic design automation (EDA).
Communications in Computational Physics
Gaussian processes and other kernel-based methods are used extensively to construct approximations of multivariate data sets. The accuracy of these approximations is dependent on the data used. This paper presents a computationally efficient algorithm to greedily select training samples that minimize the weighted Lp error of kernel-based approximations for a given number of data samples. The method successively generates nested samples, with the goal of minimizing the error in high probability regions of densities specified by users. The algorithm presented is extremely simple and can be implemented using existing pivoted Cholesky factorization methods. Training samples are generated in batches which allows training data to be evaluated (labeled) in parallel. For smooth kernels, the algorithm performs comparably with the greedy integrated variance design but has significantly lower complexity. Numerical experiments demonstrate the efficacy of the approach for bounded, unbounded, multi-modal and non-tensor product densities. We also show how to use the proposed algorithm to efficiently generate surrogates for inferring unknown model parameters from data using Bayesian inference.
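A minimal version of the pivoted-Cholesky selection idea can be written in a few lines. The sketch below is our illustration in the spirit of the algorithm, not the paper's implementation: at each step the candidate with the largest density-weighted residual kernel variance is chosen as the pivot, and the Schur complement update lowers the residual variance of nearby points. The Gaussian kernel, weight density, and grid are invented.

```python
import numpy as np

def gauss_kernel(X, Y, ell=0.5):
    d2 = (X[:, None] - Y[None, :]) ** 2
    return np.exp(-d2 / (2.0 * ell**2))

def greedy_select(X, weights, n_select):
    K = gauss_kernel(X, X)
    diag = K.diagonal().copy()          # residual variance at each candidate
    L = np.zeros((len(X), n_select))
    chosen = []
    for j in range(n_select):
        p = int(np.argmax(weights * diag))  # density-weighted pivot rule
        chosen.append(p)
        col = (K[:, p] - L[:, :j] @ L[p, :j]) / np.sqrt(diag[p])
        L[:, j] = col
        diag = np.maximum(diag - col**2, 0.0)  # Schur-complement update
    return chosen

X = np.linspace(-3.0, 3.0, 201)
w = np.exp(-X**2 / 2.0)        # emphasize the high-probability region
pts = greedy_select(X, w, 8)   # the first pivot lands at the density mode
```

Because a selected pivot's residual variance drops to zero, no point is chosen twice, and the nested structure follows from simply continuing the loop.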
Physical Review A
We present an extension to the robust phase estimation protocol, which can identify incorrect results that would otherwise lie outside the expected statistical range. Robust phase estimation is increasingly a method of choice for applications such as estimating the effective process parameters of noisy hardware, but its robustness is dependent on the noise satisfying certain threshold assumptions. We provide consistency checks that can indicate when those thresholds have been violated, which can be difficult or impossible to test directly. We test these consistency checks for several common noise models, and identify two possible checks with high accuracy in locating the point in a robust phase estimation run at which further estimates should not be trusted. One of these checks may be chosen based on resource availability, or they can be used together in order to provide additional verification.
Deep Learning computer vision models require many thousands of properly labelled images for training, which is especially challenging for safeguards and nonproliferation, given that safeguards-relevant images are typically rare due to the sensitivity and limited availability of the technologies. Creating relevant images through real-world staging is costly and limiting in scope. Expert-labeling is expensive, time consuming, and error prone. We aim to develop a data set of both real-world and synthetic images that are relevant to the nuclear safeguards domain that can be used to support multiple data science research questions. In the process of developing this data, we aim to develop a novel workflow to validate synthetic images using machine learning explainability methods, testing among multiple computer vision algorithms, and iterative synthetic data rendering. We will deliver one million images, both real-world and synthetically rendered, of two types of uranium storage and transportation containers with labelled ground truth and associated adversarial examples.
Atomistic simulations can capture physics of free expansion into two-phase region.
Physical Review B
We present a methodology based on the Néel model to build a classical spin-lattice Hamiltonian for cubic crystals capable of describing magnetic properties induced by the spin-orbit coupling like magnetocrystalline anisotropy and anisotropic magnetostriction, as well as exchange magnetostriction. Taking advantage of the analytical solutions of the Néel model, we derive theoretical expressions for the parametrization of the exchange integrals and Néel dipole and quadrupole terms that link them to the magnetic properties of the material. This approach allows us to build accurate spin-lattice models with the desired magnetoelastic properties. We also explore a possible way to model the volume dependence of magnetic moment based on the Landau energy. This feature allows us to consider the effects of hydrostatic pressure on the saturation magnetization. We apply this method to develop a spin-lattice model for BCC Fe and FCC Ni, and we show that it accurately reproduces the experimental elastic tensor, magnetocrystalline anisotropy under pressure, anisotropic magnetostrictive coefficients, volume magnetostriction, and saturation magnetization under pressure at zero temperature. This work could constitute a step towards large-scale modeling of magnetoelastic phenomena.
Nuclear Fusion
Erosion of the beryllium first-wall material in tokamak reactors has been shown to result in transport and deposition on the tungsten divertor. Experimental studies of beryllium implantation in tungsten indicate that mixed W–Be intermetallic deposits can form, which have lower melting temperatures than tungsten and can trap tritium at higher rates. To better understand the formation and growth rate of these intermetallics, we performed cumulative molecular dynamics (MD) simulations of both high- and low-energy beryllium deposition in tungsten. In both cases, a W–Be mixed material layer (MML) emerged at the surface within several nanoseconds, through energetic implantation or a thermally activated exchange mechanism, respectively. While some ordering of the material into intermetallics occurred, fully ordered structures did not emerge from the deposition simulations. Targeted MD simulations of the MML to further study the rate of Be diffusion and intermetallic growth indicate that, for both cases, the gradual restructuring of the material into an ordered intermetallic layer is beyond accessible MD time scales (≤1 μs). However, the rapid formation of the MML within nanoseconds indicates that beryllium deposition can influence other plasma species interactions at the surface and begin to alter the tungsten material properties. Therefore, beryllium deposition on the divertor surface, even in small amounts, is likely to cause significant changes in plasma-surface interactions and will need to be considered in future studies.
Journal of Computational Physics
In this paper we present an alternative approach to the representation of simulation particles for unstructured electrostatic and electromagnetic PIC simulations. In our modified PIC algorithm we represent particles as having a smooth shape function limited by a specified finite radius, r0. A unique feature of our approach is the representation of this shape by surrounding each simulation particle with a set of virtual particles with delta shape, with fixed offsets and weights derived from Gaussian quadrature rules and the value of r0. As the virtual particles are purely computational, they provide the additional benefit of increasing the arithmetic intensity of traditionally memory-bound particle kernels. The modified algorithm is implemented within Sandia National Laboratories' unstructured EMPIRE-PIC code, for electrostatic and electromagnetic simulations, using periodic boundary conditions. We show results for a representative set of benchmark problems, including electron orbit, a transverse electromagnetic wave propagating through a plasma, numerical heating, and a plasma slab expansion. Good error reduction is achieved across all of the chosen problems as the particles are made progressively smoother, with the optimal particle radius appearing to be problem-dependent.
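As a sketch of the quadrature idea (a hypothetical 1D analogue; EMPIRE-PIC's actual offsets and weights are derived for its specific shape function and dimensionality), Gauss-Legendre rules give fixed virtual-particle offsets and weights within the radius r0:

```python
import numpy as np

def virtual_particles(x0, r0, n=3):
    """Replace one smooth particle at x0 by n delta-shaped virtual particles.

    Offsets and weights come from an n-point Gauss-Legendre rule on
    [x0 - r0, x0 + r0]; weights are normalized so total charge is preserved.
    """
    nodes, weights = np.polynomial.legendre.leggauss(n)  # rule on [-1, 1]
    offsets = r0 * nodes                                 # scale to radius r0
    w = weights / weights.sum()                          # normalize total weight to 1
    return x0 + offsets, w
```

Because the offsets and weights are fixed per rule, the virtual-particle loop adds arithmetic without extra memory traffic, which is the source of the increased arithmetic intensity noted above.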
Physical Review B
The theoretical understanding of plasmon behavior is crucial for an accurate interpretation of inelastic scattering diagnostics in many experiments. We highlight the utility of linear response time-dependent density functional theory (LR-TDDFT) as a first-principles framework for consistently modeling plasmon properties. We provide a comprehensive analysis of plasmons in aluminum from ambient to warm dense matter conditions and assess typical properties such as the dynamical structure factor, the plasmon dispersion, and the plasmon lifetime. We compare our results with scattering measurements and with other TDDFT results, as well as with models such as the random phase approximation, the Mermin approach, and the dielectric function obtained using static local field corrections of the uniform electron gas parametrized from path-integral Monte Carlo simulations. We conclude that results for the plasmon dispersion and lifetime are inconsistent between experiment and theory, and that the common practice of extracting and studying plasmon dispersion relations is insufficient to capture the complicated physics contained in the dynamic structure factor in its full breadth.
This presentation concludes that in situ computation enables new approaches to linear algebra problems that can be both more effective and more efficient than conventional digital systems. Preconditioning is well suited to analog computation due to its tolerance for approximate solutions. When combined with prior work on in situ MVM for scientific computing, analog preconditioning can enable significant speedups for important linear algebra applications.
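As a minimal sketch of why approximate preconditioning is forgiving (plain NumPy with a Jacobi preconditioner standing in for the analog in situ MVM; the hardware, mapping, and solver details of the presentation are not reproduced here), preconditioned conjugate gradient converges to the exact solution even when the preconditioner itself is only approximate:

```python
import numpy as np

def pcg(A, b, M_inv_apply, tol=1e-8, maxit=200):
    """Preconditioned CG; M_inv_apply may be an approximate (e.g. analog) preconditioner."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = M_inv_apply(r)
    p = z.copy()
    rz = r @ z
    for _ in range(maxit):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:   # converged on true residual
            break
        z = M_inv_apply(r)            # preconditioner solve: only needs to be approximate
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x
```

The preconditioner only steers the iteration; the residual check uses the exact operator, so analog error in the preconditioner degrades iteration count, not solution accuracy.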
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Graph Neural Networks (GNNs) have recently received significant interest because of their success in learning representations from graph-structured data. However, GNNs exhibit different compute and memory characteristics compared to traditional Deep Neural Networks (DNNs). Graph convolutions require feature aggregations from neighboring nodes (known as the aggregation phase), which leads to highly irregular data accesses. GNNs also have a very regular compute phase that can be broken down into matrix multiplications (known as the combination phase). All recently proposed GNN accelerators utilize different dataflows and microarchitecture optimizations for these two phases, and different communication strategies between the two phases have also been used. However, as more custom GNN accelerators are proposed, it becomes harder to classify them qualitatively and contrast them quantitatively. In this work, we present a taxonomy to describe several diverse dataflows for running GNN inference on accelerators, providing a structured way to describe and compare the design space of GNN accelerators.
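The two phases can be sketched in a few lines (a toy dense GCN-style layer of our own; real accelerators operate on sparse adjacency structures and tiled dense matmuls):

```python
import numpy as np

def gnn_layer(A, X, W):
    """One GNN layer split into the two phases described above.

    A: (N, N) adjacency (dense here; sparse in practice)
    X: (N, F) node features, W: (F, F') learned weights
    """
    H = A @ X                # aggregation phase: irregular neighbor gathers
    Z = H @ W                # combination phase: regular dense matrix multiply
    return np.maximum(Z, 0)  # elementwise nonlinearity (ReLU)
```

The aggregation matmul is sparse and memory-bound while the combination matmul is dense and compute-bound, which is exactly why accelerators specialize dataflows per phase.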
Medical and Biological Engineering and Computing
Imbalance in the autonomic nervous system can lead to orthostatic intolerance manifested by dizziness, lightheadedness, and a sudden loss of consciousness (syncope); these are common conditions, but they are challenging to diagnose correctly. Uncertainties about the triggering mechanisms and the underlying pathophysiology have led to variations in their classification. This study uses machine learning to categorize patients with orthostatic intolerance. We use random forest classification trees to identify a small number of markers in blood pressure and heart rate time-series data measured during head-up tilt to (a) distinguish patients with a single pathology and (b) examine data from patients with a mixed pathophysiology. Next, we use K-means to cluster the markers representing the time-series data. We apply the proposed method to clinical data from 186 subjects identified as controls or as suffering from one of four conditions: postural orthostatic tachycardia (POTS), cardioinhibition, vasodepression, and mixed cardioinhibition and vasodepression. Classification results confirm the effectiveness of supervised machine learning: we were able to categorize more than 95% of patients with a single condition and to subgroup all patients with mixed cardioinhibitory and vasodepressor syncope. Clustering results confirm the disease groups and identify two distinct subgroups within the control and mixed groups. This study demonstrates how to use machine learning to discover structure in blood pressure and heart rate time-series data, applying the methodology to the classification of patients with orthostatic intolerance. Diagnosing orthostatic intolerance is challenging, and full characterization of the pathophysiological mechanisms remains a topic of ongoing research; this study provides a step toward leveraging machine learning to assist clinicians and researchers in addressing these challenges.
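As a sketch of the clustering step (a plain-NumPy Lloyd's K-means on hypothetical marker vectors; the study's actual markers are extracted from tilt-test blood pressure and heart rate series, and its feature engineering is not reproduced here):

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Lloyd's K-means: X is (N, D) marker vectors, returns labels and centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]  # init from k distinct points
    for _ in range(iters):
        # assign each point to its nearest center
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        # move each center to the mean of its assigned points
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers
```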
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Experimental Mechanics
This work explores the effect of the ill-posed problem on uncertainty quantification for motion estimation using digital image correlation (DIC) (Sutton et al. [2009]). We develop a correction factor for standard uncertainty estimates based on the cosine of the angle between the true motion and the image gradients, in an integral sense over a subregion of the image. This correction factor accounts for variability in the DIC solution that is not captured by considering only image noise, interpolation bias, contrast, and software settings such as subset size and spacing.
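A hypothetical numerical illustration of the idea (our own simplified form, not the paper's correction factor, which is defined in an integral sense over the subset): when image gradients are nearly orthogonal to the motion direction, the gradient-weighted alignment drops toward zero and the motion component is poorly constrained.

```python
import numpy as np

def alignment_factor(grad, u):
    """Gradient-weighted mean |cos(angle)| between motion direction u and image gradients.

    grad: (N, 2) image gradients sampled over a subset; u: (2,) motion direction.
    Returns a value in [0, 1]; small values signal an ill-conditioned direction.
    """
    u = np.asarray(u, float)
    u = u / np.linalg.norm(u)
    mag = np.linalg.norm(grad, axis=1)
    cos = np.abs(grad @ u) / np.where(mag > 0, mag, 1.0)
    # weight by gradient magnitude, mimicking an integral over the subset
    return float((cos * mag).sum() / mag.sum())
```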
Abstract not provided.
Abstract not provided.
Abstract not provided.
Journal of Verification, Validation and Uncertainty Quantification
The modern scientific process often involves the development of a predictive computational model. To improve its accuracy, a computational model can be calibrated to a set of experimental data, and a variety of validation metrics can be used to quantify this process. Some of these metrics have direct physical interpretations and a history of use, while others, especially those for probabilistic data, are more difficult to interpret. In this work, validation metrics are used to quantify the accuracy of different calibration methods. Frequentist and Bayesian perspectives are used with both fixed-effects and mixed-effects statistical models. Through a quantitative comparison of the resulting distributions, the most accurate calibration method can be selected. Two examples are included that compare the results of various validation metrics for different calibration methods. It is quantitatively shown that, in the presence of significant laboratory biases, a fixed-effects calibration is significantly less accurate than a mixed-effects calibration. This is because the mixed-effects statistical model better characterizes the underlying parameter distributions than the fixed-effects model. The results suggest that validation metrics can be used to select the most accurate calibration model for a particular empirical model with corresponding experimental data.
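As an illustration (our notation, not necessarily the authors'), a mixed-effects calibration augments the fixed-effects model with a per-laboratory random effect:

```latex
y_{ij} = f(x_{ij};\,\theta,\,b_i) + \varepsilon_{ij}, \qquad
b_i \sim \mathcal{N}(0,\Sigma_b), \qquad
\varepsilon_{ij} \sim \mathcal{N}(0,\sigma^2),
```

where y_ij is the j-th observation from laboratory i and b_i absorbs that laboratory's bias. The fixed-effects model is the special case b_i ≡ 0, which forces laboratory bias to be misattributed to the calibration parameters θ, consistent with the accuracy gap reported above.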
Abstract not provided.
Abstract not provided.
Abstract not provided.
Learning 3D representations that generalize well to arbitrarily oriented inputs is a challenge of practical importance in applications ranging from computer vision to physics and chemistry. We propose a novel multi-resolution convolutional architecture for learning over concentric spherical feature maps, of which the single-sphere representation is a special case. Our hierarchical architecture is based on alternately learning to incorporate both intra-sphere and inter-sphere information. We show the applicability of our method for two different types of 3D inputs: mesh objects, which can be regularly sampled, and point clouds, which are irregularly distributed. We also propose an efficient mapping of point clouds to concentric spherical images, thereby bridging spherical convolutions on grids with general point clouds. We demonstrate the effectiveness of our approach in improving state-of-the-art performance on 3D classification tasks with rotated data.
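As a sketch of one way such a mapping could work (a hypothetical occupancy binning of our own; the paper's efficient mapping is not reproduced here), points can be bucketed by radius into concentric shells and by angle into a spherical grid on each shell:

```python
import numpy as np

def to_concentric_spheres(points, n_r=2, n_theta=8, n_phi=16):
    """Bin an (N, 3) point cloud into n_r concentric spherical occupancy grids."""
    r = np.linalg.norm(points, axis=1)
    theta = np.arccos(np.clip(points[:, 2] / np.maximum(r, 1e-9), -1.0, 1.0))
    phi = np.arctan2(points[:, 1], points[:, 0]) + np.pi  # shift to (0, 2*pi]
    # index of the shell, polar bin, and azimuthal bin for each point
    ri = np.minimum((r / r.max() * n_r).astype(int), n_r - 1)
    ti = np.minimum((theta / np.pi * n_theta).astype(int), n_theta - 1)
    pi_ = np.minimum((phi / (2 * np.pi) * n_phi).astype(int), n_phi - 1)
    grid = np.zeros((n_r, n_theta, n_phi))
    np.add.at(grid, (ri, ti, pi_), 1.0)  # accumulate point counts per cell
    return grid
```

Each shell of the resulting grid is a regular spherical image, so spherical convolutions designed for grids can then be applied to irregular point clouds.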
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
This is the second in a sequence of three Hardware Evaluation milestones that provide insight into the following questions: What are the sources of excess data movement across all levels of the memory hierarchy, going out to the network fabric? What can be done at various levels of the hardware/software hierarchy to reduce excess data movement? How does reduced data movement track application performance? The results of this study can be used to suggest where the DOE supercomputing facilities, working with their hardware vendors, can optimize aspects of the system to reduce excess data movement. Quantitative analysis will also help systems software and applications optimize caching and data layout strategies. Another potential avenue is to answer cost-benefit questions, such as those involving memory capacity versus latency and bandwidth. This milestone focuses on techniques to reduce data movement, quantitatively evaluates the efficacy of those techniques, and measures how performance tracks data movement reduction. We study a small collection of benchmarks and proxy mini-apps that run on pre-exascale GPUs and on the Accelsim GPU simulator. Our approach has two thrusts: to measure advanced data movement reduction directives and techniques on the newest available GPUs, and to evaluate our benchmark set on simulated GPUs configured with architectural refinements to reduce data movement.
Computer Methods in Applied Mechanics and Engineering
We present a fully discrete approximation technique for the compressible Navier–Stokes equations that is second-order accurate in time and space, semi-implicit, and guaranteed to be invariant domain preserving. The restriction on the time step is the standard hyperbolic CFL condition, i.e., τ ≲ O(h)/V, where V is a reference velocity scale and h is the typical mesh size.
Abstract not provided.
Abstract not provided.
Abstract not provided.