Xyce Python Model Interpreter (Xyce-PyMi) for enabling ML advancements in production circuit simulation software
Abstract not provided.
Abstract not provided.
Nonlocal models naturally handle a range of physics of interest to SNL, but discretization of their underlying integral operators poses mathematical challenges to realizing the accuracy and robustness that are commonplace in discretizations of their local counterparts. This project focuses on the concept of asymptotic compatibility, namely preservation of the limit of the discrete nonlocal model to a corresponding well-understood local solution. We address challenges that have traditionally troubled nonlocal mechanics models, primarily related to consistency guarantees and boundary conditions. For simple problems such as diffusion and linear elasticity, we have developed a complete error analysis theory providing consistency guarantees. We then use these foundational tools to develop new state-of-the-art capabilities for lithiation-induced failure in batteries, ductile failure in problems driven by contact, blast-on-structure-induced failure, and brittle/ductile failure of thin structures. We also summarize ongoing efforts using these frameworks in data-driven modeling contexts. This report provides a high-level summary of all publications that followed from these efforts.
Abstract not provided.
The final review for the FY21 Advanced Simulation and Computing (ASC) Computational Systems and Software Environments (CSSE) L2 Milestone #7840 was conducted on August 25th, 2021 at Sandia National Laboratories in Albuquerque, New Mexico. The review committee/panel unanimously agreed that the milestone has been successfully completed, exceeding expectations on several of the key deliverables.
Over the last year, the ECP xSDK-multiprecision effort has made tremendous progress in developing and deploying new mixed precision technology and customizing the algorithms for the hardware deployed in the ECP flagship supercomputers. The effort has also succeeded in creating a cross-laboratory community of scientists interested in mixed precision technology who are now working together to deploy this technology for ECP applications. In this report, we highlight some of the most promising and impactful achievements of the last year. Among the highlights we present are: mixed precision iterative refinement (IR) using a dense LU factorization, achieving a 1.8× speedup on Spock; results and strategies for mixed precision IR using a sparse LU factorization; a mixed precision eigenvalue solver; mixed precision GMRES-IR deployed in Trilinos, achieving a 1.4× speedup over standard GMRES; compressed basis (CB) GMRES deployed in Ginkgo, achieving an average 1.4× speedup over standard GMRES; preparation of hypre for mixed precision execution; mixed precision sparse approximate inverse preconditioners achieving an average speedup of 1.2×; and a detailed description of the memory accessor, which separates the arithmetic precision from the memory precision and enables memory-bound low precision BLAS 1/2 operations to increase accuracy by using high precision in the computations without degrading performance. We emphasize that many of the highlights presented here have also been submitted to peer-reviewed journals or established conferences, and are under peer review or have already been published.
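The mixed precision IR results above all follow the same basic pattern: factorize once in a cheap, low precision, then drive the solution to high accuracy with residuals computed in high precision. A minimal sketch of that pattern in NumPy/SciPy is shown below, assuming a well-conditioned dense system; it is illustrative only and does not reproduce the ECP solvers' kernels.

```python
# Mixed precision iterative refinement: low-precision LU, high-precision residuals.
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def mixed_precision_ir(A, b, max_iters=10, tol=1e-12):
    # Low-precision factorization (float32 stands in for the "cheap" precision).
    lu32, piv32 = lu_factor(A.astype(np.float32))
    x = lu_solve((lu32, piv32), b.astype(np.float32)).astype(np.float64)
    for _ in range(max_iters):
        r = b - A @ x                                        # residual in float64
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        d = lu_solve((lu32, piv32), r.astype(np.float32))    # cheap correction solve
        x += d.astype(np.float64)                            # accumulate in float64
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((500, 500)) + 500 * np.eye(500)      # well-conditioned test matrix
b = rng.standard_normal(500)
x = mixed_precision_ir(A, b)
print(np.linalg.norm(b - A @ x) / np.linalg.norm(b))
```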
Computational Mechanics
We present an approach for constructing a surrogate from ensembles of information sources of varying cost and accuracy. The multifidelity surrogate encodes connections between information sources as a directed acyclic graph and is trained via gradient-based minimization of a nonlinear least squares objective. While the vast majority of state-of-the-art approaches assume hierarchical connections between information sources, our approach works with flexibly structured information sources that may not admit a strict hierarchy. The formulation has two advantages: (1) increased data efficiency due to parsimonious multifidelity networks that can be tailored to the application; and (2) no constraints on the training data, since we can combine noisy, non-nested evaluations of the information sources. Finally, numerical examples ranging from synthetic problems to physics-based computational mechanics simulations indicate that the error in our approach can be orders of magnitude smaller, particularly in the low-data regime, than that of single-fidelity and hierarchical multifidelity approaches.
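As a concrete illustration of the graph idea, the hedged sketch below builds a two-node surrogate: a low-fidelity node feeds a high-fidelity node through a single directed edge, and all parameters are trained jointly by nonlinear least squares on non-nested data. The linear-in-parameters node models are illustrative assumptions, not the paper's formulation.

```python
# Two-node multifidelity network trained by nonlinear least squares.
import numpy as np
from scipy.optimize import least_squares

def features(x):                       # simple polynomial features (assumption)
    return np.stack([np.ones_like(x), x, x**2], axis=1)

def lofi_model(theta_l, x):
    return features(x) @ theta_l

def hifi_model(theta, x):
    theta_l, rho, theta_d = theta[:3], theta[3], theta[4:]
    # high-fidelity node = scaled low-fidelity prediction + discrepancy term
    return rho * lofi_model(theta_l, x) + features(x) @ theta_d

def residuals(theta, x_l, y_l, x_h, y_h):
    # non-nested, differently sized training sets for the two sources
    r_l = lofi_model(theta[:3], x_l) - y_l
    r_h = hifi_model(theta, x_h) - y_h
    return np.concatenate([r_l, r_h])

rng = np.random.default_rng(1)
x_l = rng.uniform(0, 1, 40); y_l = x_l**2                 # cheap information source
x_h = rng.uniform(0, 1, 5);  y_h = 1.2 * x_h**2 + 0.1 * x_h   # expensive source
fit = least_squares(residuals, x0=np.zeros(7), args=(x_l, y_l, x_h, y_h))
print("high-fidelity prediction at 0.25:", hifi_model(fit.x, np.array([0.25]))[0])
```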
The Spent Fuel and Waste Science and Technology (SFWST) Campaign of the U.S. Department of Energy (DOE) Office of Nuclear Energy (NE), Office of Fuel Cycle Technology (FCT) is conducting research and development (R&D) on geologic disposal of spent nuclear fuel (SNF) and high-level nuclear waste (HLW). Two high priorities for SFWST disposal R&D are design concept development and disposal system modeling. These priorities are directly addressed in the SFWST Geologic Disposal Safety Assessment (GDSA) control account, which is charged with developing a geologic repository system modeling and analysis capability, and the associated software, GDSA Framework, for evaluating disposal system performance for nuclear waste in geologic media. GDSA Framework is supported by the SFWST Campaign and its predecessor, the Used Fuel Disposition (UFD) Campaign. This report fulfills the GDSA Uncertainty and Sensitivity Analysis Methods work package (SF-21SN01030404) level 3 milestone, Uncertainty and Sensitivity Analysis Methods and Applications in GDSA Framework (FY2021) (M3SF-21SN010304042). It presents high-level objectives and strategy for development of uncertainty and sensitivity analysis tools, demonstrates uncertainty quantification (UQ) and sensitivity analysis (SA) tools in GDSA Framework in FY21, and describes additional UQ/SA tools whose future implementation would enhance the UQ/SA capability of GDSA Framework. This work was closely coordinated with the other Sandia National Laboratories GDSA work packages: the GDSA Framework Development work package (SF-21SN01030405), the GDSA Repository Systems Analysis work package (SF-21SN01030406), and the GDSA PFLOTRAN Development work package (SF-21SN01030407). This report builds on developments reported in previous GDSA Framework milestones, particularly M3SF 20SN010304032.
The representation of material heterogeneity (also referred to as "spatial variation") plays a key role in the material failure simulation method used in ALEGRA. ALEGRA is an arbitrary Lagrangian-Eulerian shock and multiphysics code developed at Sandia National Laboratories that contains several methods for incorporating spatial variation into simulations. A desirable property of a spatial variation method is that it produces consistent stochastic behavior regardless of the mesh used (a property referred to as "mesh independence"). However, mesh dependence has been reported when using the Weibull distribution with ALEGRA's spatial variation method. This report provides additional insight into this mesh dependence through both theory and numerical experiments. In particular, we have implemented a discrete minimum order statistic model whose properties are theoretically mesh independent.
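The statistical property behind a minimum order statistic model can be illustrated with a short Monte Carlo check: if an element's strength is the minimum of the strengths of the sub-elements it contains, and sub-element strengths follow a Weibull law with a volume-scaled scale parameter, the resulting distribution does not depend on how finely the element is subdivided. The sketch below demonstrates this weakest-link scaling; it is illustrative and is not the ALEGRA implementation.

```python
# Mesh-independence check for Weibull weakest-link (minimum order statistic) scaling.
import numpy as np

rng = np.random.default_rng(2)
m, s0, V0 = 6.0, 1.0, 1.0          # Weibull modulus, reference scale, reference volume

def strength(volume, size, rng):
    # Weakest-link scaling: scale parameter ~ (V0 / V)^(1/m)
    return s0 * (V0 / volume) ** (1.0 / m) * rng.weibull(m, size)

n_samples, V_elem = 200_000, 0.1
coarse = strength(V_elem, n_samples, rng)                    # draw the element directly
for n_sub in (2, 8, 32):                                     # or as the min over sub-elements
    fine = strength(V_elem / n_sub, (n_samples, n_sub), rng).min(axis=1)
    print(n_sub, round(coarse.mean(), 4), round(fine.mean(), 4))
# The sample means (and whole distributions) agree regardless of subdivision.
```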
ACM International Conference Proceeding Series
Cyber testbeds provide an important mechanism for experimentally evaluating cyber security performance. However, as an experimental discipline, reproducible cyber experimentation is essential to assure valid, unbiased results. Even minor differences in setup, configuration, and testbed components can have an impact on the experiments, and thus, reproducibility of results. This paper documents a case study in reproducing an earlier emulation study, with the reproduced emulation experiment conducted by a different research group on a different testbed. We describe lessons learned as a result of this process, both in terms of the reproducibility of the original study and in terms of the different testbed technologies used by both groups. This paper also addresses the question of how to compare results between two groups' experiments, identifying candidate metrics for comparison and quantifying the results in this reproduction study.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
IEEE Transactions on Parallel and Distributed Systems
MTTKRP is the bottleneck operation in algorithms used to compute the CP tensor decomposition. For sparse tensors, utilizing the compressed sparse fibers (CSF) storage format and CSF-oriented MTTKRP algorithms is important for both memory and computational efficiency on distributed-memory architectures. Existing intelligent tensor partitioning models assume the computational cost of MTTKRP to be proportional to the total number of nonzeros in the tensor. However, this is not the case for CSF-oriented MTTKRP on distributed-memory architectures. We outline two deficiencies of nonzero-based intelligent partitioning models when CSF-oriented MTTKRP operations are performed locally: failure to encode processors' computational loads, and an increase in total computation due to fiber fragmentation. We focus on the existing fine-grain hypergraph model and propose a novel vertex weighting scheme that enables this model to encode the correct computational loads of processors. We also propose augmenting the fine-grain model with fiber nets to reduce the increase in total computational load by minimizing fiber fragmentation. In this way, the proposed model encodes minimizing the load of the bottleneck processor. Parallel experiments with real-world sparse tensors on up to 1024 processors confirm the outlined deficiencies and demonstrate the merit of the proposed improvements in terms of parallel runtimes.
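The fiber-fragmentation effect can be seen with a toy example: the local work depends not only on a processor's nonzero count but also on how many distinct fibers it owns, and a fiber split across processors is traversed on each of them. The sketch below uses a simple "nonzeros + fibers" load proxy as an illustrative stand-in for the paper's cost model, not a reproduction of it.

```python
# Toy comparison of nonzero-only vs. fiber-aware load estimates for a 3-way tensor.
from collections import defaultdict

nonzeros = [(0, 0, 0), (0, 0, 1), (0, 0, 2), (0, 1, 0), (1, 0, 0), (1, 1, 1)]

def loads(assignment):
    """assignment: nonzero index -> processor id."""
    nnz_load = defaultdict(int)
    fibers = defaultdict(set)
    for t, (i, j, k) in enumerate(nonzeros):
        p = assignment[t]
        nnz_load[p] += 1
        fibers[p].add((i, j))          # a mode-3 fiber is identified by (i, j)
    return {p: (nnz_load[p], nnz_load[p] + len(fibers[p])) for p in nnz_load}

# Two bipartitions with identical nonzero counts but different fiber counts.
keeps_fiber_whole = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}   # the (0, 0) fiber stays on one processor
fragments_fiber   = {0: 0, 1: 1, 2: 0, 3: 1, 4: 0, 5: 1}   # the (0, 0) fiber is split across processors
print(loads(keeps_fiber_whole))   # per processor: (nnz, nnz + owned fibers)
print(loads(fragments_fiber))     # same nnz balance, but more total fiber work
```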
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Numerical Linear Algebra with Applications
The generalized singular value decomposition (GSVD) is a valuable tool that has many applications in computational science. However, computing the GSVD for large-scale problems is challenging. Motivated by applications in hyper-differential sensitivity analysis (HDSA), we propose new randomized algorithms for computing the GSVD which use randomized subspace iteration and weighted QR factorization. Detailed error analysis is given which provides insight into the accuracy of the algorithms and the choice of the algorithmic parameters. We demonstrate the performance of our algorithms on test matrices and a large-scale model problem where HDSA is used to study subsurface flow.
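Randomized subspace iteration is the building block of these algorithms; the hedged sketch below applies it to an ordinary low-rank SVD so the mechanics are visible. The paper's GSVD algorithms additionally use weighted QR factorizations, which are omitted here.

```python
# Randomized subspace iteration for a low-rank SVD (building block only).
import numpy as np

def randomized_svd(A, rank, n_power=2, oversample=10, rng=None):
    rng = np.random.default_rng(rng)
    m, n = A.shape
    Omega = rng.standard_normal((n, rank + oversample))   # random test matrix
    Y = A @ Omega
    for _ in range(n_power):                              # subspace (power) iteration
        Q, _ = np.linalg.qr(Y)
        Q, _ = np.linalg.qr(A.T @ Q)
        Y = A @ Q
    Q, _ = np.linalg.qr(Y)                                # approximate range of A
    B = Q.T @ A                                           # small projected problem
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :rank], s[:rank], Vt[:rank]

rng = np.random.default_rng(3)
A = rng.standard_normal((2000, 20)) @ rng.standard_normal((20, 300))  # exactly rank-20 matrix
U, s, Vt = randomized_svd(A, rank=20)
print(np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A))  # near machine precision
```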
IEEE Spectrum
In each of our brains, 86 billion neurons work in parallel, processing inputs from senses and memories to produce the many feats of human cognition. The brains of other creatures are less broadly capable, but those animals often exhibit innate aptitudes for particular tasks, abilities honed by millions of years of evolution.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Journal of Physics A: Mathematical and Theoretical
This paper presents (Lagrangian) variational formulations for single and multicomponent semi-compressible fluids with both reversible (entropy-conserving) and irreversible (entropy-generating) processes. Semi-compressible fluids are useful in describing low-Mach dynamics, since they are soundproof. These models find wide use in many areas of fluid dynamics, including both geophysical and astrophysical fluid dynamics. Specifically, the Boussinesq, anelastic and pseudoincompressible equations are developed through a unified treatment valid for arbitrary Riemannian manifolds, thermodynamic potentials and geopotentials. By design, these formulations obey the 1st and 2nd laws of thermodynamics, ensuring their thermodynamic consistency. This general approach extends and unifies existing work, and helps clarify the thermodynamics of semi-compressible fluids. To further this goal, evolution equations are presented for a wide range of thermodynamic variables: entropy density s, specific entropy η, buoyancy b, temperature T, potential temperature θ and a generic entropic variable χ; along with a general definition of buoyancy valid for all three semi-compressible models and arbitrary geopotentials. Finally, the elliptic equation for the pressure perturbation (the Lagrange multiplier that enforces semi-compressibility) is developed for all three equation sets in the case of reversible dynamics, and for the Boussinesq/anelastic equations in the case of irreversible dynamics; and some discussion is given of the difficulty in formulating it for the pseudoincompressible equations with irreversible dynamics.
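For orientation, the familiar Eulerian form of the reversible Boussinesq equations (the simplest of the three semi-compressible models) is shown below; the paper derives these and their anelastic/pseudoincompressible relatives from a variational principle on general manifolds, which is not reproduced here.

```latex
% Reversible Boussinesq equations with buoyancy b and pressure perturbation p'
% (the Lagrange multiplier enforcing the divergence-free, "semi-compressibility"
% constraint; taking the divergence of the momentum equation yields its elliptic equation).
\begin{aligned}
\partial_t \mathbf{u} + (\mathbf{u}\cdot\nabla)\mathbf{u} &= -\nabla p' + b\,\hat{\mathbf{z}}, \\
\partial_t b + \mathbf{u}\cdot\nabla b &= 0, \\
\nabla\cdot\mathbf{u} &= 0 .
\end{aligned}
```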
Abstract not provided.
Abstract not provided.
Proceedings - 2021 IEEE Space Computing Conference, SCC 2021
Concerns about cyber threats to space systems are increasing. Researchers are developing intrusion detection and protection systems to mitigate these threats, but the sparsity of cyber threat data poses a significant challenge to these efforts. Development of credible threat data sets is needed to overcome this challenge. This paper describes the extension and development of three data generation algorithms (generative adversarial networks, variational auto-encoders, and a generative algorithm for multi-variate time series) to generate cyber threat data for space systems. The algorithms are applied to a use case that leverages the NASA Operational Simulation for Small Satellites (NOS3) platform. Qualitative and quantitative measures are applied to evaluate the generated data. Strengths and weaknesses of each algorithm are presented, and suggested improvements are provided. For this use case, the generative algorithm for multi-variate time series performed best according to both qualitative and quantitative measures.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Automated vehicles (AV) hold great promise for improving safety, as well as reducing congestion and emissions. In order to make automated vehicles commercially viable, a reliable and high-performance vehicle-based computing platform that meets ever-increasing computational demands will be key. Given the state of existing digital computing technology, designers will face significant challenges in meeting the needs of highly automated vehicles without exceeding thermal constraints or consuming a large portion of the energy available on vehicles, thus reducing range between charges or refills. The accompanying increases in energy for AV use will place increased demand on energy production and distribution infrastructure, which also motivates increasing computational energy efficiency.
Nonlocal models, including peridynamics, often use integral operators that embed lengthscales in their definition. However, the integrands in these operators are difficult to define from the data that are typically available for a given physical system, such as laboratory mechanical property tests. In contrast, molecular dynamics (MD) does not require these integrands, but it suffers from computational limitations in the length and time scales it can address. To combine the strengths of both methods and to obtain a coarse-grained, homogenized continuum model that efficiently and accurately captures materials’ behavior, we propose a learning framework to extract, from MD data, an optimal Linear Peridynamic Solid (LPS) model as a surrogate for MD displacements. To maximize the accuracy of the learnt model we allow the peridynamic influence function to be partially negative, while preserving the well-posedness of the resulting model. To achieve this, we provide sufficient well-posedness conditions for discretized LPS models with sign-changing influence functions and develop a constrained optimization algorithm that minimizes the equation residual while enforcing such solvability conditions. This framework guarantees that the resulting model is mathematically well-posed, physically consistent, and that it generalizes well to settings that are different from the ones used during training. We illustrate the efficacy of the proposed approach with several numerical tests for single layer graphene. Our two-dimensional tests show the robustness of the proposed algorithm on validation data sets that include thermal noise, different domain shapes and external loadings, and discretizations substantially different from the ones used for training.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
ACS Photonics
Classification of features in a scene typically requires conversion of the incoming photonic field into the electronic domain. Recently, an alternative approach has emerged whereby passive structured materials can perform classification tasks by directly using free-space propagation and diffraction of light. In this manuscript, we present a theoretical and computational study of such systems and establish the basic features that govern their performance. We show that system architecture, material structure, and input light field are intertwined and need to be co-designed to maximize classification accuracy. Our simulations show that a single layer metasurface can achieve classification accuracy better than conventional linear classifiers, with an order of magnitude fewer diffractive features than previously reported. For a wavelength λ, single layer metasurfaces of size 100λ × 100λ with an aperture density of λ⁻² achieve ∼96% testing accuracy on the MNIST data set, for an optimized distance ∼100λ to the output plane. This is enabled by an intrinsic nonlinearity in photodetection, despite the use of linear optical metamaterials. Furthermore, we find that once the system is optimized, the number of diffractive features is the main determinant of classification performance. The slow asymptotic scaling with the number of apertures suggests a reason why such systems may benefit from multiple layer designs. Finally, we show a trade-off between the number of apertures and fabrication noise.
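The ingredients of such a system can be sketched numerically: a fixed phase mask (standing in for a single-layer metasurface), free-space propagation by the angular-spectrum method, intensity detection at the output plane (the photodetection nonlinearity mentioned above), and a linear readout trained on the detected intensities. The sketch below uses synthetic random "digit" templates rather than MNIST so it is self-contained; all sizes and distances are illustrative assumptions, not the paper's optimized designs.

```python
# Phase mask + angular-spectrum propagation + intensity detection + linear readout.
import numpy as np

rng = np.random.default_rng(4)
N, wl, dx, z = 64, 1.0, 1.0, 100.0        # grid size, wavelength (length unit), pixel, distance

def propagate(field, wl, dx, z):
    # Angular-spectrum free-space propagation over distance z.
    fx = np.fft.fftfreq(field.shape[0], d=dx)
    FX, FY = np.meshgrid(fx, fx, indexing="ij")
    arg = 1.0 / wl**2 - FX**2 - FY**2
    kz = 2 * np.pi * np.sqrt(np.maximum(arg, 0.0))   # evanescent components dropped
    return np.fft.ifft2(np.fft.fft2(field) * np.exp(1j * kz * z))

mask = np.exp(1j * 2 * np.pi * rng.random((N, N)))   # fixed random phase mask

def detect(image):
    # Field after the mask, propagated, then detected as intensity.
    return np.abs(propagate(image * mask, wl, dx, z)) ** 2

# Synthetic 10-class data: each class is a fixed random template plus noise.
templates = rng.random((10, N, N))
X, y = [], []
for label in range(10):
    for _ in range(40):
        X.append(detect(templates[label] + 0.3 * rng.random((N, N))).ravel())
        y.append(label)
X, Y = np.asarray(X), np.eye(10)[y]                  # features, one-hot targets
W, *_ = np.linalg.lstsq(X, Y, rcond=None)            # linear readout on intensities
print("training accuracy:", ((X @ W).argmax(axis=1) == np.asarray(y)).mean())
```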
IEEE Transactions on Parallel and Distributed Systems
As the push towards exascale hardware has increased the diversity of system architectures, performance portability has become a critical aspect for scientific software. We describe the Kokkos Performance Portable Programming Model that allows developers to write single source applications for diverse high performance computing architectures. Kokkos provides key abstractions for both the compute and memory hierarchy of modern hardware. Here, we describe the novel abstractions that have been added to Kokkos recently such as hierarchical parallelism, containers, task graphs, and arbitrary-sized atomic operations. We demonstrate the performance of these new features with reproducible benchmarks on CPUs and GPUs.
We present an adaptive algorithm for constructing surrogate models for integrated systems composed of a set of coupled components. With this goal we introduce ‘coupling’ variables with a priori unknown distributions that allow approximations of each component to be built independently. Once built, the surrogates of the components are combined and used to predict system-level quantities of interest (QoI) at a fraction of the cost of interrogating the full system model. We use a greedy experimental design procedure, based upon a modification of Multi-Index Stochastic Collocation (MISC), to minimize the error of the combined surrogate. This is achieved by refining each component surrogate in accordance with its relative contribution to error in the approximation of the system-level QoI. Our adaptation of MISC is a multi-fidelity procedure that can leverage ensembles of models of varying cost and accuracy, for one or more components, to produce estimates of system-level QoI. Several numerical examples demonstrate the efficacy of the proposed approach on systems involving feed-forward and feedback coupling. For a fixed computational budget, the proposed algorithm is able to produce approximations that are orders of magnitude more accurate than approximations that treat the integrated system as a black-box.
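For a feed-forward system the basic idea can be sketched directly: a coupling variable links two components, each component is approximated independently over its own inputs (including the coupling variable, whose range must be assumed a priori), and the surrogates are then composed to predict the system-level QoI. The adaptive multi-index/multi-fidelity refinement of the paper is omitted below; plain polynomial least-squares fits stand in for the component surrogates.

```python
# Independent component surrogates composed through a coupling variable.
import numpy as np

def comp_A(x):            # upstream component: produces the coupling variable
    return np.exp(-x)

def comp_B(x, xi):        # downstream component: consumes the coupling variable
    return np.sin(x) + xi**2

def poly_fit(X, y, deg=3):
    # Least-squares fit of pure-power polynomial terms in the columns of X.
    terms = [np.ones(len(X))]
    for d in range(1, deg + 1):
        terms += [X[:, j] ** d for j in range(X.shape[1])]
    c, *_ = np.linalg.lstsq(np.stack(terms, axis=1), y, rcond=None)
    return lambda Xq: np.stack(
        [np.ones(len(Xq))] + [Xq[:, j] ** d for d in range(1, deg + 1) for j in range(Xq.shape[1])],
        axis=1) @ c

rng = np.random.default_rng(5)
x_a = rng.uniform(0, 1, 30)
surr_A = poly_fit(x_a[:, None], comp_A(x_a))                 # surrogate of component A

x_b = rng.uniform(0, 1, 30)
xi_b = rng.uniform(0.3, 1.0, 30)      # assumed a priori range of the coupling variable
surr_B = poly_fit(np.stack([x_b, xi_b], axis=1), comp_B(x_b, xi_b))

# Composed surrogate for the system QoI y(x) = B(x, A(x)).
x_test = np.linspace(0, 1, 5)
xi_hat = surr_A(x_test[:, None])
y_hat = surr_B(np.stack([x_test, xi_hat], axis=1))
print(np.max(np.abs(y_hat - comp_B(x_test, comp_A(x_test)))))  # composition error
```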
We present a surrogate modeling framework for conservatively estimating measures of risk from limited realizations of an expensive physical experiment or computational simulation. We adopt a probabilistic description of risk that assigns probabilities to consequences associated with an event and use risk measures, which combine objective evidence with the subjective values of decision makers, to quantify anticipated outcomes. Given a set of samples, we construct a surrogate model that produces estimates of risk measures that are always greater than their empirical estimates obtained from the training data. These surrogate models not only limit over-confidence in reliability and safety assessments, but produce estimates of risk measures that converge much faster to the true value than purely sample-based estimates. We first detail the construction of conservative surrogate models that can be tailored to the specific risk preferences of the stakeholder and then present an approach, based upon stochastic orders, for constructing surrogate models that are conservative with respect to families of risk measures. The surrogate models introduce a bias that allows them to conservatively estimate the target risk measures. We provide theoretical results that show that this bias decays at the same rate as the L2 error in the surrogate model. Our numerical examples confirm that risk-aware surrogate models do indeed over-estimate the target risk measures while converging at the expected rate.
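The conservativeness requirement can be illustrated with CVaR (conditional value-at-risk) as the risk measure: fit a surrogate, then add the smallest constant shift that makes the surrogate's risk estimate at the training points no smaller than the empirical estimate from the data. The paper's construction (tailored to risk preferences and based on stochastic orders) is more refined; the sketch below only illustrates the idea of a conservative bias.

```python
# Conservatively biased surrogate estimate of CVaR from limited samples.
import numpy as np

def cvar(samples, alpha=0.9):
    # Average of the worst (1 - alpha) fraction of outcomes (larger = worse).
    q = np.quantile(samples, alpha)
    return samples[samples >= q].mean()

rng = np.random.default_rng(6)
x = rng.uniform(-1, 1, 30)
y = x**3 + 0.1 * rng.standard_normal(30)          # limited, noisy realizations

surrogate = np.poly1d(np.polyfit(x, y, deg=2))    # deliberately simple surrogate

shift = max(0.0, cvar(y) - cvar(surrogate(x)))    # conservative bias
conservative = lambda z: surrogate(z) + shift

print("empirical CVaR of data:     ", cvar(y))
print("plain surrogate CVaR:       ", cvar(surrogate(x)))
print("conservative surrogate CVaR:", cvar(conservative(x)))   # >= empirical CVaR
```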
Chemical Science
Potassium channels modulate various cellular functions through efficient and selective conduction of K+ions. The mechanism of ion conduction in potassium channels has recently emerged as a topic of debate. Crystal structures of potassium channels show four K+ions bound to adjacent binding sites in the selectivity filter, while chemical intuition and molecular modeling suggest that the direct ion contacts are unstable. Molecular dynamics (MD) simulations have been instrumental in the study of conduction and gating mechanisms of ion channels. Based on MD simulations, two hypotheses have been proposed, in which the four-ion configuration is an artifact due to either averaged structures or low temperature in crystallographic experiments. The two hypotheses have been supported or challenged by different experiments. Here, MD simulations with polarizable force fields validated byab initiocalculations were used to investigate the ion binding thermodynamics. Contrary to previous beliefs, the four-ion configuration was predicted to be thermodynamically stable after accounting for the complex electrostatic interactions and dielectric screening. Polarization plays a critical role in the thermodynamic stabilities. As a result, the ion conduction likely operates through a simple single-vacancy and water-free mechanism. The simulations explained crystal structures, ion binding experiments and recent controversial mutagenesis experiments. This work provides a clear view of the mechanism underlying the efficient ion conduction and demonstrates the importance of polarization in ion channel simulations.
Annual ACM Symposium on Parallelism in Algorithms and Architectures
We present a theoretical framework for designing and assessing the performance of algorithms executing in networks consisting of spiking artificial neurons. Although spiking neural networks (SNNs) are capable of general-purpose computation, few algorithmic results with rigorous asymptotic performance analysis are known. SNNs are exceptionally well-motivated practically, as neuromorphic computing systems with 100 million spiking neurons are available, and systems with a billion neurons are anticipated in the next few years. Beyond massive parallelism and scalability, neuromorphic computing systems offer energy consumption orders of magnitude lower than conventional high-performance computing systems. We employ our framework to design and analyze neuromorphic graph algorithms, focusing on shortest path problems. Our neuromorphic algorithms are message-passing algorithms relying critically on data movement for computation, and we develop data-movement lower bounds for conventional algorithms. A fair and rigorous comparison with conventional algorithms and architectures is challenging but paramount. We prove a polynomial-factor advantage even when we assume an SNN consisting of a simple grid-like network of neurons. To the best of our knowledge, this is one of the first examples of a provable asymptotic computational advantage for neuromorphic computing.
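The flavor of spiking shortest-path computation can be conveyed with a small event-driven sketch: each vertex is a neuron that fires once, at the first time a spike reaches it, and synaptic delays equal (integer) edge weights, so a neuron's firing time equals its shortest-path distance from the source. This illustrates the message-passing style discussed above; it is not the paper's algorithms or its data-movement lower bounds.

```python
# Event-driven spiking wavefront: firing time = shortest-path distance.
import heapq

def snn_shortest_paths(adj, source):
    """adj: {vertex: [(neighbor, integer_delay), ...]}"""
    fire_time = {}                       # first spike arrival (= distance) per neuron
    events = [(0, source)]               # event queue of (arrival time, neuron)
    while events:
        t, v = heapq.heappop(events)
        if v in fire_time:               # neuron already fired; later spikes are ignored
            continue
        fire_time[v] = t                 # neuron fires at its first spike arrival
        for u, delay in adj.get(v, []):  # spike propagates along outgoing synapses
            if u not in fire_time:
                heapq.heappush(events, (t + delay, u))
    return fire_time

graph = {"s": [("a", 2), ("b", 5)], "a": [("b", 1), ("c", 4)], "b": [("c", 1)], "c": []}
print(snn_shortest_paths(graph, "s"))    # {'s': 0, 'a': 2, 'b': 3, 'c': 4}
```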
Applied Physics Letters
Magnetization of clusters is often simulated using atomistic spin dynamics for a fixed lattice. Coupled spin-lattice dynamics simulations of the magnetization of nanoparticles have, to date, neglected the change in the size of the atomic magnetic moments near surfaces. We show that the introduction of variable magnetic moments leads to a better description of experimental data for the magnetization of small Fe nanoparticles. To this end, we divide atoms into a surface-near shell and a core with bulk properties. It is demonstrated that both the magnitude of the shell magnetic moment and the exchange interactions need to be modified to obtain a fair representation of the experimental data. This allows for a reasonable description of the average magnetic moment vs cluster size, and also the cluster magnetization vs temperature.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
MRS Bulletin
A materials synthesis method that we call atomic-precision advanced manufacturing (APAM), which is the only known route to tailor silicon nanoelectronics with full 3D atomic precision, is making an impact as a powerful prototyping tool for quantum computing. Quantum computing schemes using atomic (31P) spin qubits are compelling for future scale-up owing to long dephasing times, one- and two-qubit gates nearing high-fidelity thresholds for fault-tolerant quantum error correction, and emerging routes to manufacturing via proven Si foundry techniques. Multiqubit devices are challenging to fabricate by conventional means owing to tight interqubit pitches forced by short-range spin interactions, and APAM offers the required (Å-scale) precision to systematically investigate solutions. However, applying APAM to fabricate circuitry with increasing numbers of qubits will require significant technique development. Here, we provide a tutorial on APAM techniques and materials and highlight its impacts in quantum computing research. Finally, we describe challenges on the path to multiqubit architectures and opportunities for APAM technique development.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
These points are covered in this presentation: Distributed GPU stencil, non-contiguous data; Equivalence of strided datatypes and minimal representation; GPU communication methods; Deploying on managed systems; Large messages and MPI datatypes; Translation and canonicalization; Automatic model-driven transfer method selection; and Interposed library implementation.
We consider the development of multifluid models for partially ionized multispecies plasmas. The models are composed of a standard set of five-moment fluid equations for each species plus a description of electromagnetics. The most general model considered utilizes a full set of fluid equations for each charge state of each atomic species, plus a set of fluid equations for electrons. The fluid equations are coupled through source terms describing electromagnetic coupling, ionization, recombination, charge exchange, and elastic scattering collisions in the low-density coronal limit. The form of each of these source terms is described in detail, and references for required rate coefficients are identified for a diverse range of atomic species. Initial efforts have been made to extend these models to incorporate some higher-density collisional effects, including ionization potential depression and three-body recombination. Some reductions of the general multifluid model are considered. First, a reduced multifluid model is derived which averages over all of the charge states (including neutrals) of each atomic species in the general multifluid model. The resulting model maintains full consistency with the general multifluid model from which it is derived by leveraging a quasi-steady-state collisional ionization equilibrium assumption to recover the ionization fractions required to make use of the general collision models. Further reductions are briefly considered to derive certain components of a single-fluid magnetohydrodynamics (MHD) model. In this case, a generalized Ohm's law is obtained, and the standard MHD resistivity is expressed in terms of the collisional models used in the general multifluid model. A number of numerical considerations required to obtain robust implementations of these multifluid models are discussed. First, an algebraic flux correction (AFC) stabilization approach for a continuous Galerkin finite element discretization of the multifluid system is described in which the characteristic speeds used in the stabilization of the fluid systems are synchronized across all species in the model. It is demonstrated that this synchronization is crucial in order to obtain a robust discretization of the multifluid system. Additionally, several different formulations are considered for describing the electromagnetics portion of the multifluid system using nodal continuous Galerkin finite element discretizations. The formulations considered include a parabolic divergence cleaning method and an implicit projection method for the traditional curl formulation of Maxwell's equations, a purely-hyperbolic potential-based formulation of Maxwell's equations, and a mixed hyperbolic-elliptic potential-based formulation of Maxwell's equations. Some advantages and disadvantages of each formulation are explored to compare solution robustness and the ease of use of each formulation. Numerical results are presented to demonstrate the accuracy and robustness of various components of our implementation. Analytic solutions for a spatially homogeneous damped plasma oscillation are derived in order to verify the implementation of the source terms for electromagnetic coupling and elastic collisions between fluid species. Ionization balance as a function of electron temperature is evaluated for several atomic species of interest by comparing to steady-state calculations using various sets of ionization and recombination rate coefficients. Several test problems in one and two spatial dimensions are used to demonstrate the accuracy and robustness of the discretization and stabilization approach for the fluid components of the multifluid system. This includes standard test problems for electrostatic and electromagnetic shock tubes in the two-fluid and ideal shock-MHD limits, a cylindrical diocotron instability, and the GEM challenge magnetic reconnection problem. A one-dimensional simplified prototype of an argon gas puff configuration as deployed on Sandia's Z-machine is used as a demonstration to exercise the full range of capabilities associated with the general multifluid model.
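For reference, the standard five-moment system for a species s (density, momentum, total energy) with Lorentz-force coupling has the form shown below; the report's detailed collisional, ionization, and recombination sources are abbreviated here to generic terms.

```latex
% Five-moment equations for species s; S_s^rho, R_s, Q_s stand in for the
% detailed ionization/recombination/collision source terms described in the report.
\begin{aligned}
\partial_t \rho_s + \nabla\cdot(\rho_s \mathbf{u}_s) &= S_s^{\rho}, \\
\partial_t(\rho_s \mathbf{u}_s) + \nabla\cdot(\rho_s \mathbf{u}_s \mathbf{u}_s + p_s \mathbf{I})
  &= \frac{q_s}{m_s}\,\rho_s\left(\mathbf{E} + \mathbf{u}_s \times \mathbf{B}\right) + \mathbf{R}_s, \\
\partial_t \mathcal{E}_s + \nabla\cdot\!\left[(\mathcal{E}_s + p_s)\,\mathbf{u}_s\right]
  &= \frac{q_s}{m_s}\,\rho_s\,\mathbf{u}_s\cdot\mathbf{E} + Q_s .
\end{aligned}
```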
ScienceCloud 2021 - Proceedings of the 11th Workshop on Scientific Cloud Computing
Large-scale, high-throughput computational science faces an accelerating convergence of software and hardware. Software container-based solutions have become common in cloud-based datacenter environments, and are considered promising tools for addressing heterogeneity and portability concerns. However, container solutions reflect a set of assumptions which complicate their adoption by developers and users of scientific workflow applications. Nor are containers a universal solution for deployment in high-performance computing (HPC) environments which have specialized and vertically integrated scheduling and runtime software stacks. In this paper, we present a container design and deployment approach which uses modular layering to ease the deployment of containers into existing HPC environments. This layered approach allows operating system integrations, support for different communication and performance monitoring libraries, and application code to be defined and interchanged in isolation. We describe in this paper the details of our approach, including specifics about container deployment and orchestration for different HPC scheduling systems. We also describe how this layering method can be used to build containers for two separate applications, each deployed on clusters with different batch schedulers, MPI networking support, and performance monitoring requirements. Our experience indicates that the layered approach is a viable strategy for building applications intended to provide similar behavior across widely varying deployment targets.
Conference Record of the IEEE Photovoltaic Specialists Conference
Smoke from wildfires results in air pollution that can impact the performance of solar photovoltaic plants. Production is impacted by factors including the proximity of the fire to a site of interest, the extent of the wildfire, wind direction, and ambient weather conditions. We construct a model that quantifies the relationships among weather, wildfire-induced pollution, and PV production for utility-scale and distributed generation sites located in the western USA. The regression model identified a 9.4%-37.8% reduction in solar PV production on smoky days. This model can be used to determine expected production losses at impacted sites. We also present an analysis of factors that contribute to solar photovoltaic energy production impacts from wildfires. This work will inform anticipated production changes for more accurate grid planning and operational considerations.
Computational Materials Science
Thermal spray processes involve the repeated impact of millions of discrete particles, whose melting, deformation, and coating-formation dynamics occur at microsecond timescales. The accumulated coating that evolves over minutes is composed of complex, multiphase microstructures, and the timescale difference between the individual particle solidification and the overall coating formation represents a significant challenge for analysts attempting to simulate microstructure evolution. In order to overcome the computational burden, researchers have created rule-based models (similar to cellular automata methods) that do not directly simulate the physics of the process. Instead, the simulation is governed by a set of predefined rules, which do not capture the fine details of the evolution, but do provide a useful approximation for the simulation of coating microstructures. Here, we introduce a new rules-based process model for microstructure formation during thermal spray processes. The model is 3D, allows for an arbitrary number of material types, and includes multiple porosity-generation mechanisms. Example results of the model for tantalum coatings are presented along with sensitivity analyses of model parameters and validation against 3D experimental data. The model's computational efficiency allows for investigations into the stochastic variation of coating microstructures, in addition to the typical process-to-structure relationships.
Journal of Computational Physics
In this paper we present an alternative approach to the representation of simulation particles for unstructured electrostatic and electromagnetic PIC simulations. In our modified PIC algorithm we represent particles as having a smooth shape function limited by some specified finite radius, r0. A unique feature of our approach is the representation of this shape by surrounding simulation particles with a set of virtual particles with delta shape, with fixed offsets and weights derived from Gaussian quadrature rules and the value of r0. As the virtual particles are purely computational, they provide the additional benefit of increasing the arithmetic intensity of traditionally memory-bound particle kernels. The modified algorithm is implemented within Sandia National Laboratories' unstructured EMPIRE-PIC code, for electrostatic and electromagnetic simulations, using periodic boundary conditions. We show results for a representative set of benchmark problems, including electron orbit, a transverse electromagnetic wave propagating through a plasma, numerical heating, and a plasma slab expansion. Good error reduction across all of the chosen problems is achieved as the particles are made progressively smoother, with the optimal particle radius appearing to be problem-dependent.
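The virtual-particle construction can be sketched in one dimension: each simulation particle of radius r0 is represented by a small, fixed set of delta-shaped virtual particles whose offsets and weights come from a Gauss-Legendre rule. The sketch below assumes a uniform 1D shape of radius r0 for simplicity; the shapes, dimensionality, and deposition kernels used in EMPIRE-PIC are not reproduced here.

```python
# Virtual particle offsets/weights from a Gauss-Legendre quadrature rule.
import numpy as np

def virtual_particles(x_center, weight, r0, n_quad=3):
    # Gauss-Legendre nodes/weights on [-1, 1], mapped to [x - r0, x + r0].
    nodes, wts = np.polynomial.legendre.leggauss(n_quad)
    offsets = r0 * nodes
    # Normalize so the virtual weights sum to the macro-particle weight.
    vweights = weight * wts / wts.sum()
    return x_center + offsets, vweights

xs, ws = virtual_particles(x_center=0.5, weight=1.0, r0=0.1, n_quad=3)
print(xs)             # virtual particle positions clustered within radius r0
print(ws, ws.sum())   # weights preserve the particle's total charge/mass

# Deposition and gather kernels then loop over the virtual particles instead of
# evaluating a smooth shape function, raising arithmetic intensity as noted above.
```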
Journal of Physical Chemistry C
The adsorption of AlCl3 on Si(100) and the effect of annealing the AlCl3-dosed substrate were studied to reveal key surface processes for the development of atomic-precision, acceptor-doping techniques. This investigation was performed via scanning tunneling microscopy (STM), X-ray photoelectron spectroscopy (XPS), and density functional theory (DFT) calculations. At room temperature, AlCl3 readily adsorbed to the Si substrate dimers and dissociated to form a variety of species. Annealing the AlCl3-dosed substrate at temperatures below 450 °C produced unique chlorinated aluminum chains (CACs) elongated along the Si(100) dimer row direction. An atomic model for the chains is proposed with supporting DFT calculations. Al was incorporated into the Si substrate upon annealing at 450 °C and above, and Cl desorption was observed for temperatures beyond 450 °C. Al-incorporated samples were encapsulated in Si and characterized by secondary ion mass spectrometry (SIMS) depth profiling to quantify the Al atom concentration, which was found to be in excess of 10²⁰ cm⁻³ across a ∼2.7 nm-thick δ-doped region. The Al concentration achieved here and the processing parameters utilized promote AlCl3 as a viable gaseous precursor for novel acceptor-doped Si materials and devices for quantum computing.
Abstract not provided.
Abstract not provided.
Abstract not provided.
We present a numerical modeling workflow based on machine learning (ML) which reproduces the total energies produced by Kohn-Sham density functional theory (DFT) at finite electronic temperature to within chemical accuracy at negligible computational cost. Based on deep neural networks, our workflow yields the local density of states (LDOS) for a given atomic configuration. From the LDOS, spatially-resolved, energy-resolved, and integrated quantities can be calculated, including the DFT total free energy, which serves as the Born-Oppenheimer potential energy surface for the atoms. We demonstrate the efficacy of this approach for both solid and liquid metals and compare results between independent and unified machine-learning models for solid and liquid aluminum. Our machine-learning density functional theory framework opens up the path towards multiscale materials modeling for matter under ambient and extreme conditions at a computational scale and cost that is unattainable with current algorithms.
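The post-processing step implied above can be sketched directly: once a model predicts the local density of states d(r, ε), integrated quantities follow from quadrature over space and energy. The sketch shows electron count and band energy computed from a synthetic Gaussian LDOS, which stands in for a network prediction; the full free-energy evaluation used in the workflow involves additional terms and is not reproduced here.

```python
# From a (predicted) LDOS to integrated quantities via quadrature.
import numpy as np

kB = 8.617333262e-5                 # Boltzmann constant, eV / K
def fermi(eps, mu, T):
    return 1.0 / (1.0 + np.exp((eps - mu) / (kB * T)))

# Energy grid and a toy "predicted" LDOS on a small spatial grid.
energies = np.linspace(-10.0, 10.0, 2001)          # eV
rng = np.random.default_rng(7)
centers = rng.uniform(-5, 5, (8, 8, 8))[..., None]
ldos = np.exp(-0.5 * (energies - centers) ** 2)    # states/(eV*cell) at each grid point

cell_volume = 1.0                                   # assumed quadrature weight per grid point
dos = ldos.sum(axis=(0, 1, 2)) * cell_volume        # density of states D(eps)

mu, T = 0.0, 298.0
occ = fermi(energies, mu, T)
n_electrons = np.trapz(dos * occ, energies)         # integrated electron count
band_energy = np.trapz(dos * occ * energies, energies)
print(n_electrons, band_energy)
```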
Theoretical and Applied Fracture Mechanics
The peridynamic theory of solid mechanics is applied to the continuum modeling of the impact of small, high-velocity silica spheres on multilayer graphene targets. The model treats the laminate as a brittle elastic membrane. The material model includes separate failure criteria for the initial rupture of the membrane and for propagating cracks. Material variability is incorporated by assigning random variations in elastic properties within Voronoi cells. The computational model is shown to reproduce the primary aspects of the response observed in experiments, including the growth of a family of radial cracks from the point of impact.
Abstract not provided.
The credibility of an engineering model is of critical importance in large-scale projects. How concerned should an engineer be when reusing someone else's model if they do not know the author or are unfamiliar with the tools used to create it? In this report, the authors advance engineers' capabilities for assessing models through examination of the underlying semantic structure of a model--the ontology. This ontology defines the objects in a model, the types of those objects, and the relationships between them. In this study, two advances in ontology simplification and visualization are discussed and demonstrated on two systems engineering models. These advances are critical steps toward enabling engineering models to interoperate and toward assessing models for credibility. For example, results of this research show an 80% reduction in file size and representation size, dramatically improving the throughput of graph algorithms applied to the analysis of these models. Finally, four open problems in ontology research toward establishing credible models are outlined--ontology discovery, ontology matching, ontology alignment, and model assessment.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Entropy
The reversible computation paradigm aims to provide a new foundation for general classical digital computing that is capable of circumventing the thermodynamic limits to the energy efficiency of the conventional, non-reversible digital paradigm. However, to date, the essential rationale for, and analysis of, classical reversible computing (RC) has not yet been expressed in terms that leverage the modern formal methods of non-equilibrium quantum thermodynamics (NEQT). In this paper, we begin developing an NEQT-based foundation for the physics of reversible computing. We use the framework of Gorini-Kossakowski-Sudarshan-Lindblad dynamics (a.k.a. Lindbladians) with multiple asymptotic states, incorporating recent results from resource theory, full counting statistics and stochastic thermodynamics. Important conclusions include that, as expected: (1) Landauer’s Principle indeed sets a strict lower bound on entropy generation in traditional non-reversible architectures for deterministic computing machines when we account for the loss of correlations; and (2) implementations of the alternative reversible computation paradigm can potentially avoid such losses, and thereby circumvent the Landauer limit, potentially allowing the efficiency of future digital computing technologies to continue improving indefinitely. We also outline a research plan for identifying the fundamental minimum energy dissipation of reversible computing machines as a function of speed.
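For reference, the bound referred to above is the familiar Landauer limit, which the paper recovers in the non-reversible case and which reversible architectures aim to circumvent by not destroying correlations:

```latex
% Landauer limit: erasing one bit in an environment at temperature T generates
% at least k_B ln 2 of entropy, equivalently dissipating at least k_B T ln 2 of energy.
\Delta S \;\ge\; k_B \ln 2
\qquad\Longleftrightarrow\qquad
E_{\mathrm{diss}} \;\ge\; k_B T \ln 2 \quad \text{per bit erased.}
```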
Abstract not provided.
Abstract not provided.
Abstract not provided.
TEMPI provides a transparent non-contiguous data-handling layer compatible with various MPIs. MPI datatypes are a powerful abstraction for allowing an MPI implementation to operate on non-contiguous data. CUDA-aware MPI implementations must also manage transfer of such data between the host system and the GPU. The non-unique and recursive nature of MPI datatypes means that providing fast GPU handling is a challenge. The same non-contiguous pattern may be described in a variety of ways, all of which should be treated equivalently by an implementation. This work introduces a novel technique to do this for strided datatypes. The choice of method for transferring non-contiguous data between the CPU and GPU depends on the properties of the data layout. This work shows that a simple performance model can accurately select the fastest method. Unfortunately, the combination of MPI software and system hardware available may not provide sufficient performance. The contributions of this work are deployed on OLCF Summit through an interposer library which does not require privileged access to the system to use.
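The canonicalization idea can be sketched as follows: many distinct datatype descriptions denote the same byte layout, so an implementation can normalize them to a minimal form before choosing a transfer method. The `Vector` record and rules below are hypothetical illustrations of that idea, not TEMPI's internal representation.

```python
# Canonicalizing equivalent strided layout descriptions to a minimal form.
from dataclasses import dataclass

@dataclass
class Vector:
    count: int        # number of blocks
    blocklength: int  # contiguous bytes per block
    stride: int       # bytes between block starts

def canonicalize(v: Vector) -> Vector:
    # A "strided" type whose stride equals its block length is simply contiguous.
    if v.stride == v.blocklength:
        total = v.count * v.blocklength
        return Vector(count=1, blocklength=total, stride=total)
    # A single block is contiguous regardless of the declared stride.
    if v.count == 1:
        return Vector(count=1, blocklength=v.blocklength, stride=v.blocklength)
    return v

# Two different descriptions of the same contiguous 4096-byte layout collapse
# to one canonical form, so they can share a single transfer-method decision.
print(canonicalize(Vector(count=16, blocklength=256, stride=256)))
print(canonicalize(Vector(count=1, blocklength=4096, stride=8192)))
```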
Abstract not provided.
Abstract not provided.
Abstract not provided.