This work, building on previous efforts, develops a suite of new graph neural network machine learning architectures that generate data-driven prolongators for use in Algebraic Multigrid (AMG). Algebraic Multigrid is a powerful and widely used technique for solving large, sparse linear systems. Its effectiveness is problem-dependent and hinges on the choice of the prolongation operator, which interpolates coarse-mesh results onto a finer mesh. Previous work has used recent developments in graph neural networks to learn a prolongation operator from a given coefficient matrix. In this paper, we expand on that work by exploring architectural enhancements of graph neural networks. A new method for generating a training set is developed that more closely aligns with the test set. Asymptotic error reduction factors are compared on a test suite of three-dimensional Poisson problems with varying degrees of element stretching. Results show modest improvements in asymptotic error reduction factor over both commonly chosen baselines and learning methods from previous work.
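For context, the role of the prolongator $P$ can be seen in the standard two-grid error-propagation operator; a textbook form (not specific to this paper's setup) is

\[
E_{TG} \;=\; S^{\nu_2}\left(I - P\,(P^{T} A P)^{-1} P^{T} A\right) S^{\nu_1},
\]

where $A$ is the coefficient matrix, $S$ is the smoother's error propagator, and $\nu_1,\nu_2$ are the numbers of pre- and post-smoothing sweeps; a learned $P$ aims to shrink the spectral radius of $E_{TG}$, which governs the asymptotic error reduction factor.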
Atomically precise ultradoping of silicon is possible with atomic resists, area-selective surface chemistry, and a limited set of hydride and halide precursor molecules, in a process known as atomic precision advanced manufacturing (APAM). It is desirable to broaden this set of precursors to include dopants with organic functional groups, and here we consider aluminium alkyls as a way to expand the applicability of APAM. We explore the impurity content and selectivity that result from using trimethyl aluminium and triethyl aluminium precursors on Si(001) to ultradope with aluminium through a hydrogen mask. Comparing the methylated and ethylated precursors helps us understand the impact of hydrocarbon ligand selection on incorporation surface chemistry. Combining scanning tunneling microscopy and density functional theory calculations, we assess the limitations of both classes of precursor and extract general principles relevant to each.
The Computer Science Research Institute (CSRI) brings university faculty and students to Sandia National Laboratories for focused collaborative research on Department of Energy (DOE) computer and computational science problems. The institute provides an opportunity for university researchers to learn about problems in computer and computational science at DOE laboratories and to help transfer the results of their research to programs at the labs. Some specific CSRI research interest areas are: scalable solvers, optimization, algebraic preconditioners, graph-based, discrete, and combinatorial algorithms, uncertainty estimation, validation and verification methods, mesh generation, dynamic load-balancing, virus and other malicious-code defense, visualization, scalable cluster computers, beyond Moore's Law computing, exascale computing tools and application design, reduced order and multiscale modeling, parallel input/output, and theoretical computer science. The CSRI Summer Program is organized by CSRI and includes a weekly seminar series and the publication of a summer proceedings.
Adams, Brian H.; Bohnhoff, William J.; Dalbey, Keith R.; Ebeida, Mohamed S.; Eddy, John P.; Eldred, Michael S.; Hooper, Russell W.; Hough, Patricia D.; Hu, Kenneth T.; Jakeman, John D.; Khalil, Mohammad; Maupin, Kathryn A.; Monschke, Jason A.; Ridgway, Elliott M.; Rushdi, Ahmad A.; Seidl, Daniel T.; Stephens, John A.; Swiler, Laura P.; Laros, James H.; Winokur, Justin G.
The Dakota toolkit provides a flexible and extensible interface between simulation codes and iterative analysis methods. Dakota contains algorithms for optimization with gradient and nongradient-based methods; uncertainty quantification with sampling, reliability, and stochastic expansion methods; parameter estimation with nonlinear least squares methods; and sensitivity/variance analysis with design of experiments and parameter study methods. These capabilities may be used on their own or as components within advanced strategies such as surrogate-based optimization, mixed integer nonlinear programming, or optimization under uncertainty. By employing object-oriented design to implement abstractions of the key components required for iterative systems analyses, the Dakota toolkit provides a flexible and extensible problem-solving environment for design and performance analysis of computational models on high performance computers. This report serves as a user's manual for the Dakota software and provides capability overviews and procedures for software execution, as well as a variety of example studies.
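To illustrate how a simulation code is typically coupled to Dakota's iterative methods, the sketch below shows a minimal Python analysis driver following Dakota's fork-interface convention, in which Dakota invokes the driver with a parameters-file path and a results-file path. The variable tags ('x1', 'x2') and the quadratic stand-in response are hypothetical.

```python
#!/usr/bin/env python3
# Minimal sketch of a Dakota fork-interface analysis driver.
# Dakota writes a parameters file of "value tag" lines and expects a
# results file with one response value per line. The tags 'x1'/'x2'
# and the quadratic response below are hypothetical placeholders.
import sys

def read_params(path):
    # Parse "value tag" pairs; non-variable metadata lines are harmless here.
    values = {}
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) == 2:
                try:
                    values[parts[1]] = float(parts[0])
                except ValueError:
                    pass
    return values

def main():
    params_path, results_path = sys.argv[1], sys.argv[2]
    p = read_params(params_path)
    x1, x2 = p.get("x1", 0.0), p.get("x2", 0.0)
    response = (x1 - 1.0) ** 2 + (x2 - 2.0) ** 2  # stand-in "simulation"
    with open(results_path, "w") as f:
        f.write(f"{response:.12e} f\n")  # one (optionally tagged) response

if __name__ == "__main__":
    main()
```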
Bayesian optimization (BO) is an efficient and flexible global optimization framework that is applicable to a very wide range of engineering applications. Many extensions, including multi-objective, multi-fidelity, parallelization, and latent-variable modeling, have been proposed to address the limitations of the classical BO framework. In this work, we propose a novel multi-objective BO formalism, called srMO-BO-3GP, to solve multi-objective optimization problems in a sequential setting. Three Gaussian processes (GPs) are stacked together, each assigned a different task: the first GP approximates a single objective computed from the multi-objective definition, the second GP learns the unknown constraints, and the third learns the uncertain Pareto frontier. At each iteration, a multi-objective augmented Tchebycheff function converts the multiple objectives to a single objective, and a regularized ridge term is introduced to smooth the resulting single-objective function. Finally, we couple the third GP with the classical BO framework, using the acquisition function's balance of exploitation and exploration to improve the convergence and diversity of the Pareto frontier. The proposed framework is demonstrated using several numerical benchmark functions, as well as a thermomechanical finite element model for flip-chip package design optimization.
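For reference, a common form of the augmented Tchebycheff scalarization (a textbook form; the paper's exact regularization may differ) is

\[
f_{\mathrm{aT}}(x) \;=\; \max_{i=1,\dots,k} \, w_i\, f_i(x) \;+\; \rho \sum_{i=1}^{k} w_i\, f_i(x),
\]

where the $w_i \ge 0$ are scalarization weights and $\rho > 0$ is a small augmentation parameter that prevents convergence to weakly Pareto-optimal solutions.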
Neuromorphic computers are hardware systems that mimic the phenomenology of the brain's computational process. This is in contrast to neural network accelerators, such as the Google TPU or the Intel Neural Compute Stick, which seek to accelerate the fundamental computation and data flows of neural network models used in the field of machine learning. Neuromorphic computers emulate the integrate-and-fire neuron dynamics of the brain to achieve a spiking communication architecture for computation. While neural networks are brain-inspired, they drastically oversimplify the brain's computation model; neuromorphic architectures are closer to the brain's true computation model (albeit still simplified). Neuromorphic computing models are projected to offer power improvements of up to 1000x over conventional CPU architectures. Sandia National Labs is a major contributor to the neuromorphic systems research community, performing design analysis, evaluation, and algorithm development for neuromorphic computers. Space-based remote sensing development has been a focused target of funding for exploratory research into neuromorphic systems because of their potential advantage in that program area; SNL has led some of these efforts. Recently, neuromorphic application evaluation has reached the NA-22 program area. This same exploratory research and algorithm development should penetrate the unattended ground sensor space for SNL's mission partners and program areas. Neuromorphic computing paradigms offer a distinct advantage for the SWaP-constrained embedded systems of our diverse sponsor-driven program areas.
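The integrate-and-fire dynamics mentioned above are commonly modeled by the leaky integrate-and-fire (LIF) equation (a standard textbook model, not a description of any particular chip):

\[
\tau_m \frac{dV}{dt} = -(V - V_{\mathrm{rest}}) + R\, I(t),
\qquad V \ge V_{\mathrm{th}} \;\Rightarrow\; \text{spike, then } V \leftarrow V_{\mathrm{reset}},
\]

where $V$ is the membrane potential, $\tau_m$ the membrane time constant, $R$ the membrane resistance, and $I(t)$ the input current; spikes, rather than dense numeric activations, carry information between neurons.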
The recent introduction of a new generation of "smart NICs" has provided new accelerator platforms that include CPU cores or reconfigurable fabric in addition to traditional networking hardware and packet-offloading capabilities. While there are currently several proposals for using these smartNICs for low-latency, in-line packet-processing operations, there remains a gap in knowledge as to how they might be used as computational accelerators for traditional high-performance applications. This work uses benchmarks and mini-applications to evaluate the possible benefits of using a smartNIC as a compute accelerator for HPC applications. We investigate NVIDIA's current-generation BlueField-2 card, which includes eight Arm CPUs along with a small amount of storage, and we test the networking and data-movement performance of these cards compared to a standard Intel server host. We then detail how two different applications, YASK and miniMD, can be modified to make more efficient use of the BlueField-2 device, with a focus on overlapping computation and communication for operations like neighbor building and halo exchanges. Our results show that while the overall compute performance of these devices is limited, using them with a modified miniMD algorithm allows for potential speedups of 5 to 20% over the host CPU baseline with no loss in simulation accuracy.
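A minimal sketch of the computation/communication overlap pattern referenced above, written with nonblocking mpi4py calls; the 1D decomposition, buffer names, and stand-in stencil are hypothetical, illustrating the generic pattern rather than the miniMD implementation:

```python
# Generic halo-exchange overlap sketch using nonblocking MPI (mpi4py).
# The 1D slab decomposition and stand-in stencil work are hypothetical.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
left, right = (rank - 1) % size, (rank + 1) % size

local = np.random.rand(1024)                 # local slab of a 1D domain
halo_lo = np.empty(1, dtype=local.dtype)
halo_hi = np.empty(1, dtype=local.dtype)

# 1. Post nonblocking receives and sends for the halo cells.
reqs = [comm.Irecv(halo_lo, source=left),
        comm.Irecv(halo_hi, source=right),
        comm.Isend(local[:1].copy(), dest=left),
        comm.Isend(local[-1:].copy(), dest=right)]

# 2. Overlap: update interior cells that need no halo data.
interior = 0.5 * (local[:-2] + local[2:])    # stand-in stencil work

# 3. Wait for the halos, then finish the boundary cells.
MPI.Request.Waitall(reqs)
lo_edge = 0.5 * (halo_lo[0] + local[1])
hi_edge = 0.5 * (local[-2] + halo_hi[0])
```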
We demonstrate the ability to fabricate vertically stacked Si quantum dots (QDs) within SiGe nanowires with QD diameters down to 2 nm. These QDs are formed during high-temperature dry oxidation of Si/SiGe heterostructure pillars, during which Ge diffuses along the pillars' sidewalls and encapsulates the Si layers. Continued oxidation results in QDs with sizes dependent on oxidation time. The formation of a Ge-rich shell that encapsulates the Si QDs is observed, a configuration which is confirmed to be thermodynamically favorable with molecular dynamics and density functional theory. The type-II band alignment of the Si dot/SiGe pillar suggests that charge trapping on the Si QDs is possible, and electron energy loss spectra show that a conduction band offset of at least 200 meV is maintained for even the smallest Si QDs. Our approach is compatible with current Si-based manufacturing processes, offering a new avenue for realizing Si QD devices.
We develop and analyze an optimization-based method for the coupling of a static peridynamic (PD) model and a static classical elasticity model. The approach formulates the coupling as a control problem in which the states are the solutions of the PD and classical equations, the objective is to minimize their mismatch on an overlap of the PD and classical domains, and the controls are virtual volume constraints and boundary conditions applied at the local-nonlocal interface. Our numerical tests performed on three-dimensional geometries illustrate the consistency and accuracy of our method, its numerical convergence, and its applicability to realistic engineering geometries. We demonstrate the coupling strategy as a means to reduce computational expense by confining the nonlocal model to a subdomain of interest, and as a means to transmit local (e.g., traction) boundary conditions applied at a surface to a nonlocal model in the bulk of the domain.
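Schematically, the coupling can be written as the following control problem (notation ours, for illustration): with $u_{PD}$ and $u_{CE}$ the peridynamic and classical states on overlapping subdomains $\Omega_{PD}$ and $\Omega_{CE}$,

\[
\min_{\theta_{PD},\,\theta_{CE}} \;\; \left\| u_{PD} - u_{CE} \right\|^2_{L^2(\Omega_{PD} \cap \Omega_{CE})}
\quad \text{subject to the PD and classical equilibrium equations,}
\]

where the controls $\theta_{PD}$ (virtual volume constraints) and $\theta_{CE}$ (boundary conditions) act at the local-nonlocal interface, so that minimizing the mismatch on the overlap enforces consistency between the two models.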
Deep neural networks (DNNs) have achieved state-of-the-art performance across a variety of traditional machine learning tasks, e.g., speech recognition, image classification, and segmentation. The ability of DNNs to efficiently approximate high-dimensional functions has also motivated their use in scientific applications, e.g., to solve partial differential equations and to generate surrogate models. In this paper, we consider the supervised training of DNNs, which arises in many of the above applications. We focus on the central problem of optimizing the weights of the given DNN such that it accurately approximates the relation between observed input and target data. Devising effective solvers for this optimization problem is notoriously challenging due to the large number of weights, nonconvexity, data sparsity, and nontrivial choice of hyperparameters. To solve the optimization problem more efficiently, we propose the use of variable projection (VarPro), a method originally designed for separable nonlinear least-squares problems. Our main contribution is the Gauss--Newton VarPro method (GNvpro) that extends the reach of the VarPro idea to nonquadratic objective functions, most notably cross-entropy loss functions arising in classification. These extensions make GNvpro applicable to all training problems that involve a DNN whose last layer is an affine mapping, which is common in many state-of-the-art architectures. In our four numerical experiments from surrogate modeling, segmentation, and classification, GNvpro solves the optimization problem more efficiently than commonly used stochastic gradient descent (SGD) schemes. Finally, GNvpro finds solutions that generalize well to unseen data points, outperforming well-tuned SGD methods in all but one example.
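To illustrate the VarPro idea in its classical separable least-squares setting (a generic sketch, not the GNvpro algorithm itself): for a model $\Phi(\theta)\,c \approx y$ that is linear in $c$, the linear weights can be eliminated exactly at each step, leaving an optimization over the nonlinear parameters $\theta$ alone. The exponential basis and data below are hypothetical.

```python
# Classical variable projection (VarPro) sketch for separable least squares:
# minimize ||Phi(theta) @ c - y||^2, which is linear in c. The exponential
# basis and synthetic data are illustrative; GNvpro generalizes this idea
# beyond quadratic losses to, e.g., cross-entropy objectives.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
t = np.linspace(0.0, 3.0, 200)
y = (2.0 * np.exp(-1.3 * t) + 0.5 * np.exp(-0.4 * t)
     + 0.01 * rng.standard_normal(t.size))

def basis(theta):
    # Columns are exponentials with nonlinear decay rates theta.
    return np.exp(-np.outer(t, theta))

def projected_residual(theta):
    Phi = basis(theta)
    c, *_ = np.linalg.lstsq(Phi, y, rcond=None)  # eliminate linear weights
    r = Phi @ c - y
    return 0.5 * r @ r

# Optimize only the nonlinear parameters; c is recovered implicitly.
result = minimize(projected_residual, x0=np.array([1.0, 0.1]),
                  method="Nelder-Mead")
theta_opt = result.x
c_opt, *_ = np.linalg.lstsq(basis(theta_opt), y, rcond=None)
print("decay rates:", theta_opt, "linear weights:", c_opt)
```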
The U.S. Army Research Office (ARO), in partnership with IARPA, is investigating innovative, efficient, and scalable computer architectures that are capable of executing next-generation large-scale data-analytic applications. These applications are increasingly sparse, unstructured, non-local, and heterogeneous. Under the Advanced Graphic Intelligence Logical Computing Environment (AGILE) program, Performer teams will be asked to design computer architectures to meet the future needs of the DoD and the Intelligence Community (IC). This design effort will require flexible, scalable, and detailed simulation to assess the performance, efficiency, and validity of their designs. To support AGILE, Sandia National Labs will be providing the AGILE-enhanced Structural Simulation Toolkit (A-SST). This toolkit is a computer architecture simulation framework designed to support fast, parallel, and multi-scale simulation of novel architectures. This document describes the A-SST framework, some of its library of simulation models, and how it may be used by AGILE Performers.
In this paper, we develop a method, which we call OnlineGCP, for computing the Generalized Canonical Polyadic (GCP) tensor decomposition of streaming data. GCP differs from traditional canonical polyadic (CP) tensor decompositions in that it allows arbitrary objective functions for the CP model to minimize. This approach can provide better fits and more interpretable models when the observed tensor data is strongly non-Gaussian. In the streaming case, tensor data is observed gradually over time and the algorithm must incrementally update a GCP factorization with limited access to prior data. In this work, we extend the GCP formalism to the streaming context by (i) deriving a GCP optimization problem to be solved as new tensor data is observed, (ii) formulating a tunable history term to balance reconstruction of recently observed data against data observed in the past, (iii) developing a scalable solution strategy based on segregated solves using stochastic gradient descent methods, (iv) describing a software implementation that provides performance and portability on contemporary CPU and GPU architectures and integrates with Matlab for enhanced usability, and (v) demonstrating the utility and performance of the approach and software on several synthetic and real tensor data sets.
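The core streaming update can be sketched as follows: a schematic for the two-way (matrix) case with a quadratic loss and a simple proximal-style history term, all chosen for readability; the actual OnlineGCP algorithm handles general tensors, general GCP losses, and segregated solves.

```python
# Schematic streaming factorization update with a tunable history term.
# Two-way (matrix) case with quadratic loss for readability; the names,
# history weighting, and step sizes are illustrative, not OnlineGCP itself.
import numpy as np

rng = np.random.default_rng(1)
rank, n_rows, n_cols = 4, 100, 50
A = rng.standard_normal((n_rows, rank)) * 0.1    # static-mode factor
B_hist = []                                      # per-slice temporal factors

def update(A, X_new, mu=0.9, lr=0.05, inner_iters=20):
    """One streaming step: fit the new slice, nudging A with history weight mu."""
    B_new = rng.standard_normal((X_new.shape[1], rank)) * 0.1
    A_prev = A.copy()
    for _ in range(inner_iters):
        R = A @ B_new.T - X_new                  # residual on the new slice
        grad_A = R @ B_new + mu * (A - A_prev)   # history term anchors A
        grad_B = R.T @ A
        A -= lr * grad_A
        B_new -= lr * grad_B
    return A, B_new

for t in range(10):                              # stream of data slices
    X_t = rng.standard_normal((n_rows, n_cols))
    A, B_t = update(A, X_t)
    B_hist.append(B_t)
```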
This report provides detailed documentation of the algorithms that were developed and implemented in the Plato software over the course of the Optimization-based Design for Manufacturing LDRD project.
Borrowing from nature, neural-inspired interception algorithms were implemented onboard a vehicle. To maximize the chance of success, work was conducted in parallel in a simulated environment and on physical hardware. The intercept vehicle used only optical imaging to detect and track the target. A successful outcome is the proof-of-concept demonstration of a neural-inspired algorithm autonomously guiding a vehicle to intercept a moving target. This work sought to establish the key parameters for the intercept algorithm (sensors and vehicle) and to expand the knowledge and capabilities of implementing neural-inspired algorithms in simulation and on hardware.
As the number of supported platforms for SNL software increases, so do the testing requirements. This increases the total time between when a developer submits code for testing and when tests are completed. This in turn leads developers to hold off submitting code for testing, meaning that when code is finally submitted there is much more of it. This increases the likelihood of merge conflicts that the developer must resolve by hand because someone else touched the files near the lines the developer touched. Current text-based diff tools often have trouble resolving conflicts in these cases. Work in Europe and Japan has demonstrated that programming-language-aware diff tools (e.g., tools using the abstract syntax tree (AST) a compiler might generate) can reduce the manual labor necessary to resolve merge conflicts. These techniques can detect code blocks that have moved, as opposed to current text-based diff tools, which only detect insertions and deletions of text blocks. In this study, we evaluate one such tool, GumTree, and assess how effective it is as a replacement for traditional text-based diff approaches.
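To make the idea concrete, the sketch below uses Python's standard ast module to detect a function that has moved between two versions of a file, something a purely textual diff reports as an unrelated deletion plus insertion. This illustrates the general AST-diff idea only; GumTree's actual matching algorithm is more sophisticated.

```python
# Sketch: detect moved (structurally identical) functions between two file
# versions by comparing normalized AST dumps. Illustrates the idea behind
# AST-aware diffing; GumTree's real tree-matching algorithm is richer.
import ast

old_src = """
def helper(x):
    return x * 2

def main():
    print(helper(3))
"""

new_src = """
def main():
    print(helper(3))

def helper(x):
    return x * 2
"""

def function_index(src):
    # Map each function's structural fingerprint to its (name, line number).
    index = {}
    for node in ast.walk(ast.parse(src)):
        if isinstance(node, ast.FunctionDef):
            fingerprint = ast.dump(node, include_attributes=False)
            index[fingerprint] = (node.name, node.lineno)
    return index

old_funcs, new_funcs = function_index(old_src), function_index(new_src)
for fingerprint, (name, old_line) in old_funcs.items():
    if fingerprint in new_funcs:
        _, new_line = new_funcs[fingerprint]
        if old_line != new_line:
            print(f"{name} moved from line {old_line} to line {new_line}")
```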
As noise limits the performance of quantum processors, the ability to characterize this noise and develop methods to overcome it is essential for the future of quantum computing. In this report, we develop a complete set of tools for improving quantum processor performance at the application level, including low-level physical models of quantum gates, a numerically efficient method of producing process matrices that span a wide range of model parameters, and full-channel quantum simulations. We then provide a few examples of how to use these tools to study the effects of noise on quantum circuits.
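As a small example of the kind of channel-level simulation described, the sketch below applies a single-qubit depolarizing channel to a density matrix via its Kraus operators; this is a textbook noise model offered only to illustrate process-matrix-style simulation, not one of the report's low-level gate models.

```python
# Apply a single-qubit depolarizing channel rho -> sum_k K_k rho K_k^dagger.
# A textbook noise model, used here only to illustrate channel simulation.
import numpy as np

I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def depolarizing_kraus(p):
    # Kraus operators: identity with weight 1-p, each Pauli with weight p/3.
    return [np.sqrt(1 - p) * I] + [np.sqrt(p / 3) * P for P in (X, Y, Z)]

def apply_channel(rho, kraus_ops):
    return sum(K @ rho @ K.conj().T for K in kraus_ops)

rho = np.array([[1, 0], [0, 0]], dtype=complex)   # |0><0|
rho_noisy = apply_channel(rho, depolarizing_kraus(p=0.1))
print(np.real_if_close(rho_noisy))                # slight mixing toward I/2
```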
This Laboratory Directed Research and Development project developed and applied closely coupled experimental and computational tools to investigate powder compaction across multiple length scales. The primary motivation for this work is to provide connections between powder feedstock characteristics, processing conditions, and powder pellet properties in the context of powder-based energetic components manufacturing. We have focused our efforts on microcrystalline cellulose, a molecular crystalline surrogate material that is mechanically similar to several energetic materials of interest but provides several advantages for fundamental investigations. We report extensive experimental characterization ranging in length scale from nanometers to macroscopic, bulk behavior. Experiments included nanoindentation of well-controlled, micron-scale pillar geometries milled into the surface of individual particles, single-particle crushing experiments, in-situ optical and computed tomography imaging of the compaction of multiple particles in different geometries, and bulk powder compaction. In order to capture the large plastic deformation and fracture of particles in computational models, we have advanced two distinct meshfree Lagrangian simulation techniques: (1) bonded particle methods, which extend existing discrete element method capabilities in the Sandia-developed, open-source LAMMPS code to capture particle deformation and fracture, and (2) extensions of peridynamics for application to mesoscale powder compaction, including a novel material model that includes plasticity and creep. We have demonstrated both methods for simulations of single-particle crushing as well as mesoscale multi-particle compaction, with favorable comparisons to experimental data. We have used small-scale mechanical characterization data to inform material models, and in-situ imaging of mesoscale particle structures to provide initial conditions for simulations. Both mesostructure porosity characteristics and overall stress-strain behavior were found to be in good agreement between simulations and experiments. We have thus demonstrated a novel multi-scale, closely coupled experimental and computational approach to the study of powder compaction. This enables a wide range of possible investigations into feedstock-process-structure relationships in powder-based materials, with immediate applications in energetic component manufacturing, as well as other particle-based components and processes.
This project demonstrates that Chapel programs can interface with MPI-based libraries written in C++ without storing multiple copies of shared data. Chapel is a language for productive parallel computing using a partitioned global address space (PGAS) model. We identified two approaches to interface Chapel code with the MPI-based Grafiki and Trilinos libraries. The first uses a single Chapel executable to call a C function that interacts with the C++ libraries. The second uses the mmap function to allow separate executables to read and write to the same block of memory on a node. We also encapsulated the second approach in Docker/Singularity containers to maximize ease of use. Comparisons of the two approaches using shared and distributed memory installations of Chapel show that both approaches provide similar scalability and performance.
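The second approach relies on the standard mmap system call; the sketch below shows the same zero-copy idea in Python, with one file-backed buffer visible to two mappings (the file name and sizes are hypothetical, and the real project coordinates Chapel and C++ executables rather than Python processes).

```python
# Zero-copy sharing of one buffer between mappings via a file-backed mmap.
# Mirrors the idea behind the Chapel/C++ approach; names and sizes are
# hypothetical placeholders.
import mmap
import numpy as np

PATH, NBYTES = "/tmp/shared_block.bin", 8 * 1024   # 1024 float64 values

# Writer: create the backing file and fill it through a memory map.
with open(PATH, "wb") as f:
    f.truncate(NBYTES)
with open(PATH, "r+b") as f:
    buf = mmap.mmap(f.fileno(), NBYTES)
    arr = np.frombuffer(buf, dtype=np.float64)     # view, no copy
    arr[:] = np.arange(arr.size)
    buf.flush()

# Reader (in practice, a separate executable): map the same file and read.
with open(PATH, "r+b") as f:
    buf2 = mmap.mmap(f.fileno(), NBYTES)
    arr2 = np.frombuffer(buf2, dtype=np.float64)
    print(arr2[:5])   # [0. 1. 2. 3. 4.] -- same memory, no duplication
```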
In this project we developed and validated algorithms for privacy-preserving linear regression using a new variant of Secure Multiparty Computation (MPC) we call "Hybrid MPC" (hMPC). Our variant is intended to support low-power, unreliable networks of sensors with low-communication, fault-tolerant algorithms. In hMPC we do not share training data, even via secret sharing. Thus, agents are responsible for protecting their own local data. Only the machine learning (ML) model is protected with information-theoretic security guarantees against honest-but-curious agents. There are three primary advantages to this approach: (1) after setup, hMPC supports a communication-efficient matrix multiplication primitive, (2) organizations prevented by policy or technology from sharing any of their data can participate as agents in hMPC, and (3) large numbers of low-power agents can participate in hMPC. We have also created an open-source software library named "Cicada" to support hMPC applications with fault-tolerance. The fault-tolerance is important in our applications because the agents are vulnerable to failure or capture. We have demonstrated this capability at Sandia's Autonomy New Mexico laboratory through a simple machine-learning exercise with Raspberry Pi devices capturing and classifying images while flying on four drones.
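A minimal sketch of the additive secret-sharing primitive underlying such schemes appears below; this is a generic construction for illustration, and Cicada's actual protocols, field arithmetic, and fault-tolerance machinery are more involved.

```python
# Additive secret sharing over a prime field: a secret matrix is split into
# random shares that individually reveal nothing but sum to the secret.
# Generic illustration only; not Cicada's actual protocol.
import numpy as np

P = 2**61 - 1   # a Mersenne prime modulus
rng = np.random.default_rng(42)

def share(secret, n_agents):
    """Split an integer matrix into n_agents additive shares mod P."""
    shares = [rng.integers(0, P, size=secret.shape, dtype=np.int64)
              for _ in range(n_agents - 1)]
    last = (secret - sum(shares)) % P
    return shares + [last]

def reconstruct(shares):
    return sum(shares) % P

secret = np.array([[3, 1], [4, 1]], dtype=np.int64)
shares = share(secret, n_agents=3)
assert np.array_equal(reconstruct(shares), secret % P)

# Linear operations are communication-free: each agent scales its own share.
scaled = [(2 * s) % P for s in shares]
assert np.array_equal(reconstruct(scaled), (2 * secret) % P)
```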
Part distortion and residual stress are critical factors for metal additive manufacturing (AM) because they can lead to high failure rates during both manufacturing and service. We present a topology optimization approach that incorporates a fast AM process simulation at each design iteration to provide predictions of manufacturing outcomes (i.e., residual stress, distortion, residual elastic energy) that can be optimized or constrained. The details of the approach and implementation are discussed, and an example design is presented that illustrates the efficacy of the method.
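Schematically (in our notation, not necessarily the paper's), each design iteration solves a problem of the form

\[
\min_{\rho} \; f(\rho)
\quad \text{s.t.} \quad g_{\mathrm{AM}}(\rho) \le \bar{g},
\]

where $\rho$ is the density-like design field, $f$ is the usual performance objective (e.g., compliance), and $g_{\mathrm{AM}}$ is a manufacturing-outcome measure (e.g., residual elastic energy or peak distortion) returned by the fast AM process simulation embedded in the loop.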
Metamaterials are artificial structures that can manipulate and control sound waves in ways not possible with conventional materials. While much effort has been devoted to widening the bandgaps produced by these materials through the design of heterogeneities within unit cells, comparatively little work has considered the effect of engineering heterogeneities at the structural scale by combining different types of unit cells. In this paper, we use the relaxed micromorphic model to study wave propagation in heterogeneous metastructures composed of different unit cells. We first establish the efficacy of the relaxed micromorphic model for capturing the salient characteristics of dispersive wave propagation through comparisons with direct numerical simulations for two classes of metamaterial unit cells: phononic crystals and locally resonant metamaterials. We then use this model to demonstrate how spatially arranging multiple unit cells into metastructures can lead to tailored and unique properties such as spatially dependent broadband wave attenuation, rainbow trapping, and pulse shaping. For the broadband wave attenuation application, we show that by building layered metastructures from different metamaterial unit cells, we can slow down or stop wave packets in an enlarged frequency range while letting other frequencies through. For the rainbow-trapping application, we show that spatial arrangements of different unit cells can be designed to progressively slow down and eventually stop waves of different frequencies at different spatial locations. Finally, for the pulse-shaping application, our results show that heterogeneous metastructures can be designed to tailor the spatial profile of a propagating wave packet. Collectively, these results show the versatility of the relaxed micromorphic model for effectively and accurately simulating wave propagation in heterogeneous metastructures, and how this model can be used to design heterogeneous metastructures with tailored wave propagation functionalities.
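For intuition about the local-resonance bandgaps at play, the classic one-dimensional mass-in-mass chain (a textbook model, not one of the unit cells studied here) has the dispersion relation

\[
m_1 m_2\,\omega^4 - \left[(m_1 + m_2)\,k_2 + 2 m_2 k_1 \left(1 - \cos qa\right)\right]\omega^2 + 2 k_1 k_2 \left(1 - \cos qa\right) = 0,
\]

where $m_1$ is the outer mass coupled to its neighbors by springs $k_1$, $m_2$ is the internal resonator mass attached by a spring $k_2$, $q$ is the wavenumber, and $a$ is the lattice constant; a bandgap opens near the resonator frequency $\sqrt{k_2/m_2}$, which is the mechanism locally resonant unit cells exploit.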
Isocontours of Q-criterion with velocity visualized in the wake for two NREL 5-MW turbines operating under a uniform-inflow wind speed of 8 m/s. Simulation performed with the hybrid Nalu-Wind/AMR-Wind solver.
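For reference, the Q-criterion named in the caption is the standard vortex-identification quantity

\[
Q = \tfrac{1}{2}\left(\lVert \boldsymbol{\Omega} \rVert^2 - \lVert \mathbf{S} \rVert^2\right),
\]

where $\mathbf{S}$ and $\boldsymbol{\Omega}$ are the symmetric and antisymmetric parts of the velocity-gradient tensor; regions with $Q > 0$, where rotation dominates strain, mark vortex cores.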
Adcock, Christiane; Ananthan, Shreyas; Berger-Vergiat, Luc; Brazell, Michael; Brunhart-Lupo, Nicholas; Hu, Jonathan J.; Knaus, Robert C.; Melvin, Jeremy; Moser, Bob; Mullowney, Paul; Rood, Jon; Sharma, Ashesh; Thomas, Stephen; Vijayakumar, Ganesh; Williams, Alan B.; Wilson, Robert; Yamazaki, Ichitaro Y.; Sprague, Michael
The goal of the ExaWind project is to enable predictive simulations of wind farms composed of many megawatt-scale turbines situated in complex terrain. Predictive simulations will require computational fluid dynamics (CFD) simulations in which the mesh resolves the geometry of the turbines, captures the thin boundary layers, and captures the rotation and large deflections of the blades. Whereas such simulations for a single turbine are arguably petascale class, multi-turbine wind farm simulations will require exascale-class resources.
The atomic precision advanced manufacturing (APAM) enabled vertical tunneling field-effect transistor (TFET) presents a new opportunity in microelectronics thanks to its ultra-high doping and atomically abrupt doping profiles. We present modeling and assessment of the APAM TFET using the Charon TCAD simulator. First, we show, through a combination of simulation and experiment, that we can achieve good control of the gated channel on top of a phosphorus layer made using APAM, an essential part of the APAM TFET. Then, we present simulation results for a preliminary APAM TFET that predict a transistor-like current-voltage response, despite low device performance caused by the large geometry dimensions. Future device simulations will be needed to optimize geometry and doping to guide device design toward superior device performance.
We present an efficient self-consistent implementation of the Non-Equilibrium Green Function formalism, based on the Contact Block Reduction method for fast numerical efficiency, and the predictor-corrector approach, together with the Anderson mixing scheme, for the self-consistent solution of the Poisson and Schrödinger equations. We then apply this quantum transport framework to investigate 2D horizontal Si:P δ-layer tunnel junctions. We find that the potential barrier height varies with the tunnel gap width and the applied bias, and that the sign of a single charge impurity in the tunnel gap plays an important role in determining the electrical current.
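The Anderson mixing step used in the self-consistent loop can be sketched generically as follows: a standard Anderson acceleration routine for a fixed-point map $x \mapsto g(x)$, shown here on a toy contraction rather than the coupled Poisson-Schrödinger system.

```python
# Generic Anderson mixing for a fixed-point iteration x = g(x), the same
# acceleration idea used in self-consistent Poisson-Schrodinger loops.
# Shown on a toy map; not the actual NEGF implementation.
import numpy as np

def anderson(g, x0, m=5, tol=1e-10, max_iter=100):
    x = x0.copy()
    X, F = [], []                           # histories of iterates/residuals
    for k in range(max_iter):
        gx = g(x)
        f = gx - x                          # fixed-point residual
        if np.linalg.norm(f) < tol:
            return x, k
        X.append(x.copy()); F.append(f.copy())
        X, F = X[-(m + 1):], F[-(m + 1):]   # keep a window of m+1 entries
        if len(F) > 1:
            # Least-squares combination of recent residual differences.
            dF = np.stack([F[i + 1] - F[i] for i in range(len(F) - 1)], axis=1)
            dX = np.stack([X[i + 1] - X[i] for i in range(len(X) - 1)], axis=1)
            gamma, *_ = np.linalg.lstsq(dF, f, rcond=None)
            x = x + f - (dX + dF) @ gamma   # Anderson update (mixing beta = 1)
        else:
            x = gx                          # plain fixed-point step to start
    return x, max_iter

g = lambda x: np.cos(x)                     # toy component-wise contraction
x_star, iters = anderson(g, x0=np.ones(4))
print(iters, x_star)                        # ~0.739085 in each component
```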
This report includes a compilation of several slide presentations: 1) Interatomic Potentials for Materials Science and Beyond–Advances in Machine Learned Spectral Neighborhood Analysis Potentials (Wood); 2) Agile Materials Science and Advanced Manufacturing through AI/ML (de Oca Zapiain); 3) Machine Learning for DFT Calculations (Rajamanickam); 4) Structure-preserving ML discovery of a quantum-to-continuum codesign stack (Trask); and 5) IBM Overview of Accelerated Discovery Technology (Pitera).