The Opportunity and Challenge of Serverless Computing and Large Scale Computational Science
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Reverse engineering (RE) analysts struggle to address critical questions about the safety of binary code accurately and promptly, and their supporting program analysis tools are simply wrong sometimes. The analysis tools have to approximate in order to provide any information at all, but this means that they introduce uncertainty into their results. And those uncertainties chain from analysis to analysis. We hypothesize that exposing sources, impacts, and control of uncertainty to human binary analysts will allow the analysts to approach their hardest problems with high-powered analytic techniques that they know when to trust. Combining expertise in binary analysis algorithms, human cognition, uncertainty quantification, verification and validation, and visualization, we pursue research that should benefit binary software analysis efforts across the board. We find a strong analogy between RE and exploratory data analysis (EDA); we begin to characterize sources and types of uncertainty found in practice in RE (both in the process and in supporting analyses); we explore a domain-specific focus on uncertainty in pointer analysis, showing that more precise models do help analysts answer small information flow questions faster and more accurately; and we test a general population with domain-general sudoku problems, showing that adding "knobs" to an analysis does not significantly slow down performance. This document describes our explorations in uncertainty in binary analysis.
Graph algorithms enable myriad large-scale applications including cybersecurity, social network analysis, resource allocation, and routing. The scalability of current graph algorithm implementations on conventional computing architectures is hampered by the demise of Moore’s law. We present a theoretical framework for designing and assessing the performance of graph algorithms executing in networks of spiking artificial neurons. Although spiking neural networks (SNNs) are capable of general-purpose computation, few algorithmic results with rigorous asymptotic performance analysis are known. SNNs are exceptionally well-motivated practically, as neuromorphic computing systems with 100 million spiking neurons are available, and systems with a billion neurons are anticipated in the next few years. Beyond massive parallelism and scalability, neuromorphic computing systems offer energy consumption orders of magnitude lower than conventional high-performance computing systems. We employ our framework to design and analyze new spiking algorithms for shortest path and dynamic programming problems. Our neuromorphic algorithms are message-passing algorithms relying critically on data movement for computation. For fair and rigorous comparison with conventional algorithms and architectures, which is challenging but paramount, we develop new models of data movement in conventional computing architectures. This allows us to prove polynomial-factor advantages, even when we assume an SNN consisting of a simple grid-like network of neurons. To the best of our knowledge, this is one of the first examples of a rigorous asymptotic computational advantage for neuromorphic computing.
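The spike-propagation idea behind shortest-path computation can be illustrated with a minimal sketch. This is an assumption-laden stand-in, not the paper's actual algorithm or analysis: we assume a unit-weight graph and threshold-1 integrate-and-fire neurons, so a neuron fires the first time it receives any spike, and the timestep of its first spike equals its graph distance from the source.

```python
def spiking_shortest_path(adj, source):
    """Unit-weight shortest paths by spike propagation (illustrative sketch).

    adj: dict mapping each node to a list of neighbor nodes; one neuron
    per node, one synapse per edge. Neurons fire at most once, and the
    timestep of a neuron's first spike is its distance from the source.
    """
    fired = {source: 0}      # node -> timestep of first spike
    frontier = [source]      # neurons that fired on the current timestep
    t = 0
    while frontier:
        t += 1
        next_frontier = []
        for u in frontier:
            for v in adj[u]:           # spike travels along each synapse
                if v not in fired:     # unfired neuron crosses threshold
                    fired[v] = t
                    next_frontier.append(v)
        frontier = next_frontier
    return fired
```

On a diamond-shaped graph the spike wavefront reaches each node at a timestep equal to its hop distance, which is the message-passing character of the neuromorphic algorithms described above in miniature.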
Abstract not provided.
Abstract not provided.
Semiconductor Science and Technology
To support the increasing demands for efficient deep neural network processing, accelerators based on analog in-memory computation of matrix multiplication have recently gained significant attention for reducing the energy of neural network inference. However, analog processing within memory arrays must contend with the issue of parasitic voltage drops across the metal interconnects, which distort the results of the computation and limit the array size. This work analyzes how parasitic resistance affects the end-to-end inference accuracy of state-of-the-art convolutional neural networks, and comprehensively studies how various design decisions at the device, circuit, architecture, and algorithm levels affect the system's sensitivity to parasitic resistance effects. A set of guidelines are provided for how to design analog accelerator hardware that is intrinsically robust to parasitic resistance, without any explicit compensation or re-training of the network parameters.
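The distortion mechanism can be illustrated with a crude first-order sketch. The distance-scaled effective-conductance model below and all function names are our own simplifying assumptions; the work itself considers full device-, circuit-, architecture-, and algorithm-level effects, not this toy model, which merely shows how wire resistance along longer current paths attenuates a crossbar's dot products.

```python
import numpy as np

def ideal_mvm(G, v):
    """Ideal analog matrix-vector multiply: column current j = sum_i G[i,j]*v[i]."""
    return G.T @ v

def parasitic_mvm(G, v, r_wire):
    """Crude first-order parasitic model (illustrative, not a nodal solve).

    Device (i, j) is assumed to see series wire resistance proportional to
    its distance from the drivers/sensors, which reduces its effective
    conductance and distorts the computed dot product.
    """
    m, n = G.shape
    dist = np.add.outer(np.arange(m), np.arange(n)) + 1  # path length per cell
    G_eff = G / (1.0 + r_wire * dist * G)                # divider attenuation
    return G_eff.T @ v
```

With `r_wire = 0` the model reduces to the ideal multiply; for positive wire resistance every contribution shrinks, and cells far from the edges shrink the most, mirroring why parasitic voltage drops limit array size.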
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
The Computer Science Research Institute (CSRI) brings university faculty and students to Sandia National Laboratories for focused collaborative research on Department of Energy (DOE) computer and computational science problems. The institute provides an opportunity for university researchers to learn about problems in computer and computational science at DOE laboratories, and to help transfer results of their research to programs at the labs. Some specific CSRI research interest areas are: scalable solvers, optimization, algebraic preconditioners, graph-based, discrete, and combinatorial algorithms, uncertainty estimation, validation and verification methods, mesh generation, dynamic load-balancing, virus and other malicious-code defense, visualization, scalable cluster computers, beyond Moore’s Law computing, exascale computing tools and application design, reduced order and multiscale modeling, parallel input/output, and theoretical computer science. The CSRI Summer Program is organized by CSRI and includes a weekly seminar series and the publication of a summer proceedings.
Abstract not provided.
Abstract not provided.
CSPlib is an open source software library for analyzing general ordinary differential equation (ODE) systems and detailed chemical kinetic ODE/DAE systems. It relies on the computational singular perturbation (CSP) method for the analysis of these systems.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Journal of Mechanical Design
Bayesian optimization (BO) is an efficient and flexible global optimization framework that is applicable to a very wide range of engineering applications. To extend the capability of classical BO, many extensions, including multi-objective, multi-fidelity, parallelization, and latent-variable modeling, have been proposed to address its limitations. In this work, we propose a novel multi-objective BO formalism, called srMO-BO-3GP, to solve multi-objective optimization problems in a sequential setting. Three different Gaussian processes (GPs) are stacked together, where each GP is assigned a different task. The first GP is used to approximate a single objective computed from the multi-objective definition, the second GP is used to learn the unknown constraints, and the third one is used to learn the uncertain Pareto frontier. At each iteration, a multi-objective augmented Tchebycheff function is adopted to convert the multi-objective problem to a single-objective one, and a regularized ridge term is introduced to smooth the single-objective function. Finally, we couple the third GP with the classical BO framework to explore the convergence and diversity of the Pareto frontier through the acquisition function's balance of exploitation and exploration. The proposed framework is demonstrated using several numerical benchmark functions, as well as a thermomechanical finite element model for flip-chip package design optimization.
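The augmented Tchebycheff scalarization used to convert the multi-objective problem into a single-objective one has a standard closed form; a minimal sketch follows. The function name, example weights, and the choice of the augmentation coefficient rho are our own illustrative assumptions, not values from the srMO-BO-3GP work.

```python
import numpy as np

def augmented_tchebycheff(f, w, z_star, rho=0.05):
    """Augmented Tchebycheff scalarization (illustrative sketch).

    f:      vector of objective values for one design point
    w:      positive weight vector (one weight per objective)
    z_star: ideal (utopia) point, componentwise minimum of each objective
    rho:    small augmentation coefficient that smooths the max term
    """
    d = w * np.abs(f - z_star)        # weighted distance to the ideal point
    return np.max(d) + rho * np.sum(d)
```

Sweeping the weight vector `w` traces out different points on the Pareto frontier, which is why a scalarization of this kind is a convenient bridge between a multi-objective problem and a single-objective BO loop.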
Neuromorphic computers are hardware systems that mimic the brain’s computational process phenomenology. This is in contrast to neural network accelerators, such as the Google TPU or the Intel Neural Compute Stick, which seek to accelerate the fundamental computation and data flows of neural network models used in the field of machine learning. Neuromorphic computers emulate the integrate and fire neuron dynamics of the brain to achieve a spiking communication architecture for computation. While neural networks are brain-inspired, they drastically oversimplify the brain’s computation model. Neuromorphic architectures are closer to the true computation model of the brain (albeit, still simplified). Neuromorphic computing models herald a 1000x power improvement over conventional CPU architectures. Sandia National Labs is a major contributor to the research community on neuromorphic systems by performing design analysis, evaluation, and algorithm development for neuromorphic computers. Space-based remote sensing development has been a focused target of funding for exploratory research into neuromorphic systems for their potential advantage in that program area; SNL has led some of these efforts. Recently, neuromorphic application evaluation has reached the NA-22 program area. This same exploratory research and algorithm development should penetrate the unattended ground sensor space for SNL’s mission partners and program areas. Neuromorphic computing paradigms offer a distinct advantage for the SWaP-constrained embedded systems of our diverse sponsor-driven program areas.
The recent introduction of a new generation of "smart NICs" has provided new accelerator platforms that include CPU cores or reconfigurable fabric in addition to traditional networking hardware and packet offloading capabilities. While there are currently several proposals for using these smart NICs for low-latency, in-line packet processing operations, there remains a gap in knowledge as to how they might be used as computational accelerators for traditional high-performance applications. This work uses benchmarks and mini-applications to evaluate possible benefits of using a smart NIC as a compute accelerator for HPC applications. We investigate NVIDIA's current-generation BlueField-2 card, which includes eight Arm CPU cores along with a small amount of storage, and we test the networking and data movement performance of these cards compared to a standard Intel server host. We then detail how two different applications, YASK and miniMD, can be modified to make more efficient use of the BlueField-2 device, with a focus on overlapping computation and communication for operations like neighbor building and halo exchanges. Our results show that while the overall compute performance of these devices is limited, using them with a modified miniMD algorithm allows for potential speedups of 5 to 20% over the host CPU baseline with no loss in simulation accuracy.
Nano Letters
We demonstrate the ability to fabricate vertically stacked Si quantum dots (QDs) within SiGe nanowires with QD diameters down to 2 nm. These QDs are formed during high-temperature dry oxidation of Si/SiGe heterostructure pillars, during which Ge diffuses along the pillars' sidewalls and encapsulates the Si layers. Continued oxidation results in QDs with sizes dependent on oxidation time. The formation of a Ge-rich shell that encapsulates the Si QDs is observed, a configuration which is confirmed to be thermodynamically favorable with molecular dynamics and density functional theory. The type-II band alignment of the Si dot/SiGe pillar suggests that charge trapping on the Si QDs is possible, and electron energy loss spectra show that a conduction band offset of at least 200 meV is maintained for even the smallest Si QDs. Our approach is compatible with current Si-based manufacturing processes, offering a new avenue for realizing Si QD devices.
We develop and analyze an optimization-based method for the coupling of a static peridynamic (PD) model and a static classical elasticity model. The approach formulates the coupling as a control problem in which the states are the solutions of the PD and classical equations, the objective is to minimize their mismatch on an overlap of the PD and classical domains, and the controls are virtual volume constraints and boundary conditions applied at the local-nonlocal interface. Our numerical tests performed on three-dimensional geometries illustrate the consistency and accuracy of our method, its numerical convergence, and its applicability to realistic engineering geometries. We demonstrate the coupling strategy as a means to reduce computational expense by confining the nonlocal model to a subdomain of interest, and as a means to transmit local (e.g., traction) boundary conditions applied at a surface to a nonlocal model in the bulk of the domain.
The U.S. Army Research Office (ARO), in partnership with IARPA, is investigating innovative, efficient, and scalable computer architectures that are capable of executing next-generation large scale data-analytic applications. These applications are increasingly sparse, unstructured, non-local, and heterogeneous. Under the Advanced Graphic Intelligence Logical computing Environment (AGILE) program, Performer teams will be asked to design computer architectures to meet the future needs of the DoD and the Intelligence Community (IC). This design effort will require flexible, scalable, and detailed simulation to assess the performance, efficiency, and validity of their designs. To support AGILE, Sandia National Labs will be providing the AGILE-enhanced Structural Simulation Toolkit (A-SST). This toolkit is a computer architecture simulation framework designed to support fast, parallel, and multi-scale simulation of novel architectures. This document describes the A-SST framework, some of its library of simulation models, and how it may be used by AGILE Performers.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
This project demonstrates that Chapel programs can interface with MPI-based libraries written in C++ without storing multiple copies of shared data. Chapel is a language for productive parallel computing using partitioned global address spaces (PGAS). We identified two approaches to interface Chapel code with the MPI-based Grafiki and Trilinos libraries. The first uses a single Chapel executable to call a C function that interacts with the C++ libraries. The second uses the mmap function to allow separate executables to read and write to the same block of memory on a node. We also encapsulated the second approach in Docker/Singularity containers to maximize ease of use. Comparisons of the two approaches using shared and distributed memory installations of Chapel show that both approaches provide similar scalability and performance.
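The second approach rests on a general operating-system mechanism: processes (or file handles) that map the same file see the same physical pages, so a write through one mapping is visible through the other without copying. The Python sketch below illustrates only that mechanism; the actual Chapel/C++ wiring, and every name used here, are our own illustrative assumptions, not details from the project.

```python
import mmap
import os
import tempfile

# Create a small backing file that both mappings will share.
fd, path = tempfile.mkstemp()
os.write(fd, b"\x00" * 4096)

with open(path, "r+b") as writer, open(path, "r+b") as reader:
    # Two independent mappings of the same file share the same pages,
    # analogous to a Chapel executable and a C++ library process
    # exchanging data through one block of node-local memory.
    m_writer = mmap.mmap(writer.fileno(), 4096)
    m_reader = mmap.mmap(reader.fileno(), 4096)

    m_writer[:5] = b"hello"        # "producer" side writes
    view = bytes(m_reader[:5])     # "consumer" side sees the same bytes

    m_writer.close()
    m_reader.close()

os.close(fd)
os.unlink(path)
```

Because no bytes are serialized or copied between the two sides, this is the property that lets separate executables avoid storing duplicate copies of shared data.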
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Journal of the Mechanics and Physics of Solids
Metamaterials are artificial structures that can manipulate and control sound waves in ways not possible with conventional materials. While much effort has been undertaken to widen the bandgaps produced by these materials through design of heterogeneities within unit cells, comparatively little work has considered the effect of engineering heterogeneities at the structural scale by combining different types of unit cells. In this paper, we use the relaxed micromorphic model to study wave propagation in heterogeneous metastructures composed of different unit cells. We first establish the efficacy of the relaxed micromorphic model for capturing the salient characteristics of dispersive wave propagation through comparisons with direct numerical simulations for two classes of metamaterial unit cells: namely phononic crystals and locally resonant metamaterials. We then use this model to demonstrate how spatially arranging multiple unit cells into metastructures can lead to tailored and unique properties such as spatially-dependent broadband wave attenuation, rainbow trapping, and pulse shaping. In the case of the broadband wave attenuation application, we show that by building layered metastructures from different metamaterial unit cells, we can slow down or stop wave packets in an enlarged frequency range, while letting other frequencies through. In the case of the rainbow-trapping application, we show that spatial arrangements of different unit cells can be designed to progressively slow down and eventually stop waves with different frequencies at different spatial locations. Finally, in the case of the pulse-shaping application, our results show that heterogeneous metastructures can be designed to tailor the spatial profile of a propagating wave packet. 
Collectively, these results show the versatility of the relaxed micromorphic model for effectively and accurately simulating wave propagation in heterogeneous metastructures, and how this model can be used to design heterogeneous metastructures with tailored wave propagation functionalities.
Abstract not provided.
As the number of supported platforms for SNL software increases, so do the testing requirements. This increases the total time between when a developer submits code for testing and when the tests complete. This in turn leads developers to delay submitting code for testing, so that when code is finally submitted there is much more of it. That increases the likelihood of merge conflicts, which the developer must resolve by hand because someone else touched the files near the lines the developer changed. Current text-based diff tools often have trouble resolving conflicts in these cases. Work in Europe and Japan has demonstrated that programming-language-aware diff tools (e.g., tools that use the abstract syntax tree (AST) a compiler might generate) can reduce the manual labor necessary to resolve merge conflicts. These techniques can detect code blocks that have moved, unlike current text-based diff tools, which only detect insertions and deletions of text blocks. In this study, we evaluate one such tool, GumTree, to see how effective it is as a replacement for traditional text-based diff approaches.
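The advantage of AST-aware diffing for moved code can be sketched in a few lines, using Python's `ast` and `difflib` modules as stand-ins: a text diff reports a pure move as deletions plus insertions, while comparing normalized ASTs of each function shows that nothing structural changed. This is an illustration of the general idea, not GumTree's actual algorithm.

```python
import ast
import difflib

# Two versions of a file in which the functions were merely reordered.
before = "def a():\n    return 1\n\ndef b():\n    return 2\n"
after = "def b():\n    return 2\n\ndef a():\n    return 1\n"

# A text diff sees the move as removed and added lines...
text_changes = [
    line
    for line in difflib.unified_diff(
        before.splitlines(), after.splitlines(), lineterm=""
    )
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]

# ...while comparing the normalized AST of each function shows a pure move.
def func_dumps(src):
    """Map function name -> position-independent AST dump."""
    return {
        node.name: ast.dump(node)
        for node in ast.parse(src).body
        if isinstance(node, ast.FunctionDef)
    }

moved_only = func_dumps(before) == func_dumps(after)
```

A merge tool that knows the two versions are AST-identical up to reordering has far less for the developer to resolve by hand than one that sees only changed lines.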
This report provides detailed documentation of the algorithms that were developed and implemented in the Plato software over the course of the Optimization-based Design for Manufacturing LDRD project.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
As noise limits the performance of quantum processors, the ability to characterize this noise and develop methods to overcome it is essential for the future of quantum computing. In this report, we develop a complete set of tools for improving quantum processor performance at the application level, including low-level physical models of quantum gates, a numerically efficient method of producing process matrices that span a wide range of model parameters, and full-channel quantum simulations. We then provide a few examples of how to use these tools to study the effects of noise on quantum circuits.
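A full-channel simulation applies a noise channel, for example via Kraus operators, to a density matrix. The single-qubit sketch below uses a textbook depolarizing channel chosen purely for illustration; it is not necessarily one of the report's low-level gate models, and all names are our own.

```python
import numpy as np

# Single-qubit Pauli matrices.
I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def depolarizing_kraus(p):
    """Kraus operators of the single-qubit depolarizing channel."""
    return [
        np.sqrt(1 - 3 * p / 4) * I,
        np.sqrt(p / 4) * X,
        np.sqrt(p / 4) * Y,
        np.sqrt(p / 4) * Z,
    ]

def apply_channel(rho, kraus):
    """Apply a quantum channel rho -> sum_k K rho K^dagger."""
    return sum(K @ rho @ K.conj().T for K in kraus)
```

At `p = 1` the channel maps any input state to the maximally mixed state, and the Kraus completeness relation sum_k K^dagger K = I holds for all p, two quick sanity checks of the kind a full-channel simulator must satisfy.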
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Borrowing from nature, neural-inspired interception algorithms were implemented onboard a vehicle. To maximize success, work was conducted in parallel within a simulated environment and on physical hardware. The intercept vehicle used only optical imaging to detect and track the target. A successful outcome is the proof-of-concept demonstration of a neural-inspired algorithm autonomously guiding a vehicle to intercept a moving target. This work sought to establish the key parameters for the intercept algorithm (sensors and vehicle) and to expand the knowledge and capabilities of implementing neural-inspired algorithms in simulation and on hardware.
Abstract not provided.
This Laboratory Directed Research and Development project developed and applied closely coupled experimental and computational tools to investigate powder compaction across multiple length scales. The primary motivation for this work is to provide connections between powder feedstock characteristics, processing conditions, and powder pellet properties in the context of powder-based energetic components manufacturing. We have focused our efforts on microcrystalline cellulose, a molecular crystalline surrogate material that is mechanically similar to several energetic materials of interest, but provides several advantages for fundamental investigations. We report extensive experimental characterization ranging in length scale from nanometers to macroscopic, bulk behavior. Experiments included nanoindentation of well-controlled, micron-scale pillar geometries milled into the surface of individual particles, single-particle crushing experiments, in-situ optical and computed tomography imaging of the compaction of multiple particles in different geometries, and bulk powder compaction. In order to capture the large plastic deformation and fracture of particles in computational models, we have advanced two distinct meshfree Lagrangian simulation techniques: (1) bonded particle methods, which extend existing discrete element method capabilities in the Sandia-developed, open-source LAMMPS code to capture particle deformation and fracture, and (2) extensions of peridynamics for application to mesoscale powder compaction, including a novel material model that includes plasticity and creep. We have demonstrated both methods for simulations of single-particle crushing as well as mesoscale multi-particle compaction, with favorable comparisons to experimental data. We have used small-scale mechanical characterization data to inform material models, and in-situ imaging of mesoscale particle structures to provide initial conditions for simulations.
Both mesostructure porosity characteristics and overall stress-strain behavior were found to be in good agreement between simulations and experiments. We have thus demonstrated a novel multi-scale, closely coupled experimental and computational approach to the study of powder compaction. This enables a wide range of possible investigations into feedstock-process-structure relationships in powder-based materials, with immediate applications in energetic component manufacturing, as well as other particle-based components and processes.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
This report provides detailed documentation of the algorithms that were developed and implemented in the Plato software over the course of the Optimization-based Design for Manufacturing LDRD project.
Part distortion and residual stress are critical factors for metal additive manufacturing (AM) because they can lead to high failure rates during both manufacturing and service. We present a topology optimization approach that incorporates a fast AM process simulation at each design iteration to provide predictions of manufacturing outcomes (i.e., residual stress, distortion, residual elastic energy) that can be optimized or constrained. The details of the approach and implementation are discussed, and an example design is presented that illustrates the efficacy of the method.
Abstract not provided.
Abstract not provided.
Abstract not provided.
In this paper, we develop a method which we call OnlineGCP for computing the Generalized Canonical Polyadic (GCP) tensor decomposition of streaming data. GCP differs from traditional canonical polyadic (CP) tensor decompositions as it allows for arbitrary objective functions which the CP model attempts to minimize. This approach can provide better fits and more interpretable models when the observed tensor data is strongly non-Gaussian. In the streaming case, tensor data is gradually observed over time and the algorithm must incrementally update a GCP factorization with limited access to prior data. In this work, we extend the GCP formalism to the streaming context by (i) deriving a GCP optimization problem to be solved as new tensor data is observed, (ii) formulating a tunable history term to balance reconstruction of recently observed data with data observed in the past, (iii) developing a scalable solution strategy based on segregated solves using stochastic gradient descent methods, (iv) describing a software implementation that provides performance and portability to contemporary CPU and GPU architectures and integrates with Matlab for enhanced usability, and (v) demonstrating the utility and performance of the approach and software on several synthetic and real tensor data sets.
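The shape of the streaming objective, a fit to newly observed data plus a tunable history term under an arbitrary elementwise loss, can be sketched as follows. We use a rank-R matrix factorization as a stand-in for the tensor case, and the function names and example losses are illustrative assumptions of ours, not the OnlineGCP formulation or API.

```python
import numpy as np

def streaming_loss(A, B, X_new, X_hist, mu, f):
    """Illustrative streaming objective for one update step.

    A, B:   current factor matrices (rank-R model M = A B^T)
    X_new:  newly observed data slice
    X_hist: proxy for previously observed data (e.g. its reconstruction)
    mu:     tunable history weight balancing new data against the past
    f:      arbitrary elementwise loss, as in the GCP formalism
    """
    M = A @ B.T                              # current low-rank model
    fit_new = np.sum(f(X_new, M))            # fit to the new observation
    fit_hist = np.sum(f(X_hist, M))          # fidelity to past data
    return fit_new + mu * fit_hist

# Two example elementwise GCP-style losses.
gauss = lambda x, m: (x - m) ** 2                  # Gaussian data
poisson = lambda x, m: m - x * np.log(m + 1e-12)   # Poisson counts (m > 0)
```

Setting `mu = 0` discards history entirely, while larger `mu` anchors the updated factors to what was learned from past observations, the trade-off the tunable history term controls.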
Abstract not provided.
In this project we developed and validated algorithms for privacy-preserving linear regression using a new variant of Secure Multiparty Computation (MPC) we call "Hybrid MPC" (hMPC). Our variant is intended to support low-power, unreliable networks of sensors with low-communication, fault-tolerant algorithms. In hMPC we do not share training data, even via secret sharing. Thus, agents are responsible for protecting their own local data. Only the machine learning (ML) model is protected with information-theoretic security guarantees against honest-but-curious agents. There are three primary advantages to this approach: (1) after setup, hMPC supports a communication-efficient matrix multiplication primitive, (2) organizations prevented by policy or technology from sharing any of their data can participate as agents in hMPC, and (3) large numbers of low-power agents can participate in hMPC. We have also created an open-source software library named "Cicada" to support hMPC applications with fault-tolerance. The fault-tolerance is important in our applications because the agents are vulnerable to failure or capture. We have demonstrated this capability at Sandia's Autonomy New Mexico laboratory through a simple machine-learning exercise with Raspberry Pi devices capturing and classifying images while flying on four drones.
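The information-theoretic guarantee in MPC schemes of this kind typically rests on additive secret sharing over a finite field: each share alone is uniformly random, and only the sum of all shares reveals the secret. The sketch below illustrates that primitive only; the modulus and function names are our own, and this is not the Cicada API.

```python
import secrets

P = 2**61 - 1  # a public prime modulus (illustrative choice)

def share(x, n):
    """Split integer x into n additive shares modulo P.

    Any n-1 shares are uniformly random and reveal nothing about x;
    all n shares together reconstruct it exactly.
    """
    shares = [secrets.randbelow(P) for _ in range(n - 1)]
    shares.append((x - sum(shares)) % P)
    return shares

def reconstruct(shares):
    """Recover the secret by summing all shares modulo P."""
    return sum(shares) % P
```

In a setting like hMPC's, it is the model parameters, not the training data, that would be protected this way, so honest-but-curious agents holding individual shares learn nothing about the jointly trained model.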
Isocontours of Q-criterion with velocity visualized in the wake for two NREL 5-MW turbines operating under uniform-inflow wind speed of 8 m/s. Simulation performed with the hybrid-Nalu-Wind/AMR-Wind solver.
The goal of the ExaWind project is to enable predictive simulations of wind farms composed of many megawatt-scale turbines situated in complex terrain. Predictive simulations will require computational fluid dynamics (CFD) simulations in which the mesh resolves the geometry of the turbines, captures the thin boundary layers, and captures the rotation and large deflections of the blades. Whereas such simulations for a single turbine are arguably petascale class, multi-turbine wind farm simulations will require exascale-class resources.
This report includes a compilation of several slide presentations: 1) Interatomic Potentials for Materials Science and Beyond–Advances in Machine Learned Spectral Neighborhood Analysis Potentials (Wood); 2) Agile Materials Science and Advanced Manufacturing through AI/ML (de Oca Zapiain); 3) Machine Learning for DFT Calculations (Rajamanickam); 4) Structure-preserving ML discovery of a quantum-to-continuum codesign stack (Trask); and 5) IBM Overview of Accelerated Discovery Technology (Pitera)
International Conference on Simulation of Semiconductor Processes and Devices, SISPAD
We present an efficient self-consistent implementation of the Non-Equilibrium Green Function formalism, based on the Contact Block Reduction method for fast numerical efficiency, and the predictor-corrector approach, together with the Anderson mixing scheme, for the self-consistent solution of the Poisson and Schrödinger equations. Then, we apply this quantum transport framework to investigate 2D horizontal Si:P δ-layer Tunnel Junctions. We find that the potential barrier height varies with the tunnel gap width and the applied bias and that the sign of a single charge impurity in the tunnel gap plays an important role in the electrical current.
2021 International Conference on Simulation of Semiconductor Processes and Devices (SISPAD)
Abstract not provided.
International Conference on Simulation of Semiconductor Processes and Devices, SISPAD
The atomic precision advanced manufacturing (APAM) enabled vertical tunneling field effect transistor (TFET) presents a new opportunity in microelectronics thanks to the use of ultra-high doping and atomically abrupt doping profiles. We present modeling and assessment of the APAM TFET using TCAD Charon simulation. First, we show, through a combination of simulation and experiment, that we can achieve good control of the gated channel on top of a phosphorus layer made using APAM, an essential part of the APAM TFET. Then, we present simulation results of a preliminary APAM TFET that predict transistor-like current-voltage response despite low device performance caused by using large geometry dimensions. Future device simulations will be needed to optimize geometry and doping to guide device design for achieving superior device performance.
Computer Physics Communications
Since the classical molecular dynamics simulator LAMMPS was released as an open source code in 2004, it has become a widely-used tool for particle-based modeling of materials at length scales ranging from atomic to mesoscale to continuum. Reasons for its popularity are that it provides a wide variety of particle interaction models for different materials, that it runs on any platform from a single CPU core to the largest supercomputers with accelerators, and that it gives users control over simulation details, either via the input script or by adding code for new interatomic potentials, constraints, diagnostics, or other features needed for their models. As a result, hundreds of people have contributed new capabilities to LAMMPS and it has grown from fifty thousand lines of code in 2004 to a million lines today. In this paper several of the fundamental algorithms used in LAMMPS are described along with the design strategies which have made it flexible for both users and developers. We also highlight some capabilities recently added to the code which were enabled by this flexibility, including dynamic load balancing, on-the-fly visualization, magnetic spin dynamics models, and quantum-accuracy machine learning interatomic potentials.
Chemistry - A European Journal
Ultradoping introduces unprecedented dopant levels into Si, which transforms its electronic behavior and enables its use as a next-generation electronic material. Commercialization of ultradoping is currently limited by gas-phase ultra-high vacuum requirements. Solvothermal chemistry is amenable to scale-up. However, an integral part of ultradoping is a direct chemical bond between dopants and Si, and solvothermal dopant-Si surface reactions are not well-developed. This work provides the first quantified demonstration of achieving ultradoping concentrations of boron (∼10^14 cm^-2) by using a solvothermal process. Surface characterizations indicate the catalyst cross-reacted, which led to multiple surface products and caused ambiguity in experimental confirmation of direct surface attachment. Density functional theory computations elucidate that the reaction results in direct B−Si surface bonds. This proof-of-principle work lays groundwork for emerging solvothermal ultradoping processes.
Abstract not provided.