We present a new method for reducing parallel applications’ communication time by mapping their MPI tasks to processors in a way that lowers the distance messages travel and the amount of congestion in the network. Assuming that geometric proximity among tasks is a good approximation of their communication interdependence, we use a geometric partitioning algorithm to order both the tasks and the processors, assigning task parts to the corresponding processor parts. In this way, interdependent tasks are assigned to “nearby” cores in the network. We also present a number of algorithmic optimizations that exploit specific features of the network or application to further improve the quality of the mapping. We specifically address the case of sparse node allocation, where the nodes assigned to a job are not necessarily located in a contiguous block nor within close proximity to each other in the network. However, our methods generalize to contiguous allocations as well, and results are shown for both contiguous and non-contiguous allocations. We show that, for the structured finite difference mini-application MiniGhost, our mapping methods reduced communication time by up to 75% relative to MiniGhost’s default mapping on 128K cores of a Cray XK7 with sparse allocation. For the atmospheric modeling code E3SM/HOMME, our methods reduced communication time by up to 31% on 16K cores of an IBM BlueGene/Q with contiguous allocation.
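A minimal sketch of the geometric mapping idea described above, assuming a simple recursive coordinate bisection (RCB) ordering applied to both the task coordinates and the allocated cores' network coordinates; the coordinates, part sizes, and function names here are illustrative assumptions and do not reproduce the authors' implementation or its network-specific optimizations.

```python
# Order tasks and cores with the same recursive coordinate bisection, then map the
# i-th task in task-order to the i-th core in core-order, so geometrically nearby
# tasks land on cores that are nearby in the network.
import numpy as np

def rcb_order(coords):
    """Return point indices ordered by recursive coordinate bisection."""
    coords = np.asarray(coords, dtype=float)

    def recurse(ids):
        if len(ids) <= 1:
            return list(ids)
        pts = coords[ids]
        dim = np.argmax(pts.max(axis=0) - pts.min(axis=0))      # split longest direction
        order = ids[np.argsort(pts[:, dim], kind="stable")]
        half = len(order) // 2
        return recurse(order[:half]) + recurse(order[half:])

    return recurse(np.arange(len(coords)))

def geometric_map(task_coords, core_coords):
    """Assign each task to a core by matching the two RCB orderings."""
    task_order = rcb_order(task_coords)
    core_order = rcb_order(core_coords)
    return {t: c for t, c in zip(task_order, core_order)}

# Example: 8 tasks along a line mapped onto 8 cores with 3D (sparse) coordinates.
tasks = [(x, 0.0, 0.0) for x in range(8)]
cores = [(x % 2, (x // 2) % 2, x // 4) for x in range(8)]
print(geometric_map(tasks, cores))
```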
Under high-rate loading in tension, metals can sustain much larger tensile stresses for sub-microsecond time periods than would be possible under quasi-static conditions. This type of failure, known as spall, is not adequately reproduced by hydrocodes with commonly used failure models. The Spall Kinetics Model treats spall by incorporating a time scale into the process of failure. Under sufficiently strong tensile states of stress, damage accumulates over this time scale, which can be thought of as an incubation time. The time scale depends on the previous loading history of the material, reflecting possible damage by a shock wave. The model acts by modifying the hydrostatic pressure that is predicted by any equation of state and is therefore simple to implement. Examples illustrate the ability of the model to reproduce the spall stress and resulting release waves in plate impact experiments on stainless steel.
Over the last decade, hardware advances have led to the feasibility of training and inference for very large deep neural networks. Sparsified deep neural networks (DNNs) can greatly reduce memory costs and increase throughput of standard DNNs, if loss of accuracy can be controlled. The IEEE HPEC Sparse Deep Neural Network Graph Challenge serves as a testbed for algorithmic and implementation advances to maximize computational performance of sparse deep neural networks. We base our sparse DNN implementation, KK-SpDNN, on the sparse linear algebra kernels within the Kokkos Kernels library. Using the sparse matrix-matrix multiplication in Kokkos Kernels allows us to reuse a highly optimized kernel. We focus on reducing the single-node and multi-node runtimes for 12 sparse networks. We test KK-SpDNN on Intel Skylake and Knights Landing architectures and see a 120-500x improvement in single-node performance over the serial reference implementation. We run in data-parallel mode with MPI to further speed up network inference, ultimately obtaining an edge processing rate of 1.16e+12 edges/second on 20 Skylake nodes. This translates to a 13x speedup on 20 nodes compared to our highly optimized multithreaded implementation on a single Skylake node.
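For illustration only, a SciPy sketch of the layer update the abstract describes: each layer is a sparse matrix-matrix multiply followed by a biased ReLU. KK-SpDNN performs this sequence with the optimized SpGEMM in Kokkos Kernels, which is not reproduced here; the bias handling and the random data below are illustrative assumptions.

```python
import numpy as np
import scipy.sparse as sp

def sparse_dnn_inference(Y0, weights, bias):
    """Propagate the sparse feature matrix Y0 through the sparse layers in `weights`."""
    Y = Y0.tocsr()
    for W in weights:
        Z = (Y @ W).tocsr()               # sparse matrix-matrix multiply (SpGEMM)
        Z.data += bias                    # bias applied to stored (nonzero) entries
        Z.data = np.maximum(Z.data, 0.0)  # ReLU
        Z.eliminate_zeros()               # keep the result sparse for the next layer
        Y = Z
    return Y

# Tiny example: 4 inputs, 6 features, 3 random sparse layers.
Y0 = sp.random(4, 6, density=0.5, format="csr", random_state=1)
layers = [sp.random(6, 6, density=0.3, format="csr", random_state=s) for s in (2, 3, 4)]
print(sparse_dnn_inference(Y0, layers, bias=-0.1).toarray())
```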
In this work we present a performance exploration of Eager K-truss, a linear-algebraic formulation of the K-truss graph algorithm. We address performance issues related to load imbalance of parallel tasks in symmetric, triangular graphs by presenting a fine-grained parallel approach to executing the support computation. This approach also increases available parallelism, making it amenable to GPU execution. We demonstrate our fine-grained parallel approach using implementations in Kokkos and evaluate them on an Intel Skylake CPU and an Nvidia Tesla V100 GPU. Overall, we observe a 1.26-1.48x improvement on the CPU and a 9.97-16.92x improvement on the GPU due to our fine-grained parallel formulation.
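A plain serial sketch of what the k-truss "support" kernel computes: for every edge, the number of triangles containing it, with edges of support below k-2 peeled away. This only illustrates the definition; the paper's contribution is a fine-grained parallel, linear-algebraic (Kokkos) formulation of this computation, which is not reproduced here.

```python
from collections import defaultdict

def k_truss(edges, k):
    """Return the edges of the k-truss of an undirected graph given as an edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    edge_set = {tuple(sorted(e)) for e in edges}

    changed = True
    while changed:
        changed = False
        for u, v in list(edge_set):
            support = len(adj[u] & adj[v])     # triangles through edge (u, v)
            if support < k - 2:                # not enough support: peel the edge
                edge_set.discard((u, v))
                adj[u].discard(v)
                adj[v].discard(u)
                changed = True
    return edge_set

# Example: a 4-clique plus a pendant edge; the 3-truss keeps only the clique edges.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)]
print(sorted(k_truss(edges, 3)))
```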
This report summarizes the work performed under a three-year LDRD project aimed at developing mathematical and software foundations for compatible meshfree and particle discretizations. We review major technical accomplishments and project metrics such as publications, conference and colloquia presentations, and the organization of special sessions and minisymposia. The report concludes with a brief summary of ongoing projects and collaborations that utilize the products of this work.
This research aims to develop brain-inspired solutions for reliable and adaptive autonomous navigation in systems that have limited internal and external sensors and may not have access to reliable GPS information. The algorithm investigation and development in this project was performed in the context of Sandia's A4H (autonomy for hypersonics) mission campaign. These algorithms were additionally explored with respect to their suitability for implementation on emerging neuromorphic computing hardware technology. This project is premised on the hypothesis that brain-inspired SLAM (simultaneous localization and mapping) algorithms may provide an energy-efficient, context-flexible approach to robust sensor-based, real-time navigation.
The failure of subsurface seals (e.g., wellbores, and shaft and drift seals in a deep geologic nuclear waste repository) has important implications for US energy security. The performance of these cementitious seals is controlled by a combination of chemical and mechanical forces, coupled processes that occur over multiple length scales. The goal of this work is to improve fundamental understanding of cement-geomaterial interfaces and develop tools and methodologies to characterize and predict performance of subsurface seals. This project utilized a combined experimental and modeling approach to better understand failure at cement-geomaterial interfaces. Cutting-edge experimental and characterization methods were used to understand the evolution of material properties during chemo-mechanical alteration of cement-geomaterial interfaces. Software tools were developed to model chemo-mechanical coupling and predict the complex interplay between reactive transport and solid mechanics. Novel, fit-for-purpose materials were developed and tested using fundamental understanding of failure processes at cement-geomaterial interfaces.
Motivated by the need for improved forward modeling and inversion capabilities for geophysical responses in geologic settings whose fine-scale features demand accountability, this project describes two novel approaches that advance the current state of the art. First is a hierarchical material properties representation for finite element analysis whereby material properties can be prescribed on volumetric elements, in addition to their facets and edges. Hence, thin or fine-scaled features can be economically represented by small numbers of connected edges or facets, rather than tens of millions of very small volumetric elements. Examples of this approach are drawn from oilfield and near-surface geophysics where, for example, the electrostatic response of metallic infrastructure or fracture swarms is easily calculable on a laptop computer, with an estimated reduction in resource allocation of four orders of magnitude over traditional methods. Second is a first-ever solution method for the space-fractional Helmholtz equation in geophysical electromagnetics, accompanied by newly found magnetotelluric evidence supporting a fractional calculus representation of multi-scale geomaterials. Whereas these two achievements are significant in themselves, a clear understanding of the intermediate length scale where these two endmember viewpoints must converge remains unresolved and is a natural direction for future research. Additionally, an explicit mapping from a known multi-scale geomaterial model to its equivalent fractional calculus representation proved beyond the scope of the present research and, similarly, remains fertile ground for future exploration.
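For reference, a generic form of a space-fractional Helmholtz equation of the kind referred to above, written with a fractional Laplacian of order α; the specific operator definition, sign conventions, and source terms used in the project may differ.

```latex
% Generic space-fractional Helmholtz problem; \alpha = 2 recovers the classical case.
\[
  (-\Delta)^{\alpha/2} u(\mathbf{x}) \;-\; k^2\, u(\mathbf{x}) \;=\; f(\mathbf{x}),
  \qquad 1 < \alpha \le 2 .
\]
```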
Due to its balance of accuracy and computational cost, density functional theory has become the method of choice for computing the electronic structure and related properties of materials. However, present-day semi-local approximations to the exchange-correlation energy of density functional theory break down for materials containing d and f electrons. In this report we summarize the results of our research efforts within the LDRD 200202 titled "Making density functional theory work for all materials" in addressing this issue. Our efforts are grouped into two research thrusts. In the first thrust, we develop an exchange-correlation functional (BSC functional) within the subsystem functional formalism. It enables us to capture bulk, surface, and confinement physics with a single, semi-local exchange-correlation functional in density functional theory calculations. We present the analytical properties of the BSC functional and demonstrate that the BSC functional is able to capture confinement physics more accurately than standard semi-local exchange-correlation functionals. The second research thrust focusses on developing a database for transition metal binary compounds. The database consists of materials properties (formation energies, ground-state energies, lattice constants, and elastic constants) of 26 transition metal elements and 89 transition metal alloys. It serves as a reference for benchmarking computational models (such as lower-level modeling methods and exchange-correlation functionals). We expect that our database will significantly impact the materials science community. We conclude with a brief discussion on the future research directions and impact of our results.
In this short article, we summarize a step-by-step methodology to forecast power output from a photovoltaic solar generator using hourly auto-regressive moving average (ARMA) models. We illustrate how to build an ARMA model, validate it using statistical tests, and construct hourly samples. The resulting model has properties that make it well suited for embedding into more sophisticated operation and planning models, while at the same time showing relatively good accuracy. Additionally, it represents a good forecasting tool for sample generation in stochastic energy optimization models.
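For clarity, the general ARMA(p, q) form underlying the hourly models described above; the article's specific orders, transformations, and any seasonal terms are not reproduced here.

```latex
\[
  y_t \;=\; c \;+\; \sum_{i=1}^{p} \phi_i\, y_{t-i}
        \;+\; \varepsilon_t \;+\; \sum_{j=1}^{q} \theta_j\, \varepsilon_{t-j},
  \qquad \varepsilon_t \sim \mathcal{N}(0,\sigma^2),
\]
% where y_t is the (possibly transformed) hourly power output, \phi_i are the
% autoregressive coefficients, and \theta_j are the moving-average coefficients.
```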
ATS platforms are some of the largest, most complex, and most expensive computer systems installed in the United States at just a few major national laboratories. This milestone describes our recent efforts to procure, install, and test a machine called Vortex at Sandia National Laboratories that is compatible with the larger ATS platform Sierra at LLNL. In this milestone, we have 1) configured and procured a machine with similar hardware characteristics as Sierra ATS, 2) installed the machine, verified its physical hardware, and measured its baseline performance, and 3) demonstrated the machine's compatibility with Sierra ATS, and capacity for useful development and testing of Sandia computer codes (such as SPARC), including uses such as nightly regression testing workloads.
We consider a joint-chance constraint (JCC) as a union of sets, and approximate this union using bounds from classical probability theory. When these bounds are used in an optimization model constrained by the JCC, we obtain corresponding upper and lower bounds on the optimal objective function value. We compare the strength of these bounds against each other under two different sampling schemes, and observe that a larger correlation between the uncertainties tends to result in more computationally challenging optimization models. We also observe that the same set of inequalities provides the tightest upper and lower bounds in our computational experiments.
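One classical pair of bounds of the kind described above, writing the JCC in terms of the union of the individual violation events; the sharper inequalities used in the report are not reproduced here.

```latex
% Let A_i be the violation event of the i-th constraint, and let the JCC require
% P(A_1 \cup \dots \cup A_m) \le \epsilon.  Elementary probability gives
\[
  \max_{1 \le i \le m} P(A_i)
  \;\le\;
  P\!\Big(\bigcup_{i=1}^{m} A_i\Big)
  \;\le\;
  \sum_{i=1}^{m} P(A_i).
\]
% Enforcing the right-hand (Boole) bound, \sum_i P(A_i) \le \epsilon, restricts the
% feasible region (an inner approximation), while enforcing the left-hand bound,
% \max_i P(A_i) \le \epsilon, relaxes it (an outer approximation); together the two
% bracket the optimal objective value.
```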
This report summarizes the accomplishments and challenges of a two-year LDRD effort focused on improving design-to-simulation agility. The central bottleneck in most solid mechanics simulations is the process of taking CAD geometry and creating a discretization of suitable quality, i.e., the "meshing" effort. This report revisits meshfree methods and documents some key advancements that allow their use on problems with complex geometries, low-quality meshes, nearly incompressible materials, or fracture. The resulting capability was demonstrated to be an effective part of an agile simulation process by enabling rapid discretization techniques without increasing the time to obtain a solution of a given accuracy. The first enhancement addressed boundary-related challenges associated with meshfree methods. When using point clouds and Euclidean metrics to construct approximation spaces, boundary information is lost, which results in low-accuracy solutions for non-convex geometries and material interfaces. This also complicates the application of essential boundary conditions. The solution involved the development of conforming window functions which use graph and boundary information to directly incorporate boundaries into the approximation space.
Dragonflies are known to be highly successful hunters (achieving a 90-95% success rate in nature) that implement a guidance law like proportional navigation to intercept their prey. This project tested the hypothesis that dragonflies are able to implement proportional navigation using prey-image translation on their eyes. The model dragonfly presented here calculates changes in pitch and yaw to maintain the prey's image at a designated location (the fovea) on a two-dimensional screen (the model's eyes). When the model also uses self-knowledge of its own maneuvers as an error signal to adjust the location of the fovea, its interception trajectory becomes equivalent to proportional navigation. I show that this model can also be applied successfully (in a limited number of scenarios) against maneuvering prey. My results provide a proof-of-concept demonstration of the potential of using the dragonfly nervous system to design a robust interception algorithm for implementation on a man-made system.
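For context, the standard proportional-navigation guidance law referenced above; the symbols below are the usual textbook ones and are not taken from the report.

```latex
\[
  a_{\mathrm{cmd}} \;=\; N \, V_c \, \dot{\lambda},
\]
% where a_cmd is the commanded lateral acceleration, N the navigation constant,
% V_c the closing velocity, and \dot{\lambda} the rotation rate of the line of
% sight to the prey.  The model dragonfly is argued to realize an equivalent law
% through prey-image translation on its eyes.
```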
Uncertainty quantification is recognized as a fundamental task for obtaining predictive numerical simulations. However, many realistic engineering applications require complex and computationally expensive high-fidelity numerical simulations for the accurate characterization of the system responses. Moreover, complex physical models and extreme operating conditions can easily lead to hundreds of uncertain parameters that need to be propagated through high-fidelity codes. Under these circumstances, a single-fidelity approach, i.e. a workflow that only uses high-fidelity simulations to perform the uncertainty quantification task, is unfeasible due to the prohibitive overall computational cost. In recent years, multifidelity strategies have been introduced to overcome this issue. The core idea of this family of methods is to combine simulations with varying levels of fidelity/accuracy in order to obtain multifidelity estimators or surrogates with the same accuracy as their single-fidelity counterparts at a much lower computational cost. This goal is usually accomplished by defining a priori a sequence of discretization levels or physical modeling assumptions that can be used to decrease the complexity of a numerical realization and thus its computational cost. However, less attention has been dedicated to low-fidelity models that can be built directly from the small number of high-fidelity simulations available. In this work we focus our attention on Reduced-Order Models, which can be considered a particular class of data-driven approaches. Our main goal is to explore the combination of multifidelity uncertainty quantification and reduced-order models to obtain an efficient framework for propagating uncertainties through expensive numerical codes.
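A representative two-fidelity control-variate estimator of the kind this family of methods builds on; the report's specific estimators and its ROM-based low-fidelity models may differ, so this is only a sketch of the general idea.

```latex
% With N expensive high-fidelity samples and M >> N cheap low-fidelity samples of
% the same random inputs,
\[
  \hat{Q}^{\mathrm{MF}} \;=\; \bar{Q}^{\mathrm{HF}}_{N}
  \;+\; \alpha \left( \bar{Q}^{\mathrm{LF}}_{M} - \bar{Q}^{\mathrm{LF}}_{N} \right),
\]
% where the bars denote sample means and \alpha is chosen (e.g., from the estimated
% correlation between the fidelities) to minimize the variance of \hat{Q}^{\mathrm{MF}}.
% A well-correlated low-fidelity model lets the estimator reach the accuracy of a
% much larger high-fidelity ensemble at a fraction of the cost.
```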
There are numerous applications that combine data collected from sensors with machine-learning based classification models to predict the type of event or objects observed. Both the collection of the data itself and the classification models can be tuned for optimal performance, but we hypothesize that additional gains can be realized by jointly assessing both factors together. Through this research, we used a seismic event dataset and two neural network classification models that issued probabilistic predictions on each event to determine whether it was an earthquake or a quarry blast. Real-world applications will have constraints on data collection, perhaps in terms of a budget for the number of sensors or on where, when, or how data can be collected. We mimicked such constraints by creating subnetworks of sensors with both size and locational constraints. We compare different methods of determining the set of sensors in each subnetwork in terms of their predictive accuracy and the number of events that they observe overall. Additionally, we take the classifiers into account, treating them both as black-box models and testing various ways of combining predictions among models and among the set of sensors that observe any given event. We find that comparable overall performance can be achieved with less than half the number of sensors in the full network. Additionally, a voting scheme that uses the average confidence across the sensors for a given event shows improved predictive accuracy across nearly all subnetworks. Lastly, locational constraints matter, but sometimes in unintuitive ways, as the sensors chosen in place of those excluded by location may perform better. This being a short-term research effort, we offer a lengthy discussion of interesting next steps and ties to other ongoing research efforts that we did not have time to pursue. These include a detailed analysis of the subnetwork performance broken down by event type, specific location, and model confidence. This project also included a Campus Executive research partnership with Texas A&M University. Through this, we worked with a professor and student to study information gain for UAV routing. This was an alternative way of looking at a similar problem space that includes sensor operation for data collection and the resulting benefit to be gained from it. This work is described in an Appendix.
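A minimal sketch of the confidence-averaging vote described above; the function and data layout are illustrative assumptions, not the project's code.

```python
import numpy as np

def vote_by_average_confidence(sensor_probs, threshold=0.5):
    """sensor_probs: list of P(earthquake) values, one per sensor that observed the event."""
    if not sensor_probs:
        return None                        # event unobserved by this subnetwork
    mean_conf = float(np.mean(sensor_probs))
    label = "earthquake" if mean_conf >= threshold else "quarry blast"
    return label, mean_conf

# Example: three sensors observed an event with differing classifier confidences.
print(vote_by_average_confidence([0.81, 0.65, 0.42]))   # ('earthquake', 0.6266...)
```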
This report outlines the fiscal year (FY) 2019 status of an ongoing multi-year effort to develop a general, microstructurally-aware, continuum-level model for representing the dynamic response of materials with complex microstructures. This work has focused on accurately representing the response of both conventionally wrought processed and additively manufactured (AM) 304L stainless steel (SS) as a test case. Additive manufacturing, or 3D printing, is an emerging technology capable of enabling shortened design and certification cycles for stockpile components through rapid prototyping. However, it is not yet well understood how the complex and unique microstructures of AM materials affect their mechanical response at high strain rates. To achieve our project goal, an upscaling technique was developed to bridge the gap between the microstructural and continuum scales to represent AM microstructures on a Finite Element (FE) mesh. This process involves simulating the additive process using the Sandia-developed kinetic Monte Carlo (KMC) code SPPARKS. These SPPARKS microstructures are characterized using clustering algorithms from machine learning and used to populate the quadrature points of an FE mesh. Additionally, a spall kinetic model (SKM) was developed to more accurately represent the dynamic failure of AM materials. Validation experiments were performed using both pulsed power machines and projectile launchers. These experiments have provided equation of state (EOS) and flow strength measurements of both wrought and AM 304L SS to above Mbar pressures. In some experiments, multi-point interferometry was used to quantify the variation in observed material response of the AM 304L SS. Analysis of these experiments is ongoing, but preliminary comparisons of our upscaling technique and SKM to experimental data were performed as a validation exercise. Moving forward, this project will advance and further validate our computational framework, using advanced theory and additional high-fidelity experiments.
Quantum materials have long promised to revolutionize everything from energy transmission (high temperature superconductors) to both quantum and classical information systems (topological materials). However, their discovery and application has proceeded in an Edisonian fashion due to both an incomplete theoretical understanding and the difficulty of growing and purifying new materials. This project leverages Sandia's unique atomic precision advanced manufacturing (APAM) capability to design small-scale tunable arrays (designer materials) made of donors in silicon. Their low-energy electronic behavior can mimic quantum materials, and can be tuned by changing the fabrication parameters for the array, thereby enabling the discovery of materials systems which can't yet be synthesized. In this report, we detail three key advances we have made towards development of designer quantum materials. First are advances both in APAM technique and underlying mechanisms required to realize high-yielding donor arrays. Second is the first-ever observation of distinct phases in this material system, manifest in disordered 2D sheets of donors. Finally are advances in modeling the electronic structure of donor clusters and regular structures incorporating them, critical to understanding whether an array is expected to show interesting physics. Combined, these establish the baseline knowledge required to manifest the strongly-correlated phases of the Mott-Hubbard model in donor arrays, the first step to deploying APAM donor arrays as analogues of quantum materials.
In stochastic optimization, probabilities naturally arise as cost functionals and chance constraints. Unfortunately, these functions are difficult to handle both theoretically and computationally. The buffered probability of failure and its subsequent extensions were developed as numerically tractable, conservative surrogates for probabilistic computations. In this manuscript, we introduce the higher-moment buffered probability. Whereas the buffered probability is defined using the conditional value-at-risk, the higher-moment buffered probability is defined using higher-moment coherent risk measures. In this way, the higher-moment buffered probability encodes information about the magnitude of tail moments, not simply the tail average. We prove that the higher-moment buffered probability is closed, monotonic, quasi-convex and can be computed by solving a smooth one-dimensional convex optimization problem. These properties enable smooth reformulations of both higher-moment buffered probability cost functionals and constraints.
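Two standard ingredients behind the construction described above, stated in the usual form from the risk-measure literature; the manuscript's precise definitions and normalizations may differ.

```latex
% Higher-moment coherent risk measure of order p >= 1 at level \alpha in (0,1):
\[
  \mathrm{HMCR}_{p,\alpha}(X) \;=\; \min_{t \in \mathbb{R}}
  \left\{ \, t + \frac{1}{1-\alpha}\,\big\| (X - t)_+ \big\|_{p} \right\},
\]
% which reduces to the conditional value-at-risk when p = 1.  The buffered probability
% of exceeding a threshold x is obtained by inverting the risk measure in its level:
% it equals 1 - \alpha at the \alpha for which the risk measure evaluates to x.
% Replacing CVaR by HMCR_{p,\alpha} in this inversion yields the higher-moment
% buffered probability, which therefore reflects tail moments of order p rather than
% only the tail average.
```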
Data-driven tools for finding structure–property (S–P) relations, such as the Materials Knowledge System (MKS) framework, can accelerate materials design once the costly and technical calibration process has been completed. A three-model method is proposed to reduce the expense of calibrating S–P relation models: (1) direct simulations are performed as prescribed by (2) a Gaussian process-based data collection model, in order to calibrate (3) an MKS homogenization model, in an application to α-Ti. The new method compares favorably with expert texture selection in terms of the performance of the calibrated MKS models. Benefits for the development of new and improved materials are discussed.
This report presents code verification of EMPIRE-PIC against the analytic solution for a cold diode, first derived by Jaffe. The cold diode was simulated using EMPIRE-PIC, and the error norms were computed based on the Jaffe solution. The diode geometry is one-dimensional, and the simulations use the EMPIRE electrostatic field solver. After a transient start-up phase as the electrons first cross the anode-cathode gap, the simulations reach an equilibrium where the electric potential and electric field are approximately steady. The expected spatial orders of convergence for the potential, electric field, and particle velocity are observed.
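For context only: the classical space-charge-limited current density for a planar diode (the Child-Langmuir law) is the familiar limiting case of cold-diode theory. The Jaffe solution used for the verification provides the exact potential, field, and velocity profiles for the configuration simulated here and is not reproduced in this summary.

```latex
% Child-Langmuir current density for a planar gap of width d and applied voltage V:
\[
  J_{\mathrm{CL}} \;=\; \frac{4\,\varepsilon_0}{9}\,\sqrt{\frac{2e}{m_e}}\;\frac{V^{3/2}}{d^{2}} .
\]
```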
We present a new, distributed-memory parallel algorithm for detection of degenerate mesh features that can cause singularities in ice sheet mesh simulations. Identifying and removing mesh features such as disconnected components (icebergs) or hinge vertices (peninsulas of ice detached from the land) can significantly improve the convergence of iterative solvers. Because the ice sheet evolves during the course of a simulation, it is important that the detection algorithm can run in situ with the simulation, running in parallel and taking a negligible amount of computation time, so that degenerate features (e.g., calving icebergs) can be detected as they develop. We present a distributed-memory, BFS-based label-propagation approach to degenerate feature detection that is efficient enough to be called at each step of an ice sheet simulation, while correctly identifying all degenerate features of an ice sheet mesh. Our method finds all degenerate features in a mesh with 13 million vertices in 0.0561 seconds on 1536 cores in the MPAS Albany Land Ice (MALI) model. Compared to the previously used serial pre-processing approach, we observe a 46,000x speedup for our algorithm, and provide additional capability to do dynamic detection of degenerate features in the simulation.
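A serial sketch of the label-propagation idea behind the detection of disconnected components: every vertex starts with its own label and repeatedly adopts the smallest label among its neighbors, so each connected component converges to one label, and components containing no grounded (land-attached) vertex are flagged as icebergs. The distributed, BFS-based MPI formulation described above, and the hinge-vertex detection, are not reproduced here; the mesh data below is illustrative.

```python
def find_icebergs(neighbors, grounded):
    """neighbors: dict vertex -> iterable of adjacent vertices; grounded: set of vertices."""
    label = {v: v for v in neighbors}
    changed = True
    while changed:
        changed = False
        for v, nbrs in neighbors.items():
            best = min([label[v]] + [label[u] for u in nbrs])  # smallest label seen
            if best < label[v]:
                label[v] = best
                changed = True
    # Group vertices by final label; keep components with no grounded vertex.
    components = {}
    for v, lab in label.items():
        components.setdefault(lab, set()).add(v)
    return [comp for comp in components.values() if not (comp & grounded)]

# Example: vertices 0-2 form grounded ice, vertices 3-4 form a detached iceberg.
nbrs = {0: [1], 1: [0, 2], 2: [1], 3: [4], 4: [3]}
print(find_icebergs(nbrs, grounded={0}))   # [{3, 4}]
```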
Use of insensitive high explosives (IHEs) has significantly improved ammunition safety because of their remarkable insensitivity to violent cook-off, shock, and impact. Triamino-trinitrobenzene (TATB) is the IHE used in many modern munitions. Previously, lightning simulations in different test configurations have shown that the required detonation threshold for standard-density TATB at ambient and elevated temperatures (250 °C) has a sufficient margin over the shock caused by an arc from the most severe lightning. In this paper, the Braginskii model with the Lee-More channel conductivity prescription is used to demonstrate how electrical arcs from lightning could cause detonation in TATB. The steep rise and slow decay of a typical lightning pulse are used to demonstrate that the shock pressure from an electrical arc, after reaching its peak, falls off faster than the inverse of the arc radius. For detonation to occur, two necessary conditions must be met: the Pop-Plot criterion and a minimum spot size requirement. The relevant Pop-Plot for TATB at 250 °C was converted into an empirical detonation criterion, which is applicable to explosives subject to shocks of variable pressure. The arc cross-section was required to meet the minimum detonation spot size reported in the literature. One caveat is that when the shock pressure exceeds the detonation pressure, the Pop-Plot may not be applicable, and the minimum spot size requirement may be smaller.