Due to its balance of accuracy and computational cost, density functional theory has become the method of choice for computing the electronic structure and related properties of materials. However, present-day semi-local approximations to the exchange-correlation energy of density functional theory break down for materials containing d and f electrons. In this report we summarize the results of our research efforts within the LDRD 200202 titled "Making density functional theory work for all materials" in addressing this issue. Our efforts are grouped into two research thrusts. In the first thrust, we develop an exchange-correlation functional (BSC functional) within the subsystem functional formalism. It enables us to capture bulk, surface, and confinement physics with a single, semi-local exchange-correlation functional in density functional theory calculations. We present the analytical properties of the BSC functional and demonstrate that the BSC functional is able to capture confinement physics more accurately than standard semi-local exchange-correlation functionals. The second research thrust focusses on developing a database for transition metal binary compounds. The database consists of materials properties (formation energies, ground-state energies, lattice constants, and elastic constants) of 26 transition metal elements and 89 transition metal alloys. It serves as a reference for benchmarking computational models (such as lower-level modeling methods and exchange-correlation functionals). We expect that our database will significantly impact the materials science community. We conclude with a brief discussion on the future research directions and impact of our results.
We parallelize the LU factorization of a hierarchical low-rank matrix (H-matrix) on a distributed-memory computer. This is much more difficult than the H-matrix-vector multiplication due to the dataflow of the factorization, and it is much harder than the parallelization of a dense matrix factorization due to the irregular hierarchical block structure of the matrix. Block low-rank (BLR) format gets rid of the hierarchy and simplifies the parallelization, often increasing concurrency. However, this comes at a price of losing the near-linear complexity of the H-matrix factorization. In this work, we propose to factorize the matrix using a “lattice H-matrix” format that generalizes the BLR format by storing each of the blocks (both diagonals and off-diagonals) in the H-matrix format. These blocks stored in the H-matrix format are referred to as lattices. Thus, this lattice format aims to combine the parallel scalability of BLR factorization with the near-linear complexity of H-matrix factorization. We first compare factorization performances using the H-matrix, BLR, and lattice H-matrix formats under various conditions on a shared-memory computer. Our performance results show that the lattice format has storage and computational complexities similar to those of the H-matrix format, and hence a much lower cost of factorization than BLR. We then compare the BLR and lattice H-matrix factorization on distributed-memory computers. Our performance results demonstrate that compared with BLR, the lattice format with the lower cost of factorization may lead to faster factorization on the distributed-memory computer.
We consider a joint-chance constraint (JCC) as a union of sets, and approximate this union using bounds from classical probability theory. When these bounds are used in an optimization model constrained by the JCC, we obtain corresponding upper and lower bounds on the optimal objective function value. We compare the strength of these bounds against each other under two different sampling schemes, and observe that a larger correlation between the uncertainties tends to result in more computationally challenging optimization models. We also observe that the same set of inequalities provides the tightest upper and lower bounds in our computational experiments.
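As a rough illustration of bounding the probability of a union, the sketch below computes two textbook bounds from equally likely sampled scenarios. The abstract does not name the inequalities it uses; Boole's inequality (upper bound) and the second-order Bonferroni inequality (lower bound) are assumed here purely as examples, and the function name `union_bounds` and the scenario layout are hypothetical.

```python
from itertools import combinations

def union_bounds(scenarios):
    """Bound P(at least one constraint is violated) from sampled scenarios.

    scenarios[s][i] is True when constraint i is violated in scenario s.
    All scenarios are assumed equally likely (an assumption of this sketch).
    Returns (lower, exact, upper) for the probability of the union.
    """
    n = len(scenarios)
    m = len(scenarios[0])
    # Marginal violation probabilities P(A_i).
    p = [sum(s[i] for s in scenarios) / n for i in range(m)]
    # Pairwise joint violation probabilities P(A_i and A_j).
    s2 = sum(
        sum(s[i] and s[j] for s in scenarios) / n
        for i, j in combinations(range(m), 2)
    )
    s1 = sum(p)
    exact = sum(any(s) for s in scenarios) / n
    upper = min(1.0, s1)          # Boole's inequality (union bound)
    lower = max(max(p), s1 - s2)  # second-order Bonferroni, or largest marginal
    return lower, exact, upper

lower, exact, upper = union_bounds(
    [(True, False), (True, True), (False, False), (False, True)]
)
```

In an optimization model, replacing the union probability by `upper` gives a conservative restriction of the JCC, while `lower` gives a relaxation; this is one way such bounds could translate into bounds on the optimal objective value.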
Remote sensing (RS) data collection capabilities are rapidly evolving hyper-spectrally (sensing more spectral bands), hyper-temporally (faster sampling rates) and hyper-spatially (increasing number of smaller pixels). Accordingly, sensor technologies have outpaced transmission capabilities, introducing a need to process more data at the sensor. While many sophisticated data processing capabilities are emerging, power and other hardware requirements for these approaches on conventional electronic systems place them out of context for resource-constrained operational environments. To address these limitations, in this research effort we have investigated and characterized neural-inspired architectures to determine suitability for implementing RS algorithms. In doing so, we have been able to highlight a 100x performance per watt improvement using neuromorphic computing and have developed an algorithmic architecture co-design and exploration capability.
Over the last decade, hardware advances have led to the feasibility of training and inference for very large deep neural networks. Sparsified deep neural networks (DNNs) can greatly reduce memory costs and increase throughput of standard DNNs, if loss of accuracy can be controlled. The IEEE HPEC Sparse Deep Neural Network Graph Challenge serves as a testbed for algorithmic and implementation advances to maximize computational performance of sparse deep neural networks. We base our sparse network for DNNs, KK-SpDNN, on the sparse linear algebra kernels within the Kokkos Kernels library. Using the sparse matrix-matrix multiplication in Kokkos Kernels allows us to reuse a highly optimized kernel. We focus on reducing the single node and multi-node runtimes for 12 sparse networks. We test KK-SpDNN on Intel Skylake and Knights Landing architectures and see 120-500x improvement on single node performance over the serial reference implementation. We run in data-parallel mode with MPI to further speed up network inference, ultimately obtaining an edge processing rate of 1.16e+12 on 20 Skylake nodes. This translates to a 13x speed up on 20 nodes compared to our highly optimized multithreaded implementation on a single Skylake node.
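To make the role of sparse matrix-matrix multiplication concrete, here is a minimal pure-Python sketch of sparse feed-forward inference: each layer multiplies the sparse activations by a sparse weight matrix, adds a bias to the stored entries, and applies ReLU. It does not use Kokkos Kernels and is not the KK-SpDNN implementation; the dictionary storage, the function names, and the optional `cap` parameter are assumptions made for illustration.

```python
def spgemm(a, b):
    """Multiply two sparse matrices stored as {row: {col: value}} dictionaries."""
    c = {}
    for i, row in a.items():
        acc = {}
        for k, a_ik in row.items():
            for j, b_kj in b.get(k, {}).items():
                acc[j] = acc.get(j, 0.0) + a_ik * b_kj
        if acc:
            c[i] = acc
    return c

def sparse_layer(y, w, bias, cap=None):
    """One layer: ReLU(y @ w + bias) on stored entries; non-positive results are dropped."""
    out = {}
    for i, row in spgemm(y, w).items():
        kept = {}
        for j, v in row.items():
            v += bias
            if cap is not None:
                v = min(v, cap)  # optional upper clamp (illustrative parameter)
            if v > 0:
                kept[j] = v
        if kept:
            out[i] = kept
    return out

def infer(y0, weights, bias, cap=None):
    """Run the input rows y0 through every layer in order."""
    y = y0
    for w in weights:
        y = sparse_layer(y, w, bias, cap)
    return y

y0 = {0: {0: 1.0, 1: 2.0}, 1: {1: 1.0}}
w = {0: {0: 1.0}, 1: {0: 0.5, 1: 1.0}}
result = infer(y0, [w, w], bias=-1.0)
```

Because each input row is processed independently, the rows of `y0` can be split across processes, which is the idea behind running inference in a data-parallel mode.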
In this work we present a performance exploration on Eager K-truss, a linear-algebraic formulation of the K-truss graph algorithm. We address performance issues related to load imbalance of parallel tasks in symmetric, triangular graphs by presenting a fine-grained parallel approach to executing the support computation. This approach also increases available parallelism, making it amenable to GPU execution. We demonstrate our fine-grained parallel approach using implementations in Kokkos and evaluate them on an Intel Skylake CPU and an Nvidia Tesla V100 GPU. Overall, we observe between a 1.26-1.48x improvement on the CPU and a 9.97-16.92x improvement on the GPU due to our fine-grained parallel formulation.
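For readers unfamiliar with the support computation, the following is a plain reference sketch of K-truss, not the Eager formulation and not a Kokkos implementation. The support of an edge is the number of triangles containing it, and edges with support below k - 2 are removed repeatedly until none remain. The function name and the assumption that vertices are comparable integers are choices made for this example.

```python
def k_truss(edges, k):
    """Return the sorted edges of the k-truss of an undirected simple graph."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    while True:
        # Support of (u, v) = number of common neighbours of u and v.
        weak = [
            (u, v)
            for u in adj
            for v in adj[u]
            if u < v and len(adj[u] & adj[v]) < k - 2
        ]
        if not weak:
            break
        for u, v in weak:
            adj[u].discard(v)
            adj[v].discard(u)
    return sorted((u, v) for u in adj for v in adj[u] if u < v)

# A 4-clique on {0, 1, 2, 3}, a triangle {3, 4, 5}, and a pendant edge (5, 6).
graph = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
         (3, 4), (4, 5), (3, 5), (5, 6)]
truss4 = k_truss(graph, 4)
```

The per-edge support values are independent of one another within a pass, which is what makes a fine-grained (per-edge or finer) parallel decomposition possible.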
The failure of subsurface seals (i.e., wellbores, shaft and drift seals in a deep geologic nuclear waste repository) has important implications for US Energy Security. The performance of these cementitious seals is controlled by a combination of chemical and mechanical forces, which are coupled processes that occur over multiple length scales. The goal of this work is to improve fundamental understanding of cement-geomaterial interfaces and develop tools and methodologies to characterize and predict performance of subsurface seals. This project utilized a combined experimental and modeling approach to better understand failure at cement-geomaterial interfaces. Cutting-edge experimental methods and characterization methods were used to understand evolution of the material properties during chemo-mechanical alteration of cement-geomaterial interfaces. Software tools were developed to model chemo-mechanical coupling and predict the complex interplay between reactive transport and solid mechanics. Novel, fit-for-purpose materials were developed and tested using fundamental understanding of failure processes at cement-geomaterial interfaces.
Additive manufacturing (AM) of metal parts can save time and energy and can produce parts that cannot otherwise be made with traditional machining methods. Near final part geometry is the goal for AM, but material microstructures are inherently different from those of wrought materials as they arise from a complex temperature history associated with the additive process. It is well known that strength and other properties of interest in engineering design follow from microstructure and temperature history. Because of complex microstructure morphologies and spatial heterogeneities, properties are heterogeneous and reflect underlying microstructure. This report describes a method for distributing properties across a finite element mesh so that effects of complex heterogeneous microstructures arising from additive manufacturing can be systematically incorporated into engineering scale calculations without the nearly impossible and time-consuming effort of meshing material details. Furthermore, the method reflects the inherent variability in AM materials by making use of kinetic Monte Carlo calculations to model the AM process associated with a build.
ATS platforms are some of the largest, most complex, and most expensive computer systems installed in the United States at just a few major national laboratories. This milestone describes our recent efforts to procure, install, and test a machine called Vortex at Sandia National Laboratories that is compatible with the larger ATS platform Sierra at LLNL. In this milestone, we have 1) configured and procured a machine with similar hardware characteristics as Sierra ATS, 2) installed the machine, verified its physical hardware, and measured its baseline performance, and 3) demonstrated the machine's compatibility with Sierra ATS, and capacity for useful development and testing of Sandia computer codes (such as SPARC), including uses such as nightly regression testing workloads.
Triangle counting is a foundational graph-analysis kernel in network science. It has also been one of the challenge problems for the 'Static Graph Challenge'. In this work, we propose a novel, hybrid, parallel triangle counting algorithm based on its linear algebra formulation. Our framework uses MPI and Cilk to exploit the benefits of distributed-memory and shared-memory parallelism, respectively. The problem is partitioned among MPI processes using a two-dimensional (2D) Cartesian block partitioning. One-dimensional (1D) rowwise partitioning is used within the Cartesian blocks for shared-memory parallelism using the Cilk programming model. Besides exhibiting very good strong scaling behavior in almost all tested graphs, our algorithm achieves the fastest time on the 1.4B edge real-world Twitter graph, which is 3.217 seconds, on 1,092 cores. In comparison to past distributed-memory parallel winners of the graph challenge, we demonstrate a speedup of 2.7× on this Twitter graph. This is also the fastest time reported for parallel triangle counting on the Twitter graph when the graph is not replicated.
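One common linear algebra formulation of triangle counting takes L, the strictly lower triangular part of the adjacency matrix, and sums the entries of (L·L) masked elementwise by L. The serial sketch below follows that formulation using sets for the rows of L; it is an illustration only and shows neither the 2D Cartesian partitioning nor the MPI and Cilk parallelism described above. The function name is hypothetical, and vertices are assumed to be comparable integers.

```python
def count_triangles(edges):
    """Count triangles as sum((L @ L) * L), with L the strictly lower triangle."""
    lower = {}  # lower[i] = set of neighbours j < i, i.e. row i of L
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        i, j = max(u, v), min(u, v)
        lower.setdefault(i, set()).add(j)
    total = 0
    for i, row in lower.items():
        for k in row:
            # Contribution of L[i][k] * L[k][j], kept only where L[i][j] is set.
            total += len(lower.get(k, set()) & row)
    return total

k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
triangles = count_triangles(k4)
```

Each row of L contributes to the total independently, so the outer loop is a natural unit for rowwise partitioning among threads.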
This report summarizes the work performed under a three-year LDRD project aiming to develop mathematical and software foundations for compatible meshfree and particle discretizations. We review major technical accomplishments and project metrics such as publications, conference and colloquia presentations, and organization of special sessions and minisymposia. The report concludes with a brief summary of ongoing projects and collaborations that utilize the products of this work.
Under high-rate loading in tension, metals can sustain much larger tensile stresses for sub-microsecond time periods than would be possible under quasi-static conditions. This type of failure, known as spall, is not adequately reproduced by hydrocodes with commonly used failure models. The Spall Kinetics Model treats spall by incorporating a time scale into the process of failure. Under sufficiently strong tensile states of stress, damage accumulates over this time scale, which can be thought of as an incubation time. The time scale depends on the previous loading history of the material, reflecting possible damage by a shock wave. The model acts by modifying the hydrostatic pressure that is predicted by any equation of state and is therefore simple to implement. Examples illustrate the ability of the model to reproduce the spall stress and resulting release waves in plate impact experiments on stainless steel.
This research aims to develop brain-inspired solutions for reliable and adaptive autonomous navigation in systems that have limited internal and external sensors and may not have access to reliable GPS information. The algorithms in this project were investigated and developed in the context of Sandia's A4H (autonomy for hypersonics) mission campaign. These algorithms were additionally explored with respect to their suitability for implementation on emerging neuromorphic computing hardware technology. This project is premised on the hypothesis that brain-inspired SLAM (simultaneous localization and mapping) algorithms may provide an energy-efficient, context-flexible approach to robust sensor-based, real-time navigation.
Dragonflies are known to be highly successful hunters (achieving 90-95% success rate in nature) that implement a guidance law like proportional navigation to intercept their prey. This project tested the hypothesis that dragonflies are able to implement proportional navigation using prey-image translation on their eyes. The model dragonfly presented here calculates changes in pitch and yaw to maintain the prey's image at a designated location (the fovea) on a two-dimensional screen (the model's eyes). When the model also uses self-knowledge of its own maneuvers as an error signal to adjust the location of the fovea, its interception trajectory becomes equivalent to proportional navigation. I also show that this model can be applied successfully (in a limited number of scenarios) against maneuvering prey. My results provide a proof-of-concept demonstration of the potential of using the dragonfly nervous system to design a robust interception algorithm for implementation on a man-made system.
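For reference, proportional navigation in its standard textbook planar form commands a turn rate proportional to the rotation rate of the line of sight. The symbols below (navigation constant N, pursuer heading angle γ, line-of-sight angle λ, and the position coordinates) are not defined in this abstract and are introduced only for illustration:

```latex
\dot{\gamma}(t) = N \, \dot{\lambda}(t), \qquad
\lambda(t) = \operatorname{atan2}\bigl(y_p(t) - y_d(t),\; x_p(t) - x_d(t)\bigr)
```

Here (x_d, y_d) and (x_p, y_p) denote the positions of the pursuer and the prey. Driving the line-of-sight rate toward zero holds the bearing to the prey constant, which is the geometric condition for a collision course against a non-maneuvering target.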
Motivated by the need for improved forward modeling and inversion capabilities of geophysical response in geologic settings whose fine-scale features demand accountability, this project describes two novel approaches which advance the current state of the art. First is a hierarchical material properties representation for finite element analysis whereby material properties can be prescribed on volumetric elements, in addition to their facets and edges. Hence, thin or fine-scaled features can be economically represented by small numbers of connected edges or facets, rather than tens of millions of very small volumetric elements. Examples of this approach are drawn from oilfield and near-surface geophysics where, for example, electrostatic response of metallic infrastructure or fracture swarms is easily calculable on a laptop computer with an estimated reduction in resource allocation by 4 orders of magnitude over traditional methods. Second is a first-ever solution method for the space-fractional Helmholtz equation in geophysical electromagnetics, accompanied by newly found magnetotelluric evidence supporting a fractional calculus representation of multi-scale geomaterials. Whereas these two achievements are significant in themselves, a clear understanding of the intermediate length scale where these two endmember viewpoints must converge remains unresolved and is a natural direction for future research. Additionally, an explicit mapping from a known multi-scale geomaterial model to its equivalent fractional calculus representation proved beyond the scope of the present research and, similarly, remains fertile ground for future exploration.
This report summarizes the accomplishments and challenges of a two-year LDRD effort focused on improving design-to-simulation agility. The central bottleneck in most solid mechanics simulations is the process of taking CAD geometry and creating a discretization of suitable quality, i.e., the "meshing" effort. This report revisits meshfree methods and documents some key advancements that allow their use on problems with complex geometries, low quality meshes, nearly incompressible materials or that involve fracture. The resulting capability was demonstrated to be an effective part of an agile simulation process by enabling rapid discretization techniques without increasing the time to obtain a solution of a given accuracy. The first enhancement addressed boundary-related challenges associated with meshfree methods. When using point clouds and Euclidean metrics to construct approximation spaces, the boundary information is lost, which results in low accuracy solutions for non-convex geometries and material interfaces. This also complicates the application of essential boundary conditions. The solution involved the development of conforming window functions which use graph and boundary information to directly incorporate boundaries into the approximation space.
This report outlines the fiscal year (FY) 2019 status of an ongoing multi-year effort to develop a general, microstructurally-aware, continuum-level model for representing the dynamic response of materials with complex microstructures. This work has focused on accurately representing the response of both conventionally wrought processed and additively manufactured (AM) 304L stainless steel (SS) as a test case. Additive manufacturing, or 3D printing, is an emerging technology capable of enabling shortened design and certification cycles for stockpile components through rapid prototyping. However, there is not an understanding of how the complex and unique microstructures of AM materials affect their mechanical response at high strain rates. To achieve our project goal, an upscaling technique was developed to bridge the gap between the microstructural and continuum scales to represent AM microstructures on a Finite Element (FE) mesh. This process involves simulations of the additive process using the Sandia-developed kinetic Monte Carlo (KMC) code SPPARKS. These SPPARKS microstructures are characterized using clustering algorithms from machine learning and used to populate the quadrature points of an FE mesh. Additionally, a spall kinetic model (SKM) was developed to more accurately represent the dynamic failure of AM materials. Validation experiments were performed using both pulsed power machines and projectile launchers. These experiments have provided equation of state (EOS) and flow strength measurements of both wrought and AM 304L SS to above Mbar pressures. In some experiments, multi-point interferometry was used to quantify the variation in the observed material response of the AM 304L SS. Analysis of these experiments is ongoing, but preliminary comparisons of our upscaling technique and SKM to experimental data were performed as a validation exercise.
Moving forward, this project will advance and further validate our computational framework, using advanced theory and additional high-fidelity experiments.