The purpose of this project was to devise, implement, and demonstrate a method that uses Sandia's existing analysis codes (e.g., Sierra, Alegra, and the CTH hydro code) with minimal modification to generate objective function gradients for optimization-based design in transient, nonlinear, coupled-physics applications. The approach uses a Moving Least Squares representation of the geometry to substantially reduce the number of geometric degrees of freedom. A Multiple-Program Multiple-Data computing model is then used to compute objective gradients via finite differencing. Details of the formulation and implementation are provided, and example applications are presented that demonstrate the effectiveness and scalability of the approach.
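The finite-difference gradient step can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: `objective` is a hypothetical stand-in for one run of an analysis code, the design vector plays the role of the reduced (e.g., MLS-controlled) geometric parameters, and Python threads stand in for the independent program instances of the MPMD model. Central differences cost 2n independent evaluations for n design variables, which is why reducing the number of geometric degrees of freedom matters.

```python
from concurrent.futures import ThreadPoolExecutor


def objective(p):
    """Hypothetical stand-in for one (expensive) analysis-code run."""
    return (p[0] - 1.0) ** 2 + 3.0 * p[0] * p[1] + p[1] ** 4


def fd_gradient(f, p, h=1e-5, max_workers=4):
    """Central-difference gradient with all 2n evaluations run concurrently."""
    # Build the 2n perturbed designs: +h and -h in each coordinate.
    jobs = []
    for i in range(len(p)):
        for sign in (1.0, -1.0):
            q = list(p)
            q[i] += sign * h
            jobs.append(q)
    # Each perturbed design is an independent evaluation (one "program" each).
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        values = list(pool.map(f, jobs))  # results come back in job order
    # values[2i] is f(p + h e_i), values[2i + 1] is f(p - h e_i).
    return [(values[2 * i] - values[2 * i + 1]) / (2.0 * h) for i in range(len(p))]
```

For example, `fd_gradient(objective, [2.0, 1.0])` returns approximately `[5.0, 10.0]`, the exact gradient of the toy objective at that point.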
Presented in this document are the theoretical aspects of capabilities contained in the Sierra / SM code. This manuscript serves as an ideal starting point for understanding the theoretical foundations of the code. For a comprehensive study of these capabilities, the reader is encouraged to explore the many references to scientific articles and textbooks contained in this manual. It is important to point out that some capabilities are still in development and may not be presented in this document. Further updates to this manuscript will be made as these capabilities come closer to production level.
Attaining high performance with MPI applications requires efficient message matching to minimize message processing overheads and the latency these overheads introduce into application communication. In this paper, we use a validated simulation-based approach to examine the relationship between MPI message matching performance and application time-to-solution. Specifically, we examine how the performance of several important HPC workloads is affected by the time required for matching. Our analysis yields several important contributions: (i) the performance of current workloads is unlikely to be significantly affected by MPI matching unless match queue operations get much slower or match queues get much longer; (ii) match queue designs that provide sublinear performance as a function of queue length are unlikely to yield much benefit unless match queue lengths increase dramatically; and (iii) we provide guidance on how long the mean time per match attempt may be without significantly affecting application performance. The results and analysis in this paper provide valuable guidance on the design and development of MPI message match queues.
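Contribution (iii) can be read through a simple upper-bound cost model. This is an assumption for illustration, not the paper's simulation model: every match attempt is taken to lie on the critical path with no overlap with computation. With N_msg matched messages per process, mean search length L̄ (attempts per match), mean time per match attempt t_a, time-to-solution T, and a tolerable overhead fraction ε:

```latex
T_{\mathrm{match}} \approx N_{\mathrm{msg}}\,\bar{L}\,t_a,
\qquad
\frac{T_{\mathrm{match}}}{T} \le \varepsilon
\;\Longrightarrow\;
t_a \le \frac{\varepsilon\,T}{N_{\mathrm{msg}}\,\bar{L}}
```

With illustrative values (not taken from the paper) T = 100 s, N_msg = 10^6, L̄ = 10, and ε = 0.01, the bound is t_a ≤ 10^-7 s = 100 ns. Because the overhead scales with L̄, a queue structure with sublinear search cost only pays off once L̄ becomes large, which is consistent with finding (ii).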
Neuromorphic computing holds considerable promise for the future of computing because of its energy-efficient and scalable implementations. Here we extend a neural algorithm that solves the diffusion equation by implementing random walks on neuromorphic hardware. Additionally, we introduce four random walk applications that use this spiking neural algorithm: generating a random walk to replicate an image, finding a path between two nodes, finding triangles in a graph, and partitioning a graph into two sections. We also made these four applications available for execution in software through a graphical user interface (GUI).
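The link between random walks and diffusion can be illustrated with a conventional software simulation; this is not the spiking neural algorithm itself, and the function names are illustrative. Assuming a 1D lattice with unit spacing, unit time steps, and walkers that step left or right with equal probability, the walker density approximates, on scales coarser than the lattice, the point-source solution of u_t = D u_xx with D = 1/2, so the variance of walker positions grows as 2Dt = t.

```python
import random
from collections import Counter


def simulate_walkers(n_walkers, n_steps, seed=0):
    """Unbiased +/-1 random walks on a 1D lattice, all starting at x = 0."""
    rng = random.Random(seed)
    positions = [0] * n_walkers
    for _ in range(n_steps):
        for w in range(n_walkers):
            # Step left or right with probability 1/2 each.
            positions[w] += 1 if rng.random() < 0.5 else -1
    return positions


def density(positions):
    """Fraction of walkers at each occupied lattice site (sums to 1)."""
    counts = Counter(positions)
    n = len(positions)
    return {x: c / n for x, c in sorted(counts.items())}
```

After `simulate_walkers(5000, 100)`, the sample variance of the positions should be close to 100. Note that after an even number of steps every walker sits on an even site, a lattice parity effect that disappears when the density is averaged over neighboring sites.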
Parameter estimation for mechanical models of plastic deformation used in nuclear weapons systems is a laborious process for both experimentalists and constitutive modelers, yet it is critical to producing meaningful numerical predictions. In this work we derive an adjoint-based optimization approach for a stabilized, large-deformation J2 plasticity model that is considerably more computationally efficient than current state-of-the-art methods, with no loss of accuracy. Unlike most approaches to model calibration, we drive the inversion procedure with full-field deformation data that can be measured experimentally through established digital image or volume correlation techniques. We present numerical results for two- and three-dimensional model problems and comment on several directions for future research.
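The adjoint idea can be shown on a deliberately small linear problem instead of J2 plasticity, an assumption made purely for illustration: two springs in series with stiffness parameters p = (k1, k2), state equation K(p)u = f, and a least-squares misfit J = ½‖u - u_obs‖² against "measured" displacements standing in for full-field data. A single adjoint solve, K^T λ = -(u - u_obs), yields every gradient component as dJ/dp_i = λ^T (∂K/∂p_i) u, however many parameters there are. All names below are hypothetical.

```python
def solve2(a, b):
    """Solve the 2x2 linear system a x = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    x0 = (b[0] * a[1][1] - a[0][1] * b[1]) / det
    x1 = (a[0][0] * b[1] - a[1][0] * b[0]) / det
    return [x0, x1]


def stiffness(p):
    """Two springs in series, node 0 attached to the wall by spring k1."""
    k1, k2 = p
    return [[k1 + k2, -k2], [-k2, k2]]


# dK/dp_i; constant because K is linear in the parameters.
DK = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[1.0, -1.0], [-1.0, 1.0]],
]


def forward(p, f):
    return solve2(stiffness(p), f)


def objective(p, f, u_obs):
    u = forward(p, f)
    return 0.5 * sum((ui - oi) ** 2 for ui, oi in zip(u, u_obs))


def adjoint_gradient(p, f, u_obs):
    u = forward(p, f)
    K = stiffness(p)
    KT = [[K[0][0], K[1][0]], [K[0][1], K[1][1]]]
    # Adjoint solve: K^T lam = -dJ/du = -(u - u_obs).
    lam = solve2(KT, [-(ui - oi) for ui, oi in zip(u, u_obs)])
    grad = []
    for dK in DK:
        dKu = [dK[0][0] * u[0] + dK[0][1] * u[1],
               dK[1][0] * u[0] + dK[1][1] * u[1]]
        grad.append(lam[0] * dKu[0] + lam[1] * dKu[1])  # lam^T (dK/dp_i) u
    return grad
```

Comparing `adjoint_gradient` against central differences of `objective` is a quick consistency check. For a nonlinear model the same structure would apply, with the adjoint operator and ∂R/∂p coming from the linearized residual.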
This report documents the completion of milestone STPRO4-7 Kokkos R&D: Remote Memory Spaces for One-Sided Halo-Exchange. The goal of this milestone was to develop and deploy an initial capability to support PGAS-like communication models integrated into Kokkos via Remote Memory Spaces. The team developed semantic requirements for Remote Memory Spaces and implemented a prototype library leveraging four different communication libraries: libQUO, SHMEM, MPI-OneSided, and NVSHMEM. In conjunction with ADCD02-COPA, the Remote Memory Space capability was used in ExaMiniMD, a molecular dynamics proxy application, to explore the current state of the technology and its usability. The results demonstrate that usability is very good, allowing a significant simplification of communication routines, but performance is still lacking.
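The one-sided halo-exchange pattern can be illustrated with a conceptual model in plain Python. This is not the Kokkos Remote Memory Spaces API: each "rank" is simply a list with one ghost cell on each side, `put` is a hypothetical stand-in for a PGAS-style remote write, and the domain is assumed periodic. Each rank writes its boundary values directly into its neighbors' ghost cells, with no matching receive on the target side.

```python
def make_ranks(n_ranks, n_local):
    """Per-rank layout: [left_ghost, interior_0, ..., interior_{n-1}, right_ghost]."""
    return [[None] + [r * n_local + i for i in range(n_local)] + [None]
            for r in range(n_ranks)]


def put(ranks, target, index, value):
    """Hypothetical one-sided write into another rank's memory."""
    ranks[target][index] = value


def halo_exchange(ranks):
    """Periodic 1D halo exchange using only one-sided puts."""
    n = len(ranks)
    for r, local in enumerate(ranks):
        left, right = (r - 1) % n, (r + 1) % n
        # First interior cell goes into the left neighbor's right ghost.
        put(ranks, left, len(ranks[left]) - 1, local[1])
        # Last interior cell goes into the right neighbor's left ghost.
        put(ranks, right, 0, local[-2])
```

In a real PGAS runtime the puts would proceed concurrently, and a fence or barrier would be required before any rank reads its ghost cells; the sequential loop here is safe only because puts touch ghost cells and never interior data.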
This report documents the completion of milestone STPRO4-6 Kokkos Support for ASC applications and libraries. The team provided consultation and support for numerous ASC code projects, including Sandia's SPARC, EMPIRE, Aria, GEMMA, Alexa, Trilinos, LAMMPS, and nimbleSM. Over the year, more than 350 Kokkos GitHub issues were resolved, over 220 of which required fixes and enhancements to the code base. Resolving these requests, many of which were filed by ASC code teams, provided applications with the capabilities in Kokkos they needed to be successful.
This report documents the completion of milestone STPRO4-5 Kokkos interoperability with general SIMD types to force vectorization on ATS-1. The Kokkos team worked with application developers to enable the use of SIMD intrinsics, which yielded up to a 3.7x performance improvement in the affected kernels of a proxy application on ATS-1. SIMD types are now deployed in the production code base.
This report documents the completion of milestone STPRO4-4 Kokkos back-ends research, collaborations, development, optimization, and documentation. The Kokkos team updated its existing backend to support the software stack and hardware of DOE's Sierra, Summit, and Astra machines. The team also collaborated with ECP PathForward vendors on developing backends for possible exascale architectures. Furthermore, the team ramped up its engagement with the ISO/C++ committee to accelerate the adoption of features important to the HPC community into the C++ standard.
This report documents the outcome of ASC ATDM Level 2 Milestone 6358: Assess Status of Next Generation Components and Physics Models in EMPIRE. This milestone is an assessment of the EMPIRE (ElectroMagnetic Plasma In Realistic Environments) application and three software components. The assessment focuses on the electromagnetic and electrostatic particle-in-cell solutions in EMPIRE and on its associated solver, time integration, and checkpoint-restart components. This information provides a clear picture of the current status of the EMPIRE application and will help guide FY19 work to prepare the application for the ASC ATDM L1 Milestone in FY20. The assessment makes clear that linear solver performance must be a focus in FY19.
With the increased scale expected on future leadership-class systems, detailed information about the resource usage and performance of MPI message matching provides important insights into how to maintain application performance on next-generation systems. However, obtaining MPI message matching performance data is often not possible without significant effort. A common approach is to instrument an MPI implementation to collect relevant statistics. While this approach can provide important data, collecting matching data at runtime perturbs the application's execution, including its matching performance, and is highly dependent on the MPI library's matchlist implementation. In this paper, we introduce a trace-based simulation approach to obtain detailed MPI message matching performance data for MPI applications without perturbing their execution. Using a number of key parallel workloads and microbenchmarks, we demonstrate that this simulator approach can rapidly and accurately characterize matching behavior. Specifically, we use our simulator to collect several important statistics about the operation of the MPI posted and unexpected queues. For example, we present data about search lengths and the duration that messages spend in the queues waiting to be matched. Data gathered using this simulation-based approach have significant potential to aid hardware designers in determining resource allocation for MPI matching functions and provide application and middleware developers with insight into the scalability issues associated with MPI message matching.
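The posted/unexpected queue behavior that such a simulator replays can be sketched as follows. This is a minimal illustration, not the authors' simulator: the trace format `(kind, source, tag)` with `kind` in `{"recv", "arrive"}` is hypothetical, communicators are ignored, and `ANY` stands in for `MPI_ANY_SOURCE`/`MPI_ANY_TAG`. Both queues are searched front to back so the first matching entry wins, mirroring MPI's ordering rules, and the number of entries examined per event is recorded as the search length.

```python
ANY = -1  # wildcard, standing in for MPI_ANY_SOURCE / MPI_ANY_TAG


def matches(recv, msg):
    """True if a posted receive (src, tag) accepts a message (src, tag)."""
    r_src, r_tag = recv
    m_src, m_tag = msg
    return r_src in (ANY, m_src) and r_tag in (ANY, m_tag)


def simulate(trace):
    """Replay ("recv" | "arrive", src, tag) events through two FIFO queues.

    Returns search lengths (entries examined) for the posted receive
    queue (PRQ) and unexpected message queue (UMQ), plus leftover entries.
    """
    posted, unexpected = [], []
    stats = {"prq_search": [], "umq_search": []}
    for kind, src, tag in trace:
        if kind == "recv":
            # A newly posted receive first searches the unexpected queue.
            for i, msg in enumerate(unexpected):
                if matches((src, tag), msg):
                    stats["umq_search"].append(i + 1)
                    del unexpected[i]
                    break
            else:
                stats["umq_search"].append(len(unexpected))
                posted.append((src, tag))
        elif kind == "arrive":
            # An arriving message searches the posted receive queue.
            for i, recv in enumerate(posted):
                if matches(recv, (src, tag)):
                    stats["prq_search"].append(i + 1)
                    del posted[i]
                    break
            else:
                stats["prq_search"].append(len(posted))
                unexpected.append((src, tag))
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    return stats, posted, unexpected
```

Extending each queue entry with an arrival timestamp would give the queue residence times mentioned above in the same pass.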
This report summarizes the results of LDRD Exploratory Express project 211666-01, titled "Coupled Magnetic Spin Dynamics and Molecular Dynamics in a Massively Parallel Framework".
Research interest in developing computing systems that represent logic states using quantum mechanical observables has only increased in the few decades since the field's inception. Although quantum computers based on Josephson junction qubits have become commercially available in the last three years, there is also a significant research initiative to develop scalable quantum computers with so-called donor qubits. B. E. Kane first published a device implementation of a silicon-based quantum computer in 1998, which sparked a wave of follow-on advances because of the attractive nature of silicon-based computing [7]. Nearly all commercial computing systems that use classical binary logic are fabricated on silicon substrates, and silicon is inarguably the most mature material system for semiconductor devices; a silicon platform therefore makes it possible to couple classical and quantum bits on a single substrate. The process of growing silicon crystals and processing them into wafers is extremely robust and yields minimal impurities or structural defects.
In this work we propose an approach for accelerating Uncertainty Quantification (UQ) analysis in the context of multifidelity applications. For complex multiphysics applications, which often require a prohibitive computational cost for each evaluation, multifidelity UQ techniques try to accelerate the convergence of statistics by leveraging the information collected from a larger number of realizations of a lower-fidelity model. However, in the current state of the art, the performance of virtually all multifidelity UQ techniques depends on the correlation between the high- and low-fidelity models. In this work we design a multifidelity UQ framework based on the identification of independent important directions for each model. The main idea is that if the responses of each model can be represented in a common space, this shared space can be used to enhance the correlation when the samples are drawn with respect to it instead of the original variables. Two additional advantages follow from this approach. First, the models might be correlated even if their original parametrizations are chosen independently. Second, if the shared space has a lower dimensionality than the original spaces, the UQ analysis might also benefit from dimension reduction. We designed this general framework and tested it on several problems, ranging from analytical functions for verification purposes to more challenging applications such as an aero-thermo-structural analysis and a scramjet flow analysis.
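The dependence on correlation is visible in the classic two-model control-variate estimator, sketched below as a standard multifidelity Monte Carlo building block rather than the framework proposed here. The functions `f_hi` and `f_lo` are hypothetical stand-ins for high- and low-fidelity models, and the input is assumed uniform on [0, 1]. With N shared samples and M ≥ N low-fidelity samples, the variance of this estimator relative to plain Monte Carlo with N samples is roughly 1 - (1 - N/M) ρ², so the benefit vanishes as the correlation ρ goes to zero.

```python
import math
import random
from statistics import fmean


def f_hi(x):
    """Hypothetical high-fidelity model."""
    return math.sin(math.pi * x)


def f_lo(x):
    """Hypothetical cheap low-fidelity model."""
    return 4.0 * x * (1.0 - x)


def cv_estimate(n_hi, n_lo, seed=0):
    """Control-variate estimate of E[f_hi] and the sample correlation rho."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n_lo)]  # first n_hi samples are shared
    hi = [f_hi(x) for x in xs[:n_hi]]
    lo_shared = [f_lo(x) for x in xs[:n_hi]]
    lo_all = [f_lo(x) for x in xs]
    mh, ml = fmean(hi), fmean(lo_shared)
    cov = sum((a - mh) * (b - ml) for a, b in zip(hi, lo_shared)) / (n_hi - 1)
    var_hi = sum((a - mh) ** 2 for a in hi) / (n_hi - 1)
    var_lo = sum((b - ml) ** 2 for b in lo_shared) / (n_hi - 1)
    alpha = cov / var_lo  # variance-minimizing control-variate weight
    rho = cov / math.sqrt(var_hi * var_lo)
    # Correct the high-fidelity mean using the better-resolved low-fidelity mean.
    estimate = mh + alpha * (fmean(lo_all) - ml)
    return estimate, rho
```

For these toy functions the correlation is close to 1 (about 0.998 analytically), so `cv_estimate(100, 10000)` should land close to E[f_hi] = 2/π with only 100 high-fidelity evaluations.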
There has been much interest in leveraging the topological order of materials for quantum information processing. Among the various solid-state systems, one-dimensional topological superconductors made from strongly spin-orbit-coupled nanowires have been shown to be among the most promising material platforms. In this project, we investigated the feasibility of turning silicon, a non-topological semiconductor with weak spin-orbit coupling, into a one-dimensional topological superconductor. Our theoretical analysis showed that it is indeed possible to create a sizable effective spin-orbit gap in the energy spectrum of a ballistic one-dimensional electron channel in silicon with the help of nano-magnet arrays. Experimentally, we developed the magnetic materials needed to fabricate such nano-magnets, characterized their magnetic behavior at low temperatures, and successfully demonstrated the magnetization configuration required to open the spin-orbit gap. Our results pave the way toward a practical topological quantum computing platform using silicon, one of the most technologically mature electronic materials.
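How a nano-magnet array can mimic spin-orbit coupling can be illustrated with a standard textbook model, which is an assumption here and not necessarily the model analyzed in this project. Take an electron of effective mass m in a 1D channel, subject to a field from a periodic magnet array that rotates in the x-y plane with wavevector q (period 2π/q) and Zeeman amplitude b. The spin rotation U = exp(-i q x σ_z / 2) removes the spatial dependence of the field and generates a term linear in momentum that acts like a spin-orbit coupling:

```latex
H = \frac{p^2}{2m} + b\left[\cos(qx)\,\sigma_x + \sin(qx)\,\sigma_y\right],
\qquad
U^\dagger H U = \frac{\left(p - \hbar q\,\sigma_z/2\right)^2}{2m} + b\,\sigma_x
= \frac{p^2}{2m} - \frac{\hbar q}{2m}\,p\,\sigma_z + \frac{\hbar^2 q^2}{8m} + b\,\sigma_x,

E_\pm(k) = \frac{\hbar^2 k^2}{2m} + \frac{\hbar^2 q^2}{8m}
\pm \sqrt{\left(\frac{\hbar^2 q\,k}{2m}\right)^2 + b^2}.
```

The term proportional to p σ_z plays the role of a Rashba-like coupling with strength ħ²q/(2m), and at k = 0 the two branches are separated by a gap of 2b, which is the kind of effective spin-orbit gap referred to above.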