Publications

A spatially adaptive high-order meshless method for fluid–structure interactions

Computer Methods in Applied Mechanics and Engineering

Hu, Wei; Trask, Nathaniel A.; Hu, Xiaozhe; Pan, Wenxiao

We present a scheme implementing an a posteriori refinement strategy in the context of a high-order meshless method for problems involving point singularities and fluid–solid interfaces. The generalized moving least squares (GMLS) discretization used in this work has been previously demonstrated to provide high-order compatible discretization of the Stokes and Darcy problems, offering a high-fidelity simulation tool for problems with moving boundaries. The meshless nature of the discretization is particularly attractive for adaptive h-refinement, especially when resolving the near-field aspects of variables and point singularities governing lubrication effects in fluid–structure interactions. We demonstrate that the resulting spatially adaptive GMLS method is able to achieve optimal convergence in the presence of singularities for both the div-grad and Stokes problems. Further, we present a series of simulations for flows of colloid suspensions, in which the refinement strategy efficiently achieved highly accurate solutions, particularly for colloids with complex geometries.
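
The adaptive strategy described above follows the usual solve-estimate-mark-refine cycle. The sketch below is only a generic illustration of that loop, not the paper's GMLS implementation; solve, estimate_error, and refine are hypothetical callables, and the Doerfler-style marking fraction theta is an assumed choice.

```python
# Generic a posteriori refinement loop (illustrative sketch only; the GMLS
# discretization and error indicators of the paper are not reproduced here).
def adaptive_solve(points, solve, estimate_error, refine,
                   theta=0.5, tol=1e-6, max_iters=20):
    """solve(points) -> solution; estimate_error(points, u) -> per-point error
    indicators; refine(points, marked) -> locally h-refined point cloud."""
    u = None
    for _ in range(max_iters):
        u = solve(points)
        eta = estimate_error(points, u)
        total = sum(eta)
        if total < tol:
            break
        # Doerfler marking: refine the smallest set of points that carries a
        # fixed fraction theta of the total estimated error.
        order = sorted(range(len(eta)), key=lambda i: -eta[i])
        marked, accumulated = [], 0.0
        for i in order:
            marked.append(i)
            accumulated += eta[i]
            if accumulated >= theta * total:
                break
        points = refine(points, marked)
    return u, points
```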

More Details

Finite-element modeling for an explosively loaded ferroelectric generator

Niederhaus, John H.; Yang, Pin Y.; DiAntonio, Christopher D.; Vunni, George

A preliminary finite-element model has been developed using the ALEGRA-FE code for explosively driven depoling of a PZT 95/5 ferroelectric generator. The ferroelectric material is characterized using hysteresis-loop and hydrostatic depoling tests. These characteristics are incorporated into ALEGRA-FE simulations that model the explosive drive mechanism and shock environment in the material leading to depoling, as well as the ferroelectric response and the behavior of a coupled circuit. The ferroelectric-to-antiferroelectric phase transition is captured, producing an output voltage pulse that matches experimental data to within 10% in rise time, and to within about 15% for the final voltage. Both experimental and modeled pulse magnitudes are less than the theoretical maximum output of the material. Observations from materials characterization suggest that unmodeled effects such as trapped charge in the stored FEG material may have influenced the experimentally observed output.

More Details

Aspherical particle models for molecular dynamics simulation

Computer Physics Communications

Plimpton, Steven J.

In traditional molecular dynamics (MD) simulations, atoms and coarse-grained particles are modeled as point masses interacting via isotropic potentials. For studies where particle shape plays a vital role, more complex models are required. In this paper we describe a spectrum of approaches for modeling aspherical particles, all of which are now available (some recently) as options within the LAMMPS MD package. Broadly these include two classes of models. In the first, individual particles are aspherical, either via a pairwise anisotropic potential which implicitly assigns a simple geometric shape to each particle, or in a more general way where particles store internal state which can explicitly define a complex geometric shape. In the second class of models, individual particles are simple points or spheres, but rigid body constraints are used to create composite aspherical particles in a variety of complex shapes. We discuss parallel algorithms and associated data structures for both kinds of models, which enable dynamics simulations of aspherical particle systems across a wide range of length and time scales. We also highlight parallel performance and scalability and give a few illustrative examples of aspherical models in different contexts.

More Details

Data-driven high-fidelity 2D microstructure reconstruction via non-local patch-based image inpainting

Acta Materialia

Laros, James H.; Tran, Hoang

Microstructure reconstruction problems are usually limited to representations with finitely many phases, e.g. binary or ternary. However, microstructure images obtained experimentally, for example with a microscope, are often represented as RGB or grayscale images. Because the phase-based representation is discrete, more rigid, and less flexible for modeling the microstructure than an RGB or grayscale image, information is lost in the conversion. In this paper, a microstructure reconstruction method that produces images at the fidelity of experimental microscopy, i.e. RGB or grayscale, is proposed without introducing any physics-based microstructure descriptor. Furthermore, the image texture is preserved and the microstructure is represented with continuous variables (as in RGB or grayscale images) instead of binary or categorical variables, resulting in a high-fidelity reconstructed microstructure image. The method's principal advantage is its reconstruction quality, and it can be applied to any binary or multiphase 2D microstructure. The proposed method can be thought of as a subsampling approach that expands a microstructure dataset while preserving its image texture. Moreover, the size of the reconstructed image is more flexible than in other machine-learning microstructure reconstruction methods, where the size must be fixed beforehand. In addition, the proposed method is capable of joining microstructure images taken at different locations to reconstruct a larger microstructure image. A significant advantage of the proposed method is that it remedies the data scarcity problem in materials science, where experimental data are scarce and hard to obtain. The proposed method can also be applied to generate statistically equivalent microstructures, which has strong implications for microstructure-related uncertainty quantification applications. The proposed microstructure reconstruction method is demonstrated with the UltraHigh Carbon Steel micrograph DataBase (UHCSDB).
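
As a rough illustration of non-local patch-based inpainting, the sketch below fills the unknown pixels of one target patch by copying the best-matching, fully known source patch under an L2 distance on the known pixels. It is a minimal stand-in under assumed inputs (a 2D grayscale numpy image and a boolean mask of missing pixels), not the reconstruction algorithm of the paper.

```python
import numpy as np

def fill_patch(image, mask, center, patch=9, stride=4):
    """Fill the masked pixels of the patch centered at `center` by copying the
    best-matching, fully known source patch (L2 distance over known pixels).
    `image` is a 2D grayscale array; `mask` is True where pixels are missing."""
    h = patch // 2
    r, c = center
    target = image[r - h:r + h + 1, c - h:c + h + 1].astype(float)
    known = ~mask[r - h:r + h + 1, c - h:c + h + 1]
    hole = ~known
    best, best_dist = None, np.inf
    H, W = image.shape
    for i in range(h, H - h, stride):            # coarse stride keeps the demo cheap
        for j in range(h, W - h, stride):
            if mask[i - h:i + h + 1, j - h:j + h + 1].any():
                continue                          # only fully known source patches
            cand = image[i - h:i + h + 1, j - h:j + h + 1].astype(float)
            dist = np.sum((cand[known] - target[known]) ** 2)
            if dist < best_dist:
                best, best_dist = cand, dist
    filled = image.copy()
    filled[r - h:r + h + 1, c - h:c + h + 1][hole] = best[hole]
    return filled
```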

More Details

Sensitivity and Uncertainty Workflow of Full System SIERRA Models Supporting High Consequence Applications

Orient, George E.; Clay, Robert L.; Friedman-Hill, Ernest J.; Pebay, Philippe P.; Ridgway, Elliott M.

Credibility of end-to-end CompSim (Computational Simulation) models and their agile execution requires an expressive framework to describe, communicate and execute complex computational tool chains representing the model. All stakeholders, from systems engineering and customers through model developers and V&V partners, need views and functionalities of the workflow representing the model in a manner that is natural to their discipline. In the milestone and in this report we define a workflow as a network of computational simulation activities executed autonomously on a distributed set of computational platforms. The FY19 ASC L2 Milestone (6802) for the Integrated Workflow (IWF) project was designed to integrate and improve existing capabilities or develop new functionalities to provide a wide range of stakeholders with a coherent and intuitive platform capable of defining and executing CompSim modeling from analysis workflow definition to complex ensemble calculations. The main goal of the milestone was to advance the integrated workflow capabilities to support weapon system analysts with a production deployment in FY20. Ensemble calculations supporting program decisions include sensitivity analysis, optimization and uncertainty quantification. The goal of the L2 milestone, aligned with the ultimate goal of the IWF project, is to foster a cultural and technical shift toward an integrated CompSim capability based on automated workflows. Specific deliverables were defined in five broad categories: 1) Infrastructure, including development of distributed-computing workflow capability, 2) integration of Dakota (Sandia's sensitivity, optimization and UQ engine) with SAW (Sandia Analysis Workbench), 3) ARG (Automatic Report Generator, introspecting analysis artifacts and generating human-readable, extensible and archivable reports), 4) Libraries and Repositories aiding capability reuse, and 5) Exemplars to support training, capturing best practices and stress testing of the platform. A set of exemplars was defined to represent typical weapon system qualification CompSim projects. Analyzing the required capabilities and using the findings to plan their implementation ensured optimal allocation of development resources focused on production deployment after the L2 is completed. It was recognized early that the end-to-end modeling applications pose a considerable number of diverse risks, and a formal risk tracking process was implemented. The project leveraged products, capabilities and development tasks of IWF partners. SAW, Dakota, Cubit, Sierra, Slycat, and NGA (NexGen Analytics, a small business) contributed to the integrated platform developed during this milestone effort. New products delivered include: a) NGW (Next Generation Workflow) for robust workflow definition and execution, b) Dakota wizards, editor and results visualization, and c) the automatic report generator ARG. User engagement was initiated early in the development process, eliciting concrete requirements and actionable feedback, to ensure that the integrated CompSim capability will have high user acceptance and impact. The current integrated capabilities have been demonstrated and are continually being tested by a set of exemplars ranging from training scenarios to computationally demanding uncertainty analyses. The integrated workflow platform has been deployed on both the SRN (Sandia Restricted Network) and the SCN (Sandia Classified Network). Computational platforms on which the system has been demonstrated span from Windows (Creo, the CAD platform chosen by Sandia) to the Trinity HPC system (Sierra and CTH solvers). Follow-up work will focus on deployment at SNL and other sites in the nuclear enterprise (LLNL, KCNSC), along with training and consulting support to democratize the analysis agility, process health, and knowledge management benefits that the NGW platform provides.

More Details

Status Report on Uncertainty Quantification and Sensitivity Analysis Tools in the Geologic Disposal Safety Assessment (GDSA) Framework

Swiler, Laura P.; Helton, J.C.; Basurto, Eduardo B.; Brooks, Dusty M.; Mariner, Paul M.; Moore, Leslie M.; Mohanty, Sitakanta N.; Sevougian, Stephen D.; Stein, Emily S.

The Spent Fuel and Waste Science and Technology (SFWST) Campaign of the U.S. Department of Energy (DOE) Office of Nuclear Energy (NE), Office of Fuel Cycle Technology (FCT), is conducting research and development (R&D) on geologic disposal of spent nuclear fuel (SNF) and high-level nuclear waste (HLW). Two high priorities for SFWST disposal R&D are design concept development and disposal system modeling. These priorities are directly addressed in the SFWST Geologic Disposal Safety Assessment (GDSA) control account, which is charged with developing a geologic repository system modeling and analysis capability, and the associated software, GDSA Framework, for evaluating disposal system performance for nuclear waste in geologic media. GDSA Framework is supported by the SFWST Campaign and its predecessor, the Used Fuel Disposition (UFD) Campaign.

More Details

Page migration support for disaggregated non-volatile memories

ACM International Conference Proceeding Series

Kommareddy, Vamsee R.; Hammond, Simon D.; Hughes, Clayton H.; Samih, Ahmad; Awad, Amro

As demands for memory-intensive applications continue to grow, the memory capacity of each computing node is expected to grow at a similar pace. In high-performance computing (HPC) systems, the memory capacity per compute node is sized for the most demanding application likely to run on the system, and hence the average capacity per node in future HPC systems is expected to grow significantly. However, since HPC systems run many applications with different capacity demands, a large percentage of the overall memory capacity will likely be underutilized; each memory module can be thought of as private memory for its corresponding computing node. Thus, as HPC systems move toward the exascale era, better utilization of memory is strongly desired. Moreover, upgrading a memory system requires significant effort. Fortunately, disaggregated memory systems promise better utilization by defining regions of global memory, typically referred to as memory blades, which can be accessed by all computing nodes in the system. Disaggregated memory systems are expected to be built using dense, power-efficient memory technologies; thus, emerging non-volatile memories (NVMs) are positioning themselves as the main building blocks for such systems. However, NVMs are slower than DRAM. Therefore, it is expected that each computing node would have a small local memory based on either HBM or DRAM, whereas a large shared NVM would be accessible by all nodes. Managing such a system with global and local memory requires a novel hardware/software co-design to initiate page migration between global and local memory, maximizing performance while enabling access to a huge shared memory. In this paper we provide support to migrate pages and investigate such memory management aspects and the major system-level aspects that can affect design decisions in disaggregated NVM systems.
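
A toy version of the hardware/software co-design question (when to promote a page from the shared NVM pool into a node's small local memory) can be sketched with a simple access-count policy. The class below is a hypothetical illustration; the threshold, the LRU eviction, and the tier names are assumptions, not the mechanism proposed in the paper.

```python
class PageMigrator:
    """Toy hot-page promotion policy for a two-level memory (small local
    DRAM/HBM plus a large shared NVM pool): promote a page after `threshold`
    remote accesses, evicting the least-recently-used local page when full."""
    def __init__(self, local_capacity, threshold=8):
        self.local_capacity = local_capacity
        self.threshold = threshold
        self.access_count = {}     # page -> accesses observed while remote
        self.local = []            # LRU order: front = least recently used

    def access(self, page):
        if page in self.local:                 # hit in fast local memory
            self.local.remove(page)
            self.local.append(page)
            return "local"
        self.access_count[page] = self.access_count.get(page, 0) + 1
        if self.access_count[page] >= self.threshold:
            if len(self.local) >= self.local_capacity:
                victim = self.local.pop(0)     # demote LRU page back to NVM
                self.access_count[victim] = 0
            self.local.append(page)            # migrate (promote) the page
            return "migrated"
        return "remote"                        # serviced from shared NVM
```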

More Details

ROCM+Intel-PathForward+RemoteSpaces Development

Trott, Christian R.

This report documents the completion of milestone STPRO4-25: Harden and optimize the ROCm-based AMD GPU backend, develop a prototype backend for the Intel ECP PathForward architecture, and improve the existing prototype Remote Memory Space capabilities. The ROCm code was hardened to the point of passing all Kokkos unit tests; then AMD deprecated the programming model, forcing us to start over in FY20 with HIP. The Intel ECP PathForward architecture prototype was developed with some initial capabilities on simulators, but plans changed, so that work will not continue; instead, SYCL will be developed as a backend for Aurora. Remote Spaces was improved, and development is ongoing as part of a collaboration with NVIDIA.

More Details

Engage the ISO C++ Standard Committee

Trott, Christian R.

This report documents the completion of milestone STPRO4-26, Engaging the C++ Committee. The Kokkos team attended the three C++ Committee meetings in San Diego, Hawaii, and Cologne with multiple members, updated multiple in-flight proposals (e.g. mdspan, atomic_ref), contributed to numerous proposals central to future capabilities in C++ (e.g. executors, affinity), and organized a new effort to introduce a Basic Linear Algebra library into the C++ standard. We also implemented a production-quality version of mdspan as the basis for replacing the vast majority of the implementation of Kokkos::View, thus starting the transition of one of Kokkos's core features to its future replacement.

More Details

Performance Modeling of Vectorized SNAP Inter-Atomic Potentials on CPU Architectures

Blanco, Mark P.; Kim, Kyungjoo K.

SNAP potentials are inter-atomic potentials for molecular dynamics that enable simulations at accuracy levels comparable to density functional theory (DFT) at a fraction of the cost. As such, SNAP scales to on the order of 10^4 to 10^6 atoms. In this work, we explore CPU optimization of the potential computation using SIMD. We note that efficient use of SIMD is non-obvious, as the application features an irregular iteration space for the various potential terms, necessitating use of SIMD across atoms in a cross-matrix, batched fashion. We present a preliminary analytic model to determine the correct batch size for several CPU architectures across several vendors, and show end-to-end speedups between 1.66x and 3.22x compared to the original.
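
The batching idea (vectorizing across atoms rather than within one atom's irregular term loops) can be illustrated with a toy pair potential evaluated in fixed-size batches, where the batch size plays the role that the paper's analytic model tunes per architecture. numpy stands in for SIMD lanes; the Lennard-Jones form and fixed-width neighbor list are assumptions, not the SNAP terms.

```python
import numpy as np

def batched_energy(coords, neighbors, batch=8):
    """Evaluate a toy pair potential by processing `batch` atoms at a time,
    mimicking SIMD lanes laid across atoms. `coords` is (n_atoms, 3) and
    `neighbors` is a fixed-width (n_atoms, max_neigh) index array; the
    Lennard-Jones form is only a stand-in for the SNAP potential terms."""
    n = coords.shape[0]
    energy = np.zeros(n)
    for start in range(0, n, batch):
        idx = np.arange(start, min(start + batch, n))
        rij = coords[neighbors[idx]] - coords[idx][:, None, :]   # (b, m, 3)
        r2 = np.maximum(np.sum(rij * rij, axis=-1), 1e-12)        # avoid /0
        inv6 = 1.0 / r2 ** 3
        energy[idx] = np.sum(4.0 * (inv6 * inv6 - inv6), axis=-1)
    return energy
```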

More Details

Density Functional Theory Applied to Transition Metal Elements and Binaries: Development, Application, and Results of the V-DM/16 Test Set

Decolvenaere, Elizabeth D.; Wills, Ann E.

Density functional theory (DFT) is undergoing a shift from a descriptive to a predictive tool in the field of solid-state physics, heralded by a spike in “high-throughput” studies. However, methods to rigorously evaluate the validity and accuracy of these studies are lacking, raising serious questions when simulation and experiment disagree. In response, we have developed the V-DM/16 test set, designed to evaluate the experimental accuracy of DFT’s various implementations for periodic transition metal solids. Our test set evaluates 26 transition metal elements and 80 transition metal alloys across three physical observables: lattice constants, elastic coefficients, and formation energies of alloys. Whether or not a functional can accurately evaluate the formation energy offers key insights into whether the relevant physics are being captured in a simulation, an especially important question in transition metals, where active d-electrons can thwart the accuracy of an otherwise well-performing functional. Our test set captures a wide variety of cases where the unique physics present in transition metal binaries can undermine the effectiveness of “traditional” functionals. By application of the V-DM/16 test set, we aim to better characterize the performance of existing functionals on transition metals, and to offer a new tool to rigorously evaluate the performance of new functionals in the future.

More Details

Towards Multifluid Multiphysics Continuum Plasma Simulation for Modeling Magnetically-driven Experiments on Z

Shadid, John N.

Magnetically driven experiments supporting pulsed-power utilize a wide range of configurations, including wire-arrays, gas-puffs, flyer plates, and cylindrical liners. This experimental flexibility is critical to supporting radiation effects, dynamic materials, magneto-inertial-fusion (MIF), and basic high energy density laboratory physics (HEDP) efforts. Ultimately, the rate at which these efforts progress is limited by our understanding of the complex plasma physics of these systems. Our effort has been to begin to develop an advanced algorithmic structure and an R&D code implementation for a plasma physics simulation capability based on the five-moment multi-fluid / full-Maxwell plasma model. This model allows for inclusion of multiple fluid species (e.g., electrons, multiple-charge-state ions, and neutrals), generalized collisional interactions between species, models for ionization/recombination, magnetized Braginskii collisional transport, and dissipative effects, and can be readily extended to incorporate radiation transport physics. In the context of pulsed-power simulations, this advanced model will help allow SNL to computationally simulate the dense continuum regions of the physical load (e.g. liner implosions, flyer plates) as well as partial power-flow losses in the final gap region of the inner MITL. In this report we briefly summarize results of applying a preliminary version of this model to verification-type problems and to some initial magnetic-implosion-relevant prototype problems. The MIF-relevant prototype problems include results from fully implicit / implicit-explicit (IMEX) resistive MHD as well as full multifluid EM plasma formulations.

More Details

Dynamical System for Resilient Computing

Rothganger, Fredrick R.; Hoemmen, Mark F.; Phipps, Eric T.; Warrender, Christina E.

The effort to develop larger-scale computing systems introduces a set of related challenges: large machines are more difficult to synchronize; the sheer quantity of hardware introduces more opportunities for errors; and new approaches to hardware, such as low-energy or neuromorphic devices, are not directly programmable by traditional methods.

More Details

Monitoring, Understanding, and Predicting the Growth of Methane Emissions in the Arctic

Bambha, Ray B.; Lafranchi, Brian W.; Schrader, Paul E.; Roesler, Erika L.; Taylor, Mark A.; Lucero, Daniel A.; Ivey, Mark D.; Michelsen, Hope A.

Concern over Arctic methane (CH4) emissions has increased following recent discoveries of poorly understood sources and predictions that methane emissions from known sources will grow as Arctic temperatures increase. New efforts are required to detect increases and explain sources without being confounded by the multiple sources. Methods for distinguishing different sources are critical. We conducted measurements of atmospheric methane and source tracers and performed baseline global atmospheric modeling to begin assessing the climate impact of changes in atmospheric methane. The goal of this project was to address uncertainties in Arctic methane sources and their potential impact on climate by (1) deploying newly developed trace-gas analyzers for measurements of methane, methane isotopologues, ethane, and other tracers of methane sources in Barrow, AK, (2) characterizing methane sources using high-resolution atmospheric chemical transport models and tracer measurements, and (3) modeling Arctic climate using the state-of-the-art high-resolution Spectral Element Community Atmosphere Model (CAM-SE).

More Details

Evaluating the Opportunities for Multi-Level Memory - An ASC 2016 L2 Milestone

Voskuilen, Gwendolyn R.; Frank, Michael P.; Hammond, Simon D.; Rodrigues, Arun

As new memory technologies appear on the market, there is a growing push to incorporate them into future architectures. Compared to traditional DDR DRAM, these technologies provide appealing advantages such as increased bandwidth or non-volatility. However, the technologies have significant downsides as well including higher cost, manufacturing complexity, and for non-volatile memories, higher latency and wear-out limitations. As such, no technology has emerged as a clear technological and economic winner. As a result, systems are turning to the concept of multi-level memory, or mixing multiple memory technologies in a single system to balance cost, performance, and reliability.

More Details

Progress in Deep Geologic Disposal Safety Assessment in the U.S. since 2010

Mariner, Paul M.; Connolly, Laura A.; Cunningham, Leigh C.; Debusschere, Bert D.; Dobson, David C.; Frederick, Jennifer M.; Hammond, Glenn E.; Jordan, Spencer H.; LaForce, Tara; Nole, Michael A.; Park, Heeho D.; Laros, James H.; Rogers, Ralph D.; Seidl, Daniel T.; Sevougian, Stephen D.; Stein, Emily S.; Swift, Peter N.; Swiler, Laura P.; Vo, Jonathan; Wallace, Michael G.

The Spent Fuel and Waste Science and Technology (SFWST) Campaign of the U.S. Department of Energy (DOE) Office of Nuclear Energy (NE), Office of Spent Fuel & Waste Disposition (SFWD) is conducting research and development (R&D) on geologic disposal of spent nuclear fuel (SNF) and high-level nuclear waste (HLW). Two high priorities for SFWST disposal R&D are design concept development and disposal system modeling (DOE 2011, Table 6). These priorities are directly addressed in the SFWST Geologic Disposal Safety Assessment (GDSA) work package, which is charged with developing a disposal system modeling and analysis capability for evaluating disposal system performance for nuclear waste in geologic media.

More Details

An Overview of the Gradient-Based Local DIC Formulation for Motion Estimation in DICe

Turner, Daniel Z.

This document outlines the gradient-based digital image correlation (DIC) formulation used in DICe, the Digital Image Correlation Engine (Sandia’s open source DIC code). The gradient-based algorithm implemented in DICe directly reflects the formulation presented here. Every effort is made to point out any simplifications or assumptions involved in the implementation. The focus of this document is on determination of the motion parameters. Computing strain is not discussed herein.

More Details

A Surety Engineering Framework and Process to Address Ethical, Legal, and Social Issues for Artificial Intelligence

Shaneyfelt, Wendy S.; Feddema, John T.; James, Conrad D.

More Details

Predictive Science ASC Alliance Program (PSAAP) II 2016 Review of the Carbon Capture Multidisciplinary Science Center (CCMSC) at the University of Utah

Hoekstra, Robert J.; Ruggirello, Kevin P.

The review was conducted on May 9-10, 2016 at the University of Utah. Overall the review team was impressed with the work presented and found that the CCMSC had met or exceeded the Year 2 milestones. Specific details, comments and recommendations are included in this document.

More Details

Center for Computing Research Highlights

Hendrickson, Bruce A.; Alvin, Kenneth F.; Miller, Leann A.; Collis, Samuel S.

Sandia has a legacy of leadership in the advancement of high performance computing (HPC) at extreme scales. First-of-a-kind scalable distributed-memory parallel platforms such as the Intel Paragon, ASCI Red (the world’s first teraflops computer), and Red Storm (co-developed with Cray) helped form the basis for one of the most successful supercomputer product lines ever: the Cray XT series. Sandia also has pioneered system software elements—including lightweight operating systems, the Portals network programming interface, advanced interconnection network designs, and scalable I/O— that are critical to achieving scalability on large computing systems.

More Details

Abstract Machine Models and Proxy Architectures for Exascale Computing

Ang, James A.; Barrett, Richard F.; Benner, R.E.; Burke, Daniel; Chan, Cy; Cook, Jeanine C.; Daley, Christopher S.; Donofrio, David; Hammond, Simon D.; Hemmert, Karl S.; Hoekstra, Robert J.; Ibrahim, Khaled; Kelly, Suzanne M.; Le, Hoang; Leung, Vitus J.; Michelogiannakis, George; Resnick, David R.; Rodrigues, Arun; Shalf, John; Stark, Dylan; Unat, D.; Wright, Nick J.; Voskuilen, Gwendolyn R.

To achieve exascale computing, fundamental hardware architectures must change. The most significant consequence of this assertion is the impact on the scientific and engineering applications that run on current high performance computing (HPC) systems, many of which codify years of scientific domain knowledge and refinements for contemporary computer systems. In order to adapt to exascale architectures, developers must be able to reason about new hardware and determine what programming models and algorithms will provide the best blend of performance and energy efficiency into the future. While many details of the exascale architectures are undefined, an abstract machine model is designed to allow application developers to focus on the aspects of the machine that are important or relevant to performance and code structure. These models are intended as communication aids between application developers and hardware architects during the co-design process. We use the term proxy architecture to describe a parameterized version of an abstract machine model, with the parameters added to elucidate potential speeds and capacities of key hardware components. These more detailed architectural models are formulated to enable discussion between the developers of analytic models and simulators and computer hardware architects. They allow for application performance analysis and hardware optimization opportunities. In this report our goal is to provide the application development community with a set of models that can help software developers prepare for exascale. In addition, through the use of proxy architectures, we can enable a more concrete exploration of how well new and evolving application codes map onto future architectures. This second version of the document addresses system scale considerations and provides a system-level abstract machine model with proxy architecture information.

More Details

Evaluating tradeoffs between MPI message matching offload hardware capacity and performance

ACM International Conference Proceeding Series

Levy, Scott L.; Ferreira, Kurt B.

Although its demise has been frequently predicted, the Message Passing Interface (MPI) remains the dominant programming model for scientific applications running on high-performance computing (HPC) systems. MPI specifies powerful semantics for interprocess communication that have enabled scientists to write applications for simulating important physical phenomena. However, these semantics have also presented several significant challenges. For example, the existence of wildcard values has made the efficient enforcement of MPI message matching semantics challenging. Significant research has been dedicated to accelerating MPI message matching. One common approach has been to offload matching to dedicated hardware. One of the challenges that hardware designers have faced is knowing how to size hardware structures to accommodate outstanding match requests. Applications that exceed the capacity of specialized hardware typically must fall back to storing match requests in bulk memory, e.g. DRAM on the host processor. In this paper, we examine the implications of hardware matching and develop guidance on sizing hardware matching structures to strike a balance between minimizing expensive dedicated hardware resources and overall matching performance. By examining the message matching behavior of several important HPC workloads, we show that when specialized hardware matching is not dramatically faster than matching in memory, the offload hardware's match queue capacity can be reduced without significantly increasing match time. On the other hand, effectively exploiting the benefits of very fast specialized matching hardware requires sufficient storage resources to ensure that every search completes in the specialized hardware. The data and analysis in this paper provide important guidance for designers of MPI message matching hardware.
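
The capacity/performance tradeoff can be captured with a back-of-the-envelope cost model: a match found N entries deep searches min(N, C) entries in the offload hardware and the remainder in host memory. The function below is only an illustrative model with hypothetical per-entry costs, not data or a model from the paper.

```python
def match_cost(queue_depth_at_match, hw_capacity, t_hw=1.0, t_mem=10.0):
    """Estimated time to find a posted receive `queue_depth_at_match` entries
    deep when the first `hw_capacity` entries live in offload hardware
    (searched at t_hw per entry) and the overflow spills to host memory
    (t_mem per entry). All parameters are illustrative, not measurements."""
    searched_in_hw = min(queue_depth_at_match, hw_capacity)
    searched_in_mem = max(queue_depth_at_match - hw_capacity, 0)
    return searched_in_hw * t_hw + searched_in_mem * t_mem

# Example: with a 64-entry offload queue, a match found 100 entries deep costs
# 64 hardware probes plus 36 memory probes; unless hardware search is much
# faster than memory search, modest queue capacities are often sufficient.
print(match_cost(100, 64))
```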

More Details

Developing and evaluating Malliavin estimators for intrusive sensitivity analysis of Monte Carlo radiation transport

Bond, Stephen D.; Franke, Brian C.; Lehoucq, Richard B.; Smith, John D.

We will develop Malliavin estimators for Monte Carlo radiation transport by formulating the governing jump stochastic differential equation and deriving the applicable estimators that produce sensitivities for our equations. Efficient and effective sensitivity estimates can be used for design optimization and uncertainty quantification, with broad utility in radiation environments. The technology demonstration will lower development risk for other particle-based simulation methods.

More Details

Rigorous Data Fusion for Computationally Expensive Simulations

Winovich, Nickolas W.; Rushdi, Ahmad R.; Phipps, Eric T.; Ray, Jaideep R.; Lin, Guang; Ebeida, Mohamed S.

This manuscript comprises the final report for the 1-year, FY19 LDRD project "Rigorous Data Fusion for Computationally Expensive Simulations," wherein an alternative approach to Bayesian calibration was developed based on a new sampling technique called VoroSpokes. VoroSpokes is a novel quadrature and sampling framework defined with respect to Voronoi tessellations of bounded domains in $R^d$, developed within this project. In this work, we first establish local quadrature and sampling results on convex polytopes using randomly directed rays, or spokes, to approximate the quantities of interest for a specified target function. A theoretical justification for both procedures is provided along with empirical results demonstrating the unbiased convergence in the resulting estimates/samples. The local quadrature and sampling procedures are then extended to global procedures defined on more general domains by applying the local results to the cells of a Voronoi tessellation covering the domain under consideration. We then demonstrate how the proposed global sampling procedure can be used to define a natural framework for adaptively constructing Voronoi Piecewise Surrogate (VPS) approximations based on local error estimates. Finally, we show that the adaptive VPS procedure can be used to form a surrogate model approximation to a specified, potentially unnormalized, density function, and that the global sampling procedure can be used to efficiently draw independent samples from the surrogate density in parallel. The performance of the resulting VoroSpokes sampling framework is assessed on a collection of Bayesian inference problems and is shown to provide highly accurate posterior predictions which align with the results obtained using traditional methods such as Gibbs sampling and random-walk Markov Chain Monte Carlo (MCMC). Importantly, the proposed framework provides a foundation for performing Bayesian inference tasks which is entirely independent of the theory of Markov chains.

More Details

Kokkos Training Bootcamp

Trott, Christian R.

This report documents the completion of milestone STPM12-17, Kokkos Training Bootcamp. The goal of this milestone was to hold a combined tutorial and hackathon bootcamp event for the Kokkos community and prospective users. The Kokkos Bootcamp event was held at Argonne National Laboratory from August 27-29, 2019. Because attendance was lower than expected (we believe largely due to bad timing), the team focused with a select set of ECP partners on early work in preparation for Aurora. In particular, we evaluated issues posed by exposing SYCL and OpenMP target offload to applications via the Kokkos programming model.

More Details

Balar: A SST GPU Component for Performance Modeling and Profiling

Hughes, Clayton H.; Hammond, Simon D.; Khairy, Mahmoud; Zhang, Mengchi; Green, Roland; Rogers, Timothy; Hoekstra, Robert J.

Programmable accelerators have become commonplace in modern computing systems. Advances in programming models and the availability of massive amounts of data have created a space for massively parallel accelerators capable of maintaining context for thousands of concurrent threads resident on-chip. These threads are grouped and interleaved on a cycle-by-cycle basis among several massively parallel computing cores. One path for the design of future supercomputers relies on an ability to model the performance of these massively parallel cores at scale. The SST framework has been proven to scale up to run simulations containing tens of thousands of nodes. A previous report described the initial integration of the open-source, execution-driven GPU simulator, GPGPU-Sim, into the SST framework. This report discusses the results of the integration and how to use the new GPU component in SST. It also provides examples of what it can be used to analyze and a correlation study showing how closely the execution matches that of an Nvidia V100 GPU when running kernels and mini-apps.

More Details

Almost optimal classical approximation algorithms for a quantum generalization of max-cut

Leibniz International Proceedings in Informatics, LIPIcs

Gharibian, Sevag; Parekh, Ojas D.

Approximation algorithms for constraint satisfaction problems (CSPs) are a central direction of study in theoretical computer science. In this work, we study classical product state approximation algorithms for a physically motivated quantum generalization of Max-Cut, known as the quantum Heisenberg model. This model is notoriously difficult to solve exactly, even on bipartite graphs, in stark contrast to the classical setting of Max-Cut. Here we show, for any interaction graph, how to classically and efficiently obtain approximation ratios of 0.649 (anti-ferromagnetic XY model) and 0.498 (anti-ferromagnetic Heisenberg XYZ model). These are almost optimal; we show that the best possible ratios achievable by a product state for these models are 2/3 and 1/2, respectively.

More Details

An Approach to Upscaling SPPARKS Generated Synthetic Microstructures of Additively Manufactured Metals

Mitchell, John A.

Additive manufacturing (AM) of metal parts can save time, energy, and produce parts that cannot otherwise be made with traditional machining methods. Near final part geometry is the goal for AM, but material microstructures are inherently different from those of wrought materials as they arise from a complex temperature history associated with the additive process. It is well known that strength and other properties of interest in engineering design follow from microstructure and temperature history. Because of complex microstructure morphologies and spatial heterogeneities, properties are heterogeneous and reflect underlying microstructure. This report describes a method for distributing properties across a finite element mesh so that effects of complex heterogeneous microstructures arising from additive manufacturing can be systematically incorporated into engineering scale calculations without the need for conducting a nearly impossible and time consuming effort of meshing material details. Furthermore, the method reflects the inherent variability in AM materials by making use of kinetic Monte Carlo calculations to model the AM process associated with a build.

More Details

Incremental Interval Assignment (IIA) for Scalable Mesh Preparation

Mitchell, Scott A.

Interval Assignment (IA) means selecting the number of mesh edges for each CAD curve. IIA is a discrete algorithm over integers. A priority queue iteratively selects compatible sets of intervals to increase in lock-step by integers. In contrast, the current capability in Cubit is floating-point Linear Programming with Branch-and-Bound for integerization (BBIA).

More Details

Linear algebra-based triangle counting via fine-grained tasking on heterogeneous environments (Update on Static Graph Challenge)

2019 IEEE High Performance Extreme Computing Conference, HPEC 2019

Yasar, Abdurrahman Y.; Rajamanickam, Sivasankaran R.; Berry, Jonathan W.; Acer, Seher A.; Wolf, Michael W.; Young, Jeffrey G.; Catalyurek, Umit V.

Triangle counting is a representative graph problem that shows the challenges of improving graph algorithm performance using algorithmic techniques and adapting graph algorithms to new architectures. In this paper, we describe an update to the linear-algebraic formulation of the triangle counting problem. Our new approach relies on fine-grained tasking based on a tile layout. We adapt this task-based algorithm to heterogeneous architectures (CPUs and GPUs) for up to a 10.8x speedup over the past year's graph challenge submission. This implementation also results in the fastest kernel time known at time of publication for real-world graphs like twitter (3.7 seconds) and friendster (1.8 seconds) on GPU accelerators when the graph is GPU resident. This is a 1.7x and 1.2x improvement over the previous state-of-the-art triangle counting on GPUs. We also improved end-to-end execution time by overlapping computation and communication of the graph to the GPUs. In terms of end-to-end execution time, our implementation also achieves the fastest end-to-end times due to very low overhead costs.
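
The linear-algebraic formulation referred to above can be stated compactly: with L the strictly lower-triangular part of the adjacency matrix, the triangle count is sum((L @ L) .* L), i.e. wedges that are closed by an existing edge. The scipy sketch below shows only that formulation; the tiled layout, fine-grained tasking, and GPU offload of the paper are not represented.

```python
import numpy as np
import scipy.sparse as sp

def count_triangles(A):
    """Triangle count of an undirected graph from its symmetric 0/1 adjacency
    matrix A, via the masked sparse product sum((L @ L) .* L) on the strictly
    lower-triangular part L."""
    L = sp.tril(A, k=-1).tocsr()
    closed_wedges = (L @ L).multiply(L)   # wedges closed by an edge already in L
    return int(closed_wedges.sum())

# Tiny check: a 4-clique contains 4 triangles.
A = sp.csr_matrix(np.ones((4, 4)) - np.eye(4))
print(count_triangles(A))                 # -> 4
```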

More Details

Distributed-memory lattice H-matrix factorization

International Journal of High Performance Computing Applications

Yamazaki, Ichitaro Y.; Ida, Akihiro; Yokota, Rio; Dongarra, Jack

We parallelize the LU factorization of a hierarchical low-rank matrix (H-matrix) on a distributed-memory computer. This is much more difficult than the H-matrix-vector multiplication due to the dataflow of the factorization, and it is much harder than the parallelization of a dense matrix factorization due to the irregular hierarchical block structure of the matrix. The block low-rank (BLR) format gets rid of the hierarchy and simplifies the parallelization, often increasing concurrency. However, this comes at the price of losing the near-linear complexity of the H-matrix factorization. In this work, we propose to factorize the matrix using a “lattice H-matrix” format that generalizes the BLR format by storing each of the blocks (both diagonal and off-diagonal) in the H-matrix format. These blocks stored in the H-matrix format are referred to as lattices. Thus, the lattice format aims to combine the parallel scalability of the BLR factorization with the near-linear complexity of the H-matrix factorization. We first compare factorization performance using the H-matrix, BLR, and lattice H-matrix formats under various conditions on a shared-memory computer. Our performance results show that the lattice format has storage and computational complexities similar to those of the H-matrix format, and hence a much lower cost of factorization than BLR. We then compare the BLR and lattice H-matrix factorizations on distributed-memory computers. Our performance results demonstrate that, compared with BLR, the lattice format with its lower cost of factorization may lead to faster factorization on the distributed-memory computer.
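
The storage primitive shared by the BLR and lattice H-matrix formats is the compression of admissible (well-separated) blocks into low-rank factors. The snippet below is a minimal illustration using a truncated SVD with a relative tolerance; the example block, kernel, and tolerance are assumptions for the demo, and the hierarchical/lattice bookkeeping of the paper is not shown.

```python
import numpy as np

def compress_block(B, tol=1e-8):
    """Replace a dense off-diagonal block B with low-rank factors U, V such
    that ||B - U @ V||_2 <= tol * ||B||_2, the basic storage primitive behind
    BLR and (lattice) H-matrix formats."""
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    keep = s > tol * (s[0] if s.size else 1.0)
    r = int(np.count_nonzero(keep))
    return U[:, :r] * s[:r], Vt[:r, :]          # (m x r) and (r x n) factors

# A smooth kernel evaluated on well-separated point sets is numerically low rank.
x = np.linspace(0.0, 1.0, 200)
B = 1.0 / (1.0 + np.abs(x[:, None] - (x[None, :] + 2.0)))
U, V = compress_block(B)
print(U.shape[1], np.linalg.norm(B - U @ V) / np.linalg.norm(B))
```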

More Details

Increasing accuracy of iterative refinement in limited floating-point arithmetic on half-precision accelerators

2019 IEEE High Performance Extreme Computing Conference, HPEC 2019

Yamazaki, Ichitaro Y.; Dongarra, Jack

The emergence of deep learning as a leading computational workload for machine learning tasks on large-scale cloud infrastructure installations has led to a plethora of accelerator hardware releases. However, the reduced precision and range of the floating-point numbers on these new platforms make it a non-trivial task to leverage these unprecedented advances in computational power for numerical linear algebra operations that come with a guarantee of robust error bounds. In order to address these concerns, we present a number of strategies that can be used to increase the accuracy of limited-precision iterative refinement. By limited precision, we mean 16-bit floating-point formats that are implemented in modern hardware accelerators and are not necessarily compliant with the IEEE half-precision specification. We include the explanation of a broader context and connections to established IEEE floating-point standards and existing high-performance computing (HPC) benchmarks. We also present a new formulation of LU factorization that we call signed square root LU, which produces more numerically balanced L and U factors that directly address the problem of the limited range of low-precision storage formats. The experimental results indicate that it is possible to recover substantial amounts of the accuracy in the system solution that would otherwise be lost. Previously, this could only be achieved by using iterative refinement based on single-precision floating-point arithmetic. The discussion will also explore the numerical stability issues that are important for robust linear solvers on these new hardware platforms.
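
Classical iterative refinement underlies these strategies: factor and solve in the cheap, low precision, but form residuals and accumulate the solution in a higher precision. The numpy sketch below emulates this with float32 as the "low" precision (half-precision LU solves are not available through numpy's LAPACK interface); it is a textbook illustration, not the paper's signed square root LU variant, and a real implementation would reuse the LU factors rather than re-solving.

```python
import numpy as np

def refine(A, b, low=np.float32, high=np.float64, iters=5):
    """Mixed-precision iterative refinement: solves use the low-precision copy
    of A, while residuals and the solution update are kept in high precision."""
    A_lo = A.astype(low)
    x = np.linalg.solve(A_lo, b.astype(low)).astype(high)
    for _ in range(iters):
        r = b.astype(high) - A.astype(high) @ x        # high-precision residual
        d = np.linalg.solve(A_lo, r.astype(low))       # cheap low-precision solve
        x = x + d.astype(high)
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 200)) + 200 * np.eye(200)   # well conditioned
b = rng.standard_normal(200)
x = refine(A, b)
print(np.linalg.norm(A @ x - b) / np.linalg.norm(b))       # near double precision
```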

More Details

The Impact on Mix of Different Preheat Protocols

Harvey-Thompson, Adam J.; Geissel, Matthias G.; Jennings, Christopher A.; Weis, Matthew R.; Ampleford, David A.; Bliss, David E.; Chandler, Gordon A.; Fein, Jeffrey R.; Galloway, B.R.; Glinsky, Michael E.; Gomez, Matthew R.; Hahn, K.D.; Hansen, Stephanie B.; Harding, Eric H.; Kimmel, Mark W.; Knapp, Patrick K.; Perea, L.; Peterson, Kara J.; Porter, John L.; Rambo, Patrick K.; Robertson, Grafton K.; Rochau, G.A.; Ruiz, Daniel E.; Schwarz, Jens S.; Shores, Jonathon S.; Sinars, Daniel S.; Slutz, Stephen A.; Smith, Ian C.; Speas, Christopher S.; Whittemore, K.; Woodbury, Daniel; Smith, G.E.

Abstract not provided.

Resilient Computing with Dynamical Systems

Rothganger, Fredrick R.; Cardwell, Suma G.

We reformulate fundamental numerical problems to run on novel hardware inspired by the brain. Such "neuromorphic" hardware consumes less energy per computation, promising a means to augment next-generation exascale computers. However, its programming model is radically different from that of floating-point machines, with fewer guarantees about precision and communication. The approach is to pass each given problem through a sequence of transformations (algorithmic "reductions") which change it from conventional form into a dynamical system, then ultimately into a spiking neural network. Results for the eigenvalue problem are presented, showing that the dynamical system formulation is feasible.
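
As one illustration of recasting a numerical problem as a dynamical system (not necessarily the specific reduction used in this report), the flow dx/dt = A x - (x^T A x) x preserves ||x|| = 1 and, for symmetric A, generically converges to a dominant eigenvector. Integrating it with forward Euler gives the simple fixed-step sketch below.

```python
import numpy as np

def dominant_eigvec(A, dt=0.01, steps=20000, seed=0):
    """Integrate dx/dt = A x - (x^T A x) x with forward Euler. For symmetric A,
    the unit sphere is invariant and generic initial conditions converge to a
    dominant eigenvector; the Rayleigh quotient gives the eigenvalue.
    Illustrative reduction only."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(A.shape[0])
    x /= np.linalg.norm(x)
    for _ in range(steps):
        Ax = A @ x
        x = x + dt * (Ax - (x @ Ax) * x)
        x /= np.linalg.norm(x)          # re-normalize to control Euler drift
    return x, x @ A @ x                 # eigenvector, Rayleigh-quotient eigenvalue

A = np.array([[2.0, 1.0], [1.0, 3.0]])
v, lam = dominant_eigvec(A)
print(lam)                              # close to the largest eigenvalue (~3.618)
```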

More Details

Scalable triangle counting on distributed-memory systems

2019 IEEE High Performance Extreme Computing Conference, HPEC 2019

Acer, Seher A.; Yasar, Abdurrahman; Rajamanickam, Sivasankaran R.; Wolf, Michael W.; Catalyurek, Umit V.

Triangle counting is a foundational graph-analysis kernel in network science. It has also been one of the challenge problems for the 'Static Graph Challenge'. In this work, we propose a novel, hybrid, parallel triangle counting algorithm based on its linear algebra formulation. Our framework uses MPI and Cilk to exploit the benefits of distributed-memory and shared-memory parallelism, respectively. The problem is partitioned among MPI processes using a two-dimensional (2D) Cartesian block partitioning. One-dimensional (1D) rowwise partitioning is used within the Cartesian blocks for shared-memory parallelism using the Cilk programming model. Besides exhibiting very good strong scaling behavior in almost all tested graphs, our algorithm achieves the fastest time on the 1.4B edge real-world twitter graph, which is 3.217 seconds, on 1,092 cores. In comparison to past distributed-memory parallel winners of the graph challenge, we demonstrate a speed up of 2.7× on this twitter graph. This is also the fastest time reported for parallel triangle counting on the twitter graph when the graph is not replicated.

More Details

Neural Inspired Computation Remote Sensing Platform

Vineyard, Craig M.; Severa, William M.; Green, Sam G.; Dellana, Ryan A.; Plagge, Mark P.; Hill, Aaron J.

Remote sensing (RS) data collection capabilities are rapidly evolving hyper-spectrally (sensing more spectral bands), hyper-temporally (faster sampling rates) and hyper-spatially (increasing numbers of smaller pixels). Accordingly, sensor technologies have outpaced transmission capabilities, introducing a need to process more data at the sensor. While many sophisticated data processing capabilities are emerging, power and other hardware requirements for these approaches on conventional electronic systems place them out of reach for resource-constrained operational environments. To address these limitations, in this research effort we have investigated and characterized neural-inspired architectures to determine their suitability for implementing RS algorithms. In doing so, we have been able to highlight a 100x performance-per-watt improvement using neuromorphic computing, as well as develop an algorithmic architecture co-design and exploration capability.

More Details

Progress in Implementing High-Energy Low-Mix Laser Preheat for MagLIF

Harvey-Thompson, Adam J.; Geissel, Matthias G.; Jennings, Christopher A.; Weis, Matthew R.; Ampleford, David A.; Bliss, David E.; Chandler, Gordon A.; Fein, Jeffrey R.; Galloway, B.R.; Glinsky, Michael E.; Gomez, Matthew R.; Hahn, K.D.; Hansen, Stephanie B.; Harding, Eric H.; Kimmel, Mark W.; Knapp, Patrick K.; Perea, L.; Peterson, Kara J.; Porter, John L.; Rambo, Patrick K.; Robertson, Grafton K.; Rochau, G.A.; Ruiz, Daniel E.; Schwarz, Jens S.; Shores, Jonathon S.; Sinars, Daniel S.; Slutz, Stephen A.; Smith, Ian C.; Speas, Christopher S.; Whittemore, K.; Woodbury, Daniel; Smith, G.E.

Abstract not provided.

Geometric mapping of tasks to processors on parallel computers with mesh or torus networks

IEEE Transactions on Parallel and Distributed Systems

Deveci, Mehmet; Devine, Karen D.; Laros, James H.; Taylor, Mark A.; Rajamanickam, Sivasankaran R.; Catalyurek, Umit V.

We present a new method for reducing parallel applications’ communication time by mapping their MPI tasks to processors in a way that lowers the distance messages travel and the amount of congestion in the network. Assuming geometric proximity among the tasks is a good approximation of their communication interdependence, we use a geometric partitioning algorithm to order both the tasks and the processors, assigning task parts to the corresponding processor parts. In this way, interdependent tasks are assigned to “nearby” cores in the network. We also present a number of algorithmic optimizations that exploit specific features of the network or application to further improve the quality of the mapping. We specifically address the case of sparse node allocation, where the nodes assigned to a job are not necessarily located in a contiguous block nor within close proximity to each other in the network. However, our methods generalize to contiguous allocations as well, and results are shown for both contiguous and non-contiguous allocations. We show that, for the structured finite difference mini-application MiniGhost, our mapping methods reduced communication time up to 75 percent relative to MiniGhost’s default mapping on 128K cores of a Cray XK7 with sparse allocation. For the atmospheric modeling code E3SM/HOMME, our methods reduced communication time up to 31% on 16K cores of an IBM BlueGene/Q with contiguous allocation.
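
The core of the method (order both the tasks and the processors with the same geometric partitioner and match parts) can be sketched with plain recursive coordinate bisection. The functions below are a simplified illustration that assumes equal task and processor counts; the congestion-aware and architecture-specific refinements described in the paper are not included.

```python
import numpy as np

def rcb_order(coords):
    """Order points by recursive coordinate bisection: split along the widest
    dimension at the median, recurse on each half, and concatenate.
    `coords` is an (n, d) array of coordinates."""
    def rec(ids):
        if len(ids) <= 1:
            return list(ids)
        pts = coords[ids]
        dim = int(np.argmax(pts.max(axis=0) - pts.min(axis=0)))
        order = ids[np.argsort(pts[:, dim], kind="stable")]
        half = len(order) // 2
        return rec(order[:half]) + rec(order[half:])
    return rec(np.arange(len(coords)))

def map_tasks(task_coords, proc_coords):
    """Map task i to a processor by matching positions in the two RCB
    orderings (assumes equal counts; a sketch of geometric mapping)."""
    t_order = rcb_order(np.asarray(task_coords, dtype=float))
    p_order = rcb_order(np.asarray(proc_coords, dtype=float))
    return {t: p for t, p in zip(t_order, p_order)}
```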

More Details

Spall kinetics model description

Silling, Stewart A.

Under high-rate loading in tension, metals can sustain much larger tensile stresses for sub-microsecond time periods than would be possible under quasi-static conditions. This type of failure, known as spall, is not adequately reproduced by hydrocodes with commonly used failure models. The Spall Kinetics Model treats spall by incorporating a time scale into the process of failure. Under sufficiently strong tensile states of stress, damage accumulates over this time scale, which can be thought of as an incubation time. The time scale depends on the previous loading history of the material, reflecting possible damage by a shock wave. The model acts by modifying the hydrostatic pressure that is predicted by any equation of state and is therefore simple to implement. Examples illustrate the ability of the model to reproduce the spall stress and resulting release waves in plate impact experiments on stainless steel.

More Details

Scalable inference for sparse deep neural networks using Kokkos Kernels

2019 IEEE High Performance Extreme Computing Conference, HPEC 2019

Ellis, John E.; Rajamanickam, Sivasankaran R.

Over the last decade, hardware advances have led to the feasibility of training and inference for very large deep neural networks. Sparsified deep neural networks (DNNs) can greatly reduce memory costs and increase throughput of standard DNNs, if loss of accuracy can be controlled. The IEEE HPEC Sparse Deep Neural Network Graph Challenge serves as a testbed for algorithmic and implementation advances to maximize computational performance of sparse deep neural networks. We base KK-SpDNN, our sparse DNN inference implementation, on the sparse linear algebra kernels within the Kokkos Kernels library. Using the sparse matrix-matrix multiplication in Kokkos Kernels allows us to reuse a highly optimized kernel. We focus on reducing the single-node and multi-node runtimes for 12 sparse networks. We test KK-SpDNN on Intel Skylake and Knights Landing architectures and see a 120-500x improvement in single-node performance over the serial reference implementation. We run in data-parallel mode with MPI to further speed up network inference, ultimately obtaining an edge processing rate of 1.16e+12 on 20 Skylake nodes. This translates to a 13x speedup on 20 nodes compared to our highly optimized multithreaded implementation on a single Skylake node.
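
Sparse DNN inference in this challenge reduces to repeated sparse-sparse matrix products with a bias and a clipped ReLU. The scipy sketch below shows that per-layer computation; the bias and cap values are placeholders following the Graph Challenge convention, and the Kokkos Kernels SpGEMM and MPI data parallelism used in KK-SpDNN are not represented.

```python
import scipy.sparse as sp

def sparse_dnn_infer(Y0, layers, bias, cap=32.0):
    """Sparse DNN inference as repeated sparse-sparse products:
    Y <- min(max(Y @ W + bias, 0), cap), keeping Y sparse by dropping zeros.
    `layers` is a list of sparse weight matrices W; following the challenge
    convention, the bias is applied to the stored (nonzero) entries."""
    Y = Y0.tocsr()
    for W in layers:
        Z = (Y @ W).tocsr()
        Z.data += bias                       # bias on stored entries only
        Z.data = Z.data.clip(0.0, cap)       # ReLU with saturation
        Z.eliminate_zeros()
        Y = Z
    return Y
```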

More Details

Exploration of fine-grained parallelism for load balancing eager K-truss on GPU and CPU

2019 IEEE High Performance Extreme Computing Conference, HPEC 2019

Blanco, Mark; Low, Tze M.; Kim, Kyungjoo K.

In this work we present a performance exploration on Eager K-truss, a linear-algebraic formulation of the K-truss graph algorithm. We address performance issues related to load imbalance of parallel tasks in symmetric, triangular graphs by presenting a fine-grained parallel approach to executing the support computation. This approach also increases available parallelism, making it amenable to GPU execution. We demonstrate our fine-grained parallel approach using implementations in Kokkos and evaluate them on an Intel Skylake CPU and an Nvidia Tesla V100 GPU. Overall, we observe between a 1.26-1.48x improvement on the CPU and a 9.97-16.92x improvement on the GPU due to our fine-grained parallel formulation.
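
In the linear-algebraic formulation, the support of an edge is its number of common neighbors, given entrywise by (A @ A) .* A, and a k-truss keeps only edges whose support is at least k - 2. The scipy sketch below iterates that peeling to a fixed point; it is a plain serial illustration (assuming k >= 3), not the eager, fine-grained parallel scheme evaluated in the paper.

```python
import scipy.sparse as sp

def k_truss(A, k):
    """Adjacency matrix of the k-truss (k >= 3): repeatedly drop edges whose
    support, computed entrywise as (A @ A) .* A, falls below k - 2. Assumes a
    symmetric 0/1 adjacency matrix with an empty diagonal."""
    A = A.tocsr().astype(float)
    while True:
        nnz_before = A.nnz
        support = (A @ A).multiply(A).tocsr()   # common-neighbor count per edge
        support.data = (support.data >= k - 2).astype(float)
        support.eliminate_zeros()
        A = A.multiply(support).tocsr()         # keep only edges meeting the bound
        A.eliminate_zeros()
        if A.nnz == nnz_before:                 # nothing removed: converged
            return A
```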

More Details

Compatible Particle Discretizations (Final LDRD Report)

Bochev, Pavel B.; Bosler, Peter A.; Kuberry, Paul A.; Perego, Mauro P.; Peterson, Kara J.; Trask, Nathaniel A.

This report summarizes the work performed under a three-year LDRD project aiming to develop mathematical and software foundations for compatible meshfree and particle discretizations. We review major technical accomplishments and project metrics such as publications, conference and colloquia presentations, and organization of special sessions and minisymposia. The report concludes with a brief summary of ongoing projects and collaborations that utilize the products of this work.

More Details

BrainSLAM

Wang, Felix W.; Aimone, James B.; Musuvathy, Srideep M.; Anwar, Abrar

This research aims to develop brain-inspired solutions for reliable and adaptive autonomous navigation in systems that have limited internal and external sensors and may not have access to reliable GPS information. The algorithms investigated and developed by this project were pursued in the context of Sandia's A4H (autonomy for hypersonics) mission campaign. These algorithms were additionally explored with respect to their suitability for implementation on emerging neuromorphic computing hardware technology. This project is premised on the hypothesis that brain-inspired SLAM (simultaneous localization and mapping) algorithms may provide an energy-efficient, context-flexible approach to robust sensor-based, real-time navigation.

More Details

Monitoring and Repair of Cement-Geomaterial Interfaces in Borehole and Repository Scenarios

Matteo, Edward N.; McMahon, Kevin A.; Camphouse, Russell C.; Dewers, Thomas D.; Jove Colon, Carlos F.; Fuller, Timothy J.; Mohahgheghi, Joseph; Stormont, J.C.; Taha, Mahmoud R.; Pyrak-Nolte, Laura; Wang, Chaoyi; Douba, A.; Genedy, Moneeb; Fernandez, Serafin G.; Kandil, U.F.; Soliman, E.E.; Starr, J.; Stenko, Mike

The failure of subsurface seals (i.e., wellbores, shaft and drift seals in a deep geologic nuclear waste repository) has important implications for US Energy Security. The performance of these cementitious seals is controlled by a combination of chemical and mechanical forces, which are coupled processes that occur over multiple length scales. The goal of this work is to improve fundamental understanding of cement-geomaterial interfaces and develop tools and methodologies to characterize and predict performance of subsurface seals. This project utilized a combined experimental and modeling approach to better understand failure at cement-geomaterial interfaces. Cutting-edge experimental methods and characterization methods were used to understand evolution of the material properties during chemo-mechanical alteration of cement-geomaterial interfaces. Software tools were developed to model chemo-mechanical coupling and predict the complex interplay between reactive transport and solid mechanics. Novel, fit-for-purpose materials were developed and tested using fundamental understanding of failure processes at cement-geomaterial interfaces.

More Details

Prediction and Inference of Multi-scale Electrical Properties of Geomaterials

Weiss, Chester J.; Beskardes, G.D.; van Bloemen Waanders, Bart G.

Motivated by the need for improved forward modeling and inversion capabilities of geophysical response in geologic settings whose fine-scale features demand accountability, this project describes two novel approaches which advance the current state of the art. First is a hierarchical material properties representation for finite element analysis whereby material properties can be prescribed on volumetric elements, in addition to their facets and edges. Hence, thin or fine-scaled features can be economically represented by small numbers of connected edges or facets, rather than tens of millions of very small volumetric elements. Examples of this approach are drawn from oilfield and near-surface geophysics where, for example, the electrostatic response of metallic infrastructure or fracture swarms is easily calculable on a laptop computer, with an estimated reduction in resource allocation of four orders of magnitude over traditional methods. Second is a first-ever solution method for the space-fractional Helmholtz equation in geophysical electromagnetics, accompanied by newly found magnetotelluric evidence supporting a fractional calculus representation of multi-scale geomaterials. Whereas these two achievements are significant in themselves, a clear understanding of the intermediate length scale where these two endmember viewpoints must converge remains unresolved and is a natural direction for future research. Additionally, an explicit mapping from a known multi-scale geomaterial model to its equivalent fractional calculus representation proved beyond the scope of the present research and, similarly, remains fertile ground for future exploration.

More Details

Semi-local Density Functional Approximations for Bulk, Surface, and Confinement Physics

Cangi, Attila C.; Sagredo, Francisca S.; Decolvenaere, Elizabeth; Mattsson, Ann E.

Due to its balance of accuracy and computational cost, density functional theory has become the method of choice for computing the electronic structure and related properties of materials. However, present-day semi-local approximations to the exchange-correlation energy of density functional theory break down for materials containing d and f electrons. In this report we summarize the results of our research efforts within LDRD 200202, titled "Making density functional theory work for all materials," in addressing this issue. Our efforts are grouped into two research thrusts. In the first thrust, we develop an exchange-correlation functional (the BSC functional) within the subsystem functional formalism. It enables us to capture bulk, surface, and confinement physics with a single, semi-local exchange-correlation functional in density functional theory calculations. We present the analytical properties of the BSC functional and demonstrate that it captures confinement physics more accurately than standard semi-local exchange-correlation functionals. The second research thrust focuses on developing a database of transition metal binary compounds. The database consists of materials properties (formation energies, ground-state energies, lattice constants, and elastic constants) of 26 transition metal elements and 89 transition metal alloys. It serves as a reference for benchmarking computational models, such as lower-level modeling methods and exchange-correlation functionals, and we expect it to be a useful resource for the materials science community. We conclude with a brief discussion of future research directions and the impact of our results.

More Details

A Guide to Solar Power Forecasting using ARMA Models

Proceedings of 2019 IEEE PES Innovative Smart Grid Technologies Europe, ISGT-Europe 2019

Singh, Bismark S.; Pozo, David

In this short article, we summarize a step-by-step methodology for forecasting the power output of a photovoltaic solar generator using hourly auto-regressive moving average (ARMA) models. We illustrate how to build an ARMA model, validate it with statistical tests, and construct hourly samples. The resulting model is straightforward to embed in more sophisticated operation and planning models while showing relatively good accuracy, and it provides a practical way to generate samples for stochastic energy optimization models.
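
As a rough illustration of the workflow described above (not the authors' code; the data, model orders, and diagnostics below are placeholders, and the paper's hourly treatment of the diurnal cycle is not reproduced), an ARMA model can be fit, checked, and sampled with statsmodels:

```python
# Minimal sketch: fit an ARMA model to hourly PV output, validate residuals,
# and generate forecasts/samples. `y` is a placeholder series.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.stats.diagnostic import acorr_ljungbox

rng = np.random.default_rng(0)
y = rng.normal(size=500)          # replace with measured hourly power output

# An ARMA(p, q) model is ARIMA(p, 0, q) in statsmodels.
res = ARIMA(y, order=(2, 0, 1)).fit()

# Validation step: Ljung-Box test checks residuals for leftover autocorrelation.
print(acorr_ljungbox(res.resid, lags=[24]))

# Point forecast for the next 24 hours, plus simulated sample paths that can
# feed a stochastic energy optimization model as scenarios.
forecast = res.forecast(steps=24)
samples = res.simulate(nsimulations=24, anchor="end", repetitions=100)
```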

More Details

Design Installation and Operation of the Vortex ART Platform

Gauntt, Nathan E.; Davis, Kevin D.; Repik, Jason; Brandt, James M.; Gentile, Ann C.; Hammond, Simon D.

ATS platforms are some of the largest, most complex, and most expensive computer systems installed in the United States, fielded at just a few major national laboratories. This milestone describes our recent efforts to procure, install, and test a machine called Vortex at Sandia National Laboratories that is compatible with the larger ATS platform Sierra at LLNL. In this milestone, we have 1) configured and procured a machine with hardware characteristics similar to those of the Sierra ATS platform, 2) installed the machine, verified its physical hardware, and measured its baseline performance, and 3) demonstrated the machine's compatibility with Sierra and its capacity for useful development and testing of Sandia computer codes (such as SPARC), including nightly regression testing workloads.

More Details

Approximating two-stage chance-constrained programs with classical probability bounds

Optimization Letters

Singh, Bismark S.; Watson, Jean-Paul W.

We consider a joint-chance constraint (JCC) as a union of sets and approximate this union using bounds from classical probability theory. When these bounds are used in an optimization model constrained by the JCC, we obtain corresponding upper and lower bounds on the optimal objective function value. We compare the strength of these bounds against each other under two different sampling schemes and observe that a larger correlation between the uncertainties tends to result in more computationally challenging optimization models. We also observe that the same set of inequalities provides the tightest upper and lower bounds in our computational experiments.
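
For intuition (the specific inequalities compared in the paper are not reproduced here), writing the joint chance constraint in terms of violation events V_i(x) shows how classical bounds bracket the union probability:

```latex
% Illustrative classical bounds on the union of violation events.
\max_{i}\,\mathbb{P}\big(V_i(x)\big)
\;\le\;
\mathbb{P}\Big(\bigcup_{i=1}^{m} V_i(x)\Big)
\;\le\;
\sum_{i=1}^{m}\mathbb{P}\big(V_i(x)\big).
```

Enforcing the right-hand (Boole) bound below the risk level gives a conservative restriction of the feasible set, while enforcing only the left-hand bound gives a relaxation; the optimal objective values of the two approximate models then bracket the true optimal value.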

More Details

An Agile Design-to-Simulation Workflow Using a New Conforming Moving Least Squares Method

Koester, Jacob K.; Tupek, Michael R.; Mitchell, Scott A.

This report summarizes the accomplishments and challenges of a two-year LDRD effort focused on improving design-to-simulation agility. The central bottleneck in most solid mechanics simulations is the process of taking CAD geometry and creating a discretization of suitable quality, i.e., the "meshing" effort. This report revisits meshfree methods and documents some key advancements that allow their use on problems with complex geometries, low-quality meshes, nearly incompressible materials, or fracture. The resulting capability was demonstrated to be an effective part of an agile simulation process by enabling rapid discretization techniques without increasing the time to obtain a solution of a given accuracy. The first enhancement addressed boundary-related challenges associated with meshfree methods. When point clouds and Euclidean metrics are used to construct approximation spaces, boundary information is lost, which results in low-accuracy solutions for non-convex geometries and material interfaces and complicates the application of essential boundary conditions. The solution involved the development of conforming window functions, which use graph and boundary information to directly incorporate boundaries into the approximation space.
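
For context, the sketch below shows the standard Euclidean-distance window function used in conventional meshfree approximations; it is exactly this boundary-blind construction that the conforming windows developed here replace with graph- and boundary-aware distances (the specific kernel is generic, not taken from the report):

```python
# Standard compactly supported meshfree window: support depends only on
# Euclidean distance, so it cannot "see" boundaries or material interfaces.
import numpy as np

def cubic_spline_window(x, center, h):
    """Cubic B-spline kernel; nonzero only within radius h of `center`."""
    q = np.linalg.norm(np.asarray(x) - np.asarray(center)) / h
    if q >= 1.0:
        return 0.0
    if q <= 0.5:
        return 1.0 - 6.0 * q**2 + 6.0 * q**3
    return 2.0 * (1.0 - q)**3

print(cubic_spline_window([0.1, 0.0], center=[0.0, 0.0], h=0.5))
```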

More Details

Dragonfly-Inspired Algorithms for Intercept Trajectory Planning

Chance, Frances S.

Dragonflies are known to be highly successful hunters (achieving 90-95% success rates in nature) that implement a guidance law similar to proportional navigation to intercept their prey. This project tested the hypothesis that dragonflies are able to implement proportional navigation using prey-image translation on their eyes. The model dragonfly presented here calculates changes in pitch and yaw to maintain the prey's image at a designated location (the fovea) on a two-dimensional screen (the model's eyes). When the model also uses knowledge of its own maneuvers as an error signal to adjust the location of the fovea, its interception trajectory becomes equivalent to proportional navigation. I show that this model can also be applied successfully, in a limited number of scenarios, against maneuvering prey. My results provide a proof-of-concept demonstration of the potential of using the dragonfly nervous system as a guide for designing a robust interception algorithm for implementation on a man-made system.
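
A hedged sketch of classical two-dimensional proportional navigation, the guidance law the model dragonfly is argued to approximate (this is a textbook variant, not the report's foveal model; the geometry, gain, and time step are illustrative):

```python
# Pure proportional navigation in 2D: the pursuer turns at a rate
# proportional to the rotation rate of the line of sight (LOS) to the target.
import numpy as np

def pn_step(p_pos, p_vel, t_pos, t_vel, dt=0.01, N=3.0):
    """Advance the pursuer one time step under proportional navigation."""
    r = t_pos - p_pos                                       # relative position
    v = t_vel - p_vel                                       # relative velocity
    los_rate = (r[0] * v[1] - r[1] * v[0]) / np.dot(r, r)   # d(lambda)/dt
    speed = np.linalg.norm(p_vel)
    heading = np.arctan2(p_vel[1], p_vel[0]) + N * los_rate * dt
    p_vel = speed * np.array([np.cos(heading), np.sin(heading)])
    return p_pos + p_vel * dt, p_vel

# Example: pursuer starts at the origin; target crosses from the right.
p_pos, p_vel = np.zeros(2), np.array([30.0, 0.0])
t_pos, t_vel = np.array([100.0, 40.0]), np.array([-5.0, 0.0])
for _ in range(2000):
    p_pos, p_vel = pn_step(p_pos, p_vel, t_pos, t_vel)
    t_pos = t_pos + t_vel * 0.01
    if np.linalg.norm(t_pos - p_pos) < 1.0:
        print("intercept")
        break
```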

More Details

On-line Generation and Error Handling for Surrogate Models within Multifidelity Uncertainty Quantification

Blonigan, Patrick J.; Geraci, Gianluca G.; Rizzi, Francesco N.; Eldred, Michael S.; Carlberg, Kevin

Uncertainty quantification is recognized as a fundamental task for obtaining predictive numerical simulations. However, many realistic engineering applications require complex and computationally expensive high-fidelity numerical simulations for the accurate characterization of the system responses. Moreover, complex physical models and extreme operating conditions can easily lead to hundreds of uncertain parameters that need to be propagated through high-fidelity codes. Under these circumstances, a single-fidelity approach, i.e., a workflow that only uses high-fidelity simulations to perform the uncertainty quantification task, is infeasible due to the prohibitive overall computational cost. In recent years, multifidelity strategies have been introduced to overcome this issue. The core idea of this family of methods is to combine simulations with varying levels of fidelity/accuracy in order to obtain multifidelity estimators or surrogates with the same accuracy as their single-fidelity counterparts at a much lower computational cost. This goal is usually accomplished by defining a priori a sequence of discretization levels or physical modeling assumptions that can be used to decrease the complexity of a numerical realization and thus its computational cost. However, less attention has been dedicated to low-fidelity models that can be built directly from the small number of high-fidelity simulations available. In this work we focus our attention on reduced-order models, which can be considered a particular class of data-driven approaches. Our main goal is to explore the combination of multifidelity uncertainty quantification and reduced-order models to obtain an efficient framework for propagating uncertainties through expensive numerical codes.
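
For reference, one common two-model control-variate estimator of the kind used in multifidelity uncertainty quantification (the report's specific estimators and model hierarchy may differ):

```latex
% Two-fidelity control-variate estimator of E[Q_HF]; unbiased for any alpha.
\widehat{Q}^{\mathrm{MF}}
= \frac{1}{N}\sum_{i=1}^{N} Q_{\mathrm{HF}}\big(\xi^{(i)}\big)
+ \alpha\left(
    \frac{1}{M}\sum_{j=1}^{M} Q_{\mathrm{LF}}\big(\xi^{(j)}\big)
  - \frac{1}{N}\sum_{i=1}^{N} Q_{\mathrm{LF}}\big(\xi^{(i)}\big)
  \right), \qquad M \gg N,
```

where the weight alpha is chosen from the estimated correlation between the two models to minimize variance, and a reduced-order model built from the available high-fidelity samples can play the role of Q_LF.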

More Details

Hybridizing Classifiers and Collection Systems to Maximize Intelligence and Minimize Uncertainty in National Security Data Analytics Applications

Staid, Andrea S.; Valicka, Christopher G.

There are numerous applications that combine data collected from sensors with machine-learning-based classification models to predict the type of event or object observed. Both the collection of the data itself and the classification models can be tuned for optimal performance, but we hypothesize that additional gains can be realized by jointly assessing both factors together. Through this research, we used a seismic event dataset and two neural network classification models that issued probabilistic predictions on each event to determine whether it was an earthquake or a quarry blast. Real-world applications will have constraints on data collection, perhaps in terms of a budget for the number of sensors or on where, when, or how data can be collected. We mimicked such constraints by creating subnetworks of sensors with both size and locational constraints. We compare different methods of determining the set of sensors in each subnetwork in terms of their predictive accuracy and the number of events that they observe overall. Additionally, we take the classifiers into account, treating them as black-box models and testing various ways of combining predictions among models and among the set of sensors that observe any given event. We find that comparable overall performance can be achieved with less than half the number of sensors in the full network. Additionally, a voting scheme that uses the average confidence across the sensors for a given event shows improved predictive accuracy across nearly all subnetworks. Lastly, locational constraints matter, but sometimes in unintuitive ways, as the sensors chosen in place of those excluded by location may actually perform better. This being a short-term research effort, we offer a lengthy discussion of interesting next steps and ties to other ongoing research efforts that we did not have time to pursue. These include a detailed analysis of subnetwork performance broken down by event type, specific location, and model confidence. This project also included a Campus Executive research partnership with Texas A&M University. Through this partnership, we worked with a professor and student to study information gain for UAV routing, an alternative way of looking at the same problem space that includes sensor operation for data collection and the resulting benefit to be gained from it. This work is described in an appendix.
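
A minimal sketch of the average-confidence voting scheme described above, assuming each observing sensor contributes a probabilistic prediction for a single event (the function name and threshold are illustrative, not from the report):

```python
# Average-confidence vote: pool per-sensor class probabilities for one event
# and threshold the mean to produce a fused label.
import numpy as np

def average_confidence_vote(sensor_probs, threshold=0.5):
    """sensor_probs: per-sensor P(event is class 1) for one observed event."""
    mean_conf = float(np.mean(sensor_probs))
    return int(mean_conf >= threshold), mean_conf

label, conf = average_confidence_vote([0.9, 0.7, 0.55, 0.4])
print(label, conf)   # -> 1 0.6375
```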

More Details

Shortening the Design and Certification Cycle for Additively Manufactured Materials by Improved Mesoscale Simulations and Validation Experiments: Fiscal Year 2019 Status Report

Specht, Paul E.; Mitchell, John A.; Adams, David P.; Brown, Justin L.; Silling, Stewart A.; Wise, Jack L.; Palmer, Todd

This report outlines the fiscal year (FY) 2019 status of an ongoing multi-year effort to develop a general, microstructurally aware, continuum-level model for representing the dynamic response of materials with complex microstructures. This work has focused on accurately representing the response of both conventionally wrought-processed and additively manufactured (AM) 304L stainless steel (SS) as a test case. Additive manufacturing, or 3D printing, is an emerging technology capable of shortening design and certification cycles for stockpile components through rapid prototyping. However, it is not yet understood how the complex and unique microstructures of AM materials affect their mechanical response at high strain rates. To achieve our project goal, an upscaling technique was developed to bridge the gap between the microstructural and continuum scales and represent AM microstructures on a finite element (FE) mesh. This process involves simulations of the additive process using the Sandia-developed kinetic Monte Carlo (KMC) code SPPARKS. These SPPARKS microstructures are characterized using clustering algorithms from machine learning and used to populate the quadrature points of an FE mesh. Additionally, a spall kinetic model (SKM) was developed to more accurately represent the dynamic failure of AM materials. Validation experiments were performed using both pulsed power machines and projectile launchers. These experiments have provided equation of state (EOS) and flow strength measurements of both wrought and AM 304L SS to above Mbar pressures. In some experiments, multi-point interferometry was used to quantify the variation in observed material response of the AM 304L SS. Analysis of these experiments is ongoing, but preliminary comparisons of our upscaling technique and SKM to experimental data were performed as a validation exercise. Moving forward, this project will advance and further validate our computational framework using advanced theory and additional high-fidelity experiments.
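
A heavily hedged sketch of the clustering step, assuming per-voxel microstructure descriptors extracted from SPPARKS output (the feature set, the choice of k-means, and all names below are placeholders, not taken from the report):

```python
# Cluster synthetic per-voxel microstructure descriptors so that each
# finite-element quadrature point can be tagged with a cluster id.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Placeholder features, e.g., a grain-size measure and two orientation descriptors.
features = rng.normal(size=(10_000, 3))

labels = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(features)
# labels[i] could then select the material state carried at quadrature point i.
print(np.bincount(labels))
```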

More Details

Designer quantum materials

Misra, Shashank M.; Ward, Daniel R.; Baczewski, Andrew D.; Campbell, Quinn C.; Schmucker, Scott W.; Mounce, Andrew M.; Tracy, Lisa A.; Lu, Tzu-Ming L.; Marshall, Michael T.; Campbell, DeAnna M.

Quantum materials have long promised to revolutionize everything from energy transmission (high-temperature superconductors) to both quantum and classical information systems (topological materials). However, their discovery and application have proceeded in an Edisonian fashion due to both an incomplete theoretical understanding and the difficulty of growing and purifying new materials. This project leverages Sandia's unique atomic precision advanced manufacturing (APAM) capability to design small-scale tunable arrays (designer materials) made of donors in silicon. Their low-energy electronic behavior can mimic that of quantum materials and can be tuned by changing the fabrication parameters of the array, thereby enabling the discovery of materials systems that cannot yet be synthesized. In this report, we detail three key advances we have made toward the development of designer quantum materials. First are advances in both APAM technique and the underlying mechanisms required to realize high-yielding donor arrays. Second is the first-ever observation of distinct phases in this material system, manifest in disordered 2D sheets of donors. Finally are advances in modeling the electronic structure of donor clusters and regular structures incorporating them, critical to understanding whether an array is expected to show interesting physics. Combined, these establish the baseline knowledge required to manifest the strongly correlated phases of the Mott-Hubbard model in donor arrays, the first step to deploying APAM donor arrays as analogues of quantum materials.
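
For reference, the single-band Hubbard Hamiltonian whose strongly correlated (Mott) phases donor arrays are intended to emulate is shown below; in such arrays the hopping t and on-site repulsion U are set by donor spacing and confinement, and the project's effective model may include additional terms:

```latex
% Standard single-band Hubbard model (illustrative reference form).
H = -t \sum_{\langle i,j\rangle,\sigma}
      \big( c^{\dagger}_{i\sigma} c_{j\sigma} + \mathrm{h.c.} \big)
    + U \sum_{i} n_{i\uparrow}\, n_{i\downarrow}.
```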

More Details

Higher-moment buffered probability

Optimization Letters

Kouri, Drew P.

In stochastic optimization, probabilities naturally arise as cost functionals and chance constraints. Unfortunately, these functions are difficult to handle both theoretically and computationally. The buffered probability of failure and its subsequent extensions were developed as numerically tractable, conservative surrogates for probabilistic computations. In this manuscript, we introduce the higher-moment buffered probability. Whereas the buffered probability is defined using the conditional value-at-risk, the higher-moment buffered probability is defined using higher-moment coherent risk measures. In this way, the higher-moment buffered probability encodes information about the magnitude of tail moments, not simply the tail average. We prove that the higher-moment buffered probability is closed, monotonic, quasi-convex and can be computed by solving a smooth one-dimensional convex optimization problem. These properties enable smooth reformulations of both higher-moment buffered probability cost functionals and constraints.
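
For orientation, the ordinary (first-order) buffered probability of exceedance at a threshold z admits a well-known one-dimensional convex formula of the kind alluded to above; the paper's higher-moment version replaces the underlying conditional value-at-risk with higher-moment coherent risk measures, so its formula differs:

```latex
% Standard first-order bPOE formula (threshold below the essential supremum of X).
\bar{p}_{z}(X) \;=\; \min_{a \ge 0}\; \mathbb{E}\big[\,\big(a\,(X - z) + 1\big)_{+}\,\big].
```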

More Details

Gaussian-Process-Driven Adaptive Sampling for Reduced-Order Modeling of Texture Effects in Polycrystalline Alpha-Ti

JOM

Tallman, Aaron E.; Stopka, Krzysztof S.; Swiler, Laura P.; Wang, Yan; Kalidindi, Surya R.; Mcdowell, David L.

Data-driven tools for finding structure–property (S–P) relations, such as the Materials Knowledge System (MKS) framework, can accelerate materials design once the costly and technical calibration process has been completed. A three-model method is proposed to reduce the expense of S–P relation model calibration: (1) direct simulations are performed as prescribed by (2) a Gaussian-process-based data collection model, to calibrate (3) an MKS homogenization model, in an application to α-Ti. The new method compares favorably with expert texture selection in terms of the performance of the calibrated MKS models. Benefits for the development of new and improved materials are discussed.
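
A hedged one-dimensional sketch of Gaussian-process-driven data collection of the flavor described above: fit a GP to the simulations run so far and place the next simulation where the predictive uncertainty is largest (the `simulate` stand-in, kernel, and acquisition rule are illustrative, not the paper's):

```python
# Adaptive sampling loop: refit a GP and query the candidate with the
# largest predictive standard deviation.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def simulate(x):                      # placeholder for a direct simulation
    return np.sin(3.0 * x[0]) + 0.1 * x[0] ** 2

candidates = np.linspace(0.0, 3.0, 200).reshape(-1, 1)
X = candidates[[0, -1]]               # a couple of seed simulations
y = np.array([simulate(x) for x in X])

for _ in range(10):
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0),
                                  normalize_y=True).fit(X, y)
    _, std = gp.predict(candidates, return_std=True)
    x_next = candidates[np.argmax(std)]       # most uncertain candidate
    X = np.vstack([X, x_next])
    y = np.append(y, simulate(x_next))
```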

More Details

EMPIRE-PIC Code Verification of a Cold Diode

Smith, Thomas M.; Pointon, T.D.; Cartwright, K.L.; Rider, W.J.

This report presents the code verification of EMPIRE-PIC against the analytic solution for a cold diode first derived by Jaffe. The cold diode was simulated using EMPIRE-PIC, and the error norms were computed with respect to the Jaffe solution. The diode geometry is one-dimensional and uses the EMPIRE electrostatic field solver. After a transient start-up phase as the electrons first cross the anode-cathode gap, the simulations reach an equilibrium where the electric potential and electric field are approximately steady. The expected spatial orders of convergence for the potential, electric field, and particle velocity are observed.
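
A small sketch of how observed spatial orders of convergence are typically extracted from error norms on successively refined grids (the numbers below are illustrative, not EMPIRE-PIC results):

```python
# Observed order of convergence between refinement levels:
# p = log(e_coarse / e_fine) / log(h_coarse / h_fine).
import numpy as np

h = np.array([1/32, 1/64, 1/128])            # cell sizes
err = np.array([2.1e-3, 5.4e-4, 1.35e-4])    # e.g., L2 error in the potential
orders = np.log(err[:-1] / err[1:]) / np.log(h[:-1] / h[1:])
print(orders)                                # values near 2 => second order
```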

More Details

A parallel graph algorithm for detecting mesh singularities in distributed memory ice sheet simulations

ACM International Conference Proceeding Series

Bogle, Ian A.; Devine, Karen D.; Perego, Mauro P.; Rajamanickam, Sivasankaran R.; Slota, George M.

We present a new, distributed-memory parallel algorithm for detection of degenerate mesh features that can cause singularities in ice sheet mesh simulations. Identifying and removing mesh features such as disconnected components (icebergs) or hinge vertices (peninsulas of ice detached from the land) can significantly improve the convergence of iterative solvers. Because the ice sheet evolves during the course of a simulation, it is important that the detection algorithm can run in situ with the simulation, running in parallel and taking a negligible amount of computation time, so that degenerate features (e.g., calving icebergs) can be detected as they develop. We present a distributed-memory, BFS-based label-propagation approach to degenerate feature detection that is efficient enough to be called at each step of an ice sheet simulation, while correctly identifying all degenerate features of an ice sheet mesh. Our method finds all degenerate features in a mesh with 13 million vertices in 0.0561 seconds on 1536 cores in the MPAS Albany Land Ice (MALI) model. Compared to the previously used serial pre-processing approach, we observe a 46,000x speedup for our algorithm and gain the additional capability to detect degenerate features dynamically during the simulation.
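
A serial sketch of the underlying idea (the paper's algorithm is a distributed-memory, BFS-based label propagation): propagate reachability from grounded vertices, and any vertex never reached belongs to a detached component such as an iceberg.

```python
# BFS from grounded seed vertices; unreached vertices form detached components.
from collections import deque

def find_icebergs(adjacency, grounded):
    """adjacency: dict vertex -> iterable of neighbors; grounded: seed set."""
    reached = set(grounded)
    queue = deque(grounded)
    while queue:
        v = queue.popleft()
        for w in adjacency[v]:
            if w not in reached:
                reached.add(w)
                queue.append(w)
    return set(adjacency) - reached          # vertices in floating components

mesh = {0: [1], 1: [0, 2], 2: [1], 3: [4], 4: [3]}   # component {3, 4} is detached
print(find_icebergs(mesh, grounded={0}))             # -> {3, 4}
```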

More Details

TATB Sensitivity to Shocks from Electrical Arcs

Propellants, Explosives, Pyrotechnics

Chen, Kenneth C.; Warne, Larry K.; Jorgenson, Roy E.; Niederhaus, John H.

Use of insensitive high explosives (IHEs) has significantly improved ammunition safety because of their remarkable insensitivity to violent cook-off, shock, and impact. Triamino-trinitrobenzene (TATB) is the IHE used in many modern munitions. Previously, lightning simulations in different test configurations have shown that the required detonation threshold for standard-density TATB at ambient and elevated temperatures (250 °C) has a sufficient margin over the shock caused by an arc from the most severe lightning. In this paper, the Braginskii model with the Lee-More channel conductivity prescription is used to demonstrate how electrical arcs from lightning could cause detonation in TATB. The steep rise and slow decay of a typical lightning pulse are used to demonstrate that the shock pressure from an electrical arc, after reaching its peak, falls off faster than the inverse of the arc radius. For detonation to occur, two necessary conditions must be met: the Pop-plot criterion and a minimum spot size requirement. The relevant Pop-plot for TATB at 250 °C was converted into an empirical detonation criterion, which is applicable to explosives subject to shocks of variable pressure. The arc cross-section was required to meet the minimum detonation spot size reported in the literature. One caveat is that when the shock pressure exceeds the detonation pressure, the Pop-plot may not be applicable, and the minimum spot size requirement may be smaller.
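
For reference, run-to-detonation (Pop-plot) data are conventionally summarized in the log-log form below, with x* the run distance to detonation and P the sustained shock pressure; the coefficients are material- and temperature-specific and are not taken from this paper, which further converts such a relation into a criterion applicable to shocks of variable pressure:

```latex
% Conventional Pop-plot form (illustrative; coefficients a, b not from this paper).
\log_{10} x^{\star} = a - b \,\log_{10} P .
```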

More Details
Results 1801–2000 of 9,998