This report summarizes the result of a NEAMS project focused on sensitivity analysis of the heat transfer model in the gap between the fuel rod and the cladding used in the BISON fuel performance code of Idaho National Laboratory. Using the gap heat transfer models in BISON, the sensitivity of the modeling parameters and the associated responses is investigated. The study results in a quantitative assessment of the role of various parameters in the analysis of gap heat transfer in nuclear fuel.
A method is described for applying a sequence of peridynamic models with different length scales concurrently to subregions of a body. The method allows the smallest length scale, and therefore greatest spatial resolution, to be focused on evolving defects such as cracks. The peridynamic horizon in each of the models is half of that of the next model in the sequence. The boundary conditions on each model are provided by the solution predicted by the model above it. Material property characterization for each model is derived by coarse-graining the more detailed resolution in the model below it. Implementation of the multiscale method in the PDMS code is described. Examples of crack growth modeling illustrate the ability of the method to reproduce the main features of crack growth seen in a model with uniformly small resolution. Comparison of the multiscale model results with XFEM and cohesive elements is also given for a crack growth problem.
The peridynamic theory is an extension of traditional solid mechanics in which the field equations can be applied on discontinuities, such as growing cracks. This paper proposes a bond damage model within peridynamics to treat the nucleation and growth of cracks due to cyclic loading. Bond damage occurs according to the evolution of a variable called the "remaining life" of each bond that changes over time according to the cyclic strain in the bond. It is shown that the model reproduces the main features of S-N data for typical materials and also reproduces the Paris law for fatigue crack growth. Extensions of the model account for the effects of loading spectrum, fatigue limit, and variable load ratio. A three-dimensional example illustrates the nucleation and growth of a helical fatigue crack in the torsion of an aluminum alloy rod.
We created a highly efficient, universal 3D quant um transport simulator. We demonstrated that the simulator scales linearly - both with the problem size (N) and number of CPUs, which presents an important break-through in the field of computational nanoelectronics. It allowed us, for the first time, to accurately simulate and optim ize a large number of realistic nanodevices in a much shorter time, when compared to other methods/codes such as RGF[%7EN 2.333 ]/KNIT, KWANT, and QTBM[%7EN 3 ]/NEMO5. In order to determine the best-in-class for different beyond-CMOS paradigms, we performed rigorous device optimization for high-performance logic devices at 6-, 5- and 4-nm gate lengths. We have discovered that there exists a fundamental down-scaling limit for CMOS technology and other Field-Effect Transistors (FETs). We have found that, at room temperatures, all FETs, irre spective of their channel material, will start experiencing unacceptable level of thermally induced errors around 5-nm gate lengths.
The MueLu tutorial is written as a hands-on tutorial for MueLu, the next generation multigrid framework in Trilinos. It covers the whole spectrum from absolute beginners’ topics to expert level. Since the focus of this tutorial is on practical and technical aspects of multigrid methods in general and MueLu in particular, the reader is expected to have a basic understanding of multigrid methods and its general underlying concepts. Please refer to multigrid textbooks (e.g. [1]) for the theoretical background.
Adaptive Thinking has been defined here as the capacity to recognize when a course of action that may have previously been effective is no longer effective and there is need to adjust strategy. Research was undertaken with human test subjects to identify the factors that contribute to adaptive thinking. It was discovered that those most effective in settings that call for adaptive thinking tend to possess a superior capacity to quickly and effectively generate possible courses of action, as measured using the Category Generation test. Software developed for this research has been applied to develop capabilities enabling analysts to identify crucial factors that are predictive of outcomes in fore-on-force simulation exercises.
This is the official user guide for the M UE L U multigrid library in Trilinos version 11.12. This guide provides an overview of M UE L U , its capabilities, and instructions for new users who want to start using M UE L U with a minimum of effort. Detailed information is given on how to drive M UE L U through its XML interface. Links to more advanced use cases are given. This guide gives information on how to achieve good parallel performance, as well as how to introduce new algorithms. Finally, readers will find a comprehensive listing of available M UE L U options. Any options not documented in this manual should be considered strictly experimental.
Adult neurogenesis in the hippocampus region of the brain is a neurobiological process that is believed to contribute to the brain's advanced abilities in complex pattern recognition and cognition. Here, we describe how realistic scale simulations of the neurogenesis process can offer both a unique perspective on the biological relevance of this process and confer computational insights that are suggestive of novel machine learning techniques. First, supercomputer based scaling studies of the neurogenesis process demonstrate how a small fraction of adult-born neurons have a uniquely larger impact in biologically realistic scaled networks. Second, we describe a novel technical approach by which the information content of ensembles of neurons can be estimated. Finally, we illustrate several examples of broader algorithmic impact of neurogenesis, including both extending existing machine learning approaches and novel approaches for intelligent sensing.
This LDRD project was a campus exec fellowship to fund (in part) Donald Nguyen’s PhD research at UT-Austin. His work has focused on parallel programming models, and scheduling irregular algorithms on shared-memory systems using the Galois framework. Galois provides a simple but powerful way for users and applications to automatically obtain good parallel performance using certain supported data containers. The naïve user can write serial code, while advanced users can optimize performance by advanced features, such as specifying the scheduling policy. Galois was used to parallelize two sparse matrix reordering schemes: RCM and Sloan. Such reordering is important in high-performance computing to obtain better data locality and thus reduce run times.
This report presents the test cases used to verify, validate and demonstrate the features and capabilities of the first release of the 3D Direct Simulation Monte Carlo (DSMC) code SPARTA (Stochastic Real Time Particle Analyzer). The test cases included in this report exercise the most critical capabilities of the code like the accurate representation of physical phenomena (molecular advection and collisions, energy conservation, etc.) and implementation of numerical methods (grid adaptation, load balancing, etc.). Several test cases of simple flow examples are shown to demonstrate that the code can reproduce phenomena predicted by analytical solutions and theory. A number of additional test cases are presented to illustrate the ability of SPARTA to model flow around complicated shapes. In these cases, the results are compared to other well-established codes or theoretical predictions. This compilation of test cases is not exhaustive, and it is anticipated that more cases will be added in the future.
Model reduction for dynamical systems is a promising approach for reducing the computational cost of large-scale physics-based simulations to enable high-fidelity models to be used in many- query (e.g., Bayesian inference) and near-real-time (e.g., fast-turnaround simulation) contexts. While model reduction works well for specialized problems such as linear time-invariant systems, it is much more difficult to obtain accurate, stable, and efficient reduced-order models (ROMs) for systems with general nonlinearities. This report describes several advances that enable nonlinear reduced-order models (ROMs) to be deployed in a variety of time-critical settings. First, we present an error bound for the Gauss-Newton with Approximated Tensors (GNAT) nonlinear model reduction technique. This bound allows the state-space error for the GNAT method to be quantified when applied with the backward Euler time-integration scheme. Second, we present a methodology for preserving classical Lagrangian structure in nonlinear model reduction. This technique guarantees that important properties--such as energy conservation and symplectic time-evolution maps--are preserved when performing model reduction for models described by a Lagrangian formalism (e.g., molecular dynamics, structural dynamics). Third, we present a novel technique for decreasing the temporal complexity --defined as the number of Newton-like iterations performed over the course of the simulation--by exploiting time-domain data. Fourth, we describe a novel method for refining projection-based reduced-order models a posteriori using a goal-oriented framework similar to mesh-adaptive h -refinement in finite elements. The technique allows the ROM to generate arbitrarily accurate solutions, thereby providing the ROM with a 'failsafe' mechanism in the event of insufficient training data. Finally, we present the reduced-order model error surrogate (ROMES) method for statistically quantifying reduced- order-model errors. This enables ROMs to be rigorously incorporated in uncertainty-quantification settings, as the error model can be treated as a source of epistemic uncertainty. This work was completed as part of a Truman Fellowship appointment. We note that much additional work was performed as part of the Fellowship. One salient project is the development of the Trilinos-based model-reduction software module Razor , which is currently bundled with the Albany PDE code and currently allows nonlinear reduced-order models to be constructed for any application supported in Albany. Other important projects include the following: 1. ROMES-equipped ROMs for Bayesian inference: K. Carlberg, M. Drohmann, F. Lu (Lawrence Berkeley National Laboratory), M. Morzfeld (Lawrence Berkeley National Laboratory). 2. ROM-enabled Krylov-subspace recycling: K. Carlberg, V. Forstall (University of Maryland), P. Tsuji, R. Tuminaro. 3. A pseudo balanced POD method using only dual snapshots: K. Carlberg, M. Sarovar. 4. An analysis of discrete v. continuous optimality in nonlinear model reduction: K. Carlberg, M. Barone, H. Antil (George Mason University). Journal articles for these projects are in progress at the time of this writing.
The HPC architectures of today are significantly different for a decade ago, with high odds that further changes will occur on the road to Exascale. This report discusses the "perfect storm' in technology that produced this change, the classes of architectures we are dealing with, and probable trends in how they will evolve. These properties and trends are then evaluated in terms of what it likely means to future Exascale systems and applications.
MPI defines a one-to-one relationship between MPI processes and ranks. This model captures many use cases effectively; however, it also limits communication concurrency and interoperability between MPI and programming models that utilize threads. Our paper describes the MPI endpoints extension, which relaxes the longstanding one-to-one relationship between MPI processes and ranks. Using endpoints, an MPI implementation can map separate communication contexts to threads, allowing them to drive communication independently. Also, endpoints enable threads to be addressable in MPI operations, enhancing interoperability between MPI and other programming models. Furthermore, these characteristics are illustrated through several examples and an empirical study that contrasts current multithreaded communication performance with the need for high degrees of communication concurrency to achieve peak communication performance.
Proceedings of the International Conference on Dependable Systems and Networks
Ibtesham, Dewan; Debonis, David; Arnold, Dorian; Ferreira, Kurt
As high-performance computing systems continue to grow in size and complexity, energy efficiency and reliability have emerged as first-order concerns. Researchers have shown that data movement is a significant contributing factor to power consumption on these systems. Additionally, rollback/recovery protocols like checkpoint/restart can generate large volumes of data traffic exacerbating the energy and power concerns. In this work, we show that a coarse-grained model can be used effectively to speculate about the energy footprints of rollback/recovery protocols. Using our validated model, we evaluate the energy footprint of checkpoint compression, a method that incurs higher computational demand to reduce data volumes and data traffic. Specifically, we show that while checkpoint compression leads to more frequent checkpoints (as per the optimal checkpoint frequency) and increases per checkpoint energy cost, compression still yields a decrease in total application energy consumption due to the overall runtime decrease.
The current system reaction to the loss of a single MPI process is to kill all the remaining processes and restart the application from the most recent checkpoint. This approach will become unfeasible for future extreme scale systems. We address this issue using an emerging resilient computing model called Local Failure Local Recovery (LFLR) that provides application developers with the ability to recover locally and continue application execution when a process is lost. We discuss the design of our software framework to enable the LFLR model using MPI-ULFM and demonstrate the resilient version of MiniFE that achieves a scalable recovery from process failures.
We present a review and critique of several methods for the simulation of the dynamics of colloidal suspensions at the mesoscale. We focus particularly on simulation techniques for hydrodynamic interactions, including implicit solvents (Fast Lubrication Dynamics, an approximation to Stokesian Dynamics) and explicit/particle-based solvents (Multi-Particle Collision Dynamics and Dissipative Particle Dynamics). Several variants of each method are compared quantitatively for the canonical system of monodisperse hard spheres, with a particular focus on diffusion characteristics, as well as shear rheology and microstructure. In all cases, we attempt to match the relevant properties of a well-characterized solvent, which turns out to be challenging for the explicit solvent models. Reasonable quantitative agreement is observed among all methods, but overall the Fast Lubrication Dynamics technique shows the best accuracy and performance. We also devote significant discussion to the extension of these methods to more complex situations of interest in industrial applications, including models for non-Newtonian solvent rheology, non-spherical particles, drying and curing of solvent and flows in complex geometries. This work identifies research challenges and motivates future efforts to develop techniques for quantitative, predictive simulations of industrially relevant colloidal suspension processes.
Recent advances in nanotechnology have enabled researchers to control individual quantum mechanical objects with unprecedented accuracy, opening the door for both quantum and extreme- scale conventional computation applications. As these devices become more complex, designing for facility of control becomes a daunting and computationally infeasible task. Here, motivated by ideas from compressed sensing, we introduce a protocol for the Compressed Optimization of Device Architectures (CODA). It leads naturally to a metric for benchmarking and optimizing device designs, as well as an automatic device control protocol that reduces the operational complexity required to achieve a particular output. Because this protocol is both experimentally and computationally efficient, it is readily extensible to large systems. For this paper, we demonstrate both the bench- marking and device control protocol components of CODA through examples of realistic simulations of electrostatic quantum dot devices, which are currently being developed experimentally for quantum computation.
Concern is growing in the High-Performance Computing community regarding the reliability of proposed exascale systems. Current research has shown that the expected reliability of these machines will greatly reduce their scalability. In constrast to current fault tolerance methods whose reliability focus is only the application, this project investigates the benefits integrating reliability mechcanisms in the operating system and runtime, as well as the appli- cation. More specifically, this project has three broad contributions in the field: First, using failure logs from current leadership-class high-performance computing systems, we outline the failures common on these large-scale systems. Second, we describe a novel memory pro- tection mechcanism capable of protecting common observed failures that uses the similarity inherrant in many OS and applications state, thereby reducing overheads. Finally, using an analogy with OS jitter, we develop a highly effecient simulator capable predicting the performance of resilience methods at the scales expected for future extreme-scale systems.
As computer systems grow in both size and complexity, the need for applications and run-time systems to adjust to their dynamic environment also grows. The goal of the RAAMP LDRD was to combine static architecture information and real-time system state with algorithms to conserve power, reduce communication costs, and avoid network contention. We devel- oped new data collection and aggregation tools to extract static hardware information (e.g., node/core hierarchy, network routing) as well as real-time performance data (e.g., CPU uti- lization, power consumption, memory bandwidth saturation, percentage of used bandwidth, number of network stalls). We created application interfaces that allowed this data to be used easily by algorithms. Finally, we demonstrated the benefit of integrating system and application information for two use cases. The first used real-time power consumption and memory bandwidth saturation data to throttle concurrency to save power without increasing application execution time. The second used static or real-time network traffic information to reduce or avoid network congestion by remapping MPI tasks to allocated processors. Results from our work are summarized in this report; more details are available in our publications [2, 6, 14, 16, 22, 29, 38, 44, 51, 54].
This milestone was the 2nd in a series of Tri-Lab Co-Design L2 milestones supporting ‘Co-Design’ efforts in the ASC program. It is a crucial step towards evaluating the effectiveness of proxy applications in exploring code performance on next generation architectures. All three labs evaluated the performance of 2 proxy applications on modern architectures and/or testbeds for pre-production hardware. The results are captured in this document as well as annotated presentations from all 3 laboratories.
Rigorous modeling of engineering systems relies on efficient propagation of uncertainty from input parameters to model outputs. In recent years, there has been substantial development of probabilistic polynomial chaos (PC) Uncertainty Quantification (UQ) methods, enabling studies in expensive computational models. One approach, termed ”intrusive”, involving reformulation of the governing equations, has been found to have superior computational performance compared to non-intrusive sampling-based methods in relevant large-scale problems, particularly in the context of emerging architectures. However, the utility of intrusive methods has been severely limited due to detrimental numerical instabilities associated with strong nonlinear physics. Previous methods for stabilizing these constructions tend to add unacceptably high computational costs, particularly in problems with many uncertain parameters. In order to address these challenges, we propose to adapt and improve numerical continuation methods for the robust time integration of intrusive PC system dynamics. We propose adaptive methods, starting with a small uncertainty for which the model has stable behavior and gradually moving to larger uncertainty where the instabilities are rampant, in a manner that provides a suitable solution.
This report summarizes the result of LDRD project 12-0395, titled "Automated Algorithms for Quantum-level Accuracy in Atomistic Simulations." During the course of this LDRD, we have developed an interatomic potential for solids and liquids called Spectral Neighbor Analysis Poten- tial (SNAP). The SNAP potential has a very general form and uses machine-learning techniques to reproduce the energies, forces, and stress tensors of a large set of small configurations of atoms, which are obtained using high-accuracy quantum electronic structure (QM) calculations. The local environment of each atom is characterized by a set of bispectrum components of the local neighbor density projected on to a basis of hyperspherical harmonics in four dimensions. The SNAP coef- ficients are determined using weighted least-squares linear regression against the full QM training set. This allows the SNAP potential to be fit in a robust, automated manner to large QM data sets using many bispectrum components. The calculation of the bispectrum components and the SNAP potential are implemented in the LAMMPS parallel molecular dynamics code. Global optimization methods in the DAKOTA software package are used to seek out good choices of hyperparameters that define the overall structure of the SNAP potential. FitSnap.py, a Python-based software pack- age interfacing to both LAMMPS and DAKOTA is used to formulate the linear regression problem, solve it, and analyze the accuracy of the resultant SNAP potential. We describe a SNAP potential for tantalum that accurately reproduces a variety of solid and liquid properties. Most significantly, in contrast to existing tantalum potentials, SNAP correctly predicts the Peierls barrier for screw dislocation motion. We also present results from SNAP potentials generated for indium phosphide (InP) and silica (SiO 2 ). We describe efficient algorithms for calculating SNAP forces and energies in molecular dynamics simulations using massively parallel computers and advanced processor ar- chitectures. Finally, we briefly describe the MSM method for efficient calculation of electrostatic interactions on massively parallel computers.
We explore rearrangements of classical uncertainty quantification methods with the aim of achieving higher aggregate performance for uncertainty quantification calculations on emerging multicore and many core architectures. We show a rearrangement of the stochastic Galerkin method leads to improved performance and scalability on several computational architectures whereby uncertainty information is propagated at the lowest levels of the simulation code improving memory access patterns, exposing new dimensions of fine grained parallelism, and reducing communication. We also develop a general framework for implementing such rearrangements for a diverse set of uncertainty quantification algorithms as well as computational simulation codes to which they are applied.
Surface effects are critical to the accurate simulation of electromagnetics (EM) as current tends to concentrate near material surfaces. Sandia EM applications, which include exploding bridge wires for detonator design, electromagnetic launch of flyer plates for material testing and gun design, lightning blast-through for weapon safety, electromagnetic armor, and magnetic flux compression generators, all require accurate resolution of surface effects. These applications operate in a large deformation regime, where body-fitted meshes are impractical and multimaterial elements are the only feasible option. State-of-the-art methods use various mixture models to approximate the multi-physics of these elements. The empirical nature of these models can significantly compromise the accuracy of the simulation in this very important surface region. We propose to substantially improve the predictive capability of electromagnetic simulations by removing the need for empirical mixture models at material surfaces. We do this by developing an eXtended Finite Element Method (XFEM) and an associated Conformal Decomposition Finite Element Method (CDFEM) which satisfy the physically required compatibility conditions at material interfaces. We demonstrate the effectiveness of these methods for diffusion and diffusion-like problems on node, edge and face elements in 2D and 3D. We also present preliminary work on h -hierarchical elements and remap algorithms.
In this project we have developed atmospheric measurement capabilities and a suite of atmospheric modeling and analysis tools that are well suited for verifying emissions of green- house gases (GHGs) on an urban-through-regional scale. We have for the first time applied the Community Multiscale Air Quality (CMAQ) model to simulate atmospheric CO2 . This will allow for the examination of regional-scale transport and distribution of CO2 along with air pollutants traditionally studied using CMAQ at relatively high spatial and temporal resolution with the goal of leveraging emissions verification efforts for both air quality and climate. We have developed a bias-enhanced Bayesian inference approach that can remedy the well-known problem of transport model errors in atmospheric CO2 inversions. We have tested the approach using data and model outputs from the TransCom3 global CO2 inversion comparison project. We have also performed two prototyping studies on inversion approaches in the generalized convection-diffusion context. One of these studies employed Polynomial Chaos Expansion to accelerate the evaluation of a regional transport model and enable efficient Markov Chain Monte Carlo sampling of the posterior for Bayesian inference. The other approach uses de- terministic inversion of a convection-diffusion-reaction system in the presence of uncertainty. These approaches should, in principle, be applicable to realistic atmospheric problems with moderate adaptation. We outline a regional greenhouse gas source inference system that integrates (1) two ap- proaches of atmospheric dispersion simulation and (2) a class of Bayesian inference and un- certainty quantification algorithms. We use two different and complementary approaches to simulate atmospheric dispersion. Specifically, we use a Eulerian chemical transport model CMAQ and a Lagrangian Particle Dispersion Model - FLEXPART-WRF. These two models share the same WRF assimilated meteorology fields, making it possible to perform a hybrid simulation, in which the Eulerian model (CMAQ) can be used to compute the initial condi- tion needed by the Lagrangian model, while the source-receptor relationships for a large state vector can be efficiently computed using the Lagrangian model in its backward mode. In ad- dition, CMAQ has a complete treatment of atmospheric chemistry of a suite of traditional air pollutants, many of which could help attribute GHGs from different sources. The inference of emissions sources using atmospheric observations is cast as a Bayesian model calibration problem, which is solved using a variety of Bayesian techniques, such as the bias-enhanced Bayesian inference algorithm, which accounts for the intrinsic model deficiency, Polynomial Chaos Expansion to accelerate model evaluation and Markov Chain Monte Carlo sampling, and Karhunen-Lo %60 eve (KL) Expansion to reduce the dimensionality of the state space. We have established an atmospheric measurement site in Livermore, CA and are collect- ing continuous measurements of CO2 , CH4 and other species that are typically co-emitted with these GHGs. Measurements of co-emitted species can assist in attributing the GHGs to different emissions sectors. Automatic calibrations using traceable standards are performed routinely for the gas-phase measurements. We are also collecting standard meteorological data at the Livermore site as well as planetary boundary height measurements using a ceilometer. The location of the measurement site is well suited to sample air transported between the San Francisco Bay area and the California Central Valley.