Performance measurement of parallel algorithms is well studied and well understood. However, a flaw in traditional performance metrics is that they rely on comparisons to serial performance with the same input. This comparison is convenient for theoretical complexity analysis but impossible to perform in large-scale empirical studies with data sizes far too large to run on a single serial computer. Consequently, scaling studies currently rely on ad hoc methods that, although effective, have no grounded mathematical models. In this position paper we advocate using a rate-based model that has a concrete meaning relative to speedup and efficiency and that can be used to unify strong and weak scaling studies.
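As a concrete illustration of the rate-based idea (the notation here is ours, a sketch of the kind of model advocated rather than the paper's definitions): let $W(n)$ denote the work associated with input size $n$ and $T(n,p)$ the measured time on $p$ processing elements. Then the rate and a baseline-relative efficiency are
$$ R(n,p) = \frac{W(n)}{T(n,p)}, \qquad E(n,p) = \frac{R(n,p)/p}{R(n_0,p_0)/p_0}, $$
where $(n_0,p_0)$ is any measured baseline configuration rather than an unattainable serial run. Strong scaling holds $n$ fixed while increasing $p$, weak scaling grows $n$ in proportion to $p$, and in both regimes ideal behavior corresponds to $E \approx 1$.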
Visual search has been an active area of research – empirically and theoretically – for a number of decades; however, much of that work is based on novice searchers performing basic tasks in a laboratory setting. This paper summarizes some of the issues associated with quantifying expert, domain-specific visual search behavior in operationally realistic environments.
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Ciesko, Jan; Mateo, Sergi; Teruel, Xavier; Martorell, Xavier; Ayguade, Eduard; Labarta, Jesus; Duran, Alex; De Supinski, Bronis R.; Olivier, Stephen L.; Li, Kelvin; Eichenberger, Alexandre E.
Reductions represent a common algorithmic pattern in many scientific applications. OpenMP* has always supported them on parallel and worksharing constructs. OpenMP 3.0’s tasking constructs enable new parallelization opportunities through the annotation of irregular algorithms. Unfortunately, the tasking model does not easily allow the expression of concurrent reductions, which limits the general applicability of the programming model to such algorithms. In this work, we present an extension to OpenMP that supports task-parallel reductions on task and taskgroup constructs to improve productivity and programmability. We present a specification of the feature and explore issues for programmers and software vendors regarding programming transparency, as well as the impact on the current standard with respect to nesting, untied task support, and task data dependencies. Our performance evaluation demonstrates results comparable to hand-coded task reductions.
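As a minimal sketch of what such task-parallel reductions look like to the programmer, the following C fragment uses the task_reduction/in_reduction syntax that was later standardized in OpenMP 5.0; the clause names and placement may differ in detail from the extension proposed in this work.

    #include <omp.h>

    typedef struct node { double value; struct node *next; } node_t;

    /* Sum the values stored in a linked list using one task per node. */
    double sum_list(node_t *head) {
        double sum = 0.0;
        #pragma omp parallel
        #pragma omp single
        {
            /* The taskgroup carries the reduction; each participating task
               registers with in_reduction so its contribution is combined. */
            #pragma omp taskgroup task_reduction(+: sum)
            for (node_t *p = head; p != NULL; p = p->next) {
                #pragma omp task in_reduction(+: sum) firstprivate(p)
                sum += p->value;
            }
        }
        return sum;
    }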
Electric distribution utilities, the companies that feed electricity to end users, are overseeing a technological transformation of their networks, installing sensors and other automated equipment that are fundamentally changing the way the grid operates. These grid modernization efforts will allow utilities to incorporate some of the newer technology available to the home user – such as solar panels and electric cars – which will result in a bi-directional flow of energy and information. How will this new flow of information affect control room operations? How will the increased automation associated with smart grid technologies influence control room operators’ decisions? And how will changes in control room operations and operator decision making impact grid resilience? These questions have not been thoroughly studied, despite the enormous changes that are taking place. In this study, which involved collaborating with utility companies in the state of Vermont, the authors proposed to advance the science of control-room decision making by understanding the impact of distribution grid modernization on operator performance. Distribution control room operators were interviewed to understand daily tasks and decisions and to gain an understanding of how these impending changes will impact control room operations. Situation awareness was found to be a major contributor to successful control room operations. However, the impact of growing levels of automation due to smart grid technology on operators’ situation awareness is not well understood. Future work includes performing a naturalistic field study in which operator situation awareness will be measured in real time during normal operations and correlated with the technological changes that are underway. The results of this future study will inform tools and strategies that will help system operators adapt to a changing grid, respond to critical incidents, and maintain critical performance skills.
The elastic properties and mechanical stability of zirconium alloys and zirconium hydrides have been investigated within the framework of density functional perturbation theory. Results show that the lowest-energy cubic Pn3m polymorph of δ-ZrH1.5 does not satisfy all the Born requirements for mechanical stability, unlike its nearly degenerate tetragonal P42/mcm polymorph. Elastic moduli predicted with the Voigt-Reuss-Hill approximations suggest that the mechanical stability of α-Zr, Zr-alloy and Zr-hydride polycrystalline aggregates is limited by the shear modulus. According to both Pugh's and Poisson's ratios, α-Zr, Zr-alloy and Zr-hydride polycrystalline aggregates can be considered ductile. The Debye temperatures predicted for γ-ZrH, δ-ZrH1.5 and ε-ZrH2 are θD = 299.7, 415.6 and 356.9 K, respectively, while θD = 273.6, 284.2, 264.1 and 257.1 K for the α-Zr, Zry-4, ZIRLO and M5 matrices, suggesting that Zry-4 possesses the highest micro-hardness among the Zr matrices.
The reproducing kernel particle method (RKPM) is a meshfree method for computational solid mechanics that can be tailored for an arbitrary order of completeness and smoothness. The primary advantage of RKPM relative to standard finite element (FE) approaches is its capacity to model large deformations, material damage, and fracture. Additionally, the use of a meshfree approach offers great flexibility in the domain discretization process and reduces the complexity of mesh modifications such as adaptive refinement. We present an overview of the RKPM implementation in the Sierra/SolidMechanics analysis code, with a focus on verification, validation, and software engineering for massively parallel computation. Key details include the processing of meshfree discretizations within an FE code, RKPM solution approximation and domain integration, stress update and calculation of internal force, and contact modeling. The accuracy and performance of RKPM are evaluated using a set of benchmark problems. Solution verification, mesh convergence, and parallel scalability are demonstrated using a simulation of wave propagation along the length of a bar. Initial model validation is achieved through simulation of a Taylor bar impact test. The RKPM approach is shown to be a viable alternative to standard FE techniques that provides additional flexibility to the analyst community.
We study several natural instances of the geometric hitting set problem for input consisting of sets of line segments (and rays, lines) having a small number of distinct slopes. These problems model path monitoring (e.g., on road networks) using the fewest sensors (the “hitting points”). We give approximation algorithms for cases including (i) lines of 3 slopes in the plane, (ii) vertical lines and horizontal segments, (iii) pairs of horizontal/vertical segments. We give hardness and hardness of approximation results for these problems. We prove that the hitting set problem for vertical lines and horizontal rays is polynomially solvable.
Proceedings - 15th European Turbulence Conference, ETC 2015
Smith, Thomas M.; Christon, Mark A.; Baglietto, Emilio; Luo, Hong
Accurate simulation of turbulence remains one of the most challenging problems in nuclear reactor analysis and design. Due to limitations in computing resources, Reynolds-averaged Navier-Stokes (RANS) models continue to play an important role in reactor simulations. The Consortium for Advanced Simulation of Light Water Reactors (CASL) is a Department of Energy technology hub that is investing in research and development of a state-of-the-art computational fluid dynamics capability to meet the challenges of turbulent simulation of nuclear reactors. In this presentation, we assess several RANS eddy viscosity models appropriate for single-phase incompressible turbulent flows. Specifically, we compare the single-equation Spalart-Allmaras model to several variations of the k − ε model. The assessment takes into consideration elements of full system reactor cores such as complex geometries, heterogeneous meshes, swirling flow, near-wall flow behavior, heat transfer and robustness issues. The goal of this strategically oriented assessment is to provide an accurate and robust turbulent simulation capability for the CASL community. Metrics of performance will be constructed by comparing different models on a strategically chosen set of problems that represent reactor core sub-systems.
Previously, the current authors (Hopkins et al. 2015) described research in which subjects who were provided with a tool that facilitated construction of a narrative account of events performed better in conducting cyber security forensic analysis. The narrative tool offered several distinct features. In the current paper, an analysis is reported that considered which features of the tool contributed to superior performance. This analysis revealed two features that accounted for a statistically significant portion of the variance in performance. The first feature provided a mechanism for subjects to identify suspected perpetrators of the crimes and their motives. The second feature involved the ability to create an annotated visuospatial diagram of clues regarding the crimes and their relationships to one another. Based on these results, guidance may be provided for the development of software tools meant to aid cyber security professionals in conducting forensic analysis.
The purpose of this paper is to consider the exit-time problem for a finite-range Markov jump process, i.e., the distance the particle can jump is bounded independent of its location. Such jump diffusions are expedient models for anomalous transport exhibiting super-diffusion or nonstandard normal diffusion. We refer to the associated deterministic equation as a volume-constrained nonlocal diffusion equation. The volume constraint is the nonlocal analogue of a boundary condition necessary to demonstrate that the nonlocal diffusion equation is well-posed and is consistent with the jump process. A critical aspect of the analysis is a variational formulation and a recently developed nonlocal vector calculus. This calculus allows us to pose nonlocal backward and forward Kolmogorov equations, the former equation granting the various moments of the exit-time distribution.
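Schematically (our notation), if the finite-range jump process has the nonlocal generator
$$ (\mathcal{L}u)(x) = \int_{B_\delta(x)} \big(u(y) - u(x)\big)\, \gamma(x,y)\, dy, $$
then the mean exit time $u$ from a domain $\Omega$ solves the volume-constrained problem
$$ -(\mathcal{L}u)(x) = 1 \ \ \text{for } x \in \Omega, \qquad u(x) = 0 \ \ \text{for } x \in \Omega_I, $$
where $\Omega_I$ is the interaction volume surrounding $\Omega$; this is the nonlocal analogue of the classical Dirichlet problem for the backward Kolmogorov equation, and higher moments of the exit time satisfy analogous equations with the preceding moment on the right-hand side.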
An approach for building energy-stable Galerkin reduced order models (ROMs) for linear hyperbolic or incompletely parabolic systems of partial differential equations (PDEs) using continuous projection is developed. This method is an extension of earlier work by the authors specific to the equations of linearized compressible inviscid flow. The key idea is to apply to the PDEs a transformation induced by the Lyapunov function for the system, and to build the ROM in the transformed variables. For linear problems, the desired transformation is induced by a special inner product, termed the "symmetry inner product", which is derived herein for several systems of physical interest. Connections are established between the proposed approach and other stability-preserving model reduction methods, giving the paper a review flavor. More specifically, it is shown that a discrete counterpart of this inner product is a weighted L2 inner product obtained by solving a Lyapunov equation, first proposed by Rowley et al. and termed herein the "Lyapunov inner product". Comparisons between the symmetry inner product and the Lyapunov inner product are made, and the performance of ROMs constructed using these inner products is evaluated on several benchmark test cases.
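To sketch the discrete construction (standard notation, not tied to a particular system of interest here): for a stable linear system $\dot{x} = Ax$, a symmetric positive-definite $P$ solving the Lyapunov equation
$$ A^T P + P A + Q = 0, \qquad Q = Q^T \succ 0, $$
defines the weighted inner product $\langle x, y \rangle_P = x^T P y$. Galerkin projection onto a basis $\Phi$ orthonormalized in this inner product ($\Phi^T P \Phi = I$) gives the reduced operator $\Phi^T P A \Phi$, and $V(x) = x^T P x$ serves as a Lyapunov function certifying stability of the ROM; the symmetry inner product plays the analogous role at the level of the PDEs.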
The purpose of this paper is to consider the exit-time problem for a finite-range Markov jump process, i.e., the distance the particle can jump is bounded independent of its location. Additionally, such jump diffusions are expedient models for anomalous transport exhibiting super-diffusion or nonstandard normal diffusion. We refer to the associated deterministic equation as a volume-constrained nonlocal diffusion equation. The volume constraint is the nonlocal analogue of a boundary condition necessary to demonstrate that the nonlocal diffusion equation is well-posed and is consistent with the jump process. A critical aspect of the analysis is a variational formulation and a recently developed nonlocal vector calculus. Finally, this calculus allows us to pose nonlocal backward and forward Kolmogorov equations, the former equation granting the various moments of the exit-time distribution.
We present a spectral mimetic least-squares method for a model diffusion–reaction problem, which preserves key conservation properties of the continuum problem. Casting the model problem into a first-order system for two scalar and two vector variables shifts material properties from the differential equations to a pair of constitutive relations. We also use this system to motivate a new least-squares functional involving all four fields and show that its minimizer satisfies the differential equations exactly. Discretization of the four-field least-squares functional by spectral spaces compatible with the differential operators leads to a least-squares method in which the differential equations are also satisfied exactly. The latter are reduced to purely topological relationships for the degrees of freedom that can be satisfied without reference to basis functions. Numerical experiments confirm the spectral accuracy of the method and its local conservation.
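One plausible four-field reformulation of the type described (our labeling, given only as an illustration): for the model problem $-\nabla\cdot(A\nabla\phi) + c\phi = f$, introduce the gradient $u$, the flux $q$, and the reaction density $\rho$, so that the differential equations
$$ u = \nabla\phi, \qquad \nabla\cdot q + \rho = f $$
are purely topological and free of material data, while the material properties appear only in the constitutive relations
$$ q = -A u, \qquad \rho = c\phi. $$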
Current commercial software tools for transmission and generation investment planning have limited stochastic modeling capabilities. Because of this limitation, electric power utilities generally rely on scenario planning heuristics to identify potentially robust and cost effective investment plans for a broad range of system, economic, and policy conditions. Several research studies have shown that stochastic models perform significantly better than deterministic or heuristic approaches, in terms of overall costs. However, there is a lack of practical solution approaches to solve such models. In this paper we propose a scalable decomposition algorithm to solve stochastic transmission and generation planning problems, respectively considering discrete and continuous decision variables for transmission and generation investments. Given stochasticity restricted to loads and wind, solar, and hydro power output, we develop a simple scenario reduction framework based on a clustering algorithm, to yield a more tractable model. The resulting stochastic optimization model is decomposed on a scenario basis and solved using a variant of the Progressive Hedging (PH) algorithm. We perform numerical experiments using a 240-bus network representation of the Western Electricity Coordinating Council in the US. Although convergence of PH to an optimal solution is not guaranteed for mixed-integer linear optimization models, we find that it is possible to obtain solutions with acceptable optimality gaps for practical applications. Our numerical simulations are performed both on a commodity workstation and on a high-performance cluster. The results indicate that large-scale problems can be solved to a high degree of accuracy in at most two hours of wall clock time.
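For reference, the scenario-wise iteration underlying PH takes the standard form (standard notation; penalty selection, bundling, and convergence heuristics differ in practice): with scenario probabilities $p_s$, scenario costs $f_s$, and penalty $\rho > 0$, repeat
$$ x_s^{k+1} = \arg\min_x \; f_s(x) + (w_s^k)^T x + \tfrac{\rho}{2}\,\|x - \bar{x}^k\|^2 \quad \text{for each scenario } s, $$
$$ \bar{x}^{k+1} = \sum_s p_s\, x_s^{k+1}, \qquad w_s^{k+1} = w_s^k + \rho\,\big(x_s^{k+1} - \bar{x}^{k+1}\big), $$
so that the first-stage investment variables are driven toward consensus across scenarios while each scenario subproblem is solved independently.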
The metrics used for evaluating energy saving techniques for future HPC systems are critical to the correct assessment of proposed methods. Current predictions forecast that overcoming reduced system reliability, increased power requirements and energy consumption will be a major design challenge for future systems. Modern runtime energy-saving research efforts do not take into account the energy spent providing reliability. They also do not account for the increase in the probability of failure during application execution due to runtime overhead from energy saving methods. While this is very reasonable for current systems, it is insufficient for future generation systems. By taking into account the energy consumption ramifications of increased runtimes on system reliability, better energy saving techniques can be developed. This paper demonstrates how to determine the impact of runtime energy conservation methods within the context of failure-prone large scale systems. In addition, a survey of several energy savings methodologies is conducted and an analysis is performed with respect to their effectiveness in an environment in which failures occur.
Power and energy concerns are motivating chip manufacturers to consider future hybrid-core processor designs that may combine a small number of traditional cores optimized for single-thread performance with a large number of simpler cores optimized for throughput performance. This trend is likely to impact the way in which compute resources for network protocol processing functions are allocated and managed. In particular, the performance of MPI match processing is critical to achieving high message throughput. In this paper, we analyze the ability of simple and more complex cores to perform MPI matching operations for various scenarios in order to gain insight into how MPI implementations for future hybrid-core processors should be designed.
We use Sandia's Z machine and magnetically accelerated flyer plates to shock compress liquid krypton to 850 GPa and compare with results from density-functional theory (DFT) based simulations using the AM05 functional. We also employ quantum Monte Carlo calculations to motivate the choice of AM05. We conclude that the DFT results are sensitive to the quality of the pseudopotential in terms of scattering properties at high energy/temperature. A new Kr projector augmented wave potential was constructed with improved scattering properties which resulted in excellent agreement with the experimental results to 850 GPa and temperatures above 10 eV (110 kK). Finally, we present comparisons of our data from the Z experiments and DFT calculations to current equation of state models of krypton to determine the best model for high energy-density applications.
In this report we derive frequency-domain methods for inverse characterization of the constitutive parameters of viscoelastic materials. The inverse problem is cast in a PDE-constrained optimization framework with efficient computation of gradients and Hessian vector products through matrix free operations. The abstract optimization operators for first and second derivatives are derived from first principles. Various methods from the Rapid Optimization Library (ROL) are tested on the viscoelastic inversion problem. The methods described herein are applied to compute the viscoelastic bulk and shear moduli of a foam block model, which was recently used in experimental testing for viscoelastic property characterization.
The goal of the workshop and this report is to identify common themes and standardize concepts for locality-preserving abstractions for exascale programming models.
Under shock compression, most porous materials exhibit lower densities for a given pressure than that of a full-dense sample of the same material. However, some porous materials exhibit an anomalous, or enhanced, densification under shock compression. We demonstrate a molecular mechanism that drives this behavior. We also present evidence from atomistic simulation that silicon belongs to this anomalous class of materials. Atomistic simulations indicate that local shear strain in the neighborhood of collapsing pores nucleates a local solid-solid phase transformation even when bulk pressures are below the thermodynamic phase transformation pressure. This metastable, local, and partial, solid-solid phase transformation, which accounts for the enhanced densification in silicon, is driven by the local stress state near the void, not equilibrium thermodynamics. This mechanism may also explain the phenomenon in other covalently bonded materials.
Adult neurogenesis in the hippocampus is a notable process due not only to its uniqueness and potential impact on cognition but also to its localized vertical integration of different scales of neuroscience, ranging from molecular and cellular biology to behavior. Our review summarizes the recent research regarding the process of adult neurogenesis from these different perspectives, with particular emphasis on the differentiation and development of new neurons, the regulation of the process by extrinsic and intrinsic factors, and their ultimate function in the hippocampus circuit. Arising from a local neural stem cell population, new neurons progress through several stages of maturation, ultimately integrating into the adult dentate gyrus network. Furthermore, the increased appreciation of the full neurogenesis process, from genes and cells to behavior and cognition, makes neurogenesis both a unique case study for how scales in neuroscience can link together and suggests neurogenesis as a potential target for therapeutic intervention for a number of disorders.
This report summarizes the result of a NEAMS project focused on sensitivity analysis of the heat transfer model in the gap between the fuel rod and the cladding used in the BISON fuel performance code of Idaho National Laboratory. Using the gap heat transfer models in BISON, the sensitivity of the modeling parameters and the associated responses is investigated. The study results in a quantitative assessment of the role of various parameters in the analysis of gap heat transfer in nuclear fuel.
A method is described for applying a sequence of peridynamic models with different length scales concurrently to subregions of a body. The method allows the smallest length scale, and therefore greatest spatial resolution, to be focused on evolving defects such as cracks. The peridynamic horizon in each of the models is half of that of the next model in the sequence. The boundary conditions on each model are provided by the solution predicted by the model above it. Material property characterization for each model is derived by coarse-graining the more detailed resolution in the model below it. Implementation of the multiscale method in the PDMS code is described. Examples of crack growth modeling illustrate the ability of the method to reproduce the main features of crack growth seen in a model with uniformly small resolution. Comparison of the multiscale model results with XFEM and cohesive elements is also given for a crack growth problem.
The peridynamic theory is an extension of traditional solid mechanics in which the field equations can be applied on discontinuities, such as growing cracks. This paper proposes a bond damage model within peridynamics to treat the nucleation and growth of cracks due to cyclic loading. Bond damage occurs according to the evolution of a variable called the "remaining life" of each bond that changes over time according to the cyclic strain in the bond. It is shown that the model reproduces the main features of S-N data for typical materials and also reproduces the Paris law for fatigue crack growth. Extensions of the model account for the effects of loading spectrum, fatigue limit, and variable load ratio. A three-dimensional example illustrates the nucleation and growth of a helical fatigue crack in the torsion of an aluminum alloy rod.
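In remaining-life models of this type the evolution law is, schematically (the coefficients and their calibration are material-specific),
$$ \frac{d\lambda}{dN} = -A\, s^m, \qquad \lambda(0) = 1, $$
where $N$ is the cycle count, $s$ is the cyclic strain range in the bond, and the bond is broken when $\lambda \le 0$; the parameters $A$ and $m$ are fit to S-N data in the nucleation phase and to Paris-law data in the growth phase.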
We created a highly efficient, universal 3D quantum transport simulator. We demonstrated that the simulator scales linearly, both with the problem size (N) and with the number of CPUs, which presents an important breakthrough in the field of computational nanoelectronics. It allowed us, for the first time, to accurately simulate and optimize a large number of realistic nanodevices in a much shorter time when compared to other methods/codes such as RGF [~N^2.333]/KNIT, KWANT, and QTBM [~N^3]/NEMO5. In order to determine the best-in-class for different beyond-CMOS paradigms, we performed rigorous device optimization for high-performance logic devices at 6-, 5- and 4-nm gate lengths. We have discovered that there exists a fundamental down-scaling limit for CMOS technology and other Field-Effect Transistors (FETs). We have found that, at room temperature, all FETs, irrespective of their channel material, will start experiencing unacceptable levels of thermally induced errors around 5-nm gate lengths.
The MueLu tutorial is written as a hands-on tutorial for MueLu, the next generation multigrid framework in Trilinos. It covers the whole spectrum from absolute beginners’ topics to expert level. Since the focus of this tutorial is on practical and technical aspects of multigrid methods in general and MueLu in particular, the reader is expected to have a basic understanding of multigrid methods and its general underlying concepts. Please refer to multigrid textbooks (e.g. [1]) for the theoretical background.
Adaptive Thinking has been defined here as the capacity to recognize when a course of action that may have previously been effective is no longer effective and there is a need to adjust strategy. Research was undertaken with human test subjects to identify the factors that contribute to adaptive thinking. It was discovered that those most effective in settings that call for adaptive thinking tend to possess a superior capacity to quickly and effectively generate possible courses of action, as measured using the Category Generation test. Software developed for this research has been applied to develop capabilities enabling analysts to identify crucial factors that are predictive of outcomes in force-on-force simulation exercises.
This is the official user guide for the MueLu multigrid library in Trilinos version 11.12. This guide provides an overview of MueLu, its capabilities, and instructions for new users who want to start using MueLu with a minimum of effort. Detailed information is given on how to drive MueLu through its XML interface. Links to more advanced use cases are given. This guide gives information on how to achieve good parallel performance, as well as how to introduce new algorithms. Finally, readers will find a comprehensive listing of available MueLu options. Any options not documented in this manual should be considered strictly experimental.
Adult neurogenesis in the hippocampus region of the brain is a neurobiological process that is believed to contribute to the brain's advanced abilities in complex pattern recognition and cognition. Here, we describe how realistic scale simulations of the neurogenesis process can offer both a unique perspective on the biological relevance of this process and confer computational insights that are suggestive of novel machine learning techniques. First, supercomputer based scaling studies of the neurogenesis process demonstrate how a small fraction of adult-born neurons have a uniquely larger impact in biologically realistic scaled networks. Second, we describe a novel technical approach by which the information content of ensembles of neurons can be estimated. Finally, we illustrate several examples of broader algorithmic impact of neurogenesis, including both extending existing machine learning approaches and novel approaches for intelligent sensing.
This LDRD project was a campus executive fellowship to fund (in part) Donald Nguyen’s PhD research at UT-Austin. His work has focused on parallel programming models, and scheduling irregular algorithms on shared-memory systems using the Galois framework. Galois provides a simple but powerful way for users and applications to automatically obtain good parallel performance using certain supported data containers. The naïve user can write serial code, while advanced users can optimize performance using advanced features, such as specifying the scheduling policy. Galois was used to parallelize two sparse matrix reordering schemes: RCM and Sloan. Such reordering is important in high-performance computing to obtain better data locality and thus reduce run times.
This report presents the test cases used to verify, validate and demonstrate the features and capabilities of the first release of the 3D Direct Simulation Monte Carlo (DSMC) code SPARTA (Stochastic PArallel Rarefied-gas Time-accurate Analyzer). The test cases included in this report exercise the most critical capabilities of the code, such as the accurate representation of physical phenomena (molecular advection and collisions, energy conservation, etc.) and the implementation of numerical methods (grid adaptation, load balancing, etc.). Several test cases of simple flow examples are shown to demonstrate that the code can reproduce phenomena predicted by analytical solutions and theory. A number of additional test cases are presented to illustrate the ability of SPARTA to model flow around complicated shapes. In these cases, the results are compared to other well-established codes or theoretical predictions. This compilation of test cases is not exhaustive, and it is anticipated that more cases will be added in the future.
Model reduction for dynamical systems is a promising approach for reducing the computational cost of large-scale physics-based simulations to enable high-fidelity models to be used in many-query (e.g., Bayesian inference) and near-real-time (e.g., fast-turnaround simulation) contexts. While model reduction works well for specialized problems such as linear time-invariant systems, it is much more difficult to obtain accurate, stable, and efficient reduced-order models (ROMs) for systems with general nonlinearities. This report describes several advances that enable nonlinear reduced-order models (ROMs) to be deployed in a variety of time-critical settings. First, we present an error bound for the Gauss-Newton with Approximated Tensors (GNAT) nonlinear model reduction technique. This bound allows the state-space error for the GNAT method to be quantified when applied with the backward Euler time-integration scheme. Second, we present a methodology for preserving classical Lagrangian structure in nonlinear model reduction. This technique guarantees that important properties--such as energy conservation and symplectic time-evolution maps--are preserved when performing model reduction for models described by a Lagrangian formalism (e.g., molecular dynamics, structural dynamics). Third, we present a novel technique for decreasing the temporal complexity--defined as the number of Newton-like iterations performed over the course of the simulation--by exploiting time-domain data. Fourth, we describe a novel method for refining projection-based reduced-order models a posteriori using a goal-oriented framework similar to mesh-adaptive h-refinement in finite elements. The technique allows the ROM to generate arbitrarily accurate solutions, thereby providing the ROM with a 'failsafe' mechanism in the event of insufficient training data. Finally, we present the reduced-order model error surrogate (ROMES) method for statistically quantifying reduced-order-model errors. This enables ROMs to be rigorously incorporated in uncertainty-quantification settings, as the error model can be treated as a source of epistemic uncertainty. This work was completed as part of a Truman Fellowship appointment. We note that much additional work was performed as part of the Fellowship. One salient project is the development of the Trilinos-based model-reduction software module Razor, which is currently bundled with the Albany PDE code and currently allows nonlinear reduced-order models to be constructed for any application supported in Albany. Other important projects include the following: 1. ROMES-equipped ROMs for Bayesian inference: K. Carlberg, M. Drohmann, F. Lu (Lawrence Berkeley National Laboratory), M. Morzfeld (Lawrence Berkeley National Laboratory). 2. ROM-enabled Krylov-subspace recycling: K. Carlberg, V. Forstall (University of Maryland), P. Tsuji, R. Tuminaro. 3. A pseudo balanced POD method using only dual snapshots: K. Carlberg, M. Sarovar. 4. An analysis of discrete v. continuous optimality in nonlinear model reduction: K. Carlberg, M. Barone, H. Antil (George Mason University). Journal articles for these projects are in progress at the time of this writing.
The HPC architectures of today are significantly different from those of a decade ago, with high odds that further changes will occur on the road to Exascale. This report discusses the "perfect storm" in technology that produced this change, the classes of architectures we are dealing with, and probable trends in how they will evolve. These properties and trends are then evaluated in terms of what they likely mean for future Exascale systems and applications.
MPI defines a one-to-one relationship between MPI processes and ranks. This model captures many use cases effectively; however, it also limits communication concurrency and interoperability between MPI and programming models that utilize threads. Our paper describes the MPI endpoints extension, which relaxes the longstanding one-to-one relationship between MPI processes and ranks. Using endpoints, an MPI implementation can map separate communication contexts to threads, allowing them to drive communication independently. Also, endpoints enable threads to be addressable in MPI operations, enhancing interoperability between MPI and other programming models. Furthermore, these characteristics are illustrated through several examples and an empirical study that contrasts current multithreaded communication performance with the need for high degrees of communication concurrency to achieve peak communication performance.
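The following C sketch illustrates the intent of the extension; the function name and signature used below (MPIX_Comm_create_endpoints) are an assumption written in the spirit of the proposal and are not part of the MPI standard.

    #include <mpi.h>
    #include <omp.h>

    #define MAX_EP 16

    /* Hypothetical usage sketch: one MPI process exposes several ranks
       (endpoints), one per thread, so each thread can drive communication
       independently.  MPIX_Comm_create_endpoints is assumed, not standard MPI;
       eps_per_proc is assumed to be at most MAX_EP. */
    void endpoints_example(int eps_per_proc)
    {
        MPI_Comm ep[MAX_EP];
        MPIX_Comm_create_endpoints(MPI_COMM_WORLD, eps_per_proc,
                                   MPI_INFO_NULL, ep);

        #pragma omp parallel num_threads(eps_per_proc)
        {
            int t = omp_get_thread_num();
            int rank;
            MPI_Comm_rank(ep[t], &rank);   /* each thread has its own rank */
            /* ... thread t issues sends/receives on ep[t] concurrently ... */
        }
        /* cleanup of endpoint communicators omitted in this sketch */
    }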
Proceedings of the International Conference on Dependable Systems and Networks
Ibtesham, Dewan; Debonis, David; Arnold, Dorian; Ferreira, Kurt
As high-performance computing systems continue to grow in size and complexity, energy efficiency and reliability have emerged as first-order concerns. Researchers have shown that data movement is a significant contributing factor to power consumption on these systems. Additionally, rollback/recovery protocols like checkpoint/restart can generate large volumes of data traffic exacerbating the energy and power concerns. In this work, we show that a coarse-grained model can be used effectively to speculate about the energy footprints of rollback/recovery protocols. Using our validated model, we evaluate the energy footprint of checkpoint compression, a method that incurs higher computational demand to reduce data volumes and data traffic. Specifically, we show that while checkpoint compression leads to more frequent checkpoints (as per the optimal checkpoint frequency) and increases per checkpoint energy cost, compression still yields a decrease in total application energy consumption due to the overall runtime decrease.
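A coarse-grained model of this kind can be sketched as follows (schematic; the validated model and its parameters are those described in the paper). With checkpoint cost $\delta$, mean time between failures $M$, and checkpoint interval $\tau$, the first-order expected overhead is roughly $\delta/\tau + \tau/(2M)$, minimized at the Young/Daly interval $\tau_{\mathrm{opt}} \approx \sqrt{2\delta M}$, so the energy footprint is approximately
$$ E \approx \bar{P}\, T_{\mathrm{solve}} \left( 1 + \frac{\delta}{\tau_{\mathrm{opt}}} + \frac{\tau_{\mathrm{opt}}}{2M} \right), $$
with $\bar{P}$ the average power draw. Compression reduces $\delta$, which shrinks $\tau_{\mathrm{opt}}$ (more frequent checkpoints) and adds compute cost per checkpoint; whether compression pays off is determined by the net effect on $E$.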
The current system reaction to the loss of a single MPI process is to kill all the remaining processes and restart the application from the most recent checkpoint. This approach will become infeasible for future extreme-scale systems. We address this issue using an emerging resilient computing model called Local Failure Local Recovery (LFLR) that provides application developers with the ability to recover locally and continue application execution when a process is lost. We discuss the design of our software framework to enable the LFLR model using MPI-ULFM and demonstrate a resilient version of MiniFE that achieves scalable recovery from process failures.
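As a minimal sketch of the ULFM primitives on which such a framework can build (illustrative only; the recovery logic of an LFLR framework is application- and data-structure-specific):

    #include <mpi.h>
    #include <mpi-ext.h>   /* ULFM prototype extensions (MPIX_*) */

    /* After a communication call reports a process failure, interrupt
       outstanding traffic and build a communicator of the surviving ranks.
       Restoring the lost process's state (e.g., from a local checkpoint or
       from redundant copies held by neighbors) is application-specific. */
    static MPI_Comm repair_after_failure(MPI_Comm comm)
    {
        MPI_Comm survivors;
        MPIX_Comm_revoke(comm);
        MPIX_Comm_shrink(comm, &survivors);
        /* ... LFLR-style local recovery of the lost subdomain goes here ... */
        return survivors;
    }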
We present a review and critique of several methods for the simulation of the dynamics of colloidal suspensions at the mesoscale. We focus particularly on simulation techniques for hydrodynamic interactions, including implicit solvents (Fast Lubrication Dynamics, an approximation to Stokesian Dynamics) and explicit/particle-based solvents (Multi-Particle Collision Dynamics and Dissipative Particle Dynamics). Several variants of each method are compared quantitatively for the canonical system of monodisperse hard spheres, with a particular focus on diffusion characteristics, as well as shear rheology and microstructure. In all cases, we attempt to match the relevant properties of a well-characterized solvent, which turns out to be challenging for the explicit solvent models. Reasonable quantitative agreement is observed among all methods, but overall the Fast Lubrication Dynamics technique shows the best accuracy and performance. We also devote significant discussion to the extension of these methods to more complex situations of interest in industrial applications, including models for non-Newtonian solvent rheology, non-spherical particles, drying and curing of solvent and flows in complex geometries. This work identifies research challenges and motivates future efforts to develop techniques for quantitative, predictive simulations of industrially relevant colloidal suspension processes.
Recent advances in nanotechnology have enabled researchers to control individual quantum mechanical objects with unprecedented accuracy, opening the door for both quantum and extreme-scale conventional computation applications. As these devices become more complex, designing for facility of control becomes a daunting and computationally infeasible task. Here, motivated by ideas from compressed sensing, we introduce a protocol for the Compressed Optimization of Device Architectures (CODA). It leads naturally to a metric for benchmarking and optimizing device designs, as well as an automatic device control protocol that reduces the operational complexity required to achieve a particular output. Because this protocol is both experimentally and computationally efficient, it is readily extensible to large systems. For this paper, we demonstrate both the benchmarking and device control protocol components of CODA through examples of realistic simulations of electrostatic quantum dot devices, which are currently being developed experimentally for quantum computation.
Concern is growing in the High-Performance Computing community regarding the reliability of proposed exascale systems. Current research has shown that the expected reliability of these machines will greatly reduce their scalability. In contrast to current fault tolerance methods, whose reliability focus is only the application, this project investigates the benefits of integrating reliability mechanisms in the operating system and runtime, as well as the application. More specifically, this project has three broad contributions in the field: First, using failure logs from current leadership-class high-performance computing systems, we outline the failures common on these large-scale systems. Second, we describe a novel memory protection mechanism, capable of protecting against commonly observed failures, that uses the similarity inherent in much OS and application state, thereby reducing overheads. Finally, using an analogy with OS jitter, we develop a highly efficient simulator capable of predicting the performance of resilience methods at the scales expected for future extreme-scale systems.
As computer systems grow in both size and complexity, the need for applications and run-time systems to adjust to their dynamic environment also grows. The goal of the RAAMP LDRD was to combine static architecture information and real-time system state with algorithms to conserve power, reduce communication costs, and avoid network contention. We developed new data collection and aggregation tools to extract static hardware information (e.g., node/core hierarchy, network routing) as well as real-time performance data (e.g., CPU utilization, power consumption, memory bandwidth saturation, percentage of used bandwidth, number of network stalls). We created application interfaces that allowed this data to be used easily by algorithms. Finally, we demonstrated the benefit of integrating system and application information for two use cases. The first used real-time power consumption and memory bandwidth saturation data to throttle concurrency to save power without increasing application execution time. The second used static or real-time network traffic information to reduce or avoid network congestion by remapping MPI tasks to allocated processors. Results from our work are summarized in this report; more details are available in our publications [2, 6, 14, 16, 22, 29, 38, 44, 51, 54].
This milestone was the 2nd in a series of Tri-Lab Co-Design L2 milestones supporting ‘Co-Design’ efforts in the ASC program. It is a crucial step towards evaluating the effectiveness of proxy applications in exploring code performance on next generation architectures. All three labs evaluated the performance of 2 proxy applications on modern architectures and/or testbeds for pre-production hardware. The results are captured in this document as well as annotated presentations from all 3 laboratories.
Rigorous modeling of engineering systems relies on efficient propagation of uncertainty from input parameters to model outputs. In recent years, there has been substantial development of probabilistic polynomial chaos (PC) Uncertainty Quantification (UQ) methods, enabling studies in expensive computational models. One approach, termed "intrusive", involving reformulation of the governing equations, has been found to have superior computational performance compared to non-intrusive sampling-based methods in relevant large-scale problems, particularly in the context of emerging architectures. However, the utility of intrusive methods has been severely limited due to detrimental numerical instabilities associated with strong nonlinear physics. Previous methods for stabilizing these constructions tend to add unacceptably high computational costs, particularly in problems with many uncertain parameters. In order to address these challenges, we propose to adapt and improve numerical continuation methods for the robust time integration of intrusive PC system dynamics. We propose adaptive methods, starting with a small uncertainty for which the model has stable behavior and gradually moving to larger uncertainty where the instabilities are rampant, in a manner that provides a suitable solution.
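For context, the intrusive construction expands each state variable in a polynomial chaos basis (standard notation),
$$ u(x,t;\xi) \approx \sum_{k=0}^{P} u_k(x,t)\, \Psi_k(\xi), $$
where $\xi$ are the germ random variables and $\Psi_k$ are orthogonal polynomials; Galerkin projection of the governing equations onto each $\Psi_k$ yields a coupled deterministic system for the coefficients $u_k$, and it is the time integration of this coupled system that the proposed continuation strategy is intended to stabilize.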
This report summarizes the result of LDRD project 12-0395, titled "Automated Algorithms for Quantum-level Accuracy in Atomistic Simulations." During the course of this LDRD, we have developed an interatomic potential for solids and liquids called the Spectral Neighbor Analysis Potential (SNAP). The SNAP potential has a very general form and uses machine-learning techniques to reproduce the energies, forces, and stress tensors of a large set of small configurations of atoms, which are obtained using high-accuracy quantum electronic structure (QM) calculations. The local environment of each atom is characterized by a set of bispectrum components of the local neighbor density projected on to a basis of hyperspherical harmonics in four dimensions. The SNAP coefficients are determined using weighted least-squares linear regression against the full QM training set. This allows the SNAP potential to be fit in a robust, automated manner to large QM data sets using many bispectrum components. The calculation of the bispectrum components and the SNAP potential are implemented in the LAMMPS parallel molecular dynamics code. Global optimization methods in the DAKOTA software package are used to seek out good choices of hyperparameters that define the overall structure of the SNAP potential. FitSnap.py, a Python-based software package interfacing to both LAMMPS and DAKOTA, is used to formulate the linear regression problem, solve it, and analyze the accuracy of the resultant SNAP potential. We describe a SNAP potential for tantalum that accurately reproduces a variety of solid and liquid properties. Most significantly, in contrast to existing tantalum potentials, SNAP correctly predicts the Peierls barrier for screw dislocation motion. We also present results from SNAP potentials generated for indium phosphide (InP) and silica (SiO2). We describe efficient algorithms for calculating SNAP forces and energies in molecular dynamics simulations using massively parallel computers and advanced processor architectures. Finally, we briefly describe the MSM method for efficient calculation of electrostatic interactions on massively parallel computers.
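Schematically, the SNAP energy of atom $i$ is a linear model in the bispectrum components $B_k$ of its local neighbor density,
$$ E_i = \beta_0 + \sum_{k=1}^{K} \beta_k\, B_k^{(i)}, $$
with forces and stress contributions obtained from the corresponding analytic derivatives; the coefficients $\beta$ are the quantities determined by the weighted least-squares fit to the QM training data described above.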
We explore rearrangements of classical uncertainty quantification methods with the aim of achieving higher aggregate performance for uncertainty quantification calculations on emerging multicore and many-core architectures. We show that a rearrangement of the stochastic Galerkin method leads to improved performance and scalability on several computational architectures, whereby uncertainty information is propagated at the lowest levels of the simulation code, improving memory access patterns, exposing new dimensions of fine-grained parallelism, and reducing communication. We also develop a general framework for implementing such rearrangements for a diverse set of uncertainty quantification algorithms, as well as the computational simulation codes to which they are applied.
Surface effects are critical to the accurate simulation of electromagnetics (EM) as current tends to concentrate near material surfaces. Sandia EM applications, which include exploding bridge wires for detonator design, electromagnetic launch of flyer plates for material testing and gun design, lightning blast-through for weapon safety, electromagnetic armor, and magnetic flux compression generators, all require accurate resolution of surface effects. These applications operate in a large deformation regime, where body-fitted meshes are impractical and multimaterial elements are the only feasible option. State-of-the-art methods use various mixture models to approximate the multi-physics of these elements. The empirical nature of these models can significantly compromise the accuracy of the simulation in this very important surface region. We propose to substantially improve the predictive capability of electromagnetic simulations by removing the need for empirical mixture models at material surfaces. We do this by developing an eXtended Finite Element Method (XFEM) and an associated Conformal Decomposition Finite Element Method (CDFEM) which satisfy the physically required compatibility conditions at material interfaces. We demonstrate the effectiveness of these methods for diffusion and diffusion-like problems on node, edge and face elements in 2D and 3D. We also present preliminary work on h-hierarchical elements and remap algorithms.
In this project we have developed atmospheric measurement capabilities and a suite of atmospheric modeling and analysis tools that are well suited for verifying emissions of greenhouse gases (GHGs) on an urban-through-regional scale. We have for the first time applied the Community Multiscale Air Quality (CMAQ) model to simulate atmospheric CO2. This will allow for the examination of regional-scale transport and distribution of CO2 along with air pollutants traditionally studied using CMAQ at relatively high spatial and temporal resolution with the goal of leveraging emissions verification efforts for both air quality and climate. We have developed a bias-enhanced Bayesian inference approach that can remedy the well-known problem of transport model errors in atmospheric CO2 inversions. We have tested the approach using data and model outputs from the TransCom3 global CO2 inversion comparison project. We have also performed two prototyping studies on inversion approaches in the generalized convection-diffusion context. One of these studies employed Polynomial Chaos Expansion to accelerate the evaluation of a regional transport model and enable efficient Markov Chain Monte Carlo sampling of the posterior for Bayesian inference. The other approach uses deterministic inversion of a convection-diffusion-reaction system in the presence of uncertainty. These approaches should, in principle, be applicable to realistic atmospheric problems with moderate adaptation. We outline a regional greenhouse gas source inference system that integrates (1) two approaches of atmospheric dispersion simulation and (2) a class of Bayesian inference and uncertainty quantification algorithms. We use two different and complementary approaches to simulate atmospheric dispersion. Specifically, we use a Eulerian chemical transport model, CMAQ, and a Lagrangian Particle Dispersion Model, FLEXPART-WRF. These two models share the same WRF assimilated meteorology fields, making it possible to perform a hybrid simulation, in which the Eulerian model (CMAQ) can be used to compute the initial condition needed by the Lagrangian model, while the source-receptor relationships for a large state vector can be efficiently computed using the Lagrangian model in its backward mode. In addition, CMAQ has a complete treatment of atmospheric chemistry of a suite of traditional air pollutants, many of which could help attribute GHGs from different sources. The inference of emissions sources using atmospheric observations is cast as a Bayesian model calibration problem, which is solved using a variety of Bayesian techniques, such as the bias-enhanced Bayesian inference algorithm, which accounts for the intrinsic model deficiency, Polynomial Chaos Expansion to accelerate model evaluation and Markov Chain Monte Carlo sampling, and Karhunen-Loève (KL) Expansion to reduce the dimensionality of the state space. We have established an atmospheric measurement site in Livermore, CA and are collecting continuous measurements of CO2, CH4 and other species that are typically co-emitted with these GHGs. Measurements of co-emitted species can assist in attributing the GHGs to different emissions sectors. Automatic calibrations using traceable standards are performed routinely for the gas-phase measurements. We are also collecting standard meteorological data at the Livermore site as well as planetary boundary layer height measurements using a ceilometer.
The location of the measurement site is well suited to sample air transported between the San Francisco Bay area and the California Central Valley.
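The Bayesian model-calibration formulation referred to above has the generic structure (standard notation): with observations $y$ and a forward transport model $G$ mapping an emissions field $s$ to concentrations, the posterior is
$$ p(s \mid y) \propto p\big(y \mid G(s)\big)\, p(s), $$
and the listed ingredients respectively correct the likelihood for transport-model deficiency (bias-enhanced inference), replace $G$ with an inexpensive surrogate (Polynomial Chaos Expansion) so that Markov Chain Monte Carlo sampling of the posterior is affordable, and parameterize $s$ with a truncated Karhunen-Loève expansion to reduce the dimension of the state space.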