Recovery from process loss during the execution of a distributed-memory parallel application is presently achieved by restarting the program, typically from a checkpoint file. Future computer system trends indicate that the growing size of data to checkpoint, the lack of improvement in parallel file system performance, and the increase in process failure rates will lead to situations where checkpoint/restart becomes infeasible. In this report we describe and prototype the use of a new application-level resilient computing model, local failure local recovery (LFLR), that manages persistent storage of local state for each process so that, if a process fails, recovery can be performed locally without requiring access to a global checkpoint file. LFLR provides application developers with the ability to recover locally and continue application execution when a process is lost. This report discusses the features required from the hardware, OS, and runtime layers, and the approaches application developers might use in the design of future codes, including a demonstration of an LFLR-enabled MiniFE code from the Mantevo mini-application suite.
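For illustration only, the sketch below (Python, with hypothetical paths and function names, not the LFLR API) shows the local-persistence pattern the report describes: each process writes its own state to persistent local storage so that a replacement process can restore just that state after a failure, without touching a global checkpoint file.

```python
# Minimal sketch of the local-checkpoint pattern described above (assumed
# design, not the LFLR API): each process periodically persists its own
# state so a replacement process can restore it without a global restart.
import os
import pickle

STORE_DIR = "/nvm/local_state"  # hypothetical node-local persistent storage


def persist_local_state(rank, state, version):
    """Write this process's state to persistent local storage (atomically)."""
    os.makedirs(STORE_DIR, exist_ok=True)
    tmp = os.path.join(STORE_DIR, f"rank{rank}.v{version}.tmp")
    final = os.path.join(STORE_DIR, f"rank{rank}.latest")
    with open(tmp, "wb") as f:
        pickle.dump({"version": version, "state": state}, f)
    os.replace(tmp, final)  # atomic rename so a partial write is never visible


def recover_local_state(failed_rank):
    """A replacement process reloads only the failed rank's stored state."""
    with open(os.path.join(STORE_DIR, f"rank{failed_rank}.latest"), "rb") as f:
        return pickle.load(f)
```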
Exascale data environments are fast approaching, driven by diverse structured and unstructured data such as system and application telemetry streams, open-source information capture, and on-demand simulation output. With storage costs having plummeted, the question is now how to convert vast stores of data into actionable information. Complicating this problem are the low degrees of awareness across domain boundaries about what potentially useful data may exist, and write-once-read-never issues (data generation and collection rates outpacing data analysis and integration rates). Increasingly, technologists and researchers need to correlate previously unrelated data sources and artifacts to produce fused data views for domain-specific purposes. New tools and approaches for creating such views from vast amounts of data are vitally important to maintaining research and operational momentum. We propose to research and develop tools and services that assist in the creation, refinement, discovery, and reuse of fused data views over large, diverse collections of heterogeneously structured data. We innovate in the following ways. First, we enable and encourage end-users to introduce customized index methods selected for local benefit rather than for global interaction (flexible multi-indexing). We envision rich combinations of such views on application data: views that span backing stores with different semantics, that introduce analytic methods of indexing, and that define multiple views on individual data items. We specifically decline to build a single fused database of everything that provides a centralized index over all data, or to export a rigid schema to all comers as in federated query approaches. Second, we proactively advertise these application-specific views so that they may be programmatically reused and extended (data proactivity). Through this mechanism, both changes in state (new data collected into an existing view) and changes in structure (a new or derived view exists) are made known. Lastly, we embrace found-data heterogeneity by coupling multi-indexing to backing stores with appropriate semantics (as opposed to a single store or schema).
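The following sketch (all class and function names are hypothetical) illustrates the two ideas named above: a view that attaches a user-chosen index method to a backing store (flexible multi-indexing), and a registry that advertises newly published views to subscribers (data proactivity).

```python
# Illustrative sketch only; names are invented, not the proposed system's API.
from collections import defaultdict


class DataView:
    """A named view over one backing store with a caller-chosen index method."""

    def __init__(self, name, backing_store, index_fn):
        self.name = name
        self.store = backing_store          # any iterable of records
        self.index_fn = index_fn            # user-selected, locally beneficial
        self.index = defaultdict(list)

    def build(self):
        for record in self.store:
            for key in self.index_fn(record):
                self.index[key].append(record)

    def query(self, key):
        return self.index.get(key, [])


class ViewRegistry:
    """Advertises views so they can be discovered, reused, and extended."""

    def __init__(self):
        self.views = {}
        self.subscribers = []

    def publish(self, view):
        self.views[view.name] = view
        for notify in self.subscribers:     # change in structure: new view exists
            notify("new_view", view.name)

    def subscribe(self, callback):
        self.subscribers.append(callback)
```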
The Dakota (Design Analysis Kit for Optimization and Terascale Applications) toolkit provides a flexible and extensible interface between simulation codes and iterative analysis methods. Dakota contains algorithms for optimization with gradient and nongradient-based methods; uncertainty quantification with sampling, reliability, and stochastic expansion methods; parameter estimation with nonlinear least squares methods; and sensitivity/variance analysis with design of experiments and parameter study methods. These capabilities may be used on their own or as components within advanced strategies such as surrogate-based optimization, mixed integer nonlinear programming, or optimization under uncertainty. By employing object-oriented design to implement abstractions of the key components required for iterative systems analyses, the Dakota toolkit provides a flexible and extensible problem-solving environment for design and performance analysis of computational models on high performance computers. This report serves as a theoretical manual for selected algorithms implemented within the Dakota software. It is not intended as a comprehensive theoretical treatment, since a number of existing texts cover general optimization theory, statistical analysis, and other introductory topics. Rather, this manual is intended to summarize a set of Dakota-related research publications in the areas of surrogate-based optimization, uncertainty quantification, and optimization under uncertainty that provide the foundation for many of Dakota's iterative analysis capabilities.
The Dakota (Design Analysis Kit for Optimization and Terascale Applications) toolkit provides a flexible and extensible interface between simulation codes and iterative analysis methods. Dakota contains algorithms for optimization with gradient and nongradient-based methods; uncertainty quantification with sampling, reliability, and stochastic expansion methods; parameter estimation with nonlinear least squares methods; and sensitivity/variance analysis with design of experiments and parameter study methods. These capabilities may be used on their own or as components within advanced strategies such as surrogate-based optimization, mixed integer nonlinear programming, or optimization under uncertainty. By employing object-oriented design to implement abstractions of the key components required for iterative systems analyses, the Dakota toolkit provides a flexible and extensible problem-solving environment for design and performance analysis of computational models on high performance computers. This report serves as a user's manual for the Dakota software and provides capability overviews and procedures for software execution, as well as a variety of example studies.
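As a generic illustration of the coupling pattern both Dakota abstracts describe, the sketch below drives a stand-in "simulation" through a thin black-box interface using an off-the-shelf iterative method. It is an assumption-laden toy and does not use Dakota's input syntax, file formats, or algorithms.

```python
# Generic sketch of an iterative method driving a black-box simulation
# through a thin interface; not Dakota's actual interface or formats.
import numpy as np
from scipy.optimize import minimize


def simulation_driver(x):
    """Stand-in for a simulation code: maps design variables to a response."""
    # Rosenbrock-like response, a common test problem for optimizers.
    return (1.0 - x[0]) ** 2 + 100.0 * (x[1] - x[0] ** 2) ** 2


# The "iterative analysis method" only sees the interface, not the simulation.
result = minimize(simulation_driver, x0=np.array([0.9, 1.1]), method="Nelder-Mead")
print("best point:", result.x, "objective:", result.fun)
```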
This report describes work performed from June 2012 through May 2014 as a part of a Sandia Early Career Laboratory Directed Research and Development (LDRD) project led by the first author. The objective of the project is to investigate methods for building stable and efficient proper orthogonal decomposition (POD)/Galerkin reduced order models (ROMs): models derived from a sequence of high-fidelity simulations but having a much lower computational cost. Since they are, by construction, small and fast, ROMs can enable real-time simulations of complex systems for on-the-spot analysis, control, and decision-making in the presence of uncertainty. Of particular interest to Sandia is the use of ROMs for the quantification of the compressible captive-carry environment, simulated for the design and qualification of nuclear weapons systems. It is an unfortunate reality that many ROM techniques are computationally intractable or lack an a priori stability guarantee for compressible flows. For this reason, this LDRD project focuses on the development of techniques for building provably stable projection-based ROMs. Model reduction approaches based on continuous as well as discrete projection are considered. In the first part of this report, an approach for building energy-stable Galerkin ROMs for linear hyperbolic or incompletely parabolic systems of partial differential equations (PDEs) using continuous projection is developed. The key idea is to apply a transformation induced by the Lyapunov function for the system, and to build the ROM in the transformed variables. It is shown that, for many PDE systems including the linearized compressible Euler and linearized compressible Navier-Stokes equations, the desired transformation is induced by a special inner product, termed the “symmetry inner product”. Attention is then turned to nonlinear conservation laws. A new transformation and corresponding energy-based inner product for the full nonlinear compressible Navier-Stokes equations is derived, and it is demonstrated that if a Galerkin ROM is constructed in this inner product, the ROM system energy will be bounded in a way that is consistent with the behavior of the exact solution to these PDEs, i.e., the ROM will be energy-stable. The viability of the linear as well as nonlinear continuous projection model reduction approaches developed as a part of this project is evaluated on several test cases, including the cavity configuration of interest in the targeted application area. In the second part of this report, some POD/Galerkin approaches for building stable ROMs using discrete projection are explored. It is shown that, for generic linear time-invariant (LTI) systems, a discrete counterpart of the continuous symmetry inner product is a weighted L2 inner product obtained by solving a Lyapunov equation. This inner product was first proposed by Rowley et al., and is termed herein the “Lyapunov inner product”. Comparisons between the symmetry inner product and the Lyapunov inner product are made, and the performance of ROMs constructed using these inner products is evaluated on several benchmark test cases. Also in the second part of this report, a new ROM stabilization approach, termed “ROM stabilization via optimization-based eigenvalue reassignment”, is developed for generic LTI systems. At the heart of this method is a constrained nonlinear least-squares optimization problem that is formulated and solved numerically to ensure accuracy of the stabilized ROM.
Numerical studies reveal that the optimization problem is computationally inexpensive to solve, and that the new stabilization approach delivers ROMs that are stable as well as accurate. Summaries of “lessons learned” and perspectives for future work motivated by this LDRD project are provided at the end of each of the two main chapters.
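A minimal numerical sketch of the “Lyapunov inner product” idea summarized above, for a generic stable LTI system with a randomly chosen reduced basis (the test matrix, basis, and sizes are illustrative assumptions, not the report's test cases):

```python
# Sketch of a discrete-projection ROM built in a Lyapunov-equation-weighted
# inner product for a stable LTI system dx/dt = A x (illustrative only).
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

n, k = 50, 5
rng = np.random.default_rng(0)

# A stable test matrix and an arbitrary orthonormal reduced basis Phi.
A = rng.standard_normal((n, n)) - 10.0 * np.eye(n)
Phi, _ = np.linalg.qr(rng.standard_normal((n, k)))

# Weight P solves the Lyapunov equation A^T P + P A = -Q with Q > 0,
# defining the weighted inner product <x, y>_P = x^T P y.
Q = np.eye(n)
P = solve_continuous_lyapunov(A.T, -Q)

# Galerkin projection of A in the P-weighted inner product.
A_rom = np.linalg.solve(Phi.T @ P @ Phi, Phi.T @ P @ A @ Phi)

# Stability check: all ROM eigenvalues should have non-positive real parts.
print("max Re(eig) of ROM:", np.max(np.linalg.eigvals(A_rom).real))
```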
In recent years, several international agencies have begun to collect data on human performance in nuclear power plant simulators [1]. These data provide a valuable opportunity to improve human reliability analysis (HRA), but these improvements will not be realized without the implementation of Bayesian methods. Bayesian methods are widely used to incorporate sparse data into models in many parts of probabilistic risk assessment (PRA), but they have not been adopted by the HRA community. In this article, we provide a Bayesian methodology to formally use simulator data to refine the human error probabilities (HEPs) assigned by existing HRA methods. We demonstrate the methodology with a case study in which we use simulator data from the Halden Reactor Project to update the probability assignments from the SPAR-H method. The case study demonstrates the ability to use performance data, even sparse data, to improve existing HRA methods. Furthermore, this paper serves as a demonstration of the value of Bayesian methods for improving the technical basis of HRA.
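As a worked illustration of the kind of Bayesian refinement described above, the sketch below performs a conjugate beta-binomial update of a nominal HEP using hypothetical simulator counts; the paper's actual prior structure and data may differ.

```python
# Beta-binomial update of a nominal HEP with simulator counts (illustrative;
# prior strength N0 and the observed counts are hypothetical).
from scipy.stats import beta

hep_prior_mean = 1e-2   # nominal HEP assigned by the existing HRA method
N0 = 50                 # assumed prior "pseudo-observations" (prior strength)
alpha0 = hep_prior_mean * N0
beta0 = (1.0 - hep_prior_mean) * N0

errors, trials = 3, 40  # hypothetical simulator data: 3 failures in 40 crew trials

alpha_post = alpha0 + errors
beta_post = beta0 + (trials - errors)

posterior_mean = alpha_post / (alpha_post + beta_post)
lo, hi = beta.ppf([0.05, 0.95], alpha_post, beta_post)
print(f"posterior mean HEP = {posterior_mean:.4f}, 90% interval = ({lo:.4f}, {hi:.4f})")
```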