Monte Carlo (MC) sampling is a common method for randomly sampling a range of scenarios. The associated error follows a predictable convergence rate of $1/\sqrt{N}$, such that quadrupling the sample size halves the error. The method is often employed in global sensitivity analysis, which computes sensitivity indices that measure the fractional contributions of uncertain model inputs to the total output variance. In this study, several models are used to observe the rate of decay of the MC error in the estimation of the conditional variances, the total output variance, and the global sensitivity indices. The purpose is to examine the convergence rate of the error in existing specialized, albeit MC-based, sampling methods for estimating the sensitivity indices. It was found that the conditional variances and sensitivity indices all follow the $1/\sqrt{N}$ convergence rate. Future work will test the convergence of observables from more complex models, such as ignition time in combustion.
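As an illustration of the convergence behavior described above, the following minimal Python sketch estimates first-order Sobol sensitivity indices with a pick-freeze Monte Carlo estimator on a toy additive model; the function name, toy model, and sample sizes are illustrative assumptions, not the specialized sampling methods examined in the study. The estimates approach the exact values [0.2, 0.8] at roughly the $1/\sqrt{N}$ rate as N grows.

```python
import numpy as np

def first_order_sobol(f, dim, n_samples, rng):
    """Pick-freeze Monte Carlo estimate of first-order Sobol indices.

    f takes an (N, dim) array of uniform(0,1) inputs and returns N outputs.
    The estimator error decays at the usual Monte Carlo rate 1/sqrt(N).
    """
    A = rng.random((n_samples, dim))
    B = rng.random((n_samples, dim))
    yA, yB = f(A), f(B)
    var = yA.var()
    S = np.empty(dim)
    for i in range(dim):
        ABi = B.copy()
        ABi[:, i] = A[:, i]              # shares only input i with sample A
        yABi = f(ABi)
        # V_i = Var(E[Y | X_i]) estimated from products of paired evaluations
        S[i] = np.mean(yA * (yABi - yB)) / var
    return S

# Toy model with known indices: Y = X1 + 2*X2  ->  S = [0.2, 0.8]
rng = np.random.default_rng(0)
model = lambda X: X[:, 0] + 2.0 * X[:, 1]
for N in (10**3, 10**4, 10**5):          # errors shrink roughly as 1/sqrt(N)
    print(N, first_order_sobol(model, 2, N, rng))
```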
Demonstrate algorithm-based resilience to silent data corruption (SDC) and hard faults in a task-based domain-decomposition preconditioner for elliptic PDEs.
Explore the scalability of a resilient task-based domain-decomposition preconditioner for elliptic PDEs. Use selective reliability to study the impact of different levels of simulated SDC and hard faults. Explore the interplay between application resilience and the server-client programming model.
We discuss algorithm-based resilience to silent data corruption (SDC) in a task-based domain-decomposition preconditioner for partial differential equations (PDEs). The algorithm exploits a reformulation of the PDE as a sampling problem, followed by a solution update through data manipulation that is resilient to SDC. The implementation is based on a server-client model where all state information is held by the servers, while clients are designed solely as computational units. Scalability tests run up to ~51K cores show a parallel efficiency greater than 90%. We use a 2D elliptic PDE and a fault model based on random single bit-flips to demonstrate the resilience of the application to synthetically injected SDC. We discuss two fault scenarios: one based on the corruption of all data of a target task, and the other involving the corruption of a single data point. We show that for our application, given the test problem considered, a four-fold increase in the number of faults yields only a two-percentage-point increase in the overhead needed to overcome their presence, from 7% to 9%. We then discuss potential savings in energy consumption via dynamic voltage/frequency scaling, and its interplay with fault rates and application overhead.
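As a hedged illustration of the random single bit-flip fault model (not the injection harness used in the study), the sketch below corrupts IEEE-754 doubles, mirroring the two scenarios discussed: flipping a bit in a single data point versus corrupting every entry of a task's payload. All names are hypothetical.

```python
import random
import struct

def flip_random_bit(value, rng=random):
    """Flip one randomly chosen bit in the IEEE-754 double representation of value."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", value))
    bits ^= 1 << rng.randrange(64)       # corrupt a single bit position
    (corrupted,) = struct.unpack("<d", struct.pack("<Q", bits))
    return corrupted

# Scenario 1: corrupt a single data point of a task's payload.
# Scenario 2: corrupt every entry of the payload.
payload = [1.0, 2.5, -3.75]
single_fault = list(payload)
single_fault[1] = flip_random_bit(single_fault[1])
all_corrupted = [flip_random_bit(v) for v in payload]
print(payload, single_fault, all_corrupted)
```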
We present a resilient domain-decomposition preconditioner for partial differential equations (PDEs). The algorithm reformulates the PDE as a sampling problem, followed by a solution update through data manipulation that is resilient to both soft and hard faults. We discuss an implementation based on a server-client model where all state information is held by the servers, while clients are designed solely as computational units. Servers are assumed to be “sandboxed”, while no assumption is made on the reliability of the clients. We explore the scalability of the algorithm up to ∼12k cores, build an SST/macro skeleton to extrapolate to ∼50k cores, and show the resilience under simulated hard and soft faults for a 2D linear Poisson equation.
We present a domain-decomposition-based preconditioner for the solution of partial differential equations (PDEs) that is resilient to both soft and hard faults. The algorithm is based on the following steps: first, the computational domain is split into overlapping subdomains; second, the target PDE is solved on each subdomain for sampled values of the current local boundary conditions; third, the subdomain solution samples are collected and fed into a regression step to build maps between the subdomains' boundary conditions; finally, the intersection of these maps yields the updated state at the subdomain boundaries. This reformulation allows us to recast the problem as a set of independent tasks. The implementation relies on an asynchronous server-client framework, where one or more reliable servers hold the data, while the clients ask for tasks and execute them. This framework provides resiliency to hard faults: if a client crashes, it stops asking for work, and the servers simply distribute the work among the remaining live clients. Erroneous subdomain solves (e.g., due to soft faults) appear as corrupted data, which is either rejected if it causes a task to fail, or seamlessly filtered out during the regression stage through a suitable noise model. Three different types of faults are modeled: hard faults modeling nodes (or clients) crashing, soft faults occurring during the communication of the tasks between server and clients, and soft faults occurring during task execution. We demonstrate the resiliency of the approach for a 2D elliptic PDE, and explore the effect of the faults at various failure rates.
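The following is a minimal, self-contained sketch of the asynchronous server-client pattern described above, under simplifying assumptions: the subdomain "solve" is a placeholder, a simulated crash makes one client stop requesting work before pulling its next task, and the surviving clients drain the server's queue. Re-issuing tasks lost in flight and the regression-based filtering of corrupted samples are not shown.

```python
import multiprocessing as mp

def client(task_q, result_q, crash_after=None):
    """Stateless client: repeatedly asks the server for a task and returns the result.
    A simulated hard fault makes the client stop requesting work; the remaining
    clients simply keep draining the server's task queue."""
    done = 0
    while True:
        if crash_after is not None and done >= crash_after:
            return                        # simulated crash: client goes silent
        task = task_q.get()
        if task is None:                  # sentinel: server has no more work
            return
        subdomain_id, boundary_sample = task
        result_q.put((subdomain_id, sum(boundary_sample)))  # placeholder "solve"
        done += 1

if __name__ == "__main__":
    n_tasks, n_clients = 20, 4
    task_q, result_q = mp.Queue(), mp.Queue()
    for i in range(n_tasks):              # server-side state: the task list
        task_q.put((i, [0.1 * i] * 8))
    for _ in range(n_clients):
        task_q.put(None)
    clients = [mp.Process(target=client,
                          args=(task_q, result_q, 3 if k == 0 else None))
               for k in range(n_clients)]
    for p in clients:
        p.start()
    results = [result_q.get() for _ in range(n_tasks)]   # server gathers samples
    for p in clients:
        p.join()
    print(len(results), "tasks completed despite one client crashing")
```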
The objective of this work is to investigate the efficacy of using calibration strategies from Uncertainty Quantification (UQ) to determine model coefficients for large-eddy simulation (LES). As the target methods are for engineering LES, uncertainty from numerical aspects of the model must also be quantified. The ultimate goal of this research thread is to generate a cost-versus-accuracy curve for LES such that the cost can be minimized given an accuracy prescribed by an engineering need. Realization of this goal would enable LES to serve as a predictive simulation tool within the engineering design process.
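As a generic illustration of a UQ calibration strategy (a random-walk Metropolis sampler for a model coefficient), the sketch below uses a hypothetical observable q, synthetic data, and a Gaussian likelihood; it is not the study's calibration procedure, which must additionally account for numerical-model uncertainty.

```python
import numpy as np

def metropolis(log_post, theta0, n_steps, step, rng):
    """Random-walk Metropolis sampler for the posterior of model coefficients."""
    theta = np.atleast_1d(np.asarray(theta0, dtype=float))
    lp = log_post(theta)
    chain = np.empty((n_steps, theta.size))
    for k in range(n_steps):
        proposal = theta + step * rng.standard_normal(theta.size)
        lp_prop = log_post(proposal)
        if np.log(rng.random()) < lp_prop - lp:    # Metropolis accept/reject
            theta, lp = proposal, lp_prop
        chain[k] = theta
    return chain

# Hypothetical setup: calibrate a single coefficient c against noisy observations
# of a quantity q(c), with a Gaussian likelihood and an implicit flat prior.
rng = np.random.default_rng(1)
q = lambda c: 2.0 * c                      # stand-in for an expensive LES observable
data = q(0.17) + 0.05 * rng.standard_normal(30)
log_post = lambda c: -0.5 * np.sum((data - q(c[0])) ** 2) / 0.05 ** 2
chain = metropolis(log_post, theta0=[0.1], n_steps=5000, step=0.02, rng=rng)
print(chain[1000:].mean(axis=0))           # posterior mean near 0.17
```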
Direct solutions of the Chemical Master Equation (CME) governing Stochastic Reaction Networks (SRNs) are generally prohibitively expensive due to the excessive number of possible discrete states in such systems. To enhance computational efficiency, we develop a hybrid approach where the evolution of states with low molecule counts is treated with the discrete CME model, while that of states with large molecule counts is modeled by the continuum Fokker-Planck equation. The Fokker-Planck equation is discretized using a second-order finite-volume approach with appropriate treatment of flux components. The numerical construction at the interface between the discrete and continuum regions implements the transfer of probability reaction by reaction according to the stoichiometry of the system. The performance of this novel hybrid approach is explored for a two-species circadian model, with computational efficiency gains of about one order of magnitude.
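The sketch below illustrates only the continuum piece under stated assumptions: a second-order finite-volume semi-discretization of a 1D Fokker-Planck equation with zero-flux boundaries, advanced with explicit Euler for an Ornstein-Uhlenbeck test case. The interface coupling with the discrete CME region is not shown, and all names are illustrative.

```python
import numpy as np

def fokker_planck_rhs(p, x, drift, diffusion):
    """Second-order finite-volume right-hand side for the 1D Fokker-Planck equation
        dp/dt = -d/dx( a(x) p ) + 1/2 d^2/dx^2( b(x) p )
    on a uniform grid with zero-flux (probability-conserving) boundaries."""
    h = x[1] - x[0]
    ap = drift(x) * p                       # advective part a(x) p
    bp = diffusion(x) * p                   # diffusive part b(x) p
    # face fluxes: centered average for advection, centered difference for diffusion
    F = 0.5 * (ap[:-1] + ap[1:]) - 0.5 * (bp[1:] - bp[:-1]) / h
    F = np.concatenate(([0.0], F, [0.0]))   # no probability leaves the domain
    return -(F[1:] - F[:-1]) / h

# Ornstein-Uhlenbeck test: a(x) = -x, b(x) = 1; explicit Euler time stepping.
x = np.linspace(-5.0, 5.0, 201)
h = x[1] - x[0]
p = np.exp(-((x - 2.0) ** 2))               # initial condition (unnormalized)
p /= p.sum() * h
dt = 0.5 * h ** 2                           # conservative explicit step size
for _ in range(2000):
    p = p + dt * fokker_planck_rhs(p, x, lambda s: -s, lambda s: np.ones_like(s))
print(p.sum() * h)                           # total probability stays ~1
```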
In this paper, a series of algorithms is proposed to address the problems in the NASA Langley Research Center Multidisciplinary Uncertainty Quantification Challenge. A Bayesian approach is employed to characterize and calibrate the epistemic parameters based on the available data, whereas a variance-based global sensitivity analysis is used to rank the epistemic and aleatory model parameters. A nested sampling of the aleatory-epistemic space is proposed to propagate uncertainties from model parameters to output quantities of interest.
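A minimal sketch of the nested (double-loop) aleatory-epistemic propagation idea is given below; the toy model, sample counts, and uniform aleatory inputs are assumptions for illustration, not the challenge-problem setup.

```python
import numpy as np

def nested_propagation(model, epistemic_samples, n_aleatory, dim_aleatory, rng):
    """Nested (double-loop) sampling: outer loop over epistemic parameter samples,
    inner Monte Carlo loop over aleatory variables. Returns per-epistemic-sample
    statistics of the quantity of interest; their spread reflects epistemic uncertainty."""
    stats = []
    for theta in epistemic_samples:                  # e.g. draws from a Bayesian posterior
        xi = rng.random((n_aleatory, dim_aleatory))  # aleatory inputs, here uniform(0,1)
        q = np.array([model(theta, x) for x in xi])
        stats.append((q.mean(), q.var()))
    return np.array(stats)

# Hypothetical toy model: one epistemic parameter theta, two aleatory inputs x.
rng = np.random.default_rng(0)
model = lambda theta, x: theta * x[0] + x[1] ** 2
stats = nested_propagation(model, epistemic_samples=np.linspace(0.5, 1.5, 5),
                           n_aleatory=2000, dim_aleatory=2, rng=rng)
print(stats)       # rows: (mean, variance) of the QoI for each epistemic value
```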
The move towards extreme-scale computing platforms challenges scientific simulations in many ways. Given recent trends in computer architecture development, one needs to reformulate legacy codes in order to cope with large communication volumes, system faults, and low per-core memory budgets. In this work, we develop a novel framework for solving PDEs via domain decomposition that reformulates the solution as a state of knowledge with a probabilistic interpretation. Such a reformulation provides resiliency with respect to potential faults without requiring fault detection, avoids unnecessary communication, and is generally well-suited for rigorous uncertainty quantification studies that target improvements in the predictive fidelity of scientific models. We demonstrate our algorithm on one-dimensional PDE examples where artificial faults are implemented as bit flips in the binary representation of the subdomain solutions.
Stochastic unit commitment models typically handle uncertainties in forecast demand by considering a finite number of realizations from a stochastic process model for loads. Accurate evaluations of expectations or higher moments for the quantities of interest require a prohibitively large number of model evaluations. In this paper we propose an alternative approach based on using surrogate models valid over the range of the forecast uncertainty. We consider surrogate models based on Polynomial Chaos expansions, constructed using sparse quadrature methods. Considering expected generation cost, we demonstrate that the approach can lead to several orders of magnitude reduction in computational cost relative to using Monte Carlo sampling on the original model, for a given target error threshold.
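As a simplified, one-dimensional illustration of building a Polynomial Chaos surrogate by non-intrusive quadrature (full Gauss-Hermite quadrature here, rather than the sparse quadrature used in the paper), the sketch below projects a toy "generation cost" depending on a single Gaussian forecast error onto Hermite polynomials; the names and the toy cost function are assumptions.

```python
import math
import numpy as np
from numpy.polynomial.hermite import hermgauss      # Gauss-Hermite quadrature rule
from numpy.polynomial.hermite_e import hermeval     # probabilists' Hermite He_k

def pce_coefficients(f, order, n_quad):
    """Project f(xi), xi ~ N(0,1), onto Hermite polynomials via quadrature:
    c_k = E[f(xi) He_k(xi)] / k!  (non-intrusive spectral projection)."""
    nodes, weights = hermgauss(n_quad)
    xi = math.sqrt(2.0) * nodes                      # map nodes to a standard normal
    fx = f(xi)
    coeffs = np.empty(order + 1)
    for k in range(order + 1):
        basis = np.zeros(k + 1)
        basis[k] = 1.0
        Hek = hermeval(xi, basis)
        coeffs[k] = np.sum(weights * fx * Hek) / math.sqrt(math.pi) / math.factorial(k)
    return coeffs

# Toy "generation cost" driven by one Gaussian load-forecast error xi.
cost = lambda xi: np.exp(0.3 * xi)                   # analytic mean: exp(0.045)
c = pce_coefficients(cost, order=6, n_quad=12)
pce_mean = c[0]
pce_var = sum(c[k] ** 2 * math.factorial(k) for k in range(1, 7))
print(pce_mean, math.exp(0.045), pce_var)            # surrogate vs exact mean
```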