For decades, Arctic temperatures have increased twice as fast as average global temperatures. As a first step toward quantifying parametric uncertainty in Arctic climate, we performed a variance-based global sensitivity analysis (GSA) using a fully coupled, ultra-low resolution (ULR) configuration of version 1 of the U.S. Department of Energy's Energy Exascale Earth System Model (E3SMv1). Specifically, we quantified the sensitivity of six quantities of interest (QOIs), which characterize changes in Arctic climate over a 75-year period, to uncertainties in nine model parameters spanning the sea ice, atmosphere, and ocean components of E3SMv1. Sensitivity indices for each QOI were computed with a Gaussian process emulator using 139 random realizations of the random parameters and fixed preindustrial forcing. Uncertainties in the atmospheric parameters in the Cloud Layers Unified by Binormals (CLUBB) scheme were found to have the most impact on sea ice status and the larger Arctic climate. Our results demonstrate the importance of conducting sensitivity analyses with fully coupled climate models. The low computational cost of the ULR configuration makes such studies feasible today. When advances in computational power and modeling algorithms enable the tractable use of higher-resolution models, our results will provide a baseline that can quantify the impact of model resolution on the accuracy of sensitivity indices. Moreover, the confidence intervals provided by our study, which we used to quantify the impact of the number of model evaluations on the accuracy of sensitivity estimates, have the potential to inform the computational resources needed for future sensitivity studies.
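As a minimal illustration of the variance-based sensitivity indices described in the abstract above, the sketch below estimates first-order Sobol' indices with a pick-freeze (Saltelli-type) estimator. It samples a toy function directly rather than a Gaussian process emulator of E3SMv1. The function `toy_model`, the uniform input distributions, and the sample size are assumptions made only for illustration.

```python
import random


def toy_model(x):
    # Hypothetical stand-in for an expensive model (not E3SMv1).
    # For independent U(0, 1) inputs the exact first-order indices are 0.2 and 0.8.
    return x[0] + 2.0 * x[1]


def first_order_sobol(model, dim, n_samples, seed=0):
    """Pick-freeze estimate of first-order Sobol' indices for U(0, 1) inputs."""
    rng = random.Random(seed)
    # Two independent sample sets A and B.
    a = [[rng.random() for _ in range(dim)] for _ in range(n_samples)]
    b = [[rng.random() for _ in range(dim)] for _ in range(n_samples)]
    f_a = [model(row) for row in a]
    f_b = [model(row) for row in b]

    # Total output variance estimated from the pooled evaluations.
    pooled = f_a + f_b
    mean = sum(pooled) / len(pooled)
    variance = sum((y - mean) ** 2 for y in pooled) / (len(pooled) - 1)

    indices = []
    for i in range(dim):
        acc = 0.0
        for j in range(n_samples):
            # Row of A with its i-th entry replaced by the i-th entry of B.
            mixed = list(a[j])
            mixed[i] = b[j][i]
            acc += f_b[j] * (model(mixed) - f_a[j])
        indices.append(acc / n_samples / variance)
    return indices


sobol_indices = first_order_sobol(toy_model, dim=2, n_samples=50000)
print(sobol_indices)
```

In a study with an expensive model, `model` would typically be a cheap emulator trained on a limited number of simulations, so that the many evaluations required by the estimator remain affordable.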
We present an adaptive algorithm for constructing surrogate models of multi-disciplinary systems composed of a set of coupled components. To this end, we introduce “coupling” variables with a priori unknown distributions that allow surrogates of each component to be built independently. Once built, the surrogates of the components are combined to form an integrated-surrogate that can be used to predict system-level quantities of interest at a fraction of the cost of the original model. The error in the integrated-surrogate is greedily minimized using an experimental design procedure that allocates the training data used to construct each component-surrogate according to the contribution of that surrogate to the error of the integrated-surrogate. The multi-fidelity procedure presented is a generalization of multi-index stochastic collocation that can leverage ensembles of models of varying cost and accuracy, for one or more components, to reduce the computational cost of constructing the integrated-surrogate. Extensive numerical results demonstrate that, for a fixed computational budget, our algorithm is able to produce surrogates that are orders of magnitude more accurate than methods that treat the integrated system as a black-box.
We present a surrogate modeling framework for conservatively estimating measures of risk from limited realizations of an expensive physical experiment or computational simulation. Risk measures combine objective probabilities with the subjective values of a decision maker to quantify anticipated outcomes. Given a set of samples, we construct a surrogate model that produces estimates of risk measures that are always greater than their empirical approximations obtained from the training data. These surrogate models limit over-confidence in reliability and safety assessments and produce estimates of risk measures that converge much faster to the true value than purely sample-based estimates. We first detail the construction of conservative surrogate models that can be tailored to a stakeholder's risk preferences and then present an approach, based on stochastic orders, for constructing surrogate models that are conservative with respect to families of risk measures. Our surrogate models include biases that permit them to conservatively estimate the target risk measures. We provide theoretical results that show that these biases decay at the same rate as the L2 error in the surrogate model. Numerical demonstrations confirm that risk-adapted surrogate models do indeed overestimate the target risk measures while converging at the expected rate.
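The notion of a conservative surrogate in the abstract above can be illustrated with a deliberately simplified sketch. It computes the empirical conditional value at risk (CVaR, also called the superquantile) with the Rockafellar-Uryasev formula and then shifts a least-squares line by a constant so that its CVaR is no smaller than that of the training data. The constant shift, the toy data, and the level `alpha = 0.8` are assumptions for illustration; this is not the construction used in the paper.

```python
import random


def cvar(samples, alpha):
    """Empirical CVaR via min_t { t + mean(max(x - t, 0)) / (1 - alpha) }."""
    n = len(samples)
    # The objective is piecewise linear and convex in t, so the minimum
    # is attained at one of the sample values.
    return min(
        t + sum(max(x - t, 0.0) for x in samples) / ((1.0 - alpha) * n)
        for t in samples
    )


def fit_line(xs, ys):
    """Ordinary least-squares fit y ~ slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(50)]
ys = [x ** 2 + 0.5 * x for x in xs]  # hypothetical "expensive" model output

slope, intercept = fit_line(xs, ys)
surrogate = [slope * x + intercept for x in xs]

alpha = 0.8
# CVaR is translation equivariant: CVaR(X + c) = CVaR(X) + c, so a constant
# shift equal to the (positive part of the) gap closes the difference.
shift = max(cvar(ys, alpha) - cvar(surrogate, alpha), 0.0)
conservative = [s + shift for s in surrogate]

print(cvar(ys, alpha), cvar(conservative, alpha))
```

The shift acts as an explicit bias added to the surrogate, which is the simplest way to see why a conservative surrogate trades some accuracy for a guarantee on the estimated risk.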
This paper describes an efficient reverse-mode differentiation algorithm for contraction operations for arbitrary and unconventional tensor network topologies. The approach leverages the tensor contraction tree of Evenbly and Pfeifer (2014), which provides an instruction set for the contraction sequence of a network. We show that this tree can be efficiently leveraged for differentiation of a full tensor network contraction using a recursive scheme that exploits (1) the bilinear property of contraction and (2) the property that trees have a single path from root to leaves. While differentiation of tensor-tensor contraction is already possible in most automatic differentiation packages, we show that exploiting these two additional properties in the specific context of contraction sequences can improve efficiency. Following a description of the algorithm and computational complexity analysis, we investigate its utility for gradient-based supervised learning for low-rank function recovery and for fitting real-world unstructured datasets. We demonstrate improved performance over alternating least-squares optimization approaches and the capability to handle heterogeneous and arbitrary tensor network formats. When compared to alternating minimization algorithms, we find that the gradient-based approach requires a smaller oversampling ratio (number of samples compared to number of model parameters) for recovery. This increased efficiency extends to fitting unstructured data of varying dimensionality and when employing a variety of tensor network formats. Here, we show improved learning using the hierarchical Tucker method over the tensor-train in high-dimensional settings on a number of benchmark problems.
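The role of linearity in differentiating a contraction, mentioned in the abstract above, can be seen on a tiny chain network. In the sketch below the scalar s = u^T A B w is linear in each tensor, so the gradient with respect to one tensor is the contraction of everything else in the network. The chain topology, the helper names, and the integer test data are assumptions for illustration; the tree-based recursive algorithm of the paper is not reproduced here.

```python
def matvec(m, v):
    """Contract a matrix with a vector on the matrix's second index."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def vecmat(v, m):
    """Contract a vector with a matrix on the matrix's first index."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def contract(u, A, B, w):
    """Full contraction of the chain network: s = u^T A B w."""
    return dot(u, matvec(A, matvec(B, w)))


def grad_A(u, B, w):
    """ds/dA[i][j] = u[i] * (B w)[j]: the network with A removed."""
    right = matvec(B, w)
    return [[ui * rj for rj in right] for ui in u]


def grad_B(u, A, w):
    """ds/dB[j][k] = (u^T A)[j] * w[k]: the network with B removed."""
    left = vecmat(u, A)
    return [[lj * wk for wk in w] for lj in left]


u = [1, 2]
A = [[1, 2, 3], [4, 5, 6]]
B = [[1, 0], [2, 1], [0, 3]]
w = [1, -1]

s = contract(u, A, B, w)
gA = grad_A(u, B, w)
gB = grad_B(u, A, w)
print(s, gA, gB)
```

Because the contraction is linear in `A`, perturbing a single entry of `A` by one changes `s` by exactly the corresponding entry of `gA`, which gives a simple exact check of the gradient.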
Multi-model Monte Carlo methods have been shown to be an efficient and accurate alternative to standard Monte Carlo (MC) in the model-based propagation of uncertainty in entry, descent, and landing (EDL) applications. These multi-model MC methods fuse predictions from low-fidelity models with the high-fidelity EDL model of interest to produce unbiased statistics at a fraction of the computational cost. The accuracy and efficiency of the multi-model MC methods depend not only on the magnitude of the correlations between the low-fidelity models and the high-fidelity model, but also on the correlations among the low-fidelity models and on their relative computational costs. Because of this layer of complexity, the question of how to optimally select the set of low-fidelity models has remained open. In this work, methods for optimal model construction and tuning are investigated as a means to increase the speed and precision of trajectory simulation for EDL. Specifically, the focus is on the inclusion of low-fidelity model tuning within the sample allocation optimization that accompanies multi-model MC methods. Results indicate that low-fidelity model tuning can significantly improve the efficiency and precision of trajectory simulations and provide an increased edge to multi-model MC methods when compared to standard MC.
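The way a correlated low-fidelity model reduces estimator variance, as described in the abstract above, can be sketched with a two-model control-variate estimator. The models `f_hi` and `f_lo`, the sample sizes, and the pilot-based weight are hypothetical choices for illustration; they are not the EDL models or the sample allocation optimization studied in the paper.

```python
import math
import random


def f_hi(x):
    return math.exp(x)  # hypothetical "high-fidelity" model


def f_lo(x):
    return 1.0 + x + 0.5 * x * x  # hypothetical cheap approximation (Taylor series)


def mean(values):
    return sum(values) / len(values)


def variance(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def control_variate_weight(rng, n_pilot=50):
    """Estimate eta = cov(hi, lo) / var(lo) from an independent pilot sample."""
    xs = [rng.random() for _ in range(n_pilot)]
    hi = [f_hi(x) for x in xs]
    lo = [f_lo(x) for x in xs]
    m_hi, m_lo = mean(hi), mean(lo)
    cov = sum((h - m_hi) * (l - m_lo) for h, l in zip(hi, lo))
    var = sum((l - m_lo) ** 2 for l in lo)
    return cov / var


def estimate(rng, eta, n_hi=20, n_lo=400):
    """Return (standard MC, control-variate) estimates of E[f_hi(X)], X ~ U(0, 1)."""
    xs = [rng.random() for _ in range(n_lo)]
    hi_mean = mean([f_hi(x) for x in xs[:n_hi]])
    lo_small = mean([f_lo(x) for x in xs[:n_hi]])  # same samples as f_hi
    lo_large = mean([f_lo(x) for x in xs])  # many cheap evaluations
    return hi_mean, hi_mean - eta * (lo_small - lo_large)


rng = random.Random(1)
eta = control_variate_weight(rng)
trials = [estimate(rng, eta) for _ in range(200)]
mc_estimates = [t[0] for t in trials]
cv_estimates = [t[1] for t in trials]
var_mc = variance(mc_estimates)
var_cv = variance(cv_estimates)
print(var_mc, var_cv)
```

Both estimators use the same 20 high-fidelity evaluations per trial; the variance reduction comes entirely from the additional low-fidelity evaluations and depends on how strongly the two models are correlated.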
Wang, Qian; Guillaume, Joseph; Jakeman, John D.; Yang, Tao; Iwanaga, Takuya; Croke, Barry; Jakeman, Tony
Despite widespread use of factor fixing in environmental modeling, its effect on model predictions has received little attention and is instead commonly presumed to be negligible. We propose a proof-of-concept adaptive method for systematically investigating the impact of factor fixing. The method uses Global Sensitivity Analysis methods to identify groups of sensitive parameters, then quantifies which groups can be safely fixed at nominal values without exceeding a maximum acceptable error. We demonstrate the method using the 21-dimensional Sobol’ G-function. Three error measures are considered for quantities of interest, namely Relative Mean Absolute Error, Pearson Product-Moment Correlation and Relative Variance. Results demonstrate that factor fixing may unexpectedly cause large errors in the model results, even when preliminary analysis suggests otherwise, and that the default value selected affects the number of factors to fix. To improve the applicability and methodological development of factor fixing, a new research agenda encompassing five opportunities is discussed for further attention.
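The effect of factor fixing described in the abstract above can be sketched on a small Sobol' G-function. The code below computes the analytic first-order indices and a Monte Carlo estimate of the error caused by fixing a group of factors at a nominal value. The six coefficients in `a`, the nominal values, and the particular definition of relative mean absolute error (sum of absolute errors divided by sum of absolute reference outputs) are assumptions for illustration, not the 21-dimensional setup or the exact error measures of the paper.

```python
import random


def g_function(x, a):
    """Sobol' G-function: product of (|4 x_i - 2| + a_i) / (1 + a_i)."""
    out = 1.0
    for xi, ai in zip(x, a):
        out *= (abs(4.0 * xi - 2.0) + ai) / (1.0 + ai)
    return out


def first_order_indices(a):
    """Analytic first-order Sobol' indices for independent U(0, 1) inputs."""
    partial = [1.0 / (3.0 * (1.0 + ai) ** 2) for ai in a]
    total = 1.0
    for v in partial:
        total *= 1.0 + v
    total -= 1.0  # total variance of the G-function
    return [v / total for v in partial]


def relative_mae(a, fixed, nominal, n_samples=2000, seed=0):
    """Error from fixing the factors in `fixed` at `nominal` (assumed definition)."""
    rng = random.Random(seed)
    abs_err = 0.0
    abs_ref = 0.0
    for _ in range(n_samples):
        x = [rng.random() for _ in a]
        x_fixed = [nominal if i in fixed else xi for i, xi in enumerate(x)]
        ref = g_function(x, a)
        abs_err += abs(g_function(x_fixed, a) - ref)
        abs_ref += abs(ref)
    return abs_err / abs_ref


a = [0.0, 1.0, 4.5, 9.0, 99.0, 99.0]  # hypothetical coefficients; small a_i = sensitive
indices = first_order_indices(a)

err_insensitive = relative_mae(a, fixed={4, 5}, nominal=0.25)
err_sensitive = relative_mae(a, fixed={0, 1}, nominal=0.25)
err_other_nominal = relative_mae(a, fixed={4, 5}, nominal=0.5)
print(indices, err_insensitive, err_sensitive, err_other_nominal)
```

Fixing the two least sensitive factors introduces a much smaller error than fixing the two most sensitive ones, and the error for the same group changes with the nominal value, since each G-function factor equals its mean at 0.25 but not at 0.5.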
Constructing accurate statistical models of critical system responses typically requires an enormous amount of data from physical experiments or numerical simulations. Unfortunately, data generation is often expensive and time consuming. To streamline the data generation process, optimal experimental design determines the 'best' allocation of experiments with respect to a criterion that measures the ability to estimate some important aspect of an assumed statistical model. While optimal design has a vast literature, few researchers have developed design paradigms targeting tail statistics, such as quantiles. In this project, we tailored and extended traditional design paradigms to target distribution tails. Our approach included (i) the development of new optimality criteria to shape the distribution of prediction variances, (ii) the development of novel risk-adapted surrogate models that provably overestimate certain statistics including the probability of exceeding a threshold, and (iii) the asymptotic analysis of regression approaches that target tail statistics such as superquantile regression. To accompany our theoretical contributions, we released implementations of our methods for surrogate modeling and design of experiments in two complementary open source software packages, the ROL/OED Toolkit and PyApprox.