The Ground Truth program used simulations as test beds for social science research methods. The simulations had known ground truth and could produce large amounts of data. This allowed research teams to run experiments on these simulations and ask questions of them, much as social scientists study real-world systems, and it enabled robust evaluation of the teams' causal inference, prediction, and prescription capabilities. We tested three hypotheses about research effectiveness using data from the Ground Truth program, examining the influence of system complexity, causal understanding, and data collection on performance. We found some evidence that system complexity and causal understanding influenced research performance, but no evidence that data availability contributed. The Ground Truth program may be the first robust coupling of simulation test beds with an experimental framework capable of teasing out the factors that determine the success of social science research.

Measures of simulation model complexity generally focus on outputs; we propose measuring the complexity of a model’s causal structure to gain insight into its fundamental character. This article introduces tools for measuring causal complexity. First, we introduce a method for developing a model’s causal structure diagram, which characterises the causal interactions present in the code. Causal structure diagrams facilitate comparison of simulation models, including those from different paradigms. Next, we develop metrics for evaluating a model’s causal complexity using its causal structure diagram. We discuss cyclomatic complexity as a measure of the intricacy of causal structure and introduce two new metrics that incorporate the concept of feedback, a fundamental component of causal structure. The first new metric introduced here is feedback density, a measure of the cycle-based interconnectedness of causal structure. The second metric combines cyclomatic complexity and feedback density into a comprehensive causal complexity measure. Finally, we demonstrate these complexity metrics on simulation models from multiple paradigms and discuss potential uses and interpretations. These tools enable direct comparison of models across paradigms and provide a mechanism for measuring and discussing complexity based on a model’s fundamental assumptions and design.
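A minimal Python sketch of the two graph quantities follows. It treats a causal structure diagram as a directed graph given by node and edge lists. Cyclomatic complexity uses McCabe's formula M = E - N + 2P (edges, nodes, weakly connected components); whether the article applies exactly this form is an assumption. The abstract does not define feedback density, so `feedback_density` here is a hypothetical stand-in: the fraction of edges that lie on at least one directed cycle. The SIR-style diagram is likewise illustrative.

```python
from collections import defaultdict

def weakly_connected_components(nodes, edges):
    # Union-find over the undirected version of the graph
    parent = {n: n for n in nodes}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
    return len({find(n) for n in nodes})

def cyclomatic_complexity(nodes, edges):
    # McCabe's formula: M = E - N + 2P
    return len(edges) - len(nodes) + 2 * weakly_connected_components(nodes, edges)

def reachable(start, adj):
    # Depth-first search; the start node counts as reachable from itself
    seen = {start}
    stack = [start]
    while stack:
        x = stack.pop()
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return seen

def feedback_density(nodes, edges):
    # HYPOTHETICAL definition: share of edges lying on a directed cycle.
    # Edge u -> v is on a cycle iff u is reachable from v.
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
    reach = {n: reachable(n, adj) for n in nodes}
    on_cycle = sum(1 for u, v in edges if u in reach[v])
    return on_cycle / len(edges) if edges else 0.0

# Illustrative SIR-style causal structure diagram
sir_nodes = ["S", "I", "R", "infection_rate", "recovery_rate"]
sir_edges = [
    ("S", "infection_rate"), ("I", "infection_rate"),
    ("infection_rate", "S"), ("infection_rate", "I"),
    ("I", "recovery_rate"), ("recovery_rate", "I"),
    ("recovery_rate", "R"),
]
```

For this diagram M = 7 - 5 + 2 = 4, and six of the seven edges lie on a feedback loop; only the edge into R does not.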

The causal structure of a simulation is a major determinant of both its character and behavior, yet most methods we use to compare simulations focus only on simulation outputs. We introduce a method that combines graphical representation with information-theoretic metrics to quantitatively compare the causal structures of models. The method applies to agent-based simulations as well as system dynamics models and facilitates comparison within and between types. Comparing models based on their causal structures can illuminate differences in the assumptions the models make, allowing modelers to (1) better situate their models in the context of existing work, including highlighting novelty, (2) explicitly compare conceptual theory and assumptions to simulated theory and assumptions, and (3) investigate potential causal drivers of divergent behavior between models. We demonstrate the method by comparing two epidemiology models at different levels of aggregation.

Social systems are uniquely complex and difficult to study, but understanding them is vital to solving the world’s problems. The Ground Truth program developed a new way of testing the research methods that attempt to understand and leverage the Human Domain and its associated complexities. The program developed simulations of social systems as virtual world test beds. Not only could these simulations produce data on future states of the system under various circumstances and scenarios, but their causal ground truth was also explicitly known. Research teams studied these virtual worlds, which enabled deep validation of causal inference, prediction, and prescription methods. The Ground Truth program's model provides a way to test and validate research methods to an extent previously impossible, and to study the intricacies and interactions of the different components of research.

This paper describes and analyzes the Discrete Direct (DD) model calibration and uncertainty propagation approach for computational models calibrated to data from sparse replicate tests of stochastically varying phenomena. The DD approach consists of generating and propagating discrete realizations of possible calibration parameter values corresponding to possible realizations of the uncertain inputs and outputs of the experiments. This is in contrast to model calibration methods that attempt to assign or infer continuous probability density functions for the calibration parameters. The DD approach straightforwardly accommodates aleatory variabilities and epistemic uncertainties (interval and/or probabilistically represented) in system properties and behaviors, in input initial and boundary conditions, and in measurement uncertainties of experimental inputs and outputs. In particular, the approach has several advantages over Bayesian and other calibration techniques in capturing and utilizing the information obtained from the typically small number of replicate experiments in model calibration situations, especially when sparse realizations of random function data like force-displacement curves from replicate material tests are used for calibration. The DD approach better preserves the fundamental information from the experimental data in a way that enables model predictions to be more directly tied to the supporting experimental data. The DD methodology is also simpler and typically less expensive than other established calibration-UQ approaches, is straightforward to implement, and is plausibly more reliably conservative and accurate for sparse-data calibration-UQ problems. The methodology is explained and analyzed in this paper under several regimes of model calibration and uncertainty propagation circumstances.
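The core idea of generating and propagating discrete calibration realizations can be illustrated with a deliberately simple sketch: each replicate test curve is calibrated on its own, and the resulting discrete set of parameter values is pushed through the model, with no probability density fitted to the parameters. The linear model `y = k*x`, the data, and the helper names are hypothetical, and the sketch omits the handling of input and measurement uncertainties and the sparse-sample UQ that would be applied to the resulting samples.

```python
def calibrate_slope(xs, ys):
    # Least-squares slope for the hypothetical model y = k * x (no intercept)
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def predict(k, x_new):
    # Hypothetical prediction model using the calibrated parameter
    return k * x_new

# Hypothetical sparse replicate tests: one force curve per specimen
displacement = [0.5, 1.0, 1.5, 2.0]
replicates = [
    [1.0, 2.0, 3.0, 4.0],
    [1.1, 2.2, 3.3, 4.4],
    [0.9, 1.8, 2.7, 3.6],
]

# One discrete calibration-parameter realization per replicate curve
k_realizations = [calibrate_slope(displacement, f) for f in replicates]

# Propagate each discrete realization directly to a prediction
predictions = [predict(k, 3.0) for k in k_realizations]
```

Each prediction stays traceable to the specific replicate that produced it, which is the property the abstract emphasizes over continuous-density calibration.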

A discrete direct (DD) model calibration and uncertainty propagation approach is explained and demonstrated on a 4-parameter Johnson-Cook (J-C) strain-rate dependent material strength model for an aluminum alloy. The methodology’s performance is characterized in many trials involving four random realizations of strain-rate dependent material-test data curves per trial, drawn from a large synthetic population. The J-C model is calibrated to particular combinations of the data curves to obtain calibration parameter sets, which are then propagated to “Can Crush” structural model predictions to produce samples of predicted response variability. These are processed with appropriate sparse-sample uncertainty quantification (UQ) methods to estimate various statistics of response with an appropriate level of conservatism. This is tested on 16 output quantities (von Mises stresses and equivalent plastic strains), and it is shown that important statistics of the true variabilities of the 16 quantities are bounded with a high success rate that is reasonably predictable and controllable. The DD approach has several advantages over other calibration-UQ approaches, such as Bayesian inference, for capturing and utilizing the information obtained from the typically small numbers of replicate experiments in model calibration situations, especially when sparse replicate functional data such as force–displacement curves from material tests are involved. The DD methodology is straightforward and efficient for calibration and propagation problems involving aleatory and epistemic uncertainties in calibration experiments, models, and procedures.
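For reference, the sketch below writes out the standard Johnson-Cook flow-stress form with the thermal-softening factor dropped, on the assumption that the four calibration parameters are A, B, n, and C; the abstract does not list them. The function name and parameter values are hypothetical and are not aluminum-alloy data.

```python
import math

def johnson_cook_stress(eps_p, eps_rate, A, B, n, C, eps_rate_ref=1.0):
    """Johnson-Cook flow stress without the thermal-softening term.

    eps_p: equivalent plastic strain (>= 0)
    eps_rate: plastic strain rate; eps_rate_ref: reference rate (same units)
    """
    strain_term = A + B * eps_p ** n                         # strain hardening
    rate_term = 1.0 + C * math.log(eps_rate / eps_rate_ref)  # rate sensitivity
    return strain_term * rate_term

# Hypothetical parameter set, for illustration only
params = dict(A=300.0, B=200.0, n=0.5, C=0.01)

# At eps_p = 0.04 and a rate ratio of e: (300 + 200*0.2) * (1 + 0.01) ≈ 343.4
sigma = johnson_cook_stress(0.04, math.e, **params)
```

In a DD-style workflow, each combination of test curves would be fitted to yield one (A, B, n, C) set, and each set would be propagated separately.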

When analyzing and predicting stochastic variability in a population of devices or systems, it is important to segregate epistemic lack-of-knowledge uncertainties from aleatory uncertainties due to stochastic variation in the population. This traditionally requires dual-loop Monte Carlo (MC) uncertainty propagation, in which the outer loop samples the epistemic uncertainties and, for each realization, an inner loop samples and propagates the aleatory uncertainties. This yields various realizations of what the aleatory distribution of population response variability might be. Under certain conditions, the various possible realizations can be represented concisely by approximate upper and lower bounding distributions of the same shape, composing a “Level 1” approximate probability box (L1 APbox). These are usually sufficient for model validation purposes, for example, and can be formed with substantially reduced computational cost and complication in propagating the aleatory and epistemic uncertainties (compared to dual-loop MC). Propagation cost can be further reduced by constructing and sampling response surface models that approximate the variation of physics-model output responses over the uncertainty parameter space. A simple dimension- and order-adaptive polynomial response surface approach is demonstrated for propagating the aleatory and epistemic uncertainties in an L1 APbox and for estimating the error contributed by using the surrogate model. Sensitivity analysis is also performed to quantify which uncertainty sources contribute most to the total aleatory-epistemic uncertainty in predicted response. The methodology is demonstrated as part of a model validation assessment involving thermal-chemical-mechanical response and weld breach failure of sealed canisters weakened by high temperatures and pressurized by heat-induced pyrolysis of foam.
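The traditional dual-loop structure can be sketched as follows, assuming a cheap hypothetical `model(a, e)` with one normally distributed aleatory input `a` and one interval-valued epistemic input `e`. The outer loop walks a grid over the epistemic interval, the inner loop propagates aleatory samples, and the envelope of the resulting 95th percentiles gives a crude bound. This is not the paper's L1 APbox construction, which bounds whole distributions of a common shape.

```python
import random
import statistics

def model(a, e):
    # Hypothetical stand-in for an expensive physics model
    return e * a + a ** 2

def dual_loop_p95(e_lo=0.8, e_hi=1.2, n_outer=11, n_inner=2000, seed=1):
    rng = random.Random(seed)
    # Aleatory samples drawn once and reused for every outer realization
    # (common random numbers), so outer-loop differences reflect only e
    a_samples = [rng.gauss(1.0, 0.1) for _ in range(n_inner)]
    p95 = []
    for i in range(n_outer):
        # Outer loop: grid over the epistemic interval
        e = e_lo + (e_hi - e_lo) * i / (n_outer - 1)
        # Inner loop: propagate aleatory variability for this fixed e
        ys = [model(a, e) for a in a_samples]
        p95.append(statistics.quantiles(ys, n=20)[18])  # 95th percentile
    return p95

p95 = dual_loop_p95()
lower, upper = min(p95), max(p95)  # envelope over epistemic realizations
```

The cost is n_outer × n_inner model runs, which is what motivates the cheaper L1 APbox and response-surface approaches described above.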

This paper presents a practical methodology for propagating and combining the effects of random variations of several continuous scalar quantities and several random-function quantities affecting the failure pressure of a heated pressurized vessel. The random functions are associated with test-to-test variability of stress-strain curves in replicate material strength tests (uniaxial tension tests) on nominally identical material specimens. It is demonstrated how to effectively propagate the curve-to-curve discrete variations and appropriately account for the small sample size of functional data realizations. This is coordinated with the propagation of aleatory variability described by uncertainty distributions for continuous scalar quantities of pressure-vessel wall thickness, weld depth, and thermal-contact factor. Motivated by the high expense of the pressure vessel simulations of heating, pressurization, and failure, a simple dimension- and order-adaptive polynomial response surface approach is used to propagate the effects of the random variables and to enable uncertainty estimates on the error contributed by using the surrogate model. Linear convolution is used to aggregate the resultant aleatory uncertainty from the parametrically propagated random variables with an appropriately conservative probability distribution of aleatory effects from propagating the multiple stress-strain curves for each material. The response surface constructions, Monte Carlo sampling of them for uncertainty propagation, and linear sensitivity analysis and convolution procedures are demonstrated with standard Excel spreadsheet functions (no special software needed).
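If the two aleatory contributions are independent and approximately normal (an assumption made here for illustration, not a detail from the abstract), convolving their distributions reduces to adding means and variances. The sketch computes that closed form and checks it by Monte Carlo sampling of the sum; the function name and numbers are hypothetical and not from the pressure-vessel study.

```python
import math
import random
import statistics

def convolve_normals(mu1, sd1, mu2, sd2):
    # Distribution of the sum of two independent normals:
    # means add and variances add
    return mu1 + mu2, math.sqrt(sd1 ** 2 + sd2 ** 2)

# Hypothetical contributions: parametric variables (mean 100, sd 3)
# and a zero-mean stress-strain-curve effect (sd 4)
mu, sd = convolve_normals(100.0, 3.0, 0.0, 4.0)  # → (100.0, 5.0)

# Monte Carlo check of the same convolution
rng = random.Random(42)
samples = [rng.gauss(100.0, 3.0) + rng.gauss(0.0, 4.0) for _ in range(20_000)]
```

The closed form needs only a sum and a square root, which is consistent with the abstract's point that such procedures fit in ordinary spreadsheet functions.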

In the past few decades, advancements in computing hardware and physical modeling capability have allowed computer models, such as computational fluid dynamics models, to accelerate the development cycle of aerospace products. In general, model behavior is well understood in the heart of the flight envelope, such as the cruise condition for a conventional commercial aircraft. Models have been well validated at these conditions, so running a single deterministic solution to assess aircraft performance is sufficient for engineering purposes. However, the aerospace industry is beginning to apply models to configurations at the edge of the flight envelope. In this regime, uncertainty in the model due to its mathematical form, numerical behavior, or model parameters may become important. Uncertainty quantification is the process of characterizing all major sources of uncertainty in the model and quantifying their effect on analysis outcomes. The goal of this paper is to survey modern uncertainty quantification methodologies and relate them to aerospace applications. Ultimately, uncertainty quantification enables modelers and simulation practitioners to make more informed statements about the uncertainty and associated degree of credibility of model-based predictions.

Tolerance Interval Equivalent Normal (TI-EN) and Superdistribution (SD) sparse-sample uncertainty quantification (UQ) methods are used for conservative estimation of small tail probabilities. These methods estimate the probability of a response lying beyond a specified threshold when data are limited. The study focused on sparse-sample regimes ranging from N = 2 to 20 samples, because this reflects most experimental and some expensive computational situations. A tail probability magnitude of 10⁻⁴ was examined on four different distribution shapes, in order to be relevant to quantification of margins and uncertainty (QMU) problems that arise in risk and reliability analyses. In most cases the UQ methods performed best with a small number of samples, and their performance deteriorated as further samples were added. Based on this observation, a generalized jackknife resampling technique was developed to average the estimates from many smaller subsamples. This improved the performance of the SD and TI-EN methods, particularly when more than the optimal number of samples was available. A Complete Jackknifing technique, which considered all possible subsample combinations, was shown to perform better in most cases than an alternative bootstrap resampling technique.
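The complete-jackknife averaging idea can be sketched independently of the underlying tail estimator: apply the estimator to every size-k subsample and average the results. In the sketch below the estimator is a plain normal fit, used only as a stand-in; it is not the TI-EN or SD method, and the function names and data are hypothetical. Each subsample must contain at least two distinct values so that the standard deviation is positive.

```python
import itertools
import statistics
from statistics import NormalDist

def normal_tail_prob(sample, threshold):
    # Stand-in estimator (NOT TI-EN or SD): P(X > threshold) under a
    # normal fitted by sample mean and sample standard deviation
    mu = statistics.mean(sample)
    sd = statistics.stdev(sample)
    # Upper-tail probability, computed as a lower-tail CDF for accuracy
    return NormalDist().cdf((mu - threshold) / sd)

def complete_jackknife(sample, k, threshold, estimator=normal_tail_prob):
    # Average the estimator over every size-k subsample, C(N, k) in total
    estimates = [estimator(list(sub), threshold)
                 for sub in itertools.combinations(sample, k)]
    return statistics.mean(estimates)

data = [9.8, 10.4, 10.1, 9.6, 10.3, 10.0]  # hypothetical measurements
full = normal_tail_prob(data, 11.5)
jk = complete_jackknife(data, 4, 11.5)     # averages C(6, 4) = 15 estimates
```

In practice k would be set near the sample size at which the chosen estimator was observed to perform best; with k = N the procedure reduces to a single estimate on the full sample.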