The Ground Truth program used simulations as test beds for social science research methods. The simulations had known ground truth and were capable of producing large amounts of data. This allowed research teams to run experiments and ask questions of these simulations much as social scientists study real-world systems, and enabled robust evaluation of the teams' causal inference, prediction, and prescription capabilities. We tested three hypotheses about research effectiveness using data from the Ground Truth program, specifically examining the influence of complexity, causal understanding, and data collection on performance. We found some evidence that system complexity and causal understanding influenced research performance, but no evidence that data availability contributed. The Ground Truth program may be the first robust coupling of simulation test beds with an experimental framework capable of teasing out the factors that determine the success of social science research.
Measures of simulation model complexity generally focus on outputs; we propose measuring the complexity of a model’s causal structure to gain insight into its fundamental character. This article introduces tools for measuring causal complexity. First, we introduce a method for developing a model’s causal structure diagram, which characterizes the causal interactions present in the code. Causal structure diagrams facilitate comparison of simulation models, including those from different paradigms. Next, we develop metrics for evaluating a model’s causal complexity using its causal structure diagram. We discuss cyclomatic complexity as a measure of the intricacy of causal structure and introduce two new metrics that incorporate the concept of feedback, a fundamental component of causal structure. The first new metric introduced here is feedback density, a measure of the cycle-based interconnectedness of causal structure. The second metric combines cyclomatic complexity and feedback density into a comprehensive causal complexity measure. Finally, we demonstrate these complexity metrics on simulation models from multiple paradigms and discuss potential uses and interpretations. These tools enable direct comparison of models across paradigms and provide a mechanism for measuring and discussing complexity based on a model’s fundamental assumptions and design.
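The metrics named in this abstract can be made concrete on a toy causal structure diagram. The sketch below is a minimal illustration of our own, not the article's implementation: it computes cyclomatic complexity as M = E - N + 2P (edges, nodes, weakly connected components) and, as one plausible reading of feedback density, the fraction of causal links that lie on at least one directed cycle. The node and edge names are invented for the example.

```python
from collections import defaultdict

def _reachable(adj, src, dst):
    """Iterative depth-first search: is dst reachable from src?"""
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(adj[node])
    return False

def cyclomatic_complexity(nodes, edges):
    """M = E - N + 2P, where P counts weakly connected components."""
    parent = {n: n for n in nodes}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for u, v in edges:                 # union-find over the undirected skeleton
        parent[find(u)] = find(v)
    p = len({find(n) for n in nodes})
    return len(edges) - len(nodes) + 2 * p

def feedback_density(nodes, edges):
    """Fraction of edges lying on at least one directed cycle
    (edge u->v is on a cycle exactly when u is reachable from v)."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
    on_cycle = sum(1 for u, v in edges if _reachable(adj, v, u))
    return on_cycle / len(edges) if edges else 0.0

# Toy causal structure: a birth/death feedback system with one downstream effect.
nodes = ["births", "population", "deaths", "resources"]
edges = [("births", "population"), ("population", "births"),
         ("population", "deaths"), ("deaths", "population"),
         ("population", "resources")]
print(cyclomatic_complexity(nodes, edges))  # 5 - 4 + 2*1 = 3
print(feedback_density(nodes, edges))       # 4 of 5 edges lie on cycles -> 0.8
```

The edge into "resources" sits on no cycle, so adding further one-way downstream effects would lower feedback density while raising cyclomatic complexity, which is why the abstract treats the two as complementary.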
The Spent Fuel and Waste Science and Technology (SFWST) Campaign of the U.S. Department of Energy (DOE) Office of Nuclear Energy (NE), Office of Spent Fuel & Waste Disposition (SFWD) is conducting research and development (R&D) on geologic disposal of spent nuclear fuel (SNF) and high-level nuclear waste (HLW). A high priority for SFWST disposal R&D is disposal system modeling (Sassani et al. 2021). The SFWST Geologic Disposal Safety Assessment (GDSA) work package is charged with developing a disposal system modeling and analysis capability for evaluating generic disposal system performance for nuclear waste in geologic media. This report describes fiscal year (FY) 2022 advances of the GDSA performance assessment (PA) development groups of the SFWST Campaign. The common mission of these groups is to develop a geologic disposal system modeling capability for nuclear waste that can be used to assess probabilistically the performance of generic disposal options and generic sites. The modeling capability under development is called GDSA Framework (pa.sandia.gov). GDSA Framework is a coordinated set of codes and databases designed for probabilistically simulating the release and transport of disposed radionuclides from a repository to the biosphere for post-closure performance assessment. Primary components of GDSA Framework include PFLOTRAN to simulate the major features, events, and processes (FEPs) over time; Dakota to propagate uncertainty and analyze sensitivities; meshing codes to define the domain; and various other software for rendering properties, processing data, and visualizing results.
Geologic Disposal Safety Assessment Framework, developed by the United States Department of Energy, is a state-of-the-art simulation software toolkit for probabilistic post-closure performance assessment of systems for deep geologic disposal of nuclear waste. This paper presents a generic reference case and shows how it is being used to develop and demonstrate performance assessment methods within the Geologic Disposal Safety Assessment Framework that mitigate some of the challenges posed by high uncertainty and limited computational resources. Variance-based global sensitivity analysis is applied to assess the effects of spatial heterogeneity using graph-based summary measures for scalar and time-varying quantities of interest. The behavior of the system with respect to spatial heterogeneity is further investigated using ratios of water fluxes. This analysis shows that spatial heterogeneity is a dominant uncertainty in predictions of repository performance, one that can be identified in global sensitivity analysis using proxy variables derived from graph descriptions of discrete fracture networks. New quantities of interest defined using water fluxes proved useful for better understanding overall system behavior.
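Variance-based global sensitivity analysis of the kind applied here typically rests on Sobol' indices. As a generic, self-contained illustration (not the GDSA/PFLOTRAN workflow and not the graph-based measures described above), the sketch below estimates first-order indices with a Saltelli-style pick-freeze scheme on a toy linear model whose analytic answers (S1 = 0.8, S2 = 0.2) are known.

```python
import random

def sobol_first_order(f, n_vars, n_samples=20000, seed=1):
    """Saltelli-style pick-freeze estimator of first-order Sobol' indices
    for a model f with independent Uniform(0, 1) inputs."""
    rng = random.Random(seed)
    A = [[rng.random() for _ in range(n_vars)] for _ in range(n_samples)]
    B = [[rng.random() for _ in range(n_vars)] for _ in range(n_samples)]
    y_a = [f(x) for x in A]
    y_b = [f(x) for x in B]
    mean = sum(y_a) / n_samples
    var = sum((y - mean) ** 2 for y in y_a) / n_samples
    indices = []
    for i in range(n_vars):
        # Pick-freeze matrix AB_i: sample A with column i swapped in from B.
        y_ab = [f([b[j] if j == i else a[j] for j in range(n_vars)])
                for a, b in zip(A, B)]
        # Saltelli (2010) estimator: S_i = mean(y_B * (y_AB_i - y_A)) / Var(y).
        s_i = sum(yb * (yab - ya)
                  for ya, yb, yab in zip(y_a, y_b, y_ab)) / (n_samples * var)
        indices.append(s_i)
    return indices

# Toy model y = x1 + 0.5*x2 with uniform inputs: analytically S1 = 0.8, S2 = 0.2.
s1, s2 = sobol_first_order(lambda x: x[0] + 0.5 * x[1], n_vars=2)
print(round(s1, 2), round(s2, 2))
```

In a repository application the scalar f would be a quantity of interest such as peak dose, evaluated by the full simulator, and the uniform inputs would be replaced by the sampled uncertain parameters (or, as in this paper, proxy variables summarizing fracture-network heterogeneity).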
The focus of this project is to accelerate and transform the workflow of multiscale materials modeling by developing an integrated toolchain seamlessly combining DFT, SNAP, and LAMMPS (shown in Figure 1-1) with a machine-learning (ML) model that more efficiently extracts information from a smaller set of first-principles calculations. Our ML model enables us to accelerate first-principles data generation by interpolating existing high-fidelity data, and to extend the simulation scale by extrapolating high-fidelity data (10^2 atoms) to the mesoscale (10^4 atoms). It encodes the underlying physics of atomic interactions on the microscopic scale by adapting a variety of ML techniques, such as deep neural networks (DNNs) and graph neural networks (GNNs). We developed a new surrogate model for density functional theory using deep neural networks. The developed ML surrogate is demonstrated in a workflow that generates accurate band energies, total energies, and densities for the 298 K and 933 K aluminum systems. Furthermore, the models can be used to predict the quantities of interest for systems with more atoms than the training data set contains. We have demonstrated that the ML model can compute the quantities of interest for systems of 100,000 Al atoms. Compared with the 2,000-atom Al system, the new surrogate model is as accurate as DFT but three orders of magnitude faster. We also explored optimal experimental design techniques for choosing the training data and novel graph neural networks for training on smaller data sets. These are promising methods that warrant future exploration.
The Spent Fuel and Waste Science and Technology (SFWST) Campaign of the U.S. Department of Energy (DOE) Office of Nuclear Energy (NE), Office of Fuel Cycle Technology (FCT) is conducting research and development (R&D) on geologic disposal of spent nuclear fuel (SNF) and high-level nuclear waste (HLW). Two high priorities for SFWST disposal R&D are design concept development and disposal system modeling. These priorities are directly addressed in the SFWST Geologic Disposal Safety Assessment (GDSA) control account, which is charged with developing a geologic repository system modeling and analysis capability, and the associated software, GDSA Framework, for evaluating disposal system performance for nuclear waste in geologic media. GDSA Framework is supported by the SFWST Campaign and its predecessor, the Used Fuel Disposition (UFD) Campaign.
The causal structure of a simulation is a major determinant of both its character and behavior, yet most methods we use to compare simulations focus only on simulation outputs. We introduce a method that combines graphical representation with information theoretic metrics to quantitatively compare the causal structures of models. The method applies to agent-based simulations as well as system dynamics models and facilitates comparison within and between types. Comparing models based on their causal structures can illuminate differences in assumptions made by the models, allowing modelers to (1) better situate their models in the context of existing work, including highlighting novelty, (2) explicitly compare conceptual theory and assumptions to simulated theory and assumptions, and (3) investigate potential causal drivers of divergent behavior between models. We demonstrate the method by comparing two epidemiology models at different levels of aggregation.
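The article's specific information-theoretic metrics are not reproduced here, but one simple stand-in conveys the idea: summarize each causal structure diagram as a distribution (below, the out-degree distribution of the causal graph) and compare the distributions with Jensen-Shannon divergence. The two toy "models" are invented for the example.

```python
import math
from collections import Counter

def degree_distribution(edges):
    """Out-degree distribution of a directed causal graph, as {degree: prob}."""
    out = Counter(u for u, _ in edges)
    nodes = {n for edge in edges for n in edge}
    degree_counts = Counter(out.get(n, 0) for n in nodes)
    total = sum(degree_counts.values())
    return {d: c / total for d, c in degree_counts.items()}

def js_divergence(p, q):
    """Jensen-Shannon divergence with base-2 logs (value lies in [0, 1])."""
    support = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in support}
    def kl(a):
        return sum(a[k] * math.log2(a[k] / m[k])
                   for k in support if a.get(k, 0.0) > 0)
    return 0.5 * kl(p) + 0.5 * kl(q)

# Toy comparison: a one-way S -> I -> R causal chain versus the same chain
# with waning immunity (R -> S), which closes a feedback loop.
model_a = [("S", "I"), ("I", "R")]
model_b = [("S", "I"), ("I", "R"), ("R", "S")]
print(js_divergence(degree_distribution(model_a), degree_distribution(model_b)))
```

A divergence of zero indicates structurally indistinguishable summaries; a positive value flags a difference in causal assumptions worth examining, which is the spirit of uses (1)-(3) above.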
Social systems are uniquely complex and difficult to study, but understanding them is vital to solving the world’s problems. The Ground Truth program developed a new way of testing the research methods that attempt to understand and leverage the Human Domain and its associated complexities. The program developed simulations of social systems as virtual world test beds. Not only were these simulations able to produce data on future states of the system under various circumstances and scenarios, but their causal ground truth was also explicitly known. Research teams studied these virtual worlds, facilitating deep validation of causal inference, prediction, and prescription methods. The Ground Truth program model provides a way to test and validate research methods to an extent previously impossible, and to study the intricacies and interactions of different components of research.
Adams, Brian H.; Bohnhoff, William B.; Dalbey, Keith D.; Ebeida, Mohamed S.; Eddy, John E.; Eldred, Michael E.; Hooper, Russell H.; Hough, Patricia H.; Hu, Kenneth H.; Jakeman, John J.; Khalil, Mohammad K.; Maupin, Kathryn M.; Monschke, Jason A.; Ridgway, Elliott R.; Rushdi, Ahmad A.; Seidl, Daniel S.; Stephens, John A.; Swiler, Laura P.; Tran, Anh; Winokur, Justin W.
The Dakota toolkit provides a flexible and extensible interface between simulation codes and iterative analysis methods. Dakota contains algorithms for optimization with gradient and nongradient-based methods; uncertainty quantification with sampling, reliability, and stochastic expansion methods; parameter estimation with nonlinear least squares methods; and sensitivity/variance analysis with design of experiments and parameter study methods. These capabilities may be used on their own or as components within advanced strategies such as surrogate-based optimization, mixed integer nonlinear programming, or optimization under uncertainty. By employing object-oriented design to implement abstractions of the key components required for iterative systems analyses, the Dakota toolkit provides a flexible and extensible problem-solving environment for design and performance analysis of computational models on high performance computers. This report serves as a user's manual for the Dakota software and provides capability overviews and procedures for software execution, as well as a variety of example studies.
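To make the interface between Dakota and a simulation code concrete, a minimal input file for a Latin hypercube sampling study might look like the following. This is a hypothetical sketch: the variable descriptors, bounds, and driver script name are placeholders, and keyword spellings should be checked against the Dakota reference manual for the version in use.

```
# Hypothetical Dakota study: propagate two uniform uncertainties
# through a black-box simulation via Latin hypercube sampling.
environment
  tabular_data
    tabular_data_file = 'samples.dat'

method
  sampling
    sample_type lhs
    samples = 100
    seed = 12345

variables
  uniform_uncertain = 2
    descriptors   'permeability' 'porosity'
    lower_bounds   1.0e-17  0.05
    upper_bounds   1.0e-15  0.25

interface
  fork
    analysis_drivers = 'run_simulation.py'   # user-supplied driver script

responses
  response_functions = 1
  descriptors 'peak_dose'
  no_gradients
  no_hessians
```

The fork interface launches the user's driver once per sample; the driver reads a Dakota-written parameters file, runs the simulation, and writes the response value back, which is how Dakota stays independent of any particular simulation code.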
The Spent Fuel and Waste Science and Technology (SFWST) Campaign of the U.S. Department of Energy (DOE) Office of Nuclear Energy (NE), Office of Spent Fuel & Waste Disposition (SFWD) is conducting research and development (R&D) on geologic disposal of spent nuclear fuel (SNF) and high-level nuclear waste (HLW). A high priority for SFWST disposal R&D is disposal system modeling (DOE 2012, Table 6; Sevougian et al. 2019). The SFWST Geologic Disposal Safety Assessment (GDSA) work package is charged with developing a disposal system modeling and analysis capability for evaluating generic disposal system performance for nuclear waste in geologic media.
Over the past four years, an informal working group has formed to investigate existing sensitivity analysis methods, examine new methods, and identify best practices. The focus is on the use of sensitivity analysis in case studies involving geologic disposal of spent nuclear fuel or nuclear waste. To examine ideas and provide applicable test cases for comparison, we have developed multiple case studies. Four of these case studies are presented in this report: the GRS clay case, the SNL shale case, the Dessel case, and the IBRAE groundwater case. We present the sensitivity analysis methods investigated by the various groups and the results obtained by different groups and different implementations, and we summarize our findings.
This report summarizes the activities performed as part of the Science and Engineering of Cybersecurity by Uncertainty quantification and Rigorous Experimentation (SECURE) Grand Challenge LDRD project. We provide an overview of the research done in this project, including work on cyber emulation, uncertainty quantification, and optimization. We present examples of integrated analyses performed on two case studies: a network scanning/detection study and a malware command and control study. We highlight the importance of experimental workflows and list references of papers and presentations developed under this project. We outline lessons learned and suggestions for future work.
All disciplines that use models to predict the behavior of real-world systems need to determine the accuracy of the models’ results. Techniques for verification, validation, and uncertainty quantification (VVUQ) focus on improving the credibility of computational models and assessing their predictive capability. VVUQ emphasizes rigorous evaluation of models and how they are applied to improve understanding of model limitations and quantify the accuracy of model predictions.
The June 15, 1991 Mt. Pinatubo eruption is simulated in E3SM by injecting 10 Tg of SO2 gas into the stratosphere, turning off prescribed volcanic aerosols, and enabling E3SM to treat stratospheric volcanic aerosols prognostically. This experimental prognostic treatment of volcanic aerosols in the stratosphere results in some realistic behaviors (SO2 evolves into H2SO4, which heats the lower stratosphere) and some expected biases (H2SO4 aerosols sediment out of the stratosphere too quickly). Climate fingerprinting techniques are used to establish a Mt. Pinatubo fingerprint based on the vertical profile of temperature from the E3SMv1 DECK ensemble. By projecting reanalysis data and preindustrial simulations onto the fingerprint, the Mt. Pinatubo stratospheric heating anomaly is detected. Projecting the experimental prognostic aerosol simulation onto the fingerprint also results in a detectable heating anomaly, but, as expected, the duration is too short relative to reanalysis data.
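At its core, fingerprint detection of this kind projects each anomaly profile onto a normalized fingerprint pattern to obtain a scalar detection time series. The sketch below illustrates only that projection step, with invented numbers standing in for the E3SM vertical temperature profiles.

```python
import math

def project_onto_fingerprint(anomaly_series, fingerprint):
    """Project each anomaly profile (e.g., a vertical temperature profile at
    one time) onto a unit-normalized fingerprint, giving a scalar per time."""
    norm = math.sqrt(sum(f * f for f in fingerprint))
    unit = [f / norm for f in fingerprint]
    return [sum(a * u for a, u in zip(profile, unit)) for profile in anomaly_series]

# Illustrative fingerprint: lower-stratospheric warming over a 4-level profile.
fingerprint = [0.0, 1.0, 2.0, 1.0]
# Illustrative anomalies: quiet state, then a Pinatubo-like pulse that decays.
anomalies = [[0.0, 0.0, 0.0, 0.0],
             [0.1, 0.8, 1.9, 1.1],
             [0.0, 0.4, 1.0, 0.5]]
signal = project_onto_fingerprint(anomalies, fingerprint)
print([round(s, 2) for s in signal])
```

A heating anomaly is "detected" when this projection rises above the noise level established from preindustrial control runs; in the study above the experimental simulation produces such an excursion, but one that decays too quickly.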
This report presents the results of the “Foundations of Rigorous Cyber Experimentation” (FORCE) Laboratory Directed Research and Development (LDRD) project. This project is a companion project to the “Science and Engineering of Cyber security through Uncertainty quantification and Rigorous Experimentation” (SECURE) Grand Challenge LDRD project. This project leverages the offline, controlled nature of cyber experimentation technologies in general, and emulation testbeds in particular, to assess how uncertainties in network conditions affect uncertainties in key metrics. We conduct extensive experimentation using a Firewheel emulation-based cyber testbed model of Invisible Internet Project (I2P) networks to understand a de-anonymization attack previously presented in the literature. Our goals in this analysis are to see if we can leverage emulation testbeds to produce reliably repeatable experimental networks at scale, identify significant parameters influencing experimental results, replicate the previous results, quantify uncertainty associated with the predictions, and apply multi-fidelity techniques to forecast results to real-world network scales. The I2P networks we study are up to three orders of magnitude larger than the networks studied in SECURE and presented additional challenges in identifying significant parameters. The key contributions of this project are the application of SECURE techniques such as UQ to a scenario of interest and the scaling of the SECURE techniques to larger network sizes. This report describes the experimental methods and results of these studies in more detail. In addition, the process of constructing these large-scale experiments tested the limits of the Firewheel emulation-based technologies. Therefore, another contribution of this work is that it informed the Firewheel developers of scaling limitations, which were subsequently corrected.
The Spent Fuel and Waste Science and Technology (SFWST) Campaign of the U.S. Department of Energy (DOE) Office of Nuclear Energy (NE), Office of Fuel Cycle Technology (FCT) is conducting research and development (R&D) on geologic disposal of spent nuclear fuel (SNF) and high-level nuclear waste (HLW). Two high priorities for SFWST disposal R&D are design concept development and disposal system modeling. These priorities are directly addressed in the SFWST Geologic Disposal Safety Assessment (GDSA) control account, which is charged with developing a geologic repository system modeling and analysis capability, and the associated software, GDSA Framework, for evaluating disposal system performance for nuclear waste in geologic media. GDSA Framework is supported by the SFWST Campaign and its predecessor, the Used Fuel Disposition (UFD) Campaign. This report fulfills the GDSA Uncertainty and Sensitivity Analysis Methods work package (SF-21SN01030404) level 3 milestone, Uncertainty and Sensitivity Analysis Methods and Applications in GDSA Framework (FY2021) (M3SF-21SN010304042). It presents high-level objectives and strategy for development of uncertainty and sensitivity analysis tools, demonstrates uncertainty quantification (UQ) and sensitivity analysis (SA) tools in GDSA Framework in FY21, and describes additional UQ/SA tools whose future implementation would enhance the UQ/SA capability of GDSA Framework. This work was closely coordinated with the other Sandia National Laboratories GDSA work packages: the GDSA Framework Development work package (SF-21SN01030405), the GDSA Repository Systems Analysis work package (SF-21SN01030406), and the GDSA PFLOTRAN Development work package (SF-21SN01030407). This report builds on developments reported in previous GDSA Framework milestones, particularly M3SF-20SN010304032.
Cyber testbeds provide an important mechanism for experimentally evaluating cyber security performance. However, as an experimental discipline, reproducible cyber experimentation is essential to assure valid, unbiased results. Even minor differences in setup, configuration, and testbed components can have an impact on the experiments and, thus, on the reproducibility of results. This paper documents a case study in reproducing an earlier emulation study, with the reproduced emulation experiment conducted by a different research group on a different testbed. We describe lessons learned as a result of this process, both in terms of the reproducibility of the original study and in terms of the different testbed technologies used by the two groups. This paper also addresses the question of how to compare results between two groups' experiments, identifying candidate metrics for comparison and quantifying the results in this reproduction study.
Swiler, Laura P.; Brooks, Dusty M.; Stein, Emily S.; Röhlig, Klaus-Jürgen R.; Plischke, Elmar P.; Becker, Dirk-Alexander B.; Spiessl, Sabine M.; Koskinen, Lasse K.; Govaerts, Joan G.; Svitelman, Valentina S.; Saveleva, Elena.