Causal discovery algorithms construct hypothesized causal graphs that depict causal dependencies among variables in observational data. While powerful, the accuracy of these algorithms is highly sensitive to the underlying dynamics of the system in ways that have not been fully characterized in the literature. In this report, we benchmark the PCMCI causal discovery algorithm as applied to gridded spatiotemporal systems. Effectively computing grid-level causal graphs on large grids would enable analysis of the causal impacts of transient and mobile spatial phenomena in large systems, such as the Earth’s climate. We evaluate the performance of PCMCI on a set of structural causal models, using simulated spatial vector autoregressive processes in one and two dimensions, and we develop computational and analytical tools for characterizing these processes and their associated causal graphs. Our findings suggest that direct application of PCMCI is not suitable for the analysis of dynamical gridded spatiotemporal systems, such as climatological data, without significant preprocessing and downscaling of the data: PCMCI requires unrealistic sample sizes to achieve acceptable performance on even modestly sized problems and suffers from a notable curse of dimensionality. This work suggests that, even under generous structural assumptions, significant additional algorithmic improvements are needed before causal discovery algorithms can be reliably applied to grid-level outputs of Earth system models.
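To illustrate the kind of system being benchmarked, the sketch below simulates a small one-dimensional spatial vector autoregressive process (each grid cell driven by itself and its two nearest neighbors at the previous time step) and applies PCMCI via the tigramite package. The grid size, coefficients, lag, and significance level are illustrative assumptions, not the configurations used in the report.

```python
# Minimal sketch (illustrative assumptions, not the report's benchmark setup).
import numpy as np
from tigramite import data_processing as pp
from tigramite.pcmci import PCMCI
from tigramite.independence_tests import ParCorr  # newer tigramite: tigramite.independence_tests.parcorr

rng = np.random.default_rng(0)
n_cells, n_times = 10, 500            # illustrative grid size and sample length
a_self, a_nbr, noise = 0.5, 0.2, 1.0  # illustrative autoregressive coefficients

# Simulate the 1-D spatial VAR process with periodic boundary conditions.
x = np.zeros((n_times, n_cells))
for t in range(1, n_times):
    left = np.roll(x[t - 1], 1)
    right = np.roll(x[t - 1], -1)
    x[t] = a_self * x[t - 1] + a_nbr * (left + right) + noise * rng.standard_normal(n_cells)

# Run PCMCI, treating each grid cell as one variable of a multivariate time series.
dataframe = pp.DataFrame(x, var_names=[f"cell_{i}" for i in range(n_cells)])
pcmci = PCMCI(dataframe=dataframe, cond_ind_test=ParCorr(), verbosity=0)
results = pcmci.run_pcmci(tau_max=1, pc_alpha=0.05)

# Count lagged links whose p-values fall below the significance level.
sig = results["p_matrix"] <= 0.05
print("detected lagged links:", int(sig[:, :, 1:].sum()))
```

Comparing the detected links against the known neighbor structure of the simulated process is the kind of check that exposes the sample-size and dimensionality issues described above.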
The increasing use of machine learning (ML) models to support high-consequence decisions drives a need for greater rigor in ML-based decision making. Critical problems ranging from climate change to nonproliferation monitoring rely on machine learning for aspects of their analyses. Likewise, future technologies, such as the incorporation of data-driven methods into stockpile surveillance and predictive failure analysis for weapons components, will rely on decision making that incorporates the output of machine learning models. In this project, our main focus was the development of decision-scientific methods that combine uncertainty estimates for machine learning predictions with a domain-specific model of error costs. Other focus areas include uncertainty measurement in ML predictions, the design of decision rules using multi-objective optimization, the value of uncertainty reduction, and decision-tailored uncertainty quantification for probability estimates. By laying foundations for rigorous decision making based on the predictions of machine learning models, these approaches are directly relevant to every national security mission that applies, or will apply, machine learning to data, most of which entail some decision context.
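A minimal sketch of the core idea described above, combining a model's predicted class probabilities with a domain-specific cost model to choose the lowest-expected-cost action. The cost values, class labels, and function names are hypothetical illustrations, not the project's actual methods.

```python
# Sketch of expected-cost decision making with ML probability estimates
# (hypothetical costs and labels; not the project's implementation).
import numpy as np

# cost[a, c] = cost of taking action a when the true class is c.
cost = np.array([
    [0.0, 10.0],  # action 0: treat as benign -> expensive if truly anomalous
    [1.0,  0.0],  # action 1: flag for review -> small cost if actually benign
])

def min_expected_cost_action(class_probs: np.ndarray, cost: np.ndarray) -> int:
    """Return the action minimizing expected cost under the predicted class probabilities."""
    expected = cost @ class_probs  # expected cost of each action
    return int(np.argmin(expected))

# A model that is 80% confident the example is benign still triggers review,
# because the cost of missing a true anomaly dominates.
print(min_expected_cost_action(np.array([0.8, 0.2]), cost))  # -> 1
```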
This report contains the written footprint of a Sandia-hosted workshop held in Albuquerque, New Mexico, June 22-23, 2016, on “Complex Systems Models and Their Applications: Towards a New Science of Verification, Validation and Uncertainty Quantification,” as well as the pre-work that fed into the workshop. The workshop’s intent was to explore and begin articulating research opportunities at the intersection of two important Sandia communities: the complex systems (CS) modeling community and the verification, validation and uncertainty quantification (VVUQ) community. The overarching research opportunity (and challenge) that we ultimately hope to address is: how can we quantify the credibility of knowledge gained from complex systems models, knowledge that is often incomplete and interim but will nonetheless be used, sometimes in real time, by decision makers?
Genetic algorithms provide attractive options for performing nonlinear multi-objective combinatorial design optimization, and they have proven very useful for optimizing individual systems. However, conventional genetic algorithms fall short when performing holistic portfolio optimizations in which the decision variables also include the integer counts of multiple system types over multiple time periods. When the objective functions are formulated as analytic functions, we can formally differentiate with respect to the system counts and use the resulting gradient information to generate favorable mutations in the count variables. We apply several variations on this basic idea to an idealized hanging-chain example and obtain speedups of well over 1000x relative to conventional genetic algorithms in both the single- and multi-objective cases. We then develop a more complex example of a notional military portfolio that includes combinatorial design variables and dependency constraints between the design choices. In this case, our initial results are mixed, but many variations remain open to further research.
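The sketch below illustrates the mutation idea described above under simplified, hypothetical assumptions: a toy portfolio objective, a finite-difference gradient standing in for the formal derivative mentioned in the text, and a single-count mutation biased toward the gradient's direction. It is not the report's implementation.

```python
# Gradient-informed integer mutation for a GA over system counts
# (toy objective and finite-difference gradient; illustrative only).
import numpy as np

def objective(counts, value, cost, budget):
    """Hypothetical portfolio objective: total value, penalized for exceeding the budget."""
    return value @ counts - 10.0 * max(0.0, cost @ counts - budget)

def numerical_gradient(counts, value, cost, budget, h=1.0):
    """Central-difference gradient of the objective with respect to each count."""
    g = np.zeros(len(counts))
    for i in range(len(counts)):
        up = counts.astype(float); up[i] += h
        dn = counts.astype(float); dn[i] -= h
        g[i] = (objective(up, value, cost, budget) - objective(dn, value, cost, budget)) / (2 * h)
    return g

def gradient_informed_mutation(counts, value, cost, budget, rng):
    """Mutate one randomly chosen count by +/-1, stepping in the favorable direction."""
    g = numerical_gradient(counts, value, cost, budget)
    i = rng.integers(len(counts))
    step = 1 if g[i] > 0 else -1
    child = counts.copy()
    child[i] = max(0, child[i] + step)  # counts remain non-negative integers
    return child

rng = np.random.default_rng(0)
value = np.array([3.0, 5.0, 2.0])   # hypothetical per-system values
cost = np.array([1.0, 2.5, 0.7])    # hypothetical per-system costs
counts = np.array([4, 1, 6])        # current portfolio (integer counts)
print(gradient_informed_mutation(counts, value, cost, budget=15.0, rng=rng))
```

In a full genetic algorithm this mutation operator would replace (or supplement) uniform random mutation of the count variables, which is the source of the speedups reported above.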
In 2012, Hurricane Sandy devastated much of the U.S. northeast coastal areas. Among those hardest hit was the small community of Hoboken, New Jersey, located on the banks of the Hudson River across from Manhattan. This report describes a city-wide electrical infrastructure design that uses microgrids and other infrastructure to ensure the city retains functionality should such an event occur in the future. The designs ensure that up to 55 critical buildings will retain power during blackout or flooded conditions and include analyses of microgrid architectures, performance parameters, system control, renewable energy integration, and financial opportunities (while grid connected). The results presented here are not binding and are subject to change based on input from the Hoboken stakeholders, the integrator selected to manage and implement the microgrid, or other subject matter experts during the detailed (final) phase of the design effort.