Tailoring for Trust: T4T
Abstract not provided.
Climate impacts have broad economic, health, political, and national security ramifications. Societally relevant impacts typically lie far downstream of their sources, are the product of multiple interacting processes, and can arise over small regions and short timeframes because their sources are short-term and localized. Short-term forcings, such as volcanic eruptions, climatic tipping points (e.g., the collapse of rainforests or the disappearance of sea ice), and increasingly plausible climate interventions, have inherently low signal-to-noise ratios, and their attribution could benefit from accounting for the multiple conditional processes through which a downstream impact arises. Under the Grand Challenge LDRD CLDERA (CLimate impacts: Discovering Etiology thRough pAthways), we developed tools to enable downstream impact attribution from geographically and temporally localized source forcings in the climate. These methods distinguish how a localized source drives the climate system to respond with particular impacts. The "how" is embodied in pathways: the spatio-temporally evolving chains of physical processes that connect a source to a series of increasingly distant impacts. Novel analytic methods in pursuit of downstream impact attribution were developed and demonstrated on simulations and observations of the 1991 eruption of Mt. Pinatubo in the Philippines. As described in this report, we have
• developed stratospheric expertise and aerosol modeling capabilities in E3SM,
• created original methods to detect and model pathways from source to impact, and
• advanced climate attribution through novel methods, cases, and approaches.
Further, CLDERA developed a tiered verification process consisting of controlled datasets to prototype, verify, and refine the new methods. CLDERA increased Sandia’s footprint in the climate analytics community, established new climate collaborations, and created a cadre of climate analysts at Sandia.
CLDERA's output has been extensive: 9 journal articles published, 12 articles submitted and under review, and 8 more in preparation. We have also produced 1750 simulated years and developed 9 code bases. This report details these accomplishments and summarizes the work completed during the CLDERA Grand Challenge.
Random forests have become popular models for data-driven prediction. As a result, they are currently used or being considered for high-consequence mission applications in national security, such as predicting yield from optical signals and detecting malware. While random forests may provide accurate predictions, the complexity of the algorithm limits their interpretability. A random forest is an ensemble of regression or decision trees. Individual trees are interpretable, but an ensemble is inherently difficult to interpret because it combines many models. We aim to increase the interpretability of random forests by finding patterns in the ensemble of trees that can be used to “thin” (i.e., remove) trees. As a starting point, in this report we develop a new distance metric for quantifying the similarity between trees based on their topologies (i.e., shapes). We base the metric on a novel distance metric for graphs that is a proper mathematical distance, is invariant to transformations, provides registration between graphs, and computes topological evolutions between graphs. We use the tree distance metric to compute tree statistics such as a “mean tree” and to identify clusters of trees. We apply the methodology to a toy dataset and a mission-relevant product inspection dataset to demonstrate how the metric can provide insight into random forests. Finally, we discuss the limitations of the approach and ideas for future research into how the metric could be used as a thinning tool to develop less complex models.
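As a rough illustration of how a topology-based distance between trees can support summaries such as a central tree, the sketch below uses a deliberately simple stand-in metric, not the graph distance developed in the report. It assumes each tree is reduced to its shape (a nested tuple), encodes every node by its root-to-node path, and measures distance as the size of the symmetric difference of the two path sets. Unlike the report's metric, this stand-in is not invariant to swapping left and right children. The names `node_paths`, `topo_distance`, and `medoid` are hypothetical.

```python
# A tree shape is a nested tuple: () is a leaf, (left, right) is a split node.

def node_paths(tree, prefix=""):
    """Return the set of root-to-node paths ('' is the root, 'L'/'R' are branches)."""
    paths = {prefix}
    if tree:  # a non-empty tuple is an internal node with two children
        left, right = tree
        paths |= node_paths(left, prefix + "L")
        paths |= node_paths(right, prefix + "R")
    return paths

def topo_distance(a, b):
    """Stand-in metric: number of node positions present in exactly one tree."""
    return len(node_paths(a) ^ node_paths(b))

def medoid(trees):
    """Index of the tree with the smallest total distance to all trees.

    A medoid is used here as a simple proxy for a 'mean tree' (assumption).
    """
    totals = [sum(topo_distance(t, u) for u in trees) for t in trees]
    return totals.index(min(totals))

leaf = ()
stump = (leaf, leaf)          # one split
left_deep = (stump, leaf)     # second split on the left branch
right_deep = (leaf, stump)    # second split on the right branch
full = (stump, stump)         # complete depth-2 tree

forest = [stump, left_deep, right_deep, full, left_deep]

print(topo_distance(stump, full))   # 4
print(medoid(forest))               # 1 (the first left_deep tree)
```

Because the symmetric difference of sets satisfies the triangle inequality, this stand-in is a proper distance on tree shapes, which is the property that makes medoids and distance-based clustering meaningful.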
Statistical Analysis and Data Mining
The 2022 National Defense Strategy of the United States listed climate change as a serious threat to national security. Climate intervention methods, such as stratospheric aerosol injection, have been proposed as mitigation strategies, but the downstream effects of such actions on a complex climate system are not well understood. The development of algorithmic techniques for quantifying relationships between source and impact variables related to a climate event (i.e., a climate pathway) would help inform policy decisions. Data-driven deep learning models have become powerful tools for modeling highly nonlinear relationships and may provide a route to characterize climate variable relationships. In this paper, we explore the use of an echo state network (ESN) for characterizing climate pathways. ESNs are a computationally efficient neural network variation designed for temporal data, and recent work proposes ESNs as a useful tool for forecasting spatiotemporal climate data. However, like other neural networks, ESNs are black-box models that are not readily interpretable. This lack of model transparency poses a hurdle for understanding variable relationships. We address this issue by developing feature importance methods for ESNs in the context of spatiotemporal data to quantify the variable relationships captured by the model. We conduct a simulation study to assess and compare the feature importance techniques, and we demonstrate the approach on reanalysis climate data. In the climate application, we consider a time period that includes the 1991 volcanic eruption of Mount Pinatubo. This event was a significant stratospheric aerosol injection, which acts as a proxy for an anthropogenic stratospheric aerosol injection. Using the proposed approach, we characterize relationships between pathway variables associated with this event that agree with relationships previously identified by climate scientists.
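The idea of pairing an ESN with feature importance can be sketched on a toy problem. The example below is a minimal illustration under stated assumptions, not the paper's method: it uses a small reservoir with fixed random weights, a ridge-regression readout solved by Gaussian elimination, and a permutation-style importance that shuffles one input variable across time and records the increase in RMSE. The data are synthetic (a driver `x1` and an unrelated variable `x2`), and all function names and parameter values are hypothetical.

```python
import math
import random

def make_esn(n_in, n_res, seed=0):
    """Fixed random input and recurrent weights (these are never trained)."""
    rng = random.Random(seed)
    w_in = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_res)]
    scale = 0.5 / math.sqrt(n_res)  # heuristic scaling for stable dynamics (assumption)
    w_res = [[rng.uniform(-1, 1) * scale for _ in range(n_res)] for _ in range(n_res)]
    return w_in, w_res

def reservoir_states(esn, inputs):
    """Run the inputs through the reservoir; one state (plus bias term) per step."""
    w_in, w_res = esn
    h = [0.0] * len(w_res)
    states = []
    for x in inputs:
        h = [math.tanh(sum(a * b for a, b in zip(w_in[j], x)) +
                       sum(a * b for a, b in zip(w_res[j], h)))
             for j in range(len(w_res))]
        states.append(h + [1.0])
    return states

def ridge_fit(H, y, lam=1e-3):
    """Solve (H'H + lam*I) w = H'y by Gaussian elimination with partial pivoting."""
    n = len(H[0])
    A = [[sum(r[i] * r[j] for r in H) + (lam if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    b = [sum(r[i] * t for r, t in zip(H, y)) for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[p], b[c], b[p] = A[p], A[c], b[p], b[c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            A[r] = [a - f * ac for a, ac in zip(A[r], A[c])]
            b[r] -= f * b[c]
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (b[i] - sum(A[i][k] * w[k] for k in range(i + 1, n))) / A[i][i]
    return w

def rmse(H, w, y):
    preds = [sum(a * b for a, b in zip(row, w)) for row in H]
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y))

def permutation_importance(esn, w, inputs, y, feature, seed=0):
    """Increase in RMSE after shuffling one input variable across time."""
    base = rmse(reservoir_states(esn, inputs), w, y)
    column = [x[feature] for x in inputs]
    random.Random(seed).shuffle(column)
    shuffled = [x[:feature] + [c] + x[feature + 1:] for x, c in zip(inputs, column)]
    return rmse(reservoir_states(esn, shuffled), w, y) - base

# Synthetic data: the target follows x1 with a one-step lag; x2 is unrelated noise.
rng = random.Random(1)
n_steps = 300
x1 = [math.sin(0.3 * t) + 0.2 * rng.gauss(0, 1) for t in range(n_steps)]
x2 = [rng.gauss(0, 1) for _ in range(n_steps)]
inputs = [[a, b] for a, b in zip(x1, x2)]
y = [0.0] + [0.8 * x1[t - 1] for t in range(1, n_steps)]

esn = make_esn(n_in=2, n_res=20)
w = ridge_fit(reservoir_states(esn, inputs), y)
importances = [permutation_importance(esn, w, inputs, y, f) for f in (0, 1)]
print(importances)  # the first entry (x1) should be clearly larger than the second (x2)
```

Only the readout weights are fitted, which is what makes ESNs cheap to train; the importance scores are only as trustworthy as the fit of that readout.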
Spatial Statistics
As global temperatures continue to rise, climate mitigation strategies such as stratospheric aerosol injections (SAI) are increasingly discussed, but the downstream effects of these strategies are not well understood. As such, there is interest in developing statistical methods to quantify the evolution of climate variable relationships during the time period surrounding an SAI. Feature importance applied to echo state network (ESN) models has been proposed as a way to understand the effects of SAI using a data-driven model. This approach depends on the ESN fitting the data well. If it does not, the feature importance may emphasize features that are not representative of the underlying relationships. Typically, time series prediction models such as ESNs are assessed using out-of-sample performance metrics that divide the time series into separate training and testing sets. However, this model assessment approach is geared towards forecasting applications rather than scenarios such as the motivating SAI example, where the objective is to use a data-driven model to capture variable relationships. In this paper, we demonstrate a novel use of climate model replicates to investigate the applicability of the commonly used repeated hold-out model assessment approach for the SAI application. Simulations of an SAI are generated using a simplified climate model, and different initialization conditions are used to provide independent training and testing sets containing the same SAI event. The climate model replicates enable out-of-sample measures of model performance, which are compared to the single time series hold-out validation approach. For our case study, we find that the repeated hold-out performance is comparable to, though conservative relative to, the replicate out-of-sample performance when the training set contains enough time after the aerosol injection.
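The two assessment strategies compared above can be sketched on a toy example. This is an illustration under stated assumptions rather than the paper's setup: a first-order autoregressive series with a decaying forcing pulse stands in for the simplified climate model, a two-coefficient least-squares model stands in for the ESN, and replicates differ only in their random seed (initial condition and noise). All names and parameter values are hypothetical.

```python
import math
import random

def simulate(seed, n=120, inject_at=40):
    """Toy AR(1) 'temperature' series with an aerosol-like forcing pulse."""
    rng = random.Random(seed)
    temp = rng.gauss(0, 1)  # replicate-specific initial condition
    series, forcing = [], []
    for t in range(n):
        f = -2.0 * math.exp(-(t - inject_at) / 10.0) if t >= inject_at else 0.0
        temp = 0.7 * temp + f + rng.gauss(0, 0.3)
        series.append(temp)
        forcing.append(f)
    return series, forcing

def design(series, forcing):
    """One-step-ahead pairs: predict series[t] from series[t-1] and forcing[t]."""
    X = [(series[t - 1], forcing[t]) for t in range(1, len(series))]
    return X, series[1:]

def fit(X, y):
    """Least squares for two coefficients via the 2x2 normal equations."""
    s11 = sum(a * a for a, _ in X)
    s12 = sum(a * b for a, b in X)
    s22 = sum(b * b for _, b in X)
    t1 = sum(a * v for (a, _), v in zip(X, y))
    t2 = sum(b * v for (_, b), v in zip(X, y))
    det = s11 * s22 - s12 * s12
    return (s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det

def rmse(coef, X, y):
    errs = [(coef[0] * a + coef[1] * b - v) ** 2 for (a, b), v in zip(X, y)]
    return math.sqrt(sum(errs) / len(errs))

def repeated_holdout(series, forcing, train_ends, horizon=20):
    """Average test RMSE over several train/test splits of a single series.

    Every split must end after the injection so the forcing term is estimable.
    """
    X, y = design(series, forcing)
    scores = [rmse(fit(X[:end], y[:end]), X[end:end + horizon], y[end:end + horizon])
              for end in train_ends]
    return sum(scores) / len(scores)

def replicate_oos(train_rep, test_rep):
    """Train on one full replicate, test on an independent replicate."""
    coef = fit(*design(*train_rep))
    return rmse(coef, *design(*test_rep))

rep_a, rep_b = simulate(seed=1), simulate(seed=2)
holdout_rmse = repeated_holdout(*rep_a, train_ends=(60, 70, 80, 90))
replicate_rmse = replicate_oos(rep_a, rep_b)
print(holdout_rmse, replicate_rmse)
```

In this sketch the hold-out splits all begin at least 20 steps after the injection, mirroring the condition that the training set must contain enough post-injection data for the two measures to be comparable.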
Deep neural networks (NNs) typically outperform traditional machine learning (ML) approaches for complicated, non-linear tasks. Deep learning (DL) is therefore expected to offer superior performance for the important non-proliferation task of predicting explosive device configuration from an observed optical signature, a task with which human experts struggle. However, supervised machine learning is difficult to apply in this mission space because most recorded signatures are not associated with the corresponding device description, or “truth labels.” This is challenging for NNs, which traditionally require many samples for strong performance. Semi-supervised learning (SSL), low-shot learning (LSL), and uncertainty quantification (UQ) for NNs are emerging approaches that could bridge the mission gaps of few labels and rare samples of importance. NN explainability techniques are important for gaining insight into the inferential feature importance of such a complex model. In this work, SSL, LSL, and UQ are merged into a single framework, a significant technical hurdle not previously demonstrated. Exponential Average Adversarial Training (EAAT) and Pairwise Neural Networks (PNNs) are chosen as the SSL and LSL methods, respectively. Permutation feature importance (PFI) for functional data is used to provide explainability via the Variable importance Explainable Elastic Shape Analysis (VEESA) pipeline. A variety of uncertainty quantification approaches are explored: Bayesian Neural Networks (BNNs), ensemble methods, concrete dropout, and evidential deep learning. Two final approaches, one utilizing ensemble methods and one utilizing evidential learning, are constructed and compared using a well-quantified synthetic 2D dataset along with the DIRSIG Megascene.
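Of the UQ approaches listed, the ensemble idea is the simplest to sketch. The example below is a minimal illustration and not the framework described above: it assumes a bootstrap ensemble of one-dimensional linear regressors in place of neural networks, a small synthetic labeled set, and the spread of member predictions as the uncertainty estimate. All names and values are hypothetical.

```python
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least squares for a single predictor; returns (slope, intercept)."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx

def bootstrap_ensemble(xs, ys, n_members=50, seed=0):
    """Fit each member on a resample (with replacement) of the labeled data."""
    rng = random.Random(seed)
    n = len(xs)
    members = []
    for _ in range(n_members):
        idx = [rng.randrange(n) for _ in range(n)]
        members.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return members

def predict(members, x):
    """Ensemble mean and standard deviation of the member predictions at x."""
    preds = [slope * x + intercept for slope, intercept in members]
    return statistics.mean(preds), statistics.stdev(preds)

# A small labeled set, mimicking the few-label setting (synthetic data).
rng = random.Random(42)
xs = [i / 11 for i in range(12)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 0.2) for x in xs]

members = bootstrap_ensemble(xs, ys)
mean_in, sd_in = predict(members, 0.5)     # inside the training range
mean_out, sd_out = predict(members, 10.0)  # far outside the training range
print(mean_in, sd_in)
print(mean_out, sd_out)
```

The member disagreement grows for inputs far from the labeled data, which is the behavior an ensemble-based UQ method relies on to flag rare or out-of-distribution samples.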