Causal discovery algorithms construct hypothesized causal graphs that depict causal dependencies among variables in observational data. While powerful, the accuracy of these algorithms is highly sensitive to the underlying dynamics of the system in ways that have not been fully characterized in the literature. In this report, we benchmark the PCMCI causal discovery algorithm in its application to gridded spatiotemporal systems. Effectively computing grid-level causal graphs on large grids will enable analysis of the causal impacts of transient and mobile spatial phenomena in large systems, such as the Earth’s climate. We evaluate the performance of PCMCI with a set of structural causal models, using simulated spatial vector autoregressive processes in one and two dimensions. We develop computational and analytical tools for characterizing these processes and their associated causal graphs. Our findings suggest that direct application of PCMCI is not suitable for the analysis of dynamical spatiotemporal gridded systems, such as climatological data, without significant preprocessing and downscaling of the data. PCMCI requires unrealistic sample sizes to achieve acceptable performance on even modestly sized problems and suffers from a notable curse of dimensionality. This work suggests that, even under generous structural assumptions, significant additional algorithmic improvements are needed before causal discovery algorithms can be reliably applied to grid-level outputs of Earth system models.
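To make the benchmark setting concrete, the following is a minimal sketch of a simulated spatial vector autoregressive process, assuming a lag-1 model on a periodic one-dimensional grid with nearest-neighbor coupling. The function names, coefficients, and grid size are illustrative assumptions and are not taken from the report. The nonzero entries of the coefficient matrix define the ground-truth causal graph against which a discovery algorithm could be scored.

```python
import random


def ring_var_coefficients(n_cells, self_coef, neighbor_coef):
    """Lag-1 coefficient matrix for a periodic 1D grid (hypothetical setup).

    Entry A[i][j] is the influence of cell j at time t-1 on cell i at time t.
    Each cell depends on itself and its two nearest neighbors.
    """
    A = [[0.0] * n_cells for _ in range(n_cells)]
    for i in range(n_cells):
        A[i][i] += self_coef
        A[i][(i - 1) % n_cells] += neighbor_coef
        A[i][(i + 1) % n_cells] += neighbor_coef
    return A


def max_abs_row_sum(A):
    """Induced infinity norm of A.

    A value below 1 is a sufficient (not necessary) condition for the
    spectral radius to be below 1, i.e. for the VAR(1) process to be stable.
    """
    return max(sum(abs(a) for a in row) for row in A)


def simulate_var1(A, n_steps, noise_std=1.0, seed=0):
    """Simulate x_t = A x_{t-1} + e_t with independent Gaussian noise."""
    rng = random.Random(seed)
    n = len(A)
    x = [0.0] * n
    series = []
    for _ in range(n_steps):
        x = [
            sum(A[i][j] * x[j] for j in range(n)) + rng.gauss(0.0, noise_std)
            for i in range(n)
        ]
        series.append(x)
    return series


def true_causal_links(A, tol=1e-12):
    """Ground-truth lag-1 links as (source cell, target cell) pairs."""
    return {
        (j, i)
        for i, row in enumerate(A)
        for j, a in enumerate(row)
        if abs(a) > tol
    }


# Illustrative values only: 8 grid cells, 500 time steps.
A = ring_var_coefficients(n_cells=8, self_coef=0.4, neighbor_coef=0.2)
assert max_abs_row_sum(A) < 1.0  # sufficient condition for stability
series = simulate_var1(A, n_steps=500)
links = true_causal_links(A)
```

Even in this small sketch the number of variables grows with the number of grid cells, and the number of candidate lagged links grows with its square, which gives some intuition for the dimensionality problem described above.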
In September of 2020, Arctic sea ice extent was the second-lowest on record. State-of-the-art climate prediction uses Earth system models (ESMs), driven by systems of differential equations representing the laws of physics. Historically, these models have tended to underestimate Arctic sea ice loss. The issue is grave because accurate modeling is critical for economic, ecological, and geopolitical planning. We use machine learning techniques, including random forest regression and Gini importance, to show that the Energy Exascale Earth System Model (E3SM) relies too heavily on just one of the ten chosen climatological quantities to predict September sea ice averages. Furthermore, E3SM gives too much importance to six of those quantities when compared to observed data. Identifying the features that climate models incorrectly rely on should allow climatologists to improve prediction accuracy.
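The general technique named above, random forest regression followed by impurity-based (Gini) feature importance, can be sketched as follows. This is an illustration on synthetic data, assuming scikit-learn and NumPy are available. The feature names, coefficients, and sample size are placeholders invented for the example, and they do not represent the study's variables, data, or results.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data: four placeholder predictors and one target.
# Nothing here is real climate data.
rng = np.random.default_rng(0)
n_samples = 400
feature_names = ["feature_a", "feature_b", "feature_c", "feature_d"]
X = rng.normal(size=(n_samples, len(feature_names)))

# By construction the target depends strongly on feature_a, weakly on
# feature_b, and not at all on the remaining two features.
y = 3.0 * X[:, 0] + 1.0 * X[:, 1] + 0.1 * rng.normal(size=n_samples)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X, y)

# feature_importances_ holds the impurity-based importances, normalized
# so that they sum to 1 across features.
importances = dict(zip(feature_names, model.feature_importances_))
ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)

for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

Fitting the same model separately to observational and simulated data and comparing the two rankings is one way to carry out the kind of comparison described above. Impurity-based importances are computed from the fitted trees and can be biased toward high-variance or high-cardinality features, so they are best read as relative rankings rather than effect sizes.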
We use a nascent data-driven causal discovery method to find and compare causal relationships in observed data and climate model output. We consider ten different features of the Arctic climate, drawn from public databases of observational data and Energy Exascale Earth System Model (E3SM) output. In identifying and analyzing the resulting causal networks, we make meaningful comparisons between observed and climate model interdependencies. This work demonstrates that the PCMCI causal discovery algorithm can be applied to Arctic climate data, that there are noticeable similarities between observed and simulated Arctic climate dynamics, and that further work is needed to identify specific areas for improvement to better align models with natural observations.
The Arctic is warming, and feedbacks in the coupled Earth system may be driving the Arctic toward tipping events that could have critical downstream impacts for the rest of the globe. In this project we have focused on analyzing sea ice variability and loss in the coupled Earth system. Summer sea ice loss is happening rapidly, and although the loss may be smooth and reversible, it has significant consequences for other Arctic systems as well as geopolitical and economic implications. Accurate seasonal predictions of sea ice minimum extent and long-term estimates of timing for a seasonally ice-free Arctic depend on a better understanding of the factors influencing sea ice dynamics and variation in this strongly coupled system. Under this project we have investigated the most influential factors in accurate predictions of September Arctic sea ice extent using machine learning models trained separately on observational data and on simulation data from five E3SM historical ensembles. Monthly averaged data from June, July, and August for a selection of ice, ocean, and atmosphere variables were used to train a random forest regression model. Gini importance measures were computed for each input feature with the testing data. We found that sea ice volume is most important earlier in the season (June) and that sea ice extent becomes a more important predictor closer to September. Results from this study provide insight into how feature importance changes with forecast length and illustrate differences between observational data and simulated Earth system data. We have additionally performed a global sensitivity analysis (GSA) using a fully coupled, ultra-low-resolution configuration of E3SM. To our knowledge, this is the first global sensitivity analysis involving the fully coupled E3SM Earth system model. We have found that parameter variations have a significant impact on the Arctic climate state and that atmospheric parameters related to cloud parameterizations are the most significant.
We also find significant interactions between parameters from different components of E3SM. The results of this study provide invaluable insight into the relative importance of various parameters from the sea ice, atmosphere and ocean components of the E3SM (including cross-component parameter interactions) on various Arctic-focused quantities of interest (QOIs).