We present a high-level architecture for how artificial intelligences might advance and accumulate scientific and technological knowledge, inspired by emerging perspectives on how human intelligences advance and accumulate such knowledge. Agents advance knowledge by exercising a technoscientific method—an interacting combination of scientific and engineering methods. The technoscientific method maximizes a quantity we call “useful learning” via more-creative implausible utility (including the “aha!” moments of discovery), as well as via less-creative plausible utility. Society accumulates the knowledge advanced by agents so that other agents can incorporate it and build on it to make further advances. The proposed architecture is challenging but potentially complete: its execution might in principle enable artificial intelligences to advance and accumulate an equivalent of the full range of human scientific and technological knowledge.
The increasing use of machine learning (ML) models to support high-consequence decision making drives a need to increase the rigor of ML-based decision making. Critical problems ranging from climate change to nonproliferation monitoring rely on machine learning for aspects of their analyses. Likewise, future technologies, such as the incorporation of data-driven methods into stockpile surveillance and predictive failure analysis for weapons components, will rely on decision making that incorporates the output of machine learning models. In this project, our main focus was the development of decision-scientific methods that combine uncertainty estimates for machine learning predictions with a domain-specific model of error costs. Other focus areas include uncertainty measurement in ML predictions, designing decision rules using multiobjective optimization, the value of uncertainty reduction, and decision-tailored uncertainty quantification for probability estimates. By laying foundations for rigorous decision making based on the predictions of machine learning models, these approaches are directly relevant to every national security mission that applies, or will apply, machine learning to data, most of which entail some decision context.
Reverse engineering (RE) analysts struggle to address critical questions about the safety of binary code accurately and promptly, and their supporting program analysis tools are simply wrong sometimes. The analysis tools must approximate in order to provide any information at all, but these approximations introduce uncertainty into their results, and those uncertainties chain from analysis to analysis. We hypothesize that exposing the sources, impacts, and control of uncertainty to human binary analysts will allow the analysts to approach their hardest problems with high-powered analytic techniques that they know when to trust. Combining expertise in binary analysis algorithms, human cognition, uncertainty quantification, verification and validation, and visualization, we pursue research that should benefit binary software analysis efforts across the board. We find a strong analogy between RE and exploratory data analysis (EDA); we begin to characterize sources and types of uncertainty found in practice in RE (both in the process and in supporting analyses); we explore a domain-specific focus on uncertainty in pointer analysis, showing that more precise models do help analysts answer small information flow questions faster and more accurately; and we test a general population with domain-general sudoku problems, showing that adding "knobs" to an analysis does not significantly slow down performance. This document describes our explorations in uncertainty in binary analysis.
On April 6-8, 2021, Sandia National Laboratories hosted a virtual workshop to explore the potential for developing AI-Enhanced Co-Design for Next-Generation Microelectronics (AICoM). The workshop brought together two themes. The first theme was articulated in the 2018 Department of Energy Office of Science (DOE SC) “Basic Research Needs for Microelectronics” (BRN) report, which called for a “fundamental rethinking” of the traditional design approach to microelectronics, in which subject matter experts (SMEs) in each microelectronics discipline (materials, devices, circuits, algorithms, etc.) work near-independently. Instead, the BRN called for a non-hierarchical, egalitarian vision of co-design, wherein “each scientific discipline informs and engages the others” in “parallel but intimately networked efforts to create radically new capabilities.” The second theme was the recognition of the continuing breakthroughs in artificial intelligence (AI) that are currently enhancing and accelerating the solution of traditional design problems in materials science, circuit design, and electronic design automation (EDA).
For digital twins (DTs) to become a central fixture in mission-critical systems, a better understanding of potential failure modes is required, along with quantification of uncertainty and the ability to explain a model’s behavior. These aspects are particularly important because the performance of a digital twin will evolve during model development and deployment for real-world operations.
Signal arrival-time estimation plays a critical role in a variety of downstream seismic analyses, including location estimation and source characterization. Any arrival-time errors propagate through subsequent data-processing results. In this article, we detail a general framework for refining estimated seismic signal arrival times along with full estimation of their associated uncertainty. Using the standard short-term average/long-term average (STA/LTA) threshold algorithm to identify a search window, we demonstrate how to refine the pick estimate through two different approaches. In both cases, new waveform realizations are generated through bootstrap algorithms to produce full a posteriori estimates of the uncertainty in the onset arrival time of the seismic signal. The onset arrival uncertainty estimates provide additional data-derived information from the signal and have the potential to influence seismic analysis along several fronts.
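The combination of an STA/LTA trigger with bootstrap resampling can be sketched as follows. This is a minimal illustration, not the report's method: the window lengths, threshold, and the simple noise-resampling scheme are all assumptions chosen for clarity.

```python
import numpy as np

def moving_avg(a, n):
    # Trailing (causal) moving average; the first n-1 samples use a growing window.
    c = np.cumsum(a)
    out = c.copy()
    out[n:] = c[n:] - c[:-n]
    denom = np.minimum(np.arange(1, a.size + 1), n)
    return out / denom

def sta_lta(x, n_sta, n_lta):
    # Classic STA/LTA ratio: short-term average energy over long-term average energy.
    sq = x ** 2
    sta = moving_avg(sq, n_sta)
    lta = moving_avg(sq, n_lta)
    return sta / np.maximum(lta, np.finfo(float).eps)

def bootstrap_picks(x, n_sta=20, n_lta=200, threshold=4.0, n_boot=200, rng=None):
    # Re-pick the onset on noise-perturbed waveform realizations to obtain a
    # distribution over arrival-time indices -- a simple bootstrap stand-in for
    # a full a posteriori uncertainty estimate. Assumes x[:n_lta] is pre-signal noise.
    rng = np.random.default_rng(rng)
    noise = x[:n_lta]
    picks = []
    for _ in range(n_boot):
        perturbed = x + rng.choice(noise, size=x.size, replace=True)
        ratio = sta_lta(perturbed, n_sta, n_lta)
        above = np.flatnonzero(ratio[n_lta:] > threshold)  # skip the warm-up window
        if above.size:
            picks.append(above[0] + n_lta)
    return np.asarray(picks)
```

The spread of the returned pick indices (e.g., their standard deviation or credible interval) is the data-derived onset-time uncertainty that downstream location estimates could then propagate.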
The Arctic is warming, and feedbacks in the coupled Earth system may be driving the Arctic to tipping events that could have critical downstream impacts for the rest of the globe. In this project we have focused on analyzing sea ice variability and loss in the coupled Earth system. Summer sea ice loss is happening rapidly, and although the loss may be smooth and reversible, it has significant consequences for other Arctic systems as well as geopolitical and economic implications. Accurate seasonal predictions of sea ice minimum extent and long-term estimates of timing for a seasonally ice-free Arctic depend on a better understanding of the factors influencing sea ice dynamics and variation in this strongly coupled system. Under this project we have investigated the most influential factors in accurate predictions of September Arctic sea ice extent using machine learning models trained separately on observational data and on simulation data from five E3SM historical ensembles. Monthly averaged data from June, July, and August for a selection of ice, ocean, and atmosphere variables were used to train a random forest regression model. Gini importance measures were computed for each input feature with the testing data. We found that sea ice volume is most important earlier in the season (June) and that sea ice extent becomes a more important predictor closer to September. Results from this study provide insight into how feature importance changes with forecast length and illustrate differences between observational data and simulated Earth system data. We have additionally performed a global sensitivity analysis (GSA) using a fully coupled, ultra-low-resolution configuration of E3SM. To our knowledge, this is the first global sensitivity analysis involving the fully coupled E3SM Earth system model. We have found that parameter variations have a significant impact on the Arctic climate state and that atmospheric parameters related to cloud parameterizations are the most significant.
We also find significant interactions between parameters from different components of E3SM. The results of this study provide invaluable insight into the relative influence of various parameters from the sea ice, atmosphere, and ocean components of E3SM (including cross-component parameter interactions) on various Arctic-focused quantities of interest (QOIs).
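The random forest feature-importance workflow described above can be sketched with synthetic data. The variable names and coefficients here are hypothetical stand-ins for the study's monthly averaged ice, ocean, and atmosphere predictors; this uses scikit-learn's impurity-based (Gini) importance on a fit model, which is one common way to compute such measures.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical predictors: by construction, ice_volume drives the target strongly,
# ice_extent weakly, and wind_speed not at all.
rng = np.random.default_rng(0)
n = 500
ice_volume = rng.normal(size=n)
ice_extent = rng.normal(size=n)
wind_speed = rng.normal(size=n)
september_extent = 3.0 * ice_volume + 0.5 * ice_extent + rng.normal(scale=0.3, size=n)

X = np.column_stack([ice_volume, ice_extent, wind_speed])
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, september_extent)

# Gini (impurity-based) importance for each input feature, normalized to sum to 1.
importances = dict(zip(["ice_volume", "ice_extent", "wind_speed"],
                       model.feature_importances_))
```

Repeating this fit with predictors from different months would show how the importance ranking shifts with forecast length, as the study reports for June versus late-summer predictors.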
As artificial intelligence, machine learning, and statistical modeling methods become commonplace in national security applications, the drive to create trusted analytics becomes increasingly important. The goal of this report is to identify areas of research that can provide the foundational understanding and technical prerequisites for the development and deployment of trusted analytics in national security settings. Our review of the literature covered several disjoint research communities, including computer science, statistics, human factors, and several branches of psychology and cognitive science, which tend not to interact with one another or cite each other's literatures. As a result, there exists no agreed-upon theoretical framework for understanding how various factors influence trust and no well-established empirical paradigm for studying these effects. This report therefore takes three steps. First, we define several key terms in an effort to provide a unifying language for trusted analytics and to manage the scope of the problem. Second, we outline an empirical perspective that identifies key independent, moderating, and dependent variables in assessing trusted analytics. Though not a substitute for a theoretical framework, the empirical perspective does support research and development of trusted analytics in the national security domain. Finally, we discuss several research gaps relevant to developing trusted analytics for the national security mission space.
This report summarizes the results of an LDRD focused on developing and demonstrating statistically rigorous methods for analyzing and comparing complex activities from remote sensing data. Identifying activities from remote sensing data, particularly those that play out over time and span multiple locations, often requires extensive manual effort because of the variety of features that describe the activity and the required domain expertise. Our results suggest that there are some hidden challenges in extracting and representing activities in sensor data. In particular, we found that the variability in the underlying behaviors can be difficult to overcome statistically, and the report identifies several examples of the issue. We discuss key lessons learned in the context of the project, and finally conclude with recommendations on next steps and future work.
In this report, we present preliminary research into nonparametric clustering methods for multi-source imagery data and into quantifying the performance of these models. In many domain areas, data sets do not necessarily follow well-defined and well-known probability distributions, such as the normal, gamma, and exponential. This is especially true when combining data from multiple sources describing a common set of objects (which we call multimodal analysis), where the data in each source can follow different distributions and need to be analyzed in conjunction with one another. This necessitates nonparametric density estimation methods, which allow the data to better dictate the distribution of the data. One prominent example of multimodal analysis is multimodal image analysis, in which we analyze multiple images of the same scene of interest taken using different radar systems. We develop uncertainty analysis methods, which are inherent in the use of probabilistic models but often not taken advantage of, to assess the performance of probabilistic clustering methods used for analyzing multimodal images. This added information helps assess model performance and how much trust decision-makers should have in the obtained analysis results. The developed methods illustrate some ways in which uncertainty can inform decisions that arise when designing and using machine learning models.
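The motivation for nonparametric density estimation can be illustrated with a minimal one-dimensional Gaussian kernel density estimator. This is a generic sketch of the technique, not the report's specific clustering models; the bandwidth rule (Silverman's rule of thumb) is a common default assumption.

```python
import numpy as np

def gaussian_kde(samples, bandwidth=None):
    # Return a 1-D Gaussian kernel density estimate as a callable, letting the
    # data dictate the shape of the distribution rather than assuming a
    # parametric family. Bandwidth defaults to Silverman's rule of thumb.
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    if bandwidth is None:
        bandwidth = 1.06 * samples.std(ddof=1) * n ** (-1 / 5)
    def density(x):
        x = np.atleast_1d(np.asarray(x, dtype=float))
        z = (x[:, None] - samples[None, :]) / bandwidth
        return np.exp(-0.5 * z ** 2).sum(axis=1) / (n * bandwidth * np.sqrt(2 * np.pi))
    return density
```

For multimodal analysis, each data source would get its own data-driven density of this kind, and the per-source densities could then be combined within a probabilistic clustering model.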
Data-driven modeling, including machine learning methods, continues to play an increasing role in society. Data-driven methods impact decision making for applications ranging from everyday determinations about which news people see and control of self-driving cars to high-consequence national security situations related to cyber security and analysis of nuclear weapons reliability. Although modern machine learning methods have made great strides in model induction and show excellent performance in a broad variety of complex domains, uncertainty remains an inherent aspect of any data-driven model. In this report, we provide an update to the preliminary results on uncertainty quantification for machine learning presented in SAND2017-6776. Specifically, we improve upon the general problem definition and expand upon the experiments conducted for the earlier report. Most importantly, we summarize key lessons learned about how and when uncertainty quantification can inform decision making and provide valuable insights into the quality of learned models and potential improvements to them.
We discuss uncertainty quantification in multisensor data integration and analysis, including estimation methods and the role of uncertainty in decision making and trust in automated analytics. The challenges associated with automatically aggregating information across multiple images, identifying subtle contextual cues, and detecting small changes in noisy activity patterns are well-established in the intelligence, surveillance, and reconnaissance (ISR) community. In practice, such questions cannot be adequately addressed with discrete counting, hard classifications, or yes/no answers. For a variety of reasons ranging from data quality to modeling assumptions to inadequate definitions of what constitutes "interesting" activity, variability is inherent in the output of automated analytics, yet it is rarely reported. Consideration of these uncertainties can provide nuance to automated analyses and engender trust in their results. In this work, we assert the importance of uncertainty quantification for automated data analytics and outline a research agenda. We begin by defining uncertainty in the context of machine learning and statistical data analysis, identify its sources, and motivate the importance and impact of its quantification. We then illustrate these issues and discuss methods for data-driven uncertainty quantification in the context of a multi-source image analysis example. We conclude by identifying several specific research issues and by discussing the potential long-term implications of uncertainty quantification for data analytics, including sensor tasking and analyst trust in automated analytics.
Our goal is to develop a general theoretical basis for quantifying uncertainty in supervised machine learning models. Current machine learning accuracy-based validation metrics indicate how well a classifier performs on a given data set as a whole. However, these metrics do not tell us a model's efficacy in predicting particular samples. We quantify uncertainty by constructing probability distributions of the predictions made by an ensemble of classifiers. This report details our initial investigations into uncertainty quantification for supervised machine learning. We apply an uncertainty analysis to the problem of malicious website detection. Machine learning models can be trained to find suspicious characteristics in the text of a website's Uniform Resource Locator (URL). However, given the vast numbers of URLs and the ever-changing tactics of malicious actors, it will always be possible to find sets of websites which are outliers with respect to a model's hypothesis. Therefore, we seek to understand a model's per-sample reliability when classifying URL data. Acknowledgements: This work was funded by the Sandia National Laboratories Laboratory Directed Research and Development (LDRD) program.
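The ensemble-based approach to per-sample uncertainty can be sketched as follows. The nearest-centroid base classifier is a deliberately simple stand-in chosen for brevity, not the report's model; the idea illustrated is only that per-sample spread across a bootstrapped ensemble flags low-reliability predictions.

```python
import numpy as np

def ensemble_uncertainty(X_train, y_train, X_query, n_models=100, rng=None):
    # Bootstrap an ensemble of nearest-centroid binary classifiers and return,
    # per query sample, the mean and standard deviation of the ensemble's
    # class-1 votes. High std marks samples the ensemble disagrees on.
    rng = np.random.default_rng(rng)
    n = len(y_train)
    votes = np.empty((n_models, len(X_query)))
    for b in range(n_models):
        idx = rng.integers(0, n, size=n)        # bootstrap resample of the training set
        Xb, yb = X_train[idx], y_train[idx]
        c0 = Xb[yb == 0].mean(axis=0)           # class centroids for this resample
        c1 = Xb[yb == 1].mean(axis=0)
        d0 = np.linalg.norm(X_query - c0, axis=1)
        d1 = np.linalg.norm(X_query - c1, axis=1)
        votes[b] = (d1 < d0).astype(float)      # hard vote for class 1
    return votes.mean(axis=0), votes.std(axis=0)
```

A query that sits between the classes yields a wide vote distribution even when the mean prediction looks confident in aggregate, which is exactly the per-sample signal that whole-data-set accuracy metrics miss.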
Visual clutter metrics play an important role in both the design of information visualizations and in the continued theoretical development of visual search models. In visualization design, clutter metrics provide a mathematical prediction of the complexity of the display and the difficulty associated with locating and identifying key pieces of information. In visual search models, they offer a proxy to set size, which represents the number of objects in the search scene, but is difficult to estimate in real-world imagery. In this article, we first briefly review the literature on clutter metrics and then contribute our own results drawn from studies in two security-oriented visual search domains: airport X-ray imagery and radar imagery. We analyze our results with an eye toward bridging the gap between the scene features evaluated by current clutter metrics and the features that are relevant to our security tasks. The article concludes with a brief discussion of possible research steps to close this gap.
In this paper, we assert the importance of uncertainty quantification for machine learning and sketch an initial research agenda. We define uncertainty in the context of machine learning, identify its sources, and motivate the importance and impact of its quantification. We then illustrate these issues with an image analysis example. The paper concludes by identifying several specific research issues and by discussing the potential long-term implications of uncertainty quantification for data analytics in general.
This summary of PANTHER Human Analytics work describes three of the team's major work activities: research with teams to elicit and document work practices; experimental studies of visual search performance and visual attention; and the application of spatio-temporal algorithms to the analysis of eye tracking data. Our intent is to provide a basic introduction to the work area and a selected set of representative HA team publications as a starting point for readers interested in our team's work.
Geospatial semantic graphs provide a robust foundation for representing and analyzing remote sensor data. In particular, they support a variety of pattern search operations that capture the spatial and temporal relationships among the objects and events in the data. However, in the presence of large data corpora, even a carefully constructed search query may return a large number of unintended matches. This work considers the problem of calculating a quality score for each match to the query, given that the underlying data are uncertain. We present a preliminary evaluation of three methods for determining both match quality scores and associated uncertainty bounds, illustrated in the context of an example based on overhead imagery data.
This issue features expanded versions of articles selected from the 2014 AAAI Conference on Innovative Applications of Artificial Intelligence held in Quebec City, Canada. We present a selection of four articles describing deployed applications plus two more articles that discuss work on emerging applications.
This report summarizes preliminary research into uncertainty quantification for pattern analytics within the context of the Pattern Analytics to Support High-Performance Exploitation and Reasoning (PANTHER) project. The primary focus of PANTHER was to make large quantities of remote sensing data searchable by analysts. The work described in this report adds nuance to both the initial data preparation steps and the search process. Search queries are transformed from "does the specified pattern exist in the data?" to "how certain is the system that the returned results match the query?" We show example results for both data processing and search, and discuss a number of possible improvements for each.
Visual search data describe people’s performance on the common perceptual problem of identifying target objects in a complex scene. Technological advances in areas such as eye tracking now provide researchers with a wealth of data not previously available. The goal of this work is to support researchers in analyzing this complex and multimodal data and in developing new insights into visual search techniques. We discuss several methods drawn from the statistics and machine learning literature for integrating visual search data derived from multiple sources and performing exploratory data analysis. We ground our discussion in a specific task performed by officers at the Transportation Security Administration and consider the applicability, likely issues, and possible adaptations of several candidate analysis methods.
Inference techniques play a central role in many cognitive systems. They transform low-level observations of the environment into high-level, actionable knowledge which then gets used by mechanisms that drive action, problem-solving, and learning. This paper presents an initial effort at combining results from AI and psychology into a pragmatic and scalable computational reasoning system. Our approach combines a numeric notion of plausibility with first-order logic to produce an incremental inference engine that is guided by heuristics derived from the psychological literature. We illustrate core ideas with detailed examples and discuss the advantages of the approach with respect to cognitive systems.