Publications Search

Improving and Assessing the Quality of Uncertainty Quantification in Deep Learning

Adams, Jason R.; Baiyasi, Rashad; Berman, Brandon; Darling, Michael C.; Ganter, Tyler; Michalenko, Joshua J.; Patel, Lekha; Ries, Daniel; Liang, Feng; Qian, Christopher; Roy, Krishna

Deep learning (DL) models have enjoyed increased attention in recent years because of their powerful predictive capabilities. While many successes have been achieved, standard deep learning methods suffer from a lack of uncertainty quantification (UQ). While the development of methods for producing UQ from DL models is an active area of current research, little attention has been given to the quality of the UQ produced by such methods. In order to deploy DL models to high-consequence applications, high-quality UQ is necessary. This report details the research and development conducted as part of a Laboratory Directed Research and Development (LDRD) project at Sandia National Laboratories. The focus of this project is to develop a framework of methods and metrics for the principled assessment of UQ quality in DL models. This report presents an overview of UQ quality assessment in traditional statistical modeling and describes why this approach is difficult to apply in DL contexts. An assessment on relatively simple simulated data is presented to demonstrate that UQ quality can differ greatly between DL models trained on the same data. A method for simulating image data that can then be used for UQ quality assessment is described. A general method for simulating realistic data for the purpose of assessing a model’s UQ quality is also presented. A Bayesian uncertainty framework for understanding uncertainty and existing metrics is described. Research that came out of collaborations with two university partners are discussed along with a software toolkit that is currently being developed to implement the UQ quality assessment framework as well as serve as a general guide to incorporating UQ into DL applications.

More Details

TYPE LDRD Report YEAR 2023

DOI OSTI

Machine Learning and Autonomous Decision Making for National Security: Designing Decision Rules using Multiobjective Optimization

Smith, Mark A.; Darling, Michael C.

Abstract not provided.

More Details

TYPE Conference Presentation YEAR 2023

DOI OSTI

Preliminary Results for Using Uncertainty and Out-of-distribution Detection to Identify Unreliable Predictions

Doak, Justin E.; Darling, Michael C.

As machine learning (ML) models are deployed into an ever-diversifying set of application spaces, ranging from self-driving cars to cybersecurity to climate modeling, the need to carefully evaluate model credibility becomes increasingly important. Uncertainty quantification (UQ) provides important information about the ability of a learned model to make sound predictions, often with respect to individual test cases. However, most UQ methods for ML are themselves data-driven and therefore susceptible to the same knowledge gaps as the models themselves. Specifically, UQ helps to identify points near decision boundaries where the models fit the data poorly, yet predictions can score as certain for points that are under-represented by the training data and thus out-of-distribution (OOD). One method for evaluating the quality of both ML models and their associated uncertainty estimates is out-of-distribution detection (OODD). We combine OODD with UQ to provide insights into the reliability of the individual predictions made by an ML model.

More Details

TYPE SAND Report YEAR 2022

DOI OSTI

A Decision Theoretic Approach To Optimizing Machine Learning Decisions with Prediction Uncertainty

Field, Richard V.; Darling, Michael C.

While the use of machine learning (ML) classifiers is widespread, their output is often not part of any follow-on decision-making process. To illustrate, consider the scenario where we have developed and trained an ML classifier to find malicious URL links. In this scenario, network administrators must decide whether to allow a computer user to visit a particular website, or to instead block access because the site is deemed malicious. It would be very beneficial if decisions such as these could be made automatically using a trained ML classifier. Unfortunately, due to a variety of reasons discussed herein, the output from these classifiers can be uncertain, rendering downstream decisions difficult. Herein, we provide a framework for: (1) quantifying and propagating uncertainty in ML classifiers; (2) formally linking ML outputs with the decision-making process; and (3) making optimal decisions for classification under uncertainty with single or multiple objectives.

More Details

TYPE SAND Report YEAR 2022

DOI OSTI

Decision Science for Machine Learning (DeSciML)

Darling, Michael C.; Field, Richard V.; Smith, Mark A.; Doak, Justin E.; Headen, James M.; Stracuzzi, David J.

The increasing use of machine learning (ML) models to support high-consequence decision making drives a need to increase the rigor of ML-based decision making. Critical problems ranging from climate change to nonproliferation monitoring rely on machine learning for aspects of their analyses. Likewise, future technologies, such as incorporation of data-driven methods into the stockpile surveillance and predictive failure analysis for weapons components, will all rely on decision-making that incorporates the output of machine learning models. In this project, our main focus was the development of decision scientific methods that combine uncertainty estimates for machine learning predictions, with a domain-specific model of error costs. Other focus areas include uncertainty measurement in ML predictions, designing decision rules using multiobjecive optimization, the value of uncertainty reduction, and decision-tailored uncertainty quantification for probability estimates. By laying foundations for rigorous decision making based on the predictions of machine learning models, these approaches are directly relevant to every national security mission that applies, or will apply, machine learning to data, most of which entail some decision context.

More Details

TYPE SAND Report YEAR 2022

DOI OSTI

Decision Science for Machine Learning (DeSciML)

Field Jr., Richard V.; Darling, Michael C.; Doak, Justin E.; Headen, James M.; Smith, Mark A.; Bickel, Eric; Boada, Jason; Smith, Zack

Abstract not provided.

More Details

TYPE Conference Presentation YEAR 2022

DOI OSTI

Value of Uncertainty Reduction for Machine Learning Predictions

Boada, Jason; Darling, Michael C.; Bickel, Eric

Abstract not provided.

More Details

TYPE Presentation YEAR 2022

OSTI

Exploring Explicit Uncertainty for Binary Analysis (EUBA)

Leger, Michelle A.; Darling, Michael C.; Jones, Stephen T.; Matzen, Laura E.; Stracuzzi, David J.; Wilson, Andrew T.; Bueno, Denis; Christentsen, Matthew; Ginaldi, Melissa; Foulk, James W.; Heidbrink, Scott; Howell, Breannan C.; Leger, Chris; Reedy, Geoffrey; Rogers, Alisa; Williams, Jack

Reverse engineering (RE) analysts struggle to address critical questions about the safety of binary code accurately and promptly, and their supporting program analysis tools are simply wrong sometimes. The analysis tools have to approximate in order to provide any information at all, but this means that they introduce uncertainty into their results. And those uncertainties chain from analysis to analysis. We hypothesize that exposing sources, impacts, and control of uncertainty to human binary analysts will allow the analysts to approach their hardest problems with high-powered analytic techniques that they know when to trust. Combining expertise in binary analysis algorithms, human cognition, uncertainty quantification, verification and validation, and visualization, we pursue research that should benefit binary software analysis efforts across the board. We find a strong analogy between RE and exploratory data analysis (EDA); we begin to characterize sources and types of uncertainty found in practice in RE (both in the process and in supporting analyses); we explore a domain-specific focus on uncertainty in pointer analysis, showing that more precise models do help analysts answer small information flow questions faster and more accurately; and we test a general population with domain-general sudoku problems, showing that adding "knobs" to an analysis does not significantly slow down performance. This document describes our explorations in uncertainty in binary analysis.

More Details

TYPE SAND Report YEAR 2021

DOI OSTI

Optimizing Machine Learning Decisions with Prediction Uncertainty

Darling, Michael C.; Doak, Justin E.; Field, Richard V.; Smith, Mark A.

Abstract not provided.

More Details

TYPE Conference Presentation YEAR 2021

DOI OSTI

Optimizing Machine Learning Predictions with Prediction Uncertainty

Smith, Zachary J.; Darling, Michael C.; Stracuzzi, David J.; Doak, Justin E.; Smith, Mark A.; Bickel, J.E.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2021

DOI OSTI

A Novel Measure of Uncertainty For Machine Learning Predictions

Darling, Michael C.; Hush, Don R.; Stracuzzi, David J.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2020

OSTI

Using Uncertainty To Interpret Supervised Machine Learning Predictions

Darling, Michael C.

Abstract not provided.

More Details

TYPE Thesis or Dissertation YEAR 2019

OSTI

Preliminary Results on Applying Nonparametric Clustering and Bayesian Consensus Clustering Methods to Multimodal Data

Chen, Maximillian G.; Darling, Michael C.; Stracuzzi, David J.

In this report, we present preliminary research into nonparametric clustering methods for multi-source imagery data and quantifying the performance of these models. In many domain areas, data sets do not necessarily follow well-defined and well-known probability distributions, such as the normal, gamma, and exponential. This is especially true when combining data from multiple sources describing a common set of objects (which we call multimodal analysis), where the data in each source can follow different distributions and need to be analyzed in conjunction with one another. This necessitates nonparametric density estimation methods, which allow the data to better dictate the distribution of the data. One prominent example of multimodal analysis is multimodal image analysis, when we analyze multiple images taken using different radar systems of the same scene of interest. We develop uncertainty analysis methods, which are inherent in the use of probabilistic models but often not taken advance of, to assess the performance of probabilistic clustering methods used for analyzing multimodal images. This added information helps assess model performance and how much trust decision-makers should have in the obtained analysis results. The developed methods illustrate some ways in which uncertainty can inform decisions that arise when designing and using machine learning models.

More Details

TYPE SAND Report YEAR 2018

DOI OSTI

Quantifying Uncertainty to Improve Decision Making in Machine Learning

Stracuzzi, David J.; Darling, Michael C.; Peterson, Matthew G.; Chen, Maximillian G.

Data-driven modeling, including machine learning methods, continue to play an increasing role in society. Data-driven methods impact decision making for applications ranging from everyday determinations about which news people see and control of self-driving cars to high-consequence national security situations related to cyber security and analysis of nuclear weapons reliability. Although modern machine learning methods have made great strides in model induction and show excellent performance in a broad variety of complex domains, uncertainty remains an inherent aspect of any data-driven model. In this report, we provide an update to the preliminary results on uncertainty quantification for machine learning presented in SAND2017-6776. Specifically, we improve upon the general problem definition and expand upon the experiments conducted for the earlier re- port. Most importantly, we summarize key lessons learned about how and when uncertainty quantification can inform decision making and provide valuable insights into the quality of learned models and potential improvements to them.

More Details

TYPE SAND Report YEAR 2018

DOI OSTI

A Mathematical Framework for Uncertainty Quantification in Multimodal Image Analysis via Probabilistic Clustering Models

Chen, Maximillian G.; Stracuzzi, David J.; Darling, Michael C.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2018

OSTI

A Mathematical Framework for Uncertainty Quantification in Multimodal Image Analysis via Probabilistic Clustering Models

Chen, Maximillian G.; Stracuzzi, David J.; Darling, Michael C.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2018

OSTI

Using Uncertainty to Understand Machine Learning Models and Decisions

Chen, Maximillian G.; Darling, Michael C.; Vollmer, Charlie; Peterson, Matthew G.; Stracuzzi, David J.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2018

OSTI

Data-Driven Uncertainty Quantification for Multi-Sensor Analytics

Stracuzzi, David J.; Darling, Michael C.; Chen, Maximillian G.; Peterson, Matthew G.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2018

OSTI

Intelligent Modeling for Nuclear Power Plant Accident Management

International Journal on Artificial Intelligence Tools

Darling, Michael C.; Luger, George F.; Jones, Thomas B.; Denman, Matthew R.; Groth, Katrina M.

This paper explores the viability of using counterfactual reasoning for impact analyses when understanding and responding to "beyond-design-basis" nuclear power plant accidents. Currently, when a severe nuclear power plant accident occurs, plant operators rely on Severe Accident Management Guidelines. However, the current guidelines are limited in scope and depth: for certain types of accidents, plant operators would have to work to mitigate the damage with limited experience and guidance for the particular situation. We aim to fill the need for comprehensive accident support by using a dynamic Bayesian network to aid in the diagnosis of a nuclear reactor's state and to analyze the impact of possible response measures. The dynamic Bayesian network, DBN, offers an expressive representation of the components and relationships that make up a complex causal system. For this reason, and for its tractable reasoning, the DBN supports a functional model for the intricate operations of nuclear power plants. In this domain, it is also pertinent that a Bayesian network can be composed of both probabilistic and knowledge-based components. Though probabilities can be calculated from simulated models, the structure of the network, as well as the value of some parameters, must be assigned by human experts. Since dynamic Bayesian network-based systems are capable of running better-than-real-time situation analyses, they can support both current event and alternate scenario impact analyses.

More Details

TYPE Journal Article YEAR 2018

DOI OSTI Scopus

Uncertainty Propagation In Multilayer Analysis

Darling, Michael C.; Stracuzzi, David J.; Chen, Maximillian G.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2018

OSTI

Multimodal Image Analysis and Uncertainty Quantification via Nonparametric Probabilistic Clustering

Chen, Maximillian G.; Darling, Michael C.; Stracuzzi, David J.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2018

OSTI

Data-driven uncertainty quantification for multisensor analytics

Proceedings of SPIE - The International Society for Optical Engineering

Stracuzzi, David J.; Darling, Michael C.; Chen, Maximillian G.; Peterson, Matthew G.

We discuss uncertainty quantification in multisensor data integration and analysis, including estimation methods and the role of uncertainty in decision making and trust in automated analytics. The challenges associated with automatically aggregating information across multiple images, identifying subtle contextual cues, and detecting small changes in noisy activity patterns are well-established in the intelligence, surveillance, and reconnaissance (ISR) community. In practice, such questions cannot be adequately addressed with discrete counting, hard classifications, or yes/no answers. For a variety of reasons ranging from data quality to modeling assumptions to inadequate definitions of what constitutes "interesting" activity, variability is inherent in the output of automated analytics, yet it is rarely reported. Consideration of these uncertainties can provide nuance to automated analyses and engender trust in their results. In this work, we assert the importance of uncertainty quantification for automated data analytics and outline a research agenda. We begin by defining uncertainty in the context of machine learning and statistical data analysis, identify its sources, and motivate the importance and impact of its quantification. We then illustrate these issues and discuss methods for data-driven uncertainty quantification in the context of a multi-source image analysis example. We conclude by identifying several specific research issues and by discussing the potential long-term implications of uncertainty quantification for data analytics, including sensor tasking and analyst trust in automated analytics.

More Details

TYPE Conference Poster YEAR 2018

DOI OSTI Scopus

Toward Uncertainty Quantification for Supervised Classification

Darling, Michael C.; Stracuzzi, David J.

Our goal is to develop a general theoretical basis for quantifying uncertainty in supervised machine learning models. Current machine learning accuracy-based validation metrics indicate how well a classifier performs on a given data set as a whole. However, these metrics do not tell us a model's efficacy in predicting particular samples. We quantify uncertainty by constructing probability distributions of the predictions made by an ensemble of classifiers. This report details our initial investigations into uncertainty quantification for supervised machine learning. We apply an uncertainty analysis to the problem of malicious website detection. Machine learning models can be trained to find suspicious characteristics in the text of a website's Uniform Resource Locator (URL). However, given the vast numbers of URLs and the ever changing tactics of malicious actors, it will always be possible to find sets of websites which are outliers with respect to a model's hypothesis. Therefore, we seek to understand a model's per-sample reliability when classifying URL data. Acknowledgements This work was funded by the Sandia National Laboratories Laboratory Directed Research and Development (LDRD) program.

More Details

TYPE SAND Report YEAR 2017

DOI OSTI