Publications

Results 26–50 of 75


Preliminary Results on Applying Nonparametric Clustering and Bayesian Consensus Clustering Methods to Multimodal Data

Chen, Maximillian G.; Darling, Michael C.; Stracuzzi, David J.

In this report, we present preliminary research into nonparametric clustering methods for multi-source imagery data and into quantifying the performance of these models. In many domain areas, data sets do not necessarily follow well-defined and well-known probability distributions, such as the normal, gamma, and exponential. This is especially true when combining data from multiple sources describing a common set of objects (which we call multimodal analysis), where the data in each source can follow different distributions and must be analyzed in conjunction with one another. This necessitates nonparametric density estimation methods, which let the data themselves dictate the estimated distribution. One prominent example of multimodal analysis is multimodal image analysis, in which we analyze multiple images of the same scene of interest taken using different radar systems. We develop uncertainty analysis methods, which are inherent in the use of probabilistic models but often not taken advantage of, to assess the performance of probabilistic clustering methods used for analyzing multimodal images. This added information helps assess model performance and how much trust decision-makers should have in the obtained analysis results. The developed methods illustrate some ways in which uncertainty can inform decisions that arise when designing and using machine learning models.
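The nonparametric clustering idea described above can be sketched with a Dirichlet-process mixture, which lets the data choose the effective number of clusters rather than fixing it in advance. This is an illustrative sketch using scikit-learn's `BayesianGaussianMixture` on synthetic two-channel data, not the report's actual code or data:

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

# Synthetic two-source "multimodal" features for the same objects:
# one roughly Gaussian channel and one skewed (exponential-like) channel.
rng = np.random.default_rng(0)
a = np.column_stack([rng.normal(0, 1, 200), rng.exponential(1.0, 200)])
b = np.column_stack([rng.normal(5, 1, 200), rng.exponential(3.0, 200)])
X = np.vstack([a, b])

# A Dirichlet-process mixture infers the effective number of clusters;
# n_components is only an upper bound, not a fixed cluster count.
dpgmm = BayesianGaussianMixture(
    n_components=10,
    weight_concentration_prior_type="dirichlet_process",
    random_state=0,
).fit(X)

labels = dpgmm.predict(X)
# Posterior responsibilities give per-sample cluster-membership uncertainty,
# the kind of information the report argues should inform decision-makers.
resp = dpgmm.predict_proba(X)
print("effective clusters:", len(np.unique(labels)))
print("least confident assignment:", resp.max(axis=1).min())
```

The per-sample responsibilities in `resp` are one simple source of the clustering uncertainty the abstract refers to: rows where no component dominates flag objects the model cannot confidently assign.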


Quantifying Uncertainty to Improve Decision Making in Machine Learning

Stracuzzi, David J.; Darling, Michael C.; Peterson, Matthew G.; Chen, Maximillian G.

Data-driven modeling, including machine learning methods, continues to play an increasing role in society. Data-driven methods impact decision making for applications ranging from everyday determinations about which news people see and control of self-driving cars to high-consequence national security situations related to cyber security and analysis of nuclear weapons reliability. Although modern machine learning methods have made great strides in model induction and show excellent performance in a broad variety of complex domains, uncertainty remains an inherent aspect of any data-driven model. In this report, we provide an update to the preliminary results on uncertainty quantification for machine learning presented in SAND2017-6776. Specifically, we improve upon the general problem definition and expand upon the experiments conducted for the earlier report. Most importantly, we summarize key lessons learned about how and when uncertainty quantification can inform decision making and provide valuable insights into the quality of learned models and potential improvements to them.


Data-driven uncertainty quantification for multisensor analytics

Proceedings of SPIE - The International Society for Optical Engineering

Stracuzzi, David J.; Darling, Michael C.; Chen, Maximillian G.; Peterson, Matthew G.

We discuss uncertainty quantification in multisensor data integration and analysis, including estimation methods and the role of uncertainty in decision making and trust in automated analytics. The challenges associated with automatically aggregating information across multiple images, identifying subtle contextual cues, and detecting small changes in noisy activity patterns are well-established in the intelligence, surveillance, and reconnaissance (ISR) community. In practice, such questions cannot be adequately addressed with discrete counting, hard classifications, or yes/no answers. For a variety of reasons ranging from data quality to modeling assumptions to inadequate definitions of what constitutes "interesting" activity, variability is inherent in the output of automated analytics, yet it is rarely reported. Consideration of these uncertainties can provide nuance to automated analyses and engender trust in their results. In this work, we assert the importance of uncertainty quantification for automated data analytics and outline a research agenda. We begin by defining uncertainty in the context of machine learning and statistical data analysis, identify its sources, and motivate the importance and impact of its quantification. We then illustrate these issues and discuss methods for data-driven uncertainty quantification in the context of a multi-source image analysis example. We conclude by identifying several specific research issues and by discussing the potential long-term implications of uncertainty quantification for data analytics, including sensor tasking and analyst trust in automated analytics.


Toward Uncertainty Quantification for Supervised Classification

Darling, Michael C.; Stracuzzi, David J.

Our goal is to develop a general theoretical basis for quantifying uncertainty in supervised machine learning models. Current machine learning accuracy-based validation metrics indicate how well a classifier performs on a given data set as a whole. However, these metrics do not tell us a model's efficacy in predicting particular samples. We quantify uncertainty by constructing probability distributions of the predictions made by an ensemble of classifiers. This report details our initial investigations into uncertainty quantification for supervised machine learning. We apply an uncertainty analysis to the problem of malicious website detection. Machine learning models can be trained to find suspicious characteristics in the text of a website's Uniform Resource Locator (URL). However, given the vast numbers of URLs and the ever-changing tactics of malicious actors, it will always be possible to find sets of websites which are outliers with respect to a model's hypothesis. Therefore, we seek to understand a model's per-sample reliability when classifying URL data.

Acknowledgements: This work was funded by the Sandia National Laboratories Laboratory Directed Research and Development (LDRD) program.
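The ensemble-based uncertainty idea in this abstract can be sketched as follows: collect the individual predictions of an ensemble's members and treat their disagreement on each sample as a per-sample uncertainty signal. This is a minimal illustration on synthetic data with a random forest standing in for the ensemble; the features, data, and model are assumptions, not the report's URL pipeline:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Illustrative stand-in for URL features: a synthetic binary task.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Collect each tree's vote to form a per-sample distribution of predictions.
votes = np.stack([tree.predict(X) for tree in forest.estimators_])  # (trees, samples)
p_pos = votes.mean(axis=0)                 # fraction of trees voting class 1

# Disagreement among ensemble members flags samples the model is unsure about:
# 0 = unanimous vote, 1 = an even 50/50 split across the ensemble.
uncertainty = 1.0 - np.abs(2 * p_pos - 1)
print("most uncertain sample index:", uncertainty.argmax())
```

Aggregate accuracy can look strong while individual samples still draw near-even votes; ranking samples by this disagreement score is one way to surface the per-sample reliability the abstract targets.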


Applying Image Clutter Metrics to Domain-Specific Expert Visual Search

Speed, Ann S.; Stracuzzi, David J.; Lee, Jina L.; Hund, Lauren H.

Visual clutter metrics play an important role in both the design of information visualizations and in the continued theoretical development of visual search models. In visualization design, clutter metrics provide a mathematical prediction of the complexity of the display and the difficulty associated with locating and identifying key pieces of information. In visual search models, they offer a proxy to set size, which represents the number of objects in the search scene, but is difficult to estimate in real-world imagery. In this article, we first briefly review the literature on clutter metrics and then contribute our own results drawn from studies in two security-oriented visual search domains: airport X-ray imagery and radar imagery. We analyze our results with an eye toward bridging the gap between the scene features evaluated by current clutter metrics and the features that are relevant to our security tasks. The article concludes with a brief discussion of possible research steps to close this gap.
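A simple clutter proxy of the kind this literature discusses can be sketched as an edge-density score: the fraction of pixels with strong local gradients, which rises with scene busyness. This metric and its threshold are illustrative assumptions, not one of the metrics evaluated in the article:

```python
import numpy as np

def edge_density(image: np.ndarray, threshold: float = 0.1) -> float:
    """Toy clutter proxy (an assumption, not a metric from the article):
    the fraction of pixels whose gradient magnitude exceeds a fixed
    fraction of the image's maximum gradient. Busier scenes produce
    more strong edges and therefore higher scores."""
    gy, gx = np.gradient(image.astype(float))
    mag = np.hypot(gx, gy)
    if mag.max() == 0:
        return 0.0  # perfectly flat image: no edges, no clutter
    return float((mag > threshold * mag.max()).mean())

# A flat image scores 0; a noisy one scores higher.
flat = np.zeros((64, 64))
noisy = np.random.default_rng(0).random((64, 64))
print(edge_density(flat), edge_density(noisy))
```

Metrics like this capture low-level scene complexity but, as the article's security-domain results suggest, may miss the task-relevant features that drive expert search difficulty.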
