Trust in a Safeguards Voice User Interface for a Nuclear Material Measurement Task
INMM paper for UUR release. This manuscript has been approved for sensitivity review and for release by the sponsor
ESARDA Bulletin
Computer vision models have great potential as tools for international nuclear safeguards verification activities, but off-the-shelf models require fine-tuning through transfer learning to detect relevant objects. Because open-source examples of safeguards-relevant objects are rare, and to evaluate the potential of synthetic training data for computer vision, we present the Limbo dataset. Limbo includes both real and computer-generated images of uranium hexafluoride containers for training computer vision models. We generated these images iteratively, based on results from the data validation experiments detailed here. The findings from these experiments are applicable both to the safeguards community and to the broader computer vision research community working with synthetic data.
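The fine-tuning step described above follows the standard transfer-learning pattern for object detectors. The sketch below shows one way it might look with torchvision's Faster R-CNN; the class count, data loader, and training hyperparameters are illustrative assumptions, not part of the Limbo release.

```python
# Hedged sketch: fine-tuning an off-the-shelf detector on a mix of real and
# synthetic container images. The loader and label scheme are hypothetical
# placeholders; the torchvision calls follow its standard fine-tuning pattern.
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

NUM_CLASSES = 2  # background + UF6 container (assumed label scheme)

# Start from a COCO-pretrained detector and replace its classification head.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, NUM_CLASSES)

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9)

def train_one_epoch(loader):
    """One pass over a loader yielding (images, targets) pairs, where each
    target dict holds 'boxes' (N x 4) and 'labels' (N,) tensors."""
    model.train()
    for images, targets in loader:
        images = [img.to(device) for img in images]
        targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
        loss_dict = model(images, targets)   # detector returns a dict of losses
        loss = sum(loss_dict.values())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```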
Proceedings of the Annual Hawaii International Conference on System Sciences
With machine learning (ML) technologies rapidly expanding to new applications and domains, users are collaborating with artificial intelligence-assisted diagnostic tools to an ever-greater extent. But what impact does ML aid have on cognitive performance, especially when the ML output is not always accurate? Here, we examined the cognitive effects of the presence of simulated ML assistance, including both accurate and inaccurate output, on two tasks (a domain-specific nuclear safeguards task and a domain-general visual search task). Patterns of performance varied across the two tasks for both the presence of ML aid and the category of ML feedback (e.g., false alarm). These results indicate that differences such as domain can influence users' performance with ML aid, and they suggest the need to test the effects of ML output (and associated errors) in the specific context of use, especially when the stimuli of interest are vague or ill-defined.
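A study of this kind needs ML feedback whose accuracy and error categories are under experimental control. The following is a minimal sketch of how such simulated output might be generated; the function name, accuracy level, and trial structure are illustrative assumptions rather than the authors' actual stimulus code.

```python
# Hedged sketch of simulated ML feedback at a controlled accuracy, labeled with
# its signal-detection category (hit, miss, false alarm, correct rejection).
import random

def simulate_ml_feedback(target_present: bool, accuracy: float = 0.8) -> dict:
    """Return a simulated ML judgment and its feedback category."""
    correct = random.random() < accuracy
    ml_says_present = target_present if correct else not target_present
    if target_present and ml_says_present:
        category = "hit"
    elif target_present:
        category = "miss"
    elif ml_says_present:
        category = "false alarm"
    else:
        category = "correct rejection"
    return {"ml_says_present": ml_says_present, "category": category}

# Example: feedback for a block of 100 trials, half of them target-present.
trials = [simulate_ml_feedback(target_present=(i % 2 == 0)) for i in range(100)]
```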
Cognitive Research: Principles and Implications
Eye tracking is a useful tool for studying human cognition, both in the laboratory and in real-world applications. However, there are cases in which eye tracking is not possible, such as in high-security environments where recording devices cannot be introduced. After facing this challenge in our own work, we sought to test the effectiveness of using artificial foveation as an alternative to eye tracking for studying visual search performance. Two groups of participants completed the same list comparison task, a computer-based task designed to mimic an inventory verification process commonly performed by international nuclear safeguards inspectors. We manipulated the way in which the items on the inventory list were ordered and color coded. For the eye tracking group, an eye tracker was used to assess the order in which participants viewed the items and the number of fixations per trial in each list condition. For the artificial foveation group, the items were covered with a blurry mask except when participants moused over them; we used the mouse movements to track the order in which participants viewed the items and the number of items viewed per trial in each list condition. We observed the same overall pattern of performance for the various list display conditions, regardless of the method. However, participants were much slower to complete the task when using artificial foveation and had more variability in their accuracy. Our results indicate that the artificial foveation method can reveal the same pattern of differences across conditions as eye tracking, but it can also impact participants' task performance.
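The artificial foveation manipulation amounts to blurring the display and revealing only the region near the cursor. The sketch below illustrates the idea with Pillow; the file name, blur strength, and reveal size are illustrative assumptions, and per-item masking as used in the study would key the reveal to item bounding boxes rather than a fixed box.

```python
# Hedged sketch of artificial foveation: blur the whole display, then paste
# back an unblurred region around the current cursor position.
from PIL import Image, ImageFilter

def foveated_view(display: Image.Image, cursor_xy, radius: int = 60):
    """Return the display blurred everywhere except a clear box at the cursor."""
    blurred = display.filter(ImageFilter.GaussianBlur(radius=8))
    x, y = cursor_xy
    box = (max(x - radius, 0), max(y - radius, 0),
           min(x + radius, display.width), min(y + radius, display.height))
    blurred.paste(display.crop(box), box[:2])  # reveal the clear region
    return blurred

# Example: re-render on each mouse-move event reported by the experiment GUI.
frame = foveated_view(Image.open("inventory_list.png"), cursor_xy=(320, 240))
```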
Due to their recent increases in performance, machine learning and deep learning models are being increasingly adopted across many domains for visual processing tasks. One such domain is international nuclear safeguards, which seeks to verify the peaceful use of commercial nuclear energy across the globe. Despite recent impressive performance results from machine learning and deep learning algorithms, there is always at least some small level of error. Given the significant consequences of international nuclear safeguards conclusions, we sought to characterize how incorrect responses from a machine or deep learning-assisted visual search task would cognitively impact users. We found that not only do some types of model errors have larger negative impacts on human performance than others, but the scale of those impacts changes depending on the accuracy of the model with which they are presented, and the impacts persist in scenarios of evenly distributed errors and single-error presentations. Further, we found that experiments conducted using a common visual search dataset from the psychology community have similar implications to those using a safeguards-relevant dataset of images containing hyperboloid cooling towers when the cooling tower images are presented to expert participants. While novice performance was considerably different (and worse) on the cooling tower task, we saw increased novice reliance on the ML aid for the most challenging cooling tower images compared to experts. These findings are relevant not just to the cognitive science community, but also to developers of machine and deep learning systems that will be implemented in multiple domains. For safeguards, this research provides key insights into how machine and deep learning projects should be implemented, given the domain's special requirement that information not be missed.
Deep learning computer vision models require many thousands of properly labeled images for training, which is especially challenging for safeguards and nonproliferation, given that safeguards-relevant images are typically rare due to the sensitivity and limited availability of the technologies. Creating relevant images through real-world staging is costly and limited in scope. Expert labeling is expensive, time consuming, and error prone. We aim to develop a data set of both real-world and synthetic images relevant to the nuclear safeguards domain that can be used to support multiple data science research questions. In the process of developing this data set, we aim to develop a novel workflow to validate synthetic images using machine learning explainability methods, testing among multiple computer vision algorithms, and iterative synthetic data rendering. We will deliver one million images, both real-world and synthetically rendered, of two types of uranium storage and transportation containers, with labeled ground truth and associated adversarial examples.
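For each rendered image, the labeled ground truth can be recorded alongside the image itself. The record below is a minimal sketch using COCO-style fields; the file name, IDs, category name, and box coordinates are illustrative assumptions rather than the actual schema of the delivered data set.

```python
# Hedged sketch of a COCO-style ground-truth record for one synthetic image.
# All values here are placeholders emitted by a hypothetical renderer.
import json

record = {
    "image": {"id": 1, "file_name": "synthetic_uf6_000001.png",
              "width": 1024, "height": 768, "synthetic": True},
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1,
         "bbox": [212, 148, 420, 310],   # [x, y, width, height] from the renderer
         "iscrowd": 0, "area": 420 * 310}
    ],
    "categories": [{"id": 1, "name": "uf6_container"}],
}

with open("annotations_000001.json", "w") as f:
    json.dump(record, f, indent=2)
```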
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
As the ability to collect and store data grows, so does the need to efficiently analyze that data. As human-machine teams that use machine learning (ML) algorithms to inform human decision-making grow in popularity, it becomes increasingly critical to understand the optimal methods of implementing algorithm-assisted search. To better understand how algorithm confidence values associated with object identification can influence participant accuracy and response times during a visual search task, we compared models that provided appropriate confidence, random confidence, and no confidence, as well as a model biased toward overconfidence and a model biased toward underconfidence. Results indicate that randomized confidence is likely harmful to performance, while non-random confidence values are likely better than no confidence value for maintaining accuracy over time. Providing participants with appropriate confidence values did not appear to benefit performance any more than providing participants with underconfident or overconfident models.
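The five conditions amount to different mappings from a model's raw score to the confidence shown to the participant. Below is a minimal sketch of one way those mappings might be implemented; the bias offsets and function name are illustrative assumptions, not the transformations actually used in the study.

```python
# Hedged sketch of the five displayed-confidence conditions, assuming a raw
# model score in [0, 1]; the +/- 0.3 offsets are arbitrary placeholders.
import random

def displayed_confidence(raw_score: float, condition: str):
    """Map a model score to the confidence value shown to the participant."""
    if condition == "appropriate":
        return raw_score                      # show the score as-is
    if condition == "random":
        return random.random()                # unrelated to the model output
    if condition == "overconfident":
        return min(1.0, raw_score + 0.3)      # biased upward
    if condition == "underconfident":
        return max(0.0, raw_score - 0.3)      # biased downward
    return None                               # "no confidence" condition

# Example: the same detection shown under each condition.
for cond in ["appropriate", "random", "overconfident", "underconfident", "none"]:
    print(cond, displayed_confidence(0.62, cond))
```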
Deep convolutional neural networks (DCNNs) currently provide state-of-the-art performance on image classification and object detection tasks, and there are many global security mission areas where such models could be extremely useful. Crucially, the success of these models is driven in large part by the widespread availability of high-quality open source data sets such as ImageNet, Common Objects in Context (COCO), and KITTI, which contain millions of images with thousands of unique labels. However, global security-relevant objects of interest can be difficult to obtain: relevant events are low frequency and high consequence; the content of relevant images is sensitive; and adversaries and proliferators seek to obscure their activities. For these cases where exemplar data is hard to come by, even fine-tuning an existing model with available data can be effectively impossible. Recent work demonstrated that models can be trained using a combination of real-world and synthetic images generated from 3D representations; that such models can exceed the performance of models trained using real-world data alone; and that the generated images need not be perfectly realistic (Tremblay et al., 2018). However, this approach still required hundreds to thousands of real-world images for training and fine-tuning, which for sparse, global security-relevant datasets can be an unrealistic hurdle. In this research, we validate the performance and behavior of DCNN models as we drive the number of real-world images used for training object detection tasks down to a minimal set. We perform multiple experiments to identify the best approach to train DCNNs from an extremely small set of real-world images. In doing so, we: develop state-of-the-art, parameterized 3D models based on real-world images and sample from their parameters to increase the variance in synthetic image training data; use machine learning explainability techniques to highlight, and correct through targeted training, the biases that result from training using completely synthetic images; and validate our results by comparing the performance of the models trained on synthetic data to one another, and to a control model created by fine-tuning an existing ImageNet-trained model with a limited number (hundreds) of real-world images.
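One common explainability technique of the kind mentioned above is Grad-CAM, which highlights the image regions driving a prediction and can expose biases such as a model keying on synthetic backgrounds rather than the object itself. The sketch below is a minimal, self-contained Grad-CAM over a torchvision ResNet-50; the choice of model, target layer, and weights is an illustrative assumption, not the configuration used in this work.

```python
# Hedged sketch of a Grad-CAM check on where a classifier is "looking".
# The model and preprocessing are placeholders; only the Grad-CAM mechanics
# (hooked activations and gradients at the last conv block) are shown.
import torch
import torch.nn.functional as F
import torchvision

model = torchvision.models.resnet50(weights="DEFAULT").eval()
activations, gradients = {}, {}

def fwd_hook(module, inputs, output):
    activations["value"] = output.detach()

def bwd_hook(module, grad_input, grad_output):
    gradients["value"] = grad_output[0].detach()

model.layer4.register_forward_hook(fwd_hook)
model.layer4.register_full_backward_hook(bwd_hook)

def grad_cam(image: torch.Tensor, class_idx: int) -> torch.Tensor:
    """Return an H x W heatmap in [0, 1] for one preprocessed 3 x H x W image."""
    logits = model(image.unsqueeze(0))
    model.zero_grad()
    logits[0, class_idx].backward()
    weights = gradients["value"].mean(dim=(2, 3), keepdim=True)  # channel weights
    cam = F.relu((weights * activations["value"]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=tuple(image.shape[1:]), mode="bilinear",
                        align_corners=False)
    return (cam / (cam.max() + 1e-8)).squeeze()
```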
Nuclear Non-proliferation and Arms Control Verification: Innovative Systems Concepts
Societal verification, the use of data produced by the public to support confirmation that a state is in compliance with its nonproliferation or arms control obligations, is a concept as old as nonproliferation and arms control proposals themselves. With the tremendous growth in access to the Internet, and its accompanying public generation of and access to data, the concept of societal verification has undergone a recent resurgence in popularity. This chapter explores societal verification through two mechanisms of collecting and analyzing societally produced data: mobilization and observation. It describes current applications and research in each area before providing an overview of challenges and considerations that must be addressed in order to bring societally produced data into an official verification regime. The chapter concludes by emphasizing that the role of societal verification, if any, in nonproliferation and arms control will supplement rather than supplant traditional verification means.