Publications

Results 1–25 of 49

Search results

Data Validation Experiments with a Computer-Generated Imagery Dataset for International Nuclear Safeguards

ESARDA Bulletin

Gastelum, Zoe N.; Shead, Timothy M.; Marshall, Matthew

Computer vision models have great potential as tools for international nuclear safeguards verification activities, but off-the-shelf models require fine-tuning through transfer learning to detect relevant objects. Because open-source examples of safeguards-relevant objects are rare, and to evaluate the potential of synthetic training data for computer vision, we present the Limbo dataset. Limbo includes both real and computer-generated images of uranium hexafluoride containers for training computer vision models. We generated these images iteratively based on results from data validation experiments that are detailed here. The findings from these experiments apply to both the safeguards community and the broader community of computer vision researchers using synthetic data.

Adapting Secure MultiParty Computation to Support Machine Learning in Radio Frequency Sensor Networks

Berry, Jonathan W.; Ganti, Anand G.; Goss, Kenneth G.; Mayer, Carolyn D.; Onunkwo, Uzoma O.; Phillips, Cynthia A.; Saia, Jared; Shead, Timothy M.

In this project we developed and validated algorithms for privacy-preserving linear regression using a new variant of Secure Multiparty Computation (MPC) we call "Hybrid MPC" (hMPC). Our variant is intended to support low-power, unreliable networks of sensors with low-communication, fault-tolerant algorithms. In hMPC we do not share training data, even via secret sharing. Thus, agents are responsible for protecting their own local data. Only the machine learning (ML) model is protected with information-theoretic security guarantees against honest-but-curious agents. There are three primary advantages to this approach: (1) after setup, hMPC supports a communication-efficient matrix multiplication primitive, (2) organizations prevented by policy or technology from sharing any of their data can participate as agents in hMPC, and (3) large numbers of low-power agents can participate in hMPC. We have also created an open-source software library named "Cicada" to support hMPC applications with fault tolerance. Fault tolerance is important in our applications because the agents are vulnerable to failure or capture. We have demonstrated this capability at Sandia's Autonomy New Mexico laboratory through a simple machine-learning exercise with Raspberry Pi devices capturing and classifying images while flying on four drones.

Safeguards-Informed Hybrid Imagery Dataset [Poster]

Rutkowski, Joshua E.; Gastelum, Zoe N.; Shead, Timothy M.; Rushdi, Ahmad R.; Bolles, Jason C.; Mattes, Arielle

Deep learning computer vision models require many thousands of properly labelled images for training, which is especially challenging for safeguards and nonproliferation, given that safeguards-relevant images are typically rare due to the sensitivity and limited availability of the technologies. Creating relevant images through real-world staging is costly and limited in scope. Expert labeling is expensive, time-consuming, and error-prone. We aim to develop a data set of both real-world and synthetic images relevant to the nuclear safeguards domain that can be used to support multiple data science research questions. In the process of developing this data, we aim to develop a novel workflow to validate synthetic images using machine learning explainability methods, testing among multiple computer vision algorithms, and iterative synthetic data rendering. We will deliver one million images, both real-world and synthetically rendered, of two types of uranium storage and transportation containers with labelled ground truth and associated adversarial examples.

How Low Can You Go? Using Synthetic 3D Imagery to Drastically Reduce Real-World Training Data for Object Detection

Gastelum, Zoe N.; Shead, Timothy M.

Deep convolutional neural networks (DCNNs) currently provide state-of-the-art performance on image classification and object detection tasks, and there are many global security mission areas where such models could be extremely useful. Crucially, the success of these models is driven in large part by the widespread availability of high-quality open-source data sets such as ImageNet, Common Objects in Context (COCO), and KITTI, which contain millions of images with thousands of unique labels. However, images of global security-relevant objects of interest can be difficult to obtain: relevant events are low frequency and high consequence; the content of relevant images is sensitive; and adversaries and proliferators seek to obscure their activities. For these cases where exemplar data is hard to come by, even fine-tuning an existing model with available data can be effectively impossible. Recent work demonstrated that models can be trained using a combination of real-world and synthetic images generated from 3D representations; that such models can exceed the performance of models trained using real-world data alone; and that the generated images need not be perfectly realistic (Tremblay et al., 2018). However, this approach still required hundreds to thousands of real-world images for training and fine-tuning, which for sparse, global security-relevant datasets can be an unrealistic hurdle. In this research, we validate the performance and behavior of DCNN models as we drive the number of real-world images used for training object detection tasks down to a minimal set. We perform multiple experiments to identify the best approach to train DCNNs from an extremely small set of real-world images.
In doing so, we: Develop state-of-the-art, parameterized 3D models based on real-world images and sample from their parameters to increase the variance in synthetic image training data; Use machine learning explainability techniques to highlight and correct through targeted training the biases that result from training using completely synthetic images; and Validate our results by comparing the performance of the models trained on synthetic data to one another, and to a control model created by fine-tuning an existing ImageNet-trained model with a limited number (hundreds) of real-world images.

Information-Theoretically Secure Distributed Machine Learning

Shead, Timothy M.; Berry, Jonathan W.; Phillips, Cynthia A.; Saia, Jared

A previously obscure area of cryptography known as Secure Multiparty Computation (MPC) is enjoying increased attention in the field of privacy-preserving machine learning (ML), because ML models implemented using MPC can be uniquely resistant to capture or reverse engineering by an adversary. In particular, an adversary who captures a share of a distributed MPC model provably cannot recover the model itself, nor data evaluated by the model, even by observing the model in operation. We report on our small project to survey current MPC software and judge its practicality for fielding mission-relevant distributed machine learning models.
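
The claim that a single captured share reveals nothing about the model can be illustrated with additive secret sharing, one common MPC building block. The sketch below is illustrative only and is not taken from the surveyed software: the field modulus `PRIME`, the function names, and the toy weights are all assumptions, and the input vector is public here for simplicity, whereas a full MPC protocol would protect the evaluated data as well.

```python
import secrets

# Assumed field modulus for this sketch (a Mersenne prime).
PRIME = 2**61 - 1


def share(secret, n_parties):
    """Split an integer into n additive shares modulo PRIME.

    Any n-1 shares are uniformly random, so they carry no
    information about the secret on their own.
    """
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (secret - sum(shares)) % PRIME
    return shares + [last]


def reconstruct(shares):
    """Recover the secret by summing all shares modulo PRIME."""
    return sum(shares) % PRIME


def share_vector(values, n_parties):
    """Share each value; party i receives the i-th share of every value."""
    per_value = [share(v, n_parties) for v in values]
    return [list(column) for column in zip(*per_value)]


def local_dot(party_shares, public_x):
    """Each party computes a dot product on its own shares only."""
    return sum(s * x for s, x in zip(party_shares, public_x)) % PRIME


# Hypothetical model weights, split across three parties.
weights = [3, 5, 7]
x = [2, 4, 6]  # public input in this simplified sketch
party_shares = share_vector(weights, 3)

# Parties evaluate the linear model locally, then combine partial results.
partials = [local_dot(ps, x) for ps in party_shares]
result = reconstruct(partials)
print(result)
```

Because the dot product is linear, the partial results sum to the true model output even though no party ever holds the weights in the clear.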
