Publications

Results 1–25 of 54

Search results

Data Validation Experiments with a Computer-Generated Imagery Dataset for International Nuclear Safeguards

ESARDA Bulletin

Gastelum, Zoe N.; Shead, Timothy M.; Marshall, Matthew

Computer vision models have great potential as tools for international nuclear safeguards verification activities, but off-the-shelf models require fine-tuning through transfer learning to detect relevant objects. Because open-source examples of safeguards-relevant objects are rare, and to evaluate the potential of synthetic training data for computer vision, we present the Limbo dataset. Limbo includes both real and computer-generated images of uranium hexafluoride containers for training computer vision models. We generated these images iteratively based on results from data validation experiments that are detailed here. The findings from these experiments are applicable both for the safeguards community and the broader community of computer vision research using synthetic data.

Towards a More Effective Hybrid Workforce Culture in a Computationally Focused Research Center

Chance, Frances S.; Lofstead, Gerald F.; Metodi, Tzvetan S.; Mitchell, Scott A.; Rutka, Phyllis A.; Steinmetz, Scott; Shead, Timothy M.; Teves, Joshua B.; Warrender, Christina E.

It is essential to Sandia National Laboratories’ continued success in scientific and technological advances and mission delivery to embrace a hybrid workforce culture under which current and future employees can thrive. This report focuses on the findings of the Hybrid Work Team for the Center for Computing Research, which met weekly from March to June 2023 and conducted a survey across the Center at Sandia. Conclusions in this report are drawn from the 9 authors of this report, who comprise the Hybrid Work Team, from 15 responses to a center-wide survey, and from numerous conversations with colleagues. A major finding was widespread dissatisfaction with the quantity, execution, and tooling surrounding formal meetings with remote participants. While there was consensus that remote work enables people to produce high-quality individual and technical work, there was also consensus that social disconnect was widespread, with particular concern about hires made after the onset of the Covid-19 pandemic. There were many concerns about tooling and policy to facilitate remote collaboration both within Sandia and with its external collaborators. This report includes recommendations for mitigating these problems. For problems for which obvious recommendations cannot be made, ideas of what a successful solution might look like are presented.

Adapting Secure MultiParty Computation to Support Machine Learning in Radio Frequency Sensor Networks

Berry, Jonathan; Ganti, Anand; Goss, Kenneth; Mayer, Carolyn D.; Onunkwo, Uzoma; Phillips, Cynthia A.; Saia, Jared; Shead, Timothy M.

In this project we developed and validated algorithms for privacy-preserving linear regression using a new variant of Secure Multiparty Computation (MPC) we call "Hybrid MPC" (hMPC). Our variant is intended to support low-power, unreliable networks of sensors with low-communication, fault-tolerant algorithms. In hMPC we do not share training data, even via secret sharing. Thus, agents are responsible for protecting their own local data. Only the machine learning (ML) model is protected with information-theoretic security guarantees against honest-but-curious agents. There are three primary advantages to this approach: (1) after setup, hMPC supports a communication-efficient matrix multiplication primitive, (2) organizations prevented by policy or technology from sharing any of their data can participate as agents in hMPC, and (3) large numbers of low-power agents can participate in hMPC. We have also created an open-source software library named "Cicada" to support hMPC applications with fault tolerance. Fault tolerance is important in our applications because the agents are vulnerable to failure or capture. We have demonstrated this capability at Sandia's Autonomy New Mexico laboratory through a simple machine-learning exercise with Raspberry Pi devices capturing and classifying images while flying on four drones.

Safeguards-Informed Hybrid Imagery Dataset [Poster]

Rutkowski, Joshua; Gastelum, Zoe N.; Shead, Timothy M.; Rushdi, Ahmad; Bolles, Jason; Mattes, Arielle

Deep learning computer vision models require many thousands of properly labelled images for training, which is especially challenging for safeguards and nonproliferation, given that safeguards-relevant images are typically rare due to the sensitivity and limited availability of the technologies. Creating relevant images through real-world staging is costly and limited in scope. Expert labeling is expensive, time consuming, and error prone. We aim to develop a data set of both real-world and synthetic images that are relevant to the nuclear safeguards domain and can be used to support multiple data science research questions. In the process of developing this data, we aim to develop a novel workflow to validate synthetic images using machine learning explainability methods, testing among multiple computer vision algorithms, and iterative synthetic data rendering. We will deliver one million images, both real-world and synthetically rendered, of two types of uranium storage and transportation containers with labelled ground truth and associated adversarial examples.

How Low Can You Go? Using Synthetic 3D Imagery to Drastically Reduce Real-World Training Data for Object Detection

Gastelum, Zoe N.; Shead, Timothy M.

Deep convolutional neural networks (DCNNs) currently provide state-of-the-art performance on image classification and object detection tasks, and there are many global security mission areas where such models could be extremely useful. Crucially, the success of these models is driven in large part by the widespread availability of high-quality open-source data sets such as ImageNet, Common Objects in Context (COCO), and KITTI, which contain millions of images with thousands of unique labels. However, images of global security-relevant objects of interest can be difficult to obtain: relevant events are low frequency and high consequence; the content of relevant images is sensitive; and adversaries and proliferators seek to obscure their activities. For these cases where exemplar data is hard to come by, even fine-tuning an existing model with available data can be effectively impossible. Recent work demonstrated that models can be trained using a combination of real-world and synthetic images generated from 3D representations; that such models can exceed the performance of models trained using real-world data alone; and that the generated images need not be perfectly realistic (Tremblay, et al., 2018). However, this approach still required hundreds to thousands of real-world images for training and fine-tuning, which for sparse, global security-relevant datasets can be an unrealistic hurdle. In this research, we validate the performance and behavior of DCNN models as we drive the number of real-world images used for training object detection tasks down to a minimal set. We perform multiple experiments to identify the best approach to train DCNNs from an extremely small set of real-world images.
In doing so, we: (1) develop state-of-the-art, parameterized 3D models based on real-world images and sample from their parameters to increase the variance in synthetic image training data; (2) use machine learning explainability techniques to highlight, and correct through targeted training, the biases that result from training using completely synthetic images; and (3) validate our results by comparing the performance of the models trained on synthetic data to one another, and to a control model created by fine-tuning an existing ImageNet-trained model with a limited number (hundreds) of real-world images.

Information-Theoretically Secure Distributed Machine Learning

Shead, Timothy M.; Berry, Jonathan; Phillips, Cynthia A.; Saia, Jared

A previously obscure area of cryptography known as Secure Multiparty Computation (MPC) is enjoying increased attention in the field of privacy-preserving machine learning (ML), because ML models implemented using MPC can be uniquely resistant to capture or reverse engineering by an adversary. In particular, an adversary who captures a share of a distributed MPC model provably cannot recover the model itself, nor data evaluated by the model, even by observing the model in operation. We report on our small project to survey current MPC software and judge its practicality for fielding mission-relevant distributed machine learning models.
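The claim that a captured share reveals nothing about the model can be illustrated with additive secret sharing, one common building block of MPC. The following is a minimal sketch under stated assumptions, not the scheme used by any software surveyed in the report: the field size `PRIME` and the function names `share`, `reconstruct`, and `add_shares` are hypothetical, and a single integer stands in for a model parameter.

```python
import secrets

# Illustrative field size (assumption): the Mersenne prime 2**61 - 1.
PRIME = 2**61 - 1


def share(secret, n):
    """Split an integer secret into n additive shares modulo PRIME."""
    if n < 2:
        raise ValueError("need at least two shares")
    # The first n - 1 shares are uniformly random field elements.
    shares = [secrets.randbelow(PRIME) for _ in range(n - 1)]
    # The last share is chosen so that all shares sum to the secret.
    shares.append((secret - sum(shares)) % PRIME)
    return shares


def reconstruct(shares):
    """Recover the secret; this requires every share."""
    return sum(shares) % PRIME


def add_shares(a, b):
    """Add two shared values share-by-share, without reconstructing either."""
    return [(x + y) % PRIME for x, y in zip(a, b)]


# Hypothetical model weight split among three agents.
weight_shares = share(42, 3)
bias_shares = share(7, 3)
total = reconstruct(add_shares(weight_shares, bias_shares))
```

Any subset of fewer than `n` shares is uniformly distributed regardless of the secret, which is the sense in which one captured share leaks nothing in this simplified setting.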

Anomaly detection in scientific data using joint statistical moments

Journal of Computational Physics

Aditya, Konduri; Kolla, Hemanth; Kegelmeyer, William P.; Shead, Timothy M.; Ling, Julia; Davis, Warren L.

We propose an anomaly detection method for multi-variate scientific data based on analysis of high-order joint moments. Using kurtosis as a reliable measure of outliers, we suggest that principal kurtosis vectors, by analogy to principal component analysis (PCA) vectors, signify the principal directions along which outliers appear. The inception of an anomaly, then, manifests as a change in the principal values and vectors of kurtosis. Obtaining the principal kurtosis vectors requires decomposing a fourth-order joint cumulant tensor, for which we use a simple, computationally less expensive approach that involves performing a singular value decomposition (SVD) over the matricized tensor. We demonstrate the efficacy of this approach on synthetic data, and develop an algorithm to identify the occurrence of a spatial and/or temporal anomalous event in scientific phenomena. The algorithm decomposes the data into several spatial sub-domains and time steps to identify regions with such events. Feature moment metrics, based on the alignments of the principal kurtosis vectors, are computed at each sub-domain and time step for all features to quantify their relative importance towards the overall kurtosis in the data. Accordingly, spatial and temporal anomaly metrics for each sub-domain are proposed using the Hellinger distance of the feature moment metric distribution from a suitable nominal distribution. We apply the algorithm to two turbulent auto-ignition combustion cases and demonstrate that the anomaly metrics reliably capture the occurrence of auto-ignition in relevant spatial sub-domains at the right time steps.
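The last step of the method described above compares a feature moment metric distribution against a nominal distribution using the Hellinger distance. A minimal sketch of that distance alone, assuming both distributions are discrete, non-negative, and normalized to sum to one; the function name `hellinger`, the example values, and the uniform nominal distribution are illustrative assumptions and are not taken from the paper.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete distributions of equal length.

    Returns a value in [0, 1]: 0 for identical distributions and 1 for
    distributions with disjoint support.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared / 2.0)


# Hypothetical feature moment metrics for one sub-domain: the first feature
# dominates the kurtosis, compared against an assumed uniform nominal case.
observed = [0.7, 0.1, 0.1, 0.1]
nominal = [0.25, 0.25, 0.25, 0.25]
score = hellinger(observed, nominal)
```

A score near zero would indicate a sub-domain that resembles the nominal case, while larger values would flag it for closer inspection; any threshold is left to the application.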
