Publications

21 Results

Data Science and Machine Learning for Genome Security

Verzi, Stephen J.; Krishnakumar, Raga K.; Levin, Drew L.; Krofcheck, Daniel J.; Williams, Kelly P.

This report describes research that applied data science and machine learning methods to distinguish targeted genome editing from natural mutation and sequencer noise. Genome editing capabilities have existed for more than 20 years, and the efficiency of these techniques has improved dramatically in the last five-plus years, notably with the rise of CRISPR-Cas technology. Whether a specific genome has been the target of an edit is a concern for U.S. national security, and the research detailed in this report provides first steps toward addressing that concern. Our research requires a large amount of data, so we invested considerable time collecting and processing it. We use an ensemble of decision tree and deep neural network machine learning methods, together with anomaly detection, to detect genome edits from either whole-exome or whole-genome DNA reads. When tested against samples held out during training, our edit detection algorithms performed significantly better than random guessing, achieving high F1 and recall scores as well as high overall precision.
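The abstract scores an ensemble with F1, recall, and precision on held-out samples. As a hedged illustration only, the sketch below averages per-sample probabilities from two stand-in models (soft voting) and computes those three metrics with "edited" as the positive class. The function names, the 0.5 threshold, and all scores are hypothetical; this is not the report's pipeline, and no real genomic features are involved.

```python
from typing import Sequence


def soft_vote(score_lists: Sequence[Sequence[float]], threshold: float = 0.5) -> list[int]:
    """Average each sample's 'edited' probability across models, then threshold.

    score_lists[m][i] is model m's (hypothetical) probability that sample i
    carries a targeted edit (label 1) rather than natural mutation or noise (0).
    """
    n = len(score_lists[0])
    if any(len(scores) != n for scores in score_lists):
        raise ValueError("every model must score the same samples")
    averaged = [sum(scores[i] for scores in score_lists) / len(score_lists) for i in range(n)]
    return [1 if p >= threshold else 0 for p in averaged]


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[float, float, float]:
    """Precision, recall and F1 with 'edited' (1) as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Made-up held-out labels and scores from two illustrative models.
    labels = [1, 1, 1, 0, 0, 0]
    tree_scores = [0.9, 0.7, 0.4, 0.2, 0.6, 0.1]
    dnn_scores = [0.8, 0.6, 0.7, 0.3, 0.5, 0.2]
    preds = soft_vote([tree_scores, dnn_scores])
    print(preds)                               # [1, 1, 1, 0, 1, 0]
    print(precision_recall_f1(labels, preds))  # precision 0.75, recall 1.0, F1 ~0.857
```

Recall rewards catching every edited sample, while precision penalizes flagging unedited ones; F1 is their harmonic mean, so it stays high only when both are.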

Deep Reinforcement Learning for Online Distribution Power System Cybersecurity Protection

2021 IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids, SmartGridComm 2021

Bailey, Tyson B.; Johnson, Jay; Levin, Drew L.

The sophistication and frequency of cyberattacks on power systems have grown over the last decade, leading researchers to investigate new cyber-resilient tools to help grid operators defend their networks and power systems. One promising approach applies recent advances in deep reinforcement learning (DRL) to help grid operators make real-time changes to power system equipment that counteract malicious actions. While multiple studies have examined transmission systems, in this work we investigate whether distribution power systems can be defended by a DRL agent that controls a collection of utility-owned distributed energy resources (DER). A game board based on a modified version of the IEEE 13-bus model was simulated in OpenDSS to train the DRL agent and to compare its performance to a random agent, a greedy agent, and human players. Both the DRL agent and the greedy agent performed well, suggesting that a greedy approach can be appropriate for computationally tractable system configurations and that a DRL agent is a viable path forward for systems of greater complexity. This work paves the way for multi-player distribution system control games that could be designed to defend the power grid against a sophisticated cyberattack.
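To make the random-versus-greedy baselines concrete, here is a deliberately tiny sketch. It does not use OpenDSS or the IEEE 13-bus model: the "grid" is just a list of hypothetical per-bus voltage deviations, an attacker nudges one bus per turn, and the defender nudges one DER-controlled bus. The step size, cost function, and game length are all assumptions, and no DRL agent is included.

```python
import random

STEP = 0.01  # hypothetical per-move voltage shift (p.u.) from one DER adjustment


def actions(n_buses):
    """Every (bus, direction) pair a defender may choose."""
    return [(bus, sign) for bus in range(n_buses) for sign in (-1, 1)]


def apply_action(dev, action):
    """Return new deviations after nudging one DER-controlled bus."""
    bus, sign = action
    new = list(dev)
    new[bus] += sign * STEP
    return new


def cost(dev):
    """Toy cost: total absolute voltage deviation across buses."""
    return sum(abs(d) for d in dev)


def greedy_agent(dev, _rng):
    """One-step lookahead: pick the action with the lowest immediate cost."""
    return min(actions(len(dev)), key=lambda a: cost(apply_action(dev, a)))


def random_agent(dev, rng):
    """Baseline that ignores the grid state."""
    return rng.choice(actions(len(dev)))


def play(agent, n_buses=4, turns=50, seed=0):
    """Attacker perturbs a random bus each turn; the agent responds; sum costs."""
    attacker_rng = random.Random(seed)    # same attack sequence for every agent
    agent_rng = random.Random(seed + 1)
    dev = [0.0] * n_buses
    total_cost = 0.0
    for _ in range(turns):
        bus = attacker_rng.randrange(n_buses)
        dev[bus] += attacker_rng.choice((-1, 1)) * STEP
        dev = apply_action(dev, agent(dev, agent_rng))
        total_cost += cost(dev)
    return total_cost


if __name__ == "__main__":
    print("greedy:", play(greedy_agent))
    print("random:", play(random_agent))
```

In this toy, every attack can be undone in one move, so one-step lookahead is already optimal. That is the kind of tractable case where greedy control can suffice; a learned policy would only be expected to pay off when actions interact across buses and turns.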

Movement and spatial specificity support scaling in ant colonies and immune systems: Application to national biosurveillance

Springer Proceedings in Complexity

Flanagan, Tatiana P.; Beyeler, Walter E.; Levin, Drew L.; Finley, Patrick D.; Moses, Melanie

Data obtained from biosurveillance can be used by public health systems to detect and respond to disease outbreaks and save lives. However, existing data are distributed across large geographic areas, and both the quality and type of data vary in space and time. We discuss a framework for analyzing biosurveillance information that minimizes detection time and maximizes detection accuracy while scaling the analysis over large regions. We propose that strategies used by canonical biological complex systems, which are adapted to diverse environments, provide good models for the design of a robust, adaptive, and scalable biosurveillance system. Drawing on knowledge of the adaptive immune system and ant colonies, we examine strategies that support the scaling of detection to search and respond across large areas with dynamic distributions of data. Based on this research, we discuss a bioinspired approach for a distributed, adaptive, and scalable biosurveillance system.

Biologically inspired approaches for biosurveillance anomaly detection and data fusion

Finley, Patrick D.; Levin, Drew L.; Flanagan, Tatiana P.; Beyeler, Walter E.; Mitchell, Michael D.; Ray, Jaideep R.; Moses, Melanie M.; Forrest, Stephanie F.

This study developed and tested biologically inspired computational methods to detect anomalous signals in data streams that could indicate a pending outbreak or bioweapon attack. Current large-scale biosurveillance systems struggle with two principal problems: (1) detecting disease-indicating signals promptly in noisy data and (2) detecting anomalies across multiple channels. Anomaly detectors and data fusion components modeled after human immune system processes were tested against a variety of natural and synthetic surveillance datasets. A pilot-scale immune-system-based biosurveillance system performed at least as well as traditional statistical anomaly detection and data fusion approaches. Machine learning approaches leveraging deep learning recurrent neural networks were developed and applied to challenging unstructured and multimodal health surveillance data. Within the limits imposed by data availability, both the immune-system-based and the deep learning methods were found to improve anomaly detection and data fusion performance for particularly challenging data subsets.

ACKNOWLEDGEMENTS

The authors acknowledge the close collaboration of Scott Lee, Jason Thomas, and Chad Heilig from the U.S. Centers for Disease Control and Prevention (CDC) in this effort. De-identified biosurveillance data provided by Ken Jeter of the New Mexico Department of Health proved to be an important contribution to our work. Discussions with members of the International Society of Disease Surveillance helped the researchers focus on questions relevant to practicing public health professionals. Funding for this work was provided by Sandia National Laboratories' Laboratory Directed Research and Development program.
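For readers unfamiliar with the "traditional statistical" baseline that the immune-system detectors were compared against, the sketch below shows one simple, generic form: a per-channel z-score against a trailing baseline window, with channels fused by taking the maximum score each day. The 7-day window, the threshold of 3, the max-fusion rule, and the flat-baseline floor are assumptions chosen for illustration; this is neither the study's immune-system method nor its deep learning method.

```python
from statistics import mean, stdev


def zscores(series, baseline=7):
    """z-score of each day's count against the preceding `baseline` days.

    Days without a full baseline window get a score of 0.0.
    """
    scores = []
    for t, value in enumerate(series):
        if t < baseline:
            scores.append(0.0)
            continue
        window = series[t - baseline:t]
        sd = stdev(window) or 1.0  # hypothetical floor so a flat baseline can't divide by zero
        scores.append((value - mean(window)) / sd)
    return scores


def fuse_max(channels, threshold=3.0, baseline=7):
    """Alarm on days where the largest per-channel z-score exceeds `threshold`."""
    per_channel = [zscores(c, baseline) for c in channels]
    fused = [max(day) for day in zip(*per_channel)]
    return [t for t, z in enumerate(fused) if z > threshold]


if __name__ == "__main__":
    # Made-up daily counts for two channels, e.g. ED visits and OTC sales.
    ed_visits = [10, 12, 11, 13, 12, 11, 12, 30]
    otc_sales = [50, 52, 49, 51, 50, 48, 51, 53]
    print(fuse_max([ed_visits, otc_sales]))  # [7]
```

Max-fusion raises an alarm if any single channel spikes, but it cannot combine several weak, correlated signals. That gap is the multichannel problem the abstract names.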

Integrated Cyber/Physical Grid Resiliency Modeling

Dawson, Lon A.; Verzi, Stephen J.; Levin, Drew L.; Melander, Darryl J.; Sorensen, Asael H.; Cauthen, Katherine R.; Wilches-Bernal, Felipe; Berg, Timothy M.; Lavrova, Olga A.; Guttromson, Ross G.

This project explored coupling modeling and analysis methods from multiple domains to address complex hybrid (cyber and physical) attacks on mission-critical infrastructure. Robust methods for integrating these complex systems are necessary to enable large trade-space exploration, including dynamic and evolving cyber threats and mitigations. Reinforcement learning employing deep neural networks, as in the AlphaGo Zero solution, was used to identify "best" (or approximately optimal) resilience strategies for operating a cyber/physical grid model. A prototype platform was developed, and the machine learning (ML) algorithm was made to play itself in a game of 'Hurt the Grid'. This proof of concept shows that ML optimization can help us understand and control a complex, multi-dimensional grid space. A simple yet high-fidelity model demonstrates that the data have spatial correlation, which is necessary for any optimization or control. Our prototype analysis showed that reinforcement learning successfully improved both adversary and defender knowledge of how to manipulate the grid. When expanded to more representative models, this type of machine learning will inform grid operations and defense, supporting the development of mitigations that defend the grid from complex cyberattacks. The same research can be extended to similarly complex domains.
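The self-play idea, one learning rule improving both attacker and defender by playing against itself, can be illustrated without deep networks. The sketch below swaps in classical fictitious play, not AlphaGo Zero-style reinforcement learning, on a hypothetical "Hurt the Grid" matrix game: the attacker picks a bus to hit, the defender picks a bus to protect, and the attacker gains that bus's load unless it is protected. The loads are made up. For these loads the game value works out to 60/31 (about 1.94), and the two bounds returned by `value_bounds` should close in on it.

```python
def payoff(loads, attack, protect):
    """Attacker's gain: the load at the attacked bus unless the defender protects it."""
    return 0.0 if attack == protect else loads[attack]


def fictitious_play(loads, iters=20000):
    """Both players repeatedly best-respond to the other's empirical action mix."""
    n = len(loads)
    attack_counts = [0] * n
    protect_counts = [0] * n
    attack_counts[0] += 1       # arbitrary opening moves
    protect_counts[0] += 1
    for _ in range(iters - 1):
        # Expected (unnormalized) gain of each attack vs. the defender's history.
        attack_vals = [sum(protect_counts[j] * payoff(loads, i, j) for j in range(n))
                       for i in range(n)]
        # Expected (unnormalized) loss of each protection vs. the attacker's history.
        protect_vals = [sum(attack_counts[i] * payoff(loads, i, j) for i in range(n))
                        for j in range(n)]
        attack_counts[max(range(n), key=attack_vals.__getitem__)] += 1
        protect_counts[min(range(n), key=protect_vals.__getitem__)] += 1
    p = [c / iters for c in attack_counts]
    q = [c / iters for c in protect_counts]
    return p, q


def value_bounds(loads, p, q):
    """Bracket the game value: defender's best reply to p, attacker's best reply to q."""
    n = len(loads)
    upper = max(sum(q[j] * payoff(loads, i, j) for j in range(n)) for i in range(n))
    lower = min(sum(p[i] * payoff(loads, i, j) for i in range(n)) for j in range(n))
    return lower, upper


if __name__ == "__main__":
    loads = [5.0, 3.0, 2.0]  # hypothetical MW at three buses
    p, q = fictitious_play(loads)
    print("attacker mix:", [round(x, 3) for x in p])
    print("defender mix:", [round(x, 3) for x in q])
    print("value bounds:", value_bounds(loads, p, q))
```

Fictitious play is known to converge in two-player zero-sum games, which makes it a convenient stand-in. Deep RL becomes necessary when the state and action spaces are far too large to enumerate, as in a realistic grid model.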

Synthetic data generators for the evaluation of biosurveillance outbreak detection algorithms

Levin, Drew L.; Finley, Patrick D.

The research and development of new algorithmic and statistical methods of outbreak detection is an ongoing priority in the field of biosurveillance. The early detection of emergent disease outbreaks is crucial for effective treatment and mitigation. New detection methods must be compared to established approaches for proper evaluation. This comparison requires biosurveillance test data that accurately reflect the complexity of the real-world data to which the methods will be applied. While new detection methods are best tested and evaluated on real data, such data are often impractical to obtain because they are proprietary or limited in scope. Thus, scientists must turn to synthetic data generation to provide enough data to properly evaluate new detection methodologies. This paper evaluates three such synthetic data sources: the WSARE dataset, the Noufaily equation-based approach, and the Project Mimic data generator.
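As a rough illustration of what an equation-based generator provides, the sketch below produces weekly counts from a seasonal baseline and injects an outbreak of known size and timing, so a detector's alarms can be scored against ground truth. It is only loosely in that spirit. The Poisson baseline, the cosine seasonality, the even spread of outbreak cases, and every parameter value are simplifying assumptions. It does not reproduce the Noufaily specification, the WSARE dataset, or Project Mimic.

```python
import math
import random


def poisson(rng, lam):
    """Knuth's multiplication method; fine for the small means used here."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def generate(weeks=104, base=10.0, amplitude=0.3, outbreak_start=60,
             outbreak_len=4, outbreak_cases=40, seed=0):
    """Seasonal Poisson baseline plus an outbreak spread evenly over a few weeks.

    Returns (weekly counts, per-week outbreak labels) so a detector's
    alarms can be scored against known ground truth.
    """
    rng = random.Random(seed)
    counts, labels = [], []
    per_week, extra = divmod(outbreak_cases, outbreak_len)
    for t in range(weeks):
        mean = base * math.exp(amplitude * math.cos(2 * math.pi * t / 52))  # annual cycle
        count = poisson(rng, mean)
        offset = t - outbreak_start
        in_outbreak = 0 <= offset < outbreak_len
        if in_outbreak:
            count += per_week + (1 if offset < extra else 0)
        counts.append(count)
        labels.append(in_outbreak)
    return counts, labels


if __name__ == "__main__":
    counts, labels = generate()
    print(counts[56:68])
    print([t for t, flag in enumerate(labels) if flag])  # [60, 61, 62, 63]
```

Because the outbreak cases are added on top of an unchanged random baseline, two runs that share a seed differ only by the injected cases. This makes it easy to vary outbreak size while holding the noise fixed.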

Negative selection based anomaly detector for multimodal health data

2017 IEEE Symposium Series on Computational Intelligence, SSCI 2017 - Proceedings

Levin, Drew L.; Moses, Melanie; Flanagan, Tatiana P.; Forrest, Stephanie; Finley, Patrick D.

Early detection of emerging disease outbreaks is crucial to effective containment and response, yet initial outbreak signatures can be difficult to detect with automated methods. Outbreaks may be masked by noisy data, and signs of an outbreak may be hidden across multiple data feeds. Current biosurveillance methods often perform unimodal statistical analyses that cannot intelligently leverage multiple correlated data streams of different types while still retaining quantitative sensitivity. In this paper, we propose and implement an anomaly detection system for health data based on the human immune system. The adaptive immune system operates over a high-dimensional antigen space in a distributed manner, allowing it to scale efficiently without relying on a centralized controller. Our negative selection algorithm based on the immune system provides effective and scalable distributed anomaly detection for biosurveillance. It detects anomalies in the large, complex data from modern health monitoring data feeds with low false positive rates. Our bootstrap aggregation method improves performance on high-dimensional data sets, and we implement a parallelized version of the algorithm to demonstrate the potential to implement it on a scalable distributed architecture. Our negative selection algorithm detects 90% of all outbreaks with a false positive rate of 11.8% in a publicly available multimodal synthetic health record data set. The scalability and performance of the negative selection algorithm demonstrate that immune computation can provide effective approaches for national and global scale biosurveillance.
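The core negative selection idea can be sketched in a few lines. Generate random candidate detectors, discard any that lie within a matching radius of known-normal ("self") records, and flag new records that fall near a surviving detector. The version below adds bootstrap aggregation by training one detector set per bootstrap resample of the self data and taking a majority vote. The Euclidean matching rule, features scaled to [0, 1], the radius, and the detector and model counts are all assumptions for illustration, not the paper's parameters or implementation.

```python
import math
import random


def train_detectors(self_samples, n_detectors, radius, rng, max_tries=100_000):
    """Negative selection: keep random candidates that match no 'self' sample."""
    dim = len(self_samples[0])
    detectors = []
    for _ in range(max_tries):
        if len(detectors) == n_detectors:
            break
        candidate = [rng.random() for _ in range(dim)]  # features assumed scaled to [0, 1]
        if all(math.dist(candidate, s) > radius for s in self_samples):
            detectors.append(candidate)
    return detectors


def is_anomalous(x, detectors, radius):
    """A point is non-self if any detector lies within `radius` of it."""
    return any(math.dist(x, d) <= radius for d in detectors)


def bagged_negative_selection(self_samples, n_models=5, n_detectors=300,
                              radius=0.1, seed=0):
    """Train one detector set per bootstrap resample; predict by majority vote."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        boot = [rng.choice(self_samples) for _ in self_samples]
        models.append(train_detectors(boot, n_detectors, radius, rng))

    def predict(x):
        votes = sum(is_anomalous(x, m, radius) for m in models)
        return votes > n_models / 2

    return predict


if __name__ == "__main__":
    # Made-up 'normal' records clustered near (0.5, 0.5).
    normal = [(0.45 + 0.01 * i, 0.45 + 0.01 * j) for i in range(11) for j in range(11)]
    predict = bagged_negative_selection(normal, seed=1)
    print(predict((0.5, 0.5)), predict((0.2, 0.8)))
```

Each detector set is trained independently, so the loop over models is naturally parallel, which matches the distributed flavor described above. The majority vote damps false positives from any single resample that happened to omit part of the normal region.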

A Complex Systems Approach to More Resilient Multi-Layered Security Systems

Jones, Katherine A.; Bandlow, Alisa B.; Waddell, Lucas W.; Nozick, Linda K.; Levin, Drew L.; Brown, Nathanael J.

In July 2012, protestors cut through security fences and gained access to the Y-12 National Security Complex, whose multi-layered security system had been believed to be highly reliable. This report documents the results of a Laboratory Directed Research and Development (LDRD) project that created a consistent, robust mathematical framework using complex systems analysis algorithms and techniques to better understand the emergent behavior, vulnerabilities, and resiliency of multi-layered security systems subject to budget constraints and competing security priorities. Because security system performance has several dimensions and a range of attacks might occur, the framework is multi-objective, allowing a performance frontier to be estimated. This research explicitly uses the probability of intruder interruption given detection (PI) as the primary resilience metric. We demonstrate the utility of this framework with both notional and real-world examples of Physical Protection Systems (PPSs) and validate it using a well-established force-on-force simulation tool, Umbra.
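To show how an interruption probability and a cost/performance frontier fit together, here is a hedged sketch in the spirit of the classic EASI (Estimate of Adversary Sequence Interruption) calculation. It is not the report's framework. Along one intruder path, detection at a layer starts the response clock, and the response succeeds if it arrives before the intruder finishes the remaining delays. The layer names, probabilities, delays, normally distributed response time, and "detect at the start of a layer" rule are all assumptions. A small Pareto filter then keeps the options that no alternative beats on both cost and interruption probability.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    p_detect: float  # probability this layer detects the intruder
    delay: float     # seconds this layer delays the intruder


def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def prob_interruption(layers, response_mean, response_sd):
    """EASI-style estimate of P(interruption) along one intruder path.

    Simplification (an assumption): detection happens at the start of a layer,
    so the intruder still needs that layer's delay and every later one.
    The response time is modeled as Normal(response_mean, response_sd).
    """
    p_interrupt = 0.0
    p_undetected = 1.0
    for i, layer in enumerate(layers):
        remaining = sum(later.delay for later in layers[i:])
        p_in_time = normal_cdf(remaining, response_mean, response_sd)
        p_interrupt += p_undetected * layer.p_detect * p_in_time
        p_undetected *= 1.0 - layer.p_detect
    return p_interrupt


def pareto_front(options):
    """Keep (cost, p_i, label) options that no other option beats on both axes."""
    front = []
    for cost, p_i, label in options:
        dominated = any(c <= cost and p >= p_i and (c < cost or p > p_i)
                        for c, p, _ in options)
        if not dominated:
            front.append((cost, p_i, label))
    return sorted(front)


if __name__ == "__main__":
    base = [Layer("fence", 0.5, 30), Layer("door", 0.6, 90), Layer("vault", 0.9, 120)]
    better_fence = [Layer("fence", 0.9, 30)] + base[1:]
    options = [
        (0.0, prob_interruption(base, 200, 30), "baseline"),
        (1.0, prob_interruption(better_fence, 200, 30), "sensor upgrade"),
    ]
    print(pareto_front(options))
```

Early detection matters most in this model, because a detection late in the path leaves too little remaining delay for the response to arrive. A multi-objective framework can expose this kind of layer interaction.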
