Publications

Preliminary Results for Using Uncertainty and Out-of-distribution Detection to Identify Unreliable Predictions

Doak, Justin E.; Darling, Michael C.

As machine learning (ML) models are deployed into an ever-diversifying set of application spaces, ranging from self-driving cars to cybersecurity to climate modeling, the need to carefully evaluate model credibility becomes increasingly important. Uncertainty quantification (UQ) provides important information about the ability of a learned model to make sound predictions, often with respect to individual test cases. However, most UQ methods for ML are themselves data-driven and therefore susceptible to the same knowledge gaps as the models they assess. Specifically, UQ helps to identify points near decision boundaries where the models fit the data poorly, yet predictions can score as certain for points that are under-represented by the training data and thus out-of-distribution (OOD). One method for evaluating the quality of both ML models and their associated uncertainty estimates is out-of-distribution detection (OODD). We combine OODD with UQ to provide insights into the reliability of the individual predictions made by an ML model.
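
As a rough illustration of the kind of approach described above, the sketch below combines a classifier's predictive entropy with an out-of-distribution detector to flag individual predictions as unreliable. It is a minimal Python/scikit-learn sketch, not the authors' implementation; the choice of IsolationForest as the OOD detector and the entropy threshold are assumptions for illustration only.

import numpy as np
from sklearn.ensemble import IsolationForest, RandomForestClassifier

def fit_components(X_train, y_train):
    # Any probabilistic classifier works here; a random forest is just an example.
    clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
    # OOD detector fit on the training inputs only (choice of detector is an assumption).
    ood = IsolationForest(random_state=0).fit(X_train)
    return clf, ood

def flag_unreliable(clf, ood, X, entropy_threshold=0.5):
    proba = clf.predict_proba(X)
    # Predictive entropy as a simple per-point uncertainty estimate.
    entropy = -np.sum(proba * np.log(proba + 1e-12), axis=1)
    # IsolationForest labels points it considers out-of-distribution as -1.
    is_ood = ood.predict(X) == -1
    # Treat a prediction as unreliable if it is uncertain OR the input looks OOD;
    # the entropy threshold is a placeholder, not a value from the paper.
    unreliable = (entropy > entropy_threshold) | is_ood
    return clf.predict(X), unreliable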

Decision Science for Machine Learning (DeSciML)

Darling, Michael C.; Field, Richard V.; Smith, Mark A.; Doak, Justin E.; Headen, James M.; Stracuzzi, David J.

The increasing use of machine learning (ML) models to support high-consequence decision making drives a need to increase the rigor of ML-based decision making. Critical problems ranging from climate change to nonproliferation monitoring rely on machine learning for aspects of their analyses. Likewise, future technologies, such as the incorporation of data-driven methods into stockpile surveillance and predictive failure analysis for weapons components, will rely on decision making that incorporates the output of machine learning models. In this project, our main focus was the development of decision-scientific methods that combine uncertainty estimates for machine learning predictions with a domain-specific model of error costs. Other focus areas include uncertainty measurement in ML predictions, designing decision rules using multiobjective optimization, the value of uncertainty reduction, and decision-tailored uncertainty quantification for probability estimates. By laying foundations for rigorous decision making based on the predictions of machine learning models, these approaches are directly relevant to every national security mission that applies, or will apply, machine learning to data, most of which entail some decision context.
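
The project's central idea, combining uncertainty estimates for ML predictions with a domain-specific model of error costs, can be illustrated with a minimal expected-cost decision rule. The sketch below is an illustration only, not the project's method; the two actions and the cost values are placeholders.

import numpy as np

# Hypothetical cost matrix, cost[action, true_class]; the numbers are placeholders.
# Action 0 = "dismiss", action 1 = "escalate"; class 0 = benign, class 1 = event.
COSTS = np.array([
    [0.0, 10.0],   # dismissing a true event is expensive (missed detection)
    [1.0,  0.0],   # escalating carries a small review cost
])

def decide(class_probs):
    """class_probs: array of shape (n_samples, n_classes) from any ML model."""
    # Expected cost of each action, weighted by the model's class probabilities.
    expected_cost = class_probs @ COSTS.T          # shape (n_samples, n_actions)
    return np.argmin(expected_cost, axis=1)        # minimum-expected-cost action

Uncertainty enters through the class probabilities: a more uncertain prediction shifts the expected costs and can change which action is chosen.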

Self-updating models with error remediation

Proceedings of SPIE - The International Society for Optical Engineering

Doak, Justin E.; Smith, Michael R.; Ingram, Joey

Many environments currently employ machine learning models for data processing and analytics that were built using a limited number of training data points. Once deployed, the models are exposed to significant amounts of previously unseen data, not all of which is representative of the original, limited training data. However, updating these deployed models can be difficult due to logistical, bandwidth, time, hardware, and/or data sensitivity constraints. We propose a framework, Self-Updating Models with Error Remediation (SUMER), in which a deployed model updates itself as new data becomes available. SUMER uses techniques from semi-supervised learning and noise remediation to iteratively retrain a deployed model, using intelligently chosen predictions from the model as the labels for new training iterations. A key component of SUMER is the notion of error remediation, since self-labeled data can be susceptible to the propagation of errors. We investigate the use of SUMER across various data sets and iterations. We find that self-updating models (SUMs) generally perform better than models that do not attempt to self-update when presented with additional previously unseen data. This performance gap is accentuated in cases where there is only a limited amount of initial training data. We also find that SUMER generally outperforms SUMs, demonstrating the benefit of applying error remediation. Consequently, SUMER can autonomously enhance the operational capabilities of existing data processing systems by intelligently updating models in dynamic environments.
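
A minimal sketch of the self-updating idea, assuming an already-fitted scikit-learn-style classifier: the model pseudo-labels new data, low-confidence pseudo-labels are discarded as a crude stand-in for the error-remediation step, and the model is retrained on the augmented set. This illustrates the general pattern, not the SUMER implementation; the confidence threshold and iteration count are placeholders.

import numpy as np
from sklearn.base import clone

def self_update(model, X_train, y_train, X_new, confidence=0.9, iterations=3):
    """model: an already-fitted scikit-learn classifier with predict_proba."""
    X_cur, y_cur = X_train, y_train
    for _ in range(iterations):
        if len(X_new) == 0:
            break
        proba = model.predict_proba(X_new)
        keep = proba.max(axis=1) >= confidence          # crude error-remediation proxy
        if not keep.any():
            break
        pseudo_labels = model.classes_[proba[keep].argmax(axis=1)]
        X_cur = np.vstack([X_cur, X_new[keep]])
        y_cur = np.concatenate([y_cur, pseudo_labels])
        model = clone(model).fit(X_cur, y_cur)          # retrain on augmented data
        X_new = X_new[~keep]                            # shrink the unlabeled pool
    return model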

Tracking Cyber Adversaries with Adaptive Indicators of Compromise

Proceedings - 2017 International Conference on Computational Science and Computational Intelligence, CSCI 2017

Doak, Justin E.; Ingram, Joey; Mulder, Samuel A.; Naegle, John H.; Cox, Jonathan A.; Aimone, James B.; Dixon, Kevin R.; James, Conrad D.; Follett, David R.

A forensics investigation after a breach often uncovers network and host indicators of compromise (IOCs) that can be deployed to sensors to allow early detection of the adversary in the future. Over time, the adversary will change tactics, techniques, and procedures (TTPs), which will also change the data they generate. If the IOCs are not kept up to date with the adversary's new TTPs, the adversary will no longer be detected once all of the IOCs become invalid. Tracking the Known (TTK) is the problem of keeping IOCs, in this case regular expressions (regexes), up to date with a dynamic adversary. Our framework solves the TTK problem in an automated, cyclic fashion to bracket a previously discovered adversary. This tracking is accomplished through a data-driven approach of self-adapting a given model based on its own detection capabilities. In our initial experiments, we found that the true positive rate (TPR) of the adaptive solution degrades much less significantly over time than that of the naïve solution, suggesting that self-updating the model allows the continued detection of positives (i.e., adversaries). The cost of this performance is in the false positive rate (FPR), which increases over time for the adaptive solution but remains constant for the naïve solution. However, the difference in overall detection performance between the two methods, as measured by the area under the curve (AUC), is negligible. This result suggests that self-updating the model over time should be done in practice to continue detecting known, evolving adversaries.
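
A simplified sketch of how regex IOCs might be evaluated against drifting adversary data over time, in the spirit of the framework described above. This is not the paper's code; the event format, labels, and the adaptation hook are assumptions for illustration.

import re

def evaluate_iocs(regexes, batches):
    """batches: iterable of (events, labels) per time window; labels mark adversary activity."""
    compiled = [re.compile(r) for r in regexes]
    history = []
    for events, labels in batches:
        hits = [any(p.search(e) for p in compiled) for e in events]
        tp = sum(1 for h, l in zip(hits, labels) if h and l)
        fp = sum(1 for h, l in zip(hits, labels) if h and not l)
        pos = sum(labels) or 1
        neg = (len(labels) - sum(labels)) or 1
        history.append({"tpr": tp / pos, "fpr": fp / neg})
        # An adaptive (TTK-style) variant would update `compiled` here based on this
        # window's detections; the naive variant leaves the regexes unchanged.
    return history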

Dynamic Analysis of Executables to Detect and Characterize Malware

Proceedings - 17th IEEE International Conference on Machine Learning and Applications, ICMLA 2018

Smith, Michael R.; Ingram, Joey; Lamb, Christopher L.; Draelos, Timothy J.; Doak, Justin E.; Aimone, James B.; James, Conrad D.

Malware detection and remediation is an ongoing task for computer security and IT professionals. Here, we examine the use of neural algorithms to detect malware using the system calls generated by executables, which blunts attempts at obfuscation because the behavior of the executable, rather than its static content, is monitored. We examine several deep learning techniques and liquid state machines, baselined against a random forest. The experiments examine the effects of concept drift to understand how well the algorithms generalize to novel malware samples, testing them on data collected after the training data. The results suggest that each of the examined machine learning algorithms is a viable solution for detecting malware, achieving between 90% and 95% class-averaged accuracy (CAA). In real-world scenarios, however, the performance on an operational network may not match the performance achieved during training. Namely, the CAA may be about the same, but the values for precision and recall over the malware class can change significantly. We structure experiments to highlight these caveats and offer insights into expected performance in operational environments. In addition, we use the induced models to better understand what differentiates malware samples from goodware, which can further be used as a forensics tool to provide directions for investigation and remediation.
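
As one concrete illustration of a random-forest baseline on system-call data, the sketch below represents each executable by n-gram counts over its system-call sequence and reports class-averaged accuracy (balanced accuracy). The feature representation and hyperparameters are assumptions, not the paper's exact setup.

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import balanced_accuracy_score
from sklearn.pipeline import make_pipeline

def train_syscall_model(train_traces, train_labels):
    """train_traces: list of strings, each a space-separated system-call sequence."""
    model = make_pipeline(
        CountVectorizer(ngram_range=(1, 3)),                       # system-call n-grams
        RandomForestClassifier(n_estimators=300, random_state=0),  # baseline classifier
    )
    return model.fit(train_traces, train_labels)

def class_averaged_accuracy(model, test_traces, test_labels):
    # Evaluate on data collected after the training window to expose concept drift.
    return balanced_accuracy_score(test_labels, model.predict(test_traces))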

Temporal Cyber Attack Detection

Ingram, Joey; Draelos, Timothy J.; Sahakian, Meghan A.; Doak, Justin E.

Rigorous characterization of the performance and generalization ability of cyber defense systems is extremely difficult, making it hard to gauge uncertainty and, thus, confidence. This difficulty largely stems from a lack of labeled attack data that fully explores the potential adversarial space. Currently, the performance of cyber defense systems is typically evaluated in a qualitative manner, by manually inspecting the results of the system on live data and adjusting as needed. Additionally, machine learning has shown promise in deriving models that automatically learn indicators of compromise that are more robust than analyst-derived detectors. However, to generate these models, most algorithms require large amounts of labeled data (i.e., examples of attacks). Algorithms that do not require annotated data to derive models are similarly at a disadvantage, because labeled data is still necessary when evaluating performance. In this work, we explore the use of temporal generative models to learn cyber attack graph representations and to automatically generate data for experimentation and evaluation. Training and evaluating cyber systems and machine learning models requires significant amounts of annotated data, which is typically collected and labeled by hand for one-off experiments. Automatically generating such data helps to derive and evaluate detection models and ensures reproducibility of results. Experimentally, we demonstrate the efficacy of generative sequence analysis techniques for learning the structure of attack graphs, based on a realistic example. These derived models can then be used to generate more data. Additionally, we provide a roadmap for future research efforts in this area.
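
As a minimal stand-in for the temporal generative models discussed above, the sketch below fits a first-order Markov chain over event sequences and samples synthetic sequences from it. The models in the work itself may be considerably more expressive, so treat this purely as an illustration of the generate-data-from-learned-structure idea.

import random
from collections import defaultdict

def fit_markov(sequences):
    """sequences: list of event-label lists, e.g. steps observed in attack traces."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1
    # Normalize transition counts into probabilities.
    return {s: {t: c / sum(nxts.values()) for t, c in nxts.items()}
            for s, nxts in counts.items()}

def generate(model, start, length, seed=0):
    """Sample a synthetic sequence for experimentation and evaluation."""
    rng = random.Random(seed)
    seq = [start]
    while len(seq) < length and seq[-1] in model:
        nxts = model[seq[-1]]
        seq.append(rng.choices(list(nxts), weights=list(nxts.values()))[0])
    return seq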

Learning to rank for alert triage

2016 IEEE Symposium on Technologies for Homeland Security, HST 2016

Bierma, Michael B.; Doak, Justin E.; Hudson, Corey H.

As cyber monitoring capabilities expand and data rates increase, cyber security analysts must filter through an increasing number of alerts in order to identify potential intrusions on the network. This process is often manual and time-consuming, which limits the number of alerts an analyst can process. Generating a vast number of alerts without any kind of ranking or prioritization often leads to alert desensitization [1]: the phenomenon where competent analysts become so numbed by the barrage of false positives that they are unable to identify the true positives, leading to unfortunate breaches. Our goal is to alleviate alert desensitization by placing the most important alerts at the front of the queue. With less time and energy expended investigating false positives, critical alerts are less likely to be overlooked, allowing timely responses to potential breaches. This paper discusses the use of supervised machine learning to rank these cyber security alerts so that an analyst's time and energy are focused on the most important alerts.
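
One simple way to realize this idea, assuming historical alerts labeled by analysts as true or false positives, is to train a probabilistic classifier and order new alerts by their predicted probability of being true positives. The sketch below is illustrative and is not necessarily the paper's exact learning-to-rank formulation.

from sklearn.ensemble import GradientBoostingClassifier

def train_alert_ranker(alert_features, analyst_labels):
    """analyst_labels: 1 for confirmed incidents, 0 for dismissed (false-positive) alerts."""
    return GradientBoostingClassifier(random_state=0).fit(alert_features, analyst_labels)

def triage_order(model, new_alert_features):
    # Score each new alert by its predicted probability of being a true positive
    # and return indices ordered so the highest-priority alerts come first.
    scores = model.predict_proba(new_alert_features)[:, 1]
    return scores.argsort()[::-1]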

Dynamic Defense Workshop: From Research to Practice

Haas, Jason J.; Doak, Justin E.; Crosby, Sean M.; Helinski, Ryan H.; Lamb, Christopher L.

On September 5th and 6th, 2012, the Dynamic Defense Workshop: From Research to Practice brought together researchers from academia, industry, and Sandia with the goals of increasing collaboration between Sandia National Laboratories and external organizations, defining and understanding dynamic, or moving target, defense concepts and directions, and gaining a greater understanding of the state of the art for dynamic defense. Through the workshop, we broadened and refined our definition and understanding, identified new approaches to inherent challenges, and defined principles of dynamic defense. Half of the workshop was devoted to presentations of current state-of-the-art work. Presentation topics included the failure of current defenses, threats, techniques, goals of dynamic defense, theory and foundations of dynamic defense, and future directions and open research questions related to dynamic defense. The remainder of the workshop consisted of discussion, broken down into sessions on defining challenges, applications to host or mobile environments, applications to enterprise network environments, exploring research and operational taxonomies, and determining how to apply scientific rigor to, and how to investigate, the field of dynamic defense.
