Publications

25 Results

Software Verification Toolkit (SVT): Survey on Available Software Verification Tools and Future Direction

Davis, Nickolas A.; Berger, Taylor E.; McDonald, Arthur A.; Ingram, Joey; Foster, James D.; Sanchez, Katherine A.

Writing software is difficult. Writing complex, well-designed, well-tested, and functionally correct software is incredibly difficult. An entire field of study is devoted to the validation and verification of software to address this problem, and in this paper we analyze the landscape of currently available third-party software verification tools. We have divided our analysis into three subsections: formal methods, static analysis, and test generation. Formal verification is the most complex method for validating software correctness, but also the most thorough, as it establishes the mathematical validity of the source code. Static analysis generally relies on abstract syntax tree traversal to find faults such as memory leaks or stack overflows. Automatic test generation is similar in implementation to static analysis, but goes further by verifying the boundedness of function inputs and outputs against annotated or parsed criteria. The crux of this report is to analyze and describe the software tools that implement these techniques to validate and verify software. Pros and cons related to installation, utilization, and capabilities of the frameworks are described, and reproducible examples are provided with a focus on usability. The initial survey concluded that the most notable tools are Z3, Isabelle/HOL, and TLA+ for formal verification, and Infer, Frama-C, and SonarQube for static analysis. With these tools in mind, a final conjecture describes future avenues for utilizing these tools to develop a verification framework to assist in validating existing software at Sandia National Laboratories.
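The abstract's characterization of static analysis as abstract-syntax-tree traversal can be illustrated in a few lines. This toy checker, written with Python's stdlib `ast` module, flags bare `except:` handlers; the rule and the `find_bare_excepts` name are illustrative inventions, not taken from the paper or from tools such as Infer or Frama-C:

```python
import ast

def find_bare_excepts(source: str) -> list:
    """Walk the AST of `source` and return the line numbers of bare
    `except:` handlers -- a pattern commonly flagged by static analyzers
    because it silently swallows all exceptions."""
    tree = ast.parse(source)
    findings = []
    for node in ast.walk(tree):
        if isinstance(node, ast.ExceptHandler) and node.type is None:
            findings.append(node.lineno)
    return findings

sample = """
try:
    risky()
except:
    pass
"""
print(find_bare_excepts(sample))  # -> [4]
```

Real static analyzers combine many such AST rules with data-flow analysis, but the traversal skeleton is the same.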

Mind the Gap: On Bridging the Semantic Gap between Machine Learning and Malware Analysis

AISec 2020 - Proceedings of the 13th ACM Workshop on Artificial Intelligence and Security

Smith, Michael R.; Johnson, Nicholas T.; Ingram, Joey; Carbajal, Armida J.; Haus, Bridget I.; Domschot, Eva; Ramyaa, Ramyaa; Lamb, Christopher L.; Verzi, Stephen J.; Kegelmeyer, William P.

Machine learning (ML) techniques are being used to detect increasing amounts of malware and variants. Despite successful applications of ML, we hypothesize that the full potential of ML is not realized in malware analysis (MA) due to a semantic gap between the ML and MA communities, as demonstrated in the data that is used. Due in part to the available data, ML has primarily focused on detection whereas MA is also interested in identifying behaviors. We review existing open-source malware datasets used in ML and find a lack of behavioral information that could facilitate stronger impact by ML in MA. As a first step in bridging this gap, we label existing data with behavioral information using open-source MA reports: 1) altering the analysis from identifying malware to identifying behaviors, 2) aligning ML better with MA, and 3) allowing ML models to generalize to novel malware in a zero/few-shot learning manner. We classify the behavior of a malware family not seen during training using transfer learning from a state-of-the-art model for malware family classification and achieve 57%-84% accuracy on behavioral identification but fail to outperform the baseline set by a majority class predictor. This highlights opportunities for improvement on this task related to the data representation, the need for malware specific ML techniques, and a larger training set of malware samples labeled with behaviors.
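The majority-class predictor that the transfer-learned model failed to beat is a standard sanity baseline; a minimal sketch of computing it (the behavior labels here are invented examples, not the paper's taxonomy):

```python
from collections import Counter

def majority_baseline_accuracy(train_labels, test_labels):
    """Accuracy of always predicting the most common training label --
    the floor any behavior classifier must beat to add value."""
    majority = Counter(train_labels).most_common(1)[0][0]
    correct = sum(1 for y in test_labels if y == majority)
    return correct / len(test_labels)

train = ["persistence", "persistence", "exfiltration", "persistence"]
test = ["persistence", "exfiltration", "persistence", "persistence"]
print(majority_baseline_accuracy(train, test))  # -> 0.75
```

When behavior labels are heavily imbalanced, this baseline can be high, which is why a 57%-84% accurate model can still fail to outperform it.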

Tracking Cyber Adversaries with Adaptive Indicators of Compromise

Proceedings - 2017 International Conference on Computational Science and Computational Intelligence, CSCI 2017

Doak, Justin E.; Ingram, Joey; Mulder, Samuel A.; Naegle, John H.; Cox, Jonathan A.; Aimone, James B.; Dixon, Kevin R.; James, Conrad D.; Follett, David R.

A forensics investigation after a breach often uncovers network and host indicators of compromise (IOCs) that can be deployed to sensors to allow early detection of the adversary in the future. Over time, the adversary will change tactics, techniques, and procedures (TTPs), which will also change the data generated. If the IOCs are not kept up-to-date with the adversary's new TTPs, the adversary will no longer be detected once all of the IOCs become invalid. Tracking the Known (TTK) is the problem of keeping IOCs, in this case regular expressions (regexes), up-to-date with a dynamic adversary. Our framework solves the TTK problem in an automated, cyclic fashion to bracket a previously discovered adversary. This tracking is accomplished through a data-driven approach of self-adapting a given model based on its own detection capabilities. In our initial experiments, we found that the true positive rate (TPR) of the adaptive solution degrades much less significantly over time than the naïve solution, suggesting that self-updating the model allows the continued detection of positives (i.e., adversaries). The cost for this performance is in the false positive rate (FPR), which increases over time for the adaptive solution, but remains constant for the naïve solution. However, the difference in overall detection performance, as measured by the area under the curve (AUC), between the two methods is negligible. This result suggests that self-updating the model over time should be done in practice to continue to detect known, evolving adversaries.
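The core idea of a regex IOC that updates itself as the adversary's TTPs drift can be sketched with Python's `re` module. The `AdaptiveIOC` class and the sample domains below are hypothetical illustrations; the paper's framework adapts the model from its own detection output in an automated cycle, not via this simple append rule:

```python
import re

class AdaptiveIOC:
    """Toy self-updating indicator: a regex alternation over known
    malicious tokens, extended when a confirmed-malicious sample
    slips past the current pattern."""

    def __init__(self, patterns):
        self.patterns = list(patterns)

    def matches(self, sample: str) -> bool:
        # Alternation over all known tokens.
        return re.search("|".join(self.patterns), sample) is not None

    def adapt(self, confirmed_sample: str):
        # If a confirmed-malicious sample evaded detection,
        # fold its literal form into the indicator.
        if not self.matches(confirmed_sample):
            self.patterns.append(re.escape(confirmed_sample))

ioc = AdaptiveIOC([r"evil\.example\.com"])
print(ioc.matches("GET evil.example.com/payload"))  # True
ioc.adapt("evil-cdn.example.net")                   # adversary changed TTPs
print(ioc.matches("POST evil-cdn.example.net/a"))   # True after update
```

Appending literals like this keeps the TPR up as the adversary evolves, at the cost of a broader pattern that can raise the FPR over time, which mirrors the trade-off the abstract reports.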

Temporal Cyber Attack Detection

Ingram, Joey; Draelos, Timothy J.; Sahakian, Meghan A.; Doak, Justin E.

Rigorous characterization of the performance and generalization ability of cyber defense systems is extremely difficult, making it hard to gauge uncertainty, and thus, confidence. This difficulty largely stems from a lack of labeled attack data that fully explores the potential adversarial space. Currently, performance of cyber defense systems is typically evaluated in a qualitative manner by manually inspecting the results of the system on live data and adjusting as needed. Additionally, machine learning has shown promise in deriving models that automatically learn indicators of compromise that are more robust than analyst-derived detectors. However, to generate these models, most algorithms require large amounts of labeled data (i.e., examples of attacks). Algorithms that do not require annotated data to derive models are similarly at a disadvantage, because labeled data is still necessary when evaluating performance. In this work, we explore the use of temporal generative models to learn cyber attack graph representations and automatically generate data for experimentation and evaluation. Training and evaluating cyber systems and machine learning models requires significant, annotated data, which is typically collected and labeled by hand for one-off experiments. Automatically generating such data helps derive/evaluate detection models and ensures reproducibility of results. Experimentally, we demonstrate the efficacy of generative sequence analysis techniques on learning the structure of attack graphs, based on a realistic example. These derived models can then be used to generate more data. Additionally, we provide a roadmap for future research efforts in this area.
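A first-order Markov chain is a far simpler stand-in for the temporal generative models the abstract describes, but it shows the same fit-then-generate workflow on attack-step sequences (the step names are invented for illustration):

```python
import random
from collections import defaultdict

def fit_transitions(sequences):
    """Estimate first-order transition counts from observed
    attack-step sequences."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return counts

def generate(counts, start, steps, rng):
    """Sample a synthetic sequence by walking the learned transitions."""
    seq, state = [start], start
    for _ in range(steps):
        nxt = rng.choices(list(counts[state]),
                          weights=list(counts[state].values()))[0]
        seq.append(nxt)
        state = nxt
    return seq

observed = [["recon", "exploit", "install", "exfil"],
            ["recon", "exploit", "exfil"]]
counts = fit_transitions(observed)
synthetic = generate(counts, "recon", 2, random.Random(0))
print(synthetic)  # e.g. ['recon', 'exploit', ...]
```

The generated sequences can then serve as labeled attack data for training and evaluating detectors, which is the reproducibility benefit the abstract emphasizes.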

Statistical Techniques For Real-time Anomaly Detection Using Spark Over Multi-source VMware Performance Data

Sandia journal manuscript; not yet accepted for publication

Solaimani, Mohiuddin S.; Iftekhar, Mohammed I.; Khan, Latifur K.; Thuraisingham, Bhavani T.; Ingram, Joey

Anomaly detection refers to the identification of an irregular or unusual pattern which deviates from what is standard, normal, or expected. Such deviated patterns typically correspond to samples of interest and are assigned different labels in different domains, such as outliers, anomalies, exceptions, or malware. Detecting anomalies in fast, voluminous streams of data is a formidable challenge. This paper presents a novel, generic, real-time distributed anomaly detection framework for heterogeneous streaming data where anomalies appear as a group. We have developed a distributed statistical approach to build a model and later use it to detect anomalies. As a case study, we investigate group anomaly detection for a VMware-based cloud data center, which maintains a large number of virtual machines (VMs). We have built our framework using Apache Spark to get higher throughput and lower data processing time on streaming data. We have developed a window-based statistical anomaly detection technique to detect anomalies that appear sporadically. We then relaxed this constraint, with higher accuracy, by implementing a cluster-based technique to detect sporadic and continuous anomalies. We conclude that our cluster-based technique outperforms other statistical techniques with higher accuracy and lower processing time.
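A window-based statistical detector of the kind the abstract describes can be sketched in a few lines. This single-machine toy (the threshold rule and sample values are my own; it omits the Spark distribution and the cluster-based refinement) flags points far from the trailing window's mean:

```python
from collections import deque
from statistics import mean, stdev

def window_anomalies(stream, window=5, k=3.0):
    """Flag indices whose value is more than k standard deviations
    from the mean of the trailing window of observations."""
    history = deque(maxlen=window)
    flagged = []
    for i, x in enumerate(stream):
        if len(history) == window:
            mu, sigma = mean(history), stdev(history)
            if sigma > 0 and abs(x - mu) > k * sigma:
                flagged.append(i)
        history.append(x)
    return flagged

# Hypothetical per-VM CPU-load samples with one spike.
cpu_load = [10, 11, 9, 10, 12, 11, 10, 95, 10, 11]
print(window_anomalies(cpu_load))  # -> [7]
```

Note that once the spike enters the window it inflates the standard deviation and masks nearby points, one reason a cluster-based technique can improve on a pure windowed threshold.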

Streaming malware classification in the presence of concept drift and class imbalance

Proceedings - 2013 12th International Conference on Machine Learning and Applications, ICMLA 2013

Kegelmeyer, William P.; Chiang, Ken C.; Ingram, Joey

Malware, or malicious software, is capable of performing any action or command that can be expressed in code and is typically used for illicit activities, such as e-mail spamming, corporate espionage, and identity theft. Most organizations rely on anti-virus software to identify malware, which typically utilizes signatures that can only identify previously-seen malware instances. We consider the detection of malware executables that are downloaded in streaming network data as a supervised machine learning problem. Using malware data collected over multiple years, we characterize the effect of concept drift and class imbalance on batch and streaming decision tree ensembles. In particular, we illustrate a surprising vulnerability generated by precisely the aspect of streaming methods that seemed most likely to help them, when compared to batch methods. © 2013 IEEE.
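Streaming classifiers like those studied here are typically evaluated prequentially (test-then-train): each arriving sample is first predicted, then its true label is revealed for learning. A toy majority-label predictor with a sliding window illustrates the loop and how a drift point hurts accuracy until the window catches up (the paper uses decision tree ensembles, not this stand-in):

```python
from collections import Counter

def prequential_accuracy(stream, window=100):
    """Test-then-train evaluation with a windowed majority-label
    predictor; the window lets the model track concept drift."""
    recent, correct = [], 0
    for label in stream:
        if recent:
            pred = Counter(recent).most_common(1)[0][0]
            correct += (pred == label)
        recent.append(label)
        recent = recent[-window:]   # forget old concept
    return correct / (len(stream) - 1)

# Benign-dominated traffic, then drift toward malware.
stream = ["benign"] * 6 + ["malware"] * 4
print(round(prequential_accuracy(stream, window=3), 2))  # -> 0.78
```

With severe class imbalance, the window rarely contains minority-class (malware) samples, which is one way imbalance and drift interact badly for streaming learners.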

GPU accelerated microarray data analysis using random matrix theory

Proceedings - 2011 IEEE International Conference on High Performance Computing and Communications (HPCC 2011), including the FTDCS 2011, UIC 2011, and ATC 2011 workshops

Ingram, Joey; Zhu, Mengxia

Recent advances in high-throughput genomic technology, such as microarrays, usually produce vast amounts of gene expression data under many experimental conditions. Analyzing such data is often difficult due to the colossal data size and the intensive computing involved. In addition, many existing analysis tools often require the inference of experienced analysts and subjective judgments. In this paper, we developed a parallel approach based on Random Matrix Theory (RMT) to generate transcription networks using Graphics Processing Units (GPUs). Recently, GPUs have been redesigned into a more unified architecture, which has allowed them to be used more readily in general purpose computing. This architectural advancement has resulted in GPUs becoming easily programmable parallel processors with performance that is vastly superior to CPUs. Our GPU-based approach makes automated microarray data analysis faster, more accurate, and noise resistant without engaging remote high performance computing facilities, such as a cluster or supercomputer. The implementation moves some computationally intensive tasks, such as the calculations of Pearson correlation coefficients, tridiagonal reduction, back transformation of eigenvectors, and orthogonal rotation, to the GPU. Experimental results on real microarray datasets show that our GPU implementation runs faster than a CPU version using highly optimized LAPACK routines. The runtime speedup gets higher as the number of genes and sample points in a microarray dataset increases. © 2011 IEEE.
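Pearson correlation between expression profiles is one of the steps the paper offloads to the GPU; a plain-Python version shows the per-pair computation that the GPU parallelizes across all gene pairs (the expression values below are invented):

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two expression profiles.
    Building a transcription network requires this for every gene pair,
    which is why the all-pairs computation suits GPU parallelism."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

gene_a = [2.1, 3.5, 4.0, 5.2]   # expression across four conditions
gene_b = [1.0, 2.2, 2.9, 4.1]   # rises with gene_a
print(round(pearson(gene_a, gene_b), 3))  # -> 0.997
```

For m genes this step is O(m^2) pairwise computations, each independent, so the runtime speedup from parallel hardware naturally grows with the number of genes and sample points, consistent with the scaling the abstract reports.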
