Publications

Results 1–25 of 123

Search results

Partitioning Communication Streams Into Graph Snapshots

IEEE Transactions on Network Science and Engineering

Wendt, Jeremy D.; Field, Richard V.; Phillips, Cynthia A.; Prasadan, Arvind P.; Wilson, Tegan; Soundarajan, Sucheta; Bhowmick, Sanjukta

We present EASEE (Edge Advertisements into Snapshots using Evolving Expectations) for partitioning streaming communication data into static graph snapshots. Given streaming communication events (A talks to B), EASEE identifies when events suffice for a static graph (a snapshot). EASEE uses combinatorial statistical models to adaptively find when a snapshot is stable, while watching for significant data shifts, which indicate that a new snapshot should begin. If snapshots are not found carefully, they poorly represent the underlying data and downstream graph analytics fail; we show a community detection example. We demonstrate EASEE's strengths against several real-world datasets, and its accuracy against known-answer synthetic datasets. Results on the synthetic datasets show that (1) EASEE finds known-answer data shifts very quickly; and (2) ignoring these shifts drastically affects analytics on the resulting snapshots. We show that previous work misses these shifts. Further, we evaluate EASEE against seven real-world datasets (330K to 2.5B events), and find snapshot-over-time behaviors missed by previous works. Finally, we show that the resulting snapshots' measured properties (e.g., graph density) are altered by how snapshots are identified from the communication event stream. In particular, EASEE's snapshots do not generally 'densify' over time, contradicting previous influential results that used simpler partitioning methods.

A Decision Theoretic Approach To Optimizing Machine Learning Decisions with Prediction Uncertainty

Field, Richard V.; Darling, Michael C.

While the use of machine learning (ML) classifiers is widespread, their output is often not part of any follow-on decision-making process. To illustrate, consider the scenario where we have developed and trained an ML classifier to find malicious URL links. In this scenario, network administrators must decide whether to allow a computer user to visit a particular website, or to instead block access because the site is deemed malicious. It would be very beneficial if decisions such as these could be made automatically using a trained ML classifier. Unfortunately, due to a variety of reasons discussed herein, the output from these classifiers can be uncertain, rendering downstream decisions difficult. Herein, we provide a framework for: (1) quantifying and propagating uncertainty in ML classifiers; (2) formally linking ML outputs with the decision-making process; and (3) making optimal decisions for classification under uncertainty with single or multiple objectives.
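The link between a classifier's output and a downstream decision can be illustrated with a textbook minimum-expected-cost rule. The sketch below is not the framework from the report, whose details the abstract does not give; the cost table, the action names, and the functions `expected_cost` and `decide` are all hypothetical and chosen only to mirror the allow/block scenario described above.

```python
# Hypothetical cost table: COSTS[action][true_class].
# The numbers are illustrative assumptions, not values from the report.
COSTS = {
    "allow": {"benign": 0.0, "malicious": 10.0},  # allowing a malicious site is costly
    "block": {"benign": 1.0, "malicious": 0.0},   # blocking a benign site is a nuisance
}


def expected_cost(action, p_malicious, costs=COSTS):
    """Expected cost of an action, given the classifier's P(malicious)."""
    row = costs[action]
    return (1.0 - p_malicious) * row["benign"] + p_malicious * row["malicious"]


def decide(p_malicious, costs=COSTS):
    """Return the action with the smallest expected cost."""
    return min(costs, key=lambda action: expected_cost(action, p_malicious, costs))


if __name__ == "__main__":
    for p in (0.05, 0.20):
        print(p, decide(p))
```

With these assumed costs, the rule switches from allowing to blocking once the predicted probability of a malicious site exceeds 1/11, which shows how asymmetric error costs, rather than a fixed 0.5 threshold, can drive the decision.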

Decision Science for Machine Learning (DeSciML)

Darling, Michael C.; Field, Richard V.; Smith, Mark A.; Doak, Justin E.; Headen, James M.; Stracuzzi, David J.

The increasing use of machine learning (ML) models to support high-consequence decision making drives a need to increase the rigor of ML-based decision making. Critical problems ranging from climate change to nonproliferation monitoring rely on machine learning for aspects of their analyses. Likewise, future technologies, such as incorporation of data-driven methods into the stockpile surveillance and predictive failure analysis for weapons components, will all rely on decision making that incorporates the output of machine learning models. In this project, our main focus was the development of decision-scientific methods that combine uncertainty estimates for machine learning predictions with a domain-specific model of error costs. Other focus areas include uncertainty measurement in ML predictions, designing decision rules using multiobjective optimization, the value of uncertainty reduction, and decision-tailored uncertainty quantification for probability estimates. By laying foundations for rigorous decision making based on the predictions of machine learning models, these approaches are directly relevant to every national security mission that applies, or will apply, machine learning to data, most of which entail some decision context.

SAGE Intrusion Detection System: Sensitivity Analysis Guided Explainability for Machine Learning

Smith, Michael R.; Laros, James H.; Ames, Arlo L.; Carey, Alycia N.; Cueller, Christopher R.; Field, Richard V.; Maxfield, Trevor; Mitchell, Scott A.; Morris, Elizabeth S.; Moss, Blake C.; Nyre-Yu, Megan N.; Rushdi, Ahmad R.; Stites, Mallory C.; Smutz, Charles S.; Zhou, Xin Z.

This report details the results of a three-fold investigation of sensitivity analysis (SA) for machine learning (ML) explainability (MLE): (1) the mathematical assessment of the fidelity of an explanation with respect to a learned ML model, (2) quantifying the trustworthiness of a prediction, and (3) the impact of MLE on the efficiency of end users through multiple user studies. We focused on the cybersecurity domain, as the data is inherently non-intuitive. As ML is being used in an increasing number of domains, including domains where being wrong can have high consequences, MLE has been proposed as a means of generating trust in learned ML models among end users. However, little analysis has been performed to determine whether the explanations accurately represent the target model and whether they themselves should be trusted beyond subjective inspection. Current state-of-the-art MLE techniques only provide a list of important features based on heuristic measures and/or make assumptions about the data and the model that are not representative of real-world data and models. Further, most are designed without considering their usefulness to an end user in a broader context. To address these issues, we present a notion of explanation fidelity based on Shapley values from cooperative game theory. We find that all of the investigated MLE methods produce explanations that are incongruent with the ML model being explained. This is because they make critical assumptions about feature independence and linear feature interactions for computational reasons. We also find that, in deployment, explanations are rarely used for a variety of reasons, including that several other tools are trusted more than the explanations and that there is little incentive to use the explanations. In the cases when the explanations are used, we found that there is a danger that explanations persuade the end users to wrongly accept false positives and false negatives. However, ML model developers and maintainers find the explanations more useful for helping ensure that the ML model does not have obvious biases. In light of these findings, we suggest a number of future directions, including developing MLE methods that directly model non-linear model interactions and including design principles that take into account the usefulness of explanations to the end user. We also augment explanations with a set of trustworthiness measures that measure geometric aspects of the data to determine if the model output should be trusted.

Quantifying Graph Uncertainty from Communication Data

Wendt, Jeremy D.; Field, Richard V.; Phillips, Cynthia A.; Prasadan, Arvind P.

Graphs are a widely used abstraction for representing a variety of important real-world problems, including emulating cyber networks for situational awareness and studying social networks to understand human interactions or pandemic spread. Communication data is often converted into graphs to help understand social and technical patterns in the underlying communication data. However, prior to this project, little work had been performed analyzing how best to develop graphs from such data. Thus, analyses for many critical national security problems were being performed against graph representations of questionable quality. Herein, we describe our analyses that were precursors to our final statistically grounded technique for creating static graph snapshots from a stream of communication events. The first analyzes the statistical distribution properties of a variety of real-world communication datasets, which are generally fit best by Pareto, log-normal, and extreme value distributions. The second derives graph properties that can be estimated given the expected statistical distribution for communication events and the communication interval to be viewed: node observability, edge observability, and expected accuracy of node degree. Unfortunately, as that final process is under review for publication, we cannot publish it here at this time.

Applying Compression-Based Metrics to Seismic Data in Support of Global Nuclear Explosion Monitoring

Matzen, Laura E.; Ting, Christina T.; Field, Richard V.; Morrow, J.D.; Brogan, Ronald; Young, Christopher J.; Zhou, Angela; Trumbo, Michael C.; Coram, Jamie L.

The analysis of seismic data for evidence of possible nuclear explosion testing is a critical global security mission that relies heavily on human expertise to identify and mark seismic signals embedded in background noise. To assist analysts in making these determinations, we adapted two compression distance metrics for use with seismic data. First, we demonstrated that the Normalized Compression Distance (NCD) metric can be adapted for use with waveform data and can identify the arrival times of seismic signals. Then we tested an approximation for the NCD called Sliding Information Distance (SLID), which can be computed much faster than NCD. We assessed the accuracy of the SLID output by comparing it to both the Akaike Information Criterion (AIC) and the judgments of expert seismic analysts. Our results indicate that SLID effectively identifies arrival times and provides analysts with useful information that can aid their analysis process.
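For readers unfamiliar with the Normalized Compression Distance, the sketch below shows its standard definition, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed length. It assumes plain byte strings and uses `zlib` as the compressor; how the report quantizes waveform data and which compressor it uses are not stated in the abstract, so the helper names and inputs here are illustrative only.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (assumed compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance between two byte strings."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    rng = random.Random(0)
    periodic = b"abc" * 200                                  # highly regular data
    noise = bytes(rng.getrandbits(8) for _ in range(600))    # incompressible data
    print("periodic vs. itself:", ncd(periodic, periodic))
    print("periodic vs. noise: ", ncd(periodic, noise))
```

Values near 0 indicate that one input is largely predictable from the other, while values near 1 indicate little shared structure.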

Generalized Boundary Detection Using Compression-based Analytics

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

Ting, Christina T.; Field, Richard V.; Quach, Tu-Thach Q.; Bauer, Travis L.

We present a new method for boundary detection within sequential data using compression-based analytics. Our approach is to approximate the information distance between two adjacent sliding windows within the sequence. Large values in the distance metric are indicative of boundary locations. A new algorithm is developed, referred to as sliding information distance (SLID), that provides a fast, accurate, and robust approximation to the normalized information distance. A modified smoothed z-score algorithm is used to locate peaks in the distance metric, indicating boundary locations. A variety of data sources are considered, including text and audio, to demonstrate the efficacy of our approach.
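The idea of comparing two adjacent sliding windows can be sketched as follows. This is a simplified stand-in, not the SLID algorithm: it computes a zlib-based normalized compression distance between the windows on either side of each candidate position and reports the position with the largest distance, rather than applying the modified smoothed z-score peak detector. The function names, window size, and step are assumptions made for illustration.

```python
import zlib


def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance using zlib as the (assumed) compressor."""
    cx = len(zlib.compress(x, 9))
    cy = len(zlib.compress(y, 9))
    cxy = len(zlib.compress(x + y, 9))
    return (cxy - min(cx, cy)) / max(cx, cy)


def window_distances(data: bytes, window: int, step: int):
    """Return (position, distance) pairs for the two windows adjacent to each position."""
    pairs = []
    for i in range(window, len(data) - window + 1, step):
        left, right = data[i - window:i], data[i:i + window]
        pairs.append((i, ncd(left, right)))
    return pairs


def strongest_boundary(data: bytes, window: int = 100, step: int = 10) -> int:
    """Position whose adjacent windows differ most (simple argmax, not a z-score test)."""
    return max(window_distances(data, window, step), key=lambda pair: pair[1])[0]


if __name__ == "__main__":
    # Synthetic sequence whose pattern changes at byte offset 500.
    data = b"ab" * 250 + b"xyz" * 167
    print("estimated boundary near:", strongest_boundary(data))
```

A real sequence usually contains several boundaries, which is why a peak detector over the whole distance curve is preferable to a single argmax.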

Compression Analytics for Classification and Anomaly Detection Within Network Communication

IEEE Transactions on Information Forensics and Security

Ting, Christina T.; Field, Richard V.; Fisher, Andrew N.; Bauer, Travis L.

The flexibility of network communication within Internet protocols is fundamental to network function, yet this same flexibility permits the possibility of malicious use. In particular, malicious behavior can masquerade as benign traffic, thus evading systems designed to catch misuse of network resources. However, perfect imitation of benign traffic is difficult, meaning that small unintentional deviations from normal can occur. Identifying these deviations requires that the defenders know what features reveal malicious behavior. Herein, we present an application of compression-based analytics to network communication that can reduce the need for defenders to know a priori what features they need to examine. Motivating the approach is the idea that compression relies on the ability to discover and make use of predictable elements in information, thereby highlighting any deviations between expected and received content. We introduce a so-called 'slice compression' score to identify malicious or anomalous communication in two ways. First, we apply normalized compression distances to classification problems and discuss methods for reducing the noise by excising application content (as opposed to protocol features) using slice compression. Second, we present a new technique for anomaly detection, referred to as slice compression for anomaly detection. A diverse collection of datasets is analyzed to illustrate the efficacy of the proposed approaches. While our focus is network communication, other types of data are also considered to illustrate the generality of the method.

Probability Distribution of von Mises Stress in the Presence of Pre-load

Conference Proceedings of the Society for Experimental Mechanics Series

Segalman, Daniel J.; Reese, Garth M.; Field, Richard V.

Random vibration under preload is important in multiple endeavors, including those involving launch and re-entry. In these days of increasing reliance on predictive simulation, it is important to address this problem in a probabilistic manner; this is the appropriate flavor of quantification of margin and uncertainty in the context of random vibration. One of the quantities of particular interest in design is the probability distribution of von Mises stress. There are some methods in the literature that begin to address this problem, but they generally are extremely restricted; astonishingly, the most common restriction of these techniques is that they assume zero mean loads. The work presented here employs modal tools to suggest an approach for estimating the probability distributions for von Mises stress of a linear structure for the case of multiple independent Gaussian random loadings combined with a nonzero pre-load.

Quantifying the structural response of a slender cone to turbulent spots at Mach 6

AIAA Scitech 2019 Forum

Robbins, Brian A.; Casper, Katya M.; Coffin, Peter C.; Mesh, Mikhail M.; Field, Richard V.

A numerical study of the response of a conical structure to periodic turbulent spot loading at Mach 6 is conducted and compared with experimental results. First, a deterministic model is derived that describes the birthing of turbulent spots at a defined forcing frequency as well as the evolution of the spots. The model is then used to apply turbulent spot loading to a calibrated finite element model of a slender cone structure. The numerical solution yielded acceleration response data for the cone structure. These data are compared to experimental measurements. Similar damping times and acceleration amplitudes are observed for isolated spots. At higher frequencies of turbulent spot generation, the panel response corresponds to the structural natural mode shape being forced; however, only qualitative agreement is observed. Finally, the convection velocity for two cases is varied. It is shown that marginal deviations in the convection velocity of turbulent spots yield little change in the resulting response of a structure. This result illustrates that the time between spot events provides the dominant determination of which structural modes are excited.

A dynamic model for social networks

Field, Richard V.; Link, Hamilton E.; Skryzalin, Jacek S.; Wendt, Jeremy D.

Social network graph models are data structures representing entities (often people, corporations, or accounts) as "vertices" and their interactions as "edges" between pairs of vertices. These graphs are most often total-graph models: the overall structure of edges and vertices in a bidirectional or directional graph is described in global terms and the network is generated algorithmically. We are interested in "egocentric" or "agent-based" models of social networks, where the behavior of the individual participants is described and the graph itself is an emergent phenomenon. Our hope is that such graph models will allow us to ultimately reason from observations back to estimated properties of the individuals and populations, and result in not only more accurate algorithms for link prediction and friend recommendation, but also a more intuitive understanding of human behavior in such systems than is revealed by previous approaches. This report documents our preliminary work in this area; we describe several past graph models, two egocentric models of our own design, and our thoughts about the future direction of this research.
