The flexibility of network communication within Internet protocols is fundamental to network function, yet this same flexibility permits malicious use. In particular, malicious behavior can masquerade as benign traffic, thus evading systems designed to catch misuse of network resources. However, perfect imitation of benign traffic is difficult, meaning that small unintentional deviations from normal behavior can occur. Identifying these deviations requires that the defenders know what features reveal malicious behavior. Herein, we present an application of compression-based analytics to network communication that can reduce the need for defenders to know a priori what features they need to examine. Motivating the approach is the idea that compression relies on the ability to discover and make use of predictable elements in information, thereby highlighting any deviations between expected and received content. We introduce a so-called 'slice compression' score to identify malicious or anomalous communication in two ways. First, we apply normalized compression distances to classification problems and discuss methods for reducing the noise by excising application content (as opposed to protocol features) using slice compression. Second, we present a new technique for anomaly detection, referred to as slice compression for anomaly detection. A diverse collection of datasets is analyzed to illustrate the efficacy of the proposed approaches. While our focus is network communication, other types of data are also considered to illustrate the generality of the method.
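As a rough illustration of the normalized compression distance mentioned above, the following minimal sketch computes NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is a compressed length. The choice of zlib as the compressor and the toy payloads are assumptions made for this example; the abstract does not state which compressor or data were used, and the slice compression score itself is not reproduced here.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (assumed compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Hypothetical payloads: two similar HTTP-like requests and one unrelated blob.
http_a = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\nUser-Agent: demo\r\n\r\n" * 4
http_b = b"GET /about.html HTTP/1.1\r\nHost: example.com\r\nUser-Agent: demo\r\n\r\n" * 4
other = bytes(range(256))  # no repeated bytes, so little shared structure

print("similar requests:", round(ncd(http_a, http_b), 3))
print("unrelated data:  ", round(ncd(http_a, other), 3))
```

Inputs that share structure compress well together and score closer to 0, while unrelated inputs score closer to 1. With a real compressor the values are only approximate and can slightly exceed 1.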
We present a new method for boundary detection within sequential data using compression-based analytics. Our approach is to approximate the information distance between two adjacent sliding windows within the sequence. Large values in the distance metric are indicative of boundary locations. A new algorithm is developed, referred to as sliding information distance (SLID), that provides a fast, accurate, and robust approximation to the normalized information distance. A modified smoothed z-score algorithm is used to locate peaks in the distance metric, indicating boundary locations. A variety of data sources are considered, including text and audio, to demonstrate the efficacy of our approach.
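The sliding-window idea can be sketched as follows. This is not the SLID algorithm itself: it substitutes a zlib-based normalized compression distance for the approximation described above, and a plain global z-score threshold for the modified smoothed z-score. The window size, step, threshold, and synthetic sequence are all assumptions chosen for illustration.

```python
import statistics
import zlib


def csize(data: bytes) -> int:
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """zlib-based stand-in for the information distance between two windows."""
    cx, cy = csize(x), csize(y)
    return (csize(x + y) - min(cx, cy)) / max(cx, cy)


def window_distances(seq: bytes, window: int, step: int):
    """Distance between the windows just left and just right of each position."""
    return [(i, ncd(seq[i - window:i], seq[i:i + window]))
            for i in range(window, len(seq) - window + 1, step)]


def find_peaks(dists, z_thresh: float = 2.0):
    """Positions that are local maxima with a global z-score above z_thresh."""
    values = [d for _, d in dists]
    mean, std = statistics.mean(values), statistics.pstdev(values)
    if std == 0:
        return []
    return [dists[k][0] for k in range(1, len(values) - 1)
            if (values[k] - mean) / std > z_thresh
            and values[k] >= values[k - 1] and values[k] >= values[k + 1]]


# Synthetic sequence with one change of regime (hypothetical test data).
part_a = b"the quick brown fox jumps over the lazy dog. " * 40
part_b = b"0123456789 ABCDEFGHIJ 9876543210 " * 40
sequence = part_a + part_b

WINDOW, STEP = 200, 50
dists = window_distances(sequence, WINDOW, STEP)
peaks = find_peaks(dists)
print("true boundary at", len(part_a), "| candidate boundaries:", peaks)
```

Windows drawn from the same regime compress well together and give a small distance, while windows on opposite sides of the change share little structure, so the distance rises near the boundary.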
Ting, Christina; Awasthi, Neha; Hub, Jochen S.; Muller, Marcus
The formation and closure of aqueous pores in lipid bilayers is a key step in various biophysical processes. Large pores are well described by classical nucleation theory, but the free-energy landscape of small, biologically relevant pores has remained largely unexplored. The existence of small and metastable "prepores" was hypothesized decades ago from electroporation experiments, but resolving metastable prepores from theoretical models remained challenging. Using two complementary methods - atomistic simulations and self-consistent field theory of a minimal lipid model - we determine the parameters for which metastable prepores occur in lipid membranes. Both methods consistently suggest that pore metastability depends on the relative volume ratio between the lipid head group and lipid tails: lipids with a larger head-group volume fraction (or shorter saturated tails) form metastable prepores, whereas lipids with a smaller head-group volume fraction (or longer unsaturated tails) form unstable prepores.
Thousands of facilities worldwide are engaged in biological research activities. One of DTRA's missions is to fully understand the types of facilities involved in collecting, investigating, and storing biological materials. This characterization enables DTRA to increase situational awareness and identify potential partners focused on biodefense and biosecurity. As a result of this mission, DTRA created a database to identify biological facilities from publicly available, open-source information. This paper describes an on-going effort to automate data collection and entry of facilities into this database. To frame our analysis more concretely, we consider the following motivating question: How would a decision maker respond to a pathogen outbreak during the 2018 Winter Olympics in South Korea? To address this question, we aim to further characterize the existing South Korean facilities in DTRA's database, and to identify new candidate facilities for entry, so that decision makers can identify local facilities properly equipped to assist and respond to an event. We employ text and social analytics on bibliometric data from South Korean facilities and a list of select pathogen agents to identify patterns and relationships within scientific publication graphs.
On August 15, 2016, Sandia hosted a visit by Professor Venkatesh Narayanamurti, who is the Benjamin Peirce Research Professor of Technology and Public Policy at Harvard, a Board Member of the Belfer Center for Science and International Affairs, a former Dean of the School of Engineering and Applied Science at Harvard, a former Dean of Engineering at UC Santa Barbara, and a former Vice President of Division 1000 at Sandia. During the visit, a small, informal, all-day idea exploration session on "Towards an Engineering and Applied Science of Research" was conducted. This document is a brief synopsis or "footprint" of the presentations and discussions at this Idea Exploration Session. The intent of this document is to stimulate further discussion about pathways Sandia can take to improve its research practices.
In this work we extend compression-based algorithms for deception detection in text. In contrast to approaches that rely on theories of deception to identify feature sets, compression automatically identifies the most significant features. We consider two datasets that allow us to explore deception in opinion (content) and deception in identity (stylometry). Our first approach is to use unsupervised clustering based on a normalized compression distance (NCD) between documents. Our second approach is to use Prediction by Partial Matching (PPM) to train a classifier with conditional probabilities from labeled documents, followed by arithmetic coding (AC) to classify an unknown document based on which label gives the best compression. We find a significant dependence of the classifier on the relative volume of training data used to build the conditional probability distributions of the different labels. Methods are demonstrated to overcome the data-size dependence when analytics, not information transfer, is the goal. Our results indicate that deceptive text contains structure statistically distinct from truthful text, and that this structure can be automatically detected using compression-based algorithms.
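The "best compression wins" rule of the second approach can be illustrated with a small sketch. It does not implement PPM or arithmetic coding: as an assumed stand-in, it uses zlib and scores each label by how many extra compressed bytes the unknown document costs when appended to that label's training text. The labels and training snippets are hypothetical toy data, not the datasets described above.

```python
import zlib


def csize(data: bytes) -> int:
    return len(zlib.compress(data, 9))


def extra_bytes(corpus: bytes, doc: bytes) -> int:
    """Additional compressed bytes needed for doc once corpus has been seen."""
    return csize(corpus + doc) - csize(corpus)


def classify(doc: bytes, corpora: dict) -> str:
    """Return the label whose training text compresses doc best."""
    return min(corpora, key=lambda label: extra_bytes(corpora[label], doc))


# Hypothetical labeled training text (toy data for illustration only).
corpora = {
    "weather": (b"rain is expected in the morning with strong wind from the north. "
                b"cloudy skies and light rain in the evening. "),
    "cooking": (b"stir the sauce and add chopped onion with a pinch of salt. "
                b"bake the bread until the crust is golden brown. "),
}

unknown = b"light rain and strong wind from the north in the morning"
for label, text in corpora.items():
    print(label, extra_bytes(text, unknown))
print("predicted label:", classify(unknown, corpora))
```

Because the score depends on how much training text each label provides, a sketch like this is also sensitive to the relative volume of training data noted above.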
We perform molecular dynamics simulations of a coarse-grained model of ionomer melts in an applied oscillating electric field. The frequency-dependent conductivity and susceptibility are calculated directly from the current density and polarization density, respectively. At high frequencies, we find a peak in the real part of the conductivity due to plasma oscillations of the ions. At lower frequencies, the dynamic response of the ionomers depends on the ionic aggregate morphology in the system, which consists of either percolated or isolated aggregates. We show that the dynamic response of the model ionomers to the applied oscillating field can be understood by comparison with relevant time scales in the systems, obtained from independent calculations.