Publications

Results 26–50 of 66

Search results

Jump to search filters

Trajectory analysis via a geometric feature space approach

Statistical Analysis and Data Mining

Laros, James H.; Wilson, Andrew T.

This study aimed to organize a body of trajectories in order to identify, search for and classify both common and uncommon behaviors among objects such as aircraft and ships. Existing comparison functions such as the Fréchet distance are computationally expensive and yield counterintuitive results in some cases. We propose an approach using feature vectors whose components represent succinctly the salient information in trajectories. These features incorporate basic information such as the total distance traveled and the distance between start/stop points as well as geometric features related to the properties of the convex hull, trajectory curvature and general distance geometry. Additionally, these features can generally be mapped easily to behaviors of interest to humans who are searching large databases. Most of these geometric features are invariant under rigid transformation. We demonstrate the use of different subsets of these features to identify trajectories similar to an exemplar, cluster a database of several hundred thousand trajectories and identify outliers.

More Details

PANTHER. Trajectory Analysis

Laros, James H.; Wilson, Andrew T.; Valicka, Christopher G.; Kegelmeyer, William P.; Shead, Timothy M.; Czuchlewski, Kristina R.; Newton, Benjamin D.

We want to organize a body of trajectories in order to identify, search for, classify and predict behavior among objects such as aircraft and ships. Existing compari- son functions such as the Fr'echet distance are computationally expensive and yield counterintuitive results in some cases. We propose an approach using feature vectors whose components represent succinctly the salient information in trajectories. These features incorporate basic information such as total distance traveled and distance be- tween start/stop points as well as geometric features related to the properties of the convex hull, trajectory curvature and general distance geometry. Additionally, these features can generally be mapped easily to behaviors of interest to humans that are searching large databases. Most of these geometric features are invariant under rigid transformation. We demonstrate the use of different subsets of these features to iden- tify trajectories similar to an exemplar, cluster a database of several hundred thousand trajectories, predict destination and apply unsupervised machine learning algorithms.

More Details

Nested Narratives Final Report

Wilson, Andrew T.; Pattengale, Nicholas D.; Forsythe, James C.; Carvey, Brad

In cybersecurity forensics and incident response, the story of what has happened is the most important artifact yet the one least supported by tools and techniques. Existing tools focus on gathering and manipulating low-level data to allow an analyst to investigate exactly what happened on a host system or a network. Higher-level analysis is usually left to whatever ad hoc tools and techniques an individual may have developed. We discuss visual representations of narrative in the context of cybersecurity incidents with an eye toward multi-scale illustration of actions and actors. We envision that this representation could smoothly encompass individual packets on a wire at the lowest level and nation-state-level actors at the highest. We present progress to date, discuss the impact of technical risk on this project and highlight opportunities for future work.

More Details

Factors contributing to performance for cyber security forensic analysis

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Hopkins, Shelby E.; Wilson, Andrew T.; Silva, Austin R.; Forsythe, James C.

Previously, the current authors (Hopkins et al. 2015) described research in which subjects provided a tool that facilitated their construction of a narrative account of events performed better in conducting cyber security forensic analysis. The narrative tool offered several distinct features. In the current paper, an analysis is reported that considered which features of the tool contributed to superior performance. This analysis revealed two features that accounted for a statistically significant portion of the variance in performance. The first feature provided a mechanism for subjects to identify suspected perpetrators of the crimes and their motives. The second feature involved the ability to create an annotated visuospatial diagram of clues regarding the crimes and their relationships to one another. Based on these results, guidance may be provided for the development of software tools meant to aid cyber security professionals in conducting forensic analysis.

More Details

Investigating the integration of supercomputers and data-Warehouse appliances

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Oldfield, Ron A.; Davidson, George; Ulmer, Craig D.; Wilson, Andrew T.

Two decades of experience with massively parallel supercomputing has given insight into the problem domains where these architectures are cost effective. Likewise experience with database machines and more recently massively parallel database appliances has shown where these architectures are valuable. Combining both architectures to simultaneously solve problems has received much less attention. In this paper, we describe a motivating application for economic modeling that requires both HPC and database capabilities. Then we discuss hardware and software integration issues related to a direct integration of a Cray XT supercomputer and a Netezza database appliance. © 2014 Springer-Verlag Berlin Heidelberg.

More Details

Evaluating parallel relational databases for medical data analysis

Wilson, Andrew T.; Rintoul, Mark D.

Hospitals have always generated and consumed large amounts of data concerning patients, treatment and outcomes. As computers and networks have permeated the hospital environment it has become feasible to collect and organize all of this data. This raises naturally the question of how to deal with the resulting mountain of information. In this report we detail a proof-of-concept test using two commercially available parallel database systems to analyze a set of real, de-identified medical records. We examine database scalability as data sizes increase as well as responsiveness under load from multiple users.

More Details

Tracking topic birth and death in LDA

Wilson, Andrew T.; Robinson, David G.

Most topic modeling algorithms that address the evolution of documents over time use the same number of topics at all times. This obscures the common occurrence in the data where new subjects arise and old ones diminish or disappear entirely. We propose an algorithm to model the birth and death of topics within an LDA-like framework. The user selects an initial number of topics, after which new topics are created and retired without further supervision. Our approach also accommodates many of the acceleration and parallelization schemes developed in recent years for standard LDA. In recent years, topic modeling algorithms such as latent semantic analysis (LSA)[17], latent Dirichlet allocation (LDA)[10] and their descendants have offered a powerful way to explore and interrogate corpora far too large for any human to grasp without assistance. Using such algorithms we are able to search for similar documents, model and track the volume of topics over time, search for correlated topics or model them with a hierarchy. Most of these algorithms are intended for use with static corpora where the number of documents and the size of the vocabulary are known in advance. Moreover, almost all current topic modeling algorithms fix the number of topics as one of the input parameters and keep it fixed across the entire corpus. While this is appropriate for static corpora, it becomes a serious handicap when analyzing time-varying data sets where topics come and go as a matter of course. This is doubly true for online algorithms that may not have the option of revising earlier results in light of new data. To be sure, these algorithms will account for changing data one way or another, but without the ability to adapt to structural changes such as entirely new topics they may do so in counterintuitive ways.

More Details
Results 26–50 of 66
Results 26–50 of 66