Publications

Results 26–50 of 66
Skip to search filters

Time Series Discord Detection in Medical Data using a Parallel Relational Database

Woodbridge, Diane W.; Rintoul, Mark D.; Wilson, Andrew T.; Goldstein, Richard H.

Recent advances in sensor technology have made continuous real-time health monitoring available in both hospital and non-hospital settings. Since data collected from high frequency medical sensors includes a huge amount of data, storing and processing continuous medical data is an emerging big data area. Especially detecting anomaly in real time is important for patients’ emergency detection and prevention. A time series discord indicates a subsequence that has the maximum difference to the rest of the time series subsequences, meaning that it has abnormal or unusual data trends. In this study, we implemented two versions of time series discord detection algorithms on a high performance parallel database management system (DBMS) and applied them to 240 Hz waveform data collected from 9,723 patients. The initial brute force version of the discord detection algorithm takes each possible subsequence and calculates a distance to the nearest non-self match to find the biggest discords in time series. For the heuristic version of the algorithm, a combination of an array and a trie structure was applied to order time series data for enhancing time efficiency. The study results showed efficient data loading, decoding and discord searches in a large amount of data, benefiting from the time series discord detection algorithm and the architectural characteristics of the parallel DBMS including data compression, data pipe-lining, and task scheduling.

More Details

Trajectory analysis via a geometric feature space approach

Statistical Analysis and Data Mining

Rintoul, Mark D.; Wilson, Andrew T.

This study aimed to organize a body of trajectories in order to identify, search for and classify both common and uncommon behaviors among objects such as aircraft and ships. Existing comparison functions such as the Fréchet distance are computationally expensive and yield counterintuitive results in some cases. We propose an approach using feature vectors whose components represent succinctly the salient information in trajectories. These features incorporate basic information such as the total distance traveled and the distance between start/stop points as well as geometric features related to the properties of the convex hull, trajectory curvature and general distance geometry. Additionally, these features can generally be mapped easily to behaviors of interest to humans who are searching large databases. Most of these geometric features are invariant under rigid transformation. We demonstrate the use of different subsets of these features to identify trajectories similar to an exemplar, cluster a database of several hundred thousand trajectories and identify outliers.

More Details

PANTHER. Trajectory Analysis

Rintoul, Mark D.; Wilson, Andrew T.; Valicka, Christopher G.; Kegelmeyer, William P.; Shead, Timothy M.; Czuchlewski, Kristina R.; Newton, Benjamin D.

We want to organize a body of trajectories in order to identify, search for, classify and predict behavior among objects such as aircraft and ships. Existing compari- son functions such as the Fr'echet distance are computationally expensive and yield counterintuitive results in some cases. We propose an approach using feature vectors whose components represent succinctly the salient information in trajectories. These features incorporate basic information such as total distance traveled and distance be- tween start/stop points as well as geometric features related to the properties of the convex hull, trajectory curvature and general distance geometry. Additionally, these features can generally be mapped easily to behaviors of interest to humans that are searching large databases. Most of these geometric features are invariant under rigid transformation. We demonstrate the use of different subsets of these features to iden- tify trajectories similar to an exemplar, cluster a database of several hundred thousand trajectories, predict destination and apply unsupervised machine learning algorithms.

More Details

Nested Narratives Final Report

Wilson, Andrew T.; Pattengale, Nicholas D.; Forsythe, James C.; Carvey, Brad

In cybersecurity forensics and incident response, the story of what has happened is the most important artifact yet the one least supported by tools and techniques. Existing tools focus on gathering and manipulating low-level data to allow an analyst to investigate exactly what happened on a host system or a network. Higher-level analysis is usually left to whatever ad hoc tools and techniques an individual may have developed. We discuss visual representations of narrative in the context of cybersecurity incidents with an eye toward multi-scale illustration of actions and actors. We envision that this representation could smoothly encompass individual packets on a wire at the lowest level and nation-state-level actors at the highest. We present progress to date, discuss the impact of technical risk on this project and highlight opportunities for future work.

More Details

Facilitation of Forensic Analysis Using a Narrative Template

Procedia Manufacturing

Hopkins, Shelby; Wilson, Andrew T.; Silva, Austin R.; Forsythe, James C.

Criminal forensic analysis involves examining a collection of clues to construct a plausible account of the events associated with a crime. In this paper, a study is presented that assessed whether software tools designed to encourage construction of narrative accounts would facilitate cyber forensic analysis. Compared to a baseline condition (i.e., spreadsheet with note-taking capabilities) and a visualization condition, subjects performed best when provided tools that emphasized established components of narratives. Specifically, features that encouraged subjects to identify suspected entities, and their activities and motivations proved beneficial. It is proposed that software tools developed to facilitate cyber forensic analysis and training of cyber security professionals incorporate techniques that facilitate a narrative account of events.

More Details

Investigating the integration of supercomputers and data-Warehouse appliances

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Oldfield, Ron A.; Davidson, George; Ulmer, Craig D.; Wilson, Andrew T.

Two decades of experience with massively parallel supercomputing has given insight into the problem domains where these architectures are cost effective. Likewise experience with database machines and more recently massively parallel database appliances has shown where these architectures are valuable. Combining both architectures to simultaneously solve problems has received much less attention. In this paper, we describe a motivating application for economic modeling that requires both HPC and database capabilities. Then we discuss hardware and software integration issues related to a direct integration of a Cray XT supercomputer and a Netezza database appliance. © 2014 Springer-Verlag Berlin Heidelberg.

More Details

Evaluating parallel relational databases for medical data analysis

Wilson, Andrew T.; Rintoul, Mark D.

Hospitals have always generated and consumed large amounts of data concerning patients, treatment and outcomes. As computers and networks have permeated the hospital environment it has become feasible to collect and organize all of this data. This raises naturally the question of how to deal with the resulting mountain of information. In this report we detail a proof-of-concept test using two commercially available parallel database systems to analyze a set of real, de-identified medical records. We examine database scalability as data sizes increase as well as responsiveness under load from multiple users.

More Details

Tracking topic birth and death in LDA

Wilson, Andrew T.; Robinson, David G.

Most topic modeling algorithms that address the evolution of documents over time use the same number of topics at all times. This obscures the common occurrence in the data where new subjects arise and old ones diminish or disappear entirely. We propose an algorithm to model the birth and death of topics within an LDA-like framework. The user selects an initial number of topics, after which new topics are created and retired without further supervision. Our approach also accommodates many of the acceleration and parallelization schemes developed in recent years for standard LDA. In recent years, topic modeling algorithms such as latent semantic analysis (LSA)[17], latent Dirichlet allocation (LDA)[10] and their descendants have offered a powerful way to explore and interrogate corpora far too large for any human to grasp without assistance. Using such algorithms we are able to search for similar documents, model and track the volume of topics over time, search for correlated topics or model them with a hierarchy. Most of these algorithms are intended for use with static corpora where the number of documents and the size of the vocabulary are known in advance. Moreover, almost all current topic modeling algorithms fix the number of topics as one of the input parameters and keep it fixed across the entire corpus. While this is appropriate for static corpora, it becomes a serious handicap when analyzing time-varying data sets where topics come and go as a matter of course. This is doubly true for online algorithms that may not have the option of revising earlier results in light of new data. To be sure, these algorithms will account for changing data one way or another, but without the ability to adapt to structural changes such as entirely new topics they may do so in counterintuitive ways.

More Details
Results 26–50 of 66
Results 26–50 of 66