Publications / SAND Report

Hybridizing Classifiers and Collection Systems to Maximize Intelligence and Minimize Uncertainty in National Security Data Analytics Applications

Staid, Andrea S.; Valicka, Christopher G.

There are numerous applications that combine data collected from sensors with machine-learning based classification models to predict the type of event or objects observed. Both the collection of the data itself and the classification models can be tuned for optimal performance, but we hypothesize that additional gains can be realized by jointly assessing both factors together. Through this research, we used a seismic event dataset and two neural network classification models that issued probabilistic predictions on each event to determine whether it was an earthquake or a quarry blast. Real world applications will have constraints on data collection, perhaps in terms of a budget for the number of sensors or on where, when, or how data can be collected. We mimicked such constraints by creating subnetworks of sensors with both size and locational constraints. We compare different methods of determining the set of sensors in each subnetwork in terms of their predictive accuracy and the number of events that they observe overall. Additionally, we take the classifiers into account, treating them both as black-box models and testing out various ways of combining predictions among models and among the set of sensors that observe any given event. We find that comparable overall performance can be seen with less than half the number of sensors in the full network. Additionally, a voting scheme that uses the average confidence across the sensors for a given event shows improved predictive accuracy across nearly all subnetworks. Lastly, locational constraints matter, but sometimes in unintuitive ways, as better-performing sensors may be chosen instead of the ones excluded based on location. This being a short-term research effort, we offer a lengthy discussion on interesting next-steps and ties to other ongoing research efforts that we did not have time to pursue. These include a detailed analysis of the subnetwork performance broken down by event type, specific location, and model confidence. This project also included a Campus Executive research partnership with Texas A&M University. Through this, we worked with a professor and student to study information gain for UAV routing. This was an alternative way of looking at the similar problem space that includes sensor operation for data collection and the resulting benefit to be gained from it. This work is described in an Appendix.