Data science digs through an avalanche of information
Sandia’s Data Science Research Challenge doesn’t look like a typical research challenge. Rather than a single problem that every participating group works on, it comprises a large collection of projects, divided into seven areas, that together span a wide range of missions.
“The large collection of data science projects spans all of the missions of Sandia, including nuclear weapons, but also cyber, remote sensing, critical U.S. infrastructure and others,” says Enhanced Decision Making senior manager and Data Science Research Challenge co-deputy John Feddema.
He says data science has existed as a discipline for more than 30 years, but it has recently become a fast-growing field of study because of the explosion of big-data analysis on cloud computing platforms.
Data science research areas Sandia is concentrating on include:
- Feature extraction, including tensor analysis
- Graph and clustering analysis
- Event correlation and classification
- Uncertainty quantification
- Advanced analytic environments
- Multi-INT (multiple intelligence) data and model fusion
- Visualization and human cognition
- Sensor tasking and planning
At its core, data science is about making sense of data, for many different purposes. It draws on skills and perspectives from mathematics, statistics and computer science, as well as detailed subject-matter expertise. “For nuclear nonproliferation, we want to know where other countries are in their nuclear ambitions. For cyber-security and critical infrastructures in the United States such as power grids, we want to know that our assets are secure,” Feddema says.
He says Sandia is developing algorithms and software tools that analyze data in depth and produce results detailed and robust enough to support high-consequence missions such as maintaining the nation’s nuclear stockpile.
“Many companies are using machine learning and neural networks for classification, and these commercial applications are very good. But you have to bring it a notch beyond the current commercial techniques for national security applications. If a commercial application renders a street incorrectly, national security isn’t compromised. Sandia is developing new machine learning algorithms to assist in high-consequence systems, when you have to be right every time,” Feddema says.
The Research Challenge, which is two years into its 10-year roadmap, focuses on geospatial problems. “If you’re looking for ‘patterns of life’ activities in remote sensing data and you want to pick out off-normal events, then you want to look at the statistics of those normal trends,” Feddema says.
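As a rough illustration of what looking at "the statistics of those normal trends" can mean in practice, the sketch below flags observations that sit far from a baseline average. It is a minimal example under stated assumptions, not a description of Sandia's methods: the z-score test, the threshold of 3.0, the function name `off_normal_events` and the vehicle counts are all invented for illustration.

```python
from statistics import mean, stdev


def off_normal_events(baseline, observed, threshold=3.0):
    """Return (index, value, z_score) for observations far from the baseline.

    Assumption: "normal" activity is summarized by the mean and sample
    standard deviation of a baseline period, and an observation counts as
    off-normal when its z-score exceeds `threshold` in absolute value.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        raise ValueError("baseline has no variation; z-scores are undefined")

    flagged = []
    for i, value in enumerate(observed):
        z = (value - mu) / sigma
        if abs(z) > threshold:
            flagged.append((i, value, z))
    return flagged


# Hypothetical daily vehicle counts at a site (made-up numbers).
baseline_counts = [12, 14, 13, 15, 12, 14, 13, 14, 13, 12]
new_counts = [13, 14, 41, 12]

events = off_normal_events(baseline_counts, new_counts)
for index, value, z in events:
    print(f"day {index}: count {value} is off-normal (z = {z:.1f})")
```

With these made-up numbers, only the third new observation (41 vehicles) is flagged. Real remote sensing data would need a much richer model of normal behavior, such as time-of-day and seasonal effects.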
He says that pulling data off the web, or generating it with remote sensing systems such as synthetic aperture radar (SAR), can yield petabytes of data, but ultimately someone has to make sense of it. “When we have a large amount of data and we have to boil it down, we try to reduce things down to a set of features,” Feddema says. “Sparse information is represented in graphs very well. At the end of the processing chain is visualization and human cognition, to give people the right information they can be confident in to make decisions. And next is sensory planning: if you have these expensive assets you need to put sensors in the right place to gather sensor information.”