Publications Details
Modeling Complex Relationships in Large-Scale Data using Hypergraphs (LDRD Final Report)
Dunlavy, Daniel D.; Wang, Fulton W.; Wolf, Michael W.; Ellingwood, Nathan D.
This SAND report documents the findings of the LDRD project, "Modeling Complex Relationships in Large-Scale Data using Hypergraphs". The project ran from October 2017 through September 2019. The focus of the project was the development and application of hypergraph data analytics to Sandia relational data applications. In this project, we attempted to apply a hypergraph data analysis method—specifically, hypergraph eigenvector centrality—to Sandia mission problems to identify influential entities (people, location, times, etc.) in the data. Unfortunately, the application data led to graph and hypergraph representations containing disconnected components. To date, there are no well-established techniques for applying eigenvector centrality to such graphs and hypergraphs. In this report, we present several heuristics for computing eigenvector centrality for disconnected graphs. We believe this is an important start to understanding how to approach the similar problem for hypergraphs, but this project concluded before we made progress on that problem. The ideas, methods, and suggestions presented here can be used for further research into this challenging problem. We also present our ideas for generating graphs with known degree and centrality distributions. The goal in presenting this work is to identify a procedure for analyzing such graphs once the problem of addressing disconnected components has been addressed. When working with a single data set, this generator can be used to create many instances of graphs that can be used to analyze the robustness of the centrality computations for the original data set. Although the results did not match perfectly in the case of the Facebook Ego dataset used in the experiments presented here, this again represents a good start in the direction of a graph generator for such problems. We note that there are potential trade-offs between how the degree and centrality distributions are fit to the original data and suggested several possible avenues for follow-on research efforts.