Publications

Results 51–75 of 181

Search results

Jump to search filters

Unsupervised Learning Through Randomized Algorithms for High-Volume High-Velocity Data (ULTRA-HV)

Pinar, Ali P.; Kolda, Tamara G.; Carlberg, Kevin T.; Ballard, Grey; Mahoney, Michael

Through long-term investments in computing, algorithms, facilities, and instrumentation, DOE is an established leader in massive-scale, high-fidelity simulations, as well as science-leading experimentation. In both cases, DOE is generating more data than it can analyze and the problem is intensifying quickly. The need for advanced algorithms that can automatically convert the abundance of data into a wealth of useful information by discovering hidden structures is well recognized. Such efforts however, are hindered by the massive volume of the data and its high velocity. Here, the challenge is developing unsupervised learning methods to discover hidden structure in high-volume, high-velocity data.

More Details

Exploiting Social Media Sensor Networks through Novel Data Fusion Techniques

Kouri, Tina M.; Pinar, Ali P.

Unprecedented amounts of data are continuously being generated by sensors (“hard” data) and by humans (“soft” data), and this data needs to be exploited to its full potential. The first step in exploiting this data is determine how the hard and soft data are related to each other. In this project we fuse hard and soft data, using the attributes of each (e.g., time and space), to gain more information about interesting events. Next, we attempt to use social networking textual data to predict the present (i.e., predict that an interesting event is occurring and details about the event) using data mining, machine learning, natural language processing, and text analysis techniques.

More Details

Accurate Characterization of Real Networks from Inaccurate Measurements

Pinar, Ali P.

Our nation's dependence on information networks makes it vital to anticipate disruptions or find weaknesses in these networks. But networks like the Internet are vast, distributed, and there is no mechanism to completely collect their structure. We are restricted to specific data collection schemes (like traceroute samples from router interfaces) that examine tiny portions of such a network. It has been empirically documented and theoretically proven that these measurements have significant biases, and direct inferences from them will be wrong. But these data collection mechanisms have limited flexibility and cannot be easily modified. Moreover, in many applications there are limits on how much data can be collected. How do we make accurate inferences of network properties with biased and limited measurements? The general problem this report deals with is how to work with incompletely observed networks. We will present several different approaches to this problem. First we will present an approach to estimate the degree distribution of a graph by sampling only a small portion of the vertices. This algorithm provides provably accurate results with sublinear samples. An alternative approach would be to try to enhance the information in the by selective collecting new information by probing for neighbors of a vertex or presence of individual edges. A different setting for working with incomplete arises when we have full access to local information, but do not have any global version of the graph. Can we still identify critical nodes in such a graph? We present an approach to identify such nodes efficiently. Finally, how can we put these ideas together to identify the structure of a network? We present an approach that can complement the existing approaches for network mapping. We start with an estimate of network structure based on existing network mapping methods. Then we find a critical router in the network, use the traffic through this network to selectively collect new data to enhance our prediction.

More Details

Understanding the Hierarchy of Dense Subgraphs in Stationary and Temporally Varying Setting

Sariyuce, Ahmet E.; Pinar, Ali P.

Graphs are widely used to model relationships in a wide variety of domains such as sociology, bioinformatics, infrastructure, the WWW, to name a few. One of the key observations is that while real-world graphs are often globally sparse, they are locally dense. In other words, the average degree is often quite small (say at most 10 in a million vertex graph), but vertex neighborhoods are often dense. Finding dense subgraphs is a critical aspect of graph mining It has been used for finding communities and spam link farms in web graphs, graph visualization, real-time story identification, DNA motif detection in biological networks, finding correlated genes, epilepsy prediction, finding price value motifs in financial data, graph compression, distance query indexing, and increasing the throughput of social networking site servers. However, most standard formulations of this problem (like clique, quasi-clique, k-densest subgraph) are NP-hard. Furthermore, current dense subgraph finding algorithms usually optimize some objective, and only find a few such subgraphs without providing any structural relations, whereas the goal is rarely to find the "true optimum," but to identify many (if not all) dense substructures, understand their distribution in the graph, and ideally determine relationships among them. In this project, we first aim to devise algorithms and provide 3 implementations with nice visualizations to find the hierarchy between dense subgraphs, and then understand the structure of the hierarchy to gain more insight on the hidden patterns in real-world networks. Another important aspects in graph analysis is the temporal nature of networks. Networks evolve over time and in many applications data arrives at a high velocity, and thus it is important to design algorithms that can process data efficiently. We report three main results towards identifying dense structures in large evolving graphs. First, we will show how the hierarchical connectedness structure can be maintained efficiently, where connectedness is defined by increasing levels of connectivity strength. Next, we present dense structure can be identified in bipartite graphs without building projection graphs. And finally, we present a new method for peeling algorithms This new approach avoids sequential nature of peeling algorithms and is amenable to parallelization, which is crucial for processing high velocity data.

More Details

Efficient Uncertainty Quantification in Stochastic Economic Dispatch

IEEE Transactions on Power Systems

Safta, Cosmin S.; Chen, Richard L.; Najm, H.N.; Pinar, Ali P.; Watson, Jean-Paul W.

Stochastic economic dispatch models address uncertainties in forecasts of renewable generation output by considering a finite number of realizations drawn from a stochastic process model, typically via Monte Carlo sampling. Accurate evaluations of expectations or higher order moments for quantities of interest, e.g., generating cost, can require a prohibitively large number of samples. We propose an alternative to Monte Carlo sampling based on polynomial chaos expansions. These representations enable efficient and accurate propagation of uncertainties in model parameters, using sparse quadrature methods. We also use Karhunen-Loève expansions for efficient representation of uncertain renewable energy generation that follows geographical and temporal correlations derived from historical data at each wind farm. Considering expected production cost, we demonstrate that the proposed approach can yield several orders of magnitude reduction in computational cost for solving stochastic economic dispatch relative to Monte Carlo sampling, for a given target error threshold.

More Details

Bounded-Degree Connected Approximations of Stochastic Networks

IEEE Transactions on Molecular, Biological, and Multi-Scale Communications

Pinar, Ali P.; Quinn, Christopher J.; Kiyavash, Negar

We propose a method to find optimal sparse connected approximations for large complex networks of nodes interacting over time. Optimality is measured by Kullback-Leibler divergence. The sparsity is controlled by the user through specifying the in-degrees. The approximations have spanning tree subgraphs, enabling them to depict flow through a network. They can also depict feedback. The approximations can capture both significant statistical dependencies in, and preserve salient topological features of, the full (unknown) network. The proposed method can be applied to networks without restrictions on the dynamics beyond stationarity. We analyze computational and sample complexity. We demonstrate their efficacy using a large-scale simulation of the turtle visual cortex and experimental data from the primate motor cortex.

More Details

Escape: Efficiently counting all 5-vertex subgraphs

26th International World Wide Web Conference, WWW 2017

Pinar, Ali P.; Seshadhri, C.; Vishal, Vaidyanathan

Counting the frequency of small subgraphs is a fundamental technique in network analysis across various domains, most notably in bioinformatics and social networks. The special case of triangle counting has received much attention. Getting results for 4-vertex or 5-vertex patterns is highly challenging, and there are few practical results known that can scale to massive sizes. We introduce an algorithmic framework that can be adopted to count any small pattern in a graph and apply this framework to compute exact counts for all 5-vertex subgraphs. Our framework is built on cutting a pattern into smaller ones, and using counts of smaller patterns to get larger counts. Furthermore, we exploit degree orientations of the graph to reduce runtimes even further. These methods avoid the combinatorial explosion that typical subgraph counting algorithms face. We prove that it suffices to enumerate only four specific subgraphs (three of them have less than 5 vertices) to exactly count all 5-vertex patterns. We perform extensive empirical experiments on a variety of real-world graphs. We are able to compute counts of graphs with tens of millions of edges in minutes on a commodity machine. To the best of our knowledge, this is the first practical algorithm for 5-vertex pattern counting that runs at this scale. A stepping stone to our main algorithm is a fast method for counting all 4-vertex patterns. This algorithm is typically ten times faster than the state of the art 4-vertex counters.

More Details

MaxReach: Reducing network incompleteness through node probes

Proceedings of the 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2016

Soundarajan, Sucheta; Eliassi-Rad, Tina; Gallagher, Brian; Pinar, Ali P.

Real-world network datasets are often incomplete. Subsequently, any analysis on such networks is likely to produce skewed results. We examine the following problem: given an incomplete network, which b nodes should be probed to bring as many new nodes as possible into the observed network? For instance, consider someone who has observed a portion (say 1%) of the Twitter network. How should she use a limited budget to reduce the incompleteness of the network? In this work, we propose a novel algorithm, called MAXREACH, which uses a budget b to increase the number of nodes in the observed network. Our experiments, across a range of datasets and conditions, demonstrate the efficacy of MAXREACH.

More Details

Trigger Detection for Adaptive Scientific Workflows Using Percentile Sampling

SIAM Journal on Scientific Computing (Online)

Pinar, Ali P.; Bennett, Janine C.; Salloum, Maher S.; Bhagatwala, Ankit B.; Chen, Jacqueline H.; Seshadhri, C.

The increasing complexity of both scientific simulations and high-performance computing system architectures are driving the need for adaptive workflows, in which the composition and execution of computational and data manipulation steps dynamically depend on the evolutionary state of the simulation itself. Consider, for example, the frequency of data storage. Critical phases of the simulation should be captured with high frequency and with high fidelity for postanalysis; however, we cannot afford to retain the same frequency for the full simulation due to the high cost of data movement. We can instead look for triggers, indicators that the simulation will be entering a critical phase, and adapt the workflow accordingly. In this paper, we present a methodology for detecting triggers and demonstrate its use in the context of direct numerical simulations of turbulent combustion using S3D. We show that chemical explosive mode analysis (CEMA) can be used to devise a noise-tolerant indicator for rapid increase in heat release. However, exhaustive computation of CEMA values dominates the total simulation, and thus is prohibitively expensive. To overcome this computational bottleneck, we propose a quantile sampling approach. Our sampling-based algorithm comes with provable error/confidence bounds, as a function of the number of samples. Most importantly, the number of samples is independent of the problem size, and thus our proposed sampling algorithm offers perfect scalability. Our experiments on homogeneous charge compression ignition and reactivity controlled compression ignition simulations show that the proposed method can detect rapid increases in heat release, and its computational overhead is negligible. Our results will be used to make dynamic workflow decisions regarding data storage and mesh resolution in future combustion simulations.

More Details

Sparse approximations of directed information graphs

IEEE International Symposium on Information Theory - Proceedings

Quinn, Christopher J.; Pinar, Ali P.; Gao, Jing; Su, Lu

Given a network of agents interacting over time, which few interactions best characterize the dynamics of the whole network? We propose an algorithm that finds the optimal sparse approximation of a network. The user controls the level of sparsity by specifying the total number of edges. The networks are represented using directed information graphs, a graphical model that depicts causal influences between agents in a network. Goodness of approximation is measured with Kullback-Leibler divergence. The algorithm finds the best approximation with no assumptions on the topology or the class of the joint distribution.

More Details

Measuring and Modeling Bipartite Graphs with Community Structure

Kolda, Tamara G.; Pinar, Ali P.; Aksoy, Sinan

Network science is a powerful tool for analyzing complex systems in fields ranging from sociology to engineering to biology. This paper is focused on generative models of bipartite graphs, also known as two-way graphs. We propose two generative models that can be easily tuned to reproduce the characteristics of real-world networks, not just qualitatively, but quantitatively. The measurements we consider are the degree distributions and the bipartite clustering coefficient, which we refer to as the metamorphosis coefficient. We define edge, node, and degreewise metamorphosis coefficients, enabling a more detailed understand of the bipartite community structure. Our proposed bipartite Chung-Lu model is able to reproduce real-world degree distributions, and our proposed bipartite “BTER” model reproduces both the degree distributions as well as the degreewise metamorphosis coefficients. We demonstrate the effectiveness of these models on several real-world data sets.

More Details
Results 51–75 of 181
Results 51–75 of 181