Publications Search

Proc. of 2nd Int. Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications, BigMine 2013 - Held in Conj. with SIGKDD 2013 Conf.

Berry, Jonathan; Phillips, Cynthia A.; Plimpton, Steven J.; Shead, Timothy M.

We present an algorithm to maintain the connected components of a graph that arrives as an infinite stream of edges. We formalize the algorithm on X-stream, a new parallel theoretical computational model for infinite streams. Connectivity-related queries, including component spanning trees, are supported with some latency, returning the state of the graph at the time of the query. Because an infinite stream may eventually exceed the storage limits of any number of finite-memory processors, we assume an aging command or daemon where "uninteresting" edges are removed when the system nears capacity. Following an aging command, the system will block queries until its data structures are repaired, but edges will continue to be accepted from the stream, never dropped. The algorithm will not fail unless a model-specific constant fraction of the aggregate memory across all processors is full. In normal operation, it will not fail unless aggregate memory is completely full. Unlike previous theoretical streaming models designed for finite graphs that assume a single shared-memory machine or require arbitrary-size intermediate files, X-stream distributes a graph over a ring network of finite-memory processors. Though the model is synchronous and reminiscent of systolic algorithms, our implementation uses an asynchronous message-passing system. We argue the correctness of our X-stream connected components algorithm, and give preliminary experimental results on synthetic and real graph streams.
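As a point of reference for the problem this abstract addresses, the sketch below maintains connected components over an edge stream on a single machine using union-find. It is not the X-stream algorithm: it has no ring of processors, no aging, and no spanning-tree queries, and the class and method names are hypothetical.

```python
class StreamingComponents:
    """Incremental connected components via union-find (single machine only)."""

    def __init__(self):
        self.parent = {}  # vertex -> parent vertex in its union-find tree
        self.size = {}    # root vertex -> number of vertices in its component

    def _find(self, v):
        # Register unseen vertices as singleton components.
        if v not in self.parent:
            self.parent[v] = v
            self.size[v] = 1
        # Path halving keeps the trees shallow.
        while self.parent[v] != v:
            self.parent[v] = self.parent[self.parent[v]]
            v = self.parent[v]
        return v

    def add_edge(self, u, v):
        """Consume one edge from the stream and merge the two components."""
        ru, rv = self._find(u), self._find(v)
        if ru == rv:
            return
        # Union by size: attach the smaller tree under the larger one.
        if self.size[ru] < self.size[rv]:
            ru, rv = rv, ru
        self.parent[rv] = ru
        self.size[ru] += self.size[rv]

    def connected(self, u, v):
        """Answer a connectivity query against the edges seen so far."""
        return self._find(u) == self._find(v)


cc = StreamingComponents()
for edge in [(1, 2), (3, 4), (2, 3)]:
    cc.add_edge(*edge)
```

Because this structure only ever merges components, it cannot handle the edge removals that the aging command in the abstract implies; that is one reason the paper needs a repair phase after aging.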

TYPE Presentation YEAR 2013

OSTI Scopus

Perspectives for computational modeling of cell replacement for neurological disorders

Frontiers in Computational Neuroscience

Aimone, James B.

Mathematical modeling of anatomically-constrained neural networks has provided significant insights regarding the response of networks to neurological disorders or injury. A logical extension of these models is to incorporate treatment regimens to investigate network responses to intervention. The addition of nascent neurons from stem cell precursors into damaged or diseased tissue has been used as a successful therapeutic tool in recent decades. Interestingly, models have been developed to examine the incorporation of new neurons into intact adult structures, particularly the dentate granule neurons of the hippocampus. These studies suggest that the unique properties of maturing neurons can impact circuit behavior in unanticipated ways. In this perspective, we review the current status of models used to examine damaged CNS structures with particular focus on cortical damage due to stroke. Secondly, we suggest that computational modeling of cell replacement therapies can be made feasible by implementing approaches taken by current models of adult neurogenesis. The development of these models is critical for generating hypotheses regarding transplant therapies and improving outcomes by tailoring transplants to desired effects.

TYPE Journal Article YEAR 2013

OSTI DOI

The impact of hybrid-core processors on MPI message rate

ACM International Conference Proceeding Series

Barrett, Brian; Brightwell, Ronald B.; Hammond, Simon; Hemmert, Karl S.

Power and energy concerns are motivating chip manufacturers to consider future hybrid-core processor designs that combine a small number of traditional cores optimized for single-thread performance with a large number of simpler cores optimized for throughput performance. This trend is likely to impact the way compute resources for network protocol processing functions are allocated and managed. In particular, the performance of MPI match processing is critical to achieving high message throughput. In this paper, we analyze the ability of simple and more complex cores to perform MPI matching operations for various scenarios in order to gain insight into how MPI implementations for future hybrid-core processors should be designed.
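To make "MPI match processing" concrete, the sketch below models the matching rule in simplified form: an arriving message is paired with the earliest posted receive whose source and tag agree or are wildcards, and otherwise waits on an unexpected-message queue. It ignores communicators and payloads, and the `MatchEngine` class and the sentinel values are assumptions made for illustration, not part of any MPI implementation or of the paper.

```python
ANY_SOURCE = -1  # hypothetical sentinel standing in for MPI_ANY_SOURCE
ANY_TAG = -1     # hypothetical sentinel standing in for MPI_ANY_TAG


class MatchEngine:
    """Simplified model of MPI receive matching with linear list traversal."""

    def __init__(self):
        self.posted = []      # posted receives (source, tag), in post order
        self.unexpected = []  # arrived messages with no matching receive yet

    @staticmethod
    def _matches(recv, msg):
        r_src, r_tag = recv
        m_src, m_tag = msg
        return r_src in (ANY_SOURCE, m_src) and r_tag in (ANY_TAG, m_tag)

    def post_recv(self, src, tag):
        """Post a receive; return the earliest matching unexpected message, if any."""
        recv = (src, tag)
        for i, msg in enumerate(self.unexpected):
            if self._matches(recv, msg):
                return self.unexpected.pop(i)
        self.posted.append(recv)
        return None

    def arrive(self, src, tag):
        """Handle an incoming message; return the earliest matching posted receive, if any."""
        msg = (src, tag)
        for i, recv in enumerate(self.posted):
            if self._matches(recv, msg):
                return self.posted.pop(i)
        self.unexpected.append(msg)
        return None
```

Each call walks a list in order, so the cost of matching grows with queue depth. That list traversal is the kind of work whose performance on simple versus complex cores the abstract sets out to compare.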

TYPE Conference YEAR 2013

OSTI Scopus

Kokkos: Enabling performance portability across manycore architectures

Proceedings - 2013 Extreme Scaling Workshop, XSW 2013

Edwards, Harold C.; Trott, Christian R.

The manycore revolution in computational hardware can be characterized by increasing thread counts, decreasing memory per thread, and architecture-specific performance constraints for memory access patterns. High performance computing (HPC) on emerging manycore architectures requires codes to exploit every opportunity for thread-level parallelism and satisfy conflicting performance constraints. We developed the Kokkos C++ library to provide scientific and engineering codes with a user-accessible, performance-portable manycore programming model. The two foundational abstractions of Kokkos are (1) dispatch work to a manycore device for parallel execution and (2) manage multidimensional arrays with polymorphic layouts. The integration of these abstractions enables users' code to satisfy multiple architecture-specific memory access pattern performance constraints without having to modify their source code. In this paper we describe the Kokkos abstractions, summarize its application programmer interface (API), and present performance results for a molecular dynamics computational kernel and finite element mini-application. © 2013 IEEE.
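The idea of a multidimensional array with a polymorphic layout can be illustrated outside of C++. The Python sketch below is not the Kokkos API; `View2D` and `row_sums` are hypothetical names. It shows one kernel running unchanged over two memory layouts, one where the left index is contiguous (column-major) and one where the right index is contiguous (row-major).

```python
class View2D:
    """2D array over flat storage; the layout chooses the index-to-offset map."""

    def __init__(self, n0, n1, layout):
        if layout not in ("left", "right"):
            raise ValueError("layout must be 'left' or 'right'")
        self.n0, self.n1, self.layout = n0, n1, layout
        self.data = [0.0] * (n0 * n1)

    def offset(self, i, j):
        if self.layout == "left":
            return i + self.n0 * j   # left index has stride 1 (column-major)
        return i * self.n1 + j       # right index has stride 1 (row-major)

    def __getitem__(self, ij):
        return self.data[self.offset(*ij)]

    def __setitem__(self, ij, value):
        self.data[self.offset(*ij)] = value


def row_sums(view):
    """Kernel written once against (i, j) indexing, independent of layout."""
    return [sum(view[i, j] for j in range(view.n1)) for i in range(view.n0)]
```

The kernel never mentions the layout, so switching between the two changes only how elements are placed in memory, which is the property the abstract credits with letting user code meet different architectures' access-pattern constraints without source changes.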

TYPE Conference YEAR 2013

OSTI Scopus

Scalable matrix computations on large scale-free graphs using 2D graph partitioning

International Conference for High Performance Computing, Networking, Storage and Analysis, SC

Boman, Erik G.; Devine, Karen; Rajamanickam, Sivasankaran

Scalable parallel computing is essential for processing large scale-free (power-law) graphs. The distribution of data across processes becomes important on distributed-memory computers with thousands of cores. It has been shown that two-dimensional layouts (edge partitioning) can have significant advantages over traditional one-dimensional layouts. However, simple 2D block distribution does not use the structure of the graph, and more advanced 2D partitioning methods are too expensive for large graphs. We propose a new two-dimensional partitioning algorithm that combines graph partitioning with 2D block distribution. The computational cost of the algorithm is essentially the same as 1D graph partitioning. We study the performance of sparse matrix-vector multiplication (SpMV) for scale-free graphs from the web and social networks using several different partitioners and both 1D and 2D data layouts. We show that SpMV run time is reduced by exploiting the graph's structure. Contrary to popular belief, we observe that current graph and hypergraph partitioners often yield relatively good partitions on scale-free graphs. We demonstrate that our new 2D partitioning method consistently outperforms the other methods considered, for both SpMV and an eigensolver, on matrices with up to 1.6 billion nonzeros using up to 16,384 cores. Copyright 2013 ACM.
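The sketch below illustrates the simple 2D block distribution that the abstract uses as a baseline, not the authors' new partitioning algorithm. Each nonzero (i, j) is assigned to a process on a `pr` by `pc` grid according to the block containing row i and the block containing column j. All function names are hypothetical, and the SpMV runs serially over the per-process lists only to check that the distribution preserves the result.

```python
def block_of(index, n, parts):
    """Block containing `index` when range(n) is cut into `parts` contiguous blocks."""
    return index * parts // n


def owner_2d(i, j, n, pr, pc):
    """Grid coordinates (p, q) of the process that owns nonzero (i, j)."""
    return block_of(i, n, pr), block_of(j, n, pc)


def distribute(nonzeros, n, pr, pc):
    """Split (i, j, value) triples of an n-by-n matrix over a pr-by-pc process grid."""
    local = {(p, q): [] for p in range(pr) for q in range(pc)}
    for i, j, a in nonzeros:
        local[owner_2d(i, j, n, pr, pc)].append((i, j, a))
    return local


def spmv(local, x, n):
    """Compute y = A x by summing every process's local contributions (serial emulation)."""
    y = [0.0] * n
    for entries in local.values():
        for i, j, a in entries:
            y[i] += a * x[j]
    return y
```

In this layout a process at grid position (p, q) needs only the entries of x in column block q and contributes only to the entries of y in row block p, which limits how many other processes it must exchange data with. The assignment ignores graph structure entirely, which is the shortcoming the abstract's combined method is meant to address.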

TYPE Conference YEAR 2013

DOI OSTI Scopus