Publications Search

Using Tpetra without CUDA UVM

Devine, Karen D.

Abstract not provided.

TYPE Presentation YEAR 2021

OSTI

Trilinos Users Group Data Services Update

Devine, Karen D.

Abstract not provided.

TYPE Presentation YEAR 2021

OSTI

Integrating PGAS and MPI-based Graph Analysis

McCrary, Trevor M.; Devine, Karen D.; Younge, Andrew J.

This project demonstrates that Chapel programs can interface with MPI-based libraries written in C++ without storing multiple copies of shared data. Chapel is a language for productive parallel computing using partitioned global address spaces (PGAS). We identified two approaches to interface Chapel code with the MPI-based Grafiki and Trilinos libraries. The first uses a single Chapel executable to call a C function that interacts with the C++ libraries. The second uses the mmap function to allow separate executables to read and write to the same block of memory on a node. We also encapsulated the second approach in Docker/Singularity containers to maximize ease of use. Comparisons of the two approaches using shared and distributed memory installations of Chapel show that both approaches provide similar scalability and performance.

TYPE Other Report YEAR 2021

DOI OSTI

Integrated System and Application Continuous Performance Monitoring and Analysis Capability

Brandt, James M.; Cook, Jeanine C.; Aaziz, Omar R.; Allan, Benjamin A.; Devine, Karen D.; Laros, James H.; Gentile, Ann C.; Hammond, Simon D.; Kelley, Brian M.; Lopatina, Lena; Moore, Stan G.; Olivier, Stephen L.; Laros, James H.; Poliakoff, David Z.; Pawlowski, Roger P.; Regier, Phillip A.; Schmitz, Mark E.; Schwaller, Benjamin S.; Surjadidjaja, Vanessa S.; Swan, Matthew S.; Tucker, Tom; Tucker, Nick; Vaughan, Courtenay T.; Walton, Sara P.

Abstract not provided.

TYPE Presentation YEAR 2021

OSTI

Integrated System and Application Continuous Performance Monitoring and Analysis Capability

Aaziz, Omar R.; Allan, Benjamin A.; Brandt, James M.; Cook, Jeanine C.; Devine, Karen D.; Elliott, James E.; Gentile, Ann C.; Hammond, Simon D.; Kelley, Brian M.; Lopatina, Lena; Moore, Stan G.; Olivier, Stephen L.; Laros, James H.; Poliakoff, David Z.; Pawlowski, Roger P.; Regier, Phillip A.; Schmitz, Mark E.; Schwaller, Benjamin S.; Surjadidjaja, Vanessa S.; Swan, Matthew S.; Tucker, Nick; Tucker, Thomas; Vaughan, Courtenay T.; Walton, Sara P.

Scientific applications run on high-performance computing (HPC) systems are critical for many national security missions within Sandia and the NNSA complex. However, these applications often face performance degradation and even failures that are challenging to diagnose. To provide unprecedented insight into these issues, the HPC Development, HPC Systems, Computational Science, and Plasma Theory & Simulation departments at Sandia crafted and completed their FY21 ASC Level 2 milestone entitled "Integrated System and Application Continuous Performance Monitoring and Analysis Capability." The milestone created a novel integrated HPC system and application monitoring and analysis capability by extending Sandia's Kokkos application portability framework, Lightweight Distributed Metric Service (LDMS) monitoring tool, and scalable storage, analysis, and visualization pipeline. The extensions to Kokkos and LDMS enable collection and storage of application data during run time, as it is generated, with negligible overhead. This data is combined with HPC system data within the extended analysis pipeline to present relevant visualizations of derived system and application metrics that can be viewed at run time or post run. This new capability was evaluated using several week-long, 290-node runs of Sandia's ElectroMagnetic Plasma In Realistic Environments (EMPIRE) modeling and design tool and resulted in 1TB of application data and 50TB of system data. EMPIRE developers remarked this capability was incredibly helpful for quickly assessing application health and performance alongside system state. In short, this milestone work built the foundation for an expansive HPC system and application data collection, storage, analysis, visualization, and feedback framework that will increase total scientific output of Sandia's HPC users.

TYPE SAND Report YEAR 2021

DOI OSTI

Integrating PGAS and MPI-Based Graph Analysis

McCrary, Trevor M.; Devine, Karen D.

Abstract not provided.

TYPE Presentation YEAR 2021

OSTI

ExaGraph: Partitioning and Coloring

Boman, Erik G.; Devine, Karen D.; Rajamanickam, Sivasankaran R.; Acer, Seher; Bogle, Ian A.; Slota, George; Madduri, Kamesh; Gilbert, Michael

Abstract not provided.

TYPE Presentation YEAR 2021

OSTI

A Career of Load Balancing

Devine, Karen D.

Abstract not provided.

TYPE Presentation YEAR 2021

OSTI

Removal of the UVM Requirement from Tpetra: MultiVector and BlockMultiVector

Devine, Karen D.; Danielson, Geoffrey C.; Fuller, Timothy J.; Hu, Jonathan J.; Kelley, Brian M.; Kim, Kyungjoo K.; Siefert, Christopher S.; Smith, Timothy A.

Abstract not provided.

TYPE Presentation YEAR 2021

OSTI

Advanced Partitioning Strategies for Scalable Remapping in Climate Models

Grindeanu, Iulian; Mahadevan, Vijay; Devine, Karen D.

Abstract not provided.

TYPE Conference Poster YEAR 2021

DOI OSTI

Distributed Memory Graph Coloring Algorithms for Multiple GPUs

Proceedings of IA3 2020: 10th Workshop on Irregular Applications: Architectures and Algorithms, Held in conjunction with SC 2020: The International Conference for High Performance Computing, Networking, Storage and Analysis

Bogle, Ian A.; Boman, Erik G.; Devine, Karen D.; Rajamanickam, Sivasankaran R.; Slota, George M.

Graph coloring is often used in parallelizing scientific computations that run in distributed and multi-GPU environments; it identifies sets of independent data that can be updated in parallel. Many algorithms exist for graph coloring on a single GPU or in distributed memory, but hybrid MPI+GPU algorithms have been unexplored until this work, to the best of our knowledge. We present several MPI+GPU coloring approaches that use implementations of the distributed coloring algorithms of Gebremedhin et al. and the shared-memory algorithms of Deveci et al. The on-node parallel coloring uses implementations in KokkosKernels, which provide parallelization for both multicore CPUs and GPUs. We further extend our approaches to solve for distance-2 coloring, giving the first known distributed and multi-GPU algorithm for this problem. In addition, we propose novel methods to reduce communication in distributed graph coloring. Our experiments show that our approaches operate efficiently on inputs too large to fit on a single GPU and scale up to graphs with 76.7 billion edges running on 128 GPUs.
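To make the notion of "sets of independent data" concrete, the following is a minimal serial sketch of greedy coloring. It is not the MPI+GPU method described in the abstract; the function names and the small example graph are assumptions chosen for illustration. Vertices that receive the same color share no edge, so each color class could be updated in parallel.

```python
from collections import defaultdict


def greedy_coloring(adj):
    """Give each vertex the smallest color not used by its already-colored neighbors."""
    colors = {}
    for v in sorted(adj):
        taken = {colors[u] for u in adj[v] if u in colors}
        c = 0
        while c in taken:
            c += 1
        colors[v] = c
    return colors


def color_classes(colors):
    """Group vertices by color; each group is an independent set."""
    classes = defaultdict(list)
    for v, c in colors.items():
        classes[c].append(v)
    return dict(classes)


# Hypothetical example: the 4-cycle 0-1-2-3-0 with one chord between 0 and 2.
adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2]}
colors = greedy_coloring(adj)
classes = color_classes(colors)
```

In this example, vertices 1 and 3 end up in the same class because they are not adjacent.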

TYPE Conference Paper YEAR 2020

OSTI Scopus

Distributed Graph Coloring on Multiple GPUs

Bogle, Ian A.; Boman, Erik G.; Devine, Karen D.; Rajamanickam, Sivasankaran R.; Slota, George

Abstract not provided.

TYPE Conference Presentation YEAR 2020

DOI OSTI

Distributed Memory Graph Coloring Algorithms for Multiple GPUs

Bogle, Ian A.; Boman, Erik G.; Devine, Karen D.; Rajamanickam, Sivasankaran R.; Slota, George M.

Abstract not provided.

TYPE Conference Poster YEAR 2020

OSTI

State of the Tpetra Linear Solver Stack

Siefert, Christopher S.; Devine, Karen D.; Hoemmen, Mark F.; Hu, Jonathan J.; Kelley, Brian M.

Abstract not provided.

TYPE Conference Poster YEAR 2020

OSTI

Attributing Performance Variation from Integrated Application and System Data

Aaziz, Omar R.; Allan, Benjamin A.; Brandt, James M.; Cook, Jeanine C.; Devine, Karen D.; Laros, James H.; Gentile, Ann C.; Olivier, Stephen L.; Laros, James H.; Tucker, Thomas

Abstract not provided.

TYPE Conference Poster YEAR 2020

OSTI

Distributed Biconnectivity

Bogle, Ian A.; Slota, George M.; Rajamanickam, Sivasankaran R.; Devine, Karen D.

Abstract not provided.

TYPE Conference Poster YEAR 2020

OSTI

Task Placement for Improvement of Parallel Scalability

Ellis, John E.; Devine, Karen D.

Abstract not provided.

TYPE Conference Poster YEAR 2019

OSTI

Trilinos Data Services 2019

Devine, Karen D.

Abstract not provided.

TYPE Presentation YEAR 2019

OSTI

Geometric mapping of tasks to processors on parallel computers with mesh or torus networks

IEEE Transactions on Parallel and Distributed Systems

Deveci, Mehmet; Devine, Karen D.; Laros, James H.; Taylor, Mark A.; Rajamanickam, Sivasankaran R.; Catalyurek, Umit V.

We present a new method for reducing parallel applications’ communication time by mapping their MPI tasks to processors in a way that lowers the distance messages travel and the amount of congestion in the network. Assuming geometric proximity among the tasks is a good approximation of their communication interdependence, we use a geometric partitioning algorithm to order both the tasks and the processors, assigning task parts to the corresponding processor parts. In this way, interdependent tasks are assigned to “nearby” cores in the network. We also present a number of algorithmic optimizations that exploit specific features of the network or application to further improve the quality of the mapping. We specifically address the case of sparse node allocation, where the nodes assigned to a job are not necessarily located in a contiguous block nor within close proximity to each other in the network. However, our methods generalize to contiguous allocations as well, and results are shown for both contiguous and non-contiguous allocations. We show that, for the structured finite difference mini-application MiniGhost, our mapping methods reduced communication time up to 75% relative to MiniGhost’s default mapping on 128K cores of a Cray XK7 with sparse allocation. For the atmospheric modeling code E3SM/HOMME, our methods reduced communication time up to 31% on 16K cores of an IBM BlueGene/Q with contiguous allocation.

TYPE Journal Article YEAR 2019

DOI OSTI Scopus

A parallel graph algorithm for detecting mesh singularities in distributed memory ice sheet simulations

ACM International Conference Proceeding Series

Bogle, Ian A.; Devine, Karen D.; Perego, Mauro P.; Rajamanickam, Sivasankaran R.; Slota, George M.

We present a new, distributed-memory parallel algorithm for detection of degenerate mesh features that can cause singularities in ice sheet mesh simulations. Identifying and removing mesh features such as disconnected components (icebergs) or hinge vertices (peninsulas of ice detached from the land) can significantly improve the convergence of iterative solvers. Because the ice sheet evolves during the course of a simulation, it is important that the detection algorithm can run in situ with the simulation (running in parallel and taking a negligible amount of computation time) so that degenerate features (e.g., calving icebergs) can be detected as they develop. We present a distributed memory, BFS-based label-propagation approach to degenerate feature detection that is efficient enough to be called at each step of an ice sheet simulation, while correctly identifying all degenerate features of an ice sheet mesh. Our method finds all degenerate features in a mesh with 13 million vertices in 0.0561 seconds on 1536 cores in the MPAS Albany Land Ice (MALI) model. Compared to the previously used serial pre-processing approach, we observe a 46,000x speedup for our algorithm, and provide additional capability to do dynamic detection of degenerate features in the simulation.
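As a rough illustration of BFS-based label propagation, the serial sketch below marks every vertex reachable from a set of "grounded" vertices and reports the rest as floating components (icebergs). It is not the distributed algorithm from the paper and does not handle hinge vertices; the function name, the notion of a grounded set, and the example mesh graph are assumptions made for this sketch.

```python
from collections import deque


def floating_vertices(adj, grounded):
    """Propagate a 'reached' label by BFS from grounded vertices; return unreached ones."""
    reached = {v: False for v in adj}
    queue = deque()
    for v in grounded:
        reached[v] = True
        queue.append(v)
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if not reached[u]:
                reached[u] = True
                queue.append(u)
    return sorted(v for v, ok in reached.items() if not ok)


# Hypothetical mesh graph: vertices 0-1-2 form a chain attached to land at 0,
# while vertices 3-4 form a separate piece with no grounded vertex.
adj = {0: [1], 1: [0, 2], 2: [1], 3: [4], 4: [3]}
icebergs = floating_vertices(adj, grounded={0})
```

Here vertices 3 and 4 are never labeled, so they are reported as a disconnected component.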

TYPE Conference Poster YEAR 2019

DOI OSTI Scopus

FASTMath: Kokkos Kernels and Linear Solvers

Rajamanickam, Sivasankaran R.; Bogle, Ian A.; Hu, Jonathan J.; Devine, Karen D.; Slota, George M.; Perego, Mauro P.; Kim, Kyungjoo K.

Abstract not provided.

TYPE Conference Poster YEAR 2019

OSTI

FASTMath: Frameworks Algorithms and Scalable Technologies for Mathematics

Devine, Karen D.

Abstract not provided.

TYPE Presentation YEAR 2019

OSTI

A Parallel Graph Algorithm for Detecting Mesh Singularities in Distributed Memory Ice Sheet Simulations

Bogle, Ian; Devine, Karen D.; Perego, Mauro P.; Rajamanickam, Sivasankaran R.; Slota, George M.

Abstract not provided.

TYPE Conference Poster YEAR 2019

DOI OSTI

Parallel Sparse Tensor Decomposition with the Trilinos Parallel Linear Algebra Framework

Devine, Karen D.; Kolda, Tamara G.; Phipps, Eric T.

Abstract not provided.

TYPE Conference Poster YEAR 2019

OSTI

Reducing E3SM Communication through Task Mapping

Ellis, John E.; Devine, Karen D.

Abstract not provided.

TYPE Presentation YEAR 2019

OSTI

Exploiting Scientific Software to Solve Problems in Data Analytics

Devine, Karen D.; Boman, Erik G.; Dunlavy, Daniel D.; Kolda, Tamara G.; Wolf, Michael W.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

Trilinos Framework and Solvers

Rajamanickam, Sivasankaran R.; Hu, Jonathan J.; Devine, Karen D.; Wolf, Michael W.

Abstract not provided.

TYPE Presentation YEAR 2018

OSTI

Tpetra and Data Services

Devine, Karen D.

Abstract not provided.

TYPE Presentation YEAR 2018

OSTI

Exploiting Geometric Partitioning in Task Mapping for Parallel Computers

Deveci, Mehmet D.; Devine, Karen D.; Laros, James H.; Taylor, Mark A.; Rajamanickam, Sivasankaran R.; Catalyurek, Umit V.

We present a new method for mapping applications' MPI tasks to cores of a parallel computer such that applications' communication time is reduced. We address the case of sparse node allocation, where the nodes assigned to a job are not necessarily located in a contiguous block nor within close proximity to each other in the network, although our methods generalize to contiguous allocations as well. The goal is to assign tasks to cores so that interdependent tasks are performed by "nearby" cores, thus lowering the distance messages must travel, the amount of congestion in the network, and the overall cost of communication. Our new method applies a geometric partitioning algorithm to both the tasks and the processors, and assigns task parts to the corresponding processor parts. We also present a number of algorithmic optimizations that exploit specific features of the network or application. We show that, for the structured finite difference mini-application MiniGhost, our mapping methods reduced communication time up to 75% relative to MiniGhost's default mapping on 128K cores of a Cray XK7 with sparse allocation. For the atmospheric modeling code E3SM/HOMME, our methods reduced communication time up to 31% on 32K cores of an IBM BlueGene/Q with contiguous allocation.

TYPE Other Report YEAR 2018

DOI OSTI

ExaGraph at Sandia: Graph Coloring Clustering and Partitioning for Exascale Computing

Boman, Erik G.; Deveci, Mehmet; Devine, Karen D.; Rajamanickam, Sivasankaran R.; Wolf, Michael W.

Abstract not provided.

TYPE Presentation YEAR 2018

OSTI

ExaGraph: Combinatorial Methods for Enabling Exascale Applications

Halappanavar, Mahantesh; Buluc, Aydin; Boman, Erik G.; Pothen, Alex; Tumeo, Antonino; Azad, Ariful; Khan, Arif; Ferdous, Sm; Rajamanickam, Sivasankaran R.; Wolf, Michael W.; Deveci, Mehmet D.; Devine, Karen D.

Abstract not provided.

TYPE Presentation YEAR 2018

OSTI

Partitioning Trillion-edge Graphs in Minutes

Slota, George; Rajamanickam, Sivasankaran R.; Devine, Karen D.; Madduri, Kamesh

Abstract not provided.

TYPE Conference Poster YEAR 2017

DOI OSTI

Trilinos Data Services

Devine, Karen D.

Abstract not provided.

TYPE Conference Poster YEAR 2017

OSTI

Parallel Tensor Decompositions for Massive Heterogeneous Incomplete Data

Phipps, Eric T.; Kolda, Tamara G.; Anderson-Bergman, Clifford I.; Devine, Karen D.; Dunlavy, Daniel D.; Hong, David H.; Vuduc, Richard; Li, Jiajia; Young, Jeff; Ballard, Grey

Abstract not provided.

TYPE Presentation YEAR 2017

OSTI

Task Placement to Reduce Application Communication Costs

Devine, Karen D.; Brandt, James M.; Deveci, Mehmet D.; Gentile, Ann C.; Leung, Vitus J.; Olivier, Stephen L.; Laros, James H.; Rajamanickam, Sivasankaran R.; Taylor, Mark A.

Abstract not provided.

TYPE Presentation YEAR 2017

OSTI

A computational spectral graph theory tutorial

Lehoucq, Richard B.; Boman, Erik G.; Devine, Karen D.; Berry, Jonathan W.; Dunlavy, Daniel D.; Wolf, Michael W.; Henson, Van Emden; Sanders, Geoff

Abstract not provided.

TYPE Conference Poster YEAR 2017

OSTI

Enabling Low Mach Fluid Simulations Using Trilinos

Hu, Jonathan J.; Devine, Karen D.; Hoemmen, Mark F.; Lin, Paul L.; Rajamanickam, Sivasankaran R.; Roberts, Nathan V.; Siefert, Christopher S.; Trott, Christian R.; Prokopenko, Andrey

Abstract not provided.

TYPE Conference Poster YEAR 2017

OSTI

Load Balancing throughout My Career

Devine, Karen D.

Abstract not provided.

TYPE Conference Poster YEAR 2017

OSTI

Distributing linear systems for parallel computation

Devine, Karen D.; Boman, Erik G.; Rajamanickam, Sivasankaran R.

Abstract not provided.

TYPE Conference Poster YEAR 2016

OSTI

The Zoltan2 Toolkit: Partitioning Task Placement Coloring and Ordering

Devine, Karen D.

Abstract not provided.

TYPE Conference Poster YEAR 2016

OSTI

Trilinos NGP Planning

Rajamanickam, Sivasankaran R.; Devine, Karen D.; Hu, Jonathan J.; Hoemmen, Mark F.

Abstract not provided.

TYPE Presentation YEAR 2016

OSTI

Parallel Graph Coloring for Manycore Architectures

Proceedings - 2016 IEEE 30th International Parallel and Distributed Processing Symposium, IPDPS 2016

Deveci, Mehmet D.; Boman, Erik G.; Devine, Karen D.; Rajamanickam, Sivasankaran R.

Graph algorithms are challenging to parallelize on manycore architectures due to complex data dependencies and irregular memory access. We consider the well-studied problem of coloring the vertices of a graph. In many applications it is important to compute a coloring with few colors in near-linear time. In parallel, the optimistic (speculative) coloring method by Gebremedhin and Manne is the preferred approach but it needs to be modified for manycore architectures. We discuss a range of implementation issues for this vertex-based optimistic approach. We also propose a novel edge-based optimistic approach that has more parallelism and is better suited to GPUs. We study the performance empirically on two architectures (Xeon Phi and GPU) and across many data sets (from finite element problems to social networks). Our implementation uses the Kokkos library, so it is portable across platforms. We show that on GPUs, we significantly reduce the number of colors (geometric mean 4X, but up to 48X) as compared to the widely used cuSPARSE library. In addition, our edge-based algorithm is 1.5 times faster on average than cuSPARSE, where it has speedups up to 139X on a circuit problem. We also show the effect of the coloring on a conjugate gradient solver using multi-colored Symmetric Gauss-Seidel method as preconditioner; the higher coloring quality found by the proposed methods reduces the overall solve time up to 33% compared to cuSPARSE.

TYPE Conference Poster YEAR 2016

DOI OSTI Scopus

Partitioning and Task Placement with Zoltan2

Deveci, Mehmet D.; Devine, Karen D.; Boman, Erik G.; Leung, Vitus J.; Rajamanickam, Sivasankaran R.; Taylor, Mark A.

Abstract not provided.

TYPE Conference Poster YEAR 2016

OSTI

Smart HPC Centers: data analysis feedback and response

Brandt, James M.; Gentile, Ann C.; Martin, C.; Allan, Benjamin A.; Devine, Karen D.

Abstract not provided.

TYPE Conference Poster YEAR 2016

OSTI

Multi-Jagged: A Scalable Parallel Spatial Partitioning Algorithm

IEEE Transactions on Parallel and Distributed Systems

Deveci, Mehmet; Rajamanickam, Sivasankaran R.; Devine, Karen D.; Catalyurek, Umit V.

Geometric partitioning is fast and effective for load-balancing dynamic applications, particularly those requiring geometric locality of data (particle methods, crash simulations). We present, to our knowledge, the first parallel implementation of a multidimensional-jagged geometric partitioner. In contrast to the traditional recursive coordinate bisection algorithm (RCB), which recursively bisects subdomains perpendicular to their longest dimension until the desired number of parts is obtained, our algorithm does recursive multi-section with a given number of parts in each dimension. By computing multiple cut lines concurrently and intelligently deciding when to migrate data while computing the partition, we minimize data movement compared to efficient implementations of recursive bisection. We demonstrate the algorithm's scalability and quality relative to the RCB implementation in Zoltan on both real and synthetic datasets. Our experiments show that the proposed algorithm performs and scales better than RCB in terms of run-time without degrading the load balance. Our implementation partitions 24 billion points into 65,536 parts within a few seconds and exhibits near perfect weak scaling up to 6K cores.
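The contrast with recursive bisection can be sketched in a few lines: instead of cutting a subdomain in two at each step, multi-section cuts it into a chosen number of parts along one dimension, then repeats along the next dimension within each part. The code below is a serial illustration only, using sorting for simplicity; it does not model the concurrent cut-line computation or data migration decisions of the parallel algorithm, and the function names and example point set are assumptions.

```python
def multisection(points, dim, nparts):
    """Split points into nparts nearly equal-sized groups along coordinate `dim`."""
    ordered = sorted(points, key=lambda p: p[dim])
    n = len(ordered)
    return [ordered[i * n // nparts:(i + 1) * n // nparts] for i in range(nparts)]


def multi_jagged(points, parts_per_dim):
    """Apply multisection along each dimension in turn, within every existing part."""
    groups = [list(points)]
    for dim, nparts in enumerate(parts_per_dim):
        groups = [sub for g in groups for sub in multisection(g, dim, nparts)]
    return groups


# Hypothetical example: a 6 x 4 grid of points split into 3 parts in x, then 2 in y.
points = [(x, y) for x in range(6) for y in range(4)]
parts = multi_jagged(points, (3, 2))
```

With these inputs the 24 points are divided into six parts of four points each, using two levels of cuts rather than the three levels of bisection a comparable recursive scheme would need for eight parts.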

TYPE Journal Article YEAR 2016

DOI OSTI Scopus

Infrastructure for in situ system monitoring and application data analysis

Proceedings of ISAV 2015: 1st International Workshop on In Situ Infrastructures for Enabling Extreme-Scale Analysis and Visualization, Held in conjunction with SC 2015: The International Conference for High Performance Computing, Networking, Storage and Analysis

Brandt, James M.; Devine, Karen D.; Gentile, Ann C.

We present an architecture for high-performance computers that integrates in situ analysis of hardware and system monitoring data with application-specific data to reduce application runtimes and improve overall platform utilization. Large-scale high-performance computing systems typically use monitoring as a tool unrelated to application execution. Monitoring data flows from sampling points to a centralized off-system machine for storage and post-processing when root-cause analysis is required. Along the way, it may also be used for instantaneous threshold-based error detection. Applications can know their application state and possibly allocated resource state, but typically, they have no insight into globally shared resource state that may affect their execution. By analyzing performance data in situ rather than off-line, we enable applications to make real-time decisions about their resource utilization. We address the particular case of in situ network congestion analysis and its potential to improve task placement and data partitioning. We present several design and analysis considerations.

TYPE Conference Poster YEAR 2015

DOI OSTI Scopus

Infrastructure for In Situ System Monitoring and Application Data Analysis

Brandt, James M.; Devine, Karen D.; Gentile, Ann C.

Abstract not provided.

TYPE Conference Poster YEAR 2015

DOI OSTI

Meshes Load Balancing Graph Algorithms

Devine, Karen D.

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

The Zoltan2 Toolkit: Partitioning Task Placement Coloring and Ordering

Devine, Karen D.

Abstract not provided.

TYPE Presentation YEAR 2015

OSTI

Architecture-aware Task Placement

Deveci, Mehmet; Devine, Karen D.; Leung, Vitus J.; Prokopenko, Andrey V.; Rajamanickam, Sivasankaran R.

Abstract not provided.

TYPE Presentation YEAR 2015

OSTI

Distributing Linear Systems for Parallel Computation

Devine, Karen D.; Boman, Erik G.; Rajamanickam, Sivasankaran R.

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

Demonstrating Improved Application Performance Using Dynamic Monitoring and Task Mapping

Brandt, James M.; Devine, Karen D.; Gentile, Ann C.; Laros, James H.

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

The Zoltan2 Toolkit: Partitioning Task Placement Coloring and Ordering

Devine, Karen D.; Boman, Erik G.; Rajamanickam, Sivasankaran R.; Leung, Vitus J.; Riesen, Lee A.; Deveci, Mehmet; Catalyurek, Umit

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

2D Partitioning for Scalable Matrix Computations on Scale-Free Graphs

Boman, Erik G.; Devine, Karen D.; Rajamanickam, Sivasankaran R.

Abstract not provided.

TYPE Presentation YEAR 2014

OSTI

Meshes Geometry and Load Balancing Capability Area

Devine, Karen D.

Abstract not provided.

TYPE Conference Poster YEAR 2014

OSTI

Using architecture information and real-time resource state to reduce power consumption and communication costs in parallel applications

Brandt, James M.; Devine, Karen D.; Gentile, Ann C.; Leung, Vitus J.; Olivier, Stephen L.; Laros, James H.; Rajamanickam, Sivasankaran R.; Bunde, David P.; Deveci, Mehmet; Catalyurek, Umit V.

As computer systems grow in both size and complexity, the need for applications and run-time systems to adjust to their dynamic environment also grows. The goal of the RAAMP LDRD was to combine static architecture information and real-time system state with algorithms to conserve power, reduce communication costs, and avoid network contention. We developed new data collection and aggregation tools to extract static hardware information (e.g., node/core hierarchy, network routing) as well as real-time performance data (e.g., CPU utilization, power consumption, memory bandwidth saturation, percentage of used bandwidth, number of network stalls). We created application interfaces that allowed this data to be used easily by algorithms. Finally, we demonstrated the benefit of integrating system and application information for two use cases. The first used real-time power consumption and memory bandwidth saturation data to throttle concurrency to save power without increasing application execution time. The second used static or real-time network traffic information to reduce or avoid network congestion by remapping MPI tasks to allocated processors. Results from our work are summarized in this report; more details are available in our publications [2, 6, 14, 16, 22, 29, 38, 44, 51, 54].

TYPE SAND Report YEAR 2014

DOI OSTI

Demonstrating Improved Application Performance Using Dynamic Monitoring and Task Mapping

Brandt, James M.; Devine, Karen D.; Gentile, Ann C.; Laros, James H.

Abstract not provided.

TYPE Presentation YEAR 2014

DOI OSTI

Albany on Next-Generation Systems

Devine, Karen D.; Salinger, Andrew G.; Demeshko, Irina D.; Hansen, Glen H.; Edwards, Harold C.

Abstract not provided.

TYPE Presentation YEAR 2014

OSTI

Installing the Anasazi Eigensolver Package with Application to Some Graph Eigenvalue Problems

Lehoucq, Richard B.; Boman, Erik G.; Devine, Karen D.; Thornquist, Heidi K.; Slattengren, Nicole S.

The purpose of this report is to document a basic installation of the Anasazi eigensolver package and provide a brief discussion on the numerical solution of some graph eigenvalue problems.

TYPE SAND Report YEAR 2014

DOI OSTI

FASTMath Partitioning and Task Placement

Devine, Karen D.; Diamond, Gerrett; Ibanez, Dan; Leung, Vitus J.; Prokopenko, Andrey V.; Rajamanickam, Sivasankaran R.; Shephard, Mark; Smith, Cameron

Abstract not provided.

TYPE Presentation YEAR 2014

OSTI

Zoltan Three-Slide Overview for ATPESC 2014

Devine, Karen D.; Rajamanickam, Sivasankaran R.; Prokopenko, Andrey V.; Boman, Erik G.

Abstract not provided.

TYPE Presentation YEAR 2014

OSTI

Demonstrating Improved Application Performance Using Dynamic Monitoring and Task Mapping

Brandt, James M.; Devine, Karen D.; Gentile, Ann C.; Laros, James H.

Abstract not provided.

TYPE Conference Poster YEAR 2014

DOI OSTI

Exploiting geometric partitioning in task mapping for parallel computers

Proceedings of the International Parallel and Distributed Processing Symposium, IPDPS

Deveci, Mehmet; Rajamanickam, Sivasankaran R.; Leung, Vitus J.; Pedretti, Kevin P.; Olivier, Stephen L.; Bunde, David P.; Catalyurek, Umit V.; Devine, Karen D.

We present a new method for mapping applications' MPI tasks to cores of a parallel computer such that communication and execution time are reduced. We consider the case of sparse node allocation within a parallel machine, where the nodes assigned to a job are not necessarily located within a contiguous block nor within close proximity to each other in the network. The goal is to assign tasks to cores so that interdependent tasks are performed by 'nearby' cores, thus lowering the distance messages must travel, the amount of congestion in the network, and the overall cost of communication. Our new method applies a geometric partitioning algorithm to both the tasks and the processors, and assigns task parts to the corresponding processor parts. We show that, for the structured finite difference mini-app MiniGhost, our mapping method reduced execution time 34% on average on 65,536 cores of a Cray XE6. In a molecular dynamics mini-app, MiniMD, our mapping method reduced communication time by 26% on average on 6144 cores. We also compare our mapping with graph-based mappings from the LibTopoMap library and show that our mappings reduced the communication time on average by 15% in MiniGhost and 10% in MiniMD. © 2014 IEEE.

TYPE Conference YEAR 2014

Scopus OSTI

Using 2D Matrix Distributions in Trilinos

Devine, Karen D.; Boman, Erik G.; Rajamanickam, Sivasankaran R.

Abstract not provided.

TYPE Conference YEAR 2013

OSTI

A computational spectral graph theory tutorial

Boman, Erik G.; Devine, Karen D.; Lehoucq, Richard B.

Abstract not provided.

TYPE Conference YEAR 2013

OSTI

Exploiting Geometric Partitioning in Task Mapping for Parallel Computers

Rajamanickam, Sivasankaran R.; Leung, Vitus J.; Pedretti, Kevin P.; Olivier, Stephen L.; Devine, Karen D.

Abstract not provided.

TYPE Conference YEAR 2013

OSTI

The Zoltan Toolkits: Parallel Partitioning Load Balancing Coloring and Ordering

Devine, Karen D.; Boman, Erik G.; Rajamanickam, Sivasankaran R.; Leung, Vitus J.

Abstract not provided.

TYPE Presentation YEAR 2013

OSTI

Multi-jagged: A Scalable Multi-section based Spatial Partitioning Algorithm

Rajamanickam, Sivasankaran R.; Devine, Karen D.

Abstract not provided.

TYPE Conference YEAR 2013

OSTI

Scalable Matrix Computations on Large Scale-Free Graphs Using 2D Graph Partitioning

Boman, Erik G.; Devine, Karen D.; Rajamanickam, Sivasankaran R.

Abstract not provided.

TYPE Conference YEAR 2013

DOI OSTI

Combinatorial Scientific Computing for Exascale Systems and Applications

Devine, Karen D.; Rajamanickam, Sivasankaran R.; Boman, Erik G.

Abstract not provided.

TYPE Conference YEAR 2013

OSTI

Using the Cray Gemini Performance Counters

Pedretti, Kevin; Vaughan, Courtenay T.; Barrett, Richard F.; Devine, Karen D.; Hemmert, Karl S.

Abstract not provided.

TYPE Conference YEAR 2013

OSTI

Scalable Matrix Computations on Large Scale-Free Graphs Using 2D Graph Partitioning

Boman, Erik G.; Devine, Karen D.; Rajamanickam, Sivasankaran R.

Abstract not provided.

TYPE Conference YEAR 2013

DOI OSTI

Trilinos-based Software for Eigenanalysis of Graphs

Boman, Erik G.; Devine, Karen D.; Lehoucq, Richard B.; Slattengren, Nicole S.

Abstract not provided.

TYPE Presentation YEAR 2013

OSTI

Efficient Computation of Eigenpairs for Large Scale-free Graphs

Boman, Erik G.; Devine, Karen D.; Lehoucq, Richard B.; Slattengren, Nicole S.

Abstract not provided.

More Details

TYPE Conference YEAR 2013

OSTI

Scalable matrix computations on large scale-free graphs using 2D graph partitioning

International Conference for High Performance Computing, Networking, Storage and Analysis, SC

Boman, Erik G.; Devine, Karen D.; Rajamanickam, Sivasankaran R.

Scalable parallel computing is essential for processing large scale-free (power-law) graphs. The distribution of data across processes becomes important on distributed-memory computers with thousands of cores. It has been shown that two dimensional layouts (edge partitioning) can have significant advantages over traditional one-dimensional layouts. However, simple 2D block distribution does not use the structure of the graph, and more advanced 2D partitioning methods are too expensive for large graphs. We propose a new two-dimensional partitioning algorithm that combines graph partitioning with 2D block distribution. The computational cost of the algorithm is essentially the same as 1D graph partitioning. We study the performance of sparse matrix-vector multiplication (SpMV) for scale-free graphs from the web and social networks using several different partitioners and both 1D and 2D data layouts. We show that SpMV run time is reduced by exploiting the graph's structure. Contrary to popular belief, we observe that current graph and hypergraph partitioners often yield relatively good partitions on scale-free graphs. We demonstrate that our new 2D partitioning method consistently outperforms the other methods considered, for both SpMV and an eigensolver, on matrices with up to 1.6 billion nonzeros using up to 16,384 cores. Copyright 2013 ACM.

More Details

TYPE Conference YEAR 2013

DOI OSTI Scopus

Multi-jagged: A Scalable Multi-section based Spatial Partitioning Algorithm

Rajamanickam, Sivasankaran R.; Devine, Karen D.

Abstract not provided.

More Details

TYPE Conference YEAR 2012

OSTI

Meshes Geometry and Load Balancing Capability Area

Devine, Karen D.

Abstract not provided.

More Details

TYPE Conference YEAR 2012

OSTI

Zoltan2: Next-Generation Combinatorial Toolkit

Boman, Erik G.; Devine, Karen D.; Leung, Vitus J.; Rajamanickam, Sivasankaran R.

Abstract not provided.

More Details

TYPE Conference YEAR 2012

OSTI

Data Partitioning for Scientific Applications and Emerging Architectures

Devine, Karen D.; Leung, Vitus J.; Rajamanickam, Sivasankaran R.

Abstract not provided.

More Details

TYPE Conference YEAR 2012

OSTI

Data Distribution for HPC Applications

Devine, Karen D.

Abstract not provided.

More Details

TYPE Conference YEAR 2012

OSTI

Eigensolvers on HPC Platforms

Boman, Erik G.; Devine, Karen D.; Lehoucq, Richard B.; Slattengren, Nicole S.

Abstract not provided.

More Details

TYPE Presentation YEAR 2012

OSTI

Exploiting Geometry and Adjacencies in Mesh Partitioning

Devine, Karen D.; Boman, Erik G.

Abstract not provided.

More Details

TYPE Conference YEAR 2012

OSTI

The Trilinos Project - Enabling predictive science and engineering through software libraries for scalable computing

Willenbring, James M.; Heroux, Michael A.; Devine, Karen D.; Boman, Erik G.

Abstract not provided.

More Details

TYPE Presentation YEAR 2012

OSTI

Meshes Load Balancing and Geometry Capability Area

Devine, Karen D.

Abstract not provided.

More Details

TYPE Conference YEAR 2011

OSTI

Architecture-aware Load Balancing and Ordering

Devine, Karen D.

Abstract not provided.

More Details

TYPE Conference YEAR 2011

OSTI

MapReduce in MPI for Large-scale graph algorithms

Parallel Computing

Plimpton, Steven J.; Devine, Karen D.

We describe a parallel library written with message-passing (MPI) calls that allows algorithms to be expressed in the MapReduce paradigm. This means the calling program does not need to include explicit parallel code, but instead provides "map" and "reduce" functions that operate independently on elements of a data set distributed across processors. The library performs needed data movement between processors. We describe how typical MapReduce functionality can be implemented in an MPI context, and also in an out-of-core manner for data sets that do not fit within the aggregate memory of a parallel machine. Our motivation for creating this library was to enable graph algorithms to be written as MapReduce operations, allowing processing of terabyte-scale data sets on traditional MPI-based clusters. We outline MapReduce versions of several such algorithms: vertex ranking via PageRank, triangle finding, connected component identification, Luby's algorithm for maximally independent sets, and single-source shortest-path calculation. To test the algorithms on arbitrarily large artificial graphs we generate randomized R-MAT matrices in parallel; a MapReduce version of this operation is also described. Performance and scalability results for the various algorithms are presented for varying size graphs on a distributed-memory cluster. For some cases, we compare the results with non-MapReduce algorithms, different machines, and different MapReduce software, namely Hadoop. Our open-source library is written in C++, is callable from C++, C, Fortran, or scripting languages such as Python, and can run on any parallel platform that supports MPI. © 2011 Elsevier B.V. All rights reserved.

More Details

TYPE Journal Article YEAR 2011

OSTI Scopus

Tutorial: The Zoltan Toolkit

Rajamanickam, Sivasankaran R.; Boman, Erik G.; Devine, Karen D.; Leung, Vitus J.; Riesen, Lee A.

Abstract not provided.

More Details

TYPE Conference YEAR 2011

OSTI

Parallel mesh management using interoperable tools

Devine, Karen D.

This presentation discussed challenges arising in parallel mesh management and demonstrated solutions to them. It also described the broad range of software for mesh management and modification developed by the Interoperable Technologies for Advanced Petascale Simulations (ITAPS) team, and highlighted applications that successfully use the ITAPS tool suite.

More Details

TYPE Conference YEAR 2010

OSTI

TUG 2010 meshes, geometry and load balancing capability area

Devine, Karen D.; Copps, Kevin D.; Ebeida, Mohamed S.; Hensinger, David M.; Knupp, Patrick K.; Sjaardema, Gregory D.; Williams, Alan B.

Abstract not provided.

More Details

TYPE Conference YEAR 2010

OSTI

Exploring Feasibility of 2D Sparse Matrix Partitioning: Background

Wolf, Michael W.; Boman, Erik G.; Devine, Karen D.

Abstract not provided.

More Details

TYPE Presentation YEAR 2010

OSTI

Improved parallel mesh partitioning with hypergraphs and Zoltan

Devine, Karen D.

Abstract not provided.

More Details

TYPE Presentation YEAR 2010

OSTI

Comparing Programming Paradigms for Graph Algorithms

Devine, Karen D.; Plimpton, Steven J.; Bayer, Gregory B.; Barrett, Brian B.; Berry, Jonathan W.

Abstract not provided.

More Details

TYPE Conference YEAR 2010

OSTI

Parallel Partitioning Coloring and Ordering for Scientific Computing

Boman, Erik G.; Devine, Karen D.

Abstract not provided.

More Details

TYPE Presentation YEAR 2010

OSTI DOI

Interoperable mesh components for large-scale, distributed-memory simulations

Journal of Physics: Conference Series

Devine, Karen D.; Diachin, L.; Kraftcheck, J.; Jansen, K.E.; Leung, Vitus J.; Luo, X.; Miller, M.; Ollivier-Gooch, C.; Ovcharenko, A.; Sahni, O.; Shephard, M.S.; Tautges, T.; Xie, T.; Zhou, M.

SciDAC applications have a demonstrated need for advanced software tools to manage the complexities associated with sophisticated geometry, mesh, and field manipulation tasks, particularly as computer architectures move toward the petascale. In this paper, we describe a software component - an abstract data model and programming interface - designed to provide support for parallel unstructured mesh operations. We describe key issues that must be addressed to successfully provide high-performance, distributed-memory unstructured mesh services and highlight some recent research accomplishments in developing new load balancing and MPI-based communication libraries appropriate for leadership class computing. Finally, we give examples of the use of parallel adaptive mesh modification in two SciDAC applications. © 2009 IOP Publishing Ltd.

More Details

TYPE Conference YEAR 2009

Scopus OSTI

TUG 2008 Meshes Geometry and Load Balancing Capability Area

Devine, Karen D.

Abstract not provided.

More Details

TYPE Conference YEAR 2008

OSTI

Distributed micro-releases of bioterror pathogens: threat characterizations and epidemiology from uncertain patient observables

Adams, Brian M.; Devine, Karen D.; Najm, H.N.; Marzouk, Youssef M.

Terrorist attacks using an aerosolized pathogen preparation have gained credibility as a national security concern since the anthrax attacks of 2001. The ability to characterize the parameters of such attacks, i.e., to estimate the number of people infected, the time of infection, the average dose received, and the rate of disease spread in contemporary American society (for contagious diseases), is important when planning a medical response. For non-contagious diseases, we address the characterization problem by formulating a Bayesian inverse problem predicated on a short time-series of diagnosed patients exhibiting symptoms. To keep the approach relevant for response planning, we limit ourselves to 3-5 days of data. In computational tests performed for anthrax, we usually find these observation windows sufficient, especially if the outbreak model employed in the inverse problem is accurate. For contagious diseases, we formulated a Bayesian inversion technique to infer both pathogenic transmissibility and the social network from outbreak observations, ensuring that the two determinants of spreading are identified separately. We tested this technique on data collected from a 1967 smallpox epidemic in Abakaliki, Nigeria. We inferred, probabilistically, different transmissibilities in the structured Abakaliki population, the social network, and the chain of transmission. Finally, we developed an individual-based epidemic model to realistically simulate the spread of a rare (or eradicated) disease in a modern society. This model incorporates the mixing patterns observed in an (American) urban setting and accepts, as model input, pathogenic transmissibilities estimated from historical outbreaks that may have occurred in socio-economic environments with little resemblance to contemporary society.
Techniques were also developed to simulate disease spread on static and sampled network reductions of the dynamic social networks originally in the individual-based model, yielding faster, though approximate, network-based epidemic models. These reduced-order models are useful in scenario analysis for medical response planning, as well as in computationally intensive inverse problems.

More Details

TYPE SAND Report YEAR 2008

DOI OSTI