Publications

Results 1–25 of 33

Search results

Optimizing Distributed Load Balancing for Workloads with Time-Varying Imbalance

Proceedings - IEEE International Conference on Cluster Computing, ICCC

Lifflander, Jonathan; Slattengren, Nicole S.; Pebay, Philippe P.; Miller, Phil; Rizzi, Francesco N.; Bettencourt, Matthew T.

This paper explores dynamic load balancing algorithms used by asynchronous many-task (AMT), or 'task-based', programming models to optimize task placement for scientific applications with dynamic workload imbalances. AMT programming models use overdecomposition of the computational domain. Overdecomposition provides a natural mechanism for domain developers to expose concurrency and break their computational domain into pieces that can be remapped to different hardware. This paper explores fully distributed load balancing strategies that have shown great promise for exascale-level computing but are challenging to reason about theoretically and to implement effectively. We present a novel theoretical analysis of a gossip-based load balancing protocol and use it to build an efficient implementation with fast convergence rates and high load balancing quality. We demonstrate our algorithm in a next-generation plasma physics application (EMPIRE) that induces time-varying workload imbalance due to spatial non-uniformity in particle density across the domain. Our highly scalable, novel load balancing algorithm achieves over a 3x speedup (particle work) compared to a bulk-synchronous MPI implementation without load balancing.

Hierarchical Parallelism for Transient Solid Mechanics Simulations

World Congress in Computational Mechanics and ECCOMAS Congress

Littlewood, David J.; Jones, Reese E.; Laros, James H.; Plews, Julia A.; Hetmaniuk, Ulrich L.; Lifflander, Jonathan

Software development for high-performance scientific computing continues to evolve in response to increased parallelism and the advent of on-node accelerators, in particular GPUs. While these hardware advancements have the potential to significantly reduce turnaround times, they also present implementation and design challenges for engineering codes. We investigate two strategies to mitigate these challenges: the Kokkos library for performance portability across disparate architectures, and the DARMA/vt library for asynchronous many-task scheduling. We investigate the application of Kokkos within the NimbleSM finite element code and the LAMÉ constitutive model library, and we explore the performance of DARMA/vt applied to NimbleSM contact mechanics algorithms. Software engineering strategies are discussed, followed by performance analyses of relevant solid mechanics simulations that demonstrate the promise of Kokkos and DARMA/vt for accelerated engineering simulators.

Design and Implementation Techniques for an MPI-Oriented AMT Runtime

Proceedings of ExaMPI 2020: Exascale MPI Workshop, Held in conjunction with SC 2020: The International Conference for High Performance Computing, Networking, Storage and Analysis

Lifflander, Jonathan; Miller, Phil; Slattengren, Nicole S.; Morales, Nicolas M.; Stickney, Paul; Pebay, Philippe P.

We present the execution model of Virtual Transport (VT), a new asynchronous many-task (AMT) runtime system that provides unprecedented integration and interoperability with MPI. We have developed VT in conjunction with large production applications to provide a highly incremental, high-value path to AMT adoption in the dominant ecosystem of MPI applications, libraries, and developers. Our aim is that the 'MPI+X' model of hybrid parallelism can smoothly extend to become 'MPI+VT+X'. We illustrate a set of design and implementation techniques that have been useful in building VT. We believe that these ideas and the code embodying them will be useful to others building similar systems, and perhaps provide insight into how MPI might evolve to better support them. We motivate our approach with two applications that are adopting VT and have begun to benefit from increased asynchrony and dynamic load balancing.

Using Sandia's Automatic Report Generator to Document EMPIRE-based Electrostatic Simulations

Pebay, Philippe P.; Lifflander, Jonathan

The goal of this report is to illustrate the use of Sandia's Automatic Report Generator (ARG) when applied to an electrostatic simulation case run with Sandia's EMPIRE code. It documents the results of a hackathon session held at the March 19-22 DOE Workshop Workflow and Hackathon in Livermore, where the co-authors demonstrated ARG's flexibility by extending it to several aspects of such a simulation in less than a day's worth of work. The Explorator component of ARG automatically picks up the case's input deck, thereby determining the data components that the Generator and Assembler components are currently able to document: meta-data, input deck, mesh, and solution fields. The ARG is not yet capable of documenting the particles file created by the simulation, which will require further work.

DARMA-EMPIRE Integration and Performance Assessment – Interim Report

Lifflander, Jonathan; Bettencourt, Matthew T.; Slattengren, Nicole S.; Templet, Gary J.; Miller, Phil; Perrinel, Meriadeg; Rizzi, Francesco N.; Pebay, Philippe P.

We begin by presenting an overview of the general philosophy that is guiding the novel DARMA developments, followed by a brief reminder of the background of this project. We finally present the FY19 design requirements. As the exascale era arrives, DARMA is uniquely positioned at the forefront of asynchronous many-task (AMT) research and development (R&D) to explore emerging programming model paradigms for next-generation HPC applications at Sandia, across NNSA labs, and beyond. The DARMA project explores how to fundamentally shift the expression (PM) and execution (EM) of massively concurrent HPC scientific algorithms to be more asynchronous, resilient to executional aberrations in heterogeneous/unpredictable environments, and data-dependency conscious, thereby enabling an intelligent, dynamic, and self-aware runtime to guide execution.
