Publications

Results 1–25 of 60
Skip to search filters

Optimizing Distributed Load Balancing for Workloads with Time-Varying Imbalance

Proceedings - IEEE International Conference on Cluster Computing, ICCC

Lifflander, Jonathan; Slattengren, Nicole S.; Pébaÿ, Philippe P.; Miller, Phil; Rizzi, Francesco N.; Bettencourt, Matthew T.

This paper explores dynamic load balancing algorithms used by asynchronous many-task (AMT), or 'taskbased', programming models to optimize task placement for scientific applications with dynamic workload imbalances. AMT programming models use overdecomposition of the computational domain. Overdecompostion provides a natural mechanism for domain developers to expose concurrency and break their computational domain into pieces that can be remapped to different hardware. This paper explores fully distributed load balancing strategies that have shown great promise for exascalelevel computing but are challenging to theoretically reason about and implement effectively. We present a novel theoretical analysis of a gossip-based load balancing protocol and use it to build an efficient implementation with fast convergence rates and high load balancing quality. We demonstrate our algorithm in a nextgeneration plasma physics application (EMPIRE) that induces time-varying workload imbalance due to spatial non-uniformity in particle density across the domain. Our highly scalable, novel load balancing algorithm, achieves over a 3x speedup (particle work) compared to a bulk-synchronous MPI implementation without load balancing.

More Details

Design and Implementation Techniques for an MPI-Oriented AMT Runtime

Proceedings of ExaMPI 2020: Exascale MPI Workshop, Held in conjunction with SC 2020: The International Conference for High Performance Computing, Networking, Storage and Analysis

Lifflander, Jonathan; Miller, Phil; Slattengren, Nicole S.; Morales, Nicolas M.; Stickney, Paul; Pébaÿ, Philippe P.

We present the execution model of Virtual Transport (VT) a new, Asynchronous Many-Task (AMT) runtime system that provides unprecedented integration and interoperability with MPI. We have developed VT in conjunction with large production applications to provide a highly incremental, high-value path to AMT adoption in the dominant ecosystem of MPI applications, libraries, and developers. Our aim is that the'MPI+X' model of hybrid parallelism can smoothly extend to become'MPI+VT +X'. We illustrate a set of design and implementation techniques that have been useful in building VT. We believe that these ideas and the code embodying them will be useful to others building similar systems, and perhaps provide insight to how MPI might evolve to better support them. We motivate our approach with two applications that are adopting VT and have begun to benefit from increased asynchrony and dynamic load balancing.

More Details

Enabling Resilience in Asynchronous Many-Task Programming Models

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Paul, Sri R.; Hayashi, Akihiro; Slattengren, Nicole S.; Kolla, Hemanth K.; Whitlock, Matthew J.; Bak, Seonmyeong; Teranishi, Keita T.; Mayo, Jackson M.; Sarkar, Vivek

Resilience is an imminent issue for next-generation platforms due to projected increases in soft/transient failures as part of the inherent trade-offs among performance, energy, and costs in system design. In this paper, we introduce a comprehensive approach to enabling application-level resilience in Asynchronous Many-Task (AMT) programming models with a focus on remedying Silent Data Corruption (SDC) that can often go undetected by the hardware and OS. Our approach makes it possible for the application programmer to declaratively express resilience attributes with minimal code changes, and to delegate the complexity of efficiently supporting resilience to our runtime system. We have created a prototype implementation of our approach as an extension to the Habanero C/C++ library (HClib), where different resilience techniques including task replay, task replication, algorithm-based fault tolerance (ABFT), and checkpointing are available. Our experimental results show that task replay incurs lower overhead than task replication when an appropriate error checking function is provided. Further, task replay matches the low overhead of ABFT. Our results also demonstrate the ability to combine different resilience schemes. To evaluate the effectiveness of our resilience mechanisms in the presence of errors, we injected synthetic errors at different error rates (1.0%, and 10.0%) and found modest increase in execution times. In summary, the results show that our approach supports efficient and scalable recovery, and that our approach can be used to influence the design of future AMT programming models and runtime systems that aim to integrate first-class support for user-level resilience.

More Details

ASC CSSE Level 2 Milestone #6362: Resilient Asynchronous Many Task Programming Model

Teranishi, Keita T.; Kolla, Hemanth K.; Slattengren, Nicole S.; Whitlock, Matthew J.; Mayo, Jackson M.; Clay, Robert L.; Paul, Sri R.; Hayashi, Akihiro H.; Sarkar, Vivek S.

This report is an outcome of the ASC CSSE Level 2 Milestone 6362: Analysis of Re- silient Asynchronous Many-Task (AMT) Programming Model. It comprises a summary and in-depth analysis of resilience schemes adapted to the AMT programming model. Herein, performance trade-offs of a resilient-AMT prograrnming model are assessed through two ap- proaches: (1) an analytical model realized by discrete event simulations and (2) empirical evaluation of benchmark programs representing regular and irregular workloads of explicit partial differential equation solvers. As part of this effort, an AMT execution simulator and a prototype resilient-AMT programming framework have been developed. The former permits us to hypothesize the performance behavior of a resilient-AMT model, and has undergone a verification and validation (V&V) process. The latter allows empirical evaluation of the perfor- mance of resilience schemes under emulated program failures and enabled the aforementioned V&V process. The outcome indicates that (1) resilience techniques implemented within an AMT framework allow efficient and scalable recovery under frequent failures, that (2) the abstraction of task and data instances in the AMT programming model enables readily us- able Application Program Interfaces (APIs) for resilience, and that (3) this abstraction enables predicting the performance of resilient-AMT applications with a simple simulation infrastruc- ture. This outcome will provide guidance for the design of the AMT programming model and runtime systems, user-level resilience support, and application development for ASC's next generation platforms (NGPs).

More Details

Metaprogramming-Enabled Parallel Execution of Apparently Sequential C++ Code

Proceedings of ESPM2 2016: 2nd International Workshop on Extreme Scale Programming Models and Middleware - Held in conjunction with SC 2016: The International Conference for High Performance Computing, Networking, Storage and Analysis

Hollman, David S.; Bennett, Janine C.; Kolla, Hemanth K.; Lifflander, Jonathan; Slattengren, Nicole S.; Wilke, Jeremiah J.

Task-based execution models have received considerable attention in recent years to meet the performance challenges facing high-performance computing (HPC). In this paper we introduce MetaPASS-Metaprogramming-enabled Para-llelism from Apparently Sequential Semantics-a proof-of-concept, non-intrusive header library that enables implicit task-based parallelism in a sequential C++ code. MetaPASS is a data-driven model, relying on dependency analysis of variable read-/write accesses to derive a directed acyclic graph (DAG) of the computation to be performed. MetaPASS enables embedding of runtime dependency analysis directly in C++ applications using only template metaprogramming. Rather than requiring verbose task-based code or source-to-source compilers, a native C++ code can be made task-based with minimal modifications. We present an overview of the programming model enabled by MetaPASS and the C++ runtime API required to support it. Details are provided regarding how standard template metaprogramming is used to capture task dependencies. We finally discuss how the programming model can be deployed in both an MPI+X and in a standalone distributed memory context.

More Details

DARMA 0.3.0-alpha Specification

Wilke, Jeremiah J.; Hollman, David S.; Slattengren, Nicole S.; lifflander, jonathan l.; Kolla, Hemanth K.; Rizzi, Francesco N.; Teranishi, Keita T.; Bennett, Janine C.

PARMA (Distributed Asynchronous Resilient Models and ApH asynchronous many-task (AMT) rmogramming models and hardware idiosyncrasies, 2) improve application programmer interface (API) plication Ico-desiga activities into meaningful requirements for characterization and definition, accelerating the development of pARMAI APT is a rranslation layer runtime systems Am' 11 between an application-facing . The application-facing user-level iting the generic language constructs of C++ and adding parallel programs. Though the implementation of the provide the front end semantics, it is nonetheless fully embedded in the C++ language and leverages a widely supported front end fiack end in C++, inher- that facilitate expressing distributed asynchronous uses C++ constructs unfamiliar to many programmers to subset of C++14 functionality (gcc >= 4.9, clang >= 3.5, icc > = 16). The rranslation layer leverages C++ to map the user's code onto the fiack encI runtime APT. The fiack end APT is a set of abstract classes and function signatures that iuntime systenr developers must implement in accordance with the specification require- ments in order to interface with application code written to the must link to a iuntime systenr that implements the abstract mentations will be external, drawing upon existing provided in the pARMAI code distribution. IDARMAI fiack end templatO front end. Executable 1DARMA applications runtime APT. It is intended that these imple- technologies. However, a reference implementation will be The front end rranslation layer, and iback end APT are detailed herein. We also include a list of application requirements driving the specification (along with a list of the applications contributing to the requirements to date), a brief history of changes between previous versions of the specification, and summary of the planned changes in up- coming versions of the specification. Appendices walk the user through a more detailed set of examples of applications written in the PARMA front encI APII and provide additional technical details for those the interested reader.

More Details
Results 1–25 of 60
Results 1–25 of 60