Publications

Results 1–25 of 47
Skip to search filters

A simulation infrastructure for examining the performance of resilience strategies at scale

Ferreira, Kurt; Levy, Scott L.

Fault-tolerance is a major challenge for many current and future extreme-scale systems, with many studies showing it to be the key limiter to application scalability. While there are a number of studies investigating the performance of various resilience mechanisms, these are typically limited to scales orders of magnitude smaller than expected for next-generation systems and simple benchmark problems. In this paper we show how, with very minor changes, a previously published and validated simulation framework for investigating appli- cation performance of OS noise can be used to simulate the overheads of various resilience mechanisms at scale. Using this framework, we compare the failure-free performance of this simulator against an analytic model to validate its performance and demonstrate its ability to simulate the performance of two popular rollback recovery methods on traces from real

More Details

ASC ATDM Level 2 Milestone #6358: Assess Status of Next Generation Components and Physics Models in EMPIRE

Bettencourt, Matthew T.; Kramer, Richard M.; Cartwright, Keith C.; Phillips, Edward G.; Ober, Curtis C.; Pawlowski, Roger P.; Swan, Matthew S.; Kalashnikova, Irina; Phipps, Eric T.; Conde, Sidafa C.; Cyr, Eric C.; Ulmer, Craig D.; Kordenbrock, Todd H.; Levy, Scott L.; Templet, Gary J.; Hu, Jonathan J.; Lin, Paul L.; Glusa, Christian A.; Siefert, Christopher S.; Glass, Micheal W.

This report documents the outcome from the ASC ATDM Level 2 Milestone 6358: Assess Status of Next Generation Components and Physics Models in EMPIRE. This Milestone is an assessment of the EMPIRE (ElectroMagnetic Plasma In Realistic Environments) application and three software components. The assessment focuses on the electromagnetic and electrostatic particle-in-cell solu- tions for EMPIRE and its associated solver, time integration, and checkpoint-restart components. This information provides a clear understanding of the current status of the EMPIRE application and will help to guide future work in FY19 in order to ready the application for the ASC ATDM L 1 Milestone in FY20. It is clear from this assessment that performance of the linear solver will have to be a focus in FY19.

More Details
Results 1–25 of 47
Results 1–25 of 47