Center for Computing Research (CCR)

A cross-enclave composition mechanism for exascale system software

Proceedings of the 6th International Workshop on Runtime and Operating Systems for Supercomputers, ROSS 2016 - In conjunction with HPDC 2016

Evans, Noah; Pedretti, Kevin P.; Kocoloski, Brian; Lange, John; Lang, Michael; Bridges, Patrick G.

As supercomputers move to exascale, the number of cores per node continues to increase, but the I/O bandwidth between nodes is increasing more slowly. This leads to computational power outstripping I/O bandwidth. This growth, in turn, encourages moving as much of an HPC workflow as possible onto the node in order to minimize data movement. One particular method of application composition, enclaves, co-locates different operating systems and runtimes on the same node where they communicate by in situ communication mechanisms. In this work, we describe a mechanism for communicating between composed applications. We implement a mechanism using Copy on Write cooperating with XEMEM shared memory to provide consistent, implicitly unsynchronized communication across enclaves. We then evaluate this mechanism using a composed application and analytics between the Kitten Lightweight Kernel and Linux on top of the Hobbes Operating System and Runtime. These results show a 3% overhead compared to an application running in isolation, demonstrating the viability of this approach.

TYPE Conference Poster YEAR 2016

Scopus OSTI DOI

A Cross-Enclave Composition Mechanism for Exascale System Software

Evans, Noah; Pedretti, Kevin P.; Kocoloski, Brian; Lange, John; Lang, Michael L.; Bridges, Patrick G.

Abstract not provided.

TYPE Conference Poster YEAR 2016

OSTI DOI

Evaluating the Viability of Process Replication Reliability for Exascale Systems

Ferreira, Kurt; Stearley, Jon S.; Laros, James H.; Oldfield, Ron A.; Pedretti, Kevin P.; Brightwell, Ronald B.; Bridges, Patrick G.

Abstract not provided.

TYPE Conference YEAR 2011

OSTI

Evaluating the Viability of Using Compression to Mitigate Silent Corruption of Read-Mostly Application Data

Proceedings - IEEE International Conference on Cluster Computing, ICCC

Levy, Scott; Ferreira, Kurt B.; Bridges, Patrick G.

Aggregating millions of hardware components to construct an exascale computing platform will pose significant resilience challenges. In addition to slowdowns associated with detected errors, silent errors are likely to further degrade application performance. Moreover, silent data corruption (SDC) has the potential to undermine the integrity of the results produced by important scientific applications. In this paper, we propose an application-independent mechanism to efficiently detect and correct SDC in read-mostly memory, where SDC may be most likely to occur. We use memory protection mechanisms to maintain compressed backups of application memory. We detect SDC by identifying changes in memory contents that occur without explicit write operations. We demonstrate that, for several applications, our approach can potentially protect a significant fraction of application memory pages from SDC with modest overheads. Moreover, our proposed technique can be straightforwardly combined with many other approaches to provide a significant bulwark against SDC.

TYPE Conference Poster YEAR 2017

Scopus OSTI DOI

How I learned to stop worrying and love in situ analytics: Leveraging latent synchronization in MPI collective algorithms

ACM International Conference Proceeding Series

Levy, Scott; Ferreira, Kurt B.; Widener, Patrick W.; Bridges, Patrick G.; Mondragon, Oscar H.

Scientific workloads running on current extreme-scale systems routinely generate tremendous volumes of data for postprocessing. This data movement has become a serious issue due to its energy cost and the fact that I/O bandwidths have not kept pace with data generation rates. In situ analytics is an increasingly popular alternative in which post-simulation processing is embedded into an application, running as part of the same MPI job. This can reduce data movement costs but introduces a new potential source of interference for the application. Using a validated simulation-based approach, we investigate how best to mitigate the interference from time-shared in situ tasks for a number of key extreme-scale workloads. This paper makes a number of contributions. First, we show that the independent scheduling of in situ analytics tasks can significantly degrade application performance, with slowdowns exceeding 1000%. Second, we demonstrate that the degree of synchronization found in many modern collective algorithms is sufficient to significantly reduce the overheads of this interference to less than 10% in most cases. Finally, we show that many applications already frequently invoke collective operations that use these synchronizing MPI algorithms. Therefore, the synchronization introduced by these MPI collective algorithms can be leveraged to efficiently schedule analytics tasks with minimal changes to existing applications. This paper provides critical analysis and guidance for MPI users and developers on the importance of scheduling in situ analytics tasks. It shows the degree of synchronization needed to mitigate the performance impacts of these time-shared coupled codes and demonstrates how that synchronization can be realized in an extreme-scale environment using modern collective algorithms.

TYPE Conference Poster YEAR 2016

Scopus OSTI DOI

Improving Application Resilience to Memory Errors with Lightweight Compression

International Conference for High Performance Computing, Networking, Storage and Analysis, SC

Levy, Scott; Ferreira, Kurt B.; Bridges, Patrick G.

In next-generation extreme-scale systems, application performance will be limited by memory performance characteristics. The first exascale system is projected to contain many petabytes of memory. In addition to the sheer volume of the memory required, device trends, such as shrinking feature sizes and reduced supply voltages, have the potential to increase the frequency of memory errors. As a result, resilience to memory errors is a key challenge. In this paper, we evaluate the viability of using memory compression to repair detectable uncorrectable errors (DUEs) in memory. We develop a software library, evaluate its performance and demonstrate that it is able to significantly compress memory of HPC applications. Further, we show that exploiting compressed memory pages to correct memory errors can significantly improve application performance on next-generation systems.

TYPE Conference Poster YEAR 2016

Scopus OSTI DOI

Improving Application Resilience to Memory Errors with Lightweight Compression

Levy, Scott L.; Ferreira, Kurt B.; Bridges, Patrick G.

Abstract not provided.

TYPE Conference Poster YEAR 2016

OSTI DOI

LDRD final report : a lightweight operating system for multi-core capability class supercomputers

Pedretti, Kevin P.; Levenhagen, Michael J.; Ferreira, Kurt; Brightwell, Ronald B.; Kelly, Suzanne M.; Bridges, Patrick G.

The two primary objectives of this LDRD project were to create a lightweight kernel (LWK) operating system (OS) designed to take maximum advantage of multi-core processors, and to leverage the virtualization capabilities in modern multi-core processors to create a more flexible and adaptable LWK environment. The most significant technical accomplishments of this project were the development of the Kitten lightweight kernel, the co-development of the SMARTMAP intra-node memory mapping technique, and the development and demonstration of a scalable virtualization environment for HPC. Each of these topics is presented in this report by the inclusion of a published or submitted research paper. The results of this project are being leveraged by several ongoing and new research projects.

TYPE SAND Report YEAR 2010

OSTI DOI

Minimal-overhead virtualization of a large scale supercomputer

ACM SIGPLAN Notices

Lange, John R.; Pedretti, Kevin P.; Dinda, Peter; Bae, Chang; Bridges, Patrick G.; Soltero, Philip; Merritt, Alexander

Virtualization has the potential to dramatically increase the usability and reliability of high performance computing (HPC) systems. However, this potential will remain unrealized unless overheads can be minimized. This is particularly challenging on large scale machines that run carefully crafted HPC OSes supporting tightly coupled, parallel applications. In this paper, we show how careful use of hardware and VMM features enables the virtualization of a large-scale HPC system, specifically a Cray XT4 machine, with ≤5% overhead on key HPC applications, microbenchmarks, and guests at scales of up to 4096 nodes. We describe three techniques essential for achieving such low overhead: passthrough I/O, workload-sensitive selection of paging mechanisms, and carefully controlled preemption. These techniques are forms of symbiotic virtualization, an approach on which we elaborate. Copyright © 2011 ACM.

TYPE Conference YEAR 2011

Scopus OSTI

Modeling Concurrent Point-to-Point Communication Cost in MPI Performance Models

Farmer, Shane F.; Skjellum, Anthony S.; Bridges, Patrick G.; Dosanjh, Matthew D.; Grant, Ryan E.; Brightwell, Ronald B.

Abstract not provided.

TYPE Conference Poster YEAR 2016

OSTI

Opportunities for leveraging OS virtualization in high-end supercomputing

Pedretti, Kevin P.; Bridges, Patrick G.

This paper examines potential motivations for incorporating virtualization support in the system software stacks of high-end capability supercomputers. We advocate that this will increase the flexibility of these platforms significantly and enable new capabilities that are not possible with current fixed software stacks. Our results indicate that compute, virtual memory, and I/O virtualization overheads are low and can be further mitigated by utilizing well-known techniques such as large paging and VMM bypass. Furthermore, since the addition of virtualization support does not affect the performance of applications using the traditional native environment, there is essentially no disadvantage to its addition.

TYPE Conference YEAR 2010

OSTI

Opportunities for Leveraging OS Virtualization in High-End Supercomputing

Pedretti, Kevin P.; Bridges, Patrick G.

Abstract not provided.

TYPE Presentation YEAR 2010

OSTI

Preparing for exascale: Modeling MPI for Many-core systems using fine-grain queues

Proceedings of the 3rd ExaMPI Workshop at the International Conference on High Performance Computing, Networking, Storage and Analysis, SC 2015

Bridges, Patrick G.; Dosanjh, Matthew D.; Grant, Ryan E.; Skjellum, Anthony; Farmer, Shane; Brightwell, Ronald B.

This paper presents a fine-grain queueing model of MPI point-to-point messaging performance for use in the design and analysis of current and future large-scale computing systems. In particular, the model seeks to capture key performance behavior of MPI communication on many-core systems. We demonstrate that this model encompasses key MPI performance characteristics, such as short/long protocol and offload/onload protocol tradeoffs, and demonstrate its use in predicting the potential impact of architectural and software changes for many-core systems on communication performance. In addition, we also discuss the limitations of this model and potential directions for enhancing its fidelity.

TYPE Conference Poster YEAR 2015

Scopus OSTI DOI

Re-evaluating Network Onload vs. Offload for the Many-Core Era

Dosanjh, Matthew D.; Grant, Ryan E.; Bridges, Patrick G.; Brightwell, Ronald B.

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

Re-evaluating Network Onload vs. Offload for the Many-Core Era

Dosanjh, Matthew D.; Grant, Ryan E.; Bridges, Patrick G.; Brightwell, Ronald B.

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI DOI

RMA-MT: A Benchmark Suite for Assessing MPI Multi-threaded RMA Performance

Proceedings - 2016 16th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2016

Dosanjh, Matthew D.; Groves, Taylor G.; Grant, Ryan E.; Brightwell, Ronald B.; Bridges, Patrick G.

Reaching Exascale will require leveraging massive parallelism while potentially leveraging asynchronous communication to help achieve scalability at such large levels of concurrency. MPI is a good candidate for providing the mechanisms to support communication at such large scales. Two existing MPI mechanisms are particularly relevant to Exascale: multi-threading, to support massive concurrency, and Remote Memory Access (RMA), to support asynchronous communication. Unfortunately, multi-threaded MPI RMA code has not been extensively studied. Part of the reason for this is that no public benchmarks or proxy applications exist to assess its performance. The contributions of this paper are the design and demonstration of the first available proxy applications and micro-benchmark suite for multi-threaded RMA in MPI, a study of multi-threaded RMA performance of different MPI implementations, and an evaluation of how these benchmarks can be used to test development for both performance and correctness.

TYPE Conference Poster YEAR 2016

Scopus OSTI DOI

Scheduling In-Situ Analytics in Next-Generation Applications

Proceedings - 2016 16th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2016

Mondragon, Oscar H.; Bridges, Patrick G.; Levy, Scott; Ferreira, Kurt B.; Widener, Patrick W.

Next-generation applications increasingly rely on in situ analytics to guide computation, reduce the amount of I/O performed, and perform other important tasks. Scheduling where and when to run analytics is challenging, however. This paper quantifies the costs and benefits of different approaches to scheduling applications and analytics on nodes in large-scale applications, including space sharing, uncoordinated time sharing, and gang scheduled time sharing.

TYPE Conference Poster YEAR 2016

Scopus OSTI

SHMEM-MT: A benchmark suite for assessing multi-threaded SHMEM performance

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Weeks, Hans; Dosanjh, Matthew D.; Bridges, Patrick G.; Grant, Ryan E.

Abstract not provided.

TYPE Conference Poster YEAR 2016

Scopus OSTI DOI

Similarity Engine: Using Content Similarity to Improve Memory Resilience

Levy, Scott L.; Ferreira, Kurt B.; Bridges, Patrick G.

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

Similarity Engine: Using Content Similarity to Improve Memory Resilience

Levy, Scott L.; Ferreira, Kurt B.; Bridges, Patrick G.

Abstract not provided.

TYPE Conference Poster YEAR 2016

OSTI

System-level support for composition of applications

Proceedings of the 5th International Workshop on Runtime and Operating Systems for Supercomputers, ROSS 2015 - In conjunction with HPDC 2015

Kocoloski, Brian; Lange, John; Abbasi, Hasan; Bernholdt, David E.; Jones, Terry R.; Dayal, Jai; Evans, Noah; Lang, Michael; Lofstead, Jay; Pedretti, Kevin P.; Bridges, Patrick G.

Current HPC system software lacks support for emerging application deployment scenarios that combine one or more simulations with in situ analytics, sometimes called multi-component or multi-enclave applications. This paper presents an initial design study, implementation, and evaluation of mechanisms supporting composite multi-enclave applications in the Hobbes exascale operating system. These mechanisms include virtualization techniques isolating application custom enclaves while using the vendor-supplied host operating system and high-performance inter-VM communication mechanisms. Our initial single-node performance evaluation of these mechanisms on multi-enclave science applications, both real and proxy, demonstrates the ability to support multi-enclave HPC job composition with minimal performance overhead.

TYPE Conference Poster YEAR 2015

Scopus OSTI

Understanding Performance Interference in Next-Generation HPC Systems

International Conference for High Performance Computing, Networking, Storage and Analysis, SC

Mondragon, Oscar H.; Bridges, Patrick G.; Levy, Scott; Ferreira, Kurt B.; Widener, Patrick W.

Next-generation systems face a wide range of new potential sources of application interference, including resilience actions, system software adaptation, and in situ analytics programs. In this paper, we present a new model for analyzing the performance of bulk-synchronous HPC applications based on the use of extreme value theory. After validating this model against both synthetic and real applications, the paper then uses both simulation and modeling techniques to profile next-generation interference sources and characterize their behavior and performance impact on a selection of HPC benchmarks, mini-applications, and applications. Lastly, this work shows how the model can be used to understand how current interference mitigation techniques in multi-processors work.

TYPE Conference Poster YEAR 2016

Scopus OSTI

Using Simulation to Evaluate the Performance of Resilience Strategies at Scale

Levy, Scott L.; Ferreira, Kurt B.; Widener, Patrick W.; Bridges, Patrick G.; Mondragon, Oscar H.

Abstract not provided.

TYPE Presentation YEAR 2016

OSTI DOI

VM-based slack emulation of large-scale systems

Proceedings of the 1st International Workshop on Runtime and Operating Systems for Supercomputers, ROSS 2011

Bridges, Patrick G.; Arnold, Dorian; Pedretti, Kevin P.

This paper describes the design of a system to enable large-scale testing of new software stacks and prospective high-end computing architectures. The proposed architecture combines system virtualization, time dilation, architectural simulation, and slack simulation to provide scalable emulation of hypothetical systems. We also describe virtualization-based full-system measurement and monitoring tools to aid in using the proposed system for co-design of high-performance computing system software and architectural features for future systems. Finally, we provide a description of the implementation strategy and status of the proposed system. © 2011 ACM.

TYPE Conference YEAR 2011

Scopus OSTI
