Challenges for high-performance networking for exascale computing

Brightwell, Ronald B.; Barrett, Brian; Hemmert, Karl S.

Achieving the next three orders of magnitude performance increase to move from petascale to exascale computing will require significant advancements in several fundamental areas. Recent studies have outlined many of the hardware and software challenges that will need to be addressed. In this paper, we examine these challenges with respect to high-performance networking. We describe the repercussions of anticipated changes to computing and networking hardware and discuss the impact that alternative parallel programming models will have on the network software stack. We also present some ideas on possible approaches that address some of these challenges.

TYPE Conference YEAR 2010

Exploiting Direct Access Shared Memory for MPI on Multi-Core Processors

International Journal of High-Performance Computing Applications

Brightwell, Ronald B.

Abstract not provided.

TYPE Journal Article YEAR 2010

rMPI : increasing fault resiliency in a message-passing environment

Ferreira, Kurt; Riesen, Rolf; Oldfield, Ron; Laros, James H.; Pedretti, Kevin P.; Stearley, Jon S.; Brightwell, Ronald B.

Abstract not provided.

TYPE Conference YEAR 2010

Transparent redundant computing with MPI

Brightwell, Ronald B.; Ferreira, Kurt

Extreme-scale parallel systems will require alternative methods for applications to maintain current levels of uninterrupted execution. Redundant computation is one approach to consider, if the benefits of increased resiliency outweigh the cost of consuming additional resources. We describe a transparent redundancy approach for MPI applications and detail two different implementations that provide the ability to tolerate a range of failure scenarios, including loss of application processes and connectivity. We compare these two approaches and show performance results from micro-benchmarks that bound worst-case message passing performance degradation. We propose several enhancements that could lower the overhead of providing resiliency through redundancy.

TYPE Conference YEAR 2010

On the path to exascale

International Journal of Distributed Systems and Technologies

Alvin, Kenneth F.; Barrett, Brian; Brightwell, Ronald B.; Dosanjh, Sudip S.; Geist, Al; Hemmert, Karl S.; Heroux, Michael; Kothe, Doug; Murphy, Richard C.; Nichols, Jeff; Oldfield, Ron; Rodrigues, Arun; Vetter, Jeffrey S.

There is considerable interest in achieving a 1000-fold increase in supercomputing power in the next decade, but the challenges are formidable. In this paper, the authors discuss some of the driving science and security applications that require Exascale computing (a million trillion operations per second). Key architectural challenges include power, memory, interconnection networks and resilience. The paper summarizes ongoing research aimed at overcoming these hurdles. Topics of interest are architecture-aware and scalable algorithms, system simulation, 3D integration, new approaches to system-directed resilience and new benchmarks. Although significant progress is being made, a broader international program is needed.

TYPE Journal Article YEAR 2010

A lightweight, GPU-based software RAID system

Brightwell, Ronald B.; Ward, Harry L.

Abstract not provided.

TYPE Conference YEAR 2010

System Software Research for Extreme-Scale Computing

Oldfield, Ron; Brightwell, Ronald B.; Pedretti, Kevin; Riesen, Rolf; Ferreira, Kurt; Kelly, Suzanne M.; Laros, James H.

Abstract not provided.

TYPE Presentation YEAR 2010

Increasing fault resiliency in a message-passing environment

Ferreira, Kurt; Oldfield, Ron; Stearley, Jon S.; Laros, James H.; Pedretti, Kevin T.T.; Brightwell, Ronald B.

Petaflops systems will have tens to hundreds of thousands of compute nodes, which increases the likelihood of faults. Applications use checkpoint/restart to recover from these faults, but even under ideal conditions, applications running on more than 30,000 nodes will likely spend more than half of their total run time saving checkpoints, restarting, and redoing work that was lost. We created a library that performs redundant computations on additional nodes allocated to the application. An active node and its redundant partner form a node bundle, which will only fail, and cause an application restart, when both nodes in the bundle fail. The goal of this library is to learn whether this can be done entirely at the user level, what requirements this library places on a Reliability, Availability, and Serviceability (RAS) system, and what its impact on performance and run time is. We find that our redundant MPI layer library imposes a relatively modest performance penalty for applications, but that it greatly reduces the number of application interrupts. This reduction in interrupts leads to huge savings in restart and rework time. For large-scale applications, the savings compensate for the performance loss and the additional nodes required for redundant computations.

TYPE SAND Report YEAR 2009

Palacios and Kitten : high performance operating systems for scalable virtualized and native supercomputing

Pedretti, Kevin T.T.; Levenhagen, Michael; Brightwell, Ronald B.

Palacios and Kitten are new open-source tools that enable applications, whether ported or not, to achieve scalable high performance on large machines. They provide a thin layer over the hardware to support both full-featured virtualized environments and native code bases. Kitten is an OS under development at Sandia that implements a lightweight kernel architecture to provide predictable behavior and increased flexibility on large machines, while also providing Linux binary compatibility. Palacios is a VMM that is under development at Northwestern University and the University of New Mexico. Palacios, which can be embedded into Kitten and other OSes, supports existing, unmodified applications and operating systems by using virtualization that leverages hardware technologies. We describe the design and implementation of both Kitten and Palacios. Our benchmarks show that they provide near-native, scalable performance. Palacios and Kitten provide an incremental path to using supercomputer resources that is not performance-compromised.

TYPE SAND Report YEAR 2009

HPC application fault-tolerance using transparent redundant computation

Ferreira, Kurt; Riesen, Rolf; Oldfield, Ron; Brightwell, Ronald B.; Laros, James H.; Pedretti, Kevin P.

As the core count of HPC machines continues to grow, issues such as fault tolerance and reliability are becoming limiting factors for application scalability. Current techniques to ensure progress across faults, for example coordinated checkpoint-restart, are unsuitable for machines of this scale due to their predicted high overheads. In this study, we present the design and implementation of a novel system for ensuring reliability which uses transparent, rank-level, redundant computation. Using this system, we show the overheads involved in redundant computation for a number of real-world HPC applications. Additionally, we relate the communication characteristics of an application to the overheads observed.

TYPE Conference YEAR 2009

Catamount Lightweight Kernel

Brightwell, Ronald B.

Abstract not provided.

TYPE Presentation YEAR 2009

Catamount N-Way Performance on XT5

Brightwell, Ronald B.; Kelly, Suzanne M.

Abstract not provided.

TYPE Conference YEAR 2009

Parallel phase model : a programming model for high-end parallel machines with manycores

Brightwell, Ronald B.; Heroux, Michael A.; Wen, Zhaofang

This paper presents a parallel programming model, Parallel Phase Model (PPM), for next-generation high-end parallel machines based on a distributed memory architecture consisting of a networked cluster of nodes with a large number of cores on each node. PPM has a unified high-level programming abstraction that facilitates the design and implementation of parallel algorithms to exploit both the parallelism of the many cores and the parallelism at the cluster level. The programming abstraction will be suitable for expressing both fine-grained and coarse-grained parallelism. It includes a few high-level parallel programming language constructs that can be added as an extension to an existing (sequential or parallel) programming language such as C. The implementation of PPM also includes a lightweight runtime library that runs on top of an existing network communication software layer (e.g., MPI). The design philosophy of PPM and details of the programming abstraction are also presented. Several unstructured applications that inherently require high-volume random fine-grained data accesses have been implemented in PPM with very promising results.

TYPE SAND Report YEAR 2009

Optimizing MPI Collectives with SMARTMAP

Proposed for publication in the ACM SIGOPS Operating System Review.

Brightwell, Ronald B.; Pedretti, Kevin P.

Abstract not provided.

TYPE Journal Article YEAR 2008

The influence of system balance on OS noise sensitivity

Brightwell, Ronald B.; Pedretti, Kevin P.

Abstract not provided.

TYPE Conference YEAR 2008

Exploring Memory Management Strategies in Catamount

Ferreira, Kurt; Pedretti, Kevin; Brightwell, Ronald B.

Abstract not provided.

TYPE Conference YEAR 2008

Exploring Memory Management Strategies in Catamount

Ferreira, Kurt; Pedretti, Kevin; Levenhagen, Michael; Brightwell, Ronald B.

Abstract not provided.

TYPE Conference YEAR 2008

Application and Operating System Software Challenges in the Multi-core Era

Brightwell, Ronald B.

Abstract not provided.

TYPE Presentation YEAR 2008

Characterizing Application Sensitivity to OS Interference Using Kernel-Level Noise Injection

Ferreira, Kurt; Brightwell, Ronald B.

Abstract not provided.

TYPE Conference YEAR 2008

A Prototype Implementation of MPI for SMARTMAP

Brightwell, Ronald B.

Abstract not provided.

TYPE Conference YEAR 2008

Lightweight Kernel Support for Direct Shared Memory Access on a Multi-Core Processor

Brightwell, Ronald B.

Abstract not provided.

TYPE Conference YEAR 2008

High Message Rate NIC-Based Atomics: Design and Performance Considerations

Levenhagen, Michael; Hemmert, Karl S.; Brightwell, Ronald B.

Abstract not provided.

TYPE Conference YEAR 2008

Designing and Implementing Lightweight Kernels for Capability Computing

Concurrency and Computation: Practice and Experience

Riesen, Rolf; Brightwell, Ronald B.; Ferreira, Kurt

Abstract not provided.

TYPE Journal Article YEAR 2008

Instrumentation and Analysis of MPI Queue Times on the SeaStar High-Performance Network

Brightwell, Ronald B.; Pedretti, Kevin; Ferreira, Kurt

Abstract not provided.

TYPE Conference YEAR 2008

Evaluating NIC hardware requirements to achieve high message rate PGAS support on multi-core processors

Proceedings of the 2007 ACM/IEEE Conference on Supercomputing, SC'07

Underwood, Keith D.; Levenhagen, Michael; Brightwell, Ronald B.

Partitioned global address space (PGAS) programming models have been identified as one of the few viable approaches for dealing with emerging many-core systems. These models tend to generate many small messages, which requires specific support from the network interface hardware to enable efficient execution. In the past, Cray included E-registers on the Cray T3E to support the SHMEM API; however, with the advent of multi-core processors, the balance of computation to communication capabilities has shifted toward computation. This paper explores the message rates that are achievable with multi-core processors and simplified PGAS support on a more conventional network interface. For message rate tests, we find that simple network interface hardware is more than sufficient. We also find that even typical data distributions, such as cyclic or block-cyclic, do not need specialized hardware support. Finally, we assess the impact of such support on the well-known RandomAccess benchmark. © 2007 ACM.

TYPE Conference YEAR 2007