Publications Search

Embracing Diversity: OS Support for Integrating High-Performance Computing and Data Analytics

Abstract not provided.

TYPE Presentation YEAR 2016

OSTI

Software Requirements for ATDM On-Node Resource Management

Olivier, Stephen L.; Foulk, James W.; Brightwell, Ronald B.

This report outlines the software requirements for on-node resource management in the Advanced Simulation and Computing (ASC) Advanced Technology Development and Mitigation (ATDM) project at Sandia National Laboratories (SNL). The need for on-node resource management has arisen from the componentization of the software stack. Componentization aids in managing complexity and making software more composable and reusable. However, components must compete for limited on-node resources for execution (e.g., cores and hardware threads) and memory. The requirements documented in this report support an effort to manage this contention, avoiding oversubscription of resources and enabling their efficient deployment for application execution.

TYPE SAND Report YEAR 2016

DOI OSTI

Qthreads: Run Time Library Support for Task Parallel Programming

Brightwell, Ronald B.; Olivier, Stephen L.

Abstract not provided.

TYPE Presentation YEAR 2016

OSTI

Hobbes: A Multi‐Stack Approach for Application Composition and Performance Isolation

Foulk, James W.; Brightwell, Ronald B.; Mukherjee, Shyamali; Evans, Noah; Kocoloski, Brian; Ouyang, Jiannan; Dinda, Peter; Hale, Kyle; Bridges, Patrick; Mondragon, Oscar; Lang, Michael

Abstract not provided.

TYPE Presentation YEAR 2016

OSTI

XPRESS: eXascale PRogramming Environment and System Software

Brightwell, Ronald B.

Abstract not provided.

TYPE Presentation YEAR 2016

OSTI

OS/Runtime Abstractions and Interfaces for Managing the Memory Hierarchy

Brightwell, Ronald B.

Abstract not provided.

TYPE Conference Poster YEAR 2016

OSTI

RMA-MT: A Benchmark Suite for Assessing MPI Multi-threaded RMA Performance

Dosanjh, Matthew G.; Groves, Taylor L.; Grant, Ryan; Brightwell, Ronald B.; Bridges, Patrick G.

Abstract not provided.

TYPE Conference Poster YEAR 2016

DOI OSTI

Practical Resilient Cases for FA-MPI, a Transactional Fault-Tolerant MPI

Brightwell, Ronald B.; Hassani, Amin; Skjellum, Anthony; Bangalore, Purushotham

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

XPRESS: eXascale Programming Environment and System Software

Brightwell, Ronald B.

Abstract not provided.

TYPE Presentation YEAR 2015

OSTI

Qthreads and Thoughts on ULT Standardization

Brightwell, Ronald B.; Olivier, Stephen L.

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

Interconnect Software/Hardware Co-Design for Extreme-Scale Systems

Brightwell, Ronald B.

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

Hardware Support for OS/Runtime and Interconnect

Brightwell, Ronald B.

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

HPX Applications and Performance Adaptation

Koniges, Alice; Candadai, Jayashree A.; Kaiser, Hartmut; Huck, Kevin; Kemp, Jeremy; Heller, Thomas; Anderson, Matthew; Lumsdaine, Andrew; Serio, Adrian; Wolf, Michael; Lelbach, Bryce; Brightwell, Ronald B.; Sterling, Thomas

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

Preparing for Exascale: Modeling MPI for Many-Core Systems using Fine-Grain Queues

Bridges, Patrick G.; Dosanjh, Matthew G.; Grant, Ryan; Farmer, Shane; Skjellum, Anthony; Brightwell, Ronald B.

Abstract not provided.

TYPE Conference Poster YEAR 2015

DOI OSTI

Re-evaluating Network Onload vs. Offload for the Many-Core Era

Dosanjh, Matthew G.; Grant, Ryan; Bridges, Patrick G.; Brightwell, Ronald B.

Abstract not provided.

TYPE Conference Poster YEAR 2015

DOI OSTI

Re-evaluating Network Onload vs. Offload for the Many-Core Era

Dosanjh, Matthew G.; Grant, Ryan; Bridges, Patrick; Brightwell, Ronald B.

Abstract not provided.

TYPE Conference Poster YEAR 2015

DOI OSTI

Panel: What is a Lightweight Kernel?

Riesen, Rolf; Maccabe, Barney; Gerofi, Balazs; Lombard, David; Lange, John; Foulk, James W.; Ferreira, Kurt; Lang, Mike; Keppel, Pardo; Wisniewski, Robert; Brightwell, Ronald B.; Inglett, Todd; Park, Yoonho; Ishikawa, Yutaka

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

Asynchronous Many Task Runtime System Working Group

Brightwell, Ronald B.; Clay, Robert L.

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

Re-evaluating Network Onload vs. Offload for the Many-Core Era

Dosanjh, Matthew G.; Grant, Ryan; Bridges, Patrick G.; Brightwell, Ronald B.

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

Re-evaluating Network Onload vs. Offload for the Many-Core Era

Dosanjh, Matthew G.; Grant, Ryan; Bridges, Patrick G.; Brightwell, Ronald B.

Abstract not provided.

TYPE Conference Poster YEAR 2014

OSTI

Hobbes Extreme Scale OS

Brightwell, Ronald B.

Abstract not provided.

TYPE Presentation YEAR 2014

OSTI

Metrics for evaluating energy saving techniques for resilient HPC systems

Proceedings - IEEE 28th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2014

Grant, Ryan; Olivier, Stephen L.; Laros, James H.; Brightwell, Ronald B.; Porterfield, Allan K.

The metrics used for evaluating energy saving techniques for future HPC systems are critical to the correct assessment of proposed methods. Current predictions forecast that overcoming reduced system reliability, increased power requirements and energy consumption will be a major design challenge for future systems. Modern runtime energy-saving research efforts do not take into account the energy spent providing reliability. They also do not account for the increase in the probability of failure during application execution due to runtime overhead from energy saving methods. While this is very reasonable for current systems, it is insufficient for future generation systems. By taking into account the energy consumption ramifications of increased runtimes on system reliability, better energy saving techniques can be developed. This paper demonstrates how to determine the impact of runtime energy conservation methods within the context of failure-prone large scale systems. In addition, a survey of several energy savings methodologies is conducted and an analysis is performed with respect to their effectiveness in an environment in which failures occur.

TYPE Conference YEAR 2014

Scopus OSTI DOI

An evaluation of MPI message rate on hybrid-core processors

International Journal of High Performance Computing Applications

Brightwell, Ronald B.; Barrett, Brian W.; Grant, Ryan; Hammond, Simon; Hemmert, Karl S.

Power and energy concerns are motivating chip manufacturers to consider future hybrid-core processor designs that may combine a small number of traditional cores optimized for single-thread performance with a large number of simpler cores optimized for throughput performance. This trend is likely to impact the way in which compute resources for network protocol processing functions are allocated and managed. In particular, the performance of MPI match processing is critical to achieving high message throughput. In this paper, we analyze the ability of simple and more complex cores to perform MPI matching operations for various scenarios in order to gain insight into how MPI implementations for future hybrid-core processors should be designed.

TYPE Journal Article YEAR 2014

DOI OSTI Scopus

The Portals 4.0.2 Networking Programming Interface

Barrett, Brian W.; Brightwell, Ronald B.; Grant, Ryan; Hemmert, Karl S.; Foulk, James W.; Wheeler, Kyle B.; Underwood, Keith D.; Riesen, Rolf; Maccabe, Arthur B.; Hudson, Trammell

This report presents a specification for the Portals 4 network programming interface. Portals 4 is intended to allow scalable, high-performance network communication between nodes of a parallel computing system. Portals 4 is well suited to massively parallel processing and embedded systems. Portals 4 represents an adaption of the data movement layer developed for massively parallel processing platforms, such as the 4500-node Intel TeraFLOPS machine. Sandia's Cplant cluster project motivated the development of Version 3.0, which was later extended to Version 3.3 as part of the Cray Red Storm machine and XT line. Version 4 is targeted to the next generation of machines employing advanced network interface architectures that support enhanced offload capabilities.

TYPE SAND Report YEAR 2014

DOI OSTI

Re-evaluating Network Onload vs. Offload for the Many-Core Era

Dosanjh, Matthew G.; Grant, Ryan; Bridges, Patrick; Brightwell, Ronald B.

Abstract not provided.

TYPE Conference Poster YEAR 2014

DOI OSTI

Comparing, Contrasting, Generalizing, and Integrating Two Current Designs for Fault-Tolerant MPI

Brightwell, Ronald B.; Hassani, Amin; Skjellum, Anthony

Abstract not provided.

TYPE Conference Poster YEAR 2014

OSTI

Hobbes: Using Virtualization to Enable Exascale Applications

Brightwell, Ronald B.

Abstract not provided.

TYPE Presentation YEAR 2014

OSTI

HCW'14 Panel: Is the Amount of Heterogeneity Increasing in Future Systems?

Brightwell, Ronald B.

Abstract not provided.

TYPE Conference YEAR 2014

OSTI

Metrics for Evaluating Energy Saving Techniques for Resilient HPC Systems

Grant, Ryan; Olivier, Stephen L.; Laros, James H.; Brightwell, Ronald B.

Abstract not provided.

TYPE Conference YEAR 2014

OSTI

Hobbes: OS and Runtime Support for Application Composition

Brightwell, Ronald B.

Abstract not provided.

TYPE Presentation YEAR 2014

OSTI

Portals as a Case Study for Software/Hardware Co-Design

Brightwell, Ronald B.

Abstract not provided.

TYPE Conference YEAR 2014

OSTI

Asking the right questions: Benchmarking fault-tolerant extreme-scale systems

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Widener, Patrick; Ferreira, Kurt; Levy, Scott; Bridges, Patrick G.; Arnold, Dorian; Brightwell, Ronald B.

Much recent research has explored fault-tolerance mechanisms intended for current and future extreme-scale systems. Evaluations of the suitability of checkpoint-based solutions have typically been carried out using relatively uncomplicated computational kernels designed to measure floating point performance. More recent investigations have added scaled-down "proxy" applications to more closely match the composition and behavior of deployed ones. However, the information obtained from these studies (whether floating point performance or application runtime) is not necessarily of the most value in evaluating resilience strategies. We observe that even when using a more sophisticated metric, the information available from evaluating uncoordinated checkpointing using both microbenchmarks and proxy applications does not agree. This implies that not only might researchers be asking the wrong questions, but that the answers to the right ones might be unexpected and potentially misleading. We seek to open a discussion on whether benchmarks designed to provide predictable performance evaluations of HPC hardware and toolchains are providing the right feedback for the evaluation of fault-tolerance in these applications, and more generally on how benchmarking of resilience mechanisms ought to be approached in the exascale design space. © 2014 Springer-Verlag Berlin Heidelberg.

TYPE Conference YEAR 2014

Scopus OSTI

A Transactional Model for Fault-Tolerant MPI for Petascale and Exascale Systems

Brightwell, Ronald B.

Abstract not provided.

TYPE Presentation YEAR 2013

OSTI

Design and Evaluation of FA-MPI, a Transactional Fault-Tolerant MPI

Brightwell, Ronald B.

Abstract not provided.

TYPE Conference YEAR 2013

OSTI

Panel: How Do We Protect the HPC Software Investments in the Future

Brightwell, Ronald B.

Abstract not provided.

TYPE Conference YEAR 2013

OSTI

Metrics for Evaluating Energy Saving Techniques for Resilient HPC Systems

Grant, Ryan; Brightwell, Ronald B.

Abstract not provided.

TYPE Conference YEAR 2013

OSTI

A Holistic Approach to Modeling and Simulation for Resilience and Power Configuration

Ferreira, Kurt; Levy, Scott L.N.; Brightwell, Ronald B.

Abstract not provided.

TYPE Conference YEAR 2013

OSTI

Hobbes: Composition and Virtualization as the Foundations of an Extreme-Scale OS/R

Brightwell, Ronald B.

Abstract not provided.

TYPE Conference YEAR 2013

OSTI

A System Software Approach for Unifying Simulation and Analysis at Extreme-Scale

Brightwell, Ronald B.

Abstract not provided.

TYPE Presentation YEAR 2013

OSTI

Protocols for Fully Offloaded Collective Operations on Accelerated Network Adapters

Grant, Ryan; Barrett, Brian; Brightwell, Ronald B.

Abstract not provided.

TYPE Conference YEAR 2013

OSTI

The Portals 4.0.1 network programming interface

Barrett, Brian; Brightwell, Ronald B.; Pedretti, Kevin; Hemmert, Karl S.

This report presents a specification for the Portals 4.0 network programming interface. Portals 4.0 is intended to allow scalable, high-performance network communication between nodes of a parallel computing system. Portals 4.0 is well suited to massively parallel processing and embedded systems. Portals 4.0 represents an adaption of the data movement layer developed for massively parallel processing platforms, such as the 4500-node Intel TeraFLOPS machine. Sandia's Cplant cluster project motivated the development of Version 3.0, which was later extended to Version 3.3 as part of the Cray Red Storm machine and XT line. Version 4.0 is targeted to the next generation of machines employing advanced network interface architectures that support enhanced offload capabilities.

TYPE SAND Report YEAR 2013

DOI OSTI