Publications

Results 1–25 of 45
Skip to search filters

A highly reliable RAID system based on GPUs

Curry, Matthew L.

While RAID is the prevailing method of creating reliable secondary storage infrastructure, many users desire more flexibility than offered by current implementations. To attain needed performance, customers have often sought after hardware-based RAID solutions. This talk describes a RAID system that offloads erasure correction coding calculations to GPUs, allowing increased reliability by supporting new RAID levels while maintaining high performance.

More Details

CephFS experiments on stria.sandia.gov

Widener, Patrick W.; Curry, Matthew L.

This report is an institutional record of experiments conducted to explore performance of a vendor installation of CephFS on the SNL stria cluster. Comparisons between CephFS, the Lustre parallel file system, and NFS were done using the IOR and MDTEST benchmarking tools, a test program which uses the SEACAS/Trilinos IOSS library, and the checkpointing activity performed by the LAMMPS molecular dynamics simulation.

More Details

Demonstration of a Legacy Application's Path to Exascale - ASC L2 Milestone 4467

Barrett, Brian B.; Kelly, Suzanne M.; Klundt, Ruth A.; Laros, James H.; Leung, Vitus J.; Levenhagen, Michael J.; Lofstead, Gerald F.; Moreland, Kenneth D.; Oldfield, Ron A.; Pedretti, Kevin P.; Rodrigues, Arun; Barrett, Richard F.; Ward, Harry L.; Vandyke, John P.; Vaughan, Courtenay T.; Wheeler, Kyle B.; Brandt, James M.; Brightwell, Ronald B.; Curry, Matthew L.; Fabian, Nathan D.; Ferreira, Kurt; Gentile, Ann C.; Hemmert, Karl S.

Abstract not provided.

FY18 L2 Milestone #6360 Report: Initial Capability of an Arm-based Advanced Architecture Prototype System and Software Environment

Laros, James H.; Pedretti, Kevin P.; Hammond, Simon D.; Aguilar, Michael J.; Curry, Matthew L.; Grant, Ryan E.; Hoekstra, Robert J.; Klundt, Ruth A.; Monk, Stephen T.; Ogden, Jeffry B.; Olivier, Stephen L.; Scott, Randall D.; Ward, Harry L.; Younge, Andrew J.

The Vanguard program informally began in January 2017 with the submission of a white pa- per entitled "Sandia's Vision for a 2019 Arm Testbed" to NNSA headquarters. The program proceeded in earnest in May 2017 with an announcement by Doug Wade (Director, Office of Advanced Simulation and Computing and Institutional R&D at NNSA) that Sandia Na- tional Laboratories (Sandia) would host the first Advanced Architecture Prototype platform based on the Arm architecture. In August 2017, Sandia formed a Tri-lab team chartered to develop a robust HPC software stack for Astra to support the Vanguard program goal of demonstrating the viability of Arm in supporting ASC production computing workloads. This document describes the high-level Vanguard program goals, the Vanguard-Astra project acquisition plan and procurement up to contract placement, the initial software stack environment planned for the Vanguard-Astra platform (Astra), a description of how the communities of users will utilize the platform during the transition from the open network to the classified network, and initial performance results.

More Details

FY18 L2 Milestone #8759 Report: Vanguard Astra and ATSE ? an ARM-based Advanced Architecture Prototype System and Software Environment

Laros, James H.; Pedretti, Kevin P.; Hammond, Simon D.; Aguilar, Michael J.; Curry, Matthew L.; Grant, Ryan E.; Hoekstra, Robert J.; Klundt, Ruth A.; Monk, Stephen T.; Ogden, Jeffry B.; Olivier, Stephen L.; Scott, Randall D.; Ward, Harry L.; Younge, Andrew J.

The Vanguard program informally began in January 2017 with the submission of a white pa- per entitled "Sandia's Vision for a 2019 Arm Testbed" to NNSA headquarters. The program proceeded in earnest in May 2017 with an announcement by Doug Wade (Director, Office of Advanced Simulation and Computing and Institutional R&D at NNSA) that Sandia Na- tional Laboratories (Sandia) would host the first Advanced Architecture Prototype platform based on the Arm architecture. In August 2017, Sandia formed a Tri-lab team chartered to develop a robust HPC software stack for Astra to support the Vanguard program goal of demonstrating the viability of Arm in supporting ASC production computing workloads. This document describes the high-level Vanguard program goals, the Vanguard-Astra project acquisition plan and procurement up to contract placement, the initial software stack environment planned for the Vanguard-Astra platform (Astra), a description of how the communities of users will utilize the platform during the transition from the open network to the classified network, and initial performance results.

More Details

GPU erasure coding for campaign storage

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Haddock, Walker; Curry, Matthew L.; Bangalore, Purushotham V.; Skjellum, Anthony

High-performance computing (HPC) demands high bandwidth and low latency in I/O performance leading to the development of storage systems and I/O software components that strive to provide greater and greater performance. However, capital and energy budgets along with increasing storage capacity requirements have motivated the search for lower cost, large storage systems for HPC. With Burst Buffer technology increasing the bandwidth and reducing the latency for I/O between the compute and storage systems, the back-end storage bandwidth and latency requirements can be reduced, especially underneath an adequately sized modern parallel file system. Cloud computing has led to the development of large, low-cost storage solutions where design has focused on high capacity, availability, and low energy consumption at lowest cost. Cloud computing storage systems leverage duplicates and erasure coding technology to provide high availability at much lower cost than traditional HPC storage systems. Leveraging certain cloud storage infrastructure and concepts in HPC would be valuable economically in terms of cost-effective performance for certain storage tiers. To enable the use of cloud storage technologies for HPC we study the architecture for interfacing cloud storage between the HPC parallel file systems and the archive storage. In this paper, we report our comparison of two erasure coding implementations for the Ceph file system. We compare measurements of various degrees of sharding that are relevant for HPC applications. We show that the Gibraltar GPU Erasure coding library outperforms a CPU implementation of an erasure coding plugin for the Ceph object storage system, opening the potential for new ways to architect such storage systems based on Ceph.

More Details

High performance erasure coding for very large stripe sizes

2019 Spring Simulation Conference, SpringSim 2019

Haddock, Walker; Bangalore, Purushotham V.; Curry, Matthew L.; Skjellum, Anthony

Exascale computing demands high bandwidth and low latency I/O on the computing edge. Object storage systems can provide higher bandwidth and lower latencies than tape archive. File transfer nodes present a single point of mediation through which data moving between these storage systems must pass. By increasing the performance of erasure coding, stripes can be subdivided into large numbers of shards.This paper's contribution is a prototype nearline disk object storage system based on Ceph. We show that using general purpose graphical processing units (GPGPU) for erasure coding on file transfer nodes is effective when using a large number of shards. We describe an architecture for nearline disk archive storage for use with high performance computing (HPC) and demonstrate the performance with benchmarking results. We compare the benchmark performance of our design with the Intel®Storage Acceleration Library (ISA-L) CPU based erasure coding libraries using the native Ceph erasure coding feature.

More Details
Results 1–25 of 45
Results 1–25 of 45