Publications

Failing in place for low-serviceability storage infrastructure using high-parity GPU-based RAID

Ward, Harry L.

To provide large quantities of high-reliability disk-based storage, it has become necessary to aggregate disks into fault-tolerant groups based on the RAID methodology. Most RAID levels provide some fault tolerance, but certain classes of applications require increased fault tolerance within an array. These include embedded systems in harsh environments with a low level of serviceability, and uninhabited data centers servicing cloud computing. Mean Time To Data Loss (MTTDL) calculations for RAID reliability often assume that the time to replace a failed disk is short, or even negligible compared to the rebuild time. For platforms collecting and processing data in remote areas, it may be impossible to access the system for maintenance for long periods. A disk may fail early in a platform's life yet not be replaceable for far longer than is typical for RAID arrays. Service periods may be scheduled at intervals on the order of months, or the platform may not be serviced until the end of a mission in progress. Further, such a platform may be subject to extreme conditions that accelerate wear and tear on a disk, requiring even more protection from failures. We have created a high-parity RAID implementation that uses a Graphics Processing Unit (GPU) to compute more than two blocks of parity information per stripe, allowing the extra parity to eliminate or reduce the requirement for rebuilding data between service periods. While this type of controller is highly effective for RAID 6 systems, an important benefit is the ability to incorporate more parity into a RAID storage system. Such RAID levels, as yet unnamed, can tolerate the failure of three or more disks (depending on configuration) without data loss. While this RAID system certainly has applications in embedded systems deployed in the field, similar benefits can be obtained for servers engineered for storage density, with less regard for serviceability or maintainability. A storage brick can be designed with an MTTDL that extends well beyond the useful lifetime of the hardware, allowing the disk subsystem to require less service throughout the lifetime of a compute resource. This approach is similar to the Xiotech ISE. Such a design can be deliberately placed in a remote location (without frequent support) to provide colocation or meet cost goals. For workloads where reliability is key but conditions are sub-optimal for routine serviceability, a high-parity RAID can provide extra reliability in extraordinary situations. For example, in installations where the Mean Time To Repair is necessarily very high, the extra parity can eliminate certain problems with maintaining hot spares, increasing overall reliability. Furthermore, where disk reliability is reduced by harsh conditions, extra parity can guard against early data loss due to a lowered Mean Time To Failure. If used through an iSCSI interface with a streaming workload, it is possible to gain all of these benefits without impacting performance.
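
To make the multi-parity idea concrete, the following is a minimal CPU-side sketch in Python of Reed-Solomon-style encoding over GF(2^8) that produces more than two parity blocks per stripe. It is not the authors' GPU implementation: a real controller would run the per-byte loops as a massively parallel GPU kernel and would use a carefully constructed (e.g., Cauchy) coding matrix, and the six-disk, three-parity example at the end is purely illustrative.

    # Sketch (not the paper's implementation) of computing several parity
    # blocks per stripe with Reed-Solomon-style arithmetic over GF(2^8).

    # Build GF(2^8) antilog/log tables using the primitive polynomial 0x11d.
    EXP = [0] * 512
    LOG = [0] * 256
    x = 1
    for i in range(255):
        EXP[i] = x
        LOG[x] = i
        x <<= 1
        if x & 0x100:
            x ^= 0x11D
    for i in range(255, 512):
        EXP[i] = EXP[i - 255]

    def gf_mul(a, b):
        """Multiply two field elements via the log/antilog tables."""
        if a == 0 or b == 0:
            return 0
        return EXP[LOG[a] + LOG[b]]

    def encode_stripe(data_blocks, num_parity):
        """Return num_parity parity blocks for one stripe of equal-sized data blocks."""
        block_len = len(data_blocks[0])
        parity = [bytearray(block_len) for _ in range(num_parity)]
        for j in range(num_parity):                  # one coefficient row per parity block
            for i, block in enumerate(data_blocks):  # one coefficient per data disk
                coef = EXP[(i * j) % 255]            # Vandermonde-style coefficient alpha^(i*j)
                for k in range(block_len):
                    parity[j][k] ^= gf_mul(coef, block[k])
        return parity

    # Illustrative example: a 6-data-disk stripe protected by 3 parity blocks.
    stripe = [bytes([d] * 16) for d in range(1, 7)]
    p = encode_stripe(stripe, 3)

Note that the j = 0 row degenerates to plain XOR (the familiar RAID 5/6 "P" parity); the additional rows are what allow the array to survive three or more concurrent disk failures, at the cost of extra Galois-field arithmetic that the GPU is well suited to absorb.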

A brief parallel I/O tutorial

Ward, Harry L.

This document provides common best practices for the efficient utilization of parallel file systems for analysts and application developers. A multi-program, parallel supercomputer provides effective compute power by aggregating a host of lower-power processors using a network. The general idea is that one either constructs the application to distribute parts to the different nodes and processors available and then collects the results (a parallel application), or one launches a large number of small jobs, each doing similar work on different subsets (a campaign). The I/O system on these machines is usually implemented as a tightly-coupled parallel application itself. It provides the concept of a 'file' to the host applications: an addressable store of bytes whose address space is global in nature. In essence, it provides a global address space. Beyond the simple reality that the I/O system is normally composed of a smaller, less capable collection of hardware, that global address space will cause problems if not carefully utilized. How much of a problem this is, and the ways in which those problems manifest, will differ, but that it is problem prone has been well established. Worse, the file system is a shared resource on the machine, a system service. What an application does when it uses the file system impacts all users. No portion of the available resource is reserved for a given job; instead, the I/O system responds to requests by scheduling and queuing them based on instantaneous demand. Using the system well contributes to the overall throughput of the machine. From a solely self-centered perspective, using it well reduces the time during which the application or campaign is exposed to interference from others. The developer's goal should be to accomplish I/O in a way that minimizes interaction with the I/O system, maximizes the amount of data moved per call, and provides the I/O system the most information about the I/O transfer per request.
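
As a generic illustration of that final guideline (not an example from the tutorial itself), the sketch below contrasts many small write requests with one aggregated write. File names, record sizes, and counts are made up; in a real parallel application the same aggregation is typically achieved with collective MPI-IO calls or a higher-level I/O library rather than in-memory buffering.

    import io
    import os

    # Hypothetical output: 1024 records of 4 KiB each (about 4 MiB total).
    records = [os.urandom(4096) for _ in range(1024)]

    # Anti-pattern: unbuffered file, so every small record becomes its own
    # request to the (shared) I/O system.
    with open("results_small_writes.dat", "wb", buffering=0) as f:
        for rec in records:
            f.write(rec)

    # Better: aggregate in memory, then issue one large, contiguous write
    # that tells the I/O system the full transfer size up front.
    buf = io.BytesIO()
    for rec in records:
        buf.write(rec)
    with open("results_one_write.dat", "wb") as f:
        f.write(buf.getvalue())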

Implementing scalable disk-less clusters using the Network File System (NFS)

Laros, James H.; Ward, Harry L.

This paper describes a methodology for implementing disk-less cluster systems using the Network File System (NFS) that scales to thousands of nodes. This method has been successfully deployed and is currently in use on several production systems at Sandia National Labs. The paper outlines our methodology and implementation, discusses hardware and software considerations in detail, and presents cluster configurations with performance numbers for various management operations such as booting.
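
For readers unfamiliar with the general pattern, the small Python sketch below prints the kind of configuration a disk-less NFS cluster commonly involves: a shared, read-only root export and kernel arguments that make compute nodes mount that root over NFS at boot. It is not taken from the paper; the server name, export path, subnet, and export options are assumptions chosen only to illustrate the idea of one export serving thousands of nodes.

    # Hypothetical disk-less NFS cluster configuration generator (illustrative only).
    NFS_SERVER = "admin1"                    # assumed management/NFS server name
    ROOT_EXPORT = "/exports/compute-root"    # assumed shared read-only root image

    def exports_entry():
        """One /etc/exports line sharing a read-only root with every compute node.

        A subnet wildcard avoids one export line per node, which matters when
        the cluster has thousands of nodes.
        """
        return f"{ROOT_EXPORT} 10.1.0.0/16(ro,no_root_squash,no_subtree_check)"

    def kernel_append():
        """Kernel arguments a network boot loader would pass so nodes mount root over NFS."""
        return f"root=/dev/nfs nfsroot={NFS_SERVER}:{ROOT_EXPORT},ro ip=dhcp"

    if __name__ == "__main__":
        print(exports_entry())
        print(kernel_append())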
