Publications / Conference

Experience in implementing a parallel file system

Wheat, S.R.

With ever-increasing processor and memory speeds, new methods to overcome the "I/O bottleneck" need to be found. This is especially true for massively parallel computers, which must store and retrieve large amounts of data quickly and reliably to fully utilize the available processing power. We have designed and implemented a parallel file system that distributes the work of transferring data to and from mass storage across several I/O nodes and communication channels. The prototype parallel file system makes use of the existing single-threaded file system of the Sandia/University of New Mexico Operating System (SUNMOS). SUNMOS is a joint project between Sandia National Laboratories and the University of New Mexico to create a small and efficient OS for Massively Parallel (MP) Multiple Instruction, Multiple Data (MIMD) machines. We chose file striping to interleave files across sixteen disks. By using source routing of messages, we were able to increase throughput beyond the maximum single-channel bandwidth that the default routing algorithm of the nCUBE 2 hypercube allows. We describe our implementation, the results of our experiments, and the influence this work has had on the design of the Performance-oriented, User-managed, Messaging Architecture (PUMA) operating system, the successor to SUNMOS.
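
To illustrate the file striping mentioned in the abstract, the following is a minimal sketch of how a logical file offset could be mapped onto sixteen disks. It is not the SUNMOS implementation: the round-robin layout, the 64 KiB stripe unit, and the function names `locate` and `split_request` are assumptions made only for this example, since the abstract does not state them.

```python
NUM_DISKS = 16            # from the abstract: files are interleaved across sixteen disks
STRIPE_UNIT = 64 * 1024   # ASSUMPTION: stripe unit in bytes; the abstract gives no value


def locate(offset, num_disks=NUM_DISKS, stripe_unit=STRIPE_UNIT):
    """Map a logical file offset to (disk index, byte offset on that disk).

    Assumes a simple round-robin layout: stripe unit 0 goes to disk 0,
    unit 1 to disk 1, and so on, wrapping around after the last disk.
    """
    if offset < 0:
        raise ValueError("offset must be non-negative")
    stripe_index = offset // stripe_unit       # which stripe unit of the file
    disk = stripe_index % num_disks            # disk that holds this unit
    units_before = stripe_index // num_disks   # earlier units already on that disk
    return disk, units_before * stripe_unit + offset % stripe_unit


def split_request(offset, length, num_disks=NUM_DISKS, stripe_unit=STRIPE_UNIT):
    """Break a contiguous request into (disk, disk_offset, nbytes) pieces."""
    pieces = []
    end = offset + length
    while offset < end:
        disk, disk_offset = locate(offset, num_disks, stripe_unit)
        room = stripe_unit - offset % stripe_unit  # bytes left in this stripe unit
        nbytes = min(room, end - offset)
        pieces.append((disk, disk_offset, nbytes))
        offset += nbytes
    return pieces
```

Under these assumptions, a large sequential request is split into pieces that land on different disks, so the pieces can be serviced by separate I/O nodes at the same time; this is the property that lets striping spread transfer work across several disks and channels.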