GPU erasure coding for campaign storage
High-performance computing (HPC) demands high I/O bandwidth and low I/O latency, which has driven the development of storage systems and I/O software components that strive for ever greater performance. However, capital and energy budgets, along with growing storage capacity requirements, have motivated the search for lower-cost, large-capacity storage systems for HPC. With burst buffer technology increasing the bandwidth and reducing the latency of I/O between the compute and storage systems, the bandwidth and latency requirements on back-end storage can be relaxed, especially underneath an adequately sized modern parallel file system. Cloud computing has led to the development of large, low-cost storage solutions whose design focuses on high capacity, high availability, and low energy consumption at the lowest cost. Cloud storage systems use replication and erasure coding to provide high availability at much lower cost than traditional HPC storage systems. Adopting selected cloud storage infrastructure and concepts in HPC could therefore provide cost-effective performance for certain storage tiers. To enable the use of cloud storage technologies for HPC, we study an architecture that places cloud storage between the HPC parallel file system and the archive storage. In this paper, we compare two erasure coding implementations for the Ceph storage system. We report measurements for various degrees of sharding that are relevant to HPC applications. We show that the Gibraltar GPU erasure coding library outperforms a CPU implementation of an erasure coding plugin for the Ceph object storage system, opening up new ways to architect storage systems based on Ceph.