Publications

Processing Particle Data Flows with SmartNICs

2022 IEEE High Performance Extreme Computing Conference, HPEC 2022

Liu, Jianshen; Maltzahn, Carlos; Curry, Matthew L.; Ulmer, Craig D.

Many distributed applications implement complex data flows and need a flexible mechanism for routing data between producers and consumers. Recent advances in programmable network interface cards, or SmartNICs, represent an opportunity to offload data-flow tasks into the network fabric, thereby freeing the hosts to perform other work. System architects in this space face multiple questions about the best way to leverage SmartNICs as processing elements in data flows. In this paper, we advocate the use of Apache Arrow as a foundation for implementing data-flow tasks on SmartNICs. We report on our experiences adapting a partitioning algorithm for particle data to Apache Arrow and measure the on-card processing performance for the BlueField-2 SmartNIC. Our experiments confirm that the BlueField-2's (de)compression hardware can have a significant impact on in-transit workflows where data must be unpacked, processed, and repacked.
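
The partitioning step mentioned in the abstract can be pictured with a short Apache Arrow sketch. The snippet below is illustrative only, not the paper's implementation: the column names, the single-coordinate bucketing rule, and the bin count are assumptions chosen for the example.

```python
# Minimal sketch (not the paper's code): partition a "particle" table
# into spatial buckets using Apache Arrow compute kernels.
import pyarrow as pa
import pyarrow.compute as pc

# Toy particle batch: an id plus one position coordinate in [0, 1).
table = pa.table({
    "id": pa.array(range(8), type=pa.int64()),
    "x":  pa.array([0.1, 0.9, 0.4, 0.7, 0.2, 0.6, 0.3, 0.8], type=pa.float64()),
})

def partition_by_x(tbl: pa.Table, n_bins: int) -> list:
    """Split the table into n_bins buckets along the assumed x coordinate."""
    # Integer bucket index for every row: floor(x * n_bins).
    bucket = pc.cast(pc.floor(pc.multiply(tbl["x"], float(n_bins))), pa.int64())
    tbl = tbl.append_column("bucket", bucket)
    # One filter pass per bucket; Arrow keeps the data columnar throughout.
    return [tbl.filter(pc.equal(tbl["bucket"], b)) for b in range(n_bins)]

for i, part in enumerate(partition_by_x(table, n_bins=4)):
    print(f"bucket {i}: {part.num_rows} particles")
```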

Performance Characteristics of the BlueField-2 SmartNIC

Liu, Jianshen; Maltzahn, Carlos; Ulmer, Craig D.; Curry, Matthew L.

High-performance computing (HPC) researchers have long envisioned scenarios where application workflows could be improved through the use of programmable processing elements embedded in the network fabric. Recently, vendors have introduced programmable Smart Network Interface Cards (SmartNICs) that enable computations to be offloaded to the edge of the network. There is great interest in both the HPC and high-performance data analytics (HPDA) communities in understanding the roles these devices may play in the data paths of upcoming systems. This paper focuses on characterizing both the networking and computing aspects of NVIDIA’s new BlueField-2 SmartNIC when used in a 100Gb/s Ethernet environment. For the networking evaluation we conducted multiple transfer experiments between processors located at the host, the SmartNIC, and a remote host. These tests illuminate how much effort is required to saturate the network and help estimate the processing headroom available on the SmartNIC during transfers. For the computing evaluation we used the stress-ng benchmark to compare the BlueField-2 to other servers and place realistic bounds on the types of offload operations that are appropriate for the hardware. Our findings from this work indicate that while the BlueField-2 provides a flexible means of processing data at the network’s edge, great care must be taken to not overwhelm the hardware. While the host can easily saturate the network link, the SmartNIC’s embedded processors may not have enough computing resources to sustain more than half the expected bandwidth when using kernel-space packet processing. From a computational perspective, encryption operations, memory operations under contention, and on-card IPC operations on the SmartNIC perform significantly better than the general-purpose servers used for comparisons in our experiments. Therefore, applications that mainly focus on these operations may be good candidates for offloading to the SmartNIC.
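
As a rough feel for the headroom question raised in the abstract, the back-of-the-envelope sketch below estimates what fraction of a 100 Gb/s link the card's eight Arm cores could sustain with software packet handling. The per-core forwarding rate and frame size are illustrative assumptions, not measurements from the paper.

```python
# Back-of-the-envelope sketch (illustrative numbers, not the paper's data):
# fraction of a 100 Gb/s link sustainable by the SmartNIC's embedded cores
# when packets are processed in software on the card.
LINK_GBPS = 100.0               # Ethernet link rate from the abstract
FRAME_BYTES = 1500              # assumed frame size
CORES = 8                       # BlueField-2 carries 8 Arm cores
PKTS_PER_CORE_PER_SEC = 0.5e6   # assumed per-core kernel-space forwarding rate

achievable_gbps = CORES * PKTS_PER_CORE_PER_SEC * FRAME_BYTES * 8 / 1e9
fraction = min(achievable_gbps / LINK_GBPS, 1.0)
print(f"achievable ~{achievable_gbps:.0f} Gb/s ({fraction:.0%} of line rate)")
```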

CephFS experiments on stria.sandia.gov

Widener, Patrick W.; Curry, Matthew L.

This report is an institutional record of experiments conducted to explore performance of a vendor installation of CephFS on the SNL stria cluster. Comparisons between CephFS, the Lustre parallel file system, and NFS were done using the IOR and MDTEST benchmarking tools, a test program which uses the SEACAS/Trilinos IOSS library, and the checkpointing activity performed by the LAMMPS molecular dynamics simulation.
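
A comparison like the one described could be scripted along the following lines, here showing only the IOR portion. The mount points, MPI rank count, and I/O sizes are placeholders, not the settings used in the report.

```python
# Hedged sketch: run the same IOR workload against each mounted file system.
import subprocess

MOUNTS = {
    "cephfs": "/mnt/cephfs/iortest",   # placeholder paths, not stria's mounts
    "lustre": "/mnt/lustre/iortest",
    "nfs":    "/mnt/nfs/iortest",
}

def run_ior(label: str, path: str, ranks: int = 16) -> None:
    cmd = [
        "mpirun", "-np", str(ranks),
        "ior",
        "-a", "POSIX",   # POSIX I/O backend
        "-w", "-r",      # write phase, then read phase
        "-F",            # one file per MPI rank
        "-b", "1g",      # per-rank block size
        "-t", "4m",      # transfer size per I/O call
        "-o", path,      # test file on the target file system
    ]
    print(f"--- {label} ---")
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    for label, path in MOUNTS.items():
        run_ior(label, path)
```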

Scale-out edge storage systems with embedded storage nodes to get better availability and cost-efficiency at the same time

HotEdge 2020 - 3rd USENIX Workshop on Hot Topics in Edge Computing

Liu, Jianshen; Curry, Matthew L.; Maltzahn, Carlos; Kufeldt, Philip

In the resource-rich environment of data centers, most failures can be handled quickly by failing over to redundant resources. In contrast, a failure in an edge infrastructure with limited resources might require maintenance personnel to drive to the location to fix the problem. The operational cost of these “truck rolls” to edge locations competes with the operational cost of the extra space and power needed for redundant resources at the edge. Computational storage devices with network interfaces can act as network-attached storage servers and offer a new design point for storage systems at the edge. In this paper we hypothesize that a system consisting of a larger number of such small “embedded” storage nodes provides higher availability due to a larger number of failure domains while also saving operational cost in terms of space and power. As evidence for our hypothesis, we compared the probability of data loss between two types of storage systems: one constructed with general-purpose servers, and the other constructed with embedded storage nodes. Our results show that the storage system constructed with general-purpose servers has a 7 to 20 times higher risk of losing data than the storage system constructed with embedded storage devices. We also compare the two alternatives in terms of power and space using the Media-Based Work Unit (MBWU) that we developed in an earlier paper as a reference point.
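
The failure-domain argument can be illustrated with a toy Monte Carlo model, sketched below. It is not the paper's model: the drive counts, per-enclosure failure probability, and the drive-level (host-unaware) replica placement are simplifying assumptions, chosen only to show how a few large failure domains can put whole replica groups at risk while many small ones require several independent failures.

```python
# Toy Monte Carlo sketch (illustrative assumptions, not the paper's model):
# 60 drives packaged either behind 4 general-purpose servers or as 60
# embedded storage nodes. An enclosure failure takes down every drive in it.
import random

DRIVES = 60
GROUPS = 500        # 3-way replicated placement groups on distinct drives
P_FAIL = 0.01       # assumed per-enclosure failure probability per repair window
TRIALS = 20_000

random.seed(1)
groups = [frozenset(random.sample(range(DRIVES), 3)) for _ in range(GROUPS)]

def loss_rate(drives_per_enclosure: int) -> float:
    """Fraction of trials in which some placement group loses all replicas."""
    enclosures = [range(i, i + drives_per_enclosure)
                  for i in range(0, DRIVES, drives_per_enclosure)]
    losses = 0
    for _ in range(TRIALS):
        failed = set()
        for enc in enclosures:
            if random.random() < P_FAIL:
                failed.update(enc)   # enclosure failure kills all its drives
        if len(failed) >= 3 and any(g <= failed for g in groups):
            losses += 1
    return losses / TRIALS

print("servers  (15 drives each):", loss_rate(15))
print("embedded (1 drive each):  ", loss_rate(1))
```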

High performance erasure coding for very large stripe sizes

Simulation Series

Haddock, Walker; Bangalore, Purushotham V.; Curry, Matthew L.; Skjellum, Anthony

Exascale computing demands high-bandwidth, low-latency I/O at the computing edge. Object storage systems can provide higher bandwidth and lower latencies than tape archives. File transfer nodes present a single point of mediation through which data moving between these storage systems must pass. By increasing the performance of erasure coding, stripes can be subdivided into large numbers of shards. This paper’s contribution is a prototype nearline disk object storage system based on Ceph. We show that using general-purpose graphics processing units (GPGPUs) for erasure coding on file transfer nodes is effective when using a large number of shards. We describe an architecture for nearline disk archive storage for use with high performance computing (HPC) and demonstrate its performance with benchmarking results. We compare the benchmark performance of our design with the Intel® Storage Acceleration Library (ISA-L) CPU-based erasure coding, using the native Ceph erasure coding feature.
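
As a deliberately simplified picture of the stripe-and-shard idea, the sketch below splits a stripe into k data shards plus a single XOR parity shard and rebuilds one lost shard. Real coders such as the Ceph/ISA-L and GPU implementations benchmarked in the paper use Reed-Solomon codes that tolerate multiple shard losses; this is only a minimal illustration of how a stripe is subdivided.

```python
# Minimal single-parity erasure-coding sketch (not Reed-Solomon).
def encode(stripe: bytes, k: int) -> list:
    """Split a stripe into k equal data shards plus one XOR parity shard."""
    assert len(stripe) % k == 0, "pad the stripe so it divides evenly into k shards"
    size = len(stripe) // k
    shards = [stripe[i * size:(i + 1) * size] for i in range(k)]
    parity = bytearray(size)
    for shard in shards:
        for i, b in enumerate(shard):
            parity[i] ^= b
    return shards + [bytes(parity)]

def rebuild(shards):
    """Recover a single missing shard (marked None) by XOR-ing all the others."""
    missing = [i for i, s in enumerate(shards) if s is None]
    assert len(missing) == 1, "a single-parity code tolerates exactly one lost shard"
    size = len(next(s for s in shards if s is not None))
    recovered = bytearray(size)
    for s in shards:
        if s is not None:
            for i, b in enumerate(s):
                recovered[i] ^= b
    shards[missing[0]] = bytes(recovered)
    return shards

stripe = bytes(range(32))       # toy 32-byte stripe
shards = encode(stripe, k=4)    # 4 data shards + 1 parity shard
shards[2] = None                # simulate losing one shard
restored = rebuild(shards)
assert b"".join(restored[:4]) == stripe
print("recovered the stripe from", len(restored) - 1, "surviving shards")
```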
