Publications

14 Results

Viability of S3 Object Storage for the ASC Program at Sandia

Kordenbrock, Todd H.; Templet, Gary J.; Ulmer, Craig D.; Widener, Patrick

Recent efforts at Sandia such as DataSEA are creating search engines that enable analysts to query the institution’s massive archive of simulation and experiment data. The benefit of this work is that analysts will be able to retrieve all historical information about a system component that the institution has amassed over the years and make better-informed decisions in current work. As DataSEA gains momentum, it faces multiple technical challenges relating to capacity storage. From a raw capacity perspective, data producers will rapidly overwhelm the system with massive amounts of data. From an accessibility perspective, analysts will expect to be able to retrieve any portion of the bulk data, from any system on the enterprise network. Sandia’s Institutional Computing is mitigating storage problems at the enterprise level by procuring new capacity storage systems that can be accessed from anywhere on the enterprise network. These systems use the Simple Storage Service (S3) API for data transfers. While S3 uses objects instead of files, users can access it from their desktops or Sandia’s high-performance computing (HPC) platforms. S3 is particularly well suited for bulk storage in DataSEA, as datasets can be decomposed into objects that can be referenced and retrieved individually, as needed by an analyst. In this report we describe our experiences working with S3 storage and provide information about how developers can leverage Sandia’s current systems. We present performance results from two sets of experiments. First, we measure S3 throughput when exchanging data between four different HPC platforms and two different enterprise S3 storage systems on the Sandia Restricted Network (SRN). Second, we measure the performance of S3 when communicating with a custom-built Ceph storage system that was constructed from HPC components.
Overall, while S3 storage is significantly slower than traditional HPC storage, it provides significant accessibility benefits that will be valuable for archiving and exploiting historical data. There are multiple opportunities that arise from this work, including enhancing DataSEA to leverage S3 for bulk storage and adding native S3 support to Sandia’s IOSS library.
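
The abstract's point that a dataset can be decomposed into objects that are referenced and retrieved individually can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the report: `ObjectStore` is a hypothetical in-memory stand-in for an S3 bucket, and the `name/part-NNNNN` key layout, `put_dataset`, and `get_range` are invented names. A real deployment would go through an S3 client library rather than a Python dict.

```python
class ObjectStore:
    """Hypothetical in-memory stand-in for an S3 bucket (key -> bytes)."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = bytes(data)

    def get(self, key):
        return self._objects[key]

    def keys(self, prefix=""):
        return sorted(k for k in self._objects if k.startswith(prefix))


def put_dataset(store, name, payload, chunk_size):
    """Split payload into fixed-size chunks and store each as its own object."""
    keys = []
    for index, offset in enumerate(range(0, len(payload), chunk_size)):
        key = f"{name}/part-{index:05d}"  # assumed key layout, illustration only
        store.put(key, payload[offset:offset + chunk_size])
        keys.append(key)
    return keys


def get_range(store, name, chunk_size, start, length):
    """Fetch only the objects that overlap the byte range [start, start + length)."""
    if length <= 0:
        return b""
    first = start // chunk_size
    last = (start + length - 1) // chunk_size
    parts = [store.get(f"{name}/part-{i:05d}") for i in range(first, last + 1)]
    blob = b"".join(parts)
    local = start - first * chunk_size  # offset of `start` inside the first chunk
    return blob[local:local + length]
```

With a 1024-byte payload and 100-byte chunks, `get_range(store, "sim", 100, 250, 120)` touches only parts 2 and 3 instead of downloading all eleven objects, which is the access pattern the abstract describes for analysts who need one portion of a bulk dataset.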

Data Services for Visualization and Analysis - ASC Level II Milestone (7186)

Templet Jr., Gary J.; Glickman, Matthew R.; Kordenbrock, Todd H.; Levy, Scott L.; Lofstead, Gerald F.; Mauldin, Jeff; Otahal, Thomas J.; Ulmer, Craig D.; Widener, Patrick W.; Oldfield, Ron A.

A new in transit Data Service is presented and compared to the traditional file-based workflow and the newly refactored in situ Catalyst workflow. Each workflow is enabled by the IOSS mesh interface equipped with data management layers for Exodus and CGNS (file-based), Catalyst (in situ), and FAODEL (in transit). FAODEL is a distributed object store that can transmit data across MPI allocations. Catalyst is a ParaView-based visualization capability developed as part of the CSSE Data Services effort. The workflows considered here take SPARC data into Catalyst for visualization post-processing. Although still in unoptimized form, we show that the in transit approach is a viable alternative to file-based and in situ workflows and offers several advantages to both simulation and post-processing developers. Since IOSS is a mature interface with wide adoption across Sandia and externally, each workflow can be reconfigured to use different simulations that generate mesh data and post-processing tools that consume it.

The case for explicit reuse semantics for RDMA communication

Proceedings - 2020 IEEE 34th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2020

Levy, Scott L.; Widener, Patrick W.; Ulmer, Craig D.; Kordenbrock, Todd H.

Remote Direct Memory Access (RDMA) is an increasingly important technology in high-performance computing (HPC). RDMA provides low-latency, high-bandwidth data transfer between compute nodes. Additionally, it does not require explicit synchronization with the destination processor. Eliminating unnecessary synchronization can significantly improve the communication performance of large-scale scientific codes. A long-standing challenge presented by RDMA communication is mitigating the cost of registering memory with the network interface controller (NIC). Reusing memory once it is registered has been shown to significantly reduce the cost of RDMA communication. However, existing approaches for reusing memory rely on implicit memory semantics. In this paper, we introduce an approach that makes memory reuse semantics explicit by exposing a separate allocator for registered memory. The data and analysis in this paper yield the following contributions: (i) managing registered memory explicitly enables efficient reuse of registered memory; (ii) registering large memory regions to amortize the registration cost over multiple user requests can significantly reduce the cost of acquiring new registered memory; and (iii) reducing the cost of acquiring registered memory can significantly improve the performance of RDMA communication. Reusing registered memory is key to high-performance RDMA communication. By making reuse semantics explicit, our approach has the potential to improve RDMA performance by making it significantly easier for programmers to efficiently reuse registered memory.
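
The idea of a separate allocator that registers one large region and serves many requests from it can be sketched as a toy model. This is not the paper's allocator and uses no real RDMA library: `FakeNic` is a hypothetical stand-in that only counts registration calls, and `RegisteredPool`, `acquire`, and `release` are names invented for this illustration.

```python
class FakeNic:
    """Hypothetical stand-in for a NIC; it only counts registration calls."""

    def __init__(self):
        self.registrations = 0

    def register(self, size):
        self.registrations += 1
        return bytearray(size)


class RegisteredPool:
    """Toy allocator: registers one large region, then hands out slices of it."""

    def __init__(self, region_size, register):
        self._region = register(region_size)  # single registration up front
        self._size = region_size
        self._next = 0    # bump pointer into the unused tail of the region
        self._free = {}   # block size -> offsets returned by release()

    def acquire(self, size):
        """Return an (offset, size) block, preferring a released block of equal size."""
        bucket = self._free.get(size)
        if bucket:
            return bucket.pop(), size
        if self._next + size > self._size:
            raise MemoryError("registered region exhausted")
        offset = self._next
        self._next += size
        return offset, size

    def release(self, block):
        """Make the block available for explicit reuse; no deregistration happens."""
        offset, size = block
        self._free.setdefault(size, []).append(offset)
```

In this model, eight 512-byte requests against a 4096-byte pool cost one registration instead of eight, and a released block is handed back on the next request of the same size, so the reuse is visible in the program rather than hidden in a cache.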

Mediating Data Center Storage Diversity in HPC Applications with FAODEL

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Widener, Patrick W.; Ulmer, Craig D.; Levy, Scott L.; Kordenbrock, Todd H.; Templet, Gary J.

Composition of computational science applications into both ad hoc pipelines for analysis of collected or generated data and into well-defined and repeatable workflows is becoming increasingly popular. Meanwhile, dedicated high performance computing storage environments are rapidly becoming more diverse, with both significant amounts of non-volatile memory storage and mature parallel file systems available. At the same time, computational science codes are being coupled to data analysis tools which are not filesystem-oriented. In this paper, we describe how the FAODEL data management service can expose different available data storage options and mediate among them in both application- and FAODEL-directed ways. These capabilities allow applications to exploit their knowledge of the different types of data they may exchange during a workflow execution, and also provide FAODEL with mechanisms to proactively tune data storage behavior when appropriate. We describe the implementation of these capabilities in FAODEL and how they are used by applications, and present preliminary performance results demonstrating the potential benefits of our approach.

ASC ATDM Level 2 Milestone #6358: Assess Status of Next Generation Components and Physics Models in EMPIRE

Bettencourt, Matthew T.; Kramer, Richard M.; Cartwright, Keith C.; Phillips, Edward G.; Ober, Curtis C.; Pawlowski, Roger P.; Swan, Matthew S.; Kalashnikova, Irina; Phipps, Eric T.; Conde, Sidafa C.; Cyr, Eric C.; Ulmer, Craig D.; Kordenbrock, Todd H.; Levy, Scott L.; Templet, Gary J.; Hu, Jonathan J.; Lin, Paul L.; Glusa, Christian A.; Siefert, Christopher S.; Glass, Micheal W.

This report documents the outcome from the ASC ATDM Level 2 Milestone 6358: Assess Status of Next Generation Components and Physics Models in EMPIRE. This Milestone is an assessment of the EMPIRE (ElectroMagnetic Plasma In Realistic Environments) application and three software components. The assessment focuses on the electromagnetic and electrostatic particle-in-cell solutions for EMPIRE and its associated solver, time integration, and checkpoint-restart components. This information provides a clear understanding of the current status of the EMPIRE application and will help to guide future work in FY19 in order to ready the application for the ASC ATDM L1 Milestone in FY20. It is clear from this assessment that performance of the linear solver will have to be a focus in FY19.

EMPRESS-Extensible metadata provider for extreme-scale scientific simulations

Proceedings of PDSW-DISCS 2017 - 2nd Joint International Workshop on Parallel Data Storage and Data Intensive Scalable Computing Systems - Held in conjunction with SC 2017: The International Conference for High Performance Computing, Networking, Storage and Analysis

Lawson, Margaret R.; Lofstead, Gerald F.; Levy, Scott L.; Widener, Patrick W.; Ulmer, Craig D.; Mukherjee, Shyamali M.; Templet, Gary J.; Kordenbrock, Todd H.

Significant challenges exist in the efficient retrieval of data from extreme-scale simulations. An important and evolving method of addressing these challenges is application-level metadata management. Historically, HDF5 and NetCDF have eased data retrieval by offering rudimentary attribute capabilities that provide basic metadata. ADIOS simplified data retrieval by utilizing metadata for each process' data. EMPRESS provides a simple example of the next step in this evolution by integrating per-process metadata with the storage system itself, making it more broadly useful than single file or application formats. Additionally, it allows for more robust and customizable metadata.

ATDM Data Management FY2015: Data Warehouse Progress Report

Ulmer, Craig D.; Fabian, Nathan D.; Kordenbrock, Todd H.; Mukherjee, Shyamali M.; Oldfield, Ron A.; Templet, Gary J.

The Advanced Technology Development and Mitigation (ATDM) program at Sandia National Laboratories is a new effort to build next-generation simulation codes that will map well to upcoming exascale computing platforms. Rather than follow traditional single-program, multiple-data (SPMD) programming techniques, ATDM is developing applications in an asynchronous many-task (AMT) form that describes work as a graph of tasks that have data dependencies. The data management team is focused on developing a data warehouse for ATDM that will enable tasks to store and exchange data objects efficiently. This report summarizes the data management team's efforts during FY15, and documents: (1) an initial API and implementation for the data warehouse's key/value store, (2) API requirements for use with ATDM's runtime, (3) initial requirements for storing ATDM-specific data, and (4) the current organization of software components that will be used by the data warehouse.
