Publications Search

Several recent workshops conducted by the DOE Advanced Scientific Computing Research program have established that the complexity of developing applications and executing them on high-performance computing (HPC) systems is rising at a rate that will make it nearly impossible to continue to achieve higher levels of performance and scalability. Absent an alternative approach to managing this ever-growing complexity, HPC systems will become increasingly difficult to use. A more holistic approach to designing and developing applications and managing system resources is required. This paper outlines a research strategy for managing the increasing complexity by providing the programming environment, software stack, and hardware capabilities needed for autonomous resource management of HPC systems. Developing portable applications for a variety of HPC systems of varying scale requires a paradigm shift from the current approach, where applications are painstakingly mapped to individual machine resources, to an approach where machine resources are automatically mapped and optimized to applications as they execute. Achieving such automated resource management for HPC systems is a daunting challenge that requires significant sustained investment in exploring new approaches and novel capabilities in software and hardware that span the spectrum from programming systems to device-level mechanisms. This paper provides an overview of the functionality needed to enable autonomous resource management and optimization and describes the components currently being explored at Sandia National Laboratories to help support this capability.

TYPE Conference Poster YEAR 2021

OSTI Scopus

PMEMCPY: A simple, lightweight, and portable I/O library for storing data in persistent memory

Proceedings - IEEE International Conference on Cluster Computing, ICCC

Logan, Luke M.; Lofstead, Gerald F.; Levy, Scott L.N.; Widener, Patrick; Sun, Xian H.; Kougkas, Anthony

Persistent memory (PMEM) devices can achieve performance comparable to DRAM while providing significantly more capacity. This has made the technology compelling as an expansion to main memory. Rethinking PMEM as a storage device can offer a high-performance buffering layer for HPC applications to temporarily but safely store data. However, modern parallel I/O libraries, such as HDF5 and pNetCDF, are complicated and introduce significant software and metadata overheads when persisting data to these storage devices, wasting much of their potential. In this work, we explore the potential of PMEM as storage through pMEMCPY: a simple, lightweight, and portable I/O library for storing data in persistent memory. We demonstrate that our approach is up to 2x faster than other popular parallel I/O libraries under real workloads.

TYPE Conference Paper YEAR 2021

DOI OSTI Scopus

FY20 CSSE L2 Milestone 7186

Templet Jr., Gary J.; Glickman, Matthew R.; Kordenbrock, Todd; Levy, Scott L.N.; Lofstead, Gerald F.; Mauldin, Jeff; Otahal, Thomas J.; Ulmer, Craig; Widener, Patrick; Oldfield, Ron

Abstract not provided.

TYPE Presentation YEAR 2020

OSTI

Containerized Environment for Reproducibility and Traceability of Scientific Workflows

Olaya, Paula; Lofstead, Gerald F.; Taufer, Michela

Abstract not provided.

TYPE Conference Poster YEAR 2020

OSTI

Stitch It Up: Using Progressive Data Storage to Scale Science

Proceedings - 2020 IEEE 34th International Parallel and Distributed Processing Symposium, IPDPS 2020

Lofstead, Gerald F.; Mitchell, John A.; Chen, Enze

Generally, scientific simulations load the entire simulation domain into memory because most, if not all, of the data changes with each time step. This has driven application structures that have, in turn, affected the design of popular IO libraries, such as HDF-5, ADIOS, and NetCDF. This assumption makes sense for many cases, but there is also a significant collection of simulations where this approach results in vast swaths of unchanged data being written each time step. This paper explores a new IO approach that is capable of stitching together a coherent global view of the total simulation space at any given time. This benefit is achieved with no performance penalty compared to running with the full data set in memory, at a radically smaller process requirement, and results in radical data reduction with no fidelity loss. Additionally, the structures employed enable online simulation monitoring.

TYPE Conference Poster YEAR 2020

OSTI Scopus

Containers and Data-Centric Computing

Lofstead, Gerald F.

Abstract not provided.

TYPE Conference Poster YEAR 2020

OSTI

SystemX: Turbocharging Scientific Investigation Through Comprehensive Metadata Management

Lawson, Margaret; Lofstead, Gerald F.

Abstract not provided.

TYPE Conference Poster YEAR 2020

OSTI

Data Pallets: Containerizing Storage For Reproducibility and Traceability

Lecture Notes in Computer Science

Lofstead, Gerald F.; Baker, Joshua; Younge, Andrew J.

Trusting simulation output is crucial for Sandia’s mission objectives. Here, we rely on these simulations to perform our high-consequence mission tasks given national treaty obligations. Other science and modeling applications, while they may have high-consequence results, still require the strongest levels of trust to enable using the result as the foundation for both practical applications and future research. To this end, the computing community has developed workflow and provenance systems to aid both in automating simulation and modeling execution and in determining exactly how some output was created so that conclusions can be drawn from the data. Current approaches for workflows and provenance systems are all at the user level and have little to no system-level support, making them fragile, difficult to use, and incomplete solutions. The introduction of container technology is a first step towards encapsulating and tracking artifacts used in creating data and resulting insights, but its current implementation is focused solely on making it easy to deploy an application in an isolated “sandbox” and maintaining a strictly read-only mode to avoid any potential changes to the application. All storage activities still use the system-level shared storage. This project explores extending the container concept to include storage as a new container type we call data pallets. Data pallets are potentially writeable, auto-generated by the system based on IO activities, and usable as a way to link the contained data back to the application and input deck used to create it.

TYPE Journal Article YEAR 2019

DOI OSTI