Publications Search

Trusting simulation output is crucial for Sandia’s mission objectives. Here, we rely on these simulations to perform our high-consequence mission tasks given national treaty obligations. Other science and modeling applications, while they may have high-consequence results, still require the strongest levels of trust so that their results can serve as the foundation for both practical applications and future research. To this end, the computing community has developed workflow and provenance systems to aid both in automating simulation and modeling execution and in determining exactly how some output was created, so that conclusions can be drawn from the data. Current approaches to workflow and provenance systems all operate at the user level and have little to no system-level support, making them fragile, difficult to use, and incomplete. The introduction of container technology is a first step towards encapsulating and tracking the artifacts used in creating data and the resulting insights, but current container implementations focus solely on making it easy to deploy an application in an isolated “sandbox” and on maintaining a strictly read-only mode to avoid any potential changes to the application. All storage activities still use system-level shared storage. This project explores extending the container concept to include storage as a new container type we call data pallets. Data pallets are potentially writable, auto-generated by the system based on I/O activities, and usable as a way to link the contained data back to the application and input deck used to create it.

TYPE Journal Article YEAR 2019

SC19 BOF: Containers in HPC

Younge, Andrew J.

Abstract not provided.

TYPE Conference Poster YEAR 2019

A Case for Portability and Reproducibility of HPC Containers

Younge, Andrew J.

Abstract not provided.

TYPE Conference Poster YEAR 2019

Astra and the state of ARM in HPC

Younge, Andrew J.

Abstract not provided.

TYPE Conference Poster YEAR 2019

E4S and Supercontainers

Younge, Andrew J.

Abstract not provided.

TYPE Conference Poster YEAR 2019

Advancing the Usage and Scalability of Containers in HPC

Younge, Andrew J.

Abstract not provided.

TYPE Conference Poster YEAR 2019

SC 19 Tutorial: Best Practices

Younge, Andrew J.

Abstract not provided.

TYPE Conference Poster YEAR 2019

Enabling HPC workloads on Cloud Infrastructure using Kubernetes Container Orchestration Mechanisms

Younge, Andrew J.

Abstract not provided.

TYPE Conference Poster YEAR 2019

SC 19 Tutorial: Getting Started with Containers on HPC

Younge, Andrew J.

Abstract not provided.

TYPE Conference Poster YEAR 2019

A Case for Portability and Reproducibility of HPC Containers

Proceedings of CANOPIE-HPC 2019: 1st International Workshop on Containers and New Orchestration Paradigms for Isolated Environments in HPC - Held in conjunction with SC 2019: The International Conference for High Performance Computing, Networking, Storage and Analysis

Canon, Richard S.; Younge, Andrew J.

Containerized computing is quickly changing the landscape for the development and deployment of many HPC applications. Containers can lower the barrier to entry for emerging workloads to leverage supercomputing resources. However, containers are no silver bullet for deploying HPC software, and several challenges remain that the community must address to ensure container workloads are reproducible and interoperable. In this paper, we discuss several challenges in utilizing containers for HPC applications and the current approaches used in many HPC container runtimes. These approaches have been proven to enable high-performance execution of containers at scale with the appropriate runtimes. However, the use of these techniques is still ad hoc and tests the limits of container workload portability, and several gaps likely remain. We discuss those remaining gaps and propose several potential solutions, including custom container label tagging and runtime hooks as a first step in managing HPC system library complexity.

TYPE Conference Poster YEAR 2019

Enabling HPC Workloads on Cloud Infrastructure Using Kubernetes Container Orchestration Mechanisms

Proceedings of CANOPIE-HPC 2019: 1st International Workshop on Containers and New Orchestration Paradigms for Isolated Environments in HPC - Held in conjunction with SC 2019: The International Conference for High Performance Computing, Networking, Storage and Analysis

Beltre, Angel M.; Saha, Pankaj; Govindaraju, Madhusudhan; Grant, Ryan; Younge, Andrew J.

Containers offer a broad array of benefits, including a consistent lightweight runtime environment through OS-level virtualization, as well as low overhead to maintain and scale applications with high efficiency. Moreover, containers are known to package and deploy applications consistently across varying infrastructures. Container orchestrators manage large numbers of containers for microservice-based cloud applications. However, the use of such service orchestration frameworks for HPC workloads remains relatively unexplored. In this paper, we study the potential use of Kubernetes on HPC infrastructure by the scientific community. We directly compare both its features and performance against Docker Swarm and bare-metal execution of HPC applications. Herein, we detail the configurations required for Kubernetes to operate with containerized MPI applications, specifically accounting for (1) underlying device access, (2) inter-container communication across different hosts, and (3) configuration limitations. This evaluation quantifies the performance difference between representative MPI workloads running on bare metal and under containerized orchestration with Kubernetes, over both Ethernet and InfiniBand interconnects. Our results show that Kubernetes and Docker Swarm can achieve near-bare-metal performance over RDMA communication when high-performance transports are enabled. Our results also show that Kubernetes incurs overhead for several HPC applications over TCP/IP, whereas Docker Swarm's throughput remains near bare-metal performance for the same applications.

TYPE Conference Poster YEAR 2019
