Publications

Results 1–50 of 114

Computing-as-a-Service: A Blueprint For Digital Engineering

Foulk, James W.; Younge, Andrew J.; Lueninghoener, Cory D.; Bernard, Sylvain R.

Abstract not provided.

TYPE Conference Poster YEAR 2024

Enabling power measurement and control on Astra: The first petascale Arm supercomputer

Concurrency and Computation: Practice and Experience

Grant, Ryan E.; Hammond, Simon D.; Foulk, James W.; Levenhagen, Michael; Olivier, Stephen L.; Foulk, James W.; Ward, Lee; Younge, Andrew J.

Astra, deployed in 2018, was the first petascale supercomputer to utilize processors based on the ARM instruction set. The system was also the first under Sandia's Vanguard program, which seeks to provide an evaluation vehicle for novel technologies that, with refinement, could be utilized in demanding, large-scale HPC environments. In addition to ARM, several other important first-of-a-kind developments were used in the machine, including new approaches to cooling the datacenter and machine. This article documents our experiences building a power measurement and control infrastructure for Astra. While this is often beyond the control of users today, the accurate measurement, cataloging, and evaluation of power, as our experiences show, is critical to the successful deployment of a large-scale platform. While such systems exist in part for other architectures, Astra required new development to support the novel Marvell ThunderX2 processor used in compute nodes. In addition to documenting the measurement of power during system bring-up and for subsequent ongoing routine use, we present results associated with controlling the power usage of the processor, an area of growing interest as data centers and supercomputing sites look to improve compute/energy efficiency and find additional sources for full system optimization.

TYPE Journal Article YEAR 2023

SNL ATDM Software Ecosystem Then and Now: Operating Systems and On-Node Runtime

Olivier, Stephen L.; Brightwell, Ronald B.; Dosanjh, Matthew G.; Ferreira, Kurt; Levy, Scott L.N.; Foulk, James W.; Younge, Andrew J.

Abstract not provided.

TYPE Presentation YEAR 2022

Chapel and Grafiki Integration

Mccrary, Trevor; Devine, Karen; Younge, Andrew J.

Abstract not provided.

TYPE Conference Presentation YEAR 2022

Building and Running HPC Containers across the US Department of Energy

Younge, Andrew J.

Abstract not provided.

TYPE Conference Presentation YEAR 2022

Building HPC Containers: A Primer

Younge, Andrew J.

Abstract not provided.

TYPE Conference Presentation YEAR 2022

The Design and Development of ATSE: Advanced Tri-lab Software Environment

Curry, Matthew L.; Foulk, James W.; Younge, Andrew J.; Hammond, Simon

Abstract not provided.

TYPE Conference Presentation YEAR 2022

Gestalt Computing: Hybrid Traditional HPC and Cloud Hardware and Software Support

Lofstead, Gerald F.; Younge, Andrew J.

Abstract not provided.

TYPE Conference Paper YEAR 2022

SNL ATDM Software Ecosystem Operating Systems and On-Node Runtime

Olivier, Stephen L.; Brightwell, Ronald B.; Dosanjh, Matthew G.; Ferreira, Kurt; Levy, Scott L.N.; Foulk, James W.; Younge, Andrew J.

Abstract not provided.

TYPE Presentation YEAR 2022

Sandia's Experiences (and Predictions) on Arm

Younge, Andrew J.

Abstract not provided.

TYPE Conference Presentation YEAR 2021

Constructing Containers for Exascale Computing

Younge, Andrew J.

Abstract not provided.

TYPE Presentation YEAR 2021

CANOPIE-HPC Workshop Introduction - SC21

Younge, Andrew J.

Abstract not provided.

TYPE Conference Presentation YEAR 2021

Integrating PGAS and MPI-based Graph Analysis

Mccrary, Trevor M.; Devine, Karen; Younge, Andrew J.

This project demonstrates that Chapel programs can interface with MPI-based libraries written in C++ without storing multiple copies of shared data. Chapel is a language for productive parallel computing using global address spaces (PGAS). We identified two approaches to interface Chapel code with the MPI-based Grafiki and Trilinos libraries. The first uses a single Chapel executable to call a C function that interacts with the C++ libraries. The second uses the mmap function to allow separate executables to read and write to the same block of memory on a node. We also encapsulated the second approach in Docker/Singularity containers to maximize ease of use. Comparisons of the two approaches using shared and distributed memory installations of Chapel show that both approaches provide similar scalability and performance.

TYPE Other Report YEAR 2021

Fine-Grained Containerization for Data Authentication, Security, and Portable Workflows

Lofstead, Gerald F.; Olaya, Paula; Younge, Andrew J.; Taufer, Michela

Abstract not provided.

TYPE Conference Paper YEAR 2021

Portability and Reproducibility Considerations with Containers

Younge, Andrew J.

Abstract not provided.

TYPE Conference Presentation YEAR 2021

A Case Study in Using Containers to Build and Distribute HPC Applications: ALEGRA

Younge, Andrew J.; Fuller, Timothy J.; Foulk, James W.; Bova, Steven W.

Abstract not provided.

TYPE Conference Presentation YEAR 2021

Sandia's Experiences with Arm

Younge, Andrew J.; Hammond, Simon; Foulk, James W.; Foulk, James W.

Abstract not provided.

TYPE Conference Presentation YEAR 2021

Containers and the Arm Ecosystem

Younge, Andrew J.; Hammond, Si; Foulk, James W.; Foulk, James W.

Abstract not provided.

TYPE Conference Presentation YEAR 2021

Experiences with Arm

Hammond, Simon; Foulk, James W.; Foulk, James W.; Younge, Andrew J.; Hoekstra, Robert J.

Abstract not provided.

TYPE Presentation YEAR 2021

Fugaku and A64FX Update - April 2021

Hammond, Simon; Curry, Matthew; Davis, Kevin; Dang, Vinh Q.; Guba, Oksana; Hoekstra, Robert J.; Foulk, James W.; Foulk, James W.; Poliakoff, David; Rajamanickam, Sivasankaran; Trott, Christian R.; Younge, Andrew J.

Abstract not provided.

TYPE Presentation YEAR 2021

ECP Tutorial: Getting Started with Containers on HPC

Younge, Andrew J.

Abstract not provided.

TYPE Conference Presentation YEAR 2021

Container Support at DOE Compute Facilities 2021

Younge, Andrew J.

Abstract not provided.

TYPE Presentation YEAR 2021

Supercontainers and E4S

Younge, Andrew J.

Abstract not provided.

TYPE Conference Presentation YEAR 2021

Containers at Sandia

Younge, Andrew J.; Agelastos, Anthony M.; Lawson, Gary; Foulk, James W.

Abstract not provided.

TYPE Conference Presentation YEAR 2021

SNL ATDM Software Ecosystem Operating Systems and On-Node Runtime

Olivier, Stephen L.; Brightwell, Ronald B.; Ferreira, Kurt; Grant, Ryan; Levy, Scott L.N.; Foulk, James W.; Younge, Andrew J.

Abstract not provided.

TYPE Presentation YEAR 2021

Containers and the Truth between HPC & Cloud System Software Convergence

Younge, Andrew J.

Abstract not provided.

TYPE Conference Presentation YEAR 2021

ALAMO: Autonomous lightweight allocation, management, and optimization

Communications in Computer and Information Science

Brightwell, Ronald B.; Ferreira, Kurt; Grant, Ryan; Levy, Scott L.N.; Lofstead, Gerald F.; Olivier, Stephen L.; Foulk, James W.; Younge, Andrew J.; Gentile, Ann C.; Foulk, James W.

Several recent workshops conducted by the DOE Advanced Scientific Computing Research program have established that the complexity of developing applications and executing them on high-performance computing (HPC) systems is rising at a rate that will make it nearly impossible to continue to achieve higher levels of performance and scalability. Absent an alternative approach to managing this ever-growing complexity, HPC systems will become increasingly difficult to use. A more holistic approach to designing and developing applications and managing system resources is required. This paper outlines a research strategy for managing the increasing complexity by providing the programming environment, software stack, and hardware capabilities needed for autonomous resource management of HPC systems. Developing portable applications for a variety of HPC systems of varying scale requires a paradigm shift from the current approach, where applications are painstakingly mapped to individual machine resources, to an approach where machine resources are automatically mapped and optimized to applications as they execute. Achieving such automated resource management for HPC systems is a daunting challenge that requires significant sustained investment in exploring new approaches and novel capabilities in software and hardware that span the spectrum from programming systems to device-level mechanisms. This paper provides an overview of the functionality needed to enable autonomous resource management and optimization and describes the components currently being explored at Sandia National Laboratories to help support this capability.

TYPE Conference Poster YEAR 2021

HPC Operating System Research Areas and Challenges

Foulk, James W.; Brightwell, Ronald B.; Younge, Andrew J.; Lange, Jack

Abstract not provided.

TYPE Presentation YEAR 2020

Containers for the Modernization of HPC Software Deployment

Younge, Andrew J.

Abstract not provided.

TYPE Presentation YEAR 2020

Chronicles of Astra: Challenges and Lessons from the First Petascale Arm Supercomputer

International Conference for High Performance Computing, Networking, Storage and Analysis, SC

Foulk, James W.; Younge, Andrew J.; Hammond, Simon; Foulk, James W.; Curry, Matthew; Aguilar, Michael J.; Hoekstra, Robert J.; Brightwell, Ronald B.

Arm processors have been explored in HPC for several years; however, their viability for supporting large-scale production workloads has not yet been demonstrated. In this paper, we offer a retrospective on the process of bringing up Astra, the first petascale supercomputer based on 64-bit Arm processors, and validating its ability to run production HPC applications. Through this process, several gaps in immature technologies were addressed, including software stack enablement, Linux bugs at scale, thermal management issues, power management capabilities, and advanced container support. From this experience, we formulate several lessons learned that contributed to the successful deployment of Astra. These insights can help accelerate the deployment and maturation of other first-of-a-kind HPC technologies. With Astra now supporting many users running a diverse set of production applications at multi-thousand-node scales, we believe this constitutes strong supporting evidence that Arm is a viable technology for even the largest-scale supercomputer deployments.

TYPE Conference Poster YEAR 2020

Early Experiences with A64FX

Hammond, Simon; Younge, Andrew J.; Foulk, James W.; Foulk, James W.

Abstract not provided.

TYPE Conference Presentation YEAR 2020

Enabling Power Measurement and Control on Astra: The First Petascale Arm Supercomputer

Grant, Ryan; Hammond, Simon; Foulk, James W.; Levenhagen, Michael; Olivier, Stephen L.; Foulk, James W.; Ward, Harry L.; Younge, Andrew J.

Abstract not provided.

TYPE Conference Paper YEAR 2020

CANOPIE-HPC Workshop at SC20

Younge, Andrew J.

Abstract not provided.

TYPE Conference Presentation YEAR 2020

Modern Container Runtimes for Exascale computing era

Younge, Andrew J.

Abstract not provided.

TYPE Presentation YEAR 2020

Job Modeling for Power Forecasting and Analysis on the Astra Supercomputer

Wang, Felix W.; Foulk, James W.; Vineyard, Craig M.; Younge, Andrew J.

Abstract not provided.

TYPE Conference Paper YEAR 2020

Chronicles of Astra: Challenges and Lessons from the First Petascale Arm Supercomputer

Younge, Andrew J.

Abstract not provided.

TYPE Conference Presentation YEAR 2020

ECP Container Status 2020

Younge, Andrew J.

Abstract not provided.

TYPE Presentation YEAR 2020

Towards Containerized HPC Applications at Exascale

Younge, Andrew J.

Abstract not provided.

TYPE Conference Presentation YEAR 2020

Containers in HPC: Testbeds Production and Towards Exascale

Younge, Andrew J.

Abstract not provided.

TYPE Presentation YEAR 2020

Containers and the Future of Supercomputing

Younge, Andrew J.

Abstract not provided.

TYPE Conference Poster YEAR 2020

HPC Container Runtimes: A Quick Primer

Younge, Andrew J.

Abstract not provided.

TYPE Conference Poster YEAR 2020

Supercomputing with Containers: Practice Experiences and Tupperware

Younge, Andrew J.

Abstract not provided.

TYPE Presentation YEAR 2020

Container Utilization at DOE Compute Facilities

Younge, Andrew J.; Agelastos, Anthony M.; Foulk, James W.

Abstract not provided.

TYPE Conference Poster YEAR 2020

Machines Learning about Machines - ML for Analysis and Control of HPC Infrastructure

Wang, Felix W.; Green, Sam; Foulk, James W.; Vineyard, Craig M.; Younge, Andrew J.

Abstract not provided.

TYPE Conference Poster YEAR 2020

ECP Tutorial: Getting Started with Containers on HPC

Younge, Andrew J.; Canon, Shane; Shende, Sameer

Abstract not provided.

TYPE Conference Poster YEAR 2020

HPC Containers Usage Sandia National Laboratories

Younge, Andrew J.; Agelastos, Anthony M.; Lawson, Gary; Foulk, James W.

Abstract not provided.

TYPE Presentation YEAR 2019

ECP Supercontainers: 2.3.5.09 Packaging Technologies

Younge, Andrew J.

Abstract not provided.

TYPE Conference Poster YEAR 2019

Data Pallets: Containerizing Storage For Reproducibility and Traceability

Lecture Notes in Computer Science

Lofstead, Gerald F.; Baker, Joshua; Younge, Andrew J.

Trusting simulation output is crucial for Sandia’s mission objectives. Here, we rely on these simulations to perform our high-consequence mission tasks given national treaty obligations. Other science and modeling applications, while they may have high-consequence results, still require the strongest levels of trust to enable using the result as the foundation for both practical applications and future research. To this end, the computing community has developed workflow and provenance systems to aid in both automating simulation and modeling execution and determining exactly how some output was created so that conclusions can be drawn from the data. Current approaches for workflows and provenance systems are all at the user level and have little to no system-level support, making them fragile, difficult to use, and incomplete solutions. The introduction of container technology is a first step towards encapsulating and tracking artifacts used in creating data and resulting insights, but its current implementation is focused solely on making it easy to deploy an application in an isolated “sandbox” and maintaining a strictly read-only mode to avoid any potential changes to the application. All storage activities still use the system-level shared storage. This project explores extending the container concept to include storage as a new container type we call data pallets. Data pallets are potentially writeable, auto-generated by the system based on IO activities, and usable as a way to link the contained data back to the application and input deck used to create it.

TYPE Journal Article YEAR 2019

Enabling HPC workloads on Cloud Infrastructure using Kubernetes Container Orchestration Mechanisms

Younge, Andrew J.

Abstract not provided.

TYPE Conference Poster YEAR 2019

A Case for Portability and Reproducibility of HPC Containers

Proceedings of CANOPIE-HPC 2019: 1st International Workshop on Containers and New Orchestration Paradigms for Isolated Environments in HPC - Held in conjunction with SC 2019: The International Conference for High Performance Computing, Networking, Storage and Analysis

Canon, Richard S.; Younge, Andrew J.

Containerized computing is quickly changing the landscape for the development and deployment of many HPC applications. Containers are able to lower the barrier of entry for emerging workloads to leverage supercomputing resources. However, containers are no silver bullet for deploying HPC software, and there are several challenges ahead that the community must address to ensure container workloads can be reproducible and interoperable. In this paper, we discuss several challenges in utilizing containers for HPC applications and the current approaches used in many HPC container runtimes. These approaches have been proven to enable high-performance execution of containers at scale with the appropriate runtimes. However, the use of these techniques is still ad hoc and tests the limits of container workload portability, and several gaps likely remain. We discuss those remaining gaps and propose several potential solutions, including custom container label tagging and runtime hooks as a first step in managing HPC system library complexity.

TYPE Conference Poster YEAR 2019
