Publications Search

A Study on the Integration between SIMT and Scalar Cores: Loosely Coupled to Tightly Coupled

Ramshanker, Abinands; Chetput Venkataraghaven, Sooraj; Hughes, Clayton; Foulk, James W.; Rogers, Timothy

Abstract not provided.

TYPE Conference Proceeding YEAR 2024

OSTI

SST Tutorial

Lavin, Patrick R.; Hemmert, Karl S.; Hughes, Clayton

Abstract not provided.

TYPE Conference Presentation YEAR 2024

DOI OSTI

“Smarter” NICs for faster algorithms [Slides]

Karamati, Sara; Young, Jeffrey L.; Vuduc, Rich; Hemmert, Karl S.; Schonbein, William W.; Siefert, Christopher; Levy, Scott L.N.; Hughes, Clayton

The basic building block of a distributed-memory cluster or supercomputer is a node. Each node includes a host, which is a processor (xPU) + memory hierarchy. The host can communicate with other hosts via its NIC (network interface controller). A network connects the nodes. The nodes may be arranged in some topology, which determines the network’s carrying capacity and cost.

TYPE Other Report YEAR 2023

DOI OSTI

Evaluation of HPC Workloads Running on Open-Source RISC-V Hardware

Foulk, James W.; Berger-Vergiat, Luc; Feinberg, Benjamin; Hughes, Clayton; Levenhagen, Michael

Abstract not provided.

TYPE Conference Paper YEAR 2023

OSTI

ERAS: Enabling Integration of Real-World Intellectual Properties in Architectural Simulators -- Osseus Introduction

Hughes, Clayton; Hemmert, Karl S.; Voskuilen, Gwendolyn R.; Feinberg, Benjamin; Nema, Shubham; Awad, Amro; Kirschner, Justin; Adak, Debpratim

Abstract not provided.

TYPE Conference Presentation YEAR 2023

DOI OSTI

ERAS: Enabling Integration of Real-World Intellectual Properties in Architectural Simulators -- SST Introduction

Hughes, Clayton; Hemmert, Karl S.; Voskuilen, Gwendolyn R.; Nema, Shubham; Awad, Amro; Kaushik Chunduru, Shiva

Abstract not provided.

TYPE Conference Presentation YEAR 2023

DOI OSTI

Co-Designing Open-Source Hardware With The Structural Simulation Toolkit

Hughes, Clayton; Voskuilen, Gwendolyn R.; Hemmert, Karl S.

Abstract not provided.

TYPE Conference Poster YEAR 2023

DOI OSTI

ATHENA: Enabling Codesign for Next-Generation AI/ML Architectures

Plagge, Mark; Feinberg, Benjamin; Rothganger, Fredrick R.; Agarwal, Sapan; Hughes, Clayton; Cardwell, Suma G.

Abstract not provided.

TYPE Conference Presentation YEAR 2022

DOI OSTI

ATHENA: An Analytical Analog Neuromorphic Hardware Estimation Tool

Plagge, Mark; Cardwell, Suma G.; Hughes, Clayton; Agarwal, Sapan

Abstract not provided.

TYPE Conference Paper YEAR 2022

OSTI

ATHENA: Analytical Tool for Heterogeneous Neuromorphic Architectures

Cardwell, Suma G.; Plagge, Mark; Hughes, Clayton; Rothganger, Fredrick R.; Agarwal, Sapan; Feinberg, Benjamin; Awad, Amro; Mcfarland, John; Parker, Luke

The ASC program seeks to use machine learning to improve efficiencies in its stockpile stewardship mission. Moreover, there is a growing market for technologies dedicated to accelerating AI workloads. Many of these emerging architectures promise savings in energy, area, and latency compared to traditional CPUs for these types of applications. Neuromorphic analog and digital technologies provide both low-power and configurable acceleration of challenging artificial intelligence (AI) algorithms. If designed into a heterogeneous system with other accelerators and conventional compute nodes, these technologies have the potential to augment the capabilities of traditional High Performance Computing (HPC) platforms [5]. This expanded computation space requires not only a new approach to physics simulation, but also the ability to evaluate and analyze next-generation architectures specialized for AI/ML workloads in both traditional HPC and embedded ND applications. Developing this capability will enable ASC to understand how this hardware performs in both HPC and ND environments, improve our ability to port our applications, guide the development of computing hardware, and inform vendor interactions, leading them toward solutions that address ASC’s unique requirements.

TYPE SAND Report YEAR 2022

DOI OSTI

Data Transfers and Host/Device Communication using OneAPI for FPGA

Lane, Phillip A.; Siefert, Christopher; Olivier, Stephen L.; Hughes, Clayton; Foulk, James W.; Voskuilen, Gwendolyn R.; Foulk, James W.

Abstract not provided.

TYPE Conference Presentation YEAR 2022

DOI OSTI

Modeling Analog Tile-Based Accelerators Using SST

Feinberg, Benjamin; Agarwal, Sapan; Plagge, Mark; Rothganger, Fredrick R.; Cardwell, Suma G.; Hughes, Clayton

Analog computing has been widely proposed to improve the energy efficiency of multiple important workloads, including neural network operations and other linear algebra kernels. To properly evaluate analog computing and explore more complex workloads, such as systems consisting of multiple analog data paths, system-level simulations are required. Moreover, prior work on system architectures for analog computing often relies on custom simulators, creating significant additional design effort and complicating comparisons between different systems. To remedy these issues, this report describes the design and implementation of a flexible tile-based analog accelerator element for the Structural Simulation Toolkit (SST). The element focuses heavily on the tile controller, an often neglected aspect of prior work, which is sufficiently versatile to simulate a wide range of different tile operations including neural network layers, signal processing kernels, and generic linear algebra operations without major constraints. The tile model also interoperates with existing SST memory and network models to reduce the overall development load and enable future simulation of heterogeneous systems with both conventional digital logic and analog compute tiles. Finally, both the tile and array models are designed to easily support future extensions as new analog operations and applications that can benefit from analog computing are developed.

TYPE SAND Report YEAR 2022

DOI OSTI

Unified Memory: GPGPU-Sim/UVM Smart Integration

Liu, Yechen; Rogers, Timothy; Hughes, Clayton

CPU/GPU heterogeneous compute platforms are a ubiquitous element in computing, and a programming model specified for this heterogeneous computing model is important for both performance and programmability. A programming model that exposes the shared, unified address space between the heterogeneous units is a necessary step in this direction, as it removes the burden of explicit data movement from the programmer while maintaining performance. GPU vendors, such as AMD and NVIDIA, have released software-managed runtimes that can provide programmers the illusion of unified CPU and GPU memory by automatically migrating data in and out of the GPU memory. However, this runtime support is not included in GPGPU-Sim, a commonly used framework that models the features of a modern graphics processor that are relevant to non-graphics applications. UVM Smart was developed to extend GPGPU-Sim 3.x to incorporate the modeling of on-demand paging and data migration through the runtime. This report discusses the integration of UVM Smart and GPGPU-Sim 4.0 and the modifications to improve simulation performance and accuracy.

TYPE SAND Report YEAR 2022

DOI OSTI

Simulating Next-Gen Dataflow Architectures for HPC

Hughes, Clayton; Voskuilen, Gwendolyn R.; Rodrigues, Arun; Hammond, Simon

Abstract not provided.

TYPE Conference Presentation YEAR 2022

DOI OSTI

Solving Sparse Linear Systems on FPGAs using oneAPI

Siefert, Christopher; Hughes, Clayton; Miller, Nicholas; Olivier, Stephen L.; Voskuilen, Gwendolyn R.

Abstract not provided.

TYPE Conference Presentation YEAR 2022

DOI OSTI

Minerva: Rethinking Secure Architectures for the Era of Fabric-Attached Memory Architectures

Proceedings - 2022 IEEE 36th International Parallel and Distributed Processing Symposium, IPDPS 2022

Alwadi, Mazen; Wang, Rujia; Mohaisen, David; Hughes, Clayton; Hammond, Simon; Awad, Amro

Fabric-attached memory (FAM) is proposed to enable the seamless integration of directly accessible memory modules attached to the shared system fabric, which will provide future systems with flexible memory integration options, mitigate underutilization, and facilitate data sharing. Recently proposed interconnects, such as Gen-Z and Compute Express Link (CXL), define security, correctness, and performance requirements of fabric-attached devices, including memory. These initiatives are supported by most major system and processor vendors, bringing widespread adoption of FAM-enabled systems one step closer to reality and security concerns to the forefront. This paper discusses the challenges of adapting secure memory implementations to FAM-enabled systems for the first time in the literature. Specifically, we observe that handling the security metadata used to protect fabric-attached memories needs to be done deliberately to eliminate unintentional integrity check failures and/or security vulnerabilities caused by an inconsistent view of the shared security metadata across nodes. Our scheme, Minerva, elegantly adapts secure memory implementations to support FAM-enabled systems with negligible performance overheads (3.8% of an ideal scheme), compared to the performance overhead (99.5% of an ideal scheme) for a scheme that uses conventional invalidation-based cache coherence to ensure the consistency of security metadata across nodes.

TYPE Conference Proceeding YEAR 2022

DOI OSTI Scopus

'Smarter' NICs for faster molecular dynamics: a case study

Proceedings - 2022 IEEE 36th International Parallel and Distributed Processing Symposium, IPDPS 2022

Karamati, Sara; Hughes, Clayton; Hemmert, Karl S.; Grant, Ryan E.; Schonbein, William W.; Levy, Scott L.N.; Conte, Thomas M.; Young, Jeffrey; Vuduc, Richard W.

This work evaluates the benefits of using a 'smart' network interface card (SmartNIC) as a compute accelerator, using the MiniMD molecular dynamics proxy application as an example. The accelerator is NVIDIA's BlueField-2 card, which includes an 8-core Arm processor along with a small amount of DRAM and storage. We test the networking and data movement performance of these cards compared to a standard Intel server host using microbenchmarks and MiniMD. In MiniMD, we identify two distinct classes of computation, namely core computation and maintenance computation, which are executed in sequence. We restructure the algorithm and code to weaken this dependence and increase task parallelism, thereby making it possible to increase utilization of the BlueField-2 concurrently with the host. We evaluate our implementation on a cluster consisting of 16 dual-socket Intel Broadwell host nodes with one BlueField-2 per host node. Our results show that while the overall compute performance of the BlueField-2 is limited, using these cards with a modified MiniMD algorithm allows for up to 20% speedup over the host CPU baseline with no loss in simulation accuracy.

TYPE Conference Proceeding YEAR 2022

DOI OSTI Scopus

ARIAA Update -- SST

Hughes, Clayton; Ashraf, Rizwan; Gioiosa, Roberto; Phillips, Cynthia A.; Berry, Jonathan; Hart, William E.; Laird, Carl; Rajamanickam, Sivasankaran

Abstract not provided.

TYPE Presentation YEAR 2021

OSTI

Computational Offload with BlueField Smart NICs

Karamati, Sara; Young, Jeffrey; Conte, Tom; Hemmert, Karl S.; Grant, Ryan; Hughes, Clayton; Vuduc, Rich

The recent introduction of a new generation of "smart NICs" has provided new accelerator platforms that include CPU cores or reconfigurable fabric in addition to traditional networking hardware and packet offloading capabilities. While there are currently several proposals for using these smart NICs for low-latency, in-line packet processing operations, there remains a gap in knowledge as to how they might be used as computational accelerators for traditional high-performance applications. This work uses benchmarks and mini-applications to evaluate possible benefits of using a smart NIC as a compute accelerator for HPC applications. We investigate NVIDIA's current-generation BlueField-2 card, which includes eight Arm CPUs along with a small amount of storage, and we test the networking and data movement performance of these cards compared to a standard Intel server host. We then detail how two different applications, YASK and miniMD, can be modified to make more efficient use of the BlueField-2 device, with a focus on overlapping computation and communication for operations like neighbor building and halo exchanges. Our results show that while the overall compute performance of these devices is limited, using them with a modified miniMD algorithm allows for potential speedups of 5 to 20% over the host CPU baseline with no loss in simulation accuracy.

TYPE SAND Report YEAR 2021

DOI OSTI

A-SST Initial Specification

Rodrigues, Arun; Hammond, Simon; Hemmert, Karl S.; Hughes, Clayton; Kenny, Joseph; Voskuilen, Gwendolyn R.

The U.S. Army Research Office (ARO), in partnership with IARPA, is investigating innovative, efficient, and scalable computer architectures that are capable of executing next-generation large-scale data-analytic applications. These applications are increasingly sparse, unstructured, non-local, and heterogeneous. Under the Advanced Graphic Intelligence Logical computing Environment (AGILE) program, Performer teams will be asked to design computer architectures to meet the future needs of the DoD and the Intelligence Community (IC). This design effort will require flexible, scalable, and detailed simulation to assess the performance, efficiency, and validity of their designs. To support AGILE, Sandia National Labs will provide the AGILE-enhanced Structural Simulation Toolkit (A-SST). This toolkit is a computer architecture simulation framework designed to support fast, parallel, and multi-scale simulation of novel architectures. This document describes the A-SST framework, some of its library of simulation models, and how it may be used by AGILE Performers.

TYPE SAND Report YEAR 2021

DOI OSTI