DOI OSTI

DeACT: Architecture-Aware Virtual Memory Support for Fabric Attached Memory Systems

Proceedings - International Symposium on High-Performance Computer Architecture

Kommareddy, Vamsee R.; Hughes, Clayton; Hammond, Simon; Awad, Amro

1 The exponential growth of data has driven technology providers to develop new protocols, such as cache coherent interconnects and memory semantic fabrics, to help users and facilities leverage advances in memory technologies to satisfy these growing memory and storage demands. Using these new protocols, fabric-Attached memories (FAM) can be directly attached to a system interconnect and be easily integrated with a variety of processing elements (PEs). Moreover, systems that support FAM can be smoothly upgraded and allow multiple PEs to share the FAM memory pools using well-defined protocols. The sharing of FAM between PEs allows efficient data sharing, improves memory utilization, reduces cost by allowing flexible integration of different PEs and memory modules from several vendors, and makes it easier to upgrade the system. One promising use-case for FAMs is in High-Performance Compute (HPC) systems, where the underutilization of memory is a major challenge. However, adopting FAMs in HPC systems brings new challenges. In addition to cost, flexibility, and efficiency, one particular problem that requires rethinking is virtual memory support for security and performance. To address these challenges, this paper presents decoupled access control and address translation (DeACT), a novel virtual memory implementation that supports HPC systems equipped with FAM. Compared to the state-of-The-Art two-level translation approach, DeACT achieves speedup of up to 4.59x (1.8x on average) without compromising security.1Part of this work was done when Vamsee was working under the supervision of Amro Awad at UCF. Amro Awad is now with the ECE Department at NC State.

More Details

TYPE Conference Paper YEAR 2021

DOI OSTI Scopus

Vision for Co-designing a Unified-Memory Centric Heterogeneous Node Architecture

Rajamanickam, Sivasankaran; Krishna, Tushar; Hammond, Simon

Abstract not provided.

More Details

TYPE Conference Paper YEAR 2021

OSTI

SST-GPU: A Scalable SST GPU Component for Performance Modeling and Profiling

Hughes, Clayton; Hammond, Simon; Zhang, Mengchi; Liu, Yechen; Rogers, Tim; Hoekstra, Robert J.

Programmable accelerators have become commonplace in modern computing systems. Advances in programming models and the availability of unprecedented amounts of data have created a space for massively parallel accelerators capable of maintaining context for thousands of concurrent threads resident on-chip. These threads are grouped and interleaved on a cycle-by-cycle basis among several massively parallel computing cores. One path for the design of future supercomputers relies on an ability to model the performance of these massively parallel cores at scale. The SST framework has been proven to scale up to run simulations containing tens of thousands of nodes. A previous report described the initial integration of the open-source, execution-driven GPU simulator, GPGPU-Sim, into the SST framework. This report discusses the results of the integration and how to use the new GPU component in SST. It also provides examples of what it can be used to analyze and a correlation study showing how closely the execution matches that of a Nvidia V100 GPU when running kernels and mini-apps.

More Details

TYPE SAND Report YEAR 2021

DOI OSTI

StressBench: A Configurable Full System Network and I/O Benchmark Framework

2021 IEEE High Performance Extreme Computing Conference, HPEC 2021

Hughes, Clayton; Hammond, Simon; Foulk, James W.; Foulk, James W.

Abstract not provided.

More Details

TYPE Presentation YEAR 2020

OSTI

Experiences Scaling a Production Arm Supercomputer to Petaflops and Beyond

Foulk, James W.; Foulk, James W.; Hammond, Simon

Abstract not provided.

More Details

TYPE Presentation YEAR 2020

OSTI

Scalable generation of graphs for benchmarking HPC community-detection algorithms

International Conference for High Performance Computing, Networking, Storage and Analysis, SC

Slota, George M.; Berry, Jonathan; Hammond, Simon; Olivier, Stephen L.; Phillips, Cynthia A.; Rajamanickam, Sivasankaran

Community detection in graphs is a canonical social network analysis method. We consider the problem of generating suites of teras-cale synthetic social networks to compare the solution quality of parallel community-detection methods. The standard method, based on the graph generator of Lancichinetti, Fortunato, and Radicchi (LFR), has been used extensively for modest-scale graphs, but has inherent scalability limitations. We provide an alternative, based on the scalable Block Two-Level Erdos-Renyi (BTER) graph generator, that enables HPC-scale evaluation of solution quality in the style of LFR. Our approach varies community coherence, and retains other important properties. Our methods can scale real-world networks, e.g., to create a version of the Friendster network that is 512 times larger. With BTER's inherent scalability, we can generate a 15-terabyte graph (4.6B vertices, 925B edges) in just over one minute. We demonstrate our capability by showing that label-propagation community-detection algorithm can be strong-scaled with negligible solution-quality loss.

More Details

TYPE Conference Poster YEAR 2019

DOI OSTI Scopus