Publications Search

SST Tutorial

Lavin, Patrick R.; Hemmert, Karl S.; Hughes, Clayton

Abstract not provided.

More Details

TYPE Conference Presentation YEAR 2024

DOI OSTI

“Smarter” NICs for faster algorithms [Slides]

Karamati, Sara; Young, Jeffrey L.; Vuduc, Rich; Hemmert, Karl S.; Schonbein, William W.; Siefert, Christopher; Levy, Scott L.N.; Hughes, Clayton

The basic building block of a distributed-memory cluster or supercomputer is a node. Each node includes a host, which is a processor (xPU) + memory hierarchy. The host can communicate with other hosts via its NIC (network interface controller). A network connects the nodes. The nodes may be arranged in some topology, which determines the network’s carrying capacity and cost.

More Details

TYPE Other Report YEAR 2023

DOI OSTI

ERAS: Enabling Integration of Real-World Intellectual Properties in Architectural Simulators -- SST Introduction

Hughes, Clayton; Hemmert, Karl S.; Voskuilen, Gwendolyn R.; Nema, Shubham; Awad, Amro; Kaushik Chunduru, Shiva

Abstract not provided.

More Details

TYPE Conference Presentation YEAR 2023

DOI OSTI

ERAS: Enabling Integration of Real-World Intellectual Properties in Architectural Simulators -- Osseus Introduction

Hughes, Clayton; Hemmert, Karl S.; Voskuilen, Gwendolyn R.; Feinberg, Benjamin; Nema, Shubham; Awad, Amro; Kirschner, Justin; Adak, Debpratim

Abstract not provided.

More Details

TYPE Conference Presentation YEAR 2023

DOI OSTI

Co-Designing Open-Source Hardware With The Structural Simulation Toolkit

Hughes, Clayton; Voskuilen, Gwendolyn R.; Hemmert, Karl S.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2023

DOI OSTI

The Portals 4.3 Network Programming Interface

Schonbein, William W.; Barrett, Brian W.; Brightwell, Ronald B.; Grant, Ryan E.; Hemmert, Karl S.; Foulk, James W.; Underwood, Keith; Riesen, Rolf; Hoefler, Torsten; Barbe, Mathieu; Suraty Filho, Luiz H.; Ratchov, Alexandre; Maccabe, Arthur B.

This report presents a specification for the Portals 4 network programming interface. Portals 4 is intended to allow scalable, high-performance network communication between nodes of a parallel computing system. Portals 4 is well suited to massively parallel processing and embedded systems. Portals 4 represents an adaption of the data movement layer developed for massively parallel processing platforms, such as the 4500-node Intel TeraFLOPS machine. Sandia's Cplant cluster project motivated the development of Version 3.0, which was later extended to Version 3.3 as part of the Cray Red Storm machine and XT line. Version 4 is targeted to the next generation of machines employing advanced network interface architectures that support enhanced offload capabilities.

More Details

TYPE SAND Report YEAR 2022

DOI OSTI

'Smarter' NICs for faster molecular dynamics: a case study

Proceedings - 2022 IEEE 36th International Parallel and Distributed Processing Symposium, IPDPS 2022

Karamati, Sara; Hughes, Clayton; Hemmert, Karl S.; Grant, Ryan E.; Schonbein, William W.; Levy, Scott L.N.; Conte, Thomas M.; Young, Jeffrey; Buduc, Richard W.

This work evaluates the benefits of using a 'smart' network interface card (SmartNIC) as a compute accelerator for the example of the MiniMD molecular dynamics proxy application. The accelerator is NVIDIA's BlueField-2 card, which includes an 8-core Arm processor along with a small amount of DRAM and storage. We test the networking and data movement performance of these cards compared to a standard Intel server host using microbenchmarks and MiniMD. In MiniMD, we identify two distinct classes of computation, namely core computation and maintenance computation, which are executed in sequence. We restructure the algorithm and code to weaken this dependence and increase task parallelism, thereby making it possible to increase utilization of the BlueField-2 concurrently with the host. We evaluate our implementation on a cluster consisting of 16 dual-socket Intel Broadwell host nodes with one BlueField-2 per host-node. Our results show that while the overall compute performance of BlueField-2 is limited, using them with a modified MiniMD algorithm allows for up to 20% speedup over the host CPU baseline with no loss in simulation accuracy.

More Details

TYPE Conference Proceeding YEAR 2022

DOI OSTI Scopus

HIHE01-36: Evaluate how various topologies perform in the context of link failures [Slides]

Hemmert, Karl S.; Kenny, Joseph

Study looks at the effect that failed links have on the throughput of HPC systems: What workloads are most effected? How many links need to be down before throughput of the machine is noticeably affected?

More Details

TYPE Other Report YEAR 2021

DOI OSTI

Computational Offload with BlueField Smart NICs

Karamati, Sara; Young, Jeffrey; Conte, Tom; Hemmert, Karl S.; Grant, Ryan; Hughes, Clayton; Vuduc, Rich

The recent introduction of a new generation of "smart NICs" have provided new accelerator platforms that include CPU cores or reconfigurable fabric in addition to traditional networking hardware and packet offloading capabilities. While there are currently several proposals for using these smartNICs for low-latency, in-line packet processing operations, there remains a gap in knowledge as to how they might be used as computational accelerators for traditional high-performance applications. This work aims to look at benchmarks and mini-applications to evaluate possible benefits of using a smartNIC as a compute accelerator for HPC applications. We investigate NVIDIA's current-generation BlueField-2 card, which includes eight Arm CPUs along with a small amount of storage, and we test the networking and data movement performance of these cards compared to a standard Intel server host. We then detail how two different applications, YASK and miniMD can be modified to make more efficient use of the BlueField-2 device with a focus on overlapping computation and communication for operations like neighbor building and halo exchanges. Our results show that while the overall compute performance of these devices is limited, using them with a modified miniMD algorithm allows for potential speedups of 5 to 20% over the host CPU baseline with no loss in simulation accuracy.

More Details

TYPE SAND Report YEAR 2021

DOI OSTI

A-SST Initial Specification

Rodrigues, Arun; Hammond, Simon; Hemmert, Karl S.; Hughes, Clayton; Kenny, Joseph; Voskuilen, Gwendolyn R.

The U.S. Army Research Office (ARO), in partnership with IARPA, are investigating innovative, efficient, and scalable computer architectures that are capable of executing next-generation large scale data-analytic applications. These applications are increasingly sparse, unstructured, non-local, and heterogeneous. Under the Advanced Graphic Intelligence Logical computing Environment (AGILE) program, Performer teams will be asked to design computer architectures to meet the future needs of the DoD and the Intelligence Community (IC). This design effort will require flexible, scalable, and detailed simulation to assess the performance, efficiency, and validity of their designs. To support AGILE, Sandia National Labs will be providing the AGILE-enhanced Structural Simulation Toolkit (A-SST). This toolkit is a computer architecture simulation framework designed to support fast, parallel, and multi-scale simulation of novel architectures. This document describes the A-SST framework, some of its library of simulation models, and how it may be used by AGILE Performers.

More Details

TYPE SAND Report YEAR 2021

DOI OSTI

Towards an Extensible Framework for Accelerated System Simulation

Voskuilen, Gwendolyn R.; Rodrigues, Arun; Hughes, Clayton; Hemmert, Karl S.; Hammond, Simon

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2021

DOI OSTI

SST-ExplorerEnabling System-level Performance and Reliability Analysis for Designs with Real-World IPs

Rodrigues, Arun; Awad, Amro; Hughes, Clayton; Agarwal, Sapan; Skoufis, Michael; Voskuilen, Gwendolyn R.; Nema, Shubham; Razdan, Rohin; Gardner, Alan; Hemmert, Karl S.; Hammond, Simon

Abstract not provided.

More Details

TYPE Conference Presentation YEAR 2021

DOI OSTI

ERAS: Enabling the Integration of Real-World Intellectual Properties (IPs) in Architectural Simulators

Nema, Shubham; Razdan, Rohin; Rodrigues, Arun; Hemmert, Karl S.; Voskuilen, Gwendolyn R.; Adak, Debratim; Hammond, Simon; Awad, Amro; Hughes, Clayton

Sandia National Laboratories is investigating scalable architectural simulation capabilities with a focus on simulating and evaluating highly scalable supercomputers for high performance computing applications. There is a growing demand for RTL model integration to provide the capability to simulate customized node architectures and heterogeneous systems. This report describes the first steps integrating the ESSENTial Signal Simulation Enabled by Netlist Transforms (ESSENT) tool with the Structural Simulation Toolkit (SST). ESSENT can emit C++ models from models written in FIRRTL to automatically generate components. The integration workflow will automatically generate the SST component and necessary interfaces to ’plug’ the ESSENT model into the SST framework.

More Details

TYPE SAND Report YEAR 2021

DOI OSTI

SST-ExplorerEnabling System-level Performance and Reliability Analysis for Designs with Real-World IPs

Rodrigues, Arun; Awad, Amro; Hughes, Clayton; Agarwal, Sapan; Skoufis, Michael; Voskuilen, Gwendolyn R.; Nema, Shubham; Razdan, Rohin; Gardner, Alan; Hemmert, Karl S.; Hammond, Simon

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2021

DOI OSTI

Evaluating Trade-offs in Potential Exascale Interconnect Technologies

Hemmert, Karl S.; Bair, Ray; Bhatale, Abhinav; Groves, Taylor; Jain, Nikhil; Lewis, Cannada; Mubarak, Misbah; Pakin, Scott D.; Ross, Robert; Wilke, Jeremiah

This report details work to study trade-offs in topology and network bandwidth for potential interconnects in the exascale (2021-2022) timeframe. The work was done using multiple interconnect models across two parallel discrete event simulators. Results from each independent simulator are shown and discussed and the areas of agreement and disagreement are explored.

More Details

TYPE Other Report YEAR 2020

DOI OSTI

Developing SST Element Libraries

Voskuilen, Gwendolyn R.; Hemmert, Karl S.

Abstract not provided.

More Details

TYPE Presentation YEAR 2020

OSTI

The Exascale Computing Project: Hardware Evaluation for Interconnects

Hemmert, Karl S.; Wilke, Jeremiah; Ross, Rob; Groves, Taylor; Karlin, Ian

Abstract not provided.

More Details

TYPE Presentation YEAR 2020

OSTI

Abstract Machine Models and Proxy Architectures for Exascale Computing

Ang, James A.; Barrett, Richard F.; Benner, Robert E.; Burke, Daniel; Chan, Cy; Cook, Jeanine; Daley, Christopher S.; Donofrio, David; Hammond, Simon; Hemmert, Karl S.; Hoekstra, Robert J.; Ibrahim, Khaled; Kelly, Suzanne M.; Le, Hoang; Leung, Vitus J.; Michelogiannakis, George; Resnick, David R.; Rodrigues, Arun; Shalf, John; Stark, Dylan; Unat, D.; Wright, Nick J.; Voskuilen, Gwendolyn R.

To achieve exascale computing, fundamental hardware architectures must change. The most significant consequence of this assertion is the impact on the scientific and engineering applications that run on current high performance computing (HPC) systems, many of which codify years of scientific domain knowledge and refinements for contemporary computer systems. In order to adapt to exascale architectures, developers must be able to reason about new hardware and determine what programming models and algorithms will provide the best blend of performance and energy efficiency into the future. While many details of the exascale architectures are undefined, an abstract machine model is designed to allow application developers to focus on the aspects of the machine that are important or relevant to performance and code structure. These models are intended as communication aids between application developers and hardware architects during the co-design process. We use the term proxy architecture to describe a parameterized version of an abstract machine model, with the parameters added to elucidate potential speeds and capacities of key hardware components. These more detailed architectural models are formulated to enable discussion between the developers of analytic models and simulators and computer hardware architects. They allow for application performance analysis and hardware optimization opportunities. In this report our goal is to provide the application development community with a set of models that can help software developers prepare for exascale. In addition, through the use of proxy architectures, we can enable a more concrete exploration of how well new and evolving application codes map onto future architectures. This second version of the document addresses system scale considerations and provides a system-level abstract machine model with proxy architecture information.

More Details

TYPE SAND Report YEAR 2019

DOI OSTI

ASC CSSE Milestone 6812: SST-GPGPU

Hughes, Clayton; Hammond, Simon; Voskuilen, Gwendolyn R.; Rodrigues, Arun; Hemmert, Karl S.; Hoekstra, Robert J.

Abstract not provided.

More Details

TYPE Presentation YEAR 2019

OSTI

Verifying Simulator Readiness for Evaluating Potential Exascale Interconnect Technologies [PowerPoint]

Hemmert, Karl S.; Wilke, Jeremiah; Kenny, Joseph; Lewis, Cannada; Bhatele, Abhinav; Georgakoudis, Giorgis; Pakin, Scott; Mubarak, Misbah; Groves, Taylor

Goals of the milestone are to: verify key hardware contention models in controlled environment; validate simulator readiness for future milestones; and, provide baseline to define cross-validation workflow across teams for ''bracketing'' results.

More Details

TYPE Other Report YEAR 2019

DOI OSTI

ECP HE Node Simulation - SNL

Hughes, Clayton; Rodrigues, Arun; Voskuilen, Gwendolyn R.; Hemmert, Karl S.; Hammond, Simon; Hoekstra, Robert J.

Abstract not provided.

More Details

TYPE Presentation YEAR 2019

OSTI

Towards Lightweight and Scalable Simulation of Large-Scale OpenSHMEM Applications

Levenhagen, Michael; Hammond, Simon; Hemmert, Karl S.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2019

OSTI

ASC CSSE Level 2 Milestone Briefing: SST-GPU

Hughes, Clayton; Hammond, Simon; Voskuilen, Gwendolyn R.; Rodrigues, Arun; Hemmert, Karl S.; Hoekstra, Robert J.

Abstract not provided.

More Details

TYPE Presentation YEAR 2019

OSTI

The Portals 4.2 Network Programming Interface

Barrett, Brian W.; Brightwell, Ronald B.; Grant, Ryan; Hemmert, Karl S.; Foulk, James W.; Wheeler, Kyle; Riesen, Rolf; Hoefler, Torsten; Maccabe, Arthur B.; Hudson, Trammell

This report presents a specification for the Portals 4 network programming interface. Portals 4 is intended to allow scalable, high-performance network communication between nodes of a parallel computing system. Portals 4 is well suited to massively parallel processing and embedded systems. Portals 4 represents an adaption of the data movement layer developed for massively parallel processing platforms, such as the 4500-node Intel TeraFLOPS machine. Sandia's Cplant cluster project motivated the development of Version 3.0, which was later extended to Version 3.3 as part of the Cray Red Storm machine and XT line. Version 4 is targeted to the next generation of machines employing advanced network interface architectures that support enhanced offload capabilities.

More Details

TYPE SAND Report YEAR 2018

DOI OSTI

Towards Lightweight and Scalable Simulation of Large-Scale OpenSHMEM Applications

Levenhagen, Michael; Hammond, Simon; Hemmert, Karl S.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2018

OSTI

Trinity: Opportunities and Challenges of a Heterogeneous System

Hemmert, Karl S.; Moore, Stan G.; Gallis, Michael A.; Davis, Mike E.; Levesque, John; Hjelm, Nathan; Lujan, James; Morton, David; Nam, Hai A.; Parga, Alex; Peltz Jr., Paul; Shipman, Galen; Torrez, Alfred

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2018

OSTI

Structural Simulation Toolkit (SST) Tutorial

Hammond, Simon; Rodrigues, Arun; Voskuilen, Gwendolyn R.; Hemmert, Karl S.; Levenhagen, Michael; Hughes, Clayton; Hoekstra, Robert J.

Abstract not provided.

More Details

TYPE Presentation YEAR 2018

OSTI

Trinity: Opportunities and Challenges of a Heterogeneous System

Hemmert, Karl S.; Moore, Stan G.; Gallis, Michael A.; Davis, Mike E.; Levesque, John; Hjelm, Nathan; Lujan, James; Morton, David; Nam, Hai A.; Parga, Alex; Peltz Jr., Paul; Shipman, Galen; Torrez, Alfred

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2018

OSTI

Trinity Update: Open Science Burst Buffers Intel Xeon Phi Processor Plans

Hemmert, Karl S.

Abstract not provided.

More Details

TYPE Presentation YEAR 2018

OSTI

Merlin Element Library Deep Dive

Hemmert, Karl S.

Abstract not provided.

More Details

TYPE Presentation YEAR 2018

OSTI

Analyzing Exascale Memory Architectures Using the SST Toolkit

Hughes, Clayton; Awad, Amro; Hammond, Simon; Rodrigues, Arun; Hemmert, Karl S.; Hoekstra, Robert J.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2018

OSTI

Interconnect Working Group

Hemmert, Karl S.; Bair, Ray; Bhatele, Abhinav; Groves, Taylor; Hammond, Simon; Jain, Nikhil; Levenhagen, Michael; Mubarak, Misbah; Pakin, Scott; Ross, Rob; Wilke, Jeremiah

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2018

OSTI

SST Simulation Framework (and Complex Memory)

Hammond, Simon; Hughes, Clayton; Awad, Amro; Voskuilen, Gwendolyn R.; Rodrigues, Arun; Hemmert, Karl S.; Levenhagen, Michael; Hoekstra, Robert J.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2018

OSTI

Towards a Scalable Integrated Simulation Framework for Extreme Heterogeneity in High Performance Computing

Hammond, Simon; Rodrigues, Arun; Hemmert, Karl S.; Voskuilen, Gwendolyn R.; Hughes, Clayton; Levenhagen, Michael; Hoekstra, Robert J.; Ang, James A.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2017

OSTI

Final Review of FY17 ASC CSSE L2 Milestone #6018 entitled "Analyzing Power Usage Characteristics of Workloads Running on Trinity"

Hoekstra, Robert J.; Hammond, Simon; Hemmert, Karl S.; Gentile, Ann C.; Oldfield, Ron; Lang, Mike; Martin, Steve

The presentation documented the technical approach of the team and summary of the results with sufficient detail to demonstrate both the value and the completion of the milestone. A separate SAND report was also generated with more detail to supplement the presentation.

More Details

TYPE Other Report YEAR 2017

DOI OSTI

Vanguard: Maturing the ARM Software Ecosystem for U.S. DOE Supercomputing

Foulk, James W.; Foulk, James W.; Grant, Ryan; Hammond, Simon; Hemmert, Karl S.; Martinez, David; Noe, John P.; Foulk, James W.; Ward, Harry L.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2017

OSTI

Sandia's ARM-centric Co-Design Strategy: Introduction to the NNSA/ASC Vanguard Project

Ang, James A.; Brightwell, Ronald B.; Hammond, Simon; Hemmert, Karl S.; Hoekstra, Robert J.; Foulk, James W.; Foulk, James W.; Rodrigues, Arun

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2017

OSTI

Trinity Architecture

Hemmert, Karl S.

Abstract not provided.

More Details

TYPE Presentation YEAR 2017

OSTI

Performance Analysis for Using Non-Volatile Memory DIMMs: Opportunities and Challenges

Awad, Amro; Hammond, Simon; Hughes, Clayton; Rodrigues, Arun; Hemmert, Karl S.; Hoekstra, Robert J.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2017

DOI OSTI

Unveiling the Interplay between Global Link Arrangements and Network Management Algorithms on Dragonfly Networks

Proceedings - 2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGRID 2017

Kaplan, Fulya; Tuncer, Ozan; Leung, Vitus J.; Hemmert, Karl S.; Coskun, Ayse K.

Network messaging delay historically constitutes a large portion of the wall-clock time for High Performance Computing (HPC) applications, as these applications run on many nodes and involve intensive communication among their tasks. Dragonfly network topology has emerged as a promising solution for building exascale HPC systems owing to its low network diameter and large bisection bandwidth. Dragonfly includes local links that form groups and global links that connect these groups via high bandwidth optical links. Many aspects of the dragonfly network design are yet to be explored, such as the performance impact of the connectivity of the global links, i.e., global link arrangements, the bandwidth of the local and global links, or the job allocation algorithm. This paper first introduces a packet-level simulation framework to model the performance of HPC applications in detail. The proposed framework is able to simulate known MPI (message passing interface) routines as well as applications with custom-defined communication patterns for a given job placement algorithm and network topology. Using this simulation framework, we investigate the coupling between global link bandwidth and arrangements, communication pattern and intensity, job allocation and task mapping algorithms, and routing mechanisms in dragonfly topologies. We demonstrate that by choosing the right combination of system settings and workload allocation algorithms, communication overhead can be decreased by up to 44%. We also show that circulant arrangement provides up to 15% higher bisection bandwidth compared to the other arrangements, but for realistic workloads, the performance impact of link arrangements is less than 3%.

More Details

TYPE Conference Poster YEAR 2017

DOI OSTI Scopus

Structural Simulation Toolkit (SST)

Rodrigues, Arun; Moore, Branden J.; Hammond, Simon; Hemmert, Karl S.; Voskuilen, Gwendolyn R.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2017

OSTI

Performance Analysis for Using Non-Volatile Memory DIMMs: Opportunities and Challenges

Awad, Amro; Hammond, Simon; Hughes, Clayton; Rodrigues, Arun; Hemmert, Karl S.; Hoekstra, Robert J.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2017

DOI OSTI

Two-level main memory co-design: Multi-threaded algorithmic primitives, analysis, and simulation

Journal of Parallel and Distributed Computing

Berry, Jonathan; Bender, Michael A.; Hammond, Simon; Hemmert, Karl S.; Mccauley, Samuel; Moore, Branden J.; Moseley, Benjamin; Phillips, Cynthia A.; Resnick, David R.; Rodrigues, Arun

A challenge in computer architecture is that processors often cannot be fed data from DRAM as fast as CPUs can consume it. Therefore, many applications are memory-bandwidth bound. With this motivation and the realization that traditional architectures (with all DRAM reachable only via bus) are insufficient to feed groups of modern processing units, vendors have introduced a variety of non-DDR 3D memory technologies (Hybrid Memory Cube (HMC),Wide I/O 2, High Bandwidth Memory (HBM)). These offer higher bandwidth and lower power by stacking DRAM chips on the processor or nearby on a silicon interposer. We will call these solutions “near-memory,” and if user-addressable, “scratchpad.” High-performance systems on the market now offer two levels of main memory: near-memory on package and traditional DRAM further away. In the near term we expect the latencies near-memory and DRAM to be similar. Thus, it is natural to think of near-memory as another module on the DRAM level of the memory hierarchy. Vendors are expected to offer modes in which the near memory is used as cache, but we believe that this will be inefficient. In this paper, we explore the design space for a user-controlled multi-level main memory. Our work identifies situations in which rewriting application kernels can provide significant performance gains when using near-memory. We present algorithms designed for two-level main memory, using divide-and-conquer to partition computations and streaming to exploit data locality. We consider algorithms for the fundamental application of sorting and for the data analysis kernel k-means. Our algorithms asymptotically reduce memory-block transfers under certain architectural parameter settings. We use and extend Sandia National Laboratories’ SST simulation capability to demonstrate the relationship between increased bandwidth and improved algorithmic performance. Memory access counts from simulations corroborate predicted performance improvements for our sorting algorithm. In contrast, the k-means algorithm is generally CPU bound and does not improve when using near-memory except under extreme conditions. These conditions require large instances that rule out SST simulation, but we demonstrate improvements by running on a customized machine with high and low bandwidth memory. These case studies in co-design serve as positive and cautionary templates, respectively, for the major task of optimizing the computational kernels of many fundamental applications for two-level main memory systems.

More Details

TYPE Journal Article YEAR 2017

DOI OSTI Scopus

The Portals 4.1 Network Programming Interface

Barrett, Brian; Brightwell, Ronald B.; Grant, Ryan; Hemmert, Karl S.; Foulk, James W.; Wheeler, Kyle; Underwood, Keith D.; Riesen, Rolf; Maccabe, Arthur B.; Hudson, Trammel

This report presents a specification for the Portals 4 networ k programming interface. Portals 4 is intended to allow scalable, high-performance network communication betwee n nodes of a parallel computing system. Portals 4 is well suited to massively parallel processing and embedded syste ms. Portals 4 represents an adaption of the data movement layer developed for massively parallel processing platfor ms, such as the 4500-node Intel TeraFLOPS machine. Sandia's Cplant cluster project motivated the development of Version 3.0, which was later extended to Version 3.3 as part of the Cray Red Storm machine and XT line. Version 4 is tar geted to the next generation of machines employing advanced network interface architectures that support enh anced offload capabilities.

More Details

TYPE SAND Report YEAR 2017

DOI OSTI

SST Update

Hemmert, Karl S.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2017

OSTI

HT Breakout Discussions Modeling and Performance Tools

Hemmert, Karl S.; Schulz, Martin

Abstract not provided.

More Details

TYPE Presentation YEAR 2017

OSTI

Unveiling the Interplay Between Global Link Arrangements and Network Management Algorithms on Dragonfly Networks

Kaplan, Fulya; Tuncer, Ozan; Leung, Vitus J.; Hemmert, Karl S.; Coskun, Aysek

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2017

DOI OSTI

Stalled Active and Idle (SAI): Characterizing Large-scale Dragonfly Networks

Groves, Taylor L.; Hammond, Simon; Hemmert, Karl S.; Grant, Ryan; Levenhagen, Michael; Arnold, Dorian

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2016

OSTI

(SAI) Stalled Active and Idle: Characterizing Power and Performance of Large-Scale Dragonfly Networks

Groves, Taylor L.; Grant, Ryan; Hemmert, Karl S.; Hammond, Simon; Levenhagen, Michael; Arnold, Dorian

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2016

DOI OSTI

Trinity: Architecture and Early Experience

Hemmert, Karl S.; Rajan, Mahesh; Hoekstra, Robert J.; Dawson, Shawn; Vigil, Manuel; Grunau, Daryl; Lujan, James; Morton, David; Nam, Hai A.; Peltz Jr., Paul; Torrez, Alfred; Wright, Cornell; Glass, Micheal W.; Hammond, Simon

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2016

OSTI

Publications

Search results