Publications Search

Trinity: Opportunities and Challenges of a Heterogeneous System

Hemmert, Karl S.; Moore, Stan G.; Gallis, Michael A.; Davis, Mike E.; Levesque, John; Hjelm, Nathan; Lujan, James; Morton, David; Nam, Hai A.; Parga, Alex; Peltz Jr., Paul; Shipman, Galen; Torrez, Alfred

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2018

OSTI

Structural Simulation Toolkit (SST) Tutorial

Hammond, Simon; Rodrigues, Arun; Voskuilen, Gwendolyn R.; Hemmert, Karl S.; Levenhagen, Michael; Hughes, Clayton; Hoekstra, Robert J.

Abstract not provided.

More Details

TYPE Presentation YEAR 2018

OSTI

Trinity: Opportunities and Challenges of a Heterogeneous System

Hemmert, Karl S.; Moore, Stan G.; Gallis, Michael A.; Davis, Mike E.; Levesque, John; Hjelm, Nathan; Lujan, James; Morton, David; Nam, Hai A.; Parga, Alex; Peltz Jr., Paul; Shipman, Galen; Torrez, Alfred

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2018

OSTI

Trinity Update: Open Science Burst Buffers Intel Xeon Phi Processor Plans

Hemmert, Karl S.

Abstract not provided.

More Details

TYPE Presentation YEAR 2018

OSTI

Merlin Element Library Deep Dive

Hemmert, Karl S.

Abstract not provided.

More Details

TYPE Presentation YEAR 2018

OSTI

Analyzing Exascale Memory Architectures Using the SST Toolkit

Hughes, Clayton; Awad, Amro; Hammond, Simon; Rodrigues, Arun; Hemmert, Karl S.; Hoekstra, Robert J.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2018

OSTI

Interconnect Working Group

Hemmert, Karl S.; Bair, Ray; Bhatele, Abhinav; Groves, Taylor; Hammond, Simon; Jain, Nikhil; Levenhagen, Michael; Mubarak, Misbah; Pakin, Scott; Ross, Rob; Wilke, Jeremiah

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2018

OSTI

SST Simulation Framework (and Complex Memory)

Hammond, Simon; Hughes, Clayton; Awad, Amro; Voskuilen, Gwendolyn R.; Rodrigues, Arun; Hemmert, Karl S.; Levenhagen, Michael; Hoekstra, Robert J.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2018

OSTI

Towards a Scalable Integrated Simulation Framework for Extreme Heterogeneity in High Performance Computing

Hammond, Simon; Rodrigues, Arun; Hemmert, Karl S.; Voskuilen, Gwendolyn R.; Hughes, Clayton; Levenhagen, Michael; Hoekstra, Robert J.; Ang, James A.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2017

OSTI

Final Review of FY17 ASC CSSE L2 Milestone #6018 entitled "Analyzing Power Usage Characteristics of Workloads Running on Trinity"

Hoekstra, Robert J.; Hammond, Simon; Hemmert, Karl S.; Gentile, Ann C.; Oldfield, Ron; Lang, Mike; Martin, Steve

The presentation documented the technical approach of the team and summary of the results with sufficient detail to demonstrate both the value and the completion of the milestone. A separate SAND report was also generated with more detail to supplement the presentation.

More Details

TYPE Other Report YEAR 2017

DOI OSTI

Vanguard: Maturing the ARM Software Ecosystem for U.S. DOE Supercomputing

Foulk, James W.; Foulk, James W.; Grant, Ryan; Hammond, Simon; Hemmert, Karl S.; Martinez, David; Noe, John P.; Foulk, James W.; Ward, Harry L.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2017

OSTI

Sandia's ARM-centric Co-Design Strategy: Introduction to the NNSA/ASC Vanguard Project

Ang, James A.; Brightwell, Ronald B.; Hammond, Simon; Hemmert, Karl S.; Hoekstra, Robert J.; Foulk, James W.; Foulk, James W.; Rodrigues, Arun

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2017

OSTI

Trinity Architecture

Hemmert, Karl S.

Abstract not provided.

More Details

TYPE Presentation YEAR 2017

OSTI

Performance Analysis for Using Non-Volatile Memory DIMMs: Opportunities and Challenges

Awad, Amro; Hammond, Simon; Hughes, Clayton; Rodrigues, Arun; Hemmert, Karl S.; Hoekstra, Robert J.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2017

DOI OSTI

Unveiling the Interplay between Global Link Arrangements and Network Management Algorithms on Dragonfly Networks

Proceedings - 2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGRID 2017

Kaplan, Fulya; Tuncer, Ozan; Leung, Vitus J.; Hemmert, Karl S.; Coskun, Ayse K.

Network messaging delay historically constitutes a large portion of the wall-clock time for High Performance Computing (HPC) applications, as these applications run on many nodes and involve intensive communication among their tasks. Dragonfly network topology has emerged as a promising solution for building exascale HPC systems owing to its low network diameter and large bisection bandwidth. Dragonfly includes local links that form groups and global links that connect these groups via high bandwidth optical links. Many aspects of the dragonfly network design are yet to be explored, such as the performance impact of the connectivity of the global links, i.e., global link arrangements, the bandwidth of the local and global links, or the job allocation algorithm. This paper first introduces a packet-level simulation framework to model the performance of HPC applications in detail. The proposed framework is able to simulate known MPI (message passing interface) routines as well as applications with custom-defined communication patterns for a given job placement algorithm and network topology. Using this simulation framework, we investigate the coupling between global link bandwidth and arrangements, communication pattern and intensity, job allocation and task mapping algorithms, and routing mechanisms in dragonfly topologies. We demonstrate that by choosing the right combination of system settings and workload allocation algorithms, communication overhead can be decreased by up to 44%. We also show that circulant arrangement provides up to 15% higher bisection bandwidth compared to the other arrangements, but for realistic workloads, the performance impact of link arrangements is less than 3%.

More Details

TYPE Conference Poster YEAR 2017

DOI OSTI Scopus

Structural Simulation Toolkit (SST)

Rodrigues, Arun; Moore, Branden J.; Hammond, Simon; Hemmert, Karl S.; Voskuilen, Gwendolyn R.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2017

OSTI

Performance Analysis for Using Non-Volatile Memory DIMMs: Opportunities and Challenges

Awad, Amro; Hammond, Simon; Hughes, Clayton; Rodrigues, Arun; Hemmert, Karl S.; Hoekstra, Robert J.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2017

DOI OSTI

Two-level main memory co-design: Multi-threaded algorithmic primitives, analysis, and simulation

Journal of Parallel and Distributed Computing

Berry, Jonathan; Bender, Michael A.; Hammond, Simon; Hemmert, Karl S.; Mccauley, Samuel; Moore, Branden J.; Moseley, Benjamin; Phillips, Cynthia A.; Resnick, David R.; Rodrigues, Arun

A challenge in computer architecture is that processors often cannot be fed data from DRAM as fast as CPUs can consume it. Therefore, many applications are memory-bandwidth bound. With this motivation and the realization that traditional architectures (with all DRAM reachable only via bus) are insufficient to feed groups of modern processing units, vendors have introduced a variety of non-DDR 3D memory technologies (Hybrid Memory Cube (HMC),Wide I/O 2, High Bandwidth Memory (HBM)). These offer higher bandwidth and lower power by stacking DRAM chips on the processor or nearby on a silicon interposer. We will call these solutions “near-memory,” and if user-addressable, “scratchpad.” High-performance systems on the market now offer two levels of main memory: near-memory on package and traditional DRAM further away. In the near term we expect the latencies near-memory and DRAM to be similar. Thus, it is natural to think of near-memory as another module on the DRAM level of the memory hierarchy. Vendors are expected to offer modes in which the near memory is used as cache, but we believe that this will be inefficient. In this paper, we explore the design space for a user-controlled multi-level main memory. Our work identifies situations in which rewriting application kernels can provide significant performance gains when using near-memory. We present algorithms designed for two-level main memory, using divide-and-conquer to partition computations and streaming to exploit data locality. We consider algorithms for the fundamental application of sorting and for the data analysis kernel k-means. Our algorithms asymptotically reduce memory-block transfers under certain architectural parameter settings. We use and extend Sandia National Laboratories’ SST simulation capability to demonstrate the relationship between increased bandwidth and improved algorithmic performance. Memory access counts from simulations corroborate predicted performance improvements for our sorting algorithm. In contrast, the k-means algorithm is generally CPU bound and does not improve when using near-memory except under extreme conditions. These conditions require large instances that rule out SST simulation, but we demonstrate improvements by running on a customized machine with high and low bandwidth memory. These case studies in co-design serve as positive and cautionary templates, respectively, for the major task of optimizing the computational kernels of many fundamental applications for two-level main memory systems.

More Details

TYPE Journal Article YEAR 2017

DOI OSTI Scopus

The Portals 4.1 Network Programming Interface

Barrett, Brian; Brightwell, Ronald B.; Grant, Ryan; Hemmert, Karl S.; Foulk, James W.; Wheeler, Kyle; Underwood, Keith D.; Riesen, Rolf; Maccabe, Arthur B.; Hudson, Trammel

This report presents a specification for the Portals 4 networ k programming interface. Portals 4 is intended to allow scalable, high-performance network communication betwee n nodes of a parallel computing system. Portals 4 is well suited to massively parallel processing and embedded syste ms. Portals 4 represents an adaption of the data movement layer developed for massively parallel processing platfor ms, such as the 4500-node Intel TeraFLOPS machine. Sandia's Cplant cluster project motivated the development of Version 3.0, which was later extended to Version 3.3 as part of the Cray Red Storm machine and XT line. Version 4 is tar geted to the next generation of machines employing advanced network interface architectures that support enh anced offload capabilities.

More Details

TYPE SAND Report YEAR 2017

DOI OSTI

SST Update

Hemmert, Karl S.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2017

OSTI

HT Breakout Discussions Modeling and Performance Tools

Hemmert, Karl S.; Schulz, Martin

Abstract not provided.

More Details

TYPE Presentation YEAR 2017

OSTI

Unveiling the Interplay Between Global Link Arrangements and Network Management Algorithms on Dragonfly Networks

Kaplan, Fulya; Tuncer, Ozan; Leung, Vitus J.; Hemmert, Karl S.; Coskun, Aysek

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2017

DOI OSTI

Stalled Active and Idle (SAI): Characterizing Large-scale Dragonfly Networks

Groves, Taylor L.; Hammond, Simon; Hemmert, Karl S.; Grant, Ryan; Levenhagen, Michael; Arnold, Dorian

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2016

OSTI

(SAI) Stalled Active and Idle: Characterizing Power and Performance of Large-Scale Dragonfly Networks

Groves, Taylor L.; Grant, Ryan; Hemmert, Karl S.; Hammond, Simon; Levenhagen, Michael; Arnold, Dorian

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2016

DOI OSTI

Trinity: Architecture and Early Experience

Hemmert, Karl S.; Rajan, Mahesh; Hoekstra, Robert J.; Dawson, Shawn; Vigil, Manuel; Grunau, Daryl; Lujan, James; Morton, David; Nam, Hai A.; Peltz Jr., Paul; Torrez, Alfred; Wright, Cornell; Glass, Micheal W.; Hammond, Simon

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2016

OSTI

Publications

Search results