Publications Search

Trinity: Architecture and Early Experience

Hemmert, Karl S.; Rajan, Mahesh; Hoekstra, Robert J.; Dawson, Shawn; Vigil, Manuel; Grunau, Daryl; Lujan, James; Morton, David; Nam, Hai A.; Peltz Jr., Paul; Torrez, Alfred; Wright, Cornell; Glass, Micheal W.; Hammond, Simon

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2016

OSTI

Trinity: Architecture and Early Experience

Hemmert, Karl S.; Rajan, Mahesh; Hoekstra, Robert J.; Dawson, Shawn; Vigil, Manuel; Grunau, Daryl; Lujan, James; Morton, David; Nam, Hai A.; Peltz Jr., Paul; Torrez, Alfred; Wright, Cornell

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2016

OSTI

Miranda: a lightweight processor

Voskuilen, Gwendolyn R.; Moore, Branden J.; Rodrigues, Arun; Hemmert, Karl S.

Abstract not provided.

More Details

TYPE Presentation YEAR 2016

OSTI

Structural Simulation Toolkit (SST)

Rodrigues, Arun; Voskuilen, Gwendolyn R.; Hammond, Simon; Hemmert, Karl S.

Abstract not provided.

More Details

TYPE Presentation YEAR 2016

OSTI

Multi-Level Memory ? The Next Opportunity for Performance?

Hammond, Simon; Voskuilen, Gwendolyn R.; Rodrigues, Arun; Hemmert, Karl S.; Trott, Christian R.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2016

OSTI

Performance and Modeling Tools (SST & PerfMiner)

Rodrigues, Arun; Cook, Jeanine; Hammond, Simon; Hemmert, Karl S.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2015

OSTI

Two-level main memory co-design: Multi-threaded algorithmic primitives, analysis, and simulation

Proceedings - 2015 IEEE 29th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2015

Bender, Michael A.; Berry, Jonathan; Hammond, Simon; Hemmert, Karl S.; Mccauley, Samuel; Moore, Branden J.; Moseley, Benjamin; Phillips, Cynthia A.; Resnick, David R.; Rodrigues, Arun

A fundamental challenge for supercomputer architecture is that processors cannot be fed data from DRAM as fast as CPUs can consume it. Therefore, many applications are memory-bandwidth bound. As the number of cores per chip increases, and traditional DDR DRAM speeds stagnate, the problem is only getting worse. A variety of non-DDR 3D memory technologies (Wide I/O 2, HBM) offer higher bandwidth and lower power by stacking DRAM chips on the processor or nearby on a silicon interposer. However, such a packaging scheme cannot contain sufficient memory capacity for a node. It seems likely that future systems will require at least two levels of main memory: high-bandwidth, low-power memory near the processor and low-bandwidth high-capacity memory further away. This near memory will probably not have significantly faster latency than the far memory. This, combined with the large size of the near memory (multiple GB) and power constraints, may make it difficult to treat it as a standard cache. In this paper, we explore some of the design space for a user-controlled multi-level main memory. We present algorithms designed for the heterogeneous bandwidth, using streaming to exploit data locality. We consider algorithms for the fundamental application of sorting. Our algorithms asymptotically reduce memory-block transfers under certain architectural parameter settings. We use and extend Sandia National Laboratories' SST simulation capability to demonstrate the relationship between increased bandwidth and improved algorithmic performance. Memory access counts from simulations corroborate predicted performance. This co-design effort suggests implementing two-level main memory systems may improve memory performance in fundamental applications.

More Details

TYPE Conference Poster YEAR 2015

DOI OSTI Scopus

Structural Simulation Toolkit. Lunch & Learn

Moore, Branden J.; Voskuilen, Gwendolyn R.; Rodrigues, Arun; Hammond, Simon; Hemmert, Karl S.

This is a presentation outlining a lunch and learn lecture for the Structural Simulation Toolkit, supported by Sandia National Laboratories.

More Details

TYPE Other Report YEAR 2015

DOI OSTI

ASCR Computer Architecture Laboratory

Hammond, Simon; Ang, James A.; Rodrigues, Arun; Hemmert, Karl S.; Voskuilen, Gwendolyn R.; Cook, Jeanine

Abstract not provided.

More Details

TYPE Presentation YEAR 2015

OSTI

Trinity Platform Overview

Hemmert, Karl S.

Abstract not provided.

More Details

TYPE Presentation YEAR 2015

OSTI

Ember: Reference Communication Patterns for Exascale

Hammond, Simon; Hemmert, Karl S.; Levenhagen, Michael; Rodrigues, Arun; Voskuilen, Gwendolyn R.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2015

OSTI

Structural Simulation Toolkit

Voskuilen, Gwendolyn R.; Hammond, Simon; Rodrigues, Arun; Moore, Branden J.; Hemmert, Karl S.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2015

OSTI

Sandia?s Open Source Co-Design Capabilities

Ang, James A.; Foulk, James W.; Hemmert, Karl S.; Hammond, Simon; Rodrigues, Arun

Abstract not provided.

More Details

TYPE Presentation YEAR 2015

OSTI

DOE's Fast Forward and Design Forward R&D Projects: Influence Exascale Hardware

Ang, James A.; Hammond, Simon; Hemmert, Karl S.; Foulk, James W.

Abstract not provided.

More Details

TYPE Presentation YEAR 2015

OSTI

An evaluation of MPI message rate on hybrid-core processors

International Journal of High Performance Computing Applications

Brightwell, Ronald B.; Barrett, Brian W.; Grant, Ryan; Hammond, Simon; Hemmert, Karl S.

Power and energy concerns are motivating chip manufacturers to consider future hybrid-core processor designs that may combine a small number of traditional cores optimized for single-thread performance with a large number of simpler cores optimized for throughput performance. This trend is likely to impact the way in which compute resources for network protocol processing functions are allocated and managed. In particular, the performance of MPI match processing is critical to achieving high message throughput. In this paper, we analyze the ability of simple and more complex cores to perform MPI matching operations for various scenarios in order to gain insight into how MPI implementations for future hybrid-core processors should be designed.

More Details

TYPE Journal Article YEAR 2014

DOI OSTI Scopus

The Portals 4.0.2 Networking Programming Interface

Barrett, Brian W.; Brightwell, Ronald B.; Grant, Ryan; Hemmert, Karl S.; Foulk, James W.; Wheeler, Kyle B.; Underwood, Keith D.; Riesen, Rolf; Maccabe, Arthur B.; Hudson, Trammell

This report presents a specification for the Portals 4 network programming interface. Portals 4 is intended to allow scalable, high-performance network communication between nodes of a parallel computing system. Portals 4 is well suited to massively parallel processing and embedded systems. Portals 4 represents an adaption of the data movement layer developed for massively parallel processing platforms, such as the 4500-node Intel TeraFLOPS machine. Sandia's Cplant cluster project motivated the development of Version 3.0, which was later extended to Version 3.3 as part of the Cray Red Storm machine and XT line. Version 4 is targeted to the next generation of machines employing advanced network interface architectures that support enhanced offload capabilities.

More Details

TYPE SAND Report YEAR 2014

DOI OSTI

The Structural Simulation Toolkit

Rodrigues, Arun; Moore, Branden J.; Hammond, Simon; Hemmert, Karl S.

Abstract not provided.

More Details

TYPE Presentation YEAR 2014

DOI OSTI

Abstract machine models and proxy architectures for exascale computing

Ang, James A.; Barrett, Richard F.; Benner, Robert E.; Burke, D.; Chan, C.; Donofrio, David; Hammond, Simon; Hemmert, Karl S.; Kelly, Suzanne M.; Le, H.; Leung, Vitus J.; Resnick, David R.; Rodrigues, Arun; Shalf, John; Stark, Dylan T.; Unat, Didem; Wright, N.J.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2014

DOI OSTI

The Sandia Co-design Ecosystem

Hemmert, Karl S.

Abstract not provided.

More Details

TYPE Presentation YEAR 2014

OSTI

Using a complementary emulation-simulation co-design approach to assess application readiness for Processing-in-Memory systems

Proceedings of Co-HPC 2014: 1st International Workshop on Hardware-Software Co-Design for High Performance Computing - Held in Conjunction with SC 2014: The International Conference for High Performance Computing, Networking, Storage and Analysis

Stelle, George W.; Olivier, Stephen L.; Stark, Dylan T.; Rodrigues, Arun; Hemmert, Karl S.

Disruptive changes to computer architecture are paving the way toward extreme scale computing. The co-design strategy of collaborative research and development among computer architects, system software designers, and application teams can help to ensure that applications not only cope but thrive with these changes. In this paper, we present a novel combined co-design approach of emulation and simulation in the context of investigating future Processing in Memory (PIM) architectures. PIM enables co-location of data and computation to decrease data movement, to provide increases in memory speed and capacity compared to existing technologies and, perhaps most importantly for extreme scale, to improve energy efficiency. Our evaluation of PIM focuses on three mini-applications representing important production applications. The emulation and simulation studies examine the effects of locality-aware versus locality-oblivious data distribution and computation, and they compare PIM to conventional architectures. Both studies contribute in their own way to the overall understanding of the application-architecture interactions, and our results suggest that PIM technology shows great potential for efficient computation without negatively impacting productivity.

More Details

TYPE Conference Poster YEAR 2014

DOI OSTI Scopus

XGC Overview

Hemmert, Karl S.

Abstract not provided.

More Details

TYPE Presentation YEAR 2013

OSTI

Extreme-scale Computing Grand Challenge LDRD (XGC)

Hemmert, Karl S.; Barrett, Brian; Barrett, Richard F.; Lentine, Anthony L.; Rodrigues, Arun; Denton-Hill, Kim M.

Abstract not provided.

More Details

TYPE Presentation YEAR 2013

OSTI

The Impact of Hybrid-Core Processors on MPI Message Rate

Barrett, Brian; Brightwell, Ronald B.; Hemmert, Karl S.; Hammond, Simon

Abstract not provided.

More Details

TYPE Conference YEAR 2013

OSTI

The portals 4.0.1 network programming interface

Barrett, Brian; Brightwell, Ronald B.; Pedretti, Kevin; Hemmert, Karl S.

This report presents a specification for the Portals 4.0 network programming interface. Portals 4.0 is intended to allow scalable, high-performance network communication between nodes of a parallel computing system. Portals 4.0 is well suited to massively parallel processing and embedded systems. Portals 4.0 represents an adaption of the data movement layer developed for massively parallel processing platforms, such as the 4500-node Intel TeraFLOPS machine. Sandias Cplant cluster project motivated the development of Version 3.0, which was later extended to Version 3.3 as part of the Cray Red Storm machine and XT line. Version 4.0 is targeted to the next generation of machines employing advanced network interface architectures that support enhanced offload capabilities. 3

More Details

TYPE SAND Report YEAR 2013

DOI OSTI

Interconnect Challenges at Exascale

Hemmert, Karl S.

Abstract not provided.

More Details

TYPE Presentation YEAR 2013

OSTI

Publications

Search results