Simulating HPC systems is a difficult task and the emergence of “Beyond CMOS” architectures and execution models will increase that difficulty. This document presents a “tutorial” on some of the simulation challenges faced by conventional and non-conventional architectures (Section 1) and goals and requirements for simulating Beyond CMOS systems (Section 2). These provide background for proposed short- and long-term roadmaps for simulation efforts at Sandia (Sections 3 and 4). Additionally, a brief explanation of a proof-of-concept integration of a Beyond CMOS architectural simulator is presented (Section 2.3).

TYPE Other Report YEAR 2017

DOI OSTI

Sandia's ARM-centric Co-Design Strategy: Introduction to the NNSA/ASC Vanguard Project

Ang, James A.; Brightwell, Ronald B.; Hammond, Simon; Hemmert, Karl S.; Hoekstra, Robert J.; Foulk, James W.; Foulk, James W.; Rodrigues, Arun

Abstract not provided.

TYPE Conference Poster YEAR 2017

OSTI

SST Modsim 2017

Rodrigues, Arun

Abstract not provided.

TYPE Conference Poster YEAR 2017

OSTI

Performance Analysis for Using Non-Volatile Memory DIMMs: Opportunities and Challenges

Awad, Amro; Hammond, Simon; Hughes, Clayton; Rodrigues, Arun; Hemmert, Karl S.; Hoekstra, Robert J.

Abstract not provided.

TYPE Conference Poster YEAR 2017

DOI OSTI

The Impact of Increasing Memory System Diversity on Applications

Voskuilen, Gwendolyn R.; Rodrigues, Arun; Frank, Michael P.; Hammond, Simon

Abstract not provided.

TYPE Presentation YEAR 2017

OSTI

Structural Simulation Toolkit (SST)

Rodrigues, Arun; Moore, Branden J.; Hammond, Simon; Hemmert, Karl S.; Voskuilen, Gwendolyn R.

Abstract not provided.

TYPE Conference Poster YEAR 2017

OSTI

Sandia's ARM-centric Co-Design Strategy

Ang, James A.; Hammond, Simon; Hoekstra, Robert J.; Foulk, James W.; Rodrigues, Arun

Abstract not provided.

TYPE Conference Poster YEAR 2017

OSTI

The Impact of Increasing Memory System Diversity on Applications

Voskuilen, Gwendolyn R.; Rodrigues, Arun; Frank, Michael P.; Hammond, Simon

Abstract not provided.

TYPE Conference Poster YEAR 2017

OSTI

Two-level main memory co-design: Multi-threaded algorithmic primitives, analysis, and simulation

Journal of Parallel and Distributed Computing

Berry, Jonathan; Bender, Michael A.; Hammond, Simon; Hemmert, Karl S.; Mccauley, Samuel; Moore, Branden J.; Moseley, Benjamin; Phillips, Cynthia A.; Resnick, David R.; Rodrigues, Arun

A challenge in computer architecture is that processors often cannot be fed data from DRAM as fast as CPUs can consume it. Therefore, many applications are memory-bandwidth bound. With this motivation and the realization that traditional architectures (with all DRAM reachable only via bus) are insufficient to feed groups of modern processing units, vendors have introduced a variety of non-DDR 3D memory technologies (Hybrid Memory Cube (HMC), Wide I/O 2, High Bandwidth Memory (HBM)). These offer higher bandwidth and lower power by stacking DRAM chips on the processor or nearby on a silicon interposer. We will call these solutions “near-memory,” and if user-addressable, “scratchpad.” High-performance systems on the market now offer two levels of main memory: near-memory on package and traditional DRAM further away. In the near term we expect the latencies of near-memory and DRAM to be similar. Thus, it is natural to think of near-memory as another module on the DRAM level of the memory hierarchy. Vendors are expected to offer modes in which the near-memory is used as cache, but we believe that this will be inefficient. In this paper, we explore the design space for a user-controlled multi-level main memory. Our work identifies situations in which rewriting application kernels can provide significant performance gains when using near-memory. We present algorithms designed for two-level main memory, using divide-and-conquer to partition computations and streaming to exploit data locality. We consider algorithms for the fundamental application of sorting and for the data analysis kernel k-means. Our algorithms asymptotically reduce memory-block transfers under certain architectural parameter settings. We use and extend Sandia National Laboratories’ SST simulation capability to demonstrate the relationship between increased bandwidth and improved algorithmic performance.
Memory access counts from simulations corroborate predicted performance improvements for our sorting algorithm. In contrast, the k-means algorithm is generally CPU bound and does not improve when using near-memory except under extreme conditions. These conditions require large instances that rule out SST simulation, but we demonstrate improvements by running on a customized machine with high and low bandwidth memory. These case studies in co-design serve as positive and cautionary templates, respectively, for the major task of optimizing the computational kernels of many fundamental applications for two-level main memory systems.

TYPE Journal Article YEAR 2017

DOI OSTI Scopus

Messier: A Detailed NVM-Based DIMM Model for the SST Simulation Framework

Awad, Amro; Voskuilen, Gwendolyn R.; Rodrigues, Arun; Hammond, Simon; Hoekstra, Robert J.; Hughes, Clayton

DRAM technology is the main building block of main memory; however, DRAM scaling is becoming very challenging. The main issues for DRAM scaling are the increasing error rates with each new generation, the geometric and physical constraints of scaling the capacitor part of the DRAM cells, and the high power consumption caused by the continuous need for refreshing cell values. At the same time, Non-Volatile Memory (NVM) technologies, such as Phase-Change Memory (PCM), are emerging as promising replacements for DRAM. Unlike current technologies such as NAND-based flash, NVMs have latencies comparable to DRAM. Additionally, NVMs are non-volatile, which eliminates the need for refresh power and enables persistent memory applications. Finally, NVMs have promising densities and the potential for multi-level cell (MLC) storage.

TYPE SAND Report YEAR 2017

DOI OSTI

NNSA Applications and Multi-level Memory

Rodrigues, Arun; Voskuilen, Gwendolyn R.; Frank, Michael P.; Hammond, Simon

Abstract not provided.

TYPE Presentation YEAR 2016

OSTI

ECP Architectural Simulation

Hoekstra, Robert J.; Rodrigues, Arun

Abstract not provided.

TYPE Presentation YEAR 2016

OSTI

Analyzing allocation behavior for multi-level memory

ACM International Conference Proceeding Series

Voskuilen, Gwendolyn R.; Rodrigues, Arun; Hammond, Simon

Managing multi-level memories will require different policies from those used for cache hierarchies, as memory technologies differ in latency, bandwidth, and volatility. To this end we analyze application data allocations and main memory accesses to determine whether an application-driven approach to managing a multi-level memory system comprising stacked and conventional DRAM is viable. Our early analysis shows that the approach is viable, but some applications may require dynamic allocations (i.e., migration) while others are amenable to static allocation.

TYPE Conference Poster YEAR 2016

DOI OSTI Scopus

Multi-Level Memory: What You Add Is More Important Than What You Take Out

Rodrigues, Arun; Voskuilen, Gwendolyn R.; Hammond, Simon

Abstract not provided.

TYPE Conference Poster YEAR 2016

OSTI

ASC L2 Milestone - Evaluation of Opportunities for Multi-Level Memory

Voskuilen, Gwendolyn R.; Rodrigues, Arun; Frank, Michael P.; Hammond, Simon

Abstract not provided.

TYPE Conference Poster YEAR 2016

OSTI

Evaluating the Opportunities for Multi-Level Memory – An ASC 2016 L2 Milestone

Voskuilen, Gwendolyn R.; Rodrigues, Arun; Frank, Michael P.; Hammond, Simon

The next two Advanced Technology platforms for the ASC program will feature complex memory hierarchies. In the Trinity supercomputer being deployed in 2016, Intel’s Knights Landing processors will feature 16 GB of on-package, high-bandwidth memory combined with a larger-capacity DDR4 memory. In 2018, the Sierra machine deployed at Lawrence Livermore National Laboratory will feature powerful compute nodes containing POWER9 processors with large-capacity memories and an array of coherent GPU accelerators, also with high-bandwidth memories.

TYPE SAND Report YEAR 2016

DOI OSTI

Analyzing allocation behavior for multi-level memory

Voskuilen, Gwendolyn R.; Rodrigues, Arun; Hammond, Simon

Abstract not provided.

TYPE Conference Poster YEAR 2016

DOI OSTI

Optical networks for high-performance computing: Promises and perils

5th IEEE Photonics Society Optical Interconnects Conference, OI 2016

Rodrigues, Arun

Optical networks hold great promise for improving the performance of supercomputers, yet they have always proven just out of reach. This talk will examine the potential of optical interconnects, barriers to adoption, and possible solutions from hardware/software co-design.

TYPE Conference Poster YEAR 2016

DOI OSTI Scopus

Building a Simulator

Rodrigues, Arun; Moore, Branden J.

Abstract not provided.

TYPE Presentation YEAR 2016

OSTI
