Publications Search

The Potentials and Perils of Multi-Level Memory

Rodrigues, Arun; Jayaraj, Jagan; Hammond, Simon; Voskuilen, Gwendolyn R.

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

Two-level main memory co-design: Multi-threaded algorithmic primitives, analysis, and simulation

Proceedings - 2015 IEEE 29th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2015

Bender, Michael A.; Berry, Jonathan; Hammond, Simon; Hemmert, Karl S.; McCauley, Samuel; Moore, Branden J.; Moseley, Benjamin; Phillips, Cynthia A.; Resnick, David R.; Rodrigues, Arun

A fundamental challenge for supercomputer architecture is that processors cannot be fed data from DRAM as fast as CPUs can consume it. Therefore, many applications are memory-bandwidth bound. As the number of cores per chip increases, and traditional DDR DRAM speeds stagnate, the problem is only getting worse. A variety of non-DDR 3D memory technologies (Wide I/O 2, HBM) offer higher bandwidth and lower power by stacking DRAM chips on the processor or nearby on a silicon interposer. However, such a packaging scheme cannot contain sufficient memory capacity for a node. It seems likely that future systems will require at least two levels of main memory: high-bandwidth, low-power memory near the processor and low-bandwidth, high-capacity memory further away. This near memory will probably not have significantly lower latency than the far memory. This, combined with the large size of the near memory (multiple GB) and power constraints, may make it difficult to treat it as a standard cache. In this paper, we explore some of the design space for a user-controlled multi-level main memory. We present algorithms designed for the heterogeneous bandwidth, using streaming to exploit data locality. We consider algorithms for the fundamental application of sorting. Our algorithms asymptotically reduce memory-block transfers under certain architectural parameter settings. We use and extend Sandia National Laboratories' SST simulation capability to demonstrate the relationship between increased bandwidth and improved algorithmic performance. Memory access counts from simulations corroborate predicted performance. This co-design effort suggests implementing two-level main memory systems may improve memory performance in fundamental applications.

TYPE Conference Poster YEAR 2015

DOI OSTI Scopus

Structural Simulation Toolkit. Lunch & Learn

Moore, Branden J.; Voskuilen, Gwendolyn R.; Rodrigues, Arun; Hammond, Simon; Hemmert, Karl S.

This is a presentation outlining a lunch and learn lecture for the Structural Simulation Toolkit, supported by Sandia National Laboratories.

TYPE Other Report YEAR 2015

DOI OSTI

The Potentials and Perils of Multi-Level Memory

Rodrigues, Arun; Jayaraj, Jagan; Hammond, Simon; Voskuilen, Gwendolyn R.

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

ASCR Computer Architecture Laboratory

Hammond, Simon; Ang, James A.; Rodrigues, Arun; Hemmert, Karl S.; Voskuilen, Gwendolyn R.; Cook, Jeanine

Abstract not provided.

TYPE Presentation YEAR 2015

OSTI

Ember: Reference Communication Patterns for Exascale

Hammond, Simon; Hemmert, Karl S.; Levenhagen, Michael; Rodrigues, Arun; Voskuilen, Gwendolyn R.

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

Structural Simulation Toolkit

Voskuilen, Gwendolyn R.; Hammond, Simon; Rodrigues, Arun; Moore, Branden J.; Hemmert, Karl S.

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

Trends in Microfabrication Capabilities & Device Architectures

Bauer, Todd M.; Jones, Adam; Lentine, Anthony L.; Mudrick, John P.; Okandan, Murat; Rodrigues, Arun

The last two decades have seen an explosion in worldwide R&D, enabling fundamentally new capabilities while at the same time changing the international technology landscape. The advent of technologies for continued miniaturization and electronics feature size reduction, and for architectural innovations, will have many technical, economic, and national security implications. It is important to anticipate possible microelectronics development directions and their implications for US national interests. This report forecasts and assesses trends and directions for several potentially disruptive microfabrication capabilities and device architectures that may emerge in the next 5-10 years.

TYPE Other Report YEAR 2015

DOI OSTI

The Potentials and Perils of Multi-Level Memory

Jayaraj, Jagan; Rodrigues, Arun; Hammond, Simon; Voskuilen, Gwendolyn R.

Abstract not provided.

TYPE Presentation YEAR 2015

OSTI

The Potential and Perils of Multi-Level Memory

Rodrigues, Arun; Jayaraj, Jagan; Hammond, Simon; Voskuilen, Gwendolyn R.

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

Sandia's Open Source Co-Design Capabilities

Ang, James A.; Foulk, James W.; Hemmert, Karl S.; Hammond, Simon; Rodrigues, Arun

Abstract not provided.

TYPE Presentation YEAR 2015

OSTI

Simulation & Co-Design for HPC

Rodrigues, Arun

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

Design methodology for optimizing optical interconnection networks in high performance systems

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Rumley, Sebastien; Glick, Madeleine; Hammond, Simon; Rodrigues, Arun; Bergman, Keren

Modern high performance computers connect hundreds of thousands of endpoints and employ thousands of switches. This allows for a great deal of freedom in the design of the network topology. At the same time, due to the sheer numbers and complexity involved, it becomes more challenging to easily distinguish between promising and improper designs. With ever increasing line rates and advances in optical interconnects, there is a need for renewed design methodologies that comprehensively capture the requirements and expose tradeoffs expeditiously in this complex design space. We introduce a systematic approach, based on Generalized Moore Graphs, allowing one to quickly gauge the ideal level of connectivity required for a given number of end-points and traffic hypothesis, and to collect insight on the role of the switch radix in the topology cost. Based on this approach, we present a methodology for the identification of Pareto-optimal topologies. We apply our method to a practical case with 25,000 nodes and present the results.

TYPE Conference Poster YEAR 2015

OSTI Scopus

The Structural Simulation Toolkit

Rodrigues, Arun; Moore, Branden J.; Hammond, Simon; Hemmert, Karl S.

Abstract not provided.

TYPE Presentation YEAR 2014

DOI OSTI

Abstract machine models and proxy architectures for exascale computing

Ang, James A.; Barrett, Richard F.; Benner, Robert E.; Burke, D.; Chan, C.; Donofrio, David; Hammond, Simon; Hemmert, Karl S.; Kelly, Suzanne M.; Le, H.; Leung, Vitus J.; Resnick, David R.; Rodrigues, Arun; Shalf, John; Stark, Dylan T.; Unat, Didem; Wright, N.J.

Abstract not provided.

TYPE Conference Poster YEAR 2014

DOI OSTI

Using a complementary emulation-simulation co-design approach to assess application readiness for Processing-in-Memory systems

Proceedings of Co-HPC 2014: 1st International Workshop on Hardware-Software Co-Design for High Performance Computing - Held in Conjunction with SC 2014: The International Conference for High Performance Computing, Networking, Storage and Analysis

Stelle, George W.; Olivier, Stephen L.; Stark, Dylan T.; Rodrigues, Arun; Hemmert, Karl S.

Disruptive changes to computer architecture are paving the way toward extreme scale computing. The co-design strategy of collaborative research and development among computer architects, system software designers, and application teams can help to ensure that applications not only cope but thrive with these changes. In this paper, we present a novel combined co-design approach of emulation and simulation in the context of investigating future Processing in Memory (PIM) architectures. PIM enables co-location of data and computation to decrease data movement, to provide increases in memory speed and capacity compared to existing technologies and, perhaps most importantly for extreme scale, to improve energy efficiency. Our evaluation of PIM focuses on three mini-applications representing important production applications. The emulation and simulation studies examine the effects of locality-aware versus locality-oblivious data distribution and computation, and they compare PIM to conventional architectures. Both studies contribute in their own way to the overall understanding of the application-architecture interactions, and our results suggest that PIM technology shows great potential for efficient computation without negatively impacting productivity.

TYPE Conference Poster YEAR 2014

DOI OSTI Scopus