Publications

12 Results

Abstract Machine Models and Proxy Architectures for Exascale Computing

Ang, James A.; Barrett, Richard F.; Benner, R.E.; Burke, Daniel; Chan, Cy; Cook, Jeanine C.; Daley, Christopher S.; Donofrio, David; Hammond, Simon D.; Hemmert, Karl S.; Hoekstra, Robert J.; Ibrahim, Khaled; Kelly, Suzanne M.; Le, Hoang; Leung, Vitus J.; Michelogiannakis, George; Resnick, David R.; Rodrigues, Arun; Shalf, John; Stark, Dylan; Unat, D.; Wright, Nick J.; Voskuilen, Gwendolyn R.

To achieve exascale computing, fundamental hardware architectures must change. The most significant consequence of this assertion is the impact on the scientific and engineering applications that run on current high performance computing (HPC) systems, many of which codify years of scientific domain knowledge and refinements for contemporary computer systems. In order to adapt to exascale architectures, developers must be able to reason about new hardware and determine what programming models and algorithms will provide the best blend of performance and energy efficiency into the future. While many details of the exascale architectures are undefined, an abstract machine model is designed to allow application developers to focus on the aspects of the machine that are important or relevant to performance and code structure. These models are intended as communication aids between application developers and hardware architects during the co-design process. We use the term proxy architecture to describe a parameterized version of an abstract machine model, with the parameters added to elucidate potential speeds and capacities of key hardware components. These more detailed architectural models are formulated to enable discussion between the developers of analytic models and simulators and computer hardware architects. They allow for application performance analysis and hardware optimization opportunities. In this report our goal is to provide the application development community with a set of models that can help software developers prepare for exascale. In addition, through the use of proxy architectures, we can enable a more concrete exploration of how well new and evolving application codes map onto future architectures. This second version of the document addresses system scale considerations and provides a system-level abstract machine model with proxy architecture information.
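
To make the notion of a parameterized abstract machine model concrete, the following is a minimal sketch of how a node-level proxy architecture might be expressed in code. The field names and example values are illustrative assumptions for this sketch only; they are not parameters or figures taken from the report.

```c
/* Sketch: a node-level proxy architecture as a parameterized record.
 * All field names and numbers below are hypothetical placeholders. */
#include <stdio.h>

typedef struct {
    const char *name;        /* label for the proxy architecture        */
    int    cores_per_node;   /* number of compute cores                 */
    double core_ghz;         /* nominal core clock rate (GHz)           */
    double near_mem_gb;      /* on-package (near) memory capacity (GB)  */
    double near_mem_gbs;     /* near-memory bandwidth (GB/s)            */
    double far_mem_gb;       /* off-package DRAM capacity (GB)          */
    double far_mem_gbs;      /* DRAM bandwidth (GB/s)                   */
    double nic_gbs;          /* network injection bandwidth (GB/s)      */
} proxy_arch;

int main(void) {
    /* Hypothetical example node; every value is a placeholder. */
    proxy_arch node = {"example-node", 64, 1.5, 16.0, 500.0, 128.0, 100.0, 25.0};

    /* One derived quantity an application developer might reason about:
     * near-memory bandwidth available per core. */
    printf("%s: %.2f GB/s of near-memory bandwidth per core\n",
           node.name, node.near_mem_gbs / node.cores_per_node);
    return 0;
}
```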

Two-level main memory co-design: Multi-threaded algorithmic primitives, analysis, and simulation

Journal of Parallel and Distributed Computing

Berry, Jonathan W.; Bender, Michael A.; Hammond, Simon D.; Hemmert, Karl S.; Mccauley, Samuel; Moore, Branden J.; Moseley, Benjamin; Phillips, Cynthia A.; Resnick, David R.; Rodrigues, Arun

A challenge in computer architecture is that processors often cannot be fed data from DRAM as fast as CPUs can consume it. Therefore, many applications are memory-bandwidth bound. With this motivation and the realization that traditional architectures (with all DRAM reachable only via bus) are insufficient to feed groups of modern processing units, vendors have introduced a variety of non-DDR 3D memory technologies (Hybrid Memory Cube (HMC), Wide I/O 2, High Bandwidth Memory (HBM)). These offer higher bandwidth and lower power by stacking DRAM chips on the processor or nearby on a silicon interposer. We will call these solutions “near-memory,” and if user-addressable, “scratchpad.” High-performance systems on the market now offer two levels of main memory: near-memory on package and traditional DRAM further away. In the near term we expect the latencies of near-memory and DRAM to be similar. Thus, it is natural to think of near-memory as another module on the DRAM level of the memory hierarchy. Vendors are expected to offer modes in which the near memory is used as cache, but we believe that this will be inefficient. In this paper, we explore the design space for a user-controlled multi-level main memory. Our work identifies situations in which rewriting application kernels can provide significant performance gains when using near-memory. We present algorithms designed for two-level main memory, using divide-and-conquer to partition computations and streaming to exploit data locality. We consider algorithms for the fundamental application of sorting and for the data analysis kernel k-means. Our algorithms asymptotically reduce memory-block transfers under certain architectural parameter settings. We use and extend Sandia National Laboratories’ SST simulation capability to demonstrate the relationship between increased bandwidth and improved algorithmic performance. Memory access counts from simulations corroborate predicted performance improvements for our sorting algorithm. In contrast, the k-means algorithm is generally CPU bound and does not improve when using near-memory except under extreme conditions. These conditions require large instances that rule out SST simulation, but we demonstrate improvements by running on a customized machine with high and low bandwidth memory. These case studies in co-design serve as positive and cautionary templates, respectively, for the major task of optimizing the computational kernels of many fundamental applications for two-level main memory systems.
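
The following is a minimal sketch of the general "partition to fit in near-memory" pattern the abstract describes: split a large DRAM-resident array into blocks that fit in a user-controlled scratchpad, do the work on each block in the fast memory, and write results back. This is an illustration of the style of algorithm only, not the paper's algorithm; the scratchpad capacity is an assumed constant and the scratchpad itself is emulated with an ordinary heap buffer.

```c
/* Sketch of scratchpad-sized blocking for a two-level main memory.
 * Assumption: SCRATCHPAD_DOUBLES models the user-addressable near-memory
 * capacity; here the "scratchpad" is just a malloc'd buffer. */
#include <stdlib.h>
#include <string.h>

#define SCRATCHPAD_DOUBLES (1 << 20)   /* assumed scratchpad capacity (doubles) */

static int cmp_double(const void *a, const void *b) {
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Sort each scratchpad-sized block of `data` in place.  A full two-level
 * sort would follow this with a multi-way merge that streams the sorted
 * blocks back through the scratchpad. */
void sort_blocks(double *data, size_t n) {
    double *scratch = malloc(SCRATCHPAD_DOUBLES * sizeof *scratch);
    if (!scratch) return;
    for (size_t i = 0; i < n; i += SCRATCHPAD_DOUBLES) {
        size_t len = (n - i < SCRATCHPAD_DOUBLES) ? n - i : SCRATCHPAD_DOUBLES;
        memcpy(scratch, data + i, len * sizeof *scratch);  /* DRAM -> near-memory */
        qsort(scratch, len, sizeof *scratch, cmp_double);  /* work in fast memory */
        memcpy(data + i, scratch, len * sizeof *scratch);  /* near-memory -> DRAM */
    }
    free(scratch);
}

int main(void) {
    size_t n = 4 * (size_t)SCRATCHPAD_DOUBLES;
    double *data = malloc(n * sizeof *data);
    if (!data) return 1;
    for (size_t i = 0; i < n; i++) data[i] = (double)(n - i);  /* reverse order */
    sort_blocks(data, n);
    free(data);
    return 0;
}
```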

Yearly Update: Exascale Projections for 2014

Kogge, Peter M.; Resnick, David R.

The HPC architectures of today are significantly different from those of a decade ago, with high odds that further changes will occur on the road to Exascale. This report discusses the "perfect storm" in technology that produced this change, the classes of architectures we are dealing with, and probable trends in how they will evolve. These properties and trends are then evaluated in terms of what they likely mean for future Exascale systems and applications.

Proposing an Abstracted Interface and Protocol for Computer Systems

Resnick, David R.; Ignatowski, Mike

While it made sense for historical reasons to develop different interfaces and protocols for memory channels, CPU to CPU interactions, and I/O devices, ongoing developments in the computer industry are leading to more converged requirements and physical implementations for these interconnects. As it becomes increasingly common for advanced components to contain a variety of computational devices as well as memory, the distinction between processors, memory, accelerators, and I/O devices becomes increasingly blurred. As a result, the interface requirements among such components are converging. There is also a wide range of new disruptive technologies that will impact the computer market in the coming years, including 3D integration and emerging NVRAM memory. Optimal exploitation of these technologies cannot be done with the existing memory, storage, and I/O interface standards. The computer industry has historically made major advances when industry players have been able to add innovation behind a standard interface. The standard interface provides a large market for their products and enables relatively quick and widespread adoption. To enable a new wave of innovation in the form of advanced memory products and accelerators, we need a new standard interface explicitly designed to provide both the performance and flexibility to support new system integration solutions.
