The Promise of Neuromorphic Computing
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Programmable accelerators have become commonplace in modern computing systems. Advances in programming models and the availability of unprecedented amounts of data have created a space for massively parallel accelerators capable of maintaining context for thousands of concurrent threads resident on-chip. These threads are grouped and interleaved on a cycle-by-cycle basis among several massively parallel computing cores. One path for the design of future supercomputers relies on an ability to model the performance of these massively parallel cores at scale. The SST framework has been proven to scale up to run simulations containing tens of thousands of nodes. A previous report described the initial integration of the open-source, execution-driven GPU simulator, GPGPU-Sim, into the SST framework. This report discusses the results of the integration and how to use the new GPU component in SST. It also provides examples of what it can be used to analyze and a correlation study showing how closely the execution matches that of a Nvidia V100 GPU when running kernels and mini-apps.
The review was conducted on May 8-9, 2017 at the University of Utah. Overall the review team was impressed with the work presented and found that the CCMSC had met or exceeded the Year 3 milestones. Specific details, comments, and recommendations are included in this document.
International Conference for High Performance Computing, Networking, Storage and Analysis, SC
Arm processors have been explored in HPC for several years, however there has not yet been a demonstration of viability for supporting large-scale production workloads. In this paper, we offer a retrospective on the process of bringing up Astra, the first Petascale supercomputer based on 64-bit Arm processors, and validating its ability to run production HPC applications. Through this process several immature technology gaps were addressed, including software stack enablement, Linux bugs at scale, thermal management issues, power management capabilities, and advanced container support. From this experience, several lessons learned are formulated that contributed to the successful deployment of Astra. These insights can be helpful to accelerate deploying and maturing other first-seen HPC technologies. With Astra now supporting many users running a diverse set of production applications at multi-thousand node scales, we believe this constitutes strong supporting evidence that Arm is a viable technology for even the largest-scale supercomputer deployments.
With the dawn of the exascale era, computer scientists and engineers are faced with tremendous challenges across all facets of the HPC system - scalability, performance, reliability, and power consumption. In particular, the power-performance benefit from one processor generation to the next is seeing ever-diminishing returns and will require fundamental changes in the way we approach computation. In fact, it is likely that different applications will require different types of accelerators in order to meet power, performance, and reliability requirements at scale. One potential type of accelerator, a dataflow architecture, diverges from the traditional sequentially executed instruction model into one that reflects the inherent instruction-level parallelism in a program. This work presents the initial steps toward a tool that can extract the control-dataflow graph from an application.
Abstract not provided.
The review was conducted on May 9-10, 2016 at the University of Utah. Overall the review team was impressed with the work presented and found that the CCMSC had met or exceeded the Year 2 milestones. Specific details, comments and recommendations are included in this document.
To achieve exascale computing, fundamental hardware architectures must change. The most significant consequence of this assertion is the impact on the scientific and engineering applications that run on current high performance computing (HPC) systems, many of which codify years of scientific domain knowledge and refinements for contemporary computer systems. In order to adapt to exascale architectures, developers must be able to reason about new hardware and determine what programming models and algorithms will provide the best blend of performance and energy efficiency into the future. While many details of the exascale architectures are undefined, an abstract machine model is designed to allow application developers to focus on the aspects of the machine that are important or relevant to performance and code structure. These models are intended as communication aids between application developers and hardware architects during the co-design process. We use the term proxy architecture to describe a parameterized version of an abstract machine model, with the parameters added to elucidate potential speeds and capacities of key hardware components. These more detailed architectural models are formulated to enable discussion between the developers of analytic models and simulators and computer hardware architects. They allow for application performance analysis and hardware optimization opportunities. In this report our goal is to provide the application development community with a set of models that can help software developers prepare for exascale. In addition, through the use of proxy architectures, we can enable a more concrete exploration of how well new and evolving application codes map onto future architectures. This second version of the document addresses system scale considerations and provides a system-level abstract machine model with proxy architecture information.
Programmable accelerators have become commonplace in modern computing systems. Advances in programming models and the availability of massive amounts of data have created a space for massively parallel accelerators capable of maintaining context for thousands of concurrent threads resident on-chip. These threads are grouped and interleaved on a cycle-by-cycle basis among several massively parallel computing cores. One path for the design of future supercomputers relies on an ability to model the performance of these massively parallel cores at scale. The SST framework has been proven to scale up to run simulations containing tens of thousands of nodes. A previous report described the initial integration of the open-source, execution-driven GPU simulator, GPGPU-Sim, into the SST framework. This report discusses the results of the integration and how to use the new GPU component in SST. It also provides examples of what it can be used to analyze and a correlation study showing how closely the execution matches that of a Nvidia V100 GPU when running kernels and mini-apps.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Programmable accelerators have become commonplace in modern computing systems. Advances in programming models and the availability of massive amounts of data have created a space for massively parallel acceleration where the context for thousands of concurrent threads are resident on-chip. These threads are grouped and interleaved on a cycle-by-cycle basis among several massively parallel computing cores. The design of future supercomputers relies on an ability to model the performance of these massively parallel cores at scale. To address the need for a scalable, decentralized GPU model that can model large GPUs, chiplet-based GPUs and multi-node GPUs, this report details the first steps in integrating the open-source, execution driven GPGPU-Sim into the SST framework. The first stage of this project, creates two elements: a kernel scheduler SST element accepts work from SST CPU models and schedules it to an SM-collection element that performs cycle-by-cycle timing using SSTs Mem Hierarchy to model a flexible memory system.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
The Vanguard program informally began in January 2017 with the submission of a white paper entitled "Sandia's Vision for a 2019 Arm Testbed" to NNSA headquarters. The program proceeded in earnest in May 2017 with an announcement by Doug Wade (Director, Office of Advanced Simulation and Computing and Institutional R&D at NNSA) that Sandia National Laboratories (Sandia) would host the first Advanced Architecture Prototype platform based on the Arm architecture. In August 2017, Sandia formed a Tri-lab team chartered to develop a robust HPC software stack for Astra to support the Vanguard program goal of demonstrating the viability of Arm in supporting ASC production computing workloads. This document describes the high-level Vanguard program goals, the Vanguard-Astra project acquisition plan and procurement up to contract placement, the initial software stack environment planned for the Vanguard-Astra platform (Astra), a description of how the communities of users will utilize the platform during the transition from the open network to the classified network, and initial performance results.
The Vanguard program informally began in January 2017 with the submission of a white paper entitled "Sandia's Vision for a 2019 Arm Testbed" to NNSA headquarters. The program proceeded in earnest in May 2017 with an announcement by Doug Wade (Director, Office of Advanced Simulation and Computing and Institutional R&D at NNSA) that Sandia National Laboratories (Sandia) would host the first Advanced Architecture Prototype platform based on the Arm architecture. In August 2017, Sandia formed a Tri-lab team chartered to develop a robust HPC software stack for Astra to support the Vanguard program goal of demonstrating the viability of Arm in supporting ASC production computing workloads.
Abstract not provided.