The U.S. Army Research Office (ARO), in partnership with IARPA, are investigating innovative, efficient, and scalable computer architectures that are capable of executing next-generation large scale data-analytic applications. These applications are increasingly sparse, unstructured, non-local, and heterogeneous. Under the Advanced Graphic Intelligence Logical computing Environment (AGILE) program, Performer teams will be asked to design computer architectures to meet the future needs of the DoD and the Intelligence Community (IC). This design effort will require flexible, scalable, and detailed simulation to assess the performance, efficiency, and validity of their designs. To support AGILE, Sandia National Labs will be providing the AGILE-enhanced Structural Simulation Toolkit (A-SST). This toolkit is a computer architecture simulation framework designed to support fast, parallel, and multi-scale simulation of novel architectures. This document describes the A-SST framework, some of its library of simulation models, and how it may be used by AGILE Performers.
Sandia National Laboratories is investigating scalable architectural simulation capabilities with a focus on simulating and evaluating highly scalable supercomputers for high performance computing applications. There is a growing demand for RTL model integration to provide the capability to simulate customized node architectures and heterogeneous systems. This report describes the first steps integrating the ESSENTial Signal Simulation Enabled by Netlist Transforms (ESSENT) tool with the Structural Simulation Toolkit (SST). ESSENT can emit C++ models from models written in FIRRTL to automatically generate components. The integration workflow will automatically generate the SST component and necessary interfaces to ’plug’ the ESSENT model into the SST framework.
Peng, Ivy P.; Voskuilen, Gwendolyn R.; Sarkar, Abhik S.; Boehme, David B.; Long, Rogelio L.; Moore, Shirley M.; Gokhale, Maya G.
This is the second in a sequence of three Hardware Evaluation milestones that provide insight into the following questions: What are the sources of excess data movement across all levels of the memory hierarchy, going out to the network fabric? What can be done at various levels of the hardware/software hierarchy to reduce excess data movement? How does reduced data movement track application performance? The results of this study can be used to suggest where the DOE supercomputing facilities, working with their hardware vendors, can optimize aspects of the system to reduce excess data movement. Quantitative analysis will also benefit systems software and applications to optimize caching and data layout strategies. Another potential avenue is to answer cost-benefit questions, such as those involving memory capacity versus latency and bandwidth. This milestone focuses on techniques to reduce data movement, quantitatively evaluates the efficacy of the techniques in accomplishing that goal, and measures how performance tracks data movement reduction. We study a small collection of benchmarks and proxy mini-apps that run on pre-exascale GPUs and on the Accelsim GPU simulator. Our approach has two thrusts: to measure advanced data movement reduction directives and techniques on the newest available GPUs, and to evaluate our benchmark set on simulated GPUs configured with architectural refinements to reduce data movement.
In this report, we abstract eleven papers published during the project and describe preliminary unpublished results that warrant follow-up work. The topic is multi-level memory algorithmics, or how to effectively use multiple layers of main memory. Modern compute nodes all have this feature in some form.
Voskuilen, Gwendolyn R.; Gimenez, Alfredo G.; Peng, Ivy P.; Moore, Shirley M.; Gokhale, Maya G.
In HIHE01-1, "Evaluate a PathForward/Facilities memory-relevant performance study/analysis," we conducted a focused study on the performance differences between HBM2 and HBM3 as revealed through execution of representative benchmarks. We used measurements on an existing many-core system, Knight's Landing (KNL), to calibrate simulator settings, and then performed Structural Simulation Toolkit (SST) simulations of KNL-like CPUs that access future high bandwidth memories. This report documents our findings.