Simulating Next-Gen Dataflow Architectures for HPC

Hughes, Clayton; Voskuilen, Gwendolyn R.; Rodrigues, Arun; Hammond, Simon

Abstract not provided.

TYPE Conference Presentation YEAR 2022

DOI OSTI

Eris: Fault Injection and Tracking Framework for Reliability Analysis of Open-Source Hardware

Proceedings - 2022 IEEE International Symposium on Performance Analysis of Systems and Software, ISPASS 2022

Nema, Shubham; Kirschner, Justin; Adak, Debpratim; Agarwal, Sapan; Feinberg, Benjamin; Rodrigues, Arun; Marinella, Matthew; Awad, Amro

As transistors have been scaled over the past decade, modern systems have become increasingly susceptible to faults. Increased transistor densities and lower capacitances make a particle strike more likely to cause an upset. At the same time, complex computer systems are increasingly integrated into safety-critical systems such as autonomous vehicles. These two trends make the study of system reliability and fault tolerance essential for modern systems. To analyze and improve system reliability early in the design process, new tools are needed for RTL fault analysis. This paper proposes Eris, a novel framework to identify vulnerable components in hardware designs through fault injection and fault-propagation tracking. Eris builds on ESSENT - a fast C/C++ RTL simulation framework - to provide fault injection, fault tracking, and control-flow deviation detection capabilities for RTL designs. To demonstrate Eris' capabilities, we analyze the reliability of the open-source Rocket Chip SoC by randomly injecting faults during thousands of runs on four microbenchmarks. As part of this analysis we measure the sensitivity of different hardware structures to faults based on the likelihood of a random fault causing silent data corruption, unrecoverable data errors, program crashes, and program hangs. We detect control-flow deviations and determine whether or not they are benign. Additionally, using Eris' novel fault-tracking capabilities we are able to find 78% more vulnerable components in the same number of simulations compared to RTL-based fault injection techniques without these capabilities. We will release Eris as an open-source tool to aid future research into processor reliability and hardening.

TYPE Conference Paper YEAR 2022

DOI OSTI Scopus

Eris: Fault Injection and Tracking Framework for Reliability Analysis of Open-Source Hardware

Proceedings - 2022 IEEE International Symposium on Performance Analysis of Systems and Software, ISPASS 2022

Nema, Shubham; Kirschner, Justin; Adak, Debpratim; Agarwal, Sapan; Feinberg, Benjamin; Rodrigues, Arun; Marinella, Matthew; Awad, Amro

As transistors have been scaled over the past decade, modern systems have become increasingly susceptible to faults. Increased transistor densities and lower capacitances make a particle strike more likely to cause an upset. At the same time, complex computer systems are increasingly integrated into safety-critical systems such as autonomous vehicles. These two trends make the study of system reliability and fault tolerance essential for modern systems. To analyze and improve system reliability early in the design process, new tools are needed for RTL fault analysis. This paper proposes Eris, a novel framework to identify vulnerable components in hardware designs through fault injection and fault-propagation tracking. Eris builds on ESSENT - a fast C/C++ RTL simulation framework - to provide fault injection, fault tracking, and control-flow deviation detection capabilities for RTL designs. To demonstrate Eris' capabilities, we analyze the reliability of the open-source Rocket Chip SoC by randomly injecting faults during thousands of runs on four microbenchmarks. As part of this analysis we measure the sensitivity of different hardware structures to faults based on the likelihood of a random fault causing silent data corruption, unrecoverable data errors, program crashes, and program hangs. We detect control-flow deviations and determine whether or not they are benign. Additionally, using Eris' novel fault-tracking capabilities we are able to find 78% more vulnerable components in the same number of simulations compared to RTL-based fault injection techniques without these capabilities. We will release Eris as an open-source tool to aid future research into processor reliability and hardening.

TYPE Conference Presentation YEAR 2022

DOI OSTI Scopus

A-SST Initial Specification

Rodrigues, Arun; Hammond, Simon; Hemmert, Karl S.; Hughes, Clayton; Kenny, Joseph; Voskuilen, Gwendolyn R.

The U.S. Army Research Office (ARO), in partnership with IARPA, is investigating innovative, efficient, and scalable computer architectures that are capable of executing next-generation large-scale data-analytic applications. These applications are increasingly sparse, unstructured, non-local, and heterogeneous. Under the Advanced Graphic Intelligence Logical computing Environment (AGILE) program, Performer teams will be asked to design computer architectures to meet the future needs of the DoD and the Intelligence Community (IC). This design effort will require flexible, scalable, and detailed simulation to assess the performance, efficiency, and validity of their designs. To support AGILE, Sandia National Labs will be providing the AGILE-enhanced Structural Simulation Toolkit (A-SST). This toolkit is a computer architecture simulation framework designed to support fast, parallel, and multi-scale simulation of novel architectures. This document describes the A-SST framework, some of its library of simulation models, and how it may be used by AGILE Performers.

TYPE SAND Report YEAR 2021

DOI OSTI

Towards an Extensible Framework for Accelerated System Simulation

Voskuilen, Gwendolyn R.; Rodrigues, Arun; Hughes, Clayton; Hemmert, Karl S.; Hammond, Simon

Abstract not provided.

TYPE Conference Poster YEAR 2021

DOI OSTI

SST-Explorer: Enabling System-level Performance and Reliability Analysis for Designs with Real-World IPs

Rodrigues, Arun; Awad, Amro; Hughes, Clayton; Agarwal, Sapan; Skoufis, Michael; Voskuilen, Gwendolyn R.; Nema, Shubham; Razdan, Rohin; Gardner, Alan; Hemmert, Karl S.; Hammond, Simon

Abstract not provided.

TYPE Conference Poster YEAR 2021

DOI OSTI

SST-Explorer: Enabling System-level Performance and Reliability Analysis for Designs with Real-World IPs

Rodrigues, Arun; Awad, Amro; Hughes, Clayton; Agarwal, Sapan; Skoufis, Michael; Voskuilen, Gwendolyn R.; Nema, Shubham; Razdan, Rohin; Gardner, Alan; Hemmert, Karl S.; Hammond, Simon

Abstract not provided.

TYPE Conference Presentation YEAR 2021

DOI OSTI

ERAS: Enabling the Integration of Real-World Intellectual Properties (IPs) in Architectural Simulators

Nema, Shubham; Razdan, Rohin; Rodrigues, Arun; Hemmert, Karl S.; Voskuilen, Gwendolyn R.; Adak, Debpratim; Hammond, Simon; Awad, Amro; Hughes, Clayton

Sandia National Laboratories is investigating scalable architectural simulation capabilities with a focus on simulating and evaluating highly scalable supercomputers for high performance computing applications. There is a growing demand for RTL model integration to provide the capability to simulate customized node architectures and heterogeneous systems. This report describes the first steps integrating the ESSENTial Signal Simulation Enabled by Netlist Transforms (ESSENT) tool with the Structural Simulation Toolkit (SST). ESSENT can emit C++ models from models written in FIRRTL to automatically generate components. The integration workflow will automatically generate the SST component and necessary interfaces to ’plug’ the ESSENT model into the SST framework.

TYPE SAND Report YEAR 2021

DOI OSTI

Multiscale System Modeling of Single-Event-Induced Faults in Advanced Node Processors

IEEE Transactions on Nuclear Science

Cannon, Matthew J.; Rodrigues, Arun; Black, Dolores A.; Black, Jeff; Bustamante, Luis; Feinberg, Benjamin; Quinn, Heather M.; Clark, Lawrence T.; Brunhaver, John S.; Barnaby, Hugh; Mclain, Michael; Agarwal, Sapan; Marinella, Matthew

Integration-technology feature shrink increases computing-system susceptibility to single-event effects (SEE). While modeling SEE faults will be critical, an integrated processor's scope makes physically correct modeling computationally intractable. Without useful models, presilicon evaluation of fault-tolerance approaches becomes impossible. To incorporate accurate transistor-level effects at a system scope, we present a multiscale simulation framework. Charge collection at the 1) device level determines 2) circuit-level transient duration and state-upset likelihood. Circuit effects, in turn, impact 3) register-transfer-level architecture-state corruption visible at 4) the system level. Thus, the physically accurate effects of SEEs in large-scale systems, executed on a high-performance computing (HPC) simulator, could be used to drive cross-layer radiation hardening by design. We demonstrate the capabilities of this model with two case studies. First, we determine a D flip-flop's sensitivity at the transistor level on 14-nm FinFET technology, validating the model against published cross sections. Second, we track and estimate faults in a microprocessor without interlocked pipelined stages (MIPS) processor for the Adams 90% worst-case environment in an isotropic space environment.

TYPE Journal Article YEAR 2021

DOI OSTI Scopus

Enabling Guaranteed Correctness and Leading Edge Performance under Radiation with a Heterogeneous System

Feinberg, Benjamin; Rodrigues, Arun; Marinella, Matthew; Agarwal, Sapan

Abstract not provided.

TYPE Conference Presentation YEAR 2021

DOI OSTI

Enabling Guaranteed Correctness and Leading Edge Performance under Radiation with a Heterogeneous System

Feinberg, Benjamin; Rodrigues, Arun; Agarwal, Sapan; Marinella, Matthew

Abstract not provided.

TYPE Conference Paper YEAR 2021

OSTI

Multiscale System Modeling of Single Event Induced Faults in Advanced Node Processors

Cannon, Matthew J.; Rodrigues, Arun; Black, Dolores A.; Black, Jeffrey D.; Bustamante, Luis; Feinberg, Benjamin; Quinn, Heather; Clark, Lawrence; Brunhaver, John; Barnaby, Hugh; Mclain, Michael; Agarwal, Sapan; Marinella, Matthew

Abstract not provided.

TYPE Conference Presentation YEAR 2020

DOI OSTI

Generic Spiking Architecture (GenSA)

Rothganger, Fredrick R.; Rodrigues, Arun

Neuromorphic devices are a rapidly growing area of interest in industry, with machines in production by IBM and Intel, among others. These devices promise to reduce size, weight and power (SWaP) costs while increasing resilience and facilitating high-performance computing (HPC). Each device will favor some set of algorithms, but this relationship has not been thoroughly studied. The field of neuromorphic computing is so new that existing devices were designed with merely estimated use-cases in mind. To better understand the fit between neuromorphic algorithms and machines, a simulated machine can be configured to any point in the design space. This will identify better choices of devices, and perhaps guide the market in new directions. The design of a generic spiking machine generalizes existing examples while also looking forward to devices that haven't been built yet. Each parameter is specified, along with the approach/mechanism by which the relevant component is implemented in the simulator.

TYPE SAND Report YEAR 2020

DOI OSTI

Fault Tracking and Modeling in Advanced Node Processors of Single Event Effects

Cannon, Matthew J.; Rodrigues, Arun; Black, Dolores A.; Black, Jeffrey D.; Bustamante, Luis; Feinberg, Benjamin; Quinn, Heather; Clark, Lawrence; Brunhaver, John S.; Barnaby, Hugh; Mclain, Michael; Agarwal, Sapan; Marinella, Matthew

Abstract not provided.

TYPE Conference Poster YEAR 2020

OSTI

Multiscale Modeling of Single Event-Induced Faults in FinFET-based Processors

Cannon, Matthew J.; Rodrigues, Arun; Black, Dolores A.; Black, Jeffrey D.; Bustamante, Luis; Feinberg, Benjamin; Clark, Larry; Brunhaver, John; Barnaby, Hugh; Agarwal, Sapan; Marinella, Matthew

Abstract not provided.

TYPE Conference Poster YEAR 2020

OSTI

Towards a Scatter-Gather Architecture

Rodrigues, Arun; Voskuilen, Gwen; Gokhale, Maya

Abstract not provided.

TYPE Conference Poster YEAR 2019

DOI OSTI

Evaluating the Opportunities for Multi-Level Memory - An ASC 2016 L2 Milestone

Voskuilen, Gwendolyn R.; Frank, Michael P.; Hammond, Simon; Rodrigues, Arun

As new memory technologies appear on the market, there is a growing push to incorporate them into future architectures. Compared to traditional DDR DRAM, these technologies provide appealing advantages such as increased bandwidth or non-volatility. However, the technologies have significant downsides as well, including higher cost, manufacturing complexity, and, for non-volatile memories, higher latency and wear-out limitations. As such, no technology has emerged as a clear technological and economic winner. As a result, systems are turning to the concept of multi-level memory, or mixing multiple memory technologies in a single system to balance cost, performance, and reliability.

TYPE Other Report YEAR 2019

DOI OSTI

Abstract Machine Models and Proxy Architectures for Exascale Computing

Ang, James A.; Barrett, Richard F.; Benner, Robert E.; Burke, Daniel; Chan, Cy; Cook, Jeanine; Daley, Christopher S.; Donofrio, David; Hammond, Simon; Hemmert, Karl S.; Hoekstra, Robert J.; Ibrahim, Khaled; Kelly, Suzanne M.; Le, Hoang; Leung, Vitus J.; Michelogiannakis, George; Resnick, David R.; Rodrigues, Arun; Shalf, John; Stark, Dylan; Unat, D.; Wright, Nick J.; Voskuilen, Gwendolyn R.

To achieve exascale computing, fundamental hardware architectures must change. The most significant consequence of this assertion is the impact on the scientific and engineering applications that run on current high performance computing (HPC) systems, many of which codify years of scientific domain knowledge and refinements for contemporary computer systems. In order to adapt to exascale architectures, developers must be able to reason about new hardware and determine what programming models and algorithms will provide the best blend of performance and energy efficiency into the future. While many details of the exascale architectures are undefined, an abstract machine model is designed to allow application developers to focus on the aspects of the machine that are important or relevant to performance and code structure. These models are intended as communication aids between application developers and hardware architects during the co-design process. We use the term proxy architecture to describe a parameterized version of an abstract machine model, with the parameters added to elucidate potential speeds and capacities of key hardware components. These more detailed architectural models are formulated to enable discussion between the developers of analytic models and simulators and computer hardware architects. They allow for application performance analysis and hardware optimization opportunities. In this report our goal is to provide the application development community with a set of models that can help software developers prepare for exascale. In addition, through the use of proxy architectures, we can enable a more concrete exploration of how well new and evolving application codes map onto future architectures. This second version of the document addresses system scale considerations and provides a system-level abstract machine model with proxy architecture information.

TYPE SAND Report YEAR 2019

DOI OSTI