Publications Search

The ASC program seeks to use machine learning to improve efficiencies in its stockpile stewardship mission. Moreover, there is a growing market for technologies dedicated to accelerating AI workloads. Many of these emerging architectures promise to provide savings in energy efficiency, area, and latency when compared to traditional CPUs for these types of applications — neuromorphic analog and digital technologies provide both low-power and configurable acceleration of challenging artificial intelligence (AI) algorithms. If designed into a heterogeneous system with other accelerators and conventional compute nodes, these technologies have the potential to augment the capabilities of traditional High Performance Computing (HPC) platforms [5]. This expanded computation space requires not only a new approach to physics simulation, but the ability to evaluate and analyze next-generation architectures specialized for AI/ML workloads in both traditional HPC and embedded ND applications. Developing this capability will enable ASC to understand how this hardware performs in both HPC and ND environments, improve our ability to port our applications, guide the development of computing hardware, and inform vendor interactions, leading them toward solutions that address ASC’s unique requirements.

More Details

TYPE SAND Report YEAR 2022

DOI OSTI

Modeling Analog Tile-Based Accelerators Using SST

Feinberg, Benjamin; Agarwal, Sapan; Plagge, Mark; Rothganger, Fredrick R.; Cardwell, Suma G.; Hughes, Clayton

Analog computing has been widely proposed to improve the energy efficiency of multiple important workloads including neural network operations, and other linear algebra kernels. To properly evaluate analog computing and explore more complex workloads such as systems consisting of multiple analog data paths, system level simulations are required. Moreover, prior work on system architectures for analog computing often rely on custom simulators creating signficant additional design effort and complicating comparisons between different systems. To remedy these issues, this report describes the design and implementation of a flexible tile-based analog accelerator element for the Structural Simulation Toolkit (SST). The element focuses on heavily on the tile controller—an often neglected aspect of prior work—that is sufficiently versatile to simulate a wide range of different tile operations including neural network layers, signal processing kernels, and generic linear algebra operations without major constraints. The tile model also interoperates with existing SST memory and network models to reduce the overall development load and enable future simulation of heterogeneous systems with both conventional digital logic and analog compute tiles. Finally, both the tile and array models are designed to easily support future extensions as new analog operations and applications that can benefit from analog computing are developed.

More Details

TYPE SAND Report YEAR 2022

DOI OSTI

CrossSim Inference Manual v2.0

Xiao, Tianyao P.; Bennett, Christopher; Feinberg, Benjamin; Marinella, Matthew; Agarwal, Sapan

Neural networks are largely based on matrix computations. During forward inference, the most heavily used compute kernel is the matrix-vector multiplication (MVM): $W \vec{x} $. Inference is a first frontier for the deployment of next-generation hardware for neural network applications, as it is more readily deployed in edge devices, such as mobile devices or embedded processors with size, weight, and power constraints. Inference is also easier to implement in analog systems than training, which has more stringent device requirements. The main processing kernel used during inference is the MVM.

More Details

TYPE Other Report YEAR 2022

DOI OSTI

An Accurate, Error-Tolerant, and Energy-Efficient Neural Network Inference Engine Based on SONOS Analog Memory

IEEE Transactions on Circuits and Systems I: Regular Papers

Xiao, Tianyao P.; Feinberg, Benjamin; Bennett, Christopher; Agrawal, Vineet; Saxena, Prashant; Prabhakar, Venkatraman; Ramkumar, Krishnaswamy; Medu, Harsha; Raghavan, Vijay; Chettuvetty, Ramesh; Agarwal, Sapan; Marinella, Matthew

We demonstrate SONOS (silicon-oxide-nitride-oxide-silicon) analog memory arrays that are optimized for neural network inference. The devices are fabricated in a 40nm process and operated in the subthreshold regime for in-memory matrix multiplication. Subthreshold operation enables low conductances to be implemented with low error, which matches the typical weight distribution of neural networks, which is heavily skewed toward near-zero values. This leads to high accuracy in the presence of programming errors and process variations. We simulate the end-To-end neural network inference accuracy, accounting for the measured programming error, read noise, and retention loss in a fabricated SONOS array. Evaluated on the ImageNet dataset using ResNet50, the accuracy using a SONOS system is within 2.16% of floating-point accuracy without any retraining. The unique error properties and high On/Off ratio of the SONOS device allow scaling to large arrays without bit slicing, and enable an inference architecture that achieves 20 TOPS/W on ResNet50, a > 10× gain in energy efficiency over state-of-The-Art digital and analog inference accelerators.

More Details

TYPE Journal Article YEAR 2022

DOI OSTI Scopus

Achieving Accurate In-Memory Neural Network Inference with Highly Overlapping Nonvolatile Memory State Distributions

Marinella, Matthew; Xiao, Tianyao P.; Feinberg, Benjamin; Bennett, Christopher; Agrawal, Vineet; Puchner, Helmut; Agarwal, Sapan

Abstract not provided.

More Details

TYPE Conference Presentation YEAR 2022

DOI OSTI

Energy Efficient Deep Neural Network Processing: Digital CMOS Limits and Prospects for Analog In-Memory Computing

Marinella, Matthew; Xiao, Tianyao P.; Bennett, Christopher; Feinberg, Benjamin; Wahby, William; Jacobs-Gedrim, Robin B.; Agrawal, Vineet; Puchner, Helmut; Barnaby, Hugh; Agarwal, Sapan

Abstract not provided.

More Details

TYPE Conference Presentation YEAR 2022

DOI OSTI

Eris: Fault Injection and Tracking Framework for Reliability Analysis of Open-Source Hardware

Proceedings 2022 IEEE International Symposium on Performance Analysis of Systems and Software Ispass 2022

Nema, Shubham; Kirschner, Justin; Adak, Debpratim; Agarwal, Sapan; Feinberg, Benjamin; Rodrigues, Arun; Marinella, Matthew; Awad, Amro

As transistors have been scaled over the past decade, modern systems have become increasingly susceptible to faults. Increased transistor densities and lower capacitances make a particle strike more likely to cause an upset. At the same time, complex computer systems are increasingly integrated into safety-critical systems such as autonomous vehicles. These two trends make the study of system reliability and fault tolerance essential for modern systems. To analyze and improve system reliability early in the design process, new tools are needed for RTL fault analysis.This paper proposes Eris, a novel framework to identify vulnerable components in hardware designs through fault-injection and fault propagation tracking. Eris builds on ESSENT - a fast C/C++ RTL simulation framework - to provide fault injection, fault tracking, and control-flow deviation detection capabilities for RTL designs. To demonstrate Eris' capabilities, we analyze the reliability of the open source Rocket Chip SoC by randomly injecting faults during thousands of runs on four microbenchmarks. As part of this analysis we measure the sensitivity of different hardware structures to faults based on the likelihood of a random fault causing silent data corruption, unrecoverable data errors, program crashes, and program hangs. We detect control flow deviations and determine whether or not they are benign. Additionally, using Eris' novel fault-tracking capabilities we are able to find 78% more vulnerable components in the same number of simulations compared to RTL-based fault injection techniques without these capabilities. We will release Eris as an open-source tool to aid future research into processor reliability and hardening.

More Details

TYPE Conference Paper YEAR 2022

DOI OSTI Scopus

Eris: Fault Injection and Tracking Framework for Reliability Analysis of Open-Source Hardware

Proceedings - 2022 IEEE International Symposium on Performance Analysis of Systems and Software, ISPASS 2022

Nema, Shubham; Kirschner, Justin; Adak, Debpratim; Agarwal, Sapan; Feinberg, Benjamin; Rodrigues, Arun; Marinella, Matthew; Awad, Amro

As transistors have been scaled over the past decade, modern systems have become increasingly susceptible to faults. Increased transistor densities and lower capacitances make a particle strike more likely to cause an upset. At the same time, complex computer systems are increasingly integrated into safety-critical systems such as autonomous vehicles. These two trends make the study of system reliability and fault tolerance essential for modern systems. To analyze and improve system reliability early in the design process, new tools are needed for RTL fault analysis.This paper proposes Eris, a novel framework to identify vulnerable components in hardware designs through fault-injection and fault propagation tracking. Eris builds on ESSENT - a fast C/C++ RTL simulation framework - to provide fault injection, fault tracking, and control-flow deviation detection capabilities for RTL designs. To demonstrate Eris' capabilities, we analyze the reliability of the open source Rocket Chip SoC by randomly injecting faults during thousands of runs on four microbenchmarks. As part of this analysis we measure the sensitivity of different hardware structures to faults based on the likelihood of a random fault causing silent data corruption, unrecoverable data errors, program crashes, and program hangs. We detect control flow deviations and determine whether or not they are benign. Additionally, using Eris' novel fault-tracking capabilities we are able to find 78% more vulnerable components in the same number of simulations compared to RTL-based fault injection techniques without these capabilities. We will release Eris as an open-source tool to aid future research into processor reliability and hardening.

More Details

TYPE Conference Presentation YEAR 2022

DOI OSTI Scopus

Analog Neural Network Inference Accuracy in One-Selector One-Resistor Memory Arrays

Proceedings - 2022 IEEE International Conference on Rebooting Computing, ICRC 2022

Xiao, Tianyao P.; Bennett, Christopher; Wilson, Donald E.; Feinberg, Benjamin; Agarwal, Sapan; Marinella, Matthew

Non-volatile memory arrays require select devices to ensure accurate programming. The one-selector one-resistor (1S1R) array where a two-terminal nonlinear select device is placed in series with a resistive memory element is attractive due to its high-density data storage; however, the effect of the nonlinear select device on the accuracy of analog in-memory computing has not been explored. This work evaluates the impact of select and memory device properties on the results of analog matrix-vector multiplications. We integrate nonlinear circuit simulations into CrossSim and perform end-to-end neural network inference simulations to study how the select device affects the accuracy of neural network inference. We propose an adjustment to the input voltage that can effectively compensate for the electrical load of the select device. Our results show that for deep residual networks trained on CIFAR-10, a compensation that is uniform across all devices in the system can mitigate these effects over a wide range of values for the select device I-V steepness and memory device On/Off ratio. A realistic I-V curve steepness of 60 mV/dec can yield an accuracy on CIFAR-10 that is within 0.44% of the floating-point accuracy.

More Details

TYPE Conference Presentation YEAR 2022

DOI OSTI Scopus

Analysis and mitigation of parasitic resistance effects for analog in-memory neural network acceleration

Semiconductor Science and Technology

Xiao, Tianyao P.; Feinberg, Benjamin; Rohan, Jacob N.; Bennett, Christopher; Agarwal, Sapan; Marinella, Matthew

To support the increasing demands for efficient deep neural network processing, accelerators based on analog in-memory computation of matrix multiplication have recently gained significant attention for reducing the energy of neural network inference. However, analog processing within memory arrays must contend with the issue of parasitic voltage drops across the metal interconnects, which distort the results of the computation and limit the array size. This work analyzes how parasitic resistance affects the end-to-end inference accuracy of state-of-the-art convolutional neural networks, and comprehensively studies how various design decisions at the device, circuit, architecture, and algorithm levels affect the system's sensitivity to parasitic resistance effects. A set of guidelines are provided for how to design analog accelerator hardware that is intrinsically robust to parasitic resistance, without any explicit compensation or re-training of the network parameters.

More Details

TYPE Journal Article YEAR 2021

DOI OSTI Scopus

CrossSim: GPU-Accelerated Simulation of Analog Neural Networks

Xiao, Tianyao P.; Bennett, Christopher; Feinberg, Benjamin; Marinella, Matthew; Agarwal, Sapan

Abstract not provided.

More Details

TYPE Conference Presentation YEAR 2021

DOI OSTI

Designing Accurate and Robust Analog Accelerators for Neural Networks and Linear Algebra

Xiao, Tianyao P.; Feinberg, Benjamin; Bennett, Christopher; Agarwal, Sapan; Marinella, Matthew

Abstract not provided.

More Details

TYPE Presentation YEAR 2021

OSTI

Multiscale System Modeling of Single-Event-Induced Faults in Advanced Node Processors

IEEE Transactions on Nuclear Science

Cannon, Matthew J.; Rodrigues, Arun; Black, Dolores A.; Black, Jeff; Bustamante, Luis; Feinberg, Benjamin; Quinn, Heather M.; Clark, Lawrence T.; Brunhaver, John S.; Barnaby, Hugh; Mclain, Michael; Agarwal, Sapan; Marinella, Matthew

Integration-technology feature shrink increases computing-system susceptibility to single-event effects (SEE). While modeling SEE faults will be critical, an integrated processor's scope makes physically correct modeling computationally intractable. Without useful models, presilicon evaluation of fault-tolerance approaches becomes impossible. To incorporate accurate transistor-level effects at a system scope, we present a multiscale simulation framework. Charge collection at the 1) device level determines 2) circuit-level transient duration and state-upset likelihood. Circuit effects, in turn, impact 3) register-transfer-level architecture-state corruption visible at 4) the system level. Thus, the physically accurate effects of SEEs in large-scale systems, executed on a high-performance computing (HPC) simulator, could be used to drive cross-layer radiation hardening by design. We demonstrate the capabilities of this model with two case studies. First, we determine a D flip-flop's sensitivity at the transistor level on 14-nm FinFet technology, validating the model against published cross sections. Second, we track and estimate faults in a microprocessor without interlocked pipelined stages (MIPS) processor for Adams 90% worst case environment in an isotropic space environment.

More Details

TYPE Journal Article YEAR 2021

DOI OSTI Scopus

Evaluating complexity and resilience trade-offs in emerging memory inference machines

Bennett, Christopher; Dellana, Ryan; Xiao, Tianyao P.; Feinberg, Benjamin; Agarwal, Sapan; Cardwell, Suma G.; Marinella, Matthew; Severa, William M.; Aimone, James B.

Abstract not provided.

More Details

TYPE Conference Presentation YEAR 2021

DOI OSTI

Enabling Guaranteed Correctness and Leading Edge Performance under Radiation with a Heterogeneous System

Feinberg, Benjamin; Rodrigues, Arun; Marinella, Matthew; Agarwal, Sapan

Abstract not provided.

More Details

TYPE Conference Presentation YEAR 2021

DOI OSTI

Analog-Domain Computation for Solving Numerical Systems

Rohan, Jacob N.; Nguyen, Alex; Feinberg, Benjamin; Agarwal, Sapan; Marinella, Matthew; Forbes, Travis; Kulkarni, Jaydeep

Abstract not provided.

More Details

TYPE Presentation YEAR 2021

OSTI

An Analog Preconditioner for Solving Linear Systems

Proceedings - International Symposium on High-Performance Computer Architecture

Feinberg, Benjamin; Wong, Ryan; Xiao, Tianyao P.; Rohan, Jacob N.; Boman, Erik G.; Marinella, Matthew; Agarwal, Sapan; Ipek, Engin

Over the past decade as Moore's Law has slowed, the need for new forms of computation that can provide sustainable performance improvements has risen. A new method, called in situ computing, has shown great potential to accelerate matrix vector multiplication (MVM), an important kernel for a diverse range of applications from neural networks to scientific computing. Existing in situ accelerators for scientific computing, however, have a significant limitation: These accelerators provide no acceleration for preconditioning-A key bottleneck in linear solvers and in scientific computing workflows. This paper enables in situ acceleration for state-of-The-Art linear solvers by demonstrating how to use a new in situ matrix inversion accelerator for analog preconditioning. As existing techniques that enable high precision and scalability for in situ MVM are inapplicable to in situ matrix inversion, new techniques to compensate for circuit non-idealities are proposed. Additionally, a new approach to bit slicing that enables splitting operands across multiple devices without external digital logic is proposed. For scalability, this paper demonstrates how in situ matrix inversion kernels can work in tandem with existing domain decomposition techniques to accelerate the solutions of arbitrarily large linear systems. The analog kernel can be directly integrated into existing preconditioning workflows, leveraging several well-optimized numerical linear algebra tools to improve the behavior of the circuit. The result is an analog preconditioner that is more effective (up to 50% fewer iterations) than the widely used incomplete LU factorization preconditioner, ILU(0), while also reducing the energy and execution time of each approximate solve operation by 1025x and 105x respectively.

More Details

TYPE Conference Presentation YEAR 2021

DOI OSTI Scopus

An Analog Preconditioner for Solving Linear Systems

Proceedings - International Symposium on High-Performance Computer Architecture

Feinberg, Benjamin; Wong, Ryan; Xiao, Tianyao P.; Rohan, Jacob N.; Boman, Erik G.; Marinella, Matthew; Agarwal, Sapan; Ipek, Engin

Over the past decade as Moore's Law has slowed, the need for new forms of computation that can provide sustainable performance improvements has risen. A new method, called in situ computing, has shown great potential to accelerate matrix vector multiplication (MVM), an important kernel for a diverse range of applications from neural networks to scientific computing. Existing in situ accelerators for scientific computing, however, have a significant limitation: These accelerators provide no acceleration for preconditioning-A key bottleneck in linear solvers and in scientific computing workflows. This paper enables in situ acceleration for state-of-The-Art linear solvers by demonstrating how to use a new in situ matrix inversion accelerator for analog preconditioning. As existing techniques that enable high precision and scalability for in situ MVM are inapplicable to in situ matrix inversion, new techniques to compensate for circuit non-idealities are proposed. Additionally, a new approach to bit slicing that enables splitting operands across multiple devices without external digital logic is proposed. For scalability, this paper demonstrates how in situ matrix inversion kernels can work in tandem with existing domain decomposition techniques to accelerate the solutions of arbitrarily large linear systems. The analog kernel can be directly integrated into existing preconditioning workflows, leveraging several well-optimized numerical linear algebra tools to improve the behavior of the circuit. The result is an analog preconditioner that is more effective (up to 50% fewer iterations) than the widely used incomplete LU factorization preconditioner, ILU(0), while also reducing the energy and execution time of each approximate solve operation by 1025x and 105x respectively.

More Details

TYPE Conference Presentation YEAR 2021

DOI OSTI Scopus

Enabling Guaranteed Correctness and Leading Edge Performance under Radiation with a Heterogeneous System

Feinberg, Benjamin; Rodrigues, Arun; Agarwal, Sapan; Marinella, Matthew

Abstract not provided.

More Details

TYPE Conference Paper YEAR 2021

OSTI

Publications

Search results