Publications

Results 26–50 of 52

Search results

Ionizing Radiation Effects in SONOS-Based Neuromorphic Inference Accelerators

IEEE Transactions on Nuclear Science

Xiao, Tianyao P.; Bennett, Christopher; Agarwal, Sapan; Hughart, David R.; Barnaby, Hugh J.; Puchner, Helmut; Prabhakar, Venkatraman; Talin, Albert A.; Marinella, Matthew

We evaluate the sensitivity of neuromorphic inference accelerators based on silicon-oxide-nitride-oxide-silicon (SONOS) charge trap memory arrays to total ionizing dose (TID) effects. Data retention statistics were collected for 16 Mbit of 40-nm SONOS digital memory exposed to ionizing radiation from a Co-60 source, showing good retention of the bits up to the maximum dose of 500 krad(Si). Using this data, we formulate a rate-equation-based model for the TID response of trapped charge carriers in the ONO stack and predict the effect of TID on intermediate device states between 'program' and 'erase.' This model is then used to simulate arrays of low-power, analog SONOS devices that store 8-bit neural network weights and support in situ matrix-vector multiplication. We evaluate the accuracy of the irradiated SONOS-based inference accelerator on two image recognition tasks - CIFAR-10 and the challenging ImageNet data set - using state-of-the-art convolutional neural networks, such as ResNet-50. We find that across the data sets and neural networks evaluated, the accelerator tolerates a maximum TID between 10 and 100 krad(Si), with deeper networks being more susceptible to accuracy losses due to TID.

Heavy-Ion-Induced Displacement Damage Effects in Magnetic Tunnel Junctions with Perpendicular Anisotropy

IEEE Transactions on Nuclear Science

Xiao, Tianyao P.; Bennett, Christopher; Mancoff, Frederick B.; Manuel, Jack; Hughart, David R.; Jacobs-Gedrim, Robin B.; Bielejec, Edward S.; Vizkelethy, Gyorgy; Sun, Jijun; Aggarwal, Sanjeev; Arghavani, Reza; Marinella, Matthew

We evaluate the resilience of CoFeB/MgO/CoFeB magnetic tunnel junctions (MTJs) with perpendicular magnetic anisotropy (PMA) to displacement damage induced by heavy-ion irradiation. MTJs were exposed to 3-MeV Ta$^{2+}$ ions at different levels of ion beam fluence spanning five orders of magnitude. The devices remained insensitive to beam fluences up to $10^{11}$ ions/cm$^2$, beyond which a gradual degradation in the device magnetoresistance, coercive magnetic field, and spin-transfer-torque (STT) switching voltage was observed, ending with a complete loss of magnetoresistance at very high levels of displacement damage (>0.035 displacements per atom). The loss of magnetoresistance is attributed to structural damage at the MgO interfaces, which allows electrons to scatter among the propagating modes within the tunnel barrier and reduces the net spin polarization. Ion-induced damage to the interface also reduces the PMA. This study clarifies the displacement damage thresholds that lead to significant irreversible changes in the characteristics of STT magnetic random access memory (STT-MRAM) and elucidates the physical mechanisms underlying the deterioration in device properties.

In situ Parallel Training of Analog Neural Network Using Electrochemical Random-Access Memory

Frontiers in Neuroscience (Online)

Talin, Albert A.; Li, Yiyang; Fuller, Elliot J.; Bennett, Christopher; Xiao, Tianyao P.; Salleo, Alberto; Melianas, Armantas; Isele, Erik; Marinella, Matthew; Tao, Hanbo

In-memory computing based on non-volatile resistive memory can significantly improve the energy efficiency of artificial neural networks. However, accurate in situ training has been challenging due to the nonlinear and stochastic switching of the resistive memory elements. One promising analog memory is the electrochemical random-access memory (ECRAM), also known as the redox transistor. Its low write currents and linear switching properties across hundreds of analog states enable accurate and massively parallel updates of a full crossbar array, which yield rapid and energy-efficient training. While simulations predict that ECRAM-based neural networks achieve high training accuracy at significantly higher energy efficiency than digital implementations, these predictions have not been experimentally achieved. In this work, we train a 3 × 3 array of ECRAM devices that learns to discriminate several elementary logic gates (AND, OR, NAND). We record the evolution of the network’s synaptic weights during parallel in situ (on-line) training with outer product updates. Due to linear and reproducible device switching characteristics, our crossbar simulations not only accurately simulate the epochs to convergence, but also quantitatively capture the evolution of weights in individual devices. The implementation of the first in situ parallel training, together with strong agreement with simulation results, provides a significant advance toward developing ECRAM into larger crossbar arrays for artificial neural network accelerators, which could enable orders-of-magnitude improvements in the energy efficiency of deep neural networks.
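
As a rough illustration of what an outer product update on a 3 × 3 array looks like, the sketch below trains three threshold units (AND, OR, NAND) that share two inputs plus a bias line. It is an idealized software model and not the authors' experiment: the perceptron-style error rule, the learning rate `lr`, the zero initial weights, and all function names are assumptions made for the example, and the weights are unbounded floats rather than ECRAM conductance states.

```python
# Idealized 3x3 "crossbar": rows are output units (AND, OR, NAND),
# columns are input lines (a, b, constant bias). All names are hypothetical.

PATTERNS = [(0, 0, 1), (0, 1, 1), (1, 0, 1), (1, 1, 1)]   # (a, b, bias)
TARGETS = [(0, 0, 1), (0, 1, 1), (0, 1, 1), (1, 1, 0)]    # (AND, OR, NAND)


def outer(u, v):
    """Outer product u v^T as a nested list."""
    return [[ui * vj for vj in v] for ui in u]


def forward(W, x):
    """Matrix-vector product followed by a hard threshold at zero."""
    return [1 if sum(w * xi for w, xi in zip(row, x)) > 0 else 0 for row in W]


def train(patterns, targets, lr=0.25, max_epochs=100):
    """Update every weight at once with lr * outer(error, input)."""
    W = [[0.0] * 3 for _ in range(3)]
    for epoch in range(1, max_epochs + 1):
        mistakes = 0
        for x, t in zip(patterns, targets):
            y = forward(W, x)
            err = [ti - yi for ti, yi in zip(t, y)]
            if any(err):
                mistakes += 1
                dW = outer(err, x)  # one rank-1 update covers the whole array
                W = [[w + lr * d for w, d in zip(rw, rd)]
                     for rw, rd in zip(W, dW)]
        if mistakes == 0:
            return W, epoch       # an epoch with no errors: converged
    return W, None                # did not converge within max_epochs


W, epochs = train(PATTERNS, TARGETS)
print("epochs to convergence:", epochs)
for x in PATTERNS:
    print(x[:2], "->", forward(W, x))
```

Because each update is a single rank-1 matrix, every device in the array can in principle be adjusted in the same step, which is the property the abstract refers to as parallel training.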

An Analog Preconditioner for Solving Linear Systems

Proceedings - International Symposium on High-Performance Computer Architecture

Feinberg, Benjamin; Wong, Ryan; Xiao, Tianyao P.; Rohan, Jacob N.; Boman, Erik G.; Marinella, Matthew; Agarwal, Sapan; Ipek, Engin

Over the past decade, as Moore's Law has slowed, the need for new forms of computation that can provide sustainable performance improvements has risen. A new method, called in situ computing, has shown great potential to accelerate matrix-vector multiplication (MVM), an important kernel for a diverse range of applications from neural networks to scientific computing. Existing in situ accelerators for scientific computing, however, have a significant limitation: these accelerators provide no acceleration for preconditioning, a key bottleneck in linear solvers and in scientific computing workflows. This paper enables in situ acceleration for state-of-the-art linear solvers by demonstrating how to use a new in situ matrix inversion accelerator for analog preconditioning. As existing techniques that enable high precision and scalability for in situ MVM are inapplicable to in situ matrix inversion, new techniques to compensate for circuit non-idealities are proposed. Additionally, a new approach to bit slicing that enables splitting operands across multiple devices without external digital logic is proposed. For scalability, this paper demonstrates how in situ matrix inversion kernels can work in tandem with existing domain decomposition techniques to accelerate the solutions of arbitrarily large linear systems. The analog kernel can be directly integrated into existing preconditioning workflows, leveraging several well-optimized numerical linear algebra tools to improve the behavior of the circuit. The result is an analog preconditioner that is more effective (up to 50% fewer iterations) than the widely used incomplete LU factorization preconditioner, ILU(0), while also reducing the energy and execution time of each approximate solve operation by 1025x and 105x, respectively.
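
To make the role of a small inversion kernel inside a larger solver concrete, here is a minimal sketch of a block-Jacobi preconditioned conjugate gradient solve. It is only an analogy for the workflow described above: the exact 2 × 2 block inverse stands in for an inversion kernel, the test matrix and all function names are invented for the example, and nothing here models the analog circuit, bit slicing, or ILU(0).

```python
# Block-Jacobi preconditioned conjugate gradient in plain Python.
# Assumes A is symmetric positive definite with an even dimension.

def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def make_block_jacobi(A):
    """Return a function applying M^-1, where M keeps only A's 2x2 diagonal blocks."""
    inv_blocks = []
    for k in range(0, len(A), 2):
        a, b, c, d = A[k][k], A[k][k + 1], A[k + 1][k], A[k + 1][k + 1]
        det = a * d - b * c
        inv_blocks.append(((d / det, -b / det), (-c / det, a / det)))

    def apply(r):
        z = []
        for k, ((p, q), (s, t)) in enumerate(inv_blocks):
            r0, r1 = r[2 * k], r[2 * k + 1]
            z += [p * r0 + q * r1, s * r0 + t * r1]  # each block solved independently
        return z

    return apply


def pcg(A, b, apply_minv, tol=1e-10, max_iter=100):
    """Preconditioned CG; returns the solution and the iterations used."""
    x = [0.0] * len(b)
    r = b[:]
    z = apply_minv(r)
    p = z[:]
    rz = dot(r, z)
    for it in range(1, max_iter + 1):
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if dot(r, r) ** 0.5 < tol:
            return x, it
        z = apply_minv(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


# Hypothetical test problem: tridiagonal, diagonally dominant, hence SPD.
n = 8
A = [[(2.0 + i) if i == j else (-1.0 if abs(i - j) == 1 else 0.0)
      for j in range(n)] for i in range(n)]
b = [1.0] * n

x_prec, it_prec = pcg(A, b, make_block_jacobi(A))
x_plain, it_plain = pcg(A, b, lambda r: r[:])   # identity = no preconditioner
print("iterations with block-Jacobi:", it_prec, "| without:", it_plain)
```

The blocks are decoupled, so each one could be handed to a separate small solver; that is the sense in which a fixed-size inversion kernel can serve a system larger than itself.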

Analog architectures for neural network acceleration based on non-volatile memory

Applied Physics Reviews

Xiao, Tianyao P.; Bennett, Christopher; Feinberg, Benjamin; Agarwal, Sapan; Marinella, Matthew

Analog hardware accelerators, which perform computation within a dense memory array, have the potential to overcome the major bottlenecks faced by digital hardware for data-heavy workloads such as deep learning. Exploiting the intrinsic computational advantages of memory arrays, however, has proven to be challenging principally due to the overhead imposed by the peripheral circuitry and due to the non-ideal properties of memory devices that play the role of the synapse. We review the existing implementations of these accelerators for deep supervised learning, organizing our discussion around the different levels of the accelerator design hierarchy, with an emphasis on circuits and architecture. We explore and consolidate the various approaches that have been proposed to address the critical challenges faced by analog accelerators, for both neural network inference and training, and highlight the key design trade-offs underlying these techniques.

Mosaics, The Best of Both Worlds: Analog devices with Digital Spiking Communication to build a Hybrid Neural Network Accelerator

Aimone, James B.; Bennett, Christopher; Cardwell, Suma G.; Dellana, Ryan; Xiao, Tianyao P.

Neuromorphic architectures have seen a resurgence of interest in the past decade owing to 100x-1000x efficiency gains over conventional von Neumann architectures. Digital neuromorphic chips like Intel's Loihi have shown efficiency gains compared to GPUs and CPUs and can be scaled to build larger systems. Analog neuromorphic architectures promise even further savings in energy efficiency, area, and latency than their digital counterparts. Neuromorphic analog and digital technologies provide both low-power and configurable acceleration of challenging artificial intelligence (AI) algorithms. We present a hybrid analog-digital neuromorphic architecture that can amplify the advantages of both high-density analog memory and spike-based digital communication while mitigating the limitations of each approach.

Device-aware inference operations in SONOS nonvolatile memory arrays

IEEE International Reliability Physics Symposium Proceedings

Bennett, Christopher; Xiao, Tianyao P.; Dellana, Ryan; Feinberg, Benjamin; Agarwal, Sapan; Marinella, Matthew; Agrawal, Vineet; Prabhakar, Venkatraman; Ramkumar, Krishnaswamy; Hinh, Long; Saha, Swatilekha; Raghavan, Vijay; Chettuvetty, Ramesh

Non-volatile memory arrays can deploy pre-trained neural network models for edge inference. However, these systems are affected by device-level noise and retention issues. Here, we examine damage caused by these effects, introduce a mitigation strategy, and demonstrate its use in a fabricated array of SONOS (Silicon-Oxide-Nitride-Oxide-Silicon) devices. On MNIST, Fashion-MNIST, and CIFAR-10 tasks, our approach increases resilience to synaptic noise and drift. We also show that strong performance can be realized with ADCs of 5-8 bits of precision.
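
For a sense of what ADC precision means for an array readout, the following sketch quantizes the outputs of a small matrix-vector product with a uniform ADC at 5 to 8 bits and reports the worst-case rounding error. The weights, inputs, full-scale choice, and function names are all assumptions for illustration; this is not the mitigation strategy or the device model from the paper.

```python
# Uniform ADC model applied to the outputs of a small matrix-vector product.

def adc_quantize(value, bits, full_scale):
    """Clip to [-full_scale, +full_scale], then round to one of 2**bits levels."""
    step = 2.0 * full_scale / (2 ** bits - 1)
    clipped = max(-full_scale, min(full_scale, value))
    return round((clipped + full_scale) / step) * step - full_scale


def mvm(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


# Hypothetical weights and input activations (inputs lie in [0, 1]).
W = [[0.50, -0.25, 0.75],
     [-0.60, 0.10, 0.30]]
x = [0.9, 0.4, 0.2]

# Largest possible |output| for inputs in [0, 1] sets the ADC range.
full_scale = max(sum(abs(w) for w in row) for row in W)

ideal = mvm(W, x)
max_error = {}
for bits in range(5, 9):
    quantized = [adc_quantize(y, bits, full_scale) for y in ideal]
    max_error[bits] = max(abs(q - y) for q, y in zip(quantized, ideal))
    print(bits, "bits: max rounding error =", max_error[bits])
```

For in-range values the rounding error of this model is bounded by half a quantization step, so each extra bit roughly halves the bound.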

Plasticity-enhanced domain-wall MTJ neural networks for energy-efficient online learning

Proceedings - IEEE International Symposium on Circuits and Systems

Bennett, Christopher; Xiao, Tianyao P.; Cui, Can; Hassan, Naimul; Akinola, Otitoaleke G.; Incorvia, Jean A.C.; Velasquez, Alvaro; Friedman, Joseph S.; Marinella, Matthew

Machine learning implements backpropagation via abundant training samples. We demonstrate a multi-stage learning system realized by a promising non-volatile memory device, the domain-wall magnetic tunnel junction (DW-MTJ). The system consists of unsupervised (clustering) as well as supervised sub-systems, and generalizes quickly (with few samples). We demonstrate interactions between physical properties of this device and optimal implementation of neuroscience-inspired plasticity learning rules, and highlight performance on a suite of tasks. Our energy analysis confirms the value of the approach, as the learning budget stays below 20 µJ even for large tasks typically used in machine learning.
