Publications

Results 51–100 of 154

Analog architectures for neural network acceleration based on non-volatile memory

Applied Physics Reviews

Xiao, Tianyao P.; Bennett, Christopher; Feinberg, Benjamin; Agarwal, Sapan; Marinella, Matthew

Analog hardware accelerators, which perform computation within a dense memory array, have the potential to overcome the major bottlenecks faced by digital hardware for data-heavy workloads such as deep learning. Exploiting the intrinsic computational advantages of memory arrays, however, has proven to be challenging principally due to the overhead imposed by the peripheral circuitry and due to the non-ideal properties of memory devices that play the role of the synapse. We review the existing implementations of these accelerators for deep supervised learning, organizing our discussion around the different levels of the accelerator design hierarchy, with an emphasis on circuits and architecture. We explore and consolidate the various approaches that have been proposed to address the critical challenges faced by analog accelerators, for both neural network inference and training, and highlight the key design trade-offs underlying these techniques.

Tunnel-FET Switching Is Governed by Non-Lorentzian Spectral Line Shape

Proceedings of the IEEE

Vadlamani, Sri K.; Agarwal, Sapan; Limmer, David T.; Louie, Steven G.; Fischer, Felix R.; Yablonovitch, Eli

In tunnel field-effect transistors (tFETs), the preferred mechanism for switching occurs by alignment (on) or misalignment (off) of two energy levels or band edges. Unfortunately, energy levels are never perfectly sharp. When a quantum dot interacts with a wire, its energy is broadened. Its actual spectral shape controls the current/voltage response of such transistor switches, from on (aligned) to off (misaligned). The most common model of spectral line shape is the Lorentzian, which falls off as reciprocal energy offset squared. Unfortunately, this is too slow a turnoff, algebraically, to be useful as a transistor switch. Electronic switches generally demand an on/off ratio of at least a million. Steep exponentially falling spectral tails would be needed for rapid off-state switching. This requires a new electronic feature, not previously recognized: narrowband, heavy-effective-mass quantum-wire electrical contacts to the tunneling quantum states. These are a necessity for spectrally sharp switching.
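As a rough worked check of the abstract's scaling argument (our arithmetic, not the paper's), compare how far a Lorentzian of width Γ must be detuned to reach a 10^6 on/off ratio with an exponential tail of width w:

```latex
L(\Delta E) = \frac{\Gamma/2\pi}{(\Delta E)^2 + (\Gamma/2)^2}, \qquad
\frac{L(0)}{L(\Delta E)} \approx \frac{4(\Delta E)^2}{\Gamma^2} = 10^6
\;\Rightarrow\; \Delta E \approx 500\,\Gamma ,
```

whereas an exponential tail $e^{-|\Delta E|/w}$ falls to $10^{-6}$ already at $\Delta E = w \ln 10^6 \approx 13.8\,w$, which is why exponentially falling spectral tails are needed for a practical switch.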

PANTHER: A Programmable Architecture for Neural Network Training Harnessing Energy-Efficient ReRAM

IEEE Transactions on Computers

Ankit, Aayush; El Hajj, Izzat; Agarwal, Sapan; Marinella, Matthew; Foltin, Martin; Strachan, John P.; Milojicic, Dejan; Hwu, Wen M.; Roy, Kaushik

The wide adoption of deep neural networks has been accompanied by ever-increasing energy and performance demands due to the expensive nature of training them. Numerous special-purpose architectures have been proposed to accelerate training: both digital and hybrid digital-analog using resistive RAM (ReRAM) crossbars. ReRAM-based accelerators have demonstrated the effectiveness of ReRAM crossbars at performing matrix-vector multiplication operations that are prevalent in training. However, they still suffer from inefficiency due to the use of serial reads and writes for performing the weight gradient and update step. A few works have demonstrated the possibility of performing outer products in crossbars, which can be used to realize the weight gradient and update step without the use of serial reads and writes. However, these works have been limited to low precision operations which are not sufficient for typical training workloads. Moreover, they have been confined to a limited set of training algorithms for fully-connected layers only. To address these limitations, we propose a bit-slicing technique for enhancing the precision of ReRAM-based outer products, which is substantially different from bit-slicing for matrix-vector multiplication only. We incorporate this technique into a crossbar architecture with three variants catered to different training algorithms. To evaluate our design on different types of layers in neural networks (fully-connected, convolutional, etc.) and training algorithms, we develop PANTHER, an ISA-programmable training accelerator with compiler support. Our design can also be integrated into other accelerators in the literature to enhance their efficiency. Our evaluation shows that PANTHER achieves up to 8.02×, 54.21×, and 103× energy reductions as well as 7.16×, 4.02×, and 16× execution time reductions compared to digital accelerators, ReRAM-based accelerators, and GPUs, respectively.
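The core bit-slicing idea for outer products can be sketched in a few lines of Python (a hypothetical illustration; the slice width, digit ordering, and accumulation scheme are our assumptions, not PANTHER's actual datapath): each operand vector is split into low-precision slices, every slice pair yields a cheap low-precision outer product, and shift-and-add accumulation recovers the full-precision result.

```python
import numpy as np

def bit_slices(x, n_bits=8, slice_bits=2):
    """Decompose non-negative integers into slice_bits-wide digits, LSB first."""
    mask = (1 << slice_bits) - 1
    return [(x >> i) & mask for i in range(0, n_bits, slice_bits)]

def sliced_outer(a, b, n_bits=8, slice_bits=2):
    """Full-precision outer product assembled from low-precision slice pairs,
    as if each (i, j) partial product were computed on its own analog array."""
    total = np.zeros((len(a), len(b)), dtype=np.int64)
    for i, sa in enumerate(bit_slices(a, n_bits, slice_bits)):
        for j, sb in enumerate(bit_slices(b, n_bits, slice_bits)):
            total += np.outer(sa, sb).astype(np.int64) << (slice_bits * (i + j))
    return total

a = np.array([3, 200, 77], dtype=np.int64)
b = np.array([7, 151], dtype=np.int64)
result = sliced_outer(a, b)   # equals np.outer(a, b) exactly
```

Because a = Σᵢ aᵢ·2^(2i) and b = Σⱼ bⱼ·2^(2j), the shifted sum of slice outer products is exact; the paper's contribution is realizing this kind of multi-slice accumulation efficiently in ReRAM, not the arithmetic identity itself.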

Device-aware inference operations in SONOS nonvolatile memory arrays

IEEE International Reliability Physics Symposium Proceedings

Bennett, Christopher; Xiao, Tianyao P.; Dellana, Ryan; Feinberg, Benjamin; Agarwal, Sapan; Marinella, Matthew; Agrawal, Vineet; Prabhakar, Venkatraman; Ramkumar, Krishnaswamy; Hinh, Long; Saha, Swatilekha; Raghavan, Vijay; Chettuvetty, Ramesh

Non-volatile memory arrays can deploy pre-trained neural network models for edge inference. However, these systems are affected by device-level noise and retention issues. Here, we examine damage caused by these effects, introduce a mitigation strategy, and demonstrate its use in a fabricated array of SONOS (Silicon-Oxide-Nitride-Oxide-Silicon) devices. On MNIST, fashion-MNIST, and CIFAR-10 tasks, our approach increases resilience to synaptic noise and drift. We also show that strong performance can be realized with ADCs of 5–8 bit precision.
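As a minimal sketch of what 5–8 bit output precision implies (a toy model with an assumed symmetric full-scale range, not the paper's actual readout circuit), the analog sums read from such an array can be passed through a uniform quantizer:

```python
import numpy as np

def adc_quantize(x, bits, full_scale=1.0):
    """Uniformly quantize analog values in [-full_scale, full_scale) to an
    n-bit code, then map the codes back to analog levels."""
    levels = 2 ** bits
    step = 2 * full_scale / levels
    codes = np.clip(np.round(x / step), -levels // 2, levels // 2 - 1)
    return codes * step

x = np.array([0.03, -0.41, 0.77])   # hypothetical analog column outputs
y5 = adc_quantize(x, 5)             # 32 levels: visible rounding error
y8 = adc_quantize(x, 8)             # 256 levels: error under one step of 2/256
```

Sweeping `bits` over a simulated network's layer outputs is one simple way to reproduce the kind of precision-versus-accuracy trade-off the abstract reports.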

Extracting an Empirical Intermetallic Hydride Design Principle from Limited Data via Interpretable Machine Learning

Journal of Physical Chemistry Letters

Witman, Matthew D.; Ling, Sanliang; Grant, David M.; Walker, Gavin S.; Agarwal, Sapan; Stavila, Vitalie; Allendorf, Mark

An open question in the metal hydride community is whether there are simple, physics-based design rules that dictate the thermodynamic properties of these materials across the variety of structures and chemistry they can exhibit. While black box machine learning-based algorithms can predict these properties with some success, they do not directly provide the basis on which these predictions are made, therefore complicating the a priori design of novel materials exhibiting a desired property value. In this work we demonstrate how feature importance, as identified by a gradient boosting tree regressor, uncovers the strong dependence of the metal hydride equilibrium H2 pressure on a volume-based descriptor that can be computed from just the elemental composition of the intermetallic alloy. Elucidation of this simple structure-property relationship is valid across a range of compositions, metal substitutions, and structural classes exhibited by intermetallic hydrides. This permits rational targeting of novel intermetallics for high-pressure hydrogen storage (low-stability hydrides) by their descriptor values, and we predict a known intermetallic to form a low-stability hydride (as confirmed by density functional theory calculations) that has not yet been experimentally investigated.
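The workflow the abstract describes — fit a gradient-boosted tree regressor, then read off feature importances to find the dominant descriptor — looks roughly like the following (synthetic data and placeholder feature names; the paper's actual descriptors and hydride dataset are not reproduced here):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

feature_names = ["volume_per_atom", "electronegativity_diff", "radius_ratio"]

# Synthetic stand-in data: the target depends almost entirely on the first
# descriptor, mimicking the volume-dominated relationship the paper reports.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2.5 * X[:, 0] + 0.1 * rng.normal(size=200)

model = GradientBoostingRegressor(random_state=0).fit(X, y)
ranked = sorted(zip(feature_names, model.feature_importances_),
                key=lambda t: -t[1])
for name, importance in ranked:
    print(f"{name}: {importance:.3f}")
```

The interpretability step is simply that `feature_importances_` (total impurity reduction attributed to each feature across all trees) singles out the dominant descriptor, which can then be computed for candidate alloys without running the model.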

Energy and Performance Benchmarking of a Domain Wall-Magnetic Tunnel Junction Multibit Adder

IEEE Journal on Exploratory Solid-State Computational Devices and Circuits

Xiao, Tianyao P.; Bennett, Christopher; Hu, Xuan; Feinberg, Benjamin; Jacobs-Gedrim, Robin B.; Agarwal, Sapan; Brunhaver, John S.; Friedman, Joseph S.; Incorvia, Jean A.C.; Marinella, Matthew

The domain-wall (DW)-magnetic tunnel junction (MTJ) device implements universal Boolean logic in a manner that is naturally compact and cascadable. However, an evaluation of the energy efficiency of this emerging technology for standard logic applications is still lacking. In this article, we use a previously developed compact model to construct and benchmark a 32-bit adder built entirely from DW-MTJ devices that communicates with DW-MTJ registers. The results of this large-scale design and simulation indicate that while the energy cost of systems driven by spin-transfer torque (STT) DW motion is significantly higher than previously predicted, the same concept using spin-orbit torque (SOT) switching benefits from an improvement in the energy per operation by multiple orders of magnitude, attaining competitive energy values relative to a comparable CMOS subprocessor component. This result clarifies the path toward practical implementations of an all-magnetic processor system.

Low-Voltage, CMOS-Free Synaptic Memory Based on LiXTiO2 Redox Transistors

ACS Applied Materials and Interfaces

Li, Yiyang; Fuller, Elliot J.; Asapu, Shiva; Kurita, Tomochika; Agarwal, Sapan; Yang, J.J.; Talin, Albert A.

Neuromorphic computers based on analogue neural networks aim to substantially lower computing power by reducing the need to shuttle data between memory and logic units. Artificial synapses containing nonvolatile analogue conductance states enable direct computation using memory elements; however, most nonvolatile analogue memories require high write voltages and large current densities and are accompanied by nonlinear and unpredictable weight updates. Here, we develop an inorganic redox transistor based on electrochemical lithium-ion insertion into LiXTiO2 that displays linear weight updates at both low current densities and low write voltages. The write voltage, as low as 200 mV at room temperature, is achieved by minimizing the open-circuit voltage and using a low-voltage diffusive memristor selector. We further show that the LiXTiO2 redox transistor can achieve an extremely sharp transistor subthreshold slope of just 40 mV/decade when operating in an electrochemically driven phase transformation regime.

Wafer-Scale TaOx Device Variability and Implications for Neuromorphic Computing Applications

IEEE International Reliability Physics Symposium Proceedings

Bennett, Christopher; Garland, Diana; Jacobs-Gedrim, Robin B.; Agarwal, Sapan; Marinella, Matthew

Scaling arrays of non-volatile memory devices from academic demonstrations to reliable, manufacturable systems requires a better understanding of variability at array and wafer-scale levels. CrossSim models the accuracy of neural networks implemented on an analog resistive memory accelerator using the cycle-to-cycle variability of a single device. In this work, we extend this modeling tool to account for device-to-device variation in a realistic way, and evaluate the impact of this reliability issue in the context of neuromorphic online learning tasks.
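One way to picture the distinction the abstract draws (a toy noise model assumed for illustration, not CrossSim's calibrated statistics): cycle-to-cycle variability is fresh noise on every programming pulse, while device-to-device variability is a fixed, per-device deviation that never averages out across updates.

```python
import numpy as np

rng = np.random.default_rng(1)
n_devices = 1000

# Fixed per-device gain error: drawn once, reused for every update (d2d).
d2d_gain = rng.normal(1.0, 0.2, size=n_devices)

def program_step(weights, delta, c2c_sigma=0.01):
    """Apply one analog weight increment with both noise sources."""
    c2c_noise = rng.normal(0.0, c2c_sigma, size=weights.shape)  # fresh each cycle
    return weights + delta * d2d_gain + c2c_noise

w = np.zeros(n_devices)
for _ in range(50):
    w = program_step(w, 0.01)   # target after 50 steps: 0.5 everywhere
# Relative to the growing signal, c2c noise partially averages out over many
# updates; the d2d gain error produces a persistent spread across devices.
```

Modeling only the cycle-to-cycle term (as in the original CrossSim flow the abstract mentions) would miss the persistent per-device spread that this work adds.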

Designing and modeling analog neural network training accelerators

2019 International Symposium on VLSI Technology, Systems and Application, VLSI-TSA 2019

Agarwal, Sapan; Jacobs-Gedrim, Robin B.; Bennett, Christopher; Hsia, Alexander W.; Adee, Shane M.; Hughart, David R.; Fuller, Elliot J.; Li, Yiyang; Talin, Albert A.; Marinella, Matthew

Analog crossbars have the potential to reduce the energy and latency required to train a neural network by three orders of magnitude when compared to an optimized digital ASIC. The crossbar simulator, CrossSim, can be used to model device nonidealities and determine what device properties are needed to create an accurate neural network accelerator. Experimentally measured device statistics are used to simulate neural network training accuracy and compare different classes of devices including TaOx ReRAM, LiCoO2 devices, and conventional floating-gate SONOS memories. A technique called 'Periodic Carry' can overcome device nonidealities by using a positional number system while maintaining the benefit of parallel analog matrix operations.
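The 'Periodic Carry' idea can be sketched abstractly (a simplified model with an assumed base and device count; the CrossSim implementation differs in detail): a weight is held as digits across several low-precision devices in a positional base, small training updates touch only the least-significant digit, and an occasional carry step renormalizes the digits so no single device needs large dynamic range.

```python
BASE = 8        # conductance states per device (assumed)
N_DIGITS = 3    # devices per weight (assumed)

def carry(digits):
    """Propagate overflow from low-order to high-order digits in place."""
    for i in range(len(digits) - 1):
        q, digits[i] = divmod(digits[i], BASE)
        digits[i + 1] += q
    return digits

def value(digits):
    """Weight represented by the digit vector, LSB first."""
    return sum(d * BASE**i for i, d in enumerate(digits))

digits = [0] * N_DIGITS
for _ in range(100):      # many small updates land on the LSB device only
    digits[0] += 1
carry(digits)             # periodic renormalization: 100 -> [4, 4, 1]
```

Between carry steps, every parallel analog update only has to nudge one device by one state, which is what lets the scheme tolerate limited per-device precision.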

Parallel programming of an ionic floating-gate memory array for scalable neuromorphic computing

Science

Fuller, Elliot J.; Keene, Scott T.; Melianas, Armantas; Wang, Zhongrui; Asapu, Shiva; Agarwal, Sapan; Li, Yiyang; Tuchman, Yaakov; James, Conrad D.; Marinella, Matthew; Yang, J.J.; Salleo, Alberto; Talin, Albert A.

Neuromorphic computers could overcome efficiency bottlenecks inherent to conventional computing through parallel programming and readout of artificial neural network weights in a crossbar memory array. However, selective and linear weight updates and <10-nanoampere read currents are required for learning that surpasses conventional computing efficiency. We introduce an ionic floating-gate memory array based on a polymer redox transistor connected to a conductive-bridge memory (CBM). Selective and linear programming of a redox transistor array is executed in parallel by overcoming the bridging threshold voltage of the CBMs. Synaptic weight readout with currents <10 nanoamperes is achieved by diluting the conductive polymer with an insulator to decrease the conductance. The redox transistors endure >1 billion write-read operations and support >1-megahertz write-read frequencies.

Sparse Data Acquisition on Emerging Memory Architectures

IEEE Access

Quach, Tu T.; Agarwal, Sapan; James, Conrad D.; Marinella, Matthew; Aimone, James B.

Emerging memory devices, such as resistive crossbars, have the capacity to store large amounts of data in a single array. Acquiring the data stored in large-capacity crossbars in a sequential fashion can become a bottleneck. We present practical methods, based on sparse sampling, to quickly acquire sparse data stored on emerging memory devices that support the basic summation kernel, reducing the acquisition time from linear to sub-linear. The experimental results show that at least an order of magnitude improvement in acquisition time can be achieved when the data are sparse. In addition, we show that the energy cost associated with our approach is competitive with that of the sequential method.
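For intuition on how a summation kernel makes acquisition sub-linear (a toy sketch for a single nonzero entry; the paper's sparse-sampling methods handle general sparse data), group sums let a host binary-search for the nonzero location in O(log n) analog queries instead of n serial reads:

```python
import numpy as np

def group_sum(data, lo, hi):
    """The one primitive the memory array must provide: sum a range of cells."""
    return data[lo:hi].sum()

def find_nonzero(data):
    """Binary-search a 1-sparse array using only group sums."""
    lo, hi = 0, len(data)
    queries = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        queries += 1
        if group_sum(data, lo, mid) != 0:   # nonzero lies in the left half
            hi = mid
        else:                               # otherwise it is in the right half
            lo = mid
    return lo, queries

n = 1 << 16
data = np.zeros(n)
data[12345] = 3.7
pos, q = find_nonzero(data)   # locates index 12345 with 16 queries, not 65536 reads
```

Generalizing from one nonzero to k nonzeros is where the sparse-sampling machinery comes in, but the query count stays sub-linear in the array size.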

Training a Neural Network on Analog TaOx ReRAM Devices Irradiated With Heavy Ions: Effects on Classification Accuracy Demonstrated With CrossSim

IEEE Transactions on Nuclear Science

Jacobs-Gedrim, Robin B.; Hughart, David R.; Agarwal, Sapan; Vizkelethy, Gyorgy; Bielejec, Edward S.; Vaandrager, Bastiaan L.; Swanson, Scot E.; Knisely, Katherine; Taggart, Jennifer L.; Barnaby, Hugh L.; Marinella, Matthew

The image classification accuracy of a TaOx ReRAM-based neuromorphic computing accelerator is evaluated after intentionally inducing displacement damage up to a fluence of 10¹⁴ 2.5-MeV Si ions/cm² on the analog devices that are used to store weights. Results are consistent with a radiation-induced oxygen vacancy production mechanism. When the device is in the high-resistance state during heavy ion irradiation, the device resistance, linearity, and accuracy after training are only affected by high fluence levels. These findings are consistent with the results of previous studies on TaOx-based digital resistive random access memory. When the device is in the low-resistance state during irradiation, no resistance change was detected, but devices with a 4-kΩ inline resistor did show a reduction in accuracy after training at 10¹⁴ 2.5-MeV Si ions/cm². This indicates that changes in resistance can only be somewhat correlated with changes to devices' analog properties. This work demonstrates that TaOx devices are radiation tolerant not only for high-radiation-environment digital memory applications but also when operated in an analog mode suitable for neuromorphic computation and training on new data sets.

All-Solid-State Synaptic Transistor with Ultralow Conductance for Neuromorphic Computing

Advanced Functional Materials

Talin, Albert A.; Fuller, Elliot J.; Agarwal, Sapan

Electronic synaptic devices are important building blocks for neuromorphic computational systems that can go beyond the constraints of the von Neumann architecture. Although two-terminal memristive devices have been demonstrated as possible candidates, they suffer from several shortcomings related to the filament formation mechanism, including nonlinear switching, write noise, and high device conductance, all of which limit accuracy and energy efficiency. Electrochemical three-terminal transistors, in which the channel conductance can be tuned without filament formation, provide an alternative platform for synaptic electronics. In this work, an all-solid-state electrochemical transistor made with a Li ion–based solid dielectric and 2D α-phase molybdenum oxide (α-MoO3) nanosheets as the channel is demonstrated. These devices achieve nonvolatile conductance modulation in an ultralow conductance regime (<75 nS) by reversible intercalation of Li ions into the α-MoO3 lattice. Based on this operating mechanism, the essential functionalities of synapses, such as short- and long-term synaptic plasticity and bidirectional near-linear analog weight update, are demonstrated. Simulations using handwritten-digit data sets demonstrate high recognition accuracy (94.1%) of the synaptic transistor arrays. These results provide insight into the application of 2D oxides for large-scale, energy-efficient neuromorphic computing networks.
