Publications

Results 51–75 of 121

Mathematical optimizations for deep learning

Cyber-Physical Systems Security

Green, Sam G.; Vineyard, Craig M.; Koc, Cetin K.

Deep neural networks are often computationally expensive, during both the training stage and the inference stage. Training is always expensive, because back-propagation requires high-precision floating-point multiplication and addition. However, various mathematical optimizations may be employed to reduce the computational cost of inference. Optimized inference is important for reducing power consumption and latency and for increasing throughput. This chapter introduces the central approaches for optimizing deep neural network inference: pruning "unnecessary" weights, quantizing weights and inputs, sharing weights between layer units, compressing weights before transferring them from main memory, distilling large high-performance models into smaller models, and decomposing convolutional filters to reduce multiply-and-accumulate operations. Using a unified notation, we provide a mathematical and algorithmic description of each of these optimization methods.
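
To make the quantization approach above concrete, here is a minimal NumPy sketch of symmetric 8-bit weight quantization for a dense layer. The layer shape, per-tensor scale rule, and data are illustrative assumptions, not drawn from the chapter itself.

```python
# Minimal sketch: symmetric linear quantization of a weight matrix to int8.
# Shapes and data are illustrative, not from the chapter.
import numpy as np

def quantize_int8(w: np.ndarray):
    """Map float weights to int8 with a single per-tensor scale."""
    scale = np.max(np.abs(w)) / 127.0          # largest magnitude maps to 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.1, size=(256, 128)).astype(np.float32)  # a dense layer
x = rng.normal(size=(128,)).astype(np.float32)

q, s = quantize_int8(w)
y_full = w @ x                       # full-precision inference
y_quant = dequantize(q, s) @ x       # inference with quantized weights
print("mean abs error:", np.mean(np.abs(y_full - y_quant)))
```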

Impacts of Mathematical Optimizations on Reinforcement Learning Policy Performance

Proceedings of the International Joint Conference on Neural Networks

Green, Sam G.; Vineyard, Craig M.; Koc, Cetin K.

Deep neural networks (DNNs) now outperform competing methods in many academic and industrial domains. These high-capacity universal function approximators have recently been leveraged by deep reinforcement learning (RL) algorithms to obtain impressive results for many control and decision-making problems. During the past three years, research on pruning, quantization, and compression of DNNs has reduced the mathematical, and therefore the time and energy, requirements of DNN-based inference. For example, DNN optimization techniques have been developed that reduce the storage requirements of VGG-16 from 552 MB to 11.3 MB while maintaining full-model accuracy for image classification. Building on these DNN optimization results, the computer architecture community is taking increasing interest in exploring DNN hardware accelerator designs. Given recent deep RL performance, we expect hardware designers to begin considering architectures appropriate for accelerating these algorithms as well. However, it is currently unknown how, when, or if the 'noise' introduced by DNN optimization techniques will degrade deep RL performance. This work measures these impacts using standard OpenAI Gym benchmarks. Our results show that mathematically optimized RL policies can perform on par with full-precision RL while requiring substantially less computation. We also observe that some optimizations are better suited to particular problem domains than others. By beginning to understand the impacts of mathematical optimizations on RL policy performance, this work serves as a starting point toward the development of low-power or high-performance deep RL accelerators.
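
As a rough illustration of the kind of optimization whose effect the paper measures, the sketch below applies magnitude-based pruning to a small policy network's weight matrix. The network shape, 90% sparsity target, and helper names are invented for the example.

```python
# Hedged sketch: magnitude-based pruning of a policy network's weights,
# one of the DNN optimizations whose impact on RL the paper measures.
import numpy as np

def magnitude_prune(w: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude weights until roughly `sparsity`
    fraction of entries are zero."""
    k = int(sparsity * w.size)
    if k == 0:
        return w.copy()
    threshold = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    pruned = w.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

rng = np.random.default_rng(1)
policy_w = rng.normal(0, 0.5, size=(64, 4))   # e.g., hidden units -> action logits
pruned_w = magnitude_prune(policy_w, sparsity=0.9)
print("fraction zeroed:", np.mean(pruned_w == 0))   # ~0.9 of weights removed
# In the paper's setting, one would then roll out the pruned policy in an
# OpenAI Gym environment and compare returns against the full-precision policy.
```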

Computing with spikes: The advantage of fine-grained timing

Neural Computation

Verzi, Stephen J.; Rothganger, Fredrick R.; Parekh, Ojas D.; Quach, Tu-Thach Q.; Miner, Nadine E.; Vineyard, Craig M.; James, Conrad D.; Aimone, James B.

Neural-inspired spike-based computing machines often claim to achieve considerable advantages in energy and time efficiency by using spikes for computation and communication. However, fundamental questions about spike-based computation remain unanswered. For instance, how much advantage do spike-based approaches have over conventional methods, and under what circumstances does spike-based computing provide a comparative advantage? Simply implementing existing algorithms using spikes as the medium of computation and communication is not guaranteed to yield an advantage. Here, we demonstrate that spike-based communication and computation within algorithms can increase throughput, and in some cases can decrease energy cost. We present several spiking algorithms, including sorting a set of numbers in ascending or descending order and finding the maximum, minimum, or median of a set of numbers. We also provide an example application: a spiking median-filtering approach for image processing that offers a low-energy, parallel implementation. The algorithms and analyses presented here demonstrate that spiking algorithms can provide performance advantages and offer efficient computation of fundamental operations useful in more complex algorithms.
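
The following sketch illustrates the order-statistic idea in simplified form: if each value fires a spike at a time proportional to its magnitude, the k-th spike to arrive identifies the k-th smallest value, which directly yields a median filter. The discrete simulation and 3x3 neighborhood are illustrative assumptions, not the paper's implementation.

```python
# Sketch of computing order statistics with spike timing: fire time encodes
# value, so arrival order encodes rank. Illustrative only.
import numpy as np

def spiking_kth_smallest(values, k):
    """Return the value whose spike arrives k-th (1-indexed) in a race
    where each value fires at a time equal to its magnitude."""
    spike_times = np.asarray(values)
    order = np.argsort(spike_times, kind="stable")   # arrival order
    return values[order[k - 1]]

def spiking_median_filter(img):
    """3x3 median filter: each neighborhood median is read off as the
    5th of 9 spikes to arrive."""
    h, w = img.shape
    out = img.copy()
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            patch = img[i-1:i+2, j-1:j+2].ravel()
            out[i, j] = spiking_kth_smallest(patch, k=5)
    return out

noisy = np.array([[1, 1, 1, 1],
                  [1, 9, 1, 1],     # a salt-noise pixel
                  [1, 1, 1, 1],
                  [1, 1, 1, 1]], dtype=float)
print(spiking_median_filter(noisy)[1, 1])   # -> 1.0, noise removed
```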

Neural-Inspired Anomaly Detection

Springer Proceedings in Complexity

Verzi, Stephen J.; Vineyard, Craig M.; Aimone, James B.

Anomaly detection is an important problem in various fields of complex systems research including image processing, data analysis, physical security and cybersecurity. In image processing, it is used for removing noise while preserving image quality, and in data analysis, physical security and cybersecurity, it is used to find interesting data points, objects or events in a vast sea of information. Anomaly detection will continue to be an important problem in domains intersecting with “Big Data”. In this paper we provide a novel algorithm for anomaly detection that uses phase-coded spiking neurons as basic computational elements.
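
As a loose illustration only, and not the paper's algorithm, the sketch below phase-encodes scalar values on a reference cycle and flags points whose phase lies far from the population's circular mean. The encoding range and threshold are invented for the example.

```python
# Loose illustration of phase coding for anomaly detection: values map to
# phases on a reference oscillation; outliers sit far from the circular mean.
import numpy as np

def phase_encode(x, lo, hi):
    """Map values in [lo, hi] to phases in [0, 2*pi)."""
    return 2 * np.pi * (np.asarray(x, dtype=float) - lo) / (hi - lo)

def phase_anomalies(x, lo, hi, threshold=1.0):
    phases = phase_encode(x, lo, hi)
    # Circular mean of the population's phases.
    mean = np.angle(np.mean(np.exp(1j * phases)))
    # Smallest angular distance of each phase from the circular mean.
    dist = np.abs(np.angle(np.exp(1j * (phases - mean))))
    return dist > threshold

data = [5.0, 5.2, 4.9, 5.1, 9.7, 5.0]          # one outlier
print(phase_anomalies(data, lo=0.0, hi=10.0))  # flags index 4
```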

A spike-timing neuromorphic architecture

2017 IEEE International Conference on Rebooting Computing, ICRC 2017 - Proceedings

Hill, Aaron J.; Donaldson, Jonathon W.; Rothganger, Fredrick R.; Vineyard, Craig M.; Follett, David R.; Follett, Pamela L.; Smith, Michael R.; Verzi, Stephen J.; Severa, William M.; Wang, Felix W.; Aimone, James B.; Naegle, John H.; James, Conrad D.

Unlike general-purpose computer architectures, which are built from complex processor cores and compute sequentially, the brain is innately parallel and contains highly complex connections between its computational units (neurons). Key to the brain's architecture is functionality enabled by the combined effect of spiking communication and sparse connectivity with unique, variable efficacies and temporal latencies. Utilizing these neuroscience principles, we have developed the Spiking Temporal Processing Unit (STPU) architecture, which is well suited to areas such as pattern recognition and natural language processing. In this paper, we formally describe the STPU, implement it on a field-programmable gate array, and present measured performance data.
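
A toy software model of the temporal mechanism described above, a leaky integrate-and-fire neuron fed by synapses with individual weights and delays, is sketched below. All parameter values are illustrative, and the paper's FPGA implementation is of course far more elaborate.

```python
# Toy model: one leaky integrate-and-fire neuron where synapse i delivers
# weights[i] at time spike_times[i] + delays[i]. Illustrative only.
def lif_with_delays(spike_times, weights, delays,
                    leak=0.9, threshold=1.0, t_max=20):
    """Simulate one LIF neuron; returns its output spike times."""
    arrivals = {}
    for t, w, d in zip(spike_times, weights, delays):
        arrivals[t + d] = arrivals.get(t + d, 0.0) + w
    v, out = 0.0, []
    for t in range(t_max):
        v = leak * v + arrivals.get(t, 0.0)    # decay, then integrate input
        if v >= threshold:
            out.append(t)
            v = 0.0                            # reset after firing
    return out

# Two input spikes at t=0; the synaptic delays align their arrival at t=3,
# so the combined weight crosses threshold, which neither spike would alone.
print(lif_with_delays(spike_times=[0, 0], weights=[0.6, 0.6], delays=[3, 3]))
```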

A Case Study on Neural Inspired Dynamic Memory Management Strategies for High Performance Computing

Vineyard, Craig M.; Verzi, Stephen J.

As high-performance computing architectures pursue greater computational power, they need increased memory capacity and bandwidth as well. A multi-level memory (MLM) architecture addresses this need by combining multiple memory types with different characteristics as varying levels of the same architecture. How to efficiently utilize this memory infrastructure remains an open challenge, and in this research we investigated whether neural-inspired approaches can meaningfully help with memory management. In particular, we explored neurogenesis-inspired resource allocation and showed that a neural-inspired mixed controller policy can beneficially impact how MLM architectures utilize memory.
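
Purely as a hypothetical illustration of a mixed controller policy, the sketch below decides which pages reside in a small fast memory versus a large slow one based on access frequency. The promotion rule and capacities are invented here and are not the paper's neurogenesis-inspired policy.

```python
# Hypothetical sketch: a tiny frequency-based controller for a two-level
# memory. The rule and capacities are invented for illustration.
from collections import Counter

class MixedMemoryController:
    def __init__(self, fast_capacity=4):
        self.fast_capacity = fast_capacity
        self.accesses = Counter()      # per-page access frequency
        self.fast = set()              # pages currently in fast memory

    def access(self, page):
        self.accesses[page] += 1
        if page not in self.fast:
            if len(self.fast) < self.fast_capacity:
                self.fast.add(page)    # room available: promote immediately
            else:
                # Evict the coldest resident page if this one is hotter.
                coldest = min(self.fast, key=lambda p: self.accesses[p])
                if self.accesses[page] > self.accesses[coldest]:
                    self.fast.remove(coldest)
                    self.fast.add(page)
        return "fast" if page in self.fast else "slow"

ctrl = MixedMemoryController()
trace = [1, 2, 1, 3, 1, 4, 5, 1, 2, 2, 6, 2]
print([ctrl.access(p) for p in trace])
```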

Optimization-based computation with spiking neurons

Proceedings of the International Joint Conference on Neural Networks

Verzi, Stephen J.; Vineyard, Craig M.; Vugrin, Eric D.; Sahakian, Meghan A.; James, Conrad D.; Aimone, James B.

Considerable effort is currently being spent designing neuromorphic hardware to address challenging problems in a variety of pattern-matching applications. These neuromorphic systems offer low-power architectures with intrinsically parallel, simple spiking-neuron processing elements. Unfortunately, these new hardware architectures have been developed largely without a clear justification for using spiking neurons to compute quantities for problems of interest. Specifically, the use of spiking to encode information in time has not been explored theoretically, with complexity analysis, to examine the operating conditions under which neuromorphic computing provides a computational advantage (time, space, power, etc.). In this paper, we present and formally analyze the use of temporal coding in a neural-inspired algorithm for optimization-based computation in neural spiking architectures.
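
The sketch below illustrates the temporal-coding idea under analysis in simplified form: each candidate's cost is encoded as a spike latency, and the first spike to fire selects the minimizer, so run time scales with the answer's magnitude rather than with a full comparison-based scan. It assumes small non-negative integer costs and is not the paper's algorithm.

```python
# Sketch: argmin via a race of spike latencies. Assumes non-negative
# integer costs; the neuron whose cost equals the clock fires first.
def spiking_argmin(costs):
    """Return (index, cost) of the minimum via time-to-first-spike."""
    t = 0
    while True:
        for i, c in enumerate(costs):
            if c == t:               # neuron i fires once the clock reaches its cost
                return i, c
        t += 1

print(spiking_argmin([7, 3, 9, 5]))   # -> (1, 3)
```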
