Commutative Data Reordering: A New Technique to Reduce Data Movement Energy on Sparse Inference Workloads
Abstract not provided.
IEEE International Reliability Physics Symposium Proceedings
Non-volatile memory arrays can deploy pre-trained neural network models for edge inference. However, these systems are affected by device-level noise and retention issues. Here, we examine the damage caused by these effects, introduce a mitigation strategy, and demonstrate its use in a fabricated array of SONOS (Silicon-Oxide-Nitride-Oxide-Silicon) devices. On MNIST, Fashion-MNIST, and CIFAR-10 tasks, our approach increases resilience to synaptic noise and drift. We also show that strong performance can be realized with ADCs of 5 to 8 bits of precision.
Proceedings of SPIE - The International Society for Optical Engineering
The attachment of dopant precursor molecules to depassivated areas of hydrogen-terminated silicon templated with a scanning tunneling microscope (STM) has been used to create electronic devices with sub-nanometer precision, typically for quantum physics demonstrations, and to dope silicon past the solid-solubility limit, with potential applications in microelectronics and plasmonics. However, this process, which we call atomic precision advanced manufacturing (APAM), currently lacks the throughput required to develop sophisticated applications because there is no proven scalable hydrogen lithography pathway. Here, we demonstrate and characterize an APAM device workflow where STM lithography has been replaced with photolithography. An ultraviolet laser is shown to locally and controllably heat silicon above the temperature required for hydrogen depassivation. STM images indicate a narrow range of laser energy density in which hydrogen is depassivated and the surface remains well-ordered. A model for photothermal heating of silicon predicts a local temperature that is consistent with atomic-scale STM images of the photo-patterned regions. Finally, a simple device made by exposing photo-depassivated silicon to phosphine is found to have a carrier density and mobility similar to those of comparable devices patterned by STM.
ACS Photonics
A number of applications in basic science and technology would benefit from high-fidelity photon-number-resolving photodetectors. While some recent experimental progress has been made in this direction, the requirements for true photon number resolution are stringent, and no design currently exists that achieves this goal. Here we employ techniques from fundamental quantum optics to demonstrate that detectors composed of subwavelength elements interacting collectively with the photon field can achieve high-performance photon number resolution. We propose a new design that simultaneously achieves photon number resolution, high efficiency, low jitter, low dark counts, and high count rate. We discuss specific systems that satisfy the design requirements, pointing to the important role of nanoscale device elements.
Nature
Nuclear spins are highly coherent quantum objects. In large ensembles, their control and detection via magnetic resonance is widely exploited, for example, in chemistry, medicine, materials science and mining. Nuclear spins also featured in early proposals for solid-state quantum computers [1] and demonstrations of quantum search [2] and factoring [3] algorithms. Scaling up such concepts requires controlling individual nuclei, which can be detected when coupled to an electron [4–6]. However, the need to address the nuclei via oscillating magnetic fields complicates their integration in multi-spin nanoscale devices, because the field cannot be localized or screened. Control via electric fields would resolve this problem, but previous methods [7–9] relied on transducing electric signals into magnetic fields via the electron–nuclear hyperfine interaction, which severely affects nuclear coherence. Here we demonstrate the coherent quantum control of a single ¹²³Sb (spin-7/2) nucleus using localized electric fields produced within a silicon nanoelectronic device. The method exploits an idea proposed in 1961 [10] but not previously realized experimentally with a single nucleus. Our results are quantitatively supported by a microscopic theoretical model that reveals how the purely electrical modulation of the nuclear electric quadrupole interaction results in coherent nuclear spin transitions that are uniquely addressable owing to lattice strain. The spin dephasing time, 0.1 seconds, is orders of magnitude longer than those obtained by methods that require a coupled electron spin to achieve electrical driving. These results show that high-spin quadrupolar nuclei could be deployed as chaotic models, strain sensors and hybrid spin-mechanical quantum systems using all-electrical controls. Integrating electrically controllable nuclei with quantum dots [11,12] could pave the way to scalable, nuclear- and electron-spin-based quantum computers in silicon that operate without the need for oscillating magnetic fields.
Abstract not provided.
arXiv preprint
Abstract not provided.
Journal of Computational Physics
The fractional Laplacian in $\mathbb{R}^d$, which we write as $(-\Delta)^{\alpha/2}$ with $\alpha \in (0,2)$, has multiple equivalent characterizations. Moreover, in bounded domains, boundary conditions must be incorporated in these characterizations in mathematically distinct ways, and there is currently no consensus in the literature as to which definition of the fractional Laplacian in bounded domains is most appropriate for a given application. The Riesz (or integral) definition, for example, admits a nonlocal boundary condition, where the value of a function must be prescribed on the entire exterior of the domain in order to compute its fractional Laplacian. In contrast, the spectral definition requires only the standard local boundary condition. These differences, among others, lead us to ask the question: “What is the fractional Laplacian?” Beginning from first principles, we compare several commonly used definitions of the fractional Laplacian theoretically, through their stochastic interpretations as well as their analytical properties. Then, we present quantitative comparisons using a sample of state-of-the-art methods. We discuss recent advances on nonzero boundary conditions and present new methods to discretize such boundary value problems: radial basis function collocation (for the Riesz fractional Laplacian) and nonharmonic lifting (for the spectral fractional Laplacian). In our numerical studies, we aim to compare different definitions on bounded domains using a collection of benchmark problems. We consider the fractional Poisson equation with both zero and nonzero boundary conditions, where the fractional Laplacian is defined according to the Riesz definition, the spectral definition, the directional definition, and the horizon-based nonlocal definition. We verify the accuracy of the numerical methods used in the approximations for each operator, and we focus on identifying differences in the boundary behaviors of solutions to equations posed with these different definitions. Through our efforts, we aim to further engage the research community in open problems and assist practitioners in identifying the most appropriate definition and computational approach to use for their mathematical models in addressing anomalous transport in diverse applications.
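For orientation, the two definitions contrasted in this abstract are commonly written as follows. The abstract itself gives no formulas, so the notation here (the constant $C_{d,\alpha}$, the domain $\Omega$, the Dirichlet eigenpairs $(\lambda_k, \varphi_k)$ of $-\Delta$ on $\Omega$, and the subscript $S$) is an assumption based on standard usage, not taken from the paper.

```latex
% Riesz (integral) definition: u must be known on all of R^d,
% hence the nonlocal condition on the exterior R^d \ Omega.
\[
(-\Delta)^{\alpha/2} u(x)
  = C_{d,\alpha}\,\mathrm{P.V.}\!\int_{\mathbb{R}^d}
    \frac{u(x)-u(y)}{|x-y|^{d+\alpha}}\,dy,
\qquad
C_{d,\alpha}
  = \frac{2^{\alpha}\,\Gamma\!\left(\frac{d+\alpha}{2}\right)}
         {\pi^{d/2}\,\left|\Gamma\!\left(-\frac{\alpha}{2}\right)\right|}.
\]

% Spectral definition: built from the eigenpairs of the standard
% Laplacian on Omega, so only the local boundary condition enters.
\[
(-\Delta)_{S}^{\alpha/2} u(x)
  = \sum_{k \ge 1} \lambda_k^{\alpha/2}\,
    \langle u, \varphi_k \rangle\, \varphi_k(x),
\qquad
-\Delta \varphi_k = \lambda_k \varphi_k \ \text{in } \Omega,
\quad \varphi_k = 0 \ \text{on } \partial\Omega.
\]
```

The two operators coincide on all of $\mathbb{R}^d$ but generally differ on a bounded domain, which is the source of the differing boundary behavior the abstract refers to.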
Concurrency and Computation: Practice and Experience
As we approach exascale, computational parallelism will have to drastically increase in order to meet throughput targets. Many-core architectures have exacerbated this problem by trading clock speed, core complexity, and computation throughput for increased parallelism. This presents two major challenges for communication libraries such as MPI: the library must leverage the performance advantages of thread-level parallelism and avoid the scalability problems associated with increasing the number of processes to that scale. Hybrid programming models, such as MPI+X, have been proposed to address these challenges. MPI_THREAD_MULTIPLE is MPI's thread-safe mode. While there has been work to optimize it, it largely remains non-performant in most implementations. While current applications avoid MPI multithreading due to performance concerns, it is expected to be utilized in future applications. One of the major synchronous data structures required by MPI is the matching engine. In this paper, we present a parallel matching algorithm that can improve MPI matching for multithreaded applications. We then perform a feasibility study to demonstrate the performance benefit of the technique.
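To make the term "matching engine" concrete, the following is a minimal sketch of conventional serial MPI-style matching, not the parallel algorithm the paper proposes. It assumes a single communicator, and the class name `MatchingEngine`, its methods, and the wildcard values are hypothetical stand-ins rather than any MPI implementation's API.

```python
ANY_SOURCE = -1  # illustrative stand-in for MPI_ANY_SOURCE
ANY_TAG = -1     # illustrative stand-in for MPI_ANY_TAG


class MatchingEngine:
    """Serial matching over one communicator (hypothetical sketch)."""

    def __init__(self):
        self.posted = []      # posted receives (source, tag), in post order
        self.unexpected = []  # arrived messages with no matching receive yet

    @staticmethod
    def _matches(recv, source, tag):
        # A receive matches if each field is equal or is a wildcard.
        r_source, r_tag = recv
        return r_source in (ANY_SOURCE, source) and r_tag in (ANY_TAG, tag)

    def post_receive(self, source, tag):
        # Search the unexpected queue first, oldest message first.
        for i, (m_source, m_tag, payload) in enumerate(self.unexpected):
            if self._matches((source, tag), m_source, m_tag):
                del self.unexpected[i]
                return payload
        self.posted.append((source, tag))
        return None

    def arrive(self, source, tag, payload):
        # Match against posted receives in the order they were posted.
        for i, recv in enumerate(self.posted):
            if self._matches(recv, source, tag):
                del self.posted[i]
                return recv
        self.unexpected.append((source, tag, payload))
        return None
```

Both queues are searched in order and mutated on every match, so in a multithreaded setting each call would need to hold a lock for the whole search. That serialization is the kind of bottleneck the abstract points to.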
SIAM Journal on Mathematics of Data Science
Residual neural networks (ResNets) are a promising class of deep neural networks that have shown excellent performance for a number of learning tasks, e.g., image classification and recognition. Mathematically, ResNet architectures can be interpreted as forward Euler discretizations of a nonlinear initial value problem whose time-dependent control variables represent the weights of the neural network. Hence, training a ResNet can be cast as an optimal control problem of the associated dynamical system. For similar time-dependent optimal control problems arising in engineering applications, parallel-in-time methods have shown notable improvements in scalability. This paper demonstrates the use of those techniques for efficient and effective training of ResNets. The proposed algorithms replace the classical (sequential) forward and backward propagation through the network layers with a parallel nonlinear multigrid iteration applied to the layer domain. This adds a new dimension of parallelism across layers that is attractive when training very deep networks. From this basic idea, we derive multiple layer-parallel methods. The most efficient version employs a simultaneous optimization approach where updates to the network parameters are based on inexact gradient information in order to speed up the training process. Finally, using numerical examples from supervised classification, we demonstrate that the new approach achieves a training performance similar to that of traditional methods, but enables layer-parallelism and thus provides speedup over layer-serial methods through greater concurrency.
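The forward Euler interpretation mentioned above can be sketched in a few lines. This is an illustrative sequential forward pass only, not the paper's layer-parallel multigrid method. The residual function `tanh(W u + b)`, the step size `h`, and the function names are assumptions chosen for the example.

```python
import math


def residual(u, weights, bias):
    """Assumed layer function F(u, theta) = tanh(W u + b)."""
    return [
        math.tanh(sum(w * x for w, x in zip(row, u)) + b)
        for row, b in zip(weights, bias)
    ]


def resnet_forward(u0, thetas, h):
    """Sequential forward pass: u_{n+1} = u_n + h * F(u_n, theta_n).

    Each residual layer is one forward Euler step of du/dt = F(u, theta(t)),
    with the per-layer parameters theta_n acting as the control at time n*h.
    """
    u = list(u0)
    for weights, bias in thetas:
        f = residual(u, weights, bias)
        u = [x + h * fx for x, fx in zip(u, f)]
    return u
```

Each step in this loop depends on the previous one. That sequential dependency across layers is what the abstract's parallel multigrid iteration is meant to remove.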
Computers and Chemical Engineering
We study the solution of block-structured linear algebra systems arising in optimization by using iterative solution techniques. These systems are the core computational bottleneck of many problems of interest such as parameter estimation, optimal control, network optimization, and stochastic programming. Our approach uses a Krylov solver (GMRES) that is preconditioned with the alternating direction method of multipliers (ADMM). We show that this ADMM-GMRES approach overcomes well-known scalability issues of Schur complement decomposition in problems that exhibit a high degree of coupling. The effectiveness of the approach is demonstrated using linear systems that arise in stochastic optimal power flow problems and that contain up to 2 million total variables and 4000 coupling variables. We find that ADMM-GMRES is nearly an order of magnitude faster than Schur complement decomposition. Moreover, we demonstrate that the approach is robust to the selection of the augmented Lagrangian penalty parameter, which is a key advantage over the direct use of ADMM.
Abstract not provided.