Publications

Results 101–125 of 9,998

Search results

Jump to search filters

An Accurate, Error-Tolerant, and Energy-Efficient Neural Network Inference Engine Based on SONOS Analog Memory

IEEE Transactions on Circuits and Systems I: Regular Papers

Xiao, Tianyao X.; Feinberg, Benjamin F.; Bennett, Christopher H.; Agrawal, Vineet; Saxena, Prashant; Prabhakar, Venkatraman; Ramkumar, Krishnaswamy; Medu, Harsha; Raghavan, Vijay; Chettuvetty, Ramesh; Agarwal, Sapan A.; Marinella, Matthew J.

We demonstrate SONOS (silicon-oxide-nitride-oxide-silicon) analog memory arrays that are optimized for neural network inference. The devices are fabricated in a 40nm process and operated in the subthreshold regime for in-memory matrix multiplication. Subthreshold operation enables low conductances to be implemented with low error, which matches the typical weight distribution of neural networks, which is heavily skewed toward near-zero values. This leads to high accuracy in the presence of programming errors and process variations. We simulate the end-To-end neural network inference accuracy, accounting for the measured programming error, read noise, and retention loss in a fabricated SONOS array. Evaluated on the ImageNet dataset using ResNet50, the accuracy using a SONOS system is within 2.16% of floating-point accuracy without any retraining. The unique error properties and high On/Off ratio of the SONOS device allow scaling to large arrays without bit slicing, and enable an inference architecture that achieves 20 TOPS/W on ResNet50, a > 10× gain in energy efficiency over state-of-The-Art digital and analog inference accelerators.

More Details

Evaluating Spatial Accelerator Architectures with Tiled Matrix-Matrix Multiplication

IEEE Transactions on Parallel and Distributed Systems

Moon, Gordon E.; Kwon, Hyoukjun; Jeong, Geonhwa; Chatarasi, Prasanth; Rajamanickam, Sivasankaran R.; Krishna, Tushar

There is a growing interest in custom spatial accelerators for machine learning applications. These accelerators employ a spatial array of processing elements (PEs) interacting via custom buffer hierarchies and networks-on-chip. The efficiency of these accelerators comes from employing optimized dataflow (i.e., spatial/temporal partitioning of data across the PEs and fine-grained scheduling) strategies to optimize data reuse. The focus of this work is to evaluate these accelerator architectures using a tiled general matrix-matrix multiplication (GEMM) kernel. To do so, we develop a framework that finds optimized mappings (dataflow and tile sizes) for a tiled GEMM for a given spatial accelerator and workload combination, leveraging an analytical cost model for runtime and energy. Our evaluations over five spatial accelerators demonstrate that the tiled GEMM mappings systematically generated by our framework achieve high performance on various GEMM workloads and accelerators.

More Details

Evaluating Spatial Accelerator Architectures with Tiled Matrix-Matrix Multiplication

IEEE Transactions on Parallel and Distributed Systems

Moon, Gordon E.; Kwon, Hyoukjun; Jeong, Geonhwa; Chatarasi, Prasanth; Rajamanickam, Sivasankaran R.; Krishna, Tushar

There is a growing interest in custom spatial accelerators for machine learning applications. These accelerators employ a spatial array of processing elements (PEs) interacting via custom buffer hierarchies and networks-on-chip. The efficiency of these accelerators comes from employing optimized dataflow (i.e., spatial/temporal partitioning of data across the PEs and fine-grained scheduling) strategies to optimize data reuse. The focus of this work is to evaluate these accelerator architectures using a tiled general matrix-matrix multiplication (GEMM) kernel. To do so, we develop a framework that finds optimized mappings (dataflow and tile sizes) for a tiled GEMM for a given spatial accelerator and workload combination, leveraging an analytical cost model for runtime and energy. Our evaluations over five spatial accelerators demonstrate that the tiled GEMM mappings systematically generated by our framework achieve high performance on various GEMM workloads and accelerators.

More Details

Large-Scale Atomistic Simulations: Investigating Free Expansion

Moore, Stan G.

The experiment investigates free expansion of a supercritical fluid into a two-phase liquid-vapor coexistence region. A huge molecular dynamics simulation (6 billion Lennard-Jones atoms) was run on 5760 GPUs (33% of LLNL Sierra) using LAMMPS/Kokkos software. This improved visualization workflow and started preliminary simulations of aluminum using SNAP machine learning potential.

More Details

Elucidating size effects on the yield strength of single-crystal Cu via the Richtmyer-Meshkov instability

Journal of Applied Physics

Stewart, James A.; Wood, Mitchell A.; Olles, Joseph D.

Capturing the dynamic response of a material under high strain-rate deformation often demands challenging and time consuming experimental effort. While shock hydrodynamic simulation methods can aid in this area, a priori characterizations of the material strength under shock loading and spall failure are needed in order to parameterize constitutive models needed for these computational tools. Moreover, parameterizations of strain-rate-dependent strength models are needed to capture the full suite of Richtmyer-Meshkov instability (RMI) behavior of shock compressed metals, creating an unrealistic demand for these training data solely on experiments. Herein, we sweep a large range of geometric, crystallographic, and shock conditions within molecular dynamics (MD) simulations and demonstrate the breadth of RMI in Cu that can be captured from the atomic scale. Yield strength measurements from jetted and arrested material from a sinusoidal surface perturbation were quantified as Y RMI = 0.787 ± 0.374 GPa, higher than strain-rate-independent models used in experimentally matched hydrodynamic simulations. Defect-free, single-crystal Cu samples used in MD will overestimate Y RMI, but the drastic scale difference between experiment and MD is highlighted by high confidence neighborhood clustering predictions of RMI characterizations, yielding incorrect classifications.

More Details

Navier-Stokes Equations Do Not Describe the Smallest Scales of Turbulence in Gases

Physical Review Letters

McMullen, Ryan M.; Krygier, Michael K.; Torczynski, J.R.; Gallis, Michail A.

In turbulent flows, kinetic energy is transferred from the largest scales to progressively smaller scales, until it is ultimately converted into heat. The Navier-Stokes equations are almost universally used to study this process. Here, by comparing with molecular-gas-dynamics simulations, we show that the Navier-Stokes equations do not describe turbulent gas flows in the dissipation range because they neglect thermal fluctuations. We investigate decaying turbulence produced by the Taylor-Green vortex and find that in the dissipation range the molecular-gas-dynamics spectra grow quadratically with wave number due to thermal fluctuations, in agreement with previous predictions, while the Navier-Stokes spectra decay exponentially. Furthermore, the transition to quadratic growth occurs at a length scale much larger than the gas molecular mean free path, namely in a regime that the Navier-Stokes equations are widely believed to describe. In fact, our results suggest that the Navier-Stokes equations are not guaranteed to describe the smallest scales of gas turbulence for any positive Knudsen number.

More Details

Three-Photon Optical Pumping for Trapped Ion Quantum Computing

Hogle, Craig W.; Ivory, Megan K.; Lobser, Daniel L.; Ruzic, Brandon R.; DeRose, Christopher T.

In this report we describe the testing of a novel scheme for state preparation of trapped ions in a quantum computing setup. This technique optimally would allow for similar precision and speed of state preparation while allowing for individual addressability of single ions in a chain using technology already available in a trapped ion experiment. As quantum computing experiments become more complicated, mid-experiment measurements will become necessary to achieve algorithms such as quantum error correction. Any mid-experiment measurement then requires the measured qubit to be re-prepared to a known quantum state. Currently this involves the protected qubits to be moved a sizeable distance away from the qubit being re-prepared which can be costly in terms of experiment length as well as introducing errors. Theoretical calculations predict that a three-photon process would allow for state preparation without qubit movement with similar efficiencies to current state preparation methods.

More Details

First-passage time statistics on surfaces of general shape: Surface PDE solvers using Generalized Moving Least Squares (GMLS)

Journal of Computational Physics

Gross, B.J.; Kuberry, Paul A.; Atzberger, P.J.

We develop numerical methods for computing statistics of stochastic processes on surfaces of general shape with drift-diffusion dynamics dXt=a(Xt)dt+b(Xt)dWt. We formulate descriptions of Brownian motion and general drift-diffusion processes on surfaces. We consider statistics of the form u(x)=Ex[∫0τg(Xt)dt]+Ex[f(Xτ)] for a domain Ω and the exit stopping time τ=inft⁡{t>0|Xt∉Ω}, where f,g are general smooth functions. For computing these statistics, we develop high-order Generalized Moving Least Squares (GMLS) solvers for associated surface PDE boundary-value problems based on Backward-Kolmogorov equations. We focus particularly on the mean First Passage Times (FPTs) given by the case f=0,g=1 where u(x)=Ex[τ]. We perform studies for a variety of shapes showing our methods converge with high-order accuracy both in capturing the geometry and the surface PDE solutions. We then perform studies showing how statistics are influenced by the surface geometry, drift dynamics, and spatially dependent diffusivities.

More Details

A theoretical investigation of the hydrolysis of uranium hexafluoride: the initiation mechanism and vibrational spectroscopy

Physical Chemistry Chemical Physics

Lutz, Jesse J.; Byrd, Jason N.; Lotrich, Victor F.; Jensen, Daniel S.; Zador, Judit Z.; Hubbard, Joshua A.

Depleted uranium hexafluoride (UF6), a stockpiled byproduct of the nuclear fuel cycle, reacts readily with atmospheric humidity, but the mechanism is poorly understood. We compare several potential initiation steps at a consistent level of theory, generating underlying structures and vibrational modes using hybrid density functional theory (DFT) and computing relative energies of stationary points with double-hybrid (DH) DFT. A benchmark comparison is performed to assess the quality of DH-DFT data using reference energy differences obtained using a complete-basis-limit coupled-cluster (CC) composite method. The associated large-basis CC computations were enabled by a new general-purpose pseudopotential capability implemented as part of this work. Dispersion-corrected parameter-free DH-DFT methods, namely PBE0-DH-D3(BJ) and PBE-QIDH-D3(BJ), provided mean unsigned errors within chemical accuracy (1 kcal mol−1) for a set of barrier heights corresponding to the most energetically favorable initiation steps. The hydrolysis mechanism is found to proceed via intermolecular hydrogen transfer within van der Waals complexes involving UF6, UF5OH, and UOF4, in agreement with previous studies, followed by the formation of a previously unappreciated dihydroxide intermediate, UF4(OH)2. The dihydroxide is predicted to form under both kinetic and thermodynamic control, and, unlike the alternate pathway leading to the UO2F2 monomer, its reaction energy is exothermic, in agreement with observation. Finally, harmonic and anharmonic vibrational simulations are performed to reinterpret literature infrared spectroscopy in light of this newly identified species.

More Details

Pyomo.GDP: an ecosystem for logic based modeling and optimization development

Optimization and Engineering

Chen, Qi; Johnson, Emma S.; Bernal, David E.; Valentin, Romeo; Kale, Sunjeev; Bates, Johnny; Siirola, John D.; Grossmann, Ignacio E.

We present three core principles for engineering-oriented integrated modeling and optimization tool sets—intuitive modeling contexts, systematic computer-aided reformulations, and flexible solution strategies—and describe how new developments in Pyomo.GDP for Generalized Disjunctive Programming (GDP) advance this vision. We describe a new logical expression system implementation for Pyomo.GDP allowing for a more intuitive description of logical propositions. The logical expression system supports automated reformulation of these logical constraints to linear constraints. We also describe two new logic-based global optimization solver implementations built on Pyomo.GDP that exploit logical structure to avoid “zero-flow” numerical difficulties that arise in nonlinear network design problems when nodes or streams disappear. These new solvers also demonstrate the capability to link to external libraries for expanded functionality within an integrated implementation. We present these new solvers in the context of a flexible array of solution paths available to GDP models. Finally, we present results on a new library of GDP models demonstrating the value of multiple solution approaches.

More Details

Hole in one: Pathways to deterministic single-acceptor incorporation in Si(100)-2 × 1

AVS Quantum Science

Campbell, Quinn C.; Baczewski, Andrew D.; Butera, R.E.; Misra, Shashank M.

Stochastic incorporation kinetics can be a limiting factor in the scalability of semiconductor fabrication technologies using atomic-precision techniques. While these technologies have recently been extended from donors to acceptors, the extent to which kinetics will impact single-acceptor incorporation has yet to be assessed. To identify the precursor molecule and dosing conditions that are promising for deterministic incorporation, we develop and apply an atomistic model for the single-acceptor incorporation rates of several recently demonstrated molecules: diborane (B2H6), boron trichloride (BCl3), and aluminum trichloride in both monomer (AlCl3) and dimer forms (Al2Cl6). While all three precursors can realize single-acceptor incorporation, we predict that diborane is unlikely to realize deterministic incorporation, boron trichloride can realize deterministic incorporation with modest heating (50 °C), and aluminum trichloride can realize deterministic incorporation at room temperature. We conclude that both boron and aluminum trichloride are promising precursors for atomic-precision single-acceptor applications, with the potential to enable the reliable production of large arrays of single-atom quantum devices.

More Details

A hybrid meshfree discretization to improve the numerical performance of peridynamic models

Computer Methods in Applied Mechanics and Engineering

Shojaei, Arman; Hermann, Alexander; Cyron, Christian J.; Seleson, Pablo; Silling, Stewart A.

Efficient and accurate calculation of spatial integrals is of major interest in the numerical implementation of peridynamics (PD). The standard way to perform this calculation is a particle-based approach that discretizes the strong form of the PD governing equation. This approach has rapidly been adopted by the PD community since it offers some advantages. It is computationally cheaper than other available schemes, can conveniently handle material separation, and effectively deals with nonlinear PD models. Nevertheless, PD models are still computationally very expensive compared with those based on the classical continuum mechanics theory, particularly for large-scale problems in three dimensions. This results from the nonlocal nature of the PD theory which leads to interactions of each node of a discretized body with multiple surrounding nodes. Here, we propose a new approach to significantly boost the numerical efficiency of PD models. We propose a discretization scheme that employs a simple collocation procedure and is truly meshfree; i.e., it does not depend on any background integration cells. In contrast to the standard scheme, the proposed scheme requires a much smaller set of neighboring nodes (keeping the same physical length scale) to achieve a specific accuracy and is thus computationally more efficient. Our new scheme is applicable to the case of linear PD models and within neighborhoods where the solution can be approximated by smooth basis functions. Therefore, to fully exploit the advantages of both the standard and the proposed schemes, a hybrid discretization is presented that combines both approaches within an adaptive framework. The high performance of the developed framework is illustrated by several numerical examples, including brittle fracture and corrosion problems in two and three dimensions.

More Details

Atomic step disorder on polycrystalline surfaces leads to spatially inhomogeneous work functions

Journal of Vacuum Science and Technology A: Vacuum, Surfaces and Films

Bussmann, Ezra B.; Smith, Sean W.; Scrymgeour, David S.; Brumbach, Michael T.; Lu, Ping L.; Dickens, Sara D.; Michael, Joseph R.; Ohta, Taisuke O.; Hjalmarson, Harold P.; Schultz, Peter A.; Clem, Paul G.; Hopkins, Matthew M.; Moore, Christopher M.

Structural disorder causes materials' surface electronic properties, e.g., work function (φ), to vary spatially, yet it is challenging to prove exact causal relationships to underlying ensemble disorder, e.g., roughness or granularity. For polycrystalline Pt, nanoscale resolution photoemission threshold mapping reveals a spatially varying φ = 5.70 ± 0.03 eV over a distribution of (111) vicinal grain surfaces prepared by sputter deposition and annealing. With regard to field emission and related phenomena, e.g., vacuum arc initiation, a salient feature of the φ distribution is that it is skewed with a long tail to values down to 5.4 eV, i.e., far below the mean, which is exponentially impactful to field emission via the Fowler-Nordheim relation. We show that the φ spatial variation and distribution can be explained by ensemble variations of granular tilts and surface slopes via a Smoluchowski smoothing model wherein local φ variations result from spatially varying densities of electric dipole moments, intrinsic to atomic steps, that locally modify φ. Atomic step-terrace structure is confirmed with scanning tunneling microscopy (STM) at several locations on our surfaces, and prior works showed STM evidence for atomic step dipoles at various metal surfaces. From our model, we find an atomic step edge dipole μ = 0.12 D/edge atom, which is comparable to values reported in studies that utilized other methods and materials. Our results elucidate a connection between macroscopic φ and the nanostructure that may contribute to the spread of reported φ for Pt and other surfaces and may be useful toward more complete descriptions of polycrystalline metals in the models of field emission and other related vacuum electronics phenomena, e.g., arc initiation.

More Details

Finding Electronic Structure Machine Learning Surrogates without Training

Fiedler, Lenz; Hoffmann, Nils; Mohammed, Parvez; Popoola, Gabriel A.; Yovell, Tamar; Oles, Vladyslav; Ellis, J.A.; Rajamanickam, Sivasankaran R.; Cangi, Attila

A myriad of phenomena in materials science and chemistry rely on quantum-level simulations of the electronic structure in matter. While moving to larger length and time scales has been a pressing issue for decades, such large-scale electronic structure calculations are still challenging despite modern software approaches and advances in high-performance computing. The silver lining in this regard is the use of machine learning to accelerate electronic structure calculations – this line of research has recently gained growing attention. The grand challenge therein is finding a suitable machine-learning model during a process called hyperparameter optimization. This, however, causes a massive computational overhead in addition to that of data generation. We accelerate the construction of machine-learning surrogate models by roughly two orders of magnitude by circumventing excessive training during the hyperparameter optimization phase. We demonstrate our workflow for Kohn-Sham density functional theory, the most popular computational method in materials science and chemistry.

More Details

Unified Memory: GPGPU-Sim/UVM Smart Integration

Liu, Yechen; Rogers, Timothy; Hughes, Clayton H.

CPU/GPU heterogeneous compute platforms are an ubiquitous element in computing and a programming model specified for this heterogeneous computing model is important for both performance and programmability. A programming model that exposes the shared, unified, address space between the heterogeneous units is a necessary step in this direction as it removes the burden of explicit data movement from the programmer while maintaining performance. GPU vendors, such as AMD and NVIDIA, have released software-managed runtimes that can provide programmers the illusion of unified CPU and GPU memory by automatically migrating data in and out of the GPU memory. However, this runtime support is not included in GPGPU-Sim, a commonly used framework that models the features of a modern graphics processor that are relevant to non-graphics applications. UVM Smart was developed, which extended GPGPU-Sim 3.x to in- corporate the modeling of on-demand pageing and data migration through the runtime. This report discusses the integration of UVM Smart and GPGPU-Sim 4.0 and the modifications to improve simulation performance and accuracy.

More Details

Randomized Cholesky Preconditioning for Graph Partitioning Applications

Espinoza, Heliezer J.; Loe, Jennifer A.; Boman, Erik G.

Graph partitioning has emerged as an area of interest due to its use in various applications in computational research. One way to partition a graph is to solve for the eigenvectors of the corresponding graph Laplacian matrix. This project focuses on the eigensolver LOBPCG and the evaluation of a new preconditioner: Randomized Cholesky Factorization (rchol). This proconditioner was tested for its speed and accuracy against other well-known preconditioners for the method. After experiments were run on several known test matrices, rchol appears to be a better preconditioner for structured matrices. This research was sponsored by National Nuclear Security Administration Minority Serving Institutions Internship Program (NNSA-MSIIP) and completed at host facility Sandia National Laboratories. As such, after discussion of the research project itself, this report contains a brief reflection on experience gained as a result of participating in the NNSA-MSIIP.

More Details

A Block-Based Triangle Counting Algorithm on Heterogeneous Environments

IEEE Transactions on Parallel and Distributed Systems

Yasar, Abdurrahman; Rajamanickam, Sivasankaran R.; Berry, Jonathan W.; Catalyurek, Umit V.

Triangle counting is a fundamental building block in graph algorithms. In this article, we propose a block-based triangle counting algorithm to reduce data movement during both sequential and parallel execution. Our block-based formulation makes the algorithm naturally suitable for heterogeneous architectures. The problem of partitioning the adjacency matrix of a graph is well-studied. Our task decomposition goes one step further: it partitions the set of triangles in the graph. By streaming these small tasks to compute resources, we can solve problems that do not fit on a device. We demonstrate the effectiveness of our approach by providing an implementation on a compute node with multiple sockets, cores and GPUs. The current state-of-the-art in triangle enumeration processes the Friendster graph in 2.1 seconds, not including data copy time between CPU and GPU. Using that metric, our approach is 20 percent faster. When copy times are included, our algorithm takes 3.2 seconds. This is 5.6 times faster than the fastest published CPU-only time.

More Details

Neuromorphic scaling advantages for energy-efficient random walk computations

Nature Electronics

Smith, John D.; Hill, Aaron J.; Reeder, Leah E.; Franke, Brian C.; Lehoucq, Richard B.; Parekh, Ojas D.; Severa, William M.; Aimone, James B.

Neuromorphic computing, which aims to replicate the computational structure and architecture of the brain in synthetic hardware, has typically focused on artificial intelligence applications. What is less explored is whether such brain-inspired hardware can provide value beyond cognitive tasks. Here we show that the high degree of parallelism and configurability of spiking neuromorphic architectures makes them well suited to implement random walks via discrete-time Markov chains. These random walks are useful in Monte Carlo methods, which represent a fundamental computational tool for solving a wide range of numerical computing tasks. Using IBM’s TrueNorth and Intel’s Loihi neuromorphic computing platforms, we show that our neuromorphic computing algorithm for generating random walk approximations of diffusion offers advantages in energy-efficient computation compared with conventional approaches. We also show that our neuromorphic computing algorithm can be extended to more sophisticated jump-diffusion processes that are useful in a range of applications, including financial economics, particle physics and machine learning.

More Details

Randomized Cholesky Preconditioning for Graph Partitioning Applications

Espinoza, Heliezer J.; Loe, Jennifer A.; Boman, Erik G.

A graph is a mathematical representation of a network; we say it consists of a set of vertices, which are connected by edges. Graphs have numerous applications in various fields, as they can model all sorts of connections, processes, or relations. For example, graphs can model intricate transit systems or the human nervous system. However, graphs that are large or complicated become difficult to analyze. This is why there is an increased interest in the area of graph partitioning, reducing the size of the graph into multiple partitions. For example, partitions of a graph representing a social network might help identify clusters of friends or colleagues. Graph partitioning is also a widely used approach to load balancing in parallel computing. The partitioning of a graph is extremely useful to decompose the graph into smaller parts and allow for easier analysis. There are different ways to solve graph partitioning problems. For this work, we focus on a spectral partitioning method which forms a partition based upon the eigenvectors of the graph Laplacian (details presented in Acer, et. al.). This method uses the LOBPCG algorithm to compute these eigenvectors. LOBPCG can be accelerated by an operator called a preconditioner. For this internship, we evaluate a randomized Cholesky (rchol) preconditioner for its effectiveness on graph partitioning problems with LOBPCG. We compare it with two standard preconditioners: Jacobi and Incomplete Cholesky (ichol). This research was conducted from August to December 2021 in conjunction with Sandia National Laboratories.

More Details

Zero-Truncated Poisson Tensor Decomposition for Sparse Count Data

Lopez, Oscar F.; Lehoucq, Richard B.; Dunlavy, Daniel D.

We propose a novel statistical inference paradigm for zero-inflated multiway count data that dispenses with the need to distinguish between true and false zero counts. Our approach ignores all zero entries and applies zero-truncated Poisson regression on the positive counts. Inference is accomplished via tensor completion that imposes low-rank structure on the Poisson parameter space. Our main result shows that an $\textit{N}$-way rank-R parametric tensor 𝓜 ϵ (0, ∞)$I$Χ∙∙∙Χ$I$ generating Poisson observations can be accurately estimated from approximately $IR^2 \text{log}^2_2(I)$ non-zero counts for a nonnegative canonical polyadic decomposition. Several numerical experiments are presented demonstrating that our zero-truncated paradigm is comparable to the ideal scenario where the locations of false zero counts are known $\textit{a priori}$.

More Details

Machine-Learning of Nonlocal Kernels for Anomalous Subsurface Transport from Breakthrough Curves

D'Elia, Marta D.; Glusa, Christian A.; Xu, Xiao; Foster, John T.

Anomalous behavior is ubiquitous in subsurface solute transport due to the presence of high degrees of heterogeneity at different scales in the media. Although fractional models have been extensively used to describe the anomalous transport in various subsurface applications, their application is hindered by computational challenges. Simpler nonlocal models characterized by integrable kernels and finite interaction length represent a computationally feasible alternative to fractional models; yet, the informed choice of their kernel functions still remains an open problem. We propose a general data-driven framework for the discovery of optimal kernels on the basis of very small and sparse data sets in the context of anomalous subsurface transport. Using spatially sparse breakthrough curves recovered from fine-scale particle-density simulations, we learn the best coarse-scale nonlocal model using a nonlocal operator regression technique. Predictions of the breakthrough curves obtained using the optimal nonlocal model show good agreement with fine-scale simulation results even at locations and time intervals different from the ones used to train the kernel, confirming the excellent generalization properties of the proposed algorithm. A comparison with trained classical models and with black-box deep neural networks confirms the superiority of the predictive capability of the proposed model.

More Details

Precision tomography of a three-qubit donor quantum processor in silicon

Nature

Author, No; Madzik, Mateusz T.; Asaad, Serwan; Youssry, Akram; Joecker, Benjamin; Rudinger, Kenneth M.; Nielsen, Erik N.; Young, Kevin C.; Proctor, Timothy J.; Baczewski, Andrew D.; Laucht, Arne; Schmitt, Vivien; Hudson, Fay E.; Itoh, Kohei M.; Jakob, Alexander M.; Johnson, Brett C.; Jamieson, David N.; Dzurak, Andrew S.; Ferrie, Christopher; Blume-Kohout, Robin J.; Morello, Andrea

Nuclear spins were among the first physical platforms to be considered for quantum information processing1,2, because of their exceptional quantum coherence3 and atomic-scale footprint. However, their full potential for quantum computing has not yet been realized, owing to the lack of methods with which to link nuclear qubits within a scalable device combined with multi-qubit operations with sufficient fidelity to sustain fault-tolerant quantum computation. Here we demonstrate universal quantum logic operations using a pair of ion-implanted 31P donor nuclei in a silicon nanoelectronic device. A nuclear two-qubit controlled-Z gate is obtained by imparting a geometric phase to a shared electron spin4, and used to prepare entangled Bell states with fidelities up to 94.2(2.7)%. The quantum operations are precisely characterized using gate set tomography (GST)5, yielding one-qubit average gate fidelities up to 99.95(2)%, two-qubit average gate fidelity of 99.37(11)% and two-qubit preparation/measurement fidelities of 98.95(4)%. These three metrics indicate that nuclear spins in silicon are approaching the performance demanded in fault-tolerant quantum processors6. We then demonstrate entanglement between the two nuclei and the shared electron by producing a Greenberger–Horne–Zeilinger three-qubit state with 92.5(1.0)% fidelity. Because electron spin qubits in semiconductors can be further coupled to other electrons7–9 or physically shuttled across different locations10,11, these results establish a viable route for scalable quantum information processing using donor nuclear and electron spins.

More Details

Calibration of elastoplastic constitutive model parameters from full-field data with automatic differentiation-based sensitivities

International Journal for Numerical Methods in Engineering

Seidl, Daniel T.; Granzow, Brian N.

We present a framework for calibration of parameters in elastoplastic constitutive models that is based on the use of automatic differentiation (AD). The model calibration problem is posed as a partial differential equation-constrained optimization problem where a finite element (FE) model of the coupled equilibrium equation and constitutive model evolution equations serves as the constraint. The objective function quantifies the mismatch between the displacement predicted by the FE model and full-field digital image correlation data, and the optimization problem is solved using gradient-based optimization algorithms. Forward and adjoint sensitivities are used to compute the gradient at considerably less cost than its calculation from finite difference approximations. Through the use of AD, we need only to write the constraints in terms of AD objects, where all of the derivatives required for the forward and inverse problems are obtained by appropriately seeding and evaluating these quantities. We present three numerical examples that verify the correctness of the gradient, demonstrate the AD approach's parallel computation capabilities via application to a large-scale FE model, and highlight the formulation's ease of extensibility to other classes of constitutive models.

More Details

Krylov subspace recycling for evolving structures

Computer Methods in Applied Mechanics and Engineering

Bolten, Matthias; De Sturler, Eric; Hahn, C.; Parks, Michael L.

Krylov subspace recycling is a powerful tool when solving a long series of large, sparse linear systems that change only slowly over time. In PDE constrained shape optimization, these series appear naturally, as typically hundreds or thousands of optimization steps are needed with only small changes in the geometry. In this setting, however, applying Krylov subspace recycling can be a difficult task. As the geometry evolves, in general, so does the finite element mesh defined on or representing this geometry, including the numbers of nodes and elements and element connectivity. This is especially the case if re-meshing techniques are used. As a result, the number of algebraic degrees of freedom in the system changes, and in general the linear system matrices resulting from the finite element discretization change size from one optimization step to the next. Changes in the mesh connectivity also lead to structural changes in the matrices. In the case of re-meshing, even if the geometry changes only a little, the corresponding mesh might differ substantially from the previous one. Obviously, this prevents any straightforward mapping of the approximate invariant subspace of the linear system matrix (the focus of recycling in this work) from one optimization step to the next; similar problems arise for other selected subspaces. In this paper, we present an algorithm to map an approximate invariant subspace of the linear system matrix for the previous optimization step to an approximate invariant subspace of the linear system matrix for the current optimization step, for general meshes. This is achieved by exploiting the map from coefficient vectors to finite element functions on the mesh, combined with interpolation or approximation of functions on the finite element mesh. We demonstrate the effectiveness of our approach numerically with several proof of concept studies for a specific meshing technique.

More Details

Measuring the capabilities of quantum computers

Nature Physics

Proctor, Timothy J.; Rudinger, Kenneth M.; Young, Kevin C.; Nielsen, Erik N.; Blume-Kohout, Robin J.

Quantum computers can now run interesting programs, but each processor’s capability—the set of programs that it can run successfully—is limited by hardware errors. These errors can be complicated, making it difficult to accurately predict a processor’s capability. Benchmarks can be used to measure capability directly, but current benchmarks have limited flexibility and scale poorly to many-qubit processors. We show how to construct scalable, efficiently verifiable benchmarks based on any program by using a technique that we call circuit mirroring. With it, we construct two flexible, scalable volumetric benchmarks based on randomized and periodically ordered programs. We use these benchmarks to map out the capabilities of twelve publicly available processors, and to measure the impact of program structure on each one. We find that standard error metrics are poor predictors of whether a program will run successfully on today’s hardware, and that current processors vary widely in their sensitivity to program structure.

More Details
Results 101–125 of 9,998
Results 101–125 of 9,998