In this report we describe the testing of a novel scheme for state preparation of trapped ions in a quantum computing setup. Ideally, this technique would match the precision and speed of existing state preparation while allowing individual ions in a chain to be addressed using technology already available in a trapped-ion experiment. As quantum computing experiments become more complicated, mid-experiment measurements will become necessary to implement algorithms such as quantum error correction. Any mid-experiment measurement then requires the measured qubit to be re-prepared in a known quantum state. Currently this requires moving the protected qubits a sizeable distance away from the qubit being re-prepared, which lengthens the experiment and introduces errors. Theoretical calculations predict that a three-photon process would allow state preparation without qubit movement, with efficiencies similar to those of current state preparation methods.
We develop numerical methods for computing statistics of stochastic processes on surfaces of general shape with drift-diffusion dynamics $dX_t = a(X_t)\,dt + b(X_t)\,dW_t$. We formulate descriptions of Brownian motion and general drift-diffusion processes on surfaces. We consider statistics of the form $u(x) = \mathbb{E}^x\left[\int_0^\tau g(X_t)\,dt\right] + \mathbb{E}^x\left[f(X_\tau)\right]$ for a domain $\Omega$ and the exit stopping time $\tau = \inf_t \{t > 0 \mid X_t \notin \Omega\}$, where $f, g$ are general smooth functions. For computing these statistics, we develop high-order Generalized Moving Least Squares (GMLS) solvers for associated surface PDE boundary-value problems based on Backward-Kolmogorov equations. We focus particularly on the mean First Passage Times (FPTs) given by the case $f = 0$, $g = 1$, where $u(x) = \mathbb{E}^x[\tau]$. We perform studies for a variety of shapes showing our methods converge with high-order accuracy both in capturing the geometry and the surface PDE solutions. We then perform studies showing how statistics are influenced by the surface geometry, drift dynamics, and spatially dependent diffusivities.
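To make the link between mean first passage times and the Backward-Kolmogorov boundary-value problem concrete, here is a minimal sketch on a flat one-dimensional interval rather than a curved surface. It uses second-order central finite differences, not the GMLS solvers of the abstract, and the function name `mean_first_passage_time_1d`, the unit interval, and the grid size are assumptions made only for illustration. For the case of zero terminal cost and unit running cost, the equation solved is a(x) u' + (1/2) b(x)^2 u'' = -1 with u = 0 on the boundary.

```python
import numpy as np

def mean_first_passage_time_1d(a, b, n=201):
    """Solve a(x) u' + 0.5 b(x)^2 u'' = -1 on (0, 1) with u(0) = u(1) = 0.

    a, b: callables returning drift and diffusion coefficients on an array of points.
    Returns the grid x and the mean exit time u at each grid point.
    """
    x = np.linspace(0.0, 1.0, n)
    h = x[1] - x[0]
    xi = x[1:-1]                      # interior nodes only; boundary values are zero
    drift = a(xi)
    diff = 0.5 * b(xi) ** 2
    m = n - 2
    A = np.zeros((m, m))
    idx = np.arange(m)
    A[idx, idx] = -2.0 * diff / h**2
    # Coefficient of u_{i-1} in row i (central differences).
    A[idx[1:], idx[:-1]] = diff[1:] / h**2 - drift[1:] / (2.0 * h)
    # Coefficient of u_{i+1} in row i.
    A[idx[:-1], idx[1:]] = diff[:-1] / h**2 + drift[:-1] / (2.0 * h)
    u = np.zeros(n)
    u[1:-1] = np.linalg.solve(A, -np.ones(m))
    return x, u

# Pure Brownian motion (a = 0, b = 1): the exact mean exit time is x (1 - x).
x, u = mean_first_passage_time_1d(lambda s: np.zeros_like(s),
                                  lambda s: np.ones_like(s))
print("max error vs x(1-x):", np.max(np.abs(u - x * (1.0 - x))))
```

On a surface, the first and second derivatives would be replaced by surface differential operators, which is where a meshfree method such as GMLS comes in.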
Depleted uranium hexafluoride (UF6), a stockpiled byproduct of the nuclear fuel cycle, reacts readily with atmospheric humidity, but the mechanism is poorly understood. We compare several potential initiation steps at a consistent level of theory, generating underlying structures and vibrational modes using hybrid density functional theory (DFT) and computing relative energies of stationary points with double-hybrid (DH) DFT. A benchmark comparison is performed to assess the quality of DH-DFT data using reference energy differences obtained using a complete-basis-limit coupled-cluster (CC) composite method. The associated large-basis CC computations were enabled by a new general-purpose pseudopotential capability implemented as part of this work. Dispersion-corrected parameter-free DH-DFT methods, namely PBE0-DH-D3(BJ) and PBE-QIDH-D3(BJ), provided mean unsigned errors within chemical accuracy (1 kcal mol−1) for a set of barrier heights corresponding to the most energetically favorable initiation steps. The hydrolysis mechanism is found to proceed via intermolecular hydrogen transfer within van der Waals complexes involving UF6, UF5OH, and UOF4, in agreement with previous studies, followed by the formation of a previously unappreciated dihydroxide intermediate, UF4(OH)2. The dihydroxide is predicted to form under both kinetic and thermodynamic control, and, unlike the alternate pathway leading to the UO2F2 monomer, its reaction energy is exothermic, in agreement with observation. Finally, harmonic and anharmonic vibrational simulations are performed to reinterpret literature infrared spectroscopy in light of this newly identified species.
Structural disorder causes materials' surface electronic properties, e.g., work function (φ), to vary spatially, yet it is challenging to prove exact causal relationships to underlying ensemble disorder, e.g., roughness or granularity. For polycrystalline Pt, nanoscale resolution photoemission threshold mapping reveals a spatially varying φ = 5.70 ± 0.03 eV over a distribution of (111) vicinal grain surfaces prepared by sputter deposition and annealing. With regard to field emission and related phenomena, e.g., vacuum arc initiation, a salient feature of the φ distribution is that it is skewed with a long tail to values down to 5.4 eV, i.e., far below the mean, which is exponentially impactful to field emission via the Fowler-Nordheim relation. We show that the φ spatial variation and distribution can be explained by ensemble variations of granular tilts and surface slopes via a Smoluchowski smoothing model wherein local φ variations result from spatially varying densities of electric dipole moments, intrinsic to atomic steps, that locally modify φ. Atomic step-terrace structure is confirmed with scanning tunneling microscopy (STM) at several locations on our surfaces, and prior works showed STM evidence for atomic step dipoles at various metal surfaces. From our model, we find an atomic step edge dipole μ = 0.12 D/edge atom, which is comparable to values reported in studies that utilized other methods and materials. Our results elucidate a connection between macroscopic φ and the nanostructure that may contribute to the spread of reported φ for Pt and other surfaces and may be useful toward more complete descriptions of polycrystalline metals in the models of field emission and other related vacuum electronics phenomena, e.g., arc initiation.
Stochastic incorporation kinetics can be a limiting factor in the scalability of semiconductor fabrication technologies using atomic-precision techniques. While these technologies have recently been extended from donors to acceptors, the extent to which kinetics will impact single-acceptor incorporation has yet to be assessed. To identify the precursor molecule and dosing conditions that are promising for deterministic incorporation, we develop and apply an atomistic model for the single-acceptor incorporation rates of several recently demonstrated molecules: diborane (B2H6), boron trichloride (BCl3), and aluminum trichloride in both monomer (AlCl3) and dimer forms (Al2Cl6). While all three precursors can realize single-acceptor incorporation, we predict that diborane is unlikely to realize deterministic incorporation, boron trichloride can realize deterministic incorporation with modest heating (50 °C), and aluminum trichloride can realize deterministic incorporation at room temperature. We conclude that both boron and aluminum trichloride are promising precursors for atomic-precision single-acceptor applications, with the potential to enable the reliable production of large arrays of single-atom quantum devices.
Computer Methods in Applied Mechanics and Engineering
Shojaei, Arman; Hermann, Alexander; Cyron, Christian J.; Seleson, Pablo; Silling, Stewart
Efficient and accurate calculation of spatial integrals is of major interest in the numerical implementation of peridynamics (PD). The standard way to perform this calculation is a particle-based approach that discretizes the strong form of the PD governing equation. This approach has rapidly been adopted by the PD community since it offers some advantages. It is computationally cheaper than other available schemes, can conveniently handle material separation, and effectively deals with nonlinear PD models. Nevertheless, PD models are still computationally very expensive compared with those based on the classical continuum mechanics theory, particularly for large-scale problems in three dimensions. This results from the nonlocal nature of the PD theory which leads to interactions of each node of a discretized body with multiple surrounding nodes. Here, we propose a new approach to significantly boost the numerical efficiency of PD models. We propose a discretization scheme that employs a simple collocation procedure and is truly meshfree; i.e., it does not depend on any background integration cells. In contrast to the standard scheme, the proposed scheme requires a much smaller set of neighboring nodes (keeping the same physical length scale) to achieve a specific accuracy and is thus computationally more efficient. Our new scheme is applicable to the case of linear PD models and within neighborhoods where the solution can be approximated by smooth basis functions. Therefore, to fully exploit the advantages of both the standard and the proposed schemes, a hybrid discretization is presented that combines both approaches within an adaptive framework. The high performance of the developed framework is illustrated by several numerical examples, including brittle fracture and corrosion problems in two and three dimensions.
Chen, Qi; Johnson, Emma S.; Bernal, David E.; Valentin, Romeo; Kale, Sunjeev; Bates, Johnny; Siirola, John D.; Grossmann, Ignacio E.
We present three core principles for engineering-oriented integrated modeling and optimization tool sets—intuitive modeling contexts, systematic computer-aided reformulations, and flexible solution strategies—and describe how new developments in Pyomo.GDP for Generalized Disjunctive Programming (GDP) advance this vision. We describe a new logical expression system implementation for Pyomo.GDP allowing for a more intuitive description of logical propositions. The logical expression system supports automated reformulation of these logical constraints to linear constraints. We also describe two new logic-based global optimization solver implementations built on Pyomo.GDP that exploit logical structure to avoid “zero-flow” numerical difficulties that arise in nonlinear network design problems when nodes or streams disappear. These new solvers also demonstrate the capability to link to external libraries for expanded functionality within an integrated implementation. We present these new solvers in the context of a flexible array of solution paths available to GDP models. Finally, we present results on a new library of GDP models demonstrating the value of multiple solution approaches.
A myriad of phenomena in materials science and chemistry rely on quantum-level simulations of the electronic structure in matter. While moving to larger length and time scales has been a pressing issue for decades, such large-scale electronic structure calculations are still challenging despite modern software approaches and advances in high-performance computing. The silver lining in this regard is the use of machine learning to accelerate electronic structure calculations – this line of research has recently gained growing attention. The grand challenge therein is finding a suitable machine-learning model during a process called hyperparameter optimization. This, however, causes a massive computational overhead in addition to that of data generation. We accelerate the construction of machine-learning surrogate models by roughly two orders of magnitude by circumventing excessive training during the hyperparameter optimization phase. We demonstrate our workflow for Kohn-Sham density functional theory, the most popular computational method in materials science and chemistry.
CPU/GPU heterogeneous compute platforms are a ubiquitous element in computing, and a programming model specified for this heterogeneous computing model is important for both performance and programmability. A programming model that exposes the shared, unified address space between the heterogeneous units is a necessary step in this direction, as it removes the burden of explicit data movement from the programmer while maintaining performance. GPU vendors, such as AMD and NVIDIA, have released software-managed runtimes that can provide programmers the illusion of unified CPU and GPU memory by automatically migrating data in and out of the GPU memory. However, this runtime support is not included in GPGPU-Sim, a commonly used framework that models the features of a modern graphics processor that are relevant to non-graphics applications. UVM Smart was developed to extend GPGPU-Sim 3.x to incorporate the modeling of on-demand paging and data migration through the runtime. This report discusses the integration of UVM Smart and GPGPU-Sim 4.0 and the modifications to improve simulation performance and accuracy.
Graph partitioning has emerged as an area of interest due to its use in various applications in computational research. One way to partition a graph is to solve for the eigenvectors of the corresponding graph Laplacian matrix. This project focuses on the eigensolver LOBPCG and the evaluation of a new preconditioner: Randomized Cholesky Factorization (rchol). This preconditioner was tested for its speed and accuracy against other well-known preconditioners for the method. After experiments were run on several known test matrices, rchol appears to be a better preconditioner for structured matrices. This research was sponsored by the National Nuclear Security Administration Minority Serving Institutions Internship Program (NNSA-MSIIP) and completed at host facility Sandia National Laboratories. As such, after discussion of the research project itself, this report contains a brief reflection on experience gained as a result of participating in the NNSA-MSIIP.
Fractional equations have become the model of choice in several applications where heterogeneities at the microstructure result in anomalous diffusive behavior at the macroscale. Here, we introduce a new fractional operator characterized by a doubly-variable fractional order and possibly truncated interactions. Under certain conditions on the model parameters and on the regularity of the fractional order we show that the corresponding Poisson problem is well-posed. Additionally, we introduce a finite element discretization and describe an efficient implementation of the finite-element matrix assembly in the case of piecewise constant fractional order. Through several numerical tests, we illustrate the improved descriptive power of this new operator across media interfaces. Furthermore, we present one-dimensional and two-dimensional h-convergence results that show that the variable-order model has the same convergence behavior as the constant-order model.
Triangle counting is a fundamental building block in graph algorithms. In this article, we propose a block-based triangle counting algorithm to reduce data movement during both sequential and parallel execution. Our block-based formulation makes the algorithm naturally suitable for heterogeneous architectures. The problem of partitioning the adjacency matrix of a graph is well-studied. Our task decomposition goes one step further: it partitions the set of triangles in the graph. By streaming these small tasks to compute resources, we can solve problems that do not fit on a device. We demonstrate the effectiveness of our approach by providing an implementation on a compute node with multiple sockets, cores and GPUs. The current state-of-the-art in triangle enumeration processes the Friendster graph in 2.1 seconds, not including data copy time between CPU and GPU. Using that metric, our approach is 20 percent faster. When copy times are included, our algorithm takes 3.2 seconds. This is 5.6 times faster than the fastest published CPU-only time.
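The idea of partitioning the set of triangles into small independent tasks can be sketched as follows. This is not the authors' algorithm or implementation; it is a simplified dense NumPy illustration that assumes the count is written as the sum of the entries of (L L) ∘ L, where L is the strictly lower triangle of the adjacency matrix, and then splits that sum over triples of vertex blocks. The function name `count_triangles_blocked` and the `block_size` parameter are hypothetical.

```python
import numpy as np

def count_triangles_blocked(adj, block_size):
    """Count triangles by splitting sum((L @ L) * L) into block-triple tasks.

    adj: dense symmetric 0/1 adjacency matrix with zero diagonal.
    Each (I, K, J) iteration touches only three small blocks, so it could be
    streamed to a separate compute resource.
    """
    L = np.tril(adj, k=-1).astype(np.int64)   # keep edges (i, j) with i > j
    n = L.shape[0]
    blocks = [slice(s, min(s + block_size, n)) for s in range(0, n, block_size)]
    total = 0
    for bi, I in enumerate(blocks):
        for bj, J in enumerate(blocks[: bi + 1]):
            # A triangle i > k > j needs block(J) <= block(K) <= block(I).
            for K in blocks[bj : bi + 1]:
                total += int(np.sum((L[I, K] @ L[K, J]) * L[I, J]))
    return total

# The complete graph on 4 vertices contains exactly 4 triangles.
k4 = np.ones((4, 4), dtype=np.int64) - np.eye(4, dtype=np.int64)
print(count_triangles_blocked(k4, block_size=2))  # prints 4
```

Because every triangle belongs to exactly one block triple, the tasks can be processed in any order and their partial counts summed.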
Neuromorphic computing, which aims to replicate the computational structure and architecture of the brain in synthetic hardware, has typically focused on artificial intelligence applications. What is less explored is whether such brain-inspired hardware can provide value beyond cognitive tasks. Here we show that the high degree of parallelism and configurability of spiking neuromorphic architectures makes them well suited to implement random walks via discrete-time Markov chains. These random walks are useful in Monte Carlo methods, which represent a fundamental computational tool for solving a wide range of numerical computing tasks. Using IBM’s TrueNorth and Intel’s Loihi neuromorphic computing platforms, we show that our neuromorphic computing algorithm for generating random walk approximations of diffusion offers advantages in energy-efficient computation compared with conventional approaches. We also show that our neuromorphic computing algorithm can be extended to more sophisticated jump-diffusion processes that are useful in a range of applications, including financial economics, particle physics and machine learning.
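For readers unfamiliar with random-walk approximations of diffusion, the sketch below shows the conventional (non-neuromorphic) version of the idea in NumPy: many independent walkers take discrete-time steps of plus or minus dx, and their spread approximates a diffusion process. It does not model spiking hardware or either platform named above. The function name and the step-size relation dx = sqrt(2 D dt) for one-dimensional diffusion with diffusivity D are choices made for this illustration.

```python
import numpy as np

def random_walk_positions(n_walkers, n_steps, diffusivity, dt, rng):
    """Final positions of independent symmetric random walkers started at 0.

    Each step is +dx or -dx with equal probability, where dx = sqrt(2 D dt),
    so the variance after time t = n_steps * dt is 2 D t.
    """
    dx = np.sqrt(2.0 * diffusivity * dt)
    steps = rng.choice(np.array([-dx, dx]), size=(n_walkers, n_steps))
    return steps.sum(axis=1)

rng = np.random.default_rng(0)
D, dt, n_steps = 0.5, 0.01, 100
positions = random_walk_positions(20000, n_steps, D, dt, rng)
t = n_steps * dt
print("sample variance:", positions.var(), "target 2*D*t:", 2.0 * D * t)
```

Each walker's update depends only on its current state, which is the property that makes the computation map naturally onto highly parallel hardware.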
A graph is a mathematical representation of a network; it consists of a set of vertices, which are connected by edges. Graphs have numerous applications in various fields, as they can model all sorts of connections, processes, or relations. For example, graphs can model intricate transit systems or the human nervous system. However, graphs that are large or complicated become difficult to analyze. This is why there is increased interest in graph partitioning, which divides the vertices of a graph into multiple smaller parts. For example, partitions of a graph representing a social network might help identify clusters of friends or colleagues. Graph partitioning is also a widely used approach to load balancing in parallel computing. Decomposing a graph into smaller parts allows for easier analysis. There are different ways to solve graph partitioning problems. For this work, we focus on a spectral partitioning method which forms a partition based upon the eigenvectors of the graph Laplacian (details presented in Acer et al.). This method uses the LOBPCG algorithm to compute these eigenvectors. LOBPCG can be accelerated by an operator called a preconditioner. For this internship, we evaluate a randomized Cholesky (rchol) preconditioner for its effectiveness on graph partitioning problems with LOBPCG. We compare it with two standard preconditioners: Jacobi and Incomplete Cholesky (ichol). This research was conducted from August to December 2021 in conjunction with Sandia National Laboratories.
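As a minimal sketch of spectral bisection with LOBPCG, the example below uses SciPy's `scipy.sparse.linalg.lobpcg` with a Jacobi (inverse-diagonal) preconditioner on a toy graph. It is not the code used in this work: the two-clique test graph, the helper names, the tolerance, and the choice to split vertices by the sign of the second-smallest eigenvector are all assumptions made for illustration, and the rchol and ichol preconditioners are not shown.

```python
import numpy as np
from scipy.sparse import csr_matrix, diags
from scipy.sparse.linalg import lobpcg

def two_clique_graph(m):
    """Adjacency matrix of two m-vertex cliques joined by a single edge."""
    A = np.zeros((2 * m, 2 * m))
    A[:m, :m] = 1.0
    A[m:, m:] = 1.0
    np.fill_diagonal(A, 0.0)
    A[m - 1, m] = A[m, m - 1] = 1.0   # the bridge edge
    return csr_matrix(A)

def spectral_bisection(adj, seed=0):
    """Split vertices by the sign of the Laplacian's second-smallest eigenvector."""
    deg = np.asarray(adj.sum(axis=1)).ravel()
    lap = diags(deg) - adj             # graph Laplacian L = D - A
    M = diags(1.0 / deg)               # Jacobi preconditioner: inverse of diag(L)
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((adj.shape[0], 2))   # initial guess for 2 eigenvectors
    vals, vecs = lobpcg(lap, X, M=M, largest=False, tol=1e-6, maxiter=500)
    order = np.argsort(vals)
    fiedler = vecs[:, order[1]]        # skip the constant eigenvector (eigenvalue 0)
    return fiedler >= 0.0

part = spectral_bisection(two_clique_graph(20))
print("part sizes:", int(part.sum()), int((~part).sum()))
```

Swapping in a different preconditioner only changes the operator passed as `M`, which is what makes side-by-side comparisons of preconditioners straightforward in this setting.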
We propose a novel statistical inference paradigm for zero-inflated multiway count data that dispenses with the need to distinguish between true and false zero counts. Our approach ignores all zero entries and applies zero-truncated Poisson regression on the positive counts. Inference is accomplished via tensor completion that imposes low-rank structure on the Poisson parameter space. Our main result shows that an $N$-way rank-$R$ parametric tensor $\mathcal{M} \in (0, \infty)^{I \times \cdots \times I}$ generating Poisson observations can be accurately estimated from approximately $I R^2 \log_2^2(I)$ non-zero counts for a nonnegative canonical polyadic decomposition. Several numerical experiments are presented demonstrating that our zero-truncated paradigm is comparable to the ideal scenario where the locations of false zero counts are known a priori.
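For reference, the standard zero-truncated Poisson distribution that underlies this kind of regression can be written as follows. The symbol $\lambda$ for the Poisson rate of a single tensor entry is notation introduced here for illustration; the abstract itself does not fix a symbol for it.

```latex
% Zero-truncated Poisson: a Poisson(\lambda) variable conditioned on being positive.
P(X = k \mid X > 0) = \frac{\lambda^{k} e^{-\lambda}}{k!\,\bigl(1 - e^{-\lambda}\bigr)},
\qquad k = 1, 2, 3, \dots

% Its mean exceeds \lambda because the zero outcomes are removed.
\mathbb{E}[X \mid X > 0] = \frac{\lambda}{1 - e^{-\lambda}}
```

Because the likelihood conditions on a positive count, zero entries never enter it, which is why true and false zeros need not be told apart.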
Anomalous behavior is ubiquitous in subsurface solute transport due to the presence of high degrees of heterogeneity at different scales in the media. Although fractional models have been extensively used to describe the anomalous transport in various subsurface applications, their application is hindered by computational challenges. Simpler nonlocal models characterized by integrable kernels and finite interaction length represent a computationally feasible alternative to fractional models; yet, the informed choice of their kernel functions still remains an open problem. We propose a general data-driven framework for the discovery of optimal kernels on the basis of very small and sparse data sets in the context of anomalous subsurface transport. Using spatially sparse breakthrough curves recovered from fine-scale particle-density simulations, we learn the best coarse-scale nonlocal model using a nonlocal operator regression technique. Predictions of the breakthrough curves obtained using the optimal nonlocal model show good agreement with fine-scale simulation results even at locations and time intervals different from the ones used to train the kernel, confirming the excellent generalization properties of the proposed algorithm. A comparison with trained classical models and with black-box deep neural networks confirms the superiority of the predictive capability of the proposed model.
Nuclear spins were among the first physical platforms to be considered for quantum information processing [1,2], because of their exceptional quantum coherence [3] and atomic-scale footprint. However, their full potential for quantum computing has not yet been realized, owing to the lack of methods with which to link nuclear qubits within a scalable device combined with multi-qubit operations with sufficient fidelity to sustain fault-tolerant quantum computation. Here we demonstrate universal quantum logic operations using a pair of ion-implanted 31P donor nuclei in a silicon nanoelectronic device. A nuclear two-qubit controlled-Z gate is obtained by imparting a geometric phase to a shared electron spin [4], and used to prepare entangled Bell states with fidelities up to 94.2(2.7)%. The quantum operations are precisely characterized using gate set tomography (GST) [5], yielding one-qubit average gate fidelities up to 99.95(2)%, two-qubit average gate fidelity of 99.37(11)% and two-qubit preparation/measurement fidelities of 98.95(4)%. These three metrics indicate that nuclear spins in silicon are approaching the performance demanded in fault-tolerant quantum processors [6]. We then demonstrate entanglement between the two nuclei and the shared electron by producing a Greenberger–Horne–Zeilinger three-qubit state with 92.5(1.0)% fidelity. Because electron spin qubits in semiconductors can be further coupled to other electrons [7–9] or physically shuttled across different locations [10,11], these results establish a viable route for scalable quantum information processing using donor nuclear and electron spins.
We present a framework for calibration of parameters in elastoplastic constitutive models that is based on the use of automatic differentiation (AD). The model calibration problem is posed as a partial differential equation-constrained optimization problem where a finite element (FE) model of the coupled equilibrium equation and constitutive model evolution equations serves as the constraint. The objective function quantifies the mismatch between the displacement predicted by the FE model and full-field digital image correlation data, and the optimization problem is solved using gradient-based optimization algorithms. Forward and adjoint sensitivities are used to compute the gradient at considerably less cost than its calculation from finite difference approximations. Through the use of AD, we need only to write the constraints in terms of AD objects, where all of the derivatives required for the forward and inverse problems are obtained by appropriately seeding and evaluating these quantities. We present three numerical examples that verify the correctness of the gradient, demonstrate the AD approach's parallel computation capabilities via application to a large-scale FE model, and highlight the formulation's ease of extensibility to other classes of constitutive models.
Krylov subspace recycling is a powerful tool when solving a long series of large, sparse linear systems that change only slowly over time. In PDE constrained shape optimization, these series appear naturally, as typically hundreds or thousands of optimization steps are needed with only small changes in the geometry. In this setting, however, applying Krylov subspace recycling can be a difficult task. As the geometry evolves, in general, so does the finite element mesh defined on or representing this geometry, including the numbers of nodes and elements and element connectivity. This is especially the case if re-meshing techniques are used. As a result, the number of algebraic degrees of freedom in the system changes, and in general the linear system matrices resulting from the finite element discretization change size from one optimization step to the next. Changes in the mesh connectivity also lead to structural changes in the matrices. In the case of re-meshing, even if the geometry changes only a little, the corresponding mesh might differ substantially from the previous one. Obviously, this prevents any straightforward mapping of the approximate invariant subspace of the linear system matrix (the focus of recycling in this work) from one optimization step to the next; similar problems arise for other selected subspaces. In this paper, we present an algorithm to map an approximate invariant subspace of the linear system matrix for the previous optimization step to an approximate invariant subspace of the linear system matrix for the current optimization step, for general meshes. This is achieved by exploiting the map from coefficient vectors to finite element functions on the mesh, combined with interpolation or approximation of functions on the finite element mesh. We demonstrate the effectiveness of our approach numerically with several proof of concept studies for a specific meshing technique.
We have extended the computational singular perturbation (CSP) method to differential algebraic equation (DAE) systems and demonstrated its application in a heterogeneous-catalysis problem. The extended method obtains the CSP basis vectors for DAEs from a reduced Jacobian matrix that takes the algebraic constraints into account. We use a canonical problem in heterogeneous catalysis, the transient continuous stirred tank reactor (T-CSTR), for illustration. The T-CSTR problem is modelled fundamentally as an ordinary differential equation (ODE) system, but it can be transformed to a DAE system if one approximates typically fast surface processes using algebraic constraints for the surface species. We demonstrate the application of CSP analysis for both ODE and DAE constructions of a T-CSTR problem, illustrating the dynamical response of the system in each case. We also highlight the utility of the analysis in commenting on the quality of any particular DAE approximation built using the quasi-steady state approximation (QSSA), relative to the ODE reference case.
Many teams struggle to adapt and right-size software engineering best practices for quality assurance to fit their context. Introducing software quality is not usually framed in a way that motivates teams to take action, so it becomes a "check the box for compliance" activity instead of a cultural practice that values software quality and the effort to achieve it. When and how can we provide effective incentives for software teams to adopt and integrate meaningful and enduring software quality practices? We explored this question through a persona-based ideation exercise at the 2021 Collegeville Workshop on Scientific Software in which we created three unique personas that represent different scientific software developer perspectives.
Quantum computers can now run interesting programs, but each processor’s capability—the set of programs that it can run successfully—is limited by hardware errors. These errors can be complicated, making it difficult to accurately predict a processor’s capability. Benchmarks can be used to measure capability directly, but current benchmarks have limited flexibility and scale poorly to many-qubit processors. We show how to construct scalable, efficiently verifiable benchmarks based on any program by using a technique that we call circuit mirroring. With it, we construct two flexible, scalable volumetric benchmarks based on randomized and periodically ordered programs. We use these benchmarks to map out the capabilities of twelve publicly available processors, and to measure the impact of program structure on each one. We find that standard error metrics are poor predictors of whether a program will run successfully on today’s hardware, and that current processors vary widely in their sensitivity to program structure.
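The reason a mirrored program is efficiently verifiable can be illustrated with a small state-vector simulation. The sketch below assumes that mirroring means running a circuit followed by its layer-by-layer inverse, so that an error-free processor returns to the initial state with probability 1. It is a deliberately simplified illustration and may omit ingredients of the actual construction; the helper names and the use of dense random unitaries as "layers" are hypothetical.

```python
import numpy as np

def random_unitary(dim, rng):
    """Random unitary taken from the QR decomposition of a complex Gaussian matrix."""
    z = rng.standard_normal((dim, dim)) + 1j * rng.standard_normal((dim, dim))
    q, _ = np.linalg.qr(z)
    return q

def mirror_circuit(layers):
    """Append the inverse of every layer in reverse order."""
    return layers + [u.conj().T for u in reversed(layers)]

def run(layers, n_qubits):
    """Apply the layers in order to the all-zeros computational basis state."""
    state = np.zeros(2 ** n_qubits, dtype=complex)
    state[0] = 1.0
    for u in layers:
        state = u @ state
    return state

rng = np.random.default_rng(1)
n_qubits = 2
layers = [random_unitary(2 ** n_qubits, rng) for _ in range(5)]
final = run(mirror_circuit(layers), n_qubits)
print("probability of returning to |00>:", abs(final[0]) ** 2)
```

On real hardware the observed return probability falls below 1, and that shortfall is the quantity a benchmark of this kind can measure without simulating the original program classically.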
Magann, Alicia B.; Mccaul, Gerard; Rabitz, Herschel A.; Bondar, Denys I.
The characterization of mixtures of non-interacting, spectroscopically similar quantum components has important applications in chemistry, biology, and materials science. We introduce an approach based on quantum tracking control that allows for determining the relative concentrations of constituents in a quantum mixture, using a single pulse which enhances the distinguishability of components of the mixture and has a length that scales linearly with the number of mixture constituents. To illustrate the method, we consider two very distinct model systems: mixtures of diatomic molecules in the gas phase, as well as solid-state materials composed of a mixture of components. A set of numerical analyses are presented, showing strong performance in both settings.
This paper describes an efficient reverse-mode differentiation algorithm for contraction operations for arbitrary and unconventional tensor network topologies. The approach leverages the tensor contraction tree of Evenbly and Pfeifer (2014), which provides an instruction set for the contraction sequence of a network. We show that this tree can be efficiently leveraged for differentiation of a full tensor network contraction using a recursive scheme that exploits (1) the bilinear property of contraction and (2) the property that trees have a single path from root to leaves. While differentiation of tensor-tensor contraction is already possible in most automatic differentiation packages, we show that exploiting these two additional properties in the specific context of contraction sequences can improve efficiency. Following a description of the algorithm and computational complexity analysis, we investigate its utility for gradient-based supervised learning for low-rank function recovery and for fitting real-world unstructured datasets. We demonstrate improved performance over alternating least-squares optimization approaches and the capability to handle heterogeneous and arbitrary tensor network formats. When compared to alternating minimization algorithms, we find that the gradient-based approach requires a smaller oversampling ratio (number of samples compared to number of model parameters) for recovery. This increased efficiency extends to fitting unstructured data of varying dimensionality and when employing a variety of tensor network formats. Here, we show improved learning using the hierarchical Tucker method over the tensor-train in high-dimensional settings on a number of benchmark problems.
Hu, Xuan; Walker, Benjamin W.; Garcia-Sanchez, Felipe; Edwards, Alexander J.; Zhou, Peng; Incorvia, Jean A.C.; Paler, Alexandru; Frank, Michael P.; Friedman, Joseph S.
Magnetic skyrmions are nanoscale whirls of magnetism that can be propagated with electrical currents. The repulsion between skyrmions inspires their use for reversible computing based on the elastic billiard ball collisions proposed for conservative logic in 1982. In this letter, we evaluate the logical and physical reversibility of this skyrmion logic paradigm, as well as the limitations that must be addressed before dissipation-free computation can be realized.