Nuclear spins were among the first physical platforms to be considered for quantum information processing1,2, because of their exceptional quantum coherence3 and atomic-scale footprint. However, their full potential for quantum computing has not yet been realized, owing to the lack of methods with which to link nuclear qubits within a scalable device combined with multi-qubit operations with sufficient fidelity to sustain fault-tolerant quantum computation. Here we demonstrate universal quantum logic operations using a pair of ion-implanted 31P donor nuclei in a silicon nanoelectronic device. A nuclear two-qubit controlled-Z gate is obtained by imparting a geometric phase to a shared electron spin4, and used to prepare entangled Bell states with fidelities up to 94.2(2.7)%. The quantum operations are precisely characterized using gate set tomography (GST)5, yielding one-qubit average gate fidelities up to 99.95(2)%, two-qubit average gate fidelity of 99.37(11)% and two-qubit preparation/measurement fidelities of 98.95(4)%. These three metrics indicate that nuclear spins in silicon are approaching the performance demanded in fault-tolerant quantum processors6. We then demonstrate entanglement between the two nuclei and the shared electron by producing a Greenberger–Horne–Zeilinger three-qubit state with 92.5(1.0)% fidelity. Because electron spin qubits in semiconductors can be further coupled to other electrons7–9 or physically shuttled across different locations10,11, these results establish a viable route for scalable quantum information processing using donor nuclear and electron spins.

Physics-informed neural network architectures have emerged as a powerful tool for developing flexible PDE solvers that easily assimilate data. When applied to problems in shock physics however, these approaches face challenges related to the collocation-based PDE discretization underpinning them. By instead adopting a least squares space-time control volume scheme, we obtain a scheme which more naturally handles: regularity requirements, imposition of boundary conditions, entropy compatibility, and conservation, substantially reducing requisite hyperparameters in the process. Additionally, connections to classical finite volume methods allows application of inductive biases toward entropy solutions and total variation diminishing properties. For inverse problems in shock hydrodynamics, we propose inductive biases for discovering thermodynamically consistent equations of state that guarantee hyperbolicity. This framework therefore provides a means of discovering continuum shock models from molecular simulations of rarefied gases and metals. The output of the learning process provides a data-driven equation of state which may be incorporated into traditional shock hydrodynamics codes.

We present a framework for calibration of parameters in elastoplastic constitutive models that is based on the use of automatic differentiation (AD). The model calibration problem is posed as a partial differential equation-constrained optimization problem where a finite element (FE) model of the coupled equilibrium equation and constitutive model evolution equations serves as the constraint. The objective function quantifies the mismatch between the displacement predicted by the FE model and full-field digital image correlation data, and the optimization problem is solved using gradient-based optimization algorithms. Forward and adjoint sensitivities are used to compute the gradient at considerably less cost than its calculation from finite difference approximations. Through the use of AD, we need only to write the constraints in terms of AD objects, where all of the derivatives required for the forward and inverse problems are obtained by appropriately seeding and evaluating these quantities. We present three numerical examples that verify the correctness of the gradient, demonstrate the AD approach's parallel computation capabilities via application to a large-scale FE model, and highlight the formulation's ease of extensibility to other classes of constitutive models.

Quantum computers can now run interesting programs, but each processor’s capability—the set of programs that it can run successfully—is limited by hardware errors. These errors can be complicated, making it difficult to accurately predict a processor’s capability. Benchmarks can be used to measure capability directly, but current benchmarks have limited flexibility and scale poorly to many-qubit processors. We show how to construct scalable, efficiently verifiable benchmarks based on any program by using a technique that we call circuit mirroring. With it, we construct two flexible, scalable volumetric benchmarks based on randomized and periodically ordered programs. We use these benchmarks to map out the capabilities of twelve publicly available processors, and to measure the impact of program structure on each one. We find that standard error metrics are poor predictors of whether a program will run successfully on today’s hardware, and that current processors vary widely in their sensitivity to program structure.

Hu, Xuan; Walker, Benjamin W.; Garcia-Sanchez, Felipe; Edwards, Alexander J.; Zhou, Peng; Incorvia, Jean A.; Paler, Alexandru; Frank, Michael P.; Friedman, Joseph S.

Magnetic skyrmions are nanoscale whirls of magnetism that can be propagated with electrical currents. The repulsion between skyrmions inspires their use for reversible computing based on the elastic billiard ball collisions proposed for conservative logic in 1982. In this letter, we evaluate the logical and physical reversibility of this skyrmion logic paradigm, as well as the limitations that must be addressed before dissipation-free computation can be realized.

The visualization community has invested decades of research and development into producing large-scale production visualization tools. Although in situ is a paradigm shift for large-scale visualization, much of the same algorithms and operations apply regardless of whether the visualization is run post hoc or in situ. Thus, there is a great benefit to taking the large-scale code originally designed for post hoc use and leveraging it for use in situ. This chapter describes two in situ libraries, Libsim and Catalyst, that are based on mature visualization tools, VisIt and ParaView, respectively. Because they are based on fully-featured visualization packages, they each provide a wealth of features. For each of these systems we outline how the simulation and visualization software are coupled, what the runtime behavior and communication between these components are, and how the underlying implementation works. We also provide use cases demonstrating the systems in action. Both of these in situ libraries, as well as the underlying products they are based on, are made freely available as open-source products. The overviews in this chapter provide a toehold to the practical application of in situ visualization.

This paper describes an efficient reverse-mode differentiation algorithm for contraction operations for arbitrary and unconventional tensor network topologies. The approach leverages the tensor contraction tree of Evenbly and Pfeifer (2014), which provides an instruction set for the contraction sequence of a network. We show that this tree can be efficiently leveraged for differentiation of a full tensor network contraction using a recursive scheme that exploits (1) the bilinear property of contraction and (2) the property that trees have a single path from root to leaves. While differentiation of tensor-tensor contraction is already possible in most automatic differentiation packages, we show that exploiting these two additional properties in the specific context of contraction sequences can improve eficiency. Following a description of the algorithm and computational complexity analysis, we investigate its utility for gradient-based supervised learning for low-rank function recovery and for fitting real-world unstructured datasets. We demonstrate improved performance over alternating least-squares optimization approaches and the capability to handle heterogeneous and arbitrary tensor network formats. When compared to alternating minimization algorithms, we find that the gradient-based approach requires a smaller oversampling ratio (number of samples compared to number model parameters) for recovery. This increased efficiency extends to fitting unstructured data of varying dimensionality and when employing a variety of tensor network formats. Here, we show improved learning using the hierarchical Tucker method over the tensor-train in high-dimensional settings on a number of benchmark problems.

Measurements that occur within the internal layers of a quantum circuit—midcircuit measurements—are a useful quantum-computing primitive, most notably for quantum error correction. Midcircuit measurements have both classical and quantum outputs, so they can be subject to error modes that do not exist for measurements that terminate quantum circuits. Here we show how to characterize midcircuit measurements, modeled by quantum instruments, using a technique that we call quantum instrument linear gate set tomography (QILGST). We then apply this technique to characterize a dispersive measurement on a superconducting transmon qubit within a multiqubit system. By varying the delay time between the measurement pulse and subsequent gates, we explore the impact of residual cavity photon population on measurement error. QILGST can resolve different error modes and quantify the total error from a measurement; in our experiment, for delay times above 1000ns we measure a total error rate (i.e., half diamond distance) of ϵ⋄=8.1±1.4%, a readout fidelity of 97.0±0.3%, and output quantum-state fidelities of 96.7±0.6% and 93.7±0.7% when measuring 0 and 1, respectively.

Numerical simulations of Greenland and Antarctic ice sheets involve the solution of large-scale highly nonlinear systems of equations on complex shallow geometries. This work is concerned with the construction of Schwarz preconditioners for the solution of the associated tangent problems, which are challenging for solvers mainly because of the strong anisotropy of the meshes and wildly changing boundary conditions that can lead to poorly constrained problems on large portions of the domain. Here, two-level generalized Dryja-Smith-Widlund (GDSW)-type Schwarz preconditioners are applied to different land ice problems, i.e., a velocity problem, a temperature problem, as well as the coupling of the former two problems. We employ the message passing interface (MPI)- parallel implementation of multilevel Schwarz preconditioners provided by the package FROSch (fast and robust Schwarz) from the Trilinos library. The strength of the proposed preconditioner is that it yields out-of-the-box scalable and robust preconditioners for the single physics problems. To the best of our knowledge, this is the first time two-level Schwarz preconditioners have been applied to the ice sheet problem and a scalable preconditioner has been used for the coupled problem. The preconditioner for the coupled problem differs from previous monolithic GDSW preconditioners in the sense that decoupled extension operators are used to compute the values in the interior of the subdomains. Several approaches for improving the performance, such as reuse strategies and shared memory OpenMP parallelization, are explored as well. In our numerical study we target both uniform meshes of varying resolution for the Antarctic ice sheet as well as nonuniform meshes for the Greenland ice sheet. We present several weak and strong scaling studies confirming the robustness of the approach and the parallel scalability of the FROSch implementation. Among the highlights of the numerical results are a weak scaling study for up to 32 K processor cores (8 K MPI ranks and 4 OpenMP threads) and 566 M degrees of freedom for the velocity problem as well as a strong scaling study for up to 4 K processor cores (and MPI ranks) and 68 M degrees of freedom for the coupled problem.

Fault tolerance is a key challenge as high performance computing systems continue to increase component counts, individual component reliability decreases, and hardware and software complexity increases. To better understand the potential impacts of failures on next-generation systems, significant effort has been devoted to collecting, characterizing and analyzing failures on current systems. These studies require large volumes of data and complex analysis in an attempt to identify statistical properties of the failure data. In this paper, we examine the lifetime of failures on the Cielo supercomputer that was located at Los Alamos National Laboratory, looking specifically at the time between faults on this system. Through this analysis, we show that the time between uncorrectable faults for this system obeys Benford’s law, This law applies to a number of naturally occurring collections of numbers and states that the leading digit is more likely to be small, for example a leading digit of 1 is more likely than 9. We also show that a number of common distributions used to model failures also follow this law. This work provides critical analysis on the distribution of times between failures for extreme-scale systems. Specifically, the analysis in this work could be used as a simple form of failure prediction or used for modeling realistic failures.

We study orthogonal polynomials with respect to self-similar measures, focusing on the class of infinite Bernoulli convolutions, which are defined by iterated function systems with overlaps, especially those defined by the Pisot, Garsia, and Salem numbers. By using an algorithm of Mantica, we obtain graphs of the coefficients of the 3-term recursion relation defining the orthogonal polynomials. We use these graphs to predict whether the singular infinite Bernoulli convolutions belong to the Nevai class. Based on our numerical results, we conjecture that all infinite Bernoulli convolutions with contraction ratios greater than or equal to 1/2 belong to Nevai’s class, regardless of the probability weights assigned to the self-similar measures.

Study looks at the effect that failed links have on the throughput of HPC systems: What workloads are most effected? How many links need to be down before throughput of the machine is noticeably affected?

We analyze the regression accuracy of convolutional neural networks assembled from encoders, decoders and skip connections and trained with multifidelity data. These networks benefit from a significant reduction in the number of trainable parameters with respect to an equivalent fully connected network. These architectures are also versatile with respect to the input and output dimensionality. For example, encoder-decoder, decoder-encoder or decoder-encoder-decoder architectures are well suited to learn mappings between input and outputs of any dimensionality. We demonstrate the accuracy produced by such architectures when trained on a few high-fidelity and many low-fidelity data generated from models ranging from one-dimensional functions to Poisson equation solvers in two-dimensions. We finally discuss a number of implementation choices that improve the reliability of the uncertainty estimates generated by a dropblock regularizer, and compare uncertainty estimates among low-, high-and multi-fidelity approaches.

We report flow statistics and visualizations from gas-kinetic simulations using the Direct Simulation Monte Carlo (DSMC) method of compressible turbulent Couette flow over a porous substrate composed of an array of circular cylinders for which the Knudsen number is O(10-1). Comparisons are made with both smooth-wall DSMC simulations and direct numerical simulations of the Navier-Stokes equations for the same conditions. Roughness, permeability, and noncontinuum effects are assessed.

Software sustainability is critical for Computational Science and Engineering (CSE) software. It is also challenging due to factors ranging from funding models to the typical lifecycle of a research code to the inherent challenges of running fast on the newest architectures. Furthermore, measuring sustainability is challenging because sustainability consists of many complex attributes. To identify useful metrics for measuring CSE software sustainability, we gathered data from multiple freely available sources, including GitHub, SLOCCount, and Metrix++. This paper discusses the challenges practitioners face when measuring the sustainability of CSE software. We present an analysis of data with associated observations and future directions to better understand CSE software sustainability and how this work can be used to support decisions and improve sustainability by observing trends in metrics over time.

Powders under compression form mesostructures of particle agglomerations in response to both inter- and intra-particle forces. The ability to computationally predict the resulting mesostructures with reasonable accuracy requires models that capture the distributions associated with particle size and shape, contact forces, and mechanical response during deformation and fracture. The following report presents experimental data obtained for the purpose of validating emerging mesostructures simulated by discrete element method and peridynamic approaches. A custom compression apparatus, suitable for integration with our micro-computed tomography (micro-CT) system, was used to collect 3-D scans of a bulk powder at discrete steps of increasing compression. Details of the apparatus and the microcrystalline cellulose particles, with a nearly spherical shape and mean particle size, are presented. Comparative simulations were performed with an initial arrangement of particles and particle shapes directly extracted from the validation experiment. The experimental volumetric reconstruction was segmented to extract the relative positions and shapes of individual particles in the ensemble, including internal voids in the case of the microcrystalline cellulose particles. These computationally determined particles were then compressed within the computational domain and the evolving mesostructures compared directly to those in the validation experiment. The ability of the computational models to simulate the experimental mesostructures and particle behavior at increasing compression is discussed.

We propose a domain decomposition method for the efficient simulation of nonlocal problems. Our approach is based on a multi-domain formulation of a nonlocal diffusion problem where the subdomains share “nonlocal” interfaces of the size of the nonlocal horizon. This system of nonlocal equations is first rewritten in terms of minimization of a nonlocal energy, then discretized with a meshfree approximation and finally solved via a Lagrange multiplier approach in a way that resembles the finite element tearing and interconnect method. Specifically, we propose a distributed projected gradient algorithm for the solution of the Lagrange multiplier system, whose unknowns determine the nonlocal interface conditions between subdomains. Several two-dimensional numerical tests on problems as large as 191 million unknowns illustrate the strong and the weak scalability of our algorithm, which outperforms the standard approach to the distributed numerical solution of the problem. This work is the first rigorous numerical study in a two-dimensional multi-domain setting for nonlocal operators with finite horizon and, as such, it is a fundamental step towards increasing the use of nonlocal models in large scale simulations.

Karim, Mohammad K.; Narasimhachary, Santosh N.; Radailli, Francesco R.; Amann, Christian A.; Dayal, Kaushik D.; Silling, Stewart A.; Germann, Tim G.

In this study, we present a computational study and framework that allows us to study and understand the crack nucleation process from forging flaws. Forging flaws may be present in large steel rotor components commonly used for rotating power generation equipment including gas turbines, electrical generators, and steam turbines. The service life of these components is often limited by crack nucleation and subsequent growth from such forging flaws, which frequently exhibit themselves as non-metallic oxide inclusions. The fatigue crack growth process can be described by established engineering fracture mechanics methods. However, the initial crack nucleation process from a forging flaw is challenging for traditional engineering methods to quantify as it depends on the details of the flaw, including flaw morphology. We adopt the peridynamics method to describe and study this crack nucleation process. For a specific industrial gas turbine rotor steel, we present how we integrate and fit commonly known base material property data such as elastic properties, yield strength, and S-N curves, as well as fatigue crack growth data into a peridynamic model. The obtained model is then utilized in a series of high-performance two-dimensional peridynamic simulations to study the crack nucleation process from forging flaws for ambient and elevated temperatures in a rectangular simulation cell specimen. The simulations reveal an initial local nucleation at multiple small oxide inclusions followed by micro-crack propagation, arrest, coalescence, and eventual emergence of a dominant micro-crack that governs the crack nucleation process. The dependence on temperature and density of oxide inclusions of both the details of the microscopic processes and cycles to crack nucleation is also observed. Finally, the results are compared with fatigue experiments performed with specimens containing forging flaws of the same rotor steel.