Demonstrating Scalable Benchmarking of Quantum Computers
Abstract not provided.
Materials Science and Engineering: A
The mechanical properties of additively manufactured metals tend to show high variability, due largely to the stochastic nature of defect formation during the printing process. This study seeks to understand how automated high throughput testing can be utilized to understand the variable nature of additively manufactured metals at different print conditions, and to allow for statistically meaningful analysis. This is demonstrated by analyzing how different processing parameters, including laser power, scan velocity, and scan pattern, influence the tensile behavior of additively manufactured stainless steel 316L utilizing a newly developed automated test methodology. Microstructural characterization through computed tomography and electron backscatter diffraction is used to understand some of the observed trends in mechanical behavior. Specifically, grain size and morphology are shown to depend on processing parameters and influence the observed mechanical behavior. In the current study, laser-powder bed fusion, also known as selective laser melting or direct metal laser sintering, is shown to produce 316L over a wide processing range without substantial detrimental effect on the tensile properties. Ultimate tensile strengths above 600 MPa, which are greater than that for typical wrought annealed 316L with similar grain sizes, and elongations to failure greater than 40% were observed. It is demonstrated that this process has little sensitivity to minor intentional or unintentional variations in laser velocity and power.
Journal of Physical Chemistry. A, Molecules, Spectroscopy, Kinetics, Environment, and General Theory
Machine learning of the quantitative relationship between local environment descriptors and the potential energy surface of a system of atoms has emerged as a new frontier in the development of interatomic potentials (IAPs). Here, we present a comprehensive evaluation of ML-IAPs based on four local environment descriptors --- Behler-Parrinello symmetry functions, smooth overlap of atomic positions (SOAP), the Spectral Neighbor Analysis Potential (SNAP) bispectrum components, and moment tensors --- using a diverse data set generated using high-throughput density functional theory (DFT) calculations. The data set comprising bcc (Li, Mo) and fcc (Cu, Ni) metals and diamond group IV semiconductors (Si, Ge) is chosen to span a range of crystal structures and bonding. All descriptors studied show excellent performance in predicting energies and forces far surpassing that of classical IAPs, as well as predicting properties such as elastic constants and phonon dispersion curves. We observe a general trade-off between accuracy and the degrees of freedom of each model, and consequently computational cost. We will discuss these trade-offs in the context of model selection for molecular dynamics and other applications.
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Concern about memory errors has been widespread in high-performance computing (HPC) for decades. These concerns have led to significant research on detecting and correcting memory errors to improve performance and provide strong guarantees about the correctness of the memory contents of scientific simulations. However, power concerns and changes in memory architectures threaten the viability of current approaches to protecting memory (e.g., Chipkill). Returning to less protective error-correcting codes (ECC), e.g., single-error correction, double-error detection (SECDED), may increase the frequency of memory errors, including silent data corruption (SDC). SDC has the potential to silently cause applications to produce incorrect results and mislead domain scientists. We propose an approach for exploiting unnecessary bits in pointer values to support encoding the pointer with a Reed-Solomon code. Encoding the pointer allows us to provide strong capabilities for correcting and detecting corruption of pointer values. In this paper, we provide a detailed description of how we can exploit unnecessary pointer bits to store Reed-Solomon parity symbols. We evaluate the performance impacts of this approach and examine the effectiveness of the approach against corruption. Our results demonstrate that encoding and decoding are fast (less than 45 per event) and that the protection they provide is robust (the rate of miscorrection is less than 5% even for significant corruption). The data and analysis presented in this paper demonstrate the power of our approach. It is fast, tunable, requires no additional per-pointer storage resources, and provides robust protection against pointer corruption.
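The idea of storing redundancy in unused pointer bits can be illustrated with a minimal sketch. It is not the Reed-Solomon scheme of the paper: it swaps in a much weaker XOR-fold checksum that only detects (and cannot correct) corruption. It also assumes a 64-bit word in which only the low 48 bits carry the address. The names `checksum16`, `encode_pointer`, and `decode_pointer` are hypothetical.

```python
# Assumption: 64-bit pointers whose upper 16 bits are unused by the address.
ADDRESS_BITS = 48
TAG_BITS = 16
ADDRESS_MASK = (1 << ADDRESS_BITS) - 1
TAG_MASK = (1 << TAG_BITS) - 1


def checksum16(address: int) -> int:
    """Fold a 48-bit address into 16 bits by XOR-ing its three 16-bit words.

    This is a stand-in for real parity symbols; it detects any single-bit
    flip but cannot correct anything.
    """
    return (address ^ (address >> 16) ^ (address >> 32)) & TAG_MASK


def encode_pointer(address: int) -> int:
    """Place the checksum in the spare upper bits of the pointer word."""
    if address & ~ADDRESS_MASK:
        raise ValueError("address does not fit in the assumed 48 bits")
    return (checksum16(address) << ADDRESS_BITS) | address


def decode_pointer(word: int) -> int:
    """Recover the address, raising if the stored checksum does not match."""
    address = word & ADDRESS_MASK
    tag = (word >> ADDRESS_BITS) & TAG_MASK
    if tag != checksum16(address):
        raise ValueError("pointer corruption detected")
    return address
```

Because the tag lives inside the pointer word itself, no extra per-pointer storage is needed, which is the property the abstract emphasizes. A real error-correcting code would replace `checksum16` and add a correction step in `decode_pointer`.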
Journal of Advances in Modeling Earth Systems
We derive a formulation of the nonhydrostatic equations in spherical geometry with a Lorenz staggered vertical discretization. The combination conserves a discrete energy in exact time integration when coupled with a mimetic horizontal discretization. The formulation is a version of Dubos and Tort (2014, https://doi.org/10.1175/MWR-D-14-00069.1) rewritten in terms of primitive variables. It is valid for terrain-following mass or height coordinates and for both Eulerian and vertically Lagrangian discretizations. The discretization relies on an extension to Simmons and Burridge (1981, https://doi.org/10.1175/1520-0493(1981)109<0758:AEAAMC>2.0.CO;2) vertical differencing, which we show obeys a discrete derivative product rule. This product rule allows us to simplify the treatment of the vertical transport terms. Energy conservation is obtained via a term-by-term balance in the kinetic, internal, and potential energy budgets, ensuring an energy-consistent discretization up to time truncation error with no spurious sources of energy. We demonstrate convergence with respect to time truncation error in a spectral element code with a horizontal explicit vertically implicit implicit-explicit time stepping algorithm.
Quantum
As increasingly impressive quantum information processors are realized in laboratories around the world, robust and reliable characterization of these devices is now more urgent than ever. These diagnostics can take many forms, but one of the most popular categories is tomography, where an underlying parameterized model is proposed for a device and inferred by experiments. Here, we introduce and implement efficient operational tomography, which uses experimental observables as these model parameters. This addresses a problem of ambiguity in representation that arises in current tomographic approaches (the gauge problem). Solving the gauge problem enables us to efficiently implement operational tomography in a Bayesian framework computationally, and hence gives us a natural way to include prior information and discuss uncertainty in fit parameters. We demonstrate this new tomography in a variety of different experimentally-relevant scenarios, including standard process tomography, Ramsey interferometry, randomized benchmarking, and gate set tomography.
Proceedings of the 6th European Conference on Computational Mechanics: Solids, Structures and Coupled Problems, ECCM 2018 and 7th European Conference on Computational Fluid Dynamics, ECFD 2018
SNOWPAC (Stochastic Nonlinear Optimization With Path-Augmented Constraints) is a method for stochastic nonlinear constrained derivative-free optimization. For such problems, it extends the path-augmented constraints framework introduced by the deterministic optimization method NOWPAC and uses a noise-adapted trust region approach and Gaussian processes for noise reduction. In recent developments, SNOWPAC is available in the DAKOTA framework which offers a highly flexible interface to couple the optimizer with different sampling strategies or surrogate models. In this paper we discuss details of SNOWPAC and demonstrate the coupling with DAKOTA. We showcase the approach by presenting design optimization results of a shape in a 2D supersonic duct. This simulation is supposed to imitate the behavior of the flow in a SCRAMJET simulation but at a much lower computational cost. Additionally different mesh or model fidelities can be tested. Thus, it serves as a convenient test case before moving to costly SCRAMJET computations. Here, we study deterministic results and results obtained by introducing uncertainty on inflow parameters. As sampling strategies we compare classical Monte Carlo sampling with multilevel Monte Carlo approaches for which we developed new error estimators. All approaches show a reasonable optimization of the design over the objective while maintaining or seeking feasibility. Furthermore, we achieve significant reductions in computational cost by using multilevel approaches that combine solutions from different grid resolutions.
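The multilevel Monte Carlo idea mentioned above, combining solutions from different grid resolutions, rests on a telescoping sum: the expectation on the finest level equals the expectation on the coarsest level plus the expected corrections between consecutive levels. A minimal sketch of that estimator follows. The function name and the input layout are assumptions made for illustration; this is not the SNOWPAC or DAKOTA interface, and it omits the error estimators the paper develops.

```python
from statistics import fmean


def multilevel_estimate(level_samples):
    """Telescoping multilevel Monte Carlo estimate of a mean.

    level_samples[l] is a list of (fine, coarse) pairs evaluated on the same
    random input: `fine` from level l, `coarse` from level l-1. On level 0
    there is no coarser model, so `coarse` is 0.0 there.
    """
    return sum(
        fmean([fine - coarse for fine, coarse in pairs])
        for pairs in level_samples
    )
```

The cost saving comes from drawing many cheap samples on level 0 and only a few on the expensive fine levels, where the corrections `fine - coarse` typically have small variance.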
Proceedings of the ASME Design Engineering Technical Conference
Bayesian optimization (BO) is an efficient and flexible global optimization framework that is applicable to a very wide range of engineering applications. To extend the capability of classical BO, many variants, including multi-objective, multi-fidelity, parallel, and latent-variable formulations, have been proposed to address the limitations of the classical framework. In this work, we propose a novel multi-objective (MO) extension, called srMOBO-3GP, to solve MO optimization problems in a sequential setting. Three different Gaussian processes (GPs) are stacked together, and each GP is assigned a different task: the first GP approximates a single objective computed from the MO definition, the second GP learns the unknown constraints, and the third GP learns the uncertain Pareto frontier. At each iteration, an augmented Tchebycheff function that converts the MO problem to a single-objective one is adopted and extended with a regularized ridge term, where the regularization is introduced to smooth the single-objective function. Finally, we couple the third GP with the classical BO framework to explore the richness and diversity of the Pareto frontier through an acquisition function that balances exploitation and exploration. The proposed framework is demonstrated using several numerical benchmark functions, as well as a thermomechanical finite element model for flip-chip package design optimization.
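The scalarization step described above can be sketched as follows. This shows one common form of the augmented Tchebycheff function, the weighted maximum deviation from an ideal point plus a small weighted-sum term. The ridge regularization specific to srMOBO-3GP is not given in the abstract and is deliberately left out. The function name, argument names, and the default `rho` are assumptions.

```python
from typing import Sequence


def augmented_tchebycheff(
    objectives: Sequence[float],
    weights: Sequence[float],
    ideal: Sequence[float],
    rho: float = 0.05,
) -> float:
    """Collapse several objective values into one scalar (to be minimized).

    objectives: objective values f_i at one design point
    weights:    non-negative weights w_i
    ideal:      reference (ideal) values z_i for each objective
    rho:        small positive weight on the augmentation (sum) term
    """
    if not (len(objectives) == len(weights) == len(ideal)):
        raise ValueError("objectives, weights, and ideal must have equal length")
    terms = [w * abs(f - z) for f, w, z in zip(objectives, weights, ideal)]
    return max(terms) + rho * sum(terms)
```

Varying the weights from one iteration to the next steers the single-objective search toward different regions of the Pareto frontier.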
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
There is a wealth of psychological theory regarding the drive for individuals to congregate and form social groups, positing that people may organize out of fear, social pressure, or even to manage their self-esteem. We evaluate three such theories for multi-scale validity by studying them not only at the individual scale for which they were originally developed, but also for applicability to group interactions and behavior. We implement this multi-scale analysis using a dataset of communications and group membership derived from a long-running online game, matching the intent behind the theories to quantitative measures that describe players’ behavior. Once we establish that the theories hold for the dataset, we increase the scope to test the theories at the higher scale of group interactions. Despite being formulated to describe individual cognition and motivation, we show that some group dynamics theories hold at the higher level of group cognition and can effectively describe the behavior of joint decision making and higher-level interactions.
SIAM Journal on Numerical Analysis
We present a Fourier analysis of wave propagation problems subject to a class of continuous and discontinuous discretizations using high-degree Lagrange polynomials. This allows us to obtain explicit analytical formulas for the dispersion relation and group velocity and, for the first time to our knowledge, characterize analytically the emergence of gaps in the dispersion relation at specific wavenumbers, when they exist, and compute their specific locations. Wave packets with energy at these wavenumbers will fail to propagate correctly, leading to significant numerical dispersion. We also show that the Fourier analysis generates mathematical artifacts, and we explain how to remove them through a branch selection procedure conducted by analysis of eigenvectors and associated reconstructed solutions. The higher frequency eigenmodes, named erratic in this study, are also investigated analytically and numerically.
SIAM Journal on Scientific Computing
This paper considers preconditioners for the linear systems that arise from optimal control and inverse problems involving the Helmholtz equation. Specifically, we explore an all-at-once approach. The main contribution centers on the analysis of two block preconditioners. Variations of these preconditioners have been proposed and analyzed in prior works for optimal control problems where the underlying partial differential equation is a Laplace-like operator. In this paper, we extend some of the prior convergence results to Helmholtz-based optimization applications. Our analysis examines situations where control variables and observations are restricted to subregions of the computational domain. We prove that solver convergence rates do not deteriorate as the mesh is refined or as the wavenumber increases. More specifically, for one of the preconditioners we prove accelerated convergence as the wavenumber increases. Additionally, in situations where the control and observation subregions are disjoint, we observe that solver convergence rates have a weak dependence on the regularization parameter. We give a partial analysis of this behavior. We illustrate the performance of the preconditioners on control problems motivated by acoustic testing.
Proceedings of the 6th European Conference on Computational Mechanics: Solids, Structures and Coupled Problems, ECCM 2018 and 7th European Conference on Computational Fluid Dynamics, ECFD 2018
Wind energy is stochastic in nature; the prediction of aerodynamic quantities and loads relevant to wind energy applications involves modeling the interaction of a range of physics over many scales for many different cases. These predictions require a range of model fidelity, as predictive models that include the interaction of atmospheric and wind turbine wake physics can take weeks to solve on institutional high performance computing systems. In order to quantify the uncertainty in predictions of wind energy quantities with multiple models, researchers at Sandia National Laboratories have applied Multilevel-Multifidelity methods. A demonstration study was completed using simulations of a NREL 5MW rotor in an atmospheric boundary layer with wake interaction. The flow was simulated with two models of disparate fidelity; an actuator line wind plant large-eddy scale model, Nalu, using several mesh resolutions in combination with a lower fidelity model, OpenFAST. Uncertainties in the flow conditions and actuator forces were propagated through the model using Monte Carlo sampling to estimate the velocity defect in the wake and forces on the rotor. Coarse-mesh simulations were leveraged along with the lower-fidelity flow model to reduce the variance of the estimator, and the resulting Multilevel-Multifidelity strategy demonstrated a substantial improvement in estimator efficiency compared to the standard Monte Carlo method.
AIAA Scitech 2020 Forum
Truly predictive numerical simulations can only be obtained by performing Uncertainty Quantification. However, many realistic engineering applications require extremely complex and computationally expensive high-fidelity numerical simulations for their accurate performance characterization. Very often the combination of complex physical models and extreme operative conditions can easily lead to hundreds of uncertain parameters that need to be propagated through high-fidelity codes. Under these circumstances, a single-fidelity uncertainty quantification approach, i.e. a workflow that only uses high-fidelity simulations, is infeasible due to its prohibitive overall computational cost. To overcome this difficulty, multifidelity strategies have emerged and gained popularity in recent years. Their core idea is to combine simulations with varying levels of fidelity/accuracy in order to obtain estimators or surrogates that can yield the same accuracy as their single-fidelity counterparts at a much lower computational cost. This goal is usually accomplished by defining a priori a sequence of discretization levels or physical modeling assumptions that can be used to decrease the complexity of a numerical model realization and thus its computational cost. Less attention has been dedicated to low-fidelity models that can be built directly from a small number of available high-fidelity simulations. In this work we focus our attention on reduced order models (ROMs). Our main goal is to investigate the combination of multifidelity uncertainty quantification and ROMs in order to evaluate whether it can provide an efficient framework for propagating uncertainties through expensive numerical codes.
We focus our attention on sampling-based multifidelity approaches, like the multifidelity control variate, and we consider several scenarios for a numerical test problem, namely the Kuramoto-Sivashinsky equation, for which the efficiency of the multifidelity-ROM estimator is compared to the standard (single-fidelity) Monte Carlo approach.
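A sampling-based multifidelity control variate of the kind named above can be sketched in a few lines. The sketch assumes a small set of paired high- and low-fidelity evaluations on the same inputs, plus additional low-fidelity-only evaluations. The function name and the sample-based choice of the coefficient `alpha` are illustrative assumptions, not the estimator implemented in the paper.

```python
from statistics import fmean


def control_variate_mean(hf, lf_paired, lf_extra):
    """Estimate the high-fidelity mean using a low-fidelity control variate.

    hf:        high-fidelity outputs on N shared inputs
    lf_paired: low-fidelity outputs on the same N inputs
    lf_extra:  low-fidelity outputs on additional, cheap inputs
    """
    n = len(hf)
    if n < 2 or n != len(lf_paired):
        raise ValueError("need at least two paired samples of equal length")
    hf_mean = fmean(hf)
    lf_mean = fmean(lf_paired)
    # Sample covariance and variance give the variance-minimizing weight.
    cov = sum((h - hf_mean) * (l - lf_mean) for h, l in zip(hf, lf_paired)) / (n - 1)
    var = sum((l - lf_mean) ** 2 for l in lf_paired) / (n - 1)
    alpha = cov / var if var > 0 else 0.0
    # A better low-fidelity mean from all cheap samples corrects the estimate.
    lf_all_mean = fmean(list(lf_paired) + list(lf_extra))
    return hf_mean - alpha * (lf_mean - lf_all_mean)
```

The more strongly the low-fidelity model (for example a ROM) correlates with the high-fidelity one, the larger the variance reduction relative to plain Monte Carlo with the same number of high-fidelity runs.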
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
In the decade since support for task parallelism was incorporated into OpenMP, its use has remained limited in part due to concerns about its performance and scalability. This paper revisits a study from the early days of OpenMP tasking that used the Unbalanced Tree Search (UTS) benchmark as a stress test to gauge implementation efficiency. The present UTS study includes both Clang/LLVM and vendor OpenMP implementations on four different architectures. We measure parallel efficiency to examine each implementation’s performance in response to varying task granularity. We find that most implementations achieve over 90% efficiency using all available cores for tasks of O(100k) instructions, and the best even manage tasks of O(10k) instructions well.
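For reference, parallel efficiency is commonly defined as speedup divided by the number of workers. The helper below is a sketch of that textbook definition. The choice of baseline (here a single-worker run time) is an assumption, since the abstract does not state how the study defines it.

```python
def parallel_efficiency(serial_time: float, parallel_time: float, workers: int) -> float:
    """Return speedup per worker: (serial_time / parallel_time) / workers."""
    if workers < 1 or serial_time <= 0 or parallel_time <= 0:
        raise ValueError("times must be positive and workers at least 1")
    return serial_time / (workers * parallel_time)
```

Under this definition, a value of 0.9 on 64 cores means the run is about 57.6 times faster than the single-worker baseline.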
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
As computer architectures are rapidly evolving (e.g. those designed for exascale), multiple portability frameworks have been developed to avoid new architecture-specific development and tuning. However, portability frameworks depend on compilers for auto-vectorization and may lack support for explicit vectorization on heterogeneous platforms. Alternatively, programmers can use intrinsics-based primitives to achieve more efficient vectorization, but the lack of a GPU back-end for these primitives makes such code non-portable. A unified, portable Single Instruction Multiple Data (SIMD) primitive proposed in this work allows intrinsics-based vectorization on CPUs and many-core architectures such as Intel Knights Landing (KNL), and also facilitates Single Instruction Multiple Threads (SIMT) based execution on GPUs. This unified primitive, coupled with the Kokkos portability ecosystem, makes it possible to develop explicitly vectorized code that is portable across heterogeneous platforms. The new SIMD primitive is used on different architectures to test the performance boost against a hard-to-auto-vectorize baseline, to measure the overhead against an efficiently vectorized baseline, and to evaluate the new feature called the “logical vector length” (LVL). The SIMD primitive provides portability across CPUs and GPUs without any performance degradation being observed experimentally.
Quantum
We propose a very large family of benchmarks for probing the performance of quantum computers. We call them volumetric benchmarks (VBs) because they generalize IBM's benchmark for measuring quantum volume [1]. The quantum volume benchmark defines a family of square circuits whose depth d and width w are the same. A volumetric benchmark defines a family of rectangular quantum circuits, for which d and w are uncoupled to allow the study of time/space performance trade-offs. Each VB defines a mapping from circuit shapes, i.e. (w, d) pairs, to test suites C(w, d). A test suite is an ensemble of test circuits that share a common structure. The test suite C for a given circuit shape may be a single circuit C, a specific list of circuits {C1, ..., CN} that must all be run, or a large set of possible circuits equipped with a distribution Pr(C). The circuits in a given VB share a structure, which is limited only by designers' creativity. We list some known benchmarks, and other circuit families, that fit into the VB framework: several families of random circuits, periodic circuits, and algorithm-inspired circuits. The last ingredient defining a benchmark is a success criterion that defines when a processor is judged to have “passed” a given test circuit. We discuss several options. Benchmark data can be analyzed in many ways to extract many properties, but we propose a simple, universal graphical summary of results that illustrates the Pareto frontier of the d vs w trade-off for the processor being benchmarked.
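The Pareto frontier of the depth versus width trade-off can be computed directly from the set of circuit shapes a processor has passed. The sketch below assumes the pass/fail decision per shape has already been made by some success criterion. The function name and the input format are hypothetical and are not taken from the paper.

```python
def pareto_frontier(passed_shapes):
    """Return the passed (width, depth) shapes not dominated by another.

    A shape (w, d) is dominated when a different passed shape has both
    width >= w and depth >= d.
    """
    shapes = set(passed_shapes)
    frontier = [
        (w, d)
        for (w, d) in shapes
        if not any(
            (w2, d2) != (w, d) and w2 >= w and d2 >= d
            for (w2, d2) in shapes
        )
    ]
    return sorted(frontier)


# Example: shapes passed by a hypothetical processor.
passed = [(1, 1), (1, 8), (2, 4), (2, 8), (4, 2), (4, 4), (8, 1)]
print(pareto_frontier(passed))  # [(2, 8), (4, 4), (8, 1)]
```

Plotting these frontier points on width and depth axes gives a summary in the spirit of the one proposed in the abstract: each point marks the largest circuit shape the processor handled in that direction.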
Proceedings of the 6th European Conference on Computational Mechanics: Solids, Structures and Coupled Problems, ECCM 2018 and 7th European Conference on Computational Fluid Dynamics, ECFD 2018
Predictions from numerical hemodynamics are increasingly adopted and trusted in the diagnosis and treatment of cardiovascular disease. However, the predictive abilities of deterministic numerical models are limited due to the large number of possible sources of uncertainty including boundary conditions, vessel wall material properties, and patient specific model anatomy. Stochastic approaches have been proposed as a possible improvement, but are penalized by the large computational cost associated with repeated solutions of the underlying deterministic model. We propose a stochastic framework which leverages three cardiovascular model fidelities, i.e., three-, one- and zero-dimensional representations of cardiovascular blood flow. Specifically, we employ multilevel and multifidelity estimators from Sandia's open-source Dakota toolkit to reduce the variance in our estimated quantities of interest, while maintaining a reasonable computational cost. The performance of these estimators in terms of computational cost reductions is investigated for both global and local hemodynamic indicators.
Lecture Notes in Computational Science and Engineering
This article describes a parallel implementation of a two-level overlapping Schwarz preconditioner with the GDSW (Generalized Dryja–Smith–Widlund) coarse space described in previous work [12, 10, 15] into the Trilinos framework; cf. [16]. The software is a significant improvement of a previous implementation [12]; see Sec. 4 for results on the improved performance.
American Society of Mechanical Engineers, Fluids Engineering Division (Publication) FEDSM
Wear prediction is important in designing reliable machinery for the slurry industry. It usually relies on multi-phase computational fluid dynamics, which is accurate but computationally expensive. Each simulation run can take hours or days even on a high-performance computing platform. The high computational cost prohibits a large number of simulations in the process of design optimization. In contrast to physics-based simulations, data-driven approaches such as machine learning are capable of providing accurate wear predictions at a small fraction of the computational cost, if the models are trained properly. In this paper, a recently developed WearGP framework [1] is extended to predict the global wear quantities of interest by constructing Gaussian process surrogates. The effects of different operating conditions are investigated. The advantages of the WearGP framework are demonstrated by its high accuracy and low computational cost in predicting wear rates.
Lecture Notes in Computational Science and Engineering
A particle-mesh strategy is presented for scalar transport problems which provides diffusion-free advection, conserves mass locally (i.e. cellwise) and exhibits optimal convergence on arbitrary polyhedral meshes. This is achieved by expressing the convective field, naturally located on the Lagrangian particles, as a mesh quantity by formulating a dedicated particle-mesh projection based on a PDE-constrained optimization problem. Optimal convergence and local conservation are demonstrated for a benchmark test, and the application of the scheme to mass conservative density tracking is illustrated for the Rayleigh–Taylor instability.
SIAM Journal on Matrix Analysis and Applications
We propose a new algorithm for the fast solution of large, sparse, symmetric positive-definite linear systems, spaND (sparsified Nested Dissection). It is based on nested dissection, sparsification, and low-rank compression. After eliminating all interiors at a given level of the elimination tree, the algorithm sparsifies all separators corresponding to the interiors. This operation reduces the size of the separators by eliminating some degrees of freedom but without introducing any fill-in. This is done at the expense of a small and controllable approximation error. The result is an approximate factorization that can be used as an efficient preconditioner. We then perform several numerical experiments to evaluate this algorithm. We demonstrate that a version using orthogonal factorization and block-diagonal scaling takes fewer CG iterations to converge than previous similar algorithms on various kinds of problems. Furthermore, this algorithm is provably guaranteed to never break down and the matrix stays symmetric positive-definite throughout the process. We evaluate the algorithm on some large problems and show that it exhibits near-linear scaling. The factorization time is roughly O(N), and the number of iterations grows slowly with N.