Publications

A Portable SIMD Primitive Using Kokkos for Heterogeneous Architectures

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Sahasrabudhe, Damodar; Phipps, Eric T.; Rajamanickam, Sivasankaran R.; Berzins, Martin

As computer architectures rapidly evolve (e.g., those designed for exascale), multiple portability frameworks have been developed to avoid new architecture-specific development and tuning. However, portability frameworks depend on compilers for auto-vectorization and may lack support for explicit vectorization on heterogeneous platforms. Alternatively, programmers can use intrinsics-based primitives to achieve more efficient vectorization, but the lack of a GPU back-end for these primitives makes such code non-portable. The unified, portable Single Instruction Multiple Data (SIMD) primitive proposed in this work allows intrinsics-based vectorization on CPUs and many-core architectures such as Intel Knights Landing (KNL), and also facilitates Single Instruction Multiple Threads (SIMT) based execution on GPUs. This unified primitive, coupled with the Kokkos portability ecosystem, makes it possible to develop explicitly vectorized code that is portable across heterogeneous platforms. The new SIMD primitive is used on different architectures to test the performance boost against a hard-to-auto-vectorize baseline, to measure the overhead against an efficiently vectorized baseline, and to evaluate a new feature called the "logical vector length" (LVL). The SIMD primitive provides portability across CPUs and GPUs with no performance degradation observed experimentally.
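
As a rough illustration of what such a primitive looks like (a minimal C++ sketch with hypothetical type and function names, not the paper's actual API), the key idea is a fixed-width vector type whose operations could map to intrinsics on CPUs and to per-thread scalars on GPUs:

// Minimal sketch of a portable SIMD-style primitive (hypothetical names, not the paper's API).
// A CPU backend would specialize the operations with intrinsics; a GPU backend would set the
// width to 1 and let each thread of a warp own one lane, giving SIMT execution from one source.
#include <iostream>

template <typename T, int N>
struct Vec {
  T lane[N];  // one value per logical vector lane ("logical vector length")

  friend Vec operator+(const Vec& a, const Vec& b) {  // element-wise addition
    Vec r;
    for (int i = 0; i < N; ++i) r.lane[i] = a.lane[i] + b.lane[i];
    return r;
  }
};

// Kernels written against Vec<T, N> are oblivious to the backend.
template <typename T, int N>
void axpy(T a, const Vec<T, N>& x, Vec<T, N>& y) {
  for (int i = 0; i < N; ++i) y.lane[i] += a * x.lane[i];
}

int main() {
  Vec<double, 4> x{{1, 2, 3, 4}}, y{{0, 0, 0, 0}};
  axpy(2.0, x, y);                 // y becomes {2, 4, 6, 8}
  auto z = x + y;                  // z becomes {3, 6, 9, 12}
  std::cout << z.lane[0] << "\n";  // prints 3
  return 0;
}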

More Details

An algebraic sparsified nested dissection algorithm using low-rank approximations

SIAM Journal on Matrix Analysis and Applications

Cambier, Leopold; Boman, Erik G.; Rajamanickam, Sivasankaran R.; Tuminaro, Raymond S.; Darve, Eric

We propose a new algorithm for the fast solution of large, sparse, symmetric positive-definite linear systems, spaND (sparsified Nested Dissection). It is based on nested dissection, sparsification, and low-rank compression. After eliminating all interiors at a given level of the elimination tree, the algorithm sparsifies all separators corresponding to the interiors. This operation reduces the size of the separators by eliminating some degrees of freedom without introducing any fill-in. This is done at the expense of a small and controllable approximation error. The result is an approximate factorization that can be used as an efficient preconditioner. We then perform several numerical experiments to evaluate this algorithm. We demonstrate that a version using orthogonal factorization and block-diagonal scaling takes fewer CG iterations to converge than previous similar algorithms on various kinds of problems. Furthermore, this algorithm is provably guaranteed to never break down, and the matrix stays symmetric positive-definite throughout the process. We evaluate the algorithm on some large problems and show that it exhibits near-linear scaling. The factorization time is roughly O(N), and the number of iterations grows slowly with N.
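
A rough sketch of the sparsification step described above (generic notation, not the paper's): let $A_{sn}$ be the coupling block between a separator $s$ and its remaining neighbors after the interiors at the current level have been eliminated. A rank-revealing factorization $A_{sn} \approx Q_1 W$ with orthonormal $Q_1$, and a change of basis $Q = [Q_1 \; Q_2]$ on the separator, give

\[ Q^T A_{sn} \approx \begin{pmatrix} W \\ 0 \end{pmatrix}, \]

so the rotated degrees of freedom associated with $Q_2$ no longer couple to the rest of the matrix and can be eliminated immediately without creating fill; the truncation tolerance of the factorization is the small, controllable approximation error mentioned above.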

More Details

Optimization Based Particle-Mesh Algorithm for High-Order and Conservative Scalar Transport

Lecture Notes in Computational Science and Engineering

Maljaars, Jakob M.; Labeur, Robert J.; Trask, Nathaniel A.; Sulsky, Deborah L.

A particle-mesh strategy is presented for scalar transport problems which provides diffusion-free advection, conserves mass locally (i.e., cellwise), and exhibits optimal convergence on arbitrary polyhedral meshes. This is achieved by expressing the convective field, naturally located on the Lagrangian particles, as a mesh quantity through a dedicated particle-mesh projection formulated as a PDE-constrained optimization problem. Optimal convergence and local conservation are demonstrated for a benchmark test, and the application of the scheme to mass-conservative density tracking is illustrated for the Rayleigh–Taylor instability.
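
A schematic of the PDE-constrained projection idea (notation and constraint simplified here; not the paper's exact formulation): the mesh field $\psi_h$ is obtained from the particle values $\psi_p$ by solving

\[ \min_{\psi_h} \; \tfrac{1}{2} \sum_p \big(\psi_h(x_p) - \psi_p\big)^2 \quad \text{subject to} \quad c_K(\psi_h) = 0 \;\; \text{for every cell } K, \]

where $c_K$ is a cellwise conservation constraint enforced with Lagrange multipliers; the constraint is what delivers local mass conservation, while the least-squares objective keeps the mesh field close to the Lagrangian particle data.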

More Details

Space-Efficient Reed-Solomon Encoding to Detect and Correct Pointer Corruption

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Levy, Scott L.; Ferreira, Kurt B.

Concern about memory errors has been widespread in high-performance computing (HPC) for decades. These concerns have led to significant research on detecting and correcting memory errors to improve performance and provide strong guarantees about the correctness of the memory contents of scientific simulations. However, power concerns and changes in memory architectures threaten the viability of current approaches to protecting memory (e.g., Chipkill). Returning to less protective error-correcting codes (ECC), e.g., single-error correction, double-error detection (SECDED), may increase the frequency of memory errors, including silent data corruption (SDC). SDC has the potential to silently cause applications to produce incorrect results and mislead domain scientists. We propose an approach that exploits unnecessary bits in pointer values to encode the pointer with a Reed-Solomon code. Encoding the pointer provides strong capabilities for correcting and detecting corruption of pointer values. In this paper, we provide a detailed description of how we can exploit unnecessary pointer bits to store Reed-Solomon parity symbols. We evaluate the performance impacts of this approach and examine its effectiveness against corruption. Our results demonstrate that encoding and decoding are fast (less than 45 per event) and that the protection they provide is robust (the rate of miscorrection is less than 5% even for significant corruption). The data and analysis presented in this paper demonstrate the power of our approach: it is fast, tunable, requires no additional per-pointer storage resources, and provides robust protection against pointer corruption.
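
A minimal C++ sketch of the storage idea only (hypothetical names; the Reed-Solomon arithmetic itself is omitted): on common 64-bit platforms a user-space pointer occupies at most 48 significant bits, so the upper 16 bits can carry parity symbols with no extra per-pointer storage.

// Sketch of stashing parity symbols in the unused upper bits of a 64-bit pointer.
// Assumption: 48-bit user-space addresses with zeroed upper bits (typical on x86-64 Linux).
#include <cstdint>
#include <cassert>

constexpr int kAddrBits = 48;
constexpr std::uint64_t kAddrMask = (std::uint64_t(1) << kAddrBits) - 1;

std::uint64_t pack(void* p, std::uint16_t parity) {
  auto addr = reinterpret_cast<std::uint64_t>(p);
  assert((addr & ~kAddrMask) == 0);                  // pointer must fit in 48 bits
  return addr | (std::uint64_t(parity) << kAddrBits);
}

void* unpack(std::uint64_t word, std::uint16_t& parity_out) {
  parity_out = static_cast<std::uint16_t>(word >> kAddrBits);
  return reinterpret_cast<void*>(word & kAddrMask);  // strip parity before dereferencing
}

int main() {
  int value = 7;
  std::uint16_t parity = 0x3FA;                      // placeholder parity symbols
  std::uint64_t word = pack(&value, parity);
  std::uint16_t recovered;
  int* p = static_cast<int*>(unpack(word, recovered));
  return (*p == 7 && recovered == parity) ? 0 : 1;
}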

More Details

Towards an integrated and efficient framework for leveraging reduced order models for multifidelity uncertainty quantification

AIAA Scitech 2020 Forum

Blonigan, Patrick J.; Geraci, Gianluca G.; Rizzi, Francesco N.; Eldred, Michael S.

Truly predictive numerical simulations can only be obtained by performing uncertainty quantification. However, many realistic engineering applications require extremely complex and computationally expensive high-fidelity numerical simulations for their accurate performance characterization. Very often the combination of complex physical models and extreme operating conditions can easily lead to hundreds of uncertain parameters that need to be propagated through high-fidelity codes. Under these circumstances, a single-fidelity uncertainty quantification approach, i.e., a workflow that only uses high-fidelity simulations, is infeasible due to its prohibitive overall computational cost. To overcome this difficulty, multifidelity strategies have emerged and gained popularity in recent years. Their core idea is to combine simulations with varying levels of fidelity/accuracy in order to obtain estimators or surrogates that can yield the same accuracy as their single-fidelity counterparts at a much lower computational cost. This goal is usually accomplished by defining a priori a sequence of discretization levels or physical modeling assumptions that can be used to decrease the complexity of a numerical model realization and thus its computational cost. Less attention has been dedicated to low-fidelity models that can be built directly from a small number of available high-fidelity simulations. In this work we focus our attention on reduced order models (ROMs). Our main goal is to investigate the combination of multifidelity uncertainty quantification and ROMs in order to evaluate whether an efficient framework for propagating uncertainties through expensive numerical codes can be obtained. We focus on sampling-based multifidelity approaches, such as the multifidelity control variate, and we consider several scenarios for a numerical test problem, namely the Kuramoto-Sivashinsky equation, for which the efficiency of the multifidelity-ROM estimator is compared to the standard (single-fidelity) Monte Carlo approach.
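
A sketch of the control-variate estimator referred to above (standard form, not specific to this paper): with a high-fidelity model $Q_{HF}$ and a cheap low-fidelity surrogate $Q_{LF}$ (here, a ROM),

\[ \hat{Q}^{MF} = \hat{Q}_{HF}^{(N)} + \alpha\,\big(\hat{Q}_{LF}^{(M)} - \hat{Q}_{LF}^{(N)}\big), \]

where $\hat{Q}^{(N)}$ denotes a Monte Carlo mean over $N$ shared samples, $M \gg N$ additional samples are run only through the cheap model, and $\alpha$ is chosen from the estimated correlation between the two models to minimize the estimator variance; the better the ROM correlates with the high-fidelity code, the larger the savings over single-fidelity Monte Carlo.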

More Details

Krylov Smoothing for Fully-Coupled AMG Preconditioners for VMS Resistive MHD

Lecture Notes in Computational Science and Engineering

Lin, Paul L.; Shadid, John N.; Tsuji, Paul H.

This study explores the use of a Krylov iterative method (GMRES) as a smoother for an algebraic multigrid (AMG) preconditioned Newton–Krylov iterative solution approach for a fully-implicit variational multiscale (VMS) finite element (FE) resistive magnetohydrodynamics (MHD) formulation. The efficiency of this approach depends critically on the scalability and performance of the AMG preconditioner for the linear solves, and the performance of the smoothers plays an essential role. Krylov smoothers are considered in an attempt to reduce the time and memory requirements of existing robust smoothers based on additive Schwarz domain decomposition (DD) with incomplete LU factorization solves on each subdomain. This brief study presents three time-dependent resistive MHD test cases to evaluate the method. The results demonstrate that the GMRES smoother can be faster, due to a decrease in the preconditioner setup time and a reduction in outer GMRESR solver iterations, and requires less memory (typically 35% less for the global GMRES smoother) than the DD ILU smoother.
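
Schematically, the role of the smoother in a two-grid cycle is (standard notation, not the paper's):

\[ x \leftarrow S_m(A, b, x), \qquad x \leftarrow x + P\,A_c^{-1} R\,(b - A x), \qquad x \leftarrow S_m(A, b, x), \]

where $S_m$ here denotes $m$ iterations of GMRES applied as pre- and post-smoothing, and $P$, $R$, $A_c$ are the AMG prolongation, restriction, and coarse operator. Because a Krylov smoother is a nonstationary operation, the preconditioner changes from one outer iteration to the next, so an outer method that tolerates a variable preconditioner (such as FGMRES or GMRESR) is required.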

More Details

Lightweight Software Process Improvement Using Productivity and Sustainability Improvement Planning (PSIP)

Communications in Computer and Information Science

Milewicz, Reed M.; Heroux, Michael A.; Gonsiorowski, Elsa; Gupta, Rinku; Moulton, J.D.; Watson, Gregory R.; Willenbring, James M.; Zamora, Richard J.; Raybourn, Elaine M.

Productivity and Sustainability Improvement Planning (PSIP) is a lightweight, iterative workflow that allows software development teams to identify development bottlenecks and track progress to overcome them. In this paper, we present an overview of PSIP and how it compares to other software process improvement (SPI) methodologies, and provide two case studies that describe how the use of PSIP led to successful improvements in team effectiveness and efficiency.

More Details

GMLS-Nets: A machine learning framework for unstructured data

CEUR Workshop Proceedings

Trask, Nathaniel A.; Patel, Ravi G.; Gross, Ben J.; Atzberger, Paul J.

Data fields sampled on irregularly spaced points arise in many science and engineering applications. For regular grids, Convolutional Neural Networks (CNNs) gain benefits from weight sharing and invariances. We generalize CNNs by introducing methods for data on unstructured point clouds using Generalized Moving Least Squares (GMLS). GMLS is a nonparametric meshfree technique for estimating bounded linear functionals from scattered data, and has emerged as an effective technique for solving partial differential equations (PDEs). By parameterizing the GMLS estimator, we obtain learning methods for linear and nonlinear operators with unstructured stencils. The requisite calculations are local, embarrassingly parallelizable, and supported by a rigorous approximation theory. We show how the framework may be used on unstructured physical data sets to perform operator regression, develop predictive dynamical models, and obtain feature extractors for engineering quantities of interest. The results show the promise of these architectures as foundations for data-driven model development in scientific machine learning applications.
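
A sketch of the GMLS building block that the networks parameterize (standard formulation): around each target point $x_i$, a local polynomial fit is computed from neighboring samples by weighted least squares,

\[ p_i^* = \arg\min_{p \in \pi_m} \sum_j \big(u(x_j) - p(x_j)\big)^2\, w(\lVert x_j - x_i \rVert), \]

and a bounded linear functional (a value, a derivative, an integral, ...) is then estimated as $\tau(u) \approx \tau(p_i^*)$. GMLS-Nets, as described above, make the operator learnable by parameterizing how these locally recovered coefficients are mapped to the output, playing the role that a convolution stencil plays on a regular grid.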

More Details

Regular sensitivity computation avoiding chaotic effects in particle-in-cell plasma methods

Journal of Computational Physics

Chung, Seung W.; Bond, Stephen D.; Cyr, Eric C.; Freund, Jonathan B.

Particle-in-cell (PIC) simulation methods are attractive for representing species distribution functions in plasmas. However, as a model, they introduce uncertain parameters, and for quantifying their prediction uncertainty it is useful to be able to assess the sensitivity of a quantity-of-interest (QoI) to these parameters. Such sensitivity information is likewise useful for optimization. However, computing sensitivity for PIC methods is challenging due to the chaotic particle dynamics, and sensitivity techniques remain underdeveloped compared to those for Eulerian discretizations. This challenge is examined from a dual particle–continuum perspective that motivates a new sensitivity discretization. Two routes to sensitivity computation are presented and compared: a direct fully-Lagrangian particle-exact approach provides sensitivities of each particle trajectory, and a new particle-pdf discretization is formulated from a continuum perspective but discretized with particles to take advantage of the same type of Lagrangian particle description leveraged by PIC methods. Since the sensitivity particles in this approach are only indirectly linked to the plasma-PIC particles, they can be positioned and weighted independently for efficiency and accuracy. The corresponding numerical algorithms are presented in mathematical detail. The advantage of the particle-pdf approach in avoiding the spurious chaotic sensitivity of the particle-exact approach is demonstrated for Debye shielding and sheath configurations. In essence, the continuum perspective makes implicit the distinctness of the particles, which circumvents the Lyapunov instability of the N-body PIC system. The cost of the particle-pdf approach is comparable to the baseline PIC simulation.
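
A sketch of the continuum view that motivates the particle-pdf discretization (standard Vlasov notation, not the paper's exact system): the distribution $f(x,v,t;\alpha)$ satisfies

\[ \partial_t f + v\cdot\nabla_x f + \frac{q}{m}\,E\cdot\nabla_v f = 0, \]

and differentiating with respect to a parameter $\alpha$ gives a linear transport equation for the sensitivity $g = \partial f/\partial\alpha$,

\[ \partial_t g + v\cdot\nabla_x g + \frac{q}{m}\,E\cdot\nabla_v g = -\frac{q}{m}\,\frac{\partial E}{\partial\alpha}\cdot\nabla_v f, \]

with $\partial E/\partial\alpha$ supplied by the field solve. Because $g$ is itself a phase-space density that can be sampled with its own particles, rather than a derivative of individual chaotic trajectories, this is the sense in which the continuum perspective circumvents the Lyapunov instability mentioned above.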

More Details

Hyper-Differential Sensitivity Analysis of Uncertain Parameters in PDE-Constrained Optimization

International Journal for Uncertainty Quantification

van Bloemen Waanders, Bart G.

Many problems in engineering and the sciences require the solution of large-scale optimization problems constrained by partial differential equations (PDEs). Though PDE-constrained optimization is itself challenging, most applications pose additional complexity, namely, uncertain parameters in the PDEs. Uncertainty quantification (UQ) is necessary to characterize, prioritize, and study the influence of these uncertain parameters. Sensitivity analysis, a classical tool in UQ, is frequently used to study the sensitivity of a model to uncertain parameters. In this article, we introduce "hyper-differential sensitivity analysis," which considers the sensitivity of the solution of a PDE-constrained optimization problem to uncertain parameters. Our approach is a goal-oriented analysis which may be viewed as a tool to complement other UQ methods in the service of decision making and robust design. We formally define hyper-differential sensitivity indices and highlight their relationship to the existing optimization and sensitivity analysis literature. Assuming the presence of low-rank structure in the parameter space, computational efficiency is achieved by leveraging a generalized singular value decomposition in conjunction with a randomized solver, which converts the computational bottleneck of the algorithm into an embarrassingly parallel loop. Two multi-physics examples, consisting of nonlinear steady-state control and transient linear inversion, demonstrate efficient identification of the uncertain parameters that have the greatest influence on the optimal solution.
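
A sketch of the construction behind these indices (generic notation, not the article's exact formulation): write the problem in reduced form as minimizing $\hat{J}(z;\theta)$ over the optimization variable $z$, with uncertain parameters $\theta$, so the optimizer $z^*(\theta)$ satisfies $\nabla_z \hat{J}(z^*;\theta) = 0$. The implicit function theorem then gives the sensitivity operator

\[ \frac{dz^*}{d\theta} = -\big(\nabla_{zz}\hat{J}\big)^{-1}\,\nabla_{z\theta}\hat{J}, \]

and hyper-differential sensitivity indices are read off from a (generalized) singular value decomposition of this operator; its dominant singular vectors identify the parameter directions with the greatest influence on the optimal solution, and the randomized solver mentioned above is what makes forming that decomposition tractable at scale.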

More Details

Multifidelity uncertainty propagation for cardiovascular hemodynamics

Proceedings of the 6th European Conference on Computational Mechanics: Solids, Structures and Coupled Problems, ECCM 2018 and 7th European Conference on Computational Fluid Dynamics, ECFD 2018

Schiavazzi, Daniele E.; Fleeter, Casey M.; Geraci, Gianluca G.; Marsden, Alison L.

Predictions from numerical hemodynamics are increasingly adopted and trusted in the diagnosis and treatment of cardiovascular disease. However, the predictive abilities of deterministic numerical models are limited due to the large number of possible sources of uncertainty including boundary conditions, vessel wall material properties, and patient specific model anatomy. Stochastic approaches have been proposed as a possible improvement, but are penalized by the large computational cost associated with repeated solutions of the underlying deterministic model. We propose a stochastic framework which leverages three cardiovascular model fidelities, i.e., three-, one- and zero-dimensional representations of cardiovascular blood flow. Specifically, we employ multilevel and multifidelity estimators from Sandia's open-source Dakota toolkit to reduce the variance in our estimated quantities of interest, while maintaining a reasonable computational cost. The performance of these estimators in terms of computational cost reductions is investigated for both global and local hemodynamic indicators.
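
A sketch of the multilevel identity that such estimators exploit (standard form, not specific to this paper): ordering the model fidelities from cheapest to most expensive as $Q_0, \ldots, Q_L$ (here roughly the zero-, one-, and three-dimensional models),

\[ \mathbb{E}[Q_L] = \mathbb{E}[Q_0] + \sum_{\ell=1}^{L} \mathbb{E}[Q_\ell - Q_{\ell-1}], \]

and each term is estimated with its own Monte Carlo sample; most samples are spent on the cheap lumped-parameter and one-dimensional models, while only a few paired samples are needed for the low-variance difference terms involving the three-dimensional model, which is where the variance reduction at fixed cost comes from.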

More Details

FROSch: A Fast And Robust Overlapping Schwarz Domain Decomposition Preconditioner Based on Xpetra in Trilinos

Lecture Notes in Computational Science and Engineering

Heinlein, Alexander; Klawonn, Axel; Rajamanickam, Sivasankaran R.; Rheinbach, Oliver

This article describes a parallel implementation, within the Trilinos framework [16], of a two-level overlapping Schwarz preconditioner with the GDSW (Generalized Dryja–Smith–Widlund) coarse space described in previous work [12, 10, 15]. The software is a significant improvement over a previous implementation [12]; see Sec. 4 for results on the improved performance.
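
The preconditioner has the standard two-level additive Schwarz form (sketched here in generic notation; see the cited references for the precise GDSW construction):

\[ M^{-1}_{\mathrm{GDSW}} = \Phi\,\big(\Phi^T A\,\Phi\big)^{-1}\Phi^T + \sum_{i=1}^{N} R_i^T A_i^{-1} R_i, \]

where $R_i$ restricts to the $i$-th overlapping subdomain, $A_i = R_i A R_i^T$ is the corresponding local matrix, and the columns of $\Phi$ span the GDSW coarse space built from energy-minimizing extensions of interface values; the coarse problem is what keeps iteration counts bounded as the number of subdomains grows.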

More Details

Enabling Scalable Multifluid Plasma Simulations Through Block Preconditioning

Lecture Notes in Computational Science and Engineering

Phillips, Edward G.; Shadid, John N.; Cyr, Eric C.; Miller, Sean M.

Recent work has demonstrated that block preconditioning can scalably accelerate the performance of iterative solvers applied to linear systems arising in implicit multiphysics PDE simulations. The idea of block preconditioning is to decompose the system matrix into physical sub-blocks and apply individual specialized scalable solvers to each sub-block. It can be advantageous to block into simpler segregated physics systems or to block by discretization type. This strategy is particularly amenable to multiphysics systems in which existing solvers, such as multilevel methods, can be leveraged for component physics, and to problems with disparate discretizations in which scalable monolithic solvers are rare. This work extends our recent work on scalable block preconditioning methods for structure-preserving discretizations of the Maxwell equations, and our previous work on MHD system solvers, to the context of multifluid electromagnetic plasma systems. We discuss how a block preconditioner can address both the disparate discretizations and the strongly coupled off-diagonal physics that produces fast time scales (e.g., plasma and cyclotron frequencies). We propose a block preconditioner for plasma systems that allows reuse of existing multigrid solvers for different degrees of freedom while capturing important couplings, and we demonstrate the algorithmic scalability of this approach at time scales of interest.
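
A sketch of the generic block-factorization idea (not the specific preconditioner proposed here): for a two-field system

\[ \begin{pmatrix} A & B \\ C & D \end{pmatrix} \begin{pmatrix} u \\ p \end{pmatrix} = \begin{pmatrix} f \\ g \end{pmatrix}, \]

an approximate block LDU factorization leads to a preconditioner whose application requires only a solve with $A$ and a solve with (an approximation of) the Schur complement $S = D - C A^{-1} B$; each sub-solve can then reuse an existing scalable method, such as AMG, appropriate to that field's discretization, while the Schur complement is where the strong off-diagonal coupling and fast time scales are captured.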

More Details