Publications

Results 976–1000 of 9,998

Low-Communication Asynchronous Distributed Generalized Canonical Polyadic Tensor Decomposition

2021 IEEE High Performance Extreme Computing Conference, HPEC 2021

Lewis, Cannada L.; Phipps, Eric T.

In this work, we show that reduced-communication algorithms for distributed stochastic gradient descent improve the time per epoch and strong scaling for the Generalized Canonical Polyadic (GCP) tensor decomposition, but at a cost: achieving convergence becomes more difficult. The MPI-based implementation shows that, while one-sided algorithms offer a path to asynchronous execution, the performance benefits of an optimized allreduce are difficult to beat.

FROSch Preconditioners for Land Ice Simulations of Greenland and Antarctica

Heinlein, Alexander; Perego, Mauro P.; Rajamanickam, Sivasankaran R.

Numerical simulations of Greenland and Antarctic ice sheets involve the solution of large-scale, highly nonlinear systems of equations on complex shallow geometries. This work is concerned with the construction of Schwarz preconditioners for the solution of the associated tangent problems, which are challenging for solvers mainly because of the strong anisotropy of the meshes and wildly changing boundary conditions that can lead to poorly constrained problems on large portions of the domain. Here, two-level GDSW (Generalized Dryja–Smith–Widlund) type Schwarz preconditioners are applied to different land ice problems, namely a velocity problem, a temperature problem, and the coupling of these two problems. We employ the MPI-parallel implementation of multi-level Schwarz preconditioners provided by the package FROSch (Fast and Robust Schwarz) from the Trilinos library. The strength of the proposed approach is that it yields out-of-the-box scalable and robust preconditioners for the single-physics problems. To our knowledge, this is the first time two-level Schwarz preconditioners have been applied to the ice sheet problem and a scalable preconditioner has been used for the coupled problem. The preconditioner for the coupled problem differs from previous monolithic GDSW preconditioners in that decoupled extension operators are used to compute the values in the interior of the subdomains. Several approaches for improving performance, such as reuse strategies and shared-memory OpenMP parallelization, are explored as well. In our numerical study we consider both uniform meshes of varying resolution for the Antarctic ice sheet and nonuniform meshes for the Greenland ice sheet. We present several weak and strong scaling studies confirming the robustness of the approach and the parallel scalability of the FROSch implementation.
Among the highlights of the numerical results are a weak scaling study for up to 32K processor cores (8K MPI ranks with 4 OpenMP threads each) and 566M degrees of freedom for the velocity problem, as well as a strong scaling study for up to 4K processor cores (and MPI ranks) and 68M degrees of freedom for the coupled problem.

Exploration of multifidelity UQ sampling strategies for computer network applications

International Journal for Uncertainty Quantification

Geraci, Gianluca G.; Crussell, Jonathan C.; Swiler, Laura P.; Debusschere, Bert D.

Network modeling is a powerful tool for the rapid analysis of complex systems that can be challenging to study directly through physical testing. Two approaches are considered: emulation and simulation. The former runs real software on virtualized hardware, while the latter mimics the behavior of network components and their interactions in software. Although emulation provides an accurate representation of physical networks, this approach alone cannot guarantee the characterization of the system under realistic operating conditions. Operating conditions for physical networks are often characterized by intrinsic variability (payload size, packet latency, etc.) or a lack of precise knowledge regarding the network configuration (bandwidth, delays, etc.); therefore, uncertainty quantification (UQ) strategies should also be employed. UQ strategies require multiple evaluations of the system, and the number of evaluations roughly increases with the problem dimensionality, i.e., the number of uncertain parameters. It follows that a typical UQ workflow for network modeling based on emulation can easily become unattainable due to its prohibitive computational cost. In this paper, a multifidelity sampling approach is discussed and applied to network modeling problems. The main idea is to optimally fuse information coming from simulations, which are a low-fidelity version of the emulation problem of interest, in order to decrease the estimator variance. Reducing the estimator variance in a sampling approach usually yields more reliable statistics and therefore a more reliable system characterization. Several network problems of increasing difficulty are presented. For each of them, the performance of the multifidelity estimator is compared with that of its single-fidelity counterpart, namely Monte Carlo (MC) sampling.
For all the test problems studied in this work, the multifidelity estimator demonstrated increased efficiency relative to MC.

Error estimates for the optimal control of a parabolic fractional PDE

SIAM Journal on Numerical Analysis

Glusa, Christian A.; Otarola, Enrique

We consider the integral definition of the fractional Laplacian and analyze a linear-quadratic optimal control problem for the so-called fractional heat equation; control constraints are also considered. We derive existence and uniqueness results, first-order optimality conditions, and regularity estimates for the optimal variables. To discretize the state equation, we propose a fully discrete scheme that relies on an implicit finite difference discretization in time combined with a piecewise linear finite element discretization in space. We derive stability results and a novel L²(0, T; L²(Ω)) a priori error estimate. On the basis of the aforementioned solution technique, we propose a fully discrete scheme for our optimal control problem that discretizes the control variable with piecewise constant functions, and we derive a priori error estimates for it. We illustrate the theory with one- and two-dimensional numerical experiments.

An asymptotically compatible approach for Neumann-type boundary condition on nonlocal problems

ESAIM: Mathematical Modelling and Numerical Analysis

You, Huaiqian; Lu, Xin Y.; Trask, Nathaniel A.; Yu, Yue

In this paper we consider 2D nonlocal diffusion models with a finite nonlocal horizon parameter δ characterizing the range of nonlocal interactions, and consider the treatment of Neumann-like boundary conditions, which have proven challenging for discretizations of nonlocal models. We propose a new generalization of classical local Neumann conditions by converting the local flux to a correction term in the nonlocal model, which provides an estimate for the nonlocal interactions of each point with points outside the domain. While existing 2D nonlocal flux boundary conditions have been shown to exhibit at most first-order convergence to the local counterpart as δ → 0, the proposed Neumann-type boundary formulation recovers the local case as O(δ²) in the L∞(ω) norm, which is optimal considering the O(δ²) convergence of the nonlocal equation to its local limit away from the boundary. We analyze the application of this new boundary treatment to the nonlocal diffusion problem, and present conditions under which the solution of the nonlocal boundary value problem converges to the solution of the corresponding local Neumann problem as the horizon is reduced. To demonstrate the applicability of this nonlocal flux boundary condition to more complicated scenarios, we extend the approach to less regular domains, numerically verifying that we preserve second-order convergence for non-convex domains with corners. Based on the new formulation for nonlocal boundary conditions, we develop an asymptotically compatible meshfree discretization, obtaining a solution to the nonlocal diffusion equation with mixed boundary conditions that converges at rate O(δ²).

AC-Optimal Power Flow Solutions with Security Constraints from Deep Neural Network Models

Computer Aided Chemical Engineering

Kilwein, Zachary; Boukouvala, Fani; Laird, Carl D.; Castillo, Anya; Blakely, Logan; Eydenberg, Michael S.; Jalving, Jordan H.; Batsch-Smith, Lisa

In power grid operation, optimal power flow (OPF) problems are solved several times per day to find economically optimal generator setpoints that balance given load demands. Ideally, we seek an optimal solution that is also “N-1 secure”, meaning the system can absorb contingency events such as transmission line or generator failure without loss of service. Current practice is to solve the OPF problem and then check a subset of contingencies against heuristic values, resulting in, at best, suboptimal solutions. Unfortunately, online solution of the OPF problem including the full set of N-1 contingencies (i.e., a two-stage stochastic programming formulation) is intractable for even modest-sized electrical grids. To address this challenge, this work presents an efficient method to embed N-1 security constraints into the solution of the OPF by using neural network (NN) models to represent the security boundary. Our approach introduces a novel sampling technique, as well as a tunable parameter that allows operators to balance the conservativeness of the security model within the OPF problem. Our results show that, using nonlinear programming (NLP) formulations with embedded NN models, we are able to solve contingency formulations to local optimality for larger grids than reported in the literature. Solutions found with the NN constraint require marginally more computational time but are more secure against contingency events.

Spiking Neural Streaming Binary Arithmetic

Proceedings - 2021 International Conference on Rebooting Computing, ICRC 2021

Aimone, James B.; Hill, Aaron J.; Severa, William M.; Vineyard, Craig M.

Boolean functions and binary arithmetic operations are central to standard computing paradigms. Accordingly, many advances in computing have focused on how to make these operations more efficient, as well as on exploring what they can compute. To best leverage the advantages of novel computing paradigms, it is important to consider what unique computing approaches they offer. However, for any special-purpose co-processor, Boolean functions and binary arithmetic operations are useful for, among other things, avoiding unnecessary I/O on and off the co-processor by pre- and post-processing data on-device. This is especially true for spiking neuromorphic architectures, where these basic operations are not fundamental low-level operations and instead require specific implementation. Here we discuss the implications of an advantageous streaming binary encoding method, as well as a handful of circuits designed to compute elementary Boolean and binary operations exactly.
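
The flavor of streaming binary arithmetic can be sketched in a conventional language: operands arrive least-significant bit first, one bit per timestep, and a single recurrent carry state is all an adder needs. The sketch below simulates this with simple threshold units standing in for spiking neurons; the function names and the two-unit construction are illustrative assumptions, not the circuits from the paper.

```python
def threshold_unit(total, theta):
    """Stand-in for a spiking neuron: fires (1) iff its input reaches theta."""
    return 1 if total >= theta else 0


def to_stream(n, width):
    """Encode a non-negative int as an LSB-first bit stream, one bit per step."""
    return [(n >> t) & 1 for t in range(width)]


def from_stream(bits):
    """Decode an LSB-first bit stream back to an int."""
    return sum(b << t for t, b in enumerate(bits))


def streaming_add(a_bits, b_bits):
    """Bit-serial addition of two LSB-first streams using threshold units."""
    carry = 0  # recurrent state fed back at the next timestep
    out = []
    for t in range(max(len(a_bits), len(b_bits)) + 1):  # extra step flushes carry
        a = a_bits[t] if t < len(a_bits) else 0
        b = b_bits[t] if t < len(b_bits) else 0
        total = a + b + carry
        new_carry = threshold_unit(total, 2)  # fires when at least 2 of 3 inputs are 1
        # The carry unit inhibits the sum unit with weight -2, leaving total mod 2.
        out.append(threshold_unit(total - 2 * new_carry, 1))
        carry = new_carry
    return out
```

For example, streaming 13 and 11 as 4-bit streams produces an output stream that decodes to 24. Each output bit depends only on the current input bits and one bit of state, which is what makes the encoding suitable for streaming.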

Dakota-NAERM Integration

Swiler, Laura P.; Newman, Sarah; Staid, Andrea S.; Barrett, Emily

This report presents the results of a collaborative effort under the Verification, Validation, and Uncertainty Quantification (VVUQ) thrust area of the North American Energy Resilience Model (NAERM) program. The goal of the effort described in this report was to integrate the Dakota software with the NAERM software framework to demonstrate sensitivity analysis of a co-simulation for NAERM.

Randomized sketching algorithms for low-memory dynamic optimization

SIAM Journal on Optimization

Kouri, Drew P.; Muthukumar, Ramchandran; Udell, Madeleine

This paper develops a novel limited-memory method to solve dynamic optimization problems. The memory requirements for such problems often present a major obstacle, particularly for problems with PDE constraints such as optimal flow control, full waveform inversion, and optical tomography. In these problems, PDE constraints uniquely determine the state of a physical system for a given control; the goal is to find the value of the control that minimizes an objective. While the control is often low dimensional, the state is typically more expensive to store. This paper suggests using randomized matrix approximation to compress the state as it is generated and shows how to use the compressed state to reliably solve the original dynamic optimization problem. Concretely, the compressed state is used to compute approximate gradients and to apply the Hessian to vectors. The approximation error in these quantities is controlled by the target rank of the sketch. This approximate first- and second-order information can readily be used in any optimization algorithm. As an example, we develop a sketched trust-region method that adaptively chooses the target rank using a posteriori error information and provably converges to a stationary point of the original problem. Numerical experiments with the sketched trust-region method show promising performance on challenging problems such as the optimal control of an advection-reaction-diffusion equation and the optimal control of fluid flow past a cylinder.
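
A minimal sketch of compressing the state as it is generated, assuming a two-sided randomized sketch in the style of the streaming low-rank approximation of Tropp et al. This is one possible choice of randomized matrix approximation; the paper's exact sketch and its use inside the trust-region method are not reproduced. Each state vector u_t is folded into two small sketches and can then be discarded, and a rank-k approximation of the whole trajectory is recovered afterwards. The class name and the sketch sizes k and l are hypothetical.

```python
import numpy as np


class StateSketch:
    """Streaming low-rank sketch of a state trajectory U = [u_0, ..., u_{T-1}].

    Stores Y = U @ Omega (n x k) and W = Psi @ U (l x T) instead of U (n x T).
    """

    def __init__(self, n_state, n_steps, k, l, seed=0):
        assert l >= k, "co-range sketch size l should be at least k"
        rng = np.random.default_rng(seed)
        self.Omega = rng.standard_normal((n_steps, k))  # random test matrix (time)
        self.Psi = rng.standard_normal((l, n_state))    # random test matrix (space)
        self.Y = np.zeros((n_state, k))
        self.W = np.zeros((l, n_steps))

    def update(self, t, u_t):
        """Fold in the state at time step t; u_t can be discarded afterwards."""
        self.Y += np.outer(u_t, self.Omega[t])  # accumulates U @ Omega column by column
        self.W[:, t] = self.Psi @ u_t

    def reconstruct(self):
        """Return the low-rank approximation Q @ X of the full trajectory U."""
        Q, _ = np.linalg.qr(self.Y)  # orthonormal basis capturing range(U)
        # Least-squares solve (Psi Q) X = W, i.e. X = pinv(Psi Q) W.
        X, *_ = np.linalg.lstsq(self.Psi @ Q, self.W, rcond=None)
        return Q @ X
```

If the trajectory has rank at most k (and l ≥ k), the reconstruction is exact up to rounding; otherwise the error is governed by the chosen target rank. Storage drops from n·T numbers to n·k + l·T.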

Polynomial preconditioned Arnoldi with stability control

SIAM Journal on Scientific Computing

Embree, Mark; Loe, Jennifer A.; Morgan, Ronald

Polynomial preconditioning can improve the convergence of the Arnoldi method for computing eigenvalues. Such preconditioning significantly reduces the cost of orthogonalization; for difficult problems, it can also reduce the number of matrix-vector products. Parallel computations can particularly benefit from the reduction of communication-intensive operations. The GMRES algorithm provides a simple and effective way of generating the preconditioning polynomial. For some problems, high-degree polynomials are especially effective, but they can lead to stability problems that must be mitigated. A two-level "double polynomial preconditioning" strategy provides an effective way to generate high-degree preconditioners.
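
To make the idea concrete, the sketch below computes a GMRES-type polynomial p by minimizing ||b − A p(A) b|| over polynomials of a given degree and applies it with Horner's rule. It deliberately uses the monomial (power) basis, which is only numerically reasonable for low degree; the stability control and double polynomial preconditioning described above, which target exactly the high-degree case, are not reproduced. The function names are hypothetical.

```python
import numpy as np


def gmres_poly(A, b, degree):
    """Coefficients c of p(z) = c[0] + c[1] z + ... + c[degree-1] z**(degree-1)
    minimizing ||b - A p(A) b||_2 (the GMRES residual after `degree` steps)."""
    K = np.empty((b.size, degree))
    v = b.astype(float)
    for j in range(degree):
        K[:, j] = v  # Krylov vectors b, Ab, A^2 b, ... (monomial basis)
        v = A @ v
    c, *_ = np.linalg.lstsq(A @ K, b, rcond=None)
    return c


def apply_poly(A, c, v):
    """Evaluate p(A) v with Horner's rule, using only matrix-vector products."""
    y = c[-1] * v
    for coef in reversed(c[:-1]):
        y = A @ y + coef * v
    return y
```

An Arnoldi iteration would then be run on the operator v ↦ A p(A) v, which has the same eigenvectors as A, trading extra matrix-vector products per step for fewer orthogonalizations overall.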

Effects of EOS and constitutive models on simulating copper shaped charge jets in ALEGRA

2019 15th Hypervelocity Impact Symposium, HVIS 2019

Doney, Robert L.; Niederhaus, John H.; Fuller, Timothy J.; Coppinger, Matthew J.

In this work we evaluated the effects that equations of state (EOS) and strength models have on shaped charge jet (SCJ) development using ALEGRA, the Sandia National Laboratories multiphysics shock code. Results were quantified using a Lagrangian tracer particle following liner collapse, passing through the compression zone, and flowing into the jet tip. We found consistent results among several EOS: 3320, 3331, and 3337. The 3325 EOS generated a measurable low-density, hollow region near the jet tip, which appears to be reflected in a lower internal energy of the jet. At this time, we cannot tell experimentally whether such a hollow region exists. The 3337 EOS is recent, well documented [6], and produces results similar to 3320 [3]. The various strength models produced more noticeable differences. In terms of internal energy and temperature, SGL had the largest values, followed by PTW, ZA, and finally JC and MTS, which were quite similar to each other. We looked at melt conditions in the SGL and JC models using the 3337 EOS. The SGL model reported a liquid region along the jet axis all the way to the tip (seemingly consistent with experiment), while the JC model did not indicate any phase transition. None of the other yield models indicated melt along the jet axis. For all EOS and strength models, we found similar results for the velocity history of the jet tip as measured against experiment using photon Doppler velocimetry.

Using Computation Effectively for Scalable Poisson Tensor Factorization: Comparing Methods beyond Computational Efficiency

2021 IEEE High Performance Extreme Computing Conference, HPEC 2021

Myers, Jeremy M.; Dunlavy, Daniel D.

Poisson Tensor Factorization (PTF) is an important method for analyzing patterns and relationships in multiway count data. In this work, we consider several algorithms for computing a low-rank PTF of tensors with sparse count data values via maximum likelihood estimation. Such an approach reduces to solving a nonlinear, non-convex optimization problem whose structure allows it to leverage considerable parallel computation. However, since the maximum likelihood estimator corresponds to the global minimizer of this optimization problem, it is important to consider how effective methods are both at leveraging this inherent parallelism and at computing a good approximation to the global minimizer. In this work we present comparisons of multiple methods for PTF that illustrate the tradeoffs between computational efficiency and accuracy in computing the maximum likelihood estimator. We present results using synthetic and real-world data tensors to demonstrate some of the challenges of choosing a method for a given tensor.
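
For concreteness, the objective behind maximum likelihood PTF is, up to constants, f(M) = Σ (m − x log m) over all tensor entries, where M is the low-rank CP model and X the count tensor. The sketch below evaluates this objective for a small dense 3-way tensor and runs Lee-Seung style multiplicative updates, one classic method for this problem. It is an illustrative assumption: we do not claim it matches any of the methods compared in the paper, and it ignores sparsity and the safeguards against inadmissible zeros that practical codes use. The function names are hypothetical.

```python
import numpy as np


def cp_full(A, B, C):
    """Dense 3-way tensor from CP factors: M[i,j,k] = sum_r A[i,r] B[j,r] C[k,r]."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)


def poisson_nll(X, A, B, C):
    """Poisson negative log-likelihood of counts X under the CP model (no constants)."""
    M = cp_full(A, B, C)
    return float(np.sum(M - X * np.log(M)))


def mu_sweep(X, A, B, C):
    """One sweep of multiplicative updates, one factor matrix at a time.

    Each update multiplies a factor by (negative gradient part) / (positive
    gradient part) of the objective, which keeps the factors non-negative.
    """
    M = cp_full(A, B, C)
    A = A * np.einsum("ijk,jr,kr->ir", X / M, B, C) / (B.sum(0) * C.sum(0))
    M = cp_full(A, B, C)
    B = B * np.einsum("ijk,ir,kr->jr", X / M, A, C) / (A.sum(0) * C.sum(0))
    M = cp_full(A, B, C)
    C = C * np.einsum("ijk,ir,jr->kr", X / M, A, B) / (A.sum(0) * B.sum(0))
    return A, B, C
```

Each factor update does not increase f, so the objective decreases monotonically, yet the iteration can still stall at a local minimizer or converge slowly. This illustrates why efficiency alone is not enough when choosing a method.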

ALAMO: Autonomous lightweight allocation, management, and optimization

Communications in Computer and Information Science

Brightwell, Ronald B.; Ferreira, Kurt B.; Grant, Ryan E.; Levy, Scott L.; Lofstead, Gerald F.; Olivier, Stephen L.; Laros, James H.; Younge, Andrew J.; Gentile, Ann C.; Laros, James H.

Several recent workshops conducted by the DOE Advanced Scientific Computing Research program have established that the complexity of developing applications and executing them on high-performance computing (HPC) systems is rising at a rate that will make it nearly impossible to continue to achieve higher levels of performance and scalability. Absent an alternative approach to managing this ever-growing complexity, HPC systems will become increasingly difficult to use. A more holistic approach to designing and developing applications and managing system resources is required. This paper outlines a research strategy for managing this increasing complexity by providing the programming environment, software stack, and hardware capabilities needed for autonomous resource management of HPC systems. Developing portable applications for a variety of HPC systems of varying scale requires a paradigm shift from the current approach, where applications are painstakingly mapped to individual machine resources, to an approach where machine resources are automatically mapped and optimized to applications as they execute. Achieving such automated resource management for HPC systems is a daunting challenge that requires significant sustained investment in exploring new approaches and novel capabilities in software and hardware that span the spectrum from programming systems to device-level mechanisms. This paper provides an overview of the functionality needed to enable autonomous resource management and optimization and describes the components currently being explored at Sandia National Laboratories to help support this capability.

Towards Predictive Plasma Science and Engineering through Revolutionary Multi-Scale Algorithms and Models (Final Report)

Laity, George R.; Robinson, Allen C.; Cuneo, M.E.; Alam, Mary K.; Beckwith, Kristian B.; Bennett, Nichelle L.; Bettencourt, Matthew T.; Bond, Stephen D.; Cochrane, Kyle C.; Criscenti, Louise C.; Cyr, Eric C.; Laros, James H.; Drake, Richard R.; Evstatiev, Evstati G.; Fierro, Andrew S.; Gardiner, Thomas A.; Laros, James H.; Goeke, Ronald S.; Hamlin, Nathaniel D.; Hooper, Russell H.; Koski, Jason K.; Lane, James M.; Larson, Steven R.; Leung, Kevin L.; McGregor, Duncan A.; Miller, Philip R.; Miller, Sean M.; Ossareh, Susan J.; Phillips, Edward G.; Simpson, Sean S.; Sirajuddin, David S.; Smith, Thomas M.; Swan, Matthew S.; Thompson, Aidan P.; Tranchida, Julien G.; Bortz-Johnson, Asa J.; Welch, Dale R.; Russell, Alex M.; Watson, Eric D.; Rose, David V.; McBride, Ryan D.

This report describes the high-level accomplishments from the Plasma Science and Engineering Grand Challenge LDRD at Sandia National Laboratories. The Laboratory has a need to demonstrate predictive capabilities to model plasma phenomena in order to rapidly accelerate engineering development in several mission areas. The purpose of this Grand Challenge LDRD was to advance the fundamental models, methods, and algorithms, along with the supporting electrode science foundation, to enable a revolutionary shift towards predictive plasma engineering design principles. This project integrated the SNL knowledge base in computer science, plasma physics, materials science, applied mathematics, and relevant application engineering to establish new cross-laboratory collaborations on these topics. As an initial exemplar, this project focused efforts on improving the multi-scale modeling capabilities used to predict electrical power delivery on large-scale pulsed power accelerators. Specifically, this LDRD was structured into three primary research thrusts that, when integrated, enable complex simulations of these devices: (1) the exploration of multi-scale models describing the desorption of contaminants from pulsed power electrodes, (2) the development of improved algorithms and code technologies to treat the multi-physics phenomena required to predict device performance, and (3) the creation of a rigorous verification and validation infrastructure to evaluate the codes and models across a range of challenge problems. These components were integrated into initial demonstrations of the largest simulations of multi-level vacuum power flow completed to date, executed on the leading HPC machines available in the NNSA complex today. These preliminary studies indicate that relevant pulsed power engineering design simulations can now be completed in a time on the order of several days, a significant improvement over pre-LDRD levels of performance.

Hierarchical parallelism for transient solid mechanics simulations

World Congress in Computational Mechanics and ECCOMAS Congress

Littlewood, David J.; Jones, Reese E.; Laros, James H.; Plews, Julia A.; Hetmaniuk, Ulrich L.; Lifflander, Jonathan

Software development for high-performance scientific computing continues to evolve in response to increased parallelism and the advent of on-node accelerators, in particular GPUs. While these hardware advancements have the potential to significantly reduce turnaround times, they also present implementation and design challenges for engineering codes. We investigate the use of two strategies to mitigate these challenges: the Kokkos library for performance portability across disparate architectures, and the DARMA/vt library for asynchronous many-task scheduling. We investigate the application of Kokkos within the NimbleSM finite element code and the LAMÉ constitutive model library. We explore the performance of DARMA/vt applied to NimbleSM contact mechanics algorithms. Software engineering strategies are discussed, followed by performance analyses of relevant solid mechanics simulations which demonstrate the promise of Kokkos and DARMA/vt for accelerated engineering simulators.

DPM: A Novel Training Method for Physics-Informed Neural Networks in Extrapolation

35th AAAI Conference on Artificial Intelligence, AAAI 2021

Kim, Jungeun; Lee, Kookjin L.; Lee, Dongeun; Jhin, Sheo Y.; Park, Noseong

We present a method for learning the dynamics of complex physical processes described by time-dependent nonlinear partial differential equations (PDEs). Our particular interest lies in extrapolating solutions in time beyond the temporal domain used in training. We choose the physics-informed neural network (PINN) as our baseline method because it parameterizes not only the solutions but also the equations that describe the dynamics of physical processes. We demonstrate that PINN performs poorly on extrapolation tasks in many benchmark problems. To address this, we propose a novel method for better training PINNs and demonstrate that our enhanced PINNs can accurately extrapolate solutions in time. Our method shows up to 72% smaller errors than existing methods in terms of the standard L2-norm metric.

Integrated fluid and materials modeling of environmental barrier coatings

AIAA Scitech 2021 Forum

Newsome, David; Waxman, Rae; Hoffie, Andreas; Silling, Stewart A.

Environmental barrier coatings (EBCs) use their dense top coats to protect ceramic matrix composites from the high-temperature moisture present during turbine operation. However, moisture is able to diffuse and oxidize the Si bond coat to form the thermally grown oxide (TGO), a layer of SiO₂ in which the incorporation of O causes swelling and stress. At sufficient TGO-based swelling, the EBC will fail due to increased damage such as delamination. A multiscale simulation framework has been developed to link the operating conditions of a high-performance turbine to the failure modes of the EBC. Computational fluid dynamics (CFD) simulations of the E3 turbine were performed and compared to prior literature data to demonstrate the fidelity of the Loci/CHEM software in determining the flow conditions on the turbine blade surface. Pressure and heat flux boundary condition data were then determined with the CFD simulations, providing the temperature at the bond coat. Peridynamics was used to model the microscale TGO growth. A swelling model that links moisture concentration to strain at the TGO, due to the volume increase from oxidation, was demonstrated, coupling moisture transport to localized strain and directly observing TGO growth and the corresponding damage. This framework is general and can be adapted to a range of EBC microstructures and operating conditions.

Bayesian model selection for metal yield models in high-velocity impact

2019 15th Hypervelocity Impact Symposium, HVIS 2019

Portone, Teresa; Niederhaus, John H.; Sanchez, Jason J.; Swiler, Laura P.

The shock hydrodynamics code ALEGRA and the optimization and uncertainty quantification toolkit Dakota are used to calibrate and select between three competing steel yield models, taking uncertainties in the system into account. A Bayesian model selection procedure is used to choose between the models in a systematic, automated fashion, within an uncertainty quantification workflow. Time-series penetration data of a long tungsten-alloy rod impacting a hardened steel plate at approximately 1250 m/s, along with their measurement uncertainty, are used to calibrate and select between the models. The procedure finds that between the Johnson–Cook, Steinberg–Guinan–Lund, and Zerilli–Armstrong stress models, Zerilli–Armstrong performs the best.
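
The final step of a Bayesian model selection procedure, turning per-model evidence into posterior model probabilities, can be sketched as follows. The sketch assumes the log marginal likelihoods log p(D | M_i) have already been estimated for each candidate yield model, for example from the calibration runs. The function name and inputs are hypothetical, and this is not Dakota's interface.

```python
import math


def posterior_model_probabilities(log_evidences, priors=None):
    """Posterior P(M_i | D) from log marginal likelihoods log p(D | M_i)."""
    n = len(log_evidences)
    if priors is None:
        priors = [1.0 / n] * n  # equal prior belief in each candidate model
    logs = [le + math.log(p) for le, p in zip(log_evidences, priors)]
    shift = max(logs)  # log-sum-exp shift avoids overflow/underflow
    weights = [math.exp(v - shift) for v in logs]
    total = sum(weights)
    return [w / total for w in weights]
```

With equal priors, the ratio of any two posterior probabilities equals the Bayes factor between the corresponding models, and the shift keeps the computation stable even when the log evidences are large and negative.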

Deep learning of parameterized equations with applications to uncertainty quantification

International Journal for Uncertainty Quantification

Qin, Tong; Chen, Zhen; Jakeman, John D.; Xiu, Dongbin

We propose a learning algorithm for discovering unknown parameterized dynamical systems from observational data of the state variables. Our method builds upon and extends recent work on discovering unknown dynamical systems, in particular work using deep neural networks (DNNs). We propose a DNN structure, largely based upon the residual network (ResNet), that not only learns the unknown form of the governing equation but also takes into account the random effect embedded in the system, which is generated by the random parameters. Once the DNN model is successfully constructed, it can produce system predictions over longer time horizons and for arbitrary parameter values. For uncertainty quantification, it allows us to conduct uncertainty analysis by evaluating solution statistics over the parameter space.
