Publications

Results 926–950 of 9,998

SST-GPU: A Scalable SST GPU Component for Performance Modeling and Profiling

Hughes, Clayton; Hammond, Simon; Zhang, Mengchi; Liu, Yechen; Rogers, Tim; Hoekstra, Robert J.

Programmable accelerators have become commonplace in modern computing systems. Advances in programming models and the availability of unprecedented amounts of data have created a space for massively parallel accelerators capable of maintaining context for thousands of concurrent threads resident on-chip. These threads are grouped and interleaved on a cycle-by-cycle basis among several massively parallel computing cores. One path for the design of future supercomputers relies on an ability to model the performance of these massively parallel cores at scale. The SST framework has been proven to scale up to run simulations containing tens of thousands of nodes. A previous report described the initial integration of the open-source, execution-driven GPU simulator, GPGPU-Sim, into the SST framework. This report discusses the results of the integration and how to use the new GPU component in SST. It also provides examples of what it can be used to analyze, and a correlation study showing how closely the execution matches that of an NVIDIA V100 GPU when running kernels and mini-apps.
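
SST simulations are assembled from a Python configuration script, so using a new GPU component amounts to instantiating and wiring it in such a script. The sketch below is a hedged illustration of that workflow; the element name gpgpu.gpuCore, its parameters, and the port names are assumptions for illustration, not the identifiers from the report.

```python
# Hypothetical SST Python configuration wiring a host core to a GPU timing
# component. All GPU-side names here are illustrative assumptions; consult
# the SST element documentation for the actual component registered.
import sst

gpu = sst.Component("gpu0", "gpgpu.gpuCore")      # assumed element name
gpu.addParams({
    "clock": "1.2GHz",             # V100-class core clock (illustrative)
    "kernel_trace": "vecadd.trc",  # workload to replay (illustrative)
})

host = sst.Component("cpu0", "miranda.BaseCPU")   # simple request generator
link = sst.Link("cpu_gpu_link")
link.connect((host, "cache_link", "100ps"),       # port names assumed
             (gpu, "requestPort", "100ps"))
```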

Low-Communication Asynchronous Distributed Generalized Canonical Polyadic Tensor Decomposition

2021 IEEE High Performance Extreme Computing Conference, HPEC 2021

Lewis, Cannada; Phipps, Eric T.

In this work, we show that reduced-communication algorithms for distributed stochastic gradient descent improve the time per epoch and strong scaling for the Generalized Canonical Polyadic (GCP) tensor decomposition, but at a cost: achieving convergence becomes more difficult. The implementation, based on MPI, shows that while one-sided algorithms offer a path to asynchronous execution, the performance benefits of an optimized allreduce are difficult to best.
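
The trade-off described here is between a blocking allreduce of the stochastic gradient every step (the optimized baseline that proves hard to beat) and one-sided asynchronous updates. A minimal mpi4py sketch of the synchronous baseline, with the GCP-specific gradient replaced by a placeholder:

```python
# Synchronous distributed SGD: each rank computes a gradient on its local
# tensor slice and a blocking allreduce averages it across ranks. The
# one-sided variant would replace the collective with MPI window updates.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD

def local_gradient(x):
    # Placeholder for the GCP loss gradient on this rank's data.
    return x - 1.0

x = np.zeros(10)
lr = 0.1
for step in range(100):
    g_local = local_gradient(x)
    g_global = np.empty_like(g_local)
    comm.Allreduce(g_local, g_global, op=MPI.SUM)  # blocking collective
    x -= lr * g_global / comm.Get_size()
```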

Risk-averse control of fractional diffusion with uncertain exponent

SIAM Journal on Control and Optimization

Kouri, Drew P.; Antil, Harbir; Pfefferer, Johannes

In this paper, we introduce and analyze a new class of optimal control problems constrained by elliptic equations with uncertain fractional exponents. We utilize risk measures to formulate the resulting optimization problem. We develop a functional analytic framework, study the existence of solutions, and rigorously derive the first-order optimality conditions. Additionally, we employ a sample-based approximation for the uncertain exponent and the finite element method to discretize in space. We prove the rate of convergence for the optimal risk-neutral controls when using quadrature approximation for the uncertain exponent and conclude with illustrative examples.
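
In outline (our notation, not necessarily the paper's), the problem class is a risk-averse program constrained by a fractional elliptic equation whose exponent is a random variable:

```latex
% \mathcal{R} is a risk measure (e.g., CVaR), s(\omega) the uncertain
% fractional exponent, z the control, u the PDE solution.
\min_{z \in Z_{\mathrm{ad}}}\; \mathcal{R}\bigl[\, J(u, z) \,\bigr]
\quad\text{subject to}\quad
(-\Delta)^{s(\omega)} u(\omega) = f + Bz \ \text{in } D,
\qquad u(\omega) = 0 \ \text{on the exterior of } D.
```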

Using Monitoring Data to Improve HPC Performance via Network-Data-Driven Allocation

2021 IEEE High Performance Extreme Computing Conference, HPEC 2021

Zhang, Yijia; Aksar, Burak; Aaziz, Omar R.; Schwaller, Benjamin; Brandt, James M.; Leung, Vitus J.; Egele, Manuel; Coskun, Ayse K.

On high-performance computing (HPC) systems, job allocation strategies control the placement of a job among available nodes. As the placement changes a job's communication performance, allocation can significantly affect the execution times of many HPC applications. Existing allocation strategies typically make decisions based on resource limits, network topology, communication patterns, etc. However, system network performance at runtime is seldom consulted in allocation, even though it significantly affects job execution times. In this work, we demonstrate using monitoring data to improve HPC systems' performance by proposing a Network-Data-Driven (NeDD) job allocation framework, which monitors the network performance of an HPC system at runtime and allocates resources based on both network performance and job characteristics. NeDD characterizes system network performance by collecting the network traffic statistics on each router link, and it characterizes a job's sensitivity to network congestion by collecting Message Passing Interface (MPI) statistics. During allocation, NeDD pairs network-sensitive (network-insensitive) jobs with nodes whose parent routers have low (high) network traffic. Through experiments on a large HPC system, we demonstrate that NeDD reduces the execution time of parallel applications by 11% on average and up to 34%.
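
The pairing rule lends itself to a compact statement. The sketch below is an illustrative reconstruction of that policy, not the NeDD implementation; the data structures and the one-node-per-job simplification are ours:

```python
# Illustrative NeDD-style pairing: rank jobs by measured MPI network
# sensitivity and nodes by their parent router's link traffic, then give
# the most sensitive jobs the least congested nodes.
def nedd_allocate(jobs, nodes):
    """jobs: (job_id, mpi_sensitivity) pairs; nodes: (node_id,
    parent_router_traffic) pairs. Returns {job_id: node_id}."""
    by_sensitivity = sorted(jobs, key=lambda j: j[1], reverse=True)
    by_traffic = sorted(nodes, key=lambda n: n[1])   # quietest first
    return {j[0]: n[0] for j, n in zip(by_sensitivity, by_traffic)}

placement = nedd_allocate(
    jobs=[("app_sensitive", 0.9), ("app_insensitive", 0.2)],
    nodes=[("n17", 0.7), ("n03", 0.1)],
)
# -> the network-sensitive job lands on quiet node "n03"
```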

Effects of EOS and constitutive models on simulating copper shaped charge jets in ALEGRA

2019 15th Hypervelocity Impact Symposium, HVIS 2019

Doney, Robert L.; Niederhaus, John H.J.; Fuller, Timothy J.; Coppinger, Matthew J.

In this work we evaluated the effects that equations of state (EOS) and strength models have on shaped charge jet (SCJ) development using the Sandia National Laboratories multiphysics shock code, ALEGRA. Results were quantified using a Lagrangian tracer particle following liner collapse, passing through the compression zone, and flowing into the jet tip. We found consistent results among several EOS: 3320, 3331, and 3337. The 3325 EOS generated a measurable low-density, hollow region near the jet tip, which appears to be reflected in a lower internal energy of the jet. At this time, we cannot tell, experimentally, whether such a hollow region exists. The 3337 EOS is recent, well documented [6], and produces results similar to 3320 [3]. The various strength models produced more noticeable differences. In terms of internal energy and temperature, SGL had the largest values, followed by PTW, ZA, and finally JC and MTS, which were quite similar to each other. We examined melt conditions in the SGL and JC models using the 3337 EOS. The SGL model reported a liquid region along the jet axis all the way to the tip, seemingly consistent with experiment, while the JC model did not indicate any phase transition. None of the other yield models indicated melt along the jet axis. For all EOS and strength models, we found similar results for the velocity history of the jet tip as measured against experiment using photon Doppler velocimetry.

PMEMCPY: A simple, lightweight, and portable I/O library for storing data in persistent memory

Proceedings - IEEE International Conference on Cluster Computing, ICCC

Logan, Luke M.; Lofstead, Gerald F.; Levy, Scott L.N.; Widener, Patrick; Sun, Xian H.; Kougkas, Anthony

Persistent memory (PMEM) devices can achieve comparable performance to DRAM while providing significantly more capacity. This has made the technology compelling as an expansion to main memory. Rethinking PMEM as a storage device can offer a high-performance buffering layer for HPC applications to temporarily, but safely, store data. However, modern parallel I/O libraries, such as HDF5 and pNetCDF, are complicated and introduce significant software and metadata overheads when persisting data to these storage devices, wasting much of their potential. In this work, we explore the potential of PMEM as storage through pMEMCPY: a simple, lightweight, and portable I/O library for storing data in persistent memory. We demonstrate that our approach is up to 2x faster than other popular parallel I/O libraries under real workloads.
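
pMEMCPY's own API is not reproduced here; as a hedged illustration of the access model it exploits (memcpy-style stores plus an explicit persistence point, instead of a heavyweight I/O stack), a memory-mapped file behaves analogously:

```python
# Conceptual illustration only, not pMEMCPY's API: persist an array through
# a memory map with an explicit flush, mimicking load/store access to PMEM.
import mmap
import numpy as np

data = np.arange(1024, dtype=np.float64)
with open("/tmp/pmem_demo.bin", "wb") as f:
    f.truncate(data.nbytes)                # reserve space
with open("/tmp/pmem_demo.bin", "r+b") as f:
    mm = mmap.mmap(f.fileno(), data.nbytes)
    mm[:] = data.tobytes()                 # memcpy-style store
    mm.flush()                             # force persistence (msync analogue)
    mm.close()
```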

ADELUS: A Performance-Portable Dense LU Solver for Distributed-Memory Hardware-Accelerated Systems

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Dang, Vinh Q.; Kotulski, Joseph D.; Rajamanickam, Sivasankaran

Solving dense systems of linear equations is essential in applications encountered in physics, mathematics, and engineering. This paper describes our current efforts toward the development of the ADELUS package for current and next generation distributed, accelerator-based, high-performance computing platforms. The package solves dense linear systems using partial pivoting LU factorization on distributed-memory systems with CPUs/GPUs. The matrix is block-mapped onto distributed memory on CPUs/GPUs and is solved as if it were torus-wrapped for an optimal balance of computation and communication. A permutation operation is performed to restore the results so the torus-wrap distribution is transparent to the user. This package targets performance portability by leveraging the abstractions provided in the Kokkos and Kokkos Kernels libraries. A comparison of the performance gains versus the state-of-the-art SLATE and DPLASMA GESV functionalities on the Summit supercomputer is provided. Preliminary performance results from large-scale electromagnetic simulations using ADELUS are also presented. The solver achieves 7.7 Petaflops on 7600 GPUs of the Sierra supercomputer, translating to 16.9% efficiency.
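
The torus-wrap idea is that block (i, j) of the matrix is owned by process (i mod Pr, j mod Pc), so work stays spread across the whole process grid as the factorization retires rows and columns. A small sketch in our own notation:

```python
# Torus-wrapped block-to-process mapping: block (i, j) belongs to process
# (i mod Pr, j mod Pc), flattened here to a single rank. Grid shape and
# indices are illustrative.
def torus_wrap_owner(i, j, Pr, Pc):
    return (i % Pr) * Pc + (j % Pc)

# As pivot step k retires block row/column k, ownership keeps cycling,
# so no process goes idle early in the factorization:
for k in range(6):
    print(f"step {k}: diagonal block owned by rank "
          f"{torus_wrap_owner(k, k, Pr=2, Pc=3)}")
```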

Polynomial preconditioned Arnoldi with stability control

SIAM Journal on Scientific Computing

Embree, Mark; Loe, Jennifer A.; Morgan, Ronald

Polynomial preconditioning can improve the convergence of the Arnoldi method for computing eigenvalues. Such preconditioning significantly reduces the cost of orthogonalization; for difficult problems, it can also reduce the number of matrix-vector products. Parallel computations can particularly benefit from the reduction of communication-intensive operations. The GMRES algorithm provides a simple and effective way of generating the preconditioning polynomial. For some problems, high-degree polynomials are especially effective, but they can lead to stability problems that must be mitigated. A two-level "double polynomial preconditioning" strategy provides an effective way to generate high-degree preconditioners.
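
One standard construction consistent with this description (our notation): run d steps of GMRES, take the harmonic Ritz values θ₁, …, θ_d as the roots of the GMRES residual polynomial, and read the preconditioner off it:

```latex
% GMRES residual polynomial \pi and the induced preconditioner p; Arnoldi
% is then applied to A\,p(A), whose spectrum is better behaved.
\pi(\alpha) = \prod_{i=1}^{d} \Bigl( 1 - \frac{\alpha}{\theta_i} \Bigr),
\qquad
p(\alpha) = \frac{1 - \pi(\alpha)}{\alpha}.
```

For large d, poorly placed or clustered roots can make applying the product unstable, which is the kind of issue the stability control targets.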

CHARACTERIZING HUMAN PERFORMANCE: DETECTING TARGETS AT HIGH FALSE ALARM RATES

Proceedings of the 2021 International Topical Meeting on Probabilistic Safety Assessment and Analysis, PSA 2021

Speed, Ann E.; Wheeler, Jason; Russell, John; Oppel, Fred; Sanchez, Danielle N.; Silva, Austin R.; Chavez, Anna

The prevalence effect is the observation that, in visual search tasks, humans are more likely to miss a target as the signal (target) to noise (non-target) ratio becomes smaller. Studied extensively in the basic literature [e.g., 1, 2], this effect has implications for real-world settings such as security guards monitoring physical facilities for attacks. Importantly, what seems to drive the effect is the development of a response bias based on learned sensitivity to the statistical likelihood of a target [e.g., 3-5]. This paper presents results from two experiments aimed at understanding how target prevalence impacts the ability of individuals to detect a target on the 1000th trial of a series of 1000 trials. The first experiment employed the traditional prevalence effect paradigm, which involves searching for a perfect capital letter T amidst imperfect Ts. In a between-subjects design, our subjects experienced target prevalence rates of 50/50, 1/10, 1/100, or 1/1000. In all conditions, the final trial was always a target. The second (ongoing) experiment replicates this design using a notional physical facility in a mod/sim environment. The simulation enables triggering different intrusion detection sensors by simulated characters and events (e.g., people, animals, weather). In this experiment, subjects viewed 1000 “alarm” events and were asked to characterize each as either a nuisance alarm (e.g., set off by an animal) or an attack. As with the basic visual search study, the final trial was always an attack.

Solving Stochastic Inverse Problems for Property–Structure Linkages Using Data-Consistent Inversion and Machine Learning

JOM

Foulk, James W.; Wildey, Timothy

Determining process–structure–property linkages is one of the key objectives in materials science, and uncertainty quantification plays a critical role in understanding both process–structure and structure–property linkages. In this work, we seek to learn a distribution of microstructure parameters that are consistent in the sense that the forward propagation of this distribution through a crystal plasticity finite element model matches a target distribution on materials properties. This stochastic inversion formulation infers a distribution of acceptable/consistent microstructures, as opposed to a deterministic solution, which expands the range of feasible designs in a probabilistic manner. To solve this stochastic inverse problem, we employ a recently developed uncertainty quantification framework based on push-forward probability measures, which combines techniques from measure theory and Bayes’ rule to define a unique and numerically stable solution. This approach requires making an initial prediction using an initial guess for the distribution on model inputs and solving a stochastic forward problem. To reduce the computational burden in solving both stochastic forward and stochastic inverse problems, we combine this approach with a machine learning Bayesian regression model based on Gaussian processes and demonstrate the proposed methodology on two representative case studies in structure–property linkages.
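
The push-forward construction has a closed form worth recording (notation ours): with Q the structure-to-property map, π_init the initial density, π_obs the target density on properties, and π_pred the push-forward of π_init through Q, the data-consistent update is

```latex
\pi_{\mathrm{update}}(\lambda)
  = \pi_{\mathrm{init}}(\lambda)\,
    \frac{\pi_{\mathrm{obs}}\bigl(Q(\lambda)\bigr)}
         {\pi_{\mathrm{pred}}\bigl(Q(\lambda)\bigr)},
```

whose push-forward through Q reproduces π_obs under mild predictability assumptions.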

IMPACT OF SAMPLING STRATEGIES IN THE POLYNOMIAL CHAOS SURROGATE CONSTRUCTION FOR MONTE CARLO TRANSPORT APPLICATIONS

Proceedings of the International Conference on Mathematics and Computational Methods Applied to Nuclear Science and Engineering, M&C 2021

Geraci, Gianluca; Olson, Aaron

The accurate construction of a surrogate model is an effective and efficient strategy for performing Uncertainty Quantification (UQ) analyses of expensive and complex engineering systems. Surrogate models are especially powerful whenever the UQ analysis requires the computation of statistics which are difficult and prohibitively expensive to obtain via a direct sampling of the model, e.g. high-order moments and probability density functions. In this paper, we discuss the construction of a polynomial chaos expansion (PCE) surrogate model for radiation transport problems for which quantities of interest are obtained via Monte Carlo simulations. In this context, it is imperative to account for the statistical variability of the simulator as well as the variability associated with the uncertain parameter inputs. More formally, in this paper we focus on understanding the impact of the Monte Carlo transport variability on the recovery of the PCE coefficients. We are able to identify the contribution of both the number of uncertain parameter samples and the number of particle histories simulated per sample in the PCE coefficient recovery. Our theoretical results indicate an accuracy improvement when using few Monte Carlo histories per random sample with respect to configurations with an equivalent computational cost. These theoretical results are numerically illustrated for a simple synthetic example and two configurations of a one-dimensional radiation transport problem in which a slab is represented by means of materials with uncertain cross sections.
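
Concretely (our notation), the quantity of interest is expanded in orthonormal polynomials of the uncertain inputs, and every Monte Carlo transport run contributes a noisy evaluation:

```latex
% PCE of a QoI Q; each transport run returns a noisy estimate whose
% variance shrinks with the number of particle histories N_h.
Q(\xi) \approx \sum_{k=0}^{P} c_k \Psi_k(\xi),
\qquad
\widetilde{Q}\bigl(\xi^{(i)}\bigr) = Q\bigl(\xi^{(i)}\bigr) + \varepsilon_i,
\quad
\operatorname{Var}(\varepsilon_i) = \mathcal{O}(1/N_h),
```

so recovering the coefficients c_k balances the number of parameter samples ξ⁽ⁱ⁾ against the histories N_h spent on each.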

Randomized sketching algorithms for low-memory dynamic optimization

SIAM Journal on Optimization

Kouri, Drew P.; Muthukumar, Ramchandran; Udell, Madeleine

This paper develops a novel limited-memory method to solve dynamic optimization problems. The memory requirements for such problems often present a major obstacle, particularly for problems with PDE constraints such as optimal flow control, full waveform inversion, and optical tomography. In these problems, PDE constraints uniquely determine the state of a physical system for a given control; the goal is to find the value of the control that minimizes an objective. While the control is often low dimensional, the state is typically more expensive to store. This paper suggests using randomized matrix approximation to compress the state as it is generated and shows how to use the compressed state to reliably solve the original dynamic optimization problem. Concretely, the compressed state is used to compute approximate gradients and to apply the Hessian to vectors. The approximation error in these quantities is controlled by the target rank of the sketch. This approximate first- and second-order information can readily be used in any optimization algorithm. As an example, we develop a sketched trust-region method that adaptively chooses the target rank using a posteriori error information and provably converges to a stationary point of the original problem. Numerical experiments with the sketched trust-region method show promising performance on challenging problems such as the optimal control of an advection-reaction-diffusion equation and the optimal control of fluid flow past a cylinder.
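
The compression step is in the spirit of standard randomized low-rank sketching (Halko-Martinsson-Tropp); a minimal version of that building block, with the trust-region machinery omitted:

```python
# Randomized rank-r sketch: compress a state snapshot matrix A through a
# random test matrix, keep an orthonormal range basis, and reconstruct
# A ~= Q @ B when gradients or Hessian-vector products are needed.
import numpy as np

def sketch(A, r, oversample=10, seed=0):
    rng = np.random.default_rng(seed)
    Omega = rng.standard_normal((A.shape[1], r + oversample))
    Y = A @ Omega                    # range sketch (single pass over A)
    Q, _ = np.linalg.qr(Y)           # orthonormal basis for the range
    B = Q.T @ A                      # (a fully streaming variant would
    return Q, B                      #  sketch the co-range instead)

A = np.random.default_rng(1).standard_normal((500, 200))
Q, B = sketch(A, r=20)
print(np.linalg.norm(A - Q @ B) / np.linalg.norm(A))  # relative error
```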

DPM: A Novel Training Method for Physics-Informed Neural Networks in Extrapolation

35th AAAI Conference on Artificial Intelligence, AAAI 2021

Kim, Jungeun; Lee, Kookjin L.; Lee, Dongeun; Jhin, Sheo Y.; Park, Noseong

We present a method for learning the dynamics of complex physical processes described by time-dependent nonlinear partial differential equations (PDEs). Our particular interest lies in extrapolating solutions in time beyond the range of the temporal domain used in training. Our choice for a baseline method is the physics-informed neural network (PINN), because the method parameterizes not only the solutions but also the equations that describe the dynamics of physical processes. We demonstrate that PINNs perform poorly on extrapolation tasks in many benchmark problems. To address this, we propose a novel method for better training PINNs and demonstrate that the enhanced PINNs can accurately extrapolate solutions in time. Our method shows up to 72% smaller errors than existing methods in terms of the standard L2-norm metric.
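
For reference, the baseline PINN objective the paper starts from combines a data misfit with the PDE residual (standard form, our notation):

```latex
% u_\theta: network approximation of the solution; \mathcal{N}[\cdot]:
% PDE residual operator evaluated at collocation points.
\mathcal{L}(\theta)
 = \frac{1}{N_u} \sum_{i=1}^{N_u} \bigl| u_\theta(x_i, t_i) - u_i \bigr|^2
 + \frac{1}{N_f} \sum_{j=1}^{N_f} \bigl| \mathcal{N}[u_\theta](x_j, t_j) \bigr|^2 .
```

Extrapolation asks for accuracy at times t beyond the largest t seen in either sum, which is exactly where the baseline degrades.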

Negative Perceptions About the Applicability of Source-to-Source Compilers in HPC: A Literature Review

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Milewicz, Reed M.; Pirkelbauer, Peter; Soundararajan, Prema; Ahmed, Hadia; Skjellum, Tony

A source-to-source (S2S) compiler is a translator that accepts the source code of a program written in a programming language as its input and produces an equivalent program in the same or a different programming language. S2S techniques are commonly used to enable fluent translation between high-level programming languages, to perform large-scale refactoring operations, and to facilitate instrumentation for dynamic analysis. Negative perceptions about S2S's applicability in High Performance Computing (HPC) are studied and evaluated here. This is a first study that brings to light reasons why scientists do not use source-to-source techniques for HPC. The primary audience for this paper is those considering S2S technology in their HPC application work.

An asymptotically compatible approach for Neumann-type boundary condition on nonlocal problems

ESAIM: Mathematical Modelling and Numerical Analysis

You, Huaiqian; Lu, Xin Y.; Trask, Nathaniel A.; Yu, Yue

In this paper we consider 2D nonlocal diffusion models with a finite nonlocal horizon parameter δ characterizing the range of nonlocal interactions, and consider the treatment of Neumann-like boundary conditions that have proven challenging for discretizations of nonlocal models. We propose a new generalization of classical local Neumann conditions by converting the local flux to a correction term in the nonlocal model, which provides an estimate for the nonlocal interactions of each point with points outside the domain. While existing 2D nonlocal flux boundary conditions have been shown to exhibit at most first-order convergence to the local counterpart as δ → 0, the proposed Neumann-type boundary formulation recovers the local case at the rate O(δ²) in the L∞(Ω) norm, which is optimal considering the O(δ²) convergence of the nonlocal equation to its local limit away from the boundary. We analyze the application of this new boundary treatment to the nonlocal diffusion problem, and present conditions under which the solution of the nonlocal boundary value problem converges to the solution of the corresponding local Neumann problem as the horizon is reduced. To demonstrate the applicability of this nonlocal flux boundary condition to more complicated scenarios, we extend the approach to less regular domains, numerically verifying that we preserve second-order convergence for non-convex domains with corners. Based on the new formulation for the nonlocal boundary condition, we develop an asymptotically compatible meshfree discretization, obtaining a solution to the nonlocal diffusion equation with mixed boundary conditions that converges with O(δ²) convergence.
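
For orientation, a nonlocal diffusion operator with horizon δ has the form (our notation)

```latex
\mathcal{L}_\delta u(x)
 = \int_{B_\delta(x)} \bigl( u(y) - u(x) \bigr)\, \gamma_\delta(|y - x|)\, \mathrm{d}y,
```

and the proposed boundary treatment converts the prescribed local flux into a correction term standing in for the part of the interaction ball B_δ(x) that lies outside the domain.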

StressBench: A Configurable Full System Network and I/O Benchmark Framework

2021 IEEE High Performance Extreme Computing Conference, HPEC 2021

Chester, Dean G.; Groves, Taylor; Hammond, Simon; Law, Tim; Wright, Steven A.; Smedley-Stevenson, Richard; Fahmy, Suhaib A.; Mudalige, Gihan R.; Jarvis, Stephen A.

We present StressBench, a network benchmarking framework written for testing MPI operations and file I/O concurrently. It is designed specifically to execute MPI communication and file access patterns that are representative of real-world scientific applications. Existing tools consider either the worst case congestion with small abstract patterns or peak performance with simplistic patterns. StressBench allows for a richer study of congestion by allowing orchestration of network load scenarios that are representative of those typically seen at HPC centres, something that is difficult to achieve with existing tools. We demonstrate the versatility of the framework from micro benchmarks through to finely controlled congested runs across a cluster. Validation of the results using four proxy application communication schemes within StressBench against parent applications shows a maximum difference of 15%. Using the I/O modeling capabilities of StressBench, we are able to quantify the impact of file I/O on application traffic showing how it can be used in procurement and performance studies.

A block coordinate descent optimizer for classification problems exploiting convexity

CEUR Workshop Proceedings

Patel, Ravi; Trask, Nathaniel A.; Gulian, Mamikon; Cyr, Eric C.

Second-order optimizers hold intriguing potential for deep learning, but suffer from increased cost and sensitivity to the non-convexity of the loss surface as compared to gradient-based approaches. We introduce a coordinate descent method to train deep neural networks for classification tasks that exploits global convexity of the cross-entropy loss in the weights of the linear layer. Our hybrid Newton/Gradient Descent (NGD) method is consistent with the interpretation of hidden layers as providing an adaptive basis and the linear layer as providing an optimal fit of the basis to data. By alternating between a second-order method to find globally optimal parameters for the linear layer and gradient descent to train the hidden layers, we ensure an optimal fit of the adaptive basis to data throughout training. The size of the Hessian in the second-order step scales only with the number of weights in the linear layer and not the depth and width of the hidden layers; furthermore, the approach is applicable to arbitrary hidden layer architectures. Previous work applying this adaptive basis perspective to regression problems demonstrated significant improvements in accuracy at reduced training cost, and this work can be viewed as an extension of that approach to classification problems. We first prove that the resulting Hessian matrix is symmetric semi-definite, and that the Newton step realizes a global minimizer. By studying classification of manufactured two-dimensional point cloud data, we demonstrate both an improvement in validation error and a striking qualitative difference in the basis functions encoded in the hidden layer when trained using NGD. Application to image classification benchmarks for both dense and convolutional architectures reveals improved training accuracy, suggesting gains of second-order methods over gradient descent. A Tensorflow implementation of the algorithm is available at github.com/rgp62/.
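
A compact sketch of the alternation on a toy problem, using a binary logistic loss for brevity (the paper treats multiclass cross-entropy); all sizes and learning rates are illustrative:

```python
# Sketch of the hybrid Newton/Gradient Descent (NGD) alternation: Newton
# steps solve the convex subproblem for the linear output layer, gradient
# steps update the hidden layer that supplies the adaptive basis.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(float)        # toy labels

W1 = rng.standard_normal((2, 16)) * 0.5          # hidden layer
w2 = np.zeros(16)                                 # linear layer

def hidden(X, W1):
    return np.tanh(X @ W1)                        # adaptive basis

for epoch in range(50):
    Phi = hidden(X, W1)
    # --- Newton steps on the linear layer (convex in w2) ---
    for _ in range(5):
        p = 1.0 / (1.0 + np.exp(-Phi @ w2))
        g = Phi.T @ (p - y)
        H = Phi.T @ (Phi * (p * (1 - p))[:, None]) + 1e-6 * np.eye(16)
        w2 -= np.linalg.solve(H, g)
    # --- one gradient step on the hidden layer ---
    p = 1.0 / (1.0 + np.exp(-hidden(X, W1) @ w2))
    dA = np.outer(p - y, w2) * (1 - hidden(X, W1) ** 2)
    W1 -= 1e-2 * (X.T @ dA) / len(X)

print("train acc:", ((hidden(X, W1) @ w2 > 0) == (y > 0.5)).mean())
```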

Asymptotically compatible reproducing kernel collocation and meshfree integration for nonlocal diffusion

SIAM Journal on Numerical Analysis

Leng, Yu; Tian, Xiaochuan; Trask, Nathaniel A.; Foster, John T.

Reproducing kernel (RK) approximations are meshfree methods that construct shape functions from sets of scattered data. We present an asymptotically compatible (AC) RK collocation method for nonlocal diffusion models with Dirichlet boundary conditions. The numerical scheme is shown to be convergent to both nonlocal diffusion and its corresponding local limit as the nonlocal interaction vanishes. The analysis is carried out on a special family of rectilinear Cartesian grids for a linear RK method with designed kernel support. The key idea for the stability of the RK collocation scheme is to compare the collocation scheme with the standard Galerkin scheme, which is stable. In addition, assembling the stiffness matrix of the nonlocal problem requires costly computational resources because high-order Gaussian quadrature is necessary to evaluate the integral. We thus provide a remedy by introducing a quasi-discrete nonlocal diffusion operator for which no numerical quadrature is needed after applying the RK collocation scheme. The quasi-discrete nonlocal diffusion operator combined with RK collocation is shown to be convergent to the correct local diffusion problem by taking the limits of nonlocal interaction and spatial resolution simultaneously. The theoretical results are then validated with numerical experiments. We additionally illustrate a connection between the proposed technique and an existing optimization-based approach built on generalized moving least squares.
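
For context, a linear RK shape function attaches a polynomial correction to a compactly supported kernel φ_a so that constant and linear fields are reproduced exactly (standard RKPM form, our notation):

```latex
\Psi_I(x) = H^{\top}(0)\, M^{-1}(x)\, H(x - x_I)\, \phi_a(x - x_I),
\qquad
M(x) = \sum_I H(x - x_I)\, H^{\top}(x - x_I)\, \phi_a(x - x_I),
```

with H(x) = (1, x₁, x₂)ᵀ the linear polynomial basis in 2D and a the kernel support size.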

Scalable3-BO: Big data meets HPC - A scalable asynchronous parallel high-dimensional Bayesian optimization framework on supercomputers

Proceedings of the ASME Design Engineering Technical Conference

Foulk, James W.

Bayesian optimization (BO) is a flexible and powerful framework that is suitable for computationally expensive simulation-based applications and guarantees statistical convergence to the global optimum. While remaining one of the most popular optimization methods, its capability is hindered by the size of the data, the dimensionality of the considered problem, and the sequential nature of the optimization. These scalability issues are intertwined with each other and must be tackled simultaneously. In this work, we propose the Scalable3-BO framework, which employs a sparse Gaussian process (GP) as the underlying surrogate model to cope with big data and is equipped with a random embedding to efficiently optimize high-dimensional problems with low effective dimensionality. The Scalable3-BO framework is further equipped with an asynchronous parallelization feature, which fully exploits the computational resources of an HPC platform within a computational budget. As a result, the proposed Scalable3-BO framework is scalable in three independent respects: with respect to data size, dimensionality, and computational resources on HPC platforms. The goal of this work is to push the frontiers of BO beyond its well-known scalability issues and minimize the wall-clock waiting time for optimizing high-dimensional, computationally expensive applications. We demonstrate the capability of Scalable3-BO with 1 million data points, 10,000-dimensional problems, and 20 concurrent workers in an HPC environment.
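
The random-embedding device admits a very small sketch: draw a fixed random matrix once, search in the low-dimensional space, and evaluate the expensive objective at the mapped point. Everything below (names, the random-search stand-in for the acquisition step) is illustrative:

```python
# Random-embedding trick for high-dimensional BO: search in a
# low-dimensional y-space and evaluate the expensive objective at
# x = clip(A @ y). The BO machinery itself is elided; random search
# stands in for the acquisition step.
import numpy as np

D, d = 10_000, 10                     # ambient and effective dimensions
rng = np.random.default_rng(0)
A = rng.standard_normal((D, d))       # fixed random embedding

def expensive_objective(x):
    # Placeholder: the true function depends on few coordinates.
    return np.sum((x[:5] - 0.2) ** 2)

def embed(y):
    return np.clip(A @ y, -1.0, 1.0)  # map back into the box [-1, 1]^D

best = np.inf
for _ in range(200):                  # stand-in for the BO loop
    y = rng.uniform(-1, 1, size=d) / np.sqrt(d)
    best = min(best, expensive_objective(embed(y)))
print("best value found:", best)
```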

Train Like a (Var)Pro: Efficient Training of Neural Networks with Variable Projection

SIAM Journal on Mathematics of Data Science

Newman, Elizabeth; Ruthotto, Lars; Hart, Joseph L.; Van Bloemen Waanders, Bart

Deep neural networks (DNNs) have achieved state-of-the-art performance across a variety of traditional machine learning tasks, e.g., speech recognition, image classification, and segmentation. The ability of DNNs to efficiently approximate high-dimensional functions has also motivated their use in scientific applications, e.g., to solve partial differential equations and to generate surrogate models. In this paper, we consider the supervised training of DNNs, which arises in many of the above applications. We focus on the central problem of optimizing the weights of the given DNN such that it accurately approximates the relation between observed input and target data. Devising effective solvers for this optimization problem is notoriously challenging due to the large number of weights, nonconvexity, data sparsity, and nontrivial choice of hyperparameters. To solve the optimization problem more efficiently, we propose the use of variable projection (VarPro), a method originally designed for separable nonlinear least-squares problems. Our main contribution is the Gauss–Newton VarPro method (GNvpro) that extends the reach of the VarPro idea to nonquadratic objective functions, most notably cross-entropy loss functions arising in classification. These extensions make GNvpro applicable to all training problems that involve a DNN whose last layer is an affine mapping, which is common in many state-of-the-art architectures. In our four numerical experiments from surrogate modeling, segmentation, and classification, GNvpro solves the optimization problem more efficiently than commonly used stochastic gradient descent (SGD) schemes. Also, GNvpro finds solutions that generalize well, and in all but one example better than well-tuned SGD methods, to unseen data points.
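
Classical VarPro for a separable least-squares problem eliminates the linear weights in closed form, and GNvpro extends this idea past the quadratic case (standard statement, our notation):

```latex
\min_{\theta, w}\ \tfrac{1}{2} \lVert F(\theta) w - y \rVert^2
\quad\Longrightarrow\quad
w^{\ast}(\theta) = F(\theta)^{\dagger} y,
\qquad
\min_{\theta}\ \tfrac{1}{2} \bigl\lVert F(\theta) F(\theta)^{\dagger} y - y \bigr\rVert^2 .
```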

Deep Conservation: A Latent-Dynamics Model for Exact Satisfaction of Physical Conservation Laws

35th AAAI Conference on Artificial Intelligence, AAAI 2021

Lee, Kookjin L.; Carlberg, Kevin T.

This work proposes an approach for latent-dynamics learning that exactly enforces physical conservation laws. The method comprises two steps. First, the method computes a low-dimensional embedding of the high-dimensional dynamical-system state using deep convolutional autoencoders. This defines a low-dimensional nonlinear manifold on which the state is subsequently enforced to evolve. Second, the method defines a latent-dynamics model that associates with the solution to a constrained optimization problem. Here, the objective function is defined as the sum of squares of conservation-law violations over control volumes within a finite-volume discretization of the problem; nonlinear equality constraints explicitly enforce conservation over prescribed subdomains of the problem. Under modest conditions, the resulting dynamics model guarantees that the time-evolution of the latent state exactly satisfies conservation laws over the prescribed subdomains.
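
Schematically (our notation, a hedged paraphrase of the construction above): with g the decoder from latent state ẑ to the full state and r_i the conservation-law violation on control volume i, the latent velocity solves

```latex
\dot{\hat z}(t) \in \arg\min_{v}\
\sum_{i} \bigl\| r_i\bigl(g(\hat z),\, \nabla g(\hat z)\, v\bigr) \bigr\|^2
\quad\text{s.t.}\quad
r_k\bigl(g(\hat z),\, \nabla g(\hat z)\, v\bigr) = 0
\ \text{ on prescribed subdomains } k .
```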

INCREMENTAL INTERVAL ASSIGNMENT BY INTEGER LINEAR ALGEBRA

Proceedings of the 29th International Meshing Roundtable, IMR 2021

Mitchell, Scott A.

Interval Assignment (IA) is the problem of selecting the number of mesh edges (intervals) for each curve for conforming quad and hex meshing. The vector of intervals x is fundamentally integer-valued, yet many approaches perform floating-point optimization and convert a floating-point solution into an integer solution. We avoid such steps: we start integer and stay integer. Incremental Interval Assignment (IIA) uses integer linear algebra (Hermite normal form) to find an initial solution to the matrix equation Ax = b satisfying the meshing constraints. Solving for the reduced row echelon form provides integer vectors spanning the nullspace of A. We add vectors from the nullspace to improve the initial solution. Compared to floating-point optimization approaches, IIA is faster and always produces an integer solution. The potential drawback is that there is no theoretical guarantee that the solution is optimal, but in practice we achieve solutions close to the user goals. The software is freely available.
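
The improvement step can be sketched with exact integer linear algebra in sympy; the toy below is not the paper's software, only an illustration of adding integer nullspace vectors to steer a feasible integer solution toward goal interval counts:

```python
# Toy sketch of the IIA improvement step: given an integer solution x0 of
# A x = b, add integer multiples of integer nullspace vectors of A to move
# x toward user goals without leaving the solution set.
from sympy import Matrix, lcm

A = Matrix([[1, -1, 0], [0, 1, -1]])    # toy meshing constraints
x0 = Matrix([2, 2, 2])                   # a feasible integer solution
goals = Matrix([4, 5, 4])                # user-desired interval counts

null_int = []
for v in A.nullspace():                  # exact rational basis vectors
    scale = lcm([term.q for term in v])  # clear denominators -> integers
    null_int.append((v * scale).applyfunc(int))

x = x0
for n in null_int:                       # greedy integer line search
    best_t = min(range(-10, 11),
                 key=lambda t: sum(abs(e) for e in (x + t * n - goals)))
    x = x + best_t * n
print(list(x))                           # stays integer, still solves A x = b
```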

HIERARCHICAL PARALLELISM FOR TRANSIENT SOLID MECHANICS SIMULATIONS

World Congress in Computational Mechanics and ECCOMAS Congress

Littlewood, David J.; Jones, Reese E.; Foulk, James W.; Plews, Julia A.; Hetmaniuk, Ulrich; Lifflander, Jonathan J.

Software development for high-performance scientific computing continues to evolve in response to increased parallelism and the advent of on-node accelerators, in particular GPUs. While these hardware advancements have the potential to significantly reduce turnaround times, they also present implementation and design challenges for engineering codes. We investigate the use of two strategies to mitigate these challenges: the Kokkos library for performance portability across disparate architectures, and the DARMA/vt library for asynchronous many-task scheduling. We investigate the application of Kokkos within the NimbleSM finite element code and the LAMÉ constitutive model library. We explore the performance of DARMA/vt applied to NimbleSM contact mechanics algorithms. Software engineering strategies are discussed, followed by performance analyses of relevant solid mechanics simulations which demonstrate the promise of Kokkos and DARMA/vt for accelerated engineering simulators.

2D Microstructure Reconstruction for SEM via Non-local Patch-Based Image Inpainting

Minerals, Metals and Materials Series

Foulk, James W.; Tran, Hoang

Microstructure reconstruction is a long-standing problem in experimental and computational materials science that numerous approaches have attempted to solve. However, the majority of approaches treat microstructure as discrete phases, which reduces the quality of the resulting microstructures and limits their use to the computational level of fidelity rather than the experimental level of fidelity. In this work, we applied our previously proposed approach [41] to generate synthetic microstructure images at the experimental level of fidelity for the Ultrahigh Carbon Steel Database (UHCSDB) [13].
