Publications

Results 951–1000 of 9,998

Search results

Jump to search filters

USING DEEP NEURAL NETWORKS TO PREDICT MATERIAL TYPES IN CONDITIONAL POINT SAMPLING APPLIED TO MARKOVIAN MIXTURE MODELS

Proceedings of the International Conference on Mathematics and Computational Methods Applied to Nuclear Science and Engineering, M and C 2021

Davis, Warren L.; Olson, Aaron; Popoola, Gabriel A.; Bolintineanu, Dan S.; Rodgers, Theron M.; Vu, Emily

Conditional Point Sampling (CoPS) is a recently developed stochastic media transport algorithm that has demonstrated a high degree of accuracy in 1-D and 3-D calculations for binary mixtures with Markovian mixing statistics. In theory, CoPS has the capacity to be accurate for material structures beyond just those with Markovian statistics. However, realizing this capability will require development of conditional probability functions (CPFs) that are based, not on explicit Markovian properties, but rather on latent properties extracted from material structures. Here, we describe a first step towards extracting these properties by developing CPFs using deep neural networks (DNNs). Our new approach lays the groundwork for enabling accurate transport on many classes of stochastic media. We train DNNs on ternary stochastic media with Markovian mixing statistics and compare their CPF predictions to those made by existing CoPS CPFs, which are derived based on Markovian mixing properties. We find that the DNN CPF predictions usually outperform the existing approximate CPF predictions, but with wider variance. In addition, even when trained on only one material volume realization, the DNN CPFs are shown to make accurate predictions on other realizations that have the same internal mixing behavior. We show that it is possible to form a useful CoPS CPF by using a DNN to extract correlation properties from realizations of stochastically mixed media, thus establishing a foundation for creating CPFs for mixtures other than those with Markovian mixing, where it may not be possible to derive an accurate analytical CPF.

More Details

Photothermal alternative to device fabrication using atomic precision advanced manufacturing techniques

Journal of Micro/Nanopatterning, Materials and Metrology

Katzenmeyer, Aaron M.; Dmitrovic, Sanja; Baczewski, Andrew D.; Campbell, Quinn; Bussmann, Ezra; Lu, Tzu M.; Anderson, Evan M.; Schmucker, Scott W.; Ivie, Jeffrey A.; Campbell, Deanna M.; Ward, Daniel R.; Scrymgeour, David; Wang, George T.; Misra, Shashank

The attachment of dopant precursor molecules to depassivated areas of hydrogen-terminated silicon templated with a scanning tunneling microscope (STM) has been used to create electronic devices with subnanometer precision, typically for quantum physics experiments. This process, which we call atomic precision advanced manufacturing (APAM), dopes silicon beyond the solid-solubility limit and produces electrical and optical characteristics that may also be useful for microelectronic and plasmonic applications. However, scanned probe lithography lacks the throughput required to develop more sophisticated applications. Here, we demonstrate and characterize an APAM device workflow where scanned probe lithography of the atomic layer resist has been replaced by photolithography. An ultraviolet laser is shown to locally and controllably heat silicon above the temperature required for hydrogen depassivation on a nanosecond timescale, a process resistant to under- and overexposure. STM images indicate a narrow range of energy density where the surface is both depassivated and undamaged. Modeling that accounts for photothermal heating and the subsequent hydrogen desorption kinetics suggests that the silicon surface temperatures reached in our patterning process exceed those required for hydrogen removal in temperature-programmed desorption experiments. A phosphorus-doped van der Pauw structure made by sequentially photodepassivating a predefined area and then exposing it to phosphine is found to have a similar mobility and higher carrier density compared with devices patterned by STM. Lastly, it is also demonstrated that photodepassivation and precursor exposure steps may be performed concomitantly, a potential route to enabling APAM outside of ultrahigh vacuum.

More Details

ALAMO: Autonomous lightweight allocation, management, and optimization

Communications in Computer and Information Science

Brightwell, Ronald B.; Ferreira, Kurt; Grant, Ryan; Levy, Scott L.N.; Lofstead, Gerald F.; Olivier, Stephen L.; Foulk, James W.; Younge, Andrew J.; Gentile, Ann C.; Foulk, James W.

Several recent workshops conducted by the DOE Advanced Scientific Computing Research program have established the fact that the complexity of developing applications and executing them on high-performance computing (HPC) systems is rising at a rate which will make it nearly impossible to continue to achieve higher levels of performance and scalability. Absent an alternative approach to managing this ever-growing complexity, HPC systems will become increasingly difficult to use. A more holistic approach to designing and developing applications and managing system resources is required. This paper outlines a research strategy for managing the increasing the complexity by providing the programming environment, software stack, and hardware capabilities needed for autonomous resource management of HPC systems. Developing portable applications for a variety of HPC systems of varying scale requires a paradigm shift from the current approach, where applications are painstakingly mapped to individual machine resources, to an approach where machine resources are automatically mapped and optimized to applications as they execute. Achieving such automated resource management for HPC systems is a daunting challenge that requires significant sustained investment in exploring new approaches and novel capabilities in software and hardware that span the spectrum from programming systems to device-level mechanisms. This paper provides an overview of the functionality needed to enable autonomous resource management and optimization and describes the components currently being explored at Sandia National Laboratories to help support this capability.

More Details

Risk-averse control of fractional diffusion with uncertain exponent

SIAM Journal on Control and Optimization

Kouri, Drew P.; Antil, Harbir; Pfefferer, Johannes

In this paper, we introduce and analyze a new class of optimal control problems constrained by elliptic equations with uncertain fractional exponents. We utilize risk measures to formulate the resulting optimization problem. We develop a functional analytic framework, study the existence of solution, and rigorously derive the first-order optimality conditions. Additionally, we employ a sample-based approximation for the uncertain exponent and the finite element method to discretize in space. We prove the rate of convergence for the optimal risk neutral controls when using quadrature approximation for the uncertain exponent and conclude with illustrative examples.

More Details

Using Monitoring Data to Improve HPC Performance via Network-Data-Driven Allocation

2021 IEEE High Performance Extreme Computing Conference, HPEC 2021

Zhang, Yijia; Aksar, Burak; Aaziz, Omar R.; Schwaller, Benjamin; Brandt, James M.; Leung, Vitus J.; Egele, Manuel; Coskun, Ayse K.

On high-performance computing (HPC) systems, job allocation strategies control the placement of a job among available nodes. As the placement changes a job's communication performance, allocation can significantly affects execution times of many HPC applications. Existing allocation strategies typically make decisions based on resource limit, network topology, communication patterns, etc. However, system network performance at runtime is seldom consulted in allocation, even though it significantly affects job execution times.In this work, we demonstrate using monitoring data to improve HPC systems' performance by proposing a NetworkData-Driven (NeDD) job allocation framework, which monitors the network performance of an HPC system at runtime and allocates resources based on both network performance and job characteristics. NeDD characterizes system network performance by collecting the network traffic statistics on each router link, and it characterizes a job's sensitivity to network congestion by collecting Message Passing Interface (MPI) statistics. During allocation, NeDD pairs network-sensitive (network-insensitive) jobs with nodes whose parent routers have low (high) network traffic. Through experiments on a large HPC system, we demonstrate that NeDD reduces the execution time of parallel applications by 11% on average and up to 34%.

More Details

Using Computation Effectively for Scalable Poisson Tensor Factorization: Comparing Methods beyond Computational Efficiency

2021 IEEE High Performance Extreme Computing Conference, HPEC 2021

Myers, Jeremy M.; Dunlavy, Daniel M.

Poisson Tensor Factorization (PTF) is an important data analysis method for analyzing patterns and relationships in multiway count data. In this work, we consider several algorithms for computing a low-rank PTF of tensors with sparse count data values via maximum likelihood estimation. Such an approach reduces to solving a nonlinear, non-convex optimization problem, which can leverage considerable parallel computation due to the structure of the problem. However, since the maximum likelihood estimator corresponds to the global minimizer of this optimization problem, it is important to consider how effective methods are at both leveraging this inherent parallelism as well as computing a good approximation to the global minimizer. In this work we present comparisons of multiple methods for PTF that illustrate the tradeoffs in computational efficiency and accurately computing the maximum likelihood estimator. We present results using synthetic and real-world data tensors to demonstrate some of the challenges when choosing a method for a given tensor.

More Details

Union: A Unified HW-SW Co-Design Ecosystem in MLIR for Evaluating Tensor Operations on Spatial Accelerators

Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT

Jeong, Geonhwa; Kestor, Gokcen; Chatarasi, Prasanth; Parashar, Angshuman; Tsai, Po A.; Rajamanickam, Sivasankaran; Gioiosa, Roberto; Krishna, Tushar

To meet the extreme compute demands for deep learning across commercial and scientific applications, dataflow accelerators are becoming increasingly popular. While these “domain-specific” accelerators are not fully programmable like CPUs and GPUs, they retain varying levels of flexibility with respect to data orchestration, i.e., dataflow and tiling optimizations to enhance efficiency. There are several challenges when designing new algorithms and mapping approaches to execute the algorithms for a target problem on new hardware. Previous works have addressed these challenges individually. To address this challenge as a whole, in this work, we present a HW-SW codesign ecosystem for spatial accelerators called Union within the popular MLIR compiler infrastructure. Our framework allows exploring different algorithms and their mappings on several accelerator cost models. Union also includes a plug-and-play library of accelerator cost models and mappers which can easily be extended. The algorithms and accelerator cost models are connected via a novel mapping abstraction that captures the map space of spatial accelerators which can be systematically pruned based on constraints from the hardware, workload, and mapper. We demonstrate the value of Union for the community with several case studies which examine offloading different tensor operations (CONV/GEMM/Tensor Contraction) on diverse accelerator architectures using different mapping schemes.

More Details

Low-Communication Asynchronous Distributed Generalized Canonical Polyadic Tensor Decomposition

2021 IEEE High Performance Extreme Computing Conference Hpec 2021

Lewis, Cannada; Phipps, Eric T.

In this work, we show that reduced communication algorithms for distributed stochastic gradient descent improve the time per epoch and strong scaling for the Generalized Canonical Polyadic (GCP) tensor decomposition, but with a cost, achieving convergence becomes more difficult. The implementation, based on MPI, shows that while one-sided algorithms offer a path to asynchronous execution, the performance benefits of optimized allreduce are difficult to best.

More Details

Understanding the Effects of DRAM Correctable Error Logging at Scale

Proceedings - IEEE International Conference on Cluster Computing, ICCC

Ferreira, Kurt; Levy, Scott L.N.; Kuhns, Victor; Debardeleben, Nathan; Blanchard, Sean

Fault tolerance poses a major challenge for future large-scale systems. Current research on fault tolerance has been principally focused on mitigating the impact of uncorrectable errors: errors that corrupt the state of the machine and require a restart from a known good state. However, correctable errors occur much more frequently than uncorrectable errors and may be even more common on future systems. Although an application can safely continue to execute when correctable errors occur, recovery from a correctable error requires the error to be corrected and, in most cases, information about its occurrence to be logged. The potential performance impact of these recovery activities has not been extensively studied in HPC. In this paper, we use simulation to examine the relationship between recovery from correctable errors and application performance for several important extreme-scale workloads. Our paper contains what is, to the best of our knowledge, the first detailed analysis of the impact of correctable errors on application performance. Our study shows that correctable errors can have significant impact on application performance for future systems. We also find that although the focus on correctable errors is focused on reducing failure rates, reducing the time required to log individual errors may have a greater impact on overheads at scale. Finally, this study outlines the error frequency and durations targets to keep correctable overheads similar to that of today's systems. This paper provides critical analysis and insight into the overheads of correctable errors and provides practical advice to systems administrators and hardware designers in an effort to fine-tune performance to application and system characteristics.

More Details

Randomized sketching algorithms for low-memory dynamic optimization

SIAM Journal on Optimization

Kouri, Drew P.; Muthukumar, Ramchandran; Udell, Madeleine

This paper develops a novel limited-memory method to solve dynamic optimization problems. The memory requirements for such problems often present a major obstacle, particularly for problems with PDE constraints such as optimal flow control, full waveform inversion, and optical tomography. In these problems, PDE constraints uniquely determine the state of a physical system for a given control; the goal is to find the value of the control that minimizes an objective. While the control is often low dimensional, the state is typically more expensive to store. This paper suggests using randomized matrix approximation to compress the state as it is generated and shows how to use the compressed state to reliably solve the original dynamic optimization problem. Concretely, the compressed state is used to compute approximate gradients and to apply the Hessian to vectors. The approximation error in these quantities is controlled by the target rank of the sketch. This approximate first- and second-order information can readily be used in any optimization algorithm. As an example, we develop a sketched trust-region method that adaptively chooses the target rank using a posteriori error information and provably converges to a stationary point of the original problem. Numerical experiments with the sketched trust-region method show promising performance on challenging problems such as the optimal control of an advection-reaction-diffusion equation and the optimal control of fluid flow past a cylinder.

More Details

Error estimates for the optimal control of a parabolic fractional pde

SIAM Journal on Numerical Analysis

Glusa, Christian; Otarola, Enrique

We consider the integral definition of the fractional Laplacian and analyze a linearquadratic optimal control problem for the so-called fractional heat equation; control constraints are also considered. We derive existence and uniqueness results, first order optimality conditions, and regularity estimates for the optimal variables. To discretize the state equation we propose a fully discrete scheme that relies on an implicit finite difference discretization in time combined with a piecewise linear finite element discretization in space. We derive stability results and a novel L2(0, T;L2(Ω)) a priori error estimate. On the basis of the aforementioned solution technique, we propose a fully discrete scheme for our optimal control problem that discretizes the control variable with piecewise constant functions, and we derive a priori error estimates for it. We illustrate the theory with one- and two-dimensional numerical experiments.

More Details

CHARACTERIZING HUMAN PERFORMANCE: DETECTING TARGETS AT HIGH FALSE ALARM RATES

Proceedings of the 2021 International Topical Meeting on Probabilistic Safety Assessment and Analysis, PSA 2021

Speed, Ann E.; Wheeler, Jason; Russell, John; Oppel, Fred; Sanchez, Danielle N.; Silva, Austin R.; Chavez, Anna

The prevalence effect is the observation that, in visual search tasks as the signal (target) to noise (non-target) ratio becomes smaller, humans are more likely to miss the target when it does occur. Studied extensively in the basic literature [e.g., 1, 2], this effect has implications for real-world settings such as security guards monitoring physical facilities for attacks. Importantly, what seems to drive the effect is the development of a response bias based on learned sensitivity to the statistical likelihood of a target [e.g., 3-5]. This paper presents results from two experiments aimed at understanding how the target prevalence impacts the ability for individuals to detect a target on the 1,000th trial of a series of 1000 trials. The first experiment employed the traditional prevalence effect paradigm. This paradigm involves search for a perfect capital letter T amidst imperfect Ts. In a between-subjects design, our subjects experienced target prevalence rates of 50/50, 1/10, 1/100, or 1/1000. In all conditions, the final trial was always a target. The second (ongoing) experiment replicates this design using a notional physical facility in a mod/sim environment. This simulation enables triggering different intrusion detection sensors by simulated characters and events (e.g., people, animals, weather). In this experiment, subjects viewed 1000 “alarm” events and were asked to characterize each as either a nuisance alarm (e.g., set off by an animal) or an attack. As with the basic visual search study, the final trial was always an attack.

More Details

Asymptotically compatible reproducing kernel collocation and meshfree integration for nonlocal diffusion

SIAM Journal on Numerical Analysis

Leng, Yu; Tian, Xiaochuan; Trask, Nathaniel A.; Foster, John T.

Reproducing kernel (RK) approximations are meshfree methods that construct shape functions from sets of scattered data. We present an asymptotically compatible (AC) RK collocation method for nonlocal diffusion models with Dirichlet boundary condition. The numerical scheme is shown to be convergent to both nonlocal diffusion and its corresponding local limit as nonlocal interaction vanishes. The analysis is carried out on a special family of rectilinear Cartesian grids for a linear RK method with designed kernel support. The key idea for the stability of the RK collocation scheme is to compare the collocation scheme with the standard Galerkin scheme, which is stable. In addition, assembling the stiffness matrix of the nonlocal problem requires costly computational resources because high-order Gaussian quadrature is necessary to evaluate the integral. We thus provide a remedy to the problem by introducing a quasi-discrete nonlocal diffusion operator for which no numerical quadrature is further needed after applying the RK collocation scheme. The quasi-discrete nonlocal diffusion operator combined with RK collocation is shown to be convergent to the correct local diffusion problem by taking the limits of nonlocal interaction and spatial resolution simultaneously. The theoretical results are then validated with numerical experiments. We additionally illustrate a connection between the proposed technique and an existing optimization based approach based on generalized moving least squares.

More Details

Gate Set Tomography

Quantum

Nielsen, Erik N.; Gamble, John K.; Rudinger, Kenneth M.; Scholten, Travis; Young, Kevin; Blume-Kohout, Robin

Gate set tomography (GST) is a protocol for detailed, predictive characterization of logic operations (gates) on quantum computing processors. Early versions of GST emerged around 2012-13, and since then it has been refined, demonstrated, and used in a large number of experiments. This paper presents the foundations of GST in comprehensive detail. The most important feature of GST, compared to older state and process tomography protocols, is that it is calibration-free. GST does not rely on pre-calibrated state preparations and measurements. Instead, it characterizes all the operations in a gate set simultaneously and self-consistently, relative to each other. Long sequence GST can estimate gates with very high precision and efficiency, achieving Heisenberg scaling in regimes of practical interest. In this paper, we cover GST’s intellectual history, the techniques and experiments used to achieve its intended purpose, data analysis, gauge freedom and fixing, error bars, and the interpretation of gauge-fixed estimates of gate sets. Our focus is fundamental mathematical aspects of GST, rather than implementation details, but we touch on some of the foundational algorithmic tricks used in the pyGSTi implementation.

More Details

An asymptotically compatible approach for Neumann-type boundary condition on nonlocal problems

ESAIM: Mathematical Modelling and Numerical Analysis

You, Huaiqian; Lu, Xin Y.; Trask, Nathaniel A.; Yu, Yue

In this paper we consider 2D nonlocal diffusion models with a finite nonlocal horizon parameter δ characterizing the range of nonlocal interactions, and consider the treatment of Neumann-like boundary conditions that have proven challenging for discretizations of nonlocal models. We propose a new generalization of classical local Neumann conditions by converting the local flux to a correction term in the nonlocal model, which provides an estimate for the nonlocal interactions of each point with points outside the domain. While existing 2D nonlocal flux boundary conditions have been shown to exhibit at most first order convergence to the local counter part as δ → 0, the proposed Neumann-type boundary formulation recovers the local case as O(δ2) in the L∞(ω) norm, which is optimal considering the O(δ2) convergence of the nonlocal equation to its local limit away from the boundary. We analyze the application of this new boundary treatment to the nonlocal diffusion problem, and present conditions under which the solution of the nonlocal boundary value problem converges to the solution of the corresponding local Neumann problem as the horizon is reduced. To demonstrate the applicability of this nonlocal flux boundary condition to more complicated scenarios, we extend the approach to less regular domains, numerically verifying that we preserve second-order convergence for non-convex domains with corners. Based on the new formulation for nonlocal boundary condition, we develop an asymptotically compatible meshfree discretization, obtaining a solution to the nonlocal diffusion equation with mixed boundary conditions that converges with O(δ2) convergence.

More Details

Negative Perceptions About the Applicability of Source-to-Source Compilers in HPC: A Literature Review

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Milewicz, Reed M.; Pirkelbauer, Peter; Soundararajan, Prema; Ahmed, Hadia; Skjellum, Tony

A source-to-source compiler is a type of translator that accepts the source code of a program written in a programming language as its input and produces an equivalent source code in the same or different programming language. S2S techniques are commonly used to enable fluent translation between high-level programming languages, to perform large-scale refactoring operations, and to facilitate instrumentation for dynamic analysis. Negative perceptions about S2S’s applicability in High Performance Computing (HPC) are studied and evaluated here. This is a first study that brings to light reasons why scientists do not use source-to-source techniques for HPC. The primary audience for this paper are those considering S2S technology in their HPC application work.

More Details

Spiking Neural Streaming Binary Arithmetic

Proceedings - 2021 International Conference on Rebooting Computing, ICRC 2021

Aimone, James B.; Hill, Aaron; Severa, William M.; Vineyard, Craig M.

Boolean functions and binary arithmetic operations are central to standard computing paradigms. Accordingly, many advances in computing have focused upon how to make these operations more efficient as well as exploring what they can compute. To best leverage the advantages of novel computing paradigms it is important to consider what unique computing approaches they offer. However, for any special-purpose co-processor, Boolean functions and binary arithmetic operations are useful for, among other things, avoiding unnecessary I/O on-and-off the co-processor by pre- and post-processing data on-device. This is especially true for spiking neuromorphic architectures where these basic operations are not fundamental low-level operations. Instead, these functions require specific implementation. Here we discuss the implications of an advantageous streaming binary encoding method as well as a handful of circuits designed to exactly compute elementary Boolean and binary operations.

More Details

ADELUS: A Performance-Portable Dense LU Solver for Distributed-Memory Hardware-Accelerated Systems

Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics

Dang, Vinh Q.; Kotulski, Joseph D.; Rajamanickam, Sivasankaran

Solving dense systems of linear equations is essential in applications encountered in physics, mathematics, and engineering. This paper describes our current efforts toward the development of the ADELUS package for current and next generation distributed, accelerator-based, high-performance computing platforms. The package solves dense linear systems using partial pivoting LU factorization on distributed-memory systems with CPUs/GPUs. The matrix is block-mapped onto distributed memory on CPUs/GPUs and is solved as if it was torus-wrapped for an optimal balance of computation and communication. A permutation operation is performed to restore the results so the torus-wrap distribution is transparent to the user. This package targets performance portability by leveraging the abstractions provided in the Kokkos and Kokkos Kernels libraries. Comparison of the performance gains versus the state-of-the-art SLATE and DPLASMA GESV functionalities on the Summit supercomputer are provided. Preliminary performance results from large-scale electromagnetic simulations using ADELUS are also presented. The solver achieves 7.7 Petaflops on 7600 GPUs of the Sierra supercomputer translating to 16.9% efficiency.

More Details

Integrated fluid and materials modeling of environmental barrier coatings

AIAA Scitech 2021 Forum

Newsome, David; Waxman, Rae; Hoffie, Andreas; Silling, Stewart

Environmental Barrier Coatings (EBC) protect ceramic matrix composites from exposure to high temperature moisture present in turbine operation through their dense top coats. However, moisture is able to diffuse and oxidize the Si bond coat to form the Thermally Grown Oxide (TGO), a layer of SiO2 where the incorporation of O causes swelling and stress. At sufficient TGO-based swelling, the EBC will fail due to increased damage such as delamination. A multiscale simulation framework has been developed to link operating conditions of a high-performance turbine to the failure modes of the EBC. Computational fluid dynamics (CFD) simulations of the E3 turbine were performed and compared to prior literature data to demonstrate the fidelity of the Loci/CHEM software to determine the flow conditions on the turbine blade surface. Boundary condition data of pressure and heat flux were then determined with the CFD simulations, providing the temperature at the bond coat. Peridynamics was used to model the microscale TGO growth. A swelling model that links moisture concentration to strain at the TGO due to the volume increase from oxidation was demonstrated, coupling moisture transport to localized strain and directly observing TGO growth and the corresponding damage. This framework is generalized and can be adapted to a range of EBC microstructures and operating conditions.

More Details

AC-Optimal Power Flow Solutions with Security Constraints from Deep Neural Network Models

Computer Aided Chemical Engineering

Kilwein, Zachary; Boukouvala, Fani; Laird, Carl; Castillo, Anya; Blakely, Logan; Eydenberg, Michael S.; Jalving, Jordan H.; Batsch-Smith, Lisa

In power grid operation, optimal power flow (OPF) problems are solved several times per day to find economically optimal generator setpoints that balance given load demands. Ideally, we seek an optimal solution that is also “N-1 secure”, meaning the system can absorb contingency events such as transmission line or generator failure without loss of service. Current practice is to solve the OPF problem and then check a subset of contingencies against heuristic values, resulting in, at best, suboptimal solutions. Unfortunately, online solution of the OPF problem including the full N-1 contingencies (i.e., two-stage stochastic programming formulation) is intractable for even modest sized electrical grids. To address this challenge, this work presents an efficient method to embed N-1 security constraints into the solution of the OPF by using Neural Network (NN) models to represent the security boundary. Our approach introduces a novel sampling technique, as well as a tuneable parameter to allow operators to balance the conservativeness of the security model within the OPF problem. Our results show that we are able to solve contingency formulations of larger size grids than reported in literature using non-linear programming (NLP) formulations with embedded NN models to local optimality. Solutions found with the NN constraint have marginally increased computational time but are more secure to contingency events.

More Details

Polynomial preconditioned arnoldi with stability control

SIAM Journal on Scientific Computing

Embree, Mark; Loe, Jennifer A.; Morgan, Ronald

Polynomial preconditioning can improve the convergence of the Arnoldi method for computing eigenvalues. Such preconditioning significantly reduces the cost of orthogonalization; for difficult problems, it can also reduce the number of matrix-vector products. Parallel computations can particularly benefit from the reduction of communication-intensive operations. The GMRES algorithm provides a simple and effective way of generating the preconditioning polynomial. For some problems high degree polynomials are especially effective, but they can lead to stability problems that must be mitigated. A two-level "double polynomial preconditioning"strategy provides an effective way to generate high-degree preconditioners.

More Details

DPM: A Novel Training Method for Physics-Informed Neural Networks in Extrapolation

35th AAAI Conference on Artificial Intelligence, AAAI 2021

Kim, Jungeun; Lee, Kookjin L.; Lee, Dongeun; Jhin, Sheo Y.; Park, Noseong

We present a method for learning dynamics of complex physical processes described by time-dependent nonlinear partial differential equations (PDEs). Our particular interest lies in extrapolating solutions in time beyond the range of temporal domain used in training. Our choice for a baseline method is physics-informed neural network (PINN) because the method parameterizes not only the solutions, but also the equations that describe the dynamics of physical processes. We demonstrate that PINN performs poorly on extrapolation tasks in many benchmark problems. To address this, we propose a novel method for better training PINN and demonstrate that our newly enhanced PINNs can accurately extrapolate solutions in time. Our method shows up to 72% smaller errors than existing methods in terms of the standard L2-norm metric.

More Details

Data-driven learning of nonlocal models: From high-fidelity simulations to constitutive laws

CEUR Workshop Proceedings

D'Elia, Marta; Silling, Stewart; You, Huaiqian; Yu, Yue

We show that machine learning can improve the accuracy of simulations of stress waves in one-dimensional composite materials. We propose a data-driven technique to learn nonlocal constitutive laws for stress wave propagation models. The method is an optimization-based technique in which the nonlocal kernel function is approximated via Bernstein polynomials. The kernel, including both its functional form and parameters, is derived so that when used in a nonlocal solver, it generates solutions that closely match high-fidelity data. The optimal kernel therefore acts as a homogenized nonlocal continuum model that accurately reproduces wave motion in a smaller-scale, more detailed model that can include multiple materials. We apply this technique to wave propagation within a heterogeneous bar with a periodic microstructure. Several one-dimensional numerical tests illustrate the accuracy of our algorithm. The optimal kernel is demonstrated to reproduce high-fidelity data for a composite material in applications that are substantially different from the problems used as training data.

More Details

HIERARCHICAL PARALLELISM FOR TRANSIENT SOLID MECHANICS SIMULATIONS

World Congress in Computational Mechanics and ECCOMAS Congress

Littlewood, David J.; Jones, Reese E.; Foulk, James W.; Plews, Julia A.; Hetmaniuk, Ulrich; Lifflander, Jonathan J.

Software development for high-performance scientific computing continues to evolve in response to increased parallelism and the advent of on-node accelerators, in particular GPUs. While these hardware advancements have the potential to significantly reduce turnaround times, they also present implementation and design challenges for engineering codes. We investigate the use of two strategies to mitigate these challenges: the Kokkos library for performance portability across disparate architectures, and the DARMA/vt library for asynchronous many-task scheduling. We investigate the application of Kokkos within the NimbleSM finite element code and the LAMÉ constitutive model library. We explore the performance of DARMA/vt applied to NimbleSM contact mechanics algorithms. Software engineering strategies are discussed, followed by performance analyses of relevant solid mechanics simulations which demonstrate the promise of Kokkos and DARMA/vt for accelerated engineering simulators.

More Details

Reusability First: Toward FAIR Workflows

Proceedings - IEEE International Conference on Cluster Computing, ICCC

Wolf, Matthew; Logan, Jeremy; Mehta, Kshitij; Jacobson, Daniel; Cashman, Mikaela; Walker, Angelica M.; Eisenhauer, Greg; Widener, Patrick; Cliff, Ashley

The FAIR principles of open science (Findable, Accessible, Interoperable, and Reusable) have had transformative effects on modern large-scale computational science. In particular, they have encouraged more open access to and use of data, an important consideration as collaboration among teams of researchers accelerates and the use of workflows by those teams to solve problems increases. How best to apply the FAIR principles to workflows themselves, and software more generally, is not yet well understood. We argue that the software engineering concept of technical debt management provides a useful guide for application of those principles to workflows, and in particular that it implies reusability should be considered as 'first among equals'. Moreover, our approach recognizes a continuum of reusability where we can make explicit and selectable the tradeoffs required in workflows for both their users and developers. To this end, we propose a new abstraction approach for reusable workflows, with demonstrations for both synthetic workloads and real-world computational biology workflows. Through application of novel systems and tools that are based on this abstraction, these experimental workflows are refactored to rightsize the granularity of workflow components to efficiently fill the gap between end-user simplicity and general customizability. Our work makes it easier to selectively reason about and automate the connections between trade-offs across user and developer concerns when exposing degrees of freedom for reuse. Additionally, by exposing fine-grained reusability abstractions we enable performance optimizations, as we demonstrate on both institutional-scale and leadership-class HPC resources.

More Details

Path towards a vertical TFET enabled by atomic precision advanced manufacturing

2021 Silicon Nanoelectronics Workshop, SNW 2021

Lu, Tzu M.; Gao, Xujiao; Anderson, Evan M.; Mendez Granado, Juan P.; Campbell, Deanna M.; Ivie, Jeffrey A.; Schmucker, Scott W.; Grine, Albert; Lu, Ping; Tracy, Lisa A.; Arghavani, Reza; Misra, Shashank

We propose a vertical TFET using atomic precision advanced manufacturing (APAM) to create an abrupt buried n++-doped source. We developed a gate stack that preserves the APAM source to accumulate holes above it, with a goal of band-to-band tunneling (BTBT) perpendicular to the gate – critical for the proposed device. A metal-insulator-semiconductor (MIS) capacitor shows hole accumulation above the APAM source, corroborated by simulation, demonstrating the TFET’s feasibility.

More Details

Deep learning of parameterized equations with applications to uncertainty quantification

International Journal for Uncertainty Quantification

Qin, Tong; Chen, Zhen; Jakeman, John D.; Xiu, Dongbin

We propose a learning algorithm for discovering unknown parameterized dynamical systems by using observational data of the state variables. Our method is built upon and extends the recent work of discovering unknown dynamical systems, in particular those using a deep neural network (DNN). We propose a DNN structure, largely based upon the residual network (ResNet), to not only learn the unknown form of the governing equation but also to take into account the random effect embedded in the system, which is generated by the random parameters. Once the DNN model is successfully constructed, it is able to produce system prediction over a longer term and for arbitrary parameter values. For uncertainty quantification, it allows us to conduct uncertainty analysis by evaluating solution statistics over the parameter space.

More Details

Scalable3-BO: Big data meets HPC - A scalable asynchronous parallel high-dimensional Bayesian optimization framework on supercomputers

Proceedings of the ASME Design Engineering Technical Conference

Foulk, James W.

Bayesian optimization (BO) is a flexible and powerful framework that is suitable for computationally expensive simulation-based applications and guarantees statistical convergence to the global optimum. While remaining as one of the most popular optimization methods, its capability is hindered by the size of data, the dimensionality of the considered problem, and the nature of sequential optimization. These scalability issues are intertwined with each other and must be tackled simultaneously. In this work, we propose the Scalable3-BO framework, which employs sparse GP as the underlying surrogate model to scope with Big Data and is equipped with a random embedding to efficiently optimize high-dimensional problems with low effective dimensionality. The Scalable3-BO framework is further leveraged with asynchronous parallelization feature, which fully exploits the computational resource on HPC within a computational budget. As a result, the proposed Scalable3-BO framework is scalable in three independent perspectives: with respect to data size, dimensionality, and computational resource on HPC. The goal of this work is to push the frontiers of BO beyond its well-known scalability issues and minimize the wall-clock waiting time for optimizing high-dimensional computationally expensive applications. We demonstrate the capability of Scalable3-BO with 1 million data points, 10,000-dimensional problems, with 20 concurrent workers in an HPC environment.

More Details

Effects of EOS and constitutive models on simulating copper shaped charge jets in ALEGRA

2019 15th Hypervelocity Impact Symposium, HVIS 2019

Doney, Robert L.; Niederhaus, John H.J.; Fuller, Timothy J.; Coppinger, Matthew J.

In this work we evaluated the effects that equations of state and strength models have on SCJ development using the Sandia National Laboratories multiphysics shock code, ALEGRA. Results were quantified using a Lagrangian tracer particle following liner collapse, passing through the compression zone, and flowing into the jet tip. We found consistent results among several EOS: 3320, 3331, and 3337. The 3325 EOS generated a measurable low density and hollow region near the jet tip which appears to be reflected in a lower internal energy of the jet. At this time, we cannot tell, experimentally, if such a hollow region exists. The 3337 EOS is recent, well documented [6], and produces results similar to 3320 [3]. The various strength models produced more noticeable differences. In terms of internal energy and temperature, SGL had the largest values followed by PTW, ZA, and finally JC and MTS, which were quite similar to each other. We looked at melt conditions in the SGL and JC models using the 3337 EOS. The SGL model reported a liquid region along the jet axis all the way to the tip-seemingly consistent with experiment-while the JC model does not indicate any phase transition. None of the other yield models indicated melt along the jet axis. For all EOS and strength models, we found similar results for the velocity history of the jet tip as measured against experiment using photon Dopper velocimetry.

More Details

StressBench: A Configurable Full System Network and I/O Benchmark Framework

2021 IEEE High Performance Extreme Computing Conference, HPEC 2021

Chester, Dean G.; Groves, Taylor; Hammond, Simon; Law, Tim; Wright, Steven A.; Smedley-Stevenson, Richard; Fahmy, Suhaib A.; Mudalidge, Gihan R.; Jarvis, Stephen A.

We present StressBench, a network benchmarking framework written for testing MPI operations and file I/O concurrently. It is designed specifically to execute MPI communication and file access patterns that are representative of real-world scientific applications. Existing tools consider either the worst case congestion with small abstract patterns or peak performance with simplistic patterns. StressBench allows for a richer study of congestion by allowing orchestration of network load scenarios that are representative of those typically seen at HPC centres, something that is difficult to achieve with existing tools. We demonstrate the versatility of the framework from micro benchmarks through to finely controlled congested runs across a cluster. Validation of the results using four proxy application communication schemes within StressBench against parent applications shows a maximum difference of 15%. Using the I/O modeling capabilities of StressBench, we are able to quantify the impact of file I/O on application traffic showing how it can be used in procurement and performance studies.

More Details

EMPIRE-PIC: A performance portable unstructured particle-in-cell code

Communications in Computational Physics

Bettencourt, Matthew T.; Brown, Dominic A.S.; Cartwright, Keith L.; Cyr, Eric C.; Glusa, Christian; Lin, Paul T.; Moore, Stan G.; Mcgregor, Duncan A.O.; Pawlowski, Roger; Phillips, Edward; Roberts, Nathan V.; Wright, Steven A.; Maheswaran, Satheesh; Jones, John P.; Jarvis, Stephen A.

In this paper we introduce EMPIRE-PIC, a finite element method particle-in-cell (FEM-PIC) application developed at Sandia National Laboratories. The code has been developed in C++ using the Trilinos library and the Kokkos Performance Portability Framework to enable running on multiple modern compute architectures while only requiring maintenance of a single codebase. EMPIRE-PIC is capable of solving both electrostatic and electromagnetic problems in two- and three-dimensions to second-order accuracy in space and time. In this paper we validate the code against three benchmark problems - a simple electron orbit, an electrostatic Langmuir wave, and a transverse electromagnetic wave propagating through a plasma. We demonstrate the performance of EMPIRE-PIC on four different architectures: Intel Haswell CPUs, Intel's Xeon Phi Knights Landing, ARM Thunder-X2 CPUs, and NVIDIA Tesla V100 GPUs attached to IBM POWER9 processors. This analysis demonstrates scalability of the code up to more than two thousand GPUs, and greater than one hundred thousand CPUs.

More Details

FROSch Preconditioners for Land Ice Simulations of Greenland and Antarctica

Heinlein, Alexander; Perego, Mauro; Rajamanickam, Sivasankaran

Numerical simulations of Greenland and Antarctic ice sheets involve the solution of large-scale highly nonlinear systems of equations on complex shallow geometries. This work is concerned with the construction of Schwarz preconditioners for the solution of the associated tangent problems, which are challenging for solvers mainly because of the strong anisotropy of the meshes and wildly changing boundary conditions that can lead to poorly constrained problems on large portions of the domain. Here, two-level GDSW (Generalized Dryja–Smith–Widlund) type Schwarz preconditioners are applied to different land ice problems, i.e., a velocity problem, a temperature problem, as well as the coupling of the former two problems. We employ the MPI-parallel implementation of multi-level Schwarz preconditioners provided by the package FROSch (Fast and Robust Schwarz)from the Trilinos library. The strength of the proposed preconditioner is that it yields out-of-the-box scalable and robust preconditioners for the single physics problems. To our knowledge, this is the first time two-level Schwarz preconditioners are applied to the ice sheet problem and a scalable preconditioner has been used for the coupled problem. The pre-conditioner for the coupled problem differs from previous monolithic GDSW preconditioners in the sense that decoupled extension operators are used to compute the values in the interior of the sub-domains. Several approaches for improving the performance, such as reuse strategies and shared memory OpenMP parallelization, are explored as well. In our numerical study we target both uniform meshes of varying resolution for the Antarctic ice sheet as well as non uniform meshes for the Greenland ice sheet are considered. We present several weak and strong scaling studies confirming the robustness of the approach and the parallel scalability of the FROSch implementation. Among the highlights of the numerical results are a weak scaling study for up to 32 K processor cores (8 K MPI-ranks and 4 OpenMP threads) and 566 M degrees of freedom for the velocity problem as well as a strong scaling study for up to 4 K processor cores (and MPI-ranks) and 68 M degrees of freedom for the coupled problem.

More Details

Dakota-NAERM Integration

Swiler, Laura P.; Newman, Sarah; Staid, Andrea; Barrett, Emily

This report presents the results of a collaborative effort under the Verification, Validation, and Uncertainty Quantification (VVUQ) thrust area of the North American Energy Resilience Model (NAERM) program. The goal of the effort described in this report was to integrate the Dakota software with the NAERM software framework to demonstrate sensitivity analysis of a co-simulation for NAERM.

More Details

Towards Predictive Plasma Science and Engineering through Revolutionary Multi-Scale Algorithms and Models (Final Report)

Laity, George R.; Robinson, Allen C.; Cuneo, Michael E.; Alam, Kathleen M.; Beckwith, Kristian; Bennett, Nichelle L.; Bettencourt, Matthew T.; Bond, Stephen D.; Cochrane, Kyle; Criscenti, Louise; Cyr, Eric C.; Foulk, James W.; Drake, Richard R.; Evstatiev, Evstati G.; Fierro, Andrew S.; Gardiner, Thomas A.; Foulk, James W.; Goeke, Ronald S.; Hamlin, Nathaniel D.; Hooper, Russell; Koski, Jason P.; Lane, James M.D.; Larson, Steven R.; Leung, Kevin; Mcgregor, Duncan A.O.; Miller, Philip R.; Miller, Sean; Ossareh, Susan J.; Phillips, Edward; Simpson, Sean; Sirajuddin, David; Smith, Thomas M.; Swan, Matthew S.; Thompson, A.P.; Tranchida, Julien; Bortz-Johnson, Asa J.; Welch, Dale; Russell, Alex; Watson, Eric; Rose, David; Mcbride, Ryan

This report describes the high-level accomplishments from the Plasma Science and Engineering Grand Challenge LDRD at Sandia National Laboratories. The Laboratory has a need to demonstrate predictive capabilities to model plasma phenomena in order to rapidly accelerate engineering development in several mission areas. The purpose of this Grand Challenge LDRD was to advance the fundamental models, methods, and algorithms along with supporting electrode science foundation to enable a revolutionary shift towards predictive plasma engineering design principles. This project integrated the SNL knowledge base in computer science, plasma physics, materials science, applied mathematics, and relevant application engineering to establish new cross-laboratory collaborations on these topics. As an initial exemplar, this project focused efforts on improving multi-scale modeling capabilities that are utilized to predict the electrical power delivery on large-scale pulsed power accelerators. Specifically, this LDRD was structured into three primary research thrusts that, when integrated, enable complex simulations of these devices: (1) the exploration of multi-scale models describing the desorption of contaminants from pulsed power electrodes, (2) the development of improved algorithms and code technologies to treat the multi-physics phenomena required to predict device performance, and (3) the creation of a rigorous verification and validation infrastructure to evaluate the codes and models across a range of challenge problems. These components were integrated into initial demonstrations of the largest simulations of multi-level vacuum power flow completed to-date, executed on the leading HPC computing machines available in the NNSA complex today. These preliminary studies indicate relevant pulsed power engineering design simulations can now be completed in (of order) several days, a significant improvement over pre-LDRD levels of performance.

More Details
Results 951–1000 of 9,998
Results 951–1000 of 9,998