Lecture Notes in Computational Science and Engineering
Maljaars, Jakob M.; Labeur, Robert J.; Trask, Nathaniel A.; Sulsky, Deborah L.
A particle-mesh strategy is presented for scalar transport problems which provides diffusion-free advection, conserves mass locally (i.e. cellwise) and exhibits optimal convergence on arbitrary polyhedral meshes. This is achieved by expressing the convective field naturally located on the Lagrangian particles as a mesh quantity by formulating a dedicated particle-mesh projection based via a PDE-constrained optimization problem. Optimal convergence and local conservation are demonstrated for a benchmark test, and the application of the scheme to mass conservative density tracking is illustrated for the Rayleigh–Taylor instability.
Proceedings of the 6th European Conference on Computational Mechanics: Solids, Structures and Coupled Problems, ECCM 2018 and 7th European Conference on Computational Fluid Dynamics, ECFD 2018
SNOWPAC (Stochastic Nonlinear Optimization With Path-Augmented Constraints) is a method for stochastic nonlinear constrained derivative-free optimization. For such problems, it extends the path-augmented constraints framework introduced by the deterministic optimization method NOWPAC and uses a noise-adapted trust region approach and Gaussian processes for noise reduction. In recent developments, SNOWPAC is available in the DAKOTA framework which offers a highly flexible interface to couple the optimizer with different sampling strategies or surrogate models. In this paper we discuss details of SNOWPAC and demonstrate the coupling with DAKOTA. We showcase the approach by presenting design optimization results of a shape in a 2D supersonic duct. This simulation is supposed to imitate the behavior of the flow in a SCRAMJET simulation but at a much lower computational cost. Additionally different mesh or model fidelities can be tested. Thus, it serves as a convenient test case before moving to costly SCRAMJET computations. Here, we study deterministic results and results obtained by introducing uncertainty on inflow parameters. As sampling strategies we compare classical Monte Carlo sampling with multilevel Monte Carlo approaches for which we developed new error estimators. All approaches show a reasonable optimization of the design over the objective while maintaining or seeking feasibility. Furthermore, we achieve significant reductions in computational cost by using multilevel approaches that combine solutions from different grid resolutions.
This study explores the use of a Krylov iterative method (GMRES) as a smoother for an algebraic multigrid (AMG) preconditioned Newton–Krylov iterative solution approach for a fully-implicit variational multiscale (VMS) finite element (FE) resistive magnetohydrodynamics (MHD) formulation. The efficiency of this approach is critically dependent on the scalability and performance of the AMG preconditioner for the linear solutions and the performance of the smoothers play an essential role. Krylov smoothers are considered an attempt to reduce the time and memory requirements of existing robust smoothers based on additive Schwarz domain decomposition (DD) with incomplete LU factorization solves on each subdomain. This brief study presents three time dependent resistive MHD test cases to evaluate the method. The results demonstrate that the GMRES smoother can be faster due to a decrease in the preconditioner setup time and a reduction in outer GMRESR solver iterations, and requires less memory (typically 35% less memory for global GMRES smoother) than the DD ILU smoother.
Di Matteo, Olivia; Gamble, John; Granade, Chris; Rudinger, Kenneth M.; Wiebe, Nathan
As increasingly impressive quantum information processors are realized in laboratories around the world, robust and reliable characterization of these devices is now more urgent than ever. These diagnostics can take many forms, but one of the most popular categories is tomography, where an underlying parameterized model is proposed for a device and inferred by experiments. Here, we introduce and implement efficient operational tomography, which uses experimental observables as these model parameters. This addresses a problem of ambiguity in representation that arises in current tomographic approaches (the gauge problem). Solving the gauge problem enables us to efficiently implement operational tomography in a Bayesian framework computationally, and hence gives us a natural way to include prior information and discuss uncertainty in fit parameters. We demonstrate this new tomography in a variety of different experimentally-relevant scenarios, including standard process tomography, Ramsey interferometry, randomized benchmarking, and gate set tomography.
There is a wealth of psychological theory regarding the drive for individuals to congregate and form social groups, positing that people may organize out of fear, social pressure, or even to manage their self-esteem. We evaluate three such theories for multi-scale validity by studying them not only at the individual scale for which they were originally developed, but also for applicability to group interactions and behavior. We implement this multi-scale analysis using a dataset of communications and group membership derived from a long-running online game, matching the intent behind the theories to quantitative measures that describe players’ behavior. Once we establish that the theories hold for the dataset, we increase the scope to test the theories at the higher scale of group interactions. Despite being formulated to describe individual cognition and motivation, we show that some group dynamics theories hold at the higher level of group cognition and can effectively describe the behavior of joint decision making and higher-level interactions.
Data fields sampled on irregularly spaced points arise in many science and engineering applications. For regular grids, Convolutional Neural Networks (CNNs) gain benefits from weight sharing and invariances. We generalize CNNs by introducing methods for data on unstructured point clouds using Generalized Moving Least Squares (GMLS). GMLS is a nonparametric meshfree technique for estimating linear bounded functionals from scattered data, and has emerged as an effective technique for solving partial differential equations (PDEs). By parameterizing the GMLS estimator, we obtain learning methods for linear and non-linear operators with unstructured stencils. The requisite calculations are local, embarrassingly parallelizable, and supported by a rigorous approximation theory. We show how the framework may be used for unstructured physical data sets to perform operator regression, develop predictive dynamical models, and obtain feature extractors for engineering quantities of interest. The results show the promise of these architectures as foundations for data-driven model development in scientific machine learning applications.
American Society of Mechanical Engineers, Fluids Engineering Division (Publication) FEDSM
Foulk, James W.; Visintainer, Robert; Furlan, John; Pagalthivarthi, Krishnan V.; Garman, Mohamed; Cutright, Aaron; Wang, Yan
Wear prediction is important in designing reliable machinery for slurry industry. It usually relies on multi-phase computational fluid dynamics, which is accurate but computationally expensive. Each run of the simulations can take hours or days even on a high-performance computing platform. The high computational cost prohibits a large number of simulations in the process of design optimization. In contrast to physics-based simulations, data-driven approaches such as machine learning are capable of providing accurate wear predictions at a small fraction of computational costs, if the models are trained properly. In this paper, a recently developed WearGP framework [1] is extended to predict the global wear quantities of interest by constructing Gaussian process surrogates. The effects of different operating conditions are investigated. The advantages of the WearGP framework are demonstrated by its high accuracy and low computational cost in predicting wear rates.
Bayesian optimization (BO) is an efficient and flexible global optimization framework that is applicable to a very wide range of engineering applications. To leverage the capability of the classical BO, many extensions, including multi-objective, multi-fidelity, parallelization, and latent-variable modeling, have been proposed to address the limitations of the classical BO framework. In this work, we propose a novel multi-objective (MO) extension, called srMOBO-3GP, to solve the MO optimization problems in a sequential setting. Three different Gaussian processes (GPs) are stacked together, where each of the GP is assigned with a different task: the first GP is used to approximate a single-objective computed from the MO definition, the second GP is used to learn the unknown constraints, and the third GP is used to learn the uncertain Pareto frontier. At each iteration, a MO augmented Tchebycheff function converting MO to single-objective is adopted and extended with a regularized ridge term, where the regularization is introduced to smooth the single-objective function. Finally, we couple the third GP along with the classical BO framework to explore the richness and diversity of the Pareto frontier by the exploitation and exploration acquisition function. The proposed framework is demonstrated using several numerical benchmark functions, as well as a thermomechanical finite element model for flip-chip package design optimization.
This article describes a parallel implementation of a two-level overlapping Schwarz preconditioner with the GDSW (Generalized Dryja–Smith–Widlund) coarse space described in previous work [12, 10, 15] into the Trilinos framework; cf. [16]. The software is a significant improvement of a previous implementation [12]; see Sec. 4 for results on the improved performance.
We present a Fourier analysis of wave propagation problems subject to a class of continuous and discontinuous discretizations using high-degree Lagrange polynomials. This allows us to obtain explicit analytical formulas for the dispersion relation and group velocity and, for the first time to our knowledge, characterize analytically the emergence of gaps in the dispersion relation at specific wavenumbers, when they exist, and compute their specific locations. Wave packets with energy at these wavenumbers will fail to propagate correctly, leading to significant numerical dispersion. We also show that the Fourier analysis generates mathematical artifacts, and we explain how to remove them through a branch selection procedure conducted by analysis of eigenvectors and associated reconstructed solutions. The higher frequency eigenmodes, named erratic in this study, are also investigated analytically and numerically.
Productivity and Sustainability Improvement Planning (PSIP) is a lightweight, iterative workflow that allows software development teams to identify development bottlenecks and track progress to overcome them. In this paper, we present an overview of PSIP and how it compares to other software process improvement (SPI) methodologies, and provide two case studies that describe how the use of PSIP led to successful improvements in team effectiveness and efficiency.
Many problems in engineering and sciences require the solution of large scale optimization constrained by partial differential equations (PDEs). Though PDE-constrained optimization is itself challenging, most applications pose additional complexity, namely, uncertain parameters in the PDEs. Uncertainty quantification (UQ) is necessary to characterize, prioritize, and study the influence of these uncertain parameters. Sensitivity analysis, a classical tool in UQ, is frequently used to study the sensitivity of a model to uncertain parameters. In this article, we introduce "hyper-differential sensitivity analysis" which considers the sensitivity of the solution of a PDE-constrained optimization problem to uncertain parameters. Our approach is a goal-oriented analysis which may be viewed as a tool to complement other UQ methods in the service of decision making and robust design. We formally define hyper-differential sensitivity indices and highlight their relationship to the existing optimization and sensitivity analysis literatures. Assuming the presence of low rank structure in the parameter space, computational efficiency is achieved by leveraging a generalized singular value decomposition in conjunction with a randomized solver which converts the computational bottleneck of the algorithm into an embarrassingly parallel loop. Two multi-physics examples, consisting of nonlinear steady state control and transient linear inversion, demonstrate efficient identification of the uncertain parameters which have the greatest influence on the optimal solution.
The data from the multi-modal transportation test conducted in 2017 demonstrated that the inputs from the shock events during all transport modes (truck, rail, and ship) were amplified from the cask to the spent commercial nuclear fuel surrogate assemblies. These data do not support common assumption that the cask content experiences the same accelerations as the cask itself. This was one of the motivations for conducting 30 cm drop tests. The goal of the 30 cm drop test is to measure accelerations and strains on the surrogate spent nuclear fuel assembly and to determine whether the fuel rods can maintain their integrity inside a transportation cask when dropped from a height of 30 cm. The 30 cm drop is the remaining NRC normal conditions of transportation regulatory requirement (10 CFR 71.71) for which there are no data on the actual surrogate fuel. Because the full-scale cask and impact limiters were not available (and their cost was prohibitive), it was proposed to achieve this goal by conducting three separate tests. This report describes the first two tests — the 30 cm drop test of the 1/3 scale cask (conducted in December 2018) and the 30 cm drop of the full-scale dummy assembly (conducted in June 2019). The dummy assembly represents the mass of a real spent nuclear fuel assembly. The third test (to be conducted in the spring of 2020) will be the 30 cm drop of the full-scale surrogate assembly. The surrogate assembly represents a real full-scale assembly in physical, material, and mechanical characteristics, as well as in mass.
Trusting simulation output is crucial for Sandia’s mission objectives. Here, we rely on these simulations to perform our high-consequence mission tasks given national treaty obligations. Other science and modeling applications, while they may have high-consequence results, still require the strongest levels of trust to enable using the result as the foundation for both practical applications and future research. To this end, the computing community has developed workflow and provenance systems to aid in both automating simulation and modeling execution as well as determining exactly how was some output was created so that conclusions can be drawn from the data. Current approaches for workflows and provenance systems are all at the user level and have little to no system level support making them fragile, difficult to use, and incomplete solutions. The introduction of container technology is a first step towards encapsulating and tracking artifacts used in creating data and resulting insights, but their current implementation is focused solely on making it easy to deploy an application in an isolated “sandbox” and maintaining a strictly read-only mode to avoid any potential changes to the application. All storage activities are still using the system-level shared storage. This project explores extending the container concept to include storage as a new container type we call data pallets. Data Pallets are potentially writeable, auto generated by the system based on IO activities, and usable as a way to link the contained data back to the application and input deck used to create it.
A key problem in social network analysis is to identify nonhuman interactions. State-of-the-art bot-detection systems like Botometer train machine-learning models on user-specific data. Unfortunately, these methods do not work on data sets in which only topological information is available. In this paper, we propose a new, purely topological approach. Our method removes edges that connect nodes exhibiting strong evidence of non-human activity from publicly available electronic-social-network datasets, including, for example, those in the Stanford Network Analysis Project repository (SNAP). Our methodology is inspired by classic work in evolutionary psychology by Dunbar that posits upper bounds on the total strength of the set of social connections in which a single human can be engaged. We model edge strength with Easley and Kleinberg's topological estimate; label nodes as “violators” if the sum of these edge strengths exceeds a Dunbar-inspired bound; and then remove the violator-to-violator edges. We run our algorithm on multiple social networks and show that our Dunbar-inspired bound appears to hold for social networks, but not for nonsocial networks. Our cleaning process classifies 0.04% of the nodes of the Twitter-2010 followers graph as violators, and we find that more than 80% of these violator nodes have Botometer scores of 0.5 or greater. Furthermore, after we remove the roughly 15 million violator-violator edges from the 1.2-billion-edge Twitter-2010 follower graph, 34% of the violator nodes experience a factor-of-two decrease in PageRank. PageRank is a key component of many graph algorithms such as node/edge ranking and graph sparsification. Thus, this artificial inflation would bias algorithmic output, and result in some incorrect decisions based on this output.
Flame detectors provide an important layer of protection for personnel in petrochemical plants, but effective placement can be challenging. A mixed-integer nonlinear programming formulation is proposed for optimal placement of flame detectors while considering non-uniform probabilities of detection failure. We show that this approach allows for the placement of fire detectors using a fixed sensor budget and outperforms models that do not account for imperfect detection. We develop a linear relaxation to the formulation and an efficient solution algorithm that achieves global optimality with reasonable computational effort. We integrate this problem formulation into the Python package, Chama, and demonstrate the effectiveness of this formulation on a small test case and on two real-world case studies using the fire and gas mapping software, Kenexis Effigy.
Knowledge graph embedding (KGE) learns latent vector representations of named entities (i.e., vertices) and relations (i.e., edge labels) of knowledge graphs. Herein, we address two problems in KGE. First, relations may belong to one or multiple categories, such as functional, symmetric, transitive, reflexive, and so forth; thus, relation categories are not exclusive. Some relation categories cause non-trivial challenges for KGE. Second, we found that zero gradients happen frequently in many translation based embedding methods such as TransE and its variations. To solve these problems, we propose i) converting a knowledge graph into a bipartite graph, although we do not physically convert the graph but rather use an equivalent trick; ii) using multiple vector representations for a relation; and iii) using a new hinge loss based on energy ratio(rather than energy gap) that does not cause zero gradients. We show that our method significantly improves the quality of embedding.
Compact semiconductor device models are essential for efficiently designing and analyzing large circuits. However, traditional compact model development requires a large amount of manual effort and can span many years. Moreover, inclusion of new physics (e.g., radiation effects) into an existing model is not trivial and may require redevelopment from scratch. Machine Learning (ML) techniques have the potential to automate and significantly speed up the development of compact models. In addition, ML provides a range of modeling options that can be used to develop hierarchies of compact models tailored to specific circuit design stages. In this paper, we explore three such options: (1) table-based interpolation, (2) Generalized Moving Least-Squares, and (3) feedforward Deep Neural Networks, to develop compact models for a p-n junction diode. We evaluate the performance of these "data-driven" compact models by (1) comparing their voltage-current characteristics against laboratory data, and (2) building a bridge rectifier circuit using these devices, predicting the circuit's behavior using SPICE-like circuit simulations, and then comparing these predictions against laboratory measurements of the same circuit.
Community detection in graphs is a canonical social network analysis method. We consider the problem of generating suites of teras-cale synthetic social networks to compare the solution quality of parallel community-detection methods. The standard method, based on the graph generator of Lancichinetti, Fortunato, and Radicchi (LFR), has been used extensively for modest-scale graphs, but has inherent scalability limitations. We provide an alternative, based on the scalable Block Two-Level Erdos-Renyi (BTER) graph generator, that enables HPC-scale evaluation of solution quality in the style of LFR. Our approach varies community coherence, and retains other important properties. Our methods can scale real-world networks, e.g., to create a version of the Friendster network that is 512 times larger. With BTER's inherent scalability, we can generate a 15-terabyte graph (4.6B vertices, 925B edges) in just over one minute. We demonstrate our capability by showing that label-propagation community-detection algorithm can be strong-scaled with negligible solution-quality loss.
Engage the C++ standards committee to further the adoption of successful Kokkos concepts into the C++ standard, and provide feedback on proposed concurrency mechanisms such as the executors proposal.
Supporting the latest hardware and compiler versions is important to leverage improvements in the software environment and new HPC platforms. We will provide certified support for the latest releases of vendor compilers from Intel, AMD, IBM, NVIDIA, ARM and Cray as well as of open source compilers GCC and Clang.
The ECP/VTK-m project is providing the core capabilities to perform scientific visualization on Exascale architectures. The ECP/VTK-m project fills the critical feature gap of performing visualization and analysis on processors like graphics-based processors. The results of this project will be delivered in tools like ParaView, Vislt, and Ascent as well as in stand-alone form. Moreover, these projects are depending on this ECP effort to be able to make effective use of ECP architectures.
The Message Passing Interface (MPI) libraries use message queues to guarantee correct message ordering between communicating processes. Message queues are in the critical path of MPI communications and thus, the performance of message queue operations can have significant impact on the performance of applications. Collective communications are widely used in MPI applications and they can have considerable impact on generating long message queues. In this paper, we propose a unified message matching mechanism that improves the message queue search time by distinguishing messages coming from point-to-point and collective communications and using a distinct message queue data structure for them. For collective operations, it dynamically profiles the impact of each collective call on message queues during the application runtime and uses this information to adapt the message queue data structure for each collective dynamically. Moreover, we use a partner/non-partner message queue data structure for the messages coming from point-to-point communications. The proposed approach can successfully reduce the queue search time while maintaining scalable memory consumption. The evaluation results show that we can obtain up to 5.5x runtime speedup for applications with long list traversals. Moreover, we can gain up to 15% and 94% queue search time improvement for all elements in applications with short and medium list traversals, respectively.
Proceedings of PAW-ATM 2019: Parallel Applications Workshop, Alternatives to MPI+X, Held in conjunction with SC 2019: The International Conference for High Performance Computing, Networking, Storage and Analysis
Pei, Yu; Bosilca, George; Yamazaki, Ichitaro; Ida, Akihiro; Dongarra, Jack
To minimize data movement, many parallel ap-plications statically distribute computational tasks among the processes. However, modern simulations often encounters ir-regular computational tasks whose computational loads change dynamically at runtime or are data dependent. As a result, load imbalance among the processes at each step of simulation is a natural situation that must be dealt with at the programming level. The de facto parallel programming approach, flat MPI (one process per core), is hardly suitable to manage the lack of balance, imposing significant idle time on the simulation as processes have to wait for the slowest process at each step of simulation. One critical application for many domains is the LU factor-ization of a large dense matrix stored in the Block Low-Rank (BLR) format. Using the low-rank format can significantly reduce the cost of factorization in many scientific applications, including the boundary element analysis of electrostatic field. However, the partitioning of the matrix based on underlying geometry leads to different sizes of the matrix blocks whose numerical ranks change at each step of factorization, leading to the load imbalance among the processes at each step of factorization. We use BLR LU factorization as a test case to study the programmability and performance of five different programming approaches: (1) flat MPI, (2) Adaptive MPI (Charm++), (3) MPI + OpenMP, (4) parameterized task graph (PTG), and (5) dynamic task discovery (DTD). The last two versions use a task-based paradigm to express the algorithm; we rely on the PaRSEC run-time system to execute the tasks. We first point out programming features needed to efficiently solve this category of problems, hinting at possible alternatives to the MPI+X programming paradigm. We then evaluate the programmability of the different approaches, detailing our experience implementing the algorithm using each of the models. Finally, we show the performance result on the Intel Haswell-based Bridges system at the Pittsburgh Supercomputing Center (PSC) and analyze the effectiveness of the implementations to address the load imbalance.
A hierarchical solver is proposed for solving sparse ill-conditioned linear systems in parallel. The solver is based on a modification of the LoRaSp method, but employs a deferred-compression technique, which provably reduces the approximation error and significantly improves efficiency. Moreover, the deferred-compression technique introduces minimal overhead and does not affect parallelism. As a result, the new solver achieves linear computational complexity under mild assumptions and excellent parallel scalability. To demonstrate the performance of the new solver, we focus on applying it to solve sparse linear systems arising from ice sheet modeling. The strong anisotropic phenomena associated with the thin structure of ice sheets creates serious challenges for existing solvers. To address the anisotropy, we additionally developed a customized partitioning scheme for the solver, which captures the strong-coupling direction accurately. In general, the partitioning can be computed algebraically with existing software packages, and thus the new solver is generalizable for solving other sparse linear systems. Our results show that ice sheet problems of about 300 million degrees of freedom have been solved in just a few minutes using 1024 processors.
Proceedings of CANOPIE-HPC 2019: 1st International Workshop on Containers and New Orchestration Paradigms for Isolated Environments in HPC - Held in conjunction with SC 2019: The International Conference for High Performance Computing, Networking, Storage and Analysis
Containers offer a broad array of benefits, including a consistent lightweight runtime environment through OS-level virtualization, as well as low overhead to maintain and scale applications with high efficiency. Moreover, containers are known to package and deploy applications consistently across varying infrastructures. Container orchestrators manage a large number of containers for microservices based cloud applications. However, the use of such service orchestration frameworks towards HPC workloads remains relatively unexplored. In this paper we study the potential use of Kubernetes on HPC infrastructure for use by the scientific community. We directly compare both its features and performance against Docker Swarm and bare metal execution of HPC applications. Herein, we detail the configurations required for Kubernetes to operate with containerized MPI applications, specifically accounting for operations such as (1) underlying device access, (2) inter-container communication across different hosts, and (3) configuration limitations. This evaluation quantifies the performance difference between representative MPI workloads running both on bare metal and containerized orchestration frameworks with Kubernetes, operating over both Ethernet and InfiniBand interconnects. Our results show that Kubernetes and Docker Swarm can achieve near bare metal performance over RDMA communication when high performance transports are enabled. Our results also show that Kubernetes presents overheads for several HPC applications over TCP/IP protocol. However, Docker Swarm's throughput is near bare metal performance for the same applications.
Proceedings of CANOPIE-HPC 2019: 1st International Workshop on Containers and New Orchestration Paradigms for Isolated Environments in HPC - Held in conjunction with SC 2019: The International Conference for High Performance Computing, Networking, Storage and Analysis
Containerized computing is quickly changing the landscape for the development and deployment of many HPC applications. Containers are able to lower the barrier of entry for emerging workloads to leverage supercomputing resources. However, containers are no silver bullet for deploying HPC software and there are several challenges ahead in which the community must address to ensure container workloads can be reproducible and inter-operable. In this paper, we discuss several challenges in utilizing containers for HPC applications and the current approaches used in many HPC container runtimes. These approaches have been proven to enable high-performance execution of containers at scale with the appropriate runtimes. However, the use of these techniques are still ad hoc, test the limits of container workload portability, and several gaps likely remain. We discuss those remaining gaps and propose several potential solutions, including custom container label tagging and runtime hooks as a first step in managing HPC system library complexity.
Securing cyber systems is of paramount importance, but rigorous, evidence-based techniques to support decision makers for high-consequence decisions have been missing. The need for bringing rigor into cybersecurity is well-recognized, but little progress has been made over the last decades. We introduce a new project, SECURE, that aims to bring more rigor into cyber experimentation. The core idea is to follow the footsteps of computational science and engineering and expand similar capabilities to support rigorous cyber experimentation. In this paper, we review the cyber experimentation process, present the research areas that underlie our effort, discuss the underlying research challenges, and report on our progress to date. This paper is based on work in progress, and we expect to have more complete results for the conference.
Proceedings of PMBS 2019: Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems - Held in conjunction with SC 2019: The International Conference for High Performance Computing, Networking, Storage and Analysis
In this work we investigate the dynamic communication behavior of parent and proxy applications, and investigate whether or not the dynamic communication behavior of the proxy matches that of its respective parent application. The idea of proxy applications is that they should match their parent well, and should exercise the hardware and perform similarly, so that from them lessons can be learned about how the HPC system and the application can best be utilized. We show here that some proxy/parent pairs do not need the extra detail of dynamic behavior analysis, while others can benefit from it, and through this we also identified a parent/proxy mismatch and improved the proxy application.
This report is the final report for the LDRD project "Fast and Robust Linear Solvers using Hierarchical Matrices". The project was a success. We developed two novel algorithms for solving sparse linear systems. We demonstrated their effectiveness on ill-conditioned linear systems from ice sheet simulations. We showed that in many cases, we can obtain near-linear scaling. We believe this approach has strong potential for difficult linear systems and should be considered for other Sandia and DOE applications. We also report on some related research activities in dense solvers and randomized linear algebra.
In this report, we abstract eleven papers published during the project and describe preliminary unpublished results that warrant follow-up work. The topic is multi-level memory algorithmics, or how to effectively use multiple layers of main memory. Modern compute nodes all have this feature in some form.
Jakeman, John D.; Eldred, Michael S.; Geraci, G.; Gorodetsky, A.
In this paper, we present an adaptive algorithm to construct response surface approximations of high-fidelity models using a hierarchy of lower fidelity models. Our algorithm is based on multiindex stochastic collocation and automatically balances physical discretization error and response surface error to construct an approximation of model outputs. This surrogate can be used for uncertainty quantification (UQ) and sensitivity analysis (SA) at a fraction of the cost of a purely high-fidelity approach. We demonstrate the effectiveness of our algorithm on a canonical test problem from the UQ literature and a complex multi-physics model that simulates the performance of an integrated nozzle for an unmanned aerospace vehicle. We find that when the input-output response is sufficiently smooth our algorithm produces approximations that can be up to orders of magnitude more accurate than single fidelity approximations for a fixed computational budget.
We present an optimization-based coupling method for local and nonlocal continuum models. Our approach couches the coupling of the models into a control problem where the states are the solutions of the nonlocal and local equations, the objective is to minimize their mismatch on the overlap of the local and nonlocal problem domains, and the virtual controls are the nonlocal volume constraint and the local boundary condition. We present the method in the context of Local-to-Nonlocal diffusion coupling. Numerical examples illustrate the theoretical properties of the approach.
Tetrahedral finite element workflows have the potential to drastically reduce time to solution for computational solid mechanics simulations when compared to traditional hexahedral finite element analogues. A recently developed, higher-order composite tetrahedral element has shown promise in the space of incompressible computational plasticity. Mesh adaptivity has the potential to increase solution accuracy and increase solution robustness. In this work, we demonstrate an initial strategy to perform conformal mesh adaptivity for this higher-order composite tetrahedral element using well-established mesh modification operations for linear tetrahedra. We propose potential extensions to improve this initial strategy in terms of robustness and accuracy.
The effects of irradiation on 3C-silicon carbide (SiC) and amorphous SiC (a-SiC) are investigated using both in situ transmission electron microscopy (TEM) and complementary molecular dynamics (MD) simulations. The single ion strikes identified in the in situ TEM irradiation experiments, utilizing a 1.7 MeV Au3+ ion beam with nanosecond resolution, are contrasted to MD simulation results of the defect cascades produced by 10-100 keV Si primary knock-on atoms (PKAs). The MD simulations also investigated defect structures that could possibly be responsible for the observed strain fields produced by single ion strikes in the TEM ion beam irradiation experiments. Both MD simulations and in situ TEM experiments show evidence of radiation damage in 3C-SiC but none in a-SiC. Selected area electron diffraction patterns, based on the results of MD simulations and in situ TEM irradiation experiments, show no evidence of structural changes in either 3C-SiC or a-SiC.
This work proposes an approach for latent dynamics learning that exactly enforces physical conservation laws. The method comprises two steps. First, we compute a low-dimensional embedding of the high-dimensional dynamical-system state using deep convolutional autoencoders. This defines a low-dimensional nonlinear manifold on which the state is subsequently enforced to evolve. Second, we define a latent dynamics model that associates with a constrained optimization problem. Specifically, the objective function is defined as the sum of squares of conservation-law violations over control volumes in a finite-volume discretization of the problem; nonlinear equality constraints explicitly enforce conservation over prescribed subdomains of the problem. The resulting dynamics model—which can be considered as a projection-based reduced-order model—ensures that the time-evolution of the latent state exactly satisfies conservation laws over the prescribed subdomains. In contrast to existing methods for latent dynamics learning, this is the only method that both employs a nonlinear embedding and computes dynamics for the latent state that guarantee the satisfaction of prescribed physical properties. Numerical experiments on a benchmark advection problem illustrate the method's ability to significantly reduce the dimensionality while enforcing physical conservation.
Morris, Joseph; Cowen, Benjamin; Teysseyre, S.; Hecht, Adam A.
Understanding the propagation of radiation damage in a material is paramount to predicting the material damage effects. To date, no current literature has investigated the Threshold Displacement Energy (TDE) of Ca and F atoms in CaF2 through molecular dynamics and simulated statistical analysis. A set of interatomic potentials between Ca-Ca, F-F, and F-Ca were splined, fully characterizing a pure CaF2 simulation cell, by using published Born-Mayer-Huggins, standard ZBL, and Coulomb potentials, with a resulting structure within 1% of standard density and published lattice constants. Using this simulation cell, molecular dynamics simulations were performed with LAMMPS using a simulation that randomly generated 500 Ca and F PKA directions for each incremental set of energies, and a simulation in each of the [1 0 0], [1 1 0], and [1 1 1] directions with 500 trials for each incremental energy. MD simulations of radiation damage in CaF2 are carried out using F and Ca PKAs, with energies ranging from 2 to 200 eV. Probabilistic determinations of the TDE and Threshold Vacancy Energy (TVE) of Ca and F atoms in CaF2 were performed, as well as examining vacancy, interstitial, and antisite production rates over the range of PKA energies. Many more F atoms were displaced from both PKA species, and though F recombination appears more probable than Ca recombination, F vacancy numbers are higher. In conclusion, the higher number of F vacancies than Ca vacancies suggests F Frenkel pairs dominate CaF2 damage.
The purpose of this paper is to study a Helmholtz problem with a spectral fractional Laplacian, instead ofthe standard Laplacian. Recently, it has been established that such a fractional Helmholtz problem better captures the underlying behavior in Geophysical Electromagnetics. We establish the well-posedness and regularity of this problem. We introduce a hybrid finite element-spectral approach to discretize it and show well-posedness of the discrete system. In addition, we derive a priori discretization error estimates. Finally, we introduce an efficient solver that scales as well as the best possible solver for the classical integer-order Helmholtz equation. We conclude with several illustrative examples that confirm our theoretical findings.
We propose a novel method for generating anisotropic adaptive Voronoi meshes that conforms to non-manifold curved boundaries. Our novel method modifies the sampling rules for the VoroCrust software to bring the VoroCrust seeds closer to the surface they are representing. This enables the reconstruction of two surfaces bounding a narrow region while filling the space in-between with stretched Voronoi cells.
A mechanical model is introduced for predicting the initiation and evolution of complex fracture patterns without the need for a damage variable or law. The model, a continuum variant of Newton’s second law, uses integral rather than partial differential operators where the region of integration is over finite domain. The force interaction is derived from a novel nonconvex strain energy density function, resulting in a nonmonotonic material model. The resulting equation of motion is proved to be mathematically well-posed. The model has the capacity to simulate nucleation and growth of multiple, mutually interacting dynamic fractures. In the limit of zero region of integration, the model reproduces the classic Griffith model of brittle fracture. The simplicity of the formulation avoids the need for supplemental kinetic relations that dictate crack growth or the need for an explicit damage evolution law.