Fractional equations have become the model of choice in several applications where heterogeneities at the microstructure result in anomalous diffusive behavior at the macroscale. In this work we introduce a new fractional operator characterized by a doubly-variable fractional order and possibly truncated interactions. Under certain conditions on the model parameters and on the regularity of the fractional order we show that the corresponding Poisson problem is well-posed. We also introduce a finite element discretization and describe an efficient implementation of the finite-element matrix assembly in the case of piecewise constant fractional order. Through several numerical tests, we illustrate the improved descriptive power of this new operator across media interfaces. Furthermore, we present one-dimensional and two-dimensional h-convergence results that show that the variable-order model has the same convergence behavior as the constant-order model.
Nonlocal models provide a much-needed predictive capability for important Sandia mission applications, ranging from fracture mechanics for nuclear components to subsurface flow for nuclear waste disposal, where traditional partial differential equation (PDE) models fail to capture effects due to long-range forces at the microscale and mesoscale. However, utilization of this capability is seriously compromised by the lack of a rigorous nonlocal interface theory, required for both application and efficient solution of nonlocal models. To unlock the full potential of nonlocal modeling we developed a mathematically rigorous and physically consistent interface theory and demonstrated its scope in mission-relevant exemplar problems.
Nonlocal operators of fractional type are a popular modeling choice for applications that do not adhere to classical diffusive behavior; however, one major challenge in nonlocal simulations is the selection of model parameters. In this work we propose an optimization-based approach to parameter identification for fractional models with an optional truncation radius. We formulate the inference problem as an optimal control problem where the objective is to minimize the discrepancy between observed data and an approximate solution of the model, and the control variables are the fractional order and the truncation length. For the numerical solution of the minimization problem we propose a gradient-based approach, where we enhance the numerical performance by an approximation of the bilinear form of the state equation and its derivative with respect to the fractional order. Several numerical tests in one and two dimensions illustrate the theoretical results and show the robustness and applicability of our method.
Anomalous behavior is ubiquitous in subsurface solute transport due to the presence of high degrees of heterogeneity at different scales in the media. Although fractional models have been extensively used to describe the anomalous transport in various subsurface applications, their application is hindered by computational challenges. Simpler nonlocal models characterized by integrable kernels and finite interaction length represent a computationally feasible alternative to fractional models; yet, the informed choice of their kernel functions still remains an open problem. We propose a general data-driven framework for the discovery of optimal kernels on the basis of very small and sparse data sets in the context of anomalous subsurface transport. Using spatially sparse breakthrough curves recovered from fine-scale particle-density simulations, we learn the best coarse-scale nonlocal model using a nonlocal operator regression technique. Predictions of the breakthrough curves obtained using the optimal nonlocal model show good agreement with fine-scale simulation results even at locations and time intervals different from the ones used to train the kernel, confirming the excellent generalization properties of the proposed algorithm. A comparison with trained classical models and with black-box deep neural networks confirms the superiority of the predictive capability of the proposed model.
We propose a domain decomposition method for the efficient simulation of nonlocal problems. Our approach is based on a multi-domain formulation of a nonlocal diffusion problem where the subdomains share “nonlocal” interfaces of the size of the nonlocal horizon. This system of nonlocal equations is first rewritten in terms of minimization of a nonlocal energy, then discretized with a meshfree approximation and finally solved via a Lagrange multiplier approach in a way that resembles the finite element tearing and interconnect method. Specifically, we propose a distributed projected gradient algorithm for the solution of the Lagrange multiplier system, whose unknowns determine the nonlocal interface conditions between subdomains. Several two-dimensional numerical tests on problems as large as 191 million unknowns illustrate the strong and the weak scalability of our algorithm, which outperforms the standard approach to the distributed numerical solution of the problem. This work is the first rigorous numerical study in a two-dimensional multi-domain setting for nonlocal operators with finite horizon and, as such, it is a fundamental step towards increasing the use of nonlocal models in large scale simulations.
Graph partitioning has been an important tool for dividing work among several processors so as to minimize the communication cost and balance the workload. As accelerator-based supercomputers emerge as the standard and applications rapidly move to these architectures, graph partitioning becomes even more important. However, there is no distributed-memory-parallel, multi-GPU graph partitioner available for applications. We developed a spectral graph partitioner, Sphynx, using the portable, accelerator-friendly stack of the Trilinos framework. In Sphynx, we allow using different preconditioners and exploit their unique advantages. We use Sphynx to systematically evaluate the various algorithmic choices in spectral partitioning with a focus on the GPU performance. We perform those evaluations on two distinct classes of graphs: regular (such as meshes and matrices from finite element methods) and irregular (such as social networks and web graphs), and show that different settings and preconditioners are needed for these graph classes. The experimental results on the Summit supercomputer show that Sphynx is the fastest alternative on irregular graphs in an application-friendly setting and obtains a partitioning quality close to ParMETIS on regular graphs. When compared to nvGRAPH on a single GPU, Sphynx is faster and obtains better balance and better quality partitions. Sphynx provides a good and robust partitioning method across a wide range of graphs for applications looking for a GPU-based partitioner.
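As a rough illustration of the spectral partitioning idea mentioned above, the following minimal sketch bisects a small undirected graph by the sign pattern of an approximate Fiedler vector (the eigenvector of the graph Laplacian associated with its second-smallest eigenvalue). It is not Sphynx's implementation: it uses a plain shifted power iteration in pure Python rather than a preconditioned eigensolver on GPUs, and the names `fiedler_vector`, `spectral_bisect`, and `edge_cut` are hypothetical.

```python
import math


def fiedler_vector(adj, iters=500):
    """Approximate the Fiedler vector of a connected undirected graph.

    adj[i] is the list of neighbours of vertex i. Power iteration is applied
    to (shift * I - L), with L = D - A the graph Laplacian, while the constant
    vector (eigenvalue 0 of L) is projected out at every step.
    """
    n = len(adj)
    deg = [len(nbrs) for nbrs in adj]
    shift = 2.0 * max(deg)  # upper bound on the largest eigenvalue of L
    # Deterministic, non-constant start vector with zero mean.
    x = [float(i) - (n - 1) / 2.0 for i in range(n)]
    for _ in range(iters):
        # y = (shift * I - D + A) x
        y = [(shift - deg[i]) * x[i] + sum(x[j] for j in adj[i])
             for i in range(n)]
        mean = sum(y) / n
        y = [v - mean for v in y]  # remove the constant component
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    return x


def spectral_bisect(adj):
    """Split the vertices into parts 0 and 1 at the median Fiedler entry."""
    x = fiedler_vector(adj)
    median = sorted(x)[len(x) // 2]
    # Ties at the median can unbalance the parts; ignored in this sketch.
    return [0 if v < median else 1 for v in x]


def edge_cut(adj, part):
    """Count the edges whose endpoints lie in different parts."""
    return sum(1 for i, nbrs in enumerate(adj)
               for j in nbrs if i < j and part[i] != part[j])
```

For two four-vertex cliques joined by a single bridge edge, this sketch should place each clique in its own part and cut only the bridge. Production partitioners replace the power iteration with a preconditioned eigensolver, which is where the choice of preconditioner discussed in the abstract enters.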
Support for lower precision computation is becoming more common in accelerator hardware due to lower power usage, reduced data movement and increased computational performance. However, computational science and engineering (CSE) problems require double precision accuracy in several domains. This conflict between hardware trends and application needs has resulted in a need for multiprecision strategies at the linear algebra algorithms level if we want to exploit the hardware to its full potential while meeting the accuracy requirements. In this paper, we focus on preconditioned sparse iterative linear solvers, a key kernel in several CSE applications. We present a study of multiprecision strategies for accelerating this kernel on GPUs. We seek the best methods for incorporating multiple precisions into the GMRES linear solver; these include iterative refinement and parallelizable preconditioners. Our work presents strategies to determine when multiprecision GMRES will be effective and to choose parameters for a multiprecision iterative refinement solver to achieve better performance. We use an implementation that is based on the Trilinos library and employs Kokkos Kernels for performance portability of linear algebra kernels. Performance results demonstrate the promise of multiprecision approaches and indicate that even further improvements are possible by optimizing low-level kernels.
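The iterative refinement strategy mentioned above can be sketched in a few lines: residuals and solution updates are kept in double precision, while each correction is computed by a cheaper low-precision solve. The sketch below is only an illustration under stated assumptions. The inner solver is a small dense Gaussian elimination with every intermediate result rounded to single precision, standing in for the single-precision GMRES of the paper, and the names `f32`, `solve_low_precision`, `residual`, and `iterative_refinement` are hypothetical.

```python
import struct


def f32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def solve_low_precision(A, b):
    """Gaussian elimination with partial pivoting, emulated in float32."""
    n = len(b)
    # Augmented matrix [A | b], stored in (emulated) single precision.
    M = [[f32(v) for v in row] + [f32(rhs)] for row, rhs in zip(A, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            m = f32(M[i][k] / M[k][k])
            for j in range(k, n + 1):
                M[i][j] = f32(M[i][j] - f32(m * M[k][j]))
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n]
        for j in range(i + 1, n):
            s = f32(s - f32(M[i][j] * x[j]))
        x[i] = f32(s / M[i][i])
    return x


def residual(A, b, x):
    """r = b - A x, accumulated in double precision."""
    return [bi - sum(aij * xj for aij, xj in zip(row, x))
            for row, bi in zip(A, b)]


def iterative_refinement(A, b, tol=1e-12, max_iters=20):
    """Refine x until the relative max-norm residual drops below tol."""
    x = [0.0] * len(b)
    history = []
    for _ in range(max_iters):
        r = residual(A, b, x)  # double precision
        rel = max(abs(v) for v in r) / max(abs(v) for v in b)
        history.append(rel)
        if rel <= tol:
            break
        d = solve_low_precision(A, r)  # low-precision correction
        x = [xi + di for xi, di in zip(x, d)]  # double-precision update
    return x, history
```

The returned `history` records the relative residual before each correction, which makes it easy to see how many low-precision solves are needed to reach double-precision accuracy for a given matrix. For ill-conditioned systems the low-precision correction may not reduce the error at all, which is one reason a strategy for deciding when multiprecision solvers are effective matters.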
The purpose of this paper is to study a Helmholtz problem with a spectral fractional Laplacian, instead of the standard Laplacian. Recently, it has been established that such a fractional Helmholtz problem better captures the underlying behavior in geophysical electromagnetics. We establish the well-posedness and regularity of this problem. We introduce a hybrid spectral-finite element approach to discretize it and show well-posedness of the discrete system. In addition, we derive a priori discretization error estimates. Finally, we introduce an efficient solver that scales as well as the best possible solver for the classical integer-order Helmholtz equation. We conclude with several illustrative examples that confirm our theoretical findings.
We consider the integral definition of the fractional Laplacian and analyze a linear-quadratic optimal control problem for the so-called fractional heat equation; control constraints are also considered. We derive existence and uniqueness results, first order optimality conditions, and regularity estimates for the optimal variables. To discretize the state equation we propose a fully discrete scheme that relies on an implicit finite difference discretization in time combined with a piecewise linear finite element discretization in space. We derive stability results and a novel $L^2(0,T;L^2(\Omega))$ a priori error estimate. On the basis of the aforementioned solution technique, we propose a fully discrete scheme for our optimal control problem that discretizes the control variable with piecewise constant functions, and we derive a priori error estimates for it. We illustrate the theory with one- and two-dimensional numerical experiments.
In this paper we introduce EMPIRE-PIC, a finite element method particle-in-cell (FEM-PIC) application developed at Sandia National Laboratories. The code has been developed in C++ using the Trilinos library and the Kokkos Performance Portability Framework to enable running on multiple modern compute architectures while only requiring maintenance of a single codebase. EMPIRE-PIC is capable of solving both electrostatic and electromagnetic problems in two and three dimensions to second-order accuracy in space and time. In this paper we validate the code against three benchmark problems: a simple electron orbit, an electrostatic Langmuir wave, and a transverse electromagnetic wave propagating through a plasma. We demonstrate the performance of EMPIRE-PIC on four different architectures: Intel Haswell CPUs, Intel's Xeon Phi Knights Landing, ARM Thunder-X2 CPUs, and NVIDIA Tesla V100 GPUs attached to IBM POWER9 processors. This analysis demonstrates scalability of the code up to more than two thousand GPUs, and greater than one hundred thousand CPUs.
Parallel implementations of linear iterative solvers generally alternate between phases of data exchange and phases of local computation. Increasingly large problem sizes and more heterogeneous compute architectures make load balancing and the design of low latency network interconnects that are able to satisfy the communication requirements of linear solvers very challenging tasks. In particular, global communication patterns such as inner products become increasingly limiting at scale. We explore the use of asynchronous communication based on one-sided Message Passing Interface primitives in the context of domain decomposition solvers. In particular, a scalable asynchronous two-level Schwarz method is presented. We discuss practical issues encountered in the development of a scalable solver and show experimental results obtained on a state-of-the-art supercomputer system that illustrate the benefits of asynchronous solvers in load balanced as well as load imbalanced scenarios. Using the novel method, we can observe speedups of up to four times over its classical synchronous equivalent.