In this paper, we continue our efforts to exploit optimization and control ideas as a common foundation for the development of property-preserving numerical methods. Here we focus on a class of scalar advection equations whose solutions have fixed mass in a given Eulerian region and constant bounds in any Lagrangian volume. Our approach separates discretization of the equations from the preservation of their solution properties by treating the latter as optimization constraints. This relieves the discretization process from having to comply with additional restrictions and makes stability and accuracy the sole considerations in its design. A property-preserving solution is then sought as a state that minimizes the distance to an optimally accurate but not property-preserving target solution computed by the scheme, subject to constraints enforcing discrete proxies of the desired properties. Furthermore, we consider two such formulations in which the optimization variables are given by the nodal solution values and suitably defined nodal fluxes, respectively. A key result of the paper reveals that a standard Algebraic Flux Correction (AFC) scheme is a modified version of the second formulation obtained by shrinking its feasible set to a hypercube. In conclusion, we present numerical studies illustrating the optimization-based formulations and comparing them with AFC
We extend and improve recent results given by Singh and Watson on using classical bounds on the union of sets in a chance-constrained optimization problem. Specifically, we revisit the so-called Dawson and Sankoff bound that provided one of the best approximations of a chance constraint in the previous analysis. Next, we show that our work is a generalization of the previous work, and in fact the inequality employed previously is a very relaxed approximation with assumptions that do not generally hold. Computational results demonstrate on average over a 43% improvement in the bounds. As a byproduct, we provide an exact reformulation of the floor function in optimization models.
We utilize generalized moving least squares (GMLS) to develop meshfree techniques for discretizing hydrodynamic flow problems on manifolds. We use exterior calculus to formulate incompressible hydrodynamic equations in the Stokesian regime and handle the divergence-free constraints via a generalized vector potential. This provides less coordinate-centric descriptions and enables the development of efficient numerical methods and splitting schemes for the fourth-order governing equations in terms of a system of second-order elliptic operators. Using a Hodge decomposition, we develop methods for manifolds having spherical topology. We show the methods exhibit high-order convergence rates for solving hydrodynamic flows on curved surfaces. The methods also provide general high-order approximations for the metric, curvature, and other geometric quantities of the manifold and associated exterior calculus operators. The approaches also can be utilized to develop high-order solvers for other scalar-valued and vector-valued problems on manifolds.
Mimetic methods discretize divergence by restricting the Gauss theorem to mesh cells. Because point clouds lack such geometric entities, construction of a compatible meshfree divergence remains a challenge. In this work, we define an abstract Meshfree Mimetic Divergence (MMD) operator on point clouds by contraction of field and virtual face moments. This MMD satisfies a discrete divergence theorem, provides a discrete local conservation principle, and is first-order accurate. We consider two MMD instantiations. The first one assumes a background mesh and uses generalized moving least squares (GMLS) to obtain the necessary field and face moments. This MMD instance is appropriate for settings where a mesh is available but its quality is insufficient for a robust and accurate mesh-based discretization. The second MMD operator retains the GMLS field moments but defines virtual face moments using computationally efficient weighted graph-Laplacian equations. This MMD instance does not require a background grid and is appropriate for applications where mesh generation creates a computational bottleneck. It allows one to trade an expensive mesh generation problem for a scalable algebraic one, without sacrificing compatibility with the divergence operator. We demonstrate the approach by using the MMD operator to obtain a virtual finite-volume discretization of conservation laws on point clouds. Numerical results in the paper confirm the mimetic properties of the method and show that it behaves similarly to standard finite volume methods.
Historically, neuroscience principles have heavily influenced artificial intelligence (AI), for example the influence of the perceptron model, essentially a simple model of a biological neuron, on artificial neural networks. More recently, notable recent AI advances, for example the growing popularity of reinforcement learning, often appear more aligned with cognitive neuroscience or psychology, focusing on function at a relatively abstract level. At the same time, neuroscience stands poised to enter a new era of large-scale high-resolution data and appears more focused on underlying neural mechanisms or architectures that can, at times, seem rather removed from functional descriptions. While this might seem to foretell a new generation of AI approaches arising from a deeper exploration of neuroscience specifically for AI, the most direct path for achieving this is unclear. Here we discuss cultural differences between the two fields, including divergent priorities that should be considered when leveraging modern-day neuroscience for AI. For example, the two fields feed two very different applications that at times require potentially conflicting perspectives. We highlight small but significant cultural shifts that we feel would greatly facilitate increased synergy between the two fields.
We describe and analyze a variance reduction approach for Monte Carlo (MC) sampling that accelerates the estimation of statistics of computationally expensive simulation models using an ensemble of models with lower cost. These lower cost models — which are typically lower fidelity with unknown statistics — are used to reduce the variance in statistical estimators relative to a MC estimator with equivalent cost. We derive the conditions under which our proposed approximate control variate framework recovers existing multifidelity variance reduction schemes as special cases. We demonstrate that existing recursive/nested strategies are suboptimal because they use the additional low-fidelity models only to efficiently estimate the unknown mean of the first low-fidelity model. As a result, they cannot achieve variance reduction beyond that of a control variate estimator that uses a single low-fidelity model with known mean. However, there often exists about an order-of-magnitude gap between the maximum achievable variance reduction using all low-fidelity models and that achieved by a single low-fidelity model with known mean. We show that our proposed approach can exploit this gap to achieve greater variance reduction by using non-recursive sampling schemes. The proposed strategy reduces the total cost of accurately estimating statistics, especially in cases where only low-fidelity simulation models are accessible for additional evaluations. Several analytic examples and an example with a hyperbolic PDE describing elastic wave propagation in heterogeneous media are used to illustrate the main features of the methodology.
Rushdi, Mostafa A.; Dief, Tarek N.; Yoshida, Shigeo; Schmehl, Roland; Rushdi, Ahmad R.
Kites can be used to harvest wind energy at higher altitudes while using only a fraction of the material required for conventional wind turbines. In this work, we present the kite system of Kyushu University and demonstrate how experimental data can be used to train machine learning regression models. The system is designed for 7 kW traction power and comprises an inflatable wing with suspended kite control unit that is either tethered to a fixed ground anchor or to a towing vehicle to produce a controlled relative flow environment. A measurement unit was attached to the kite for data acquisition. To predict the generated tether force, we collected input–output samples from a set of well-designed experimental runs to act as our labeled training data in a supervised machine learning setting. We then identified a set of key input parameters which were found to be consistent with our sensitivity analysis using Pearson input–output correlation metrics. Finally, we designed and tested the accuracy of a neural network, among other multivariate regression models. The quality metrics of our models show great promise in accurately predicting the tether force for new input/feature combinations and potentially guide new designs for optimal power generation.
Uncertainty pervades virtually every branch of science and engineering, and in many disciplines, the underlying phenomena can be modeled by partial differential equations (PDEs) with uncertain or random inputs. This work is motivated by risk-averse stochastic programming problems constrained by PDEs. These problems are posed in infinite dimensions, which leads to a significant increase in the scale of the (discretized) problem. In order to handle the inherent nonsmoothness of, for example, coherent risk measures and to exploit existing solution techniques for smooth, PDE-constrained optimization problems, we propose a variational smoothing technique called epigraphical (epi-)regularization. We investigate the effects of epi-regularization on the axioms of coherency and prove differentiability of the smoothed risk measures. In addition, we demonstrate variational convergence of the epi-regularized risk measures and prove the consistency of minimizers and first-order stationary points for the approximate risk-averse optimization problem. We conclude with numerical experiments confirming our theoretical results.
Remote Direct Memory Access (RDMA) is an increasingly important technology in high-performance computing (HPC). RDMA provides low-latency, high-bandwidth data transfer between compute nodes. Additionally, it does not require explicit synchronization with the destination processor. Eliminating unnecessary synchronization can significantly improve the communication performance of large-scale scientific codes. A long-standing challenge presented by RDMA communication is mitigating the cost of registering memory with the network interface controller (NIC). Reusing memory once it is registered has been shown to significantly reduce the cost of RDMA communication. However, existing approaches for reusing memory rely on implicit memory semantics. In this paper, we introduce an approach that makes memory reuse semantics explicit by exposing a separate allocator for registered memory. The data and analysis in this paper yield the following contributions: (i) managing registered memory explicitly enables efficient reuse of registered memory; (ii) registering large memory regions to amortize the registration cost over multiple user requests can significantly reduce cost of acquiring new registered memory; and (iii) reducing the cost of acquiring registered memory can significantly improve the performance of RDMA communication. Reusing registered memory is key to high-performance RDMA communication. By making reuse semantics explicit, our approach has the potential to improve RDMA performance by making it significantly easier for programmers to efficiently reuse registered memory.
Generally, scientific simulations load the entire simulation domain into memory because most, if not all, of the data changes with each time step. This has driven application structures that have, in turn, affected the design of popular IO libraries, such as HDF-5, ADIOS, and NetCDF. This assumption makes sense for many cases, but there is also a significant collection of simulations where this approach results in vast swaths of unchanged data written each time step.This paper explores a new IO approach that is capable of stitching together a coherent global view of the total simulation space at any given time. This benefit is achieved with no performance penalty compared to running with the full data set in memory, at a radically smaller process requirement, and results in radical data reduction with no fidelity loss. Additionally, the structures employed enable online simulation monitoring.
Data movement is a significant and growing consumer of energy in modern systems, from specialized low-power accelerators to GPUs with power budgets in the hundreds of Watts. Given the importance of the problem, prior work has proposed designing interconnects on which the energy cost of transmitting a 0 is significantly lower than that of transmitting a 1. With such an interconnect, data movement energy is reduced by encoding the transmitted data such that the number of 1s is minimized. Although promising, these data encoding proposals do not take full advantage of application level semantics. As an example of a neglected optimization opportunity, consider the case of a dot product computation as part of a neural network inference task. The order in which the neural network weights are fetched and processed does not affect correctness, and can be optimized to further reduce data movement energy. This paper presents commutative data reordering (CDR), a hardware-software approach that leverages the commutative property in linear algebra to strategically select the order in which weight matrix coefficients are fetched from memory. To find a low-energy transmission order, weight ordering is modeled as an instance of one of two well-studied problems, the Traveling Salesman Problem and the Capacitated Vehicle Routing Problem. This reduction makes it possible to leverage the vast body of work on efficient approximation methods to find a good transmission order. CDR exploits the indirection inherent to sparse matrix formats such that no additional metadata is required to specify the selected order. The hardware modifications required to support CDR are minimal, and incur an area penalty of less than 0.01% when implemented on top of a mobile-class GPU. When applied to 7 neural network inference tasks running on a GPU-based system, CDR respectively reduces average DRAM IO energy by 53.1% and 22.2% over the data bus invert encoding scheme used by LPDDR4, and the recently proposed Base + XOR encoding. These savings are attained with no changes to the mobile system software and no runtime performance penalty.
As part of the Department of Energy response to the novel coronavirus pandemic of 2020, a modeling effort was sponsored by the DOE Office of Science. One task of this modeling effort at Sandia was to develop a model to predict medical resource needs given various patient arrival scenarios. Resources needed include personnel resources (nurses, ICU nurses, physicians, respiratory therapists), fixed resources (regular or ICU beds and ventilators), and consumable resources (masks, gowns, gloves, face shields, sedatives). This report documents the uncertainty analysis that was performed on the resource model. The uncertainty analysis involved sampling 26 input parameters to the model. The sampling was performed conditional on the patient arrival streams that also were inputs to the model. These patient arrival streams were derived from various epidemiology models and had a significant effect on the projected resource needs. In this report, we document the sampling approach, the parameter ranges used, and the computational workflow necessary to perform large-scale uncertainty studies for every county and state in the United States.
Graph partitioning has been an important tool to partition the work among several processors to minimize the communication cost and balance the workload. While accelerator-based supercomputers are emerging to be the standard, the use of graph partitioning becomes even more important as applications are rapidly moving to these architectures. However, there is no scalable, distributed-memory, multi-GPU graph partitioner available for applications. We developed a spectral graph partitioner, Sphynx, using the portable, accelerator-friendly stack of the Trilinos framework. We use Sphnyx to systematically evaluate the various algorithmic choices in spectral partitioning with a focus on GPU performance. We perform those evaluations on irregular graphs, because state-of-the-art partitioners have the most difficulty on them. We demonstrate that Sphynx is up to 17x faster on GPUs compared to the case on CPUs, and up to 580x faster compared to a state-of-the-art multilevel partitioner. Sphynx provides a robust alternative for applications looking for a GPU-based partitioner.
A broad set of data science and engineering questions may be organized as graphs, providing a powerful means for describing relational data. Although experts now routinely compute graph algorithms on huge, unstructured graphs using high performance computing (HPC) or cloud resources, this practice hasn't yet broken into the mainstream. Such computations require great expertise, yet users often need rapid prototyping and development to quickly customize existing code. Toward that end, we are exploring the use of the Chapel programming language as a means of making some important graph analytics more accessible, examining the breadth of characteristics that would make for a productive programming environment, one that is expressive, performant, portable, and robust.
Bayesian optimization (BO) is an effective surrogate-based method that has been widely used to optimize simulation-based applications. While the traditional Bayesian optimization approach only applies to single-fidelity models, many realistic applications provide multiple levels of fidelity with various levels of computational complexity and predictive capability. In this work, we propose a multi-fidelity Bayesian optimization method for design applications with both known and unknown constraints. The proposed framework, called sMF-BO-2CoGP, is built on a multi-level CoKriging method to predict the objective function. An external binary classifier, which we approximate using a separate CoKriging model, is used to distinguish between feasible and infeasible regions. Finally, the sMF-BO-2CoGP method is demonstrated using a series of analytical examples and a flip-chip application for design optimization to minimize the deformation due to warping under thermal loading conditions.
We propose a multilevel approach for trace systems resulting from hybridized discontinuous Galerkin (HDG) methods. The key is to blend ideas from nested dissection, domain decomposition, and high-order characteristic of HDG discretizations. Specifically, we first create a coarse solver by eliminating and/or limiting the front growth in nested dissection. This is accomplished by projecting the trace data into a sequence of same or high-order polynomials on a set of increasingly h-coarser edges/faces. We then combine the coarse solver with a block-Jacobi fine scale solver to form a two-level solver/preconditioner. Numerical experiments indicate that the performance of the resulting two-level solver/preconditioner depends on the smoothness of the solution and can offer significant speedups and memory savings compared to the nested dissection direct solver. While the proposed algorithms are developed within the HDG framework, they are applicable to other hybrid(ized) high-order finite element methods. Moreover, we show that our multilevel algorithms can be interpreted as a multigrid method with specific intergrid transfer and smoothing operators. With several numerical examples from Poisson, pure transport, and convection-diffusion equations we demonstrate the robustness and scalability of the algorithms with respect to solution order. While scalability with mesh size in general is not guaranteed and depends on the smoothness of the solution and the type of equation, improving it is a part of future work.
Here, we study orthogonal polynomials with respect to self-similar measures, focusing on the class of infinite Bernoulli convolutions, which are defined by iterated function systems with overlaps, especially those defined by the Pisot, Garsia, and Salem numbers. By using an algorithm of Mantica, we obtain graphs of the coefficients of the 3-term recursion relation defining the orthogonal polynomials. We use these graphs to predict whether the singular infinite Bernoulli convolutions belong to the Nevai class. Based on our numerical results, we conjecture that all infinite Bernoulli convolutions with contraction ratios greater than or equal to 1/2 belong to Nevai’s class, regardless of the probability weights assigned to the self-similar measures.
This report describes the results of a seven day effort to assist subject matter experts address a problem related to COVID-19. In the course of this effort, we analyzed the 29K documents provided as part of the White House's call to action. This involved applying a variety of natural language processing techniques and compression-based analytics in combination with visualization techniques and assessment with subject matter experts to pursue answers to a specific question. In this paper, we will describe the algorithms, the software, the study performed, and availability of the software developed during the effort.
Non-volatile memory arrays can deploy pre-trained neural network models for edge inference. However, these systems are affected by device-level noise and retention issues. Here, we examine damage caused by these effects, introduce a mitigation strategy, and demonstrate its use in fabricated array of SONOS (Silicon-Oxide-Nitride-Oxide-Silicon) devices. On MNIST, fashion-MNIST, and CIFAR-10 tasks, our approach increases resilience to synaptic noise and drift. We also show strong performance can be realized with ADCs of 5-8 bits precision.
The ExaLearn miniGAN team (Ellis and Rajamanickam) have released miniGAN, a generative adversarial network(GAN) proxy application, through the ECP proxy application suite. miniGAN is the first machine learning proxy application in the suite (note: the ECP CANDLE project did previously release some benchmarks) and models the performance for training generator and discriminator networks. The GAN's generator and discriminator generate plausible 2D/3D maps and identify fake maps, respectively. miniGAN aims to be a proxy application for related applications in cosmology (CosmoFlow, ExaGAN) and wind energy (ExaWind). miniGAN has been developed so that optimized mathematical kernels (e.g., kernels provided by Kokkos Kernels) can be plugged into to the proxy application to explore potential performance improvements. miniGAN has been released as open source software and is available through the ECP proxy application website (https://proxyapps.exascaleproject.ordecp-proxy-appssuite/) and on GitHub (https://github.com/SandiaMLMiniApps/miniGAN). As part of this release, a generator is provided to generate a data set (series of images) that are inputs to the proxy application.