Entanglement Enhanced Matterwave Interferometry
Leibniz International Proceedings in Informatics, LIPIcs
We study a trajectory analysis problem we call the Trajectory Capture Problem (TCP), in which, for a given input set T of trajectories in the plane, and an integer k ≥ 2, we seek to compute a set of k points ("portals") to maximize the total weight of all subtrajectories of T between pairs of portals. This problem naturally arises in trajectory analysis and summarization. We show that the TCP is NP-hard (even in very special cases) and give some first approximation results. Our main focus is on attacking the TCP with practical algorithm-engineering approaches, including integer linear programming (to solve instances to provable optimality) and local search methods. We study the integrality gap arising from such approaches. We analyze our methods on different classes of data, including benchmark instances that we generate. Our goal is to understand the best-performing heuristics, based on both solution time and solution quality. We demonstrate that we are able to compute provably optimal solutions for real-world instances. 2012 ACM Subject Classification: Theory of computation → Design and analysis of algorithms.
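To make the local-search idea above concrete, the sketch below selects k portal points from the trajectory vertices by repeatedly swapping one portal for a candidate vertex and keeping improving swaps. The objective used here (total length of trajectory edges whose endpoints are both portals) is a simplified stand-in and not the paper's exact TCP weight function or its ILP model; all names are illustrative.

```python
# Illustrative local-search skeleton for a portal-selection problem of the
# kind described above. The objective is a hypothetical stand-in, not the
# paper's TCP objective.
import math, random

def captured_weight(trajectories, portals):
    """Toy objective: total length of edges whose endpoints are both portals."""
    pset = set(portals)
    total = 0.0
    for traj in trajectories:
        for p, q in zip(traj, traj[1:]):
            if p in pset and q in pset:
                total += math.dist(p, q)
    return total

def local_search(trajectories, k, iters=1000, seed=0):
    rng = random.Random(seed)
    candidates = sorted({pt for traj in trajectories for pt in traj})
    portals = rng.sample(candidates, k)
    best = captured_weight(trajectories, portals)
    for _ in range(iters):
        out_idx = rng.randrange(k)              # portal to swap out
        new_pt = rng.choice(candidates)         # candidate to swap in
        if new_pt in portals:
            continue
        trial = portals[:out_idx] + [new_pt] + portals[out_idx + 1:]
        w = captured_weight(trajectories, trial)
        if w > best:                            # keep improving swaps only
            portals, best = trial, w
    return portals, best

# Example usage with two short trajectories given as vertex lists.
T = [[(0, 0), (1, 0), (2, 0)], [(0, 1), (1, 0), (2, 1)]]
print(local_search(T, k=2, iters=200))
```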
Computer Methods in Applied Mechanics and Engineering
In this paper, we continue our efforts to exploit optimization and control ideas as a common foundation for the development of property-preserving numerical methods. Here we focus on a class of scalar advection equations whose solutions have fixed mass in a given Eulerian region and constant bounds in any Lagrangian volume. Our approach separates discretization of the equations from the preservation of their solution properties by treating the latter as optimization constraints. This relieves the discretization process from having to comply with additional restrictions and makes stability and accuracy the sole considerations in its design. A property-preserving solution is then sought as a state that minimizes the distance to an optimally accurate but not property-preserving target solution computed by the scheme, subject to constraints enforcing discrete proxies of the desired properties. Furthermore, we consider two such formulations in which the optimization variables are given by the nodal solution values and suitably defined nodal fluxes, respectively. A key result of the paper reveals that a standard Algebraic Flux Correction (AFC) scheme is a modified version of the second formulation obtained by shrinking its feasible set to a hypercube. In conclusion, we present numerical studies illustrating the optimization-based formulations and comparing them with AFC.
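As a rough illustration of the optimization-based viewpoint, the sketch below repairs a target solution by finding the closest state (in the discrete l2 sense) that satisfies a mass constraint and box bounds. The lumped-mass weights, bounds, and use of SciPy's SLSQP solver are assumptions for illustration; this is not the paper's formulation or its AFC connection.

```python
# Minimal sketch of an optimization-based "repair" step: find the state closest
# to a given target solution, subject to a discrete mass constraint and box
# bounds. Variable names and solver choice are illustrative assumptions.
import numpy as np
from scipy.optimize import minimize

def enforce_properties(u_target, m, lo, hi):
    """Return u minimizing ||u - u_target||^2 s.t. sum(m*u) = sum(m*u_target)
    and lo <= u <= hi elementwise."""
    mass = float(m @ u_target)
    res = minimize(
        lambda u: 0.5 * np.sum((u - u_target) ** 2),
        x0=np.clip(u_target, lo, hi),                      # bound-feasible start
        jac=lambda u: u - u_target,
        bounds=[(lo, hi)] * len(u_target),
        constraints=[{"type": "eq", "fun": lambda u: m @ u - mass}],
        method="SLSQP",
    )
    return res.x

# Target with over- and undershoots outside the admissible range [0, 1].
u_t = np.array([0.2, 1.3, 0.9, -0.1])
m = np.ones_like(u_t)                                       # equal nodal masses
print(enforce_properties(u_t, m, lo=0.0, hi=1.0))
```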
Optimization Letters
We extend and improve recent results given by Singh and Watson on using classical bounds on the union of sets in a chance-constrained optimization problem. Specifically, we revisit the so-called Dawson and Sankoff bound that provided one of the best approximations of a chance constraint in the previous analysis. We then show that our work generalizes the previous work and that the inequality employed previously is, in fact, a very relaxed approximation resting on assumptions that do not generally hold. Computational results demonstrate an average improvement of over 43% in the bounds. As a byproduct, we provide an exact reformulation of the floor function in optimization models.
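For reference, the sketch below evaluates the Dawson and Sankoff degree-two lower bound on the probability of a union from the first two binomial moments, as the bound is commonly stated; the paper's generalization, improved bounds, and floor-function reformulation are not reproduced here, and the toy event probabilities are made up.

```python
# Dawson-Sankoff degree-two lower bound on P(A_1 U ... U A_n), as commonly
# stated: with S1 = sum_i P(A_i), S2 = sum_{i<j} P(A_i and A_j), and
# k = 1 + floor(2*S2/S1),
#   P(union) >= 2/(k+1) * S1 - 2/(k*(k+1)) * S2.
import itertools
import math

def dawson_sankoff_lower_bound(S1, S2):
    if S1 <= 0:
        return 0.0
    k = 1 + math.floor(2.0 * S2 / S1)
    return 2.0 * S1 / (k + 1) - 2.0 * S2 / (k * (k + 1))

# Toy example: three events with made-up marginals, pairwise independence assumed.
p = [0.3, 0.4, 0.2]                          # marginal probabilities P(A_i)
pairs = {(i, j): p[i] * p[j]                 # assume P(A_i and A_j) = P(A_i)P(A_j)
         for i, j in itertools.combinations(range(3), 2)}
S1, S2 = sum(p), sum(pairs.values())
print(dawson_sankoff_lower_bound(S1, S2))    # ~0.64, vs. true union prob ~0.664
```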
Journal of Computational Physics
Mimetic methods discretize divergence by restricting the Gauss theorem to mesh cells. Because point clouds lack such geometric entities, construction of a compatible meshfree divergence remains a challenge. In this work, we define an abstract Meshfree Mimetic Divergence (MMD) operator on point clouds by contraction of field and virtual face moments. This MMD satisfies a discrete divergence theorem, provides a discrete local conservation principle, and is first-order accurate. We consider two MMD instantiations. The first one assumes a background mesh and uses generalized moving least squares (GMLS) to obtain the necessary field and face moments. This MMD instance is appropriate for settings where a mesh is available but its quality is insufficient for a robust and accurate mesh-based discretization. The second MMD operator retains the GMLS field moments but defines virtual face moments using computationally efficient weighted graph-Laplacian equations. This MMD instance does not require a background grid and is appropriate for applications where mesh generation creates a computational bottleneck. It allows one to trade an expensive mesh generation problem for a scalable algebraic one, without sacrificing compatibility with the divergence operator. We demonstrate the approach by using the MMD operator to obtain a virtual finite-volume discretization of conservation laws on point clouds. Numerical results in the paper confirm the mimetic properties of the method and show that it behaves similarly to standard finite volume methods.
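The sketch below illustrates only the GMLS ingredient mentioned above: at each point of a cloud, a linear polynomial is fit to neighboring samples by weighted least squares and its derivatives are combined into a pointwise divergence. This is a generic moving-least-squares derivative reconstruction for intuition, not the mimetic (conservative) MMD construction with virtual face moments; the weight function and neighborhood radius are arbitrary choices.

```python
# Generic moving-least-squares divergence on a 2D point cloud: fit a linear
# polynomial at each point by weighted least squares, read off du/dx and dv/dy.
import numpy as np

def gmls_divergence(points, u, v, radius):
    """points: (N,2) array; u, v: (N,) samples of the vector field components."""
    div = np.zeros(len(points))
    for i, xi in enumerate(points):
        d = np.linalg.norm(points - xi, axis=1)
        nbr = np.where(d < radius)[0]
        dx = points[nbr] - xi                        # local coordinates
        P = np.column_stack([np.ones(len(nbr)), dx[:, 0], dx[:, 1]])
        w = (1.0 - d[nbr] / radius) ** 4             # simple compact weight
        W = np.diag(w)
        A = P.T @ W @ P
        cu = np.linalg.solve(A, P.T @ W @ u[nbr])    # [u, du/dx, du/dy] at xi
        cv = np.linalg.solve(A, P.T @ W @ v[nbr])    # [v, dv/dx, dv/dy] at xi
        div[i] = cu[1] + cv[2]
    return div

# The field (u, v) = (x, y) has divergence 2 everywhere.
rng = np.random.default_rng(0)
pts = rng.random((200, 2))
print(gmls_divergence(pts, pts[:, 0], pts[:, 1], radius=0.25)[:5])
```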
Journal of Computational Physics
We utilize generalized moving least squares (GMLS) to develop meshfree techniques for discretizing hydrodynamic flow problems on manifolds. We use exterior calculus to formulate incompressible hydrodynamic equations in the Stokesian regime and handle the divergence-free constraints via a generalized vector potential. This provides less coordinate-centric descriptions and enables the development of efficient numerical methods and splitting schemes for the fourth-order governing equations in terms of a system of second-order elliptic operators. Using a Hodge decomposition, we develop methods for manifolds having spherical topology. We show the methods exhibit high-order convergence rates for solving hydrodynamic flows on curved surfaces. The methods also provide general high-order approximations for the metric, curvature, and other geometric quantities of the manifold and associated exterior calculus operators. The approaches also can be utilized to develop high-order solvers for other scalar-valued and vector-valued problems on manifolds.
Frontiers in Computational Neuroscience
Historically, neuroscience principles have heavily influenced artificial intelligence (AI), for example the influence of the perceptron model, essentially a simple model of a biological neuron, on artificial neural networks. More recently, notable AI advances, for example the growing popularity of reinforcement learning, often appear more aligned with cognitive neuroscience or psychology, focusing on function at a relatively abstract level. At the same time, neuroscience stands poised to enter a new era of large-scale, high-resolution data and appears more focused on underlying neural mechanisms or architectures that can, at times, seem rather removed from functional descriptions. While this might seem to foretell a new generation of AI approaches arising from a deeper exploration of neuroscience specifically for AI, the most direct path for achieving this is unclear. Here we discuss cultural differences between the two fields, including divergent priorities that should be considered when leveraging modern-day neuroscience for AI. For example, the two fields feed two very different applications that at times require potentially conflicting perspectives. We highlight small but significant cultural shifts that we feel would greatly facilitate increased synergy between the two fields.
Proceedings - 2020 IEEE 34th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2020
A broad set of data science and engineering questions may be organized as graphs, providing a powerful means for describing relational data. Although experts now routinely compute graph algorithms on huge, unstructured graphs using high performance computing (HPC) or cloud resources, this practice hasn't yet broken into the mainstream. Such computations require great expertise, yet users often need rapid prototyping and development to quickly customize existing code. Toward that end, we are exploring the use of the Chapel programming language as a means of making some important graph analytics more accessible, examining the breadth of characteristics that would make for a productive programming environment, one that is expressive, performant, portable, and robust.
Journal of Computational Physics
We describe and analyze a variance reduction approach for Monte Carlo (MC) sampling that accelerates the estimation of statistics of computationally expensive simulation models using an ensemble of models with lower cost. These lower cost models — which are typically lower fidelity with unknown statistics — are used to reduce the variance in statistical estimators relative to an MC estimator with equivalent cost. We derive the conditions under which our proposed approximate control variate framework recovers existing multifidelity variance reduction schemes as special cases. We demonstrate that existing recursive/nested strategies are suboptimal because they use the additional low-fidelity models only to efficiently estimate the unknown mean of the first low-fidelity model. As a result, they cannot achieve variance reduction beyond that of a control variate estimator that uses a single low-fidelity model with known mean. However, there often exists about an order-of-magnitude gap between the maximum achievable variance reduction using all low-fidelity models and that achieved by a single low-fidelity model with known mean. We show that our proposed approach can exploit this gap to achieve greater variance reduction by using non-recursive sampling schemes. The proposed strategy reduces the total cost of accurately estimating statistics, especially in cases where only low-fidelity simulation models are accessible for additional evaluations. Several analytic examples and an example with a hyperbolic PDE describing elastic wave propagation in heterogeneous media are used to illustrate the main features of the methodology.
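A minimal sketch of the control-variate mechanism discussed above follows: a classical control variate needs the exact mean of the low-fidelity model, while an approximate control variate replaces it with an estimate obtained from extra cheap samples. The toy models, sample sizes, and single-low-fidelity setting are illustrative assumptions and do not reflect the paper's estimators or optimal sample allocation.

```python
# Toy approximate control variate: estimate E[f_hi] using a cheap correlated
# surrogate f_lo whose mean is itself estimated from extra low-fidelity samples.
import numpy as np

rng = np.random.default_rng(1)
f_hi = lambda x: np.sin(x) + 0.1 * x**2          # "expensive" model (made up)
f_lo = lambda x: np.sin(x)                       # cheap, correlated surrogate

N, M = 100, 10_000                               # expensive / extra cheap samples
x = rng.uniform(0, 1, N)
x_extra = rng.uniform(0, 1, M)

yh, yl = f_hi(x), f_lo(x)
c = np.cov(yh, yl)
alpha = c[0, 1] / c[1, 1]                        # variance-minimizing weight

mc_estimate = yh.mean()
mu_lo_hat = f_lo(x_extra).mean()                 # estimated (unknown) mean of f_lo
acv_estimate = yh.mean() + alpha * (mu_lo_hat - yl.mean())

print(mc_estimate, acv_estimate)
```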
Arxiv
Proceedings - 2020 IEEE 34th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2020
Graph partitioning has been an important tool to partition the work among several processors to minimize the communication cost and balance the workload. While accelerator-based supercomputers are emerging to be the standard, the use of graph partitioning becomes even more important as applications are rapidly moving to these architectures. However, there is no scalable, distributed-memory, multi-GPU graph partitioner available for applications. We developed a spectral graph partitioner, Sphynx, using the portable, accelerator-friendly stack of the Trilinos framework. We use Sphynx to systematically evaluate the various algorithmic choices in spectral partitioning with a focus on GPU performance. We perform those evaluations on irregular graphs, because state-of-the-art partitioners have the most difficulty on them. We demonstrate that Sphynx is up to 17x faster on GPUs than on CPUs, and up to 580x faster than a state-of-the-art multilevel partitioner. Sphynx provides a robust alternative for applications looking for a GPU-based partitioner.
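For orientation, the sketch below shows the basic spectral bisection step that spectral partitioners build on: form the graph Laplacian, compute the Fiedler vector, and split vertices at its median. This is textbook spectral partitioning with a dense eigensolver on a toy graph, not Sphynx or its GPU/Trilinos implementation.

```python
# Textbook spectral bisection: split a graph using the Fiedler vector of its
# combinatorial Laplacian L = D - A.
import numpy as np
import scipy.sparse as sp

def spectral_bisection(adj):
    """adj: symmetric SciPy sparse adjacency matrix with no self loops."""
    deg = np.asarray(adj.sum(axis=1)).ravel()
    lap = sp.diags(deg) - adj                      # combinatorial Laplacian
    vals, vecs = np.linalg.eigh(lap.toarray())     # dense eigensolve, fine for a toy
    fiedler = vecs[:, np.argsort(vals)[1]]         # second-smallest eigenvalue
    return fiedler >= np.median(fiedler)           # boolean part assignment

# Toy graph: a 6-cycle, which should split into two contiguous arcs of 3 vertices.
rows = np.arange(6)
adj = sp.coo_matrix((np.ones(6), (rows, (rows + 1) % 6)), shape=(6, 6))
adj = ((adj + adj.T) > 0).astype(float).tocsr()
print(spectral_bisection(adj))
```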
Proceedings - International Symposium on Computer Architecture
Data movement is a significant and growing consumer of energy in modern systems, from specialized low-power accelerators to GPUs with power budgets in the hundreds of watts. Given the importance of the problem, prior work has proposed designing interconnects on which the energy cost of transmitting a 0 is significantly lower than that of transmitting a 1. With such an interconnect, data movement energy is reduced by encoding the transmitted data such that the number of 1s is minimized. Although promising, these data encoding proposals do not take full advantage of application-level semantics. As an example of a neglected optimization opportunity, consider the case of a dot product computation as part of a neural network inference task. The order in which the neural network weights are fetched and processed does not affect correctness, and can be optimized to further reduce data movement energy. This paper presents commutative data reordering (CDR), a hardware-software approach that leverages the commutative property in linear algebra to strategically select the order in which weight matrix coefficients are fetched from memory. To find a low-energy transmission order, weight ordering is modeled as an instance of one of two well-studied problems, the Traveling Salesman Problem and the Capacitated Vehicle Routing Problem. This reduction makes it possible to leverage the vast body of work on efficient approximation methods to find a good transmission order. CDR exploits the indirection inherent to sparse matrix formats such that no additional metadata is required to specify the selected order. The hardware modifications required to support CDR are minimal, and incur an area penalty of less than 0.01% when implemented on top of a mobile-class GPU. When applied to 7 neural network inference tasks running on a GPU-based system, CDR reduces average DRAM IO energy by 53.1% and 22.2%, respectively, over the data bus invert encoding scheme used by LPDDR4 and the recently proposed Base + XOR encoding. These savings are attained with no changes to the mobile system software and no runtime performance penalty.
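The toy sketch below illustrates the reordering idea in isolation: treat each quantized weight word as a TSP "city", use the Hamming distance between consecutively transmitted words as a stand-in cost for transmission energy, and order the fetches with a greedy nearest-neighbor heuristic. The 8-bit words, cost model, and heuristic are simplifications; CDR's actual energy model, solvers, and sparse-format indirection are not reproduced here.

```python
# Greedy nearest-neighbor ordering of weight words to reduce a toy
# bus-transition cost (Hamming distance between consecutive words).
import random

def hamming(a, b):
    return bin(a ^ b).count("1")

def transition_cost(order, words):
    return sum(hamming(words[a], words[b]) for a, b in zip(order, order[1:]))

def nearest_neighbor_order(words):
    remaining = set(range(len(words)))
    order = [0]                                    # start from the first word
    remaining.remove(0)
    while remaining:
        last = order[-1]
        nxt = min(remaining, key=lambda j: hamming(words[last], words[j]))
        order.append(nxt)
        remaining.remove(nxt)
    return order

random.seed(0)
weights = [random.randrange(256) for _ in range(64)]   # 8-bit weight words
baseline = list(range(len(weights)))
reordered = nearest_neighbor_order(weights)
print(transition_cost(baseline, weights), transition_cost(reordered, weights))
```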
Proceedings - 2020 IEEE 34th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2020
Remote Direct Memory Access (RDMA) is an increasingly important technology in high-performance computing (HPC). RDMA provides low-latency, high-bandwidth data transfer between compute nodes. Additionally, it does not require explicit synchronization with the destination processor. Eliminating unnecessary synchronization can significantly improve the communication performance of large-scale scientific codes. A long-standing challenge presented by RDMA communication is mitigating the cost of registering memory with the network interface controller (NIC). Reusing memory once it is registered has been shown to significantly reduce the cost of RDMA communication. However, existing approaches for reusing memory rely on implicit memory semantics. In this paper, we introduce an approach that makes memory reuse semantics explicit by exposing a separate allocator for registered memory. The data and analysis in this paper yield the following contributions: (i) managing registered memory explicitly enables efficient reuse of registered memory; (ii) registering large memory regions to amortize the registration cost over multiple user requests can significantly reduce the cost of acquiring new registered memory; and (iii) reducing the cost of acquiring registered memory can significantly improve the performance of RDMA communication. Reusing registered memory is key to high-performance RDMA communication. By making reuse semantics explicit, our approach has the potential to improve RDMA performance by making it significantly easier for programmers to efficiently reuse registered memory.
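The mock below illustrates the explicit reuse semantics in concept: large slabs are "registered" once through a stand-in for an expensive NIC registration call, user requests are carved out of registered slabs, and freed chunks are recycled on free lists so the registration cost is amortized. It is a conceptual Python illustration only, not the paper's allocator or a real RDMA API.

```python
# Conceptual mock of an explicit registered-memory pool. `register_region` is a
# stand-in for an expensive NIC registration call (e.g. an ibv_reg_mr-style
# operation); nothing here touches real RDMA hardware.
SLAB_SIZE = 1 << 20                                  # register 1 MiB at a time

def register_region(size):
    """Stand-in for an expensive memory-registration call."""
    register_region.calls += 1
    return bytearray(size)
register_region.calls = 0

class RegisteredPool:
    def __init__(self):
        self.slabs, self.offset, self.free_lists = [], SLAB_SIZE, {}

    def alloc(self, size):
        if self.free_lists.get(size):                # reuse a previously freed chunk
            return self.free_lists[size].pop()
        if self.offset + size > SLAB_SIZE:           # need a new registered slab
            self.slabs.append(register_region(SLAB_SIZE))
            self.offset = 0
        chunk = (len(self.slabs) - 1, self.offset, size)
        self.offset += size
        return chunk

    def free(self, chunk):
        self.free_lists.setdefault(chunk[2], []).append(chunk)

pool = RegisteredPool()
bufs = [pool.alloc(4096) for _ in range(8)]
for b in bufs:
    pool.free(b)
bufs = [pool.alloc(4096) for _ in range(8)]          # served from the free list
print("registration calls:", register_region.calls)  # one slab registration total
```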
Proceedings - 2020 IEEE 34th International Parallel and Distributed Processing Symposium, IPDPS 2020
Generally, scientific simulations load the entire simulation domain into memory because most, if not all, of the data changes with each time step. This has driven application structures that have, in turn, affected the design of popular IO libraries, such as HDF-5, ADIOS, and NetCDF. This assumption makes sense for many cases, but there is also a significant collection of simulations where this approach results in vast swaths of unchanged data written each time step. This paper explores a new IO approach that is capable of stitching together a coherent global view of the total simulation space at any given time. This benefit is achieved with no performance penalty compared to running with the full data set in memory, requires radically fewer processes, and results in substantial data reduction with no fidelity loss. Additionally, the structures employed enable online simulation monitoring.
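The toy sketch below captures the stitching idea: each time step writes only the blocks that changed, and a coherent global view at any step is reconstructed by taking each block's most recent version at or before that step. The class and its in-memory bookkeeping are illustrative; they are not the paper's IO library, metadata scheme, or file layout.

```python
# Toy versioned block store: write only changed blocks per step, then stitch a
# coherent global view for any requested step.
class StitchedStore:
    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.versions = {}                         # (block_id, step) -> data

    def write_step(self, step, changed_blocks):
        """changed_blocks: dict mapping block_id -> new block data."""
        for bid, data in changed_blocks.items():
            self.versions[(bid, step)] = data

    def global_view(self, step):
        """Stitch the latest version of every block at or before `step`."""
        view = []
        for bid in range(self.num_blocks):
            steps = [s for (b, s) in self.versions if b == bid and s <= step]
            view.append(self.versions[(bid, max(steps))] if steps else None)
        return view

store = StitchedStore(num_blocks=4)
store.write_step(0, {0: "a0", 1: "b0", 2: "c0", 3: "d0"})   # initial full write
store.write_step(1, {1: "b1"})                              # only block 1 changed
store.write_step(2, {3: "d2"})                              # only block 3 changed
print(store.global_view(2))                                 # ['a0', 'b1', 'c0', 'd2']
```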