The Kokkos EcoSystem
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
IEEE Computer Architecture Letters
Neural network (NN) inference is an essential part of modern systems and is found at the heart of numerous applications ranging from image recognition to natural language processing. In situ NN accelerators can efficiently perform NN inference using resistive crossbars, which makes them a promising solution to the data movement challenges faced by conventional architectures. Although such accelerators demonstrate significant potential for dense NNs, they often do not benefit from sparse NNs, which contain relatively few non-zero weights. Processing sparse NNs on in situ accelerators results in wasted energy to charge the entire crossbar where most elements are zeros. To address this limitation, this letter proposes Granular Matrix Reordering (GMR): a preprocessing technique that enables an energy-efficient computation of sparse NNs on in situ accelerators. GMR reorders the rows and columns of sparse weight matrices to maximize the crossbars' utilization and minimize the total number of crossbars needed to be charged. The reordering process does not rely on sparsity patterns and incurs no accuracy loss. Overall, GMR achieves an average of 28 percent and up to 34 percent reduction in energy consumption over seven pruned NNs across four different pruning methods and network architectures.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Computer Methods in Applied Mechanics and Engineering
Standard approaches for uncertainty quantification in cardiovascular modeling pose challenges due to the large number of uncertain inputs and the significant computational cost of realistic three-dimensional simulations. We propose an efficient uncertainty quantification framework utilizing a multilevel multifidelity Monte Carlo (MLMF) estimator to improve the accuracy of hemodynamic quantities of interest while maintaining reasonable computational cost. This is achieved by leveraging three cardiovascular model fidelities, each with varying spatial resolution to rigorously quantify the variability in hemodynamic outputs. We employ two low-fidelity models (zero- and one-dimensional) to construct several different estimators. Our goal is to investigate and compare the efficiency of estimators built from combinations of these two low-fidelity model alternatives and our high-fidelity three-dimensional models. We demonstrate this framework on healthy and diseased models of aortic and coronary anatomy, including uncertainties in material property and boundary condition parameters. Our goal is to demonstrate that for this application it is possible to accelerate the convergence of the estimators by utilizing a MLMF paradigm. Therefore, we compare our approach to single fidelity Monte Carlo estimators and to a multilevel Monte Carlo approach based only on three-dimensional simulations, but leveraging multiple spatial resolutions. We demonstrate significant, on the order of 10 to 100 times, reduction in total computational cost with the MLMF estimators. We also examine the differing properties of the MLMF estimators in healthy versus diseased models, as well as global versus local quantities of interest. As expected, global quantities such as outlet pressure and flow show larger reductions than local quantities, such as those relating to wall shear stress, as the latter rely more heavily on the highest fidelity model evaluations. Similarly, healthy models show larger reductions than diseased models. In all cases, our workflow coupling Dakota's MLMF estimators with the SimVascular cardiovascular modeling framework makes uncertainty quantification feasible for constrained computational budgets.
Proceedings of the ACM SIGMOD International Conference on Management of Data
Given an input stream of size N, a †-heavy hitter is an item that occurs at least † N times in S. The problem of finding heavy-hitters is extensively studied in the database literature. We study a real-time heavy-hitters variant in which an element must be reported shortly after we see its T = † N-th occurrence (and hence becomes a heavy hitter). We call this the Timely Event Detection (TED) Problem. The TED problem models the needs of many real-world monitoring systems, which demand accurate (i.e., no false negatives) and timely reporting of all events from large, high-speed streams, and with a low reporting threshold (high sensitivity). Like the classic heavy-hitters problem, solving the TED problem without false-positives requires large space (ω(N) words). Thus in-RAM heavy-hitters algorithms typically sacrifice accuracy (i.e., allow false positives), sensitivity, or timeliness (i.e., use multiple passes). We show how to adapt heavy-hitters algorithms to external memory to solve the TED problem on large high-speed streams while guaranteeing accuracy, sensitivity, and timeliness. Our data structures are limited only by I/O-bandwidth (not latency) and support a tunable trade-off between reporting delay and I/O overhead. With a small bounded reporting delay, our algorithms incur only a logarithmic I/O overhead. We implement and validate our data structures empirically using the Firehose streaming benchmark. Multi-threaded versions of our structures can scale to process 11M observations per second before becoming CPU bound. In comparison, a naive adaptation of the standard heavy-hitters algorithm to external memory would be limited by the storage device's random I/O throughput, i.e., ∼100K observations per second.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Journal of Computational Physics
In this work, a stabilized continuous Galerkin (CG) method for magnetohydrodynamics (MHD) is presented. Ideal, compressible inviscid MHD equations are discretized in space on unstructured meshes using piecewise linear or bilinear finite element bases to get a semi-discrete scheme. Stabilization is then introduced to the semi-discrete method in a strategy that follows the algebraic flux correction paradigm. This involves adding some artificial diffusion to the high order, semi-discrete method and mass lumping in the time derivative term. The result is a low order method that provides local extremum diminishing properties for hyperbolic systems. The difference between the low order method and the high order method is scaled element-wise using a limiter and added to the low order scheme. The limiter is solution dependent and computed via an iterative linearity preserving nodal variation limiting strategy. The stabilization also involves an optional consistent background high order dissipation that reduces phase errors. The resulting stabilized scheme is a semi-discrete method that can be applied to inviscid shock MHD problems and may be even extended to resistive and viscous MHD problems. To satisfy the divergence free constraint of the MHD equations, we add parabolic divergence cleaning to the system. Various time integration methods can be used to discretize the scheme in time. We demonstrate the robustness of the scheme by solving several shock MHD problems.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Leibniz International Proceedings in Informatics, LIPIcs
We study a trajectory analysis problem we call the Trajectory Capture Problem (TCP), in which, for a given input set T of trajectories in the plane, and an integer k-2, we seek to compute a set of k points ("portals") to maximize the total weight of all subtrajectories of T between pairs of portals. This problem naturally arises in trajectory analysis and summarization. We show that the TCP is NP-hard (even in very special cases) and give some first approximation results. Our main focus is on attacking the TCP with practical algorithm-engineering approaches, including integer linear programming (to solve instances to provable optimality) and local search methods. We study the integrality gap arising from such approaches. We analyze our methods on different classes of data, including benchmark instances that we generate. Our goal is to understand the best performing heuristics, based on both solution time and solution quality. We demonstrate that we are able to compute provably optimal solutions for real-world instances. 2012 ACM Subject Classification Theory of computation ! Design and analysis of algorithms.