Publications

Results 851–875 of 9,998

Large-Scale Trajectory Analysis via Feature Vectors

Foulk, James W.; Jones, Jessica L.; Newton, Benjamin D.; Wisniewski, Kyra L.; Wilson, Andrew T.; Ginaldi, Melissa J.; Waddell, Cleveland A.; Goss, Kenneth; Ward, Katrina J.

The explosion of both sensors and GPS-enabled devices has made position/time data the next big frontier for data analytics. However, many of the problems associated with large numbers of trajectories do not have a direct analog in historic big-data applications such as text and image analysis. Modern trajectory analytics draws on cutting-edge research in machine learning, statistics, computational geometry, and other disciplines. We show that doing trajectory analytics at scale requires fundamentally changing how the information is represented, through a feature-vector approach. We then demonstrate the ability to solve large trajectory analytics problems using this representation.
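
To make the feature-vector idea concrete, here is a minimal Python sketch of turning one trajectory of (time, x, y) samples into a fixed-length vector that off-the-shelf clustering or classification tools can consume at scale; the specific features below are illustrative assumptions, not the feature set used in the work.

# Hypothetical sketch: summarize a trajectory of (t, x, y) samples as a
# fixed-length feature vector so standard large-scale ML tooling
# (clustering, nearest-neighbor search, classification) can be applied.
import numpy as np

def trajectory_features(t, x, y):
    """Return a small feature vector for one trajectory.

    t, x, y are equal-length 1-D arrays sorted by time.
    The features here are illustrative, not the paper's.
    """
    dx, dy, dt = np.diff(x), np.diff(y), np.diff(t)
    step = np.hypot(dx, dy)                      # per-segment distance
    path_len = step.sum()                        # total distance traveled
    disp = np.hypot(x[-1] - x[0], y[-1] - y[0])  # straight-line displacement
    duration = t[-1] - t[0]
    speed = step / np.maximum(dt, 1e-9)
    return np.array([
        duration,
        path_len,
        disp,
        path_len / max(disp, 1e-9),              # sinuosity
        speed.mean(),
        speed.max(),
        (x.max() - x.min()) * (y.max() - y.min()),  # bounding-box area
    ])

# Example: one synthetic trajectory mapped to a 7-dimensional vector.
rng = np.random.default_rng(0)
t = np.cumsum(rng.uniform(0.5, 1.5, size=50))
x = np.cumsum(rng.normal(size=50))
y = np.cumsum(rng.normal(size=50))
print(trajectory_features(t, x, y))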

Parallel Solver Framework for Mixed-Integer PDE-Constrained Optimization

Phillips, Cynthia A.; Chatter, Michelle; Eckstein, Jonathan; Erturk, Alper; El-Kady, Ihab F.; Gerbe, Romain; Kouri, Drew P.; Loughlin, William; Reinke, Charles M.; Rokkam, Rohith; Ruzzene, Massimo; Sugino, Christopher; Swanson, Calvin; Van Bloemen Waanders, Bart

ROL-PEBBL is a C++, MPI-based parallel code for mixed-integer PDE-constrained optimization (MIPDECO). In these problems we wish to optimize (control, design, etc.) physical systems that must obey the laws of physics, when some of the decision variables must take integer values. ROL-PEBBL combines a code that efficiently searches over integer choices (PEBBL, the Parallel Enumeration Branch-and-Bound Library) with a code for efficient nonlinear optimization, including PDE-constrained optimization (ROL, the Rapid Optimization Library). In this report, we summarize the design of ROL-PEBBL and initial applications/results. For an artificial source-inversion problem, finding sources of pollution on a grid from sparse samples, ROL-PEBBL's solution for the finest grid gave the best optimization guarantee of any general solver that provides both a solution and a quality guarantee.
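
The layering the report describes, an outer branch-and-bound over the integer variables with a continuous optimizer solved at every node, can be sketched serially in a few dozen lines. The toy below is not the ROL or PEBBL API; it uses a made-up 0/1 least-squares "source placement" model and SciPy's bound-constrained optimizer in place of ROL, purely to show how the two layers interact.

# Minimal serial sketch (not ROL-PEBBL): branch-and-bound over 0/1 choices,
# with a continuous relaxation solved at each node to supply lower bounds.
import heapq
import numpy as np
from scipy.optimize import minimize

def solve_relaxation(A, b, fixed):
    """Continuous relaxation at one node: entries of z already fixed to 0 or 1
    are held there; the rest vary in [0, 1]. fixed[i] is NaN when i is free."""
    free = np.isnan(fixed)
    z = np.where(free, 0.5, fixed)
    if not free.any():
        r = A @ z - b
        return z, 0.5 * r @ r
    def fun_jac(zf):
        zz = z.copy()
        zz[free] = zf
        r = A @ zz - b
        return 0.5 * r @ r, (A.T @ r)[free]
    res = minimize(fun_jac, z[free], jac=True, method="L-BFGS-B",
                   bounds=[(0.0, 1.0)] * int(free.sum()))
    z[free] = res.x
    return z, res.fun

def branch_and_bound(A, b):
    """Outer search over integer (0/1) choices, pruned by relaxation bounds."""
    n = A.shape[1]
    best_z, best_val = None, np.inf
    heap = [(0.0, 0, np.full(n, np.nan))]      # (parent bound, tiebreak, fixed)
    tiebreak = 1
    while heap:
        bound, _, fixed = heapq.heappop(heap)
        if bound >= best_val:
            continue                            # prune: cannot beat incumbent
        z, val = solve_relaxation(A, b, fixed)
        if val >= best_val:
            continue
        frac = np.abs(z - np.round(z))
        if frac.max() < 1e-6:                   # relaxation already integral
            best_z, best_val = np.round(z), val
            continue
        j = int(np.argmax(frac))                # branch on most fractional var
        for v in (0.0, 1.0):
            child = fixed.copy()
            child[j] = v
            heapq.heappush(heap, (val, tiebreak, child))
            tiebreak += 1
    return best_z, best_val

rng = np.random.default_rng(1)
A = rng.normal(size=(8, 5))
z_true = (rng.random(5) < 0.5).astype(float)    # hidden 0/1 "source" pattern
b = A @ z_true
print(branch_and_bound(A, b), "true:", z_true)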

The DPG Method for the Convection-Reaction Problem Revisited

Demkowicz, Leszek; Roberts, Nathan V.

We study both conforming and non-conforming versions of the practical DPG method for the convection-reaction problem. We determine that the most common approach to DPG stability analysis (construction of a local Fortin operator) is infeasible for the convection-reaction problem. We then develop a line of argument based on the direct construction of a global Fortin operator; we find that employing a polynomial enrichment for the test space does not suffice for this purpose, motivating the introduction of a (two-element) subgrid mesh. The argument combines mathematical analysis with numerical experiments.
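
For readers unfamiliar with the setting, the following is a hedged sketch of the model problem and the ultraweak broken formulation underlying the DPG method; the notation is generic and may differ from the paper's exact choices.

\[
  \nabla\!\cdot(\beta u) + \mu u = f \quad \text{in } \Omega,
  \qquad
  u = u_0 \quad \text{on } \Gamma_- := \{\, x \in \partial\Omega : \beta\cdot n < 0 \,\}.
\]
Multiplying by a test function supported on a single element $K$ of the mesh $\mathcal{T}_h$ and integrating by parts gives the ultraweak formulation: find $u \in L^2(\Omega)$ and a skeleton unknown $\hat{u}$ (playing the role of the trace of $\beta\cdot n\,u$ on element boundaries) such that
\[
  b\big((u,\hat{u}),v\big) :=
  \sum_{K\in\mathcal{T}_h}
  \Big[ \big(u,\; \mu v - \beta\cdot\nabla v\big)_K
        + \langle \hat{u},\, v \rangle_{\partial K} \Big]
  = (f,v)
  \qquad \forall\, v \in V(\mathcal{T}_h),
\]
where $V(\mathcal{T}_h)$ is a broken (element-local) test space. The practical DPG method approximates the optimal test functions by inverting the test-space Riesz map only on a finite-dimensional enriched test space; a Fortin operator from $V(\mathcal{T}_h)$ onto that enriched space is the standard tool for showing that this approximation preserves discrete stability, which is exactly the step revisited here for convection-reaction.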

Codesign for the Masses

Lewis, Cannada; Hammond, Simon; Wilke, Jeremiah

In this position paper we address challenges and opportunities relating to the design and codesign of application-specific circuits. Given our background as computational scientists, our perspective is that of a highly motivated application developer rather than a career computer architect.

Using MLIR Framework for Codesign of ML Architectures Algorithms and Simulation Tools

Lewis, Cannada; Hughes, Clayton; Hammond, Simon; Rajamanickam, Sivasankaran

MLIR (Multi-Level Intermediate Representation) is an extensible compiler framework that supports high-level data structures and operation constructs. These higher-level code representations are particularly applicable to the artificial intelligence and machine learning (AI/ML) domain, allowing developers to more easily support upcoming heterogeneous AI/ML accelerators and to build flexible domain-specific compilers/frameworks with higher-level intermediate representations (IRs) and advanced compiler optimizations. Using MLIR within the LLVM compiler framework is expected to yield significant improvements in the quality of generated machine code, which in turn will result in improved performance and hardware efficiency.
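
As a conceptual illustration only (plain Python, not MLIR's actual dialects or API), the sketch below shows the pattern the abstract relies on: operations expressed at a high level are first optimized where their structure is visible, then progressively lowered toward hardware-level ops.

# Toy multi-level IR (invented for illustration, no MLIR dependency):
# a high-level dense op is fused with its bias-add while that structure is
# still visible, then lowered into loop-level and machine-level ops.
from dataclasses import dataclass, field

@dataclass
class Op:
    name: str                                   # e.g. "ml.dense", "loop.for", "hw.mac"
    attrs: dict = field(default_factory=dict)

def fuse_bias(ops):
    """High-level optimization: fold a following bias-add into the dense op,
    something that is awkward to recover once everything is loops."""
    out, i = [], 0
    while i < len(ops):
        if (ops[i].name == "ml.dense" and i + 1 < len(ops)
                and ops[i + 1].name == "ml.bias_add"):
            out.append(Op("ml.dense", dict(ops[i].attrs, fused_bias=True)))
            i += 2
        else:
            out.append(ops[i])
            i += 1
    return out

def lower_ml_to_loops(ops):
    """Lowering pass: rewrite high-level ML ops into explicit loop nests."""
    out = []
    for op in ops:
        if op.name == "ml.dense":               # dense layer -> loops + mac
            m, n, k = op.attrs["m"], op.attrs["n"], op.attrs["k"]
            out += [Op("loop.for", {"extent": m}),
                    Op("loop.for", {"extent": n}),
                    Op("loop.for", {"extent": k}),
                    Op("hw.mac", {}),            # multiply-accumulate body
                    Op("loop.end", {}), Op("loop.end", {}), Op("loop.end", {})]
        else:
            out.append(op)
    return out

program = [Op("ml.dense", {"m": 64, "n": 64, "k": 128}), Op("ml.bias_add", {})]
for stage in (fuse_bias, lower_ml_to_loops):     # run passes from high to low level
    program = stage(program)
print([op.name for op in program])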

An Analog Preconditioner for Solving Linear Systems

Proceedings - International Symposium on High-Performance Computer Architecture

Feinberg, Benjamin; Wong, Ryan; Xiao, Tianyao P.; Rohan, Jacob N.; Boman, Erik G.; Marinella, Matthew; Agarwal, Sapan; Ipek, Engin

Over the past decade, as Moore's Law has slowed, the need for new forms of computation that can provide sustainable performance improvements has risen. A new method, called in situ computing, has shown great potential to accelerate matrix-vector multiplication (MVM), an important kernel for a diverse range of applications from neural networks to scientific computing. Existing in situ accelerators for scientific computing, however, have a significant limitation: they provide no acceleration for preconditioning, a key bottleneck in linear solvers and in scientific computing workflows. This paper enables in situ acceleration for state-of-the-art linear solvers by demonstrating how to use a new in situ matrix-inversion accelerator for analog preconditioning. As existing techniques that enable high precision and scalability for in situ MVM are inapplicable to in situ matrix inversion, new techniques to compensate for circuit non-idealities are proposed. Additionally, a new approach to bit slicing that enables splitting operands across multiple devices without external digital logic is proposed. For scalability, this paper demonstrates how in situ matrix-inversion kernels can work in tandem with existing domain decomposition techniques to accelerate the solution of arbitrarily large linear systems. The analog kernel can be directly integrated into existing preconditioning workflows, leveraging several well-optimized numerical linear algebra tools to improve the behavior of the circuit. The result is an analog preconditioner that is more effective (up to 50% fewer iterations) than the widely used incomplete LU factorization preconditioner, ILU(0), while also reducing the energy and execution time of each approximate solve operation by 10^2.5x and 10^5x, respectively.
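
The solver side of that workflow, a Krylov method whose preconditioner applies an approximate inverse corrupted by noise standing in for analog non-idealities, can be mimicked entirely in software. The sketch below uses an invented matrix, noise level, and problem size and implies nothing about the actual hardware.

# Software-only sketch: GMRES preconditioned by a noisy approximate inverse,
# a stand-in for a crossbar-based matrix-inversion kernel. Matrix, noise
# model, and sizes are made up for illustration.
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import gmres, LinearOperator

rng = np.random.default_rng(0)
n = 200
A = diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n)).tocsr()  # 1-D Laplacian
b = rng.normal(size=n)

# Exact inverse corrupted by 5% multiplicative noise, mimicking circuit error.
M_noisy = np.linalg.inv(A.toarray()) * (1.0 + 0.05 * rng.normal(size=(n, n)))
M = LinearOperator((n, n), matvec=lambda r: M_noisy @ r, dtype=M_noisy.dtype)

iters = {"unpreconditioned": 0, "analog-style": 0}
def counter(key):
    def cb(_):                                  # called once per inner iteration
        iters[key] += 1
    return cb

x1, _ = gmres(A, b, callback=counter("unpreconditioned"), callback_type="pr_norm")
x2, _ = gmres(A, b, M=M, callback=counter("analog-style"), callback_type="pr_norm")
# The preconditioned solve needs far fewer iterations; the plain solve may
# even hit its iteration cap before reaching the default tolerance.
print(iters, np.linalg.norm(A @ x2 - b))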

DeACT: Architecture-Aware Virtual Memory Support for Fabric Attached Memory Systems

Proceedings - International Symposium on High-Performance Computer Architecture

Kommareddy, Vamsee R.; Hughes, Clayton; Hammond, Simon; Awad, Amro

The exponential growth of data has driven technology providers to develop new protocols, such as cache-coherent interconnects and memory-semantic fabrics, to help users and facilities leverage advances in memory technologies to satisfy growing memory and storage demands. Using these new protocols, fabric-attached memories (FAM) can be directly attached to a system interconnect and easily integrated with a variety of processing elements (PEs). Moreover, systems that support FAM can be smoothly upgraded and allow multiple PEs to share the FAM memory pools using well-defined protocols. The sharing of FAM between PEs allows efficient data sharing, improves memory utilization, reduces cost by allowing flexible integration of different PEs and memory modules from several vendors, and makes it easier to upgrade the system. One promising use case for FAMs is in high-performance computing (HPC) systems, where the underutilization of memory is a major challenge. However, adopting FAMs in HPC systems brings new challenges. In addition to cost, flexibility, and efficiency, one particular problem that requires rethinking is virtual memory support for security and performance. To address these challenges, this paper presents decoupled access control and address translation (DeACT), a novel virtual memory implementation that supports HPC systems equipped with FAM. Compared to the state-of-the-art two-level translation approach, DeACT achieves a speedup of up to 4.59x (1.8x on average) without compromising security. (Part of this work was done when Vamsee was working under the supervision of Amro Awad at UCF. Amro Awad is now with the ECE Department at NC State.)
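
To convey only the intuition behind decoupling (this is an invented toy latency model, not the DeACT design or its evaluation numbers), compare a scheme that serializes the access-control check and the translation lookup with one that overlaps them.

# Toy latency model, purely illustrative: per-access cycle counts are made up.
CHECK_CYCLES, TRANSLATE_CYCLES = 40, 60

def serialized(accesses):
    # Permission check must complete before the translation lookup starts.
    return accesses * (CHECK_CYCLES + TRANSLATE_CYCLES)

def decoupled(accesses):
    # Check and translation proceed in parallel; each access pays the slower of the two.
    return accesses * max(CHECK_CYCLES, TRANSLATE_CYCLES)

n = 1_000
print(f"serialized: {serialized(n)} cycles, decoupled: {decoupled(n)} cycles, "
      f"speedup ~{serialized(n) / decoupled(n):.2f}x")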

Data-driven learning of nonlocal physics from high-fidelity synthetic data

Computer Methods in Applied Mechanics and Engineering

You, Huaiqian; Yu, Yue; Trask, Nathaniel A.; Gulian, Mamikon; D'Elia, Marta

A key challenge to nonlocal models is the analytical complexity of deriving them from first principles, and frequently their use is justified a posteriori. In this work we extract nonlocal models from data, circumventing these challenges and providing data-driven justification for the resulting model form. Extracting data-driven surrogates is a major challenge for machine learning (ML) approaches, due to nonlinearities and lack of convexity — it is particularly challenging to extract surrogates which are provably well-posed and numerically stable. Our scheme not only yields a convex optimization problem, but also allows extraction of nonlocal models whose kernels may be partially negative while maintaining well-posedness even in small-data regimes. To achieve this, based on established nonlocal theory, we embed in our algorithm sufficient conditions on the non-positive part of the kernel that guarantee well-posedness of the learnt operator. These conditions are imposed as inequality constraints to meet the requisite conditions of the nonlocal theory. We demonstrate this workflow for a range of applications, including reproduction of manufactured nonlocal kernels; numerical homogenization of Darcy flow associated with a heterogeneous periodic microstructure; nonlocal approximation to high-order local transport phenomena; and approximation of globally supported fractional diffusion operators by truncated kernels.
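
In generic notation (a sketch of the problem class, not necessarily the paper's exact formulation), the workflow amounts to a constrained least-squares fit of a nonlocal operator to high-fidelity input/output pairs:

\[
  \mathcal{L}_K u(x) = \int_{B_\delta(x)} K(x,y)\,\bigl(u(y)-u(x)\bigr)\,dy,
  \qquad
  K(x,y) = \sum_{m=1}^{M} c_m\, \phi_m\bigl(|y-x|\bigr),
\]
\[
  \min_{c\in\mathbb{R}^M} \; \sum_{i=1}^{N} \bigl\| \mathcal{L}_K u_i - y_i \bigr\|^2
  \qquad \text{subject to} \qquad G c \le g,
\]
where $(u_i, y_i)$ are the high-fidelity input/output pairs, $\phi_m$ are fixed kernel basis functions, and the linear inequality constraints $G c \le g$ stand for the sufficient conditions on the non-positive part of $K$ that guarantee well-posedness of the learnt operator. Since $\mathcal{L}_K$ depends linearly on the coefficients $c$, the objective is a convex quadratic, so the learning problem remains a convex program even when the learnt kernel is partially negative.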
