Publications Search

Polynomial Preconditioned GMRES for GPU Computing

Loe, Jennifer A.; Boman, Erik G.

Abstract not provided.

TYPE Conference Presentation YEAR 2021

One-Synch CGS2 Algorithm in the Context of QR and Arnoldi (DCGS2)

Bielich, Daniel R.W.; Langou, Julien; Thomas, Stephen; Swirydowicz, Kasia; Yamazaki, Ichitaro; Boman, Erik G.

Abstract not provided.

TYPE Conference Presentation YEAR 2021

Exagraph: Combinatorial Methods for Enabling Exascale Science

Halappanavar, Mahantesh; Acer, Seher; Boman, Erik G.; Buluc, Aydin; Ekanayake, Saliya; Ferdous, S M; Gawande, Nitin; Ghosh, Sayan; Khan, Arif; Minutoli, Marco; Pothen, Alex; Rajamanickam, Sivasankaran; Selvitopi, Oguz; Tallent, Nathan; Tumeo, Antonio

Abstract not provided.

TYPE Presentation YEAR 2021

Sake: Solvers and Kernels for Exascale

Rajamanickam, Sivasankaran; Berger-Vergiat, Luc; Boman, Erik G.; Yamazaki, Ichitaro

Abstract not provided.

TYPE Conference Presentation YEAR 2021

PEEKS Overview

Boman, Erik G.; Bielich, Daniel R.W.; Loe, Jennifer A.; Yamazaki, Ichitaro

Abstract not provided.

TYPE Presentation YEAR 2021

Sake: Solvers and Kernels for Exascale

Rajamanickam, Sivasankaran; Berger-Vergiat, Luc; Yamazaki, Ichitaro; Boman, Erik G.

Abstract not provided.

TYPE Conference Poster YEAR 2021

ExaGraph: Partitioning and Coloring

Boman, Erik G.; Devine, Karen; Rajamanickam, Sivasankaran; Acer, Seher; Bogle, Ian; Slota, George; Madduri, Kamesh; Gilbert, Michael

Abstract not provided.

TYPE Presentation YEAR 2021

Polynomial Preconditioning in Trilinos

Loe, Jennifer A.; Boman, Erik G.

Abstract not provided.

TYPE Conference Presentation YEAR 2021

Multiprecision Krylov Solvers in Kokkos and Belos

Loe, Jennifer A.; Glusa, Christian; Yamazaki, Ichitaro; Boman, Erik G.; Rajamanickam, Sivasankaran

Abstract not provided.

TYPE Conference Presentation YEAR 2021

Hierarchical Low-rank Solvers for Large Sparse Linear Systems

Boman, Erik G.

Abstract not provided.

TYPE Conference Presentation YEAR 2021

Experimental Evaluation of Multiprecision Strategies for GMRES on GPUs

Loe, Jennifer A.; Glusa, Christian; Yamazaki, Ichitaro; Boman, Erik G.; Rajamanickam, Sivasankaran

Abstract not provided.

TYPE Conference Paper YEAR 2021

An Analog Preconditioner for Solving Linear Systems

Proceedings - International Symposium on High-Performance Computer Architecture

Feinberg, Benjamin; Wong, Ryan; Xiao, Tianyao P.; Rohan, Jacob N.; Boman, Erik G.; Marinella, Matthew; Agarwal, Sapan; Ipek, Engin

Over the past decade, as Moore's Law has slowed, the need for new forms of computation that can provide sustainable performance improvements has risen. A new method, called in situ computing, has shown great potential to accelerate matrix-vector multiplication (MVM), an important kernel for a diverse range of applications from neural networks to scientific computing. Existing in situ accelerators for scientific computing, however, have a significant limitation: these accelerators provide no acceleration for preconditioning, a key bottleneck in linear solvers and in scientific computing workflows. This paper enables in situ acceleration for state-of-the-art linear solvers by demonstrating how to use a new in situ matrix inversion accelerator for analog preconditioning. As existing techniques that enable high precision and scalability for in situ MVM are inapplicable to in situ matrix inversion, new techniques to compensate for circuit non-idealities are proposed. Additionally, a new approach to bit slicing that enables splitting operands across multiple devices without external digital logic is proposed. For scalability, this paper demonstrates how in situ matrix inversion kernels can work in tandem with existing domain decomposition techniques to accelerate the solutions of arbitrarily large linear systems. The analog kernel can be directly integrated into existing preconditioning workflows, leveraging several well-optimized numerical linear algebra tools to improve the behavior of the circuit. The result is an analog preconditioner that is more effective (up to 50% fewer iterations) than the widely used incomplete LU factorization preconditioner, ILU(0), while also reducing the energy and execution time of each approximate solve operation by 1025x and 105x, respectively.

TYPE Conference Presentation YEAR 2021

Scalable asynchronous domain decomposition solvers

SIAM Journal on Scientific Computing

Glusa, Christian; Boman, Erik G.; Chow, Edmond; Rajamanickam, Sivasankaran; Szyld, Daniel B.

Parallel implementations of linear iterative solvers generally alternate between phases of data exchange and phases of local computation. Increasingly large problem sizes and more heterogeneous compute architectures make load balancing and the design of low latency network interconnects that are able to satisfy the communication requirements of linear solvers very challenging tasks. In particular, global communication patterns such as inner products become increasingly limiting at scale. We explore the use of asynchronous communication based on one-sided Message Passing Interface primitives in the context of domain decomposition solvers. In particular, a scalable asynchronous two-level Schwarz method is presented. We discuss practical issues encountered in the development of a scalable solver and show experimental results obtained on a state-of-the-art supercomputer system that illustrate the benefits of asynchronous solvers in load balanced as well as load imbalanced scenarios. Using the novel method, we can observe speedups of up to four times over its classical synchronous equivalent.

TYPE Journal Article YEAR 2020

ECP ST Review: CLOVER PEEKS

Boman, Erik G.

Abstract not provided.

TYPE Presentation YEAR 2020

An Analog Preconditioner for Solving Linear Systems

Feinberg, Benjamin; Wong, Ryan; Xiao, Tianyao P.; Rohan, Jacob N.; Boman, Erik G.; Marinella, Matthew; Agarwal, Sapan; Ipek, Engin

Abstract not provided.

TYPE Conference Paper YEAR 2020

ECP Multiprecision Project Review Slides

Loe, Jennifer A.; Rajamanickam, Sivasankaran; Boman, Erik G.; Anzt, Hartwig

Abstract not provided.

TYPE Presentation YEAR 2020

Towards Use of Mixed Precision in ECP Math Libraries [Exascale Computing Project]

Anzt, Hartwig; Boman, Erik G.; Gates, Mark; Kruger, Scott; Li, Sherry; Loe, Jennifer A.; Osei-Kuffuor, Daniel; Tomov, Stan; Tsai, Yaohung M.; Meier Yang, Ulrike

The use of multiple types of precision in mathematical software has the potential to increase its performance on new heterogeneous architectures. The xSDK project focuses on both the investigation and development of multiprecision algorithms and their inclusion into xSDK member libraries. This report summarizes current efforts on including and/or using mixed precision capabilities in the math libraries Ginkgo, heFFTe, hypre, MAGMA, PETSc/TAO, SLATE, SuperLU, and Trilinos, including KokkosKernels. It contains both numerical results from libraries that already provide mixed precision capabilities and descriptions of the strategies to incorporate multiprecision into established libraries.

TYPE Other Report YEAR 2020

Mixed-Precision GMRES in Trilinos

Loe, Jennifer A.; Glusa, Christian; Yamazaki, Ichitaro; Boman, Erik G.; Rajamanickam, Sivasankaran

Abstract not provided.

TYPE Conference Presentation YEAR 2020

Multiprecision GMRES in Trilinos packages Belos and Kokkos

Loe, Jennifer A.; Glusa, Christian; Boman, Erik G.; Yamazaki, Ichitaro; Rajamanickam, Sivasankaran

Abstract not provided.

TYPE Conference Presentation YEAR 2020

Multiprecision Krylov Solvers in Trilinos

Loe, Jennifer A.; Glusa, Christian; Boman, Erik G.; Yamazaki, Ichitaro; Rajamanickam, Sivasankaran

Abstract not provided.

TYPE Presentation YEAR 2020

Distributed Memory Graph Coloring Algorithms for Multiple GPUs

Proceedings of IA3 2020: 10th Workshop on Irregular Applications: Architectures and Algorithms, Held in conjunction with SC 2020: The International Conference for High Performance Computing, Networking, Storage and Analysis

Bogle, Ian; Boman, Erik G.; Devine, Karen; Rajamanickam, Sivasankaran; Slota, George M.

Graph coloring is often used in parallelizing scientific computations that run in distributed and multi-GPU environments; it identifies sets of independent data that can be updated in parallel. Many algorithms exist for graph coloring on a single GPU or in distributed memory, but hybrid MPI+GPU algorithms have been unexplored until this work, to the best of our knowledge. We present several MPI+GPU coloring approaches that use implementations of the distributed coloring algorithms of Gebremedhin et al. and the shared-memory algorithms of Deveci et al. The on-node parallel coloring uses implementations in KokkosKernels, which provide parallelization for both multicore CPUs and GPUs. We further extend our approaches to solve for distance-2 coloring, giving the first known distributed and multi-GPU algorithm for this problem. In addition, we propose novel methods to reduce communication in distributed graph coloring. Our experiments show that our approaches operate efficiently on inputs too large to fit on a single GPU and scale up to graphs with 76.7 billion edges running on 128 GPUs.

TYPE Conference Paper YEAR 2020

Distributed Graph Coloring on Multiple GPUs

Bogle, Ian; Boman, Erik G.; Devine, Karen; Rajamanickam, Sivasankaran; Slota, George

Abstract not provided.

TYPE Conference Presentation YEAR 2020

Distributed Memory Graph Coloring Algorithms for Multiple GPUs

Bogle, Ian; Boman, Erik G.; Devine, Karen; Rajamanickam, Sivasankaran; Slota, George M.

Abstract not provided.

TYPE Conference Poster YEAR 2020

SPHYNX: Spectral partitioning for HYbrid and aXelerator-enabled systems

Proceedings - 2020 IEEE 34th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2020

Acer, Seher; Boman, Erik G.; Rajamanickam, Sivasankaran

Graph partitioning has been an important tool to partition the work among several processors to minimize the communication cost and balance the workload. As accelerator-based supercomputers emerge as the standard, the use of graph partitioning becomes even more important because applications are rapidly moving to these architectures. However, there is no scalable, distributed-memory, multi-GPU graph partitioner available for applications. We developed a spectral graph partitioner, Sphynx, using the portable, accelerator-friendly stack of the Trilinos framework. We use Sphynx to systematically evaluate the various algorithmic choices in spectral partitioning with a focus on GPU performance. We perform those evaluations on irregular graphs, because state-of-the-art partitioners have the most difficulty on them. We demonstrate that Sphynx is up to 17x faster on GPUs compared to the case on CPUs, and up to 580x faster compared to a state-of-the-art multilevel partitioner. Sphynx provides a robust alternative for applications looking for a GPU-based partitioner.

TYPE Conference Poster YEAR 2020
