Publications

Results 26–50 of 210

Towards Use of Mixed Precision in ECP Math Libraries [Exascale Computing Project]

Anzt, Hartwig A.; Boman, Erik G.; Gates, Mark G.; Kruger, Scott E.; Li, Sherry L.; Loe, Jennifer A.; Osei-Kuffuor, Daniel O.; Tomov, Stan T.; Tsai, Yaohung M.; Meier Yang, Ulrike M.

The use of multiple types of precision in mathematical software has the potential to increase its performance on new heterogeneous architectures. The xSDK project focuses on both the investigation and development of multiprecision algorithms and their inclusion into xSDK member libraries. This report summarizes current efforts on including and/or using mixed precision capabilities in the math libraries Ginkgo, heFFTe, hypre, MAGMA, PETSc/TAO, SLATE, SuperLU, and Trilinos, including KokkosKernels. It contains both numerical results from libraries that already provide mixed precision capabilities and descriptions of the strategies to incorporate multiprecision into established libraries.

TYPE Other Report YEAR 2020

OSTI DOI

Distributed Memory Graph Coloring Algorithms for Multiple GPUs

Proceedings of IA3 2020: 10th Workshop on Irregular Applications: Architectures and Algorithms, Held in conjunction with SC 2020: The International Conference for High Performance Computing, Networking, Storage and Analysis

Bogle, Ian; Boman, Erik G.; Devine, Karen D.; Rajamanickam, Sivasankaran R.; Slota, George M.

Graph coloring is often used in parallelizing scientific computations that run in distributed and multi-GPU environments; it identifies sets of independent data that can be updated in parallel. Many algorithms exist for graph coloring on a single GPU or in distributed memory, but hybrid MPI+GPU algorithms have been unexplored until this work, to the best of our knowledge. We present several MPI+GPU coloring approaches that use implementations of the distributed coloring algorithms of Gebremedhin et al. and the shared-memory algorithms of Deveci et al. The on-node parallel coloring uses implementations in KokkosKernels, which provide parallelization for both multicore CPUs and GPUs. We further extend our approaches to solve for distance-2 coloring, giving the first known distributed and multi-GPU algorithm for this problem. In addition, we propose novel methods to reduce communication in distributed graph coloring. Our experiments show that our approaches operate efficiently on inputs too large to fit on a single GPU and scale up to graphs with 76.7 billion edges running on 128 GPUs.

TYPE Conference Paper YEAR 2020

Scopus OSTI

Mixed-Precision GMRES in Trilinos

Loe, Jennifer A.; Glusa, Christian A.; Yamazaki, Ichitaro Y.; Boman, Erik G.; Rajamanickam, Sivasankaran R.

Abstract not provided.

TYPE Conference Presentation YEAR 2020

OSTI DOI

Multiprecision Krylov Solvers in Trilinos

Loe, Jennifer A.; Glusa, Christian A.; Boman, Erik G.; Yamazaki, Ichitaro Y.; Rajamanickam, Sivasankaran R.

Abstract not provided.

TYPE Presentation YEAR 2020

OSTI

Multiprecision GMRES in Trilinos packages Belos and Kokkos

Loe, Jennifer A.; Glusa, Christian A.; Boman, Erik G.; Yamazaki, Ichitaro Y.; Rajamanickam, Sivasankaran R.

Abstract not provided.

TYPE Conference Presentation YEAR 2020

OSTI DOI

Distributed Graph Coloring on Multiple GPUs

Bogle, Ian A.; Boman, Erik G.; Devine, Karen D.; Rajamanickam, Sivasankaran R.; Slota, George M.

Abstract not provided.

TYPE Conference Presentation YEAR 2020

OSTI DOI

Distributed Memory Graph Coloring Algorithms for Multiple GPUs

Bogle, Ian A.; Boman, Erik G.; Devine, Karen D.; Rajamanickam, Sivasankaran R.; Slota, George M.

Abstract not provided.

TYPE Conference Poster YEAR 2020

OSTI

SPHYNX: Spectral partitioning for HYbrid and aXelerator-enabled systems

Proceedings - 2020 IEEE 34th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2020

Acer, Seher A.; Boman, Erik G.; Rajamanickam, Sivasankaran R.

Graph partitioning has been an important tool to partition the work among several processors to minimize the communication cost and balance the workload. While accelerator-based supercomputers are emerging to be the standard, the use of graph partitioning becomes even more important as applications are rapidly moving to these architectures. However, there is no scalable, distributed-memory, multi-GPU graph partitioner available for applications. We developed a spectral graph partitioner, Sphynx, using the portable, accelerator-friendly stack of the Trilinos framework. We use Sphynx to systematically evaluate the various algorithmic choices in spectral partitioning with a focus on GPU performance. We perform those evaluations on irregular graphs, because state-of-the-art partitioners have the most difficulty on them. We demonstrate that Sphynx is up to 17x faster on GPUs compared to the case on CPUs, and up to 580x faster compared to a state-of-the-art multilevel partitioner. Sphynx provides a robust alternative for applications looking for a GPU-based partitioner.

TYPE Conference Poster YEAR 2020

Scopus OSTI

SPHYNX: Spectral Partitioning for HYbrid aNd aXelerator-based systems

Acer, Seher A.; Boman, Erik G.; Rajamanickam, Sivasankaran R.

Abstract not provided.

TYPE Conference Poster YEAR 2020

OSTI

Low-synchronization orthogonalization schemes for s-step and pipelined Krylov solvers in Trilinos

Yamazaki, Ichitaro Y.; Hoemmen, Mark F.; Boman, Erik G.; Elliott, James E.; Thomas, Stephen T.; Swirydowicz, Katarzyna S.

Abstract not provided.

TYPE Conference Poster YEAR 2020

OSTI DOI

CLOVER

Boman, Erik G.; Anzt, Hartwig A.; Dongarra, Jack D.; Gates, Mark G.; Rajamanickam, Sivasankaran R.; Tomov, Stan T.

Abstract not provided.

TYPE Conference Poster YEAR 2020

OSTI

ExaGraph: Parallel Partitioning and Coloring for Exascale Applications

Acer, Seher A.; Boman, Erik G.; Rajamanickam, Sivasankaran R.; Bogle, Ian A.; Slota, George M.

Abstract not provided.

TYPE Conference Poster YEAR 2020

OSTI

Mixed-precision preconditioned Krylov solvers with Trilinos

Yamazaki, Ichitaro Y.; Boman, Erik G.; Rajamanickam, Sivasankaran R.

Abstract not provided.

TYPE Conference Poster YEAR 2020

OSTI

2D Block Cyclic Partitioning for Sparse Matrices

Acer, Seher A.; Boman, Erik G.; Aykanat, Cevdet A.

Abstract not provided.

TYPE Conference Poster YEAR 2020

OSTI

Polynomial Preconditioned GMRES in Trilinos: Practical Considerations for High Performance Computing

Loe, Jennifer A.; Thornquist, Heidi K.; Boman, Erik G.

Abstract not provided.

TYPE Conference Poster YEAR 2020

OSTI DOI

An algebraic sparsified nested dissection algorithm using low-rank approximations

SIAM Journal on Matrix Analysis and Applications

Cambier, Leopold; Chen, Chao; Boman, Erik G.; Rajamanickam, Sivasankaran R.; Tuminaro, Raymond S.; Darve, Eric

We propose a new algorithm for the fast solution of large, sparse, symmetric positive-definite linear systems, spaND (sparsified Nested Dissection). It is based on nested dissection, sparsification, and low-rank compression. After eliminating all interiors at a given level of the elimination tree, the algorithm sparsifies all separators corresponding to the interiors. This operation reduces the size of the separators by eliminating some degrees of freedom but without introducing any fill-in. This is done at the expense of a small and controllable approximation error. The result is an approximate factorization that can be used as an efficient preconditioner. We then perform several numerical experiments to evaluate this algorithm. We demonstrate that a version using orthogonal factorization and block-diagonal scaling takes fewer CG iterations to converge than previous similar algorithms on various kinds of problems. Furthermore, this algorithm is provably guaranteed to never break down and the matrix stays symmetric positive-definite throughout the process. We evaluate the algorithm on some large problems and show it exhibits near-linear scaling. The factorization time is roughly O(N), and the number of iterations grows slowly with N.

TYPE Journal Article YEAR 2020

Scopus OSTI DOI

ExaGraph: Combinatorial Methods for Enabling Exascale Applications

Halappanavar, Mahantesh H.; Buluc, Aydin B.; Boman, Erik G.; Pothen, Alex P.; Tumeo, Antonino T.; Khan, Arif K.; Minutoli, Marco M.; Tallent, Nathan T.; Gawande, Nitin G.; Ekanayake, Saliya E.; Ghosh, Sayan G.; Acer, Seher A.; Rajamanickam, Sivasankaran R.; Ferdous, S.M.F.

Abstract not provided.

TYPE Conference Poster YEAR 2020

OSTI

Low-synch Gram Schmidt orthogonalization schemes for s-step and pipelined Krylov solvers in Trilinos

Yamazaki, Ichitaro Y.; Thomas, Stephen T.; Hoemmen, Mark F.; Boman, Erik G.; Swirydowicz, Katarzyna S.

Abstract not provided.

TYPE Conference Poster YEAR 2019

OSTI

Polynomial Preconditioned GMRES in Trilinos: Practical Considerations for High-Performance Computing

Loe, Jennifer A.; Thornquist, Heidi K.; Boman, Erik G.

Abstract not provided.

TYPE Conference Poster YEAR 2019

OSTI DOI

LDRD Final Report: Fast and Robust Linear Solvers based on Hierarchical Matrices

Boman, Erik G.; Darve, Eric D.; Lehoucq, Richard B.; Rajamanickam, Sivasankaran R.; Tuminaro, Raymond S.; Yamazaki, Ichitaro Y.

Abstract not provided.

TYPE Other Report YEAR 2019

OSTI DOI

A robust hierarchical solver for ill-conditioned systems with applications to ice sheet modeling

Journal of Computational Physics

Chen, Chao; Cambier, Leopold; Boman, Erik G.; Rajamanickam, Sivasankaran R.; Tuminaro, Raymond S.; Darve, Eric

A hierarchical solver is proposed for solving sparse ill-conditioned linear systems in parallel. The solver is based on a modification of the LoRaSp method, but employs a deferred-compression technique, which provably reduces the approximation error and significantly improves efficiency. Moreover, the deferred-compression technique introduces minimal overhead and does not affect parallelism. As a result, the new solver achieves linear computational complexity under mild assumptions and excellent parallel scalability. To demonstrate the performance of the new solver, we focus on applying it to solve sparse linear systems arising from ice sheet modeling. The strong anisotropic phenomena associated with the thin structure of ice sheets create serious challenges for existing solvers. To address the anisotropy, we additionally developed a customized partitioning scheme for the solver, which captures the strong-coupling direction accurately. In general, the partitioning can be computed algebraically with existing software packages, and thus the new solver is generalizable for solving other sparse linear systems. Our results show that ice sheet problems of about 300 million degrees of freedom have been solved in just a few minutes using 1024 processors.

TYPE Journal Article YEAR 2019

Scopus OSTI DOI

Hierarchical Low-rank Preconditioners and Solvers for Linear Systems from PDEs

Abstract not provided.

TYPE Conference Poster YEAR 2019

OSTI

Polynomial Preconditioning for Avoiding Communication in GMRES

Loe, Jennifer A.; Thornquist, Heidi K.; Boman, Erik G.

Abstract not provided.

TYPE Conference Poster YEAR 2019

OSTI

SpaND: An Algebraic Sparsified Nested Dissection Algorithm Using Low-Rank Approximations

Cambier, Leopold C.; Chen, Chao C.; Boman, Erik G.; Rajamanickam, Sivasankaran R.; Tuminaro, Raymond S.; Darve, Eric D.

Abstract not provided.

TYPE Conference Poster YEAR 2019

OSTI

SpaND: An Algebraic Sparsified Nested Dissection Algorithm Using Low-Rank Approximations

Boman, Erik G.; Cambier, Leopold C.; Chen, Chao C.; Darve, Eric D.; Rajamanickam, Sivasankaran R.; Tuminaro, Raymond S.

Abstract not provided.

TYPE Conference Poster YEAR 2019

OSTI
