Publications

Results 26–50 of 55

Search results

Jump to search filters

Experimental Evaluation of Multiprecision Strategies for GMRES on GPUs

2021 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2021 - In conjunction with IEEE IPDPS 2021

Loe, Jennifer A.; Glusa, Christian A.; Yamazaki, Ichitaro Y.; Boman, Erik G.; Rajamanickam, Sivasankaran R.

Support for lower precision computation is becoming more common in accelerator hardware due to lower power usage, reduced data movement and increased computational performance. However, computational science and engineering (CSE) problems require double precision accuracy in several domains. This conflict between hardware trends and application needs has resulted in a need for multiprecision strategies at the linear algebra algorithms level if we want to exploit the hardware to its full potential while meeting the accuracy requirements. In this paper, we focus on preconditioned sparse iterative linear solvers, a key kernel in several CSE applications. We present a study of multiprecision strategies for accelerating this kernel on GPUs. We seek the best methods for incorporating multiple precisions into the GMRES linear solver; these include iterative refinement and parallelizable preconditioners. Our work presents strategies to determine when multiprecision GMRES will be effective and to choose parameters for a multiprecision iterative refinement solver to achieve better performance. We use an implementation that is based on the Trilinos library and employs Kokkos Kernels for performance portability of linear algebra kernels. Performance results demonstrate the promise of multiprecision approaches and demonstrate even further improvements are possible by optimizing low-level kernels.

More Details

ExaWind: Exascale Predictive Wind Plant Flow Physics Modeling

Sprague, Michael; Ananthan, Shreyas; Binyahib, Roba; Brazell, Michael; De Frahan, Marc H.; King, Ryan A.; Mullowney, Paul; Rood, Jon; Sharma, Ashesh; Thomas, Stephen A.; Vijayakumar, Ganesh; Crozier, Paul C.; Berger-Vergiat, Luc B.; Cheung, Lawrence C.; Dement, David C.; deVelder, Nathaniel d.; Glaze, D.J.; Hu, Jonathan J.; Knaus, Robert C.; Lee, Dong H.; Matula, Neil M.; Okusanya, Tolulope O.; Overfelt, James R.; Rajamanickam, Sivasankaran R.; Sakievich, Philip S.; Smith, Timothy A.; Vo, Johnathan V.; Williams, Alan B.; Yamazaki, Ichitaro Y.; Turner, William J.; Prokopenko, Andrey; Wilson, Robert V.; Moser, Robert; Melvin, Jeremy; Sitaraman, Jay

Abstract not provided.

Performance Portable Supernode-based Sparse Triangular Solver for Manycore Architectures

ACM International Conference Proceeding Series

Yamazaki, Ichitaro Y.; Rajamanickam, Sivasankaran R.; Ellingwood, Nathan D.

Sparse triangular solver is an important kernel in many computational applications. However, a fast, parallel, sparse triangular solver on a manycore architecture such as GPU has been an open issue in the field for several years. In this paper, we develop a sparse triangular solver that takes advantage of the supernodal structures of the triangular matrices that come from the direct factorization of a sparse matrix. We implemented our solver using Kokkos and Kokkos Kernels such that our solver is portable to different manycore architectures. This has the additional benefit of allowing our triangular solver to use the team-level kernels and take advantage of the hierarchical parallelism available on the GPU. We compare the effects of different scheduling schemes on the performance and also investigate an algorithmic variant called the partitioned inverse. Our performance results on an NVIDIA V100 or P100 GPU demonstrate that our implementation can be 12.4 × or 19.5 × faster than the vendor optimized implementation in NVIDIA's CuSPARSE library.

More Details

Compare linear-system solver and preconditioner stacks with emphasis on GPU performance and propose phase-2 NGP solver development pathway

Hu, Jonathan J.; Berger-Vergiat, Luc B.; Thomas, Stephen; Swirydowicz, Kasia; Yamazaki, Ichitaro Y.; Mullowney, Paul; Rajamanickam, Sivasankaran R.; Sitaraman, Jay; Sprague, Michael

The goal of the ExaWind project is to enable predictive simulations of wind farms comprised of many megawatt-scale turbines situated in complex terrain. Predictive simulations will require computational fluid dynamics (CFD) simulations for which the mesh resolves the geometry of the turbines and captures the rotation and large deflections of blades. Whereas such simulations for a single turbine are arguably petascale class, multi-turbine wind farm simulations will require exascale-class resources. The primary physics codes in the ExaWind project are Nalu-Wind, which is an unstructured-grid solver for the acoustically incompressible Navier-Stokes equations, and OpenFAST, which is a whole-turbine simulation code. The Nalu-Wind model consists of the mass-continuity Poisson-type equation for pressure and a momentum equation for the velocity. For such modeling approaches, simulation times are dominated by linear-system setup and solution for the continuity and momentum systems. For the ExaWind challenge problem, the moving meshes greatly affect overall solver costs as reinitialization of matrices and recomputation of preconditioners is required at every time step. In this report we evaluated GPU-performance baselines for the linear solvers in the Trilinos and hypre solver stacks using two representative Nalu-Wind simulations: an atmospheric boundary layer precursor simulation on a structured mesh, and a fixed-wing simulation using unstructured overset meshes. Both strong-scaling and weak-scaling experiments were conducted on the OLCF supercomputer Summit and similar proxy clusters. We focused on the performance of multi-threaded Gauss-Seidel and two-stage Gauss-Seidel that are extensions of classical Gauss-Seidel; of one-reduce GMRES, a communication-reducing variant of the Krylov GMRES; and algebraic multigrid methods that incorporate the afore-mentioned methods. The team has established that AMG methods are capable of solving linear systems arising from the fixed-wing overset meshes on CPU, a critical intermediate result for ExaWind FY20 Q3 and Q4 milestones. For the fixed-wing strong-scaling study (model with 3M grid-points), the team identified that Nalu-Wind simulations with the new Trilinos and hypre solvers scale to modest GPU counts, maintaining above 70% efficiency up to 6 GPUs. However, there still remain significant bottlenecks to performance: matrix assembly (hypre), AMG setup (hypre and Trilinos) In the weak-scaling experiments (going from 0.4M to 211M gridpoints), it's shown that the solver apply phases are faster on GPUs, but that Nalu-Wind simulation times grow, primarily due to the multigrid-setup process. Finally, based on the report outcomes, we propose a linear solver path-forward for the remainder of the ExaWind project. Near term, the NREL team will continue their work on GPU-based linear-system assembly. They will also investigate how the use of alternatives to the NVIDIA UVM (unified virtual memory) paradigm affects performance. Longer term, the NREL team will evaluate algorithmic performance on other types of accelerators and merge their improvements back to the main hypre repository branch. Near term, the Trilinos team will address performance bottlenecks identified in this milestone, such as implementing a GPU-based segregated momentum solve and reusing matrix graphs across linear-system assembly phases. Longer term, the Trilinos team will do detailed analysis and optimization of multigrid setup.

More Details

Preparing sparse solvers for exascale computing

Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences

Heroux, Michael A.; Anzt, Hartwig; Boman, Erik G.; Falgout, Rob; Ghysels, Pieter; Li, Xiaoye; Mcinnes, Lois C.; Mills, Richard T.; Rajamanickam, Sivasankaran R.; Rupp, Karl; Smith, Bryce B.; Yamazaki, Ichitaro Y.; Yang, Ulrike M.

Sparse solvers provide essential functionality for a wide variety of scientific applications. Highly parallel sparse solvers are essential for continuing advances in high-fidelity, multi-physics and multi-scale simulations, especially as we target exascale platforms. This paper describes the challenges, strategies and progress of the US Department of Energy Exascale Computing project towards providing sparse solvers for exascale computing platforms. We address the demands of systems with thousands of high-performance node devices where exposing concurrency, hiding latency and creating alternative algorithms become essential. The efforts described here are works in progress, highlighting current success and upcoming challenges. This article is part of a discussion meeting issue 'Numerical algorithms for high-performance computational science'.

More Details

ExaWind: Exascale Predictive Wind Plant Flow Physics Modeling

Sprague, M.; Ananthan, S.; Brazell, M.; Glaws, A.; De Frahan, M.; King, R.; Natarajan, M.; Rood, J.; Sharma, A.; Sirydowicz, K.; Thomas, S.; Vijaykumar, G.; Yellapantula, S.; Crozier, Paul C.; Berger-Vergiat, Luc B.; Cheung, Lawrence C.; Glaze, D.J.; Hu, Jonathan J.; Knaus, Robert C.; Lee, Dong H.; Okusanya, Tolulope O.; Overfelt, James R.; Rajamanickam, Sivasankaran R.; Sakievich, Philip S.; Smith, Timothy A.; Vo, Johnathan V.; Williams, Alan B.; Yamazaki, Ichitaro Y.; Turner, J.; Prokopenko, A.; Wilson, R.; Moser, R.; Melvin, J.; Sitaraman, J.

Abstract not provided.

Fast and Robust Linear Solvers based on Hierarchical Matrices (LDRD Final Report)

Boman, Erik G.; Darve, Eric; Lehoucq, Richard B.; Rajamanickam, Sivasankaran R.; Tuminaro, Raymond S.; Yamazaki, Ichitaro Y.

This report is the final report for the LDRD project "Fast and Robust Linear Solvers using Hierarchical Matrices". The project was a success. We developed two novel algorithms for solving sparse linear systems. We demonstrated their effectiveness on ill-conditioned linear systems from ice sheet simulations. We showed that in many cases, we can obtain near-linear scaling. We believe this approach has strong potential for difficult linear systems and should be considered for other Sandia and DOE applications. We also report on some related research activities in dense solvers and randomized linear algebra.

More Details
Results 26–50 of 55
Results 26–50 of 55