Publications

Results 126–150 of 354

Search results

SPHYNX: Spectral partitioning for HYbrid and aXelerator-enabled systems

Proceedings - 2020 IEEE 34th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2020

Acer, Seher; Boman, Erik G.; Rajamanickam, Sivasankaran

Graph partitioning has been an important tool to partition the work among several processors to minimize the communication cost and balance the workload. While accelerator-based supercomputers are emerging as the standard, the use of graph partitioning becomes even more important as applications are rapidly moving to these architectures. However, there is no scalable, distributed-memory, multi-GPU graph partitioner available for applications. We developed a spectral graph partitioner, Sphynx, using the portable, accelerator-friendly stack of the Trilinos framework. We use Sphynx to systematically evaluate the various algorithmic choices in spectral partitioning with a focus on GPU performance. We perform those evaluations on irregular graphs, because state-of-the-art partitioners have the most difficulty on them. We demonstrate that Sphynx is up to 17x faster on GPUs than on CPUs, and up to 580x faster than a state-of-the-art multilevel partitioner. Sphynx provides a robust alternative for applications looking for a GPU-based partitioner.

ECP Report: Update on Proxy Applications and Vendor Interactions

Ang, Jim; Sweeney, Christine; Wolf, Michael; Ellis, John A.; Ghosh, Sayan; Kagawa, Ai; Huang, Yunzhi; Rajamanickam, Sivasankaran; Ramakrishnaiah, Vinay; Schram, Malachi; Yoo, Shinjae

The ExaLearn miniGAN team (Ellis and Rajamanickam) has released miniGAN, a generative adversarial network (GAN) proxy application, through the ECP proxy application suite. miniGAN is the first machine learning proxy application in the suite (note: the ECP CANDLE project did previously release some benchmarks) and models the performance of training generator and discriminator networks. The GAN's generator and discriminator generate plausible 2D/3D maps and identify fake maps, respectively. miniGAN aims to be a proxy application for related applications in cosmology (CosmoFlow, ExaGAN) and wind energy (ExaWind). miniGAN has been developed so that optimized mathematical kernels (e.g., kernels provided by Kokkos Kernels) can be plugged into the proxy application to explore potential performance improvements. miniGAN has been released as open source software and is available through the ECP proxy application website (https://proxyapps.exascaleproject.ordecp-proxy-appssuite/) and on GitHub (https://github.com/SandiaMLMiniApps/miniGAN). As part of this release, a generator is provided to generate a data set (a series of images) that serves as input to the proxy application.

Preparing sparse solvers for exascale computing

Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences

Heroux, Michael A.; Anzt, Hartwig; Boman, Erik G.; Falgout, Rob; Ghysels, Pieter; Li, Xiaoye; Mcinnes, Lois C.; Mills, Richard T.; Rajamanickam, Sivasankaran; Rupp, Karl; Smith, Bryce; Yamazaki, Ichitaro; Yang, Ulrike M.

Sparse solvers provide essential functionality for a wide variety of scientific applications. Highly parallel sparse solvers are essential for continuing advances in high-fidelity, multi-physics and multi-scale simulations, especially as we target exascale platforms. This paper describes the challenges, strategies and progress of the US Department of Energy Exascale Computing Project towards providing sparse solvers for exascale computing platforms. We address the demands of systems with thousands of high-performance node devices where exposing concurrency, hiding latency and creating alternative algorithms become essential. The efforts described here are works in progress, highlighting current successes and upcoming challenges. This article is part of a discussion meeting issue 'Numerical algorithms for high-performance computational science'.

How Robust Are Graph Neural Networks to Structural Noise?

Fox, James S.; Rajamanickam, Sivasankaran

Graph neural networks (GNNs) are an emerging model for learning graph embeddings and making predictions on graph-structured data. However, the robustness of graph neural networks is not yet well understood. In this work, we focus on node structural identity predictions, where a representative GNN model is able to achieve near-perfect accuracy. We also show, through a controlled dataset and set of experiments, that the same GNN model is not robust to the addition of structural noise. Finally, we show that under the right conditions, graph-augmented training is capable of significantly improving robustness to structural noise.

An algebraic sparsified nested dissection algorithm using low-rank approximations

SIAM Journal on Matrix Analysis and Applications

Cambier, Leopold; Boman, Erik G.; Rajamanickam, Sivasankaran; Tuminaro, Raymond S.; Darve, Eric

We propose a new algorithm for the fast solution of large, sparse, symmetric positive-definite linear systems, spaND (sparsified Nested Dissection). It is based on nested dissection, sparsification, and low-rank compression. After eliminating all interiors at a given level of the elimination tree, the algorithm sparsifies all separators corresponding to the interiors. This operation reduces the size of the separators by eliminating some degrees of freedom but without introducing any fill-in. This is done at the expense of a small and controllable approximation error. The result is an approximate factorization that can be used as an efficient preconditioner. We then perform several numerical experiments to evaluate this algorithm. We demonstrate that a version using orthogonal factorization and block-diagonal scaling takes fewer CG iterations to converge than previous similar algorithms on various kinds of problems. Furthermore, this algorithm is provably guaranteed to never break down and the matrix stays symmetric positive-definite throughout the process. We evaluate the algorithm on some large problems and show that it exhibits near-linear scaling. The factorization time is roughly O(N), and the number of iterations grows slowly with N.

FROSch: A Fast And Robust Overlapping Schwarz Domain Decomposition Preconditioner Based on Xpetra in Trilinos

Lecture Notes in Computational Science and Engineering

Heinlein, Alexander; Klawonn, Axel; Rajamanickam, Sivasankaran; Rheinbach, Oliver

This article describes a parallel implementation of a two-level overlapping Schwarz preconditioner with the GDSW (Generalized Dryja–Smith–Widlund) coarse space described in previous work [12, 10, 15] into the Trilinos framework; cf. [16]. The software is a significant improvement of a previous implementation [12]; see Sec. 4 for results on the improved performance.

A Portable SIMD Primitive Using Kokkos for Heterogeneous Architectures

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Sahasrabudhe, Damodar; Phipps, Eric T.; Rajamanickam, Sivasankaran; Berzins, Martin

As computer architectures are rapidly evolving (e.g. those designed for exascale), multiple portability frameworks have been developed to avoid new architecture-specific development and tuning. However, portability frameworks depend on compilers for auto-vectorization and may lack support for explicit vectorization on heterogeneous platforms. Alternatively, programmers can use intrinsics-based primitives to achieve more efficient vectorization, but the lack of a GPU back-end for these primitives makes such code non-portable. A unified, portable, Single Instruction Multiple Data (SIMD) primitive proposed in this work allows intrinsics-based vectorization on CPUs and many-core architectures such as Intel Knights Landing (KNL), and also facilitates Single Instruction Multiple Threads (SIMT) based execution on GPUs. This unified primitive, coupled with the Kokkos portability ecosystem, makes it possible to develop explicitly vectorized code that is portable across heterogeneous platforms. The new SIMD primitive is used on different architectures to test the performance boost against a hard-to-auto-vectorize baseline, to measure the overhead against an efficiently vectorized baseline, and to evaluate the new feature called the “logical vector length” (LVL). The SIMD primitive provides portability across CPUs and GPUs without any performance degradation being observed experimentally.

ExaWind: Exascale Predictive Wind Plant Flow Physics Modeling

Sprague, M.; Ananthan, S.; Brazell, M.; Glaws, A.; De Frahan, M.; King, R.; Natarajan, M.; Rood, J.; Sharma, A.; Sirydowicz, K.; Thomas, S.; Vijaykumar, G.; Yellapantula, S.; Crozier, Paul; Berger-Vergiat, Luc; Cheung, Lawrence; Glaze, David J.; Hu, Jonathan J.; Knaus, Robert C.; Lee, Dong H.; Okusanya, Tolulope O.; Overfelt, James R.; Rajamanickam, Sivasankaran; Sakievich, Philip; Smith, Timothy A.; Vo, Johnathan; Williams, Alan B.; Yamazaki, Ichitaro; Turner, J.; Prokopenko, A.; Wilson, R.; Moser, R.; Melvin, J.; Sitaraman, J.

Abstract not provided.
