Publications Search

LDRD Report: Scheduling Irregular Algorithms

This LDRD project was a campus executive fellowship that funded (in part) Donald Nguyen’s PhD research at UT-Austin. His work has focused on parallel programming models and on scheduling irregular algorithms on shared-memory systems using the Galois framework. Galois provides a simple but powerful way for users and applications to automatically obtain good parallel performance using certain supported data containers. Novice users can write serial code, while advanced users can tune performance with advanced features such as specifying the scheduling policy. Galois was used to parallelize two sparse matrix reordering schemes: reverse Cuthill-McKee (RCM) and Sloan. Such reordering is important in high-performance computing because it improves data locality and thus reduces run times.
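To illustrate why such reordering helps locality, here is a minimal serial sketch of reverse Cuthill-McKee in plain Python. It is not the Galois implementation or the parallel algorithm from the report; the function names, the adjacency-dictionary input format, and the small path graph are assumptions chosen for illustration. The bandwidth (largest index distance between two connected vertices) serves as a rough proxy for data locality.

```python
from collections import deque


def reverse_cuthill_mckee(adj):
    """Return an RCM ordering for an undirected graph {vertex: set_of_neighbors}."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    visited = set()
    order = []
    # Process each connected component, starting from a low-degree vertex.
    for start in sorted(adj, key=lambda v: (degree[v], v)):
        if start in visited:
            continue
        visited.add(start)
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            # Breadth-first search, visiting neighbors by increasing degree.
            for w in sorted(adj[v] - visited, key=lambda u: (degree[u], u)):
                visited.add(w)
                queue.append(w)
    order.reverse()  # the "reverse" step of RCM
    return order


def bandwidth(adj, order):
    """Largest distance in the ordering between any two adjacent vertices."""
    pos = {v: i for i, v in enumerate(order)}
    return max((abs(pos[u] - pos[v]) for u in adj for v in adj[u]), default=0)


# Hypothetical example: a path graph 0-3-1-4-2 whose labels are scrambled.
adj = {0: {3}, 1: {3, 4}, 2: {4}, 3: {0, 1}, 4: {1, 2}}
natural = [0, 1, 2, 3, 4]
rcm = reverse_cuthill_mckee(adj)
print(bandwidth(adj, natural), bandwidth(adj, rcm))
```

On this toy graph the reordering places connected vertices next to each other, which is the effect that improves cache behavior in sparse matrix kernels.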

TYPE Other Report YEAR 2014

DOI OSTI

Fast Solvers for Graph Laplacians

Boman, Erik G.

Abstract not provided.

TYPE Presentation YEAR 2014

OSTI

Installing the Anasazi Eigensolver Package with Application to Some Graph Eigenvalue Problems

Lehoucq, Richard B.; Boman, Erik G.; Devine, Karen; Thornquist, Heidi K.; Slattengren, Nicole

The purpose of this report is to document a basic installation of the Anasazi eigensolver package and provide a brief discussion on the numerical solution of some graph eigenvalue problems.

TYPE SAND Report YEAR 2014

DOI OSTI

Zoltan Three-Slide Overview for ATPESC 2014

Devine, Karen; Rajamanickam, Sivasankaran; Prokopenko, Andrey V.; Boman, Erik G.

Abstract not provided.

TYPE Presentation YEAR 2014

OSTI

Computations on Graph Laplacians

Boman, Erik G.

Abstract not provided.

TYPE Conference YEAR 2014

OSTI

Domain Decomposition Preconditioners for Communication-Avoiding Krylov Methods on Distributed GPUs

Boman, Erik G.; Heroux, Michael A.; Hoemmen, Mark F.; Rajamanickam, Sivasankaran

Abstract not provided.

TYPE Conference YEAR 2014

OSTI

Domain Decomposition Preconditioners for Communication-Avoiding Krylov Methods on a Hybrid CPU/GPU Cluster

International Conference for High Performance Computing, Networking, Storage and Analysis, SC

Yamazaki, Ichitaro; Rajamanickam, Sivasankaran; Boman, Erik G.; Hoemmen, Mark F.; Heroux, Michael A.; Tomov, Stanimire

Krylov subspace projection methods are widely used iterative methods for solving large-scale linear systems of equations. Researchers have demonstrated that communication-avoiding (CA) techniques can improve Krylov methods' performance on modern computers, where communication is becoming increasingly expensive compared to arithmetic operations. In this paper, we extend these studies with two major contributions. First, we present our implementation of a CA variant of the Generalized Minimum Residual (GMRES) method, called CA-GMRES, for solving nonsymmetric linear systems of equations on a hybrid CPU/GPU cluster. Our performance results on up to 120 GPUs show that CA-GMRES gives a speedup of up to 2.5x in total solution time over standard GMRES on a hybrid cluster with twelve Intel Xeon CPUs and three Nvidia Fermi GPUs on each node. We then outline a domain decomposition framework to introduce a family of preconditioners that are suitable for CA Krylov methods. Our preconditioners do not incur any additional communication and allow the easy reuse of existing algorithms and software for the subdomain solves. Experimental results on the hybrid CPU/GPU cluster demonstrate that CA-GMRES with preconditioning achieves a speedup of up to 7.4x over CA-GMRES without preconditioning, and a speedup of up to 1.7x over GMRES with preconditioning, in total solution time. These results confirm the potential of our framework to develop a practical and effective preconditioned CA Krylov method.

TYPE Conference YEAR 2014

Scopus OSTI

A Nested Dissection Partitioning Method for Parallel Sparse Matrix-Vector Multiplication

Boman, Erik G.

Abstract not provided.

TYPE Conference YEAR 2013

OSTI DOI

Using 2D Matrix Distributions in Trilinos

Devine, Karen; Boman, Erik G.; Rajamanickam, Sivasankaran

Abstract not provided.

TYPE Conference YEAR 2013

OSTI

A computational spectral graph theory tutorial

Boman, Erik G.; Devine, Karen; Lehoucq, Richard B.

Abstract not provided.

TYPE Conference YEAR 2013

OSTI

The Zoltan Toolkits: Parallel Partitioning, Load Balancing, Coloring and Ordering

Devine, Karen; Boman, Erik G.; Rajamanickam, Sivasankaran; Leung, Vitus J.

Abstract not provided.

TYPE Report YEAR 2013

OSTI

Amortizing AMG Components Across Problem Sequences

Tuminaro, Raymond S.; Hu, Jonathan J.; Prokopenko, Andrey V.; Siefert, Christopher; Tsuji, Paul H.; Boman, Erik G.; Cyr, Eric C.; Lin, Paul T.; Shadid, John N.

Abstract not provided.

TYPE Conference YEAR 2013

OSTI

Randomized and Asynchronous Algorithms for Exascale Solvers

Boman, Erik G.

Abstract not provided.

TYPE Conference YEAR 2013

OSTI

Unsymmetric Nested Dissection Ordering

Boman, Erik G.

Abstract not provided.

TYPE Conference YEAR 2013

OSTI

Scalable Matrix Computations on Large Scale-Free Graphs Using 2D Graph Partitioning

Boman, Erik G.; Devine, Karen; Rajamanickam, Sivasankaran

Abstract not provided.

TYPE Conference YEAR 2013

DOI OSTI

Preconditioning for Large Scale-Free Graphs

Boman, Erik G.; Lehoucq, Richard B.

Abstract not provided.

TYPE Conference YEAR 2013

OSTI

Preconditioners for Large Scale-Free Graphs

Boman, Erik G.; Lehoucq, Richard B.

Abstract not provided.

TYPE Conference YEAR 2013

OSTI

Toward Flexible Scalable Algebraic Multigrid Solvers

Tuminaro, Raymond S.; Boman, Erik G.; Hu, Jonathan J.; Prokopenko, Andrey V.; Siefert, Christopher; Tsuji, Paul H.

Abstract not provided.

TYPE Conference YEAR 2013

OSTI

Randomized and Asynchronous Algorithms for Large Linear Systems

Boman, Erik G.

Abstract not provided.

TYPE Conference YEAR 2013

OSTI

A Simple Efficient Preconditioner for Graph Laplacians

Boman, Erik G.

Abstract not provided.

TYPE Conference YEAR 2013

OSTI

Combinatorial Scientific Computing for Exascale Systems and Applications

Devine, Karen; Rajamanickam, Sivasankaran; Boman, Erik G.

Abstract not provided.

TYPE Conference YEAR 2013

OSTI

Trilinos-based Software for Eigenanalysis of Graphs

Boman, Erik G.; Devine, Karen; Lehoucq, Richard B.; Slattengren, Nicole

Abstract not provided.

TYPE Presentation YEAR 2013

OSTI

Efficient Computation of Eigenpairs for Large Scale-free Graphs

Boman, Erik G.; Devine, Karen; Lehoucq, Richard B.; Slattengren, Nicole

Abstract not provided.

TYPE Conference YEAR 2013

OSTI

Scalable matrix computations on large scale-free graphs using 2D graph partitioning

International Conference for High Performance Computing, Networking, Storage and Analysis, SC

Boman, Erik G.; Devine, Karen; Rajamanickam, Sivasankaran

Scalable parallel computing is essential for processing large scale-free (power-law) graphs. The distribution of data across processes becomes important on distributed-memory computers with thousands of cores. It has been shown that two-dimensional (2D) layouts (edge partitioning) can have significant advantages over traditional one-dimensional (1D) layouts. However, simple 2D block distribution does not use the structure of the graph, and more advanced 2D partitioning methods are too expensive for large graphs. We propose a new two-dimensional partitioning algorithm that combines graph partitioning with 2D block distribution. The computational cost of the algorithm is essentially the same as 1D graph partitioning. We study the performance of sparse matrix-vector multiplication (SpMV) for scale-free graphs from the web and social networks using several different partitioners and both 1D and 2D data layouts. We show that SpMV run time is reduced by exploiting the graph's structure. Contrary to popular belief, we observe that current graph and hypergraph partitioners often yield relatively good partitions on scale-free graphs. We demonstrate that our new 2D partitioning method consistently outperforms the other methods considered, for both SpMV and an eigensolver, on matrices with up to 1.6 billion nonzeros using up to 16,384 cores. Copyright 2013 ACM.
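As a rough illustration of how a 1D vertex partition can be combined with a 2D block layout, the sketch below maps each nonzero (i, j) onto a pr x pc process grid using the part numbers of its row and column. The abstract does not spell out the mapping, so the specific choice here (row coordinate `part[i] % pr`, column coordinate `part[j] // pr`), the function names, and the tiny example data are all assumptions made for illustration, not the authors' stated algorithm.

```python
def layout_2d(nonzeros, part, pr):
    """Assign each nonzero (i, j) a process coordinate (row, col) in a 2D grid.

    part[v] is the 1D part number of vertex v, in range(pr * pc).
    The mapping below is an illustrative assumption.
    """
    owner = {}
    for i, j in nonzeros:
        owner[(i, j)] = (part[i] % pr, part[j] // pr)
    return owner


def rank_of(coord, pr):
    """Linear rank of a grid coordinate, numbering down columns first."""
    row, col = coord
    return row + pr * col


# Hypothetical example: 8 vertices, 4 processes arranged as a 2 x 2 grid.
pr, pc = 2, 2
part = [0, 0, 1, 1, 2, 2, 3, 3]  # stand-in for a graph partitioner's output
nonzeros = [(k, k) for k in range(8)] + [
    (0, 5), (5, 0), (1, 6), (6, 1), (2, 3), (3, 2),
]
owner = layout_2d(nonzeros, part, pr)

for (i, j), coord in sorted(owner.items()):
    print((i, j), "->", coord, "rank", rank_of(coord, pr))
```

With this mapping, each diagonal entry stays on the process that owns the vertex in the 1D partition, and all nonzeros in one matrix row land in a single process row of the grid, which limits the number of communication partners during SpMV.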

TYPE Conference YEAR 2013

DOI OSTI Scopus
