Publications Search

Solving Laplacian linear systems is an important task in a variety of practical and theoretical applications. In theory, this problem admits algorithms whose work is linear times polylogarithmic, but these algorithms are difficult to implement in practice. We examine existing solution techniques to determine the best methods currently available and the types of problems for which they are useful. We perform timing experiments using a variety of solvers on a variety of problems and present our results. We discover differing solver behavior between web graphs and a class of synthetic graphs designed to model them.

TYPE Conference Poster YEAR 2016

DOI OSTI Scopus
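
To make the problem in the abstract above concrete, the following is a minimal sketch of a Laplacian linear system and one textbook way to solve it. It builds a dense Laplacian L = D - A for a small path graph and applies unpreconditioned conjugate gradient. This is an illustration only, not one of the solvers evaluated in the poster; the function names (`laplacian`, `conjugate_gradient`) and the test graph are assumptions introduced here.

```python
def laplacian(n, edges):
    """Dense graph Laplacian L = D - A for an undirected, unweighted graph."""
    L = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        L[u][u] += 1.0
        L[v][v] += 1.0
        L[u][v] -= 1.0
        L[v][u] -= 1.0
    return L


def matvec(L, x):
    return [sum(a * b for a, b in zip(row, x)) for row in L]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def conjugate_gradient(L, b, tol=1e-10, max_iter=1000):
    """Unpreconditioned CG starting from x = 0.

    For a connected graph, L is singular and its null space is the constant
    vector, so the right-hand side b must sum to zero.
    """
    x = [0.0] * len(b)
    r = list(b)          # residual b - L x, with x = 0
    p = list(r)          # search direction
    rs = dot(r, r)
    for _ in range(max_iter):
        if rs ** 0.5 < tol:
            break
        Lp = matvec(L, p)
        alpha = rs / dot(p, Lp)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * li for ri, li in zip(r, Lp)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


# Hypothetical test case: path graph 0-1-2-3 with one unit injected at
# vertex 0 and extracted at vertex 3.
edges = [(0, 1), (1, 2), (2, 3)]
L = laplacian(4, edges)
b = [1.0, 0.0, 0.0, -1.0]
x = conjugate_gradient(L, b)
residual = [bi - yi for bi, yi in zip(b, matvec(L, x))]
```

On large, irregular graphs plain CG can converge slowly, which is one reason preconditioners and specialized Laplacian solvers are studied.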

Preconditioning Communication-Avoiding Krylov Methods

Rajamanickam, Sivasankaran; Yamazaki, I.; Boman, Erik G.; Prokopenko, Andrey V.; Heroux, Michael A.; Dongarra, J.

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

Solving Graph Laplacians for Complex Networks

Boman, Erik G.; Deweese, Kevin; Gilbert, John R.

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

ShyLU and Thread Scalable Subdomain Solvers

Rajamanickam, Sivasankaran; Boman, Erik G.; Bradley, Andrew M.; Booth, Joshua D.; Deveci, Mehmet; Kim, Kyungjoo; Dohrmann, Clark R.; Thornquist, Heidi K.; Chow, Edmond; Patel, Aftab

Abstract not provided.

TYPE Presentation YEAR 2015

OSTI

ShyLU: On node Solvers and Kokkos-Kernels

Rajamanickam, Sivasankaran; Boman, Erik G.; Bradley, Andrew M.; Booth, Joshua D.; Kim, Kyungjoo; Deveci, Mehmet

Abstract not provided.

TYPE Presentation YEAR 2015

OSTI

Preconditioning Communication-Avoiding Krylov Methods

Rajamanickam, Sivasankaran; Yamazaki, Ichitaro; Boman, Erik G.; Hoemmen, Mark F.; Heroux, Michael A.; Tomov, Stan; Dongarra, Jack

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

Basker: A Scalable Sparse Direct Linear Solver for Many-Core Architectures

Booth, Joshua D.; Rajamanickam, Sivasankaran; Boman, Erik G.

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

Evaluating the Potential of a Laplacian Linear Solver

Informatica

Boman, Erik G.; Deweese, Kevin; Gilbert, John R.

A new approach for solving Laplacian linear systems proposed by Kelner et al. involves the random sampling and update of fundamental cycles in a graph. We evaluate the performance of this approach on a variety of real world graphs. We examine different ways to choose the set of cycles and their sequence of updates with the goal of providing more flexibility and potential parallelism. We propose a parallel model of the Kelner et al. method for evaluating potential parallelism concerned with minimizing the span of edges updated at every iteration. We provide experimental results comparing the potential parallelism of the fundamental cycle basis and the extended basis. Our preliminary experiments show that choosing a non-fundamental set of cycles can save significant work compared to a fundamental cycle basis.

TYPE Journal Article YEAR 2015

OSTI

High-Performance Computing for Extreme-Scale Data Analytics

Boman, Erik G.; Madduri, Kamesh; Rajamanickam, Sivasankaran; Wolf, Michael

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

Distributing Linear Systems for Parallel Computation

Devine, Karen; Boman, Erik G.; Rajamanickam, Sivasankaran

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

Parallel Graph Coloring

Boman, Erik G.; Rajamanickam, Sivasankaran

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

Preconditioning Communication-Avoiding Krylov Methods

Rajamanickam, Sivasankaran; Yamazaki, Ichitaro; Boman, Erik G.; Hoemmen, Mark F.; Heroux, Michael A.; Tomov, Stanimire; Dongarra, Jack

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

The Zoltan2 Toolkit: Partitioning, Task Placement, Coloring, and Ordering

Devine, Karen; Boman, Erik G.; Rajamanickam, Sivasankaran; Leung, Vitus J.; Riesen, Lee A.; Deveci, Mehmet; Catalyurek, Umit

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

2D Partitioning for Scalable Matrix Computations on Scale-Free Graphs

Boman, Erik G.; Devine, Karen; Rajamanickam, Sivasankaran

Abstract not provided.

TYPE Presentation YEAR 2014

OSTI

Manycore Graph Algorithms and the Kokkos Library

Boman, Erik G.

Abstract not provided.

TYPE Presentation YEAR 2014

OSTI

LDRD Report: Scheduling Irregular Algorithms

Boman, Erik G.

This LDRD project was a campus exec fellowship to fund (in part) Donald Nguyen’s PhD research at UT-Austin. His work has focused on parallel programming models and on scheduling irregular algorithms on shared-memory systems using the Galois framework. Galois provides a simple but powerful way for users and applications to automatically obtain good parallel performance using certain supported data containers. The naïve user can write serial code, while advanced users can optimize performance using advanced features, such as specifying the scheduling policy. Galois was used to parallelize two sparse matrix reordering schemes: RCM and Sloan. Such reordering is important in high-performance computing to obtain better data locality and thus reduce run times.

TYPE Other Report YEAR 2014

DOI OSTI
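
As an illustration of the kind of reordering the report above mentions, here is a minimal serial sketch of reverse Cuthill-McKee (RCM). It is a plain textbook version written for this listing, not the Galois-based parallel implementation from the project; the function names and the example graph are assumptions, and starting from a minimum-degree vertex is a simple stand-in for the pseudo-peripheral vertex search that production codes typically use.

```python
from collections import deque


def reverse_cuthill_mckee(adj):
    """Return an RCM ordering for a graph given as adjacency lists.

    Breadth-first search from a low-degree vertex, visiting neighbors in
    order of increasing degree, then reversing the visit order.
    """
    n = len(adj)
    degree = [len(nbrs) for nbrs in adj]
    visited = [False] * n
    order = []
    # Loop over candidate start vertices so disconnected graphs are covered.
    for start in sorted(range(n), key=lambda v: degree[v]):
        if visited[start]:
            continue
        visited[start] = True
        queue = deque([start])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in sorted(adj[u], key=lambda w: degree[w]):
                if not visited[v]:
                    visited[v] = True
                    queue.append(v)
    order.reverse()
    return order


def bandwidth(adj, order):
    """Largest |position(u) - position(v)| over all edges under `order`."""
    pos = {v: k for k, v in enumerate(order)}
    return max(
        (abs(pos[u] - pos[v]) for u in range(len(adj)) for v in adj[u]),
        default=0,
    )


# Hypothetical example: the path 0-3-1-4-2 with a poor initial numbering.
adj = [[3], [3, 4], [4], [0, 1], [1, 2]]
natural = list(range(len(adj)))
rcm = reverse_cuthill_mckee(adj)
before = bandwidth(adj, natural)
after = bandwidth(adj, rcm)
```

A smaller bandwidth keeps the nonzeros of the reordered matrix closer to the diagonal, which is the data-locality effect the abstract refers to.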

Fast Solvers for Graph Laplacians

Boman, Erik G.

Abstract not provided.

TYPE Presentation YEAR 2014

OSTI

Installing the Anasazi Eigensolver Package with Application to Some Graph Eigenvalue Problems

Lehoucq, Rich; Boman, Erik G.; Devine, Karen; Thornquist, Heidi K.; Slattengren, Nicole L.

The purpose of this report is to document a basic installation of the Anasazi eigensolver package and provide a brief discussion on the numerical solution of some graph eigenvalue problems.

TYPE SAND Report YEAR 2014

DOI OSTI

Zoltan Three-Slide Overview for ATPESC 2014

Devine, Karen; Rajamanickam, Sivasankaran; Prokopenko, Andrey V.; Boman, Erik G.

Abstract not provided.

TYPE Presentation YEAR 2014

OSTI

Computations on Graph Laplacians

Boman, Erik G.

Abstract not provided.

TYPE Conference YEAR 2014

OSTI

Domain Decomposition Preconditioners for Communication-Avoiding Krylov Methods on Distributed GPUs

Boman, Erik G.; Heroux, Michael A.; Hoemmen, Mark F.; Rajamanickam, Sivasankaran

Abstract not provided.

TYPE Conference YEAR 2014

OSTI

Domain Decomposition Preconditioners for Communication-Avoiding Krylov Methods on a Hybrid CPU/GPU Cluster

International Conference for High Performance Computing, Networking, Storage and Analysis, SC

Yamazaki, Ichitaro; Rajamanickam, Sivasankaran; Boman, Erik G.; Hoemmen, Mark F.; Heroux, Michael A.; Tomov, Stanimire

Krylov subspace projection methods are widely used iterative methods for solving large-scale linear systems of equations. Researchers have demonstrated that communication-avoiding (CA) techniques can improve Krylov methods' performance on modern computers, where communication is becoming increasingly expensive compared to arithmetic operations. In this paper, we extend these studies with two major contributions. First, we present our implementation of a CA variant of the Generalized Minimum Residual (GMRES) method, called CA-GMRES, for solving nonsymmetric linear systems of equations on a hybrid CPU/GPU cluster. Our performance results on up to 120 GPUs show that CA-GMRES gives a speedup of up to 2.5x in total solution time over standard GMRES on a hybrid cluster with twelve Intel Xeon CPUs and three Nvidia Fermi GPUs on each node. We then outline a domain decomposition framework to introduce a family of preconditioners that are suitable for CA Krylov methods. Our preconditioners do not incur any additional communication and allow the easy reuse of existing algorithms and software for the subdomain solves. Experimental results on the hybrid CPU/GPU cluster demonstrate that CA-GMRES with preconditioning achieves a speedup of up to 7.4x over CA-GMRES without preconditioning, and a speedup of up to 1.7x over GMRES with preconditioning, in total solution time. These results confirm the potential of our framework to develop a practical and effective preconditioned CA Krylov method.

TYPE Conference YEAR 2014

Scopus OSTI

A Nested Dissection Partitioning Method for Parallel Sparse Matrix-Vector Multiplication

Boman, Erik G.

Abstract not provided.

TYPE Conference YEAR 2013

OSTI DOI

Using 2D Matrix Distributions in Trilinos

Devine, Karen; Boman, Erik G.; Rajamanickam, Sivasankaran

Abstract not provided.

TYPE Conference YEAR 2013

OSTI

A computational spectral graph theory tutorial

Boman, Erik G.; Devine, Karen; Lehoucq, Rich

Abstract not provided.

TYPE Conference YEAR 2013

OSTI

The Zoltan Toolkits: Parallel Partitioning, Load Balancing, Coloring, and Ordering

Devine, Karen; Boman, Erik G.; Rajamanickam, Sivasankaran; Leung, Vitus J.

Abstract not provided.

TYPE Presentation YEAR 2013

OSTI

Amortizing AMG Components Across Problem Sequences

Tuminaro, Raymond S.; Hu, Jonathan J.; Prokopenko, Andrey V.; Siefert, Christopher; Tsuji, Paul H.; Boman, Erik G.; Cyr, Eric C.; Lin, Paul T.; Shadid, John N.

Abstract not provided.

TYPE Conference YEAR 2013

OSTI

Randomized and Asynchronous Algorithms for Exascale Solvers

Boman, Erik G.

Abstract not provided.

TYPE Conference YEAR 2013

OSTI

Unsymmetric Nested Dissection Ordering

Boman, Erik G.

Abstract not provided.

TYPE Conference YEAR 2013

OSTI

Scalable Matrix Computations on Large Scale-Free Graphs Using 2D Graph Partitioning

Boman, Erik G.; Devine, Karen; Rajamanickam, Sivasankaran

Abstract not provided.

TYPE Conference YEAR 2013

DOI OSTI

Preconditioning for Large Scale-Free Graphs

Boman, Erik G.; Lehoucq, Rich

Abstract not provided.

TYPE Conference YEAR 2013

OSTI

Preconditioners for Large Scale-Free Graphs

Boman, Erik G.; Lehoucq, Rich

Abstract not provided.

TYPE Conference YEAR 2013

OSTI

Toward Flexible Scalable Algebraic Multigrid Solvers

Tuminaro, Raymond S.; Boman, Erik G.; Hu, Jonathan J.; Prokopenko, Andrey V.; Siefert, Christopher; Tsuji, Paul H.

Abstract not provided.

TYPE Conference YEAR 2013

OSTI

Randomized and Asynchronous Algorithms for Large Linear Systems

Boman, Erik G.

Abstract not provided.

TYPE Conference YEAR 2013

OSTI

A Simple Efficient Preconditioner for Graph Laplacians

Boman, Erik G.

Abstract not provided.

TYPE Conference YEAR 2013

OSTI

Combinatorial Scientific Computing for Exascale Systems and Applications

Devine, Karen; Rajamanickam, Sivasankaran; Boman, Erik G.

Abstract not provided.

TYPE Conference YEAR 2013

OSTI

Scalable Matrix Computations on Large Scale-Free Graphs Using 2D Graph Partitioning

Boman, Erik G.; Devine, Karen; Rajamanickam, Sivasankaran

Abstract not provided.

TYPE Conference YEAR 2013

DOI OSTI

Trilinos-based Software for Eigenanalysis of Graphs

Boman, Erik G.; Devine, Karen; Lehoucq, Rich; Slattengren, Nicole L.

Abstract not provided.

TYPE Presentation YEAR 2013

OSTI

Efficient Computation of Eigenpairs for Large Scale-free Graphs

Boman, Erik G.; Devine, Karen; Lehoucq, Rich; Slattengren, Nicole L.

Abstract not provided.

TYPE Conference YEAR 2013

OSTI

Scalable matrix computations on large scale-free graphs using 2D graph partitioning

International Conference for High Performance Computing, Networking, Storage and Analysis, SC

Boman, Erik G.; Devine, Karen; Rajamanickam, Sivasankaran

Scalable parallel computing is essential for processing large scale-free (power-law) graphs. The distribution of data across processes becomes important on distributed-memory computers with thousands of cores. It has been shown that two-dimensional layouts (edge partitioning) can have significant advantages over traditional one-dimensional layouts. However, simple 2D block distribution does not use the structure of the graph, and more advanced 2D partitioning methods are too expensive for large graphs. We propose a new two-dimensional partitioning algorithm that combines graph partitioning with 2D block distribution. The computational cost of the algorithm is essentially the same as 1D graph partitioning. We study the performance of sparse matrix-vector multiplication (SpMV) for scale-free graphs from the web and social networks using several different partitioners and both 1D and 2D data layouts. We show that SpMV run time is reduced by exploiting the graph's structure. Contrary to popular belief, we observe that current graph and hypergraph partitioners often yield relatively good partitions on scale-free graphs. We demonstrate that our new 2D partitioning method consistently outperforms the other methods considered, for both SpMV and an eigensolver, on matrices with up to 1.6 billion nonzeros using up to 16,384 cores. Copyright 2013 ACM.

TYPE Conference YEAR 2013

DOI OSTI Scopus
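
The difference between the 1D and 2D layouts discussed in the abstract above can be sketched as a mapping from each matrix nonzero to its owning process. The code below shows only the simple 2D block distribution that the abstract uses as a baseline, next to a 1D row layout. It does not implement the paper's new partitioning algorithm, and the function names, process counts, and star-graph example are assumptions chosen for illustration.

```python
def block_of(index, n, parts):
    """Map an index in [0, n) to one of `parts` contiguous, near-equal blocks."""
    return index * parts // n


def owner_1d(i, j, n, p):
    """1D row layout: the process that owns row i owns every nonzero in it."""
    return block_of(i, n, p)


def owner_2d(i, j, n, pr, pc):
    """2D block layout on a pr x pc process grid.

    Nonzero (i, j) goes to the process at (row block of i, column block of j),
    with processes numbered row by row.
    """
    return block_of(i, n, pr) * pc + block_of(j, n, pc)


# Hypothetical example: adjacency matrix of a star graph with hub vertex 0,
# a crude stand-in for the high-degree vertices of a scale-free graph.
n = 8
nonzeros = [(i, j) for i in range(n) for j in range(n)
            if i != j and (i == 0 or j == 0)]

load_1d = [0] * 4
for i, j in nonzeros:
    load_1d[owner_1d(i, j, n, 4)] += 1

load_2d = [0] * 4
for i, j in nonzeros:
    load_2d[owner_2d(i, j, n, 2, 2)] += 1

# Processes that hold part of the hub's row under each layout.
hub_owners_1d = {owner_1d(0, j, n, 4) for j in range(1, n)}
hub_owners_2d = {owner_2d(0, j, n, 2, 2) for j in range(1, n)}
```

In the 1D layout the hub's entire row lands on one process, while the 2D layout splits it across the processes in one row of the grid. As the abstract notes, a block layout alone ignores graph structure, which motivates combining it with graph partitioning.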

ShyLU: A hybrid-hybrid solver for multicore platforms

Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium, IPDPS 2012

Rajamanickam, Sivasankaran; Boman, Erik G.; Heroux, Michael A.

With the ubiquity of multicore processors, it is crucial that solvers adapt to the hierarchical structure of modern architectures. We present ShyLU, a "hybrid-hybrid" solver for general sparse linear systems that is hybrid in two ways: First, it combines direct and iterative methods. The iterative part is based on approximate Schur complements, which we compute using a value-based dropping strategy or a structure-based probing strategy. Second, the solver uses two levels of parallelism via hybrid programming (MPI+threads). ShyLU is useful both in shared-memory environments and on large parallel computers with distributed memory. In the latter case, it should be used as a subdomain solver. We argue that with the increasing complexity of compute nodes, it is important to exploit multiple levels of parallelism even within a single compute node. We show the robustness of ShyLU against other algebraic preconditioners. ShyLU scales well up to 384 cores for a given problem size. We also study the MPI-only performance of ShyLU against a hybrid implementation and conclude that on present multicore nodes an MPI-only implementation is better. However, for future multicore machines (96 or more cores) hybrid/hierarchical algorithms and implementations are important for sustained performance. © 2012 IEEE.

TYPE Conference YEAR 2012

OSTI Scopus

Multithreaded algorithms for maximum matching in bipartite graphs

Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium, IPDPS 2012

Azad, Ariful; Halappanavar, Mahantesh; Rajamanickam, Sivasankaran; Boman, Erik G.; Khan, Arif; Pothen, Alex

We design, implement, and evaluate algorithms for computing a matching of maximum cardinality in a bipartite graph on multicore and massively multithreaded computers. As computers with larger numbers of slower cores dominate the commodity processor market, the design of multithreaded algorithms to solve large matching problems becomes a necessity. Recent work on serial algorithms for the matching problem has shown that their performance is sensitive to the order in which the vertices are processed for matching. In a multithreaded environment, imposing a serial order in which vertices are considered for matching would lead to loss of concurrency and performance. But this raises the question: Would parallel matching algorithms on multithreaded machines improve performance over a serial algorithm? We answer this question in the affirmative. We report efficient multithreaded implementations of three classes of algorithms based on their manner of searching for augmenting paths: breadth-first-search, depth-first-search, and a combination of both. The Karp-Sipser initialization algorithm is used to make the parallel algorithms practical. We report extensive results and insights using three shared-memory platforms (a 48-core AMD Opteron, a 32-core Intel Nehalem, and a 128-processor Cray XMT) on a representative set of real-world and synthetic graphs. To the best of our knowledge, this is the first study of augmentation-based parallel algorithms for bipartite cardinality matching that demonstrates good speedups on multithreaded shared-memory multiprocessors. © 2012 IEEE.

TYPE Conference YEAR 2012

Scopus OSTI
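
For readers unfamiliar with augmenting-path matching, the depth-first-search class of algorithms named in the abstract above can be sketched serially as follows. This is a textbook single-threaded version, not the paper's multithreaded implementation, and it omits the Karp-Sipser initialization; the function name `max_bipartite_matching` and the example graph are assumptions.

```python
def max_bipartite_matching(adj, n_right):
    """Maximum-cardinality bipartite matching via DFS augmenting paths.

    adj[u] lists the right-side vertices adjacent to left-side vertex u.
    Returns (matching size, match_right), where match_right[v] is the left
    vertex matched to right vertex v, or -1 if v is unmatched.
    """
    match_right = [-1] * n_right

    def try_augment(u, visited):
        # Search for an augmenting path that starts at left vertex u.
        for v in adj[u]:
            if v in visited:
                continue
            visited.add(v)
            # v is free, or the left vertex currently matched to v can be
            # rematched elsewhere: either way, u takes v.
            if match_right[v] == -1 or try_augment(match_right[v], visited):
                match_right[v] = u
                return True
        return False

    size = 0
    for u in range(len(adj)):
        if try_augment(u, set()):
            size += 1
    return size, match_right


# Hypothetical example with 3 left and 3 right vertices.
adj = [[0, 1], [0], [1, 2]]
size, match_right = max_bipartite_matching(adj, 3)
```

The fixed loop over left vertices is exactly the kind of serial processing order that, according to the abstract, limits concurrency when carried over to a multithreaded setting.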