Publications Search

Task Parallel Incomplete Cholesky Factorization using 2D Partitioned-Block Layout

Kim, Kyungjoo; Rajamanickam, Sivasankaran; Stelle, George W.; Edwards, Harold C.; Olivier, Stephen L.

We introduce a task-parallel algorithm for sparse incomplete Cholesky factorization that utilizes a 2D sparse partitioned-block layout of a matrix. Our factorization algorithm follows the idea of algorithms-by-blocks by using the block layout. The algorithms-by-blocks approach induces a task graph for the factorization. These tasks are related to one another through their data dependences in the factorization algorithm. To process the tasks on various manycore architectures in a portable manner, we also present a portable tasking API that incorporates different tasking backends and device-specific features using Kokkos, an open-source framework for manycore platforms. A performance evaluation is presented on both Intel Sandy Bridge and Xeon Phi platforms for matrices from the University of Florida sparse matrix collection to illustrate the merits of the proposed task-based factorization. Experimental results demonstrate that, using 56 threads on the Intel Xeon Phi processor, our task-parallel implementation delivers about 26.6x speedup (geometric mean) over single-threaded incomplete Cholesky-by-blocks and 19.2x speedup over serial Cholesky, which does not carry tasking overhead, for sparse matrices arising from various application problems.

TYPE Other Report YEAR 2015

DOI OSTI

Preconditioning Communication-Avoiding Krylov Methods

Rajamanickam, Sivasankaran; Yamazaki, I.; Boman, Erik G.; Prokopenko, Andrey V.; Heroux, Michael A.; Dongarra, J.

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

Basker: A Threaded Sparse LU Factorization Utilizing Hierarchical Parallelism and Data Layouts

Booth, Joshua D.; Rajamanickam, Sivasankaran; Thornquist, Heidi K.

Abstract not provided.

TYPE Conference Poster YEAR 2015

DOI OSTI

Task-parallel Sparse Incomplete Cholesky Factorization using Kokkos Portable APIs

Kim, Kyungjoo; Rajamanickam, Sivasankaran; Edwards, Harold C.; Olivier, Stephen L.; Stelle, George W.

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

ShyLU and Thread Scalable Subdomain Solvers

Rajamanickam, Sivasankaran; Boman, Erik G.; Bradley, Andrew M.; Booth, Joshua D.; Deveci, Mehmet; Kim, Kyungjoo; Dohrmann, Clark R.; Thornquist, Heidi K.; Chow, Edmond; Patel, Aftab

Abstract not provided.

TYPE Presentation YEAR 2015

OSTI

ShyLU: On node Solvers and Kokkos-Kernels

Rajamanickam, Sivasankaran; Boman, Erik G.; Bradley, Andrew M.; Booth, Joshua D.; Kim, Kyungjoo; Deveci, Mehmet

Abstract not provided.

TYPE Presentation YEAR 2015

OSTI

Preconditioning Communication-Avoiding Krylov Methods

Rajamanickam, Sivasankaran; Yamazaki, Ichitaro; Boman, Erik G.; Hoemmen, Mark F.; Heroux, Michael A.; Tomov, Stan; Dongarra, Jack

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

Communication-Avoiding Preconditioners for s-step Krylov Methods

Rajamanickam, Sivasankaran

Abstract not provided.

TYPE Presentation YEAR 2015

OSTI

Basker: A Scalable Sparse Direct Linear Solver for Many-Core Architectures

Booth, Joshua D.; Rajamanickam, Sivasankaran; Boman, Erik G.

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

Architecture-aware Task Placement

Deveci, Mehmet; Devine, Karen; Leung, Vitus J.; Prokopenko, Andrey V.; Rajamanickam, Sivasankaran

Abstract not provided.

TYPE Presentation YEAR 2015

OSTI

Web Graph Analysis on the Blue Waters Supercomputer

Slota, George M.; Rajamanickam, Sivasankaran; Madduri, Kamesh

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

Irregular Graph Algorithms on Parallel Processing Systems

Slota, George M.; Rajamanickam, Sivasankaran; Madduri, Kamesh

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

High-Performance Graph Analytics on Manycore Processors

Proceedings - 2015 IEEE 29th International Parallel and Distributed Processing Symposium, IPDPS 2015

Slota, George M.; Rajamanickam, Sivasankaran; Madduri, Kamesh

The divergence in the computer architecture landscape has resulted in different architectures being considered mainstream at the same time. For application and algorithm developers, a dilemma arises when one must focus on using underlying architectural features to extract the best performance on each of these architectures, while writing portable code at the same time. We focus on this problem with graph analytics as our target application domain. In this paper, we present an abstraction-based methodology for performance-portable graph algorithm design on manycore architectures. We demonstrate our approach by systematically optimizing algorithms for the problems of breadth-first search, color propagation, and strongly connected components. We use Kokkos, a manycore library and programming model, for prototyping our algorithms. Our portable implementation of the strongly connected components algorithm on the NVIDIA Tesla K40M is up to 3.25× faster than a state-of-the-art parallel CPU implementation on a dual-socket Sandy Bridge compute node.

TYPE Conference Poster YEAR 2015

DOI OSTI Scopus

High-Performance Computing for Extreme-Scale Data Analytics

Boman, Erik G.; Madduri, Kamesh; Rajamanickam, Sivasankaran; Wolf, Michael

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

Distributing Linear Systems for Parallel Computation

Devine, Karen; Boman, Erik G.; Rajamanickam, Sivasankaran

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

Supercomputing for Web Graph Analytics

Slota, George M.; Rajamanickam, Sivasankaran; Madduri, Kamesh

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

Embedded Sampling-Based Uncertainty Quantification Approaches for Emerging Computer Architectures

D'Elia, Marta; Phipps, Eric T.; Edwards, Harold C.; Hu, Jonathan J.; Rajamanickam, Sivasankaran

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

Parallel Graph Coloring

Boman, Erik G.; Rajamanickam, Sivasankaran

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

Preconditioning Communication-Avoiding Krylov Methods

Rajamanickam, Sivasankaran; Yamazaki, Ichitaro; Boman, Erik G.; Hoemmen, Mark F.; Heroux, Michael A.; Tomov, Stanimire; Dongarra, Jack

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

Exploring Embedded Uncertainty Quantification Methods on Next-Generation Computer Architectures

Phipps, Eric T.; D'Elia, Marta; Hu, Jonathan J.; Rajamanickam, Sivasankaran

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

The Zoltan2 Toolkit: Partitioning, Task Placement, Coloring, and Ordering

Devine, Karen; Boman, Erik G.; Rajamanickam, Sivasankaran; Leung, Vitus J.; Riesen, Lee A.; Deveci, Mehmet; Catalyurek, Umit

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

A hybrid approach for parallel transistor-level full-chip circuit simulation

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Thornquist, Heidi K.; Rajamanickam, Sivasankaran

The computer-aided design (CAD) applications that are fundamental to the electronic design automation industry need to harness the available hardware resources to be able to perform full-chip simulation for modern technology nodes (45 nm and below). We will present a hybrid (MPI+threads) approach for parallel transistor-level transient circuit simulation that achieves scalable performance for some challenging large-scale integrated circuits. This approach focuses on the computationally expensive part of the simulator: the linear system solve. Hybrid versions of two iterative linear solver strategies are presented: one takes advantage of block triangular form structure, while the other uses a Schur complement technique. Results indicate up to a 27x improvement in total simulation time on 256 cores.

TYPE Conference YEAR 2015

Scopus OSTI DOI