Sort by Date
Sort by Title
Standard Format
Show Abstracts
As Citations (APA)
Skip to search filters
International Conference for High Performance Computing, Networking, Storage and Analysis, SC
Yamazaki, Ichitaro; Rajamanickam, Sivasankaran R. ; Boman, Erik G. ; Hoemmen, Mark F. ; Heroux, Michael A. ; Tomov, Stanimire
Krylov subspace projection methods are widely used iterative methods for solving large-scale linear systems of equations. Researchers have demonstrated that communication avoiding (CA) techniques can improve Krylov methods' performance on modern computers, where communication is becoming increasingly expensive compared to arithmetic operations. In this paper, we extend these studies by two major contributions. First, we present our implementation of a CA variant of the Generalized Minimum Residual (GMRES) method, called CAGMRES, for solving no symmetric linear systems of equations on a hybrid CPU/GPU cluster. Our performance results on up to 120 GPUs show that CA-GMRES gives a speedup of up to 2.5x in total solution time over standard GMRES on a hybrid cluster with twelve Intel Xeon CPUs and three Nvidia Fermi GPUs on each node. We then outline a domain decomposition framework to introduce a family of preconditioners that are suitable for CA Krylov methods. Our preconditioners do not incur any additional communication and allow the easy reuse of existing algorithms and software for the sub domain solves. Experimental results on the hybrid CPU/GPU cluster demonstrate that CA-GMRES with preconditioning achieve a speedup of up to 7.4x over CAGMRES without preconditioning, and speedup of up to 1.7x over GMRES with preconditioning in total solution time. These results confirm the potential of our framework to develop a practical and effective preconditioned CA Krylov method.
Proceedings of the International Parallel and Distributed Processing Symposium, IPDPS
Deveci, Mehmet; Rajamanickam, Sivasankaran R. ; Leung, Vitus J. ; Pedretti, Kevin P. ; Olivier, Stephen L. ; Bunde, David P.; Catalyurek, Umit V.; Devine, Karen D.
We present a new method for mapping applications' MPI tasks to cores of a parallel computer such that communication and execution time are reduced. We consider the case of sparse node allocation within a parallel machine, where the nodes assigned to a job are not necessarily located within a contiguous block nor within close proximity to each other in the network. The goal is to assign tasks to cores so that interdependent tasks are performed by 'nearby' cores, thus lowering the distance messages must travel, the amount of congestion in the network, and the overall cost of communication. Our new method applies a geometric partitioning algorithm to both the tasks and the processors, and assigns task parts to the corresponding processor parts. We show that, for the structured finite difference mini-app Mini Ghost, our mapping method reduced execution time 34% on average on 65,536 cores of a Cray XE6. In a molecular dynamics mini-app, Mini MD, our mapping method reduced communication time by 26% on average on 6144 cores. We also compare our mapping with graph-based mappings from the LibTopoMap library and show that our mappings reduced the communication time on average by 15% in MiniGhost and 10% in MiniMD. © 2014 IEEE.
Proceedings of the International Parallel and Distributed Processing Symposium, IPDPS
Slota, George M.; Rajamanickam, Sivasankaran R. ; Madduri, Kamesh
Finding the strongly connected components (SCCs) of a directed graph is a fundamental graph-theoretic problem. Tarjan's algorithm is an efficient serial algorithm to find SCCs, but relies on the hard-to-parallelize depth-first search (DFS). We observe that implementations of several parallel SCC detection algorithms show poor parallel performance on modern multicore platforms and large-scale networks. This paper introduces the Multistep method, a new approach that avoids work inefficiencies seen in prior SCC approaches. It does not rely on DFS, but instead uses a combination of breadth-first search (BFS) and a parallel graph coloring routine. We show that the Multistep method scales well on several real-world graphs, with performance fairly independent of topological properties such as the size of the largest SCC and the total number of SCCs. On a 16-core Intel Xeon platform, our algorithm achieves a 20X speedup over the serial approach on a 2 billion edge graph, fully decomposing it in under two seconds. For our collection of test networks, we observe that the Multistep method is 1.92X faster (mean speedup) than the state-of-the-art Hong et al. SCC method. In addition, we modify the Multistep method to find connected and weakly connected components, as well as introduce a novel algorithm for determining articulation vertices of biconnected components. These approaches all utilize the same underlying BFS and coloring routines. © 2014 IEEE.
Lin, Paul L. ; Rajamanickam, Sivasankaran R. ; Siefert, Christopher S. ; Bettencourt, Matthew T. ; Cyr, Eric C. ; Domino, Stefan P. ; Fisher, Travis C. ; Hoemmen, Mark F. ; Hu, Jonathan J. ; Phipps, Eric T. ; Prokopenko, Andrey V.
Boman, Erik G. ; Devine, Karen D. ; Rajamanickam, Sivasankaran R.
Devine, Karen D. ; Boman, Erik G. ; Rajamanickam, Sivasankaran R.
Rajamanickam, Sivasankaran R. ; Leung, Vitus J. ; Pedretti, Kevin P. ; Olivier, Stephen L. ; Devine, Karen D.
Devine, Karen D. ; Boman, Erik G. ; Rajamanickam, Sivasankaran R. ; Leung, Vitus J.
Rajamanickam, Sivasankaran R. ; Slota, George M.
Boman, Erik G. ; Devine, Karen D. ; Rajamanickam, Sivasankaran R.
Rajamanickam, Sivasankaran R. ; Devine, Karen D.
Devine, Karen D. ; Rajamanickam, Sivasankaran R. ; Boman, Erik G.
Boman, Erik G. ; Devine, Karen D. ; Rajamanickam, Sivasankaran R.
Rajamanickam, Sivasankaran R.
Hammond, Simon D. ; Rajamanickam, Sivasankaran R. ; Ang, James A. ; Barrett, Richard F. ; Doerfler, Douglas W. ; Heroux, Michael A. ; Laros, James H.
Schiek, Richard S. ; Mei, Ting M. ; Rajamanickam, Sivasankaran R. ; Keiter, Eric R. ; Warrender, Christina E. ; Aimone, James B. ; Thornquist, Heidi K. ; Russo, Thomas V. ; Verley, Jason V. ; Crossno, Patricia J.
Rajamanickam, Sivasankaran R. ; Devine, Karen D.
Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium, IPDPS 2012
Azad, Ariful; Halappanavar, Mahantesh; Rajamanickam, Sivasankaran R. ; Boman, Erik G. ; Khan, Arif; Pothen, Alex
We design, implement, and evaluate algorithms for computing a matching of maximum cardinality in a bipartite graph on multicore and massively multithreaded computers. As computers with larger numbers of slower cores dominate the commodity processor market, the design of multithreaded algorithms to solve large matching problems becomes a necessity. Recent work on serial algorithms for the matching problem has shown that their performance is sensitive to the order in which the vertices are processed for matching. In a multithreaded environment, imposing a serial order in which vertices are considered for matching would lead to loss of concurrency and performance. But this raises the question: Would parallel matching algorithms on multithreaded machines improve performance over a serial algorithm? We answer this question in the affirmative. We report efficient multithreaded implementations of three classes of algorithms based on their manner of searching for augmenting paths: breadth-first-search, depth-first-search, and a combination of both. The Karp-Sipser initialization algorithm is used to make the parallel algorithms practical. We report extensive results and insights using three shared-memory platforms (a 48-core AMD Opteron, a 32-coreIntel Nehalem, and a 128-processor Cray XMT) on a representative set of real-world and synthetic graphs. To the best of our knowledge, this is the first study of augmentation-based parallel algorithms for bipartite cardinality matching that demonstrates good speedups on multithreaded shared memory multiprocessors. © 2012 IEEE.
Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium, IPDPS 2012
Rajamanickam, Sivasankaran R. ; Boman, Erik G. ; Heroux, Michael A.
Rajamanickam, Sivasankaran R. ; Boman, Erik G. ; Heroux, Michael A. ; Thornquist, Heidi K.
Boman, Erik G. ; Devine, Karen D. ; Leung, Vitus J. ; Rajamanickam, Sivasankaran R.
Rajamanickam, Sivasankaran R. ; Boman, Erik G.
Devine, Karen D. ; Leung, Vitus J. ; Rajamanickam, Sivasankaran R.
Boman, Erik G. ; Rajamanickam, Sivasankaran R.
Rajamanickam, Sivasankaran R. ; Boman, Erik G. ; Heroux, Michael A.
Results 276–300 of 315
25 Results per page
50 Results per page
100 Results per page
200 Results per page