Research
My primary focus is on High Performance Computing.
I work on many interrelated subfields within high performance computing, as
described below. One common theme across my recent work is designing
architecture-aware algorithms for next-generation supercomputers.
Linear Solvers
Linear solvers are foundational tools for many scientific simulations of national interest. While this subfield has been studied for many years, a number of open problems remain. My primary interest is in developing linear solver algorithms targeted at future supercomputers, both at the node level and at the system level. At the node level, my interests lie in task-parallel/data-parallel factorization-based methods such as preconditioners, smoothers, and direct solvers. At the system level, I focus on hybrid Schur complement methods and preconditioners for communication-avoiding (s-step) Krylov methods.
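The Schur complement idea behind such hybrid solvers can be illustrated on a toy block system. This is only a hedged sketch in NumPy with made-up matrix values (the block names A, B, C, D, S are illustrative, not any solver's API): the interior block is eliminated first, leaving a small problem on the interface unknowns.

```python
import numpy as np

# Toy 2x2-block system [[A, B], [C, D]] [x; y] = [f; g]; values are made up.
A = np.array([[4.0, 1.0], [1.0, 3.0]])   # "interior" block, factored directly
B = np.array([[1.0], [0.0]])
C = np.array([[0.0, 1.0]])
D = np.array([[5.0]])                    # "interface" block
f = np.array([1.0, 2.0])
g = np.array([3.0])

# Schur complement S = D - C A^{-1} B couples only the interface unknowns.
S = D - C @ np.linalg.solve(A, B)

# Solve the small interface problem first, then back-substitute.
y = np.linalg.solve(S, g - C @ np.linalg.solve(A, f))
x = np.linalg.solve(A, f - B @ y)

# Cross-check against a monolithic solve of the assembled system.
K = np.block([[A, B], [C, D]])
ref = np.linalg.solve(K, np.concatenate([f, g]))
assert np.allclose(np.concatenate([x, y]), ref)
```

In a distributed-memory solver the interior solves happen independently per subdomain, which is what makes the approach attractive at the system level.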
Related Publications
- Tacho: Memory-Scalable Task Parallel Sparse Cholesky Factorization, K. Kim, H. C. Edwards, S. Rajamanickam. In proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp 550-559, 2018. (PDF)
- A Distributed-memory Hierarchical Solver for General Sparse Linear Systems, C. Chen, H. Pouransari, S. Rajamanickam, E. Boman, E. Darve, Parallel Computing (2017). (PDF)
- Basker: Parallel Sparse LU Factorization utilizing Hierarchical Parallelism and Data Layouts, J. Booth, N. Ellingwood, H. Thornquist, and S. Rajamanickam, Parallel Computing, vol 68, pp 17-31, 2017.
- A Survey of Direct Methods for Sparse Linear Systems, T. Davis, S. Rajamanickam, W. Sid-Lakhdar, Acta Numerica, Volume 25, 2016, pp 383-566. (Invited) (Tech Report)
- Basker: A Threaded Sparse LU Factorization Utilizing Hierarchical Parallelism and Data Layouts, J. Booth, S. Rajamanickam and H. Thornquist, In proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp 673-682.
- A Comparison of High-Level Programming Choices for Incomplete Sparse Factorization Across Different Architectures, J. Booth, K. Kim, S. Rajamanickam, In proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp 397-406.
- Task Parallel Incomplete Cholesky Factorization using 2D Partitioned-Block Layout, Kyungjoo Kim, Sivasankaran Rajamanickam, George Stelle, H. Carter Edwards, Stephen L. Olivier. ArXiv Report (under review)
- Domain Decomposition Preconditioners for Communication-Avoiding Krylov Methods on a Hybrid CPU/GPU Cluster, I. Yamazaki, S. Rajamanickam, E. G. Boman, M. Hoemmen, M. A. Heroux, and S. Tomov, In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC14), 2014. (PDF)
- Amesos2 and Belos: Direct and iterative solvers for large sparse linear systems, E. Bavier, M. Hoemmen, S. Rajamanickam, H. Thornquist, Scientific Programming, 2012, Volume 20, Issue 3. (PDF)
- ShyLU: A Hybrid-Hybrid Solver for Multicore Platforms, S. Rajamanickam, E. G. Boman, and M. A. Heroux, 26th International Parallel and Distributed Processing Symposium (IPDPS 2012), pp. 631-643, 21-25 May 2012. (PDF)
- Algorithm 887: CHOLMOD, Supernodal Sparse Cholesky Factorization and Update/Downdate, Y. Chen, T. A. Davis, W. W. Hager, and S. Rajamanickam, ACM Transactions on Mathematical Software, 2008, Volume 35, Number 3. (PDF)
Combinatorial Algorithms for High Performance Scientific Computing
Combinatorial algorithms impact a number of areas in high performance
scientific computing: partitioning the input for better load balance and
increased parallelism, ordering techniques for fewer floating point
operations and increased parallelism, and matching techniques for better
numerical stability of linear solvers. I have worked on all of these aspects
(partitioning, ordering, coloring, and matching algorithms). My primary
interest is in algorithms for next-generation architectures in some cases
(coloring, matching) and in new algorithms for system-level efforts in others
(partitioning, ordering). I am interested in graph-based, hypergraph-based,
and coordinate-based algorithms for different types of problems.
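As a concrete (and deliberately sequential) illustration of the distance-1 coloring problem that the parallel coloring work targets, here is a minimal greedy sketch; the function name and the toy graph are made up for illustration, not taken from any of the codes above.

```python
# Greedy distance-1 coloring: each vertex gets the smallest color
# not already used by one of its neighbors. Sequential baseline only;
# the manycore algorithms parallelize this with conflict resolution.
def greedy_color(adjacency):
    colors = {}
    for v in sorted(adjacency):
        forbidden = {colors[u] for u in adjacency[v] if u in colors}
        c = 0
        while c in forbidden:
            c += 1
        colors[v] = c
    return colors

# A 4-cycle: greedy in vertex order finds a valid 2-coloring here.
graph = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
coloring = greedy_color(graph)
assert all(coloring[u] != coloring[v] for u in graph for v in graph[u])
```

Coloring matters in this context because vertices of the same color can be processed concurrently, e.g. in parallel smoothers or Jacobian assembly.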
Related Publications
- Parallel Graph Coloring for Manycore Architectures, M. Deveci, E. Boman, K. Devine and S. Rajamanickam, In proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp 892-901.
- Multi-Jagged: A Scalable Parallel Spatial Partitioning Algorithm, M. Deveci, S. Rajamanickam, K. D. Devine, U. V. Catalyurek, IEEE Transactions on Parallel and Distributed Systems, 2015. (PDF)
- Exploiting Geometric Partitioning in Task Mapping for Parallel Computers, M. Deveci, S. Rajamanickam, V. Leung, K. Pedretti, S. Olivier, D. Bunde, U. V. Catalyurek, K. Devine, In Proceedings of the 28th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2014). (PDF)
- Scalable Matrix Computations on Scale-Free Graphs Using 2D Graph Partitioning, E. G. Boman, K. D. Devine, S. Rajamanickam, International Conference for High Performance Computing, Networking, Storage and Analysis (SC13), 2013. (PDF)
- Multithreaded Algorithms for Maximum Matching in Bipartite Graphs, A. Azad, M. Halappanavar, S. Rajamanickam, E. G. Boman, A. Khan and A. Pothen, 26th International Parallel and Distributed Processing Symposium (IPDPS 2012), pp. 860-872, 21-25 May 2012. (PDF)
- An Evaluation of the Zoltan Parallel Graph and Hypergraph Partitioners, S. Rajamanickam and E. G. Boman, 10th DIMACS Implementation Challenge. (PDF)
- Parallel Partitioning with Zoltan: Is Hypergraph Partitioning Worth It?, S. Rajamanickam and E. G. Boman. Graph Partitioning and Graph Clustering: 10th DIMACS Implementation Challenge Workshop, Georgia Institute of Technology, Atlanta, GA, USA, February 13-14, 2012. Proceedings. (PDF)
- A study of combinatorial issues in a sparse hybrid solver, E. G. Boman and S. Rajamanickam, Proceedings of SciDAC 2011, 2011. (PDF)
Graph Algorithms for Analytics on HPC Platforms
As the amount of data from the web, social networks, and other
non-traditional data sources grows, so does the need to analyze these data on
high performance computing systems. Data from these sources have very
different properties from the traditional scientific computing problems
mentioned above. I am interested in specialized techniques for analyzing such
data on HPC platforms. We have developed a number of algorithms, ranging from
graph traversals and connected components to community-detection-based
partitioning.
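One of the simplest analytics kernels in this family, connected components via breadth-first traversal, can be sketched as follows. This is a hedged, sequential illustration with a made-up toy graph; the papers above study far more scalable parallel variants.

```python
from collections import deque

# Label each vertex with the id of the component it belongs to,
# using a BFS from each not-yet-labeled source vertex.
def connected_components(adjacency):
    label = {}
    for s in adjacency:
        if s in label:
            continue
        label[s] = s
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for u in adjacency[v]:
                if u not in label:
                    label[u] = s
                    queue.append(u)
    return label

# Toy undirected graph with three components: {0,1}, {2,3}, {4}.
graph = {0: [1], 1: [0], 2: [3], 3: [2], 4: []}
labels = connected_components(graph)
assert labels[0] == labels[1] and labels[2] == labels[3]
assert len(set(labels.values())) == 3
```

On HPC platforms the interesting part is doing this at scale, where the irregular structure of web and social graphs makes load balance and communication the dominant concerns.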
Related Publications
- Experimental Design of Work Chunking for Graph Algorithms on High Bandwidth Memory Architectures, G. Slota, S. Rajamanickam. In proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp 878-884, 2018. (PDF)
- Fast linear algebra-based triangle counting with KokkosKernels, M. Wolf, M. Deveci, J. Berry, S. Hammond, and S. Rajamanickam, 2017 IEEE High Performance Extreme Computing Conference (HPEC), 2017. (PDF)
- Partitioning Trillion-edge Graphs in Minutes, G. Slota, S. Rajamanickam, K. Devine, and K. Madduri, 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp 646-655, 2017. (PDF)
- Order or Shuffle: Empirically Evaluating Vertex Order Impact on Parallel Graph Computations, G. Slota, S. Rajamanickam, K. Madduri, In proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp 588-597, 2017. (PDF)
- Complex Network Partitioning Using Label Propagation, G. Slota, K. Madduri, S. Rajamanickam, SIAM Journal on Scientific Computing, 2016, 38(5), S620-S645. (PDF)
- Parallel Graph Coloring for Manycore Architectures, M. Deveci, E. Boman, K. Devine and S. Rajamanickam, In proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp 892-901.
- A Case Study of Complex Graph Analysis in Distributed Memory: Implementation and Optimization, G. Slota, S. Rajamanickam, K. Madduri, In proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp 293-302.
- High-performance Graph Analytics on Manycore Processors, G. M. Slota, S. Rajamanickam, K. Madduri, In Proceedings of the 29th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2015). (PDF)
- PULP: Scalable Multi-Objective Multi-Constraint Partitioning for Small-World Networks, G. M. Slota, K. Madduri, and S. Rajamanickam, In Proceedings of the IEEE Conference on Big Data (BigData 2014), pages 481-490, 2014. (PDF)
- BFS and Coloring-based Parallel Algorithms for Strongly Connected Components and Related Problems, G. M. Slota, S. Rajamanickam, K. Madduri, In Proceedings of the 28th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2014). (PDF)
- Scalable Matrix Computations on Scale-Free Graphs Using 2D Graph Partitioning, E. G. Boman, K. D. Devine, S. Rajamanickam, International Conference for High Performance Computing, Networking, Storage and Analysis (SC13), 2013. (PDF)
Linear Algebra Kernels
Sparse and dense linear algebra kernels are foundational for scientific
computing and, in some cases, data analysis as well. My primary focus is on
developing performance-portable algorithms for sparse and dense linear
algebra kernels on architectures such as GPUs and Intel Knights Landing processors.
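As a minimal illustration of the kind of sparse kernel involved, here is matrix-vector multiply over the compressed sparse row (CSR) layout, the storage format typical of such libraries. This is a hedged, unoptimized sketch with made-up data, not one of the performance-portable implementations themselves.

```python
# CSR stores a sparse matrix as three flat arrays: row_ptr marks where
# each row's entries begin, col_idx holds column indices, vals the values.
def csr_spmv(row_ptr, col_idx, vals, x):
    """Compute y = A @ x for a CSR matrix A."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += vals[k] * x[col_idx[k]]
    return y

# 2x2 example: A = [[2, 0], [1, 3]]
row_ptr = [0, 1, 3]
col_idx = [0, 0, 1]
vals = [2.0, 1.0, 3.0]
assert csr_spmv(row_ptr, col_idx, vals, [1.0, 1.0]) == [2.0, 4.0]
```

The outer loop over rows is the natural unit of parallelism, and making that loop fast across GPUs and manycore CPUs is exactly the performance-portability challenge.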
Related Publications
- Multithreaded Sparse Matrix-Matrix Multiplication for Many-Core and GPU Architectures, M. Deveci, C. Trott, S. Rajamanickam, Parallel Computing, vol 78, pp 33-46, 2018.
- Designing Vector-Friendly Compact BLAS and LAPACK Kernels, K. Kim, T. Costa, M. Deveci, A. Bradley, S. Hammond, M. Guney, S. Knepper, S. Story, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'17), pages 55:1-55:12, 2017. (PDF)
- Fast linear algebra-based triangle counting with KokkosKernels, M. Wolf, M. Deveci, J. Berry, S. Hammond, and S. Rajamanickam, 2017 IEEE High Performance Extreme Computing Conference (HPEC), 2017. (PDF)
- Performance-Portable Sparse Matrix-Matrix Multiplication for Many-Core Architectures, M. Deveci, C. Trott, S. Rajamanickam, In proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp 693-792, 2017. (PDF)
Applications
All the above algorithmic work is focused on delivering capabilities to
important applications from a number of domains, such as circuit simulation,
thermal fluids, solid mechanics, structural dynamics, and climate
simulations. My work impacts these applications both directly and indirectly.
Related Publications
- Towards Extreme-Scale Simulations for Low Mach Fluids with Second-Generation Trilinos, P. Lin, M. Bettencourt, S. Domino, T. Fisher, M. Hoemmen, J. Hu, E. Phipps, A. Prokopenko, S. Rajamanickam, C. Siefert, S. Kennon, Parallel Processing Letters, Volume 24, Issue 04, 2014. (Preprint)
- A Hybrid Approach for Parallel Transistor-Level Full-Chip Circuit Simulation, H. K. Thornquist, S. Rajamanickam, VECPAR 2014. (Preprint)
- Towards Extreme Scale Simulation with Next-Generation Trilinos: A Low Mach Fluid Application Case Study, P. Lin, M. Bettencourt, S. Domino, T. Fisher, M. Hoemmen, J. Hu, E. Phipps, A. Prokopenko, S. Rajamanickam, C. Siefert, E. Cyr, S. Kennon, IPDPS Workshops 2014. (PDF)
- Electrical modeling and simulation for stockpile stewardship, H. K. Thornquist, E. R. Keiter, S. Rajamanickam, XRDS: Crossroads, The ACM Magazine for Students, Volume 19, Issue 3, 2013. (PDF)
- Enabling Next-Generation Parallel Circuit Simulation with Trilinos, C. Baker, E. G. Boman, M. A. Heroux, E. Keiter, S. Rajamanickam, R. Schiek, and H. Thornquist, Workshop on Algorithms and Programming Tools for Next-Generation High-Performance Scientific Software (HPSS 2011), at Euro-Par 2011. (PDF)

Contact
Email: srajama@sandia.gov
(505) 844-7181 (Phone)
Mailing address (USPS)
Sandia National Laboratories
P.O. Box 5800, MS 1320
Albuquerque, NM 87185-1320
FedEx/UPS/DHL
Sandia National Laboratories
1515 Eubank SE
MS 1320
Albuquerque, NM 87123
