Publications

11 Results
Skip to search filters

STS-k: A multilevel sparse triangular solution scheme for NUMA multicores

International Conference for High Performance Computing, Networking, Storage and Analysis, SC

Kabir, Humayun; Booth, Joshua D.; Aupy, Guillaume; Benoit, Anne; Robert, Yves; Raghavan, Padma

We consider techniques to improve the performance of parallel sparse triangular solution on non-uniform memory architecture multicores by extending earlier coloring and level set schemes for single-core multiprocessors. We develop STS-k, where k represents a small number of transformations for latency reduction from increased spatial and temporal locality of data accesses. We propose a graph model of data reuse to inform the development of STS-k and to prove that computing an optimal cost schedule is NP-complete. We observe significant speed-ups with STS-3 on 32-core Intel Westmere-Ex and 24-core AMD 'MagnyCours' processors. Incremental gains solely from the 3-level transformations in STS-3 for a fixed ordering, correspond to reductions in execution times by factors of 1.4(Intel) and 1.5(AMD) for level sets and 2(Intel) and 2.2(AMD) for coloring. On average, execution times are reduced by a factor of 6(Intel) and 4(AMD) for STS-3 with coloring compared to a reference implementation using level sets.

More Details

Hybrid sparse linear solutions with substituted factorization

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Booth, Joshua D.; Raghavan, Padma

We develop a computationally less expensive alternative to the direct solution of a large sparse symmetric positive definite system arising from the numerical solution of elliptic partial differential equation models. Our method, substituted factorization, replaces the computationally expensive factorization of certain dense submatrices that arise in the course of direct solution with sparse Cholesky factorization with one or more solutions of triangular systems using substitution. These substitutions fit into the tree-structure commonly used by parallel sparse Cholesky, and reduce the initial factorization cost at the expense of a slight increase cost in solving for a right-hand side vector. Our analysis shows that substituted factorization reduces the number of floating-point operations for the model k × k 5-point finite-difference problem by 10% and empirical tests show execution time reduction on average of 24.4%. On a test suite of three-dimensional problems we observe execution time reduction as high as 51.7% and 43.1% on average.

More Details
11 Results
11 Results