Publications

Results 51–64 of 64

Search results

Jump to search filters

A comparison of high-level programming choices for incomplete sparse factorization across different architectures

Proceedings - 2016 IEEE 30th International Parallel and Distributed Processing Symposium, IPDPS 2016

Booth, Joshua D.; Kim, Kyungjoo; Rajamanickam, Sivasankaran

All many-core systems require fine-grained shared memory parallelism, however the most efficient way to extract such parallelism is far from trivial. Fine-grained parallel algorithms face various performance trade-offs related to tasking, accesses to global data-structures, and use of shared cache. While programming models provide high level abstractions, such as data and task parallelism, algorithmic choices still remain open on how to best implement irregular algorithms, such as sparse factorizations, while taking into account the trade-offs mentioned above. In this paper, we compare these performance trade-offs for task and data parallelism on different hardware architectures such as Intel Sandy Bridge, Intel Xeon Phi, and IBM Power8. We do this by comparing the scaling of a new task-parallel incomplete sparse Cholesky factorization called Tacho and a new data-parallel incomplete sparse LU factorization called Basker. Both solvers utilize Kokkos programming model and were developed within the ShyLU package of Trilinos. Using these two codes we demonstrate how high-level programming changes affect performance and overhead costs on multiple multi/many-core systems. We find that Kokkos is able to provide comparable performance with both parallel-for and task/futures on traditional x86 multicores. However, the choice of which high-level abstraction to use on many-core systems depends on both the architectures and input matrices.

More Details

Task Parallel Incomplete Cholesky Factorization using 2D Partitioned-Block Layout

Kim, Kyungjoo; Rajamanickam, Sivasankaran; Stelle, George W.; Edwards, Harold C.; Olivier, Stephen L.

We introduce a task-parallel algorithm for sparse incomplete Cholesky factorization that utilizes a 2D sparse partitioned-block layout of a matrix. Our factorization algorithm follows the idea of algorithms-by-blocks by using the block layout. The algorithm-byblocks approach induces a task graph for the factorization. These tasks are inter-related to each other through their data dependences in the factorization algorithm. To process the tasks on various manycore architectures in a portable manner, we also present a portable tasking API that incorporates different tasking backends and device-specific features using an open-source framework for manycore platforms i.e., Kokkos. A performance evaluation is presented on both Intel Sandybridge and Xeon Phi platforms for matrices from the University of Florida sparse matrix collection to illustrate merits of the proposed task-based factorization. Experimental results demonstrate that our task-parallel implementation delivers about 26.6x speedup (geometric mean) over single-threaded incomplete Choleskyby- blocks and 19.2x speedup over serial Cholesky performance which does not carry tasking overhead using 56 threads on the Intel Xeon Phi processor for sparse matrices arising from various application problems.

More Details

Intercomparison of 3D pore-scale flow and solute transport simulation methods

Advances in Water Resources

Mehmani, Yashar; Schoenherr, Martin; Pasquali, Andrea; Clark, Colin; Perkins, William A.; Kim, Kyungjoo; Perego, Mauro; Parks, Michael L.; Balhoff, Matthew T.; Richmond, Marshall C.; Geier, Martin; Krafczyk, Manfred; Luo, Li S.; Tartakovsky, Alexandre M.; Winter, C.L.

Multiple numerical approaches have been developed to simulate porous media fluid flow and solute transport at the pore scale. These include 1) methods that explicitly model the three-dimensional geometry of pore spaces and 2) methods that conceptualize the pore space as a topologically consistent set of stylized pore bodies and pore throats. In previous work we validated a model of the first type, using computational fluid dynamics (CFD) codes employing a standard finite volume method (FVM), against magnetic resonance velocimetry (MRV) measurements of pore-scale velocities. Here we expand that validation to include additional models of the first type based on the lattice Boltzmann method (LBM) and smoothed particle hydrodynamics (SPH), as well as a model of the second type, a pore-network model (PNM). The PNM approach used in the current study was recently improved and demonstrated to accurately simulate solute transport in a two-dimensional experiment. While the PNM approach is computationally much less demanding than direct numerical simulation methods, the effect of conceptualizing complex three-dimensional pore geometries on solute transport in the manner of PNMs has not been fully determined. We apply all four approaches (FVM-based CFD, LBM, SPH and PNM) to simulate pore-scale velocity distributions and (for capable codes) nonreactive solute transport, and intercompare the model results. Comparisons are drawn both in terms of macroscopic variables (e.g., permeability, solute breakthrough curves) and microscopic variables (e.g., local velocities and concentrations). Generally good agreement was achieved among the various approaches, but some differences were observed depending on the model context. The intercomparison work was challenging because of variable capabilities of the codes, and inspired some code enhancements to allow consistent comparison of flow and transport simulations across the full suite of methods. This paper provides support for confidence in a variety of pore-scale modeling methods and motivates further development and application of pore-scale simulation methods.

More Details
Results 51–64 of 64
Results 51–64 of 64