Publications

Results 151–175 of 227

Search results

Jump to search filters

A hybrid-hybrid solver for manycore platforms

SC'11 - Proceedings of the 2011 High Performance Computing Networking, Storage and Analysis Companion, Co-located with SC'11

Rajamanickam, Sivasankaran R.; Boman, Erik G.; Heroux, Michael A.

With the increasing levels of parallelism in a compute node, it is important to exploit multiple levels of parallelism even within a single compute node. We present ShyLU (pro- nounced\Shy-loo"for Scalable Hybrid LU), a\hybrid-hybrid" solver for general sparse linear systems that is hybrid in two ways: First, it combines direct and iterative methods. The iterative method is based on approximate Schur com- plements. Second, the solver uses two levels of parallelism via hybrid programming (MPI+threads). Our solver is use- ful both in shared-memory environments and on large par- allel computers with distributed memory (as a subdomain solver). We compare the robustness of ShyLU against other algebraic preconditioners. ShyLU scales well up to 192 cores for a given problem size. We compare at MPI performance of ShyLU against a hybrid implementation. We conclude that on present multicore nodes at MPI is better. However, for future manycore machines (48 or more cores) hybrid/ hi- erarchical algorithms and implementations are important for sustained performance. Copyright is held by the author/owner(s).

More Details

Factors impacting performance of multithreaded sparse riangular solvet

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Wolf, Michael M.; Heroux, Michael A.; Boman, Erik G.

As computational science applications grow more parallel with multi-core supercomputers having hundreds of thousands of computational cores, it will become increasingly difficult for solvers to scale. Our approach is to use hybrid MPI/threaded numerical algorithms to solve these systems in order to reduce the number of MPI tasks and increase the parallel efficiency of the algorithm. However, we need efficient threaded numerical kernels to run on the multi-core nodes in order to achieve good parallel efficiency. In this paper, we focus on improving the performance of a multithreaded triangular solver, an important kernel for preconditioning. We analyze three factors that affect the parallel performance of this threaded kernel and obtain good scalability on the multi-core nodes for a range of matrix sizes. © 2011 Springer-Verlag Berlin Heidelberg.

More Details

Sensor placement for municipal water networks

Phillips, Cynthia A.; Boman, Erik G.; Carr, Robert D.; Hart, William E.; Berry, Jonathan W.; Watson, Jean-Paul W.; Hart, David B.; Mckenna, Sean A.; Riesen, Lee A.

We consider the problem of placing a limited number of sensors in a municipal water distribution network to minimize the impact over a given suite of contamination incidents. In its simplest form, the sensor placement problem is a p-median problem that has structure extremely amenable to exact and heuristic solution methods. We describe the solution of real-world instances using integer programming or local search or a Lagrangian method. The Lagrangian method is necessary for solution of large problems on small PCs. We summarize a number of other heuristic methods for effectively addressing issues such as sensor failures, tuning sensors based on local water quality variability, and problem size/approximation quality tradeoffs. These algorithms are incorporated into the TEVA-SPOT toolkit, a software suite that the US Environmental Protection Agency has used and is using to design contamination warning systems for US municipal water systems.

More Details

Factors impacting performance of multithreaded triangular solve

Wolf, Michael W.; Heroux, Michael A.; Boman, Erik G.

As computational science applications grow more parallel with multi-core supercomputers having hundreds of thousands of computational cores, it will become increasingly difficult for solvers to scale. Our approach is to use hybrid MPI/threaded numerical algorithms to solve these systems in order to reduce the number of MPI tasks and increase the parallel efficiency of the algorithm. However, we need efficient threaded numerical kernels to run on the multi-core nodes in order to achieve good parallel efficiency. In this paper, we focus on improving the performance of a multithreaded triangular solver, an important kernel for preconditioning. We analyze three factors that affect the parallel performance of this threaded kernel and obtain good scalability on the multi-core nodes for a range of matrix sizes.

More Details

Modeling the fracture of ice sheets on parallel computers

Tuminaro, Raymond S.; Boman, Erik G.

The objective of this project is to investigate the complex fracture of ice and understand its role within larger ice sheet simulations and global climate change. At the present time, ice fracture is not explicitly considered within ice sheet models due in part to large computational costs associated with the accurate modeling of this complex phenomena. However, fracture not only plays an extremely important role in regional behavior but also influences ice dynamics over much larger zones in ways that are currently not well understood. Dramatic illustrations of fracture-induced phenomena most notably include the recent collapse of ice shelves in Antarctica (e.g. partial collapse of the Wilkins shelf in March of 2008 and the diminishing extent of the Larsen B shelf from 1998 to 2002). Other fracture examples include ice calving (fracture of icebergs) which is presently approximated in simplistic ways within ice sheet models, and the draining of supraglacial lakes through a complex network of cracks, a so called ice sheet plumbing system, that is believed to cause accelerated ice sheet flows due essentially to lubrication of the contact surface with the ground. These dramatic changes are emblematic of the ongoing change in the Earth's polar regions and highlight the important role of fracturing ice. To model ice fracture, a simulation capability will be designed centered around extended finite elements and solved by specialized multigrid methods on parallel computers. In addition, appropriate dynamic load balancing techniques will be employed to ensure an approximate equal amount of work for each processor.

More Details
Results 151–175 of 227
Results 151–175 of 227