Publications

Results 7801–7850 of 9,998

Search results

Jump to search filters

rMPI : increasing fault resiliency in a message-passing environment

Ferreira, Kurt; Oldfield, Ron A.; Stearley, Jon S.; Laros, James H.; Pedretti, Kevin T.T.; Brightwell, Ronald B.

As High-End Computing machines continue to grow in size, issues such as fault tolerance and reliability limit application scalability. Current techniques to ensure progress across faults, like checkpoint-restart, are unsuitable at these scale due to excessive overheads predicted to more than double an applications time to solution. Redundant computation, long used in distributed and mission critical systems, has been suggested as an alternative to checkpoint-restart on its own. In this paper we describe the rMPI library which enables portable and transparent redundant computation for MPI applications. We detail the design of the library as well as two replica consistency protocols, outline the overheads of this library at scale on a number of real-world applications, and finally outline the significant increase in an applications time to solution at extreme scale as well as show the scenarios in which redundant computation makes sense.

More Details

Factors impacting performance of multithreaded sparse riangular solvet

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Wolf, Michael M.; Heroux, Michael A.; Boman, Erik G.

As computational science applications grow more parallel with multi-core supercomputers having hundreds of thousands of computational cores, it will become increasingly difficult for solvers to scale. Our approach is to use hybrid MPI/threaded numerical algorithms to solve these systems in order to reduce the number of MPI tasks and increase the parallel efficiency of the algorithm. However, we need efficient threaded numerical kernels to run on the multi-core nodes in order to achieve good parallel efficiency. In this paper, we focus on improving the performance of a multithreaded triangular solver, an important kernel for preconditioning. We analyze three factors that affect the parallel performance of this threaded kernel and obtain good scalability on the multi-core nodes for a range of matrix sizes. © 2011 Springer-Verlag Berlin Heidelberg.

More Details

Assessing the Near-Term Risk of Climate Uncertainty:Interdependencies among the U.S. States

Backus, George A.; Trucano, Timothy G.; Robinson, David G.; Adams, Brian M.; Richards, Elizabeth H.; Siirola, John D.; Boslough, Mark B.; Taylor, Mark A.; Conrad, Stephen H.; Kelic, Andjelka; Roach, Jesse D.; Warren, Drake E.; Ballantine, Marissa D.; Stubblefield, W.A.; Snyder, Lillian A.; Finley, Ray E.; Horschel, Daniel S.; Ehlen, Mark E.; Klise, Geoffrey T.; Malczynski, Leonard A.; Stamber, Kevin L.; Tidwell, Vincent C.; Vargas, Vanessa N.; Zagonel, Aldo A.

Abstract not provided.

Results 7801–7850 of 9,998
Results 7801–7850 of 9,998