Publications

Results 1–25 of 31

Search results

Jump to search filters

Exploiting geometric partitioning in task mapping for parallel computers

Proceedings of the International Parallel and Distributed Processing Symposium, IPDPS

Deveci, Mehmet; Rajamanickam, Sivasankaran R.; Leung, Vitus J.; Pedretti, Kevin P.; Olivier, Stephen L.; Bunde, David P.; Catalyurek, Umit V.; Devine, Karen D.

We present a new method for mapping applications' MPI tasks to cores of a parallel computer such that communication and execution time are reduced. We consider the case of sparse node allocation within a parallel machine, where the nodes assigned to a job are not necessarily located within a contiguous block nor within close proximity to each other in the network. The goal is to assign tasks to cores so that interdependent tasks are performed by 'nearby' cores, thus lowering the distance messages must travel, the amount of congestion in the network, and the overall cost of communication. Our new method applies a geometric partitioning algorithm to both the tasks and the processors, and assigns task parts to the corresponding processor parts. We show that, for the structured finite difference mini-app Mini Ghost, our mapping method reduced execution time 34% on average on 65,536 cores of a Cray XE6. In a molecular dynamics mini-app, Mini MD, our mapping method reduced communication time by 26% on average on 6144 cores. We also compare our mapping with graph-based mappings from the LibTopoMap library and show that our mappings reduced the communication time on average by 15% in MiniGhost and 10% in MiniMD. © 2014 IEEE.

More Details

Demonstration of a Legacy Application's Path to Exascale - ASC L2 Milestone 4467

Barrett, Brian B.; Kelly, Suzanne M.; Klundt, Ruth A.; Laros, James H.; Leung, Vitus J.; Levenhagen, Michael J.; Lofstead, Gerald F.; Moreland, Kenneth D.; Oldfield, Ron A.; Pedretti, Kevin P.; Rodrigues, Arun; Barrett, Richard F.; Ward, Harry L.; Vandyke, John P.; Vaughan, Courtenay T.; Wheeler, Kyle B.; Brandt, James M.; Brightwell, Ronald B.; Curry, Matthew L.; Fabian, Nathan D.; Ferreira, Kurt; Gentile, Ann C.; Hemmert, Karl S.

Abstract not provided.

VM-based slack emulation of large-scale systems

Proceedings of the 1st International Workshop on Runtime and Operating Systems for Supercomputers, ROSS 2011

Bridges, Patrick G.; Arnold, Dorian; Pedretti, Kevin P.

This paper describes the design of a system to enable large-scale testing of new software stacks and prospective high-end computing architectures. The proposed architecture combines system virtualization, time dilation, architectural simulation, and slack simulation to provide scalable emulation of hypothetical systems. We also describe virtualization-based full-system measurement and monitoring tools to aid in using the proposed system for co-design of high-performance computing system software and architectural features for future systems. Finally, we provide a description of the implementation strategy and status of the proposed system. © 2011 ACM.

More Details

Minimal-overhead virtualization of a large scale supercomputer

ACM SIGPLAN Notices

Lange, John R.; Pedretti, Kevin P.; Dinda, Peter; Bae, Chang; Bridges, Patrick G.; Soltero, Philip; Merritt, Alexander

Virtualization has the potential to dramatically increase the usability and reliability of high performance computing (HPC) systems. However, this potential will remain unrealized unless overheads can be minimized. This is particularly challenging on large scale machines that run carefully crafted HPC OSes supporting tightlycoupled, parallel applications. In this paper, we show how careful use of hardware and VMM features enables the virtualization of a large-scale HPC system, specifically a Cray XT4 machine, with .5% overhead on key HPC applications, microbenchmarks, and guests at scales of up to 4096 nodes. We describe three techniques essential for achieving such low overhead: passthrough I/O, workload-sensitive selection of paging mechanisms, and carefully controlled preemption. These techniques are forms of symbiotic virtualization, an approach on which we elaborate. Copyright © 2011 ACM.

More Details

Investigating the impact of the cielo cray XE6 architecture on scientific application codes

Vaughan, Courtenay T.; Rajan, Mahesh R.; Barrett, Richard F.; Doerfler, Douglas W.; Pedretti, Kevin P.

Cielo, a Cray XE6, is the Department of Energy NNSA Advanced Simulation and Computing (ASC) campaign's newest capability machine. Rated at 1.37 PFLOPS, it consists of 8,944 dual-socket oct-core AMD Magny-Cours compute nodes, linked using Cray's Gemini interconnect. Its primary mission objective is to enable a suite of the ASC applications implemented using MPI to scale to tens of thousands of cores. Cielo is an evolutionary improvement to a successful architecture previously available to many of our codes, thus enabling a basis for understanding the capabilities of this new architecture. Using three codes strategically important to the ASC campaign, and supplemented with some micro-benchmarks that expose the fundamental capabilities of the XE6, we report on the performance characteristics and capabilities of Cielo.

More Details

Opportunities for leveraging OS virtualization in high-end supercomputing

Pedretti, Kevin P.; Bridges, Patrick G.

This paper examines potential motivations for incorporating virtualization support in the system software stacks of high-end capability supercomputers. We advocate that this will increase the flexibility of these platforms significantly and enable new capabilities that are not possible with current fixed software stacks. Our results indicate that compute, virtual memory, and I/O virtualization overheads are low and can be further mitigated by utilizing well-known techniques such as large paging and VMM bypass. Furthermore, since the addition of virtualization support does not affect the performance of applications using the traditional native environment, there is essentially no disadvantage to its addition.

More Details

Palacios and kitten: New high performance operating systems for scalable virtualized and native supercomputing

Proceedings of the 2010 IEEE International Symposium on Parallel and Distributed Processing, IPDPS 2010

Lange, John; Pedretti, Kevin P.; Hudson, Trammell; Dinda, Peter; Cui, Zheng; Xia, Lei; Bridges, Patrick; Gocke, Andy; Jaconette, Steven; Levenhagen, Michael J.; Brightwell, Ronald B.

Palacios is a new open-source VMM under development at Northwestern University and the University of New Mexico that enables applications executing in a virtualized environment to achieve scalable high performance on large machines. Palacios functions as a modularized extension to Kitten, a high performance operating system being developed at Sandia National Laboratories to support large-scale supercomputing applications. Together, Palacios and Kitten provide a thin layer over the hardware to support full-featured virtualized environments alongside Kitten's lightweight native environment. Palacios supports existing, unmodified applications and operating systems by using the hardware virtualization technologies in recent AMD and Intel processors. Additionally, Palacios leverages Kitten's simple memory management scheme to enable low-overhead pass-through of native devices to a virtualized environment. We describe the design, implementation, and integration of Palacios and Kitten. Our benchmarks show that Palacios provides near native (within 5%), scalable performance for virtualized environments running important parallel applications. This new architecture provides an incremental path for applications to use supercomputers, running specialized lightweight host operating systems, that is not significantly performance-compromised. © 2010 IEEE.

More Details
Results 1–25 of 31
Results 1–25 of 31