Publications

Using the Sirocco File System for high-bandwidth checkpoints

Klundt, Ruth A.; Ward, Harry L.

The Sirocco File System, a file system for exascale under active development, is designed to allow the storage software to maximize quality of service through increased flexibility and local decision-making. By allowing the storage system to manage a range of storage targets with varying speeds and capacities, the system can increase the speed and surety of storage for the application. We instrument CTH to use a group of RAM-based Sirocco storage servers, allocated within the job, as a high-performance storage tier that accepts checkpoints, allowing computation to continue asynchronously while checkpoints migrate to slower, more permanent storage. The result is a 10-60x speedup in constructing and moving checkpoint data off the compute nodes. This demonstration of early Sirocco functionality shows a significant benefit for a real I/O workload, checkpointing, in a real application, CTH. By running Sirocco storage servers within a job as RAM-only stores, CTH was able to store checkpoints 10-60x faster than storing to PanFS, allowing the job to continue computing sooner. While this prototype did not include automatic data migration, the checkpoint remained available to be pushed or pulled to disk-based storage as needed after the compute nodes resumed computing. Future developments include the ability to dynamically spawn Sirocco nodes to absorb checkpoints, extension of this mechanism to other fast storage tiers such as flash memory, and sharing of dynamic Sirocco nodes among multiple jobs as needed.
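
A minimal sketch of the two-tier checkpointing pattern described above: the checkpoint is written to a fast in-memory tier so computation can resume immediately, while a background thread migrates the data to slower, durable storage. The names (FastTier, drain_to_disk, the /scratch path) are illustrative assumptions, not the Sirocco or CTH interfaces.

    # Sketch of two-tier checkpointing: write to a RAM-backed tier, then
    # migrate asynchronously to durable storage. Illustrative only; not the
    # Sirocco or CTH API.
    import io, shutil, threading

    class FastTier:
        """RAM-backed staging area for checkpoint blobs (illustrative)."""
        def __init__(self):
            self._store = {}

        def put(self, key, data: bytes):
            self._store[key] = io.BytesIO(data)   # held entirely in memory

        def drain_to_disk(self, key, path):
            with open(path, "wb") as dst:
                shutil.copyfileobj(self._store[key], dst)
            del self._store[key]                  # free RAM once migrated

    def checkpoint(state: bytes, step: int, tier: FastTier):
        key = f"ckpt_{step}"
        tier.put(key, state)                      # fast path: compute resumes after this
        t = threading.Thread(target=tier.drain_to_disk,
                             args=(key, f"/scratch/{key}.bin"),  # assumed path
                             daemon=True)
        t.start()                                 # slow migration overlaps computation
        return t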

The potential, limitations, and challenges of divide and conquer quantum electronic structure calculations on energetic materials

Tucker, Jon R.; Magyar, Rudolph J.

High explosives are an important class of energetic materials used in many weapons applications. Even with modern computers, the simulation of the dynamic chemical reactions and energy release is exceedingly challenging. While the scale of the detonation process may be macroscopic, the dynamic bond breaking responsible for the explosive release of energy is fundamentally quantum mechanical. Thus, any method that does not adequately describe bonding is destined to lack predictive capability on some level. Performing quantum mechanics calculations on systems with more than a few dozen atoms is a gargantuan task, and severe approximation schemes must be employed in practical calculations. We have developed and tested a divide and conquer (DnC) scheme to obtain total energies, forces, and harmonic frequencies within semi-empirical quantum mechanics. The method is intended as an approximate but faster solution to the full problem and is made possible by the sparsity of the density matrix in many applications. The resulting total energy calculation scales linearly with the number of subsystems, and the method provides a path forward to quantum mechanical simulations of millions of atoms.
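
The divide-and-conquer idea can be sketched as follows: partition the atoms into subsystems, pad each subsystem with a buffer of nearby atoms, solve each fragment independently, and accumulate only the core-atom contributions. The partitioning, buffer construction, and fragment solver are placeholders here; this is a schematic of the general DnC assembly, not the authors' implementation.

    # Schematic divide-and-conquer energy assembly. Each partition is a set of
    # atom indices; solve_fragment() stands in for the semi-empirical QM solver
    # and returns the energy contribution of the core atoms only.
    from typing import Callable, Sequence

    def dnc_total_energy(atoms: Sequence, partitions: Sequence[set],
                         buffer_of: Callable[[set], set],
                         solve_fragment: Callable[[list, set], float]) -> float:
        """Total energy as a sum over subsystem core contributions.

        Cost grows linearly with the number of subsystems because each
        fragment is solved independently of the others.
        """
        total = 0.0
        for core in partitions:
            fragment_ids = core | buffer_of(core)          # core atoms plus buffer
            fragment = [atoms[i] for i in sorted(fragment_ids)]
            total += solve_fragment(fragment, core)        # count core contribution only
        return total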

Formulating and analyzing multi-stage sensor placement problems

Water Distribution Systems Analysis 2010 - Proceedings of the 12th International Conference, WDSA 2010

Watson, Jean-Paul; Hart, William E.; Woodruff, David L.; Murray, Regan

The optimization of sensor placements is a key aspect of the design of contaminant warning systems for automatically detecting contaminants in water distribution systems. Although researchers have generally assumed that all sensors are placed at the same time, in practice sensor networks will likely grow and evolve over time. For example, limitations on a water utility's budget may dictate a staged, incremental deployment of sensors over many years. We describe optimization formulations of multi-stage sensor placement problems. The objective of these formulations includes an explicit trade-off between the value of the initially deployed and final sensor networks. This trade-off motivates the deployment of sensors in the initial stages of the deployment schedule, even though these choices typically lead to a solution that is suboptimal when compared to placing all sensors at once. These multi-stage sensor placement problems can be represented as mixed-integer programs, and we illustrate the impact of this trade-off using standard commercial solvers. We also describe a multi-stage formulation that models budget uncertainty, expressed as a tree of potential budget scenarios through time. Budget uncertainty is used to assess and hedge against risks due to a potentially incomplete deployment of a planned sensor network. This formulation is a multi-stage stochastic mixed-integer program, a class of problems that is notoriously difficult to solve. We apply standard commercial solvers to small-scale test problems, enabling us to effectively analyze multi-stage sensor placement problems subject to budget uncertainties and to assess the impact of accounting for such uncertainty relative to a deterministic multi-stage model. © 2012 ASCE.
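
As an illustration of how such a formulation might look, the following sketch expresses a simplified two-stage placement problem in Pyomo, with a budget per stage and an objective that weights detection coverage of the interim (stage-1) network against that of the final network. The variable names, coverage model, and data layout (detects[e] as the set of locations that would detect event e) are assumptions for illustration only, not the authors' formulation.

    # Hypothetical two-stage sensor placement MIP. x1/x2 mark sensors placed
    # by stage 1 / stage 2 (cumulative); y1/y2 mark events detected by the
    # interim and final networks.
    import pyomo.environ as pyo

    def build_model(locations, events, detects, budget1, budget2, alpha=0.5):
        m = pyo.ConcreteModel()
        m.x1 = pyo.Var(locations, domain=pyo.Binary)   # placed in stage 1
        m.x2 = pyo.Var(locations, domain=pyo.Binary)   # placed by stage 2
        m.y1 = pyo.Var(events, domain=pyo.Binary)      # detected by stage-1 network
        m.y2 = pyo.Var(events, domain=pyo.Binary)      # detected by final network

        # A stage-1 sensor remains in the final network.
        m.grow = pyo.Constraint(locations, rule=lambda m, s: m.x1[s] <= m.x2[s])
        m.stage1_budget = pyo.Constraint(expr=sum(m.x1[s] for s in locations) <= budget1)
        m.stage2_budget = pyo.Constraint(expr=sum(m.x2[s] for s in locations) <= budget2)

        # An event counts as detected only if some placed sensor can detect it.
        m.cover1 = pyo.Constraint(events, rule=lambda m, e:
                                  m.y1[e] <= sum(m.x1[s] for s in detects[e]))
        m.cover2 = pyo.Constraint(events, rule=lambda m, e:
                                  m.y2[e] <= sum(m.x2[s] for s in detects[e]))

        # Explicit trade-off between the interim and final network value.
        m.obj = pyo.Objective(
            expr=alpha * sum(m.y1[e] for e in events)
                 + (1 - alpha) * sum(m.y2[e] for e in events),
            sense=pyo.maximize)
        return m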

Optimal determination of grab sample locations and source inversion in large-scale water distribution systems

Water Distribution Systems Analysis 2010 - Proceedings of the 12th International Conference, WDSA 2010

Wong, Angelica; Young, James; Laird, Carl D.; Hart, William E.; Mckenna, Sean A.

We present a mixed-integer linear programming formulation to determine optimal locations for manual grab sampling after the detection of contaminants in a water distribution system. The formulation selects optimal manual grab sample locations that maximize the total pair-wise distinguishability of candidate contamination events. Given an initial contaminant detection location, a source inversion is performed to eliminate unlikely events, resulting in a much smaller set of candidate contamination events. We then propose a cyclical process in which optimal grab sample locations are determined and manual grab samples are taken. Relying only on YES/NO indicators of the presence of contaminant, source inversion is performed to reduce the set of candidate contamination events. The process is repeated until the number of candidate events is sufficiently small. Case studies testing this process are presented using water network models ranging from 4 to approximately 13,000 nodes. The results demonstrate that the contamination event can be identified within a remarkably small number of sampling cycles using very few sampling teams. Furthermore, solution times were reasonable, making this formulation suitable for real-time settings. © 2012 ASCE.
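
The cyclical sampling process can be illustrated with the sketch below, which picks sample locations that distinguish the most pairs of remaining candidate events, takes YES/NO measurements, and discards events inconsistent with the observations. The greedy location choice stands in for the paper's MILP, and signature[e][n] (whether event e would show contaminant at node n) and take_sample() are assumed inputs.

    # Illustrative grab-sampling / source-inversion loop (greedy stand-in for
    # the MILP). signature[e][n] and take_sample(n) return YES/NO booleans.
    from itertools import combinations

    def distinguished_pairs(candidates, node, signature):
        return sum(1 for a, b in combinations(candidates, 2)
                   if signature[a][node] != signature[b][node])

    def invert_source(candidates, nodes, signature, take_sample,
                      teams=2, max_cycles=10):
        candidates = set(candidates)
        for _ in range(max_cycles):
            if len(candidates) <= 1:
                break
            # Greedily choose one location per sampling team.
            chosen = sorted(nodes,
                            key=lambda n: distinguished_pairs(candidates, n, signature),
                            reverse=True)[:teams]
            for n in chosen:
                observed = take_sample(n)   # YES/NO indicator from the field team
                candidates = {e for e in candidates if signature[e][n] == observed}
        return candidates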

Optimization of Large-Scale Heterogeneous System-of-Systems Models

Gray, Genetha A.; Hart, William E.; Hough, Patricia D.; Parekh, Ojas D.; Phillips, Cynthia A.; Siirola, John D.; Swiler, Laura P.; Watson, Jean-Paul

Decision makers increasingly rely on large-scale computational models to simulate and analyze complex man-made systems. For example, computational models of national infrastructures are being used to inform government policy, assess economic and national security risks, evaluate infrastructure interdependencies, and plan for the growth and evolution of infrastructure capabilities. A major challenge for decision makers is the analysis of national-scale models that are composed of interacting systems: effective integration of system models is difficult, there are many parameters to analyze in these systems, and fundamental modeling uncertainties complicate analysis. This project is developing optimization methods to effectively represent and analyze large-scale heterogeneous system-of-systems (HSoS) models, which have emerged as a promising approach for describing such complex man-made systems. These optimization methods enable decision makers to predict future system behavior, manage system risk, assess tradeoffs between system criteria, and identify critical modeling uncertainties.

A tunable, software-based DRAM error detection and correction library for HPC

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Fiala, David; Ferreira, Kurt; Mueller, Frank; Engelmann, Christian

Proposed exascale systems will present a number of considerable resiliency challenges. In particular, DRAM soft errors, or bit flips, are expected to increase greatly due to the increased memory density of these systems. Current hardware-based fault-tolerance methods will be unsuitable for addressing the expected soft error rate. As a result, additional software will be needed to address this challenge. In this paper we introduce LIBSDC, a tunable, transparent silent data corruption (SDC) detection and correction library for HPC applications. LIBSDC provides comprehensive SDC protection for program memory by implementing on-demand page integrity verification. Experimental benchmarks with Mantevo HPCCG show that, once tuned, LIBSDC achieves SDC protection with 50% resource overhead, less than the 100% required for double modular redundancy. © 2012 Springer-Verlag Berlin Heidelberg.
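
The general idea of on-demand page integrity verification can be illustrated with a toy sketch that stores a checksum per memory page and verifies a page before it is used; a mismatch signals silent data corruption. This mimics the concept only and is not the LIBSDC interface; PAGE_SIZE and the buffer layout are assumptions.

    # Toy page-level integrity check: checksum each page, verify on demand.
    # Not the LIBSDC API; PAGE_SIZE and the flat buffer are assumptions.
    import zlib

    PAGE_SIZE = 4096

    def compute_checksums(buf: bytearray) -> list:
        return [zlib.crc32(buf[i:i + PAGE_SIZE])
                for i in range(0, len(buf), PAGE_SIZE)]

    def verify_page(buf: bytearray, page: int, checksums: list) -> bool:
        start = page * PAGE_SIZE
        return zlib.crc32(buf[start:start + PAGE_SIZE]) == checksums[page]

    # Usage: checksum the buffer after it is written, verify before each read.
    data = bytearray(8 * PAGE_SIZE)
    sums = compute_checksums(data)
    data[100] ^= 0x01                      # simulate a DRAM bit flip in page 0
    corrupted = [p for p in range(len(sums)) if not verify_page(data, p, sums)]
    # corrupted == [0]: page 0 failed its on-demand integrity check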
