Publications

Results 8326–8350 of 9,998


A brief parallel I/O tutorial

Ward, Harry L.

This document provides common best practices for the efficient use of parallel file systems, aimed at analysts and application developers. A multi-program parallel supercomputer provides effective compute power by aggregating a host of lower-power processors over a network. In general, one either constructs an application that distributes work across the available nodes and processors and then collects the results (a parallel application), or one launches a large number of small jobs, each doing similar work on a different subset of the data (a campaign). The I/O system on these machines is usually implemented as a tightly coupled parallel application in its own right: it presents the host applications with the concept of a 'file', an addressable store of bytes whose address space is global in nature. Beyond the simple reality that the I/O system is normally built from a smaller, less capable collection of hardware, that global address space will cause problems if it is not used carefully. How severe those problems are, and how they manifest, will vary, but that the arrangement is problem-prone has been well established. Worse, the file system is a shared resource on the machine - a system service. What an application does with the file system affects all users: no portion of the available resource is reserved for anyone. Instead, the I/O system responds to requests by scheduling and queuing based on instantaneous demand. Using the system well contributes to the overall throughput of the machine; from a purely self-centered perspective, it also reduces the time during which the application or campaign is exposed to interference from others. The developer's goal should be to accomplish I/O in a way that minimizes interaction with the I/O system, maximizes the amount of data moved per call, and gives the I/O system the most information about the transfer per request.
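
The closing guideline - fewer calls, more data per call, more information per request - maps naturally onto MPI-IO's collective interface. The sketch below is a minimal illustration of that guideline, not code from the tutorial: it assumes each rank owns one contiguous 1 MiB buffer and writes it with a single collective call at a rank-computed offset.

```c
/* Minimal sketch: one large, collective write per rank via MPI-IO.
 * Illustrative only; assumes each rank owns one contiguous buffer. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const MPI_Offset chunk = 1 << 20;            /* 1 MiB per rank */
    char *buf = malloc(chunk);
    /* ... fill buf with this rank's data ... */

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "out.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* One collective call moves the maximum data per request and tells
     * the I/O layer about every rank's access pattern at once. */
    MPI_File_write_at_all(fh, rank * chunk, buf, (int)chunk, MPI_BYTE,
                          MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    free(buf);
    MPI_Finalize();
    return 0;
}
```

A single MPI_File_write_at_all from every rank lets the I/O layer aggregate and schedule the whole transfer, rather than servicing many small independent requests.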

Porting LAMMPS to GPUs

Brown, William M.; Crozier, Paul C.; Plimpton, Steven J.

LAMMPS is a classical molecular dynamics code, and an acronym for Large-scale Atomic/Molecular Massively Parallel Simulator. LAMMPS has potentials for soft materials (biomolecules, polymers) and solid-state materials (metals, semiconductors) and coarse-grained or mesoscopic systems. It can be used to model atoms or, more generically, as a parallel particle simulator at the atomic, meso, or continuum scale. LAMMPS runs on single processors or in parallel using message-passing techniques and a spatial decomposition of the simulation domain. The code is designed to be easy to modify or extend with new functionality.
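
The spatial decomposition mentioned above is the heart of LAMMPS's parallelism: each MPI rank owns a rectangular subdomain and the particles inside it. The fragment below is a hedged sketch of that idea under simplifying assumptions (uniform 1-D slabs along x; all names are ours), not LAMMPS source code.

```c
/* Hypothetical sketch of spatial decomposition: the simulation box is
 * split into equal slabs along x, and each MPI rank owns the atoms
 * whose x coordinate falls in its slab.  Not LAMMPS source. */
#include <mpi.h>

typedef struct { double x, y, z; } Atom;

/* Which rank owns an atom, for a box spanning [0, box_x) split into
 * nranks equal slabs along x. */
static int owner_rank(const Atom *a, double box_x, int nranks)
{
    int r = (int)(a->x / box_x * nranks);
    return r < nranks ? r : nranks - 1;   /* clamp atoms at the edge */
}
```

In the real code the decomposition is three-dimensional, and atoms near subdomain faces are exchanged with neighboring ranks each timestep as 'ghost' atoms so that short-range forces can be computed locally.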

Foundational development of an advanced nuclear reactor integrated safety code

Schmidt, Rodney C.; Hooper, Russell H.; Humphries, Larry; Lorber, Alfred L.; Spotz, William S.

This report describes the activities and results of a Sandia LDRD project whose objective was to develop and demonstrate foundational aspects of a next-generation nuclear reactor safety code that leverages advanced computational technology. The project scope was directed towards the systems-level modeling and simulation of an advanced, sodium-cooled fast reactor, but the approach developed has more general applicability. The major accomplishments of the LDRD are centered around the following two activities. (1) The development and testing of LIME, a Lightweight Integrating Multi-physics Environment for coupling codes that is designed to enable both 'legacy' and 'new' physics codes to be combined and strongly coupled using advanced nonlinear solution methods. (2) The development and initial demonstration of BRISC, a prototype next-generation nuclear reactor integrated safety code. BRISC leverages LIME to tightly couple the physics models in several different codes (written in a variety of languages) into one integrated package for simulating accident scenarios in a liquid-sodium-cooled 'burner' nuclear reactor. Other activities and accomplishments of the LDRD include (a) further development, application and demonstration of the 'non-linear elimination' strategy to enable physics codes that do not provide residuals to be incorporated into LIME, (b) significant extensions of the RIO CFD code capabilities, (c) complex 3D solid modeling and meshing of major fast reactor components and regions, and (d) an approach for multi-physics coupling across non-conformal mesh interfaces.
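
The phrase 'strongly coupled using advanced nonlinear solution methods' can be pictured as Newton's method applied to the stacked residuals of the individual codes. The toy below is our illustration only - two scalar 'physics' equations and a hand-coded 2x2 Newton step - and says nothing about LIME's actual interfaces.

```c
/* Toy illustration of strong coupling: two "physics" residuals solved
 * together with Newton's method on the stacked system.  Purely
 * illustrative; LIME couples real codes, not scalar equations. */
#include <math.h>
#include <stdio.h>

int main(void)
{
    double u = 0.0, v = 0.0;
    for (int it = 0; it < 20; ++it) {
        /* Residuals of the two coupled "codes". */
        double f1 = u - cos(v);
        double f2 = v - 0.5 * sin(u);
        if (fabs(f1) + fabs(f2) < 1e-12) break;

        /* Analytic 2x2 Jacobian of (f1, f2) w.r.t. (u, v). */
        double a = 1.0,           b = sin(v);
        double c = -0.5 * cos(u), d = 1.0;
        double det = a * d - b * c;

        /* Newton update: solve J * [du, dv]^T = -[f1, f2]^T. */
        u -= ( d * f1 - b * f2) / det;
        v -= (-c * f1 + a * f2) / det;
    }
    printf("u = %.12f, v = %.12f\n", u, v);
    return 0;
}
```

Solving both residuals in one Newton iteration is what 'tight' coupling means here, as opposed to lagging one code behind the other in a fixed-point sweep.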

A framework for reduced order modeling with mixed moment matching and peak error objectives

SIAM Journal on Scientific Computing

Santarelli, Keith R.

We examine a new method of producing reduced order models for LTI systems which attempts to minimize a bound on the peak error between the original and reduced order models subject to a bound on the peak value of the input. The method, which can be implemented by solving a set of linear programming problems that are parameterized via a single scalar quantity, is able to minimize an error bound subject to a number of moment matching constraints. Moreover, because all optimization is performed in the time domain, the method can also be used to perform model reduction for infinite dimensional systems, rather than being restricted to finite order state space descriptions. We begin by contrasting the method we present here with two classes of standard model reduction algorithms, namely, moment matching algorithms and singular value-based methods. After motivating the class of reduction tools we propose, we describe the algorithm (which minimizes the L1 norm of the difference between the original and reduced order impulse responses) and formulate the corresponding linear programming problem that is solved during each iteration of the algorithm. We then prove that, for a certain class of LTI systems, the method we propose can be used to produce reduced order models of arbitrary accuracy even when the original system is infinite dimensional. We then show how to incorporate moment matching constraints into the basic error bound minimization algorithm, and present three examples which utilize the techniques described herein. We conclude with some comments on extensions to multi-input, multi-output systems, as well as some general comments for future work. © 2010 Society for Industrial and Applied Mathematics.
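
Concretely, minimizing the L1 impulse-response error under moment matching discretizes into a linear program. The formulation below is a schematic reconstruction from the abstract, not the paper's exact statement: h_i and \hat{h}_i are samples of the original and reduced impulse responses on a grid t_i with spacing \Delta t, the reduced response is assumed to depend linearly on the model's free parameters, and the first m + 1 moments are matched.

```latex
% Schematic LP: minimize the sampled L1 impulse-response error
% subject to matched moments (our reconstruction, not the paper's).
\begin{align*}
\min_{\hat{h},\, e} \quad & \sum_{i} e_i \,\Delta t \\
\text{s.t.} \quad & -e_i \le h_i - \hat{h}_i \le e_i && \forall i, \\
                  & \sum_{i} t_i^{k}\bigl(h_i - \hat{h}_i\bigr)\,\Delta t = 0
                    && k = 0, \dots, m.
\end{align*}
```

Because the absolute values are rewritten as pairs of linear inequalities on the slack variables e_i, the whole problem stays linear and can be handed to a standard LP solver.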

A switched state feedback law for the stabilization of LTI systems

Proceedings of the 2010 American Control Conference, ACC 2010

Santarelli, Keith R.

Inspired by prior work in the design of switched feedback controllers for second order systems, we develop a switched state feedback control law for the stabilization of LTI systems of arbitrary dimension. The control law operates by switching between two static gain vectors in such a way that the state trajectory is driven onto a stable (n - 1)-dimensional hyperplane (where n represents the system dimension). We begin by briefly examining relevant geometric properties of the phase portraits in the case of two-dimensional systems and show how these geometric properties can be expressed as algebraic constraints on the switched vector fields that are applicable to LTI systems of arbitrary dimension. We then describe an explicit procedure for designing stabilizing controllers and illustrate the closed-loop transient performance via two examples. © 2010 AACC.
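
In symbols, the control law described above takes roughly the following shape (notation is ours, not the paper's): the input switches between two fixed gain vectors according to the sign of a switching function s(x), chosen so that trajectories are driven onto a stable hyperplane.

```latex
% Schematic of the switched state feedback law (our notation).
\begin{align*}
u(t) &= -k_{\sigma(t)}^{\mathsf{T}}\, x(t),
  \qquad \sigma(t) \in \{1, 2\},\\
\sigma(t) &=
  \begin{cases}
    1, & s\bigl(x(t)\bigr) \ge 0,\\
    2, & s\bigl(x(t)\bigr) < 0,
  \end{cases}
\end{align*}
```

with the gains k_1, k_2 and the switching function s designed so that the stable (n - 1)-dimensional hyperplane is reached and rendered invariant.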

Advanced I/O for large-scale scientific applications

Oldfield, Ron A.

As scientific simulations scale to use petascale machines and beyond, the data volumes generated pose a dual problem. First, with increasing machine sizes, careful tuning of IO routines becomes more and more important to keep the time spent in IO acceptable. It is not uncommon, for instance, for an application to spend 20% of its runtime performing IO even on a 'tuned' system. Careful management of the IO routines can move that to 5% or even less in some cases. Second, the data volumes are so large, on the order of 10s to 100s of TB, that discovering the scientifically valid contributions requires assistance at runtime to both organize and annotate the data. Waiting for offline processing is not feasible, due both to the impact on the IO system and the time required. To reduce this load and improve the ability of scientists to use the large amounts of data being produced, new techniques for data management are required. First, there is a need for techniques for efficient movement of data from the compute space to storage. These techniques should understand the underlying system infrastructure and adapt to changing system conditions. Technologies include aggregation networks, data staging nodes that sit closer to the IO subsystem, and autonomic IO routines that can detect system bottlenecks and choose different approaches, such as splitting the output into multiple targets or staggering output processes. Such methods must be end-to-end: even with properly managed asynchronous techniques, it is still essential to manage the later synchronous interaction with the storage system to maintain acceptable performance. Second, for the data being generated, annotations and other metadata must be incorporated to help the scientist understand the output data for the simulation run as a whole, and to select data and data features without concern for what files or other storage technologies were employed. All of these features should be attained while maintaining a simple deployment for the science code and eliminating the need for allocation of additional computational resources.
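
One of the staging ideas above - moving data off the compute path so IO overlaps computation - can be shown in miniature with double buffering. The sketch below is our own illustration under simple assumptions (one writer thread, two fixed-size buffers, plain fwrite standing in for the real transport); it is not any particular library's API.

```c
/* Hedged sketch of asynchronous, double-buffered output: the compute
 * loop fills one buffer while a writer thread drains the other,
 * overlapping IO with computation.  Names and structure are ours. */
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define BUF_BYTES (1 << 20)

static char bufs[2][BUF_BYTES];
static int full = -1;                 /* index of buffer ready to write */
static int done = 0;
static pthread_mutex_t mu = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;

static void *writer(void *arg)
{
    (void)arg;
    FILE *f = fopen("out.dat", "wb");
    for (;;) {
        pthread_mutex_lock(&mu);
        while (full < 0 && !done)
            pthread_cond_wait(&cv, &mu);
        if (full < 0 && done) { pthread_mutex_unlock(&mu); break; }
        int idx = full;
        pthread_mutex_unlock(&mu);

        fwrite(bufs[idx], 1, BUF_BYTES, f);   /* slow path, off the
                                                 compute thread */
        pthread_mutex_lock(&mu);
        full = -1;                            /* hand buffer back */
        pthread_cond_signal(&cv);
        pthread_mutex_unlock(&mu);
    }
    fclose(f);
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, writer, NULL);

    for (int step = 0; step < 8; ++step) {
        int idx = step % 2;
        memset(bufs[idx], step, BUF_BYTES);   /* "compute" into buffer */

        pthread_mutex_lock(&mu);
        while (full >= 0)                     /* writer still behind */
            pthread_cond_wait(&cv, &mu);
        full = idx;
        pthread_cond_signal(&cv);
        pthread_mutex_unlock(&mu);
    }
    pthread_mutex_lock(&mu);
    while (full >= 0)                         /* drain the last buffer */
        pthread_cond_wait(&cv, &mu);
    done = 1;
    pthread_cond_signal(&cv);
    pthread_mutex_unlock(&mu);
    pthread_join(t, NULL);
    return 0;
}
```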

Lightweight storage and overlay networks for fault tolerance

Oldfield, Ron A.

The next generation of capability-class, massively parallel processing (MPP) systems is expected to have hundreds of thousands to millions of processors. In such environments, it is critical to have fault-tolerance mechanisms, including checkpoint/restart, that scale with the size of applications and the percentage of the system on which the applications execute. For application-driven, periodic checkpoint operations, the state of the art does not provide a scalable solution. For example, on today's massive-scale systems that execute applications which consume most of the memory of the employed compute nodes, checkpoint operations generate I/O that consumes nearly 80% of the total I/O usage. Motivated by this observation, this project aims to improve I/O performance for application-directed checkpoints through the use of lightweight storage architectures and overlay networks. Lightweight storage provides direct access to underlying storage devices. Overlay networks provide caching and processing capabilities in the compute-node fabric. The combination has the potential to significantly reduce I/O overhead for large-scale applications. This report describes our combined efforts to model and understand overheads for application-directed checkpoints, as well as the implementation and performance analysis of a checkpoint service that uses available compute nodes as a network cache for checkpoint operations.
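
The core mechanism - using spare compute nodes as a network cache for checkpoints - can be sketched with plain MPI point-to-point calls. The code below is a hypothetical illustration (one cache rank, fixed-size checkpoints, all names ours), not the project's actual checkpoint service.

```c
/* Hedged sketch: stage a checkpoint into the memory of a spare compute
 * node instead of writing it straight to the parallel file system. */
#include <mpi.h>
#include <stdlib.h>

#define CKPT_BYTES (1 << 20)
#define TAG_CKPT 42

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nranks;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);
    int cache = nranks - 1;              /* last rank acts as the cache */

    char *buf = malloc(CKPT_BYTES);

    if (rank == cache) {
        /* Cache node: absorb checkpoints at network/memory speed, then
         * drain them to storage while the application keeps computing. */
        for (int n = 0; n < nranks - 1; ++n) {
            MPI_Recv(buf, CKPT_BYTES, MPI_BYTE, MPI_ANY_SOURCE, TAG_CKPT,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            /* ... asynchronously write buf to the file system ... */
        }
    } else {
        /* Compute node: ship the checkpoint and resume work without
         * blocking on the file system. */
        MPI_Request req;
        /* ... fill buf with application state ... */
        MPI_Isend(buf, CKPT_BYTES, MPI_BYTE, cache, TAG_CKPT,
                  MPI_COMM_WORLD, &req);
        /* ... continue computing; must complete before reusing buf ... */
        MPI_Wait(&req, MPI_STATUS_IGNORE);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}
```

The design point this illustrates is latency hiding: the compute node pays only the cost of a memory-to-memory transfer, while the slower synchronous interaction with storage happens on the cache node, off the application's critical path.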
