Publications

Results 9351–9400 of 9,998

Implications of application usage characteristics for collective communication offload

International Journal of High Performance Computing and Networking

Brightwell, Ronald B.; Goudy, Sue P.; Rodrigues, Arun; Underwood, Keith D.

The global, synchronous nature of some collective operations implies that they will become the bottleneck when scaling to hundreds of thousands of nodes. One approach improves collective performance using a programmable network interface to directly implement collectives. While these implementations improve micro-benchmark performance, accelerating applications will require deeper understanding of application behaviour. We describe several characteristics of applications that impact collective communication performance. We analyse network resource usage data to guide the design of collective offload engines and their associated programming interfaces. In particular, we provide an analysis of the potential benefit of non-blocking collective communication operations for MPI. © 2006 Inderscience Enterprises Ltd.
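
The abstract's closing point is easiest to see in code. Below is a minimal C++ sketch (using the MPI C API) of the compute/communication overlap that non-blocking collectives enable. Note that MPI_Iallreduce was only standardized later, in MPI-3; it is used here purely to illustrate the overlap pattern whose potential benefit this paper analyzes, and the buffer sizes and work loop are illustrative.

    // Overlapping independent computation with a collective in flight.
    // MPI_Iallreduce is the MPI-3 non-blocking form of MPI_Allreduce;
    // this paper predates it and analyzes exactly this kind of overlap.
    #include <mpi.h>
    #include <vector>

    // Independent work that can be hidden behind the collective.
    double local_work(const std::vector<double>& v) {
        double acc = 0.0;
        for (double x : v) acc += x * x;
        return acc;
    }

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        std::vector<double> in(1 << 20, 1.0), out(1 << 20, 0.0), other(1 << 20, 2.0);

        MPI_Request req;
        MPI_Iallreduce(in.data(), out.data(), (int)in.size(),
                       MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD, &req);  // post the collective

        double overlapped = local_work(other);  // proceeds while the reduction is in flight

        MPI_Wait(&req, MPI_STATUS_IGNORE);      // the reduced result is needed beyond this point
        (void)overlapped;
        MPI_Finalize();
        return 0;
    }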

ALEGRA-HEDP validation strategy

Trucano, Timothy G.

This report presents an initial validation strategy for specific SNL pulsed power program applications of the ALEGRA-HEDP radiation-magnetohydrodynamics computer code. The strategy is written to be (1) broadened and deepened with future evolution of the particular specifications given in this version; and (2) broadly applicable to computational capabilities other than ALEGRA-HEDP that are directed at the same pulsed power applications. The content and applicability of the document are highly constrained by the R&D thrust of the SNL pulsed power program. This means that the strategy has significant gaps, indicative of the flexibility required to respond to an ongoing experimental program that is heavily engaged in phenomena discovery.

Constitutive models for rubber networks undergoing simultaneous crosslinking and scission

Budzien, Joanne L.; Lo, Chi S.; Curro, John G.; Thompson, Aidan P.; Grest, Gary S.

Constitutive models for chemically reacting networks are formulated based on a generalization of the independent network hypothesis. These models account for the coupling between chemical reaction and strain histories, and have been tested by comparison with microscopic molecular dynamics simulations. An essential feature of these models is the introduction of stress transfer functions that describe the interdependence between crosslinks formed and broken at various strains. Efforts are underway to implement these constitutive models into the finite element code Adagio. Preliminary results are shown that illustrate the effects of changing crosslinking and scission rates and history.
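
For readers unfamiliar with the independent network hypothesis, a schematic (Tobolsky-style) superposition is written below to fix ideas. The notation is ours, not the paper's; the stress transfer functions described in the abstract generalize the simple survival factor Phi used here.

    % Schematic independent-network superposition; illustrative notation only.
    % F is the deformation gradient; each sub-network is stress-free in the
    % configuration at its formation time t_1, and Phi(t_1, t) is the fraction
    % of crosslinks formed at t_1 that survive (escape scission) until t.
    \sigma(t) = \int_0^{t}
        \frac{\partial \nu(t_1)}{\partial t_1}\,
        \Phi(t_1, t)\,
        \hat{\sigma}\!\left( \mathbf{F}(t)\,\mathbf{F}^{-1}(t_1) \right)
        \, dt_1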

Algorithm and simulation development in support of response strategies for contamination events in air and water systems

van Bloemen Waanders, Bart G.

Chemical/Biological/Radiological (CBR) contamination events pose a considerable threat to our nation's infrastructure, especially in large internal facilities, external flows, and water distribution systems. Because physical security can only be enforced to a limited degree, deployment of early warning systems is being considered. However, to achieve reliable and efficient functionality, several complex questions must be answered: (1) where should sensors be placed, (2) how can sparse sensor information be used efficiently to determine the location of the original intrusion, (3) what are the model and data uncertainties, (4) how should these uncertainties be handled, and (5) how can our algorithms and forward simulations be improved sufficiently to achieve real-time performance? This report presents the results of three years of algorithm and application development to support the identification, mitigation, and risk assessment of CBR contamination events. The main thrust of this investigation was to develop (1) computationally efficient algorithms for strategically placing sensors, (2) methods for identifying contamination events from sparse observations, (3) characterization of uncertainty through accurate demand forecasting and investigation of uncertain simulation model parameters, (4) risk assessment capabilities, and (5) reduced-order modeling methods. The development effort was focused on water distribution systems, large internal facilities, and outdoor areas.
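
Sensor placement of this kind is often posed as a coverage problem. The C++ sketch below is a generic greedy maximum-coverage heuristic, a standard baseline for this problem class rather than the report's algorithm; the scenario data are toy values.

    // Generic greedy maximum-coverage heuristic for sensor placement.
    // Each candidate location "covers" the set of contamination scenarios
    // it would detect; pick locations one at a time by marginal gain.
    #include <cstddef>
    #include <cstdio>
    #include <set>
    #include <vector>

    std::vector<int> greedy_place(const std::vector<std::set<int>>& covers, int budget) {
        std::vector<int> chosen;
        std::set<int> covered;
        for (int k = 0; k < budget; ++k) {
            int best = -1;
            std::size_t best_gain = 0;
            for (int c = 0; c < (int)covers.size(); ++c) {
                std::size_t gain = 0;
                for (int s : covers[c]) if (!covered.count(s)) ++gain;
                if (gain > best_gain) { best_gain = gain; best = c; }
            }
            if (best < 0) break;  // no candidate adds new coverage
            chosen.push_back(best);
            covered.insert(covers[best].begin(), covers[best].end());
        }
        return chosen;
    }

    int main() {
        // Three candidate locations, five hypothetical scenarios (toy data).
        std::vector<std::set<int>> covers = {{0, 1, 2}, {2, 3}, {3, 4}};
        for (int c : greedy_place(covers, 2)) std::printf("place sensor at %d\n", c);
    }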

Enabling fluid-structural strong thermal coupling within a multi-physics environment

Collection of Technical Papers - 44th AIAA Aerospace Sciences Meeting

Hooper, Russell H.; Smith, Thomas M.; Ober, Curtis C.

We demonstrate use of a Jacobian-Free Newton-Krylov solver to enable strong thermal coupling at the interface between a solid body and an external compressible fluid. Our method requires only information typically used in loose coupling based on successive substitution and is implemented within a multi-physics framework. We present results for two external flows over thermally conducting solid bodies obtained using both loose and strong coupling strategies. Performance of the two strategies is compared to elucidate both advantages and caveats associated with strong coupling.
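
The core mechanism of a Jacobian-Free Newton-Krylov solver is that the Krylov iteration needs only Jacobian-vector products, which can be approximated by a directional finite difference of the residual. A minimal C++ sketch of that product follows, where F stands for the assembled coupled (fluid-plus-solid) residual; this is a generic illustration of the technique, not the authors' implementation.

    // Matrix-free Jacobian-vector product: J*v ~ (F(u + eps*v) - F(u)) / eps.
    // The Krylov solver only ever calls this, so the coupled Jacobian is
    // never formed explicitly.
    #include <cmath>
    #include <cstddef>
    #include <cstdio>
    #include <functional>
    #include <vector>

    using Vec = std::vector<double>;

    Vec jacobian_vector(const std::function<Vec(const Vec&)>& F,
                        const Vec& u, const Vec& Fu, const Vec& v) {
        double vnorm = 0.0;
        for (double x : v) vnorm += x * x;
        vnorm = std::sqrt(vnorm);
        // One common perturbation-size choice: sqrt(machine eps) scaled by 1/||v||.
        double eps = std::sqrt(2.2e-16) / (vnorm > 0.0 ? vnorm : 1.0);

        Vec up(u);
        for (std::size_t i = 0; i < u.size(); ++i) up[i] += eps * v[i];
        const Vec Fp = F(up);

        Vec Jv(u.size());
        for (std::size_t i = 0; i < u.size(); ++i) Jv[i] = (Fp[i] - Fu[i]) / eps;
        return Jv;
    }

    int main() {
        // Toy residual standing in for the coupled system.
        auto F = [](const Vec& u) { return Vec{u[0] * u[0] - 2.0, u[1] - u[0]}; };
        Vec u{1.0, 0.5}, v{1.0, 0.0};
        Vec Jv = jacobian_vector(F, u, F(u), v);
        std::printf("Jv = (%f, %f)\n", Jv[0], Jv[1]);  // analytic answer: (2, -1)
    }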

Automated expert modeling for automated student evaluation

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Abbott, Robert G.

This paper presents automated expert modeling for automated student evaluation, or AEMASE (pronounced "amaze"). This technique grades students by comparing their actions to a model of expert behavior. The expert model is constructed with machine learning techniques, avoiding the costly and time-consuming process of manual knowledge elicitation and expert system implementation. A brief summary of after action review (AAR) and intelligent tutoring systems (ITS) provides background for a prototype AAR application with a learning expert model. A validation experiment confirms that the prototype accurately grades student behavior on a tactical aircraft maneuver application. Finally, several topics for further research are proposed. © Springer-Verlag Berlin Heidelberg 2006.

Recycling Krylov subspaces for sequences of linear systems

SIAM Journal on Scientific Computing

Parks, Michael L.; De Sturler, Eric; Mackey, Greg; Johnson, Duane D.; Maiti, Spandan

Many problems in science and engineering require the solution of a long sequence of slowly changing linear systems. We propose and analyze two methods that significantly reduce the total number of matrix-vector products required to solve all systems. We consider the general case where both the matrix and right-hand side change, and we make no assumptions regarding the change in the right-hand sides. Furthermore, we consider general nonsingular matrices, and we do not assume that all matrices are pairwise close or that the sequence of matrices converges to a particular matrix. Our methods work well under these general assumptions and hence represent a significant advance over related work in this area. We can reduce the cost of solving subsequent systems in the sequence by recycling selected subspaces generated for previous systems. We consider two approaches that allow for the continuous improvement of the recycled subspace at low cost. We consider both Hermitian and non-Hermitian problems, and we analyze our algorithms both theoretically and numerically to illustrate the effects of subspace recycling. We also demonstrate the effectiveness of our algorithms for a range of applications from computational mechanics, materials science, and computational physics. © 2006 Society for Industrial and Applied Mathematics.
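
The basic recycling step can be written compactly, in the spirit of subspace-recycling methods such as GCRO-DR. The notation below is ours and schematic: given a recycled subspace spanned by the columns of U, with C = AU orthonormalized so that C^H C = I, the initial guess is corrected by a projection onto range(U) and the Krylov method then runs with a deflated operator.

    % Schematic recycling step; U spans the recycled subspace, C = A U,
    % and C^H C = I after orthonormalization.
    x_1 = x_0 + U\,C^{H} r_0, \qquad
    r_1 = \left(I - C C^{H}\right) r_0,
    % then iterate (e.g. GMRES) with the deflated operator (I - C C^H) A,
    % updating U between systems from (harmonic) Ritz information so the
    % recycled subspace improves at low cost as the sequence progresses.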

Automatic differentiation of C++ codes for large-scale scientific computing

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Bartlett, Roscoe B.; Gay, David M.; Phipps, Eric T.

We discuss computing first derivatives for models based on elements, such as large-scale finite-element PDE discretizations, implemented in the C++ programming language. We use a hybrid technique of automatic differentiation (AD) and manual assembly, with local element-level derivatives computed via AD and manually summed into the global derivative. C++ templating and operator overloading work well for both forward- and reverse-mode derivative computations. We found that AD derivative computations compared favorably in time to finite differencing for a scalable finite-element discretization of a convection-diffusion problem in two dimensions. © Springer-Verlag Berlin Heidelberg 2006.
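
The operator-overloading technique reads naturally in a toy form. The Dual type below stands in for a real AD library (e.g. Sacado in Trilinos); it is a minimal forward-mode sketch, with the element function templated on the scalar type exactly as the hybrid element-level scheme requires.

    // Minimal forward-mode AD via C++ operator overloading. Each Dual
    // carries a value and its derivative; overloaded operators propagate
    // both through the element computation.
    #include <cmath>
    #include <cstdio>

    struct Dual {
        double val, der;  // value and d(value)/d(input)
    };
    Dual operator+(Dual a, Dual b) { return {a.val + b.val, a.der + b.der}; }
    Dual operator*(Dual a, Dual b) { return {a.val * b.val, a.der * b.val + a.val * b.der}; }
    Dual sin(Dual a) { return {std::sin(a.val), std::cos(a.val) * a.der}; }

    // Templated on the scalar type: double for plain residual evaluation,
    // Dual for element-level derivatives.
    template <typename T>
    T element_residual(T u) {
        using std::sin;  // std::sin for double; ADL finds our sin for Dual
        return u * u + sin(u);
    }

    int main() {
        Dual u{1.5, 1.0};  // seed du/du = 1
        Dual r = element_residual(u);
        std::printf("r = %f, dr/du = %f\n", r.val, r.der);  // dr/du = 2u + cos(u)
    }

Reverse mode, which the paper also discusses, propagates adjoints in the opposite direction through the same overloaded operations.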

The Surfpack software library for surrogate modeling of sparse irregularly spaced multidimensional data

Collection of Technical Papers - 11th AIAA/ISSMO Multidisciplinary Analysis and Optimization Conference

Giunta, Anthony A.; Swiler, Laura P.; Brown, Shannon L.; Eldred, Michael S.; Richards, Mark D.; Cyr, Eric C.

Surfpack is a general-purpose software library of multidimensional function approximation methods for applications such as data visualization, data mining, sensitivity analysis, uncertainty quantification, and numerical optimization. Surfpack is primarily intended for use on sparse, irregularly-spaced, n-dimensional data sets where classical function approximation methods are not applicable. Surfpack is under development at Sandia National Laboratories, with a public release of Surfpack version 1.0 in August 2006. This paper provides an overview of Surfpack's function approximation methods along with some of its software design attributes. In addition, this paper provides some simple examples to illustrate the utility of Surfpack for data trend analysis, data visualization, and optimization. Copyright © 2006 by the American Institute of Aeronautics and Astronautics, Inc.
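
One of the classic approximation methods for sparse, irregularly spaced data is radial basis function interpolation. The C++ sketch below fits a Gaussian RBF interpolant in one dimension with a naive dense solve; it illustrates the class of method only, and is not Surfpack's API or code.

    // Gaussian radial-basis-function interpolation on scattered 1-D data:
    // solve the dense system A w = f, then evaluate s(x) = sum_i w_i k(|x - x_i|).
    #include <cmath>
    #include <cstddef>
    #include <cstdio>
    #include <vector>

    double kernel(double r, double theta) { return std::exp(-theta * r * r); }

    std::vector<double> fit_rbf(const std::vector<double>& x,
                                const std::vector<double>& f, double theta) {
        std::size_t n = x.size();
        std::vector<std::vector<double>> A(n, std::vector<double>(n));
        std::vector<double> w(f);
        for (std::size_t i = 0; i < n; ++i)
            for (std::size_t j = 0; j < n; ++j)
                A[i][j] = kernel(std::fabs(x[i] - x[j]), theta);
        for (std::size_t k = 0; k < n; ++k) {  // naive elimination, fine for tiny n
            for (std::size_t i = k + 1; i < n; ++i) {
                double m = A[i][k] / A[k][k];
                for (std::size_t j = k; j < n; ++j) A[i][j] -= m * A[k][j];
                w[i] -= m * w[k];
            }
        }
        for (std::size_t i = n; i-- > 0;) {    // back substitution
            for (std::size_t j = i + 1; j < n; ++j) w[i] -= A[i][j] * w[j];
            w[i] /= A[i][i];
        }
        return w;
    }

    double eval_rbf(const std::vector<double>& x, const std::vector<double>& w,
                    double theta, double xq) {
        double s = 0.0;
        for (std::size_t i = 0; i < x.size(); ++i)
            s += w[i] * kernel(std::fabs(xq - x[i]), theta);
        return s;
    }

    int main() {
        std::vector<double> x = {0.0, 0.4, 1.1, 2.0}, f = {1.0, 0.2, -0.3, 0.5};
        auto w = fit_rbf(x, f, 2.0);
        std::printf("surrogate(0.7) = %f\n", eval_rbf(x, w, 2.0, 0.7));
    }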

Measuring MPI send and receive overhead and application availability in high performance network interfaces

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Doerfler, Douglas W.; Brightwell, Ronald B.

In evaluating new high-speed network interfaces, the usual metrics of latency and bandwidth are commonly measured and reported. There are numerous other message passing characteristics that can have a dramatic effect on application performance and that should be analyzed when evaluating a new interconnect. One such metric is overhead, which dictates the network's ability to allow the application to perform non-message-passing work while a transfer is taking place. A method for measuring overhead, and hence calculating application availability, is presented. Results for several next-generation network interfaces are also presented. © Springer-Verlag Berlin Heidelberg 2006.
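
A common way to measure overhead and availability is a post-work-wait loop: post a non-blocking send, perform a calibrated amount of host work, then wait. If total time stays flat as the work grows, that host time was free; the residual floor exposes the host overhead. The C++/MPI sketch below is a schematic version of such a benchmark, not the paper's code, and the message size and work calibration are illustrative.

    // Post-work-wait sketch: vary host work inserted between MPI_Isend and
    // MPI_Wait, and watch where total time starts growing. Run with 2 ranks.
    #include <mpi.h>
    #include <cstdio>
    #include <vector>

    volatile double sink;  // defeat optimization of the work loop

    double work(long iters) {  // calibrated host work
        double a = 1.0;
        for (long i = 0; i < iters; ++i) a = a * 1.0000001 + 1e-9;
        sink = a;
        return a;
    }

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        std::vector<char> buf(1 << 20);

        if (rank == 0) {
            for (long iters = 0; iters <= 1L << 22; iters = iters ? iters * 2 : 1024) {
                double t0 = MPI_Wtime();
                MPI_Request req;
                MPI_Isend(buf.data(), (int)buf.size(), MPI_CHAR, 1, 0, MPI_COMM_WORLD, &req);
                work(iters);                     // overlapped host computation
                MPI_Wait(&req, MPI_STATUS_IGNORE);
                double total = MPI_Wtime() - t0;
                std::printf("work=%ld total=%g s\n", iters, total);
            }
        } else if (rank == 1) {
            for (long iters = 0; iters <= 1L << 22; iters = iters ? iters * 2 : 1024)
                MPI_Recv(buf.data(), (int)buf.size(), MPI_CHAR, 0, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);  // one recv per send above
        }
        MPI_Finalize();
    }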

Modeling and simulation technology readiness levels

Clay, Robert L.; Marburger, Scot J.; Shneider, Max S.; Trucano, Timothy G.

This report summarizes the results of an effort to establish a framework for assigning and communicating technology readiness levels (TRLs) for the modeling and simulation (ModSim) capabilities at Sandia National Laboratories. This effort was undertaken as a special assignment for the Weapon Simulation and Computing (WSC) program office led by Art Hale, and lasted from January to September 2006. This report summarizes the results, conclusions, and recommendations, and is intended to help guide the program office in their decisions about the future direction of this work. The work was broken out into several distinct phases, starting with establishing the scope and definition of the assignment. These are characterized in a set of key assertions provided in the body of this report. Fundamentally, the assignment involved establishing an intellectual framework for TRL assignments to Sandia's modeling and simulation capabilities, including the development and testing of a process to conduct the assignments. To that end, we proposed a methodology for both assigning and understanding the TRLs, and outlined some of the restrictions that need to be placed on this process and the expected use of the result. One of the first assumptions we overturned was the notion of a "static" TRL; rather, we concluded that problem context is essential in any TRL assignment, which leads to dynamic results (i.e., a ModSim tool's readiness level depends on how it is used, and by whom). While we leveraged the classic TRL results from NASA, DoD, and Sandia's NW program, we came up with a substantially revised version of the TRL definitions, maintaining consistency with the classic level definitions and the Predictive Capability Maturity Model (PCMM) approach. In fact, we substantially leveraged the foundation the PCMM team provided, and augmented it as needed. Given the modeling and simulation TRL definitions and our proposed assignment methodology, we conducted four "field trials" to examine how this would work in practice. The results varied substantially, but did indicate that establishing the capability dependencies and making the TRL assignments was manageable and not particularly time consuming. The key differences arose in perceptions of how this information might be used, and what value it would have (opinions ranged from negative to positive value). The use cases and field trial results are included in this report. Taken together, the results suggest that we can make reasonably reliable TRL assignments, but that using them without the context of the information that led to those results (i.e., the measures suggested by the PCMM table, extended for ModSim TRL purposes) produces an oversimplified result; that is, you cannot really boil things down to a single scalar value without losing critical information.

Semi-infinite target penetration by ogive-nose penetrators: ALEGRA/SHISM code predictions for ideal and non-ideal impacts

American Society of Mechanical Engineers, Pressure Vessels and Piping Division (Publication) PVP

Bishop, Joseph E.; Voth, Thomas E.; Brown, Kevin H.

The physics of ballistic penetration mechanics is of great interest in penetrator and counter-measure design. The phenomenology associated with these events can be quite complex, and a significant number of studies have been conducted, ranging from purely experimental work to 'engineering' models based on empirical and/or analytical descriptions to fully coupled penetrator/target thermo-mechanical numerical simulations. Until recently, however, there has been a paucity of numerical studies considering 'non-ideal' impacts [1]. The goal of this work is to demonstrate the SHISM algorithm implemented in the ALEGRA Multi-Material ALE (Arbitrary Lagrangian Eulerian) code [13]. The SHISM algorithm models the three-dimensional continuum solid mechanics response of the target and penetrator in a fully coupled manner. This capability allows for the study of 'non-ideal' impacts (e.g. pitch, yaw and/or obliquity of the target/penetrator pair). In this work, predictions using the SHISM algorithm are compared to previously published experimental results for selected ideal and non-ideal impacts of metal penetrator-target pairs. These results show good agreement between predicted and measured maximum depth of penetration (DOP) for ogive-nose penetrators with striking velocities in the 0.5 to 1.5 km/s range. Ideal impact simulations demonstrate convergence in predicted DOP for the velocity range considered. A theory is advanced to explain disagreement between predicted and measured DOP at higher striking velocities; it attributes the observed discrepancies to uncertainties in angle of attack. It is noted that the material models and associated parameters used here were unmodified from those in the literature; hence, no tuning of models was performed to match experimental data. Copyright © 2005 by ASME.

Kevlar and Carbon Composite body armor - Analysis and testing

American Society of Mechanical Engineers, Pressure Vessels and Piping Division (Publication) PVP

Uekert, Vanessa S.; Stofleth, Jerome H.; Preece, Dale S.; Risenmay, Matthew A.

Kevlar materials make excellent body armor due to their fabric-like flexibility and ultra-high tensile strength. Carbon composites are made up of many layers of carbon AS-4 material impregnated with epoxy. Fiber orientation is bidirectional, oriented at 0° and 90°. They also have ultra-high tensile strength but can be made into relatively hard armor pieces. Once the layers are cut and assembled, they can be ergonomically shaped in a mold during the heated curing process. Kevlar and carbon composites can be used together to produce light and effective body armor. This paper focuses on computer analysis and laboratory testing of a Kevlar/carbon composite cross-section proposed for body armor development. The carbon composite is inserted between layers of Kevlar. The computer analysis was performed with a Lagrangian transversely isotropic material model for both the Kevlar and the carbon composite. The computer code employed is AUTODYN. Both the computer analysis and the laboratory testing utilized different sizes of hardened steel fragments impacting the armor cross-section. The steel fragments are right-circular cylinders. Laboratory testing was undertaken by firing various sizes of hardened steel fragments at square test coupons of Kevlar layers and heat-cured carbon composites. The V50 velocity for the various fragment sizes was determined from the testing. This V50 data can be used to compare the body armor design with other previously designed armor systems. AUTODYN [1] computer simulations of the fragment impacts were compared to the experimental results and used to evaluate and guide the overall design process. This paper includes the detailed transversely isotropic computer simulations of the Kevlar/carbon composite cross-section as well as the experimental results and a comparison between the two. Conclusions are drawn about the design process and the validity of current computer modeling methods for Kevlar and carbon composites. Copyright © 2005 by ASME.

Enhancing NIC performance for MPI using processing-in-memory

Proceedings - 19th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2005

Rodrigues, Arun; Murphy, Richard; Brightwell, Ronald B.; Underwood, Keith D.

Processing-in-Memory (PIM) technology encompasses a range of research leveraging a tight coupling of memory and processing. The most distinctive features of the technology are extremely wide paths to memory, extremely low memory latency, and wide functional units. Many PIM researchers are also exploring extremely fine-grained multi-threading capabilities. This paper explores a mechanism for leveraging these features of PIM technology to enhance commodity architectures in a seemingly mundane way: accelerating MPI. Modern network interfaces leverage simple processors to offload portions of the MPI semantics, particularly the management of posted receive and unexpected message queues. Without adding cost or increasing clock frequency, using PIMs in the network interface can enhance performance. The results are a significant decrease in latency and an increase in small message bandwidth, particularly when long queues are present.
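
The queue management being offloaded has simple but latency-critical semantics: every incoming message header is matched in order against the posted-receive queue, honoring wildcards, before falling through to the unexpected-message queue. A host-side C++ sketch of that matching logic follows (our illustration of the semantics, not NIC firmware).

    // MPI posted-receive matching: first-posted-first-matched, with
    // wildcard source and tag. A nullptr result means the message would
    // be appended to the unexpected-message queue instead.
    #include <cstdio>
    #include <list>

    constexpr int ANY_SOURCE = -1, ANY_TAG = -1;

    struct PostedRecv { int src, tag; void* buffer; };
    struct Header     { int src, tag; };

    void* match(std::list<PostedRecv>& posted, const Header& h) {
        for (auto it = posted.begin(); it != posted.end(); ++it) {
            bool src_ok = (it->src == ANY_SOURCE) || (it->src == h.src);
            bool tag_ok = (it->tag == ANY_TAG)    || (it->tag == h.tag);
            if (src_ok && tag_ok) {
                void* buf = it->buffer;
                posted.erase(it);  // MPI requires in-order matching
                return buf;
            }
        }
        return nullptr;
    }

    int main() {
        char a[16], b[16];
        std::list<PostedRecv> posted = {{ANY_SOURCE, 7, a}, {3, ANY_TAG, b}};
        Header h{3, 7};
        void* buf = match(posted, h);
        std::printf("matched %s\n",
                    buf == a ? "first posted (wildcard source, tag 7)" : "second posted");
    }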

Computational stability study of 3D flow in a differentially heated 8:1:1 cavity

3rd M.I.T. Conference on Computational Fluid and Solid Mechanics

Salinger, Andrew G.

The critical Rayleigh number Ra_cr of the Hopf bifurcation that signals the limit of steady flows in a differentially heated 8:1:1 cavity is computed. The two-dimensional analog of this problem was the subject of a comprehensive set of benchmark calculations that included the estimation of Ra_cr [1]. In this work we begin to answer the question of whether the 2D results carry over to 3D models. For the case of the 2D model extruded to a depth of 1, with no-slip/no-penetration and adiabatic boundary conditions placed at the added end walls, the steady flow and destabilizing eigenvectors qualitatively match those from the 2D model. A mesh resolution study extending to a 20-million-unknown model shows that the presence of these walls raises the first critical Rayleigh number from 3.06 × 10^5 to 5.13 × 10^5. © 2005 Elsevier Ltd.
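
The quantity being computed can be stated compactly: linearizing the discretized equations about the steady state gives a generalized eigenvalue problem, and the Hopf point is where a complex-conjugate pair reaches the imaginary axis. The notation below is schematic, ours rather than the paper's.

    % J(Ra) is the Jacobian of the steady residual evaluated at the steady
    % state, M the mass matrix; the lambda are stability eigenvalues.
    J(\mathrm{Ra})\, x = \lambda\, M\, x, \qquad
    \text{Hopf at } \mathrm{Ra}_{cr}:\ \lambda = \pm i\omega,\ \operatorname{Re}\lambda = 0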

Reversible logic for supercomputing

2005 Computing Frontiers Conference

DeBenedictis, Erik

This paper is about making reversible logic a reality for supercomputing. Reversible logic offers a way to exceed certain basic limits on the performance of computers, yet a powerful case will have to be made to justify its substantial development expense. This paper explores the limits of current, irreversible logic for supercomputers, thus establishing a threshold above which reversible logic is the only solution. Problems above this threshold are discussed, with the science and mitigation of global warming treated in detail. To further develop the idea of using reversible logic in supercomputing, a design for a 1 Zettaflops supercomputer, as required for addressing global warming, is presented. Creating such a design, however, requires deviations both from mainstream climate-simulation software and from current research directions in reversible logic. These deviations provide direction on how to make reversible logic practical. Copyright 2005 ACM.

Considering the relative importance of network performance and network features

Proceedings of the International Conference on Parallel Processing

Lawry, William L.; Underwood, Keith

Latency and bandwidth are usually considered to be the dominant factors in parallel application performance; however, recent studies have indicated that support for independent progress in MPI can also have a significant impact on application performance. This paper leverages the Cplant system at Sandia National Labs to compare a faster, vendor-provided MPI library without independent progress to an internally developed MPI library that sacrifices some performance to provide independent progress. The results are surprising. Although some applications see significant negative impacts from the reduced network performance, others are more sensitive to the presence of independent progress. © 2005 IEEE.

An analysis of the double-precision floating-point FFT on FPGAs

Proceedings - 13th Annual IEEE Symposium on Field-Programmable Custom Computing Machines, FCCM 2005

Hemmert, Karl S.; Underwood, Keith

Advances in FPGA technology have led to dramatic improvements in double precision floating-point performance. Modern FPGAs boast several GigaFLOPs of raw computing power. Unfortunately, this computing power is distributed across 30 floating-point units with over 10 cycles of latency each. The user must find two orders of magnitude more parallelism than is typically exploited in a single microprocessor; thus, it is not clear that the computational power of FPGAs can be exploited across a wide range of algorithms. This paper explores three implementation alternatives for the Fast Fourier Transform (FFT) on FPGAs. The algorithms are compared in terms of sustained performance and memory requirements for various FFT sizes and FPGA sizes. The results indicate that FPGAs are competitive with microprocessors in terms of performance and that the "correct" FFT implementation varies based on the size of the transform and the size of the FPGA. © 2005 IEEE.

Perspectives on optimization under uncertainty: Algorithms and applications

Giunta, Anthony A.; Eldred, Michael S.; Swiler, Laura P.; Trucano, Timothy G.

This paper provides an overview of several approaches to formulating and solving optimization under uncertainty (OUU) engineering design problems. In addition, the topic of high-performance computing and OUU is addressed, with a discussion of the coarse- and fine-grained parallel computing opportunities in the various OUU problem formulations. The OUU approaches covered here are: sampling-based OUU, surrogate model-based OUU, analytic reliability-based OUU (also known as reliability-based design optimization), polynomial chaos-based OUU, and stochastic perturbation-based OUU.
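
Sampling-based OUU, the first formulation listed, is the easiest to make concrete: the objective handed to the optimizer is a statistic of the simulation response estimated by Monte Carlo over the uncertain parameters, and each sample is an independent run (the coarse-grained parallelism noted above). The C++ sketch below uses a placeholder model and a mean-plus-k-sigma statistic; all names and values are illustrative.

    // Inner loop of sampling-based OUU: estimate a robustness-weighted
    // statistic of the response f(d, xi) over the uncertain parameter xi.
    #include <cmath>
    #include <cstdio>
    #include <random>

    double model(double d, double xi) {  // placeholder simulation response
        return (d - 2.0) * (d - 2.0) + 0.5 * xi * d;
    }

    double ouu_objective(double d, int n_samples, double k) {
        std::mt19937 gen(42);
        std::normal_distribution<double> xi(0.0, 1.0);
        double sum = 0.0, sumsq = 0.0;
        for (int i = 0; i < n_samples; ++i) {  // embarrassingly parallel in practice
            double f = model(d, xi(gen));
            sum += f;
            sumsq += f * f;
        }
        double mean = sum / n_samples;
        double var = sumsq / n_samples - mean * mean;
        return mean + k * std::sqrt(var > 0.0 ? var : 0.0);  // mean + k*sigma
    }

    int main() {
        // An outer optimizer would minimize this; here we just tabulate it.
        for (double d = 1.0; d <= 3.0; d += 0.5)
            std::printf("d=%.1f  J(d)=%.4f\n", d, ouu_objective(d, 10000, 2.0));
    }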

A comparison of floating point and logarithmic number systems for FPGAs

Proceedings - 13th Annual IEEE Symposium on Field-Programmable Custom Computing Machines, FCCM 2005

Haselman, Michael; Beauchamp, Michael; Wood, Aaron; Hauck, Scott; Underwood, Keith; Hemmert, Karl S.

There have been many papers proposing the use of the logarithmic number system (LNS) as an alternative to floating point because of its simpler multiplication, division, and exponentiation computations [1,4-9,13]. However, this advantage comes at the cost of complicated, inexact addition and subtraction, as well as the need to convert between the formats. In this work, we created a parameterized LNS library of computational units and compared them to an existing floating point library. Specifically, we considered multiplication, division, addition, subtraction, and format conversion to determine when one format should be used over the other and when it is advantageous to change formats during a calculation. © 2005 IEEE.
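
The trade-off the paper quantifies is visible in a toy software model of the arithmetic: with values stored as base-2 logarithms, a multiply is a single add, while an add requires the nonlinear function log2(1 + 2^(-d)), which in FPGA hardware becomes a table lookup with interpolation. The C++ sketch below is our illustration (positive values only, sign handling omitted), not the paper's library.

    // Toy LNS arithmetic: each value is represented by L = log2(value).
    #include <cmath>
    #include <cstdio>

    double lns_mul(double La, double Lb) { return La + Lb; }  // one adder in hardware
    double lns_div(double La, double Lb) { return La - Lb; }

    double lns_add(double La, double Lb) {
        double hi = La > Lb ? La : Lb, lo = La > Lb ? Lb : La;
        // The costly part: log2(1 + 2^(lo - hi)) is a LUT + interpolation on an FPGA.
        return hi + std::log2(1.0 + std::exp2(lo - hi));
    }

    int main() {
        double La = std::log2(3.0), Lb = std::log2(5.0);
        std::printf("3*5  = %g\n", std::exp2(lns_mul(La, Lb)));               // 15
        std::printf("3+5  = %g\n", std::exp2(lns_add(La, Lb)));               // 8
        std::printf("15/5 = %g\n", std::exp2(lns_div(lns_mul(La, Lb), Lb)));  // 3
    }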

A comparison of Navier Stokes and network models to predict chemical transport in municipal water distribution systems

World Water Congress 2005: Impacts of Global Climate Change - Proceedings of the 2005 World Water and Environmental Resources Congress

Van Bloemen Waanders, B.; Hammond, G.; Shadid, John N.; Collis, S.; Murray, R.

We investigate the accuracy of chemical transport in network models for small geometric configurations. Network models have successfully simulated the general operation of large water distribution systems. However, some of the simplifying assumptions associated with their implementation may cause inaccuracies if chemicals need to be carefully characterized at a high level of detail. In particular, we are interested in precise transport behavior so that inversion and control problems can be applied to water distribution networks. As an initial phase, the Navier-Stokes equations combined with a convection-diffusion formulation were used to characterize the mixing behavior at a pipe intersection in two dimensions. Our numerical models predict that only on the order of 12-14% of the chemical mixes into the other inlet pipe. Laboratory results show similar behavior and suggest that even if our numerical model were able to resolve turbulence, the predicted mixing behavior might not improve. This conclusion may not hold for other sets of operating conditions, however, and we have therefore started to develop a 3D implementation. Preliminary results for a duct geometry are presented. © copyright ASCE 2005.

Molecular simulations of beta-amyloid protein near hydrated lipids (PECASE)

Thompson, Aidan P.

We performed molecular dynamics simulations of beta-amyloid (Aβ) protein and Aβ fragment (31-42) in bulk water and near hydrated lipids to study the mechanism of neurotoxicity associated with the aggregation of the protein. We constructed full atomistic models using Cerius2 and ran simulations using LAMMPS. MD simulations with different conformations and positions of the protein fragment were performed. Thermodynamic properties were compared with previous literature and the results were analyzed. Longer simulations and data analyses based on the free energy profiles along the distance between the protein and the interface are ongoing.
