Publications

Increasing fault resiliency in a message-passing environment

Ferreira, Kurt; Oldfield, Ron A.; Stearley, Jon S.; Laros, James H.; Pedretti, Kevin T.T.; Brightwell, Ronald B.

Petaflops systems will have tens to hundreds of thousands of compute nodes, which increases the likelihood of faults. Applications use checkpoint/restart to recover from these faults, but even under ideal conditions, applications running on more than 30,000 nodes will likely spend more than half of their total run time saving checkpoints, restarting, and redoing work that was lost. We created a library that performs redundant computations on additional nodes allocated to the application. An active node and its redundant partner form a node bundle, which will only fail, and cause an application restart, when both nodes in the bundle fail. The goal of this library is to learn whether this can be done entirely at the user level, what requirements this library places on a Reliability, Availability, and Serviceability (RAS) system, and what impact it has on performance and run time. We find that our redundant MPI layer library imposes a relatively modest performance penalty for applications, but that it greatly reduces the number of application interrupts. This reduction in interrupts leads to huge savings in restart and rework time. For large-scale applications the savings compensate for the performance loss and the additional nodes required for redundant computations.
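
As a rough illustration of the node-bundle idea (a minimal sketch using mpi4py, not the authors' actual library), the code below splits MPI_COMM_WORLD into an active half and a replica half; each active rank and its partner form a bundle that runs the same application code independently:

```python
# Hedged sketch of user-level redundant MPI: split the world communicator
# into an active group and a replica group that execute identical code.
from mpi4py import MPI

world = MPI.COMM_WORLD
half = world.Get_size() // 2                    # assumes an even node count
is_active = world.Get_rank() < half
partner = world.Get_rank() + half if is_active else world.Get_rank() - half

# Each copy of the application sees only its own group, so active and
# replica ranks run the same code with the same application rank numbers.
app_comm = world.Split(color=0 if is_active else 1, key=world.Get_rank())

# Both copies perform the same communication pattern independently; a node
# bundle (active rank + its partner) fails only if both members fail.
local = app_comm.Get_rank() ** 2
total = app_comm.allreduce(local, op=MPI.SUM)   # identical in each copy
```

A full redundant MPI layer would also need to intercept point-to-point operations (for example, via the PMPI profiling interface) so that a surviving bundle member can transparently stand in for a failed partner.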

Graph algorithms in the Titan toolkit

McLendon, William C.

Graph algorithms are a key component in a wide variety of intelligence analysis activities. The Graph-Based Informatics for Non-Proliferation and Counter-Terrorism project addresses the critical need of making these graph algorithms accessible to Sandia analysts in a manner that is both intuitive and effective. Specifically we describe the design and implementation of an open source toolkit for doing graph analysis, informatics, and visualization that provides Sandia with novel analysis capability for non-proliferation and counter-terrorism.

Final report LDRD project 105816: model reduction of large dynamic systems with localized nonlinearities

Lehoucq, Richard B.; Dohrmann, Clark R.; Segalman, Daniel J.

Advanced computing hardware and software written to exploit massively parallel architectures greatly facilitate the computation of extremely large problems. On the other hand, these tools, though enabling higher fidelity models, have often resulted in much longer run times and turnaround times in providing answers to engineering problems. The impediments include smaller elements and consequently smaller time steps, much larger systems of equations to solve, and the inclusion of nonlinearities that had been ignored in days when lower fidelity models were the norm. The research effort reported here focuses on accelerating the analysis process for structural dynamics through combinations of model reduction and mitigation of some factors that lead to over-meshing.

Performance of a parallel algebraic multilevel preconditioner for stabilized finite element semiconductor device modeling

Journal of Computational Physics

Lin, Paul T.; Shadid, John N.; Sala, Marzio; Tuminaro, Raymond S.; Hennigan, Gary L.; Hoekstra, Robert J.

In this study results are presented for the large-scale parallel performance of an algebraic multilevel preconditioner for solution of the drift-diffusion model for semiconductor devices. The preconditioner is the key numerical procedure determining the robustness, efficiency and scalability of the fully-coupled Newton-Krylov based, nonlinear solution method that is employed for this system of equations. The coupled system comprises a source term dominated Poisson equation for the electric potential, and two convection-diffusion-reaction type equations for the electron and hole concentration. The governing PDEs are discretized in space by a stabilized finite element method. Solution of the discrete system is obtained through a fully-implicit time integrator, a fully-coupled Newton-based nonlinear solver, and a restarted GMRES Krylov linear system solver. The algebraic multilevel preconditioner is based on an aggressive coarsening graph partitioning of the nonzero block structure of the Jacobian matrix. Representative performance results are presented for various choices of multigrid V-cycles and W-cycles and parameter variations for smoothers based on incomplete factorizations. Parallel scalability results are presented for solution of up to 10^8 unknowns on 4096 processors of a Cray XT3/4 and an IBM POWER eServer system.
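
For reference, one common scaled form of the drift-diffusion system described above (the paper's exact scaling and notation may differ) is

```latex
\begin{aligned}
  -\nabla\cdot\left(\lambda^{2}\nabla\psi\right) &= p - n + C(\mathbf{x}),\\
  \frac{\partial n}{\partial t} - \nabla\cdot\left(D_{n}\nabla n - \mu_{n}\,n\,\nabla\psi\right) &= -R(n,p),\\
  \frac{\partial p}{\partial t} - \nabla\cdot\left(D_{p}\nabla p + \mu_{p}\,p\,\nabla\psi\right) &= -R(n,p),
\end{aligned}
```

where ψ is the electric potential, n and p are the electron and hole concentrations, C(x) is the doping profile (the dominant source term in the Poisson equation), and R(n,p) is the recombination rate.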

A comparison of Lagrangian/Eulerian approaches for tracking the kinematics of high deformation solid motion

Ames, Thomas L.; Robinson, Allen C.

The modeling of solids is most naturally placed within a Lagrangian framework because it requires constitutive models which depend on knowledge of the original material orientations and subsequent deformations. Detailed kinematic information is needed to ensure material frame indifference, which is captured through the deformation gradient F. Such information can be tracked easily in a Lagrangian code. Unfortunately, not all problems can be easily modeled using Lagrangian concepts due to severe distortions in the underlying motion. In such cases, either a Lagrangian/Eulerian or a pure Eulerian modeling framework must be introduced. We discuss and contrast several Lagrangian/Eulerian approaches for keeping track of the details of material kinematics.
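
In the notation above, the deformation gradient F relates current coordinates x to reference coordinates X. In an Eulerian frame F cannot be tracked directly and must instead be evolved; a standard statement of this kinematics (the paper's notation may differ) is

```latex
F = \frac{\partial \mathbf{x}}{\partial \mathbf{X}},
\qquad
\frac{\partial F}{\partial t} + (\mathbf{v}\cdot\nabla)F = L\,F,
\qquad
L = \nabla\mathbf{v},
```

so a Lagrangian/Eulerian or pure Eulerian code must advect and update F alongside the remaining state variables.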

Investigating methods of supporting dynamically linked executables on high performance computing platforms

Laros, James H.; Kelly, Suzanne M.; Levenhagen, Michael J.; Pedretti, Kevin T.T.

Shared libraries have become ubiquitous and are used to achieve great resource efficiencies on many platforms. The same properties that enable efficiencies on time-shared computers and convenience on small clusters prove to be great obstacles to scalability on large clusters and High Performance Computing platforms. In addition, lightweight operating systems such as Catamount have historically not supported the use of shared libraries specifically because they hinder scalability. In this report we outline the methods we investigated for supporting shared libraries on High Performance Computing platforms that use lightweight kernels. The considerations necessary to evaluate utility in this area are many and sometimes conflicting. While our initial path forward has been determined based on this evaluation, we consider this effort ongoing and remain prepared to re-evaluate any technology that might provide a scalable solution. This report is an evaluation of a range of possible methods of supporting dynamically linked executables on capability-class High Performance Computing platforms. Efforts are ongoing, and extensive testing at scale is necessary to evaluate performance. While performance is a critical driving factor, supporting whatever method is used in a production environment is an equally important and challenging task.

Improving performance via mini-applications

Doerfler, Douglas W.; Crozier, Paul C.; Edwards, Harold C.; Williams, Alan B.; Rajan, Mahesh R.; Keiter, Eric R.; Thornquist, Heidi K.

Application performance is determined by a combination of many choices: hardware platform, runtime environment, languages and compilers used, algorithm choice and implementation, and more. In this complicated environment, we find that the use of mini-applications - small self-contained proxies for real applications - is an excellent approach for rapidly exploring the parameter space of all these choices. Furthermore, use of mini-applications enriches the interaction between application, library and computer system developers by providing explicit functioning software and concrete performance results that lead to detailed, focused discussions of design trade-offs, algorithm choices and runtime performance issues. In this paper we discuss a collection of mini-applications and demonstrate how we use them to analyze and improve application performance on new and future computer platforms.

Efficient algorithms for mixed aleatory-epistemic uncertainty quantification with application to radiation-hardened electronics. Part I, algorithms and benchmark results

Eldred, Michael S.; Swiler, Laura P.

This report documents the results of an FY09 ASC V&V Methods level 2 milestone demonstrating new algorithmic capabilities for mixed aleatory-epistemic uncertainty quantification. Through the combination of stochastic expansions for computing aleatory statistics and interval optimization for computing epistemic bounds, mixed uncertainty analysis studies are shown to be more accurate and efficient than previously achievable. Part I of the report describes the algorithms and presents benchmark performance results. Part II applies these new algorithms to UQ analysis of radiation effects in electronic devices and circuits for the QASPR program.
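
Schematically, the combined approach nests an inner aleatory analysis, in which a stochastic (e.g., polynomial chaos) expansion yields statistics cheaply, inside an outer interval optimization over the epistemic variables; a minimal statement under these assumptions is

```latex
u(\boldsymbol{\xi};\mathbf{e}) \approx \sum_{j} \alpha_{j}(\mathbf{e})\,\Psi_{j}(\boldsymbol{\xi}),
\qquad
\underline{s} = \min_{\mathbf{e}\in E}\, s(\mathbf{e}),
\qquad
\overline{s} = \max_{\mathbf{e}\in E}\, s(\mathbf{e}),
```

where ξ are the aleatory random variables, e the epistemic variables ranging over the interval set E, and s(e) an aleatory statistic (e.g., a mean or failure probability) computed from the expansion.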

Quantitative resilience analysis through control design

Vugrin, Eric D.; Camphouse, Russell C.; Sunderland, Daniel S.

Critical infrastructure resilience has become a national priority for the U.S. Department of Homeland Security. System resilience has been studied for several decades in many different disciplines, but no standards or unifying methods exist for critical infrastructure resilience analysis. Few quantitative resilience methods exist, and those existing approaches tend to be rather simplistic and, hence, not capable of sufficiently assessing all aspects of critical infrastructure resilience. This report documents the results of a late-start Laboratory Directed Research and Development (LDRD) project that investigated the development of quantitative resilience analysis through application of control design methods. Specifically, we conducted a survey of infrastructure models to assess what types of control design might be applicable for critical infrastructure resilience assessment. As a result of this survey, we developed a decision process that directs the resilience analyst to the control method that is most likely applicable to the system under consideration. Furthermore, we developed optimal control strategies for two sets of representative infrastructure systems to demonstrate how control methods could be used to assess the resilience of the systems to catastrophic disruptions. We present recommendations for future work to continue the development of quantitative resilience analysis methods.
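
As an illustration of how control design can quantify resilience (a generic formulation, not necessarily the one used in the report), recovery from a disruption d can be posed as an optimal control problem

```latex
\min_{\mathbf{u}(\cdot)} \int_{0}^{T}
\left[\left(\mathbf{y}(t)-\mathbf{y}^{\ast}\right)^{\top} Q\,\left(\mathbf{y}(t)-\mathbf{y}^{\ast}\right)
+ \mathbf{u}(t)^{\top} R\,\mathbf{u}(t)\right] dt
\quad\text{s.t.}\quad
\dot{\mathbf{x}} = f(\mathbf{x},\mathbf{u},\mathbf{d}),\;\;
\mathbf{y} = g(\mathbf{x}),
```

where y* is the nominal system performance; the optimal cost then measures how quickly, and at what expense, performance can be restored.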

Toward improved branch prediction through data mining

Hemmert, Karl S.

Data mining and machine learning techniques can be applied to computer system design to aid in optimizing design decisions, improving system runtime performance. Here, data mining techniques have been investigated in the context of branch prediction. Specifically, the performance of traditional branch predictors has been compared to that of data mining algorithms. Additionally, the possibility that additional features available within the architectural state might further improve branch prediction has been evaluated. Results show that data mining techniques indicate potential for improved branch prediction, especially when register file contents are included as a feature set.
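
To make the framing concrete, the sketch below casts branch prediction as supervised classification over feature vectors built from branch history and register-file bits (hypothetical features and synthetic data, not the paper's exact setup):

```python
# Illustrative sketch: branch prediction as classification with a
# decision tree, using synthetic stand-ins for architectural features.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 10_000
history = rng.integers(0, 2, size=(n, 16))       # last 16 branch outcomes
registers = rng.integers(0, 256, size=(n, 4))    # low bits of 4 registers
X = np.hstack([history, registers])
y = history[:, -1] ^ (registers[:, 0] & 1)       # synthetic taken/not-taken

clf = DecisionTreeClassifier(max_depth=8).fit(X[:8000], y[:8000])
print(f"held-out accuracy: {clf.score(X[8000:], y[8000:]):.3f}")
```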

Scalable analysis tools for sensitivity analysis and UQ (3160) results

Ice, Lisa I.; Fabian, Nathan D.; Moreland, Kenneth D.; Bennett, Janine C.; Thompson, David C.; Karelitz, David B.

The 9/30/2009 ASC Level 2 Scalable Analysis Tools for Sensitivity Analysis and UQ (Milestone 3160) contains feature recognition capability required by the user community for certain verification and validation tasks focused around sensitivity analysis and uncertainty quantification (UQ). These feature recognition capabilities include crater detection, characterization, and analysis from CTH simulation data; the ability to call fragment and crater identification code from within a CTH simulation; and the ability to output fragments in a geometric format that includes data values over the fragments. The feature recognition capabilities were tested extensively on sample and actual simulations. In addition, a number of stretch criteria were met including the ability to visualize CTH tracer particles and the ability to visualize output from within an S3D simulation.

A fully implicit method for 3D quasi-steady state magnetic advection-diffusion

Siefert, Christopher S.; Robinson, Allen C.

We describe the implementation of a prototype fully implicit method for solving three-dimensional quasi-steady state magnetic advection-diffusion problems. This method allows us to solve the magnetic advection-diffusion equations in an Eulerian frame with a fixed, user-prescribed velocity field. We have verified the correctness of the method and its implementation on two standard verification problems: the Solberg-White magnetic shear problem and the Perry-Jones-White rotating cylinder problem.
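
The governing equation here is the magnetic induction equation with resistive diffusion, which in one standard form (the report's exact formulation may differ) reads

```latex
\frac{\partial \mathbf{B}}{\partial t}
= \nabla\times\left(\mathbf{v}\times\mathbf{B}\right)
- \nabla\times\left(\eta\,\nabla\times\mathbf{B}\right),
\qquad
\nabla\cdot\mathbf{B} = 0,
```

where B is the magnetic flux density, v the fixed, user-prescribed velocity field, and η the magnetic resistivity; the quasi-steady state problem concerns the balance between the advection and diffusion terms.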

Highly scalable linear solvers on thousands of processors

Siefert, Christopher S.; Tuminaro, Raymond S.; Domino, Stefan P.; Robinson, Allen C.

In this report we summarize research into new parallel algebraic multigrid (AMG) methods. We first provide an introduction to parallel AMG. We then discuss our research in parallel AMG algorithms for very large scale platforms. We detail significant improvements to a matrix-matrix multiplication kernel in the AMG setup phase. We present a smoothed aggregation AMG algorithm with fewer communication synchronization points, and discuss its links to domain decomposition methods. Finally, we discuss a multigrid smoothing technique that utilizes two message passing layers for use on multicore processors.
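
The matrix-matrix multiplication kernel mentioned above dominates the AMG setup cost because the coarse operator is formed as a Galerkin triple product; in smoothed aggregation the prolongator is a damped-Jacobi smoothing of a tentative, aggregate-based prolongator (standard formulation, shown here for context):

```latex
P = \left(I - \omega\,D^{-1}A\right) P_{\text{tent}},
\qquad
A_{c} = P^{\top} A\, P,
```

where D is the diagonal of A, ω a damping parameter, and P_tent maps the fine degrees of freedom in each aggregate to a single coarse degree of freedom.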

Neural assembly models derived through nano-scale measurements

Fan, Hongyou F.; Forsythe, James C.; Branda, Catherine B.; Warrender, Christina E.; Schiek, Richard S.

This report summarizes the accomplishments of a three-year project focused on developing technical capabilities for measuring and modeling neuronal processes at the nanoscale. It was successfully demonstrated that nanoprobes could be engineered that were biocompatible, could be biofunctionalized, and responded within the range of voltages typically associated with a neuronal action potential. Furthermore, the Xyce parallel circuit simulator was employed, with models incorporated for simulating the ion channel and cable properties of neuronal membranes. The ultimate objective of the project had been to employ nanoprobes in vivo, with the nematode C. elegans, and derive a simulation based on the resulting data. Techniques were developed allowing the nanoprobes to be injected into the nematode and the neuronal response recorded. To the authors' knowledge, this is the first occasion in which nanoparticles have been successfully employed as probes for recording neuronal response in an in vivo animal experimental protocol.
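
The "cable properties" simulated in Xyce refer to the standard neuronal cable equation, one common form of which (assumed here for illustration) is

```latex
c_{m}\,\frac{\partial V}{\partial t}
= \frac{a}{2R_{i}}\,\frac{\partial^{2} V}{\partial x^{2}}
- i_{\text{ion}}(V,t),
```

where V is the membrane potential, c_m the membrane capacitance per unit area, a the fiber radius, R_i the axial resistivity, and i_ion the ion channel current density (e.g., of Hodgkin-Huxley type).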
