Publications

Results 9451–9500 of 9,998

Accelerating list management for MPI

Hemmert, Karl S.; Rodrigues, Arun; Underwood, Keith

The latency and throughput of MPI messages are critically important to a range of parallel scientific applications. In many modern networks, both of these performance characteristics are largely driven by the performance of a processor on the network interface. Because of the semantics of MPI, this embedded processor is forced to traverse a linked list of posted receives each time a message is received. As this list grows long, the latency of message reception grows and the throughput of MPI messages decreases. This paper presents a novel hardware feature to handle list management functions on a network interface. By moving functions such as list insertion, list traversal, and list deletion to the hardware unit, latencies are decreased by up to 20% in the zero-length queue case, with dramatic improvements in the presence of long queues. Similarly, the throughput is increased by up to 10% in the zero-length queue case and by nearly 100% in the presence of queues of 30 messages.
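
The list traversal that the abstract above describes can be illustrated with a minimal software sketch. This is not the authors' hardware design: the names `PostedReceive`, `match_message`, and the wildcard constant `ANY` are hypothetical, and the matching rule (first posted receive whose source and tag match, with wildcards allowed) is a simplified assumption that ignores communicators.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

ANY = -1  # hypothetical wildcard, standing in for MPI_ANY_SOURCE / MPI_ANY_TAG


@dataclass
class PostedReceive:
    source: int
    tag: int


def match_message(
    queue: List[PostedReceive], source: int, tag: int
) -> Tuple[Optional[PostedReceive], int]:
    """Walk the posted-receive queue in posting order.

    Returns the first matching entry (removed from the queue) and the number
    of entries inspected, which is the cost that grows with queue length.
    """
    for inspected, recv in enumerate(queue, start=1):
        source_ok = recv.source in (ANY, source)
        tag_ok = recv.tag in (ANY, tag)
        if source_ok and tag_ok:
            queue.pop(inspected - 1)  # list deletion
            return recv, inspected
    return None, len(queue)  # no match: the message would be unexpected


# 30 posted receives that do not match, followed by one that does.
queue = [PostedReceive(source=0, tag=t) for t in range(30)]
queue.append(PostedReceive(source=ANY, tag=99))  # list insertion
matched, steps = match_message(queue, source=5, tag=99)
```

In this sketch the incoming message is matched only after every earlier entry has been inspected, which is the per-message cost that the abstract proposes to move into a hardware unit.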

A multiscale discontinuous Galerkin method with the computational structure of a continuous Galerkin method

Scovazzi, Guglielmo S.; Bochev, Pavel B.

Proliferation of degrees-of-freedom has plagued discontinuous Galerkin methodology from its inception over 30 years ago. This paper develops a new computational formulation that combines the advantages of discontinuous Galerkin methods with the data structure of their continuous Galerkin counterparts. The new method uses local, element-wise problems to project a continuous finite element space into a given discontinuous space, and then applies a discontinuous Galerkin formulation. The projection leads to parameterization of the discontinuous degrees-of-freedom by their continuous counterparts and has a variational multiscale interpretation. This significantly reduces the computational burden with little or no degradation of the solution. In fact, the new method produces improved solutions compared with the traditional discontinuous Galerkin method in some situations.

Seeded perturbations in wire array Z-Pinches

Jones, Brent M.; Deeney, Christopher D.; Mckenney, John M.; Garasi, Christopher J.; Mehlhorn, Thomas A.; Robinson, Allen C.; Coverdale, Christine A.

Controlled seeding of perturbations is employed to study the evolution of wire array z-pinch implosion instabilities which strongly impact x-ray production when the 3D plasma stagnates on axis. Wires modulated in radius exhibit locally enhanced magnetic field and imploding bubble formation at discontinuities in wire radius due to the perturbed current path. Wires coated with localized spectroscopic dopants are used to track turbulent material flow. Experiments and MHD modeling offer insight into the behavior of z-pinch instabilities.

Coupled Mesh Lagrangian/ALE modeling: opportunities and challenges

Bishop, Joseph E.; Hensinger, David M.; Voth, Thomas E.; Wong, Michael K.; Robinson, Allen C.

The success of Lagrangian contact modeling leads one to believe that important aspects of this capability may be used for multi-material modeling when only a portion of the simulation can be represented in a Lagrangian frame. We review current experience with two dual mesh technologies where one of these meshes is a Lagrangian mesh and the other is an Arbitrary Lagrangian/Eulerian (ALE) mesh. These methods are cast in the framework of an operator-split ALE algorithm where a Lagrangian step is followed by a remesh/remap step. An interface-coupled methodology is considered first. This technique is applicable to problems involving contact between materials of dissimilar compliance. The technique models the more compliant (soft) material as ALE while the less compliant (hard) material and associated interface are modeled in a Lagrangian fashion. Loads are transferred between the hard and soft materials via explicit transient dynamics contact algorithms. The use of these contact algorithms removes the requirement of node-to-node matching at the soft-hard interface. In the context of the operator-split ALE algorithm, a single Lagrangian step is performed using a mesh-to-mesh contact algorithm. At the end of the Lagrangian step the meshes will be slightly offset at the interface but non-interpenetrating. The ALE mesh nodes at the interface are then remeshed to their initial location relative to the Lagrangian body faces and the ALE mesh is smoothed, translated and rotated to follow the Lagrangian body. Robust remeshing in the ALE region is required for success of this algorithm, and we describe current work in this area. The second method is an overlapping grid methodology that requires mapping of information between a Lagrangian mesh and an ALE mesh. The Lagrangian mesh describes a relatively hard body that interacts with softer material contained in the ALE mesh. A predicted solution for the velocity field is performed independently on both meshes. Element-centered velocity and momentum are transferred between the meshes using the volume transfer capability implemented in contact algorithms. Data from the ALE mesh is mapped to a phantom mesh that surrounds the Lagrangian mesh, providing for the reaction to the predicted motion of the Lagrangian material. Data from the Lagrangian mesh is mapped directly to the ALE mesh. A momentum balance is performed on both meshes to adjust the velocity field to account for the interaction of the material from the other mesh. Subsequently, remeshing and remapping of the ALE mesh is performed to allow large deformation of the softer material. We give an overview of current progress using this approach and discuss avenues for future research and development.

Applications of algebraic topology to compatible spatial discretizations

Bochev, Pavel B.

We provide a common framework for compatible discretizations using algebraic topology to guide our analysis. The main concept is the natural inner product on cochains, which induces a combinatorial Hodge theory. The framework consists of mutually consistent operations of differentiation and integration, has a discrete Stokes theorem, and preserves the invariants of the de Rham cohomology groups. The latter allows for an elementary calculation of the kernel of the discrete Laplacian. Our framework provides an abstraction that includes examples of compatible finite element, finite volume and finite difference methods. We describe how these methods result from the choice of a reconstruction operator and when they are equivalent.
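
The idea of mutually consistent differentiation and integration on cochains can be made concrete on the smallest possible mesh. The sketch below is an illustration only, not the paper's framework: the single-triangle mesh, the edge orientations, and the names `D0`, `D1`, and `apply` are all assumptions chosen for the example.

```python
# One triangle with nodes 0, 1, 2, oriented edges e0=(0,1), e1=(1,2), e2=(0,2),
# and one face oriented 0 -> 1 -> 2 -> 0.

# Coboundary from node cochains to edge cochains (a discrete gradient):
# row i evaluates (value at head node) - (value at tail node) for edge i.
D0 = [
    [-1, 1, 0],   # e0: 0 -> 1
    [0, -1, 1],   # e1: 1 -> 2
    [-1, 0, 1],   # e2: 0 -> 2
]

# Coboundary from edge cochains to face cochains (a discrete curl):
# the oriented boundary of the face is e0 + e1 - e2.
D1 = [[1, 1, -1]]


def apply(matrix, cochain):
    """Apply a coboundary matrix to a cochain stored as a list of values."""
    return [sum(a * c for a, c in zip(row, cochain)) for row in matrix]


node_values = [3.0, 7.0, 4.0]          # a 0-cochain (values at nodes)
edge_values = apply(D0, node_values)   # its coboundary, a 1-cochain
face_values = apply(D1, edge_values)   # coboundary of a coboundary

# Discrete Stokes theorem for an arbitrary 1-cochain c: evaluating D1 c on the
# face equals summing c over the oriented boundary edges of the face.
c = [2.0, 5.0, 1.0]
circulation = c[0] + c[1] - c[2]
```

Here `face_values` vanishes because applying the coboundary twice gives zero, the discrete counterpart of curl(grad) = 0, and `apply(D1, c)` reproduces the boundary sum because the coboundary is defined as the dual of the boundary operator.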

Analyzing the impact of overlap, offload, and independent progress for MPI

Proposed for publication in the International Journal of High Performance Computing Applications.

Brightwell, Ronald B.; Riesen, Rolf; Underwood, Keith

The overlap of computation and communication has long been considered to be a significant performance benefit for applications. Similarly, the ability of the Message Passing Interface (MPI) to make independent progress (that is, to make progress on outstanding communication operations while not in the MPI library) is also believed to yield performance benefits. Using an intelligent network interface to offload the work required to support overlap and independent progress is thought to be an ideal solution, but the benefits of this approach have not been studied in depth at the application level. This lack of analysis is complicated by the fact that most MPI implementations do not sufficiently support overlap or independent progress. Recent work has demonstrated a quantifiable advantage for an MPI implementation that uses offload to provide overlap and independent progress. The study is conducted on two different platforms with each having two MPI implementations (one with and one without independent progress). Thus, identical network hardware and virtually identical software stacks are used. Furthermore, one platform, ASCI Red, allows further separation of features such as overlap and offload. Thus, this paper extends previous work by further qualifying the source of the performance advantage: offload, overlap, or independent progress.

Effect of deformation path sequence on the behavior of nanoscale copper bicrystal interfaces

Proposed for publication in the Journal of Engineering Materials and Technology.

Plimpton, Steven J.

Molecular dynamics calculations are performed to study the effect of deformation sequence and history on the inelastic behavior of copper interfaces on the nanoscale. An asymmetric 45 deg tilt bicrystal interface is examined, representing an idealized high-angle grain boundary interface. The interface model is subjected to three different deformation paths: tension then shear, shear then tension, and combined proportional tension and shear. Analysis shows that path-history dependent material behavior is confined within a finite layer of deformation around the bicrystal interface. The relationships between length scale and interface properties, such as the thickness of the path-history dependent layer and the interface strength, are discussed in detail.

Nonlinear magnetohydrodynamics simulation using high-order finite elements

Proposed for publication in the Journal of Computational Physics.

Plimpton, Steven J.

A conforming representation composed of 2D finite elements and finite Fourier series is applied to 3D nonlinear non-ideal magnetohydrodynamics using a semi-implicit time-advance. The self-adjoint semi-implicit operator and variational approach to spatial discretization are synergistic and enable simulation in the extremely stiff conditions found in high temperature plasmas without sacrificing the geometric flexibility needed for modeling laboratory experiments. Growth rates for resistive tearing modes with experimentally relevant Lundquist number are computed accurately with time-steps that are large with respect to the global Alfvén time and moderate spatial resolution when the finite elements have basis functions of polynomial degree (p) two or larger. An error diffusion method controls the generation of magnetic divergence error. Convergence studies show that this approach is effective for continuous basis functions with p ≥ 2, where the number of test functions for the divergence control terms is less than the number of degrees of freedom in the expansion for vector fields. Anisotropic thermal conduction at realistic ratios of parallel to perpendicular conductivity (χ∥/χ⊥) is computed accurately with p ≥ 3 without mesh alignment. A simulation of tearing-mode evolution for a shaped toroidal tokamak equilibrium demonstrates the effectiveness of the algorithm in nonlinear conditions, and its results are used to verify the accuracy of the numerical anisotropic thermal conduction in 3D magnetic topologies.

An improved convergence bound for aggregation-based domain decomposition preconditioners

Proposed for publication in the SIAM Journal on Matrix Analysis and Applications.

Sala, Marzio S.; Shadid, John N.; Tuminaro, Raymond S.

In this paper we present a two-level overlapping domain decomposition preconditioner for the finite-element discretization of elliptic problems in two and three dimensions. The computational domain is partitioned into overlapping subdomains, and a coarse space correction, based on aggregation techniques, is added. Our definition of the coarse space does not require the introduction of a coarse grid. We consider a set of assumptions on the coarse basis functions to bound the condition number of the resulting preconditioned system. These assumptions involve only geometrical quantities associated with the aggregates and the subdomains. We prove that the condition number using the two-level additive Schwarz preconditioner is O(H/δ + H₀/δ), where H and H₀ are the diameters of the subdomains and the aggregates, respectively, and δ is the overlap among the subdomains and the aggregates. This extends the bounds presented in [C. Lasser and A. Toselli, Convergence of some two-level overlapping domain decomposition preconditioners with smoothed aggregation coarse spaces, in Recent Developments in Domain Decomposition Methods, Lecture Notes in Comput. Sci. Engrg. 23, L. Pavarino and A. Toselli, eds., Springer-Verlag, Berlin, 2002, pp. 95-117; M. Sala, Domain Decomposition Preconditioners: Theoretical Properties, Application to the Compressible Euler Equations, Parallel Aspects, Ph.D. thesis, Ecole Polytechnique Federale de Lausanne, Lausanne, Switzerland, 2003; M. Sala, Math. Model. Numer. Anal., 38 (2004), pp. 765-780]. Numerical experiments on a model problem are reported to illustrate the performance of the proposed preconditioner.

Xyce Parallel Electronic Simulator - Users' Guide Version 2.1

Hutchinson, Scott A.; Keiter, Eric R.; Hoekstra, Robert J.; Russo, Thomas V.; Rankin, Eric R.; Pawlowski, Roger P.; Fixel, Deborah A.; Schiek, Richard S.; Bogdan, Carolyn W.

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers. (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) Device models which are specifically tailored to meet Sandia's needs, including many radiation-aware devices. (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an "in-house" capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. As a result, Xyce is a unique electrical simulation capability, designed to meet the unique needs of the laboratory. Acknowledgements: The authors would like to acknowledge the entire Sandia National Laboratories HPEMS (High Performance Electrical Modeling and Simulation) team, including Steve Wix, Carolyn Bogdan, Regina Schells, Ken Marx, Steve Brandon and Bill Ballard, for their support on this project. We also appreciate very much the work of Jim Emery, Becky Arnold and Mike Williamson for the help in reviewing this document. Lastly, a very special thanks to Hue Lai for typesetting this document with LaTeX. Trademarks: The information herein is subject to change without notice. Copyright © 2002-2003 Sandia Corporation. All rights reserved. Xyce™ Electronic Simulator and Xyce™ are trademarks of Sandia Corporation. Orcad, Orcad Capture, PSpice and Probe are registered trademarks of Cadence Design Systems, Inc. Silicon Graphics, the Silicon Graphics logo and IRIX are registered trademarks of Silicon Graphics, Inc. Microsoft, Windows and Windows 2000 are registered trademarks of Microsoft Corporation. Solaris and UltraSPARC are registered trademarks of Sun Microsystems Corporation. Medici, DaVinci and Taurus are registered trademarks of Synopsys Corporation. HP and Alpha are registered trademarks of Hewlett-Packard company. Amtec and TecPlot are trademarks of Amtec Engineering, Inc. Xyce's expression library is based on that inside Spice 3F5 developed by the EECS Department at the University of California. All other trademarks are property of their respective owners. Contacts: Bug Reports: http://tvrusso.sandia.gov/bugzilla; Email: xyce-support@sandia.gov; World Wide Web: http://www.cs.sandia.gov/xyce

Density functional theory study of transition metal porphine adsorption on gold surface and electric field induced conformation changes

Proposed for publication in the Journal of the American Chemical Society.

Rempe, Susan R.; Schultz, Peter A.; Chandross, M.

We apply density functional theory (DFT) and the DFT+U technique to study the adsorption of transition metal porphine molecules on atomistically flat Au(111) surfaces. DFT calculations using the Perdew-Burke-Ernzerhof exchange-correlation functional correctly predict the palladium porphine (PdP) low-spin ground state. PdP is found to adsorb preferentially on gold in a flat geometry, not in an edgewise geometry, in qualitative agreement with experiments on substituted porphyrins. It exhibits no covalent bonding to Au(111), and the binding energy is a small fraction of an electronvolt. The DFT+U technique, parametrized to B3LYP-predicted spin state ordering of the Mn d-electrons, is found to be crucial for reproducing the correct magnetic moment and geometry of the isolated manganese porphine (MnP) molecule. Adsorption of Mn(II)P on Au(111) substantially alters the Mn ion spin state. Its interaction with the gold substrate is stronger and more site-specific than that of PdP. The binding can be partially reversed by applying an electric potential, which leads to significant changes in the electronic and magnetic properties of adsorbed MnP and 0.1 Å changes in the Mn-nitrogen distances within the porphine macrocycle. We conjecture that this DFT+U approach may be a useful general method for modeling first-row transition metal ion complexes in a condensed-matter setting.

Reversible logic for supercomputing

DeBenedictis, Erik

This paper is about making reversible logic a reality for supercomputing. Reversible logic offers a way to exceed certain basic limits on the performance of computers, yet a powerful case will have to be made to justify its substantial development expense. This paper explores the limits of current, irreversible logic for supercomputers, thus forming a threshold above which reversible logic is the only solution. Problems above this threshold are discussed, with the science and mitigation of global warming being discussed in detail. To further develop the idea of using reversible logic in supercomputing, a design for a 1 Zettaflops supercomputer as required for addressing global climate warming is presented. However, to create such a design requires deviations from the mainstream of both the software for climate simulation and research directions of reversible logic. These deviations provide direction on how to make reversible logic practical.

Dynamic data-driven inversion for terascale simulations: real-time identification of airborne contaminants

Draganescu, Andrei I.

In contrast to traditional terascale simulations that have known, fixed data inputs, dynamic data-driven (DDD) applications are characterized by unknown data and informed by dynamic observations. DDD simulations give rise to inverse problems of determining unknown data from sparse observations. The main difficulty is that the optimality system is a boundary value problem in 4D space-time, even though the forward simulation is an initial value problem. We construct special-purpose parallel multigrid algorithms that exploit the spectral structure of the inverse operator. Experiments on problems of localizing airborne contaminant release from sparse observations in a regional atmospheric transport model demonstrate that 17-million-parameter inversion can be effected at a cost of just 18 forward simulations with high parallel efficiency. On 1024 AlphaServer EV68 processors, the turnaround time is just 29 minutes. Moreover, inverse problems with 135 million parameters, corresponding to 139 billion total space-time unknowns, are solved in less than 5 hours on the same number of processors. These results suggest that ultra-high resolution data-driven inversion can be carried out sufficiently rapidly for simulation-based 'real-time' hazard assessment.

A model for resource-aware load balancing on heterogeneous clusters

Proposed for publication in the IEEE Transactions on Parallel and Distributed Systems.

Devine, Karen D.

We address the problem of partitioning and dynamic load balancing on clusters with heterogeneous hardware resources. We propose DRUM, a model that encapsulates hardware resources and their interconnection topology. DRUM provides monitoring facilities for dynamic evaluation of communication, memory, and processing capabilities. Heterogeneity is quantified by merging the information from the monitors to produce a scalar number called 'power.' This power allows DRUM to be used easily by existing load-balancing procedures such as those in the Zoltan Toolkit while placing minimal burden on application programmers. We demonstrate the use of DRUM to guide load balancing in the adaptive solution of a Laplace equation on a heterogeneous cluster. We observed a significant reduction in execution time compared to traditional methods.
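
The step of merging monitor data into a scalar 'power' and handing it to a load balancer can be sketched as follows. The abstract does not give the formula DRUM uses, so the weighted sum below, the parameter `comm_weight`, the function names, and all node readings are assumptions made purely for illustration.

```python
from typing import Dict, List


def node_power(cpu_score: float, comm_score: float, comm_weight: float) -> float:
    """Merge two monitor readings into one scalar (assumed weighted sum)."""
    return (1.0 - comm_weight) * cpu_score + comm_weight * comm_score


def work_shares(powers: List[float]) -> List[float]:
    """Normalize powers so they can serve as target partition sizes."""
    total = sum(powers)
    return [p / total for p in powers]


# Hypothetical monitor readings for a three-node heterogeneous cluster.
nodes: List[Dict[str, float]] = [
    {"cpu": 1.0, "comm": 1.0},
    {"cpu": 2.0, "comm": 1.0},
    {"cpu": 4.0, "comm": 2.0},
]
powers = [node_power(n["cpu"], n["comm"], comm_weight=0.5) for n in nodes]
shares = work_shares(powers)
```

A partitioner that accepts per-process target sizes could then use `shares` in place of uniform weights, which is why a single scalar per node places little burden on the application.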
