Publications

Results 9476–9500 of 9,998

Applications of algebraic topology to compatible spatial discretizations

Bochev, Pavel B.

We provide a common framework for compatible discretizations using algebraic topology to guide our analysis. The main concept is the natural inner product on cochains, which induces a combinatorial Hodge theory. The framework comprises mutually consistent operations of differentiation and integration, has a discrete Stokes theorem, and preserves the invariants of the de Rham cohomology groups. The latter allows for an elementary calculation of the kernel of the discrete Laplacian. Our framework provides an abstraction that includes examples of compatible finite element, finite volume, and finite difference methods. We describe how these methods result from the choice of a reconstruction operator and when they are equivalent.
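As a rough illustration of what "mutually consistent differentiation and integration" and a "discrete Stokes theorem" can mean on cochains, here is a minimal sketch. It is not the authors' framework: the two-triangle mesh, the orientation choices, and the names `d0`, `d1`, `edges`, and `faces` are assumptions made only for this example.

```python
# Toy complex: two triangles (0,1,2) and (0,2,3) sharing the edge (0,2).
# Each edge (a, b) is oriented from vertex a to vertex b.
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (0, 3)]

# Each face is a list of (edge index, sign) pairs describing its oriented boundary.
# Triangle (0,1,2): 0->1, 1->2, 2->0  =>  +e0 +e1 -e2
# Triangle (0,2,3): 0->2, 2->3, 3->0  =>  +e2 +e3 -e4
faces = [
    [(0, +1), (1, +1), (2, -1)],
    [(2, +1), (3, +1), (4, -1)],
]


def d0(vertex_values):
    """Coboundary on 0-cochains: difference of vertex values along each edge."""
    return [vertex_values[b] - vertex_values[a] for a, b in edges]


def d1(edge_values):
    """Coboundary on 1-cochains: signed sum of edge values around each face."""
    return [sum(sign * edge_values[e] for e, sign in face) for face in faces]


# Exactness: applying the coboundary twice gives zero (discrete "curl grad = 0").
f = [3, -1, 4, 10]
assert d1(d0(f)) == [0, 0]

# Discrete Stokes: summing d1(w) over both faces equals the circulation of w
# along the outer boundary 0->1->2->3->0; the shared edge e2 cancels.
w = [1, 2, 5, -3, 4]
interior_sum = sum(d1(w))
boundary_circulation = w[0] + w[1] + w[3] - w[4]
assert interior_sum == boundary_circulation
```

Both identities hold here purely because of how the boundary signs are arranged, independent of any metric; the inner product on cochains mentioned in the abstract would enter only when defining adjoint operators.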

Analyzing the impact of overlap, offload, and independent progress for MPI

Proposed for publication in the International Journal of High Performance Computing Applications.

Brightwell, Ronald B.; Riesen, Rolf; Underwood, Keith

The overlap of computation and communication has long been considered to be a significant performance benefit for applications. Similarly, the ability of the Message Passing Interface (MPI) to make independent progress (that is, to make progress on outstanding communication operations while not in the MPI library) is also believed to yield performance benefits. Using an intelligent network interface to offload the work required to support overlap and independent progress is thought to be an ideal solution, but the benefits of this approach have not been studied in depth at the application level. This lack of analysis is complicated by the fact that most MPI implementations do not sufficiently support overlap or independent progress. Recent work has demonstrated a quantifiable advantage for an MPI implementation that uses offload to provide overlap and independent progress. The study is conducted on two different platforms with each having two MPI implementations (one with and one without independent progress). Thus, identical network hardware and virtually identical software stacks are used. Furthermore, one platform, ASCI Red, allows further separation of features such as overlap and offload. Thus, this paper extends previous work by further qualifying the source of the performance advantage: offload, overlap, or independent progress.

Effect of deformation path sequence on the behavior of nanoscale copper bicrystal interfaces

Proposed for publication in the Journal of Engineering Materials and Technology.

Plimpton, Steven J.

Molecular dynamics calculations are performed to study the effect of deformation sequence and history on the inelastic behavior of copper interfaces on the nanoscale. An asymmetric 45 deg tilt bicrystal interface is examined, representing an idealized high-angle grain boundary interface. The interface model is subjected to three different deformation paths: tension then shear, shear then tension, and combined proportional tension and shear. Analysis shows that path-history dependent material behavior is confined within a finite layer of deformation around the bicrystal interface. The relationships between length scale and interface properties, such as the thickness of the path-history dependent layer and the interface strength, are discussed in detail.

Nonlinear magnetohydrodynamics simulation using high-order finite elements

Proposed for publication in the Journal of Computational Physics.

Plimpton, Steven J.

A conforming representation composed of 2D finite elements and finite Fourier series is applied to 3D nonlinear non-ideal magnetohydrodynamics using a semi-implicit time-advance. The self-adjoint semi-implicit operator and variational approach to spatial discretization are synergistic and enable simulation in the extremely stiff conditions found in high temperature plasmas without sacrificing the geometric flexibility needed for modeling laboratory experiments. Growth rates for resistive tearing modes with experimentally relevant Lundquist number are computed accurately with time-steps that are large with respect to the global Alfvén time and moderate spatial resolution when the finite elements have basis functions of polynomial degree (p) two or larger. An error diffusion method controls the generation of magnetic divergence error. Convergence studies show that this approach is effective for continuous basis functions with p ≥ 2, where the number of test functions for the divergence control terms is less than the number of degrees of freedom in the expansion for vector fields. Anisotropic thermal conduction at realistic ratios of parallel to perpendicular conductivity (χ∥/χ⊥) is computed accurately with p ≥ 3 without mesh alignment. A simulation of tearing-mode evolution for a shaped toroidal tokamak equilibrium demonstrates the effectiveness of the algorithm in nonlinear conditions, and its results are used to verify the accuracy of the numerical anisotropic thermal conduction in 3D magnetic topologies.

An improved convergence bound for aggregation-based domain decomposition preconditioners

Proposed for publication in the SIAM Journal on Matrix Analysis and Applications.

Sala, Marzio S.; Shadid, John N.; Tuminaro, Raymond S.

In this paper we present a two-level overlapping domain decomposition preconditioner for the finite-element discretization of elliptic problems in two and three dimensions. The computational domain is partitioned into overlapping subdomains, and a coarse space correction, based on aggregation techniques, is added. Our definition of the coarse space does not require the introduction of a coarse grid. We consider a set of assumptions on the coarse basis functions to bound the condition number of the resulting preconditioned system. These assumptions involve only geometrical quantities associated with the aggregates and the subdomains. We prove that the condition number using the two-level additive Schwarz preconditioner is O(H/δ + H_0/δ), where H and H_0 are the diameters of the subdomains and the aggregates, respectively, and δ is the overlap among the subdomains and the aggregates. This extends the bounds presented in [C. Lasser and A. Toselli, Convergence of some two-level overlapping domain decomposition preconditioners with smoothed aggregation coarse spaces, in Recent Developments in Domain Decomposition Methods, Lecture Notes in Comput. Sci. Engrg. 23, L. Pavarino and A. Toselli, eds., Springer-Verlag, Berlin, 2002, pp. 95-117; M. Sala, Domain Decomposition Preconditioners: Theoretical Properties, Application to the Compressible Euler Equations, Parallel Aspects, Ph.D. thesis, Ecole Polytechnique Federale de Lausanne, Lausanne, Switzerland, 2003; M. Sala, Math. Model. Numer. Anal., 38 (2004), pp. 765-780]. Numerical experiments on a model problem are reported to illustrate the performance of the proposed preconditioner.

Xyce Parallel Electronic Simulator - Users' Guide Version 2.1

Hutchinson, Scott A.; Keiter, Eric R.; Hoekstra, Robert J.; Russo, Thomas V.; Rankin, Eric R.; Pawlowski, Roger P.; Fixel, Deborah A.; Schiek, Richard S.; Bogdan, Carolyn W.

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas:

• Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers.
• Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques.
• Device models which are specifically tailored to meet Sandia's needs, including many radiation-aware devices.
• Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future.

Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.

The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an "in-house" capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. As a result, Xyce is a unique electrical simulation capability, designed to meet the unique needs of the laboratory.

Acknowledgements

The authors would like to acknowledge the entire Sandia National Laboratories HPEMS (High Performance Electrical Modeling and Simulation) team, including Steve Wix, Carolyn Bogdan, Regina Schells, Ken Marx, Steve Brandon and Bill Ballard, for their support on this project. We also appreciate very much the work of Jim Emery, Becky Arnold and Mike Williamson for the help in reviewing this document. Lastly, a very special thanks to Hue Lai for typesetting this document with LaTeX.

Trademarks

The information herein is subject to change without notice. Copyright © 2002-2003 Sandia Corporation. All rights reserved.

Xyce™ Electronic Simulator and Xyce™ are trademarks of Sandia Corporation. Orcad, Orcad Capture, PSpice and Probe are registered trademarks of Cadence Design Systems, Inc. Silicon Graphics, the Silicon Graphics logo and IRIX are registered trademarks of Silicon Graphics, Inc. Microsoft, Windows and Windows 2000 are registered trademarks of Microsoft Corporation. Solaris and UltraSPARC are registered trademarks of Sun Microsystems Corporation. Medici, DaVinci and Taurus are registered trademarks of Synopsys Corporation. HP and Alpha are registered trademarks of Hewlett-Packard company. Amtec and TecPlot are trademarks of Amtec Engineering, Inc. Xyce's expression library is based on that inside Spice 3F5 developed by the EECS Department at the University of California. All other trademarks are property of their respective owners.

Contacts

Bug Reports: http://tvrusso.sandia.gov/bugzilla
Email: xyce-support@sandia.gov
World Wide Web: http://www.cs.sandia.gov/xyce

Density functional theory study of transition metal porphine adsorption on gold surface and electric field induced conformation changes

Proposed for publication in the Journal of the American Chemical Society.

Rempe, Susan R.; Schultz, Peter A.; Chandross, M.

We apply density functional theory (DFT) and the DFT+U technique to study the adsorption of transition metal porphine molecules on atomistically flat Au(111) surfaces. DFT calculations using the Perdew–Burke–Ernzerhof exchange correlation functional correctly predict the palladium porphine (PdP) low-spin ground state. PdP is found to adsorb preferentially on gold in a flat geometry, not in an edgewise geometry, in qualitative agreement with experiments on substituted porphyrins. It exhibits no covalent bonding to Au(111), and the binding energy is a small fraction of an electronvolt. The DFT+U technique, parametrized to B3LYP-predicted spin state ordering of the Mn d-electrons, is found to be crucial for reproducing the correct magnetic moment and geometry of the isolated manganese porphine (MnP) molecule. Adsorption of Mn(II)P on Au(111) substantially alters the Mn ion spin state. Its interaction with the gold substrate is stronger and more site-specific than that of PdP. The binding can be partially reversed by applying an electric potential, which leads to significant changes in the electronic and magnetic properties of adsorbed MnP and 0.1 Å changes in the Mn-nitrogen distances within the porphine macrocycle. We conjecture that this DFT+U approach may be a useful general method for modeling first-row transition metal ion complexes in a condensed-matter setting.

Reversible logic for supercomputing

DeBenedictis, Erik

This paper is about making reversible logic a reality for supercomputing. Reversible logic offers a way to exceed certain basic limits on the performance of computers, yet a powerful case will have to be made to justify its substantial development expense. This paper explores the limits of current, irreversible logic for supercomputers, thus forming a threshold above which reversible logic is the only solution. Problems above this threshold are discussed, and the science and mitigation of global warming are treated in detail. To further develop the idea of using reversible logic in supercomputing, a design for a 1 Zettaflops supercomputer, as required for addressing global climate warming, is presented. However, creating such a design requires deviations from the mainstream of both the software for climate simulation and the research directions of reversible logic. These deviations provide direction on how to make reversible logic practical.

Dynamic data-driven inversion for terascale simulations: real-time identification of airborne contaminants

Draganescu, Andrei I.

In contrast to traditional terascale simulations that have known, fixed data inputs, dynamic data-driven (DDD) applications are characterized by unknown data and informed by dynamic observations. DDD simulations give rise to inverse problems of determining unknown data from sparse observations. The main difficulty is that the optimality system is a boundary value problem in 4D space-time, even though the forward simulation is an initial value problem. We construct special-purpose parallel multigrid algorithms that exploit the spectral structure of the inverse operator. Experiments on problems of localizing airborne contaminant release from sparse observations in a regional atmospheric transport model demonstrate that 17-million-parameter inversion can be effected at a cost of just 18 forward simulations with high parallel efficiency. On 1024 Alphaserver EV68 processors, the turnaround time is just 29 minutes. Moreover, inverse problems with 135 million parameters - corresponding to 139 billion total space-time unknowns - are solved in less than 5 hours on the same number of processors. These results suggest that ultra-high resolution data-driven inversion can be carried out sufficiently rapidly for simulation-based 'real-time' hazard assessment.

A model for resource-aware load balancing on heterogeneous clusters

Proposed for publication in the IEEE Transactions on Parallel and Distributed Systems.

Devine, Karen D.

We address the problem of partitioning and dynamic load balancing on clusters with heterogeneous hardware resources. We propose DRUM, a model that encapsulates hardware resources and their interconnection topology. DRUM provides monitoring facilities for dynamic evaluation of communication, memory, and processing capabilities. Heterogeneity is quantified by merging the information from the monitors to produce a scalar number called 'power.' This power allows DRUM to be used easily by existing load-balancing procedures such as those in the Zoltan Toolkit while placing minimal burden on application programmers. We demonstrate the use of DRUM to guide load balancing in the adaptive solution of a Laplace equation on a heterogeneous cluster. We observed a significant reduction in execution time compared to traditional methods.
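To make the role of a scalar "power" concrete, the following is a minimal sketch of how a load balancer might turn per-node power values into partition sizes. It is not DRUM's or Zoltan's API: the function name `partition_sizes`, the example power values, and the largest-remainder rounding rule are all assumptions chosen for illustration.

```python
def partition_sizes(total_items, powers):
    """Split total_items among nodes in proportion to their 'power' values.

    Hypothetical helper: fractional shares are floored, then leftover items go
    to the nodes with the largest fractional remainders.
    """
    total_power = sum(powers)
    if total_power <= 0:
        raise ValueError("at least one node must have positive power")

    # Ideal (possibly fractional) share for each node.
    ideal = [total_items * p / total_power for p in powers]
    sizes = [int(share) for share in ideal]

    # Hand out the items lost to flooring, largest remainder first.
    leftover = total_items - sum(sizes)
    by_remainder = sorted(
        range(len(powers)), key=lambda i: ideal[i] - sizes[i], reverse=True
    )
    for i in by_remainder[:leftover]:
        sizes[i] += 1
    return sizes


# Three nodes, the third assumed twice as capable as the others.
sizes = partition_sizes(100, [1, 1, 2])
assert sizes == [25, 25, 50]
```

Because the balancer sees only one number per node, an existing partitioner that already accepts target partition sizes would need no knowledge of how communication, memory, and processing measurements were combined.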
