Our aim is to determine the network of events, or the regulatory network, that defines an immune response to a bio-toxin. As a model system, we are studying the T cell regulatory network triggered by tyrosine kinase receptor activation, using a combination of pathway stimulation and time-series microarray experiments. Our approach is composed of five steps: (1) microarray experiments and data error analysis, (2) data clustering, (3) data smoothing and discretization, (4) network reverse engineering, and (5) network dynamics analysis and fingerprint identification. The technological outcome of this study is a suite of experimental protocols and computational tools that reverse engineer regulatory networks given gene expression data. The practical biological outcome of this work is an immune response fingerprint in terms of gene expression levels. Inferring regulatory networks from microarray data is a new field of investigation that is no more than five years old. To the best of our knowledge, this work is the first attempt to integrate experiments, error analyses, data clustering, inference, and network analysis to solve a practical problem. Our systematic approach of counting, enumerating, and sampling networks that match experimental data is new to the field of network reverse engineering. The resulting mathematical analyses and computational tools lead to new results on their own and should be useful to others who analyze and infer networks.
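To make steps (3) and (4) concrete, the sketch below shows one minimal way to discretize an expression time series and enumerate Boolean update rules consistent with the observed transitions; the gene names, thresholds, and data are hypothetical illustrations, not the report's actual pipeline.

```python
# Minimal sketch (not from the report): discretize a gene-expression time series
# and enumerate Boolean update rules consistent with the observed transitions.
# Gene names, threshold, and data values are hypothetical.
from itertools import product

def discretize(series, threshold):
    """Map each expression value to 0/1 by comparing against a threshold."""
    return [1 if x >= threshold else 0 for x in series]

def consistent_rules(states, target, regulators):
    """Enumerate truth tables for `target` (over `regulators`) that reproduce every
    observed state transition; the count indicates how strongly the data constrain
    the network."""
    n = len(regulators)
    rules = []
    for table in product([0, 1], repeat=2 ** n):
        ok = True
        for t in range(len(states) - 1):
            inputs = tuple(states[t][g] for g in regulators)
            idx = sum(b << i for i, b in enumerate(inputs))
            if table[idx] != states[t + 1][target]:
                ok = False
                break
        if ok:
            rules.append(table)
    return rules

# Example: three genes observed at four time points (hypothetical data).
raw = {"geneA": [0.2, 1.4, 1.6, 0.3],
       "geneB": [1.1, 1.3, 0.2, 0.1],
       "geneC": [0.1, 0.2, 1.5, 1.7]}
disc = {g: discretize(v, threshold=1.0) for g, v in raw.items()}
states = [{g: disc[g][t] for g in disc} for t in range(4)]
print(len(consistent_rules(states, target="geneC", regulators=("geneA", "geneB"))))
```

Counting or sampling over such consistent rule sets, gene by gene, is the flavor of the enumeration described above, though the report's actual methods operate at a much larger scale.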
The thermal challenge problem has been developed at Sandia National Laboratories as a testbed for demonstrating various types of validation approaches and prediction methods. This report discusses one particular methodology for assessing the validity of a computational model given experimental data. This methodology is based on Bayesian Belief Networks (BBNs) and can incorporate uncertainty in experimental measurements, in physical quantities, and in the model itself. The approach uses the prior and posterior distributions of model output to compute a validation metric based on Bayesian hypothesis testing (a Bayes factor). This report discusses various aspects of the BBN, specifically in the context of the thermal challenge problem. A BBN is developed for a given set of experimental data in a particular experimental configuration. The development of the BBN and the method for "solving" the BBN to obtain the posterior distribution of model output through Markov chain Monte Carlo sampling are discussed in detail. The use of the BBN to compute a Bayes factor is demonstrated.
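As a rough illustration of how prior and posterior distributions of model output can be turned into a Bayes factor, the sketch below estimates both predictive densities from Monte Carlo samples and takes their ratio at the measured value; the sample values and the kernel-density estimator are assumptions for illustration, not the report's implementation.

```python
# Minimal sketch (assumptions, not the report's implementation): given Monte Carlo
# samples of the model output drawn under the prior and under the posterior (e.g.,
# from MCMC over the BBN), estimate each predictive density with a kernel density
# estimate and form a Bayes factor as their ratio at the measured value.
import numpy as np
from scipy.stats import gaussian_kde

def bayes_factor(prior_output, posterior_output, measured):
    """Ratio of posterior to prior predictive density at the observation.
    Values well above 1 favor the hypothesis that the model reproduces the data."""
    prior_pdf = gaussian_kde(prior_output)
    post_pdf = gaussian_kde(posterior_output)
    return float((post_pdf(measured) / prior_pdf(measured))[0])

# Hypothetical samples of a predicted temperature at one sensor location.
rng = np.random.default_rng(0)
prior_samples = rng.normal(500.0, 40.0, size=5000)      # prior predictive
posterior_samples = rng.normal(520.0, 15.0, size=5000)  # posterior predictive
print(bayes_factor(prior_samples, posterior_samples, measured=525.0))
```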
The formation and functions of living materials and organisms are fundamentally different from those of synthetic materials and devices. Synthetic materials tend to have static structures and are not capable of adapting to the functional needs of changing environments. In contrast, living systems utilize energy to create, heal, reconfigure, and dismantle materials in a dynamic, non-equilibrium fashion. The overall goal of the project was to organize and reconfigure functional assemblies of nanoparticles using strategies that mimic those found in living systems. Active assembly of nanostructures was studied using active biomolecules to drive the organization and assembly of nanocomposite materials. In this system, kinesin motor proteins and microtubules were used to direct the transport and interactions of nanoparticles at synthetic interfaces. In addition, the kinesin/microtubule transport system was used to actively assemble nanocomposite materials capable of storing significant elastic energy. Novel biophysical measurement tools were also developed for measuring the collective force generated by kinesin motor proteins, which will provide insight into the mechanical constraints of active assembly processes. Responsive reconfiguration of nanostructures was studied using active biomolecules to modulate the optical properties of quantum dot (QD) arrays through changes in inter-particle spacing and the associated energy transfer interactions. Design rules for kinesin-based transport of a wide range of nanoscale cargo (e.g., nanocrystal quantum dots, micron-sized polymer spheres) were developed. Three-dimensional microtubule organizing centers were assembled in which the polar orientation of the microtubules was controlled by a multi-staged assembly process. Overall, a number of enabling technologies were developed over the course of this project, and these will drive the exploitation of energy-driven processes to regulate the assembly, disassembly, and dynamic reorganization of nanomaterials.
Understanding the properties and behavior of biomembranes is fundamental to many biological processes and technologies. Microdomains in biomembranes, or "lipid rafts," are now known to be an integral part of cell signaling, vesicle formation, fusion processes, protein trafficking, and viral and toxin infection processes. Understanding how microdomains form, how they depend on membrane constituents, and how they act not only has biological implications, but also will impact Sandia's effort in development of membranes that structurally adapt to their environment in a controlled manner. To provide such understanding, we created physically-based models of biomembranes. Molecular dynamics (MD) simulations and classical density functional theory (DFT) calculations using these models were applied to phenomena such as microdomain formation, membrane fusion, pattern formation, and protein insertion. Because lipid dynamics and self-organization in membranes occur on length and time scales beyond atomistic MD, we used coarse-grained models of double-tail lipid molecules that spontaneously self-assemble into bilayers. DFT provided equilibrium information on membrane structure. Experimental work was performed to further help elucidate the fundamental membrane organization principles.
This report documents research to develop robust and efficient solution techniques for solving large-scale systems of nonlinear equations. The most widely used method for solving systems of nonlinear equations is Newton's method. While much research has been devoted to augmenting Newton-based solvers (usually with globalization techniques), little has been devoted to exploring the application of different models. Our research has been directed at evaluating techniques that use different models from Newton's method: a lower order model, Broyden's method, and a higher order model, the tensor method. We have developed large-scale versions of each of these models and have demonstrated their use in important applications at Sandia. Broyden's method replaces the Jacobian with an approximation, allowing codes that cannot evaluate a Jacobian, or that have an inaccurate Jacobian, to converge to a solution. Limited-memory methods, which have been successful in optimization, allow us to extend this approach to large-scale problems. We compare the robustness and efficiency of Newton's method, modified Newton's method, the Jacobian-free Newton-Krylov method, and our limited-memory Broyden method. Comparisons are carried out for large-scale applications of fluid flow simulations and electronic circuit simulations. Results show that Broyden's method converged in some cases where the Jacobian was inaccurate or could not be computed and Newton's method failed to converge. We identify conditions under which Broyden's method can be more efficient than Newton's method. We also present modifications to a large-scale tensor method, originally proposed by Bouaricha, for greater efficiency, better robustness, and wider applicability. Tensor methods are an alternative to Newton-based methods that compute each step from a local quadratic model rather than a linear model. The advantage of Bouaricha's method is that it can use any existing linear solver, which makes it simple to write and easily portable. However, the method usually takes about twice as long as Newton-GMRES on general problems because it solves two linear systems at each iteration. We discuss modifications to Bouaricha's method for a practical implementation, including a special globalization technique and other modifications for greater efficiency. We present numerical results showing computational advantages over Newton-GMRES on some realistic problems. We further discuss a new approach for dealing with singular (or ill-conditioned) matrices. In particular, we modify an algorithm for identifying a turning point so that an increasingly ill-conditioned Jacobian does not prevent convergence.
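For orientation, the sketch below shows the basic secant idea behind Broyden's method in a small dense setting; the test system and starting point are hypothetical, and the limited-memory variant discussed above stores the rank-one update vectors instead of the dense matrix.

```python
# Minimal, illustrative sketch of Broyden's "good" update: the Jacobian is replaced
# by a secant approximation so no analytic Jacobian is needed. Dense version only;
# the large-scale, limited-memory variant avoids forming B explicitly.
import numpy as np

def broyden(F, x0, tol=1e-10, max_iter=50):
    x = np.asarray(x0, dtype=float)
    B = np.eye(len(x))                # initial Jacobian approximation
    f = F(x)
    for _ in range(max_iter):
        if np.linalg.norm(f) < tol:
            break
        s = np.linalg.solve(B, -f)    # quasi-Newton step: B s = -F(x)
        x_new = x + s
        f_new = F(x_new)
        y = f_new - f
        # Rank-one secant update chosen so that B_new s = y
        B += np.outer(y - B @ s, s) / (s @ s)
        x, f = x_new, f_new
    return x

# Hypothetical mildly nonlinear test system (identity is a good initial Jacobian).
F = lambda x: np.array([x[0] + 0.1 * x[1]**2 - 1.0,
                        x[1] + 0.1 * x[0]**2 - 2.0])
print(broyden(F, [0.0, 0.0]))
```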
This SAND report provides the technical progress through April 2005 of the Sandia-led project, "Carbon Sequestration in Synechococcus Sp.: From Molecular Machines to Hierarchical Modeling", funded by the DOE Office of Science Genomics:GTL Program. Understanding, predicting, and perhaps manipulating carbon fixation in the oceans has long been a major focus of biological oceanography and has more recently been of interest to a broader audience of scientists and policy makers. It is clear that the oceanic sinks and sources of CO₂ are important terms in the global environmental response to anthropogenic atmospheric inputs of CO₂ and that oceanic microorganisms play a key role in this response. However, the relationship between this global phenomenon and the biochemical mechanisms of carbon fixation in these microorganisms is poorly understood. In this project, we will investigate the carbon sequestration behavior of Synechococcus sp., an abundant marine cyanobacterium known to be important to environmental responses to carbon dioxide levels, through experimental and computational methods. This project is a combined experimental and computational effort with emphasis on developing and applying new computational tools and methods. Our experimental effort will provide the biology and data to drive the computational efforts and includes significant investment in developing new experimental methods for uncovering protein partners, characterizing protein complexes, and identifying new binding domains. We will also develop and apply new data measurement and statistical methods for analyzing microarray experiments. Computational tools will be essential to our efforts to discover and characterize the function of the molecular machines of Synechococcus. To this end, molecular simulation methods will be coupled with knowledge discovery from diverse biological data sets for high-throughput discovery and characterization of protein-protein complexes. In addition, we will develop a set of novel capabilities for inference of regulatory pathways in microbial genomes across multiple sources of information through the integration of computational and experimental technologies. These capabilities will be applied to Synechococcus regulatory pathways to characterize their interaction map and identify component proteins in these pathways. We will also investigate methods for combining experimental and computational results with visualization and natural language tools to accelerate discovery of regulatory pathways. The ultimate goal of this effort is to develop and apply the new experimental and computational methods needed to generate a new level of understanding of how the Synechococcus genome affects carbon fixation at the global scale. Anticipated experimental and computational methods will provide ever-increasing insight about the individual elements and steps in the carbon fixation process; however, relating an organism's genome to its cellular response in the presence of varying environments will require systems biology approaches. Thus, a primary goal for this effort is to integrate the genomic data generated from experiments and lower level simulations with data from the existing body of literature into a whole cell model. We plan to accomplish this by developing and applying a set of tools for capturing the carbon fixation behavior of Synechococcus at different levels of resolution.
Finally, the explosion of data being produced by high-throughput experiments requires data analysis methods and models that are more computationally complex and more heterogeneous, and that must be coupled to ever-increasing amounts of experimentally obtained data in varying formats. These challenges are unprecedented in high performance scientific computing and necessitate the development of a companion computational infrastructure to support this effort.
This report describes the test and evaluation (T&E) methods by which the Teraflops Operating System, or TOS, that resides on Sandia's massively parallel computer Janus is verified for production release. Also discussed are methods used to build TOS before testing and evaluation, miscellaneous utility scripts, a sample test plan, and a proposed post-test method for quickly examining the large number of test results. The purpose of the report is threefold: (1) to provide a guide to T&E procedures, (2) to aid and guide others who will run T&E procedures on the new ASCI Red Storm machine, and (3) to document some of the history of evaluation and testing of TOS. This report is not intended to serve as an exhaustive manual for testers to conduct T&E procedures.
Threats to water distribution systems include release of contaminants and Denial of Service (DoS) attacks. A better understanding, and validated computational models, of the flow in water distribution systems would enable determination of sensor placement in real water distribution networks, allow source identification, and guide mitigation/minimization efforts. Validation data are needed to evaluate numerical models of network operations. Some data can be acquired in real-world tests, but these are limited by (1) unknown demand, (2) lack of repeatability, (3) too many sources of uncertainty (demand, friction factors, etc.), and (4) expense. In addition, real-world tests have limited numbers of network access points. A scale-model water distribution system was fabricated, and validation data were acquired over a range of flow (demand) conditions. Standard operating variables included system layout, demand at various nodes in the system, and pressure drop across various pipe sections. In addition, the location of contaminant (salt or dye) introduction was varied. Measurements of pressure, flowrate, and concentration at a large number of points, and overall visualization of dye transport through the flow network, were completed. Scale-up issues that were incorporated in the experiment design include Reynolds number, pressure drop across nodes, and pipe friction and roughness. The scale was chosen to be 20:1, so the 10 inch main was modeled with a 0.5 inch pipe in the physical model. Controlled validation tracer tests were run to provide validation data for flow and transport models, especially of the degree of mixing at pipe junctions. Results of the pipe mixing experiments showed large deviations from predicted behavior, and these deviations have a large impact on standard network operations models.
This work focuses on different methods to generate confidence regions for nonlinear parameter identification problems. Three methods for confidence region estimation are considered: a linear approximation method, an F-test method, and a log-likelihood method. Each of these methods is applied to three case studies. One case study is a problem with synthetic data, and the other two case studies identify hydraulic parameters in groundwater flow problems based on experimental well-test results. The confidence regions for each case study are analyzed and compared. Although the F-test and log-likelihood methods result in similar regions, there are differences between these regions and the regions generated by the linear approximation method for nonlinear problems. The differing results, capabilities, and drawbacks of all three methods are discussed.
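For orientation, the standard textbook forms of the three regions are stated below; the exact definitions used in the report may differ in detail, and the notation here is assumed.

```latex
% S(\theta): residual sum of squares; \hat\theta: estimate; p: number of parameters;
% n: number of observations; J: Jacobian of the residuals at \hat\theta; \ell: log-likelihood.
\begin{align*}
\text{linear approximation:}\quad &
  (\theta-\hat\theta)^{\top} \big(J^{\top}J\big)\,(\theta-\hat\theta)
  \;\le\; p\, s^{2}\, F_{p,\,n-p}^{\,1-\alpha},
  \qquad s^{2} = \frac{S(\hat\theta)}{n-p},\\[4pt]
\text{F-test:}\quad &
  S(\theta) \;\le\; S(\hat\theta)\left[1 + \frac{p}{n-p}\, F_{p,\,n-p}^{\,1-\alpha}\right],\\[4pt]
\text{log-likelihood:}\quad &
  2\big[\ell(\hat\theta)-\ell(\theta)\big] \;\le\; \chi^{2}_{p,\,1-\alpha}.
\end{align*}
```

The first region is always an ellipsoid, which is why it can differ markedly from the other two when the model is strongly nonlinear in the parameters.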
Semantic graphs offer one promising avenue for intelligence analysis in homeland security. They provide a mechanism for describing a wide variety of relationships between entities of potential interest. The vertices are nouns of various types, e.g. people, organizations, events, etc. Edges in the graph represent different types of relationships between entities, e.g. 'is friends with', 'belongs-to', etc. Semantic graphs offer a number of potential advantages as a knowledge representation system. They allow information of different kinds, and collected in differing ways, to be combined in a seamless manner. A semantic graph is a very compressed representation of some of the relationship information. It has been reported that the semantic graph can be two orders of magnitude smaller than the processed intelligence data. This allows for much larger portions of the data universe to be resident in computer memory. Many intelligence queries that are relevant to the terrorist threat are naturally expressed in the language of semantic graphs. One example is the search for 'interesting' relationships between two individuals or between an individual and an event, which can be phrased as a search for short paths in the graph. Another example is the search for an analyst-specified threat pattern, which can be cast as an instance of subgraph isomorphism. It is important to note that many kinds of analysis are not relationship based, so these are not good candidates for semantic graphs. Thus, a semantic graph should always be used in conjunction with traditional knowledge representation and interface methods. Operations that involve looking for chains of relationships (e.g. friend of a friend) are not efficiently executable in a traditional relational database. However, the semantic graph can be thought of as a pre-join of the database, and it is ideally suited for these kinds of operations. Researchers at Sandia National Laboratories are working to facilitate semantic graph analysis. Since intelligence datasets can be extremely large, the focus of this work is on the use of parallel computers. We have been working to develop scalable parallel algorithms that will be at the core of a semantic graph analysis infrastructure. Our work has involved two different thrusts, corresponding to two different computer architectures. The first architecture of interest is distributed memory, message passing computers. These machines are ubiquitous and affordable, but they are challenging targets for graph algorithms. Much of our distributed-memory work to date has been collaborative with researchers at Lawrence Livermore National Laboratory and has focused on finding short paths on distributed memory parallel machines. Our implementation on 32K processors of BlueGene/Light finds shortest paths between two specified vertices in just over a second for random graphs with 4 billion vertices.
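The "short path" query mentioned above reduces, in the unweighted case, to breadth-first search; the serial sketch below illustrates the semantics on a toy adjacency list (entity and edge names are hypothetical), whereas the parallel BlueGene implementation described in the abstract is far more involved.

```python
# Minimal serial sketch: breadth-first search for a shortest relationship chain
# between two entities in a semantic graph stored as an adjacency list.
from collections import deque

def shortest_path(graph, source, target):
    """Return one shortest path from source to target, or None if disconnected."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if v == target:
            path = []
            while v is not None:
                path.append(v)
                v = parents[v]
            return list(reversed(path))
        for w in graph.get(v, ()):
            if w not in parents:
                parents[w] = v
                queue.append(w)
    return None

graph = {
    "person_A": ["org_X", "event_1"],
    "org_X": ["person_B"],
    "event_1": ["person_C"],
    "person_C": ["person_B"],
}
print(shortest_path(graph, "person_A", "person_B"))  # ['person_A', 'org_X', 'person_B']
```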
Electromagnetic induction is a classic geophysical exploration method designed for subsurface characterization--in particular, sensing the presence of geologic heterogeneities and fluids such as groundwater and hydrocarbons. Several approaches to the computational problems associated with predicting and interpreting electromagnetic phenomena in and around the earth are addressed herein. Publications resulting from the project include [31]. To obtain accurate and physically meaningful numerical simulations of natural phenomena, computational algorithms should operate in discrete settings that reflect the structure of governing mathematical models. In section 2, the extension of algebraic multigrid methods for the time domain eddy current equations to the frequency domain problem is discussed. Software was developed and is available in the Trilinos ML package. In section 3 we consider finite element approximations of De Rham's complex. We describe how to develop a family of finite element spaces that forms an exact sequence on hexahedral grids. The ensuing family of non-affine finite elements is called a van Welij complex, after the work [37] of van Welij, who first proposed a general method for developing tangentially and normally continuous vector fields on hexahedral elements. The use of this complex is illustrated for the eddy current equations and a conservation law problem. Software was developed and is available in the Ptenos finite element package. The more popular methods of geophysical inversion seek solutions to an unconstrained optimization problem by imposing stabilizing constraints in the form of smoothing operators on some enormous set of model parameters (i.e., "over-parametrize and regularize"). In contrast, we investigate an alternative approach whereby sharp jumps in material properties are preserved in the solution by choosing as model parameters a modest set of variables that describe an interface between adjacent regions in physical space. While still over-parametrized, this choice of model space contains far fewer parameters than before, thus easing, in some cases, the computational burden of the optimization problem. Most importantly, the associated finite element discretization is aligned with the abrupt changes in material properties associated with lithologic boundaries as well as the interface between buried cultural artifacts and the surrounding Earth. In section 4, algorithms and tools are described that associate a smooth interface surface to a given triangulation. In particular, the tools support surface refinement and coarsening. Section 5 describes some preliminary results on the application of interface identification methods to some model problems in geophysical inversion. Due to time constraints, the results described here use the GNU Triangulated Surface Library for the manipulation of surface meshes and the TetGen software library for the generation of tetrahedral meshes.
The peridynamic model was introduced by Silling in 1998. In this paper, we demonstrate the application of the quasistatic peridynamic model to two-dimensional, linear elastic, plane stress and plane strain problems, with special attention to the modeling of plain and reinforced concrete structures. We consider just one deviation from linearity--that which arises due to the irreversible, sudden breaking of bonds between particles. The peridynamic model starts with the assumption that Newton's second law holds on every infinitesimally small free body (or particle) within the domain of analysis. A specified force density function, called the pairwise force function (with units of force per unit volume per unit volume), between each pair of infinitesimally small particles is postulated to act if the particles are closer together than some finite distance, called the material horizon. The pairwise force function may be assumed to be a function of the relative position and the relative displacement between the two particles. In this paper, we assume that for two particles closer together than the material horizon the pairwise force function increases linearly with respect to the stretch, but at some specified stretch it is irreversibly reduced to zero.
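A common way to write the bond force just described is given below; the notation is assumed rather than quoted from the paper, with ξ the relative position of two particles in the reference configuration, η their relative displacement, δ the material horizon, c the bond stiffness, and s₀ the critical stretch.

```latex
\[
  s = \frac{\lvert \boldsymbol{\xi}+\boldsymbol{\eta}\rvert - \lvert\boldsymbol{\xi}\rvert}
           {\lvert\boldsymbol{\xi}\rvert},
  \qquad
  \mathbf{f}(\boldsymbol{\eta},\boldsymbol{\xi},t) =
  \begin{cases}
    c\, s\, \mu(t,\boldsymbol{\xi})\,
      \dfrac{\boldsymbol{\xi}+\boldsymbol{\eta}}{\lvert\boldsymbol{\xi}+\boldsymbol{\eta}\rvert},
      & \lvert\boldsymbol{\xi}\rvert \le \delta,\\[8pt]
    \mathbf{0}, & \lvert\boldsymbol{\xi}\rvert > \delta,
  \end{cases}
\]
% \mu(t,\xi) = 1 if s(t',\xi) < s_0 for all t' <= t, and \mu = 0 otherwise,
% so a bond that ever exceeds the critical stretch is irreversibly broken.
```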
Due to advances in CMOS fabrication technology, high performance computing capabilities have continually grown. More capable hardware has allowed a range of complex scientific applications to be developed. However, these applications now present a bottleneck to future performance. Entrenched 'legacy' codes - 'Dusty Decks' - demand that new hardware remain compatible with existing software. Additionally, conventional architectures face increasing challenges. Many of these challenges revolve around the growing disparity between processor and memory speed - the 'Memory Wall' - and difficulties scaling to large numbers of parallel processors. To a large extent, these limitations are inherent to the traditional computer architecture. As data is consumed more quickly, moving that data to the point of computation becomes more difficult. Barring any upward revision in the speed of light, this will continue to be a fundamental limitation on the speed of computation. This work focuses on solving these problems in the context of Light Weight Processing (LWP). LWP is an innovative technique which combines Processing-In-Memory, short vector computation, multithreading, and extended memory semantics. It applies these techniques to answer the questions 'What will a next-generation supercomputer look like?' and 'How will we program it?' To that end, this work presents four contributions: (1) an implementation of MPI which uses features of LWP to substantially improve message processing throughput; (2) a technique leveraging extended memory semantics to improve message passing by overlapping computation and communication; (3) an OpenMP library modified to allow efficient partitioning of threads between a conventional CPU and LWPs - greatly improving cost/performance; and (4) an algorithm to extract very small 'threadlets' which can overcome the inherent disadvantages of a simple processor pipeline.
In gas chromatography, a chemical sample separates into its constituent components as it travels along a long thin column. As the component chemicals exit the column they are detected and identified, allowing the chemical makeup of the sample to be determined. For correct identification of the component chemicals, the distribution of the concentration of each chemical along the length of the column must be nearly symmetric. The prediction and control of asymmetries in gas chromatography has been an active research area since the advent of the technique. In this paper, we develop from first principles a general model for isothermal linear chromatography. We use this model to develop closed-form expressions for terms related to the first, second, and third moments of the distribution of the concentration, which determine the velocity, diffusion rate, and asymmetry of the distribution. We show that for all practical experimental situations, only fronting peaks are predicted by this model, suggesting that a nonlinear chromatography model is required to predict tailing peaks. For situations where asymmetries arise, we analyze the rate at which the concentration distribution returns to a normal distribution. Numerical examples are also provided.
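For orientation, the standard moment definitions underlying these quantities are recalled below (these are generic definitions, not the paper's closed-form results): if c(x,t) is the concentration profile along the column at a fixed time,

```latex
\[
  \mu_1 = \frac{\int x\, c(x,t)\,dx}{\int c(x,t)\,dx}, \qquad
  \mu_n = \frac{\int (x-\mu_1)^n\, c(x,t)\,dx}{\int c(x,t)\,dx}\;\;(n\ge 2), \qquad
  \gamma = \frac{\mu_3}{\mu_2^{3/2}},
\]
```

so the first moment tracks the peak velocity, the second (the variance) its diffusive broadening, and the skewness γ its asymmetry, with the sign of γ distinguishing fronting from tailing peaks.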
The approximate solution of optimization and control problems for systems governed by linear, elliptic partial differential equations is considered. Such problems are most often solved using methods based on the application of the Lagrange multiplier rule followed by discretization through, e.g., a Galerkin finite element method. As an alternative, we show how least-squares finite element methods can be used for this purpose. Penalty-based formulations, another approach widely used in other settings, have not enjoyed the same level of popularity in the partial differential equation case, perhaps because naively defined penalty-based methods can have practical deficiencies. We use methodologies associated with modern least-squares finite element methods to develop and analyze practical penalty methods for the approximate solution of optimization problems for systems governed by linear, elliptic partial differential equations. We develop an abstract theory for such problems; along the way, we introduce several methods based on least-squares notions, and compare and contrast their properties.
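An abstract sketch of the kind of penalty formulation in question is given below; the notation (state equation Au = f + Bg, target state û, control g) is an assumption for illustration, not the paper's exact statement. The PDE constraint is not eliminated or enforced through a Lagrange multiplier but appended as a weighted residual term,

```latex
\[
  \min_{u,\,g}\; \mathcal{J}_\varepsilon(u,g)
  = \tfrac{1}{2}\,\lVert u - \hat u\rVert^{2}
  + \tfrac{\beta}{2}\,\lVert g\rVert^{2}
  + \tfrac{1}{2\varepsilon}\,\lVert Au - f - Bg\rVert^{2},
\]
```

where β > 0 is a regularization parameter and ε > 0 the penalty parameter; the least-squares machinery enters through the choice of norm for the residual term, which is what keeps a "practical" penalty method well behaved as ε becomes small.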
Under conditions that were predicted as 'safe' by well-established TCAD packages, radiation hardness can still be significantly degraded by a few lucky arsenic ions reaching the gate oxide during self-aligned CMOS source/drain ion implantation. The most likely explanation is that both oxide traps and interface traps are created when ions penetrate and damage the gate oxide after channeling or traveling along polysilicon grain boundaries during the implantation process.
Graph partitioning is often used for load balancing in parallel computing, but it is known that hypergraph partitioning has several advantages. First, hypergraphs more accurately model communication volume, and second, they are more expressive and can better represent nonsymmetric problems. Hypergraph partitioning is particularly suited to parallel sparse matrix-vector multiplication, a common kernel in scientific computing. We present a parallel software package for hypergraph (and sparse matrix) partitioning developed at Sandia National Labs. The algorithm is a variation on multilevel partitioning. Our parallel implementation is novel in that it uses a two-dimensional data distribution among processors. We present empirical results that show our parallel implementation achieves good speedup on several large problems (up to 33 million nonzeros) with up to 64 processors on a Linux cluster.
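The sense in which hypergraphs "accurately model communication volume" is captured by the standard connectivity-1 metric, stated below for context (this is the usual objective in the literature; the exact objectives supported by the package follow its documentation).

```latex
\[
  \mathrm{cut}(\Pi) \;=\; \sum_{e \in E} w(e)\,\bigl(\lambda_e(\Pi) - 1\bigr),
\]
% \Pi: a partition of the vertices; w(e): weight of hyperedge e;
% \lambda_e(\Pi): number of parts touched by hyperedge e.
```

For sparse matrix-vector multiplication with a one-dimensional partition, each hyperedge corresponds to a matrix row (or column), and this sum equals exactly the number of vector entries that must be communicated, which a plain graph cut only approximates.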
The latency and throughput of MPI messages are critically important to a range of parallel scientific applications. In many modern networks, both of these performance characteristics are largely driven by the performance of a processor on the network interface. Because of the semantics of MPI, this embedded processor is forced to traverse a linked list of posted receives each time a message is received. As this list grows long, the latency of message reception grows and the throughput of MPI messages decreases. This paper presents a novel hardware feature to handle list management functions on a network interface. By moving functions such as list insertion, list traversal, and list deletion to the hardware unit, latencies are decreased by up to 20% in the zero-length-queue case, with dramatic improvements in the presence of long queues. Similarly, the throughput is increased by up to 10% in the zero-length-queue case and by nearly 100% in the presence of queues of 30 messages.
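The sketch below illustrates, in plain Python, why reception cost grows with the posted-receive queue: each arriving message must be matched in order against posted receives, honoring wildcard source and tag. This mimics the MPI matching semantics only; it is not the hardware unit's implementation.

```python
# Illustrative sketch of MPI posted-receive matching semantics.
MPI_ANY_SOURCE = -1
MPI_ANY_TAG = -1

def match_posted(posted, src, tag):
    """Walk the posted-receive list in order and remove the first match.
    This linear traversal is the work the paper moves into hardware."""
    for i, (want_src, want_tag) in enumerate(posted):
        if want_src in (src, MPI_ANY_SOURCE) and want_tag in (tag, MPI_ANY_TAG):
            return posted.pop(i)
    return None  # no match: the message goes to the unexpected-message queue

posted = [(3, 7), (MPI_ANY_SOURCE, 5), (2, MPI_ANY_TAG)]
print(match_posted(posted, src=2, tag=5))   # matches the wildcard-source entry
print(posted)                               # remaining entries are still posted
```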
Proliferation of degrees-of-freedom has plagued discontinuous Galerkin methodology from its inception over 30 years ago. This paper develops a new computational formulation that combines the advantages of discontinuous Galerkin methods with the data structure of their continuous Galerkin counterparts. The new method uses local, element-wise problems to project a continuous finite element space into a given discontinuous space, and then applies a discontinuous Galerkin formulation. The projection leads to parameterization of the discontinuous degrees-of-freedom by their continuous counterparts and has a variational multiscale interpretation. This significantly reduces the computational burden while incurring little or no degradation of the solution. In fact, the new method produces improved solutions compared with the traditional discontinuous Galerkin method in some situations.
Controlled seeding of perturbations is employed to study the evolution of wire array z-pinch implosion instabilities which strongly impact x-ray production when the 3D plasma stagnates on axis. Wires modulated in radius exhibit locally enhanced magnetic field and imploding bubble formation at discontinuities in wire radius due to the perturbed current path. Wires coated with localized spectroscopic dopants are used to track turbulent material flow. Experiments and MHD modeling offer insight into the behavior of z-pinch instabilities.
The success of Lagrangian contact modeling leads one to believe that important aspects of this capability may be used for multi-material modeling when only a portion of the simulation can be represented in a Lagrangian frame. We review current experience with two dual mesh technologies where one of these meshes is a Lagrangian mesh and the other is an Arbitrary Lagrangian/Eulerian (ALE) mesh. These methods are cast in the framework of an operator-split ALE algorithm where a Lagrangian step is followed by a remesh/remap step. An interface-coupled methodology is considered first. This technique is applicable to problems involving contact between materials of dissimilar compliance. The technique models the more compliant (soft) material as ALE while the less compliant (hard) material and associated interface are modeled in a Lagrangian fashion. Loads are transferred between the hard and soft materials via explicit transient dynamics contact algorithms. The use of these contact algorithms removes the requirement of node-to-node matching at the soft-hard interface. In the context of the operator-split ALE algorithm, a single Lagrangian step is performed using a mesh-to-mesh contact algorithm. At the end of the Lagrangian step the meshes will be slightly offset at the interface but non-interpenetrating. The ALE mesh nodes at the interface are then remeshed to their initial location relative to the Lagrangian body faces, and the ALE mesh is smoothed, translated, and rotated to follow the Lagrangian body. Robust remeshing in the ALE region is required for success of this algorithm, and we describe current work in this area. The second method is an overlapping grid methodology that requires mapping of information between a Lagrangian mesh and an ALE mesh. The Lagrangian mesh describes a relatively hard body that interacts with softer material contained in the ALE mesh. A predicted solution for the velocity field is performed independently on both meshes. Element-centered velocity and momentum are transferred between the meshes using the volume transfer capability implemented in contact algorithms. Data from the ALE mesh is mapped to a phantom mesh that surrounds the Lagrangian mesh, providing for the reaction to the predicted motion of the Lagrangian material. Data from the Lagrangian mesh is mapped directly to the ALE mesh. A momentum balance is performed on both meshes to adjust the velocity field to account for the interaction of the material from the other mesh. Subsequent remeshing and remapping of the ALE mesh are performed to allow large deformation of the softer material. We overview current progress using this approach and discuss avenues for future research and development.
We provide a common framework for compatible discretizations using algebraic topology to guide our analysis. The main concept is the natural inner product on cochains, which induces a combinatorial Hodge theory. The framework comprises mutually consistent operations of differentiation and integration, includes a discrete Stokes theorem, and preserves the invariants of the de Rham cohomology groups. The latter allows for an elementary calculation of the kernel of the discrete Laplacian. Our framework provides an abstraction that includes examples of compatible finite element, finite volume, and finite difference methods. We describe how these methods result from the choice of a reconstruction operator and when they are equivalent.
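The discrete Stokes theorem referred to above can be stated in standard cochain notation (assumed here, not quoted from the paper): the coboundary operator is the adjoint of the boundary operator with respect to the duality pairing between cochains and chains,

```latex
\[
  \langle \mathsf{d}\,\omega,\; c \rangle \;=\; \langle \omega,\; \partial c \rangle
  \qquad \text{for every $k$-cochain } \omega \text{ and } (k{+}1)\text{-chain } c,
\]
```

which is the combinatorial counterpart of ∫_c dω = ∫_{∂c} ω. Combined with the natural inner product on cochains, it yields an adjoint derivative d* and the discrete Laplacian d*d + dd*, whose kernel is the object computed elementarily in the framework.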
The overlap of computation and communication has long been considered to be a significant performance benefit for applications. Similarly, the ability of the Message Passing Interface (MPI) to make independent progress (that is, to make progress on outstanding communication operations while not in the MPI library) is also believed to yield performance benefits. Using an intelligent network interface to offload the work required to support overlap and independent progress is thought to be an ideal solution, but the benefits of this approach have not been studied in depth at the application level. This lack of analysis is complicated by the fact that most MPI implementations do not sufficiently support overlap or independent progress. Recent work has demonstrated a quantifiable advantage for an MPI implementation that uses offload to provide overlap and independent progress. The study is conducted on two different platforms with each having two MPI implementations (one with and one without independent progress). Thus, identical network hardware and virtually identical software stacks are used. Furthermore, one platform, ASCI Red, allows further separation of features such as overlap and offload. Thus, this paper extends previous work by further qualifying the source of the performance advantage: offload, overlap, or independent progress.
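The overlap idiom being measured is the familiar nonblocking post-compute-wait pattern sketched below (using mpi4py purely for illustration); whether the communication actually progresses during the compute phase depends on the MPI implementation's support for independent progress, which is exactly the distinction the paper quantifies.

```python
# Minimal mpi4py sketch of communication/computation overlap (illustrative only).
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
peer = (rank + 1) % comm.Get_size()

send_buf = np.full(1_000_000, rank, dtype='d')
recv_buf = np.empty(1_000_000, dtype='d')

recv_req = comm.Irecv(recv_buf, source=peer)   # post the receive early
send_req = comm.Isend(send_buf, dest=peer)     # start the nonblocking send

local = np.sum(send_buf * 2.0)                 # overlapped local computation

MPI.Request.Waitall([send_req, recv_req])      # complete the communication
print(rank, local, recv_buf[0])
```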
Molecular dynamics calculations are performed to study the effect of deformation sequence and history on the inelastic behavior of copper interfaces on the nanoscale. An asymmetric 45 deg tilt bicrystal interface is examined, representing an idealized high-angle grain boundary interface. The interface model is subjected to three different deformation paths: tension then shear, shear then tension, and combined proportional tension and shear. Analysis shows that path-history dependent material behavior is confined within a finite layer of deformation around the bicrystal interface. The relationships between length scale and interface properties, such as the thickness of the path-history dependent layer and the interface strength, are discussed in detail.
A conforming representation composed of 2D finite elements and finite Fourier series is applied to 3D nonlinear non-ideal magnetohydrodynamics using a semi-implicit time-advance. The self-adjoint semi-implicit operator and variational approach to spatial discretization are synergistic and enable simulation in the extremely stiff conditions found in high temperature plasmas without sacrificing the geometric flexibility needed for modeling laboratory experiments. Growth rates for resistive tearing modes with experimentally relevant Lundquist number are computed accurately with time-steps that are large with respect to the global Alfvén time and moderate spatial resolution when the finite elements have basis functions of polynomial degree (p) two or larger. An error diffusion method controls the generation of magnetic divergence error. Convergence studies show that this approach is effective for continuous basis functions with p ≥ 2, where the number of test functions for the divergence control terms is less than the number of degrees of freedom in the expansion for vector fields. Anisotropic thermal conduction at realistic ratios of parallel to perpendicular conductivity (χ∥/χ⊥) is computed accurately with p ≥ 3 without mesh alignment. A simulation of tearing-mode evolution for a shaped toroidal tokamak equilibrium demonstrates the effectiveness of the algorithm in nonlinear conditions, and its results are used to verify the accuracy of the numerical anisotropic thermal conduction in 3D magnetic topologies.
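One common way to implement this kind of divergence control, stated here only for context (the coefficient and exact placement follow the paper), is to add a diffusive correction to the induction equation so that divergence errors obey a diffusion equation and decay rather than accumulate:

```latex
\[
  \frac{\partial \mathbf{B}}{\partial t}
  = -\nabla\times\mathbf{E} + \kappa_{\mathrm{div}b}\,\nabla\big(\nabla\cdot\mathbf{B}\big)
  \quad\Longrightarrow\quad
  \frac{\partial}{\partial t}\big(\nabla\cdot\mathbf{B}\big)
  = \kappa_{\mathrm{div}b}\,\nabla^{2}\big(\nabla\cdot\mathbf{B}\big),
\]
```

where the implication follows by taking the divergence of the modified induction equation and using the fact that the divergence of a curl vanishes.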
In this paper we present a two-level overlapping domain decomposition preconditioner for the finite-element discretization of elliptic problems in two and three dimensions. The computational domain is partitioned into overlapping subdomains, and a coarse space correction, based on aggregation techniques, is added. Our definition of the coarse space does not require the introduction of a coarse grid. We consider a set of assumptions on the coarse basis functions to bound the condition number of the resulting preconditioned system. These assumptions involve only geometrical quantities associated with the aggregates and the subdomains. We prove that the condition number using the two-level additive Schwarz preconditioner is O(H/δ + H₀/δ), where H and H₀ are the diameters of the subdomains and the aggregates, respectively, and δ is the overlap among the subdomains and the aggregates. This extends the bounds presented in [C. Lasser and A. Toselli, Convergence of some two-level overlapping domain decomposition preconditioners with smoothed aggregation coarse spaces, in Recent Developments in Domain Decomposition Methods, Lecture Notes in Comput. Sci. Engrg. 23, L. Pavarino and A. Toselli, eds., Springer-Verlag, Berlin, 2002, pp. 95-117; M. Sala, Domain Decomposition Preconditioners: Theoretical Properties, Application to the Compressible Euler Equations, Parallel Aspects, Ph.D. thesis, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland, 2003; M. Sala, Math. Model. Numer. Anal., 38 (2004), pp. 765-780]. Numerical experiments on a model problem are reported to illustrate the performance of the proposed preconditioner.
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors), including support for most popular parallel and serial computers; (2) improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques; (3) device models which are specifically tailored to meet Sandia's needs, including many radiation-aware devices; and (4) object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an "in-house" capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. As a result, Xyce is a unique electrical simulation capability, designed to meet the unique needs of the laboratory.
We apply density functional theory (DFT) and the DFT+U technique to study the adsorption of transition metal porphine molecules on atomistically flat Au(111) surfaces. DFT calculations using the Perdew-Burke-Ernzerhof exchange correlation functional correctly predict the palladium porphine (PdP) low-spin ground state. PdP is found to adsorb preferentially on gold in a flat geometry, not in an edgewise geometry, in qualitative agreement with experiments on substituted porphyrins. It exhibits no covalent bonding to Au(111), and the binding energy is a small fraction of an electronvolt. The DFT+U technique, parametrized to B3LYP-predicted spin state ordering of the Mn d-electrons, is found to be crucial for reproducing the correct magnetic moment and geometry of the isolated manganese porphine (MnP) molecule. Adsorption of Mn(II)P on Au(111) substantially alters the Mn ion spin state. Its interaction with the gold substrate is stronger and more site-specific than that of PdP. The binding can be partially reversed by applying an electric potential, which leads to significant changes in the electronic and magnetic properties of adsorbed MnP and 0.1 Å changes in the Mn-nitrogen distances within the porphine macrocycle. We conjecture that this DFT+U approach may be a useful general method for modeling first-row transition metal ion complexes in a condensed-matter setting.
This paper is about making reversible logic a reality for supercomputing. Reversible logic offers a way to exceed certain basic limits on the performance of computers, yet a powerful case will have to be made to justify its substantial development expense. This paper explores the limits of current, irreversible logic for supercomputers, thus forming a threshold above which reversible logic is the only solution. Problems above this threshold are discussed, with the science and mitigation of global warming being discussed in detail. To further develop the idea of using reversible logic in supercomputing, a design for a 1 Zettaflops supercomputer as required for addressing global climate warming is presented. However, to create such a design requires deviations from the mainstream of both the software for climate simulation and research directions of reversible logic. These deviations provide direction on how to make reversible logic practical.
In contrast to traditional terascale simulations that have known, fixed data inputs, dynamic data-driven (DDD) applications are characterized by unknown data and informed by dynamic observations. DDD simulations give rise to inverse problems of determining unknown data from sparse observations. The main difficulty is that the optimality system is a boundary value problem in 4D space-time, even though the forward simulation is an initial value problem. We construct special-purpose parallel multigrid algorithms that exploit the spectral structure of the inverse operator. Experiments on problems of localizing airborne contaminant release from sparse observations in a regional atmospheric transport model demonstrate that 17-million-parameter inversion can be effected at a cost of just 18 forward simulations with high parallel efficiency. On 1024 Alphaserver EV68 processors, the turnaround time is just 29 minutes. Moreover, inverse problems with 135 million parameters - corresponding to 139 billion total space-time unknowns - are solved in less than 5 hours on the same number of processors. These results suggest that ultra-high resolution data-driven inversion can be carried out sufficiently rapidly for simulation-based 'real-time' hazard assessment.
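A schematic statement of the inverse problem helps explain why the optimality system lives in space-time; the notation below is assumed for illustration rather than taken from the paper. The unknown initial contaminant field u₀ is recovered from sparse sensor readings dⱼ by solving

```latex
\[
  \min_{u_0}\;
  \frac{1}{2}\sum_{j}\int_0^T \big(\mathcal{B}_j u - d_j\big)^2\,dt
  \;+\; \frac{\beta}{2}\,\lVert u_0\rVert^2
  \quad\text{subject to}\quad
  \frac{\partial u}{\partial t} + \mathbf{v}\cdot\nabla u - \nu\,\Delta u = 0,
  \qquad u(\cdot,0) = u_0,
\]
```

where 𝓑ⱼ samples the concentration at sensor j. Writing the first-order optimality conditions couples this forward (initial value) equation to an adjoint equation that marches backward in time, which is why the optimality system is a boundary value problem in 4D space-time and why its spectral structure can be exploited by the special-purpose multigrid solver.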
We address the problem of partitioning and dynamic load balancing on clusters with heterogeneous hardware resources. We propose DRUM, a model that encapsulates hardware resources and their interconnection topology. DRUM provides monitoring facilities for dynamic evaluation of communication, memory, and processing capabilities. Heterogeneity is quantified by merging the information from the monitors to produce a scalar number called 'power.' This power allows DRUM to be used easily by existing load-balancing procedures such as those in the Zoltan Toolkit while placing minimal burden on application programmers. We demonstrate the use of DRUM to guide load balancing in the adaptive solution of a Laplace equation on a heterogeneous cluster. We observed a significant reduction in execution time compared to traditional methods.
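A schematic of how monitor data can be merged into a single per-node "power" is given below; the specific weights and terms are an assumption for illustration, not DRUM's precise formula.

```latex
\[
  p_n \;=\; w_{\mathrm{cpu}}\, p_n^{\mathrm{cpu}} \;+\; w_{\mathrm{comm}}\, p_n^{\mathrm{comm}},
  \qquad w_{\mathrm{cpu}} + w_{\mathrm{comm}} = 1,
\]
```

where p_n^cpu reflects measured processing capability (adjusted for current load) and p_n^comm reflects measured network capability; normalizing so that the powers sum to one, each node is then assigned that fraction of the total work by the underlying load-balancing procedure, which is what lets existing Zoltan partitioners consume the heterogeneity information unchanged.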