We present Poblano v1.0, a Matlab toolbox for solving unconstrained optimization problems using gradient-based methods. Poblano implements three optimization methods (nonlinear conjugate gradients, limited-memory BFGS, and truncated Newton) that require only first-order derivative information. In this paper, we describe the Poblano methods, provide numerous examples of how to use Poblano, and present results of applying Poblano to problems from a standard test collection of unconstrained optimization problems.
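Poblano's Matlab interface is not reproduced here; as an illustrative analogue of the gradient-only interface that all three methods share, the following Python sketch minimizes a standard test function with a limited-memory BFGS routine that, like Poblano's methods, needs only the function value and its first-order gradient.

```python
# Illustrative analogue only: Poblano is a Matlab toolbox; this sketch uses
# SciPy's limited-memory BFGS to show the "function value + gradient" interface
# shared by the NCG, L-BFGS, and truncated Newton methods described above.
import numpy as np
from scipy.optimize import minimize

def f(x):
    # Rosenbrock function: a standard unconstrained test problem.
    return 100.0 * (x[1] - x[0]**2)**2 + (1.0 - x[0])**2

def grad_f(x):
    # Analytic first-order derivative information (all that is required).
    return np.array([
        -400.0 * x[0] * (x[1] - x[0]**2) - 2.0 * (1.0 - x[0]),
         200.0 * (x[1] - x[0]**2),
    ])

result = minimize(f, x0=np.array([-1.2, 1.0]), jac=grad_f, method="L-BFGS-B")
print(result.x, result.fun)   # converges to the minimizer (1, 1)
```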
The difficulty of calculating the ambient properties of molecular crystals, such as the explosive PETN, has long hampered much-needed computational investigations of these materials. One reason for this shortcoming is that the exchange-correlation functionals available for density functional theory (DFT) calculations do not correctly describe the weak intermolecular van der Waals forces present in molecular crystals. However, this weak interaction also poses other challenges for the computational schemes used. We will discuss these issues in the context of calculations of the lattice constants and structure of PETN with a number of different functionals, and also discuss whether these limitations can be circumvented for studies at non-ambient conditions.
This presentation discusses the following topics: (1) Red Sky Background; (2) 3D Torus Interconnect Concepts; (3) Difficulties of a Torus in IB; (4) New Routing Code for a 3D Torus in IB; (5) Red Sky 3D Torus Implementation; and (6) Managing a Large IB Machine. Computing at Sandia falls into two categories: (1) capability computing - designed for scaling of single large runs and usually proprietary for maximum performance; Red Storm is Sandia's current capability machine; and (2) capacity computing - computing for the masses, with hundreds of jobs and hundreds of users, requiring extreme reliability and flexibility for a changing workload; Thunderbird will be decommissioned this quarter, Red Sky is our future capacity computing platform, and Red Mesa is a machine for the National Renewable Energy Laboratory. Red Sky's main themes are: (1) cheaper - 5X the capacity of Thunderbird at 2/3 the cost, and substantially cheaper per flop than our last large capacity machine purchase; (2) leaner - lower operational costs, three security environments via a modular fabric, expandable, upgradeable, and extensible, and designed for a 6-year life cycle; and (3) greener - 15% less power (1/6th the power per flop), 40% less water (5 million gallons saved annually), 10X better cooling efficiency, and a 4X denser footprint.
Constructing high-fidelity control pulses that are robust to control and system/environment fluctuations is a crucial objective for quantum information processing (QIP). We combine dynamical decoupling (DD) with optimal control (OC) to identify control pulses that achieve this objective numerically. Previous DD work has shown that general errors up to (but not including) third order can be removed from π- and π/2-pulses without concatenation. By systematically integrating DD and OC, we are able to increase pulse fidelity beyond this limit. Our hybrid method of quantum control incorporates a newly developed algorithm for robust OC, providing a nested DD-OC approach to generate robust controls. Motivated by solid-state QIP, we also incorporate relevant experimental constraints into this DD-OC formalism. To demonstrate the advantage of our approach, the resulting quantum controls are compared to previous DD results in open and uncertain model systems.
Alkali nitrate eutectic mixtures are finding application as industrial heat transfer fluids in concentrated solar power generation systems. An important property for such applications is the melting point, or phase coexistence temperature. We have computed melting points for lithium, sodium, and potassium nitrate from molecular dynamics simulations using a recently developed method based on thermodynamic integration, which computes the free energy difference between the solid and liquid phases. The computed melting point for NaNO3 was within 15 K of its experimental value, while for LiNO3 and KNO3, the computed melting points were within 100 K of the experimental values [4]. We are currently extending the approach to calculate melting temperatures for binary mixtures of lithium and sodium nitrate.
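The thermodynamic-integration step can be sketched as follows (hypothetical numbers only; the actual study obtains the coupling-parameter averages from molecular dynamics runs): the free energy difference is estimated by numerically integrating the ensemble average of the derivative of the potential energy with respect to the coupling parameter lambda.

```python
# Minimal sketch of thermodynamic integration, assuming ensemble averages
# <dU/dlambda> have already been computed from MD runs at several lambda values.
# The lambda grid and averages below are illustrative placeholders, not data
# from the nitrate simulations described above.
import numpy as np

lam = np.linspace(0.0, 1.0, 11)          # coupling parameter between the two phases
dU_dlam = np.array([12.1, 10.8, 9.7, 8.9, 8.2, 7.8, 7.5, 7.3, 7.2, 7.15, 7.1])

# Trapezoidal-rule estimate of the free energy difference (same units as dU/dlambda).
delta_F = np.sum(0.5 * (dU_dlam[1:] + dU_dlam[:-1]) * np.diff(lam))
print(f"Estimated free energy difference: {delta_F:.3f}")
```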
This document provides common best practices for the efficient utilization of parallel file systems for analysts and application developers. A multi-program, parallel supercomputer provides effective compute power by aggregating a host of lower-power processors using a network. In general, one either constructs the application to distribute parts of the work to the different nodes and processors available and then collects the results (a parallel application), or one launches a large number of small jobs, each doing similar work on different subsets of the data (a campaign). The I/O system on these machines is usually implemented as a tightly coupled, parallel application itself, which presents the concept of a 'file' to the host applications: an addressable store of bytes whose address space is global in nature. Beyond the simple reality that the I/O system is normally built from a smaller, less capable collection of hardware, that global address space will cause problems if not carefully used. How severe the problems are and the ways in which they manifest will differ, but that the situation is problem prone has been well established. Worse, the file system is a shared resource on the machine - a system service. What an application does when it uses the file system impacts all users. No portion of the available resource is reserved for a given application; instead, the I/O system responds to requests by scheduling and queuing based on instantaneous demand. Using the system well contributes to the overall throughput of the machine and, from a purely self-interested perspective, reduces the time during which the application or campaign is subject to impact by others. The developer's goal should be to accomplish I/O in a way that minimizes interaction with the I/O system, maximizes the amount of data moved per call, and provides the I/O system the most information about the I/O transfer per request.
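A minimal sketch of the recommended pattern, assuming an MPI application with mpi4py available: each process buffers its output locally and issues a single large, collective write at a process-specific offset, rather than many small independent requests.

```python
# Sketch only: aggregate many small records into one large collective write.
# Assumes mpi4py and an MPI-IO capable parallel file system; the file name and
# buffer size are illustrative.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Accumulate this process's output in memory instead of writing piecemeal.
local_data = np.full(1_000_000, rank, dtype=np.float64)

fh = MPI.File.Open(comm, "output.dat", MPI.MODE_CREATE | MPI.MODE_WRONLY)
offset = rank * local_data.nbytes          # disjoint region in the global address space
fh.Write_at_all(offset, local_data)        # one large, collective request
fh.Close()
```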
LAMMPS is a classical molecular dynamics code, and an acronym for Large-scale Atomic/Molecular Massively Parallel Simulator. LAMMPS has potentials for soft materials (biomolecules, polymers), solid-state materials (metals, semiconductors), and coarse-grained or mesoscopic systems. It can be used to model atoms or, more generically, as a parallel particle simulator at the atomic, meso, or continuum scale. LAMMPS runs on single processors or in parallel using message-passing techniques and a spatial decomposition of the simulation domain. The code is designed to be easy to modify or extend with new functionality.
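The spatial-decomposition idea can be sketched as follows (illustrative only; LAMMPS implements this in C++ with a full 3D processor grid and ghost-atom communication): each process owns the atoms whose coordinates fall inside its subdomain of the simulation box.

```python
# Sketch of spatial decomposition: assign atoms to subdomains of a 1D-sliced box.
# Illustrative only; LAMMPS uses a 3D processor grid with ghost-atom exchange.
import numpy as np

nprocs = 4
box_length = 10.0
positions = np.random.uniform(0.0, box_length, size=(1000, 3))

# Slice the box along x; the owner of each atom is determined by its x coordinate.
owner = np.minimum((positions[:, 0] / box_length * nprocs).astype(int), nprocs - 1)
for p in range(nprocs):
    print(f"process {p} owns {np.sum(owner == p)} atoms")
```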
This report describes the activities and results of a Sandia LDRD project whose objective was to develop and demonstrate foundational aspects of a next-generation nuclear reactor safety code that leverages advanced computational technology. The project scope was directed towards the systems-level modeling and simulation of an advanced, sodium cooled fast reactor, but the approach developed has a more general applicability. The major accomplishments of the LDRD are centered around the following two activities. (1) The development and testing of LIME, a Lightweight Integrating Multi-physics Environment for coupling codes that is designed to enable both 'legacy' and 'new' physics codes to be combined and strongly coupled using advanced nonlinear solution methods. (2) The development and initial demonstration of BRISC, a prototype next-generation nuclear reactor integrated safety code. BRISC leverages LIME to tightly couple the physics models in several different codes (written in a variety of languages) into one integrated package for simulating accident scenarios in a liquid sodium cooled 'burner' nuclear reactor. Other activities and accomplishments of the LDRD include (a) further development, application and demonstration of the 'non-linear elimination' strategy to enable physics codes that do not provide residuals to be incorporated into LIME, (b) significant extensions of the RIO CFD code capabilities, (c) complex 3D solid modeling and meshing of major fast reactor components and regions, and (d) an approach for multi-physics coupling across non-conformal mesh interfaces.
As scientific simulations scale to use petascale machines and beyond, the data volumes generated pose a dual problem. First, with increasing machine sizes, careful tuning of IO routines becomes more and more important to keep the time spent in IO acceptable. It is not uncommon, for instance, to have 20% of an application's runtime spent performing IO in a 'tuned' system. Careful management of the IO routines can reduce that to 5% or even less in some cases. Second, the data volumes are so large, on the order of 10s to 100s of TB, that discovering the scientifically valid contributions requires assistance at runtime to both organize and annotate the data. Waiting for offline processing is not feasible, due both to the impact on the IO system and to the time required. To reduce this load and improve the ability of scientists to use the large amounts of data being produced, new techniques for data management are required. First, there is a need for techniques for efficient movement of data from the compute space to storage. These techniques should understand the underlying system infrastructure and adapt to changing system conditions. Relevant technologies include aggregation networks, data staging nodes that provide closer parity with the IO subsystem, and autonomic IO routines that can detect system bottlenecks and choose different approaches, such as splitting the output into multiple targets or staggering the output processes. Such methods must be end-to-end: even with properly managed asynchronous techniques, it is still essential to properly manage the later synchronous interaction with the storage system to maintain acceptable performance. Second, for the data being generated, annotations and other metadata must be incorporated to help the scientist understand the output data for the simulation run as a whole and to select data and data features without concern for which files or other storage technologies were employed. All of these features should be attained while maintaining a simple deployment for the science code and eliminating the need for allocation of additional computational resources.
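One of the techniques mentioned above, staggering the output processes, can be sketched as follows (assuming mpi4py; the group size and file naming are illustrative): ranks write in small waves so that only a bounded number of processes hit the shared storage system at once.

```python
# Sketch of staggered output: limit the number of concurrent writers so the
# shared I/O system is not overwhelmed. The wave size and file names are
# illustrative assumptions, not a specific production configuration.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
writers_per_wave = 16                      # tunable: concurrent writers allowed

data = np.random.rand(100_000)
for wave in range((size + writers_per_wave - 1) // writers_per_wave):
    if rank // writers_per_wave == wave:
        data.tofile(f"out.{rank:05d}.bin") # this rank's turn to write
    comm.Barrier()                         # hold the next wave until this one finishes
```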
The next generation of capability-class, massively parallel processing (MPP) systems is expected to have hundreds of thousands to millions of processors. In such environments, it is critical to have fault-tolerance mechanisms, including checkpoint/restart, that scale with the size of applications and the percentage of the system on which the applications execute. For application-driven, periodic checkpoint operations, the state-of-the-art does not provide a scalable solution. For example, on today's massive-scale systems that execute applications which consume most of the memory of the employed compute nodes, checkpoint operations generate I/O that consumes nearly 80% of the total I/O usage. Motivated by this observation, this project aims to improve I/O performance for application-directed checkpoints through the use of lightweight storage architectures and overlay networks. Lightweight storage provides direct access to underlying storage devices. Overlay networks provide caching and processing capabilities in the compute-node fabric. The combination has the potential to significantly reduce I/O overhead for large-scale applications. This report describes our combined efforts to model and understand overheads for application-directed checkpoints, as well as the implementation and performance analysis of a checkpoint service that uses available compute nodes as a network cache for checkpoint operations.
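The idea of using spare compute nodes as a network cache can be sketched as follows (mpi4py; the rank layout and buffer sizes are illustrative assumptions): compute ranks push checkpoint buffers over the network to a designated cache rank, which drains them to stable storage off the application's critical path.

```python
# Sketch: the last rank acts as an in-fabric checkpoint cache; compute ranks
# send their checkpoint buffers to it instead of writing directly to the file
# system. Layout and sizes are illustrative assumptions, not the project's
# actual checkpoint service.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
cache_rank = size - 1

if rank == cache_rank:
    # Receive one checkpoint buffer per compute rank, then drain to disk.
    for src in range(size - 1):
        buf = comm.recv(source=src, tag=7)
        buf.tofile(f"ckpt.{src:05d}.bin")
else:
    state = np.random.rand(500_000)        # stand-in for application state
    comm.send(state, dest=cache_rank, tag=7)
    # The application continues computing while the cache rank handles storage.
```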
The peridynamic theory of mechanics attempts to unite the mathematical modeling of continuous media, cracks, and particles within a single framework. It does this by replacing the partial differential equations of the classical theory of solid mechanics with integral or integro-differential equations. These equations are based on a model of internal forces within a body in which material points interact with each other directly over finite distances. The classical theory of solid mechanics is based on the assumption of a continuous distribution of mass within a body. It further assumes that all internal forces are contact forces that act across zero distance. The mathematical description of a solid that follows from these assumptions relies on partial differential equations that additionally assume sufficient smoothness of the deformation for the PDEs to make sense in either their strong or weak forms. The classical theory has been demonstrated to provide a good approximation to the response of real materials down to small length scales, particularly in single crystals, provided these assumptions are met. Nevertheless, technology increasingly involves the design and fabrication of devices at smaller and smaller length scales, even interatomic dimensions. Therefore, it is worthwhile to investigate whether the classical theory can be extended to permit relaxed assumptions of continuity, to include the modeling of discrete particles such as atoms, and to allow the explicit modeling of nonlocal forces that are known to strongly influence the behavior of real materials.
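For context, a standard statement of the peridynamic equation of motion replaces the divergence of the stress tensor in the classical momentum balance with an integral of pairwise bond forces over a finite neighborhood (the horizon) of each material point:

```latex
\rho(\mathbf{x})\,\ddot{\mathbf{u}}(\mathbf{x},t)
  = \int_{\mathcal{H}_{\mathbf{x}}}
      \mathbf{f}\bigl(\mathbf{u}(\mathbf{x}',t)-\mathbf{u}(\mathbf{x},t),\,
                      \mathbf{x}'-\mathbf{x}\bigr)\, dV_{\mathbf{x}'}
    + \mathbf{b}(\mathbf{x},t).
```

No spatial derivatives of the displacement field appear, so the equation remains meaningful across cracks and other discontinuities.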
Motivated by the needs of seismic inversion and building on our prior experience for fluid-dynamics systems, we present a high-order discontinuous Galerkin (DG) Runge-Kutta method applied to isotropic, linearized elastodynamics. Unlike other DG methods recently presented in the literature, our method allows for inhomogeneous material variations within each element, which enables representation of realistic earth models, a feature critical for future use in seismic inversion. Likewise, our method supports curved elements and hybrid meshes that include both simplicial and nonsimplicial elements. We demonstrate the capabilities of this method through a series of numerical experiments including hybrid mesh discretizations of the Marmousi2 model as well as a modified Marmousi2 model with an oscillatory ocean bottom that is exactly captured by our discretization.
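For reference, the isotropic, linearized elastodynamic system being discretized can be written in a standard first-order velocity-stress form (stated here for context; the density and Lame parameters may vary within each element, as noted above):

```latex
\rho\,\frac{\partial \mathbf{v}}{\partial t} = \nabla\cdot\boldsymbol{\sigma} + \mathbf{f},
\qquad
\frac{\partial \boldsymbol{\sigma}}{\partial t}
  = \lambda\,(\nabla\cdot\mathbf{v})\,\mathbf{I}
  + \mu\,\bigl(\nabla\mathbf{v} + \nabla\mathbf{v}^{\mathsf{T}}\bigr).
```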
Mappings from a master element to the physical mesh element, in conjunction with local metrics such as those appearing in the Target-matrix paradigm, are used to measure quality at points within an element. The approach is applied to both linear and quadratic triangular elements; this enables, for example, one to measure quality within a quadratic finite element. Quality within an element may also be measured on a set of symmetry points, leading to so-called symmetry metrics. An important issue concerning the labeling of the element vertices arises in mesh quality tools such as Verdict and Mesquite. Certain quality measures like area, volume, and shape should be label-invariant, while others such as aspect ratio and orientation should not. It is shown that local metrics whose Jacobian matrix is non-constant are label-invariant only at the center of the element, while symmetry metrics can be label-invariant anywhere within the element, provided the reference element is properly restricted.
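A minimal sketch of the point-wise idea for a linear triangle (illustrative only; the Target-matrix paradigm and the Verdict/Mesquite implementations generalize this considerably): form the Jacobian of the map from the reference element and evaluate a Jacobian-based shape measure.

```python
# Sketch: Jacobian of the affine map from the reference triangle to a physical
# linear triangle, and a simple Jacobian-based shape measure. The normalization
# below is one common choice, not the specific metrics defined in the report.
import numpy as np

def shape_quality(p0, p1, p2):
    # Columns of A are the mapped reference edge vectors; for a linear triangle
    # the Jacobian is constant over the element.
    A = np.column_stack((p1 - p0, p2 - p0))
    det = np.linalg.det(A)
    if det <= 0.0:
        return 0.0                      # inverted or degenerate element
    # Mean-ratio-like measure: 1 for the reference right triangle, tending to 0
    # as the element degenerates.
    return 2.0 * det / np.trace(A.T @ A)

print(shape_quality(np.array([0., 0.]), np.array([1., 0.]), np.array([0., 1.])))     # 1.0
print(shape_quality(np.array([0., 0.]), np.array([1., 0.]), np.array([0.9, 0.05])))  # near 0
```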
Biofouling of water-treatment membranes, the unwanted growth of biofilms on a surface, negatively impacts desalination and water treatment. With biofouling there is a decrease in permeate production, degradation of permeate water quality, and an increase in energy expenditure due to the increased cross-flow pressure required. To date, a universally successful and cost-effective method for controlling biofouling has not been implemented. The overall goal of the work described in this report was to use high-performance computing to direct polymer, material, and biological research to create the next generation of water-treatment membranes. Both physical (micromixers - UV-curable epoxy traces printed on the surface of a water-treatment membrane that promote chaotic mixing) and chemical (quaternary ammonium groups) modifications of the membranes were evaluated for their ability to increase resistance to biofouling. Creation of low-cost, efficient water-treatment membranes helps assure the availability of fresh water for human use, a growing need in both the U.S. and the world.
This report summarizes the Combinatorial Algebraic Topology: software, applications & algorithms workshop (CAT Workshop). The workshop was sponsored by the Computer Science Research Institute of Sandia National Laboratories. It was organized by CSRI staff members Scott Mitchell and Shawn Martin. It was held in Santa Fe, New Mexico, August 29-30, 2009. The CAT Workshop website has links to some of the talk slides and other information: http://www.cs.sandia.gov/CSRI/Workshops/2009/CAT/index.html. The purpose of the report is to summarize the discussions and recap the sessions. There is a special emphasis on technical areas that are ripe for further exploration and on the plans for follow-up amongst the workshop participants. The intended audiences are the workshop participants, other researchers in the area, and the workshop sponsors.
Bioweapons and emerging infectious diseases pose formidable and growing threats to our national security. Rapid advances in biotechnology and the increasing efficiency of global transportation networks virtually guarantee that the United States will face potentially devastating infectious disease outbreaks caused by novel ('unknown') pathogens either intentionally or accidentally introduced into the population. Unfortunately, our nation's biodefense and public health infrastructure is primarily designed to handle previously characterized ('known') pathogens. While modern DNA assays can identify known pathogens quickly, identifying unknown pathogens currently depends upon slow, classical microbiological methods of isolation and culture that can take weeks to produce actionable information. In many scenarios that delay would be costly in terms of casualties and economic damage; indeed, it can mean the difference between a manageable public health incident and a full-blown epidemic. To close this gap in our nation's biodefense capability, we will develop, validate, and optimize a system to extract nucleic acids from unknown pathogens present in clinical samples drawn from infected patients. This system will extract nucleic acids from a clinical sample and amplify pathogen and specific host-response nucleic acid sequences. These sequences will then be suitable for ultra-high-throughput sequencing (UHTS) carried out by a third party. The data generated from UHTS will then be processed through a new data assimilation and bioinformatic analysis pipeline that will allow us to characterize an unknown pathogen in hours to days instead of weeks to months. Our methods will require no a priori knowledge of the pathogen and no isolation or culturing; they will therefore circumvent many of the major roadblocks confronting a clinical microbiologist or virologist presented with an unknown or engineered pathogen.
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of Sandia National Laboratories' electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors), including support for most popular parallel and serial computers; (2) improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques; (3) device models which are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and (4) object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible range of computing platforms. These include serial, shared-memory, and distributed-memory parallel platforms as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. As a result, Xyce is a unique electrical simulation capability, designed to meet the specific needs of the laboratory.
This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users' Guide. The focus of this document is to list, to the extent possible, the device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users' Guide.
Working with leading experts in the fields of cognitive neuroscience and computational intelligence, SNL has developed a computational architecture that represents the neurocognitive mechanisms associated with how humans remember experiences in their past. The architecture represents how knowledge is organized and updated through information from individual experiences (episodes) via the cortical-hippocampal declarative memory system. We compared the simulated behavioral characteristics with those of humans measured under well-established experimental standards, controlling for unmodeled aspects of human processing, such as perception. We used this knowledge to create robust simulations of human memory behaviors that should help move the scientific community closer to understanding how humans remember information. These behaviors were experimentally validated against actual human subjects, and the results were published. An important outcome of the validation process is the joining of specific experimental testing procedures from the field of neuroscience with computational representations from the field of cognitive modeling and simulation.
The kinetic Monte Carlo method and its variants are powerful tools for modeling materials at the mesoscale, meaning at length and time scales between the atomic and continuum scales. We have completed a three-year LDRD project with the goal of developing a parallel kinetic Monte Carlo capability and applying it to materials modeling problems of interest to Sandia. In this report we give an overview of the methods and algorithms developed, and describe our new open-source code called SPPARKS, for Stochastic Parallel PARticle Kinetic Simulator. We also highlight the development of several Monte Carlo models in SPPARKS for specific materials modeling applications, including grain growth, bubble formation, diffusion in nanoporous materials, defect formation in erbium hydrides, and surface growth and evolution.
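A minimal sketch of the rejection-free kinetic Monte Carlo step at the core of such simulations (illustrative rates; SPPARKS implements this with efficient event lists and a parallel spatial decomposition): select an event with probability proportional to its rate and advance the clock by an exponentially distributed increment.

```python
# Sketch of one rejection-free (BKL/Gillespie-style) kinetic Monte Carlo step.
# The event rates below are placeholders for illustration only.
import numpy as np

rng = np.random.default_rng(0)
rates = np.array([1.0, 0.5, 0.25, 2.0])    # per-event rates (illustrative)

def kmc_step(rates, t):
    total = rates.sum()
    # Pick an event with probability proportional to its rate.
    event = np.searchsorted(np.cumsum(rates), rng.random() * total)
    # Advance time by an exponentially distributed increment.
    t += -np.log(1.0 - rng.random()) / total
    return event, t

t = 0.0
for _ in range(5):
    event, t = kmc_step(rates, t)
    print(f"executed event {event} at time {t:.3f}")
```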
This report describes trans-organizational efforts to investigate the impact of chip multiprocessors (CMPs) on the performance of important Sandia application codes. The impact of CMPs on the performance and applicability of Sandia's system software was also investigated. The goal of the investigation was to make algorithmic and architectural recommendations for next generation platform acquisitions.
Currently, electrical power generation uses about 140 billion gallons of water per day, accounting for over 39% of all freshwater withdrawals and thus competing with irrigated agriculture as the leading user of water. Coupled to this water use are the required pumping, conveyance, treatment, storage, and distribution of the water, which on average consume 3% of all electric power generated. While water and energy use are tightly coupled, planning and management of these fundamental resources are rarely treated in an integrated fashion. To address this need, a decision support framework has been developed that targets the shared needs of energy and water producers, resource managers, regulators, and decision makers at the federal, state, and local levels. The framework integrates analysis and optimization capabilities to identify trade-offs and 'best' alternatives among a broad list of energy/water options and objectives. The decision support framework is formulated in a modular architecture, facilitating tailored analyses over different geographical regions and scales (e.g., national, state, county, watershed, NERC region). An interactive interface allows direct control of the model and access to real-time results displayed as charts, graphs, and maps. Ultimately, this open and interactive modeling framework provides a tool for evaluating competing policy and technical options relevant to the energy-water nexus.
Petaflops systems will have tens to hundreds of thousands of compute nodes, which increases the likelihood of faults. Applications use checkpoint/restart to recover from these faults, but even under ideal conditions, applications running on more than 30,000 nodes will likely spend more than half of their total run time saving checkpoints, restarting, and redoing work that was lost. We created a library that performs redundant computations on additional nodes allocated to the application. An active node and its redundant partner form a node bundle which will only fail, and cause an application restart, when both nodes in the bundle fail. The goal of this library is to learn whether this can be done entirely at the user level, what requirements this library places on a Reliability, Availability, and Serviceability (RAS) system, and what its impact on performance and run time is. We find that our redundant MPI layer library imposes a relatively modest performance penalty for applications, but that it greatly reduces the number of application interrupts. This reduction in interrupts leads to huge savings in restart and rework time. For large-scale applications the savings compensate for the performance loss and the additional nodes required for redundant computations.
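The benefit can be illustrated with a simple independence argument (a back-of-the-envelope bound stated here for context, not the full model in the report): if each node is interrupted during a run with probability p and failures are independent, a two-node bundle causes a restart only when both members fail, so for N bundles the expected number of application interrupts drops from roughly Np to roughly Np^2.

```latex
P(\text{bundle failure}) \approx p^{2}
\quad\Longrightarrow\quad
\mathbb{E}[\text{interrupts}] \approx N p^{2} \;\ll\; N p
\quad \text{for small } p .
```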
Graph algorithms are a key component in a wide variety of intelligence analysis activities. The Graph-Based Informatics for Non-Proliferation and Counter-Terrorism project addresses the critical need of making these graph algorithms accessible to Sandia analysts in a manner that is both intuitive and effective. Specifically, we describe the design and implementation of an open source toolkit for doing graph analysis, informatics, and visualization that provides Sandia with novel analysis capability for non-proliferation and counter-terrorism.
Advanced computing hardware and software written to exploit massively parallel architectures greatly facilitate the computation of extremely large problems. On the other hand, these tools, though enabling higher fidelity models, have often resulted in much longer run times and turnaround times in providing answers to engineering problems. The impediments include smaller elements and consequently smaller time steps, much larger systems of equations to solve, and the inclusion of nonlinearities that had been ignored in days when lower fidelity models were the norm. The research effort reported here focuses on accelerating the analysis process for structural dynamics through combinations of model reduction and mitigation of some factors that lead to over-meshing.
The modeling of solids is most naturally placed within a Lagrangian framework because it requires constitutive models which depend on knowledge of the original material orientations and subsequent deformations. Detailed kinematic information is needed to ensure material frame indifference, which is captured through the deformation gradient F. Such information can be tracked easily in a Lagrangian code. Unfortunately, not all problems can be easily modeled using Lagrangian concepts due to severe distortions in the underlying motion. Either a Lagrangian/Eulerian or a pure Eulerian modeling framework must be introduced. We discuss and contrast several Lagrangian/Eulerian approaches for keeping track of the details of material kinematics.
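For reference, the deformation gradient is the derivative of the current position x with respect to the reference position X, and constitutive models built on the right Cauchy-Green tensor C automatically respect material frame indifference because a superposed rigid rotation Q leaves C unchanged (a standard result stated here for context):

```latex
\mathbf{F} = \frac{\partial \mathbf{x}}{\partial \mathbf{X}},
\qquad
\mathbf{C} = \mathbf{F}^{\mathsf{T}}\mathbf{F},
\qquad
\mathbf{F} \mapsto \mathbf{Q}\mathbf{F}
\;\Rightarrow\;
\mathbf{C} \mapsto (\mathbf{Q}\mathbf{F})^{\mathsf{T}}(\mathbf{Q}\mathbf{F}) = \mathbf{C}.
```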
Shared libraries have become ubiquitous and are used to achieve great resource efficiencies on many platforms. The same properties that enable efficiencies on time-shared computers and convenience on small clusters prove to be great obstacles to scalability on large clusters and High Performance Computing platforms. In addition, lightweight operating systems such as Catamount have historically not supported the use of shared libraries specifically because they hinder scalability. In this report we outline the methods we investigated for supporting shared libraries on High Performance Computing platforms that use lightweight kernels. The considerations necessary to evaluate utility in this area are many and sometimes conflicting. While our initial path forward has been determined based on this evaluation, we consider this effort ongoing and remain prepared to re-evaluate any technology that might provide a scalable solution. This report is an evaluation of a range of possible methods of supporting dynamically linked executables on capability-class High Performance Computing platforms. Efforts are ongoing, and extensive testing at scale is necessary to evaluate performance. While performance is a critical driving factor, supporting whatever method is used in a production environment is an equally important and challenging task.
Application performance is determined by a combination of many choices: hardware platform, runtime environment, languages and compilers used, algorithm choice and implementation, and more. In this complicated environment, we find that the use of mini-applications - small self-contained proxies for real applications - is an excellent approach for rapidly exploring the parameter space of all these choices. Furthermore, use of mini-applications enriches the interaction between application, library and computer system developers by providing explicit functioning software and concrete performance results that lead to detailed, focused discussions of design trade-offs, algorithm choices and runtime performance issues. In this paper we discuss a collection of mini-applications and demonstrate how we use them to analyze and improve application performance on new and future computer platforms.
This report documents the results of an FY09 ASC V&V Methods level 2 milestone demonstrating new algorithmic capabilities for mixed aleatory-epistemic uncertainty quantification. Through the combination of stochastic expansions for computing aleatory statistics and interval optimization for computing epistemic bounds, mixed uncertainty analysis studies are shown to be more accurate and efficient than previously achievable. Part I of the report describes the algorithms and presents benchmark performance results. Part II applies these new algorithms to UQ analysis of radiation effects in electronic devices and circuits for the QASPR program.
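A minimal sketch of the nested structure (with an illustrative model, interval, and distribution; the report's algorithms replace the inner sampling with stochastic expansions and the outer sweep with interval optimization): an outer search over epistemic interval-valued parameters brackets the aleatory statistic computed by an inner loop.

```python
# Sketch of nested mixed aleatory-epistemic UQ: for each candidate value of an
# epistemic (interval-valued) parameter e, compute an aleatory statistic by
# sampling the random variable a, then report the interval of that statistic.
# The model, interval, and distribution below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)

def model(a, e):
    return np.exp(-e * a) + 0.1 * a        # placeholder response function

def aleatory_mean(e, n_samples=20_000):
    a = rng.normal(loc=1.0, scale=0.2, size=n_samples)   # aleatory uncertainty
    return model(a, e).mean()

epistemic_grid = np.linspace(0.5, 1.5, 21)                # epistemic interval [0.5, 1.5]
stats = np.array([aleatory_mean(e) for e in epistemic_grid])
print(f"epistemic bounds on the mean response: [{stats.min():.4f}, {stats.max():.4f}]")
```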
Critical infrastructure resilience has become a national priority for the U. S. Department of Homeland Security. System resilience has been studied for several decades in many different disciplines, but no standards or unifying methods exist for critical infrastructure resilience analysis. Few quantitative resilience methods exist, and those existing approaches tend to be rather simplistic and, hence, not capable of sufficiently assessing all aspects of critical infrastructure resilience. This report documents the results of a late-start Laboratory Directed Research and Development (LDRD) project that investigated the development of quantitative resilience through application of control design methods. Specifically, we conducted a survey of infrastructure models to assess what types of control design might be applicable for critical infrastructure resilience assessment. As a result of this survey, we developed a decision process that directs the resilience analyst to the control method that is most likely applicable to the system under consideration. Furthermore, we developed optimal control strategies for two sets of representative infrastructure systems to demonstrate how control methods could be used to assess the resilience of the systems to catastrophic disruptions. We present recommendations for future work to continue the development of quantitative resilience analysis methods.
Data mining and machine learning techniques can be applied to computer system design to aid in optimizing design decisions and improving system runtime performance. Data mining techniques have been investigated in the context of branch prediction. Specifically, the performance of traditional branch predictors has been compared to that of data mining algorithms. Additionally, whether additional features available within the architectural state might serve to further improve branch prediction has been evaluated. Results show that data mining techniques indicate potential for improved branch prediction, especially when register file contents are included as a feature set.
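A minimal sketch of this kind of experiment (synthetic trace and features; the actual study used traces and architectural-state features from real workloads): treat branch prediction as supervised classification over recent branch outcomes and measure accuracy on a held-out portion of the trace.

```python
# Sketch: train a standard classifier on the last k branch outcomes (optionally
# augmented with architectural-state features such as register values) and
# predict the next outcome. The trace below is synthetic, for illustration only.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(2)
k = 8                                            # history length (feature count)
trace = (rng.random(5000) < 0.7).astype(int)     # synthetic taken/not-taken trace

X = np.array([trace[i - k:i] for i in range(k, len(trace))])
y = trace[k:]

split = len(X) // 2
clf = DecisionTreeClassifier(max_depth=6).fit(X[:split], y[:split])
accuracy = clf.score(X[split:], y[split:])
print(f"prediction accuracy on held-out trace: {accuracy:.3f}")
```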
The 9/30/2009 ASC Level 2 milestone, Scalable Analysis Tools for Sensitivity Analysis and UQ (Milestone 3160), delivers the feature recognition capability required by the user community for certain verification and validation tasks focused on sensitivity analysis and uncertainty quantification (UQ). These feature recognition capabilities include crater detection, characterization, and analysis from CTH simulation data; the ability to call fragment and crater identification code from within a CTH simulation; and the ability to output fragments in a geometric format that includes data values over the fragments. The feature recognition capabilities were tested extensively on sample and actual simulations. In addition, a number of stretch criteria were met, including the ability to visualize CTH tracer particles and the ability to visualize output from within an S3D simulation.
We describe the implementation of a prototype fully implicit method for solving three-dimensional quasi-steady-state magnetic advection-diffusion problems. This method allows us to solve the magnetic advection-diffusion equations in an Eulerian frame with a fixed, user-prescribed velocity field. We have verified the correctness of the method and implementation on two standard verification problems, the Solberg-White magnetic shear problem and the Perry-Jones-White rotating cylinder problem.
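For context, the governing system is the resistive magnetic induction (advection-diffusion) equation with a prescribed velocity field u, together with the solenoidal constraint, written here in a standard form (the report's quasi-steady formulation and discretization details are given in the text):

```latex
\frac{\partial \mathbf{B}}{\partial t}
  = \nabla \times \bigl(\mathbf{u} \times \mathbf{B}\bigr)
  - \nabla \times \bigl(\eta \, \nabla \times \mathbf{B}\bigr),
\qquad
\nabla \cdot \mathbf{B} = 0 .
```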
In this report we summarize research into new parallel algebraic multigrid (AMG) methods. We first provide an introduction to parallel AMG. We then discuss our research in parallel AMG algorithms for very large scale platforms. We detail significant improvements to a matrix-matrix multiplication kernel in the AMG setup phase. We present a smoothed aggregation AMG algorithm with fewer communication synchronization points and discuss its links to domain decomposition methods. Finally, we discuss a multigrid smoothing technique that utilizes two message-passing layers for use on multicore processors.
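A minimal two-level sketch of the ingredients discussed above (dense NumPy and a 1D model problem for clarity only; the report's algorithms are sparse, parallel, and smoothed-aggregation based): a piecewise-constant aggregation prolongator P, the Galerkin triple product that dominates the setup phase, and a cycle with Jacobi smoothing.

```python
# Two-level algebraic multigrid sketch with piecewise-constant aggregation.
import numpy as np

n = 64
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # 1D Poisson model problem

# Aggregation: group pairs of fine points; P is the piecewise-constant prolongator.
P = np.zeros((n, n // 2))
for agg in range(n // 2):
    P[2 * agg:2 * agg + 2, agg] = 1.0

Ac = P.T @ A @ P            # Galerkin coarse operator (the setup-phase matrix-matrix kernel)

def jacobi(A, b, x, iters=2, omega=2.0 / 3.0):
    D = np.diag(A)
    for _ in range(iters):
        x = x + omega * (b - A @ x) / D
    return x

def two_level_cycle(b, x):
    x = jacobi(A, b, x)                             # pre-smoothing
    r_c = P.T @ (b - A @ x)                         # restrict the residual
    x = x + P @ np.linalg.solve(Ac, r_c)            # coarse-grid correction
    return jacobi(A, b, x)                          # post-smoothing

b = np.ones(n)
x = np.zeros(n)
for _ in range(10):
    x = two_level_cycle(b, x)
print("final residual norm:", np.linalg.norm(b - A @ x))
```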
This report summarizes the accomplishments of a three-year project focused on developing technical capabilities for measuring and modeling neuronal processes at the nanoscale. It was successfully demonstrated that nanoprobes could be engineered that were biocompatible, could be biofunctionalized, and responded within the range of voltages typically associated with a neuronal action potential. Furthermore, the Xyce parallel circuit simulator was employed, and models were incorporated for simulating the ion channel and cable properties of neuronal membranes. The ultimate objective of the project had been to employ nanoprobes in vivo, in the nematode C. elegans, and derive a simulation based on the resulting data. Techniques were developed allowing the nanoprobes to be injected into the nematode and the neuronal response recorded. To the authors' knowledge, this is the first occasion on which nanoparticles have been successfully employed as probes for recording neuronal response in an in vivo animal experimental protocol.