Current Practices to Manage Production-Quality Deployment: ParaView and VTK-m
Abstract not provided.
Sandia's Dakota software (available at http://dakota.sandia.gov) supports science and engineering transformation through advanced exploration of simulations. Specifically, it manages and analyzes ensembles of simulations to provide broader and deeper perspective for analysts and decision makers, enabling them to enhance understanding of risk, improve products, and assess simulation credibility.
Transition metal oxide (TMO) memristors have recently attracted special attention from the semiconductor industry and academia. Memristors are one of the strongest candidates to replace flash memory, and possibly DRAM and SRAM, in the near future. Moreover, memristors have a high potential to enable beyond-CMOS technology advances in novel architectures for high performance computing (HPC). The utility of memristors has been demonstrated in reprogrammable logic (cross-bar switches), brain-inspired computing, and non-CMOS complementary logic. Indeed, the potential use of memristors as logic devices is especially important considering the inevitable end of CMOS technology scaling anticipated by 2025. To aid the ongoing Sandia memristor fabrication effort with a memristor design tool and to establish a clear physical picture of resistance switching in TMO memristors, we have created, and validated against experimental data, a simulation tool we call the Memristor Charge Transport (MCT) Simulator.
Proceedings of the International Joint Conference on Neural Networks
Resistive memories enable dramatic energy reductions for neural algorithms. We propose a general-purpose neural architecture that can accelerate many different algorithms and determine the device properties that will be needed to run backpropagation on the neural architecture. To maintain high accuracy, the read noise standard deviation should be less than 5% of the weight range. The write noise standard deviation should be less than 0.4% of the weight range and up to 300% of a characteristic update (for the datasets tested). Asymmetric nonlinearities in the change in conductance versus pulse number cause weight decay and significantly reduce accuracy, while moderate symmetric nonlinearities have no effect. To allow parallel reads and writes, the write current should also be less than 100 nA.
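As an illustrative aside (not the architecture or datasets from the paper), the kind of sensitivity claim above can be probed numerically by injecting Gaussian "write noise" into every weight update of a tiny model and watching accuracy degrade as the noise grows relative to an assumed weight range; all sizes and thresholds below are arbitrary choices.

```python
# Illustrative sketch only: noisy weight writes during gradient descent on a
# toy two-class problem. Noise std is expressed as a fraction of the assumed
# device weight range, loosely mirroring the paper's write-noise criterion.
import numpy as np

rng = np.random.default_rng(0)

# Toy two-class dataset: two Gaussian blobs in 2-D.
X = np.vstack([rng.normal(-1, 1, (500, 2)), rng.normal(+1, 1, (500, 2))])
y = np.hstack([np.zeros(500), np.ones(500)])

def train(write_noise_frac, weight_range=2.0, epochs=20, lr=0.1):
    """Logistic-regression training with noisy, clipped weight writes."""
    w, b = np.zeros(2), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))               # forward pass
        grad_w = X.T @ (p - y) / len(y)                       # backprop gradients
        grad_b = np.mean(p - y)
        sigma = write_noise_frac * weight_range               # noise std as fraction of range
        w -= lr * grad_w + rng.normal(0, sigma, size=2)       # noisy device write
        b -= lr * grad_b + rng.normal(0, sigma)
        w = np.clip(w, -weight_range / 2, weight_range / 2)   # device conductance limits
    return np.mean((1 / (1 + np.exp(-(X @ w + b))) > 0.5) == y)

for frac in [0.0, 0.004, 0.05, 0.2]:
    print(f"write-noise std = {frac:5.1%} of weight range -> accuracy {train(frac):.3f}")
```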
Proceedings of the International Joint Conference on Neural Networks
Through various means of structural and synaptic plasticity enabling online learning, neural networks are constantly reconfiguring their computational functionality. Neural information content is embodied within the configurations, representations, and computations of neural networks. To explore this content, we have developed metrics and computational paradigms to quantify neural information content. We have observed that conventional compression methods may help overcome some of the limiting factors of standard information-theoretic techniques employed in neuroscience and allow us to approximate information in neural data. To do so, we use compressibility as a measure of complexity in order to estimate entropy and quantitatively assess the information content of neural ensembles. Using Lempel-Ziv compression, we are able to assess the rate of generation of new patterns across a neural ensemble's firing activity over time, approximating the information content encoded by a neural circuit. As a specific case study, we have been investigating the effect of mixed neural coding schemes due to hippocampal adult neurogenesis.
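A minimal sketch of the compressibility idea, using zlib (an LZ77-family compressor) on synthetic binarized spike rasters; this is not the authors' pipeline, only an illustration of compressed size as a complexity proxy.

```python
# Compressed-size ratio of a binary (neurons x time-bins) raster as a crude
# complexity/entropy proxy: highly patterned activity compresses far better
# than independent random firing at the same rate. Synthetic data only.
import zlib
import numpy as np

rng = np.random.default_rng(1)

def lz_complexity(raster):
    """Return compressed size / raw size for a binary spike raster."""
    raw = np.packbits(raster.astype(np.uint8)).tobytes()
    return len(zlib.compress(raw, 9)) / len(raw)

n_neurons, n_bins = 64, 5000
# Highly structured ensemble: all neurons share one repeating 50-bin pattern.
pattern = rng.random(50) < 0.2
structured = np.tile(pattern, (n_neurons, n_bins // 50))
# Unstructured ensemble: independent Bernoulli firing at the same mean rate.
random_fire = rng.random((n_neurons, n_bins)) < 0.2

print("structured raster ratio :", round(lz_complexity(structured), 3))
print("random raster ratio     :", round(lz_complexity(random_fire), 3))
```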
Journal of Chemical Physics
Accurate rational approximations of the Fermi-Dirac distribution are a useful component in many numerical algorithms for electronic structure calculations. The best known approximations use O(log(βΔ) log(ε⁻¹)) poles to achieve an error tolerance ε at temperature β⁻¹ over an energy interval Δ. We apply minimax approximation to reduce the number of poles by a factor of four and replace Δ with Δocc, the occupied energy interval. This is particularly beneficial when Δ ≫ Δocc, such as in electronic structure calculations that use a large basis set.
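For background (standard material, not the paper's specific minimax construction), the exact pole expansion that finite rational approximations aim to shorten is the Matsubara expansion of the Fermi-Dirac function:

```latex
% Exact Matsubara pole expansion of the Fermi-Dirac function with
% x = \beta(E - \mu); minimax rational approximations replace this infinite
% sum by a short sum over a few optimally placed poles.
\[
  f(x) \;=\; \frac{1}{1 + e^{x}}
       \;=\; \frac{1}{2} \;-\; \sum_{n=1}^{\infty}\frac{2x}{x^{2} + (2n-1)^{2}\pi^{2}}
\]
```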
Physical Review A
Surface and color codes are two forms of topological quantum error correction in two spatial dimensions with complementary properties. Surface codes have lower-depth error detection circuits and well-developed decoders to interpret and correct errors, while color codes have transversal Clifford gates and better code efficiency in the number of physical qubits needed to achieve a given code distance. A formal equivalence exists between color codes and folded surface codes, but it does not guarantee the transferability of any of these favorable properties. However, the equivalence does imply the existence of constant-depth circuit implementations of logical Clifford gates on folded surface codes. We achieve and improve on this result by constructing two families of folded surface codes with transversal Clifford gates. This construction is presented generally for qudits of any dimension. The specific application of these codes to universal quantum computation based on qubit fusion is also discussed.
This document is the main user guide for the Sierra/Percept capabilities including the mesh_adapt and mesh_transfer tools. Basic capabilities for uniform mesh refinement (UMR) and mesh transfers are discussed. Examples are used to provide illustration. Future versions of this manual will include more advanced features such as geometry and mesh smoothing. Additionally, all the options for the mesh_adapt code will be described in detail. Capabilities for local adaptivity in the context of offline adaptivity will also be included.
SIAM Journal on Scientific Computing
A multigrid method is proposed that combines ideas from matrix dependent multigrid for structured grids and algebraic multigrid for unstructured grids. It targets problems where a three-dimensional mesh can be viewed as an extrusion of a two-dimensional, unstructured mesh in a third dimension. Our motivation comes from the modeling of thin structures via finite elements and, more specifically, the modeling of ice sheets. Extruded meshes are relatively common for thin structures and often give rise to anisotropic problems when the thin direction mesh spacing is much smaller than the broad direction mesh spacing. Within our approach, the first few multigrid hierarchy levels are obtained by applying matrix dependent multigrid to semicoarsen in a structured thin direction fashion. After sufficient structured coarsening, the resulting mesh contains only a single layer corresponding to a two-dimensional, unstructured mesh. Algebraic multigrid can then be employed in a standard manner to create further coarse levels, as the anisotropic phenomenon is no longer present in the single layer problem. The overall approach remains fully algebraic, with the minor exception that some additional information is needed to determine the extruded direction. Furthermore, this facilitates integration of the solver with a variety of different extruded mesh applications.
ACM International Conference Proceeding Series
Managing multi-level memories will require different policies from those used for cache hierarchies, as memory technologies differ in latency, bandwidth, and volatility. To this end we analyze application data allocations and main memory accesses to determine whether an application-driven approach to managing a multi-level memory system comprising stacked and conventional DRAM is viable. Our early analysis shows that the approach is viable, but some applications may require dynamic allocations (i.e., migration) while others are amenable to static allocation.
Computer
The National Strategic Computing Initiative will inevitably produce new computer hardware, but what about software?
Computer
Power API - the result of collaboration among national laboratories, universities, and major vendors - provides a range of standardized power management functions, from application-level control and measurement to facility-level accounting, including real-time and historical statistics gathering. Support is already available for Intel and AMD CPUs and standalone measurement devices.
New Journal of Physics
State-of-the-art qubit systems are reaching the gate fidelities required for scalable quantum computation architectures. Further improvements in the fidelity of quantum gates demand characterization and benchmarking protocols that are efficient, reliable, and extremely accurate. Ideally, a benchmarking protocol should also provide information on how to rectify residual errors. Gate set tomography (GST) is one such protocol designed to give detailed characterization of as-built qubits. We implemented GST on a high-fidelity electron-spin qubit confined by a single 31P atom in 28Si. The results reveal systematic errors that a randomized benchmarking analysis could measure but not identify, whereas GST indicated the need for improved calibration of the length of the control pulses. After introducing this modification, we measured a new benchmark average gate fidelity that improves on the previously reported value. Furthermore, GST revealed high levels of non-Markovian noise in the system, which will need to be understood and addressed when the qubit is used within a fault-tolerant quantum computation scheme.
Measuring and controlling the power and energy consumption of high performance computing systems by various components in the software stack is an active research area [13, 3, 5, 10, 4, 21, 19, 16, 7, 17, 20, 18, 11, 1, 6, 14, 12]. Implementations in lower-level software layers are beginning to emerge in some production systems, which is very welcome. To be most effective, however, these measurement and control features need a portable interface that facilitates participation by all levels of the software stack. We present a proposal for a standard power Application Programming Interface (API) that endeavors to cover the entire software space, from generic hardware interfaces to the input from the computer facility manager.
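As a hedged illustration of the layering such an API standardizes, the sketch below models a tree of system objects (platform, node, CPU), each exposing measurement and control attributes. The class and method names are invented for this example and are not the actual Power API functions or types.

```python
# Hypothetical sketch of a layered power measurement/control interface.
# Names are illustrative only; real counters are replaced by random values.
import random
import time

class PowerObject:
    """One node in the system hierarchy (facility, cabinet, node, CPU, ...)."""
    def __init__(self, name, children=()):
        self.name, self.children = name, list(children)
        self.power_cap_watts = None

    def measure_power(self):
        """Return instantaneous power in watts (random stand-in for real counters)."""
        own = random.uniform(50.0, 80.0) if not self.children else 0.0
        return own + sum(c.measure_power() for c in self.children)

    def set_power_cap(self, watts):
        """Control-side call: facility or application layers can cap consumption."""
        self.power_cap_watts = watts

# Build a tiny two-node "platform" and take timestamped readings at each level.
platform = PowerObject("platform", [
    PowerObject(f"node{i}", [PowerObject(f"node{i}.cpu{j}") for j in range(2)])
    for i in range(2)
])
platform.set_power_cap(500.0)
print(time.time(), platform.name, round(platform.measure_power(), 1), "W")
for node in platform.children:
    print(time.time(), node.name, round(node.measure_power(), 1), "W")
```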
Cyber defense is an asymmetric battle today. We need to understand better what options are available for providing defenders with possible advantages. Our project combines machine learning, optimization, and game theory to obscure our defensive posture from the information the adversaries are able to observe. The main conceptual contribution of this research is to separate the problem of prediction, for which machine learning is used, and the problem of computing optimal operational decisions based on such predictions, coupled with a model of adversarial response. This research includes modeling of the attacker and defender, formulation of useful optimization models for studying adversarial interactions, and user studies to measure the impact of the modeling approaches in realistic settings.
Quantum Information and Computation
We consider four-dimensional qudits as qubit pairs and their qudit Pauli operators as qubit Clifford operators. This introduces a nesting, C_2^(1) ⊂ C_4^(2) ⊂ C_2^(3), where C_m^(n) is the nth level of the m-dimensional qudit Clifford hierarchy. If we can convert between logical qubits and qudits, then qudit Clifford operators are qubit non-Clifford operators. Conversion is achieved by qubit fusion and qudit fission using stabilizer circuits that consume a resource state. This resource is a fused qubit stabilizer state with a fault-tolerant state preparation using stabilizer circuits.
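For reference (standard background, not the paper's construction), the Clifford hierarchy C_m^(n) referred to above is defined recursively from the Pauli group:

```latex
% Standard recursive definition of the Clifford hierarchy for qudits of
% dimension m: the first level is the Pauli group, and each higher level
% maps Paulis into the level below under conjugation.
\[
  \mathcal{C}_m^{(1)} = \mathcal{P}_m, \qquad
  \mathcal{C}_m^{(n)} = \{\, U \;:\; U P U^{\dagger} \in \mathcal{C}_m^{(n-1)}
                          \ \ \forall\, P \in \mathcal{P}_m \,\}
\]
```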
ACM International Conference Proceeding Series
MPI includes all processes in MPI_COMM_WORLD; this is untenable for reasons of scale, resiliency, and overhead. This paper offers a new approach, extending MPI with a new concept called Sessions, which makes two key contributions: a tighter integration with the underlying runtime system; and a scalable route to communication groups. This is a fundamental change in how we organise and address MPI processes that removes well-known scalability barriers by no longer requiring the global communicator MPI_COMM_WORLD.
ACM International Conference Proceeding Series
Scientific workloads running on current extreme-scale systems routinely generate tremendous volumes of data for postprocessing. This data movement has become a serious issue due to its energy cost and the fact that I/O bandwidths have not kept pace with data generation rates. In situ analytics is an increasingly popular alternative in which post-simulation processing is embedded into an application, running as part of the same MPI job. This can reduce data movement costs but introduces a new potential source of interference for the application. Using a validated simulation-based approach, we investigate how best to mitigate the interference from time-shared in situ tasks for a number of key extreme-scale workloads. This paper makes a number of contributions. First, we show that the independent scheduling of in situ analytics tasks can significantly degrade application performance, with slowdowns exceeding 1000%. Second, we demonstrate that the degree of synchronization found in many modern collective algorithms is sufficient to significantly reduce the overheads of this interference to less than 10% in most cases. Finally, we show that many applications already frequently invoke collective operations that use these synchronizing MPI algorithms. Therefore, the synchronization introduced by these MPI collective algorithms can be leveraged to efficiently schedule analytics tasks with minimal changes to existing applications. This paper provides critical analysis and guidance for MPI users and developers on the importance of scheduling in situ analytics tasks. It shows the degree of synchronization needed to mitigate the performance impacts of these time-shared coupled codes and demonstrates how that synchronization can be realized in an extreme-scale environment using modern collective algorithms.
Proceedings of the International Conference on Parallel Processing Workshops
Exascale systems promise the potential for computation at unprecedented scales and resolutions, but achieving exascale by the end of this decade presents significant challenges. A key challenge is due to the very large number of cores and components and the resulting mean time between failures (MTBF) in the order of hours or minutes. Since the typical run times of target scientific applications are longer than this MTBF, fault tolerance techniques will be essential. An important class of failures that must be addressed is process or node failures. While checkpoint/restart (C/R) is currently the most widely accepted technique for addressing processor failures, coordinated, stable-storage-based global C/R might be unfeasible at exascale when the time to checkpoint exceeds the expected MTBF. This paper explores transparent recovery via implicitly coordinated, diskless, application-driven checkpointing as a way to tolerate process failures in MPI applications at exascale. The discussed approach leverages User Level Failure Mitigation (ULFM), which is being proposed as an MPI extension to allow applications to create policies for tolerating process failures. Specifically, this paper demonstrates how different implementations of application-driven in-memory checkpoint storage and recovery compare in terms of performance and scalability. We also experimentally evaluate the effectiveness and scalability of the Fenix online global recovery framework on a production system, the Titan Cray XK7 at ORNL, and demonstrate the ability of Fenix to tolerate dynamically injected failures using the execution of four benchmarks and mini-applications with different behaviors.
Proceedings - 46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN-W 2016
As high-performance computing systems continue to grow in scale and complexity, the study of faults and errors is critical to the design of future systems and mitigation schemes. Fault modes in system DRAM are a frequently-investigated key aspect of memory reliability. While current schemes require offline analysis for proper classification, current state-of-the-art mitigation techniques require accurate online prediction for optimal performance. In this work, we explore the predictive performance of an online machine learning-based approach in classifying DRAM fault modes from two leadership-class supercomputing facilities. Our results compare the predictive performance of this online approach with the current rule-based approach based on expert knowledge, finding a 12% predictive performance improvement. We also investigate the universality of our classifiers by evaluating predictive performance using training data from disparate computing systems to achieve a 7% improvement in predictive performance. Our work provides a critical analysis of this online learning technique and can benefit system designers to help inform best practices for dealing with reliability on future systems.
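As a hedged illustration of what "online" classification means in this setting, the sketch below incrementally updates a linear classifier with scikit-learn's partial_fit on synthetic error-log features; the features, labels, and model are stand-ins, not the facilities' data or the paper's method.

```python
# Online DRAM fault-mode classification sketch on synthetic data: a stream of
# per-DIMM error-log windows summarized by simple count features, classified
# into invented fault modes (single-bit / row-dominated / column-dominated).
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(2)
classes = np.array([0, 1, 2])             # hypothetical fault-mode labels

def synthetic_batch(n=200):
    """Features: [distinct addresses, distinct columns, distinct rows] per window."""
    y = rng.integers(0, 3, n)
    centers = np.array([[1, 1, 1], [40, 1, 30], [40, 25, 1]], dtype=float)
    X = centers[y] + rng.normal(0, 2.0, (n, 3))
    return X, y

clf = SGDClassifier(random_state=0)
for _ in range(50):                        # stream of error-log windows
    X, y = synthetic_batch()
    clf.partial_fit(X, y, classes=classes)  # incremental (online) update

X_test, y_test = synthetic_batch(1000)
print("held-out accuracy:", round(clf.score(X_test, y_test), 3))
```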
Journal of Computational Physics
A critical aspect of applying modern computational solution methods to complex multiphysics systems of relevance to nuclear reactor modeling is the assessment of the predictive capability of specific proposed mathematical models. In this respect, the understanding of numerical error and of the sensitivity of the solution to parameters associated with input data, boundary condition uncertainty, and mathematical models is critical. Additionally, the ability to evaluate and/or approximate the model efficiently, to allow development of a reasonable level of statistical diagnostics of the mathematical model and the physical system, is of central importance. In this study we report on initial efforts to apply integrated adjoint-based computational analysis and automatic differentiation tools to begin to address these issues. The study is carried out in the context of a Reynolds-averaged Navier-Stokes approximation to turbulent fluid flow and heat transfer using a particular spatial discretization based on implicit fully-coupled stabilized finite element (FE) methods. Initial results are presented that show the promise of these computational techniques in the context of nuclear reactor relevant prototype thermal-hydraulics problems.
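For background (a generic relation, not the paper's specific RANS discretization), adjoint-based sensitivity analysis of this kind rests on the discrete adjoint identity for a steady residual and a scalar response:

```latex
% Generic discrete adjoint sensitivity relation: for a steady residual
% R(u, p) = 0 and a response J(u, p), the adjoint lambda yields all
% parameter sensitivities from a single additional linear solve.
\[
  \left(\frac{\partial R}{\partial u}\right)^{\!T} \lambda
      = \left(\frac{\partial J}{\partial u}\right)^{\!T},
  \qquad
  \frac{\mathrm{d}J}{\mathrm{d}p}
      = \frac{\partial J}{\partial p} - \lambda^{T}\frac{\partial R}{\partial p}
\]
```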
Proceedings of the National Academy of Sciences of the United States of America
Rewarding experiences are often well remembered, and such memory formation is known to be dependent on dopamine modulation of the neural substrates engaged in learning and memory; however, it is unknown how and where in the brain dopamine signals bias episodic memory toward preceding rather than subsequent events. Here we found that photostimulation of channelrhodopsin-2-expressing dopaminergic fibers in the dentate gyrus induced a long-term depression of cortical inputs, diminished theta oscillations, and impaired subsequent contextual learning. Computational modeling based on this dopamine modulation indicated an asymmetric association of events occurring before and after reward in memory tasks. In subsequent behavioral experiments, preexposure to a natural reward suppressed hippocampus-dependent memory formation, with an effective time window consistent with the duration of dopamine-induced changes of dentate activity. Overall, our results suggest a mechanism by which dopamine enables the hippocampus to encode memory with reduced interference from subsequent experience.
The next two Advanced Technology platforms for the ASC program will feature complex memory hierarchies. In the Trinity supercomputer being deployed in 2016, Intel's Knights Landing processors will feature 16 GB of on-package, high-bandwidth memory combined with a larger-capacity DDR4 memory. In 2018, the Sierra machine deployed at Lawrence Livermore National Laboratory will feature powerful compute nodes containing POWER9 processors with large-capacity memories and an array of coherent GPU accelerators, also with high-bandwidth memories.
Ocean Modelling
Newton–Krylov solvers for ocean tracers have the potential to greatly decrease the computational costs of spinning up deep-ocean tracers, which can take several thousand model years to reach equilibrium with surface processes. One version of the algorithm uses offline tracer transport matrices to simulate an annual cycle of tracer concentrations and applies Newton's method to find concentrations that are periodic in time. Here we present the impact of time-averaging the transport matrices on the equilibrium values of an ideal-age tracer. We compared annually-averaged, monthly-averaged, and 5-day-averaged transport matrices to an online simulation using the ocean component of the Community Earth System Model (CESM) with a nominal horizontal resolution of 1° × 1° and 60 vertical levels. We found that increasing the time resolution of the offline transport model reduced a low age bias from 12% for the annually-averaged transport matrices, to 4% for the monthly-averaged transport matrices, and to less than 2% for the transport matrices constructed from 5-day averages. The largest differences were in areas with strong seasonal changes in the circulation, such as the Northern Indian Ocean. For many applications the relatively small bias obtained using the offline model makes the offline approach attractive because it uses significantly less computer resources and is simpler to set up and run.
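A toy sketch of the fixed-point idea described above: find a tracer state x that is unchanged by one annual cycle of offline transport, i.e. solve F(x) = M(x) - x = 0 with a Newton-Krylov method. The "transport matrices" below are synthetic stand-ins, not CESM-derived operators.

```python
# Newton-Krylov spin-up sketch: twelve synthetic monthly mixing operators plus
# a small source term define one annual cycle; scipy's newton_krylov finds the
# periodic (equilibrium) tracer state directly instead of time-stepping for
# thousands of model years.
import numpy as np
from scipy.optimize import newton_krylov

rng = np.random.default_rng(3)
n, months = 200, 12

# Slightly sub-stochastic mixing matrices (mild loss mimics decay/ventilation),
# so the annual map is a contraction with a unique fixed point.
A = [np.eye(n) * 0.90 + 0.05 * rng.dirichlet(np.ones(n), size=n) for _ in range(months)]
source = rng.random((months, n)) * 0.01

def annual_cycle(x):
    """Propagate a tracer field through twelve monthly transport steps."""
    for m in range(months):
        x = A[m] @ x + source[m]
    return x

def residual(x):
    return annual_cycle(x) - x

x_equilibrium = newton_krylov(residual, np.zeros(n), f_tol=1e-9)
print("max |F(x)| at solution:", np.abs(residual(x_equilibrium)).max())
```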
Remote sensing systems have firmly established a role in providing immense value to commercial industry, scientific exploration, and national security. Continued maturation of sensing technology has reduced the cost of deploying highly capable sensors while at the same time increasing reliance on the information these sensors can provide. The demand for time on these sensors is unlikely to diminish. Coordination of next-generation sensor systems, larger constellations of satellites, unmanned aerial vehicles, ground telescopes, etc. is prohibitively complex for existing heuristics-based scheduling techniques. The project was a two-year collaboration spanning multiple Sandia centers and included a partnership with Texas A&M University. We have developed algorithms and software for collection scheduling, remote sensor field-of-view pointing models, and bandwidth-constrained prioritization of sensor data. Our approach followed best practices from the operations research and computational geometry communities. These models provide several advantages over state-of-the-art techniques. In particular, our approach is more flexible than heuristics that tightly couple models and solution techniques. First, our mixed-integer linear models afford a rigorous analysis so that sensor planners can quantitatively describe a schedule relative to the best possible. Optimal or near-optimal schedules can be produced with commercial solvers in operational run-times. These models can be modified and extended to incorporate different scheduling and resource constraints and objective function definitions. Further, we have extended these models to proactively schedule sensors under weather and ad hoc collection uncertainty. This approach stands in contrast to existing deterministic schedulers, which assume a single future weather or ad hoc collection scenario. The field-of-view pointing algorithm produces a mosaic with the fewest number of images required to fully cover a region of interest. The bandwidth-constrained algorithms find the highest-priority information that can be transmitted. All of these are based on mixed-integer linear programs so that, in the future, collection scheduling, field-of-view, and bandwidth prioritization can be combined into a single problem. Experiments conducted using the developed models, commercial solvers, and benchmark datasets have demonstrated that proactively scheduling against uncertainty regularly and significantly outperforms deterministic schedulers.
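A toy version of the kind of mixed-integer collection-scheduling model described above, written with the PuLP modeling library: assign collection requests to sensor time windows to maximize total priority. The requests, windows, and visibility pairs are invented for illustration; the project's actual models are far richer.

```python
# Toy collection-scheduling MILP: binary assignment of requests to visible
# sensor time windows, at most one collection per window and per request,
# maximizing total collected priority.
from pulp import LpProblem, LpMaximize, LpVariable, lpSum, LpBinary, value

requests = {"reqA": 5, "reqB": 3, "reqC": 4, "reqD": 2}   # request -> priority
windows = ["w1", "w2", "w3"]                               # sensor time windows
visible = {("reqA", "w1"), ("reqA", "w2"), ("reqB", "w2"),
           ("reqC", "w3"), ("reqD", "w1"), ("reqD", "w3")}

model = LpProblem("collection_scheduling", LpMaximize)
x = {(r, w): LpVariable(f"x_{r}_{w}", cat=LpBinary)
     for r in requests for w in windows if (r, w) in visible}

model += lpSum(requests[r] * x[r, w] for (r, w) in x)            # total priority
for w in windows:                                                # one collection per window
    model += lpSum(x[r, ww] for (r, ww) in x if ww == w) <= 1
for r in requests:                                               # each request collected at most once
    model += lpSum(x[rr, w] for (rr, w) in x if rr == r) <= 1

model.solve()
print("objective:", value(model.objective))
print("schedule :", [(r, w) for (r, w) in x if x[r, w].value() == 1])
```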
We assess how geospatial-temporal semantic graphs and our GeoGraphy code implementation might contribute to induced seismicity analysis. We focus on evaluating the strengths and weaknesses of both 1) the fundamental concept of semantic graphs and 2) our current code implementation. With extensions and research effort, the code implementation's limitations can be overcome. The paper also describes relevance, including possible data input types, expected analytical outcomes, and how the approach can pair with other methods and fit into a workflow.
Engineering decisions are often formulated as optimization problems such as the optimal design or control of physical systems. In these applications, the resulting optimization problems are constrained by large-scale simulations involving systems of partial differential equations (PDEs), ordinary differential equations (ODEs), and differential algebraic equations (DAEs). In addition, critical components of these systems are fraught with uncertainty, including unverifiable modeling assumptions, unknown boundary and initial conditions, and uncertain coefficients. Typically, these components are estimated using noisy and incomplete data from a variety of sources (e.g., physical experiments). The lack of knowledge of the true underlying probabilistic characterization of model inputs motivates the need for optimal solutions that are robust to this uncertainty. In this report, we introduce a framework for handling "distributional" uncertainties in the context of simulation-based optimization. This includes a novel measure discretization technique that will lead to an adaptive optimization algorithm tailored to exploit the structures inherent to simulation-based optimization.
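For background (a generic formulation, not the report's specific measure discretization), "distributional" uncertainty is typically posed as optimizing against the worst-case probability measure in an ambiguity set consistent with the available data:

```latex
% Generic distributionally robust simulation-constrained optimization:
% design z, uncertain inputs xi, ambiguity set A of candidate measures,
% and governing-equation constraint c(u, z, xi) = 0.
\[
  \min_{z \in Z} \;\; \sup_{P \in \mathcal{A}} \;
      \mathbb{E}_{P}\!\left[\, J\bigl(u(z,\xi),\, z,\, \xi\bigr) \,\right]
  \quad \text{subject to} \quad c\bigl(u(z,\xi),\, z,\, \xi\bigr) = 0,
\]
where $c(u,z,\xi)=0$ denotes the underlying PDE/ODE/DAE constraint.
```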
Parametric sensitivities of dynamic system responses are very useful in a variety of applications, including circuit optimization and uncertainty quantification. Sensitivity calculation methods fall into two related categories: direct and adjoint methods. Effective implementation of such methods in a production circuit simulator poses a number of technical challenges, including instrumentation of device models. This report documents several years of work developing and implementing direct and adjoint sensitivity methods in the Xyce circuit simulator. Much of this work was sponsored by the Laboratory Directed Research and Development (LDRD) Program at Sandia National Laboratories, under project LDRD 14-0788.
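For reference (standard textbook material in simplified ODE form, not Xyce's DAE formulation), the direct method integrates the forward sensitivity equations alongside the original system:

```latex
% Forward (direct) sensitivity equations for x' = f(x, p, t): differentiating
% in the parameter p gives a linear ODE for the sensitivity s = dx/dp.
\[
  \dot{x} = f(x, p, t), \qquad
  \dot{s} = \frac{\partial f}{\partial x}\, s + \frac{\partial f}{\partial p},
  \qquad s(t_0) = \frac{\partial x_0}{\partial p}
\]
```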
This report summarizes the methods and algorithms that were developed under the Sandia National Laboratories LDRD project entitled "Advanced Uncertainty Quantification Methods for Circuit Simulation", which was project # 173331 and proposal # 2016-0845. As much of our work has been published in other reports and publications, this report gives a brief summary. Those who are interested in the technical details are encouraged to read the full published results and to contact the report authors for the status of follow-on projects.
The XVis project brings together the key elements of research to enable scientific discovery at extreme scale. Scientific computing will no longer be purely about how fast computations can be performed. Energy constraints, processor changes, and I/O limitations necessitate significant changes in both the software applications used in scientific computation and the ways in which scientists use them. Components for modeling, simulation, analysis, and visualization must work together in a computational ecosystem, rather than working independently as they have in the past. This project provides the necessary research and infrastructure for scientific discovery in this new computational ecosystem by addressing four interlocking challenges: emerging processor technology, in situ integration, usability, and proxy analysis.
The Next Generation Global Atmosphere Model LDRD project developed a suite of atmosphere models: a shallow water model, an x-z hydrostatic model, and a 3D hydrostatic model, using Albany, a finite element code. Albany provides access to a large suite of leading-edge Sandia high-performance computing technologies enabled by Trilinos, Dakota, and Sierra. The next-generation capabilities most relevant to a global atmosphere model are performance portability and embedded uncertainty quantification (UQ). Performance portability is the capability for a single code base to run efficiently on a diverse set of advanced computing architectures, such as multi-core threading or GPUs. Embedded UQ refers to simulation algorithms that have been modified to aid in quantifying uncertainties. In our case, this means running multiple samples of an ensemble concurrently and reaping certain performance benefits. We demonstrate the effectiveness of these approaches here as a prelude to introducing them into ACME.
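For background (the standard equations, not the project's particular discretization or boundary treatment), the shallow water model at the base of this hierarchy solves the rotating shallow-water system, here in advective form over a flat bottom:

```latex
% Rotating shallow-water equations (advective form, flat bottom): h is fluid
% thickness, u the horizontal velocity, f the Coriolis parameter, g gravity.
\begin{align*}
  \frac{\partial h}{\partial t} + \nabla \cdot (h\,\mathbf{u}) &= 0, \\
  \frac{\partial \mathbf{u}}{\partial t}
    + (\mathbf{u}\cdot\nabla)\mathbf{u}
    + f\,\hat{\mathbf{k}}\times\mathbf{u} &= -\,g\,\nabla h
\end{align*}
```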
This report describes a new capability for hierarchical task-data parallelism using Sandia's Kokkos and Qthreads, and evaluation of this capability with sparse matrix Cholesky factorization and social network triangle enumeration mini-applications. Hierarchical task-data parallelism consists of a collection of tasks with executes-after dependences, where each task contains data-parallel operations performed on a team of hardware threads. The collection of tasks and dependences forms a directed acyclic graph of tasks - a task DAG. Major challenges of this research and development effort include: portability and performance across multicore CPU, manycore Intel Xeon Phi, and NVIDIA GPU architectures; scalability with respect to hardware concurrency and size of the task DAG; and usability of the application programmer interface (API).
Physical Review Fluids
The Rayleigh-Taylor instability (RTI) is investigated using the direct simulation Monte Carlo (DSMC) method of molecular gas dynamics. Here, fully resolved two-dimensional DSMC RTI simulations are performed to quantify the growth of flat and single-mode perturbed interfaces between two atmospheric-pressure monatomic gases as a function of the Atwood number and the gravitational acceleration. The DSMC simulations reproduce many qualitative features of the growth of the mixing layer and are in reasonable quantitative agreement with theoretical and empirical models in the linear, nonlinear, and self-similar regimes. In some of the simulations at late times, the instability enters the self-similar regime, in agreement with experimental observations. For the conditions simulated, diffusion can influence the initial instability growth significantly.
Applied Physics Letters
Enhancement-mode Si/SiGe electron quantum dots have been pursued extensively by many groups for their potential in quantum computing. Most of the reported dot designs utilize multiple metal-gate layers and use Si/SiGe heterostructures with Ge concentration close to 30%. Here, we report the fabrication and low-temperature characterization of quantum dots in the Si/Si0.8Ge0.2 heterostructures using only one metal-gate layer. We find that the threshold voltage of a channel narrower than 1 μm increases as the width decreases. The higher threshold can be attributed to the combination of quantum confinement and disorder. We also find that the lower Ge ratio used here leads to a narrower operational gate bias range. The higher threshold combined with the limited gate bias range constrains the device design of lithographic quantum dots. We incorporate such considerations in our device design and demonstrate a quantum dot that can be tuned from a single dot to a double dot. The device uses only a single metal-gate layer, greatly simplifying device design and fabrication.
Physical Review B
In both continuum hydrodynamics simulations and also multimillion atom reactive molecular dynamics simulations of shockwave propagation in single crystal pentaerythritol tetranitrate (PETN) containing a cylindrical void, we observed the formation of an initial radially symmetric hot spot. By extending the simulation time to the nanosecond scale, however, we observed the transformation of the small symmetric hot spot into a longitudinally asymmetric hot region extending over a much larger volume. Performing reactive molecular dynamics shock simulations using the reactive force field (ReaxFF) as implemented in the LAMMPS molecular dynamics package, we showed that the longitudinally asymmetric hot region was formed by coalescence of the primary radially symmetric hot spot with a secondary triangular hot zone. We showed that the triangular hot zone coincided with a double-shocked region where the primary planar shockwave was overtaken by a secondary cylindrical shockwave. The secondary cylindrical shockwave originated in void collapse after the primary planar shockwave had passed over the void. A similar phenomenon was observed in continuum hydrodynamics shock simulations using the CTH hydrodynamics package. The formation and growth of extended asymmetric hot regions on nanosecond timescales has important implications for shock initiation thresholds in energetic materials.
Philosophical Magazine (2003, Print)
Low-mobility twin grain boundaries dominate the microstructure of grain boundary-engineered materials and are critical to understanding their plastic deformation behaviour. The presence of solutes, such as hydrogen, has a profound effect on the thermodynamic stability of the grain boundaries. This work examines the case of a Σ3 grain boundary at inclinations from 0° ≤ Φ ≤ 90°. The angle Φ corresponds to the rotation of the Σ3 (1 1 1) < 1 1 0 > (coherent) into the Σ3 (1 1 2) < 1 1 0 > (lateral) twin boundary. To this end, atomistic models of inclined grain boundaries, utilising empirical potentials, are used to elucidate the finite-temperature boundary structure, while grand canonical Monte Carlo models are applied to determine the degree of hydrogen segregation. To understand the boundary structure and segregation behaviour of hydrogen, the structural unit description of inclined twin grain boundaries is found to provide insight into the observed variation of excess enthalpy and excess hydrogen concentration with inclination angle, but its explanatory power is limited by how the enthalpy of segregation is affected by hydrogen concentration. At higher concentrations, the grain boundaries undergo a defaceting transition. To develop a more complete mesoscale model of the interfacial behaviour, an analytical model of boundary energy and hydrogen segregation that relies on modelling the boundary as arrays of discrete 1/3 < 1 1 1 > disconnections is constructed. Lastly, the complex interaction of boundary reconstruction and concentration-dependent segregation behaviour exhibited by inclined twin grain boundaries limits the range of applicability of such an analytical model and illustrates the fundamental limitations of a structural unit model description of segregation in lower stacking fault energy materials.