Explicit Partitioned Methods for Coupled Problems Based on Lagrange Multipliers
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Nano Letters
Electron transport through a nanostructure can be characterized in part using concepts from classical fluid dynamics. Hence, it is natural to ask how far the analogy can be taken and whether the electron liquid can exhibit nonlinear dynamical effects such as turbulence. Here we present an ab initio study of the electron dynamics in nanojunctions, which reveals that the electron liquid indeed exhibits behavior quite similar to that of a classical fluid. In particular, we find that a transition from laminar to turbulent flow occurs with increasing current, corresponding to increasing Reynolds numbers. These findings reveal unexpected features of electron dynamics and shed new light on our understanding of the transport properties of nanoscale systems.
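For reference, the laminar-to-turbulent transition discussed above is conventionally characterized by the classical Reynolds number; a minimal statement of the standard definition follows. How density, velocity, length scale, and viscosity map onto electron-liquid quantities is not specified in the abstract and is left open here.

    % Classical Reynolds number: ratio of inertial to viscous forces.
    % rho = mass density, v = characteristic flow velocity,
    % L = characteristic length scale, mu = dynamic viscosity.
    \mathrm{Re} = \frac{\rho\, v\, L}{\mu}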
Physical Review Letters
We demonstrate an experimental implementation of robust phase estimation (RPE) to learn the phase of a single-qubit rotation on a trapped Yb+ ion qubit. We show this phase can be estimated with an uncertainty below 4×10⁻⁴ rad using as few as 176 total experimental samples, and our estimates exhibit Heisenberg scaling. Unlike standard phase estimation protocols, RPE neither assumes perfect state preparation and measurement, nor requires access to ancillae. We cross-validate the results of RPE with the more resource-intensive protocol of gate set tomography.
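As a rough illustration of the kind of update rule RPE-style protocols use, the sketch below refines a phase estimate from estimates of cos(kθ) and sin(kθ) at doubling multiplicities k, choosing at each step the branch consistent with the previous, coarser estimate. This is a simplified sketch of the general idea with noiseless inputs, not the exact protocol or error analysis used in the paper.

    // Simplified sketch of iterative phase refinement: at multiplicity k the
    // phase is known only modulo 2*pi/k, so each generation selects the branch
    // closest to the previous, coarser estimate. Inputs are assumed noiseless
    // here; an experiment would supply finite-sample estimates instead.
    #include <cmath>
    #include <cstdio>

    const double PI = 3.14159265358979323846;

    double refine(double prev_est, int k, double cos_k, double sin_k) {
        double base = std::atan2(sin_k, cos_k) / k;   // defined modulo 2*pi/k
        double period = 2.0 * PI / k;
        double n = std::round((prev_est - base) / period);
        return base + n * period;                     // branch nearest prev_est
    }

    int main() {
        const double theta = 0.731;                   // "true" phase, illustrative
        double est = 0.0;
        for (int k = 1; k <= 64; k *= 2) {
            est = refine(est, k, std::cos(k * theta), std::sin(k * theta));
            std::printf("k = %2d  estimate = %.6f\n", k, est);
        }
        return 0;
    }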
Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems
The skip list is an elegant dictionary data structure that is commonly deployed in RAM. A skip list with N elements supports searches, inserts, and deletes in O(log N) operations with high probability (w.h.p.) and range queries returning K elements in O(log N + K) operations w.h.p. A seemingly natural way to generalize the skip list to external memory with block size B is to "promote" with probability 1/B, rather than 1/2. However, there are practical and theoretical obstacles to getting the skip list to retain its efficient performance, space bounds, and high-probability guarantees. We give an external-memory skip list that achieves write-optimized bounds. That is, for 0 < ϵ < 1, range queries take O(log_{B^ϵ} N + K/B) I/Os w.h.p. and insertions and deletions take O((log_{B^ϵ} N)/B^{1-ϵ}) amortized I/Os w.h.p. Our write-optimized skip list inherits the virtue of simplicity from RAM skip lists. Moreover, it matches or beats the asymptotic bounds of prior write-optimized data structures such as B^ϵ-trees or LSM trees. These data structures are deployed in high-performance databases and file systems. The main technical challenge in proving our bounds comes from the fact that there are so few levels in the skip list, an aspect of the data structure that is essential to getting strong external-memory bounds. We use extremal-graph coloring to show that it is possible to decompose paths in the skip list into uncorrelated groups, regardless of the insertion/deletion pattern. Thus, we achieve our bounds by averaging over these uncorrelated paths rather than by averaging over uncorrelated levels, as in the standard skip list.
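To make the promotion rule concrete, the sketch below shows the level-assignment coin flip for the "promote with probability 1/B" generalization described above (with B = 2 recovering the RAM skip list). It illustrates only this one step, not the paper's write-optimized structure or its high-probability analysis.

    // Level assignment for an external-memory skip list insert: keep promoting
    // the new key while a biased coin with success probability 1/B comes up
    // heads. With B = 2 this is the classic RAM skip list rule.
    #include <random>

    int draw_level(std::mt19937_64 &rng, double B, int max_level) {
        std::bernoulli_distribution promote(1.0 / B);
        int level = 0;
        while (level < max_level && promote(rng)) ++level;
        return level;
    }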
Abstract not provided.
The XVis project brings together the key elements of research to enable scientific discovery at extreme scale. Scientific computing will no longer be purely about how fast computations can be performed. Energy constraints, processor changes, and I/O limitations necessitate significant changes in both the software applications used in scientific computation and the ways in which scientists use them. Components for modeling, simulation, analysis, and visualization must work together in a computational ecosystem, rather than working independently as they have in the past. This project provides the necessary research and infrastructure for scientific discovery in this new computational ecosystem by addressing four interlocking challenges: emerging processor technology, in situ integration, usability, and proxy analysis.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
While large-scale simulations have been the hallmark of the High Performance Computing (HPC) community for decades, Large Scale Data Analytics (LSDA) workloads are gaining attention within the scientific community not only as a processing component of large HPC simulations, but also as standalone scientific tools for knowledge discovery. On the path towards Exascale, new HPC runtime systems are also emerging in ways that differ from classical distributed computing models. However, system software for such capabilities on the latest extreme-scale DOE supercomputers needs to be enhanced to more appropriately support these emerging software ecosystems. In this paper, we propose the use of Virtual Clusters on advanced supercomputing resources to enable systems to support not only HPC workloads, but also emerging big data stacks. Specifically, we have deployed the KVM hypervisor within Cray's Compute Node Linux on an XC-series supercomputer testbed. We also use libvirt and QEMU to manage and provision VMs directly on compute nodes, leveraging Ethernet-over-Aries network emulation. To our knowledge, this is the first known use of KVM on a true MPP supercomputer. We investigate the overhead of our solution using HPC benchmarks, evaluating both single-node performance and weak scaling of a 32-node virtual cluster. Overall, we find that single-node performance of our solution using KVM on a Cray is very efficient, with near-native performance. However, overhead increases by up to 20% as virtual cluster size increases, due to limitations of the Ethernet-over-Aries bridged network. Furthermore, we deploy Apache Spark with large data analysis workloads in a Virtual Cluster, effectively demonstrating how diverse software ecosystems can be supported by High Performance Virtual Clusters.
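For readers unfamiliar with the libvirt workflow mentioned above, the sketch below shows the basic shape of provisioning a QEMU/KVM guest through the libvirt C API. The connection URI and the elided domain XML are illustrative placeholders, not the configuration used on the Cray testbed.

    // Minimal libvirt provisioning sketch: connect to the local QEMU/KVM
    // driver and boot a transient domain from an XML description.
    #include <libvirt/libvirt.h>
    #include <cstdio>

    int main() {
        virConnectPtr conn = virConnectOpen("qemu:///system");
        if (!conn) { std::fprintf(stderr, "failed to connect to hypervisor\n"); return 1; }

        const char *domain_xml = "...";  // placeholder: CPUs, memory, disk image, bridged NIC, etc.
        virDomainPtr dom = virDomainCreateXML(conn, domain_xml, 0);
        if (dom) virDomainFree(dom);
        else     std::fprintf(stderr, "failed to create domain\n");

        virConnectClose(conn);
        return 0;
    }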
Abstract not provided.
Abstract not provided.
This document is the main user guide for the Sierra/Percept capabilities, including the mesh_adapt and mesh_transfer tools. Basic capabilities for uniform mesh refinement (UMR) and mesh transfers are discussed, with examples provided for illustration. Future versions of this manual will include more advanced features such as geometry and mesh smoothing. Additionally, all the options for the mesh_adapt code will be described in detail, and capabilities for local adaptivity in the context of offline adaptivity will also be included.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
The USACM Thematic Workshop on Uncertainty Quantification and Data-Driven Modeling was held on March 23-24, 2017, in Austin, TX. The organizers of the technical program were James R. Stewart of Sandia National Laboratories and Krishna Garikipati of the University of Michigan. The administrative organizer was Ruth Hengst, who serves as Program Coordinator for the USACM. The organization of this workshop was coordinated through the USACM Technical Thrust Area on Uncertainty Quantification and Probabilistic Analysis. The workshop website (http://uqpm2017.usacm.org) includes the presentation agenda as well as links to several of the presentation slides (each of those speakers granted permission to post their presentation). This final report contains the complete workshop program, including the presentation agenda, the presentation abstracts, and the list of posters.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
SIAM/ASA Journal of Uncertainty Quantification
Abstract not provided.
Abstract not provided.
Abstract not provided.
Discrete and Continuous Dynamical Systems - Series B
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
bioRxiv
Abstract not provided.
R&D Magazine
Abstract not provided.
Abstract not provided.
Abstract not provided.
SIAM Journal on Scientific Computing
Quantifying simulation uncertainties is a critical component of rigorous predictive simulation. A key part of this is the forward propagation of uncertainties in simulation input data to output quantities of interest. Typical approaches involve repeated sampling of the simulation over the uncertain input data and can require numerous samples when accurately propagating uncertainties from large numbers of sources. Often the simulation process is similar from sample to sample, and much of the data generated in each sample evaluation could be reused. We explore a new method for implementing sampling methods that simultaneously propagates groups of samples together in an embedded fashion, which we call embedded ensemble propagation. We show how this approach takes advantage of properties of modern computer architectures to improve performance by enabling reuse between samples, reducing memory bandwidth requirements, improving memory access patterns, improving opportunities for fine-grained parallelization, and reducing communication costs. We describe a software technique for implementing embedded ensemble propagation based on the use of C++ templates and describe its integration with various scientific computing libraries within Trilinos. We demonstrate improved performance, portability, and scalability for the approach applied to the simulation of partial differential equations on a variety of CPU, GPU, and accelerator architectures, including up to 131,072 cores on a Cray XK7 (Titan).
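The sketch below illustrates the basic idea behind the C++-template approach described above: a fixed-size ensemble scalar type whose arithmetic operates on all samples at once, so a kernel written against a generic scalar type propagates the whole group in a single pass. This is a conceptual sketch only, not the Trilinos implementation from the paper.

    // Fixed-size ensemble "scalar": arithmetic applies elementwise to all
    // samples, so shared mesh/indexing work in a templated kernel is reused
    // across the group.
    #include <array>
    #include <cstddef>
    #include <cstdio>

    template <typename T, std::size_t N>
    struct Ensemble {
        std::array<T, N> v{};
        Ensemble() = default;
        explicit Ensemble(T x) { v.fill(x); }

        Ensemble &operator+=(const Ensemble &o) {
            for (std::size_t i = 0; i < N; ++i) v[i] += o.v[i];   // vectorizable loop
            return *this;
        }
        friend Ensemble operator*(const Ensemble &a, const Ensemble &b) {
            Ensemble r;
            for (std::size_t i = 0; i < N; ++i) r.v[i] = a.v[i] * b.v[i];
            return r;
        }
    };

    // A "simulation" kernel templated on the scalar type: the same code runs
    // on double (one sample) or Ensemble<double, 8> (eight samples at once).
    template <typename Scalar>
    Scalar kernel(const Scalar &diffusivity, const Scalar &gradient) {
        Scalar flux = diffusivity * gradient;
        flux += diffusivity * gradient;
        return flux;
    }

    int main() {
        Ensemble<double, 8> d(1.5), g(0.25);
        auto f = kernel(d, g);
        std::printf("first sample flux = %g\n", f.v[0]);
        return 0;
    }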
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
The growth of a cylindrical spark discharge channel in water and Lexan is studied using a series of one-dimensional simulations with the finite-element radiation-magnetohydrodynamics code ALEGRA. Computed solutions are analyzed in order to characterize the rate of growth and dynamics of the spark channels during the rising-current phase of the drive pulse. The current ramp rate is varied between 0.2 and 3.0 kA/ns, and values of the mechanical coupling coefficient K_p are extracted for each case. The simulations predict spark channel expansion velocities primarily in the range of 2000 to 3500 m/s, channel pressures primarily in the range of 10-40 GPa, and K_p values primarily between 1.1 and 1.4. When Lexan is preheated, slightly larger expansion velocities and smaller K_p values are predicted, but the overall behavior is unchanged.
CAD Computer Aided Design
We present an all-quad meshing algorithm for general domains. We start with a strongly balanced quadtree. In contrast to snapping the quadtree corners onto the geometric domain boundaries, we move them away from the geometry. Then we intersect the moved grid with the geometry. The resulting polygons are converted into quads with midpoint subdivision. Moving away avoids creating any flat angles, either at a quadtree corner or at a geometry–quadtree intersection. We are able to handle two-sided domains and more complex topologies than prior methods. The algorithm is provably correct and robust in practice. It is cleanup-free, meaning we have angle and edge length bounds without the use of any pillowing, swapping, or smoothing. Thus, our simple algorithm is fast and predictable. Compared with our prior version, this paper gives better quality bounds and demonstrates the algorithm on more complex domains.
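As a concrete illustration of the midpoint-subdivision step mentioned above, the sketch below splits an n-sided polygon into n quads by connecting its centroid to the midpoints of its edges. Only this step is shown; the quadtree construction, grid motion, and geometry intersection are not.

    // Midpoint subdivision of a simple polygon into quads: one quad per
    // polygon vertex, formed by the vertex, its two adjacent edge midpoints,
    // and the polygon centroid.
    #include <vector>
    #include <cstddef>

    struct Pt { double x, y; };
    struct Quad { Pt a, b, c, d; };

    std::vector<Quad> midpoint_subdivide(const std::vector<Pt> &poly) {
        const std::size_t n = poly.size();
        Pt c{0.0, 0.0};
        for (const Pt &p : poly) { c.x += p.x; c.y += p.y; }
        c.x /= n; c.y /= n;                                      // polygon centroid

        std::vector<Quad> quads;
        for (std::size_t i = 0; i < n; ++i) {
            const Pt &p = poly[i];
            const Pt &prev = poly[(i + n - 1) % n];
            const Pt &next = poly[(i + 1) % n];
            Pt m_prev{0.5 * (prev.x + p.x), 0.5 * (prev.y + p.y)};  // midpoint of incoming edge
            Pt m_next{0.5 * (p.x + next.x), 0.5 * (p.y + next.y)};  // midpoint of outgoing edge
            quads.push_back({m_prev, p, m_next, c});
        }
        return quads;
    }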
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
The FY17Q2 milestone of the ECP/VTK-m project, which is the first milestone, includes the completion of design documents for the introduction of virtual methods into the VTK-m framework: specifically, the ability, from within code running on a device (e.g., a GPU or Xeon Phi), to jump to a virtual method specified at run time. This change will enable us to drastically reduce the compile time and the executable code size for the VTK-m library. Our first design introduced the idea of adding virtual functions to classes that are used during algorithm execution. (Virtual methods were previously banned from the so-called execution environment.) The design was straightforward. VTK-m already has the generic concepts of an “array handle” that provides a uniform interface to memory of different structures and an “array portal” that provides generic access to said memory. These array handles and portals use C++ templating to adapt to different memory structures. This composition provides a powerful ability to adapt to data sources, but requires knowing static types. The proposed design creates a template specialization of an array portal that decorates another array handle while hiding its type. In this way we can wrap any type of static array handle and then feed it to a single compiled instance of a function. The second design focused on the mechanics of implementing virtual methods on parallel devices, with a focus on CUDA. Our initial experiments on CUDA showed a very large overhead for the standard approach of C++ classes with virtual methods. Instead, we are using an alternative method, borrowed from C, that uses function pointers. With the completion of this milestone, we are able to move to the implementation of objects with virtual(-like) methods. The upshot will be much faster compile times and much smaller library/executable sizes.
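For illustration, the sketch below shows the general function-pointer pattern described in the second design: a type-erased portal stores an opaque data pointer plus a plain function pointer for element access, so a single compiled function can consume many concrete storage types without vtables. Names and structure are illustrative, not the VTK-m source.

    // Type erasure via function pointers instead of virtual methods: wrap any
    // concrete storage behind an opaque pointer plus an accessor function.
    #include <vector>
    #include <cstdio>

    struct ErasedPortal {
        const void *data;
        double (*get)(const void *data, int i);   // function pointer, no vtable
    };

    // Adapter for one concrete storage type; other storages get their own adapter.
    template <typename Storage>
    ErasedPortal make_portal(const Storage &s) {
        return {&s, [](const void *d, int i) {
                        return static_cast<double>((*static_cast<const Storage *>(d))[i]);
                    }};
    }

    // A single compiled function can now consume any wrapped storage.
    double sum(const ErasedPortal &p, int n) {
        double total = 0.0;
        for (int i = 0; i < n; ++i) total += p.get(p.data, i);
        return total;
    }

    int main() {
        std::vector<float> a{1.0f, 2.0f, 3.0f};
        ErasedPortal p = make_portal(a);
        std::printf("sum = %g\n", sum(p, 3));
        return 0;
    }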
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
IEEE Spectrum
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Accurate and efficient constitutive modeling remains a cornerstone issue for solid mechanics analysis. Over the years, the LAME advanced material model library has grown to address this challenge by implementing models capable of describing material systems spanning soft polymers to stiff ceramics, including both isotropic and anisotropic responses. Inelastic behaviors including (visco)plasticity, damage, and fracture have all been incorporated for use in various analyses. This multitude of options and flexibility, however, comes at the cost of complexity in the resulting implementation. Therefore, to enhance confidence and enable the utilization of the LAME library in application, this effort seeks to document and verify the various models in the LAME library. Specifically, the broader strategy, organization, and interface of the library itself is first presented. The physical theory, numerical implementation, and user guide for a large set of models is then discussed. Importantly, a number of verification tests are performed with each model to not only build confidence in the model itself but also highlight some important response characteristics and features that may be of interest to end-users. Finally, looking ahead to the future, approaches to add material models to this library and further expand its capabilities are presented.
Journal of Parallel and Distributed Computing
A challenge in computer architecture is that processors often cannot be fed data from DRAM as fast as CPUs can consume it. Therefore, many applications are memory-bandwidth bound. With this motivation and the realization that traditional architectures (with all DRAM reachable only via bus) are insufficient to feed groups of modern processing units, vendors have introduced a variety of non-DDR 3D memory technologies (Hybrid Memory Cube (HMC), Wide I/O 2, High Bandwidth Memory (HBM)). These offer higher bandwidth and lower power by stacking DRAM chips on the processor or nearby on a silicon interposer. We will call these solutions “near-memory,” and if user-addressable, “scratchpad.” High-performance systems on the market now offer two levels of main memory: near-memory on package and traditional DRAM further away. In the near term we expect the latencies of near-memory and DRAM to be similar. Thus, it is natural to think of near-memory as another module on the DRAM level of the memory hierarchy. Vendors are expected to offer modes in which the near-memory is used as cache, but we believe that this will be inefficient. In this paper, we explore the design space for a user-controlled multi-level main memory. Our work identifies situations in which rewriting application kernels can provide significant performance gains when using near-memory. We present algorithms designed for two-level main memory, using divide-and-conquer to partition computations and streaming to exploit data locality. We consider algorithms for the fundamental application of sorting and for the data analysis kernel k-means. Our algorithms asymptotically reduce memory-block transfers under certain architectural parameter settings. We use and extend Sandia National Laboratories’ SST simulation capability to demonstrate the relationship between increased bandwidth and improved algorithmic performance. Memory access counts from simulations corroborate predicted performance improvements for our sorting algorithm. In contrast, the k-means algorithm is generally CPU bound and does not improve when using near-memory except under extreme conditions. These conditions require large instances that rule out SST simulation, but we demonstrate improvements by running on a customized machine with high- and low-bandwidth memory. These case studies in co-design serve as positive and cautionary templates, respectively, for the major task of optimizing the computational kernels of many fundamental applications for two-level main memory systems.
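The sketch below illustrates the divide-and-conquer pattern the abstract describes for sorting: sort runs sized to fit in the fast near-memory, then merge them while streaming sequentially through the slower DRAM. Capacities and placement are illustrative; an actual implementation would allocate the runs explicitly in the high-bandwidth memory and is not taken from the paper.

    // Two-level sort sketch: phase 1 sorts scratchpad-sized runs (assumed to
    // fit in near-memory), phase 2 merges runs pairwise with sequential,
    // streaming passes over DRAM. Assumes scratchpad_elems >= 1.
    #include <algorithm>
    #include <cstddef>
    #include <vector>

    void two_level_sort(std::vector<int> &data, std::size_t scratchpad_elems) {
        // Phase 1: sort each scratchpad-sized run independently.
        for (std::size_t i = 0; i < data.size(); i += scratchpad_elems) {
            auto end = data.begin() + std::min(i + scratchpad_elems, data.size());
            std::sort(data.begin() + i, end);
        }
        // Phase 2: bottom-up pairwise merges, doubling the run length each pass.
        for (std::size_t run = scratchpad_elems; run < data.size(); run *= 2) {
            for (std::size_t i = 0; i + run < data.size(); i += 2 * run) {
                auto mid = data.begin() + i + run;
                auto end = data.begin() + std::min(i + 2 * run, data.size());
                std::inplace_merge(data.begin() + i, mid, end);
            }
        }
    }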
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
This report presents a specification for the Portals 4 network programming interface. Portals 4 is intended to allow scalable, high-performance network communication between nodes of a parallel computing system. Portals 4 is well suited to massively parallel processing and embedded systems. Portals 4 represents an adaptation of the data movement layer developed for massively parallel processing platforms, such as the 4500-node Intel TeraFLOPS machine. Sandia's Cplant cluster project motivated the development of Version 3.0, which was later extended to Version 3.3 as part of the Cray Red Storm machine and XT line. Version 4 is targeted to the next generation of machines employing advanced network interface architectures that support enhanced offload capabilities.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
The purpose of this short contribution is to report on the development of a Spectral Neighbor Analysis Potential (SNAP) for tungsten. We have focused on the characterization of elastic and defect properties of the pure material in order to support molecular dynamics simulations of plasma-facing materials in fusion reactors. A parallel genetic algorithm approach was used to efficiently search for fitting parameters optimized against a large number of objective functions. In addition, we have shown that this many-body tungsten potential can be used in conjunction with a simple helium pair potential [1] to produce accurate defect formation energies for the W-He binary system.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
International Journal of Rock Mechanics and Mining Sciences
A non-local formulation of classical continuum mechanics theory known as peridynamics is used to study fracture initiation and growth from a wellbore penetrating the subsurface within the context of propellant-based stimulation. The principal objectives of this work are to analyze the influence of loading conditions on the resulting fracture pattern, to investigate the effect of in-situ stress anisotropy on fracture propagation, and to assess the suitability of peridynamics for modeling complex fracture formation. It is shown that the loading rate significantly influences the number and extent of fractures initiated from a borehole. Results show that low loading rates produce fewer but longer fractures, whereas high loading rates produce numerous shorter fractures around the borehole. The numerical method is able to predict fracture growth patterns over a wide range of loading and stress conditions. Our results also show that fracture growth is attenuated with increasing in-situ confining stress, and, in the case of confining stress anisotropy, fracture extensions are largest in the direction perpendicular to the minimum compressive stress. The results are in broad qualitative agreement with experimental and numerical studies found in the literature, suggesting that peridynamics can be a powerful tool in the study of complex fracture network formation.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Journal of Materials Processing Technology
An adage within the Additive Manufacturing (AM) community is that “complexity is free”. Complicated geometric features that normally drive manufacturing cost and limit design options are not typically problematic in AM. While geometric complexity is usually viewed from the perspective of part design, this advantage of AM also opens up new options in rapid, efficient material property evaluation and qualification. In the current work, an array of 100 miniature tensile bars are produced and tested for a comparable cost and in comparable time to a few conventional tensile bars. With this technique, it is possible to evaluate the stochastic nature of mechanical behavior. The current study focuses on stochastic yield strength, ultimate strength, and ductility as measured by strain at failure (elongation). However, this method can be used to capture the statistical nature of many mechanical properties including the full stress-strain constitutive response, elastic modulus, work hardening, and fracture toughness. Moreover, the technique could extend to strain-rate and temperature dependent behavior. As a proof of concept, the technique is demonstrated on a precipitation hardened stainless steel alloy, commonly known as 17-4PH, produced by two commercial AM vendors using a laser powder bed fusion process, also commonly known as selective laser melting. Using two different commercial powder bed platforms, the vendors produced material that exhibited slightly lower strength and markedly lower ductility compared to wrought sheet. Moreover, the properties were much less repeatable in the AM materials as analyzed in the context of a Weibull distribution, and the properties did not consistently meet minimum allowable requirements for the alloy as established by AMS. The diminished, stochastic properties were examined in the context of major contributing factors such as surface roughness and internal lack-of-fusion porosity. This high-throughput capability is expected to be useful for follow-on extensive parametric studies of factors that affect the statistical reliability of AM components.
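As one concrete way to carry out the Weibull analysis of repeatability mentioned above, the sketch below fits a two-parameter Weibull distribution to a set of strength values via the standard linearization ln(-ln(1-F)) = k ln x - k ln λ with median-rank plotting positions. The paper does not specify its exact fitting procedure, so this least-squares approach is an illustrative assumption.

    // Two-parameter Weibull fit by least squares on the linearized CDF.
    // Returns {shape k, scale lambda} for strength data x (all values > 0).
    #include <algorithm>
    #include <cmath>
    #include <cstddef>
    #include <utility>
    #include <vector>

    std::pair<double, double> fit_weibull(std::vector<double> x) {
        std::sort(x.begin(), x.end());
        const std::size_t n = x.size();
        double sx = 0, sy = 0, sxx = 0, sxy = 0;
        for (std::size_t i = 0; i < n; ++i) {
            double F = (i + 1 - 0.3) / (n + 0.4);             // median-rank estimate
            double X = std::log(x[i]);
            double Y = std::log(-std::log(1.0 - F));
            sx += X; sy += Y; sxx += X * X; sxy += X * Y;
        }
        double k = (n * sxy - sx * sy) / (n * sxx - sx * sx); // slope = shape parameter
        double b = (sy - k * sx) / n;                         // intercept = -k * ln(lambda)
        double lambda = std::exp(-b / k);                     // scale parameter
        return {k, lambda};
    }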
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Measuring and controlling the power and energy consumption of high performance computing systems from various components in the software stack is an active research area. Implementations in lower-level software layers are beginning to emerge in some production systems, which is very welcome. To be most effective, a portable interface to measurement and control features would significantly facilitate participation by all levels of the software stack. We present a proposal for a standard power Application Programming Interface (API) that endeavors to cover the entire software space, from generic hardware interfaces to the input from the computer facility manager.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.