DOE NNSA Vanguard Program
Abstract not provided.
Journal of Materials Science
There has recently been a great deal of interest in employing immiscible solutes to stabilize nanocrystalline microstructures. Existing modeling efforts largely rely on mesoscale Monte Carlo approaches that employ a simplified model of the microstructure and result in highly homogeneous segregation to grain boundaries. However, there is ample evidence from experimental and modeling studies demonstrating that segregation to grain boundaries is highly non-uniform and sensitive to boundary character. This work employs a realistic nanocrystalline microstructure with experimentally relevant global solute concentrations to illustrate inhomogeneous boundary segregation. Furthermore, experiments quantifying segregation in thin films are reported that corroborate the prediction that grain boundary segregation is highly inhomogeneous. In addition to grain boundary structure modifying the degree of segregation, the existence of a phase transformation between low and high solute content grain boundaries is predicted. To conduct this study, new embedded atom method interatomic potentials are developed for Pt, Au, and the PtAu binary alloy.
2017 IEEE High Performance Extreme Computing Conference, HPEC 2017
Triangle counting serves as a key building block for a set of important graph algorithms in network science. In this paper, we address the IEEE HPEC Static Graph Challenge problem of triangle counting, focusing on obtaining the best parallel performance on a single multicore node. Our implementation uses a linear algebra-based approach to triangle counting that has grown out of work related to our miniTri data analytics miniapplication [1] and our efforts to pose graph algorithms in the language of linear algebra. We leverage KokkosKernels to implement this approach efficiently on multicore architectures. Our performance results are competitive with the fastest known graph traversal-based approaches and are significantly faster than the Graph Challenge reference implementations, up to 670,000 times faster than the C++ reference and 10,000 times faster than the Python reference on a single Intel Haswell node.
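As a rough illustration of posing triangle counting in the language of linear algebra, the sketch below uses one common formulation: with L the strictly lower-triangular part of the adjacency matrix, the triangle count is the sum of the entries of (L·L) masked by L. This is an assumption about the general style of method, not the paper's KokkosKernels implementation; the function name `count_triangles` and the dense list-of-lists storage are illustrative only.

```python
def count_triangles(edges, n):
    """Count triangles in an undirected graph with vertices 0..n-1.

    Uses the masked product sum((L @ L) * L), where L is the strictly
    lower-triangular part of the adjacency matrix (dense, for clarity).
    """
    # Build L: L[i][j] = 1 when i > j and {i, j} is an edge.
    L = [[0] * n for _ in range(n)]
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        i, j = max(u, v), min(u, v)
        L[i][j] = 1

    total = 0
    for i in range(n):
        for j in range(i):
            if L[i][j]:  # the mask: only look where L itself is nonzero
                # (L @ L)[i][j] counts wedges i > k > j closed by edge (i, j).
                total += sum(L[i][k] * L[k][j] for k in range(j + 1, i))
    return total


if __name__ == "__main__":
    k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
    print(count_triangles(k4, 4))
```

Each triangle with vertices a > b > c is counted exactly once, at position (a, c) with intermediate vertex b, so no division by a symmetry factor is needed. A practical implementation would use sparse storage rather than dense rows.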
SIAM Journal on Scientific Computing
In order to achieve exascale systems, application resilience needs to be addressed. Some programming models, such as task-DAG (directed acyclic graphs) architectures, currently embed resilience features whereas traditional SPMD (single program, multiple data) and message-passing models do not. Since a large part of the community's code base follows the latter models, it is still required to take advantage of application characteristics to minimize the overheads of fault tolerance. To that end, this paper explores how recovering from hard process/node failures in a local manner is a natural approach for certain applications to obtain resilience at lower costs in faulty environments. In particular, this paper targets enabling online, semitransparent local recovery for stencil computations on current leadership-class systems as well as presents programming support and scalable runtime mechanisms. Also described and demonstrated in this paper is the effect of failure masking, which allows the effective reduction of impact on total time to solution due to multiple failures. Furthermore, we discuss, implement, and evaluate ghost region expansion and cell-to-rank remapping to increase the probability of failure masking. To conclude, this paper shows the integration of all aforementioned mechanisms with the S3D combustion simulation through an experimental demonstration (using the Titan system) of the ability to tolerate high failure rates (i.e., node failures every five seconds) with low overhead while sustaining performance at large scales. In addition, this demonstration also displays the failure masking probability increase resulting from the combination of both ghost region expansion and cell-to-rank remapping.
Computers and Mathematics with Applications
The discontinuous Petrov–Galerkin (DPG) methodology of Demkowicz and Gopalakrishnan (2010, 2011) guarantees the optimality of the solution in an energy norm, and provides several features facilitating adaptive schemes. A key question that has not yet been answered in general (though there are some results, e.g., for Poisson) is how best to precondition the DPG system matrix, so that iterative solvers may be used to allow solution of large-scale problems. In this paper, we detail a strategy for preconditioning the DPG system matrix using geometric multigrid, which we have implemented as part of Camellia (Roberts, 2014, 2016), and demonstrate through numerical experiments its effectiveness in the context of several variational formulations. We observe that in some of our experiments, the behavior of the preconditioner is closely tied to the discrete test space enrichment. We include experiments involving adaptive meshes with hanging nodes for lid-driven cavity flow, demonstrating that the preconditioners can be applied in the context of challenging problems. We also include a scalability study demonstrating that the approach, and our implementation, scales well to many MPI ranks.
Parallel Computing
Transient simulation in circuit simulation tools, such as SPICE and Xyce, depends on scalable and robust sparse LU factorizations for efficient numerical simulation of circuits and power grids. As the need for simulations of very large circuits grows, the prevalence of multicore architectures enables us to use shared memory parallel algorithms for such simulations. A parallel factorization is a critical component of such shared memory parallel simulations. We develop a parallel sparse factorization algorithm that can solve problems from circuit simulations efficiently and maps well to architectural features. This new factorization algorithm exposes hierarchical parallelism to accommodate the irregular structure that arises in our target problems. It also uses a hierarchical two-dimensional data layout, which reduces synchronization costs and maps to the memory hierarchy found in multicore processors. We present an OpenMP-based implementation of the parallel algorithm in a new multithreaded solver called Basker in the Trilinos framework. We present performance evaluations of Basker on the Intel SandyBridge and Xeon Phi platforms using circuit and power grid matrices taken from the University of Florida sparse matrix collection and from Xyce circuit simulation. Basker achieves a geometric mean speedup of 5.91× on CPU (16 cores) and 7.4× on Xeon Phi (32 cores) relative to the state-of-the-art solver KLU. Basker outperforms the Intel MKL Pardiso solver (PMKL) by as much as 30× on CPU (16 cores) and 7.5× on Xeon Phi (32 cores) for low fill-in circuit matrices. Furthermore, Basker provides a 5.4× speedup on a challenging matrix sequence taken from an actual Xyce simulation.
Journal of Geophysical Research: Solid Earth
Physics-based models of volcanic eruptions track conduit processes as functions of depth and time. When used in inversions, these models permit integration of diverse geological and geophysical data sets to constrain important parameters of magmatic systems. We develop a 1-D steady state conduit model for effusive eruptions including equilibrium crystallization and gas transport through the conduit and compare with the quasi-steady dome growth phase of Mount St. Helens in 2005. Viscosity increase resulting from pressure-dependent crystallization leads to a natural transition from viscous flow to frictional sliding on the conduit margin. Erupted mass flux depends strongly on wall rock and magma permeabilities due to their impact on magma density. Including both lateral and vertical gas transport reveals competing effects that produce nonmonotonic behavior in the mass flux when increasing magma permeability. Using this physics-based model in a Bayesian inversion, we link data sets from Mount St. Helens such as extrusion flux and earthquake depths with petrological data to estimate unknown model parameters, including magma chamber pressure and water content, magma permeability constants, conduit radius, and friction along the conduit walls. Even with this relatively simple model and limited data, we obtain improved constraints on important model parameters. We find that the magma chamber had low (<5 wt %) total volatiles and that the magma permeability scale is well constrained at ∼10^(−11.4) m^2 to reproduce observed dome rock porosities. Compared with previous results, higher magma overpressure and lower wall friction are required to compensate for increased viscous resistance while keeping extrusion rate at the observed value.
This report summarizes a NEAMS (Nuclear Energy Advanced Modeling and Simulation) project focused on developing a sampling capability that can handle the challenges of generating samples from nuclear cross-section data. The covariance information between energy groups tends to be very ill-conditioned and thus poses a problem for traditional methods of generating correlated samples. This report outlines a method that addresses sample generation from cross-section covariance matrices.
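To make the difficulty concrete, the sketch below shows one generic workaround for drawing correlated Gaussian samples when the covariance matrix is singular or nearly so: a Cholesky-style factorization that zeroes out columns whose pivot falls below a tolerance instead of failing. This is not the method outlined in the report, which is not described here; the names `semidefinite_cholesky` and `sample`, and the tolerance value, are assumptions made for illustration.

```python
import math
import random


def semidefinite_cholesky(cov, tol=1e-12):
    """Return lower-triangular L with L @ L.T ~= cov, tolerating zero pivots."""
    n = len(cov)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        pivot = cov[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if pivot <= tol:
            # No independent variance left in this direction:
            # leave column j as zeros instead of taking sqrt of ~0 or <0.
            continue
        L[j][j] = math.sqrt(pivot)
        for i in range(j + 1, n):
            s = cov[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = s / L[j][j]
    return L


def sample(cov, rng):
    """Draw one zero-mean Gaussian sample with (approximately) covariance cov."""
    L = semidefinite_cholesky(cov)
    n = len(cov)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [sum(L[i][k] * z[k] for k in range(n)) for i in range(n)]


if __name__ == "__main__":
    # Two perfectly correlated groups: a plain Cholesky would hit a zero pivot.
    cov = [[1.0, 1.0], [1.0, 1.0]]
    print(sample(cov, random.Random(0)))
```

For the perfectly correlated example, both components of every sample are identical, which is the behavior the singular covariance demands. Real cross-section covariances may call for pivoting or an eigenvalue-based factorization instead.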
Continuous or regularly scheduled monitoring has the potential to quickly identify changes in the environment. However, even with low-cost sensors, only a limited number of sensors can be deployed. The physical placement of these sensors, along with the sensor technology and operating conditions, can have a large impact on the performance of a monitoring strategy. Chama is an open source Python package which includes mixed-integer, stochastic programming formulations to determine sensor locations and technology that maximize monitoring effectiveness. The methods in Chama are general and can be applied to a wide range of applications. Chama is currently being used to design sensor networks to monitor airborne pollutants and to monitor water quality in water distribution systems. The following documentation includes installation instructions and examples, description of software features, and software license. The software is intended to be used by regulatory agencies, industry, and the research community. It is assumed that the reader is familiar with the Python Programming Language. References are included for additional background on software components. Online documentation, hosted at http://chama.readthedocs.io/, will be updated as new features are added. The online version includes API documentation.
The XVis project brings together the key elements of research to enable scientific discovery at extreme scale. Scientific computing will no longer be purely about how fast computations can be performed. Energy constraints, processor changes, and I/O limitations necessitate significant changes in both the software applications used in scientific computation and the ways in which scientists use them. Components for modeling, simulation, analysis, and visualization must work together in a computational ecosystem, rather than working independently as they have in the past. This project provides the necessary research and infrastructure for scientific discovery in this new computational ecosystem by addressing four interlocking challenges: emerging processor technology, in situ integration, usability, and proxy analysis.
IEEE Transactions on Parallel and Distributed Systems
The cost of data movement has always been an important concern in high performance computing (HPC) systems. It has now become the dominant factor in terms of both energy consumption and performance. Support for expression of data locality has been explored in the past, but those efforts have had only modest success in being adopted in HPC applications for various reasons. However, with the increasing complexity of the memory hierarchy and higher parallelism in emerging HPC systems, locality management has acquired a new urgency. Developers can no longer limit themselves to low-level solutions and ignore the potential for productivity and performance portability obtained by using locality abstractions. Fortunately, the trend emerging in recent literature on the topic alleviates many of the concerns that got in the way of their adoption by application developers. Data locality abstractions are available in the forms of libraries, data structures, languages and runtime systems; a common theme is increasing productivity without sacrificing performance. This paper examines these trends and identifies commonalities that can combine various locality concepts to develop a comprehensive approach to expressing and managing data locality on future large-scale high-performance computing systems.
IEEE Transactions on Parallel and Distributed Systems
Obtaining multi-process hard failure resilience at the application level is a key challenge that must be overcome before the promise of exascale can be fully realized. Previous work has shown that online global recovery can dramatically reduce the overhead of failures when compared to the more traditional approach of terminating the job and restarting it from the last stored checkpoint. If online recovery is performed in a local manner, further scalability is enabled, not only due to the intrinsically lower costs of recovering locally, but also due to derived effects when using some application types. In this paper we model one such effect, namely multiple failure masking, that manifests when running stencil parallel computations in an environment where failures are recovered locally. First, the delay propagation shape of one or multiple failures recovered locally is modeled to enable several analyses of the probability of different levels of failure masking under certain stencil application behaviors. Our results indicate that failure masking is an extremely desirable effect at scale, whose manifestation is more evident and beneficial as the machine size or the failure rate increases.
The FY17Q4 milestone of the ECP/VTK-m project includes the completion of a key-reduce scheduling mechanism, a spatial division algorithm, an algorithm for basic particle advection, and the computation of smoothed surface normals. With the completion of this milestone, we are able to, respectively, more easily group like elements (a common visualization algorithm operation), provide the fundamentals for geometric search structures, provide the fundamentals for many flow visualization algorithms, and provide more realistic rendering of surfaces approximated with facets.
Physical Review Letters
Randomized benchmarking (RB) is widely used to measure an error rate of a set of quantum gates, by performing random circuits that would do nothing if the gates were perfect. In the limit of no finite-sampling error, the exponential decay rate of the observable survival probabilities, versus circuit length, yields a single error metric r. For Clifford gates with arbitrary small errors described by process matrices, r was believed to reliably correspond to the mean, over all Clifford gates, of the average gate infidelity between the imperfect gates and their ideal counterparts. We show that this quantity is not a well-defined property of a physical gate set. It depends on the representations used for the imperfect and ideal gates, and the variant typically computed in the literature can differ from r by orders of magnitude. We present new theories of the RB decay that are accurate for all small errors describable by process matrices, and show that the RB decay curve is a simple exponential for all such errors. These theories allow explicit computation of the error rate that RB measures (r), but as far as we can tell it does not correspond to the infidelity of a physically allowed (completely positive) representation of the imperfect gates.
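For orientation, the sketch below shows how the RB number r is conventionally extracted from a decay curve: survival probabilities are modeled as P(m) = A + B·p^m, and r = (d − 1)(1 − p)/d for a d-dimensional system. The fitting shortcut used here (assuming the asymptote A = 1/d is known and fitting a line to log(P − A)) is a simplifying assumption for noiseless illustrative data, not the analysis used in the paper; the function name `rb_error_rate` is hypothetical.

```python
import math


def rb_error_rate(lengths, survival, d=2, asymptote=None):
    """Estimate the RB number r from survival probabilities P(m) = A + B * p**m.

    Assumes the asymptote A is known (default 1/d) and every P(m) > A,
    so that log(P - A) is linear in m with slope log(p).
    """
    a = 1.0 / d if asymptote is None else asymptote
    xs = list(lengths)
    ys = [math.log(p - a) for p in survival]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Ordinary least-squares slope of log(P - A) versus circuit length.
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    p = math.exp(slope)
    return (d - 1) * (1.0 - p) / d


if __name__ == "__main__":
    lengths = [1, 2, 4, 8, 16, 32, 64]
    data = [0.5 + 0.5 * 0.98 ** m for m in lengths]  # synthetic single-qubit decay
    print(rb_error_rate(lengths, data))
```

With finite-sampling noise, a nonlinear fit of all three parameters (A, B, p) is the usual choice; the log-linear shortcut is only adequate when the asymptote is trustworthy.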
Journal of Loss Prevention in the Process Industries
Optimal design of gas detection systems is challenging because of the numerous sources of uncertainty, including weather and environmental conditions, leak location and characteristics, and process conditions. Rigorous CFD simulations of dispersion scenarios combined with stochastic programming techniques have been successfully applied to the problem of optimal gas detector placement; however, rigorous treatment of sensor failure and nonuniform unavailability has received less attention. To improve reliability of the design, this paper proposes a problem formulation that explicitly considers nonuniform unavailabilities and all backup detection levels. The resulting sensor placement problem is a large-scale mixed-integer nonlinear programming (MINLP) problem that requires a tailored solution approach for efficient solution. We have developed a multitree method which depends on iteratively solving a sequence of upper-bounding master problems and lower-bounding subproblems. The tailored global solution strategy is tested on a real data problem and the encouraging numerical results indicate that our solution framework is promising in solving sensor placement problems. This study was selected for the special issue in JLPPI from the 2016 International Symposium of the MKO Process Safety Center.
Proceedings - IEEE International Conference on Cluster Computing, ICCC
While large-scale simulations have been the hallmark of the High Performance Computing (HPC) community for decades, Large Scale Data Analytics (LSDA) workloads are gaining attention within the scientific community not only as a processing component to large HPC simulations, but also as standalone scientific tools for knowledge discovery. With the path towards Exascale, new HPC runtime systems are also emerging in a way that differs from classical distributed computing models. However, system software for such capabilities on the latest extreme-scale DOE supercomputing needs to be enhanced to more appropriately support these types of emerging software ecosystems. In this paper, we propose the use of Virtual Clusters on advanced supercomputing resources to enable systems to support not only HPC workloads, but also emerging big data stacks. Specifically, we have deployed the KVM hypervisor within Cray's Compute Node Linux on an XC-series supercomputer testbed. We also use libvirt and QEMU to manage and provision VMs directly on compute nodes, leveraging Ethernet-over-Aries network emulation. To our knowledge, this is the first known use of KVM on a true MPP supercomputer. We investigate the overhead of our solution using HPC benchmarks, evaluating both single-node performance and weak scaling of a 32-node virtual cluster. Overall, we find single node performance of our solution using KVM on a Cray is very efficient with near-native performance. However, overhead increases by up to 20% as virtual cluster size increases, due to limitations of the Ethernet-over-Aries bridged network. Furthermore, we deploy Apache Spark with large data analysis workloads in a Virtual Cluster, effectively demonstrating how diverse software ecosystems can be supported by High Performance Virtual Clusters.
Proceedings - IEEE International Conference on Cluster Computing, ICCC
Aggregating millions of hardware components to construct an exascale computing platform will pose significant resilience challenges. In addition to slowdowns associated with detected errors, silent errors are likely to further degrade application performance. Moreover, silent data corruption (SDC) has the potential to undermine the integrity of the results produced by important scientific applications. In this paper, we propose an application-independent mechanism to efficiently detect and correct SDC in read-mostly memory, where SDC may be most likely to occur. We use memory protection mechanisms to maintain compressed backups of application memory. We detect SDC by identifying changes in memory contents that occur without explicit write operations. We demonstrate that, for several applications, our approach can potentially protect a significant fraction of application memory pages from SDC with modest overheads. Moreover, our proposed technique can be straightforwardly combined with many other approaches to provide a significant bulwark against SDC.
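The detection idea in this abstract (keep a compressed backup, and treat any change that did not come through an explicit write as corruption) can be mimicked in user space as follows. This is only a toy model: the paper relies on memory protection mechanisms to observe writes, whereas here writes are funneled through a method call. The class name `GuardedPage` and its methods are hypothetical.

```python
import zlib


class GuardedPage:
    """A buffer with a compressed backup that is refreshed on explicit writes."""

    def __init__(self, data: bytes):
        self.data = bytearray(data)
        self._backup = zlib.compress(bytes(self.data))

    def write(self, offset: int, payload: bytes) -> None:
        """Explicit write: update the contents and refresh the backup."""
        if offset < 0 or offset + len(payload) > len(self.data):
            raise ValueError("write outside page bounds")
        self.data[offset:offset + len(payload)] = payload
        self._backup = zlib.compress(bytes(self.data))

    def scrub(self) -> bool:
        """Return True if silent corruption was found (and repaired)."""
        good = zlib.decompress(self._backup)
        if bytes(self.data) == good:
            return False
        # Contents changed without an explicit write: restore from backup.
        self.data[:] = good
        return True


if __name__ == "__main__":
    page = GuardedPage(bytes(64))
    page.write(0, b"abc")
    page.data[10] ^= 0x01  # simulate a bit flip that bypasses write()
    print(page.scrub(), bytes(page.data[:3]))
```

Refreshing the backup on every write is affordable only when writes are rare, which is why the approach targets read-mostly memory.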
Presented in this document is a small portion of the tests that exist in the Sierra/SolidMechanics (Sierra/SM) verification test suite. Most of these tests are run nightly with the Sierra/SM code suite, and the results of the tests are checked against the correct analytical result. For each of the tests presented in this document, the test setup, a description of the analytic solution, and a comparison of the Sierra/SM code results to the analytic solution are provided. Mesh convergence is also checked on a nightly basis for several of these tests. This document can be used to confirm that a given code capability is verified or referenced as a compilation of example problems. Additional example problems are provided in the Sierra/SM Example Problems Manual. Note that many other verification tests exist in the Sierra/SM test suite but have not yet been included in this manual.
The presentation documented the technical approach of the team and summary of the results with sufficient detail to demonstrate both the value and the completion of the milestone. A separate SAND report was also generated with more detail to supplement the presentation.
The overall goal of this work was to utilize the Advanced Power Management (APM) capabilities of the ATS-1 Trinity platform to understand the power usage behavior of ASC workloads running on Trinity and gain insight into the potential for utilizing power management techniques on future ASC platforms.
This report summarizes the work performed as part of a FY17 CSSE L2 milestone to investigate the power usage behavior of ASC workloads running on the ATS-1 Trinity platform. Techniques were developed to instrument application code regions of interest using the Power API together with the Kokkos profiling interface and Caliper annotation library. Experiments were performed to understand the power usage behavior of mini-applications and the SNL/ATDM SPARC application running on ATS-1 Trinity Haswell and Knights Landing compute nodes. A taxonomy of power measurement approaches was identified and presented, providing a guide for application developers to follow. Controlled scaling study experiments were performed on up to 2048 nodes of Trinity along with smaller scale experiments on Trinity testbed systems. Additionally, power and energy system monitoring information from Trinity was collected and archived for post analysis of "in-the-wild" workloads. Results were analyzed to assess the sensitivity of the workloads to ATS-1 compute node type (Haswell vs. Knights Landing), CPU frequency control, node-level power capping control, OpenMP configuration, Knights Landing on-package memory configuration, and algorithm/solver configuration. Overall, this milestone lays groundwork for addressing the long-term goal of determining how to best use and operate future ASC platforms to achieve the greatest benefit subject to a constrained power budget.
Sintering is a component fabrication process in which powder is compacted by pressing or some other means and then held at elevated temperature for a period of hours. The powder grains bond with each other, leading to the formation of a solid component with much lower porosity, and therefore higher density and higher strength, than the original powder compact. In this project, we investigated a new way of computationally modeling sintering at the length scale of grains. The model uses a high-fidelity, three-dimensional representation with a few hundred nodes per grain. The numerical model solves the peridynamic equations, in which nonlocal forces allow representation of the attraction, adhesion, and mass diffusion between grains. The deformation of the grains is represented through a viscoelastic material model. The project successfully demonstrated the use of this method to reproduce experimentally observed features of material behavior in sintering, including densification, the evolution of microstructure, and the occurrence of random defects in the sintered solid.
As high performance computing architectures pursue more computational power, there is a need for increased memory capacity and bandwidth as well. A multi-level memory (MLM) architecture addresses this need by combining multiple memory types with different characteristics as varying levels of the same architecture. How to efficiently utilize this memory infrastructure is an open challenge, and in this research we sought to investigate whether neural-inspired approaches can meaningfully help with memory management. In particular, we explored neurogenesis-inspired resource allocation, and were able to show that a neural-inspired mixed controller policy can beneficially impact how MLM architectures utilize memory.
This milestone is a tri-lab deliverable supporting ongoing Co-Design efforts impacting applications in the Integrated Codes (IC) program element and the Advanced Technology Development and Mitigation (ATDM) program element. In FY14, the tri-labs looked at porting proxy applications to technologies of interest for ATS procurements. In FY15, a milestone was completed evaluating proxy applications in multiple programming models, and in FY16, a milestone was completed focusing on the migration of lessons learned back into production code development. This year, the co-design milestone focuses on extracting the knowledge gained and/or code revisions back into production applications.
This report documents the ASC/ATDM Kokkos deliverable "Production Portable Dynamic Task DAG Capability." This capability enables applications to create and execute a dynamic task DAG: a collection of heterogeneous computational tasks with a directed acyclic graph (DAG) of "execute after" dependencies, where tasks and their dependencies are dynamically created and destroyed as tasks execute. The Kokkos task scheduler executes the dynamic task DAG on the target execution resource, e.g., a multicore CPU, a manycore CPU such as Intel's Knights Landing (KNL), or an NVIDIA GPU. Several major technical challenges had to be addressed during development of Kokkos' Task DAG capability: (1) portability to a GPU with its simplified hardware and micro-runtime, (2) thread-scalable memory allocation and deallocation from a bounded pool of memory, (3) a thread-scalable scheduler for the dynamic task DAG, and (4) usability by applications.
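The notion of a dynamic task DAG with "execute after" dependencies can be illustrated with a small single-threaded sketch in which a running task may spawn further tasks and dependencies. This is a conceptual model only and does not use or mirror the Kokkos API; the class `Scheduler`, its `spawn` and `run` methods, and the `demo` function are hypothetical names.

```python
from collections import deque


class Scheduler:
    """Single-threaded toy scheduler for a dynamically growing task DAG."""

    def __init__(self):
        self._ready = deque()   # task ids whose predecessors have all finished
        self._waiting = {}      # task id -> count of unfinished predecessors
        self._successors = {}   # task id -> ids of tasks that execute after it
        self._funcs = {}        # task id -> callable taking the scheduler
        self._done = set()
        self._next_id = 0

    def spawn(self, func, after=()):
        """Create a task that runs only after every task id in `after` finishes."""
        tid = self._next_id
        self._next_id += 1
        self._funcs[tid] = func
        pending = [dep for dep in after if dep not in self._done]
        for dep in pending:
            self._successors.setdefault(dep, []).append(tid)
        if pending:
            self._waiting[tid] = len(pending)
        else:
            self._ready.append(tid)
        return tid

    def run(self):
        """Execute tasks until none are ready; tasks may spawn more tasks."""
        while self._ready:
            tid = self._ready.popleft()
            self._funcs.pop(tid)(self)  # the task is destroyed once it has run
            self._done.add(tid)
            for succ in self._successors.pop(tid, []):
                self._waiting[succ] -= 1
                if self._waiting[succ] == 0:
                    del self._waiting[succ]
                    self._ready.append(succ)


def demo():
    log = []

    def task_a(sched):
        log.append("a")
        # Tasks and dependencies created while the DAG is already executing.
        c = sched.spawn(lambda s: log.append("c"))
        sched.spawn(lambda s: log.append("d"), after=[c])

    sched = Scheduler()
    a = sched.spawn(task_a)
    sched.spawn(lambda s: log.append("b"), after=[a])
    sched.run()
    return log


if __name__ == "__main__":
    print(demo())
```

A production scheduler additionally has to make the ready queue and the bookkeeping thread-scalable and draw task storage from a bounded memory pool, which are the challenges the report lists.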
This project was inspired by two needs. The first is a need for tools to help scientists and engineers to design effective data visualizations for communicating information, whether to the user of a system, an analyst who must make decisions based on complex data, or in the context of a technical report or publication. Most scientists and engineers are not trained in visualization design, and they could benefit from simple metrics to assess how well their visualization's design conveys the intended message. In other words, will the most important information draw the viewer's attention? The second is the need for cognition-based metrics for evaluating new types of visualizations created by researchers in the information visualization and visual analytics communities. Evaluating visualizations is difficult even for experts. However, all visualization methods and techniques are intended to exploit the properties of the human visual system to convey information efficiently to a viewer. Thus, developing evaluation methods that are rooted in the scientific knowledge of the human visual system could be a useful approach. In this project, we conducted fundamental research on how humans make sense of abstract data visualizations, and how this process is influenced by their goals and prior experience. We then used that research to develop a new model, the Data Visualization Saliency Model, that can make accurate predictions about which features in an abstract visualization will draw a viewer's attention. The model is an evaluation tool that can address both of the needs described above, supporting both visualization research and Sandia mission needs.
The SPARC (Sandia Parallel Aerodynamics and Reentry Code) will provide nuclear weapon qualification evidence for the random vibration and thermal environments created by re-entry of a warhead into the earth’s atmosphere. SPARC incorporates the innovative approaches of ATDM projects on several fronts including: effective harnessing of heterogeneous compute nodes using Kokkos, exascale-ready parallel scalability through asynchronous multi-tasking, uncertainty quantification through Sacado integration, implementation of state-of-the-art reentry physics and multiscale models, use of advanced verification and validation methods, and enabling of improved workflows for users. SPARC is being developed primarily for the Department of Energy nuclear weapon program, with additional development and use of the code supported by the Department of Defense for conventional weapons programs.
The heterogeneity in mechanical fields introduced by microstructure plays a critical role in the localization of deformation. To resolve this incipient stage of failure, it is therefore necessary to incorporate microstructure with sufficient resolution. On the other hand, computational limitations make it infeasible to represent the microstructure in the entire domain at the component scale. In this study, the authors demonstrate the use of concurrent multiscale modeling to incorporate explicit, finely resolved microstructure in a critical region while resolving the smoother mechanical fields outside this region with a coarser discretization to limit computational cost. The microstructural physics is modeled with a high-fidelity model that incorporates anisotropic crystal elasticity and rate-dependent crystal plasticity to simulate the behavior of a stainless steel alloy. The component-scale material behavior is treated with a lower fidelity model incorporating isotropic linear elasticity and rate-independent J2 plasticity. The microstructural and component scale subdomains are modeled concurrently, with coupling via the Schwarz alternating method, which solves boundary-value problems in each subdomain separately and transfers solution information between subdomains via Dirichlet boundary conditions. Beyond case studies in concurrent multiscale modeling, we explore progress in crystal plasticity through modular designs, solution methodologies, model verification, and extensions to Sierra/SM and manycore applications. Advances in conformal microstructures having both hexahedral and tetrahedral workflows in Sculpt and Cubit are highlighted. A structure-property case study in two-phase metallic composites applies the Materials Knowledge System to local metrics for void evolution. Discussion includes lessons learned, future work, and a summary of funded efforts and proposed work.
Finally, an appendix illustrates the need for two-way coupling through a single degree of freedom.
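The Schwarz alternating coupling described above can be illustrated on a toy problem. The sketch below is not the authors' implementation: it assumes a linear 1D model problem, -u'' = 1 on [0, 1] with u(0) = u(1) = 0, in place of the solid-mechanics models, and the function names, grid sizes, and overlap region [0.4, 0.6] are all hypothetical. A finely discretized subdomain and a coarsely discretized subdomain are solved in turn, each taking its interface Dirichlet value from the latest solution on the other.

```python
def solve_dirichlet(n, h, left, right, f=1.0):
    """Solve -u'' = f with second-order finite differences on n uniform
    intervals of width h, given Dirichlet end values (Thomas algorithm)."""
    m = n - 1                      # number of interior unknowns
    rhs = [f * h * h] * m
    rhs[0] += left                 # move known boundary values to the right-hand side
    rhs[-1] += right
    diag = [2.0] * m               # off-diagonal entries are all -1
    for i in range(1, m):          # forward elimination
        w = -1.0 / diag[i - 1]
        diag[i] -= w * (-1.0)
        rhs[i] -= w * rhs[i - 1]
    u = [0.0] * m
    u[-1] = rhs[-1] / diag[-1]
    for i in range(m - 2, -1, -1):  # back substitution
        u[i] = (rhs[i] + u[i + 1]) / diag[i]
    return [left] + u + [right]


def schwarz_1d(iterations=50):
    """Alternate between a fine subdomain [0, 0.6] and a coarse subdomain [0.4, 1]."""
    n_fine, h_fine = 12, 0.05      # fine grid: nodes at 0, 0.05, ..., 0.6
    n_coarse, h_coarse = 6, 0.1    # coarse grid: nodes at 0.4, 0.5, ..., 1.0
    u_fine = [0.0] * (n_fine + 1)
    u_coarse = [0.0] * (n_coarse + 1)
    for _ in range(iterations):
        # Fine solve: its right boundary x = 0.6 coincides with coarse node 2.
        u_fine = solve_dirichlet(n_fine, h_fine, 0.0, u_coarse[2])
        # Coarse solve: its left boundary x = 0.4 coincides with fine node 8.
        u_coarse = solve_dirichlet(n_coarse, h_coarse, u_fine[8], 0.0)
    return u_fine, u_coarse


u_fine, u_coarse = schwarz_1d()
```

The grids are chosen so that each interface point is a node of the other subdomain, which avoids interpolation; a general implementation would need a transfer operator between non-matching meshes.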
Abstract not provided.
The ability to simulate wireless networks at large scale for a meaningful amount of time is considerably lacking in today's network simulators. For this reason, many published works in this area limit their simulation studies to fewer than 1,000 nodes and either oversimplify channel characteristics or perform studies over time scales much shorter than a day. In this report, we show that one can overcome these limitations and study problems of high practical consequence. This work presents two key contributions to high-fidelity simulation of large-scale wireless networks: (a) wireless simulations can be sped up by more than 100X in runtime using ideas from spatial indexing algorithms and clipping of negligible signals, and (b) clustering and a task-oriented programming paradigm can be used to reduce inter-process communication in a parallel discrete event simulation, resulting in better scaling efficiency.
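As an illustration of contribution (a), the sketch below combines a uniform-grid spatial index with a hard cutoff distance beyond which signals are treated as negligible. It is only a minimal example of the general idea, not the report's algorithm: the function names, the 2D geometry, and the use of a fixed cutoff radius as the clipping rule are assumptions made here.

```python
import math
from collections import defaultdict


def build_grid(positions, cell):
    """Bucket node indices by the square grid cell (side length `cell`) they fall in."""
    grid = defaultdict(list)
    for idx, (x, y) in enumerate(positions):
        grid[(math.floor(x / cell), math.floor(y / cell))].append(idx)
    return grid


def receivers_in_range(positions, sender, cutoff, grid):
    """Return nodes within `cutoff` of `sender`, assuming `grid` was built with
    cell == cutoff. Only the 3x3 block of cells around the sender is scanned;
    every node outside it is farther than `cutoff` and is clipped."""
    sx, sy = positions[sender]
    cx, cy = math.floor(sx / cutoff), math.floor(sy / cutoff)
    found = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for idx in grid.get((cx + dx, cy + dy), ()):
                if idx == sender:
                    continue
                px, py = positions[idx]
                if (px - sx) ** 2 + (py - sy) ** 2 <= cutoff ** 2:
                    found.append(idx)
    return sorted(found)
```

With the grid in place, each transmission touches only nearby nodes instead of all nodes, so the cost per transmission depends on local density rather than on network size.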
Abstract not provided.
Abstract not provided.
IEEE Spectrum
For more than 50 years, computers have made steady and dramatic improvements, all thanks to Moore’s Law—the exponential increase over time in the number of transistors that can be fabricated on an integrated circuit of a given size. Moore’s Law owed its success to the fact that as transistors were made smaller, they became simultaneously cheaper, faster, and more energy efficient. The payoff from this win-win-win scenario enabled reinvestment in semiconductor fabrication technology that could make even smaller, more densely-packed transistors. And so this virtuous cycle continued, decade after decade. Now though, experts in industry, academia, and government laboratories anticipate that semiconductor miniaturization won’t continue much longer—maybe 10 years or so, at best. Making transistors smaller no longer yields the improvements it used to. The physical characteristics of small transistors forced clock speeds to cease getting faster more than a decade ago, which drove the industry to start building chips with multiple cores. But even multi-core architectures must contend with increasing amounts of “dark silicon,” areas of the chip that must be powered off to avoid overheating.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
This LDRD project was developed around the ambitious goal of applying PDE-constrained optimization approaches to design Z-machine components whose performance is governed by electromagnetic and plasma models. This report documents the results of this LDRD project. Our differentiating approach was to use topology optimization methods developed for structural design and extend them for application to electromagnetic systems pertinent to the Z-machine. To achieve this objective, a suite of optimization algorithms was implemented in the ROL library, part of the Trilinos framework. These methods were applied to standalone demonstration problems and the Drekar multi-physics research application. Out of this exploration, a new augmented Lagrangian approach to structural design problems was developed. We demonstrate that this approach has favorable mesh-independent performance: both the final design and the algorithmic performance were independent of the size of the mesh. In addition, topology optimization formulations for the design of conducting networks were developed and demonstrated. Of note, this formulation was used to develop a design for the inner magnetically insulated transmission line on the Z-machine. The resulting electromagnetic device is compared with theoretically postulated designs.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
International Journal of Impact Engineering
Most previous development of the peridynamic theory has assumed a Lagrangian formulation, in which the material model refers to an undeformed reference configuration. In the present work, an Eulerian form of material modeling is developed, in which bond forces depend only on the positions of material points in the deformed configuration. The formulation is consistent with the thermodynamic form of the peridynamic model and is derivable from a suitable expression for the free energy of a material. It is shown that the resulting formulation of peridynamic material models can be used to simulate strong shock waves and fluid response in which very large deformations make the Lagrangian form unsuitable. The Eulerian capability is demonstrated in numerical simulations of ejecta from a wavy free surface on a metal subjected to strong shock wave loading. The Eulerian and Lagrangian contributions to bond force can be combined in a single material model, allowing strength and fracture under tensile or shear loading to be modeled consistently with high compressive stresses. This capability is demonstrated in numerical simulation of bird strike against an aircraft, in which both tensile fracture and high pressure response are important.
Abstract not provided.
Computational Methods in Applied Mathematics
In this paper, a nonlocal convection-diffusion model is introduced for the master equation of Markov jump processes in bounded domains. With minimal assumptions on the model parameters, the nonlocal steady and unsteady state master equations are shown to be well-posed in a weak sense. Finally, the nonlocal operator is shown to be the generator of finite-range nonsymmetric jump processes and, when certain conditions on the model parameters hold, the generators of finite and infinite activity Lévy and Lévy-type jump processes are shown to be special instances of the nonlocal operator.
Proceedings - 47th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops, DSN-W 2017
Current practice for mitigating DRAM hardware faults is to simply discard the entire faulty DIMM. However, this becomes increasingly expensive and wasteful as the price of memory hardware increases and moves physically closer to processing units. Accurately characterizing memory faults in real-time in order to pre-empt future potentially catastrophic failures is crucial to conserving resources by blacklisting small affected regions of memory rather than discarding an entire hardware component. We further evaluate and extend a machine learning method for DRAM fault characterization introduced in prior work by Baseman et al. at Los Alamos National Laboratory. We report on the usefulness of a variety of training sets, using a set of production-relevant metrics to evaluate the method on data from a leadership-class supercomputing facility. We observe an increase in percent of faults successfully mitigated as well as a decrease in percent of wasted blacklisted pages, regardless of training set, when using the learned algorithm as compared to a human-expert, deterministic, and rule-based approach.
IEEE Transactions on Visualization and Computer Graphics
Evaluating the effectiveness of data visualizations is a challenging undertaking and often relies on one-off studies that test a visualization in the context of one specific task. Researchers across the fields of data science, visualization, and human-computer interaction are calling for foundational tools and principles that could be applied to assessing the effectiveness of data visualizations in a more rapid and generalizable manner. One possibility for such a tool is a model of visual saliency for data visualizations. Visual saliency models are typically based on the properties of the human visual cortex and predict which areas of a scene have visual features (e.g. color, luminance, edges) that are likely to draw a viewer's attention. While these models can accurately predict where viewers will look in a natural scene, they typically do not perform well for abstract data visualizations. In this paper, we discuss the reasons for the poor performance of existing saliency models when applied to data visualizations. We introduce the Data Visualization Saliency (DVS) model, a saliency model tailored to address some of these weaknesses, and we test the performance of the DVS model and existing saliency models by comparing the saliency maps produced by the models to eye tracking data obtained from human viewers. In conclusion, we describe how modified saliency models could be used as general tools for assessing the effectiveness of visualizations, including the strengths and weaknesses of this approach.
Modelling and Simulation in Materials Science and Engineering
Welding is one of the most widespread processes used in metal joining. However, there are currently no open-source software implementations for the simulation of microstructural evolution during a weld pass. Here we describe a Potts Monte Carlo based model implemented in the SPPARKS kinetic Monte Carlo computational framework. The model simulates melting, solidification and solid-state microstructural evolution of material in the fusion and heat-affected zones of a weld. The model does not simulate thermal behavior, but rather utilizes user input parameters to specify weld pool and heat-affected zone properties. Weld pool shapes are specified by Bézier curves, which allow for the specification of a wide range of pool shapes. Pool shapes can range from narrow and deep to wide and shallow, representing different fluid flow conditions within the pool. Surrounding temperature gradients are calculated with the aid of a closest point projection algorithm. The model also allows simulation of pulsed power welding through time-dependent variation of the weld pool size. Example simulation results and comparisons with laboratory weld observations demonstrate microstructural variation with weld speed, pool shape, and pulsed power.
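To show why Bézier curves are a convenient way to describe a family of pool shapes, the sketch below evaluates a point on a Bézier curve with de Casteljau's algorithm. It is a generic illustration only: the function name and the control points are hypothetical, and the way SPPARKS actually parameterizes weld pools is not stated in the abstract.

```python
def bezier_point(control, t):
    """Evaluate a Bezier curve at parameter t in [0, 1] by de Casteljau's
    algorithm: repeatedly interpolate between neighboring control points."""
    pts = [tuple(p) for p in control]
    while len(pts) > 1:
        pts = [
            tuple((1 - t) * a + t * b for a, b in zip(p, q))
            for p, q in zip(pts, pts[1:])
        ]
    return pts[0]


# Hypothetical cubic outline: moving the two inner control points changes the
# curve between a narrow, deep profile and a wide, shallow one.
outline = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0)]
boundary = [bezier_point(outline, i / 20) for i in range(21)]
```

A handful of control points is enough to describe the whole boundary, which is what makes the shape easy to expose as user input.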
We present the development of a parallel Markov Chain Monte Carlo (MCMC) method called SAChES, Scalable Adaptive Chain-Ensemble Sampling. This capability is targeted to Bayesian calibration of computationally expensive simulation models. SAChES involves a hybrid of two methods: Differential Evolution Monte Carlo followed by Adaptive Metropolis. Both methods involve parallel chains. Differential evolution allows one to explore high-dimensional parameter spaces using loosely coupled (i.e., largely asynchronous) chains. Loose coupling allows the use of large chain ensembles, with far more chains than the number of parameters to explore. This reduces per-chain sampling burden and enables high-dimensional inversions and the use of computationally expensive forward models. The large number of chains can also ameliorate the impact of silent errors, which may affect only a few chains. The chain ensemble can also be sampled to provide an initial condition when an aberrant chain is re-spawned. Adaptive Metropolis takes the best points from the differential evolution and efficiently homes in on the posterior density. The multitude of chains in SAChES is leveraged to (1) enable efficient exploration of the parameter space; and (2) ensure robustness to silent errors which may be unavoidable in extreme-scale computational platforms of the future. This report outlines SAChES, describes four papers that are the result of the project, and discusses some additional results.
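The differential evolution stage can be sketched with the standard Differential Evolution Monte Carlo update, in which each chain proposes a move along the difference of two other randomly chosen chains and accepts it with a Metropolis test. The code below is a serial, synchronous toy and not SAChES itself: the Gaussian target, the function names, and all parameter values are assumptions, and the asynchronous parallelism and the Adaptive Metropolis stage are not shown.

```python
import math
import random


def log_target(x):
    # Hypothetical posterior: independent unit-variance Gaussians centered at (1, -2).
    return -0.5 * ((x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2)


def de_mc(log_p, n_chains=10, dim=2, generations=2000, seed=0):
    """Differential Evolution Monte Carlo on an ensemble of chains.
    Returns the pooled states from the second half of the run."""
    rng = random.Random(seed)
    gamma = 2.38 / math.sqrt(2 * dim)   # commonly used jump scale for DE-MC
    chains = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_chains)]
    logp = [log_p(x) for x in chains]
    samples = []
    for g in range(generations):
        for i in range(n_chains):
            # Pick two distinct other chains and jump along their difference.
            a, b = rng.sample([j for j in range(n_chains) if j != i], 2)
            prop = [
                chains[i][k] + gamma * (chains[a][k] - chains[b][k]) + rng.gauss(0.0, 1e-3)
                for k in range(dim)
            ]
            lp = log_p(prop)
            # Metropolis acceptance test on the proposed state.
            if rng.random() < math.exp(min(0.0, lp - logp[i])):
                chains[i], logp[i] = prop, lp
        if g >= generations // 2:       # discard the first half as burn-in
            samples.extend(list(x) for x in chains)
    return samples


samples = de_mc(log_target)
```

Because the proposal for one chain needs only the current states of two others, chains do not have to advance in lockstep, which is the property the abstract refers to as loose coupling.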
Advanced Engineering Materials
Additive manufacturing enables the rapid, cost-effective production of customized structural components. To fully capitalize on the agility of additive manufacturing, it is necessary to develop complementary high-throughput materials evaluation techniques. In this study, over 1000 nominally identical tensile tests are used to explore the effect of process variability on the mechanical property distributions of a precipitation hardened stainless steel produced by a laser powder bed fusion process, also known as direct metal laser sintering or selective laser melting. With this large dataset, rare defects are revealed that affect only ≈2% of the population, stemming from a single build lot of material. The rare defects cause a substantial loss in ductility and are associated with an interconnected network of porosity. The adoption of streamlined test methods will be paramount to diagnosing and mitigating such dangerous anomalies in future structural components.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
LAMMPS is a classical molecular dynamics code (lammps.sandia.gov) used to model materials science problems at Sandia National Laboratories and around the world. LAMMPS was one of three Sandia codes selected to participate in the Trinity KNL (TR2) Open Science period. During this period, three different problems of interest were investigated using LAMMPS. The first was benchmarking KNL performance using different force field models. The second was simulating void collapse in shocked HNS energetic material using an all-atom model. The third was simulating shock propagation through poly-crystalline RDX energetic material using a coarse-grain model, the results of which were used in an ACM Gordon Bell Prize submission. This report describes the results of these simulations, lessons learned, and some hardware issues found on Trinity KNL as part of this work.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Journal of Geophysical Research: Solid Earth
The geodetically derived interseismic moment deficit rate (MDR) provides a first-order constraint on earthquake potential and can play an important role in seismic hazard assessment, but quantifying uncertainty in MDR is a challenging problem that has not been fully addressed. We establish criteria for reliable MDR estimators, evaluate existing methods for determining the probability density of MDR, and propose and evaluate new methods. Geodetic measurements moderately far from the fault provide tighter constraints on MDR than those nearby. Previously used methods can fail catastrophically under predictable circumstances. The bootstrap method works well with strong data constraints on MDR, but can be strongly biased when network geometry is poor. We propose two new methods: the Constrained Optimization Bounding Estimator (COBE) assumes uniform priors on slip rate (from geologic information) and MDR, and can be shown through synthetic tests to be a useful, albeit conservative estimator; the Constrained Optimization Bounding Linear Estimator (COBLE) is the corresponding linear estimator with Gaussian priors rather than point-wise bounds on slip rates. COBE matches COBLE with strong data constraints on MDR. We compare results from COBE and COBLE to previously published results for the interseismic MDR at Parkfield, on the San Andreas Fault, and find similar results; thus, the apparent discrepancy between MDR and the total moment release (seismic and afterslip) in the 2004 Parkfield earthquake remains.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.