Fault tolerance is a major challenge for many current and future extreme-scale systems, with many studies showing it to be the key limiter to application scalability. While a number of studies have investigated the performance of various resilience mechanisms, these are typically limited to scales orders of magnitude smaller than expected for next-generation systems and to simple benchmark problems. In this paper we show how, with very minor changes, a previously published and validated simulation framework for investigating the impact of OS noise on application performance can be used to simulate the overheads of various resilience mechanisms at scale. Using this framework, we compare the failure-free performance of the simulator against an analytic model to validate its accuracy, and demonstrate its ability to simulate the performance of two popular rollback-recovery methods on traces from real applications.
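The analytic model used for validation above is not reproduced in this abstract. As a point of reference only, a commonly used first-order model for coordinated checkpoint/restart overhead is Young's approximation of the optimal checkpoint interval; the sketch below (all function names invented, and not necessarily the model the paper uses) computes it and a rough machine efficiency, ignoring restart time and failures during checkpointing.

```python
import math

def young_checkpoint_interval(checkpoint_cost_s, mtbf_s):
    """Young's first-order approximation of the optimal checkpoint
    interval: tau = sqrt(2 * delta * MTBF), where delta is the time to
    write one checkpoint and MTBF is the system mean time between failures."""
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)

def expected_efficiency(interval_s, checkpoint_cost_s, mtbf_s):
    """Rough fraction of machine time spent on useful work: subtract the
    checkpointing fraction and the expected lost-work fraction (on average,
    half an interval is redone after each failure). Restart cost is ignored."""
    checkpoint_overhead = checkpoint_cost_s / interval_s
    rework_overhead = interval_s / (2.0 * mtbf_s)
    return 1.0 - checkpoint_overhead - rework_overhead

# Example: 5-minute checkpoints on a machine with a one-day system MTBF.
tau = young_checkpoint_interval(300.0, 86400.0)   # 7200 s, i.e. every 2 hours
eff = expected_efficiency(tau, 300.0, 86400.0)    # roughly 0.92
```

As the MTBF shrinks at larger scales, both overhead terms grow, which is one way to see why resilience becomes the key limiter to scalability.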
This report presents a specification for the Portals 4.0 network programming interface. Portals 4.0 is intended to allow scalable, high-performance network communication between the nodes of a parallel computing system, and is well suited to massively parallel processing and embedded systems. Portals 4.0 represents an adaptation of the data movement layer developed for massively parallel processing platforms, such as the 4500-node Intel TeraFLOPS machine. Sandia's Cplant cluster project motivated the development of Version 3.0, which was later extended to Version 3.3 as part of the Cray Red Storm machine and XT line. Version 4.0 is targeted at the next generation of machines employing advanced network interface architectures that support enhanced offload capabilities.
Exascale supercomputing will embody many revolutionary changes in the hardware and software of high-performance computing. A particularly pressing issue is gaining insight into the science behind exascale computations. Power and I/O speed constraints will fundamentally change current visualization and analysis workflows. A traditional post-processing workflow involves storing simulation results to disk and later retrieving them for visualization and data analysis. At exascale, however, scientists and analysts will need a range of options for moving data to persistent storage, because current offline, post-processing pipelines will not be able to capture the data necessary for analysis of these extreme-scale simulations. This milestone report explores and compares two alternative workflows, characterized as in situ and in transit. We find each to have its own merits and faults, and we provide information to help pick the best option for a particular use.
The finite-element shock hydrodynamics code ALEGRA has recently been upgraded to include a 2D X-FEM implementation for simulating impact, sliding, and release between materials in the Eulerian frame. For validation, this report considers the problem of long-rod penetration into semi-infinite targets at impact velocities of 500 to 3000 m/s. We describe simulations performed with and without the X-FEM capability, in order to verify that X-FEM recovers the good results obtained with the standard ALEGRA formulation. The X-FEM results for depth of penetration differ from previously measured experimental data by less than 2%, and from the standard-formulation results by less than 1%; they converge monotonically at first order under mesh refinement. Sensitivities to domain size and to the rear boundary condition are investigated and shown to be small. Aside from some simulation stability issues, X-FEM is found to produce good results for this classical impact and penetration problem.
Probabilistic Risk Assessment (PRA) is a fundamental part of safety and quality assurance for nuclear power and nuclear weapons. Traditional PRA very effectively models complex hardware system risks using binary probabilistic models. However, traditional PRA models are not flexible enough to accommodate non-binary soft-causal factors, such as digital instrumentation and control, passive components, aging, common-cause failure, and human error. Bayesian networks offer the opportunity to incorporate these risks into the PRA framework. This report describes the results of an early-career LDRD project titled "Use of Limited Data to Construct Bayesian Networks for Probabilistic Risk Assessment". The goal of the work was to establish the capability to develop Bayesian networks from sparse data, and to demonstrate this capability by producing a data-informed Bayesian network for use in Human Reliability Analysis (HRA) as part of nuclear power plant PRA. This report summarizes the research goal and the major products of the research.
This report presents numerical tables summarizing the properties of intrinsic defects in indium arsenide (InAs), as computed with density functional theory using semi-local density functionals. The tables are intended as reference data for a defect-physics package in device models.
The state of the art in failure modeling enables assessment of crack nucleation, propagation, and progression to fragmentation due to high-velocity impact. Vulnerability assessments suggest a need to track material behavior through failure, to the point of fragmentation and beyond. This field of research is particularly challenging for structures made of porous quasi-brittle materials, such as the ceramics used in modern armor systems, because of the complex material response when loading exceeds the quasi-brittle material's elastic limit. Further complications arise when incorporating the quasi-brittle material response in multi-material Eulerian hydrocode simulations. This report describes recent efforts to couple a ceramic material's response in the post-failure regime with an Eulerian hydrocode. Material behavior is modeled with the Kayenta material model [2], with Alegra as the host finite-element code [14]. Kayenta, a three-invariant phenomenological plasticity model originally developed for modeling the stress response of geologic materials, has in recent years been used with some success to model the response of ceramic and other quasi-brittle materials to high-velocity impact. Due to the granular nature of ceramic materials, Kayenta allows significant pressures to develop through dilatant plastic flow, even in shear-dominated loading where traditional equations of state predict little or no pressure response. When a material's ability to carry further load is compromised, Kayenta allows the material's strength and stiffness to degrade progressively through the evolution of damage, to the point of material failure. As material dilatation and damage progress, accommodations are made within Alegra to treat the evolving state in a consistent manner.
This document summarizes the results of a Level 3 milestone study within the CASL VUQ effort. We compare the adjoint-based a posteriori error estimation approach with a recent variant of a data-centric verification technique. We provide a brief overview of each technique and then discuss their relative advantages and disadvantages. We use Drekar::CFD to produce numerical results for steady-state Navier-Stokes and SARANS approximations.
This paper examines task mapping algorithms for non-contiguously allocated parallel jobs. Several studies have shown that task placement affects job running time for both contiguously and non-contiguously allocated jobs. Traditionally, work on task mapping either uses a very general model where the job has an arbitrary communication pattern or assumes that jobs are allocated contiguously, making them completely isolated from each other. A middle ground between these two cases is the mapping problem for non-contiguous jobs having a specific communication pattern. We propose several task mapping algorithms for jobs with a stencil communication pattern and evaluate them using experiments and simulations. Our strategies improve the running time of a MiniApp by as much as 30% over a baseline strategy. Furthermore, this improvement increases markedly with the job size, demonstrating the importance of task mapping as systems grow toward exascale.
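The paper's mapping strategies are not reproduced in this abstract. As a toy illustration of why placement matters for stencil jobs (all names and the node model below are invented for this sketch), the following compares two ways of assigning the tasks of a 2D 5-point stencil to multicore nodes: packing consecutive row-major ranks onto each node, versus giving each node a contiguous 2x2 tile of the task grid. The metric is the number of stencil edges that cross node boundaries, a crude stand-in for off-node communication volume.

```python
from itertools import product

W = H = 8           # 8x8 grid of tasks with a 5-point stencil
TASKS_PER_NODE = 4  # four tasks per node (an assumption of this toy model)

def stencil_edges(w, h):
    """Yield each nearest-neighbor pair of a w x h 5-point stencil once."""
    for x, y in product(range(w), range(h)):
        if x + 1 < w:
            yield (x, y), (x + 1, y)
        if y + 1 < h:
            yield (x, y), (x, y + 1)

def off_node_edges(node_of):
    """Count stencil edges whose endpoints land on different nodes."""
    return sum(node_of(*a) != node_of(*b) for a, b in stencil_edges(W, H))

# Baseline: tasks ranked row-major, four consecutive ranks packed per node.
row_major = lambda x, y: (y * W + x) // TASKS_PER_NODE
# Tiled: each node holds a contiguous 2x2 tile of the task grid.
tiled = lambda x, y: (y // 2) * (W // 2) + (x // 2)

print(off_node_edges(row_major))  # 64: every vertical edge leaves the node
print(off_node_edges(tiled))      # 48: 25% fewer off-node edges
```

Even this tiny example shows a placement-dependent difference in off-node traffic, and the gap widens with the grid dimensions, consistent with the abstract's observation that the benefit of mapping grows with job size.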