Multi-jagged: A Scalable Multi-section based Spatial Partitioning Algorithm
The High Performance Linpack (HPL), or Top 500, benchmark [1] is the most widely recognized and discussed metric for ranking high performance computing systems. However, HPL is increasingly unreliable as a true measure of system performance for a growing collection of important science and engineering applications. In this paper we describe a new high performance conjugate gradient (HPCG) benchmark. HPCG is composed of computations and data access patterns more commonly found in applications. Using HPCG we strive for a better correlation to real scientific application performance and expect to drive computer system design and implementation in directions that will better impact performance improvement.
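To make concrete the kind of computation HPCG emphasizes, the following is a minimal, illustrative conjugate gradient solve in Python. It is not the HPCG reference implementation; its kernels (sparse matrix-vector products, dot products, and vector updates) simply exhibit the memory-bandwidth-bound access patterns the benchmark is built around, and the 1-D Poisson-like test matrix is an assumption made only for this example.

```python
# Illustrative sketch (not the HPCG reference code): a plain conjugate
# gradient solve whose kernels -- sparse matrix-vector products, vector
# updates, and dot products -- are the memory-bound patterns HPCG stresses.
import numpy as np
import scipy.sparse as sp

def cg(A, b, tol=1e-8, max_iter=500):
    x = np.zeros_like(b)
    r = b - A @ x          # residual
    p = r.copy()           # search direction
    rs_old = r @ r
    for _ in range(max_iter):
        Ap = A @ p         # sparse matrix-vector product (dominant kernel)
        alpha = rs_old / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

# Small example: a 1-D Poisson-like symmetric positive definite system.
n = 1000
A = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)
x = cg(A, b)
```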
In the following paper, we discuss how to design an ensemble of experiments through the use of compressed sensing. Specifically, we show how to conduct a small number of physical experiments and then use compressed sensing to reconstruct a larger set of data. In order to accomplish this, we organize our results into four sections. We begin by extending the theory of compressed sensing to a finite product of Hilbert spaces. Then, we show how these results apply to experiment design. Next, we develop an efficient reconstruction algorithm that allows us to reconstruct experimental data projected onto a finite element basis. Finally, we verify our approach with two computational experiments.
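As a rough illustration of the reconstruction step described above, the sketch below recovers a sparse vector from a small number of random linear measurements using iterative soft thresholding (ISTA). It is a generic compressed-sensing example, not the finite-element-basis reconstruction algorithm developed in the paper; the dimensions, measurement matrix, and regularization parameter are all illustrative assumptions.

```python
# Minimal compressed-sensing sketch: recover a sparse coefficient vector from
# a small number of random linear measurements using ISTA (iterative soft
# thresholding). Illustrates the reconstruction idea only.
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 256, 64, 8            # signal length, measurements, sparsity
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)

A = rng.standard_normal((m, n)) / np.sqrt(m)   # measurement matrix
y = A @ x_true                                  # simulated "experiments"

def ista(A, y, lam=0.01, iters=2000):
    L = np.linalg.norm(A, 2) ** 2               # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        g = A.T @ (A @ x - y)                   # gradient of 0.5*||Ax - y||^2
        z = x - g / L
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return x

x_hat = ista(A, y)
print("relative error:", np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```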
Journal of Physics: Condensed Matter
Physical Review Letters
Physical Review X
The Trilinos Project is an effort to facilitate the design, development, integration, and ongoing support of mathematical software libraries within an object-oriented framework. It is intended for large-scale, complex multiphysics engineering and scientific applications [2, 4, 3]. Epetra is one of its basic packages; it provides serial and parallel linear algebra capabilities. Before Trilinos version 11.0, released in 2012, Epetra used the C++ int data type for storing global and local indices for degrees of freedom (DOFs). Since int is typically 32 bits, this limited the largest problem size to approximately two billion DOFs, even on distributed-memory machines that could otherwise handle larger problems. We have added optional support for the C++ long long data type, which is at least 64 bits wide, for global indices. To save memory, maintain the speed of memory-bound operations, and limit further changes to the code, local indices remain 32-bit. We document the changes required to achieve this feature and how the new functionality can be used. We also report on the lessons learned in modifying a mature and popular package, from several perspectives: design goals, backward compatibility, engineering decisions, C++ language features, effects on existing users and other packages, and build integration.
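The sketch below (plain Python, not Epetra code) merely illustrates the indexing arithmetic behind the change: signed 32-bit global IDs overflow beyond roughly 2.1 billion DOFs, whereas 64-bit global IDs combined with 32-bit local, per-process IDs do not, because each process stores only a modest number of local indices. The process count and problem size are illustrative assumptions.

```python
# Illustration (not Epetra code) of the indexing limits described above.
import numpy as np

int32_max = np.iinfo(np.int32).max            # 2,147,483,647
num_global_dofs = 5_000_000_000               # > 2^31 - 1: needs 64-bit global IDs
num_procs = 4096
dofs_per_proc = num_global_dofs // num_procs  # ~1.2M, fits easily in 32 bits

print("32-bit limit:", int32_max)
print("global IDs fit in int32?", num_global_dofs - 1 <= int32_max)   # False
print("local IDs fit in int32? ", dofs_per_proc - 1 <= int32_max)     # True

# Each process keeps a 64-bit list of the global IDs it owns and addresses
# its own data with compact 32-bit local IDs.
my_rank = 17
my_global_ids = np.arange(my_rank * dofs_per_proc,
                          (my_rank + 1) * dofs_per_proc, dtype=np.int64)
my_local_ids = np.arange(dofs_per_proc, dtype=np.int32)
```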
The Trilinos Project is an effort to develop algorithms and enabling technologies within an object-oriented software framework for the solution of large-scale, complex multi-physics engineering and scientific problems. A new software capability is introduced into Trilinos as a package. A Trilinos package is an integral unit and, although there are exceptions such as utility packages, each package is typically developed by a small team of experts in a particular algorithmic area, such as algebraic preconditioners or nonlinear solvers. The Trilinos Developers SQE Guide is a resource for Trilinos package developers who are working under Advanced Simulation and Computing (ASC) and are therefore subject to the ASC Software Quality Engineering Practices as described in the Sandia National Laboratories Advanced Simulation and Computing (ASC) Software Quality Plan: ASC Software Quality Engineering Practices Version 3.0 document [1]. The Trilinos Developer Policies webpage [2] contains detailed information that is essential for all Trilinos developers. The Trilinos Software Lifecycle Model [3] defines the default lifecycle model for Trilinos packages and provides a context for many of the practices listed in this document.
Increased HPC capability comes with increased complexity, part counts, and fault occurrences. Increasing the resilience of systems and applications to faults is a critical requirement facing the viability of exascale systems, as the overhead of traditional checkpoint/restart is projected to outweigh its benefits due to fault rates outpacing I/O bandwidths. As faults occur and propagate throughout hardware and software layers, pervasive notification and handling mechanisms are necessary. This report describes an initial investigation of fault types and programming interfaces to mitigate them. Proof-of-concept APIs are presented for the frequent and important cases of memory errors and node failures, and a strategy is proposed for filesystem failures. These involve changes to the operating system, runtime, I/O library, and application layers. While a single API for fault handling across hardware, OS, and applications system-wide remains elusive, the effort increased our understanding of both the mountainous challenges and the promising trailheads.
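To indicate the shape such interfaces might take, the sketch below shows a hypothetical fault-notification registry in Python; every name in it (FaultKind, register_handler, raise_fault) is invented for illustration and is not one of the proof-of-concept APIs defined in the report.

```python
# Hypothetical sketch of a fault-notification interface of the kind the report
# investigates. All names are invented for illustration.
from enum import Enum, auto

class FaultKind(Enum):
    MEMORY_ERROR = auto()   # e.g., uncorrectable ECC error in a page
    NODE_FAILURE = auto()   # e.g., a peer node stops responding

_handlers = {kind: [] for kind in FaultKind}

def register_handler(kind, handler):
    """Application or runtime registers a callback for a class of faults."""
    _handlers[kind].append(handler)

def raise_fault(kind, info):
    """Lower layer (OS/runtime) notifies all registered handlers."""
    for handler in _handlers[kind]:
        handler(info)

# Example: an application chooses to restore a lost page from a checkpoint
# instead of aborting the whole job.
def on_memory_error(info):
    print(f"recovering page {info['page']:#x} from checkpoint instead of aborting")

register_handler(FaultKind.MEMORY_ERROR, on_memory_error)
raise_fault(FaultKind.MEMORY_ERROR, {"page": 0x7F3A})
```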
This report summarizes the result of a NEAMS project focused on sensitivity analysis of a new model for the fission gas behavior (release and swelling) in the BISON fuel performance code of Idaho National Laboratory. Using the new model in BISON, the sensitivity of the calculated fission gas release and swelling to the involved parameters and the associated uncertainties is investigated. The study results in a quantitative assessment of the role of intrinsic uncertainties in the analysis of fission gas behavior in nuclear fuel.
This report aims to unify several approaches for building stable projection-based reduced order models (ROMs). Attention is focused on linear time-invariant (LTI) systems. The model reduction procedure consists of two steps: the computation of a reduced basis, and the projection of the governing partial differential equations (PDEs) onto this reduced basis. Two kinds of reduced bases are considered: the proper orthogonal decomposition (POD) basis and the balanced truncation basis. The projection step of the model reduction can be done in two ways: via continuous projection or via discrete projection. First, an approach for building energy-stable Galerkin ROMs for linear hyperbolic or incompletely parabolic systems of PDEs using continuous projection is proposed. The idea is to apply to the set of PDEs a transformation induced by the Lyapunov function for the system, and to build the ROM in the transformed variables. The resulting ROM will be energy-stable for any choice of reduced basis. It is shown that, for many PDE systems, the desired transformation is induced by a special weighted L2 inner product, termed the "symmetry inner product". Attention is then turned to building energy-stable ROMs via discrete projection. A discrete counterpart of the continuous symmetry inner product, a weighted L2 inner product termed the "Lyapunov inner product", is derived. The weighting matrix that defines the Lyapunov inner product can be computed in a black-box fashion for a stable LTI system arising from the discretization of a system of PDEs in space. It is shown that a ROM constructed via discrete projection using the Lyapunov inner product will be energy-stable for any choice of reduced basis. Connections between the Lyapunov inner product and the inner product induced by the balanced truncation algorithm are made. Comparisons are also made between the symmetry inner product and the Lyapunov inner product. The performance of ROMs constructed using these inner products is evaluated on several benchmark test cases.
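A minimal numerical sketch of the discrete-projection idea, under the assumption that the weighting matrix of the Lyapunov inner product is obtained by solving A^T P + P A = -Q for a stable LTI operator A: for any reduced basis V, the Galerkin ROM built in the P-weighted inner product has a non-increasing energy x_r^T (V^T P V) x_r. The operator, basis, and sizes below are illustrative.

```python
# Sketch of the "Lyapunov inner product" construction for dx/dt = A x with A
# stable: solve A^T P + P A = -Q for P > 0, then project in the P-weighted
# inner product. The reduced energy is then non-increasing for ANY basis V.
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

rng = np.random.default_rng(1)
n, r = 50, 5

# A stable full-order operator (skew part plus a negative definite shift).
M = rng.standard_normal((n, n))
A = 0.5 * (M - M.T) - 2.0 * np.eye(n)

Q = np.eye(n)
# solve_continuous_lyapunov solves a X + X a^H = q, so pass a = A^T, q = -Q,
# which yields A^T P + P A = -Q with P symmetric positive definite.
P = solve_continuous_lyapunov(A.T, -Q)

V = np.linalg.qr(rng.standard_normal((n, r)))[0]   # arbitrary reduced basis

# Galerkin projection in the P inner product: (V^T P V) dx_r/dt = V^T P A V x_r
M_r = V.T @ P @ V
A_r = np.linalg.solve(M_r, V.T @ P @ A @ V)        # reduced operator

# Energy-stability check: the symmetric part of V^T (A^T P + P A) V is
# negative semi-definite (here it equals -V^T Q V).
S = V.T @ (A.T @ P + P @ A) @ V
print("max eigenvalue of symmetric part:", np.linalg.eigvalsh(0.5 * (S + S.T)).max())
```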
SIAM Journal on Optimization
International Journal for Uncertainty Quantification
Completion of the CASL L3 milestone THM.CFD.P6.03 provides a tabular material properties capability to the Hydra code. A tabular interpolation package used in Sandia codes was modified to support the needs of multi-phase solvers in Hydra. Use of the interface is described. The package was released to Hydra under a government use license. A dummy physics was created in Hydra to prototype use of the interpolation routines. Finally, a test using the dummy physics verifies the correct behavior of the interpolation for a test water table.
ACM Transactions on Parallel Computing
The estimation of fossil-fuel CO2 emissions (ffCO2) from limited ground-based and satellite measurements of CO2 concentrations will form a key component of the monitoring of treaties aimed at the abatement of greenhouse gas emissions. To that end, we construct a multiresolution spatial parametrization for fossil-fuel CO2 emissions (ffCO2), to be used in atmospheric inversions. Such a parametrization does not currently exist. The parametrization uses wavelets to accurately capture the multiscale, nonstationary nature of ffCO2 emissions and employs proxies of human habitation, e.g., images of lights at night and maps of built-up areas, to reduce the dimensionality of the multiresolution parametrization. The parametrization is used in a synthetic data inversion to test its suitability for use in atmospheric inverse problems. This linear inverse problem is predicated on observations of ffCO2 concentrations collected at measurement towers. We adapt a convex optimization technique, commonly used in the reconstruction of compressively sensed images, to perform sparse reconstruction of the time-variant ffCO2 emission field. We also borrow concepts from compressive sensing to impose boundary conditions, i.e., to limit ffCO2 emissions within an irregularly shaped region (the United States, in our case). We find that the optimization algorithm performs a data-driven sparsification of the spatial parametrization and retains only those wavelets whose weights can be estimated from the observations. Further, our method for the imposition of boundary conditions leads to an approximately tenfold computational saving over conventional means of doing so. We conclude with a discussion of the accuracy of the estimated emissions and the suitability of the spatial parametrization for use in inverse problems with a significant degree of regularization.
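The toy sketch below shows only the "sparsify in wavelet space" step: a gridded field is transformed with a one-level Haar wavelet transform and all but the largest coefficients are discarded. It is not the multiresolution parametrization, night-lights proxy, or inversion machinery described above; the synthetic field and the 5% retention threshold are illustrative assumptions.

```python
# Illustrative sketch: represent a gridded "emission field" in a single-level
# Haar wavelet basis and keep only the largest coefficients.
import numpy as np

def haar2d(f):
    """One level of the 2-D Haar transform (f must have even dimensions)."""
    a = 0.5 * (f[0::2, :] + f[1::2, :])   # averages along rows
    d = 0.5 * (f[0::2, :] - f[1::2, :])   # details along rows
    ll = 0.5 * (a[:, 0::2] + a[:, 1::2])
    lh = 0.5 * (a[:, 0::2] - a[:, 1::2])
    hl = 0.5 * (d[:, 0::2] + d[:, 1::2])
    hh = 0.5 * (d[:, 0::2] - d[:, 1::2])
    return ll, lh, hl, hh

def inverse_haar2d(ll, lh, hl, hh):
    a = np.empty((ll.shape[0], 2 * ll.shape[1]))
    d = np.empty_like(a)
    a[:, 0::2], a[:, 1::2] = ll + lh, ll - lh
    d[:, 0::2], d[:, 1::2] = hl + hh, hl - hh
    f = np.empty((2 * a.shape[0], a.shape[1]))
    f[0::2, :], f[1::2, :] = a + d, a - d
    return f

# Synthetic field: a few strong point sources on a weak background.
rng = np.random.default_rng(2)
field = 0.1 * rng.random((64, 64))
field[10, 12] = field[40, 50] = field[33, 7] = 5.0

coeffs = np.concatenate([c.ravel() for c in haar2d(field)])
cutoff = np.quantile(np.abs(coeffs), 0.95)       # keep the largest 5% of coefficients
kept = np.where(np.abs(coeffs) >= cutoff, coeffs, 0.0)

ll, lh, hl, hh = [c.reshape(32, 32) for c in np.split(kept, 4)]
approx = inverse_haar2d(ll, lh, hl, hh)
print("relative error with 5% of coefficients:",
      np.linalg.norm(approx - field) / np.linalg.norm(field))
```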
Fault tolerance is a major challenge for many current and future extreme-scale systems, with many studies showing it to be the key limiter to application scalability. While there are a number of studies investigating the performance of various resilience mechanisms, these are typically limited to scales orders of magnitude smaller than expected for next-generation systems and to simple benchmark problems. In this paper we show how, with very minor changes, a previously published and validated simulation framework for investigating the impact of OS noise on application performance can be used to simulate the overheads of various resilience mechanisms at scale. Using this framework, we compare the failure-free performance of the simulator against an analytic model to validate it and demonstrate its ability to simulate the performance of two popular rollback recovery methods on traces from real applications.
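For intuition about the overheads being simulated, the toy Monte Carlo below models coordinated checkpoint/restart under exponentially distributed failures and reports the overhead relative to failure-free execution. It is far simpler than the validated simulation framework discussed above, and all parameters (checkpoint cost, restart cost, MTBF, checkpoint interval) are illustrative assumptions.

```python
# Toy rollback-recovery model: wall-clock time to finish a fixed amount of
# work when failures arrive as a Poisson process and progress since the last
# checkpoint is lost on each failure.
import numpy as np

def run(work, interval, ckpt_cost, restart_cost, mtbf, rng):
    t, remaining = 0.0, work
    while remaining > 0:
        seg = min(interval, remaining)
        attempt = seg + (ckpt_cost if remaining > seg else 0.0)  # last segment skips the checkpoint
        fail_at = rng.exponential(mtbf)   # memoryless, so a fresh draw per attempt is valid
        if fail_at >= attempt:
            t += attempt
            remaining -= seg
        else:
            t += fail_at + restart_cost   # lose everything since the last checkpoint
    return t

rng = np.random.default_rng(3)
work, ckpt, restart, mtbf = 24 * 3600.0, 60.0, 120.0, 4 * 3600.0
for interval in (600.0, 1800.0, 7200.0):
    times = [run(work, interval, ckpt, restart, mtbf, rng) for _ in range(200)]
    overhead = np.mean(times) / work - 1.0
    print(f"checkpoint every {interval / 60:4.0f} min -> overhead {overhead:6.1%}")
```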
This report presents a specification for the Portals 4.0 network programming interface. Portals 4.0 is intended to allow scalable, high-performance network communication between nodes of a parallel computing system, and it is well suited to massively parallel processing and embedded systems. Portals 4.0 represents an adaptation of the data movement layer developed for massively parallel processing platforms, such as the 4500-node Intel TeraFLOPS machine. Sandia's Cplant cluster project motivated the development of Version 3.0, which was later extended to Version 3.3 as part of the Cray Red Storm machine and XT line. Version 4.0 is targeted at the next generation of machines employing advanced network interface architectures that support enhanced offload capabilities.
Journal of Applied Physics
In an effort to build a stronger microscopic foundation for radiation damage models in gallium arsenide (GaAs), the electronic properties of radiation-induced damage clusters are studied with atomistic simulations. Molecular dynamics simulations are used to access the time and length scales required for direct simulation of a collision cascade, and density functional theory simulations are used to calculate the electronic properties of isolated damaged clusters that are extracted from these cascades. To study the physical properties of clusters, we analyze the statistics of a randomly generated ensemble of damage clusters because no single cluster adequately represents this class of defects. The electronic properties of damage clusters are accurately described by a classical model of the electrical charging of a semiconducting sphere embedded in a uniform dielectric. The effective band gap of the cluster depends on the degree of internal structural damage, and the gap closes to form a metal in the high-damage limit. We estimate the Fermi level of this metallic state, which corresponds to high-energy amorphous GaAs, to be 0.46 ± 0.07 eV above the valence band edge of crystalline GaAs. © 2013 American Institute of Physics.
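One standard form of such a classical charging model, stated here as background since the paper may use a variant, gives the energy to add a single electron to a sphere of radius R embedded in a medium of relative permittivity ε_r as the charging energy associated with the sphere's self-capacitance:

```latex
% Standard charging energy of a conducting/semiconducting sphere of radius R
% in a uniform dielectric of relative permittivity \epsilon_r (the exact
% expression used in the paper may differ):
\[
  E_C \;=\; \frac{e^2}{2C} \;=\; \frac{e^2}{8\pi \epsilon_0 \epsilon_r R},
  \qquad C = 4\pi \epsilon_0 \epsilon_r R ,
\]
% so smaller clusters cost more energy to charge, shifting the levels
% accessible within the cluster's effective gap.
```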
Exascale supercomputing will embody many revolutionary changes in the hardware and software of high-performance computing. A particularly pressing issue is gaining insight into the science behind the exascale computations. Power and I/O speed constraints will fundamentally change current visualization and analysis workflows. A traditional post-processing workflow involves storing simulation results to disk and later retrieving them for visualization and data analysis. However, at exascale, scientists and analysts will need a range of options for moving data to persistent storage, as the current offline or post-processing pipelines will not be able to capture the data necessary for data analysis of these extreme scale simulations. This Milestone explores two alternate workflows, characterized as in situ and in transit, and compares them. We find each to have its own merits and faults, and we provide information to help pick the best option for a particular use.
The finite-element shock hydrodynamics code ALEGRA has recently been upgraded to include a 2D X-FEM implementation for simulating impact, sliding, and release between materials in the Eulerian frame. For validation purposes, this report considers the problem of long-rod penetration into semi-infinite targets at velocities of 500 to 3000 m/s. We describe simulations performed with ALEGRA both with and without the X-FEM capability, in order to verify that X-FEM recovers the good results obtained with the standard ALEGRA formulation. The X-FEM results for depth of penetration differ from previously measured experimental data by less than 2%, and from the standard-formulation results by less than 1%; they converge monotonically at first order under mesh refinement. Sensitivities to domain size and rear boundary condition are investigated and shown to be small. Aside from some simulation stability issues, X-FEM is found to produce good results for this classical impact and penetration problem.
Probabilistic Risk Assessment (PRA) is a fundamental part of safety/quality assurance for nuclear power and nuclear weapons. Traditional PRA very effectively models complex hardware system risks using binary probabilistic models. However, traditional PRA models are not flexible enough to accommodate non-binary soft causal factors, such as digital instrumentation and control, passive components, aging, common cause failure, and human errors. Bayesian networks offer the opportunity to incorporate these risks into the PRA framework. This report describes the results of an early career LDRD project titled "Use of Limited Data to Construct Bayesian Networks for Probabilistic Risk Assessment". The goal of the work was to establish the capability to develop Bayesian networks from sparse data, and to demonstrate this capability by producing a data-informed Bayesian network for use in Human Reliability Analysis (HRA) as part of nuclear power plant PRA. This report summarizes the research goal and the major products of the research.
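As a toy illustration of how a Bayesian network folds a non-binary "soft" factor into a risk estimate, the sketch below chains two conditional probability tables (operator fatigue, human error, task failure) and performs a Bayesian update; all probabilities are invented for illustration and are not results of the LDRD.

```python
# Tiny Bayesian-network illustration (invented numbers): operator fatigue
# influences the chance of human error, which influences task failure.
p_fatigued = 0.2
p_error_given = {"fatigued": 0.15, "rested": 0.03}      # CPT for human error
p_fail_given = {"error": 0.40, "no_error": 0.01}        # CPT for task failure

# Marginal probability of task failure, summing over the hidden states.
p_fail = 0.0
for fatigue_state, p_f in (("fatigued", p_fatigued), ("rested", 1 - p_fatigued)):
    p_err = p_error_given[fatigue_state]
    for error_state, p_e in (("error", p_err), ("no_error", 1 - p_err)):
        p_fail += p_f * p_e * p_fail_given[error_state]

# Bayesian update: probability the operator was fatigued given the task failed.
p_fail_and_fatigued = p_fatigued * (
    p_error_given["fatigued"] * p_fail_given["error"]
    + (1 - p_error_given["fatigued"]) * p_fail_given["no_error"]
)
print(f"P(failure) = {p_fail:.4f}")
print(f"P(fatigued | failure) = {p_fail_and_fatigued / p_fail:.3f}")
```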
SIAM Journal on Applied Mathematics
This report presents numerical tables summarizing the properties of intrinsic defects in indium arsenide (InAs), as computed by density functional theory using semi-local density functionals. The tables are intended as reference data for a defect physics package in device models.
The state of the art in failure modeling enables assessment of crack nucleation, propagation, and progression to fragmentation due to high velocity impact. Vulnerability assessments suggest a need to track material behavior through failure, to the point of fragmentation and beyond. This field of research is particularly challenging for structures made of porous quasi-brittle materials, such as ceramics used in modern armor systems, due to the complex material response when loading exceeds the quasi-brittle material's elastic limit. Further complications arise when incorporating the quasi-brittle material response in multi-material Eulerian hydrocode simulations. In this report, recent efforts in coupling a ceramic material's response in the post-failure regime with an Eulerian hydrocode are described. Material behavior is modeled with the Kayenta material model [2], using Alegra as the host finite element code [14]. Kayenta, a three-invariant phenomenological plasticity model originally developed for modeling the stress response of geologic materials, has in recent years been used with some success in modeling the response of ceramic and other quasi-brittle materials to high velocity impact. Due to the granular nature of ceramic materials, Kayenta allows significant pressures to develop due to dilatant plastic flow, even in shear-dominated loading where traditional equations of state predict little or no pressure response. When a material's ability to carry further load is compromised, Kayenta allows the material's strength and stiffness to progressively degrade through the evolution of damage to the point of material failure. As material dilatation and damage progress, accommodations are made within Alegra to treat the evolving state in a consistent manner.
This document summarizes the results from a Level 3 milestone study within the CASL VUQ effort. We compare the adjoint-based a posteriori error estimation approach with a recent variant of a data-centric verification technique. We provide a brief overview of each technique and then discuss their relative advantages and disadvantages. We use Drekar::CFD to produce numerical results for steady-state Navier-Stokes and SA-RANS approximations.
Proposed for publication in www.arXiv.org.
This paper examines task mapping algorithms for non-contiguously allocated parallel jobs. Several studies have shown that task placement affects job running time for both contiguously and non-contiguously allocated jobs. Traditionally, work on task mapping either uses a very general model where the job has an arbitrary communication pattern or assumes that jobs are allocated contiguously, making them completely isolated from each other. A middle ground between these two cases is the mapping problem for non-contiguous jobs having a specific communication pattern. We propose several task mapping algorithms for jobs with a stencil communication pattern and evaluate them using experiments and simulations. Our strategies improve the running time of a MiniApp by as much as 30% over a baseline strategy. Furthermore, this improvement increases markedly with the job size, demonstrating the importance of task mapping as systems grow toward exascale.
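The simplified sketch below is a stand-in for the problem setting, not for the algorithms evaluated in the paper: it maps a 2-D stencil job onto a non-contiguously allocated set of nodes on a mesh and compares the average hop distance between communicating tasks for a naive ordering versus a simple geometry-aware ordering. The mesh size, job size, and random allocation are illustrative assumptions.

```python
# Simplified task-mapping illustration for a non-contiguously allocated job
# with a 5-point stencil communication pattern on a 2-D mesh network.
import random

def stencil_pairs(px, py):
    """Pairs of task ids that communicate in a px-by-py 5-point stencil."""
    tid = lambda i, j: i * py + j
    pairs = []
    for i in range(px):
        for j in range(py):
            if i + 1 < px: pairs.append((tid(i, j), tid(i + 1, j)))
            if j + 1 < py: pairs.append((tid(i, j), tid(i, j + 1)))
    return pairs

def avg_hops(mapping, pairs):
    """Average Manhattan distance on the mesh between communicating tasks."""
    total = sum(abs(mapping[a][0] - mapping[b][0]) +
                abs(mapping[a][1] - mapping[b][1]) for a, b in pairs)
    return total / len(pairs)

random.seed(4)
px = py = 16                                   # 256-task stencil job
mesh = [(x, y) for x in range(32) for y in range(32)]
alloc = random.sample(mesh, px * py)           # non-contiguous allocation

pairs = stencil_pairs(px, py)

# Naive mapping: tasks assigned to nodes in whatever order the allocator
# returned them.
naive = dict(enumerate(alloc))

# Geometry-aware mapping: sort the allocated nodes so that nodes close on the
# mesh receive tasks close in the stencil (a simple lexicographic sort here;
# the paper evaluates more sophisticated strategies).
aware = dict(enumerate(sorted(alloc)))

print("average hops, naive mapping:   ", round(avg_hops(naive, pairs), 2))
print("average hops, geometry-aware:  ", round(avg_hops(aware, pairs), 2))
```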
Proposed for publication in Journal of Physical Chemistry A.
Proposed for publication in Future Generation Computer Systems.