Publications

Results 1876–1900 of 9,998

Search results

Progress in Deep Geologic Disposal Safety Assessment in the U.S. since 2010

Mariner, Paul M.; Connolly, Laura A.; Cunningham, Leigh C.; Debusschere, Bert D.; Dobson, David C.; Frederick, Jennifer M.; Hammond, Glenn E.; Jordan, Spencer H.; LaForce, Tara; Nole, Michael A.; Park, Heeho D.; Laros, James H.; Rogers, Ralph D.; Seidl, Daniel T.; Sevougian, Stephen D.; Stein, Emily S.; Swift, Peter N.; Swiler, Laura P.; Vo, Jonathan; Wallace, Michael G.

The Spent Fuel and Waste Science and Technology (SFWST) Campaign of the U.S. Department of Energy (DOE) Office of Nuclear Energy (NE), Office of Spent Fuel & Waste Disposition (SFWD) is conducting research and development (R&D) on geologic disposal of spent nuclear fuel (SNF) and high-level nuclear waste (HLW). Two high priorities for SFWST disposal R&D are design concept development and disposal system modeling (DOE 2011, Table 6). These priorities are directly addressed in the SFWST Geologic Disposal Safety Assessment (GDSA) work package, which is charged with developing a disposal system modeling and analysis capability for evaluating disposal system performance for nuclear waste in geologic media.

Center for Computing Research Highlights

Hendrickson, Bruce A.; Alvin, Kenneth F.; Miller, Leann A.; Collis, Samuel S.

Sandia has a legacy of leadership in the advancement of high performance computing (HPC) at extreme scales. First-of-a-kind scalable distributed-memory parallel platforms such as the Intel Paragon, ASCI Red (the world’s first teraflops computer), and Red Storm (co-developed with Cray) helped form the basis for one of the most successful supercomputer product lines ever: the Cray XT series. Sandia has also pioneered system software elements (including lightweight operating systems, the Portals network programming interface, advanced interconnection network designs, and scalable I/O) that are critical to achieving scalability on large computing systems.

Abstract Machine Models and Proxy Architectures for Exascale Computing

Ang, James A.; Barrett, Richard F.; Benner, R.E.; Burke, Daniel; Chan, Cy; Cook, Jeanine C.; Daley, Christopher S.; Donofrio, David; Hammond, Simon D.; Hemmert, Karl S.; Hoekstra, Robert J.; Ibrahim, Khaled; Kelly, Suzanne M.; Le, Hoang; Leung, Vitus J.; Michelogiannakis, George; Resnick, David R.; Rodrigues, Arun; Shalf, John; Stark, Dylan; Unat, D.; Wright, Nick J.; Voskuilen, Gwendolyn R.

To achieve exascale computing, fundamental hardware architectures must change. The most significant consequence of this assertion is the impact on the scientific and engineering applications that run on current high performance computing (HPC) systems, many of which codify years of scientific domain knowledge and refinements for contemporary computer systems. In order to adapt to exascale architectures, developers must be able to reason about new hardware and determine what programming models and algorithms will provide the best blend of performance and energy efficiency into the future. While many details of the exascale architectures are undefined, an abstract machine model is designed to allow application developers to focus on the aspects of the machine that are important or relevant to performance and code structure. These models are intended as communication aids between application developers and hardware architects during the co-design process. We use the term proxy architecture to describe a parameterized version of an abstract machine model, with the parameters added to elucidate potential speeds and capacities of key hardware components. These more detailed architectural models are formulated to enable discussion between the developers of analytic models and simulators and computer hardware architects. They allow for application performance analysis and hardware optimization opportunities. In this report our goal is to provide the application development community with a set of models that can help software developers prepare for exascale. In addition, through the use of proxy architectures, we can enable a more concrete exploration of how well new and evolving application codes map onto future architectures. This second version of the document addresses system scale considerations and provides a system-level abstract machine model with proxy architecture information.

Evaluating tradeoffs between MPI message matching offload hardware capacity and performance

ACM International Conference Proceeding Series

Levy, Scott L.; Ferreira, Kurt B.

Although its demise has been frequently predicted, the Message Passing Interface (MPI) remains the dominant programming model for scientific applications running on high-performance computing (HPC) systems. MPI specifies powerful semantics for interprocess communication that have enabled scientists to write applications for simulating important physical phenomena. However, these semantics have also presented several significant challenges. For example, the existence of wildcard values has made the efficient enforcement of MPI message matching semantics challenging. Significant research has been dedicated to accelerating MPI message matching. One common approach has been to offload matching to dedicated hardware. One of the challenges that hardware designers have faced is knowing how to size hardware structures to accommodate outstanding match requests. Applications that exceed the capacity of specialized hardware typically must fall back to storing match requests in bulk memory, e.g., DRAM on the host processor. In this paper, we examine the implications of hardware matching and develop guidance on sizing hardware matching structures to strike a balance between minimizing expensive dedicated hardware resources and overall matching performance. By examining the message matching behavior of several important HPC workloads, we show that when specialized hardware matching is not dramatically faster than matching in memory, the offload hardware's match queue capacity can be reduced without significantly increasing match time. On the other hand, effectively exploiting the benefits of very fast specialized matching hardware requires sufficient storage resources to ensure that every search completes in the specialized hardware. The data and analysis in this paper provide important guidance for designers of MPI message matching hardware.
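
The tradeoff described in this abstract can be illustrated with a minimal sketch. It is not the authors' model: the function names, the wildcard stand-in values, and the per-entry costs (`hw_cost`, `mem_cost`) are all hypothetical, chosen only to show how an ordered search of a posted-receive queue becomes more expensive once it spills past a fixed hardware capacity.

```python
# Stand-ins for MPI_ANY_SOURCE / MPI_ANY_TAG; real values are implementation-defined.
ANY_SOURCE = -1
ANY_TAG = -1


def matches(posted, source, tag):
    """Return True if a posted receive (source, tag) accepts the incoming message."""
    p_source, p_tag = posted
    return p_source in (ANY_SOURCE, source) and p_tag in (ANY_TAG, tag)


def match_message(posted_queue, source, tag, hw_capacity, hw_cost=1.0, mem_cost=10.0):
    """Search the posted-receive queue in order and return (index, cost).

    The first `hw_capacity` entries are assumed to live in offload hardware
    and cost `hw_cost` each to inspect; later entries are assumed to live in
    host memory and cost `mem_cost` each. Both costs are hypothetical units.
    Returns (None, cost) if no posted receive matches. The queue is not modified.
    """
    cost = 0.0
    for i, posted in enumerate(posted_queue):
        cost += hw_cost if i < hw_capacity else mem_cost
        if matches(posted, source, tag):
            return i, cost
    return None, cost


# Three posted receives: an exact match, a source wildcard, and a tag wildcard.
queue = [(0, 5), (ANY_SOURCE, 7), (2, ANY_TAG)]
index_small, cost_small = match_message(queue, source=3, tag=7, hw_capacity=1)
index_large, cost_large = match_message(queue, source=3, tag=7, hw_capacity=3)
```

In this toy setting, the same message matches the same entry under both capacities, but the search with the smaller capacity pays the assumed memory cost for the second entry. As the ratio `mem_cost / hw_cost` approaches 1, the penalty for a smaller hardware queue shrinks, which mirrors the qualitative conclusion stated in the abstract.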

Developing and evaluating Malliavin estimators for intrusive sensitivity analysis of Monte Carlo radiation transport

Bond, Stephen D.; Franke, Brian C.; Lehoucq, Richard B.; Smith, John D.

We will develop Malliavin estimators for Monte Carlo radiation transport by formulating the governing jump stochastic differential equation and deriving the applicable estimators that produce sensitivities for our equations. Efficient and effective sensitivity estimates can be used for design optimization and uncertainty quantification, with broad application to radiation environments. The technology demonstration will lower development risk for other particle-based simulation methods.

Rigorous Data Fusion for Computationally Expensive Simulations

Winovich, Nickolas W.; Rushdi, Ahmad R.; Phipps, Eric T.; Ray, Jaideep R.; Lin, Guang; Ebeida, Mohamed S.

This manuscript comprises the final report for the 1-year, FY19 LDRD project "Rigorous Data Fusion for Computationally Expensive Simulations," wherein an alternative approach to Bayesian calibration was developed based on a new sampling technique called VoroSpokes. VoroSpokes is a novel quadrature and sampling framework defined with respect to Voronoi tessellations of bounded domains in $R^d$ developed within this project. In this work, we first establish local quadrature and sampling results on convex polytopes using randomly directed rays, or spokes, to approximate the quantities of interest for a specified target function. A theoretical justification for both procedures is provided along with empirical results demonstrating the unbiased convergence in the resulting estimates/samples. The local quadrature and sampling procedures are then extended to global procedures defined on more general domains by applying the local results to the cells of a Voronoi tessellation covering the domain in consideration. We then demonstrate how the proposed global sampling procedure can be used to define a natural framework for adaptively constructing Voronoi Piecewise Surrogate (VPS) approximations based on local error estimates. Finally, we show that the adaptive VPS procedure can be used to form a surrogate model approximation to a specified, potentially unnormalized, density function, and that the global sampling procedure can be used to efficiently draw independent samples from the surrogate density in parallel. The performance of the resulting VoroSpokes sampling framework is assessed on a collection of Bayesian inference problems and is shown to provide highly accurate posterior predictions which align with the results obtained using traditional methods such as Gibbs sampling and random-walk Markov Chain Monte Carlo (MCMC).
Importantly, the proposed framework provides a foundation for performing Bayesian inference tasks which is entirely independent from the theory of Markov chains.
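
As a rough illustration of how randomly directed rays can yield an unbiased quadrature estimate on a convex polytope, the sketch below uses the standard polar-coordinate identity for the area of a planar convex region seen from an interior point, $A = \int_0^{2\pi} \tfrac{1}{2} r(\theta)^2 \, d\theta = \pi \, E[r(\theta)^2]$ for uniformly random $\theta$. This is a generic textbook estimator and an assumption on our part, not the VoroSpokes algorithm from the report; the function names and the half-space representation of the polytope are hypothetical.

```python
import math
import random


def spoke_length(origin, direction, halfspaces):
    """Distance from `origin` along unit `direction` to the boundary of the
    convex polygon {x : a . x <= b for every (a, b) in halfspaces}."""
    t_min = math.inf
    for a, b in halfspaces:
        a_dot_u = a[0] * direction[0] + a[1] * direction[1]
        if a_dot_u > 1e-12:  # only constraints the ray moves toward can be hit
            slack = b - (a[0] * origin[0] + a[1] * origin[1])
            t_min = min(t_min, slack / a_dot_u)
    return t_min


def estimate_area(origin, halfspaces, n_spokes, rng):
    """Monte Carlo area estimate: average pi * r^2 over random spoke directions."""
    total = 0.0
    for _ in range(n_spokes):
        theta = rng.uniform(0.0, 2.0 * math.pi)
        u = (math.cos(theta), math.sin(theta))
        r = spoke_length(origin, u, halfspaces)
        total += math.pi * r * r
    return total / n_spokes


# Unit square [0, 1]^2 written as four half-spaces a . x <= b.
UNIT_SQUARE = [((1.0, 0.0), 1.0), ((-1.0, 0.0), 0.0),
               ((0.0, 1.0), 1.0), ((0.0, -1.0), 0.0)]

area = estimate_area((0.5, 0.5), UNIT_SQUARE, 20000, random.Random(0))
```

With enough spokes the estimate for the unit square should be close to 1. The identity holds for any interior origin, although an off-center origin increases the variance of the individual spoke contributions.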

Kokkos Training Bootcamp

Trott, Christian R.

This report documents the completion of milestone STPM12-17 Kokkos Training Bootcamp. The goal of this milestone was to hold a combined tutorial and hackathon bootcamp event for the Kokkos community and prospective users. The Kokkos Bootcamp event was held at Argonne National Laboratory from August 27 to August 29, 2019. Because attendance was lower than expected (we believe largely due to bad timing), the team worked with a select set of ECP partners on early preparation for Aurora. In particular, we evaluated issues posed by exposing SYCL and OpenMP target offload to applications via the Kokkos Pro Model.

Balar: A SST GPU Component for Performance Modeling and Profiling

Hughes, Clayton H.; Hammond, Simon D.; Khairy, Mahmoud; Zhang, Mengchi; Green, Roland; Rogers, Timothy; Hoekstra, Robert J.

Programmable accelerators have become commonplace in modern computing systems. Advances in programming models and the availability of massive amounts of data have created a space for massively parallel accelerators capable of maintaining context for thousands of concurrent threads resident on-chip. These threads are grouped and interleaved on a cycle-by-cycle basis among several massively parallel computing cores. One path for the design of future supercomputers relies on an ability to model the performance of these massively parallel cores at scale. The SST framework has been proven to scale up to run simulations containing tens of thousands of nodes. A previous report described the initial integration of the open-source, execution-driven GPU simulator, GPGPU-Sim, into the SST framework. This report discusses the results of the integration and how to use the new GPU component in SST. It also provides examples of what it can be used to analyze and a correlation study showing how closely the execution matches that of an NVIDIA V100 GPU when running kernels and mini-apps.

Shortening the Design and Certification Cycle for Additively Manufactured Materials by Improved Mesoscale Simulations and Validation Experiments: Fiscal Year 2019 Status Report

Specht, Paul E.; Mitchell, John A.; Adams, David P.; Brown, Justin L.; Silling, Stewart A.; Wise, Jack L.; Palmer, Todd

This report outlines the fiscal year (FY) 2019 status of an ongoing multi-year effort to develop a general, microstructurally-aware, continuum-level model for representing the dynamic response of materials with complex microstructures. This work has focused on accurately representing the response of both conventionally wrought processed and additively manufactured (AM) 304L stainless steel (SS) as a test case. Additive manufacturing, or 3D printing, is an emerging technology capable of enabling shortened design and certification cycles for stockpile components through rapid prototyping. However, it is not understood how the complex and unique microstructures of AM materials affect their mechanical response at high strain rates. To achieve our project goal, an upscaling technique was developed to bridge the gap between the microstructural and continuum scales to represent AM microstructures on a Finite Element (FE) mesh. This process involves simulation of the additive process using the Sandia-developed kinetic Monte Carlo (KMC) code SPPARKS. These SPPARKS microstructures are characterized using clustering algorithms from machine learning and used to populate the quadrature points of an FE mesh. Additionally, a spall kinetic model (SKM) was developed to more accurately represent the dynamic failure of AM materials. Validation experiments were performed using both pulsed power machines and projectile launchers. These experiments have provided equation of state (EOS) and flow strength measurements of both wrought and AM 304L SS to above Mbar pressures. In some experiments, multi-point interferometry was used to quantify the variation in the observed material response of the AM 304L SS. Analysis of these experiments is ongoing, but preliminary comparisons of our upscaling technique and SKM to experimental data were performed as a validation exercise.
Moving forward, this project will advance and further validate our computational framework, using advanced theory and additional high-fidelity experiments.

Compatible Particle Discretizations (Final LDRD Report)

Bochev, Pavel B.; Bosler, Peter A.; Kuberry, Paul A.; Perego, Mauro P.; Peterson, Kara J.; Trask, Nathaniel A.

This report summarizes the work performed under a three-year LDRD project aiming to develop mathematical and software foundations for compatible meshfree and particle discretizations. We review major technical accomplishments and project metrics such as publications, conference and colloquium presentations, and organization of special sessions and minisymposia. The report concludes with a brief summary of ongoing projects and collaborations that utilize the products of this work.
