Publications

Topology Optimization for Nonlinear Transient Applications Using a Minimally Invasive Approach (LDRD Final Report)

Robbins, Joshua R.

The purpose of this project was to devise, implement, and demonstrate a method that can use Sandia's existing analysis codes (e.g., Sierra, Alegra, the CTH hydro code) with minimal modification to generate objective function gradients for optimization-based design in transient, nonlinear, coupled-physics applications. The approach uses a Moving Least Squares representation of the geometry to substantially reduce the number of geometric degrees of freedom. A Multiple-Program Multiple-Data computing model is then used to compute objective gradients via finite differencing. Details of the formulation and implementation are provided, and example applications are presented that show the effectiveness and scalability of the approach.
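
A minimal sketch of the gradient step the abstract describes: objective gradients by finite differencing over a reduced set of geometric parameters. The objective function here is a hypothetical stand-in for a full transient analysis run, not Sandia's actual codes.

```python
import numpy as np

def objective(params):
    """Stand-in for a full transient simulation (e.g., a Sierra or CTH run)
    that maps reduced geometric parameters to a scalar objective."""
    return np.sum((params - 0.5) ** 2)

def fd_gradient(objective, params, h=1e-6):
    """Forward-difference gradient over the reduced geometric DOFs.
    In an MPMD setting each perturbed evaluation would run as an
    independent analysis job, so this loop is embarrassingly parallel."""
    base = objective(params)
    grad = np.empty_like(params)
    for i in range(params.size):
        perturbed = params.copy()
        perturbed[i] += h
        grad[i] = (objective(perturbed) - base) / h
    return grad

# A few MLS control values stand in for thousands of mesh-node DOFs.
params = np.array([0.2, 0.8, 0.4])
print(fd_gradient(objective, params))
```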

Sierra/SolidMechanics 4.50 Theory Manual

Merewether, Mark T.; Plews, Julia A.; Crane, Nathan K.; de Frias, Gabriel J.; Le, San L.; Littlewood, David J.; Mosby, Matthew D.; Pierson, Kendall H.; Porter, V.L.; Shelton, Timothy S.; Thomas, Jesse D.; Tupek, Michael R.; Veilleux, Michael V.; Xavier, Patrick G.; Manktelow, Kevin M.; Clutz, Christopher J.R.

Presented in this document are the theoretical aspects of capabilities contained in the Sierra/SM code. This manuscript serves as an ideal starting point for understanding the theoretical foundations of the code. For a comprehensive study of these capabilities, the reader is encouraged to explore the many references to scientific articles and textbooks contained in this manual. It is important to point out that some capabilities are still in development and may not be presented in this document. Further updates to this manuscript will be made as these capabilities come closer to production level.

Using simulation to examine the effect of MPI message matching costs on application performance

ACM International Conference Proceeding Series

Levy, Scott L.; Ferreira, Kurt B.

Attaining high performance with MPI applications requires efficient message matching to minimize message processing overheads and the latency these overheads introduce into application communication. In this paper, we use a validated simulation-based approach to examine the relationship between MPI message matching performance and application time-to-solution. Specifically, we examine how the performance of several important HPC workloads is affected by the time required for matching. Our analysis yields several important contributions: (i) the performance of current workloads is unlikely to be significantly affected by MPI matching unless match queue operations get much slower or match queues get much longer; (ii) match queue designs that provide sublinear performance as a function of queue length are unlikely to yield much benefit unless match queue lengths increase dramatically; and (iii) we provide guidance on how long the mean time per match attempt may be without significantly affecting application performance. The results and analysis in this paper provide valuable guidance on the design and development of MPI message match queues.
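
To make the quantity under study concrete, here is a toy model of MPI match-queue traversal; the event stream, match keys, and 50 ns per-attempt cost are illustrative assumptions, not the paper's validated simulator.

```python
from collections import deque
import random

def simulate_matching(events, t_match=50e-9):
    """Toy posted-receive queue: each arriving message walks the queue
    until it finds a matching (source, tag) entry. Returns total matching
    time and mean search length, the quantities the paper studies."""
    posted = deque()
    searches, total = [], 0.0
    for kind, key in events:
        if kind == "recv":
            posted.append(key)
        else:  # incoming message
            for depth, entry in enumerate(posted, start=1):
                if entry == key:
                    posted.remove(entry)
                    searches.append(depth)
                    total += depth * t_match
                    break
    return total, sum(searches) / len(searches)

random.seed(0)
keys = [(0, tag) for tag in range(64)]
events = [("recv", k) for k in keys] + [("msg", k) for k in random.sample(keys, len(keys))]
print(simulate_matching(events))
```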

Exploring Applications of Random Walks on Spiking Neural Algorithms

Reeder, Leah E.; Hill, Aaron J.; Aimone, James B.; Severa, William M.

Neuromorphic computing holds much promise for the future of computing due to its energy-efficient and scalable implementation. Here we extend a neural algorithm that solves the diffusion equation, a partial differential equation (PDE), by implementing random walks on neuromorphic hardware. Additionally, we introduce four random walk applications that use this spiking neural algorithm: generating a random walk to replicate an image, finding a path between two nodes, finding triangles in a graph, and partitioning a graph into two sections. We have made these four applications available in software through a graphical user interface (GUI).
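
For readers unfamiliar with the connection, a plain (non-spiking) sketch of how random walkers approximate diffusion; the walker count and 1-D lattice are arbitrary choices, not the paper's neuromorphic implementation.

```python
import numpy as np

rng = np.random.default_rng(7)

def diffuse(n_walkers=100_000, n_steps=100):
    """Unbiased +/-1 random walk on a 1-D lattice; the walker density
    approaches the Green's function of the diffusion equation."""
    steps = rng.choice([-1, 1], size=(n_walkers, n_steps))
    return steps.sum(axis=1)

positions = diffuse()
# Variance of walker positions grows linearly in time: Var[x] = n_steps.
print(positions.var())  # ~100
```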

Adjoint-based Calibration of Plasticity Model Parameters from Digital Image Correlation Data

Granzow, Brian N.; Seidl, Daniel T.

Parameter estimation for mechanical models of plastic deformation utilized in nuclear weapons systems is a laborious process for both experimentalists and constitutive modelers and is critical to producing meaningful numerical predictions. In this work we derive an adjoint-based optimization approach for a stabilized, large-deformation J2 plasticity model that is considerably more computationally efficient, but no less accurate, than current state-of-the-art methods. Unlike most approaches to model calibration, we drive the inversion procedure with full-field deformation data that can be experimentally measured through established digital image or volume correlation techniques. We present numerical results for two- and three-dimensional model problems and comment on various directions of future research.
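
A minimal illustration of why the adjoint approach is efficient: one extra linear solve yields the gradient regardless of the number of parameters. The one-parameter model below is hypothetical, far simpler than the stabilized J2 plasticity model in the paper.

```python
import numpy as np

# Hypothetical one-parameter model: A(p) u = b with A = K + p * M,
# objective J = c^T u. The adjoint trick gives dJ/dp from ONE extra solve,
# independent of the number of parameters.
n = 5
K = np.eye(n) * 2.0
M = np.diag(np.linspace(1.0, 2.0, n))
b = np.ones(n)
c = np.arange(1.0, n + 1.0)
p = 0.3

A = K + p * M
u = np.linalg.solve(A, b)
lam = np.linalg.solve(A.T, c)      # adjoint solve
grad_adjoint = -lam @ (M @ u)      # dJ/dp = -lambda^T (dA/dp) u

# Finite-difference check of the adjoint gradient.
h = 1e-7
u2 = np.linalg.solve(K + (p + h) * M, b)
print(grad_adjoint, (c @ u2 - c @ u) / h)
```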

Kokkos R&D: Remote Memory Spaces WBS STPR 04 Milestone 7

Trott, Christian R.

This report documents the completion of milestone STPR04-7, Kokkos R&D: Remote Memory Spaces for One-Sided Halo-Exchange. The goal of this milestone was to develop and deploy an initial capability to support PGAS-like communication models, integrated into Kokkos via Remote Memory Spaces. The team developed semantic requirements for Remote Memory Spaces and implemented a prototype library leveraging four different communication libraries: libQUO, SHMEM, MPI one-sided, and NVSHMEM. In conjunction with ADCD02-COPA, the Remote Memory Space capability was used in ExaMiniMD, a molecular dynamics proxy application, to explore the current state of the technology and its usability. The results demonstrate that usability is very good, allowing a significant simplification of communication routines, but that performance is still lacking.

WBS STPR 04 Milestone 4 Report

Trott, Christian R.; Sunderland, Daniel S.; Hoemmen, Mark F.

This report documents the completion of milestone STPR04-4, Kokkos back-ends research, collaborations, development, optimization, and documentation. The Kokkos team updated its existing backends to support the software stack and hardware of DOE's Sierra, Summit, and Astra machines. They also collaborated with ECP PathForward vendors on developing backends for possible exascale architectures. Furthermore, the team ramped up its engagement with the ISO C++ committee to accelerate the adoption of features important to the HPC community into the C++ standard.

ECP STPR04 Milestone 5 Report

Trott, Christian R.

This report documents the completion of milestone STPR04-5, Kokkos interoperability with general SIMD types to force vectorization on ATS-1. The Kokkos team worked with application developers to enable the utilization of SIMD intrinsics, which allowed up to a 3.7x improvement of the affected kernels on ATS-1 in a proxy application. SIMD types are now deployed in the production code base.

STPR 04 Milestone 6 Report

Trott, Christian R.; Ibanez-Granados, Daniel A.; Ellingwood, Nathan D.; Bova, S.W.; Labreche, Duane A.

This report documents the completion of milestone STPR04-6, Kokkos support for ASC applications and libraries. The team provided consultation and support for numerous ASC code projects, including Sandia's SPARC, EMPIRE, Aria, GEMMA, Alexa, Trilinos, LAMMPS, and nimbleSM. Over the year, more than 350 Kokkos GitHub issues were resolved, with over 220 requiring fixes and enhancements to the code base. Resolving these requests, many of them issued by ASC code teams, provided applications with the capabilities in Kokkos necessary to be successful.

ASC ATDM Level 2 Milestone #6358: Assess Status of Next Generation Components and Physics Models in EMPIRE

Bettencourt, Matthew T.; Kramer, Richard M.; Cartwright, Keith C.; Phillips, Edward G.; Ober, Curtis C.; Pawlowski, Roger P.; Swan, Matthew S.; Kalashnikova, Irina; Phipps, Eric T.; Conde, Sidafa C.; Cyr, Eric C.; Ulmer, Craig D.; Kordenbrock, Todd H.; Levy, Scott L.; Templet, Gary J.; Hu, Jonathan J.; Lin, Paul L.; Glusa, Christian A.; Siefert, Christopher S.; Glass, Micheal W.

This report documents the outcome from the ASC ATDM Level 2 Milestone 6358: Assess Status of Next Generation Components and Physics Models in EMPIRE. This Milestone is an assessment of the EMPIRE (ElectroMagnetic Plasma In Realistic Environments) application and three software components. The assessment focuses on the electromagnetic and electrostatic particle-in-cell solutions for EMPIRE and its associated solver, time integration, and checkpoint-restart components. This information provides a clear understanding of the current status of the EMPIRE application and will help to guide future work in FY19 in order to ready the application for the ASC ATDM L1 Milestone in FY20. It is clear from this assessment that performance of the linear solver will have to be a focus in FY19.

Quantifying Uncertainty to Improve Decision Making in Machine Learning

Stracuzzi, David J.; Darling, Michael C.; Peterson, Matthew G.; Chen, Maximillian G.

Data-driven modeling, including machine learning methods, continues to play an increasing role in society. Data-driven methods impact decision making for applications ranging from everyday determinations about which news people see and control of self-driving cars to high-consequence national security situations related to cyber security and analysis of nuclear weapons reliability. Although modern machine learning methods have made great strides in model induction and show excellent performance in a broad variety of complex domains, uncertainty remains an inherent aspect of any data-driven model. In this report, we provide an update to the preliminary results on uncertainty quantification for machine learning presented in SAND2017-6776. Specifically, we improve upon the general problem definition and expand upon the experiments conducted for the earlier report. Most importantly, we summarize key lessons learned about how and when uncertainty quantification can inform decision making and provide valuable insights into the quality of learned models and potential improvements to them.

Implementing Neural Adaptive Filtering in Engineered Detection Systems

Chance, Frances S.; Warrender, Christina E.

The retina plays an important role in animal vision, namely to pre-process visual information before sending it to the brain. The goal of this LDRD was to develop models of motion-sensitive retinal cells for the purpose of developing retinal-inspired algorithms to be applied to real-world data specific to Sandia's national security missions. We specifically focus on detection of small, dim moving targets amidst varying types of clutter or distractor signals. We compare a classic motion-sensitive model, the Hassenstein-Reichardt model, to a model of the OMS (object motion-sensitive) cell, and find that the Reichardt model performs better under continuous clutter (e.g., white noise) but is very sensitive to particular stimulus conditions (e.g., target velocity). We also demonstrate that lateral inhibition, a ubiquitous characteristic of neural circuitry, can effect target-size tuning, improving detection specifically of small targets.
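
The Hassenstein-Reichardt correlator the abstract compares against is simple enough to sketch directly; the stimulus and delay below are arbitrary illustrative choices.

```python
import numpy as np

def reichardt(left, right, delay=3):
    """Classic Hassenstein-Reichardt correlator: each half multiplies one
    input by a delayed copy of the other; their difference is direction-
    selective (positive for left-to-right motion here)."""
    l_d = np.roll(left, delay)   # delayed copies (circular shift for brevity)
    r_d = np.roll(right, delay)
    return l_d * right - r_d * left

t = np.arange(200)
stimulus = np.sin(2 * np.pi * t / 40)
# A rightward-moving stimulus hits the left detector before the right one.
left, right = stimulus, np.roll(stimulus, 3)
print(reichardt(left, right).mean())   # > 0 for rightward motion
```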

Chance-Constrained Optimization for Critical Infrastructure Protection

Singh, Bismark S.; Watson, Jean-Paul W.

Stochastic optimization deals with making highly reliable decisions under uncertainty. Chance constraints are a crucial tool of stochastic optimization for developing mathematical optimization models; they form the backbone of many important national security data science applications. These include critical infrastructure resiliency, cyber security, power system operations, and disaster relief management. However, existing algorithms to solve chance-constrained optimization models are severely limited by problem size and structure. In this investigative study, we (i) develop new algorithms to approximate chance-constrained optimization models, (ii) demonstrate the application of chance constraints to a national security problem, and (iii) investigate related stochastic optimization problems. We believe our work will pave the way for new research in stochastic optimization as well as help secure national infrastructure against unforeseen attacks.
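
The simplest concrete instance of a chance constraint, sketched via scenario (sample) approximation; the demand distribution and risk level are illustrative, and the report's algorithms address far harder structured problems.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy chance constraint: choose the smallest capacity x such that
#   P(demand <= x) >= 1 - eps,
# approximated from N sampled scenarios (a sample quantile). This is the
# most basic scenario-based approximation; MILP reformulations generalize it.
eps = 0.05
demand = rng.lognormal(mean=3.0, sigma=0.4, size=10_000)   # scenarios
x = np.quantile(demand, 1 - eps)
print(f"capacity {x:.1f} satisfies the constraint in "
      f"{np.mean(demand <= x):.1%} of scenarios")
```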

High-Throughput Material Characterization using the Virtual Fields Method

Jones, Elizabeth M.; Carroll, Jay D.; Karlson, Kyle N.; Kramer, Sharlotte L.; Lehoucq, Richard B.; Reu, Phillip L.; Seidl, Daniel T.; Turner, Daniel Z.

Modeling material and component behavior using finite element analysis (FEA) is critical for modern engineering. One key to a credible model is having an accurate material model, with calibrated model parameters, which describes the constitutive relationship between the deformation and the resulting stress in the material. As such, identifying material model parameters is critical to accurate and predictive FEA. Traditional calibration approaches use only global data (e.g., extensometers and resultant force) and simplified geometries to find the parameters. However, rapidly maturing full-field characterization techniques (e.g., Digital Image Correlation (DIC)) combined with inverse techniques (e.g., the Virtual Fields Method (VFM)) provide a novel and improved method for parameter identification. This LDRD tested that idea: in particular, whether more parameters could be identified per test when using full-field data. The research described in this report successfully proves this hypothesis by comparing the VFM results with traditional calibration methods. Important products of the research include: verified VFM codes for identifying model parameters, a new look at parameter covariance in material model parameter estimation, new validation techniques to better utilize full-field measurements, and an exploration of optimized specimen design for improved data richness.
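
For orientation, the identity underlying the VFM, stated here in a generic quasi-static form that may differ from the report's notation: the principle of virtual work, which turns full-field strain measurements into equations for the constitutive parameters.

```latex
% Principle of virtual work (quasi-static, no body forces): for every
% kinematically admissible virtual displacement field u^*,
\int_{\Omega} \sigma\left(\varepsilon; \theta\right) : \varepsilon^{*} \, dV
  = \int_{\partial \Omega} \bar{T} \cdot u^{*} \, dS .
% With strains \varepsilon measured by DIC over the full field, each choice
% of u^* yields one scalar equation in the material parameters \theta.
```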

Neural Networks as Surrogates of Nonlinear High-Dimensional Parameter-to-Prediction Maps

Jakeman, John D.; Perego, Mauro P.; Severa, William M.

We present a preliminary investigation of the use of Multi-Layer Perceptrons (MLPs) and Recurrent Neural Networks (RNNs) as surrogates of parameter-to-prediction maps of computationally expensive dynamical models. In particular, we target the approximation of Quantities of Interest (QoIs) derived from the solution of a partial differential equation (PDE) at different time instants. In order to limit the scope of our study while targeting a relevant application, we focus on the problem of computing variations in ice sheet mass (our QoI), which is a proxy for global mean sea-level change. We present a number of neural network formulations and compare their performance with that of Polynomial Chaos Expansions (PCE) constructed on the same data.
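
A minimal surrogate-fitting sketch in the spirit of the MLP study; the parameter-to-QoI map, sample sizes, and network shape below are hypothetical stand-ins for the ice sheet model.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Hypothetical stand-in for an expensive parameter-to-QoI map.
def qoi(x):
    return np.sin(3 * x[:, 0]) * np.exp(-x[:, 1] ** 2)

X = rng.uniform(-1, 1, size=(400, 2))     # sampled model parameters
y = qoi(X)                                # "expensive" model evaluations

surrogate = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=5000,
                         random_state=0).fit(X, y)
X_test = rng.uniform(-1, 1, size=(100, 2))
err = np.sqrt(np.mean((surrogate.predict(X_test) - qoi(X_test)) ** 2))
print(f"surrogate RMSE: {err:.3f}")
```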

Engineering Spin-Orbit Interaction in Silicon

Lu, Tzu-Ming L.; Maurer, Leon M.; Bussmann, Ezra B.; Harris, Charles T.; Tracy, Lisa A.; Sapkota, Keshab R.

There has been much interest in leveraging the topological order of materials for quantum information processing. Among the various solid-state systems, one-dimensional topological superconductors made out of strongly spin-orbit-coupled nanowires have been shown to be the most promising material platform. In this project, we investigated the feasibility of turning silicon, which is a non-topological semiconductor and has weak spin-orbit coupling, into a one-dimensional topological superconductor. Our theoretical analysis showed that it is indeed possible to create a sizable effective spin-orbit gap in the energy spectrum of a ballistic one-dimensional electron channel in silicon with the help of nano-magnet arrays. Experimentally, we developed magnetic materials needed for fabricating such nano-magnets, characterized the magnetic behavior at low temperatures, and successfully demonstrated the required magnetization configuration for opening the spin-orbit gap. Our results pave the way toward a practical topological quantum computing platform using silicon, one of the most technologically mature electronic materials.

Large-Scale System Monitoring Experiences and Recommendations

Ahlgren, V.; Andersson, S.; Brandt, James M.; Cardo, N.; Chunduri, S.; Enos, J.; Fields, P.; Gentile, Ann C.; Gerber, R.; Gienger, M.; Greenseid, J.; Greiner, A.; Hadri, B.; He, Y.; Hoppe, D.; Kaila, U.; Kelly, K.; Klein, M.; Kristiansen, A.; Leak, S.; Mason, M.; Laros, James H.; Piccinali, J-G; Repik, Jason; Rogers, J.; Salminen, S.; Showerman, M.; Whitney, C.; Williams, J.

Abstract not provided.

Characterizing MPI matching via trace-based simulation

Parallel Computing

Ferreira, Kurt B.; Levy, Scott L.; Laros, James H.; Grant, Ryan E.

With the increased scale expected on future leadership-class systems, detailed information about the resource usage and performance of MPI message matching provides important insights into how to maintain application performance on next-generation systems. However, obtaining MPI message matching performance data is often not possible without significant effort. A common approach is to instrument an MPI implementation to collect relevant statistics. While this approach can provide important data, collecting matching data at runtime perturbs the application's execution, including its matching performance, and is highly dependent on the MPI library's matchlist implementation. In this paper, we introduce a trace-based simulation approach to obtain detailed MPI message matching performance data for MPI applications without perturbing their execution. Using a number of key parallel workloads and microbenchmarks, we demonstrate that this simulator approach can rapidly and accurately characterize matching behavior. Specifically, we use our simulator to collect several important statistics about the operation of the MPI posted and unexpected queues. For example, we present data about search lengths and the duration that messages spend in the queues waiting to be matched. Data gathered using this simulation-based approach have significant potential to aid hardware designers in determining resource allocation for MPI matching functions and provide application and middleware developers with insight into the scalability issues associated with MPI message matching.

Optimal Design and Control of Qubits

von Winckel, Gregory J.

Research interest in developing computing systems that represent logic states using quantum mechanical observables has only increased in the few decades since the field's inception. While quantum computers with Josephson-junction-based qubits have become commercially available in the last three years, there is also a significant research initiative to develop scalable quantum computers with so-called donor qubits. B.E. Kane first published a device implementation of a silicon-based quantum computer in 1998, which sparked a wave of follow-on advances due to the attractive nature of silicon-based computing [7]. Nearly all commercial computing systems using classical binary logic are fabricated on a silicon substrate, and silicon is inarguably the most mature material system for semiconductor devices, so coupling classical and quantum bits on a single substrate is possible. The process of growing and processing silicon crystals into wafers is extremely robust and leads to minimal impurities and structural defects.

Recent Diagnostic Platform Accomplishments for Studying Vacuum Power Flow Physics at the Sandia Z Accelerator

Laity, George R.; Aragon, Carlos A.; Bennett, Nichelle L.; Bliss, David E.; Laros, James H.; Fierro, Andrew S.; Gomez, Matthew R.; Hess, Mark H.; Hutsel, Brian T.; Jennings, Christopher A.; Johnston, Mark D.; Kossow, Michael R.; Lamppa, Derek C.; Martin, Matthew; Patel, Sonal P.; Porwitzky, Andrew J.; Robinson, Allen C.; Rose, David V.; Vandevender, Pace; Waisman, Eduardo M.; Webb, Timothy J.; Welch, Dale R.; Rochau, G.A.; Savage, Mark E.; Stygar, William; White, William M.; Sinars, Daniel S.; Cuneo, M.E.

Abstract not provided.

FY18 L2 Milestone #6360 Report: Initial Capability of an Arm-based Advanced Architecture Prototype System and Software Environment

Laros, James H.; Hammond, Simon D.; Aguilar, Michael J.; Curry, Matthew L.; Grant, Ryan E.; Hoekstra, Robert J.; Klundt, Ruth A.; Monk, Stephen T.; Ogden, Jeffry B.; Olivier, Stephen L.; Scott, Randall D.; Ward, Harry L.; Younge, Andrew J.

The Vanguard program informally began in January 2017 with the submission of a white paper entitled "Sandia's Vision for a 2019 Arm Testbed" to NNSA headquarters. The program proceeded in earnest in May 2017 with an announcement by Doug Wade (Director, Office of Advanced Simulation and Computing and Institutional R&D at NNSA) that Sandia National Laboratories (Sandia) would host the first Advanced Architecture Prototype platform based on the Arm architecture. In August 2017, Sandia formed a Tri-lab team chartered to develop a robust HPC software stack for Astra to support the Vanguard program goal of demonstrating the viability of Arm in supporting ASC production computing workloads. This document describes the high-level Vanguard program goals, the Vanguard-Astra project acquisition plan and procurement up to contract placement, the initial software stack environment planned for the Vanguard-Astra platform (Astra), a description of how the communities of users will utilize the platform during the transition from the open network to the classified network, and initial performance results.

Leveraging Intrinsic Principal Directions for Multifidelity Uncertainty Quantification

Geraci, Gianluca G.; Eldred, Michael S.

In this work we propose an approach for accelerating Uncertainty Quantification (UQ) analysis in the context of multifidelity applications. In the presence of complex multiphysics applications, which often require a prohibitive computational cost for each evaluation, multifidelity UQ techniques try to accelerate the convergence of statistics by leveraging the information collected from a larger number of lower-fidelity model realizations. However, at the state of the art, the performance of virtually all multifidelity UQ techniques depends on the correlation between the high- and low-fidelity models. In this work we design a multifidelity UQ framework based on the identification of independent important directions for each model. The main idea is that if the responses of each model can be represented in a common space, that space can be shared to enhance the correlation when samples are drawn with respect to it instead of the original variables. Two additional advantages follow from this approach. First, the models might be correlated even if their original parametrizations are chosen independently. Second, if the shared space between models has a lower dimensionality than the original spaces, the UQ analysis might benefit from a dimension-reduction standpoint. We designed this general framework and tested it on several problems, ranging from analytical functions for verification purposes up to more challenging application problems such as an aero-thermo-structural analysis and a scramjet flow analysis.
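
The correlation-driven speedup the abstract refers to is easiest to see in a basic control-variate estimator; the model pair and sample sizes below are illustrative, not the shared-subspace construction of the paper.

```python
import numpy as np

rng = np.random.default_rng(3)

def f_hi(x):  # "expensive" model (hypothetical)
    return np.exp(x) + 0.1 * np.sin(10 * x)

def f_lo(x):  # cheap, well-correlated low-fidelity model
    return np.exp(x)

# Control-variate multifidelity mean estimate: few high-fidelity samples,
# many low-fidelity samples; the models' correlation does the heavy lifting.
x_hi = rng.normal(size=50)
x_lo = rng.normal(size=50_000)
alpha = np.cov(f_hi(x_hi), f_lo(x_hi))[0, 1] / np.var(f_lo(x_hi))
est = f_hi(x_hi).mean() + alpha * (f_lo(x_lo).mean() - f_lo(x_hi).mean())
print(est)   # compare with E[exp(X)] = exp(0.5) ~ 1.6487
```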

Neural Algorithms for Low Power Implementation of Partial Differential Equations

Aimone, James B.; Hill, Aaron J.; Lehoucq, Richard B.; Parekh, Ojas D.; Reeder, Leah E.; Severa, William M.

The rise of low-power neuromorphic hardware has the potential to change high-performance computing; however, much of the focus on brain-inspired hardware has been on machine learning applications. A low-power solution for solving partial differential equations could radically change how we approach large-scale computing in the future. The random walk is a fundamental stochastic process that underlies many numerical tasks in scientific computing applications. We consider here two neural algorithms that can be used to efficiently implement random walks on spiking neuromorphic hardware. The first method tracks the positions of individual walkers independently by using a modular code inspired by grid cells in the brain. The second method tracks the densities of random walkers at each spatial location directly. We present the scaling complexity of each of these methods and illustrate their ability to model random walkers under different probabilistic conditions. Finally, we present implementations of these algorithms on neuromorphic hardware.

Sample Generation for Nuclear Data

Swiler, Laura P.; Adams, Brian M.; Wieselquist, William

This report summarizes a NEAMS (Nuclear Energy Advanced Modeling and Simulation) project focused on developing a sampling capability that can handle the challenges of generating samples from nuclear cross-section data. The covariance information between energy groups tends to be very ill-conditioned and thus poses a problem for traditional methods of generating correlated samples. This report outlines a method that addresses sample generation from cross-section covariance matrices. The treatment allows one to assume the cross sections are distributed according to a multivariate normal, lognormal, or truncated normal distribution.
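
A sketch of one common way to draw correlated samples from an ill-conditioned covariance: eigendecomposition with clipping of the spurious negative eigenvalues round-off produces. Whether this matches the report's exact treatment is an assumption; lognormal samples would follow by exponentiating.

```python
import numpy as np

rng = np.random.default_rng(5)

def sample_mvn(mean, cov, n, floor=0.0):
    """Draw correlated samples from a (possibly ill-conditioned) covariance
    via eigendecomposition, clipping tiny negative eigenvalues that arise
    from round-off so the factorization stays real."""
    w, V = np.linalg.eigh(cov)
    w = np.clip(w, floor, None)
    L = V * np.sqrt(w)                 # cov ~= L @ L.T
    z = rng.standard_normal((n, len(mean)))
    return mean + z @ L.T

# Nearly singular 3x3 covariance (strong cross-group correlation).
cov = np.array([[1.0, 0.999, 0.5],
                [0.999, 1.0, 0.5],
                [0.5, 0.5, 1.0]])
samples = sample_mvn(np.zeros(3), cov, 100_000)
print(np.cov(samples.T).round(3))     # recovers cov to sampling error
```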

Underlying one-step methods and nonautonomous stability of general linear methods

Discrete and Continuous Dynamical Systems - Series B

Steyer, Andrew S.; Van Vleck, Erik S.

We generalize the theory of underlying one-step methods to strictly stable general linear methods (GLMs) solving nonautonomous ordinary differential equations (ODEs) that satisfy a global Lipschitz condition. We combine this theory with the Lyapunov and Sacker-Sell spectral stability theory for one-step methods developed in [34, 35, 36] to analyze the stability of a strictly stable GLM solving a nonautonomous linear ODE. These results are applied to develop a stability diagnostic for the solution of nonautonomous linear ODEs by strictly stable GLMs.

A Lyapunov and Sacker–Sell spectral stability theory for one-step methods

BIT Numerical Mathematics

Steyer, Andrew S.; Van Vleck, Erik S.

Approximation theory for Lyapunov and Sacker–Sell spectra based upon QR techniques is used to analyze the stability of a one-step method solving a time-dependent (nonautonomous) linear ordinary differential equation (ODE) initial value problem in terms of the local error. Integral separation is used to characterize the conditioning of stability spectra calculations. The stability of the numerical solution by a one-step method of a nonautonomous linear ODE using real-valued, scalar, nonautonomous linear test equations is justified. This analysis is used to approximate exponential growth/decay rates on finite and infinite time intervals and establish global error bounds for one-step methods approximating uniformly, exponentially stable trajectories of nonautonomous and nonlinear ODEs. A time-dependent stiffness indicator and a one-step method that switches between explicit and implicit Runge–Kutta methods based upon time-dependent stiffness are developed based upon the theoretical results.

Vanguard Astra and ATSE – an ARM-based Advanced Architecture Prototype System and Software Environment (FY18 L2 Milestone #8759 Report)

Laros, James H.; Hammond, Simon D.; Aguilar, Michael J.; Curry, Matthew L.; Grant, Ryan E.; Hoekstra, Robert J.; Klundt, Ruth A.; Monk, Stephen T.; Ogden, Jeffry B.; Olivier, Stephen L.; Scott, Randall D.; Ward, Harry L.; Younge, Andrew J.

The Vanguard program informally began in January 2017 with the submission of a white paper entitled "Sandia's Vision for a 2019 Arm Testbed" to NNSA headquarters. The program proceeded in earnest in May 2017 with an announcement by Doug Wade (Director, Office of Advanced Simulation and Computing and Institutional R&D at NNSA) that Sandia National Laboratories (Sandia) would host the first Advanced Architecture Prototype platform based on the Arm architecture. In August 2017, Sandia formed a Tri-lab team chartered to develop a robust HPC software stack for Astra to support the Vanguard program goal of demonstrating the viability of Arm in supporting ASC production computing workloads.

Validation metrics for deterministic and probabilistic data

Journal of Verification, Validation and Uncertainty Quantification

Maupin, Kathryn A.; Swiler, Laura P.; Porter, Nathan W.

Computational modeling and simulation are paramount to modern science. Computational models often replace physical experiments that are prohibitively expensive, dangerous, or occur at extreme scales. Thus, it is critical that these models accurately represent and can be used as replacements for reality. This paper provides an analysis of metrics that may be used to determine the validity of a computational model. While some metrics have a direct physical meaning and a long history of use, others, especially those that compare probabilistic data, are more difficult to interpret. Furthermore, the process of model validation is often application-specific, making the procedure itself challenging and the results difficult to defend. We therefore provide guidance and recommendations as to which validation metric to use, as well as how to use and decipher the results. An example is included that compares interpretations of various metrics and demonstrates the impact of model and experimental uncertainty on validation processes.

Building 725 Expansion

Lacy, Susan L.; Noe, John P.; Ogden, Jeffry B.; Hammond, Simon D.

In October 2017, Sandia broke ground for a new computing center dedicated to High Performance Computing. The east expansion of Building 725 was entirely conceived, designed, and built in less than 18 months and is a LEED Gold certified building, the first of its kind for a data center in the State of New Mexico. This 15,000 square-foot building, with novel energy- and water-saving technologies, will house Astra, the first in a new generation of Advanced Architecture Prototype Systems to be deployed by the NNSA and the first of many HPC systems in Building 725 East.

Opal: A Centralized Memory Manager for Investigating Disaggregated Memory Systems

Kommareddy, Vamsee R.; Hughes, Clayton H.; Hammond, Simon D.; Awad, Amro

Many modern applications have memory footprints that are increasingly large, driving system memory capacities higher and higher. Moreover, these systems are often organized such that the bulk of the memory is collocated with the compute capability, which necessitates message-passing APIs to facilitate information sharing between compute nodes. Due to the diversity of applications that must run on High-Performance Computing (HPC) systems, memory utilization can fluctuate wildly from one application to another. And because memory is located in the node, maintenance can become problematic: each node must be taken offline and upgraded individually. To address these issues, vendors are exploring disaggregated, memory-centric systems. In this type of organization, there are discrete nodes, reserved solely for memory, which are shared across many compute nodes. Due to their capacity, low power consumption, and non-volatility, Non-Volatile Memories (NVMs) are ideal candidates for these memory nodes. This report discusses a new component for the Structural Simulation Toolkit (SST), Opal, that can be used to study the impact of using NVMs in a disaggregated system in terms of performance, security, and memory management.

Highly scalable discrete-particle simulations with novel coarse-graining: accessing the microscale

Molecular Physics

Mattox, Timothy I.; Larentzos, James P.; Moore, Stan G.; Stone, Christopher P.; Ibanez-Granados, Daniel A.; Thompson, Aidan P.; Lisal, Martin; Brennan, John K.; Plimpton, Steven J.

Simulating energetic materials with complex microstructure is a grand challenge, where until recently, an inherent gap in computational capabilities had existed in modelling grain-scale effects at the microscale. We have enabled a critical capability in modelling the multiscale nature of the energy release and propagation mechanisms in advanced energetic materials by implementing, in the widely used LAMMPS molecular dynamics (MD) package, several novel coarse-graining techniques that also treat chemical reactivity. Our innovative algorithmic developments rooted within the dissipative particle dynamics framework, along with performance optimisations and application of acceleration technologies, have enabled extensions in both the length and time scales far beyond those ever realised by atomistic reactive MD simulations. In this paper, we demonstrate these advances by modelling a shockwave propagating through a microstructured material and comparing performance with the state-of-the-art in atomistic reactive MD techniques. As a result of this work, unparalleled explorations in energetic materials research are now possible.

Investment optimization to improve power system resilience

2018 International Conference on Probabilistic Methods Applied to Power Systems, PMAPS 2018 - Proceedings

Pierre, Brian J.; Arguello, Bryan A.; Staid, Andrea S.; Guttromson, Ross G.

Power system utilities continue to strive for increased system resiliency. However, quantifying baseline system resilience and deciding on the optimal investments to improve it are challenging. This paper discusses a method to create scenarios, based on historical data, that represent the threats of severe weather events, their probability of occurrence, and the system-wide consequences they generate. This paper also presents a mixed-integer stochastic nonlinear optimization model which uses the scenarios as an input to determine the optimal investments to reduce the system impacts from those scenarios. The optimization model utilizes a DC power flow to determine the loss of load during an event. Loss of load is the consequence minimized as the objective function of the optimization model. The results shown in this paper are from the IEEE RTS-96 three-area reliability model. The scenario generation and optimization model have also been utilized on full utility models, but those results cannot be published.

Stochastic unit commitment performance considering monte carlo wind power scenarios

2018 International Conference on Probabilistic Methods Applied to Power Systems, PMAPS 2018 - Proceedings

Rachunok, Benjamin A.; Staid, Andrea S.; Watson, Jean-Paul W.; Woodruff, David L.; Yang, Dominic

Stochastic versions of the unit commitment problem have been advocated for addressing the uncertainty presented by high levels of wind power penetration. However, little work has been done to study trade-offs between computational complexity and the quality of solutions obtained as the number of probabilistic scenarios is varied. Here, we describe extensive experiments using real publicly available wind power data from the Bonneville Power Administration. Solution quality is measured by re-enacting day-ahead reliability unit commitment (which selects the thermal units that will be used each hour of the next day) and real-time economic dispatch (which determines generation levels) for an enhanced WECC-240 test system in the context of a production cost model simulator; outputs from the simulation, including cost, reliability, and computational performance metrics, are then analyzed. Unsurprisingly, we find that both solution quality and computational difficulty increase with the number of probabilistic scenarios considered. However, we find unexpected transitions in computational difficulty at a specific threshold in the number of scenarios, and report on key trends in solution performance characteristics. Our findings are novel in that we examine these tradeoffs using real-world wind power data in the context of an out-of-sample production cost model simulation, and are relevant for both practitioners interested in deploying and researchers interested in developing scalable solvers for stochastic unit commitment.

Generation and application of multivariate polynomial quadrature rules

Computer Methods in Applied Mechanics and Engineering

Jakeman, John D.; Narayan, Akil

The search for multivariate quadrature rules of minimal size with a specified polynomial accuracy has been the topic of many years of research. Finding such a rule allows accurate integration of moments, which play a central role in many aspects of scientific computing with complex models. The contribution of this paper is twofold. First, we provide novel mathematical analysis of the polynomial quadrature problem that provides a lower bound for the minimal possible number of nodes in a polynomial rule with specified accuracy. We give concrete but simplistic multivariate examples where a minimal quadrature rule can be designed that achieves this lower bound, along with situations that showcase when it is not possible to achieve this lower bound. Our second contribution is the formulation of an algorithm that is able to efficiently generate multivariate quadrature rules with positive weights on non-tensorial domains. Our tests show success of this procedure in up to 20 dimensions. We test our method on applications to dimension reduction and chemical kinetics problems, including comparisons against popular alternatives such as sparse grids, Monte Carlo and quasi Monte Carlo sequences, and Stroud rules. The quadrature rules computed in this paper outperform these alternatives in almost all scenarios.
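
A quick illustration of the polynomial-exactness property that defines such rules, checked here on a tensor-product Gauss-Legendre rule (the baseline that a minimal-size rule improves upon); the degree and dimension are arbitrary choices.

```python
import numpy as np
from itertools import product

# Verify polynomial exactness of a tensor-product Gauss-Legendre rule on
# [-1,1]^2. Exact monomial integral on [-1,1]: 2/(k+1) for even k, else 0.
x, w = np.polynomial.legendre.leggauss(4)      # degree-7 exactness in 1-D
nodes = np.array(list(product(x, x)))
weights = np.array([wi * wj for wi, wj in product(w, w)])

for i, j in product(range(5), range(5)):
    quad = np.sum(weights * nodes[:, 0] ** i * nodes[:, 1] ** j)
    exact = (2 / (i + 1) if i % 2 == 0 else 0) * (2 / (j + 1) if j % 2 == 0 else 0)
    assert abs(quad - exact) < 1e-12
print("tensor rule with", len(weights), "nodes is exact up to degree 7 per variable")
```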

Gradient-based optimization for regression in the functional tensor-train format

Journal of Computational Physics

Gorodetsky, Alex A.; Jakeman, John D.

Predictive analysis of complex computational models, such as uncertainty quantification (UQ), must often rely on using an existing database of simulation runs. In this paper we consider the task of performing low-multilinear-rank regression on such a database. Specifically, we develop and analyze an efficient gradient computation that enables gradient-based optimization procedures, including stochastic gradient descent and quasi-Newton methods, for learning the parameters of a functional tensor-train (FT). We compare our algorithms with 22 other nonparametric and parametric regression methods on 10 real-world data sets and show that for many physical systems, exploiting low-rank structure facilitates efficient construction of surrogate models. We also use a number of synthetic functions to build insight into the behavior of our algorithms, including the rank-adaptation and group-sparsity regularization procedures that we developed to reduce overfitting. Finally, we conclude the paper by building a surrogate of a physical model of a propulsion plant on a naval vessel.
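
The flavor of gradient-based low-rank regression can be shown with a two-core separated representation, a drastically simplified stand-in for a functional tensor-train; the features, rank, and step size below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Fit f(x, y) ~= sum_r g_r(x) * h_r(y), with g and h linear in their
# coefficients, by alternating gradient descent on squared error.
def features(t):                       # simple polynomial features
    return np.stack([np.ones_like(t), t, t ** 2], axis=1)

x, y = rng.uniform(-1, 1, (2, 500))
f = np.exp(x * y)                      # target with low separated rank
rank, lr = 3, 0.05
G = rng.normal(scale=0.1, size=(3, rank))   # coefficients of the g_r
H = rng.normal(scale=0.1, size=(3, rank))   # coefficients of the h_r

Phi_x, Phi_y = features(x), features(y)
for _ in range(2000):
    pred = np.sum((Phi_x @ G) * (Phi_y @ H), axis=1)
    resid = pred - f
    G -= lr * Phi_x.T @ (resid[:, None] * (Phi_y @ H)) / len(f)
    H -= lr * Phi_y.T @ (resid[:, None] * (Phi_x @ G)) / len(f)

pred = np.sum((Phi_x @ G) * (Phi_y @ H), axis=1)
print("rank-3 fit RMSE:", np.sqrt(np.mean((pred - f) ** 2)))
```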

Wireless Temperature Sensing Using Permanent Magnets for Nonlinear Feedback Control of Exothermic Polymers

IEEE Sensors Journal

Mazumdar, Anirban; Mazumdar, Yi C.; van Bloemen Waanders, Bart G.; Brooks, Carlton F.; Kuehl, Michael K.; Nemer, Martin N.

Epoxies and resins can require careful temperature sensing and control in order to monitor and prevent degradation. To sense the temperature inside a mold, it is desirable to utilize a small, wireless sensing element. In this paper, we describe a new architecture for wireless temperature sensing and closed-loop temperature control of exothermic polymers. This architecture is the first to utilize magnetic-field estimates of the temperature of permanent magnets within a temperature feedback control loop. We further improve performance and applicability by demonstrating sensing performance at relevant temperatures, incorporating a cure estimator, and implementing a nonlinear temperature controller. This novel architecture enables unique experimental results featuring closed-loop control of an exothermic resin without any physical connection to the inside of the mold. In this paper we describe each of the unique features of this approach, including magnetic-field-based temperature sensing, Extended Kalman Filtering (EKF) for cure state estimation, and nonlinear feedback control over time-varying temperature trajectories. We use experimental results to demonstrate how low-cost permanent magnets can provide wireless temperature sensing up to ~90°C. In addition, we use a polymer cure-control test bed to illustrate how internal temperature sensing can provide improved temperature control over both short and long time scales. In conclusion, this wireless temperature sensing and control architecture holds value for a range of manufacturing applications.
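
A generic scalar extended Kalman filter step, sketching the estimator role described in the abstract; the cure-kinetics and measurement maps are hypothetical toys, not the paper's calibrated models.

```python
import numpy as np

def ekf_step(x, P, z, Q=1e-4, R=1e-2):
    """One predict/update cycle of a scalar EKF."""
    f = lambda x: x + 0.1 * (1 - x)        # toy cure kinetics (hypothetical)
    h = lambda x: x ** 2                   # toy nonlinear measurement map
    F = 0.9                                # df/dx
    x_pred = f(x)
    P_pred = F * P * F + Q
    H = 2 * x_pred                         # dh/dx at the prediction
    K = P_pred * H / (H * P_pred * H + R)  # Kalman gain
    x_new = x_pred + K * (z - h(x_pred))
    P_new = (1 - K * H) * P_pred
    return x_new, P_new

x, P = 0.0, 1.0
true = 0.0
rng = np.random.default_rng(4)
for _ in range(50):
    true = true + 0.1 * (1 - true)               # simulated "true" cure state
    z = true ** 2 + rng.normal(scale=0.1)        # noisy sensor reading
    x, P = ekf_step(x, P, z)
print(f"estimate {x:.3f} vs true {true:.3f}")
```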

Open science on Trinity's knights landing partition: An analysis of user job data

ACM International Conference Proceeding Series

Levy, Scott L.; Laros, James H.; Ferreira, Kurt B.

High-performance computing (HPC) systems are critically important to the objectives of universities, national laboratories, and commercial companies. Because of the cost of deploying and maintaining these systems, ensuring their efficient use is imperative. Job scheduling and resource management are critically important to the efficient use of HPC systems. As a result, significant research has been conducted on how to effectively schedule user jobs on HPC systems. Developing and evaluating job scheduling algorithms, however, requires a detailed understanding of how users request resources on HPC systems. In this paper, we examine a corpus of job data that was collected on Trinity, a leadership-class supercomputer. During the stabilization period of its Intel Xeon Phi (Knights Landing) partition, it was made available to users outside of a classified environment for the Trinity Open Science Phase 2 campaign. We collected information from the resource manager about each user job that was run during this Open Science period. In this paper, we examine the jobs contained in this dataset. Our analysis reveals several important characteristics of the jobs submitted during the Open Science period and provides critical insight into the use of one of the most powerful supercomputers in existence. Specifically, these data provide important guidance for the design, development, and evaluation of job scheduling and resource management algorithms.

Predictive Science ASC Alliance Program (PSAAP) II: 2018 Review of the Carbon Capture Multidisciplinary Science Center (CCMSC) at the University of Utah

Hoekstra, Robert J.; Hungerford, Aimee L.; Montoya, David R.; Ferencz, Robert M.; Kuhl, Alan L.; Ruggirello, Kevin P.

The review team convened at the University of Utah March 7-8, 2018, to review the Carbon Capture Multidisciplinary Science Center (CCMSC) funded by the 2nd Predictive Science ASC Alliance Program (PSAAP II). Center leadership and researchers made very clear and informative presentations, accurately portraying their work and successes while candidly discussing their concerns and known areas in need of improvement.

The case for semi-permanent cache occupancy

ACM International Conference Proceeding Series

Dosanjh, Matthew D.; Ghazimirsaeed, S.M.; Grant, Ryan E.; Schonbein, William W.; Levenhagen, Michael J.; Bridges, Patrick G.; Afsahi, Ahmad

The performance-critical path for MPI implementations relies on fast receive-side operation, which in turn requires fast list traversal. The performance of list traversal depends on data locality: whether the data is currently contained in a close-to-core cache due to its temporal locality, or whether its spatial locality allows for predictable pre-fetching. In this paper, we explore the effects of data locality on the MPI matching problem by examining both forms of locality. First, we explore spatial locality: by combining multiple entries into a single linked-list element, we can control and modify this form of locality. Second, we explore temporal locality by utilizing a new technique called “hot caching”, a process that creates a thread to periodically access certain data, increasing its temporal locality. In this paper, we show that by increasing data locality, we can improve MPI performance on a variety of architectures by up to 4x for micro-benchmarks and up to 2x for an application.
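
The spatial-locality idea (multiple match entries per list node) reduces pointer chasing; below is a language-level sketch, with Python standing in for the cache-aware C implementation.

```python
from collections import deque

class UnrolledList:
    """Linked list whose nodes hold several match entries apiece, as in
    the paper's spatial-locality experiment: one cache-line-sized node
    serves multiple traversal steps."""
    def __init__(self, entries_per_node=8):
        self.k = entries_per_node
        self.nodes = deque()               # each node is a small list

    def append(self, entry):
        if not self.nodes or len(self.nodes[-1]) == self.k:
            self.nodes.append([])
        self.nodes[-1].append(entry)

    def find_and_remove(self, key):
        for node in self.nodes:            # one pointer chase per k entries
            for i, entry in enumerate(node):
                if entry == key:
                    return node.pop(i)
        return None

q = UnrolledList()
for tag in range(20):
    q.append((0, tag))                     # (source, tag) match keys
print(q.find_and_remove((0, 13)))
```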

Tacho: Memory-scalable task parallel sparse cholesky factorization

Proceedings - 2018 IEEE 32nd International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2018

Kim, Kyungjoo K.; Edwards, Harold C.; Rajamanickam, Sivasankaran R.

We present a memory-scalable, parallel, sparse multifrontal solver for solving symmetric positive-definite systems arising in scientific and engineering applications. Factorizing sparse matrices requires memory for both the computed factors and the temporary workspaces for computing each frontal matrix, a data structure commonly used within multifrontal methods. To factorize multiple frontal matrices in parallel, the conventional approach is to allocate a uniform workspace for each hardware thread. In the manycore era, this results in memory usage that increases proportionally to the number of hardware threads. We remedy this problem by using dynamic task parallelism with a scalable memory pool. Tasks are spawned while traversing an assembly tree and executed after their dependences are satisfied. Temporary workspace for frontal matrices in each task is allocated from a memory pool of our own design. If the requested memory space is not available in the pool, the task is respawned, yielding the hardware thread to execute other tasks; the respawned task is executed after higher-priority tasks complete. This approach provides robust parallel performance within a bounded memory space. Experimental results demonstrate the merits of our implementation on Intel multicore and manycore architectures.
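
A toy of the respawn-on-allocation-failure idea: deferred tasks are retried after running tasks free pool memory. Sizes and task names are invented; the real implementation uses hardware threads and a concurrent memory pool.

```python
from collections import deque

POOL = 4096   # bytes in the shared workspace pool (illustrative)

def run_in_waves(tasks):
    """Bounded-memory scheduling in the spirit of the paper: tasks whose
    frontal workspace does not fit in the pool are respawned (requeued)
    and executed after the current wave frees its memory."""
    queue = deque(tasks)
    while queue:
        wave, used = [], 0
        deferred = deque()
        while queue:
            name, need = queue.popleft()
            if used + need <= POOL:
                wave.append(name); used += need      # allocate from pool
            else:
                deferred.append((name, need))        # respawn for later
        print("wave:", wave, f"({used} B)")
        queue = deferred                             # pool freed; retry

run_in_waves([("front-A", 3000), ("front-B", 2500),
              ("front-C", 1000), ("front-D", 2000)])
```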

Optimal cooperative checkpointing for shared high-performance computing platforms

Proceedings - 2018 IEEE 32nd International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2018

Herault, Thomas; Robert, Yves; Bouteiller, Aurelien; Arnold, Dorian; Ferreira, Kurt B.; Bosilca, George; Dongarra, Jack

In high-performance computing environments, input/output (I/O) from various sources often contends for scarce available bandwidth. Adding to the I/O operations inherent in the failure-free execution of an application, I/O from checkpoint/restart (CR) operations (used to ensure progress in the presence of failures) places an additional burden as it increases I/O contention, leading to degraded performance. In this work, we consider a cooperative scheduling policy that optimizes the overall performance of concurrently executing CR-based applications which share valuable I/O resources. First, we provide a theoretical model and then derive a set of necessary constraints needed to minimize the global waste on the platform. Our results demonstrate that the optimal checkpoint interval, as defined by Young/Daly, despite providing a sensible metric for a single application, is not sufficient to optimally address resource contention at the platform scale. We therefore show that combining optimal checkpointing periods with I/O scheduling strategies can provide a significant improvement on the overall application performance, thereby maximizing platform throughput. Overall, these results provide critical analysis and direct guidance on checkpointing large-scale workloads in the presence of competing I/O while minimizing the impact on application performance.
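
For reference, the first-order checkpoint interval the abstract attributes to Young/Daly; the formula is standard, and the paper's point is precisely that it is insufficient under shared-platform I/O contention.

```latex
% First-order optimal checkpoint period for a single application,
% with checkpoint cost C and platform MTBF \mu:
T_{\mathrm{opt}} \approx \sqrt{2 \mu C}.
% Per-application optimality ignores contention, so the paper adds
% platform-level scheduling of the checkpoint I/O on top of it.
```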

A comparison of power management mechanisms: P-States vs. node-level power cap control

Proceedings - 2018 IEEE 32nd International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2018

Laros, James H.; Grant, Ryan E.; Levenhagen, Michael J.; Olivier, Stephen L.; Ward, Harry L.; Younge, Andrew J.

Large-scale HPC systems increasingly incorporate sophisticated power management control mechanisms. While these mechanisms are potentially useful for performing energy and/or power-aware job scheduling and resource management (EPA JSRM), greater understanding of their operation and performance impact on real-world applications is required before they can be applied effectively in practice. In this paper, we compare static p-state control to static node-level power cap control on a Cray XC system. Empirical experiments are performed to evaluate node-to-node performance and power usage variability for the two mechanisms. We find that static p-state control produces more predictable and higher performance characteristics than static node-level power cap control at a given power level. However, this performance benefit is at the cost of less predictable power usage. Static node-level power cap control produces predictable power usage but with more variable performance characteristics. Our results are not intended to show that one mechanism is better than the other. Rather, our results demonstrate that the mechanisms are complementary to one another and highlight their potential for combined use in achieving effective EPA JSRM solutions.

Level-spread: A new job allocation policy for dragonfly networks

Proceedings - 2018 IEEE 32nd International Parallel and Distributed Processing Symposium, IPDPS 2018

Zhang, Yijia; Tuncer, Ozan; Kaplan, Fulya; Olcoz, Katzalin; Leung, Vitus J.; Coskun, Ayse K.

The dragonfly network topology has attracted attention in recent years owing to its high radix and constant diameter. However, the influence of job allocation on communication time in dragonfly networks is not fully understood. Recent studies have shown that random allocation is better at balancing the network traffic, while compact allocation is better at harnessing the locality in dragonfly groups. Based on these observations, this paper introduces a novel allocation policy called Level-Spread for dragonfly networks. This policy spreads jobs within the smallest network level that a given job can fit in at the time of its allocation. In this way, it simultaneously harnesses node adjacency and balances link congestion. To evaluate the performance of Level-Spread, we run packet-level network simulations using a diverse set of application communication patterns, job sizes, and communication intensities. We also explore the impact of network properties such as the number of groups, number of routers per group, machine utilization level, and global link bandwidth. Level-Spread reduces the communication overhead by 16% on average (and up to 71%) compared to state-of-the-art allocation policies.
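
The policy is easy to state in code; below is a sketch with invented machine dimensions that ignores the actual node-selection and simulation details.

```python
# Sketch of the Level-Spread idea (names and sizes are illustrative):
# allocate each job within the smallest dragonfly level that can hold it.
GROUP_ROUTERS, GROUPS = 8, 16

def pick_level(job_nodes, free):
    """free[g][r] = free nodes on router r of group g. Returns the level
    the job is spread within: 'router', 'group', or 'system'."""
    for g in range(GROUPS):
        for r in range(GROUP_ROUTERS):
            if free[g][r] >= job_nodes:
                return "router"                 # fits on a single router
    for g in range(GROUPS):
        if sum(free[g]) >= job_nodes:
            return "group"                      # fits within one group
    return "system"                             # spread across groups

free = [[1] * GROUP_ROUTERS for _ in range(GROUPS)]
free[2][5] = 4
print(pick_level(3, free))   # "router": harnesses node adjacency
print(pick_level(6, free))   # "group": balances links within a group
```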

Hybrid Finite Element--Spectral Method for the Fractional Laplacian: Approximation Theory and Efficient Solver

SIAM Journal on Scientific Computing

Glusa, Christian A.; Ainsworth, Mark

Here, a numerical scheme is presented for approximating fractional order Poisson problems in two and three dimensions. The scheme is based on reformulating the original problem posed over $\Omega$ as a problem on the extruded domain $\mathcal{C}=\Omega\times[0,\infty)$. The resulting degenerate elliptic integer-order PDE is then approximated using a hybrid FEM-spectral scheme. Finite elements are used in the direction parallel to the problem domain $\Omega$, and an appropriate spectral method is used in the extruded direction. The spectral part of the scheme requires that we approximate the true eigenvalues of the integer-order Laplacian over $\Omega$. We derive an a priori error estimate which takes account of the error arising from using an approximation in place of the true eigenvalues. We further present a strategy for choosing approximations of the eigenvalues based on Weyl's law and finite element discretizations of the eigenvalue problem. The system of linear algebraic equations arising from the hybrid FEM-spectral scheme is decomposed into blocks which can be solved effectively using standard iterative solvers such as multigrid and conjugate gradient. Numerical examples in two and three dimensions suggest that the approach is quasi-optimal in terms of complexity.
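
The extrusion the abstract describes is commonly written as the following extension identity, stated here in its standard form (which may differ from the paper's normalization):

```latex
% Fractional Poisson problem (-\Delta)^s u = f on \Omega, 0 < s < 1,
% recast as a degenerate integer-order problem on the extruded cylinder:
\nabla \cdot \left( y^{1-2s} \nabla U \right) = 0
  \quad \text{in } \mathcal{C} = \Omega \times (0, \infty),
  \qquad U(x, 0) = u(x),
% with the fractional operator recovered from the weighted normal trace
-\lim_{y \to 0^{+}} y^{1-2s} \, \partial_y U(x, y) = d_s \, f(x),
% where d_s > 0 is a constant depending only on s.
```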

Unraveling network-induced memory contention: Deeper insights with machine learning

IEEE Transactions on Parallel and Distributed Systems

Groves, Taylor G.; Grant, Ryan E.; Gonzales, Aaron; Arnold, Dorian

Remote Direct Memory Access (RDMA) is expected to be an integral communication mechanism for future exascale systems - enabling asynchronous data transfers, so that applications may fully utilize CPU resources while simultaneously sharing data amongst remote nodes. In this work we examine Network-induced Memory Contention (NiMC) on Infiniband networks. We expose the interactions between RDMA, main-memory and cache, when applications and out-of-band services compete for memory resources. We then explore NiMC's resulting impact on application-level performance. For a range of hardware technologies and HPC workloads, we quantify NiMC and show that NiMC's impact grows with scale resulting in up to 3X performance degradation at scales as small as 8K processes even in applications that previously have been shown to be performance resilient in the presence of noise. Additionally, this work examines the problem of predicting NiMC's impact on applications by leveraging machine learning and easily accessible performance counters. This approach provides additional insights about the root cause of NiMC and facilitates dynamic selection of potential solutions. Lastly, we evaluated three potential techniques to reduce NiMC's impact, namely hardware offloading, core reservation and software-based network throttling.

Performance of preconditioned iterative solvers in MFiX–Trilinos for fluidized beds

Journal of Supercomputing

Spotz, William S.; Krushnarao Kotteda, V.M.; Kumar, Vinod

MFiX, a general-purpose Fortran-based suite, simulates the complex flow in fluidized-bed applications via BiCGStab and GMRES methods along with plane-relaxation preconditioners. Trilinos, an object-oriented framework, contains various first- and second-generation Krylov subspace solvers and preconditioners. Because MFiX does not possess advanced linear methods, we developed a framework to integrate MFiX with Trilinos. The framework allows MFiX to access advanced linear solvers and preconditioners in Trilinos; the integrated solver is hereafter called MFiX–Trilinos. In the present work, we study the performance of variants of the GMRES and CGS methods in MFiX–Trilinos and of the BiCGStab and GMRES solvers in MFiX for a 3D gas–solid fluidized-bed problem. The two right preconditioners employed along with the solvers in MFiX–Trilinos are Jacobi and smoothed aggregation. The flow computed by MFiX–Trilinos is validated against that from MFiX for the BiCGStab and GMRES methods, and the effect of preconditioning on the iterative solvers in MFiX–Trilinos is analyzed. In addition, the effect of left and right smoothed-aggregation preconditioning on the solvers is studied. The performance of the first- and second-generation solver stacks in MFiX–Trilinos is studied as well for two different problem sizes.

More Details

Generation and application of multivariate polynomial quadrature rules

Computer Methods in Applied Mechanics and Engineering

Jakeman, John D.; Narayan, Akil

The search for multivariate quadrature rules of minimal size with a specified polynomial accuracy has been the topic of many years of research. Finding such a rule allows accurate integration of moments, which play a central role in many aspects of scientific computing with complex models. The contribution of this paper is twofold. First, we provide novel mathematical analysis of the polynomial quadrature problem that provides a lower bound for the minimal possible number of nodes in a polynomial rule with specified accuracy. We give concrete but simple multivariate examples where a minimal quadrature rule achieves this lower bound, along with situations in which the bound cannot be achieved. Our second contribution is the formulation of an algorithm that efficiently generates multivariate quadrature rules with positive weights on non-tensorial domains. Our tests show success of this procedure in up to 20 dimensions. We test our method on applications to dimension reduction and chemical kinetics problems, including comparisons against popular alternatives such as sparse grids, Monte Carlo and quasi-Monte Carlo sequences, and Stroud rules. The quadrature rules computed in this paper outperform these alternatives in almost all scenarios.
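
As a point of reference for the minimal rules discussed above, here is a sketch of the tensor-product Gauss-Legendre baseline they improve upon: a small Python example integrating a bivariate polynomial exactly on $[-1,1]^2$ (the integrand and degree are illustrative).

import numpy as np

deg = 3                                     # exact per variable up to degree 2*deg-1 = 5
x, w = np.polynomial.legendre.leggauss(deg)
X, Y = np.meshgrid(x, x)                    # tensor grid: deg**2 = 9 nodes
W = np.outer(w, w)

f = lambda u, v: u**4 * v**2 + 3.0 * u * v  # per-variable degree <= 4
quad = np.sum(W * f(X, Y))
exact = (2.0 / 5.0) * (2.0 / 3.0)           # int u^4 du * int v^2 dv; odd cross term integrates to 0
print(quad, exact)                          # agree to machine precision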

More Details

Large-Scale System Monitoring Experiences and Recommendations

Ahlgren, V.; Andersson, S.; Brandt, James M.; Cardo, N.; Chunduri, S.; Enos, J.; Fields, P.; Gentile, Ann C.; Gerber, R.; Gienger, M.; Greenseid, J.; Greiner, A.; Hadri, B.; He, Y.; Hoppe, D.; Kaila, U.; Kelly, K.; Klein, M.; Kristiansen, A.; Leak, S.; Mason, M.; Laros, James H.; Piccinali, J-G; Repik, Jason; Rogers, J.; Salminen, S.; Showerman, M.; Whitney, C.; Williams, J.

Abstract not provided.

Bi-fidelity approximation for uncertainty quantification and sensitivity analysis of irradiated particle-laden turbulence

Geraci, Gianluca G.; Fairbanks, Hillary; Jofre, Lluis; Iaccarino, Gianluca; Doostan, Alireza

Efficiently performing predictive studies of irradiated particle-laden turbulent flows has the potential of providing significant contributions towards better understanding and optimizing, for example, concentrated solar power systems. As there are many uncertainties inherent in such flows, conducting uncertainty quantification analyses is fundamental to improving the predictive capabilities of the numerical simulations. For large-scale, multi-physics problems exhibiting high-dimensional uncertainty, characterizing the stochastic solution presents a significant computational challenge, as many methods require a large number of high-fidelity forward model solves. This requirement results in a possibly infeasible number of simulations when a typical converged high-fidelity simulation requires intensive computational resources. To reduce the cost of quantifying high-dimensional uncertainties, we investigate the application of a non-intrusive, bi-fidelity approximation to estimate statistics of quantities of interest (QoI) associated with an irradiated particle-laden turbulent flow. This method relies on exploiting the low-rank structure of the solution to accelerate the stochastic sampling and approximation processes by means of cheaper-to-run, lower-fidelity representations. The application of this bi-fidelity approximation results in accurate estimates of the QoI statistics while requiring a small number of high-fidelity model evaluations. It also enables efficient computation of sensitivity analyses, which highlight that epistemic uncertainty plays an important role in the solution of irradiated, particle-laden turbulent flow.
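
A minimal numerical sketch of one common bi-fidelity recipe follows, under the usual select-then-correct assumptions: column-pivoted QR on cheap low-fidelity snapshots picks a few informative samples, the high-fidelity model is run only there, and the low-fidelity interpolation coefficients are reused to reconstruct high-fidelity outputs everywhere. The two models, grid, and rank below are illustrative stand-ins, not the irradiated particle-laden flow solvers.

import numpy as np
from scipy.linalg import qr, lstsq

rng = np.random.default_rng(0)
N, r = 200, 5                                    # stochastic samples, HF budget
grid = np.linspace(0.0, 1.0, 64)
params = rng.uniform(0.5, 2.0, size=N)

def lofi(a):   # cheap low-fidelity model (hypothetical)
    return np.sin(a * np.pi * grid)

def hifi(a):   # expensive high-fidelity model (hypothetical): LF plus a fine-scale correction
    return np.sin(a * np.pi * grid) + 0.05 * np.sin(5.0 * a * np.pi * grid)

L = np.stack([lofi(a) for a in params], axis=1)  # LF snapshot matrix, 64 x N
_, _, piv = qr(L, pivoting=True)                 # rank-revealing column pivots
sel = piv[:r]                                    # the r most informative samples
H_sel = np.stack([hifi(params[i]) for i in sel], axis=1)

C = lstsq(L[:, sel], L)[0]                       # LF interpolation coefficients, r x N
H_bifi = H_sel @ C                               # bi-fidelity estimate of all HF outputs

H_true = np.stack([hifi(a) for a in params], axis=1)
print("relative error:", np.linalg.norm(H_bifi - H_true) / np.linalg.norm(H_true))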

More Details

Fast Approximate Union Volume in High Dimensions with Line Samples

Mitchell, Scott A.; Awad, Muhammad A.; Ebeida, Mohamed S.; Swiler, Laura P.

The classical problem of calculating the volume of the union of d-dimensional balls is known as "Union Volume." We present line-sampling approximation algorithms for Union Volume. Our methods may be extended to other Boolean operations, such as set-minus, or to other shapes, such as hyper-rectangles. The deterministic, exact approaches for Union Volume do not scale well to high dimensions; however, we adapt several of these exact approaches into approximation algorithms based on sampling. We perform local sampling within each ball using lines. We have several variations, depending on how the overlapping volume is partitioned and on whether radial, axis-aligned, or other line patterns are used. Our variations fall within the family of Monte Carlo sampling, and hence have about the same theoretical convergence rate, $1/\sqrt{M}$, where M is the number of samples. In our limited experiments, line sampling proved more accurate per unit work than point sampling, because a line sample provides more information and the analytic equation for a sphere makes the calculation almost as fast. We performed a limited empirical study of the efficiency of these variations and suggest a more extensive study for future work. We speculate that different ball arrangements, differentiated by the distribution of overlaps in terms of volume and degree, will benefit the most from patterns of line samples that preferentially capture those overlaps.

Acknowledgements. We thank Karl Bringman for explaining his BF-ApproxUnion (ApproxUnion) algorithm [3] to us. We thank Josiah Manson for pointing out that spoke darts oversample the center and we might get a better answer by uniform sampling. We thank Vijay Natarajan for suggesting random chord sampling. The authors are grateful to Brian Adams, Keith Dalbey, and Vicente Romero for useful technical discussions. This work was sponsored by the Laboratory Directed Research and Development (LDRD) Program at Sandia National Laboratories. This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research (ASCR), Applied Mathematics Program. Sandia National Laboratories is a multi-mission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC., a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-NA0003525.
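
For contrast with the line samples above, here is a minimal point-sampling Monte Carlo sketch for the union volume of balls; it converges at the same $1/\sqrt{M}$ rate but extracts less information per sample. The centers and radii are arbitrary illustrative data.

import numpy as np

rng = np.random.default_rng(1)
d, M = 3, 200_000
centers = np.array([[0.0, 0.0, 0.0], [0.5, 0.0, 0.0], [0.0, 0.6, 0.0]])
radii = np.array([0.5, 0.4, 0.45])

# Sample uniformly in a bounding box around all balls
lo = (centers - radii[:, None]).min(axis=0)
hi = (centers + radii[:, None]).max(axis=0)
pts = rng.uniform(lo, hi, size=(M, d))

# A point lies in the union if it is inside any ball
inside = np.zeros(M, dtype=bool)
for c, r in zip(centers, radii):
    inside |= np.sum((pts - c) ** 2, axis=1) <= r * r

print("union volume ~", np.prod(hi - lo) * inside.mean())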

More Details

A Role for IEEE in Quantum Computing

Computer

DeBenedictis, Erik

Will quantum computation become an important milestone in human progress? Passionate advocates and equally passionate skeptics abound. IEEE already provides useful, neutral forums for state-of-the-art science and engineering knowledge, as well as practical benchmarks for evaluating quantum computation. But could the organization do more?

More Details

Kokkos Training Bootcamp WBS STPM12 Milestone 4

Trott, Christian R.; Lopez, Graham; Shipman, Galen

This report documents the completion of milestone STPM12-4, Kokkos Training Bootcamp. The goal of this milestone was to hold a combined tutorial and hackathon bootcamp event for the Kokkos community and prospective users. The Kokkos Bootcamp event was held on-site at Oak Ridge National Lab from July 24 to July 27, 2018. There were over 40 registered participants from 12 institutions, including 7 Kokkos project staff from SNL, LANL, and ORNL. The event consisted of roughly a two-day tutorial session including hands-on exercises, followed by 1.5 days of intensive porting work in which participants explored, ported, and optimized the codes they brought, with the help of Kokkos project experts.

More Details

Gas-kinetic simulation of sustained turbulence in minimal Couette flow

Physical Review Fluids

Gallis, Michail A.; Torczynski, J.R.; Bitter, Neal B.; Koehler, Timothy P.; Plimpton, Steven J.; Papadakis, George

Here, we demonstrate that gas-kinetic methods incorporating molecular chaos can simulate the sustained turbulence that occurs in wall-bounded turbulent shear flows. The direct simulation Monte Carlo method, a gas-kinetic molecular method that enforces molecular chaos for gas-molecule collisions, is used to simulate the minimal Couette flow at Re = 500. The resulting law of the wall, the average wall shear stress, the average kinetic energy, and the continually regenerating coherent structures all agree closely with corresponding results from direct numerical simulation of the Navier-Stokes equations. Finally, these results indicate that molecular chaos for collisions in gas-kinetic methods does not prevent the development of the molecular-scale long-range correlations required to form hydrodynamic-scale turbulent coherent structures.

More Details

Retinal-inspired algorithms for detection of moving objects

ACM International Conference Proceeding Series

Chance, Frances S.; Warrender, Christina E.

The retina plays an important role in animal vision, namely preprocessing visual information before sending it to the brain through the optic nerve. Understanding how the retina does this is of particular relevance for the development and design of neuromorphic sensors, especially those focused on image processing. Our research examines mechanisms of motion processing in the retina. We are specifically interested in detection of moving targets under challenging conditions, in particular small or low-contrast (dim) targets amidst high quantities of clutter or distractor signals. In this paper we compare a classic motion-sensitive cell model, the Hassenstein-Reichardt model, to a model of the OMS (object motion-sensitive) cell, which relies primarily on change detection, and describe scenarios for which each model is better suited. We also examine mechanisms, inspired by features of retinal circuitry, by which performance may be enhanced. For example, lateral inhibition (mediated by amacrine cells) conveys selectivity for small targets to the W3 ganglion cell; we demonstrate that a similar mechanism can be combined with the previously mentioned motion-processing cell models to select small moving targets for further processing.
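
For readers unfamiliar with it, a minimal sketch of the Hassenstein-Reichardt correlator on a toy stimulus follows (a pure delay stands in for the usual low-pass filter; the receptor count, delay, and stimulus timing are illustrative).

import numpy as np

T, delay = 200, 5
stim = np.zeros((T, 2))                   # two adjacent photoreceptors
stim[40:60, 0] = 1.0                      # a bright spot hits receptor 0 first ...
stim[45:65, 1] = 1.0                      # ... then receptor 1 (rightward motion)

a, b = stim[:, 0], stim[:, 1]
a_d = np.concatenate([np.zeros(delay), a[:-delay]])   # delayed copies
b_d = np.concatenate([np.zeros(delay), b[:-delay]])

# Correlate each delayed signal with the opposite undelayed one and subtract
response = a_d * b - b_d * a              # > 0 for rightward, < 0 for leftward
print("net response:", response.sum())    # positive here: rightward motion detected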

More Details

Dynamic Analysis of Executables to Detect and Characterize Malware

Proceedings - 17th IEEE International Conference on Machine Learning and Applications, ICMLA 2018

Smith, Michael R.; Ingram, Joey; Lamb, Christopher L.; Draelos, Timothy J.; Doak, Justin E.; Aimone, James B.; James, Conrad D.

Malware detection and remediation is an ongoing task for computer security and IT professionals. Here, we examine the use of neural algorithms to detect malware using the system calls generated by executables, which mitigates attempts at obfuscation because behavior is monitored rather than code. We examine several deep learning techniques and liquid state machines, baselined against a random forest. The experiments examine the effects of concept drift, to understand how well the algorithms generalize to novel malware samples, by testing them on data that was collected after the training data. The results suggest that each of the examined machine learning algorithms is a viable solution for detecting malware, achieving between 90% and 95% class-averaged accuracy (CAA). In real-world scenarios, the performance evaluation on an operational network may not match the performance achieved in training: the CAA may be about the same, but the values for precision and recall over the malware can change significantly. We structure experiments to highlight these caveats and offer insights into expected performance in operational environments. In addition, we use the induced models to better understand what differentiates malware samples from goodware, which can further be used as a forensics tool to provide directions for investigation and remediation.
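
For clarity, here is a small sketch of class-averaged accuracy as commonly defined (the unweighted mean of per-class recalls), showing how it differs from raw accuracy on imbalanced malware/goodware labels; the labels are toy data.

import numpy as np

def class_averaged_accuracy(y_true, y_pred):
    classes = np.unique(y_true)
    # Per-class recall: fraction of each class's samples predicted correctly
    return float(np.mean([np.mean(y_pred[y_true == c] == c) for c in classes]))

# Toy labels: 0 = goodware, 1 = malware (imbalanced 90/10)
y_true = np.array([0] * 90 + [1] * 10)
y_pred = np.array([0] * 90 + [1] * 5 + [0] * 5)   # misses half the malware
print(class_averaged_accuracy(y_true, y_pred))     # 0.75, versus 0.95 raw accuracy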

More Details

Resilient Computing with Reinforcement Learning on a Dynamical System: Case Study in Sorting

Proceedings of the IEEE Conference on Decision and Control

Faust, Aleksandra; Aimone, James B.; James, Conrad D.; Tapia, Lydia

This paper formulates general computation as a feedback-control problem, which allows the agent to autonomously overcome some limitations of standard procedural language programming: resilience to errors and early program termination. Our formulation considers computation to be trajectory generation in the program's variable space. Computing then becomes a sequential decision making problem, solved with reinforcement learning (RL) and analyzed with Lyapunov stability theory to assess the agent's resilience and progression to the goal. We do this through a case study on a quintessential computer science problem, array sorting. Evaluations show that our RL sorting agent makes steady progress to an asymptotically stable goal, is resilient to faulty components, and performs fewer array manipulations than traditional Quicksort and Bubble sort.
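
A hedged sketch of this framing (not the paper's RL agent): the state is the array, the actions are adjacent swaps, and the inversion count acts as a Lyapunov-like function that a greedy policy drives monotonically to zero, the sorted goal state.

def inversions(a):
    return sum(a[i] > a[j] for i in range(len(a)) for j in range(i + 1, len(a)))

def greedy_sort(a, max_steps=1000):
    a = list(a)
    for _ in range(max_steps):
        if inversions(a) == 0:                # goal state reached
            return a
        # Take the adjacent swap that most decreases the inversion count
        best = max(range(len(a) - 1),
                   key=lambda i: inversions(a)
                   - inversions(a[:i] + [a[i + 1], a[i]] + a[i + 2:]))
        a[best], a[best + 1] = a[best + 1], a[best]
    return a

print(greedy_sort([3, 1, 4, 1, 5, 9, 2, 6]))   # [1, 1, 2, 3, 4, 5, 6, 9]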

More Details

Exploring and quantifying how communication behaviors in proxies relate to real applications

Proceedings of PMBS 2018: Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems, Held in conjunction with SC 2018: The International Conference for High Performance Computing, Networking, Storage and Analysis

Aaziz, Omar R.; Cook, Jeanine C.; Cook, Jonathan E.; Vaughan, Courtenay T.

Proxy applications, or proxies, are simple applications meant to exercise systems in a way that mimics real applications (their parents). However, characterizing the relationship between the behavior of parent and proxy applications is not an easy task. In prior work [1], we presented a data-driven methodology to characterize the relationship between parent and proxy applications based on collecting runtime data from both and then using data analytics to find their correspondence or divergence. We showed that it worked well for hardware counter data, but our initial attempt using MPI function data was less satisfactory. In this paper, we present an exploratory effort at making an improved quantification of the correspondence of communication behavior for proxies and their respective parent applications. We present experimental evidence of positive results using four proxy applications from the current ECP Proxy Application Suite and their corresponding parent applications (in the ECP application portfolio). Results show that each proxy analyzed is representative of its parent with respect to communication data. In conjunction with our method presented in [1] (correspondence between computation and memory behavior), we get a strong understanding of how well a proxy predicts the comprehensive performance of its parent.

More Details

Low thread-count Gustavson: A multithreaded algorithm for sparse matrix-matrix multiplication using perfect hashing

Proceedings of ScalA 2018: 9th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, Held in conjunction with SC 2018: The International Conference for High Performance Computing, Networking, Storage and Analysis

Laros, James H.; Siefert, Christopher S.

Sparse matrix-matrix multiplication is a critical kernel for several scientific computing applications, especially the setup phase of algebraic multigrid. The MPI+X programming model, which is growing in popularity, requires that such kernels be implemented in a way that exploits on-node parallelism. We present a single-pass OpenMP variant of Gustavson's sparse matrix-matrix multiplication algorithm designed for architectures (e.g., CPU or Intel Xeon Phi) with reasonably large memory and modest thread counts (tens of threads, not thousands). These assumptions allow us to exploit perfect hashing and dynamic memory allocation to achieve performance improvements of up to 2x over third-party kernels for matrices derived from algebraic multigrid setup.
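
For reference, here is a minimal single-threaded sketch of Gustavson's row-by-row algorithm in CSR form; a Python dict stands in for the per-thread perfect-hash accumulator described above.

def spgemm_gustavson(A, B):
    """A, B in CSR as (indptr, indices, data); returns C = A @ B in CSR."""
    a_ptr, a_idx, a_val = A
    b_ptr, b_idx, b_val = B
    c_ptr, c_idx, c_val = [0], [], []
    for i in range(len(a_ptr) - 1):
        acc = {}                                       # accumulator for row i of C
        for kk in range(a_ptr[i], a_ptr[i + 1]):       # nonzeros A[i, k]
            k, a_ik = a_idx[kk], a_val[kk]
            for jj in range(b_ptr[k], b_ptr[k + 1]):   # nonzeros B[k, j]
                j = b_idx[jj]
                acc[j] = acc.get(j, 0.0) + a_ik * b_val[jj]
        for j in sorted(acc):                          # emit row i in column order
            c_idx.append(j)
            c_val.append(acc[j])
        c_ptr.append(len(c_idx))
    return c_ptr, c_idx, c_val

# 2x2 example: A = [[1, 2], [0, 3]]; A @ A = [[1, 8], [0, 9]]
A = ([0, 2, 3], [0, 1, 1], [1.0, 2.0, 3.0])
print(spgemm_gustavson(A, A))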

More Details

Physics-Informed Machine Learning for DRAM Error Modeling

2018 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems, DFT 2018

Baseman, Elisabeth; Debardeleben, Nathan; Blanchard, Sean; Moore, Juston; Tkachenko, Olena; Ferreira, Kurt B.; Siddiqua, Taniya; Sridharan, Vilas

As the scale of high performance computing facilities approaches the exascale era, gaining a detailed understanding of hardware failures becomes important. In particular, the extreme memory capacity of modern supercomputers means that data corruption errors which were statistically negligible at smaller scales will become more prevalent. In order to understand hardware faults and mitigate their adverse effects on exascale workloads, we must learn from the behavior of current hardware. In this work, we investigate the predictability of DRAM errors using field data from two recently decommissioned supercomputers: Cielo, at Los Alamos National Laboratory, and Hopper, at Lawrence Berkeley National Laboratory. Due to the volume and complexity of the field data, we apply statistical machine learning to predict the probability of DRAM errors at previously unaccessed locations. We compare the predictive performance of six machine learning algorithms and find that a model incorporating physical knowledge of DRAM spatial structure outperforms purely statistical methods. Our findings both support the expected physical behavior of DRAM hardware and provide a mechanism for real-time error prediction. We demonstrate real-world feasibility by training an error model on one supercomputer and effectively predicting errors on another. Our methods demonstrate the importance of spatial locality over temporal locality in DRAM errors and show that relatively simple statistical models are effective at predicting future errors based on historical data, allowing proactive error mitigation.
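
A hedged sketch of the idea on synthetic data (not the Cielo or Hopper field data): exposing DRAM spatial structure (row, column, bank) as classifier features lets a generic model pick up row-clustered faults, which shows up in the feature importances.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)
n = 4000
rows = rng.integers(0, 256, n)
cols = rng.integers(0, 64, n)
banks = rng.integers(0, 8, n)
# Synthetic ground truth: errors cluster on a few "weak" rows (a spatial effect)
weak = rng.choice(256, size=16, replace=False)
y = np.isin(rows, weak).astype(int)

X = np.column_stack([rows, cols, banks])
clf = RandomForestClassifier(n_estimators=100).fit(X[:3000], y[:3000])
print("held-out accuracy:", clf.score(X[3000:], y[3000:]))
print("importances (row, col, bank):", clf.feature_importances_)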

More Details

Merge Network for a Non-Von Neumann Accumulate Accelerator in a 3D Chip

2018 IEEE International Conference on Rebooting Computing, ICRC 2018

Jain, Anirudh; Srikanth, Sriseshan; DeBenedictis, Erik; Krishna, Tushar

Logic-memory integration helps mitigate the von Neumann bottleneck, and this has enabled a new class of architectures that helps accelerate graph analytics and operations on sparse data streams. These utilize merge networks as a key unit of computation. Such networks are highly parallel, and their performance increases with tighter coupling between logic and memory when a bitonic algorithm is used. This paper presents energy-efficient on-chip network architectures for merging key-value pairs using both word-parallel and bit-serial paradigms. The proposed architectures are capable of merging two rows of high bandwidth memory (HBM) worth of data in a manner that is completely overlapped with the reading from and writing back to such a row. Furthermore, their energy consumption is about an order of magnitude lower when compared to a naive crossbar-based design.
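
A minimal software sketch of the bitonic merge these networks implement in hardware: two sorted runs, one reversed to form a bitonic sequence, then log n compare-exchange stages with halving stride (every comparison within a stage is independent, which is the parallelism the hardware exploits).

def bitonic_merge(run_a, run_b):
    a = run_a + run_b[::-1]          # ascending then descending: a bitonic sequence
    n = len(a)                       # assumed a power of two here
    stride = n // 2
    while stride:
        for i in range(n):
            j = i + stride
            # Compare-exchange partners within the same 2*stride block
            if j < n and i // (2 * stride) == j // (2 * stride) and a[i] > a[j]:
                a[i], a[j] = a[j], a[i]
        stride //= 2
    return a

print(bitonic_merge([1, 4, 6, 7], [2, 3, 5, 8]))   # [1, 2, 3, 4, 5, 6, 7, 8]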

More Details

Results 2801–3000 of 9,998