Publications

Results 1–100 of 9,998

(SAI) stalled, active and idle: Characterizing power and performance of large-scale dragonfly networks

Proceedings - IEEE International Conference on Cluster Computing, ICCC

Groves, Taylor G.; Grant, Ryan E.; Hemmert, Karl S.; Hammond, Simon D.; Levenhagen, Michael J.; Arnold, Dorian C.

Exascale networks are expected to comprise a significant part of the total monetary cost and 10–20% of the power budget allocated to exascale systems. Yet, our understanding of current and emerging workloads on these networks is limited. Left unaddressed, this knowledge gap will likely translate into missed opportunities for (1) improved application performance and (2) decreased power and monetary costs in next generation systems. This work targets a detailed understanding and analysis of the performance and utilization of the dragonfly network topology. Using the Structural Simulation Toolkit (SST) and a range of relevant workloads on a dragonfly topology of 110,592 nodes, we examine network design tradeoffs amongst execution time, power, bandwidth, and the number of global links. Our simulations report stalled, active and idle time on a per-port level of the fabric, in order to provide a detailed picture of future networks. The results of this work show potential savings of 3–10% of the exascale power budget and provide valuable insights to researchers looking for new opportunities to improve performance and increase power efficiency of next generation HPC systems.

More Details

Quantification of Uncertainty in Extreme Scale Computations

Debusschere, Bert D.; Jakeman, John D.; Chowdhary, Kamaljit S.; Safta, Cosmin S.; Sargsyan, Khachik S.; Rai, P.R.; Ghanem, R.G.; Knio, O.K.; La Maitre, O.L.; Winokur, J.W.; Li, G.L.; Ghattas, O.G.; Moser, R.M.; Simmons, C.S.; Alexanderian, A.A.; Gattiker, J.G.; Higdon, D.H.; Lawrence, E.L.; Bhat, S.B.; Marzouk, Y.M.; Bigoni, D.B.; Cui, T.C.; Parno, M.P.

Abstract not provided.

The Ground Truth Program: Simulations as Test Beds for Social Science Research Methods.

Computational and Mathematical Organization Theory

Naugle, Asmeret B.; Russell, Adam R.; Lakkaraju, Kiran L.; Swiler, Laura P.; Verzi, Stephen J.; Romero, Vicente J.

Social systems are uniquely complex and difficult to study, but understanding them is vital to solving the world’s problems. The Ground Truth program developed a new way of testing the research methods that attempt to understand and leverage the Human Domain and its associated complexities. The program developed simulations of social systems as virtual world test beds. Not only were these simulations able to produce data on future states of the system under various circumstances and scenarios, but their causal ground truth was also explicitly known. Research teams studied these virtual worlds, facilitating deep validation of causal inference, prediction, and prescription methods. The Ground Truth program model provides a way to test and validate research methods to an extent previously impossible, and to study the intricacies and interactions of different components of research.

More Details

30 cm Drop Tests

Kalinina, Elena A.; Ammerman, Douglas J.; Grey, Carissa A.; Arviso, Michael A.; Wright, Catherine W.; Lujan, Lucas A.; Flores, Gregg J.; Saltzstein, Sylvia J.

The data from the multi-modal transportation test conducted in 2017 demonstrated that the inputs from the shock events during all transport modes (truck, rail, and ship) were amplified from the cask to the spent commercial nuclear fuel surrogate assemblies. These data do not support the common assumption that the cask content experiences the same accelerations as the cask itself. This was one of the motivations for conducting 30 cm drop tests. The goal of the 30 cm drop test is to measure accelerations and strains on the surrogate spent nuclear fuel assembly and to determine whether the fuel rods can maintain their integrity inside a transportation cask when dropped from a height of 30 cm. The 30 cm drop is the remaining NRC normal conditions of transportation regulatory requirement (10 CFR 71.71) for which there are no data on the actual surrogate fuel. Because the full-scale cask and impact limiters were not available (and their cost was prohibitive), it was proposed to achieve this goal by conducting three separate tests. This report describes the first two tests — the 30 cm drop test of the 1/3 scale cask (conducted in December 2018) and the 30 cm drop of the full-scale dummy assembly (conducted in June 2019). The dummy assembly represents the mass of a real spent nuclear fuel assembly. The third test (to be conducted in the spring of 2020) will be the 30 cm drop of the full-scale surrogate assembly. The surrogate assembly represents a real full-scale assembly in physical, material, and mechanical characteristics, as well as in mass.

More Details

3D optical sectioning with a new hyperspectral confocal fluorescence imaging system

Haaland, David M.; Sinclair, Michael B.; Jones, Howland D.; Timlin, Jerilyn A.; Bachand, George B.; Sasaki, Darryl Y.; Davidson, George S.; Van Benthem, Mark V.

A novel hyperspectral fluorescence microscope for high-resolution 3D optical sectioning of cells and other structures has been designed, constructed, and used to investigate a number of different problems. We have significantly extended new multivariate curve resolution (MCR) data analysis methods to deconvolve the hyperspectral image data and to rapidly extract quantitative 3D concentration distribution maps of all emitting species. The imaging system has many advantages over current confocal imaging systems including simultaneous monitoring of numerous highly overlapped fluorophores, immunity to autofluorescence or impurity fluorescence, enhanced sensitivity, and dramatically improved accuracy, reliability, and dynamic range. Efficient data compression in the spectral dimension has allowed personal computers to perform quantitative analysis of hyperspectral images of large size without loss of image quality. We have also developed and tested software to perform analysis of time resolved hyperspectral images using trilinear multivariate analysis methods. The new imaging system is an enabling technology for numerous applications including (1) 3D composition mapping analysis of multicomponent processes occurring during host-pathogen interactions, (2) monitoring microfluidic processes, (3) imaging of molecular motors and (4) understanding photosynthetic processes in wild type and mutant Synechocystis cyanobacteria.
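The MCR analysis described above alternates between estimating concentration maps and emission spectra. As a rough, hypothetical sketch (not the authors' implementation; the two-emitter synthetic scene below is made up), a minimal nonnegativity-clipped alternating least-squares loop might look like:

```python
import numpy as np

def mcr_als(D, n_components, n_iter=50, seed=0):
    """Alternating least squares with nonnegativity clipping so that
    D ~= C @ S.T, with D (pixels x wavelengths), C concentrations, S spectra."""
    rng = np.random.default_rng(seed)
    S = rng.random((D.shape[1], n_components))
    for _ in range(n_iter):
        C = np.clip(D @ S @ np.linalg.pinv(S.T @ S), 0.0, None)
        S = np.clip(D.T @ C @ np.linalg.pinv(C.T @ C), 0.0, None)
    return C, S

# synthetic scene: two emitters with heavily overlapped Gaussian spectra
wav = np.linspace(0.0, 1.0, 64)
spectra = np.stack([np.exp(-((wav - 0.4) / 0.1) ** 2),
                    np.exp(-((wav - 0.6) / 0.1) ** 2)], axis=1)
conc = np.random.default_rng(1).random((200, 2))   # per-pixel concentrations
D = conc @ spectra.T
C, S = mcr_als(D, 2)
err = np.linalg.norm(D - C @ S.T) / np.linalg.norm(D)
```

Even with the heavy spectral overlap, the alternation recovers a low-error factorization, which is the property that lets MCR separate fluorophores that a filter-based confocal system cannot.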

More Details

A 3-D Vortex Code for Parachute Flow Predictions: VIPAR Version 1.0

Strickland, James H.; Homicz, Gregory F.; Porter, V.L.

This report describes a 3-D fluid mechanics code for predicting flow past bluff bodies whose surfaces can be assumed to be made up of shell elements that are simply connected. Version 1.0 of the VIPAR code (Vortex Inflation PARachute code) is described herein. This version contains several first order algorithms that we are in the process of replacing with higher order ones. These enhancements will appear in the next version of VIPAR. The present code contains a motion generator that can be used to produce a large class of rigid body motions. The present code has also been fully coupled to a structural dynamics code in which the geometry undergoes large time dependent deformations. Initial surface geometry is generated from triangular shell elements using a code such as Patran and is written into an ExodusII database file for subsequent input into VIPAR. Surface and wake variable information is output into two ExodusII files that can be post-processed and viewed using software such as EnSight™.

More Details

A Bayesian Machine Learning Framework for Selection of the Strain Gradient Plasticity Multiscale Model

ASME International Mechanical Engineering Congress and Exposition, Proceedings (IMECE)

Tan, Jingye; Maupin, Kathryn A.; Shao, Shuai; Faghihi, Danial

A class of sequential multiscale models investigated in this study consists of discrete dislocation dynamics (DDD) simulations and continuum strain gradient plasticity (SGP) models to simulate the size effect in plastic deformation of metallic micropillars. The high-fidelity DDD explicitly simulates the microstructural (dislocation) interactions. These simulations account for the effect of dislocation densities and their spatial distributions on plastic deformation. The continuum SGP captures the size-dependent plasticity in micropillars using two length parameters. The main challenge in predictive DDD-SGP multiscale modeling is selecting the proper constitutive relations for the SGP model, which is necessitated by the uncertainty in computational prediction due to DDD's microstructural randomness. This contribution addresses these challenges using a Bayesian learning and model selection framework. A family of SGP models with different fidelities and complexities is constructed using various constitutive relation assumptions. The parameters of the SGP models are then learned from a set of training data furnished by the DDD simulations of micropillars. Bayesian learning allows the assessment of the credibility of plastic deformation prediction by characterizing the microstructural variability and the uncertainty in training data. Additionally, the family of the possible SGP models is subjected to a Bayesian model selection to pick the model that adequately explains the DDD training data. The framework proposed in this study enables learning the physics-based multiscale model from uncertain observational data and determining the optimal computational model for predicting complex physical phenomena, i.e., size effect in plastic deformation of micropillars.
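The model-selection step described above weighs how well each candidate model explains the training data against its complexity. As a much simpler illustration than the SGP/DDD setting (a toy linear-Gaussian example, not the paper's models or data), comparing analytic log-evidences might look like:

```python
import numpy as np

def log_evidence(Phi, y, alpha=1.0, noise=0.1):
    """Log marginal likelihood of a linear-Gaussian model:
    y ~ N(Phi w, noise^2 I) with prior w ~ N(0, alpha^-1 I)."""
    n = len(y)
    C = noise ** 2 * np.eye(n) + (1.0 / alpha) * Phi @ Phi.T
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + y @ np.linalg.solve(C, y))

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 40)
y = 1.0 + 2.0 * x + 0.1 * rng.normal(size=40)      # data from a linear law
# candidate models: polynomial regressors of degree 1, 2, and 5
scores = {d: log_evidence(np.vander(x, d + 1, increasing=True), y)
          for d in (1, 2, 5)}
best = max(scores, key=scores.get)                 # Bayesian Occam's razor
```

The marginal likelihood automatically penalizes the over-flexible quintic model, the same Occam's-razor mechanism the framework uses to pick an SGP model that adequately, but not excessively, explains the DDD data.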

More Details

A Bayesian method for using simulator data to enhance human error probabilities assigned by existing HRA methods

Reliability Engineering and System Safety

Groth, Katrina G.; Swiler, Laura P.; Adams, Susan S.

In the past several years, several international agencies have begun to collect data on human performance in nuclear power plant simulators [1]. This data provides a valuable opportunity to improve human reliability analysis (HRA), but these improvements will not be realized without implementation of Bayesian methods. Bayesian methods are widely used to incorporate sparse data into models in many parts of probabilistic risk assessment (PRA), but they have not been adopted by the HRA community. In this article, we provide a Bayesian methodology to formally use simulator data to refine the human error probabilities (HEPs) assigned by existing HRA methods. We demonstrate the methodology with a case study, wherein we use simulator data from the Halden Reactor Project to update the probability assignments from the SPAR-H method. The case study demonstrates the ability to use performance data, even sparse data, to improve existing HRA methods. Furthermore, this paper serves as a demonstration of the value of Bayesian methods for improving the technical basis of HRA.
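The core idea of updating an existing HEP with sparse simulator data can be sketched with a conjugate Beta-Binomial update (a minimal illustration with made-up numbers; the article's methodology is richer than this):

```python
def update_hep(hep_prior, prior_strength, errors, trials):
    """Conjugate Beta-Binomial update of a human error probability (HEP).
    hep_prior      -- HEP assigned by an existing HRA method (e.g. SPAR-H)
    prior_strength -- pseudo-observations expressing trust in that prior
    errors, trials -- observed simulator outcomes"""
    alpha = hep_prior * prior_strength          # prior "errors"
    beta = (1.0 - hep_prior) * prior_strength   # prior "successes"
    return (alpha + errors) / (alpha + beta + trials)

# an HRA-method prior of 0.01, moderately trusted; 2 errors in 50 crew runs
posterior = update_hep(0.01, prior_strength=100, errors=2, trials=50)
```

Even these 50 hypothetical trials double the prior HEP estimate, showing how sparse simulator data can meaningfully refine an existing assignment.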

More Details

A block coordinate descent optimizer for classification problems exploiting convexity

CEUR Workshop Proceedings

Patel, Ravi G.; Trask, Nathaniel A.; Gulian, Mamikon G.; Cyr, Eric C.

Second-order optimizers hold intriguing potential for deep learning, but suffer from increased cost and sensitivity to the non-convexity of the loss surface as compared to gradient-based approaches. We introduce a coordinate descent method to train deep neural networks for classification tasks that exploits global convexity of the cross-entropy loss in the weights of the linear layer. Our hybrid Newton/Gradient Descent (NGD) method is consistent with the interpretation of hidden layers as providing an adaptive basis and the linear layer as providing an optimal fit of the basis to data. By alternating between a second-order method to find globally optimal parameters for the linear layer and gradient descent to train the hidden layers, we ensure an optimal fit of the adaptive basis to data throughout training. The size of the Hessian in the second-order step scales only with the number of weights in the linear layer and not the depth and width of the hidden layers; furthermore, the approach is applicable to arbitrary hidden layer architectures. Previous work applying this adaptive basis perspective to regression problems demonstrated significant improvements in accuracy at reduced training cost, and this work can be viewed as an extension of this approach to classification problems. We first prove that the resulting Hessian matrix is symmetric positive semi-definite, and that the Newton step realizes a global minimizer. By studying classification of manufactured two-dimensional point cloud data, we demonstrate both an improvement in validation error and a striking qualitative difference in the basis functions encoded in the hidden layer when trained using NGD. Application to image classification benchmarks for both dense and convolutional architectures reveals improved training accuracy, suggesting gains of second-order methods over gradient descent. A Tensorflow implementation of the algorithm is available at github.com/rgp62/.

More Details

A block preconditioner for an exact penalty formulation for stationary MHD

SIAM Journal on Scientific Computing

Phillips, Edward G.; Elman, Howard C.; Cyr, Eric C.; Shadid, John N.; Pawlowski, Roger P.

The magnetohydrodynamics (MHD) equations are used to model the flow of electrically conducting fluids in such applications as liquid metals and plasmas. This system of nonself-adjoint, nonlinear PDEs couples the Navier-Stokes equations for fluids and Maxwell's equations for electromagnetics. There has been recent interest in fully coupled solvers for the MHD system because they allow for fast steady-state solutions that do not require pseudo-time-stepping. When the fully coupled system is discretized, the strong coupling can make the resulting algebraic systems difficult to solve, requiring effective preconditioning of iterative methods for efficiency. In this work, we consider a finite element discretization of an exact penalty formulation for the stationary MHD equations posed in two-dimensional domains. This formulation has the benefit of implicitly enforcing the divergence-free condition on the magnetic field without requiring a Lagrange multiplier. We consider extending block preconditioning techniques developed for the Navier-Stokes equations to the full MHD system. We analyze operators arising in block decompositions from a continuous perspective and apply arguments based on the existence of approximate commutators to develop new preconditioners that account for the physical coupling. This results in a family of parameterized block preconditioners for both Picard and Newton linearizations. We develop an automated method for choosing the relevant parameters and demonstrate the robustness of these preconditioners for a range of the physical nondimensional parameters and with respect to mesh refinement.

More Details

A Block-Based Triangle Counting Algorithm on Heterogeneous Environments

IEEE Transactions on Parallel and Distributed Systems

Yasar, Abdurrahman; Rajamanickam, Sivasankaran R.; Berry, Jonathan W.; Catalyurek, Umit V.

Triangle counting is a fundamental building block in graph algorithms. In this article, we propose a block-based triangle counting algorithm to reduce data movement during both sequential and parallel execution. Our block-based formulation makes the algorithm naturally suitable for heterogeneous architectures. The problem of partitioning the adjacency matrix of a graph is well-studied. Our task decomposition goes one step further: it partitions the set of triangles in the graph. By streaming these small tasks to compute resources, we can solve problems that do not fit on a device. We demonstrate the effectiveness of our approach by providing an implementation on a compute node with multiple sockets, cores and GPUs. The current state-of-the-art in triangle enumeration processes the Friendster graph in 2.1 seconds, not including data copy time between CPU and GPU. Using that metric, our approach is 20 percent faster. When copy times are included, our algorithm takes 3.2 seconds. This is 5.6 times faster than the fastest published CPU-only time.
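The blocked formulation can be illustrated with a dense NumPy toy (illustrative only; the paper's implementation works on sparse tiles streamed to CPU cores and GPUs): each (row-block, column-block) tile of the strictly upper-triangular adjacency matrix is an independent task.

```python
import numpy as np

def triangle_count_blocked(A, block=64):
    """Count triangles by tiling the strictly upper-triangular adjacency
    matrix U; each (row-block, col-block) tile is an independent task."""
    U = np.triu(A, k=1).astype(np.int64)
    n = U.shape[0]
    starts = range(0, n, block)
    total = 0
    for r in starts:
        for c in starts:
            tile = U[r:r + block, c:c + block]
            if not tile.any():
                continue
            # wedges from the row block to the column block, closed by `tile`
            paths = U[r:r + block, :] @ U[:, c:c + block]
            total += int((paths * tile).sum())
    return total

# verify against the dense trace formula on a random graph
rng = np.random.default_rng(0)
A = (rng.random((200, 200)) < 0.05).astype(np.int64)
A = np.maximum(A, A.T)
np.fill_diagonal(A, 0)
blocked = triangle_count_blocked(A)
exact = int(np.trace(A @ A @ A)) // 6
```

Because each tile's contribution is computed independently, tiles can be scheduled on whatever device is free, which is what makes the decomposition suitable for heterogeneous architectures and for graphs that do not fit on a single device.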

More Details

A Brief Description of the Kokkos implementation of the SNAP potential in ExaMiniMD

Thompson, Aidan P.; Trott, Christian R.

Within the EXAALT project, the SNAP [1] approach is being used to develop high accuracy potentials for use in large-scale long-time molecular dynamics simulations of materials behavior. In particular, we have developed a new SNAP potential that is suitable for describing the interplay between helium atoms and vacancies in high-temperature tungsten [2]. This model is now being used to study plasma-surface interactions in nuclear fusion reactors for energy production. The high accuracy of SNAP potentials comes at the price of increased computational cost per atom and increased computational complexity. The increased cost is mitigated by improvements in strong scaling that can be achieved using advanced algorithms [3].

More Details

A brief parallel I/O tutorial

Ward, Harry L.

This document provides common best practices for the efficient utilization of parallel file systems for analysts and application developers. A multi-program, parallel supercomputer provides effective compute power by aggregating a host of lower-power processors over a network. In general, one either constructs an application that distributes work across the available nodes and processors and then collects the results (a parallel application), or one launches a large number of small jobs, each doing similar work on different subsets of the data (a campaign).

The I/O system on these machines is usually implemented as a tightly coupled parallel application itself, one that presents the host applications with the concept of a 'file': an addressable store of bytes whose address space is global in nature. Beyond the simple reality that the I/O system is normally composed of a small, less capable collection of hardware, that global address space will cause problems if not carefully used. How much of a problem, and the ways in which those problems manifest, will differ, but that the model is problem prone has been well established. Worse, the file system is a shared resource on the machine - a system service. What an application does with the file system affects all users: no portion of the available resource is reserved; instead, the I/O system responds to requests by scheduling and queuing based on instantaneous demand. Using the system well contributes to overall throughput on the machine and, from a purely self-interested perspective, reduces the time that an application or campaign is exposed to interference from others.

The developer's goal should be to accomplish I/O in a way that minimizes interaction with the I/O system, maximizes the amount of data moved per call, and provides the I/O system the most information about the I/O transfer per request.
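That goal - fewer I/O calls, more bytes per call - can be sketched with a simple aggregation buffer (a hypothetical illustration, not from the tutorial; the record size and chunk threshold are arbitrary):

```python
import io
import os
import tempfile

def write_aggregated(path, records, chunk_bytes=256 << 10):
    """Buffer many small records in memory and issue few large writes:
    fewer interactions with the I/O system, more data moved per call."""
    buf = io.BytesIO()
    with open(path, "wb") as f:
        for rec in records:
            buf.write(rec)
            if buf.tell() >= chunk_bytes:
                f.write(buf.getvalue())     # one large write per chunk
                buf.seek(0)
                buf.truncate()
        if buf.tell():                      # flush the remainder
            f.write(buf.getvalue())

records = [b"x" * 100 for _ in range(10000)]   # 10,000 tiny records
path = os.path.join(tempfile.mkdtemp(), "out.dat")
write_aggregated(path, records)
size = os.path.getsize(path)
```

Here 10,000 hundred-byte records become a handful of 256 KiB writes rather than 10,000 separate system calls against the shared file system.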

More Details

A case study in working with cell-centered data

Crossno, Patricia J.

This case study provides examples of how some simple decisions the authors made in structuring their algorithms for handling cell-centered data can dramatically influence the results. Although it is well known that these decisions produce variations in results, the potential magnitude of the differences is easily underestimated. More importantly, the users of the codes may not be aware that these choices have been made, or what they mean for the resulting visualizations of their data. This raises the question of whether these decisions inadvertently distort user interpretations of data sets.

More Details

A Case Study on Neural Inspired Dynamic Memory Management Strategies for High Performance Computing

Vineyard, Craig M.; Verzi, Stephen J.

As high performance computing architectures pursue more computational power, there is a need for increased memory capacity and bandwidth as well. A multi-level memory (MLM) architecture addresses this need by combining multiple memory types with different characteristics as varying levels of the same architecture. How to efficiently utilize this memory infrastructure is an open challenge, and in this research we sought to investigate whether neural inspired approaches can meaningfully help with memory management. In particular, we explored neurogenesis inspired resource allocation, and were able to show that a neural inspired mixed controller policy can beneficially impact how MLM architectures utilize memory.

More Details

A checkpoint compression study for high-performance computing systems

International Journal of High Performance Computing Applications

Ibtesham, Dewan; Ferreira, Kurt B.; Arnold, Dorian

As high-performance computing systems continue to increase in size and complexity, higher failure rates and increased overheads for checkpoint/restart (CR) protocols have raised concerns about the practical viability of CR protocols for future systems. Previously, compression has proven to be a viable approach for reducing checkpoint data volumes and, thereby, reducing CR protocol overhead, leading to improved application performance. In this article, we further explore compression-based CR optimization by exploring its baseline performance and scaling properties, evaluating whether improved compression algorithms might lead to even better application performance, and comparing checkpoint compression against and alongside other software- and hardware-based optimizations. Our key results are that: (1) compression is a very viable CR optimization; (2) generic, text-based compression algorithms appear to perform near optimally for checkpoint data compression and faster compression algorithms will not lead to better application performance; (3) compression-based optimizations fare well against and alongside other software-based optimizations; and (4) while hardware-based optimizations outperform software-based ones, they are not as cost effective.
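The basic mechanism - serialize checkpoint state, run it through a generic compressor, and verify the round trip - can be sketched as follows (a toy state dictionary and zlib stand in for real checkpoint data and whatever compressor a CR library would use):

```python
import pickle
import zlib

# a toy "checkpoint": a step counter plus a smooth, repetitive field
state = {"step": 1200, "field": [float(i % 17) for i in range(50000)]}
raw = pickle.dumps(state)
compressed = zlib.compress(raw, level=6)       # generic, text-style compressor
restored = pickle.loads(zlib.decompress(compressed))
ratio = len(raw) / len(compressed)
```

Shrinking the checkpoint before it hits the file system reduces both the I/O volume and the time the application spends blocked in the CR protocol, which is the overhead reduction the article quantifies.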

More Details

A Christoffel function weighted least squares algorithm for collocation approximations

Mathematics of Computation

Narayan, Akil; Jakeman, John D.; Zhou, Tao

We propose, theoretically investigate, and numerically validate an algorithm for the Monte Carlo solution of least-squares polynomial approximation problems in a collocation framework. Our investigation is motivated by applications in the collocation approximation of parametric functions, which frequently entails construction of surrogates via orthogonal polynomials. A standard Monte Carlo approach would draw samples according to the density defining the orthogonal polynomial family. Our proposed algorithm instead samples with respect to the (weighted) pluripotential equilibrium measure of the domain, and subsequently solves a weighted least-squares problem, with weights given by evaluations of the Christoffel function. We present theoretical analysis to motivate the algorithm, and numerical results that show our method is superior to standard Monte Carlo methods in many situations of interest.
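In one dimension the algorithm can be sketched directly (an illustrative toy, assuming a Legendre basis on [-1, 1] with the arcsine distribution as the equilibrium measure; the target function and sample counts are arbitrary): sample from the equilibrium measure, weight by the Christoffel function, and solve the weighted least-squares problem.

```python
import numpy as np

rng = np.random.default_rng(0)
deg, m = 10, 200
# draw samples from the arcsine (equilibrium) measure on [-1, 1]
x = np.cos(np.pi * rng.random(m))

def legendre_basis(pts, degree):
    """Orthonormal Legendre basis (w.r.t. dx on [-1, 1]) at the points."""
    return np.stack([np.polynomial.legendre.Legendre.basis(k)(pts)
                     * np.sqrt(k + 0.5) for k in range(degree + 1)], axis=1)

P = legendre_basis(x, deg)
w = 1.0 / (P ** 2).sum(axis=1)     # Christoffel function: 1 / sum_k p_k(x)^2
f = np.exp(x)                      # target function to approximate
sw = np.sqrt(w)
coef, *_ = np.linalg.lstsq(P * sw[:, None], f * sw, rcond=None)

xt = np.linspace(-1.0, 1.0, 500)
err = np.max(np.abs(legendre_basis(xt, deg) @ coef - np.exp(xt)))
```

The Christoffel weights correct for the mismatch between the sampling density and the density defining the orthogonal family, yielding a well-conditioned least-squares system at sample counts where unweighted Monte Carlo sampling can struggle.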

More Details

A coarsening method for linear peridynamics

Silling, Stewart A.

A method is obtained for deriving peridynamic material models for a sequence of increasingly coarsened descriptions of a body. The starting point is a known detailed, small scale linearized state-based description. Each successively coarsened model excludes some of the material present in the previous model, and the length scale increases accordingly. This excluded material, while not present explicitly in the coarsened model, is nevertheless taken into account implicitly through its effect on the forces in the coarsened material. Numerical examples demonstrate that the method accurately reproduces the effective elastic properties of a composite as well as the effect of a small defect in a homogeneous medium.

More Details

A cognitive-consistency based model of population wide attitude change

Lakkaraju, Kiran L.; Speed, Ann S.

Attitudes play a significant role in determining how individuals process information and behave. In this paper we develop a new computational model of population-wide attitude change that captures the social level (how individuals interact and communicate information) and the cognitive level (how attitudes and concepts interact with each other). The model captures the cognitive aspect by representing each individual as a parallel constraint satisfaction network. The dynamics of this model are explored through a simple attitude change experiment in which we vary the social network and the distribution of attitudes in a population.

More Details