Publications

Results 1–200 of 9,998

(SAI) stalled, active and idle: Characterizing power and performance of large-scale dragonfly networks

Proceedings - IEEE International Conference on Cluster Computing, ICCC

Groves, Taylor G.; Grant, Ryan E.; Hemmert, Karl S.; Hammond, Simon D.; Levenhagen, Michael J.; Arnold, Dorian C.

Exascale networks are expected to comprise a significant part of the total monetary cost and 10-20% of the power budget allocated to exascale systems. Yet, our understanding of current and emerging workloads on these networks is limited. Left ignored, this knowledge gap likely will translate into missed opportunities for (1) improved application performance and (2) decreased power and monetary costs in next generation systems. This work targets a detailed understanding and analysis of the performance and utilization of the dragonfly network topology. Using the Structural Simulation Toolkit (SST) and a range of relevant workloads on a dragonfly topology of 110,592 nodes, we examine network design tradeoffs amongst execution time, power, bandwidth, and the number of global links. Our simulations report stalled, active and idle time on a per-port level of the fabric, in order to provide a detailed picture of future networks. The results of this work show potential savings of 3-10% of the exascale power budget and provide valuable insights to researchers looking for new opportunities to improve performance and increase power efficiency of next generation HPC systems.

More Details

Quantification of Uncertainty in Extreme Scale Computations

Debusschere, Bert D.; Jakeman, John D.; Chowdhary, Kamaljit S.; Safta, Cosmin S.; Sargsyan, Khachik S.; Rai, P.R.; Ghanem, R.G.; Knio, O.K.; La Maitre, O.L.; Winokur, J.W.; Li, G.L.; Ghattas, O.G.; Moser, R.M.; Simmons, C.S.; Alexanderian, A.A.; Gattiker, J.G.; Higdon, D.H.; Lawrence, E.L.; Bhat, S.B.; Marzouk, Y.M.; Bigoni, D.B.; Cui, T.C.; Parno, M.P.

Abstract not provided.

The Ground Truth Program: Simulations as Test Beds for Social Science Research Methods.

Computational and Mathematical Organization Theory

Naugle, Asmeret B.; Russell, Adam R.; Lakkaraju, Kiran L.; Swiler, Laura P.; Verzi, Stephen J.; Romero, Vicente J.

Social systems are uniquely complex and difficult to study, but understanding them is vital to solving the world’s problems. The Ground Truth program developed a new way of testing the research methods that attempt to understand and leverage the Human Domain and its associated complexities. The program developed simulations of social systems as virtual world test beds. Not only were these simulations able to produce data on future states of the system under various circumstances and scenarios, but their causal ground truth was also explicitly known. Research teams studied these virtual worlds, facilitating deep validation of causal inference, prediction, and prescription methods. The Ground Truth program model provides a way to test and validate research methods to an extent previously impossible, and to study the intricacies and interactions of different components of research.

More Details

30 cm Drop Tests

Kalinina, Elena A.; Ammerman, Douglas J.; Grey, Carissa A.; Arviso, Michael A.; Wright, Catherine W.; Lujan, Lucas A.; Flores, Gregg J.; Saltzstein, Sylvia J.

The data from the multi-modal transportation test conducted in 2017 demonstrated that the inputs from the shock events during all transport modes (truck, rail, and ship) were amplified from the cask to the spent commercial nuclear fuel surrogate assemblies. These data do not support the common assumption that the cask content experiences the same accelerations as the cask itself. This was one of the motivations for conducting 30 cm drop tests. The goal of the 30 cm drop test is to measure accelerations and strains on the surrogate spent nuclear fuel assembly and to determine whether the fuel rods can maintain their integrity inside a transportation cask when dropped from a height of 30 cm. The 30 cm drop is the remaining NRC normal conditions of transportation regulatory requirement (10 CFR 71.71) for which there are no data on the actual surrogate fuel. Because the full-scale cask and impact limiters were not available (and their cost was prohibitive), it was proposed to achieve this goal by conducting three separate tests. This report describes the first two tests — the 30 cm drop test of the 1/3 scale cask (conducted in December 2018) and the 30 cm drop of the full-scale dummy assembly (conducted in June 2019). The dummy assembly represents the mass of a real spent nuclear fuel assembly. The third test (to be conducted in the spring of 2020) will be the 30 cm drop of the full-scale surrogate assembly. The surrogate assembly represents a real full-scale assembly in physical, material, and mechanical characteristics, as well as in mass.

More Details

3D optical sectioning with a new hyperspectral confocal fluorescence imaging system

Haaland, David M.; Sinclair, Michael B.; Jones, Howland D.; Timlin, Jerilyn A.; Bachand, George B.; Sasaki, Darryl Y.; Davidson, George S.; Van Benthem, Mark V.

A novel hyperspectral fluorescence microscope for high-resolution 3D optical sectioning of cells and other structures has been designed, constructed, and used to investigate a number of different problems. We have significantly extended new multivariate curve resolution (MCR) data analysis methods to deconvolve the hyperspectral image data and to rapidly extract quantitative 3D concentration distribution maps of all emitting species. The imaging system has many advantages over current confocal imaging systems including simultaneous monitoring of numerous highly overlapped fluorophores, immunity to autofluorescence or impurity fluorescence, enhanced sensitivity, and dramatically improved accuracy, reliability, and dynamic range. Efficient data compression in the spectral dimension has allowed personal computers to perform quantitative analysis of hyperspectral images of large size without loss of image quality. We have also developed and tested software to perform analysis of time resolved hyperspectral images using trilinear multivariate analysis methods. The new imaging system is an enabling technology for numerous applications including (1) 3D composition mapping analysis of multicomponent processes occurring during host-pathogen interactions, (2) monitoring microfluidic processes, (3) imaging of molecular motors and (4) understanding photosynthetic processes in wild type and mutant Synechocystis cyanobacteria.

More Details

A 3-D Vortex Code for Parachute Flow Predictions: VIPAR Version 1.0

Strickland, James H.; Homicz, Gregory F.; Porter, V.L.

This report describes a 3-D fluid mechanics code for predicting flow past bluff bodies whose surfaces can be assumed to be made up of shell elements that are simply connected. Version 1.0 of the VIPAR code (Vortex Inflation PARachute code) is described herein. This version contains several first order algorithms that we are in the process of replacing with higher order ones. These enhancements will appear in the next version of VIPAR. The present code contains a motion generator that can be used to produce a large class of rigid body motions. The present code has also been fully coupled to a structural dynamics code in which the geometry undergoes large time dependent deformations. Initial surface geometry is generated from triangular shell elements using a code such as Patran and is written into an ExodusII database file for subsequent input into VIPAR. Surface and wake variable information is output into two ExodusII files that can be post processed and viewed using software such as EnSight™.

More Details

A Bayesian Machine Learning Framework for Selection of the Strain Gradient Plasticity Multiscale Model

ASME International Mechanical Engineering Congress and Exposition, Proceedings (IMECE)

Tan, Jingye; Maupin, Kathryn A.; Shao, Shuai; Faghihi, Danial

A class of sequential multiscale models investigated in this study consists of discrete dislocation dynamics (DDD) simulations and continuum strain gradient plasticity (SGP) models to simulate the size effect in plastic deformation of metallic micropillars. The high-fidelity DDD explicitly simulates the microstructural (dislocation) interactions. These simulations account for the effect of dislocation densities and their spatial distributions on plastic deformation. The continuum SGP captures the size-dependent plasticity in micropillars using two length parameters. The main challenge in predictive DDD-SGP multiscale modeling is selecting the proper constitutive relations for the SGP model, which is necessitated by the uncertainty in computational prediction due to DDD's microstructural randomness. This contribution addresses these challenges using a Bayesian learning and model selection framework. A family of SGP models with different fidelities and complexities is constructed using various constitutive relation assumptions. The parameters of the SGP models are then learned from a set of training data furnished by the DDD simulations of micropillars. Bayesian learning allows the assessment of the credibility of plastic deformation prediction by characterizing the microstructural variability and the uncertainty in training data. Additionally, the family of the possible SGP models is subjected to a Bayesian model selection to pick the model that adequately explains the DDD training data. The framework proposed in this study enables learning the physics-based multiscale model from uncertain observational data and determining the optimal computational model for predicting complex physical phenomena, i.e., size effect in plastic deformation of micropillars.

More Details

A Bayesian method for using simulator data to enhance human error probabilities assigned by existing HRA methods

Reliability Engineering and System Safety

Groth, Katrina G.; Swiler, Laura P.; Adams, Susan S.

In the past several years, several international agencies have begun to collect data on human performance in nuclear power plant simulators [1]. This data provides a valuable opportunity to improve human reliability analysis (HRA), but these improvements will not be realized without implementation of Bayesian methods. Bayesian methods are widely used to incorporate sparse data into models in many parts of probabilistic risk assessment (PRA), but Bayesian methods have not been adopted by the HRA community. In this article, we provide a Bayesian methodology to formally use simulator data to refine the human error probabilities (HEPs) assigned by existing HRA methods. We demonstrate the methodology with a case study, wherein we use simulator data from the Halden Reactor Project to update the probability assignments from the SPAR-H method. The case study demonstrates the ability to use performance data, even sparse data, to improve existing HRA methods. Furthermore, this paper also serves as a demonstration of the value of Bayesian methods to improve the technical basis of HRA.

More Details

A block coordinate descent optimizer for classification problems exploiting convexity

CEUR Workshop Proceedings

Patel, Ravi G.; Trask, Nathaniel A.; Gulian, Mamikon G.; Cyr, Eric C.

Second-order optimizers hold intriguing potential for deep learning, but suffer from increased cost and sensitivity to the non-convexity of the loss surface as compared to gradient-based approaches. We introduce a coordinate descent method to train deep neural networks for classification tasks that exploits global convexity of the cross-entropy loss in the weights of the linear layer. Our hybrid Newton/Gradient Descent (NGD) method is consistent with the interpretation of hidden layers as providing an adaptive basis and the linear layer as providing an optimal fit of the basis to data. By alternating between a second-order method to find globally optimal parameters for the linear layer and gradient descent to train the hidden layers, we ensure an optimal fit of the adaptive basis to data throughout training. The size of the Hessian in the second-order step scales only with the number of weights in the linear layer and not the depth and width of the hidden layers; furthermore, the approach is applicable to arbitrary hidden layer architecture. Previous work applying this adaptive basis perspective to regression problems demonstrated significant improvements in accuracy at reduced training cost, and this work can be viewed as an extension of this approach to classification problems. We first prove that the resulting Hessian matrix is symmetric semi-definite, and that the Newton step realizes a global minimizer. By studying classification of manufactured two-dimensional point cloud data, we demonstrate both an improvement in validation error and a striking qualitative difference in the basis functions encoded in the hidden layer when trained using NGD. Application to image classification benchmarks for both dense and convolutional architectures reveals improved training accuracy, suggesting gains of second-order methods over gradient descent. A Tensorflow implementation of the algorithm is available at github.com/rgp62/.
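
As a concrete illustration of the alternating scheme described in the abstract, the following is a minimal NumPy sketch, not the authors' TensorFlow implementation linked above: it alternates a regularized Newton solve for the convex logistic loss in the linear-layer weights with a gradient step on the hidden layer. The data set, network width, step sizes, and regularization are illustrative assumptions.

```python
# Hedged sketch of Newton/Gradient Descent (NGD) alternation on a toy problem.
# All shapes, data, and hyperparameters below are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(float)            # XOR-like two-class labels

d_hidden, lam = 16, 1e-3
W1 = rng.normal(scale=0.5, size=(2, d_hidden))        # hidden layer: trained by gradient descent
b1 = np.zeros(d_hidden)
w2 = np.zeros(d_hidden + 1)                           # linear layer: trained by Newton steps

def hidden(X):
    """Adaptive basis: tanh features plus a bias column."""
    H = np.tanh(X @ W1 + b1)
    return np.hstack([H, np.ones((H.shape[0], 1))])

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for epoch in range(50):
    # Newton step on the (convex, ridge-regularized) logistic loss in w2
    Phi = hidden(X)
    p = sigmoid(Phi @ w2)
    grad = Phi.T @ (p - y) / len(y) + lam * w2
    S = p * (1.0 - p)
    Hess = (Phi * S[:, None]).T @ Phi / len(y) + lam * np.eye(Phi.shape[1])
    w2 -= np.linalg.solve(Hess, grad)

    # One gradient-descent step on the hidden-layer parameters
    p = sigmoid(hidden(X) @ w2)
    dz = (p - y)[:, None] * w2[None, :d_hidden]       # back-propagate through the linear layer
    dH = dz * (1.0 - np.tanh(X @ W1 + b1) ** 2)       # tanh derivative
    W1 -= 0.1 * X.T @ dH / len(y)
    b1 -= 0.1 * dH.mean(axis=0)

print("training accuracy:", ((sigmoid(hidden(X) @ w2) > 0.5) == y).mean())
```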

More Details

A block preconditioner for an exact penalty formulation for stationary MHD

SIAM Journal on Scientific Computing

Phillips, Edward G.; Elman, Howard C.; Cyr, Eric C.; Shadid, John N.; Pawlowski, Roger P.

The magnetohydrodynamics (MHD) equations are used to model the flow of electrically conducting fluids in such applications as liquid metals and plasmas. This system of nonself-adjoint, nonlinear PDEs couples the Navier-Stokes equations for fluids and Maxwell's equations for electromagnetics. There has been recent interest in fully coupled solvers for the MHD system because they allow for fast steady-state solutions that do not require pseudo-time-stepping. When the fully coupled system is discretized, the strong coupling can make the resulting algebraic systems difficult to solve, requiring effective preconditioning of iterative methods for efficiency. In this work, we consider a finite element discretization of an exact penalty formulation for the stationary MHD equations posed in two-dimensional domains. This formulation has the benefit of implicitly enforcing the divergence-free condition on the magnetic field without requiring a Lagrange multiplier. We consider extending block preconditioning techniques developed for the Navier-Stokes equations to the full MHD system. We analyze operators arising in block decompositions from a continuous perspective and apply arguments based on the existence of approximate commutators to develop new preconditioners that account for the physical coupling. This results in a family of parameterized block preconditioners for both Picard and Newton linearizations. We develop an automated method for choosing the relevant parameters and demonstrate the robustness of these preconditioners for a range of the physical nondimensional parameters and with respect to mesh refinement.

More Details

A Block-Based Triangle Counting Algorithm on Heterogeneous Environments

IEEE Transactions on Parallel and Distributed Systems

Yasar, Abdurrahman; Rajamanickam, Sivasankaran R.; Berry, Jonathan W.; Catalyurek, Umit V.

Triangle counting is a fundamental building block in graph algorithms. In this article, we propose a block-based triangle counting algorithm to reduce data movement during both sequential and parallel execution. Our block-based formulation makes the algorithm naturally suitable for heterogeneous architectures. The problem of partitioning the adjacency matrix of a graph is well-studied. Our task decomposition goes one step further: it partitions the set of triangles in the graph. By streaming these small tasks to compute resources, we can solve problems that do not fit on a device. We demonstrate the effectiveness of our approach by providing an implementation on a compute node with multiple sockets, cores and GPUs. The current state-of-the-art in triangle enumeration processes the Friendster graph in 2.1 seconds, not including data copy time between CPU and GPU. Using that metric, our approach is 20 percent faster. When copy times are included, our algorithm takes 3.2 seconds. This is 5.6 times faster than the fastest published CPU-only time.

More Details

A Brief Description of the Kokkos implementation of the SNAP potential in ExaMiniMD

Thompson, Aidan P.; Trott, Christian R.

Within the EXAALT project, the SNAP [1] approach is being used to develop high accuracy potentials for use in large-scale long-time molecular dynamics simulations of materials behavior. In particular, we have developed a new SNAP potential that is suitable for describing the interplay between helium atoms and vacancies in high-temperature tungsten [2]. This model is now being used to study plasma-surface interactions in nuclear fusion reactors for energy production. The high-accuracy of SNAP potentials comes at the price of increased computational cost per atom and increased computational complexity. The increased cost is mitigated by improvements in strong scaling that can be achieved using advanced algorithms [3].

More Details

A brief parallel I/O tutorial

Ward, Harry L.

This document provides common best practices for the efficient utilization of parallel file systems for analysts and application developers. A multi-program, parallel supercomputer provides effective compute power by aggregating a host of lower-power processors over a network. In general, one either constructs the application to distribute work across the available nodes and processors and then collects the result (a parallel application), or one launches a large number of small jobs, each doing similar work on different subsets (a campaign). The I/O system on these machines is usually implemented as a tightly-coupled, parallel application itself. It provides the concept of a 'file' to the host applications: an addressable store of bytes whose address space is global in nature. Because the I/O system is normally composed of a smaller, less capable collection of hardware, that global address space will cause problems if not carefully utilized. How severe the problems are and how they manifest will differ, but that the approach is problem prone has been well established. Worse, the file system is a shared resource on the machine - a system service. What an application does when it uses the file system impacts all users; no portion of the available resource is reserved. Instead, the I/O system responds to requests by scheduling and queuing based on instantaneous demand. Using the system well contributes to the overall throughput of the machine and, from a purely self-centered perspective, reduces the time that the application or campaign is subject to impact by others. The developer's goal should be to accomplish I/O in a way that minimizes interaction with the I/O system, maximizes the amount of data moved per call, and provides the I/O system the most information about the I/O transfer per request.
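
A minimal, self-contained illustration of the "fewer, larger requests" guidance above, using ordinary Python file I/O; the buffer sizes and file names are arbitrary assumptions, and a production HPC code would typically use MPI-IO collective operations or a higher-level parallel I/O library rather than per-process POSIX-style calls.

```python
# Hedged illustration of "minimize calls, maximize bytes per call".
# File names and sizes are arbitrary assumptions.
import os, time

payload = os.urandom(64 * 1024)          # 64 KiB of data per logical record
n_records = 1024

# Anti-pattern: one small write per record (many tiny requests to the file system)
t0 = time.perf_counter()
with open("many_small.dat", "wb") as f:
    for _ in range(n_records):
        f.write(payload)
        f.flush()                        # force each small request out individually
t_small = time.perf_counter() - t0

# Preferred pattern: aggregate records in memory, then issue one large write
t0 = time.perf_counter()
buffer = bytearray()
for _ in range(n_records):
    buffer += payload
with open("one_large.dat", "wb") as f:
    f.write(buffer)                      # a single large, contiguous request
t_large = time.perf_counter() - t0

print(f"{n_records} small writes: {t_small:.3f} s, one aggregated write: {t_large:.3f} s")
os.remove("many_small.dat"); os.remove("one_large.dat")
```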

More Details

A case study in working with cell-centered data

Crossno, Patricia J.

This case study provides examples of how some simple decisions the authors made in structuring their algorithms for handling cell-centered data can dramatically influence the results. Although it is well known that these decisions produce variations in results, the potential magnitude of the differences is easily underestimated. More importantly, the users of the codes may not be aware that these choices have been made or what they mean for the resulting visualizations of their data. This raises the question of whether or not these decisions are inadvertently distorting user interpretations of data sets.

More Details

A Case Study on Neural Inspired Dynamic Memory Management Strategies for High Performance Computing

Vineyard, Craig M.; Verzi, Stephen J.

As high performance computing architectures pursue more computational power there is a need for increased memory capacity and bandwidth as well. A multi-level memory (MLM) architecture addresses this need by combining multiple memory types with different characteristics as varying levels of the same architecture. How to efficiently utilize this memory infrastructure is an unknown challenge, and in this research we sought to investigate whether neural inspired approaches can meaningfully help with memory management. In particular we explored neurogenesis inspired resource allocation, and were able to show a neural inspired mixed controller policy can beneficially impact how MLM architectures utilize memory.

More Details

A checkpoint compression study for high-performance computing systems

International Journal of High Performance Computing Applications

Ibtesham, Dewan; Ferreira, Kurt B.; Arnold, Dorian

As high-performance computing systems continue to increase in size and complexity, higher failure rates and increased overheads for checkpoint/restart (CR) protocols have raised concerns about the practical viability of CR protocols for future systems. Previously, compression has proven to be a viable approach for reducing checkpoint data volumes and, thereby, reducing CR protocol overhead leading to improved application performance. In this article, we further explore compression-based CR optimization by examining its baseline performance and scaling properties, evaluating whether improved compression algorithms might lead to even better application performance and comparing checkpoint compression against and alongside other software- and hardware-based optimizations. The highlights of our results are that: (1) compression is a very viable CR optimization; (2) generic, text-based compression algorithms appear to perform near optimally for checkpoint data compression and faster compression algorithms will not lead to better application performance; (3) compression-based optimizations fare well against and alongside other software-based optimizations; and (4) while hardware-based optimizations outperform software-based ones, they are not as cost effective.
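
For readers unfamiliar with the kind of measurement such a study involves, here is a hedged sketch that compresses a synthetic "checkpoint" with a generic compressor (zlib) and reports the ratio and time; the array contents and compression level are assumptions, not data or settings from the article.

```python
# Hedged sketch of a checkpoint-compression measurement, not the study's tooling.
import time, zlib
import numpy as np

rng = np.random.default_rng(1)
# Smooth field plus small noise, loosely mimicking simulation state (an assumption)
state = np.sin(np.linspace(0, 40 * np.pi, 2_000_000)) + 1e-3 * rng.normal(size=2_000_000)
checkpoint = state.astype(np.float64).tobytes()

t0 = time.perf_counter()
compressed = zlib.compress(checkpoint, level=6)      # generic, text-style compressor
elapsed = time.perf_counter() - t0

print(f"raw: {len(checkpoint)/1e6:.1f} MB  compressed: {len(compressed)/1e6:.1f} MB  "
      f"ratio: {len(checkpoint)/len(compressed):.2f}x  time: {elapsed:.2f} s")
```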

More Details

A Christoffel function weighted least squares algorithm for collocation approximations

Mathematics of Computation

Narayan, Akil; Jakeman, John D.; Zhou, Tao

We propose, theoretically investigate, and numerically validate an algorithm for the Monte Carlo solution of least-squares polynomial approximation problems in a collocation framework. Our investigation is motivated by applications in the collocation approximation of parametric functions, which frequently entails construction of surrogates via orthogonal polynomials. A standard Monte Carlo approach would draw samples according to the density defining the orthogonal polynomial family. Our proposed algorithm instead samples with respect to the (weighted) pluripotential equilibrium measure of the domain, and subsequently solves a weighted least-squares problem, with weights given by evaluations of the Christoffel function. We present theoretical analysis to motivate the algorithm, and numerical results that show our method is superior to standard Monte Carlo methods in many situations of interest.
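
The sketch below illustrates one common one-dimensional instance of this kind of algorithm, under the assumption of a Legendre basis on [-1, 1]: samples are drawn from the arcsine (equilibrium) density and each row is weighted using an evaluation of the Christoffel function before solving the least-squares problem. The target function, degree, and sample count are illustrative assumptions, and this is not the authors' code.

```python
# Hedged sketch of Christoffel-weighted least squares for a 1D Legendre basis.
import numpy as np
from numpy.polynomial import legendre

rng = np.random.default_rng(3)
degree, n_samples = 20, 120
f = lambda x: np.exp(x) * np.sin(3 * x)             # assumed target function

# Draw samples from the arcsine / Chebyshev (equilibrium) density on [-1, 1]
x = np.cos(np.pi * rng.uniform(size=n_samples))

# Vandermonde of L2([-1, 1])-orthonormal Legendre polynomials
V = legendre.legvander(x, degree)                    # columns: P_0 .. P_degree
V *= np.sqrt((2 * np.arange(degree + 1) + 1) / 2)    # orthonormalize each column

# Christoffel-function weights: (number of basis terms) / sum_k p_k(x)^2
w = (degree + 1) / np.sum(V**2, axis=1)

# Weighted least squares for the expansion coefficients
sw = np.sqrt(w)
coef, *_ = np.linalg.lstsq(sw[:, None] * V, sw * f(x), rcond=None)

xt = np.linspace(-1, 1, 1000)
Vt = legendre.legvander(xt, degree) * np.sqrt((2 * np.arange(degree + 1) + 1) / 2)
print("max approximation error:", np.max(np.abs(Vt @ coef - f(xt))))
```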

More Details

A coarsening method for linear peridynamics

Silling, Stewart A.

A method is obtained for deriving peridynamic material models for a sequence of increasingly coarsened descriptions of a body. The starting point is a known detailed, small scale linearized state-based description. Each successively coarsened model excludes some of the material present in the previous model, and the length scale increases accordingly. This excluded material, while not present explicitly in the coarsened model, is nevertheless taken into account implicitly through its effect on the forces in the coarsened material. Numerical examples demonstrate that the method accurately reproduces the effective elastic properties of a composite as well as the effect of a small defect in a homogeneous medium.

More Details

A cognitive-consistency based model of population wide attitude change

Lakkaraju, Kiran L.; Speed, Ann S.

Attitudes play a significant role in determining how individuals process information and behave. In this paper we have developed a new computational model of population wide attitude change that captures the social level: how individuals interact and communicate information, and the cognitive level: how attitudes and concepts interact with each other. The model captures the cognitive aspect by representing each individual as a parallel constraint satisfaction network. The dynamics of this model are explored through a simple attitude change experiment where we vary the social network and distribution of attitudes in a population.

More Details

A combinatorial method for tracing objects using semantics of their shape

Proceedings - Applied Imagery Pattern Recognition Workshop

Diegert, Carl F.

We present a shape-first approach to finding automobiles and trucks in overhead images and include results from our analysis of an image from the Overhead Imaging Research Dataset [1]. For the OIRDS, our shape-first approach traces candidate vehicle outlines by exploiting knowledge about an overhead image of a vehicle: a vehicle's outline fits into a rectangle, this rectangle is sized to allow vehicles to use local roads, and rectangles from two different vehicles are disjoint. Our shape-first approach can efficiently process high-resolution overhead imaging over wide areas to provide tips and cues for human analysts, or for subsequent automatic processing using machine learning or other analysis based on color, tone, pattern, texture, size, and/or location (shape first). In fact, computationally-intensive complex structural, syntactic, and statistical analysis may be possible when a shape-first work flow sends a list of specific tips and cues down a processing pipeline rather than sending the whole of wide area imaging information. This data flow may fit well when bandwidth is limited between computers delivering ad hoc image exploitation and an imaging sensor. As expected, our early computational experiments find that the shape-first processing stage appears to reliably detect rectangular shapes from vehicles. More intriguing is that our computational experiments with six-inch GSD OIRDS benchmark images show that the shape-first stage can be efficient, and that candidate vehicle locations corresponding to features that do not include vehicles are unlikely to trigger tips and cues. We found that stopping with just the shape-first list of candidate vehicle locations, and then solving a weighted, maximal independent vertex set problem to resolve conflicts among candidate vehicle locations, often correctly traces the vehicles in an OIRDS scene. © 2010 IEEE.
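
To make the final conflict-resolution step concrete, here is a hedged sketch that keeps non-overlapping candidate rectangles using a simple greedy weighted independent-set heuristic; the scores, boxes, and the greedy strategy itself are illustrative assumptions rather than the exact weighted maximal independent vertex set solver used in the paper.

```python
# Hedged sketch: resolve conflicts among overlapping candidate vehicle rectangles
# by greedily keeping the highest-scoring, mutually disjoint boxes.
def overlaps(a, b):
    """Axis-aligned rectangles as (xmin, ymin, xmax, ymax)."""
    return not (a[2] <= b[0] or b[2] <= a[0] or a[3] <= b[1] or b[3] <= a[1])

def greedy_weighted_independent_set(candidates):
    """Keep highest-scoring candidates whose boxes do not conflict."""
    kept = []
    for score, box in sorted(candidates, key=lambda c: -c[0]):
        if all(not overlaps(box, kept_box) for _, kept_box in kept):
            kept.append((score, box))
    return kept

candidates = [
    (0.95, (10, 10, 30, 18)),   # strong vehicle-shaped rectangle
    (0.60, (12, 11, 32, 19)),   # overlapping, weaker duplicate
    (0.80, (50, 40, 70, 48)),   # separate vehicle
]
print(greedy_weighted_independent_set(candidates))
```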

More Details

A combinatorial model for dentate gyrus sparse coding

Neural Computation

Severa, William M.; Parekh, Ojas D.; James, Conrad D.; Aimone, James B.

The dentate gyrus forms a critical link between the entorhinal cortex and CA3 by providing a sparse version of the signal. Concurrent with this increase in sparsity, a widely accepted theory suggests the dentate gyrus performs pattern separation: similar inputs yield decorrelated outputs. Although an active region of study and theory, few logically rigorous arguments detail the dentate gyrus's (DG) coding. We suggest a theoretically tractable, combinatorial model for this action. The model provides formal methods for a highly redundant, arbitrarily sparse, and decorrelated output signal. To explore the value of this model framework, we assess how suitable it is for two notable aspects of DG coding: how it can handle the highly structured grid cell representation in the input entorhinal cortex region and the presence of adult neurogenesis, which has been proposed to produce a heterogeneous code in the DG. We find tailoring the model to grid cell input yields expansion parameters consistent with the literature. In addition, the heterogeneous coding reflects activity gradation observed experimentally. Finally, we connect this approach with more conventional binary threshold neural circuit models via a formal embedding.

More Details

A comparative critical analysis of modern task-parallel runtimes

Wheeler, Kyle B.; Stark, Dylan S.

The rise in node-level parallelism has increased interest in task-based parallel runtimes for a wide array of application areas. Applications have a wide variety of task spawning patterns which frequently change during the course of application execution, based on the algorithm or solver kernel in use. Task scheduling and load balance regimes, however, are often highly optimized for specific patterns. This paper uses four basic task spawning patterns to quantify the impact of specific scheduling policy decisions on execution time. We compare the behavior of six publicly available tasking runtimes: Intel Cilk, Intel Threading Building Blocks (TBB), Intel OpenMP, GCC OpenMP, Qthreads, and High Performance ParalleX (HPX). With the exception of Qthreads, the runtimes prove to have schedulers that are highly sensitive to application structure. No runtime is able to provide the best performance in all cases, and those that do provide the best performance in some cases, unfortunately, provide extremely poor performance when application structure does not match the scheduler's assumptions.

More Details

A comparison of adjoint and data-centric verification techniques

Cyr, Eric C.; Shadid, John N.; Pawlowski, Roger P.

This document summarizes the results from a level 3 milestone study within the CASL VUQ effort. We compare the adjoint-based a posteriori error estimation approach with a recent variant of a data-centric verification technique. We provide a brief overview of each technique and then we discuss their relative advantages and disadvantages. We use Drekar::CFD to produce numerical results for steady-state Navier Stokes and SARANS approximations.

More Details

A comparison of computational methods for the maximum contact map overlap of protein pairs

Proposed for publication in INFORMS J on Computing.

Hart, William E.; Carr, Robert D.

The maximum contact map overlap (MAX-CMO) between a pair of protein structures can be used as a measure of protein similarity. It is a purely topological measure and does not depend on the sequence of the pairs involved in the comparison. More importantly, the MAX-CMO presents a very favorable mathematical structure, which allows the formulation of integer, linear and Lagrangian models that can be used to obtain guarantees of optimality. It is not the intention of this paper to discuss the mathematical properties of MAX-CMO in detail as this has been dealt with elsewhere. In this paper we compare three algorithms that can be used to obtain maximum contact map overlaps between protein structures. We will point to the weaknesses and strengths of each one. It is our hope that this paper will encourage researchers to develop new and improved methods for protein comparison based on MAX-CMO.

More Details

A comparison of eigensolvers for large-scale 3D modal analysis using AMG-preconditioned iterative methods

International Journal for Numerical Methods in Engineering

Arbenz, Peter; Hetmaniuk, Ulrich L.; Lehoucq, Richard B.; Tuminaro, Raymond S.

The goal of our paper is to compare a number of algorithms for computing a large number of eigenvectors of the generalized symmetric eigenvalue problem arising from a modal analysis of elastic structures. The shift-invert Lanczos algorithm has emerged as the workhorse for the solution of this generalized eigenvalue problem; however, a sparse direct factorization is required for the resulting set of linear equations. Instead, our paper considers the use of preconditioned iterative methods. We present a brief review of available preconditioned eigensolvers followed by a numerical comparison on three problems using a scalable algebraic multigrid (AMG) preconditioner. Copyright © 2005 John Wiley & Sons, Ltd.

More Details

A comparison of floating point and logarithmic number systems for FPGAs

Proceedings - 13th Annual IEEE Symposium on Field-Programmable Custom Computing Machines, FCCM 2005

Haselman, Michael; Beauchamp, Michael; Wood, Aaron; Hauck, Scott; Underwood, Keith; Hemmert, Karl S.

There have been many papers proposing the use of logarithmic numbers (LNS) as an alternative to floating point because of simpler multiplication, division and exponentiation computations [1,4-9,13]. However, this advantage comes at the cost of complicated, inexact addition and subtraction, as well as the need to convert between the formats. In this work, we created a parameterized LNS library of computational units and compared them to an existing floating point library. Specifically, we considered multiplication, division, addition, subtraction, and format conversion to determine when one format should be used over the other and when it is advantageous to change formats during a calculation. © 2005 IEEE.
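
A small worked example of the trade-off described above, in Python rather than hardware: multiplication and division in a logarithmic number system reduce to addition and subtraction of exponents, while addition requires evaluating a nonlinear function that hardware typically approximates with lookup tables. Sign handling and fixed-point log encoding are omitted as simplifying assumptions.

```python
# Hedged illustration of LNS arithmetic for positive magnitudes only.
import math

to_lns   = lambda x: math.log2(x)            # represent x by log2(x)
from_lns = lambda e: 2.0 ** e

a, b = to_lns(6.0), to_lns(7.0)

# Multiplication and division become addition and subtraction of exponents
product  = from_lns(a + b)                   # 6 * 7
quotient = from_lns(a - b)                   # 6 / 7

# Addition needs the nonlinear function log2(1 + 2^(lo-hi)), typically a lookup table
def lns_add(a, b):
    lo, hi = min(a, b), max(a, b)
    return hi + math.log2(1.0 + 2.0 ** (lo - hi))

total = from_lns(lns_add(a, b))              # 6 + 7

print(product, quotient, total)              # 42.0, ~0.857, 13.0
```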

More Details

A comparison of high-level programming choices for incomplete sparse factorization across different architectures

Proceedings - 2016 IEEE 30th International Parallel and Distributed Processing Symposium, IPDPS 2016

Booth, Joshua D.; Kim, Kyungjoo K.; Rajamanickam, Sivasankaran R.

All many-core systems require fine-grained shared memory parallelism; however, the most efficient way to extract such parallelism is far from trivial. Fine-grained parallel algorithms face various performance trade-offs related to tasking, accesses to global data-structures, and use of shared cache. While programming models provide high level abstractions, such as data and task parallelism, algorithmic choices still remain open on how to best implement irregular algorithms, such as sparse factorizations, while taking into account the trade-offs mentioned above. In this paper, we compare these performance trade-offs for task and data parallelism on different hardware architectures such as Intel Sandy Bridge, Intel Xeon Phi, and IBM Power8. We do this by comparing the scaling of a new task-parallel incomplete sparse Cholesky factorization called Tacho and a new data-parallel incomplete sparse LU factorization called Basker. Both solvers utilize the Kokkos programming model and were developed within the ShyLU package of Trilinos. Using these two codes we demonstrate how high-level programming changes affect performance and overhead costs on multiple multi/many-core systems. We find that Kokkos is able to provide comparable performance with both parallel-for and task/futures on traditional x86 multicores. However, the choice of which high-level abstraction to use on many-core systems depends on both the architectures and input matrices.

More Details

A comparison of inexact newton and coordinate descent mesh optimization techniques

Knupp, Patrick K.

We compare inexact Newton and coordinate descent methods for optimizing the quality of a mesh by repositioning the vertices, where quality is measured by the harmonic mean of the mean-ratio metric. The effects of problem size, element size heterogeneity, and various vertex displacement schemes on the performance of these algorithms are assessed for a series of tetrahedral meshes.

More Details

A comparison of Lagrangian/Eulerian approaches for tracking the kinematics of high deformation solid motion

Ames, Thomas L.; Robinson, Allen C.

The modeling of solids is most naturally placed within a Lagrangian framework because it requires constitutive models which depend on knowledge of the original material orientations and subsequent deformations. Detailed kinematic information is needed to ensure material frame indifference which is captured through the deformation gradient F. Such information can be tracked easily in a Lagrangian code. Unfortunately, not all problems can be easily modeled using Lagrangian concepts due to severe distortions in the underlying motion. Either a Lagrangian/Eulerian or a pure Eulerian modeling framework must be introduced. We discuss and contrast several Lagrangian/Eulerian approaches for keeping track of the details of material kinematics.
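
For reference, the kinematic quantity mentioned above is the standard deformation gradient; the notation below is conventional and not quoted from the report.

```latex
% Conventional definitions (standard notation assumed, not an excerpt):
% x(X, t) is the current position of the material point originally at X.
\[
  \mathbf{F} \;=\; \frac{\partial \mathbf{x}}{\partial \mathbf{X}},
  \qquad
  J \;=\; \det \mathbf{F} > 0,
  \qquad
  d\mathbf{x} \;=\; \mathbf{F}\, d\mathbf{X},
\]
% so constitutive models can be evaluated in a frame-indifferent way from F.
```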

More Details

A comparison of methods for representing sparsely sampled random quantities

Romero, Vicente J.; Swiler, Laura P.; Urbina, Angel U.

This report discusses the treatment of uncertainties stemming from relatively few samples of random quantities. The importance of this topic extends beyond experimental data uncertainty to situations involving uncertainty in model calibration, validation, and prediction. With very sparse data samples it is not practical to have a goal of accurately estimating the underlying probability density function (PDF). Rather, a pragmatic goal is that the uncertainty representation should be conservative so as to bound a specified percentile range of the actual PDF, say the range between the 0.025 and 0.975 percentiles, with reasonable reliability. A second, opposing objective is that the representation not be overly conservative; that it minimally over-estimate the desired percentile range of the actual PDF. The presence of the two opposing objectives makes the sparse-data uncertainty representation problem interesting and difficult. In this report, five uncertainty representation techniques are characterized for their performance on twenty-one test problems (over thousands of trials for each problem) according to these two opposing objectives and other performance measures. Two of the methods, statistical Tolerance Intervals and a kernel density approach specifically developed for handling sparse data, exhibit significantly better overall performance than the others.
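
As a small, hedged illustration of one of the techniques named above, the following computes a two-sided normal tolerance interval from a sparse sample using Howe's approximation for the k-factor; the sample, coverage, and confidence values are assumptions and are not taken from the report's test problems.

```python
# Hedged sketch of a two-sided normal tolerance interval (Howe's approximation).
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
sample = rng.normal(loc=10.0, scale=2.0, size=8)     # very sparse data (assumed)
n = len(sample)
coverage, confidence = 0.95, 0.95                    # illustrative choices

z = stats.norm.ppf(0.5 * (1.0 + coverage))
chi2 = stats.chi2.ppf(1.0 - confidence, df=n - 1)
k = z * np.sqrt((n - 1) * (1.0 + 1.0 / n) / chi2)    # Howe's k-factor approximation

mean, sd = sample.mean(), sample.std(ddof=1)
print(f"interval intended to bound {coverage:.0%} of the population "
      f"with {confidence:.0%} confidence: "
      f"[{mean - k * sd:.2f}, {mean + k * sd:.2f}]")
```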

More Details

A comparison of Navier Stokes and network models to predict chemical transport in municipal water distribution systems

World Water Congress 2005: Impacts of Global Climate Change - Proceedings of the 2005 World Water and Environmental Resources Congress

Van Bloemen Waanders, B.; Hammond, G.; Shadid, John N.; Collis, S.; Murray, R.

We investigate the accuracy of chemical transport in network models for small geometric configurations. Network models have successfully simulated the general operations of large water distribution systems. However, some of the simplifying assumptions associated with the implementation may cause inaccuracies if chemicals need to be carefully characterized at a high level of detail. In particular, we are interested in precise transport behavior so that inversion and control problems can be applied to water distribution networks. As an initial phase, Navier Stokes combined with a convection-diffusion formulation was used to characterize the mixing behavior at a pipe intersection in two dimensions. Our numerical models predict only on the order of 12-14% of the chemical to be mixed with the other inlet pipe. Laboratory results show similar behavior and suggest that even if our numerical model is able to resolve turbulence, it may not improve the mixing behavior. This conclusion may not be appropriate however for other sets of operating conditions, and therefore we have started to develop a 3D implementation. Preliminary results for duct geometry are presented. © copyright ASCE 2005.

More Details

A comparison of power management mechanisms: P-States vs. node-level power cap control

Proceedings - 2018 IEEE 32nd International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2018

Pedretti, Kevin P.; Grant, Ryan E.; Laros, James H.; Levenhagen, Michael J.; Olivier, Stephen L.; Ward, Harry L.; Younge, Andrew J.

Large-scale HPC systems increasingly incorporate sophisticated power management control mechanisms. While these mechanisms are potentially useful for performing energy and/or power-aware job scheduling and resource management (EPA JSRM), greater understanding of their operation and performance impact on real-world applications is required before they can be applied effectively in practice. In this paper, we compare static p-state control to static node-level power cap control on a Cray XC system. Empirical experiments are performed to evaluate node-to-node performance and power usage variability for the two mechanisms. We find that static p-state control produces more predictable and higher performance characteristics than static node-level power cap control at a given power level. However, this performance benefit is at the cost of less predictable power usage. Static node-level power cap control produces predictable power usage but with more variable performance characteristics. Our results are not intended to show that one mechanism is better than the other. Rather, our results demonstrate that the mechanisms are complementary to one another and highlight their potential for combined use in achieving effective EPA JSRM solutions.

More Details

A comparison of two optimization methods for mesh quality improvement

Proposed for publication in Engineering with Computers.

Knupp, Patrick K.

We compare inexact Newton and coordinate descent optimization methods for improving the quality of a mesh by repositioning the vertices, where the overall quality is measured by the harmonic mean of the mean-ratio metric. The effects of problem size, element size heterogeneity, and various vertex displacement schemes on the performance of these algorithms are assessed for a series of tetrahedral meshes.

More Details

A computational framework for ontologically storing and analyzing very large overhead image sets

Proceedings of the 3rd ACM SIGSPATIAL International Workshop on Analytics for Big Geospatial Data, BigSpatial 2014

Brost, Randolph B.; Rintoul, Mark D.; McLendon, William C.; Strip, David R.; Parekh, Ojas D.; Woodbridge, Diane W.

We describe a computational approach to remote sensing image analysis that addresses many of the classic problems associated with storage, search, and query. This process starts by automatically annotating the fundamental objects in the image data set that will be used as a basis for an ontology, including both the objects (such as building, road, water, etc.) and their spatial and temporal relationships (is within 100 m of, is surrounded by, has changed in the past year, etc.). Data sets that can include multiple time slices of the same area are then processed using automated tools that reduce the images to the objects and relationships defined in an ontology based on the primitive objects, and this representation is stored in a geospatial-temporal semantic graph. Image searches are then defined in terms of the ontology (e.g. find a building greater than 10³ m² that borders a body of water), and the graph is searched for such relationships. This approach also enables the incorporation of non-image data that is related to the ontology. We demonstrate through an initial implementation of the entire system on large data sets (10⁹-10¹¹ pixels) that this system is robust against variations in different image collection parameters, provides a way for analysts to query data sets in a more natural way, and can greatly reduce the memory footprint of the search.

More Details

A Computational Information Criterion for Particle-Tracking with Sparse or Noisy Data

Advances in Water Resources

Tran, Nhat T.; Benson, David A.; Schmidt, Michael J.; Pankavich, Stephen D.

Traditional probabilistic methods for the simulation of advection-diffusion equations (ADEs) often overlook the entropic contribution of the discretization, e.g., the number of particles, within associated numerical methods. Many times, the gain in accuracy of a highly discretized numerical model is outweighed by its associated computational costs or the noise within the data. We address the question of how many particles are needed in a simulation to best approximate and estimate parameters in one-dimensional advective-diffusive transport. To do so, we use the well-known Akaike Information Criterion (AIC) and a recently-developed correction called the Computational Information Criterion (COMIC) to guide the model selection process. Random-walk and mass-transfer particle tracking methods are employed to solve the model equations at various levels of discretization. Numerical results demonstrate that the COMIC provides an optimal number of particles that can describe a more efficient model in terms of parameter estimation and model prediction compared to the model selected by the AIC even when the data is sparse or noisy, the sampling volume is not uniform throughout the physical domain, or the error distribution of the data is non-IID Gaussian.
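
For readers who want the baseline criterion in front of them, the standard AIC referenced above is reproduced below in conventional notation (k fitted parameters, maximized likelihood); per the abstract, the COMIC augments this kind of criterion with a contribution from the computational discretization, but its exact form is not reproduced here.

```latex
% Standard Akaike Information Criterion (conventional notation, not an excerpt):
\[
  \mathrm{AIC} \;=\; 2k \;-\; 2\ln \hat{L}.
\]
```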

More Details

A conservative, consistent, and scalable meshfree mimetic method

Journal of Computational Physics

Trask, Nathaniel A.; Bochev, Pavel B.; Perego, Mauro P.

Mimetic methods discretize divergence by restricting the Gauss theorem to mesh cells. Because point clouds lack such geometric entities, construction of a compatible meshfree divergence remains a challenge. In this work, we define an abstract Meshfree Mimetic Divergence (MMD) operator on point clouds by contraction of field and virtual face moments. This MMD satisfies a discrete divergence theorem, provides a discrete local conservation principle, and is first-order accurate. We consider two MMD instantiations. The first one assumes a background mesh and uses generalized moving least squares (GMLS) to obtain the necessary field and face moments. This MMD instance is appropriate for settings where a mesh is available but its quality is insufficient for a robust and accurate mesh-based discretization. The second MMD operator retains the GMLS field moments but defines virtual face moments using computationally efficient weighted graph-Laplacian equations. This MMD instance does not require a background grid and is appropriate for applications where mesh generation creates a computational bottleneck. It allows one to trade an expensive mesh generation problem for a scalable algebraic one, without sacrificing compatibility with the divergence operator. We demonstrate the approach by using the MMD operator to obtain a virtual finite-volume discretization of conservation laws on point clouds. Numerical results in the paper confirm the mimetic properties of the method and show that it behaves similarly to standard finite volume methods.

More Details

A conservative, optimization-based semi-lagrangian spectral element method for passive tracer transport

COUPLED PROBLEMS 2015 - Proceedings of the 6th International Conference on Coupled Problems in Science and Engineering

Bochev, Pavel B.; Moe, Scott A.; Peterson, Kara J.; Ridzal, Denis R.

We present a new optimization-based, conservative, and quasi-monotone method for passive tracer transport. The scheme combines high-order spectral element discretization in space with semi-Lagrangian time stepping. Solution of a singly linearly constrained quadratic program with simple bounds enforces conservation and physically motivated solution bounds. The scheme can handle efficiently a large number of passive tracers because the semi-Lagrangian time stepping only needs to evolve the grid points where the primitive variables are stored and allows for larger time steps than a conventional explicit spectral element method. Numerical examples show that the use of optimization to enforce physical properties does not affect significantly the spectral accuracy for smooth solutions. Performance studies reveal the benefits of high-order approximations, including for discontinuous solutions.

More Details

A coupling strategy for nonlocal and local diffusion models with mixed volume constraints and boundary conditions

Computers and Mathematics with Applications (Oxford)

D'Elia, Marta D.; Perego, Mauro P.; Bochev, Pavel B.; Littlewood, David J.

We develop and analyze an optimization-based method for the coupling of nonlocal and local diffusion problems with mixed volume constraints and boundary conditions. The approach formulates the coupling as a control problem where the states are the solutions of the nonlocal and local equations, the objective is to minimize their mismatch on the overlap of the nonlocal and local domains, and the controls are virtual volume constraints and boundary conditions. When some assumptions on the kernel functions hold, we prove that the resulting optimization problem is well-posed and discuss its implementation using Sandia’s agile software components toolkit. As a result, the latter provides the groundwork for the development of engineering analysis tools, while numerical results for nonlocal diffusion in three-dimensions illustrate key properties of the optimization-based coupling method.

More Details

A coupling strategy for nonlocal and local diffusion models with mixed volume constraints and boundary conditions

Computers and Mathematics with Applications

D'Elia, Marta D.; Perego, Mauro P.; Bochev, Pavel B.; Littlewood, David J.

We develop and analyze an optimization-based method for the coupling of nonlocal and local diffusion problems with mixed volume constraints and boundary conditions. The approach formulates the coupling as a control problem where the states are the solutions of the nonlocal and local equations, the objective is to minimize their mismatch on the overlap of the nonlocal and local domains, and the controls are virtual volume constraints and boundary conditions. When some assumptions on the kernel functions hold, we prove that the resulting optimization problem is well-posed and discuss its implementation using Sandia's agile software components toolkit. The latter provides the groundwork for the development of engineering analysis tools, while numerical results for nonlocal diffusion in three-dimensions illustrate key properties of the optimization-based coupling method.

More Details

A cross-enclave composition mechanism for exascale system software

Proceedings of the 6th International Workshop on Runtime and Operating Systems for Supercomputers, ROSS 2016 - In conjunction with HPDC 2016

Evans, Noah; Pedretti, Kevin P.; Kocoloski, Brian; Lange, John; Lang, Michael; Bridges, Patrick G.

As supercomputers move to exascale, the number of cores per node continues to increase, but the I/O bandwidth between nodes is increasing more slowly. This leads to computational power outstripping I/O bandwidth. This growth, in turn, encourages moving as much of an HPC workflow as possible onto the node in order to minimize data movement. One particular method of application composition, enclaves, co-locates different operating systems and runtimes on the same node where they communicate by in situ communication mechanisms. In this work, we describe a mechanism for communicating between composed applications. We implement a mechanism using Copy on Write cooperating with XEMEM shared memory to provide consistent, implicitly unsynchronized communication across enclaves. We then evaluate this mechanism using a composed application and analytics between the Kitten Lightweight Kernel and Linux on top of the Hobbes Operating System and Runtime. These results show a 3% overhead compared to an application running in isolation, demonstrating the viability of this approach.

More Details

A data-driven peridynamic continuum model for upscaling molecular dynamics

D'Elia, Marta D.; Silling, Stewart A.; Yu, Yue Y.; You, Huaiqian Y.

Nonlocal models, including peridynamics, often use integral operators that embed lengthscales in their definition. However, the integrands in these operators are difficult to define from the data that are typically available for a given physical system, such as laboratory mechanical property tests. In contrast, molecular dynamics (MD) does not require these integrands, but it suffers from computational limitations in the length and time scales it can address. To combine the strengths of both methods and to obtain a coarse-grained, homogenized continuum model that efficiently and accurately captures materials’ behavior, we propose a learning framework to extract, from MD data, an optimal Linear Peridynamic Solid (LPS) model as a surrogate for MD displacements. To maximize the accuracy of the learnt model we allow the peridynamic influence function to be partially negative, while preserving the well-posedness of the resulting model. To achieve this, we provide sufficient well-posedness conditions for discretized LPS models with sign-changing influence functions and develop a constrained optimization algorithm that minimizes the equation residual while enforcing such solvability conditions. This framework guarantees that the resulting model is mathematically well-posed, physically consistent, and that it generalizes well to settings that are different from the ones used during training. We illustrate the efficacy of the proposed approach with several numerical tests for single layer graphene. Our two-dimensional tests show the robustness of the proposed algorithm on validation data sets that include thermal noise, different domain shapes and external loadings, and discretizations substantially different from the ones used for training.

More Details

A data-driven peridynamic continuum model for upscaling molecular dynamics

Computer Methods in Applied Mechanics and Engineering

You, Huaiqian; Yu, Yue; Silling, Stewart A.; D'Elia, Marta D.

Nonlocal models, including peridynamics, often use integral operators that embed lengthscales in their definition. However, the integrands in these operators are difficult to define from the data that are typically available for a given physical system, such as laboratory mechanical property tests. In contrast, molecular dynamics (MD) does not require these integrands, but it suffers from computational limitations in the length and time scales it can address. To combine the strengths of both methods and to obtain a coarse-grained, homogenized continuum model that efficiently and accurately captures materials’ behavior, we propose a learning framework to extract, from MD data, an optimal Linear Peridynamic Solid (LPS) model as a surrogate for MD displacements. To maximize the accuracy of the learnt model we allow the peridynamic influence function to be partially negative, while preserving the well-posedness of the resulting model. To achieve this, we provide sufficient well-posedness conditions for discretized LPS models with sign-changing influence functions and develop a constrained optimization algorithm that minimizes the equation residual while enforcing such solvability conditions. This framework guarantees that the resulting model is mathematically well-posed, physically consistent, and that it generalizes well to settings that are different from the ones used during training. We illustrate the efficacy of the proposed approach with several numerical tests for single layer graphene. Our two-dimensional tests show the robustness of the proposed algorithm on validation data sets that include thermal noise, different domain shapes and external loadings, and discretizations substantially different from the ones used for training.

More Details

A dedicated message matching mechanism for collective communications

ACM International Conference Proceeding Series

Ghazimirsaeed, S.M.; Grant, Ryan E.; Afsahi, Ahmad

The Message Passing Interface (MPI) libraries use message queues to guarantee correct message ordering between communicating processes. Message queues are in the critical path of MPI communications and thus, the performance of message queue operations can have significant impact on the performance of applications. Collective communications are widely used in MPI applications and they can have considerable impact on generating long message queues. In this paper, we propose a message matching mechanism that improves the message queue search time by distinguishing messages coming from point-to-point and collective communications and allocating separate queues for them. Moreover, it dynamically profiles the impact of each collective call on message queues during the application runtime and uses this information to adapt the message queue data structure for each collective operation dynamically. The proposed approach can successfully reduce the queue search time while maintaining scalable memory consumption. The evaluation results show that we can obtain up to 5.5x runtime speedup for applications with long list traversals. Moreover, we can gain up to 15% and 45% queue search time improvement for applications with short and medium list traversals, respectively.
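
The following is a purely conceptual Python model of the queue-separation idea, not code from an MPI library: unexpected messages from collective and point-to-point traffic are kept in distinct queues so a match never scans the other class. The (source, tag) matching rule and the class names are simplifying assumptions.

```python
# Hedged conceptual model of dedicated matching queues per traffic class.
from collections import deque

class MatchingEngine:
    def __init__(self):
        self.queues = {"p2p": deque(), "collective": deque()}

    def post_unexpected(self, kind, source, tag, payload):
        """Queue an unexpected message under its traffic class."""
        self.queues[kind].append((source, tag, payload))

    def match(self, kind, source, tag):
        """Search only the queue for this traffic class."""
        q = self.queues[kind]
        for i, (src, tg, payload) in enumerate(q):
            if src == source and tg == tag:
                del q[i]
                return payload
        return None   # no match: a real library would post the receive instead

engine = MatchingEngine()
engine.post_unexpected("collective", source=3, tag=77, payload=b"allreduce chunk")
engine.post_unexpected("p2p", source=1, tag=42, payload=b"halo exchange")
print(engine.match("p2p", source=1, tag=42))   # found without scanning collective traffic
```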

More Details

A design for a V&V and UQ discovery process

Knupp, Patrick K.; Urbina, Angel U.

There is currently sparse literature on how to implement systematic and comprehensive processes for modern V&V/UQ (VU) within large computational simulation projects. Important design requirements have been identified in order to construct a viable 'system' of processes. Significant processes that are needed include discovery, accumulation, and assessment. A preliminary design is presented for a VU Discovery process that accounts for an important subset of the requirements. The design uses a hierarchical approach to set context and a series of place-holders that identify the evidence and artifacts that need to be created in order to tell the VU story and to perform assessments. The hierarchy incorporates VU elements from a Predictive Capability Maturity Model and uses questionnaires to define critical issues in VU. The place-holders organize VU data within a central repository that serves as the official VU record of the project. A review process ensures that those who will contribute to the record have agreed to provide the evidence identified by the Discovery process. VU expertise is an essential part of this process and ensures that the roadmap provided by the Discovery process is adequate. Both the requirements and the design were developed to support the Nuclear Energy Advanced Modeling and Simulation Waste project, which is developing a set of advanced codes for simulating the performance of nuclear waste storage sites. The Waste project served as an example to keep the design of the VU Discovery process grounded in practicalities. However, the system is represented abstractly so that it can be applied to other M&S projects.

More Details

A discontinuous Galerkin method for gravity-driven viscous fingering instabilities in porous media

Journal of Computational Physics

Scovazzi, Guglielmo S.; Collis, Samuel S.; Gerstenberger, Axel G.

We present a new approach to the simulation of gravity-driven viscous fingering instabilities in porous media flow. These instabilities play a very important role during carbon sequestration processes in brine aquifers. Our approach is based on a nonlinear implementation of the discontinuous Galerkin method, and possesses a number of key features. First, the method developed is inherently high order, and is therefore well suited to study unstable flow mechanisms. Secondly, it maintains high-order accuracy on completely unstructured meshes. The combination of these two features makes it a very appealing strategy in simulating the challenging flow patterns and very complex geometries of actual reservoirs and aquifers. This article includes an extensive set of verification studies on the stability and accuracy of the method, and also features a number of computations with unstructured grids and non-standard geometries.

More Details

A distributed-memory hierarchical solver for general sparse linear systems

Parallel Computing

Chen, Chao; Pouransari, Hadi; Rajamanickam, Sivasankaran R.; Boman, Erik G.; Darve, Eric

We present a parallel hierarchical solver for general sparse linear systems on distributed-memory machines. For large-scale problems, this fully algebraic algorithm is faster and more memory-efficient than sparse direct solvers because it exploits the low-rank structure of fill-in blocks. Depending on the accuracy of low-rank approximations, the hierarchical solver can be used either as a direct solver or as a preconditioner. The parallel algorithm is based on data decomposition and requires only local communication for updating boundary data on every processor. Moreover, the computation-to-communication ratio of the parallel algorithm is approximately the volume-to-surface-area ratio of the subdomain owned by every processor. We present various numerical results to demonstrate the versatility and scalability of the parallel algorithm.

More Details

A dynamic, unified design for dedicated message matching engines for collective and point-to-point communications

Parallel Computing

Ghazimirsaeed, S.M.; Grant, Ryan E.; Afsahi, Ahmad

The Message Passing Interface (MPI) libraries use message queues to guarantee correct message ordering between communicating processes. Message queues are in the critical path of MPI communications and thus, the performance of message queue operations can have significant impact on the performance of applications. Collective communications are widely used in MPI applications and they can have considerable impact on generating long message queues. In this paper, we propose a unified message matching mechanism that improves the message queue search time by distinguishing messages coming from point-to-point and collective communications and using a distinct message queue data structure for them. For collective operations, it dynamically profiles the impact of each collective call on message queues during the application runtime and uses this information to adapt the message queue data structure for each collective dynamically. Moreover, we use a partner/non-partner message queue data structure for the messages coming from point-to-point communications. The proposed approach can successfully reduce the queue search time while maintaining scalable memory consumption. The evaluation results show that we can obtain up to 5.5x runtime speedup for applications with long list traversals. Moreover, we can gain up to 15% and 94% queue search time improvement for all elements in applications with short and medium list traversals, respectively.

More Details