Publications

Results 4251–4300 of 9,998

Search results

Messier: A Detailed NVM-Based DIMM Model for the SST Simulation Framework

Awad, Amro; Voskuilen, Gwendolyn R.; Rodrigues, Arun; Hammond, Simon; Hoekstra, Robert J.; Hughes, Clayton

DRAM technology is the main building block of main memory; however, DRAM scaling is becoming very challenging. The main issues for DRAM scaling are the increasing error rates with each new generation, the geometric and physical constraints of scaling the capacitor part of the DRAM cells, and the high power consumption caused by the continuous need for refreshing cell values. At the same time, emerging Non-Volatile Memory (NVM) technologies, such as Phase-Change Memory (PCM), are promising replacements for DRAM. NVMs, when compared to current technologies (e.g., NAND-based flash), have latencies comparable to DRAM. Additionally, NVMs are non-volatile, which eliminates the need for refresh power and enables persistent memory applications. Finally, NVMs have promising densities and the potential for multi-level cell (MLC) storage.

More Details

Rythmos: Solution and Analysis Package for Differential-Algebraic and Ordinary-Differential Equations

Ober, Curtis C.; Bartlett, Roscoe; Coffey, Todd S.; Pawlowski, Roger

Time integration is a central component for most transient simulations. It coordinates many of the major parts of a simulation together, e.g., a residual calculation with a transient solver, solution with the output, various operator-split physics, and forward and adjoint solutions for inversion. Despite the variety among these transient simulations, there is still a common set of algorithms and procedures to progress transient solutions for ordinary-differential equations (ODEs) and differential-algebraic equations (DAEs). Rythmos is a collection of these algorithms that can be used for the solution of transient simulations. It provides common time-integration methods, such as Backward and Forward Euler, Explicit and Implicit Runge-Kutta, and Backward-Difference Formulas. It can also provide sensitivities and adjoint components for transient simulations. Rythmos is a package within Trilinos, and requires some other packages (e.g., Teuchos and Thyra) to provide basic time-integration capabilities. It can also be coupled with several other Trilinos packages to provide additional capabilities (e.g., AztecOO and Belos for linear solutions, and NOX for non-linear solutions). The documentation is broken down into three parts: Theory Manual, User's Manual, and Developer's Guide. The Theory Manual contains the basic theory of the time integrators, the nomenclature and mathematical structure utilized within Rythmos, and verification results demonstrating that the designed order of accuracy is achieved. The User's Manual provides information on how to use Rythmos, a description of input parameters through Teuchos Parameter Lists, and a description of convergence test examples. The Developer's Guide is a high-level discussion of the design and structure of Rythmos to provide information to developers for the continued development of capabilities. Details of individual components can be found in the Doxygen webpages.
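As a rough illustration of the two simplest integrators named in the abstract, the sketch below applies Forward and Backward Euler to the scalar test equation y' = -lam * y. It is not the Rythmos API: the function names, the test equation, and the step sizes are assumptions chosen only to show how the explicit and implicit updates differ.

```python
import math


def forward_euler(lam, y0, dt, steps):
    """Explicit update: y_new = y + dt * f(y), with f(y) = -lam * y."""
    y = y0
    for _ in range(steps):
        y = y + dt * (-lam * y)
    return y


def backward_euler(lam, y0, dt, steps):
    """Implicit update: y_new = y + dt * f(y_new).

    For the linear test equation the implicit equation can be solved
    in closed form: y_new = y / (1 + lam * dt).
    """
    y = y0
    for _ in range(steps):
        y = y / (1.0 + lam * dt)
    return y


if __name__ == "__main__":
    exact = math.exp(-1.0)
    print("forward :", forward_euler(1.0, 1.0, 1e-3, 1000))
    print("backward:", backward_euler(1.0, 1.0, 1e-3, 1000))
    print("exact   :", exact)
```

On a stiff choice such as lam = 100 with dt = 0.1, the explicit update grows in magnitude while the implicit one decays, which is the usual motivation for offering both families of methods.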

More Details

Recommended Research Directions for Improving the Validation of Complex Systems Models

Vugrin, Eric; Trucano, Timothy G.; Swiler, Laura P.; Finley, Patrick D.; Flanagan, Tatiana P.; Naugle, Asmeret; Tsao, Jeffrey Y.; Verzi, Stephen J.

More Details

Forward modeling of shock-ramped tantalum

AIP Conference Proceedings

Brown, Justin L.; Carpenter, John H.; Seagle, Cristopher T.

Dynamic materials experiments on the Z-machine are beginning to reach a regime where traditional analysis techniques break down. Time-dependent phenomena such as strength and phase transition kinetics often make the data obtained in these experiments difficult to interpret. We present an inverse analysis methodology to infer the equation of state (EOS) from velocimetry data in these types of experiments, building on recent advances in the propagation of uncertain EOS information through a hydrocode simulation. An example is given for a shock-ramp experiment in which tantalum was shock compressed to 40 GPa followed by a ramp to 80 GPa. The results are found to be consistent with isothermal compression and Hugoniot data in this regime.

More Details

A software framework for assessing the resilience of drinking water systems to disasters with an example earthquake case study

Environmental Modelling and Software

Klise, Katherine A.; Bynum, Michael L.; Moriarty, Dylan M.; Murray, Regan

Water utilities are vulnerable to a wide variety of human-caused and natural disasters. The Water Network Tool for Resilience (WNTR) is a new open source Python™ package designed to help water utilities investigate resilience of water distribution systems to hazards and evaluate resilience-enhancing actions. In this paper, the WNTR modeling framework is presented and a case study is described that uses WNTR to simulate the effects of an earthquake on a water distribution system. The case study illustrates that the severity of damage is not only a function of system integrity and earthquake magnitude, but also of the available resources and repair strategies used to return the system to normal operating conditions. While earthquakes are particularly concerning since buried water distribution pipelines are highly susceptible to damage, the software framework can be applied to other types of hazards, including power outages and contamination incidents.

More Details

GPU erasure coding for campaign storage

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Curry, Matthew L.; Haddock, Walker; Bangalore, Purushotham V.; Skjellum, Anthony

High-performance computing (HPC) demands high bandwidth and low latency in I/O performance, leading to the development of storage systems and I/O software components that strive to provide ever greater performance. However, capital and energy budgets, along with increasing storage capacity requirements, have motivated the search for lower cost, large storage systems for HPC. With Burst Buffer technology increasing the bandwidth and reducing the latency for I/O between the compute and storage systems, the back-end storage bandwidth and latency requirements can be reduced, especially underneath an adequately sized modern parallel file system. Cloud computing has led to the development of large, low-cost storage solutions where design has focused on high capacity, availability, and low energy consumption at lowest cost. Cloud computing storage systems leverage duplicates and erasure coding technology to provide high availability at much lower cost than traditional HPC storage systems. Leveraging certain cloud storage infrastructure and concepts in HPC would be valuable economically in terms of cost-effective performance for certain storage tiers. To enable the use of cloud storage technologies for HPC, we study the architecture for interfacing cloud storage between the HPC parallel file systems and the archive storage. In this paper, we report our comparison of two erasure coding implementations for the Ceph file system. We compare measurements of various degrees of sharding that are relevant for HPC applications. We show that the Gibraltar GPU Erasure coding library outperforms a CPU implementation of an erasure coding plugin for the Ceph object storage system, opening the potential for new ways to architect such storage systems based on Ceph.

More Details

Global sensitivity analysis and quantification of model error for large eddy simulation in scramjet design

19th AIAA Non-Deterministic Approaches Conference, 2017

Huan, Xun H.; Safta, Cosmin; Sargsyan, Khachik; Geraci, Gianluca; Eldred, Michael S.; Vane, Zachary P.; Lacaze, Guilhem; Oefelein, Joseph; Najm, Habib N.

The development of scramjet engines is an important research area for advancing hypersonic and orbital flights. Progress towards optimal engine designs requires both accurate flow simulations and uncertainty quantification (UQ). However, performing UQ for scramjet simulations is challenging due to the large number of uncertain parameters involved and the high computational cost of flow simulations. We address these difficulties by applying a combination of UQ algorithms and numerical methods to the large eddy simulation of the HIFiRE scramjet configuration. First, global sensitivity analysis is conducted to identify influential uncertain input parameters, helping reduce the stochastic dimension of the problem and discover sparse representations. Second, as models of different fidelity are available and inevitably used in the overall UQ assessment, a framework for quantifying and propagating the uncertainty due to model error is introduced. These methods are demonstrated on a non-reacting scramjet unit problem with parameter space up to 24 dimensions, using 2D and 3D geometries with static and dynamic treatments of the turbulence subgrid model.

More Details

A combinatorial model for dentate gyrus sparse coding

Neural Computation

Severa, William M.; Parekh, Ojas D.; James, Conrad D.; Aimone, James B.

The dentate gyrus forms a critical link between the entorhinal cortex and CA3 by providing a sparse version of the signal. Concurrent with this increase in sparsity, a widely accepted theory suggests the dentate gyrus performs pattern separation: similar inputs yield decorrelated outputs. Although an active region of study and theory, few logically rigorous arguments detail the coding of the dentate gyrus (DG). We suggest a theoretically tractable, combinatorial model for this action. The model provides formal methods for a highly redundant, arbitrarily sparse, and decorrelated output signal. To explore the value of this model framework, we assess how suitable it is for two notable aspects of DG coding: how it can handle the highly structured grid cell representation in the input entorhinal cortex region and the presence of adult neurogenesis, which has been proposed to produce a heterogeneous code in the DG. We find tailoring the model to grid cell input yields expansion parameters consistent with the literature. In addition, the heterogeneous coding reflects activity gradation observed experimentally. Finally, we connect this approach with more conventional binary threshold neural circuit models via a formal embedding.

More Details

The approximability of partial vertex covers in trees

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Mkrtchyan, Vahan; Parekh, Ojas D.; Segev, Danny; Subramani, K.

Motivated by applications in risk management of computational systems, we focus our attention on a special case of the partial vertex cover problem, where the underlying graph is assumed to be a tree. Here, we consider four possible versions of this setting, depending on whether vertices and edges are weighted or not. Two of these versions, where edges are assumed to be unweighted, are known to be polynomial-time solvable. However, the computational complexity of this problem with weighted edges, and possibly with weighted vertices, has not been determined yet. The main contribution of this paper is to resolve these questions by fully characterizing which variants of partial vertex cover remain intractable in trees, and which can be efficiently solved. In particular, we propose a pseudo-polynomial DP-based algorithm for the most general case of having weights on both edges and vertices, which is proven to be NP-hard. This algorithm provides a polynomial-time solution method when weights are limited to edges, and combined with additional scaling ideas, leads to an FPTAS for the general case. A secondary contribution of this work is to propose a novel way of using centroid decompositions in trees, which could be useful in other settings as well.

More Details

Machine learning models of errors in large eddy simulation predictions of surface pressure fluctuations

47th AIAA Fluid Dynamics Conference, 2017

Barone, Matthew F.; Fike, Jeffrey; Chowdhary, Kenny; Davis, Warren L.; Ling, Julia; Martin, Shawn

We investigate a novel application of deep neural networks to modeling of errors in prediction of surface pressure fluctuations beneath a compressible, turbulent flow. In this context, the truth solution is given by Direct Numerical Simulation (DNS) data, while the predictive model is a wall-modeled Large Eddy Simulation (LES). The neural network provides a means to map relevant statistical flow features within the LES solution to errors in prediction of wall pressure spectra. We simulate a number of flat plate turbulent boundary layers using both DNS and wall-modeled LES to build up a database with which to train the neural network. We then apply machine learning techniques to develop an optimized neural network model for the error in terms of relevant flow features.

More Details

The journey from forensic to predictive materials science using density functional theory

Modelling and Simulation in Materials Science and Engineering

Schultz, Peter A.

Approximate methods for electronic structure, implemented in sophisticated computer codes and married to ever-more powerful computing platforms, have become invaluable in chemistry and materials science. The maturing and consolidation of quantum chemistry codes since the 1980s, based upon explicitly correlated electronic wave functions, has made them a staple of modern molecular chemistry. Here, the impact of first principles electronic structure in physics and materials science had lagged owing to the extra formal and computational demands of bulk calculations.

More Details

A high-order staggered meshless method for elliptic problems

SIAM Journal on Scientific Computing

Perego, Mauro; Trask, Nathaniel; Bochev, Pavel

We present a new meshless method for scalar diffusion equations, which is motivated by their compatible discretizations on primal-dual grids. Unlike the latter though, our approach is truly meshless because it only requires the graph of nearby neighbor connectivity of the discretization points x_i. This graph defines a local primal-dual grid complex with a virtual dual grid, in the sense that specification of the dual metric attributes is implicit in the method's construction. Our method combines a topological gradient operator on the local primal grid with a generalized moving least squares approximation of the divergence on the local dual grid. We show that the resulting approximation of the div-grad operator maintains polynomial reproduction to arbitrary orders and yields a meshless method, which attains O(h^m) convergence in both L^2- and H^1-norms, similar to mixed finite element methods. We demonstrate this convergence on curvilinear domains using manufactured solutions in two and three dimensions. Application of the new method to problems with discontinuous coefficients reveals solutions that are qualitatively similar to those of compatible mesh-based discretizations.

More Details

A seed placement strategy for conforming Voronoi meshing

CCCG 2017 - 29th Canadian Conference on Computational Geometry, Proceedings

Abdelkader, Ahmed; Bajaj, Chandrajit L.; Ebeida, Mohamed S.; Mitchell, Scott A.

We show how to place a set of seed points such that a given piecewise linear complex is the union of some faces in the resulting Voronoi diagram. The seeds are placed on sufficiently small spheres centered at input vertices and are arranged into little circles around each half-edge where every seed is mirrored across the associated triangle. The Voronoi faces common to the seeds of such arrangements yield a mesh conforming to the input complex. If the input contains sharp angles, then additional seeds are needed, analogous to nonobtuse refinement. Finally, we propose local optimizations to reduce the number of seeds and output facets.

More Details

Multilevel-multifidelity acceleration of PDE-constrained optimization

58th AIAA/ASCE/AHS/ASC Structures, Structural Dynamics, and Materials Conference, 2017

Monschke, Jason A.; Eldred, Michael S.

Many engineering design problems can be formulated in the framework of partial differential equation (PDE) constrained optimization. The discretization of a PDE leads to multiple levels of resolution with varying degrees of numerical solution accuracy. Coarse discretizations require less computational time at the expense of increased error. Often there are also reduced fidelity models available, with simplifications to the physics models that are computationally easier to solve. This research develops an up to second-order consistent multilevel-multifidelity (MLMF) optimization scheme that exploits the reduced cost resulting from coarse discretization and reduced fidelity to more efficiently converge to the optimum of a fine-grid high-fidelity problem. This scheme distinguishes multilevel approaches applied to discretizations from multifidelity approaches applied to model forms, and navigates both hierarchies to accelerate convergence. Additive, multiplicative, or a combination of both corrections can be applied to the sub-problems to enforce up to second-order consistency with the fine-grid high-fidelity results. The MLMF optimization algorithm is a wrapper around a subproblem optimization solver, and the MLMF scheme is provably convergent if the subproblem optimizer is provably convergent. Heuristics are developed for efficiently tuning optimization tolerances and iterations at each level and fidelity based on relative solution cost. Accelerated convergence is demonstrated for a simple one-dimensional problem and aerodynamic shape optimization of a transonic airfoil.

More Details

A historical survey of algorithms and hardware architectures for neural-inspired and neuromorphic computing applications

Biologically Inspired Cognitive Architectures

James, Conrad D.; Aimone, James B.; Miner, Nadine E.; Vineyard, Craig M.; Rothganger, Fredrick R.; Carlson, Kristofor D.; Mulder, Samuel A.; Draelos, Timothy J.; Faust, Aleksandra; Marinella, Matthew; Naegle, John H.; Plimpton, Steven J.

Biological neural networks continue to inspire new developments in algorithms and microelectronic hardware to solve challenging data processing and classification problems. Here, we survey the history of neural-inspired and neuromorphic computing in order to examine the complex and intertwined trajectories of the mathematical theory and hardware developed in this field. Early research focused on adapting existing hardware to emulate the pattern recognition capabilities of living organisms. Contributions from psychologists, mathematicians, engineers, neuroscientists, and other professions were crucial to maturing the field from narrowly-tailored demonstrations to more generalizable systems capable of addressing difficult problem classes such as object detection and speech recognition. Algorithms that leverage fundamental principles found in neuroscience such as hierarchical structure, temporal integration, and robustness to error have been developed, and some of these approaches are achieving world-leading performance on particular data classification tasks. In addition, novel microelectronic hardware is being developed to perform logic and to serve as memory in neuromorphic computing systems with optimized system integration and improved energy efficiency. Key to such advancements was the incorporation of new discoveries in neuroscience research, the transition away from strict structural replication and towards the functional replication of neural systems, and the use of mathematical theory frameworks to guide algorithm and hardware developments.

More Details

sPIN: High-performance streaming Processing in the Network

International Conference for High Performance Computing, Networking, Storage and Analysis, SC

Hoefler, Torsten; Di Girolamo, Salvatore; Taranov, Konstantin; Grant, Ryan; Brightwell, Ronald B.

Optimizing communication performance is imperative for large-scale computing because communication overheads limit the strong scalability of parallel applications. Today's network cards contain rather powerful processors optimized for data movement. However, these devices are limited to fixed functions, such as remote direct memory access. We develop sPIN, a portable programming model to offload simple packet processing functions to the network card. To demonstrate the potential of the model, we design a cycle-accurate simulation environment by combining the network simulator LogGOPSim and the CPU simulator gem5. We implement offloaded message matching, datatype processing, and collective communications and demonstrate transparent full-application speedups. Furthermore, we show how sPIN can be used to accelerate redundant in-memory filesystems and several other use cases. Our work investigates a portable packet-processing network acceleration model similar to compute acceleration with CUDA or OpenCL. We show how such network acceleration enables an ecosystem that can significantly speed up applications and system services.

More Details

A generalized sampling and preconditioning scheme for sparse approximation of polynomial chaos expansions

SIAM Journal on Scientific Computing

Jakeman, John D.; Narayan, Akil; Zhou, Tao

We propose an algorithm for recovering sparse orthogonal polynomial expansions via collocation. A standard sampling approach for recovering sparse polynomials uses Monte Carlo sampling from the density of orthogonality, which results in poor function recovery when the polynomial degree is high. Our proposed approach aims to mitigate this limitation by sampling with respect to the weighted equilibrium measure of the parametric domain and subsequently solves a preconditioned ℓ1-minimization problem, where the weights of the diagonal preconditioning matrix are given by evaluations of the Christoffel function. Our algorithm can be applied to a wide class of orthogonal polynomial families on bounded and unbounded domains, including all classical families. We present theoretical analysis to motivate the algorithm and numerical results that show our method is superior to standard Monte Carlo methods in many situations of interest. Numerical examples are also provided to demonstrate that our proposed algorithm leads to comparable or improved accuracy even when compared with Legendre- and Hermite-specific algorithms.

More Details

Horseshoes and hand grenades: The case for approximate coordination in local checkpointing protocols

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Widener, Patrick; Ferreira, Kurt B.; Levy, Scott

Fault-tolerance poses a major challenge for future large-scale systems. Active research into coordinated, uncoordinated, and hybrid checkpointing systems has explored how the introduction of asynchrony can address anticipated scalability issues. While fully uncoordinated approaches have been shown to have significant delays, the degree of synchronization required to keep overheads low has not yet been significantly addressed. In this paper, we use a simulation-based approach to show the impact of synchronization on local checkpoint activity. Specifically, we show the degree of synchronization needed to keep the impacts of local checkpointing low is attainable with current technology for a number of key production HPC workloads. Our work provides a critical analysis and comparison of synchronization and local checkpointing. This enables users and system administrators to fine-tune the checkpointing scheme to the application and system characteristics available.

More Details

BBPH: Using progressive hedging within branch and bound to solve multi-stage stochastic mixed integer programs

Operations Research Letters

Watson, Jean-Paul; Woodruff, David L.; Barnett, Jason

Progressive hedging, though an effective heuristic for solving stochastic mixed integer programs (SMIPs), is not guaranteed to converge in this case. Here, we describe BBPH, a branch and bound algorithm that uses PH at each node in the search tree such that, given sufficient time, it will always converge to a globally optimal solution. In addition to providing a theoretically convergent “wrapper” for PH applied to SMIPs, computational results demonstrate that for some difficult problem instances branch and bound can find improved solutions after exploring only a few nodes.

More Details

A Christoffel function weighted least squares algorithm for collocation approximations

Mathematics of Computation

Jakeman, John D.; Narayan, Akil; Zhou, Tao

We propose, theoretically investigate, and numerically validate an algorithm for the Monte Carlo solution of least-squares polynomial approximation problems in a collocation framework. Our investigation is motivated by applications in the collocation approximation of parametric functions, which frequently entails construction of surrogates via orthogonal polynomials. A standard Monte Carlo approach would draw samples according to the density defining the orthogonal polynomial family. Our proposed algorithm instead samples with respect to the (weighted) pluripotential equilibrium measure of the domain, and subsequently solves a weighted least-squares problem, with weights given by evaluations of the Christoffel function. We present theoretical analysis to motivate the algorithm, and numerical results that show our method is superior to standard Monte Carlo methods in many situations of interest.
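The sampling-and-weighting idea in the abstract can be sketched for one simple case: a Legendre basis on [-1, 1], where the equilibrium measure is the arcsine (Chebyshev) distribution. This is an illustration under stated assumptions, not the authors' implementation; the function names, the one-dimensional setting, and the normal-equations solve are all choices made here for brevity.

```python
import math
import random


def orthonormal_legendre(x, n):
    """Values p_0(x)..p_{n-1}(x), orthonormal w.r.t. the uniform density 1/2 on [-1, 1]."""
    P = [1.0, x]
    for k in range(1, n - 1):
        # Three-term recurrence for the classical Legendre polynomials.
        P.append(((2 * k + 1) * x * P[k] - k * P[k - 1]) / (k + 1))
    return [math.sqrt(2 * k + 1) * P[k] for k in range(n)]


def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def christoffel_least_squares(f, n_basis, n_samples, seed=0):
    """Weighted least-squares coefficients of f in the orthonormal Legendre basis."""
    rng = random.Random(seed)
    # Draw samples from the arcsine (equilibrium) measure on [-1, 1].
    xs = [math.cos(math.pi * rng.random()) for _ in range(n_samples)]
    G = [[0.0] * n_basis for _ in range(n_basis)]
    rhs = [0.0] * n_basis
    for x in xs:
        p = orthonormal_legendre(x, n_basis)
        # Christoffel-function weight: n_basis / sum_k p_k(x)^2.
        w = n_basis / sum(v * v for v in p)
        fx = f(x)
        for i in range(n_basis):
            rhs[i] += w * p[i] * fx
            for j in range(n_basis):
                G[i][j] += w * p[i] * p[j]
    # Normal equations of the weighted problem (adequate for a small sketch).
    return solve(G, rhs)


if __name__ == "__main__":
    print(christoffel_least_squares(lambda x: x * x, n_basis=4, n_samples=40))
```

For a target that lies in the span of the basis, such as x^2 with four basis functions, the weighted fit should reproduce the expansion coefficients up to rounding error. A production code would more likely solve the weighted system with a QR factorization rather than normal equations.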

More Details