Publications

Results 2001–2200 of 9,998

Direct simulation Monte Carlo on petaflop supercomputers and beyond

Physics of Fluids

Plimpton, Steven J.; Moore, Stan G.; Borner, A.; Stagg, Alan K.; Koehler, T.P.; Torczynski, J.R.; Gallis, Michail A.

The gold-standard definition of the Direct Simulation Monte Carlo (DSMC) method is given in the 1994 book by Bird [Molecular Gas Dynamics and the Direct Simulation of Gas Flows (Clarendon Press, Oxford, UK, 1994)], which refined his pioneering earlier papers in which he first formulated the method. In the intervening 25 years, DSMC has become the method of choice for modeling rarefied gas dynamics in a variety of scenarios. The chief barrier to applying DSMC to more dense or even continuum flows is its computational expense compared to continuum computational fluid dynamics methods. The dramatic (nearly billion-fold) increase in speed of the largest supercomputers over the last 30 years has thus been a key enabling factor in using DSMC to model a richer variety of flows, due to the method's inherent parallelism. We have developed the open-source SPARTA DSMC code with the goal of running DSMC efficiently on the largest machines, both current and future. It is largely an implementation of Bird's 1994 formulation. Here, we describe algorithms used in SPARTA to enable DSMC to operate in parallel at the scale of many billions of particles or grid cells, or with billions of surface elements. We give a few examples of the kinds of fundamental physics questions and engineering applications that DSMC can address at these scales.
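
At the core of Bird's method is a per-cell stochastic collision step. As a rough illustration (not SPARTA's actual implementation; all parameter values below are hypothetical), a minimal Python sketch of no-time-counter (NTC) collision selection for hard spheres:

```python
import numpy as np

rng = np.random.default_rng(0)

def collide_cell(v, f_num, sigma, dt, vol, vr_max):
    """One NTC (no-time-counter) collision step for the particles in a single
    grid cell. v: (N, 3) particle velocities; f_num: real molecules per
    simulated particle; sigma: hard-sphere cross-section; vol: cell volume.
    Returns the updated velocities and the running max relative speed."""
    n = len(v)
    if n < 2:
        return v, vr_max
    # Candidate pair count from the standard NTC formula.
    m_cand = int(0.5 * n * (n - 1) * f_num * sigma * vr_max * dt / vol)
    for _ in range(m_cand):
        i, j = rng.choice(n, size=2, replace=False)
        vr = np.linalg.norm(v[i] - v[j])
        vr_max = max(vr_max, vr)          # track the max for later steps
        if rng.random() < vr / vr_max:    # accept with prob ~ sigma * vr
            # Isotropic hard-sphere scattering: momentum and energy conserved.
            vcm = 0.5 * (v[i] + v[j])
            cos_t = 2.0 * rng.random() - 1.0
            sin_t = np.sqrt(1.0 - cos_t**2)
            phi = 2.0 * np.pi * rng.random()
            vr_new = vr * np.array([cos_t, sin_t * np.cos(phi),
                                    sin_t * np.sin(phi)])
            v[i] = vcm + 0.5 * vr_new
            v[j] = vcm - 0.5 * vr_new
    return v, vr_max
```

Each accepted collision preserves the pair's center-of-mass velocity and relative speed, so momentum and kinetic energy are conserved exactly; the parallelism the abstract describes comes from distributing many such independent cells across processors.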

TATB Sensitivity to Shocks from Electrical Arcs

Propellants, Explosives, Pyrotechnics

Chen, Kenneth C.; Warne, Larry K.; Jorgenson, Roy E.; Niederhaus, John H.

Use of insensitive high explosives (IHEs) has significantly improved ammunition safety because of their remarkable insensitivity to violent cook-off, shock, and impact. Triamino-trinitrobenzene (TATB) is the IHE used in many modern munitions. Previously, lightning simulations in different test configurations have shown that the required detonation threshold for standard-density TATB at ambient and elevated temperatures (250 °C) has a sufficient margin over the shock caused by an arc from the most severe lightning. In this paper, the Braginskii model with the Lee-More channel conductivity prescription is used to demonstrate how electrical arcs from lightning could cause detonation in TATB. The steep rise and slow decay of a typical lightning pulse are used to demonstrate that the shock pressure from an electrical arc, after reaching its peak, falls off faster than the inverse of the arc radius. For detonation to occur, two necessary conditions must be met: the Pop-Plot criterion and the minimum spot size requirement. The relevant Pop-Plot for TATB at 250 °C was converted into an empirical detonation criterion, which is applicable to explosives subject to shocks of variable pressure. The arc cross-section was required to meet the minimum detonation spot size reported in the literature. One caveat is that when the shock pressure exceeds the detonation pressure, the Pop-Plot may not be applicable, and the minimum spot size requirement may be smaller.

Composing neural algorithms with Fugu

ACM International Conference Proceeding Series

Aimone, James B.; Severa, William M.; Vineyard, Craig M.

Neuromorphic hardware architectures represent a growing family of potential post-Moore's-Law-era platforms. Largely due to event-driven processing inspired by the human brain, these platforms can offer significant energy benefits compared to traditional von Neumann processors. Unfortunately, considerable difficulty remains in successfully programming, configuring, and deploying neuromorphic systems. We present the Fugu framework as an answer to this need. Rather than requiring a developer to attain intricate knowledge of how to program and exploit spiking neural dynamics to realize the potential benefits of neuromorphic computing, Fugu is designed to provide a higher-level abstraction: a hardware-independent mechanism for linking a variety of scalable spiking neural algorithms from a variety of sources. Individual kernels linked together provide sophisticated processing through compositionality. Fugu is intended to be suitable for a wide range of neuromorphic applications, including machine learning, scientific computing, and more brain-inspired neural algorithms. Ultimately, we hope the community adopts this and other open standardization attempts, allowing for free exchange and easy implementation of the ever-growing list of spiking neural algorithms.

IMEX and exact sequence discretization of the multi-fluid plasma model

Journal of Computational Physics

Miller, Sean M.; Cyr, Eric C.; Shadid, John N.; Kramer, Richard M.; Phillips, Edward G.; Conde, Sidafa C.; Pawlowski, Roger P.

Multi-fluid plasma models, in which an electron fluid is modeled in addition to multiple ion and neutral species along with the full set of Maxwell's equations, are useful for representing physics beyond the scope of classic MHD. This advantage presents challenges in appropriately dealing with electron dynamics and with electromagnetic behavior characterized by the plasma and cyclotron frequencies and the speed of light. For physical systems, such as those near the MHD asymptotic regime, these fast scales drastically increase runtimes for explicit time integration even though resolving them may not be critical for accuracy. Implicit time integration methods, with efficient solvers, can help to step over fast time scales that constrain stability but do not strongly influence accuracy. As an extension, implicit-explicit (IMEX) schemes provide an additional mechanism for choosing which dynamics are evolved with an expensive implicit solve and which are resolved with a fast explicit solve. In this study, in addition to IMEX methods, we also consider a physics-compatible exact-sequence spatial discretization, which combines nodal bases (H-Grad) for the fluid dynamics with a set of vector bases (H-Curl and H-Div) for Maxwell's equations. This discretization allows for multi-fluid plasma modeling without violating Gauss's laws for the electric and magnetic fields. This initial study presents a discussion of the major elements of this formulation and focuses on demonstrating accuracy in the linear wave regime and in the MHD limit for both a visco-resistive and a dispersive ideal MHD problem.
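
As a minimal illustration of the IMEX idea (first-order Euler splitting only; the paper employs higher-order IMEX schemes and PDE systems, not this scalar toy), treating the stiff linear term implicitly removes the explicit stability restriction:

```python
def imex_euler(y0, lam, g, dt, n_steps):
    """First-order IMEX (implicit-explicit) Euler for y' = lam*y + g(y):
    the stiff linear term lam*y is treated implicitly (stable for any dt
    when lam < 0), while the nonstiff term g(y) is treated explicitly
    (cheap, no nonlinear solve). Illustrative scalar sketch only."""
    y = y0
    for _ in range(n_steps):
        # Solve y_new = y + dt*g(y) + dt*lam*y_new for y_new.
        y = (y + dt * g(y)) / (1.0 - dt * lam)
    return y

# Stiff test problem y' = -1000*y + 1000 (steady state y = 1). A fully
# explicit Euler step with dt = 0.1 would be wildly unstable (|1 + dt*lam| = 99).
y_end = imex_euler(y0=0.0, lam=-1000.0, g=lambda y: 1000.0, dt=0.1, n_steps=200)
```

Here the implicit treatment of the fast scale lets the step size be chosen for accuracy of the slow dynamics, which is exactly the trade-off the abstract describes for plasma and cyclotron frequencies versus MHD-scale motion.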

Benchmarking event-driven neuromorphic architectures

ACM International Conference Proceeding Series

Vineyard, Craig M.; Green, Sam G.; Severa, William M.; Koc, Cetin K.

Neuromorphic architectures are represented by a broad class of hardware, with artificial neural network (ANN) architectures at one extreme and event-driven spiking architectures at another. Algorithms and applications efficiently processed by one neuromorphic architecture may be unsuitable for another, but it is challenging to compare various neuromorphic architectures among themselves and with traditional computer architectures. In this position paper, we take inspiration from architectural characterizations in scientific computing and motivate the need for neuromorphic architecture comparison techniques, outline relevant performance metrics and analysis tools, and describe cognitive workloads to meaningfully exercise neuromorphic architectures. Additionally, we propose a simulation-based framework for benchmarking a wide range of neuromorphic workloads. While this work is applicable to neuromorphic development in general, we focus on event-driven architectures, as they offer both unique performance characteristics and evaluation challenges.

Direct Randomized Benchmarking for Multiqubit Devices

Physical Review Letters

Proctor, Timothy J.; Carignan-Dugas, Arnaud; Rudinger, Kenneth M.; Nielsen, Erik N.; Blume-Kohout, Robin J.; Young, Kevin C.

Benchmarking methods that can be adapted to multiqubit systems are essential for assessing the overall or "holistic" performance of nascent quantum processors. The current industry standard is Clifford randomized benchmarking (RB), which measures a single error rate that quantifies overall performance. But, scaling Clifford RB to many qubits is surprisingly hard. It has only been performed on one, two, and three qubits as of this writing. This reflects a fundamental inefficiency in Clifford RB: the n-qubit Clifford gates at its core have to be compiled into large circuits over the one- and two-qubit gates native to a device. As n grows, the quality of these Clifford gates quickly degrades, making Clifford RB impractical at relatively low n. In this Letter, we propose a direct RB protocol that mostly avoids compiling. Instead, it uses random circuits over the native gates in a device, which are seeded by an initial layer of Clifford-like randomization. We demonstrate this protocol experimentally on two to five qubits using the publicly available ibmqx5. We believe this to be the greatest number of qubits holistically benchmarked, and this was achieved on a freely available device without any special tuning up. Our protocol retains the simplicity and convenient properties of Clifford RB: it estimates an error rate from an exponential decay. But, it can be extended to processors with more qubits - we present simulations on 10+ qubits - and it reports a more directly informative and flexible error rate than the one reported by Clifford RB. We show how to use this flexibility to measure separate error rates for distinct sets of gates, and we use this method to estimate the average error rate of a set of CNOT gates.
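
The exponential-decay fit at the heart of any RB analysis can be sketched briefly. The following snippet is illustrative only: it uses synthetic (hypothetical) data and fixes the decay asymptote at 1/2^n rather than performing the free three-parameter fit typically used in practice:

```python
import numpy as np

def rb_error_rate(depths, success_probs, n_qubits):
    """Fit P(m) = A + B*p**m with the asymptote A fixed at 1/2**n (uniform
    output over 2**n bitstrings), then convert the decay constant p to an
    average error rate r = (2**n - 1)/2**n * (1 - p). A least-squares fit
    of log(P - A) against depth recovers log(p) as the slope."""
    a = 1.0 / 2**n_qubits
    y = np.log(np.asarray(success_probs) - a)
    slope, _ = np.polyfit(depths, y, 1)
    p = np.exp(slope)
    return (2**n_qubits - 1) / 2**n_qubits * (1.0 - p)

# Synthetic, noiseless 2-qubit data with true decay p = 0.95 and B = 0.75.
depths = np.arange(0, 41, 5)
probs = 0.25 + 0.75 * 0.95**depths
r = rb_error_rate(depths, probs, n_qubits=2)
```

With noiseless synthetic data the fit recovers the error rate (3/4)(1 - 0.95) exactly; on real data one would fit A, B, and p jointly and propagate uncertainty.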

An indirect ALE discretization of single fluid plasma without a fast magnetosonic time step restriction

Computers and Mathematics with Applications

McGregor, Duncan A.; Robinson, Allen C.

In this paper we present an adjustment to traditional ALE discretizations of resistive MHD in which we do not neglect the time derivative of the electric displacement field. This system is referred to variously as a perfect electromagnetic fluid or a single-fluid plasma, although we refer to it as Full Maxwell Hydrodynamics (FMHD) in order to evoke its similarities to resistive magnetohydrodynamics (MHD). Unlike the MHD system, the characteristics of this system do not become arbitrarily large in the limit of low densities. In order to take advantage of these improved characteristics, we must tightly couple the electromagnetics into the Lagrangian motion and do away with more traditional operator splitting. We provide a number of verification tests to demonstrate both the accuracy of the method and an asymptotic-preserving (AP) property. In addition, we present a prototype calculation of a Z-pinch and find very good agreement between our algorithm and resistive MHD. Further, FMHD leads to a large performance gain (approximately 4.6x speedup) compared to resistive MHD. We unfortunately find that our proposed algorithm does not conserve charge, leaving us with an open problem.

Explicit synchronous partitioned algorithms for interface problems based on Lagrange multipliers

Computers and Mathematics with Applications

Peterson, Kara J.; Bochev, Pavel B.; Kuberry, Paul A.

Traditional explicit partitioned schemes exchange boundary conditions between subdomains and can be related to iterative solution methods for the coupled problem. As a result, these schemes may require multiple subdomain solves, acceleration techniques, or optimized transmission conditions to achieve sufficient accuracy and/or stability. We present a new synchronous partitioned method derived from a well-posed mixed finite element formulation of the coupled problem. We transform the resulting Differential Algebraic Equation (DAE) to a Hessenberg index-1 form in which the algebraic equation defines the Lagrange multiplier as an implicit function of the states. Using this fact we eliminate the multiplier and reduce the DAE to a system of explicit ODEs for the states. Explicit time integration both discretizes this system in time and decouples its equations. As a result, the temporal accuracy and stability of our formulation are governed solely by the accuracy and stability of the explicit scheme employed and are not subject to additional stability considerations as in traditional partitioned schemes. We establish sufficient conditions for the formulation to be well-posed and prove that classical mortar finite elements on the interface are a stable choice for the Lagrange multiplier. We show that in this case the condition number of the Schur complement involved in the elimination of the multiplier is bounded by a constant. The paper concludes with numerical examples illustrating the approach for two different interface problems.

Random walks on jammed networks: Spectral properties

Physical Review E

Lechman, Jeremy B.; Bond, Stephen D.; Bolintineanu, Dan S.; Grest, Gary S.; Yarrington, Cole Y.; Silbert, Leonardo E.

Using random walk analyses, we explore diffusive transport on networks obtained from contacts between isotropically compressed, monodisperse, frictionless sphere packings generated over a range of pressures in the vicinity of the jamming transition p→0. For conductive particles in an insulating medium, conduction is determined by the particle contact network, with nodes representing particle centers and edges representing contacts between particles. The transition rate is not homogeneous but is distributed inhomogeneously due to the randomness of packing and the concomitant disorder of the contact network, e.g., the distribution of the coordination number. A narrow-escape time scale is used to write a Markov process for random walks on the particle contact network. This stochastic process is analyzed in terms of the spectral density of the random, sparse, Euclidean, real, symmetric, positive-semidefinite transition rate matrix. Results show that network structures derived from jammed particles have properties similar to ordered, Euclidean lattices but also some unique properties that distinguish them from other structures that are in some sense more homogeneous. In particular, the distribution of eigenvalues of the transition rate matrix follows a power law with spectral dimension 3. However, quantitative details of the statistics of the eigenvectors show subtle differences from homogeneous lattices and allow us to distinguish between topological and geometric sources of disorder in the network.
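
Structurally, such a transition rate matrix is a weighted graph Laplacian. As a toy illustration (a hypothetical 4-particle ring, not a jammed packing), one can assemble it from a contact list and verify the stated spectral properties numerically:

```python
import numpy as np

def transition_rate_matrix(n_nodes, edges, rates):
    """Assemble the symmetric transition-rate matrix (a weighted graph
    Laplacian) for a random walk hopping along contact-network edges.
    Each contact (i, j) with rate w contributes -w off-diagonal; the
    diagonal makes every row sum to zero, so the matrix is real,
    symmetric, and positive semidefinite with a zero mode."""
    q = np.zeros((n_nodes, n_nodes))
    for (i, j), w in zip(edges, rates):
        q[i, j] -= w
        q[j, i] -= w
        q[i, i] += w
        q[j, j] += w
    return q

# Hypothetical toy contact network: a 4-particle ring with unit rates.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
q = transition_rate_matrix(4, edges, rates=[1.0, 1.0, 1.0, 1.0])
eigvals = np.linalg.eigvalsh(q)  # real spectrum, ascending order
```

For a jammed packing one would build the same matrix from the measured contacts with inhomogeneous rates, then study the empirical distribution of these eigenvalues, as in the paper.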

Modeling Concept and Numerical Simulation of Ultrasonic Wave Propagation in a Moving Fluid-Structure Domain based on a Monolithic Approach

Applied Mathematical Modelling

Hai, Ebna; Bause, Markus; Kuberry, Paul A.

In the present study, we propose a novel multiphysics model that couples two time-dependent problems, fluid-structure interaction (FSI) and ultrasonic wave propagation in a fluid-structure domain, with a one-directional coupling from the FSI problem to the ultrasonic wave propagation problem. This model is referred to as the “eXtended fluid-structure interaction (eXFSI)” problem. The FSI problem comprises the isothermal, incompressible Navier-Stokes equations coupled with nonlinear elastodynamics using the Saint-Venant Kirchhoff solid model. The ultrasonic wave propagation problem comprises monolithically coupled acoustic and elastic wave equations. To ensure that the fluid and structure domains are conforming, we use the arbitrary Lagrangian-Eulerian (ALE) technique. The solution principle for the coupled problem is to first solve the FSI problem and then solve the wave propagation problem; accordingly, the boundary conditions for the wave propagation problem are automatically adopted from the FSI problem at each time step. The overall problem is highly nonlinear and is tackled via a Newton-like method. The model is verified using several alternative domain configurations. To ensure the credibility of the modeling approach, the numerical solution is contrasted against experimental data.

Semi-Automated Design of Functional Elements for a New Approach to Digital Superconducting Electronics: Methodology and Preliminary Results

ISEC 2019 - International Superconductive Electronics Conference

Frank, Michael P.; Lewis, Rupert; Missert, Nancy A.; Henry, Michael D.; Wolak, Matthaeus W.; Debenedictis, Erik P.

In an ongoing project at Sandia National Laboratories, we are attempting to develop a novel style of superconducting digital processing, based on a new model of reversible computation called Asynchronous Ballistic Reversible Computing (ABRC). We envision an approach in which polarized fluxons scatter elastically from near-lossless functional components, reversibly updating the local digital state of the circuit, while dissipating only a small fraction of the input fluxon energy. This approach to superconducting digital computation is sufficiently unconventional that an appropriate methodology for hand-design of such circuits is not immediately obvious. To gain insight into the design principles that are applicable in this new domain, we are creating a software tool to automatically enumerate possible topologies of reactive, undamped Josephson junction circuits, and sweep the parameter space of each circuit searching for designs exhibiting desired dynamical behaviors. But first, we identified by hand a circuit implementing the simplest possible nontrivial ABRC functional behavior with bits encoded as conserved polarized fluxons, namely, a one-bit reversible memory cell with one bidirectional I/O port. We expect the tool to be useful for designing more complex circuits.

Camellia: A Rapid Development Framework for Finite Element Solvers

Computational Methods in Applied Mathematics

Roberts, Nathan V.

The discontinuous Petrov-Galerkin (DPG) methodology of Demkowicz and Gopalakrishnan guarantees the optimality of the finite element solution in a user-controllable energy norm, and provides several features supporting adaptive schemes. The approach provides stability automatically; there is no need for carefully derived numerical fluxes (as in DG schemes) or for mesh-dependent stabilization terms (as in stabilized methods). In this paper, we focus on features of Camellia that facilitate implementation of new DPG formulations; chief among these is a rich set of features in support of symbolic manipulation, which allow, e.g., bilinear formulations in the code to appear much as they would on paper. Many of these features are general in the sense that they can also be used in the implementation of other finite element formulations. In fact, because DPG's requirements are essentially a superset of those of other finite element methods, Camellia provides built-in support for most common methods. We believe, however, that the combination of an essentially "hands-free" finite element methodology as found in DPG with the rapid development features of Camellia are particularly winsome, so we focus on use cases in this class. In addition to the symbolic manipulation features mentioned above, Camellia offers support for one-irregular adaptive meshes in 1D, 2D, 3D, and space-time. It provides a geometric multigrid preconditioner particularly suited for DPG problems, and supports distributed parallel execution using MPI. For its load balancing and distributed data structures, Camellia relies on packages from the Trilinos project, which simplifies interfacing with other computational science packages. Camellia also allows loading of standard mesh formats through an interface with the MOAB package. Camellia includes support for static condensation to eliminate element-interior degrees of freedom locally, usually resulting in substantial reduction of the cost of the global problem. 
We include a discussion of the variational formulations built into Camellia, with references to those formulations in the literature, as well as an MPI performance study.

The insect brain as a model system for low power electronics and edge processing applications

Proceedings - 2019 IEEE Space Computing Conference, SCC 2019

Yanguas-Gil, Angel; Mane, Anil; Elam, Jeffrey W.; Wang, Felix W.; Severa, William M.; Daram, Anurag R.; Kudithipudi, Dhireesha

The insect brain is a great model system for low-power electronics: insects carry out multisensory integration and are able to change the way they process information, learn, and adapt to changes in their environment with a very limited power budget. This context-dependent processing allows them to implement multiple functionalities within the same network, as well as to minimize power consumption by having context-dependent gains in their first layers of input processing. The combination of low power consumption, adaptability and online learning, and robustness makes them particularly appealing for a number of space applications, from rovers and probes to satellites, all of which must deal with the progressive degradation of their capabilities in remote environments. In this work, we explore architectures inspired by the insect brain that are capable of context-dependent processing and learning. Starting from algorithms, we have explored three different implementations: a spiking implementation on a neuromorphic chip, a custom implementation on an FPGA, and finally hybrid analog/digital implementations based on crossbar arrays. For the latter, we found that the development of novel resistive materials is crucial to enhancing the energy efficiency of analog devices while maintaining an adequate footprint. Metal-oxide nanocomposite materials, fabricated using atomic layer deposition (ALD) with processes compatible with semiconductor processing, are promising candidates to fill that role.

Trust me. QED

SIAM News

Heroux, Michael A.

Consider a standard SIAM journal article containing theoretical results. Each theorem has a proof that typically builds on previous developments. Since every theorem stems from a firm foundation, the research community can trust a result without further evidence. One could thus argue that a theorem does not require a proof because surely an author would not publish it if no proof existed to back it up. Furthermore, respectable reviewers and editors expect proofs without exception, and papers containing proof-less theorems will likely go unpublished.

Evaluating the Marvell ThunderX2 Server Processor for HPC Workloads

2019 International Conference on High Performance Computing and Simulation, HPCS 2019

Hammond, Simon D.; Hughes, Clayton H.; Levenhagen, Michael J.; Vaughan, Courtenay T.; Younge, Andrew J.; Schwaller, Benjamin S.; Aguilar, Michael J.; Laros, James H.

The high-performance computing industry is undergoing a period of substantial change, not least because of fabrication and lithographic challenges in the manufacturing of next-generation processors. As such challenges mount, the industry is looking to generate higher performance from additional functionality in the micro-architecture space as well as from a greater emphasis on efficiency in the design of network-on-chip resources and memory subsystems. Such variation in design opens opportunities for new entrants in the data center and server markets, where varying compute-to-memory ratios can present end users with more efficient node designs for particular workloads. In this paper we evaluate the recently released Marvell ThunderX2 Arm processor - arguably the first high-performance-computing-capable Arm design available in the marketplace. We perform a set of micro-benchmark and mini-application evaluations on the ThunderX2, comparing it with Intel's Haswell and Skylake Xeon server parts commonly used in contemporary HPC designs. Our findings show that no one processor performs best across all benchmarks, but that the ThunderX2 excels in areas demanding high memory bandwidth due to the provisioning of more memory channels in its design. We conclude that the ThunderX2 is a serious contender in the HPC server segment and has the potential to offer supercomputing sites a viable high-performance alternative to existing designs from established industry players.

A resurgence in neuromorphic architectures enabling remote sensing computation

Proceedings - 2019 IEEE Space Computing Conference, SCC 2019

Vineyard, Craig M.; Severa, William M.; Kagie, Matthew J.; Scholand, Andrew J.; Hays, Park H.

Technological advances have enabled exponential growth in both sensor data collection and computational processing. However, as a limiting factor, the transmission bandwidth between a space-based sensor and a ground-station processing center has not seen the same growth. One resolution to this bandwidth limitation is to move the processing to the sensor, but doing so faces size, weight, and power operational constraints. Physical constraints on processor manufacturing are spurring a resurgence in neuromorphic approaches amenable to the space-based operational environment. Here we describe historical trends in computer architecture and their implications for neuromorphic computing, and give an overview of how remote sensing applications may be impacted by this emerging direction for computing.

Creation of nanoscale magnetic fields using nano-magnet arrays

AIP Advances

Sapkota, Keshab R.; Eley, S.; Bussmann, Ezra B.; Harris, Charles T.; Maurer, Leon M.; Lu, Tzu-Ming L.

We present the fabrication of nano-magnet arrays, comprising two sets of interleaved SmCo5 and Co nano-magnets, and the subsequent development and implementation of a protocol to program the array to create a one-dimensional rotating magnetic field. We designed the array based on the microstructural and magnetic properties of SmCo5 films annealed under different conditions, also presented here. Leveraging the extremely high contrast in coercivity between SmCo5 and Co, we applied a sequence of external magnetic fields to program the nano-magnet arrays into a configuration with alternating polarization, which, based on simulations, creates a rotating magnetic field in the vicinity of the nano-magnets. Our proof-of-concept demonstration shows that complex, nanoscale magnetic fields can be synthesized through the coercivity contrast of the constituent magnetic materials and carefully designed sequences of programming magnetic fields.
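
The programming protocol exploits the fact that a field pulse between the two coercivities flips only the magnetically soft elements. A toy single-domain sketch (the coercivity and field values are hypothetical, not the measured ones):

```python
def apply_field(magnets, h):
    """Toy single-domain model: a magnet's moment aligns with the applied
    field h only when |h| exceeds its coercivity; otherwise it is
    unchanged. Each magnet is represented as (coercivity, moment_sign)."""
    return [(hc, (1 if h > 0 else -1) if abs(h) > hc else m)
            for hc, m in magnets]

# Interleaved array: hard SmCo5 (hc ~ 2.0, hypothetical units) alternating
# with soft Co (hc ~ 0.05).
array = [(2.0, 1), (0.05, 1), (2.0, 1), (0.05, 1)]
array = apply_field(array, +3.0)   # step 1: saturate everything "up"
array = apply_field(array, -0.5)   # step 2: flip only the soft Co magnets
polarization = [m for _, m in array]
```

After the two-pulse sequence, the array holds the alternating +1/-1 polarization pattern that, per the abstract's simulations, produces a rotating field near the magnets.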

Investigating Fairness in Disaggregated Non-Volatile Memories

Proceedings of IEEE Computer Society Annual Symposium on VLSI, ISVLSI

Kommareddy, Vamsee R.; Hughes, Clayton H.; Hammond, Simon D.; Awad, Amro

Many applications have growing demands for memory, particularly in the HPC space, making the memory system a potential bottleneck of next-generation computing systems. Sharing the memory system across processor sockets and nodes becomes a compelling argument given that memory technology is scaling at a slower rate than processor technology. Moreover, as many applications rely on shared data, e.g., graph applications and database workloads, having a large number of nodes access shared memory allows for efficient use of resources and avoids duplicating huge files, which can be infeasible for large graphs or scientific data. As new memory technologies come to market, the flexibility to upgrade memory and perform system updates becomes a major concern. Disaggregated memory systems, in which memory is shared across different computing nodes, e.g., systems-on-chip (SoCs), are therefore expected to become the most common architecture for memory-centric systems, e.g., The Machine project from HP Labs. However, due to the nature of such systems, different users and applications compete for the available memory bandwidth, which can lead to severe contention among memory traffic from different SoCs. In this paper, we discuss the contention problem in disaggregated memory systems and propose mechanisms to ensure memory fairness and enforce QoS. Our simulation results show that employing our proposed QoS techniques can improve memory response time by up to 55%.
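
The abstract does not detail the paper's specific QoS mechanisms, but the general shape of bandwidth fairness can be illustrated with a minimal round-robin arbiter that gives each SoC a token quota per epoch (all names and quotas below are hypothetical):

```python
from collections import deque

def arbitrate(requests, quota):
    """Round-robin memory-request arbiter with a per-SoC token quota per
    epoch: each SoC may issue at most `quota` requests before the epoch
    rolls over, bounding how far one chatty SoC can starve the others.
    `requests` maps soc_id -> list of request tags; returns service order."""
    order = []
    tokens = {soc: quota for soc in requests}
    queues = {soc: deque(q) for soc, q in requests.items()}
    while any(queues.values()):
        progressed = False
        for soc in queues:
            if queues[soc] and tokens[soc] > 0:
                order.append((soc, queues[soc].popleft()))
                tokens[soc] -= 1
                progressed = True
        if not progressed:                 # epoch over: refill all tokens
            tokens = {soc: quota for soc in queues}
    return order

served = arbitrate({"soc0": ["a", "b", "c", "d"], "soc1": ["x"]}, quota=2)
```

Even with a long backlog from soc0, soc1's single request is served in the first round, which is the kind of fairness guarantee contention-control mechanisms aim to enforce.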

A vision for managing extreme-scale data hoards

Proceedings - International Conference on Distributed Computing Systems

Logan, Jeremy; Mehta, Kshitij; Heber, Gerd; Klasky, Scott; Kurc, Tahsin; Podhorszki, Norbert; Widener, Patrick W.; Wolf, Matthew

Scientific data collections grow ever larger, both in terms of the size of individual data items and of the number and complexity of items. To use and manage them, it is important to directly address issues of robust and actionable provenance. We identify three key drivers as our focus: managing the size and complexity of metadata, lack of a priori information to match usage intents between publishers and consumers of data, and support for campaigns over collections of data driven by multi-disciplinary, collaborating teams. We introduce the Hoarde abstraction as an attempt to formalize a way of looking at collections of data to make them more tractable for later use. Hoarde leverages middleware and systems infrastructures for scientific and technical data management. Through the lens of a select group of challenging data usage scenarios, we discuss some of the aspects of implementation, usage, and forward portability of this new view on data management.

Compatible meshfree discretization of surface PDEs

Computational Particle Mechanics

Trask, Nathaniel A.; Kuberry, Paul A.

Meshfree discretization of surface partial differential equations is appealing due to its ability to naturally adapt to deforming motion of the underlying manifold. In this work, we consider an existing scheme proposed by Liang et al., reinterpreted in the context of generalized moving least squares (GMLS), showing that existing numerical analysis from the GMLS literature applies to their scheme. With this interpretation, their approach may then be unified with recent work developing compatible meshfree discretizations for the div-grad problem in R^d. Informally, this is analogous to an extension of collocated finite differences to staggered finite difference methods, but in the manifold setting and with unstructured nodal data. In this way, we obtain a compatible meshfree discretization of elliptic problems on manifolds that is naturally stable for problems with material interfaces, without the need to introduce numerical dissipation or local enrichment near the interface. As a result, we provide convergence studies illustrating the high-order convergence and stability of the approach for manufactured solutions and for an adaptation of the classical five-strip benchmark to a cylindrical manifold.
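
As background for readers unfamiliar with the approach, a plain moving least squares fit, the special case that GMLS generalizes to arbitrary linear target functionals, can be sketched in a few lines. The node set, polynomial degree, and Gaussian weight kernel below are illustrative choices, not those of the paper:

```python
import numpy as np

def mls_value(x_nodes, f_nodes, x0, degree=2, eps=0.5):
    """Moving-least-squares evaluation at x0: fit a local polynomial to
    scattered nodal data, weighting each node by its distance to x0.
    Reproduces polynomials up to `degree` exactly."""
    w = np.exp(-((x_nodes - x0) / eps) ** 2)          # Gaussian weights
    v = np.vander(x_nodes - x0, degree + 1, increasing=True)
    # Weighted normal equations: (V^T W V) c = V^T W f
    a = v.T @ (w[:, None] * v)
    b = v.T @ (w * f_nodes)
    c = np.linalg.solve(a, b)
    return c[0]                                       # fitted value at x0

# Quadratic nodal data on [0, 1]: a degree-2 MLS fit reproduces it exactly.
x = np.linspace(0.0, 1.0, 11)
f = 1.0 + 2.0 * x + 3.0 * x**2
val = mls_value(x, f, x0=0.37)
```

GMLS replaces "value at x0" with other linear functionals (derivatives, divergences on a manifold), which is what enables the compatible div-grad constructions the abstract refers to.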
