Publications

Results 576–600 of 9,998

A Layered Approach for Modular Container Construction and Orchestration in HPC Environments

ScienceCloud 2021 - Proceedings of the 11th Workshop on Scientific Cloud Computing

Wofford, Quincy; Bridges, Patrick G.; Widener, Patrick W.

Large-scale, high-throughput computational science faces an accelerating convergence of software and hardware. Software container-based solutions have become common in cloud-based datacenter environments, and are considered promising tools for addressing heterogeneity and portability concerns. However, container solutions reflect a set of assumptions which complicate their adoption by developers and users of scientific workflow applications. Nor are containers a universal solution for deployment in high-performance computing (HPC) environments which have specialized and vertically integrated scheduling and runtime software stacks. In this paper, we present a container design and deployment approach which uses modular layering to ease the deployment of containers into existing HPC environments. This layered approach allows operating system integrations, support for different communication and performance monitoring libraries, and application code to be defined and interchanged in isolation. We describe in this paper the details of our approach, including specifics about container deployment and orchestration for different HPC scheduling systems. We also describe how this layering method can be used to build containers for two separate applications, each deployed on clusters with different batch schedulers, MPI networking support, and performance monitoring requirements. Our experience indicates that the layered approach is a viable strategy for building applications intended to provide similar behavior across widely varying deployment targets.


Fast three-dimensional rules-based simulation of thermal-sprayed microstructures

Computational Materials Science

Rodgers, Theron R.; Mitchell, John A.; Olson, Aaron J.; Bolintineanu, Dan S.; Vackel, Andrew V.; Moore, Nathan W.

Thermal spray processes involve the repeated impact of millions of discrete particles, whose melting, deformation, and coating-formation dynamics occur at microsecond timescales. The accumulated coating that evolves over minutes is composed of complex, multiphase microstructures, and the timescale difference between the individual particle solidification and the overall coating formation represents a significant challenge for analysts attempting to simulate microstructure evolution. In order to overcome the computational burden, researchers have created rule-based models (similar to cellular automata methods) that do not directly simulate the physics of the process. Instead, the simulation is governed by a set of predefined rules, which do not capture the fine details of the evolution, but do provide a useful approximation for the simulation of coating microstructures. Here, we introduce a new rules-based process model for microstructure formation during thermal spray processes. The model is 3D, allows for an arbitrary number of material types, and includes multiple porosity-generation mechanisms. Example results of the model for tantalum coatings are presented along with sensitivity analyses of model parameters and validation against 3D experimental data. The model's computational efficiency allows for investigations into the stochastic variation of coating microstructures, in addition to the typical process-to-structure relationships.


Higher-order particle representation for particle-in-cell simulations

Journal of Computational Physics

Brown, Dominic A.S.; Bettencourt, Matthew T.; Wright, Steven A.; Maheswaran, Satheesh; Jones, John P.; Jarvis, Stephen A.

In this paper we present an alternative approach to the representation of simulation particles for unstructured electrostatic and electromagnetic PIC simulations. In our modified PIC algorithm we represent particles as having a smooth shape function limited by some specified finite radius, r0. A unique feature of our approach is the representation of this shape by surrounding simulation particles with a set of virtual particles with delta shape, with fixed offsets and weights derived from Gaussian quadrature rules and the value of r0. As the virtual particles are purely computational, they provide the additional benefit of increasing the arithmetic intensity of traditionally memory-bound particle kernels. The modified algorithm is implemented within Sandia National Laboratories' unstructured EMPIRE-PIC code, for electrostatic and electromagnetic simulations, using periodic boundary conditions. We show results for a representative set of benchmark problems, including electron orbit, a transverse electromagnetic wave propagating through a plasma, numerical heating, and a plasma slab expansion. Good error reduction across all of the chosen problems is achieved as the particles are made progressively smoother, with the optimal particle radius appearing to be problem-dependent.


AlCl3-Dosed Si(100)-2 × 1: Adsorbates, Chlorinated Al Chains, and Incorporated Al

Journal of Physical Chemistry C

Radue, Matthew S.; Baek, Sungha; Farzaneh, Azadeh; Dwyer, K.J.; Campbell, Quinn C.; Baczewski, Andrew D.; Bussmann, Ezra B.; Wang, George T.; Mo, Yifei; Misra, Shashank M.; Butera, R.E.

The adsorption of AlCl3 on Si(100) and the effect of annealing the AlCl3-dosed substrate were studied to reveal key surface processes for the development of atomic-precision, acceptor-doping techniques. This investigation was performed via scanning tunneling microscopy (STM), X-ray photoelectron spectroscopy (XPS), and density functional theory (DFT) calculations. At room temperature, AlCl3 readily adsorbed to the Si substrate dimers and dissociated to form a variety of species. Annealing the AlCl3-dosed substrate at temperatures below 450 °C produced unique chlorinated aluminum chains (CACs) elongated along the Si(100) dimer row direction. An atomic model for the chains is proposed with supporting DFT calculations. Al was incorporated into the Si substrate upon annealing at 450 °C and above, and Cl desorption was observed for temperatures beyond 450 °C. Al-incorporated samples were encapsulated in Si and characterized by secondary ion mass spectrometry (SIMS) depth profiling to quantify the Al atom concentration, which was found to be in excess of 10²⁰ cm⁻³ across a ∼2.7 nm-thick δ-doped region. The Al concentration achieved here and the processing parameters utilized promote AlCl3 as a viable gaseous precursor for novel acceptor-doped Si materials and devices for quantum computing.


Simplifying and Visualizing the Ontology of Systems Engineering Models

Murdock, Jaimie M.; Carroll, Edward R.

The credibility of an engineering model is of critical importance in large-scale projects. How concerned should an engineer be when reusing someone else's model when they may not know the author or be familiar with the tools that were used to create it? In this report, the authors advance engineers' capabilities for assessing models through examination of the underlying semantic structure of a model--the ontology. This ontology defines the objects in a model, types of objects, and relationships between them. In this study, two advances in ontology simplification and visualization are discussed and are demonstrated on two systems engineering models. These advances are critical steps toward enabling engineering models to interoperate, as well as assessing models for credibility. For example, results of this research show an 80% reduction in file size and representation size, dramatically improving the throughput of graph algorithms applied to the analysis of these models. Finally, four future problems are outlined in ontology research toward establishing credible models--ontology discovery, ontology matching, ontology alignment, and model assessment.


TEMPI: An Interposed MPI Library with Canonical Representation of MPI Datatypes [Poster]

Pearson, Carl W.; Wu, Kun W.; Chung, I-Hsin C.; Xiong, Jinjun X.; Hwu, Wen-mei H.

TEMPI provides a transparent non-contiguous data-handling layer compatible with various MPIs. MPI Datatypes are a powerful abstraction for allowing an MPI implementation to operate on non-contiguous data. CUDA-aware MPI implementations must also manage transfer of such data between the host system and GPU. The non-unique and recursive nature of MPI datatypes means that providing fast GPU handling is a challenge. The same non-contiguous pattern may be described in a variety of ways, all of which should be treated equivalently by an implementation. This work introduces a novel technique to do this for strided datatypes. Methods for transferring non-contiguous data between the CPU and GPU depend on the properties of the data layout. This work shows that a simple performance model can accurately select the fastest method. Unfortunately, the combination of MPI software and system hardware available may not provide sufficient performance. The contributions of this work are deployed on OLCF Summit through an interposer library which does not require privileged access to the system to use.


Experimental Evaluation of Multiprecision Strategies for GMRES on GPUs

2021 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2021 - In conjunction with IEEE IPDPS 2021

Loe, Jennifer A.; Glusa, Christian A.; Yamazaki, Ichitaro Y.; Boman, Erik G.; Rajamanickam, Sivasankaran R.

Support for lower precision computation is becoming more common in accelerator hardware due to lower power usage, reduced data movement and increased computational performance. However, computational science and engineering (CSE) problems require double precision accuracy in several domains. This conflict between hardware trends and application needs has resulted in a need for multiprecision strategies at the linear algebra algorithms level if we want to exploit the hardware to its full potential while meeting the accuracy requirements. In this paper, we focus on preconditioned sparse iterative linear solvers, a key kernel in several CSE applications. We present a study of multiprecision strategies for accelerating this kernel on GPUs. We seek the best methods for incorporating multiple precisions into the GMRES linear solver; these include iterative refinement and parallelizable preconditioners. Our work presents strategies to determine when multiprecision GMRES will be effective and to choose parameters for a multiprecision iterative refinement solver to achieve better performance. We use an implementation that is based on the Trilinos library and employs Kokkos Kernels for performance portability of linear algebra kernels. Performance results demonstrate the promise of multiprecision approaches and demonstrate even further improvements are possible by optimizing low-level kernels.


Quantum foundations of classical reversible computing

Entropy

Frank, Michael P.; Shukla, Karpur

The reversible computation paradigm aims to provide a new foundation for general classical digital computing that is capable of circumventing the thermodynamic limits to the energy efficiency of the conventional, non-reversible digital paradigm. However, to date, the essential rationale for, and analysis of, classical reversible computing (RC) has not yet been expressed in terms that leverage the modern formal methods of non-equilibrium quantum thermodynamics (NEQT). In this paper, we begin developing an NEQT-based foundation for the physics of reversible computing. We use the framework of Gorini-Kossakowski-Sudarshan-Lindblad dynamics (a.k.a. Lindbladians) with multiple asymptotic states, incorporating recent results from resource theory, full counting statistics and stochastic thermodynamics. Important conclusions include that, as expected: (1) Landauer’s Principle indeed sets a strict lower bound on entropy generation in traditional non-reversible architectures for deterministic computing machines when we account for the loss of correlations; and (2) implementations of the alternative reversible computation paradigm can potentially avoid such losses, and thereby circumvent the Landauer limit, potentially allowing the efficiency of future digital computing technologies to continue improving indefinitely. We also outline a research plan for identifying the fundamental minimum energy dissipation of reversible computing machines as a function of speed.


Peridynamic model for microballistic perforation of multilayer graphene

Theoretical and Applied Fracture Mechanics

Silling, Stewart A.; Fermen-Coker, Müge

The peridynamic theory of solid mechanics is applied to the continuum modeling of the impact of small, high-velocity silica spheres on multilayer graphene targets. The model treats the laminate as a brittle elastic membrane. The material model includes separate failure criteria for the initial rupture of the membrane and for propagating cracks. Material variability is incorporated by assigning random variations in elastic properties within Voronoi cells. The computational model is shown to reproduce the primary aspects of the response observed in experiments, including the growth of a family of radial cracks from the point of impact.
