Publications

Results 4601–4800 of 9,998

Complex Systems Models and Their Applications: Towards a New Science of Verification, Validation & Uncertainty Quantification

Tsao, Jeffrey Y.; Trucano, Timothy G.; Kleban, S.D.; Naugle, Asmeret B.; Verzi, Stephen J.; Swiler, Laura P.; Johnson, Curtis M.; Smith, Mark A.; Flanagan, Tatiana P.; Vugrin, Eric D.; Gabert, Kasimir G.; Lave, Matthew S.; Chen, Wei; Delaurentis, Daniel; Hubler, Alfred; Oberkampf, Bill

This report contains the written footprint of a Sandia-hosted workshop held in Albuquerque, New Mexico, June 22-23, 2016 on “Complex Systems Models and Their Applications: Towards a New Science of Verification, Validation and Uncertainty Quantification,” as well as of pre-work that fed into the workshop. The workshop’s intent was to explore and begin articulating research opportunities at the intersection between two important Sandia communities: the complex systems (CS) modeling community, and the verification, validation and uncertainty quantification (VVUQ) community. The overarching research opportunity (and challenge) that we ultimately hope to address is: how can we quantify the credibility of knowledge gained from complex systems models, knowledge that is often incomplete and interim, but will nonetheless be used, sometimes in real-time, by decision makers?

More Details

Improved Solver Settings for 3D Exploding Wire Simulations in ALEGRA

Doney, Robert; Siefert, Christopher S.; Niederhaus, John H.

We are interested in simulating a variety of problems in 3 dimensions (3D) featuring large electric currents. While 2D simulations have been quite informative, cylindrical symmetry may interfere with a problem’s relevant physics. Specifically, all objects in the domain behave as if they are extruded 360°—turning particles into hoops. In dealing with electrical current, this can have serious ramifications on the current pathways. In 3D (r, φ, z) currents can adjust their pathways anywhere along those 360 degrees given the right conditions; however, in 2D (r, z) those pathways can be completely choked off because an insulating hoop, rather than a particle, is present.

More Details

Curve Reconstruction with Many Fewer Samples

Computer Graphics Forum

Ohrhallinger, S.; Mitchell, Scott A.; Wimmer, M.

We consider the problem of sampling points from a collection of smooth curves in the plane, such that the Crust family of proximity-based reconstruction algorithms can rebuild the curves. Reconstruction requires a dense sampling of local features, i.e., parts of the curve that are close in Euclidean distance but far apart geodesically. We show that ε < 0.47-sampling is sufficient for our proposed HNN-Crust variant, improving upon the state-of-the-art requirement of ε < 1/3-sampling. Thus we may reconstruct curves with many fewer samples. We also present a new sampling scheme that reduces the required density even further than ε < 0.47-sampling. We achieve this by better controlling the spacing between geodesically consecutive points. Our novel sampling condition is based on the reach, the minimum local feature size along intervals between samples. This is mathematically closer to the reconstruction density requirements, particularly near sharp-angled features. We prove lower and upper bounds on reach ρ-sampling density in terms of lfs ε-sampling and demonstrate that we typically reduce the required number of samples for reconstruction by more than half.
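
For reference, the local-feature-size sampling condition that the Crust family relies on can be stated as follows; this is the standard definition from the curve-reconstruction literature, not text taken from the paper:

```latex
% S is an epsilon-sample of a smooth curve C if every curve point has a sample
% within epsilon times its local feature size (distance to the medial axis):
\forall p \in C:\quad \min_{s \in S}\|p - s\| \;\le\; \varepsilon\,\mathrm{lfs}(p),
\qquad \mathrm{lfs}(p) = \operatorname{dist}\big(p,\ \mathrm{MedialAxis}(C)\big).
```

The reach-based ρ-sampling condition proposed in the paper instead constrains the spacing of geodesically consecutive samples relative to the minimum local feature size over the interval between them, which is what allows reconstruction from fewer samples.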

More Details

Lagrangian Material Tracers (LMT) for Simulating Material Damage in ALEGRA

Sanchez, Jason J.; Luchini, Christopher B.; Strack, Otto E.

A method for providing non-diffuse transport of material quantities in arbitrary Lagrangian-Eulerian (ALE) dynamic solid mechanics computations is presented. ALE computations are highly desirable for simulating dynamic problems that incorporate multiple materials and large deformations. Despite the advantages of using ALE for such problems, the method is associated with diffusion of material quantities due to the advection transport step of the computational cycle. This drawback poses great difficulty for applications of material failure for which discrete features are important, but are smeared out as a result of the diffusive advection operation. The focus of this work is an ALE method that incorporates transport of variables on discrete, massless points that move with the velocity field, referred to as Lagrangian material tracers (LMT), and consequently prevents diffusion of certain material quantities of interest. A detailed description of the algorithm is provided along with discussion of its computational aspects. Simulation results include a simple proof of concept, verification using a manufactured solution, and fragmentation of a uniformly loaded thin ring that clearly demonstrates the improvement offered by the ALE LMT method.

More Details

A comparison of high-level programming choices for incomplete sparse factorization across different architectures

Proceedings - 2016 IEEE 30th International Parallel and Distributed Processing Symposium, IPDPS 2016

Booth, Joshua D.; Kim, Kyungjoo K.; Rajamanickam, Sivasankaran R.

All many-core systems require fine-grained shared memory parallelism; however, the most efficient way to extract such parallelism is far from trivial. Fine-grained parallel algorithms face various performance trade-offs related to tasking, accesses to global data structures, and use of shared cache. While programming models provide high-level abstractions, such as data and task parallelism, algorithmic choices still remain open on how to best implement irregular algorithms, such as sparse factorizations, while taking into account the trade-offs mentioned above. In this paper, we compare these performance trade-offs for task and data parallelism on different hardware architectures such as Intel Sandy Bridge, Intel Xeon Phi, and IBM Power8. We do this by comparing the scaling of a new task-parallel incomplete sparse Cholesky factorization called Tacho and a new data-parallel incomplete sparse LU factorization called Basker. Both solvers utilize the Kokkos programming model and were developed within the ShyLU package of Trilinos. Using these two codes we demonstrate how high-level programming changes affect performance and overhead costs on multiple multi/many-core systems. We find that Kokkos is able to provide comparable performance with both parallel-for and task/futures on traditional x86 multicores. However, the choice of which high-level abstraction to use on many-core systems depends on both the architectures and input matrices.
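
As a loose illustration of the two styles compared here: in the data-parallel style, independent work items identified by a symbolic analysis are dispatched level by level with Kokkos::parallel_for, while the task-parallel style (as in Tacho) instead expresses each block as a task with explicit dependencies. The sketch below shows only the data-parallel pattern; the toy workload and all names are ours, not taken from either solver.

```cpp
// Sketch: data-parallel level scheduling with Kokkos (illustrative only).
// Rows inside one level are independent, so each level is a parallel_for;
// levels themselves run in order. A task-parallel variant would instead
// spawn one task per block with dependencies and let the scheduler order work.
#include <Kokkos_Core.hpp>
#include <vector>

int main(int argc, char* argv[]) {
  Kokkos::initialize(argc, argv);
  {
    const int n = 1000;
    Kokkos::View<double*> x("x", n);

    // Hypothetical level sets: levels[l] holds the row indices that may be
    // processed concurrently once all earlier levels are done.
    std::vector<std::vector<int>> levels = {/* filled by a symbolic analysis */};

    for (const auto& level : levels) {                  // sequential over levels
      Kokkos::View<int*> rows("rows", level.size());
      auto h_rows = Kokkos::create_mirror_view(rows);
      for (std::size_t i = 0; i < level.size(); ++i) h_rows(i) = level[i];
      Kokkos::deep_copy(rows, h_rows);

      Kokkos::parallel_for("process_level", rows.extent(0),
        KOKKOS_LAMBDA(const int i) {
          const int row = rows(i);
          x(row) = 2.0 * row;                           // stand-in for real row work
        });
    }
    Kokkos::fence();
  }
  Kokkos::finalize();
  return 0;
}
```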

More Details

NiMC: Characterizing and Eliminating Network-Induced Memory Contention

Proceedings - 2016 IEEE 30th International Parallel and Distributed Processing Symposium, IPDPS 2016

Groves, Taylor G.; Grant, Ryan E.; Arnold, Dorian

Remote Direct Memory Access (RDMA) is expected to be an integral communication mechanism for future exascale systems, enabling asynchronous data transfers so that applications may fully utilize all CPU resources while simultaneously sharing data amongst remote nodes. We examine network-induced memory contention (NiMC), the interaction between RDMA and the memory subsystem when applications and out-of-band services compete for memory resources, and NiMC's resulting impact on application-level performance. For a range of hardware technologies and HPC workloads, we quantify NiMC and show that its impact grows with scale, resulting in up to 3X performance degradation at scales as small as 8K processes, even in applications that have previously been shown to be resilient to noise. We also evaluate three potential techniques to reduce NiMC's performance impact, namely hardware offloading, core reservation, and software-based network throttling. While all three of these solutions show promise, we provide guidelines that help select the best solution for a given environment.

More Details

Overcoming challenges in scalable power monitoring with the power API

Proceedings - 2016 IEEE 30th International Parallel and Distributed Processing Symposium, IPDPS 2016

Grant, Ryan E.; Levenhagen, Michael J.; Olivier, Stephen L.; DeBonis, David D.; Laros, James H.

Power will be a first-class operating constraint for Exascale computing. In order to manage power consumption of systems, measurement and control methods need to be developed. While several approaches have been developed by hardware manufacturers, they are vendor-specific and in some cases implementation-specific interfaces. Integrating all of the individual device level measurement and control functionality in a single system is a difficult task that requires system specific code. Sandia National Laboratories, in collaboration with many industry and academic partners, has developed a Power API specification, consisting of a broad range of interfaces spanning from low-level hardware to platform management and accounting. In order for many of the interfaces to be useful, especially at large scale, measurement data must be collected and control directives must be distributed in a scalable manner. This paper details the challenges of providing large scale power measurement and control and the scalable collection and control distribution architecture that is being integrated into the Power API reference implementation.

More Details

Basker: A threaded sparse LU factorization utilizing hierarchical parallelism and data layouts

Proceedings - 2016 IEEE 30th International Parallel and Distributed Processing Symposium, IPDPS 2016

Booth, Joshua D.; Rajamanickam, Sivasankaran R.; Thornquist, Heidi K.

Scalable sparse LU factorization is critical for efficient numerical simulation of circuits and electrical power grids. In this work, we present a new scalable sparse direct solver called Basker. Basker introduces a new algorithm to parallelize the Gilbert-Peierls algorithm for sparse LU factorization. As architectures evolve, there exists a need for algorithms that are hierarchical in nature to match the hierarchy in thread teams, individual threads, and vector level parallelism. Basker is designed to map well to this hierarchy in architectures. There is also a need for data layouts to match multiple levels of hierarchy in memory. Basker uses a two-dimensional hierarchical structure of sparse matrices that maps to the hierarchy in the memory architectures and to the hierarchy in parallelism. We present performance evaluations of Basker on the Intel SandyBridge and Xeon Phi platforms using circuit and power grid matrices taken from the University of Florida sparse matrix collection and from Xyce circuit simulations. Basker achieves a geometric mean speedup of 5.91× on CPU (16 cores) and 7.4× on Xeon Phi (32 cores) relative to KLU. Basker outperforms Intel MKL Pardiso (PMKL) by as much as 30× on CPU (16 cores) and 7.5× on Xeon Phi (32 cores) for low fill-in circuit matrices. Furthermore, Basker provides 5.4× speedup on a challenging matrix sequence taken from an actual Xyce simulation.
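
For readers unfamiliar with the serial algorithm being parallelized, the following is a dense, no-pivoting analogue of left-looking (Gilbert-Peierls-style) column LU; the actual algorithm performs the same per-column triangular solve on sparse data, discovering each column's nonzero pattern with a depth-first search. This sketch is ours and is only meant to show the column-by-column structure.

```cpp
// Dense analogue of left-looking column LU (no pivoting, for clarity only).
#include <vector>

void left_looking_lu(const std::vector<std::vector<double>>& A, int n,
                     std::vector<std::vector<double>>& L,
                     std::vector<std::vector<double>>& U) {
  L.assign(n, std::vector<double>(n, 0.0));
  U.assign(n, std::vector<double>(n, 0.0));
  for (int j = 0; j < n; ++j) {
    std::vector<double> x(n);
    for (int i = 0; i < n; ++i) x[i] = A[i][j];       // x = A(:, j)
    // Forward-substitute against the columns of L computed so far (unit diagonal).
    // In the sparse case only the rows reachable from the column's nonzeros are touched.
    for (int k = 0; k < j; ++k)
      for (int i = k + 1; i < n; ++i) x[i] -= L[i][k] * x[k];
    for (int k = 0; k <= j; ++k) U[k][j] = x[k];      // upper part becomes U(:, j)
    L[j][j] = 1.0;
    for (int i = j + 1; i < n; ++i) L[i][j] = x[i] / x[j];  // scale lower part
  }
}
```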

More Details

Parallel Graph Coloring for Manycore Architectures

Proceedings - 2016 IEEE 30th International Parallel and Distributed Processing Symposium, IPDPS 2016

Deveci, Mehmet D.; Boman, Erik G.; Devine, Karen D.; Rajamanickam, Sivasankaran R.

Graph algorithms are challenging to parallelize on manycore architectures due to complex data dependencies and irregular memory access. We consider the well-studied problem of coloring the vertices of a graph. In many applications it is important to compute a coloring with few colors in near-linear time. In parallel, the optimistic (speculative) coloring method by Gebremedhin and Manne is the preferred approach, but it needs to be modified for manycore architectures. We discuss a range of implementation issues for this vertex-based optimistic approach. We also propose a novel edge-based optimistic approach that has more parallelism and is better suited to GPUs. We study the performance empirically on two architectures (Xeon Phi and GPU) and across many data sets (from finite element problems to social networks). Our implementation uses the Kokkos library, so it is portable across platforms. We show that on GPUs, we significantly reduce the number of colors (geometric mean 4X, but up to 48X) as compared to the widely used cuSPARSE library. In addition, our edge-based algorithm is 1.5 times faster on average than cuSPARSE, with speedups up to 139X on a circuit problem. We also show the effect of the coloring on a conjugate gradient solver using the multi-colored symmetric Gauss-Seidel method as a preconditioner; the higher coloring quality found by the proposed methods reduces the overall solve time by up to 33% compared to cuSPARSE.
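
The vertex-based optimistic scheme of Gebremedhin and Manne that the paper starts from can be summarized as: color all vertices in parallel using the smallest color not seen on their neighbors, then detect conflicts (adjacent vertices that ended up with the same color) and recolor only those, iterating until no conflicts remain. The code below is our own minimal OpenMP illustration of that idea, not the Kokkos implementation described in the paper.

```cpp
// Speculative (optimistic) greedy coloring: color in parallel, then fix conflicts.
#include <omp.h>
#include <vector>

// graph[v] lists the neighbors of vertex v; returns a color per vertex.
std::vector<int> speculative_coloring(const std::vector<std::vector<int>>& graph) {
  const int n = static_cast<int>(graph.size());
  std::vector<int> color(n, -1);
  std::vector<int> worklist(n);
  for (int v = 0; v < n; ++v) worklist[v] = v;

  while (!worklist.empty()) {
    // Phase 1: tentatively color every worklist vertex in parallel. Reads of
    // neighbor colors may race with concurrent writes; any resulting conflicts
    // are caught and repaired in phase 2.
    #pragma omp parallel for
    for (int i = 0; i < static_cast<int>(worklist.size()); ++i) {
      const int v = worklist[i];
      std::vector<char> used(graph[v].size() + 2, 0);
      for (int u : graph[v])
        if (color[u] >= 0 && color[u] < static_cast<int>(used.size())) used[color[u]] = 1;
      int c = 0;
      while (used[c]) ++c;                 // smallest color not used by a neighbor
      color[v] = c;
    }
    // Phase 2: detect conflicts; the lower-numbered endpoint keeps its color.
    std::vector<int> conflicts;
    #pragma omp parallel
    {
      std::vector<int> local;
      #pragma omp for nowait
      for (int i = 0; i < static_cast<int>(worklist.size()); ++i) {
        const int v = worklist[i];
        for (int u : graph[v])
          if (color[u] == color[v] && u < v) { local.push_back(v); break; }
      }
      #pragma omp critical
      conflicts.insert(conflicts.end(), local.begin(), local.end());
    }
    worklist.swap(conflicts);              // recolor only the conflicted vertices
  }
  return color;
}
```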

More Details

Increasing Molecular Dynamics Simulation Rates with an 8-Fold Increase in Electrical Power Efficiency

International Conference for High Performance Computing, Networking, Storage and Analysis, SC

Brown, W.M.; Semin, Andrey; Hebenstreit, Michael; Khvostov, Sergey; Raman, Karthik; Plimpton, Steven J.

Electrical power efficiency is a primary concern in designing modern HPC systems. Common strategies to improve CPU power efficiency rely on increased parallelism within a processor, enabled both by an increase in the vector capabilities within the core and by the number of cores within a processor. Although many-core processors have been available for some time, achieving power-efficient performance has been challenging due to the offload model. Here, we evaluate performance of the molecular dynamics code LAMMPS on two new Intel® processors including the second-generation many-core Intel® Xeon Phi™ processor that is available as a bootable CPU. We describe our approach to measure power consumption out-of-band and the software optimizations necessary to achieve energy efficiency. We analyze benefits from Intel® Advanced Vector Extensions 512 instructions and demonstrate increased simulation rates with over 9X the CPU+DRAM power efficiency when compared to the unoptimized code on previous-generation processors.

More Details

Improving Application Resilience to Memory Errors with Lightweight Compression

International Conference for High Performance Computing, Networking, Storage and Analysis, SC

Levy, Scott L.; Ferreira, Kurt B.; Bridges, Patrick G.

In next-generation extreme-scale systems, application performance will be limited by memory performance characteristics. The first exascale system is projected to contain many petabytes of memory. In addition to the sheer volume of the memory required, device trends, such as shrinking feature sizes and reduced supply voltages, have the potential to increase the frequency of memory errors. As a result, resilience to memory errors is a key challenge. In this paper, we evaluate the viability of using memory compression to repair detectable uncorrectable errors (DUEs) in memory. We develop a software library, evaluate its performance and demonstrate that it is able to significantly compress memory of HPC applications. Further, we show that exploiting compressed memory pages to correct memory errors can significantly improve application performance on next-generation systems.

More Details

Stability of Peridynamic Correspondence Material Models and Their Particle Discretizations

Silling, Stewart A.

Peridynamic correspondence material models provide a way to combine a material model from the local theory with the inherent capabilities of peridynamics to model long-range forces and fracture. However, correspondence models in a typical particle discretization suffer from zero-energy mode instability. These instabilities are shown here to be an aspect of material stability. A stability condition is derived for state-based materials starting from the requirement of potential energy minimization. It is shown that all correspondence materials fail this stability condition due to zero-energy deformation modes of the family. To eliminate these modes, a term is added to the correspondence strain energy density that resists deviations from a uniform deformation. The resulting material model satisfies the stability condition while effectively leaving the stress tensor unchanged. Computational examples demonstrate the effectiveness of the modified material model in avoiding zero-energy mode instability in a peridynamic particle code.
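
For context, a correspondence model evaluates a classical constitutive law through a nonlocal approximation of the deformation gradient built from the family; the standard construction from the peridynamic literature (not restated in the abstract above) is:

```latex
% Shape tensor K and nonlocal deformation gradient F at a point x, where H_x is the
% family, \omega an influence function, and \underline{Y}\langle\xi\rangle the deformed bond:
K(x) = \int_{H_x} \omega(|\xi|)\,\xi \otimes \xi \; dV_\xi, \qquad
F(x) = \left[\int_{H_x} \omega(|\xi|)\,\underline{Y}\langle\xi\rangle \otimes \xi \; dV_\xi\right] K(x)^{-1}.
```

The zero-energy modes referred to above are nonuniform deformations of the family that leave this averaged F, and hence the strain energy, unchanged, which is why the stabilizing term penalizes deviations from a uniform deformation.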

More Details

Enabling fast, stable and accurate peridynamic computations using multi-time-step integration

Computer Methods in Applied Mechanics and Engineering

Lindsay, Payton L.; Parks, Michael L.; Prakash, A.

Peridynamics is a nonlocal extension of classical continuum mechanics that is well-suited for solving problems with discontinuities such as cracks. This paper extends the peridynamic formulation to decompose a problem domain into a number of smaller overlapping subdomains and to enable the use of different time steps in different subdomains. This approach allows regions of interest to be isolated and solved at a small time step for increased accuracy while the rest of the problem domain can be solved at a larger time step for greater computational efficiency. Performance of the proposed method in terms of stability, accuracy, and computational cost is examined and several numerical examples are presented to corroborate the findings.
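
A minimal sketch of the subcycling structure described above, with placeholder types and helper names of our own (advance, exchange_interface) standing in for the paper's actual operators:

```cpp
// Multi-time-step integration sketch: the coarse subdomain takes one step of size DT
// while the refined subdomain of interest takes m substeps of size DT/m.
#include <vector>

struct Subdomain {
  std::vector<double> u, v, a;   // displacements, velocities, accelerations
};

// One explicit time step of size dt (placeholder body).
void advance(Subdomain& d, double dt) { (void)d; (void)dt; /* e.g., velocity Verlet */ }

// Impose compatibility on the overlap; alpha in [0,1] selects the point in time
// between the coarse solution at t_N (alpha = 0) and t_{N+1} (alpha = 1).
void exchange_interface(const Subdomain& coarse, Subdomain& fine, double alpha) {
  (void)coarse; (void)fine; (void)alpha;
}

void multi_time_step(Subdomain& coarse, Subdomain& fine,
                     double DT, int m, int n_coarse_steps) {
  const double dt = DT / m;                     // m fine substeps per coarse step
  for (int N = 0; N < n_coarse_steps; ++N) {
    advance(coarse, DT);                        // one large step over the cheap region
    for (int s = 1; s <= m; ++s) {
      exchange_interface(coarse, fine, static_cast<double>(s) / m);
      advance(fine, dt);                        // small steps over the region of interest
    }
  }
}
```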

More Details

Anti-persistence on persistent storage: History-independent sparse tables and dictionaries

Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems

Bender, Michael A.; Berry, Jonathan W.; Johnson, Rob; Kroeger, Thomas M.; Mccauley, Samuel; Phillips, Cynthia A.; Simon, Bertrand; Singh, Shikha; Zage, David J.

We present history-independent alternatives to a B-tree, the primary indexing data structure used in databases. A data structure is history independent (HI) if it is impossible to deduce any information by examining the bit representation of the data structure that is not already available through the API. We show how to build a history-independent cache-oblivious B-tree and a history-independent external-memory skip list. One of the main contributions is a data structure we build on the way: a history-independent packed-memory array (PMA). The PMA supports efficient range queries, one of the most important operations for answering database queries. Our HI PMA matches the asymptotic bounds of prior non-HI packed-memory arrays and sparse tables. Specifically, a PMA maintains a dynamic set of elements in sorted order in a linear-sized array. Inserts and deletes take an amortized O(log² N) element moves with high probability. Simple experiments with our implementation of HI PMAs corroborate our theoretical analysis. Comparisons to regular PMAs give preliminary indications that the practical cost of adding history independence is not too large. Our HI cache-oblivious B-tree bounds match those of prior non-HI cache-oblivious B-trees. Searches take O(log_B N) I/Os; inserts and deletes take O(log² N / B + log_B N) amortized I/Os with high probability; and range queries returning k elements take O(log_B N + k/B) I/Os. Our HI external-memory skip list achieves optimal bounds with high probability, analogous to in-memory skip lists: O(log_B N) I/Os for point queries and amortized O(log_B N) I/Os for inserts/deletes. Range queries returning k elements run in O(log_B N + k/B) I/Os. In contrast, the best possible high-probability bound for inserting into the folklore B-skip list, which promotes elements with probability 1/B, is just Θ(log N) I/Os. This is no better than the bounds one gets from running an in-memory skip list in external memory.

More Details

Scalable implicit incompressible resistive MHD with stabilized FE and fully-coupled Newton-Krylov-AMG

Computer Methods in Applied Mechanics and Engineering

Shadid, John N.; Pawlowski, Roger P.; Cyr, Eric C.; Tuminaro, Raymond S.; Chacon, L.; Weber, Paula D.

The computational solution of the governing balance equations for mass, momentum, heat transfer and magnetic induction for resistive magnetohydrodynamics (MHD) systems can be extremely challenging. These difficulties arise from both the strong nonlinear, nonsymmetric coupling of fluid and electromagnetic phenomena, as well as the significant range of time- and length-scales that the interactions of these physical mechanisms produce. This paper explores the development of a scalable, fully-implicit stabilized unstructured finite element (FE) capability for 3D incompressible resistive MHD. The discussion considers the development of a stabilized FE formulation in the context of the variational multiscale (VMS) method, and describes the scalable implicit time integration and direct-to-steady-state solution capability. The nonlinear solver strategy employs Newton-Krylov methods, which are preconditioned using fully-coupled algebraic multilevel preconditioners. These preconditioners are shown to enable a robust, scalable and efficient solution approach for the large-scale sparse linear systems generated by the Newton linearization. Verification results demonstrate the expected order-of-accuracy for the stabilized FE discretization. The approach is tested on a variety of prototype problems that include MHD duct flows, an unstable hydromagnetic Kelvin-Helmholtz shear layer, and a 3D island coalescence problem used to model magnetic reconnection. Initial results that explore the scaling of the solution methods are also presented on up to 128K processors for problems with up to 1.8B unknowns on a Cray XK7.
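
For orientation, one common statement of the incompressible resistive MHD system being discretized (a standard form; the paper additionally carries a heat-transfer equation) is:

```latex
% Incompressible resistive MHD: velocity u, pressure p, magnetic field B,
% viscosity \mu, resistivity \eta, permeability \mu_0.
\rho\big(\partial_t \mathbf{u} + \mathbf{u}\cdot\nabla\mathbf{u}\big)
   = -\nabla p + \mu\,\nabla^2\mathbf{u} + \tfrac{1}{\mu_0}\,(\nabla\times\mathbf{B})\times\mathbf{B},
 \qquad \nabla\cdot\mathbf{u} = 0,
\\
\partial_t \mathbf{B} = \nabla\times(\mathbf{u}\times\mathbf{B}) + \eta\,\nabla^2\mathbf{B},
 \qquad \nabla\cdot\mathbf{B} = 0.
```

The strong nonsymmetric coupling mentioned above enters through the Lorentz force (∇×B)×B in the momentum equation and the advective term ∇×(u×B) in the induction equation.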

More Details

Convergence studies in meshfree peridynamic simulations

Computers and Mathematics with Applications

Seleson, Pablo; Littlewood, David J.

Meshfree methods are commonly applied to discretize peridynamic models, particularly in numerical simulations of engineering problems. Such methods discretize peridynamic bodies using a set of nodes with characteristic volume, leading to particle-based descriptions of systems. In this paper, we perform convergence studies of static peridynamic problems. We show that commonly used meshfree methods in peridynamics suffer from accuracy and convergence issues, due to a rough approximation of the contribution of nodes near the boundary of the neighborhood of a given node to numerical integrations. We propose two methods to improve meshfree peridynamic simulations. The first method uses accurate computations of volumes of intersections between neighbor cells and the neighborhood of a given node, referred to as partial volumes. The second method employs smooth influence functions with a finite support within peridynamic kernels. Numerical results demonstrate great improvements in accuracy and convergence of peridynamic numerical solutions when using the proposed methods.
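
In quadrature terms, the first proposed method replaces the all-or-nothing cell volumes of the standard meshfree discretization with partial volumes; schematically, in our notation:

```latex
% One-point-per-cell quadrature of a peridynamic integral at node x_i, without and
% with the partial-volume correction; \alpha_{ij} is the fraction of cell \Omega_j
% that lies inside the neighborhood H_i of x_i:
\int_{H_i} f(x_i, x')\,dV' \;\approx\; \sum_{j} f(x_i, x_j)\,V_j
\quad\longrightarrow\quad
\sum_{j} f(x_i, x_j)\,\alpha_{ij}V_j,
\qquad \alpha_{ij} = \frac{|\Omega_j \cap H_i|}{|\Omega_j|}.
```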

More Details

Optimization-based additive decomposition of weakly coercive problems with applications

Computers and Mathematics with Applications

Bochev, Pavel B.; Ridzal, Denis R.

We present an abstract mathematical framework for an optimization-based additive decomposition of a large class of variational problems into a collection of concurrent subproblems. The framework replaces a given monolithic problem by an equivalent constrained optimization formulation in which the subproblems define the optimization constraints and the objective is to minimize the mismatch between their solutions. The significance of this reformulation stems from the fact that one can solve the resulting optimality system by an iterative process involving only solutions of the subproblems. Consequently, assuming that stable numerical methods and efficient solvers are available for every subproblem, our reformulation leads to robust and efficient numerical algorithms for a given monolithic problem by breaking it into subproblems that can be handled more easily. An application of the framework to the Oseen equations illustrates its potential.
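
Schematically, and in our notation rather than the paper's, the reformulation has the shape

```latex
% Additive decomposition into K subproblems: each state u_k must satisfy its own
% subproblem (the constraints), and the objective penalizes the mismatch between
% subproblem solutions where they should agree; \theta collects the coupling controls.
\min_{\theta}\ \sum_{k<l} \tfrac{1}{2}\,\big\| u_k(\theta) - u_l(\theta) \big\|^2
\quad\text{subject to}\quad A_k\,u_k(\theta) = f_k(\theta),\qquad k = 1,\dots,K.
```

Because the optimality system can be driven by repeatedly solving only the individual subproblems, existing solvers for each piece are reused unchanged, which is the point emphasized above.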

More Details

A coupling strategy for nonlocal and local diffusion models with mixed volume constraints and boundary conditions

Computers and Mathematics with Applications

D'Elia, Marta D.; Perego, Mauro P.; Bochev, Pavel B.; Littlewood, David J.

We develop and analyze an optimization-based method for the coupling of nonlocal and local diffusion problems with mixed volume constraints and boundary conditions. The approach formulates the coupling as a control problem where the states are the solutions of the nonlocal and local equations, the objective is to minimize their mismatch on the overlap of the nonlocal and local domains, and the controls are virtual volume constraints and boundary conditions. When some assumptions on the kernel functions hold, we prove that the resulting optimization problem is well-posed and discuss its implementation using Sandia's agile software components toolkit. The latter provides the groundwork for the development of engineering analysis tools, while numerical results for nonlocal diffusion in three-dimensions illustrate key properties of the optimization-based coupling method.
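
Written out in the same schematic spirit (our shorthand, not the paper's notation), the control problem described above is

```latex
% u_n: nonlocal state with virtual volume constraint g_n; u_l: local state with
% virtual boundary data g_l; the mismatch is minimized on the overlap \Omega_o.
\min_{g_n,\,g_l}\ \tfrac{1}{2}\int_{\Omega_o}\big(u_n - u_l\big)^2\,dx
\quad\text{subject to}\quad
\mathcal{L}_{\delta}\, u_n = f_n \ \ \text{(nonlocal, volume constraint } g_n\text{)},\qquad
-\nabla\cdot(\kappa\nabla u_l) = f_l \ \ \text{(local, boundary data } g_l\text{)}.
```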

More Details

A cross-enclave composition mechanism for exascale system software

Proceedings of the 6th International Workshop on Runtime and Operating Systems for Supercomputers, ROSS 2016 - In conjunction with HPDC 2016

Evans, Noah; Laros, James H.; Kocoloski, Brian; Lange, John; Lang, Michael; Bridges, Patrick G.

As supercomputers move to exascale, the number of cores per node continues to increase, but the I/O bandwidth between nodes is increasing more slowly. This leads to computational power outstripping I/O bandwidth. This growth, in turn, encourages moving as much of an HPC workflow as possible onto the node in order to minimize data movement. One particular method of application composition, enclaves, co-locates different operating systems and runtimes on the same node, where they communicate by in situ communication mechanisms. In this work, we describe a mechanism for communicating between composed applications. We implement a mechanism using Copy-on-Write cooperating with XEMEM shared memory to provide consistent, implicitly unsynchronized communication across enclaves. We then evaluate this mechanism using a composed application and analytics between the Kitten Lightweight Kernel and Linux on top of the Hobbes Operating System and Runtime. These results show a 3% overhead compared to an application running in isolation, demonstrating the viability of this approach.

More Details

A spectral mimetic least-squares method for the Stokes equations with no-slip boundary condition

Computers and Mathematics with Applications

Gerritsma, Marc; Bochev, Pavel B.

Formulation of locally conservative least-squares finite element methods (LSFEMs) for the Stokes equations with the no-slip boundary condition has been a long-standing problem. Existing LSFEMs that yield exactly divergence-free velocities require non-standard boundary conditions (Bochev and Gunzburger, 2009 [3]), while methods that admit the no-slip condition satisfy the incompressibility equation only approximately (Bochev and Gunzburger, 2009 [4, Chapter 7]). Here we address this problem by proving a new non-standard stability bound for the velocity-vorticity-pressure Stokes system augmented with a no-slip boundary condition. This bound gives rise to a norm-equivalent least-squares functional in which the velocity can be approximated by div-conforming finite element spaces, thereby enabling locally conservative approximations of this variable. We also provide a practical realization of the new LSFEM using high-order spectral mimetic finite element spaces (Kreeft et al., 2011) and report several numerical tests, which confirm its mimetic properties.
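
For orientation, the velocity-vorticity-pressure form of the Stokes system in question and the residual least-squares functional built from it look as follows (a standard statement in our notation; the specific norms in which the residuals are measured are what the paper's new stability bound prescribes):

```latex
% First-order velocity-vorticity-pressure Stokes system with no-slip boundary:
\nabla\times\boldsymbol{\omega} + \nabla p = \mathbf{f},\qquad
\boldsymbol{\omega} - \nabla\times\mathbf{u} = 0,\qquad
\nabla\cdot\mathbf{u} = 0 \ \ \text{in } \Omega,\qquad
\mathbf{u} = 0 \ \ \text{on } \partial\Omega,
\\
J(\mathbf{u},\boldsymbol{\omega},p;\mathbf{f}) =
\big\|\nabla\times\boldsymbol{\omega} + \nabla p - \mathbf{f}\big\|^2
+ \big\|\boldsymbol{\omega} - \nabla\times\mathbf{u}\big\|^2
+ \big\|\nabla\cdot\mathbf{u}\big\|^2 .
```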

More Details

Optical networks for high-performance computing: Promises and perils

5th IEEE Photonics Society Optical Interconnects Conference, OI 2016

Rodrigues, Arun

Optical networks hold great promise for improving the performance of supercomputers, yet they have always proven just out of reach. This talk will examine the potential of optical interconnects, barriers to adoption, and possible solutions from hardware/software co-design.

More Details

In Situ Methods, Infrastructures, and Applications on High Performance Computing Platforms

Computer Graphics Forum

Bauer, A.C.; Abbasi, H.; Ahrens, J.; Childs, H.; Geveci, B.; Klasky, S.; Moreland, Kenneth D.; O'Leary, P.; Vishwanath, V.; Whitlock, B.; Bethel, E.W.

The considerable interest in the high performance computing (HPC) community regarding analyzing and visualizing data without first writing to disk, i.e., in situ processing, is due to several factors. First is an I/O cost savings, where data is analyzed/visualized while being generated, without first storing to a filesystem. Second is the potential for increased accuracy, where fine temporal sampling of transient analysis might expose some complex behavior missed in coarse temporal sampling. Third is the ability to use all available resources, CPUs and accelerators, in the computation of analysis products. This STAR paper brings together researchers, developers and practitioners using in situ methods in extreme-scale HPC with the goal of presenting existing methods, infrastructures, and a range of computational science and engineering applications using in situ analysis and visualization.

More Details

Mini-Ckpts: Surviving OS failures in persistent memory

Proceedings of the International Conference on Supercomputing

Fiala, David; Mueller, Frank; Ferreira, Kurt B.; Engelmann, Christian

Concern is growing in the high-performance computing (HPC) community about the reliability of future extreme-scale systems. Current efforts have focused on application fault-tolerance rather than the operating system (OS), despite the fact that recent studies have suggested that failures in OS memory may be more likely. The OS is critical to a system's correct and efficient operation of the node and the processes it governs, and the parallel nature of HPC applications means any single node failure generally forces all processes of the application to terminate due to tight communication in HPC. Therefore, the OS itself must be capable of tolerating failures in a robust system. In this work, we introduce mini-ckpts, a framework which enables application survival despite the occurrence of a fatal OS failure or crash. Mini-ckpts achieves this tolerance by ensuring that the critical data describing a process is preserved in persistent memory prior to the failure. Following the failure, the OS is rejuvenated via a warm reboot and the application continues execution, effectively making the failure and restart transparent. The mini-ckpts rejuvenation and recovery process is measured to take between three and six seconds and has a failure-free overhead of 3-5% for a number of key HPC workloads. In contrast to current fault-tolerance methods, this work ensures that the operating and runtime systems can continue in the presence of faults. This is a much finer-grained and dynamic method of fault tolerance than the current coarse-grained application-centric methods. Handling faults at this level has the potential to greatly reduce overheads and enables mitigation of additional faults.

More Details

An examination of the impact of failure distribution on coordinated checkpoint/restart

FTXS 2016 - Proceedings of the ACM Workshop on Fault-Tolerance for HPC at Extreme Scale

Levy, Scott L.; Ferreira, Kurt B.

Fault tolerance is a key challenge to building the first exascale system. To understand the potential impacts of failures on next-generation systems, significant effort has been devoted to collecting, characterizing and analyzing failures on current systems. These studies require large volumes of data and complex analysis. Because the occurrence of failures in large-scale systems is unpredictable, failures are commonly modeled as a stochastic process. Failure data from current systems is examined in an attempt to identify the underlying probability distribution and its statistical properties. In this paper, we use modeling to examine the impact of failure distributions on the time-to-solution and the optimal checkpoint interval of applications that use coordinated checkpoint/restart. Using this approach, we show that as failures become more frequent, the failure distribution has a larger influence on application performance. We also show that as failure times are less tightly grouped (i.e., as the standard deviation increases) the underlying probability distribution has a greater impact on application performance. Finally, we show that computing the checkpoint interval based on the assumption that failures are exponentially distributed has a modest impact on application performance even when failures are drawn from a different distribution. Our work provides critical analysis and guidance to the process of analyzing failure data in the context of coordinated checkpoint/restart. Specifically, the data presented in this paper helps to distinguish cases where the failure distribution has a strong influence on application performance from those cases when the failure distribution has relatively little impact.
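
The exponential-failure checkpoint interval referred to in the last finding is typically computed with the classical first-order (Young-style) approximation; as a concrete reference point (a standard result, not a number taken from this paper):

```latex
% Optimal checkpoint interval for checkpoint cost \delta and mean time between
% failures M, to first order:
\tau_{\mathrm{opt}} \approx \sqrt{2\,\delta M}.
% Worked example: \delta = 5 min, M = 1440 min (24 h)
% gives \tau_{\mathrm{opt}} \approx \sqrt{2 \cdot 5 \cdot 1440} = \sqrt{14400} = 120 min.
```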

More Details

Power signatures of electric field and thermal switching regimes in memristive SET transitions

Journal of Physics D: Applied Physics

Hughart, David R.; Gao, Xujiao G.; Mamaluy, Denis M.; Marinella, Matthew J.; Mickel, Patrick R.

We present a study of the 'snap-back' regime of resistive switching hysteresis in bipolar TaOx memristors, identifying power signatures in the electronic transport. Using a simple model based on the thermal and electric field acceleration of ionic mobilities, we provide evidence that the 'snap-back' transition represents a crossover from a coupled thermal and electric-field regime to a primarily thermal regime, and is dictated by the reconnection of a ruptured conducting filament. We discuss how these power signatures can be used to limit filament radius growth, which is important for operational properties such as power, speed, and retention.

More Details

ALEGRA based computation of magnetostatic configurations

2016 IEEE/ACES International Conference on Wireless Information Technology, ICWITS 2016 and System and Applied Computational Electromagnetics, ACES 2016 - Proceedings

Grinfeld, Michael; Mcdonald, Jason; Niederhaus, John H.

We explore how reliable the ALEGRA MHD code is in its static limit. We also explore, in the quasi-static approximation, the evolution of the magnetic fields inside and outside an inclusion and the parameters for which the quasi-static approach yields self-consistent results.

More Details

High Performance Computing: Power Application Programming Interface Specification (V.1.3)

Laros, James H.; Kelly, Suzanne M.; Grant, Ryan E.; Olivier, Stephen L.; Levenhagen, Michael J.; DeBonis, David D.

Measuring and controlling the power and energy consumption of high performance computing systems by various components in the software stack is an active research area [13, 3, 5, 10, 4, 21, 19, 16, 7, 17, 20, 18, 11, 1, 6, 14, 12]. Implementations in lower level software layers are beginning to emerge in some production systems, which is very welcome. To be most effective, a portable interface to measurement and control features would significantly facilitate participation by all levels of the software stack. We present a proposal for a standard power Application Programming Interface (API) that endeavors to cover the entire software space, from generic hardware interfaces to the input from the computer facility manager.

More Details

Constrained Versions of DEDICOM for Use in Unsupervised Part-Of-Speech Tagging

Dunlavy, Daniel D.; Chew, Peter A.

This report describes extensions of DEDICOM (DEcomposition into DIrectional COMponents) data models [3] that incorporate bound and linear constraints. The main purpose of these extensions is to investigate the use of improved data models for unsupervised part-of-speech tagging, as described by Chew et al. [2]. In that work, a single-domain, two-way DEDICOM model was computed on a matrix of bigram frequencies of tokens in a corpus and used to identify parts of speech as an unsupervised approach to that problem. An open problem identified in that work was the computation of a DEDICOM model that more closely resembled the matrices used in a Hidden Markov Model (HMM), specifically through post-processing of the DEDICOM factor matrices. The work reported here consists of the description of several models that aim to provide a direct solution to that problem and a way to fit those models. The approach taken here is to incorporate the model requirements as bound and linear constraints into the DEDICOM model directly and to solve the data fitting problem as a constrained optimization problem. This is in contrast to the typical approaches in the literature, where the DEDICOM model is fit using unconstrained optimization approaches and model requirements are satisfied as a post-processing step.
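
For readers unfamiliar with the model family, two-way DEDICOM fits a token-by-token bigram matrix X with a shared loading matrix A and an asymmetric interaction matrix R, X ≈ A R Aᵀ. The constrained variants described in this report add bounds and linear constraints so that the factors can be interpreted like HMM-style stochastic matrices; an illustrative (not the report's exact) constraint set is:

```latex
% Two-way DEDICOM with illustrative bound and linear constraints:
\min_{A,\,R}\ \big\| X - A R A^{\mathsf{T}} \big\|_F^2
\quad\text{subject to}\quad
A \ge 0,\qquad R \ge 0,\qquad \sum_{j} R_{ij} = 1 \ \ \text{for all } i.
```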

More Details

Path Network Recovery Using Remote Sensing Data and Geospatial-Temporal Semantic Graphs

McLendon, William C.; Brost, Randolph B.

Remote sensing systems produce large volumes of high-resolution images that are difficult to search. The GeoGraphy (pronounced Geo-Graph-y) framework [2, 20] encodes remote sensing imagery into a geospatial-temporal semantic graph representation to enable high level semantic searches to be performed. Typically scene objects such as buildings and trees tend to be shaped like blocks with few holes, but other shapes generated from path networks tend to have a large number of holes and can span a large geographic region due to their connectedness. For example, we have a dataset covering the city of Philadelphia in which there is a single road network node spanning a 6 mile x 8 mile region. Even a simple question such as "find two houses near the same street" might give unexpected results. More generally, nodes arising from networks of paths (roads, sidewalks, trails, etc.) require additional processing to make them useful for searches in GeoGraphy. We have assigned the term Path Network Recovery to this process. Path Network Recovery is a three-step process involving (1) partitioning the network node into segments, (2) repairing broken path segments interrupted by occlusions or sensor noise, and (3) adding path-aware search semantics into GeoQuestions. This report covers the path network recovery process, how it is used, and some example use cases of the current capabilities.

More Details

Using Rollback Avoidance to Mitigate Failures in Next-Generation Extreme-Scale Systems

Levy, Scott L.

High-performance computing (HPC) systems enable scientists to numerically model complex phenomena in many important physical systems. The next major milestone in the development of HPC systems is the construction of the first supercomputer capable of executing more than an exaflop, 10^18 floating point operations per second. On systems of this scale, failures will occur much more frequently than on current systems. As a result, resilience is a key obstacle to building next-generation extreme-scale systems. Coordinated checkpointing is currently the most widely-used mechanism for handling failures on HPC systems. Although coordinated checkpointing remains effective on current systems, increasing the scale of today's systems to build next-generation systems will increase the cost of fault tolerance as more and more time is taken away from the application to protect against or recover from failure. Rollback avoidance techniques seek to mitigate the cost of checkpoint/restart by allowing an application to continue its execution rather than rolling back to an earlier checkpoint when failures occur. These techniques include failure prediction and preventive migration, replicated computation, fault-tolerant algorithms, and software-based memory fault correction. In this thesis, we examine how rollback avoidance techniques can be used to address failures on extreme-scale systems. Using a combination of analytic modeling and simulation, we evaluate the potential impact of rollback avoidance on these systems. We then present a novel rollback avoidance technique that exploits similarities in application memory. Finally, we examine the feasibility of using this technique to protect against memory faults in kernel memory.

More Details

Ifpack2 User's Guide 1.0

Prokopenko, Andrey V.; Siefert, Christopher S.; Hu, Jonathan J.; Hoemmen, Mark F.; Klinvex, Alicia M.

This is the definitive user manual for the Ifpack2 package in the Trilinos project. Ifpack2 provides implementations of iterative algorithms (e.g., Jacobi, SOR, additive Schwarz) and processor-based incomplete factorizations. Ifpack2 is part of the Trilinos Tpetra solver stack, is templated on index, scalar, and node types, and leverages node-level parallelism indirectly through its use of Tpetra kernels. Ifpack2 can be used to solve matrix systems with more than 2 billion rows (using 64-bit indices). Any options not documented in this manual should be considered strictly experimental.

More Details