Numerical Analysis of Robust Phase Estimation
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Thermodynamic quantities, such as pressure and internal energy, and their derivatives are used in many applications. Depending on the application, a natural set of quantities related to one of the four thermodynamic potentials is typically used. For example, hydro-codes use internal-energy-derived quantities, and equation-of-state work often uses Helmholtz free energy quantities. When performing work that spans several fields, transformations between one set of quantities and another are often needed. A short, but comprehensive, review of such transformations is given in this report.
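The report itself is not reproduced here; as a generic illustration of the kind of transformation it reviews, the standard relations that recover internal-energy-based quantities from a Helmholtz free energy F(V, T) are:

```latex
% Internal-energy-based quantities recovered from a Helmholtz free energy F(V,T):
P   = -\left(\frac{\partial F}{\partial V}\right)_T, \qquad
S   = -\left(\frac{\partial F}{\partial T}\right)_V, \qquad
E   = F + TS = F - T\left(\frac{\partial F}{\partial T}\right)_V, \qquad
C_V = \left(\frac{\partial E}{\partial T}\right)_V
    = -T\left(\frac{\partial^2 F}{\partial T^2}\right)_V .
```

Analogous Legendre transforms and Maxwell relations connect the quantities tied to the enthalpy and the Gibbs free energy.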
International Journal of High Performance Computing Applications
We describe a new high-performance conjugate-gradient (HPCG) benchmark. HPCG is composed of computations and data-access patterns commonly found in scientific applications. HPCG strives for a better correlation to existing codes from the computational science domain and to be representative of their performance. HPCG is meant to help drive the computer system design and implementation in directions that will better impact future performance improvement.
We present a verification and validation analysis of a coordinate-transformation-based numerical solution method for the two-dimensional axisymmetric magnetic diffusion equation, implemented in the finite-element simulation code ALEGRA. The transformation, suggested by Melissen and Simkin, yields an equation set perfectly suited for linear finite elements and for problems with large jumps in material conductivity near the axis. The verification analysis examines transient magnetic diffusion in a rod or wire in a very low conductivity background by first deriving an approximate analytic solution using perturbation theory. This approach for generating a reference solution is shown to be not fully satisfactory. A specialized approach for manufacturing an exact solution is then used to demonstrate second-order convergence under spatial refinement and temporal refinement. For this new implementation, a significant improvement relative to previously available formulations is observed. Benefits in accuracy for computed current density and Joule heating are also demonstrated. The validation analysis examines the circuit-driven explosion of a copper wire using resistive magnetohydrodynamics modeling, in comparison to experimental tests. The new implementation matches the accuracy of the existing formulation, with both formulations capturing the experimental burst time and action to within approximately 2%.
Journal of Applied Physics
We report on a new technique for obtaining off-Hugoniot pressure vs. density data for solid metals compressed to extreme pressure by a magnetically driven liner implosion on the Z-machine (Z) at Sandia National Laboratories. In our experiments, the liner comprises inner and outer metal tubes. The inner tube is composed of a sample material (e.g., Ta and Cu) whose compressed state is to be inferred. The outer tube is composed of Al and serves as the current carrying cathode. Another aluminum liner at much larger radius serves as the anode. A shaped current pulse quasi-isentropically compresses the sample as it implodes. The iterative method used to infer pressure vs. density requires two velocity measurements. Photonic Doppler velocimetry probes measure the implosion velocity of the free (inner) surface of the sample material and the explosion velocity of the anode free (outer) surface. These two velocities are used in conjunction with magnetohydrodynamic simulation and mathematical optimization to obtain the current driving the liner implosion, and to infer pressure and density in the sample through maximum compression. This new equation of state calibration technique is illustrated using a simulated experiment with a Cu sample. Monte Carlo uncertainty quantification of synthetic data establishes convergence criteria for experiments. Results are presented from experiments with Al/Ta, Al/Cu, and Al liners. Symmetric liner implosion with quasi-isentropic compression to peak pressure ∼1000 GPa is achieved in all cases. These experiments exhibit unexpectedly softer behavior above 200 GPa, which we conjecture is related to differences in the actual and modeled properties of aluminum.
2015 IEEE/ACM International Conference on Computer-Aided Design, ICCAD 2015
Rebooting Computing (RC) is an effort in the IEEE to rethink future computers. RC was started in 2012 by its co-chairs, Elie Track (IEEE Council on Superconductivity) and Tom Conte (Computer Society). RC takes a holistic approach, considering revolutionary as well as evolutionary solutions needed to advance computer technologies. Three summits were held in 2013 and 2014, discussing different technologies, from emerging devices to user interfaces, from security to energy efficiency, and from neuromorphic to reversible computing. The first part of this paper introduces RC to the design automation community and solicits revolutionary ideas from the community for the directions of future computer research. Energy efficiency is identified as one of the most important challenges in future computer technologies. The importance of energy efficiency spans from miniature embedded sensors to wearable computers, from individual desktops to data centers. To gauge the state of the art, the RC Committee organized the first Low Power Image Recognition Challenge (LPIRC). Each image contains one or multiple objects, among 200 categories. A contestant has to provide a working system that can recognize the objects and report the bounding boxes of the objects. The second part of this paper explains LPIRC and the solutions from the top two winners.
Handbook of Peridynamic Modeling
As discussed in the previous chapter, the purpose of peridynamics is to unify the mechanics of continuous media, continuous media with evolving discontinuities, and discrete particles. To accomplish this, peridynamics avoids the use of partial derivatives of the deformation with respect to spatial coordinates. Instead, it uses integral equations that remain valid on discontinuities. Discrete particles, as will be discussed later in this chapter, are treated using Dirac delta functions.
ESAIM: Mathematical Modelling and Numerical Analysis
We formulate and analyze an optimization-based Atomistic-to-Continuum (AtC) coupling method for problems with point defects. Application of a potential-based atomistic model near the defect core enables accurate simulation of the defect. Away from the core, where site energies become nearly independent of the lattice position, the method switches to a more efficient continuum model. The two models are merged by minimizing the mismatch of their states on an overlap region, subject to the atomistic and continuum force balance equations acting independently in their domains. We prove that the optimization problem is well-posed and establish error estimates.
Proceedings of the Workshop on Algorithm Engineering and Experiments
Solving Laplacian linear systems is an important task in a variety of practical and theoretical applications. In theory, this problem can be solved in nearly linear (linear times polylogarithmic) work, but the corresponding algorithms are difficult to implement in practice. We examine existing solution techniques in order to determine the best methods currently available and the types of problems for which they are useful. We perform timing experiments using a variety of solvers on a variety of problems and present our results. We discover differing solver behavior between web graphs and a class of synthetic graphs designed to model them.
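As a minimal sketch of the problem class (not one of the solvers benchmarked in the paper), the following assembles a small graph Laplacian and solves a compatible system with conjugate gradients; the graph and right-hand side are illustrative only.

```python
# Minimal sketch: build a graph Laplacian L = D - A and solve L x = b with CG,
# keeping b orthogonal to the constant null space. Not the paper's solvers.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]   # hypothetical small graph
n = 4
rows, cols = zip(*edges)
A = sp.coo_matrix((np.ones(len(edges)), (rows, cols)), shape=(n, n))
A = A + A.T                                        # symmetric adjacency
L = sp.diags(np.asarray(A.sum(axis=1)).ravel()) - A  # Laplacian

b = np.array([1.0, -1.0, 2.0, -2.0])
b -= b.mean()                                      # project out the constant null space
x, info = spla.cg(L.tocsr(), b)
assert info == 0
print(x - x.mean())                                # solution, fixed up to an additive constant
```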
Conference Proceedings of the Society for Experimental Mechanics Series
It is well known that the derivative-based classical approach to strain is problematic when the displacement field is irregular, noisy, or discontinuous. Difficulties arise wherever the displacements are not differentiable. We present an alternative, nonlocal approach to calculating strain from digital image correlation (DIC) data that is well-defined and robust, even for the pathological cases that undermine the classical strain measure. This integral formulation for strain has no spatial derivatives and when the displacement field is smooth, the nonlocal strain and the classical strain are identical. We submit that this approach to computing strains from displacements will greatly improve the fidelity and efficacy of DIC for new application spaces previously untenable in the classical framework.
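The integral formulation itself is not reproduced in the abstract; the one-dimensional sketch below only illustrates the general idea of a derivative-free, nonlocal strain (a weighted average of difference quotients over a horizon), not the authors' exact measure. The horizon size, weights, and data are illustrative.

```python
# 1D illustration of a derivative-free, nonlocal strain: a weighted average of
# difference quotients (u(x') - u(x)) / (x' - x) over a horizon delta.
# Generic sketch of the idea, not the paper's exact integral formulation.
import numpy as np

def nonlocal_strain(x, u, delta):
    eps = np.zeros_like(u)
    for i, xi in enumerate(x):
        mask = (np.abs(x - xi) <= delta) & (x != xi)
        xp, up = x[mask], u[mask]
        w = 1.0 - np.abs(xp - xi) / delta            # simple hat weights over the horizon
        eps[i] = np.sum(w * (up - u[i]) / (xp - xi)) / np.sum(w)
    return eps

x = np.linspace(0.0, 1.0, 201)
u = 0.01 * x + 0.0005 * np.random.randn(x.size)      # noisy uniform stretch
print(nonlocal_strain(x, u, delta=0.05).mean())      # close to 0.01 despite the noise
```

For a smooth displacement field the weighted difference quotients recover the classical strain, which is the consistency property the abstract highlights.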
Parallel Computing
We present a local search strategy to improve the coordinate-based mapping of a parallel job's tasks to the MPI ranks of its parallel allocation in order to reduce network congestion and the job's communication time. The goal is to reduce the number of network hops between communicating pairs of ranks. Our target is applications with a nearest-neighbor stencil communication pattern running on mesh systems with non-contiguous processor allocation, such as Cray XE and XK Systems. Using the miniGhost mini-app, which models the shock physics application CTH, we demonstrate that our strategy reduces application running time while also reducing the runtime variability. We further show that mapping quality can vary based on the selected allocation algorithm, even between allocation algorithms of similar apparent quality.
SIAM Journal on Scientific Computing
Magnetohydrodynamic (MHD) representations are used to model a wide range of plasma physics applications and are characterized by a nonlinear system of partial differential equations that strongly couples a charged fluid with the evolution of electromagnetic fields. The resulting linear systems that arise from discretization and linearization of the nonlinear problem are generally difficult to solve. In this paper, we investigate multigrid preconditioners for this system. We consider two well-known multigrid relaxation methods for incompressible fluid dynamics: Braess-Sarazin relaxation and Vanka relaxation. We first extend these to the context of steady-state one-fluid viscoresistive MHD. Then we compare the two relaxation procedures within a multigrid-preconditioned GMRES method employed within Newton's method. To isolate the effects of the different relaxation methods, we use structured grids, inf-sup stable finite elements, and geometric interpolation. We present convergence and timing results for a two-dimensional, steady-state test problem.
Frontiers in Neuroscience
The exponential increase in data over the last decade presents a significant challenge to analytics efforts that seek to process and interpret such data for various applications. Neural-inspired computing approaches are being developed in order to leverage the computational properties of the analog, low-power data processing observed in biological systems. Analog resistive memory crossbars can perform a parallel read or a vector-matrix multiplication as well as a parallel write or a rank-1 update with high computational efficiency. For an N × N crossbar, these two kernels can be O(N) more energy efficient than a conventional digital memory-based architecture. If the read operation is noise limited, the energy to read a column can be independent of the crossbar size (O(1)). These two kernels form the basis of many neuromorphic algorithms such as image, text, and speech recognition. For instance, these kernels can be applied to a neural sparse coding algorithm to give an O(N) reduction in energy for the entire algorithm when run with finite precision. Sparse coding is a rich problem with a host of applications including computer vision, object tracking, and more generally unsupervised learning.
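The two crossbar kernels named above can be written down in a few lines; the sketch below is an idealized numerical model (sizes, conductance range, and learning rate are illustrative), not a circuit or energy model.

```python
# Idealized model of the two crossbar kernels: a "parallel read" is a
# vector-matrix multiply and a "parallel write" is a rank-1 outer-product
# update of the conductance matrix G. Values are illustrative only.
import numpy as np

N = 256
G = np.random.uniform(0.0, 1.0, size=(N, N))   # crossbar conductances

x = np.random.randn(N)                          # input "voltages" for the read
y = x @ G                                       # parallel read: vector-matrix multiply

lr = 1e-3
pre, post = np.random.randn(N), np.random.randn(N)
G += lr * np.outer(pre, post)                   # parallel write: rank-1 update
```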
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
SIAM Journal on Scientific Computing
This paper describes the design of Teko, an object-oriented C++ library for implementing advanced block preconditioners. Mathematical design criteria that elucidate the needs of block preconditioning libraries and techniques are explained and shown to motivate the structure of Teko. For instance, a principal design choice was for Teko to strongly reflect the mathematical statement of the preconditioners to reduce development burden and permit focus on the numerics. Additional mechanisms are explained that provide a pathway to developing an optimized production capable block preconditioning capability with Teko. Finally, Teko is demonstrated on fluid flow and magnetohydrodynamics applications. In addition to highlighting the features of the Teko library, these new results illustrate the effectiveness of recent preconditioning developments applied to advanced discretization approaches.
Abstract not provided.
Abstract not provided.
Abstract not provided.
In recent years, advanced network analytics have become increasingly important to national security, with applications ranging from cyber security to detection and disruption of terrorist networks. While classical computing solutions have received considerable investment, the development of quantum algorithms to address problems, such as data mining of attributed relational graphs, is a largely unexplored space. Recent theoretical work has shown that quantum algorithms for graph analysis can be more efficient than their classical counterparts. Here, we have implemented a trapped-ion-based two-qubit quantum information processor to address these goals. Building on Sandia's microfabricated silicon surface ion traps, we have designed, realized, and characterized a quantum information processor using the hyperfine qubits encoded in two 171Yb+ ions. We have implemented single-qubit gates using resonant microwave radiation and have employed gate set tomography (GST) to characterize the quantum process. For the first time, we were able to prove that the quantum process surpasses the fault tolerance thresholds of some quantum codes by demonstrating a diamond norm distance of less than 1.9 × 10⁻⁴. We used Raman transitions in order to manipulate the trapped ions' motion and realize two-qubit gates. We characterized the implemented motion-sensitive and motion-insensitive single-qubit processes and achieved a maximal process infidelity of 6.5 × 10⁻⁵. We implemented the two-qubit gate proposed by Mølmer and Sørensen and achieved a fidelity of more than 97.7%.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
We introduce a task-parallel algorithm for sparse incomplete Cholesky factorization that utilizes a 2D sparse partitioned-block layout of a matrix. Our factorization algorithm follows the idea of algorithms-by-blocks by using the block layout. The algorithm-by-blocks approach induces a task graph for the factorization. These tasks are related to each other through their data dependences in the factorization algorithm. To process the tasks on various manycore architectures in a portable manner, we also present a portable tasking API that incorporates different tasking backends and device-specific features using Kokkos, an open-source framework for manycore platforms. A performance evaluation is presented on both Intel Sandy Bridge and Intel Xeon Phi platforms for matrices from the University of Florida sparse matrix collection to illustrate the merits of the proposed task-based factorization. Experimental results demonstrate that our task-parallel implementation delivers about a 26.6x speedup (geometric mean) over single-threaded incomplete Cholesky-by-blocks and a 19.2x speedup over a serial Cholesky implementation that carries no tasking overhead, using 56 threads on the Intel Xeon Phi processor, for sparse matrices arising from various application problems.
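The paper's implementation runs sparse block tasks under Kokkos; as a language-neutral illustration of the algorithm-by-blocks structure only, the dense NumPy sketch below marks where each block operation (POTRF, TRSM, SYRK/GEMM) would be spawned as a task, with dependences given by the blocks it reads and writes. Block size and matrix are illustrative; the triangular solve is written with an explicit inverse for brevity.

```python
# Dense, sequential illustration of Cholesky-by-blocks. Each labeled block
# operation below would become one task in the induced task graph.
import numpy as np

def cholesky_by_blocks(A, nb):
    n = A.shape[0]
    L = np.tril(A.copy())                                          # work only on the lower triangle
    for k in range(0, n, nb):
        kk = slice(k, k + nb)
        L[kk, kk] = np.linalg.cholesky(L[kk, kk])                  # POTRF task
        for i in range(k + nb, n, nb):
            ii = slice(i, i + nb)
            L[ii, kk] = L[ii, kk] @ np.linalg.inv(L[kk, kk]).T     # TRSM task
        for i in range(k + nb, n, nb):
            ii = slice(i, i + nb)
            for j in range(k + nb, i + nb, nb):
                jj = slice(j, j + nb)
                L[ii, jj] -= L[ii, kk] @ L[jj, kk].T               # SYRK/GEMM task
    return L

A = np.random.randn(8, 8); A = A @ A.T + 8 * np.eye(8)             # small SPD test matrix
L = cholesky_by_blocks(A, nb=2)
print(np.allclose(L @ L.T, A))                                     # True: L reconstructs A
```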
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
SIAM Journal on Matrix Analysis and Applications
This work presents a new Krylov-subspace-recycling method for efficiently solving sequences of linear systems of equations characterized by varying right-hand sides and symmetric-positive-definite matrices. As opposed to typical truncation strategies used in recycling such as deflation, we propose a truncation method inspired by goal-oriented proper orthogonal decomposition (POD) from model reduction. This idea is based on the observation that model reduction aims to compute a low-dimensional subspace that contains an accurate solution; as such, we expect the proposed method to generate a low-dimensional subspace that is well suited for computing solutions that can satisfy inexact tolerances. In particular, we propose specific goal-oriented POD "ingredients" that align the optimality properties of POD with the objective of Krylov-subspace recycling. To compute solutions in the resulting "augmented" POD subspace, we propose a hybrid direct/iterative three-stage method that leverages (1) the optimal ordering of POD basis vectors, and (2) well-conditioned reduced matrices. Numerical experiments performed on solid-mechanics problems highlight the benefits of the proposed method over existing approaches for Krylov-subspace recycling.
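As a generic sketch of the recycling idea only (not the paper's three-stage method or its goal-oriented ingredients), one can build a low-dimensional basis from previously computed solutions via an SVD and obtain a cheap Galerkin solution in that subspace for a new right-hand side; the matrix and right-hand-side family below are synthetic.

```python
# Generic POD-style recycling sketch: SVD of prior solutions gives a subspace,
# and a small Galerkin solve in that subspace handles a new right-hand side.
import numpy as np

def pod_basis(snapshots, k):
    U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
    return U[:, :k]                                   # leading left singular vectors

def galerkin_solution(A, b, V):
    y = np.linalg.solve(V.T @ A @ V, V.T @ b)         # small k-by-k reduced system
    return V @ y

n = 200
A = np.diag(np.linspace(1.0, 100.0, n))               # SPD stand-in for a stiffness matrix
rhs_modes = np.random.randn(n, 3)                     # right-hand sides drawn from a 3-dim family
snapshots = np.column_stack(
    [np.linalg.solve(A, rhs_modes @ np.random.randn(3)) for _ in range(10)])
V = pod_basis(snapshots, k=3)

b_new = rhs_modes @ np.random.randn(3)                # new right-hand side from the same family
x0 = galerkin_solution(A, b_new, V)
print(np.linalg.norm(b_new - A @ x0) / np.linalg.norm(b_new))  # small: the subspace is recycled
```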
SEG Technical Program Expanded Abstracts
We present a synthetic study investigating the resolution limits of Full Wavefield Inversion (FWI) when applied to data generated from a visco-TTI-elastic (VTE) model. We compare VTE inversion having fixed Q and TTI, with acoustic inversion of acoustically generated data and elastic inversion of elastically generated data.
SEG Technical Program Expanded Abstracts
The need to better represent the material properties within the earth's interior has driven the development of higher-fidelity physics, e.g., visco-tilted-transversely-isotropic (visco-TTI) elastic media and material interfaces, such as the ocean bottom and salt boundaries. This is especially true for full waveform inversion (FWI), where one would like to reproduce the real-world effects and invert on unprocessed raw data. Here we present a numerical formulation using a Discontinuous Galerkin (DG) finite-element (FE) method, which incorporates the desired high-fidelity physics and material interfaces. To offset the additional costs of this material representation, we include a variety of techniques (e.g., non-conformal meshing and local polynomial refinement), which reduce the overall costs with little effect on the solution accuracy.
Procedia Engineering
We introduce Recursive Spoke Darts (RSD): a recursive hyperplane sampling algorithm that exploits the full duality between Voronoi and Delaunay entities of various dimensions. Our algorithm abandons the dependence on the empty sphere principle in the generation of Delaunay simplices, providing the foundation needed for scalable consistent meshing. The algorithm relies on two simple operations: line-hyperplane trimming and spherical range search. Consequently, this approach improves scalability as multiple processors can operate on different seeds at the same time. Moreover, generating consistent meshes across processors eliminates the communication needed between them, improving scalability even more. We introduce a simple tweak to the algorithm which makes it possible to avoid visiting all vertices of a Voronoi cell, generating almost-exact Delaunay graphs while avoiding the natural curse of dimensionality in high dimensions.
Human Vision and Electronic Imaging 2016, HVEI 2016
We know the rainbow color map is terrible, and it is emphatically reviled by the visualization community, yet its use continues to persist. Why do we continue to use this perceptual encoding with so many known flaws? Instead of focusing on why we should not use rainbow colors, this position statement explores the rationale for why we do pick these colors despite their flaws. Often the decision is influenced by a lack of knowledge, but even experts who know better sometimes choose poorly. A larger issue is the expedience that the rainbow color map has inadvertently come to represent. Knowing why the rainbow color map is used will help us move away from it. Education is good, but clearly not sufficient. We gain traction by making sensible color alternatives more convenient. It is not feasible to force a color map on users. Our goal is to supplant the rainbow color map as a common standard, and we will find that even those wedded to it will migrate away.
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
System-of-systems modeling has traditionally focused on physical systems rather than humans, but recent events have proved the necessity of considering the human in the loop. As technology becomes more complex and layered security continues to increase in importance, capturing humans and their interactions with technologies within the system-of-systems will be increasingly necessary. After an extensive job-task analysis, a novel type of system-of-systems simulation model has been created to capture the human-technology interactions on an extra-small forward operating base to better understand performance, key security drivers, and the robustness of the base. In addition to the model, an innovative framework for using detection theory to calculate d’ for individual elements of the layered security system, and for the entire security system as a whole, is under development.
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
We present here an example of how a large, multi-dimensional unstructured data set, namely aircraft trajectories over the United States, can be analyzed using relatively straightforward unsupervised learning techniques. We begin by adding a rough structure to the trajectory data using the notion of distance geometry. This provides a very generic structure to the data that allows it to be indexed as an n-dimensional vector. We then do a clustering based on the HDBSCAN algorithm to both group flights with similar shapes and find outliers that have a relatively unique shape. Next, we expand the notion of geometric features to more specialized features and demonstrate the power of these features to solve specific problems. Finally, we highlight not just the power of the technique but also the speed and simplicity of the implementation by demonstrating them on very large data sets.
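A simplified sketch of this pipeline is shown below: each variable-length trajectory is turned into a fixed-length feature vector (plain resampling here, a stand-in for the distance-geometry indexing described above) and then clustered with HDBSCAN. It assumes the third-party `hdbscan` package; the trajectories are synthetic random walks rather than flight tracks.

```python
# Featurize variable-length trajectories into fixed-length vectors, then cluster
# with HDBSCAN; label -1 marks outliers (trajectories with unusual shapes).
import numpy as np
import hdbscan

def featurize(traj, m=16):
    """Resample an (n, 2) track to m points per coordinate and flatten."""
    t = np.linspace(0.0, 1.0, len(traj))
    ti = np.linspace(0.0, 1.0, m)
    return np.concatenate([np.interp(ti, t, traj[:, k]) for k in range(2)])

rng = np.random.default_rng(0)
trajs = [np.cumsum(rng.normal(size=(rng.integers(50, 200), 2)), axis=0)
         for _ in range(300)]                       # synthetic stand-ins for flight tracks
X = np.array([featurize(t) for t in trajs])

labels = hdbscan.HDBSCAN(min_cluster_size=10).fit_predict(X)
print("clusters:", labels.max() + 1, "outliers:", int(np.sum(labels == -1)))
```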
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
A critical challenge in data science is conveying the meaning of data to human decision makers. While working with visualizations, decision makers are engaged in a visual search for information to support their reasoning process. As sensors proliferate and high performance computing becomes increasingly accessible, the volume of data decision makers must contend with is growing continuously and driving the need for more efficient and effective data visualizations. Consequently, researchers across the fields of data science, visualization, and human-computer interaction are calling for foundational tools and principles to assess the effectiveness of data visualizations. In this paper, we compare the performance of three different saliency models across a common set of data visualizations. This comparison establishes a performance baseline for assessment of new data visualization saliency models.
ECS Transactions
As CMOS technology approaches the end of its scaling, oxide-based memristors have become one of the leading candidates for post-CMOS memory and logic devices. To facilitate the understanding of physical switching mechanisms and accelerate experimental development of memristors, we have developed a three-dimensional fully-coupled electrical and thermal transport model, which captures all the important processes that drive memristive switching and is applicable for simulating a wide range of memristors. The model is applied to simulate the RESET and SET switching in a 3D filamentary TaOx memristor. Extensive simulations show that the switching dynamics of the bipolar device is determined by thermally-activated field-dominant processes: with Joule heating, the raised temperature enables the movement of oxygen vacancies, and the field drift dominates the overall motion of vacancies. Simulated current-voltage hysteresis and device resistance profiles as a function of time and voltage during RESET and SET switching show good agreement with experimental measurement.
American Society of Mechanical Engineers, Fluids Engineering Division (Publication) FEDSM
This contribution is the second part of three papers on Adaptive Multigrid Methods for the eXtended Fluid-Structure Interaction (eXFSI) Problem, where we introduce a monolithic variational formulation and solution techniques. To the best of our knowledge, such a model is new in the literature. This model is used to design an on-line structural health monitoring (SHM) system in order to determine the coupled acoustic and elastic wave propagation in moving domains and optimum locations for SHM sensors. In a monolithic nonlinear fluid-structure interaction (FSI), the fluid and structure models are formulated in different coordinate systems. This makes a common variational description of the FSI setup difficult and challenging. This article presents the state of the art in the finite element approximation of the FSI problem based on a monolithic variational formulation in the well-established arbitrary Lagrangian-Eulerian (ALE) framework. This research focuses on the newly developed mathematical model of a new FSI problem, which is referred to as the extended Fluid-Structure Interaction (eXFSI) problem, in the ALE framework. The eXFSI is a strongly coupled problem of typical FSI with a coupled wave propagation problem on the fluid-solid interface (WpFSI). The WpFSI is a strongly coupled problem of acoustic and elastic wave equations, where the wave propagation problems automatically adopt the boundary conditions from the FSI problem at each time step. The ALE approach provides a simple but powerful procedure to couple solid deformations with fluid flows by a monolithic solution algorithm. In such a setting, the fluid problems are transformed to a fixed reference configuration by the ALE mapping. The goal of this work is the development of concepts for the efficient numerical solution of the eXFSI problem, the analysis of various fluid-solid mesh motion techniques, and the comparison of different second-order time-stepping schemes. This work consists of the investigation of different time-stepping scheme formulations for a nonlinear FSI problem coupling the acoustic/elastic wave propagation on the fluid-structure interface. Temporal discretization is based on finite differences and is formulated as a one-step-θ scheme, from which we can consider the following particular cases: the implicit Euler, Crank-Nicolson, shifted Crank-Nicolson, and Fractional-Step-θ schemes. The nonlinear problem is solved with a Newton-like method, where the discretization is done with a Galerkin finite element scheme. The implementation is accomplished via the software library package DOPELIB, based on the deal.II finite element library, for the computation of different eXFSI configurations.
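For reference (standard material, not specific to the eXFSI formulation above), the one-step-θ family applied to du/dt = f(u) with step size Δt reads:

```latex
% One-step-theta time discretization of du/dt = f(u):
\frac{u^{n+1} - u^{n}}{\Delta t}
  = \theta \, f\!\left(u^{n+1}\right) + (1 - \theta) \, f\!\left(u^{n}\right),
\qquad
\theta = 1:\ \text{implicit Euler}, \quad
\theta = \tfrac{1}{2}:\ \text{Crank--Nicolson}, \quad
\theta = \tfrac{1}{2} + \mathcal{O}(\Delta t):\ \text{shifted Crank--Nicolson}.
```

The Fractional-Step-θ scheme chains three such substeps of sizes θΔt, (1−2θ)Δt, and θΔt per time step, with θ = 1 − √2/2.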
Top Fuel 2016: LWR Fuels with Enhanced Safety and Performance
Best-estimate fuel performance codes, such as BISON, currently under development at the Idaho National Laboratory, utilize empirical and mechanistic lower-length-scale-informed correlations to predict fuel behavior under normal operating and accident reactor conditions. Traditionally, best-estimate results are presented using the correlations with no quantification of the uncertainty in the output metrics of interest. However, there are associated uncertainties in the input parameters and correlations used to determine the behavior of the fuel and cladding under irradiation. Therefore, it is important to perform uncertainty quantification and include confidence bounds on the output metrics that take into account the uncertainties in the inputs. In addition, sensitivity analyses can be performed to determine which input parameters have the greatest influence on the outputs. In this paper we couple the BISON fuel performance code to the DAKOTA uncertainty analysis software to analyze a representative fuel performance problem. The case studied in this paper is based upon rod 1 from the IFA-432 integral experiment performed at the Halden Reactor in Norway. The rodlet is representative of a BWR fuel rod. The input parameter uncertainties are broken into three separate categories: boundary condition uncertainties (e.g., power, coolant flow rate), manufacturing uncertainties (e.g., pellet diameter, cladding thickness), and model uncertainties (e.g., fuel thermal conductivity, fuel swelling). Utilizing DAKOTA, a variety of statistical analysis techniques are applied to quantify the uncertainty and sensitivity of the output metrics of interest. Specifically, we demonstrate the use of sampling methods, polynomial chaos expansions, surrogate models, and variance-based decomposition. The output metrics investigated in this study are the fuel centerline temperature, cladding surface temperature, fission gas released, and fuel rod diameter. The results highlight the importance of quantifying the uncertainty and sensitivity in fuel performance modeling predictions and the need for additional research into improving the material models that are currently available.
Eurographics Symposium on Geometry Processing
We introduce an algorithmic framework for tuning the spatial density of disks in a maximal random packing, without changing the sizing function or radii of disks. Starting from any maximal random packing such as a Maximal Poisson-disk Sampling (MPS), we iteratively relocate, inject (add), or eject (remove) disks, using a set of three successively more-aggressive local operations. We may achieve a user-defined density, either more dense or more sparse, almost up to the theoretical structured limits. The tuned samples are conflict-free, retain coverage maximality, and, except in the extremes, retain the blue noise randomness properties of the input. We change the density of the packing one disk at a time, maintaining the minimum disk separation distance and the maximum domain coverage distance required of any maximal packing. These properties are local, and we can handle spatially-varying sizing functions. Using fewer points to satisfy a sizing function improves the efficiency of some applications. We apply the framework to improve the quality of meshes, removing non-obtuse angles; and to more accurately model fiber reinforced polymers for elastic and failure simulations.
ASME 2016 Dynamic Systems and Control Conference, DSCC 2016
Temperature monitoring is essential in automation, mechatronics, robotics, and other dynamic systems. Wireless methods which can sense multiple temperatures at the same time without the use of cables or slip-rings can enable many new applications. A novel method utilizing small permanent magnets is presented for wirelessly measuring the temperature of multiple points moving in repeatable motions. The technique utilizes linear least squares inversion to separate the magnetic field contributions of each magnet as it changes temperature. The experimental setup and calibration methods are discussed. Initial experiments show that temperatures from 5 to 50 °C can be accurately tracked for three neodymium iron boron magnets in a stationary configuration and while traversing arbitrary, repeatable trajectories. This work presents a new sensing capability that can be extended to tracking multiple temperatures inside opaque vessels, on rotating bearings, within batteries, or at the tip of complex end-effectors.
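The least-squares separation step can be sketched generically: the measured field samples are modeled as a linear combination of calibrated per-magnet field shapes, and the per-magnet coefficients (which vary with each magnet's temperature) are recovered by ordinary least squares. The data below are synthetic; the calibration matrix and coefficients are placeholders, not the paper's measurements.

```python
# Separate per-magnet contributions with a linear least-squares inversion.
import numpy as np

rng = np.random.default_rng(1)
n_samples, n_magnets = 500, 3
G = rng.normal(size=(n_samples, n_magnets))        # calibrated per-magnet field "shapes"
m_true = np.array([0.95, 1.02, 0.88])              # temperature-dependent magnetizations
b = G @ m_true + 1e-3 * rng.normal(size=n_samples) # measured field with sensor noise

m_est, *_ = np.linalg.lstsq(G, b, rcond=None)      # ordinary least-squares inversion
print(m_est)                                       # ~[0.95, 1.02, 0.88]; map to temperature via calibration
```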
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
OpenMP tasking supports parallelization of irregular algorithms. Recent OpenMP specifications extended tasking to increase functionality and to support optimizations, for instance with the taskloop construct. However, task scheduling remains opaque, which leads to inconsistent performance on NUMA architectures. We assess design issues for task affinity and explore several approaches to enable it. We evaluate these proposals with implementations in the Nanos++ and LLVM OpenMP runtimes that improve performance up to 40% and significantly reduce execution time variation.
ASME International Mechanical Engineering Congress and Exposition, Proceedings (IMECE)
A framework is developed to integrate the existing MFiX (Multiphase Flow with Interphase eXchanges) flow solver with state-of-the-art linear equation solver packages in Trilinos. The integrated solver is tested on various flow problems. The performance of the solver is evaluated on fluidized bed problems, and it is observed that the integrated flow solver performs better than the native solver.
Frontiers in Cellular and Infection Microbiology
Mycobacterium tuberculosis (Mtb) associated granuloma formation can be viewed as a structural immune response that can contain and halt the spread of the pathogen. In several mammalian hosts, including non-human primates, Mtb granulomas are often hypoxic, although this has not been observed in wild type murine infection models. While a presumed consequence, the structural contribution of the granuloma to oxygen limitation and the concomitant impact on Mtb metabolic viability and persistence remain to be fully explored. We develop a multiscale computational model to test to what extent in vivo Mtb granulomas become hypoxic, and investigate the effects of hypoxia on host immune response efficacy and mycobacterial persistence. Our study integrates a physiological model of oxygen dynamics in the extracellular space of alveolar tissue, an agent-based model of cellular immune response, and a systems biology-based model of Mtb metabolic dynamics. Our theoretical studies suggest that the dynamics of granuloma organization mediates oxygen availability and illustrate the immunological contribution of this structural host response to infection outcome. Furthermore, our integrated model demonstrates the link between the structural immune response and the mechanistic drivers influencing Mtb's adaptation to its changing microenvironment, and the qualitative infection outcome scenarios of clearance, containment, dissemination, and a newly observed theoretical outcome of transient containment. We observed hypoxic regions in the containment granuloma similar in size to granulomas found in mammalian in vivo models of Mtb infection. In the case of the containment outcome, our model uniquely demonstrates that immune-response-mediated hypoxic conditions help foster the shift down of bacteria through two stages of adaptation, similar to the in vitro non-replicating persistence (NRP) observed in the Wayne model of Mtb dormancy. This adaptation in part contributes to the ability of Mtb to remain dormant for years after initial infection.
ASME International Mechanical Engineering Congress and Exposition, Proceedings (IMECE)
The peridynamic theory of solid mechanics provides a natural framework for modeling constitutive response and simulating dynamic crack propagation, pervasive damage, and fragmentation. In the case of a fragmenting body, the principal quantities of interest include the number of fragments, and the masses and velocities of the fragments. We present a method for identifying individual fragments in a peridynamic simulation. We restrict ourselves to the meshfree approach of Silling and Askari, in which nodal volumes are used to discretize the computational domain. Nodal volumes, which are connected by peridynamic bonds, may separate as a result of material damage and form groups that represent fragments. Nodes within each fragment have similar velocities and their collective motion resembles that of a rigid body. The identification of fragments is achieved through inspection of the peridynamic bonds, established at the onset of the simulation, and the evolving damage value associated with each bond. An iterative approach allows for the identification of isolated groups of nodal volumes by traversing the network of bonds present in a body. The process of identifying fragments may be carried out at specified times during the simulation, revealing the progression of damage and the creation of fragments. Incorporating the fragment identification algorithm directly within the simulation code avoids the need to write bond data to disk, which is often prohibitively expensive. Results are recorded using fragment identification numbers. The identification number for each fragment is stored at each node within the fragment and written to disk, allowing for any number of post-processing operations, for example the construction of cumulative distribution functions for quantities of interest. Care is taken with regard to very small clusters of isolated nodes, including individual nodes for which all bonds have failed. Small clusters of nodes may be treated as tiny fragments, or may be omitted from the fragment identification process. The fragment identification algorithm is demonstrated using the Sierra/SolidMechanics analysis code. It is applied to a simulation of pervasive damage resulting from a spherical projectile impacting a brittle disk, and to a simulation of fragmentation of an expanding ductile ring.
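The core of the fragment-identification idea can be illustrated independently of the Sierra/SolidMechanics implementation: nodes joined by intact (undamaged) bonds are merged, and each resulting connected component is reported as one fragment. The sketch below uses a union-find structure over an illustrative bond list and damage values; it is not the production algorithm.

```python
# Identify fragments as connected components of the intact-bond network.
import numpy as np

def find(parent, i):
    while parent[i] != i:
        parent[i] = parent[parent[i]]              # path halving
        i = parent[i]
    return i

def fragments(n_nodes, bonds, damage, threshold=1.0):
    parent = list(range(n_nodes))
    for (i, j), d in zip(bonds, damage):
        if d < threshold:                          # bond still intact: merge its endpoints
            ri, rj = find(parent, i), find(parent, j)
            parent[ri] = rj
    return np.array([find(parent, i) for i in range(n_nodes)])  # fragment id per node

bonds  = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
damage = [0.0,    0.1,    1.0,    0.0,    0.2]     # bond (2, 3) has fully failed
print(fragments(6, bonds, damage))                 # two fragment ids: {0,1,2} and {3,4,5}
```

Storing only the per-node fragment id, as the abstract describes, is enough to build distributions of fragment masses and velocities in post-processing.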
CEUR Workshop Proceedings
Computational Science and Engineering (CSE) software can benefit substantially from an explicit focus on quality improvement. This is especially true as we face increased demands in both modeling and software complexities. At the same time, just desiring improved quality is not sufficient. We must work with the entities that provide CSE research teams with publication venues, funding, and professional recognition in order to increase incentives for improved software quality. In fact, software quality is precisely calibrated to the expectations, explicit and implicit, set by these entities. We will see broad improvements in sustainability and productivity only when publishers, funding agencies and employers raise their expectations for software quality. CSE software community leaders, those who are in a position to inform and influence these entities, have a unique opportunity to broadly and positively impact software quality by working to establish incentives that will spur creative and novel approaches to improve developer productivity and software sustainability.
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Abstract not provided.
Handbook of Peridynamic Modeling
Therefore, to design a building or a bridge that stands up and is safe, one might assume that engineers must need to know a lot about these tensor fields and stress potentials, in all their mathematical glory. If not, then surely they must depend on specialists in continuum mechanics for guidance. Right?
Weak scaling studies were performed for the explicit solid dynamics component of the ALEGRA code on two Cray supercomputer platforms during the period 2012-2015, involving a production-oriented hypervelocity impact problem. Results from these studies are presented, with analysis of the performance, scaling, and throughput of the code on these machines. The analysis demonstrates logarithmic scaling of the average CPU time per cycle up to core counts on the order of 10,000. At higher core counts, variable performance is observed, with significant upward excursions in compute time from the logarithmic trend. However, for core counts less than 10,000, the results show a 3× improvement in simulation throughput and a 2× improvement in logarithmic scaling. This improvement is linked to improved memory performance on the Cray platforms and to significant improvements made over this period to the data layout used by ALEGRA.
The Center for Computing Research (CCR) at Sandia National Laboratories organizes a summer student program each summer, in coordination with the Computer Science Research Institute (CSRI) and Cyber Engineering Research Institute (CERI).
Proceedings - 2015 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2015
Recent advances in sensor technology have made continuous real-time health monitoring available in both hospital and non-hospital settings. Since high-frequency medical sensors produce huge volumes of data, storing and processing continuous medical data is an emerging big data area. Detecting anomalies in real time is especially important for patients' emergency detection and prevention. A time series discord is a subsequence that has the maximum difference to the rest of the time series subsequences, meaning that it has abnormal or unusual data trends. In this study, we implemented two versions of time series discord detection algorithms on a high performance parallel database management system (DBMS) and applied them to 240 Hz waveform data collected from 9,723 patients. The initial brute-force version of the discord detection algorithm takes each possible subsequence and calculates a distance to the nearest non-self match to find the biggest discords in the time series. For the heuristic version of the algorithm, a combination of an array and a trie structure was applied to order time series data for better time efficiency. The study results showed efficient data loading, decoding, and discord searches in a large amount of data, benefiting from the time series discord detection algorithm and the architectural characteristics of the parallel DBMS, including data compression, data pipelining, and task scheduling.
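The brute-force discord search described above is simple enough to sketch directly: for every length-m subsequence, compute the distance to its nearest non-self (non-overlapping) match, and report the subsequence whose nearest match is farthest away. The data below are synthetic, not the 240 Hz waveforms, and the heuristic array/trie version is not reproduced here.

```python
# Brute-force top-1 time-series discord (O(n^2) subsequence comparisons).
import numpy as np

def top_discord(ts, m):
    n = len(ts) - m + 1
    subs = np.array([ts[i:i + m] for i in range(n)])
    best_loc, best_dist = -1, -np.inf
    for i in range(n):
        d = np.sqrt(np.sum((subs - subs[i]) ** 2, axis=1))
        d[max(0, i - m + 1):i + m] = np.inf        # exclude overlapping (self) matches
        nearest = d.min()                          # distance to nearest non-self match
        if nearest > best_dist:
            best_loc, best_dist = i, nearest
    return best_loc, best_dist

rng = np.random.default_rng(2)
ts = np.sin(np.linspace(0, 20 * np.pi, 2000)) + 0.05 * rng.normal(size=2000)
ts[1000:1050] += 1.5                               # injected anomaly
print(top_discord(ts, m=50))                       # location should fall near the injected region
```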
Abstract not provided.
The problem of computing quantum-accurate design-scale solutions to mechanics problems is rich with applications and serves as the background to modern multiscale science research. The problem can be broken into component problems comprised of communicating across adjacent scales, which when strung together create a pipeline for information to travel from quantum scales to design scales. Traditionally, this involves connections between a) quantum electronic structure calculations and molecular dynamics and between b) molecular dynamics and local partial differential equation models at the design scale. The second step, b), is particularly challenging since the appropriate scales of molecular dynamics and local partial differential equation models do not overlap. The peridynamic model for continuum mechanics provides an advantage in this endeavor, as the basic equations of peridynamics are valid at a wide range of scales limiting from the classical partial differential equation models valid at the design scale to the scale of molecular dynamics. In this work we focus on the development of multiscale finite element methods for the peridynamic model, in an effort to create a mathematically consistent channel for microscale information to travel from the upper limits of the molecular dynamics scale to the design scale. In particular, we first develop a Nonlocal Multiscale Finite Element Method which solves the peridynamic model at multiple scales to include microscale information at the coarse scale. We then consider a method that solves a fine-scale peridynamic model to build element-support basis functions for a coarse-scale local partial differential equation model, called the Mixed Locality Multiscale Finite Element Method. Given decades of research and development into finite element codes for the local partial differential equation models of continuum mechanics, there is a strong desire to couple local and nonlocal models to leverage the speed and state of the art of local models with the flexibility and accuracy of the nonlocal peridynamic model. In the mixed locality method this coupling occurs across scales, so that the nonlocal model can be used to communicate material heterogeneity at scales inappropriate to local partial differential equation models. Additionally, the computational burden of the weak form of the peridynamic model is reduced dramatically by only requiring that the model be solved on local patches of the simulation domain which may be computed in parallel, taking advantage of the heterogeneous nature of next generation computing platforms. Additionally, we present a novel Galerkin framework, the 'Ambulant Galerkin Method', which represents a first step towards a unified mathematical analysis of local and nonlocal multiscale finite element methods, and whose future extension will allow the analysis of multiscale finite element methods that mix models across scales under certain assumptions of the consistency of those models.
Abstract not provided.
Abstract not provided.
Abstract not provided.
IEEE Transactions on Parallel and Distributed Systems
Hybrid parallelism allows high performance computing applications to better leverage the increasing on-node parallelism of modern supercomputers. In this paper, we present a hybrid parallel implementation of the widely used LAMMPS/ReaxC package, where the construction of bonded and nonbonded lists and evaluation of complex ReaxFF interactions are implemented efficiently using OpenMP parallelism. Additionally, the performance of the QEq charge equilibration scheme is examined and a dual-solver is implemented. We present the performance of the resulting ReaxC-OMP package on a state-of-the-art multi-core architecture Mira, an IBM BlueGene/Q supercomputer. For system sizes ranging from 32 thousand to 16.6 million particles, speedups in the range of 1.5-4.5x are observed using the new ReaxC-OMP software. Sustained performance improvements have been observed for up to 262,144 cores (1,048,576 processes) of Mira with a weak scaling efficiency of 91.5% in larger simulations containing 16.6 million particles.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Final report for the Cognitive Computing for Security LDRD 165613. It reports on the development of a hybrid general-purpose/neuromorphic computer architecture, with an emphasis on potential implementation with memristors.
Abstract not provided.
The XVis project brings together the key elements of research to enable scientific discovery at extreme scale. Scientific computing will no longer be purely about how fast computations can be performed. Energy constraints, processor changes, and I/O limitations necessitate significant changes in both the software applications used in scientific computation and the ways in which scientists use them. Components for modeling, simulation, analysis, and visualization must work together in a computational ecosystem, rather than working independently as they have in the past. This project provides the necessary research and infrastructure for scientific discovery in this new computational ecosystem by addressing four interlocking challenges: emerging processor technology, in situ integration, usability, and proxy analysis.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
We study a time-parallel approach to solving quadratic optimization problems with linear time-dependent partial differential equation (PDE) constraints. These problems arise in formulations of optimal control, optimal design and inverse problems that are governed by parabolic PDE models. They may also arise as subproblems in algorithms for the solution of optimization problems with nonlinear time-dependent PDE constraints, e.g., in sequential quadratic programming methods. We apply a piecewise linear finite element discretization in space to the PDE constraint, followed by the Crank-Nicolson discretization in time. The objective function is discretized using finite elements in space and the trapezoidal rule in time. At this point in the discretization, auxiliary state variables are introduced at each discrete time interval, with the goal to enable: (i) a decoupling in time; and (ii) a fixed-point iteration to recover the solution of the discrete optimality system. The fixed-point iterative schemes can be used either as preconditioners for Krylov subspace methods or as smoothers for multigrid (in time) schemes. We present promising numerical results for both use cases.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
In this report we formulate eigenvalue-based methods for model calibration using a PDE-constrained optimization framework. We derive the abstract optimization operators from first principles and implement these methods using Sierra-SD and the Rapid Optimization Library (ROL). To demonstrate this approach, we use experimental measurements and an inverse solution to compute the joint and elastic foam properties of a low-fidelity unit (LFU) model.
Abstract not provided.
Additive Manufacturing
This paper presents an end-to-end design process for compliance-minimization-based topological optimization of cellular structures through to the realization of a final printed product. Homogenization is used to derive properties representative of these structures through direct numerical simulation of unit cell models of the underlying periodic structure. The resulting homogenized properties are then used, assuming a uniform distribution of the cellular structure, to compute the final macro-scale structure. A new method is then presented for generating an STL representation of the final optimized part that is suitable for printing on typical industrial machines. Quite fine cellular structures are shown to be possible using this method as compared to other approaches that use NURBS-based CAD representations of the geometry. Finally, results are presented that illustrate the fine-scale stresses developed in the final macro-scale optimized part, and suggestions are made as to how to incorporate these features into the overall optimization process.
Abstract not provided.
Abstract not provided.
Abstract not provided.
2015 4th Berkeley Symposium on Energy Efficient Electronic Systems, E3S 2015 - Proceedings
As transistors start to approach fundamental limits and Moore's law slows down, new devices and architectures are needed to enable continued performance gains. New approaches based on RRAM (resistive random access memory) or memristor crossbars can enable the processing of large amounts of data [1, 2]. One of the most promising applications for RRAM crossbars is brain-inspired or neuromorphic computing [3, 4].
2015 4th Berkeley Symposium on Energy Efficient Electronic Systems, E3S 2015 - Proceedings
Millivolt switches will not only improve energy efficiency, but will enable a new capability to manage the energy-reliability tradeoff. By effectively utilizing this system-level capability, it may be possible to obtain one or two additional generations of scaling beyond current projections. Millivolt switches will enable further energy scaling, a process that is expected to continue until the technology encounters thermal noise errors [Theis 10]. If thermal noise errors can be accommodated at higher levels through a new form of error correction, it may be possible to scale about 3× lower in system energy than is currently projected. A general solution to errors would also address long-standing problems with cosmic ray strikes, weak and aging parts, some cyber security vulnerabilities, etc.
Proceedings of ISAV 2015: 1st International Workshop on In Situ Infrastructures for Enabling Extreme-Scale Analysis and Visualization, Held in conjunction with SC 2015: The International Conference for High Performance Computing, Networking, Storage and Analysis
We present an architecture for high-performance computers that integrates in situ analysis of hardware and system monitoring data with application-specific data to reduce application runtimes and improve overall platform utilization. Large-scale high-performance computing systems typically use monitoring as a tool unrelated to application execution. Monitoring data flows from sampling points to a centralized off-system machine for storage and post-processing when root-cause analysis is required. Along the way, it may also be used for instantaneous threshold-based error detection. Applications can know their application state and possibly allocated resource state, but typically, they have no insight into globally shared resource state that may affect their execution. By analyzing performance data in situ rather than off-line, we enable applications to make real-time decisions about their resource utilization. We address the particular case of in situ network congestion analysis and its potential to improve task placement and data partitioning. We present several design and analysis considerations.
International Conference for High Performance Computing, Networking, Storage and Analysis, SC
Application resilience is a key challenge that has to be addressed to realize the exascale vision. Online recovery, even when it involves all processes, can dramatically reduce the overhead of failures as compared to the more traditional approach where the job is terminated and restarted from the last checkpoint. In this paper we explore how local recovery can be used for certain classes of applications to further reduce overheads due to resilience. Specifically we develop programming support and scalable runtime mechanisms to enable online and transparent local recovery for stencil-based parallel applications on current leadership class systems. We also show how multiple independent failures can be masked to effectively reduce the impact on the total time to solution. We integrate these mechanisms with the S3D combustion simulation, and experimentally demonstrate (using the Titan Cray-XK7 system at ORNL) the ability to tolerate high failure rates (i.e., node failures every 5 seconds) with low overhead while sustaining performance, at scales up to 262144 cores.
International Conference for High Performance Computing, Networking, Storage and Analysis, SC
We consider techniques to improve the performance of parallel sparse triangular solution on non-uniform memory architecture multicores by extending earlier coloring and level-set schemes for single-core multiprocessors. We develop STS-k, where k represents a small number of transformations for latency reduction derived from increased spatial and temporal locality of data accesses. We propose a graph model of data reuse to inform the development of STS-k and to prove that computing an optimal-cost schedule is NP-complete. We observe significant speed-ups with STS-3 on 32-core Intel Westmere-EX and 24-core AMD 'Magny-Cours' processors. Incremental gains solely from the 3-level transformations in STS-3 for a fixed ordering correspond to reductions in execution times by factors of 1.4 (Intel) and 1.5 (AMD) for level sets and 2 (Intel) and 2.2 (AMD) for coloring. On average, execution times are reduced by a factor of 6 (Intel) and 4 (AMD) for STS-3 with coloring compared to a reference implementation using level sets.
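For readers unfamiliar with the baseline, the sketch below shows the level-set scheduling that coloring and STS-k-style transformations build on: rows of the lower-triangular factor are grouped into dependency levels, rows within a level can be solved concurrently, and levels execute in order. The STS-k locality transformations themselves are not reproduced; the function names and the small test matrix are illustrative.

import numpy as np
from scipy.sparse import csr_matrix

# Baseline level-set scheduling for a sparse lower-triangular solve L x = b.
def level_sets(L):
    """Group row indices of lower-triangular L (CSR) into dependency levels."""
    n = L.shape[0]
    level = np.zeros(n, dtype=int)
    for i in range(n):
        deps = L.indices[L.indptr[i]:L.indptr[i + 1]]
        deps = deps[deps < i]                      # strictly-lower entries
        level[i] = 0 if deps.size == 0 else level[deps].max() + 1
    return [np.flatnonzero(level == l) for l in range(level.max() + 1)]

def triangular_solve_by_levels(L, b):
    x = np.zeros_like(b, dtype=float)
    for rows in level_sets(L):                     # levels run sequentially
        for i in rows:                             # rows in a level are independent
            cols = L.indices[L.indptr[i]:L.indptr[i + 1]]
            vals = L.data[L.indptr[i]:L.indptr[i + 1]]
            off = cols < i
            x[i] = (b[i] - vals[off] @ x[cols[off]]) / vals[~off][0]
    return x

# Small check against a dense solve.
A = np.tril(np.random.rand(6, 6)) + 6 * np.eye(6)
b = np.random.rand(6)
assert np.allclose(triangular_solve_by_levels(csr_matrix(A), b), np.linalg.solve(A, b))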
Proceedings of E2SC 2015: 3rd International Workshop on Energy Efficient Supercomputing - Held in conjunction with SC 2015: The International Conference for High Performance Computing, Networking, Storage and Analysis
Power consumption of extreme-scale supercomputers has become a key performance bottleneck. Yet current practices do not leverage power management opportunities, instead running at maximum power. This is not sustainable. Future systems will need to manage power as a critical resource, directing it to where it has the greatest benefit. Power capping is one mechanism for managing power budgets; however, its behavior is not well understood. This paper presents an empirical evaluation of several key HPC workloads running under a power cap on a Cray XC40 system and provides a comparison of this technique with p-state control, demonstrating the performance differences of each. These results show that: (1) maximum performance requires ensuring the cap is not reached; (2) performance slowdown under a cap can be attributed to cascading delays that result in unsynchronized performance variability across nodes; and (3) due to lag in reaction time, considerable time is spent operating above the set cap. This work provides a timely and much-needed comparison of HPC application performance under a power cap and aims to enable users and system administrators to understand how to best optimize application performance on power-constrained HPC systems.
2015 IEEE High Performance Extreme Computing Conference, HPEC 2015
It is challenging to obtain scalable HPC performance on real applications, especially for data science applications with irregular memory access and computation patterns. To drive co-design efforts in architecture, system, and application design, we are developing miniapps representative of data science workloads. These in turn stress the state of the art in Graph BLAS-like Graph Algorithm Building Blocks (GABB). In this work, we outline a Graph BLAS-like, linear algebra based approach to miniTri, one such miniapp. We describe a task-based prototype implementation and give initial scalability results.
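As an illustration of what a linear-algebra-based triangle kernel can look like (a generic GraphBLAS-style formulation, not necessarily the actual miniTri kernel), the sketch below counts triangles by forming the strictly lower-triangular part L of the adjacency matrix and summing the elementwise product of L·L with L.

import numpy as np
from scipy.sparse import csr_matrix, tril

def count_triangles(A):
    """A: symmetric 0/1 sparse adjacency matrix with empty diagonal."""
    L = tril(A, k=-1, format='csr')        # strictly lower-triangular part
    # (L @ L) counts length-2 paths; masking with L keeps only closed wedges,
    # so each triangle is counted exactly once.
    B = (L @ L).multiply(L)
    return int(B.sum())

# 4-clique has C(4, 3) = 4 triangles.
A = csr_matrix(np.ones((4, 4)) - np.eye(4))
print(count_triangles(A))   # -> 4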
International Journal of High Performance Computing Applications
As high-performance computing systems continue to increase in size and complexity, higher failure rates and increased overheads for checkpoint/restart (CR) protocols have raised concerns about the practical viability of CR protocols for future systems. Previously, compression has proven to be a viable approach for reducing checkpoint data volumes and, thereby, reducing CR protocol overhead, leading to improved application performance. In this article, we further examine compression-based CR optimization by exploring its baseline performance and scaling properties, evaluating whether improved compression algorithms might lead to even better application performance, and comparing checkpoint compression against and alongside other software- and hardware-based optimizations. Our results highlight that: (1) compression is a very viable CR optimization; (2) generic, text-based compression algorithms appear to perform near optimally for checkpoint data compression, and faster compression algorithms will not lead to better application performance; (3) compression-based optimizations fare well against and alongside other software-based optimizations; and (4) while hardware-based optimizations outperform software-based ones, they are not as cost effective.
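A minimal sketch of the basic mechanism follows, assuming a generic byte-level compressor (zlib) as a stand-in for the text-based compressors discussed in the article, and hypothetical helper names: checkpoint arrays are serialized, compressed before being written to stable storage, and decompressed on restart.

import zlib
import numpy as np

def write_checkpoint(path, arrays, level=6):
    # Serialize all arrays, compress the byte stream, then write it out.
    raw = b''.join(a.tobytes() for a in arrays)
    compressed = zlib.compress(raw, level)
    with open(path, 'wb') as f:
        f.write(compressed)
    return len(raw), len(compressed)

def read_checkpoint(path, shapes, dtype=np.float64):
    # Decompress and rebuild the arrays from their known shapes and dtype.
    with open(path, 'rb') as f:
        raw = zlib.decompress(f.read())
    out, offset = [], 0
    for shape in shapes:
        n = int(np.prod(shape)) * np.dtype(dtype).itemsize
        out.append(np.frombuffer(raw[offset:offset + n], dtype=dtype).reshape(shape))
        offset += n
    return out

state = [np.zeros((256, 256)), np.linspace(0.0, 1.0, 4096)]
orig, comp = write_checkpoint('ckpt.bin', state)
print(f'compression ratio: {orig / comp:.1f}x')
restored = read_checkpoint('ckpt.bin', [(256, 256), (4096,)])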
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Using a novel formal methods approach, we have generated computer-verified proofs of major theorems pertinent to the quantum phase estimation algorithm. This was accomplished using our Prove-It software package in Python. While many formal methods tools are available, their practical utility is limited. Translating a problem of interest into these systems and working through the steps of a proof is an art form that requires much expertise. One must surrender to the preferences and restrictions of the tool regarding how mathematical notions are expressed and what deductions are allowed. Automation is a major driver that forces restrictions. Our focus, on the other hand, is to produce a tool that allows users the ability to confirm proofs that are essentially known already. This goal is valuable in itself. We demonstrate the viability of our approach that allows the user great flexibility in expressing statements and composing derivations. There were no major obstacles in following a textbook proof of the quantum phase estimation algorithm. There were tedious details of algebraic manipulations that we needed to implement (and a few that we did not have time to enter into our system) and some basic components that we needed to rethink, but there were no serious roadblocks. In the process, we made a number of convenient additions to our Prove-It package that will make certain algebraic manipulations easier to perform in the future. In fact, our intent is for our system to build upon itself in this manner.
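For orientation, the central textbook statement about quantum phase estimation (as given, for example, by Nielsen and Chuang) is reproduced below; the precise statements formalized in Prove-It may differ in form.

% Standard textbook phase-estimation bound, stated for orientation only.
Let $U|u\rangle = e^{2\pi i\varphi}|u\rangle$. Running phase estimation with
\[
  t \;=\; n + \left\lceil \log_2\!\Big(2 + \tfrac{1}{2\epsilon}\Big) \right\rceil
\]
ancilla qubits and measuring the first register yields an estimate $\tilde{\varphi}$ satisfying
\[
  \Pr\big[\,|\tilde{\varphi} - \varphi| \le 2^{-n}\,\big] \;\ge\; 1 - \epsilon .
\]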
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Proceedings - IEEE International Conference on Cluster Computing, ICCC
A broad range of physical phenomena in science and engineering can be explored using finite-difference and finite-volume based application codes. Incorporating Adaptive Mesh Refinement (AMR) into these codes focuses attention on the most critical parts of a simulation, enabling increased numerical accuracy of the solution while limiting memory consumption. However, adaptivity comes at the cost of increased runtime complexity, which is particularly challenging on emerging and expected future architectures. In order to explore the design space offered by new computing environments, we have developed a proxy application called miniAMR. MiniAMR exposes a range of important issues that will significantly impact the performance potential of full application codes. In this paper, we describe miniAMR, demonstrate what it is designed to represent in a full application code, and illustrate how it can be used to exploit future high-performance computing architectures. To ensure an accurate understanding of what miniAMR is intended to represent, we compare it with CTH, a shock hydrodynamics code in heavy use throughout several computational science and engineering communities.
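The sketch below conveys the flavor of object-driven block refinement in a proxy of this kind: axis-aligned blocks that intersect a moving spherical object are split into children, while other blocks are left alone. The geometry, refinement test, and data layout are illustrative assumptions, not miniAMR's actual implementation.

import numpy as np

def refine_blocks(blocks, center, radius):
    """blocks: list of (lo, hi) corner pairs for axis-aligned boxes."""
    refined = []
    for lo, hi in blocks:
        closest = np.clip(center, lo, hi)          # nearest point in the box
        if np.linalg.norm(closest - center) <= radius:
            mid = 0.5 * (lo + hi)                  # split into 2^d children
            for corner in np.ndindex(*(2,) * len(lo)):
                c = np.array(corner)
                child_lo = np.where(c == 0, lo, mid)
                child_hi = np.where(c == 0, mid, hi)
                refined.append((child_lo, child_hi))
        else:
            refined.append((np.asarray(lo, float), np.asarray(hi, float)))
    return refined

domain = [(np.zeros(2), np.ones(2))]               # one root block, unit square
sphere_center, sphere_radius = np.array([0.3, 0.3]), 0.1
for _ in range(3):                                 # three refinement passes
    domain = refine_blocks(domain, sphere_center, sphere_radius)
print(len(domain), "blocks after refinement")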
Proceedings - IEEE International Conference on Cluster Computing, ICCC
High-performance computing systems are shifting away from traditional interconnect topologies to exploit new technologies and to reduce interconnect power consumption. The Dragonfly topology is one promising candidate for new systems, with several variations already in production. It is hierarchical, with local links forming groups and global links joining the groups. At each level, the interconnect is a clique, with a link between each pair of switches in a group and a link between each pair of groups. This paper shows that the intergroup links can be arranged in meaningfully different ways. We evaluate three previously proposed approaches for link organization (called global link arrangements) in two ways. First, we use bisection bandwidth, an important and commonly used measure of the potential for communication bottlenecks. We show that the global link arrangements often give bisection bandwidths differing by tens of percent, with the specific separation varying based on the relative bandwidths of local and global links. For the link bandwidths used in a current Dragonfly implementation, the difference is 33%. Second, we show that the choice of global link arrangement can greatly impact the regularity of task mappings for nearest-neighbor stencil communication patterns, an important pattern in scientific applications.
ACM International Conference Proceeding Series
In recent work we quantified the anticipated performance boost when a sorting algorithm is modified to leverage user-addressable "near-memory," which we call scratchpad. This architectural feature is expected in the Intel Knights Landing processors that will be used in DOE's next large-scale supercomputer. This paper expands our analytical study of the scratchpad to consider k-means clustering, a classical data-analysis technique that is ubiquitous in the literature and in practice. We present new theoretical results using the model introduced in [13], which measures memory transfers and assumes that computations are memory-bound. Our theoretical results indicate that scratchpad-aware versions of k-means clustering can expect performance boosts for high-dimensional instances with relatively few cluster centers. These constraints may limit the practical impact of scratchpad for k-means acceleration, so we discuss their origins and practical implications. We corroborate our theory with experimental runs on a system instrumented to mimic one with scratchpad memory. We also contribute a semi-formalization of the computational properties that are necessary and sufficient to predict a performance boost from scratchpad-aware variants of algorithms. We have observed and studied these properties in the context of sorting, and now clustering. We conclude with some thoughts on the application of these properties to new areas. Specifically, we believe that dense linear algebra has similar properties to k-means, while sparse linear algebra and FFT computations are more similar to sorting. The sparse operations are more common in scientific computing, so we expect scratchpad to have significant impact in that area.
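The memory-traffic intuition can be seen in a plain Lloyd's iteration, sketched below: every iteration streams all n x d point coordinates from main memory, while the k x d center coordinates are reused for every point, so pinning the centers in a scratchpad pays off when d is large and k is small. This is an illustration of the regime described above, not the formal transfer-counting model of [13].

import numpy as np

def kmeans_step(points, centers):
    # points:  n x d -- streamed once per iteration from main memory
    # centers: k x d -- reused for every point; small enough to be pinned in
    #                   a user-addressable scratchpad when k*d is modest
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    assign = d2.argmin(axis=1)
    new_centers = np.vstack([
        points[assign == j].mean(axis=0) if np.any(assign == j) else centers[j]
        for j in range(centers.shape[0])
    ])
    return new_centers, assign

rng = np.random.default_rng(0)
pts = rng.normal(size=(10000, 32))                    # n = 10,000 points, d = 32
ctr = pts[rng.choice(len(pts), 8, replace=False)]     # k = 8 centers
for _ in range(5):
    ctr, assign = kmeans_step(pts, ctr)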
Statistical Analysis and Data Mining
This study aimed to organize a body of trajectories in order to identify, search for and classify both common and uncommon behaviors among objects such as aircraft and ships. Existing comparison functions such as the Fréchet distance are computationally expensive and yield counterintuitive results in some cases. We propose an approach using feature vectors whose components represent succinctly the salient information in trajectories. These features incorporate basic information such as the total distance traveled and the distance between start/stop points as well as geometric features related to the properties of the convex hull, trajectory curvature and general distance geometry. Additionally, these features can generally be mapped easily to behaviors of interest to humans who are searching large databases. Most of these geometric features are invariant under rigid transformation. Furthermore, we demonstrate the use of different subsets of these features to identify trajectories similar to an exemplar, cluster a database of several hundred thousand trajectories and identify outliers.
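A few of the named features can be computed directly from an (x, y) trajectory, as sketched below; the feature names and normalizations are illustrative rather than the article's exact definitions. Note that hull area, hull perimeter, total distance, and the start/stop separation are all invariant under rigid transformation.

import numpy as np
from scipy.spatial import ConvexHull

def trajectory_features(xy):
    xy = np.asarray(xy, dtype=float)          # shape (T, 2)
    seg = np.diff(xy, axis=0)
    total_dist = np.linalg.norm(seg, axis=1).sum()
    end_to_end = np.linalg.norm(xy[-1] - xy[0])
    hull = ConvexHull(xy)
    return {
        'total_distance': total_dist,
        'end_to_end_distance': end_to_end,
        'straightness': end_to_end / total_dist if total_dist > 0 else 0.0,
        'hull_area': hull.volume,             # in 2-D, .volume is the area
        'hull_perimeter': hull.area,          # and .area is the perimeter
    }

t = np.linspace(0, 2 * np.pi, 200)
loop = np.c_[np.cos(t), np.sin(t)]            # roughly circular track
print(trajectory_features(loop))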
Statistical Analysis and Data Mining
This study aimed to organize a body of trajectories in order to identify, search for and classify both common and uncommon behaviors among objects such as aircraft and ships. Existing comparison functions such as the Fréchet distance are computationally expensive and yield counterintuitive results in some cases. We propose an approach using feature vectors whose components represent succinctly the salient information in trajectories. These features incorporate basic information such as the total distance traveled and the distance between start/stop points as well as geometric features related to the properties of the convex hull, trajectory curvature and general distance geometry. Additionally, these features can generally be mapped easily to behaviors of interest to humans who are searching large databases. Most of these geometric features are invariant under rigid transformation. We demonstrate the use of different subsets of these features to identify trajectories similar to an exemplar, cluster a database of several hundred thousand trajectories and identify outliers.
Statistical Analysis and Data Mining
Geospatial semantic graphs provide a robust foundation for representing and analyzing remote sensor data. In particular, they support a variety of pattern search operations that capture the spatial and temporal relationships among the objects and events in the data. However, in the presence of large data corpora, even a carefully constructed search query may return a large number of unintended matches. This work considers the problem of calculating a quality score for each match to the query, given that the underlying data are uncertain. We present a preliminary evaluation of three methods for determining both match quality scores and associated uncertainty bounds, illustrated in the context of an example based on overhead imagery data.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
The purpose of this report is to document a multi-year plan for enhancing turbulence modeling in Hydra-TH for the Consortium for Advanced Simulation of Light Water Reactors (CASL) program. Hydra-TH is being developed to meet the high-fidelity, high-Reynolds-number, CFD-based thermal hydraulic simulation needs of the program. This work is being conducted within the thermal hydraulics methods (THM) focus area. This report is an extension of CASL THM milestone L3:THM.CFD.P10.02 [33] (March 2015) and picks up where it left off. It will also serve to meet the requirements of CASL THM level-three milestone L3:THM.CFD.P11.04, scheduled for completion September 30, 2015. The objectives of this plan will be met by maturation of recently added turbulence models, strategic design and development of new models, and systematic and rigorous testing of existing and new models and model extensions. While multi-phase turbulent flow simulations are important to the program, only single-phase modeling will be considered in this report. Large Eddy Simulation (LES) is also an important modeling methodology; however, at least in the first year, the focus is on steady-state Reynolds-Averaged Navier-Stokes (RANS) turbulence modeling.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Future exascale systems are under increased pressure to find power savings. The network, while it consumes a considerable amount of power, is often left out of the picture when discussing total system power. Even when network power is considered, the references are frequently a decade or more old and rely on models that lack validation on modern interconnects. In this work we explore how the dynamic mechanisms of an InfiniBand network save power and at what granularity these features can be engaged. We explore this within the context of the host channel adapter (HCA) on the node and for the fabric, i.e., switches, using three different mechanisms: dynamic link width, dynamic link frequency, and disabling of links, on QLogic and Mellanox systems. Our results show that while there is some potential for modest power savings, real-world systems need improved responsiveness to adjustments in order to fully leverage these savings.
Abstract not provided.
Abstract not provided.
One of the most important concerns in parallel computing is the proper distribution of workload across processors. For most scientific applications on massively parallel machines, the best approach to this distribution is to employ data parallelism; that is, to break the data structures supporting a computation into pieces and then to assign those pieces to different processors. Collectively, these partitioning and assignment tasks comprise the domain mapping problem.
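The simplest instance of such a mapping is a one-dimensional block partition, sketched below: cells are split into contiguous, nearly equal pieces, one per processor. Production partitioners additionally weight the work per cell and minimize inter-processor communication, which this sketch ignores.

def block_partition(n_cells, n_procs):
    """Return (start, end) index ranges, one per processor."""
    base, extra = divmod(n_cells, n_procs)
    ranges, start = [], 0
    for p in range(n_procs):
        size = base + (1 if p < extra else 0)   # spread the remainder evenly
        ranges.append((start, start + size))
        start += size
    return ranges

print(block_partition(10, 4))   # -> [(0, 3), (3, 6), (6, 8), (8, 10)]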
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.