Publications

Encoding and Analyzing Aerial Imagery Using Geospatial Semantic Graphs

Rintoul, Mark D.; Watson, Jean-Paul; Mclendon, William; Parekh, Ojas D.; Martin, Shawn

While collection capabilities have yielded an ever-increasing volume of aerial imagery, analytic techniques for identifying patterns in and extracting relevant information from this data have seriously lagged. The vast majority of imagery is never examined, due to a combination of the limited bandwidth of human analysts and limitations of existing analysis tools. In this report, we describe an alternative, novel approach to both encoding and analyzing aerial imagery, using the concept of a geospatial semantic graph. The advantages of our approach are twofold. First, intuitive templates can be easily specified in terms of the domain language in which an analyst converses. These templates can be used to automatically and efficiently search large graph databases for specific patterns of interest. Second, unsupervised machine learning techniques can be applied to automatically identify patterns in the graph databases, exposing recurring motifs in imagery. We illustrate our approach using real-world data for Anne Arundel County, Maryland, and compare the performance of our approach to that of an expert human analyst.
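
The template idea lends itself to a small illustration. Below is a minimal, hypothetical sketch using the networkx library (not the authors' software); the land-use labels and the example template are invented for illustration.

    import networkx as nx
    from networkx.algorithms import isomorphism

    # Toy "scene" graph: nodes are image regions labeled with a land-use
    # class, edges are spatial adjacency. Labels are invented.
    scene = nx.Graph()
    scene.add_nodes_from([(1, {"kind": "building"}), (2, {"kind": "parking"}),
                          (3, {"kind": "road"}), (4, {"kind": "field"})])
    scene.add_edges_from([(1, 2), (2, 3), (3, 4)])

    # Analyst template in domain terms: a building adjacent to a parking
    # lot that touches a road.
    template = nx.Graph()
    template.add_nodes_from([("b", {"kind": "building"}),
                             ("p", {"kind": "parking"}),
                             ("r", {"kind": "road"})])
    template.add_edges_from([("b", "p"), ("p", "r")])

    matcher = isomorphism.GraphMatcher(
        scene, template,
        node_match=isomorphism.categorical_node_match("kind", None))
    for mapping in matcher.subgraph_isomorphisms_iter():
        print("match:", mapping)  # scene regions -> template roles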

mPPM, viewed as a co-design effort

Proceedings of Co-HPC 2014: 1st International Workshop on Hardware-Software Co-Design for High Performance Computing - Held in Conjunction with SC 2014: The International Conference for High Performance Computing, Networking, Storage and Analysis

Woodward, Paul R.; Jayaraj, Jagan; Barrett, Richard F.

The Piecewise Parabolic Method (PPM) was designed as a means of exploring compressible gas dynamics problems of interest in astrophysics, including supersonic jets, compressible turbulence, stellar convection, and turbulent mixing and burning of gases in stellar interiors. Over time, the capabilities encapsulated in PPM have co-evolved with the availability of a series of high performance computing platforms. Implementation of the algorithm has adapted to and advanced with the architectural capabilities and characteristics of these machines. This adaptability of our PPM codes has enabled targeted astrophysical applications of PPM to exploit these scarce resources to explore complex physical phenomena. Here we describe the means by which this was accomplished, and set a path forward, with a new miniapp, mPPM, for continuing this process in a diverse and dynamic architecture design environment. Adaptations in mPPM for the latest high performance machines are discussed that address the important issue of limited bandwidth from locally attached main memory to the microprocessor chip.
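
For readers unfamiliar with PPM, the reconstruction step at its core can be sketched in a few lines. The following toy NumPy rendering uses the unlimited fourth-order interface interpolation from Colella and Woodward (1984), omitting the monotonicity limiters and the Riemann solve; it is not code from the mPPM miniapp.

    import numpy as np

    n = 64
    x = 2.0 * np.pi * np.arange(n) / n       # periodic grid of cell centers
    a = np.sin(x)                            # stand-in for cell averages

    am, ap, app = np.roll(a, 1), np.roll(a, -1), np.roll(a, -2)
    # Unlimited interface value a_{j+1/2} on a uniform periodic grid.
    a_face = (7.0 / 12.0) * (a + ap) - (1.0 / 12.0) * (am + app)

    # Parabola in cell j: a(xi) = aL + xi*(aR - aL + a6*(1 - xi)), xi in [0,1]
    aL, aR = np.roll(a_face, 1), a_face
    a6 = 6.0 * a - 3.0 * (aL + aR)

    exact = np.sin(x + np.pi / n)            # interfaces sit at x + dx/2
    # Error is small; a full code would use true cell averages as input.
    print(np.max(np.abs(a_face - exact)))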

Using a complementary emulation-simulation co-design approach to assess application readiness for Processing-in-Memory systems

Proceedings of Co-HPC 2014: 1st International Workshop on Hardware-Software Co-Design for High Performance Computing - Held in Conjunction with SC 2014: The International Conference for High Performance Computing, Networking, Storage and Analysis

Stelle, George W.; Olivier, Stephen L.; Stark, Dylan T.; Rodrigues, Arun; Hemmert, Karl S.

Disruptive changes to computer architecture are paving the way toward extreme scale computing. The co-design strategy of collaborative research and development among computer architects, system software designers, and application teams can help to ensure that applications not only cope but thrive with these changes. In this paper, we present a novel combined co-design approach of emulation and simulation in the context of investigating future Processing in Memory (PIM) architectures. PIM enables co-location of data and computation to decrease data movement, to provide increases in memory speed and capacity compared to existing technologies and, perhaps most importantly for extreme scale, to improve energy efficiency. Our evaluation of PIM focuses on three mini-applications representing important production applications. The emulation and simulation studies examine the effects of locality-aware versus locality-oblivious data distribution and computation, and they compare PIM to conventional architectures. Both studies contribute in their own way to the overall understanding of the application-architecture interactions, and our results suggest that PIM technology shows great potential for efficient computation without negatively impacting productivity.
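
The locality question at the heart of the study can be caricatured with a toy cost model. The sketch below (invented numbers, not the paper's mini-applications) counts remote accesses when work is scheduled on one fixed "vault" versus on the vault that owns the data.

    # Elements are block-cyclically placed across PIM "vaults"; a pass over
    # the data either runs on a fixed vault (locality-oblivious) or on the
    # owning vault (locality-aware). Remote accesses stand in for the cost
    # of data movement.
    NVAULTS, N = 16, 100_000
    home = [i % NVAULTS for i in range(N)]

    def remote_accesses(schedule):
        return sum(1 for i in range(N) if schedule(i) != home[i])

    print("oblivious:", remote_accesses(lambda i: 0))        # ~N * 15/16
    print("aware:    ", remote_accesses(lambda i: home[i]))  # 0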

Domain Decomposition Preconditioners for Communication-Avoiding Krylov Methods on a Hybrid CPU/GPU Cluster

International Conference for High Performance Computing, Networking, Storage and Analysis, SC

Yamazaki, Ichitaro; Rajamanickam, Sivasankaran; Boman, Erik G.; Hoemmen, Mark F.; Heroux, Michael A.; Tomov, Stanimire

Krylov subspace projection methods are widely used iterative methods for solving large-scale linear systems of equations. Researchers have demonstrated that communication avoiding (CA) techniques can improve Krylov methods' performance on modern computers, where communication is becoming increasingly expensive compared to arithmetic operations. In this paper, we extend these studies by two major contributions. First, we present our implementation of a CA variant of the Generalized Minimum Residual (GMRES) method, called CA-GMRES, for solving nonsymmetric linear systems of equations on a hybrid CPU/GPU cluster. Our performance results on up to 120 GPUs show that CA-GMRES gives a speedup of up to 2.5x in total solution time over standard GMRES on a hybrid cluster with twelve Intel Xeon CPUs and three Nvidia Fermi GPUs on each node. We then outline a domain decomposition framework to introduce a family of preconditioners that are suitable for CA Krylov methods. Our preconditioners do not incur any additional communication and allow the easy reuse of existing algorithms and software for the subdomain solves. Experimental results on the hybrid CPU/GPU cluster demonstrate that CA-GMRES with preconditioning achieves a speedup of up to 7.4x over CA-GMRES without preconditioning, and a speedup of up to 1.7x over GMRES with preconditioning in total solution time. These results confirm the potential of our framework for developing a practical and effective preconditioned CA Krylov method.
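
The preconditioning idea can be sketched independently of the CA kernels. Below, standard SciPy GMRES stands in for CA-GMRES, and a nonoverlapping block-Jacobi preconditioner plays the role of the communication-free subdomain solves; the matrix and block count are illustrative.

    import numpy as np
    import scipy.sparse as sp
    import scipy.sparse.linalg as spla

    n, nd = 1024, 8                      # unknowns, subdomains
    A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc")
    b = np.ones(n)

    # Factor each diagonal block once; applying M^{-1} is then a set of
    # independent local solves, with no coupling between subdomains.
    bs = n // nd
    lus = [spla.splu(A[i*bs:(i+1)*bs, i*bs:(i+1)*bs]) for i in range(nd)]

    def apply_Minv(r):
        z = np.empty_like(r)
        for i, lu in enumerate(lus):
            z[i*bs:(i+1)*bs] = lu.solve(r[i*bs:(i+1)*bs])
        return z

    M = spla.LinearOperator((n, n), matvec=apply_Minv)
    x, info = spla.gmres(A, b, M=M)
    print("converged" if info == 0 else f"info={info}")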

Surrogate models for mixed discrete-continuous variables

Studies in Computational Intelligence

Swiler, Laura P.

Large-scale computational models have become common tools for analyzing complex man-made systems. However, when coupled with optimization or uncertainty quantification methods in order to conduct extensive model exploration and analysis, the computational expense quickly becomes intractable. Furthermore, these models may have both continuous and discrete parameters. One common approach to mitigating the computational expense is the use of response surface approximations. While well developed for models with continuous parameters, they are still new and largely untested for models with both continuous and discrete parameters. In this work, we describe and investigate the performance of three types of response surfaces developed for mixed-variable models: Adaptive Component Selection and Shrinkage Operator, Treed Gaussian Process, and Gaussian Process with Special Correlation Functions. We focus our efforts on test problems with a small number of parameters of interest, a characteristic of many physics-based engineering models. We present the results of our studies and offer some insights regarding the performance of each response surface approximation method. © 2014 Springer International Publishing Switzerland.
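
As a concrete (and much simplified) stand-in for the mixed-variable surrogates studied here, the sketch below fits a Gaussian process to a toy model with one discrete and one continuous parameter by one-hot encoding the discrete variable; the chapter's special correlation functions are not shown.

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor

    rng = np.random.default_rng(0)
    cat = rng.integers(0, 3, size=40)          # discrete parameter (3 levels)
    xc = rng.uniform(-1.0, 1.0, size=40)       # continuous parameter
    y = np.sin(3.0 * xc) + 0.5 * cat           # toy response

    # One-hot encode the discrete variable: a simple (if crude) way to give
    # the GP kernel something to correlate over.
    X = np.hstack([np.eye(3)[cat], xc.reshape(-1, 1)])
    gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)

    Xnew = np.hstack([np.eye(3)[[1]], [[0.25]]])
    mu, sd = gp.predict(Xnew, return_std=True)
    print(float(mu[0]), float(sd[0]))          # prediction and uncertainty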

Gaussian process adaptive importance sampling

International Journal for Uncertainty Quantification

Dalbey, Keith; Swiler, Laura P.

The objective is to calculate the probability, PF, that a device will fail when its inputs, x, are randomly distributed with probability density, p(x), e.g., the probability that a device will fracture when subject to varying loads. Here failure is defined as some scalar function, y(x), exceeding a threshold, T. If evaluating y(x) via physical or numerical experiments is sufficiently expensive or PF is sufficiently small, then Monte Carlo (MC) methods to estimate PF will be infeasible due to the large number of function evaluations required for a specified accuracy. Importance sampling (IS), i.e., preferentially sampling from “important” regions in the input space and appropriately down-weighting to obtain an unbiased estimate, is one approach to assess PF more efficiently. The inputs are sampled from an importance density, p′(x). We present an adaptive importance sampling (AIS) approach which endeavors to adaptively improve the estimate of the ideal importance density, p*(x), during the sampling process. Our approach uses a mixture of component probability densities that each approximate p*(x). An iterative process is used to construct the sequence of improving component probability densities. At each iteration, a Gaussian process (GP) surrogate is used to help identify areas in the space where failure is likely to occur. The GPs are not used to directly calculate the failure probability; they are only used to approximate the importance density. Thus, our Gaussian process adaptive importance sampling (GPAIS) algorithm overcomes the limitations of using a potentially inaccurate surrogate model directly in IS calculations. This robust GPAIS algorithm performs surprisingly well on a pathological test function.
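
A heavily simplified, one-dimensional sketch of the GPAIS idea follows; the limit state, densities, and single adaptation step are invented for illustration, and the real algorithm iterates over a mixture of components.

    import numpy as np
    from scipy.stats import norm
    from sklearn.gaussian_process import GaussianProcessRegressor

    rng = np.random.default_rng(1)
    y = lambda x: x                        # toy limit state; failure is y > T
    T = 3.0                                # true PF = P(x > 3), x ~ N(0, 1)

    # Fit a GP surrogate to a small pilot design.
    xp = np.linspace(-4.0, 4.0, 12).reshape(-1, 1)
    gp = GaussianProcessRegressor().fit(xp, y(xp).ravel())

    # Use the surrogate only to locate the failure region...
    grid = np.linspace(-5.0, 5.0, 401)
    pred = gp.predict(grid.reshape(-1, 1))
    over = grid[pred > T]
    center = over[0] if over.size else grid[np.argmax(pred)]

    # ...then sample an importance density there and weight by p(x)/q(x),
    # evaluating the true y(x) only inside the failure indicator.
    xs = rng.normal(center, 1.0, 20_000)
    w = norm.pdf(xs) / norm.pdf(xs, center, 1.0)
    pf = np.mean(w * (y(xs) > T))
    print(pf, norm.sf(3.0))                # IS estimate vs exact tail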

Streaming data analytics via message passing with application to graph algorithms

Journal of Parallel and Distributed Computing

Plimpton, Steven J.; Shead, Tim

The need to process streaming data, which arrives continuously at high volume in real time, arises in a variety of contexts including data produced by experiments, collections of environmental or network sensors, and running simulations. Streaming data can also be formulated as queries or transactions which operate on a large dynamic data store, e.g., a distributed database. We describe a lightweight, portable framework named PHISH which provides a communication model enabling a set of independent processes to compute on a stream of data in a distributed-memory parallel manner. Datums are routed between processes in patterns defined by the application. PHISH provides multiple communication backends, including MPI and sockets/ZMQ. The former means streaming computations can be run on any parallel machine which supports MPI; the latter allows them to run on a heterogeneous, geographically dispersed network of machines. We illustrate how streaming MapReduce operations can be implemented using the PHISH communication model, and describe streaming versions of three algorithms for large, sparse graph analytics: triangle enumeration, subgraph isomorphism matching, and connected component finding. We also provide benchmark timings comparing MPI and socket performance for several kernel operations useful in streaming algorithms. © 2014 Elsevier Inc. All rights reserved.
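
The communication model is easy to caricature. The toy below (plain Python, not the PHISH API) shows the hashed routing pattern: each datum is sent to the worker that owns its key, so per-vertex state in a graph algorithm always accumulates in one place.

    from collections import Counter

    NWORKERS = 4
    state = [Counter() for _ in range(NWORKERS)]   # per-worker vertex degrees

    def route(key):
        return hash(key) % NWORKERS                # key-hashed datum routing

    edge_stream = [(1, 2), (2, 3), (1, 3), (3, 4)] # datums arriving over time
    for u, v in edge_stream:
        for vtx in (u, v):
            state[route(vtx)][vtx] += 1            # send datum to its owner

    for i, c in enumerate(state):
        print(f"worker {i}: {dict(c)}")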

Origin and effect of nonlocality in a composite

Journal of Mechanics of Materials and Structures

Silling, Stewart

A simple demonstration of nonlocality in a heterogeneous material is presented. By analysis of the microscale deformation of a two-component layered medium, it is shown that nonlocal interactions necessarily appear in a homogenized model of the system. Explicit expressions for the nonlocal forces are determined. The way these nonlocal forces appear in various nonlocal elasticity theories is derived. The length scales that emerge involve the constituent material properties as well as their geometrical dimensions. A peridynamic material model for the smoothed displacement field is derived. It is demonstrated by comparison with experimental data that the incorporation of nonlocality in modeling improves the prediction of the stress concentration in an open-hole tension test on a composite plate. © 2014 Mathematical Sciences Publishers.
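
A one-dimensional toy makes the nonlocal force structure concrete: each point interacts with every neighbor within a horizon rather than only with adjacent points. The constants and micromodulus below are illustrative, not taken from the paper.

    import numpy as np

    n, dx, delta = 200, 0.01, 0.03          # grid size, spacing, horizon
    x = np.arange(n) * dx
    u = 0.001 * np.sin(2 * np.pi * x)       # smooth displacement field
    c = 1.0                                 # toy micromodulus

    m = int(round(delta / dx))              # neighbors per side in horizon
    f = np.zeros(n)
    for k in range(1, m + 1):               # sum bond forces over the horizon
        f[k:]  += c * (u[:-k] - u[k:]) * dx # pull from left neighbor at lag k
        f[:-k] += c * (u[k:] - u[:-k]) * dx # pull from right neighbor at lag k
    print(f[:5])                            # net nonlocal force (boundary raw)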

Asking the right questions: Benchmarking fault-tolerant extreme-scale systems

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Widener, Patrick; Ferreira, Kurt; Levy, Scott; Bridges, Patrick G.; Arnold, Dorian; Brightwell, Ronald B.

Much recent research has explored fault-tolerance mechanisms intended for current and future extreme-scale systems. Evaluations of the suitability of checkpoint-based solutions have typically been carried out using relatively uncomplicated computational kernels designed to measure floating point performance. More recent investigations have added scaled-down "proxy" applications to more closely match the composition and behavior of deployed ones. However, the information obtained from these studies (whether floating point performance or application runtime) is not necessarily of the most value in evaluating resilience strategies. We observe that even when using a more sophisticated metric, the information available from evaluating uncoordinated checkpointing using both microbenchmarks and proxy applications does not agree. This implies that not only might researchers be asking the wrong questions, but that the answers to the right ones might be unexpected and potentially misleading. We seek to open a discussion on whether benchmarks designed to provide predictable performance evaluations of HPC hardware and toolchains are providing the right feedback for the evaluation of fault-tolerance in these applications, and more generally on how benchmarking of resilience mechanisms ought to be approached in the exascale design space. © 2014 Springer-Verlag Berlin Heidelberg.
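
For context on why the metric matters, a standard back-of-the-envelope checkpointing model (the Young/Daly approximation, not taken from this paper) relates checkpoint cost and failure rate to the optimal interval and the wasted fraction of machine time:

    import math

    delta = 300.0                 # checkpoint cost (s), illustrative
    mtbf = 24 * 3600.0            # system MTBF (s), illustrative
    tau = math.sqrt(2.0 * delta * mtbf)            # Young's optimal interval
    overhead = delta / tau + tau / (2.0 * mtbf)    # first-order waste fraction
    print(f"interval ~ {tau/3600:.2f} h, waste ~ {100*overhead:.1f} %")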

Controlling self-force for unstructured particle-in-cell (PIC) codes

IEEE Transactions on Plasma Science

Bettencourt, Matthew T.

A new algorithm was developed which reduces the self-force in particle-in-cell codes on unstructured meshes in a predictable and controllable way. This is accomplished by computing a charge density weighting function for a particle which reproduces the Green's function solution to Poisson's equation at nodes when using a standard finite element methodology. This provides a superior local potential and allows particle-particle particle-mesh techniques to be used to subtract off local force contributions, including fictitious self-forces, resulting in accurate long-range forces on a particle and improved local Coulomb collisions. Local physical forces are then computed using the Green's function on local particle pairs and added to the long-range forces. Results show up to a five-order-of-magnitude reduction in self-force and superior intraparticle forces for two test cases. © 2013 IEEE.
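
The particle-particle particle-mesh correction the paper builds on follows a generic pattern, sketched below with stand-in force kernels (toy 1D functions, unit charges; not the author's weighting scheme): for nearby pairs, subtract the mesh's local contribution and add the exact Green's function force.

    import numpy as np

    def mesh_pair_force(xi, xj):
        # stand-in for the force the mesh solve attributes to the pair;
        # smooth everywhere, hence inaccurate at short range
        return np.tanh(xi - xj)

    def greens_pair_force(xi, xj):
        r = xi - xj
        return np.sign(r) / max(r * r, 1e-12)  # toy "exact" 1/r^2 kernel

    cutoff, xs = 0.5, [0.0, 0.1, 2.0]
    forces = []
    for i, xi in enumerate(xs):
        f = sum(mesh_pair_force(xi, xj)
                for j, xj in enumerate(xs) if j != i)   # mesh (long-range)
        for j, xj in enumerate(xs):
            if j != i and abs(xi - xj) < cutoff:
                # replace the mesh's short-range guess with the exact force
                f += greens_pair_force(xi, xj) - mesh_pair_force(xi, xj)
        forces.append(f)
    print(forces)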

BFS and coloring-based parallel algorithms for strongly connected components and related problems

Proceedings of the International Parallel and Distributed Processing Symposium, IPDPS

Slota, George M.; Rajamanickam, Sivasankaran; Madduri, Kamesh

Finding the strongly connected components (SCCs) of a directed graph is a fundamental graph-theoretic problem. Tarjan's algorithm is an efficient serial algorithm to find SCCs, but relies on the hard-to-parallelize depth-first search (DFS). We observe that implementations of several parallel SCC detection algorithms show poor parallel performance on modern multicore platforms and large-scale networks. This paper introduces the Multistep method, a new approach that avoids work inefficiencies seen in prior SCC approaches. It does not rely on DFS, but instead uses a combination of breadth-first search (BFS) and a parallel graph coloring routine. We show that the Multistep method scales well on several real-world graphs, with performance fairly independent of topological properties such as the size of the largest SCC and the total number of SCCs. On a 16-core Intel Xeon platform, our algorithm achieves a 20X speedup over the serial approach on a 2-billion-edge graph, fully decomposing it in under two seconds. For our collection of test networks, we observe that the Multistep method is 1.92X faster (mean speedup) than the state-of-the-art Hong et al. SCC method. In addition, we modify the Multistep method to find connected and weakly connected components, as well as introduce a novel algorithm for determining articulation vertices of biconnected components. These approaches all utilize the same underlying BFS and coloring routines. © 2014 IEEE.
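
The BFS core that replaces DFS is compact. Below is a minimal sketch of the forward-backward step that Multistep shares with FW-BW-style algorithms (the trimming and coloring phases are omitted): the SCC containing a pivot is the intersection of its forward- and backward-reachable sets.

    from collections import deque

    def bfs(adj, src):
        seen, q = {src}, deque([src])
        while q:
            u = q.popleft()
            for v in adj.get(u, ()):
                if v not in seen:
                    seen.add(v)
                    q.append(v)
        return seen

    fwd = {0: [1], 1: [2], 2: [0, 3], 3: [4], 4: []}   # toy directed graph
    rev = {u: [] for u in fwd}                          # reversed edges
    for u, vs in fwd.items():
        for v in vs:
            rev[v].append(u)

    pivot = 0
    scc = bfs(fwd, pivot) & bfs(rev, pivot)             # forward ∩ backward
    print(scc)                                          # {0, 1, 2}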
