Publications

Results 6201–6300 of 9,998


Task mapping stencil computations for non-contiguous allocations

Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP

Leung, Vitus J.; Bunde, David P.; Ebbers, Johnathan; Feer, Stefan P.; Price, Nickolas W.; Rhodes, Zachary D.; Swank, Matthew

We examine task mapping algorithms for systems that allocate jobs non-contiguously. Several studies have shown that task placement affects job running time. We focus on jobs with a stencil communication pattern and use experiments on a Cray XE to evaluate novel task mapping algorithms as well as some adapted to this setting. This is done with the miniGhost miniapp, which mimics the behavior of CTH, a shock physics application. Our strategies improve average and single-run times by as much as 28% and 36%, respectively, over a baseline strategy.
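
The paper's mapping algorithms are not reproduced in this listing; as a rough illustration of the general idea, the sketch below (hypothetical, not from the paper) orders both the scattered allocation and the stencil tasks along a Morton (Z-order) curve and pairs them off, so neighboring stencil tasks tend to land on nearby nodes.

```python
"""Toy locality-aware task mapping for a 2-D stencil on a non-contiguous
allocation (hypothetical sketch, not the algorithms evaluated in the paper).
Nodes and tasks are both ordered by a Morton (Z-order) key and paired off."""
import random

def morton_key(x, y, bits=8):
    """Interleave the bits of x and y to form a Z-order index."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i) | ((y >> i) & 1) << (2 * i + 1)
    return key

def map_tasks(tasks, alloc_nodes):
    """Pair Morton-sorted tasks with Morton-sorted nodes: {task: node}."""
    t_sorted = sorted(tasks, key=lambda t: morton_key(*t))
    n_sorted = sorted(alloc_nodes, key=lambda n: morton_key(*n))
    return dict(zip(t_sorted, n_sorted))

def avg_neighbor_distance(mapping):
    """Mean Manhattan distance between nodes hosting adjacent stencil tasks."""
    total = count = 0
    for (tx, ty), (ax, ay) in mapping.items():
        for dx, dy in ((1, 0), (0, 1)):
            nbr = (tx + dx, ty + dy)
            if nbr in mapping:
                bx, by = mapping[nbr]
                total += abs(ax - bx) + abs(ay - by)
                count += 1
    return total / count

random.seed(0)
tasks = [(x, y) for x in range(8) for y in range(8)]                        # 8x8 stencil grid
nodes = random.sample([(x, y) for x in range(16) for y in range(16)], 64)   # scattered allocation
print("Z-order mapping  :", avg_neighbor_distance(map_tasks(tasks, nodes)))
print("arbitrary mapping:", avg_neighbor_distance(dict(zip(tasks, nodes))))
```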


Sensitivity of precipitation to parameter values in the Community Atmosphere Model version 5

Swiler, Laura P.; Wildey, Timothy M.

One objective of the Climate Science for a Sustainable Energy Future (CSSEF) program is to develop the capability to thoroughly test and understand the uncertainties in the overall climate model and its components as they are being developed. The focus on uncertainties involves sensitivity analysis: the capability to determine which input parameters have a major influence on the output responses of interest. This report presents some initial sensitivity analysis results performed by Lawrence Livermore National Laboratory (LLNL), Sandia National Laboratories (SNL), and Pacific Northwest National Laboratory (PNNL). In the 2011-2012 timeframe, these laboratories worked in collaboration to perform sensitivity analyses of a set of CAM5 runs at 2° resolution, where the response metrics of interest were precipitation metrics. The three labs performed their sensitivity analysis (SA) studies separately and then compared results. Overall, the results were quite consistent with each other although the methods used were different. This exercise provided a robustness check of the global sensitivity analysis metrics and identified some strongly influential parameters.
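
As a minimal illustration of the kind of global sensitivity analysis described, the sketch below draws samples of a few inputs, evaluates a stand-in analytic model (not CAM5), and ranks the inputs with a simple correlation-based index. The parameter names and the model are placeholders.

```python
"""Toy global sensitivity analysis: sample the inputs, run a stand-in model,
and rank inputs by a cheap variance-based measure.  Nothing here corresponds
to actual CAM5 parameters or runs."""
import numpy as np

rng = np.random.default_rng(1)
n = 2000
X = rng.uniform(size=(n, 3))          # three hypothetical inputs on [0, 1]

def toy_model(x):
    """Stand-in response: strongly driven by x0, weakly by x1, not by x2."""
    return 4.0 * x[:, 0] + 0.5 * np.sin(2 * np.pi * x[:, 1]) + 0.01 * x[:, 2]

y = toy_model(X)

# Squared Pearson correlation as a regression-style sensitivity index.
for j, name in enumerate(["param_0", "param_1", "param_2"]):
    s = np.corrcoef(X[:, j], y)[0, 1] ** 2
    print(f"{name}: R^2 = {s:.3f}")
```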


Dakota uncertainty quantification methods applied to the NEK-5000 SAHEX model

Weirs, Vincent G.

This report summarizes the results of a NEAMS project focused on the use of uncertainty and sensitivity analysis methods within the NEK-5000 and Dakota software framework for assessing failure probabilities as part of probabilistic risk assessment. NEK-5000 is a software tool under development at Argonne National Laboratory to perform computational fluid dynamics calculations for applications such as thermohydraulics of nuclear reactor cores. Dakota is a software tool developed at Sandia National Laboratories containing optimization, sensitivity analysis, and uncertainty quantification algorithms. The goal of this work is to demonstrate the use of uncertainty quantification methods in Dakota with NEK-5000.
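
A minimal sketch of the generic sampling workflow the report describes, with a stand-in function in place of a NEK-5000 run; the input distributions, response, and failure threshold are invented for illustration.

```python
"""Sketch of forward uncertainty propagation for a failure probability:
sample uncertain inputs, run a forward solver, count threshold exceedances.
`peak_temperature` is a hypothetical stand-in for an expensive CFD run."""
import numpy as np

rng = np.random.default_rng(42)

def peak_temperature(inlet_velocity, heat_flux):
    """Hypothetical surrogate for an expensive thermal-hydraulics calculation."""
    return 600.0 + 80.0 * heat_flux / inlet_velocity + rng.normal(0.0, 2.0)

n_samples = 10_000
velocity = rng.normal(1.0, 0.1, n_samples)     # uncertain inlet velocity
flux = rng.normal(1.0, 0.15, n_samples)        # uncertain heat flux
temps = np.array([peak_temperature(v, q) for v, q in zip(velocity, flux)])

threshold = 700.0                              # hypothetical failure criterion
p_fail = np.mean(temps > threshold)
stderr = np.sqrt(p_fail * (1 - p_fail) / n_samples)
print(f"P(failure) ~ {p_fail:.4f} +/- {stderr:.4f}")
```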


Optimality conditions for the numerical solution of optimization problems with PDE constraints

Aguilo Valentin, Miguel A.; Ridzal, Denis R.

A theoretical framework for the numerical solution of partial differential equation (PDE) constrained optimization problems is presented in this report. This theoretical framework embodies the fundamental infrastructure required to efficiently implement and solve this class of problems. Detailed derivations of the optimality conditions required to accurately solve several parameter identification and optimal control problems are also provided in this report. This will allow the reader to further understand how the theoretical abstraction presented in this report translates to the application.
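
For readers unfamiliar with the abstract setting, the standard first-order optimality (KKT) system for a generic equality-constrained problem of this type is sketched below; the report's detailed derivations specialize conditions of this form to particular parameter identification and optimal control problems.

```latex
% Generic problem: minimize J(u,z) subject to the PDE constraint c(u,z) = 0,
% with state u, control/parameter z, and Lagrange multiplier (adjoint) \lambda.
\begin{align*}
  \mathcal{L}(u,z,\lambda) &= J(u,z) + \langle \lambda,\, c(u,z) \rangle \\
  \text{state equation:}   &\quad c(u,z) = 0 \\
  \text{adjoint equation:} &\quad c_u(u,z)^{*}\,\lambda = -\,J_u(u,z) \\
  \text{gradient condition:} &\quad J_z(u,z) + c_z(u,z)^{*}\,\lambda = 0
\end{align*}
```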


Numerical implementation of time-dependent density functional theory for extended systems in extreme environments

Baczewski, Andrew D.; Shulenburger, Luke N.; Desjarlais, Michael P.; Magyar, Rudolph J.

In recent years, DFT-MD has been shown to be a useful computational tool for exploring the properties of WDM. These calculations achieve excellent agreement with shock compression experiments, which probe the thermodynamic parameters of the Hugoniot state. New X-ray Thomson Scattering diagnostics promise to deliver independent measurements of electronic density and temperature, as well as structural information in shocked systems. However, they require the development of new levels of theory for computing the associated observables within a DFT framework. The experimentally observable x-ray scattering cross section is related to the electronic density-density response function, which is obtainable using TDDFT - a formally exact extension of conventional DFT that describes electron dynamics and excited states. In order to develop a capability for modeling XRTS data and, more generally, to establish a predictive capability for first principles simulations of matter in extreme conditions, real-time TDDFT with Ehrenfest dynamics has been implemented in an existing PAW code for DFT-MD calculations. The purpose of this report is to record implementation details and benchmarks as the project advances from software development to delivering novel scientific results. Results range from tests that establish the accuracy, efficiency, and scalability of our implementation, to calculations that are verified against accepted results in the literature. Aside from the primary XRTS goal, we identify other more general areas where this new capability will be useful, including stopping power calculations and electron-ion equilibration.
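
As a toy illustration of the propagation step at the core of real-time TDDFT (assuming nothing about the PAW implementation described in the report), the sketch below advances a state under a small model Hamiltonian with the unitary Crank-Nicolson scheme.

```python
"""Minimal model of real-time propagation: advance psi under a toy Hermitian
"Hamiltonian" with the unitary Crank-Nicolson step
  psi(t+dt) = (I + i dt H/2)^{-1} (I - i dt H/2) psi(t).
This is a model problem, not the PAW implementation described in the report."""
import numpy as np

rng = np.random.default_rng(0)
n = 64
A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
H = (A + A.conj().T) / 2                      # random Hermitian stand-in

psi = rng.normal(size=n) + 1j * rng.normal(size=n)
psi /= np.linalg.norm(psi)

dt, steps = 0.01, 200
I = np.eye(n)
forward = I - 0.5j * dt * H
backward = I + 0.5j * dt * H
for _ in range(steps):
    psi = np.linalg.solve(backward, forward @ psi)

# Crank-Nicolson is unitary, so the norm stays 1 to round-off.
print("norm after propagation:", np.linalg.norm(psi))
```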


Bayesian calibration of the Community Land Model using surrogates

Ray, Jaideep R.; Swiler, Laura P.

We present results from the Bayesian calibration of hydrological parameters of the Community Land Model (CLM), which is often used in climate simulations and Earth system models. A statistical inverse problem is formulated for three hydrological parameters, conditional on observations of latent heat surface fluxes over 48 months. Our calibration method uses polynomial and Gaussian process surrogates of the CLM, and solves the parameter estimation problem using a Markov chain Monte Carlo sampler. Posterior probability densities for the parameters are developed for two sites with different soil and vegetation covers. Our method also allows us to examine the structural error in CLM under two error models. We find that surrogate models can be created for CLM in most cases. The posterior distributions are more predictive than the default parameter values in CLM. Climatologically averaging the observations does not modify the parameters' distributions significantly. The structural error model reveals a correlation time-scale which can be used to identify the physical process that could be contributing to it. While the calibrated CLM has a higher predictive skill, the calibration is under-dispersive.
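
A stripped-down sketch of the calibration workflow, with a cheap analytic function standing in for CLM: fit a polynomial surrogate to a few "runs", then sample the posterior with a random-walk Metropolis sampler. The model, data, and prior below are synthetic placeholders, not the paper's setup.

```python
"""Toy surrogate-based Bayesian calibration: polynomial surrogate of an
"expensive" model plus random-walk Metropolis on one parameter."""
import numpy as np

rng = np.random.default_rng(3)

def expensive_model(theta):
    """Stand-in for an expensive model run with one parameter theta."""
    return np.sin(2.0 * theta) + 0.3 * theta

# Quadratic polynomial surrogate fit to a handful of "runs".
train_t = np.linspace(0.0, 2.0, 7)
coeffs = np.polyfit(train_t, expensive_model(train_t), deg=2)
surrogate = lambda t: np.polyval(coeffs, t)

# Synthetic observation with known noise level.
theta_true, sigma = 1.2, 0.05
obs = expensive_model(theta_true) + rng.normal(0.0, sigma)

def log_post(theta):
    if not 0.0 <= theta <= 2.0:            # uniform prior on [0, 2]
        return -np.inf
    return -0.5 * ((obs - surrogate(theta)) / sigma) ** 2

# Random-walk Metropolis against the surrogate posterior.
chain, theta, lp = [], 1.0, log_post(1.0)
for _ in range(20000):
    prop = theta + rng.normal(0.0, 0.1)
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:
        theta, lp = prop, lp_prop
    chain.append(theta)
post = np.array(chain[5000:])
print(f"posterior mean {post.mean():.3f}, sd {post.std():.3f} (true {theta_true})")
```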


Statistically significant relational data mining

Berry, Jonathan W.; Leung, Vitus J.; Phillips, Cynthia A.; Pinar, Ali P.; Robinson, David G.

This report summarizes the work performed under the project "Statistically significant relational data mining." The goal of the project was to add more statistical rigor to the fairly ad hoc area of data mining on graphs. Our goal was to develop better algorithms and better ways to evaluate algorithm quality. We concentrated on algorithms for community detection, approximate pattern matching, and graph similarity measures. Approximate pattern matching involves finding an instance of a relatively small pattern, expressed with tolerance, in a large graph of data observed with uncertainty. This report gathers the abstracts and references for the eight refereed publications that have appeared as part of this work. We then archive three pieces of research that have not yet been published. The first is theoretical and experimental evidence that a popular statistical measure for comparison of community assignments favors over-resolved communities over approximations to a ground truth. The second is a set of statistically motivated methods for measuring the quality of an approximate match of a small pattern in a large graph. The third is a new probabilistic random graph model. Statisticians favor these models for graph analysis. The new local structure graph model overcomes some of the issues with popular models such as exponential random graph models and latent variable models.


Encoding and analyzing aerial imagery using geospatial semantic graphs

Rintoul, Mark D.; Watson, Jean-Paul W.; McLendon, William C.; Parekh, Ojas D.; Martin, Shawn

While collection capabilities have yielded an ever-increasing volume of aerial imagery, analytic techniques for identifying patterns in and extracting relevant information from this data have seriously lagged. The vast majority of imagery is never examined, due to a combination of the limited bandwidth of human analysts and limitations of existing analysis tools. In this report, we describe an alternative, novel approach to both encoding and analyzing aerial imagery, using the concept of a geospatial semantic graph. The advantages of our approach are twofold. First, intuitive templates can be easily specified in terms of the domain language in which an analyst converses. These templates can be used to automatically and efficiently search large graph databases for specific patterns of interest. Second, unsupervised machine learning techniques can be applied to automatically identify patterns in the graph databases, exposing recurring motifs in imagery. We illustrate our approach using real-world data for Anne Arundel County, Maryland, and compare the performance of our approach to that of an expert human analyst.
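
A small, hypothetical illustration of template search over a labeled ("semantic") graph using off-the-shelf subgraph isomorphism from networkx; the region labels and the template are invented and do not reflect the report's schema or the Anne Arundel County data.

```python
"""Template search on a toy semantic graph: nodes are image regions labeled
with a land-use class, edges mean "adjacent to", and the analyst's template
is matched by subgraph isomorphism with node-attribute matching."""
import networkx as nx
from networkx.algorithms import isomorphism

scene = nx.Graph()
regions = {1: "building", 2: "parking_lot", 3: "road",
           4: "building", 5: "vegetation", 6: "parking_lot"}
for node, label in regions.items():
    scene.add_node(node, label=label)
scene.add_edges_from([(1, 2), (2, 3), (3, 4), (4, 6), (6, 3), (4, 5)])

# Template in the analyst's vocabulary: a building adjacent to a parking lot
# that touches a road.
template = nx.Graph()
template.add_node("b", label="building")
template.add_node("p", label="parking_lot")
template.add_node("r", label="road")
template.add_edges_from([("b", "p"), ("p", "r")])

matcher = isomorphism.GraphMatcher(
    scene, template,
    node_match=isomorphism.categorical_node_match("label", None))
for mapping in matcher.subgraph_isomorphisms_iter():
    print("match:", mapping)   # scene-node -> template-node assignments
```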


mPPM, viewed as a co-design effort

Proceedings of Co-HPC 2014: 1st International Workshop on Hardware-Software Co-Design for High Performance Computing - Held in Conjunction with SC 2014: The International Conference for High Performance Computing, Networking, Storage and Analysis

Woodward, Paul R.; Jayaraj, Jagan J.; Barrett, Richard F.

The Piecewise Parabolic Method (PPM) was designed as a means of exploring compressible gas dynamics problems of interest in astrophysics, including supersonic jets, compressible turbulence, stellar convection, and turbulent mixing and burning of gases in stellar interiors. Over time, the capabilities encapsulated in PPM have co-evolved with the availability of a series of high performance computing platforms. Implementation of the algorithm has adapted to and advanced with the architectural capabilities and characteristics of these machines. This adaptability of our PPM codes has enabled targeted astrophysical applications of PPM to exploit these scarce resources to explore complex physical phenomena. Here we describe the means by which this was accomplished, and set a path forward, with a new miniapp, mPPM, for continuing this process in a diverse and dynamic architecture design environment. Adaptations in mPPM for the latest high performance machines are discussed that address the important issue of limited bandwidth from locally attached main memory to the microprocessor chip.


Using a complementary emulation-simulation co-design approach to assess application readiness for Processing-in-Memory systems

Proceedings of Co-HPC 2014: 1st International Workshop on Hardware-Software Co-Design for High Performance Computing - Held in Conjunction with SC 2014: The International Conference for High Performance Computing, Networking, Storage and Analysis

Stelle, George; Olivier, Stephen L.; Stark, Dylan S.; Rodrigues, Arun; Hemmert, Karl S.

Disruptive changes to computer architecture are paving the way toward extreme scale computing. The co-design strategy of collaborative research and development among computer architects, system software designers, and application teams can help to ensure that applications not only cope but thrive with these changes. In this paper, we present a novel combined co-design approach of emulation and simulation in the context of investigating future Processing in Memory (PIM) architectures. PIM enables co-location of data and computation to decrease data movement, to provide increases in memory speed and capacity compared to existing technologies and, perhaps most importantly for extreme scale, to improve energy efficiency. Our evaluation of PIM focuses on three mini-applications representing important production applications. The emulation and simulation studies examine the effects of locality-aware versus locality-oblivious data distribution and computation, and they compare PIM to conventional architectures. Both studies contribute in their own way to the overall understanding of the application-architecture interactions, and our results suggest that PIM technology shows great potential for efficient computation without negatively impacting productivity.


Domain Decomposition Preconditioners for Communication-Avoiding Krylov Methods on a Hybrid CPU/GPU Cluster

International Conference for High Performance Computing, Networking, Storage and Analysis, SC

Yamazaki, Ichitaro; Rajamanickam, Sivasankaran R.; Boman, Erik G.; Hoemmen, Mark F.; Heroux, Michael A.; Tomov, Stanimire

Krylov subspace projection methods are widely used iterative methods for solving large-scale linear systems of equations. Researchers have demonstrated that communication avoiding (CA) techniques can improve Krylov methods' performance on modern computers, where communication is becoming increasingly expensive compared to arithmetic operations. In this paper, we extend these studies by two major contributions. First, we present our implementation of a CA variant of the Generalized Minimum Residual (GMRES) method, called CA-GMRES, for solving nonsymmetric linear systems of equations on a hybrid CPU/GPU cluster. Our performance results on up to 120 GPUs show that CA-GMRES gives a speedup of up to 2.5x in total solution time over standard GMRES on a hybrid cluster with twelve Intel Xeon CPUs and three Nvidia Fermi GPUs on each node. We then outline a domain decomposition framework to introduce a family of preconditioners that are suitable for CA Krylov methods. Our preconditioners do not incur any additional communication and allow the easy reuse of existing algorithms and software for the subdomain solves. Experimental results on the hybrid CPU/GPU cluster demonstrate that CA-GMRES with preconditioning achieves a speedup of up to 7.4x over CA-GMRES without preconditioning, and a speedup of up to 1.7x over GMRES with preconditioning in total solution time. These results confirm the potential of our framework to develop a practical and effective preconditioned CA Krylov method.
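
The paper's preconditioners are not reproduced here; the sketch below shows the simplest member of the domain-decomposition family they build on, non-overlapping block Jacobi, where each subdomain block is factored once and applied locally with no extra communication. It uses SciPy's GMRES on a toy 1-D problem, not the paper's CA-GMRES/GPU implementation.

```python
"""Non-overlapping block Jacobi preconditioner on a toy 1-D Laplacian.
Each diagonal block (a "subdomain") is factored once; applying M^{-1}
only needs the local factorizations."""
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n, nblocks = 200, 8
bs = n // nblocks
A = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)

# Factor each diagonal block once (the local "subdomain solves").
factors = [spla.splu(sp.csc_matrix(A[i*bs:(i+1)*bs, i*bs:(i+1)*bs]))
           for i in range(nblocks)]

def apply_block_jacobi(r):
    """z = M^{-1} r, applied block by block with the cached factorizations."""
    z = np.empty_like(r)
    for i, lu in enumerate(factors):
        z[i*bs:(i+1)*bs] = lu.solve(r[i*bs:(i+1)*bs])
    return z

M = spla.LinearOperator((n, n), matvec=apply_block_jacobi)

iters = {"no preconditioner": 0, "block Jacobi": 0}
def counter(key):
    def cb(_):
        iters[key] += 1
    return cb

spla.gmres(A, b, restart=n, callback=counter("no preconditioner"))
spla.gmres(A, b, restart=n, M=M, callback=counter("block Jacobi"))
print(iters)   # preconditioned run should need far fewer iterations
```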


Cubic-scaling algorithm and self-consistent field for the random-phase approximation with second-order screened exchange

Journal of Chemical Physics

Moussa, Jonathan E.

The random-phase approximation with second-order screened exchange (RPA+SOSEX) is a model of electron correlation energy with two caveats: its accuracy depends on an arbitrary choice of mean field, and it scales as O(n^5) operations and O(n^3) memory for n electrons. We derive a new algorithm that reduces its scaling to O(n^3) operations and O(n^2) memory using controlled approximations and a new self-consistent field that approximates Brueckner coupled-cluster doubles theory with RPA+SOSEX, referred to as Brueckner RPA theory. The algorithm comparably reduces the scaling of second-order Møller-Plesset perturbation theory with smaller cost prefactors than RPA+SOSEX. Within a semiempirical model, we study H2 dissociation to test accuracy and Hn rings to verify scaling. © 2014 AIP Publishing LLC.


Surrogate models for mixed discrete-continuous variables

Studies in Computational Intelligence

Swiler, Laura P.

Large-scale computational models have become common tools for analyzing complex man-made systems. However, when coupled with optimization or uncertainty quantification methods in order to conduct extensive model exploration and analysis, the computational expense quickly becomes intractable. Furthermore, these models may have both continuous and discrete parameters. One common approach to mitigating the computational expense is the use of response surface approximations. While well developed for models with continuous parameters, they are still new and largely untested for models with both continuous and discrete parameters. In this work, we describe and investigate the performance of three types of response surfaces developed for mixed-variable models: Adaptive Component Selection and Shrinkage Operator, Treed Gaussian Process, and Gaussian Process with Special Correlation Functions. We focus our efforts on test problems with a small number of parameters of interest, a characteristic of many physics-based engineering models. We present the results of our studies and offer some insights regarding the performance of each response surface approximation method. © 2014 Springer International Publishing Switzerland.
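
As a small illustration of the "Gaussian process with special correlation functions" idea, the sketch below builds a GP whose kernel multiplies a squared-exponential term in the continuous variable by a categorical term for the discrete variable; the data and kernel parameters are toy values, not the chapter's test problems.

```python
"""Toy mixed-variable Gaussian process: the kernel is a squared-exponential in
the continuous input times a categorical factor that is 1 for equal levels and
rho otherwise."""
import numpy as np

rng = np.random.default_rng(7)

def kernel(X1, X2, length=0.3, rho=0.4):
    """X[:,0] continuous, X[:,1] categorical (integer-coded)."""
    d2 = (X1[:, None, 0] - X2[None, :, 0]) ** 2
    cont = np.exp(-0.5 * d2 / length ** 2)
    same = (X1[:, None, 1] == X2[None, :, 1]).astype(float)
    return cont * (same + rho * (1.0 - same))

def truth(x, c):
    return np.sin(4 * x) + (0.8 if c == 1 else -0.2)

# Training data: one continuous variable plus a 2-level discrete variable.
Xtr = np.column_stack([rng.uniform(0, 1, 20), rng.integers(0, 2, 20)])
ytr = np.array([truth(x, c) for x, c in Xtr]) + rng.normal(0, 0.05, 20)

K = kernel(Xtr, Xtr) + 1e-6 * np.eye(len(Xtr))
alpha = np.linalg.solve(K, ytr)

# Predict at the same continuous point for both discrete levels.
Xte = np.array([[0.5, 0], [0.5, 1]])
pred = kernel(Xte, Xtr) @ alpha
for (x, c), p in zip(Xte, pred):
    print(f"x={x:.2f}, level={int(c)}: predicted {p:.2f}, true {truth(x, c):.2f}")
```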


Streaming data analytics via message passing with application to graph algorithms

Journal of Parallel and Distributed Computing

Plimpton, Steven J.; Shead, Tim

The need to process streaming data, which arrives continuously at high volume in real time, arises in a variety of contexts including data produced by experiments, collections of environmental or network sensors, and running simulations. Streaming data can also be formulated as queries or transactions which operate on a large dynamic data store, e.g. a distributed database. We describe a lightweight, portable framework named PHISH which provides a communication model enabling a set of independent processes to compute on a stream of data in a distributed-memory parallel manner. Datums are routed between processes in patterns defined by the application. PHISH provides multiple communication backends including MPI and sockets/ZMQ. The former means streaming computations can be run on any parallel machine which supports MPI; the latter allows them to run on a heterogeneous, geographically dispersed network of machines. We illustrate how streaming MapReduce operations can be implemented using the PHISH communication model, and describe streaming versions of three algorithms for large, sparse graph analytics: triangle enumeration, sub-graph isomorphism matching, and connected component finding. We also provide benchmark timings comparing MPI and socket performance for several kernel operations useful in streaming algorithms. © 2014 Elsevier Inc. All rights reserved.
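
PHISH itself is not shown here; as a single-process illustration of one of the streaming kernels described (connected component finding), the sketch below processes an edge stream with a union-find structure. In PHISH the datums would be routed among independent processes.

```python
"""Streaming connected components in one process: edges arrive one at a time
and a union-find structure maintains the components incrementally."""

parent = {}

def find(v):
    """Find the component representative, with path halving."""
    parent.setdefault(v, v)
    while parent[v] != v:
        parent[v] = parent[parent[v]]
        v = parent[v]
    return v

def union(u, v):
    ru, rv = find(u), find(v)
    if ru != rv:
        parent[ru] = rv

# A toy "stream" of edge datums.
edge_stream = [(1, 2), (3, 4), (2, 3), (5, 6), (7, 7), (6, 5)]
for u, v in edge_stream:
    union(u, v)

components = {}
for v in parent:
    components.setdefault(find(v), set()).add(v)
print("components:", list(components.values()))
```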


Origin and effect of nonlocality in a composite

Journal of Mechanics of Materials and Structures

Silling, Stewart A.

A simple demonstration of nonlocality in a heterogeneous material is presented. By analysis of the microscale deformation of a two-component layered medium, it is shown that nonlocal interactions necessarily appear in a homogenized model of the system. Explicit expressions for the nonlocal forces are determined. The way these nonlocal forces appear in various nonlocal elasticity theories is derived. The length scales that emerge involve the constituent material properties as well as their geometrical dimensions. A peridynamic material model for the smoothed displacement field is derived. It is demonstrated by comparison with experimental data that the incorporation of nonlocality in modeling improves the prediction of the stress concentration in an open-hole tension test on a composite plate. © 2014 Mathematical Sciences Publishers.
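
For context, the general bond-based peridynamic balance of linear momentum that the derived material model fits into is written below; this is the standard nonlocal form, not the paper's specific derivation for the layered medium.

```latex
% Each point x interacts with points x' within a neighborhood (horizon) H_x
% through a pairwise force density f; b is the body force density.
\rho(\mathbf{x})\,\ddot{\mathbf{u}}(\mathbf{x},t)
  = \int_{\mathcal{H}_{\mathbf{x}}}
      \mathbf{f}\bigl(\mathbf{u}(\mathbf{x}',t)-\mathbf{u}(\mathbf{x},t),\,
                      \mathbf{x}'-\mathbf{x}\bigr)\, dV_{\mathbf{x}'}
    + \mathbf{b}(\mathbf{x},t)
```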


Optimization-based conservative transport on the cubed-sphere grid

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Peterson, Kara J.; Bochev, Pavel B.; Ridzal, Denis R.

Transport algorithms are highly important for dynamical modeling of the atmosphere, where it is critical that scalar tracer species are conserved and satisfy physical bounds. We present an optimization-based algorithm for the conservative transport of scalar quantities (i.e. mass) on the cubed-sphere grid, which preserves local solution bounds without the use of flux limiters. The optimization variables are the net mass updates to the cell, the objective is to minimize the discrepancy between these variables and a suitable high-order cell mass update (the "target"), and the constraints are derived from the local solution bounds and the conservation of the total mass. The resulting robust and efficient algorithm for conservative and local bound-preserving transport on the sphere further demonstrates the flexibility and scope of the recently developed optimization-based modeling approach [1, 2]. © 2014 Springer-Verlag.
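
In a simplified 1-D setting, the per-step optimization problem described has the structure sketched below: stay as close as possible to the high-order target update subject to local bounds and exact conservation of total mass. SciPy's SLSQP is used purely for clarity; it is not the solver or the cubed-sphere discretization of the paper.

```python
"""Simplified 1-D analogue of the optimization-based transport step: minimize
the distance to a high-order "target" subject to local bounds and exact
conservation of the total mass."""
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)
ncell = 12
target = rng.normal(1.0, 0.4, ncell)      # high-order (unlimited) cell masses
total = float(ncell)                      # total mass to conserve exactly
lo = np.full(ncell, 0.6)                  # local lower bounds
hi = np.full(ncell, 1.4)                  # local upper bounds

res = minimize(
    lambda m: 0.5 * np.sum((m - target) ** 2),        # stay close to the target
    x0=np.clip(target, lo, hi),
    method="SLSQP",
    bounds=list(zip(lo, hi)),
    constraints=[{"type": "eq", "fun": lambda m: np.sum(m) - total}],
)
m = res.x
print("total mass:", np.sum(m))
print("bounds respected:", bool(np.all((m >= lo - 1e-9) & (m <= hi + 1e-9))))
```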


Asking the right questions: Benchmarking fault-tolerant extreme-scale systems

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Widener, Patrick W.; Ferreira, Kurt; Levy, Scott; Bridges, Patrick G.; Arnold, Dorian; Brightwell, Ronald B.

Much recent research has explored fault-tolerance mechanisms intended for current and future extreme-scale systems. Evaluations of the suitability of checkpoint-based solutions have typically been carried out using relatively uncomplicated computational kernels designed to measure floating point performance. More recent investigations have added scaled-down "proxy" applications to more closely match the composition and behavior of deployed ones. However, the information obtained from these studies (whether floating point performance or application runtime) is not necessarily of the most value in evaluating resilience strategies. We observe that even when using a more sophisticated metric, the information available from evaluating uncoordinated checkpointing using both microbenchmarks and proxy applications does not agree. This implies that not only might researchers be asking the wrong questions, but that the answers to the right ones might be unexpected and potentially misleading. We seek to open a discussion on whether benchmarks designed to provide predictable performance evaluations of HPC hardware and toolchains are providing the right feedback for the evaluation of fault-tolerance in these applications, and more generally on how benchmarking of resilience mechanisms ought to be approached in the exascale design space. © 2014 Springer-Verlag Berlin Heidelberg.


Investigating the integration of supercomputers and data-warehouse appliances

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Oldfield, Ron A.; Davidson, George; Ulmer, Craig D.; Wilson, Andrew T.

Two decades of experience with massively parallel supercomputing has given insight into the problem domains where these architectures are cost effective. Likewise, experience with database machines and more recently massively parallel database appliances has shown where these architectures are valuable. Combining both architectures to simultaneously solve problems has received much less attention. In this paper, we describe a motivating application for economic modeling that requires both HPC and database capabilities. Then we discuss hardware and software integration issues related to a direct integration of a Cray XT supercomputer and a Netezza database appliance. © 2014 Springer-Verlag Berlin Heidelberg.


Controlling self-force for unstructured particle-in-cell (PIC) codes

IEEE Transactions on Plasma Science

Bettencourt, Matthew T.

A new algorithm was developed, which reduces the self-force in particle-in-cell codes on unstructured meshes in a predictable and controllable way. This is accomplished by computing a charge density weighting function for a particle, which reproduces the Green's function solution to Poisson's equation at nodes when using a standard finite element methodology. This provides a superior local potential and allows particle-particle particle-mesh techniques to be used to subtract off local force contributions, including fictitious self-forces, resulting in accurate long-range forces on a particle and improved local Coulomb collisions. Local physical forces are then computed using the Green's function on local particle pairs and added to the long-range forces. Results show up to five orders of magnitude reduction in self-force and superior intraparticle forces for two test cases. © 2013 IEEE.


Origin and effect of nonlocality in a layered composite

Silling, Stewart A.

A simple demonstration of nonlocality in a heterogeneous material is presented. By analysis of the microscale deformation of a two-component layered medium, it is shown that nonlocal interactions necessarily appear in a homogenized model of the system. Explicit expressions for the nonlocal forces are determined. The way these nonlocal forces appear in various nonlocal elasticity theories is derived. The length scales that emerge involve the constituent material properties as well as their geometrical dimensions. A peridynamic material model for the smoothed displacement field is derived. It is demonstrated by comparison with experimental data that the incorporation of nonlocality in modeling dramatically improves the prediction of the stress concentration in an open-hole tension test on a composite plate.


BFS and coloring-based parallel algorithms for strongly connected components and related problems

Proceedings of the International Parallel and Distributed Processing Symposium, IPDPS

Slota, George M.; Rajamanickam, Sivasankaran R.; Madduri, Kamesh

Finding the strongly connected components (SCCs) of a directed graph is a fundamental graph-theoretic problem. Tarjan's algorithm is an efficient serial algorithm to find SCCs, but relies on the hard-to-parallelize depth-first search (DFS). We observe that implementations of several parallel SCC detection algorithms show poor parallel performance on modern multicore platforms and large-scale networks. This paper introduces the Multistep method, a new approach that avoids work inefficiencies seen in prior SCC approaches. It does not rely on DFS, but instead uses a combination of breadth-first search (BFS) and a parallel graph coloring routine. We show that the Multistep method scales well on several real-world graphs, with performance fairly independent of topological properties such as the size of the largest SCC and the total number of SCCs. On a 16-core Intel Xeon platform, our algorithm achieves a 20X speedup over the serial approach on a 2 billion edge graph, fully decomposing it in under two seconds. For our collection of test networks, we observe that the Multistep method is 1.92X faster (mean speedup) than the state-of-the-art Hong et al. SCC method. In addition, we modify the Multistep method to find connected and weakly connected components, as well as introduce a novel algorithm for determining articulation vertices of biconnected components. These approaches all utilize the same underlying BFS and coloring routines. © 2014 IEEE.
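
The full Multistep method (trimming, a forward-backward step, then coloring) is not reproduced here; the sketch below shows only the BFS-based core it builds on, forward-backward decomposition, where the SCC of a pivot is the intersection of the vertices it can reach and the vertices that can reach it.

```python
"""Forward-backward SCC decomposition with plain BFS: the SCC of a pivot is
the intersection of its forward and backward reachable sets.  The trimming
and coloring phases of the Multistep method are not shown."""
from collections import deque

def bfs(adj, start, inside):
    """Vertices in `inside` reachable from `start` via edges in `adj`."""
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v in inside and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def sccs(vertices, edges):
    fwd, rev = {}, {}
    for u, v in edges:
        fwd.setdefault(u, []).append(v)
        rev.setdefault(v, []).append(u)
    remaining, result = set(vertices), []
    while remaining:
        pivot = next(iter(remaining))
        scc = bfs(fwd, pivot, remaining) & bfs(rev, pivot, remaining)
        result.append(scc)
        remaining -= scc
    return result

edges = [(1, 2), (2, 3), (3, 1), (3, 4), (4, 5), (5, 4), (5, 6)]
print(sccs(range(1, 7), edges))   # expected SCCs: {1,2,3}, {4,5}, {6}
```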


A pervasive parallel framework for visualization: final report for FWP 10-014707

Moreland, Kenneth D.

We are on the threshold of a transformative change in the basic architecture of high-performance computing. The use of accelerator processors, characterized by large core counts, shared but asymmetrical memory, and heavy thread loading, is quickly becoming the norm in high performance computing. These accelerators represent significant challenges in updating our existing base of software. An intrinsic problem with this transition is a fundamental programming shift from message passing processes to much more fine-grained thread scheduling with memory sharing. Another problem is the lack of stability in accelerator implementation; processor and compiler technology is currently changing rapidly. This report documents the results of our three-year ASCR project to address these challenges. Our project includes the development of the Dax toolkit, which contains the beginnings of new algorithms for a new generation of computers and the underlying infrastructure to rapidly prototype and build further algorithms as necessary.
