Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP
Leung, Vitus J.; Bunde, David P.; Ebbers, Johnathan; Feer, Stefan P.; Price, Nickolas W.; Rhodes, Zachary D.; Swank, Matthew
We examine task mapping algorithms for systems that allocate jobs non-contiguously. Several studies have shown that task placement affects job running time. We focus on jobs with a stencil communication pattern and use experiments on a Cray XE to evaluate novel task mapping algorithms as well as algorithms adapted to this setting. The evaluation uses the miniGhost miniapp, which mimics the behavior of CTH, a shock physics application. Our strategies improve average and single-run times over a baseline strategy by as much as 28% and 36%, respectively.
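As a rough illustration of the mapping problem only (the greedy ordering and hop-count metric below are illustrative assumptions, not the algorithms evaluated in the paper), a minimal sketch of assigning a stencil job's tasks to a non-contiguous allocation might look like this:

```python
# Hypothetical sketch: greedy task mapping for a 2-D stencil job on a
# non-contiguous node allocation. The row-major ordering heuristic and the
# hop-count metric are illustrative assumptions only.
from itertools import product

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def map_tasks(nodes, px, py):
    """Assign a px-by-py grid of tasks to allocated node coordinates.

    Nodes are sorted in row-major order of their mesh coordinates so that
    tasks adjacent in the stencil tend to land on nearby nodes.
    """
    ordered = sorted(nodes)                      # simple locality heuristic
    tasks = list(product(range(px), range(py)))  # task (i, j) grid indices
    return dict(zip(tasks, ordered))

def stencil_hops(mapping, px, py):
    """Total network hops over all nearest-neighbor stencil pairs."""
    hops = 0
    for (i, j), node in mapping.items():
        for di, dj in ((1, 0), (0, 1)):          # count each pair once
            nbr = (i + di, j + dj)
            if nbr in mapping:
                hops += manhattan(node, mapping[nbr])
    return hops

if __name__ == "__main__":
    # A scattered 4x4 allocation drawn from a larger 8x8 mesh.
    allocation = [(x, y) for x, y in product(range(8), range(8))
                  if (x + y) % 4 == 0][:16]
    m = map_tasks(allocation, 4, 4)
    print("total stencil hops:", stencil_hops(m, 4, 4))
```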
One objective of the Climate Science for a Sustainable Energy Future (CSSEF) program is to develop the capability to thoroughly test and understand the uncertainties in the overall climate model and its components as they are being developed. The focus on uncertainties involves sensitivity analysis: the capability to determine which input parameters have a major influence on the output responses of interest. This report presents some initial sensitivity analysis results performed by Lawrence Livermore National Laboratory (LLNL), Sandia National Laboratories (SNL), and Pacific Northwest National Laboratory (PNNL). In the 2011-2012 timeframe, these laboratories worked in collaboration to perform sensitivity analyses of a set of CAM5 2° runs, where the responses of interest were precipitation metrics. The three labs performed their sensitivity analysis (SA) studies separately and then compared results. Overall, the results were quite consistent with each other even though the methods used were different. This exercise provided a robustness check of the global sensitivity analysis metrics and identified some strongly influential parameters.
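For readers unfamiliar with the workflow, the sketch below shows one common form of global sensitivity analysis, standardized regression coefficients computed from an ensemble of runs; the parameter names and toy response are placeholders, not the CAM5 configuration or the labs' actual methods.

```python
# A minimal sketch of one common global sensitivity workflow (standardized
# regression coefficients from a space-filling sample); parameter names and
# the toy response model below are hypothetical, not the CAM5 setup itself.
import numpy as np

rng = np.random.default_rng(0)
names = ["p1", "p2", "p3"]                  # placeholder parameter names
X = rng.uniform(0.0, 1.0, size=(200, 3))    # ensemble of input samples

# Stand-in for a precipitation metric computed from model output.
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Standardize inputs and output, then fit a linear surrogate: the absolute
# regression coefficients rank parameters by influence on the response.
Xs = (X - X.mean(0)) / X.std(0)
ys = (y - y.mean()) / y.std()
coef, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
for n, c in sorted(zip(names, np.abs(coef)), key=lambda t: -t[1]):
    print(f"{n}: SRC magnitude {c:.2f}")
```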
This report summarizes the results of a NEAMS project focused on the use of uncertainty and sensitivity analysis methods within the NEK-5000 and Dakota software framework for assessing failure probabilities as part of probabilistic risk assessment. NEK-5000 is a software tool under development at Argonne National Laboratory to perform computational fluid dynamics calculations for applications such as thermohydraulics of nuclear reactor cores. Dakota is a software tool developed at Sandia National Laboratories containing optimization, sensitivity analysis, and uncertainty quantification algorithms. The goal of this work is to demonstrate the use of uncertainty quantification methods in Dakota with NEK-5000.
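As a hedged illustration of the kind of study Dakota can drive around a simulator, the following sketch estimates a failure probability by Monte Carlo sampling over uncertain inputs; the response function is a toy stand-in for a NEK-5000 calculation, and all variable names and limits are assumptions.

```python
# Hedged sketch: estimate a failure probability P(T_peak > T_limit) by
# Monte Carlo over uncertain inputs. The response model is a toy stand-in
# for an expensive CFD run; all names and limits are illustrative.
import numpy as np

rng = np.random.default_rng(1)
N = 100_000

# Uncertain inputs (hypothetical): inlet temperature [K] and power peaking.
t_inlet = rng.normal(loc=600.0, scale=5.0, size=N)
peaking = rng.lognormal(mean=0.0, sigma=0.05, size=N)

# Toy response model standing in for the expensive calculation.
t_peak = t_inlet + 150.0 * peaking

T_LIMIT = 770.0                      # illustrative failure threshold [K]
fail = t_peak > T_LIMIT
p_fail = fail.mean()
stderr = fail.std(ddof=1) / np.sqrt(N)
print(f"estimated P(failure) = {p_fail:.4f} +/- {stderr:.4f}")
```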
A theoretical framework for the numerical solution of partial differential equation (PDE) constrained optimization problems is presented in this report. This theoretical framework embodies the fundamental infrastructure required to efficiently implement and solve this class of problems. Detailed derivations of the optimality conditions required to accurately solve several parameter identification and optimal control problems are also provided. These derivations allow the reader to further understand how the theoretical abstraction presented in this report translates to the applications.
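For concreteness, the generic first-order optimality system for a PDE-constrained problem, minimize J(u,z) subject to the PDE constraint c(u,z) = 0, can be written via the Lagrangian as below; the parameter identification and optimal control problems treated in the report are instances of this general form.

```latex
% Generic first-order (KKT) conditions for
%   min_{u,z} J(u,z)   subject to   c(u,z) = 0.
\begin{align*}
  \mathcal{L}(u,z,\lambda) &= J(u,z) + \langle \lambda,\, c(u,z) \rangle, \\
  \text{state:}   \quad & c(u,z) = 0, \\
  \text{adjoint:} \quad & c_u(u,z)^{*}\,\lambda = -\,J_u(u,z), \\
  \text{design:}  \quad & J_z(u,z) + c_z(u,z)^{*}\,\lambda = 0 .
\end{align*}
```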
In recent years, density functional theory molecular dynamics (DFT-MD) has been shown to be a useful computational tool for exploring the properties of warm dense matter (WDM). These calculations achieve excellent agreement with shock compression experiments, which probe the thermodynamic parameters of the Hugoniot state. New x-ray Thomson scattering (XRTS) diagnostics promise to deliver independent measurements of electronic density and temperature, as well as structural information in shocked systems. However, they require the development of new levels of theory for computing the associated observables within a DFT framework. The experimentally observable x-ray scattering cross section is related to the electronic density-density response function, which is obtainable using TDDFT, a formally exact extension of conventional DFT that describes electron dynamics and excited states. In order to develop a capability for modeling XRTS data and, more generally, to establish a predictive capability for first-principles simulations of matter in extreme conditions, real-time TDDFT with Ehrenfest dynamics has been implemented in an existing PAW code for DFT-MD calculations. The purpose of this report is to record implementation details and benchmarks as the project advances from software development to delivering novel scientific results. Results range from tests that establish the accuracy, efficiency, and scalability of our implementation to calculations that are verified against accepted results in the literature. Aside from the primary XRTS goal, we identify other, more general areas where this new capability will be useful, including stopping power calculations and electron-ion equilibration.
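For orientation, the relation referred to above can be summarized, up to normalization conventions that vary in the literature, as follows; the second proportionality is the fluctuation-dissipation theorem connecting the electronic dynamic structure factor to the density-density response function computed with TDDFT.

```latex
% Written with proportionalities to sidestep normalization conventions.
\begin{align*}
  \frac{d^{2}\sigma}{d\Omega\, d\omega}
    &\;\propto\; S_{ee}(\mathbf{q},\omega), \\
  S_{ee}(\mathbf{q},\omega)
    &\;\propto\; -\,\frac{\operatorname{Im}\chi(\mathbf{q},\omega)}
                         {1 - e^{-\hbar\omega/k_{B}T}} .
\end{align*}
```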
We present results from the Bayesian calibration of hydrological parameters of the Community Land Model (CLM), which is often used in climate simulations and Earth system models. A statistical inverse problem is formulated for three hydrological parameters, conditional on observations of latent heat surface fluxes over 48 months. Our calibration method uses polynomial and Gaussian process surrogates of the CLM, and solves the parameter estimation problem using a Markov chain Monte Carlo sampler. Posterior probability densities for the parameters are developed for two sites with different soil and vegetation covers. Our method also allows us to examine the structural error in CLM under two error models. We find that surrogate models can be created for CLM in most cases. The posterior distributions are more predictive than the default parameter values in CLM. Climatologically averaging the observations does not modify the parameters' distributions significantly. The structural error model reveals a correlation time-scale which can be used to identify the physical process that could be contributing to it. While the calibrated CLM has a higher predictive skill, the calibration is under-dispersive.
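A minimal sketch of the surrogate-plus-MCMC workflow, with a toy quadratic surrogate standing in for CLM and a single parameter, is given below; the forward model, noise level, and prior are illustrative assumptions rather than the calibrated CLM setup.

```python
# Minimal sketch: a polynomial surrogate stands in for the expensive model,
# and a random-walk Metropolis sampler draws from the posterior of one
# parameter. The "true" model, noise, and prior are illustrative only.
import numpy as np

rng = np.random.default_rng(2)

# Toy forward model (stand-in for monthly latent-heat output).
def model(theta):
    return 50.0 + 30.0 * theta - 10.0 * theta**2

# Build a quadratic surrogate from a handful of "expensive" runs.
design = np.linspace(0.0, 1.0, 7)
surrogate = np.poly1d(np.polyfit(design, model(design), deg=2))

# Synthetic observations at the "true" parameter value (48 months).
theta_true, sigma = 0.6, 2.0
obs = model(theta_true) + rng.normal(scale=sigma, size=48)

def log_post(theta):
    if not 0.0 <= theta <= 1.0:            # uniform prior on [0, 1]
        return -np.inf
    resid = obs - surrogate(theta)
    return -0.5 * np.sum(resid**2) / sigma**2

# Random-walk Metropolis.
chain, cur = [], 0.5
lp = log_post(cur)
for _ in range(20_000):
    prop = cur + rng.normal(scale=0.05)
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:
        cur, lp = prop, lp_prop
    chain.append(cur)

post = np.array(chain[5000:])
print(f"posterior mean {post.mean():.3f}, sd {post.std():.3f} (truth {theta_true})")
```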
This report summarizes the work performed under the project "Statistically significant relational data mining." The goal of the project was to add more statistical rigor to the fairly ad hoc area of data mining on graphs. Our goal was to develop better algorithms and better ways to evaluate algorithm quality. We concentrated on algorithms for community detection, approximate pattern matching, and graph similarity measures. Approximate pattern matching involves finding an instance of a relatively small pattern, expressed with tolerance, in a large graph of data observed with uncertainty. This report gathers the abstracts and references for the eight refereed publications that have appeared as part of this work. We then archive three pieces of research that have not yet been published. The first is theoretical and experimental evidence that a popular statistical measure for comparison of community assignments favors over-resolved communities over approximations to a ground truth. The second is a set of statistically motivated methods for measuring the quality of an approximate match of a small pattern in a large graph. The third is a new probabilistic random graph model. Statisticians favor such models for graph analysis, and the new local structure graph model overcomes some of the issues with popular models such as exponential random graph models and latent variable models.
While collection capabilities have yielded an ever-increasing volume of aerial imagery, analytic techniques for identifying patterns in and extracting relevant information from this data have seriously lagged. The vast majority of imagery is never examined, due to a combination of the limited bandwidth of human analysts and limitations of existing analysis tools. In this report, we describe an alternative, novel approach to both encoding and analyzing aerial imagery, using the concept of a geospatial semantic graph. The advantages of our approach are twofold. First, intuitive templates can be easily specified in terms of the domain language in which an analyst converses. These templates can be used to automatically and efficiently search large graph databases for specific patterns of interest. Second, unsupervised machine learning techniques can be applied to automatically identify patterns in the graph databases, exposing recurring motifs in imagery. We illustrate our approach using real-world data for Anne Arundel County, Maryland, and compare the performance of our approach to that of an expert human analyst.
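The template-search idea can be sketched with an attributed graph and off-the-shelf subgraph matching; the node types, adjacency relation, and template below are made-up illustrations, not the report's actual schema or the Anne Arundel County data.

```python
# Hedged sketch: encode image-derived objects as an attributed graph and
# search for a small "pattern of interest" with subgraph matching.
import networkx as nx
from networkx.algorithms import isomorphism

# Scene graph: nodes are extracted objects, edges mean "adjacent to".
scene = nx.Graph()
scene.add_nodes_from([
    (1, {"type": "building"}), (2, {"type": "parking_lot"}),
    (3, {"type": "road"}),     (4, {"type": "building"}),
    (5, {"type": "pond"}),
])
scene.add_edges_from([(1, 2), (2, 3), (3, 4), (4, 5)])

# Analyst template, written in the same domain vocabulary:
# a building adjacent to a parking lot that touches a road.
template = nx.Graph()
template.add_nodes_from([
    ("b", {"type": "building"}),
    ("p", {"type": "parking_lot"}),
    ("r", {"type": "road"}),
])
template.add_edges_from([("b", "p"), ("p", "r")])

matcher = isomorphism.GraphMatcher(
    scene, template,
    node_match=isomorphism.categorical_node_match("type", None))
for mapping in matcher.subgraph_isomorphisms_iter():
    print("match:", mapping)
```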
Proceedings of Co-HPC 2014: 1st International Workshop on Hardware-Software Co-Design for High Performance Computing - Held in Conjunction with SC 2014: The International Conference for High Performance Computing, Networking, Storage and Analysis
The Piecewise Parabolic Method (PPM) was designed as a means of exploring compressible gas dynamics problems of interest in astrophysics, including supersonic jets, compressible turbulence, stellar convection, and turbulent mixing and burning of gases in stellar interiors. Over time, the capabilities encapsulated in PPM have co-evolved with the availability of a series of high performance computing platforms. Implementation of the algorithm has adapted to and advanced with the architectural capabilities and characteristics of these machines. This adaptability of our PPM codes has enabled targeted astrophysical applications of PPM to exploit these scarce resources to explore complex physical phenomena. Here we describe the means by which this was accomplished, and set a path forward, with a new miniapp, mPPM, for continuing this process in a diverse and dynamic architecture design environment. Adaptations in mPPM for the latest high performance machines are discussed that address the important issue of limited bandwidth from locally attached main memory to the microprocessor chip.
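As a reminder of the algorithm at the heart of the miniapp, the sketch below performs the basic PPM reconstruction step (fourth-order interface values and the in-cell parabola of Colella and Woodward) on a 1-D profile; limiters, characteristic tracing, and the Riemann solve are omitted, so it is illustrative only and not the mPPM code.

```python
# Stripped-down PPM reconstruction: fourth-order interface values and the
# in-cell parabola  a(xi) = aL + xi*(dA + a6*(1 - xi)).  Limiters and the
# Riemann solver are omitted for brevity.
import numpy as np

def ppm_reconstruct(a):
    """Return left/right interface values and parabola coefficients
    for interior cells of a 1-D array of cell averages `a`."""
    # Fourth-order estimate of the value at interface j+1/2.
    a_half = (7.0 / 12.0) * (a[1:-2] + a[2:-1]) - (1.0 / 12.0) * (a[:-3] + a[3:])
    aL, aR = a_half[:-1], a_half[1:]           # interfaces bounding cells 2..n-3
    abar = a[2:-2]
    dA = aR - aL
    a6 = 6.0 * (abar - 0.5 * (aL + aR))
    return aL, aR, dA, a6

if __name__ == "__main__":
    x = np.linspace(0.0, 1.0, 32)
    a = np.sin(2 * np.pi * x)                  # smooth test profile
    aL, aR, dA, a6 = ppm_reconstruct(a)
    print("max |aR - aL| over interior cells:", np.abs(dA).max())
```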
Proceedings of Co-HPC 2014: 1st International Workshop on Hardware-Software Co-Design for High Performance Computing - Held in Conjunction with SC 2014: The International Conference for High Performance Computing, Networking, Storage and Analysis
Disruptive changes to computer architecture are paving the way toward extreme scale computing. The co-design strategy of collaborative research and development among computer architects, system software designers, and application teams can help to ensure that applications not only cope with these changes but thrive on them. In this paper, we present a novel combined co-design approach of emulation and simulation in the context of investigating future Processing in Memory (PIM) architectures. PIM enables co-location of data and computation to decrease data movement, to provide increases in memory speed and capacity compared to existing technologies and, perhaps most importantly for extreme scale, to improve energy efficiency. Our evaluation of PIM focuses on three mini-applications representing important production applications. The emulation and simulation studies examine the effects of locality-aware versus locality-oblivious data distribution and computation, and they compare PIM to conventional architectures. Both studies contribute in their own way to the overall understanding of the application-architecture interactions, and our results suggest that PIM technology shows great potential for efficient computation without negatively impacting productivity.
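A back-of-the-envelope model (not the emulator or simulator used in the paper) of why locality-aware placement matters for PIM:

```python
# Toy model: data is block-distributed over PIM stacks, and we count how
# many memory accesses leave the local stack when work placement is
# locality-aware (run each update where its data lives) versus
# locality-oblivious (scatter updates round-robin over the stacks).
import numpy as np

N_ELEMENTS, N_STACKS = 1 << 16, 8
elements = np.arange(N_ELEMENTS)

owner = elements * N_STACKS // N_ELEMENTS      # block data distribution
aware_placement = owner                         # run on the owning stack
oblivious_placement = elements % N_STACKS       # round-robin placement

aware_remote = np.count_nonzero(aware_placement != owner)
oblivious_remote = np.count_nonzero(oblivious_placement != owner)

print(f"remote accesses, locality-aware:     {aware_remote}")
print(f"remote accesses, locality-oblivious: {oblivious_remote} "
      f"({oblivious_remote / N_ELEMENTS:.0%})")
```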
Krylov subspace projection methods are widely used iterative methods for solving large-scale linear systems of equations. Researchers have demonstrated that communication-avoiding (CA) techniques can improve Krylov methods' performance on modern computers, where communication is becoming increasingly expensive compared to arithmetic operations. In this paper, we extend these studies with two major contributions. First, we present our implementation of a CA variant of the Generalized Minimum Residual (GMRES) method, called CA-GMRES, for solving nonsymmetric linear systems of equations on a hybrid CPU/GPU cluster. Our performance results on up to 120 GPUs show that CA-GMRES gives a speedup of up to 2.5x in total solution time over standard GMRES on a hybrid cluster with twelve Intel Xeon CPUs and three NVIDIA Fermi GPUs on each node. We then outline a domain decomposition framework to introduce a family of preconditioners that are suitable for CA Krylov methods. Our preconditioners do not incur any additional communication and allow the easy reuse of existing algorithms and software for the subdomain solves. Experimental results on the hybrid CPU/GPU cluster demonstrate that CA-GMRES with preconditioning achieves a speedup of up to 7.4x over CA-GMRES without preconditioning, and a speedup of up to 1.7x over GMRES with preconditioning, in total solution time. These results confirm the potential of our framework to develop a practical and effective preconditioned CA Krylov method.
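The communication-avoiding idea can be illustrated with a serial sketch: build several Krylov basis vectors with a matrix-powers kernel and orthogonalize them in one block step; this is a numpy illustration of the arithmetic, not the authors' hybrid CPU/GPU implementation or their preconditioning framework.

```python
# Illustrative sketch of the idea inside CA-GMRES: generate s+1 Krylov basis
# vectors with a matrix-powers kernel, then orthogonalize them all at once
# with one tall-skinny QR instead of one global reduction per Arnoldi step.
import numpy as np

def matrix_powers(A, v, s):
    """Monomial Krylov basis [v, Av, ..., A^s v] as columns. In practice a
    Newton or Chebyshev basis is often used instead to keep this well
    conditioned."""
    V = np.empty((A.shape[0], s + 1))
    V[:, 0] = v / np.linalg.norm(v)
    for j in range(s):
        V[:, j + 1] = A @ V[:, j]
    return V

rng = np.random.default_rng(4)
n, s = 500, 8
A = 2.0 * np.eye(n) + rng.standard_normal((n, n)) / np.sqrt(n)  # test matrix
b = rng.standard_normal(n)

V = matrix_powers(A, b, s)     # one communication phase when run in parallel
Q, R = np.linalg.qr(V)         # one block orthogonalization (TSQR in parallel)

print("orthogonality error:", np.linalg.norm(Q.T @ Q - np.eye(s + 1)))
```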
A method is investigated to reduce the number of numerical parameters in a material model for a solid. The basis of the method is to detect interdependencies between parameters within a class of materials of interest. The method is demonstrated for a set of material property data for iron and steel using the Johnson-Cook plasticity model.
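For reference, the Johnson-Cook flow stress whose parameters (A, B, n, C, m) are the subject of the reduction is shown below; the reference strain rate and the room and melt temperatures that enter the normalized quantities are material-specific constants.

```latex
% Johnson-Cook flow stress with plastic strain \varepsilon_p, strain rate
% \dot{\varepsilon}, reference strain rate \dot{\varepsilon}_0, and
% homologous temperature T^* built from room (T_r) and melt (T_m) temperatures.
\begin{equation*}
  \sigma_y = \left( A + B\,\varepsilon_p^{\,n} \right)
             \left( 1 + C \ln \frac{\dot{\varepsilon}}{\dot{\varepsilon}_0} \right)
             \left( 1 - {T^{*}}^{\,m} \right),
  \qquad
  T^{*} = \frac{T - T_r}{T_m - T_r}.
\end{equation*}
```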
A simple demonstration of nonlocality in a heterogeneous material is presented. By analysis of the microscale deformation of a two-component layered medium, it is shown that nonlocal interactions necessarily appear in a homogenized model of the system. Explicit expressions for the nonlocal forces are determined. The way these nonlocal forces appear in various nonlocal elasticity theories is derived. The length scales that emerge involve the constituent material properties as well as their geometrical dimensions. A peridynamic material model for the smoothed displacement field is derived. It is demonstrated by comparison with experimental data that the incorporation of nonlocality in modeling dramatically improves the prediction of the stress concentration in an open-hole tension test on a composite plate.
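For context, the standard bond-based peridynamic equation of motion (not a result specific to this report) is:

```latex
% Each point x interacts with all points x' in a finite neighborhood H_x,
% whose radius (the horizon) sets the model's intrinsic length scale.
\begin{equation*}
  \rho(\mathbf{x})\,\ddot{\mathbf{u}}(\mathbf{x},t)
  = \int_{H_{\mathbf{x}}}
      \mathbf{f}\bigl(\mathbf{u}(\mathbf{x}',t)-\mathbf{u}(\mathbf{x},t),\;
                      \mathbf{x}'-\mathbf{x}\bigr)\, dV_{\mathbf{x}'}
    + \mathbf{b}(\mathbf{x},t).
\end{equation*}
```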
We are on the threshold of a transformative change in the basic architecture of high-performance computing. The use of accelerator processors, characterized by large core counts, shared but asymmetrical memory, and heavy thread loading, is quickly becoming the norm in high performance computing. These accelerators represent significant challenges in updating our existing base of software. An intrinsic problem with this transition is a fundamental programming shift from message-passing processes to much finer-grained thread scheduling with memory sharing. Another problem is the lack of stability in accelerator implementation; processor and compiler technology is currently changing rapidly. This report documents the results of our three-year ASCR project to address these challenges. Our project includes the development of the Dax toolkit, which contains the beginnings of new algorithms for a new generation of computers and the underlying infrastructure to rapidly prototype and build further algorithms as necessary.