Publications

A Brief Description of the Kokkos implementation of the SNAP potential in ExaMiniMD

Thompson, Aidan P.; Trott, Christian R.

Within the EXAALT project, the SNAP [1] approach is being used to develop high-accuracy potentials for use in large-scale, long-time molecular dynamics simulations of materials behavior. In particular, we have developed a new SNAP potential that is suitable for describing the interplay between helium atoms and vacancies in high-temperature tungsten [2]. This model is now being used to study plasma-surface interactions in nuclear fusion reactors for energy production. The high accuracy of SNAP potentials comes at the price of increased computational cost per atom and increased computational complexity. The increased cost is mitigated by improvements in strong scaling that can be achieved using advanced algorithms [3].

Evaluation of a Class of Simple and Effective Uncertainty Methods for Sparse Samples of Random Variables and Functions

Romero, Vicente J.; Bonney, Matthew; Schroeder, Benjamin B.; Weirs, Vincent G.

When very few samples of a random quantity are available from a source distribution of unknown shape, it is usually not possible to accurately infer the exact distribution from which the data samples come. Underestimation of important quantities such as response variance and failure probabilities can result. For many engineering purposes, including design and risk analysis, we attempt to avoid underestimation with a strategy that conservatively estimates (bounds) these types of quantities, without being overly conservative, when only a few samples of a random quantity are available from model predictions or replicate experiments. This report examines a class of related sparse-data uncertainty representation and inference approaches that are relatively simple, inexpensive, and effective. Tradeoffs between the methods' conservatism, reliability, and risk versus number of data samples (cost) are quantified with multi-attribute metrics used to assess method performance for conservative estimation of two representative quantities: the central 95% of the response, and a 10^-4 probability of exceeding a response threshold in a tail of the distribution. Each method's performance is characterized with 10,000 random trials on a large number of diverse and challenging distributions. The best method and number of samples to use in a given circumstance depend on the uncertainty quantity to be estimated, the PDF character, and the desired reliability of bounding the true value. On the basis of this large database and study, a strategy is proposed for selecting the method and number of samples for attaining reasonable credibility levels in bounding these types of quantities when sparse samples of random variables or functions are available from experiments or simulations.
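
As a toy illustration of the kind of tradeoff studied (a sketch under simple normal-theory assumptions, not the report's specific methods or test distributions), the snippet below compares naive sample percentiles against a conservative two-sided tolerance interval for bounding the central 95% of a skewed response, using repeated trials of 10 samples each.

```python
# Illustrative sketch only (not the report's specific methods): with n = 10
# samples, naive sample percentiles rarely bound the true central 95% of the
# response, while a conservative normal-theory tolerance interval (Howe's
# k-factor approximation) bounds it far more often.
import numpy as np
from scipy import stats

def tolerance_interval(x, coverage=0.95, confidence=0.90):
    """Two-sided normal tolerance interval via Howe's k-factor approximation."""
    n = len(x)
    z = stats.norm.ppf(0.5 + coverage / 2.0)
    chi2 = stats.chi2.ppf(1.0 - confidence, n - 1)
    k = z * np.sqrt((n - 1) * (1.0 + 1.0 / n) / chi2)
    m, s = np.mean(x), np.std(x, ddof=1)
    return m - k * s, m + k * s

rng = np.random.default_rng(1)
source = stats.lognorm(s=0.8)                      # skewed "unknown" source distribution
lo_true, hi_true = source.ppf([0.025, 0.975])      # true central 95% of the response

n, trials = 10, 10000
hits_naive = hits_ti = 0
for _ in range(trials):
    x = source.rvs(n, random_state=rng)
    lo_n, hi_n = np.percentile(x, [2.5, 97.5])
    hits_naive += (lo_n <= lo_true) and (hi_n >= hi_true)
    lo_t, hi_t = tolerance_interval(x)
    hits_ti += (lo_t <= lo_true) and (hi_t >= hi_true)

print(f"naive percentiles bound the central 95%: {hits_naive / trials:.2f} of trials")
print(f"tolerance interval bounds it:            {hits_ti / trials:.2f} of trials")
```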

Grain boundary phase transformations in PtAu and relevance to thermal stabilization of bulk nanocrystalline metals

Journal of Materials Science

O'Brien, Christopher J.; Barr, Christopher M.; Price, Patrick M.; Hattar, Khalid M.; Foiles, Stephen M.

There has recently been a great deal of interest in employing immiscible solutes to stabilize nanocrystalline microstructures. Existing modeling efforts largely rely on mesoscale Monte Carlo approaches that employ a simplified model of the microstructure and result in highly homogeneous segregation to grain boundaries. However, there is ample evidence from experimental and modeling studies demonstrating that segregation to grain boundaries is highly non-uniform and sensitive to boundary character. This work employs a realistic nanocrystalline microstructure with experimentally relevant global solute concentrations to illustrate inhomogeneous boundary segregation. Furthermore, experiments quantifying segregation in thin films are reported that corroborate the prediction that grain boundary segregation is highly inhomogeneous. In addition to grain boundary structure modifying the degree of segregation, the existence of a phase transformation between low and high solute content grain boundaries is predicted. In order to conduct this study, new embedded-atom method interatomic potentials are developed for Pt, Au, and the PtAu binary alloy.

Fast linear algebra-based triangle counting with KokkosKernels

2017 IEEE High Performance Extreme Computing Conference, HPEC 2017

Wolf, Michael W.; Deveci, Mehmet D.; Berry, Jonathan W.; Hammond, Simon D.; Rajamanickam, Sivasankaran R.

Triangle counting serves as a key building block for a set of important graph algorithms in network science. In this paper, we address the IEEE HPEC Static Graph Challenge problem of triangle counting, focusing on obtaining the best parallel performance on a single multicore node. Our implementation uses a linear algebra-based approach to triangle counting that has grown out of work related to our miniTri data analytics miniapplication [1] and our efforts to pose graph algorithms in the language of linear algebra. We leverage KokkosKernels to implement this approach efficiently on multicore architectures. Our performance results are competitive with the fastest known graph traversal-based approaches and are significantly faster than the Graph Challenge reference implementations, up to 670,000 times faster than the C++ reference and 10,000 times faster than the Python reference on a single Intel Haswell node.
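
For readers unfamiliar with the linear algebra-based formulation, the short serial SciPy sketch below (not the paper's KokkosKernels multicore implementation) counts triangles two equivalent ways: from trace(A^3), and from the masked product of the strictly lower and upper triangular parts of the adjacency matrix.

```python
# Sketch of linear algebra-based triangle counting on an undirected graph,
# in the spirit of the miniTri formulation (serial SciPy here, not the
# paper's KokkosKernels multicore implementation).
import numpy as np
import scipy.sparse as sp

# Undirected, simple graph as a symmetric 0/1 adjacency matrix.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (2, 4)]
rows, cols = zip(*edges)
n = 5
A = sp.coo_matrix((np.ones(len(edges)), (rows, cols)), shape=(n, n))
A = ((A + A.T) > 0).astype(np.int64).tocsr()

# Simplest identity: each triangle is counted 6 times in trace(A^3).
print("trace(A^3)/6 =", (A @ A @ A).diagonal().sum() // 6)

# Lower/upper split: B = (L*U) masked by A, then sum/2.
L = sp.tril(A, k=-1, format="csr")
U = sp.triu(A, k=1, format="csr")
B = (L @ U).multiply(A)          # keep only wedges that are closed by an edge
print("sum(A .* (L*U))/2 =", int(B.sum()) // 2)
```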

Scalable Failure Masking for Stencil Computations using Ghost Region Expansion and Cell to Rank Remapping

SIAM Journal on Scientific Computing

Gamell, Marc; Teranishi, Keita T.; Kolla, Hemanth K.; Mayo, Jackson M.; Heroux, Michael A.; Chen, Jacqueline H.; Parashar, Manish

In order to achieve exascale systems, application resilience needs to be addressed. Some programming models, such as task-DAG (directed acyclic graph) architectures, currently embed resilience features, whereas traditional SPMD (single program, multiple data) and message-passing models do not. Since a large part of the community's code base follows the latter models, it is still necessary to take advantage of application characteristics to minimize the overheads of fault tolerance. To that end, this paper explores how recovering from hard process/node failures in a local manner is a natural approach for certain applications to obtain resilience at lower costs in faulty environments. In particular, this paper targets enabling online, semitransparent local recovery for stencil computations on current leadership-class systems and presents programming support and scalable runtime mechanisms. Also described and demonstrated in this paper is the effect of failure masking, which allows the effective reduction of the impact of multiple failures on total time to solution. Furthermore, we discuss, implement, and evaluate ghost region expansion and cell-to-rank remapping to increase the probability of failure masking. To conclude, this paper shows the integration of all the aforementioned mechanisms with the S3D combustion simulation through an experimental demonstration (using the Titan system) of the ability to tolerate high failure rates (i.e., node failures every five seconds) with low overhead while sustaining performance at large scales. In addition, this demonstration also displays the increase in failure masking probability resulting from the combination of ghost region expansion and cell-to-rank remapping.

A geometric multigrid preconditioning strategy for DPG system matrices

Computers and Mathematics with Applications

Roberts, Nathan V.

The discontinuous Petrov–Galerkin (DPG) methodology of Demkowicz and Gopalakrishnan (2010, 2011) guarantees the optimality of the solution in an energy norm and provides several features facilitating adaptive schemes. A key question that has not yet been answered in general – though there are some results, e.g., for the Poisson problem – is how best to precondition the DPG system matrix so that iterative solvers may be used to solve large-scale problems. In this paper, we detail a strategy for preconditioning the DPG system matrix using geometric multigrid, which we have implemented as part of Camellia (Roberts, 2014, 2016), and demonstrate through numerical experiments its effectiveness in the context of several variational formulations. We observe that in some of our experiments the behavior of the preconditioner is closely tied to the discrete test space enrichment. We include experiments involving adaptive meshes with hanging nodes for lid-driven cavity flow, demonstrating that the preconditioners can be applied in the context of challenging problems. We also include a scalability study demonstrating that the approach – and our implementation – scales well to many MPI ranks.
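
As a schematic of the general idea only (a two-grid cycle on a 1-D Poisson model problem, not the DPG system matrices or the Camellia implementation), the sketch below wraps one geometric multigrid V(1,1)-cycle as a preconditioner for conjugate gradients.

```python
# Generic two-grid geometric multigrid preconditioner used inside CG, shown on
# a 1-D Poisson model problem; a schematic stand-in only, not the DPG system
# matrices or the Camellia implementation discussed in the paper.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 255                                   # fine-grid interior points (2^k - 1)
h = 1.0 / (n + 1)
A = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n), format="csr") / h**2

# Linear-interpolation prolongation from a coarse grid with (n-1)/2 points.
nc = (n - 1) // 2
P = sp.lil_matrix((n, nc))
for j in range(nc):
    i = 2 * j + 1
    P[i - 1, j], P[i, j], P[i + 1, j] = 0.5, 1.0, 0.5
P = P.tocsr()
Ac_solve = spla.factorized((P.T @ A @ P).tocsc())   # direct solve on the coarse grid
omega = 2.0 / 3.0
Dinv = 1.0 / A.diagonal()

def two_grid(r):
    """One V(1,1)-cycle with damped Jacobi smoothing, applied to a residual r."""
    x = omega * Dinv * r                  # pre-smooth from a zero initial guess
    x += P @ Ac_solve(P.T @ (r - A @ x))  # coarse-grid correction
    x += omega * Dinv * (r - A @ x)       # post-smooth
    return x

M = spla.LinearOperator(A.shape, matvec=two_grid)
b = np.ones(n)
iterations = [0]
def count_iterations(_xk):
    iterations[0] += 1

x, info = spla.cg(A, b, M=M, callback=count_iterations)
print("CG converged:", info == 0, "in", iterations[0], "iterations")
```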

XVis: Visualization for the Extreme-Scale Scientific-Computation Ecosystem: Year-end report FY17

Moreland, Kenneth D.; Pugmire, David; Rogers, David; Childs, Hank; Ma, Kwan-Liu; Geveci, Berk

The XVis project brings together the key elements of research to enable scientific discovery at extreme scale. Scientific computing will no longer be purely about how fast computations can be performed. Energy constraints, processor changes, and I/O limitations necessitate significant changes in both the software applications used in scientific computation and the ways in which scientists use them. Components for modeling, simulation, analysis, and visualization must work together in a computational ecosystem, rather than working independently as they have in the past. This project provides the necessary research and infrastructure for scientific discovery in this new computational ecosystem by addressing four interlocking challenges: emerging processor technology, in situ integration, usability, and proxy analysis.

Basker: Parallel sparse LU factorization utilizing hierarchical parallelism and data layouts

Parallel Computing

Booth, Joshua D.; Ellingwood, Nathan D.; Thornquist, Heidi K.; Rajamanickam, Sivasankaran

Transient simulation in circuit simulation tools, such as SPICE and Xyce, depends on scalable and robust sparse LU factorizations for efficient numerical simulation of circuits and power grids. As the need for simulations of very large circuits grows, the prevalence of multicore architectures enables us to use shared-memory parallel algorithms for such simulations. A parallel factorization is a critical component of such shared-memory parallel simulations. We develop a parallel sparse factorization algorithm that can solve problems from circuit simulations efficiently and map well to architectural features. This new factorization algorithm exposes hierarchical parallelism to accommodate the irregular structures that arise in our target problems. It also uses a hierarchical two-dimensional data layout, which reduces synchronization costs and maps to the memory hierarchy found in multicore processors. We present an OpenMP-based implementation of the parallel algorithm in a new multithreaded solver called Basker in the Trilinos framework. We present performance evaluations of Basker on the Intel Sandy Bridge and Xeon Phi platforms using circuit and power grid matrices taken from the University of Florida sparse matrix collection and from Xyce circuit simulation. Basker achieves a geometric mean speedup of 5.91× on CPU (16 cores) and 7.4× on Xeon Phi (32 cores) relative to the state-of-the-art solver KLU. Basker outperforms the Intel MKL Pardiso solver (PMKL) by as much as 30× on CPU (16 cores) and 7.5× on Xeon Phi (32 cores) for low fill-in circuit matrices. Furthermore, Basker provides a 5.4× speedup on a challenging matrix sequence taken from an actual Xyce simulation.
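
The role of two-dimensional blocking is easiest to see in the dense, serial setting. The sketch below is only that dense analogue (not Basker's sparse algorithm, data structures, or threaded Kokkos/OpenMP implementation): in a right-looking blocked LU, the trailing Schur-complement update splits into independent block-by-block updates, which is the parallelism a 2-D layout exposes while confining synchronization to the panel steps.

```python
# Minimal sketch of right-looking blocked LU factorization (no pivoting) on a
# dense matrix; a dense, serial analogue of 2-D block updates, NOT Basker's
# sparse algorithm or data structures.
import numpy as np

def blocked_lu(A, b):
    """In-place LU (unit-lower L and upper U stored in A) with block size b."""
    n = A.shape[0]
    for k in range(0, n, b):
        kb = min(b, n - k)
        # Factor the kb-by-kb diagonal block with unblocked LU.
        for j in range(k, k + kb):
            A[j+1:k+kb, j] /= A[j, j]
            A[j+1:k+kb, j+1:k+kb] -= np.outer(A[j+1:k+kb, j], A[j, j+1:k+kb])
        L_kk = np.tril(A[k:k+kb, k:k+kb], -1) + np.eye(kb)
        U_kk = np.triu(A[k:k+kb, k:k+kb])
        # Panel solves: L21 = A21 * U11^{-1} and U12 = L11^{-1} * A12.
        A[k+kb:, k:k+kb] = np.linalg.solve(U_kk.T, A[k+kb:, k:k+kb].T).T
        A[k:k+kb, k+kb:] = np.linalg.solve(L_kk, A[k:k+kb, k+kb:])
        # Trailing (Schur complement) update: each (i, j) block update is
        # independent, which is what a 2-D layout lets threads exploit.
        A[k+kb:, k+kb:] -= A[k+kb:, k:k+kb] @ A[k:k+kb, k+kb:]
    return A

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 8)) + 8 * np.eye(8)  # diagonally dominant: safe without pivoting
LU = blocked_lu(A.copy(), b=4)
L = np.tril(LU, -1) + np.eye(8)
U = np.triu(LU)
print("L*U reproduces A:", np.allclose(L @ U, A))
```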

Modeling and simulating multiple failure masking enabled by local recovery for stencil-based applications at extreme scales

IEEE Transactions on Parallel and Distributed Systems

Gamell, Marc; Teranishi, Keita T.; Mayo, Jackson M.; Kolla, Hemanth K.; Heroux, Michael A.; Chen, Jacqueline H.; Parashar, Manish

Obtaining multi-process hard-failure resilience at the application level is a key challenge that must be overcome before the promise of exascale can be fully realized. Previous work has shown that online global recovery can dramatically reduce the overhead of failures when compared to the more traditional approach of terminating the job and restarting it from the last stored checkpoint. If online recovery is performed in a local manner, further scalability is enabled, not only due to the intrinsically lower costs of recovering locally, but also due to derived effects for some application types. In this paper we model one such effect, namely multiple failure masking, which manifests when running stencil parallel computations in an environment where failures are recovered locally. First, the delay propagation shape of one or multiple locally recovered failures is modeled to enable several analyses of the probability of different levels of failure masking under certain stencil application behaviors. Our results indicate that failure masking is an extremely desirable effect at scale, whose manifestation becomes more evident and beneficial as the machine size or the failure rate increases.

Trends in Data Locality Abstractions for HPC Systems

IEEE Transactions on Parallel and Distributed Systems

Unat, Didem; Dubey, Anshu; Hoefler, Torsten; Shalf, John B.; Abraham, Mark; Bianco, Mauro; Chamberlain, Bradford L.; Cledat, Romain; Edwards, Harold C.; Finkel, Hal; Fuerlinger, Karl; Hannig, Frank; Jeannot, Emmanuel; Kamil, Amir; Keasler, Jeff; Kelly, Paul H.J.; Leung, Vitus J.; Ltaief, Hatem; Maruyama, Naoya; Newburn, Chris J.; Pericas, Miquel

Constraining the Magmatic System at Mount St. Helens (2004–2008) Using Bayesian Inversion With Physics-Based Models Including Gas Escape and Crystallization

Journal of Geophysical Research: Solid Earth

Wong, Ying Q.; Segall, Paul; Bradley, Andrew M.; Anderson, Kyle

Physics-based models of volcanic eruptions track conduit processes as functions of depth and time. When used in inversions, these models permit integration of diverse geological and geophysical data sets to constrain important parameters of magmatic systems. We develop a 1-D steady state conduit model for effusive eruptions including equilibrium crystallization and gas transport through the conduit and compare with the quasi-steady dome growth phase of Mount St. Helens in 2005. Viscosity increase resulting from pressure-dependent crystallization leads to a natural transition from viscous flow to frictional sliding on the conduit margin. Erupted mass flux depends strongly on wall rock and magma permeabilities due to their impact on magma density. Including both lateral and vertical gas transport reveals competing effects that produce nonmonotonic behavior in the mass flux when increasing magma permeability. Using this physics-based model in a Bayesian inversion, we link data sets from Mount St. Helens such as extrusion flux and earthquake depths with petrological data to estimate unknown model parameters, including magma chamber pressure and water content, magma permeability constants, conduit radius, and friction along the conduit walls. Even with this relatively simple model and limited data, we obtain improved constraints on important model parameters. We find that the magma chamber had low (<5 wt %) total volatiles and that the magma permeability scale is well constrained at ∼10^−11.4 m^2 to reproduce observed dome rock porosities. Compared with previous results, higher magma overpressure and lower wall friction are required to compensate for increased viscous resistance while keeping extrusion rate at the observed value.
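
A schematic of how such a Bayesian inversion is typically set up is sketched below, using a deliberately toy forward model and made-up observation values in place of the paper's steady-state conduit model and Mount St. Helens data sets; it only illustrates how observations and prior bounds jointly constrain the parameters.

```python
# Schematic Bayesian inversion with a physics-based forward model: a simple
# Metropolis sampler over parameters theta given observations d_obs. The
# forward model and all numbers here are toy placeholders, NOT the paper's
# conduit model or data.
import numpy as np

rng = np.random.default_rng(0)

def forward(theta):
    """Hypothetical forward model mapping parameters to predicted observables."""
    chamber_pressure, log_permeability = theta
    extrusion_rate = 0.05 * chamber_pressure * 10.0 ** (log_permeability + 11.0)
    porosity = 0.1 + 0.02 * (log_permeability + 12.0)
    return np.array([extrusion_rate, porosity])

d_obs = np.array([2.0, 0.12])          # placeholder "observed" extrusion rate, porosity
sigma = np.array([0.2, 0.01])          # placeholder observation uncertainties

def log_post(theta):
    p, logk = theta
    if not (0.0 < p < 50.0 and -14.0 < logk < -9.0):   # uniform prior bounds
        return -np.inf
    resid = (forward(theta) - d_obs) / sigma
    return -0.5 * np.sum(resid**2)

theta = np.array([10.0, -11.0])
samples, lp = [], log_post(theta)
for _ in range(20000):
    prop = theta + rng.normal(scale=[0.5, 0.1])
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:           # Metropolis accept/reject
        theta, lp = prop, lp_prop
    samples.append(theta)
samples = np.array(samples)[5000:]                     # discard burn-in
print("posterior mean:", samples.mean(axis=0))
print("posterior std: ", samples.std(axis=0))
```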

Treatment of Nuclear Data Covariance Information in Sample Generation

Swiler, Laura P.; Adams, Brian M.; Wieselquist, William

This report summarizes a NEAMS (Nuclear Energy Advanced Modeling and Simulation) project focused on developing a sampling capability that can handle the challenges of generating samples from nuclear cross-section data. The covariance information between energy groups tends to be very ill-conditioned and thus poses a problem for traditional methods of generating correlated samples. This report outlines a method that addresses sample generation from cross-section matrices.
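
One common remedy, sketched below for illustration (not necessarily the exact method developed in the report), is to replace a Cholesky factorization with an eigen-decomposition of the covariance matrix and clip the small negative eigenvalues introduced by round-off before generating correlated samples.

```python
# Sketch of one standard remedy for drawing correlated samples when the
# covariance matrix is (nearly) singular: use a clipped eigen-factorization
# instead of Cholesky. Illustrative only; the covariance here is a made-up
# stand-in, not actual nuclear cross-section data.
import numpy as np

rng = np.random.default_rng(0)

# Toy covariance over 44 "energy groups", built from fewer underlying factors
# than groups so it is rank-deficient and severely ill-conditioned.
g, k = 44, 10
B = rng.standard_normal((g, k))
cov = B @ B.T
print("condition number:", np.linalg.cond(cov))

try:
    np.linalg.cholesky(cov)
    print("Cholesky succeeded")
except np.linalg.LinAlgError:
    print("Cholesky failed on the ill-conditioned covariance")

# cov = V diag(w) V^T with w clipped to be nonnegative, then cov ≈ L @ L.T.
w, V = np.linalg.eigh(cov)
L = V * np.sqrt(np.clip(w, 0.0, None))

samples = rng.standard_normal((100000, g)) @ L.T       # zero-mean correlated samples
err = np.abs(np.cov(samples, rowvar=False) - cov).max()
print("max abs deviation of sample covariance:", err)
```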

Sensor Placement Optimization using Chama

Klise, Katherine A.; Laird, Carl D.; Nicholson, Bethany L.

Continuous or regularly scheduled monitoring has the potential to quickly identify changes in the environment. However, even with low-cost sensors, only a limited number of sensors can be deployed. The physical placement of these sensors, along with the sensor technology and operating conditions, can have a large impact on the performance of a monitoring strategy. Chama is an open source Python package which includes mixed-integer, stochastic programming formulations to determine sensor locations and technology that maximize monitoring effectiveness. The methods in Chama are general and can be applied to a wide range of applications. Chama is currently being used to design sensor networks to monitor airborne pollutants and to monitor water quality in water distribution systems. The following documentation includes installation instructions and examples, a description of software features, and the software license. The software is intended to be used by regulatory agencies, industry, and the research community. It is assumed that the reader is familiar with the Python programming language. References are included for additional background on software components. Online documentation, hosted at http://chama.readthedocs.io/, will be updated as new features are added. The online version includes API documentation.
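
To make the underlying objective concrete, the toy sketch below chooses sensor locations by brute force to minimize a scenario-weighted expected detection time with a penalty for undetected scenarios; Chama instead formulates this as a mixed-integer stochastic program, so this is only an illustration of the objective, not the Chama API, and all data are made up.

```python
# Toy illustration of a sensor placement objective of the kind Chama optimizes:
# choose a fixed number of sensor locations to minimize the expected
# (scenario-weighted) detection time, with a penalty for undetected scenarios.
# Brute-force search on made-up data; not the Chama API or formulation.
import itertools
import numpy as np

rng = np.random.default_rng(0)

n_locations, n_scenarios, budget = 8, 20, 2
penalty = 1000.0                        # time charged if no chosen sensor detects

# detect_time[i, s]: time at which a sensor at location i would detect
# scenario s (np.inf if it never does), e.g. taken from dispersion simulations.
detect_time = rng.uniform(1.0, 60.0, size=(n_locations, n_scenarios))
detect_time[rng.random((n_locations, n_scenarios)) < 0.5] = np.inf
scenario_prob = np.full(n_scenarios, 1.0 / n_scenarios)

def expected_detection_time(chosen):
    t = detect_time[list(chosen), :].min(axis=0)     # first detection per scenario
    t = np.where(np.isfinite(t), t, penalty)
    return float(scenario_prob @ t)

best = min(itertools.combinations(range(n_locations), budget),
           key=expected_detection_time)
print("best placement:", best, "objective:", round(expected_detection_time(best), 2))
```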

Milestone Completion Report WBS 1.3.5.05 ECP/VTK-m FY17Q4 [MS-17/03-06] Key Reduce / Spatial Division / Basic Advect / Normals STDA05-4

Moreland, Kenneth D.

The FY17Q4 milestone of the ECP/VTK-m project includes the completion of a key-reduce scheduling mechanism, a spatial division algorithm, an algorithm for basic particle advection, and the computation of smoothed surface normals. With the completion of this milestone, we are able to, respectively, more easily group like elements (a common visualization algorithm operation), provide the fundamentals for geometric search structures, provide the fundamentals for many flow visualization algorithms, and provide more realistic rendering of surfaces approximated with facets.

What Randomized Benchmarking Actually Measures

Physical Review Letters

Proctor, Timothy J.; Rudinger, Kenneth M.; Young, Kevin C.; Sarovar, Mohan S.; Blume-Kohout, Robin J.

Randomized benchmarking (RB) is widely used to measure an error rate of a set of quantum gates, by performing random circuits that would do nothing if the gates were perfect. In the limit of no finite-sampling error, the exponential decay rate of the observable survival probabilities, versus circuit length, yields a single error metric r. For Clifford gates with arbitrary small errors described by process matrices, r was believed to reliably correspond to the mean, over all Clifford gates, of the average gate infidelity between the imperfect gates and their ideal counterparts. We show that this quantity is not a well-defined property of a physical gate set. It depends on the representations used for the imperfect and ideal gates, and the variant typically computed in the literature can differ from r by orders of magnitude. We present new theories of the RB decay that are accurate for all small errors describable by process matrices, and show that the RB decay curve is a simple exponential for all such errors. These theories allow explicit computation of the error rate that RB measures (r), but as far as we can tell it does not correspond to the infidelity of a physically allowed (completely positive) representation of the imperfect gates.
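
For context (this is the standard analysis the abstract refers to, not the paper's new theories), the sketch below fits the exponential RB decay P(m) = A p^m + B to synthetic single-qubit survival data and converts the decay parameter into the error metric r = (d-1)(1-p)/d.

```python
# Sketch of the standard RB analysis: fit the averaged survival probability
# P(m) = A*p^m + B versus sequence length m, then report r = (d-1)(1-p)/d.
# The single-qubit (d = 2) data here are synthetic, generated purely for
# illustration of the fitting step.
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(0)
d = 2
A_true, B_true, p_true = 0.49, 0.5, 0.995

lengths = np.array([1, 2, 4, 8, 16, 32, 64, 128, 256])
shots = 500
ideal = A_true * p_true**lengths + B_true
survival = rng.binomial(shots, ideal) / shots          # finite-sampling noise

def rb_decay(m, A, p, B):
    return A * p**m + B

(A, p, B), _ = curve_fit(rb_decay, lengths, survival, p0=[0.5, 0.99, 0.5])
r = (d - 1) * (1 - p) / d
print(f"fitted p = {p:.5f},  RB error rate r = {r:.2e}")
```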

Evaluating the Viability of Using Compression to Mitigate Silent Corruption of Read-Mostly Application Data

Proceedings - IEEE International Conference on Cluster Computing, ICCC

Levy, Scott L.; Ferreira, Kurt B.; Bridges, Patrick G.

Aggregating millions of hardware components to construct an exascale computing platform will pose significant resilience challenges. In addition to slowdowns associated with detected errors, silent errors are likely to further degrade application performance. Moreover, silent data corruption (SDC) has the potential to undermine the integrity of the results produced by important scientific applications. In this paper, we propose an application-independent mechanism to efficiently detect and correct SDC in read-mostly memory, where SDC may be most likely to occur. We use memory protection mechanisms to maintain compressed backups of application memory. We detect SDC by identifying changes in memory contents that occur without explicit write operations. We demonstrate that, for several applications, our approach can potentially protect a significant fraction of application memory pages from SDC with modest overheads. Moreover, our proposed technique can be straightforwardly combined with many other approaches to provide a significant bulwark against SDC.
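
A minimal user-level sketch of the detect-and-correct flow is shown below; the paper operates at memory-page granularity using memory-protection mechanisms, so this Python version with hypothetical protect/check helpers only illustrates the compressed-backup-plus-checksum idea.

```python
# Minimal sketch of the idea: keep a compressed backup (plus checksum) of
# read-mostly data, detect silent corruption by comparing against the backup,
# and restore from it. Illustrative only; the paper's mechanism works on
# memory pages via memory protection, not on NumPy arrays.
import hashlib
import zlib

import numpy as np

def protect(region: np.ndarray):
    """Snapshot a read-mostly region: compressed copy plus content hash."""
    raw = region.tobytes()
    return zlib.compress(raw, level=1), hashlib.sha256(raw).digest()

def check_and_repair(region: np.ndarray, backup):
    """Return True if silent corruption was detected (and repaired)."""
    compressed, digest = backup
    if hashlib.sha256(region.tobytes()).digest() == digest:
        return False
    region[:] = np.frombuffer(zlib.decompress(compressed),
                              dtype=region.dtype).reshape(region.shape)
    return True

data = np.linspace(0.0, 1.0, 1_000_000)     # stand-in for read-mostly app data
backup = protect(data)

data[123456] += 1e-6                        # simulate a silent bit-level upset
print("corruption detected and repaired:", check_and_repair(data, backup))
print("data intact again:", hashlib.sha256(data.tobytes()).digest() == backup[1])
```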

Enabling Diverse Software Stacks on Supercomputers Using High Performance Virtual Clusters

Proceedings - IEEE International Conference on Cluster Computing, ICCC

Younge, Andrew J.; Laros, James H.; Grant, Ryan E.; Gaines, Brian G.; Brightwell, Ronald B.

While large-scale simulations have been the hallmark of the High Performance Computing (HPC) community for decades, Large Scale Data Analytics (LSDA) workloads are gaining attention within the scientific community, not only as a processing component of large HPC simulations, but also as standalone scientific tools for knowledge discovery. On the path towards exascale, new HPC runtime systems are also emerging in a way that differs from classical distributed computing models. However, system software for such capabilities on the latest extreme-scale DOE supercomputers needs to be enhanced to more appropriately support these types of emerging software ecosystems. In this paper, we propose the use of Virtual Clusters on advanced supercomputing resources to enable systems to support not only HPC workloads, but also emerging big data stacks. Specifically, we have deployed the KVM hypervisor within Cray's Compute Node Linux on an XC-series supercomputer testbed. We also use libvirt and QEMU to manage and provision VMs directly on compute nodes, leveraging Ethernet-over-Aries network emulation. To our knowledge, this is the first known use of KVM on a true MPP supercomputer. We investigate the overhead of our solution using HPC benchmarks, evaluating both single-node performance and weak scaling of a 32-node virtual cluster. Overall, we find that the single-node performance of our solution using KVM on a Cray is very efficient, with near-native performance. However, overhead increases by up to 20% as virtual cluster size increases, due to limitations of the Ethernet-over-Aries bridged network. Furthermore, we deploy Apache Spark with large data analysis workloads in a Virtual Cluster, effectively demonstrating how diverse software ecosystems can be supported by High Performance Virtual Clusters.

A global stochastic programming approach for the optimal placement of gas detectors with nonuniform unavailabilities

Journal of Loss Prevention in the Process Industries

Laird, Carl D.; Liu, Jianfeng

Optimal design of gas detection systems is challenging because of the numerous sources of uncertainty, including weather and environmental conditions, leak location and characteristics, and process conditions. Rigorous CFD simulations of dispersion scenarios combined with stochastic programming techniques have been successfully applied to the problem of optimal gas detector placement; however, rigorous treatment of sensor failure and nonuniform unavailability has received less attention. To improve the reliability of the design, this paper proposes a problem formulation that explicitly considers nonuniform unavailabilities and all backup detection levels. The resulting sensor placement problem is a large-scale mixed-integer nonlinear programming (MINLP) problem that requires a tailored approach for efficient solution. We have developed a multitree method that iteratively solves a sequence of upper-bounding master problems and lower-bounding subproblems. The tailored global solution strategy is tested on a real data problem, and the encouraging numerical results indicate that our solution framework is promising for solving sensor placement problems. This study was selected for the special issue in JLPPI from the 2016 International Symposium of the MKO Process Safety Center.
