Within the EXAALT project, the SNAP [1] approach is being used to develop high accuracy potentials for use in large-scale long-time molecular dynamics simulations of materials behavior. In particular, we have developed a new SNAP potential that is suitable for describing the interplay between helium atoms and vacancies in high-temperature tungsten[2]. This model is now being used to study plasma-surface interactions in nuclear fusion reactors for energy production. The high-accuracy of SNAP potentials comes at the price of increased computational cost per atom and increased computational complexity. The increased cost is mitigated by improvements in strong scaling that can be achieved using advanced algorithms [3].
When very few samples of a random quantity are available from a source distribution of unknown shape, it is usually not possible to accurately infer the exact distribution from which the data samples come. Under-estimation of important quantities such as response variance and failure probabilities can result. For many engineering purposes, including design and risk analysis, we attempt to avoid under-estimation with a strategy to conservatively estimate (bound) these types of quantities -- without being overly conservative -- when only a few samples of a random quantity are available from model predictions or replicate experiments. This report examines a class of related sparse-data uncertainty representation and inference approaches that are relatively simple, inexpensive, and effective. Tradeoffs between the methods' conservatism, reliability, and risk versus number of data samples (cost) are quantified with multi-attribute metrics use d to assess method performance for conservative estimation of two representative quantities: central 95% of response; and 10-4 probability of exceeding a response threshold in a tail of the distribution. Each method's performance is characterized with 10,000 random trials on a large number of diverse and challenging distributions. The best method and number of samples to use in a given circumstance depends on the uncertainty quantity to be estimated, the PDF character, and the desired reliability of bounding the true value. On the basis of this large data base and study, a strategy is proposed for selecting the method and number of samples for attaining reasonable credibility levels in bounding these types of quantities when sparse samples of random variables or functions are available from experiments or simulations.
There has recently been a great deal of interest in employing immiscible solutes to stabilize nanocrystalline microstructures. Existing modeling efforts largely rely on mesoscale Monte Carlo approaches that employ a simplified model of the microstructure and result in highly homogeneous segregation to grain boundaries. However, there is ample evidence from experimental and modeling studies that demonstrates segregation to grain boundaries is highly non-uniform and sensitive to boundary character. This work employs a realistic nanocrystalline microstructure with experimentally relevant global solute concentrations to illustrate inhomogeneous boundary segregation. Furthermore, experiments quantifying segregation in thin films are reported that corroborate the prediction that grain boundary segregation is highly inhomogeneous. In addition to grain boundary structure modifying the degree of segregation, the existence of a phase transformation between low and high solute content grain boundaries is predicted. In order to conduct this study, new embedded atom method interatomic potentials are developed for Pt, Au, and the PtAu binary alloy.
Triangle counting serves as a key building block for a set of important graph algorithms in network science. In this paper, we address the IEEE HPEC Static Graph Challenge problem of triangle counting, focusing on obtaining the best parallel performance on a single multicore node. Our implementation uses a linear algebra-based approach to triangle counting that has grown out of work related to our miniTri data analytics miniapplication [1] and our efforts to pose graph algorithms in the language of linear algebra. We leverage KokkosKernels to implement this approach efficiently on multicore architectures. Our performance results are competitive with the fastest known graph traversal-based approaches and are significantly faster than the Graph Challenge reference implementations, up to 670,000 times faster than the C++ reference and 10,000 times faster than the Python reference on a single Intel Haswell node.
In order to achieve exascale systems, application resilience needs to be addressed. Some programming models, such as task-DAG (directed acyclic graphs) architectures, currently embed resilience features whereas traditional SPMD (single program, multiple data) and message-passing models do not. Since a large part of the community's code base follows the latter models, it is still required to take advantage of application characteristics to minimize the overheads of fault tolerance. To that end, this paper explores how recovering from hard process/node failures in a local manner is a natural approach for certain applications to obtain resilience at lower costs in faulty environments. In particular, this paper targets enabling online, semitransparent local recovery for stencil computations on current leadership-class systems as well as presents programming support and scalable runtime mechanisms. Also described and demonstrated in this paper is the effect of failure masking, which allows the effective reduction of impact on total time to solution due to multiple failures. Furthermore, we discuss, implement, and evaluate ghost region expansion and cell-to-rank remapping to increase the probability of failure masking. To conclude, this paper shows the integration of all aforementioned mechanisms with the S3D combustion simulation through an experimental demonstration (using the Titan system) of the ability to tolerate high failure rates (i.e., node failures every five seconds) with low overhead while sustaining performance at large scales. In addition, this demonstration also displays the failure masking probability increase resulting from the combination of both ghost region expansion and cell-to-rank remapping.
The discontinuous Petrov–Galerkin (DPG) methodology of Demkowicz and Gopalakrishnan (2010, 2011) guarantees the optimality of the solution in an energy norm, and provides several features facilitating adaptive schemes. A key question that has not yet been answered in general – though there are some results for Poisson, e.g.– is how best to precondition the DPG system matrix, so that iterative solvers may be used to allow solution of large-scale problems. In this paper, we detail a strategy for preconditioning the DPG system matrix using geometric multigrid which we have implemented as part of Camellia (Roberts, 2014, 2016), and demonstrate through numerical experiments its effectiveness in the context of several variational formulations. We observe that in some of our experiments, the behavior of the preconditioner is closely tied to the discrete test space enrichment. We include experiments involving adaptive meshes with hanging nodes for lid-driven cavity flow, demonstrating that the preconditioners can be applied in the context of challenging problems. We also include a scalability study demonstrating that the approach – and our implementation – scales well to many MPI ranks.