Slycat™ is a web-based system for performing data analysis and visualization of potentially large quantities of remote, high-dimensional data. Slycat™ specializes in working with ensemble data. An ensemble is a group of related data sets, which typically consists of a set of simulation runs exploring the same problem space. An ensemble can be thought of as a set of samples within a multivariate domain, where each sample is a vector whose value defines a point in high-dimensional space. To understand and describe the underlying problem being modeled in the simulations, ensemble analysis looks for shared behaviors and common features across the group of runs. Additionally, ensemble analysis tries to quantify differences found in any members that deviate from the rest of the group. The Slycat™ system integrates data management, scalable analysis, and visualization. Results are viewed remotely on a user’s desktop via commodity web clients using a multi-tiered hierarchy of computation and data storage, as shown in Figure 1. Our goal is to operate on data as close to the source as possible, thereby reducing time and storage costs associated with data movement. Consequently, we are working to develop parallel analysis capabilities that operate on High Performance Computing (HPC) platforms, to explore approaches for reducing data size, and to implement strategies for staging computation across the Slycat™ hierarchy. Within Slycat™, data and visual analysis are organized around projects, which are shared by a project team. Project members are explicitly added, each with a designated set of permissions. Although users sign in to access Slycat™, individual accounts are not maintained. Instead, authentication is used to determine project access. Within projects, Slycat™ models capture analysis results and enable data exploration through various visual representations. Although for scientists each simulation run is a model of real-world phenomena given certain conditions, we use the term model to refer to our modeling of the ensemble data, not the physics. Different model types often provide complementary perspectives on data features when analyzing the same data set. Each model visualizes data at several levels of abstraction, allowing the user to range from viewing the ensemble holistically to accessing numeric parameter values for a single run. Bookmarks provide a mechanism for sharing results, enabling interesting model states to be labeled and saved.
Within the SEQUOIA project, funded by the DARPA EQUiPS program, we pursue algorithmic approaches that enable comprehensive design under uncertainty through the inclusion of aleatory/parametric and epistemic/model-form uncertainties within scalable forward/inverse UQ approaches. These statistical methods are embedded within design processes that manage computational expense through active subspace, multilevel-multifidelity, and reduced-order modeling approximations. To demonstrate these methods, we focus on the design of devices that involve multi-physics interactions in advanced aerospace vehicles. A particular problem of interest is the shape design of nozzles for advanced vehicles such as the Northrop Grumman UCAS X-47B, involving coupled aero-structural-thermal simulations of nozzle performance. In this paper, we explore a combination of multilevel and multifidelity forward and inverse UQ algorithms to reduce the overall computational cost of the analysis by leveraging hierarchies of model form (i.e., multifidelity hierarchies) and solution discretization (i.e., multilevel hierarchies) in order to exploit trade-offs between solution accuracy and cost. In particular, we seek the most cost-effective fusion of information across complex multi-dimensional modeling hierarchies. Results to date indicate the utility of multiple approaches, including methods that optimally allocate resources when estimator variance varies smoothly across levels, methods that allocate sufficient sampling density based on sparsity estimates, and methods that employ greedy multilevel refinement.
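For orientation, the cost savings offered by a multilevel hierarchy can be illustrated with the standard multilevel Monte Carlo estimator and its variance-minimizing sample allocation (a generic illustration, not the specific SEQUOIA estimators):

```latex
\hat{Q}^{\mathrm{ML}} \;=\; \sum_{\ell=0}^{L} \frac{1}{N_\ell}\sum_{i=1}^{N_\ell}
  \Bigl(Q_\ell^{(i)} - Q_{\ell-1}^{(i)}\Bigr), \qquad Q_{-1}\equiv 0,
\qquad\qquad
N_\ell \;\propto\; \sqrt{V_\ell / C_\ell},
```

where V_ell and C_ell denote the variance and per-sample cost of the level-ell correction; most samples are therefore taken on the cheap, coarse levels, with only a few corrections computed on the expensive, fine levels.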
Recent high-performance computing (HPC) platforms such as the Trinity Advanced Technology System (ATS-1) feature burst buffer resources that can have a dramatic impact on an application’s I/O performance. While these non-volatile memory (NVM) resources provide a new tier in the storage hierarchy, developers must find the right way to incorporate the technology into their applications in order to reap the benefits. Similar to other laboratories, Sandia is actively investigating ways in which these resources can be incorporated into our existing libraries and workflows without burdening our application developers with excessive, platform-specific details. This FY18Q1 milestone summarizes our progress in adapting the Sandia Parallel Aerodynamics and Reentry Code (SPARC) in Sandia’s ATDM program to leverage Trinity’s burst buffers for checkpoint/restart operations. We investigated four different approaches with varying tradeoffs in this work: (1) simply updating the job script to use stage-in/stage-out burst buffer directives, (2) modifying SPARC to use LANL’s hierarchical I/O (HIO) library to store/retrieve checkpoints, (3) updating Sandia’s IOSS library to incorporate the burst buffer in all mesh I/O operations, and (4) modifying SPARC to use our Kelpie distributed memory library to store/retrieve checkpoints. Team members were successful in generating initial implementations for all four approaches, but were unable to obtain performance numbers in time for this report (initial problem sizes were not large enough to stress I/O, and a SPARC refactor will require changes to our code). When we presented our work to the SPARC team, they expressed the most interest in the second and third approaches. The HIO work was favored because it is lightweight, unobtrusive, and should be portable to ATS-2. The IOSS work is seen as a long-term solution and is favored because all I/O work (including checkpoints) can be deferred to a single library.
Safety basis analysts throughout the U.S. Department of Energy (DOE) complex rely heavily on the information provided in the DOE Handbook, DOE-HDBK-3010, Airborne Release Fractions/Rates and Respirable Fractions for Nonreactor Nuclear Facilities, to determine radionuclide source terms from postulated accident scenarios. In calculating source terms, analysts tend to use the DOE Handbook’s bounding values on airborne release fractions (ARFs) and respirable fractions (RFs) for various categories of insults (representing potential accident release categories). This is typically due both to time constraints and to the desire to avoid regulatory critique. Unfortunately, these bounding ARFs/RFs represent extremely conservative values. Moreover, they were derived from very limited small-scale bench/laboratory experiments and/or from engineering judgment. Thus, the basis for the data may not be representative of the actual unique accident conditions and configurations being evaluated. The goal of this research is to develop a more accurate and defensible method for determining bounding values for the DOE Handbook using state-of-the-art multi-physics-based computer codes.
Containerization, or OS-level virtualization, has taken root within the computing industry. However, container utilization and its impact on performance and functionality within High Performance Computing (HPC) is still relatively undefined. This paper investigates the use of containers with advanced supercomputing and HPC system software. With this, we define a model for parallel MPI application DevOps and deployment using containers to enhance development effort and provide container portability from laptop to clouds or supercomputers. In this endeavor, we extend the use of Singularity containers to a Cray XC-series supercomputer. We use the HPCG and IMB benchmarks to investigate potential points of overhead and scalability with containers on a Cray XC30 testbed system. Furthermore, we deploy the same containers with Docker on Amazon's Elastic Compute Cloud (EC2) and compare against our Cray supercomputer testbed. Our results indicate that Singularity containers operate at native performance when dynamically linking Cray's MPI libraries on a Cray supercomputer testbed, and that while Amazon EC2 may be useful for initial DevOps and testing, scaling HPC applications is better suited to supercomputing resources such as a Cray.
Optimizing thermal generation commitments and dispatch in the presence of high penetrations of renewable resources such as solar energy requires a characterization of their stochastic properties. In this study, we describe novel methods designed to create day-ahead, wide-area probabilistic solar power scenarios based only on historical forecasts and associated observations of solar power production. Each scenario represents a possible trajectory for solar power in next-day operations with an associated probability computed by algorithms that use historical forecast errors. Scenarios are created by segmenting historical data, fitting non-parametric error distributions using epi-splines, and then computing specific quantiles from these distributions. Additionally, we address the challenge of establishing an upper bound on solar power output. Our specific application driver is for use in stochastic variants of core power systems operations optimization problems, e.g., unit commitment and economic dispatch. These problems require as input a range of possible future realizations of renewables production. However, the utility of such probabilistic scenarios extends to other contexts, e.g., operator and trader situational awareness. Finally, we compare the performance of our approach to a recently proposed method based on quantile regression, and demonstrate that our method performs comparably in terms of two widely used metrics for assessing the quality of probabilistic scenarios: the Energy score and the Variogram score.
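A minimal sketch of the scenario construction, with empirical hour-by-hour error quantiles standing in for the epi-spline density fits described above (all array names and shapes are illustrative):

```python
# Hedged sketch: build day-ahead solar power scenarios from historical
# forecast errors.  The method above fits non-parametric error densities
# with epi-splines; plain empirical quantiles stand in for that step here.
import numpy as np

def build_scenarios(hist_forecast, hist_actual, new_forecast,
                    quantiles=(0.1, 0.5, 0.9), capacity=1.0):
    """Return one 24-hour scenario per requested quantile."""
    errors = hist_actual - hist_forecast           # shape (n_days, 24)
    scenarios = []
    for q in quantiles:
        err_q = np.quantile(errors, q, axis=0)     # hour-by-hour error quantile
        scen = np.clip(new_forecast + err_q, 0.0, capacity)
        scenarios.append(scen)
    return np.array(scenarios)                     # shape (n_quantiles, 24)
```

Clipping each trajectory to the plant capacity reflects the need for an upper bound on solar power output noted above.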
We describe pyomo.dae, an open-source Python-based modeling framework that enables high-level abstract specification of optimization problems with differential and algebraic equations. The pyomo.dae framework is integrated with the Pyomo open-source algebraic modeling language and is available at http://www.pyomo.org. One key feature of pyomo.dae is that it does not restrict users to standard, predefined forms of differential equations, providing a high degree of modeling flexibility and the ability to express constraints that cannot be easily specified in other modeling frameworks. Other key features of pyomo.dae are the ability to specify optimization problems with high-order differential equations and partial differential equations, defined on restricted domain types, and the ability to automatically transform high-level abstract models into finite-dimensional algebraic problems that can be solved with off-the-shelf solvers. Moreover, pyomo.dae users can leverage existing capabilities of Pyomo to embed differential equation models within stochastic and integer programming models and mathematical programs with equilibrium constraints. Collectively, these features enable the exploration of new modeling concepts and discretization schemes, and the benchmarking of state-of-the-art optimization solvers.
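A minimal pyomo.dae sketch, using a toy ODE-constrained problem rather than any example from the paper, showing how a continuous model is declared and then transformed into an algebraic problem:

```python
# Illustrative pyomo.dae model: minimize x(1) subject to dx/dt = -x, x(0) = 1,
# discretized with backward finite differences for an off-the-shelf NLP solver.
from pyomo.environ import (ConcreteModel, Var, Constraint, Objective,
                           TransformationFactory, SolverFactory)
from pyomo.dae import ContinuousSet, DerivativeVar

m = ConcreteModel()
m.t = ContinuousSet(bounds=(0, 1))
m.x = Var(m.t)
m.dxdt = DerivativeVar(m.x, wrt=m.t)

m.ode = Constraint(m.t, rule=lambda m, t: m.dxdt[t] == -m.x[t])
m.x[0].fix(1.0)                      # initial condition
m.obj = Objective(expr=m.x[1])       # illustrative objective

# Transform the continuous model into a finite-dimensional algebraic problem.
TransformationFactory('dae.finite_difference').apply_to(m, nfe=20,
                                                        scheme='BACKWARD')
# SolverFactory('ipopt').solve(m)    # requires an installed NLP solver
```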
In numerous applications, scientists and engineers acquire varied forms of data that partially characterize the inputs to an underlying physical system. This data is then used to inform decisions such as controls and designs. Consequently, it is critical that the resulting control or design is robust to the inherent uncertainties associated with the unknown probabilistic characterization of the model inputs. In this work, we consider optimal control and design problems constrained by partial differential equations with uncertain inputs. We do not assume a known probabilistic model for the inputs; rather, we formulate the problem as a distributionally robust optimization problem in which the outer minimization problem determines the control or design, while the inner maximization problem determines the worst-case probability measure that matches desired characteristics of the data. We analyze the inner maximization problem in the space of measures and introduce a novel measure approximation technique, based on the approximation of continuous functions, to discretize the unknown probability measure. Finally, we prove consistency of our approximated min-max problem and conclude with numerical results.
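In symbols, the formulation described above takes a min-max form roughly like the following (the notation is illustrative, not the paper's):

```latex
\min_{z \in Z_{\mathrm{ad}}} \;\; \sup_{\mu \in \mathcal{A}} \;
  \int_{\Xi} J\bigl(u(z,\xi), z\bigr)\, d\mu(\xi)
\quad \text{subject to} \quad
c\bigl(u(z,\xi), z, \xi\bigr) = 0 \;\; \text{for all } \xi \in \Xi,
```

where c is the PDE constraint, z the control or design, u the PDE solution, and A the ambiguity set of probability measures consistent with the available data.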
We derive an intrinsically temperature-dependent approximation to the correlation grand potential for many-electron systems in thermodynamic equilibrium in the context of finite-temperature reduced-density-matrix-functional theory (FT-RDMFT). We demonstrate its accuracy by calculating the magnetic phase diagram of the homogeneous electron gas. We compare it to known limits from highly accurate quantum Monte Carlo calculations, as well as to phase diagrams obtained within existing exchange-correlation approximations from density functional theory and zero-temperature RDMFT.
With the rapid global spread in the use of Digital Image Correlation (DIC), it is important that there be standard methods of verifying and validating DIC codes. To this end, the DIC Challenge board was formed and is maintained under the auspices of the Society for Experimental Mechanics (SEM) and the International DIC Society (iDICs). The goal of the DIC Challenge board and the 2D–DIC Challenge is to supply a set of well-vetted sample images and a set of analysis guidelines for standardized reporting of 2D–DIC results from these sample images, as well as for comparing the inherent accuracy of different approaches and for providing users with a means of assessing their proper implementation. This document outlines the goals of the challenge, describes the image sets that are available, and gives a comparison between 12 commercial and academic 2D–DIC codes using two of the challenge image sets.
The role of an external field on capillary waves at the liquid-vapor interface of a dipolar fluid is investigated using molecular dynamics simulations. For fields parallel to the interface, the interfacial width squared increases linearly with respect to the logarithm of the size of the interface across all field strengths tested. The value of the slope decreases with increasing field strength, indicating that the field dampens the capillary waves. With the inclusion of the parallel field, the surface stiffness increases with increasing field strength faster than the surface tension. For fields perpendicular to the interface, the interfacial width squared is linear with respect to the logarithm of the size of the interface for small field strengths, and the surface stiffness is less than the surface tension. Above a critical field strength that decreases as the size of the interface increases, the interface becomes unstable due to the increased amplitude of the capillary waves.
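The reported scaling corresponds to the standard capillary-wave relation, quoted here for orientation (prefactor conventions vary by author):

```latex
w^2(L) \;=\; w_0^2 \;+\; \frac{k_B T}{2\pi\,\tilde{\gamma}}\,
             \ln\!\left(\frac{L}{\xi}\right),
```

where w is the interfacial width, L the lateral size of the interface, tilde-gamma the surface stiffness, and xi a short-wavelength cutoff; a larger stiffness therefore produces the smaller slope observed under the parallel field.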
Algebraic multigrid (AMG) preconditioners are considered for discretized systems of partial differential equations (PDEs) where unknowns associated with different physical quantities are not necessarily colocated at mesh points. Specifically, we investigate a Q2−Q1 mixed finite element discretization of the incompressible Navier–Stokes equations where the number of velocity nodes is much greater than the number of pressure nodes. Consequently, some velocity degrees of freedom (DOFs) are defined at spatial locations where there are no corresponding pressure DOFs. Thus, AMG approaches leveraging this colocated structure are not applicable. This paper instead proposes an automatic AMG coarsening that mimics certain pressure/velocity DOF relationships of the Q2−Q1 discretization. The main idea is to first automatically define coarse pressures in a somewhat standard AMG fashion and then to carefully (but automatically) choose coarse velocity unknowns so that the spatial location relationship between pressure and velocity DOFs resembles that on the finest grid. To define coefficients within the intergrid transfers, an energy minimization AMG (EMIN-AMG) is utilized. EMIN-AMG is not tied to specific coarsening schemes and grid transfer sparsity patterns, and so it is applicable to the proposed coarsening. Numerical results highlighting solver performance are given on Stokes and incompressible Navier–Stokes problems.
This paper describes versions of OPAL, the Occam-Plausibility Algorithm (Farrell et al. in J Comput Phys 295:189–208, 2015) in which the use of Bayesian model plausibilities is replaced with information-theoretic methods, such as the Akaike information criterion and the Bayesian information criterion. Applications to complex systems of coarse-grained molecular models approximating atomistic models of polyethylene materials are described. All of these model selection methods take into account uncertainties in the model, the observational data, the model parameters, and the predicted quantities of interest. A comparison of the models chosen by Bayesian model selection criteria and those chosen by the information-theoretic criteria is given.
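For reference, the two information criteria mentioned above are, in their usual forms,

```latex
\mathrm{AIC} \;=\; 2k \;-\; 2\ln \hat{L}, \qquad
\mathrm{BIC} \;=\; k \ln n \;-\; 2\ln \hat{L},
```

where k is the number of model parameters, n the number of observations, and L-hat the maximized likelihood; in both cases the lowest-scoring model is preferred.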
In this study we developed an efficient Bayesian inversion framework for interpreting marine seismic Amplitude Versus Angle and Controlled-Source Electromagnetic data for marine reservoir characterization. The framework uses a multi-chain Markov-chain Monte Carlo sampler, which is a hybrid of DiffeRential Evolution Adaptive Metropolis and Adaptive Metropolis samplers. The inversion framework is tested by estimating reservoir-fluid saturations and porosity based on marine seismic and Controlled-Source Electromagnetic data. The multi-chain Markov-chain Monte Carlo sampler is scalable in terms of the number of chains, and is useful for computationally demanding Bayesian model calibration in scientific and engineering problems. As a demonstration, the approach is used to efficiently and accurately estimate the porosity and saturations in a representative layered synthetic reservoir. The results indicate that the seismic Amplitude Versus Angle and Controlled-Source Electromagnetic joint inversion provides better estimation of reservoir saturations than the seismic Amplitude Versus Angle only inversion, especially for the parameters in deep layers. The performance of the inversion approach for various levels of noise in observational data was evaluated; reasonable estimates can be obtained with noise levels of up to 25%. Sampling efficiency due to the use of multiple chains was also checked and was found to have almost linear scalability.
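A bare-bones sketch of a multi-chain sampler of this general kind, with independent random-walk Metropolis chains standing in for the DREAM/Adaptive-Metropolis hybrid; the log-posterior callable, which would wrap the seismic AVA and CSEM forward models and the priors, is assumed:

```python
# Hedged sketch: multiple independent random-walk Metropolis chains.
# This is a simplified stand-in, not the paper's hybrid sampler.
import numpy as np

def multichain_metropolis(log_post, x0, n_steps=5000, step=0.1, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    chains = np.array(x0, dtype=float)            # shape (n_chains, n_params)
    logp = np.array([log_post(x) for x in chains])
    samples = []
    for _ in range(n_steps):
        for c in range(chains.shape[0]):          # chains evolve independently
            prop = chains[c] + step * rng.standard_normal(chains.shape[1])
            lp = log_post(prop)
            if np.log(rng.random()) < lp - logp[c]:
                chains[c], logp[c] = prop, lp
        samples.append(chains.copy())
    return np.array(samples)                      # (n_steps, n_chains, n_params)
```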
We propose a porous materials analysis pipeline using persistent homology. We first compute the persistent homology of binarized 3D images of sampled material subvolumes. For each image we compute sets of homology intervals, which are represented as summary graphics called persistence diagrams. We convert persistence diagrams into image vectors in order to analyze the similarity of the homology of the material images using mature tools for image analysis. Each image is treated as a vector, and we compute its principal components to extract features. We fit a statistical model using the loadings of principal components to estimate material porosity, permeability, anisotropy, and tortuosity. We also propose an adaptive version of the structural similarity index (SSIM), a similarity metric for images, as a measure to determine the statistical representative elementary volumes (sREV) for persistent homology. Thus we provide a capability for making statistical inferences about the fluid flow and transport properties of porous materials based on their geometry and connectivity.
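A hedged sketch of the pipeline's feature-extraction stage, with a plain 2D histogram standing in for the persistence-image construction and the diagrams and porosity values assumed to be precomputed inputs:

```python
# Hedged sketch: persistence diagrams -> image vectors -> PCA features ->
# regression on a material property.  Names and the rasterization scheme
# are illustrative, not the paper's exact construction.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

def diagram_to_vector(diagram, bins=32, bounds=(0.0, 1.0)):
    """Rasterize (birth, death) pairs into a flattened image vector."""
    birth, death = diagram[:, 0], diagram[:, 1]
    img, _, _ = np.histogram2d(birth, death - birth,
                               bins=bins, range=[bounds, bounds])
    return img.ravel()

def fit_property_model(diagrams, porosity, n_components=10):
    X = np.array([diagram_to_vector(d) for d in diagrams])
    pca = PCA(n_components=n_components).fit(X)
    model = LinearRegression().fit(pca.transform(X), porosity)
    return pca, model
```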
The FY18Q1 milestone of the ECP/VTK-m project includes the implementation of a multiblock data set, the completion of a gradients filtering operation, and the release of version 1.1 of the VTK-m software. With the completion of this milestone, the new multiblock data set allows us to iteratively schedule algorithms on composite data structures such as assemblies or hierarchies like AMR. The new gradient algorithms approximate derivatives of fields in 3D structures with finite differences. Finally, the release of VTK-m version 1.1 tags a stable release of the software that can more easily be incorporated into external projects.
We study the optimization of an energy function used by the meshing community to measure and improve mesh quality. This energy is non-traditional because it depends on both the primal triangulation and its dual Voronoi (power) diagram. The energy is a measure of the mesh's quality for use in Discrete Exterior Calculus (DEC), a method for numerically solving PDEs. In DEC, the PDE domain is triangulated, and this mesh is used to obtain discrete approximations of the continuous operators in the PDE. The energy of a mesh gives an upper bound on the error of the discrete diagonal approximation of the Hodge star operator. In practice, one begins with an initial mesh and then makes adjustments to produce a mesh of lower energy. However, we have discovered several shortcomings in directly optimizing this energy, e.g., its non-convexity, and we show that the search for an optimized mesh may lead to mesh inversion (malformed triangles). We propose a new energy function to address some of these issues.
Simulating HPC systems is a difficult task and the emergence of “Beyond CMOS” architectures and execution models will increase that difficulty. This document presents a “tutorial” on some of the simulation challenges faced by conventional and non-conventional architectures (Section 1) and goals and requirements for simulating Beyond CMOS systems (Section 2). These provide background for proposed short- and long-term roadmaps for simulation efforts at Sandia (Sections 3 and 4). Additionally, a brief explanation of a proof-of-concept integration of a Beyond CMOS architectural simulator is presented (Section 2.3).
Conventional wisdom in the spacecraft domain is that on-orbit computation is expensive, and thus, information is traditionally funneled to the ground as directly as possible. The explosion of information due to larger sensors, the advancements of Moore's law, and other considerations lead us to revisit this practice. In this article, we consider the trade-off between computation, storage, and transmission, viewed as an energy minimization problem.
Unlike general-purpose computer architectures that comprise complex processor cores and sequential computation, the brain is innately parallel and contains highly complex connections between its computational units (neurons). Key to the architecture of the brain is functionality enabled by the combined effect of spiking communication and sparse connectivity, with unique variable efficacies and temporal latencies. Utilizing these neuroscience principles, we have developed the Spiking Temporal Processing Unit (STPU) architecture, which is well suited for areas such as pattern recognition and natural language processing. In this paper, we formally describe the STPU, implement the STPU on a field programmable gate array, and show measured performance data.
Most existing concepts for hardware implementation of reversible computing invoke an adiabatic computing paradigm, in which individual degrees of freedom (e.g., node voltages) are synchronously transformed under the influence of externally supplied driving signals. But distributing these "power/clock" signals to all gates within a design while efficiently recovering their energy is difficult. Can we reduce clocking overhead using a ballistic approach, wherein data signals self-propagating between devices drive most state transitions? Traditional concepts of ballistic computing, such as the classic Billiard-Ball Model, typically rely on a precise synchronization of interacting signals, which can fail due to exponential amplification of timing differences when signals interact. In this paper, we develop a general model of Asynchronous Ballistic Reversible Computing (ABRC) that aims to address these problems by eliminating the requirement for precise synchronization between signals. Asynchronous reversible devices in this model are isomorphic to a restricted set of Mealy finite-state machines. We explore ABRC devices having up to 3 bidirectional I/O terminals and up to 2 internal states, identifying a simple pair of such devices that comprises a computationally universal set of primitives. We also briefly discuss how ABRC might be implemented using single flux quanta in superconducting circuits.
Proceedings of LLVM-HPC 2017: 4th Workshop on the LLVM Compiler Infrastructure in HPC - Held in conjunction with SC 2017: The International Conference for High Performance Computing, Networking, Storage and Analysis
Optimizing compilers for task-level parallelism are still in their infancy. This work explores a compiler front end that translates OpenMP tasking semantics to Tapir, an extension to LLVM IR that represents fork-join parallelism. This enables analyses and optimizations that were previously inaccessible to OpenMP codes, as well as the ability to target additional runtimes at code generation. Using a Cilk runtime back end, we compare results to existing OpenMP implementations. Initial performance results for the Barcelona OpenMP task suite show performance improvements over existing implementations.
Proceedings of PDSW-DISCS 2017 - 2nd Joint International Workshop on Parallel Data Storage and Data Intensive Scalable Computing Systems - Held in conjunction with SC 2017: The International Conference for High Performance Computing, Networking, Storage and Analysis
Significant challenges exist in the efficient retrieval of data from extreme-scale simulations. An important and evolving method of addressing these challenges is application-level metadata management. Historically, HDF5 and NetCDF have eased data retrieval by offering rudimentary attribute capabilities that provide basic metadata. ADIOS simplified data retrieval by utilizing metadata for each process' data. EMPRESS provides a simple example of the next step in this evolution by integrating per-process metadata with the storage system itself, making it more broadly useful than single file or application formats. Additionally, it allows for more robust and customizable metadata.
Many applications, such as PDE-based simulations and machine learning, apply BLAS/LAPACK routines to large groups of small matrices. While existing batched BLAS APIs provide meaningful speedup for this problem type, a non-canonical data layout enabling cross-matrix vectorization may provide further significant speedup. In this paper, we propose a new compact data layout that interleaves matrices in blocks according to the SIMD vector length. We combine this compact data layout with a new interface to BLAS/LAPACK routines that can be used within a hierarchical parallel application. Our layout provides up to 14x, 45x, and 27x speedup against OpenMP loops around optimized DGEMM, DTRSM, and DGETRF kernels, respectively, on the Intel Knights Landing architecture. We discuss the compact batched BLAS/LAPACK implementations in two libraries, KokkosKernels and Intel® Math Kernel Library. We demonstrate the APIs in a line solver for coupled PDEs. Finally, we present a detailed performance analysis of our kernels.
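A NumPy sketch of the interleaving idea, under the assumption that blocks of `vlen` matrices are regrouped so that corresponding entries sit contiguously for cross-matrix SIMD vectorization (the actual KokkosKernels/MKL layouts may differ in detail):

```python
# Hedged sketch of a compact (interleaved) batched layout.
import numpy as np

def to_compact(batch, vlen=8):
    """(N, m, n) row-major batch -> (N//vlen, m, n, vlen) interleaved blocks."""
    N, m, n = batch.shape
    assert N % vlen == 0, "pad the batch to a multiple of the vector length"
    return (batch.reshape(N // vlen, vlen, m, n)
                 .transpose(0, 2, 3, 1)          # vector index innermost
                 .copy())

def from_compact(compact):
    """Inverse transform back to the canonical (N, m, n) batch layout."""
    nblk, m, n, vlen = compact.shape
    return compact.transpose(0, 3, 1, 2).reshape(nblk * vlen, m, n).copy()
```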
Within the EXAALT project, the SNAP [1] approach is being used to develop high-accuracy potentials for use in large-scale, long-time molecular dynamics simulations of materials behavior. In particular, we have developed a new SNAP potential that is suitable for describing the interplay between helium atoms and vacancies in high-temperature tungsten [2]. This model is now being used to study plasma-surface interactions in nuclear fusion reactors for energy production. The high accuracy of SNAP potentials comes at the price of increased computational cost per atom and increased computational complexity. The increased cost is mitigated by improvements in strong scaling that can be achieved using advanced algorithms [3].
When very few samples of a random quantity are available from a source distribution of unknown shape, it is usually not possible to accurately infer the exact distribution from which the data samples come. Under-estimation of important quantities such as response variance and failure probabilities can result. For many engineering purposes, including design and risk analysis, we attempt to avoid under-estimation with a strategy to conservatively estimate (bound) these types of quantities -- without being overly conservative -- when only a few samples of a random quantity are available from model predictions or replicate experiments. This report examines a class of related sparse-data uncertainty representation and inference approaches that are relatively simple, inexpensive, and effective. Tradeoffs between the methods' conservatism, reliability, and risk versus number of data samples (cost) are quantified with multi-attribute metrics used to assess method performance for conservative estimation of two representative quantities: the central 95% of the response, and the 10^-4 probability of exceeding a response threshold in a tail of the distribution. Each method's performance is characterized with 10,000 random trials on a large number of diverse and challenging distributions. The best method and number of samples to use in a given circumstance depend on the uncertainty quantity to be estimated, the PDF character, and the desired reliability of bounding the true value. On the basis of this large database and study, a strategy is proposed for selecting the method and number of samples for attaining reasonable credibility levels in bounding these types of quantities when sparse samples of random variables or functions are available from experiments or simulations.
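As one concrete instance of this general class of techniques (a standard one-sided normal tolerance limit, not necessarily among the report's methods), an upper quantile of the response can be conservatively bounded from a handful of samples as follows:

```python
# Hedged illustration: one-sided normal tolerance limit that bounds the
# p-th quantile of a response with confidence `conf` from n samples.
import numpy as np
from scipy.stats import norm, nct

def upper_tolerance_limit(samples, p=0.95, conf=0.90):
    x = np.asarray(samples, dtype=float)
    n = x.size
    delta = norm.ppf(p) * np.sqrt(n)              # noncentrality parameter
    k = nct.ppf(conf, df=n - 1, nc=delta) / np.sqrt(n)
    return x.mean() + k * x.std(ddof=1)

# e.g., bound the 95th percentile with 90% confidence from 10 model runs:
# upper_tolerance_limit(np.random.default_rng(0).normal(size=10))
```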
We develop and demonstrate a new, hybrid simulation approach for charged fluids, which combines the accuracy of the nonlocal, classical density functional theory (cDFT) with the efficiency of the Poisson–Nernst–Planck (PNP) equations. The approach is motivated by the fact that the more accurate description of the physics in the cDFT model is required only near the charged surfaces, while away from these regions the PNP equations provide an acceptable representation of the ionic system. We formulate the hybrid approach in two stages. The first stage defines a coupled hybrid model in which the PNP and cDFT equations act independently on two overlapping domains, subject to suitable interface coupling conditions. At the second stage we apply the principles of the alternating Schwarz method to the hybrid model by using the interface conditions to define the appropriate boundary conditions and volume constraints exchanged between the PNP and the cDFT subdomains. Numerical examples with two representative ionic systems demonstrate the numerical properties of the method and its potential to reduce the computational cost of a full cDFT calculation, while retaining the accuracy of the latter near the charged surfaces.
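A schematic of the second-stage alternating Schwarz loop, with the PNP and cDFT subdomain solves abstracted as callables that exchange interface data (solver internals and the exact interface conditions are elided):

```python
# Hedged sketch of the alternating Schwarz iteration between the two
# overlapping subdomains; `solve_pnp` and `solve_cdft` are assumed to
# return (solution, interface_values) given the other side's interface data.
import numpy as np

def alternating_schwarz(solve_pnp, solve_cdft, iface_pnp0, iface_cdft0,
                        max_iter=50, tol=1e-8):
    iface_pnp = np.asarray(iface_pnp0, dtype=float)
    iface_cdft = np.asarray(iface_cdft0, dtype=float)
    for _ in range(max_iter):
        # Solve cDFT near the charged surfaces with interface data from PNP.
        u_cdft, new_iface_cdft = solve_cdft(iface_pnp)
        # Solve PNP in the bulk with interface data from cDFT.
        u_pnp, new_iface_pnp = solve_pnp(new_iface_cdft)
        change = max(np.max(np.abs(new_iface_pnp - iface_pnp)),
                     np.max(np.abs(new_iface_cdft - iface_cdft)))
        iface_pnp, iface_cdft = new_iface_pnp, new_iface_cdft
        if change < tol:                          # interface data has converged
            break
    return u_pnp, u_cdft
```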