A Tale of Two Systems: Using Containers to Deploy HPC Applications on Supercomputers and Clouds
Abstract not provided.
Continued progress in computing has augmented the quest for higher performance with a new quest for higher energy efficiency. This has led to the re-emergence of Processing-In-Memory (PIM) architectures that offer higher density and performance with some boost in energy efficiency. Past PIM work either integrated a standard CPU with a conventional DRAM to improve the CPU-memory link, or used a bit-level processor with Single Instruction Multiple Data (SIMD) control, but neither matched the energy consumption of the memory to the computation. We originally proposed to develop a new architecture derived from PIM that more effectively addressed energy efficiency for high performance scientific, data analytics, and neuromorphic applications. We also originally planned to implement a von Neumann architecture with arithmetic/logic units (ALUs) that matched the power consumption of an advanced storage array to maximize energy efficiency. Implementing this architecture in storage was our original idea, since by augmenting storage (instead of memory), the system could address both in-memory computation and applications that accessed larger data sets directly from storage, hence Processing-in-Memory-and-Storage (PIMS). However, as our research matured, we discovered several things that changed our original direction, the most important being that a PIM that implements a standard von Neumann-type architecture results in significant energy efficiency improvement, but only about an O(10) performance improvement. In addition, the emergence of new memory technologies moved us to proposing a non-von Neumann architecture, called Superstrider, implemented not in storage, but in a new DRAM technology called High Bandwidth Memory (HBM). HBM is a stacked DRAM technology that includes a logic layer where an architecture such as Superstrider could potentially be implemented.
Computer
Could combining quantum computing and machine learning with Moore's law produce a true 'rebooted computer'? This article posits that a three-technology hybrid-computing approach might yield sufficiently improved answers to a broad class of problems such that energy efficiency will no longer be the dominant concern.
Computational Mechanics
The heterogeneity in mechanical fields introduced by microstructure plays a critical role in the localization of deformation. To resolve this incipient stage of failure, it is therefore necessary to incorporate microstructure with sufficient resolution. On the other hand, computational limitations make it infeasible to represent the microstructure in the entire domain at the component scale. In this study, the authors demonstrate the use of concurrent multiscale modeling to incorporate explicit, finely resolved microstructure in a critical region while resolving the smoother mechanical fields outside this region with a coarser discretization to limit computational cost. The microstructural physics is modeled with a high-fidelity model that incorporates anisotropic crystal elasticity and rate-dependent crystal plasticity to simulate the behavior of a stainless steel alloy. The component-scale material behavior is treated with a lower fidelity model incorporating isotropic linear elasticity and rate-independent J2 plasticity. The microstructural and component scale subdomains are modeled concurrently, with coupling via the Schwarz alternating method, which solves boundary-value problems in each subdomain separately and transfers solution information between subdomains via Dirichlet boundary conditions. In this study, the framework is applied to model incipient localization in tensile specimens during necking.
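The Schwarz alternating method used for the coupling above can be illustrated on a toy problem. The sketch below is a minimal 1D illustration under assumed settings (Poisson problem, two overlapping subdomains, finite differences), not the paper's crystal-plasticity framework: each subdomain is solved separately with Dirichlet data sampled from the other subdomain's latest solution, and the exchange is repeated until the interface values settle.

```python
# Schwarz alternating method for -u'' = 1 on (0, 1), u(0) = u(1) = 0,
# whose exact solution is u(x) = x(1 - x)/2. Two overlapping subdomains,
# [0, 0.6] and [0.4, 1], exchange Dirichlet data at x = 0.4 and x = 0.6.

def thomas(a, b, c, d):
    """Solve a tridiagonal system; a, b, c are sub-, main, super-diagonals."""
    n = len(b)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

def solve_poisson(xl, xr, ul, ur, n):
    """Finite-difference solve of -u'' = 1 on [xl, xr], Dirichlet data ul, ur."""
    h = (xr - xl) / (n + 1)
    d = [-h * h] * n            # u_{i-1} - 2 u_i + u_{i+1} = -h^2
    d[0] -= ul
    d[-1] -= ur
    interior = thomas([1.0] * n, [-2.0] * n, [1.0] * n, d)
    return [ul] + interior + [ur]

n, h = 59, 0.01                 # both subdomains use grid spacing 0.01
u_at_04, u_at_06 = 0.0, 0.0     # initial interface guesses
for _ in range(50):
    u1 = solve_poisson(0.0, 0.6, 0.0, u_at_06, n)   # left subdomain solve
    u_at_04 = u1[round(0.4 / h)]                    # transfer Dirichlet data
    u2 = solve_poisson(0.4, 1.0, u_at_04, 0.0, n)   # right subdomain solve
    u_at_06 = u2[round(0.2 / h)]
# u2 now matches the exact solution on [0.4, 1]; e.g. u(0.5) = 0.125
```

The per-sweep error contraction depends on the overlap width; with the generous overlap used here the interface values converge to machine precision well within 50 sweeps.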
New Journal of Physics
Quantum state tomography on a d-dimensional system demands resources that grow rapidly with d. They may be reduced by using model selection to tailor the number of parameters in the model (i.e., the size of the density matrix). Most model selection methods typically rely on a test statistic and a null theory that describes its behavior when two models are equally good. Here, we consider the loglikelihood ratio. Because of the positivity constraint ρ ≥ 0, quantum state space does not generally satisfy local asymptotic normality (LAN), meaning the classical null theory for the loglikelihood ratio (the Wilks theorem) should not be used. Thus, understanding and quantifying how positivity affects the null behavior of this test statistic is necessary for its use in model selection for state tomography. We define a new generalization of LAN, metric-projected LAN, show that quantum state space satisfies it, and derive a replacement for the Wilks theorem. In addition to enabling reliable model selection, our results shed more light on the qualitative effects of the positivity constraint on state tomography.
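The classical null theory that the paper generalizes can be seen in a few lines. This sketch (an illustration of the Wilks theorem in its simplest nested-Gaussian setting, not the paper's quantum construction) simulates the loglikelihood ratio statistic under the null and checks that its mean matches the chi-squared prediction with one degree of freedom; it is exactly this behavior that the positivity constraint ρ ≥ 0 disrupts.

```python
import math, random

# Nested models for n draws from N(mu, 1): H0 fixes mu = 0, H1 leaves mu
# free (MLE = sample mean). With known unit variance the statistic
# lambda = 2 log LR reduces to n * xbar^2, which Wilks says is chi^2_1
# under the null -- so its mean should be ~1 (the number of extra params).
random.seed(0)
n, trials = 50, 10000
lam = []
for _ in range(trials):
    xbar = sum(random.gauss(0.0, 1.0) for _ in range(n)) / n
    lam.append(n * xbar * xbar)     # 2 log LR for this nested pair

mean_lam = sum(lam) / trials        # Wilks prediction: E[lambda] = df = 1
```

Near a boundary of the parameter space (the quantum case, where ρ ≥ 0 bites) the statistic no longer follows this chi-squared law, which is why the paper's metric-projected replacement for local asymptotic normality is needed.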
Physical Review B
Shock wave interactions with defects, such as pores, are known to play a key role in the chemical initiation of energetic materials. The shock response of hexanitrostilbene is studied through a combination of large-scale reactive molecular dynamics and mesoscale hydrodynamic simulations. In order to extend our simulation capability at the mesoscale to include weak shock conditions (<6 GPa), atomistic simulations of pore collapse are used to define a strain-rate-dependent strength model. Comparing these simulation methods allows us to impose physically reasonable constraints on the mesoscale model parameters. In doing so, we have been able to study shock waves interacting with pores as a function of this viscoplastic material response. We find that the pore collapse behavior of weak shocks is characteristically different than that of strong shocks.
The scientific goal of the ExaWind Exascale Computing Project (ECP) is to advance our fundamental understanding of the flow physics governing whole wind plant performance, including wake formation, complex terrain impacts, and turbine-turbine interaction effects. Current methods for modeling wind plant performance fall short due to insufficient model fidelity and inadequate treatment of key phenomena, combined with a lack of the computational power necessary to address the wide range of relevant length scales associated with wind plants. Thus, our ten-year exascale challenge is the predictive simulation of a wind plant composed of O(100) multi-MW wind turbines sited within a 100 km² area with complex terrain, involving simulations with O(100) billion grid points. The project plan builds progressively from predictive petascale simulations of a single turbine, where the detailed blade geometry is resolved, meshes rotate and deform with blade motions, and atmospheric turbulence is realistically modeled, to a multi-turbine array in complex terrain. The ALCC allocation will be used continually throughout the allocation period. In the first half of the allocation period, small (e.g., for testing Kokkos algorithms) and medium (e.g., 10K cores for highly resolved ABL simulations) sized jobs will be typical. In the second half, we will also have a number of large submittals for our resolved-turbine simulations. A challenge in the latter period is that small time step sizes will require long wall-clock times for statistically meaningful solutions. As such, we expect our allocation-hour burn rate to increase as we move through the allocation period.
International Journal of Quantum Chemistry
The inverse problem of Kohn–Sham density functional theory (DFT) is often solved in an effort to benchmark and design approximate exchange-correlation potentials. The forward and inverse problems of DFT rely on the same equations but the numerical methods for solving each problem are substantially different. We examine both problems in this tutorial with a special emphasis on the algorithms and error analysis needed for solving the inverse problem. Two inversion methods based on partial differential equation constrained optimization and constrained variational ideas are introduced. We compare and contrast several different inversion methods applied to one-dimensional finite and periodic model systems.
2018 SEG International Exposition and Annual Meeting, SEG 2018
Methods for the efficient representation of fracture response in geoelectric models impact an impressively broad range of problems in applied geophysics. We adopt the recently-developed hierarchical material property representation in finite element analysis (Weiss, 2017) to model the electrostatic response of a discrete set of vertical fractures in the near surface and compare these results to those from anisotropic continuum models. We also examine the power law behavior of these results and compare to continuum theory. We find that in measurement profiles from a single point source in directions both parallel and perpendicular to the fracture set, the fracture signature persists over all distances. Furthermore, the homogenization limit (distance at which the individual fracture anomalies are too small to be either measured or of interest) is not strictly a function of the geometric distribution of the fractures, but also their conductivity relative to the background. Hence, we show that the definition of “representative elementary volume”, that distance over which the statistics of the underlying heterogeneities is stationary, is incomplete as it pertains to the applicability of an equivalent continuum model. We also show that detailed interrogation of such intrinsically heterogeneous models may reveal power law behavior that appears anomalous, thus suggesting a possible mechanism to reconcile emerging theories in fractional calculus with classical electromagnetic theory.
Applied Computational Electromagnetics Society Newsletter
A closed-form solution is described here for the equilibrium configurations of the magnetic field in a simple heterogeneous domain. This problem and its solution are used for rigorous assessment of the accuracy of the ALEGRA code in the quasistatic limit. By the equilibrium configuration we understand the static condition, or the stationary states without macroscopic current. The analysis includes quite a general class of 2D solutions for which a linear isotropic metallic matrix is placed inside a stationary magnetic field approaching a constant value at infinity. The evolution of the magnetic fields inside and outside the inclusion, and the parameters for which the quasi-static approach provides self-consistent results, are also explored. It is demonstrated that under spatial mesh refinement, ALEGRA converges to the analytic solution for the interior of the inclusion at the expected rate, for both body-fitted and regular rectangular meshes.
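The "expected rate" check described above boils down to a standard calculation: given errors against the analytic solution at successive mesh refinements, the observed order is the log-ratio of errors over the log-ratio of mesh sizes. The sketch below uses synthetic error data (e = C h², standing in for a second-order scheme), not ALEGRA results.

```python
import math

# Observed convergence order from a mesh-refinement study:
# p = log(e_coarse / e_fine) / log(h_coarse / h_fine).
C = 3.0
hs = [0.1, 0.05, 0.025, 0.0125]        # successive halvings of mesh size
errs = [C * h ** 2 for h in hs]        # synthetic second-order errors

rates = [math.log(errs[i] / errs[i + 1]) / math.log(hs[i] / hs[i + 1])
         for i in range(len(hs) - 1)]
# each entry comes out ~2, confirming the (synthetic) scheme is 2nd order
```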
Springer Proceedings in Complexity
Anomaly detection is an important problem in various fields of complex systems research including image processing, data analysis, physical security and cybersecurity. In image processing, it is used for removing noise while preserving image quality, and in data analysis, physical security and cybersecurity, it is used to find interesting data points, objects or events in a vast sea of information. Anomaly detection will continue to be an important problem in domains intersecting with “Big Data”. In this paper we provide a novel algorithm for anomaly detection that uses phase-coded spiking neurons as basic computational elements.
10th USENIX Workshop on Hot Topics in Storage and File Systems, HotStorage 2018, co-located with USENIX ATC 2018
Abstract not provided.
Multiscale Modeling and Simulation
In many applications the resolution of small-scale heterogeneities remains a significant hurdle to robust and reliable predictive simulations. In particular, while material variability at the mesoscale plays a fundamental role in processes such as material failure, the resolution required to capture mechanisms at this scale is often computationally intractable. Multiscale methods aim to overcome this difficulty through judicious choice of a subscale problem and a robust manner of passing information between scales. One promising approach is the multiscale finite element method, which increases the fidelity of macroscale simulations by solving lower-scale problems that produce enriched multiscale basis functions. In this study, we present the first work toward application of the multiscale finite element method to the nonlocal peridynamic theory of solid mechanics. This is achieved within the context of a discontinuous Galerkin framework that facilitates the description of material discontinuities and does not assume the existence of spatial derivatives. Analysis of the resulting nonlocal multiscale finite element method is achieved using the ambulant Galerkin method, developed here with sufficient generality to allow for application to multiscale finite element methods for both local and nonlocal models that satisfy minimal assumptions. We conclude with preliminary results on a mixed-locality multiscale finite element method in which a nonlocal model is applied at the fine scale and a local model at the coarse scale.
IEEE Transactions on Visualization and Computer Graphics
Evaluating the effectiveness of data visualizations is a challenging undertaking and often relies on one-off studies that test a visualization in the context of one specific task. Researchers across the fields of data science, visualization, and human-computer interaction are calling for foundational tools and principles that could be applied to assessing the effectiveness of data visualizations in a more rapid and generalizable manner. One possibility for such a tool is a model of visual saliency for data visualizations. Visual saliency models are typically based on the properties of the human visual cortex and predict which areas of a scene have visual features (e.g. color, luminance, edges) that are likely to draw a viewer's attention. While these models can accurately predict where viewers will look in a natural scene, they typically do not perform well for abstract data visualizations. In this paper, we discuss the reasons for the poor performance of existing saliency models when applied to data visualizations. We introduce the Data Visualization Saliency (DVS) model, a saliency model tailored to address some of these weaknesses, and we test the performance of the DVS model and existing saliency models by comparing the saliency maps produced by the models to eye tracking data obtained from human viewers. Finally, we describe how modified saliency models could be used as general tools for assessing the effectiveness of visualizations, including the strengths and weaknesses of this approach.
AIAA Non-Deterministic Approaches Conference, 2018
When very few samples of a random quantity are available from a source distribution or probability density function (PDF) of unknown shape, it is usually not possible to accurately infer the PDF from which the data samples come. A significant component of epistemic uncertainty then exists concerning the source distribution of random or aleatory variability. For many engineering purposes, including design and risk analysis, one would normally want to avoid inference-related under-estimation of important quantities such as response variance and failure probabilities. Recent research has established the practicality and effectiveness of a class of simple and inexpensive UQ methods for reasonably conservative estimation of such quantities when only sparse samples of a random quantity are available. This class of UQ methods is explained, demonstrated, and analyzed in this paper within the context of the Sandia Cantilever Beam End-to-End UQ Problem, Part A.1. Several sets of sparse replicate data are involved, and several representative uncertainty quantities are to be estimated: A) beam deflection variability, in particular the 2.5 to 97.5 percentile “central 95%” range of the sparsely sampled PDF of deflection; and B) a small exceedance probability associated with a tail of the PDF integrated beyond a specified deflection tolerance.
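The under-estimation effect that motivates conservative estimation can be shown directly. This sketch demonstrates the problem, not the paper's methods: with only n = 10 samples of a standard normal, even the full sample range (the widest naive estimate available) falls well short of the true central-95% width of 2 × 1.96 ≈ 3.92 on average.

```python
import random

# Average sample range of n = 10 standard-normal draws vs. the true
# 2.5-to-97.5-percentile width. Sparse sampling systematically
# under-covers the tails, so any percentile estimate built from the
# samples alone under-states the central 95% range.
random.seed(1)
n, trials = 10, 5000
true_width = 2 * 1.96            # central-95% width of N(0, 1)
avg_width = 0.0
for _ in range(trials):
    xs = sorted(random.gauss(0.0, 1.0) for _ in range(n))
    avg_width += (xs[-1] - xs[0]) / trials   # widest naive estimate
# avg_width comes out near 3.1 -- a systematic under-estimate of 3.92
```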
AIAA Aerospace Sciences Meeting, 2018
Unstructured grid adaptation is a tool to control Computational Fluid Dynamics (CFD) discretization error. However, adaptive grid techniques have made limited impact on production analysis workflows where the control of discretization error is critical to obtaining reliable simulation results. Issues that prevent the use of adaptive grid methods are identified by applying unstructured grid adaptation methods to a series of benchmark cases. Once identified, these challenges to existing adaptive workflows can be addressed. Unstructured grid adaptation is evaluated for test cases described on the Turbulence Modeling Resource (TMR) web site, which documents uniform grid refinement of multiple schemes. The cases are turbulent flow over a Hemisphere Cylinder and an ONERA M6 Wing. Adaptive grid force and moment trajectories are shown for three integrated grid adaptation processes with Mach interpolation control and output error based metrics. The integrated grid adaptation process with a finite element (FE) discretization produced results consistent with uniform grid refinement of fixed grids. The integrated grid adaptation processes with finite volume schemes were slower to converge to the reference solution than the FE method. Metric conformity is documented on grid/metric snapshots for five grid adaptation mechanics implementations. These tools produce anisotropic boundary conforming grids requested by the adaptation process.
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
In this work we present a computational capability featuring a hierarchy of models with different fidelities for the solution of electrokinetics problems at the micro-/nano-scale. A multifidelity approach allows the selection of the most appropriate model, in terms of accuracy and computational cost, for the particular application at hand. We demonstrate the proposed multifidelity approach by studying the mobility of a colloid in a micro-channel as a function of the colloid charge and of the size of the ions dissolved in the fluid.
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Independent meshing of subdomains separated by an interface can lead to spatially non-coincident discrete interfaces. We present an optimization-based coupling method for such problems, which does not require a common mesh refinement of the interface, has optimal H1 convergence rates, and passes a patch test. The method minimizes the mismatch of the state and normal stress extensions on discrete interfaces subject to the subdomain equations, while interface “fluxes” provide virtual Neumann controls.
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Maintaining the performance of high-performance computing (HPC) applications with the expected increase in failures is a major challenge for next-generation extreme-scale systems. With increasing scale, resilience activities (e.g. checkpointing) are expected to become more diverse, less tightly synchronized, and more computationally intensive. Few existing studies, however, have examined how decisions about scheduling resilience activities impact application performance. In this work, we examine the relationship between the duration and frequency of resilience activities and application performance. Our study reveals several key findings: (i) the aggregate amount of time consumed by resilience activities is not an effective metric for predicting application performance; (ii) the duration of the interruptions due to resilience activities has the greatest influence on application performance; shorter, but more frequent, interruptions are correlated with better application performance; and (iii) the differential impact of resilience activities across applications is related to the applications’ inter-collective frequencies; the performance of applications that perform infrequent collective operations scales better in the presence of resilience activities than the performance of applications that perform more frequent collective operations. This initial study demonstrates the importance of considering how resilience activities are scheduled. We provide critical analysis and direct guidance on how the resilience challenges of future systems can be met while minimizing the impact on application performance.
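Finding (ii) above can be reproduced with a toy bulk-synchronous model. The sketch below is illustrative, not the paper's experimental setup: with the same aggregate resilience time per rank, short frequent interruptions cost less than long rare ones, because unsynchronized long pauses keep landing on the critical path of collective operations.

```python
import random

# P ranks perform N collective intervals of T seconds of work each; every
# collective waits for the slowest rank. Each rank also suffers k local
# resilience interruptions of duration d, placed in random intervals.
random.seed(2)
P, N, T = 32, 100, 1.0       # ranks, collective intervals, work per interval

def makespan(d, k):
    """Total run time when each rank is interrupted k times for d seconds."""
    slots = [set(random.sample(range(N), k)) for _ in range(P)]
    total = 0.0
    for i in range(N):
        # the collective in interval i waits for the slowest rank
        total += T + max(d if i in slots[r] else 0.0 for r in range(P))
    return total

short_frequent = makespan(0.1, N)   # 0.1 s every interval: 10 s/rank total
long_rare = makespan(2.0, 5)        # 2.0 s in 5 intervals: 10 s/rank total
# short_frequent is exactly ~110 s; long_rare lands far above it, because
# with 32 ranks almost every interval contains someone's 2-second pause
```

This is the aggregate-time result in miniature: both schedules consume 10 seconds of resilience time per rank, yet the makespans differ sharply, so aggregate time alone cannot predict performance.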
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
We review the physical foundations of Landauer’s Principle, which relates the loss of information from a computational process to an increase in thermodynamic entropy. Despite the long history of the Principle, its fundamental rationale and proper interpretation remain frequently misunderstood. Contrary to some misinterpretations of the Principle, the mere transfer of entropy between computational and non-computational subsystems can occur in a thermodynamically reversible way without increasing total entropy. However, Landauer’s Principle is not about general entropy transfers; rather, it more specifically concerns the ejection of (all or part of) some correlated information from a controlled, digital form (e.g., a computed bit) to an uncontrolled, non-computational form, i.e., as part of a thermal environment. Any uncontrolled thermal system will, by definition, continually re-randomize the physical information in its thermal state, from our perspective as observers who cannot predict the exact dynamical evolution of the microstates of such environments. Thus, any correlations involving information that is ejected into and subsequently thermalized by the environment will be lost from our perspective, resulting directly in an irreversible increase in thermodynamic entropy. Avoiding the ejection and thermalization of correlated computational information motivates the reversible computing paradigm, although the requirements for computations to be thermodynamically reversible are less restrictive than frequently described, particularly in the case of stochastic computational operations. There are interesting possibilities for the design of computational processes that utilize stochastic, many-to-one computational operations while nevertheless avoiding net entropy increase that remain to be fully explored.
AIAA Journal
The development of scramjet engines is an important research area for advancing hypersonic and orbital flights. Progress toward optimal engine designs requires accurate flow simulations together with uncertainty quantification. However, performing uncertainty quantification for scramjet simulations is challenging due to the large number of uncertain parameters involved and the high computational cost of flow simulations. These difficulties are addressed in this paper by developing practical uncertainty quantification algorithms and computational methods, and deploying them in the current study to large-eddy simulations of a jet in crossflow inside a simplified HIFiRE Direct Connect Rig scramjet combustor. First, global sensitivity analysis is conducted to identify influential uncertain input parameters, which can help reduce the system's stochastic dimension. Second, because models of different fidelity are used in the overall uncertainty quantification assessment, a framework for quantifying and propagating the uncertainty due to model error is presented. These methods are demonstrated on a nonreacting jet-in-crossflow test problem in a simplified scramjet geometry, with parameter space up to 24 dimensions, using static and dynamic treatments of the turbulence subgrid model, and with two-dimensional and three-dimensional geometries.
SIAM-ASA Journal on Uncertainty Quantification
Previous work has demonstrated that propagating groups of samples, called ensembles, together through forward simulations can dramatically reduce the aggregate cost of sampling-based uncertainty propagation methods [E. Phipps, M. D'Elia, H. C. Edwards, M. Hoemmen, J. Hu, and S. Rajamanickam, SIAM J. Sci. Comput., 39 (2017), pp. C162-C193]. However, critical to the success of this approach when applied to challenging problems of scientific interest is the grouping of samples into ensembles to minimize the total computational work. For example, the total number of linear solver iterations for ensemble systems may be strongly influenced by which samples form the ensemble when applying iterative linear solvers to parameterized and stochastic linear systems. In this work we explore sample grouping strategies for local adaptive stochastic collocation methods applied to PDEs with uncertain input data, in particular canonical anisotropic diffusion problems where the diffusion coefficient is modeled by truncated Karhunen-Loève expansions. We demonstrate that a measure of the total anisotropy of the diffusion coefficient is a good surrogate for the number of linear solver iterations for each sample and therefore provides a simple and effective metric for grouping samples.
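The value of a good grouping surrogate can be seen in miniature. This sketch is illustrative, with synthetic per-sample "anisotropy" scores rather than the paper's PDE data: if an ensemble's iterative-solver cost is governed by its hardest member, total cost is the sum of per-group maxima, and ordering samples by the surrogate before grouping never does worse than an arbitrary ordering.

```python
import random

# 64 samples, grouped into ensembles of 8. Each ensemble's cost is the
# max score in the group (the hardest sample dictates solver iterations),
# so similar samples should share an ensemble.
random.seed(3)
scores = [random.uniform(10.0, 200.0) for _ in range(64)]  # surrogate costs
ensemble_size = 8

def total_cost(ordering):
    groups = [ordering[i:i + ensemble_size]
              for i in range(0, len(ordering), ensemble_size)]
    return sum(max(g) for g in groups)     # each ensemble pays for its max

arbitrary_cost = total_cost(scores)        # samples grouped as they arrived
sorted_cost = total_cost(sorted(scores))   # group similar samples together
# sorted grouping is in fact optimal for this sum-of-maxima objective
```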
Sparse matrix-matrix multiplication is a key kernel with applications in several domains, such as scientific computing and graph analysis. Several algorithms have been studied in the past for this foundational kernel. In this paper, we develop parallel algorithms for sparse matrix-matrix multiplication with a focus on performance portability across different high performance computing architectures. The performance of these algorithms depends on the data structures used in them. We compare different types of accumulators in these algorithms and demonstrate the performance differences between these data structures. Furthermore, we develop a meta-algorithm, kkSpGEMM, to choose the right algorithm and data structure based on the characteristics of the problem. We show performance comparisons on three architectures and demonstrate the need for the community to develop two-phase sparse matrix-matrix multiplication implementations for efficient reuse of the data structures involved.
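The accumulator choice the paper studies shows up clearly in a row-wise (Gustavson-style) SpGEMM. The sketch below is a minimal serial illustration, not the kkSpGEMM implementation: it uses a hash-map accumulator per output row; a dense accumulator would instead keep a length-n_cols array plus a list of touched columns.

```python
# Row-wise SpGEMM in CSR form with a hash-map accumulator: for each row i
# of A, scatter the scaled rows of B into `acc`, then compress into CSR.

def spgemm_csr(a_ptr, a_idx, a_val, b_ptr, b_idx, b_val, n_cols):
    # n_cols is what a dense accumulator would be sized to; unused here.
    c_ptr, c_idx, c_val = [0], [], []
    for i in range(len(a_ptr) - 1):
        acc = {}                              # hash-map accumulator, row i
        for k in range(a_ptr[i], a_ptr[i + 1]):
            j, av = a_idx[k], a_val[k]
            for t in range(b_ptr[j], b_ptr[j + 1]):
                col = b_idx[t]
                acc[col] = acc.get(col, 0.0) + av * b_val[t]
        for col in sorted(acc):               # compress row into CSR
            c_idx.append(col)
            c_val.append(acc[col])
        c_ptr.append(len(c_idx))
    return c_ptr, c_idx, c_val

# 2x2 example: A = [[1, 2], [0, 3]], B = [[4, 0], [5, 6]]
c = spgemm_csr([0, 2, 3], [0, 1, 1], [1.0, 2.0, 3.0],
               [0, 1, 3], [0, 0, 1], [4.0, 5.0, 6.0], 2)
# A @ B = [[14, 12], [15, 18]]
```

The trade-off the paper benchmarks: the hash map costs nothing per row to reset but pays per-access overhead, while the dense array accesses in O(1) but must be sized and cleared relative to n_cols.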
AIAA Non-Deterministic Approaches Conference, 2018
The development of scramjet engines is an important research area for advancing hypersonic and orbital flights. Progress towards optimal engine designs requires accurate and computationally affordable flow simulations, as well as uncertainty quantification (UQ). While traditional UQ techniques can become prohibitive under expensive simulations and high-dimensional parameter spaces, polynomial chaos (PC) surrogate modeling is a useful tool for alleviating some of the computational burden. However, non-intrusive quadrature-based constructions of PC expansions relying on a single high-fidelity model can still be quite expensive. We thus introduce a two-stage numerical procedure for constructing PC surrogates while making use of multiple models of different fidelity. The first stage involves an initial dimension reduction through global sensitivity analysis using compressive sensing. The second stage utilizes adaptive sparse quadrature on a multifidelity expansion to compute PC surrogate coefficients in the reduced parameter space where quadrature methods can be more effective. The overall method is used to produce accurate surrogates and to propagate uncertainty induced by uncertain boundary conditions and turbulence model parameters, for performance quantities of interest from large eddy simulations of supersonic reactive flows inside a scramjet engine.
SIAM Journal on Scientific Computing
Multiple physical time-scales can arise in electromagnetic simulations when dissipative effects are introduced through boundary conditions, when currents follow external time-scales, and when material parameters vary spatially. In such scenarios, the time-scales of interest may be much slower than the fastest time-scales supported by the Maxwell equations, therefore making implicit time integration an efficient approach. The use of implicit temporal discretizations results in linear systems in which fast time-scales, which severely constrain the stability of an explicit method, can manifest as so-called stiff modes. This study proposes a new block preconditioner for structure preserving (also termed physics compatible) discretizations of the Maxwell equations in first order form. The intent of the preconditioner is to enable the efficient solution of multiple-time-scale Maxwell type systems. An additional benefit of the developed preconditioner is that it requires only a traditional multigrid method for its subsolves and compares well against alternative approaches that rely on specialized edge-based multigrid routines that may not be readily available. Results demonstrate parallel scalability at large electromagnetic wave CFL numbers on a variety of test problems.
Computer Aided Chemical Engineering
The solution of the Optimal Power Flow (OPF) and Unit Commitment (UC) problems (i.e., determining generator schedules and set points that satisfy demands) is critical for efficient and reliable operation of the electricity grid. For computational efficiency, the alternating current OPF (ACOPF) problem is usually formulated with a linearized transmission model, often referred to as the DCOPF problem. However, these linear approximations do not guarantee global optimality or even feasibility for the true nonlinear alternating current (AC) system. Nonlinear AC power flow models can and should be used to improve model fidelity, but successful global solution of problems with these models requires the availability of strong relaxations of the AC optimal power flow constraints. In this paper, we use McCormick envelopes to strengthen the well-known second-order cone (SOC) relaxation of the ACOPF problem. With this improved relaxation, we can also include tight bounds on the voltages at the reference bus, and we demonstrate that these bounds are effective for bounds tightening. We present results on the optimality gap of both the base SOC relaxation and our Strengthened SOC (SSOC) relaxation for the National Information and Communications Technology Australia (NICTA) Energy System Test Case Archive (NESTA). For the cases where the SOC relaxation yields an optimality gap of more than 0.1%, the SSOC relaxation with bounds tightening further reduces the optimality gap by an average of 67% and ultimately reduces it to less than 0.1% for 58% of all the NESTA cases considered. Stronger relaxations enable more efficient global solution of the ACOPF problem and can improve the computational efficiency of MINLP problems with AC power flow constraints, e.g., unit commitment.
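The McCormick envelope construction referenced above can be sketched concretely. For a bilinear term w = x*y with x in [xL, xU] and y in [yL, yU], four linear inequalities bound the bilinear surface; the helper function below (a hypothetical name, not from the paper's code) evaluates those bounds and checks that the true product always lies between them.

```python
# McCormick envelopes for the bilinear term w = x*y on a box domain.
# Each inequality comes from expanding a nonnegative product such as
# (x - xL)*(y - yL) >= 0.
def mccormick_bounds(x, y, xL, xU, yL, yU):
    lower = max(xL*y + x*yL - xL*yL,   # underestimators
                xU*y + x*yU - xU*yU)
    upper = min(xU*y + x*yL - xU*yL,   # overestimators
                xL*y + x*yU - xL*yU)
    return lower, upper

# The true product always lies inside the envelope.
xL, xU, yL, yU = -1.0, 2.0, 0.5, 3.0
for x in [-1.0, 0.0, 0.7, 2.0]:
    for y in [0.5, 1.2, 3.0]:
        lo, hi = mccormick_bounds(x, y, xL, xU, yL, yU)
        assert lo <= x*y + 1e-9 and x*y <= hi + 1e-9
print("envelope holds on all sample points")
```

Tightening the variable bounds shrinks the envelope, which is why bounds tightening and the McCormick relaxation reinforce each other.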
Computer Aided Chemical Engineering
Real-time energy pricing has caused a paradigm shift for process operations, with flexibility becoming a critical driver of economics. As such, incorporating real-time pricing into planning and scheduling optimization formulations has received much attention over the past two decades (Zhang and Grossmann, 2016). These formulations, however, focus on 1-hour or longer time discretizations and neglect process dynamics. Recent analysis of historical price data from the California electricity market (CAISO) reveals that a majority of economic opportunities come from fast market layers, i.e., the real-time energy market and ancillary services (Dowling et al., 2017). We present a dynamic optimization framework to quantify the revenue opportunities of chemical manufacturing systems providing frequency regulation (FR). Recent analysis of first order systems finds that slow process dynamics naturally dampen high frequency harmonics in FR signals (Dowling and Zavala, 2017). As a consequence, traditional chemical processes with long time constants may be able to provide fast flexibility without disrupting product quality, performance of downstream unit operations, etc. This study quantifies the ability of a distillation system to provide sufficient dynamic flexibility to adjust energy demands every 4 seconds in response to market signals. Using a detailed differential algebraic equation (DAE) model (Hahn and Edgar, 2002) and historic data from the Texas electricity market (ERCOT), we estimate revenue opportunities for different column designs. We implement our model using the algebraic modeling language Pyomo (Hart et al., 2011) and its dynamic optimization extension Pyomo.DAE (Nicholson et al., 2017). These software packages enable rapid development of complex optimization models using high-level modelling constructs and provide flexible tools for initializing and discretizing DAE models.
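The cited first-order damping argument can be made quantitative with a small sketch: a process with time constant tau acts as a low-pass filter on the FR signal, with gain |G(jw)| = 1/sqrt(1 + (tau*w)^2). The time constant below is an illustrative value, not a parameter from the paper's distillation model.

```python
import math

# Gain of a first-order process at the frequency of a periodic signal.
def first_order_gain(tau, period_s):
    w = 2.0 * math.pi / period_s
    return 1.0 / math.sqrt(1.0 + (tau * w) ** 2)

tau = 600.0   # 10-minute process time constant (illustrative)
fast = first_order_gain(tau, 4.0)      # 4-second FR harmonic
slow = first_order_gain(tau, 3600.0)   # hourly energy-market harmonic

# Fast harmonics are attenuated by orders of magnitude more than slow
# ones, so a slow process can absorb a fast FR signal without large
# excursions in product quality or downstream operation.
print(f"gain at 4 s: {fast:.2e}, gain at 1 h: {slow:.3f}")
```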
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
MPI usage patterns are changing as applications move towards fully-multithreaded runtimes. However, the impact of these patterns on MPI message matching is not well-studied. In particular, MPI’s mechanism for receiver-side data placement, message matching, can be impacted by increased message volume and nondeterminism incurred by multithreading. While there has been significant developer interest and work to provide an efficient MPI interface for multithreaded access, there has not been a study showing how these usage patterns affect message matching behavior. In this paper, we present a framework for studying the effects of multithreading on MPI message matching. This framework allows us to explore the implications of different common communication patterns and thread-level decompositions. We present a study of these impacts on the architecture of two of the Top 10 supercomputers (NERSC’s Cori and LANL’s Trinity). This data provides a baseline to evaluate reasonable matching engine queue lengths, search depths, and queue drain times under the multithreaded model. Furthermore, the study highlights surprising results on the challenge posed by message matching for multithreaded application performance.
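To make the queue-length and search-depth metrics concrete, here is a toy model (not an actual MPI implementation) of a posted-receive queue: incoming messages are matched by a linear scan on (source, tag), with wildcards standing in for MPI_ANY_SOURCE and MPI_ANY_TAG. Out-of-order posting by multiple threads is exactly what deepens these scans.

```python
from collections import deque

ANY = -1  # stands in for MPI_ANY_SOURCE / MPI_ANY_TAG

def match(queue, source, tag):
    """Return (matched receive, search depth) for an incoming message."""
    for depth, (rsrc, rtag) in enumerate(queue, start=1):
        if rsrc in (ANY, source) and rtag in (ANY, tag):
            queue.remove((rsrc, rtag))   # the receive is consumed
            return (rsrc, rtag), depth
    return None, len(queue)

# Receives posted out of order force a deep scan for some matches.
q = deque([(0, 10), (1, 11), (ANY, 12), (2, 13)])
recv, depth = match(q, source=2, tag=13)
print(recv, depth)   # matches the 4th entry: search depth 4
```

Averaging such depths over a trace of posted receives and arrivals is the kind of baseline measurement the framework described above collects at scale.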
20th Topical Meeting of the Radiation Protection and Shielding Division, RPSD 2018
The design of satellites usually includes the objective of minimizing mass due to high launch costs, which is complicated by the need to protect sensitive electronics from the space radiation environment. There is growing interest in automated design optimization techniques to help achieve that objective. Traditional optimization approaches that rely exclusively on response functions (e.g. dose calculations) can be quite expensive when applied to transport problems. Previously we showed how adjoint-based transport sensitivities used in conjunction with gradient-based optimization algorithms can be quite effective in designing mass-efficient electron/proton shields in one-dimensional slab geometries. In this paper we extend that work to two-dimensional Cartesian geometries. This consists primarily of deriving the sensitivities to geometric changes, given a particular prescription for parametrizing the shield geometry. We incorporate these sensitivities into our optimization process and demonstrate their effectiveness in such design calculations.
International Journal of High Performance Computing Applications
Today’s computational, experimental, and observational sciences rely on computations that involve many related tasks. The success of a scientific mission often hinges on the computer automation of these workflows. In April 2015, the US Department of Energy (DOE) invited a diverse group of domain and computer scientists from national laboratories supported by the Office of Science and the National Nuclear Security Administration, from industry, and from academia to review the workflow requirements of DOE’s science and national security missions, to assess the current state of the art in science workflows, to understand the impact of emerging extreme-scale computing systems on those workflows, and to develop requirements for automated workflow management in future and existing environments. This article is a summary of the opinions of over 50 leading researchers attending this workshop. We highlight use cases, computing systems, and workflow needs, and conclude by summarizing the remaining challenges this community sees that inhibit large-scale scientific workflows from becoming a mainstream tool for extreme-scale science.
Proceedings of SPIE - The International Society for Optical Engineering
We discuss uncertainty quantification in multisensor data integration and analysis, including estimation methods and the role of uncertainty in decision making and trust in automated analytics. The challenges associated with automatically aggregating information across multiple images, identifying subtle contextual cues, and detecting small changes in noisy activity patterns are well-established in the intelligence, surveillance, and reconnaissance (ISR) community. In practice, such questions cannot be adequately addressed with discrete counting, hard classifications, or yes/no answers. For a variety of reasons ranging from data quality to modeling assumptions to inadequate definitions of what constitutes "interesting" activity, variability is inherent in the output of automated analytics, yet it is rarely reported. Consideration of these uncertainties can provide nuance to automated analyses and engender trust in their results. In this work, we assert the importance of uncertainty quantification for automated data analytics and outline a research agenda. We begin by defining uncertainty in the context of machine learning and statistical data analysis, identify its sources, and motivate the importance and impact of its quantification. We then illustrate these issues and discuss methods for data-driven uncertainty quantification in the context of a multi-source image analysis example. We conclude by identifying several specific research issues and by discussing the potential long-term implications of uncertainty quantification for data analytics, including sensor tasking and analyst trust in automated analytics.
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Supercomputing hardware is undergoing a period of significant change. In order to cope with the rapid pace of hardware and, in many cases, programming model innovation, we have developed the Kokkos Programming Model – a C++-based abstraction that permits performance portability across diverse architectures. Our experience has shown that the abstractions developed can significantly frustrate debugging and profiling activities because they break expected code proximity and layout assumptions. In this paper we present the Kokkos Profiling interface, a lightweight suite of hooks to which debugging and profiling tools can attach to gain deep insights into the execution and data structure behaviors of parallel programs written to the Kokkos interface.
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Modern supercomputers are shared among thousands of users running a variety of applications. Knowing which applications are running in the system can bring substantial benefits: knowledge of applications that intensively use shared resources can aid scheduling; unwanted applications such as cryptocurrency mining or password cracking can be blocked; system architects can make design decisions based on system usage. However, identifying applications on supercomputers is challenging because applications are executed using esoteric scripts along with binaries that are compiled and named by users. This paper introduces a novel technique to identify applications running on supercomputers. Our technique, Taxonomist, is based on the empirical evidence that applications have different and characteristic resource utilization patterns. Taxonomist uses machine learning to classify known applications and also detect unknown applications. We test our technique with a variety of benchmarks and cryptocurrency miners, and also with applications that users of a production supercomputer ran during a 6 month period. We show that our technique achieves nearly perfect classification for this challenging data set.
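The classification idea can be sketched in a few lines. The paper trains real machine-learning classifiers on resource-utilization data; the nearest-centroid rule with a distance threshold below is only a hypothetical stand-in that shows how "unknown" applications can be flagged alongside known ones. All feature values and class names are invented for illustration.

```python
import numpy as np

# Toy resource-utilization "fingerprints" for known application classes.
centroids = {
    "solver":  np.array([0.9, 0.1, 0.2]),    # e.g. high FLOPs, low I/O
    "miner":   np.array([0.99, 0.01, 0.0]),
    "io_app":  np.array([0.2, 0.9, 0.1]),
}

def classify(features, threshold=0.3):
    # Nearest centroid wins; anything far from every centroid is unknown.
    name, dist = min(((k, np.linalg.norm(features - c))
                      for k, c in centroids.items()), key=lambda t: t[1])
    return name if dist < threshold else "unknown"

print(classify(np.array([0.85, 0.15, 0.2])))   # close to "solver"
print(classify(np.array([0.5, 0.5, 0.9])))     # far from everything
```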
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
In modern shared-memory NUMA systems, which typically consist of two or more multi-core processor packages with local memory, affinity of data to computation is crucial for achieving high performance with an OpenMP program. OpenMP 3.0 introduced support for task-parallel programs in 2008 and has continued to extend its applicability and expressiveness, but the ability to express data affinity of tasks is still missing. In this paper, we investigate several approaches for task-to-data affinity that combine locality-aware task distribution and task stealing. We introduce the task affinity clause that will be part of OpenMP 5.0 and provide the reasoning behind its design. Evaluation with our experimental implementation in the LLVM OpenMP runtime shows that task affinity improves execution performance up to 4.5x on an 8-socket NUMA machine and significantly reduces runtime variability of OpenMP tasks. Our results demonstrate that a variety of applications can benefit from task affinity and that the presented clause is closing the gap of task-to-data affinity in OpenMP 5.0.
AISTech - Iron and Steel Technology Conference Proceedings
Abstract not provided.
Computer Aided Chemical Engineering
In this work, we describe new capabilities for the Pyomo.GDP modeling environment, moving beyond classical reformulation approaches to include non-standard reformulations and a new logic-based solver, GDPopt. Generalized Disjunctive Programs (GDPs) address optimization problems involving both discrete and continuous decision variables. For difficult problems, advanced reformulations such as the disjunctive “basic step” to intersect multiple disjunctions or the use of procedural reformulations may be necessary. Complex nonlinear GDP models may also be tackled using logic-based outer approximation. These expanded capabilities highlight the flexibility that Pyomo.GDP offers modelers in applying novel strategies to solve difficult optimization problems.
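For readers unfamiliar with disjunctive programming, the classical big-M reformulation (one of the standard transformations Pyomo.GDP automates, and the baseline the advanced reformulations above improve upon) can be sketched with a feasibility check. The disjunction, bounds, and M value here are all illustrative.

```python
# Big-M reformulation of the disjunction (x <= 2) OR (x >= 8) for
# x in [0, 10]: each disjunct's constraint is relaxed by a binary
# indicator y so that exactly one disjunct must hold.
M = 10.0  # any bound at least as large as the variable range

def bigm_feasible(x, y):
    """y = 1 selects the left disjunct, y = 0 the right one."""
    left_ok  = x <= 2 + M * (1 - y)   # binding only when y = 1
    right_ok = x >= 8 - M * y         # binding only when y = 0
    return left_ok and right_ok

assert bigm_feasible(1.5, 1)                           # in left disjunct
assert bigm_feasible(9.0, 0)                           # in right disjunct
assert not any(bigm_feasible(5.0, y) for y in (0, 1))  # in neither
print("big-M reformulation reproduces the disjunction")
```

Hull reformulations and basic steps trade more variables and constraints for tighter continuous relaxations than this big-M form provides.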
Computer Aided Chemical Engineering
New modelling and optimization platforms have enabled the creation of frameworks for solution strategies that are based on solving sequences of dynamic optimization problems. This study demonstrates the application of the Python-based Pyomo platform as a basis for formulating and solving Nonlinear Model Predictive Control (NMPC) and Moving Horizon Estimation (MHE) problems, which enables fast on-line computations through large-scale nonlinear optimization and Nonlinear Programming (NLP) sensitivity. We describe these underlying approaches and sensitivity computations, and showcase the implementation of the framework with large DAE case studies including tray-by-tray distillation models and Bubbling Fluidized Bed Reactors (BFB).
SIAM-ASA Journal on Uncertainty Quantification
The recovery of approximately sparse or compressible coefficients in a polynomial chaos expansion is a common goal in many modern parametric uncertainty quantification (UQ) problems. However, relatively little effort in UQ has been directed toward theoretical and computational strategies for addressing the sparse corruptions problem, where a small number of measurements are highly corrupted. Such a situation has become pertinent today since modern computational frameworks are sufficiently complex, with many interdependent components that may introduce hardware and software failures, some of which can be difficult to detect and result in a highly polluted simulation result. In this paper we present a novel compressive sampling-based theoretical analysis for a regularized ℓ1 minimization algorithm that aims to recover sparse expansion coefficients in the presence of measurement corruptions. Our recovery results are uniform (the theoretical guarantees hold for all compressible signals and compressible corruption vectors) and prescribe algorithmic regularization parameters in terms of a user-defined a priori estimate on the ratio of measurements that are believed to be corrupted. We also propose an iteratively reweighted optimization algorithm that automatically refines the value of the regularization parameter and empirically produces superior results. Our numerical results test our framework on several medium to high dimensional examples of solutions to parameterized differential equations and demonstrate the effectiveness of our approach.
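Two building blocks commonly used in reweighted ℓ1 schemes can be sketched briefly; the function names are illustrative and this is not the paper's algorithm. Soft thresholding is the proximal operator applied when solving each ℓ1-regularized subproblem, and the standard reweighting rule w_i = 1/(|c_i| + eps) shrinks the penalty on coefficients believed to be genuinely nonzero.

```python
import math

# Proximal operator of lam*|v|: shrink toward zero by lam.
def soft_threshold(v, lam):
    return math.copysign(max(abs(v) - lam, 0.0), v)

# Standard reweighting rule: large coefficients get small weights, so
# the next l1 subproblem penalizes them less.
def reweight(coeffs, eps=1e-3):
    return [1.0 / (abs(c) + eps) for c in coeffs]

print(soft_threshold(3.0, 1.0))    # shrunk by the threshold
print(soft_threshold(-0.5, 1.0))   # small entries are zeroed out
w = reweight([2.0, 0.0])
print(w[0] < w[1])                 # large coefficient -> small weight
```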
International Journal for Uncertainty Quantification
Polynomial chaos methods have been extensively used to analyze systems in uncertainty quantification. Furthermore, several approaches exist to determine a low-dimensional approximation (or sparse approximation) for some quantity of interest in a model, where just a few orthogonal basis polynomials are required. In this work, we consider linear dynamical systems consisting of ordinary differential equations with random variables. The aim of this paper is to further explore methods for producing low-dimensional approximations of the quantity of interest. We investigate two numerical techniques to compute a low-dimensional representation, both of which fit the approximation to a set of samples in the time domain. On the one hand, a frequency domain analysis of a stochastic Galerkin system yields the selection of the basis polynomials; this results in a linear least squares problem. On the other hand, a sparse minimization yields the choice of the basis polynomials using information from the time domain only. An orthogonal matching pursuit produces an approximate solution of the minimization problem. Finally, we compare the two approaches using a test example from a mechanical application.
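A minimal orthogonal matching pursuit (OMP) can be written in a few lines: greedily pick the dictionary column most correlated with the residual, then re-fit all selected columns by least squares. The orthonormal dictionary below is chosen so the toy example is exactly recoverable; the setup is illustrative, not the paper's test case.

```python
import numpy as np

def omp(A, y, sparsity):
    # Greedy selection of `sparsity` columns of A to explain y.
    support, residual = [], y.copy()
    for _ in range(sparsity):
        support.append(int(np.argmax(np.abs(A.T @ residual))))
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x = np.zeros(A.shape[1])
    x[support] = coef
    return x

rng = np.random.default_rng(1)
A, _ = np.linalg.qr(rng.standard_normal((10, 6)))  # orthonormal columns
x_true = np.zeros(6)
x_true[1], x_true[4] = 2.0, -3.0                   # 2-sparse coefficients
x_hat = omp(A, A @ x_true, sparsity=2)
print(np.allclose(x_hat, x_true))
```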
American Society of Mechanical Engineers, Pressure Vessels and Piping Division (Publication) PVP
An ENUN 32P cask supplied by Equipos Nucleares S.A. (ENSA) was transported 9,600 miles by road, sea, and rail in 2017 in order to collect shock and vibration data on the cask system and surrogate spent fuel assemblies within the cask. The task of examining 101,857 ASCII data files – 6.002 terabytes of data (this includes binary and ASCII files) – has begun. Some results of preliminary analyses are presented in this paper. A total of seventy-seven accelerometers and strain gauges were attached by Sandia National Laboratories (SNL) to three surrogate spent fuel assemblies, the cask basket, the cask body, the transport cradle, and the transport platforms. The assemblies were provided by SNL, Empresa Nacional de Residuos Radiactivos, S.A. (ENRESA), and a collaboration of Korean institutions. The cask system was first subjected to cask handling operations at the ENSA facility. The cask was then transported by heavy-haul truck in northern Spain and shipped from Spain to Belgium and subsequently to Baltimore on two roll-on/roll-off ships. From Baltimore, the cask was transported by rail using a 12-axle railcar to the American Association of Railroads’ Transportation Technology Center, Inc. (TTCI) near Pueblo, Colorado where a series of special rail tests were performed. Data were continuously collected during this entire sequence of multi-modal transportation events. (We did not collect data on the transfer between modes of transportation.) Of particular interest – indeed the original motivation for these tests – are the strains measured on the zirconium-alloy tubes in the assemblies. The strains for each of the transport modes are compared to the yield strength of irradiated Zircaloy to illustrate the margin against rod failure during normal conditions of transport.
The accelerometer data provide essential comparisons of the accelerations on the different components of the cask system, exhibiting both amplification and attenuation of the accelerations from the transport platforms through the cradle and cask and up to the interior of the cask. These data are essential for modeling cask systems. This paper concentrates on analyses of the testing of the cask on a 12-axle railcar at TTCI.
Proceedings International Conference on Automated Planning and Scheduling, ICAPS
We consider the problem of selecting a small set (mosaic) of sensor images (footprints) whose union covers a two-dimensional Region Of Interest (ROI) on Earth. We take the approach of modeling the mosaic problem as a Mixed-Integer Linear Program (MILP). This allows solutions to this subproblem to feed into a larger remote-sensor collection-scheduling MILP. This enables the scheduler to dynamically consider alternative mosaics, without having to perform any new geometric computations. Our approach to set up the optimization problem uses maximal disk sampling and point-in-polygon geometric calculations. Footprints may be of any shape, even non-convex, and we show examples using a variety of shapes that may occur in practice. The general integer optimization problem can become computationally expensive for large problems. In practice, the number of placed footprints is within an order of magnitude of ten, making the time to solve to optimality on the order of minutes. This is fast enough to make the approach relevant for near real-time mission applications. We provide open source software for all our methods, "GeoPlace."
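The mosaic selection is at heart a set-cover MILP: choose the fewest footprints whose union covers the sampled points of the ROI. The brute-force sketch below enumerates subsets in place of an integer-programming solver, which is workable at the paper's stated scale of around ten placed footprints; the ROI samples and footprint coverage sets are invented for illustration.

```python
from itertools import combinations

roi_points = set(range(8))           # point-in-polygon samples of the ROI
footprints = [                       # which ROI points each image covers
    {0, 1, 2}, {2, 3}, {3, 4, 5}, {5, 6, 7}, {0, 6}, {1, 4, 7},
]

def min_cover(points, sets_):
    # Try all subsets in order of increasing size; first full cover wins.
    for k in range(1, len(sets_) + 1):
        for combo in combinations(range(len(sets_)), k):
            if set().union(*(sets_[i] for i in combo)) >= points:
                return combo
    return None

best = min_cover(roi_points, footprints)
print(best)   # indices of a minimum-cardinality mosaic
```

In the MILP formulation, each footprint gets a binary selection variable and each ROI sample point gets a coverage constraint, so the scheduler can swap mosaics without redoing geometry.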
Computer Aided Chemical Engineering
Optimization problems under uncertainty involve making decisions without the full knowledge of the impact the decisions will have and before all the facts relevant to those decisions are known. These problems are common, for example, in process synthesis and design, planning and scheduling, supply chain management, and generation and distribution of electric power. The sources of uncertainty in optimization problems fall into two broad categories: endogenous and exogenous. Exogenous uncertain parameters are realized at a known stage (e.g., time period or decision point) in the problem irrespective of the values of the decision variables. For example, demand is generally considered to be independent of any capacity expansion decisions in process industries, and hence, is regarded as an exogenous uncertain parameter. In contrast, decisions impact endogenous uncertain parameters. The impact can either be in the resolution or in the distribution of the uncertain parameter. The realized values of a Type-I endogenous uncertain parameter are affected by the decisions. An example of this type of uncertainty is a facility protection problem, where the likelihood of a facility failing to deliver goods or services after a disruptive event depends on the level of resources allocated as protection to that facility. On the other hand, only the realization times of Type-II endogenous uncertain parameters are affected by decisions. For example, in a clinical trial planning problem, whether a clinical trial is successful or not is only realized after the clinical trial has been completed, and whether the clinical trial is successful or not is not impacted by when the clinical trial is started. There are numerous approaches to modelling and solving optimization problems with exogenous and/or endogenous uncertainty, including (adjustable) robust optimization, (approximate) dynamic programming, model predictive control, and stochastic programming.
Stochastic programming is a particularly attractive approach, as there is a straightforward translation from the deterministic model to the stochastic equivalent. The challenge with stochastic programming arises through the rapid, sometimes exponential, growth in the program size as we sample the uncertainty space or increase the number of recourse stages. In this talk, we will give an overview of our research activities developing practical stochastic programming approaches to problems with exogenous and/or endogenous uncertainty. We will highlight several examples from power systems planning and operations, process modelling, synthesis and design optimization, artificial lift infrastructure planning for shale gas production, and clinical trial planning. We will begin by discussing the straightforward case of exogenous uncertainty. In this situation, the stochastic program can be expressed completely by a deterministic model, a scenario tree, and the scenario-specific parameterizations of the deterministic model. Beginning with the deterministic model, modelers create instances of the deterministic model for each scenario using the scenario-specific data. Coupling the scenario models occurs through the addition of nonanticipativity constraints, equating the stage decision variables across all scenarios that pass through the same stage node in the scenario tree. Modelling tools like PySP (Watson, 2012) greatly simplify the process of composing large stochastic programs by beginning either with an abstract representation of the deterministic model written in Pyomo (Hart, et al., 2017) and scenario data, or a function that will return the deterministic Pyomo model for a specific scenario. PySP can automatically create the extensive form (deterministic equivalent) model from a general representation of the scenario tree. The challenge with large scale stochastic programs with exogenous uncertainty arises through managing the growth of the problem size.
Fortunately, there are several well-known approaches to decomposing the problem, both stage-wise (e.g., Benders’ decomposition) and scenario-based (e.g., Lagrangian relaxation or Progressive Hedging), enabling the direct solution of stochastic programs with hundreds or thousands of scenarios. We will then discuss developments in modelling and solving stochastic programs with endogenous uncertainty. These problems are significantly more challenging to both pose and solve, due to the exponential growth in scenarios required to cover the decision-dependent uncertainties relative to the number of stages in the problem. In this situation, standardized frameworks for expressing stochastic programs do not exist, requiring a modeler to explicitly generate the representations and nonanticipativity constraints. Further, the size of the resulting scenario space (frequently exceeding millions of scenarios) precludes the direct solution of the resulting program. In this case, numerous decomposition algorithms and heuristics have been developed (e.g., Lagrangean decomposition-based algorithms (Tarhan, et al. 2013) or knapsack-based decomposition algorithms (Christian and Cremaschi, 2015)).
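The two-stage structure with exogenous uncertainty can be illustrated with a deliberately tiny extensive form. A single first-stage decision x (say, capacity built now) is shared by all scenarios, which is precisely the nonanticipativity condition, while second-stage recourse adapts to each realized demand. A grid search stands in for an LP solver; all probabilities, demands, and costs are invented for illustration.

```python
scenarios = [            # (probability, demand)
    (0.3, 40.0),
    (0.5, 60.0),
    (0.2, 90.0),
]
build_cost, shortfall_cost = 2.0, 5.0

def expected_cost(x):
    # First-stage cost plus probability-weighted second-stage recourse.
    cost = build_cost * x
    for prob, demand in scenarios:
        shortfall = max(demand - x, 0.0)   # recourse in each scenario
        cost += prob * shortfall_cost * shortfall
    return cost

# The extensive form couples all scenarios through the shared x.
best_x = min(range(0, 101), key=expected_cost)
print(best_x, expected_cost(best_x))
```

Scenario-based decomposition methods such as Progressive Hedging instead give each scenario its own copy of x and penalize disagreement, recovering the same coupled solution without building the full extensive form.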
Slycat™ is a web-based system for performing data analysis and visualization of potentially large quantities of remote, high-dimensional data. Slycat™ specializes in working with ensemble data. An ensemble is a group of related data sets, which typically consists of a set of simulation runs exploring the same problem space. An ensemble can be thought of as a set of samples within a multi-variate domain, where each sample is a vector whose value defines a point in high-dimensional space. To understand and describe the underlying problem being modeled in the simulations, ensemble analysis looks for shared behaviors and common features across the group of runs. Additionally, ensemble analysis tries to quantify differences found in any members that deviate from the rest of the group. The Slycat™ system integrates data management, scalable analysis, and visualization. Results are viewed remotely on a user’s desktop via commodity web clients using a multi-tiered hierarchy of computation and data storage, as shown in Figure 1. Our goal is to operate on data as close to the source as possible, thereby reducing time and storage costs associated with data movement. Consequently, we are working to develop parallel analysis capabilities that operate on High Performance Computing (HPC) platforms, to explore approaches for reducing data size, and to implement strategies for staging computation across the Slycat™ hierarchy. Within Slycat™, data and visual analysis are organized around projects, which are shared by a project team. Project members are explicitly added, each with a designated set of permissions. Although users sign in to access Slycat™, individual accounts are not maintained. Instead, authentication is used to determine project access. Within projects, Slycat™ models capture analysis results and enable data exploration through various visual representations.
Although for scientists each simulation run is a model of real-world phenomena given certain conditions, we use the term model to refer to our modeling of the ensemble data, not the physics. Different model types often provide complementary perspectives on data features when analyzing the same data set. Each model visualizes data at several levels of abstraction, allowing the user to range from viewing the ensemble holistically to accessing numeric parameter values for a single run. Bookmarks provide a mechanism for sharing results, enabling interesting model states to be labeled and saved.
AIAA Non-Deterministic Approaches Conference, 2018
Within the SEQUOIA project, funded by the DARPA EQUiPS program, we pursue algorithmic approaches that enable comprehensive design under uncertainty, through inclusion of aleatory/parametric and epistemic/model form uncertainties within scalable forward/inverse UQ approaches. These statistical methods are embedded within design processes that manage computational expense through active subspace, multilevel-multifidelity, and reduced-order modeling approximations. To demonstrate these methods, we focus on the design of devices that involve multi-physics interactions in advanced aerospace vehicles. A particular problem of interest is the shape design of nozzles for advanced vehicles such as the Northrop Grumman UCAS X-47B, involving coupled aero-structural-thermal simulations for nozzle performance. In this paper, we explore a combination of multilevel and multifidelity forward and inverse UQ algorithms to reduce the overall computational cost of the analysis by leveraging hierarchies of model form (i.e., multifidelity hierarchies) and solution discretization (i.e., multilevel hierarchies) in order to exploit trade-offs between solution accuracy and cost. In particular, we seek the most cost effective fusion of information across complex multi-dimensional modeling hierarchies. Results to date indicate the utility of multiple approaches, including methods that optimally allocate resources when estimator variance varies smoothly across levels, methods that allocate sufficient sampling density based on sparsity estimates, and methods that employ greedy multilevel refinement.
Recent high-performance computing (HPC) platforms such as the Trinity Advanced Technology System (ATS-1) feature burst buffer resources that can have a dramatic impact on an application’s I/O performance. While these non-volatile memory (NVM) resources provide a new tier in the storage hierarchy, developers must find the right way to incorporate the technology into their applications in order to reap the benefits. Similar to other laboratories, Sandia is actively investigating ways in which these resources can be incorporated into our existing libraries and workflows without burdening our application developers with excessive, platform-specific details. This FY18Q1 milestone summarizes our progress in adapting the Sandia Parallel Aerodynamics and Reentry Code (SPARC) in Sandia’s ATDM program to leverage Trinity’s burst buffers for checkpoint/restart operations. We investigated four different approaches with varying tradeoffs in this work: (1) simply updating the job script to use stage-in/stage-out burst buffer directives, (2) modifying SPARC to use LANL’s hierarchical I/O (HIO) library to store/retrieve checkpoints, (3) updating Sandia’s IOSS library to incorporate the burst buffer in all meshing I/O operations, and (4) modifying SPARC to use our Kelpie distributed memory library to store/retrieve checkpoints. Team members were successful in generating initial implementations for all four approaches, but were unable to obtain performance numbers in time for this report (the initial problem sizes were not large enough to stress I/O, and the SPARC refactor will require changes to our code). When we presented our work to the SPARC team, they expressed the most interest in the second and third approaches. The HIO work was favored because it is lightweight, unobtrusive, and should be portable to ATS-2. The IOSS work is seen as a long-term solution, and is favored because all I/O work (including checkpoints) can be deferred to a single library.
Safety basis analysts throughout the U.S. Department of Energy (DOE) complex rely heavily on the information provided in the DOE Handbook, DOE-HDBK-3010, Airborne Release Fractions/Rates and Respirable Fractions for Nonreactor Nuclear Facilities, to determine radionuclide source terms from postulated accident scenarios. In calculating source terms, analysts tend to use the DOE Handbook’s bounding values on airborne release fractions (ARFs) and respirable fractions (RFs) for various categories of insults (representing potential accident release categories). This is typically due to both time constraints and the avoidance of regulatory critique. Unfortunately, these bounding ARFs/RFs represent extremely conservative values. Moreover, they were derived from very limited small-scale bench/laboratory experiments and/or from engineering judgment. Thus, the basis for the data may not be representative of the actual unique accident conditions and configurations being evaluated. The goal of this research is to develop a more accurate and defensible method to determine bounding values for the DOE Handbook using state-of-the-art multi-physics-based computer codes.
Proceedings of the International Conference on Cloud Computing Technology and Science, CloudCom
Containerization, or OS-level virtualization, has taken root within the computing industry. However, container utilization and its impact on performance and functionality within High Performance Computing (HPC) is still relatively undefined. This paper investigates the use of containers with advanced supercomputing and HPC system software. With this, we define a model for parallel MPI application DevOps and deployment using containers to enhance development effort and provide container portability from laptop to clouds or supercomputers. In this endeavor, we extend the use of Singularity containers to a Cray XC-series supercomputer. We use the HPCG and IMB benchmarks to investigate potential points of overhead and scalability with containers on a Cray XC30 testbed system. Furthermore, we also deploy the same containers with Docker on Amazon's Elastic Compute Cloud (EC2), and compare against our Cray supercomputer testbed. Our results indicate that Singularity containers operate at native performance when dynamically linking Cray's MPI libraries on a Cray supercomputer testbed, and that while Amazon EC2 may be useful for initial DevOps and testing, scaling HPC applications better fits supercomputing resources like a Cray.
Solar Energy
Optimizing thermal generation commitments and dispatch in the presence of high penetrations of renewable resources such as solar energy requires a characterization of their stochastic properties. In this study, we describe novel methods designed to create day-ahead, wide-area probabilistic solar power scenarios based only on historical forecasts and associated observations of solar power production. Each scenario represents a possible trajectory for solar power in next-day operations with an associated probability computed by algorithms that use historical forecast errors. Scenarios are created by segmentation of historic data, fitting non-parametric error distributions using epi-splines, and then computing specific quantiles from these distributions. Additionally, we address the challenge of establishing an upper bound on solar power output. Our specific application driver is for use in stochastic variants of core power systems operations optimization problems, e.g., unit commitment and economic dispatch. These problems require as input a range of possible future realizations of renewables production. However, the utility of such probabilistic scenarios extends to other contexts, e.g., operator and trader situational awareness. Finally, we compare the performance of our approach to a recently proposed method based on quantile regression, and demonstrate that our method performs comparably to this approach in terms of two widely used methods for assessing the quality of probabilistic scenarios: the Energy score and the Variogram score.
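The quantile-scenario construction can be sketched in a few lines. This is a simplified illustration only: it omits the paper's segmentation and epi-spline fitting, and the function names, arguments, and output clipping are our own assumptions, not the authors' code.

```python
def quantile(sorted_vals, q):
    """Linear-interpolation empirical quantile of pre-sorted data."""
    idx = q * (len(sorted_vals) - 1)
    lo, hi = int(idx), min(int(idx) + 1, len(sorted_vals) - 1)
    frac = idx - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac

def make_scenarios(forecast, errors, qs=(0.1, 0.5, 0.9), cap=1.0):
    """Shift a day-ahead forecast trajectory by empirical error quantiles.

    forecast: hourly power forecasts (per-unit of capacity)
    errors:   historical (observed - forecast) errors for similar days
    cap:      upper bound on plant output (the paper estimates this, too)
    """
    srt = sorted(errors)
    scenarios = {}
    for q in qs:
        e = quantile(srt, q)
        # each trajectory is forecast + error quantile, clipped to [0, cap]
        scenarios[q] = [min(cap, max(0.0, f + e)) for f in forecast]
    return scenarios
```

Each returned trajectory would then carry a probability derived from the historical error distribution, as the abstract describes.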
Mathematical Programming Computation
We describe pyomo.dae, an open source Python-based modeling framework that enables high-level abstract specification of optimization problems with differential and algebraic equations. The pyomo.dae framework is integrated with the Pyomo open source algebraic modeling language, and is available at http://www.pyomo.org. One key feature of pyomo.dae is that it does not restrict users to standard, predefined forms of differential equations, providing a high degree of modeling flexibility and the ability to express constraints that cannot be easily specified in other modeling frameworks. Other key features of pyomo.dae are the ability to specify optimization problems with high-order differential equations and partial differential equations, defined on restricted domain types, and the ability to automatically transform high-level abstract models into finite-dimensional algebraic problems that can be solved with off-the-shelf solvers. Moreover, pyomo.dae users can leverage existing capabilities of Pyomo to embed differential equation models within stochastic and integer programming models and mathematical programs with equilibrium constraint formulations. Collectively, these features enable the exploration of new modeling concepts, discretization schemes, and the benchmarking of state-of-the-art optimization solvers.
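To see what the automatic transformation to a finite-dimensional algebraic problem amounts to, here is a hand-rolled backward-Euler discretization of a scalar ODE. This is our illustration of the underlying idea, not the pyomo.dae API: the framework instead generates the analogous constraints y[k] == y[k-1] + h*f(t[k], y[k]) inside a Pyomo model and hands them to a solver.

```python
def backward_euler(f, y0, t0, t1, n):
    """Discretize dy/dt = f(t, y) on [t0, t1] with n backward-Euler steps."""
    h = (t1 - t0) / n
    ts = [t0 + i * h for i in range(n + 1)]
    ys = [y0]
    for k in range(1, n + 1):
        y = ys[-1]                 # initial guess: previous value
        for _ in range(50):        # fixed-point solve of the implicit step
            y = ys[-1] + h * f(ts[k], y)
        ys.append(y)
    return ts, ys
```

For dy/dt = -y with y(0) = 1, the discrete solution after n steps is 1/(1+h)^n, which approaches e^{-t} as the mesh is refined.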
SIAM Journal on Numerical Analysis
In numerous applications, scientists and engineers acquire varied forms of data that partially characterize the inputs to an underlying physical system. This data is then used to inform decisions such as controls and designs. Consequently, it is critical that the resulting control or design is robust to the inherent uncertainties associated with the unknown probabilistic characterization of the model inputs. Here in this work, we consider optimal control and design problems constrained by partial differential equations with uncertain inputs. We do not assume a known probabilistic model for the inputs, but rather we formulate the problem as a distributionally robust optimization problem where the outer minimization problem determines the control or design, while the inner maximization problem determines the worst-case probability measure that matches desired characteristics of the data. We analyze the inner maximization problem in the space of measures and introduce a novel measure approximation technique, based on the approximation of continuous functions, to discretize the unknown probability measure. Finally, we prove consistency of our approximated min-max problem and conclude with numerical results.
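In generic notation (ours, not necessarily the paper's), the problem described above takes the min-max form

```latex
\min_{z \in Z_{\mathrm{ad}}} \; \sup_{\mu \in \mathcal{A}} \;
  \int_{\Xi} J\bigl(u(z,\xi), z\bigr)\, \mathrm{d}\mu(\xi)
\quad \text{subject to} \quad
  e\bigl(u(z,\xi), z, \xi\bigr) = 0 \;\; \text{for all } \xi \in \Xi,
```

where $z$ is the control or design, $u(z,\xi)$ the PDE solution, $e$ the PDE constraint, $\Xi$ the space of uncertain inputs, and $\mathcal{A}$ the ambiguity set of probability measures consistent with the data; the measure approximation technique replaces $\mathcal{A}$ with a finite-dimensional discretization.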
Physical Review A
We derive an intrinsically temperature-dependent approximation to the correlation grand potential for many-electron systems in thermodynamical equilibrium in the context of finite-temperature reduced-density-matrix-functional theory (FT-RDMFT). We demonstrate its accuracy by calculating the magnetic phase diagram of the homogeneous electron gas. We compare it to known limits from highly accurate quantum Monte Carlo calculations as well as to phase diagrams obtained within existing exchange-correlation approximations from density functional theory and zero-temperature RDMFT.
Experimental Mechanics
With the rapid spread in use of Digital Image Correlation (DIC) globally, it is important that there be standard methods of verifying and validating DIC codes. To this end, the DIC Challenge board was formed and is maintained under the auspices of the Society for Experimental Mechanics (SEM) and the international DIC society (iDICs). The goal of the DIC Board and the 2D–DIC Challenge is to supply a set of well-vetted sample images and a set of analysis guidelines for standardized reporting of 2D–DIC results from these sample images, as well as for comparing the inherent accuracy of different approaches and for providing users with a means of assessing their proper implementation. This document will outline the goals of the challenge, describe the image sets that are available, and give a comparison between 12 commercial and academic 2D–DIC codes using two of the challenge image sets.
Physical Review E
The role of an external field on capillary waves at the liquid-vapor interface of a dipolar fluid is investigated using molecular dynamics simulations. For fields parallel to the interface, the interfacial width squared increases linearly with respect to the logarithm of the size of the interface across all field strengths tested. The value of the slope decreases with increasing field strength, indicating that the field dampens the capillary waves. With the inclusion of the parallel field, the surface stiffness increases with increasing field strength faster than the surface tension. For fields perpendicular to the interface, the interfacial width squared is linear with respect to the logarithm of the size of the interface for small field strengths, and the surface stiffness is less than the surface tension. Above a critical field strength that decreases as the size of the interface increases, the interface becomes unstable due to the increased amplitude of the capillary waves.
Numerical Linear Algebra with Applications
Algebraic multigrid (AMG) preconditioners are considered for discretized systems of partial differential equations (PDEs) where unknowns associated with different physical quantities are not necessarily colocated at mesh points. Specifically, we investigate a Q2−Q1 mixed finite element discretization of the incompressible Navier–Stokes equations where the number of velocity nodes is much greater than the number of pressure nodes. Consequently, some velocity degrees of freedom (DOFs) are defined at spatial locations where there are no corresponding pressure DOFs. Thus, AMG approaches leveraging this colocated structure are not applicable. This paper instead proposes an automatic AMG coarsening that mimics certain pressure/velocity DOF relationships of the Q2−Q1 discretization. The main idea is to first automatically define coarse pressures in a somewhat standard AMG fashion and then to carefully (but automatically) choose coarse velocity unknowns so that the spatial location relationship between pressure and velocity DOFs resembles that on the finest grid. To define coefficients within the intergrid transfers, an energy minimization AMG (EMIN-AMG) is utilized. EMIN-AMG is not tied to specific coarsening schemes and grid transfer sparsity patterns, and so it is applicable to the proposed coarsening. Numerical results highlighting solver performance are given on Stokes and incompressible Navier–Stokes problems.
Research in Mathematical Sciences
This paper describes versions of OPAL, the Occam-Plausibility Algorithm (Farrell et al. in J Comput Phys 295:189–208, 2015) in which the use of Bayesian model plausibilities is replaced with information-theoretic methods, such as the Akaike information criterion and the Bayesian information criterion. Applications to complex systems of coarse-grained molecular models approximating atomistic models of polyethylene materials are described. All of these model selection methods take into account uncertainties in the model, the observational data, the model parameters, and the predicted quantities of interest. A comparison of the models chosen by Bayesian model selection criteria and those chosen by the information-theoretic criteria is given.
Journal of Applied Geophysics
In this study we developed an efficient Bayesian inversion framework for interpreting marine seismic Amplitude Versus Angle and Controlled-Source Electromagnetic data for marine reservoir characterization. The framework uses a multi-chain Markov chain Monte Carlo sampler, which is a hybrid of the DiffeRential Evolution Adaptive Metropolis and Adaptive Metropolis samplers. The inversion framework is tested by estimating reservoir-fluid saturations and porosity based on marine seismic and Controlled-Source Electromagnetic data. The multi-chain Markov chain Monte Carlo sampler is scalable in terms of the number of chains, and is useful for computationally demanding Bayesian model calibration in scientific and engineering problems. As a demonstration, the approach is used to efficiently and accurately estimate the porosity and saturations in a representative layered synthetic reservoir. The results indicate that the seismic Amplitude Versus Angle and Controlled-Source Electromagnetic joint inversion provides better estimation of reservoir saturations than the seismic Amplitude Versus Angle inversion alone, especially for the parameters in deep layers. The performance of the inversion approach for various levels of noise in observational data was evaluated; reasonable estimates can be obtained with noise levels up to 25%. Sampling efficiency due to the use of multiple chains was also checked and was found to have almost linear scalability.
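A minimal sketch of the random-walk Metropolis building block that such samplers refine: DREAM adds cross-chain differential-evolution proposals and Adaptive Metropolis adapts the proposal covariance, neither of which is shown here. The function name and defaults are our own.

```python
import math
import random

def metropolis(log_post, x0, n_steps, step=1.0, seed=0):
    """Single-chain random-walk Metropolis sampler for a 1D log-posterior."""
    rng = random.Random(seed)
    x, lp = x0, log_post(x0)
    chain = [x]
    for _ in range(n_steps):
        xp = x + rng.gauss(0.0, step)          # symmetric Gaussian proposal
        lpp = log_post(xp)
        if math.log(rng.random()) < lpp - lp:  # accept with prob min(1, ratio)
            x, lp = xp, lpp
        chain.append(x)                        # rejected moves repeat x
    return chain
```

Running several such chains with coupled proposals is what makes the multi-chain approach scale with the number of chains.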
We propose a porous materials analysis pipeline using persistent homology. We first compute persistent homology of binarized 3D images of sampled material subvolumes. For each image we compute sets of homology intervals, which are represented as summary graphics called persistence diagrams. We convert persistence diagrams into image vectors in order to analyze the similarity of the homology of the material images using the mature tools for image analysis. Each image is treated as a vector and we compute its principal components to extract features. We fit a statistical model using the loadings of principal components to estimate material porosity, permeability, anisotropy, and tortuosity. We also propose an adaptive version of the structural similarity index (SSIM), a similarity metric for images, as a measure to determine the statistical representative elementary volumes (sREV) for persistent homology. Thus we provide a capability for making a statistical inference of the fluid flow and transport properties of porous materials based on their geometry and connectivity.
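For intuition, the 0-dimensional piece of this machinery at a single threshold is just counting connected components of the binarized image. The union-find sketch below (2D, 4-connectivity, our own simplification) computes that Betti number b0; full persistent homology additionally tracks how components and loops appear and merge across all thresholds.

```python
def count_components(img):
    """Number of 4-connected foreground components of a binarized 2D image."""
    rows, cols = len(img), len(img[0])
    parent = {}

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]   # path halving
            a = parent[a]
        return a

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    for i in range(rows):
        for j in range(cols):
            if img[i][j]:
                parent[(i, j)] = (i, j)
    for i in range(rows):
        for j in range(cols):
            if not img[i][j]:
                continue
            if i + 1 < rows and img[i + 1][j]:
                union((i, j), (i + 1, j))
            if j + 1 < cols and img[i][j + 1]:
                union((i, j), (i, j + 1))
    return len({find(p) for p in parent})
```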
The FY18Q1 milestone of the ECP/VTK-m project includes the implementation of a multiblock data set, the completion of a gradients filtering operation, and the release of version 1.1 of the VTK-m software. With the completion of this milestone, the new multiblock data set allows us to iteratively schedule algorithms on composite data structures such as assemblies or hierarchies like AMR. The new gradient algorithms approximate derivatives of fields in 3D structures with finite differences. Finally, the release of VTK-m version 1.1 tags a stable release of the software that can more easily be incorporated into external projects.
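The finite-difference gradient idea can be sketched as follows. This is our illustration of the approximation, not VTK-m source: central differences in the interior, one-sided differences on the boundary faces of a regular grid.

```python
def gradient_3d(field, h=1.0):
    """field[i][j][k] is a scalar sampled on a regular grid with spacing h.
    Returns a nested list of (gx, gy, gz) tuples at every grid point."""
    nx, ny, nz = len(field), len(field[0]), len(field[0][0])

    def d(get, n, idx):
        # one-sided difference at the ends, central difference inside
        if idx == 0:
            return (get(1) - get(0)) / h
        if idx == n - 1:
            return (get(n - 1) - get(n - 2)) / h
        return (get(idx + 1) - get(idx - 1)) / (2 * h)

    grad = [[[None] * nz for _ in range(ny)] for _ in range(nx)]
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                gx = d(lambda a: field[a][j][k], nx, i)
                gy = d(lambda a: field[i][a][k], ny, j)
                gz = d(lambda a: field[i][j][a], nz, k)
                grad[i][j][k] = (gx, gy, gz)
    return grad
```

Both stencils are exact for linear fields, which makes a handy sanity check.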
We study the optimization of an energy function used by the meshing community to measure and improve mesh quality. This energy is non-traditional because it is dependent on both the primal triangulation and its dual Voronoi (power) diagram. The energy is a measure of the mesh's quality for usage in Discrete Exterior Calculus (DEC), a method for numerically solving PDEs. In DEC, the PDE domain is triangulated and this mesh is used to obtain discrete approximations of the continuous operators in the PDE. The energy of a mesh gives an upper bound on the error of the discrete diagonal approximation of the Hodge star operator. In practice, one begins with an initial mesh and then makes adjustments to produce a mesh of lower energy. However, we have discovered several shortcomings in directly optimizing this energy, e.g. its non-convexity, and we show that the search for an optimized mesh may lead to mesh inversion (malformed triangles). We propose a new energy function to address some of these issues.
Simulating HPC systems is a difficult task and the emergence of “Beyond CMOS” architectures and execution models will increase that difficulty. This document presents a “tutorial” on some of the simulation challenges faced by conventional and non-conventional architectures (Section 1) and goals and requirements for simulating Beyond CMOS systems (Section 2). These provide background for proposed short- and long-term roadmaps for simulation efforts at Sandia (Sections 3 and 4). Additionally, a brief explanation of a proof-of-concept integration of a Beyond CMOS architectural simulator is presented (Section 2.3).
2017 IEEE International Conference on Wireless for Space and Extreme Environments, WiSEE 2017
Conventional wisdom in the spacecraft domain is that on-orbit computation is expensive, and thus, information is traditionally funneled to the ground as directly as possible. The explosion of information due to larger sensors, the advancements of Moore's law, and other considerations lead us to revisit this practice. In this article, we consider the trade-off between computation, storage, and transmission, viewed as an energy minimization problem.
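The energy-minimization trade-off can be made concrete with a back-of-the-envelope comparison. All constants and names below are hypothetical placeholders, not values from the article: the decision hinges on whether on-orbit processing energy plus reduced transmission energy beats downlinking the raw data.

```python
def cheaper_to_compress(raw_bits, ratio, e_tx_per_bit, e_cpu_per_bit):
    """Compare downlinking raw data against compressing on-orbit first.

    raw_bits:      size of the sensed data
    ratio:         compression ratio achieved by on-orbit processing
    e_tx_per_bit:  transmit energy per bit (hypothetical)
    e_cpu_per_bit: processing energy per input bit (hypothetical)
    """
    e_raw = raw_bits * e_tx_per_bit
    e_compressed = raw_bits * e_cpu_per_bit + (raw_bits / ratio) * e_tx_per_bit
    return e_compressed < e_raw, e_raw, e_compressed
```

With cheap computation and expensive transmission, compression wins; flip the constants and the traditional funnel-to-ground strategy wins.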
2017 IEEE International Conference on Rebooting Computing, ICRC 2017 - Proceedings
Unlike general purpose computer architectures that are comprised of complex processor cores and sequential computation, the brain is innately parallel and contains highly complex connections between computational units (neurons). Key to the architecture of the brain is a functionality enabled by the combined effect of spiking communication and sparse connectivity with unique variable efficacies and temporal latencies. Utilizing these neuroscience principles, we have developed the Spiking Temporal Processing Unit (STPU) architecture which is well-suited for areas such as pattern recognition and natural language processing. In this paper, we formally describe the STPU, implement the STPU on a field programmable gate array, and show measured performance data.
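A toy discrete-time model illustrates the two properties the abstract highlights, per-synapse efficacies and temporal latencies. This is our own leaky integrate-and-fire sketch, not the STPU itself, and every parameter is hypothetical.

```python
def simulate_lif(spike_trains, weights, delays, threshold=1.0, leak=0.9, t_max=20):
    """Leaky integrate-and-fire neuron with per-synapse weights and integer
    delays.  spike_trains[s] lists input spike times on synapse s.
    Returns the output spike times."""
    # arrivals[t] accumulates delayed, weighted input landing at time t
    arrivals = [0.0] * (t_max + max(delays) + 1)
    for syn, times in enumerate(spike_trains):
        for t in times:
            arrivals[t + delays[syn]] += weights[syn]
    v, out = 0.0, []
    for t in range(t_max):
        v = leak * v + arrivals[t]   # membrane potential leaks, then integrates
        if v >= threshold:
            out.append(t)
            v = 0.0                  # reset after firing
    return out
```

Note how the same two inputs either do or do not trigger a spike depending only on their relative delay, which is the temporal-coding behavior the architecture exploits.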
2017 IEEE International Conference on Rebooting Computing, ICRC 2017 - Proceedings
Most existing concepts for hardware implementation of reversible computing invoke an adiabatic computing paradigm, in which individual degrees of freedom (e.g., node voltages) are synchronously transformed under the influence of externally supplied driving signals. But distributing these "power/clock" signals to all gates within a design while efficiently recovering their energy is difficult. Can we reduce clocking overhead using a ballistic approach, wherein data signals self-propagating between devices drive most state transitions? Traditional concepts of ballistic computing, such as the classic Billiard-Ball Model, typically rely on a precise synchronization of interacting signals, which can fail due to exponential amplification of timing differences when signals interact. In this paper, we develop a general model of Asynchronous Ballistic Reversible Computing (ABRC) that aims to address these problems by eliminating the requirement for precise synchronization between signals. Asynchronous reversible devices in this model are isomorphic to a restricted set of Mealy finite-state machines. We explore ABRC devices having up to 3 bidirectional I/O terminals and up to 2 internal states, identifying a simple pair of such devices that comprises a computationally universal set of primitives. We also briefly discuss how ABRC might be implemented using single flux quanta in superconducting circuits.
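The Mealy-machine view admits a compact reversibility check. In this toy formulation (ours, not the paper's), a device is a map from (state, arriving terminal) to (new state, departing terminal), and reversibility requires that map to be injective so every transition can be run backwards unambiguously.

```python
def is_reversible(delta):
    """delta: dict mapping (state, input_terminal) -> (new_state, output_terminal).
    Injective transition maps are exactly the reversible devices."""
    return len(set(delta.values())) == len(delta)

# A hypothetical 2-state, 2-terminal device: every arriving pulse exits the
# other terminal, and pulses arriving at B flip the internal state.
rotary = {
    (0, 'A'): (0, 'B'),
    (0, 'B'): (1, 'A'),
    (1, 'A'): (1, 'B'),
    (1, 'B'): (0, 'A'),
}
```

A device that merges two transitions onto the same (state, output) pair loses information and fails the check.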
Proceedings of LLVM-HPC 2017: 4th Workshop on the LLVM Compiler Infrastructure in HPC - Held in conjunction with SC 2017: The International Conference for High Performance Computing, Networking, Storage and Analysis
Optimizing compilers for task-level parallelism are still in their infancy. This work explores a compiler front end that translates OpenMP tasking semantics to Tapir, an extension to LLVM IR that represents fork-join parallelism. This enables analyses and optimizations that were previously inaccessible to OpenMP codes, as well as the ability to target additional runtimes at code generation. Using a Cilk runtime back end, we compare results to existing OpenMP implementations. Initial performance results for the Barcelona OpenMP task suite show performance improvements over existing implementations.
Proceedings of PDSW-DISCS 2017 - 2nd Joint International Workshop on Parallel Data Storage and Data Intensive Scalable Computing Systems - Held in conjunction with SC 2017: The International Conference for High Performance Computing, Networking, Storage and Analysis
Significant challenges exist in the efficient retrieval of data from extreme-scale simulations. An important and evolving method of addressing these challenges is application-level metadata management. Historically, HDF5 and NetCDF have eased data retrieval by offering rudimentary attribute capabilities that provide basic metadata. ADIOS simplified data retrieval by utilizing metadata for each process' data. EMPRESS provides a simple example of the next step in this evolution by integrating per-process metadata with the storage system itself, making it more broadly useful than single file or application formats. Additionally, it allows for more robust and customizable metadata.
Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2017
Many applications, such as PDE based simulations and machine learning, apply BLAS/LAPACK routines to large groups of small matrices. While existing batched BLAS APIs provide meaningful speedup for this problem type, a non-canonical data layout enabling cross-matrix vectorization may provide further significant speedup. In this paper, we propose a new compact data layout that interleaves matrices in blocks according to the SIMD vector length. We combine this compact data layout with a new interface to BLAS/LAPACK routines that can be used within a hierarchical parallel application. Our layout provides up to 14x, 45x, and 27x speedup against OpenMP loops around optimized DGEMM, DTRSM and DGETRF kernels, respectively, on the Intel Knights Landing architecture. We discuss the compact batched BLAS/LAPACK implementations in two libraries, KokkosKernels and Intel® Math Kernel Library. We demonstrate the APIs in a line solver for coupled PDEs. Finally, we present detailed performance analysis of our kernels.
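The interleaved layout can be illustrated with plain lists. This sketch (function names and the divisibility assumption are ours, not the libraries' APIs) packs batches of small matrices into blocks of the vector length so that element (i, j) of `vlen` matrices sits contiguously, which is what enables cross-matrix SIMD vectorization.

```python
def pack_compact(mats, vlen):
    """Interleave a batch of m-by-n matrices into blocks of vlen matrices.
    Assumes len(mats) is divisible by vlen.  Returns one flat list per block
    laid out as [elem(0,0) of each matrix, elem(0,1) of each matrix, ...]."""
    m, n = len(mats[0]), len(mats[0][0])
    blocks = []
    for b in range(0, len(mats), vlen):
        group = mats[b:b + vlen]
        flat = []
        for i in range(m):
            for j in range(n):
                for mat in group:   # vlen consecutive slots: one per matrix
                    flat.append(mat[i][j])
        blocks.append(flat)
    return blocks

def get(blocks, vlen, n, idx, i, j):
    """Read element (i, j) of matrix idx back out of the packed layout."""
    blk, lane = divmod(idx, vlen)
    return blocks[blk][(i * n + j) * vlen + lane]
```

A SIMD kernel then processes one "lane" per matrix while streaming through a block linearly.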
Within the EXAALT project, the SNAP [1] approach is being used to develop high-accuracy potentials for use in large-scale, long-time molecular dynamics simulations of materials behavior. In particular, we have developed a new SNAP potential that is suitable for describing the interplay between helium atoms and vacancies in high-temperature tungsten [2]. This model is now being used to study plasma-surface interactions in nuclear fusion reactors for energy production. The high accuracy of SNAP potentials comes at the price of increased computational cost per atom and increased computational complexity. The increased cost is mitigated by improvements in strong scaling that can be achieved using advanced algorithms [3].
When very few samples of a random quantity are available from a source distribution of unknown shape, it is usually not possible to accurately infer the exact distribution from which the data samples come. Under-estimation of important quantities such as response variance and failure probabilities can result. For many engineering purposes, including design and risk analysis, we attempt to avoid under-estimation with a strategy to conservatively estimate (bound) these types of quantities -- without being overly conservative -- when only a few samples of a random quantity are available from model predictions or replicate experiments. This report examines a class of related sparse-data uncertainty representation and inference approaches that are relatively simple, inexpensive, and effective. Tradeoffs between the methods' conservatism, reliability, and risk versus number of data samples (cost) are quantified with multi-attribute metrics used to assess method performance for conservative estimation of two representative quantities: central 95% of response; and 10^-4 probability of exceeding a response threshold in a tail of the distribution. Each method's performance is characterized with 10,000 random trials on a large number of diverse and challenging distributions. The best method and number of samples to use in a given circumstance depends on the uncertainty quantity to be estimated, the PDF character, and the desired reliability of bounding the true value. On the basis of this large database and study, a strategy is proposed for selecting the method and number of samples for attaining reasonable credibility levels in bounding these types of quantities when sparse samples of random variables or functions are available from experiments or simulations.
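A generic illustration of the reliability-versus-cost tradeoff being quantified (not the report's specific methods): how often does the sample maximum of n draws bound a given quantile of the source distribution? For any continuous distribution the answer is 1 - p^n, which the Monte Carlo sketch below checks on uniform draws.

```python
import random

def coverage_of_max(n_samples, p, trials=20000, seed=1):
    """Fraction of trials in which the max of n_samples uniform draws
    bounds the p-th quantile (here simply the value p).  Analytically
    this equals 1 - p**n_samples for any continuous distribution."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        mx = max(rng.random() for _ in range(n_samples))
        if mx >= p:
            hits += 1
    return hits / trials
```

For example, 50 samples bound the 95th percentile only about 92% of the time, which is why conservative-estimation strategies matter at small sample counts.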
Journal of Computational Physics
We develop and demonstrate a new, hybrid simulation approach for charged fluids, which combines the accuracy of the nonlocal, classical density functional theory (cDFT) with the efficiency of the Poisson–Nernst–Planck (PNP) equations. The approach is motivated by the fact that the more accurate description of the physics in the cDFT model is required only near the charged surfaces, while away from these regions the PNP equations provide an acceptable representation of the ionic system. We formulate the hybrid approach in two stages. The first stage defines a coupled hybrid model in which the PNP and cDFT equations act independently on two overlapping domains, subject to suitable interface coupling conditions. At the second stage we apply the principles of the alternating Schwarz method to the hybrid model by using the interface conditions to define the appropriate boundary conditions and volume constraints exchanged between the PNP and the cDFT subdomains. Numerical examples with two representative examples of ionic systems demonstrate the numerical properties of the method and its potential to reduce the computational cost of a full cDFT calculation, while retaining the accuracy of the latter near the charged surfaces.
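The alternating Schwarz idea is easy to see on a model problem. In this sketch (ours; the paper couples PNP and cDFT, whereas both subdomains here use the same stand-in Laplace solver), each sweep solves one overlapping subdomain with Dirichlet data taken from the current global iterate, exactly the exchange of boundary conditions the abstract describes.

```python
def solve_tridiag(a, b, c, d):
    """Thomas algorithm for a tridiagonal system (sub a, diag b, super c)."""
    n = len(d)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

def schwarz_1d(n=41, overlap=8, iters=30):
    """Alternating Schwarz for -u'' = 0 on [0,1], u(0)=0, u(1)=1,
    with two overlapping subdomains solved in turn."""
    u = [0.0] * n
    u[-1] = 1.0
    mid = n // 2
    left, right = (0, mid + overlap), (mid - overlap, n - 1)
    for _ in range(iters):
        for lo, hi in (left, right):
            m = hi - lo - 1                     # interior unknowns
            a = [1.0] * m; b = [-2.0] * m; c = [1.0] * m
            d = [0.0] * m
            d[0] -= u[lo]                       # Dirichlet data from the
            d[-1] -= u[hi]                      # current global iterate
            u[lo + 1:hi] = solve_tridiag(a, b, c, d)
    return u
```

For this model problem the iterates converge geometrically (faster with more overlap) to the exact linear profile, mirroring how the hybrid method recovers the global solution from subdomain solves.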