Bayesian optimization is an effective surrogate-based optimization method that has been widely used for simulation-based applications. However, the traditional Bayesian optimization (BO) method is only applicable to single-fidelity applications, whereas in practice models are often available at multiple levels of fidelity. In this work, we propose a bi-fidelity known/unknown constrained Bayesian optimization method for design applications. The proposed framework, called sBF-BO-2CoGP, is built on a two-level CoKriging method to predict the objective function. An external binary classifier, which is itself another CoKriging model, is used to distinguish between feasible and infeasible regions. The sBF-BO-2CoGP method is demonstrated on a numerical example and a flip-chip design optimization application, minimizing warpage deformation under thermal loading conditions.
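As a reference point (not taken verbatim from the paper), two-level CoKriging is commonly built on the Kennedy–O'Hagan autoregressive relation between fidelity levels,

\[ f_{\mathrm{HF}}(x) \;=\; \rho\, f_{\mathrm{LF}}(x) + \delta(x), \]

where the low-fidelity surrogate f_LF(x) and the discrepancy term δ(x) are independent Gaussian processes and ρ is a scaling factor estimated from the data; in the constrained setting described above, a second CoKriging model of the same general form plays the role of the feasibility classifier.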
Virtual testbeds are a core component of cyber experimentation as they allow for fast and relatively inexpensive modeling of computer systems. Unlike simulations, virtual testbeds run real software on virtual hardware which allows them to capture unknown or complex behaviors. However, virtualization is known to increase latency and decrease throughput. Could these and other artifacts from virtualization undermine the experiments that we wish to run? For the past three years, we have attempted to quantify where and how virtual testbeds differ from their physical counterparts to address this concern. While performance differences have been widely studied, we aim to uncover behavioral differences. We have run over 10,000 experiments and processed over half a petabyte of data. Complete details of our methodology and our experimental results from applying that methodology are published in previous work. In this paper, we describe our lessons learned in the process of constructing and instrumenting both physical and virtual testbeds and analyzing the results from each.
Communication networks have evolved to a level of sophistication that requires computer models and numerical simulations to understand and predict their behavior. A network simulator is a software tool that enables the network designer to model the components of a computer network, such as nodes, routers, switches, and links, and events such as data transmissions and packet errors, in order to obtain device- and network-level metrics. Network simulations, like many other numerical approximations of complex systems, are subject to the specification of parameters and operating conditions of the system. Very often a full characterization of the system and its inputs is not possible; therefore, Uncertainty Quantification (UQ) strategies need to be deployed to evaluate the statistics of its response and behavior. UQ techniques, despite the advancements of the last two decades, still struggle in the presence of a large number of uncertain variables and when the regularity of the system's response cannot be guaranteed. In this context, multifidelity approaches have recently gained popularity in the UQ community due to their flexibility and robustness with respect to these challenges. The main idea behind these techniques is to extract information from a limited number of high-fidelity model realizations and complement it with a much larger set of lower-fidelity evaluations. The final result is an estimator with much lower variance, i.e., a more accurate and reliable estimate. In this contribution we investigate the possibility of deploying multifidelity UQ strategies for computer network analysis. Two numerical configurations are studied, based on a simplified network with one client and one server. Preliminary results for these tests suggest that multifidelity sampling techniques can be effective UQ tools in network applications.
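As a concrete instance of this idea (notation ours, not from the study), a two-fidelity control-variate estimator of a quantity of interest Q takes the form

\[
\hat{Q}^{\mathrm{MF}} \;=\; \frac{1}{N}\sum_{i=1}^{N} Q_{\mathrm{HF}}\!\left(x^{(i)}\right)
+ \alpha\left(\frac{1}{M}\sum_{j=1}^{M} Q_{\mathrm{LF}}\!\left(y^{(j)}\right)
- \frac{1}{N}\sum_{i=1}^{N} Q_{\mathrm{LF}}\!\left(x^{(i)}\right)\right),
\]

with N expensive high-fidelity runs and M >> N cheap low-fidelity runs; when the two models are well correlated and α is chosen near its variance-minimizing value, the variance of this estimator falls well below that of the N-sample high-fidelity Monte Carlo estimator.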
Haddock, Walker; Bangalore, Purushotham V.; Curry, Matthew L.; Skjellum, Anthony
Exascale computing demands high bandwidth and low latency I/O on the computing edge. Object storage systems can provide higher bandwidth and lower latencies than tape archive. File transfer nodes present a single point of mediation through which data moving between these storage systems must pass. By increasing the performance of erasure coding, stripes can be subdivided into large numbers of shards. This paper's contribution is a prototype nearline disk object storage system based on Ceph. We show that using general purpose graphics processing units (GPGPUs) for erasure coding on file transfer nodes is effective when using a large number of shards. We describe an architecture for nearline disk archive storage for use with high performance computing (HPC) and demonstrate its performance with benchmarking results. We compare the benchmark performance of our design with the Intel® Storage Acceleration Library (ISA-L) CPU-based erasure coding libraries using the native Ceph erasure coding feature.
This paper considers response surface approximations for discontinuous quantities of interest. Our objective is not to adaptively characterize the interface defining the discontinuity. Instead, we utilize an epistemic description of the uncertainty in the location of a discontinuity to produce robust bounds on sample-based estimates of probabilistic quantities of interest. We demonstrate that two common machine learning strategies for classification, one based on nearest neighbors (Voronoi cells) and one based on support vector machines, provide reasonable descriptions of the region where the discontinuity may reside. In higher dimensional spaces, we demonstrate that support vector machines are more accurate for discontinuities defined by smooth interfaces. We also show how gradient information, often available via adjoint-based approaches, can be used to define indicators to effectively detect a discontinuity and to decompose the samples into clusters using an unsupervised learning technique. Numerical results demonstrate the epistemic bounds on probabilistic quantities of interest for simplistic models and for a compressible fluid model with a shock-induced discontinuity.
Probabilistic simulations of the post-closure performance of a generic deep geologic repository for commercial spent nuclear fuel in shale host rock provide a test case for comparing sensitivity analysis methods available in Geologic Disposal Safety Assessment (GDSA) Framework, the U.S. Department of Energy's state-of-the-art toolkit for repository performance assessment. Simulations assume a thick low-permeability shale with aquifers (potential paths to the biosphere) above and below the host rock. Multi-physics simulations on the 7-million-cell grid are run in a high-performance computing environment with PFLOTRAN. Epistemic uncertain inputs include properties of the engineered and natural systems. The output variables of interest, maximum I-129 concentrations (independent of time) at observation points in the aquifers, vary over several orders of magnitude. Variance-based global sensitivity analyses (i.e., calculations of sensitivity indices) conducted with Dakota use polynomial chaos expansion (PCE) and Gaussian process (GP) surrogate models. Results of analyses conducted with raw output concentrations and with log-transformed output concentrations are compared. Using log-transformed concentrations results in larger sensitivity indices for more influential input variables, smaller sensitivity indices for less influential input variables, and more consistent values for sensitivity indices between methods (PCE and GP) and between analyses repeated with samples of different sizes.
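For reference, the variance-based indices computed here are the standard Sobol' main and total effects (notation ours),

\[
S_i = \frac{\operatorname{Var}_{x_i}\!\left[\mathbb{E}\!\left(Y \mid x_i\right)\right]}{\operatorname{Var}(Y)},
\qquad
T_i = 1 - \frac{\operatorname{Var}_{x_{\sim i}}\!\left[\mathbb{E}\!\left(Y \mid x_{\sim i}\right)\right]}{\operatorname{Var}(Y)},
\]

so transforming the output, e.g., taking Y to be the log of the maximum concentration rather than the raw value, changes Var(Y) and the conditional expectations and thereby redistributes the indices across inputs, which is why the raw and log-transformed analyses are compared.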
Significant testing is required to design and certify primary aircraft structures subject to High Energy Dynamic Impact (HEDI) events; current work under the NASA Advanced Composites Consortium (ACC) HEDI Project seeks to determine the state of the art of dynamic fracture simulations for composite structures in these events. This paper discusses one of three Progressive Damage Analysis (PDA) methods selected for the second phase of the NASA ACC project: peridynamics, through its implementation in EMU. A brief discussion of peridynamic theory is provided, including the effects of nonlinearity and strain-rate dependence of the matrix, followed by a blind prediction and test-analysis correlation for ballistic impact testing performed on configured skin-stringer panels.
The study of hypersonic flows and their underlying aerothermochemical reactions is particularly important in the design and analysis of vehicles exiting and reentering Earth’s atmosphere. Computational physics codes can be employed to simulate these phenomena; however, code verification of these codes is necessary to certify their credibility. To date, few approaches have been presented for verifying codes that simulate hypersonic flows, especially flows reacting in thermochemical nonequilibrium. In this paper, we present our code-verification techniques for hypersonic reacting flows in thermochemical nonequilibrium, as well as their deployment in the Sandia Parallel Aerodynamics and Reentry Code (SPARC).
Near-wall turbulence models in Large-Eddy Simulation (LES) typically approximate near-wall behavior using a solution to the mean flow equations. This approach inevitably leads to errors when the modeled flow does not satisfy the assumptions surrounding the use of a mean flow approximation for an unsteady boundary condition. Herein, modern machine learning (ML) techniques are used to implement a coordinate-frame-invariant model of the wall shear stress derived specifically for complex flows for which mean near-wall models are known to fail. The model operates on a set of scalar and vector invariants based on data taken from the first LES grid point off the wall. Neural networks were trained and validated on spatially filtered direct numerical simulation (DNS) data. The trained networks were then tested on data to which they had never been exposed, and the accuracy of their wall-shear-stress predictions was compared against both a standard mean wall model and the true stress values taken from the DNS data. The ML approach showed considerable improvement in the accuracy of individual shear-stress predictions and produced a more accurate distribution of wall-shear-stress values than the standard mean wall model. This result held in regions where the standard mean approach typically performs satisfactorily as well as in regions where it is known to fail, and it held whether the networks were trained and tested on data from the same flow type/region or on data from different flow topologies.
We describe new machine-learning-based methods to defeature CAD models for tetrahedral meshing. Using machine learning predictions of mesh quality for geometric features of a CAD model prior to meshing, we can identify potential problem areas and improve meshing outcomes by presenting a prioritized list of suggested geometric operations to users. Our machine learning models are trained using a combination of geometric and topological features from the CAD model and local quality metrics for ground truth. We demonstrate a proof-of-concept implementation of the resulting workflow using Sandia's Cubit Geometry and Meshing Toolkit.
Composition of computational science applications into both ad hoc pipelines for analysis of collected or generated data and into well-defined and repeatable workflows is becoming increasingly popular. Meanwhile, dedicated high performance computing storage environments are rapidly becoming more diverse, with both significant amounts of non-volatile memory storage and mature parallel file systems available. At the same time, computational science codes are being coupled to data analysis tools which are not filesystem-oriented. In this paper, we describe how the FAODEL data management service can expose different available data storage options and mediate among them in both application- and FAODEL-directed ways. These capabilities allow applications to exploit their knowledge of the different types of data they may exchange during a workflow execution, and also provide FAODEL with mechanisms to proactively tune data storage behavior when appropriate. We describe the implementation of these capabilities in FAODEL and how they are used by applications, and present preliminary performance results demonstrating the potential benefits of our approach.
Krichmar, Jeffrey L.; Severa, William M.; Khan, Muhammad S.; Olds, James L.
The Artificial Intelligence (AI) revolution foretold during the 1960s is well underway in the second decade of the twenty-first century. Its period of phenomenal growth likely lies ahead. AI-operated machines and technologies will extend the reach of Homo sapiens far beyond the biological constraints imposed by evolution: outwards further into deep space, as well as inwards into the nano-world of DNA sequences and relevant medical applications. And yet, we believe, there are crucial lessons that biology can offer that will enable a prosperous future for AI. For machines in general, and for AIs especially, operating over extended periods or in extreme environments will require energy usage orders of magnitude more efficient than exists today. In many operational environments, energy sources will be constrained. An AI's design and function may depend on the type of energy source, as well as its availability and accessibility. Any plans for AI devices operating in a challenging environment must begin with the question of how they are powered, where fuel is located, how energy is stored and made available to the machine, and how long the machine can operate on specific energy units. While one of the key advantages of AI use is to reduce the dimensionality of a complex problem, the fact remains that some energy is required for functionality. Hence, the materials and technologies that provide the needed energy represent a critical challenge toward future use scenarios of AI and should be integrated into their design. Here we look to the brain and other aspects of biology as inspiration for Biomimetic Research for Energy-efficient AI Designs (BREAD).
Herrington, Adam R.; Lauritzen, Peter H.; Taylor, Mark A.; Goldhaber, Steve; Eaton; Reed; Ullrich, Paul A.
Atmospheric modeling with element-based high-order Galerkin methods presents a unique challenge to the conventional physics–dynamics coupling paradigm, due to the highly irregular distribution of nodes within an element and the distinct numerical characteristics of the Galerkin method. The conventional coupling procedure is to evaluate the physical parameterizations (physics) on the dynamical core grid. Evaluating the physics at the nodal points exacerbates numerical noise from the Galerkin method, enabling and amplifying local extrema at element boundaries. Grid imprinting may be substantially reduced through the introduction of an entirely separate, approximately isotropic finite-volume grid for evaluating the physics forcing. Integration of the spectral basis over the control volumes provides an area-averaged state to the physics, which is more representative of the state in the vicinity of the nodal points than the nodal point values themselves and is more consistent with the notion of a "large-scale state" required by conventional physics packages. This study documents the implementation of a quasi-equal-area physics grid in NCAR's Community Atmosphere Model Spectral Element and shows that it is effective at mitigating grid imprinting in the solution. The physics grid is also appropriate for coupling to other components within the Community Earth System Model, since the coupler requires component fluxes to be defined on a finite-volume grid, and one can be certain that the fluxes on the physics grid are, indeed, volume averaged.
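As a sketch of the coupling step (symbols ours, not from the model documentation), the state handed to the physics on a control volume Ω_k of the physics grid is the spectral-element solution averaged over that volume,

\[
\bar{q}_k \;=\; \frac{1}{|\Omega_k|}\int_{\Omega_k} \sum_{j} q_j\,\psi_j(x)\, dA,
\]

where the ψ_j are the element basis functions and the q_j are nodal coefficients; the physics forcing computed from \bar{q}_k is then mapped back to the nodal points, and the same volume-averaged quantities serve as the finite-volume fluxes required by the coupler.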
While peak shaving is commonly used to reduce power costs, chemical process facilities that can reduce power consumption on demand during emergencies (e.g., extreme weather events) bring additional value through improved resilience. For process facilities to effectively negotiate demand response (DR) contracts and make investment decisions regarding flexibility, they need to quantify their additional value to the grid. We present a grid‐centric mixed‐integer stochastic programming framework to determine the value of DR for improving grid resilience in place of capital investments that can be cost prohibitive for system operators. We formulate problems using both a linear approximation and a nonlinear alternating current power flow model. Our numerical results with both models demonstrate that DR can be used to reduce the capital investment necessary for resilience, increasing the value that chemical process facilities bring through DR. However, the linearized model often underestimates the amount of DR needed in our case studies.
Photodetection plays a key role in basic science and technology, with exquisite performance having been achieved down to the single-photon level. Further improvements in photodetectors would open new possibilities across a broad range of scientific disciplines and enable new types of applications. However, it is still unclear what is possible in terms of ultimate performance and what properties are needed for a photodetector to achieve such performance. Here, we present a general modeling framework for photodetectors whereby the photon field, the absorption process, and the amplification process are all treated as one coupled quantum system. The formalism naturally handles field states with single or multiple photons as well as a variety of detector configurations and includes a mathematical definition of ideal photodetector performance. The framework reveals how specific photodetector architectures introduce limitations and tradeoffs for various performance metrics, providing guidance for optimization and design.
Large-scale collaborative scientific software projects require more knowledge than any one person typically possesses. This makes coordination and communication of knowledge and expertise a key factor in creating and safeguarding software quality, without which we cannot have sustainable software. However, as researchers attempt to scale up the production of software, they are confronted by problems of awareness and understanding. This presents an opportunity to develop better practices and tools that directly address these challenges. To that end, we conducted a case study of developers of the Trilinos project. We surveyed the software development challenges they face and show how those problems are connected with what they know and how they communicate. Based on these data, we provide a series of practicable recommendations and outline a path forward for future research.
Physical security systems (PSS) and humans are inescapably tied in the current physical security paradigm. Yet, physical security system evaluations often end at the console that displays information to the human. That is, these evaluations do not account for human-in-the-loop factors that can greatly impact performance of the security system, even though methods for doing so are well established. This paper highlights two examples of methods for evaluating the human component of the current physical security system. One of these methods is qualitative, focusing on the information the human needs to adequately monitor alarms on a physical site. The other objectively measures the impact of false alarm rates on threat detection. These types of human-centric evaluations are often treated as unnecessary or not cost effective under the belief that human cognition is straightforward and errors can be either trained away or mitigated with technology. These assumptions are not always correct, the resulting weaknesses are often surprising, and they can often only be identified with objective assessments of human-system performance. Thus, taking the time to perform human-element evaluations can identify unintuitive human-system weaknesses and can provide significant cost savings in the form of mitigated vulnerabilities and fewer costly system patches or retrofits to correct an issue after the system has been deployed.
The ECP/VTK-m project is providing the core capabilities to perform scientific visualization on Exascale architectures. The ECP/VTK-m project fills the critical feature gap of performing visualization and analysis on processors such as graphics processors and many-integrated-core devices. The results of this project will be delivered in tools like ParaView, VisIt, and Ascent, as well as in stand-alone form. Moreover, these tools depend on this ECP effort to be able to make effective use of ECP architectures.
This work explores the current performance and scaling of a fully-implicit stabilized unstructured finite element (FE) variational multiscale (VMS) capability for large-scale simulations of 3D incompressible resistive magnetohydrodynamics (MHD). The large-scale linear systems that are generated by a Newton nonlinear solver approach are iteratively solved by preconditioned Krylov subspace methods. The efficiency of this approach is critically dependent on the scalability and performance of the algebraic multigrid preconditioner. This study considers the performance of the numerical methods as recently implemented in the second-generation Trilinos implementation that is 64-bit compliant and is not limited by the 32-bit global identifiers of the original Epetra-based Trilinos. The study presents representative results for a Poisson problem on 1.6 million cores of an IBM Blue Gene/Q platform to demonstrate very large-scale parallel execution. Additionally, results for a more challenging steady-state MHD generator and a transient solution of a benchmark MHD turbulence calculation for the full resistive MHD system are also presented. These results are obtained on up to 131,000 cores of a Cray XC40 and one million cores of a BG/Q system.
Emerging memory devices, such as resistive crossbars, have the capacity to store large amounts of data in a single array. Acquiring the data stored in large-capacity crossbars in a sequential fashion can become a bottleneck. We present practical methods, based on sparse sampling, to quickly acquire sparse data stored on emerging memory devices that support the basic summation kernel, reducing the acquisition time from linear to sub-linear. The experimental results show that at least an order of magnitude improvement in acquisition time can be achieved when the data are sparse. Finally, we show that the energy cost associated with our approach is competitive with that of the sequential method.
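A minimal numerical illustration of the idea (the interface is hypothetical and not the paper's; the device's summation kernel is modeled as forming differences of two subset sums, giving ±1 sensing rows, and recovery uses orthogonal matching pursuit on an idealized noiseless read):

import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(0)
n, k, m = 1024, 10, 200              # array size, sparsity, number of summation reads

x = np.zeros(n)                      # sparse data stored in the crossbar
x[rng.choice(n, k, replace=False)] = rng.standard_normal(k)

# Each "read" is the difference of two random subset sums, which a summation
# kernel can form; this yields +/-1 sensing rows with good recovery properties.
A = rng.choice([-1.0, 1.0], size=(m, n))
y = A @ x                            # m sub-linear measurements instead of n sequential reads

# Recover the sparse contents from the measurements.
omp = OrthogonalMatchingPursuit(n_nonzero_coefs=k, fit_intercept=False).fit(A, y)
x_hat = omp.coef_
print(np.abs(x_hat - x).max())       # near zero when m is a modest multiple of k*log(n/k)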
With Non-Volatile Memories (NVMs) beginning to enter the mainstream computing market, it is time to consider how to secure NVM-equipped computing systems. Recent Meltdown and Spectre attacks are evidence that security must be intrinsic to computing systems and not added as an afterthought. Processor vendors are taking the first steps and are beginning to build security primitives into commodity processors. One security primitive that is associated with the use of emerging NVMs is memory encryption. Memory encryption, while necessary, is very challenging when used with NVMs because it exacerbates the write endurance problem. Secure architectures use cryptographic metadata that must be persisted and restored to allow secure recovery of data in the event of power loss. Specifically, encryption counters must be persistent to enable secure and functional recovery of an interrupted system. However, the cost of ensuring and maintaining persistence for these counters can be significant. In this paper, we propose a novel scheme to maintain encryption counters without the need for frequent updates. Our new memory controller design, Osiris, repurposes memory Error-Correction Codes (ECCs) to enable fast restoration and recovery of encryption counters. To evaluate our design, we use Gem5 to run eight memory-intensive workloads selected from SPEC2006 and U.S. Department of Energy (DOE) proxy applications. Compared to a write-through counter-cache scheme, on average, Osiris can reduce 48.7% of the memory writes (increasing lifetime by 1.95x) and reduce the performance overhead from 51.5% (for write-through) to only 5.8%. Furthermore, without the need for a backup battery or extra power-supply hold-up time, Osiris performs better than a battery-backed write-back scheme (5.8% vs. 6.6% overhead) and has less write traffic (2.6% vs. 5.9% overhead).
Nanocrystalline metals offer significant improvements in structural performance over conventional alloys. However, their performance is limited by grain boundary instability and limited ductility. Solute segregation has been proposed as a stabilization mechanism, however the solute atoms can embrittle grain boundaries and further degrade the toughness. In the present study, we confirm the embrittling effect of solute segregation in Pt-Au alloys. However, more importantly, we show that inhomogeneous chemical segregation to the grain boundary can lead to a new toughening mechanism termed compositional crack arrest. Energy dissipation is facilitated by the formation of nanocrack networks formed when cracks arrested at regions of the grain boundaries that were starved in the embrittling element. This mechanism, in concert with triple junction crack arrest, provides pathways to optimize both thermal stability and energy dissipation. A combination of in situ tensile deformation experiments and molecular dynamics simulations elucidate both the embrittling and toughening processes that can occur as a function of solute content.
Deep neural networks are often computationally expensive, during both the training stage and inference stage. Training is always expensive, because back-propagation requires high-precision floating-point multiplication and addition. However, various mathematical optimizations may be employed to reduce the computational cost of inference. Optimized inference is important for reducing power consumption and latency and for increasing throughput. This chapter introduces the central approaches for optimizing deep neural network inference: pruning "unnecessary" weights, quantizing weights and inputs, sharing weights between layer units, compressing weights before transferring from main memory, distilling large high-performance models into smaller models, and decomposing convolutional filters to reduce multiply and accumulate operations. In this chapter, using a unified notation, we provide a mathematical and algorithmic description of the aforementioned deep neural network inference optimization methods.
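A toy numpy illustration of two of the listed optimizations, magnitude pruning and uniform 8-bit quantization, applied to a single dense layer (the array names and sparsity/scale choices are ours, not from the chapter):

import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((256, 256)).astype(np.float32)   # a dense layer's weights

# Pruning: zero out the weights with the smallest magnitudes (here 90% sparsity).
threshold = np.quantile(np.abs(W), 0.90)
W_pruned = np.where(np.abs(W) < threshold, 0.0, W)

# Quantization: map the surviving weights to 8-bit integers with a per-tensor scale.
scale = np.abs(W_pruned).max() / 127.0
W_q = np.clip(np.round(W_pruned / scale), -127, 127).astype(np.int8)

# Inference then uses integer weights and a single rescale instead of float32 multiplies.
x = rng.standard_normal(256).astype(np.float32)
y_approx = (W_q.astype(np.float32) * scale) @ x
y_exact = W @ x
print(np.mean(np.abs(y_approx - y_exact)))               # approximation error introduced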
A forensics investigation after a breach often uncovers network and host indicators of compromise (IOCs) that can be deployed to sensors to allow early detection of the adversary in the future. Over time, the adversary will change tactics, techniques, and procedures (TTPs), which will also change the data generated. If the IOCs are not kept up-to-date with the adversary's new TTPs, the adversary will no longer be detected once all of the IOCs become invalid. Tracking the Known (TTK) is the problem of keeping IOCs, in this case regular expressions (regexes), up-to-date with a dynamic adversary. Our framework solves the TTK problem in an automated, cyclic fashion to bracket a previously discovered adversary. This tracking is accomplished through a data-driven approach of self-adapting a given model based on its own detection capabilities. In our initial experiments, we found that the true positive rate (TPR) of the adaptive solution degrades much less significantly over time than that of the naïve solution, suggesting that self-updating the model allows the continued detection of positives (i.e., adversaries). The cost for this performance is in the false positive rate (FPR), which increases over time for the adaptive solution but remains constant for the naïve solution. However, the difference in overall detection performance, as measured by the area under the curve (AUC), between the two methods is negligible. This result suggests that self-updating the model over time should be done in practice to continue to detect known, evolving adversaries.
High resolution simulation of viscous fingering can offer an accurate and detailed prediction for subsurface engineering processes involving fingering phenomena. The fully implicit discontinuous Galerkin (DG) method has been shown to be an accurate and stable method to model viscous fingering with high Peclet number and mobility ratio. In this paper, we present two techniques to speedup large scale simulations of this kind. The first technique relies on a simple p-adaptive scheme in which high order basis functions are employed only in elements near the finger fronts where the concentration has a sharp change. As a result, the number of degrees of freedom is significantly reduced and the simulation yields almost identical results to the more expensive simulation with uniform high order elements throughout the mesh. The second technique for speedup involves improving the solver efficiency. We present an algebraic multigrid (AMG) preconditioner which allows the DG matrix to leverage the robust AMG preconditioner designed for the continuous Galerkin (CG) finite element method. The resulting preconditioner works effectively for fixed order DG as well as p-adaptive DG problems. With the improvements provided by the p-adaptivity and AMG preconditioning, we can perform high resolution three-dimensional viscous fingering simulations required for miscible displacement with high Peclet number and mobility ratio in greater detail than before for well injection problems.
Shor's groundbreaking quantum algorithm for integer factoring provides an exponential speedup over the best-known classical algorithms. In the 20 years since Shor's algorithm was conceived, only a handful of fundamental quantum algorithmic kernels, generally providing modest polynomial speedups over classical algorithms, have been invented. To better understand the potential advantage quantum resources provide over their classical counterparts, one may consider other resources than execution time of algorithms. Quantum Approximation Algorithms direct the power of quantum computing towards optimization problems where quantum resources provide higher-quality solutions instead of faster execution times. We provide a new rigorous analysis of the recent Quantum Approximate Optimization Algorithm, demonstrating that it provably outperforms the best known classical approximation algorithm for special hard cases of the fundamental Maximum Cut graph-partitioning problem. We also develop new types of classical approximation algorithms for finding near-optimal low-energy states of physical systems arising in condensed matter by extending seminal discrete optimization techniques. Our interdisciplinary work seeks to unearth new connections between discrete optimization and quantum information science.
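For context, the algorithm analyzed here prepares the standard p-level QAOA state (standard notation, not specific to this work),

\[
|\gamma,\beta\rangle \;=\; e^{-i\beta_p H_B}\, e^{-i\gamma_p H_C} \cdots e^{-i\beta_1 H_B}\, e^{-i\gamma_1 H_C}\, |+\rangle^{\otimes n},
\]

where H_C encodes the Maximum Cut objective, H_B = \sum_j X_j is the mixing Hamiltonian, and the 2p angles are chosen classically to maximize the expected cut value \langle \gamma,\beta | H_C | \gamma,\beta \rangle; the approximation ratio achievable at small p is what is compared against the best known classical approximation algorithms.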
Gate-controllable spin-orbit coupling is often one requisite for spintronic devices. For practical spin field-effect transistors, another essential requirement is ballistic spin transport, where the spin precession length is shorter than the mean free path such that the gate-controlled spin precession is not randomized by disorder. In this letter, we report the observation of a gate-induced crossover from weak localization to weak anti-localization in the magneto-resistance of a high-mobility two-dimensional hole gas in a strained germanium quantum well. From the magneto-resistance, we extract the phase-coherence time, spin-orbit precession time, spin-orbit energy splitting, and cubic Rashba coefficient over a wide density range. The mobility and the mean free path increase with increasing hole density, while the spin precession length decreases due to increasingly stronger spin-orbit coupling. As the density becomes larger than ∼6 × 10¹¹ cm⁻², the spin precession length becomes shorter than the mean free path, and the system enters the ballistic spin transport regime. We also report here the numerical methods and code developed for calculating the magneto-resistance in the ballistic regime, where the commonly used HLN and ILP models for analyzing weak localization and anti-localization are not valid. These results pave the way toward silicon-compatible spintronic devices.
Here, the feasibility of Neumann series expansion of Maxwell's equations in the electrostatic limit is investigated for potentially rapid and approximate subsurface imaging of geologic features proximal to metallic infrastructure in an oilfield environment. While generally useful for efficient modeling of mild conductivity perturbations in uncluttered settings, we raise the question of its suitability for situations, such as oilfields, where metallic artifacts are pervasive and, in some cases, in direct electrical contact with the conductivity perturbation on which the Neumann series is computed. Convergence of the Neumann series and its residual error are computed using the hierarchical finite element framework for a canonical oilfield model consisting of an "L"-shaped, steel-cased well, energized by a steady-state electrode, and penetrating a small set of mildly conducting fractures near the heel of the well. For a given node spacing h in the finite element mesh, we find that the Neumann series is ultimately convergent if the conductivity is small enough, a result consistent with previous presumptions on the necessity of small conductivity perturbations. However, we also demonstrate that the spectral radius of the Neumann series operator grows as ~1/h, suggesting that in the limit of the continuous problem h → 0, the Neumann series is intrinsically divergent for all conductivity perturbations, regardless of their smallness. The hierarchical finite element methodology itself is critically analyzed and shown to possess the h² error convergence of traditional linear finite elements, thereby supporting the conclusion of an inescapably divergent Neumann series for this benchmark example. Application of the Neumann series to oilfield problems with metallic clutter should therefore be done with careful consideration of the coupling between infrastructure and geology. The methods used here are demonstrably useful in such circumstances.
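Schematically (our notation, not the paper's), writing the discretized system as (A_0 + δA) u = b, with A_0 the background-conductivity operator and δA the contribution of the conductivity perturbation, the Neumann series solution is

\[
u \;=\; \sum_{k=0}^{\infty} \left(-A_0^{-1}\,\delta A\right)^{k} A_0^{-1} b,
\]

which converges only if the spectral radius of A_0^{-1} δA is less than one; the observation above that this spectral radius grows like 1/h means that mesh refinement eventually pushes it past unity no matter how small the perturbation is.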
Accurate and efficient constitutive modeling remains a cornerstone issue for solid mechanics analysis. Over the years, the LAMÉ advanced material model library has grown to address this challenge by implementing models capable of describing material systems spanning soft polymers to stiff ceramics, including both isotropic and anisotropic responses. Inelastic behaviors including (visco)plasticity, damage, and fracture have all been incorporated for use in various analyses. This multitude of options and flexibility, however, comes at the cost of complexity in the resulting implementation, which must support many capabilities, features, and responses. Therefore, to enhance confidence and enable the utilization of the LAMÉ library in application, this effort seeks to document and verify the various models in the LAMÉ library. Specifically, the broader strategy, organization, and interface of the library itself is first presented. The physical theory, numerical implementation, and user guide for a large set of models is then discussed. Importantly, a number of verification tests are performed with each model, not only to build confidence in the model itself but also to highlight some important response characteristics and features that may be of interest to end users. Finally, in looking ahead to the future, approaches to add material models to this library and further expand its capabilities are presented.
We study connections between the alternating direction method of multipliers (ADMM), the classical method of multipliers (MM), and progressive hedging (PH). The connections are used to derive benchmark metrics and strategies to monitor and accelerate convergence and to help explain why ADMM and PH are capable of solving complex nonconvex NLPs. Specifically, we observe that ADMM is an inexact version of MM and approaches its performance when multiple coordination steps are performed. In addition, we use the observation that PH is a specialization of ADMM and borrow the Lyapunov function and primal-dual feasibility metrics used in ADMM to explain why PH is capable of solving nonconvex NLPs. This analysis also highlights that specialized PH schemes can be derived to tackle a wider range of stochastic programs and even other problem classes. Our exposition is tutorial in nature and seeks to motivate algorithmic improvements and new decomposition strategies.
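For reference (standard notation, not from the paper), for a problem min f(x) + g(z) subject to Ax + Bz = c with augmented Lagrangian L_ρ, the iterations being related are

\[
x^{k+1} = \arg\min_{x}\, L_\rho\!\left(x, z^{k}, y^{k}\right), \qquad
z^{k+1} = \arg\min_{z}\, L_\rho\!\left(x^{k+1}, z, y^{k}\right), \qquad
y^{k+1} = y^{k} + \rho\left(A x^{k+1} + B z^{k+1} - c\right).
\]

MM instead minimizes L_ρ jointly over (x, z) before each multiplier update, so ADMM can be read as an inexact MM whose primal step is a single block-coordinate sweep; repeating the x- and z-updates several times before updating y is the "multiple coordination steps" regime in which ADMM approaches MM performance.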
We seek scalable benchmarks for entity resolution problems. Solutions to these problems range from trivial approaches such as string sorting to sophisticated methods such as statistical relational learning. The theoretical and practical complexity of these approaches varies widely, so one of the primary purposes of a benchmark will be to quantify the trade-off between solution quality and runtime. We are motivated by the ubiquitous nature of entity resolution as a fundamental problem faced by any organization that ingests large amounts of noisy text data. A benchmark is typically a rigid specification that provides an objective measure usable for ranking implementations of an algorithm. For example, the Top500 and HPCG500 benchmarks rank supercomputers based on their performance on dense and sparse linear algebra problems (respectively). These two benchmarks require participants to report FLOPS counts attainable on various machines. Our purpose is slightly different. Whereas the supercomputing benchmarks mentioned above hold algorithms constant and aim to rank machines, we are primarily interested in ranking algorithms. As mentioned above, entity resolution problems can be approached in completely different ways. We believe that users of our benchmarks must decide what sort of procedure to run before comparing implementations and architectures. Eventually, we also wish to provide a mechanism for ranking machines while holding the algorithmic approach constant. Our primary contributions are parallel algorithms for computing solution quality measures per entity. We find in some real datasets that many entities are quite easy to resolve while others are difficult, with a heavy skew toward the former case. Therefore, measures such as global confusion matrices, F measures, etc. do not meet our benchmarking needs. We design methods for computing solution quality at the granularity of a single entity in order to know when proposed solutions do well in difficult situations (perhaps justifying extra computational effort) or struggle in easy situations. We report on progress toward a viable benchmark for comparing entity resolution algorithms. Our work is incomplete, but we have designed and prototyped several algorithms to help evaluate the solution quality of competing approaches to these problems. We envision a benchmark in which the objective measure is a ratio of solution quality to runtime.
This report presents a specification for the Portals 4 network programming interface. Portals 4 is intended to allow scalable, high-performance network communication between nodes of a parallel computing system. Portals 4 is well suited to massively parallel processing and embedded systems. Portals 4 represents an adaptation of the data movement layer developed for massively parallel processing platforms, such as the 4500-node Intel TeraFLOPS machine. Sandia's Cplant cluster project motivated the development of Version 3.0, which was later extended to Version 3.3 as part of the Cray Red Storm machine and XT line. Version 4 is targeted to the next generation of machines employing advanced network interface architectures that support enhanced offload capabilities.
A compelling narrative has taken hold as quantum computing explodes into the commercial sector: Quantum computing in 2018 is like classical computing in 1965. In 1965 Gordon Moore wrote his famous paper about integrated circuits, saying: "At present, [minimum cost] is reached when 50 components are used per circuit. But... the complexity for minimum component costs has increased at a rate of roughly a factor of two per year... by 1975, the number of components per integrated circuit for minimum cost will be 65,000." This narrative is both appealing (we want to believe that quantum computing will follow the incredibly successful path of classical computing!) and plausible (2018 saw IBM, Intel, and Google announce 50-qubit integrated chips). But it is also deeply misleading. Here is an alternative: Quantum computing in 2018 is like classical computing in 1938. In 1938, John Atanasoff and Clifford Berry built the very first electronic digital computer. It had no program, and was not Turing-complete. Vacuum tubes — the standard "bit" for 20 years — were still 5 years in the future. ENIAC and the achievement of "computational supremacy" (over hand calculation) wouldn't arrive for 8 years, despite the accelerative effect of WWII. Integrated circuits and the information age were more than 20 years away. Neither of these analogies is perfect. Quantum computing technology is more like 1938, while the level of funding and excitement suggest 1965 (or later!). But the point of the cautionary analogy to 1938 is simple: Quantum computing in 2018 is a research field. It is far too early to establish metrics or benchmarks for performance. The best role for neutral organizations like IEEE is to encourage and shape research into metrics and benchmarks, so as to be ready when they become necessary. This white paper presents the evidence and reasoning for this claim. We explain what it means to say that quantum computing is a "research field", and why metrics and benchmarks for quantum processors also constitute a research field. We discuss the potential for harmful consequences of prematurely establishing standards or frameworks. We conclude by suggesting specific actions that IEEE or similar organizations can take to accelerate the development of good metrics and benchmarks for quantum computing.
Molecular dynamics simulations are carried out to characterize irradiation effects in TiO2 rutile, for wide ranges of temperatures (300-900 K) and primary knock-on atom (PKA) energies (1-10 keV). The number of residual defects decreases with increased temperature and decreased PKA energy, but is independent of PKA type. In the ballistic phase, more oxygen than titanium defects are produced, however, the primary residual defects are titanium vacancies and interstitials. Defect clustering depends on the PKA energy, temperature, and defect production. For some 10 keV PKAs, the largest cluster of vacancies at the peak of the ballistic phase and after annealing has up to ≈1200 and 100 vacancies, respectively. For the 10 keV PKAs at 300 K, the energy storage, primarily in residual Ti vacancies and interstitials, is estimated at 140-310 eV. It decreases with increased temperature to as little as 5-180 eV at 900 K. Selected area electron diffraction patterns and radial distribution functions confirm that although localized amorphous regions form during the ballistic phase, TiO2 regains full crystallinity after annealing.
We study two inexact methods for solutions of random eigenvalue problems in the context of spectral stochastic finite elements. In particular, given a parameter-dependent, symmetric matrix operator, the methods solve for eigenvalues and eigenvectors represented using polynomial chaos expansions. Both methods are based on the stochastic Galerkin formulation of the eigenvalue problem and they exploit its Kronecker-product structure. The first method is an inexact variant of the stochastic inverse subspace iteration [B. Sousedík, H. C. Elman, SIAM/ASA Journal on Uncertainty Quantification 4(1), pp. 163-189, 2016]. The second method is based on an inexact variant of Newton iteration. In both cases, the problems are formulated so that the associated stochastic Galerkin matrices are symmetric, and the corresponding linear problems are solved using preconditioned Krylov subspace methods with several novel hierarchical preconditioners. The accuracy of the methods is compared with that of Monte Carlo and stochastic collocation, and the effectiveness of the methods is illustrated by numerical experiments.
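In this setting (notation ours), the random eigenpairs are represented in polynomial chaos form,

\[
\lambda(\xi) \approx \sum_{k=0}^{P} \lambda_k\, \psi_k(\xi), \qquad
u(\xi) \approx \sum_{k=0}^{P} u_k\, \psi_k(\xi),
\]

where the ψ_k are orthogonal polynomials in the random parameters ξ; Galerkin projection of A(ξ) u(ξ) = λ(ξ) u(ξ) onto span{ψ_k} produces the coupled, Kronecker-structured systems that the inexact subspace iteration and inexact Newton methods solve.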
Deep neural networks (DNNs) now outperform competing methods in many academic and industrial domains. These high-capacity universal function approximators have recently been leveraged by deep reinforcement learning (RL) algorithms to obtain impressive results for many control and decision making problems. During the past three years, research on pruning, quantization, and compression of DNNs has reduced the mathematical, and therefore time and energy, requirements of DNN-based inference. For example, DNN optimization techniques have been developed which reduce the storage requirements of VGG-16 from 552MB to 11.3MB, while maintaining the full-model accuracy for image classification. Building on DNN optimization results, the computer architecture community is taking increasing interest in exploring DNN hardware accelerator designs. Based on recent deep RL performance, we expect hardware designers to begin considering architectures appropriate for accelerating these algorithms too. However, it is currently unknown how, when, or if the 'noise' introduced by DNN optimization techniques will degrade deep RL performance. This work measures these impacts, using standard OpenAI Gym benchmarks. Our results show that mathematically optimized RL policies can perform on par with full-precision RL while requiring substantially less computation. We also observe that some optimizations are better suited than others for different problem domains. By beginning to understand the impacts of mathematical optimizations on RL policy performance, this work serves as a starting point toward the development of low-power or high-performance deep RL accelerators.
Holes in germanium-rich heterostructures provide a compelling alternative to traditional approaches, such as electrons in silicon, for achieving spin-based qubits. In this project, we addressed the question of whether holes in Ge/SiGe quantum wells can be confined into laterally defined quantum dots and made into qubits. Through this effort, we successfully fabricated and operated multiple single-metal-layer quantum dot devices in Ge/SiGe. For single quantum dots, we measured the capacitances of the quantum dot to the surface electrodes and found that they compare reasonably to values expected from the electrode dimensions, suggesting that we have formed a lithographic quantum dot. We also compare the results to detailed self-consistent calculations of the expected potential. Finally, we demonstrate, for the first time, a double quantum dot in the Ge/SiGe material system.
This study explores a Bayesian calibration framework for the RAMPAGE alloy potential model for Cu-Ni and Cu-Zr systems. In RAMPAGE potentials, it is proposed that once calibrated potentials for individual elements are available, the inter-species interactions can be described by fitting a Morse potential for pair interactions with three parameters, while densities for the embedding function can be scaled by two parameters from the elemental densities. Global sensitivity analysis tools were employed to understand the impact each parameter has on the MD simulation results. A transitional Markov chain Monte Carlo algorithm was used to generate samples from the multimodal posterior distribution consistent with the discrepancy between MD simulation results and DFT data. For the Cu-Ni system, the posterior predictive tests indicate that the fitted interatomic potential model agrees well with the DFT data, justifying the basic RAMPAGE assumptions. For the Cu-Zr system, where the phase diagram suggests more complicated atomic interactions than in the case of Cu-Ni, the RAMPAGE potential captured only a subset of the DFT data. The resulting posterior distribution for the five model parameters exhibited several modes, with each mode corresponding to specific simulation data and a suboptimal agreement with the DFT results.
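For reference, the three-parameter Morse pair interaction mentioned above has the standard form (parameter names ours),

\[
\phi(r) \;=\; D_e\left[e^{-2\alpha\,(r - r_0)} - 2\,e^{-\alpha\,(r - r_0)}\right],
\]

with well depth D_e, inverse-width α, and equilibrium separation r_0; together with the two density-scaling factors for the embedding function, these make up the five parameters whose posterior the transitional MCMC sampler explores.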
In this report, we present preliminary research into nonparametric clustering methods for multi-source imagery data and into quantifying the performance of these models. In many domain areas, data sets do not necessarily follow well-defined and well-known probability distributions, such as the normal, gamma, and exponential. This is especially true when combining data from multiple sources describing a common set of objects (which we call multimodal analysis), where the data in each source can follow different distributions and need to be analyzed in conjunction with one another. This necessitates nonparametric density estimation methods, which allow the data themselves to dictate the estimated distribution. One prominent example of multimodal analysis is multimodal image analysis, in which we analyze multiple images of the same scene of interest taken using different radar systems. We develop uncertainty analysis methods, which are inherent in the use of probabilistic models but often not taken advantage of, to assess the performance of probabilistic clustering methods used for analyzing multimodal images. This added information helps assess model performance and how much trust decision-makers should have in the obtained analysis results. The developed methods illustrate some ways in which uncertainty can inform decisions that arise when designing and using machine learning models.
Scope and Objectives: Kokkos Support provides cyber resources and conducts training events for current and prospective Kokkos users. In-person training events are organized in various venues, providing both generic Kokkos tutorials with lectures and exercises and hands-on work on users' applications.
Sparse matrix-matrix multiplication is a key kernel that has applications in several domains such as scientific computing and graph analysis. Several algorithms have been studied in the past for this foundational kernel. In this paper, we develop parallel algorithms for sparse matrix-matrix multiplication with a focus on performance portability across different high performance computing architectures. The performance of these algorithms depends on the data structures used in them. We compare different types of accumulators in these algorithms and demonstrate the performance difference between these data structures. Furthermore, we develop a meta-algorithm, KKSPGEMM, to choose the right algorithm and data structure based on the characteristics of the problem. We show performance comparisons on three architectures and demonstrate the need for the community to develop two-phase sparse matrix-matrix multiplication implementations for efficient reuse of the data structures involved.
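As a sketch of the kind of kernel and accumulator choice being compared (the CSR helper below is ours for illustration, not the paper's implementation), Gustavson's row-by-row SpGEMM with a hash-map accumulator looks like:

# Row-by-row C = A * B on CSR arrays, accumulating each output row in a dict.
def spgemm_csr(a_ptr, a_idx, a_val, b_ptr, b_idx, b_val, n_rows):
    c_ptr, c_idx, c_val = [0], [], []
    for i in range(n_rows):
        acc = {}                                      # hash-map accumulator for row i of C
        for jj in range(a_ptr[i], a_ptr[i + 1]):      # nonzeros A(i, k)
            k, a_ik = a_idx[jj], a_val[jj]
            for kk in range(b_ptr[k], b_ptr[k + 1]):  # nonzeros B(k, j)
                j = b_idx[kk]
                acc[j] = acc.get(j, 0.0) + a_ik * b_val[kk]
        for j in sorted(acc):                         # emit row i in column order
            c_idx.append(j)
            c_val.append(acc[j])
        c_ptr.append(len(c_idx))
    return c_ptr, c_idx, c_val

# Example: C = A * A for a 2x2 identity stored in CSR.
print(spgemm_csr([0, 1, 2], [0, 1], [1.0, 1.0], [0, 1, 2], [0, 1], [1.0, 1.0], 2))

A dense accumulator would replace the dictionary with a length-n scratch array; which choice is faster depends on row density and architecture, which is exactly the kind of problem characteristic a meta-algorithm such as KKSPGEMM can inspect when selecting the kernel.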
ParaView Catalyst is an API for accessing the scalable visualization infrastructure of ParaView in an in-situ context. In-situ visualization allows simulation codes to apply data post-processing operations while the simulation is running. In-situ techniques can reduce data post-processing time, allow computational steering, and increase the resolution and frequency of data output. For a simulation code to use ParaView Catalyst, adapter code needs to be created that interfaces the simulation's data structures to ParaView/VTK data structures. Under ATDM, Catalyst is to be integrated with SPARC, a code used for simulation of unsteady reentry vehicle flow.
Prokopenko, Andrey; Thomas, Stephen; Swirydowicz, Kasia; Ananthan, Shreyas; Hu, Jonathan J.; Williams, Alan B.; Sprague, Michael
The goal of the ExaWind project is to enable predictive simulations of wind farms composed of many MW-scale turbines situated in complex terrain. Predictive simulations will require computational fluid dynamics (CFD) simulations for which the mesh resolves the geometry of the turbines and captures the rotation and large deflections of blades. Whereas such simulations for a single turbine are arguably petascale class, multi-turbine wind farm simulations will require exascale-class resources. The primary code in the ExaWind project is Nalu, an unstructured-grid solver for the acoustically incompressible Navier-Stokes equations in which mass continuity is maintained through pressure projection. The model consists of a mass-continuity Poisson-type equation for pressure and a momentum equation for velocity. For such modeling approaches, simulation times are dominated by linear-system setup and solution for the continuity and momentum systems. For the ExaWind challenge problem, the moving meshes greatly affect overall solver costs, as re-initialization of matrices and re-computation of preconditioners is required at every time step. In this milestone, we examine the effect of threading on solver-stack performance against flat-MPI results obtained from previous milestones using Haswell performance data from full-turbine simulations. Whereas the momentum equations are solved only with the Trilinos solvers, we investigate two algebraic-multigrid preconditioners for the continuity equations: Trilinos/MueLu and HYPRE/BoomerAMG. These two packages embody smoothed-aggregation and classical Ruge-Stüben AMG methods, respectively. In our FY18 Q2 report, we described our efforts to improve setup and solve of the continuity equations under flat-MPI parallelism. While significant improvement was demonstrated in the solve phase, setup times remained larger than expected. Starting with the optimized settings described in the Q2 report, we explore here simulation performance where OpenMP threading is employed in the solver stack. For Trilinos, threading is achieved through the Kokkos abstraction, whereas HYPRE/BoomerAMG employs straight OpenMP. We examined results for our mid-resolution baseline turbine simulation configuration (229M DOF). Simulations on 2048 Haswell cores explored the effect of decreasing the number of MPI ranks while increasing the number of threads. Both HYPRE and Trilinos exhibited similar overall solution times, and both showed dramatic increases in simulation time in the shift from MPI ranks to OpenMP threads. This increase is attributed to the large amount of work per MPI rank starting at the single-thread configuration. Decreasing MPI ranks while increasing threads may be increasing simulation time due to thread synchronization and start-up overhead contributing to the latency and serial time in the model. These results showed that an MPI+OpenMP parallel decomposition will be more effective as the amount of computation per MPI rank decreases and the communication latency increases. This idea was demonstrated in a strong-scaling study of our low-resolution baseline model (29M DOF) with the Trilinos-HYPRE configuration. While MPI-only results showed scaling improvement out to about 1536 cores, engaging threading carried scaling improvements out to 4128 cores — roughly 7000 DOF per core. This is an important result, as improved strong scaling is needed for simulations to be executed over sufficiently long simulated durations (i.e., for many timesteps).
In addition to the threading work described above, the team examined solver-performance improvements by exploring communication overhead in the HYPRE-GMRES implementation through a communication-optimal GMRES algorithm (CO-GMRES), and by offloading compute-intensive solver actions to GPUs. To those ends, a HYPRE mini-app was developed to allow us to easily test different solver approaches and HYPRE parameter settings without running the entire Nalu code. With GPU acceleration on the Summitdev supercomputer, a 20x speedup was achieved for the overall preconditioner and solver execution time of the mini-app. A study on Haswell processors showed that CO-GMRES provides benefits as one increases MPI ranks.
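As a quick, illustrative check of the work-per-core figures quoted above (the reported results come from the full Nalu runs, not from this sketch), the average degrees of freedom per core follow directly from the problem sizes and core counts:

    # Back-of-the-envelope arithmetic only; the numbers are taken directly from the text above.
    def dof_per_core(total_dof, cores):
        """Average degrees of freedom assigned to each core."""
        return total_dof / cores

    # Mid-resolution baseline: 229M DOF on 2048 Haswell cores.
    print(round(dof_per_core(229e6, 2048)))   # ~112,000 DOF per core
    # Low-resolution baseline: 29M DOF at the MPI-only and MPI+OpenMP scaling limits.
    print(round(dof_per_core(29e6, 1536)))    # ~19,000 DOF per core
    print(round(dof_per_core(29e6, 4128)))    # ~7,000 DOF per core, matching the quoted figure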
This final report summarizes the results of the Laboratory Directed Research and Development (LDRD) Project Number 212587, entitled "Modeling Charged Defects in Non-Cubic Semiconductors for Radiation Effects Studies in Next Generation Electronic Materials". The goal of this project was to extend a predictive capability for modeling defect level energies using first-principles density functional theory methods (e.g., for radiation effects assessments) to semiconductors with non-cubic crystal structures. Computational methods that proved accurate for predicting defect levels in standard cubic semiconductors were found to have shortcomings when applied to the lowered-symmetry structures prevalent in next-generation electronic materials such as SiC, GaN, and Ga2O3, stemming from an error in the treatment of the electrostatic boundary conditions. I describe methods to generalize the local moment countercharge (LMCC) scheme to position a charge in bulk supercell calculations of charged defects, circumventing the problem of measuring a dipole in a periodically replicated bulk calculation.
This SAND report fulfills the final report requirement for the Born Qualified Grand Challenge LDRD. Born Qualified was funded from FY16 to FY18 with a total budget of ~$13M over the three years of funding. Overall, more than 70 staff, postdocs, and students supported this project over its lifetime. The driver for Born Qualified was using Additive Manufacturing (AM) to change the qualification paradigm for low-volume, high-value, high-consequence, complex parts that are common in high-risk industries such as ND, defense, energy, aerospace, and medical. AM offers the opportunity to transform design, manufacturing, and qualification with its unique capabilities. AM is a disruptive technology that allows part and material to be created simultaneously while the manufacturing process is tightly controlled and monitored at the voxel level, with the inherent flexibility and agility of printing layer by layer. AM enables the possibility of measuring critical material and part parameters during manufacturing, thus changing the way we collect data, assess performance, and accept or qualify parts. It provides an opportunity to shift from the current iterative design-build-test qualification paradigm using traditional manufacturing processes to design-by-predictivity, where requirements are addressed concurrently and rapidly. The new qualification paradigm driven by AM provides the opportunity to predict performance probabilistically, to optimally control the manufacturing process, and to implement accelerated cycles of learning. Exploiting these capabilities to realize a new uncertainty-quantification-driven qualification that is rapid, flexible, and practical is the focus of this effort.
A key component of most large-scale rendering systems is a parallel image-compositing algorithm, and the most commonly used compositing algorithms are binary swap and its variants. Although shown to be very efficient, a classic limitation of binary swap is that it only works on a number of processes that is a perfect power of 2. Multiple variations of binary swap have been independently introduced to overcome this limitation and handle process counts with factors other than 2. To date, few of these approaches have been directly compared against each other, making it unclear which approach is best. This paper presents a fresh implementation of each of these methods using a common software framework, making them directly comparable, and compares these methods for running binary swap with odd factors. The results show that some simple compositing approaches work as well as or better than more complex algorithms that are more difficult to implement.
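For context, the following is a minimal sketch of the classic binary-swap exchange pattern. It assumes a power-of-2 process count (the limitation discussed above), only simulates the communication schedule rather than performing real compositing, and is not the implementation evaluated in the paper.

    # Classic binary swap: in round r, each rank exchanges with the rank differing in bit r,
    # keeps half of its current image region, and composites the half it receives.
    def binary_swap_schedule(num_procs, width):
        """For each round, report (rank, partner, kept pixel range) for every rank."""
        assert num_procs & (num_procs - 1) == 0, "classic binary swap needs a power-of-2 count"
        owned = {rank: (0, width) for rank in range(num_procs)}   # every rank starts with the full image
        schedule = []
        for r in range(num_procs.bit_length() - 1):               # log2(p) rounds
            step = []
            for rank in range(num_procs):
                partner = rank ^ (1 << r)                         # flip bit r to find the partner
                lo, hi = owned[rank]
                mid = (lo + hi) // 2
                # The lower rank of the pair keeps the left half; the other keeps the right half.
                kept = (lo, mid) if rank < partner else (mid, hi)
                step.append((rank, partner, kept))
            for rank, partner, kept in step:
                owned[rank] = kept                                # composite only the kept half next round
            schedule.append(step)
        return schedule                                           # each rank ends owning width/p pixels

    for rnd, step in enumerate(binary_swap_schedule(8, 1024)):
        print("round", rnd, step[0], step[1], "...")

With 8 processes and a 1024-pixel-wide image, each rank finishes owning a 128-pixel strip after three rounds; those strips are then gathered to form the final image.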
File fragment classification is an important step in the task of file carving in digital forensics. In file carving, files must be reconstructed based on their content as a result of their fragmented storage on disk or in memory. Existing methods for classification of file fragments typically use hand-engineered features, such as byte histograms or entropy measures. In this paper, we propose an approach using sparse coding that enables automated feature extraction. Sparse coding, or sparse dictionary learning, is an unsupervised learning algorithm that is capable of extracting features based simply on how well those features can be used to reconstruct the original data. With respect to file fragments, we learn sparse dictionaries for n-grams, contiguous sequences of bytes, of different sizes. These dictionaries may then be used to estimate n-gram frequencies for a given file fragment, but for significantly larger n-gram sizes than are typically feasible in existing methods, which suffer from combinatorial explosion. To demonstrate the capability of our sparse coding approach, we used the resulting features to train standard classifiers, such as support vector machines, over multiple file types. Experimentally, we achieved significantly better classification results than existing methods, especially when our features were used to supplement existing hand-engineered features.
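As a rough illustration of this kind of pipeline (not the authors' code; the n-gram size, dictionary size, pooling choice, and placeholder training data below are assumptions made only for illustration), one could learn a sparse dictionary over byte n-grams with scikit-learn and feed the pooled sparse codes to a linear SVM:

    import numpy as np
    from sklearn.decomposition import MiniBatchDictionaryLearning
    from sklearn.svm import LinearSVC

    def ngrams(fragment, n=8):
        """Overlapping byte n-grams of a fragment, scaled to [0, 1]."""
        arr = np.frombuffer(fragment, dtype=np.uint8).astype(np.float64) / 255.0
        return np.stack([arr[i:i + n] for i in range(len(arr) - n + 1)])

    def fragment_features(fragment, dictionary, n=8):
        """Pool the sparse codes of a fragment's n-grams into one fixed-length feature vector."""
        codes = dictionary.transform(ngrams(fragment, n))
        return np.abs(codes).mean(axis=0)

    # Placeholder fragments and labels standing in for labeled fragments of known file types.
    train_fragments = [np.random.bytes(512) for _ in range(20)]
    train_labels = np.array([i % 2 for i in range(20)])

    dico = MiniBatchDictionaryLearning(n_components=64, alpha=1.0, batch_size=256, random_state=0)
    dico.fit(np.vstack([ngrams(f) for f in train_fragments]))

    X = np.vstack([fragment_features(f, dico) for f in train_fragments])
    clf = LinearSVC().fit(X, train_labels)
    print(clf.predict(X[:3]))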
We propose a functional integral framework for the derivation of hierarchies of Landau-Lifshitz-Bloch (LLB) equations that describe the flow toward equilibrium of the first and second moments of the magnetization. The short-scale description is defined by the stochastic Landau-Lifshitz-Gilbert equation, under both Markovian and non-Markovian noise, and takes into account interaction terms that are of practical relevance. Depending on the interactions, different hierarchies on the moments are obtained in the corresponding LLB equations. Two closure Ansätze are discussed and tested by numerical methods that are adapted to the symmetries of the problem. Our formalism provides a rigorous bridge between atomistic spin dynamics simulations at short scales and micromagnetic descriptions at larger scales.
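To make the short-scale model concrete, the following is a minimal single-spin sketch of the stochastic Landau-Lifshitz-Gilbert equation integrated with a Heun scheme; the parameter values, the white thermal field, and the absence of interaction terms are simplifying assumptions for illustration and are not taken from the paper.

    import numpy as np

    gamma, alpha = 1.0, 0.1               # reduced gyromagnetic ratio and Gilbert damping (assumed)
    dt, steps, D = 1e-3, 5000, 0.01       # time step, number of steps, thermal-field variance (assumed)
    H_applied = np.array([0.0, 0.0, 1.0]) # static applied field along z

    def llg_rhs(m, H):
        """Landau-Lifshitz form of the LLG right-hand side."""
        pre = -gamma / (1.0 + alpha**2)
        return pre * (np.cross(m, H) + alpha * np.cross(m, np.cross(m, H)))

    rng = np.random.default_rng(0)
    m = np.array([1.0, 0.0, 0.0])
    for _ in range(steps):
        h = np.sqrt(2.0 * D / dt) * rng.standard_normal(3)       # white Gaussian thermal field
        H = H_applied + h
        m_pred = m + dt * llg_rhs(m, H)                           # Heun predictor
        m = m + 0.5 * dt * (llg_rhs(m, H) + llg_rhs(m_pred, H))   # Heun corrector (same noise sample)
        m /= np.linalg.norm(m)                                    # keep |m| = 1
    print(m)                                                      # relaxes toward +z for weak noise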
Trusting simulation output is crucial for Sandia's mission objectives. We rely on these simulations to perform our high-consequence mission tasks given our treaty obligations. Other science and modeling needs, while they may not be high-consequence, still require the strongest levels of trust to enable using the results as the foundation for both practical applications and future research. To this end, the computing community has developed workflow and provenance systems to aid both in automating simulation and modeling execution and in determining exactly how a given output was created, so that conclusions can be drawn from the data. Current approaches for workflow and provenance systems operate entirely at the user level and have little to no system-level support, making them fragile, difficult to use, and incomplete solutions. The introduction of container technology is a first step toward encapsulating and tracking the artifacts used in creating data and resulting insights, but current implementations focus solely on making it easy to deploy an application in an isolated "sandbox" and maintain a strictly read-only mode to avoid any potential changes to the application. All storage activities still use system-level shared storage. This project was an initial exploration into extending the container concept to also include storage and to use writable containers, auto-generated by the system, as a way to link the contained data back to the simulation and input deck used to create it.
Reproducibility is an essential ingredient of the scientific enterprise. The ability to reproduce results builds trust that we can rely on the results as foundations for future scientific exploration. Presently, the fields of computational and computing sciences provide two opposing definitions of reproducible and replicable. In computational sciences, reproducible research means authors provide all necessary data and computer codes to run analyses again, so others can re-obtain the results (J. Claerbout et al., 1992). The concept was adopted and extended by several communities, where it was distinguished from replication: collecting new data to address the same question, and arriving at consistent findings (Peng et al. 2006). The Association of Computing Machinery (ACM), representing computer science and industry professionals, recently established a reproducibility initiative, adopting essentially opposite definitions. The purpose of this report is to raise awareness of the opposite definitions and propose a path to a compatible taxonomy.
This report summarizes the data analysis activities that were performed under the Born Qualified Grand Challenge Project from 2016 to 2018. It is meant to document the characterization of additively manufactured parts and processes for this project, as well as to demonstrate and identify further analyses and data science that could be done relating material processes to microstructure to properties to performance.
This report summarizes the work performed under the Sandia LDRD project "Adverse Event Prediction Using Graph-Augmented Temporal Analysis." The goal of the project was to develop a method for analyzing multiple time-series data streams to identify precursors providing advance warning of the potential occurrence of events of interest. The proposed approach combined temporal analysis of each data stream with reasoning about relationships between data streams using a geospatial-temporal semantic graph. This class of problems is relevant to several important topics of national interest. In the course of this work we developed new temporal analysis techniques, including temporal analysis using Markov Chain Monte Carlo techniques, temporal shift algorithms to refine forecasts, and a version of Ripley's K-function extended to support temporal precursor identification. This report summarizes the project's major accomplishments and gathers the abstracts and references for the publication submissions and reports that were prepared as part of this work. We then describe work in progress that is not yet ready for publication.
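To indicate what a temporal K-function and a precursor score might look like, the sketch below shows a standard one-dimensional Ripley's K estimator and a crude cross-stream precursor count. It is illustrative only, is not the extended K-function developed in the project, and the event times and window are made up for the example.

    import numpy as np

    def ripley_k_1d(times, t, T):
        """Count event pairs within lag t, normalized by n(n-1) and scaled by the observation span T."""
        times = np.asarray(times)
        n = len(times)
        close = np.abs(times[:, None] - times[None, :]) <= t
        np.fill_diagonal(close, False)
        return T * close.sum() / (n * (n - 1))

    def precursor_count(events, candidates, window):
        """For each event, count candidate-stream events falling in the preceding window."""
        events, candidates = np.asarray(events), np.asarray(candidates)
        lags = events[:, None] - candidates[None, :]
        return ((lags > 0) & (lags <= window)).sum(axis=1)

    events = np.array([2.0, 5.5, 9.0])
    candidates = np.array([1.2, 1.8, 5.0, 8.1, 8.7])
    print(ripley_k_1d(candidates, t=1.0, T=10.0))
    print(precursor_count(events, candidates, window=1.0))  # [2, 1, 2]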
Running visualization and analysis algorithms on ATS-1 platforms is a critical step for supporting ATDM apps at the exascale. We are leveraging VTK-m to port our algorithms to the ATS-specific hardware and to ensure that they run well there.
Neural-inspired spike-based computing machines often claim to achieve considerable advantages in terms of energy and time efficiency by using spikes for computation and communication. However, fundamental questions about spike-based computation remain unanswered. For instance, how much advantage do spike-based approaches have over conventional methods, and under what circumstances does spike-based computing provide a comparative advantage? Simply implementing existing algorithms using spikes as the medium of computation and communication is not guaranteed to yield an advantage. Here, we demonstrate that spike-based communication and computation within algorithms can increase throughput, and they can decrease energy cost in some cases. We present several spiking algorithms, including sorting a set of numbers in ascending/descending order, as well as finding the maximum or minimum or median of a set of numbers. We also provide an example application: a spiking median-filtering approach for image processing providing a low-energy, parallel implementation. The algorithms and analyses presented here demonstrate that spiking algorithms can provide performance advantages and offer efficient computation of fundamental operations useful in more complex algorithms.
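As a toy indication of how spike timing can perform such an operation (an illustration only, not the paper's algorithm or its energy analysis), sorting can be expressed as time-to-first-spike: each value becomes a neuron that fires after a delay equal to its value, and the arrival order of the spikes is the sorted order.

    # Discrete-time toy: non-negative integer inputs only; ties are broken by neuron index.
    def spike_sort(values):
        """Sort non-negative integers by simulating time-to-first-spike."""
        pending = {i: v for i, v in enumerate(values)}  # neuron index -> remaining delay
        order, t = [], 0
        while pending:
            fired = [i for i, delay in pending.items() if delay == t]
            for i in sorted(fired):                     # deterministic tie-breaking for equal values
                order.append(values[i])
                del pending[i]
            t += 1                                      # advance one timestep
        return order

    print(spike_sort([5, 2, 7, 2, 0]))  # [0, 2, 2, 5, 7]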
An analysis of microgrids to increase resilience was conducted for the island of Puerto Rico. Critical infrastructure throughout the island was mapped to the key services provided by those sectors to help inform primary and secondary service sources during a major disruption to the electrical grid. Additionally, a resilience metric of burden was developed to quantify community resilience, and a related baseline resilience figure was calculated for the area. To improve resilience, Sandia performed an analysis of where clusters of critical infrastructure are located and used these suggested resilience node locations to create a portfolio of 159 microgrid options throughout Puerto Rico. The team then calculated the impact of these microgrids on the region's ability to provide critical services during an outage and compared this impact to high-level cost estimates for each microgrid to generate a set of efficient microgrid portfolios costing in the range of $218M to $917M. This analysis is a refinement of the analysis delivered on June 01, 2018.
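One simple reading of "efficient portfolios" is a cost-versus-benefit Pareto frontier. The sketch below illustrates that selection rule on hypothetical candidates and does not use the study's actual portfolio data or cost estimates.

    def pareto_efficient(portfolios):
        """Keep portfolios for which no alternative is at least as cheap and at least as beneficial,
        with a strict improvement in one of the two."""
        efficient = []
        for name, cost, benefit in portfolios:
            dominated = any(c <= cost and b >= benefit and (c < cost or b > benefit)
                            for _, c, b in portfolios)
            if not dominated:
                efficient.append((name, cost, benefit))
        return efficient

    # (name, cost in $M, fraction of critical services covered) -- hypothetical candidates
    candidates = [("A", 300, 0.40), ("B", 450, 0.55), ("C", 450, 0.50), ("D", 900, 0.80)]
    print(pareto_efficient(candidates))  # C is dropped: B costs the same but covers more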
Concurrency and Computation: Practice and Experience
Bernholdt, David E.; Boehm, Swen; Bosilca, George; Venkata, Manjunath G.; Grant, Ryan E.; Naughton, Thomas; Pritchard, Howard P.; Schulz, Martin; Vallee, Geoffroy R.
The Exascale Computing Project (ECP) is currently the primary effort in the United States focused on developing “exascale” levels of computing capabilities, including hardware, software, and applications. To obtain a more thorough understanding of how the software projects under the ECP are using, and planning to use, the Message Passing Interface (MPI), and to help guide the work of our own project within the ECP, we created a survey. Of the 97 ECP projects active at the time the survey was distributed, we received 77 responses, 56 of which reported that their projects were using MPI. This paper reports the results of that survey for the benefit of the broader community of MPI developers.