ATDM Data Warehouse: Data Management Services for Exascale Computing
Abstract not provided.
Drinking water systems face multiple challenges, including aging infrastructure, water quality concerns, uncertainty in supply and demand, natural disasters, environmental emergencies, and cyber and terrorist attacks. All of these have the potential to disrupt a large portion of a water system, causing damage to infrastructure and outages for customers. Increasing resilience to these types of hazards is essential to improving water security. As one of the United States' (US) sixteen critical infrastructure sectors, drinking water is a national priority. The National Infrastructure Advisory Council defined infrastructure resilience as “the ability to reduce the magnitude and/or duration of disruptive events. The effectiveness of a resilient infrastructure or enterprise depends upon its ability to anticipate, absorb, adapt to, and/or rapidly recover from a potentially disruptive event”. Being able to predict how drinking water systems will perform during disruptive incidents, and understanding how to best absorb, recover from, and more successfully adapt to such incidents, can help enhance resilience.
LAMMPS is a classical molecular dynamics code (lammps.sandia.gov) used to model materials science problems at Sandia National Laboratories and around the world. LAMMPS was one of three Sandia codes selected to participate in the Trinity KNL (TR2) Open Science period. During this period, three different problems of interest were investigated using LAMMPS. The first was benchmarking KNL performance using different force field models. The second was simulating void collapse in shocked HNS energetic material using an all-atom model. The third was simulating shock propagation through poly-crystalline RDX energetic material using a coarse-grain model, the results of which were used in an ACM Gordon Bell Prize submission. This report describes the results of these simulations, lessons learned, and some hardware issues found on Trinity KNL as part of this work.
Journal of Geophysical Research: Solid Earth
The geodetically derived interseismic moment deficit rate (MDR) provides a first-order constraint on earthquake potential and can play an important role in seismic hazard assessment, but quantifying uncertainty in MDR is a challenging problem that has not been fully addressed. We establish criteria for reliable MDR estimators, evaluate existing methods for determining the probability density of MDR, and propose and evaluate new methods. Geodetic measurements moderately far from the fault provide tighter constraints on MDR than those nearby. Previously used methods can fail catastrophically under predictable circumstances. The bootstrap method works well with strong data constraints on MDR, but can be strongly biased when network geometry is poor. We propose two new methods: the Constrained Optimization Bounding Estimator (COBE) assumes uniform priors on slip rate (from geologic information) and MDR, and can be shown through synthetic tests to be a useful, albeit conservative estimator; the Constrained Optimization Bounding Linear Estimator (COBLE) is the corresponding linear estimator with Gaussian priors rather than point-wise bounds on slip rates. COBE matches COBLE with strong data constraints on MDR. We compare results from COBE and COBLE to previously published results for the interseismic MDR at Parkfield, on the San Andreas Fault, and find similar results; thus, the apparent discrepancy between MDR and the total moment release (seismic and afterslip) in the 2004 Parkfield earthquake remains.
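As a rough illustration of the contrast between the two estimator styles described above, the sketch below compares a bound-constrained least-squares fit (COBE-like) with a Gaussian-prior linear estimate (COBLE-like) on a synthetic slip-rate problem. The Green's function matrix, data, bounds, and moment weights are hypothetical placeholders, not the published Parkfield model.

```python
# Hypothetical sketch: bound-constrained least squares (COBE-like) versus a
# Gaussian-prior linear estimate (COBLE-like). All quantities are synthetic.
import numpy as np
from scipy.optimize import lsq_linear

rng = np.random.default_rng(0)
n_obs, n_patches = 30, 10
G = rng.normal(size=(n_obs, n_patches))          # Green's functions: slip rate -> velocity
true_slip = rng.uniform(0.0, 30.0, n_patches)    # mm/yr on each fault patch
d = G @ true_slip + rng.normal(scale=1.0, size=n_obs)   # noisy geodetic velocities

# COBE-like: point-wise bounds on slip rate (e.g. from geologic information).
slip_cobe = lsq_linear(G, d, bounds=(0.0, 35.0)).x

# COBLE-like: Gaussian prior on slip rate instead of hard bounds
# (a ridge-regularized linear estimate).
sigma_d, sigma_m = 1.0, 15.0
A = G.T @ G / sigma_d**2 + np.eye(n_patches) / sigma_m**2
slip_coble = np.linalg.solve(A, G.T @ d / sigma_d**2)

# Moment deficit rate is a linear functional of slip rate; the per-patch
# weights (shear modulus x patch area) are set to 1 here for illustration.
w = np.ones(n_patches)
print("MDR (COBE-like): ", w @ slip_cobe)
print("MDR (COBLE-like):", w @ slip_coble)
```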
Digest of Technical Papers - Symposium on VLSI Technology
Analog resistive memories promise to reduce the energy of neural networks by orders of magnitude. However, the write variability and write nonlinearity of current devices prevent neural networks from training to high accuracy. We present a novel periodic carry method that uses a positional number system to overcome these limitations while maintaining the benefit of parallel analog matrix operations. We demonstrate how noisy, nonlinear TaOx devices, which could only train to 80% accuracy on MNIST, can now reach 97% accuracy, only 1% away from an ideal numeric accuracy of 98%. On a file type dataset, the TaOx devices achieve ideal numeric accuracy. In addition, low-noise, linear Li1-xCoO2 devices train to ideal numeric accuracies using periodic carry on both datasets.
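A minimal sketch of the positional-number-system idea behind periodic carry is given below: a single weight is split across several devices, each holding one "digit" of limited range, and an occasional carry step moves accumulated overflow into the next more-significant device. The base, device ranges, and carry schedule are illustrative assumptions, not the trained-device scheme reported above.

```python
# Sketch of periodic carry on a positional representation of one weight.
import numpy as np

BASE = 4            # significance ratio between adjacent devices
DIGIT_MAX = 8.0     # dynamic range of one device (arbitrary units)

def weight_value(digits):
    """Reconstruct the effective weight from its per-device digits."""
    return sum(d * BASE**i for i, d in enumerate(digits))

def apply_update(digits, delta):
    """Accumulate a training update on the least-significant device only."""
    digits[0] += delta
    return digits

def periodic_carry(digits):
    """Move overflow from each device into the next more-significant one."""
    for i in range(len(digits) - 1):
        if abs(digits[i]) > DIGIT_MAX:
            carry = np.trunc(digits[i] / BASE)   # whole units of the next digit
            digits[i] -= carry * BASE
            digits[i + 1] += carry
    return digits

digits = [0.0, 0.0, 0.0]                         # three devices per weight
for step in range(200):
    apply_update(digits, delta=0.37)             # small analog updates
    if step % 10 == 0:
        periodic_carry(digits)                   # occasional carry step
print(digits, "->", weight_value(digits))
```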
2017 12th System of Systems Engineering Conference, SoSE 2017
As system of systems (SoS) models become increasingly complex and interconnected, a new approach is needed to capture the effects of humans within the SoS. Many real-life events have shown the detrimental outcomes of failing to account for humans in the loop. This research introduces a novel, cross-disciplinary methodology for modeling humans interacting with technologies to perform tasks within an SoS, specifically within a layered physical security system use case. Metrics and formulations developed for this new way of looking at SoS, termed sociotechnical SoS, allow for the quantification of the interplay of effectiveness and efficiency seen in detection theory to measure the ability of a physical security system to detect and respond to threats. This methodology has been applied to a notional representation of a small military Forward Operating Base (FOB) as a proof of concept.
ACM International Conference Proceeding Series
In 2016, Lewis Rhodes Labs (LRL) shipped the first commercially viable Neuromorphic Processing Unit (NPU), branded as a Neuromorphic Data Microscope (NDM). This product leverages architectural mechanisms derived from the sensory cortex of the human brain to efficiently implement pattern matching. LRL and Sandia National Labs have optimized this product for streaming analytics and demonstrated a 1,000x power-per-operation reduction in an FPGA format. When reduced to an ASIC, the efficiency will improve to 1,000,000x. Additionally, the neuromorphic nature of the device gives it powerful computational attributes that are counterintuitive to those schooled in traditional von Neumann architectures. The Neuromorphic Data Microscope is the first of a broad class of brain-inspired, time domain processors that will profoundly alter the functionality and economics of data processing.
Wind Energy
Forecasts of available wind power are critical in key electric power systems operations planning problems, including economic dispatch and unit commitment. Such forecasts are necessarily uncertain, limiting the reliability and cost effectiveness of operations planning models based on a single deterministic or “point” forecast. A common approach to address this limitation involves the use of a number of probabilistic scenarios, each specifying a possible trajectory of wind power production, with associated probability. We present and analyze a novel method for generating probabilistic wind power scenarios, leveraging available historical information in the form of forecasted and corresponding observed wind power time series. We estimate non-parametric forecast error densities, specifically using epi-spline basis functions, allowing us to capture the skewed and non-parametric nature of error densities observed in real-world data. We then describe a method to generate probabilistic scenarios from these basis functions that allows users to control for the degree to which extreme errors are captured. We compare the performance of our approach to the current state of the art using publicly available data associated with the Bonneville Power Administration, analyzing aggregate production of a number of wind farms over a large geographic region. Finally, we discuss the advantages of our approach in the context of specific power systems operations planning problems: stochastic unit commitment and economic dispatch. Here, our methodology is embodied in the joint Sandia – University of California Davis Prescient software package for assessing and analyzing stochastic operations strategies.
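The sketch below illustrates the general scenario-generation recipe in simplified form: fit a nonparametric density to historical forecast errors, then add sampled errors to a point forecast to obtain equally weighted scenarios. A Gaussian kernel density estimate stands in for the epi-spline basis fit used in the paper, the data are synthetic, and temporal error correlation and extreme-error control are omitted.

```python
# Simplified probabilistic wind power scenario generation.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)

# Historical (forecast, observed) wind power pairs -> forecast errors (MW).
hist_forecast = rng.uniform(0, 1000, 500)
hist_observed = hist_forecast + rng.gumbel(-40, 60, 500)     # skewed errors
errors = hist_observed - hist_forecast

error_density = gaussian_kde(errors)      # stand-in for the epi-spline density

# Build equally weighted scenarios around tomorrow's point forecast.
point_forecast = np.array([300, 420, 510, 480, 390], dtype=float)  # MW per hour
n_scenarios = 10
sampled = error_density.resample(n_scenarios * len(point_forecast))
sampled = sampled.reshape(n_scenarios, len(point_forecast))
scenarios = np.clip(point_forecast + sampled, 0.0, None)           # no negative power
probabilities = np.full(n_scenarios, 1.0 / n_scenarios)

print(scenarios.round(1))
```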
Proceedings - 2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGRID 2017
Network messaging delay historically constitutes a large portion of the wall-clock time for High Performance Computing (HPC) applications, as these applications run on many nodes and involve intensive communication among their tasks. The dragonfly network topology has emerged as a promising solution for building exascale HPC systems owing to its low network diameter and large bisection bandwidth. A dragonfly network consists of local links that form groups and high-bandwidth global optical links that connect these groups. Many aspects of dragonfly network design are yet to be explored, such as the performance impact of the connectivity of the global links, i.e., global link arrangements, the bandwidth of the local and global links, or the job allocation algorithm. This paper first introduces a packet-level simulation framework to model the performance of HPC applications in detail. The proposed framework is able to simulate known MPI (message passing interface) routines as well as applications with custom-defined communication patterns for a given job placement algorithm and network topology. Using this simulation framework, we investigate the coupling between global link bandwidth and arrangements, communication pattern and intensity, job allocation and task mapping algorithms, and routing mechanisms in dragonfly topologies. We demonstrate that by choosing the right combination of system settings and workload allocation algorithms, communication overhead can be decreased by up to 44%. We also show that the circulant arrangement provides up to 15% higher bisection bandwidth compared to the other arrangements, but for realistic workloads, the performance impact of link arrangements is less than 3%.
Journal of Computational Physics
A Lagrangian particle method (called LPM) based on the flow map is presented for tracer transport on the sphere. The particles carry tracer values and are located at the centers and vertices of triangular Lagrangian panels. Remeshing is applied to control particle disorder and two schemes are compared, one using direct tracer interpolation and another using inverse flow map interpolation with sampling of the initial tracer density. Test cases include a moving-vortices flow and reversing-deformational flow with both zero and nonzero divergence, as well as smooth and discontinuous tracers. We examine the accuracy of the computed tracer density and tracer integral, and preservation of nonlinear correlation in a pair of tracers. We compare results obtained using LPM and the Lin–Rood finite-volume scheme. An adaptive particle/panel refinement scheme is demonstrated.
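The toy sketch below contrasts the two remeshing options described above on a flat 2-D domain: (a) interpolate the tracer directly from the distorted particles, or (b) interpolate the inverse flow map and re-sample the initial tracer density at the estimated pre-images. The flow map, tracer, and grids are illustrative placeholders, not the spherical LPM scheme.

```python
# Planar toy comparison of direct tracer vs inverse-flow-map remeshing.
import numpy as np
from scipy.interpolate import griddata

def q0(x, y):                          # initial tracer density
    return np.exp(-8.0 * ((x - 0.5)**2 + (y - 0.5)**2))

# Old (distorted) particles: current positions x_old, Lagrangian coordinates
# a_old they started from; the tracer value is carried along trajectories.
n = 40
ax, ay = np.meshgrid(np.linspace(0, 1, n), np.linspace(0, 1, n))
a_old = np.column_stack([ax.ravel(), ay.ravel()])
x_old = a_old + 0.08 * np.sin(2 * np.pi * a_old[:, ::-1])   # a made-up flow map
q_old = q0(a_old[:, 0], a_old[:, 1])

# New, uniform particle locations.
gx, gy = np.meshgrid(np.linspace(0.1, 0.9, 25), np.linspace(0.1, 0.9, 25))
x_new = np.column_stack([gx.ravel(), gy.ravel()])

# (a) direct tracer interpolation from the old particles.
q_direct = griddata(x_old, q_old, x_new, method='linear')

# (b) interpolate the inverse flow map, then sample q0 at the pre-images.
a_new_x = griddata(x_old, a_old[:, 0], x_new, method='linear')
a_new_y = griddata(x_old, a_old[:, 1], x_new, method='linear')
q_invmap = q0(a_new_x, a_new_y)

print(np.nanmax(np.abs(q_direct - q_invmap)))   # the two options nearly agree
```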
Applied Numerical Mathematics
Implicit–Explicit (IMEX) schemes are widely used time integration methods for approximating solutions to a large class of problems. In this work, we develop accurate a posteriori error estimates of a quantity-of-interest for approximations obtained from multi-stage IMEX schemes. This is done by first defining a finite element method that is nodally equivalent to an IMEX scheme, then using typical methods for adjoint-based error estimation. The use of a nodally equivalent finite element method allows a decomposition of the error into multiple components, each describing the effect of a different portion of the method on the total error in a quantity-of-interest.
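For readers unfamiliar with the scheme class, the sketch below shows a first-order IMEX (implicit-explicit) Euler step for y' = f_explicit(y) + f_implicit(y), with the stiff term treated implicitly. The test problem and step size are illustrative only; the paper's multi-stage schemes and adjoint-based error estimates are not reproduced here.

```python
# Minimal first-order IMEX Euler step on a stiff/non-stiff splitting.
import numpy as np
from scipy.optimize import fsolve

lam = -50.0                                    # stiff linear part
f_implicit = lambda y: lam * y
f_explicit = lambda y: np.sin(y)               # non-stiff part

def imex_euler_step(y, dt):
    """One step: explicit on f_explicit, implicit (backward Euler) on f_implicit."""
    rhs = y + dt * f_explicit(y)
    # Solve y_new = rhs + dt * f_implicit(y_new) for y_new.
    residual = lambda y_new: y_new - rhs - dt * f_implicit(y_new)
    return fsolve(residual, y)[0]

y, t, dt = 1.0, 0.0, 0.05
while t < 1.0:
    y = imex_euler_step(y, dt)
    t += dt
print(t, y)
```

Because f_implicit is linear here, the implicit solve could be done in closed form; the nonlinear solver is kept to show the generic structure of the implicit stage.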
This report summarizes a NEAMS (Nuclear Energy Advanced Modeling and Simulation) project focused on integrating Dakota into the NEAMS Workbench. The NEAMS Workbench, developed at Oak Ridge National Laboratory, is a new software framework that provides a graphical user interface, input file creation, parsing, validation, job execution, workflow management, and output processing for a variety of nuclear codes. Dakota is a tool developed at Sandia National Laboratories that provides a suite of uncertainty quantification and optimization algorithms. Providing Dakota within the NEAMS Workbench allows users of nuclear simulation codes to perform uncertainty and optimization studies on their nuclear codes from within a common, integrated environment. Details of the integration and parsing are provided, along with an example of Dakota running a sampling study on the fuels performance code, BISON, from within the NEAMS Workbench.
The FY17Q3 milestone of the ECP/VTK-m project includes the completion of a VTK-m filter that computes normal vectors for surfaces. Normal vectors are those that point perpendicular to the surface and are an important direction when rendering the surface. The implementation includes the parallel algorithm itself, a filter module to simplify integrating it into other software, and documentation in the VTK-m Users' Guide. With the completion of this milestone, we are able to provide the necessary information to rendering systems for appropriate shading of surfaces. This milestone also feeds into subsequent milestones that progressively improve the approximation of surface direction.
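As a generic illustration (plain NumPy, not the VTK-m API), the sketch below shows how surface normals are typically computed for a triangulated surface: a cross product of edge vectors gives each face normal, and averaging face normals at shared vertices gives per-vertex normals suitable for shading.

```python
# Face and vertex normals for a triangle mesh.
import numpy as np

def surface_normals(points, triangles):
    """points: (n, 3) float array; triangles: (m, 3) int array of vertex ids."""
    p0 = points[triangles[:, 0]]
    p1 = points[triangles[:, 1]]
    p2 = points[triangles[:, 2]]
    face_n = np.cross(p1 - p0, p2 - p0)          # perpendicular to each face

    vert_n = np.zeros_like(points)
    for c in range(3):                            # accumulate at each corner
        np.add.at(vert_n, triangles[:, c], face_n)

    # Normalize face and vertex normals to unit length.
    face_n /= np.linalg.norm(face_n, axis=1, keepdims=True)
    vert_n /= np.linalg.norm(vert_n, axis=1, keepdims=True)
    return face_n, vert_n

# Tiny example: two triangles forming a unit square in the z = 0 plane.
pts = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]], dtype=float)
tris = np.array([[0, 1, 2], [0, 2, 3]])
print(surface_normals(pts, tris)[1])              # every vertex normal is (0, 0, 1)
```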
Scripta Materialia
Additive manufacturing offers unprecedented opportunities to design complex structures optimized for performance envelopes inaccessible under conventional manufacturing constraints. Additive processes also promote realization of engineered materials with microstructures and properties that are impossible via traditional synthesis techniques. Spurred by these capabilities, design optimization tools have experienced a recent revival. The current capabilities of additive processes and optimization tools are summarized briefly, and an emerging opportunity is discussed to achieve a holistic design paradigm whereby computational tools are integrated with stochastic process and material awareness to enable the concurrent optimization of design topologies, material constructs, and fabrication processes.
Proceedings of the International Joint Conference on Neural Networks
Considerable effort is currently being spent designing neuromorphic hardware for addressing challenging problems in a variety of pattern-matching applications. These neuromorphic systems offer low-power architectures with intrinsically parallel and simple spiking neuron processing elements. Unfortunately, these new hardware architectures have been largely developed without a clear justification for using spiking neurons to compute quantities for problems of interest. Specifically, the use of spiking for encoding information in time has not been explored theoretically with complexity analysis to examine the operating conditions under which neuromorphic computing provides a computational advantage (time, space, power, etc.). In this paper, we present and formally analyze the use of temporal coding in a neural-inspired algorithm for optimization-based computation in neural spiking architectures.
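To make the notion of computing with spike timing concrete, the sketch below shows a textbook-style example (not the specific algorithm analyzed in the paper): values are encoded as times-to-first-spike, so a unit that latches on the earliest arriving spike computes the minimum and its index in time proportional to the answer.

```python
# Time-to-first-spike encoding: earliest spike wins.
values = [7, 3, 9, 5]            # non-negative integer quantities to compare

# Encode: neuron i fires its first spike at time values[i].
spike_times = {i: v for i, v in enumerate(values)}

# Compute: sweep time forward and stop at the first spike seen.
t = 0
while True:
    firers = [i for i, ts in spike_times.items() if ts == t]
    if firers:                   # a winner-take-all unit would latch here
        break
    t += 1

print("argmin =", firers[0], "min =", t)    # argmin = 1, min = 3
```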
Proceedings of the International Joint Conference on Neural Networks
Neural machine learning methods, such as deep neural networks (DNN), have achieved remarkable success in a number of complex data processing tasks. These methods have arguably had their strongest impact on tasks such as image and audio processing - data processing domains in which humans have long held clear advantages over conventional algorithms. In contrast to biological neural systems, which are capable of learning continuously, deep artificial networks have a limited ability for incorporating new information in an already trained network. As a result, methods for continuous learning are potentially highly impactful in enabling the application of deep networks to dynamic data sets. Here, inspired by the process of adult neurogenesis in the hippocampus, we explore the potential for adding new neurons to deep layers of artificial neural networks in order to facilitate their acquisition of novel information while preserving previously trained data representations. Our results on the MNIST handwritten digit dataset and the NIST SD 19 dataset, which includes lower and upper case letters and digits, demonstrate that neurogenesis is well suited for addressing the stability-plasticity dilemma that has long challenged adaptive machine learning algorithms.
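The plain-NumPy sketch below illustrates the general "grow a trained layer" mechanism, not the paper's training procedure: new hidden units are appended so that previously learned weights are untouched, and the downstream layer gains matching columns initialized to zero so the network's outputs are preserved at the moment of growth.

```python
# Adding "newborn" neurons to a trained dense layer without disturbing it.
import numpy as np

rng = np.random.default_rng(0)

def grow_layer(W_in, W_out, n_new, scale=0.01):
    """Add n_new hidden units between two dense layers.

    W_in  : (n_hidden, n_inputs)  weights into the hidden layer
    W_out : (n_outputs, n_hidden) weights out of the hidden layer
    """
    new_in = scale * rng.normal(size=(n_new, W_in.shape[1]))   # fresh, trainable
    new_out = np.zeros((W_out.shape[0], n_new))                # zero -> outputs unchanged
    return np.vstack([W_in, new_in]), np.hstack([W_out, new_out])

# A trained 4 -> 8 -> 3 network gains 2 new hidden neurons.
W1 = rng.normal(size=(8, 4))
W2 = rng.normal(size=(3, 8))
W1g, W2g = grow_layer(W1, W2, n_new=2)

x = rng.normal(size=4)
before = W2 @ np.tanh(W1 @ x)
after = W2g @ np.tanh(W1g @ x)
print(np.allclose(before, after))     # True: growth preserves existing outputs
```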
Proceedings - 2017 IEEE 31st International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2017
We consider the problem of writing a performance-portable sparse matrix-sparse matrix multiplication (SPGEMM) kernel for many-core architectures. We approach the SPGEMM kernel from the perspectives of algorithm design, implementation, and practical usage. First, we design a hierarchical, memory-efficient SPGEMM algorithm. We then design and implement thread-scalable data structures that enable us to develop a portable SPGEMM implementation. We show that the method achieves performance portability on massively threaded architectures, namely Intel's Knights Landing processors (KNLs) and NVIDIA's Graphics Processing Units (GPUs), by comparing its performance to specialized implementations. Second, we study an important aspect of SPGEMM's usage in practice by reusing the structure of input matrices, and show speedups of up to 3× compared to the best specialized implementation on KNLs. We demonstrate that the portable method outperforms four native methods on two different GPU architectures (up to 17× speedup), and that it is highly thread scalable on KNLs, where it obtains a 101× speedup on 256 threads.
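For orientation, the plain-Python sketch below shows the classic row-wise (Gustavson) SPGEMM pattern that portable kernels of this kind build on: each row of C is accumulated in a small hash map by scattering scaled rows of B. The hierarchical teams, memory pools, and structure-reuse optimizations of the actual implementation are not modeled here.

```python
# Row-wise (Gustavson) sparse * sparse multiply on dictionary-of-dictionaries.
def spgemm_rowwise(A, B):
    """A, B: sparse matrices as {row: {col: value}} dictionaries."""
    C = {}
    for i, a_row in A.items():
        acc = {}                              # per-row accumulator ("hash map")
        for k, a_ik in a_row.items():
            for j, b_kj in B.get(k, {}).items():
                acc[j] = acc.get(j, 0.0) + a_ik * b_kj
        if acc:
            C[i] = acc
    return C

A = {0: {0: 1.0, 2: 2.0}, 1: {1: 3.0}}
B = {0: {1: 4.0}, 1: {0: 5.0}, 2: {1: -1.0, 2: 6.0}}
print(spgemm_rowwise(A, B))   # {0: {1: 2.0, 2: 12.0}, 1: {0: 15.0}}
```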
Proceedings - 2017 IEEE 31st International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2017
The in-memory graph layout affects performance of distributed-memory graph computations. Graph layout could refer to partitioning or replication of vertex and edge arrays, selective replication of data structures that hold meta-data, and reordering vertex and edge identifiers. In this work, we consider one-dimensional graph layouts, where disjoint sets of vertices and their adjacencies are partitioned among processors. Using the PuLP graph partitioning method and a breadth-first search (BFS)-based vertex ordering strategy, we empirically evaluate the impact of this graph layout on a collection of five distributed-memory graph computations. Our evaluation considers several objective metrics in addition to execution time, and we observe a considerable performance improvement over randomization.
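The sketch below shows the BFS-based reordering step in isolation: vertices are relabeled in breadth-first discovery order so that neighboring vertices tend to receive nearby identifiers, which helps locality when contiguous ID ranges are then assigned to processors in a one-dimensional layout. Partition quality (as addressed by PuLP) is a separate concern not modeled here.

```python
# BFS-based vertex reordering for a 1D graph layout.
from collections import deque

def bfs_order(adj, root=0):
    """adj: {vertex: [neighbors]}; returns an old->new identifier mapping."""
    order, seen, q = {}, {root}, deque([root])
    while q:
        v = q.popleft()
        order[v] = len(order)                 # next new identifier
        for u in adj.get(v, []):
            if u not in seen:
                seen.add(u)
                q.append(u)
    # Vertices in other components get the remaining identifiers.
    for v in adj:
        if v not in order:
            order[v] = len(order)
    return order

adj = {0: [3, 5], 3: [0, 5, 7], 5: [0, 3], 7: [3], 9: []}
print(bfs_order(adj))   # {0: 0, 3: 1, 5: 2, 7: 3, 9: 4}
```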
Proceedings - 2017 IEEE 31st International Parallel and Distributed Processing Symposium, IPDPS 2017
Energy efficiency in high performance computing (HPC) will be critical to limit operating costs and carbon footprints in future supercomputing centers. Energy efficiency of a computation can be improved by reducing time to completion without a substantial increase in power drawn, or by reducing power with only a small increase in time to completion. We present an Adaptive Core-specific Runtime (ACR) that dynamically adapts core frequencies to workload characteristics, and show examples of both reductions in power and improvement in the average performance. This improvement in energy efficiency is obtained without changes to the application. The adaptation policy embedded in the runtime uses existing core-specific power controls like software-controlled clock modulation and per-core Dynamic Voltage and Frequency Scaling (DVFS) introduced in Intel Haswell. Experiments on six standard MPI benchmarks and a real-world application show an overall 20% improvement in energy efficiency with less than 1% increase in execution time on 32 nodes (1024 cores) using per-core DVFS. An improvement in energy efficiency of up to 42% is obtained with the real-world application ParaDis through a combination of speedup and power reduction. For one configuration, ParaDis achieves an average speedup of 11% while power is lowered by about 31%. The average improvement in performance is a direct result of the reduction in run-to-run variation and of running at turbo frequencies.
2017 IEEE Int. Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems, DFT 2017
In order to provide high system resilience, it is important to understand the nature of the faults that occur in the field. This study analyzes fault rates from a production system that has been monitored for five years, capturing data for the entire operational lifetime of the system. The data show that devices in this system did not show any sign of aging during the monitoring period, suggesting that the lifetime of a system may be longer than five years. In DRAM, the relative incidence of fault modes changed insignificantly over the system's lifetime: the relative rate of each fault mode at the end of the system's lifetime was within 1.4 percentage points of the rate observed during the first year. SRAM caches in the system exhibited different fault modes, including cache-way faults and single-bit faults. Overall, this study provides insights on how fault modes and types in a system evolve over the system's lifetime.
Proceedings of the 7th International Workshop on Runtime and Operating Systems for Supercomputers, ROSS 2017 - In conjunction with HPDC
This paper describes improvements in task scheduling for the Chapel parallel programming language provided in its default on-node tasking runtime, the Qthreads library. We describe a new scheduler, distrib, which builds on the approaches of two previous Qthreads schedulers, Sherwood and Nemesis, and combines the best aspects of both (work stealing and load balancing from Sherwood, and lock-free queue access from Nemesis) to make task queuing better suited for the use of Chapel in the manycore era. We demonstrate the efficacy of this new scheduler by showing improvements in various individual benchmarks of the Chapel test suite on the Intel Knights Landing architecture.
Handbook of Uncertainty Quantification
When faced with a restrictive evaluation budget that is typical of today's high-fidelity simulation models, the effective exploitation of lower-fidelity alternatives within the uncertainty quantification (UQ) process becomes critically important. Herein, we explore the use of multifidelity modeling within UQ, for which we rigorously combine information from multiple simulation-based models within a hierarchy of fidelity, in seeking accurate high-fidelity statistics at lower computational cost. Motivated by correction functions that enable the provable convergence of a multifidelity optimization approach to an optimal high-fidelity point solution, we extend these ideas to discrepancy modeling within a stochastic domain and seek convergence of a multifidelity uncertainty quantification process to globally integrated high-fidelity statistics. For constructing stochastic models of both the low-fidelity model and the model discrepancy, we employ stochastic expansion methods (non-intrusive polynomial chaos and stochastic collocation) computed by integration/interpolation on structured sparse grids or regularized regression on unstructured grids. We seek to employ a coarsely resolved grid for the discrepancy in combination with a more finely resolved grid for the low-fidelity model. The resolutions of these grids may be defined statically or determined through uniform and adaptive refinement processes. Adaptive refinement is particularly attractive, as it has the ability to preferentially target stochastic regions where the model discrepancy becomes more complex, i.e., where the predictive capabilities of the low-fidelity model start to break down and greater reliance on the high-fidelity model (via the discrepancy) is necessary. These adaptive refinement processes can either be performed separately for the different grids or within a coordinated multifidelity algorithm. In particular, we present an adaptive greedy multifidelity approach in which we extend the generalized sparse grid concept to consider candidate index set refinements drawn from multiple sparse grids, as governed by induced changes in the statistical quantities of interest and normalized by relative computational cost. Through a series of numerical experiments using statically defined sparse grids, adaptive multifidelity sparse grids, and multifidelity compressed sensing, we demonstrate that the multifidelity UQ process converges more rapidly than a single-fidelity UQ in cases where the variance of the discrepancy is reduced relative to the variance of the high-fidelity model (resulting in reductions in initial stochastic error), where the spectrum of the expansion coefficients of the model discrepancy decays more rapidly than that of the high-fidelity model (resulting in accelerated convergence rates), and/or where the discrepancy is more sparse than the high-fidelity model (requiring the recovery of fewer significant terms).
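The sketch below reduces the additive discrepancy idea to its simplest form: a cheap low-fidelity model is sampled heavily, the smoother, lower-variance discrepancy is sampled sparingly and fit with a small surrogate, and high-fidelity statistics are estimated as low-fidelity statistics plus a correction. The models, sample counts, and plain polynomial surrogate are illustrative stand-ins for the sparse-grid expansions discussed above.

```python
# Minimal additive discrepancy correction for a mean estimate.
import numpy as np

rng = np.random.default_rng(0)
f_hi = lambda x: np.exp(0.7 * x) + 0.1 * np.sin(8 * x)   # "expensive" model
f_lo = lambda x: 1.0 + 0.7 * x + 0.245 * x**2            # cheap approximation

# Many cheap low-fidelity evaluations over the uniform(-1, 1) input.
x_lo = rng.uniform(-1, 1, 10_000)
mean_lo = f_lo(x_lo).mean()

# Few high-fidelity evaluations to build a surrogate of the discrepancy.
x_d = np.linspace(-1, 1, 9)
disc = np.polynomial.Polynomial.fit(x_d, f_hi(x_d) - f_lo(x_d), deg=4)
mean_disc = disc(x_lo).mean()            # integrate the surrogate cheaply

print("multifidelity estimate: ", mean_lo + mean_disc)
print("direct high-fidelity MC:", f_hi(rng.uniform(-1, 1, 10_000)).mean())
```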