Publications

Results 3901–4000 of 9,998

Search results

Jump to search filters

Integration of Dakota into the NEAMS Workbench

Swiler, Laura P.; Lefebvre, Robert A.; Langley, Brandon R.; Thompson, Adam B.

This report summarizes a NEAMS (Nuclear Energy Advanced Modeling and Simulation) project focused on integrating Dakota into the NEAMS Workbench. The NEAMS Workbench, developed at Oak Ridge National Laboratory, is a new software framework that provides a graphical user interface, input file creation, parsing, validation, job execution, workflow management, and output processing for a variety of nuclear codes. Dakota is a tool developed at Sandia National Laboratories that provides a suite of uncertainty quantification and optimization algorithms. Providing Dakota within the NEAMS Workbench allows users of nuclear simulation codes to perform uncertainty and optimization studies on their nuclear codes from within a common, integrated environment. Details of the integration and parsing are provided, along with an example of Dakota running a sampling study on the fuels performance code, BISON, from within the NEAMS Workbench.

More Details

Milestone Completion Report WBS 1.3.5.05 ECP/VTK-m FY17Q3 [MS-17/02] Faceted Surface Normals STDA05-3

Moreland, Kenneth D.

The FY17Q3 milestone of the ECP/VTK-m project includes the completion of a VTK-m filter that computes normal vectors for surfaces. Normal vectors are those that point perpendicular to the surface and are an important direction when rendering the surface. The implementation includes the parallel algorithm itself, a filter module to simplify integrating it into other software, and documentation in the VTK-m Users’ Guide. With the completion of this milestone, we are able to necessary information to rendering systems to provide appropriate shading of surfaces. This milestone also feeds into subsequent milestones that progressively improve the approximation of surface direction.

More Details

Additive manufacturing: Toward holistic design

Scripta Materialia

Jared, Bradley H.; Aguilo Valentin, Miguel A.; Beghini, Lauren L.; Boyce, Brad B.; Clark, Brett W.; Cook, Adam W.; Kaehr, Bryan J.; Robbins, Joshua R.

Additive manufacturing offers unprecedented opportunities to design complex structures optimized for performance envelopes inaccessible under conventional manufacturing constraints. Additive processes also promote realization of engineered materials with microstructures and properties that are impossible via traditional synthesis techniques. Enthused by these capabilities, optimization design tools have experienced a recent revival. The current capabilities of additive processes and optimization tools are summarized briefly, while an emerging opportunity is discussed to achieve a holistic design paradigm whereby computational tools are integrated with stochastic process and material awareness to enable the concurrent optimization of design topologies, material constructs and fabrication processes.

More Details

Optimization-based computation with spiking neurons

Proceedings of the International Joint Conference on Neural Networks

Verzi, Stephen J.; Vineyard, Craig M.; Vugrin, Eric D.; Sahakian, Meghan A.; James, Conrad D.; Aimone, James B.

Considerable effort is currently being spent designing neuromorphic hardware for addressing challenging problems in a variety of pattern-matching applications. These neuromorphic systems offer low power architectures with intrinsically parallel and simple spiking neuron processing elements. Unfortunately, these new hardware architectures have been largely developed without a clear justification for using spiking neurons to compute quantities for problems of interest. Specifically, the use of spiking for encoding information in time has not been explored theoretically with complexity analysis to examine the operating conditions under which neuromorphic computing provides a computational advantage (time, space, power, etc.) In this paper, we present and formally analyze the use of temporal coding in a neural-inspired algorithm for optimization-based computation in neural spiking architectures.

More Details

Neurogenesis deep learning: Extending deep networks to accommodate new classes

Proceedings of the International Joint Conference on Neural Networks

Draelos, Timothy J.; Miner, Nadine E.; Lamb, Christopher L.; Cox, Jonathan A.; Vineyard, Craig M.; Carlson, Kristofor D.; Severa, William M.; James, Conrad D.; Aimone, James B.

Neural machine learning methods, such as deep neural networks (DNN), have achieved remarkable success in a number of complex data processing tasks. These methods have arguably had their strongest impact on tasks such as image and audio processing - data processing domains in which humans have long held clear advantages over conventional algorithms. In contrast to biological neural systems, which are capable of learning continuously, deep artificial networks have a limited ability for incorporating new information in an already trained network. As a result, methods for continuous learning are potentially highly impactful in enabling the application of deep networks to dynamic data sets. Here, inspired by the process of adult neurogenesis in the hippocampus, we explore the potential for adding new neurons to deep layers of artificial neural networks in order to facilitate their acquisition of novel information while preserving previously trained data representations. Our results on the MNIST handwritten digit dataset and the NIST SD 19 dataset, which includes lower and upper case letters and digits, demonstrate that neurogenesis is well suited for addressing the stability-plasticity dilemma that has long challenged adaptive machine learning algorithms.

More Details

Performance-portable sparse matrix-matrix multiplication for many-core architectures

Proceedings - 2017 IEEE 31st International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2017

Deveci, Mehmet D.; Trott, Christian R.; Rajamanickam, Sivasankaran R.

We consider the problem of writing performance portablesparse matrix-sparse matrix multiplication (SPGEMM) kernelfor many-core architectures. We approach the SPGEMMkernel from the perspectives of algorithm design and implementation, and its practical usage. First, we design ahierarchical, memory-efficient SPGEMM algorithm. We thendesign and implement thread scalable data structures thatenable us to develop a portable SPGEMM implementation. We show that the method achieves performance portabilityon massively threaded architectures, namely Intel's KnightsLanding processors (KNLs) and NVIDIA's Graphic ProcessingUnits (GPUs), by comparing its performance to specializedimplementations. Second, we study an important aspectof SPGEMM's usage in practice by reusing the structure ofinput matrices, and show speedups up to 3× compared to thebest specialized implementation on KNLs. We demonstratethat the portable method outperforms 4 native methods on2 different GPU architectures (up to 17× speedup), and it ishighly thread scalable on KNLs, in which it obtains 101× speedup on 256 threads.

More Details

Order or shuffle: Empirically evaluating vertex order impact on parallel graph computations

Proceedings - 2017 IEEE 31st International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2017

Slota, George M.; Rajamanickam, Sivasankaran R.; Madduri, Kamesh

The in-memory graph layout affects performance of distributed-memory graph computations. Graph layout could refer to partitioning or replication of vertex and edge arrays, selective replication of data structures that hold meta-data, and reordering vertex and edge identifiers. In this work, we consider one-dimensional graph layouts, where disjoint sets of vertices and their adjacencies are partitioned among processors. Using the PuLP graph partitioning method and a breadth-first search (BFS)-based vertex ordering strategy, we empirically evaluate the impact of this graph layout on a collection of five distributed-memory graph computations. Our evaluation considers several objective metrics in addition to execution time, and we observe a considerable performance improvement over randomization.

More Details

An Adaptive Core-Specific Runtime for Energy Efficiency

Proceedings - 2017 IEEE 31st International Parallel and Distributed Processing Symposium, IPDPS 2017

Bhalachandra, Sridutt; Porterfield, Allan; Olivier, Stephen L.; Prins, Jan F.

Energy efficiency in high performance computing (HPC) will be critical to limit operating costs and carbon footprints in future supercomputing centers. Energy efficiency of a computation can be improved by reducing time to completion without a substantial increase in power drawn or by reducing power with a little increase in time to completion. We present an Adaptive Core-specific Runtime (ACR) that dynamically adapts core frequencies to workload characteristics, and show examples of both reductions in power and improvement in the average performance. This improvement in energy efficiency is obtained without changes to the application. The adaptation policy embedded in the runtime uses existing core-specific power controls like software-controlled clock modulation and per-core Dynamic Voltage Frequency Scaling (DVFS) introduced in Intel Haswell. Experiments on six standard MPI benchmarks and a real world application show an overall 20% improvement in energy efficiency with less than 1% increase in execution time on 32 nodes (1024 cores) using per-core DVFS. An improvement in energy efficiency of up to 42% is obtained with the real world application ParaDis through a combination of speedup and power reduction. For one configuration, ParaDis achieves an average speedup of 11%, while the power is lowered by about 31%. The average improvement in the performance seen is a direct result of the reduction in run-to-run variation and running at turbo frequencies.

More Details

Lifetime memory reliability data from the field

2017 IEEE Int. Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems, DFT 2017

Siddiqua, Taniya; Sridharan, Vilas; Raasch, Steven E.; Debardeleben, Nathan; Ferreira, Kurt B.; Levy, Scott L.; Baseman, Elisabeth; Guan, Qiang

In order to provide high system resilience, it is important to understand the nature of the faults that occur in the field. This study analyzes fault rates from a production system that has been monitored for five years, capturing data for the entire operational lifetime of the system. The data show that devices in this system did not show any sign of aging during the monitoring period, suggesting that the lifetime of a system may be longer than five years. In DRAM, the relative incidence of fault modes changed insignificantly over the system's lifetime: The relative rate of each fault mode at the end of the system's lifetime was within 1.4 percentage point of the rate observed during the first year. SRAM caches in the system exhibited different fault modes including cache-way fault and single-bit faults. Overall, this study provides insights on how fault modes and types in a system evolve over the system's lifetime.

More Details

Scheduling Chapel tasks with Qthreads on manycore: A tale of two schedulers

Proceedings of the 7th International Workshop on Runtime and Operating Systems for Supercomputers, ROSS 2017 - In conjunction with HPDC

Evans, Noah; Olivier, Stephen L.; Barrett, Richard F.; Stelle, George

This paper describes improvements in task scheduling for the Chapel parallel programming language provided in its default on-node tasking runtime, the Qthreads library. We describe a new scheduler distrib which builds on the approaches of two previous Qthreads schedulers, Sherwood and Nemesis, and combines the best aspects of both-work stealing and load balancing from Sherwood and a lock free queue access from Nemesis- to make task queuing better suited for the use of Chapel in the manycore era. We demonstrate the efficacy of this new scheduler by showing improvements in various individual benchmarks of the Chapel test suite on the Intel Knights Landing architecture.

More Details

Multifidelity uncertainty quantification using spectral stochastic discrepancy models

Handbook of Uncertainty Quantification

Eldred, Michael S.; Ng, Leo W.T.; Barone, Matthew F.; Domino, Stefan P.

When faced with a restrictive evaluation budget that is typical of today's highfidelity simulation models, the effective exploitation of lower-fidelity alternatives within the uncertainty quantification (UQ) process becomes critically important. Herein, we explore the use of multifidelity modeling within UQ, for which we rigorously combine information from multiple simulation-based models within a hierarchy of fidelity, in seeking accurate high-fidelity statistics at lower computational cost. Motivated by correction functions that enable the provable convergence of a multifidelity optimization approach to an optimal high-fidelity point solution, we extend these ideas to discrepancy modeling within a stochastic domain and seek convergence of a multifidelity uncertainty quantification process to globally integrated high-fidelity statistics. For constructing stochastic models of both the low-fidelity model and the model discrepancy, we employ stochastic expansion methods (non-intrusive polynomial chaos and stochastic collocation) computed by integration/interpolation on structured sparse grids or regularized regression on unstructured grids. We seek to employ a coarsely resolved grid for the discrepancy in combination with a more finely resolved Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the US Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000. Grid for the low-fidelity model. The resolutions of these grids may be defined statically or determined through uniform and adaptive refinement processes. Adaptive refinement is particularly attractive, as it has the ability to preferentially target stochastic regions where the model discrepancy becomes more complex, i.e., where the predictive capabilities of the low-fidelity model start to break down and greater reliance on the high-fidelity model (via the discrepancy) is necessary. These adaptive refinement processes can either be performed separately for the different grids or within a coordinated multifidelity algorithm. In particular, we present an adaptive greedy multifidelity approach in which we extend the generalized sparse grid concept to consider candidate index set refinements drawn from multiple sparse grids, as governed by induced changes in the statistical quantities of interest and normalized by relative computational cost. Through a series of numerical experiments using statically defined sparse grids, adaptive multifidelity sparse grids, and multifidelity compressed sensing, we demonstrate that the multifidelity UQ process converges more rapidly than a single-fidelity UQ in cases where the variance of the discrepancy is reduced relative to the variance of the high-fidelity model (resulting in reductions in initial stochastic error), where the spectrum of the expansion coefficients of the model discrepancy decays more rapidly than that of the high-fidelity model (resulting in accelerated convergence rates), and/or where the discrepancy is more sparse than the high-fidelity model (requiring the recovery of fewer significant terms).

More Details
Results 3901–4000 of 9,998
Results 3901–4000 of 9,998