Publications

Results 6301–6350 of 9,998

Uncertainty quantification methods for model calibration, validation, and risk analysis

16th AIAA Non-Deterministic Approaches Conference

Sargsyan, Khachik S.; Najm, H.N.; Chowdhary, Kamaljit S.; Debusschere, Bert D.; Swiler, Laura P.; Eldred, Michael S.

In this paper we propose a series of methodologies to address the problems in the NASA Langley Multidisciplinary UQ Challenge. A Bayesian approach is employed to characterize and calibrate the epistemic parameters in problem A, while variance-based global sensitivity analysis is proposed for problem B. For problems C and D we propose nested sampling methods for mixed aleatory-epistemic UQ.
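
The nested-sampling idea for mixed aleatory-epistemic UQ can be sketched as an outer loop over epistemic parameter values with an inner Monte Carlo loop over aleatory variables. The toy model, distributions, and sample counts below are illustrative assumptions, not the challenge problems themselves:

```python
import random

def nested_uq(model, epistemic_samples, n_aleatory=2000, threshold=1.0, seed=0):
    """Outer loop over epistemic (lack-of-knowledge) parameter values; inner
    Monte Carlo loop over aleatory (irreducible) variability. Returns the
    interval of failure probabilities induced by the epistemic uncertainty."""
    rng = random.Random(seed)
    probs = []
    for theta in epistemic_samples:
        failures = 0
        for _ in range(n_aleatory):
            x = rng.gauss(0.0, 1.0)           # aleatory input
            if model(theta, x) > threshold:   # failure indicator
                failures += 1
        probs.append(failures / n_aleatory)
    return min(probs), max(probs)

# Toy response: output increases with both the epistemic shift and the input.
lo, hi = nested_uq(lambda theta, x: theta + 0.5 * x, [0.0, 0.5, 1.0])
```

Because the epistemic parameter is only known to lie in a set, the result is an interval of probabilities rather than a single number.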

Multilevel summation methods for efficient evaluation of long-range pairwise interactions in atomistic and coarse-grained molecular simulation

Bond, Stephen D.

The availability of efficient algorithms for long-range pairwise interactions is central to the success of numerous applications, ranging in scale from atomic-level modeling of materials to astrophysics. This report focuses on the implementation and analysis of the multilevel summation method for approximating long-range pairwise interactions. The computational cost of the multilevel summation method is proportional to the number of particles, N, which is an improvement over FFT-based methods whose cost is asymptotically proportional to N logN. In addition to approximating electrostatic forces, the multilevel summation method can be used to efficiently approximate convolutions with long-range kernels. As an application, we apply the multilevel summation method to a discretized integral equation formulation of the regularized generalized Poisson equation. Numerical results are presented using an implementation of the multilevel summation method in the LAMMPS software package. Preliminary results show that the computational cost of the method scales as expected, but there is still a need for further optimization.
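
A core step in multilevel summation is splitting the pairwise kernel into a short-range part that vanishes beyond a cutoff (summed directly with neighbor lists) and a smooth long-range remainder (handled on coarser grids). The even-polynomial softening below is one common choice, shown only to illustrate the split; it is not taken from this report:

```python
def smoothed_coulomb(rho):
    """Even-polynomial softening of 1/rho on [0, 1]; matches 1/rho in value
    and slope at rho = 1, and stays smooth and bounded at rho = 0."""
    return (15.0 - 10.0 * rho**2 + 3.0 * rho**4) / 8.0 if rho <= 1.0 else 1.0 / rho

def split_kernel(r, cutoff):
    """Split 1/r into short-range + long-range parts. The short-range part is
    exactly zero for r >= cutoff, so it can be summed with neighbor lists."""
    long_range = smoothed_coulomb(r / cutoff) / cutoff
    short_range = 1.0 / r - long_range
    return short_range, long_range
```

The smooth remainder is what the method transfers between grid levels; applying the same split recursively at growing cutoffs yields the O(N) hierarchy.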

Thermal hydraulic simulations, error estimation and parameter sensitivity studies in Drekar::CFD

Shadid, John N.; Pawlowski, Roger P.; Cyr, Eric C.; Wildey, Timothy M.

This report describes work directed towards completion of the Thermal Hydraulics Methods (THM) CFD Level 3 Milestone THM.CFD.P7.05 for the Consortium for Advanced Simulation of Light Water Reactors (CASL) Nuclear Hub effort. The focus of this milestone was to demonstrate the thermal hydraulics and adjoint-based error estimation and parameter sensitivity capabilities in the CFD code called Drekar::CFD. This milestone builds upon the capabilities demonstrated in three earlier milestones: THM.CFD.P4.02 [12], completed March 31, 2012; THM.CFD.P5.01 [15], completed June 30, 2012; and THM.CFD.P5.01 [11], completed October 31, 2012.

Hybrid methods for cybersecurity analysis

Davis, Warren L.; Dunlavy, Daniel D.

Early 2010 saw a significant change in adversarial techniques aimed at network intrusion: a shift from malware delivered via email attachments toward the use of hidden, embedded hyperlinks to initiate sequences of downloads and interactions with web sites and network servers containing malicious software. Enterprise security groups were well poised and experienced in defending against the former attacks, but the new types of attacks were larger in number, more challenging to detect, dynamic in nature, and required the development of new technologies and analytic capabilities. The Hybrid LDRD project was aimed at delivering new capabilities in large-scale data modeling and analysis to enterprise security operators and analysts and understanding the challenges of detection and prevention of emerging cybersecurity threats. Leveraging previous LDRD research efforts and capabilities in large-scale relational data analysis, large-scale discrete data analysis and visualization, and streaming data analysis, new modeling and analysis capabilities were quickly brought to bear on the problems in email phishing and spear phishing attacks in the Sandia enterprise security operational groups at the onset of the Hybrid project. As part of this project, a software development and deployment framework was created within the security analyst workflow tool sets to facilitate the delivery and testing of new capabilities as they became available, and machine learning algorithms were developed to address the challenge of dynamic threats. Furthermore, researchers from the Hybrid project were embedded in the security analyst groups for almost a full year, engaged in daily operational activities and routines, creating an atmosphere of trust and collaboration between the researchers and security personnel.
The Hybrid project has altered the way that research ideas can be incorporated into the production environments of Sandia's enterprise security groups, reducing time to deployment from months and years to hours and days for the application of new modeling and analysis capabilities to emerging threats. The development and deployment framework has been generalized into the Hybrid Framework and incorporated into several LDRD, WFO, and DOE/CSL projects and proposals. Most importantly, the Hybrid project has provided Sandia security analysts with new, scalable, extensible analytic capabilities that have resulted in alerts not detectable using their previous workflow tool sets.

Investigation of ALEGRA shock hydrocode algorithms using an exact free surface jet flow solution

Robinson, Allen C.

Computational testing of the arbitrary Lagrangian-Eulerian shock physics code, ALEGRA, is presented using an exact solution that is very similar to a shaped charge jet flow. The solution is a steady, isentropic, subsonic free surface flow with significant compression and release and is provided as a steady state initial condition. There should be no shocks and no entropy production throughout the problem. The purpose of this test problem is to present a detailed and challenging computation in order to provide evidence for algorithmic strengths and weaknesses in ALEGRA which should be examined further. The results of this work are intended to be used to guide future algorithmic improvements in the spirit of test-driven development processes.

Xyce parallel electronic simulator users' guide, Version 6.0.1

Keiter, Eric R.; Warrender, Christina E.; Mei, Ting M.; Russo, Thomas V.; Schiek, Richard S.; Thornquist, Heidi K.; Verley, Jason V.; Coffey, Todd S.; Pawlowski, Roger P.

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state of the art in the following areas: the capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors), including support for most popular parallel and serial computers; a differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms and allows one to develop new types of analysis without requiring the implementation of analysis-specific device models; device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase: a message-passing parallel implementation that allows it to run efficiently on a wide range of computing platforms, including serial, shared-memory, and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.

Xyce parallel electronic simulator reference guide, Version 6.0.1

Keiter, Eric R.; Mei, Ting M.; Russo, Thomas V.; Pawlowski, Roger P.; Schiek, Richard S.; Coffey, Todd S.; Thornquist, Heidi K.; Verley, Jason V.; Warrender, Christina E.

This document is a reference guide to the Xyce Parallel Electronic Simulator and is a companion to the Xyce Users' Guide [1]. The focus of this document is to list, as exhaustively as possible, the device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial; users who are new to circuit simulation are better served by the Xyce Users' Guide [1].

Using simulation to evaluate the performance of resilience strategies and process failures

Levy, Scott L.; Ferreira, Kurt; Widener, Patrick W.

Fault-tolerance has been identified as a major challenge for future extreme-scale systems. Current predictions suggest that, as systems grow in size, failures will occur more frequently. Because increases in failure frequency reduce the performance and scalability of these systems, significant effort has been devoted to developing and refining resilience mechanisms to mitigate the impact of failures. However, effective evaluation of these mechanisms has been challenging. Current systems are smaller and have significantly different architectural features (e.g., interconnect, persistent storage) than we expect to see in next-generation systems. To overcome these challenges, we propose the use of simulation. Simulation has been shown to be an effective tool for investigating performance characteristics of applications on future systems. In this work, we: identify the set of system characteristics that are necessary for accurate performance prediction of resilience mechanisms for HPC systems and applications; demonstrate how these system characteristics can be incorporated into an existing large-scale simulator; and evaluate the predictive performance of our modified simulator. We also describe how we were able to optimize the simulator for large temporal and spatial scales, allowing the simulator to run 4x faster and use over 100x less memory.

SNAP: Strong scaling high fidelity molecular dynamics simulations on leadership-class computing platforms

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Trott, Christian R.; Hammond, Simon D.; Thompson, Aidan P.

The rapidly improving compute capability of contemporary processors and accelerators is providing the opportunity for significant increases in the accuracy and fidelity of scientific calculations. In this paper we present performance studies of a new molecular dynamics (MD) potential called SNAP. The SNAP potential has shown great promise in accurately reproducing physics and chemistry not described by simpler potentials. We have developed new algorithms to exploit high single-node concurrency provided by three different classes of machine: the Titan GPU-based system operated by Oak Ridge National Laboratory, the combined Sequoia and Vulcan BlueGene/Q machines located at Lawrence Livermore National Laboratory, and the large-scale Intel Sandy Bridge system, Chama, located at Sandia. Our analysis focuses on strong scaling experiments with approximately 246,000 atoms over the range 1-122,880 nodes on Sequoia/Vulcan and 40-18,630 nodes on Titan. We compare these machines in terms of both simulation rate and power efficiency. We find that node performance correlates with power consumption across the range of machines, except for the case of extreme strong scaling, where more powerful compute nodes show greater efficiency. This study is a unique assessment of a challenging, scientifically relevant calculation running on several of the world's leading contemporary production supercomputing platforms. © 2014 Springer International Publishing.

Simulation of workflow and threat characteristics for cyber security incident response teams

Proceedings of the Human Factors and Ergonomics Society

Reed, Theodore M.; Abbott, Robert G.; Anderson, Benjamin R.; Nauer, Kevin S.

Within large organizations, the defense of cyber assets generally involves the use of various mechanisms, such as intrusion detection systems, to alert cyber security personnel to suspicious network activity. Resulting alerts are reviewed by the organization's cyber security personnel to investigate and assess the threat and initiate appropriate actions to defend the organization's network assets. While automated software routines are essential to cope with the massive volumes of data transmitted across data networks, the ultimate success of an organization's efforts to resist adversarial attacks upon their cyber assets relies on the effectiveness of individuals and teams. This paper reports research to understand the factors that impact the effectiveness of Cyber Security Incident Response Teams (CSIRTs). Specifically, a simulation is described that captures the workflow within a CSIRT. The simulation is then demonstrated in a study comparing the differential response time to threats that vary with respect to key characteristics (attack trajectory, targeted asset, and perpetrator). It is shown that the results of the simulation correlate with data from the actual incident response times of a professional CSIRT.
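
The paper's simulation itself is not reproduced here, but the core of a CSIRT workflow model is a discrete-event queue: alerts arrive over time, analysts triage them in order, and response time per threat type is the output metric. A minimal sketch with made-up threat types and triage times:

```python
def simulate_csirt(alerts, n_analysts=1):
    """alerts: (arrival_time, threat_type, triage_time) tuples. Alerts are
    handled first-come-first-served by the earliest-free analyst; returns the
    mean response time (wait + triage) per threat type."""
    free_at = [0.0] * n_analysts           # time each analyst becomes available
    totals, counts = {}, {}
    for arrival, kind, work in sorted(alerts):
        i = min(range(n_analysts), key=lambda j: free_at[j])
        start = max(arrival, free_at[i])   # wait if all analysts are busy
        free_at[i] = start + work
        response = free_at[i] - arrival
        totals[kind] = totals.get(kind, 0.0) + response
        counts[kind] = counts.get(kind, 0) + 1
    return {k: totals[k] / counts[k] for k in totals}

# One analyst; three alerts given as (arrival, threat type, triage effort).
means = simulate_csirt([(0.0, "phish", 2.0), (1.0, "malware", 5.0), (2.0, "phish", 2.0)])
```

Varying the alert mix, arrival rates, and staffing in such a model is what lets response times be compared across threat characteristics.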

Early Experiences Co-Scheduling Work and Communication Tasks for Hybrid MPI+X Applications

Proceedings of ExaMPI 2014: Exascale MPI 2014 - held in conjunction with SC 2014: The International Conference for High Performance Computing, Networking, Storage and Analysis

Stark, Dylan S.; Barrett, Richard F.; Grant, Ryan E.; Olivier, Stephen L.; Laros, James H.; Vaughan, Courtenay T.

Advances in node-level architecture and interconnect technology needed to reach extreme scale necessitate a reevaluation of long-standing models of computation, in particular bulk synchronous processing. The end of Dennard scaling and subsequent increases in CPU core counts with each successive generation of general-purpose processors have made the ability to leverage parallelism for communication an increasingly critical aspect of future extreme-scale application performance. But the use of massive multithreading in combination with MPI is an open research area, with many proposed approaches requiring code changes that can be infeasible for important large legacy applications already written in MPI. This paper covers the design and initial evaluation of an extension of a massive multithreading runtime system supporting dynamic parallelism to interface with MPI to handle fine-grain parallel communication and communication-computation overlap. Our initial evaluation of the approach uses the ubiquitous stencil computation, in three dimensions, with the halo exchange as the driving example that has a demonstrated tie to real code bases. The preliminary results suggest that even for a very well-studied and balanced workload and message exchange pattern, co-scheduling work and communication tasks is effective at significant levels of decomposition using up to 131,072 cores. Furthermore, we demonstrate useful communication-computation overlap when handling blocking send and receive calls, and show evidence suggesting that we can decrease the burstiness of network traffic, with a corresponding decrease in the rate of stalls (congestion) seen on the host link and network.

Enhancing least-squares finite element methods through a quantity-of-interest

SIAM Journal on Numerical Analysis

Cyr, Eric C.; Chaudhry, Jehanzeb H.; Liu, Kuo; Manteuffel, Thomas A.; Olson, Luke N.; Tang, Lei

In this paper we introduce an approach that augments least-squares finite element formulations with user-specified quantities-of-interest. The method incorporates the quantity-of-interest into the least-squares functional and inherits the global approximation properties of the standard formulation as well as increased resolution of the quantity-of-interest. We establish theoretical properties such as optimality and enhanced convergence under a set of general assumptions. Central to the approach is that it offers an element-level estimate of the error in the quantity-of-interest. As a result, we introduce an adaptive approach that yields efficient, adaptively refined approximations. Several numerical experiments for a range of situations are presented to support the theory and highlight the effectiveness of our methodology. Notably, the results show that the new approach is effective at improving the accuracy per total computational cost.
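
Schematically, such an augmentation can take the form below (our notation, shown only to fix ideas; the paper's exact functional and weighting may differ): a weighted quantity-of-interest term is added to the standard least-squares residual,

```latex
J_w(u) \;=\; \underbrace{\|\mathcal{L}u - f\|^2}_{\text{standard LS functional}}
\;+\; w\,\underbrace{\bigl(Q(u) - q\bigr)^2}_{\text{QoI augmentation}},
```

where $Q$ denotes the user-specified quantity-of-interest functional, $q$ a reference value for it, and $w$ a weight trading global accuracy against resolution of the quantity-of-interest.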

Development, characterization, and modeling of a TaOx ReRAM for a neuromorphic accelerator

ECS Transactions

Marinella, Matthew J.; Mickel, Patrick R.; Lohn, Andrew L.; Hughart, David R.; Bondi, Robert J.; Mamaluy, Denis M.; Hjalmarson, Harold P.; Stevens, James E.; Decker, Seth D.; Apodaca, Roger A.; Evans, Brian R.; Aimone, James B.; Rothganger, Fredrick R.; James, Conrad D.; DeBenedictis, Erik

Resistive random access memory (ReRAM), or memristors, may be capable of significantly improving the efficiency of neuromorphic computing when used as a central component of an analog hardware accelerator. However, the significant electrical variation within a device and between devices degrades the maximum efficiency and accuracy that can be achieved by a ReRAM-based neuromorphic accelerator. In this report, the electrical variability is characterized, with a particular focus on that which is due to fundamental, intrinsic factors. Analytical and ab initio models are presented which offer some insight into the factors responsible for this variability.


PuLP: Scalable multi-objective multi-constraint partitioning for small-world networks

Proceedings - 2014 IEEE International Conference on Big Data, IEEE Big Data 2014

Slota, George M.; Madduri, Kamesh; Rajamanickam, Sivasankaran R.

We present PuLP, a parallel and memory-efficient graph partitioning method specifically designed to partition low-diameter networks with skewed degree distributions. Graph partitioning is an important Big Data problem because it impacts the execution time and energy efficiency of graph analytics on distributed-memory platforms. Partitioning determines the in-memory layout of a graph, which affects locality, intertask load balance, communication time, and overall memory utilization of graph analytics. A novel feature of our method PuLP (Partitioning using Label Propagation) is that it optimizes for multiple objective metrics simultaneously, while satisfying multiple partitioning constraints. Using our method, we are able to partition a web crawl with billions of edges on a single compute server in under a minute. For a collection of test graphs, we show that PuLP uses 8-39× less memory than state-of-the-art partitioners and is up to 14.5× faster, on average, than alternate approaches (with 16-way parallelism). We also achieve better partitioning quality results for the multi-objective scenario.
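
The label-propagation kernel at the heart of a PuLP-style partitioner can be sketched in a few lines; the round-robin seeding, tie-breaking, and the single size-cap constraint below are simplifying assumptions (the actual method balances multiple objectives and constraints simultaneously):

```python
def pulp_style_partition(adj, n_parts=2, max_iters=10, cap_factor=1.5):
    """adj: {vertex: [neighbors]}. Seed labels round-robin, then iterate
    constrained label propagation: each vertex adopts its neighborhood's most
    common label unless the target part would exceed its size cap."""
    verts = sorted(adj)
    label = {v: i % n_parts for i, v in enumerate(verts)}
    cap = cap_factor * len(verts) / n_parts        # balance constraint
    size = {p: 0 for p in range(n_parts)}
    for v in verts:
        size[label[v]] += 1
    for _ in range(max_iters):
        moved = False
        for v in verts:
            votes = {}
            for u in adj[v]:
                votes[label[u]] = votes.get(label[u], 0) + 1
            if not votes:
                continue
            best = max(votes, key=lambda p: (votes[p], -p))  # deterministic ties
            if best != label[v] and size[best] + 1 <= cap:
                size[label[v]] -= 1
                size[best] += 1
                label[v] = best
                moved = True
        if not moved:
            break
    return label

# Two triangles joined by a single edge; the natural 2-way partition cuts it.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
label = pulp_style_partition(adj, n_parts=2)
```

Because each sweep touches only vertices and their edges, the memory footprint stays proportional to the graph itself, which is the property that lets this family of methods scale to billion-edge inputs.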

Exploiting geometric partitioning in task mapping for parallel computers

Proceedings of the International Parallel and Distributed Processing Symposium, IPDPS

Deveci, Mehmet; Rajamanickam, Sivasankaran R.; Leung, Vitus J.; Pedretti, Kevin P.; Olivier, Stephen L.; Bunde, David P.; Catalyurek, Umit V.; Devine, Karen D.

We present a new method for mapping applications' MPI tasks to cores of a parallel computer such that communication and execution time are reduced. We consider the case of sparse node allocation within a parallel machine, where the nodes assigned to a job are not necessarily located within a contiguous block nor within close proximity to each other in the network. The goal is to assign tasks to cores so that interdependent tasks are performed by 'nearby' cores, thus lowering the distance messages must travel, the amount of congestion in the network, and the overall cost of communication. Our new method applies a geometric partitioning algorithm to both the tasks and the processors, and assigns task parts to the corresponding processor parts. We show that, for the structured finite difference mini-app MiniGhost, our mapping method reduced execution time 34% on average on 65,536 cores of a Cray XE6. In a molecular dynamics mini-app, MiniMD, our mapping method reduced communication time by 26% on average on 6144 cores. We also compare our mapping with graph-based mappings from the LibTopoMap library and show that our mappings reduced the communication time on average by 15% in MiniGhost and 10% in MiniMD. © 2014 IEEE.
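
The pairing idea can be sketched with recursive coordinate bisection applied to both sets; the 2-D coordinates, the longest-axis split rule, and the assumption of equal task and core counts are simplifications for illustration, not the paper's algorithm verbatim:

```python
def geometric_map(tasks, cores):
    """tasks and cores are equal-length lists of (id, (x, y)) pairs. Recursively
    bisect both sets along the axis of greatest task extent and pair the halves,
    so interdependent (geometrically nearby) tasks land on nearby cores."""
    if len(tasks) == 1:
        return {tasks[0][0]: cores[0][0]}
    pts = [p for _, p in tasks]
    x_extent = max(p[0] for p in pts) - min(p[0] for p in pts)
    y_extent = max(p[1] for p in pts) - min(p[1] for p in pts)
    axis = 0 if x_extent >= y_extent else 1
    t = sorted(tasks, key=lambda item: item[1][axis])
    c = sorted(cores, key=lambda item: item[1][axis])
    half = len(t) // 2
    mapping = geometric_map(t[:half], c[:half])      # pair left halves
    mapping.update(geometric_map(t[half:], c[half:]))  # pair right halves
    return mapping

# Tasks and cores at the four unit-square corners: each task should map to the
# core at the same corner.
corners = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
tasks = [(i, p) for i, p in enumerate(corners)]
cores = [("core%d" % i, p) for i, p in enumerate(corners)]
mapping = geometric_map(tasks, cores)
```

Because both sets are cut with the same splitter, spatial locality among tasks translates directly into network locality among the assigned cores.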

Gaussian process adaptive importance sampling

International Journal for Uncertainty Quantification

Dalbey, Keith D.; Swiler, Laura P.

The objective is to calculate the probability, PF, that a device will fail when its inputs, x, are randomly distributed with probability density, p (x), e.g., the probability that a device will fracture when subject to varying loads. Here failure is defined as some scalar function, y (x), exceeding a threshold, T. If evaluating y (x) via physical or numerical experiments is sufficiently expensive or PF is sufficiently small, then Monte Carlo (MC) methods to estimate PF will be infeasible due to the large number of function evaluations required for a specified accuracy. Importance sampling (IS), i.e., preferentially sampling from “important” regions in the input space and appropriately down-weighting to obtain an unbiased estimate, is one approach to assess PF more efficiently. The inputs are sampled from an importance density, pʹ (x). We present an adaptive importance sampling (AIS) approach which endeavors to adaptively improve the estimate of the ideal importance density, p* (x), during the sampling process. Our approach uses a mixture of component probability densities that each approximate p* (x). An iterative process is used to construct the sequence of improving component probability densities. At each iteration, a Gaussian process (GP) surrogate is used to help identify areas in the space where failure is likely to occur. The GPs are not used to directly calculate the failure probability; they are only used to approximate the importance density. Thus, our Gaussian process adaptive importance sampling (GPAIS) algorithm overcomes the limitations of using a potentially inaccurate surrogate model directly in IS calculations. This robust GPAIS algorithm performs surprisingly well on a pathological test function.
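
The re-weighting that keeps importance sampling unbiased is easy to see in a stripped-down example: a standard normal input with a shifted-normal importance density. The GP-built mixture density of the paper is replaced here by a fixed importance density purely for illustration:

```python
import math
import random

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def importance_sample_pf(threshold, n=20000, shift=2.0, seed=0):
    """Estimate PF = P[x > threshold] for x ~ N(0, 1) by sampling from the
    shifted importance density N(shift, 1) and re-weighting each failed
    sample by p(x) / p'(x), which keeps the estimate unbiased."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(shift, 1.0)                           # draw from p'(x)
        if x > threshold:                                   # failure indicator
            total += normal_pdf(x) / normal_pdf(x, mu=shift)
    return total / n

pf = importance_sample_pf(2.0)   # exact value is 1 - Phi(2), about 0.0228
```

Centering the importance density on the failure region is exactly the role the GP surrogate plays in GPAIS: it suggests where to put the mass of pʹ (x) without ever replacing y (x) in the estimate itself.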

Spacecraft state-of-health (SOH) analysis via data mining

13th International Conference on Space Operations, SpaceOps 2014

Lindsay, Stephen R.; Woodbridge, Diane W.

Spacecraft state-of-health (SOH) analysis typically consists of limit-checking to compare incoming measurand values against their predetermined limits. While useful, this approach requires significant engineering insight along with the ability to evolve limit values over time as components degrade and their operating environment changes. In addition, it fails to take into account the effects of measurand combinations, as multiple values together could signify an imminent problem. A more powerful approach is to apply data mining techniques to uncover hidden trends and patterns as well as interactions among groups of measurands. In an internal research and development effort, software engineers at Sandia National Laboratories explored ways to mine SOH data from a remote sensing spacecraft. Because our spacecraft uses variable sample rates and packetized telemetry to transmit values for 30,000 measurands across 700 unique packet IDs, our data is characterized by a wide disparity of time and value pairs. We discuss how we summarized and aligned this data to be efficiently applied to data mining algorithms. After the data preprocessing step, we apply supervised learning techniques, including decision trees and principal component analysis, and unsupervised learning techniques, including k-means clustering, orthogonal partitioning clustering, and one-class support vector machines, to four different spacecraft SOH scenarios. Our experiment results show that data mining is a very good low-cost and high-payoff approach to SOH analysis and provides an excellent way to exploit vast quantities of time-series data among groups of measurands in different scenarios. Our scenarios show that the supervised cases were particularly useful in identifying key contributors to anomalous events, and the unsupervised cases were well-suited for automated analysis of the system as a whole.
The developed underlying models can be updated over time to accurately represent a changing operating environment and ultimately to extend the mission lifetime of our valuable space assets.
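
As a flavor of the unsupervised case, a minimal k-means pass over two-feature measurand vectors separates nominal readings from an anomalous group; the data, the choice of k, and the random initialization below are illustrative assumptions, not the spacecraft telemetry itself:

```python
import random

def kmeans(points, k=2, iters=20, seed=0):
    """Minimal k-means for telemetry feature vectors: assign each point to its
    nearest centroid, then recompute each centroid as its cluster's mean."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            clusters[i].append(p)
        # Recompute centroids; keep the old one if a cluster goes empty.
        centroids = [tuple(sum(vals) / len(cl) for vals in zip(*cl)) if cl else centroids[i]
                     for i, cl in enumerate(clusters)]
    return centroids, clusters

# Nominal telemetry near (0, 0) and an anomalous group near (10, 10).
points = [(0.0, 0.0), (0.5, 0.2), (0.1, 0.4),
          (10.0, 10.0), (9.8, 10.1), (10.2, 9.9)]
centroids, clusters = kmeans(points, k=2)
```

A point whose distance to every learned centroid is large would then be flagged for analyst review, which is the automated whole-system screening role the unsupervised methods played in the study.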

Reducing the bulk of the bulk synchronous parallel model

Parallel Processing Letters

Barrett, Richard F.; Vaughan, Courtenay T.; Hammond, Simon D.

For over two decades the dominant means for enabling portable performance of computational science and engineering applications on parallel processing architectures has been the bulk-synchronous parallel programming (BSP) model. Code developers, motivated by performance considerations to minimize the number of messages transmitted, have typically pursued a strategy of aggregating message data into fewer, larger messages. Emerging and future high-performance architectures, especially those seen as targeting Exascale capabilities, provide motivation and capabilities for revisiting this approach. In this paper we explore alternative configurations within the context of a large-scale complex multi-physics application and a proxy that represents its behavior, presenting results that demonstrate some important advantages as the number of processors increases in scale.
