Publications

Task mapping stencil computations for non-contiguous allocations

Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP

Leung, Vitus J.; Bunde, David P.; Ebbers, Johnathan; Feer, Stefan P.; Price, Nickolas W.; Rhodes, Zachary D.; Swank, Matthew

We examine task mapping algorithms for systems that allocate jobs non-contiguously. Several studies have shown that task placement affects job running time. We focus on jobs with a stencil communication pattern and use experiments on a Cray XE to evaluate novel task mapping algorithms as well as some adapted to this setting. This is done with the miniGhost miniApp which mimics the behavior of CTH, a shock physics application. Our strategies improve average and single-run times by as much as 28% and 36% over a baseline strategy, respectively.

Sensitivity of precipitation to parameter values in the community atmosphere model version 5

Swiler, Laura P.; Wildey, Timothy M.

One objective of the Climate Science for a Sustainable Energy Future (CSSEF) program is to develop the capability to thoroughly test and understand the uncertainties in the overall climate model and its components as they are being developed. The focus on uncertainties involves sensitivity analysis: the capability to determine which input parameters have a major influence on the output responses of interest. This report presents some initial sensitivity analysis results performed by Lawrence Livermore National Laboratory (LLNL), Sandia National Laboratories (SNL), and Pacific Northwest National Laboratory (PNNL). In the 2011-2012 timeframe, these laboratories worked in collaboration to perform sensitivity analyses of a set of CAM5 2° runs, where the response metrics of interest were precipitation metrics. The three labs performed their sensitivity analysis (SA) studies separately and then compared results. Overall, the results were quite consistent with each other, although the methods used were different. This exercise provided a robustness check of the global sensitivity analysis metrics and identified some strongly influential parameters.
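
As background for the variance-based metrics mentioned above (the abstract does not spell out the formulas, so this is a generic note rather than a statement of the labs' exact methods), a common global sensitivity measure is the first-order Sobol index of an input X_i with respect to an output Y:

    S_i = \mathrm{Var}\big(\mathbb{E}[Y \mid X_i]\big) / \mathrm{Var}(Y)

It measures the fraction of output variance attributable to X_i acting alone; inputs with large S_i are the "strongly influential parameters" such studies aim to identify.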

Dakota uncertainty quantification methods applied to the NEK-5000 SAHEX model

Weirs, Vincent G.

This report summarizes the results of a NEAMS project focused on the use of uncertainty and sensitivity analysis methods within the NEK-5000 and Dakota software framework for assessing failure probabilities as part of probabilistic risk assessment. NEK-5000 is a software tool under development at Argonne National Laboratory to perform computational fluid dynamics calculations for applications such as thermohydraulics of nuclear reactor cores. Dakota is a software tool developed at Sandia National Laboratories containing optimization, sensitivity analysis, and uncertainty quantification algorithms. The goal of this work is to demonstrate the use of uncertainty quantification methods in Dakota with NEK-5000.

Optimality conditions for the numerical solution of optimization problems with PDE constraints

Aguilo Valentin, Miguel A.; Ridzal, Denis R.

A theoretical framework for the numerical solution of partial differential equation (PDE) constrained optimization problems is presented in this report. This theoretical framework embodies the fundamental infrastructure required to efficiently implement and solve this class of problems. Detailed derivations of the optimality conditions required to accurately solve several parameter identification and optimal control problems are also provided in this report. This will allow the reader to further understand how the theoretical abstraction presented in this report translates to the application.
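
As a generic reminder of what such optimality conditions look like (the report's own derivations are not reproduced here), write the problem as min over (u,z) of J(u,z) subject to the PDE constraint c(u,z) = 0, with state u, control or identified parameter z, and Lagrangian L(u,z,\lambda) = J(u,z) + \langle \lambda, c(u,z) \rangle. The first-order conditions are then:

    c(u,z) = 0                                              (state equation)
    \partial_u J(u,z) + (\partial_u c(u,z))^{*} \lambda = 0   (adjoint equation)
    \partial_z J(u,z) + (\partial_z c(u,z))^{*} \lambda = 0   (gradient equation)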

Numerical implementation of time-dependent density functional theory for extended systems in extreme environments

Baczewski, Andrew D.; Shulenburger, Luke N.; Desjarlais, Michael P.; Magyar, Rudolph J.

In recent years, DFT-MD has been shown to be a useful computational tool for exploring the properties of WDM. These calculations achieve excellent agreement with shock compression experiments, which probe the thermodynamic parameters of the Hugoniot state. New X-ray Thomson Scattering diagnostics promise to deliver independent measurements of electronic density and temperature, as well as structural information in shocked systems. However, they require the development of new levels of theory for computing the associated observables within a DFT framework. The experimentally observable x-ray scattering cross section is related to the electronic density-density response function, which is obtainable using TDDFT, a formally exact extension of conventional DFT that describes electron dynamics and excited states. In order to develop a capability for modeling XRTS data and, more generally, to establish a predictive capability for first principles simulations of matter in extreme conditions, real-time TDDFT with Ehrenfest dynamics has been implemented in an existing PAW code for DFT-MD calculations. The purpose of this report is to record implementation details and benchmarks as the project advances from software development to delivering novel scientific results. Results range from tests that establish the accuracy, efficiency, and scalability of our implementation, to calculations that are verified against accepted results in the literature. Aside from the primary XRTS goal, we identify other more general areas where this new capability will be useful, including stopping power calculations and electron-ion equilibration.

Bayesian calibration of the Community Land Model using surrogates

Ray, Jaideep R.; Swiler, Laura P.

We present results from the Bayesian calibration of hydrological parameters of the Community Land Model (CLM), which is often used in climate simulations and Earth system models. A statistical inverse problem is formulated for three hydrological parameters, conditional on observations of latent heat surface fluxes over 48 months. Our calibration method uses polynomial and Gaussian process surrogates of the CLM, and solves the parameter estimation problem using a Markov chain Monte Carlo sampler. Posterior probability densities for the parameters are developed for two sites with different soil and vegetation covers. Our method also allows us to examine the structural error in CLM under two error models. We find that surrogate models can be created for CLM in most cases. The posterior distributions are more predictive than the default parameter values in CLM. Climatologically averaging the observations does not modify the parameters' distributions significantly. The structural error model reveals a correlation time-scale which can be used to identify the physical process that could be contributing to it. While the calibrated CLM has a higher predictive skill, the calibration is under-dispersive.
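
To make the calibration workflow concrete, here is a minimal, generic sketch of a random-walk Metropolis sampler that evaluates a surrogate in place of the full land model; the function and parameter names are illustrative and are not taken from the paper or from CLM/Dakota.

import numpy as np

def metropolis_with_surrogate(surrogate, obs, sigma, prior_bounds,
                              n_steps=5000, step_size=0.05, seed=0):
    """Random-walk Metropolis sampler whose Gaussian likelihood evaluates a
    cheap surrogate model instead of the full simulator (illustrative sketch;
    names, priors, and error model are hypothetical)."""
    rng = np.random.default_rng(seed)
    lo, hi = prior_bounds
    theta = 0.5 * (lo + hi)                       # start at the prior midpoint

    def log_post(t):
        if np.any(t < lo) or np.any(t > hi):      # uniform prior support
            return -np.inf
        resid = obs - surrogate(t)                # surrogate replaces the full model
        return -0.5 * np.sum((resid / sigma) ** 2)

    chain, lp = [theta], log_post(theta)
    for _ in range(n_steps):
        prop = theta + step_size * (hi - lo) * rng.standard_normal(theta.size)
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:  # Metropolis accept/reject
            theta, lp = prop, lp_prop
        chain.append(theta)
    return np.array(chain)

# Toy usage with a 2-parameter linear "surrogate" standing in for the real one.
obs = np.array([1.0, 2.0, 3.0])
surrogate = lambda t: t[0] + t[1] * np.array([1.0, 2.0, 3.0])
samples = metropolis_with_surrogate(surrogate, obs, sigma=0.1,
                                    prior_bounds=(np.zeros(2), 2.0 * np.ones(2)))

The key design point is that each MCMC step costs one inexpensive surrogate evaluation rather than an expensive CLM simulation, which is what makes posterior exploration affordable.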

Statistically significant relational data mining

Berry, Jonathan W.; Leung, Vitus J.; Phillips, Cynthia A.; Pinar, Ali P.; Robinson, David G.

This report summarizes the work performed under the project "Statistically significant relational data mining." The goal of the project was to add more statistical rigor to the fairly ad hoc area of data mining on graphs. Our goal was to develop better algorithms and better ways to evaluate algorithm quality. We concentrated on algorithms for community detection, approximate pattern matching, and graph similarity measures. Approximate pattern matching involves finding an instance of a relatively small pattern, expressed with tolerance, in a large graph of data observed with uncertainty. This report gathers the abstracts and references for the eight refereed publications that have appeared as part of this work. We then archive three pieces of research that have not yet been published. The first is theoretical and experimental evidence that a popular statistical measure for comparison of community assignments favors over-resolved communities over approximations to a ground truth. The second is a set of statistically motivated methods for measuring the quality of an approximate match of a small pattern in a large graph. The third is a new probabilistic random graph model. Statisticians favor these models for graph analysis. The new local structure graph model overcomes some of the issues with popular models such as exponential random graph models and latent variable models.

Encoding and analyzing aerial imagery using geospatial semantic graphs

Rintoul, Mark D.; Watson, Jean-Paul W.; McLendon, William C.; Parekh, Ojas D.; Martin, Shawn

While collection capabilities have yielded an ever-increasing volume of aerial imagery, analytic techniques for identifying patterns in and extracting relevant information from this data have seriously lagged. The vast majority of imagery is never examined, due to a combination of the limited bandwidth of human analysts and limitations of existing analysis tools. In this report, we describe an alternative, novel approach to both encoding and analyzing aerial imagery, using the concept of a geospatial semantic graph. The advantages of our approach are twofold. First, intuitive templates can be easily specified in terms of the domain language in which an analyst converses. These templates can be used to automatically and efficiently search large graph databases for specific patterns of interest. Second, unsupervised machine learning techniques can be applied to automatically identify patterns in the graph databases, exposing recurring motifs in imagery. We illustrate our approach using real-world data for Anne Arundel County, Maryland, and compare the performance of our approach to that of an expert human analyst.

mPPM, viewed as a co-design effort

Proceedings of Co-HPC 2014: 1st International Workshop on Hardware-Software Co-Design for High Performance Computing - Held in Conjunction with SC 2014: The International Conference for High Performance Computing, Networking, Storage and Analysis

Woodward, Paul R.; Jayaraj, Jagan J.; Barrett, Richard F.

The Piecewise Parabolic Method (PPM) was designed as a means of exploring compressible gas dynamics problems of interest in astrophysics, including supersonic jets, compressible turbulence, stellar convection, and turbulent mixing and burning of gases in stellar interiors. Over time, the capabilities encapsulated in PPM have co-evolved with the availability of a series of high performance computing platforms. Implementation of the algorithm has adapted to and advanced with the architectural capabilities and characteristics of these machines. This adaptability of our PPM codes has enabled targeted astrophysical applications of PPM to exploit these scarce resources to explore complex physical phenomena. Here we describe the means by which this was accomplished, and set a path forward, with a new miniapp, mPPM, for continuing this process in a diverse and dynamic architecture design environment. Adaptations in mPPM for the latest high performance machines are discussed that address the important issue of limited bandwidth from locally attached main memory to the microprocessor chip.

Using a complementary emulation-simulation co-design approach to assess application readiness for Processing-in-Memory systems

Proceedings of Co-HPC 2014: 1st International Workshop on Hardware-Software Co-Design for High Performance Computing - Held in Conjunction with SC 2014: The International Conference for High Performance Computing, Networking, Storage and Analysis

Stelle, George; Olivier, Stephen L.; Stark, Dylan S.; Rodrigues, Arun; Hemmert, Karl S.

Disruptive changes to computer architecture are paving the way toward extreme scale computing. The co-design strategy of collaborative research and development among computer architects, system software designers, and application teams can help to ensure that applications not only cope but thrive with these changes. In this paper, we present a novel combined co-design approach of emulation and simulation in the context of investigating future Processing in Memory (PIM) architectures. PIM enables co-location of data and computation to decrease data movement, to provide increases in memory speed and capacity compared to existing technologies and, perhaps most importantly for extreme scale, to improve energy efficiency. Our evaluation of PIM focuses on three mini-applications representing important production applications. The emulation and simulation studies examine the effects of locality-aware versus locality-oblivious data distribution and computation, and they compare PIM to conventional architectures. Both studies contribute in their own way to the overall understanding of the application-architecture interactions, and our results suggest that PIM technology shows great potential for efficient computation without negatively impacting productivity.

Domain Decomposition Preconditioners for Communication-Avoiding Krylov Methods on a Hybrid CPU/GPU Cluster

International Conference for High Performance Computing, Networking, Storage and Analysis, SC

Yamazaki, Ichitaro; Rajamanickam, Sivasankaran R.; Boman, Erik G.; Hoemmen, Mark F.; Heroux, Michael A.; Tomov, Stanimire

Krylov subspace projection methods are widely used iterative methods for solving large-scale linear systems of equations. Researchers have demonstrated that communication avoiding (CA) techniques can improve Krylov methods' performance on modern computers, where communication is becoming increasingly expensive compared to arithmetic operations. In this paper, we extend these studies by two major contributions. First, we present our implementation of a CA variant of the Generalized Minimum Residual (GMRES) method, called CA-GMRES, for solving nonsymmetric linear systems of equations on a hybrid CPU/GPU cluster. Our performance results on up to 120 GPUs show that CA-GMRES gives a speedup of up to 2.5x in total solution time over standard GMRES on a hybrid cluster with twelve Intel Xeon CPUs and three Nvidia Fermi GPUs on each node. We then outline a domain decomposition framework to introduce a family of preconditioners that are suitable for CA Krylov methods. Our preconditioners do not incur any additional communication and allow the easy reuse of existing algorithms and software for the subdomain solves. Experimental results on the hybrid CPU/GPU cluster demonstrate that CA-GMRES with preconditioning achieves a speedup of up to 7.4x over CA-GMRES without preconditioning, and a speedup of up to 1.7x over GMRES with preconditioning in total solution time. These results confirm the potential of our framework to develop a practical and effective preconditioned CA Krylov method.

Cubic-scaling algorithm and self-consistent field for the random-phase approximation with second-order screened exchange

Journal of Chemical Physics

Moussa, Jonathan E.

The random-phase approximation with second-order screened exchange (RPA+SOSEX) is a model of electron correlation energy with two caveats: its accuracy depends on an arbitrary choice of mean field, and it scales as O(n^5) operations and O(n^3) memory for n electrons. We derive a new algorithm that reduces its scaling to O(n^3) operations and O(n^2) memory using controlled approximations and a new self-consistent field that approximates Brueckner coupled-cluster doubles theory with RPA+SOSEX, referred to as Brueckner RPA theory. The algorithm comparably reduces the scaling of second-order Møller-Plesset perturbation theory with smaller cost prefactors than RPA+SOSEX. Within a semiempirical model, we study H_2 dissociation to test accuracy and H_n rings to verify scaling. © 2014 AIP Publishing LLC.

Surrogate models for mixed discrete-continuous variables

Studies in Computational Intelligence

Swiler, Laura P.

Large-scale computational models have become common tools for analyzing complex man-made systems. However, when coupled with optimization or uncertainty quantification methods in order to conduct extensive model exploration and analysis, the computational expense quickly becomes intractable. Furthermore, these models may have both continuous and discrete parameters. One common approach to mitigating the computational expense is the use of response surface approximations. While well developed for models with continuous parameters, they are still new and largely untested for models with both continuous and discrete parameters. In this work, we describe and investigate the performance of three types of response surfaces developed for mixed-variable models: Adaptive Component Selection and Shrinkage Operator, Treed Gaussian Process, and Gaussian Process with Special Correlation Functions. We focus our efforts on test problems with a small number of parameters of interest, a characteristic of many physics-based engineering models. We present the results of our studies and offer some insights regarding the performance of each response surface approximation method. © 2014 Springer International Publishing Switzerland.

Streaming data analytics via message passing with application to graph algorithms

Journal of Parallel and Distributed Computing

Plimpton, Steven J.; Shead, Tim

The need to process streaming data, which arrives continuously at high-volume in real-time, arises in a variety of contexts including data produced by experiments, collections of environmental or network sensors, and running simulations. Streaming data can also be formulated as queries or transactions which operate on a large dynamic data store, e.g. a distributed database. We describe a lightweight, portable framework named PHISH which provides a communication model enabling a set of independent processes to compute on a stream of data in a distributed-memory parallel manner. Datums are routed between processes in patterns defined by the application. PHISH provides multiple communication backends including MPI and sockets/ZMQ. The former means streaming computations can be run on any parallel machine which supports MPI; the latter allows them to run on a heterogeneous, geographically dispersed network of machines. We illustrate how streaming MapReduce operations can be implemented using the PHISH communication model, and describe streaming versions of three algorithms for large, sparse graph analytics: triangle enumeration, sub-graph isomorphism matching, and connected component finding. We also provide benchmark timings comparing MPI and socket performance for several kernel operations useful in streaming algorithms. © 2014 Elsevier Inc. All rights reserved.
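
PHISH's actual API is not reproduced here; as a generic illustration of one of the streaming kernels mentioned above (connected component finding), the sketch below maintains components incrementally with a union-find structure as edges arrive, one datum at a time.

class StreamingComponents:
    """Incrementally track connected components over a stream of edges
    (generic illustration of one kernel named above, not PHISH code)."""
    def __init__(self):
        self.parent = {}

    def find(self, v):
        self.parent.setdefault(v, v)
        while self.parent[v] != v:               # path halving
            self.parent[v] = self.parent[self.parent[v]]
            v = self.parent[v]
        return v

    def add_edge(self, u, v):                    # called once per streamed datum
        ru, rv = self.find(u), self.find(v)
        if ru != rv:
            self.parent[ru] = rv

cc = StreamingComponents()
for u, v in [(1, 2), (2, 3), (4, 5)]:            # stand-in for a streamed edge sequence
    cc.add_edge(u, v)
assert cc.find(1) == cc.find(3) and cc.find(1) != cc.find(4)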

Origin and effect of nonlocality in a composite

Journal of Mechanics of Materials and Structures

Silling, Stewart A.

A simple demonstration of nonlocality in a heterogeneous material is presented. By analysis of the microscale deformation of a two-component layered medium, it is shown that nonlocal interactions necessarily appear in a homogenized model of the system. Explicit expressions for the nonlocal forces are determined. The way these nonlocal forces appear in various nonlocal elasticity theories is derived. The length scales that emerge involve the constituent material properties as well as their geometrical dimensions. A peridynamic material model for the smoothed displacement field is derived. It is demonstrated by comparison with experimental data that the incorporation of nonlocality in modeling improves the prediction of the stress concentration in an open-hole tension test on a composite plate. © 2014 Mathematical Sciences Publishers.
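
For readers unfamiliar with peridynamics, the general bond-based peridynamic equation of motion into which such nonlocal models fit is

    \rho(x)\,\ddot{u}(x,t) = \int_{H_x} f\big(u(x',t) - u(x,t),\, x' - x\big)\, dV_{x'} + b(x,t)

where H_x is the neighborhood (horizon) of the point x, f is a pairwise force density, and b is a body force. The specific homogenized force function derived in the paper is not reproduced here.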

Optimization-based conservative transport on the cubed-sphere grid

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Peterson, Kara J.; Bochev, Pavel B.; Ridzal, Denis R.

Transport algorithms are highly important for dynamical modeling of the atmosphere, where it is critical that scalar tracer species are conserved and satisfy physical bounds. We present an optimization-based algorithm for the conservative transport of scalar quantities (i.e. mass) on the cubed sphere grid, which preserves local solution bounds without the use of flux limiters. The optimization variables are the net mass updates to the cell, the objective is to minimize the discrepancy between these variables and suitable high-order cell mass update (the "target"), and the constraints are derived from the local solution bounds and the conservation of the total mass. The resulting robust and efficient algorithm for conservative and local bound-preserving transport on the sphere further demonstrates the flexibility and scope of the recently developed optimization-based modeling approach [1, 2]. © 2014 Springer-Verlag.
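
Schematically (with illustrative notation rather than the paper's), the per-step optimization described above is a small quadratic program in the cell mass updates m_i:

    \min_{\{m_i\}} \sum_i (m_i - m_i^{\mathrm{target}})^2
    \text{subject to} \quad m_i^{\min} \le m_i \le m_i^{\max} \quad \text{and} \quad \sum_i m_i = \sum_i m_i^{\mathrm{target}}

where the targets come from a high-order update and the bounds encode the local solution limits; the equality constraint enforces conservation of total mass.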

Asking the right questions: Benchmarking fault-tolerant extreme-scale systems

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Widener, Patrick W.; Ferreira, Kurt; Levy, Scott; Bridges, Patrick G.; Arnold, Dorian; Brightwell, Ronald B.

Much recent research has explored fault-tolerance mechanisms intended for current and future extreme-scale systems. Evaluations of the suitability of checkpoint-based solutions have typically been carried out using relatively uncomplicated computational kernels designed to measure floating point performance. More recent investigations have added scaled-down "proxy" applications to more closely match the composition and behavior of deployed ones. However, the information obtained from these studies (whether floating point performance or application runtime) is not necessarily of the most value in evaluating resilience strategies. We observe that even when using a more sophisticated metric, the information available from evaluating uncoordinated checkpointing using both microbenchmarks and proxy applications does not agree. This implies that not only might researchers be asking the wrong questions, but that the answers to the right ones might be unexpected and potentially misleading. We seek to open a discussion on whether benchmarks designed to provide predictable performance evaluations of HPC hardware and toolchains are providing the right feedback for the evaluation of fault-tolerance in these applications, and more generally on how benchmarking of resilience mechanisms ought to be approached in the exascale design space. © 2014 Springer-Verlag Berlin Heidelberg.

Investigating the integration of supercomputers and data-Warehouse appliances

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Oldfield, Ron A.; Davidson, George; Ulmer, Craig D.; Wilson, Andrew T.

Two decades of experience with massively parallel supercomputing has given insight into the problem domains where these architectures are cost effective. Likewise experience with database machines and more recently massively parallel database appliances has shown where these architectures are valuable. Combining both architectures to simultaneously solve problems has received much less attention. In this paper, we describe a motivating application for economic modeling that requires both HPC and database capabilities. Then we discuss hardware and software integration issues related to a direct integration of a Cray XT supercomputer and a Netezza database appliance. © 2014 Springer-Verlag Berlin Heidelberg.

Controlling self-force for unstructured particle-in-cell (PIC) codes

IEEE Transactions on Plasma Science

Bettencourt, Matthew T.

A new algorithm was developed, which reduces the self-force in particle-in-cell codes on unstructured meshes in a predictable and controllable way. This is accomplished by computing a charge density weighting function for a particle, which reproduces the Green's function solution to Poisson's equation at nodes when using a standard finite element methodology. This provides a superior local potential and allows for particle-particle particle-mesh techniques to be used to subtract off local force contributions, including fictitious self-forces, resulting in accurate long-range forces on a particle and improved local Coulomb collisions. Local physical forces are then computed using the Green's function on local particle pairs and added to the long-range forces. Results were shown with up to a five-order-of-magnitude reduction in self-force and superior intraparticle forces for two test cases. © 2013 IEEE.
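
The correction strategy described above can be summarized schematically (the notation is illustrative, not the paper's): the total force on particle i combines the mesh solution with its local contribution removed and replaced by direct Green's-function pair forces,

    F_i \approx F_i^{\mathrm{mesh}} - F_{i,\mathrm{local}}^{\mathrm{mesh}} + \sum_{j \in N(i)} F_{ij}^{\mathrm{Green}}

where F_{i,local}^{mesh} is the mesh-computed force at particle i due to nearby charges (including the particle's own, i.e., the fictitious self-force) and the final sum over neighbors N(i) restores the local physics with direct pair forces.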

Origin and effect of nonlocality in a layered composite

Silling, Stewart A.

A simple demonstration of nonlocality in a heterogeneous material is presented. By analysis of the microscale deformation of a two-component layered medium, it is shown that nonlocal interactions necessarily appear in a homogenized model of the system. Explicit expressions for the nonlocal forces are determined. The way these nonlocal forces appear in various nonlocal elasticity theories is derived. The length scales that emerge involve the constituent material properties as well as their geometrical dimensions. A peridynamic material model for the smoothed displacement field is derived. It is demonstrated by comparison with experimental data that the incorporation of nonlocality in modeling dramatically improves the prediction of the stress concentration in an open-hole tension test on a composite plate.

BFS and coloring-based parallel algorithms for strongly connected components and related problems

Proceedings of the International Parallel and Distributed Processing Symposium, IPDPS

Slota, George M.; Rajamanickam, Sivasankaran R.; Madduri, Kamesh

Finding the strongly connected components (SCCs) of a directed graph is a fundamental graph-theoretic problem. Tarjan's algorithm is an efficient serial algorithm to find SCCs, but relies on the hard-to-parallelize depth-first search (DFS). We observe that implementations of several parallel SCC detection algorithms show poor parallel performance on modern multicore platforms and large-scale networks. This paper introduces the Multistep method, a new approach that avoids work inefficiencies seen in prior SCC approaches. It does not rely on DFS, but instead uses a combination of breadth-first search (BFS) and a parallel graph coloring routine. We show that the Multistep method scales well on several real-world graphs, with performance fairly independent of topological properties such as the size of the largest SCC and the total number of SCCs. On a 16-core Intel Xeon platform, our algorithm achieves a 20X speedup over the serial approach on a 2 billion edge graph, fully decomposing it in under two seconds. For our collection of test networks, we observe that the Multistep method is 1.92X faster (mean speedup) than the state-of-the-art Hong et al. SCC method. In addition, we modify the Multistep method to find connected and weakly connected components, as well as introduce a novel algorithm for determining articulation vertices of biconnected components. These approaches all utilize the same underlying BFS and coloring routines. © 2014 IEEE.
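
The Multistep implementation itself is not shown here, but the BFS-based building block it relies on for extracting the (typically massive) largest SCC is easy to sketch: the SCC containing a pivot vertex is the intersection of the sets reachable from the pivot in the graph and in its reverse.

def pivot_scc(graph, rgraph, pivot):
    """Return the strongly connected component containing `pivot` as the
    intersection of its forward and backward BFS reachable sets. `graph` and
    `rgraph` map each vertex to its out-neighbors in the original and reversed
    graph. (Generic forward-backward step, not the full Multistep method.)"""
    def bfs(adj, src):
        seen, frontier = {src}, [src]
        while frontier:
            nxt = []
            for u in frontier:
                for v in adj.get(u, ()):
                    if v not in seen:
                        seen.add(v)
                        nxt.append(v)
            frontier = nxt
        return seen

    return bfs(graph, pivot) & bfs(rgraph, pivot)

# Example: vertices 0-1-2 form a cycle, vertex 3 hangs off it.
g  = {0: [1], 1: [2], 2: [0, 3], 3: []}
rg = {1: [0], 2: [1], 0: [2], 3: [2]}
assert pivot_scc(g, rg, 0) == {0, 1, 2}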

A pervasive parallel framework for visualization: final report for FWP 10-014707

Moreland, Kenneth D.

We are on the threshold of a transformative change in the basic architecture of high-performance computing. The use of accelerator processors, characterized by large core counts, shared but asymmetrical memory, and heavy thread loading, is quickly becoming the norm in high performance computing. These accelerators represent significant challenges in updating our existing base of software. An intrinsic problem with this transition is a fundamental programming shift from message passing processes to much finer-grained thread scheduling with memory sharing. Another problem is the lack of stability in accelerator implementation; processor and compiler technology is currently changing rapidly. This report documents the results of our three-year ASCR project to address these challenges. Our project includes the development of the Dax toolkit, which contains the beginnings of new algorithms for a new generation of computers and the underlying infrastructure to rapidly prototype and build further algorithms as necessary.

Uncertainty quantification methods for model calibration, validation, and risk analysis

16th AIAA Non-Deterministic Approaches Conference

Sargsyan, Khachik S.; Najm, H.N.; Chowdhary, Kamaljit S.; Debusschere, Bert D.; Swiler, Laura P.; Eldred, Michael S.

In this paper we propose a series of methodologies to address the problems in the NASA Langley Multidisciplinary UQ Challenge. A Bayesian approach is employed to characterize and calibrate the epistemic parameters in problem A, while variance-based global sensitivity analysis is proposed for problem B. For problems C and D we propose nested sampling methods for mixed aleatory-epistemic UQ.
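
For orientation only (the paper's exact sampling scheme is not reproduced), a common double-loop arrangement for mixed aleatory-epistemic UQ samples epistemic parameters in an outer loop and propagates aleatory variability in an inner Monte Carlo loop, yielding an interval of failure probabilities; all names below are hypothetical.

import numpy as np

def double_loop_failure_prob(model, sample_epistemic, sample_aleatory,
                             threshold, n_outer=100, n_inner=1000, seed=0):
    """Outer loop over epistemic parameters, inner Monte Carlo loop over
    aleatory variables; returns the range of failure probabilities induced by
    epistemic uncertainty. (Generic double-loop sketch; `model` and the
    samplers are hypothetical callables.)"""
    rng = np.random.default_rng(seed)
    probs = []
    for _ in range(n_outer):
        e = sample_epistemic(rng)                 # epistemic value fixed per outer pass
        a = sample_aleatory(rng, n_inner)         # array of aleatory samples
        y = np.array([model(e, ai) for ai in a])
        probs.append(np.mean(y > threshold))      # inner-loop failure probability
    return min(probs), max(probs)                 # epistemic interval on P_f

# Toy usage: epistemic mean shift, aleatory noise, additive response.
pf_lo, pf_hi = double_loop_failure_prob(
    model=lambda e, a: e + a,
    sample_epistemic=lambda rng: rng.uniform(0.0, 1.0),
    sample_aleatory=lambda rng, n: rng.standard_normal(n),
    threshold=2.0)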

Multilevel summation methods for efficient evaluation of long-range pairwise interactions in atomistic and coarse-grained molecular simulation

Bond, Stephen D.

The availability of efficient algorithms for long-range pairwise interactions is central to the success of numerous applications, ranging in scale from atomic-level modeling of materials to astrophysics. This report focuses on the implementation and analysis of the multilevel summation method for approximating long-range pairwise interactions. The computational cost of the multilevel summation method is proportional to the number of particles, N, which is an improvement over FFT-based methods whose cost is asymptotically proportional to N log N. In addition to approximating electrostatic forces, the multilevel summation method can be used to efficiently approximate convolutions with long-range kernels. As an application, we apply the multilevel summation method to a discretized integral equation formulation of the regularized generalized Poisson equation. Numerical results are presented using an implementation of the multilevel summation method in the LAMMPS software package. Preliminary results show that the computational cost of the method scales as expected, but there is still a need for further optimization.
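
For context (the report's exact splitting functions are not reproduced here), multilevel summation rests on splitting a long-range kernel into a short-range part evaluated directly plus progressively smoother remainders interpolated on coarser grids; for the Coulomb kernel this takes the schematic form

    1/r = (1/r - g_a(r)) + (g_a(r) - g_{2a}(r)) + \cdots + g_{2^{k-1}a}(r)

where each g_{ca}(r) is a smooth, softened approximation to 1/r beyond the cutoff scale ca, and only the first term is summed directly over nearby particle pairs.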

Thermal hydraulic simulations, error estimation and parameter sensitivity studies in Drekar::CFD

Shadid, John N.; Pawlowski, Roger P.; Cyr, Eric C.; Wildey, Timothy M.

This report describes work directed towards completion of the Thermal Hydraulics Methods (THM) CFD Level 3 Milestone THM.CFD.P7.05 for the Consortium for Advanced Simulation of Light Water Reactors (CASL) Nuclear Hub effort. The focus of this milestone was to demonstrate the thermal hydraulics and adjoint-based error estimation and parameter sensitivity capabilities in the CFD code called Drekar::CFD. This milestone builds upon the capabilities demonstrated in three earlier milestones: THM.CFD.P4.02 [12], completed March 31, 2012; THM.CFD.P5.01 [15], completed June 30, 2012; and THM.CFD.P5.01 [11], completed October 31, 2012.

Hybrid methods for cybersecurity analysis

Davis, Warren L.; Dunlavy, Daniel D.

Early 2010 saw a significant change in adversarial techniques aimed at network intrusion: a shift from malware delivered via email attachments toward the use of hidden, embedded hyperlinks to initiate sequences of downloads and interactions with web sites and network servers containing malicious software. Enterprise security groups were well poised and experienced in defending the former attacks, but the new types of attacks were larger in number, more challenging to detect, dynamic in nature, and required the development of new technologies and analytic capabilities. The Hybrid LDRD project was aimed at delivering new capabilities in large-scale data modeling and analysis to enterprise security operators and analysts and understanding the challenges of detection and prevention of emerging cybersecurity threats. Leveraging previous LDRD research efforts and capabilities in large-scale relational data analysis, large-scale discrete data analysis and visualization, and streaming data analysis, new modeling and analysis capabilities were quickly brought to bear on the problems in email phishing and spear phishing attacks in the Sandia enterprise security operational groups at the onset of the Hybrid project. As part of this project, a software development and deployment framework was created within the security analyst workflow tool sets to facilitate the delivery and testing of new capabilities as they became available, and machine learning algorithms were developed to address the challenge of dynamic threats. Furthermore, researchers from the Hybrid project were embedded in the security analyst groups for almost a full year, engaged in daily operational activities and routines, creating an atmosphere of trust and collaboration between the researchers and security personnel. The Hybrid project has altered the way that research ideas can be incorporated into the production environments of Sandia's enterprise security groups, reducing time to deployment from months and years to hours and days for the application of new modeling and analysis capabilities to emerging threats. The development and deployment framework has been generalized into the Hybrid Framework and incorporated into several LDRD, WFO, and DOE/CSL projects and proposals. And most importantly, the Hybrid project has provided Sandia security analysts with new, scalable, extensible analytic capabilities that have resulted in alerts not detectable using their previous workflow tool sets.

Investigation of ALEGRA shock hydrocode algorithms using an exact free surface jet flow solution

Robinson, Allen C.

Computational testing of the arbitrary Lagrangian-Eulerian shock physics code, ALEGRA, is presented using an exact solution that is very similar to a shaped charge jet flow. The solution is a steady, isentropic, subsonic free surface flow with significant compression and release and is provided as a steady state initial condition. There should be no shocks and no entropy production throughout the problem. The purpose of this test problem is to present a detailed and challenging computation in order to provide evidence for algorithmic strengths and weaknesses in ALEGRA which should be examined further. The results of this work are intended to be used to guide future algorithmic improvements in the spirit of test-driven development processes.

Xyce parallel electronic simulator users' guide, Version 6.0.1

Keiter, Eric R.; Warrender, Christina E.; Mei, Ting M.; Russo, Thomas V.; Schiek, Richard S.; Thornquist, Heidi K.; Verley, Jason V.; Coffey, Todd S.; Pawlowski, Roger P.

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state of the art in the following areas: the capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors), including support for most popular parallel and serial computers; a differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms and allows one to develop new types of analysis without requiring the implementation of analysis-specific device models; device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase: a message-passing parallel implementation that allows it to run efficiently on a wide range of computing platforms, including serial, shared-memory, and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.
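
For readers unfamiliar with the DAE formulation mentioned above, analog circuit simulators of this class typically assemble equations of the schematic form

    \frac{d}{dt}\, q\big(x(t)\big) + f\big(x(t)\big) - b(t) = 0

where x collects the node voltages and branch currents, q the device charges and fluxes, f the resistive (static) device contributions, and b the independent sources. This is the generic form, not necessarily Xyce's exact notation.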

Xyce parallel electronic simulator reference guide, Version 6.0.1

Keiter, Eric R.; Mei, Ting M.; Russo, Thomas V.; Pawlowski, Roger P.; Schiek, Richard S.; Coffey, Todd S.; Thornquist, Heidi K.; Verley, Jason V.; Warrender, Christina E.

This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users' Guide [1]. The focus of this document is to list, as exhaustively as possible, the device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users' Guide [1].

Using simulation to evaluate the performance of resilience strategies and process failures

Levy, Scott L.; Ferreira, Kurt; Widener, Patrick W.

Fault-tolerance has been identified as a major challenge for future extreme-scale systems. Current predictions suggest that, as systems grow in size, failures will occur more frequently. Because increases in failure frequency reduce the performance and scalability of these systems, significant effort has been devoted to developing and refining resilience mechanisms to mitigate the impact of failures. However, effective evaluation of these mechanisms has been challenging. Current systems are smaller and have significantly different architectural features (e.g., interconnect, persistent storage) than we expect to see in next-generation systems. To overcome these challenges, we propose the use of simulation. Simulation has been shown to be an effective tool for investigating performance characteristics of applications on future systems. In this work, we: identify the set of system characteristics that are necessary for accurate performance prediction of resilience mechanisms for HPC systems and applications; demonstrate how these system characteristics can be incorporated into an existing large-scale simulator; and evaluate the predictive performance of our modified simulator. We also describe how we were able to optimize the simulator for large temporal and spatial scales, allowing the simulator to run 4x faster and use over 100x less memory.

SNAP: Strong scaling high fidelity molecular dynamics simulations on leadership-class computing platforms

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Trott, Christian R.; Hammond, Simon D.; Thompson, Aidan P.

The rapidly improving compute capability of contemporary processors and accelerators is providing the opportunity for significant increases in the accuracy and fidelity of scientific calculations. In this paper we present performance studies of a new molecular dynamics (MD) potential called SNAP. The SNAP potential has shown great promise in accurately reproducing physics and chemistry not described by simpler potentials. We have developed new algorithms to exploit high single-node concurrency provided by three different classes of machine: the Titan GPU-based system operated by Oak Ridge National Laboratory, the combined Sequoia and Vulcan BlueGene/Q machines located at Lawrence Livermore National Laboratory, and the large-scale Intel Sandy Bridge system, Chama, located at Sandia. Our analysis focuses on strong scaling experiments with approximately 246,000 atoms over the range 1-122,880 nodes on Sequoia/Vulcan and 40-18,630 nodes on Titan. We compare these machines in terms of both simulation rate and power efficiency. We find that node performance correlates with power consumption across the range of machines, except for the case of extreme strong scaling, where more powerful compute nodes show greater efficiency. This study is a unique assessment of a challenging, scientifically relevant calculation running on several of the world's leading contemporary production supercomputing platforms. © 2014 Springer International Publishing.

Simulation of workflow and threat characteristics for cyber security incident response teams

Proceedings of the Human Factors and Ergonomics Society

Reed, Theodore M.; Abbott, Robert G.; Anderson, Benjamin R.; Nauer, Kevin S.

Within large organizations, the defense of cyber assets generally involves the use of various mechanisms, such as intrusion detection systems, to alert cyber security personnel to suspicious network activity. Resulting alerts are reviewed by the organization's cyber security personnel to investigate and assess the threat and initiate appropriate actions to defend the organization's network assets. While automated software routines are essential to cope with the massive volumes of data transmitted across data networks, the ultimate success of an organization's efforts to resist adversarial attacks upon their cyber assets relies on the effectiveness of individuals and teams. This paper reports research to understand the factors that impact the effectiveness of Cyber Security Incident Response Teams (CSIRTs). Specifically, a simulation is described that captures the workflow within a CSIRT. The simulation is then demonstrated in a study comparing the differential response time to threats that vary with respect to key characteristics (attack trajectory, targeted asset and perpetrator). It is shown that the results of the simulation correlate with data from the actual incident response times of a professional CSIRT.

Early Experiences Co-Scheduling Work and Communication Tasks for Hybrid MPI+X Applications

Proceedings of ExaMPI 2014: Exascale MPI 2014 - held in conjunction with SC 2014: The International Conference for High Performance Computing, Networking, Storage and Analysis

Stark, Dylan S.; Barrett, Richard F.; Grant, Ryan E.; Olivier, Stephen L.; Laros, James H.; Vaughan, Courtenay T.

Advances in node-level architecture and interconnect technology needed to reach extreme scale necessitate a reevaluation of long-standing models of computation, in particular bulk synchronous processing. The end of Dennard scaling and subsequent increases in CPU core counts with each successive generation of general-purpose processors have made the ability to leverage parallelism for communication an increasingly critical aspect of future extreme-scale application performance. But the use of massive multithreading in combination with MPI is an open research area, with many proposed approaches requiring code changes that can be unfeasible for important large legacy applications already written in MPI. This paper covers the design and initial evaluation of an extension of a massive multithreading runtime system supporting dynamic parallelism to interface with MPI to handle fine-grain parallel communication and communication-computation overlap. Our initial evaluation of the approach uses the ubiquitous stencil computation, in three dimensions, with the halo exchange as the driving example that has a demonstrated tie to real code bases. The preliminary results suggest that even for a very well-studied and balanced workload and message exchange pattern, co-scheduling work and communication tasks is effective at significant levels of decomposition using up to 131,072 cores. Furthermore, we demonstrate useful communication-computation overlap when handling blocking send and receive calls, and show evidence suggesting that we can decrease the burstiness of network traffic, with a corresponding decrease in the rate of stalls (congestion) seen on the host link and network.

Enhancing least-squares finite element methods through a quantity-of-interest

SIAM Journal on Numerical Analysis

Cyr, Eric C.; Chaudhry, Jehanzeb H.; Liu, Kuo; Manteuffel, Thomas A.; Olson, Luke N.; Tang, Lei

In this paper we introduce an approach that augments least-squares finite element formulations with user-specified quantities-of-interest. The method incorporates the quantity-of-interest into the least-squares functional and inherits the global approximation properties of the standard formulation as well as increased resolution of the quantity-of-interest. We establish theoretical properties such as optimality and enhanced convergence under a set of general assumptions. Central to the approach is that it offers an element-level estimate of the error in the quantity-of-interest. As a result, we introduce an adaptive approach that yields efficient, adaptively refined approximations. Several numerical experiments for a range of situations are presented to support the theory and highlight the effectiveness of our methodology. Notably, the results show that the new approach is effective at improving the accuracy per total computational cost.
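
One schematic way to picture the augmentation described above (illustrative only; the paper's precise functional, norms, and weights may differ) is a weighted quantity-of-interest term added to the usual least-squares residual:

    J_w(u) = \| \mathcal{L} u - f \|^2 + w\, \big( Q(u) - \tilde{q} \big)^2

where Q is the user-specified quantity-of-interest, \tilde{q} is an auxiliary approximation of its value, and w is a weight balancing global accuracy against resolution of Q.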

Development, characterization, and modeling of a TaOx ReRAM for a neuromorphic accelerator

ECS Transactions

Marinella, Matthew J.; Mickel, Patrick R.; Lohn, Andrew L.; Hughart, David R.; Bondi, Robert J.; Mamaluy, Denis M.; Hjalmarson, Harold P.; Stevens, James E.; Decker, Seth D.; Apodaca, Roger A.; Evans, Brian R.; Aimone, James B.; Rothganger, Fredrick R.; James, Conrad D.; DeBenedictis, Erik

Resistive random access memory (ReRAM), or memristors, may be capable of significantly improving the efficiency of neuromorphic computing when used as a central component of an analog hardware accelerator. However, the significant electrical variation within a device and between devices degrades the maximum efficiency and accuracy which can be achieved by a ReRAM-based neuromorphic accelerator. In this report, the electrical variability is characterized, with a particular focus on that which is due to fundamental, intrinsic factors. Analytical and ab initio models are presented which offer some insight into the factors responsible for this variability.

PuLP: Scalable multi-objective multi-constraint partitioning for small-world networks

Proceedings - 2014 IEEE International Conference on Big Data, IEEE Big Data 2014

Slota, George M.; Madduri, Kamesh; Rajamanickam, Sivasankaran R.

We present PuLP, a parallel and memory-efficient graph partitioning method specifically designed to partition low-diameter networks with skewed degree distributions. Graph partitioning is an important Big Data problem because it impacts the execution time and energy efficiency of graph analytics on distributed-memory platforms. Partitioning determines the in-memory layout of a graph, which affects locality, intertask load balance, communication time, and overall memory utilization of graph analytics. A novel feature of our method PuLP (Partitioning using Label Propagation) is that it optimizes for multiple objective metrics simultaneously, while satisfying multiple partitioning constraints. Using our method, we are able to partition a web crawl with billions of edges on a single compute server in under a minute. For a collection of test graphs, we show that PuLP uses 8-39× less memory than state-of-the-art partitioners and is up to 14.5× faster, on average, than alternate approaches (with 16-way parallelism). We also achieve better partitioning quality results for the multi-objective scenario.
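
The sketch below shows the basic label-propagation step that this family of partitioners builds on, with a single part-size constraint; PuLP itself layers multiple objectives and constraints on top of this idea, so the code is a generic illustration rather than PuLP's algorithm.

import random
from collections import Counter

def label_propagation_partition(adj, n_parts, max_size, n_iters=10, seed=0):
    """One-constraint flavor of label-propagation partitioning: each vertex
    repeatedly adopts the most common part among its neighbors, subject to a
    part-size cap. (Basic idea only; not PuLP's multi-objective refinement.)"""
    rng = random.Random(seed)
    part = {v: rng.randrange(n_parts) for v in adj}      # random initial parts
    size = Counter(part.values())
    for _ in range(n_iters):
        for v in adj:
            counts = Counter(part[u] for u in adj[v])
            for p, _ in counts.most_common():            # preferred neighbor part first
                if p == part[v]:
                    break                                 # already in the best part
                if size[p] < max_size:                    # respect the size constraint
                    size[part[v]] -= 1
                    size[p] += 1
                    part[v] = p
                    break
    return part

# Toy usage: a 6-cycle split into 2 parts of at most 3 vertices each.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
print(label_propagation_partition(ring, n_parts=2, max_size=3))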

Exploiting geometric partitioning in task mapping for parallel computers

Proceedings of the International Parallel and Distributed Processing Symposium, IPDPS

Deveci, Mehmet; Rajamanickam, Sivasankaran R.; Leung, Vitus J.; Pedretti, Kevin P.; Olivier, Stephen L.; Bunde, David P.; Catalyurek, Umit V.; Devine, Karen D.

We present a new method for mapping applications' MPI tasks to cores of a parallel computer such that communication and execution time are reduced. We consider the case of sparse node allocation within a parallel machine, where the nodes assigned to a job are not necessarily located within a contiguous block nor within close proximity to each other in the network. The goal is to assign tasks to cores so that interdependent tasks are performed by 'nearby' cores, thus lowering the distance messages must travel, the amount of congestion in the network, and the overall cost of communication. Our new method applies a geometric partitioning algorithm to both the tasks and the processors, and assigns task parts to the corresponding processor parts. We show that, for the structured finite difference mini-app MiniGhost, our mapping method reduced execution time by 34% on average on 65,536 cores of a Cray XE6. In a molecular dynamics mini-app, MiniMD, our mapping method reduced communication time by 26% on average on 6144 cores. We also compare our mapping with graph-based mappings from the LibTopoMap library and show that our mappings reduced the communication time on average by 15% in MiniGhost and 10% in MiniMD. © 2014 IEEE.
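
The sketch below illustrates the geometric idea under a one-task-per-core assumption: partition the task coordinates and the allocated cores' network coordinates with the same recursive coordinate bisection, then pair up corresponding parts. It is an illustration of the concept, not the authors' implementation.

import numpy as np

def rcb(points, ids, n_parts):
    """Recursive coordinate bisection: split along the longest axis near the
    median until n_parts parts remain; returns a list of id-lists."""
    if n_parts == 1:
        return [list(ids)]
    axis = np.argmax(points.max(axis=0) - points.min(axis=0))
    order = np.argsort(points[:, axis])
    half = (len(ids) * (n_parts // 2)) // n_parts         # proportional split point
    left, right = order[:half], order[half:]
    return (rcb(points[left], [ids[i] for i in left], n_parts // 2) +
            rcb(points[right], [ids[i] for i in right], n_parts - n_parts // 2))

def map_tasks_to_cores(task_coords, core_coords, n_parts):
    """Partition tasks and cores with the same geometric method, then pair
    corresponding parts so nearby tasks land on nearby cores (assumes one
    task per core so part sizes match)."""
    t_parts = rcb(np.asarray(task_coords, float), list(range(len(task_coords))), n_parts)
    c_parts = rcb(np.asarray(core_coords, float), list(range(len(core_coords))), n_parts)
    return {t: c for tp, cp in zip(t_parts, c_parts) for t, c in zip(tp, cp)}

# Toy usage: 4 tasks on a 2x2 logical grid mapped onto 4 cores at given network coordinates.
tasks = [(0, 0), (0, 1), (1, 0), (1, 1)]
cores = [(10, 0), (10, 1), (11, 0), (11, 1)]
print(map_tasks_to_cores(tasks, cores, n_parts=4))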

Gaussian process adaptive importance sampling

International Journal for Uncertainty Quantification

Dalbey, Keith D.; Swiler, Laura P.

The objective is to calculate the probability, P_F, that a device will fail when its inputs, x, are randomly distributed with probability density p(x), e.g., the probability that a device will fracture when subject to varying loads. Here failure is defined as some scalar function, y(x), exceeding a threshold, T. If evaluating y(x) via physical or numerical experiments is sufficiently expensive or P_F is sufficiently small, then Monte Carlo (MC) methods to estimate P_F will be unfeasible due to the large number of function evaluations required for a specified accuracy. Importance sampling (IS), i.e., preferentially sampling from “important” regions in the input space and appropriately down-weighting to obtain an unbiased estimate, is one approach to assess P_F more efficiently. The inputs are sampled from an importance density, p′(x). We present an adaptive importance sampling (AIS) approach which endeavors to adaptively improve the estimate of the ideal importance density, p*(x), during the sampling process. Our approach uses a mixture of component probability densities that each approximate p*(x). An iterative process is used to construct the sequence of improving component probability densities. At each iteration, a Gaussian process (GP) surrogate is used to help identify areas in the space where failure is likely to occur. The GPs are not used to directly calculate the failure probability; they are only used to approximate the importance density. Thus, our Gaussian process adaptive importance sampling (GPAIS) algorithm overcomes limitations involving using a potentially inaccurate surrogate model directly in IS calculations. This robust GPAIS algorithm performs surprisingly well on a pathological test function.
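
Concretely, the importance-sampling estimate underlying the approach has the standard form

    P_F = \int \mathbf{1}[\, y(x) > T \,]\, p(x)\, dx
        \approx \frac{1}{N} \sum_{i=1}^{N} \mathbf{1}[\, y(x_i) > T \,]\, \frac{p(x_i)}{p'(x_i)}, \qquad x_i \sim p'(x)

which is unbiased for any importance density p′(x) that covers the failure region; GPAIS uses the GP surrogate only to steer the mixture components of p′(x) toward the ideal density p*(x).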

Spacecraft state-of-health (SOH) analysis via data mining

13th International Conference on Space Operations, SpaceOps 2014

Lindsay, Stephen R.; Woodbridge, Diane W.

Spacecraft state-of-health (SOH) analysis typically consists of limit-checking to compare incoming measurand values against their predetermined limits. While useful, this approach requires significant engineering insight along with the ability to evolve limit values over time as components degrade and their operating environment changes. In addition, it fails to take into account the effects of measurand combinations, as multiple values together could signify an imminent problem. A more powerful approach is to apply data mining techniques to uncover hidden trends and patterns as well as interactions among groups of measurands. In an internal research and development effort, software engineers at Sandia National Laboratories explored ways to mine SOH data from a remote sensing spacecraft. Because our spacecraft uses variable sample rates and packetized telemetry to transmit values for 30,000 measurands across 700 unique packet IDs, our data is characterized by a wide disparity of time and value pairs. We discuss how we summarized and aligned this data so it could be efficiently applied to data mining algorithms. After the data preprocessing step, we apply supervised learning (decision trees and principal component analysis) and unsupervised learning (k-means, orthogonal partitioning clustering, and a one-class support vector machine) to four different spacecraft SOH scenarios. Our experiment results show that data mining is a very good low-cost and high-payoff approach to SOH analysis and provides an excellent way to exploit vast quantities of time-series data among groups of measurands in different scenarios. Our scenarios show that the supervised cases were particularly useful in identifying key contributors to anomalous events, and the unsupervised cases were well-suited for automated analysis of the system as a whole. The developed underlying models can be updated over time to accurately represent a changing operating environment and ultimately to extend the mission lifetime of our valuable space assets.

Reducing the bulk of the bulk synchronous parallel model

Parallel Processing Letters

Barrett, Richard F.; Vaughan, Courtenay T.; Hammond, Simon D.

For over two decades the dominant means for enabling portable performance of computational science and engineering applications on parallel processing architectures has been the bulk-synchronous parallel programming (BSP) model. Code developers, motivated by performance considerations to minimize the number of messages transmitted, have typically pursued a strategy of aggregating message data into fewer, larger messages. Emerging and future high-performance architectures, especially those seen as targeting Exascale capabilities, provide motivation and capabilities for revisiting this approach. In this paper we explore alternative configurations within the context of a large-scale complex multi-physics application and a proxy that represents its behavior, presenting results that demonstrate some important advantages as the number of processors increases in scale.

LDRD final report : mesoscale modeling of dynamic loading of heterogeneous materials

Robbins, Joshua R.; Dingreville, Remi P.; Voth, Thomas E.; Furnish, Michael D.

Material response to dynamic loading is often dominated by microstructure (grain structure, porosity, inclusions, defects). An example critically important to Sandia's mission is dynamic strength of polycrystalline metals where heterogeneities lead to localization of deformation and loss of shear strength. Microstructural effects are of broad importance to the scientific community and several institutions within DoD and DOE; however, current models rely on inaccurate assumptions about mechanisms at the sub-continuum or mesoscale. Consequently, there is a critical need for accurate and robust methods for modeling heterogeneous material response at this lower length scale. This report summarizes work performed as part of an LDRD effort (FY11 to FY13; project number 151364) to meet these needs.

Edge remap for solids

Love, Edward L.; Robinson, Allen C.; Ridzal, Denis R.

We review the edge element formulation for describing the kinematics of hyperelastic solids. This approach is used to frame the problem of remapping the inverse deformation gradient for Arbitrary Lagrangian-Eulerian (ALE) simulations of solid dynamics. For hyperelastic materials, the stress state is completely determined by the deformation gradient, so remapping this quantity effectively updates the stress state of the material. A method, inspired by the constrained transport remap in electromagnetics, is reviewed, according to which the zero-curl constraint on the inverse deformation gradient is implicitly satisfied. Open issues related to the accuracy of this approach are identified. An optimization-based approach is implemented to enforce positivity of the determinant of the deformation gradient. The efficacy of this approach is illustrated with numerical examples.

More Details

QCAD simulation and optimization of semiconductor double quantum dots

Nielsen, Erik N.; Gao, Xujiao G.; Kalashnikova, Irina; Muller, Richard P.; Salinger, Andrew G.; Young, Ralph W.

We present the Quantum Computer Aided Design (QCAD) simulator, which targets the modeling of quantum devices, particularly silicon double quantum dots (DQDs) developed for quantum qubits. The simulator has three differentiating features: (i) its core contains nonlinear Poisson, effective mass Schrodinger, and Configuration Interaction solvers with massively parallel capability for high simulation throughput, which can be run individually or combined self-consistently for 1D/2D/3D quantum devices; (ii) the core solvers show superior convergence even at near-zero-Kelvin temperatures, which is critical for modeling quantum computing devices; and (iii) it couples with the optimization engine Dakota, enabling optimization of gate voltages in DQDs for multiple desired targets. The Poisson solver includes Maxwell-Boltzmann and Fermi-Dirac statistics, supports Dirichlet, Neumann, interface charge, and Robin boundary conditions, and includes the effect of incomplete dopant ionization. The solver has shown robust nonlinear convergence even in the milli-Kelvin temperature range and has been used extensively to quickly obtain the semiclassical electrostatic potential in DQD devices. The self-consistent Schrodinger-Poisson solver achieves robust and monotonic convergence for 1D/2D/3D quantum devices at very low temperatures by using a predictor-corrector iteration scheme. The QCAD simulator enables the calculation of dot-to-gate capacitances and comparison with experiment and between solvers. Computed capacitances are in reasonable agreement with experiment, and quantum confinement increases capacitance when the number of electrons in a quantum dot is fixed. In addition, the coupling of QCAD with Dakota allows us to rapidly identify which device layouts are most likely to lead to few-electron quantum dots. Very efficient QCAD simulations of a large number of fabricated and proposed Si DQDs have made it possible to provide fast feedback for design comparison and optimization.
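
The self-consistent core described above couples equations of the standard semiconductor form (notation ours, not necessarily QCAD's):

    -\nabla \cdot \big( \epsilon \, \nabla \phi \big) = q \big( p - n + N_D^+ - N_A^- \big), \qquad
    -\frac{\hbar^2}{2} \nabla \cdot \Big( \frac{1}{m^*} \nabla \psi_k \Big) - q \, \phi \, \psi_k = E_k \, \psi_k,

where the electron density n is rebuilt from the eigenpairs (psi_k, E_k) using Maxwell-Boltzmann or Fermi-Dirac occupation, and the two equations are iterated to self-consistency with the predictor-corrector update of phi mentioned above.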

More Details

Power/energy use cases for high performance computing

Laros, James H.; Kelly, Suzanne M.

Power and energy have been identified as a first-order challenge for future extreme-scale high performance computing (HPC) systems. In practice, the breakthroughs will need to be provided by the hardware vendors, but making the best use of their solutions in an HPC environment will likely require periodic tuning by facility operators and software components. This document describes the actions and interactions needed to maximize power resources, and it strives to cover the entire operational space that an HPC system occupies. The descriptions are presented as formal use cases, as documented in the Unified Modeling Language Specification [1]. The document is intended to provide the HPC community with a common understanding of the necessary management and control capabilities. Assuming a common understanding can be achieved, the next step will be to develop a set of Application Programming Interfaces (APIs) that hardware vendors and software developers could use to steer power consumption.
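
Purely as a hypothetical illustration of the kind of API the use cases anticipate (none of these names or signatures come from the report or from any vendor interface), a facility-level power interface might expose measurement reporting and per-node capping:

    # Hypothetical illustration only: the report defines use cases, not this interface.
    from dataclasses import dataclass

    @dataclass
    class PowerReading:
        node_id: str
        watts: float
        timestamp: float

    class PowerManager:
        """Hypothetical facility-level interface for measure/cap/report use cases."""
        def __init__(self, facility_budget_watts: float):
            self.budget = facility_budget_watts
            self.caps = {}                      # node_id -> cap in watts

        def report(self, reading: PowerReading) -> None:
            # In-band or out-of-band measurements would arrive here.
            ...

        def set_cap(self, node_id: str, watts: float) -> None:
            # Operators or schedulers steer consumption by tightening per-node caps.
            self.caps[node_id] = watts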

More Details

Incremental learning for automated knowledge capture

Davis, Warren L.; Dixon, Kevin R.; Martin, Nathaniel M.; Wendt, Jeremy D.

People responding to high-consequence national-security situations need tools to help them make the right decision quickly. The dynamic, time-critical, and ever-changing nature of these situations, especially those involving an adversary, requires models of decision support that can react dynamically as a situation unfolds and changes. Automated knowledge capture is a key part of creating individualized models of decision making in many situations because it has been demonstrated to be a very robust way to populate computational models of cognition. However, existing automated knowledge capture techniques only populate a knowledge model with data prior to its use, after which the knowledge model is static and unchanging. In contrast, humans, including our national-security adversaries, continually learn, adapt, and create new knowledge as they make decisions and witness their effects. This artificial dichotomy between creation and use exists because the majority of automated knowledge capture techniques are based on traditional batch machine-learning and statistical algorithms. These algorithms are primarily designed to optimize the accuracy of their predictions and are only secondarily, if at all, concerned with issues such as speed, memory use, or the ability to be updated incrementally. Thus, when new data arrive, batch algorithms used for automated knowledge capture require significant recomputation, frequently from scratch, which makes them ill-suited for use in dynamic, time-critical, high-consequence decision-making environments. In this work we explore and expand upon the capabilities of dynamic, incremental models that can adapt to an ever-changing feature space.
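
One concrete contrast between the batch and incremental styles, using scikit-learn's partial_fit purely as an illustration (this is not the project's algorithm), is that an incremental learner folds each new observation into the existing model rather than refitting from scratch:

    # Illustration: incremental updating via partial_fit instead of batch retraining.
    import numpy as np
    from sklearn.linear_model import SGDClassifier

    model = SGDClassifier()                      # linear model trained by online SGD
    classes = np.array([0, 1])                   # full label set, assumed known up front

    def on_new_observation(x, y):
        """Fold one (features, label) observation into the model without retraining."""
        X, Y = np.asarray(x, dtype=float).reshape(1, -1), np.asarray([y])
        if not hasattr(model, "classes_"):
            model.partial_fit(X, Y, classes=classes)   # first call declares the label set
        else:
            model.partial_fit(X, Y)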

More Details

The relational blackboard

22nd Annual Conference on Behavior Representation in Modeling and Simulation, BRiMS 2013 - Co-located with the International Conference on Cognitive Modeling

Abbott, Robert G.

Modeling agent behaviors in complex task environments requires the agent to be sensitive to complex stimuli, such as the positions and actions of varying numbers of other entities. Entity state updates may be received asynchronously rather than on a coordinated clock signal, so the world state must be estimated from the most recent information available for each entity. The simulation environment is also likely to be distributed across several computers over a network. This paper presents the Relational Blackboard (RBB), a framework developed to address these needs with clarity and efficiency. The purpose of this paper is to explain the concepts used to represent and process spatio-temporal data in the RBB framework so that researchers in related areas can apply the concepts and software to their own problems of interest; a detailed description of our own research can be found in other papers. The software is freely available under the BSD open-source license at http://rbb.sandia.gov.
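
The bookkeeping described above, estimating the world state from the most recent asynchronous update per entity, can be sketched as follows (our own illustration, not the RBB API):

    # Illustrative sketch: latest-known state per entity at an arbitrary query time.
    import bisect
    from collections import defaultdict

    class StateEstimator:
        """Keep timestamped updates per entity; answer 'state of the world at time t'
        with the most recent update at or before t for each entity."""
        def __init__(self):
            self.times = defaultdict(list)     # entity -> sorted timestamps
            self.states = defaultdict(list)    # entity -> states, parallel to times

        def update(self, entity, t, state):
            i = bisect.bisect_right(self.times[entity], t)
            self.times[entity].insert(i, t)
            self.states[entity].insert(i, state)

        def world_at(self, t):
            snapshot = {}
            for entity in self.times:
                i = bisect.bisect_right(self.times[entity], t)
                if i:
                    snapshot[entity] = self.states[entity][i - 1]
            return snapshot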

More Details

Evaluating Near-Term Adiabatic Quantum Computing

Parekh, Ojas D.; Aidun, John B.; Dubicka, Irene D.; Landahl, Andrew J.; Shulenburger, Luke N.; Tigges, Chris P.; Wendt, Jeremy D.

This report summarizes the first year's effort on the Enceladus project, under which Sandia was asked to evaluate the potential advantages of adiabatic quantum computing for analyzing large data sets in the near future, 5 to 10 years from now. We were not specifically evaluating the machine sold by D-Wave Systems, Inc.; rather, we were asked to anticipate what future adiabatic quantum computers might be able to achieve. While the greatest anticipated potential of quantum computation is still far in the future, a special-purpose quantum computing capability, Adiabatic Quantum Optimization (AQO), is under active development and is maturing relatively rapidly; indeed, D-Wave Systems Inc. already offers an AQO device based on superconducting flux qubits. The AQO architecture solves a particular class of problem, namely unconstrained quadratic Boolean optimization, and this class includes many interesting and important instances. Because of this, further investigation is warranted into the applicability of this problem class to the challenges of analyzing big data sets and into the effectiveness of AQO devices for performing specific analyses on big data. It is also of interest to consider the potential effectiveness of anticipated special-purpose adiabatic quantum computers (AQCs) in general for accelerating the analysis of big data sets. The objective of the present investigation is an evaluation of the potential of AQC to benefit the analysis of big data problems in the next five to ten years, with our main focus on AQO because of its relative maturity. We are not specifically assessing the efficacy of D-Wave's computing systems, though we hope to perform some experimental calculations on that device in the sequel to this project, at least to provide data to compare with our theoretical estimates.
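
For reference, unconstrained quadratic Boolean optimization (QUBO) has the standard form (notation ours):

    \min_{x \in \{0,1\}^n} \; x^{\mathsf{T}} Q x
    \;=\; \min_{x \in \{0,1\}^n} \sum_{i \le j} Q_{ij} \, x_i x_j,

which, under the substitution x_i = (1 - s_i)/2 with s_i in {-1, +1}, is equivalent to finding the ground state of an Ising Hamiltonian, the form realized by superconducting flux qubits.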

More Details

Computational Mechanics for Heterogeneous Materials

Baczewski, Andrew D.; Yarrington, Cole Y.; Bond, Stephen D.; Erikson, William W.; Lehoucq, Richard B.; Mondy, L.A.; Noble, David R.; Pierce, Flint P.; Roberts, Christine C.; Van Swol, Frank

The subject of this work is the development of models for the numerical simulation of matter, momentum, and energy balance in heterogeneous materials. These are materials that consist of multiple phases or species or that are structured on some (perhaps many) scale(s). By computational mechanics we refer generally to the standard type of modeling done at the level of macroscopic balance laws (mass, momentum, energy); we refer to the flow or flux of these quantities in a generalized sense as transport. At issue are the forms of the governing equations in these complex materials, which are potentially strongly inhomogeneous below some correlation length scale yet homogeneous on larger length scales. The question then becomes how to model this behavior and what the proper multi-scale equations are for capturing the transport mechanisms across scales. To address this we look to the generalized stochastic processes that underlie transport in homogeneous materials, the archetypal example being the relationship between random walk or Brownian motion processes and the associated Fokker-Planck or diffusion equation. Here we are interested in how this classical setting changes when inhomogeneities or correlations in structure are introduced. Aspects of non-classical behavior must be addressed, such as non-Fickian behavior of the mean-squared displacement (MSD) and non-Gaussian behavior of the underlying probability distribution of jumps. We present an experimental technique and apparatus built to investigate some of these issues. We also discuss diffusive processes in inhomogeneous systems, consider the role of the chemical potential in the diffusion of hard spheres, and examine the relevance to liquid metal solutions. Finally, we present an example of how inhomogeneities in material microstructure introduce fluctuations at the mesoscale for a thermal conduction problem. These fluctuations due to random microstructures also provide a means of characterizing the aleatory uncertainty in material properties at the mesoscale.
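
The classical baseline described above, and the anomalous behavior it is contrasted with, can be written (standard notation, not taken from the report) as

    \frac{\partial p(x,t)}{\partial t} = D \, \nabla^2 p(x,t), \qquad
    \langle \Delta x^2(t) \rangle = 2 d D t,

for a homogeneous medium in d spatial dimensions, whereas non-Fickian transport exhibits \langle \Delta x^2(t) \rangle \sim t^{\alpha} with \alpha \neq 1 and jump distributions that are no longer Gaussian.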

More Details

An extended finite element method with algebraic constraints (XFEM-AC) for problems with weak discontinuities

Computer Methods in Applied Mechanics and Engineering

Kramer, Richard M.; Bochev, Pavel B.; Siefert, Christopher S.; Voth, Thomas E.

We present a new extended finite element method with algebraic constraints (XFEM-AC) for recovering weakly discontinuous solutions across internal element interfaces. If necessary, cut elements are further partitioned by a local secondary cut into body-fitting subelements. Each resulting subelement contributes an enrichment of the parent element. The enriched solutions are then tied using algebraic constraints, which enforce C0 continuity across both cuts. These constraints impose equivalence of the enriched and body-fitted finite element solutions, and are the key differentiating feature of the XFEM-AC. In so doing, a stable mixed formulation is obtained without having to explicitly construct a compatible Lagrange multiplier space and prove a formal inf-sup condition. Likewise, convergence of the XFEM-AC solution follows from its equivalence to the interface-fitted finite element solution. This relationship is further exploited to improve the numerical solution of the resulting XFEM-AC linear system. Examples are shown demonstrating the new approach for both steady-state and transient diffusion problems. © 2013 Elsevier B.V.
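
Schematically (our notation, not the paper's), the algebraic constraints lead to a saddle-point system of the familiar form

    \begin{pmatrix} K & C^{\mathsf{T}} \\ C & 0 \end{pmatrix}
    \begin{pmatrix} u \\ \lambda \end{pmatrix}
    =
    \begin{pmatrix} f \\ 0 \end{pmatrix},

where K is assembled over the enriched subelements, C encodes the C0 tying constraints across the primary and secondary cuts, and lambda collects the multipliers; stability and convergence follow from the equivalence with the interface-fitted discretization rather than from an explicit inf-sup argument.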

More Details

Qualification for PowerInsight accuracy of power measurements

Laros, James H.; Pedretti, Kevin

The accuracy of component-based power measuring devices forms a necessary basis for research in power-efficient and power-aware computing, and it must be quantified within a reasonable tolerance. This study focuses on PowerInsight, an out-of-band embedded measuring device that takes real-time readings of power rails on compute nodes within an HPC system. We quantify how well the device performs in comparison to a digital oscilloscope as well as to PowerMon2, and show that its measurements deviate by no more than 6% under reasonable load.
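
The accuracy comparison reduces to computing the relative deviation of PowerInsight readings against a trusted reference; a trivial, illustrative sketch (not the study's analysis code):

    # Illustrative sketch: percent deviation of device readings against a reference.
    import numpy as np

    def percent_deviation(device_watts, reference_watts):
        """Relative deviation of device readings against reference readings
        (e.g. a digital oscilloscope) taken over the same interval."""
        device = np.asarray(device_watts, dtype=float)
        reference = np.asarray(reference_watts, dtype=float)
        return 100.0 * np.abs(device - reference) / reference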

More Details
Results 6201–6400 of 9,998