Publications

Results 6301–6350 of 9,998

Search results

Jump to search filters

SNAP: Strong scaling high fidelity molecular dynamics simulations on leadership-class computing platforms

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Trott, Christian R.; Hammond, Simon; Thompson, A.P.

The rapidly improving compute capability of contemporary processors and accelerators is providing the opportunity for significant increases in the accuracy and fidelity of scientific calculations. In this paper we present performance studies of a new molecular dynamics (MD) potential called SNAP. The SNAP potential has shown great promise in accurately reproducing physics and chemistry not described by simpler potentials. We have developed new algorithms to exploit high single-node concurrency provided by three different classes of machine: the Titan GPU-based system operated by Oak Ridge National Laboratory, the combined Sequoia and Vulcan BlueGene/Q machines located at Lawrence Livermore National Laboratory, and the large-scale Intel Sandy Bridge system, Chama, located at Sandia. Our analysis focuses on strong scaling experiments with approximately 246,000 atoms over the range 1-122,880 nodes on Sequoia/Vulcan and 40-18,630 nodes on Titan. We compare these machine in terms of both simulation rate and power efficiency. We find that node performance correlates with power consumption across the range of machines, except for the case of extreme strong scaling, where more powerful compute nodes show greater efficiency. This study is a unique assessment of a challenging, scientifically relevant calculation running on several of the world's leading contemporary production supercomputing platforms. © 2014 Springer International Publishing.

More Details

Simulation of workflow and threat characteristics for cyber security incident response teams

Proceedings of the Human Factors and Ergonomics Society

Reed, Theodore; Abbott, Robert G.; Anderson, Benjamin; Nauer, Kevin

Within large organizations, the defense of cyber assets generally involves the use of various mechanisms, such as intrusion detection systems, to alert cyber security personnel to suspicious network activity. Resulting alerts are reviewed by the organization's cyber security personnel to investigate and assess the threat and initiate appropriate actions to defend the organization's network assets. While automated software routines are essential to cope with the massive volumes of data transmitted across data networks, the ultimate success of an organization's efforts to resist adversarial attacks upon their cyber assets relies on the effectiveness of individuals and teams. This paper reports research to understand the factors that impact the effectiveness of Cyber Security Incidence Response Teams (CSIRTs). Specifically, a simulation is described that captures the workflow within a CSIRT. The simulation is then demonstrated in a study comparing the differential response time to threats that vary with respect to key characteristics (attack trajectory, targeted asset and perpetrator). It is shown that the results of the simulation correlate with data from the actual incident response times of a professional CSIRT.

More Details

Early Experiences Co-Scheduling Work and Communication Tasks for Hybrid MPI+X Applications

Proceedings of ExaMPI 2014: Exascale MPI 2014 - held in conjunction with SC 2014: The International Conference for High Performance Computing, Networking, Storage and Analysis

Stark, Dylan T.; Barrett, Richard F.; Grant, Ryan; Olivier, Stephen L.; Foulk, James W.; Vaughan, Courtenay T.

Advances in node-level architecture and interconnect technology needed to reach extreme scale necessitate a reevaluation of long-standing models of computation, in particular bulk synchronous processing. The end of Dennard-scaling and subsequent increases in CPU core counts each successive generation of general purpose processor has made the ability to leverage parallelism for communication an increasingly critical aspect for future extreme-scale application performance. But the use of massive multithreading in combination with MPI is an open research area, with many proposed approaches requiring code changes that can be unfeasible for important large legacy applications already written in MPI. This paper covers the design and initial evaluation of an extension of a massive multithreading runtime system supporting dynamic parallelism to interface with MPI to handle fine-grain parallel communication and communication-computation overlap. Our initial evaluation of the approach uses the ubiquitous stencil computation, in three dimensions, with the halo exchange as the driving example that has a demonstrated tie to real code bases. The preliminary results suggest that even for a very well-studied and balanced workload and message exchange pattern, co-scheduling work and communication tasks is effective at significant levels of decomposition using up to 131,072 cores. Furthermore, we demonstrate useful communication-computation overlap when handling blocking send and receive calls, and show evidence suggesting that we can decrease the burstiness of network traffic, with a corresponding decrease in the rate of stalls (congestion) seen on the host link and network.

More Details

Enhancing least-squares finite element methods through a quantity-of-interest

SIAM Journal on Numerical Analysis

Cyr, Eric C.; Chaudhry, Jehanzeb H.; Liu, Kuo; Manteuffel, Thomas A.; Olson, Luke N.; Tang, Lei

In this paper we introduce an approach that augments least-squares finite element formulations with user-specified quantities-of-interest. The method incorporates the quantity-ofinterest into the least-squares functional and inherits the global approximation properties of the standard formulation as well as increased resolution of the quantity-of-interest. We establish theoretical properties such as optimality and enhanced convergence under a set of general assumptions. Central to the approach is that it offers an element-level estimate of the error in the quantity-ofinterest. As a result, we introduce an adaptive approach that yields efficient, adaptively refined approximations. Several numerical experiments for a range of situations are presented to support the theory and highlight the effectiveness of our methodology. Notably, the results show that the new approach is effective at improving the accuracy per total computational cost.

More Details

An optimization-based atomistic-to-continuum coupling method

SIAM Journal on Numerical Analysis

Olson, Derek; Luskin, Mitchell; Shapeev, Alexander V.; Bochev, Pavel B.

We present a new optimization-based method for atomistic-to-continuum (AtC) coupling. The main idea is to cast the latter as a constrained optimization problem with virtual Dirichlet controls on the interfaces between the atomistic and continuum subdomains. The optimization objective is to minimize the error between the atomistic and continuum solutions on the overlap between the two subdomains, while the atomistic and continuum force balance equations provide the constraints. Separation, rather then blending of the atomistic and continuum problems, and their subsequent use as constraints in the optimization problem distinguishes our approach from the existing AtC formulations. We present and analyze the method in the context of a one-dimensional chain of atoms modeled using a linearized two-body potential with next-nearest neighbor interactions.

More Details

Development, characterization, and modeling of a TaOx ReRAM for a neuromorphic accelerator

ECS Transactions

Marinella, Matthew; Mickel, Patrick R.; Lohn, Andrew J.; Hughart, David R.; Bondi, Robert J.; Mamaluy, Denis; Hjalmarson, Harold P.; Stevens, James E.; Decker, Seth; Apodaca, Roger; Evans, Brian R.; Aimone, James B.; Rothganger, Fredrick R.; James, Conrad D.; Debenedictis, Erik

Resistive random access memory (ReRAM), or memristors, may be capable of significantly improve the efficiency of neuromorphic computing, when used as a central component of an analog hardware accelerator. However, the significant electrical variation within a device and between devices degrades the maximum efficiency and accuracy which can be achieved by a ReRAMbased neuromorphic accelerator. In this report, the electrical variability is characterized, with a particular focus on that which is due to fundamental, intrinsic factors. Analytical and ab initio models are presented which offer some insight into the factors responsible for this variability.

More Details

PuLP: Scalable multi-objective multi-constraint partitioning for small-world networks

Proceedings - 2014 IEEE International Conference on Big Data, IEEE Big Data 2014

Slota, George M.; Madduri, Kamesh; Rajamanickam, Sivasankaran

We present PuLP, a parallel and memory-efficient graph partitioning method specifically designed to partition low-diameter networks with skewed degree distributions. Graph partitioning is an important Big Data problem because it impacts the execution time and energy efficiency of graph analytics on distributed-memory platforms. Partitioning determines the in-memory layout of a graph, which affects locality, intertask load balance, communication time, and overall memory utilization of graph analytics. A novel feature of our method PuLP (Partitioning using Label Propagation) is that it optimizes for multiple objective metrics simultaneously, while satisfying multiple partitioning constraints. Using our method, we are able to partition a web crawl with billions of edges on a single compute server in under a minute. For a collection of test graphs, we show that PuLP uses 8-39× less memory than state-of-the-art partitioners and is up to 14.5× faster, on average, than alternate approaches (with 16-way parallelism). We also achieve better partitioning quality results for the multi-objective scenario.

More Details

Spacecraft state-of-health (SOH) analysis via data mining

13th International Conference on Space Operations, SpaceOps 2014

Lindsay, Stephen R.; Woodbridge, Diane M.

Spacecraft state-of-health (SOH) analysis typically consists of limit-checking to compare incoming measurand values against their predetermined limits. While useful, this approach requires significant engineering insight along with the ability to evolve limit values over time as components degrade and their operating environment changes. In addition, it fails to take into account the effects of measurand combinations, as multiple values together could signify an imminent problem. A more powerful approach is to apply data mining techniques to uncover hidden trends and patterns as well as interactions among groups of measurands. In an internal research and development effort, software engineers at Sandia National Laboratories explored ways to mine SOH data from a remote sensing spacecraft. Because our spacecraft uses variable sample rates and packetized telemetry to transmit values for 30,000 measurands across 700 unique packet IDs, our data is characterized by a wide disparity of time and value pairs. We discuss how we summarized and aligned this data to be efficiently applied to data mining algorithms. We apply supervised learning including decision tree and principal component analysis and unsupervised learning including k-means and orthogonal partitioning clustering and one-class support vector machine to four different spacecraft SOH scenarios after the data preprocessing step. Our experiment results show that data mining is a very good low-cost and high-payoff approach to SOH analysis and provides an excellent way to exploit vast quantities of time-series data among groups of measurands in different scenarios. Our scenarios show that the supervised cases were particularly useful in identifying key contributors to anomalous events, and the unsupervised cases were well-suited for automated analysis of the system as a whole. The developed underlying models can be updated over time to accurately represent a changing operating environment and ultimately to extend the mission lifetime of our valuable space assets.

More Details

Origin and effect of nonlocality in a layered composite

Silling, Stewart

A simple demonstration of nonlocality in a heterogeneous material is presented. By analysis of the microscale deformation of a two-component layered medium, it is shown that nonlocal interactions necessarily appear in a homogenized model of the system. Explicit expressions for the nonlocal forces are determined. The way these nonlocal forces appear in various nonlocal elasticity theories is derived. The length scales that emerge involve the constituent material properties as well as their geometrical dimen- sions. A peridynamic material model for the smoothed displacement eld is derived. It is demonstrated by comparison with experimental data that the incorporation of non- locality in modeling dramatically improves the prediction of the stress concentration in an open hole tension test on a composite plate.

More Details

Reducing the bulk of the bulk synchronous parallel model

Parallel Processing Letters

Barrett, Richard F.; Vaughan, Courtenay T.; Hammond, Simon

For over two decades the dominant means for enabling portable performance of computational science and engineering applications on parallel processing architectures has been the bulk-synchronous parallel programming (BSP) model. Code developers, motivated by performance considerations to minimize the number of messages transmitted, have typically pursued a strategy of aggregating message data into fewer, larger messages. Emerging and future high-performance architectures, especially those seen as targeting Exascale capabilities, provide motivation and capabilities for revisiting this approach. In this paper we explore alternative configurations within the context of a large-scale complex multi-physics application and a proxy that represents its behavior, presenting results that demonstrate some important advantages as the number of processors increases in scale.

More Details

A pervasive parallel framework for visualization: final report for FWP 10-014707

Moreland, Kenneth D.

We are on the threshold of a transformative change in the basic architecture of highperformance computing. The use of accelerator processors, characterized by large core counts, shared but asymmetrical memory, and heavy thread loading, is quickly becoming the norm in high performance computing. These accelerators represent significant challenges in updating our existing base of software. An intrinsic problem with this transition is a fundamental programming shift from message passing processes to much more fine thread scheduling with memory sharing. Another problem is the lack of stability in accelerator implementation; processor and compiler technology is currently changing rapidly. This report documents the results of our three-year ASCR project to address these challenges. Our project includes the development of the Dax toolkit, which contains the beginnings of new algorithms for a new generation of computers and the underlying infrastructure to rapidly prototype and build further algorithms as necessary.

More Details

Multilevel summation methods for efficient evaluation of long-range pairwise interactions in atomistic and coarse-grained molecular simulation

Bond, Stephen D.

The availability of efficient algorithms for long-range pairwise interactions is central to the success of numerous applications, ranging in scale from atomic-level modeling of materials to astrophysics. This report focuses on the implementation and analysis of the multilevel summation method for approximating long-range pairwise interactions. The computational cost of the multilevel summation method is proportional to the number of particles, N, which is an improvement over FFTbased methods whos cost is asymptotically proportional to N logN. In addition to approximating electrostatic forces, the multilevel summation method can be use to efficiently approximate convolutions with long-range kernels. As an application, we apply the multilevel summation method to a discretized integral equation formulation of the regularized generalized Poisson equation. Numerical results are presented using an implementation of the multilevel summation method in the LAMMPS software package. Preliminary results show that the computational cost of the method scales as expected, but there is still a need for further optimization.

More Details

Thermal Hydraulic Simulations, Error Estimation and Parameter Sensitivity Studies in Drekar::CFD

Shadid, John N.; Pawlowski, Roger; Cyr, Eric C.; Wildey, Timothy

This report describes work directed towards completion of the Thermal Hydraulics Methods (THM) CFD Level 3 Milestone THM.CFD.P7.05 for the Consortium for Advanced Simulation of Light Water Reactors (CASL) Nuclear Hub effort. The focus of this milestone was to demonstrate the thermal hydraulics and adjoint based error estimation and parameter sensitivity capabilities in the CFD code called Drekar::CFD. This milestone builds upon the capabilities demonstrated in three earlier milestones; THM.CFD.P4.02, completed March, 31, 2012, THM.CFD.P5.01 completed June 30, 2012 and THM.CFD.P5.01 completed on October 31, 2012.

More Details

Hybrid methods for cybersecurity analysis :

Davis, Warren L.; Dunlavy, Daniel M.

Early 2010 saw a signi cant change in adversarial techniques aimed at network intrusion: a shift from malware delivered via email attachments toward the use of hidden, embedded hyperlinks to initiate sequences of downloads and interactions with web sites and network servers containing malicious software. Enterprise security groups were well poised and experienced in defending the former attacks, but the new types of attacks were larger in number, more challenging to detect, dynamic in nature, and required the development of new technologies and analytic capabilities. The Hybrid LDRD project was aimed at delivering new capabilities in large-scale data modeling and analysis to enterprise security operators and analysts and understanding the challenges of detection and prevention of emerging cybersecurity threats. Leveraging previous LDRD research e orts and capabilities in large-scale relational data analysis, large-scale discrete data analysis and visualization, and streaming data analysis, new modeling and analysis capabilities were quickly brought to bear on the problems in email phishing and spear phishing attacks in the Sandia enterprise security operational groups at the onset of the Hybrid project. As part of this project, a software development and deployment framework was created within the security analyst work ow tool sets to facilitate the delivery and testing of new capabilities as they became available, and machine learning algorithms were developed to address the challenge of dynamic threats. Furthermore, researchers from the Hybrid project were embedded in the security analyst groups for almost a full year, engaged in daily operational activities and routines, creating an atmosphere of trust and collaboration between the researchers and security personnel. The Hybrid project has altered the way that research ideas can be incorporated into the production environments of Sandias enterprise security groups, reducing time to deployment from months and years to hours and days for the application of new modeling and analysis capabilities to emerging threats. The development and deployment framework has been generalized into the Hybrid Framework and incor- porated into several LDRD, WFO, and DOE/CSL projects and proposals. And most importantly, the Hybrid project has provided Sandia security analysts with new, scalable, extensible analytic capabilities that have resulted in alerts not detectable using their previous work ow tool sets.

More Details

Investigation of ALEGRA shock hydrocode algorithms using an exact free surface jet flow solution

Robinson, Allen C.

Computational testing of the arbitrary Lagrangian-Eulerian shock physics code, ALEGRA, is presented using an exact solution that is very similar to a shaped charge jet flow. The solution is a steady, isentropic, subsonic free surface flow with significant compression and release and is provided as a steady state initial condition. There should be no shocks and no entropy production throughout the problem. The purpose of this test problem is to present a detailed and challenging computation in order to provide evidence for algorithmic strengths and weaknesses in ALEGRA which should be examined further. The results of this work are intended to be used to guide future algorithmic improvements in the spirit of test-driven development processes.

More Details

Xyce parallel electronic simulator users' guide, Version 6.0.1

Keiter, Eric R.; Warrender, Christina E.; Mei, Ting; Russo, Thomas V.; Schiek, Richard; Thornquist, Heidi K.; Verley, Jason C.; Coffey, Todd S.; Pawlowski, Roger

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandias needs, including some radiationaware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase a message passing parallel implementation which allows it to run efficiently a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.

More Details

Xyce parallel electronic simulator reference guide, Version 6.0.1

Keiter, Eric R.; Mei, Ting; Russo, Thomas V.; Pawlowski, Roger; Schiek, Richard; Coffey, Todd S.; Thornquist, Heidi K.; Verley, Jason C.; Warrender, Christina E.

This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users Guide [1] . The focus of this document is (to the extent possible) exhaustively list device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide [1] .

More Details

Using Simulation to Evaluate the Performance of Resilience Strategies and Process Failures

Levy, Scott L.N.; Ferreira, Kurt; Widener, Patrick

Fault-tolerance has been identified as a major challenge for future extreme-scale systems. Current predictions suggest that, as systems grow in size, failures will occur more frequently. Because increases in failure frequency reduce the performance and scalability of these systems, significant effort has been devoted to developing and refining resilience mechanisms to mitigate the impact of failures. However, effective evaluation of these mechanisms has been challenging. Current systems are smaller and have significantly different architectural features (e.g., interconnect, persistent storage) than we expect to see in next-generation systems. To overcome these challenges, we propose the use of simulation. Simulation has been shown to be an effective tool for investigating performance characteristics of applications on future systems. In this work, we: identify the set of system characteristics that are necessary for accurate performance prediction of resilience mechanisms for HPC systems and applications; demonstrate how these system characteristics can be incorporated into an existing large-scale simulator; and evaluate the predictive performance of our modified simulator. We also describe how we were able to optimize the simulator for large temporal and spatial scales-allowing the simulator to run 4x faster and use over 100x less memory.

More Details
Results 6301–6350 of 9,998
Results 6301–6350 of 9,998